Monday, April 30, 2007

imho Ocean Semantic..


I don't know if I ever mentioned this. Day in and day out, I cover the data management industry for Current Analysis.

One of the key segments of my coverage area is data integration (DI). And within that is a broad space called "enterprise information integration" (EII), which is often contrasted with "extract, transform, load" (ETL). So as not to bore you with unnecessary distinctions, EII primarily deals with logical, dynamic integration of heterogeneous data across dispersed repositories, whereas ETL deals with physical integration of that data into a common, persistent data store called a data warehouse. That's slightly oversimplifying the matter, but please indulge me for a moment. There is a payoff.
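For the code-minded, here is a toy sketch of that distinction in Python. Everything in it is made up for illustration: the `crm` and `billing` sources, their field names, and the `unified_view` and `etl_load` helpers are assumptions, not any vendor's API. The EII-style function assembles the combined view from the live sources each time it is called, while the ETL-style function copies the transformed rows into a separate store.

```python
# Two hypothetical source systems that use different field vocabularies.
crm = [
    {"cust_id": 1, "full_name": "Ada"},
    {"cust_id": 2, "full_name": "Grace"},
]
billing = [
    {"customerNumber": 1, "amount_due": 40.0},
    {"customerNumber": 2, "amount_due": 0.0},
]


def unified_view():
    """EII-style: build the combined view from the live sources on every call."""
    # Map the billing vocabulary onto a shared customer key.
    due = {row["customerNumber"]: row["amount_due"] for row in billing}
    return [
        {
            "customer_id": c["cust_id"],
            "name": c["full_name"],
            "balance": due.get(c["cust_id"]),
        }
        for c in crm
    ]


def etl_load():
    """ETL-style: extract and transform once, then keep a physical copy."""
    return [dict(row) for row in unified_view()]


# Load the "warehouse" once, then change a source system afterwards.
warehouse = etl_load()
billing[0]["amount_due"] = 55.0

print(unified_view()[0]["balance"])  # reflects the live source
print(warehouse[0]["balance"])       # reflects the earlier load
```

The logical view picks up the change in the billing source right away, whereas the physical copy stays as it was until the next load. Real EII products add query optimization, caching, security, and metadata management on top of this basic idea.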

Principal vendors of EII solutions include IBM, BEA, Business Objects, Informatica, Sybase, Actuate, Composite Software, Ipedo, Inetsoft, and MetaMatrix (note: the latter vendor is in the process of being acquired by Red Hat and having its EII software open-sourced).

In the EII space, most of the vendors offer solutions that incorporate what they often call a "semantic layer." Here's a smattering of what is often included under the discussion of a "semantic layer":
  • data management layer
  • efficiently and reliably enables federated access to a wide range of heterogeneous data sources
  • supports enterprise implementation of virtualized, composite, unified views of disparate data that has been retrieved from heterogeneous sources, including ERP systems, line-of-business applications, transactional databases, and Web services
  • federated data services and metadata management
  • enables data to be accessed from disparate data sources and used in any form in any application.
  • organizations can reduce the application development and integration costs associated with accessing and reconciling disparate data, while improving its overall utilization and consistency
  • resolves data access challenges and the physical and semantic differences among disparate, physical data sources.
  • provides a semantic-interoperability data services layer that decouples applications from their data sources and makes data assets available as services in an SOA, freeing data from single application silos.
  • instead of managing multiple data sources for different applications and trying to keep them reconciled with one another, users can take any data from any data source and use it with any application.
  • developers and architects create, deploy, and manage data services that access, transform, integrate, and aggregate data to provide the information needed by applications while hiding the complex details of diverse physical data sources
  • alleviates the need to create yet another new copy of the data
  • simultaneously provides mechanisms for data consistency, security and compliance
  • through a model-driven approach, application teams can create, deploy, and manage data services that simplify data integration.
  • enables users to access a single view of their data from multiple disparate systems
  • performs the necessary semantic mediation and vocabulary management to get the data into the right form - all without software programming.
All of which seems to be core to most industry discussions of "semantic Web." How is this all not the "semantic Web"? Is it all not here and now, and eminently feasible, thanks to the EII market? Do any of these commercial solutions depend on any of the core specs (i.e., RDF, OWL) usually associated with the W3C's flavor of "semantic Web"? (answer: no). Does Red Hat's decision to acquire MetaMatrix, open-source its EII technology, and bundle it with the JBoss Enterprise Middleware Suite represent a critical step toward making SOA-enabled EII (i.e., semantic Web) ubiquitous? (answer: you betcha).

And where do "ontologies" and "federation" fit into this picture?

More to come.