Semantics is just a fancy word for understanding what things truly mean.
In distributed IT environments, semantic interoperability enables applications to understand the precise meaning of each piece of data that they import, acquire, retrieve, and otherwise receive from elsewhere. Without a transparent view into the semantics of externally originated content, applications cannot know how to validate, map, transform, correlate, and otherwise process that information without garbling its meaning.
Semantic interoperability is and always has been one of the principal tasks in real-world integration projects. Typically, it requires sweat equity by business analysts and data architects, who must define mappings to ensure that meaning is not lost or misconstrued when data is transformed to the requisite schemas of target applications. This can be a complex, error-prone exercise, because separate application domains often use different data syntaxes, schemas, and formats to describe semantically equivalent entities, such as a particular customer’s various records or a specific product’s multifarious descriptions.
Complicating the integration process is the fact that application domains rarely describe their semantics—in other words, the entity-relationship conceptual models that inform their data structures—in any formal or consistent way. Furthermore, relational data structures can be frustratingly opaque to developers who are trying to associate a complex set of linked tables with a coherent, business-level conceptual model. Integration specialists must often infer semantics from sketchy documentation, and then create cross-application data mappings that are based on those inferences.
What is the Semantic Web?
In an ideal world, semantics standards would be implemented universally, thereby accelerating, automating, and tightening semantic integration among heterogeneous environments.
Semantic Web refers to a long-running industry initiative that is working toward this ambitious goal. The vision of a Semantic Web has been percolating within the service-oriented architecture (SOA) community since the late 1990s. It has been promoted primarily by World Wide Web (WWW) inventor Tim Berners-Lee, and it continues to be developed through a formal activity of the World Wide Web Consortium (W3C), which Berners-Lee heads.
At heart, Semantic Web is a vision for how the WWW should evolve to realize its full potential (indeed, some industry observers have taken to calling it “Semantic SOA” or “Web 3.0”). Since its birth in the early 1990s, the WWW has transformed the Internet into an open book that, through common interoperability standards such as HyperText Transfer Protocol (HTTP), HyperText Markup Language (HTML), and Extensible Markup Language (XML), allows content everywhere to be available, readable, searchable, and comprehensible to human consumers. The Semantic Web initiative extends that concept to include non-human consumers. Organizations can implement W3C-developed semantics standards, such as Resource Description Framework (RDF) and Web Ontology Language (OWL), to make the meaning of content unambiguously comprehensible to services, applications, bots, and other automated components.
Nevertheless, people vary widely in how they interpret the scope of the Semantic Web initiative, and the market is swarming with a wide range of projects, products, and tools that implement different variants of this vision. In the broadest perspective, Semantic Web may be understood as referring to an all-encompassing metadata, description, and policy layer that enables universal, automatic, comprehensive end-to-end interoperability across every macro or micro entity (including data, components, services, and applications) on every conceivable level. At its most down-to-earth, though, Semantic Web is usually construed as the ability to associate structured data with controlled, application-domain-specific conceptual models known as “ontologies.”
The potential benefits of semantic interoperability fall into several application domains:
- Enterprise content management (ECM): Semantic approaches can support more powerful discovery, indexing, search, classification, commentary, and navigation across heterogeneous stores of unstructured and semi-structured content. Semantic search, driven by concepts rather than mere text strings, is regarded by many as the potential killer application of Semantic Web technology. Indeed, many Semantic Web vendors are primarily implementing the technology in search engines that leverage ontology-based concepts to improve search accuracy and reduce spurious hits.
- Enterprise information integration (EII): Semantic approaches enable consolidated viewing, query, and update of structured data that has been retrieved from diverse sources. Indeed, most commercial EII environments present an abstract semantic layer that mediates access to heterogeneous data, such as enterprise resource planning (ERP) and customer relationship management (CRM) applications, converging it all to a common presentation-side schema. A handful of those EII vendors, including BEA and Red Hat/MetaMatrix, have begun to support Semantic Web standards, primarily through third-party software plug-ins.
- Enterprise service bus (ESB): Semantic approaches can facilitate multilayered application, process, and service interoperability across disparate environments. To date, there has been little production implementation of Semantic Web standards in the ESB arena, though vendors such as Telcordia Technologies have adopted semantics, ontologies, and RDF to describe the conceptual models implemented by application endpoints, agents, and intermediary nodes within ESB-like middleware approaches such as event stream processing (ESP).
What are the Principal Standards and Approaches for Implementing the Semantic Web?
On the standards front, the Semantic Web vision is starting to bear fruit, slowly but inexorably.
In the past year, there has been an upsurge in industry attention to the W3C’s Semantic Web activity, due in part to the growing realization that SOA-based interoperability demands attention to semantics issues. To date, W3C-developed Semantic Web specifications—most notably, RDF and OWL—have begun to gain significant traction in commercial products. Startups continue to emerge, offering ontology modeling tools, inference engines, RDF repositories, and other necessary components of Semantic Web solutions. And more and more users are incorporating semantics-based approaches in their search, text analytics, ECM, EII, and other mission-critical applications.
At the heart of Semantic Web environments is the notion of ontologies, which are conceptual models comprising entity-relationship statements that have been expressed in a “knowledge representation language.” For Semantic Web, the principal knowledge representation language is RDF, which is an official W3C Recommendation. RDF uses XML to define a rich data model, syntax, and vocabulary for the exchange of machine-understandable ontologies about URI-designated resources. Within an RDF ontology, statements consist of well-defined “subjects,” “predicates,” and “objects.” For example, in the statement “This BCR article has an author whose value is James Kobielus,” the subject is “This BCR article,” the predicate is “has an author,” and the object is “whose value is James Kobielus.” Under RDF notation, each of these “nodes” is designated with its own unique URI, and a syntactically complete statement can be created by concatenating subject, predicate, and object node URIs into a single structure called an “RDF triple.”
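To make the triple structure concrete, here is a minimal sketch in Python that models one statement and writes it out in the line-based N-Triples notation. The article URI under example.org is a hypothetical placeholder, the predicate is the Dublin Core "creator" term, and the `Triple` class and `to_ntriples` helper are illustrative names rather than part of any RDF library. Note that RDF also allows the object of a statement to be a plain literal value instead of a URI, which is what the author's name is here.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject and predicate are URIs; the object is a URI or a literal."""
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialize a triple as a single line of N-Triples text."""
    if t.obj_is_literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Hypothetical article URI; the predicate is the Dublin Core "creator" term.
statement = Triple(
    subject="http://example.org/articles/semantic-web",
    predicate="http://purl.org/dc/elements/1.1/creator",
    obj="James Kobielus",
    obj_is_literal=True,
)

print(to_ntriples(statement))
```

A production system would more likely rely on an established RDF library than on hand-rolled serialization; the point here is only that a statement reduces to three identifiable parts.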
RDF is the core specification in a growing range of Semantic Web standards and specifications under W3C, including:
- OWL: This specification, which is an official W3C Recommendation, extends RDF to support richer description of resource properties, classes, relationships, equality, and typing.
- SPARQL Query Language for RDF: This specification, which is currently a W3C Candidate Recommendation, draws on the functions and operators of XQuery and XPath to support queries across diverse RDF data sources.
- Gleaning Resource Descriptions from Dialects of Languages (GRDDL): This specification, which is currently a W3C Candidate Recommendation, specifies how an XML document can be marked up to declare that it includes RDF-compatible data and also to specify links to algorithms, typically represented in Extensible Stylesheet Language Transformations (XSLT), for extracting this data from the document.
Semantic Web implementations generally follow one of three approaches:
- Semantic modeling: In this scenario, developers explicitly model semantics as RDF/OWL ontologies, and/or as such related logical structures as taxonomies, thesauri, and topic maps. The ontologies are used to drive creation of structured content that instantiates the entities, classes, relationships, attributes, and properties defined in the ontologies. This is the classic model of greenfield development of application data under the Semantic Web paradigm.
- Semantic mediation: In this scenario, developers explicitly model semantics as RDF/OWL ontologies, and use the ontologies to drive the creation of mappings, transformations, and aggregations among existing, structured data sets. This describes the typical use of Semantic Web approaches within heterogeneous EII and other data integration environments.
- Semantic mining: In this scenario, developers use natural-language processing (NLP) and pattern-recognition tools to extract the implicit semantics from unstructured text sources. The extracted entities, relationships, facts, sentiments, and other artifacts are used to fashion RDF/OWL ontologies that drive the creation of indices, tags, annotations, and other metadata that layer a consistent semantic structure across the various items within an unstructured text store. This describes the typical use of Semantic Web in search and text mining/analytics environments.
Semantic Web environments also require the following functional components:
- Semantic tools: Application developers require a broad range of tools to help them work with ontologies, taxonomies, thesauri, topic maps, and other semantic constructs. Developers need tools to discover, query, browse, analyze, visualize, model, design, edit, classify, and annotate semantic constructs. They also need tools to map among dissimilar ontologies, define transformation rules, and attach descriptive tags and metadata. Tools should support semantics development by individual developers or collaborative teams. And semantics tools should integrate with Eclipse and other common development platforms, and support visual development in Unified Modeling Language (UML) and other modeling frameworks.
- Semantic engines: Application environments require runtime components to mediate interactions among semantic-aware components, and also to interface with legacy systems. Runtime semantic engines should support such functions as validating ontologies against standards; matching, mapping, transformation, correlation, and merging of data to conform with standard ontologies; and inference-based extraction of implicit ontologies from unstructured text sources. Semantic inference engines should support deterministic mapping across ontologies, as well as fuzzy equivalence-matching between extracted entity-relationship models and concepts specified in formal ontologies.
- Semantic repositories: Application environments require repositories or libraries to manage ontologies and other semantic objects, and also to maintain the rules, policies, service definitions, and other metadata to support life-cycle management of application semantics. Semantic repositories should support storage, synchronization, caching, access, import/export, registration, archiving, backup, and administration of ontologies and the data that instantiate those ontologies. The most prevalent semantic repositories are “RDF-triple store” databases.
- Semantic controls: Application environments require that various controls—on access, change, versioning, auditing, and so forth—be applied to ontologies (otherwise, it would be meaningless to refer to ontologies as “controlled vocabularies”). Controls might be enforced at the repository-, engine-, and/or tool levels. Developers might be constrained by the corporate-standard semantic tool to only use particular standard ontologies, which could vary depending on the type of application or project on which they’re working. To the extent that developers work in teams, the semantic-application development tool might provide a role-based workflow to structure interactions in accordance with best practice.
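The semantic mediation scenario described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ontology namespace, the field names of the two source applications, and the `to_canonical` helper are all hypothetical, and a real deployment would express the ontology in RDF/OWL and run the mappings inside a semantic engine rather than in plain dictionaries.

```python
# Hypothetical shared ontology: property URIs that both applications agree on.
ONT = "http://example.org/ontology/customer#"

# Hypothetical per-application mappings from local field names to ontology properties.
CRM_MAPPING = {"cust_name": ONT + "name", "email_addr": ONT + "email"}
ERP_MAPPING = {"CustomerName": ONT + "name", "Email": ONT + "email"}


def to_canonical(record: dict, mapping: dict) -> dict:
    """Re-key a source record by ontology property URI, dropping unmapped fields."""
    return {mapping[field]: value for field, value in record.items() if field in mapping}


# The same customer as two applications might describe it, with different schemas.
crm_record = {"cust_name": "Ada Lovelace", "email_addr": "ada@example.org", "crm_id": 17}
erp_record = {"CustomerName": "Ada Lovelace", "Email": "ada@example.org", "Plant": "X1"}

crm_view = to_canonical(crm_record, CRM_MAPPING)
erp_view = to_canonical(erp_record, ERP_MAPPING)

# Once both records are expressed in ontology terms, they can be compared directly.
print(crm_view == erp_view)
```

The design point is that each application is mapped once to the shared ontology, rather than pairwise to every other application's schema.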
In the marketplace, the Semantic Web community is spawning an expanding group of promising startups, as well as some tentative commitments by larger, established software vendors.
It’s no surprise that academic research institutions and open-source communities play a substantial role in catalyzing the development of the Semantic Web. Coordinating semantics projects are such communities as Advanced Knowledge Technologies, Digital Enterprise Research Institute, Gnowsis, Rx4RDF, and SemWebCentral.
As befits an embryonic market pushing a bleeding-edge technology, many Semantic Web vendors are in fact consultants pursuing ontology-based projects in ECM, EII, ESB, and other areas, and many of them are attempting to jumpstart a self-sustaining software business from a handful of consulting jobs. Many other semantics firms make their living primarily from consulting and other professional services engagements. These firms include Articulate Software, Business Semantics, EffectiveSoft, Mindful Data, Pragati Synergetic Research, Semantic Arts, Semantic Light, Taxonomy Strategies, and Zepheira.
As noted earlier, many software vendors are seeking the low-hanging commercial fruit of semantic search. The growing list of semantic search engine vendors includes Aduna, AskMeNow, ChaCha, Cognition Technologies, Copernic, Endeca, FAST Search and Transfer, Groxis, Hakia, Intelliseek, ISYS Search Software, Jarg, Metacarta, Ontosearch, Powerset, Readware, Semaview, Siderean, Syntactica, Textdigger, Vivisimo, and ZoomInfo. Most of these vendors rely heavily on NLP, pattern-matching, and text analytics to power the semantics-aware crawlers that they deploy to extract ontologies from unstructured text throughout the Web, intranets, and other content collections.
Just as important, Semantic Web pure-play vendors have come into their own. Dozens of vendors offer flexible, sophisticated solutions that can support a wide range of semantics-aware applications in addition to search. Pure-plays in this space include Access Innovations, Axontologic, Cycorp, Fourthcodex, DATA-GRID, Franz, LinkSpace, Metatomix, Modus Operandi, Mondeca, Ontology Works, Ontopia, Ontoprise, Ontos AG, Revelytix, Sandpiper Software, SchemaLogic, Semagix, Semandex Networks, Semansys, Semantic Insights, Semantic Research, Semantra, Semtation GmbH, Teragram, Thetus, TopQuadrant, Visual Knowledge, Wordmap, and XSB.
Semantic Web vendors vary widely in their functionality, development interfaces, deployment flexibility, and standards support. None of these vendors are staking their success on rapid, universal adoption of the full stack of Semantic Web standards. Instead, they all provide tools, platforms, and applications that can be deployed for tactical, point, quick-payoff IT projects. They address specific business needs with their solutions while enabling customers to integrate semantics solutions to varying degrees with their existing application and middleware infrastructures.
What follows are snapshots of a handful of these vendors, illustrating their diverse backgrounds, approaches, and business models:
- Cycorp: Headquartered in Austin TX, Cycorp develops turnkey solutions in artificial intelligence, knowledge representation, machine reasoning, NLP, semantic data integration, information management, and search. Its Cyc middleware combines an ontology (which has been placed in the public domain) with a knowledge base, inference engine, natural language interfaces, and semantic integration bus. The vendor offers a no-cost license to its semantic technologies development toolkit to the research community. In the Semantic Web arena, Cycorp is doing R&D into scenarios in which end users create lightweight local ontologies that are subsequently elaborated, enriched, and mapped to more formal global ontologies by semantic inference engines.
- Sandpiper Software: Headquartered in Los Altos CA, Sandpiper Software provides semantics tools, consulting, and training. Its Visual Ontology Modeler (VOM) 1.5 tool supports component-based ontology modeling through frame-based knowledge representation. VOM, an add-in to IBM Rational Rose, leverages UML to capture and represent knowledge unambiguously. VOM supports RDF/OWL-based modeling of domain, interface, process, and user ontologies. As a subscription service, Sandpiper also offers the Medius Ontology Library, which extends VOM’s bundled ontology libraries to include application-specific ontologies plus utility ontologies for national, international, and general metadata standards.
- SchemaLogic: Headquartered in Kirkland WA, SchemaLogic provides an SOA-based business semantics middleware suite, as well as semantics consulting and training services. The company’s SchemaLogic Enterprise Suite includes server components that gather, create, refine, reconcile, and distribute ontologies, taxonomies, tag libraries, and other semantic metadata to subscribing applications over a real-time pub-sub integration fabric. The suite includes a governance layer that supports collaborative, Web-based participation and feedback by users and subject matter experts in the creation and refinement of business semantics. Collaborative semantic governance may span organizational boundaries, with the resultant semantic artifacts capable of being propagated automatically to third-party search engines, content management applications, portals, and other systems. For example, customers can use SchemaLogic Enterprise Suite to synchronize content categories and descriptions across distributed deployments of Microsoft Office SharePoint Servers.
- TopQuadrant: Headquartered in Alexandria VA, TopQuadrant is a software vendor that provides an open Java-based platform for development of Semantic Web applications. The TopBraid Suite includes tools and components for building ontologies; developing inference rules and SPARQL-based queries; collaboratively creating and browsing RDF-enabled content; extracting semantics from various data sources via GRDDL and other interfaces; mediating between RDF/OWL and other formats; displaying rich model-driven user interfaces; configuring and orchestrating semantic inference operations; and storing ontologies in third-party RDF triple-store databases. The suite supports browser-based access, collaborative semantic governance, and ontology-based search.
- Modus Operandi: This vendor’s Wave Semantic Data Services Layer product integrates with BEA’s EII solution, AquaLogic Data Services Platform (ALDSP), via RDF/OWL ontologies. In so doing, it enables semantic integration of information across diverse, dispersed corporate applications, databases, and data warehouses. It supports user-driven ad-hoc semantic search and query, relying on ontologies to reconcile semantic conflicts among heterogeneous data. It also incorporates runtime services to crawl and index data services, to visualize the integrated data, and to monitor data services status. Modus Operandi’s ontology development tool can be launched from within BEA WebLogic Workshop, and can also import any standard OWL ontology developed in external tools. The tool deploys Wave semantic data services directly to ALDSP running on BEA’s WebLogic Server.
- Revelytix: This vendor’s MatchIT integrates with the semantic data services layer in Red Hat/MetaMatrix’s EII environment. MatchIT supports automated semantic mapping to help domain experts reconcile, map, and mediate semantics across heterogeneous environments via RDF/OWL ontologies. It provides an extensible ontology development tool that implements various sophisticated algorithms for determining semantic equivalence.
- Oracle: Released in July 2005, Oracle Spatial 10g Release 2 provides a data management platform for RDF-based applications, supporting new object types to manage RDF data in Oracle. RDF triples are persisted, indexed, and queried in a graph data model, much like other object-relational data types. The Oracle 10g RDF database lets application developers draw on the scalability of the Oracle database when deploying semantic-based enterprise applications. Metatomix, Ontoprise, and TopQuadrant have all announced support for Oracle Spatial 10g Release 2.
- IBM: Downloadable from the vendor’s AlphaWorks site, the IBM Integrated Ontology Development Toolkit supports storage, manipulation, query, and inference of ontologies and corresponding data instances. It includes an ontology definition metadata model, workbench, and repository. Its metamodel is a runtime semantics library that is derived from the OMG's Ontology Definition Metamodel (ODM) and implemented in Eclipse Modeling Framework (EMF). The Java-based workbench enables RDF/OWL ontology building, management, visualization, parsing, and serialization, plus transformation between RDF/OWL and other data-modeling languages. The repository, Minerva, is a high-performance DBMS optimized for OWL ontology storage, inference, and query, implementing a subset of SPARQL.
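Several of the products above persist RDF triples in a database and query them by pattern. The following Python sketch shows the general idea using the standard-library `sqlite3` module as a stand-in; it does not reproduce Oracle's, IBM's, or any other vendor's actual schema or API. The table layout, the example.org URIs, and the `match` helper are assumptions made for illustration.

```python
import sqlite3

# In-memory database standing in for a persistent triple store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL)")
# Index to speed up lookups by predicate and object.
conn.execute("CREATE INDEX idx_po ON triples (p, o)")

EX = "http://example.org/"  # hypothetical namespace
rows = [
    (EX + "article1", EX + "author", EX + "kobielus"),
    (EX + "article2", EX + "author", EX + "kobielus"),
    (EX + "article1", EX + "topic", EX + "SemanticWeb"),
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)


def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("s", s), ("p", p), ("o", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    query = "SELECT s, p, o FROM triples" + where + " ORDER BY s, p, o"
    return conn.execute(query, params).fetchall()


# Find every resource whose author is the given URI.
for triple in match(p=EX + "author", o=EX + "kobielus"):
    print(triple)
```

Matching a single pattern with wildcards is roughly the simplest case of what a SPARQL query does; dedicated triple stores add joins across patterns, inference, and indexing strategies well beyond this sketch.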
Even with all of this industry activity, the Semantic Web market is still far from mature. First off, RDF, OWL, and kindred W3C specifications have not exactly taken the SOA world by storm. One could not name a single pure-play vendor of Semantic Web technology that’s well-known to the average enterprise IT professional. And rare is the enterprise IT organization that’s looking for people with backgrounds in or familiarity with Semantic Web technologies. This remains a young, highly specialized niche in which academic research projects outnumber commercial products, and in which most products are point solutions rather than integrated features of enterprise databases, development tools, and application platforms. As noted above, no EII vendor has natively integrated Semantic Web specifications, and neither Oracle nor IBM has ventured much beyond their initial tentative forays into this new arena.
Commercial progress on the Semantic Web front has been glacial, at best, with no clear tipping point in sight. It’s been eight years since RDF was ratified by W3C, and more than three years since OWL spread its wings, but neither has achieved breakaway vendor or user adoption. To be fair, there has been a steady rise in the number of semantics projects and start-ups, as evidenced by growing participation in the annual Semantic Technology Conference, which was recently held in San Jose CA. And there has been a resurgence in industry attention to semantics issues, such as the recent announcement of a “Semantic SOA Consortium” involving Science Applications International Corporation (SAIC) and others. Some industry observers have even attempted to rebrand Semantic Web as “Web 3.0,” so as to create the impression that this is a new initiative and not an old effort straining to stay relevant.
Surprisingly, the SOA market sectors that one would expect to embrace the Semantic Web have largely kept their distance. In theory, vendors of search, ECM, EII, ESB, business intelligence (BI), database management systems (DBMS), master data management (MDM), and data quality (DQ) solutions would all benefit from the ability to automatically harmonize divergent ontologies across heterogeneous environments. But only a handful of vendors from these niches have taken a visible role in the Semantic Web community, and even these vendors seem to be taking a wait-and-see attitude to it all. One big reason for reluctance is that there are already many established tools and approaches for semantic interoperability in the SOA world, and the new W3C-developed approaches have not yet demonstrated any significant advantages in development productivity, flexibility, or cost.
One of the leading indicators of any technology’s commercial adoption is the extent to which Microsoft is on board. By that criterion, the Semantic Web has a long way to go, and may not get to first base until early in the next decade, at the earliest. The vendor’s ambitious roadmap for its SQL Server product includes no mention of the Semantic Web, ontologies, RDF, or anything to that effect. So far, the only mention of semantic interoperability in Microsoft’s strategy is in a new development project codenamed “Astoria.” Project “Astoria,” which was announced in May at Microsoft’s MIX conference, will support greater SOA-based semantic interoperability on the ADO.NET framework through a new Entity Data Model schema that implements RDF, XML, and URIs. However, Microsoft has not committed to integrating “Astoria” with SQL Server, nor is it planning to implement any of the W3C’s other Semantic Web specifications. Essentially, “Astoria” is Microsoft’s trial balloon to see if a Semantic Web-lite architecture lights any fires in the development community.
Clearly, there is persistent attention to semantic interoperability issues throughout the distributed computing industry. Microsoft is certainly not the only SOA vendor that is at least pondering these issues on a high architectural plane. Over the remainder of this decade, most major SOA, EII, DBMS, and BI vendors are going to make some strategic acquisitions in the Semantic Web community. Increasingly, leading enterprise platform, application, and tool vendors will integrate ontologies, inference engines, RDF-triple stores, and other semantics components and interfaces into their solutions.
But it may take another decade before the likes of IBM, Oracle, Microsoft, SAP, and other leading enterprise software vendors fully integrate semantics into all of their solutions. Until such time, we must continue to view the Semantic Web as an exciting but immature work in progress.
Original publication date: October 2007, Business Communications Review.
Author's note: I've posted this here because I'm tired of telling people about this great article I wrote on Semantic Web a couple of years ago for some now-defunct publication that practically nobody read. That wasn't very long ago and this still holds up quite well. Judge for yourself. Same principle applies with my poetry; it's more important to make your own audience than wonder why one never materializes. Better to self-publish than forever perish.
VENDORS MENTIONED IN THIS ARTICLE:
Access Innovations: http://www.accessinn.com/
Advanced Knowledge Technologies: http://www.aktors.org/akt/
Articulate Software: http://www.articulatesoftware.com/
Business Semantics: http://www.businesssemantics.com/
Cognition Technologies: http://www.cognition.com/
Digital Enterprise Research Institute: http://www.deri.ie/
FAST Search and Transfer: http://www.fastsearch.com/
ISYS Search Software: http://www.isys-search.com/
Mindful Data: http://www.mindfuldata.com/
Modus Operandi: http://www.modusoperandi.com/
Ontology Works: http://www.ontologyworks.com/
Ontos AG: http://www.ontos.com/de/company/index.php
Pragati Synergetic Research: http://www.pragati-inc.com/index.html
Sandpiper Software: http://www.sandsoft.com/
Semandex Networks: http://www.semandex.com/
Semantic Arts: http://www.semanticarts.com/
Semantic Insights: http://www.semanticinsights.com/
Semantic Light: http://www.semanticlight.com/
Semantic Research: http://www.semanticresearch.com/
Semtation GmbH: http://www.semtation.de/
Taxonomy Strategies: http://www.taxonomystrategies.com/
Taxonomy Warehouse: http://www.taxonomywarehouse.com/
Telcordia Technologies: http://www.telcordia.com/
Visual Knowledge: http://www.visualknowledge.com/index.html