Wednesday, June 27, 2007

imho Ocean Semantic………….


Semantics is just a fancy word for understanding what things truly mean.

Semantic Web is the “identity of things” taken to its logical extreme. I keep coming back to that thought. In this discipline, the core question is: Do two (or more) phenomenologically distinct content instances refer, ontologically, to the same "thing" (aka "subject" or "entity")? And Semantic Web's core grammar is, of course, RDF, which is built on the notion that we can define meaningful ontological statements as consisting of discrete “subjects,” “predicates,” and “objects,” and that each of those "parts of speech" (my term) is itself a thing that can be given its own unique identity, designated with a URI, within an RDF triple. Every source or target content thing/subject can have its own identity/URI, as can every attribute/predicate-value of that thing/subject. In the process of determining semantic equivalence between two phenomenologically distinct semantic content instances (i.e., things such as customer records from separate applications or databases), our inference engines resolve them to a single thing defined under a common, shared ontology (defined in RDF/OWL). In other words, resolve (match, merge, reconcile) distinct things down to a single unique name—same semantics means, memetically, the same meaningful things, heteronymously, are tamed to assume the same names—Plus ça change, plus c'est la même chose.
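To make the triple talk concrete, here's a toy sketch in Python: plain tuples standing in for a real RDF store, with invented example URIs. The only real vocabulary term is owl:sameAs, the standard predicate for asserting that two URIs name the same thing.

```python
# A minimal sketch of RDF-style entity resolution, using plain Python tuples
# as (subject, predicate, object) triples. The crm-a/crm-b URIs and the
# schema.example predicates are invented for illustration; owl:sameAs is real.

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two phenomenologically distinct customer records, each with its own URI.
triples = [
    ("http://crm-a.example/customer/42", "http://schema.example/name",  "Ada Lovelace"),
    ("http://crm-b.example/cust/A-17",   "http://schema.example/email", "ada@example.com"),
    # An inference engine (or a human steward) has asserted they are the same thing:
    ("http://crm-a.example/customer/42", OWL_SAME_AS, "http://crm-b.example/cust/A-17"),
]

def canonicalize(triples):
    """Resolve sameAs-linked subjects down to a single canonical URI (union-find)."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    def union(a, b):
        parent[find(b)] = find(a)
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(s, o)
    # Rewrite every non-sameAs triple onto the canonical subject.
    return {(find(s), p, o) for s, p, o in triples if p != OWL_SAME_AS}

merged = canonicalize(triples)
subjects = {s for s, _, _ in merged}
print(subjects)  # one canonical URI: both records now assume the same name
```

A real inference engine (or a triple store with owl:sameAs reasoning enabled) does essentially this at scale, plus the hard part: deciding when the sameAs assertion is warranted in the first place.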

That’s the Semantic Web (it’s also the core function of the data quality space—in which, near as I can tell, the only vendor doing semantic web at this moment is Silver Creek Systems). Now, here’s something I wrote in this blog on February 10, 2005, in a different context (referencing ID Dataweb architectures built on XRI—a URI-based identification scheme):

“[W]hat the heck does the ‘identity of things’ refer to? On one level, it sounds like some metaphysical plane of existence, some mythical spirit world, some platonic ideal, like the ‘secret life of plants’ or the ‘lifestyles of the rich and famous.’ Like animism: the identities/souls of the inanimate starstuff from which we’re all, magically, composed….

broad scope of the term, in terms of concrete, real-world, commercial technical approaches, such as IP addressing, RFID, and ID dataweb. …..

That’s one of the big problems with the ‘identity of things.’ There are just too many ‘things’ in the universe. Try giving every star in the sky its own unique name, including the billions upon billions embedded in galaxies, and don’t forget to give each of the countless galaxies their own unique names. After identifying every discrete point of light uniquely, now try storing and managing all those names (plus the associated descriptive attributes of each star) in some master directory database in the sky. Clearly, the directory itself would have sufficiently massive gravitation to form its own black hole, sucking all of the named ‘objects’ in the universe down into some freaky meta-universe, never to be heard from again. ….

ID dataweb—aka federated resource sharing environments built on emerging Web services standards, especially Extensible Resource Identifier (XRI) and XRI Data Interchange (XDI). …

ID dataweb (actually, there are many synonyms for this emerging space—I’m partial to ‘federated resource sharing’) is an approach under which every data element in every database can conceivably be given a unique, fine-grained identifier—thanks to XRI, which is backward-compatible with the URI/URN naming scheme that has achieved ubiquity on the Web……the World Wide Web was built on the ‘identity of things’ (aka pages, scripts, etc.), leveraging URI, DNS, and IP. …

ID dataweb is an environment within which autonomous data domains can choose to selectively grant fine-grained data-access rights to external parties—and unilaterally rescind those rights. It leverages the identity federation and trust infrastructure being implemented everywhere through open standards such as WS-Security, SAML, Liberty Alliance, and others. It’s a standards-based flexible way of securely setting up and managing as-needed data-integration connections between autonomous organizations. Such as manufacturers, suppliers, distributors, and other participants in a supply chain. Or financial services firms engaging in dynamic partnering on equities underwritings. And so forth. Data integration/exchange/transfer is one of the principal tasks in any B2B collaborative-commerce partnering…..

Here’s an issue that the ID dataweb community must grapple with: As organizations expose/share/protect more of their fine-grained data resources through XRI/XDI, how are they going to manage the massive databases underlying the humongous ‘directories of things’ that result."
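For flavor, here's what minting fine-grained identities of things might look like. The URI scheme below is invented for illustration (it is not XRI syntax), and the closing arithmetic shows why the resulting "directory of things" gets scary fast.

```python
# A sketch of giving every data element its own unique, fine-grained name.
# The data.example.org authority and the /id/ path layout are made up.
from urllib.parse import quote

def mint_uri(authority, database, table, row_key, column):
    """Give one cell of one row of one table its own dereferenceable name."""
    path = "/".join(quote(str(part), safe="") for part in (database, table, row_key, column))
    return f"https://{authority}/id/{path}"

uri = mint_uri("data.example.org", "crm", "customers", "42", "email")
print(uri)  # https://data.example.org/id/crm/customers/42/email

# The directory-in-the-sky problem: even one modest enterprise estate explodes.
rows, columns, tables, databases = 10_000_000, 40, 200, 25
print(rows * columns * tables * databases)  # 2 trillion cell-level names to manage
```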

Maybe we should call it “thing-centric identity,” to adapt a phrase from my just-previous multi-month multi-post meditation. Is an RDF triple store the nucleus of that "directory of things"? How big will triple stores need to grow to encompass the universe of semantic things? At some point, will these stores grow so large as to mash it all gravitationally, resolve it all ontologically, down into a semantically massive and mighty thingularity?

More to come.


Tuesday, June 26, 2007

imho Ocean Semantic............



meaning is modeled
mined and mashed, coming somehow,
emerging among.


More to come.


Sunday, June 24, 2007

imho Ocean Semantic………..


Now, it’s clear from my research that the Semantic Web community cannot be neatly split into two camps. If you look at the work ongoing at those “pure plays” in my previous post, much of it spans both the “SOA Semantic Web” and “Social Semantic Web.”

In other words, today’s Semantic Web vendor-community ferment combines bits of the former approach (explicit modeling, controlled vocabularies, deterministic mediation) with elements of the latter (implicit modeling, uncontrolled vocabularies, probabilistic mediation). Or, if you will, the leading edge blends the onto-taxo school of top-down meaning modeling (i.e., taxonomies, RDF, OWL, etc.) with the linguo-extracto strain of bottom-up meaning mining (i.e., NLP, text analytics, etc.).

At a coarse level, my semantics maturity model is mapping well to the varied architectures/approaches/tooling of these dozens of providers, but I’m not trying to force-fit anything to that framework. Rather, I’m trying to understand everything on its own terms, and carefully, painstakingly work my way toward a grand re-synthesis. (I'm still working through the practical distinctions between "semantic integration" and "semantic quality" components—where do "inference engines" fall?)

Sorting through the field of Semantic Web “pure plays” from my previous list, I realize now that it was a bit too inclusive. In that quick/handy list, I lumped semantics academic research programs (e.g., Advanced Knowledge Technologies, Digital Enterprise Research Institute), semantics open source communities (e.g., Liminal Systems, SemWebCentral), semantics consulting shops (e.g., Articulate Software, Business Semantics, Mindful Data, Pragati Synergetic Research, Semantic Arts, Semantic Light, Taxonomy Strategies, Zepheira), and a semantics broker (Taxonomy Warehouse). Some of the names on that list left me scratching my head, wondering whether they actually exist or what the heck they actually do (e.g., MetaWeb, Ontologent, Ontomantics, Semaview, VivoMind Intelligence). Many vendors are not semantics specialists but, rather, incorporate semantics features into AI tools, content/document management, identity/security management solutions, graphics, desktop productivity applications, or other products (e.g., Crystal Semantics, ExpertMaker, Franz, Garlik, LinkSpace, Semtation, WordMap). And, of course, there are the many semantic search vendors (e.g., Aduna, AskMeNow, Cha-Cha, Cognition Technologies, Conversa, Copernic, Endeca, FAST Search and Transfer, Google, Groxis, Hakia, Intelliseek, ISYS Search Software, Metacarta, Ontosearch, Powerset, Readware, Textdigger, Vivisimo, ZoomInfo).

I’m not doubting that they all have unique and innovative approaches and so forth, but I’m looking for pure-play solution vendors that provide semantics tools/platforms to support a broad range of applications. With that as my criterion, I’ve boiled my short list down to a still-unwieldy twenty-three: Axontologic, Cycorp, Fourthcodex, Gnowsis, Metatomix, Modus Operandi, Mondeca, Ontology Works, Ontopia, Ontoprise, Ontos AG, Revelytix, Sandpiper Software, Semagix, Semandex Networks, Semansys, Semantic Insights, Semantic Research, Semantra, Siderean, Thetus, TopQuadrant, and XSB.

One sweet little payoff for me so far in my research is seeing the early footprint of Semantic Web technology in the enterprise information integration (EII)—aka data federation—space. If you go back to the second post in this ongoing thread (i.e., the one in which there are a mere two dots after “imho Ocean Semantic”), you’ll see that this meandering inquiry began with a few burning (rhetorical and non-rhetorical) questions:

  • How are [SOA-enabled EII solutions’ semantic-abstraction layers from such vendors as IBM, BEA, Business Objects, Informatica, Sybase, Actuate, Composite Software, Ipedo, Inetsoft, and MetaMatrix] not the Semantic Web?
  • Do any of these commercial solutions depend on any of the core specs (i.e., RDF, OWL) usually associated with the W3C's flavor of Semantic Web?
  • Does Red Hat's decision to acquire MetaMatrix, open-source its EII technology, and bundle it with the JBoss Enterprise Middleware Suite represent a critical step toward making SOA-enabled EII (i.e., semantic Web) ubiquitous?

Funny you should ask, Jim. As it turns out, check out Revelytix, which has provided a Semantic Web layer for Red Hat/MetaMatrix’s EII environment, to wit:

  • “[Revelytix] MatchIT, a component of the MetaMatrix Semantic Data Services product, provides automated semantic mapping technology to aid domain experts in more quickly reconciling the semantics across a dispersed information environment. MatchIT, an extensible ontology-driven tool using RDF and OWL, implements a variety of sophisticated algorithms for determining semantic equivalence. It leverages the Semantic Data Services defined within the MetaMatrix designer to aid in more rapid deployment of a mediation solution by automatically exposing potential semantic matches.”
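MatchIT's actual algorithms aren't disclosed in that blurb, but the general pattern (scoring candidate semantic matches between two schemas so a domain expert can confirm the likely ones) can be sketched with nothing fancier than lexical similarity. All field names below are invented.

```python
# Nothing here is MatchIT's actual algorithm: just a minimal sketch of
# automated semantic-match scoring between a source and target schema.
from difflib import SequenceMatcher

source = ["cust_name", "cust_email", "acct_balance"]
target = ["CustomerName", "EmailAddress", "Balance"]

def normalize(field):
    return field.lower().replace("_", "")

def candidate_matches(source, target, threshold=0.6):
    """Rank every source/target pair by lexical similarity; keep likely matches."""
    pairs = []
    for s in source:
        for t in target:
            score = SequenceMatcher(None, normalize(s), normalize(t)).ratio()
            if score >= threshold:
                pairs.append((s, t, round(score, 2)))
    return sorted(pairs, key=lambda p: -p[2])

for s, t, score in candidate_matches(source, target):
    print(f"{s} ~ {t} ({score})")
```

Note that cust_email/EmailAddress falls below the 0.6 cutoff here: exactly the kind of gray area where the "aid domain experts" part of the pitch earns its keep. Real products layer in synonym dictionaries, datatype analysis, and instance-data profiling on top of anything string-based.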

Also, check out Modus Operandi, whose solution does something similar for BEA’s EII solution (AquaLogic Data Services Platform)—viz:

  • “[Modus Operandi] Wave Semantic Data Services Layer gives discoverable meaning to data by linking data services to an ontology. Wave enhances the BEA AquaLogic Data Services Platform with the tools to semantically integrate information across the enterprise. A Wave data services layer in a SOA supports flexible, user-driven ad hoc queries and semantic search….Wave makes use of an ontology (or conceptual model) to unify and resolve semantic conflicts among data sources. At runtime, the Wave web service provides the data service layer’s API (Application Programming Interface) for discovering, querying, and searching the integrated information via the ontology. Wave also includes runtime services to crawl and index data services, to visualize the integrated data, and to monitor data services status….Launched from a BEA WebLogic Workshop menu, the Wave Importer transforms an OWL file to data service templates that map directly to classes and properties found in the ontology. You can use any ontology development environment that produces a standard OWL output. Wave semantic data services are activated by deploying to the WebLogic Server.”
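The Wave Importer's internals aren't public, but the pattern it describes (walk an ontology's classes and properties, emit one data-service template per class) can be sketched like so. The toy dictionary stands in for a parsed OWL file, and the template shape is invented.

```python
# A sketch of the ontology-to-data-service-template pattern. The class and
# property names, and the template/query shapes, are all made up.

ontology = {
    "Customer": ["name", "email", "accountBalance"],
    "Order":    ["orderDate", "total", "placedBy"],
}

def to_service_templates(ontology):
    """Map each ontology class to a query-style data-service template."""
    templates = {}
    for cls, props in ontology.items():
        templates[f"get{cls}"] = {
            "returns": cls,
            "selects": list(props),
            "query": f"SELECT {', '.join(props)} FROM {cls} WHERE id = :id",
        }
    return templates

for name, tpl in to_service_templates(ontology).items():
    print(name, "->", tpl["query"])
```

The point of the pattern: the ontology, not the physical schema, becomes the contract that the data-service layer exposes for discovery and query.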

Interestingly, none of the EII vendors provides this Semantic Web capability itself yet. All rely on third parties to provide it through add-ons.

All of which underlines my point about the Semantic Web space being several years from maturity. And all of which indicates that a lot of major EII vendors are going to make some strategic acquisitions in the Semantic Web community before long. This technology must be integrated into enterprises’ basic data services platforms before the Semantic Web can be truly ubiquitous.

More to come.


Saturday, June 16, 2007

imho Ocean Semantic..........


Here's a quick hit of my evolving taxonomy of the SOA Semantic Web market, based on ongoing navigation of the ocean semantic.

First off, it's clear to me that the primary use cases for the SOA Semantic Web so far--in terms of vendor activity and enterprise deployment--are in:
  • (semantic) search
  • (semantic) text mining/analytics
  • (semantic) content and knowledge management
  • (semantic) enterprise information integration
I put the parentheses around (semantic) for a reason: this (i.e., ontologies, RDF, OWL, etc.) is a new approach that established vendors in those segments are pursuing, plus a growing range of pure-plays. Semantic-oriented search is the hottest of all the segments, judging by the number of startups and others pitching product/service right now...here's a quick list: Aduna, AskMeNow, Cha-Cha, Cognition Technologies, Conversa, Copernic, Endeca, FAST Search and Transfer, Google, Groxis, Hakia, Intelliseek, ISYS Search Software, Metacarta, Ontosearch, Powerset, Readware, Textdigger, Vivisimo, and ZoomInfo.

Well, not so quick a list...but mind-blowing, considering how many of these companies weren't around this time last year....or may be defunct by this time next year....or swept up in furious M&A activity...or huge and rich beyond belief....or still waiting for their ship to come in.

But even more mind-blowing is the current, active list of SOA Semantic Web pure-plays that are more than point solutions for one or more semantic use cases (e.g., search)...but are providers of the ontology modeling tools, ontology inference servers, ontology repositories, and other underlying semanto/onto/taxo componentry for a broad swath of use cases--to wit:

Of course, they're all quite different from each other in their strategies, competencies, solution portfolios, partnerships, target markets, funding, prospects for success, etc. In terms of how their solutions map to my SOA Semantic Web maturity model....well....the mapping is going on as I speak (perhaps even as I sleep...can't get this topic out of my head...not even for a night...this endless notion keeps on sloshing from ear to shining ear).

More to come.


Friday, June 15, 2007

imho Ocean Semantic.........


I've been putting together an SOA Semantic Web maturity model. I'm trying to create a reference framework that can help me sort through the confusion, complexity, and diversity of solutions/components/tools in this market.

In developing the framework, I've been working from a basic principle: SOA. In other words, SOA refers fundamentally to a paradigm that focuses on maximizing the reuse, sharing, and standards-based interoperability of key resources over distributed environments. In an SOA context, then, we can conceive of semantics (of data, services, apps, business processes, etc.) as perhaps the most important resource that must be shared. Hence the "SOA Semantic Web."

I already have a recently developed SOA framework that gets me 90 percent of the way there. It's the master data management (MDM) maturity model that is the conceptual backbone of my MDM market coverage for Current Analysis. You can see how I use that maturity model to compare/contrast MDM vendors' solution sets (e.g., IBM, Oracle, Teradata, TIBCO, SAS/DataFlux, etc.) if you subscribe to my Data Management module (hey....I told you I make a living somehow...this is an explicit plug for my bread-and-butter). That MDM maturity model includes an explicit notion of "governance" of this resource (i.e., master data) within a "domain" according to a "domain model." I find these notions essential to understanding how a vocabulary (i.e., ontology) is controlled within a Semantic Web environment.

To some degree, if we use the word "semantic" in place of "data" in the maturity model (and make a variety of other conceptual tweaks to keep it real), we have a useful SOA Semantic Web maturity model. To wit:


• Semantic Integration: These consist of all tools, runtime components, and services needed to retrieve, extract, and move semantic objects (i.e., data and metadata) from origin repositories; parse and validate the semantic objects; mediate and infer (deterministic and/or probabilistic) mappings among them; transform the semantic objects; and deliver them to target repositories, applications, services, users, and other consumers.

• Semantic Quality: These consist of all tools, runtime components, and services needed to discover and profile source semantic objects; validate, mediate, de-duplicate, match, merge, and cleanse those objects; and enhance, enrich, and augment them with additional, related objects.

• Semantic Repositories: These consist of all tools, runtime components, and services needed to organize, index, store, query, and administer structured semantic objects; consolidate structured semantic objects into subject/topic-oriented, integrated, non-volatile, and time-variant repositories under unified governance; and govern their controlled distribution to various target repositories, applications, services, users, and other consumers.

• Semantic Domain Models: These consist of all prebuilt master semantic governance objects (metadata, schemas, ontologies, glossaries, and vocabularies), plus semantic governance infrastructure that a semantic domain authority uses to administer the semantics of a particular process, platform, or other solution domain (e.g., MDM, data warehousing, enterprise content management, enterprise information integration, enterprise service bus, business intelligence) of a horizontal, vertical, B2B, organization-specific, regional, or other deployment scenario.

• Semantic Modeling and Mapping: These consist of all tools necessary to create business and technical definitions of master semantic domain models; discover, author, design, develop, index, query, visualize, browse, modify, version-control, access-control, import/export, and/or cross-reference one or more semantically distinct master data sets; and define and manage hierarchies, mappings and transformations among master semantic objects.

• Semantic Governance: This encompasses all repositories (metadata, ontology, policy etc.); collaboration environments (workflow, task management, exception handling, event-driven alerting, calendar-driven reminders, priority escalation etc.); controls (authentication, authorization, mapping/translation, version, validation, monitoring, auditing etc.); and other tools, components and services necessary to define, approve and administer domain models—including rules governing semantic integration, quality, and repositories--upon which semantic interoperability environments depend. This is sometimes known as a “semantic stewardship” environment.
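To make the layering concrete, here's a toy sketch of how the first three components might chain at runtime, with each stage shrunk to a caricature function and all records invented.

```python
# A caricature of the maturity model's runtime stack: integration pools
# semantic objects, quality cleanses and de-duplicates them, the repository
# indexes them for controlled distribution. Records and rules are invented.

def semantic_integration(sources):
    """Retrieve and pool semantic objects from origin repositories."""
    return [record for source in sources for record in source]

def semantic_quality(records):
    """Validate, de-duplicate, and cleanse the pooled objects."""
    seen, clean = set(), []
    for r in records:
        key = r["email"].strip().lower()      # match/merge on a cleansed key
        if key and key not in seen:
            seen.add(key)
            clean.append({**r, "email": key})
    return clean

def semantic_repository(records):
    """Index the governed objects for controlled distribution."""
    return {r["email"]: r for r in records}

crm = [{"name": "Ada", "email": "Ada@example.com "}]
erp = [{"name": "Ada Lovelace", "email": "ada@example.com"},
       {"name": "Bob", "email": "bob@example.com"}]

store = semantic_repository(semantic_quality(semantic_integration([crm, erp])))
print(sorted(store))  # ['ada@example.com', 'bob@example.com']
```

Governance and domain models are conspicuously absent from this caricature; in a real deployment they would supply the match/merge rules that the quality stage hard-codes here.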


Yeah...that seems about right....handy....I'm just about to launch into the Semantic Web vendor/product/market survey segment of my research for my upcoming BCR feature article on the topic.

I sorta feel I have a decent enough map of this jungle (though no map marks the quicksand that no doubt is everywhere).

More to come.


Monday, June 11, 2007

imho Ocean Semantic........


I'm not sure if the questions I posed at the end of the previous post (or next one, depending on whether you're reading present to past down from the top, or up, in chronological order, from the bottom) are provocative, or simply stupid: Why doesn't W3C take up defining standards for the Social Semantic Web? And what would those standards be?

Maybe "stupid" is too harsh on my precious self--perhaps "overreaching" is a better word. Why fence the young frontier of the Social Semantic Web by calling for standards prematurely? And why even give this phenomenon a special name all its own, implying that it somehow deserves consideration equal to the OWL-ish Semantic Web stuff being hammered out at W3C? Is there anything truly new going on in all this “folksonomy” and “Web 2.0” stuff that deserves to be considered under the Semantic Web big top?

Universal standards are, of course, the foundation of this thing called the World Wide Web and the SOA universe from which, apparently, the Semantic Web is bursting forth. You’ll recall that I characterized the W3C Semantic Web as the SOA Semantic Web, due to its reliance on the SOA standards (especially the nouveau XML-based Web services standards), while noting that the W3C specs implement some core principles: explicit semantic modeling, controlled semantic vocabularies, and deterministic semantic mediation.

At that point in my analysis, it was a straightforward exercise to point out that some semantics-oriented efforts come down on the opposite ends of each spectrum: implicit semantic modeling, uncontrolled semantic vocabularies, and probabilistic semantic mediation. All of which seems to characterize the chaotic colloquial collaborative linguistic social semantic space we all inhabit on the World Wide Web. Hence, the “Social Semantic Web.”

But doesn’t the very notion of standards call for everything that the Social Semantic Web is not: explicit models of meaning, control over official vocabularies in which meanings are expressed, and clearcut mappings among divergent formulations that express the same underlying meaning? How can standards nail down anything that is inherently implicit, uncontrolled, probabilistic, piggly-wiggly, loosey-goosey…..?

So maybe the notion of standards in this space isn’t feasible. And maybe the notion that we’re actually talking about a new “space” is a tad off the mark. Why give it a new name to imply that something radically new is going on, when--it occurred to me—the whole “Social Semantic Web” is just the good ol’ World Wide Web chugging away at what it’s been doing since the start.

Essentially, the foundation principle of the World Wide Web—and “Web 2.0”--is: any entity can link to, recontextualize, and render commentary on any aggregation of content originated by any other entity anywhere.

That’s what hypertext environments such as the Web are all about. That’s the foundation of HTML, HTTP, URIs, etc. (the most critical standards for the “Social Semantic Web”).

That’s what Web sites and portals do.

Search engines too (human- and/or bot-indexed, based on informal or formal rules that prioritize/classify/contextualize all crawlable content with something resembling meaning, relevance, etc.).

Blogs too (on occasion….my “fyi” posts include the link to the kontent on which I’m ostensibly kommenting….my “imho” posts are just me shooting from the hip).

Blogrolls (e.g., hey, if you’ve got nothing better to do browse to these 124 blogs written by people I may have never met or even looked at their posts but they have some general affinity with me hence bolster my claim to being plugged into some cool virtual community that absolutely rules in some virtual sense).

Wikis too (usually….they can also be one entity implicitly commenting on another by totally obliterating that other’s last comment).

Social bookmarking sites for sure (e.g., “digg” these 872 external webpages I like and my sketchy comments and flurry of vague tags explaining why I think they’re individually or collectively worthy of your perusal).

Social networking arenas of all shapes and sizes are cross-commentary cliques par excellence, thick with mutual, sometimes antagonistic, contextualization. Isn't that what a flame war is all about at heart? The nasty side of the Social Semantic Web--the tooth-and-nail fight for heads, hearts, souls, and curly hairs.

Maybe it’s a tad pretentious to refer to this to-and-fro mishmash of chaotic cross-commentary as a “Social Semantic Web,” which implies that something resembling coherent meaning is emerging from the bubbling brew (sometimes it feels more like a Semantic Warp, where you’re more confused coming out than you were going in). To the extent that the Social Semantic Web can precipitate anything of value from the warp, it’s up to each user to navigate the mess, filter the firehose, extract what they find interesting, and synthesize some coherent point to it all. Maybe they’ll lean on their data/text/content mining tools to aggregate, filter, categorize, classify, and render it all for them in pretty pictures that make sense of it all. Or maybe they’ll call for analysts or other smart people who have a knack for standing above the cloud and seeing patterns that others are still having trouble bringing into focus.

Analysts, synthesists, smart people….the pivotal social intermediaries….stitching together the meanings explicit or implicit in any knowledge domain…the key connectors in the human web or any other environment in which individuals must somehow collectively navigate an ocean thick with their own semantic plankton....the world wide warp.

More to come.


Wednesday, June 06, 2007

imho Ocean Semantic.......


Semantic interoperability is the wrong term for this space—or, rather, it’s an OK term, but the word “interoperability” in this context implies that the semantic understandings are purely application-to-application. It implies that the semantic ontologies are meant to be purely machine-readable—hence, RDF/OWL and so forth.

Semantic Web is also the wrong term—or rather, it’s OK too, but the word “Web” implies that, like the World Wide Web, it applies primarily in application-to-person use cases, such as search engines.

But this latter term—Semantic Web—is the one on everybody’s lips, so we’ll have to make our peace with it. It wouldn’t do anybody ('cept jim) any good to rename the space just to suit one analyst. We’ll need to keep emphasizing that Semantic Web refers to all use cases of semantic interoperability: application-to-application, application-to-person, person-to-person, and so forth.

As I survey the vast ocean of Semantic Web-ish activity going on out there, it’s clear to me that only a subset of it is being addressed by the W3C’s Semantic Web activity. Viewed globally, the Semantic Web community divides into two loosely scoped “camps,” with each having its own focus:

  • SOA Semantic Web (i.e., W3C with RDF, OWL, SPARQL, GRDDL, etc., addressing app-to-app and app-to-person semantic interoperability in an SOA/Web 1.0 environment, where the vast majority of the activity addresses the need to surface and make transparent the originator-intended semantics of structured info expressed in XML and other standardized markup syntax):
    • Explicit semantic modeling (i.e., knowledge representation languages that provide app/data developers with formal grammars for expressing the entity-relationship graphs and hierarchies/taxonomies within which structured content is generated, transmitted, and consumed)
    • Controlled semantic vocabularies (i.e., source-domain-asserted ontologies, definitions, tags, metadata, schemas, glossaries, hierarchies, etc. under governance/stewardship of clearly defined domain authorities)
    • Deterministic semantic mediation (i.e., certain semantic correspondence among autonomous semantic domains’ ontologies, via well-defined mapping/transformations among different domains’ vocabularies, per agreed-upon standards, conventions, federation agreements, etc.)
  • Social Semantic Web (i.e., all the user-centric application-to-person and person-to-person “folksonomy” and social networking/bookmarking stuff in the Web 2.0 world, where most of the activity concerns the need to aggregate, classify/cluster, apply third-party tag-based contextualization to, and mine the latent meanings of various structured, unstructured, and media content objects originated by users themselves and/or third-parties, such as media websites, other users, etc.):
    • Implicit semantic modeling (i.e., natural human languages that provide normal human beings with informal/colloquial grammars for expressing themselves within unstructured/semi-structured content objects, from which, through text mining/analytics, those humans’ implicit ontology of entities, classes, relationships, sentiments, etc. are extracted, surfaced, etc.; hence semantic “mining” takes precedence over semantic “modeling”)
    • Uncontrolled semantic vocabularies (i.e., target-user-asserted keywords, tags, comments, scores, votes, evaluations, etc. that they apply to any self-originated or third-party-originated content, site, resource, entity, etc., without need for prior agreement or relationship with third-party content originator, and without need for the meaning-asserting target-user to implement any systematic governance/stewardship over the idiosyncratic “vocabularies” or “ontologies” they use to express the meaning, to them, of everything they encounter online; hence, user-idiosyncratic semantic “waywardship” takes precedence over authority-governed semantic “stewardship”)
    • Probabilistic semantic mediation (i.e., uncertain prima-facie semantic correspondence among diverse source-domain-asserted ontologies and user-asserted implicit “ontologies,” and among different user-asserted “ontologies” keyed on any given content/resource, hence the need for fuzzy matching, relevance ranking, inference engines, data/text mining, clustering/classification, and other automated techniques to establish greater confidence in semantic correspondence; and also the occasional need for human content analysis/judgment to deal with all the gray areas where it’s not clear if two or more users or documents or blogs or social networking sites are referring to the same or different things)
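To wit, a toy sketch of that last bullet: probabilistic mediation of uncontrolled, user-asserted tags against a source domain's controlled vocabulary, via fuzzy string matching (vocabulary and tags invented).

```python
# A toy sketch of probabilistic semantic mediation: reconciling wayward,
# user-asserted folksonomy tags against a controlled vocabulary by fuzzy
# matching rather than exact lookup. Vocabulary and tags are invented.
from difflib import get_close_matches

controlled_vocabulary = ["semantic-web", "ontology", "folksonomy", "rdf"]

user_tags = ["SemanticWeb", "ontologies", "folksonomies", "OWL", "rdf!"]

def mediate(tag, vocabulary, cutoff=0.6):
    """Return the best vocabulary match for a wayward tag, or None."""
    normalized = "".join(ch for ch in tag.lower() if ch.isalnum() or ch == "-")
    hits = get_close_matches(normalized, vocabulary, n=1, cutoff=cutoff)
    return hits[0] if hits else None   # None = a gray area needing human judgment

for tag in user_tags:
    print(tag, "->", mediate(tag, controlled_vocabulary))
```

The None result for OWL is the gray-area case that bullet ends on: no confident correspondence, so it falls to human content analysis/judgment (or to a richer ontology than string similarity can see).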

There’s a lot going on in the text mining/analytics space right now. That’s going to be a pivotal technology behind the Social Semantic Web. I don’t see much uptake of RDF/OWL—the heart of the SOA Semantic Web—in the social networking world. Not yet. Unless others see something I’m not seeing.

Why doesn't W3C take up defining standards for the Social Semantic Web? And what would those standards be?

More to come.


Saturday, June 02, 2007

imho Ocean Semantic……


Ubiquitous semantic interoperability is like world peace. It’s a goal so grandiose, nebulous, and contrary to the fractious realities of distributed networking that it hardly seems worth waiting for. In most circumstances, we can usually assume that heterogeneous applications will employ different schemas to define semantically equivalent entities—such as customer data records—and that some sweat equity will be needed to define cross-domain data mappings for full interoperability.

Nevertheless, many smart people feel that automated, end-to-end, standards-based semantic interoperability is more than a pipe dream. Most notably, the World Wide Web Consortium’s long-running Semantic Web initiative just keeps chugging away, developing specifications that have fleshed out Tim Berners-Lee’s vision to a modest degree and gained a smidgen of real-world adoption. If nothing else, the W3C can point to the Resource Description Framework (RDF)—the first and most fundamental output from this W3C activity—as a solid accomplishment. Created just before the turn of the millennium, RDF—plus the closely related Web Ontology Language (OWL)--provides an XML/URI-based grammar for representing diverse entities and their multifaceted relationships.

However, RDF, OWL, and kindred W3C specifications have not exactly taken the service-oriented architecture (SOA) world by storm. In fact, you’d be hard-pressed to name a single pure-play vendor of Semantic Web technology that’s well-known to the average enterprise IT professional. And rare is the enterprise IT organization that’s looking for people with backgrounds in or familiarity with Semantic Web technologies. This remains an immature, highly specialized niche in which academic research projects far outnumber commercial products, and in which most products are point solutions rather than integrated features of enterprise databases, development tools, and application platforms.

Part of the problem is that, from the very start, the W3C’s Semantic Web initiative has been more utopian than practical in focus. If you tune into Berners-Lee’s vision, it seems to refer to some sort of supermagical metadata, description, and policy layer that will deliver universal interoperability by making every networked resource automatically and perpetually self-describing on every conceivable level. Alternatively, it seems to call for some sort of XML-based tagging vocabulary that everybody will apply to every scrap of online content, thereby facilitating more powerful metadata discovery, indexing, and search. The success of the whole Semantic Web project seems to be predicated on the belief that these nouveau standards will be adopted universally in the very near future.

Needless to say, this future’s been slow to arrive. Commercial progress on the Semantic Web front has been glacial, at best, with no clear tipping point in sight. It’s been eight years since RDF was ratified by the W3C, and more than three years since OWL spread its wings, but neither has achieved breakaway vendor or user adoption. To be fair, there has been a steady rise in the number of semantics projects and start-ups, as evidenced by growing participation in the annual Semantic Technology Conference, which was recently held in San Jose, CA. And there has been a resurgence in industry attention to semantics issues, such as the recent announcement of a “Semantic SOA Consortium.” Some have even attempted, lamely, to rebrand the Semantic Web as “Web 3.0,” so as to create the impression that this is a new initiative and not an old effort straining to stay relevant.

But the SOA market sectors that one would expect to embrace the Semantic Web have largely kept their distance. In theory, vendors of search, enterprise content management, enterprise information integration, enterprise service bus, business intelligence, relational database, master data management, and data quality products would all benefit from the ability to automatically harmonize divergent ontologies across heterogeneous environments. But only a handful of vendors from these niches—most notably, Oracle, Software AG, and Composite Software—have taken a visible role in the Semantic Web community, and even these vendors seem to be taking a wait-and-see attitude to it all. One big reason for reluctance is that there are already many established tools and approaches for semantic interoperability in the SOA world, and the new W3C-developed approaches have not yet demonstrated any significant advantages in development productivity, flexibility, or cost.

One of the leading indicators of any technology’s commercial adoption is the extent to which Microsoft is on board. By that criterion, the Semantic Web has a long way to go, and may not get to first base until early in the next decade, at the earliest. The vendor’s ambitious roadmap for its SQL Server product includes no mention of the Semantic Web, ontologies, RDF, or anything to that effect. So far, the only mention of semantic interoperability in Microsoft’s strategy is in a new development project codenamed “Astoria.” Project “Astoria,” which was announced in May at Microsoft’s MIX conference, will support greater SOA-based semantic interoperability on the ADO.Net framework through a new Entity Data Model schema that implements RDF, XML, and URIs. However, Microsoft has not committed to integrating “Astoria” with SQL Server, nor is it planning to implement any of the W3C’s other Semantic Web specifications. Essentially, “Astoria” is Microsoft’s trial balloon to see if a Semantic Web-lite architecture lights any fires in the development community.

Clearly, there is persistent attention to semantic interoperability issues throughout the distributed computing industry, and Microsoft is certainly not the only SOA vendor that is at least pondering these issues on a high architectural plane. The W3C’s Semantic Web initiative may indeed be the seedbed of a new semantics-enabling SOA, though it may take a lot longer for this dream to be fully realized. It may take another generation or so before we see anything resembling a universal semantic backplane that spans all SOA platforms.

After all, the utopian hypertext visions articulated by Vannevar Bush in the 1940s and Ted Nelson in the 1960s had to wait until the 1990s, when Tim Berners-Lee nudged something called the World Wide Web into existence.

More to come.