Wednesday, June 06, 2007

imho Ocean Semantic.......


Semantic interoperability is the wrong term for this space—or, rather, it’s an OK term, but the word “interoperability” in this context implies that the semantic understandings are purely application-to-application. It implies that the semantic ontologies are meant to be purely machine-readable—hence, RDF/OWL and so forth.

Semantic Web is also the wrong term—or rather, it’s OK too, but the word “Web” implies that, like the World Wide Web, it applies primarily in application-to-person use cases, such as search engines.

But this latter term—Semantic Web—is the one on everybody’s lips, so we’ll have to make our peace with it. It wouldn’t do anybody ('cept jim) any good to rename the space just to suit one analyst. We’ll need to keep emphasizing that Semantic Web refers to all use cases of semantic interoperability: application-to-application, application-to-person, person-to-person, and so forth.

As I survey the vast ocean of Semantic Web-ish activity going on out there, it’s clear to me that only a subset of it is being addressed by the W3C’s Semantic Web activity. Viewed globally, the Semantic Web community divides into two loosely scoped “camps,” with each having its own focus:

  • SOA Semantic Web (i.e., W3C with RDF, OWL, SPARQL, GRDDL, etc., addressing app-to-app and app-to-person semantic interoperability in an SOA/Web 1.0 environment, where the vast majority of the activity addresses the need to surface and make transparent the originator-intended semantics of structured info expressed in XML and other standardized markup syntax):
    • Explicit semantic modeling (i.e., knowledge representation languages that provide app/data developers with formal grammars for expressing the entity-relationship graphs and hierarchies/taxonomies within which structured content is generated, transmitted, and consumed)
    • Controlled semantic vocabularies (i.e., source-domain-asserted ontologies, definitions, tags, metadata, schemas, glossaries, hierarchies, etc. under governance/stewardship of clearly defined domain authorities)
    • Deterministic semantic mediation (i.e., certain semantic correspondence among autonomous semantic domains’ ontologies, via well-defined mapping/transformations among different domains’ vocabularies, per agreed-upon standards, conventions, federation agreements, etc.)
  • Social Semantic Web (i.e., all the user-centric application-to-person and person-to-person “folksonomy” and social networking/bookmarking stuff in the Web 2.0 world, where most of the activity concerns the need to aggregate, classify/cluster, apply third-party tag-based contextualization to, and mine the latent meanings of various structured, unstructured, and media content objects originated by users themselves and/or third parties, such as media websites, other users, etc.):
    • Implicit semantic modeling (i.e., natural human languages that provide normal human beings with informal/colloquial grammars for expressing themselves within unstructured/semi-structured content objects; text mining/analytics then extracts and surfaces those humans’ implicit ontology of entities, classes, relationships, sentiments, etc.; hence semantic “mining” takes precedence over semantic “modeling”)
    • Uncontrolled semantic vocabularies (i.e., target-user-asserted keywords, tags, comments, scores, votes, evaluations, etc. that users apply to any self-originated or third-party-originated content, site, resource, entity, etc., without need for prior agreement or relationship with the third-party content originator, and without need for the meaning-asserting target user to implement any systematic governance/stewardship over the idiosyncratic “vocabularies” or “ontologies” they use to express the meaning, to them, of everything they encounter online; hence, user-idiosyncratic semantic “waywardship” takes precedence over authority-governed semantic “stewardship”)
    • Probabilistic semantic mediation (i.e., uncertain prima-facie semantic correspondence among diverse source-domain-asserted ontologies and user-asserted implicit “ontologies,” and among different user-asserted “ontologies” keyed on any given content/resource, hence the need for fuzzy matching, relevance ranking, inference engines, data/text mining, clustering/classification, and other automated techniques to establish greater confidence in semantic correspondence; and also the occasional need for human content analysis/judgment to deal with all the gray areas where it’s not clear if two or more users or documents or blogs or social networking sites are referring to the same or different things)
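
To make the contrast between deterministic and probabilistic mediation concrete, here's a rough Python sketch. Everything in it is hypothetical and only for illustration: the vocabulary terms (crm:Customer, billing:Account), the function names, and the 0.8 threshold are my own placeholders. Plain string similarity from the standard library's difflib is also just a stand-in for the heavier fuzzy matching, relevance ranking, and mining techniques listed above, not a claim about how any real tool does it.

```python
from difflib import SequenceMatcher

# Deterministic mediation: a mapping agreed on ahead of time between two
# domains' controlled vocabularies. These terms are made up for illustration.
AGREED_MAPPING = {
    "crm:Customer": "billing:Account",
    "crm:PostalCode": "billing:Zip",
}


def mediate_deterministic(term):
    """Return the agreed target term; raise KeyError if no mapping exists."""
    return AGREED_MAPPING[term]


def normalize(tag):
    """Lowercase a free-form tag and drop everything but letters and digits."""
    return "".join(ch for ch in tag.lower() if ch.isalnum())


def mediate_probabilistic(tag, candidates, threshold=0.8):
    """Return (candidate, score) pairs scoring at or above threshold, best first.

    The score is a character-level similarity ratio between 0 and 1. The
    threshold is an arbitrary assumption; anything below it is left for
    other techniques or for human judgment.
    """
    scored = [
        (c, SequenceMatcher(None, normalize(tag), normalize(c)).ratio())
        for c in candidates
    ]
    matches = [(c, s) for c, s in scored if s >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(mediate_deterministic("crm:Customer"))
    print(mediate_probabilistic("semantic-web", ["SemanticWeb", "semweb", "cooking"]))
```

The deterministic lookup either succeeds with certainty or fails outright. The probabilistic version only ever hands back ranked guesses with a confidence score, which is exactly why the gray areas still need a human in the loop.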

There’s a lot going on in the text mining/analytics space right now. That’s going to be a pivotal technology behind the Social Semantic Web. I don’t see much uptake of RDF/OWL—the heart of the SOA Semantic Web—in the social networking world. Not yet. Unless others see something I’m not seeing.

Why doesn't W3C take up defining standards for the Social Semantic Web? And what would those standards be?

More to come.