Tuesday, May 26, 2009

FORRESTER blog repost: Database Religions Dissolve into the Big Billowing Virtual Data Cloud

Database Religions Dissolve into the Big Billowing Virtual Data Cloud

By James Kobielus

Virtualization is a venerable old computing concept that has achieved new life in recent years.

Virtualization brings to life a new world of more flexible service provisioning while cleverly emulating the old world that is being replaced. Virtualization refers to any approach that abstracts the external interface from the internal implementation of some service, functionality, or other resource.

The promise of virtualization is that, no matter how scattered and diverse, all pooled resources behave as if they were a single unified resource, both for usage and administration. In a sense, this is the practical magic that Arthur C. Clarke identified with advanced technology. The external interface may conceal various facts about the implementations of the underlying resources. The virtualized resources may run on diverse operating and application platforms; have been deployed on nodes in diverse locations; have been aggregated across diverse hosting platforms (or partitioned within a single hosting platform, either through virtual machine software, separate CPUs, or separate blade servers); and have been provisioned dynamically in response to a client request.

When Noel Yuhanna and I presented on enterprise database virtualization last week at Forrester IT Forum, we took pains to point out that it is not a radically new paradigm. In fact, database administrators (DBAs) have been doing virtualization for a long time without realizing it. We’re all familiar with such database virtualization approaches as policy-based server clustering, massively parallel processing database grids, and enterprise information integration. In these environments, you can identify the virtualization layer as “single system image,” “semantic abstraction,” or some other approach.

What all these approaches share is that they make two or more repositories behave as if they were a single database for unified access, query, reporting, predictive analytics, and other applications. If you wish, I could drill down further into the layers of database virtualization--data virtualization, transaction virtualization, and platform virtualization--but that would be too much for a mere blogpost.
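To make the "single database" idea concrete, here is a minimal sketch in Python, assuming two in-memory SQLite repositories that happen to share one table layout. The `VirtualDatabase` class, the `make_repository` helper, and the `orders` table are hypothetical names invented for illustration; real clustering, MPP, and EII products do far more (query planning, pushdown, joins across sources).

```python
import sqlite3


class VirtualDatabase:
    """Hypothetical facade: presents several repositories as one queryable source."""

    def __init__(self, connections):
        self._connections = list(connections)

    def query(self, sql, params=()):
        # Fan the same query out to every underlying repository and
        # concatenate the rows, so the caller sees a single result set.
        rows = []
        for conn in self._connections:
            rows.extend(conn.execute(sql, params).fetchall())
        return rows


def make_repository(orders):
    # Illustrative helper: build one physical repository with an assumed schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    return conn


east = make_repository([(1, "east", 100.0), (2, "east", 50.0)])
west = make_repository([(3, "west", 75.0)])

# The consuming application talks to one object and never learns
# how many repositories sit behind it.
vdb = VirtualDatabase([east, west])
all_orders = vdb.query(
    "SELECT id, region, amount FROM orders WHERE amount >= ?", (60.0,)
)
```

The design choice worth noticing is that the caller's code would not change if a third repository were added behind the facade, which is the essence of the abstraction described above.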

One twist that I didn’t have time to explore in depth last week is the notion that the traditional hub-and-spoke enterprise data warehousing (EDW) architecture is itself a form of database virtualization. The hub-and-spoke model transforms analytic data to a common “spoke-side” semantic access model, such as star schema or columnar. As such, this approach abstracts from the data models (usually 3NF relational) implemented at the EDW hub tier, the staging tier (perhaps file-based), and OLTP sources (perhaps hierarchical, XML, or what have you).
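The abstraction at work here can be sketched roughly as follows, assuming one 3NF-style source and one hierarchical source that both feed the same consumer-facing record. Every name in this sketch (`SalesFact`, `from_normalized`, `from_hierarchical`, and the field names) is hypothetical and chosen only for illustration; it is not how any particular EDW product implements its access layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesFact:
    """Assumed common access model: one denormalized row per sale."""
    customer: str
    product: str
    amount: float


def from_normalized(orders, customers, products):
    # 3NF-style source: orders carry foreign keys that must be resolved
    # against separate customer and product tables.
    customer_names = {c["id"]: c["name"] for c in customers}
    product_names = {p["id"]: p["name"] for p in products}
    return [
        SalesFact(
            customer_names[o["customer_id"]],
            product_names[o["product_id"]],
            o["amount"],
        )
        for o in orders
    ]


def from_hierarchical(document):
    # Hierarchical source: purchases are nested under each customer.
    return [
        SalesFact(customer["name"], purchase["product"], purchase["amount"])
        for customer in document["customers"]
        for purchase in customer["purchases"]
    ]


relational_facts = from_normalized(
    orders=[{"customer_id": 1, "product_id": 10, "amount": 20.0}],
    customers=[{"id": 1, "name": "Acme"}],
    products=[{"id": 10, "name": "Widget"}],
)
hierarchical_facts = from_hierarchical(
    {
        "customers": [
            {"name": "Acme", "purchases": [{"product": "Widget", "amount": 20.0}]}
        ]
    }
)
```

Both calls yield the same `SalesFact` rows, so an analytic application written against that record has no need to know which physical data model sits underneath.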

When you realize that each data-persistence approach has its optimal deployment sphere, you’re thinking database virtualization. At that point, you start to realize that the various database religions--relational is supreme, columnar is king, and so forth--are not absolute truths. They’re simply sectarian texts in a tradition of longer vintage: the evolution of truly all-encompassing data virtualization clouds.

Yes, I’m using “cloud” in this context because it best describes this new paradigm. Cloud-based virtualization is beginning to seep into analytic infrastructures. To support flexible mixed-workload analytics, the EDW, over the coming five to 10 years, will evolve into a virtualized, cloud-based, and supremely scalable distributed platform.

What are the outlines of this new paradigm? The virtualized EDW will allow data to be transparently persisted in diverse physical and logical formats to an abstract, seamless grid of interconnected memory and disk resources and to be delivered with subsecond delay to consuming applications. EDW application service levels will be ensured through an end-to-end, policy-driven, latency-agile, distributed-caching and dynamic query-optimization memory grid, within an information-as-a-service (IaaS) environment. Analytic applications will migrate to the EDW platform and leverage its full parallel-processing, partitioning, scalability, and optimization functionality. At the same time, DBAs will need to make sure that cloud-based DW offerings meet their organizations’ most stringent security, performance, availability, and other service-level requirements.

I won’t opine here and now on how much enterprise data will be persisted in public clouds vs. private environments that incorporate many of the same platform virtualization technologies. I’ll save that discussion for the upcoming Forrester reports that Noel and I are developing on virtualization of transactional and analytic databases, respectively.

Expect those in Q3 or thereabouts. Thanks to everybody who attended our preso last week in Vegas!