Thursday, June 26, 2008

OLAP's cube crumbling around the edges


Business intelligence (BI) is essentially a set of best practices for building models to answer business questions. However, today’s BI best practices may be suboptimal for many enterprises’ decision-support requirements.

For most users, BI is a journey that’s been modeled and mapped out in advance by others, following a well-marked path through vast data sets. Data models, which must often be pre-built by specialists, generate or shape the design of such key BI artifacts as queries, reports, and dashboards. Essentially, every BI application is some data modeler’s prediction of the types of questions that users will want to ask of the underlying data marts. Sometimes, those predictions are little more than an educated guess--and are not always on the mark.

BI’s most ubiquitous data-modeling approach is the online analytical processing (OLAP) data structure known as a “cube.” The OLAP cube--essentially a denormalized relational structure--sits at the heart of most BI data marts. OLAP cubes, usually implemented as multidimensional “star” or “snowflake” schemas, allow large recordsets to be quickly and efficiently summarized, sorted, queried, and analyzed. However, no matter how well designed the dimensional data models within any particular cube may be, users eventually outgrow those constraints and demand the ability to drill down, up, and across tabular recordsets in ways not built into the underlying data structures.
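
To make the cube concrete, here is a minimal sketch (Python with an in-memory SQLite database; every table, column, and value is invented for illustration) of the pre-joined structure a star schema implies: a central fact table keyed to dimension tables, with summaries produced by joining and grouping along those fixed dimensions.

    import sqlite3

    # Hypothetical star schema: one central fact table keyed to two dimension tables.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, quarter  TEXT);
        CREATE TABLE fact_sales  (product_id INTEGER, date_id INTEGER, amount REAL);
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "Widgets"), (2, "Gadgets")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(10, "Q1"), (11, "Q2")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 10, 100.0), (1, 11, 250.0), (2, 10, 75.0)])

    # A typical cube-style roll-up: summarize the facts along the two modeled dimensions.
    for row in conn.execute("""
        SELECT p.category, d.quarter, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_date    d ON d.date_id    = f.date_id
        GROUP BY p.category, d.quarter
    """):
        print(row)

Roll-ups along category and quarter are fast and simple here; a roll-up along any dimension that was never modeled--sales region, say--is impossible until the schema, and everything that loads it, is extended.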

The chief disadvantage of multidimensional OLAP cubes is their inflexibility. Cubes are built by pre-joining relational data tables into fixed, subject-specific structures. One way of getting around these constraints is the approach known as relational OLAP, which retains the underlying normalized relational storage while speeding multidimensional query access through “projections.” However, relational OLAP also suffers from the need for explicit, upfront modeling of relationships within and among the underlying tabular data structures.
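
One rough way to picture a relational-OLAP projection is as a summary computed up front over only the dimensions the modeler chose to expose. The sketch below (plain Python over invented data) shows both the benefit and the constraint of that upfront choice.

    # Hypothetical pre-aggregated "projection": totals keyed only by the modeled
    # dimensions (category, quarter), built once, up front, by a load job.
    raw_rows = [
        {"category": "Widgets", "quarter": "Q1", "region": "EMEA", "amount": 100.0},
        {"category": "Widgets", "quarter": "Q2", "region": "APAC", "amount": 250.0},
        {"category": "Gadgets", "quarter": "Q1", "region": "EMEA", "amount": 75.0},
    ]

    projection = {}
    for r in raw_rows:
        key = (r["category"], r["quarter"])      # the only dimensions modeled
        projection[key] = projection.get(key, 0.0) + r["amount"]

    print(projection[("Widgets", "Q1")])         # fast: the answer is precomputed
    # A question along an unmodeled dimension ("total sales by region") cannot be
    # answered from the projection at all; it requires going back to the raw rows
    # and rebuilding the summary with region included.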

From end users’ point of view, all of this is mere plumbing--invisible and boring--until it prevents them from obtaining the new query tools, structured reports, and dashboards they need to do their jobs. One unfortunate consequence of OLAP cubes’ inflexibility is that requests for new BI applications inevitably wind up in a backlog of IT projects that can take weeks or months to deliver. What might seem a trivial change to the end user--such as adding a new field or calculation to an existing report--can represent a time-consuming technical exercise for the data-modeling professional. Behind the scenes, this simple decision-support request might require--beyond the front-end BI tweaks--remodeling of the data mart’s OLAP star schema, re-indexing of the data warehouse, revision of extract, transform, and load (ETL) scripts, and retrieval of data from different transactional applications.
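
As a purely illustrative sketch (all object names are hypothetical), the fan-out from such a “trivial” request might look something like this:

    # Illustrative only: the trail of artifacts that a "simple" request -- add a
    # sales-region field to an existing report -- can leave behind the scenes.
    change_set = [
        # remodel the data mart's star schema
        "ALTER TABLE dim_store ADD COLUMN region TEXT;",
        # re-index the warehouse so the new attribute is usable in queries
        "CREATE INDEX ix_dim_store_region ON dim_store(region);",
        # revise the ETL extract to pull the attribute from the transactional source
        "SELECT store_id, store_name, sales_region FROM erp_source.stores;",
        # rebuild any pre-aggregated summaries that should now include the field
        "DROP TABLE agg_sales_by_store_qtr;",
    ]
    for statement in change_set:
        print(statement)  # in practice, each step is scheduled, tested, and released by IT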

No one expects the OLAP cube to vanish completely from the BI landscape, but its role in many decision-support environments has been declining over the past several years. Increasingly, vendors are emphasizing new approaches that, when examined in a broader context, appear to be loosening OLAP’s grip on mainstream BI and data warehousing. The emerging paradigm for ad-hoc, flexible, multidimensional, user-driven decision support includes the following important approaches:


  • Automated discovery and normalization of dispersed, heterogeneous data sets through a pervasive metadata layer
  • Semantic virtualization middleware, which supports on-demand, logically integrated viewing and query of data from heterogeneous, distributed data sources without need for a data warehouse or any other centralized persistence node
  • On-the-fly report, query, and dashboard creation, which relies on dynamic aggregation of data, organization of that data within relevant hierarchies, and presentation of metrics that have been customized to the user or session context
  • Interactive data visualization tools, which enable user-driven exploration of the full native dimensionality of heterogeneous data sets, thereby eliminating the need for manual modeling and transformation of data to a common schema
  • Guided analytics tools, which support user-driven, ad-hoc creation of sharable, extensible packages of data, visualization, and navigation models for customizable decision-support scenarios
  • Inverted indexing storage engines, which support more flexible, on-the-fly assembly of structured data in response to ad-hoc queries than is possible with traditional row-based or column-based data warehousing persistence layers (see the sketch after this list)
  • Distributed in-memory processing, which enables continuous delivery of intelligence extracted in real time from millions of rows of data originating in myriad, distributed data sources
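
To illustrate the inverted-indexing item above, here is a minimal sketch in plain Python (the records and field names are invented): each attribute-value pair points back to the records that contain it, so an ad-hoc query becomes an intersection of posting lists rather than a walk through a pre-joined cube.

    from collections import defaultdict

    # Minimal inverted index over structured records: map every (field, value)
    # pair to the set of record ids that contain it.
    records = {
        1: {"category": "Widgets", "quarter": "Q1", "region": "EMEA"},
        2: {"category": "Widgets", "quarter": "Q2", "region": "APAC"},
        3: {"category": "Gadgets", "quarter": "Q1", "region": "EMEA"},
    }

    index = defaultdict(set)
    for rec_id, attrs in records.items():
        for field, value in attrs.items():
            index[(field, value)].add(rec_id)

    def query(**criteria):
        """Return the ids of records matching every (field, value) criterion."""
        postings = [index[(field, value)] for field, value in criteria.items()]
        return set.intersection(*postings) if postings else set()

    print(query(category="Widgets", region="EMEA"))   # -> {1}
    print(query(quarter="Q1"))                        # -> {1, 3}

Because the index is keyed by attribute-value pairs rather than by a fixed set of pre-joined dimensions, any combination of criteria can be assembled at query time, which is the flexibility this bullet describes; production engines are far more sophisticated, but the principle is the same.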

Unfortunately, this new decision-support paradigm has no pithy name or coherent best practices. If we were to call it the “post-OLAP” paradigm, that would give the false impression that OLAP cubes are obsolete, when in fact they are simply being virtualized and embedded within a more flexible Web 2.0 and SOA framework. We could call this the new “hypercube” paradigm, but that might give the mathematical purists among us a case of indigestion.

Whatever we choose to call this new era, look around you. It has already arrived. We can see this trend in the growing adoption of these constituent approaches in production BI environments. However, to date, few enterprises have combined these post-OLAP approaches into a coherent BI architectural framework.

But that day is rapidly coming to mainstream BI and data warehousing environments everywhere. OLAP’s hard-and-fast, cube-based approach is slowly but surely dissolving in this new era of more flexible, user-centric decision support.