Friday, January 18, 2008

personal My Forrester Research coverage areas--and linkages with others'--plus some new thinking on it all

All:

Hi. Yes, I plan to continue blogging to this, my personal pulpit, in addition to contributing to Forrester's blog. As you can well imagine, I've been busy as can be this month orienting to Forrester. In fact, I've been in Cambridge much of this month (including right now) getting orientated, acclimated, acculturated, caffeinated, invigorated, accelerated, and downright exhilarated to all things Forrester. Seriously great team I'm with.

For starters, you might have noticed that Boris Evelson and I co-blogged this week, under Forrester auspices, on the Oracle/BEA and Sun/MySQL acquisitions. You can read the post at http://blogs.forrester.com/information_management/business_intelligence/index.html. No doubt you've also read various things that Boris, I, and other Forrester analysts said to the many IT trade press reporters who called us all day Wednesday about those stories.

So I've been busy from the get-go at Forrester. In coming weeks, our customers will see my first published Forrester document: on the evolution of data warehousing (DW) appliances, and on best practices for evaluating, deploying, and managing them. Appliances in all their diversity have become the dominant industry approach for rolling out purpose- and performance-built solutions in support of online analytical processing, bulk data loading, and other core DW functions. Rest assured that I'll present a multilayered definition of this go-to-market approach that does justice to the range of vendor implementations. I'll also, in subsequent reports, delve deeper into such key enterprise requirements as real-time DW, master data management (MDM) applications in DW implementations, and convergence of structured, semi-structured, and unstructured data in DW environments. As you can well imagine, I'll collaborate closely with such esteemed analyst colleagues as Boris (our BI guru), Rob Karel (data integration, data quality, MDM), and Noel Yuhanna (DBMS).

I'm teamed most closely with Rob and Boris. From a corporate hierarchy viewpoint, we're all in Forrester's IT Client Group, under that group's Information and Knowledge Management (I&KM) orbit, and in that orbit's Data Domain, focusing primarily on structured data. Please note that we have several I&KM colleagues, including Kyle McNabb, Barry Murphy, Craig Le Clair, and Steve Powers, in the Enterprise Content Management domain, who look at unstructured and semi-structured info. And I shouldn't overlook other important analysts in I&KM, including those who cover collaboration (Erica Driver, Rob Koplowitz, Connie Moore, Colin Teubner, and Claire Schooley) and information access (Matt Brown, Ken Poore, and Leslie Owens).

Re Rob and Boris, it makes perfect sense for us to be in the same domain, because DI, DQ, DW, and BI are inextricably linked disciplines that are separated by semi-permeable membranes. Which reminds me: I've long used the following definitions to distinguish the conceptual "demarc" between them:
  • Business Intelligence (BI): BI includes all tools and runtime components necessary to provide actionable information, insight, analysis, and decision support to business users. Information may be retrieved into BI environments from one or more repositories, including diverse databases, data marts, data warehouses, operational data stores, document management systems, and online transaction processing systems.
  • Data Warehousing (DW): DW includes all tools and runtime components necessary to consolidate structured master data into subject-oriented, integrated, non-volatile, time-variant repositories under unified business governance. DW environments consolidate master data from source data stores through various DI approaches and govern its controlled distribution to various operational data stores, data marts, access databases, and BI environments.
  • Data Integration (DI): DI includes all the tools and runtime components needed to retrieve, extract, and move data from origin repositories; validate and transform the data; and deliver it to target databases, data warehouses, data marts, and applications. The DI marketplace also includes data quality (DQ) solutions for profiling source data; matching, merging, and cleansing that data; and augmenting it with additional, related data.
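
To make the DI definition a bit more concrete, here is a minimal Python sketch of the stages it names: extract from an origin repository, validate and transform, apply a DQ match-and-merge step, and deliver to a target. Everything in it is an illustrative assumption, not any vendor's product or API: the `Customer` record, the function names, the "match on e-mail" rule, and the use of plain lists and dicts as stand-ins for source and target stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical target-schema record used only for this sketch."""
    customer_id: int
    name: str
    email: str


def extract(source_rows):
    """Retrieve raw records from an origin repository (here, a list of dicts)."""
    return list(source_rows)


def validate(rows):
    """Keep only rows that have an id and a plausible e-mail address."""
    return [r for r in rows
            if r.get("id") is not None and "@" in (r.get("email") or "")]


def transform(rows):
    """Normalize names and e-mail addresses into the target schema."""
    return [Customer(int(r["id"]),
                     r["name"].strip().title(),
                     r["email"].strip().lower())
            for r in rows]


def cleanse(customers):
    """DQ step: match records on e-mail and merge duplicates, keeping the first seen."""
    merged = {}
    for c in customers:
        merged.setdefault(c.email, c)
    return list(merged.values())


def deliver(customers, target):
    """Load records into the target store (here, a dict keyed by customer id)."""
    for c in customers:
        target[c.customer_id] = c
    return target


def run_pipeline(source_rows, target):
    """Chain the stages: extract, validate, transform, cleanse, deliver."""
    return deliver(cleanse(transform(validate(extract(source_rows)))), target)
```

Real DI and DQ tools do far more (profiling, fuzzy matching, augmentation with related data), so treat this as the shape of the flow rather than a recipe.
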
Which also reminds me: the membrane around my coverage area also includes a few technologies that I've traditionally included under BI (but Boris assures me that he'd prefer I continue to deepen my established research focus on them). Here they are, plus my still-wordy scoping definitions (yes, I'm working on elevator pitches... these ones technically qualify if I plan to use them in the Burj Dubai):
  • Predictive Analytics/Data Mining: Predictive analytics uses statistics-powered data mining and interactive visualization that enable forecasting and assessment of the likelihood of future events and trends, in order to support forward-looking decision making with regard to strategic planning, corporate development, sales and marketing, product development, pricing and packaging, customer service, and other critical matters.
  • Complex Event Processing: CEP uses low-latency middleware and interactive visualization that enable continuous monitoring, aggregation, correlation, filtering, and presentation of diverse external and/or internal events surfaced from operational applications, business process management systems, databases, and other sources, in order to support flexible, real-time business response and proactive coordination.
To a great degree, predictive analytics/data mining depends on DWs--i.e., hub-and-spoke DW environments in which there are data marts to support access, storage, scoring, loading, cleansing, and other life-cycle functions on structured analytical data sets. However, CEP would seem, in many real-world deployments, to do without DWs altogether--and require a more distributed, federated, real-time, low-latency, end-to-end event-stream processing middleware fabric.
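
For a feel of what "continuous monitoring, aggregation, correlation, filtering" can look like without a DW in the loop, here is a minimal Python sketch of one common CEP pattern: a sliding time window that fires when enough matching events arrive close together. The class name, the alert dictionary, and the threshold rule are assumptions made up for illustration, and the sketch assumes events arrive with non-decreasing timestamps. Nothing is persisted; each event is evaluated in memory as it arrives.

```python
from collections import deque


class SlidingWindowDetector:
    """Alert when `threshold` matching events fall within `window_seconds`.

    Hypothetical helper for illustration; assumes non-decreasing timestamps.
    """

    def __init__(self, predicate, threshold, window_seconds):
        self.predicate = predicate
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def on_event(self, timestamp, event):
        # Filtering: ignore events the predicate does not select.
        if not self.predicate(event):
            return None
        self._timestamps.append(timestamp)
        # Windowing: evict matches that are older than the window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        # Aggregation and correlation: count the matches still in the window.
        if len(self._timestamps) >= self.threshold:
            return {"alert": "threshold_reached",
                    "count": len(self._timestamps),
                    "at": timestamp}
        return None
```

A caller would feed it a stream, for example `detector.on_event(ts, {"type": "failed_login"})`, and react whenever the return value is not `None`.
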

Essentially, a traditional DW operates in store-and-forward mode, introducing latency into the delivery of data to BI environments. Most of today’s DWs have been optimized for specific latency-producing operations: extraction, transformation, and loading (ETL) of data from operational DBMSs; retention of that data in persistent repositories; and downstream retrieval of that stored data into reports, graphical dashboards, multidimensional online analytical processing (OLAP) cubes, and other BI outputs.

DWs can be re-architected to support real-time BI. In fact, most DW vendors have already begun to address these requirements in their products. At heart, doing so requires that DWs be reconfigured to also serve as real-time application-layer data “routers” (in a broad, unconventional sense of that term). For example, Teradata’s “active DW” approach adds support for near-real-time ETL and data delivery. Just as important, the vendor has added the policy-driven event detection, processing, and notification features needed to manage the flow of real-time events between data sources and consumers, as brokered through the DW. It has also built into its DW environment the availability, scalability, performance optimization, and dynamic workload management features needed to monitor, sustain, and guarantee minimal latency on data throughput out to BI applications.
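
The "router" idea can be sketched in a few lines of Python: a store that still persists every row it loads, but also checks each row against registered policies and notifies consumers right away. To be clear, this is a toy under my own assumptions. The `ActiveWarehouse` class, its `add_policy` and `load` methods, and the callback signature are hypothetical names, and none of it reflects Teradata's or any other vendor's actual interfaces.

```python
class ActiveWarehouse:
    """Toy stand-in for a DW that brokers events as it loads data (hypothetical)."""

    def __init__(self):
        self.rows = []      # stand-in for the persistent repository
        self.policies = []  # list of (name, predicate, callback) tuples

    def add_policy(self, name, predicate, callback):
        """Register an event-detection policy and the consumer to notify."""
        self.policies.append((name, predicate, callback))

    def load(self, row):
        """Persist the row, then notify consumers whose policy matches it."""
        self.rows.append(row)
        fired = []
        for name, predicate, callback in self.policies:
            if predicate(row):
                callback(name, row)  # notification happens at load time
                fired.append(name)
        return fired
```

Note that every notification passes through the one `load` path, which is exactly the hub-and-spoke shape that can turn into a bottleneck at scale.
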

Though organizations are beginning to use active DWs for real-time BI, no one is seriously considering deploying them as general-purpose, application-layer routers. Mainly this is because DWs are usually deployed in hub-and-spoke configurations and thus potentially can become significant bottlenecks. Some in the industry have proposed DW federation to alleviate the potential bottleneck, but most federation scenarios are still fundamentally hub-and-spoke in their reliance on common ETL tools, metadata repositories, and data staging areas.

All of which sounds like fodder for future Forrester papers. I'll keep you posted as I firm up my research calendar over the next several weeks.

Thanks, by the way, for your kind e-mails etc. Nice to know people value my expertise. I won't disappoint.

Jim

Tuesday, January 01, 2008

personal James Kobielus joins Forrester Research

All:

I have joined Forrester Research as senior analyst for data warehousing. You can reach me there at jkobielus@forrester.com or 703-340-8134.

Jim