Sunday, November 22, 2009

Forrester blog repost

Instrumenting Your Enterprise for Maximum Predictive Power

By James Kobielus

Business is all about placing bets and knowing if the odds are in your favor.

As I noted in my most recent Forrester report, business success depends on your company being able to visualize likely futures and take appropriate actions as soon as possible. You must be able to predict future scenarios well enough to prepare plans and deploy resources so that you can seize opportunities, neutralize threats, and mitigate risks.

Clearly, predictive analytics can play a pivotal role in the day-to-day operation of your business. It can help you focus strategy and continually tweak plans based on actual performance and likely future scenarios. And, as I noted in a recent Forrester blog post, the technology can sit at the core of your service-oriented architecture (SOA) strategy as you embed predictive logic deeply into data warehouses, business process management platforms, complex event processing streams, and operational applications.

The grand promise of predictive analytics—still largely unrealized in most companies—is that it will become ubiquitous, guiding all decisions, transactions, and applications. For the technology to rise to that challenge, organizations must move toward a comprehensive advanced analytics strategy that integrates data mining, content analytics, and in-database analytics. Already, we’ve sketched out a vision of “Service-Oriented Analytics,” under which you break down silos among data mining and content analytics initiatives and leverage these pooled resources across all business processes.

You may agree that this is the right vision but doubt whether there is a practical, incremental roadmap for taking your company in that direction. In fact, there is, and it starts with reassessing the core of most companies’ predictive analytics capability: the data mining tools. As you plan your predictive analytics initiatives, avoid the traditional approach of focusing on tactical, bottom-up, project-specific requirements. Also resist shoehorning your requirements into the limited feature set of whatever modeling tool you currently happen to use.

To become a fully predictive enterprise, you will need to take both a top-down and a bottom-up approach to your data mining initiatives. From the top down, it’s all about building and integrating alternate models of how your business environment is likely to evolve internally and externally. In our recent report on advanced analytics, Boris Evelson, Leslie Owens, and I sketched out the many business processes that can be enriched by predictive analytics.

So how do you instrument your company to become more predictive? For starters, assess whether your analytics tools support the following capabilities for developing, validating, and deploying predictive models:

  • Model multiple business scenarios: You should be able to build complex models of multiple, linked business scenarios across different business, process, and subject-area domains, using such key features as strategy maps, ensemble modeling, and champion-challenger modeling.
  • Incorporate multiple information types into models: You should be able to develop models against multiple information types, including unstructured content and real-time event streams, while leveraging state-of-the-art algorithms in sentiment analysis and social network analysis.
  • Leverage multiple statistical algorithms and approaches in models: You should be able to develop models using the widest, most sophisticated range of statistical and mathematical algorithms and approaches, including regression, constraint-based optimization, neural networks, genetic algorithms, and support vector machines.
  • Apply multiple metrics of model quality and fitness: You should be able to score and validate model quality using multiple metrics and approaches, including quality scores, lift charts, goodness-of-fit charts, comparative model evaluation, and automatic best-model selection.
  • Employ multiple variable discovery and assessment approaches: You should be able to build and validate models using various approaches for variable discovery, profiling, and selection, including decision trees, feature selection, clustering, association rules, affinity analysis, and outlier analysis.

How is this different from predictive analytics as usual? Traditionally, most predictive modeling specialists focus on the latter three capabilities: statistical algorithms and approaches, model quality and fitness, and variable discovery and assessment. Most models are built in narrowly scoped business or subject domains, such as customer analytics for marketing campaign management, and only against structured data sources (such as relational tables). Few predictive analytics projects have entailed modeling of multiple business scenarios across diverse domains such as sales, marketing, customer service, manufacturing, and supply chain, though in the real world these business processes are often tightly interconnected. Likewise, many data mining initiatives fail to incorporate information from unstructured sources, such as text in call-center logs, though this content may be as important as what comes from relational databases and other structured sources.
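Folding unstructured sources into a model can start very simply: derive a numeric feature from the text and join it to the structured record. A minimal sketch, in which the call-center notes and the dissatisfaction word list are entirely hypothetical:

```python
# Sketch: turn unstructured call-center text into a numeric model feature.
# The notes and the sentiment word list are hypothetical stand-ins for
# real content sources and a real sentiment-analysis step.
NEGATIVE = {"cancel", "angry", "refund", "broken", "complaint"}

def complaint_score(note):
    """Fraction of words in a call note that signal dissatisfaction."""
    words = note.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,!?") in NEGATIVE for w in words) / len(words)

notes = {
    101: "Customer angry about broken device, wants refund",
    102: "Routine billing question, resolved quickly",
}
# These per-customer scores would be joined to the structured customer
# record as one more predictor alongside the relational attributes.
features = {cust: complaint_score(text) for cust, text in notes.items()}
print(features)
```

Real sentiment analysis is far richer than a word list, but even this crude feature illustrates how text that would otherwise be ignored becomes model input.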

It’s very important to build multi-scenario predictive models against complex information sets, but becoming a fully predictive enterprise demands much more. To instrument your organization for maximum predictive power, you should also tool your advanced analytics to support the following capabilities:

  • DW-integrated data preparation: To speed up and standardize the most time-consuming predictive modeling project tasks, you should be able to leverage your existing data warehouse, extract, transform, and load (ETL), data quality, and metadata tools to support a full range of data preparation features. These features include the ability to discover, acquire, capture, profile, sample, collect, collate, aggregate, deduplicate, transform, correct, augment, and load analytical data sets.
  • Deep application and middleware integration: To deliver models deeply into whatever heterogeneous SOA-enabled platform you happen to use, your predictive analytics tool should deploy on and/or integrate with a wide range of enterprise applications, middleware, operating platforms, and hardware substrates. You should be able to deploy models seamlessly into your data warehouse, business intelligence, online analytical processing, data integration, complex event processing, data quality, master data management, and business process management environments. And to play well in your SOA, your predictive modeling tools should support application programming interfaces, languages, tools, and approaches such as Web services, Java, C++, and Visual Studio, as well as emerging languages such as SQL-MapReduce and R.
  • Consistent cross-domain model governance: To avoid fostering an unmanageable glut of models, your predictive analytics solution should provide a wide range of tools, features, and interfaces that support life-cycle governance of models created in diverse tools. At the very least, your tools should enable model check-in/check-out, change tracking, version control, and collaborative development and validation of models. To realize this promise, your solution should support a full range of tools, standards, and interfaces for importing and embedding models from other tools, as well as exporting and sharing models with other environments.
  • Flexible model deployment: To execute modeling functions such as data preparation, regression, and scoring on the widest range of data warehouses and other platforms, your tools should support in-database or embedded analytics. And to scale to the max, your predictive analytics tools should deploy models to massively parallel data warehouses, software-as-a-service environments, and cloud computing fabrics. Your advanced analytics tools should also support development of application logic in open frameworks, such as MapReduce and Hadoop, to enable convergence of data mining and content analytics in the cloud.
  • Rich interactive visualization: To deliver their precious payload, actionable intelligence, your advanced analytics tools should support interactive visualization of models, data, and results. Ideally, you should be able to visualize all of this in your preferred business intelligence tool, or in the predictive modeling vendor’s integrated visualization layer. Of course, you have every right to expect the full range of visualization techniques (histograms, box plots, heat maps, and so on) regardless of who provides the visualization layer.
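To make the in-database deployment idea concrete: rather than extracting rows to an external scoring engine, a trained model can be rendered as a SQL expression and executed where the data lives. The sketch below uses SQLite as a stand-in for the warehouse; the customers table and the model coefficients are hypothetical:

```python
# In-database scoring sketch: render a fitted model's coefficients as SQL
# so the database scores rows in place. SQLite stands in for the warehouse;
# the table, columns, and coefficients here are all hypothetical.
import sqlite3

# Coefficients from a model trained offline in a modeling tool.
intercept = -1.5
coefs = {"recency_days": -0.02, "purchases_12m": 0.30}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, recency_days REAL, purchases_12m REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, 10.0, 8.0), (2, 200.0, 1.0), (3, 30.0, 5.0)])

# Build the scoring expression from the model itself, not by hand per query,
# so a retrained model redeploys by regenerating one string.
linear = " + ".join(f"({w} * {col})" for col, w in coefs.items())
score_sql = (f"SELECT id, ({intercept} + {linear}) AS score "
             f"FROM customers ORDER BY score DESC")
rows = conn.execute(score_sql).fetchall()
for cust_id, score in rows:
    print(cust_id, round(score, 3))
```

A production warehouse would use its own in-database analytics functions or a standard model-exchange format rather than string-built SQL, but the principle is the same: the scoring computation moves to the data, not the reverse.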

As you can see, this goes well beyond data mining as usual. Forrester has a slightly different perspective on the development of the predictive analytics market than you’re likely to get from other sources. We see robust, flexible, SOA-enabled data mining tools as the centerpiece of advanced analytics for fully predictive enterprises. The competitive stakes are too great for businesses to take the traditional silo-mired approach when implementing this mission-critical technology.

What do you think?