Sunday, November 22, 2009
By James Kobielus
Business is all about placing bets and knowing if the odds are in your favor.
As I noted in my most recent Forrester report, business success depends on your company being able to visualize likely futures and take appropriate actions as soon as possible. You must be able to predict future scenarios well enough to prepare plans and deploy resources so that you can seize opportunities, neutralize threats, and mitigate risks.
Clearly, predictive analytics can play a pivotal role in the day-to-day operation of your business. It can help you focus strategy and continually tweak plans based on actual performance and likely future scenarios. And, as I noted in a recent Forrester blog post, the technology can sit at the core of your service-oriented architecture (SOA) strategy as you embed predictive logic deeply into data warehouses, business process management platforms, complex event processing streams, and operational applications.
The grand promise of predictive analytics—still largely unrealized in most companies—is that it will become ubiquitous, guiding all decisions, transactions, and applications. For the technology to rise to that challenge, organizations must move toward a comprehensive advanced analytics strategy that integrates data mining, content analytics, and in-database analytics. Already, we’ve sketched out a vision of “Service-Oriented Analytics,” under which you break down silos among data mining and content analytics initiatives and leverage these pooled resources across all business processes.
You may agree that this is the right vision but have doubts about whether there is a practical, incremental roadmap for taking your company in that direction. In fact there is, and it starts with re-assessing the core of most companies’ predictive analytics capability: your data mining tools. As you plan your predictive analytics initiatives, you should avoid the traditional approach of focusing on tactical, bottom-up, project-specific requirements. You should also try not to shoehorn your requirements into the limited feature set of whatever modeling tool you currently happen to use.
To become a fully predictive enterprise, you will need to take both a top-down and bottom-up approach to your data mining initiatives. From the top-down, it’s all about building and integrating alternate models of how your business environment is likely to evolve internally and externally. In our recent report on advanced analytics, Boris Evelson, Leslie Owens, and I sketched out the many business processes that can be enriched by predictive analytics.
So how do you instrument your company to become more predictive? For starters, assess whether your analytics tools support the following capabilities for developing, validating, and deploying predictive models:
- Model multiple business scenarios: You should be able to build complex models of multiple, linked business scenarios across different business, process, and subject-area domains, using such key features as strategy maps, ensemble modeling, and champion-challenger modeling.
- Incorporate multiple information types into models: You should be able to develop models against multiple information types, including unstructured content and real-time event streams, while leveraging state-of-the-art algorithms in sentiment analysis and social network analysis.
- Leverage multiple statistical algorithms and approaches in models: You should be able to develop models using the widest, most sophisticated range of statistical and mathematical algorithms and approaches, including regression, constraint-based optimization, neural networks, genetic algorithms, and support vector machines.
- Apply multiple metrics of model quality and fitness: You should be able to score and validate model quality using multiple metrics and approaches, including quality scores, lift charts, goodness-of-fit charts, comparative model evaluation, and auto best-model selection.
- Employ multiple variable discovery and assessment approaches: You should be able to build and validate models using various approaches for variable discovery, profiling, and selection, including decision trees, feature selection, clustering, association rules, affinity analysis, and outlier analysis.
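The model-quality capabilities above, comparative model evaluation and auto best-model selection, amount in practice to a champion-challenger loop. Here is a minimal sketch of that idea using scikit-learn; the dataset, the candidate models, and their settings are illustrative assumptions, not anything prescribed by the report:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real analytic data set.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Champion-challenger: several candidate models compete on the same data.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

# Comparative evaluation: score each candidate with cross-validation.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}

# Auto best-model selection: the highest-scoring candidate becomes champion.
champion = max(scores, key=scores.get)
print(champion, round(scores[champion], 3))
```

Real champion-challenger setups extend this loop by re-scoring the champion against new challengers as fresh data arrives.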
How is this different from predictive analytics as usual? Traditionally, most predictive modeling specialists focus on the latter three capabilities: statistical algorithms and approaches, model quality and fitness, and variable discovery and assessment. Most models are built in narrowly scoped business or subject domains—such as customer analytics for marketing campaign management—and only against structured data sources (such as relational tables). Traditionally, few predictive analytics projects have entailed modeling of multiple business scenarios across diverse domains—such as sales, marketing, customer service, manufacturing, and supply chain—though in the real world these business processes are often quite interconnected. Also, many data mining initiatives fail to incorporate information from unstructured sources—such as text in call-center logs—though this content may be as important as what comes from relational databases and other structured sources.
It’s very important to build multi-scenario predictive models against complex information sets, but becoming a fully predictive enterprise demands much more. To instrument your organization for maximum predictive power, you should also tool your advanced analytics to support the following capabilities:
- DW-integrated data preparation: To speed up and standardize the most time-consuming predictive modeling project tasks, you should be able to leverage your existing data warehouse, extract transform load, data quality, and metadata tools to support a full range of data preparation features. These features include the ability to discover, acquire, capture, profile, sample, collect, collate, aggregate, deduplicate, transform, correct, augment, and load analytical data sets.
- Deep application and middleware integration: To deliver models deeply into whatever heterogeneous SOA-enabled platform you happen to use, your predictive analytics tool should deploy on and/or integrate with a wide range of enterprise applications, middleware, operating platforms, and hardware substrate. You should be able to deploy models seamlessly into your data warehouse, business intelligence, online analytical processing, data integration, complex event processing, data quality, master data management, and business process management environments. And to play well in your SOA, your predictive modeling tools should support application programming interfaces, languages, tools, and approaches such as Web services, Java, C++, and Visual Studio, as well as emerging languages such as SQL-MapReduce and R.
- Consistent cross-domain model governance: To avoid fostering an unmanageable glut of myriad models, your predictive analytics solution should provide a wide range of tools, features, and interfaces for life-cycle governance of models created in diverse tools. At the very least, your tools should enable model check-in/check-out, change tracking, version control, and collaborative development and validation of models. To realize this promise, they should support a full range of tools, standards, and interfaces for importing and embedding models from other tools, as well as exporting and sharing models to other environments.
- Flexible model deployment: To execute modeling functions—such as data preparation, regression, and scoring—on the widest range of data warehouses and other platforms, your tools should support in-database or embedded analytics. And to scale to the max, your predictive analytics tools should deploy models to massively parallel data warehouses, software-as-a-service environments, and cloud computing fabrics. Your advanced analytics tools should also support development of application logic in open frameworks—such as MapReduce and Hadoop—to enable convergence of data mining and content analytics in the cloud.
- Rich interactive visualization: To deliver their precious payload—actionable intelligence—your advanced analytics tools should support interactive visualization of models, data, and results. Ideally, you should be able to visualize all of this in your preferred business intelligence tool, or in the predictive modeling vendor’s integrated visualization layer. Of course, you have every right to expect the full range of visualization techniques--histograms, box plots, heat maps, etc.—regardless of who provides the visualization layer.
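The data-preparation capability at the top of this list spans many discrete steps. As a concrete illustration of a few of them, here is a minimal pandas sketch of deduplicating, correcting, and aggregating a raw extract into an analytic data set; the table, columns, and values are hypothetical:

```python
import pandas as pd

# Hypothetical raw extract; in practice this would come from ETL staging tables.
raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3],
    "region":      ["east", "east", "west", "east", "east"],
    "spend":       [100.0, 100.0, 250.0, None, 80.0],
})

prepared = (raw.drop_duplicates()                               # deduplicate
               .assign(spend=lambda d: d["spend"].fillna(0.0))  # correct missing values
               .groupby("region", as_index=False)["spend"]
               .sum())                                          # aggregate per region
print(prepared)
```

In a DW-integrated setup, the same steps would be pushed into the warehouse's ETL and data quality layer rather than run client-side, but the logical pipeline is the same.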
As you can see, this goes well beyond data mining as usual. Forrester has a slightly different perspective on the development of the predictive analytics market than you’re likely to get from other sources. We see robust, flexible, SOA-enabled data mining tools as the centerpiece of advanced analytics for fully predictive enterprises. The competitive stakes are too great for businesses to take the traditional silo-mired approach when implementing this mission-critical technology.
What do you think?
Aweekstweets November 15-22 2009—whole week’s blather scraped, classified, with extended commentary only on the tech-related stuff
JK2—We’re always trying to rebuild the brain. Now we’re ratcheting down our ambition: mimicking carnivore gray matter because we don’t have the computer power to do justice to our own wetware circuitry. I recommend we start by replicating the most primitive brains in the animal kingdom: insects (do they even have brains?). Why start there? Well, because they’re such an incredibly successful category of organisms...they might have something to teach us, if we can learn to think like them. Cats? They’re recently evolved camp followers of homo sapiens. From an evolutionary standpoint, they’ll teach us what we already know: if you protect the food supply from rodents, generally keep to yourself, and provide a passive object of comfort and companionship, you have a warm place in the human hearth—as long as humans themselves survive. Insects are something entirely different: they’ll survive whether or not we do, and they might contribute to our downfall. Keep your friends close, your enemies closer.
JK2—Let’s not imagine that everybody everywhere wants to spend every day experiencing the world through reports, dashboards, and other visualization containers we associate with specialized business intelligence (BI) solutions. Most of us want all of these contextualizers, but embedded in all the apps and services we use. And let’s not imagine that everybody wants to see every scrap of information packaged in BI-like experiences: with prebuilt visualizations, context, and insights. So it’s not productive to view the world through purely BI-colored glasses. What I love most about the Web is the passing parade of people, situations, events, images, information, trends, and experiences—arbitrary, complex, confusing, sprawling, stimulating, open-ended. The masses are happy to derive their own meanings from these messes.
JK2—Will enterprises evolve toward hybrid BI environments hosted partly on-premises and partly in the SaaS/cloud? Will the departments be allowed to mashup their own BI reports and dashboards on outsourced SaaS/cloud services, while the enterprise as a whole uses a premises-based platform? Won’t one approach crowd out the other over time as corporate IT looks to consolidate on a single platform? Either SaaS/cloud will become the dominant BI deployment approach for companies of all sizes, or the dominant approach for one segment, such as the midmarket. Or the dominant approach for deployment of one category of BI capability—such as predictive analytics against cloud-sourced data—while the core of BI is still deployed on on-premises platforms.
JK2—Nobody truly knows the future. Some of us have models that have proven quite good at predicting futures with a reasonable degree of confidence, based on observations. That’s what predictive modeling is all about. Where analytics is concerned, there has never been a “next big thing.” Instead, all the old things (and data mining is certainly an old established discipline) just keep evolving aggressive new marketing messages to justify customers’ continued loyalty.
JK2—The key gating factor on predictive analytics’ adoption has always been the specialized statistical and mathematical knowledge required to use these tools effectively. That constraint is beginning to ease, thanks to the development of more automated visual tooling for data discovery, exploration, preparation, and modeling. But this is still a math-geek-intensive discipline—much more than, say, core BI. Let’s be honest with ourselves. No true “next big thing” demands that you first go back for college-level training in statistics.
JK2—I’m a bit fatalistic about speaking to the press. Even when they quote me correctly, and place that quote in the right context in a well-written article, a misleading headline can screw it all up. Jeff did a good job on this one except for the headline. There’s no mention anywhere in this article of a free DW appliance (software plus hardware in a complete, no-charge package) being offered by any vendor. If there were, that would definitely be news.
JK2—In-database analytics is a capability that most DW/DBMS platforms support, as do most predictive analytics and data mining tools. It’s all about tools exporting models as PMML, or as native SAS code, or as Java archives, or any of various other approaches—and DW/DBMSs’ importing them and executing those models as user-defined functions (UDFs) or some other approach. Of course, vendors vary widely in the range of data mining functions—such as data preparation, regression analysis, and scoring—that can be done on which tools’ models by which DW/DBMS vendors’ platforms.
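To make the export-then-import flow concrete, here is a minimal sketch of the kind of PMML document a modeling tool might emit for a simple regression model. The field names and coefficients are entirely hypothetical; a DW/DBMS that supports in-database analytics would import a document like this and execute it as a UDF or equivalent:

```xml
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header copyright="example"/>
  <DataDictionary numberOfFields="3">
    <DataField name="tenure" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
    <DataField name="churn_score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="churn" functionName="regression">
    <MiningSchema>
      <MiningField name="tenure"/>
      <MiningField name="spend"/>
      <MiningField name="churn_score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="-0.5">
      <NumericPredictor name="tenure" coefficient="0.02"/>
      <NumericPredictor name="spend" coefficient="0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
```

Because PMML describes the model declaratively, the database can score rows against it without round-tripping data back to the modeling tool.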
JK2—Excuse me for being quick with the metaphorical comeback.
JK2—IBM was the sole advertiser on that issue.
JK2—No matter how I score the various vendor tools on my forthcoming Forrester Wave for Predictive Analytics and Data Mining Solutions, I’m going to face a boatload of ire from devotees of the lower-scored tools. And even from users of the higher-scored tools who will point out the myriad features their vendor has never got quite right—but which are not showstoppers that would cause these users to abandon the stat tools they’ve been using since their college days.
JK2—I’m more than happy covering the myriad not-quite-that-big things that loom large in the daily nitty-gritty of enterprise computing. A more sustainable career than riding the wave of fast-rising bubble technologies that may be big next year but obsolete the year after.
JK2—That’s the thrust of my Service-Oriented Analytics discussion.
JK2—There are many ways to skin the “self-service operational BI” cat, and almost every vendor in this arena is doing it by a blend of these and other approaches. Everybody’s trying to take this technology out of IT’s hands and put the users in the driver’s seat. Very little of it is rocket science. Most of it is well-established and well-understood, has stable usage and integration patterns, and can be automated to a greater degree than we like to admit.
JK2—Microsoft’s strategic error on cloud DW was building one stovepipe analytic database environment for Azure, and another one for SQL Server. They’ll spend several years converging them, and it won’t be pretty. And it won’t be in time to make much headway against Amazon, Google, IBM, Teradata, and others who are getting there first with more integrated cloud DBMS/DW solutions—in IBM and Teradata’s cases, with the same core database in premises-based and public clouds.
JK2—That list is more or less in descending order—from strong/promising to weak/non-existent—of the cloud strategies of DW vendors in today’s market.
JK2—See tweet explaining this, earlier in this aweekstweets. Lots of DW/DBMSs can import models in native SAS code. Aster can even execute those models, without conversion, in a SAS executable runtime container in the new nCluster v4.
JK2—Milestone in the ordinary product-management sense that it’s one step closer to go-live for Microsoft. Will Azure represent an industry milestone in the maturity, sophistication, and adoption of cloud computing in corporate environments? 2010 will be the year we learn.
JK2—Windows—with its schizoid “wait forever for your latest goddamn mouse-click to advance the cursor a millimeter on screen” GUI—has rarely been quick, and—with the blue screens of death, malware infestations, and general look-and-feel madness—has often been dirty.
JK2—I’d be interested in knowing how exit-polling on elections matches up with same-time voice-of-voter polls as expressed in the twit-o-blog-o-sphere.
JK2—Even difficult to explain to DBAs—and to other analysts.
JK2—In a commoditized market with a couple dozen competitors, re-startups had better have some awesomely innovative verge-of-commercialization technology in the labs to have a snowball’s chance.
JK2—For example: Can social network analysis detect the outlines of the GOP agenda and presidential candidate shortlist for 2012 even though it’s 3 years from now? And can these algorithms outdo the human pundits in this regard? That’d be like a football coach having a mole in the opposition’s huddle.
JK2—Quite frankly, dreams are often a distraction from the main business of life. Dreams are just rapid eye movements and herky-jerky unconscious muscle spasms. Sell people quilts, comfy pillows, and firm mattresses.
JK2—Seriously. MySQL is what? The 13th or 14th most popular DBMS in the world.
JK2—Mostly just an internal private cloud at IBM. The commercialized cloud will be for IBM mainframe customers. I’m still waiting for an IBM smart analytics public and private cloud that will be virtualized across DB2 and Informix, and across all OS and hardware platforms. I don’t know yet where IBM is going with this, or whether in fact they plan to be that all-encompassing.
JK2—Is Cray a stand-alone company or a product group within some larger vendor? Will have to look them up—again.
JK2—The point of this tweet was that SAS is the largest predictive analytics and data mining vendor by market share—hence many predictive models have been built with its tools—hence DW vendors that do in-database analytics should be able to integrate with and execute the full range of procedures, including scoring and regression, on SAS models—hence it’s good that Netezza has a SAS partnership. Netezza’s partnership with Fuzzy Logix, vendor of the DB Lytix in-db enabling tool, is important for Netezza in-database analytics on a wide range of third-party PA/DM tool vendors’ models. Whew—hard point to make without lots of detail and nuance. Thank goodness for aweekstweets (assuming anybody actually reads this).
JK2—Sentiment analysis is in danger of becoming too popular, pre-empting formal opinion polls and focus groups, pre-empting the need to actually talk to your customers to see what’s on their minds.
JK2—In petabyte and multi-100-TB DWs, the data’s getting too massive to move. For that, and other reasons, move the predictive and other analytic logic to the DW, rather than vice versa.
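A toy sketch of that "move the logic to the data" principle, using Python's built-in sqlite3 as a stand-in for a DW. The table, coefficients, and scoring formula are all hypothetical; the point is that the scoring expression executes inside the database, so the raw rows never have to move:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, tenure REAL, spend REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, 12, 340.0), (2, 3, 55.0), (3, 48, 990.0)])

# Hypothetical coefficients from a model trained elsewhere; only these
# few numbers travel to the database, not terabytes of rows to the tool.
w_tenure, w_spend, intercept = 0.02, 0.001, -0.5
rows = conn.execute(
    "SELECT id, ? * tenure + ? * spend + ? AS score FROM customers",
    (w_tenure, w_spend, intercept)).fetchall()
print(rows)
```

On a real MPP warehouse the same idea is implemented with vendor UDFs or imported PMML, and the win grows with data volume: shipping a scoring function is cheap, shipping a multi-100-TB table is not.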
JK2—I’m definitely going to have to SWOT the DW and PA/DM vendors’ in-database analytics features more finely in the coming year.
JK2—That’s fine, but I still don’t have any meaningful details on IBM Smart Analytics System appliances. Still waiting. Getting a briefing update from them in December.
JK2—SAS knows this is a must for their customers to continue scaling up and out.
JK2—Here I’ll simply re-post the full response from a few tweets above: “JK2—The point of this tweet was that SAS is the largest predictive analytics and data mining vendor by market share—hence many predictive models have been built with its tools—hence DW vendors that do in-database analytics should be able to integrate with and execute the full range of procedures, including scoring and regression, on SAS models—hence it’s good that Netezza has a SAS partnership. Netezza’s partnership with Fuzzy Logix, vendor of the DB Lytix in-db enabling tool, is important for Netezza in-database analytics on a wide range of third-party PA/DM tool vendors’ models. Whew—hard point to make without lots of detail and nuance. Thank goodness for aweekstweets (assuming anybody actually reads this).”
JK2—It’s funny that the headline writer put “fluff up” in there, as if these vendors’ announcements were insubstantial. They weren’t insubstantial, but they didn’t begin to address the pricing issues that will determine whether any of the new services are cost-effective for the mass market anytime soon. My hunch is that we’re due for a nasty price war among the cloud app/platform vendors in 2010-2012, with packaged software license revenues (watch out Microsoft!) taking a huge hit.
JK2—What’s Twitter? A cloud of colloquial noise. You can hide juicy tweetborne content in plain sight—i.e., it won’t pass through the filters of many sophisticated text analytics NLP engines, because it’s essentially written in an ad-hoc, arbitrary, you-and-your-friends-specific code language.
JK2—Interesting on many levels. But they have yet to identify a killer app.
JK2—DW vendors missing from SAS partnership: Oracle, Microsoft, SAP, Sybase.
JK2—Give it up world. You’re visible from space. You’re also visible from airplanes and hot-air balloons. Blame the Wright and Montgolfier Brothers, if you have to start somewhere.
JK2—I saw another mention of Wolfram Alpha in a vendor’s partner slides the other day. Who exactly is using it?
CONTINUING-TO-INDULGE-IN-SOCIAL-MEDIA-ARE-CHANGING-THE-VERY-FABRIC-OF-OUR-POSTMODERN-EXISTENCE-WAIT-A-SEC-WHY-AREN’T THESE IN TECH-TWEET? TWEETS
Teaching myself to stop worrying and love the calendar. Starting to call next year "twenty-ten" and back-fit "twenty-oh-nine" to this one. 11 minutes ago from TweetDeck
From itd.daily@it-director.com: "'It is wonderful to be here in the great state of Chicago.' Dan Quayle:" JK--Huh? Why pick on him anymore? 6:56 AM Nov 19th from TweetDeck