DIG White Paper 95/01
Data Intelligence Group
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help D&B "preemptively define the information market of tomorrow." Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools. Data mining answers business questions that traditionally were too time-consuming to resolve. Data mining tools scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
D&B companies already know how to collect and refine massive quantities of data to deliver relevant and actionable business information. In this sense, D&B has been "mining" data for years. Today, some D&B units are already using data mining technology to deliver new kinds of answers that rank high in the business value chain because they directly fuel return-on-investment decisions.
Data mining techniques can be implemented rapidly on existing software and hardware platforms across D&B to enhance the value of existing resources, and can be integrated with new products and systems as they are brought on-line. When implemented on high performance client-server or parallel processing computers, data mining tools can analyze massive databases while a customer or analyst takes a coffee break, then deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"
In the D&B units DIG surveyed, we found strong interest and a wide range of activities and research in data mining. Groups are engaged in data mining to varying degrees, from experimentation by individual analysts to the deployment of completed projects. We also found a wealth of potential business opportunities that could open up through data mining technology.
Because data mining tools produce better results with larger, broader databases, the breadth of D&B's collected data places the company in a unique position to benefit. By integrating data mining into its products and services, D&B can leverage its existing resources to generate new revenue.
D&B units are connected through the common goal of delivering integrated, global solutions to support business decisions. In accomplishing this goal across a broad spectrum of markets, D&B units face similar market pressures and opportunities. For example, customers urgently require tools to help them keep pace with accelerating growth in the size and complexity of business data. At the same time, customers demand ever more timely, sophisticated, and widely integrated data analyses.
D&B units work hard today to maintain their leadership against a growing competitive threat from other vendors. These vendors have often aggressively exploited new technology to capture market advantage. While D&B units have responded successfully to these competitive challenges in the past, the question remains: What new technology is becoming available today that D&B can leverage proactively?
Data mining is such a technology. D&B is in a unique position to take the lead in delivering the benefits of data mining technology to customers. The company has a wealth of data unrivaled in its breadth and depth, and the understanding of the relevant markets that is necessary to bring this technology to customers successfully. D&B units are engaged in markets where data mining can have significant impact. These markets use large databases and need the power of data mining to achieve a better understanding of their data.
Data mining derives its name from the similarities between searching for valuable business information in a large database - for example, finding linked products in Nielsen's gigabytes of store scanner data - and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find exactly where the value resides.
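To make the "linked products" example concrete, the following is a minimal sketch of one common way to find products that tend to be bought together: count how often each pair co-occurs in a basket, then rank pairs by support and lift. It is an illustration under stated assumptions, not a description of any Nielsen or D&B system. The function name `linked_products`, the `min_support` parameter, and the sample baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def linked_products(baskets, min_support=0.3):
    """Return (item_a, item_b, support, lift) tuples, highest lift first.

    support = share of baskets containing both items.
    lift    = support / (share with item_a * share with item_b);
              values above 1 suggest the items are bought together
              more often than chance would predict.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    results = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # too rare to be a reliable pattern
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        results.append((a, b, support, lift))
    results.sort(key=lambda r: r[3], reverse=True)
    return results


# Hypothetical scanner data: each inner list is one shopping basket.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
    ["milk", "cereal"],
]
pairs = linked_products(baskets, min_support=0.3)
for a, b, support, lift in pairs:
    print(f"{a} + {b}: support={support:.2f}, lift={lift:.2f}")
```

This brute-force pair count corresponds to the "sifting" approach. On gigabytes of scanner data, a production tool would need to prune candidate pairs or sample baskets, which is closer to the "intelligent probing" the analogy describes.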
Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:
Data mining techniques can yield the benefits of automation when implemented on existing software and hardware platforms at D&B, and can be implemented on new systems as existing platforms are upgraded and new products developed. When data mining tools are implemented on high performance parallel processing systems, they can analyze massive databases in minutes. Faster processing means that users can automatically experiment with more models to understand complex data. High speed makes it practical for users to analyze huge quantities of data. Larger databases, in turn, yield improved predictions. Databases can be larger in two senses:
A recent Gartner Group Advanced Technology Research Note listed data mining and Artificial Intelligence at the top of the five key technology areas that "will clearly have a major impact across a wide range of industries within the next three to five years." Gartner also listed parallel architectures and data mining as two of the top ten new technologies in which companies will invest during the next five years. According to a recent Gartner HPC Research Note, "With the rapid advance in data capture, transmission and storage, large-systems users will increasingly need to implement new and innovative ways to mine the after-market value of their vast stores of detail data, employing MPP [massively parallel processing] systems to create new sources of business advantage (0.9 probability)."
At D&B, data mining technology provides a basis for new products and for enhancements to existing offerings. For example, at DBIS, data mining tools can be used to automate more elements of the process of building risk models for a variety of markets. Data mining can present a Nielsen customer with the top ten most significant new buying patterns each week, or present an IMS customer with patterns of sales calls and marketing promotions that have significant impact within certain market niches.
Some of the most commonly used techniques in data mining are:
Data mining techniques are the result of a long process of research and product development. This evolution began when business data were first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery.
Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:
Commercial databases are growing at unprecedented rates. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least ten years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.
In the evolution from business data to business information, each new step has built upon the previous ones. For example, dynamic data access is critical for drill-through in data navigation applications, and the ability to store large databases is critical to data mining. From the user's point of view, the four steps listed in Table 1 were revolutionary because they allowed new business questions to be answered accurately and quickly.
|Evolutionary Step||Business Question||Enabling Technologies||Product Providers||Characteristics|
| ||"What was my average total revenue over the last five years?"||Computers, tapes, disks||IBM, CDC||Retrospective, static data delivery|
| ||"What were unit sales in New England last March?"||Relational databases (RDBMS), Structured Query Language (SQL), ODBC||Oracle, Sybase, Informix, IBM, Microsoft||Retrospective, dynamic data delivery at record level|
| ||"What were unit sales in New England last March? Drill down to Boston."||On-line analytic processing (OLAP), multidimensional databases, data warehouses||Pilot, IRI, Arbor, Redbrick, Evolutionary Technologies||Retrospective, dynamic data delivery at multiple levels|
| ||"What's likely to happen to Boston unit sales next month? Why?"||Advanced algorithms, multiprocessor computers, massive databases||Lockheed, IBM, SGI, numerous startups (nascent industry)||Prospective, proactive information delivery|
Table 1. Steps in the evolution of data mining.
In the units we surveyed, we found strong interest and a wide range of activities and research in data mining. Many groups are already engaged in data mining projects, from research and experimentation by individual analysts to completed products that have already added value to the business. We also found a wealth of potential business opportunities that could become available with data mining technology. The following key examples provide a flavor of the interest, activities, and opportunities we encountered:
We see distinct possibilities for synergy among D&B units in the development and application of data mining technology. While each unit has unique business goals, markets, and customer problems, many different problems can be addressed using similar core data mining technologies. For example, a classification tool such as CART can be used equally well to identify municipal bonds whose underlying ratings criteria have changed significantly, or to identify physicians whose prescription-writing patterns have changed. A tool that selects the best variables to use in creating a credit-scoring model for the telecommunications industry could equally well select the variables to use in comparing performance profiles of retail sales channels.
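The claim that one core technique can serve very different problems can be illustrated with a small sketch. The code below finds the single best threshold on one numeric variable by minimizing Gini impurity, the kind of split that CART-style classification trees apply repeatedly. It is not the full CART algorithm and not any D&B tool. The function names and both sample data sets (physician prescription changes and bond rating shifts) are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all one class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Find the threshold on one numeric variable with the lowest weighted Gini.

    Returns (threshold, impurity). Rows with value <= threshold go to the
    left group. The threshold is None if no split improves on the
    unsplit data.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = ((lo + hi) / 2, impurity)
    return best


# Hypothetical data: percent change in each physician's prescription volume.
rx_change = [2, 3, 5, 18, 22, 25]
rx_label = ["stable", "stable", "stable", "changed", "changed", "changed"]
rx_threshold, rx_impurity = best_split(rx_change, rx_label)

# The same function applied to a different, equally hypothetical problem:
# shift in a ratings-criteria score for municipal bonds.
bond_shift = [0.1, 0.2, 0.9, 1.1]
bond_label = ["unchanged", "unchanged", "changed", "changed"]
bond_threshold, bond_impurity = best_split(bond_shift, bond_label)

print(rx_threshold, rx_impurity)
print(bond_threshold, bond_impurity)
```

Nothing in `best_split` is specific to physicians or bonds. Only the input columns differ, which is the sense in which a shared tool can address problems in separate units.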
This survey of a limited number of D&B units points to quantifiable business benefits achievable through the integration of data mining technology with D&B products and services. Data mining is a powerful technology with great potential for adding value to the offerings of D&B units. D&B is in a remarkable position to take advantage of this technology: it has the data, and it has the infrastructure to support units in collaborating to solve shared problems.
The members of DIG are committed to seeing the benefits of data mining technology delivered to D&B customers and intend to further D&B synergy in solving data mining problems. We welcome suggestions for other useful activities and for collaboration with analysts across D&B.
The Data Intelligence Group (DIG) was created by Dun & Bradstreet as a center of excellence in data mining. It is composed of scientists, developers, and marketing and business analysts. DIG is part of Pilot Software, a Dun & Bradstreet company that develops and delivers OLAP database products. Before DIG was formed, its five original members created the Darwin data mining software application at Thinking Machines Corporation.