Position Papers

Keys to the Commercial Success of Data Mining

A workshop held in conjunction with KDD'98
August 31, 1998
New York City

Workshop Chairs:

Kurt Thearling, Exchange Applications

Roger Stein, Moody's

 

 

Some thoughts on the current state of data mining software applications

Kurt Thearling

Director of Analytics
Exchange Applications
89 South Street
Boston, MA 02111
kurt@thearling.com
http://www.thearling.com

As a former developer of data mining software, I can understand how difficult it is to create applications that are relevant to business users. Much of the data mining community comes from an academic background and has focused on the algorithms buried deep in the bowels of the technology. But algorithms are not what business users care about. Over the past few years the technology of data mining has moved from the research lab to Fortune 500 companies, requiring a significant change in focus. The core algorithms are now a small part of the overall application, being perhaps 10% of a larger part, which itself is only 10% of the whole.

That being said, the focus of this article is to point out some areas in the remaining 99% that need to be improved upon. Here’s my current top ten list:

  1. Database integration. No flat files. One more time: No flat files. Not supporting database access (reading and writing) via ODBC or native methods is just plain lazy. Companies spend millions of dollars to build data warehouses to hold their data and data mining applications must take advantage of this. Besides saving significant manual effort and storage space, relational integration allows data mining applications to access the most up-to-date information available. I’m happy to say that many of the leading data mining vendors have heard this message but there’s still room for improvement.
  2. Automated Model Scoring. Scoring is the unglamorous workhorse of data mining. It doesn't have the sexiness of a neural network or a genetic algorithm but without it, data mining is pretty useless. (There are some data mining applications that cannot score the models that they produce; to me this is like building a house and forgetting to put in any doors.) At the end of the day, when your data mining tools have given you a great predictive model, there's still a lot of work to be done. Scoring models against a database is currently a time-consuming, error-prone activity that hasn't been given the consideration that it is due. When someone in marketing needs to have a database scored, they usually have to call someone in IT and cross their fingers that it will be done correctly. If the marketing campaigns that rely on the scores are run on a continuous (daily) basis, this means a lot of phone calls and a lot of manual processing. Instead, the process that makes use of the scores should drive the model scoring. Scoring should be integrated with the driving applications via published APIs (a standard would be nice but it's probably too soon for this) and run-time-library scoring engines. Automation will reduce processing time, allow for the most up-to-date data to be used, and reduce error.
  3. Exporting Models to Other Applications. This is really an extension to #2. Once a model has been produced, other applications (especially applications that will drive the scoring process) need to know that it exists. Technologies such as OLE automation can make this process relatively straightforward. It's just a matter of adding an "export" button to the data mining user interface and creating a means for external applications to extend the export functionality. Exporting models will then close the loop between data mining and the applications that need to use the results (scores). Besides exporting the model itself, it would be useful to include summary statistics and other high-level pieces of information about the model so that the external application could incorporate this information into its own process.
  4. Business Templates. Solving a business problem is much more valuable to a user than is solving a statistical modeling problem. This means that an application built specifically for cross-selling is more valuable than a general modeling tool that can create cross-selling models. It might be simply a matter of changing terminology and making a few modifications to the user interface, but those changes are important. From the user’s perspective, it means that they don’t have to stretch very far in order to take their current understanding of their problem and map it to the software they are using.
  5. Effort Knob. Users do not necessarily understand the relationship between complex algorithm parameters and the performance that they will see. As a result, the user might naively change a tuning parameter in order to improve modeling accuracy, increasing processing time by an order of magnitude. This is not a relationship that the user can (or should) understand. Instead, a better solution is to provide an "effort knob" that allows a user to control global behavior. Set it to a low value and the system should produce a model quickly, doing the best it can given the limited amount of time. On the other hand, if it is set to the maximum value the system might run overnight to produce the best model possible. Because time and effort are concepts that a business user can understand, an effort knob is relevant in a way that tuning parameters are not.
  6. Incorporate Financial Information. Data mining does not operate in a vacuum. The results of the data mining process will drive efforts in areas such as marketing, risk management, and credit scoring. Each of these areas is influenced by financial considerations that need to be incorporated in the data mining modeling process. A business user is concerned with maximizing profit, not minimizing RMS error. The information necessary to make these financial decisions (costs, expected revenue, etc.) is often available and should be provided as an input to the data mining application.
  7. Computed Target Columns. In many cases the desired target variable does not necessarily exist in the database. If the database includes information about customer purchases, a business user might only be interested in customers whose purchases were more than one hundred dollars. Obviously, it would be straightforward to add a new column to the database that contained this information. But this would probably involve a database administrator and IT personnel, complicating a process that is probably complicated already. In addition, the database could become messy as more and more possible targets are added during an exploratory data analysis phase. The solution is to allow the user to interactively create a new target variable. Combining this with an application wizard (#10), it would be relatively simple to allow the user to create computed targets on the fly.
  8. Time-Series Data. Much of the data that exists in data warehouses has a time-based component. A year’s worth of monthly balance information is qualitatively different from twelve distinct non-time-series variables. Data mining applications need to understand that fact and use it to create better models. Knowing that a set of variables is a time series allows for calculations to be done that make sense only for time-series data: trends, slopes, deltas, etc. Statisticians have performed these calculations manually for years, but most data mining applications cannot perform them because time-series data is treated as a set of unrelated variables.
  9. Use vs. View. Data mining models are often complex objects. A decision tree with four hundred nodes is impossible to fit on a high-resolution video display, let alone be understood by a human viewer. Unfortunately most data mining applications do not differentiate between the model that is used to score a database and the model representation that is presented to users. This needs to be changed. The model that is presented visually to the user does not necessarily have to be the full model that is used to score data. A slider on the interface that visualizes a decision tree could be used to limit the display to the first few (most important) levels of the tree. Interacting with the display would not have an effect on the complexity of the model but it would simplify its representation. As a result, users would be able to interact with the system to provide only the amount of information they can comprehend.
  10. Wizards. Not necessarily a must-have, application wizards can significantly improve the user’s experience. Besides simplifying the process, they can help prevent human error by keeping the user on track.
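
Items 7 and 8 both describe derived columns that are cheap to compute once the application knows what the data means. The following is a minimal Python sketch and is not taken from any product: the function names, the default threshold of one hundred dollars, and the feature names are illustrative assumptions. It derives a binary target from a purchase total and computes a least-squares slope plus period-to-period deltas for an ordered series such as monthly balances.

```python
from statistics import mean


def computed_target(purchase_total, threshold=100.0):
    """Binary target derived on the fly: did purchases exceed the threshold?"""
    return purchase_total > threshold


def time_series_features(values):
    """Summarize an ordered series (e.g. monthly balances) as trend features."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(values)
    # Ordinary least-squares slope of value against time index.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    # Period-to-period changes.
    deltas = [later - earlier for earlier, later in zip(values, values[1:])]
    return {
        "slope": slope,
        "net_change": values[-1] - values[0],
        "min_delta": min(deltas),
        "max_delta": max(deltas),
    }


# Hypothetical customer: one purchase total and four monthly balances.
is_target = computed_target(250.0)
features = time_series_features([100, 110, 120, 130])
```

Because the series is passed as a single ordered list rather than as unrelated columns, the slope and deltas are well defined; treating the same balances as independent variables would lose that ordering.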

Background:

Kurt Thearling is Director of Analytics at Exchange Applications, a Boston-based database marketing company, where he directs the use of data mining and visualization technology in EA's database marketing software and consulting practice. Over the past decade he has developed a number of commercial data mining software products, including Thinking Machines' Darwin and Pilot Software's Discovery Server. He is also an independent consultant in areas related to data mining and decision support technologies. His data mining web page can be found at http://www.thearling.com.

 


 

Interview with D S * (November 4, 1997)

Roger M. Stein

Vice President, Senior Credit Officer
Quantitative Analytics and Knowledge Based Systems
Moody's Investors Service
99 Church Street
New York, NY 10007
steinr@moodys.com

D S * : What are the most common serious mistakes made by CIOs implementing data mining and knowledge discovery technologies?

STEIN: "At the CIO level, it's hard to characterize something as a mistake after only a short period of time, which is typically how long most firms have been undertaking data mining programs. This is particularly so given the amount of infrastructure development and learning that must sometimes take place. Having said that, certain patterns of behavior seem to emerge in many data mining projects.

"There is, first, a tendency to focus closely on the tools, getting excited about one or another, as opposed to really looking at how the business problems are structured. Yet it is the structure of the problems that allows the solutions to fall out. You can use any tool to solve almost any problem, if you're willing to work hard enough on it. It then becomes a question of whether you're doing things efficiently and what your likelihood of success will be.

"People sometimes think of these technologies as magic bullets. Much early commercial work in neural networks, for example, took the position that you didn't have to understand either your data or statistics -- just dump the data in, and the technology would find the relationships. But it turns out that you must have a very firm background in modeling, statistics and the business domain in order to structure, validate and justify any model, whether it's a decision tree, neural network, discriminant analysis or whatever. So the tool wasn't the answer.

"Because most of these methods involve extensive searches for patterns through very large data spaces, to the extent that you make the process more difficult through a poor problem formulation, you're far less likely to find interesting information.

"There is another side to that coin: the problem of overfitting. It's human nature to see patterns in things. When people look at the output of, say, a particular data mining algorithm -- a rule, perhaps that bubbles to the surface of the data -- there's a desire to explain that rule using intuition and background knowledge. And people are very good at figuring out such explanations, even if they're wrong.

"Most people who work in this field have had the experience of finding an interesting (but false) rule, only to later realize that they made some error in problem formulation or picked up some spurious relationship in the data. What is interesting is that people can usually generate a very good explanation for a rule even when the rule is wrong! So it is very easy to fool yourself: that's another mistake you run into frequently. This points out the need for more rigor in some data mining approaches."

D S * : Specifically, how can you guard against the perception of spurious patterns or relationships?

STEIN: "It is a tricky problem. Rigorous testing procedures are important: this breaks with some of the more common statistical approaches to problem-solving that concentrate on evaluating the significance of the parameters of the model itself. I, like most people who work in this field, typically favor out-of-sample testing. But even this can be tricky! People tend to make technical mistakes during testing. There are lots of stories of developers who thought they were performing exceptionally rigorous testing, when in fact, they were missing a fundamental assumption in their whole approach."
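
One way to see why out-of-sample testing matters is to evaluate a deliberately overfit "model" on data it has and has not seen. The sketch below is an illustration under stated assumptions, not a procedure described in the interview: the labels are random noise, the memorizing model and all function names are hypothetical, and a simple 70/30 holdout split stands in for whatever testing protocol a real project would use.

```python
import random
from collections import Counter


def split_holdout(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split off a holdout set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_memorizer(train):
    """Deliberately overfit 'model': memorize each training label by id."""
    table = dict(train)
    # Fall back to the most common training label for unseen ids.
    default = Counter(table.values()).most_common(1)[0][0]
    return lambda customer_id: table.get(customer_id, default)


def accuracy(model, rows):
    """Fraction of rows whose label the model reproduces."""
    return sum(model(cid) == label for cid, label in rows) / len(rows)


# Labels are pure noise, so there is no real pattern to find.
noise = random.Random(42)
records = [(customer_id, noise.random() < 0.5) for customer_id in range(1000)]

train, test = split_holdout(records)
model = fit_memorizer(train)
in_sample = accuracy(model, train)
out_of_sample = accuracy(model, test)
```

The memorizer reproduces every training label, so its in-sample accuracy is perfect, while on the holdout rows it can only guess the majority label and should do no better than roughly chance. Evaluating on the training data alone would make this rule look like a discovery.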

D S * : Should a CIO work closely with a statistician?

STEIN: "Since the CIO sets the vision for the organization, it behooves him or her to have a firm grasp of data mining technology at least at the intuitive level. But this doesn't necessarily mean that the CIO needs to work intimately with a statistician or programmer. That level of in-depth technical familiarity may not be warranted.

"However, the team responsible for development of data mining in a particular domain should be made up of domain experts (business folks), as well as specialists with strong backgrounds in mathematics, statistics, and database programming. This is how I usually structure projects, and I've seen it work quite well and deliver remarkable results. The key is understanding how the technologies will fit into larger business solutions.

"Unfortunately, though, what happens in many organizations is that you get a lopsided team: people with strong business backgrounds who don't understand the technology well or technical people who do all kinds of things to obtain "interesting" information from data, but sometimes end up solving the wrong problems or solving the right problem the wrong way from a business needs perspective.

"There must be a good partnership between the two types of people, not merely a superficial one."

D S * : How can two groups with such different perspectives be liaisoned effectively?

STEIN: "You need a way to map problems onto business solutions and vice-versa. But often all you get, say in vendor literature, is someone pushing a particular tool.

"It is vital that a person who is business-savvy in a particular area understand a little about the technology, at least at a basic level. By the same token, technologists must actively work to understand how business people will utilize the output of their technologies. Both sides need to talk to one another in a sort of middle-ground language, a language that is both technical and business-focused, a language that does not require either a Ph.D. in mathematics or an MBA. This language is what Vasant Dhar and I attempted to provide in our book."

D S * : What kind of resources must be utilized to implement this?

STEIN: "It depends upon the structure and culture of the organization. Some firms work well with project teams. Here, a business team comes together to solve a particular problem, and the technologists act in the capacity of a consulting service, going from team to team and problem to problem. Other firms form a specific group to solve a specific business problem, so a given business unit will "own" that process and take full responsibility for its solution. It really depends on the scope and structure of the project.

"Typically I favor finding a single strategic business need where a moderate increase in data understanding can potentially produce a fairly large impact. This is what I call a "quick kill." If all goes well, more challenging problems can be attacked next. This lets organizations get familiar with the culture of leveraging data and fitting business problems to different technologies. It also allows the technologies to build a track record within the firm.

"The key is that the organization's data and the expertise of its people must be deployed and managed as any other asset, as opposed to being thought of as a support-type function that is drawn upon by the business, like the opening of a faucet. The very concept of using data strategically must be important to the organization! Otherwise, what you end up with is a bunch of technical people trying to get business people excited about a particular idea or a group of business people getting excited about a new toy and trying to figure out how to apply it to their problems."

D S * : What specific role does education play here?

STEIN: "Of course, people should be well-trained, and the assumption is that teams are made up of the types of people I described earlier: specialists in statistical modeling and business strategy, etc. I'm also a strong advocate of self-education in this context: finding out what others have done, going to conferences, reading, etc., and trying technologies out to understand how they can fit into a firm's business strategy. This is especially useful on the management side, where people are not familiar with the technology and may well feel intimidated by the literature. However, I don't think attendance at, say, a three-day course will teach people how to solve these problems in one shot. The learning experience must be iterative.

"But I want to re-emphasize that there is no way you can get either the business side or the technology side out of the picture. I would never think of developing this type of business process and system without an interdisciplinary group -- period."

D S * : How can an executive realistically benchmark the results obtained from these technologies? How do you judge whether outcomes are marginal, adequate or exceptional?

STEIN: "Technologies cannot provide an answer here. This depends heavily on which problems are to be solved within a business domain and the quality of the data and expertise that are available.

"There is a tendency to say 'This group improved their profitability by x percent...' or 'That project failed by y percent.' But ultimately these are all domain-specific criteria. Accuracy is only one dimension. From a business perspective, things like business flexibility or decision explainability may be far more important, depending on the intended use.

"For example, the Chicago Bulls coaching staff reported that data mining had improved performance by 2-3 points per game. That's very different from US WEST saying they've saved millions of dollars using OLAP and a data warehouse to improve internal processes by highlighting lapses in transaction processing.

"Results depend upon the goals of an organization, the structure of the entire problem and the structure of the data. And optimal results always reflect where the organization is currently. An organization that's already been using data efficiently for a certain application may obtain only moderate added benefits from data mining. But, as it turns out, most organizations don't take very good advantage of their data, so most can realize large improvements from even simple projects.

"Evaluating results is intimately dependent upon the particular problem itself; it cannot be generalized. It's a chicken-and-egg situation: the problem defines the solution, and the solution defines the results, given the business context."

D S * : How should executives prioritize involvement with data mining technologies?

STEIN: "Again, let business needs drive development. Find a problem that the line cares a lot about. If you then concentrate on the dynamics of that problem -- and what a solution can provide -- you can intelligently evaluate the function of various tools: neural networks, genetic algorithms, recursive partitioning, etc. Each of these has a specific footprint or characteristic in terms of what sort of solution can be provided. Once the business needs are explicitly understood, certain tools and approaches will rule themselves out while others will suggest themselves.

"A classic example is neural networks, where the structure of the final decision model is often considered hard to interpret as compared to, say, a pattern coming out of CART, a rule-tree generating algorithm. On the other hand, neural networks give better fitting of complex surfaces, whereas a CART tree would have to be extremely complex to map such surfaces. There are trade-offs that must be made, and the key is deciding what is important for a particular business application. Note that two people could attack the same problem but nevertheless need very different things on the business side! The ultimate solutions required will dictate different approaches.

"Above all, though, firms need to begin to explore these technologies, if they haven't already, and understand how they fit into their business strategies. It can be very hard to play catch-up in this arena."

Background:

Roger Stein is a Vice President, Senior Credit Officer, Quantitative Analytics and Knowledge-Based Systems at Moody’s Investors Service in New York. He has been working for Moody’s in the field of data mining, mathematical modeling, applied AI, and stochastic simulation since 1989.

In the past nine years, Stein has developed and deployed dozens of models and systems that use applied technologies including fuzzy logic, genetic algorithms, neural networks, etc. as well as models that use standard and more traditional statistical methods. His research has spanned fields from credit and finance to operations research.

In addition to work with learning systems, Stein also spent several years as a rating analyst in Moody’s Structured Finance group where he rated various types of asset backed securities. He was instrumental in developing several analytic methodologies to quantify the risks associated with new types of structured instruments.

Stein is a frequent invited lecturer and instructor at the NYU Stern School of Business, and has also spoken at The Wharton School and The Santa Fe Institute. Along with Vasant Dhar, he is the author of Seven Methods for Transforming Corporate Data Into Business Intelligence (1997, Prentice-Hall) which focuses on the application of intelligent methods to business problems.

 


 

Keys to the Commercial Success of Data Mining

Michel Adar and Nicolas Bonnet

DataMind Corporation
2121 South El Camino Real, Suite 1200
San Mateo, CA 94403
madar@datamindcorp.com, nicolas.bonnet@datamindcorp.com

What’s in it for me - the business person?

Awesome data mining tools, fantastic algorithms, rapidly converging neural networks, highly accurate classification methods, clustering methodologies, etc. are neat and useful tools for the knowledge discovery professional, but they are far from demonstrating significant value to the business person.

The key to the commercial success of data mining lies in providing true value to the business person in a form that can be used and understood by the business community.

We present here several of the most important aspects of how the DataCruncher was designed to accomplish the business user's goals.

Recognize who the customer is

To be commercially successful the first thing to realize is that the customer is not your fellow scientist. The customer is not the statistical analyst and the customer is not the mathematician. Sure, you can sell tools to all of them, but in order to be a commercial success you have to sell to the business person.

Then you need to realize that business people spend their money on tools that help solve specific business problems. So your tools need to demonstrate that they are useful in business situations and that they have a visible impact on the business.

Speak Business

The tools have to make themselves comprehensible to the business users. The language used has to be simple and business oriented. Results should be explained in terms that are comprehensible to the business user.

It is important to note that this is not just a matter of replacing a statistical concept with the English sentence that describes it. Rather, it means communicating with the user in terms and concepts that are familiar to the user.

Task and Goal Oriented

The business user does not want to create a neural network, a decision tree or an Agent Network. The user wants to solve a specific problem, to find a specific answer. The user wants to know into what segments it makes sense to divide the customers. The user wants to know what customers are most likely to churn. The user wants to know how likely a specific customer is to respond to a product currently being promoted.

The data mining tools’ user interface should reflect the problems that the user is trying to solve. The specific approach used in the DataCruncher to attack this issue is the concept of Assistants. Assistants somewhat resemble wizards in the sense that they guide the user through a set of steps, but they are more complete than wizards. They provide random access to the steps and they follow the user through the whole process, always being there, accessible and well documented. They guide the user through the mining process. The assistants also allow the user to step outside the assistant and do things using the full flexibility of the tools when necessary, and then go back to the assistant. The DataCruncher assistants are customizable to many different business situations through the use of a scripting language.

Bridge the gap between analysis and deployment

Data mining models are developed for a purpose. The data mining tools should help the user apply the model to that purpose. For example, if the model is developed with the goal of identifying the best customers for a mailing campaign, then the model should be available where the mailing list is built. Approaches to solving this issue include providing APIs that enable other applications to make use of the model, adding mailing list building capabilities to the data mining tool itself, or integrating the data mining pieces into a mailing list generation application.

Another approach is to provide access to the data mining models through a service-oriented interface, where the models are published to a centralized server and can then be used by any application wanting to evaluate specific models against specific records. For example, several models may be built to determine customer segmentation, likelihood to churn, customer value, cross-selling opportunities, etc. These models can then be made available through the server, and any number of applications can apply the models to different customers by sending the appropriate messages to the server. For example, a mailing list building application may consult a model to score the likelihood that customers will respond to the mailer.
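
The publish-then-score pattern described above can be sketched in a few lines of Python. This is an in-process stand-in for illustration only: the `ModelServer` class, its `publish` and `score` methods, the model names and the toy scoring functions are all hypothetical, and a real deployment would sit behind a network interface rather than a local object.

```python
class ModelServer:
    """In-process stand-in for a centralized model-scoring service."""

    def __init__(self):
        self._models = {}

    def publish(self, name, scorer):
        """Register a scoring function under a model name."""
        self._models[name] = scorer

    def score(self, name, record):
        """Apply the named model to one record and return its score."""
        try:
            scorer = self._models[name]
        except KeyError:
            raise KeyError(f"no model published under {name!r}") from None
        return scorer(record)


server = ModelServer()
# Toy scoring functions; real models would come from the mining tool.
server.publish("churn", lambda r: 0.8 if r["months_inactive"] >= 3 else 0.1)
server.publish("mailer_response", lambda r: min(1.0, r["past_responses"] / 10))

# Any client application can now score a record by model name.
customer = {"months_inactive": 4, "past_responses": 2}
churn_score = server.score("churn", customer)
response_score = server.score("mailer_response", customer)
```

The point of the design is that client applications depend only on a model name and a record layout, so a model can be rebuilt and republished without changing the applications that consult it.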

Combine explicit with implicit knowledge

An important aspect of bridging the deployment gap is to understand that data mining models alone cannot make decisions. The models represent the implicit or learned knowledge. Model results have to be filtered through business rules, which represent the explicit knowledge, before they are put to work. These business rules may contain overrides, additional targeting criteria, geographic or time restrictions, etc. For example, a company that sells video cassettes may want to avoid offering an R-rated film to a customer who is a minor, even if the cross-selling model says that the customer’s profile indicates that this is a good title to offer. Another example may be a churn avoidance campaign targeted at the residents of California: even if the data mining model indicates that a customer is about to churn, the offer should not be made if the customer does not live in California.

An additional advantage of combining business rules with the data mining models is that the same rules and models can be used at the many different points where a decision is made. For example, a marketing campaign aimed at attracting new customers may use several models, such as customer segmentation, value and likelihood to accept the offer, combined with some business logic to decide whether the offer should be made. If these models are deployed to a central server together with the business logic, then the same selection criteria can be used at the many points of contact between the company and the customer: for example, the company’s call center, the mailing of the next bill, or the customer’s visit to the company’s web pages.
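
The two examples in this section (no R-rated titles for minors, and a campaign restricted to California residents) can be expressed as explicit rules that veto a model's recommendation. The following Python sketch assumes a simple representation that is not taken from any product: each rule is a function over a context dictionary, the field names are hypothetical, and the 0.5 score threshold is an arbitrary illustrative default.

```python
def no_r_rated_titles_for_minors(ctx):
    """Explicit rule: never offer an R-rated title to a customer under 18."""
    return not (ctx["title_rating"] == "R" and ctx["age"] < 18)


def california_residents_only(ctx):
    """Explicit rule: this campaign is restricted to California residents."""
    return ctx["state"] == "CA"


def decide_offer(model_score, ctx, rules, threshold=0.5):
    """Offer only if the score clears the threshold and no rule vetoes it."""
    return model_score >= threshold and all(rule(ctx) for rule in rules)


# The cross-selling model likes the title, but the customer is a minor.
minor = {"age": 15, "state": "CA", "title_rating": "R"}
offer_film = decide_offer(0.9, minor, [no_r_rated_titles_for_minors])

# The churn model flags the customer, but the campaign is California-only.
texan = {"age": 40, "state": "TX", "title_rating": "PG"}
offer_retention = decide_offer(0.9, texan, [california_residents_only])

# Same rule, California resident: the offer goes ahead.
californian = {"age": 40, "state": "CA", "title_rating": "PG"}
offer_retention_ca = decide_offer(0.9, californian, [california_residents_only])
```

Keeping the rules separate from the model means the same `decide_offer` call can be made from a call center application, a billing run, or a web page, with identical results.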

Make it responsive and easy to change

Business situations change very rapidly. It is very important for the business user to be able to react quickly to changing business conditions. In today’s competitive world it is not acceptable for the answer to a question to be delivered three months after it was asked. For example, a marketing person developing a promotion may be interested in modeling the customer’s behavior to fine-tune the targeting. It is important for this model to be available very soon. In addition, the whole package that includes the several data mining models and business logic should be readily available and easy to modify to adapt to the necessary changes in the promotion.

Decision Delivery Systems

Decision Delivery Systems are designed as a vehicle for bringing decisions to different applications. These systems can typically combine the results of different data mining models with business logic to generate the decisions. As a centralized facility they provide a focused point for the deployment of models and knowledge, helping bridge the gap between the development of useful data mining models and putting them to work.

 


 

Skills and tasks of a data mining practitioner: A report from the trenches

Tej Anand

Golden Books Family Entertainment
TAnand@goldenbooks.com

Previously I have stressed the importance of the knowledge discovery process as opposed to the data mining algorithm. Initially I thought that the development of data mining algorithms was somehow removed from the understanding and documentation of the knowledge discovery process. I now believe that it is not the process that needs documentation; it is the algorithms that need documentation, with the purpose of understanding their role in the knowledge discovery process. It is possible that this documentation and subsequent analysis will lead to the discovery that some of our most popular data mining algorithms need to be modified so that they fit into the knowledge discovery process.

In this position paper I will discuss the skills that a data mining practitioner who works for a mid-sized (less than $500 million in annual revenue) non-technology commercial business is likely to have. I will then discuss some of the tasks that these practitioners are expected to perform. Finally, I will describe how a popular class of data mining algorithms was augmented in one tool to support one of these tasks and conjecture on how we should modify/augment other data mining algorithms to support the remaining tasks.

Skills – In my experience a large number of data mining practitioners have finance or marketing as their core skills. These people are usually creative and understand the semantics behind business numbers. On average these individuals are adept personal computer users for the purposes of (1) sorting, averaging, taking percentiles, joining (as in database tables) and categorizing numbers; (2) presenting numbers as creatively formatted tables and charts and (3) providing concise summaries of all of the numbers in a narrative. Please note that these individuals have very few skills that we would consider core statistical or database skills. Individuals with statistical skills usually play supporting technical roles and are removed from the business knowledge that data mining practitioners need. My guess is that there are approximately 50 data mining practitioners in a mid-sized company without any core statistical skills and approximately 5 with statistical skills playing support roles.

Tasks – Most of the tasks being performed by data mining practitioners can be described as follows.

1. Identification of "root causes" for increases or decreases in sales revenue and/or costs when compared to some base.

2. Forecasting of sales revenue and/or costs for existing and new products/services.

3. Analysis of trends associated with the market, customers and competitors.

4. Continuous classification/categorization of the company’s business.

All four types of analysis very rarely include the explicit development of analytical models. I will refer to the above four tasks as "strategic analysis".

Current Situation – I believe that most existing data mining algorithms and products are focused on the development of analytic models. In my opinion the development of such models requires core statistical expertise even if one is using algorithms developed by the machine learning community. Incidentally, just because a product has an easy-to-use graphical interface, it does not automatically become usable by the non-technical community. For true ease of use, the non-technical community should easily understand the semantics of the task that a software product requires its users to carry out. Also, analytic models are often focused on narrow areas of the business, and their application is at the operational end of the business, not the strategic end. The current situation has three consequences:

(1) We narrow the data mining market to the practitioners with statistical skills.

(2) We lose the opportunity to increase the quality of strategic analyses being conducted by businesses.

(3) We lose the opportunity to show quick cost savings in terms of personnel reduction and process improvements.

How can the current situation be improved? What has got me excited and intrigued is the realization that the analytic process that data mining practitioners involved in strategic analysis go through is no different from the process that we go through in the development of analytic models. In the past we have referred to this process as the knowledge discovery process. So if we can focus on embedding algorithms without altering the semantics of the analytic process, we will greatly increase the quality of the strategic analysis being produced by businesses.

For example, let us look at a data mining product called Forecast Pro. Forecast Pro has embedded sophisticated exponential smoothing algorithms within the forecasting process. When using Forecast Pro the analyst is engaged in tasks that are no different than if exponential smoothing algorithms were not being used. Forecast Pro helps the analyst identify whether the forecast being created deals with seasonality, cycles or exceptional market conditions. Forecast Pro then selects the best algorithm, creates the forecast and visually allows the analyst to understand the accuracy of the forecast. Most of the technical jargon and steps are not visible to the analyst. Obviously Forecast Pro does not produce a forecast that is as good as that produced by a custom analytic model, but it does far better than a forecast that the analyst would have produced based on intuition without the use of any data mining algorithm. Forecast Pro is designed in such a way that I feel comfortable giving it to someone without any core statistical skills. The immediate impact of using this tool is that the business has a better forecast, the forecast is a lot less cumbersome to generate, and 1 analyst instead of 3 can generate the forecast. Of course, I would not use Forecast Pro in an "operational" environment for producing detailed forecasts to drive logistics. There I would prefer a custom analytic model.
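To make the kind of algorithm being embedded concrete, here is a minimal sketch of simple exponential smoothing, the most basic member of the exponential smoothing family. It is not Forecast Pro's implementation, and it ignores seasonality and cycles; the function name, the smoothing constant alpha and the sales figures are assumptions chosen for illustration.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed series.

    Each smoothed value is a weighted average of the newest observation
    and the previous smoothed value. The last smoothed value can serve as
    a one-step-ahead forecast.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    smoothed = [series[0]]  # start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly sales figures.
monthly_sales = [100.0, 110.0, 105.0, 120.0]
smoothed = exponential_smoothing(monthly_sales, alpha=0.5)
forecast = smoothed[-1]
print(forecast)  # 112.5
```

The analyst in the scenario above would never see alpha or the recursion; a tool would choose them and show only the forecast and its accuracy.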

If you agree with my analysis above then we should start working on embedding existing data mining algorithms within the analytic processes of the various tasks listed above. Induction or regression algorithms can be modified to help analysts within the "root cause" analysis process. The goal here would not be to develop precise causal models but to produce causal analysis that is more accurate and insightful than what is currently being produced, and to produce it faster and cheaper. Similarly, clustering algorithms can be modified to help analysts within the continuous classification/categorization process. Finally, time-series and sequence analysis algorithms can be modified to help analysts with the trend analysis process. My position is that if we work on embedding data mining algorithms within the analytic process we will discover the true reason for the research dedicated towards integrating data mining algorithms with database management systems, the development of knowledge discovery process models, the development of "hybrid" data mining algorithms and the increasing use of data visualization. I also believe that the more appropriate accuracy and performance comparisons are not among data mining algorithms but between an easy-to-use data mining algorithm and no algorithm at all. Finally, I believe that work on embedding data mining algorithms within the analytic process will lead to the creation of a community of experts in the analytic process. Currently I think we are a community of algorithm (or tool) experts. Imagine a carpenter who only knows how to use a saw!

 


 

Intelligent Information Delivery: When too Much Knowledge is a Dangerous Thing

Judy Bayer, Ph.D.

Vice President, Analytic Solutions
Ceres Integrated Solutions
jbayer@ceresios.com

Introduction

Recently, users of advanced information systems have begun to realize the value of incorporating automated alerts into their systems. Automated alerts are analytical agents that are designed to automatically find managerially interesting and important information in a database. The agents operate without user intervention, but report important information back to users whenever critical events are found in the database.

Alerts can be a powerful analytical tool to keep managers informed as to critical problems and important business opportunities. All it takes, it seems, is having the correct underlying sources of data for the alerts to operate on and then creating the appropriate set of alerts. The problem with automated alerts is that the volume of information automatically returned to the user can quickly become overwhelming. Analysis is easy. Knowledge is hard. A vital component in the development of knowledge is the recognition that an event is something that is important for the recipient to know about; in fact, that it is more important to know about than other significant events.

In this paper, we examine the implications of "alerts run rampant" on the ability of the alerts system to provide actionable knowledge to the organization. We then provide a simple example of an Intelligent Information Delivery (IID) mechanism that functions as a meta-layer to the alerts system. The IID layer evaluates the importance and criticality of alert-created information across all alerts in the system. It then decides on the disposition of specific pieces of information. Finally, we describe how the IID layer can be used as a mechanism to derive knowledge out of analyzed information from data mining systems, in general.

Alerts Run Rampant

Data mining systems, in general, are geared towards the analysis of vast amounts of data, and are designed to produce large quantities of analyzed information that, essentially, have to be sifted through and analyzed before they become useful as business decision making aids. This fact can become a critical issue when applied to automated alert systems. These systems are designed to perform data mining automatically and continuously. An example from the consumer packaged goods (CPG) industry will show the magnitude of the problem.

The CPG industry has, for many years, had the availability of rich sources of data. For most grocery products, vendors such as A.C. Nielsen and IRI sell sales scanner data that tracks all competitive products in a category, by UPC (the individual product, the lowest level information that manufacturers track for sales purposes), in each of fifty or more markets. A typical category can have 1,200 or more UPCs in each market. Most packaged goods manufacturers receive updates weekly. This means that a single alert measure can be tracking 60,000 possible events each week.

There are, however, many more than a single important alert measure that packaged goods manufacturers need to track. Some key alert measures for the packaged goods industry include short term market share changes for all UPCs in the market, trends in market share changes, introductions of new competitive items, competitor price changes, and changes in competitive promotional activity. Competitive activity is inferred by observing such things as retailer promotion pricing actions, increased levels of distribution for a competitor’s UPCs, and retailer promotions, such as increases in point of purchase displays, major ads and coupon activity. Since packaged goods marketers typically micro-market, each UPC has to be tracked, by market, for each alert measure.

There can easily be hundreds of thousands (or many more) events being tracked automatically. Because the CPG marketing environment is highly competitive and dynamic, there can easily be thousands of events that set off trigger conditions to alert a user. The situation gets even more overwhelming when we consider the fact that advanced marketing analysis systems in the CPG industry sometimes also embed sophisticated data mining technology that automatically analyzes causal factors associated with some alert conditions. The ensuing report, then, includes not only alert information, but also details of an analysis. The resulting information overload can create a situation in which the user either has to spend all of his or her time reviewing the results of alerts or ends up just ignoring the output of the alerts system.
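The arithmetic behind these volumes can be checked in a few lines. The market and UPC counts are the lower bounds quoted above; the count of five alert measures is an assumption taken from the list of key measures, used here only to show how quickly the total grows.

```python
MARKETS = 50             # "fifty or more markets"
UPCS_PER_MARKET = 1200   # "1,200 or more UPCs in each market"
ALERT_MEASURES = 5       # assumption: the five key measures listed above

# Each (UPC, market) pair is one possible event for a single alert measure.
events_per_measure = MARKETS * UPCS_PER_MARKET
# Every measure is tracked for every UPC in every market, every week.
events_per_week = events_per_measure * ALERT_MEASURES

print(events_per_measure)  # 60000
print(events_per_week)     # 300000
```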

What an Intelligent Information Delivery System is

An Intelligent Information Delivery system is essentially a knowledge-based system that:

  1. monitors and intercepts all outputs from the alert system,
  2. performs some analyses on the set of alerts that evaluate alert output based on the totality of what is known, and
  3. applies business rules to the output of the process to determine which outputs from the alert system are critically important for a user to know about.

The IID system functions as a meta-analysis layer for the alerts system. It evaluates alert-created information across all alerts in the system. Based on results of analysis and the rules contained in the system, it decides on the relative importance of the various alerts. The IID system also decides who, that is, which users, should receive specific pieces of information. The IID knowledge base contains rules related to managerial objectives that guide the selection of output for individual users. Development of this knowledge base is based on conducting knowledge engineering sessions with key business users to determine specific business rules to incorporate in the system. Actual application to individual users is based on creating user settings stored in a database table and accessed by the meta-analysis layer.

Simple Example of an Intelligent Information Delivery Mechanism

As an illustration, we provide a simple example of an IID system. The system has all three components: 1) an alert monitor, 2) meta-analysis capabilities, and 3) a business rule knowledge base. The IID system is designed to support a consumer packaged goods alert system.

Alert Monitor

The alert system in the example polls the database and performs its analyses weekly to coincide with database updates based on marketplace scanner data purchased from IRI or A.C. Nielsen. The focus of the system is on information contained in this data. The IID Alert Monitor intercepts all alerts that are in its domain of knowledge. No alerts are passed onto users at this time. The Monitor holds the alert information until all the alerts have finished processing the updated information in the database.

Meta-Analysis Layer

The Meta-Analysis Layer synthesizes results of the alerts process and performs further analysis. For example, it will do cross-market analysis of alerts to discover whether or not an alert condition is specific to a single market, or whether it reflects a more general condition. It will also check if the alert is a one-time occurrence or whether there has been a pattern of these conditions over time.

The Meta-Analysis Layer also makes an assessment of the overall volatility in the marketplace. Highly volatile markets can be expected to have many fluctuations in market share, retailer promotional activity, and competitor product introductions. After this assessment, some alerts that would be considered significant in a non-volatile market may no longer be important enough to report.
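The cross-market and over-time checks described above can be sketched as a simple grouping step over the alerts held by the Alert Monitor. This is only an illustration under stated assumptions: the Alert record, its field names, the summarize function and the sample data are all hypothetical, and the volatility assessment is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """One raw alert from the alert system; field names are hypothetical."""
    upc: str
    market: str
    measure: str
    week: int


def summarize(alerts, total_markets, current_week):
    """Describe each (UPC, measure) alerted this week.

    market_fraction: share of markets in which it fired this week
                     (single-market versus general condition).
    weeks_seen:      number of distinct weeks in which it has fired
                     (one-time occurrence versus pattern over time).
    """
    markets_now = defaultdict(set)
    weeks = defaultdict(set)
    for alert in alerts:
        key = (alert.upc, alert.measure)
        weeks[key].add(alert.week)
        if alert.week == current_week:
            markets_now[key].add(alert.market)
    return {
        key: {
            "market_fraction": len(hit) / total_markets,
            "weeks_seen": len(weeks[key]),
        }
        for key, hit in markets_now.items()
    }


# Hypothetical alerts: UPC "A" loses share in two markets and also did so
# last week; UPC "B" shows a price cut in one market only.
held_alerts = [
    Alert("A", "New York", "share_drop", 9),
    Alert("A", "New York", "share_drop", 10),
    Alert("A", "Boston", "share_drop", 10),
    Alert("B", "Chicago", "price_cut", 10),
]
summary = summarize(held_alerts, total_markets=4, current_week=10)
```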

Business Rules Knowledge Base

The Business Rules Knowledge Base contains rules developed by conducting in-depth interviews with key business managers in the organization responsible for taking action based on the results of the alert system. The business rules are mapped against the meta-analyzed alerts to determine which alerts are really important to know about, and who receives which alerts.

Consumer packaged goods marketers often focus on Brand Development Index (BDI) and Category Development Index (CDI) measures when running their business. BDI ranks markets as to the strength of the brand in that market. Markets where the brand has a high market share, high BDI markets, are ranked ahead of markets where the brand has a low market share. CDI ranks markets as to the strength of the overall category in the market. Markets where category sales are high (high CDI markets) are ranked ahead of markets where category sales are low.

Brand strategies often incorporate the relative importance of these measures and how to use them. For example, a brand strategy that focuses on increasing market share may often focus on high opportunity markets – those with high CDI, but low BDI. A brand strategy that focuses on maintaining current brand strength may focus on high BDI markets. An important element, then, of the business rules knowledge base may be the incorporation of rules related to BDI and CDI.

An emphasis on BDI and CDI could lead to the following rules:

Rule 1:

IF Brand Strategy is to focus on High Opportunity Markets
THEN Alerts should be ranked by the CDI of the market they relate to

Rule 2:

IF Brand Strategy is to focus on High Brand Strength Markets
THEN Alerts should be ranked by the BDI of the market they relate to

There will be other rules relating to other strategies that incorporate additional factors.

Other rules in the knowledge base may relate to results of cross-market analysis, prioritization of negative information about the marketer's brand, prioritization of positive information about competitors’ brands, priority given to trends versus one-time events, and thresholds related to when to consider competitive activity important.

Rule 3:

IF Cross-market analysis shows an overall strong pattern
THEN This is an important alert

Rule 4:

IF UPC is for OUR Brand
AND There is a downward trend in market share
THEN This is an important alert

Rule 5:

IF UPC is for Key Competitors’ Brand
AND There has been a Highly Significant increase in market share
THEN This is an important alert

Rule 6:

IF Market share change for a UPC is > twice the average Market Share change
THEN This is a Highly Significant increase in market share

Rule 7:

IF UPC is for Key Competitors’ Brand
AND There has been at least a three month trend in price decreases
THEN This is an important alert

The above is just a small sample of the business rules knowledge base that would be developed for even a simple Intelligent Information Delivery system. However, even a simple IID system can reduce the volume of output of alerts from hundreds of pages containing thousands of analyses to just the few most important findings.
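As a rough illustration of how such rules might be mapped against meta-analyzed alerts, the sketch below encodes Rules 1 to 7 as plain Python functions. It is not the system described in this paper: the MetaAlert record, its field names, the strategy labels and all sample values are hypothetical, and a real knowledge base would hold many more rules and per-user settings.

```python
from dataclasses import dataclass


@dataclass
class MetaAlert:
    """One alert after meta-analysis; all field names are hypothetical."""
    upc: str
    brand: str                   # "ours", "key_competitor" or "other"
    market: str
    share_change: float          # change in market share, in share points
    downward_share_trend: bool
    months_of_price_decreases: int
    strong_cross_market_pattern: bool


def is_highly_significant(alert, average_share_change):
    # Rule 6: change is more than twice the average market share change.
    return alert.share_change > 2 * average_share_change


def is_important(alert, average_share_change):
    if alert.strong_cross_market_pattern:                       # Rule 3
        return True
    if alert.brand == "ours" and alert.downward_share_trend:    # Rule 4
        return True
    if alert.brand == "key_competitor":
        if is_highly_significant(alert, average_share_change):  # Rule 5
            return True
        if alert.months_of_price_decreases >= 3:                # Rule 7
            return True
    return False


def select_alerts(alerts, average_share_change, strategy, cdi, bdi):
    """Keep important alerts (Rules 3-7), then rank them (Rules 1-2)."""
    if strategy == "high_opportunity":        # Rule 1: rank by CDI
        index = cdi
    elif strategy == "high_brand_strength":   # Rule 2: rank by BDI
        index = bdi
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    important = [a for a in alerts if is_important(a, average_share_change)]
    return sorted(important, key=lambda a: index[a.market], reverse=True)


# Hypothetical alerts and made-up index values per market.
alerts = [
    MetaAlert("111", "ours", "Boston", -0.4, True, 0, False),
    MetaAlert("222", "key_competitor", "Chicago", 1.5, False, 0, False),
    MetaAlert("333", "other", "Denver", 0.2, False, 0, False),
    MetaAlert("444", "key_competitor", "Denver", 0.1, False, 3, False),
]
cdi = {"Boston": 90, "Chicago": 120, "Denver": 105}
bdi = {"Boston": 130, "Chicago": 80, "Denver": 95}

by_opportunity = select_alerts(alerts, 0.5, "high_opportunity", cdi, bdi)
by_strength = select_alerts(alerts, 0.5, "high_brand_strength", cdi, bdi)
print([a.upc for a in by_opportunity])  # ['222', '444', '111']
print([a.upc for a in by_strength])     # ['111', '444', '222']
```

In this example the alert for UPC 333 is dropped because no rule marks it as important, and the same three surviving alerts are delivered in a different order depending on the brand strategy.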

Conclusion

In this paper, we introduce the concept of Intelligent Information Delivery systems – systems that form a meta-layer on top of an alert, or other type of data mining system. The IID monitors the alerts produced and decides which information is most critical to bring to users’ attention. We also give a brief example of an IID system.

As data mining systems and systems of alerts that repetitively and automatically analyze information in databases become more prevalent, the problem of what to do with all the answers that come out will become increasingly important. The alternative is that over time, users of these systems will find that the more analysis they receive, the less they end up knowing.

Background:

Judy Bayer is Vice President, Analytic Solutions for Ceres Integrated Solutions. In that capacity, she has worked to help companies integrate marketing information into the general decision-making process. Prior to joining Ceres, Dr. Bayer taught marketing at the MBA, Ph.D., and undergraduate levels at Carnegie Mellon University and New York University, and was Vice President of Advanced Technologies at a Business Intelligence Consulting firm. Her expertise includes marketing research, business and market modeling, data mining and knowledge-based systems for managing information intensive environments, customer database marketing, and technology adoption. She has worked with leading companies in the packaged goods, computer, retail, insurance, financial, telecommunications and defense contractor industries and has presented her work to executive groups such as The Conference Board and the Advertising Research Foundation. Recently, she has led strategic seminars in data mining concepts and products.

Dr. Bayer’s research on knowledge-based systems has been widely cited in books and articles on marketing management and advanced marketing information systems. She has authored or coauthored more than 25 professional journal publications, white papers and technical reports.

 


 

Data Mining and Visualization for Agent-Based Modeling

 

Robert N. Bernard and Alan R. Shapiro

PricewaterhouseCoopers Consulting
1301 Avenue of the Americas
New York, NY 10019-6013
robert.bernard@us.pwcglobal.com, alan.shapiro@us.pwcglobal.com

Introduction

PricewaterhouseCoopers is a global accounting and management consulting organization that has seen a steady increase in its data mining practice over the past three years. Over 65 employees in the U.S. practice work full-time exclusively on the application and development of data mining techniques. The areas of application include demand forecasting, supply chain management, market segmentation, customer lifetime profitability estimation, trading surveillance, detection of opportunities for cross-selling, and fraud detection, to name a few. Data mining at PricewaterhouseCoopers is viewed as an analytic process for defining and meeting clients’ information needs rather than as simply a set of techniques.

In this paper, we delve into the work of a particular group at PricewaterhouseCoopers Consulting, the Emergent Solutions Group (ESG). ESG provides forecasts of customer demand in a variety of industries for clients who are interested in near real-time decision support. ESG forecasts customer demand primarily through a technique called adaptive agent-based simulation modeling. Instead of standard numerical forecasting techniques (e.g., regression, ARIMA), ESG’s adaptive agent-based simulation modeling attempts to replicate decision processes and interactions of actual consumers in an environment. For instance, during the simulation of the activities that occur in a retail store during a day, we would model each consumer that enters the store, what the consumer thinks about while browsing the store, what kind of information consumers might exchange with each other and with the (simulated) store clerks, and the contents of and location at which a transaction took place.

Data Mining for Agent-Based Modeling

PricewaterhouseCoopers Consulting Emergent Solutions Group uses data mining for agent-based modeling in two separate contexts: to imbue its agents with realistic knowledge and to extract information from clients’ data and from our own simulations. First, we are exploring text mining as a method of extracting realistic behaviors for our agents. Second, we have developed several methods and practices of high quality visualizations of the results and processes in agent-based modeling.

Agents need realistic behaviors in order to be useful in forecasting customer demand. We are exploring the use of text mining as a technique in enhancing the complexity and realism of our agent models. It has become increasingly clear that purely numeric data may not contain all the essential details needed for modeling human behavior. Textual data that describes unusual circumstances, or that gives insight into reasons why actions were taken, clearly contains meaningful information not to be found in a simple number. If there is excessive reduction for purposes of numerical manipulation, the information crucial for explanation may no longer still be in the analysis. A variety of approaches, from full natural language processing to simple identification of noun phrases, have been used for extracting information from text. Approaches which rely on the identification and exploitation of restricted sublanguages, together with limitations on the types of information processed, have produced useable, albeit limited, results (Grishman, 1997). Using such an approach, automated text analysis has been combined with techniques such as association analysis and rule induction to explore text-containing databases for new insights (Shapiro, 1983). More recently, the most productive approach we have found has been to create an environment in which extensive processing is used to complement human context and pattern recognition capabilities.

The ability to extract reliable qualitative information from written text (such as interviews or transcripts) is exceedingly valuable in imbuing ESG’s agents with realistic behaviors. Decision processes of actual people that may have gone unaccounted for or unnoticed by virtue of using merely traditional large-scale survey techniques, can now be captured and utilized in an agent-based simulation model. Of course, ESG also uses traditional survey techniques to gauge the demographic characteristics, declarative knowledge, interactions between one another, and other cross sectional properties of the agents. Combining the dynamic behavioral data garnered from the text mining process and the static characteristics from traditional survey techniques provides a rich source of data that eventually results in forecasts that are more accurate than those provided by conventional numeric techniques.

Once the agents have obtained real characteristics and behaviors, we run them in our simulations; as stated above, many of these simulations are forecasts of consumer demand. These simulations use our IceCore™ technology, which allows seamless communication between the simulation code and a relational database. These simulations serve three purposes: first, they allow the results of the simulations to be interpreted as forecasts of consumer demand; second, the data generated by these simulations can be mined (a la Stein and Bernard, 1998), both with standard mining techniques and visually, to see if any interesting patterns exist; and third, the simulation can be examined visually while it is running, to see if any interesting patterns emerge.

As mentioned above, data visualization of simulation results and client data is a key component of much of ESG’s work. Using the graphics capabilities of Silicon Graphics workstations, ESG can animate high dimensional data so that clients can see the simulation develop over time. The ability to explore, discover, and portray patterns visually appeals to some clients’ non-quantitative inclinations, and frequently provides a more holistic and gestalt understanding of the hidden messages in the data than presenting simple rules or single numerical answers.

Visually mining forecasts done through agent-based simulation can also provide insight faster than a prose document can. By visually observing several different variables over time, clients can better grasp the intricacies of the dynamic nature of many consumer markets. Furthermore, clients are also able to peer into the nature of the decision processes of individual agents (i.e., the synthetic consumers) and obtain an intuitive explanation as to why the agent purchased or did not purchase a particular product. Intuitive explanations are not available from the results of a neural net, for instance.

Finally, to allow clients to view a simulation as it progresses, ESG has developed a non-commercial, public domain protocol for three-dimensional visualization of agent-based simulation models, the Remote Simulation Visualization Protocol, a.k.a. RSVP (Borges and Sigvaldason, 1998). RSVP consists of a series of simple commands that attach to a programming language such as C++. These commands control the movement and display of agents in a three-dimensional simulated environment. Clients can see the movement of agents in an environment as the simulation progresses. Thus, they have the unusual ability to peer into the behavior of a world and use a very valuable data mining tool that we sometimes overlook – the human brain.

References

Grishman, R. 1997. Information extraction: techniques and challenges. In M. Pazienza (ed.), Information Extraction. Berlin: Springer-Verlag, 10-27.

Shapiro, A.R. 1983. Exploratory analysis of the medical record. Medical Informatics (Special issue -- New methods for the analysis of clinical data), 8(3), 163-171.

Borges, B. and T. Sigvaldason. 1998. Bar stool theorizing: on the validity of economic signals in bounded rational worlds. Paper presented at A-LIFE 6. Los Angeles, CA. June 1998.

Stein, R. M., and R. N. Bernard. 1998. Data mining the future: genetic discovery of good trading rules in agent-based financial market simulations. Proceedings of the IEEE/IAFE/INFORMS 1998 Conference on Computational Intelligence for Financial Engineering (CIFEr): 171-179.

Background:

Rob Bernard is a Senior Associate in, and the leader of the New York office of PricewaterhouseCoopers Consulting's Emergent Solutions Group (ESG). Rob specializes in statistical and qualitative analysis of ESG's forecasting capabilities. In addition, he develops adaptive agent-based simulations for governmental policy makers combining federal, state, and local data sources. Rob is currently finishing his Ph.D. in Urban Planning and Policy Development at Rutgers University.

Alan R. Shapiro is in the Business Intelligence Practice at PricewaterhouseCoopers with a primary focus on data and text mining. Dr. Shapiro trained in multivariate statistics at the University of North Carolina at Chapel Hill and then in statistical pattern recognition and adaptive signal processing at Stanford University. He taught applications of statistical pattern recognition as a professor in the Department of Mathematics at the University of California, San Diego and at the Medical University of South Carolina. Over the past twenty years, Dr. Shapiro has directed the development of multiple analytic database systems in medicine and finance. His current research interests involve methods for the analysis and visualization of the information contained in text.

 


 

Business focus on data engineering

Wray Buntine

Ultimode Systems
wray@ultimode.com

Data mining has emerged this decade as a key technology for areas such as business intelligence, marketing, and so forth. For the purposes of discussion, application and business domains I will consider here include telecommunications, medical devices, space science (vehicle health management and scientific instrumentation), targeted marketing, and mining.

From a technical view, I don't consider data mining to be a new field, but rather another discipline in the lengthy history of engineering sciences that use data as a core focus for developing knowledge. This family of disciplines I'll consider here under the term "data engineering" (see our company position at http://www.ultimode.com/papers/data.html).

Some traditional and non-traditional examples follow.

Data engineers work with physicists in analyzing spectral data measured from a high-resolution imaging spectrometer and develop sophisticated models of the spectrometer's complex error modalities (registration, response function, calibration, measurement glitches) so that a high-fidelity model of the spectrometer's measurements can be developed.

Data engineers investigating the performance of an industrial-strength place-and-route package uncover useful characteristics of the optimization process and thereby improve the performance of the algorithm.

Data engineers work with astronomers in analyzing infra-red data from an electronic star catalogue. The analysis, in concert with the astronomers' interpretations, reveals new, publishable classes of stars and also uncovers troublesome, never-before-recognized artifacts of the original instrument.

Data engineers in a large corporation investigate the bad debts database and uncover useful patterns in selecting targets for debt recovery, thereby dramatically improving the corporation's debt recovery.

At the time of the development, the individuals performing these tasks may have considered themselves applied machine learning researchers, decision analysts, statisticians, or neural network researchers; however, they were all performing data engineering. You may also have heard of the terms data mining and knowledge discovery, exploratory data analysis, intelligent data analysis, and so forth. These areas perform similar tasks but have a particular emphasis that distinguishes their origins, whether it be the applications they serve or the algorithms for data analysis that they use. Data engineering is inherently a multi-disciplinary field, because of the number of technologies involved: visualization, data analysis, knowledge engineering, perhaps databases, and of course the subject matter of the application.

So there we have the technical background of the community, and some idea of the range of applications. What are the business implications here? A number of factors have emerged in our consulting work that are beginning to give me a better understanding of the business nature of the discipline. First, the community has a number of different focuses.

Our experiences in this third focus present an interesting conundrum for the business manager. We find that in this third focus, there is a big difference between the results of the "average" practitioner and the "quality" practitioner. Every good software manager would know that a really good programmer can produce 100 times more code than an average programmer, partly due to the net result of subsequent maintenance, reduction in overhead and systems validation, and so forth. We find the same with data engineering. Except with data engineering, we find there are a few key insights made in a project that make all the difference. Mundane use of the "usual tools" in the "usual manner" by the average practitioner gets you so far. But a big difference in performance is gained by the quality practitioner who makes a few key insights to change the project.

I will give one technical example. For our mining and targeted marketing clients, for instance, we are under NDA not to disclose the details of our key discoveries. Related public-domain examples where non-trivial analysis of the data makes a key difference can be found at the Ultimode Systems Case Studies page.

The following example comes from a long term project between Ultimode Systems and NASA Marshall Space Center called OPAD. Looking at the high-resolution spectrometer data taken of the NASA space shuttle main engine, everyone thought the significant OH component of the spectrum varied significantly from engine firing to engine firing, and thus that the task of determining the subtle but significant metal lines in it would be difficult. They were wrong. To everyone's surprise, I managed to show that the instrument's irradiance calibration data was integrated over too short a period, thus producing fluctuations. Unfortunately, the manufacturer hard coded the integration period into the instrument. I also managed to show that smoothing methods (which only an experienced professional would know about) could be used to correct for the problem, and thus we now obtain lovely consistent OH spectra from engine firing to engine firing.

What are the business implications here? There are several.

Regardless of what happens to data mining as a community, we know that data engineering in one form or another will continue to remain a key enabling technology for many businesses, and thus finding the right balance between software, intellectual property, and so forth, is all part of the evolution of the industry.

 


 

Kensington Approach Towards Enterprise Data Mining

Jaturon Chattratichat, Yike Guo, Stefan Hedvall, and Martin Kohler

Data Mining Group, Imperial College Parallel Computing Centre
University of London
jc8@doc.ic.ac.uk

The Kensington system, which is being developed at the Imperial College Parallel Computing Centre at the University of London, aims to provide an enterprise solution for large-scale data mining in environments where data is logically and geographically distributed over multiple databases. Supported by an integrated visual programming environment, the system allows an analyst to explore remote databases and visually define and execute procedures that model the entire data mining process. It also provides learning algorithms, optimised for high-performance platforms, for the most common data mining tasks. Decision models generated by the system are evaluated and manipulated using powerful interactive visualisation techniques. The overall aim of the system design is to provide an integrated, flexible and powerful data mining environment as the basis for customised domain-specific applications. The main features of the system design are discussed in turn below.

Distributed database support

Today, many companies store large quantities of data in data warehouses. The data is potentially rich and useful for data mining. A data mining system should allow seamless integration of both local files and remote databases. The Kensington system enables database integration in preparation of data mining by providing remote database access via JDBC. Analysts can query and retrieve data from their remote and distributed databases across the Internet. The ability to query several remote databases concurrently means that an analyst can now efficiently combine and enrich the data for mining.
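As a rough illustration of querying several databases concurrently and combining the results, here is a minimal sketch. It is not Kensington code: Python's built-in sqlite3 module stands in for JDBC, two temporary local files stand in for remote databases, and the table name, column names and helper functions (make_demo_db, fetch, query_all) are all hypothetical.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing
from pathlib import Path


def make_demo_db(path, rows):
    """Create a small stand-in database with a hypothetical 'sales' table."""
    with closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        conn.commit()


def fetch(path):
    """Open a separate connection per database and return all its rows."""
    with closing(sqlite3.connect(path)) as conn:
        return conn.execute("SELECT customer, amount FROM sales").fetchall()


def query_all(paths):
    """Query every database concurrently and concatenate the rows."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        per_db = list(pool.map(fetch, paths))
    return [row for rows in per_db for row in rows]


with tempfile.TemporaryDirectory() as tmp:
    north = Path(tmp) / "north.db"
    south = Path(tmp) / "south.db"
    make_demo_db(north, [("alice", 10.0), ("bob", 5.0)])
    make_demo_db(south, [("alice", 2.5)])
    combined = query_all([north, south])

# Enrich the data by combining per-customer amounts across both sources.
totals = {}
for customer, amount in combined:
    totals[customer] = totals.get(customer, 0.0) + amount
```

Each worker opens its own connection, so no connection object is shared between threads.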

Distributed Object Management

The Kensington system adopts a three-tier approach based on the Enterprise JavaBeans (EJB) component architecture, to support data mining in an enterprise environment. The component-based middleware is designed to support scalability and extensibility. Application servers can be transparently distributed for scalability or replicated for increased availability. The system also supports efficient management of resources and multi-tasking capabilities. In an enterprise where resources such as databases and high-performance servers are shared, the Kensington system enables efficient resource management and scheduling.

The data mining procedures that are defined and customised with the Kensington system can be flexibly deployed in the enterprise.

Because a data mining procedure is treated as a graph of components, each of them can be scheduled to use appropriate resources. The middleware’s management strategy of the logical component and physical resources ensures that all facilities are used efficiently.
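The idea of a procedure as a graph of components can be sketched as follows. This is an assumption-laden illustration and not the Kensington middleware: the component names, the run_procedure helper and the rule that a component with no upstream dependency receives the source data are all invented for the example. Only the ordering step relies on a real library call, graphlib.TopologicalSorter from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_procedure(components, depends_on, source_data):
    """Run each component only after the components it depends on.

    components: name -> function that takes a list of upstream outputs
    depends_on: name -> set of upstream component names (empty for sources)
    """
    results = {}
    # static_order() yields every node after all of its predecessors.
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        upstream = [results[d] for d in sorted(depends_on.get(name, ()))]
        inputs = upstream if upstream else [source_data]
        results[name] = components[name](inputs)
    return results, order


# Hypothetical three-step procedure: load, clean, then count the records.
components = {
    "load": lambda inputs: inputs[0],
    "clean": lambda inputs: [r for r in inputs[0] if r is not None],
    "count": lambda inputs: len(inputs[0]),
}
depends_on = {"load": set(), "clean": {"load"}, "count": {"clean"}}

results, order = run_procedure(components, depends_on, [3, None, 5])
```

Because the execution order is derived from the graph alone, a scheduler could in principle assign each component to a different resource once its inputs are ready.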

Groupware, Security and Persistent Objects

In an enterprise where information is often shared within a workgroup, it is important that a data mining system supports the exchange of information in order to enhance productivity. Therefore, the Kensington system enables persistent storage of components so that they may be transparently shared and reused. Important information such as data, defined data mining procedures/templates, or generated decision models is managed as persistent objects, which can easily be exchanged between group members. The system provides strong security for data transfer and model distribution through secure socket communications. Access control mechanisms protect a user’s or group’s private resources from unauthorised access.

Universal clients - user friendly data mining

For maximum flexibility and easy deployment, client tools are Java applets that run securely in Web browsers anywhere on the Internet. A data analyst is therefore not bound to any specific location or computer.

Effective human-computer interaction is a strong feature of the Kensington system. Based on the visual programming paradigm, the Kensington client provides an integrated workspace for the visual construction of data mining procedures. The workspace includes wizards and templates for database connection, shows the user’s view of the persistence object store and provides the data mining task construction area. A data mining procedure is built visually as a connected graph and executed on request. The model or models returned by the mining components can be viewed with appropriate visualisation applets in the client.

Besides various data mining algorithms and data manipulation tools, the client interface also provides Java-based visualisation tools for data and model analysis. Data visualisation allows users to view and manipulate data before it is mined. Complex models, produced from data mining algorithms, are presented as interactive visual objects. The Kensington system provides various 2D graphing tools and a 3D scatter visualiser for data visualisation. A decision tree visualiser, association rule visualiser, and a cluster visualiser are examples of tools used to present mining models to the user.

High Performance Server

An important issue in data mining is the speed and performance of the task. In a competitive business environment where quick and precise decisions are needed, it is essential that a data mining task is performed within a reasonable time. Given the enormous size of data accumulated today, many analysts have turned to high performance computers for a solution. Kensington’s middleware serves as a gateway for connecting high performance servers to thin clients and distributed databases. In addition, the system provides several optimised parallel algorithms to support common data mining tasks. These components include data mining algorithms for classification, clustering, association rule analysis and neural networks.

Industrial Partners

Although parts of the Kensington system are still under development, it has attracted enthusiastic interest from various users in application areas ranging from retail information system providers and the food and chemical industries to bio-informatics service providers.

We have applied the system to various real world applications including cluster analysis of the UK National Transport Survey (in collaboration with the University of London Centre for Transport Studies), the classification of large software codes of an international IT consultancy and intrusion analysis of network security, using classification algorithms and association rule discovery.

 


 

Completing a Solution for Market Basket Analysis

Scott Cunningham, Srikant Sreedhar, and Bill Smart

Knowledge Discovery Group
Human Interface Technology Center
NCR Corporation
5 Executive Parkway, N.E.
Atlanta, GA 30329

Tej Anand

Golden Books, Inc.

Introduction

Algorithms for finding rules or affinities between items in a database are well known and well documented in the knowledge discovery community. A prototypical application of such affinity algorithms is in "market basket" analysis - the application of affinity rules to analyzing consumer purchases. Such analyses are of particular importance to the consumer package goods industry. The retailers and wholesalers in this industry generate over 300 billion dollars of sales every year in the United States alone. Despite the economic importance of this industry, data mining solutions to the key business problems have yet to be developed. This position paper discusses some of the problems of the consumer package goods industry, notes a case study of some of the challenges presented to data miners within this industry, and critiques current knowledge discovery research in these areas.

Business Problem

The consumer package goods industry exists within a complex economic and informational environment. Mass merchandizing of products is in decline; U.S. consumers are increasingly recognized as belonging to fifty (or more) distinct segments, each with its own demographic profile, buying power, product preferences and media access. The items being sold, consumer products, are more diverse than ever before; a single category of food may easily contain hundreds of competing products. Within this highly differentiated environment, strong product brand names continue to offer a strong competitive advantage. By themselves, temporary price reductions are not sufficient for establishing consumer loyalty to either a store or product. Consumers are knowledgeable, and mobile, enough to seek out the lowest possible prices for a product. Ultimately consumer value is gained by those retailers able to negotiate favorable terms with their suppliers. Retailers gain the requisite detailed knowledge of customers through the creation of consumer loyalty programs and the use of on-line transaction processing systems; this information about the consumer is a crucial component in retailer-supplier negotiations.

Consumer package goods is a mature industry in the United States. Profit is no longer merely a matter of opening more stores, and selling to increasing numbers of consumers; the market is becoming saturated, and the available consumer disposable income largely consumed.

Maintenance of an existing customer base is more important than growing entirely new customers; this new phase of retail growth is based upon selling more and a greater variety of products to pre-existing consumers. The most profitable retailers are those that are able to maintain or reduce their operating costs. Economies of scope, not scale, determine profitability. Data warehousing is one of the foremost technological means of increasing operational efficiency. Efficient consumer response systems, based upon data warehouses, are expected to save the industry $30 billion a year. Category management, an organizational strategy for enhancing retailer-wholesaler coordination, is another means of increasing operational efficiency. In the following two brief case studies we examine how data warehouses, category management, and data mining techniques show promise for answering the concerns of two large consumer package goods companies.

Case Studies

A major international food manufacturer, with significant brand equity and a wide variety of manufactured products, is interested in optimizing its product advertising budget. Like many package goods retailers, this manufacturer has an extensive and rapidly growing advertising budget. Essential to the endeavor is the cooperation of their independent retail outlets in the creation and design of product promotions. The manufacturer sought to create a suite of software tools for the design of promotions, utilizing the newest data mining technology, and to make these tools available in real time to their category managers and to the managers of their retail outlets. The business case suggested that there would be at least three sources of return in the creation of this tool: improved coordination with retailers; more effective cross-sales across product categories; reduced promotional competition from other manufacturers; and enhanced promotional returns. NCR proposed and designed a state-of-the-art neural network for forecasting and optimizing planned promotions. The network met or exceeded industry standards for promotional forecasts (within 15% of actual sales, 85% of the time). Despite the statistical quality of the results, the application was never put into production by the manufacturer; the software design necessary to implement the results was too complex. Part of the application complexity stemmed from the hierarchical data types necessitated by the varied products and markets; another component of the complexity was reconciling the different product world-views of manufacturer and retailer.

A major regional food retailer, a grocer, sought analyses of its consumer transactions within its produce and salad dressing departments. The retailer anticipated improved design of store layouts, improved promotional design, and an insight into the market role of the various highly differentiated products within the category. The retailer clearly anticipated a causal analysis which would reveal the products which, when purchased by consumers, would lead to additional add-on sales of other products. NCR produced a market basket analysis which revealed the distinctive purchasing profiles that are associated with each major brand of interest. The NCR analysis revealed that the best selling brands were not those that resulted in the greatest amount of attendant sales. The NCR analysis supported the existing category management plans by the retailer, and also independently confirmed the results of a demographic panel survey. Despite these successes the market basket analysis, by itself, did not produce any new actionable results for the retailer. In the next section, on data mining, key data mining algorithms and outputs are examined for their suitability for answering these, and other, consumer package goods questions.

Data Mining Solutions

Affinity algorithms are well-understood and well-documented by the data mining community. The quintessential application of affinity algorithms is in the area of market basket analysis. For instance, these algorithms when applied to market basket analysis produce rules such as "Those baskets containing product X are also 75% likely to contain product Y." Additional research has focused on optimizing the speed and efficiency with which these rules are found; however, additional applied research is needed to support the decision-making needs of the consumer goods industry (and other relevant business groups).
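To make the form of such a rule concrete, the sketch below computes the two quantities usually attached to an affinity rule, support and confidence, over a handful of invented baskets. The basket contents and the function names are illustrative assumptions; the code is a direct count and not one of the optimized algorithms the literature describes.

```python
def rule_support(baskets, x, y):
    """Fraction of all baskets that contain both x and y."""
    both = sum(1 for b in baskets if x in b and y in b)
    return both / len(baskets)


def rule_confidence(baskets, x, y):
    """Among baskets containing x, the fraction that also contain y."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return None  # the rule cannot be evaluated without any x baskets
    return sum(1 for b in with_x if y in b) / len(with_x)


# Five invented baskets; four contain X and three of those contain Y.
baskets = [
    {"X", "Y"},
    {"X", "Y", "Z"},
    {"X", "Y"},
    {"X"},
    {"Y", "Z"},
]

support = rule_support(baskets, "X", "Y")        # 3 of 5 baskets
confidence = rule_confidence(baskets, "X", "Y")  # 3 of 4 baskets with X
```

Here the confidence of 0.75 corresponds to the "75% likely" wording of the rule quoted above.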

First, affinity algorithms produce individual, isolated rules; associations between groups of products are not revealed. While the analysis can be repeated across all products in a category, or even a store, the number of rules produced grows exponentially. Not only is this computationally complex, but the resulting welter of rules is hard to interpret as well. Second, the output of affinity algorithms seems to suggest causal relationships between products. Yet the algorithms themselves embody no causal assumptions. The nature of product affinities needs to be reconsidered; either a new and causal form of affinity analysis needs to be produced, or a thorough understanding of non-causal applications and uses of affinity rules needs to be obtained. Third, affinity algorithms lack robustness. The algorithms produce a point estimate of affinity; yet retailers need to understand how (and if) these rules apply across larger groups of transactions. A similar issue is the minimum sample size needed to produce robust results. Fourth, market basket analyses carry implicit information about consumer preferences. Even when consumer identification is missing from transaction data, the data can still be grouped or segmented using data mining techniques to reveal distinct groups of consumer preferences. Affinity algorithms imply that samples are taken from homogeneous groups of customers; yet business knowledge suggests that consumers are highly varied in taste and expenditure. Fifth, the market basket analyses, for some set of business questions, may require the rigor of a properly designed statistical experiment. Reasoning from standard to promotional pricing, as well as reasoning from standard display conditions to promotional display conditions, is unwarranted. Yet much of the potential of market basket analysis stems from the capacity of retailers to manipulate product pricing, display or even attributes to meet consumer need.
Sixth, and finally, standard forecasting tools produce estimates of sales of single goods across time. (This is not conventionally the domain of market basket or affinity analysis.) However, retailers and manufacturers need to have forecasts for whole groups of products. Producing individual product forecasts, and then aggregating, will not produce optimum forecasts since sales of one product contain information about the potential sales of other products; indeed, the forecasts may not even aggregate correctly. Techniques such as "state space analysis", which combine forecasting with multivariate analysis, may prove useful.
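The third point, that a rule's affinity is reported as a point estimate, can be illustrated with a resampling sketch. The bootstrap used here is one common way to attach an interval to such an estimate; it is our choice for the illustration and not a method proposed in this paper. The baskets, the function names and the parameters n_resamples, level and seed are all assumptions.

```python
import random


def rule_confidence(baskets, x, y):
    """Among baskets containing x, the fraction that also contain y."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return None
    return sum(1 for b in with_x if y in b) / len(with_x)


def bootstrap_interval(baskets, x, y, n_resamples=1000, level=0.9, seed=0):
    """Percentile bootstrap interval for the confidence of the rule x -> y."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Resample whole baskets with replacement, keeping the sample size.
        sample = [rng.choice(baskets) for _ in baskets]
        value = rule_confidence(sample, x, y)
        if value is not None:
            estimates.append(value)
    if not estimates:
        raise ValueError("no resample contained the antecedent item")
    estimates.sort()
    tail = (1 - level) / 2
    lower = estimates[int(tail * (len(estimates) - 1))]
    upper = estimates[int((1 - tail) * (len(estimates) - 1))]
    return lower, upper


# Forty invented baskets built by repeating a small pattern.
baskets = [{"X", "Y"}, {"X", "Y", "Z"}, {"X", "Y"}, {"X"}, {"Y", "Z"}] * 8

point = rule_confidence(baskets, "X", "Y")
lower, upper = bootstrap_interval(baskets, "X", "Y")
```

A wide interval around the point estimate would be a hint that the sample is too small for the rule to be relied upon, which bears on the minimum sample size question raised above.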

Recommendations

The consumer package goods industry is an important, and expansive, industrial segment of the economy. This industry is dependent upon information for its continued economic growth. It is therefore making great progress in collecting large databases of relevant data about its industry. The corresponding questions the industry has about its data are both interesting and economically fruitful. This paper considered two case studies of applying standard data mining techniques to industrial questions in the area of consumer package goods. The examples discussed a wholesaler and a retailer that sought better management of product categories, and a resulting improved economy of scope. Commercial success of data mining will, in part, depend upon the capacity of algorithms to model complex, hierarchical arrangements of goods and products.

Background:

Scott Cunningham received a D.Phil. in Science and Technology Policy from the University of Sussex (1997). His thesis research involved using principal components analysis for the automatic classification of text. Scott has consulted for the British and Malaysian governments on matters of science policy and the analysis of large databases of published sciences. Since joining NCR's Human Interface Technology Center he has been working on customer funded research on business applications of data mining. Most recently he has been working on applications of data mining technology to develop intelligent web commerce applications.

 


 

Keys to the Commercial Success of Data Mining

Andrea Danyluk

Department of Computer Science
Williams College
Williamstown, MA 01267
andrea@cs.williams.edu

I am currently Assistant Professor of Computer Science at Williams College in Williamstown, Massachusetts. I am also employed as a consultant by Bell Atlantic’s Science and Technology Center in White Plains, New York. Prior to taking the faculty position at Williams, I was employed by the Bell Atlantic (then NYNEX) Science and Technology Center for four years. At Bell Atlantic I work with Foster Provost and Tom Fawcett, as both a developer and user of data mining technology. I have taken my experience with applications of interest to the telecommunications business and have carried it over to my academic research position, where I focus on machine learning algorithms.

A large class of data mining algorithms has developed out of ideas investigated earlier by researchers and developers of machine learning algorithms. Notable examples include CART, C4.5, neural networks, and Bayesian classifiers, among others. One of the assumptions made by these algorithms, which is carried over into data mining applications, is that of clean data.

All of these algorithms, and others like them, do relax the assumption from its strictest terms. They do not assume perfectly clean data, but rather assume that the data might be noisy. While the ability to handle noise is obviously critical to the successful application of data mining algorithms, the treatment of noise typically falls short of handling the complete problem of data error.

Systematic Error

Systematic errors arise in many applications, and they may be due to any of the following:

We have found many examples of these in some of the telecommunications applications we’ve investigated at NYNEX and Bell Atlantic.

One of these applications is classification of customer-reported telephone problems in the local loop of the telephone network. Problem diagnoses are high level, describing roughly that segment of the local loop where the trouble might be found, so that an appropriate technician might be dispatched to repair the trouble. The diagnoses are: dispatch to the customer’s premises; dispatch to the cable; dispatch to the central office; hold for further testing. The data describing the troubles include information about the type of switch to which the customer’s line is connected and electrical readings such as voltages and resistances, among others. The data mining problem here is to consider a large database of past troubles and their resolutions, and to develop rules for sending the appropriate technicians out to fix problems that have a certain profile.

The electrical readings that are a large component of the data are obtained via an automated line testing system. The line testing system must be calibrated regularly, but in practice this rarely occurs. As a result, the system becomes miscalibrated, and all readings reported for a set of lines on a given day might be off by a systematic amount. Furthermore, the system’s baseline readings can differ from day to day. This source of systematic error is known, but there are no mechanisms in place to handle the error so that it can be eliminated from the data. Given the heavy load handled by the company, it is not clear that careful calibration can become a high priority item. Thus we can expect that the problem will persist.

People can also affect the data in a systematic way. In particular, one source of the diagnoses for troubles are the technicians who fix the problems. They report results using a complex coding system. If a technician has memorized the wrong code to represent the outcome of a repair, it will be wrong consistently. Again, we have a good sense of the source of the problem, but it is not clear that it can be controlled. Also, aside from maintaining a profile of each technician, it is not clear that there is a mechanism that could automatically correct for these errors.

There are a number of different scenarios that arise with respect to systematic data error.

(1) The systematic error is well-understood. In these cases, the data can be "cleaned" and data mining algorithms can be applied to the clean data.

(2) The errors can be reconciled. There are applications in which data may be obtained from several sources. In these cases, it may be possible to retain data that are consistent over the sources. This has the effect of cleaning the data, by making the assumption that the data might have errors but that the errors won’t be consistent over the various sources. We found that with the local-loop diagnosis application, we were able to use a variety of data sources to reconcile diagnostic error (though we were not able to account for calibration error).

(3) The data cannot be cleaned. These are cases where the error exists, but cannot be removed from the data. It is important to note that in these cases, the sources of the error might, in fact, be quite well-known, but that additional complications make it difficult to pull the error out of the data.
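Scenario (2) can be sketched in a few lines: keep a record only when several sources report it and all of them agree. The source names, trouble identifiers and the min_sources parameter below are hypothetical, and this is not the procedure used in the local-loop application; it only illustrates the assumption that errors are unlikely to be consistent across independent sources.

```python
def reconcile(records_by_source, min_sources=2):
    """Keep trouble IDs reported by enough sources with one agreed diagnosis."""
    reported = {}
    for records in records_by_source.values():
        for trouble_id, diagnosis in records.items():
            reported.setdefault(trouble_id, []).append(diagnosis)

    kept = {}
    for trouble_id, diagnoses in reported.items():
        # Retain only records seen often enough and without disagreement.
        if len(diagnoses) >= min_sources and len(set(diagnoses)) == 1:
            kept[trouble_id] = diagnoses[0]
    return kept


# Two hypothetical sources of diagnoses for the same troubles.
sources = {
    "technician_codes": {
        "T1": "cable",
        "T2": "central office",
        "T3": "customer premises",
    },
    "expert_system": {
        "T1": "cable",
        "T2": "cable",
        "T4": "cable",
    },
}

clean = reconcile(sources)
```

In this example T2 is dropped because the sources disagree, and T3 and T4 are dropped because only one source reports them. The price of this kind of cleaning is a smaller data set.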

One obvious reaction to these situations is to throw up our hands and assume that the application of data mining techniques will provide no useful results. But this reaction is unreasonable.

(1) If the amount of systematic error is small, or if the right algorithm is applied, the impact of the error might be small relative to other gains of the data mining.

(2) Data mining techniques might be useful for helping to identify systematic error, making the process of cleaning one’s data a possibility.

(3) There are many applications for which only a small amount of mined information can go a long way to benefiting a company. In these cases, it is not in our best interest as data miners to simply dismiss an application as being "too hard". In the application described above, an improvement of only 1% over the current dispatch procedure could save the company over $3,000,000 annually.

More work needs to be done on:

(1) Developing data mining algorithms for cleaning systematic error out of data.

(2) Analyzing the tools we have so that we can determine how they are actually affected by different types of error.

Pointers to my work on data mining and telecommunication applications include:

Danyluk, A. P. and Provost, F. J. (1993). Small Disjuncts in Action: Learning to Diagnose Errors in the Local Loop of the Telephone Network. In Proceedings of the Tenth International Conference on Machine Learning, pp. 81-88. San Mateo, CA: Morgan Kaufmann Publishers, Inc.

Danyluk, A. P. and Provost, F. J. (1993). Adaptive Expert Systems: Applying Machine Learning to NYNEX MAX. In Working Notes of the AAAI-93 Workshop on AI in Service and Support: Bridging the Gap Between Research and Applications.

Danyluk, A. P. (1995). A Comparison of Data Sources for Machine Learning in a Telephone Trouble Screening Expert System. In Working Notes of the Workshop on Data Engineering for Inductive Learning: A Workshop at the International Joint Conference on Artificial Intelligence, Montreal, Canada, pp. 1-10.

Merz, C. J., Pazzani, M., and Danyluk, A. P. (1996). Tuning Numeric Parameters to Troubleshoot a Telephone Network Local Loop. IEEE Expert, 11(1), pp. 44-49.

Provost, F. J. and Danyluk, A. P. (1995). Learning from Bad Data. In Aha, D. W. and Riddle, P. J. (eds.), Working Notes for Applying Machine Learning in Practice: A Workshop at the Twelfth International Machine Learning Conference (Technical Report AIC-95-023), pp. 27-33. Washington, DC: Naval Research Laboratory, Navy Center for Applied Research in Artificial Intelligence.

Background:

Andrea Danyluk is assistant professor of computer science at Williams College. She received her Ph.D. in computer science from Columbia University in 1992, and was a researcher at NYNEX Science and Technology for a number of years before coming to Williams. Her research interests include the effects of systematic error on induction, time-series problems, and the application of machine learning to real-world problems.

 


 

Keys to the Commercial Success of Data Mining Workshop

 

Piew Datta

GTE Laboratories
40 Sylvan Road
Waltham, MA 02254
pdatta@gte.com

As data mining and machine learning techniques move from research algorithms to business applications, it is becoming obvious that the acceptance of data mining systems into practical business problems relies heavily on the integration of the data mining system into the business process. Some key dimensions that data mining developers must address include understanding the business process from the end user perspective, understanding the environment in which the system will be applied, including end users throughout the lifecycle of the development process, and building user confidence in and familiarity with the techniques.

One critical aspect of building a practical and useful system is showing that the techniques can tackle the business problem. Traditionally, machine learning and data mining research areas have used classification accuracy in some form to show that the techniques can predict better than chance. While this is necessary, it is not sufficient to sell data mining systems. The evaluation methods need to more closely resemble how the system will work while in place.

One way to evaluate data mining software more closely in its intended setting is to incorporate time into the evaluation process. Although the time issue makes the prediction scenario more complicated, many data warehouses have data that is time dependent. For example, billing data stores billing, payment, and usage data for each customer indexed by time. As Kurt mentions in "Some thoughts on the current state of data mining software applications", few if any data mining systems deal with the time variable indirectly stored in the data.

Although the time dimension is left out during the prediction process, data mining systems should be evaluated with some aspects of time kept in mind. Researchers and developers can simulate time dependent evaluation by evaluating models on historical data stored in the data warehouse. For example, suppose we are building a model to predict whether a customer will churn in a given month. Suppose we have the data for the independent variables at time t and we make a prediction for person x saying that they will churn. When will x churn? Will they churn immediately, in the next two weeks, in the next month? For what time period should we evaluate the model? An intuitive guess would say that the model is most accurate at predicting churn the closer it occurs in time to the independent data. When does the model predict the same as the background churn rate? While accuracy is a valid evaluation criterion, determining how long the model is valid is also important information. Instead of showing the accuracy of a model as a single number, the accuracy could be shown as a function of time. This information can also be used to compare different data mining techniques. The characteristics of the model should be tested while increasing time to better simulate the data mining software while used in a business process. The accuracy of the model given time and other evaluation criteria should be provided to the end users so they can determine which characteristics are more important to their business task.
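One possible reading of "accuracy as a function of time" is sketched below: a prediction made on a given day is scored against whether the customer actually churned within a window of h days, and h is then varied. The data layout, the customer names and the function accuracy_by_horizon are assumptions made for the illustration; the numbers are invented and carry no empirical meaning.

```python
def accuracy_by_horizon(predictions, churn_day, horizons):
    """Accuracy of churn predictions for each evaluation window length.

    predictions: customer -> (day the prediction was made, predicted churn?)
    churn_day:   customer -> day of actual churn (absent if no churn seen)
    horizons:    window lengths in days
    """
    result = {}
    for h in horizons:
        correct = 0
        for customer, (made_on, predicted) in predictions.items():
            day = churn_day.get(customer)
            # Did the customer churn within h days after the prediction?
            churned = day is not None and made_on < day <= made_on + h
            if predicted == churned:
                correct += 1
        result[h] = correct / len(predictions)
    return result


# Four invented customers, all scored on day 0.
predictions = {
    "a": (0, True),
    "b": (0, True),
    "c": (0, False),
    "d": (0, False),
}
churn_day = {"a": 10, "b": 45, "d": 70}

curve = accuracy_by_horizon(predictions, churn_day, [14, 30, 60, 90])
```

Plotting such a curve for two competing techniques would let end users compare them over the window that matters for their own campaigns instead of on a single accuracy figure.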

The fact that a model's accuracy is a function of time implies that models should be relearned or refreshed after some amount of time. If the accuracy of a model becomes close to random chance after some time, a new model should be learned. The older model and the newer model should be somewhat consistent and similar. For instance, if a model at time t is based on attributes A, B, and C, we would expect the refreshed model to use a similar set of attributes. If they change radically, the models may be overfitting the data or the models may reflect seasonal trends. The end users of the system expect the models to be somewhat consistent. They might lose confidence if the models change radically, because intuitively the radical change may not make sense.
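Both checks in this paragraph can be sketched simply. The first function finds the earliest period at which accuracy has fallen to within a margin of chance, which could serve as a refresh trigger; the second measures how similar the attribute sets of two models are. The margin, the invented accuracy figures and the use of Jaccard overlap as the similarity measure are our assumptions, not part of the original proposal.

```python
def first_stale_period(accuracy_by_period, chance_accuracy, margin=0.02):
    """Earliest period whose accuracy is within `margin` of chance, if any."""
    for period in sorted(accuracy_by_period):
        if accuracy_by_period[period] <= chance_accuracy + margin:
            return period
    return None


def attribute_overlap(old_attributes, new_attributes):
    """Jaccard overlap of the attribute sets used by two models (0 to 1)."""
    old, new = set(old_attributes), set(new_attributes)
    if not old and not new:
        return 1.0
    return len(old & new) / len(old | new)


# Invented accuracy figures for months 1 to 4 after the model was learned.
accuracy = {1: 0.80, 2: 0.71, 3: 0.62, 4: 0.55}
refresh_at = first_stale_period(accuracy, chance_accuracy=0.54)

similar = attribute_overlap({"A", "B", "C"}, {"A", "B", "D"})
radical = attribute_overlap({"A", "B", "C"}, {"X", "Y", "Z"})
```

A low overlap between the old and refreshed models would be the signal, in the terms used above, to look for overfitting or seasonal effects before showing the new model to end users.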

As stated earlier, the evaluation should reflect the business process it will be applied in. For example, if the churn system is used to identify churners monthly for targeted campaigns, then an interesting question from an end user may be to ask what percentage of churners in month y would the data mining software predict to churn ahead of time? The results of running experiments on historical data to answer this question may give some indication of how often campaigns need to be run to capture a certain percentage of churners. In addition, by noting which customers end up being on the predicted churn list for successive months we also find out more about the consistency of the models. These types of questions come about by interacting with end users and by looking at the task through their perspective.
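The two questions raised here, what share of a month's churners were flagged ahead of time and how stable the predicted lists are from month to month, can be computed from historical data along the following lines. The month numbering, customer names and both function names are hypothetical, and the figures are invented.

```python
def captured_fraction(predicted_by_month, churned_by_month, month, lead=1):
    """Share of churners in `month` flagged in any of the `lead` prior months."""
    actual = churned_by_month[month]
    if not actual:
        return None
    flagged = set()
    for earlier in range(month - lead, month):
        flagged |= predicted_by_month.get(earlier, set())
    return len(actual & flagged) / len(actual)


def repeat_fraction(predicted_by_month, month):
    """Share of this month's predicted churners also predicted last month."""
    current = predicted_by_month.get(month, set())
    previous = predicted_by_month.get(month - 1, set())
    if not current:
        return None
    return len(current & previous) / len(current)


# Invented monthly prediction lists and the actual churners of month 3.
predicted = {1: {"a", "b", "c"}, 2: {"b", "c", "d"}, 3: {"c", "e"}}
churned = {3: {"b", "d", "f", "g"}}

captured = captured_fraction(predicted, churned, month=3, lead=1)
repeated = repeat_fraction(predicted, month=2)
```

Running captured_fraction with different values of lead on historical data gives one indication of how often campaigns would need to be run to reach a target share of churners.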

In summary, we contend that data mining techniques should be evaluated according to the business task. This requires knowledge of the business process and interaction with the end users. Although most traditional evaluation has held time constant, the time variable cannot be forgotten when data mining software is put into the business process. The learned models can be evaluated and compared along the time dimension. By understanding the characteristics of the learned models, developers as well as end users can make more informed decisions.

Background:

Piew Datta is a Senior Member of Technical Staff in the Knowledge Discovery in Databases group at GTE Laboratories in Waltham, MA. She received her Ph.D. in Information and Computer Science from the University of California at Irvine. Her research interests in machine learning include clustering, prototype learning, and concept sharing. She has applied various machine learning techniques to Alzheimer's disease classification and churn prediction.

 


 

The Hardest Thing is Getting Into People’s Heads: Quotable Quotes and Their Implications

Vasant Dhar

Department of Information Systems
Stern School of Business
New York University

The hardest part is getting into people’s heads. Getting into people’s heads involves a lot more than clear communication. It is the ability to lead people to consider useful aspects of a problem that they would not otherwise consider. Why is this a challenging problem, and how can we think about addressing it? I present some insights into these questions based on my experience in the financial industry.

In practice, Data Mining is a collaborative theory building exercise. Given limited time and resources, it is important to explore and probe the interesting aspects of the problem as quickly and thoroughly as possible. The theory building exercise can be characterized in terms of two loops, which I call the inner and outer loop. The inner loop is where machine learning is applied. The outer loop is where results are discussed with experts and/or business users, setting the stage for an iteration through the inner loop. I illustrate the difficulties involved in this learning exercise using a number of "quotable quotes" and discuss how and to what extent these can be addressed. Some of these are motivated by research on financial markets, but they apply to a number of other areas.

Patterns emerge before the reasons for them are apparent

Implication: makes it difficult to evaluate new hypotheses

Most new ideas are, on average, poor

Implication: makes us reluctant to propose new hypotheses, hence myopia

I would sacrifice performance for understandability

Implication: makes simplicity an important part of the search criteria

The trend is your friend until it isn’t

Implication: makes it hard to come up with simple explanations

Don’t confuse me with data

Implication: if results do not fit into a prior conceptual framework, the data may not be perceived as credible or complete.

Don’t confuse me with results

Implication: in the early stages of a project when the problem is less well understood, results are more useful in guiding further probing, and not necessarily as usable outputs

Don’t confuse brains with a bull market

Implication: if the sample used to construct the model is biased, the results will not be general

The market is like a barometer not a thermometer

Implication: it is more important to understand

A system is only as strong as its weakest link

Implication: Execution, or use of the model, must be the driving force in deciding what types of questions are worth addressing in the first place.

I shall present examples of each of these situations and discuss their implications more fully.

 


 

Keys to the Commercial Success of Data Mining

Tom Fawcett

Science and Technology Center
Bell Atlantic
White Plains, NY
fawcett@basit.com

Foster Provost and I are in-house data mining experts at Bell Atlantic’s Science and Technology Center in White Plains, New York.  We serve both as developers and users of data mining technology: we do research in the field but are also responsible for applying the technology to domains of interest to Bell Atlantic.

Because we straddle the line between developer and user, we have a unique perspective on data mining problems.  Our combined experience applying data mining technology to many domains over the years has taught us several lessons that are not commonly discussed in the community by vendors, researchers or business users.  I present three of them below.

1) Before business problems can be solved with data mining, they must be transformed to match existing tools.

Data mining tools perform a small set of basic tasks such as classification, regression and time-series analysis.  Rarely is a business problem exactly in one of these forms.  Usually it must be transformed into (or rephrased as) one of these basic tasks before a data mining tool can be applied.  Often, in order to solve a problem it must be decomposed into a series of basic tasks. Indeed, much of the art of data mining involves the creative decomposition of a problem into a sequence of such sub-tasks that are solvable by existing tools.

For example, our work on cellular phone fraud detection transformed the problem of fraud detection into a sequence of knowledge discovery, regression and classification tasks (mining for indicators of fraud, profiling customer behavior, combining evidence to classify behavior as fraudulent).  No single type of task was adequate to solve the problem.

2) Evaluation of data mining results is more complex than either developers or users believe.

Most data mining tools, like the research prototypes from which they were derived, measure performance in terms of accuracy or classification error.  A tacit assumption in the use of classification accuracy as an evaluation metric is that the class distribution among examples is constant and relatively balanced.

In the real world this is rarely the case.  Classifiers are often used to sift through a large population of normal or uninteresting entities in order to find a relatively small number of unusual ones; for example, looking for fraudulent transactions or checking an assembly line for defective parts.  Because the unusual or interesting class is rare within the general population, the class distribution is very skewed.

Evaluation by classification accuracy also assumes equal error costs.  In the real world this is unrealistic because classifications lead to actions which have consequences, sometimes grave.  Rarely are mistakes evenly weighted in their cost.  We have yet to encounter a domain in which they are.
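A small worked example shows how skewed classes and unequal error costs can make accuracy misleading. The counts and costs in this Python sketch are invented for illustration: 10,000 transactions of which 100 are fraudulent, a false alarm assumed to cost 1 unit and a missed fraud assumed to cost 50.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def total_cost(fp: int, fn: int, cost_fp: float, cost_fn: float) -> float:
    """Total cost of the errors, given a cost per false positive and per false negative."""
    return fp * cost_fp + fn * cost_fn


COST_FALSE_ALARM = 1.0    # assumed cost of investigating a legitimate transaction
COST_MISSED_FRAUD = 50.0  # assumed cost of letting a fraudulent transaction through

# Classifier A labels everything "normal" and so misses all 100 frauds.
a = dict(tp=0, fp=0, tn=9900, fn=100)
# Classifier B catches 80 of the 100 frauds but raises 300 false alarms.
b = dict(tp=80, fp=300, tn=9600, fn=20)

for name, m in (("A", a), ("B", b)):
    acc = accuracy(**m)
    cost = total_cost(m["fp"], m["fn"], COST_FALSE_ALARM, COST_MISSED_FRAUD)
    print(f"classifier {name}: accuracy = {acc:.3f}, total cost = {cost:.0f}")
```

Under these assumed costs the classifier with the higher accuracy is the more expensive one to deploy, and the ranking can flip again if the cost ratio or the fraud rate changes.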

The class skew (as well as error costs) may change over time, after a data mining solution is deployed.  Indeed, error costs and class distributions in the field may never be known exactly.

Unfortunately, the importance and difficulty of evaluation are often not appreciated by business users either.  The business user usually knows the general problem to be solved, but may not be able to specify error costs or even advise in their calculation. Sometimes the business user does not know how well current procedures solve the problem, and has no mechanisms in place to evaluate their performance.  We are sympathetic to this, since evaluating performance often takes time and effort away from the task itself.  However, it makes measuring the efficacy of a data mining solution difficult or impossible.

These recurring difficulties with evaluation have directed our research at Bell Atlantic.  We have developed a technique based on ROC analysis that greatly facilitates comparison and evaluation of data mining results.  The technique is especially useful when error costs and class distributions are only known approximately, or may change.  We now use this technique in all of our work.
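For readers unfamiliar with ROC analysis, the Python sketch below shows the standard construction of ROC points from classifier scores, followed by the choice of an operating point once a class distribution and error costs are supplied. It is a generic textbook illustration, not the specific technique described in the papers listed under Pointers; the function names and the toy scores are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Sweep the decision threshold from high to low and record one
    (FPR, TPR) point per distinct score. Labels are 1 (positive) or 0."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def expected_cost(point: Point, positive_rate: float,
                  cost_fp: float, cost_fn: float) -> float:
    """Expected cost per example at an ROC point for a given class prior."""
    fpr, tpr = point
    return positive_rate * (1 - tpr) * cost_fn + (1 - positive_rate) * fpr * cost_fp


def best_operating_point(points: Sequence[Point], positive_rate: float,
                         cost_fp: float, cost_fn: float) -> Point:
    """ROC point with the lowest expected cost under the stated conditions."""
    return min(points, key=lambda p: expected_cost(p, positive_rate, cost_fp, cost_fn))


points = roc_points([0.9, 0.8, 0.7, 0.6, 0.4, 0.2], [1, 1, 0, 1, 0, 0])
print(best_operating_point(points, positive_rate=0.5, cost_fp=1.0, cost_fn=10.0))
print(best_operating_point(points, positive_rate=0.01, cost_fp=1.0, cost_fn=1.0))
```

The ROC points depend only on the classifier, so when costs or class distributions change after deployment only the last step needs to be repeated.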

3) Data preparation and data cleaning are more time-consuming and knowledge intensive than is acknowledged.

In our experience, understanding the data, reducing noise and converting the data to an appropriate representation is the most time-consuming part of the data mining process.  Furthermore, the process is usually iterative and knowledge intensive: as the project progresses, we learn more about the process that generates the data and we have to go back and re-clean them based on the new knowledge.  Although the provider usually has information about the data, we are often the first people ever to analyze the data carefully.  We have uncovered errors, idiosyncrasies and artifacts of the data gathering process that were unknown to the provider. These discoveries sometimes end up changing how we approach the data mining task.

Data preparation and cleaning are often tedious, uninteresting tasks.  However, over the life of a data mining project, these tasks account for far more time than that taken by applying the machine learning algorithms.

Pointers

Robust Classification Systems for Imprecise Environments, Foster Provost and Tom Fawcett, To be presented at AAAI-98 (Fifteenth National Conference on Artificial Intelligence)

The Case Against Accuracy Estimation for Comparing Induction Algorithms, Foster Provost, Tom Fawcett and Ron Kohavi, To be presented at ICML-98 (Fifteenth International Conference on Machine Learning)

Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions, Foster Provost and Tom Fawcett, Presented at KDD-97 (Third International Conference on Knowledge Discovery and Data Mining) -- Winner of Best Paper Award

Adaptive Fraud Detection, Tom Fawcett and Foster Provost, Published in Journal of Data Mining and Knowledge Discovery, v.1 n.3, 1997

Combining Data Mining and Machine Learning for Effective User Profiling, Tom Fawcett and Foster Provost, Presented at KDD-96 (Second International Conference on Knowledge Discovery and Data Mining)

 


 

Insight into Some Commercial Data-Mining Problems

Yizhak Idan and Saharon Rosset

Amdocs (Israel) Ltd.
16 Abba Hillel St. Ramat Gan 52506, Israel
yizhaki@amdocs.com

Amdocs has been supplying large-scale information systems to the telecommunications area since 1982. A data processing staff of over 2,800 professionals is fully dedicated to this area. With installations in over 50 of the major telecommunications companies and directory publishers around the globe, Amdocs is a world leader in the development and implementation of Customer Care and Billing systems for providers of telecom services.

Amdocs carries out a major Data Mining and Decision Support activity as part of its R&D Division tasks. Among other things, we have developed simulation and run-time KDD environments that are used by our customers and are integrated in several products and applications, such as Fraud Management and Churn Management.

Following are some interesting points we have encountered, together with some insight into desirable and less desirable solutions.

Consideration of customer value in the data-mining process

One of the most important issues for business-oriented use of data mining is the incorporation of value considerations into the analysis process. Value is a general term that may mean different things in different settings, such as the average monthly revenue from a customer, the number of lines the customer owns, or some other combined measure of value we would like to consider at a certain point in time. In the context of churn management, some of the tactics and ideas often employed are:

We propose an original approach in which value is integrated into our data-mining algorithm, so that the data-partitioning process considers the distribution of value together with the size of the populations.

Effective incentive allocation

In several applications, data mining is used for analysis followed by a countermeasure. For example, in churn management, the analysis of churning customers will normally result in incentive campaigns. This means that we will offer incentives to valuable customers who are predicted to churn.
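To make the idea of targeting valuable customers who are predicted to churn concrete, the Python sketch below ranks customers by the product of churn probability and value and selects as many as a campaign can cover. This scoring rule, the `Customer` fields and the example figures are assumptions made for illustration and are not a description of any Amdocs product.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Customer:
    customer_id: str
    churn_probability: float  # output of a churn model, between 0 and 1
    monthly_value: float      # e.g. average monthly revenue (assumed value measure)


def value_at_risk(customer: Customer) -> float:
    """Expected monthly value lost if no incentive is offered."""
    return customer.churn_probability * customer.monthly_value


def select_for_incentives(customers: Sequence[Customer],
                          campaign_size: int) -> List[str]:
    """Identifiers of the customers with the highest value at risk."""
    ranked = sorted(customers, key=value_at_risk, reverse=True)
    return [c.customer_id for c in ranked[:campaign_size]]


customers = [
    Customer("a", churn_probability=0.9, monthly_value=10.0),
    Customer("b", churn_probability=0.3, monthly_value=100.0),
    Customer("c", churn_probability=0.6, monthly_value=40.0),
    Customer("d", churn_probability=0.1, monthly_value=20.0),
]
print(select_for_incentives(customers, campaign_size=2))
```

Note that the customer most likely to churn is not selected here, because the value at stake is small.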

There are two main areas of interaction between the incentive component and the data-mining component in such application: t