Thearling.com

Information about data mining and analytic technologies

Kurt Thearling / kurt@thearling.com

 
         

Data mining, if you haven't heard of it before, is the automated extraction of hidden predictive information from databases. I have spent the past fifteen years building commercial data mining and data analysis systems, solving problems across fields such as financial services, the life sciences, insurance, and telecommunications. I am currently Vice President of Strategic Technology at Capital One, where I lead the company's advanced technology group. (See the My Background section below for more.)

The purpose of this web site is to share information about data mining and other analytic technologies.  I hope you find it useful. 

 

 

 
My Data Mining Book of the Month is Competing on Analytics by Tom Davenport and Jeanne Harris.  This book is a great addition to the literature of analytics.   Not that it solves any complicated statistical modeling problem; I don't think that there is a single equation in the entire book.  Instead, this book focuses on the (more important) problem of getting an organization to change its approach to problem solving, by increasing the use of analytics across a business.  This is a trend that I have been involved with over the past fifteen years, and I think that it is a key differentiator between modern companies.  If you are interested in learning how companies like Amazon, Capital One, Harrah's, and Netflix (to name just a few) put analytics into practice, this book is a fantastic resource.   For more books on data mining, take a look at my list of recommended books.


 
  

A while back I added online versions of the slides from a couple of talks I have given on data mining and analytic technologies.

Some time ago I took part in a National Public Radio report about data mining and privacy with a local NPR affiliate.   It's a bit sensationalistic, but I think that reflects the concerns that the general public has about privacy in the age of the internet.

What is data mining good for?

Data mining software allows users to analyze large databases to solve business decision problems. Data mining is, in some ways, an extension of statistics, with a few artificial intelligence and machine learning twists thrown in. Like statistics, data mining is not a business solution; it is just a technology. For example, consider a catalog retailer who needs to decide who should receive information about a new product. The information operated on by the data mining process is contained in a historical database of previous interactions with customers and the features associated with those customers, such as age, zip code, and their responses to earlier offers. The data mining software would use this historical information to build a model of customer behavior that could be used to predict which customers would be likely to respond to the new product. By using this information, a marketing manager can select only the customers who are most likely to respond. The operational business software can then feed the results of the decision to the appropriate touch point systems (call centers, direct mail, web servers, email systems, etc.) so that the right customers receive the right offers.
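
The catalog example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular data mining product works: the records, the age bands, the customer IDs, and the function names (build_model, score, select_targets) are all hypothetical, and a smoothed per-segment response rate stands in for whatever model a real system would build.

```python
from collections import defaultdict

# Hypothetical historical database: (age, zip_code, responded_to_past_offer)
HISTORY = [
    (23, "02139", 0), (27, "02139", 0), (35, "02139", 1), (41, "02139", 1),
    (33, "02139", 1), (38, "10001", 1), (45, "10001", 1), (52, "10001", 0),
    (61, "10001", 0), (29, "10001", 0),
]

# Hypothetical customers being considered for the new product: (id, age, zip_code)
PROSPECTS = [
    ("A", 36, "02139"),
    ("B", 25, "02139"),
    ("C", 44, "10001"),
    ("D", 58, "10001"),
    ("E", 40, "94105"),  # zip code never seen in the history
]


def age_band(age):
    """Bucket age into coarse bands (an assumed, arbitrary choice)."""
    if age < 30:
        return "under_30"
    if age < 50:
        return "30_to_49"
    return "50_plus"


def build_model(history):
    """Estimate a smoothed response rate for each (age band, zip code) segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [responses, offers sent]
    for age, zip_code, responded in history:
        segment = (age_band(age), zip_code)
        counts[segment][0] += responded
        counts[segment][1] += 1
    base_rate = sum(r for r, _ in counts.values()) / sum(n for _, n in counts.values())
    # Smooth each segment with one pseudo-observation at the overall rate,
    # so that tiny segments are pulled toward the overall response rate.
    rates = {seg: (r + base_rate) / (n + 1) for seg, (r, n) in counts.items()}
    return rates, base_rate


def score(model, customer):
    """Predicted response rate; unseen segments fall back to the overall rate."""
    rates, base_rate = model
    _, age, zip_code = customer
    return rates.get((age_band(age), zip_code), base_rate)


def select_targets(model, customers, top_n):
    """Return the IDs of the top_n customers ranked by predicted response rate."""
    ranked = sorted(customers, key=lambda c: score(model, c), reverse=True)
    return [c[0] for c in ranked[:top_n]]


model = build_model(HISTORY)
targets = select_targets(model, PROSPECTS, top_n=2)
print(targets)
```

The selected IDs are what the operational software would hand to the touch point systems. In practice the model would be validated on held-out data before anyone mailed a catalog on its say-so.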

White Papers and Other Publications:

Over the past few years I have written or co-written a number of white papers and other publications looking at data mining and decision support technologies. Most of them are available below.

Useful Data Mining & CRM References:

If you would like to get more information on data mining and CRM, you might want to look at the following sites:

My Background:

Before my current position at Capital One, I was Director of Engineering at AnVil, an in silico drug discovery company focused on the commercial analysis of biological and clinical datasets. I was responsible for the development of AnVil's data analysis platform technology (ADAPT), an award-winning system of data mining tools used to automate the analysis of everything from gene expression microarray data to clinical healthcare records.

I came to AnVil from Wheelhouse, a marketing technology and services company, where I was Chief Scientist.   I founded the engineering organization, managed software development efforts, and set technology strategy.  In addition, as a Wheelhouse senior management team member, I performed numerous corporate duties including engaging clients, making sales calls, evaluating technologies, public speaking, fundraising (including a $52M series B investment round), etc.

Prior to my position at Wheelhouse, I was Director of Analytics at Xchange Inc., a leading CRM software vendor (and now part of Amdocs, Inc.).  I was responsible for directing the integration of analytic applications (data mining, customer optimization, decision support, and visualization) into Xchange's suite of marketing automation software applications.   

Before Xchange, I co-founded the data mining group at Dun & Bradstreet. At the time, D&B was a complicated collection of over twenty-five companies whose main purpose was to collect information and turn it into a form that other companies could use. This data covered everything from TV ratings (Nielsen Media) to prescriptions (IMS) to grocery purchases (A.C. Nielsen). Analyzing this data was a major component of the business, and I was a consultant to the various divisions, providing help in the areas of data mining and high-performance computing. I also worked with the Pilot Software division of D&B, where I developed new software applications to put data mining solutions in the hands of business users.

Before D&B, I was a senior scientist at an amazing company called Thinking Machines Corporation. TMC was a pioneer in the commercial development of massively parallel supercomputers. While I was at Thinking Machines I helped create Darwin, one of the first commercial data mining applications. Our early work with Darwin made use of the massive computational power available with a supercomputer, but later versions were adapted for use on less esoteric hardware. TMC eventually moved out of the supercomputer business and turned itself into a company focused exclusively on the data mining software market. Oracle Corporation eventually acquired TMC and incorporated Darwin into its database platform.

If you would like to know even more about me, you can check out my LinkedIn profile, resume, or list of publications. Besides the particulars of my work experience and education, you can find links to many of the papers I have written. The topics of these papers include data mining, artificial life, time-series prediction, parallel computers, and the future of personal computing.


Copyright © 2008 Kurt Thearling