Information about analytics and data science

Kurt Thearling


Data science, sometimes called data mining, is the automated extraction of hidden predictive information from large data sets. I have spent much of the last two decades building commercial analytic and data science systems, solving problems in fields ranging from financial services to biotechnology to advertising (click here for more about my background).

The purpose of this web site is to share information about analytics and data science. I hope you find it useful. 



My Analytics Book of the Month is Data Mining Techniques by Michael Berry and Gordon Linoff. This is the third edition of what I consider to be one of the best introductions to analytics and data mining. They cover both the practical applications of data mining and the techniques that underpin analytics. Michael and Gordon are two of the best consultants in the field, and this book contains nearly twenty years of their experience working with clients and solving real-world data analysis problems. Highly recommended. For more books on analytics and data mining, take a look at this list of data science books.

Competing on Analytics by Davenport and Harris


A while back I added online versions of the slides from a couple of talks I have given on data mining and analytic technologies:

What are analytics and data mining good for?

Data mining software allows users to analyze large databases to solve business decision problems. Data mining is, in some ways, an extension of statistics, with a few artificial intelligence and machine learning twists thrown in. Like statistics, data mining is not a business solution; it is just a technology. For example, consider a catalog retailer who needs to decide who should receive information about a new product. The information operated on by the data mining process is contained in a historical database of previous interactions with customers and the features associated with those customers, such as age, zip code, and their responses. The data mining software would use this historical information to build a model of customer behavior that could be used to predict which customers would be likely to respond to the new product. By using this information, a marketing manager can select only the customers who are most likely to respond. The operational business software can then feed the results of the decision to the appropriate touch point systems (call centers, direct mail, web servers, email systems, etc.) so that the right customers receive the right offers.
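To make the catalog retailer example a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tiny history table, the field names (age_band, zip_code, responded), and the function names are all made up, and a naive per-segment response rate stands in for whatever modeling technique a real data mining package would use. It builds a "model" from historical responses, scores new prospects, and selects the ones most likely to respond.

```python
from collections import defaultdict

# Hypothetical historical records: (age_band, zip_code, responded).
# A real system would read these from the customer interaction database.
HISTORY = [
    ("18-34", "02139", 1),
    ("18-34", "02139", 0),
    ("18-34", "02139", 1),
    ("35-54", "02139", 0),
    ("35-54", "02139", 0),
    ("35-54", "10001", 1),
    ("55+", "10001", 0),
    ("55+", "10001", 0),
]

# Hypothetical prospects for the new product mailing.
PROSPECTS = [
    {"id": "A", "age_band": "18-34", "zip_code": "02139"},
    {"id": "B", "age_band": "55+", "zip_code": "10001"},
    {"id": "C", "age_band": "35-54", "zip_code": "10001"},
    {"id": "D", "age_band": "18-34", "zip_code": "10001"},
]


def build_model(history):
    """Estimate the historical response rate for each (age_band, zip_code) segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [responses, customers]
    for age_band, zip_code, responded in history:
        segment = counts[(age_band, zip_code)]
        segment[0] += responded
        segment[1] += 1
    total_responses = sum(r for r, _ in counts.values())
    total_customers = sum(n for _, n in counts.values())
    overall_rate = total_responses / total_customers
    rates = {segment: r / n for segment, (r, n) in counts.items()}
    return rates, overall_rate


def score(model, customer):
    """Predicted response rate; unseen segments fall back to the overall rate."""
    rates, overall_rate = model
    return rates.get((customer["age_band"], customer["zip_code"]), overall_rate)


def select_targets(model, customers, top_n):
    """Return the top_n customers ranked by predicted response rate."""
    ranked = sorted(customers, key=lambda c: score(model, c), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    model = build_model(HISTORY)
    for customer in select_targets(model, PROSPECTS, top_n=2):
        print(customer["id"], round(score(model, customer), 3))
```

With the made-up data above, prospects C and A rank highest. The resulting list is what the operational software would hand off to the touch point systems; a production model would use many more features and a proper learning algorithm, and would be validated on held-out data before anyone relied on it.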

White Papers and Other Publications:

Over the past few years I have written or co-written a number of white papers and other publications looking at data mining and decision support technologies. Most of them are available below.

Useful Data Mining & CRM References:

If you would like to get more information on data mining and CRM, you might want to look at the following sites:

My Background:

I currently lead the analytics practice at WEX, a B2B payments company. Before that I worked at DigitasLBI, an advertising agency that leverages data and analytics to better target digital advertising. Before DigitasLBI, I ran the analytics business for Vertex Business Services, a multi-national customer management company. I was the first analytics person hired at Vertex and built and led a Decision Sciences group spanning three continents. Our clients covered multiple industries including retail, utilities, healthcare, government, and financial services.

Before I joined Vertex, I led Capital One's advanced technology & innovation organization, with a specific focus on accelerating the use of new ways to do data science.  This involved developing a corporate-wide analytic infrastructure plan, managing relationships with key analytic vendors, and identifying and deploying new analytic tools and techniques across the company.

Before Capital One, I was Director of Engineering at AnVil, an in silico drug discovery company focused on the commercial analysis of biological and clinical datasets. I was responsible for the development of AnVil's data analysis platform technology (ADAPT), an award-winning system of data mining tools used to automate the analysis of everything from gene expression microarray data to clinical healthcare records.

I came to AnVil from Wheelhouse, a marketing technology and services company, where I was Chief Scientist.   I founded the engineering organization, managed software development efforts, and set technology strategy.  In addition, as a Wheelhouse senior management team member, I performed numerous corporate duties including engaging clients, making sales calls, evaluating technologies, public speaking, fundraising (including a $52M series B investment round), etc.

Prior to my position at Wheelhouse, I was Director of Analytics at Xchange Inc., a leading CRM software vendor (and now part of Amdocs, Inc.).  I was responsible for directing the integration of analytic applications (data mining, customer optimization, decision support, and visualization) into Xchange's suite of marketing automation software applications.   

Before Xchange, I co-founded the data mining group at Dun & Bradstreet. At the time, D&B was a complicated collection of over twenty-five companies whose main purpose was to collect information and turn it into a form that other companies could use. This data covered everything from TV ratings (Nielsen Media) to prescriptions (IMS) to grocery purchases (A.C. Nielsen). Analyzing this data was a major component of the business, and I was a consultant to the various divisions, providing help in the areas of data mining and high-performance computing. I also worked with the Pilot Software division of D&B and developed new software applications to put data mining solutions in the hands of business users.

Before D&B, I was a senior scientist at an amazing company called Thinking Machines Corporation. TMC was a pioneer in the commercial development of massively parallel supercomputers. While I was at Thinking Machines, I helped create Darwin, one of the first commercial data mining applications. Our early work with Darwin made use of the massive computational power available with a supercomputer, but later versions were adapted for use on less esoteric hardware. TMC eventually moved out of the supercomputer business and turned itself into a company focused exclusively on the data mining software market. Oracle Corporation eventually acquired TMC and incorporated Darwin into its database platform.

If you would like to know even more about me, you can check out my LinkedIn profile or list of publications. Besides the particulars of my work experience and education, you can find links to many of the papers I have written. The topics of these papers include data mining, artificial life, time-series prediction, parallel computers, and the future of personal computing.


Copyright 2017 Kurt Thearling