Data Mining for CRM
By Kurt Thearling
www.thearling.com
From: The Data Mining and Knowledge Discovery Handbook, Springer, Maimon and Rokach (editors)
Abstract: Data mining technology allows marketing organizations to better
understand their customers and respond to their needs. This chapter describes
how data mining can be combined with customer relationship management to help
drive improved interactions with customers. An example of the process using data
mining to drive customer acquisition activities is presented.
Key words: customer relationship management, CRM, campaign management, customer
acquisition, analytics, scoring
1. WHAT IS CRM?
It is now a cliché that in the days of the corner market, shopkeepers had no
trouble understanding their customers and responding quickly to their needs. The
shopkeepers would simply keep track of each customer in their heads, and would
know what to do when a customer walked into the store. But today's shopkeepers
face a much more complex situation. More customers, more products, more
competitors, and less time to react means that understanding your customers is
now much harder to do. This is where customer relationship management (CRM)
comes in. CRM lets companies design, manage, and execute strategies for
interacting with customers (and potential customers). CRM can be applied to the
complete customer life-cycle, from acquisition, to ongoing account management,
to cross-selling, to customer retention and attrition.
The goal of CRM is to allow marketing organizations to tune the customer
interaction strategies to the specific needs of each individual, giving
customers what they want, when they want it. Instead of interacting with large
numbers of customers en masse (consider billboards or magazine advertisements),
the new role of marketing is to interact with individual customers. This
involves identifying and understanding unique customer patterns as well as the
creation of customized offers for small customer groups that correspond to those
patterns. For example, a pattern might be that 71% of cell-phone customers that
make five or more calls to customer support in the first month cancel their
service. A marketing manager could use this pattern to identify unsatisfied
customers and proactively respond to their needs before they cancel.
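A pattern like the one above is a conditional rate. The following is a minimal
Python sketch of how such a rate could be computed and used to flag customers
who have not yet cancelled. The records and field names
(support_calls_first_month, cancelled) are invented for illustration and do not
come from any real system.

```python
# Hypothetical customer records; field names and values are illustrative only.
customers = [
    {"id": 1, "support_calls_first_month": 6, "cancelled": True},
    {"id": 2, "support_calls_first_month": 0, "cancelled": False},
    {"id": 3, "support_calls_first_month": 5, "cancelled": True},
    {"id": 4, "support_calls_first_month": 7, "cancelled": False},
    {"id": 5, "support_calls_first_month": 1, "cancelled": False},
]

def cancel_rate(records, min_calls):
    """Fraction of customers with at least min_calls support calls who cancelled."""
    matching = [r for r in records if r["support_calls_first_month"] >= min_calls]
    if not matching:
        return None  # the pattern cannot be evaluated without matching customers
    return sum(r["cancelled"] for r in matching) / len(matching)

heavy_callers_rate = cancel_rate(customers, min_calls=5)

# Customers who match the pattern but have not cancelled yet are the ones
# a marketing manager could still reach proactively.
at_risk_ids = [r["id"] for r in customers
               if r["support_calls_first_month"] >= 5 and not r["cancelled"]]
```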
As a result of the complex interactions that are now possible, the function of
marketing is increasingly becoming tied to technology, ranging from complex data
mining algorithms to campaign management software applications. Campaign
management software allows marketing users to segment groups of customers (and
prospective customers) into smaller groups and then specify the interaction that
should take place with those individuals. For example, consider a marketing
manager for a cellular phone company that is focusing on customer retention.
There might be a large number of reasons that a customer chooses to leave their
cellular provider, and the marketing manager is responsible for identifying ways
to reduce this problem. One group of customers might be leaving because they are
experiencing technical problems (e.g., frequent dropped calls) while another
group might be leaving because the plan they are signed up for does not match
their current calling patterns (e.g., a local calling plan with a large number
of national calls).
A user of campaign management software would define these segments by selecting
customers in the database that have the desired characteristics. For the
customers with technical problems, the marketer could create a customer segment
that selects those customers who have had more than five dropped calls within
the last month. Once the segment is defined, it needs to be associated with
offers that will be communicated to the customers in order to improve retention.
In the case of customers with technical problems, the offer might be a rebate of
one month's charges and a promise to improve service quality. This offer could
be communicated by a call from customer service or via a piece of direct mail
(email might be a third option, if the customer's email address is available).
The campaign management application would take the segment and split it into two
groups, half receiving a phone call and the other half receiving a piece of
direct mail (which half a customer falls into would be a random selection). Once
the definition of the segmentation is complete and the marketing manager is
satisfied with the campaign, it needs to be executed. This would be handled by a
scheduler that executes the campaign at regular intervals (e.g., every night at
2am). Upon execution of the campaign, the segments associated with the phone
call would be passed to the call-center software system, which would queue up
the customers who are supposed to receive the offer along with the specifics of
the script that the operator is supposed to use (full or half month rebate). The
direct mail segments would likely be handled differently, possibly by using an
external vendor (a “mail shop”) that would take a list of customers and produce
the actual envelopes that would be mailed. In this case, the campaign management
system would generate a file listing each of the customers, including their
address and offer type.
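The steps described above (define a segment, split it at random between two
channels, and produce a file for the mail shop) can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the customer
table, the field names, and the offer label one_month_rebate are hypothetical,
and a real campaign management system would read from a database and hand the
phone segment to call-center software.

```python
import csv
import io
import random

# Hypothetical customer table; the field names are assumptions for illustration.
customers = [
    {"id": i, "address": f"{i} Main St", "dropped_calls_last_month": i % 9}
    for i in range(1, 21)
]

def select_segment(records, min_dropped=5):
    """Customers with more than min_dropped dropped calls in the last month."""
    return [r for r in records if r["dropped_calls_last_month"] > min_dropped]

def split_by_channel(segment, seed=0):
    """Randomly assign half of the segment to a phone call and half to direct mail."""
    shuffled = segment[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return {"phone": shuffled[:half], "mail": shuffled[half:]}

def mail_file(mail_segment, offer="one_month_rebate"):
    """Build the CSV text a mail shop might receive: id, address, offer type."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "address", "offer"])
    for r in mail_segment:
        writer.writerow([r["id"], r["address"], offer])
    return out.getvalue()

segment = select_segment(customers)
channels = split_by_channel(segment)
mail_csv = mail_file(channels["mail"])
```

Fixing the random seed makes the split repeatable, which helps when the same
campaign definition is executed again by a scheduler.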
2. DATA MINING & CAMPAIGN MANAGEMENT
In the above discussion of campaign management, the selection criteria used to
define customer segments (“five dropped calls within the last month”) were
static, based on historical values stored in a database. Alternatively, some
decisions might be based on predicted values (scores) that are output by data
mining models. Scores can take just about any form, from numbers to strings to
entire data structures, but the most common scores are numbers (for example, the
probability of responding to a particular promotional offer). These scores can
be combined with static values to select the most appropriate prospects for a
targeted marketing campaign.
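As a rough illustration of combining a score with a static value, the sketch
below keeps only prospects that pass both a static rule and a score threshold.
The field names, the thresholds, and the scores themselves are assumptions made
up for this example.

```python
# Hypothetical prospects: one static field plus a model score (a response probability).
prospects = [
    {"id": "A", "dropped_calls_last_month": 7, "response_score": 0.62},
    {"id": "B", "dropped_calls_last_month": 2, "response_score": 0.91},
    {"id": "C", "dropped_calls_last_month": 9, "response_score": 0.18},
    {"id": "D", "dropped_calls_last_month": 6, "response_score": 0.44},
]

def select_targets(records, min_dropped=5, min_score=0.4):
    """Keep records that pass both the static rule and the score threshold."""
    chosen = [
        r for r in records
        if r["dropped_calls_last_month"] > min_dropped
        and r["response_score"] >= min_score
    ]
    # Contact the most promising prospects first.
    return sorted(chosen, key=lambda r: r["response_score"], reverse=True)

targets = [r["id"] for r in select_targets(prospects)]
```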
The actual execution of a data mining model (scoring) is distinct from the
process that creates the model. Typically, a model is used multiple times after
it is created to score data in different marketing campaigns. For example,
consider a model that has been created to predict the probability that a
customer will respond to the cell-phone retention campaign. The model would be
built by using historical data from customers and calls, as well as the
responses those customers had to various retention offers. After the model has
been created based on historical data, it can then be scored on new data in
order to make predictions about unseen behavior. This is what data mining is all
about.
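The separation between building a model and scoring with it can be sketched as
two functions. The "model" here is deliberately trivial, a table of historical
response rates per plan type, and stands in for whatever algorithm a real data
mining tool would use. The field names and data are hypothetical.

```python
from collections import defaultdict

def build_model(history):
    """Model building: estimate the response rate for each plan type from past offers."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for row in history:
        sent[row["plan"]] += 1
        responded[row["plan"]] += row["responded"]
    overall = sum(responded.values()) / sum(sent.values())
    rates = {plan: responded[plan] / sent[plan] for plan in sent}
    return {"rates": rates, "default": overall}

def score(model, customer):
    """Scoring: apply the finished model to a customer it has not seen before."""
    return model["rates"].get(customer["plan"], model["default"])

# Hypothetical historical responses to a retention offer (1 = responded).
history = [
    {"plan": "local", "responded": 1},
    {"plan": "local", "responded": 1},
    {"plan": "local", "responded": 0},
    {"plan": "national", "responded": 0},
    {"plan": "national", "responded": 1},
    {"plan": "national", "responded": 0},
    {"plan": "national", "responded": 0},
]

model = build_model(history)  # done once, on historical data
new_customers = [{"id": 101, "plan": "local"}, {"id": 102, "plan": "prepaid"}]
scores = {c["id"]: score(model, c) for c in new_customers}  # repeated per campaign
```

A plan type that never appeared in the historical data falls back to the
overall response rate, which is one simple way to handle unseen values.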
Scoring is the unglamorous workhorse of data mining. It doesn't have the
sexiness of a neural network or a genetic algorithm, but without it, data mining
is pretty useless. At the end of the day, after your data mining tools have
given you a great predictive model, there's still a lot of work to be done.
Scoring models against a customer database can be a time-consuming, error-prone
activity, so the key is to smoothly integrate it with the rest of the CRM
process. In the past, when a marketer wanted to run a campaign based on model
scores, he or she would have to call the model builder to have the model
manually run against a database so that a score file could be created. The
marketer then had to solicit the help of an IT staffer to merge the scores with
the marketing database. This disjointed process was fraught with problems and
errors and could take several weeks. Often, by the time the models were
integrated with the database, either the models were outdated or the campaign
opportunity had passed.
The solution is the tight integration of data mining and campaign management
technologies. Under this scenario, marketers can invoke statistical models from
within the campaign management application, score customer segments on the fly,
and quickly create campaigns targeted to customer segments offering the greatest
potential. The past few years have seen significant improvements by CRM vendors
with respect to integrating data mining into the CRM process. This trend is
expected to continue and CRM applications will drive more and more marketing
activities based on data mining results.
3. AN EXAMPLE: CUSTOMER ACQUISITION
For most businesses, the primary means of growth involves the acquisition of new
customers. This could involve finding customers who previously were not aware of
your product, were not candidates for purchasing your product (for example, baby
diapers for new parents), or customers who in the past have bought from your
competitors. Some of these customers might have been your customers previously,
which could be an advantage (more data might be available about them) or a
disadvantage (they might have switched as a result of poor service). In any
case, data mining can often help segment these prospective customers and
increase the response rates that an acquisition marketing campaign can achieve.
The traditional approach to customer acquisition involved a marketing manager
developing a combination of mass marketing (magazine advertisements, billboards,
etc.) and direct marketing (telemarketing, mail, etc.) campaigns based on their
knowledge of the particular customer base that was being targeted. In the case
of a marketing campaign trying to influence new parents to purchase a particular
brand of diapers, the mass marketing advertisements might be focused in
parenting magazines (naturally). The ads could also be placed in more mainstream
publications whose readership demographics (age, marital status, gender, etc.)
were similar to those of new parents.
In direct marketing, a marketing manager would select the demographics that they
are interested in (which could very well be the same characteristics used for
mass market advertising), and then work with a data vendor (sometimes known as a
service bureau) to obtain lists of customers who meet those characteristics.
Service bureaus have large databases containing millions of prospective
customers that can be segmented based on specific demographic criteria (age,
gender, interest in particular subjects, etc.). To prepare for the “diapers”
direct mail campaign, the marketing manager might request a list of prospects
from a service bureau. This list could contain people, aged 18 to 30, who have
recently purchased a baby stroller or crib (this information might be collected
from people who have returned warranty cards for strollers or cribs). The
service bureau will then provide the marketer with a computer file containing
the names and addresses for these customers so that the diaper company can
contact these customers with their marketing message.
It should be noted that because of the number of possible customer
characteristics, the concept of “similar demographics” has traditionally been an
art rather than a science. There usually are not hard-and-fast rules about
whether two groups of customers share the same characteristics. In the end, much
of the segmentation that took place in traditional direct marketing involved
hunches on the part of the marketing professional. In the case of 18-to-30 year
old purchasers of baby strollers, the hunch might be that people who purchase a
stroller in this age group are probably making the purchase before the arrival
of their first child (because strollers are saved and used for additional
children). They also haven't yet decided which brand of diapers to use. Seasoned
veterans of the marketing game know their customers well and are often quite
successful in making these kinds of decisions.
3.1 How Data Mining and Statistical Modeling Change Things
Although a marketer with a wealth of experience can often choose relevant
demographic selection criteria, the process becomes more difficult as the amount
of data increases. The complexities of the patterns increase, both with the
number of customers being considered and the increasing detail for each
customer. The past few years have seen tremendous growth in consumer databases,
so the job of segmenting prospective customers is becoming overwhelming.
Data mining can help this process, but it is by no means a solution to all of
the problems associated with customer acquisition. The marketer will need to
combine the potential customer list that data mining generates with offers that
people are interested in. Deciding what is an interesting offer is where the art
of marketing comes in.
3.2 Defining Some Key Acquisition Concepts
Before the process of customer acquisition begins, it is important to think
about the goals of the marketing campaign. In most situations, the goal of an
acquisition marketing campaign is to turn a group of potential customers into
actual customers of your product or service. This is where things can get a bit
fuzzy. There are usually many kinds of customers, and it can often take a
significant amount of time before someone becomes a valuable customer. When the
results of an acquisition campaign are evaluated, there are often different
kinds of responses that need to be considered.
The responses that come in as a result of a marketing campaign are called
“response behaviors.” The use of the word “behavior” is important because the
way in which different people respond to a particular marketing message can
vary. How a customer behaves as a result of the campaign needs to take into
consideration this variation. A response behavior defines a distinct kind of
customer action and categorizes the different possibilities so that they can be
further analyzed and reported on.
Binary response behaviors are the simplest kind of response. With a binary
response behavior, the customer response is either a yes or no. If someone is
sent a catalog, did they buy something from the catalog or not? At the highest
level, this is often the kind of response that is talked about. Binary response
behaviors do not convey any subtle distinctions between customer actions, but
these distinctions are not always necessary for effective marketing campaigns.
Beyond binary response behaviors are categorical response behaviors. As you
would expect, a categorical response behavior allows for multiple behaviors to
be defined. The rules that define the behaviors are arbitrary and are based on
the kind of business you are involved in. Going back to the example of sending
out catalogs, one response behavior might be defined to match if the customer
purchased women's clothing from the catalog, whereas a different behavior might
match when the customer purchased men's clothing. These behaviors can be refined
as far as deemed necessary (for example, “purchased men's red polo shirt”).
It should be noted that it is possible for different response behaviors to
overlap. A behavior might be defined for customers that purchased over $100 from
the catalog. This could overlap with the “purchased men's clothing” behavior if
the clothing that was purchased cost more than $100. Overlap can also be
triggered if the customer purchases more than one item (both men's and women's
shirts, for example) as a result of a single offer. Although overlapping
behaviors can complicate analysis and reporting, they tend to capture richer
information and will therefore provide a better understanding of your customers
in the future.
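One way to picture overlapping categorical response behaviors is as a set of
named rules, each tested independently against an order. The sketch below
assumes a hypothetical order structure (a list of items with dept and price
fields) and rule names invented for this example.

```python
# Each response behavior is a named rule over an order; the rules are illustrative.
BEHAVIORS = {
    "purchased_mens_clothing":
        lambda order: any(i["dept"] == "mens" for i in order["items"]),
    "purchased_womens_clothing":
        lambda order: any(i["dept"] == "womens" for i in order["items"]),
    "purchased_over_100":
        lambda order: sum(i["price"] for i in order["items"]) > 100,
}

def classify(order):
    """Return every behavior the order matches; more than one match is an overlap."""
    return sorted(name for name, rule in BEHAVIORS.items() if rule(order))

order = {"items": [{"dept": "mens", "price": 80.0},
                   {"dept": "womens", "price": 45.0}]}
matched = classify(order)
small_order = classify({"items": [{"dept": "mens", "price": 30.0}]})
```

Because every rule is evaluated on its own, a single order can match several
behaviors at once, which is exactly the overlap described above.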

Figure 1: Example response analysis broken down by behavior
There are usually several different kinds of positive response behaviors that can be associated with an acquisition marketing campaign. (This assumes that the goal of the campaign is to increase customer purchases, as opposed to an informational marketing campaign in which customers are simply told of your company's existence.) Some of the general categories of response behaviors (Figure 1) are the following:
a) Customer inquiry. The customer asks for more information about your products or services. This is a good start. The customer is definitely interested in your products — it could signal the beginning of a long-term customer relationship. You might also want to track conversions, which are follow-ups to inquiries that result in the purchase of a product.
b) Purchase of the offered product or products. This is the usual definition of success. You offered your products to someone, and they decided to buy one or more of them. Within this category of response behaviors, there can be many different kinds of responses. As mentioned earlier, both “purchased men's clothing” and “purchased women's clothing” fit within this category.
c) Purchase of a product different than the ones offered. Despite the fact that the customer purchased one of your products, it wasn't the one you offered. You might have offered the deluxe product and they chose to purchase the standard model (or vice-versa). In some sense, this is a very valuable response because you now have data on a customer/product combination that you would not otherwise have collected.
There are also typically two kinds of negative responses. The first is a
non-response. This is not to be confused with a definite refusal of your offer.
For example, if you contacted the customer via direct mail, there may be any
number of reasons why there was no response (wrong address, offer misplaced,
etc.). Other customer contact channels (outbound telemarketing, email, etc.) can
also result in ambiguous non-responses. The fact that there was no response does not
necessarily mean that the offer was rejected. As a result, the way you interpret
a non-response as part of additional data analysis will need to be thought out
(more on this later).
A rejection by the prospective customer is the other kind of negative response.
Depending on the offer and the contact channel, you can often determine exactly
whether or not the customer is interested in the offer (for example, an offer
made via outbound telemarketing might result in a definitive “no, I'm not
interested” response). Although it probably does not seem useful, the definitive
“no” response is often as valuable as the positive response when it comes to
further analysis of customer interests.
3.3 It All Begins with the Data
One of the differences between customer acquisition and most other marketing
applications of data mining revolves around the data that is used to build
predictive models. The amount of information that you have about people that you
do not yet have a relationship with is much more limited than the information
you have about your existing customers. In some cases, the data might be limited
to their address and/or phone number. The key to this process is finding a
relationship between the information that you do have and the behaviors you want
to model.
Most acquisition marketing campaigns begin with the prospect list. A prospect
list is simply a list of customers that have been selected because they are
likely to be interested in your products or services. There are numerous
companies around the world that will sell lists of customers, often with a
particular focus (for example, new parents, retired people, new car purchasers,
etc.).
Sometimes, it is necessary to add additional information to a prospect list by
overlaying data from other sources. For example, consider a prospect list that
contains only names and addresses. In terms of a potential data mining
analysis, the information contained in the prospect list is very weak. There
might be some patterns in the city, state, or ZIP code fields, but they would be
limited in their predictive power. To augment the data, information about
customers on the prospect list could be matched with external data. One simple
overlay involves combining the customer's ZIP code with U.S. census data about
average income, average age, and so on. This can be done manually or, as is
often the case with overlays, your list provider can take care of this
automatically.
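The simple ZIP code overlay described above amounts to a lookup join. The
following sketch assumes a hypothetical prospect list and a ZIP-level table
with avg_income and avg_age fields. All values are made up and are not real
census figures.

```python
# Hypothetical prospect list and ZIP-level figures; all values are made up.
prospects = [
    {"name": "P. Jones", "zip": "02139"},
    {"name": "R. Smith", "zip": "94110"},
    {"name": "T. Lee", "zip": "00000"},  # no overlay row exists for this ZIP code
]
census_by_zip = {
    "02139": {"avg_income": 72000, "avg_age": 31},
    "94110": {"avg_income": 85000, "avg_age": 36},
}

def overlay(records, lookup):
    """Attach ZIP-level fields to each prospect; unmatched ZIP codes get None values."""
    fields = ["avg_income", "avg_age"]
    enriched = []
    for r in records:
        extra = lookup.get(r["zip"], {})
        enriched.append({**r, **{f: extra.get(f) for f in fields}})
    return enriched

enriched = overlay(prospects, census_by_zip)
```

Keeping unmatched prospects with empty overlay fields, rather than dropping
them, leaves the decision about missing data to the later analysis.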
More complicated overlays are also possible. Customers can be matched against
purchase, response, and other detailed data that the data vendors collect and
refine. This data comes from a variety of sources including retailers, state and
local governments, and the customers themselves. If you are mailing out a car
accessories catalog, it might be useful to overlay information (make, model,
year) about any known cars that people on the prospect list might have
registered with their department of motor vehicles.
3.4 Test Campaigns
Once you have a list of prospect customers, there is still some work that needs
to be done before you can create predictive models for customer acquisition.
Unless you have data available from previous acquisition campaigns, you will
need to send out a test campaign in order to collect data for analysis. Besides
the customers you have selected for your prospect list, it is important to
include some other customers in the campaign, so that the data is as rich as
possible for future analysis. For example, assume that your prospect list (that
you purchased from a list broker) was composed of men over age 30 that recently
purchased a new car. If you were to market to these prospective customers and
then analyze the results, any patterns found by data mining would be limited to
sub-segments of the group of men over 30 who bought a new car. What about women
or people under age 30? By not including these people in your test campaign, it
will be difficult to expand future campaigns to include segments of the
population that are not in your initial prospect list. The solution is to
include a small random selection of customers whose demographics differ from the
initial prospect list. This random selection should constitute only a small
percentage of the overall marketing campaign, but it will provide valuable
information for data mining. You will need to work with your data vendor in
order to add a random sample to the prospect list. More sophisticated techniques
than random selection do exist, such as those found in statistical design of
experiments (DoE).
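Adding a small random sample to a test campaign can be sketched as follows. The
10% share, the record layout, and the source tag are assumptions chosen for
illustration. In practice the wider population would come from a data vendor
rather than from a list held in memory.

```python
import random

def build_test_campaign(prospect_list, wider_population, random_share=0.1, seed=0):
    """Add a small random sample from outside the prospect list to the test campaign."""
    prospect_ids = {p["id"] for p in prospect_list}
    outsiders = [p for p in wider_population if p["id"] not in prospect_ids]
    # Size the sample so that it makes up roughly random_share of the whole campaign.
    n_random = round(len(prospect_list) * random_share / (1 - random_share))
    n_random = min(n_random, len(outsiders))
    sample = random.Random(seed).sample(outsiders, n_random)
    # Tag each record so that later analysis can tell the two groups apart.
    tagged = [{**p, "source": "prospect_list"} for p in prospect_list]
    tagged += [{**p, "source": "random_sample"} for p in sample]
    return tagged

# Hypothetical data: 90 purchased prospects out of a population of 1,000 people.
population = [{"id": i} for i in range(1000)]
prospects = population[:90]
campaign = build_test_campaign(prospects, population)
```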
Although this circular process (customer interaction --> data collection -->
data mining --> customer interaction) exists in almost every application of data
mining to marketing, there is more room for refinement in customer acquisition
campaigns. Not only do the customers that are included in the campaigns change
over time, but the data itself can also change. Additional overlay information
can be included in the analysis when it becomes available. Also, the use of random
selection in the test campaigns allows for new segments of people to be added to
your customer pool.
Once you have started your test campaign, the job of collecting and categorizing
the response behaviors begins. Immediately after the campaign offers go out, you
need to track responses. The nature of the response process is such that
responses tend to trickle in over time, which means that the campaign can go on
forever. In most real-world situations, though, there is a threshold after which
you no longer look for responses. At that time, any customers on the prospect
list that have not responded are deemed “non-responses.” Before the threshold,
customers who have not responded are in a state of limbo, somewhere between a
response and a non-response.
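The response threshold can be expressed as a small classification rule. In this
sketch the 60-day window is an arbitrary assumption, and a response that arrives
after the cut-off is treated as a non-response, which is one possible policy
among several.

```python
from datetime import date, timedelta

def response_status(sent_on, responded_on, today, window_days=60):
    """Classify a prospect as a response, a non-response, or still pending."""
    cutoff = sent_on + timedelta(days=window_days)
    if responded_on is not None and responded_on <= cutoff:
        return "response"
    if today > cutoff:
        return "non-response"  # threshold passed without a timely response
    return "pending"  # the state of limbo before the threshold is reached

sent = date(2024, 1, 1)
statuses = [
    response_status(sent, date(2024, 1, 20), today=date(2024, 2, 1)),
    response_status(sent, None, today=date(2024, 2, 1)),
    response_status(sent, None, today=date(2024, 4, 1)),
]
```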
3.5 Building Data Mining Models Using Response Behaviors
With the test campaign response data in hand, the actual mining of customer
response behaviors can begin. The first part of this process requires you to
choose which behaviors you are interested in predicting, and at what level of
granularity. The level at which the predictive models work should reflect the
kinds of offers that you can make, not the kinds of responses that you can
track. It might be useful (for reporting purposes) to track catalog clothing
purchases down to the level of color and size. If all catalogs are the same,
however, it really doesn't matter what the specifics of a customer purchase are
for the data mining analysis. In this case (all catalogs are the same), binary
response prediction is the way to go. If separate men's and women's catalogs are
available, analyzing response behaviors at the gender level would be
appropriate. In either case, it is a straightforward process to turn the
lower-level categorical behaviors into a set of responses at the desired level
of granularity. If there are overlapping response behaviors, the duplicates
should be removed prior to mining.
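Rolling detailed behaviors up to a coarser level, and removing the duplicates
that result, might look like the sketch below. The behavior names and the
prefix-based mapping rule are assumptions made for this example.

```python
# Hypothetical tracked responses at a fine level of detail: (customer, behavior).
tracked = [
    (1, "purchased_mens_red_polo_shirt"),
    (1, "purchased_mens_blue_jeans"),
    (2, "purchased_womens_jacket"),
    (3, "purchased_mens_socks"),
    (3, "purchased_womens_scarf"),
]

def roll_up(behavior):
    """Map a detailed behavior to a catalog-level behavior (an assumed naming rule)."""
    if behavior.startswith("purchased_mens_"):
        return "purchased_mens"
    if behavior.startswith("purchased_womens_"):
        return "purchased_womens"
    return "other"

# A set removes the duplicates that appear once the detail is dropped.
by_catalog = sorted({(cust, roll_up(b)) for cust, b in tracked})

# Binary level: did the customer buy anything at all?
binary_responders = sorted({cust for cust, _ in tracked})
```

Customer 1 bought two men's items, which collapse into a single record at the
catalog level, while customer 3 keeps one record for each catalog.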
In some circumstances, predicting individual response behaviors might be an
appropriate course of action. With the movement toward one-to-one customer
marketing, the idea of catalogs that are custom-produced for each customer is
moving closer to reality. Existing channels such as the Internet or outbound
telemarketing also allow you to be more specific in the ways you target the
exact wants and needs of your prospective customers. A significant drawback of
the modeling of individual response behaviors is that the analytical processing
power required can grow dramatically because the data mining process needs to be
carried out multiple times, once for each response behavior that you are
interested in.
How you handle negative responses also needs to be thought out prior to the data
analysis phase. As discussed previously, there are two kinds of negative
responses: rejections and non-responses. Rejections, by their nature, correspond
to specific records in the database that indicate the negative customer
response. Non-responses, on the other hand, typically do not represent records
in the database. Non-responses usually correspond to the absence of a response
behavior record in the database for customers who received the offer.
There are two ways in which to handle non-responses. The most common way is to
translate all non-responses into rejections, either explicitly (by creating
rejection records for the non-responding customers) or implicitly (usually a
function of the data mining software used). This approach will create a data set
comprised of all customers who have received offers, with each customer's
response being positive (inquiry or purchase) or negative (rejections and
non-responses).
The second approach is to leave non-responses out of the analysis data set. This
approach is not typically used because it throws away so much data, but it might
make sense if the number of actual rejections is large (relative to the number
of non-responses); experience has shown that non-responses do not necessarily
correspond to a rejection of your product or services offering.
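The two ways of handling non-responses can be compared side by side in a short
sketch. The behavior labels (inquiry, purchase, rejection) and the customer
identifiers are hypothetical. A missing entry in the responses table stands for
the absence of a response record.

```python
def build_training_rows(offered_ids, responses, keep_non_responses=True):
    """Label every customer who received an offer as positive (1) or negative (0)."""
    rows = []
    for cust in offered_ids:
        behavior = responses.get(cust)  # None means no response record exists
        if behavior in ("inquiry", "purchase"):
            rows.append((cust, 1))
        elif behavior == "rejection":
            rows.append((cust, 0))
        elif keep_non_responses:
            rows.append((cust, 0))  # first approach: treat non-response as rejection
        # second approach: leave the non-responding customer out entirely
    return rows

offered = [1, 2, 3, 4, 5]
responses = {1: "purchase", 3: "rejection", 4: "inquiry"}  # hypothetical records
all_rows = build_training_rows(offered, responses)
explicit_only = build_training_rows(offered, responses, keep_non_responses=False)
```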
Once the data has been prepared, the actual data mining can be performed. The
target variable that the data mining software will predict is the response
behavior type at the level you have chosen (binary or categorical). Because some
data mining applications cannot predict non-binary variables, some finessing of
the data will be required if you are modeling categorical responses using
non-categorical software. The inputs to the data mining system are the input
variables and all of the demographic characteristics that you might have
available, especially any overlay data that you combined with your prospect
list.
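The text does not say what form the "finessing" takes. One common workaround,
shown here purely as an assumption, is to turn a single categorical target into
several binary targets (often called one-vs-rest) and build one binary model
for each. The category labels are made up.

```python
def one_vs_rest_targets(labels):
    """Turn one categorical target into a binary target per category."""
    categories = sorted(set(labels))
    return {cat: [1 if y == cat else 0 for y in labels] for cat in categories}

# Hypothetical categorical responses for five prospects.
labels = ["mens", "none", "womens", "mens", "none"]
targets = one_vs_rest_targets(labels)
```

Each list in targets could then be passed to a binary-only tool as its own
target variable, at the cost of one modeling run per category.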
In the end, a model (or models, if you are predicting multiple categorical
response behaviors) will be produced that will predict the response behaviors
that you are interested in. The models can then be used to score lists of
prospect customers in order to select only those who are likely to respond to
your offer. Depending on how the data vendors you work with operate, you might
be able to provide them with the model, and have them send you only the best
prospects. In the situation in which you are purchasing overlay data in order to
aid in the selection of prospects, the output of the modeling process should be
used to determine whether all of the overlay data is necessary. If a model does
not use some of the overlay variables, you might want to save some money and
leave out these unused variables the next time you purchase a prospect list.
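Both uses of the finished model, selecting the best prospects and spotting
overlay variables that were never used, can be sketched together. The weighted
sum below is a hand-written stand-in for a trained model, and the variable
names, weights, and the 50% cut-off are all assumptions for illustration.

```python
def select_best(prospects, score_fn, top_fraction=0.25):
    """Score every prospect and keep the highest-scoring fraction."""
    ranked = sorted(prospects, key=score_fn, reverse=True)
    keep = max(1, int(len(ranked) * top_fraction))
    return ranked[:keep]

def unused_variables(purchased_vars, model_vars):
    """Overlay variables that were paid for but that the model never uses."""
    return sorted(set(purchased_vars) - set(model_vars))

# Hypothetical model: a hand-written weighted sum standing in for a trained one.
weights = {"avg_income": 0.00001, "owns_car": 0.3}

def score_fn(p):
    return sum(w * p[name] for name, w in weights.items())

prospects = [
    {"id": 1, "avg_income": 40000, "owns_car": 0, "avg_age": 30},
    {"id": 2, "avg_income": 90000, "owns_car": 1, "avg_age": 45},
    {"id": 3, "avg_income": 60000, "owns_car": 1, "avg_age": 52},
    {"id": 4, "avg_income": 30000, "owns_car": 0, "avg_age": 27},
]
best_ids = [p["id"] for p in select_best(prospects, score_fn, top_fraction=0.5)]
droppable = unused_variables(["avg_income", "owns_car", "avg_age"], weights)
```

Here avg_age never enters the score, so it is a candidate to leave out of the
next list purchase.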