An Introduction to Data Mining
Outline
Resources
A Problem...
… A Solution
The Big Picture
Defining Data Mining
Goal of Data Mining
Data Mining Is…
Data Mining Is Not…
Convergence of Three Technologies
1. Increasing Computing Power
2. Improved Data Collection
3. Improved Algorithms
Common Uses of Data Mining
Definition: Predictive Model
Models
Scoring
Two Ways to Use a Model
How Good is a Predictive Model?
Lift Curves
Receiver Operating Characteristic Curves
Kinds of Data Mining Problems
Supervised vs. Unsupervised Learning
Clustering of Gene Markers
How are Models Built and Used?
What the Real World Looks Like
Mining Technology is Just One Part
Data Mining Fits into a Larger Process
Example: Workflow in Oracle 11i
What Caused this Complexity?
Legal and Ethical Issues
Data is the Foundation for Analytics
Don’t Make Assumptions About the Data
The Data Mining Process
Generalization vs. Overfitting
Cross Validation
Some Popular Data Mining Algorithms
Two Good Algorithm Books
A Very Simple Problem Set
Regression Models
Regression Models
k-Nearest-Neighbor (kNN) Models
Time Savings with kNN
Developing a Nearest Neighbor Model
Example of a Nearest Neighbor Model
Example: Nearest Neighbor
(Feed Forward) Neural Networks
Processor Defines Network
Processor Defines Network
Multilayer Neural Networks
Adjusting the Weights
Neural Network Example
Neural Network Issues
Comparing kNN and Neural Networks
Rule Induction
Rule Induction (cont.)
Decision Trees
Types of Decision Trees
Decision Tree Model
Decision Trees & Understandability
Supervised Algorithm Summary
Other Data Mining Techniques
K-Means Clustering
Self-Organizing Maps (SOM)
Self-Organizing Maps (SOM)
Text Mining
Text Can Be Combined with Other Data
Text Can Be Combined with Other Data
Commercial Data Mining Software
What is Currently Happening?
Top Data Mining Vendors Today
Standards in Data Mining
Data Mining Moving into the Database
SAS Enterprise Miner
Enterprise Miner Capabilities
Enterprise Miner User Interface
SPSS Clementine
Insightful Miner
Oracle Darwin
Angoss KnowledgeSTUDIO
Usability and Understandability
User Needs to Trust the Results
Visualization Can Find Data Problems
Visualization Can Provide Insight
Visualization Can Show Relationships
The Books of Edward Tufte
Small Multiples
CrossGraphs Clinical Trial Software
OLAP Analysis
Micro/Macro
Inxight: Table Lens
Thank You