An Introduction to Data Mining

Outline

Resources

A Problem…

… A Solution

The Big Picture

Defining Data Mining

Goal of Data Mining

Data Mining Is…

Data Mining Is Not…

Convergence of Three Technologies

1. Increasing Computing Power

2. Improved Data Collection

3. Improved Algorithms

Common Uses of Data Mining

Definition: Predictive Model

Models

Scoring

Two Ways to Use a Model

How Good is a Predictive Model?

Lift Curves

Receiver Operating Characteristic Curves

Kinds of Data Mining Problems

Supervised vs. Unsupervised Learning

Clustering of Gene Markers

How are Models Built and Used?

What the Real World Looks Like

Mining Technology is Just One Part

Data Mining Fits into a Larger Process

Example: Workflow in Oracle 11i

What Caused this Complexity?

Legal and Ethical Issues

Data is the Foundation for Analytics

Don’t Make Assumptions About the Data

The Data Mining Process

Generalization vs. Overfitting

Cross Validation

Some Popular Data Mining Algorithms

Two Good Algorithm Books

A Very Simple Problem Set

Regression Models

Regression Models (cont.)

k-Nearest-Neighbor (kNN) Models

Time Savings with kNN

Developing a Nearest Neighbor Model

Example of a Nearest Neighbor Model

Example: Nearest Neighbor

(Feed Forward) Neural Networks

Processor Defines Network

Processor Defines Network (cont.)

Multilayer Neural Networks

Adjusting the Weights

Neural Network Example

Neural Network Issues

Comparing kNN and Neural Networks

Rule Induction

Rule Induction (cont.)

Decision Trees

Types of Decision Trees

Decision Tree Model

Decision Trees & Understandability

Supervised Algorithm Summary

Other Data Mining Techniques

K-Means Clustering

Self-Organizing Maps (SOM)

Self-Organizing Maps (SOM) (cont.)

Text Mining

Text Can Be Combined with Other Data

Text Can Be Combined with Other Data (cont.)

Commercial Data Mining Software

What is Currently Happening?

Top Data Mining Vendors Today

Standards in Data Mining

Data Mining Moving into the Database

SAS Enterprise Miner

Enterprise Miner Capabilities

Enterprise Miner User Interface

SPSS Clementine

Insightful Miner

Oracle Darwin

Angoss KnowledgeSTUDIO

Usability and Understandability

User Needs to Trust the Results

Visualization Can Find Data Problems

Visualization Can Provide Insight

Visualization Can Show Relationships

The Books of Edward Tufte

Small Multiples

CrossGraphs Clinical Trial Software

OLAP Analysis

Micro/Macro

Inxight: Table Lens

Thank You