Data Mining with SQL Server 2005

DATA MINING – A BETTER WAY TO DESIGN A STIMULUS PROGRAM LIKE “CASH FOR CLUNKERS”

presented to fwPASS on 1/26/2010

About Me

Work for Systemental as a Consultant and Software Developer

Software development to support Corporate business process improvement since 2000 (Lean or Continuous Improvement Initiatives)

.Net since 2004

President, fwPASS.org

Mfg. Eng. Technology degrees from Ball State University

Six Sigma Black Belt, Certified

http://www.systemental.com/

http://www.fwpass.org/

What We Will Cover

Data mining – what is it?

“Cash for Clunkers”

Other examples

Amazon.com

Coke Freestyle

Basic Data Mining Concepts

Demo time

Wikipedia

Data mining is the process of extracting patterns from data. Data mining is becoming an increasingly important tool to transform these data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.


Cash for Clunkers

Columbia City: SR 30 & SR 9

Objectives of “Cash for Clunkers”

Jump-start automotive sector sales

Specifically higher-mileage (more fuel-efficient) vehicles

Get gas guzzlers off the street

Cash for Clunkers

How did they decide who to target and how?

How would you do it?

Where did the data come from?

Where should the data come from?

Who to target?

Anyone, everyone, or targeted

Self qualified

Organic growth or just “pull up” existing sales

Convert foreign sales to GM

Conflict of interest? – Government motors

Discriminatory?

Estimating the effectiveness

Effect of “pull up” vs. organic growth

Peripheral commercial effect

Estimation of payback

Sales, plates and excise tax

Income tax from lay-off recalls

Reduction of unemployment

Auto Insurance

Reduction in tax revenue at gas pumps

Data content and source

Public records

CAFE

GM Data

Industry sponsored studies

Amazon.com

SQL Server 2005 Data Mining

Nine algorithms (3rd party pluggable)

Both Modeling and exploration in VS

Integrated tools: SS*S (SSIS, SSAS, SSRS)

API

Data Mining Extensions to SQL (DMX)

Type of analysis

Optimization vs. Predictive

Descriptive – provides deeper understanding of existing data

Predictive – provides insight to understand probability of future conditions

Data Mining Objective

Classification – assign data to known classes (discrete)

Segmentation – clustering in similar groups

Estimation – predicting continuous values

Association – what events occur together

Forecasting – time-series estimation of future values
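
To make the Association objective concrete, here is a minimal Python sketch that counts which items occur together and reports support and confidence for each pair. The baskets, item names, and the `pair_rules` helper are hypothetical and exist only for illustration; this is not the SQL Server Association Rules algorithm itself.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence)} for every rule a -> b.

    support    = share of baskets containing both a and b
    confidence = share of baskets containing a that also contain b
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)  # ignore duplicate items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to report
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Hypothetical shopping baskets (made-up data).
baskets = [
    ["book", "lamp"],
    ["book", "lamp", "pen"],
    ["book", "pen"],
    ["lamp"],
]
rules = pair_rules(baskets, min_support=0.5)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Raising `min_support` trims rare pairs, which is the usual first lever for keeping the rule list manageable.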

Algorithms

1. Decision Trees (attributes from the tree)

2. Naive Bayes (uses all attributes)

3. Clustering

4. Linear Regression

5. Logistic Regression

6. Neural Nets

7. Sequence Clustering

8. Time Series

9. Association Rules (discrete only)
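
As a rough illustration of how Naive Bayes "uses all attributes", the Python sketch below multiplies one probability per attribute into the score for each class and returns the winning class with a confidence value. The vehicle data, column meanings, and function names are assumptions made up for this example, and the add-one smoothing is a common textbook choice, not necessarily what Microsoft's implementation does.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count how often each attribute value appears within each class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (column index, class) -> value counts
    values_seen = defaultdict(set)       # column index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return (most likely class, probability of that class)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)  # class prior
        for i, value in enumerate(row):  # every attribute contributes
            numerator = value_counts[(i, label)][value] + 1  # add-one smoothing
            denominator = count + len(values_seen[i])
            score += math.log(numerator / denominator)
        log_scores[label] = score
    # Convert log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / norm


# Hypothetical cases: (body style, mpg band) -> was the vehicle traded in?
rows = [("truck", "low"), ("truck", "low"), ("suv", "low"),
        ("sedan", "high"), ("sedan", "high"), ("sedan", "low")]
labels = ["yes", "yes", "yes", "no", "no", "no"]

model = train_naive_bayes(rows, labels)
label, confidence = predict(model, ("truck", "low"))
print(label, round(confidence, 3))
```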

DMX

Column syntax: Name, data type, content type, [usage]

Case being analyzed – key

Content type: key, key sequence, key time, discrete, continuous, discretized (# of buckets)

Usage: Input, predict, predict-only (not to build any other part of model)
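
The column syntax above can be sketched with a small Python helper that assembles a DMX `CREATE MINING MODEL` statement from (name, data type, content type, usage) tuples. The model name, column names, and helper functions are hypothetical. The sketch assumes that a column with no usage keyword is treated as an input and that `DISCRETIZED(EQUAL_AREAS, 5)` requests five buckets; check both against the DMX reference before relying on them.

```python
def column_definition(name, data_type, content_type, usage=None):
    """Render one column as: [Name] data-type content-type [usage]."""
    parts = [f"[{name}]", data_type, content_type]
    if usage:  # omitted usage is assumed to mean "input"
        parts.append(usage)
    return " ".join(parts)


def create_mining_model(model_name, columns, algorithm):
    """Assemble a CREATE MINING MODEL statement from column tuples."""
    body = ",\n".join("    " + column_definition(*col) for col in columns)
    return (
        f"CREATE MINING MODEL [{model_name}]\n"
        f"(\n{body}\n)\n"
        f"USING {algorithm}"
    )


# Hypothetical columns for a vehicle trade-in model.
columns = [
    ("VehicleId", "LONG", "KEY"),                           # the case key
    ("VehicleAge", "LONG", "DISCRETIZED(EQUAL_AREAS, 5)"),  # 5 buckets
    ("Mpg", "DOUBLE", "CONTINUOUS"),
    ("BodyStyle", "TEXT", "DISCRETE"),
    ("TradedIn", "TEXT", "DISCRETE", "PREDICT"),            # column to predict
]

statement = create_mining_model("ClunkerModel", columns, "Microsoft_Decision_Trees")
print(statement)
```

Swapping `"PREDICT"` for `"PREDICT_ONLY"` marks a column that is predicted but not used to build any other part of the model.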

Structure

Datamart, DW, cube

Data source

Mining Structure (which fields)

Mining Models (algorithms, attributes)

Viewers (tree, clusters, discrimination, classification)

Training the model

SSIS Percentage Sampling Data Flow Component

Training, Testing

Estimating error
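
The training and testing idea can be sketched in Python as follows: hold back a percentage of the cases, then measure how often the model is wrong on the cases it never saw. The data, the 70/30 split, and the threshold "model" are made up for illustration. Unlike this sketch, the SSIS Percentage Sampling component does this inside a data flow and returns approximately the requested percentage rather than an exact count.

```python
import random


def percentage_split(rows, percent, seed=0):
    """Shuffle a copy of rows; return (sample, remainder) by percentage."""
    rng = random.Random(seed)  # fixed seed keeps the split repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * percent / 100)
    return shuffled[:cut], shuffled[cut:]


def error_rate(predict, cases):
    """Fraction of (features, actual) cases the predictor gets wrong."""
    wrong = sum(1 for features, actual in cases if predict(features) != actual)
    return wrong / len(cases)


# Hypothetical cases: mpg -> qualifies as a clunker ("yes" below 18 mpg).
cases = [(mpg, "yes" if mpg < 18 else "no") for mpg in range(10, 30)]

training, testing = percentage_split(cases, percent=70)


def rule(mpg):
    """Stand-in for a trained model: a deliberately imperfect threshold."""
    return "yes" if mpg < 20 else "no"


print("training error:", error_rate(rule, training))
print("testing error:", error_rate(rule, testing))
```

A testing error that is much worse than the training error is the usual sign of an over-fitted model.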

Demos

Visual Studio

SSMS

Win Client

Web Client

Miscellaneous

Sequence or timing

Prediction + measure of confidence

Caution: Over-fitting the model

Nested tables, e.g. transactional detail data

Nested table key is never the foreign key to the case table

Key is what the table is about
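
One way to picture nested tables is the Python sketch below, which folds transactional detail rows into their parent case. All table and column names are hypothetical. It reflects one reading of the two key rules above: the foreign key (`customer_id`) only links detail rows to the case, while the nested key is the thing each nested row is about (`product`).

```python
from collections import defaultdict


def build_cases(customers, purchases):
    """Attach each customer's purchase rows as a nested table."""
    nested = defaultdict(list)
    for row in purchases:
        # customer_id is used only for the join; it is dropped from the
        # nested row, whose key is the product.
        nested[row["customer_id"]].append(
            {"product": row["product"], "quantity": row["quantity"]}
        )
    return [
        {**customer, "purchases": nested.get(customer["customer_id"], [])}
        for customer in customers
    ]


# Hypothetical case table and transactional detail table.
customers = [
    {"customer_id": 1, "region": "Midwest"},
    {"customer_id": 2, "region": "South"},
]
purchases = [
    {"customer_id": 1, "product": "book", "quantity": 2},
    {"customer_id": 1, "product": "lamp", "quantity": 1},
]

cases = build_cases(customers, purchases)
for case in cases:
    print(case)
```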

References

http://dean-o.blogspot.com/

http://abbottanalytics.blogspot.com/

http://www.thearling.com/umass/index_frame.htm

http://www.thearling.com/text/dmtechniques/dmtechniques.htm

MSDN webcast: Applying SQL Server 2005 Data Mining to Enterprise

http://msftasprodsamples.codeplex.com/wikipage?title=SS2005!Data%20Mining%20Web%20Controls%20Library

http://msftasprodsamples.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=34035

Programming SQL Server 2005, Microsoft Press, Andrew J. Brust and Stephen Forte – Chapter 20







Thank you!

Website http://www.systemental.com

Blogs http://dean-o.blogspot.com/ http://practicalhoshin.blogspot.com

Twitter http://www.twitter.com/deanwillson

Email [email protected]

LinkedIn http://www.linkedin.com/in/deanwillson
