DATA MINING – A BETTER WAY TO DESIGN A STIMULUS PROGRAM LIKE “CASH FOR CLUNKERS” (presented to fwPASS on 1/26/2010)

Data Mining with SQL Server 2005

DESCRIPTION

Inspired by recent political and economic events, this presentation provides a conceptual overview of, and a technical primer on, data mining, using the "Cash for Clunkers" program as a hypothetical example for the discussion. Related blog post: "Cash for Clunkers - a Typical Sales Campaign?" http://practicalhoshin.blogspot.com/2009/08/cash-for-clunkers-typical-sales.html

Page 1: Data Mining with SQL Server 2005

DATA MINING – A BETTER WAY TO DESIGN A STIMULUS PROGRAM LIKE “CASH FOR CLUNKERS”

presented to fwPASS on 1/26/2010

Page 2: Data Mining with SQL Server 2005

About Me

Work for Systemental as a Consultant and Software Developer

Software development to support corporate business process improvement since 2000 (Lean or Continuous Improvement initiatives)

.NET since 2004

President, fwPASS.org

Mfg. Eng. Technology degrees from Ball State University

Six Sigma Black Belt, Certified

Page 3: Data Mining with SQL Server 2005

What We Will Cover

Data mining – what is it?

“Cash for Clunkers”

Other examples

Amazon.com

Coke Freestyle

Basic Data Mining Concepts

Demo time

Page 4: Data Mining with SQL Server 2005

Wikipedia

Data mining is the process of extracting patterns from data. Data mining is becoming an increasingly important tool to transform these data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.

Page 5: Data Mining with SQL Server 2005

Cash for Clunkers

Columbia City: SR 30 & SR 9

Page 6: Data Mining with SQL Server 2005

Objectives of “Cash for Clunkers”

Jump start automotive sector sales

Specifically, vehicles with higher fuel economy (more miles per gallon)

Get gas guzzlers off the street

Page 7: Data Mining with SQL Server 2005

Cash for Clunkers

How did they decide who to target and how?

How would you do it?

Where did the data come from?

Where should the data come from?

Page 8: Data Mining with SQL Server 2005

Who to target?

Anyone, everyone, or targeted

Self qualified

Organic growth or just “pull up” existing sales

Convert foreign sales to GM

Conflict of interest? – “Government Motors”

Discriminatory?

Page 9: Data Mining with SQL Server 2005

Estimating the effectiveness

Effect of “pull up” vs. organic growth

Peripheral commercial effect

Estimation of payback

Sales, plates and excise tax

Income tax from lay-off recalls

Reduction of unemployment

Auto Insurance

Reduction in tax revenue at gas pumps
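
The payback items above can be combined into a rough net estimate. The sketch below is only an illustration, not the program's actual accounting: the function name, the parameters, and every figure are assumptions made up for the example. It assumes rebates are paid on every sale, while only organic (truly new) sales bring in revenue that would not have arrived anyway.

```python
def net_payback(total_sales, organic_share, rebate,
                revenue_per_sale, gas_tax_loss_per_sale):
    """Rough net payback of a rebate program (all inputs hypothetical).

    "Pull up" sales only shift timing, so they add cost but no new revenue.
    """
    organic_sales = total_sales * organic_share
    rebate_cost = total_sales * rebate
    new_revenue = organic_sales * revenue_per_sale   # taxes, plates, etc.
    lost_gas_tax = total_sales * gas_tax_loss_per_sale
    return new_revenue - rebate_cost - lost_gas_tax


# Made-up inputs purely for illustration.
result = net_payback(total_sales=1000, organic_share=0.25, rebate=4000,
                     revenue_per_sale=6000, gas_tax_loss_per_sale=100)
print(result)
```

Changing organic_share shows how sensitive the estimate is to the split between “pull up” and organic sales, which is the kind of quantity a predictive model could help estimate.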

Page 10: Data Mining with SQL Server 2005

Data content and source

Public records

CAFE

GM Data

Industry sponsored studies

Page 11: Data Mining with SQL Server 2005

Amazon.com

Page 12: Data Mining with SQL Server 2005

SQL Server 2005 Data Mining

Nine algorithms (3rd party pluggable)

Both modeling and exploration in Visual Studio (VS)

Integrated tools: SS*S (SSIS, SSAS, SSRS, SSMS)

API

Data Mining Extensions to SQL (DMX)

Page 13: Data Mining with SQL Server 2005

Type of analysis

Descriptive vs. Predictive

Descriptive – provides deeper understanding of existing data

Predictive – provides insight into the probability of future conditions

Page 14: Data Mining with SQL Server 2005

Data Mining Objective

Classification – assign data to known classes (discrete)

Segmentation – clustering into groups of similar cases

Estimation – predicting continuous values

Association – what events occur together

Forecasting – time-series estimation of future values

Page 15: Data Mining with SQL Server 2005

Algorithms

1. Decision Trees (uses only the attributes that end up in the tree)

2. Naive Bayes (uses all attributes)

3. Clustering

4. Linear Regression

5. Logistic Regression

6. Neural Nets

7. Sequence Clustering

8. Time Series

9. Association Rules (discrete only)
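
As a small illustration of what an association algorithm measures, the sketch below computes support and confidence for item pairs, in the spirit of the Amazon.com "bought together" example. It is a brute-force teaching example under stated assumptions, not the Microsoft Association Rules implementation; the function name and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules "a implies b"."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # discrete items, no duplicates
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n                  # share of baskets with both items
        if support < min_support:
            continue
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Hypothetical shopping baskets.
baskets = [
    ["book", "bookmark"],
    ["book", "bookmark", "lamp"],
    ["book", "lamp"],
    ["book"],
]
rules = pair_rules(baskets, min_support=0.5)
for rule, (support, confidence) in sorted(rules.items()):
    print(rule, support, confidence)
```

Support says how often the items occur together at all; confidence says how often the second item appears given the first.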

Page 16: Data Mining with SQL Server 2005

DMX

Column syntax: Name, data type, content type, [usage]

Case being analyzed – key

Content type: key, key sequence, key time, discrete, continuous, discretized (# of buckets)

Usage: input (the default), PREDICT, PREDICT_ONLY (the column is predicted but is not used as an input for predicting other columns)
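
To make the column syntax concrete, here is a minimal Python sketch that assembles a DMX CREATE MINING MODEL statement from (name, data type, content type, usage) tuples. The helper functions, the model name ClunkerBuyers, and all column names are hypothetical; only the general statement shape and the keywords (KEY, CONTINUOUS, DISCRETIZED, DISCRETE, PREDICT, Microsoft_Decision_Trees) come from DMX itself.

```python
def column_def(name, data_type, content_type, usage=None):
    """Render one DMX column: name, data type, content type, [usage]."""
    parts = [f"[{name}]", data_type, content_type]
    if usage:  # input is the default usage, so it is simply omitted
        parts.append(usage)
    return " ".join(parts)


def create_mining_model(model, columns, algorithm):
    """Build the text of a CREATE MINING MODEL statement."""
    body = ",\n".join("    " + column_def(*col) for col in columns)
    return f"CREATE MINING MODEL [{model}]\n(\n{body}\n)\nUSING {algorithm}"


# Hypothetical columns for a "who buys a new car?" model.
columns = [
    ("CustomerKey", "LONG", "KEY"),                           # the case key
    ("Age", "LONG", "CONTINUOUS"),
    ("TradeInMpg", "DOUBLE", "DISCRETIZED(EQUAL_AREAS, 5)"),  # 5 buckets
    ("BoughtNewCar", "TEXT", "DISCRETE", "PREDICT"),
]
dmx = create_mining_model("ClunkerBuyers", columns, "Microsoft_Decision_Trees")
print(dmx)
```

The printed statement could then be pasted into a DMX query window in SSMS against an Analysis Services database.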

Page 17: Data Mining with SQL Server 2005

Structure

Datamart, DW, cube

Data source

Mining Structure (which fields)

Mining Models (algorithms, attributes)

Viewers (tree, clusters, discrimination, classification)

Page 18: Data Mining with SQL Server 2005

Training the model

SSIS Percentage Sampling Data Flow Component

Training, Testing

Estimating error
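
The training/testing idea can be sketched outside of SSIS as well. The example below is a rough stand-in for a percentage sampling step followed by an error estimate on the held-out rows. The data, the function names, and the toy threshold "model" are all assumptions for illustration; a real project would train one of the nine algorithms instead.

```python
import random


def percentage_split(rows, percent, seed=42):
    """Split rows into (sample, rest), roughly like a percentage sampling step."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * percent / 100)
    return shuffled[:cut], shuffled[cut:]


def train_threshold(rows):
    """Toy model: midpoint between the mean MPG of buyers and non-buyers."""
    yes = [mpg for mpg, bought in rows if bought]
    no = [mpg for mpg, bought in rows if not bought]
    return (sum(yes) / len(yes) + sum(no) / len(no)) / 2


def error_rate(predict, rows):
    """Fraction of rows where the prediction disagrees with the label."""
    wrong = sum(1 for mpg, bought in rows if predict(mpg) != bought)
    return wrong / len(rows)


# Hypothetical cases: (trade-in MPG, bought a new car?)
cases = [(mpg, mpg < 18) for mpg in range(10, 30)]
training, testing = percentage_split(cases, percent=70)

threshold = train_threshold(training)             # fit on training rows only
test_error = error_rate(lambda mpg: mpg < threshold, testing)
print(len(training), len(testing), test_error)
```

Measuring error on rows the model never saw during training is also the basic guard against over-fitting.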

Page 19: Data Mining with SQL Server 2005

Demos

Visual Studio

SSMS

Win Client

Web Client

Page 20: Data Mining with SQL Server 2005

Miscellaneous

Sequence or timing

Prediction + measure of confidence

Caution: Over-fitting the model

Nested tables, e.g. transactional detail data

The nested table's key is never the foreign key to the case table

The key is what the table is about
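
The nested table idea can be pictured as a data structure. In this hypothetical Python sketch (class, field, and product names are all assumptions), each case is a customer, and the nested table holds that customer's purchases. The nested key is the product, which is what the purchase table is about. The customer key is used only to attach rows to the right case.

```python
from dataclasses import dataclass, field


@dataclass
class Purchase:
    """One nested-table row; its key is the thing the table is about."""
    product: str          # nested key: the product, not the customer id
    quantity: int = 1


@dataclass
class CustomerCase:
    """One case: a customer plus a nested table of transactional detail."""
    customer_key: int     # case key
    age: int
    purchases: list = field(default_factory=list)


def nest(customers, transactions):
    """Group flat (customer_key, product, quantity) rows under each case."""
    cases = {key: CustomerCase(key, age) for key, age in customers}
    for customer_key, product, quantity in transactions:
        # The foreign key only routes the row; it is not stored in the row.
        cases[customer_key].purchases.append(Purchase(product, quantity))
    return list(cases.values())


# Made-up sample data.
customers = [(1, 34), (2, 51)]
transactions = [(1, "sedan", 1), (1, "floor mats", 2), (2, "truck", 1)]
cases = nest(customers, transactions)
for case in cases:
    print(case.customer_key, [p.product for p in case.purchases])
```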

Page 21: Data Mining with SQL Server 2005

References

http://dean-o.blogspot.com/

http://abbottanalytics.blogspot.com/

http://www.thearling.com/umass/index_frame.htm

http://www.thearling.com/text/dmtechniques/dmtechniques.htm

MSDN webcast: Applying SQL Server 2005 Data Mining to Enterprise

http://msftasprodsamples.codeplex.com/wikipage?title=SS2005!Data%20Mining%20Web%20Controls%20Library

http://msftasprodsamples.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=34035

Programming SQL Server 2005, Microsoft Press, Andrew J. Brust and Stephen Forte – Chapter 20

Page 22: Data Mining with SQL Server 2005

Thank you!

Website http://www.systemental.com

Blogs http://dean-o.blogspot.com/ http://practicalhoshin.blogspot.com

Twitter http://www.twitter.com/deanwillson

Email [email protected]

LinkedIn http://www.linkedin.com/in/deanwillson