dean-willson
DESCRIPTION
Inspired by recent political and economic events, this presentation provides a conceptual overview of data mining and a technical primer on it, using the "Cash for Clunkers" program as a hypothetical example for the discussion. Related blog post: "Cash for Clunkers - a Typical Sales Campaign?" http://practicalhoshin.blogspot.com/2009/08/cash-for-clunkers-typical-sales.html
DATA MINING – A BETTER WAY TO DESIGN A STIMULUS PROGRAM LIKE “CASH FOR CLUNKERS”
presented to fwPASS on 1/26/2010
About Me
Work for Systemental as a Consultant and Software Developer
Software development to support Corporate business process improvement since 2000 (Lean or Continuous Improvement Initiatives)
.Net since 2004
President, fwPASS.org
Mfg. Eng. Technology degrees from Ball State University
Six Sigma Black Belt, Certified
What We Will Cover
Data mining – what is it?
“Cash for Clunkers”
Other examples
Amazon.com
Coke Freestyle
Basic Data Mining Concepts
Demo time
Wikipedia
Data mining is the process of extracting patterns from data. Data mining is becoming an increasingly important tool to transform these data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.
Cash for Clunkers
Columbia City: SR 30 & SR 9
Objectives of “Cash for Clunkers”
Jump start automotive sector sales
Specifically higher mileage vehicles
Get gas guzzlers off the street
Cash for Clunkers
How did they decide who to target and how?
How would you do it?
Where did the data come from?
Where should the data come from?
Who to target?
Anyone, everyone, or targeted
Self qualified
Organic growth or just “pull up” existing sales
Convert foreign sales to GM
Conflict of interest? – Government motors
Discriminatory?
Estimating the effectiveness
Effect of “pull up” vs. organic growth
Peripheral commercial effect
Estimation of payback
Sales, plates and excise tax
Income tax from lay-off recalls
Reduction of unemployment
Auto Insurance
Reduction in tax revenue at gas pumps
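The payback items above can be combined into one rough net figure. The following is a minimal sketch, assuming that only organic (incremental) sales bring in new revenue, that rebates are paid on every sale, and that fuel tax is lost on every replaced vehicle. The function name, its parameters and all figures are hypothetical placeholders, not program data.

```python
def net_payback(total_units, pull_up_share, revenue_per_unit,
                rebate_per_unit, fuel_tax_loss_per_unit):
    """Rough net payback of a rebate campaign (all inputs are hypothetical).

    pull_up_share: fraction of campaign sales that were only pulled forward.
    revenue_per_unit: tax and related revenue credited per incremental sale.
    """
    if not 0.0 <= pull_up_share <= 1.0:
        raise ValueError("pull_up_share must be between 0 and 1")
    # Only sales that would not have happened anyway count as a gain.
    organic_units = total_units * (1.0 - pull_up_share)
    gains = organic_units * revenue_per_unit
    # Assumption: every sale costs a rebate and some future fuel tax.
    costs = total_units * (rebate_per_unit + fuel_tax_loss_per_unit)
    return gains - costs


# Placeholder inputs: 1,000 sales, half of them pulled forward.
example = net_payback(1000, 0.5, 3000.0, 1000.0, 100.0)
print(example)
```

The sketch makes the sensitivity to "pull up" explicit: as pull_up_share approaches 1, the gains vanish while the costs stay the same.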
Data content and source
Public records
CAFE
GM Data
Industry sponsored studies
Amazon.com
SQL Server 2005 Data Mining
Nine algorithms (3rd party pluggable)
Both Modeling and exploration in VS
Integrated tools: SS*S (SSIS, SSAS, SSRS, SSMS)
API
Data Mining Extensions to SQL (DMX)
Type of analysis
Optimization vs. Predictive
Descriptive – provides deeper understanding of existing data
Predictive – provides insight into the probability of future conditions
Data Mining Objective
Classification – assign data to known classes (discrete)
Segmentation – clustering in similar groups
Estimation – predicting continuous values
Association – what events occur together
Forecasting – time-series estimation of future values
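Of the objectives above, association ("what events occur together") is the easiest to show in a few lines. The following is a minimal sketch that counts item pairs and reports support and confidence. It is not the SQL Server Association Rules algorithm, and the baskets are made-up data; pair_rules and min_support are names chosen for this illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence of a -> b)} for frequent pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore duplicates in a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n                  # share of baskets with both items
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Hypothetical shopping baskets.
baskets = [
    ["book", "bookmark"],
    ["book", "bookmark", "lamp"],
    ["book", "lamp"],
    ["lamp"],
]
rules = pair_rules(baskets)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Support measures how common the pair is overall; confidence measures how often the second item appears given the first, which is the figure behind "customers who bought this also bought" suggestions.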
Algorithms
1. Decision Trees (attributes from the tree)
2. Naive Bayes (uses all attributes)
3. Clustering
4. Linear Regression
5. Logistic Regression
6. Neural Nets
7. Sequence Clustering
8. Time Series
9. Association Rules (discrete only)
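To make one of the listed algorithms concrete, here is a minimal sketch of a categorical Naive Bayes classifier, which multiplies evidence from all input attributes. It is a teaching sketch with Laplace smoothing, not Microsoft's implementation; the column names (mpg, age, traded) and the rows are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)   # (attribute, class) -> value counts
    values_seen = defaultdict(set)        # attribute -> distinct values
    for row in rows:
        for attr, value in row.items():
            if attr != target:
                value_counts[(attr, row[target])][value] += 1
                values_seen[attr].add(value)
    return class_counts, value_counts, values_seen


def predict(model, case):
    """Return (most likely class, its normalized probability)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total           # prior probability of the class
        for attr, value in case.items():
            # Laplace smoothing keeps unseen values from zeroing the product.
            seen = value_counts[(attr, label)][value]
            score *= (seen + 1) / (n_label + len(values_seen[attr]))
        scores[label] = score
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


# Hypothetical training cases: did the owner trade the vehicle in?
rows = [
    {"mpg": "low", "age": "old", "traded": "yes"},
    {"mpg": "low", "age": "old", "traded": "yes"},
    {"mpg": "low", "age": "new", "traded": "no"},
    {"mpg": "high", "age": "old", "traded": "no"},
    {"mpg": "high", "age": "new", "traded": "no"},
]
model = train_naive_bayes(rows, "traded")
label, confidence = predict(model, {"mpg": "low", "age": "old"})
print(label, round(confidence, 3))
```

The second return value is the kind of confidence measure that should accompany any prediction: a class label alone does not say how strongly the data supports it.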
DMX
Column syntax: Name, data type, content type, [usage]
Case being analyzed – key
Content type: key, key sequence, key time, discrete, continuous, discretized (# of buckets)
Usage: Input, predict, predict-only (not to build any other part of model)
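The column syntax above (name, data type, content type, optional usage) can be seen by assembling a DMX statement. The following is a minimal sketch in Python that renders a CREATE MINING MODEL statement as text. The model and column names are hypothetical, the bucket count is arbitrary, and the output should be checked against the DMX reference before being run on a server. Columns with no usage keyword are treated as inputs.

```python
def render_create_mining_model(model_name, columns, algorithm):
    """Build a DMX CREATE MINING MODEL statement from column definitions.

    Each column is (name, data_type, content_type, usage); usage may be None,
    in which case the column is a plain input.
    """
    parts = []
    for name, data_type, content_type, usage in columns:
        pieces = [f"[{name}]", data_type, content_type]
        if usage:
            pieces.append(usage)
        parts.append("    " + " ".join(pieces))
    body = ",\n".join(parts)
    return f"CREATE MINING MODEL [{model_name}]\n(\n{body}\n)\nUSING {algorithm}"


# Hypothetical columns for a trade-in model.
columns = [
    ("OwnerKey", "LONG", "KEY", None),                         # the case key
    ("VehicleAge", "LONG", "DISCRETIZED(EQUAL_AREAS, 4)", None),
    ("CombinedMpg", "DOUBLE", "CONTINUOUS", None),
    ("TradedIn", "TEXT", "DISCRETE", "PREDICT"),               # input and output
]
statement = render_create_mining_model(
    "ClunkerTradeIn", columns, "Microsoft_Decision_Trees"
)
print(statement)
```

Replacing PREDICT with PREDICT_ONLY on the last column would keep TradedIn as an output without letting it feed any other part of the model.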
Structure
Datamart, DW, cube
Data source
Mining Structure (which fields)
Mining Models (algorithms, attributes)
Viewers (tree, clusters, discrimination, classification)
Training the model
SSIS Percentage Sampling Data Flow Component
Training, Testing
Estimating error
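The training steps above can be sketched outside SSIS as well. The following is a minimal Python sketch that imitates percentage sampling by routing each row at random to a training or testing set, then estimates error on the held-out rows. It is not the SSIS component itself; the function names, the seed and the sample data are assumptions made for illustration.

```python
import random


def percentage_split(rows, train_percent, seed=0):
    """Randomly route about train_percent of rows to training, the rest to testing."""
    rng = random.Random(seed)             # fixed seed for a repeatable split
    train, test = [], []
    for row in rows:
        (train if rng.random() * 100 < train_percent else test).append(row)
    return train, test


def error_rate(predict, test_rows, target):
    """Fraction of held-out rows whose predicted target is wrong."""
    if not test_rows:
        raise ValueError("no test rows to score")
    wrong = sum(1 for row in test_rows if predict(row) != row[target])
    return wrong / len(test_rows)


# Hypothetical cases: vehicles under 18 mpg were traded in.
rows = [{"mpg": m, "traded": "yes" if m < 18 else "no"} for m in range(10, 30)]
train, test = percentage_split(rows, 70)


def always_no(row):
    """Naive baseline that never predicts a trade-in."""
    return "no"


print(len(train), len(test))
print(error_rate(always_no, rows, "traded"))
```

Scoring only rows the model never saw during training is what makes the error estimate honest, and it is the first defense against over-fitting.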
Demos
Visual Studio
SSMS
Win Client
Web Client
Miscellaneous
Sequence or timing
Prediction + measure of confidence
Caution: Over-fitting the model
Nested tables ex: transactional detail data
Key is never foreign key to case table
Key is what table is about
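The nested table points above can be pictured as a data structure. The following is a minimal sketch, assuming a customer case table and a purchase detail table; all table, field and function names are hypothetical. The customer id only links a detail row to its case, while the nested table's own key is the product, because products are what that table is about.

```python
from collections import defaultdict


def build_cases(customers, purchases):
    """Attach purchase detail rows to each customer case as a nested table."""
    nested = defaultdict(list)
    for purchase in purchases:
        # customer_id only links the row to its case; it is dropped from the
        # nested row, whose key is the product.
        nested[purchase["customer_id"]].append(
            {"product": purchase["product"], "quantity": purchase["quantity"]}
        )
    return [
        {**customer, "purchases": nested.get(customer["customer_id"], [])}
        for customer in customers
    ]


# Hypothetical case table and transactional detail table.
customers = [
    {"customer_id": 1, "region": "IN"},
    {"customer_id": 2, "region": "OH"},
]
purchases = [
    {"customer_id": 1, "product": "sedan", "quantity": 1},
    {"customer_id": 1, "product": "floor mats", "quantity": 2},
]
cases = build_cases(customers, purchases)
for case in cases:
    print(case)
```

Each case still counts once no matter how many detail rows it has, which is why the transactional data is nested rather than joined flat.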
References
http://dean-o.blogspot.com/
http://abbottanalytics.blogspot.com/
http://www.thearling.com/umass/index_frame.htm
http://www.thearling.com/text/dmtechniques/dmtechniques.htm
MSDN webcast: Applying SQL Server 2005 Data Mining to Enterprise
http://msftasprodsamples.codeplex.com/wikipage?title=SS2005!Data%20Mining%20Web%20Controls%20Library
http://msftasprodsamples.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=34035
Programming SQL Server 2005, Microsoft Press, Andrew J. Brust and Stephen Forte – Chapter 20
Thank you!
Website http://www.systemental.com
Blogs http://dean-o.blogspot.com/ http://practicalhoshin.blogspot.com
Twitter http://www.twitter.com/deanwillson
Email [email protected]
LinkedIn http://www.linkedin.com/in/deanwillson