How to Build Your First Predictive Model

How-To

Build Your First Model

A publication of

INTRO

Building your first model with a new data mining tool can be intimidating.

Even with some intuition for model building, it can be daunting to look at the default settings knowing you still have a long way to go before you have an accurate, explainable predictive model to hand over to your boss.

To make sure you’re set up for data mining success, follow these simple steps to build your first models in the SPM software suite.

Want to skip ahead? Here’s what we’re going to cover.

IMPORT DATA
• Prepare
• Stay Organized

MODEL SETUP
• Select an Engine
• Analysis Type
• Variables
• Testing
• Control Parameters

PERFORMANCE
• What To Look For
• What’s Next

IMPORT DATA

We’re going to walk you through best practices for preparing and uploading your data into the SPM software.

PREPARE

1. Make sure your data is in a ‘flat’ file (i.e., rows x columns).

2. Make sure you understand your variable labels! If you don’t understand what your variables represent, you’re going to have a heck of a time understanding your results.
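If you want to sanity-check that a file really is ‘flat’ before importing it, a few lines of Python can do it. This is only a sketch: it assumes a comma-separated file with a header row, and the function name `check_flat` and the sample columns are made up for illustration. It is not part of the SPM software.

```python
import csv
import io


def check_flat(text_stream):
    """Return (header, bad_rows) for a CSV stream.

    bad_rows lists (row_number, field_count) for every row whose number
    of fields differs from the header's. An empty list means the data is
    a clean rows x columns table.
    """
    reader = csv.reader(text_stream)
    header = next(reader, [])  # empty input gives an empty header
    bad_rows = []
    for row_number, row in enumerate(reader, start=2):  # header is row 1
        if len(row) != len(header):
            bad_rows.append((row_number, len(row)))
    return header, bad_rows


# Hypothetical sample data: the second data row is missing a value.
sample = io.StringIO("age,income,churned\n34,52000,no\n45,61000\n29,48000,yes\n")
header, bad_rows = check_flat(sample)
print("Variables:", header)
print("Rows with the wrong number of fields:", bad_rows)
```

To check a real file, pass an open file object instead of the `io.StringIO` sample, for example `open("my_data.csv", newline="")`.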

Want to read the nitty-gritty? Check out the complete SPM User Guide.

STAY ORGANIZED

Save your data set, or sets, in one easy-to-find folder. If you’re pulling in data from all over creation, you’re just making the process longer and harder to follow. Do yourself a favor and dedicate a directory to each data mining project you’re working on.
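One way to make the one-directory-per-project habit automatic is a tiny helper like the one below. Treat it as a sketch: the subfolder names (`data`, `models`, `reports`) and the project name are assumptions chosen for illustration, not a layout that SPM requires.

```python
import tempfile
from pathlib import Path


def make_project_dir(root, project_name, subfolders=("data", "models", "reports")):
    """Create root/project_name with one subfolder per entry; return its path.

    Existing folders are left untouched, so the call is safe to repeat.
    """
    project = Path(root) / project_name
    for sub in subfolders:
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


# Demo inside a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as root:
    project = make_project_dir(root, "churn_model")
    print(sorted(p.name for p in project.iterdir()))
```

In practice you would pass a permanent location for `root`, then keep every data file for that project under its `data` folder.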

Model Setup

Once you have imported your data, you need to set a few parameters (leaving most of them at their default settings) before you click ‘start.’

10 parameters to pay attention to when building a model

Select an Engine.

CART

MARS

TreeNet

Random Forests

CART Ensembles

RuleLearner/Model Compression

Regression

Logit

GPS/Generalized Lasso

Data Binning

Select an Analysis Type.

• Classification
• Regression
• Logistic Binary
• Unsupervised

SELECT A TARGET VARIABLE AND PREDICTORS

1. You must have a target variable.

2. You should have multiple predictors.

3. You don’t need to use all of your predictors.

4. Take note of categorical vs. continuous variables.
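To get a first read on which variables are categorical and which are continuous, you could run a rough check like this before opening the model setup. It is a heuristic sketch, not SPM functionality: the function names and the sample columns are invented, and the rule used (a column is continuous only if every non-missing value parses as a number) will mislabel numeric codes such as ZIP codes, so review the result by hand.

```python
import csv
import io


def is_number(value):
    """True if the text parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def guess_variable_types(text_stream):
    """Map each column name to 'continuous' or 'categorical'.

    A column is called continuous only if all of its non-missing values
    are numeric; one non-numeric value makes it categorical.
    """
    types = {}
    for row in csv.DictReader(text_stream):
        for name, value in row.items():
            if name is None or not value:  # extra field or missing value
                continue
            if not is_number(value):
                types[name] = "categorical"
            else:
                types.setdefault(name, "continuous")
    return types


# Hypothetical data set with one missing income value.
sample = io.StringIO(
    "age,income,region,churned\n34,52000,west,no\n45,,east,yes\n29,48000,west,no\n"
)
types = guess_variable_types(sample)
target = "churned"  # assumed target for this example
predictors = [name for name in types if name != target]
print(types)
print("Target:", target, "| Predictors:", predictors)
```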

SELECT A TESTING METHOD

• No independent testing: exploratory tree
• Fraction of cases selected at random for testing (%)
• Test sample contained in a separate file
• V-fold cross-validation (e.g., V = 10)
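If you are curious what two of these options do to your rows, here is a small sketch of a random test fraction and of V-fold cross-validation. It shows the general idea only; the function names are hypothetical and SPM’s own partitioning may differ in its details.

```python
import random


def holdout_split(n_cases, test_fraction, seed=0):
    """Randomly set aside a fraction of case indices for testing.

    Returns (learn_indices, test_indices).
    """
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    n_test = round(n_cases * test_fraction)
    return indices[n_test:], indices[:n_test]


def v_fold_splits(n_cases, v=10, seed=0):
    """Yield (learn_indices, test_indices) once per fold.

    Each case lands in the test part of exactly one fold.
    """
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::v] for i in range(v)]
    for i, test in enumerate(folds):
        learn = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield learn, test


learn, test = holdout_split(n_cases=100, test_fraction=0.2)
print(len(learn), "learn cases,", len(test), "test cases")

for fold_number, (learn, test) in enumerate(v_fold_splits(n_cases=100, v=10), start=1):
    print("fold", fold_number, "->", len(learn), "learn,", len(test), "test")
```

With V-fold cross-validation every case is used for testing exactly once, which is why it is a common choice when the data set is too small to give up a separate test sample.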

Salford Systems Recommends That You Manually Set Your:

• Learn rate

• Number of trees built

• Number of nodes in a tree

• Loss criterion

*These will vary depending on the modeling engine being used to build a model.
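To build some intuition for how these settings interact, the toy below boosts very small trees on a single predictor. It is a generic gradient-boosting sketch written for this guide, not the TreeNet implementation: every tree is fixed at two terminal nodes, the loss criterion is squared error, and the function names and data are invented. The two settings you can vary are `n_trees` and `learn_rate`.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that minimizes squared error of the residuals.

    Returns (threshold, left_mean, right_mean).
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest x would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_trees, learn_rate):
    """Return learn-sample predictions after adding n_trees shrunken stumps."""
    base = sum(ys) / len(ys)  # start from the mean of the target
    predictions = [base] * len(ys)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        predictions = [
            p + learn_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return predictions


def mse(ys, predictions):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


xs = list(range(10))
ys = [x ** 2 for x in xs]
for n_trees, learn_rate in [(5, 0.1), (50, 0.1), (5, 1.0)]:
    error = mse(ys, boost(xs, ys, n_trees, learn_rate))
    print(f"trees={n_trees:3d}  learn rate={learn_rate}  learn MSE={error:.2f}")
```

Keep in mind that this prints error on the learn sample only. A lower learn error from more trees or a higher learn rate does not guarantee better performance on test data.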

CLICK START!

YOU ARE NOW BUILDING YOUR FIRST MODEL

EVALUATING YOUR PERFORMANCE

Don’t get overwhelmed by all of the fancy reporting features available in the SPM software suite. Start slow. We will show you where to begin if you are new to using SPM and just want to understand what your model means.

What To Look For

• Mean Squared Error (MSE)
• R-Squared
• Test vs. Learn Performance
• Variable Performance
• Variable Dependence Plots (TreeNet)
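The first three items are easy to compute by hand, which helps when you want to understand what the report is telling you. The sketch below uses the standard definitions of MSE and R-squared; the actual and predicted values are made-up numbers, not output from SPM.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


# Hypothetical target values and model predictions for each sample.
learn_actual, learn_predicted = [3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 9.0]
test_actual, test_predicted = [4.0, 6.0, 8.0], [5.0, 5.0, 9.0]

for name, actual, predicted in [
    ("learn", learn_actual, learn_predicted),
    ("test", test_actual, test_predicted),
]:
    print(
        f"{name:5s}  MSE={mean_squared_error(actual, predicted):.3f}"
        f"  R-squared={r_squared(actual, predicted):.3f}"
    )
```

A test MSE much higher than the learn MSE, or a test R-squared much lower, is the usual sign that the model has fit noise in the learn sample.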

If you have already downloaded the SPM software, build a model!

Once you’ve built your first model, start tweaking some of the control parameters we discussed.

What is your best model performance so far?

… AND YOU’RE DONE!