[email protected] Machine Learning: Boosting Analytics Model Performance

Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

Page 1: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

Machine Learning: Boosting Analytics Model Performance

Page 2: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

THE JOB OF DATA SCIENTISTS

Does this sound familiar to anyone?

Page 3: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

AGENDA

1. Background

Explaining why boosting performance is relevant.

2. Strategy

How to design a strategy for boosting performance.

3. Features

How to use Feature Engineering to boost model performance.

4. Bonus Round

A collection of free resources for boosting model performance.

5. Questions

Time for questions from the audience.

Page 4: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

BOOSTING MODEL PERFORMANCE

Section 1: Background

Page 5: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

SECTION 1: Background

1. Background

Explaining why boosting performance is relevant.

Page 6: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

TIPS SOURCES

Where do the recommendations originate?

• 197 Kaggle Winner Interviews: How did they win?
• 50 In-depth Case Studies: Which factors mattered?
• 25,000 Head-to-Head Tests: What made the difference?

Page 7: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

WHERE HAVE THESE TIPS WORKED?

IMPORTANT: All views expressed are solely my own, and should not be taken as being those of current or past employers, clients or others.

Page 8: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

TWO CATEGORIES OF TIPS

Presentation Focus

Part 1: Model Strategy

The plan, method, series of tactics or stratagems for building your model.

Part 2: Data Preparation

The process for identifying, building, developing, standardizing, normalizing and engineering the correct inputs for one or more analytics processes.

Page 9: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

BOOSTING MODEL PERFORMANCE

Section 2: Model Strategy

Page 10: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

SECTION 2: Strategy

1. Background

Explaining why boosting performance is relevant.

2. Strategy

How to design a strategy for boosting performance.

Page 11: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

TIP 1: Leverage Extreme Ensembles

Large ensembles of models with non-correlated errors consistently deliver a bigger performance boost than single models or smaller ensembles.

2015 Liberty Mutual Contest (Owen Zhang)

• 6-layer process
• 5 distinct data prep steps
• 31 combined feature sets
• 2 layers of 3 models each

Source: Owen Zhang, Chief Product Officer at DataRobot, https://www.slideshare.net/OwenZhang2/tips-for-data-science-competitions

2015 KDD CUP (Jeong-Yoon Lee)

• 7 feature sets
• 64 component models
• 15 models in Level 1 Ensemble
• 2 models in Level 2 Ensemble

Source: Jeong-Yoon Lee, Chief Data Scientist at Conversion Logic, https://www.slideshare.net/jeongyoonlee/data-science-competition-72596610
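
A minimal sketch of why non-correlated errors help an ensemble. It is not a reconstruction of either winning solution: the five "models" are simulated as the true value plus independent noise (an assumption made purely for illustration), and the ensemble is a plain average rather than a multi-level stack.

```python
import random


def mse(predictions, truth):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth)


rng = random.Random(0)
truth = [rng.uniform(0, 10) for _ in range(2000)]

# Five hypothetical models whose errors are independent of one another.
models = [[t + rng.gauss(0, 1.0) for t in truth] for _ in range(5)]

# Simple averaging ensemble: one averaged prediction per row.
ensemble = [sum(row) / len(row) for row in zip(*models)]

single_mses = [mse(m, truth) for m in models]
ensemble_mse = mse(ensemble, truth)

print("best single-model MSE:", round(min(single_mses), 3))
print("ensemble MSE:", round(ensemble_mse, 3))
```

When the component errors are strongly correlated, the averaged error shrinks far less, which is why the tip stresses diversity over sheer model count.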

Page 12: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

TIP 2: Reduce the Decision Space

MARKETING: Eliminate irrelevant populations

• Seed lists
• Old, unusable lead sources
• Discontinued markets

FRAUD: Eliminate “safer” populations

• Low dollar thresholds
• “Best” customers
• Higher authentication transactions
• “Standing” transactions
• Canceled transfers

GENERAL: Other instances

• What do you already know?
• What is beyond your influence?
• Which problems can be handled separately?
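
One way to apply this tip is to filter out excluded populations before any model sees the data, and to record why each record was set aside. The sketch below assumes a fraud setting; the field names (`amount`, `status`, `is_standing`) and the $5 cutoff are hypothetical and do not come from the slides.

```python
def reduce_decision_space(records, exclusion_rules):
    """Split records into (kept, excluded) using named exclusion rules."""
    kept, excluded = [], []
    for record in records:
        reasons = [name for name, rule in exclusion_rules.items() if rule(record)]
        if reasons:
            excluded.append((record, reasons))
        else:
            kept.append(record)
    return kept, excluded


# Hypothetical rules for "safer" populations; thresholds are assumptions.
fraud_rules = {
    "low_dollar": lambda r: r["amount"] < 5.00,
    "canceled_transfer": lambda r: r["status"] == "canceled",
    "standing_transaction": lambda r: r["is_standing"],
}

transactions = [
    {"id": 1, "amount": 2.50, "status": "posted", "is_standing": False},
    {"id": 2, "amount": 900.00, "status": "posted", "is_standing": False},
    {"id": 3, "amount": 120.00, "status": "canceled", "is_standing": False},
    {"id": 4, "amount": 60.00, "status": "posted", "is_standing": True},
]

to_model, set_aside = reduce_decision_space(transactions, fraud_rules)
print("rows left to model:", [r["id"] for r in to_model])
```

Keeping the exclusion reasons makes it easy to audit how much of the population each rule removes before committing to it.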

Page 13: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

TIP 3: Use Targeted AUC Instead of Total AUC

Match model objective to organizational objective. Example courtesy of ORACLE.

TARGETED AUC: Optimizes targeted model performance

• Less common approach
• Perfect for projects with target thresholds, such as limited marketing budgets or maximum fraud referral/turndown rates
• Sacrifices overall accuracy for accuracy at lower threshold targets

TOTAL AUC: Optimizes overall model performance

• Traditional approach
• Perfect for many Kaggle competitions
• Sacrifices accuracy at lower threshold targets for overall accuracy
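
The slide does not define "targeted AUC" precisely. A reasonable reading, assumed here, is a partial AUC: the area under the ROC curve only up to a chosen false-positive rate. The sketch below computes both quantities in plain Python; the function names and the toy scores are illustrative. scikit-learn's `roc_auc_score` exposes a related, standardized variant through its `max_fpr` parameter.

```python
def roc_points(y_true, scores):
    """Return ROC points (fpr, tpr), emitting one point per group of tied scores."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc_up_to(y_true, scores, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for fpr in [0, max_fpr]."""
    area = 0.0
    points = roc_points(y_true, scores)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]

total_auc = auc_up_to(labels, scores)            # whole curve
targeted_auc = auc_up_to(labels, scores, 0.2)    # only the first 20% of false positives
print(round(total_auc, 4), round(targeted_auc, 4))
```

Comparing candidate models on the targeted figure rewards ranking quality in the region the business will actually act on, such as the top of a referral queue.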

Page 14: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

TIP 4: Cross-Validate Everywhere

Reducing overfitting while extracting maximum learning from your data

OUT-OF-SAMPLE VALIDATION: Traditional methodology

CROSS-VALIDATION: Used to reduce both overfitting and outlier influence
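
A minimal sketch of k-fold cross-validation, in which every row is held out exactly once instead of relying on a single out-of-sample split. The "predict the training mean" model and the toy target values are placeholders chosen only to keep the example dependency-free.

```python
from statistics import mean


def k_fold_splits(n_rows, k):
    """Yield (train_idx, test_idx) pairs; every row is held out exactly once."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for held_out in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train_idx, folds[held_out]


def cross_validate(y, k=5):
    """Mean absolute error per fold for a placeholder 'training mean' model."""
    fold_errors = []
    for train_idx, test_idx in k_fold_splits(len(y), k):
        prediction = mean([y[j] for j in train_idx])
        fold_errors.append(mean([abs(y[j] - prediction) for j in test_idx]))
    return fold_errors


# Toy target with one outlier (100.0) to show its effect on a single fold.
y = [3.0, 5.0, 4.0, 6.0, 100.0, 5.0, 4.0, 3.0, 6.0, 5.0]
fold_errors = cross_validate(y, k=5)
cv_error = mean(fold_errors)
print("per-fold MAE:", [round(e, 2) for e in fold_errors])
print("cross-validated MAE:", round(cv_error, 2))
```

Looking at the per-fold errors, not just their average, shows how much a single unusual row can move the estimate, which a lone holdout split would hide.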

Page 15: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

TIP 5: Algorithm Arsenal

Leverage a diverse modeling arsenal

• Bayesian Network
• Gradient Boosting Machines
• Random Forests
• Logistic Regression
• Factorization Machines
• Neural Network
• Genetic Algorithms
• Support Vector Machines

Page 16: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

BOOSTING MODEL PERFORMANCE

Section 3: Features

Page 17: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

SECTION 3: Features

1. Background

Explaining why boosting performance is relevant.

2. Strategy

How to design a strategy for boosting performance.

3. Features

How to use Feature Engineering to boost model performance.

Page 18: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

TIP 7: Test Variable Transformation Functions

Features

Page 19: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

TIPS 8-13: Univariate Tree Feature Engineering

Features

1. Derive “Stumps”

“Stumps” represent the first split in decision trees, and make powerful “weak learners.” Create a derived feature for each input.

2. Bin Continuous Inputs

Using trees creates bin “boundaries” directly associated with the dependent variable, rather than a more arbitrary approach. Assign bins for each continuous input.

3. Handle Missing Values

Assigning missing values to a separate, unique category preserves information content and eliminates arbitrary replacement approaches.

4. Derive High-Impact Flags

Calling out tree nodes with uniquely powerful splitting capabilities as derived features leverages the most benefit from single inputs.

5. Normalize Scaling

Each input, regardless of data type, can have consistent, normalized scaling by using something like NORM Sigmoid or Yule’s Q for each terminal node from each univariate tree.

6. Overall Transformation

Re-coding the original input into the values from the terminal nodes makes interpretation much easier.
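
The stump and missing-value ideas can be sketched for a single continuous input and a binary target. This is an illustration under assumptions, not the presenter's implementation: the split criterion (weighted Gini impurity), the function names, and the `"low"`/`"high"`/`"missing"` labels are all choices made for the example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    rate = sum(labels) / len(labels)
    return 2 * rate * (1 - rate)


def best_stump_threshold(values, labels):
    """Find the single split on one input that minimizes weighted Gini impurity."""
    pairs = sorted((v, y) for v, y in zip(values, labels) if v is not None)
    best_threshold, best_impurity = None, float("inf")
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no boundary between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best_impurity:
            best_threshold, best_impurity = (low + high) / 2, impurity
    return best_threshold


def stump_feature(value, threshold):
    """Derived feature: missing values get their own category."""
    if value is None:
        return "missing"
    return "high" if value > threshold else "low"


values = [1, 2, 3, None, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1, 1]

threshold = best_stump_threshold(values, labels)
derived = [stump_feature(v, threshold) for v in values]
print(threshold, derived)
```

Growing the same univariate tree deeper than one split would yield the target-driven bin boundaries described in step 2.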

Page 20: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

TIP 14: Think “Crafts-person-ship”

Less “Assembly Line,” More “Fine Craftsmanship”

Moving Away From… Moving Toward…

Page 21: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

BOOSTING MODEL PERFORMANCE

Section 4: Bonus Round

Page 22: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

SECTION 4: Bonus Round

1. Background

Explaining why boosting performance is relevant.

2. Strategy

How to design a strategy for boosting performance.

3. Features

How to use Feature Engineering to boost model performance.

4. Bonus Round

A collection of free resources for boosting model performance.

Page 23: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

Bonus Round: Patent-Application IMPACT Features

Patent application approach for transforming and combining model inputs

1. Univariate Tree Models
2. Create Common Table of Values for Each Node
3. Calculate Z-Score Across Entire Table
4. Assign New Value to New Derived Feature
5. Calculate Avg., High and Low
6. Gradient Boosting
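
The slide gives only step labels, so the following is one possible reading of steps 2 through 5, not the patent application's actual method. It assumes the univariate trees (step 1) have already produced a target rate per terminal node; the node table, the input names, and the `impact_*` feature names are hypothetical.

```python
from statistics import mean, pstdev

# Step 2 (assumed form): one common table of terminal-node values for all inputs.
# Keys are (input name, terminal node label); values are observed target rates.
node_rates = {
    ("age", "low"): 0.02,
    ("age", "high"): 0.10,
    ("income", "low"): 0.08,
    ("income", "high"): 0.04,
}

# Step 3: z-score every node value against the entire table, not per input.
table_mean = mean(node_rates.values())
table_std = pstdev(node_rates.values())
node_z = {key: (rate - table_mean) / table_std for key, rate in node_rates.items()}


def impact_features(row_nodes):
    """Steps 4-5: map each input to its node z-score, then summarize per row."""
    z_values = [node_z[(name, node)] for name, node in row_nodes.items()]
    return {
        "impact_avg": mean(z_values),
        "impact_high": max(z_values),
        "impact_low": min(z_values),
    }


# A row that falls in the "high" age node and the "low" income node.
features = impact_features({"age": "high", "income": "low"})
print({name: round(value, 3) for name, value in features.items()})
```

Under this reading, the summary columns would then be passed to a gradient boosting model (step 6) alongside or instead of the raw inputs.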

Page 24: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

AGENDA

1. Background

Explaining why boosting performance is relevant.

2. Strategy

How to design a strategy for boosting performance.

3. Features

How to use Feature Engineering to boost model performance.

4. Bonus Round

A collection of free resources for boosting model performance.

5. Questions

Time for questions from the audience.

Page 25: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

Get in Touch

See you soon....

USA 1-443-810-8066

[email protected]

MktgSciences
3719 Yolando Road
Baltimore, MD 21218

Page 26: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

MODEL STRATEGY TIP 1

Cross-validate everywhere.

Source: Jeong-Yoon Lee, Chief Data Scientist at Conversion Logic, https://www.slideshare.net/jeongyoonlee/data-science-competition-72596610

Page 27: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

MODEL STRATEGY TIP 1

Cross-validate everywhere.

Source: Owen Zhang, Chief Product Officer at DataRobot, https://www.slideshare.net/OwenZhang2/tips-for-data-science-competitions

Page 28: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

THANK YOU...

Page 29: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

BOOSTING MODEL PERFORMANCE

Appendix

Page 30: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

DEFINITIONS

performance (noun):

“the manner in which or the efficiency with which something reacts or fulfills its intended purpose.”

Page 31: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

PERFORMANCE IS BEING MORE CLOSELY MEASURED

Moving Away From… Moving Toward…

Page 32: Machine Learning/ Data Science: Boosting Predictive Analytics Model Performance

PERFORMANCE WILL DETERMINE COMPENSATION

Like it or not, Data Science compensation will become more closely tied to model performance.