35
Evaluating Machine Learning Models – A Beginner’s Guide Alice Zheng, Dato September 15, 2015 1

Evaluating Machine Learning Models -- A Beginner's Guide

Embed Size (px)

Citation preview

Page 1: Evaluating Machine Learning Models -- A Beginner's Guide

1

Evaluating Machine Learning Models – A Beginner’s Guide

Alice Zheng, DatoSeptember 15, 2015

Page 2: Evaluating Machine Learning Models -- A Beginner's Guide

2

My machine learning trajectory

Applied machine learning(Data science)

Build ML tools

Shortage of expertsand good tools.

Page 3: Evaluating Machine Learning Models -- A Beginner's Guide

3

Why machine learning?

Model data.Make predictions.Build intelligent

applications.

Page 4: Evaluating Machine Learning Models -- A Beginner's Guide

4

Machine learning pipeline

I fell in love the instant I laid my eyes on that puppy. His big eyes and playful tail, his soft furry paws, …

Raw data

FeaturesModels

Predictions

Deploy inproduction

GraphLab Create

Dato Distributed

Dato Predictive Services

Page 5: Evaluating Machine Learning Models -- A Beginner's Guide

The ML Jargon Challenge

Page 6: Evaluating Machine Learning Models -- A Beginner's Guide

6

Typical machine learning paper

… semi-supervised model for with large-scale learning from sparse data … sub-modular optimization for distributed computation…evaluated on real and synthetic datasets…performance exceeds start-of-the-art methods

Page 7: Evaluating Machine Learning Models -- A Beginner's Guide

7

What it looks like to ML researchers

Such regularize!

Much optimal

So sparsity

Wow!

Amaze

Very scale

Page 8: Evaluating Machine Learning Models -- A Beginner's Guide

8

What it looks like to normal people

Page 9: Evaluating Machine Learning Models -- A Beginner's Guide

9

What it’s like in practice

Doesn’t scaleBrittle ?

Hard to tune

? ???

??? ?

Doesn’t solve my problem on my data

Page 10: Evaluating Machine Learning Models -- A Beginner's Guide

10

Achieve Machine Learning Zen

Page 11: Evaluating Machine Learning Models -- A Beginner's Guide

11

Why is evaluation important?• So you know when you’ve succeeded• So you know how much you’ve succeeded• So you can decide when to stop• So you can decide when to update the model

11

Page 12: Evaluating Machine Learning Models -- A Beginner's Guide

12

Basic questions for evaluation• When to evaluate?• What metric to use?• On what data?

12

Page 13: Evaluating Machine Learning Models -- A Beginner's Guide

13

When to evaluate

Onlineevaluation

Historical data

Onlineevaluation

results

Offlineevaluation

Livedata

Offlineevaluationon live data

Prototypemodel

Trainingresults

Validationresults

Deployed model

Page 14: Evaluating Machine Learning Models -- A Beginner's Guide

Evaluation Metrics

Page 15: Evaluating Machine Learning Models -- A Beginner's Guide

15

Types of evaluation metric• Training metric• Validation metric• Tracking metric• Business metric

“But they may not match!”

Uh-oh Penguin

Page 16: Evaluating Machine Learning Models -- A Beginner's Guide

16

Example: recommender system• Given data on which users liked which items,

recommend other items to users• Training metric

- How well is it predicting the preference score? - Residual mean squared error: (actual – predicted)2

• Validation metric- Does it rank known preferences correctly?- Ranking loss

Page 17: Evaluating Machine Learning Models -- A Beginner's Guide

17

Example: recommender system• Tracking metric

- Does it rank items correctly, especially for top items?- Normalized Discounted Cumulative Gain (NDCG)

• Business metric- Does it increase the amount of time the user spends on

the site/service?

Page 18: Evaluating Machine Learning Models -- A Beginner's Guide

18

Dealing with metrics• Many possible metrics at different

stages• Defining the right metric is an art

- What’s useful? What’s feasible?• Aligning the metrics will make

everyone happier- Not always possible: cannot directly

train model to optimize for user engagement

“Do the best you

can!”

Okedokey Donkey

Page 19: Evaluating Machine Learning Models -- A Beginner's Guide

Model Selection and Tuning

Page 20: Evaluating Machine Learning Models -- A Beginner's Guide

Model Selection and Tuning

Historical data

Hyperparametertuning

Trainingdata

Validationdata

Model training

Model

Trainingresults

Validationresults

Page 21: Evaluating Machine Learning Models -- A Beginner's Guide

21

Key questions for model selection• What’s validation?• What’s a hyperparameter and how do you tune it?

21

Page 22: Evaluating Machine Learning Models -- A Beginner's Guide

22

Model validation• Measure generalization error

- How well the model works on new data- “New” data = data not used during training

• Train on one dataset, validate on another• Where to find “new” data for validation?

- Clever re-use of old data

Page 23: Evaluating Machine Learning Models -- A Beginner's Guide

23

Methods for simulating new dataHold-out validation

Data

Training Validation

K-fold cross validation

Data

1 2 3 K…

Validation set

Bootstrap resampling

Data

Resampled dataset

Page 24: Evaluating Machine Learning Models -- A Beginner's Guide

24

Hyperparameter tuning vs. model training

Best model parameters

Best hyperparameters

Hyperparameter tuning

Model training

Page 25: Evaluating Machine Learning Models -- A Beginner's Guide

25

Hyperparameters != model parameters

Feature 2

Feature 1

Classification between two classes

Model parameter

Hyperparameter:How many features to use

Page 26: Evaluating Machine Learning Models -- A Beginner's Guide

26

Why is hyperparameter tuning hard?• Involves model training as a sub-process

- Can’t optimize directly• Methods:

- Grid search- Random search- Smart search

• Gaussian processes/Bayesian optimization• Random forests• Derivative-free optimization• Genetic algorithms

Page 27: Evaluating Machine Learning Models -- A Beginner's Guide

Online Evaluations

Page 28: Evaluating Machine Learning Models -- A Beginner's Guide

28

ML in production - 101

ModelHistorical

Data

Predictions

LiveData

Feedback

Batch training

Real-time predictions

Page 29: Evaluating Machine Learning Models -- A Beginner's Guide

29

ML in production - 101

ModelHistorical

Data

Real-time predictions

Batch training

PredictionsModel 2

LiveData

Page 30: Evaluating Machine Learning Models -- A Beginner's Guide

30

Why evaluate models online?• Track real performance of model over time• Decide which model to use when

Page 31: Evaluating Machine Learning Models -- A Beginner's Guide

31

Choosing between ML models

Model 2

Model 12000 visits10% CTR

Group A

Everybody gets Model 2

2000 visits30% CTR

Group B

Strategy 1: A/B testing—select the best model and use it all the time

Page 32: Evaluating Machine Learning Models -- A Beginner's Guide

32

Choosing between ML modelsA statistician walks into a casino…

Pay-off $1:$1000 Pay-off $1:$200 Pay-off $1:$500

Play this 85% of the time

Play this 10% of the time

Play this 5% of the time

Multi-armed bandits

Page 33: Evaluating Machine Learning Models -- A Beginner's Guide

33

Choosing between ML modelsA statistician walks into an ML production environment

Pay-off $1:$1000 Pay-off $1:$200 Pay-off $1:$500

Use this 85% of the time

(Exploitation)

Use this 10% of the time

(Exploration)

Use this 5% of the time

(Exploration)

Model 1 Model 2 Model 3

Page 34: Evaluating Machine Learning Models -- A Beginner's Guide

34

MAB vs. A/B testingWhy MAB?• Continuous optimization, “set and forget”• Maximize overall reward

Why A/B test?• Simple to understand• Single winner• Tricky to do right

Page 35: Evaluating Machine Learning Models -- A Beginner's Guide

35

That’s not all, folks!Read the details• Blog posts: http://blog.dato.com

/topic/machine-learning-primer • Report: http://oreil.ly/1L7dS4a

• Dato is hiring! [email protected]

[email protected] @RainyData