Practical Advice for Debugging ML Algorithms Stephen Gould, Cheng Soon Ong, Mark Reid [email protected] November 2015



What is 255 + 1?

import numpy as np

x = np.array([0, 1, 41, 255], dtype='uint8')

x += 1  # uint8 arithmetic wraps around modulo 256

print(x)  # the last element is 0, not 256

Answer: 0


What is 16,777,216 + 1?

import numpy as np

x = np.array([16777216], dtype='float32')

x += 1  # 16,777,217 is not representable in float32 (24-bit significand)

print(x)  # the value is unchanged

Answer: 16,777,216


Implementation Issues

• Numerical calculations on a computer are always subject to errors

• Machine learning algorithms are full of numerical calculations

• Numerical errors can be due to:
  • limited precision arithmetic (we just saw two examples)
  • algorithmic limitations (e.g., generating true random numbers)
  • careless implementations (we will see some examples soon)
  • bugs!!!


Numerical Robustness Example

Consider the simple problem of computing a vector norm,

Problem: numerical overflow or underflow
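The formula on the original slide is not reproduced here, so the following is only a sketch that assumes the Euclidean norm is meant. It contrasts a naive implementation with one that rescales by the largest magnitude first. The function names naive_norm and scaled_norm are hypothetical.

```python
import math

def naive_norm(x):
    # Squares each element directly; v * v can overflow to inf or underflow to 0.
    return math.sqrt(sum(v * v for v in x))

def scaled_norm(x):
    # Divide by the largest magnitude so every squared term lies in [0, 1].
    scale = max(abs(v) for v in x)
    if scale == 0.0:
        return 0.0
    return scale * math.sqrt(sum((v / scale) ** 2 for v in x))

big = [1e200, 1e200]      # squares exceed the float64 range
small = [1e-200, 1e-200]  # squares are below the smallest float64

print(naive_norm(big), scaled_norm(big))
print(naive_norm(small), scaled_norm(small))
```

With these inputs the naive version returns inf and 0.0 respectively, while the scaled version returns values close to sqrt(2) times the element magnitude. In practice a library routine such as math.hypot or numpy.linalg.norm is usually the better choice.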


Numerical Robustness Example (2)

The standard deviation of a set of measurements can be calculated as

where

but this takes two passes through the data.


Numerical Robustness Example (3)

A “better approach” is to perform the following equivalent calculation

which only requires one pass through the data.

What can go wrong with this implementation?
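One thing that can go wrong is catastrophic cancellation. The sketch below assumes the one-pass calculation is the usual "mean of squares minus square of the mean" form with a 1/n normalisation (the slide's formulas are not reproduced here, so both are assumptions). It returns variances rather than standard deviations so that a negative result does not raise an error in the square root. The function names are hypothetical.

```python
def var_two_pass(x):
    # Pass 1: mean. Pass 2: mean squared deviation from that mean.
    n = len(x)
    mu = sum(x) / n
    return sum((v - mu) ** 2 for v in x) / n

def var_one_pass(x):
    # Single pass: accumulate the sum and the sum of squares together.
    n = len(x)
    s = 0.0
    s2 = 0.0
    for v in x:
        s += v
        s2 += v * v
    # Subtracting two nearly equal, very large numbers loses almost all
    # significant digits; the result can even be negative.
    return s2 / n - (s / n) ** 2

# Measurements with a large mean and a small spread (true variance is 2/3).
x = [1e9, 1e9 + 1.0, 1e9 + 2.0]

print(var_two_pass(x))
print(var_one_pass(x))
```

For this data the two-pass result is accurate, while the one-pass result is dominated by rounding error because the two terms being subtracted are each about 1e18.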


Bugs in ML Algorithms are Hard to Find

As an example, let’s say we are trying to minimize the following scalar function using (damped) Newton’s method,

which updates iterates as


Bugs in ML Algorithms are Hard to Find

What happens if we introduce a small bug?
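The function used on the original slide is not reproduced here. As a stand-in, the sketch below uses a hypothetical convex function f(x) = exp(x) - 2x, whose minimum is at ln 2, and a deliberately wrong second derivative (off by a factor of two). The names damped_newton, f_hess_buggy, and the step size gamma are assumptions made for illustration.

```python
import math

def f_grad(x):
    # Derivative of the stand-in objective f(x) = exp(x) - 2x.
    return math.exp(x) - 2.0

def f_hess(x):
    # Correct second derivative.
    return math.exp(x)

def f_hess_buggy(x):
    # Deliberate bug: second derivative is twice what it should be.
    return 2.0 * math.exp(x)

def damped_newton(grad, hess, x0, gamma=1.0, tol=1e-10, max_iter=200):
    # Iterate x <- x - gamma * f'(x) / f''(x) until the gradient is tiny.
    x = x0
    for k in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, k
        x -= gamma * g / hess(x)
    return x, max_iter

x_ok, k_ok = damped_newton(f_grad, f_hess, x0=0.0)
x_bug, k_bug = damped_newton(f_grad, f_hess_buggy, x0=0.0)
print(x_ok, k_ok)
print(x_bug, k_bug)
```

Under these assumptions both runs end up at the same minimiser; the buggy version just needs more iterations. That is what makes this kind of bug hard to spot: the output looks plausible, and only the convergence behaviour gives it away.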


Feature Scaling

• Numerical algorithms work best on well-scaled data. A common pre-processing step is to scale each input feature to have zero mean and unit variance (standardisation, sometimes loosely called whitening).

• Note. Estimating the scaling parameters must not use test set data.

• Feature scaling does not change the “strength” of a classifier but it does help with convergence during training.
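A minimal sketch of this kind of scaling, assuming the common per-feature form (x - mean) / standard deviation. The scaling parameters are estimated on the training set only and then reused for the test set. The helper names fit_scaler and apply_scaler and the synthetic data are hypothetical.

```python
import numpy as np

def fit_scaler(X_train):
    # Estimate per-feature mean and standard deviation from training data only.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    # Guard against constant features, which would cause division by zero.
    sigma = np.where(sigma > 0, sigma, 1.0)
    return mu, sigma

def apply_scaler(X, mu, sigma):
    # Apply the training-set parameters to any split (train, validation, test).
    return (X - mu) / sigma

rng = np.random.default_rng(0)
# Two synthetic features on very different scales.
X_train = rng.normal(loc=[5.0, -200.0], scale=[0.1, 50.0], size=(100, 2))
X_test = rng.normal(loc=[5.0, -200.0], scale=[0.1, 50.0], size=(20, 2))

mu, sigma = fit_scaler(X_train)
Z_train = apply_scaler(X_train, mu, sigma)
Z_test = apply_scaler(X_test, mu, sigma)  # test data never influences mu or sigma
```

The scaled test set will generally not have exactly zero mean and unit variance, which is expected: its statistics were not used to fit the scaler.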


Feature Scaling Example

• Dataset: Iris [Fisher, 1936], three classes, four features, 50 examples per class

• Feature vector: squared raw features plus bias term

• Classifier: multi-class logistic (a.k.a. soft-max)


Numerical Issues Take Home Message

Wherever possible, use tried-and-tested third-party implementations

(but remember that not all open source code is created equal)


Machine Learning Pipeline


Regression Example

A classic regression problem is to fit a curve (e.g., a polynomial) to a set of points

Adapted from http://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html


Dataset Partitioning

• training set: learn model parameters

• validation set: tune meta-parameters (e.g., regularization strength, number of iterations, etc.)

• test/evaluation set: report performance (ideally used only once!)

[Figure: the dataset divided into a training set, a validation set, and a test set]

Cross-validation

[Figure: data split into 10 folds; fold 6 is held out for testing and the other nine are used for training]

Cross-validation is a common method used to estimate how well a model will generalise to unseen data.

• K-fold: Split the data into K sets of roughly equal size. For the k-th fold, train the model on K - 1 parts and test on the k-th part. We can now use all the data to estimate the prediction error.

• Leave-one-out (LOOCV): set K to the size of the dataset.
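The K-fold procedure can be sketched as an index generator. This is an illustration only: it assumes the examples have already been shuffled, and the function name k_fold_indices is hypothetical.

```python
def k_fold_indices(n, k):
    # Yield (train_indices, test_indices) for each of the k folds.
    # The first n % k folds get one extra example so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

# 5-fold split of 10 examples.
for train_idx, test_idx in k_fold_indices(10, 5):
    print(test_idx)

# Leave-one-out is the special case k == n.
loocv_folds = list(k_fold_indices(4, 4))
```

Each example appears in exactly one test fold, so averaging the per-fold errors uses every example for evaluation exactly once.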


Dataset Bias

Every dataset is biased.


Don’t think dataset bias won’t happen to you


Sampling Strategies on Regression Example

• random sampling
  • Example: curve fitting
  • Example: classifying video frames

• unbalanced datasets
  • stratified sampling
  • data weighting (re-sampling with replacement)

Adapted from http://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html

Unbalanced Datasets (Random Sampling)

[Figure: an unbalanced dataset split into a training set and a test set by random sampling]

Unbalanced Datasets (Stratified Sampling)

[Figure: an unbalanced dataset split into a training set and a test set by stratified sampling]
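A minimal sketch of a stratified train/test split: examples are grouped by class and the same fraction of each class is sent to the test set, so class proportions are preserved. The function name stratified_split, the test_fraction parameter, and the toy labels are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction, seed=0):
    # Group example indices by class label.
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        # Shuffle within the class, then take the same fraction for testing.
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Unbalanced toy dataset: 90 examples of class 0 and 10 of class 1.
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
n_rare_in_test = sum(labels[i] for i in test_idx)
```

With purely random sampling the test set could easily contain none of the ten rare examples; here it contains exactly two of them regardless of the seed.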


Confusion Matrix for Classification Problems

• (i, j) entry: number of examples of class i that were predicted as class j

• row sum: number of ground-truth examples of class i

• column sum: number of examples predicted as class j

• diagonal sum: number of correctly classified examples

• total sum: number of total examples

• Need not have equal number of rows and columns.

• Sometimes you will see the matrix transposed.
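The properties listed above can be checked on a small example. This sketch assumes integer class labels 0..K-1 and a square matrix with ground-truth classes as rows; the function name confusion_matrix and the toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    # C[i][j] counts examples of true class i that were predicted as class j.
    C = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        C[t][p] += 1
    return C

y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 0, 2, 2, 2, 1]
C = confusion_matrix(y_true, y_pred, 3)

row_sums = [sum(row) for row in C]                        # ground-truth count per class
col_sums = [sum(row[j] for row in C) for j in range(3)]   # predicted count per class
n_correct = sum(C[i][i] for i in range(3))                # diagonal sum
n_total = sum(row_sums)                                   # total number of examples
```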


Accuracy: Macro vs. Micro Averaging

• Often we care about overall classification accuracy. This is an example of micro-averaging.

• However, sometimes we have an unbalanced dataset but wish to treat each class equally. This is an example of macro-averaging.

• More generally, we may also want to compute weighted accuracy.
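The formulas from the original slide are not reproduced here. The sketch below uses the usual definitions in terms of a confusion matrix C with ground-truth classes as rows: micro-averaged accuracy is the diagonal sum over the total, and macro-averaged accuracy is the unweighted mean of the per-class accuracies. The function names and the example matrix are hypothetical, and every class is assumed to have at least one example.

```python
def micro_accuracy(C):
    # Fraction of all examples that were classified correctly.
    correct = sum(C[i][i] for i in range(len(C)))
    total = sum(sum(row) for row in C)
    return correct / total

def macro_accuracy(C):
    # Mean of per-class accuracies, so each class counts equally.
    per_class = [C[i][i] / sum(C[i]) for i in range(len(C))]
    return sum(per_class) / len(per_class)

# Unbalanced two-class example: 100 examples of class 0, 5 of class 1.
C_unbalanced = [[95, 5],
                [4, 1]]

micro = micro_accuracy(C_unbalanced)  # dominated by the large class
macro = macro_accuracy(C_unbalanced)  # exposes poor accuracy on the rare class
print(micro, macro)
```

Here the micro-averaged accuracy is 96/105 (about 0.91) while the macro-averaged accuracy is 0.575, which shows how much the choice of average matters on unbalanced data.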


Precision and Recall

• Terminology
  • true positive (TP), hit, detection
  • true negative (TN), correct rejection
  • false positive (FP), false alarm, type I error
  • false negative (FN), miss, type II error

Derived Statistic                                    Equation
recall, true positive rate, sensitivity, hit rate    TP / (TP + FN)
positive predictive value, precision                 TP / (TP + FP)
true negative rate, specificity                      TN / (TN + FP)
accuracy                                             (TP + TN) / (TP + FP + TN + FN)
F1-score                                             2 (precision × recall) / (precision + recall)
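The derived statistics can be computed directly from the four counts. This is a small sketch with a hypothetical function name and made-up counts; it assumes none of the denominators is zero.

```python
def binary_metrics(tp, fp, tn, fn):
    # Derived statistics from true/false positive/negative counts.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }

# Illustrative counts for a detector evaluated on 100 examples.
m = binary_metrics(tp=8, fp=2, tn=86, fn=4)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this example accuracy is 0.94 even though a third of the positives are missed (recall is about 0.67), which is one reason to report more than one statistic.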


Precision-Recall Curves


Precision-Recall Curve Operating Points

classification rule: Pr(y = 1 | x) > t

Comparing PR Curves

Below we plot two algorithms. Which is better?


Other Ways to Compare Algorithms


Analysis Take Home Message

1. Measure everything
  • This will save analysis and debugging time later

2. Choice of metrics can have a huge effect on interpretation of results

3. Ask questions of your metrics


Repeatability: Controlling Randomness

• Often you will want to compare different variants of an algorithm

• However, comparing different runs can be difficult if the algorithm is stochastic

• transform random algorithm A(x) into deterministic algorithm A’(x, r) where r is a sequence of random numbers

• one way to do this is to use random seeds (np.random.seed(0))

“random chance seems to have operated in our favour”

[Figure: a randomized algorithm maps x to y; the deterministic version maps x and r to y]
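One way to make the source of randomness an explicit input is to pass a seeded generator into the algorithm. The sketch below uses NumPy's np.random.default_rng rather than the global np.random.seed; that is a substitution made here for illustration, and the toy function bootstrap_mean is hypothetical.

```python
import numpy as np

def bootstrap_mean(x, rng):
    # A stochastic computation whose randomness comes only from `rng`,
    # so the same (x, seed) pair always gives the same answer.
    sample = rng.choice(x, size=len(x), replace=True)
    return float(sample.mean())

x = np.arange(10.0)

run_a = bootstrap_mean(x, np.random.default_rng(0))
run_b = bootstrap_mean(x, np.random.default_rng(0))  # same seed: identical result
run_c = bootstrap_mean(x, np.random.default_rng(1))  # different seed: a different draw
print(run_a, run_b, run_c)
```

Passing the generator explicitly keeps separate components from interfering with each other's random streams, which a single global seed cannot guarantee. Repeating an experiment over several seeds also helps check that an improvement is not just luck.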


Exploring Features and Meta-parameters

[Figure: a tree of experiments branching from a baseline model, with branches labelled change regularisation, change label weights, collect more data, add features, and run for more iterations]

Feature Selection

Often the number of available features is very large but there are only a small number of relevant features. Some recent methods try to learn features directly from data. However, often we are faced with the task of having to come up with a good set of features manually.

• Filter Feature Selection: Use a computationally cheap heuristic to evaluate features, e.g., mutual information between a feature and the class labels.

• Wrapper Feature Selection: Incrementally add the best feature to a feature set (forward feature selection) or remove the worst features from the feature set (backward feature selection).


Example: Forward Feature Selection

• Start with an empty feature set, F = {}

• Repeatedly try each feature i ∉ F, create F_i = F ∪ {i} and use cross-validation to evaluate F_i. Set F to the best F_i.
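The greedy loop can be sketched as follows. The slide does not state a stopping rule, so stopping when no candidate improves the score is an assumption, as are the function name forward_selection and the toy scoring function, which stands in for a cross-validated score.

```python
def forward_selection(all_features, score):
    # Greedily grow the feature set, adding the best single feature each round.
    selected = []
    best_score = float("-inf")
    remaining = list(all_features)
    while remaining:
        # Evaluate every candidate set F_i = F + [i] for i not yet selected.
        cand_score, cand = max((score(selected + [f]), f) for f in remaining)
        if cand_score <= best_score:
            break  # assumed stopping rule: no candidate improves the score
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected, best_score

# Toy stand-in for cross-validated accuracy: two informative features,
# and a small penalty for every uninformative one.
gains = {"a": 0.3, "b": 0.2}

def toy_score(features):
    return sum(gains.get(f, -0.01) for f in features)

selected, best = forward_selection(["a", "b", "c", "d"], toy_score)
print(selected, best)
```

Each round costs one evaluation per remaining feature, so with real cross-validation the total cost grows roughly quadratically in the number of features.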


Diagnostics

• Diagnostics are about working out why your algorithm is not giving you the performance you want. What could the problem be?
  • problem statement
  • data
  • features
  • algorithm/model
  • implementation
  • something else

• Take time to set up a good experimental framework for repeated experiments

“give me six hours to cut down a tree and I will spend the first four sharpening my axe”


Visualise Your Data


Diagnostic Example

Suppose that our test error is unacceptably high and we suspect the problem is either that the model is overfitting or the features are not good enough.

Diagnostic:

• The first hypothesis (overfitting) suggests that the training error will be much lower than the test error

• The second hypothesis (features) suggests that the training error and test error will both be high


Learning Curves

[Figure: error rate versus training set size, with one curve for the training set and one for the test set]

Learning Curves: Bias vs. Variance

[Figure: two panels, labelled high variance and high bias, each plotting error rate versus training set size for the training set and the test set, with the target error rate marked]

Bias/Variance Trade-off

[Figure: error rate versus model complexity for the training set and the test set, with the target error rate marked and regions labelled high variance and high bias]

Fixes for Bias/Variance Problems

Diagnosing bias and variance problems provides us with hints as to what to try next.

For bias problems:
• try a larger set of features
• try a richer model class

For variance problems:
• try getting more training examples
• try a smaller set of features

[Figure: error rate versus model complexity for the training set and the test set, with the target error rate marked and regions labelled high variance and high bias]

Objective/Optimisation Problems

We may suspect that our poor performance is due to either a problem with our optimisation algorithm (e.g., not running for long enough) or a problem with our objective.

Unfortunately it is often very difficult to determine whether an iterative algorithm has converged.

[Figure: error rate versus iteration count, annotated "converged?"]

Diagnosing Optimisation Problems

Suppose we care about maximising some accuracy measure, perf(), and a learning algorithm is trying to minimise a surrogate loss().

• Let θ* be the parameters returned by our learning algorithm

• Let θ† be any other parameters (e.g., guesses or obtained from a different learning algorithm)

                       perf(θ†) > perf(θ*)    perf(θ†) < perf(θ*)
loss(θ*) < loss(θ†)    wrong objective        no problem (?)
loss(θ*) > loss(θ†)    poor optimisation      poor optimisation (got lucky)

Columns: what we care about (higher better). Rows: what we optimise (lower better).
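The four-way diagnosis can be written as a small lookup. This is only a sketch: the function name diagnose and its argument names are hypothetical, with "star" denoting the parameters returned by the learning algorithm and "dagger" the alternative parameters. Ties are not covered by the slide, so they are reported as inconclusive here.

```python
def diagnose(perf_star, perf_dagger, loss_star, loss_dagger):
    # perf: the measure we care about (higher is better).
    # loss: the surrogate the learner minimises (lower is better).
    if perf_star == perf_dagger or loss_star == loss_dagger:
        return "inconclusive"  # assumption: ties are not covered by the slide
    dagger_performs_better = perf_dagger > perf_star
    if loss_star < loss_dagger:
        # The optimiser did its job on the surrogate loss.
        return "wrong objective" if dagger_performs_better else "no problem (?)"
    # The optimiser returned parameters with a higher loss than the alternative.
    if dagger_performs_better:
        return "poor optimisation"
    return "poor optimisation (got lucky)"

# The learner reached a lower loss, yet the alternative performs better.
print(diagnose(perf_star=0.80, perf_dagger=0.85, loss_star=0.40, loss_dagger=0.55))
```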


Fixes for Optimisation/Objective Problems

Diagnosing optimisation versus objective problems provides us with hints as to what to try next.

For optimisation problems:
• try running for more iterations
• try using a different algorithm (e.g., Newton’s method instead of gradient descent)
• try random restarts (e.g., for non-convex objectives)
• try smoothing

For objective problems:
• try different regularisation
• try weighting training examples
• try a different loss function
• change the model


Approximate Search Algorithms

Suppose we are using an approximate nearest neighbour algorithm to find similar objects. We define a similarity measure that our algorithm can use.

How can we tell if we have a problem with the nearest neighbour algorithm or our similarity measure?

• Let x† be a match found by the algorithm

• Let x* be a hand selected match (ground-truth)

• If similarity(x, x†) > similarity(x, x*), i.e., the measure scores the algorithm's match above the ground-truth match, then the problem is with the measure

• Otherwise, initialise the approximate nearest neighbour algorithm with the true solution:
  • If the algorithm moves away from the true solution then the problem is with the measure
  • Otherwise the problem is with the nearest neighbour algorithm


Diagnostic Summary

• Diagnostics are an important tool when developing your machine learning algorithms
  • We showed examples for bias/variance, optimisation/objective, and search/score, but there are many others

• Diagnostics can save a lot of wasted effort by guiding your choice of what to try next

• They also allow you to develop insights into your particular application and justify your design decisions

• Diagnostics often involve repeated experiments with different parameter settings while keeping everything else fixed

• Another important diagnostic tool is that of error analysis, i.e., understanding where your algorithm is making mistakes


Error Analysis

Error analysis tries to explain the difference between current performance and perfect performance.

• How much error is due to various different machine learning components in the application? Plug in the ground-truth (if available) into each component of the application and see how it affects accuracy. Alternatively, we could add noise to each component and, again, see how it affects accuracy.

• Does the algorithm fail on a particular subclass of examples? Visualise the data and results!


Ablative Analysis

Ablative analysis tries to explain the difference between some baseline performance and the current performance.

Example: You’ve been working on your application for the past several months and now have a number of sophisticated features that you pass to a classifier. Which features account for the good performance of your classifier over a baseline classifier with some simple features?

Ablative analysis removes features from the application one at a time and sees which results in the biggest decrease in performance, similar to backward feature selection.

• Note that the order of removal matters.


Diagnosing Your Implementation

Whenever you write some code or assemble machine learning components into a pipeline you’ll want to test your implementation.

• run against small synthetic test cases

• see what happens with ground-truth features

• see what happens with random features

• check boundary cases

• re-use known working components


Diagnostics Take Home Messages

1. Visualize your data and learning progress
2. Develop diagnostic tests
3. Use good software development practices
  • Revision control, revision control, revision control

“experimental confirmation of a prediction is merely a measurement; experimental disproving of a prediction is a discovery”
