© sebastian thrun, CMU, 2000
10-610 The KDD Lab
Intro: Outcome Analysis
Sebastian Thrun, Carnegie Mellon University
www.cs.cmu.edu/~10610

Page 1:

© sebastian thrun, CMU, 2000 1

10-610 The KDD Lab

Intro: Outcome Analysis

Sebastian ThrunCarnegie Mellon University

www.cs.cmu.edu/~10610

Page 2:

Problem 1

You find that on testing data your speech recognizer recognizes sentences with 68% word accuracy, whereas previous recognizers achieve 60%. Would you advise a company to adopt your speech recognizer?

Page 3:

Problem 2

On testing data, your data mining algorithm can predict emergency C-sections with 68% accuracy, whereas a previous $1,000 test achieves 60% accuracy. Do you recommend replacing the previous test with your new method?

Page 4:

Characterize: What Should We Worry About?

cost/loss:   $\int_D L(f(x), x)\, p(x)\, dx$

• pattern classification (+/-) → classification error, FP/FN errors
• regression → quadratic error
• unsupervised learning → log likelihood
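As a minimal illustration (a toy Python sketch of my own, not from the slides), the three loss types just listed can each be computed on made-up data:

```python
import math

# Pattern classification: classification error (fraction of mismatches).
y_true = [1, 0, 1, 1]
y_pred = [1, 1, 1, 0]
clf_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

# Regression: quadratic (squared) error.
targets = [1.0, 2.0, 3.0]
preds = [1.5, 1.5, 3.0]
quad_error = sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets)

# Unsupervised learning / density estimation: log likelihood of the data
# under a fitted model (here: a standard Gaussian, purely illustrative).
samples = [0.1, -0.2, 0.05]
log_lik = sum(-0.5 * x * x - 0.5 * math.log(2 * math.pi) for x in samples)

print(clf_error, quad_error, log_lik)
```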

Page 5:

ROC Curves (ROC=Receiver Operating Characteristic)

Page 6:

Error Types

Type I error (alpha error, false positive): probability of accepting the hypothesis when it is not true.

Type II error (beta error, false negative): probability of rejecting the hypothesis when it is true.

Page 7:

ROC Curves (ROC=Receiver Operating Characteristic)

Page 8:

ROC Curves (ROC=Receiver Operating Characteristic)

Sensitivity: probability that a test result will be positive when the disease is present

Specificity: probability that a test result will be negative when the disease is not present

Positive likelihood ratio: ratio between the probability of a positive test result given the presence of the disease and the probability of a positive test result given the absence of the disease

Negative likelihood ratio: ratio between the probability of a negative test result given the presence of the disease and the probability of a negative test result given the absence of the disease

Positive predictive value (PPV): probability that the disease is present when the test is positive

Negative predictive value (NPV): probability that the disease is not present when the test is negative

Sensitivity = true positives / (true positives + false negatives)

Specificity = true negatives / (true negatives + false positives)

Positive predictive value (PPV) = true positives / (true positives + false positives)

Negative predictive value (NPV) = true negatives / (true negatives + false negatives)
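The definitions on this slide map directly onto confusion-matrix counts. A small sketch, with made-up counts for a hypothetical diagnostic test:

```python
# Confusion counts for a hypothetical test (values are illustrative).
tp, fn = 90, 10   # diseased patients: test positive / test negative
tn, fp = 80, 20   # healthy patients:  test negative / test positive

sensitivity = tp / (tp + fn)               # P(test + | disease present)
specificity = tn / (tn + fp)               # P(test - | disease absent)
ppv = tp / (tp + fp)                       # P(disease present | test +)
npv = tn / (tn + fn)                       # P(disease absent  | test -)
lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio

print(sensitivity, specificity, ppv, npv, lr_pos, lr_neg)
```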

Page 9:

Evaluating Machine Learning Algorithms

plenty of data | little data

Page 10:

Holdout Set

Data

train | evaluate → error

Often also used for parameter optimization
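The holdout scheme can be sketched as follows; the 70/30 split, the toy data, and the fixed threshold rule standing in for a trained model are illustrative assumptions, not from the slides:

```python
import random

# Hypothetical labeled data: label is 1 when the feature exceeds 0.5.
random.seed(0)
data = [(x, int(x > 0.5)) for x in (random.random() for _ in range(100))]

# Holdout set: train on 70% of the data, evaluate on the held-out 30%.
random.shuffle(data)
split = int(0.7 * len(data))
train, holdout = data[:split], data[split:]

# Stand-in for a trained model: a fixed threshold rule.
predict = lambda x: int(x > 0.5)

holdout_error = sum(predict(x) != y for x, y in holdout) / len(holdout)
print(len(train), len(holdout), holdout_error)
```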

Page 11:

Example:

Hypothesis misclassifies 12 out of 40 examples in cross validation set S.

Q: What will the “true” error be on future examples?

A:

Page 12:

Finite Cross-Validation Set

True error (true risk):

  $e_D = \int_D e(y, f(x))\, p(x, y)\, dx\, dy$

Test error (empirical risk):

  $\hat{e}_S = \frac{1}{m} \sum_{(x,y) \in S} e(y, f(x))$

D = all data, S = test data, m = number of test samples

Page 13:

Confidence Intervals (See Mitchell 97)

If
• S contains m examples, drawn independently
• m ≥ 30

Then
• With approximately 95% probability, the true error $e_D$ lies in the interval

  $\hat{e}_S \pm 1.96 \sqrt{\hat{e}_S (1 - \hat{e}_S) / m}$

Page 14:

Example:

Hypothesis misclassifies 12 out of 40 examples in cross validation set S.

Q: What will the “true” error be on future examples?

A: With 95% confidence, the true error lies in the interval

  $\hat{e}_S \pm 1.96 \sqrt{\hat{e}_S (1 - \hat{e}_S) / m} = [0.16;\ 0.44]$

since $\hat{e}_S = 12/40 = 0.3$, $m = 40$, and $1.96 \sqrt{0.3 \cdot 0.7 / 40} \approx 0.14$.
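This worked example (12 misclassifications out of 40) can be reproduced with a small helper; the function name is my own:

```python
import math

def error_confidence_interval(errors, m, z=1.96):
    """Confidence interval for the true error, given `errors`
    misclassifications on m independent test examples.
    z = 1.96 gives the 95% level."""
    e_hat = errors / m
    half_width = z * math.sqrt(e_hat * (1 - e_hat) / m)
    return e_hat - half_width, e_hat + half_width

lo, hi = error_confidence_interval(12, 40)
print(round(lo, 2), round(hi, 2))  # the interval [0.16; 0.44]
```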

Page 15:

Confidence Intervals (See Mitchell 97)

If
• S contains m examples, drawn independently
• m ≥ 30

Then
• With approximately N% probability, the true error $e_D$ lies in the interval

  $\hat{e}_S \pm z_N \sqrt{\hat{e}_S (1 - \hat{e}_S) / m}$

N%:   50%   68%   80%   90%   95%   98%   99%
z_N:  0.67  1.00  1.28  1.64  1.96  2.33  2.58

Page 16:

Finite Cross-Validation Set

True error (true risk):

  $e_D = \int_D e(y, f(x))\, p(x, y)\, dx\, dy$

Test error (empirical risk):

  $\hat{e}_S = \frac{1}{m} \sum_{(x,y) \in S} e(y, f(x))$

The number of test errors k is binomially distributed:

  $p(k) = \frac{m!}{k!\,(m-k)!}\, e_D^k\, (1 - e_D)^{m-k}$
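A quick stdlib check of this binomial model (`binom_pmf` is my own name), using the $e_D = 0.3$, $m = 40$ setting of the next slide's figure:

```python
import math

def binom_pmf(k, m, e_d):
    """P(k test errors among m examples) when the true error is e_d."""
    return math.comb(m, k) * e_d**k * (1 - e_d)**(m - k)

m, e_d = 40, 0.3
pmf = [binom_pmf(k, m, e_d) for k in range(m + 1)]

total = sum(pmf)                                 # probabilities sum to 1
mode = max(range(m + 1), key=lambda k: pmf[k])   # peaks at k = m * e_d = 12
print(total, mode)
```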

Page 17:

Binomial Distribution

[Figure: binomial distribution P(k) for $e_D = 0.3$ and $m = 40$]

Approximates a Normal distribution (Central Limit Theorem)

Page 18:

95% Confidence Intervals

Page 19:

Question

What’s the difference between variance and confidence intervals?

Basically a factor: the confidence interval scales the standard deviation $\sqrt{\hat{e}_S (1 - \hat{e}_S) / m}$ of the estimate (the square root of its variance) by $z_N$:

  $\hat{e}_S \pm z_N \sqrt{\hat{e}_S (1 - \hat{e}_S) / m}$

Page 20:

Common Performance Plot

[Figure: testing error with 95% confidence intervals]

Page 21:

Comparing Different Hypotheses

True difference:

  $d = e_D(h_1) - e_D(h_2)$

Test set difference:

  $\hat{d} = \hat{e}_{S_1}(h_1) - \hat{e}_{S_2}(h_2)$

95% confidence interval:

  $\hat{d} \pm 1.96 \sqrt{ \frac{\hat{e}_{S_1}(1 - \hat{e}_{S_1})}{m_1} + \frac{\hat{e}_{S_2}(1 - \hat{e}_{S_2})}{m_2} }$
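A sketch of this interval for the difference between two hypotheses' test errors. The error rates echo the opening problems (68% vs. 60% accuracy, i.e. 32% vs. 40% error), but the test-set sizes are made up:

```python
import math

def diff_confidence_interval(e1, m1, e2, m2, z=1.96):
    """z-level confidence interval for e_D(h1) - e_D(h2), from
    independent test errors e1 on m1 and e2 on m2 examples."""
    d_hat = e1 - e2
    half_width = z * math.sqrt(e1 * (1 - e1) / m1 + e2 * (1 - e2) / m2)
    return d_hat - half_width, d_hat + half_width

# Hypothetical: 32% vs. 40% error, each measured on 500 test examples.
lo, hi = diff_confidence_interval(0.32, 500, 0.40, 500)
print(lo, hi)
```

With these numbers the interval lies entirely below zero, so h1's error is significantly lower than h2's at the 95% level; with small test sets the same error rates would not be significant.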

Page 22:

Evaluating Machine Learning Algorithms

plenty of data | little data

Page 23:

Holdout Set

Data

train | evaluate → error

Page 24:

k-fold Cross Validation

Data: k-way split

For each fold i = 1, …, k: train on the remaining folds (yellow), evaluate on fold i (pink) → $\text{error}_i$

  $\text{error} = \frac{1}{k} \sum_i \text{error}_i$
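The k-fold loop can be sketched as follows; the dataset, the choice k = 8, and the threshold "learner" are illustrative placeholders:

```python
import random

random.seed(1)
# Hypothetical dataset: feature x with label 1 when x > 0.5.
data = [(x, int(x > 0.5)) for x in (random.random() for _ in range(80))]
k = 8

# k-way split: fold i gets every k-th example.
folds = [data[i::k] for i in range(k)]

errors = []
for i in range(k):
    test_fold = folds[i]
    train_set = [ex for j in range(k) if j != i for ex in folds[j]]
    # Stand-in learner: threshold midway between the class means.
    pos = [x for x, y in train_set if y == 1]
    neg = [x for x, y in train_set if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    errors.append(sum((x > threshold) != bool(y)
                      for x, y in test_fold) / len(test_fold))

error = sum(errors) / k   # error = (1/k) * sum_i error_i
print(round(error, 3))
```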

Page 25:

The Jackknife

Data

Page 26:

The Bootstrap

Data

Train on yellow (a sample of the data drawn with replacement), evaluate on pink (the examples left out) → error

Repeat and average
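One common reading of this scheme: draw a training sample of the same size with replacement, evaluate on the examples never drawn, repeat, and average. A sketch under that assumption (data and learner are placeholders):

```python
import random

random.seed(2)
# Hypothetical data: label is 1 when the feature exceeds 0.5.
data = [(x, int(x > 0.5)) for x in (random.random() for _ in range(60))]

rounds, errors = 50, []
for _ in range(rounds):
    # Bootstrap sample: draw len(data) examples with replacement (yellow).
    idx = [random.randrange(len(data)) for _ in range(len(data))]
    sample = [data[i] for i in idx]
    # Evaluate on the examples never drawn (pink).
    out_of_bag = [data[i] for i in range(len(data)) if i not in set(idx)]
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg or not out_of_bag:
        continue
    # Stand-in learner: threshold midway between the class means.
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    errors.append(sum((x > threshold) != bool(y)
                      for x, y in out_of_bag) / len(out_of_bag))

print(sum(errors) / len(errors))   # repeat and average
```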

Page 27:

What’s the Problem?

Confidence intervals assume independence. But our individual estimates are dependent.

Page 28:

Comparing Different Hypotheses: Paired t test

True difference:

  $d = e_D(h_1) - e_D(h_2)$

For each partition k (test error for partition k):

  $\hat{d}_k = \hat{e}_{S,k}(h_1) - \hat{e}_{S,k}(h_2)$

Average:

  $\bar{d} = \frac{1}{k} \sum_i \hat{d}_i$

N% confidence interval:

  $\bar{d} \pm t_{N,\,k-1} \sqrt{ \frac{1}{k(k-1)} \sum_i (\hat{d}_i - \bar{d})^2 }$

k-1 is the degrees of freedom ν, N is the confidence level:

ν       90%    95%    98%    99%
2       2.92   4.30   6.96   9.92
5       2.02   2.57   3.36   4.03
10      1.81   2.23   2.76   3.17
20      1.72   2.09   2.53   2.84
30      1.70   2.04   2.46   2.75
120     1.66   1.98   2.36   2.62
∞       1.64   1.96   2.33   2.58
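A sketch of the paired t interval, with made-up per-partition error differences for k = 6 partitions and the 95% t value for ν = 5 taken from the t table on this slide:

```python
import math

# Made-up per-partition differences d_k = e_S,k(h1) - e_S,k(h2), k = 6.
d = [0.02, 0.05, 0.01, 0.04, 0.03, 0.03]
k = len(d)

d_bar = sum(d) / k
# Standard error of the mean difference: sqrt( (1/(k(k-1))) * sum (d_i - d_bar)^2 ).
s = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (k * (k - 1)))

t_95 = 2.57   # t_{N,k-1} for N = 95%, nu = k - 1 = 5
lo, hi = d_bar - t_95 * s, d_bar + t_95 * s
print(round(d_bar, 3), round(lo, 3), round(hi, 3))
```

Here the interval lies entirely above zero, so h1's error is significantly higher than h2's at the 95% level.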

Page 29:

Evaluating Machine Learning Algorithms

plenty of data | little data | unlimited data

Page 30:

Asymptotic Prediction

Useful for very large data sets

Page 31:

Summary

• Know your loss function!
• Finite testing data: report confidence intervals
• Scarce data: repartition the training/testing set
• Asymptotic prediction: exponential

Put thoughts into your evaluation, and be critical. Convince yourself!