© sebastian thrun, CMU, 2000 1
10-610 The KDD Lab
Intro: Outcome Analysis
Sebastian Thrun
Carnegie Mellon University
www.cs.cmu.edu/~10610
Problem 1
On testing data, your speech recognizer achieves 68% word accuracy, whereas previous recognizers achieve 60%. Would you advise a company to adopt your speech recognizer?
Problem 2
On testing data, your data mining algorithm can predict emergency C-sections with 68% accuracy, whereas a previous $1,000 test achieves 60% accuracy. Do you recommend replacing the previous test with your new method?
Characterize: What Should We Worry About?
The cost/loss to minimize: ∫_D L(f(x), x) p(x) dx
Pattern classification (+ / −): classification error, FP/FN errors
Regression: quadratic error
Unsupervised learning: log likelihood
Error Types
Type I error (alpha error, false positive): probability of accepting the hypothesis when it is not true
Type II error (beta error, false negative): probability of rejecting the hypothesis when it is true
ROC Curves (ROC=Receiver Operating Characteristic)
Sensitivity: probability that a test result will be positive when the disease is present
Specificity: probability that a test result will be negative when the disease is not present
Positive likelihood ratio: ratio between the probability of a positive test result given the presence of the disease and the probability of a positive test result given the absence of the disease
Negative likelihood ratio: ratio between the probability of a negative test result given the presence of the disease and the probability of a negative test result given the absence of the disease
Positive predictive value (PPV): probability that the disease is present when the test is positive
Negative predictive value (NPV): probability that the disease is not present when the test is negative
Sensitivity = true positives / (true positives + false negatives)
Specificity = true negatives / (true negatives + false positives)
Positive likelihood ratio = sensitivity / (1 − specificity)
Negative likelihood ratio = (1 − sensitivity) / specificity
PPV = true positives / (true positives + false positives)
NPV = true negatives / (true negatives + false negatives)
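These quantities fall out directly from the four confusion-matrix counts. A minimal sketch in Python (the function name and the example counts are illustrative, not from the slides):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute the slide's test-quality measures from raw counts."""
    sensitivity = tp / (tp + fn)   # P(test positive | disease present)
    specificity = tn / (tn + fp)   # P(test negative | disease absent)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": sensitivity / (1 - specificity),  # positive likelihood ratio
        "lr_neg": (1 - sensitivity) / specificity,  # negative likelihood ratio
        "ppv": tp / (tp + fp),     # P(disease present | test positive)
        "npv": tn / (tn + fn),     # P(disease absent | test negative)
    }

metrics = diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)
```

Sweeping a decision threshold and plotting sensitivity against (1 − specificity) at each setting gives the ROC curve itself.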
Evaluating Machine Learning Algorithms
[Overview diagram: plenty of data vs. little data]
Holdout Set
[Figure: the data is split into a training part and a holdout part; train on one, evaluate the error on the other]
Often also used for parameter optimization
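A holdout split can be sketched in a few lines (a minimal illustration; the function name and the 1/3 test fraction are my choices, not from the slides):

```python
import random

def holdout_split(data, test_fraction=1/3, seed=0):
    """Shuffle the data and split it into a training set and a holdout (test) set."""
    rng = random.Random(seed)
    shuffled = data[:]            # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # train, test

train, test = holdout_split(list(range(30)))
```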
Example:
Hypothesis misclassifies 12 out of 40 examples in cross validation set S.
Q: What will the “true” error be on future examples?
A:
Finite Cross-Validation Set
True error (true risk): e_D = ∫_D δ(f(x) ≠ y) p(x, y) dx dy
Test error (empirical risk): ê_S = (1/m) Σ_{(x,y)∈S} δ(f(x) ≠ y)
D = all data; S = test data; m = # test samples
Confidence Intervals (See Mitchell 97)
If
• S contains m examples, drawn independently
• m ≥ 30
Then
• With approximately 95% probability, the true error e_D lies in the interval
ê_S ± 1.96 √( ê_S (1 − ê_S) / m )
Example:
Hypothesis misclassifies 12 out of 40 examples in cross validation set S.
Q: What will the “true” error be on future examples?
A: With 95% confidence, the true error lies in the interval
ê_S ± 1.96 √( ê_S (1 − ê_S) / m ) = [0.16; 0.44]
since ê_S = 12/40 = 0.3, m = 40, and 1.96 √( ê_S (1 − ê_S) / m ) ≈ 0.14.
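The arithmetic on this slide can be checked in a few lines (a sketch; the helper name is mine, and z defaults to the 95% value):

```python
from math import sqrt

def error_confidence_interval(errors, m, z=1.96):
    """Approximate confidence interval for the true error, from the test
    error e_hat = errors / m (valid for m >= 30)."""
    e_hat = errors / m
    half_width = z * sqrt(e_hat * (1 - e_hat) / m)
    return e_hat - half_width, e_hat + half_width

lo, hi = error_confidence_interval(errors=12, m=40)   # e_hat = 0.3
# lo ≈ 0.16, hi ≈ 0.44, matching the slide
```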
Confidence Intervals (See Mitchell 97)
If
• S contains m examples, drawn independently
• m ≥ 30
Then
• With approximately N% probability, the true error e_D lies in the interval
ê_S ± z_N √( ê_S (1 − ê_S) / m )

N%:  50%   68%   80%   90%   95%   98%   99%
z_N: 0.67  1.0   1.28  1.64  1.96  2.33  2.58
Finite Cross-Validation Set
True error (true risk): e_D = ∫_D δ(f(x) ≠ y) p(x, y) dx dy
Test error (empirical risk): ê_S = (1/m) Σ_{(x,y)∈S} δ(f(x) ≠ y)
Number of test errors k is binomially distributed:
p(k) = [ m! / (k! (m − k)!) ] e_D^k (1 − e_D)^(m−k)
Binomial Distribution
[Plot: P(k), the binomial distribution for e_D = 0.3 and m = 40]
Approximates a Normal distribution (Central Limit Theorem)
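The plotted distribution can be reproduced from the formula on the previous slide (a sketch using only the standard library):

```python
from math import comb

def binomial_pmf(k, m, e_d):
    """P(k errors among m independent test examples) when the true error is e_d."""
    return comb(m, k) * e_d**k * (1 - e_d)**(m - k)

# For e_D = 0.3 and m = 40, the distribution peaks near m * e_D = 12 errors.
pmf = [binomial_pmf(k, 40, 0.3) for k in range(41)]
```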
95% Confidence Intervals
Question
What’s the difference between variance and confidence intervals?
Basically a factor: the interval ê_S ± z_N √( ê_S (1 − ê_S) / m ) extends z_N standard deviations on each side of the estimate, where ê_S (1 − ê_S) / m is the variance of ê_S.
Common Performance Plot
[Plot: testing error with 95% confidence intervals]
Comparing Different Hypotheses
True difference: d = e_D(1) − e_D(2)
Test set difference: d̂ = ê_S(1) − ê_S(2)
95% confidence interval:
d̂ ± 1.96 √( ê_S(1)(1 − ê_S(1))/m₁ + ê_S(2)(1 − ê_S(2))/m₂ )
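The interval for the difference of two independently measured test errors can be sketched as follows (the helper name and the example numbers are mine):

```python
from math import sqrt

def difference_confidence_interval(e1, m1, e2, m2, z=1.96):
    """CI for the true difference e_D(1) - e_D(2), given independent test
    errors e1 on m1 examples and e2 on m2 examples."""
    d_hat = e1 - e2
    half = z * sqrt(e1 * (1 - e1) / m1 + e2 * (1 - e2) / m2)
    return d_hat - half, d_hat + half

lo, hi = difference_confidence_interval(0.30, 100, 0.25, 100)
# If the interval contains 0, the difference is not significant at ~95%.
```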
Evaluating Machine Learning Algorithms
[Overview diagram: plenty of data vs. little data]
Holdout Set
[Figure: the data is split into a training part and a holdout part; train on one, evaluate the error on the other]
k-fold Cross Validation
[Figure: k-way split of the data; for each fold i = 1, …, k, train on yellow (the other k − 1 parts) and evaluate on pink (the held-out part) → error_i]
error = (1/k) Σ_i error_i
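The procedure above can be sketched generically; the `train_fn`/`error_fn` interface is illustrative, not from the slides:

```python
def k_fold_error(data, k, train_fn, error_fn):
    """k-fold cross validation: split the data k ways, train on k-1 parts,
    evaluate on the held-out part, and average the k error estimates.
    train_fn(train) -> model; error_fn(model, test) -> error rate."""
    folds = [data[i::k] for i in range(k)]   # k-way split
    total = 0.0
    for i in range(k):
        test = folds[i]                                                      # pink
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]  # yellow
        total += error_fn(train_fn(train), test)
    return total / k

# With an error function that always reports 0.25, the average is 0.25:
err = k_fold_error(list(range(20)), 4, lambda train: None, lambda model, test: 0.25)
```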
The Jackknife
[Figure: leave-one-out — train on all of the data except a single example, evaluate on that example, and repeat for every example]
The Bootstrap
[Figure: draw a training set from the data by sampling with replacement (yellow); evaluate on the examples not drawn (pink) → error]
Repeat and average.
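A bootstrap error estimate can be sketched as follows (sampling with replacement is the standard construction; the function names are mine):

```python
import random

def bootstrap_error(data, train_fn, error_fn, rounds=100, seed=0):
    """The bootstrap: repeatedly train on a sample drawn from the data with
    replacement and evaluate on the examples that were not drawn; average."""
    rng = random.Random(seed)
    errors = []
    for _ in range(rounds):
        sample = [rng.choice(data) for _ in data]       # yellow: with replacement
        drawn = set(sample)
        held_out = [x for x in data if x not in drawn]  # pink: left out this round
        if held_out:                                    # skip the rare empty round
            errors.append(error_fn(train_fn(sample), held_out))
    return sum(errors) / len(errors)
```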
What’s the Problem?
Confidence intervals assume independently drawn examples, but our individual error estimates are dependent, because the resampled training and testing sets overlap across rounds.
Comparing Different Hypotheses: Paired t test
True difference: d = e_D(1) − e_D(2)
Test error difference for partition i: d̂_i = ê_{S,i}(1) − ê_{S,i}(2)
Average: d̂ = (1/k) Σ_{i=1..k} d̂_i
N% confidence interval: d̂ ± t_{N,k−1} √( [1/(k(k−1))] Σ_{i=1..k} (d̂_i − d̂)² )
(k − 1 is the degrees of freedom; N is the confidence level)
t_{N,ν}:   N=90%  95%   98%   99%
ν = 2:     2.92   4.30  6.96  9.92
ν = 5:     2.02   2.57  3.36  4.03
ν = 10:    1.81   2.23  2.76  3.17
ν = 20:    1.72   2.09  2.53  2.84
ν = 30:    1.70   2.04  2.46  2.75
ν = 120:   1.66   1.98  2.36  2.62
ν = ∞:     1.64   1.96  2.33  2.58
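The paired computation can be sketched as follows, with the t value read off the table above for the desired confidence level (the helper name and the example differences are mine):

```python
from math import sqrt

def paired_confidence_interval(d, t):
    """Confidence interval for the true error difference, from the per-partition
    differences d = [d_1, ..., d_k] and the t value for k-1 degrees of freedom."""
    k = len(d)
    d_bar = sum(d) / k
    spread = sqrt(sum((d_i - d_bar) ** 2 for d_i in d) / (k * (k - 1)))
    return d_bar - t * spread, d_bar + t * spread

# Hypothetical differences from k = 6 partitions; t for 95% and nu = 5 is 2.57:
diffs = [0.02, 0.05, 0.01, 0.04, 0.03, 0.03]
lo, hi = paired_confidence_interval(diffs, t=2.57)
```

If the resulting interval excludes 0, the two hypotheses differ significantly at the chosen confidence level.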
Evaluating Machine Learning Algorithms
[Overview diagram: plenty of data vs. little data vs. unlimited data]
Asymptotic Prediction
Useful for very large data sets
Summary
Know your loss function!
Finite testing data: report confidence intervals.
Scarce data: repartition the training/testing set (cross validation, jackknife, bootstrap).
Asymptotic prediction: exponential.
Put thought into your evaluation, and be critical. Convince yourself!
Recommended