Spooky Stuff in Metric Space

Spooky Stuff: Data Mining in Metric Space. Rich Caruana, Alex Niculescu, Cornell University


Page 1

Spooky Stuff in Metric Space

Page 2

Spooky Stuff: Data Mining in Metric Space

Rich Caruana, Alex Niculescu

Cornell University

Page 3

Motivation #1

Page 4

Motivation #1: Pneumonia Risk Prediction

Page 5

Motivation #1: Many Learning Algorithms
– Neural nets
– Logistic regression
– Linear perceptron
– K-nearest neighbor
– Decision trees
– ILP (Inductive Logic Programming)
– SVMs (Support Vector Machines)
– Bagging
– Boosting
– Rule learners (CN2, Ripper, …)
– Random Forests (forests of decision trees)
– Gaussian Processes
– Bayes Nets
– …

No one (or few) learning method dominates the others.

Page 6

Motivation #2

Page 7

Motivation #2: SLAC B/Bbar
– A particle accelerator generates B/Bbar particles
– Use machine learning to classify tracks as B or Bbar
– Domain-specific performance measure: the SLQ score
– A 5% increase in SLQ can save $1M in accelerator time

– SLAC researchers tried various DM/ML methods: good, but not great, SLQ performance
– We tried standard methods and got similar results
– We studied the SLQ metric:
  – it is similar to probability calibration
  – so we tried bagged probabilistic decision trees (good on C-Section)

Page 8

Motivation #2: Bagged Probabilistic Trees

– Draw N bootstrap samples of the data
– Train a tree on each sample => N trees
– Final prediction = average prediction of the N trees

Average prediction: (0.23 + 0.19 + 0.34 + 0.22 + 0.26 + … + 0.31) / #trees = 0.24
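The procedure is simple enough to sketch in code. Below is a minimal, illustrative implementation (not the authors' code), assuming scikit-learn, NumPy arrays, and that each bootstrap sample contains both classes:

```python
# Sketch of bagged probabilistic decision trees (illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_tree_probs(X_train, y_train, X_test, n_trees=100, seed=0):
    rng = np.random.RandomState(seed)
    n = len(X_train)
    preds = []
    for _ in range(n_trees):
        idx = rng.randint(0, n, n)               # bootstrap sample, with replacement
        tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        preds.append(tree.predict_proba(X_test)[:, 1])  # leaf-frequency probability
    return np.mean(preds, axis=0)                # final prediction = average over the N trees
```

Averaging over many trees smooths the blocky leaf-frequency estimates of a single tree, which is where the calibration improvement on the next slide comes from.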

Page 9

Motivation #2: Improves Calibration by an Order of Magnitude

[Calibration plots: a single tree shows poor calibration; 100 bagged trees show excellent calibration]

Page 10

Motivation #2: Significantly Improves SLQ

[SLQ performance plots: 100 bagged trees vs. a single tree]

Page 11

Motivation #2

Can we automate this analysis of performance metrics so that it’s easier to recognize which metrics are similar to each other?

Page 12

Motivation #3

Page 13

Motivation #3

Threshold metrics: Accuracy (ACC), F-Score (FSC), Lift (LFT). Rank/ordering metrics: ROC Area (AUC), Average Precision (APR), Break-Even Point (BEP). Probability metrics: Squared Error (RMS), Cross-Entropy (MXE), Calibration (CAL).

Model     ACC     FSC     LFT     AUC     APR     BEP     RMS     MXE     CAL     SAR     Mean
SVM       0.8134  0.9092  0.9480  0.9621  0.9335  0.9377  0.8767  0.8778  0.9824  0.9055  0.9156
ANN       0.8769  0.8752  0.9487  0.9552  0.9167  0.9142  0.8532  0.8634  0.9881  0.8956  0.9102
BAG-DT    0.8114  0.8609  0.9465  0.9674  0.9416  0.9220  0.8588  0.8942  0.9744  0.9036  0.9086
BST-DT    0.8904  0.8986  0.9574  0.9778  0.9597  0.9427  0.6066  0.6107  0.9241  0.8710  0.8631
KNN       0.7557  0.8463  0.9095  0.9370  0.8847  0.8890  0.7612  0.7354  0.9843  0.8470  0.8559
DT        0.5261  0.7891  0.8503  0.8678  0.7674  0.7954  0.5564  0.6243  0.9647  0.7445  0.7491
BST-STMP  0.7319  0.7903  0.9046  0.9187  0.8610  0.8336  0.3038  0.2861  0.9410  0.6589  0.7303

Page 14

Scary Stuff
In an ideal world:
– Learn a model that predicts correct conditional probabilities (Bayes optimal)
– It yields optimal performance on any reasonable metric

In the real world:
– Finite data
– 0/1 targets instead of conditional probabilities
– Hard to learn this ideal model
– Don't have good metrics for recognizing the ideal model
– The ideal model isn't always needed

In practice:
– Do learning using many different metrics: ACC, AUC, CXE, RMS, …
– Each metric represents different tradeoffs
– Because of this, it is usually important to optimize to the appropriate metric

Page 15

Scary Stuff

Page 16

Scary Stuff

Page 17

In this work we compare nine commonly used performance metrics by applying data mining to the results of a massive empirical study.

Goals:
– Discover relationships between the performance metrics
– Are the metrics really that different?
– If you optimize to metric X, do you also get good performance on metric Y?
– If you need to optimize to metric Y, which metric X should you optimize to?
– Which metrics are more/less robust?
– Can we design new, better metrics?

Page 18

10 Binary Classification Performance Metrics

Threshold Metrics:
– Accuracy
– F-Score
– Lift

Ordering/Ranking Metrics:
– ROC Area
– Average Precision
– Precision/Recall Break-Even Point

Probability Metrics:
– Root-Mean-Squared Error
– Cross-Entropy
– Probability Calibration

SAR = ((1 - Squared Error) + Accuracy + ROC Area) / 3
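SAR combines one representative metric from each family. A minimal illustrative sketch of the computation (not the authors' PERF code), assuming scikit-learn, 0/1 labels in NumPy arrays, and that the squared-error term is root-mean-squared error:

```python
# Illustrative SAR computation following the slide's definition.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

def sar(y_true, p_pred, threshold=0.5):
    acc = accuracy_score(y_true, p_pred >= threshold)        # accuracy at the threshold
    auc = roc_auc_score(y_true, p_pred)                      # ROC area
    rms = np.sqrt(np.mean((y_true - p_pred) ** 2))           # root-mean-squared error
    return ((1.0 - rms) + acc + auc) / 3.0
```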

Page 19

Accuracy

              Predicted 1   Predicted 0
True 1             a             b
True 0             c             d

Cells a and d are correct predictions; b and c are incorrect.

accuracy = (a + d) / (a + b + c + d)

(predictions are thresholded: f(x) > threshold => predicted 1)

Page 20

Lift
– not interested in accuracy on the entire dataset
– want accurate predictions for 5%, 10%, or 20% of the dataset
– don't care about the remaining 95%, 90%, 80%, respectively
– typical application: marketing

How much better than random prediction on the fraction of the dataset predicted true (f(x) > threshold):

lift(threshold) = (% positives > threshold) / (% dataset > threshold)

Page 21

Lift

              Predicted 1   Predicted 0
True 1             a             b
True 0             c             d

lift = [a / (a + b)] / [(a + c) / (a + b + c + d)]

(at a given threshold)
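The formula translates directly into code. An illustrative helper (the function name is ours; y_true and p_pred are assumed to be NumPy arrays of 0/1 labels and predicted scores):

```python
# Illustrative lift at a threshold, following the slide's definition.
import numpy as np

def lift(y_true, p_pred, threshold):
    pred_pos = p_pred > threshold
    pct_positives_above = y_true[pred_pos].sum() / y_true.sum()  # a / (a + b)
    pct_dataset_above = pred_pos.mean()                          # (a + c) / (a + b + c + d)
    return pct_positives_above / pct_dataset_above
```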

Page 22

lift = 3.5 if mailings sent to 20% of the customers

Page 23

Precision/Recall, F, Break-Even Point

PRECISION = a / (a + c)

RECALL = a / (a + b)

F = 2 × (PRECISION × RECALL) / (PRECISION + RECALL)   (the harmonic mean of precision and recall)

Break-Even Point: the threshold at which PRECISION = RECALL

Page 24

[Precision/Recall plot, annotated with regions of better and worse performance]

Page 25

Four common labelings of the same confusion matrix:

              Predicted 1      Predicted 0
True 1        true positive    false negative
True 0        false positive   true negative

              Predicted 1      Predicted 0
True 1        hits             misses
True 0        false alarms     correct rejections

              Predicted 1      Predicted 0
True 1        P(pr1|tr1)       P(pr0|tr1)
True 0        P(pr1|tr0)       P(pr0|tr0)

              Predicted 1      Predicted 0
True 1        TP               FN
True 0        FP               TN

Page 26

ROC Plot and ROC Area
– Receiver Operating Characteristic
– Developed in WWII to statistically model the false positive and false negative detections of radar operators
– Better statistical foundations than most other measures
– Standard measure in medicine and biology
– Becoming more popular in ML

Sweep the threshold and plot:
– TPR vs. FPR
– Sensitivity vs. 1 - Specificity
– P(predicted true | true) vs. P(predicted true | false)
– Sensitivity = a/(a+b) = Recall = the numerator of Lift
– 1 - Specificity = 1 - d/(c+d)
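Tracing the curve is exactly this threshold sweep. A minimal illustrative sketch (NumPy assumed; ROC area would then be the area under the returned points):

```python
# Sketch: trace ROC points by sweeping the decision threshold (illustrative).
import numpy as np

def roc_points(y_true, p_pred):
    points = []
    for t in np.unique(p_pred)[::-1]:    # sweep thresholds from high to low
        pred = p_pred >= t
        tpr = pred[y_true == 1].mean()   # sensitivity = a / (a + b)
        fpr = pred[y_true == 0].mean()   # 1 - specificity = c / (c + d)
        points.append((fpr, tpr))
    return points
```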

Page 27

The diagonal line is random prediction.

Page 28

Calibration
Good calibration: if 1000 x's have pred(x) = 0.2, ~200 of them should be positive.

∀x: prediction(x) = p(x)

Page 29

Calibration
– A model can be accurate but poorly calibrated: a good threshold with uncalibrated probabilities
– A model can have good ROC but be poorly calibrated:
  – ROC is insensitive to scaling/stretching
  – only the ordering has to be correct, not the probabilities themselves
– A model can have very high variance, but be well calibrated
– A model can be stupid, but be well calibrated
– Calibration is a real oddball

Page 30

Measuring Calibration: the Bucket Method
Partition the predictions into ten buckets: [0.0, 0.1), [0.1, 0.2), …, [0.9, 1.0].

In each bucket:
– measure the observed c-section rate
– measure the predicted c-section rate (the average of the predicted probabilities)
– if the observed rate is similar to the predicted rate => good calibration in that bucket

[Diagram: predictions binned into ten buckets with centers 0.05, 0.15, …, 0.95]
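A compact sketch of the bucket method, under the assumption that y_true holds 0/1 outcomes and p_pred holds predicted probabilities, both as NumPy arrays:

```python
# Illustrative bucket-method calibration check (ten equal-width buckets).
import numpy as np

def calibration_table(y_true, p_pred, n_buckets=10):
    # bucket index per prediction: [0.0, 0.1) -> 0, ..., [0.9, 1.0] -> 9
    idx = np.minimum((p_pred * n_buckets).astype(int), n_buckets - 1)
    for b in range(n_buckets):
        mask = idx == b
        if mask.any():
            print(f"bucket [{b / n_buckets:.1f}, {(b + 1) / n_buckets:.1f}): "
                  f"predicted={p_pred[mask].mean():.3f}  observed={y_true[mask].mean():.3f}")
```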

Page 31

Calibration Plot

Page 32

Experiments

Page 33

Base-Level Learning Methods
– Decision trees
– K-nearest neighbor
– Neural nets
– SVMs
– Bagged decision trees
– Boosted decision trees
– Boosted stumps

– Each optimizes different things
– Each is best in different regimes
– Each algorithm has many variations and free parameters
– Generate about 2000 models on each test problem

Page 34

Data Sets
7 binary classification data sets:
– Adult
– Cover Type
– Letter.p1 (balanced)
– Letter.p2 (unbalanced)
– Pneumonia (University of Pittsburgh)
– Hyper Spectral (NASA Goddard Space Center)
– Particle Physics (Stanford Linear Accelerator)

4k train sets; large final test sets (usually 20k)

Page 35

Massive Empirical Comparison

7 base-level learning methods
× 100's of parameter settings per method
≈ 2,000 models per problem
× 7 test problems
= 14,000 models
× 10 performance metrics
= 140,000 model performance evaluations

Page 36

COVTYPE: Calibration vs. Accuracy

Page 37

Multi Dimensional Scaling

Each metric is represented by its score on every one of the 14,000 models:

        M1   M2   M3   M4   M5   M6   M7   …   M14,000
ACC      -    -    -    -    -    -    -   …      -
FSC      -    -    -    -    -    -    -   …      -
LFT      -    -    -    -    -    -    -   …      -
AUC      -    -    -    -    -    -    -   …      -
APR      -    -    -    -    -    -    -   …      -
BEP      -    -    -    -    -    -    -   …      -
RMS      -    -    -    -    -    -    -   …      -
MXE      -    -    -    -    -    -    -   …      -
CAL      -    -    -    -    -    -    -   …      -
SAR      -    -    -    -    -    -    -   …      -
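To make the embedding step concrete, here is an illustrative sketch assuming scikit-learn, with `scores` standing in for the hypothetical 10 × 14,000 matrix above:

```python
# Sketch: 2-D MDS embedding of the 10 metrics from their pairwise distances.
import numpy as np
from sklearn.manifold import MDS

def embed_metrics(scores):
    # each metric is a point in R^14000; distance = Euclidean between rows
    d = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=2)
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    return mds.fit_transform(d)   # one 2-D coordinate per metric
```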

Page 38

Scaling, Ranking, and Normalizing
Problem:
– for some metrics, 1.00 is best (e.g. ACC)
– for some metrics, 0.00 is best (e.g. RMS)
– for some metrics, the baseline is 0.50 (e.g. AUC)
– for some problems/metrics, 0.60 is excellent performance
– for some problems/metrics, 0.99 is poor performance

Solution 1: Normalized scores:
– baseline performance => 0.00
– best observed performance => 1.00 (a proxy for Bayes optimal)
– puts all metrics on an equal footing

Solution 2: scale by standard deviation
Solution 3: rank correlation
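Solution 1 amounts to a linear rescaling; a one-line illustrative version (the numbers in the usage comment are made up):

```python
# Illustrative normalized score: baseline -> 0.00, best observed -> 1.00.
def normalized_score(score, baseline, best_observed):
    return (score - baseline) / (best_observed - baseline)

# e.g. normalized_score(0.93, baseline=0.50, best_observed=0.98) ~= 0.896
# for an AUC of 0.93 when the best observed AUC on that problem is 0.98
```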

Page 39

Multi Dimensional Scaling

– Find a low-dimensional embedding of the 10 × 14,000 data
– The 10 metrics span a 2-5 dimensional subspace

Page 40

Multi Dimensional Scaling
Look at 2-D MDS plots:
– scaled by standard deviation
– normalized scores
– MDS of rank correlations

– MDS on each problem individually
– MDS averaged across all problems

Page 41

2-D Multi-Dimensional Scaling

Page 42

2-D Multi-Dimensional Scaling

[MDS plots: normalized scores; scaling by standard deviation; rank-correlation distance]

Page 43

[Per-problem MDS plots: Adult, Covertype, Hyper-Spectral, Letter, Medis, SLAC]

Page 44

Correlation Analysis
– 2000 performances for each metric on each problem
– Correlations between all pairs of metrics: 10 metrics => 45 pairwise correlations
– Average the correlations over the 7 test problems
– Computed both standard correlation and rank correlation; rank correlation is presented here

Page 45

Rank Correlations

Metric  ACC   FSC   LFT   AUC   APR   BEP   RMS   MXE   CAL   SAR   Mean
ACC    1.00  0.87  0.85  0.88  0.89  0.93  0.87  0.75  0.56  0.92  0.852
FSC    0.87  1.00  0.77  0.81  0.82  0.87  0.79  0.69  0.50  0.84  0.796
LFT    0.85  0.77  1.00  0.96  0.91  0.89  0.82  0.73  0.47  0.92  0.832
AUC    0.88  0.81  0.96  1.00  0.95  0.92  0.85  0.77  0.51  0.96  0.861
APR    0.89  0.82  0.91  0.95  1.00  0.92  0.86  0.75  0.50  0.93  0.853
BEP    0.93  0.87  0.89  0.92  0.92  1.00  0.87  0.75  0.52  0.93  0.860
RMS    0.87  0.79  0.82  0.85  0.86  0.87  1.00  0.92  0.79  0.95  0.872
MXE    0.75  0.69  0.73  0.77  0.75  0.75  0.92  1.00  0.81  0.86  0.803
CAL    0.56  0.50  0.47  0.51  0.50  0.52  0.79  0.81  1.00  0.65  0.631
SAR    0.92  0.84  0.92  0.96  0.93  0.93  0.95  0.86  0.65  1.00  0.896

– The correlation analysis is consistent with the MDS analysis
– The ordering metrics have high correlations with each other
– ACC, AUC, and RMS have the best correlations among the metrics in each metric class
– RMS has good correlation with the other metrics
– SAR has the best correlation with the other metrics

Page 46

Summary
– The 10 metrics span a 2-5 dimensional subspace
– Consistent results across problems and scalings
– The ordering metrics cluster: AUC ~ APR ~ BEP
– CAL is far from the ordering metrics; CAL is nearest to RMS/MXE
– RMS ~ MXE, but RMS is much more centrally located
– The threshold metrics ACC and FSC do not cluster as tightly as the ordering metrics and RMS/MXE
– Lift behaves more like an ordering metric than a threshold metric
– Old friends ACC, AUC, and RMS are the most representative
– The new SAR metric is good, but not much better than RMS

Page 47

New Resources
Want to borrow 14,000 models?
– margin analysis
– comparison to a new algorithm X
– …

PERF code: software that calculates ~2 dozen performance metrics:
– Accuracy (at different thresholds)
– ROC Area and ROC plots
– Precision and Recall plots
– Break-even point, F-Score, Average Precision
– Squared Error
– Cross-Entropy
– Lift
– …
– Currently, most metrics are for boolean classification problems
– We are willing to add new metrics and new capabilities
– Available at: http://www.cs.cornell.edu/~caruana

Page 48

Future Work

Page 49

Future/Related Work
– An ensemble method that optimizes to any metric (ICML'04)
– Getting good probabilities from boosted trees (AISTATS'05)
– A comparison of learning algorithms across metrics (ICML'06)

This is a first step in analyzing different performance metrics.

Develop new metrics with better properties:
– SAR is a good general-purpose metric
– Does optimizing to SAR yield better models?
– but RMS is nearly as good
– attempts to make SAR better did not help much

Extend to multi-class or hierarchical problems, where evaluating performance is more difficult.

Page 50

Thank You.

Page 51

Spooky Stuff in Metric Space

Page 52

Which learning methods perform best on each metric?

Page 53

Normalized Scores of the Best Single Models

Threshold metrics: Accuracy (ACC), F-Score (FSC), Lift (LFT). Rank/ordering metrics: ROC Area (AUC), Average Precision (APR), Break-Even Point (BEP). Probability metrics: Squared Error (RMS), Cross-Entropy (MXE), Calibration (CAL).

Model     ACC     FSC     LFT     AUC     APR     BEP     RMS     MXE     CAL     SAR     Mean
SVM       0.8134  0.9092  0.9480  0.9621  0.9335  0.9377  0.8767  0.8778  0.9824  0.9055  0.9156
ANN       0.8769  0.8752  0.9487  0.9552  0.9167  0.9142  0.8532  0.8634  0.9881  0.8956  0.9102
BAG-DT    0.8114  0.8609  0.9465  0.9674  0.9416  0.9220  0.8588  0.8942  0.9744  0.9036  0.9086
BST-DT    0.8904  0.8986  0.9574  0.9778  0.9597  0.9427  0.6066  0.6107  0.9241  0.8710  0.8631
KNN       0.7557  0.8463  0.9095  0.9370  0.8847  0.8890  0.7612  0.7354  0.9843  0.8470  0.8559
DT        0.5261  0.7891  0.8503  0.8678  0.7674  0.7954  0.5564  0.6243  0.9647  0.7445  0.7491
BST-STMP  0.7319  0.7903  0.9046  0.9187  0.8610  0.8336  0.3038  0.2861  0.9410  0.6589  0.7303

– SVM predictions are transformed to posterior probabilities via Platt Scaling
– SVM and ANN are tied for first place; Bagged Trees are nearly as good
– Boosted Trees win 5 of the 6 threshold and rank metrics, but yield lousy probabilities!
– Boosting weaker stumps does not compare to boosting full trees
– KNN and plain Decision Trees are usually not competitive (with 4k train sets)
– Other interesting things: see the papers

Page 54

Platt Scaling
– SVM predictions: [-inf, +inf]
– Probability metrics require [0, 1]
– Platt scaling transforms SVM predictions by fitting a sigmoid
– This gives SVMs good probability performance

[Plot: fitted sigmoid mapping raw SVM outputs (roughly -15 to +15) to probabilities in [0, 1]]
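For concreteness, a minimal sketch of the idea, assuming scikit-learn: a one-dimensional logistic fit stands in for Platt's original sigmoid-fitting procedure (which also uses regularized targets and held-out predictions):

```python
# Sketch of Platt scaling: fit a sigmoid that maps raw SVM margins to [0, 1].
import numpy as np
from sklearn.linear_model import LogisticRegression

def platt_scale(svm_margins_train, y_train, svm_margins_test):
    lr = LogisticRegression()                          # logistic fit on the 1-D margin
    lr.fit(svm_margins_train.reshape(-1, 1), y_train)
    return lr.predict_proba(svm_margins_test.reshape(-1, 1))[:, 1]
```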

Page 55

Outline
– Motivation: The One True Model
– Ten Performance Metrics
– Experiments
– Multidimensional Scaling (MDS) Analysis
– Correlation Analysis
– Learning Algorithm vs. Metric
– Summary

Page 56

Base-Level Learners
Each optimizes different things:
– ANN: minimize squared error or cross-entropy (good for probs)
– SVM, Boosting: optimize margin (good for accuracy, poor for probs)
– DT: optimize information gain
– KNN: ?

Each is best in different regimes:
– SVM: high-dimensional data
– DT, KNN: large data sets
– ANN: non-linear prediction from many correlated features

Each algorithm has many variations and free parameters:
– SVM: margin parameter, kernel, kernel parameters (gamma, …)
– ANN: # hidden units, # hidden layers, learning rate, early stopping point
– DT: splitting criterion, pruning options, smoothing options, …
– KNN: K, distance metric, distance-weighted averaging, …

Generate about 2000 models on each test problem.

Page 57

Motivation
Holy Grail of Supervised Learning:
– The One True Model (a.k.a. the Bayes Optimal Model)
– Predicts the correct conditional probability for each case
– Yields optimal performance on all reasonable metrics
– Hard to learn given finite data: train sets rarely have conditional probabilities, usually just 0/1 targets
– Isn't always necessary

Many Different Performance Metrics:
– ACC, AUC, CXE, RMS, PRE/REC, …
– Each represents different tradeoffs
– Usually important to optimize to the appropriate metric
– Not all metrics are created equal

Page 58

Motivation
In an ideal world:
– Learn a model that predicts correct conditional probabilities
– It yields optimal performance on any reasonable metric

In the real world:
– Finite data
– 0/1 targets instead of conditional probabilities
– Hard to learn this ideal model
– Don't have good metrics for recognizing the ideal model
– The ideal model isn't always necessary

In practice:
– Do learning using many different metrics: ACC, AUC, CXE, RMS, …
– Each metric represents different tradeoffs
– Because of this, it is usually important to optimize to the appropriate metric

Page 59

Accuracy
– Target: 0/1, -1/+1, True/False, …
– Prediction = f(inputs) = f(x): 0/1 or real-valued
– Threshold: f(x) > thresh => 1, else => 0, so threshold(f(x)) is 0/1
– accuracy = #right / #total = p("correct") = p(threshold(f(x)) = target)

accuracy = 1 − (1/N) · Σ_{i=1}^{N} (target_i − threshold(f(x_i)))²

Page 60

Precision and Recall
Typically used in document retrieval.

Precision:
– how many of the returned documents are correct
– precision(threshold)

Recall:
– how many of the positives does the model return
– recall(threshold)

Precision/Recall curve: sweep the threshold.

Page 61

Precision/Recall

              Predicted 1   Predicted 0
True 1             a             b
True 0             c             d

PRECISION = a / (a + c)
RECALL = a / (a + b)

(predictions thresholded at f(x) > threshold)
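A direct reading of these two formulas in code, as an illustrative helper (y_true and p_pred are assumed to be NumPy arrays of 0/1 labels and predicted scores):

```python
# Illustrative precision/recall at a threshold, using the slide's a/b/c cells.
def precision_recall(y_true, p_pred, threshold):
    a = ((p_pred > threshold) & (y_true == 1)).sum()   # true positives
    b = ((p_pred <= threshold) & (y_true == 1)).sum()  # false negatives
    c = ((p_pred > threshold) & (y_true == 0)).sum()   # false positives
    return a / (a + c), a / (a + b)                    # (precision, recall)
```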

Page 62