Evaluation (5/18/2008)
Classifier Evaluation
• How to estimate classifier performance
• Learning curves
• Feature curves
• Rejects and ROC curves
Learning Curve
[Figure: true classification error ε vs. training set size; a Bayes-consistent classifier approaches the Bayes error ε*, while a sub-optimal classifier levels off above it.]
PRTools: cleval, testc
The Apparent Classification Error
[Figure: true error ε and apparent error εA vs. training set size; the gap between them is the bias.]
The apparent (or resubstitution) error, measured on the training set itself, is positively biased (optimistic). An independent test set is needed!
Error Estimation by Test Set
[Diagram: the design set is split into a training part (giving the classifier) and a test part (giving the classification error estimate). Another training set gives another classifier; another test set gives another error estimate.]

Standard deviation of the error estimate for test set size N and true error ε:

             ε = 0.01   ε = 0.03   ε = 0.1
  N = 10       0.031      0.054      0.095
  N = 100      0.010      0.017      0.030
  N = 1000     0.003      0.005      0.009

PRTools: gendat, testc
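The accuracy of a test-set error estimate follows from the binomial distribution of the error count on N independent test objects. A minimal Python sketch (illustrative only, not PRTools) that reproduces the table of standard deviations:

```python
import math

def error_std(eps, n):
    """Standard deviation of the test-set error estimate: counting
    errors on n independent test objects is a binomial experiment,
    so the estimate eps_hat has std sqrt(eps * (1 - eps) / n)."""
    return math.sqrt(eps * (1.0 - eps) / n)

# Reproduce the table: rows N = 10, 100, 1000; columns eps = 0.01, 0.03, 0.1.
for n in (10, 100, 1000):
    row = [round(error_std(e, n), 3) for e in (0.01, 0.03, 0.1)]
    print(n, row)
```

Note how a ten-fold larger test set shrinks the standard deviation by roughly a factor of three (the square root of ten).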
Training Set Size vs. Test Set Size
• The training set should be large to obtain a good classifier.
• The test set should be large for a reliable, unbiased error estimate.
• In practice, often just a single design set is given.
Using the same set for training and testing may give a good classifier, but will definitely yield an optimistically biased error estimate.
A small, independent test set yields an unbiased but unreliable (large-variance) error estimate for a well-trained classifier.
A large, independent test set yields an unbiased and reliable (small-variance) error estimate for a badly trained classifier.
A 50-50 split is an often-used standard solution for the training and test sizes. Is it really good? There are better alternatives.
PRTools: gendat
Cross-validation
[Diagram: rotate n times; in each rotation classifier_i is trained on part of the design set and the held-out part yields a classification error estimate.]
• Size of the test set: 1/n of the design set.
• Size of the training set: (n - 1)/n of the design set.
• Train and test n times; average the errors. (Good choice: n = 10.)
• All objects are tested once: the most reliable test result possible.
• Final classifier: trained on all objects, the best possible classifier.
• The error estimate is slightly pessimistically biased.
PRTools: crossval
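The rotation procedure above can be sketched in plain Python (illustrative, not PRTools; the nearest-mean classifier and the synthetic 1-D data are assumptions made for the example):

```python
import random

def nearest_mean_fit(train):
    """Train a nearest-mean classifier: store the mean per class."""
    means = {}
    for label in set(lab for _, lab in train):
        vals = [x for x, lab in train if lab == label]
        means[label] = sum(vals) / len(vals)
    return means

def nearest_mean_predict(means, x):
    """Assign x to the class with the closest mean."""
    return min(means, key=lambda lab: abs(x - means[lab]))

def crossval_error(data, n_folds=10, seed=0):
    """n-fold cross-validation: rotate n times, train on (n-1)/n of
    the design set, test on the held-out 1/n, average the errors."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)
    folds = [data[i::n_folds] for i in range(n_folds)]
    errors = []
    for i in range(n_folds):
        test = folds[i]
        train = [obj for j, fold in enumerate(folds) if j != i for obj in fold]
        means = nearest_mean_fit(train)
        wrong = sum(nearest_mean_predict(means, x) != lab for x, lab in test)
        errors.append(wrong / len(test))
    return sum(errors) / n_folds

# Two well-separated 1-D classes: the cross-validation error is near zero.
rng = random.Random(1)
data = [(rng.gauss(0, 1), 'A') for _ in range(50)] + \
       [(rng.gauss(6, 1), 'B') for _ in range(50)]
print(crossval_error(data, n_folds=10))
```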
Leave-one-out Procedure
[Diagram: rotate n times; classifier_i is trained on all but one object, and the single left-out object is tested.]
Cross-validation in which n equals the total number of objects: one object is tested at a time.
n classifiers have to be computed, which is in general infeasible for large n.
Doable for the k-NN classifier (it needs no training).
PRTools: crossval, testk
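For 1-NN, the leave-one-out rotation needs no training step at all; each object is simply matched against all the others. A Python sketch (illustrative, not PRTools; the toy 1-D data set is an assumption):

```python
def loo_error_1nn(data):
    """Leave-one-out error for the 1-NN classifier: classify each
    object by its nearest neighbour among all remaining objects."""
    wrong = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        nearest = min(rest, key=lambda obj: abs(x - obj[0]))
        wrong += nearest[1] != label
    return wrong / len(data)

# Toy data: two compact clusters, so every left-out object's
# nearest neighbour shares its class.
data = [(0.0, 'A'), (0.2, 'A'), (0.4, 'A'),
        (1.0, 'B'), (1.2, 'B'), (1.4, 'B')]
print(loo_error_1nn(data))  # 0.0
```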
Expected Learning Curves of the Estimated Errors
[Figure: classification error vs. training set size; the true error ε, the apparent error εA of the training set below it, and the cross-validation (rotation) and leave-one-out (n - 1) error estimates slightly above it.]
PRTools: cleval
Averaged Learning Curve

a = gendath([200 200]);
e = cleval(a,ldc,[2,3,5,7,10,15,20],500);
plote(e);

To obtain 'theoretically expected' curves, many repetitions are needed.
PRTools: cleval
Repeated Learning Curves

a = gendath([200 200]);
for j = 1:10
  e = cleval(a,ldc,[2,3,5,7,10,15,20],1);
  hold on; plote(e);
end

Small sample sizes show a very large variability.
PRTools: cleval, plote
Learning Curves for Different Classifier Complexity
[Figure: classification error vs. training set size for increasing classifier complexity; all curves approach the Bayes error.]
More complex classifiers are better in case of large training sets and worse in case of small training sets.
PRTools: cleval
Peaking Phenomenon, Overtraining, Curse of Dimensionality, Rao's Paradox
[Figure: classification error vs. feature set size (dimensionality) / classifier complexity, for several training set sizes.]
PRTools: clevalf
Example Overtraining: Polynomial Classifier
[Figure: six panels (1-6) showing a polynomial classifier of increasing degree on the same data.]
PRTools: svc([],'p',1)
Example Overtraining (2)
[Figure: classification error vs. complexity (polynomial degree).]
PRTools: svc([],'p',1), testc, gendat
Example Overtraining (3)
Run the smoothing parameter of the Parzen classifier from small to large.
Example Overtraining (4)
[Figure: classification error vs. complexity (decreasing smoothing parameter h).]
PRTools: parzenc([],h)
Overtraining: Increasing Bias
[Figure: true error and apparent error vs. feature set size (dimensionality) / classifier complexity; the gap between them, the bias, grows with complexity.]
Example Curse of Dimensionality
Fisher classifier for various feature rankings
Confusion Matrix (1)
[Diagram: confusion matrix with the real labels along the rows and the obtained labels along the columns.]
PRTools: confmat
Confusion Matrix (2)

                 classified to
  objects from     A    B    C
  class A          8    2    0    0.20 error in class A
  class B          6   23    1    0.23 error in class B
  class C          4    1   15    0.25 error in class C

  0.228 averaged error (the mean of the per-class errors)

PRTools: confmat, testc
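The per-class errors and the averaged error can be recomputed from the matrix itself. A Python sketch (illustrative, not the PRTools confmat routine):

```python
def per_class_errors(conf, labels):
    """Per-class error: off-diagonal count divided by the row total,
    where each row holds the objects of one real class."""
    errs = {}
    for i, lab in enumerate(labels):
        row = conf[i]
        errs[lab] = (sum(row) - row[i]) / sum(row)
    return errs

labels = ['A', 'B', 'C']
conf = [[8,  2,  0],   # objects from class A
        [6, 23,  1],   # objects from class B
        [4,  1, 15]]   # objects from class C

errs = per_class_errors(conf, labels)
avg = sum(errs.values()) / len(errs)   # class-averaged error
print({k: round(v, 2) for k, v in errs.items()}, round(avg, 3))
```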
Confusion Matrix (3)

                 classified to
  objects from     A    B    C
  class A          8    2    0    0.20
  class B          6   24    0    0.20
  class C          0    0   20    0.00
  averaged error: 0.133

                 classified to
  objects from     A    B    C
  class A          1    9    0    0.90
  class B          0   30    0    0.00
  class C          0    0   20    0.00
  averaged error: 0.30

                 classified to
  objects from     A    B    C
  class A          7    2    1    0.30
  class B          5   21    4    0.30
  class C          3    2   15    0.30
  averaged error: 0.30

Classification details are only observable in the confusion matrix!
Conclusions on Error Estimations
• Larger training sets yield better classifiers.
• Independent test sets are needed for obtaining unbiased error estimates.
• Larger test sets yield more accurate error estimates.
• Leave-one-out cross-validation seems to be an optimal compromise, but might be computationally infeasible.
• 10-fold cross-validation is a good practice.
• More complex classifiers need larger training sets to avoid overtraining.
• This holds in particular for larger feature sizes, due to the curse of dimensionality.
• For too small training sets, simpler classifiers or smaller feature sets are needed.
• Confusion matrices allow a detailed look at the per-class classification results.
Reject and ROC Analysis
• Reject Types
• Reject Curves
• Performance Measures
• Varying Costs and Priors
• ROC Analysis
[Figure: error vs. reject fraction r, dropping from ε0 at r = 0 to εr.]
Outlier Reject
Reject objects for which the probability density is low.
Note: in these areas the posterior probabilities might still be high!
PRTools: rejectc
Ambiguity Reject
Reject objects for which the classification is unsure, i.e. about equal posterior probabilities.
PRTools: rejectc
Reject Curve
The classification error ε0 can be reduced to εr by rejecting a fraction r of the objects.
[Figure: rejecting a band of width 2d around the decision boundary reduces the error from ε0 to εr; the curve shows the error as a function of the reject fraction r.]
PRTools: reject, plote
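A reject-curve sketch in Python (illustrative, not the PRTools reject routine; the scores and error flags are made-up, with the errors clustered near the decision boundary at score 0):

```python
def reject_curve(objects, widths):
    """Error-reject trade-off: for each half-width d, reject every
    object whose score lies within the band |score| <= d around the
    decision boundary at 0. Each object is a pair
    (score, is_error_if_accepted)."""
    curve = []
    n = len(objects)
    for d in widths:
        accepted = [err for s, err in objects if abs(s) > d]
        r = 1.0 - len(accepted) / n        # reject fraction
        eps = sum(accepted) / n            # remaining error over all objects
        curve.append((r, eps))
    return curve

# Made-up scores: the erroneously classified objects (flag 1) sit
# close to the boundary, so rejecting a band removes them first.
objects = [(-2.0, 0), (-1.5, 0), (-0.3, 1), (-0.1, 1),
           (0.2, 1), (0.4, 0), (1.1, 0), (1.8, 0)]
for r, eps in reject_curve(objects, [0.0, 0.5]):
    print(r, eps)
```

With no rejection the error is 3/8; rejecting the band of half-width 0.5 removes all three errors at the price of rejecting half the objects.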
Rejecting Classifier
[Figure: the original classifier vs. the rejecting classifier, which withholds a decision for a rejected fraction r of the objects.]
Reject Fraction
For reasonably well-trained classifiers, r > 2(ε(r=0) - ε(r)): to decrease the error by 1%, more than 2% of the objects has to be rejected.
[Figure: classes A and B along x; the error decrease is bought by rejecting a band around the decision boundary.]
How Much to Reject?
Compare the cost of a rejected object, cr, with the cost of a classification error, cε. For a given total cost c this is a linear function in the (r, ε) space; shift this line until a feasible operating point d* on the reject curve is reached.
[Figure: reject curve in the (r, ε) plane; the cost line touches it at the operating point d*.]
PRTools: reject, plote
Error / Performance Measures
[Figure: class-conditional densities along x, split by the classifier S(x) = 0.]
Objects of class A, with prior p(A), are split by S(x) = 0 into a part ηA, assigned to A, and a part εA, assigned to B:
p(A) = ηA + εA
Error / Performance Measures
Objects of class B, with prior p(B), are split by S(x) = 0 into a part ηB, assigned to B, and a part εB, assigned to A:
p(B) = ηB + εB
Error / Performance Measures
[Figure: classes A and B along x with the regions ηA1, ηA2, ηB1, ηB2, εA = ηB2, εB1 = ηA2 and εB2 around the decision boundaries.]
εA : error contribution of class A; εB : error contribution of class B
ε : classification error; η : performance; ε + η = 1
εB = εB1 + εB2; ε = εA + εB
ηA = ηA1 + ηA2; ηB = ηB1 + ηB2
η = ηA + ηB
p(A) + p(B) = 1; p(A) = ηA + εA; p(B) = ηB + εB
εA / p(A) : error in class A; εB / p(B) : error in class B
ηA / p(A) : sensitivity (recall) of class A (what goes correct in class A)
ηA / (ηA + εB) : precision of class A (what belongs to A among what is classified as A)
Error / Performance Measures
Given a trained classifier and a test set:

                   classified as
                   T        N
  true labels  T   TP       FN
               N   FP       TN

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Error = (FP + FN) / (TP + FN + FP + TN)
False Discovery Rate (FDR) = FP / (FP + TP)
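These measures follow directly from the four counts of the 2x2 confusion matrix. A Python sketch (illustrative; the counts are made-up):

```python
def metrics(tp, fn, fp, tn):
    """Standard measures computed from the 2x2 confusion matrix
    (T = target class, N = non-target class)."""
    return {
        'sensitivity': tp / (tp + fn),            # recall of the target class
        'specificity': tn / (tn + fp),            # performance outside the target
        'error':       (fp + fn) / (tp + fn + fp + tn),
        'fdr':         fp / (fp + tp),            # false discovery rate
        'precision':   tp / (tp + fp),
    }

# Made-up example counts for 100 test objects.
m = metrics(tp=40, fn=10, fp=5, tn=45)
print(m)
```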
Error / Performance Measures
• Error: the probability of erroneous classifications.
• Performance: 1 - error.
• Sensitivity of a target class (e.g. diseased patients): the performance for objects from that target class.
• Specificity: the performance for all objects outside the target class.
• Precision of a target class: the fraction of correct objects among all objects assigned to that class.
• Recall: the fraction of correctly classified objects. This is identical to the performance; related to a particular class, it is identical to the sensitivity.
ROC Curve
For every classifier S(x) - d = 0, two errors, εA and εB, are made. The ROC curve shows them as a function of the threshold d.
[Figure: the two error areas εA and εB shift as d moves.]
PRTools: roc, plote
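An ROC sketch in Python (illustrative, not the PRTools roc routine; the scores and the convention that class B scores high are assumptions made for the example):

```python
def roc_points(scores_a, scores_b, thresholds):
    """For each threshold d of S(x) - d = 0, compute the two errors:
    eps_A = fraction of class-A objects assigned to B,
    eps_B = fraction of class-B objects assigned to A.
    Convention: S(x) >= d means 'assign to class B'."""
    points = []
    for d in thresholds:
        eps_a = sum(s >= d for s in scores_a) / len(scores_a)
        eps_b = sum(s < d for s in scores_b) / len(scores_b)
        points.append((d, eps_a, eps_b))
    return points

scores_a = [0.1, 0.2, 0.3, 0.4]    # class A tends to score low
scores_b = [0.35, 0.5, 0.6, 0.7]   # class B tends to score high
for d, ea, eb in roc_points(scores_a, scores_b, [0.0, 0.375, 1.0]):
    print(d, ea, eb)
```

Sweeping d trades εA against εB: at the extremes one error is 0 and the other is 1, with intermediate operating points in between.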
ROC Curve
The ROC curve is independent of the class prior probabilities; a change of the priors is analogous to a shift of the decision boundary.
[Figure: shift in the decision boundary for priors = [0.5 0.5] vs. priors = [0.2 0.8].]
ROC Analysis
ROC: Receiver Operating Characteristic (from communication theory).
[Figure: two axis conventions for the same curve; performance of the target class vs. error in the non-targets (medical diagnostics, database retrieval), or error in class A vs. error in class B (2-class pattern recognition).]
PRTools: roc, plote
When Are ROC Curves Useful? (1)
[Figure: performance vs. cost curve.]
Trade-off between cost and performance:
• What is the performance for a given cost?
• What are the costs of a preset performance?
When Are ROC Curves Useful? (2)
[Figure: ROC curve in the (εA, εB) plane with operating points marked.]
Study of the effect of changing priors.
Operating points: the value of d in S(x) - d = 0.
When Are ROC Curves Useful? (3)
[Figure: ROC curves of classifiers S1(x), S2(x) and S3(x) in the (εA, εB) plane.]
Compare the behavior of different classifiers.

Combining Classifiers
Any 'virtual' classifier between S1(x) and S2(x) in the (εA, εB) space can be realized by using at random α times S1(x) and (1 - α) times S2(x):
εA = α εA1 + (1 - α) εA2
εB = α εB1 + (1 - α) εB2
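The linear interpolation between two operating points can be sketched in Python (the operating points for the two classifiers are made-up):

```python
def combined_error(e1, e2, alpha):
    """Operating point of the randomized combination: apply S1 with
    probability alpha and S2 with probability 1 - alpha.
    e1 and e2 are the (eps_A, eps_B) pairs of the two classifiers."""
    return (alpha * e1[0] + (1 - alpha) * e2[0],
            alpha * e1[1] + (1 - alpha) * e2[1])

s1 = (0.05, 0.30)   # S1: few class-A errors, many class-B errors
s2 = (0.25, 0.10)   # S2: the other way around
print(tuple(round(v, 3) for v in combined_error(s1, s2, 0.5)))
```

Varying α from 0 to 1 traces the straight line between the two operating points in the (εA, εB) plane.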
Summary: Reject and ROC
• Reject for solving ambiguity: rejecting objects close to the decision boundary lowers the costs.
• Reject option for protection against outliers.
• ROC analysis for performance – cost trade-off.
• ROC analysis in case of unknown or varying priors.
• ROC analysis for comparing / combining classifiers.