Transductive Reliability Estimation for Kernel Based Classifiers
Dimitris Tzikas¹, Matjaz Kukar², Aristidis [email protected], [email protected], [email protected]
¹Department of Computer Science, University of Ioannina, Greece
²Faculty of Computer and Information Science, University of Ljubljana, Slovenia
Introduction
- We wish to assess the reliability of individual classifications made by kernel-based classifiers:
  - Support Vector Machine (SVM)
  - Relevance Vector Machine (RVM)
- Such an assessment is useful in risk-sensitive applications and in weighted combinations of several classifiers.
- Reliability measures can be obtained directly from the classifier outputs.
- We propose applying the transductive reliability methodology to kernel-based classifiers.
Kernel Classifiers
- Mapping function to the feature space: $\phi: \mathbb{R}^d \rightarrow \mathbb{R}^p$
- Kernel function: inner product in the feature space: $K(x_1, x_2) = \phi(x_1)^T \phi(x_2)$
- Kernel classifier: $y(x) = \sum_{n=1}^{N} w_n K(x, x_n) + b$
- Training: estimate $w$ using the training set $D = \{(x_n, t_n)\}$; prefer sparse solutions (most $w_n \rightarrow 0$).
- SVM and RVM differ in the training method.
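As a concrete illustration, the prediction of such a kernel classifier can be sketched as follows; the RBF kernel, the weights `w`, and the bias `b` are illustrative placeholders rather than values trained by SVM or RVM:

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=1.0):
    # A common kernel choice: K(x1, x2) = exp(-gamma * ||x1 - x2||^2)
    return np.exp(-gamma * np.sum((np.asarray(x1) - np.asarray(x2)) ** 2))

def kernel_classifier(x, X_train, w, b, kernel=rbf_kernel):
    # y(x) = sum_n w_n * K(x, x_n) + b; sparse models have most w_n == 0
    return sum(w_n * kernel(x, x_n) for w_n, x_n in zip(w, X_train)) + b
```

For a two-class SVM, the predicted class is the sign of $y(x)$.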
Support Vector Machine (SVM)

- SVM model (two-class, targets $t_n \in \{-1, 1\}$): $y_{SVM}(x) = \sum_{n=1}^{N} w_n K(x, x_n) + b$
- Training maximizes the margin from the separating hyperplane in feature space:
  $\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{n=1}^{N} \xi_n$
  subject to $t_n (w^T \phi(x_n) + b) \geq 1 - \xi_n$, $\xi_n \geq 0$
- $C$ is a hyperparameter to be prespecified.
Reliability Measure for SVM

- Points near the decision boundary have lower reliability.
- The output magnitude $|y_{SVM}(x)|$ is the distance from the separating hyperplane (decision boundary).
- Transform the outputs to probabilities by applying the sigmoid function: $\sigma(y) = \frac{1}{1 + \exp(-y)}$
- Define the reliability measure: $RE_{SVM}(x) = 2\sigma(|y_{SVM}(x)|) - 1$
- Reliable examples: $|y_{SVM}| \rightarrow \infty$, $RE_{SVM} \rightarrow 1$
- Unreliable examples: $|y_{SVM}| \rightarrow 0$, $RE_{SVM} \rightarrow 0$
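A minimal sketch of this measure, assuming `y_svm` holds signed distances such as those returned by an SVM decision function:

```python
import numpy as np

def sigmoid(y):
    # Map real-valued outputs to (0, 1)
    return 1.0 / (1.0 + np.exp(-y))

def re_svm(y_svm):
    # RE_SVM(x) = 2 * sigmoid(|y_svm(x)|) - 1:
    # 0 on the decision boundary, approaching 1 far from it
    return 2.0 * sigmoid(np.abs(y_svm)) - 1.0
```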
Relevance Vector Machine

- RVM model (two-class, targets $t_n \in \{0, 1\}$): $y_{RVM}(x) = \sigma\left(\sum_{n=1}^{N} w_n K(x, x_n)\right)$
- Provides the posterior probability for class $C_1$.
- RVM is a Bayesian linear model with a hierarchical prior on the weights $w$:
  $p(w \mid \alpha) = \prod_{n=1}^{N} N(w_n \mid 0, \alpha_n^{-1})$
  $p(\alpha) = \prod_{n=1}^{N} \mathrm{Gamma}(\alpha_n \mid a, b)$
- The hierarchical prior enforces sparse solutions.
Relevance Vector Machine

- Compute $\alpha$ by maximizing the marginal likelihood; many $\alpha_n \rightarrow \infty$, so the corresponding $w_n \rightarrow 0$ and the basis functions $w_n K(x, x_n)$ are pruned.
- Compute $w$ by iteratively reweighted least squares: $w = (\Phi^T B \Phi + A)^{-1} \Phi^T B \hat{t}$, with $A = \mathrm{diag}(\alpha_n)$ and $B = \mathrm{diag}(y_{RVM}(x_n)(1 - y_{RVM}(x_n)))$.
- Incremental RVM: start from an empty model and a set of basis functions, and incrementally add (and delete) terms.
- This is convenient for the transductive approach, which requires retraining.
RVM Reliability Measure

- Compute the reliability estimate for the decision on input $x$ as: $RE_{RVM}(x) = |2 y_{RVM}(x) - 1|$
- Unreliable examples: $y_{RVM} \rightarrow 0.5$, $RE_{RVM} \rightarrow 0$
- Reliable examples: $y_{RVM} \rightarrow 1$ or $0$, $RE_{RVM} \rightarrow 1$
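Since $y_{RVM}(x)$ is already a class posterior probability, the measure is a direct transform; a minimal sketch:

```python
import numpy as np

def re_rvm(y_rvm):
    # RE_RVM(x) = |2 * y_rvm(x) - 1|: 0 at y = 0.5, 1 at confident y near 0 or 1
    return np.abs(2.0 * np.asarray(y_rvm) - 1.0)
```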
Transductive Reliability Estimation (Kukar and Kononenko, ECML 2002)
- The transductive methodology estimates the reliability of individual classifications.
- It measures the stability of the classifier after a small perturbation of the training set: the test example, with its predicted class label, is added to the training set and the classifier is retrained.
- Assumption: for reliable decisions, this process should not lead to significant model changes.
- The method can be applied to any classifier that outputs class posterior probabilities.
- Transduction requires retraining, so incremental training methods are preferable.
Transductive Reliability Estimation
- Assume a classifier CL1 and a training set.
- Compute the class posteriors $p_k$ and classify a test example.
- Objective: estimate the reliability of this decision.
- Transductive step: add the test example, labelled with its predicted class, to the training set.
- Train a classifier CL2, compute the class posteriors $q_k$, and classify the test example again.
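The two-step protocol above can be sketched with any classifier that returns class posteriors; the nearest-centroid model below is a stand-in chosen for brevity, not the paper's SVM/RVM:

```python
import numpy as np

def fit_predict_proba(X, t, x):
    # Minimal probabilistic classifier (a stand-in, not the paper's SVM/RVM):
    # softmax over negative distances to the class centroids.
    classes = np.unique(t)
    d = np.array([np.linalg.norm(x - X[t == c].mean(axis=0)) for c in classes])
    e = np.exp(-d)
    return classes, e / e.sum()

def transductive_posteriors(X_train, t_train, x_test):
    # CL1: train on D and classify the test example
    classes, p = fit_predict_proba(X_train, t_train, x_test)
    label = classes[np.argmax(p)]
    # Transductive step: add (x_test, predicted label) to D and retrain -> CL2
    X2 = np.vstack([X_train, x_test[None, :]])
    t2 = np.append(t_train, label)
    _, q = fit_predict_proba(X2, t2, x_test)
    return p, q
```

Comparing the returned `p` and `q` is the basis of the reliability estimate.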
Transductive Reliability Estimation

- The difference between the class posterior vectors $p$ and $q$ of CL1 and CL2 is an estimate of reliability.
- Symmetric Kullback-Leibler divergence:
  $J(p, q) = KL(p, q) + KL(q, p) = \sum_{k=1}^{K} (p_k - q_k) \log_2 \frac{p_k}{q_k}$
- Scale the reliability values to $[0, 1]$: $TRE(x) = 2^{-J(p,q)}$
- Reliable estimations: $TRE(x) \geq T$
- How do we select the threshold $T$?
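Given the posterior vectors p (from CL1) and q (from CL2), the divergence and a scaled estimate can be computed as below, assuming the $2^{-J(p,q)}$ scaling under which identical posteriors map to 1; the small epsilon clip is an added safeguard against zero probabilities, not part of the original formulation:

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    # J(p, q) = KL(p, q) + KL(q, p) = sum_k (p_k - q_k) * log2(p_k / q_k)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return float(np.sum((p - q) * np.log2(p / q)))

def tre(p, q):
    # TRE(x) = 2^(-J(p, q)) in (0, 1]: identical posteriors give TRE = 1
    return 2.0 ** (-symmetric_kl(p, q))
```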
Selecting the Threshold

- Use leave-one-out cross-validation to obtain a classification and a reliability estimate $TRE(x)$ for each example $x$.
- For a threshold $T$, partition the examples into:
  - $D_1$, the set of unreliable classifications: $D_1 = \{x : TRE(x) < T\}$
  - $D_2$, the set of reliable classifications: $D_2 = \{x : TRE(x) \geq T\}$
- We wish $D_1$ to contain the incorrectly classified examples and $D_2$ the correctly classified examples.
- Select the $T$ that maximizes the information gain
  $IG(T) = H(D) - \frac{|D_1|}{|D|} H(D_1) - \frac{|D_2|}{|D|} H(D_2)$
  where $H(\cdot)$ denotes the entropy of the correct/incorrect outcomes within a set.
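The search over thresholds can be sketched as follows; `correct` is a boolean vector of leave-one-out outcomes, and the names are illustrative:

```python
import numpy as np

def entropy(correct):
    # Binary entropy (in bits) of correct/incorrect outcomes within a set
    if len(correct) == 0:
        return 0.0
    p = float(np.mean(correct))
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)

def best_threshold(tre_values, correct):
    # Maximize IG(T) = H(D) - |D1|/|D| H(D1) - |D2|/|D| H(D2), where
    # D1 = {x : TRE(x) < T} (unreliable) and D2 = {x : TRE(x) >= T} (reliable)
    tre_values = np.asarray(tre_values, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    h_d, n = entropy(correct), len(correct)
    best_t, best_ig = None, -1.0
    for t in np.unique(tre_values):
        d1, d2 = correct[tre_values < t], correct[tre_values >= t]
        ig = h_d - len(d1) / n * entropy(d1) - len(d2) / n * entropy(d2)
        if ig > best_ig:
            best_t, best_ig = t, ig
    return best_t, best_ig
```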
Evaluation of reliability measures

- Transduction has previously been evaluated on several classifiers: decision trees, Naïve Bayes.
- We applied the transductive approach to SVM and RVM:
  - SVM is retrained from scratch with the same hyperparameters.
  - For RVM we considered both retraining from scratch and incremental retraining.
- Reliability measures: $TRE_{SVM}$, $TRE_{RVM}$ and $TRE_{RVM(inc)}$.
- $TRE_{RVM(inc)}$ is computationally efficient (50-100 times faster).
- We compare the direct measures $RE_{SVM}$, $RE_{RVM}$ with the transductive measures.
Evaluation of reliability measures
- Datasets: 3 UCI medical datasets (RBF kernel), 1 bioinformatics dataset, leukemia (linear kernel), and the Coronary Artery Disease (CAD) dataset (RBF kernel).
- On CAD we also compare with expert physicians.
Evaluation of reliability estimation methods

- Use leave-one-out cross-validation to decide whether each example is classified correctly or incorrectly, and compute the reliability estimates ($RE(x)$, $TRE(x)$).
- For each dataset and measure, determine the threshold that maximizes the information gain.
- Use the maximum information gain to compare the different reliability measures on each dataset.
Evaluation on UCI Datasets
- Max IG of $TRE_{SVM}$ is higher than that of $RE_{SVM}$ on all datasets.
- Max IG of $TRE_{RVM(inc)}$ is higher than that of $TRE_{RVM}$ and $RE_{RVM}$ (except on the hepatitis dataset).

Method         hepatitis  new-thyroid  wdbc   leukemia
RE_SVM         0.106      0.083        0.036  0.054
TRE_SVM        0.120      0.092        0.047  0.073
RE_RVM         0.109      0.068        0.091  0.089
TRE_RVM        0.178      0.062        0.094  0.062
TRE_RVM(inc)   0.133      0.072        0.106  0.107
Application on CAD (comparison to physicians)
Coronary Artery Disease (CAD) dataset (University Clinical Centre, Ljubljana).
327 cases (228 positive, 99 negative)
Physicians estimate reliability by computing a posterior probability based on diagnostic tests and other information.
If this posterior is > 0.9 or < 0.1, the diagnosis is considered reliable.
Application on CAD
               Positive                           Negative
Method         Reliable(%)  Correct(%)  Errors(%)  Reliable(%)  Correct(%)  Errors(%)
Physicians     76           72          4          52           45          7
RE_SVM         65           65          0          34           30          4
TRE_SVM        78           76          2          65           57          8
RE_RVM         63.4         63          0.4        60           54          6
TRE_RVM        68.3         67          1.3        54           49          5
TRE_RVM(inc)   69.4         69          0.4        61           54          7
Conclusions
- We applied the transductive approach to kernel-based models: the Support Vector Machine (SVM) and the Relevance Vector Machine (RVM).
- We compared direct and transductive reliability measures on several datasets.
- We also compared against physicians' performance on a real dataset for the diagnosis of Coronary Artery Disease (CAD).
- The transductive approach seems to provide good reliability estimates.
Future work
- Examine incremental training methods for SVM.
- Define reliability measures based on the structural difference between the classifiers CL1 and CL2.
- Use transduction to estimate the 'strangeness' of an example in the typicalness framework for confidence estimation (Kukar, KIS 2006).