Transductive Reliability Estimation for Kernel Based Classifiers
Dimitris Tzikas¹, Matjaz Kukar², Aristidis [email protected], [email protected], [email protected]
¹Department of Computer Science, University of Ioannina, Greece
²Faculty of Computer and Information Science, University of Ljubljana, Slovenia
Introduction
- We wish to assess the reliability of individual classifications made by kernel-based classifiers:
  - Support Vector Machine (SVM)
  - Relevance Vector Machine (RVM)
- Such an assessment is useful in risk-sensitive applications and in weighted combinations of several classifiers.
- Reliability measures can be obtained directly from the classifier outputs.
- We propose applying the transductive reliability methodology to kernel-based classifiers.
Kernel Classifiers
- Mapping function to the feature space: $\phi: \mathbb{R}^d \rightarrow \mathbb{R}^p$
- Kernel function: inner product in the feature space: $K(x_1, x_2) = \phi(x_1)^T \phi(x_2)$
- Kernel classifier: $y(x) = \sum_{n=1}^{N} w_n K(x, x_n) + b$
- Training: estimate $w$ using the training set $D = \{(x_n, t_n)\}$; prefer sparse solutions (most $w_n \rightarrow 0$).
- SVM and RVM differ in the training method.
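As a concrete illustration, the prediction of such a kernel classifier can be sketched as follows; the RBF kernel, the weights `w`, and the bias `b` are illustrative placeholders rather than values trained by SVM or RVM:

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=1.0):
    # A common kernel choice: K(x1, x2) = exp(-gamma * ||x1 - x2||^2)
    return np.exp(-gamma * np.sum((np.asarray(x1) - np.asarray(x2)) ** 2))

def kernel_classifier(x, X_train, w, b, kernel=rbf_kernel):
    # y(x) = sum_n w_n * K(x, x_n) + b; sparse models have most w_n == 0
    return sum(w_n * kernel(x, x_n) for w_n, x_n in zip(w, X_train)) + b
```

For a two-class SVM, the predicted class is the sign of $y(x)$.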
Support Vector Machine (SVM)

- SVM model (two-class, targets $t_n \in \{-1, 1\}$): $y_{SVM}(x) = \sum_{n=1}^{N} w_n K(x, x_n) + b$
- Training maximizes the margin from the separating hyperplane in feature space:
  $\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{n=1}^{N} \xi_n$
  subject to $t_n (w^T \phi(x_n) + b) \geq 1 - \xi_n$, $\xi_n \geq 0$
- $C$ is a hyperparameter to be prespecified.
Reliability Measure for SVM

- Points near the decision boundary have lower reliability.
- The output magnitude $|y_{SVM}(x)|$ is the distance from the separating hyperplane (decision boundary).
- Transform the outputs to probabilities by applying the sigmoid function: $\sigma(y) = \frac{1}{1 + \exp(-y)}$
- Define the reliability measure: $RE_{SVM}(x) = 2\sigma(|y_{SVM}(x)|) - 1$
- Reliable examples: $|y_{SVM}| \rightarrow \infty$, $RE_{SVM} \rightarrow 1$
- Unreliable examples: $|y_{SVM}| \rightarrow 0$, $RE_{SVM} \rightarrow 0$
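A minimal sketch of this measure, assuming `y_svm` holds signed distances such as those returned by an SVM decision function:

```python
import numpy as np

def sigmoid(y):
    # Map real-valued outputs to (0, 1)
    return 1.0 / (1.0 + np.exp(-y))

def re_svm(y_svm):
    # RE_SVM(x) = 2 * sigmoid(|y_svm(x)|) - 1:
    # 0 on the decision boundary, approaching 1 far from it
    return 2.0 * sigmoid(np.abs(y_svm)) - 1.0
```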
Relevance Vector Machine

- RVM model (two-class, targets $t_n \in \{0, 1\}$): $y_{RVM}(x) = \sigma\left(\sum_{n=1}^{N} w_n K(x, x_n)\right)$
- Provides the posterior probability for class $C_1$.
- RVM is a Bayesian linear model with a hierarchical prior on the weights $w$:
  $p(w \mid \alpha) = \prod_{n=1}^{N} N(w_n \mid 0, \alpha_n^{-1})$
  $p(\alpha) = \prod_{n=1}^{N} \mathrm{Gamma}(\alpha_n \mid a, b)$
- The hierarchical prior enforces sparse solutions.
Relevance Vector Machine

- Compute $\alpha$ by maximizing the marginal likelihood; many $\alpha_n \rightarrow \infty$, so the corresponding $w_n \rightarrow 0$ and the basis functions $w_n K(x, x_n)$ are pruned.
- Compute $w$ by iteratively reweighted least squares: $w = (\Phi^T B \Phi + A)^{-1} \Phi^T B \hat{t}$, with $A = \mathrm{diag}(\alpha_n)$ and $B = \mathrm{diag}(y_{RVM}(x_n)(1 - y_{RVM}(x_n)))$.
- Incremental RVM: start from an empty model and a set of basis functions, and incrementally add (and delete) terms.
- This is convenient for the transductive approach, which requires retraining.
RVM Reliability Measure

- Compute the reliability estimate for the decision on input $x$ as: $RE_{RVM}(x) = |2 y_{RVM}(x) - 1|$
- Unreliable examples: $y_{RVM} \rightarrow 0.5$, $RE_{RVM} \rightarrow 0$
- Reliable examples: $y_{RVM} \rightarrow 1$ or $0$, $RE_{RVM} \rightarrow 1$
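Since $y_{RVM}(x)$ is already a class posterior probability, the measure is a direct transform; a minimal sketch:

```python
import numpy as np

def re_rvm(y_rvm):
    # RE_RVM(x) = |2 * y_rvm(x) - 1|: 0 at y = 0.5, 1 at confident y near 0 or 1
    return np.abs(2.0 * np.asarray(y_rvm) - 1.0)
```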
Transductive Reliability Estimation (Kukar and Kononenko, ECML 2002)
- The transductive methodology estimates the reliability of individual classifications.
- It measures the stability of the classifier after a small perturbation of the training set: the test example, with its predicted class label, is added to the training set and the classifier is retrained.
- Assumption: for reliable decisions, this process should not lead to significant model changes.
- The method can be applied to any classifier that outputs class posterior probabilities.
- Transduction requires retraining, so incremental training methods are preferable.
Transductive Reliability Estimation
- Assume a classifier CL1 and a training set.
- Compute the class posteriors $p_k$ and classify a test example.
- Objective: estimate the reliability of this decision.
- Transductive step: add the test example, labelled with its predicted class, to the training set.
- Train a classifier CL2, compute the class posteriors $q_k$, and classify the test example again.
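The two-step protocol above can be sketched with any classifier that returns class posteriors; the nearest-centroid model below is a stand-in chosen for brevity, not the paper's SVM/RVM:

```python
import numpy as np

def fit_predict_proba(X, t, x):
    # Minimal probabilistic classifier (a stand-in, not the paper's SVM/RVM):
    # softmax over negative distances to the class centroids.
    classes = np.unique(t)
    d = np.array([np.linalg.norm(x - X[t == c].mean(axis=0)) for c in classes])
    e = np.exp(-d)
    return classes, e / e.sum()

def transductive_posteriors(X_train, t_train, x_test):
    # CL1: train on D and classify the test example
    classes, p = fit_predict_proba(X_train, t_train, x_test)
    label = classes[np.argmax(p)]
    # Transductive step: add (x_test, predicted label) to D and retrain -> CL2
    X2 = np.vstack([X_train, x_test[None, :]])
    t2 = np.append(t_train, label)
    _, q = fit_predict_proba(X2, t2, x_test)
    return p, q
```

Comparing the returned `p` and `q` is the basis of the reliability estimate.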
Transductive Reliability Estimation

- The difference between the class posterior vectors $p$ and $q$ of CL1 and CL2 is an estimate of reliability.
- Symmetric Kullback-Leibler divergence:
  $J(p, q) = KL(p, q) + KL(q, p) = \sum_{k=1}^{K} (p_k - q_k) \log_2 \frac{p_k}{q_k}$
- Scale the reliability values to $[0, 1]$: $TRE(x) = 2^{-J(p,q)}$
- Reliable estimations: $TRE(x) \geq T$
- How do we select the threshold $T$?
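Given the posterior vectors p (from CL1) and q (from CL2), the divergence and a scaled estimate can be computed as below, assuming the $2^{-J(p,q)}$ scaling under which identical posteriors map to 1; the small epsilon clip is an added safeguard against zero probabilities, not part of the original formulation:

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    # J(p, q) = KL(p, q) + KL(q, p) = sum_k (p_k - q_k) * log2(p_k / q_k)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return float(np.sum((p - q) * np.log2(p / q)))

def tre(p, q):
    # TRE(x) = 2^(-J(p, q)) in (0, 1]: identical posteriors give TRE = 1
    return 2.0 ** (-symmetric_kl(p, q))
```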
Selecting the Threshold

- Use leave-one-out cross-validation to obtain a classification and a reliability estimate $TRE(x)$ for each example $x$.
- For a threshold $T$, partition the examples into:
  - $D_1$, the set of unreliable classifications: $D_1 = \{x : TRE(x) < T\}$
  - $D_2$, the set of reliable classifications: $D_2 = \{x : TRE(x) \geq T\}$
- We wish $D_1$ to contain the incorrectly classified examples and $D_2$ the correctly classified examples.
- Select the $T$ that maximizes the information gain
  $IG(T) = H(D) - \frac{|D_1|}{|D|} H(D_1) - \frac{|D_2|}{|D|} H(D_2)$
  where $H(\cdot)$ denotes the entropy of the correct/incorrect outcomes within a set.
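The search over thresholds can be sketched as follows; `correct` is a boolean vector of leave-one-out outcomes, and the names are illustrative:

```python
import numpy as np

def entropy(correct):
    # Binary entropy (in bits) of correct/incorrect outcomes within a set
    if len(correct) == 0:
        return 0.0
    p = float(np.mean(correct))
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)

def best_threshold(tre_values, correct):
    # Maximize IG(T) = H(D) - |D1|/|D| H(D1) - |D2|/|D| H(D2), where
    # D1 = {x : TRE(x) < T} (unreliable) and D2 = {x : TRE(x) >= T} (reliable)
    tre_values = np.asarray(tre_values, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    h_d, n = entropy(correct), len(correct)
    best_t, best_ig = None, -1.0
    for t in np.unique(tre_values):
        d1, d2 = correct[tre_values < t], correct[tre_values >= t]
        ig = h_d - len(d1) / n * entropy(d1) - len(d2) / n * entropy(d2)
        if ig > best_ig:
            best_t, best_ig = t, ig
    return best_t, best_ig
```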
Evaluation of reliability measures

- Transduction has previously been evaluated on several classifiers: decision trees, Naïve Bayes.
- We applied the transductive approach to SVM and RVM:
  - SVM is retrained from scratch with the same hyperparameters.
  - For RVM we considered both retraining from scratch and incremental retraining.
- Reliability measures: $TRE_{SVM}$, $TRE_{RVM}$ and $TRE_{RVM(inc)}$.
- $TRE_{RVM(inc)}$ is computationally efficient (50-100 times faster).
- We compare the direct measures $RE_{SVM}$, $RE_{RVM}$ with the transductive measures.
Evaluation of reliability measures
- Datasets: 3 UCI medical datasets (RBF kernel), 1 bioinformatics dataset, leukemia (linear kernel), and the Coronary Artery Disease (CAD) dataset (RBF kernel).
- On CAD we also compare with expert physicians.
Evaluation of reliability estimation methods

- Use leave-one-out cross-validation to decide whether each example is classified correctly or incorrectly, and compute the reliability estimates ($RE(x)$, $TRE(x)$).
- For each dataset and measure, determine the threshold that maximizes the information gain.
- Use the maximum information gain to compare the different reliability measures on each dataset.
Evaluation on UCI Datasets
- Max IG of $TRE_{SVM}$ is higher than that of $RE_{SVM}$ on all datasets.
- Max IG of $TRE_{RVM(inc)}$ is higher than that of $TRE_{RVM}$ and $RE_{RVM}$ (except on the hepatitis dataset).

Method         hepatitis  new-thyroid  wdbc   leukemia
RE_SVM         0.106      0.083        0.036  0.054
TRE_SVM        0.120      0.092        0.047  0.073
RE_RVM         0.109      0.068        0.091  0.089
TRE_RVM        0.178      0.062        0.094  0.062
TRE_RVM(inc)   0.133      0.072        0.106  0.107
Application on CAD (comparison to physicians)
Coronary Artery Disease (CAD) dataset (University Clinical Centre, Ljubljana).
327 cases (228 positive, 99 negative)
Physicians estimate reliability by computing a posterior probability based on diagnostic tests and other information.
If this posterior is > 0.9 or < 0.1, the diagnosis is considered reliable.
Application on CAD
               Positive                           Negative
Method         Reliable(%)  Correct(%)  Errors(%)  Reliable(%)  Correct(%)  Errors(%)
Physicians     76           72          4          52           45          7
RE_SVM         65           65          0          34           30          4
TRE_SVM        78           76          2          65           57          8
RE_RVM         63.4         63          0.4        60           54          6
TRE_RVM        68.3         67          1.3        54           49          5
TRE_RVM(inc)   69.4         69          0.4        61           54          7
Conclusions
- We applied the transductive approach to kernel-based models: the Support Vector Machine (SVM) and the Relevance Vector Machine (RVM).
- We compared direct and transductive reliability measures on several datasets.
- We also compared against physicians' performance on a real dataset for the diagnosis of Coronary Artery Disease (CAD).
- The transductive approach seems to provide good reliability estimates.
Future work
- Examine incremental training methods for SVM.
- Define reliability measures based on the structural difference between the classifiers CL1 and CL2.
- Use transduction to estimate the 'strangeness' of an example in the typicalness framework for confidence estimation (Kukar, KIS 2006).