Online Knowledge-Based Support Vector Machines

Gautam Kunapuli¹, Kristin P. Bennett², Amina Shabbeer², Richard Maclin³ and Jude W. Shavlik¹

¹ University of Wisconsin-Madison, USA
² Rensselaer Polytechnic Institute, USA
³ University of Minnesota, Duluth, USA

ECML 2010, Barcelona, Spain


Outline

• Knowledge-Based Support Vector Machines
• The Adviceptron: Online KBSVMs
• A Real-World Task: Diabetes Diagnosis
• A Real-World Task: Tuberculosis Isolate Classification
• Conclusions

Knowledge-Based SVMs

• Introduced by Fung et al. (2003)
• Allow incorporation of expert advice into SVM formulations
• Advice is specified with respect to polyhedral regions in input (feature) space, for example:

$$
\begin{aligned}
(\text{feature}_7 \ge 5) \wedge (\text{feature}_{12} \le 4) &\Rightarrow (\text{class} = +1) \\
(\text{feature}_2 \le -3) \wedge (\text{feature}_3 \le 4) \wedge (\text{feature}_{10} \ge 0) &\Rightarrow (\text{class} = -1) \\
(3\,\text{feature}_6 + 5\,\text{feature}_8 \ge 2) \wedge (\text{feature}_{11} \le -3) &\Rightarrow (\text{class} = +1)
\end{aligned}
$$

• Can be incorporated into the SVM formulation as constraints using advice variables

Knowledge-Based SVMs

In classic SVMs, we have $T$ labeled data points $(x_t, y_t)$, $t = 1, \ldots, T$. We learn a linear classifier $w'x - b = 0$.

The standard SVM formulation trades off regularization and loss:

$$
\begin{aligned}
\min \quad & \tfrac{1}{2}\|w\|^2 + \lambda e'\xi \\
\text{s.t.} \quad & Y(Xw - be) + \xi \ge e, \\
& \xi \ge 0.
\end{aligned}
$$

[Figure: Class A (y = +1) and Class B (y = -1).]

Knowledge-Based SVMs

We assume an expert provides polyhedral advice of the form

$$Dx \le d \;\Rightarrow\; w'x \ge b.$$

We can transform the logic constraint above using advice variables, $u$:

$$
\left\{
\begin{array}{l}
D'u + w = 0, \\
-d'u - b \ge 0, \\
u \ge 0
\end{array}
\right\}
$$

These constraints are added to the standard formulation to give Knowledge-Based SVMs.

[Figure: advice region $Dx \le d$ shown with Class A (y = +1) and Class B (y = -1).]
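As a short worked check (a sketch, not taken from the slides), the chain below shows why any $u$ satisfying the three advice constraints guarantees the advice: for every $x$ in the region $Dx \le d$, the classifier output $w'x$ is at least $b$.

```latex
\begin{align*}
w'x &= -(D'u)'x   && \text{since } D'u + w = 0 \\
    &= -u'(Dx) \\
    &\ge -u'd     && \text{since } u \ge 0 \text{ and } Dx \le d \\
    &\ge b        && \text{since } -d'u - b \ge 0 .
\end{align*}
```

This covers only the sufficiency direction; the converse relies on a theorem of the alternative.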

Knowledge-Based SVMs

In general, there are $m$ advice sets, each with label $z_i = \pm 1$, for advice belonging to Class A or B:

$$D_i x \le d_i \;\Rightarrow\; z_i\,(w'x - b) \ge 0.$$

Each advice set adds the following constraints to the SVM formulation:

$$
\left\{
\begin{array}{l}
D_i'u_i + z_i w = 0, \\
-d_i'u_i - z_i b \ge 0, \\
u_i \ge 0
\end{array}
\right\}
$$

[Figure: advice regions $D_1 x \le d_1$, $D_2 x \le d_2$ and $D_3 x \le d_3$ shown with Class A (y = +1) and Class B (y = -1).]
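A minimal sketch of how a conjunctive rule might be written in the polyhedral form $D_i x \le d_i$ with label $z_i$, using two of the example rules from the earlier slide. The helper names `encode_rule` and `in_region`, the 1-based feature indexing, and the choice of 12 features are assumptions made for illustration; they do not come from the talk.

```python
def encode_rule(n_features, conditions, label):
    """Encode a conjunction of linear conditions as (D, d, z).

    conditions: list of (coeffs, op, bound), where coeffs maps a 1-based
    feature index to its coefficient and op is "<=" or ">=".
    Each condition becomes one row of D x <= d; ">=" rows are negated.
    """
    D, d = [], []
    for coeffs, op, bound in conditions:
        sign = 1.0 if op == "<=" else -1.0
        row = [0.0] * n_features
        for j, c in coeffs.items():
            row[j - 1] = sign * c
        D.append(row)
        d.append(sign * bound)
    return D, d, label


def in_region(D, d, x):
    """Return True if x satisfies every row of D x <= d."""
    return all(sum(a * b for a, b in zip(row, x)) <= di
               for row, di in zip(D, d))


# (feature7 >= 5) AND (feature12 <= 4)  =>  class = +1
D1, d1, z1 = encode_rule(
    12, [({7: 1.0}, ">=", 5.0), ({12: 1.0}, "<=", 4.0)], +1)

# (3*feature6 + 5*feature8 >= 2) AND (feature11 <= -3)  =>  class = +1
D3, d3, z3 = encode_rule(
    12, [({6: 3.0, 8: 5.0}, ">=", 2.0), ({11: 1.0}, "<=", -3.0)], +1)
```

Negating the "≥" rows is what lets a single matrix inequality describe a region that mixes both kinds of bound.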

Knowledge-Based SVMs

The batch KBSVM formulation introduces advice slack variables to soften the advice constraints:

$$
\begin{aligned}
\min \quad & \tfrac{1}{2}\|w\|^2 + \lambda e'\xi + \mu \sum_{i=1}^{m} \left(\eta_i + \zeta_i\right) \\
\text{s.t.} \quad & Y(Xw - be) + \xi \ge e, \\
& \xi \ge 0, \\
& D_i'u_i + z_i w + \eta_i = 0, \\
& -d_i'u_i - z_i b + \zeta_i \ge 1, \\
& u_i,\ \eta_i,\ \zeta_i \ge 0, \quad i = 1, \ldots, m.
\end{aligned}
$$

[Figure: advice regions $D_1 x \le d_1$, $D_2 x \le d_2$ and $D_3 x \le d_3$ shown with Class A (y = +1) and Class B (y = -1).]


Online KBSVMs

• Need to derive an online version of KBSVMs
• Algorithm is provided with advice and one labeled data point at each round
• Algorithm should update the hypothesis at each step, $w_t$, as well as the advice vectors, $u_{i,t}$

Passive-Aggressive Algorithms

• Adopt the framework of passive-aggressive algorithms (Crammer et al, 2006), where at each round, when a new data point is given,
  – if loss = 0, there is no update (passive)
  – if loss > 0, update weights to minimize loss (aggressive)
• Why passive-aggressive algorithms?
  – readily applicable to most SVM losses
  – possible to derive elegant, closed-form update rules
  – simple rules provide fast updates; scalable
  – analyze performance by deriving regret bounds
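The passive and aggressive cases above can be sketched with the basic hinge-loss update from Crammer et al. (2006). This is the plain advice-free variant, shown only to illustrate the framework; it is not the adviceptron update, and the function name `pa_update` is a hypothetical choice.

```python
def pa_update(w, x, y):
    """One round of the basic passive-aggressive update for hinge loss."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)
    sq_norm = sum(xi * xi for xi in x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)              # passive: hypothesis is unchanged
    tau = loss / sq_norm            # aggressive: smallest step that zeroes the loss
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


w = pa_update([0.0, 0.0], [1.0, 2.0], +1)   # loss > 0, so the weights move
w_again = pa_update(w, [1.0, 2.0], +1)      # loss is now 0, so nothing changes
```

The closed-form step size is what makes each round cheap: no optimizer is run, only a few dot products.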

Online KBSVMs: Formulation At The t-th Round

• There are $m$ advice sets, $(D_i, d_i, z_i)_{i=1}^{m}$
• At round $t$, the algorithm receives $(x_t, y_t)$
• The current hypothesis is $w_t$, and the current advice-vector estimates are $u_{i,t}$, $i = 1, \ldots, m$

At round $t$, the formulation for deriving an update is

$$
\begin{aligned}
\min_{\xi,\, u_i,\, \eta_i,\, \zeta_i \ge 0,\, w} \quad & \tfrac{1}{2}\|w - w_t\|^2 + \tfrac{1}{2}\sum_{i=1}^{m}\|u_i - u_{i,t}\|^2 + \tfrac{\lambda}{2}\xi^2 + \tfrac{\mu}{2}\sum_{i=1}^{m}\left(\|\eta_i\|^2 + \zeta_i^2\right) \\
\text{subject to} \quad & y_t w'x_t - 1 + \xi \ge 0, \\
& \left.
\begin{array}{l}
D_i'u_i + z_i w + \eta_i = 0 \\
-d_i'u_i - 1 + \zeta_i \ge 0 \\
u_i \ge 0
\end{array}
\right\} \quad i = 1, \ldots, m.
\end{aligned}
$$

• Proximal terms for hypothesis and advice vectors: $\tfrac{1}{2}\|w - w_t\|^2$ and $\tfrac{1}{2}\sum_{i}\|u_i - u_{i,t}\|^2$
• Data loss: $\tfrac{\lambda}{2}\xi^2$; advice loss: $\tfrac{\mu}{2}\sum_{i}\left(\|\eta_i\|^2 + \zeta_i^2\right)$
• Parameters: $\lambda$ and $\mu$
• Inequality constraints make deriving a closed-form update impossible
• Decompose into smaller subproblems

Decompose Into m+1 Sub-problems

• First sub-problem: update the hypothesis by fixing the advice variables to their values at the t-th iteration, $u_i = u_{i,t}$
• Some objective terms and constraints drop out of the formulation

Deriving The Hypothesis Update

• First sub-problem: update hypothesis by fixing the advice vectors
• Update advice vectors by fixing the hypothesis
  – Breaks down into m sub-problems, one for each advice set

$$
\begin{aligned}
w_{t+1} = \arg\min_{w,\, \xi,\, \eta_i} \quad & \tfrac{1}{2}\|w - w_t\|_2^2 + \tfrac{\lambda}{2}\xi^2 + \tfrac{\mu}{2}\sum_{i=1}^{m}\|\eta_i\|_2^2 \\
\text{subject to} \quad & y_t w'x_t - 1 + \xi \ge 0, && (\alpha) \\
& D_i'u_{i,t} + z_i w + \eta_i = 0, \quad i = 1, \ldots, m. && (\beta_i)
\end{aligned}
$$

Advice-Estimate Of Current Hypothesis

• In constraint $(\beta_i)$ the term $D_i'u_{i,t}$ is fixed; it gives the advice-estimate of the hypothesis according to the i-th advice set, denoted $r_{i,t}$
• Average the advice-estimates over all m advice vectors and denote as

$$r_t = \frac{1}{m}\sum_{i=1}^{m} r_{i,t}$$
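The sub-problem above can be solved in closed form. The derivation below is a sketch under two assumptions that the slides do not state explicitly: the advice-estimate is taken to be $r_{i,t} = -z_i D_i'u_{i,t}$ (obtained by setting $\eta_i = 0$ in constraint $(\beta_i)$ and using $z_i^2 = 1$), and constraint $(\alpha)$ is assumed active, with multiplier $\alpha \ge 0$.

```latex
% Assumption: r_{i,t} := -z_i D_i' u_{i,t}, with z_i^2 = 1 and y_t^2 = 1.
\begin{align*}
\eta_i &= -z_i\,(w - r_{i,t})
  \;\Rightarrow\; \|\eta_i\|^2 = \|w - r_{i,t}\|^2
  && \text{from } (\beta_i) \\
0 &= (w - w_t) + \mu \sum_{i=1}^{m} (w - r_{i,t}) - \alpha\, y_t x_t
  && \text{stationarity in } w \\
w &= \nu\,(w_t + \alpha\, y_t x_t) + (1 - \nu)\, r_t,
  \qquad \nu = \frac{1}{1 + m\mu} \\
\lambda\,\xi &= \alpha
  && \text{stationarity in } \xi \\
\alpha \left( \tfrac{1}{\lambda} + \nu \|x_t\|^2 \right)
  &= 1 - \nu\, y_t w_t' x_t - (1 - \nu)\, y_t r_t' x_t
  && \text{substituting into active } (\alpha)
\end{align*}
```

When the right-hand side of the last line is negative, the constraint is inactive and $\alpha = 0$, which is where the $\max(\cdot, 0)$ in the update comes from.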

The Hypothesis Update

For $\lambda, \mu > 0$, and advice-estimate $r_t$, the hypothesis update is

$$
\begin{aligned}
w_{t+1} &= \nu\,(w_t + \alpha_t y_t x_t) + (1 - \nu)\, r_t, \\
\alpha_t &= \left(\frac{1}{\lambda} + \nu \|x_t\|^2\right)^{-1} \cdot \max\left(1 - \nu\, y_t w_t'x_t - (1 - \nu)\, y_t r_t'x_t,\; 0\right).
\end{aligned}
$$

• Update is a convex combination of the standard passive-aggressive update and the average advice-estimate
• Parameter of the convex combination is

$$\nu = \frac{1}{1 + m\mu}$$

• Update weight depends on hinge loss computed with respect to a composite weight vector that is a convex combination of the current hypothesis and the average advice-estimate
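The update rule above translates almost line for line into code. The sketch below uses plain Python lists; the function name `hypothesis_update` and its argument order are illustrative assumptions, and the average advice-estimate `r_t` is taken as an input rather than computed.

```python
def hypothesis_update(w_t, r_t, x_t, y_t, lam, mu, m):
    """Return w_{t+1} for one round, given the average advice-estimate r_t."""
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    nu = 1.0 / (1.0 + m * mu)                 # convex-combination parameter
    # hinge loss with respect to the composite vector nu*w_t + (1-nu)*r_t
    loss = max(1.0 - nu * y_t * dot(w_t, x_t)
               - (1.0 - nu) * y_t * dot(r_t, x_t), 0.0)
    alpha = loss / (1.0 / lam + nu * dot(x_t, x_t))
    return [nu * (w + alpha * y_t * x) + (1.0 - nu) * r
            for w, x, r in zip(w_t, x_t, r_t)]


# One advice set (m = 1) with mu = 1 gives nu = 0.5.
w_next = hypothesis_update([0.0, 0.0], [1.0, 0.0], [1.0, 0.0], +1,
                           lam=1.0, mu=1.0, m=1)
```

Even when the loss is zero the hypothesis still moves toward `r_t`, because the convex combination is applied on every round.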

Deriving The Advice Updates

• Second sub-problem: update advice vectors by fixing the hypothesis, $w = w_{t+1}$
• Some constraints and objective terms drop out of the formulation

$$
\begin{aligned}
\min_{u_i,\, \eta_i,\, \zeta_i \ge 0} \quad & \tfrac{1}{2}\sum_{i=1}^{m}\|u_i - u_{i,t}\|^2 + \tfrac{\mu}{2}\sum_{i=1}^{m}\left(\|\eta_i\|^2 + \zeta_i^2\right) \\
\text{subject to} \quad & \left.
\begin{array}{l}
D_i'u_i + z_i w_{t+1} + \eta_i = 0 \\
-d_i'u_i - 1 + \zeta_i \ge 0 \\
u_i \ge 0
\end{array}
\right\} \quad i = 1, \ldots, m.
\end{aligned}
$$

• Split into m sub-problems

Deriving The i-th Advice Updates

• m sub-problems: update the i-th advice vector by fixing the hypothesis

$$
\begin{aligned}
u_{i,t+1} = \arg\min_{u_i,\, \eta_i,\, \zeta_i} \quad & \tfrac{1}{2}\|u_i - u_{i,t}\|^2 + \tfrac{\mu}{2}\left(\|\eta_i\|_2^2 + \zeta_i^2\right) \\
\text{subject to} \quad & D_i'u_i + z_i w_{t+1} + \eta_i = 0, && (\beta_i) \\
& -d_i'u_i - 1 + \zeta_i \ge 0, && (\gamma_i) \\
& u_i \ge 0. && (\tau_i)
\end{aligned}
$$

• Cone constraints are still complicating; cannot derive a closed-form solution
• Use a projected-gradient approach
  – drop constraints to compute an intermediate closed-form update
  – project the intermediate update back on to the cone constraints

The m Advice Updates

For $\mu > 0$, and current hypothesis $w_{t+1}$, for each advice set, $i = 1, \ldots, m$, the update rule is given by

$$u_{i,t+1} = \max\left(u_{i,t} + D_i\beta_i - d_i\gamma_i,\; 0\right),$$

$$
\begin{bmatrix} \beta_i \\ \gamma_i \end{bmatrix}
=
\begin{bmatrix}
-\left(D_i'D_i + \tfrac{1}{\mu} I_n\right) & D_i'd_i \\
d_i'D_i & -\left(d_i'd_i + \tfrac{1}{\mu}\right)
\end{bmatrix}^{-1}
\begin{bmatrix}
D_i'u_{i,t} + z_i w_{t+1} \\
-d_i'u_{i,t} - 1
\end{bmatrix}.
$$

• Each advice update depends on the newly updated hypothesis, $w_{t+1}$
• The $\max(\cdot, 0)$ is a projection
• Hypothesis-estimate of the advice; denote $s_i = \beta_i / \gamma_i$
• The update is the error, or the amount of violation of the constraint $D_i x \le d_i$ by an ideal data point, $s_i$
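The linear system for $(\beta_i, \gamma_i)$ can be recovered from the optimality conditions of the i-th sub-problem. The sketch below assumes that the cone constraint $u_i \ge 0$ is dropped (it is restored afterwards by the projection) and that constraint $(\gamma_i)$ is treated as active; both are assumptions made for this illustration.

```latex
% Assumptions: u_i >= 0 dropped, (gamma_i) treated as an equality.
\begin{align*}
L &= \tfrac12\|u_i - u_{i,t}\|^2 + \tfrac{\mu}{2}\left(\|\eta_i\|^2 + \zeta_i^2\right)
   - \beta_i'\left(D_i'u_i + z_i w_{t+1} + \eta_i\right)
   - \gamma_i\left(-d_i'u_i - 1 + \zeta_i\right) \\
\partial_{u_i} L = 0 \;&\Rightarrow\; u_i = u_{i,t} + D_i\beta_i - d_i\gamma_i \\
\partial_{\eta_i} L = 0 \;&\Rightarrow\; \eta_i = \beta_i/\mu,
   \qquad \partial_{\zeta_i} L = 0 \;\Rightarrow\; \zeta_i = \gamma_i/\mu \\
(\beta_i):\quad & -\left(D_i'D_i + \tfrac1\mu I_n\right)\beta_i + D_i'd_i\,\gamma_i
   = D_i'u_{i,t} + z_i w_{t+1} \\
(\gamma_i):\quad & d_i'D_i\,\beta_i - \left(d_i'd_i + \tfrac1\mu\right)\gamma_i
   = -d_i'u_{i,t} - 1
\end{align*}
```

Substituting the stationarity conditions into the two constraints gives the two block rows of the matrix equation, and the first stationarity condition is the unprojected form of the update for $u_{i,t+1}$.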

The Adviceptron

Algorithm 1: The Passive-Aggressive Adviceptron Algorithm

input: data $(x_t, y_t)_{t=1}^{T}$, advice sets $(D_i, d_i, z_i)_{i=1}^{m}$, parameters $\lambda, \mu > 0$
initialize: $u_{i,1} = 0$, $w_1 = 0$
let: $\nu = 1/(1 + m\mu)$
for $(x_t, y_t)$ do
    predict label $\hat{y}_t = \mathrm{sign}(w_t'x_t)$
    receive correct label $y_t$
    suffer loss $\ell_t = \max\left(1 - \nu\, y_t w_t'x_t - (1 - \nu)\, y_t r_t'x_t,\; 0\right)$
    update hypothesis using $u_{i,t}$:
        $\alpha = \ell_t / \left(\tfrac{1}{\lambda} + \nu\|x_t\|^2\right)$, $\quad w_{t+1} = \nu\,(w_t + \alpha y_t x_t) + (1 - \nu)\, r_t$
    update advice variables using $w_{t+1}$:
        $(\beta_i, \gamma_i) = H_i^{-1} g_i$, $\quad u_{i,t+1} = \left(u_{i,t} + D_i\beta_i - d_i\gamma_i\right)_+$
end for
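Algorithm 1 can be sketched end to end in plain Python. Several things here are assumptions rather than the authors' implementation: the advice-estimate is computed as $r_{i,t} = -z_i D_i'u_{i,t}$, $H_i$ and $g_i$ are taken to be the matrix and right-hand side of the advice update rule, and a small Gauss-Jordan routine stands in for a linear algebra library. The names `adviceptron`, `solve` and `dot` are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def adviceptron(data, advice, lam, mu):
    """data: list of (x, y); advice: list of (D, d, z), D given as k rows of length n."""
    n, m = len(data[0][0]), len(advice)
    nu = 1.0 / (1.0 + m * mu)
    w = [0.0] * n
    U = [[0.0] * len(d) for _, d, _ in advice]
    for x, y in data:
        # average advice-estimate r_t, assuming r_i = -z_i D_i' u_i
        r = [0.0] * n
        for (D, d, z), u in zip(advice, U):
            for j in range(n):
                r[j] -= z * sum(D[k][j] * u[k] for k in range(len(u))) / m
        # hypothesis update using the current advice vectors
        loss = max(1.0 - nu * y * dot(w, x) - (1.0 - nu) * y * dot(r, x), 0.0)
        alpha = loss / (1.0 / lam + nu * dot(x, x))
        w = [nu * (wj + alpha * y * xj) + (1.0 - nu) * rj
             for wj, xj, rj in zip(w, x, r)]
        # advice updates using the new hypothesis
        for i, (D, d, z) in enumerate(advice):
            u, k = U[i], len(d)
            Dt = [[D[a][j] for a in range(k)] for j in range(n)]   # D', n rows
            Dtd = [dot(Dt[j], d) for j in range(n)]                # D'd
            H = [[-(dot(Dt[p], Dt[q]) + (1.0 / mu if p == q else 0.0))
                  for q in range(n)] + [Dtd[p]] for p in range(n)]
            H.append(Dtd + [-(dot(d, d) + 1.0 / mu)])
            g = [dot(Dt[j], u) + z * w[j] for j in range(n)] + [-dot(d, u) - 1.0]
            sol = solve(H, g)
            beta, gamma = sol[:n], sol[n]
            # projection onto u >= 0
            U[i] = [max(u[a] + dot(D[a], beta) - d[a] * gamma, 0.0)
                    for a in range(k)]
    return w, U


# Toy 1-D problem with the advice "x >= 1 => class +1", i.e. -x <= -1.
toy_advice = [([[-1.0]], [-1.0], +1)]
w1, U1 = adviceptron([([2.0], +1)], toy_advice, lam=1.0, mu=1.0)
w2, U2 = adviceptron([([2.0], +1), ([-2.0], -1)], toy_advice, lam=1.0, mu=1.0)
```

Each round costs one linear solve of size n + 1 per advice set, and the matrix $H_i$ does not change across rounds, so a practical implementation could factor it once.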


Diagnosing Diabetes

• Standard data set from UCI repository (768 x 8)
  – all patients at least 21 years old of Pima Indian heritage
  – features include body mass index, blood glucose level
• Expert advice for diagnosing diabetes from NIH website on risks for Type-2 diabetes
  – a person who is obese (characterized by BMI > 30) and has a high blood glucose level (> 126) is at a strong risk for diabetes
  – a person who is at normal weight (BMI < 25) and has low blood glucose level (< 100) is at a low risk for diabetes

$$
\begin{aligned}
(\text{BMI} \ge 30) \wedge (\text{bloodglucose} \ge 126) &\Rightarrow \text{diabetes} \\
(\text{BMI} \le 25) \wedge (\text{bloodglucose} \le 100) &\Rightarrow \neg\,\text{diabetes}
\end{aligned}
$$

Diagnosing Diabetes: Results

• 200 examples for training, remaining for testing
• Results averaged over 20 randomized iterations
• Compared to advice-free online algorithms:
  – Passive-aggressive (Crammer et al, 2006)
  – ROMMA (Li & Long, 2002)
  – Max margin-perceptron (Freund & Schapire, 1999)

[Figure: results plot; one label reads "batch KBSVM, 200 points".]


Tuberculosis Isolate Classification

• Task is to classify strains of Mycobacterium tuberculosis complex (MTBC) into major genetic lineages based on DNA fingerprints
• MTBC is the causative agent for TB
  – leading cause of disease and morbidity
  – strains vary in infectivity, transmission, virulence, immunogenicity, host associations depending on genetic lineage
• Lineage classification is crucial for surveillance, tracking and control of TB world-wide

Tuberculosis Isolate Classification

• Two types of DNA fingerprints for all culture-positive TB strains collected in the US by the CDC (44 data features)
• Six (classes) major lineages of TB for classification
  – ancestral: M. bovis, M. africanum, Indo-Oceanic
  – modern: Euro-American, East-Asian, East-African-Indian
• Problem formulated as six 1-vs-many classification tasks

Class                  #isolates   #pieces of         #pieces of
                                   Positive Advice    Negative Advice
East-Asian                  4924   1                  1
East-African-Indian         1469   2                  4
Euro-American              25161   1                  2
Indo-Oceanic                5309   5                  5
M. africanum                 154   1                  3
M. bovis                     693   1                  3
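The six 1-vs-many tasks can be built by relabeling the same isolates once per lineage. The snippet below is a minimal sketch; the function name `one_vs_rest_labels` and the four sample isolates are made up for illustration and are not drawn from the CDC data.

```python
LINEAGES = ["East-Asian", "East-African-Indian", "Euro-American",
            "Indo-Oceanic", "M. africanum", "M. bovis"]


def one_vs_rest_labels(lineage_per_isolate, positive):
    """Label isolates of the chosen lineage +1 and all others -1."""
    return [1 if name == positive else -1 for name in lineage_per_isolate]


# Hypothetical lineage annotations for four isolates.
sample = ["Euro-American", "M. bovis", "Euro-American", "East-Asian"]
tasks = {c: one_vs_rest_labels(sample, c) for c in LINEAGES}
```

Each entry of `tasks` is then a separate binary problem that an online learner can process with its own positive and negative advice.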

Expert Rules for TB Lineage Classification

Rules provided by Dr. Lauren Cowan at the Center for Disease Control, documented in Shabbeer et al. (2010)

[Figure: diagram of the expert rules, with outcomes East-Asian, Indo-Oceanic, M. bovis, M. africanum, East-African-Indian and Euro-American; one visible split is MIRU24 > 1 versus MIRU24 ≤ 1.]

TB Results: Might Need Fewer Examples To Converge With Advice

[Figure: two panels, "Euro-American vs. the Rest" and "M. africanum vs. the Rest", each with a label "batch KBSVM".]

TB Results: Can Converge To A Better Solution With Advice

[Figure: two panels, "East-African-Indian vs. the Rest" and "Indo-Oceanic vs. the Rest", each with a label "batch KBSVM".]

TB Results: Possible To Still Learn Well With Only Advice

[Figure: two panels, "East-Asian vs. the Rest" and "M. bovis vs. the Rest", each with a label "batch KBSVM".]


Conclusions

• New online learning algorithm: the adviceptron
• Makes use of prior knowledge in the form of (possibly imperfect) polyhedral advice
• Performs simple, closed-form updates via passive-aggressive framework; scalable
• Good advice can help converge to a better solution with fewer examples
• Encouraging empirical results on two important real-world tasks

Acknowledgements. The authors would like to thank Dr. Lauren Cowan of the Center for Disease Control (CDC) for providing the TB dataset and the expert-defined rules for lineage classification. We gratefully acknowledge support of DARPA under grant HR0011-07-C-0060 and the NIH under grant 1-R01-LM009731-01.

Views and conclusions contained in this document are those of the authors and do not necessarily represent the official opinion or policies, either expressed or implied, of the US government or of DARPA.

References.

(Fung et al, 2003) G. Fung, O. L. Mangasarian, and J. W. Shavlik. Knowledge-based support vector classifiers. In S. Becker, S. Thrun & K. Obermayer, eds, NIPS, 15, pp. 521–528, 2003.

(Crammer et al, 2006) K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. J. of Mach. Learn. Res., 7:551–585, 2006.

(Freund and Schapire, 1999) Y. Freund and R. E. Schapire. Large margin classification using the perceptron algorithm. Mach. Learn., 37(3):277–296, 1999.

(Li and Long, 2002) Y. Li and P. M. Long. The relaxed online maximum margin algorithm. Mach. Learn., 46(1/3):361–387, 2002.

(Shabbeer et al, 2010) A. Shabbeer, L. Cowan, J. R. Driscoll, C. Ozcaglar, S. L. Vandenberg, B. Yener, and K. P. Bennett. TB-Lineage: An online tool for classification and analysis of strains of Mycobacterium tuberculosis Complex. Unpublished manuscript, 2010.

KBSVMs: Deriving The Advice Constraints

We assume an expert provides polyhedral advice of the form

$$Dx \le d \;\Rightarrow\; w'x \ge b.$$

We know $p \Rightarrow q$ is equivalent to $\neg p \vee q$.

If $\neg p \vee q$ has a solution, then its negation has no solution; that is, the system

$$
\left\{
\begin{array}{l}
Dx - d\tau \le 0, \\
w'x - b\tau < 0, \\
-\tau < 0
\end{array}
\right\}
$$

has no solution $(x, \tau)$.

If the system above has no solution $(x, \tau)$, then by Motzkin's Theorem of the Alternative, the following system

$$
\left\{
\begin{array}{l}
D'u + w = 0, \\
-d'u - b \ge 0, \\
u \ge 0
\end{array}
\right\}
$$

has a solution $u$.

[Figure: advice region $Dx \le d$ shown with Class A (y = +1) and Class B (y = -1).]