ECML 2010, Barcelona, Spain
Online Knowledge-Based Support Vector Machines
Gautam Kunapuli1, Kristin P. Bennett2, Amina Shabbeer2, Richard Maclin3 and
Jude W. Shavlik1
¹University of Wisconsin-Madison, USA; ²Rensselaer Polytechnic Institute, USA; ³University of Minnesota, Duluth, USA
Outline
• Knowledge-Based Support Vector Machines
• The Adviceptron: Online KBSVMs
• A Real-World Task: Diabetes Diagnosis
• A Real-World Task: Tuberculosis Isolate Classification
• Conclusions
Knowledge-Based SVMs
• Introduced by Fung et al. (2003)
• Allows incorporation of expert advice into SVM formulations
• Advice is specified with respect to polyhedral regions in input (feature) space
• Can be incorporated into the SVM formulation as constraints using advice variables

$$(\text{feature}_7 \ge 5) \wedge (\text{feature}_{12} \le 4) \;\Rightarrow\; (\text{class} = +1)$$
$$(\text{feature}_2 \le -3) \wedge (\text{feature}_3 \le 4) \wedge (\text{feature}_{10} \ge 0) \;\Rightarrow\; (\text{class} = -1)$$
$$(3\,\text{feature}_6 + 5\,\text{feature}_8 \ge 2) \wedge (\text{feature}_{11} \le -3) \;\Rightarrow\; (\text{class} = +1)$$
Knowledge-Based SVMs
In classic SVMs, we have T labeled data points $(x_t, y_t)$, $t = 1, \dots, T$. We learn a linear classifier $w'x - b = 0$. The standard SVM formulation trades off regularization and loss:

$$\begin{aligned}
\min_{w,b,\xi}\quad & \tfrac{1}{2}\|w\|^2 + \lambda\, e'\xi\\
\text{subject to}\quad & Y(Xw - be) + \xi \ge e,\quad \xi \ge 0.
\end{aligned}$$

(Figure: two classes of data, Class A (y = +1) and Class B (y = −1).)
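The trade-off in this objective can be made concrete with a tiny solver. Below is a minimal subgradient-descent sketch of the slide's objective (bias-free for simplicity); it is an illustrative reimplementation under assumed hyperparameters, not the solver used in the talk:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.1, lr=0.1, epochs=200):
    """Subgradient descent on 1/2 ||w||^2 + lam * sum_t max(0, 1 - y_t w'x_t).

    A bias-free sketch of the slide's hinge-loss objective."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        viol = y * (X @ w) < 1                        # points with nonzero hinge loss
        grad = w - lam * (y[viol, None] * X[viol]).sum(axis=0)
        w -= lr * grad
    return w
```

On separable data this drives the hinge losses to zero while keeping $\|w\|$ small.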
Knowledge-Based SVMs
We assume an expert provides polyhedral advice of the form

$$Dx \le d \;\Rightarrow\; w'x \ge b.$$

We can transform the logic constraint above, using advice variables u, into the system

$$D'u + w = 0,\qquad -d'u - b \ge 0,\qquad u \ge 0.$$

These constraints are added to the standard formulation to give Knowledge-Based SVMs.

(Figure: advice region $Dx \le d$ among Class A (y = +1) and Class B (y = −1).)
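The role of the advice variables can be checked numerically: since $w = -D'u$ with $u \ge 0$, any $x$ with $Dx \le d$ satisfies $w'x = -u'(Dx) \ge -u'd \ge b$. A small sketch on an assumed toy region ($x_1 \ge 1$, $x_2 \ge 1$; the region and numbers are illustrative, not from the slides):

```python
import numpy as np

# Toy advice region x1 >= 1, x2 >= 1, written as D x <= d.
D = np.array([[-1.0, 0.0], [0.0, -1.0]])
d = np.array([-1.0, -1.0])
u = np.array([1.0, 1.0])          # advice variables, u >= 0
w = -D.T @ u                      # enforces D'u + w = 0  ->  w = [1, 1]
b = -d @ u                        # largest b with -d'u - b >= 0  ->  b = 2

rng = np.random.default_rng(0)
pts = rng.uniform(-3, 5, size=(10000, 2))
inside = pts[(pts @ D.T <= d).all(axis=1)]    # sampled points with D x <= d
assert (inside @ w >= b - 1e-9).all()         # advice holds: w'x >= b on the region
```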
Knowledge-Based SVMs
In general, there are m advice sets, each with label $z_i = \pm 1$ for advice belonging to Class A or B:

$$D_i x \le d_i \;\Rightarrow\; z_i(w'x - b) \ge 0.$$

Each advice set adds the following constraints to the SVM formulation:

$$D_i'u^i + z_i w = 0,\qquad -d_i'u^i - z_i b \ge 0,\qquad u^i \ge 0.$$

(Figure: three advice regions $D_1 x \le d_1$, $D_2 x \le d_2$, $D_3 x \le d_3$ among Class A (y = +1) and Class B (y = −1).)
Knowledge-Based SVMs
The batch KBSVM formulation introduces advice slack variables to soften the advice constraints:

$$\begin{aligned}
\min\quad & \tfrac{1}{2}\|w\|^2 + \lambda\, e'\xi + \mu \sum_{i=1}^{m}\bigl(e'\eta_i + \zeta_i\bigr)\\
\text{subject to}\quad & Y(Xw - be) + \xi \ge e,\quad \xi \ge 0,\\
& D_i'u^i + z_i w + \eta_i = 0,\\
& -d_i'u^i - z_i b + \zeta_i \ge 1,\\
& u^i,\ \eta_i,\ \zeta_i \ge 0,\quad i = 1,\dots,m.
\end{aligned}$$

(Figure: the three advice regions from the previous slide.)
Outline
• Knowledge-Based Support Vector Machines
• The Adviceptron: Online KBSVMs
• A Real-World Task: Diabetes Diagnosis
• A Real-World Task: Tuberculosis Isolate Classification
• Conclusions
Online KBSVMs
• Need to derive an online version of KBSVMs
• The algorithm is provided with advice and one labeled data point at each round
• The algorithm should update the hypothesis at each step, $w_t$, as well as the advice vectors, $u^{i,t}$
Passive-Aggressive Algorithms
• Adopt the framework of passive-aggressive algorithms (Crammer et al., 2006), where at each round, when a new data point is given:
  – if loss = 0, there is no update (passive)
  – if loss > 0, update the weights to minimize the loss (aggressive)
• Why passive-aggressive algorithms?
  – readily applicable to most SVM losses
  – possible to derive elegant, closed-form update rules
  – simple rules provide fast updates; scalable
  – performance can be analyzed by deriving regret bounds
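For reference, the advice-free step this framework builds on can be sketched as follows. This is a minimal sketch of the standard PA-II update from Crammer et al. (2006), not code from the talk:

```python
import numpy as np

def pa_update(w, x, y, C=1.0):
    """One PA-II step: passive when the hinge loss is 0, otherwise the
    smallest change to w (with aggressiveness C) that restores the margin."""
    loss = max(0.0, 1.0 - y * (w @ x))
    if loss == 0.0:
        return w                      # passive: example already has margin >= 1
    tau = loss / (x @ x + 1.0 / (2.0 * C))
    return w + tau * y * x            # aggressive: margin-restoring step
```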
Online KBSVMs
• There are m advice sets, $(D_i, d_i, z_i)_{i=1}^{m}$
• At round t, the algorithm receives $(x_t, y_t)$
• The current hypothesis is $w_t$, and the current advice variables are $u^{i,t}$, $i = 1,\dots,m$

At round t, the formulation for deriving an update is

$$\begin{aligned}
\min_{\xi,\,u^i,\,\eta_i,\,\zeta_i \ge 0,\; w}\quad & \tfrac{1}{2}\|w - w_t\|^2 + \tfrac{1}{2}\sum_{i=1}^{m}\|u^i - u^{i,t}\|^2 + \tfrac{\lambda}{2}\xi^2 + \tfrac{\mu}{2}\sum_{i=1}^{m}\bigl(\|\eta_i\|^2 + \zeta_i^2\bigr)\\
\text{subject to}\quad & y_t\, w'x_t - 1 + \xi \ge 0,\\
& D_i'u^i + z_i w + \eta_i = 0,\quad -d_i'u^i - 1 + \zeta_i \ge 0,\quad u^i \ge 0,\quad i = 1,\dots,m.
\end{aligned}$$
Formulation At The t-th Round
(Same setup and formulation as the previous slide.) The first two objective terms, $\tfrac{1}{2}\|w - w_t\|^2$ and $\tfrac{1}{2}\sum_{i=1}^{m}\|u^i - u^{i,t}\|^2$, are proximal terms for the hypothesis and the advice vectors.
Formulation At The t-th Round
(Same formulation.) The term $\tfrac{\lambda}{2}\xi^2$ penalizes the data loss; the term $\tfrac{\mu}{2}\sum_{i=1}^{m}(\|\eta_i\|^2 + \zeta_i^2)$ penalizes the advice loss.
Formulation At The t-th Round
(Same formulation.) $\lambda$ and $\mu$ are the trade-off parameters.
Formulation At The t-th Round
(Same formulation.) The inequality constraints make deriving a closed-form update impossible.
Formulation At The t-th Round
(Same formulation; $u^{i,t}$ are the current advice-vector estimates.) The inequality constraints make deriving a closed-form update impossible, so we decompose the problem into smaller sub-problems.
Decompose Into m+1 Sub-problems
• First sub-problem: update the hypothesis by fixing the advice variables to their values at the t-th iteration, $u^i = u^{i,t}$
• Some objective terms and constraints then drop out of the formulation
Deriving The Hypothesis Update
• First sub-problem: update the hypothesis by fixing the advice vectors
• Then update the advice vectors by fixing the hypothesis
  – breaks down into m sub-problems, one for each advice set

$$\begin{aligned}
w_{t+1} = \arg\min_{w,\,\xi,\,\eta_i}\quad & \tfrac{1}{2}\|w - w_t\|_2^2 + \tfrac{\lambda}{2}\xi^2 + \tfrac{\mu}{2}\sum_{i=1}^{m}\|\eta_i\|_2^2\\
\text{subject to}\quad & y_t\,w'x_t - 1 + \xi \ge 0, \quad (\alpha)\\
& D_i'u^{i,t} + z_i w + \eta_i = 0,\quad i = 1,\dots,m. \quad (\beta_i)
\end{aligned}$$
Deriving The Hypothesis Update
(Same sub-problem as above.) In the equality constraint, the term $D_i'u^{i,t}$ is fixed; it determines the advice-estimate of the hypothesis according to the i-th advice set, denoted $r_{i,t}$.
Advice-Estimate Of Current Hypothesis
(Same sub-problem.) Average the advice-estimates over all m advice vectors and denote

$$r_t = \frac{1}{m}\sum_{i=1}^{m} r_{i,t}.$$
The Hypothesis Update
For $\lambda, \mu > 0$ and advice-estimate $r_t$, the hypothesis update is

$$w_{t+1} = \nu\,(w_t + \alpha_t y_t x_t) + (1 - \nu)\,r_t,$$
$$\alpha_t = \Bigl(\tfrac{1}{\lambda} + \nu\|x_t\|^2\Bigr)^{-1} \cdot \max\Bigl(1 - \nu\,y_t w_t'x_t - (1-\nu)\,y_t r_t'x_t,\; 0\Bigr).$$
The Hypothesis Update
(Same update as above.) The update is a convex combination of the standard passive-aggressive update and the average advice-estimate. The parameter of the convex combination is

$$\nu = \frac{1}{1 + m\mu}.$$
The Hypothesis Update
(Same update.) The update weight $\alpha_t$ depends on the hinge loss computed with respect to a composite weight vector that is a convex combination of the current hypothesis and the average advice-estimate.
Deriving The Advice Updates
• Second sub-problem: update the advice vectors by fixing the hypothesis, $w = w_{t+1}$
• Some constraints and objective terms drop out of the formulation
Deriving The Advice Updates
• Second sub-problem: update the advice vectors by fixing the hypothesis

$$\begin{aligned}
\min_{u^i,\,\eta_i,\,\zeta_i \ge 0}\quad & \tfrac{1}{2}\sum_{i=1}^{m}\|u^i - u^{i,t}\|^2 + \tfrac{\mu}{2}\sum_{i=1}^{m}\bigl(\|\eta_i\|^2 + \zeta_i^2\bigr)\\
\text{subject to}\quad & D_i'u^i + z_i w_{t+1} + \eta_i = 0,\quad -d_i'u^i - 1 + \zeta_i \ge 0,\quad u^i \ge 0,\quad i = 1,\dots,m.
\end{aligned}$$
Deriving The Advice Updates
(Same sub-problem as above.) The objective and constraints separate over the advice sets, so the problem splits into m independent sub-problems.
Deriving The i-th Advice Update
• m sub-problems: update the i-th advice vector by fixing the hypothesis

$$\begin{aligned}
u^{i,t+1} = \arg\min_{u^i,\,\eta_i,\,\zeta_i}\quad & \tfrac{1}{2}\|u^i - u^{i,t}\|^2 + \tfrac{\mu}{2}\bigl(\|\eta_i\|_2^2 + \zeta_i^2\bigr)\\
\text{subject to}\quad & D_i'u^i + z_i w_{t+1} + \eta_i = 0, \quad (\beta_i)\\
& -d_i'u^i - 1 + \zeta_i \ge 0, \quad (\gamma_i)\\
& u^i \ge 0. \quad (\tau_i)
\end{aligned}$$
Deriving The i-th Advice Update
(Same sub-problem.) The cone constraints $u^i \ge 0$ are still complicating: a closed-form solution cannot be derived directly.
• Use a projected-gradient approach:
  – drop the cone constraints to compute an intermediate closed-form update
  – project the intermediate update back onto the cone constraints
The m Advice Updates
For $\mu > 0$ and current hypothesis $w_{t+1}$, for each advice set $i = 1,\dots,m$, the update rule is given by

$$u^{i,t+1} = \max\bigl(u^{i,t} + D_i\beta_i - d_i\gamma_i,\; 0\bigr),$$

$$\begin{bmatrix}\beta_i\\ \gamma_i\end{bmatrix} =
\begin{bmatrix} -(D_i'D_i + \tfrac{1}{\mu}I_n) & D_i'd_i\\ d_i'D_i & -(d_i'd_i + \tfrac{1}{\mu})\end{bmatrix}^{-1}
\begin{bmatrix} D_i'u^{i,t} + z_i w_{t+1}\\ -d_i'u^{i,t} - 1\end{bmatrix}.$$
The m Advice Updates
(Same update as above.) Each advice update depends on the newly updated hypothesis $w_{t+1}$.
• The hypothesis-estimate of the advice is denoted $s_i = \beta_i/\gamma_i$
• The update is the error, i.e., the amount by which the ideal data point $s_i$ violates the constraint $D_i x \le d_i$, followed by a projection
The Adviceptron

Algorithm 1: The Passive-Aggressive Adviceptron Algorithm
1: input: data $(x_t, y_t)_{t=1}^{T}$, advice sets $(D_i, d_i, z_i)_{i=1}^{m}$, parameters $\lambda, \mu > 0$
2: initialize: $u^{i,1} = 0$, $w_1 = 0$
3: let: $\nu = 1/(1 + m\mu)$
4: for $(x_t, y_t)$ do
5:   predict label $\hat y_t = \mathrm{sign}(w_t'x_t)$
6:   receive correct label $y_t$
7:   suffer loss $\ell_t = \max\bigl(1 - \nu y_t w_t'x_t - (1-\nu)\, y_t r_t'x_t,\; 0\bigr)$
8:   update the hypothesis using $u^{i,t}$: $\alpha_t = \ell_t \big/ \bigl(\tfrac{1}{\lambda} + \nu\|x_t\|^2\bigr)$, $\quad w_{t+1} = \nu(w_t + \alpha_t y_t x_t) + (1-\nu)\, r_t$
9:   update the advice variables using $w_{t+1}$: $(\beta_i, \gamma_i) = H_i^{-1} g_i$, $\quad u^{i,t+1} = \bigl(u^{i,t} + D_i\beta_i - d_i\gamma_i\bigr)_+$
10: end for

(Here $H_i$ and $g_i$ are the matrix and right-hand side of the advice-update rule on the previous slides.)
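One round of the algorithm can be sketched in NumPy. This is an illustrative reimplementation, not the authors' code; in particular it assumes the advice-estimate is $r_{i,t} = -z_i D_i'u^{i,t}$, consistent with the constraint $D_i'u^i + z_i w = 0$:

```python
import numpy as np

def adviceptron_round(w, u, x, y, advice, lam, mu):
    """One Adviceptron round (sketch). w: hypothesis (n,); u: list of advice
    vectors u^i; advice: list of (D, d, z) with D (k_i, n), d (k_i,), z = +/-1."""
    m, n = len(advice), w.shape[0]
    nu = 1.0 / (1.0 + m * mu)

    # average advice-estimate of the hypothesis (assumed r_i = -z_i D_i'u^i)
    r = np.mean([-z * D.T @ ui for (D, d, z), ui in zip(advice, u)], axis=0)

    # hypothesis update: passive-aggressive step blended with the advice
    loss = max(1.0 - nu * y * (w @ x) - (1.0 - nu) * y * (r @ x), 0.0)
    alpha = loss / (1.0 / lam + nu * (x @ x))
    w_new = nu * (w + alpha * y * x) + (1.0 - nu) * r

    # advice updates: solve the small linear system H [beta; gamma] = g,
    # then project the intermediate update back onto the cone u^i >= 0
    u_new = []
    for (D, d, z), ui in zip(advice, u):
        H = np.block([[-(D.T @ D + np.eye(n) / mu), (D.T @ d)[:, None]],
                      [(d @ D)[None, :], -np.array([[d @ d + 1.0 / mu]])]])
        g = np.concatenate([D.T @ ui + z * w_new, [-(d @ ui) - 1.0]])
        sol = np.linalg.solve(H, g)
        beta, gamma = sol[:n], sol[n]
        u_new.append(np.maximum(ui + D @ beta - d * gamma, 0.0))
    return w_new, u_new
```

Running this in a loop over $(x_t, y_t)$, with $w$ and $u$ initialized to zero, reproduces the structure of Algorithm 1.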
Outline
• Knowledge-Based Support Vector Machines
• The Adviceptron: Online KBSVMs
• A Real-World Task: Diabetes Diagnosis
• A Real-World Task: Tuberculosis Isolate Classification
• Conclusions
Diagnosing Diabetes
• Standard data set from the UCI repository (768 × 8)
  – all patients are at least 21 years old and of Pima Indian heritage
  – features include body mass index and blood glucose level
• Expert advice for diagnosing diabetes from the NIH website on risks for Type-2 diabetes
  – a person who is obese (characterized by BMI > 30) and has a high blood glucose level (> 126) is at a strong risk for diabetes
  – a person who is at a normal weight (BMI < 25) and has a low blood glucose level (< 100) is at a low risk for diabetes

$$(\text{BMI} \ge 30) \wedge (\text{bloodglucose} \ge 126) \;\Rightarrow\; \text{diabetes}$$
$$(\text{BMI} \le 25) \wedge (\text{bloodglucose} \le 100) \;\Rightarrow\; \neg\,\text{diabetes}$$
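These two rules map directly onto the $(D_i, d_i, z_i)$ advice format. A sketch, assuming for illustration a feature vector restricted to [BMI, blood glucose] (the real data set has 8 features):

```python
import numpy as np

# (BMI >= 30) AND (glucose >= 126) => diabetes (z = +1),
# written as D1 x <= d1 with x = [BMI, glucose] (illustrative feature order)
D1 = np.array([[-1.0, 0.0], [0.0, -1.0]])
d1 = np.array([-30.0, -126.0])
z1 = +1

# (BMI <= 25) AND (glucose <= 100) => not diabetes (z = -1)
D2 = np.array([[1.0, 0.0], [0.0, 1.0]])
d2 = np.array([25.0, 100.0])
z2 = -1

# An obese, high-glucose patient lies in the first advice region...
assert (D1 @ np.array([35.0, 150.0]) <= d1).all()
# ...and a normal-weight, low-glucose patient lies in the second.
assert (D2 @ np.array([22.0, 90.0]) <= d2).all()
```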
Diagnosing Diabetes: Results
• 200 examples for training, remaining for testing
• Results averaged over 20 randomized iterations
• Compared to advice-free online algorithms:
  – Passive-aggressive (Crammer et al., 2006)
  – ROMMA (Li & Long, 2002)
  – Max-margin perceptron (Freund & Schapire, 1999)

(Figure: learning curves; a horizontal line marks the batch KBSVM trained on the 200 points.)
Outline
• Knowledge-Based Support Vector Machines
• The Adviceptron: Online KBSVMs
• A Real-World Task: Diabetes Diagnosis
• A Real-World Task: Tuberculosis Isolate Classification
• Conclusions
Tuberculosis Isolate Classification
• Task is to classify strains of the Mycobacterium tuberculosis complex (MTBC) into major genetic lineages based on DNA fingerprints
• MTBC is the causative agent of TB
  – leading cause of disease and morbidity
  – strains vary in infectivity, transmission, virulence, immunogenicity, and host associations depending on genetic lineage
• Lineage classification is crucial for surveillance, tracking, and control of TB worldwide
Tuberculosis Isolate Classification
• Two types of DNA fingerprints for all culture-positive TB strains collected in the US by the CDC (44 data features)
• Six major lineages (classes) of TB for classification
  – ancestral: M. bovis, M. africanum, Indo-Oceanic
  – modern: Euro-American, East-Asian, East-African-Indian
• Problem formulated as six 1-vs-many classification tasks

Class                 #isolates   #pieces of Positive Advice   #pieces of Negative Advice
East-Asian                 4924               1                            1
East-African-Indian        1469               2                            4
Euro-American             25161               1                            2
Indo-Oceanic               5309               5                            5
M. africanum                154               1                            3
M. bovis                    693               1                            3
Expert Rules for TB Lineage Classification
Rules provided by Dr. Lauren Cowan at the Centers for Disease Control (CDC), documented in Shabbeer et al. (2010).

(Figure: expert decision tree over the lineages East-Asian, Indo-Oceanic, M. bovis, M. africanum, East-African-Indian, and Euro-American; one branch splits on MIRU24 ≤ 1 vs. MIRU24 > 1.)
TB Results: Might Need Fewer Examples To Converge With Advice
(Figure: learning curves for Euro-American vs. the Rest and M. africanum vs. the Rest; horizontal lines mark the batch KBSVM performance.)
TB Results: Can Converge To A Better Solution With Advice
(Figure: learning curves for East-African-Indian vs. the Rest and Indo-Oceanic vs. the Rest; horizontal lines mark the batch KBSVM performance.)
TB Results: Possible To Still Learn Well With Only Advice
(Figure: learning curves for East-Asian vs. the Rest and M. bovis vs. the Rest; horizontal lines mark the batch KBSVM performance.)
Outline
• Knowledge-Based Support Vector Machines
• The Adviceptron: Online KBSVMs
• A Real-World Task: Diabetes Diagnosis
• A Real-World Task: Tuberculosis Isolate Classification
• Conclusions And Questions
Conclusions
• New online learning algorithm: the adviceptron
• Makes use of prior knowledge in the form of (possibly imperfect) polyhedral advice
• Performs simple, closed-form updates via the passive-aggressive framework; scalable
• Good advice can help converge to a better solution with fewer examples
• Encouraging empirical results on two important real-world tasks
Acknowledgements. The authors would like to thank Dr. Lauren Cowan of the Centers for Disease Control (CDC) for providing the TB dataset and the expert-defined rules for lineage classification. We gratefully acknowledge the support of DARPA under grant HR0011-07-C-0060 and of the NIH under grant 1-R01-LM009731-01.
Views and conclusions contained in this document are those of the authors and do not necessarily represent the official opinions or policies, either expressed or implied, of the US government or of DARPA.
References.
(Fung et al., 2003) G. Fung, O. L. Mangasarian, and J. W. Shavlik. Knowledge-based support vector machine classifiers. In S. Becker, S. Thrun, and K. Obermayer, eds., NIPS 15, pp. 521–528, 2003.
(Crammer et al., 2006) K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. J. of Mach. Learn. Res., 7:551–585, 2006.
(Freund and Schapire, 1999) Y. Freund and R. E. Schapire. Large margin classification using the perceptron algorithm. Mach. Learn., 37(3):277–296, 1999.
(Li and Long, 2002) Y. Li and P. M. Long. The relaxed online maximum margin algorithm. Mach. Learn., 46(1/3):361–387, 2002.
(Shabbeer et al., 2010) A. Shabbeer, L. Cowan, J. R. Driscoll, C. Ozcaglar, S. L. Vandenberg, B. Yener, and K. P. Bennett. TB-Lineage: An online tool for classification and analysis of strains of Mycobacterium tuberculosis complex. Unpublished manuscript, 2010.
KBSVMs: Deriving The Advice Constraints
We assume an expert provides polyhedral advice of the form

$$Dx \le d \;\Rightarrow\; w'x \ge b.$$

We know $p \Rightarrow q$ is equivalent to $\neg p \vee q$. If $\neg p \vee q$ holds, then its negation $p \wedge \neg q$ has no solution; that is, the system

$$Dx - d\tau \le 0,\quad w'x - b\tau < 0,\quad -\tau < 0$$

has no solution $(x, \tau)$.

(Figure: advice region $Dx \le d$ among Class A (y = +1) and Class B (y = −1).)
KBSVMs: Deriving The Advice Constraints
If the following system has no solution $(x, \tau)$:

$$Dx - d\tau \le 0,\quad w'x - b\tau < 0,\quad -\tau < 0,$$

then by Motzkin's Theorem of the Alternative, the following system has a solution $u$:

$$D'u + w = 0,\quad -d'u - b \ge 0,\quad u \ge 0.$$

(Figure: advice region $Dx \le d$ among Class A (y = +1) and Class B (y = −1).)