Learning Structural SVMs with Latent Variables
Chun-Nam Yu
Dept. of Computer Science, Cornell University
October 8-9, IBM SMiLe Workshop
Structured Output Prediction

Traditional classification and regression
Structured output prediction
Introduction to Structural SVMs

Structural SVM (margin rescaling) [Tsochantaridis et al. '04]

\[
\min_{\vec{w},\vec{\xi}} \; \frac{1}{2}\|\vec{w}\|^2 + C \sum_{i=1}^{n} \xi_i
\]
\[
\text{s.t. for } 1 \le i \le n, \text{ for all output structures } y \in \mathcal{Y}:\quad
\vec{w} \cdot \Phi(x_i, y_i) - \vec{w} \cdot \Phi(x_i, y) \ \ge\ \Delta(y_i, y) - \xi_i
\]

[Figure from the slides: the score of the correct parse tree, \(\vec{w}\cdot\Phi(x_i, y_i)\), must be at least the score of any wrong parse tree, \(\vec{w}\cdot\Phi(x_i, y)\), by a margin determined by \(\Delta\).]

The loss function \(\Delta\) controls the penalty of predicting \(y\) instead of \(y_i\).
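To make the constraint concrete, here is a minimal sketch of checking the margin-rescaling constraint for one training example. The feature map `phi`, loss `delta`, and their calling conventions are hypothetical stand-ins, not the slides' actual code:

```python
import numpy as np

def margin_violation(w, phi, delta, x_i, y_i, y, xi_i):
    """Amount by which candidate output y violates the margin-rescaling
    constraint  w.Phi(x_i, y_i) - w.Phi(x_i, y) >= Delta(y_i, y) - xi_i.
    A positive return value means the constraint is violated."""
    margin = np.dot(w, phi(x_i, y_i)) - np.dot(w, phi(x_i, y))
    return (delta(y_i, y) - xi_i) - margin
```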
Solving Margin-based Training Problems with the Cutting-Plane Algorithm

Exponentially many constraints, but solvable in polynomial time:
using the cutting-plane algorithm to speed up training of structural SVMs [Joachims, Finley & Yu, MLJ '09]
using approximate cutting-plane models to build faster and sparser kernel SVMs [Yu & Joachims, KDD '08], [Joachims & Yu, ECML '09; Best Machine Learning Paper]
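A minimal sketch of one common form of the cutting-plane loop, under assumed interfaces: `most_violated` is a separation oracle returning \(\arg\max_{y'}[\Delta(y, y') + \vec{w}\cdot\Phi(x, y')]\), and `solve_qp` solves the quadratic program restricted to the current working set. Neither is the authors' actual SVM^struct implementation:

```python
import numpy as np

def cutting_plane(examples, phi, delta, most_violated, solve_qp, eps=1e-3):
    """Repeatedly find each example's most violated constraint, grow the
    working set, and re-solve the QP restricted to that set, until no
    constraint is violated by more than eps."""
    working_set = []                       # constraints collected so far
    w, xi = solve_qp(working_set)          # xi[i]: slack for example i
    while True:
        added = False
        for i, (x, y) in enumerate(examples):
            y_hat = most_violated(w, x, y)             # separation oracle
            violation = delta(y, y_hat) - (np.dot(w, phi(x, y)) -
                                           np.dot(w, phi(x, y_hat)))
            if violation > xi[i] + eps:                # beyond tolerance
                working_set.append((i, y_hat))
                added = True
        if not added:                                  # all constraints hold
            return w
        w, xi = solve_qp(working_set)
```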
Incomplete Label Information and Latent Variables

Discriminative motif finding
Noun Phrase Coreference
Latent Structural Support Vector Machines

Latent Structural SVM [Yu & Joachims, ICML'09]

\[
\min_{\vec{w},\vec{\xi}} \; \frac{1}{2}\|\vec{w}\|^2 + C \sum_{i=1}^{n} \xi_i
\]
\[
\text{s.t. for } 1 \le i \le n, \text{ for all outputs } y \in \mathcal{Y}:\quad
\max_{h \in \mathcal{H}} \vec{w} \cdot \Phi(x_i, y_i, h) - \max_{\hat{h} \in \mathcal{H}} \vec{w} \cdot \Phi(x_i, y, \hat{h}) \ \ge\ \Delta(y_i, y, \hat{h}) - \xi_i
\]

[Figure from the slides: the best latent completion \(h\) of the correct output \((x_i, y_i)\) must outscore the best latent completion \(\hat{h}\) of every wrong output \(y\).]
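A minimal sketch of evaluating one such constraint by brute force over a finite latent set `H`; the featurizer `phi` and loss `delta` are toy stand-ins:

```python
import numpy as np

def latent_margin_violation(w, phi, delta, H, x_i, y_i, y, xi_i):
    """Check one Latent Structural SVM constraint: the best latent completion
    of the correct label must outscore the best completion of candidate y by
    the (latent-dependent) loss, up to slack xi_i."""
    best_correct = max(np.dot(w, phi(x_i, y_i, h)) for h in H)
    h_hat = max(H, key=lambda h: np.dot(w, phi(x_i, y, h)))
    best_wrong = np.dot(w, phi(x_i, y, h_hat))
    return (delta(y_i, y, h_hat) - xi_i) - (best_correct - best_wrong)
```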
Solving the Non-Convex Optimization

Concave-Convex Procedure (CCCP) [Yuille & Rangarajan '03]

1. Decompose the objective into convex and concave parts
2. Upper bound the concave part with a hyperplane
3. Minimize the resulting convex sum; iterate until convergence

Recent work employing the CCCP algorithm: [Collobert et al. '06, Smola et al. '05, Chapelle et al. '08]
Solving the Non-Convex Optimization

Concave-Convex Procedure (CCCP), step (1): decompose the objective into convex and concave parts:

\[
\underbrace{\left[\frac{1}{2}\|\vec{w}\|^2 + C \sum_{i=1}^{n} \max_{(y,h) \in \mathcal{Y}\times\mathcal{H}} \left[\vec{w} \cdot \Phi(x_i, y, h) + \Delta(y_i, y, h)\right]\right]}_{\text{convex}}
\ -\
\underbrace{\left[C \sum_{i=1}^{n} \max_{h \in \mathcal{H}} \vec{w} \cdot \Phi(x_i, y_i, h)\right]}_{\text{concave}}
\]
Solving the Non-Convex Optimization

Concave-Convex Procedure (CCCP), step (2): upper bound the concave part with a hyperplane at \(\vec{w}_t\):

\[
\forall \vec{w}:\quad
-\underbrace{\left[C \sum_{i=1}^{n} \max_{h \in \mathcal{H}} \vec{w} \cdot \Phi(x_i, y_i, h)\right]}_{\text{concave}}
\ \le\
-\underbrace{\left[C \sum_{i=1}^{n} \vec{w} \cdot \Phi(x_i, y_i, h_i^*)\right]}_{\text{linear}}
\]
where \(h_i^* = \arg\max_{h \in \mathcal{H}} \vec{w}_t \cdot \Phi(x_i, y_i, h)\).
Solving the Non-Convex Optimization

Concave-Convex Procedure (CCCP), step (3): minimize the resulting convex sum to get \(\vec{w}_{t+1}\):

\[
\vec{w}_{t+1} = \arg\min_{\vec{w}}\
\underbrace{\left[\frac{1}{2}\|\vec{w}\|^2 + C \sum_{i=1}^{n} \max_{(y,h) \in \mathcal{Y}\times\mathcal{H}} \left[\vec{w} \cdot \Phi(x_i, y, h) + \Delta(y_i, y, h)\right]\right]}_{\text{convex}}
\ -\
\underbrace{\left[C \sum_{i=1}^{n} \vec{w} \cdot \Phi(x_i, y_i, h_i^*)\right]}_{\text{linear}}
\]
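Putting the three steps together, a minimal sketch of the CCCP outer loop under assumed interfaces: `solve_structural_svm` stands in for a convex Structural SVM solver (e.g. the cutting-plane algorithm above) run with the latent variables fixed, and `H` is a finite latent set; this is not the authors' actual implementation:

```python
import numpy as np

def cccp(examples, phi, H, solve_structural_svm, w0, max_iters=50, tol=1e-4):
    """CCCP outer loop for the Latent Structural SVM: alternate step (2),
    imputing h_i* under the current w_t, with step (3), solving the
    resulting convex Structural SVM problem with those h_i* fixed."""
    w = w0
    for _ in range(max_iters):
        # Step (2): the hyperplane upper bound amounts to imputing the best
        # latent variable for each labeled example under the current weights
        h_star = [max(H, key=lambda h: np.dot(w, phi(x, y, h)))
                  for x, y in examples]
        # Step (3): minimize the convex sum (a standard Structural SVM QP)
        w_next = solve_structural_svm(examples, h_star)
        if np.linalg.norm(w_next - w) < tol:   # converged (local optimum)
            break
        w = w_next
    return w
```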
Analogy to Expectation-Maximization

E-step: equivalent to computing the upper-bounding hyperplane
M-step: equivalent to minimizing the convex sum
Point estimate for latent variables; no normalization with a partition function required
Discriminative probabilistic models with latent variables: [Gunawardana et al. '05], [Wang et al. '06], [Petrov & Klein '07]
Noun Phrase Coreference

Input x: Noun phrases with edge features
Label y: Clusters of noun phrases
Latent variable h: 'Strong' links forming spanning trees
Task: Cluster the noun phrases using single-link agglomerative clustering
Inference: Maximum Spanning Tree

[Example document from Cardie & Wagstaff '99]
Noun Phrase Coreference: Results

Test on MUC 6 data, using the same features as in [Ng & Cardie '02]
Initialize spanning trees by chronological order
10-fold CV results:

Algorithm                            MITRE loss
SVMcluster [Finley & Joachims '05]   41.3
Latent Structural SVM                35.6
Discriminative Motif Finding

Input x: DNA sequences containing ARS from S. cerevisiae and S. kluyveri
Label y: Whether the sequence replicates in S. cerevisiae
Latent variable h: Position of the motif
Task: Find the predictive motif
Inference: Enumerate all positions h
Discriminative Motif Finding: Results

Data: 197 yeast DNA sequences from S. cerevisiae and S. kluyveri; ~6000 intergenic sequences for background estimation
10-fold CV, 10 random restarts for each parameter setting

Algorithm                       Error rate
Gibbs Sampler (w=11)            37.9%
Gibbs Sampler (w=17)            35.06%
Latent Structural SVM (w=11)    11.09%
Latent Structural SVM (w=17)    12.00%
Conclusions and Future Directions

A new formulation of the Latent Variable Structural SVM with an efficient solution algorithm
A modular algorithm that achieves very good accuracies on two example structured prediction tasks
Potential extensions to semi-supervised settings
Also looking at structured output learning settings where unlabeled data in the output domain Y are plentiful
Discriminative Motif Finding - Formulation

Feature vector Φ: Position-specific weight matrix plus parameters for a Markov background model:

\[
\Phi(x, y, h) = \underbrace{\sum_{i=1}^{h} \phi_{BG}(x_i)}_{\text{background}} + \underbrace{\sum_{j=1}^{l} \phi^{(j)}_{PSM}(x_{h+j})}_{\text{motif}} + \underbrace{\sum_{i=h+l+1}^{n} \phi_{BG}(x_i)}_{\text{background}}
\]

[Motif illustration from Wasserman 2004]

Loss function Δ: Zero-one loss
Inference: Enumeration, as y is binary and the number of candidate positions h is linear in sequence length
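A toy 0-indexed sketch of this feature map; `phi_bg` and `phi_psm` are assumed featurizers returning equal-length NumPy vectors, not the authors' actual implementation:

```python
def motif_phi(x, h, l, phi_bg, phi_psm):
    """Sum background features outside the length-l motif window starting at
    offset h, and position-specific motif features inside it (the slide's
    1-indexed sums written 0-indexed)."""
    n = len(x)
    total = sum(phi_bg(x[i]) for i in range(h))                  # before motif
    total = total + sum(phi_psm(j, x[h + j]) for j in range(l))  # motif window
    total = total + sum(phi_bg(x[i]) for i in range(h + l, n))   # after motif
    return total
```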
Noun Phrase Coreference - Formulation

Feature vector Φ: Sum of tree edge features:

\[
\Phi(x, y, h) = \sum_{(i,j) \in h} x_{ij}
\]

Loss function Δ:

\[
\Delta(y, \hat{y}, \hat{h}) = \underbrace{n(y)}_{\#\text{nodes}} - \underbrace{k(y)}_{\#\text{components}} + \sum_{(i,j) \in \hat{h}} \underbrace{\ell(y, (i,j))}_{+1/-1}
\]

Inference: Any Maximum Spanning Tree algorithm
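Since any maximum spanning tree routine works for the inference step, a minimal sketch using networkx; `edge_scores` is a hypothetical dict of learned edge scores, not the authors' data format:

```python
import networkx as nx

def best_latent_tree(edge_scores):
    """Latent-variable inference for coreference: the highest-scoring set of
    'strong' links forming a spanning tree over the noun phrases.
    edge_scores[(i, j)] holds the learned score w . x_ij for the pair."""
    g = nx.Graph()
    for (i, j), score in edge_scores.items():
        g.add_edge(i, j, weight=score)
    return list(nx.maximum_spanning_tree(g).edges())
```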
Optimizing Precision@k

Input x: A query with an associated collection of documents
Label y: Relevance judgments of each document
Latent variable h: Top k relevant documents

Example query q: ICML 2009
Optimizing Precision@k - Formulation

Feature vector Φ: Sum of features from the top k documents:

\[
\Phi(x, y, h) = \sum_{j=1}^{k} x_{h_j}
\]

Loss function Δ: One minus precision@k:

\[
\Delta(y, \hat{y}, \hat{h}) = 1 - \frac{1}{k} \sum_{j=1}^{k} [y_{\hat{h}_j} = 1]
\]

Depends only on the top k documents selected by ĥ
Inference: Sorting
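A minimal sketch of the sorting-based inference and loss; `doc_features` and `relevance` are hypothetical array layouts chosen for illustration:

```python
import numpy as np

def precision_at_k_loss(w, doc_features, relevance, k):
    """Rank documents by w . x, take the top k as the latent variable h,
    and return 1 - precision@k."""
    scores = doc_features @ w               # doc_features: (n_docs, n_feats)
    top_k = np.argsort(-scores)[:k]         # inference is just sorting
    return 1.0 - np.mean(relevance[top_k])  # relevance: 0/1 array
```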
Optimizing Precision@k - Results

OHSUMED dataset from the LETOR 3.0 benchmark
Initialize h with a weight vector trained on classification accuracy
5-fold CV results: [results figure from the original slides]