Modeling Latent Variable Uncertainty for Loss-based Learning
M. Pawan Kumar, École Centrale Paris / École des Ponts ParisTech / INRIA Saclay, Île-de-France
Ben Packer, Stanford University
Daphne Koller, Stanford University
Aim: Accurate learning with weakly supervised data
Train: Input x_i → Output y_i
Bison
Deer
Elephant
Giraffe
Llama
Rhino
Object Detection
Input x
Output y = “Deer”
Latent Variable h
Prediction: (y(f), h(f)) = argmax_{y,h} f(Ψ(x,y,h))
Aim: Accurate learning with weakly supervised data
Input x
Feature Ψ(x,y,h) (e.g. HOG)
Output y = “Deer”
Latent Variable h
Function f : Ψ(x,y,h) → (-∞, +∞)
Learning: f* = argmin_f Objective(f)
The objective encourages accurate prediction, under a user-specified criterion for accuracy
Aim: Find a suitable objective function to learn f*
Outline
• Previous Methods
• Our Framework
• Optimization
• Results
• Ongoing and Future Work
Latent SVM
Linear function parameterized by w
Prediction: (y(w), h(w)) = argmax_{y,h} wᵀΨ(x,y,h)
Learning: min_w Σ_i Δ(y_i, y_i(w), h_i(w))   (Δ: user-defined loss)
✔ Loss-based learning
✖ Loss independent of true (unknown) latent variable
✖ Doesn’t model uncertainty in latent variables
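The Latent SVM prediction rule above can be sketched by brute-force enumeration over a finite set of labels and latent values. This is a toy illustration only: `psi`, the label and latent sets, and the weights are all made up for the example, not taken from the paper's code.

```python
import numpy as np

def lsvm_predict(w, psi, x, labels, latents):
    """Latent SVM prediction: (y(w), h(w)) = argmax_{y,h} w^T Psi(x, y, h).

    psi(x, y, h) returns a joint feature vector; labels and latents are
    the finite sets to enumerate."""
    best, best_score = None, -np.inf
    for y in labels:
        for h in latents:
            score = w @ psi(x, y, h)
            if score > best_score:
                best, best_score = (y, h), score
    return best

# Toy joint feature map: one weight slot per (label, latent) pair.
def psi(x, y, h):
    v = np.zeros(4)
    v[2 * y + h] = x  # scalar input x routed to the (y, h) slot
    return v

w = np.array([0.1, 0.9, 0.3, 0.2])
print(lsvm_predict(w, psi, 1.0, labels=[0, 1], latents=[0, 1]))  # (0, 1)
```

In practice the argmax is computed by an efficient inference routine (e.g. sliding-window search over bounding boxes) rather than explicit enumeration.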
Expectation Maximization
Joint probability: Pθ(y,h|x) = exp(θᵀΨ(x,y,h)) / Z
Prediction: (y(θ), h(θ)) = argmax_{y,h} θᵀΨ(x,y,h)
Learning: max_θ Σ_i log Σ_{h_i} Pθ(y_i,h_i|x_i)
✔ Models uncertainty in latent variables
✖ Doesn’t model accuracy of latent variable prediction
✖ No user-defined loss function
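The EM quantities above can be sketched numerically when the label and latent sets are small enough to enumerate. A minimal version; `psi`, `theta`, and the sets are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def joint_prob(theta, psi, x, labels, latents):
    """P_theta(y,h|x) = exp(theta^T Psi(x,y,h)) / Z, with Z summing
    over all (y,h) pairs. Rows index labels, columns latent values."""
    scores = np.array([[theta @ psi(x, y, h) for h in latents] for y in labels])
    p = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return p / p.sum()

def label_log_likelihood(theta, psi, x, y_idx, labels, latents):
    """log P_theta(y|x) = log sum_h P_theta(y,h|x): the quantity EM maximizes."""
    p = joint_prob(theta, psi, x, labels, latents)
    return np.log(p[y_idx].sum())

# Toy joint feature map: one parameter slot per (label, latent) pair.
def psi(x, y, h):
    v = np.zeros(4)
    v[2 * y + h] = x
    return v

theta = np.array([0.1, 0.9, 0.3, 0.2])
p = joint_prob(theta, psi, 1.0, [0, 1], [0, 1])
print(p.sum())  # sums to 1 (up to floating point)
```

Note the contrast with Latent SVM: the training objective is the marginal log-likelihood of the observed label, with no loss function anywhere.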
Outline
• Previous Methods
• Our Framework
• Optimization
• Results
• Ongoing and Future Work
Problem
Model Uncertainty in Latent Variables
Model Accuracy of Latent Variable Predictions
Solution: Use two different distributions for the two different tasks
Pθ(h_i|y_i,x_i) models the uncertainty in the latent variables
Pw(y_i,h_i|x_i), a delta distribution at the prediction (y_i(w),h_i(w)), models the accuracy of the predictions
The Ideal Case: No latent variable uncertainty, correct prediction
[Figure: Pθ(h_i|y_i,x_i) places all its mass on the true latent variable h_i, and the delta distribution Pw(y_i,h_i|x_i) sits at the prediction (y_i(w),h_i(w)) = (y_i,h_i)]
In Practice: Restrictions in the representation power of models
[Figure: Pθ(h_i|y_i,x_i) spreads its mass over several values of h_i, and the prediction (y_i(w),h_i(w)) need not match the true (y_i,h_i)]
Our Framework: Minimize the dissimilarity between the two distributions
User-defined dissimilarity measure: Rao’s Dissimilarity Coefficient
H_i(w,θ) = Σ_h Δ(y_i,h,y_i(w),h_i(w)) Pθ(h|y_i,x_i)
H_i(θ,θ) = Σ_{h,h'} Δ(y_i,h,y_i,h') Pθ(h|y_i,x_i) Pθ(h'|y_i,x_i)
Dissimilarity: H_i(w,θ) − β H_i(θ,θ) − (1−β) Δ(y_i(w),h_i(w),y_i(w),h_i(w))
The last term vanishes, since a prediction incurs zero loss against itself
Learning: min_{w,θ} Σ_i [ H_i(w,θ) − β H_i(θ,θ) ]
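The per-sample objective can be sketched directly from the two H terms. A toy illustration: `p_theta` and the loss `delta01` are made-up inputs, and Pw is represented implicitly by the single prediction (y_w, h_w) it puts all its mass on.

```python
def sample_dissimilarity(p_theta, delta, y_i, y_w, h_w, beta):
    """H_i(w,theta) - beta * H_i(theta,theta) for one training sample.

    p_theta[h] = P_theta(h | y_i, x_i); delta(y, h, y', h') is the
    user-defined loss. P_w is a delta at the prediction (y_w, h_w),
    so H_i(w,w) = 0 and drops out of Rao's coefficient."""
    n = len(p_theta)
    # H_i(w,theta): expected loss of the prediction under P_theta
    h_wt = sum(delta(y_i, h, y_w, h_w) * p_theta[h] for h in range(n))
    # H_i(theta,theta): self-dissimilarity of P_theta under the same loss
    h_tt = sum(delta(y_i, h, y_i, hp) * p_theta[h] * p_theta[hp]
               for h in range(n) for hp in range(n))
    return h_wt - beta * h_tt

# 0/1 loss on (label, latent) pairs; uniform P_theta over two latent values
delta01 = lambda y, h, yp, hp: float((y, h) != (yp, hp))
print(sample_dissimilarity([0.5, 0.5], delta01, 0, 0, 0, beta=0.0))  # 0.5
```

With β = 1 on the same inputs the two terms cancel: a spread-out Pθ is penalized less for disagreeing with the prediction, which is exactly how the framework trades prediction accuracy against latent-variable uncertainty.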
Outline
• Previous Methods
• Our Framework
• Optimization
• Results
• Ongoing and Future Work
Optimization: min_{w,θ} Σ_i [ H_i(w,θ) − β H_i(θ,θ) ]
Initialize the parameters to w0 and θ0
Repeat until convergence:
  Fix w and optimize θ
  Fix θ and optimize w
End
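The alternating scheme above is generic block-coordinate minimization, and can be sketched as a skeleton. The sub-problem solvers are assumed inputs here (the slides use stochastic subgradient descent for θ and a Latent-SVM-style solver for w); the toy quadratic objective and its exact block solvers are made up to make the skeleton runnable.

```python
def alternate_minimize(obj, w0, theta0, solve_w, solve_theta,
                       tol=1e-8, max_iter=100):
    """Repeat { fix w, optimize theta; fix theta, optimize w }
    until the objective stops decreasing by more than tol."""
    w, theta = w0, theta0
    prev = obj(w, theta)
    for _ in range(max_iter):
        theta = solve_theta(w)   # fix w and optimize theta
        w = solve_w(theta)       # fix theta and optimize w
        cur = obj(w, theta)
        if prev - cur < tol:     # converged
            break
        prev = cur
    return w, theta

# Toy smooth objective with coupled blocks, and its exact block solvers.
obj = lambda w, t: (w - 1) ** 2 + (t - 2) ** 2 + 0.1 * w * t
solve_t = lambda w: 2 - 0.05 * w   # argmin over t with w fixed
solve_w = lambda t: 1 - 0.05 * t   # argmin over w with t fixed
w, t = alternate_minimize(obj, 0.0, 0.0, solve_w, solve_t)
```

Each block update can only decrease the objective, so the loop converges to a fixed point of the two solvers; as with CCCP, only a local minimum is guaranteed.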
Optimization of θ: min_θ Σ_i [ Σ_h Δ(y_i,h,y_i(w),h_i(w)) Pθ(h|y_i,x_i) − β H_i(θ,θ) ]
Case I: y_i(w) = y_i [Figure: Pθ(h_i|y_i,x_i) shifts towards the predicted latent variable h_i(w)]
Case II: y_i(w) ≠ y_i [Figure: Pθ(h_i|y_i,x_i)]
Solved by stochastic subgradient descent
Optimization of w: min_w Σ_i Σ_h Δ(y_i,h,y_i(w),h_i(w)) Pθ(h|y_i,x_i)
Expected loss: models the uncertainty in the latent variables
The form of the optimization is similar to Latent SVM
Observation: when Δ is independent of the true h, our framework is equivalent to Latent SVM
Solved by the Concave-Convex Procedure (CCCP)
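The expected loss above, and the observation that it collapses to the Latent SVM loss when Δ ignores the true latent variable, can be checked in a few lines. A toy sketch: `p_theta` and the label-only loss are made-up inputs.

```python
def expected_loss(p_theta, delta, y_i, y_w, h_w):
    """Sum_h Delta(y_i, h, y_w, h_w) * P_theta(h | y_i, x_i): the loss of
    prediction (y_w, h_w), averaged over uncertainty in the true h."""
    return sum(delta(y_i, h, y_w, h_w) * p for h, p in enumerate(p_theta))

# A loss that ignores the true latent variable h: the expectation
# collapses to Delta(y_i, y_w, h_w), exactly the Latent SVM loss.
delta_label = lambda y, h, yp, hp: float(y != yp)
print(expected_loss([0.2, 0.3, 0.5], delta_label, 0, 1, 0))  # 1.0
print(expected_loss([0.9, 0.1, 0.0], delta_label, 0, 1, 0))  # 1.0, same for any P_theta
```

Since Pθ sums to one, any Δ that does not depend on h factors out of the sum, which is why the framework strictly generalizes Latent SVM rather than replacing it.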
Outline
• Previous Methods
• Our Framework
• Optimization
• Results
• Ongoing and Future Work
Object Detection
Bison
Deer
Elephant
Giraffe
Llama
Rhino
Input x
Output y = “Deer”
Latent Variable h
Mammals Dataset
60/40 Train/Test Split
5 Folds
Train: Input x_i → Output y_i
Results – 0/1 Loss
[Bar chart: average test loss on each of the 5 folds, LSVM vs. our method]
Statistically Significant
Results – Overlap Loss
[Bar chart: average test loss on each of the 5 folds, LSVM vs. our method]
Action Detection
Input x
Output y = “Using Computer”
Latent Variable h
PASCAL VOC 2011
60/40 Train/Test Split
5 Folds
Jumping
Phoning
Playing Instrument
Reading
Riding Bike
Riding Horse
Running
Taking Photo
Using Computer
Walking
Train: Input x_i → Output y_i
Results – 0/1 Loss
[Bar chart: average test loss on each of the 5 folds, LSVM vs. our method]
Statistically Significant
Results – Overlap Loss
[Bar chart: average test loss on each of the 5 folds, LSVM vs. our method]
Statistically Significant
Outline
• Previous Methods
• Our Framework
• Optimization
• Results
• Ongoing and Future Work
[Ongoing and Future Work slides omitted from the uploaded deck]