Modeling Latent Variable Uncertainty for Loss-based Learning
Daphne Koller (Stanford University), Ben Packer (Stanford University), M. Pawan Kumar (École Centrale Paris, École des Ponts ParisTech, INRIA Saclay, Île-de-France)


Page 1: Modeling Latent Variable Uncertainty for Loss-based Learning

Modeling Latent Variable Uncertainty for Loss-based Learning

Daphne Koller, Stanford University

Ben Packer, Stanford University

M. Pawan Kumar, École Centrale Paris, École des Ponts ParisTech, INRIA Saclay, Île-de-France

Page 2: Modeling Latent Variable Uncertainty for Loss-based Learning

Aim: Accurate learning with weakly supervised data

Training data: input xi, output yi

Bison

Deer

Elephant

Giraffe

Llama

Rhino

Object Detection

Input x

Output y = “Deer”
Latent Variable h

Page 3: Modeling Latent Variable Uncertainty for Loss-based Learning

(y(f), h(f)) = argmax_{y,h} f(Ψ(x,y,h))

Aim: Accurate learning with weakly supervised data

Feature Ψ(x,y,h) (e.g. HOG)
Input x

Output y = “Deer”

Prediction

Function f : Ψ(x,y,h) → (-∞, +∞)

Latent Variable h

Page 4: Modeling Latent Variable Uncertainty for Loss-based Learning

f* = argmin_f Objective(f)

Aim: Accurate learning with weakly supervised data

Feature Ψ(x,y,h) (e.g. HOG)
Input x

Output y = “Deer”

Function f : Ψ(x,y,h) → (-∞, +∞)

Learning

Latent Variable h

Page 5: Modeling Latent Variable Uncertainty for Loss-based Learning

Aim: Find a suitable objective function to learn f*

Feature Ψ(x,y,h) (e.g. HOG)
Input x

Output y = “Deer”

Function f : Ψ(x,y,h) → (-∞, +∞)

Learning

Encourages accurate prediction

User-specified criterion for accuracy

f* = argmin_f Objective(f)

Latent Variable h

Page 6: Modeling Latent Variable Uncertainty for Loss-based Learning

• Previous Methods

• Our Framework

• Optimization

• Results

• Ongoing and Future Work

Outline

Page 7: Modeling Latent Variable Uncertainty for Loss-based Learning

Latent SVM

Linear function parameterized by w

Prediction: (y(w), h(w)) = argmax_{y,h} wᵀΨ(x,y,h)

Learning: min_w Σi Δ(yi, yi(w), hi(w))

✔ Loss-based learning

✖ Loss independent of true (unknown) latent variable

✖ Doesn’t model uncertainty in latent variables

User-defined loss
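The Latent SVM prediction rule above can be sketched by brute-force enumeration over (y, h). Here `psi` is a toy stand-in for the joint feature map (e.g. HOG features of a candidate bounding box h), not the actual features used in the experiments:

```python
import numpy as np

def predict_latent_svm(w, psi, x, labels, latent_vals):
    """Joint prediction (y(w), h(w)) = argmax_{y,h} w^T Psi(x, y, h)."""
    best, best_score = None, -np.inf
    for y in labels:
        for h in latent_vals:
            score = w @ psi(x, y, h)
            if score > best_score:
                best, best_score = (y, h), score
    return best

# Toy joint feature map (stand-in for e.g. HOG features of window h).
def psi(x, y, h):
    return np.array([x * y, x * h, y + h], dtype=float)

w = np.array([1.0, 0.5, -0.2])
print(predict_latent_svm(w, psi, x=2.0, labels=[0, 1], latent_vals=[0, 1, 2]))  # (1, 2)
```

In practice the argmax is computed by a structured inference routine (e.g. sliding-window scoring) rather than explicit enumeration.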

Page 8: Modeling Latent Variable Uncertainty for Loss-based Learning

Expectation Maximization

Joint probability: Pθ(y,h|x) = exp(θᵀΨ(x,y,h)) / Z

Prediction: (y(θ), h(θ)) = argmax_{y,h} Pθ(y,h|x)

Page 9: Modeling Latent Variable Uncertainty for Loss-based Learning

Expectation Maximization

Joint probability: Pθ(y,h|x) = exp(θᵀΨ(x,y,h)) / Z

Prediction: (y(θ), h(θ)) = argmax_{y,h} θᵀΨ(x,y,h)

Learning: max_θ Σi log Pθ(yi|xi)

Page 10: Modeling Latent Variable Uncertainty for Loss-based Learning

Expectation Maximization

Joint probability: Pθ(y,h|x) = exp(θᵀΨ(x,y,h)) / Z

Prediction: (y(θ), h(θ)) = argmax_{y,h} θᵀΨ(x,y,h)

Learning: max_θ Σi log (Σh Pθ(yi,h|xi))

✔ Models uncertainty in latent variables

✖ Doesn’t model accuracy of latent variable prediction

✖ No user-defined loss function
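The EM learning objective on this slide, the log marginal likelihood with h summed out, can be sketched as follows; the feature map and parameters are illustrative stand-ins:

```python
import numpy as np

def log_marginal_likelihood(theta, psi, x, y_true, labels, latent_vals):
    """log P_theta(y|x) = log sum_h exp(theta^T Psi(x,y,h)) - log Z."""
    scores = np.array([[theta @ psi(x, y, h) for h in latent_vals]
                       for y in labels])
    log_Z = np.logaddexp.reduce(scores.ravel())   # log of the partition function
    yi = labels.index(y_true)
    return np.logaddexp.reduce(scores[yi]) - log_Z

def psi(x, y, h):  # toy stand-in for the joint feature map
    return np.array([x * y, x * h], dtype=float)

theta = np.array([0.3, -0.1])
ll = log_marginal_likelihood(theta, psi, x=1.0, y_true=1,
                             labels=[0, 1], latent_vals=[0, 1])
print(ll)  # a log-probability, so always <= 0
```

EM maximizes this quantity by alternating a posterior (E) step over h with a parameter (M) step, rather than by direct gradient ascent.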

Page 11: Modeling Latent Variable Uncertainty for Loss-based Learning

• Previous Methods

• Our Framework

• Optimization

• Results

• Ongoing and Future Work

Outline

Page 12: Modeling Latent Variable Uncertainty for Loss-based Learning

Problem

Model Uncertainty in Latent Variables

Model Accuracy of Latent Variable Predictions

Page 13: Modeling Latent Variable Uncertainty for Loss-based Learning

Solution

Model Uncertainty in Latent Variables

Model Accuracy of Latent Variable Predictions

Use two different distributions for the two different tasks

Page 14: Modeling Latent Variable Uncertainty for Loss-based Learning

Solution

Model Accuracy of Latent Variable Predictions

Use two different distributions for the two different tasks

[Figure: distribution Pθ(hi|yi,xi) over latent variables hi]

Page 15: Modeling Latent Variable Uncertainty for Loss-based Learning

Solution: Use two different distributions for the two different tasks

[Figure: Pw(yi,hi|xi) as a point mass at (yi(w),hi(w)); Pθ(hi|yi,xi) as a distribution over hi]

Page 16: Modeling Latent Variable Uncertainty for Loss-based Learning

The Ideal Case: No latent variable uncertainty, correct prediction

[Figure: Pw(yi,hi|xi) as a point mass at (yi(w),hi(w)); Pθ(hi|yi,xi) as a distribution over hi]

Page 17: Modeling Latent Variable Uncertainty for Loss-based Learning

The Ideal Case: No latent variable uncertainty, correct prediction

[Figure: Pw(yi,hi|xi) as a point mass at (yi(w),hi(w)); Pθ(hi|yi,xi) concentrated at hi(w)]

Page 18: Modeling Latent Variable Uncertainty for Loss-based Learning

The Ideal Case: No latent variable uncertainty, correct prediction

[Figure: Pw(yi,hi|xi) as a point mass at (yi,hi(w)); Pθ(hi|yi,xi) concentrated at hi(w)]

Page 19: Modeling Latent Variable Uncertainty for Loss-based Learning

In Practice: Restrictions in the representation power of models

[Figure: Pw(yi,hi|xi) as a point mass at (yi(w),hi(w)); Pθ(hi|yi,xi) as a distribution over hi]

Page 20: Modeling Latent Variable Uncertainty for Loss-based Learning

Our Framework: Minimize the dissimilarity between the two distributions

[Figure: Pw(yi,hi|xi) as a point mass at (yi(w),hi(w)); Pθ(hi|yi,xi) as a distribution over hi]

User-defined dissimilarity measure

Page 21: Modeling Latent Variable Uncertainty for Loss-based Learning

Our Framework: Minimize Rao’s Dissimilarity Coefficient

[Figure: Pw(yi,hi|xi) as a point mass at (yi(w),hi(w)); Pθ(hi|yi,xi) as a distribution over hi]

Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi)

Page 22: Modeling Latent Variable Uncertainty for Loss-based Learning

Our Framework: Minimize Rao’s Dissimilarity Coefficient

[Figure: Pw(yi,hi|xi) as a point mass at (yi(w),hi(w)); Pθ(hi|yi,xi) as a distribution over hi]

Hi(w,θ) - β Σh,h’ Δ(yi,h,yi,h’) Pθ(h|yi,xi) Pθ(h’|yi,xi)

Page 23: Modeling Latent Variable Uncertainty for Loss-based Learning

Our Framework: Minimize Rao’s Dissimilarity Coefficient

[Figure: Pw(yi,hi|xi) as a point mass at (yi(w),hi(w)); Pθ(hi|yi,xi) as a distribution over hi]

Hi(w,θ) - β Hi(θ,θ) - (1-β) Δ(yi(w),hi(w),yi(w),hi(w))

(The last term is identically zero, since the loss of a prediction against itself vanishes.)

Page 24: Modeling Latent Variable Uncertainty for Loss-based Learning

Our Framework: Minimize Rao’s Dissimilarity Coefficient

[Figure: Pw(yi,hi|xi) as a point mass at (yi(w),hi(w)); Pθ(hi|yi,xi) as a distribution over hi]

min_{w,θ} Σi Hi(w,θ) - β Hi(θ,θ)
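The per-sample objective, Hi(w,θ) − β Hi(θ,θ), can be sketched numerically. The 0/1-style loss `delta` and the distribution `p_theta` below are assumed toy inputs, not the paper's actual models:

```python
def objective_term(delta, y_i, y_w, h_w, p_theta, latent_vals, beta):
    """Per-sample Rao dissimilarity: H_i(w,theta) - beta * H_i(theta,theta)."""
    # H_i(w,theta): expected loss of the point prediction under P_theta(h|y_i,x_i)
    H_wt = sum(delta(y_i, h, y_w, h_w) * p_theta[h] for h in latent_vals)
    # H_i(theta,theta): self-dissimilarity of P_theta(h|y_i,x_i)
    H_tt = sum(delta(y_i, h, y_i, hp) * p_theta[h] * p_theta[hp]
               for h in latent_vals for hp in latent_vals)
    return H_wt - beta * H_tt

# Toy 0/1 loss on (y, h) pairs.
delta = lambda y1, h1, y2, h2: float(y1 != y2 or h1 != h2)
p_theta = {0: 0.7, 1: 0.3}  # assumed conditional distribution over h
val = objective_term(delta, y_i=1, y_w=1, h_w=0,
                     p_theta=p_theta, latent_vals=[0, 1], beta=0.5)
print(val)  # 0.3 - 0.5 * 0.42 = 0.09
```

Note how the −β Hi(θ,θ) term rewards a confident (low self-dissimilarity) latent distribution, while Hi(w,θ) rewards predictions that agree with it.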

Page 25: Modeling Latent Variable Uncertainty for Loss-based Learning

• Previous Methods

• Our Framework

• Optimization

• Results

• Ongoing and Future Work

Outline

Page 26: Modeling Latent Variable Uncertainty for Loss-based Learning

Optimization: min_{w,θ} Σi Hi(w,θ) - β Hi(θ,θ)

Initialize the parameters to w0 and θ0

Repeat until convergence:
  Fix w and optimize θ
  Fix θ and optimize w
End
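This alternating scheme is plain block coordinate descent; a generic sketch follows, where the two inner solvers are placeholders for the θ-step and w-step subroutines described on the next slides:

```python
def alternate_minimize(w0, theta0, optimize_theta, optimize_w,
                       max_iters=100, tol=1e-6):
    """Block coordinate descent: fix w, optimize theta; then fix theta, optimize w."""
    w, theta = w0, theta0
    prev = float("inf")
    for _ in range(max_iters):
        theta = optimize_theta(w, theta)   # inner solver returns the new theta
        w, obj = optimize_w(w, theta)      # inner solver returns new w and objective
        if prev - obj < tol:               # stop when the objective stops improving
            break
        prev = obj
    return w, theta

# Usage on an assumed toy separable objective f(w, theta) = (w-1)^2 + (theta-2)^2:
opt_theta = lambda w, t: 2.0                  # exact coordinate minimizer in theta
opt_w = lambda w, t: (1.0, (t - 2) ** 2)      # minimizer w* = 1 and objective value
w, theta = alternate_minimize(0.0, 0.0, opt_theta, opt_w)
print(w, theta)  # 1.0 2.0
```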

Page 27: Modeling Latent Variable Uncertainty for Loss-based Learning

Optimization of θ: min_θ Σi Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi) - β Hi(θ,θ)

[Figure: Pθ(hi|yi,xi) over hi, with hi(w) marked]

Case I: yi(w) = yi


Page 29: Modeling Latent Variable Uncertainty for Loss-based Learning

Optimization of θ: min_θ Σi Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi) - β Hi(θ,θ)

[Figure: Pθ(hi|yi,xi) over hi]

Case II: yi(w) ≠ yi

Page 30: Modeling Latent Variable Uncertainty for Loss-based Learning

Optimization of θ: min_θ Σi Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi) - β Hi(θ,θ)

[Figure: Pθ(hi|yi,xi) over hi]

Case II: yi(w) ≠ yi

Stochastic subgradient descent
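The θ-step is solved by stochastic subgradient descent; a generic sketch on an assumed toy problem follows, where the oracle `subgrad` stands in for a subgradient of the per-sample θ-objective above:

```python
import numpy as np

def stochastic_subgradient(theta0, subgrad, samples, lr=0.1, epochs=5, seed=0):
    """Minimize a sum of per-sample objectives by stochastic subgradient descent."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(epochs):
        # Visit the samples in a fresh random order each epoch.
        for i in rng.permutation(len(samples)):
            theta -= lr * subgrad(theta, samples[i])
    return theta

# Toy problem: minimize mean |theta - s| over samples; a subgradient is sign(theta - s).
samples = [1.0, 2.0, 3.0]
sg = lambda th, s: np.sign(th - s)
theta = stochastic_subgradient([0.0], sg, samples)
```

With a fixed step size the iterate hovers near the minimizer (here the median of the samples) rather than converging exactly; a decaying step size would tighten this.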

Page 31: Modeling Latent Variable Uncertainty for Loss-based Learning

Optimization of w: min_w Σi Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi)

Expected loss, models uncertainty

Form of optimization similar to Latent SVM

Observation: When Δ is independent of the true h, our framework is equivalent to Latent SVM

Concave-Convex Procedure (CCCP)
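The w-step minimizes the expected loss under Pθ(h|yi,xi); the observation above, that an h-independent Δ collapses this to the ordinary Latent SVM loss, can be checked directly. The losses and distribution below are assumed toys:

```python
def expected_loss(delta, y_i, y_w, h_w, p_theta):
    """Sum_h Delta(y_i, h, y(w), h(w)) * P_theta(h | y_i, x_i)."""
    return sum(delta(y_i, h, y_w, h_w) * p for h, p in p_theta.items())

p_theta = {0: 0.6, 1: 0.4}  # assumed distribution over the true latent variable

# A loss that depends on the true latent h keeps the expectation non-trivial.
delta_h = lambda y1, h1, y2, h2: float(y1 != y2 or h1 != h2)
print(expected_loss(delta_h, 0, 0, 0, p_theta))  # 0.4: the mass on h=1 disagrees

# When Delta ignores the true h, the expectation collapses to the ordinary
# loss Delta(y_i, y(w), h(w)) used by Latent SVM, since p_theta sums to 1.
delta_no_h = lambda y1, h1, y2, h2: float(y1 != y2)
print(expected_loss(delta_no_h, 0, 1, 0, p_theta))  # 1.0
```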

Page 32: Modeling Latent Variable Uncertainty for Loss-based Learning

• Previous Methods

• Our Framework

• Optimization

• Results

• Ongoing and Future Work

Outline

Page 33: Modeling Latent Variable Uncertainty for Loss-based Learning

Object Detection

Bison

Deer

Elephant

Giraffe

Llama

Rhino

Input x

Output y = “Deer”
Latent Variable h

Mammals Dataset

60/40 Train/Test Split

5 Folds

Training data: input xi, output yi

Page 34: Modeling Latent Variable Uncertainty for Loss-based Learning

Results – 0/1 Loss

[Bar chart: average 0/1 test loss per fold (Folds 1–5), LSVM vs. Our method; difference is statistically significant]

Page 35: Modeling Latent Variable Uncertainty for Loss-based Learning

Results – Overlap Loss

[Bar chart: average overlap test loss per fold (Folds 1–5), LSVM vs. Our method]

Page 36: Modeling Latent Variable Uncertainty for Loss-based Learning

Action Detection

Input x

Output y = “Using Computer”
Latent Variable h

PASCAL VOC 2011

60/40 Train/Test Split

5 Folds

Jumping

Phoning

Playing Instrument

Reading

Riding Bike

Riding Horse

Running

Taking Photo

Using Computer

Walking

Training data: input xi, output yi

Page 37: Modeling Latent Variable Uncertainty for Loss-based Learning

Results – 0/1 Loss

[Bar chart: average 0/1 test loss per fold (Folds 1–5), LSVM vs. Our method; difference is statistically significant]

Page 38: Modeling Latent Variable Uncertainty for Loss-based Learning

Results – Overlap Loss

[Bar chart: average overlap test loss per fold (Folds 1–5), LSVM vs. Our method; difference is statistically significant]

Page 39: Modeling Latent Variable Uncertainty for Loss-based Learning

• Previous Methods

• Our Framework

• Optimization

• Results

• Ongoing and Future Work

Outline

Page 40: Modeling Latent Variable Uncertainty for Loss-based Learning

[Remaining slides deleted in the source]