ECE 6504: Advanced Topics in Machine Learning
Probabilistic Graphical Models and Large-Scale Learning
Dhruv Batra, Virginia Tech
Topics
– Markov Random Fields: MAP Inference
– Integer Programming, LP Formulation
– Dual Decomposition

Readings: KF 13.1–13.5, Barber 5.1, 28.9
Administrativia
• HW1
  – Solutions and grades released
• HW2
  – Solutions released
  – Grades next week
• Project Presentations
  – When: April 22 & 24
  – Where: in class
  – Format: 5-min talk covering main results
  – The semester ends about 2 weeks after the presentations, so nearly finished results are expected
  – Slides due: April 21, 11:55pm
(C) Dhruv Batra 2
Recap of Last Time
MAP Inference
MAP Inference: Most Likely Assignment

[Figure: a pairwise MRF over variables $y_1, y_2, \dots, y_n$, e.g. image regions labeled Person, Table, Plate. Node scores ("local rewards") are $k \times 1$ vectors; edge scores ("distributed prior") are $k \times k$ tables.]

$$S(y) = \sum_{i \in V} \theta_i(y_i) + \sum_{(i,j) \in E} \theta_{ij}(y_i, y_j)$$

$$P(y) = \frac{1}{Z} e^{S(y)}$$

MAP inference: find the assignment $y$ with maximum $P(y)$, equivalently maximum $S(y)$.
MAP Inference
• Why is MAP difficult?
• What if we independently maximize the terms?
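To make the second question concrete, here is a minimal sketch (with made-up scores) of why independently maximizing the node terms can fail once edge scores enter:

```python
import itertools

import numpy as np

# A tiny 2-node, 2-label pairwise MRF (hypothetical scores, chosen to
# illustrate the point): independently maximizing each node score ignores
# the edge score and picks the wrong assignment.
theta_1 = np.array([10.0, 0.0])          # node scores theta_1(y1)
theta_2 = np.array([10.0, 0.0])          # node scores theta_2(y2)
theta_12 = np.array([[-100.0, 0.0],      # edge scores theta_12(y1, y2):
                     [0.0, 15.0]])       # strongly penalizes (0, 0)

def S(y1, y2):
    """Total MRF score S(y) = node terms + edge term."""
    return theta_1[y1] + theta_2[y2] + theta_12[y1, y2]

# Independent maximization: argmax of each node score on its own.
y_indep = (int(theta_1.argmax()), int(theta_2.argmax()))

# Exact MAP: brute force over all k^n = 4 assignments.
y_map = max(itertools.product(range(2), repeat=2), key=lambda y: S(*y))

print(y_indep, S(*y_indep))   # (0, 0) with score -80: edge penalty ignored
print(y_map, S(*y_map))       # (1, 1) with score 15: the true MAP
```

Independent maximization sees only the local rewards; the edge term can make its choice arbitrarily bad, which is exactly why MAP needs joint reasoning.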
MAP in Pairwise MRFs
• Over-Complete Representation

$G = (V, E)$, variables $X_1, \dots, X_n$, each taking one of $k$ labels. Node scores $\theta_i(\cdot)$ are $k \times 1$ vectors; edge scores $\theta_{ij}(\cdot, \cdot)$ are $k \times k$ tables. Stack them all into one parameter vector:

$$\theta = [\, \theta_1(1) \dots \theta_1(k) \;|\; \dots \;|\; \theta_n(1) \dots \theta_n(k) \;|\; \theta_{12}(1,1) \dots \theta_{12}(k,k) \;|\; \dots \;|\; \theta_{n-1,n}(1,1) \dots \theta_{n-1,n}(k,k) \,]$$

Encode an assignment with node indicator vectors $\mu_i \in \{0,1\}^k$, where $\mu_i(s) = 1$ iff $x_i = s$:

$$x_i = 1 \;\leftrightarrow\; \mu_i = [1\; 0\; \dots\; 0]^T, \qquad x_i = 2 \;\leftrightarrow\; \mu_i = [0\; 1\; 0\; \dots\; 0]^T, \;\dots$$
MAP in Pairwise MRFs
• Over-Complete Representation (continued)

Likewise, encode each edge with an indicator vector $\mu_{ij} \in \{0,1\}^{k^2}$, where $\mu_{ij}(s, t) = 1$ iff $x_i = s$ and $x_j = t$:

$$x_i = 1, x_j = 1 \;\leftrightarrow\; \mu_{ij} = [1\; 0\; 0\; \dots\; 0]^T, \qquad x_i = 1, x_j = 2 \;\leftrightarrow\; \mu_{ij} = [0\; 1\; 0\; \dots\; 0]^T, \;\dots$$

Stacking all node and edge indicators gives

$$\mu_x = [\, \mu_1(1) \dots \mu_1(k) \;|\; \dots \;|\; \mu_n(1) \dots \mu_n(k) \;|\; \mu_{12}(1,1) \dots \mu_{12}(k,k) \;|\; \dots \;|\; \mu_{n-1,n}(1,1) \dots \mu_{n-1,n}(k,k) \,]$$

so the assignment score becomes a dot product:

$$S(x) = \theta \cdot \mu_x$$
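The over-complete encoding is easy to verify numerically. A minimal sketch (random scores on a small chain; all names hypothetical) checking that the dot product $\theta \cdot \mu_x$ matches the direct score:

```python
import numpy as np

# Numerical check of the over-complete representation on a small chain MRF
# (all scores random; names hypothetical).
rng = np.random.default_rng(0)
n, k = 4, 3
node_scores = rng.normal(size=(n, k))          # theta_i(s)
edge_scores = rng.normal(size=(n - 1, k, k))   # theta_{i,i+1}(s, t) on chain edges

# theta stacks node scores first, then edge scores, matching mu's layout below.
theta = np.concatenate([node_scores.ravel(), edge_scores.ravel()])

def mu(x):
    """Indicator vector mu_x: one-hot mu_i per node, one-hot mu_ij per edge."""
    parts = [np.eye(k)[xi] for xi in x]                         # node indicators
    parts += [np.outer(np.eye(k)[x[i]], np.eye(k)[x[i + 1]]).ravel()
              for i in range(n - 1)]                            # edge indicators
    return np.concatenate(parts)

def score(x):
    """Direct evaluation of S(x)."""
    return (sum(node_scores[i, x[i]] for i in range(n)) +
            sum(edge_scores[i, x[i], x[i + 1]] for i in range(n - 1)))

x = [0, 2, 1, 2]
print(np.isclose(theta @ mu(x), score(x)))   # True: S(x) = theta . mu_x
```

Note that $\mu_x$ has exactly $n + |E|$ ones: one per node indicator block and one per edge indicator block.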
MAP in Pairwise MRFs
• Integer Program

$$\max_{\mu} \; \theta^T \mu$$

Indicator variables: $\mu_i(s) \in \{0, 1\}$, $\quad \mu_{ij}(s, t) \in \{0, 1\}$

Unique label: $\sum_s \mu_i(s) = 1$, $\quad \sum_{s,t} \mu_{ij}(s, t) = 1$

Consistent assignments: $\sum_s \mu_{ij}(s, t) = \mu_j(t)$
MAP in Pairwise MRFs
• MAP Integer Program

$$\max_{\mu} \; \theta^T \mu \quad \text{s.t.} \quad A\mu = b, \quad \mu(\cdot) \in \{0, 1\}$$

[Figure: the feasible points are the vertices $\mu_{x_1}, \dots, \mu_{x_5}$ of the marginal polytope, one per integral assignment.]
MAP in Pairwise MRFs
• MAP Linear Program

$$\max_{\mu} \; \theta^T \mu \quad \text{s.t.} \quad A\mu = b, \quad \mu(\cdot) \in [0, 1]$$

[Figure: relaxing $\mu(\cdot) \in \{0,1\}$ to $[0,1]$ replaces the vertex set $\mu_{x_1}, \dots, \mu_{x_5}$ with a polytope containing them.]

The constraint matrix $A$ has $O(|E|)$ constraints, so the LP can be handed to off-the-shelf solvers (CPLEX, Mosek, etc.).
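As a sketch of handing the relaxation to an off-the-shelf solver, the following uses SciPy's `linprog` (a stand-in for CPLEX/Mosek) on a single-edge MRF with made-up scores; since a single edge is a tree, the LP optimum comes out integral:

```python
import numpy as np
from scipy.optimize import linprog

# LP relaxation of MAP for a 2-node, 2-label MRF (one tree edge), with
# hypothetical scores. Variable order:
# mu = [mu1(0), mu1(1), mu2(0), mu2(1), mu12(0,0), mu12(0,1), mu12(1,0), mu12(1,1)]
theta = np.array([1.0, 0.0,             # theta_1
                  0.0, 2.0,             # theta_2
                  0.0, 3.0, 0.0, 0.0])  # theta_12

A_eq = np.array([
    [1, 1, 0, 0,  0,  0,  0,  0],   # sum_s mu1(s) = 1
    [0, 0, 1, 1,  0,  0,  0,  0],   # sum_t mu2(t) = 1
    [1, 0, 0, 0, -1, -1,  0,  0],   # mu1(0) = sum_t mu12(0, t)
    [0, 1, 0, 0,  0,  0, -1, -1],   # mu1(1) = sum_t mu12(1, t)
    [0, 0, 1, 0, -1,  0, -1,  0],   # mu2(0) = sum_s mu12(s, 0)
    [0, 0, 0, 1,  0, -1,  0, -1],   # mu2(1) = sum_s mu12(s, 1)
])
b_eq = np.array([1, 1, 0, 0, 0, 0])

# linprog minimizes, so negate theta; bounds give mu(.) in [0, 1].
res = linprog(-theta, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
mu = res.x
# The LP is tight here: mu is integral with mu1 = (1, 0), mu2 = (0, 1),
# i.e. the MAP assignment (x1, x2) = (0, 1) with score 1 + 2 + 3 = 6.
print(-res.fun)   # 6.0
```

The same matrix-building pattern scales to any pairwise graph: one "unique label" row per node and $2k$ marginalization rows per edge.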
Plan for today
• MRF Inference
  – (Specialized) MAP Inference
    • Integer Programming Formulation
    • Linear Programming Relaxation
  – Understanding the LP better: when is it tight? When is it not?
• Dual Decomposition
  – An algorithm for solving this LP
MAP in Pairwise MRFs
• MAP Integer Program (recap)

$$\max_{\mu} \; \theta^T \mu \quad \text{s.t.} \quad A\mu = b, \quad \mu(\cdot) \in \{0, 1\}$$

[Figure: vertices $\mu_{x_1}, \dots, \mu_{x_5}$ of the marginal polytope.]
Marginal Polytope

[Figure: the marginal polytope, the convex hull of the indicator vectors $\mu_x$ of all integral assignments. Figure Credit: David Sontag]
MAP in Pairwise MRFs
• MAP Linear Program

$$\max_{\mu} \; \theta^T \mu \quad \text{s.t.} \quad A\mu = b, \quad \mu(\cdot) \in [0, 1]$$

• Properties
  – If the LP optimum is integral, the MAP assignment has been found
  – The LP is always integral (tight) for trees
  – Efficient message-passing schemes exist for solving this LP
LP Relaxation
• Block Coordinate / Sub-gradient Descent on the Dual

[Figure sequence: dual variables (messages) $\lambda_{ij \to j}$ and $\lambda_{ji \to i}$ live on each edge; each iteration updates $\lambda^{(t)}_{ij \to j}, \lambda^{(t)}_{ji \to i}$ to $\lambda^{(t+1)}_{ij \to j}, \lambda^{(t+1)}_{ji \to i}$.]

Distributed Message-Passing. Still inefficient!
Linear Programming Duality

[Figure illustrating LP duality. Figure Credit: David Sontag]
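For reference, the generic primal/dual LP pair behind this discussion (standard LP duality, stated in the MAP LP's maximization form; $\lambda$ collects the multipliers on $A\mu = b$):

```latex
\begin{aligned}
\text{Primal:} \quad & \max_{\mu \ge 0} \; \theta^T \mu \quad \text{s.t.} \; A\mu = b \\
\text{Dual:}   \quad & \min_{\lambda} \; b^T \lambda \quad \text{s.t.} \; A^T \lambda \ge \theta
\end{aligned}
```

Weak duality gives $\theta^T \mu \le b^T \lambda$ for any feasible pair; strong duality says the optima coincide, which is what licenses solving the dual instead of the primal.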
Dual Decomposition
• For MAP Inference
  – On board
MAP in Pairwise MRFs
• MAP Integer Program (recap)

$$\max_{\mu} \; \theta^T \mu \quad \text{s.t.} \quad A\mu = b, \quad \mu(\cdot) \in \{0, 1\}$$

[Figure: vertices $\mu_{x_1}, \dots, \mu_{x_5}$ of the marginal polytope.]
MAP LP
• Lagrangian Relaxation

Move the coupling constraints $A\mu = b$ into the objective with multipliers $\lambda$:

$$f(\lambda) = \max_{\mu \in C} \; \sum_i \theta_i \cdot \mu_i + \sum_{(i,j)} \theta_{ij} \cdot \mu_{ij} - \lambda \cdot (A\mu - b) \quad \text{s.t.} \; \mu_i(\cdot), \mu_{ij}(\cdot) \in \{0, 1\}$$

• Dual: $\min_{\lambda \ge 0} f(\lambda)$
  – Convex (non-smooth)
  – $f(\lambda)$ is an upper bound on the MAP score for every $\lambda$

• Subgradient Descent
[Figure: iterates $\lambda^{(0)}, \lambda^{(1)}, \dots$ move along subgradients $\nabla f(\lambda^{(0)}), \nabla f(\lambda^{(1)})$; the curve $f(\lambda)$ stays above the MAP score.]
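The board material can be sketched in code. A minimal dual-decomposition loop (hypothetical random scores on a 3-node chain): each node and each edge is maximized independently under $\lambda$-reparameterized scores, and a diminishing-step subgradient update pushes the subproblems toward agreement. By weak duality, every dual value upper-bounds the MAP score:

```python
import itertools

import numpy as np

# Dual decomposition for MAP on a small chain MRF (hypothetical scores).
# One subproblem per node and per edge; multipliers lam[(e, i)] shift score
# between node i and edge e so the subproblems are pushed to agree.
rng = np.random.default_rng(1)
n, k = 3, 2
edges = [(0, 1), (1, 2)]
node = rng.normal(size=(n, k))                      # theta_i(s)
edge = {e: rng.normal(size=(k, k)) for e in edges}  # theta_ij(s, t)

def total_score(x):
    return (sum(node[i, x[i]] for i in range(n)) +
            sum(edge[(i, j)][x[i], x[j]] for (i, j) in edges))

map_score = max(total_score(x) for x in itertools.product(range(k), repeat=n))

lam = {(e, i): np.zeros(k) for e in edges for i in e}

def dual(lam):
    """f(lambda): maximize each subproblem independently; the subgradient
    is (edge choice indicators) minus (node choice indicators)."""
    grad = {key: np.zeros(k) for key in lam}
    val, choice = 0.0, []
    for i in range(n):                               # node subproblems
        s = node[i] - sum(lam[(e, i)] for e in edges if i in e)
        si = int(s.argmax()); val += s[si]; choice.append(si)
    for (i, j) in edges:                             # edge subproblems
        s = edge[(i, j)] + lam[((i, j), i)][:, None] + lam[((i, j), j)][None, :]
        si, sj = np.unravel_index(int(s.argmax()), (k, k))
        val += s[si, sj]
        grad[((i, j), i)][si] += 1; grad[((i, j), i)][choice[i]] -= 1
        grad[((i, j), j)][sj] += 1; grad[((i, j), j)][choice[j]] -= 1
    return val, grad

best = np.inf
for t in range(300):                                 # subgradient descent on the dual
    val, grad = dual(lam)
    best = min(best, val)
    for key in lam:
        lam[key] -= (1.0 / (t + 1)) * grad[key]      # diminishing step size

# Weak duality: every dual value upper-bounds the MAP score; on a tree the
# gap closes as lambda converges.
print(best >= map_score - 1e-9)   # True
```

When all subproblems agree (the subgradient is zero), the decoded node labels are a certifiably optimal MAP assignment, since the dual bound is then attained.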