ECE 6504: Advanced Topics in Machine Learning
Probabilistic Graphical Models and Large-Scale Learning
Dhruv Batra, Virginia Tech
Topics
– Markov Random Fields: MAP Inference
– Integer Programming, LP Formulation
– Dual Decomposition

Readings: KF 13.1–13.5, Barber 5.1, 28.9
Administrativia
• HW1
  – Solutions and grades released
• HW2
  – Solutions released
  – Grades next week
• Project Presentations
  – When: April 22 & 24
  – Where: in class
  – Format: 5-min talk covering main results
  – The semester ends about 2 weeks after the presentations, so nearly finished results are expected
  – Slides due: April 21, 11:55pm
(C) Dhruv Batra 2
Recap of Last Time
MAP Inference
MAP Inference: Most Likely Assignment

[Figure: a pairwise MRF over variables $y_1, y_2, \dots, y_n$, e.g. image regions labeled Person, Table, Plate. Node scores ("local rewards") are $k \times 1$ vectors; edge scores ("distributed prior") are $k \times k$ tables.]

$$S(y) = \sum_{i \in V} \theta_i(y_i) + \sum_{(i,j) \in E} \theta_{ij}(y_i, y_j)$$

$$P(y) = \frac{1}{Z} e^{S(y)}$$

MAP inference: find the assignment $y$ with maximum $P(y)$, equivalently maximum $S(y)$.
MAP Inference
• Why is MAP difficult?
• What if we independently maximize the terms?
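To make the second question concrete, here is a minimal sketch (with made-up scores) of why independently maximizing the node terms can fail once edge scores enter:

```python
import itertools

import numpy as np

# A tiny 2-node, 2-label pairwise MRF (hypothetical scores, chosen to
# illustrate the point): independently maximizing each node score ignores
# the edge score and picks the wrong assignment.
theta_1 = np.array([10.0, 0.0])          # node scores theta_1(y1)
theta_2 = np.array([10.0, 0.0])          # node scores theta_2(y2)
theta_12 = np.array([[-100.0, 0.0],      # edge scores theta_12(y1, y2):
                     [0.0, 15.0]])       # strongly penalizes (0, 0)

def S(y1, y2):
    """Total MRF score S(y) = node terms + edge term."""
    return theta_1[y1] + theta_2[y2] + theta_12[y1, y2]

# Independent maximization: argmax of each node score on its own.
y_indep = (int(theta_1.argmax()), int(theta_2.argmax()))

# Exact MAP: brute force over all k^n = 4 assignments.
y_map = max(itertools.product(range(2), repeat=2), key=lambda y: S(*y))

print(y_indep, S(*y_indep))   # (0, 0) with score -80: edge penalty ignored
print(y_map, S(*y_map))       # (1, 1) with score 15: the true MAP
```

Independent maximization sees only the local rewards; the edge term can make its choice arbitrarily bad, which is exactly why MAP needs joint reasoning.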
MAP in Pairwise MRFs
• Over-Complete Representation

$G = (V, E)$, variables $X_1, \dots, X_n$, each taking one of $k$ labels. Node scores $\theta_i(\cdot)$ are $k \times 1$ vectors; edge scores $\theta_{ij}(\cdot, \cdot)$ are $k \times k$ tables. Stack them all into one parameter vector:

$$\theta = [\, \theta_1(1) \dots \theta_1(k) \;|\; \dots \;|\; \theta_n(1) \dots \theta_n(k) \;|\; \theta_{12}(1,1) \dots \theta_{12}(k,k) \;|\; \dots \;|\; \theta_{n-1,n}(1,1) \dots \theta_{n-1,n}(k,k) \,]$$

Encode an assignment with node indicator vectors $\mu_i \in \{0,1\}^k$, where $\mu_i(s) = 1$ iff $x_i = s$:

$$x_i = 1 \;\leftrightarrow\; \mu_i = [1\; 0\; \dots\; 0]^T, \qquad x_i = 2 \;\leftrightarrow\; \mu_i = [0\; 1\; 0\; \dots\; 0]^T, \;\dots$$
MAP in Pairwise MRFs
• Over-Complete Representation (continued)

Likewise, encode each edge with an indicator vector $\mu_{ij} \in \{0,1\}^{k^2}$, where $\mu_{ij}(s, t) = 1$ iff $x_i = s$ and $x_j = t$:

$$x_i = 1, x_j = 1 \;\leftrightarrow\; \mu_{ij} = [1\; 0\; 0\; \dots\; 0]^T, \qquad x_i = 1, x_j = 2 \;\leftrightarrow\; \mu_{ij} = [0\; 1\; 0\; \dots\; 0]^T, \;\dots$$

Stacking all node and edge indicators gives

$$\mu_x = [\, \mu_1(1) \dots \mu_1(k) \;|\; \dots \;|\; \mu_n(1) \dots \mu_n(k) \;|\; \mu_{12}(1,1) \dots \mu_{12}(k,k) \;|\; \dots \;|\; \mu_{n-1,n}(1,1) \dots \mu_{n-1,n}(k,k) \,]$$

so the assignment score becomes a dot product:

$$S(x) = \theta \cdot \mu_x$$
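The over-complete encoding is easy to verify numerically. A minimal sketch (random scores on a small chain; all names hypothetical) checking that the dot product $\theta \cdot \mu_x$ matches the direct score:

```python
import numpy as np

# Numerical check of the over-complete representation on a small chain MRF
# (all scores random; names hypothetical).
rng = np.random.default_rng(0)
n, k = 4, 3
node_scores = rng.normal(size=(n, k))          # theta_i(s)
edge_scores = rng.normal(size=(n - 1, k, k))   # theta_{i,i+1}(s, t) on chain edges

# theta stacks node scores first, then edge scores, matching mu's layout below.
theta = np.concatenate([node_scores.ravel(), edge_scores.ravel()])

def mu(x):
    """Indicator vector mu_x: one-hot mu_i per node, one-hot mu_ij per edge."""
    parts = [np.eye(k)[xi] for xi in x]                         # node indicators
    parts += [np.outer(np.eye(k)[x[i]], np.eye(k)[x[i + 1]]).ravel()
              for i in range(n - 1)]                            # edge indicators
    return np.concatenate(parts)

def score(x):
    """Direct evaluation of S(x)."""
    return (sum(node_scores[i, x[i]] for i in range(n)) +
            sum(edge_scores[i, x[i], x[i + 1]] for i in range(n - 1)))

x = [0, 2, 1, 2]
print(np.isclose(theta @ mu(x), score(x)))   # True: S(x) = theta . mu_x
```

Note that $\mu_x$ has exactly $n + |E|$ ones: one per node indicator block and one per edge indicator block.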
MAP in Pairwise MRFs
• Integer Program

$$\max_{\mu} \; \theta^T \mu$$

Indicator variables: $\mu_i(s) \in \{0, 1\}$, $\quad \mu_{ij}(s, t) \in \{0, 1\}$

Unique label: $\sum_s \mu_i(s) = 1$, $\quad \sum_{s,t} \mu_{ij}(s, t) = 1$

Consistent assignments: $\sum_s \mu_{ij}(s, t) = \mu_j(t)$
MAP in Pairwise MRFs
• MAP Integer Program

$$\max_{\mu} \; \theta^T \mu \quad \text{s.t.} \quad A\mu = b, \quad \mu(\cdot) \in \{0, 1\}$$

[Figure: the feasible points are the vertices $\mu_{x_1}, \dots, \mu_{x_5}$ of the marginal polytope, one per integral assignment.]
MAP in Pairwise MRFs
• MAP Linear Program

$$\max_{\mu} \; \theta^T \mu \quad \text{s.t.} \quad A\mu = b, \quad \mu(\cdot) \in [0, 1]$$

[Figure: relaxing $\mu(\cdot) \in \{0,1\}$ to $[0,1]$ replaces the vertex set $\mu_{x_1}, \dots, \mu_{x_5}$ with a polytope containing them.]

The constraint matrix $A$ has $O(|E|)$ constraints, so the LP can be handed to off-the-shelf solvers (CPLEX, Mosek, etc.).
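As a sketch of handing the relaxation to an off-the-shelf solver, the following uses SciPy's `linprog` (a stand-in for CPLEX/Mosek) on a single-edge MRF with made-up scores; since a single edge is a tree, the LP optimum comes out integral:

```python
import numpy as np
from scipy.optimize import linprog

# LP relaxation of MAP for a 2-node, 2-label MRF (one tree edge), with
# hypothetical scores. Variable order:
# mu = [mu1(0), mu1(1), mu2(0), mu2(1), mu12(0,0), mu12(0,1), mu12(1,0), mu12(1,1)]
theta = np.array([1.0, 0.0,             # theta_1
                  0.0, 2.0,             # theta_2
                  0.0, 3.0, 0.0, 0.0])  # theta_12

A_eq = np.array([
    [1, 1, 0, 0,  0,  0,  0,  0],   # sum_s mu1(s) = 1
    [0, 0, 1, 1,  0,  0,  0,  0],   # sum_t mu2(t) = 1
    [1, 0, 0, 0, -1, -1,  0,  0],   # mu1(0) = sum_t mu12(0, t)
    [0, 1, 0, 0,  0,  0, -1, -1],   # mu1(1) = sum_t mu12(1, t)
    [0, 0, 1, 0, -1,  0, -1,  0],   # mu2(0) = sum_s mu12(s, 0)
    [0, 0, 0, 1,  0, -1,  0, -1],   # mu2(1) = sum_s mu12(s, 1)
])
b_eq = np.array([1, 1, 0, 0, 0, 0])

# linprog minimizes, so negate theta; bounds give mu(.) in [0, 1].
res = linprog(-theta, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
mu = res.x
# The LP is tight here: mu is integral with mu1 = (1, 0), mu2 = (0, 1),
# i.e. the MAP assignment (x1, x2) = (0, 1) with score 1 + 2 + 3 = 6.
print(-res.fun)   # 6.0
```

The same matrix-building pattern scales to any pairwise graph: one "unique label" row per node and $2k$ marginalization rows per edge.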
Plan for today
• MRF Inference
  – (Specialized) MAP Inference
    • Integer Programming Formulation
    • Linear Programming Relaxation
  – Understanding the LP better: when is it tight? When is it not?
• Dual Decomposition
  – An algorithm for solving this LP
MAP in Pairwise MRFs
• MAP Integer Program (recap)

$$\max_{\mu} \; \theta^T \mu \quad \text{s.t.} \quad A\mu = b, \quad \mu(\cdot) \in \{0, 1\}$$

[Figure: vertices $\mu_{x_1}, \dots, \mu_{x_5}$ of the marginal polytope.]
Marginal Polytope

[Figure: the marginal polytope, the convex hull of the indicator vectors $\mu_x$ of all integral assignments. Figure Credit: David Sontag]
MAP in Pairwise MRFs
• MAP Linear Program

$$\max_{\mu} \; \theta^T \mu \quad \text{s.t.} \quad A\mu = b, \quad \mu(\cdot) \in [0, 1]$$

• Properties
  – If the LP optimum is integral, the MAP assignment has been found
  – The LP is always integral (tight) for trees
  – Efficient message-passing schemes exist for solving this LP
LP Relaxation
• Block Coordinate / Sub-gradient Descent on the Dual

[Figure sequence: dual variables (messages) $\lambda_{ij \to j}$ and $\lambda_{ji \to i}$ live on each edge; each iteration updates $\lambda^{(t)}_{ij \to j}, \lambda^{(t)}_{ji \to i}$ to $\lambda^{(t+1)}_{ij \to j}, \lambda^{(t+1)}_{ji \to i}$.]

Distributed Message-Passing. Still inefficient!
Linear Programming Duality

[Figure illustrating LP duality. Figure Credit: David Sontag]
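For reference, the generic primal/dual LP pair behind this discussion (standard LP duality, stated in the MAP LP's maximization form; $\lambda$ collects the multipliers on $A\mu = b$):

```latex
\begin{aligned}
\text{Primal:} \quad & \max_{\mu \ge 0} \; \theta^T \mu \quad \text{s.t.} \; A\mu = b \\
\text{Dual:}   \quad & \min_{\lambda} \; b^T \lambda \quad \text{s.t.} \; A^T \lambda \ge \theta
\end{aligned}
```

Weak duality gives $\theta^T \mu \le b^T \lambda$ for any feasible pair; strong duality says the optima coincide, which is what licenses solving the dual instead of the primal.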
Dual Decomposition
• For MAP Inference
  – On board
MAP in Pairwise MRFs
• MAP Integer Program (recap)

$$\max_{\mu} \; \theta^T \mu \quad \text{s.t.} \quad A\mu = b, \quad \mu(\cdot) \in \{0, 1\}$$

[Figure: vertices $\mu_{x_1}, \dots, \mu_{x_5}$ of the marginal polytope.]
MAP LP
• Lagrangian Relaxation

Move the coupling constraints $A\mu = b$ into the objective with multipliers $\lambda$:

$$f(\lambda) = \max_{\mu \in C} \; \sum_i \theta_i \cdot \mu_i + \sum_{(i,j)} \theta_{ij} \cdot \mu_{ij} - \lambda \cdot (A\mu - b) \quad \text{s.t.} \; \mu_i(\cdot), \mu_{ij}(\cdot) \in \{0, 1\}$$

• Dual: $\min_{\lambda \ge 0} f(\lambda)$
  – Convex (non-smooth)
  – $f(\lambda)$ is an upper bound on the MAP score for every $\lambda$

• Subgradient Descent
[Figure: iterates $\lambda^{(0)}, \lambda^{(1)}, \dots$ move along subgradients $\nabla f(\lambda^{(0)}), \nabla f(\lambda^{(1)})$; the curve $f(\lambda)$ stays above the MAP score.]
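The board material can be sketched in code. A minimal dual-decomposition loop (hypothetical random scores on a 3-node chain): each node and each edge is maximized independently under $\lambda$-reparameterized scores, and a diminishing-step subgradient update pushes the subproblems toward agreement. By weak duality, every dual value upper-bounds the MAP score:

```python
import itertools

import numpy as np

# Dual decomposition for MAP on a small chain MRF (hypothetical scores).
# One subproblem per node and per edge; multipliers lam[(e, i)] shift score
# between node i and edge e so the subproblems are pushed to agree.
rng = np.random.default_rng(1)
n, k = 3, 2
edges = [(0, 1), (1, 2)]
node = rng.normal(size=(n, k))                      # theta_i(s)
edge = {e: rng.normal(size=(k, k)) for e in edges}  # theta_ij(s, t)

def total_score(x):
    return (sum(node[i, x[i]] for i in range(n)) +
            sum(edge[(i, j)][x[i], x[j]] for (i, j) in edges))

map_score = max(total_score(x) for x in itertools.product(range(k), repeat=n))

lam = {(e, i): np.zeros(k) for e in edges for i in e}

def dual(lam):
    """f(lambda): maximize each subproblem independently; the subgradient
    is (edge choice indicators) minus (node choice indicators)."""
    grad = {key: np.zeros(k) for key in lam}
    val, choice = 0.0, []
    for i in range(n):                               # node subproblems
        s = node[i] - sum(lam[(e, i)] for e in edges if i in e)
        si = int(s.argmax()); val += s[si]; choice.append(si)
    for (i, j) in edges:                             # edge subproblems
        s = edge[(i, j)] + lam[((i, j), i)][:, None] + lam[((i, j), j)][None, :]
        si, sj = np.unravel_index(int(s.argmax()), (k, k))
        val += s[si, sj]
        grad[((i, j), i)][si] += 1; grad[((i, j), i)][choice[i]] -= 1
        grad[((i, j), j)][sj] += 1; grad[((i, j), j)][choice[j]] -= 1
    return val, grad

best = np.inf
for t in range(300):                                 # subgradient descent on the dual
    val, grad = dual(lam)
    best = min(best, val)
    for key in lam:
        lam[key] -= (1.0 / (t + 1)) * grad[key]      # diminishing step size

# Weak duality: every dual value upper-bounds the MAP score; on a tree the
# gap closes as lambda converges.
print(best >= map_score - 1e-9)   # True
```

When all subproblems agree (the subgradient is zero), the decoded node labels are a certifiably optimal MAP assignment, since the dual bound is then attained.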