
Free Energy Approximation

Solmaz Torabi

Dept. of Electrical and Computer Engineering
Drexel University
[email protected]

Advisor: Dr. John M. Walsh

June 19, 2014


References

[1] M. Opper and D. Saad, "Advanced Mean Field Methods: Theory and Practice," MIT Press, 2001.

[2] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Constructing free-energy approximations and generalized belief propagation algorithms," IEEE Transactions on Information Theory, vol. 51, 2005.

[3] M. Welling and Y. W. Teh, "Approximate inference in Boltzmann machines," Artificial Intelligence, vol. 143, pp. 19–50, 2003.

[4] A. Montanari, "Lecture notes, inference in graphical models," 2011.


Outline

• Basics of graphical models

• Basics of the message passing algorithm

• Variational free energy

• Mean field approximation

• TAP (Thouless, Anderson and Palmer)

• Region-based approximation

• Bethe free energy

• Kikuchi approximation


Undirected graphical model, Markov random field

Undirected graphical model with random vector $X = (X_1, \dots, X_n)$

• Given an undirected graph $G = (V, E)$, each node $s$ has an associated random variable $X_s$.

• A clique $C \subseteq V$ is a fully connected subset of $V$.

• The distribution $p$ factorizes according to $G$ if it can be expressed as a product over cliques:

$$p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C)$$

For example,

$$p(x) = \frac{1}{Z}\, \psi_1(x_1, x_2, x_3)\, \psi_2(x_3, x_4, x_5)\, \psi_3(x_4, x_5, x_6)\, \psi_4(x_4, x_7)$$


Graphical model, factor graph

• A factor graph is a bipartite graph $G = (V, F, E)$, where $V$ is the original set of vertices, $F$ is the set of factors, and $(s, a) \in E$ if $x_s$ participates in the factor indexed by $a \in F$.

• We assume that the functions $f_a(x_a)$ are non-negative and finite.

$$P(x) = \frac{1}{Z} \prod_{a} f_a(x_a)$$

$$P(x) = \frac{1}{Z}\, f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, f_C(x_4)$$
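To make the role of the normalizer $Z$ concrete, the following sketch enumerates all states of the four-variable example and normalizes the factor product. The factor tables and the restriction to binary variables are illustrative assumptions, not values from the slides.

```python
import itertools

# Hypothetical non-negative factor tables over binary variables (illustrative only).
def f_A(x1, x2):
    return 2.0 if x1 == x2 else 1.0

def f_B(x2, x3, x4):
    return 1.0 + x2 + 2 * x3 * x4

def f_C(x4):
    return 3.0 if x4 == 1 else 1.0

def unnormalized(x):
    """Product of the factors for one joint configuration x = (x1, x2, x3, x4)."""
    x1, x2, x3, x4 = x
    return f_A(x1, x2) * f_B(x2, x3, x4) * f_C(x4)

states = list(itertools.product([0, 1], repeat=4))

# Partition function: sum of the factor product over all joint states.
Z = sum(unnormalized(x) for x in states)

# Normalized joint distribution P(x) = (1/Z) f_A f_B f_C.
P = {x: unnormalized(x) / Z for x in states}

# Marginal of x1, obtained by summing out the other variables.
p_x1 = [sum(p for x, p in P.items() if x[0] == v) for v in (0, 1)]
```

Enumeration costs $2^4$ terms here and grows exponentially with the number of variables, which is the motivation for the approximations that follow.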


Graphical model: undirected graph, factor graph

• Maximal cliques: $\mathcal{C} = \{\{1, 2, 3, 4\}, \{4, 5, 6\}, \{6, 7\}\}$

• Vertex set $V = \{1, \dots, 7\}$, factor set $F = \{a, b, c\}$

$$P(x) = \frac{1}{Z}\, f_a(x_1, x_2, x_3, x_4)\, f_b(x_4, x_5, x_6)\, f_c(x_6, x_7)$$


Pairwise graphical model

• A commonly encountered subclass of Markov networks:
  • Ising model, Boltzmann machines
  • Computer vision

$$P(x_1, x_2, \dots, x_N) = \frac{1}{Z} \prod_{(ij)} \psi_{ij}(x_i, x_j) \prod_i \psi_i(x_i)$$

where $\psi_{ij}(x_i, x_j)$ is the compatibility function and $\psi_i(x_i)$ is the evidence at node $i$:

• $\psi_i : \mathcal{X} \to \mathbb{R}_+$ for each $i \in V$
• $\psi_{ij} : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$ for each $(i, j) \in E$


Boltzmann distribution

• Physicists work with the class of distributions $P$ known as the Boltzmann distribution (Gibbs distribution):

$$P(X) = \frac{e^{-H[X]}}{Z}$$

• $H[X]$ is the energy of state $X$.

• $Z = \sum_X e^{-H[X]}$ is the normalizing partition function.

• Pairwise Markov random field:

$$P(X) = \frac{1}{Z} \prod_{(ij)} \psi_{ij}(x_i, x_j) \prod_i \psi_i(x_i) = \frac{e^{-H[X]}}{Z}$$

so the energy is

$$H[X] = -\sum_{(ij)} \ln \psi_{ij}(x_i, x_j) - \sum_i \ln \psi_i(x_i)$$


Ising model

• An example of a pairwise model, with $\psi_{ij}(x_i, x_j) = \exp\{J_{ij} x_i x_j\}$ and $\psi_i(x_i) = \exp\{\theta_i x_i\}$.

• It is a mathematical model of ferromagnetism in statistical mechanics.

• $x_i \in \{+1, -1\}$ represents the magnetic dipole moment of an atomic spin; any two adjacent sites $i, j$ have an interaction $J_{ij}$.

• Each site $i$ has an external magnetic field $\theta_i$.

• The energy of each configuration is

$$H(X) = -\sum_{(ij)} J_{ij} x_i x_j - \sum_i \theta_i x_i$$

• The configuration probability is

$$P(X) = \frac{e^{-H(X)}}{Z} = \frac{1}{Z} \exp\Big(\sum_{(ij)} J_{ij} x_i x_j + \sum_i \theta_i x_i\Big)$$
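As a small illustration of the energy and configuration probability above, the sketch below evaluates $H(X)$ and $P(X) = e^{-H(X)}/Z$ on a three-spin chain. The couplings `J` and fields `theta` are made-up values chosen only for the example.

```python
import itertools
import math

# Hypothetical 3-spin chain: each edge is listed once; values are illustrative.
J = {(0, 1): 0.5, (1, 2): -0.3}
theta = [0.1, 0.0, -0.2]

def energy(x):
    """H(X) = -sum_(ij) J_ij x_i x_j - sum_i theta_i x_i for spins x_i in {-1, +1}."""
    pair = sum(Jij * x[i] * x[j] for (i, j), Jij in J.items())
    field = sum(t * xi for t, xi in zip(theta, x))
    return -pair - field

states = list(itertools.product([-1, 1], repeat=3))

# Partition function and Boltzmann probabilities by exhaustive enumeration.
Z = sum(math.exp(-energy(x)) for x in states)
prob = {x: math.exp(-energy(x)) / Z for x in states}
```

Low-energy configurations receive high probability, which is why aligned spins are favoured on an edge with $J_{ij} > 0$.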


Inference tasks

• Computing the marginal distribution $p(x_A)$ over a particular subset $A \subset V$ of nodes.

• Computing the conditional distribution $P(x_A \mid x_B)$.

• Computing the most probable configuration (MAP):

$$\hat{x} = \arg\max_{x \in \mathcal{X}^m} P(x)$$
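The three tasks can be written down directly when the joint table is small enough to enumerate. The sketch below does so for a hypothetical three-variable model; the weight function and the helper names `marginal` and `conditional` are assumptions introduced for illustration.

```python
import itertools
import math

# Hypothetical unnormalized weight on three +/-1 variables (illustrative values).
def weight(x):
    return math.exp(0.8 * x[0] * x[1] - 0.5 * x[1] * x[2] + 0.3 * x[0])

states = list(itertools.product([-1, 1], repeat=3))
Z = sum(weight(x) for x in states)
P = {x: weight(x) / Z for x in states}

def marginal(P, A):
    """p(x_A): sum the joint over all variables outside the index tuple A."""
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in A)
        out[key] = out.get(key, 0.0) + p
    return out

def conditional(P, A, B, x_B):
    """P(x_A | x_B) = P(x_A, x_B) / P(x_B) for a fixed assignment x_B."""
    joint = marginal(P, A + B)
    p_B = marginal(P, B)[x_B]
    return {k[:len(A)]: v / p_B for k, v in joint.items() if k[len(A):] == x_B}

p_0 = marginal(P, (0,))                           # marginal of the first variable
p_0_given_2 = conditional(P, (0,), (2,), (1,))    # P(x_0 | x_2 = +1)
x_map = max(P, key=P.get)                         # most probable configuration
```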


Belief propagation

• BP is a method for computing marginal probability functions.

• The computed marginal probability is exact if the factor graph has no cycles.

$$m_{i \to a}(x_i) = \prod_{c \in N(i) \setminus a} m_{c \to i}(x_i)$$

$$m_{a \to i}(x_i) = \sum_{x_a \setminus x_i} f_a(x_a) \prod_{j \in N(a) \setminus i} m_{j \to a}(x_j)$$

• $i, j$ are used as general indices over variables, $a, c$ over factors.


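Belief propagation

If this iteration converges, the marginals are approximated by

$$b_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i)$$

$$b_a(x_a) \propto f_a(x_a) \prod_{i \in N(a)} m_{i \to a}(x_i)$$

• In general, loopy BP (LBP) may not converge.
  • If it does, $b_i(x_i)$ may not be close to the true marginal $P(x_i)$.

• The set of pseudomarginals $b$ may not be realizable, i.e. there may be no joint distribution that has them as its marginals.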


Write down the energy function

Construct an approximation

Find the stationary condition


Variational free energy

• A variational method approximates an intractable distribution $P(X)$ of random variables $X = (S_1, \dots, S_N)$ by a tractable distribution $Q(X)$.

• $Q$ is chosen to minimize a distance-like measure, here the Kullback-Leibler divergence:

$$KL(Q \| P) = \sum_X Q(X) \ln \frac{Q(X)}{P(X)} = \Big\langle \ln \frac{Q}{P} \Big\rangle_Q$$

where $\langle \cdot \rangle_Q$ denotes the expectation with respect to $Q$.


Variational free energy

To find the best approximation to $P = \frac{e^{-H(X)}}{Z}$, write

$$KL(Q \| P) = \ln Z + E[Q] - S[Q]$$

where

• $S[Q] = -\sum_X Q(X) \ln Q(X)$ is the entropy of $Q$

• $E[Q] = \sum_X Q(X) H[X]$ is called the average energy

$$\Longrightarrow \quad \min_Q KL(Q \| P) = \ln Z + \min_Q \underbrace{\big(E[Q] - S[Q]\big)}_{\text{variational free energy}}$$
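The decomposition $KL(Q\|P) = \ln Z + E[Q] - S[Q]$ is easy to confirm numerically. The sketch below does this on a two-spin model; the energy function and the trial distribution `Q` are arbitrary illustrative choices.

```python
import itertools
import math

# Hypothetical energy on two +/-1 spins (illustrative values).
def H(x):
    return -(0.7 * x[0] * x[1] + 0.2 * x[0] - 0.4 * x[1])

states = list(itertools.product([-1, 1], repeat=2))
Z = sum(math.exp(-H(x)) for x in states)
P = {x: math.exp(-H(x)) / Z for x in states}

# Arbitrary strictly positive trial distribution Q (sums to one).
Q = dict(zip(states, [0.1, 0.2, 0.3, 0.4]))

kl = sum(q * math.log(q / P[x]) for x, q in Q.items())   # KL(Q || P)
E = sum(q * H(x) for x, q in Q.items())                  # average energy E[Q]
S = -sum(q * math.log(q) for q in Q.values())            # entropy S[Q]
free_energy = E - S                                      # variational free energy
```

Because $\ln Z$ does not depend on $Q$, minimizing `free_energy` over a family of distributions is the same as minimizing the KL divergence over that family.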


Variational free energy for Ising model

• The model under consideration is a Boltzmann machine.

$$P(X) = \frac{e^{-H(X)}}{Z} = \frac{1}{Z} \exp\Big(\sum_{(ij)} J_{ij} x_i x_j + \sum_i \theta_i x_i\Big)$$

• For binary variables it is convenient to reparametrize the marginals as follows:

$$p_i(x_i = 1) = \frac{1 + m_i}{2}$$


Mean Field approximation

Find a factorized distribution that best describes the true distribution.

• For binary variables the most general factorized distribution has the form

$$Q_{MF}(x) = \prod_i Q_i(x_i) = \prod_i \frac{1 + x_i m_i}{2}$$

• $KL(Q_{MF} \| P) = E(Q_{MF}) - S(Q_{MF}) + \log Z$

• $E(Q_{MF}) = \sum_x Q_{MF}(x) H(x) = -\sum_{(ij)} J_{ij} m_i m_j - \sum_i \theta_i m_i$

• $S(Q_{MF}) = -\sum_x Q_{MF}(x) \ln Q_{MF}(x) = -\sum_i \Big( \frac{1 + m_i}{2} \ln \frac{1 + m_i}{2} + \frac{1 - m_i}{2} \ln \frac{1 - m_i}{2} \Big)$


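Mean Field approximation

How to solve?

$$\min_{m_i} KL(Q_{MF} \| P)$$

• By taking the derivative with respect to $m_i$:

$$\frac{\partial}{\partial m_i} \Big\{ -\sum_{(ij)} J_{ij} m_i m_j - \sum_i \theta_i m_i + \sum_i \Big( \frac{1 + m_i}{2} \ln \frac{1 + m_i}{2} + \frac{1 - m_i}{2} \ln \frac{1 - m_i}{2} \Big) + \log Z \Big\}$$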


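Mean Field fixed points

$$\frac{\partial KL}{\partial m_i} = -\sum_{j \in N(i)} J_{ij} m_j - \theta_i + \frac{1}{2} \ln \frac{1 + m_i}{1 - m_i}$$

• Setting the derivative to zero gives the fixed points of the MF approximation:

$$m_i = \frac{\exp\big(\sum_j J_{ij} m_j + \theta_i\big) - \exp\big(-\sum_j J_{ij} m_j - \theta_i\big)}{\exp\big(\sum_j J_{ij} m_j + \theta_i\big) + \exp\big(-\sum_j J_{ij} m_j - \theta_i\big)}$$

$$\Rightarrow \quad m_i = \tanh\Big(\sum_j J_{ij} m_j + \theta_i\Big), \quad i = 1, \dots, N$$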


Mean Field

$$m_i = \tanh\Big(\sum_j J_{ij} m_j + \theta_i\Big), \quad i = 1, \dots, N \qquad (1)$$

Note

• The intractable task of computing marginals has been replaced by the problem of solving a set of nonlinear equations.

• These MF equations are run sequentially, i.e. we fix all $m_j$ except $m_i$.

• At each step the MF free energy is convex in $m_i$, and Equation (1) finds its minimum in one step.

• This procedure can be interpreted as coordinate descent in the $m_i$.

• Alternatively, all parameters $m_i$ can be updated in parallel.
  • This does not guarantee a decrease of the cost function at each iteration.

• There might be many solutions to (1).
  • Some of the solutions may not be local minima.
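The sequential and parallel schedules for Equation (1) can be sketched as follows. The ring of four spins, its couplings, and its fields are hypothetical; they are chosen weak enough that both schedules reach the same fixed point, which need not happen in general.

```python
import math

# Hypothetical symmetric couplings on a ring of 4 spins, plus small fields.
N = 4
J = [[0.0] * N for _ in range(N)]
for i in range(N):
    j = (i + 1) % N
    J[i][j] = J[j][i] = 0.3
theta = [0.2, -0.1, 0.05, 0.0]

def local_field(m, i):
    """Argument of tanh in Equation (1): sum_j J_ij m_j + theta_i."""
    return sum(J[i][j] * m[j] for j in range(N)) + theta[i]

def mf_sequential(m, sweeps=200):
    """Coordinate descent: update one m_i at a time, reusing fresh neighbours."""
    m = list(m)
    for _ in range(sweeps):
        for i in range(N):
            m[i] = math.tanh(local_field(m, i))
    return m

def mf_parallel(m, sweeps=200):
    """Update all m_i simultaneously from the previous iterate."""
    m = list(m)
    for _ in range(sweeps):
        m = [math.tanh(local_field(m, i)) for i in range(N)]
    return m

m_seq = mf_sequential([0.0] * N)
m_par = mf_parallel([0.0] * N)

# Largest violation of Equation (1) at the sequential solution.
residual = max(abs(m_seq[i] - math.tanh(local_field(m_seq, i))) for i in range(N))
```

With stronger couplings the parallel schedule can oscillate, which is one reason the sequential schedule is the safer default.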


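Mean Field

• In the $d$-dimensional Ising model without external magnetic field ($\theta = 0$) and with the same interaction $J_{ij} = \alpha$ on every edge:

$$m^{(t+1)} = \tanh\big(2 d \alpha\, m^{(t)}\big)$$

• For $\alpha < \frac{1}{2d}$, the iteration converges to $\lim_{t \to \infty} m^{(t)} = 0$ (left figure).

• For $\alpha > \frac{1}{2d}$, if $m^{(0)} \lessgtr 0 \Rightarrow \lim_{t \to \infty} m^{(t)} = \mp m^*$, where $m^* > 0$ solves $m = \tanh(2 d \alpha\, m)$.

[4] A. Montanari, Lecture notes for inference in graphical models, 2011.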
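The two regimes can be reproduced by iterating the scalar recursion directly. In the sketch below the dimension `d = 2` and the values of `alpha` are illustrative choices on either side of the threshold $1/(2d) = 0.25$.

```python
import math

def iterate(m0, d, alpha, steps=2000):
    """Run m <- tanh(2 d alpha m) for a fixed number of steps from m0."""
    m = m0
    for _ in range(steps):
        m = math.tanh(2 * d * alpha * m)
    return m

d = 2  # threshold is 1 / (2 d) = 0.25

m_high_T = iterate(0.5, d, alpha=0.2)   # alpha below threshold: decays to 0
m_pos = iterate(0.1, d, alpha=0.3)      # alpha above threshold, positive start
m_neg = iterate(-0.1, d, alpha=0.3)     # alpha above threshold, negative start
```

Above the threshold the sign of the starting point selects which of the two symmetric nonzero solutions is reached.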


Mean Field

• MF neglects the dependencies between the random variables.

However,

• We get an upper bound on the exact free energy.

$$KL(Q_{MF} \| P) = \underbrace{E(Q_{MF}) - S(Q_{MF})}_{= F[Q_{MF}],\ \text{variational MF free energy}} - \underbrace{(-\log Z)}_{\text{exact free energy}}$$

Since $KL(Q_{MF} \| P) \ge 0$,

$$F[Q_{MF}] \ge -\log Z$$
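The bound $F[Q_{MF}] \ge -\log Z$ holds for every product distribution, not only at the MF fixed point. The sketch below checks it for a few magnetization vectors on a hypothetical three-spin model; all numerical values are illustrative.

```python
import itertools
import math

# Hypothetical 3-spin model; J lists each edge once.
J = {(0, 1): 0.4, (1, 2): 0.4, (0, 2): -0.2}
theta = [0.3, 0.0, -0.1]
N = 3

def H(x):
    """Ising energy of a configuration x in {-1, +1}^N."""
    pair = sum(w * x[i] * x[j] for (i, j), w in J.items())
    return -pair - sum(t * v for t, v in zip(theta, x))

def mf_free_energy(m):
    """F[Q_MF] = E(Q_MF) - S(Q_MF) for magnetizations m_i in (-1, 1)."""
    energy = -sum(w * m[i] * m[j] for (i, j), w in J.items())
    energy -= sum(t * v for t, v in zip(theta, m))
    neg_entropy = sum(
        (1 + v) / 2 * math.log((1 + v) / 2) + (1 - v) / 2 * math.log((1 - v) / 2)
        for v in m
    )
    return energy + neg_entropy

# Exact free energy -log Z by enumeration.
Z = sum(math.exp(-H(x)) for x in itertools.product([-1, 1], repeat=N))
exact_free_energy = -math.log(Z)

# A few arbitrary product distributions, parametrized by their magnetizations.
trial_ms = [[0.0, 0.0, 0.0], [0.3, 0.1, -0.1], [0.9, -0.9, 0.5]]
gaps = [mf_free_energy(m) - exact_free_energy for m in trial_ms]
```

Each entry of `gaps` equals $KL(Q_{MF}\|P)$ for the corresponding trial distribution, so smaller gaps mean better approximations.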


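Mean Field Method in general

• $P(x) = \frac{1}{Z} \prod_{a \in F} f_a(x_a)$ is the true distribution.

• $Q(x) = \prod_i q_i(x_i)$ is the approximate distribution.

With energy $H(x) = -\sum_{a \in F} \log f_a(x_a)$, the free energy $E - S$ becomes

$$F_{MF}(Q) = -\sum_i S(q_i) - \sum_{a \in F} \sum_{x_a} \Big[ \prod_{i \in N(a)} q_i(x_i) \Big] \log f_a(x_a)$$

• The number of free parameters drops from $|\mathcal{X}|^n - 1$ to $n(|\mathcal{X}| - 1)$.

• $F_{MF}$ is no longer convex.

$$\min_Q F_{MF}(Q) \quad \text{subject to} \quad \sum_{x_i} q_i(x_i) = 1$$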


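Mean Field Method in general

• Add Lagrange multipliers $\lambda_i$ for the normalization constraints.

• Find the stationary condition from $\frac{\partial L(Q, \lambda)}{\partial q_i(x_i)} = 0$:

$$q_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i)$$

where

$$m_{a \to i}(x_i) = \exp\Big( \sum_{x_j :\, j \in N(a) \setminus i} \log f_a(x_a) \prod_{j \in N(a) \setminus i} q_j(x_j) \Big)$$

• A simple greedy algorithm for finding a stationary point consists of updating the $q_i$ by iterating the above equations until convergence.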
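The greedy iteration can be sketched for a small discrete factor graph as below. The three factors, their tables, and the helper names `message` and `update` are hypothetical, and the factors are taken strictly positive so that $\log f_a$ is finite.

```python
import itertools
import math

# Hypothetical factor graph on binary variables 0..2, given as (scope, table).
factors = [
    ((0, 1), lambda x0, x1: 2.0 if x0 == x1 else 1.0),
    ((1, 2), lambda x1, x2: 1.0 if x1 == x2 else 3.0),
    ((2,), lambda x2: 1.5 if x2 == 1 else 1.0),
]
n, values = 3, (0, 1)

def message(scope, f, i, xi, q):
    """m_{a->i}(x_i): exp of the expected log f_a under the other q_j."""
    others = [j for j in scope if j != i]
    total = 0.0
    for assignment in itertools.product(values, repeat=len(others)):
        x = dict(zip(others, assignment))
        x[i] = xi
        weight = math.prod(q[j][x[j]] for j in others)
        total += weight * math.log(f(*(x[j] for j in scope)))
    return math.exp(total)

def update(q, i):
    """q_i(x_i) proportional to the product of incoming messages, normalized."""
    unnorm = [
        math.prod(message(s, f, i, xi, q) for s, f in factors if i in s)
        for xi in values
    ]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Start from uniform marginals and sweep over the variables repeatedly.
q = [[0.5, 0.5] for _ in range(n)]
for _ in range(100):
    for i in range(n):
        q[i] = update(q, i)
```

The fixed number of sweeps stands in for a proper convergence test, which a practical implementation would add.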


TAP approximation

The Legendre Transform and Plefka’s Expansion


Plefka Expansion

• Do not restrict the approximate distribution $Q$ to product distributions.

• Minimize the free energy in two steps:

  • Constrained minimization in the family of distributions satisfying $\langle X \rangle_Q = m$ for fixed $m$:

$$G(m) = \min_Q \{ F[Q] = E[Q] - S[Q] \mid \langle X \rangle_Q = m \}$$

  • Minimize $G(m)$ with respect to $m$.


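Plefka Expansion

$$G(m) = \min_Q \{ F[Q] \mid \langle X \rangle_Q = m \}$$

Adding Lagrange multipliers $\lambda$ gives the Lagrangian

$$G(m, \lambda) = E[Q] - S[Q] - \sum_i \lambda_i \big( \langle x_i \rangle_Q - m_i \big)$$

$$G(m, \lambda) = \sum_X Q(X) H[X] - S[Q] - \sum_X \sum_i \lambda_i x_i Q(X) + \sum_i \lambda_i m_i$$

This has the form of a variational free energy in which $H[X]$ is replaced by $H[X] - \sum_i \lambda_i x_i$. We can construct such a Gibbs free energy by adding a set of external auxiliary fields.

$$\Rightarrow \quad Q_\lambda(X) = \frac{1}{Z(\lambda)}\, e^{-H[X] + \sum_i \lambda_i x_i}$$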


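Plefka Expansion

The dual function is

$$G(m) = \max_{\lambda} \Big\{ \sum_i \lambda_i m_i - \log Z(\lambda) \Big\}$$

• This equation is known as the Legendre transform between $\{\lambda_i\}$ and $\{m_i\}$.

• $Z(\lambda)$ is the normalizing constant of the Gibbs distribution

$$Q_\lambda(X) = \frac{1}{Z(\lambda)}\, e^{-H[X] + \sum_i \lambda_i x_i} = \frac{1}{Z(\lambda)} \exp\Big( \sum_{(ij)} J_{ij} x_i x_j + \sum_i \theta_i x_i + \sum_i \lambda_i x_i \Big)$$

• Set $\theta \to 0$ by shifting the Lagrange multiplier $\lambda_i \to \lambda_i - \theta_i$.

• $Z(\lambda) = \sum_X \exp\Big( \sum_{(ij)} J_{ij} x_i x_j + \sum_i \lambda_i x_i \Big)$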


Plefka Expansion

$$G(m) = \max_{\lambda} \Big\{ \sum_i \lambda_i m_i - \log \sum_X \exp\Big( \beta \sum_{(ij)} J_{ij} x_i x_j + \sum_i \lambda_i x_i \Big) \Big\}$$

• The Plefka expansion is derived by substituting $J_{ij} \to \beta J_{ij}$ and Taylor expanding the Gibbs free energy around $\beta = 0$, where $\beta$ plays the role of the inverse temperature in physics.

Notice

• For each term in the Taylor expansion, one has to expand both $\log Z$ and the Lagrange multipliers $\lambda_i$ that attain the maximum.

• The auxiliary field is temperature dependent.


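Plefka Expansion

• With $G_n = \frac{\partial^n}{\partial \beta^n} G(m) \big|_{\beta = 0}$,

$$G(m) = G_0(m) + \beta G_1(m) + \frac{\beta^2}{2!} G_2(m) + \dots$$

• $G_0(m) = \sum_i \Big\{ \frac{1 + m_i}{2} \ln \frac{1 + m_i}{2} + \frac{1 - m_i}{2} \ln \frac{1 - m_i}{2} \Big\}$. At $\beta = 0$ the spins are entirely controlled by the auxiliary field.

• $G_1(m) = -\sum_{i<j} J_{ij} m_i m_j$

• $G_2(m) = -\frac{1}{2} \sum_{ij} J_{ij}^2 (1 - m_i^2)(1 - m_j^2)$

• ...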
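The zeroth-order term can be checked for a single spin: at $\beta = 0$ the spins decouple, $Z(\lambda) = 2\cosh\lambda$, and the Legendre transform $\max_\lambda \{\lambda m - \log Z(\lambda)\}$ should reproduce the negative binary entropy. The sketch below performs the maximization by bisection; the bracket `[-20, 20]` and the test value `m = 0.6` are arbitrary choices.

```python
import math

def log_Z0(lam):
    """beta = 0, one spin: Z(lambda) = sum over x = +/-1 of exp(lambda x)."""
    return math.log(2 * math.cosh(lam))

def legendre(m, lo=-20.0, hi=20.0, iters=200):
    """Maximize lam * m - log_Z0(lam) by bisecting its derivative m - tanh(lam)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if m - math.tanh(mid) > 0:
            lo = mid   # derivative still positive: the maximizer lies to the right
        else:
            hi = mid
    lam = (lo + hi) / 2
    return lam * m - log_Z0(lam), lam

def neg_entropy(m):
    """Per-spin term of G_0(m)."""
    return (1 + m) / 2 * math.log((1 + m) / 2) + (1 - m) / 2 * math.log((1 - m) / 2)

m = 0.6
g0, lam_star = legendre(m)
```

The maximizing multiplier is $\lambda^* = \operatorname{artanh}(m)$, the auxiliary field needed to hold the magnetization at $m$.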


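Plefka Expansion

With $G_n = \frac{\partial^n}{\partial \beta^n} G(m) \big|_{\beta = 0}$,

$$G(m) = G_0(m) + \beta G_1(m) + \frac{\beta^2}{2!} G_2(m) + \dots$$

• $G_0(m) = \sum_i \Big\{ \frac{1 + m_i}{2} \ln \frac{1 + m_i}{2} + \frac{1 - m_i}{2} \ln \frac{1 - m_i}{2} \Big\}$ $\Rightarrow$ MF variational entropy

• $G_1(m) = -\sum_{i<j} J_{ij} m_i m_j$ $\Rightarrow$ MF variational energy

• $G_2(m) = -\frac{1}{2} \sum_{ij} J_{ij}^2 (1 - m_i^2)(1 - m_j^2)$

• ... $\Rightarrow$ takes into account the higher-order dependencies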


TAP approximation

TAP approximation = minimizing $G(m)$ for $\beta = 1$ while keeping only terms up to second order:

$$G_{TAP}(m) = -\sum_{(ij)} J_{ij} m_i m_j + \sum_i \Big\{ \frac{1 + m_i}{2} \ln \frac{1 + m_i}{2} + \frac{1 - m_i}{2} \ln \frac{1 - m_i}{2} \Big\} \underbrace{- \frac{1}{2} \sum_{(ij)} J_{ij}^2 (1 - m_i^2)(1 - m_j^2)}_{\text{dependencies between rvs}}$$

• TAP takes into account the dependencies between random variables.

• It is exact at high temperature for certain classes of models (SK models).


TAP approximation

Fixed points of the TAP approximation, obtained from $\partial G_{TAP} / \partial m_i = 0$ for $x_i \in \{+1, -1\}$:

$$m_i = \tanh\Big( \sum_{j \in N(i)} J_{ij} m_j - m_i \sum_{j \in N(i)} J_{ij}^2 (1 - m_j^2) \Big)$$

• Running these equations does not guarantee that the TAP-Gibbs free energy decreases ($m_i$ appears on both sides).

• There is a danger that the radius of convergence of the Taylor expansion will be too small to obtain results for the values of $\beta$ we are interested in.
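A damped fixed-point iteration for a TAP stationary point on $\pm 1$ spins can be sketched as follows. The update is the one obtained by differentiating a second-order free energy of the form $G_{TAP}$; the couplings, the reinstated external field `theta` (the slides absorb it into $\lambda$), and the damping factor are all assumptions made for this example. A finite-difference gradient is used to confirm that the iterate is stationary.

```python
import math

# Hypothetical weak couplings (each edge once) and fields for 3 spins.
J = {(0, 1): 0.3, (1, 2): -0.2, (0, 2): 0.25}
theta = [0.2, -0.1, 0.3]
N = 3

def coupling(i, j):
    return J.get((i, j), J.get((j, i), 0.0))

def g_tap(m):
    """Second-order free energy: MF energy + negative entropy + Onsager correction."""
    energy = -sum(w * m[i] * m[j] for (i, j), w in J.items())
    energy -= sum(t * v for t, v in zip(theta, m))
    neg_entropy = sum(
        (1 + v) / 2 * math.log((1 + v) / 2) + (1 - v) / 2 * math.log((1 - v) / 2)
        for v in m
    )
    onsager = -0.5 * sum(
        w * w * (1 - m[i] ** 2) * (1 - m[j] ** 2) for (i, j), w in J.items()
    )
    return energy + neg_entropy + onsager

def tap_step(m, damping=0.5):
    """One damped parallel update of the stationarity condition of g_tap."""
    new = []
    for i in range(N):
        field = sum(coupling(i, j) * m[j] for j in range(N) if j != i) + theta[i]
        reaction = m[i] * sum(
            coupling(i, j) ** 2 * (1 - m[j] ** 2) for j in range(N) if j != i
        )
        new.append((1 - damping) * m[i] + damping * math.tanh(field - reaction))
    return new

def numerical_grad(m, eps=1e-6):
    """Central finite-difference gradient of g_tap."""
    grad = []
    for i in range(N):
        up, down = list(m), list(m)
        up[i] += eps
        down[i] -= eps
        grad.append((g_tap(up) - g_tap(down)) / (2 * eps))
    return grad

m_tap = [0.0] * N
for _ in range(500):
    m_tap = tap_step(m_tap)
grad = numerical_grad(m_tap)
```

Damping is used because $m_i$ appears on both sides of the update; without it the iteration is more prone to oscillation when couplings are strong.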


Outline

• Standard BP algorithm

• Junction tree algorithm

• Region-based free energy
  • Different types of region graph

• Special case: Bethe free energy

• Stationary points of Bethe free energy = BP fixed points

• Generalized belief propagation (GBP)
  • Stationary points of the region-based free energy approximation


Message Passing - Computing the marginals

$$p(x_1, x_2, x_3, x_4) = f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, f_C(x_4)$$

$$b_1(x_1) = p(x_1) = \,?$$

Message Passing

Unrolling the messages from $x_1$ back toward the leaves, one substitution at a time:

• $b_1(x_1) = m_{A \to 1}(x_1)$

• $b_1(x_1) = \sum_{x_2} f_A(x_1, x_2)\, m_{2 \to A}(x_2)$

• $b_1(x_1) = \sum_{x_2} f_A(x_1, x_2)\, m_{B \to 2}(x_2)$

• $b_1(x_1) = \sum_{x_2, x_3, x_4} f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, m_{3 \to B}(x_3)\, m_{4 \to B}(x_4)$

• $b_1(x_1) = \sum_{x_2, x_3, x_4} f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, m_{4 \to B}(x_4)$, since the leaf variable $x_3$ sends the constant message $m_{3 \to B}(x_3) = 1$

• $b_1(x_1) = \sum_{x_2, x_3, x_4} f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, m_{C \to 4}(x_4)$

• $b_1(x_1) = \sum_{x_2, x_3, x_4} f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, f_C(x_4)$
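The same chain of substitutions can be run numerically, passing messages from the leaves toward $x_1$ and comparing against brute-force summation. The binary domain and the factor tables are illustrative assumptions; since they are not normalized, the belief is normalized at the end.

```python
import itertools

vals = (0, 1)

# Hypothetical factor tables (illustrative only).
def f_A(x1, x2):
    return 2.0 if x1 == x2 else 1.0

def f_B(x2, x3, x4):
    return 1.0 + x2 + 2 * x3 * x4

def f_C(x4):
    return 3.0 if x4 == 1 else 1.0

# Messages toward x1, starting from the leaves of the tree.
m_C_to_4 = {x4: f_C(x4) for x4 in vals}
m_4_to_B = m_C_to_4                       # x4 has no other factor neighbour than C
m_3_to_B = {x3: 1.0 for x3 in vals}       # leaf variable sends the constant message
m_B_to_2 = {
    x2: sum(f_B(x2, x3, x4) * m_3_to_B[x3] * m_4_to_B[x4] for x3 in vals for x4 in vals)
    for x2 in vals
}
m_2_to_A = m_B_to_2                       # x2 simply forwards the message from B
m_A_to_1 = {x1: sum(f_A(x1, x2) * m_2_to_A[x2] for x2 in vals) for x1 in vals}

# Belief at x1: the single incoming message, normalized.
norm = sum(m_A_to_1.values())
b1 = {x1: m_A_to_1[x1] / norm for x1 in vals}

# Brute-force marginal for comparison.
brute = {
    x1: sum(
        f_A(x1, x2) * f_B(x2, x3, x4) * f_C(x4)
        for x2, x3, x4 in itertools.product(vals, repeat=3)
    )
    for x1 in vals
}
total = sum(brute.values())
p1 = {x1: brute[x1] / total for x1 in vals}
```

Message passing reuses the inner sums instead of recomputing them for every value of $x_1$, which is where the savings over enumeration come from on larger trees.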


Junction Tree algorithm

• Works for general graphs:
  • Tree-shaped graphs
  • Graphs with cycles
  • Directed graphs
  • Undirected graphs

• Remove cycles by clustering nodes into cliques.

• Perform belief propagation on cliques.

• Exact inference of (clique) marginals.


Junction Tree algorithm - Moralization

• We first moralize the graph by connecting all unconnected parents of each node. After this, we drop the edge directions to obtain an undirected graph.


Junction Tree algorithm - Triangulation

• Triangulation, i.e. edges are added until every cycle of length four or more has a chord (an edge between two non-successive nodes in the cycle).


Junction Tree algorithm

Clique potentials:

$$\psi_{C_1}(x_A, x_B) = \psi_{A,B}(x_A, x_B)$$

$$\psi_{C_2}(x_B, x_C, x_F) = \psi_{B,C}(x_B, x_C)\, \psi_{C,F}(x_C, x_F)$$

$$\psi_{C_3}(x_C, x_F, x_G) = \psi_{C,F}(x_C, x_F)\, \psi_{F,G}(x_F, x_G)$$

$$\psi_{C_4}(x_C, x_D, x_G, x_H) = \psi_{C,D,H}(x_C, x_D, x_H)\, \psi_{D,G,H}(x_D, x_G, x_H)$$

$$\psi_{C_5}(x_C, x_E, x_H) = \psi_{C,E,H}(x_C, x_E, x_H)$$


Independence in junction tree

I Suppose:

  I $T$ is a junction tree for graph $G$.

  I Cliques $C_i$ and $C_j$ have separator $S_{ij} = C_i \cap C_j$.

  I Variables $X$ and $Y$ are on opposite sides of the separator.

I Then $X$ and $Y$ are independent given $S_{ij}$.


Junction Tree algorithm

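Given a junction tree and potentials on the cliques, the message from clique $C_i$ to $C_j$ is

$$m_{ij}(x_{S_{ij}}) = \sum_{x_{C_i \setminus S_{ij}}} \psi_{C_i}(x_{C_i}) \prod_{k \in N(i) \setminus j} m_{ki}(x_{S_{ki}})$$

I $S_{ij}$: nodes shared by $C_i$ and $C_j$

I $N(i)$: neighboring cliques of $C_i$

I The marginal distributions of the cliques and separators are

$$p(x_{C_i}) = \psi_{C_i}(x_{C_i}) \prod_{k \in N(i)} m_{ki}(x_{S_{ki}})$$

$$p(x_{S_{ij}}) = m_{ij}(x_{S_{ij}})\, m_{ji}(x_{S_{ij}})$$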
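The message and marginal formulas can be checked on a tiny junction tree. The sketch below assumes a hypothetical chain A -> B -> C with made-up conditional probability tables, so the clique potentials multiply to a normalized joint and the marginals hold with equality. For general unnormalized potentials the right-hand sides would need to be normalized.

```python
from itertools import product

# Hypothetical chain A -> B -> C with binary variables.
p_a = [0.6, 0.4]
p_b_given_a = [[0.7, 0.3], [0.2, 0.8]]   # p_b_given_a[a][b]
p_c_given_b = [[0.9, 0.1], [0.5, 0.5]]   # p_c_given_b[b][c]

# Clique potentials: C1 = {A, B}, C2 = {B, C}, separator S12 = {B}.
psi1 = {(a, b): p_a[a] * p_b_given_a[a][b] for a, b in product(range(2), repeat=2)}
psi2 = {(b, c): p_c_given_b[b][c] for b, c in product(range(2), repeat=2)}

# Messages: sum out the clique variables that are not in the separator.
m12 = [sum(psi1[(a, b)] for a in range(2)) for b in range(2)]
m21 = [sum(psi2[(b, c)] for c in range(2)) for b in range(2)]

# Clique and separator marginals from the formulas above.
p_c2 = {(b, c): psi2[(b, c)] * m12[b] for b, c in psi2}
p_sep = [m12[b] * m21[b] for b in range(2)]

# Brute-force check against the full joint p(a, b, c).
joint = {(a, b, c): psi1[(a, b)] * psi2[(b, c)]
         for a, b, c in product(range(2), repeat=3)}
brute_c2 = {(b, c): sum(joint[(a, b, c)] for a in range(2)) for b, c in psi2}
brute_sep = [sum(joint[(a, b, c)] for a in range(2) for c in range(2))
             for b in range(2)]
```

Here `p_c2` and `p_sep` agree with the brute-force marginals, which is the exactness property of the junction tree algorithm.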


Junction Tree algorithm

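I $m_{12}(x_B) = \sum_{x_A} \psi_{C_1}(x_A, x_B)$

I $m_{23}(x_C, x_F) = \sum_{x_B} \psi_{C_2}(x_B, x_C, x_F)\, m_{12}(x_B)$

I $m_{34}(x_C, x_G) = \sum_{x_F} \psi_{C_3}(x_C, x_F, x_G)\, m_{23}(x_C, x_F)$

I $m_{45}(x_C, x_H) = \sum_{x_D, x_G} \psi_{C_4}(x_C, x_D, x_G, x_H)\, m_{34}(x_C, x_G)$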


Outline

I Standard BP algorithm

I Junction tree algorithm

I Region Based free energy

  I Different types of region graph

I Special case: Bethe free energy

I Stationary points of Bethe free energy = BP fixed points

I Generalized belief propagation (GBP)

  I Stationary points of the region-based free energy approximation


Variational free energy

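To find the best approximation $Q$ to $P(x) = \frac{1}{Z} \prod_{c \in \text{cliques}} \phi_c(x_c)$, minimize

$$KL(Q\|P) = \sum_x Q(x) \ln Q(x) - \sum_x Q(x) \ln P(x) = \ln Z + U[Q] - H[Q]$$

where

I $H[Q] = -\sum_x Q(x) \ln Q(x)$ is the entropy of $Q$

I $U[Q] = -\sum_{c \in \text{cliques}} \sum_{x_c} Q(x_c) \ln \phi_c(x_c)$ is called the average energy

$$\implies \min_Q KL(Q\|P) = \ln Z + \min_Q \underbrace{(U[Q] - H[Q])}_{\text{Variational free energy}}$$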
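The decomposition of the KL divergence into ln Z plus the variational free energy (average energy minus entropy) can be verified numerically. The sketch below uses a hypothetical three-variable model with two pairwise potentials and an arbitrary trial distribution; all numbers are made up for illustration.

```python
import math
from itertools import product

# Hypothetical model: three binary variables, two pairwise clique potentials.
phi_12 = [[2.0, 1.0], [1.0, 3.0]]
phi_23 = [[1.0, 2.0], [4.0, 1.0]]
states = list(product(range(2), repeat=3))

def weight(x):
    """Unnormalized probability: the product of the clique potentials."""
    return phi_12[x[0]][x[1]] * phi_23[x[1]][x[2]]

Z = sum(weight(x) for x in states)
P = {x: weight(x) / Z for x in states}

# An arbitrary trial distribution Q (here fully factorized).
q1, q2, q3 = [0.3, 0.7], [0.5, 0.5], [0.8, 0.2]
Q = {x: q1[x[0]] * q2[x[1]] * q3[x[2]] for x in states}

avg_energy = -sum(Q[x] * math.log(weight(x)) for x in states)
entropy = -sum(Q[x] * math.log(Q[x]) for x in states)
F = avg_energy - entropy                      # variational free energy
KL = sum(Q[x] * math.log(Q[x] / P[x]) for x in states)
```

Because the KL divergence is non-negative, `F` is always at least `-math.log(Z)`, with equality only when Q equals P.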


Variational Free energy

I Two solution methods to $\min_Q F[Q]$:

I Approximate $F[Q]$

  I Region-based approximation $\implies F_R(q_R)$

I Choose a simpler form of $Q$

  I Mean field approximation $\implies Q = \prod_i q_i$
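For the factorized choice of Q, a minimal sketch of coordinate-wise minimization follows. It assumes the standard mean-field update, in which each factor is set proportional to the exponential of the expected log-weight under the other factors. That update rule, the toy potentials, and the helper names are assumptions for illustration and are not taken from the slides.

```python
import math
from itertools import product

# Hypothetical model: three binary variables, two pairwise potentials.
phi_12 = [[2.0, 1.0], [1.0, 3.0]]
phi_23 = [[1.0, 2.0], [4.0, 1.0]]
states = list(product(range(2), repeat=3))

def log_weight(x):
    return math.log(phi_12[x[0]][x[1]]) + math.log(phi_23[x[1]][x[2]])

def free_energy(q):
    """F[Q] = average energy minus entropy for Q(x) = q[0] q[1] q[2]."""
    total = 0.0
    for x in states:
        qx = q[0][x[0]] * q[1][x[1]] * q[2][x[2]]
        total += qx * (math.log(qx) - log_weight(x))
    return total

def update(q, i):
    """Set q[i] proportional to exp(expected log-weight under the other factors)."""
    scores = []
    for v in range(2):
        s = 0.0
        for x in states:
            if x[i] != v:
                continue
            others = 1.0
            for j in range(3):
                if j != i:
                    others *= q[j][x[j]]
            s += others * log_weight(x)
        scores.append(math.exp(s))
    norm = sum(scores)
    q[i] = [s / norm for s in scores]

q = [[0.3, 0.7], [0.5, 0.5], [0.8, 0.2]]
history = [free_energy(q)]
for sweep in range(20):
    for i in range(3):
        update(q, i)
    history.append(free_energy(q))

log_Z = math.log(sum(math.exp(log_weight(x)) for x in states))
```

Each coordinate update minimizes F over one factor with the others held fixed, so `history` is non-increasing and stays above `-log_Z`.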


Region Based free energy

I We decompose the system into subsystems and then approximate the free energy by combining the free energies of the subsystems.

I Group nodes into (possibly overlapping) clusters.

I In each region, all variable nodes connected to any included factor node are included.

I The sets of nodes $\{1, 2\}$ and $\{B, C, 2, 3, 4\}$ could be regions.

I $\{B, 3\}$ could not be a region.


Region Based free energy

I The overall free energy is approximated by the sum of the free energies of all the regions.

I If some of the large regions overlap, subtract out the free energies of these overlap regions.

I Each factor and variable node should be counted exactly once.

I For every factor node $a$ and every variable node $i$, the counting numbers $c_R$ of a set of regions $\mathcal{R}$ must satisfy

$$\sum_{R \in \mathcal{R}} c_R\, I(a \in F_R) = \sum_{R \in \mathcal{R}} c_R\, I(i \in V_R) = 1$$

where $I(x \in S) = 1$ if $x \in S$ and $0$ otherwise.
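The counting condition is easy to check mechanically. The following sketch assumes a hypothetical representation of regions as (variable set, factor set) pairs and a made-up chain example. `counted_once` is an illustrative helper, not a function from any library.

```python
def counted_once(regions, counts):
    """Check that every variable and factor has total counting number 1.

    `regions` maps a region name to (variable set, factor set);
    `counts` maps a region name to its counting number c_R.
    """
    variables = set().union(*(v for v, _ in regions.values()))
    factors = set().union(*(f for _, f in regions.values()))
    var_ok = all(sum(counts[r] for r, (v, _) in regions.items() if i in v) == 1
                 for i in variables)
    fac_ok = all(sum(counts[r] for r, (_, f) in regions.items() if a in f) == 1
                 for a in factors)
    return var_ok and fac_ok

# Hypothetical chain: variables 1, 2, 3 and factors A(1, 2), B(2, 3).
regions = {
    "RA": ({1, 2}, {"A"}),
    "RB": ({2, 3}, {"B"}),
    "R2": ({2}, set()),       # overlap of the two large regions
}
good = counted_once(regions, {"RA": 1, "RB": 1, "R2": -1})
bad = counted_once(regions, {"RA": 1, "RB": 1, "R2": 0})
```

With the overlap region subtracted (`c = -1`), variable 2 is counted 1 + 1 - 1 = 1 times. Without the subtraction it is counted twice, so the second check fails.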


Region Based free energy

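I The region-based free energy for a set of regions $\mathcal{R}$ is

$$F_{\mathcal{R}}(\{b_R\}) = U_{\mathcal{R}}(\{b_R\}) - H_{\mathcal{R}}(\{b_R\})$$

I Count every node once.

I $U_{\mathcal{R}}(\{b_R\}) = \sum_{R \in \mathcal{R}} c_R U_R(b_R) \implies$ region-based average energy

I $H_{\mathcal{R}}(\{b_R\}) = \sum_{R \in \mathcal{R}} c_R H_R(b_R) \implies$ region-based approximate entropy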


Region Based free energy

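If

$$\sum_{R \in \mathcal{R}} c_R\, I(a \in F_R) = 1 \quad \text{for all } a \in F$$

and $b_R(x_R) = p_R(x_R)$, then the average energy becomes exact:

$$U_{\mathcal{R}}(\{b_R\}) = \sum_{R \in \mathcal{R}} c_R U_R(b_R) = -\sum_{R \in \mathcal{R}} c_R \sum_{x_R} b_R(x_R) \sum_{a \in F_R} \ln f_a(x_a)$$

Exact energy:

$$U = \sum_{x \in S} p(x) E(x) = -\sum_a \sum_{x_a} p_a(x_a) \ln f_a(x_a)$$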


Region Based free energy

I Counting each variable node and factor node exactly once results in exactness of the average energy.

I However, the region-based entropy is still an approximation.

$$H_{\mathcal{R}}(\{b_R\}) = \sum_{R \in \mathcal{R}} c_R H_R(b_R) = -\sum_{R \in \mathcal{R}} c_R \sum_{x_R} b_R(x_R) \ln b_R(x_R)$$

I We are interested in the accuracy of $H_{\mathcal{R}}(\{b_R\})$ near its maximum.

$$\min_{b_R} F_{\mathcal{R}}(\{b_R\}) = \min_{b_R} \{U_{\mathcal{R}}(\{b_R\}) - H_{\mathcal{R}}(\{b_R\})\}$$

I $H_{\mathcal{R}}(\{b_R\})$ should achieve its maximum when all beliefs $b_R(x_R)$ are uniform (maxent-normal).


Bethe Free energy

Regions are $\mathcal{R} = \{R_i, R_a : i \in V, a \in F\}$

I $R_i = (\{i\}, \emptyset, \emptyset)$

I $R_a = (N(a), \{a\}, \{(i, a) : i \in N(a)\})$

I Large regions contain a single factor node $a$ and all attached variable nodes: $c_R = 1$

I Small regions contain a single variable node: $c_R = 1 - d_i$, where $d_i = |N(i)|$

I $R_1$ is a subregion of $R_2$ if $R_1 \subset R_2$


Bethe Free energy

I Bethe region graph for the following factor graph [figure omitted]


Bethe Free energy

$c_R = 1$ for the factor regions $R = R_a$

$c_R = 1 - d_i$ for the variable regions $R = R_i$


Bethe Free energy

I Assigning counting numbers to the regions.


Bethe Free energy

I Every variable node and factor node is counted once.


Bethe Free energy

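I Bethe free energy:

$$F_{\text{Bethe}} = U_{\text{Bethe}} - H_{\text{Bethe}}$$

I Bethe average energy:

$$U_{\text{Bethe}} = -\sum_a \sum_{x_a} b_a(x_a) \ln f_a(x_a)$$

I Bethe entropy:

$$H_{\text{Bethe}} = -\sum_a \sum_{x_a} b_a(x_a) \ln b_a(x_a) + \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \ln b_i(x_i)$$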
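On a tree-structured factor graph the Bethe free energy evaluated at the exact marginals equals the negative log partition function. The sketch below checks this on a hypothetical chain x1 - f_A - x2 - f_B - x3 with made-up factor tables. The variable names are illustrative only.

```python
import math
from itertools import product

# Hypothetical chain factor graph: x1 - f_A - x2 - f_B - x3, binary variables.
f_A = [[2.0, 1.0], [1.0, 3.0]]   # f_A[x1][x2]
f_B = [[1.0, 2.0], [4.0, 1.0]]   # f_B[x2][x3]
pairs = list(product(range(2), repeat=2))
states = list(product(range(2), repeat=3))

Z = sum(f_A[x1][x2] * f_B[x2][x3] for x1, x2, x3 in states)
p = {(x1, x2, x3): f_A[x1][x2] * f_B[x2][x3] / Z for x1, x2, x3 in states}

# Use the exact marginals as beliefs.
b_A = {(u, v): sum(p[(u, v, w)] for w in range(2)) for u, v in pairs}
b_B = {(v, w): sum(p[(u, v, w)] for u in range(2)) for v, w in pairs}
b_i = [[sum(p[x] for x in states if x[i] == v) for v in range(2)] for i in range(3)]
d = [1, 2, 1]   # d_i = number of factors attached to variable i

def xlogx(t):
    return t * math.log(t)

U_bethe = (-sum(b_A[k] * math.log(f_A[k[0]][k[1]]) for k in pairs)
           - sum(b_B[k] * math.log(f_B[k[0]][k[1]]) for k in pairs))
H_bethe = (-sum(xlogx(b_A[k]) + xlogx(b_B[k]) for k in pairs)
           + sum((d[i] - 1) * sum(xlogx(t) for t in b_i[i]) for i in range(3)))
F_bethe = U_bethe - H_bethe
```

Here `F_bethe` matches `-math.log(Z)` up to rounding. On graphs with cycles the same expression is only an approximation.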


Bethe Free energy - Maxent normal

I The global maximum of the Bethe entropy is achieved when the beliefs $b_i(x_i)$, $b_a(x_a)$ are uniform.

$$H_{\text{Bethe}} = \sum_i H(b_i) - \sum_a I(b_a)$$

where

$$H(b_i) = -\sum_{x_i} b_i(x_i) \ln b_i(x_i)$$

$$I(b_a) = \sum_{x_a} b_a(x_a) \ln b_a(x_a) + \sum_{i \in N(a)} H(b_i)$$

I The maximum of $H(b_i)$ is achieved when $b_i(x_i)$ is uniform.

I $I(b_a) \ge 0$, and $I(b_a) = 0$ when the beliefs are uniform.
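The rewriting of the Bethe entropy as single-node entropies minus mutual-information terms can be checked numerically. The sketch assumes the same hypothetical chain x1 - A - x2 - B - x3 (so d = 1, 2, 1) and made-up, locally consistent beliefs. The helper names are illustrative.

```python
import math

def entropy(dist):
    return -sum(t * math.log(t) for t in dist if t > 0)

def flatten(table):
    return [t for row in table for t in row]

def marginals(table):
    """Row and column sums of a 2-D belief table."""
    return [sum(r) for r in table], [sum(c) for c in zip(*table)]

def bethe_entropy_two_ways(b_A, b_B):
    b1, b2 = marginals(b_A)
    _, b3 = marginals(b_B)       # b_B's row marginal is assumed equal to b2
    singles, d = [b1, b2, b3], [1, 2, 1]
    # Form 1: factor entropies minus (d_i - 1) times single-node entropies.
    form1 = (entropy(flatten(b_A)) + entropy(flatten(b_B))
             - sum((d[i] - 1) * entropy(singles[i]) for i in range(3)))
    # Form 2: sum_i H(b_i) - sum_a I(b_a).
    I_A = entropy(b1) + entropy(b2) - entropy(flatten(b_A))
    I_B = entropy(b2) + entropy(b3) - entropy(flatten(b_B))
    form2 = sum(entropy(s) for s in singles) - I_A - I_B
    return form1, form2, I_A, I_B

# Hypothetical consistent beliefs (all single-node marginals are uniform).
f1, f2, I_A, I_B = bethe_entropy_two_ways([[0.4, 0.1], [0.1, 0.4]],
                                          [[0.3, 0.2], [0.2, 0.3]])
# Fully uniform beliefs.
u1, u2, uI_A, uI_B = bethe_entropy_two_ways([[0.25, 0.25], [0.25, 0.25]],
                                            [[0.25, 0.25], [0.25, 0.25]])
```

For the uniform beliefs both mutual-information terms vanish and the Bethe entropy reaches 3 ln 2, which is larger than the value for the non-uniform beliefs.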


Constrained Bethe free energy

The constrained Bethe free energy requires the beliefs to obey:

I The normalization constraints:

$$\sum_{x_i} b_i(x_i) = 1, \qquad \sum_{x_a} b_a(x_a) = 1$$

I Consistency constraints:

$$\sum_{x_a \setminus x_i} b_a(x_a) = b_i(x_i)$$

I Inactive constraints $\implies$ complementary slackness:

$$0 \le b_i(x_i) \le 1, \qquad 0 \le b_a(x_a) \le 1$$


Minimizing Constrained Bethe free energy

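Theorem: Stationary points of the constrained Bethe free energy are BP fixed points.

$$\begin{aligned}
\underset{b}{\text{minimize}} \quad & F_{\text{Bethe}} \\
\text{subject to} \quad & \sum_{x_i} b_i(x_i) = 1, \qquad \sum_{x_a} b_a(x_a) = 1, \\
& \sum_{x_a \setminus x_i} b_a(x_a) = b_i(x_i), \\
& b_a(x_a),\ b_i(x_i) \ge 0
\end{aligned}$$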


Minimizing Constrained Bethe free energy

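I Lagrangian (with $E_a(x_a) = -\ln f_a(x_a)$):

$$L = F_{\text{Bethe}} + \sum_a \gamma_a \Big\{ \sum_{x_a} b_a(x_a) - 1 \Big\} + \sum_i \gamma_i \Big\{ \sum_{x_i} b_i(x_i) - 1 \Big\} + \sum_a \sum_{i \in N(a)} \sum_{x_i} \lambda_{ai}(x_i) \Big\{ b_i(x_i) - \sum_{x_a \setminus x_i} b_a(x_a) \Big\}$$

I $\dfrac{\partial L}{\partial b_i(x_i)} = 0 \implies b_i(x_i) = \exp\Big( \dfrac{1}{d_i - 1} \Big\{ 1 - d_i + \gamma_i + \sum_{a \in N(i)} \lambda_{ai}(x_i) \Big\} \Big)$

I $\dfrac{\partial L}{\partial b_a(x_a)} = 0 \implies b_a(x_a) = \exp\Big( -E_a(x_a) - 1 - \gamma_a + \sum_{i \in N(a)} \lambda_{ai}(x_i) \Big)$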


Bethe Fixed points

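Define

$$\lambda_{ai}(x_i) = \ln \prod_{b \in N(i) \setminus a} m_{b \to i}(x_i)$$

Obtain BP equations:

$$b_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i)$$

$$b_a(x_a) \propto f_a(x_a) \prod_{i \in N(a)} \prod_{b \in N(i) \setminus a} m_{b \to i}(x_i)$$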
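The belief equations can be exercised on a small tree, where BP is exact. The sketch below assumes the standard sum-product message update (each factor-to-variable message sums the factor times the incoming messages from the factor's other variables) and a hypothetical chain x1 - f_A - x2 - f_B - x3 with made-up factor tables.

```python
from itertools import product

f_A = [[2.0, 1.0], [1.0, 3.0]]   # f_A[x1][x2]
f_B = [[1.0, 2.0], [4.0, 1.0]]   # f_B[x2][x3]
vals = range(2)

def normalize(v):
    total = sum(v)
    return [t / total for t in v]

# Factor-to-variable messages (leaf variables contribute constant messages).
m_A2 = [sum(f_A[x1][x2] for x1 in vals) for x2 in vals]
m_B2 = [sum(f_B[x2][x3] for x3 in vals) for x2 in vals]
m_A1 = [sum(f_A[x1][x2] * m_B2[x2] for x2 in vals) for x1 in vals]
m_B3 = [sum(f_B[x2][x3] * m_A2[x2] for x2 in vals) for x3 in vals]

# Variable beliefs: product of all incoming messages.
b1 = normalize(m_A1)
b2 = normalize([m_A2[v] * m_B2[v] for v in vals])
b3 = normalize(m_B3)

# Factor belief b_A: f_A times the messages into its variables from other factors.
flat_A = normalize([f_A[x1][x2] * m_B2[x2] for x1, x2 in product(vals, repeat=2)])
b_A = [flat_A[0:2], flat_A[2:4]]

# Brute-force marginals for comparison.
states = list(product(vals, repeat=3))
Z = sum(f_A[a][b] * f_B[b][c] for a, b, c in states)
exact = [[sum(f_A[a][b] * f_B[b][c] for a, b, c in states if (a, b, c)[i] == v) / Z
          for v in vals] for i in range(3)]
exact_A = [[sum(f_A[a][b] * f_B[b][c] for c in vals) / Z for b in vals] for a in vals]
```

On this tree the beliefs `b1`, `b2`, `b3`, and `b_A` coincide with the exact marginals. On a loopy graph the same equations would be iterated to a fixed point and would give approximations only.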


Unrealizable beliefs

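I $b_A(x_1, x_2) = \begin{pmatrix} 0.4 & 0.1 \\ 0.1 & 0.4 \end{pmatrix}$

I $b_B(x_2, x_3) = \begin{pmatrix} 0.4 & 0.1 \\ 0.1 & 0.4 \end{pmatrix}$

I $b_C(x_1, x_3) = \begin{pmatrix} 0.1 & 0.4 \\ 0.4 & 0.1 \end{pmatrix}$

I $b_1(x_1) = b_2(x_2) = b_3(x_3) = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}$

I There is no joint distribution $b(x_1, x_2, x_3)$ with these marginals!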
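One way to see why no joint distribution exists is a union-bound argument: if x1 differs from x3, then x1 differs from x2 or x2 differs from x3, so any joint must satisfy P(x1 != x3) <= P(x1 != x2) + P(x2 != x3). The sketch below checks local consistency and this bound for the beliefs above. The argument and the helper names are additions for illustration.

```python
b_A = [[0.4, 0.1], [0.1, 0.4]]   # b_A[x1][x2]
b_B = [[0.4, 0.1], [0.1, 0.4]]   # b_B[x2][x3]
b_C = [[0.1, 0.4], [0.4, 0.1]]   # b_C[x1][x3]

def marginals(table):
    """Row and column sums of a 2-D belief table."""
    return [sum(r) for r in table], [sum(c) for c in zip(*table)]

def prob_disagree(table):
    """Probability that the two variables of a pairwise belief differ."""
    return table[0][1] + table[1][0]

# Local consistency: every pairwise belief marginalizes to (0.5, 0.5).
locally_consistent = all(
    abs(t - 0.5) < 1e-12
    for table in (b_A, b_B, b_C)
    for marg in marginals(table)
    for t in marg
)

# Necessary condition for a joint distribution (union bound).
p12, p23, p13 = prob_disagree(b_A), prob_disagree(b_B), prob_disagree(b_C)
bound_holds = p13 <= p12 + p23 + 1e-12
```

The beliefs pass the consistency check, yet `p13` is 0.8 while `p12 + p23` is only 0.4, so the bound fails and the beliefs cannot come from any joint distribution.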


Region based energy

I How to select a set of regions $\mathcal{R}$ and counting numbers $c_R$?

I Some methods are:

  I Bethe method

  I Junction graph method

  I Cluster variation method

  I Region graph method


Region Graph

I A region graph is a directed acyclic graph in which an edge $R \to R'$ implies $R' \subseteq R$.

I If there is a directed path from $R$ to $R'$, we say $R$ is an ancestor of $R'$, $R \in A(R')$, and $R'$ is a descendant of $R$, $R' \in D(R)$.

I In a region graph, the counting numbers satisfy

$$c_R = 1 - \sum_{R' \in A(R)} c_{R'} \quad \text{for all } R \in \mathcal{R}$$
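The counting-number recursion can be evaluated directly from the parent structure of the region graph. The sketch below assumes a hypothetical dictionary from each region to its set of parents and uses the Bethe region graph of a chain (variables 1, 2, 3, factors A and B) as the example. `counting_numbers` is an illustrative helper.

```python
def counting_numbers(parents):
    """Compute c_R = 1 - sum of c over all ancestors of R.

    `parents` maps each region to the set of its parent regions (a DAG).
    """
    def ancestors(r):
        found, stack = set(), list(parents[r])
        while stack:
            p = stack.pop()
            if p not in found:
                found.add(p)
                stack.extend(parents[p])
        return found

    counts = {}

    def c(r):
        # Memoized recursion: ancestors are resolved before the region itself.
        if r not in counts:
            counts[r] = 1 - sum(c(a) for a in ancestors(r))
        return counts[r]

    for r in parents:
        c(r)
    return counts

# Bethe region graph of the chain x1 - A - x2 - B - x3 (hypothetical example).
bethe_parents = {
    "RA": set(), "RB": set(),
    "R1": {"RA"}, "R2": {"RA", "RB"}, "R3": {"RB"},
}
counts = counting_numbers(bethe_parents)
```

The result reproduces the Bethe rule: the factor regions get 1, and each variable region gets 1 minus its degree (0 for the end variables, -1 for the middle one).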


Region Graph Condition

I Every node is counted once:

$$\sum_{R \in \mathcal{R}} c_R\, I(a \in F_R) = \sum_{R \in \mathcal{R}} c_R\, I(i \in V_R) = 1$$

$\implies$ ensures that the region graph average energy is exact.

I The regions containing a particular variable node form a connected subgraph $\implies$ the marginal probabilities are consistent.


Example of an invalid region graph

I This is not a valid region graph. Variable 5 is not counted once.


Example of valid region graph

I Bethe region graph for the following factor graph [figure omitted]


Region graph

$$c_R = 1 - \sum_{R' \in A(R)} c_{R'} \quad \text{for all } R \in \mathcal{R}$$


Region graph

I Valid region graph (every node is counted once)


Generalized Belief Propagation

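I Theorem: The stationary points of the constrained region-based free energy for a valid region graph are the fixed points of generalized belief propagation on that region graph.

Stationary points of

$$F_{\mathcal{R}}(\{b_R\}) = \sum_{R \in \mathcal{R}} c_R F_R(b_R)$$

subject to

$$\sum_{x_R} b_R(x_R) = 1 \quad \text{for all } R \in \mathcal{R}$$

$$\sum_{x_P \setminus x_C} b_P(x_P) = b_C(x_C) \quad \text{for all parent and child regions } P, C \in \mathcal{R}$$

$$b_R(x_R) \ge 0$$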


Generalized Belief Propagation

I Belief in a region is product of:

I Local information (factors in region)

I Messages from parent regions

I Messages into descendant regions from parents that are not themselves descendants of the region.

I Message update rules are obtained by enforcing the marginalization constraints.


Generalized Belief Propagation

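Belief in a region is:

$$b_R(x_R) \propto \prod_{a \in A_R} f_a(x_a) \times \underbrace{\Bigg( \prod_{P \in P(R)} m_{P \to R}(x_R) \Bigg)}_{\text{messages from parent regions}} \times \underbrace{\Bigg( \prod_{D \in D(R)} \prod_{P' \in P(D) \setminus \varepsilon(R)} m_{P' \to D}(x_D) \Bigg)}_{\text{messages into descendant regions from parents that are not descendants}}$$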


Generalized Belief Propagation

I Bethe region graph for the following graph [figure omitted]

[2] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Constructing free-energy approximations and generalized belief propagation algorithms," 2005.


Generalized Belief propagation

Use marginalization constraints to derive message-update rules


Thanks

Questions?
