Mixing
Dana Randall, Georgia Tech
A tutorial on Markov chains
( Slides at: www.math.gatech.edu/~randall )
Outline
Fundamentals for designing a Markov chain
Bounding running times (convergence rates)
Connections to statistical physics
Main Q: What do typical elements look like?
Random sampling can be used to:
• Determine properties of “typical” elements
• Evaluate thermodynamic properties (such as free energy, entropy, …)
• Estimate the cardinality of the set
(“Markov chain Monte Carlo”)
Markov chains for sampling
Given: A large set (matchings, colorings,
independent sets,…)
Andrei Andreyevich Markov 1856-1922
Markov chains
Sampling using Markov chains
State space Ω
( $|\Omega| \sim c^n$ )
Step 1. Connect the state space.
E.g., if Ω = indep. sets of a graph G, connect I and I’ iff $|I \oplus I'| = 1$ (they differ in exactly one vertex).
Basics of Markov chains
Transitions P: Random walk on H
Starting at x:
- Pick a neighbor y.
- Move to y with prob. P(x,y) = 1/∆ (∆ = max deg in H).
- With all remaining prob. stay at x.
Def’n: A MC is ergodic if it is:
• irreducible - for all $x,y \in \Omega$, $\exists t$: $P^t(x,y) > 0$; (connected)
• aperiodic - for all $x,y \in \Omega$, g.c.d. $\{t : P^t(x,y) > 0\} = 1$. (not bipartite)
($P^t$ is the “t step” transition prob.)
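Step 1 and the random walk on H can be sketched on a toy instance. The following is a minimal sketch, assuming a hypothetical example graph (a path on 4 vertices) and helper names (`independent_sets`, `build_walk`, `step_distribution`) that are not from the slides; it builds P for the independent-set state space and iterates a start distribution.

```python
from itertools import combinations

def independent_sets(n, edges):
    """All independent sets of the graph on vertices 0..n-1, as frozensets."""
    sets = []
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            s = set(subset)
            if not any(u in s and v in s for u, v in edges):
                sets.append(frozenset(s))
    return sets

def build_walk(states):
    """P(x,y) = 1/Delta for neighbors (|x ^ y| = 1); leftover mass stays at x."""
    nbrs = {x: [y for y in states if len(x ^ y) == 1] for x in states}
    delta = max(len(v) for v in nbrs.values())      # max degree in H
    P = {x: {y: 1.0 / delta for y in nbrs[x]} for x in states}
    for x in states:
        P[x][x] = 1.0 - len(nbrs[x]) / delta
    return P

def step_distribution(dist, P):
    """Apply one step of the chain to a probability vector."""
    new = {x: 0.0 for x in P}
    for x, mass in dist.items():
        for y, p in P[x].items():
            new[y] += mass * p
    return new

# Hypothetical example graph: the path 0-1-2-3.
states = independent_sets(4, [(0, 1), (1, 2), (2, 3)])
P = build_walk(states)
dist = {x: 0.0 for x in states}
dist[frozenset()] = 1.0                             # start at the empty set
for _ in range(500):
    dist = step_distribution(dist, P)
```

On this instance H is connected through the empty set and several states keep a self-loop, so the walk is ergodic; since P is symmetric, `dist` should approach the uniform distribution on the 8 independent sets.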
The stationary distribution
Thm: Any finite, ergodic MC converges to a unique stationary distribution π.
Thm: If π satisfies the detailed balance condition
π(x) P(x,y) = π(y) P(y,x) for all x,y ∈ Ω,
then π is the stationary distribution.
So, P symmetric ⇒ π is uniform.
E.g., for $\lambda > 0$, sample ind. set I w/ prob: $\pi(I) = \lambda^{|I|}/Z$, where $Z = \sum_J \lambda^{|J|}$.
Q: What if we want to sample from some other distribution?
Sampling from non-uniform distributions
Step 2. Carefully define the transition probabilities.
The Metropolis Algorithm
Propose a move from x to y as before, but accept with probability min (1, π(y)/π(x))
(with remaining probability stay at x).
(MRRTT ’53)
For neighbors x, y with π(x) ≥ π(y): $P(x,y) = \frac{1}{\Delta} \cdot \frac{\pi(y)}{\pi(x)}$ and $P(y,x) = \frac{1}{\Delta}$, so
π(x) P(x,y) = π(y) P(y,x).
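For independent sets:
$I \to I \cup \{v\}$ is accepted w.p. $\min(1, \lambda)$, and $I \cup \{v\} \to I$ is accepted w.p. $\min(1, \lambda^{-1})$, since
$\frac{\pi(y)}{\pi(x)} = \frac{\lambda^{|I|+1}/Z}{\lambda^{|I|}/Z} = \lambda$.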
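The Metropolis rule can be checked exactly on a tiny state space. The following is a minimal sketch, assuming a hypothetical instance (independent sets of the path 0-1-2 with λ = 3) and a helper name `metropolis_matrix` that is not from the slides; it uses exact fractions so detailed balance can be tested with equality.

```python
from fractions import Fraction

def metropolis_matrix(states, neighbors, weight):
    """P(x,y) = (1/Delta) * min(1, w(y)/w(x)) for neighbors; the rest stays at x."""
    delta = max(len(neighbors[x]) for x in states)
    P = {}
    for x in states:
        row = {y: Fraction(1, delta) * min(Fraction(1), Fraction(weight(y), weight(x)))
               for y in neighbors[x]}
        row[x] = 1 - sum(row.values())
        P[x] = row
    return P

# Hypothetical example: the independent sets of the path 0-1-2.
lam = 3
states = [frozenset(s) for s in ([], [0], [1], [2], [0, 2])]
neighbors = {x: [y for y in states if len(x ^ y) == 1] for x in states}

def weight(I):
    return lam ** len(I)            # unnormalized pi(I) = lambda^|I|

P = metropolis_matrix(states, neighbors, weight)
Z = sum(weight(I) for I in states)
pi = {I: Fraction(weight(I), Z) for I in states}

# Detailed balance: pi(x) P(x,y) = pi(y) P(y,x) for every pair.
balanced = all(pi[x] * P[x].get(y, 0) == pi[y] * P[y].get(x, 0)
               for x in states for y in states)
# Stationarity: pi P = pi.
stationary = all(sum(pi[x] * P[x].get(y, 0) for x in states) == pi[y]
                 for y in states)
```

Only the ratio π(y)/π(x) enters the transition probabilities, so Z is never needed to run the chain; it is computed here only to verify the result.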
Q: But for how long do we walk?
Basics continued…
Step 1. Connect the state space.
Step 2. Carefully define the transition probabilities.
Starting at any state $x_0$, take a random walk for some number of steps . . . and output the final state (from π?).
Step 3. Bound the mixing time.
This tells us the number of steps to take.
The mixing rate
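Def’n: The total variation distance is $\|P^t, \pi\| = \max_{x \in \Omega} \frac{1}{2} \sum_{y \in \Omega} |P^t(x,y) - \pi(y)|$.
Def’n: Given $\epsilon$, the mixing time is $\tau(\epsilon) = \min \{t : \|P^{t'}, \pi\| < \epsilon \ \ \forall t' \ge t\}$.
A Markov chain is rapidly mixing if $\tau(\epsilon)$ is poly$(n, \log(\epsilon^{-1}))$.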
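For a chain small enough to write down, both definitions can be evaluated directly. The following is a minimal sketch, assuming a hypothetical example chain (the lazy walk on a 4-cycle) and helper names (`tv_from_stationary`, `matmul`, `mixing_time`) that are not from the slides. It uses the fact that the worst-case distance is non-increasing in t, so the first t below ε also satisfies the "for all t' ≥ t" condition.

```python
def tv_from_stationary(Pt, pi):
    """max over start states x of (1/2) * sum_y |P^t(x,y) - pi(y)|."""
    return max(0.5 * sum(abs(p - q) for p, q in zip(row, pi)) for row in Pt)

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mixing_time(P, pi, eps, t_max=10_000):
    """Smallest t with ||P^t, pi|| < eps."""
    Pt = [row[:] for row in P]
    for t in range(1, t_max + 1):
        if tv_from_stationary(Pt, pi) < eps:
            return t
        Pt = matmul(Pt, P)
    raise RuntimeError("did not mix within t_max steps")

# Hypothetical example: lazy walk on a 4-cycle (stay w.p. 1/2, otherwise move
# to a uniformly random neighbor). P is symmetric, so pi is uniform.
P = [[0.5, 0.25, 0.0, 0.25],
     [0.25, 0.5, 0.25, 0.0],
     [0.0, 0.25, 0.5, 0.25],
     [0.25, 0.0, 0.25, 0.5]]
pi = [0.25] * 4
d1 = tv_from_stationary(P, pi)      # distance after one step
tau = mixing_time(P, pi, eps=0.01)
```

This brute-force approach costs time polynomial in |Ω|, which is exponential in n for the state spaces of interest; that is why the rest of the talk is about bounding τ indirectly.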
Spectral gap
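Let $1 = \lambda_1 > |\lambda_2| \ge \dots \ge |\lambda_{|\Omega|}|$ be the eigenvalues of P.
Def’n: Gap(P) = $1 - |\lambda_2|$ is the spectral gap.
Mixing rate ⇔ Spectral Gap
Thm: (Alon, Alon-Milman, Sinclair)
$\tau(\epsilon) \le \frac{1}{\mathrm{Gap}(P)} \log\left(\frac{1}{\pi_* \epsilon}\right)$
$\tau(\epsilon) \ge \frac{|\lambda_2|}{2\,\mathrm{Gap}(P)} \log\left(\frac{1}{2\epsilon}\right)$,
where $\pi_* = \min_x \pi(x)$.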
Outline for rest of talk
Techniques:
•Coupling
•Flows and paths
•Indirect methods
Problems:
•Walk on the hypercube
•Colorings
•Matchings
•Independent sets
•Connections with statistical physics: - problems - algorithms - physical insights
Coupling
Simulate 2 processes:
• Start at any $x_0$ and $y_0$
• Couple moves, but each simulates the MC
• Once they agree, they move in sync ($x_t = y_t \Rightarrow x_{t+1} = y_{t+1}$)
Def’n: A coupling is a MC on Ω x Ω:
1) Each process $X_t$, $Y_t$ is a faithful copy of the original MC,
2) If $X_t = Y_t$, then $X_{t+1} = Y_{t+1}$.
The coupling time T is:
$T = \max_{x,y} E[T^{x,y}]$, where $T^{x,y} = \min \{t : X_t = Y_t \mid X_0 = x, Y_0 = y\}$.
Thm: $\tau(\epsilon) \le T e \ln \epsilon^{-1}$. (Aldous ’81)
Ex1: Walk on the hypercube
MCCUBE:
• Start at $v_0 = (0,0,\dots,0)$.
• Repeat:
  - Pick $i \in [n]$, $b \in \{0,1\}$.
  - Set $v_i = b$.
Symmetric, ergodic ⇒ π is uniform.
Mixing time? Use coupling:
            x0 = 0 1 1 0 0 1    y0 = 1 1 1 0 0 0
i=2, b=0:   x1 = 0 0 1 0 0 1    y1 = 1 0 1 0 0 0
i=6, b=1:   x2 = 0 0 1 0 0 1    y2 = 1 0 1 0 0 1
i=1, b=1:   . . .
            xt = 1 0 1 1 1 0    yt = 1 0 1 1 1 0
so T = n log n (coupon collecting) and $\tau(\epsilon) = O(n \ln(n \epsilon^{-1}))$.
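The coupon-collecting claim is easy to test empirically. The following is a minimal sketch, assuming the worst-case start (antipodal corners) and a hypothetical helper name `coupling_time`; both chains are driven by the same (i, b), so each is a faithful copy of MCCUBE and they agree on a coordinate as soon as it has been picked once.

```python
import random

def coupling_time(n, rng):
    """Run the coupled MCCUBE chains until they meet; return the number of steps."""
    x = [0] * n                  # X starts at (0,...,0)
    y = [1] * n                  # Y starts at the antipodal corner
    t = 0
    while x != y:
        i = rng.randrange(n)     # both chains use the same coordinate i ...
        b = rng.randrange(2)     # ... and the same bit b
        x[i] = b
        y[i] = b
        t += 1
    return t

rng = random.Random(0)           # fixed seed, only for reproducibility
n = 20
trials = [coupling_time(n, rng) for _ in range(2000)]
average = sum(trials) / len(trials)
# Coupon-collector mean n * (1 + 1/2 + ... + 1/n), roughly n ln n.
harmonic = n * sum(1.0 / k for k in range(1, n + 1))
```

From antipodal starts the chains meet exactly when every coordinate has been chosen at least once, so `average` should be close to `harmonic`.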
Ex 2: Colorings
Given: A graph G (max deg d), k > 1.
Goal: Find a random k-coloring of G.
MCCOL: (Single point replacement)
• Starting at some k-coloring $C_0$
• Repeat:
  - With prob 1/2 do nothing. (The “lazy” chain)
  - Pick $v \in V$, $c \in [k]$;
  - Recolor v with c, if possible.
Note: k ≥ d + 1 ⇒ colorings exist. (Greedy)
If k ≥ d + 2, then the state space is connected. (Therefore π is uniform.)
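One move of MCCOL takes only a few lines. The following is a minimal sketch, assuming a hypothetical example (a 4-cycle, so d = 2, with k = 4 = d + 2 colors) and helper names (`mccol_step`, `is_proper`) that are not from the slides; "if possible" is read as "if no neighbor of v currently has color c".

```python
import random

def mccol_step(coloring, adj, k, rng):
    """One step of the lazy single-site chain; `coloring` is modified in place."""
    if rng.random() < 0.5:
        return                              # lazy: do nothing w.p. 1/2
    v = rng.randrange(len(adj))
    c = rng.randrange(k)
    if all(coloring[u] != c for u in adj[v]):
        coloring[v] = c                     # recolor only if the result is proper

def is_proper(coloring, adj):
    return all(coloring[u] != coloring[v]
               for v in range(len(adj)) for u in adj[v])

# Hypothetical example: a 4-cycle with 4 colors.
adj = [[1, 3], [0, 2], [1, 3], [2, 0]]
k = 4
coloring = [0, 1, 0, 1]                     # a proper starting coloring C0
rng = random.Random(1)
seen = set()
for _ in range(20000):
    mccol_step(coloring, adj, k, rng)
    seen.add(tuple(coloring))
```

Every state in `seen` should be a proper coloring; the 4-cycle has 84 proper 4-colorings, so `seen` can never be larger than that.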
Path Coupling
Coupling: Show for all $x,y \in \Omega$, $E[\Delta(\mathrm{dist}(x,y))] \le 0$.
Path coupling: Show for all u,v s.t. dist(u,v) = 1, that $E[\Delta(\mathrm{dist}(u,v))] \le 0$. [Bubley, Dyer, Greenhill ’97-8]
Consider a shortest path: $x = z_0, z_1, z_2, \dots, z_r = y$, with $\mathrm{dist}(z_i, z_{i+1}) = 1$ and $\mathrm{dist}(x,y) = r$. Then
$E[\Delta(\mathrm{dist}(x,y))] \le \sum_i E[\Delta(\mathrm{dist}(z_i, z_{i+1}))] \le 0$.
Path coupling for MCCOL
Thm: MCCOL is rapidly mixing if k ≥ 3d. (Jerrum ‘95)
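Pf: Use path coupling: dist(x,y) = 1, i.e., x and y differ only at one vertex w.
Cases (both chains pick the same v and c):
• v = w, c not a color of any neighbor of w: ∆dist = -1 (at least k-d choices of c);
• v ∈ N(w), c the color of w in x or in y: ∆dist = +1 (or 0) (at most 2d choices of (v,c));
• o.w.: ∆dist = 0.
$E[\Delta \mathrm{dist}] \le \frac{1}{2nk} \left( (k-d)(-1) + 2d(+1) \right) = \frac{1}{2nk}(3d-k) \le 0$.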
Summary: Coupling
Pros: Can yield very easy proofs
Cons: Demands a lot from the chain
Extensions:
• Careful coupling (k ≥ 2d) (Jerrum ’95)
• Change the MC (Luby-R.-Sinclair ’95)
• “Macromoves”
  - burn in (Dyer-Frieze ’01, Molloy ’02)
  - non-Markovian couplings (Hayes-Vigoda ’03)
Conductance and flows
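(Jerrum-Sinclair ’88)
$\Phi = \min_{S \subseteq \Omega,\ \pi(S) \le 1/2} \Phi(S)$, where $\Phi(S) = \frac{\sum_{s \in S,\, s' \in S^C} \pi(s) P(s,s')}{\sum_{s \in S} \pi(s)}$.
Thm: $\frac{\Phi^2}{2} \le \mathrm{Gap}(P) \le 2\Phi$.
Min cut ↔ Max flow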
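On a very small chain the conductance can be found by trying every cut. The following is a minimal sketch, assuming MCCUBE with n = 3 as the example and helper names (`mccube_matrix`, `conductance`) that are not from the slides. It also uses the standard fact that the eigenvalues of MCCUBE are 1 - k/n for k = 0, …, n, so Gap(P) = 1/n.

```python
from fractions import Fraction
from itertools import combinations, product

def mccube_matrix(n):
    """MCCUBE: pick i in [n] and b in {0,1} uniformly, then set v_i = b."""
    states = list(product((0, 1), repeat=n))
    P = {x: {} for x in states}
    for x in states:
        for i in range(n):
            for b in (0, 1):
                y = x[:i] + (b,) + x[i + 1:]
                P[x][y] = P[x].get(y, Fraction(0)) + Fraction(1, 2 * n)
    return states, P

def conductance(states, P, pi):
    """Brute-force minimum of Phi(S) over all S with 0 < pi(S) <= 1/2."""
    best = None
    for r in range(1, len(states)):
        for S in combinations(states, r):
            mass = sum(pi[s] for s in S)
            if mass > Fraction(1, 2):
                continue
            inside = set(S)
            flow = sum(pi[s] * p
                       for s in S for t, p in P[s].items() if t not in inside)
            phi_S = flow / mass
            if best is None or phi_S < best:
                best = phi_S
    return best

n = 3
states, P = mccube_matrix(n)
pi = {x: Fraction(1, 2 ** n) for x in states}
phi = conductance(states, P, pi)
gap = Fraction(1, n)             # spectral gap of MCCUBE (eigenvalues 1 - k/n)
```

The minimizing cut splits the cube along one coordinate, giving Φ = 1/(2n); here the upper bound Gap(P) ≤ 2Φ holds with equality.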
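(Sinclair ’92)
Make $|\Omega|^2$ canonical paths $\Gamma = \{\gamma_{xy}\}$: $\gamma_{xy}$ runs from $x \in \Omega$ to $y \in \Omega$, $x \ne y$, carrying π(x)π(y) units of flow.
Capacity of e=(u,v): Q(e) = π(u) P(u,v) = π(v) P(v,u).
The congestion of these paths is:
$\rho(\Gamma) = \max_e \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} \pi(x)\pi(y)$.
$\bar\rho = \min_\Gamma \rho(\Gamma)\, \ell(\Gamma)$ ($\ell$ is the max path length)
Thm: $\tau_x(\epsilon) \le \bar\rho \log (\epsilon\, \pi(x))^{-1}$.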
Ex 3: Back to the hypercube
- Define a canonical path from s to t (fix the differing bits from left to right):
  s = 0 1 1 0 0 1
      1 1 1 0 0 1
      1 1 0 0 0 1
  t = 1 1 0 0 0 0
- Bound the number of paths through $(u,v) \in E$:
  u = 1 1 1 0 0 1    u’ = 0 1 0 0 0 0
  v = 1 1 0 0 0 1    v’ = 0 1 1 0 0 0
- The complementary pair (u’,v’) determines (s,t), so $|\{\gamma_{xy} \ni e\}| = 2^{n-1}$.
$\rho(\Gamma) = \max_e \frac{\sum_{\gamma_{xy} \ni e} \pi(x)\pi(y)}{Q(e)} = \frac{2^{n-1} \cdot 2^{-2n}}{2^{-n} \cdot (1/2n)} = n$
and $\ell = n$, so $\bar\rho \le \rho\, \ell = n^2$, i.e. $\tau = \tilde{O}(n^2)$ (with $\tilde{O}$ hiding the $\log(\epsilon\,\pi(x))^{-1}$ factor).
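The path count and the congestion can be confirmed by enumeration for small n. The following is a minimal sketch, assuming the left-to-right bit-fixing paths of the example and MCCUBE's transition probability P(u,v) = 1/(2n); the helper names (`canonical_path`, `congestion`) are hypothetical.

```python
from fractions import Fraction
from itertools import product

def canonical_path(s, t):
    """Edges visited when fixing the differing bits of s to match t, left to right."""
    edges, cur = [], s
    for i in range(len(s)):
        if cur[i] != t[i]:
            nxt = cur[:i] + (t[i],) + cur[i + 1:]
            edges.append((cur, nxt))
            cur = nxt
    return edges

def congestion(n):
    """Return (rho, max number of canonical paths through a single edge)."""
    states = list(product((0, 1), repeat=n))
    pi = Fraction(1, 2 ** n)                 # uniform stationary distribution
    Q = pi * Fraction(1, 2 * n)              # Q(e) = pi(u) P(u,v) for MCCUBE
    load, count = {}, {}
    for s in states:
        for t in states:
            for e in canonical_path(s, t):
                load[e] = load.get(e, 0) + pi * pi   # pi(x) pi(y) units of flow
                count[e] = count.get(e, 0) + 1
    return max(load.values()) / Q, max(count.values())

rho, paths_per_edge = congestion(4)
```

For n = 4 this should give 2^(n-1) = 8 paths through the busiest edge and ρ = n = 4, matching the calculation above.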
Ex 4: Sampling matchings
MCMATCH:
Starting at $M_0$, repeat: Pick $e = (u,v) \in E$
- If $e \in M$, remove e;
- If u and v unmatched in M, add e;
- If u matched (by e’) and v unmatched (or vice versa), add e and remove e’;
- Otherwise do nothing.
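The four cases of MCMATCH translate directly into code. The following is a minimal sketch, assuming a hypothetical example graph (the 6-cycle) and helper names (`mcmatch_step`, `is_matching`) that are not from the slides; matchings are stored as sets of two-element frozensets.

```python
import random

def mcmatch_step(M, edges, rng):
    """One MCMATCH move; M is a set of frozenset({u, v}) edges, modified in place."""
    u, v = rng.choice(edges)
    e = frozenset((u, v))
    if e in M:
        M.remove(e)                               # e in M: remove e
        return
    at_u = [f for f in M if u in f]               # matching edge at u, if any
    at_v = [f for f in M if v in f]
    if not at_u and not at_v:
        M.add(e)                                  # both endpoints free: add e
    elif len(at_u) + len(at_v) == 1:
        M.remove((at_u + at_v)[0])                # exactly one matched: swap e' for e
        M.add(e)
    # otherwise both endpoints are matched by other edges: do nothing

def is_matching(M):
    used = [x for f in M for x in f]
    return len(used) == len(set(used))

# Hypothetical example graph: the 6-cycle.
edges = [(i, (i + 1) % 6) for i in range(6)]
M = set()                                         # M0 = the empty matching
rng = random.Random(2)
sizes = set()
always_matching = True
for _ in range(5000):
    mcmatch_step(M, edges, rng)
    always_matching = always_matching and is_matching(M)
    sizes.add(len(M))
```

Each move keeps M a matching, and the swap move changes which edges are used without changing |M|.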
Thm: Coupling won’t work! (Kumar-Ramesh’99)
Mixing time of MCMATCH
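[Figure: matchings s and t, a transition (u,v) on the canonical path between them, and the complementary state u’.]
Paths using (u,v) are determined by u’ . . . as before.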
Ex 5: Independent Sets
Goal: Given $\lambda$, sample ind. set I with prob: $\pi(I) = \lambda^{|I|}/Z$, $Z = \sum_J \lambda^{|J|}$.
MCIND:
Starting at $I_0$, Repeat:
- Pick $v \in V$ and $b \in \{0,1\}$;
- If $v \in I$, b=0, remove v w.p. $\min(1, \lambda^{-1})$;
- If $v \notin I$, b=1, add v w.p. $\min(1, \lambda)$ if possible;
- O.w. do nothing.
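MCIND can be run on a toy graph and compared with the target distribution. The following is a minimal sketch, assuming a hypothetical example (the path 0-1-2 with λ = 3) and a helper name `mcind_step` that is not from the slides; "if possible" is read as "if no neighbor of v is in I".

```python
import random

def mcind_step(I, adj, lam, rng):
    """One MCIND move on the independent set I (a set of vertices, modified in place)."""
    v = rng.randrange(len(adj))
    b = rng.randrange(2)
    if v in I and b == 0:
        if rng.random() < min(1.0, 1.0 / lam):
            I.remove(v)
    elif v not in I and b == 1:
        # "if possible": v may join only if none of its neighbors is in I
        if all(u not in I for u in adj[v]) and rng.random() < min(1.0, lam):
            I.add(v)
    # otherwise do nothing

# Hypothetical example: the path 0-1-2 with lambda = 3.
adj = [[1], [0, 2], [1]]
lam = 3.0
rng = random.Random(3)
I = set()
counts = {}
steps = 200_000
for _ in range(steps):
    mcind_step(I, adj, lam, rng)
    key = frozenset(I)
    counts[key] = counts.get(key, 0) + 1

Z = 1 + 3 * lam + lam ** 2          # empty set, three singletons, {0, 2}
empirical = counts[frozenset({0, 2})] / steps
exact = lam ** 2 / Z                # pi({0, 2}) = lambda^2 / Z
```

The fraction of time spent in {0, 2} should be close to λ²/Z = 9/19; on a graph this small the chain mixes quickly, unlike the large-λ grid case discussed next.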
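Slow mixing of MCIND (large λ)
[Figure: independent sets on the n x n grid, arranged by #R/#B with axis marks 0, 1, ∞, where the two colors mark the even and odd sublattices; the cut (S, $S^C$) separates the two sides.]
For large λ there is a “bad cut,” . . . so MCIND is slowly mixing.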
Summary: Flows
Pros: Offers a combinatorial approach to mixing; especially useful for proving slow mixing.
Cons: Requires global knowledge of the chain to spread out paths.
Extensions: Balanced flows (Morris-Sinclair ’99)
MCMC major highlights:
- The permanent (Jerrum-Sinclair-Vigoda ’02)
- Volume of a convex polytope (Dyer-Frieze-Kannan ’89, +… )
Comparison (Diaconis, Saloff-Coste ’93)
P: unknown chain; $\bar{P}$: known chain.
For each edge $(x,y) \in \bar{P}$, make a path $\gamma_{x,y}$ using edges in P.
Let $\Gamma(z,w)$ be the set of paths $\gamma_{x,y}$ using (z,w).
Thm: $\mathrm{Gap}(P) \ge \frac{1}{A}\, \mathrm{Gap}(\bar{P})$, where
$A = \max_{e=(z,w)} \frac{1}{Q(e)} \sum_{\gamma_{x,y} \in \Gamma(z,w)} |\gamma_{x,y}|\, \pi(x) \bar{P}(x,y)$.
$(S, \bar{S})$ cannot be a bad cut in P if it isn’t in $\bar{P}$.
Comparison, aka . . . “The Adjacency Matrix Reloaded”
Disjoint decomposition
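(Madras-R. ’96, Martin-R. ’00)
Partition Ω into disjoint pieces $A_1, \dots, A_6$ (six pieces in the figure).
Restrictions: $P_i$ is P restricted to $A_i$ (e.g., $P_3$ on $A_3$).
Projection: $\bar{P}$ on the states $a_1, \dots, a_6$, with
$\bar\pi(a_i) = \pi(A_i)$,
$\bar{P}(a_i, a_j) = \frac{1}{\pi(A_i)} \sum_{x \in A_i,\, y \in A_j} \pi(x) P(x,y)$.
Thm: $\mathrm{Gap}(P) \ge \frac{1}{2}\, \mathrm{Gap}(\bar{P}) \left( \min_i \mathrm{Gap}(P_i) \right)$.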
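The projection chain is a direct computation once P, π and the partition are given. The following is a minimal sketch, assuming a hypothetical example (a lazy walk on the path 0-1-2-3, split into two halves) and a helper name `projection` that is not from the slides.

```python
from fractions import Fraction

def projection(P, pi, blocks):
    """P_bar(a_i, a_j) = (1/pi(A_i)) * sum over x in A_i, y in A_j of pi(x) P(x,y)."""
    pi_bar = [sum(pi[x] for x in A) for A in blocks]
    P_bar = [[sum(pi[x] * P[x][y] for x in Ai for y in Aj) / pi_bar[i]
              for Aj in blocks]
             for i, Ai in enumerate(blocks)]
    return pi_bar, P_bar

# Hypothetical example: lazy walk on the path 0-1-2-3 with A1 = {0,1}, A2 = {2,3}.
h, q = Fraction(1, 2), Fraction(1, 4)
P = [[h + q, q, 0, 0],
     [q, h, q, 0],
     [0, q, h, q],
     [0, 0, q, h + q]]
pi = [Fraction(1, 4)] * 4                  # P is symmetric, so pi is uniform
pi_bar, P_bar = projection(P, pi, [[0, 1], [2, 3]])
```

Here the only way to cross between the two pieces is the edge (1,2), so the projection moves between $a_1$ and $a_2$ with probability 1/8 and otherwise stays put.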
Ex 6: MCIND on small ind. sets
For G=(V,E): Let Ω = ind. sets of G; $\Omega_k$ = ind. sets of size k.
Thm: MCIND is rapidly mixing on $\bigcup_{k=0}^{K} \Omega_k$, where $K = |V| / (2(\Delta+1))$.
* Consider first the “swap” chain:
MCSWAP:
Starting at $I_0$, Repeat:
- Pick $(u,v,b) \in V \times V \times \{0,1,2\}$;
- If b=0 and $u \in I$, remove u w.p. $\min(1, \lambda^{-1})$;
- If b=1 and $u \notin I$, add u w.p. $\min(1, \lambda)$ if possible;
- If b=2 remove u and add v (if possible);
- O.w. do nothing.
Ind. sets w/ bounded size (cont.)
Decomposition: $\Omega_0, \Omega_1, \Omega_2, \dots, \Omega_{K-1}, \Omega_K$
• Restrictions: MCSWAP on each $\Omega_k$
• Projection: $\bar{P}$ on $a_0, a_1, a_2, \dots, a_{K-1}, a_K$
$|\Omega_k|$ is logconcave, . . . so $\bar{P}$ is rapidly mixing.
The Restrictions of MCSWAP
Thm: MCSWAP is rapidly mixing on $\Omega_k$, k < K. (Bubley-Dyer ’97)
Thm: MCSWAP is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$. (Decomposition)
Cor: MCIND is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$. (Comparison)
Summary: Indirect methods
Pros: Offer a top down approach; allow hybrid methods to be used.
Cons: Can increase the complexity.
Extensions:
• Comparison thm for log-Sobolev (Diaconis-Saloff-Coste ’96)
• Comparison for Glauber dynamics (R.-Tetali ’98)
• Decomposition for log-Sobolev (Jerrum-Son-Tetali-Vigoda ’02)
Why Statistical Physics?
• They have a need for sampling
• Use many interesting heuristics
• Great intuition
• Experts on “large data sets”
Microscopic details → Macroscopic behavior (i.e., phase transitions)
Models from statistical physics
• Potts model (3-colorings)
• Hardcore model (Independent sets)
• Dimer model (Matchings)
• Ising model (Min cut)
Models (cont.)
Independent sets: $\pi(I) = \lambda^{|I|}/Z$
Matchings: $\pi(M) = \lambda^{|M|}/Z$
Ising model: $\pi(\sigma) = \lambda^{|E_=|}/Z$, where $E_= = \{(u,v) \in E : \sigma(u) = \sigma(v)\}$ ($E = E_= \cup E_{\ne}$)
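Models: (The physics perspective)
Given: A physical system Ω = {σ}.
Define: A Gibbs measure as follows:
$\pi(\sigma) = e^{-\beta H(\sigma)} / Z$,
where $H(\sigma)$ is the Hamiltonian, $\beta = 1/kT$ is the inverse temperature, and $Z = \sum_\sigma e^{-\beta H(\sigma)}$ is the normalizing constant or partition function.
Independent sets: $H(\sigma) = -|I|$. If $\lambda = e^{\beta}$ then $\pi(\sigma) = \lambda^{|I|}/Z$.
Ising model: $H(\sigma) = -\sum_{(u,v) \in E} \sigma_u \sigma_v$. If $\lambda = e^{2\beta}$ then $\pi(\sigma) = \lambda^{|E_=|}/Z$.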
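The two ways of writing the Ising measure can be compared numerically. The following is a minimal sketch, assuming a hypothetical instance (a triangle at β = 0.7) and helper names (`ising_gibbs`, `agreeing_edges`) that are not from the slides. The identity behind it is $-H(\sigma) = |E_=| - |E_{\ne}| = 2|E_=| - |E|$, so the factor $e^{-\beta |E|}$ cancels against Z.

```python
import math
from itertools import product

def ising_gibbs(n, edges, beta):
    """pi(sigma) = exp(-beta * H(sigma)) / Z, H(sigma) = -sum over edges of sigma_u sigma_v."""
    configs = list(product((-1, 1), repeat=n))
    weight = {s: math.exp(beta * sum(s[u] * s[v] for u, v in edges))
              for s in configs}
    Z = sum(weight.values())
    return {s: w / Z for s, w in weight.items()}

def agreeing_edges(s, edges):
    """|E_=|: the number of edges whose endpoints have equal spins."""
    return sum(1 for u, v in edges if s[u] == s[v])

# Hypothetical example: a triangle at inverse temperature beta = 0.7.
edges = [(0, 1), (1, 2), (0, 2)]
beta = 0.7
pi = ising_gibbs(3, edges, beta)

# The same measure written as lambda^{|E_=|} / Z with lambda = e^{2 beta}.
lam = math.exp(2 * beta)
Z_lam = sum(lam ** agreeing_edges(s, edges) for s in pi)
pi_lam = {s: lam ** agreeing_edges(s, edges) / Z_lam for s in pi}
max_diff = max(abs(pi[s] - pi_lam[s]) for s in pi)
```

The two dictionaries should agree up to floating-point error, even though the two normalizing constants differ by the factor $e^{\beta |E|}$.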
Physics perspective (cont.)
Q: What about on the infinite lattice?
Use conditional probabilities:
But there can be boundary effects !!!
Phase transitions: Ind. sets
High temperature: ∂ (boundary) effects die out
Low temperature: long range effects
[Figure: the temperature axis from $T_0$ to $T_\infty$, split into two regions at $T_c$.]
$T_c$ indicates a “phase transition.”
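Slow mixing of MCIND revisited
[Figure: the n x n grid picture again, arranged by #R/#B with axis marks 0, 1, ∞ and the cut (S, $S^C$).]
$\pi(S_i) = \sum_{s \in S_i} \pi(s) = \sum_{s \in S_i} e^{-\beta H(s)}/Z$
(“Entropy term”: the number of configurations in $S_i$; “Energy term”: the weight $e^{-\beta H(s)}$.)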
Group by # of “fault lines”
Fault lines are vacant paths of width 2 from top to bottom (or left to right).
[Figure: Ω split into $S_R$, $S_1$, $S_2$, $S_3$, . . . , $S_B$, with the cut (S, $S^C$).]
“Peierls Argument”
For fixed path length l, define a map $S_1 \to S_B \times 2^{n/2} \times 3^{l}$:
1. Identify horizontal or vertical fault line. ($\sigma \in S_1$)
2. Shift right of fault by 1 and flip colors.
3. Remove rt column; add points along fault line, if possible. ($\sigma' \in S_B$)
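Peierls Argument cont.
Each $\sigma' \in S_B$ is the image of $\le 2^{n/2}\, 3^{l}$ configurations $\sigma \in S_1$ ($\sigma'$ has $\ge l - n/2$ more points).
$\pi(S_1) = \sum_{\sigma \in S_1} \pi(\sigma)$
$\le \sum_{l} \sum_{\sigma' \in S_B} \pi(\sigma')\, 2^{n/2}\, 3^{l}\, \lambda^{(n/2 - l)}$
$\le \pi(S_B)\, 2^{n/2}\, 3^{n}\, \lambda^{n/2}\, \mathrm{poly}(n) / \lambda^{n}$
$\le \pi(S_B) \left( \frac{18}{\lambda} \right)^{n/2} \mathrm{poly}(n)$,
if $\lambda > 18$.
(and similarly for $S_2$, $S_3$, …)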
Conclusions
Techniques:
• Coupling: can be easy when it works
• Flows: requires global knowledge of chain; very useful for slow mixing
• Indirect methods: top down approach; often increases complexity
• Connection to physics: can offer tremendous insights
Open problems:
• Sampling 4,5,6-colorings on the grid.
• Sampling perfect matchings on non-bipartite graphs.
• Sampling acyclic orientations in a graph.
• Sampling configurations of the Potts model (a generalization of Ising, but with more colors).
• How can we further exploit phase transitions? Other physical intuition?