9/21/2020
BIOS 731
Advanced Statistical Computing, Fall 2020
Lecture 11
Introduction to MCMC
Steve Qin
Motivation
• Generate random samples from arbitrary
probability distributions.
• Ideally, the random samples are i.i.d.
– Independent
– Identically distributed
• Extremely hard problem especially for high
dimensional distributions.
Markov Chain Monte Carlo
• The goal is to generate a sequence of random samples from an arbitrary probability density function (usually high dimensional),
• The sequence of random samples forms a Markov chain,
• The purpose is simulation (Monte Carlo).
What is Monte Carlo
Applications of Monte Carlo
• Optimization
• Numerical integration
• Generate random samples
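As a concrete instance of the numerical-integration use, here is a minimal sketch (plain Python; the function and parameter names are illustrative, not from the slides): estimate an integral by averaging the integrand at uniform random draws.

```python
import random

def mc_integrate(f, a, b, n=100_000, seed=0):
    """Monte Carlo estimate of the integral of f over [a, b]:
    (b - a) times the average of f at n uniform draws."""
    rng = random.Random(seed)
    total = sum(f(a + (b - a) * rng.random()) for _ in range(n))
    return (b - a) * total / n

# Example: integrate x^2 over [0, 1]; the exact value is 1/3.
estimate = mc_integrate(lambda x: x * x, 0.0, 1.0)
print(estimate)
```

The error of such an estimate shrinks at rate O(1/sqrt(n)) regardless of dimension, which is why Monte Carlo is attractive for high-dimensional integrals.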
Motivation
• Generating i.i.d. random variables from high-dimensional arbitrary distributions is extremely difficult.
• Drop the “independent” requirement.
• What if we also drop the “identically distributed” requirement?
Markov chain
• Assume a finite-state, discrete-time Markov chain with N different states: x_n ∈ S = {1, 2, …, N}.
• Random process X_n, n = 0, 1, 2, …
• Markov property:
  P(X_{n+1} = x_{n+1} | X_1 = x_1, …, X_n = x_n) = P(X_{n+1} = x_{n+1} | X_n = x_n).
• Time-homogeneous.
• Order
  – Future state depends on the past m states.
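The finite-state setup can be simulated directly from a transition matrix; a minimal sketch (the two-state “weather” chain and all names are illustrative assumptions, not from the slides):

```python
import random

def simulate_chain(P, start, steps, seed=0):
    """Simulate a finite-state Markov chain.
    P is the transition matrix as a list of rows; P[i][j] = p(i, j)."""
    rng = random.Random(seed)
    state, path = start, [start]
    for _ in range(steps):
        u, cum = rng.random(), 0.0
        for j, p in enumerate(P[state]):   # inverse-CDF draw of the next state
            cum += p
            if u < cum:
                state = j
                break
        path.append(state)
    return path

# Illustrative two-state chain: 0 = sunny, 1 = rainy.
P = [[0.9, 0.1],
     [0.5, 0.5]]
path = simulate_chain(P, start=0, steps=1000)
print(sum(path) / len(path))  # long-run fraction of state 1, near 1/6
```

Note the draws are identically distributed only in the long run, and never independent: each X_{n+1} depends on X_n.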
Key parameters
• Transition matrix P = {p(i, j)}, where p(i, j) = P(X_n = j | X_{n−1} = i).
• Initial probability distribution π^(0), with π^(n)(i) = P(X_n = i).
• Stationary distribution (invariant/equilibrium) π, satisfying π P = π.
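A stationary distribution satisfying π P = π can be approximated by repeatedly applying the transition matrix to an initial distribution (power iteration); a small sketch with an assumed two-state chain:

```python
def stationary(P, iters=500):
    """Approximate the stationary distribution (solving pi = pi P)
    by power iteration from the uniform initial distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        # one step of pi^(n+1) = pi^(n) P
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Illustrative two-state chain; its stationary distribution is (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(P)
print([round(p, 4) for p in pi])  # -> [0.8333, 0.1667]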
Reducibility
• A state j is accessible from state i (written i → j) if
  p_{ij}^{(n)} = P(X_n = j | X_0 = i) > 0 for some n ≥ 0.
• A Markov chain is irreducible if it is possible to get to any state from any state.
Recurrence
• A state i is transient if given that we start in
state i, there is a non-zero probability that
we will never return to i. State i is recurrent
if it is not transient.
Ergodicity
• A state i is ergodic if it is aperiodic and
positive recurrent. If all states in an
irreducible Markov chain are ergodic, the
chain is ergodic.
Reversible Markov chains
• Consider an ergodic Markov chain that converges to an invariant distribution π. A Markov chain is reversible if for all x, y ∈ S,
  π(x) p(x, y) = π(y) p(y, x),
  which is known as the detailed balance equation.
• An ergodic chain in equilibrium that satisfies the detailed balance condition has π as its unique stationary distribution.
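The detailed balance equation is easy to check numerically; a small sketch (the example chain and the tolerance are illustrative assumptions):

```python
def detailed_balance_holds(pi, P, tol=1e-12):
    """Check pi(x) p(x, y) == pi(y) p(y, x) for all state pairs."""
    n = len(P)
    return all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) < tol
               for x in range(n) for y in range(n))

# A two-state chain with stationary distribution pi = (5/6, 1/6)
# is reversible: pi(0) p(0,1) = (5/6)(0.1) = (1/6)(0.5) = pi(1) p(1,0).
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(detailed_balance_holds([5/6, 1/6], P))  # -> True
```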
Markov Chain Monte Carlo
• The goal is to generate a sequence of random samples from an arbitrary probability density function (usually high dimensional),
• The sequence of random samples forms a Markov chain.
• In a Markov chain, P → π; in MCMC, π → P.
Examples
• Gaussian mixture likelihood (K components, M dimensions; E assigns each observation i to a component):

  P(X | E, μ, σ²) = ∏_{k=1}^{K} ∏_{j=1}^{M} ∏_{i: E_i = k} (1/√(2π σ_kj²)) exp( −(x_ij − μ_kj)² / (2σ_kj²) ).

• Integrating out (μ_kj, σ_kj²) under conjugate priors gives the marginal likelihood

  P(X | E) = ∏_{k=1}^{K} ∏_{j=1}^{M} ∫∫ [ ∏_{i: E_i = k} P(x_ij | μ_kj, σ_kj²) ] p(μ_kj | σ_kj²) p(σ_kj²) dμ_kj dσ_kj²,

  which is available in closed form in terms of the component counts n_k, the prior parameters (a, b), and the within-component sample means and sums of squares.
Bayesian Inference
• Data: genotypes Y = (y_1, …, y_n); missing data: haplotype assignments Z = (z_1, …, z_n); parameters: haplotype frequencies Θ = (θ_1, …, θ_m).
• Likelihood:

  P(Y, Z | Θ) = ∏_{i=1}^{n} θ_{z_{i1}} θ_{z_{i2}},

  where (z_{i1}, z_{i2}) is the haplotype pair assigned to genotype y_i.
• Prior: Θ ~ Dirichlet(β_1, …, β_m).
History
• Metropolis, N., Rosenbluth, A. W.,
Rosenbluth, M. N., Teller, A. H. and Teller,
E. (1953).
Equation of state calculations by fast
computing machines. Journal of Chemical
Physics, 21, 1087–1092.
Metropolis algorithm
• Direct sampling from the target distribution is difficult,
• Generate candidate draws from a proposal distribution,
• These draws are then “corrected” so that, asymptotically, they can be viewed as random samples from the desired target distribution.
Pseudo code
• Initialize X_0,
• Repeat for i = 1, 2, …
  – Sample Y ~ q(X_{i−1}, ·),
  – Sample U ~ Uniform(0, 1),
  – If U ≤ α(X_{i−1}, Y), set X_i = Y,
  – Otherwise set X_i = X_{i−1}.
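The steps above can be turned into a runnable sketch. Assuming a symmetric random-walk proposal (so the acceptance probability reduces to min(1, π(Y)/π(X))), and working on the log scale for numerical stability; all names are illustrative:

```python
import math
import random

def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: Gaussian proposal centered at the current
    state is symmetric, so alpha = min(1, pi(y)/pi(x))."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        y = x + rng.gauss(0.0, step)                       # sample Y ~ q(x, .)
        # accept if U <= pi(y)/pi(x), done on the log scale
        if math.log(rng.random()) <= log_target(y) - log_target(x):
            x = y                                          # accept: X_i = Y
        samples.append(x)                                  # else X_i = X_{i-1}
    return samples

# Target: standard normal, known only up to a normalizing constant.
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=50_000)
mean = sum(draws) / len(draws)
print(round(mean, 1))  # sample mean near 0
```

Note that only the ratio of target densities appears, so the normalizing constant of π is never needed.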
An informal derivation
• Find α(x, y):
• The joint density of the current Markov chain state and the proposal is g(x, y) = q(x, y)π(x).
• Suppose q satisfies detailed balance: q(x, y)π(x) = q(y, x)π(y).
• If q(x, y)π(x) > q(y, x)π(y), introduce α(x, y) < 1 and α(y, x) = 1; detailed balance for the corrected chain requires π(x)q(x, y)α(x, y) = π(y)q(y, x), hence
  α(x, y) = π(y)q(y, x) / (π(x)q(x, y)).
• If q(x, y)π(x) < q(y, x)π(y), the symmetric argument applies.
• The probability of acceptance is
  α(x, y) = min{ 1, π(y)q(y, x) / (π(x)q(x, y)) }.
Metropolis-Hastings Algorithm

  r = π(y) T(y, x_t) / ( π(x_t) T(x_t, y) ).
Remarks
• Relies only on calculation of the target pdf up to a normalizing constant.
Remarks
• How to choose a good proposal function is crucial.
• Sometimes tuning is needed.
  – Rule of thumb: ~30% acceptance rate.
• Convergence is slow in high-dimensional cases.
Illustration of Metropolis-Hastings
• Suppose we try to sample from a bivariate normal distribution,
  (X, Y) ~ N( (0, 0), [[1, ρ], [ρ, 1]] ).
• Start from (0, 0).
• The proposed move at each step is a two-dimensional random walk:
  x_{t+1} = x_t + s cos(U),
  y_{t+1} = y_t + s sin(U),
  with s ~ Uniform(0, 1) and U ~ Uniform(0, 2π).
Illustration of Metropolis-Hastings
• At each step, calculate
  r = π(x_{t+1}, y_{t+1}) / π(x_t, y_t),
  since the proposal is symmetric: T((x_t, y_t), (x_{t+1}, y_{t+1})) = T((x_{t+1}, y_{t+1}), (x_t, y_t)).
• Explicitly,

  r = exp( −(x_{t+1}² − 2ρ x_{t+1} y_{t+1} + y_{t+1}²) / (2(1−ρ²)) ) / exp( −(x_t² − 2ρ x_t y_t + y_t²) / (2(1−ρ²)) )
    = exp( −(x_{t+1}² − 2ρ x_{t+1} y_{t+1} + y_{t+1}² − x_t² + 2ρ x_t y_t − y_t²) / (2(1−ρ²)) ).
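Putting the illustration together as a runnable sketch (the step-length scale s_max and all names are my assumptions layered on the slides' s ~ Uniform(0, 1), U ~ Uniform(0, 2π) description; the slides' later s = 0.5 and s = 3.0 are treated as values of this scale):

```python
import math
import random

def bvn_mh(rho, n_steps, s_max=1.0, seed=0):
    """Metropolis sampler for a bivariate normal with unit variances and
    correlation rho, using the polar random-walk proposal
    (x + s cos(theta), y + s sin(theta))."""
    rng = random.Random(seed)

    def log_pi(x, y):  # log density up to an additive constant
        return -(x * x - 2 * rho * x * y + y * y) / (2 * (1 - rho ** 2))

    x, y = 0.0, 0.0                          # start from (0, 0)
    chain = []
    for _ in range(n_steps):
        s = s_max * rng.random()             # step length
        theta = 2 * math.pi * rng.random()   # step direction
        xp, yp = x + s * math.cos(theta), y + s * math.sin(theta)
        # proposal is symmetric, so r = pi(x', y') / pi(x, y)
        if math.log(rng.random()) <= log_pi(xp, yp) - log_pi(x, y):
            x, y = xp, yp
        chain.append((x, y))
    return chain

chain = bvn_mh(rho=0.5, n_steps=20_000)
mx = sum(p[0] for p in chain) / len(chain)
print(round(mx, 2))  # sample mean of x, should be near 0
```

Rerunning with a very small or very large s_max reproduces the slow-mixing behavior seen in the trace and autocorrelation plots that follow.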
Convergence
• Trace plot: Good
[Trace plots of x and y versus iteration index (0–1000).]
Convergence
• Trace plot: Bad
[Trace plots of x[1:500] and y[1:500] versus iteration index (0–500).]
Convergence
• Trace plot
[Trace plots of x and y versus iteration index (0–5000).]
Convergence
• Autocorrelation plot: Good
[ACF of x and of y, lags 0–30.]
Convergence
• Autocorrelation plot: Bad (s = 0.5)
[ACF of x and of y, lags 0–30.]
Convergence
• Autocorrelation plot: Okay (s = 3.0)
[ACF of x and of y, lags 0–30.]
References
• Metropolis et al. (1953),
• Hastings (1970),
• Tutorial paper:
Chib and Greenberg (1995). Understanding the Metropolis-Hastings Algorithm. The American Statistician, 49, 327-335.
Gibbs Sampler
• Purpose: draw random samples from a joint distribution (high dimensional),
  x = (x_1, x_2, …, x_n) ~ target π(x).
• Method: iterative conditional sampling,
  for each i in turn, draw x_i ~ π(x_i | x_{−i}).
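For the bivariate normal used in the earlier illustration, the full conditionals are available in closed form: for a standard bivariate normal with correlation ρ, x | y ~ N(ρy, 1−ρ²) and symmetrically for y | x. That gives a compact runnable sketch (names are illustrative):

```python
import random

def gibbs_bvn(rho, n_steps, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each full conditional is N(rho * other, 1 - rho^2)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    sd = (1 - rho ** 2) ** 0.5
    chain = []
    for _ in range(n_steps):
        x = rng.gauss(rho * y, sd)   # draw x ~ pi(x | y)
        y = rng.gauss(rho * x, sd)   # draw y ~ pi(y | x)
        chain.append((x, y))
    return chain

chain = gibbs_bvn(rho=0.8, n_steps=20_000)
corr_xy = sum(x * y for x, y in chain) / len(chain)
print(round(corr_xy, 1))  # near rho = 0.8
```

Unlike Metropolis-Hastings, every conditional draw is accepted, but consecutive samples remain correlated (the more so the larger |ρ|).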
Illustration of Gibbs Sampler
References
• Geman and Geman 1984,
• Gelfand and Smith 1990,
• Tutorial paper:
Casella and George (1992). Explaining the
Gibbs sampler. The American Statistician,
46, 167-174.
Remarks
• The Gibbs sampler is a special case of Metropolis-Hastings.
• Compared to the EM algorithm, the Gibbs sampler and Metropolis-Hastings are stochastic procedures.
• Verify convergence of the sequence.
• Require burn-in.
• Use multiple chains.