9/21/2020
BIOS 731
Advanced Statistical Computing, Fall 2020
Lecture 11
Introduction to MCMC
Steve Qin
Motivation
• Generate random samples from arbitrary
probability distributions.
• Ideally, the random samples are i.i.d.
– Independent
– Identically distributed
• Extremely hard problem especially for high
dimensional distributions.
Markov Chain Monte Carlo
• The goal is to generate a sequence of random samples from an arbitrary probability density function (usually high dimensional),
• The sequence of random samples forms a Markov chain,
• The purpose is simulation (Monte Carlo).
What is Monte Carlo
Applications of Monte Carlo
• Optimization
• Numerical integration
• Generate random samples
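As a concrete instance of the numerical-integration use, here is a minimal sketch (plain Python; the function and parameter names are illustrative, not from the slides): estimate an integral by averaging the integrand at uniform random draws.

```python
import random

def mc_integrate(f, a, b, n=100_000, seed=0):
    """Monte Carlo estimate of the integral of f over [a, b]:
    (b - a) times the average of f at n uniform draws."""
    rng = random.Random(seed)
    total = sum(f(a + (b - a) * rng.random()) for _ in range(n))
    return (b - a) * total / n

# Example: integrate x^2 over [0, 1]; the exact value is 1/3.
estimate = mc_integrate(lambda x: x * x, 0.0, 1.0)
print(estimate)
```

The error of such an estimate shrinks at rate O(1/sqrt(n)) regardless of dimension, which is why Monte Carlo is attractive for high-dimensional integrals.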
Motivation
• Generating i.i.d. random variables from high-dimensional arbitrary distributions is extremely difficult.
• Drop the “independent” requirement.
• What if we also drop the “identically distributed” requirement?
Markov chain
• Assume a finite-state, discrete-time Markov chain with N different states: x_n ∈ S = {1, 2, …, N}.
• Random process X_n, n = 0, 1, 2, …
• Markov property:
  P(X_{n+1} = x_{n+1} | X_1 = x_1, …, X_n = x_n) = P(X_{n+1} = x_{n+1} | X_n = x_n).
• Time-homogeneous.
• Order
  – Future state depends on the past m states.
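The finite-state setup can be simulated directly from a transition matrix; a minimal sketch (the two-state “weather” chain and all names are illustrative assumptions, not from the slides):

```python
import random

def simulate_chain(P, start, steps, seed=0):
    """Simulate a finite-state Markov chain.
    P is the transition matrix as a list of rows; P[i][j] = p(i, j)."""
    rng = random.Random(seed)
    state, path = start, [start]
    for _ in range(steps):
        u, cum = rng.random(), 0.0
        for j, p in enumerate(P[state]):   # inverse-CDF draw of the next state
            cum += p
            if u < cum:
                state = j
                break
        path.append(state)
    return path

# Illustrative two-state chain: 0 = sunny, 1 = rainy.
P = [[0.9, 0.1],
     [0.5, 0.5]]
path = simulate_chain(P, start=0, steps=1000)
print(sum(path) / len(path))  # long-run fraction of state 1, near 1/6
```

Note the draws are identically distributed only in the long run, and never independent: each X_{n+1} depends on X_n.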
Key parameters
• Transition matrix P = {p(i, j)}, where p(i, j) = P(X_n = j | X_{n−1} = i).
• Initial probability distribution π^(0), with π^(n)(i) = P(X_n = i).
• Stationary distribution (invariant/equilibrium) π, satisfying π P = π.
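A stationary distribution satisfying π P = π can be approximated by repeatedly applying the transition matrix to an initial distribution (power iteration); a small sketch with an assumed two-state chain:

```python
def stationary(P, iters=500):
    """Approximate the stationary distribution (solving pi = pi P)
    by power iteration from the uniform initial distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        # one step of pi^(n+1) = pi^(n) P
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Illustrative two-state chain; its stationary distribution is (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(P)
print([round(p, 4) for p in pi])  # -> [0.8333, 0.1667]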
Reducibility
• A state j is accessible from state i (written i → j) if
  p_{ij}^{(n)} = P(X_n = j | X_0 = i) > 0 for some n ≥ 0.
• A Markov chain is irreducible if it is possible to get to any state from any state.
Recurrence
• A state i is transient if given that we start in
state i, there is a non-zero probability that
we will never return to i. State i is recurrent
if it is not transient.
Ergodicity
• A state i is ergodic if it is aperiodic and
positive recurrent. If all states in an
irreducible Markov chain are ergodic, the
chain is ergodic.
Reversible Markov chains
• Consider an ergodic Markov chain that converges to an invariant distribution π. A Markov chain is reversible if for all x, y ∈ S,
  π(x) p(x, y) = π(y) p(y, x),
  which is known as the detailed balance equation.
• An ergodic chain in equilibrium that satisfies the detailed balance condition has π as its unique stationary distribution.
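The detailed balance equation is easy to check numerically; a small sketch (the example chain and the tolerance are illustrative assumptions):

```python
def detailed_balance_holds(pi, P, tol=1e-12):
    """Check pi(x) p(x, y) == pi(y) p(y, x) for all state pairs."""
    n = len(P)
    return all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) < tol
               for x in range(n) for y in range(n))

# A two-state chain with stationary distribution pi = (5/6, 1/6)
# is reversible: pi(0) p(0,1) = (5/6)(0.1) = (1/6)(0.5) = pi(1) p(1,0).
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(detailed_balance_holds([5/6, 1/6], P))  # -> True
```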
Markov Chain Monte Carlo
• The goal is to generate a sequence of random samples from an arbitrary probability density function (usually high dimensional),
• The sequence of random samples forms a Markov chain.
• In a Markov chain, P → π; in MCMC, π → P.
Examples
• Gaussian mixture likelihood (K components, M dimensions; E assigns each observation i to a component):

  P(X | E, μ, σ²) = ∏_{k=1}^{K} ∏_{j=1}^{M} ∏_{i: E_i = k} (1/√(2π σ_kj²)) exp( −(x_ij − μ_kj)² / (2σ_kj²) ).

• Integrating out (μ_kj, σ_kj²) under conjugate priors gives the marginal likelihood

  P(X | E) = ∏_{k=1}^{K} ∏_{j=1}^{M} ∫∫ [ ∏_{i: E_i = k} P(x_ij | μ_kj, σ_kj²) ] p(μ_kj | σ_kj²) p(σ_kj²) dμ_kj dσ_kj²,

  which is available in closed form in terms of the component counts n_k, the prior parameters (a, b), and the within-component sample means and sums of squares.
Bayesian Inference
• Data: genotypes Y = (y_1, …, y_n); missing data: haplotype assignments Z = (z_1, …, z_n); parameters: haplotype frequencies Θ = (θ_1, …, θ_m).
• Likelihood:

  P(Y, Z | Θ) = ∏_{i=1}^{n} θ_{z_{i1}} θ_{z_{i2}},

  where (z_{i1}, z_{i2}) is the haplotype pair assigned to genotype y_i.
• Prior: Θ ~ Dirichlet(β_1, …, β_m).
History
• Metropolis, N., Rosenbluth, A. W.,
Rosenbluth, M. N., Teller, A. H. and Teller,
E. (1953).
Equation of state calculations by fast
computing machines. Journal of Chemical
Physics, 21, 1087–1092.
Metropolis algorithm
• Direct sampling from the target distribution is difficult,
• Generate candidate draws from a proposal distribution,
• These draws are then “corrected” so that, asymptotically, they can be viewed as random samples from the desired target distribution.
Pseudo code
• Initialize X_0,
• Repeat for i = 1, 2, …
  – Sample Y ~ q(X_{i−1}, ·),
  – Sample U ~ Uniform(0, 1),
  – If U ≤ α(X_{i−1}, Y), set X_i = Y,
  – Otherwise set X_i = X_{i−1}.
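The steps above can be turned into a runnable sketch. Assuming a symmetric random-walk proposal (so the acceptance probability reduces to min(1, π(Y)/π(X))), and working on the log scale for numerical stability; all names are illustrative:

```python
import math
import random

def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: Gaussian proposal centered at the current
    state is symmetric, so alpha = min(1, pi(y)/pi(x))."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        y = x + rng.gauss(0.0, step)                       # sample Y ~ q(x, .)
        # accept if U <= pi(y)/pi(x), done on the log scale
        if math.log(rng.random()) <= log_target(y) - log_target(x):
            x = y                                          # accept: X_i = Y
        samples.append(x)                                  # else X_i = X_{i-1}
    return samples

# Target: standard normal, known only up to a normalizing constant.
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=50_000)
mean = sum(draws) / len(draws)
print(round(mean, 1))  # sample mean near 0
```

Note that only the ratio of target densities appears, so the normalizing constant of π is never needed.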
An informal derivation
• Find α(x, y):
• The joint density of the current Markov chain state and the proposal is g(x, y) = q(x, y)π(x).
• Suppose q satisfies detailed balance: q(x, y)π(x) = q(y, x)π(y).
• If q(x, y)π(x) > q(y, x)π(y), introduce α(x, y) < 1 and α(y, x) = 1; detailed balance for the corrected chain requires π(x)q(x, y)α(x, y) = π(y)q(y, x), hence
  α(x, y) = π(y)q(y, x) / (π(x)q(x, y)).
• If q(x, y)π(x) < q(y, x)π(y), the symmetric argument applies.
• The probability of acceptance is
  α(x, y) = min{ 1, π(y)q(y, x) / (π(x)q(x, y)) }.
Metropolis-Hastings Algorithm

  r = π(y) T(y, x_t) / ( π(x_t) T(x_t, y) ).
Remarks
• Relies only on calculation of the target pdf up to a normalizing constant.
Remarks
• How to choose a good proposal function is crucial.
• Sometimes tuning is needed.
  – Rule of thumb: ~30% acceptance rate.
• Convergence is slow in high-dimensional cases.
Illustration of Metropolis-Hastings
• Suppose we try to sample from a bivariate normal distribution,
  (X, Y) ~ N( (0, 0), [[1, ρ], [ρ, 1]] ).
• Start from (0, 0).
• The proposed move at each step is a two-dimensional random walk:
  x_{t+1} = x_t + s cos(U),
  y_{t+1} = y_t + s sin(U),
  with s ~ Uniform(0, 1) and U ~ Uniform(0, 2π).
Illustration of Metropolis-Hastings
• At each step, calculate
  r = π(x_{t+1}, y_{t+1}) / π(x_t, y_t),
  since the proposal is symmetric: T((x_t, y_t), (x_{t+1}, y_{t+1})) = T((x_{t+1}, y_{t+1}), (x_t, y_t)).
• Explicitly,

  r = exp( −(x_{t+1}² − 2ρ x_{t+1} y_{t+1} + y_{t+1}²) / (2(1−ρ²)) ) / exp( −(x_t² − 2ρ x_t y_t + y_t²) / (2(1−ρ²)) )
    = exp( −(x_{t+1}² − 2ρ x_{t+1} y_{t+1} + y_{t+1}² − x_t² + 2ρ x_t y_t − y_t²) / (2(1−ρ²)) ).
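Putting the illustration together as a runnable sketch (the step-length scale s_max and all names are my assumptions layered on the slides' s ~ Uniform(0, 1), U ~ Uniform(0, 2π) description; the slides' later s = 0.5 and s = 3.0 are treated as values of this scale):

```python
import math
import random

def bvn_mh(rho, n_steps, s_max=1.0, seed=0):
    """Metropolis sampler for a bivariate normal with unit variances and
    correlation rho, using the polar random-walk proposal
    (x + s cos(theta), y + s sin(theta))."""
    rng = random.Random(seed)

    def log_pi(x, y):  # log density up to an additive constant
        return -(x * x - 2 * rho * x * y + y * y) / (2 * (1 - rho ** 2))

    x, y = 0.0, 0.0                          # start from (0, 0)
    chain = []
    for _ in range(n_steps):
        s = s_max * rng.random()             # step length
        theta = 2 * math.pi * rng.random()   # step direction
        xp, yp = x + s * math.cos(theta), y + s * math.sin(theta)
        # proposal is symmetric, so r = pi(x', y') / pi(x, y)
        if math.log(rng.random()) <= log_pi(xp, yp) - log_pi(x, y):
            x, y = xp, yp
        chain.append((x, y))
    return chain

chain = bvn_mh(rho=0.5, n_steps=20_000)
mx = sum(p[0] for p in chain) / len(chain)
print(round(mx, 2))  # sample mean of x, should be near 0
```

Rerunning with a very small or very large s_max reproduces the slow-mixing behavior seen in the trace and autocorrelation plots that follow.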
Convergence
• Trace plot: Good
[Trace plots of x and y versus iteration index (0–1000).]
Convergence
• Trace plot: Bad
[Trace plots of x[1:500] and y[1:500] versus iteration index (0–500).]
Convergence
• Trace plot
[Trace plots of x and y versus iteration index (0–5000).]
Convergence
• Autocorrelation plot: Good
[ACF of x and of y, lags 0–30.]
Convergence
• Autocorrelation plot: Bad (s = 0.5)
[ACF of x and of y, lags 0–30.]
Convergence
• Autocorrelation plot: Okay (s = 3.0)
[ACF of x and of y, lags 0–30.]
References
• Metropolis et al. (1953),
• Hastings (1970),
• Tutorial paper:
Chib and Greenberg (1995). Understanding the Metropolis-Hastings Algorithm. The American Statistician, 49, 327-335.
Gibbs Sampler
• Purpose: draw random samples from a joint distribution (high dimensional),
  x = (x_1, x_2, …, x_n) ~ target π(x).
• Method: iterative conditional sampling,
  for each i in turn, draw x_i ~ π(x_i | x_{−i}).
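For the bivariate normal used in the earlier illustration, the full conditionals are available in closed form: for a standard bivariate normal with correlation ρ, x | y ~ N(ρy, 1−ρ²) and symmetrically for y | x. That gives a compact runnable sketch (names are illustrative):

```python
import random

def gibbs_bvn(rho, n_steps, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each full conditional is N(rho * other, 1 - rho^2)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    sd = (1 - rho ** 2) ** 0.5
    chain = []
    for _ in range(n_steps):
        x = rng.gauss(rho * y, sd)   # draw x ~ pi(x | y)
        y = rng.gauss(rho * x, sd)   # draw y ~ pi(y | x)
        chain.append((x, y))
    return chain

chain = gibbs_bvn(rho=0.8, n_steps=20_000)
corr_xy = sum(x * y for x, y in chain) / len(chain)
print(round(corr_xy, 1))  # near rho = 0.8
```

Unlike Metropolis-Hastings, every conditional draw is accepted, but consecutive samples remain correlated (the more so the larger |ρ|).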
Illustration of Gibbs Sampler
References
• Geman and Geman 1984,
• Gelfand and Smith 1990,
• Tutorial paper:
Casella and George (1992). Explaining the
Gibbs sampler. The American Statistician,
46, 167-174.
Remarks
• The Gibbs sampler is a special case of Metropolis-Hastings.
• Compared to the EM algorithm, the Gibbs sampler and Metropolis-Hastings are stochastic procedures.
• Verify convergence of the sequence.
• Require burn-in.
• Use multiple chains.