
Page 1: MCMC: Gibbs Sampler

MCMC: Gibbs Sampler

Page 2: MCMC: Gibbs Sampler

Building Complex Models: Conditional Sampling

● Consider a more complex model

● When sampling each parameter iteratively, we only need to consider the distributions that involve that parameter

p(b, μ, σ² | y) ∝ p(y | b) p(b | μ, σ²) p(μ) p(σ²)

p(b | μ, σ², y) ∝ p(y | b) p(b | μ, σ²)

p(μ | b, σ², y) ∝ p(b | μ, σ²) p(μ)

p(σ² | b, μ, y) ∝ p(b | μ, σ²) p(σ²)

Page 3: MCMC: Gibbs Sampler

In one MCMC step

p(b⁽ᵍ⁺¹⁾ | μ⁽ᵍ⁾, σ²⁽ᵍ⁾, y) ∝ p(y | b⁽ᵍ⁺¹⁾) p(b⁽ᵍ⁺¹⁾ | μ⁽ᵍ⁾, σ²⁽ᵍ⁾)

p(μ⁽ᵍ⁺¹⁾ | b⁽ᵍ⁺¹⁾, σ²⁽ᵍ⁾, y) ∝ p(b⁽ᵍ⁺¹⁾ | μ⁽ᵍ⁺¹⁾, σ²⁽ᵍ⁾) p(μ⁽ᵍ⁺¹⁾)

p(σ²⁽ᵍ⁺¹⁾ | b⁽ᵍ⁺¹⁾, μ⁽ᵍ⁺¹⁾, y) ∝ p(b⁽ᵍ⁺¹⁾ | μ⁽ᵍ⁺¹⁾, σ²⁽ᵍ⁺¹⁾) p(σ²⁽ᵍ⁺¹⁾)

Page 4: MCMC: Gibbs Sampler

Gibbs Sampling

● If we can solve a conditional distribution analytically
  – We can sample from that conditional directly
  – Even if we can't solve for the full joint distribution analytically

● Example

p(μ, σ² | y) ∝ N(y | μ, σ²) N(μ | μ₀, V₀) IG(σ² | α, β)

Page 5: MCMC: Gibbs Sampler

p(μ⁽ᵍ⁺¹⁾ | σ²⁽ᵍ⁾, y) ∝ N(y | μ, σ²⁽ᵍ⁾) N(μ | μ₀, V₀)

  = N( [ nȳ/σ²⁽ᵍ⁾ + μ₀/V₀ ] / [ n/σ²⁽ᵍ⁾ + 1/V₀ ] ,  1 / [ n/σ²⁽ᵍ⁾ + 1/V₀ ] )

p(σ²⁽ᵍ⁺¹⁾ | μ⁽ᵍ⁺¹⁾, y) ∝ N(y | μ⁽ᵍ⁺¹⁾, σ²) IG(σ² | α, β)

  = IG( α + n/2 ,  β + ½ ∑ᵢ (yᵢ − μ⁽ᵍ⁺¹⁾)² )
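A minimal numpy sketch of these two conditional draws (not from the original slides; the data y, the prior values μ₀ = 0, V₀ = 1000, α = β = 0.1, and the seed are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data and priors (assumptions for illustration only)
y = rng.normal(5.0, 2.0, size=50)
n = len(y)
mu0, V0 = 0.0, 1000.0       # Normal prior on mu: N(mu | mu0, V0)
alpha, beta = 0.1, 0.1      # Inverse-Gamma prior on sigma^2: IG(sigma^2 | alpha, beta)

def draw_mu(sigma2):
    """Draw mu from its full conditional: precision-weighted Normal."""
    prec = n / sigma2 + 1.0 / V0
    mean = (y.sum() / sigma2 + mu0 / V0) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec))

def draw_sigma2(mu):
    """Draw sigma^2 from its full conditional: IG(alpha + n/2, beta + SS/2)."""
    shape = alpha + n / 2.0
    rate = beta + 0.5 * np.sum((y - mu) ** 2)
    return 1.0 / rng.gamma(shape, 1.0 / rate)   # 1/Gamma(shape, scale=1/rate) is IG(shape, rate)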

Page 6: MCMC: Gibbs Sampler

Trade-offs of Gibbs Sampling

● 100% acceptance rate
● Quicker mixing, quicker convergence
● No need to specify a jump distribution
● No need to tune a jump variance
● Requires that we know the analytical solution for the conditional
  – Conjugate distributions
● Can mix different samplers within an MCMC

Page 7: MCMC: Gibbs Sampler

Gibbs Sampler

1) Start from some initial parameter value θc

2) Evaluate the unnormalized posterior

3) Propose a new parameter value: a random draw from its conditional distribution given the current values of all other parameters

4) Evaluate the new unnormalized posterior

5) Decide whether or not to accept the new value: accept the new value 100% of the time

6) Repeat 3-5
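Putting steps 1-6 together for the Normal mean/variance example from the earlier slide, a hedged sketch of the full loop (the data, priors, initial values, and chain length are all assumptions):

import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(5.0, 2.0, size=50)               # hypothetical data
n = len(y)
mu0, V0, alpha, beta = 0.0, 1000.0, 0.1, 0.1    # assumed priors

n_iter = 5000
mu, sigma2 = y.mean(), y.var()                  # 1) initial parameter values
samples = np.empty((n_iter, 2))

for g in range(n_iter):                         # 6) repeat steps 3-5
    # 3)-5) draw mu from its conditional; the draw is always "accepted"
    prec = n / sigma2 + 1.0 / V0
    mu = rng.normal((y.sum() / sigma2 + mu0 / V0) / prec, np.sqrt(1.0 / prec))
    # 3)-5) draw sigma^2 from its Inverse-Gamma conditional
    sigma2 = 1.0 / rng.gamma(alpha + n / 2.0, 1.0 / (beta + 0.5 * np.sum((y - mu) ** 2)))
    samples[g] = mu, sigma2

Note that steps 2 and 4 (evaluating the unnormalized posterior) drop out: because each proposal is a draw from the full conditional itself, no accept/reject calculation is needed.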

Page 8: MCMC: Gibbs Sampler

Metropolis Algorithm

1) Start from some initial parameter value θc

2) Evaluate the unnormalized posterior p(θc)

3) Propose a new parameter value θ*: a random draw from a “jump” distribution centered on the current parameter value

4) Evaluate the new unnormalized posterior p(θ*)

5) Decide whether or not to accept the new value: accept the new value with probability a = p(θ*) / p(θc)

6) Repeat 3-5
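For comparison, a minimal sketch of these Metropolis steps for a single parameter θ (the data, the known σ = 2, the jump standard deviation 0.5, and the flat prior are assumptions; in practice the ratio is usually computed on the log scale):

import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, size=50)        # hypothetical data

def unnorm_post(theta):
    """Unnormalized posterior: Normal likelihood with known sigma = 2, flat prior."""
    return np.exp(-0.5 * np.sum((y - theta) ** 2) / 2.0 ** 2)

n_iter = 5000
theta_c = 0.0                            # 1) initial parameter value
p_c = unnorm_post(theta_c)               # 2) evaluate unnormalized posterior p(theta_c)
chain = np.empty(n_iter)

for g in range(n_iter):                  # 6) repeat steps 3-5
    theta_new = rng.normal(theta_c, 0.5)       # 3) draw from a "jump" distribution
    p_new = unnorm_post(theta_new)             # 4) evaluate p(theta_new)
    if rng.uniform() < p_new / p_c:            # 5) accept with probability p(new)/p(current)
        theta_c, p_c = theta_new, p_new
    chain[g] = theta_c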

Page 9: MCMC: Gibbs Sampler

Bayesian Regression via Gibbs Sampling

● Consider the standard regression model

– Y = response variable

– Xj = covariates

– bj = parameters

yᵢ = b₀ + b₁xᵢ,₁ + b₂xᵢ,₂ + ⋯ + bₙxᵢ,ₙ + εᵢ

εᵢ ~ N(0, σ²)

Page 10: MCMC: Gibbs Sampler

Matrix version

● As noted earlier, this can be expressed as

  yᵢ = b₀ + b₁xᵢ,₁ + b₂xᵢ,₂ + ⋯ + bₙxᵢ,ₙ + εᵢ

  yᵢ = Xᵢb + εᵢ

● The full data set can be expressed as

  y = Xb + ε
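As a small illustration of the matrix form (the covariate, parameter values, and error SD below are made up), building X and y in numpy:

import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.uniform(0, 10, size=n)           # hypothetical covariate
X = np.column_stack([np.ones(n), x1])     # design matrix: column of 1s for b0, then the covariate
b_true = np.array([2.0, -0.5])            # assumed "true" parameters
eps = rng.normal(0.0, 1.0, size=n)        # errors with sigma = 1
y = X @ b_true + eps                      # matrix form of y_i = b0 + b1*x_i,1 + eps_i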

Page 11: MCMC: Gibbs Sampler

Quick review of matrix math

● Vector

  b = [b₁, b₂, b₃]ᵀ    (3 x 1)

● Transpose

  bᵀ = [b₁ b₂ b₃]    (1 x 3)

● Vector x vector

  bᵀ × b = [b₁ b₂ b₃] × [b₁, b₂, b₃]ᵀ = b₁² + b₂² + b₃²    ((1 x 3)(3 x 1) = 1 x 1)

See Appendix C for more details

Page 12: MCMC: Gibbs Sampler

Quick review of matrix math

● Vector x vector

  b × bᵀ = [b₁, b₂, b₃]ᵀ × [b₁ b₂ b₃]
    = [ b₁²    b₁b₂   b₁b₃
        b₂b₁   b₂²    b₂b₃
        b₃b₁   b₃b₂   b₃²  ]    ((3 x 1)(1 x 3) = 3 x 3)

● Vector x matrix

  X b    ((n x 3)(3 x 1) = n x 1)

● Matrix x Matrix

  X Xᵀ = Z    ((n x 3)(3 x n) = n x n)

Page 13: MCMC: Gibbs Sampler

● Matrix x Matrix

  XᵀX = [ ∑xᵢ,₁²      ∑xᵢ,₁xᵢ,₂   ∑xᵢ,₁xᵢ,₃
          ∑xᵢ,₂xᵢ,₁   ∑xᵢ,₂²      ∑xᵢ,₂xᵢ,₃
          ∑xᵢ,₃xᵢ,₁   ∑xᵢ,₃xᵢ,₂   ∑xᵢ,₃²   ]    ((3 x n)(n x 3) = 3 x 3)

● Identity Matrix

  I = [ 1 0 0
        0 1 0
        0 0 1 ]

● Matrix Inversion

  X⁻¹X = X X⁻¹ = I
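These operations map directly onto numpy (a small sketch with made-up values; the random X is assumed to be full rank so the inverse exists):

import numpy as np

b = np.array([1.0, 2.0, 3.0])                       # 3 x 1 vector
X = np.random.default_rng(3).normal(size=(4, 3))    # n x 3 matrix with n = 4

bTb = b @ b                    # b^T b : (1 x 3)(3 x 1) = scalar, b1^2 + b2^2 + b3^2
bbT = np.outer(b, b)           # b b^T : (3 x 1)(1 x 3) = 3 x 3 matrix
Xb = X @ b                     # X b   : (n x 3)(3 x 1) = n x 1
XtX = X.T @ X                  # X^T X : (3 x n)(n x 3) = 3 x 3
I3 = np.eye(3)                 # 3 x 3 identity matrix
XtX_inv = np.linalg.inv(XtX)   # matrix inverse: XtX_inv @ XtX is (numerically) I3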

Page 14: MCMC: Gibbs Sampler

...back to regression

y = Xb + ε,   ε ~ Nₙ(0, σ²I)          (process model, data model)

Likelihood:
  p(y | b, σ², X) = Nₙ(y | Xb, σ²I)

Priors (parameter model):
  p(b) = Nₚ(b | b₀, V_b)
  p(σ²) = IG(σ² | s₁, s₂)

Page 15: MCMC: Gibbs Sampler

General pattern

● Write down likelihood
  – Data model
  – Process model
● Write down priors for the free parameters in the likelihood
  – Parameter model
● Solve for posterior distribution of model parameters
  – Analytical
  – Numerical

Page 16: MCMC: Gibbs Sampler

Posterior

p(b, σ² | X, y) ∝ Nₙ(y | Xb, σ²I) × Nₚ(b | b₀, V_b) IG(σ² | s₁, s₂)

Sample in terms of conditionals:

p(b | σ², X, y) ∝ Nₙ(y | Xb, σ²I) Nₚ(b | b₀, V_b)

p(σ² | b, X, y) ∝ Nₙ(y | Xb, σ²I) IG(σ² | s₁, s₂)

Page 17: MCMC: Gibbs Sampler

p(b | σ², X, y) ∝ Nₙ(y | Xb, σ²I) Nₚ(b | b₀, V_b)

Regression Parameters

Will have a posterior that is multivariate Normal...

Nₚ(b | Vv, V) = (2π)^(−p/2) |V|^(−1/2) exp[ −½ (b − Vv)ᵀ V⁻¹ (b − Vv) ]

What are V and v??

Page 18: MCMC: Gibbs Sampler

(b − Vv)ᵀ V⁻¹ (b − Vv)

  = bᵀV⁻¹b − bᵀV⁻¹Vv − vᵀVV⁻¹b + vᵀVV⁻¹Vv

  = bᵀV⁻¹b − bᵀv − vᵀb + vᵀVv

  = bᵀV⁻¹b − 2bᵀv + vᵀVv

Page 19: MCMC: Gibbs Sampler

(y − Xb)ᵀ(σ²I)⁻¹(y − Xb) + (b − b₀)ᵀV_b⁻¹(b − b₀)

  = σ⁻²yᵀy − σ⁻²yᵀXb − σ⁻²bᵀXᵀy + σ⁻²bᵀXᵀXb + bᵀV_b⁻¹b − bᵀV_b⁻¹b₀ − b₀ᵀV_b⁻¹b + b₀ᵀV_b⁻¹b₀

Matching terms against bᵀV⁻¹b − 2bᵀv + vᵀVv:

bᵀV⁻¹b  ↔  σ⁻²bᵀXᵀXb + bᵀV_b⁻¹b    ⇒    V⁻¹ = σ⁻²XᵀX + V_b⁻¹

−2bᵀv  ↔  −σ⁻²yᵀXb − σ⁻²bᵀXᵀy − bᵀV_b⁻¹b₀ − b₀ᵀV_b⁻¹b    ⇒    v = σ⁻²Xᵀy + V_b⁻¹b₀
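A hedged numpy sketch of this conditional draw for b (the simulated data, the prior mean b₀ = 0, the prior precision V_b⁻¹ = 10⁻⁵ I, and the current σ² are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data and priors (assumptions for illustration)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = X @ np.array([2.0, -0.5]) + rng.normal(0, 1, n)
b0 = np.zeros(2)                 # prior mean b_0
Vb_inv = np.eye(2) * 1e-5        # prior precision V_b^{-1}
sigma2 = 1.0                     # current value of sigma^2 in the Gibbs sweep

# V^{-1} = sigma^{-2} X^T X + V_b^{-1},   v = sigma^{-2} X^T y + V_b^{-1} b_0
V = np.linalg.inv(X.T @ X / sigma2 + Vb_inv)
v = X.T @ y / sigma2 + Vb_inv @ b0

b_draw = rng.multivariate_normal(V @ v, V)   # draw b from N(Vv, V)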

Page 20: MCMC: Gibbs Sampler

p(σ² | b, X, y) ∝ Nₙ(y | Xb, σ²I) IG(σ² | s₁, s₂)

Variance

Will have a posterior that is Inverse Gamma...

IG(σ² | u₁, u₂) ∝ (σ²)^(−u₁−1) exp[ −u₂ / σ² ]

∝ (1/σ²)^(n/2) exp[ −(1/(2σ²)) (y − Xb)ᵀ(y − Xb) ] × (σ²)^(−s₁−1) exp[ −s₂ / σ² ]

u₁ = s₁ + n/2

u₂ = s₂ + ½ (y − Xb)ᵀ(y − Xb)
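And the matching draw for σ² (again a sketch with assumed data, an assumed current value of b, and assumed prior shape/rate s₁, s₂):

import numpy as np

rng = np.random.default_rng(5)

# Hypothetical data, current b, and IG prior (assumptions for illustration)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = X @ np.array([2.0, -0.5]) + rng.normal(0, 1, n)
b = np.array([2.0, -0.5])        # current value of b in the Gibbs sweep
s1, s2 = 0.1, 0.001              # IG prior shape and rate

resid = y - X @ b
u1 = s1 + n / 2.0                             # u1 = s1 + n/2
u2 = s2 + 0.5 * resid @ resid                 # u2 = s2 + (1/2)(y - Xb)^T (y - Xb)
sigma2_draw = 1.0 / rng.gamma(u1, 1.0 / u2)   # IG(u1, u2) via 1/Gamma(u1, scale = 1/u2)

Alternating this draw with the multivariate Normal draw for b from the previous slide gives the Gibbs sampler for the regression; the BUGS model on the next slide specifies the same model, with dnorm parameterized by the precision tau = 1/σ².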

Page 21: MCMC: Gibbs Sampler

BUGS implementation

model{
  ## prior distributions
  b[1] ~ dnorm(0, 1.0E-5)
  b[2] ~ dnorm(0, 1.0E-5)
  tau ~ dgamma(0.1, 0.001)

  ## likelihood
  for(i in 1:n){
    mu[i] <- b[1] + x[i] * b[2]
    y[i] ~ dnorm(mu[i], tau)
  }
}

Page 22: MCMC: Gibbs Sampler
Page 23: MCMC: Gibbs Sampler

[Figure: marginal posterior densities of b[1] and b[2] (P(b[1]) vs b[1]; P(b[2]) vs b[2])]
Page 24: MCMC: Gibbs Sampler

[Figure: scatterplot of posterior samples, b[2] vs b[1]]
Page 25: MCMC: Gibbs Sampler

[Figure: marginal posterior density of tau (P(tau) vs tau)]
Page 26: MCMC: Gibbs Sampler

[Figure: autocorrelation of the b[1] chain vs lag (0 to 100)]
Page 27: MCMC: Gibbs Sampler

Centering the data

Subtracting the mean of x from each covariate value reduces the posterior correlation between the intercept b[1] and the slope b[2], which improves mixing of the chain.

model{
  ## prior distributions
  b[1] ~ dnorm(0, 1.0E-5)
  b[2] ~ dnorm(0, 1.0E-5)
  tau ~ dgamma(0.1, 0.001)
  mX <- mean(x[])

  ## likelihood
  for(i in 1:n){
    mu[i] <- b[1] + (x[i] - mX) * b[2]
    y[i] ~ dnorm(mu[i], tau)
  }
}

Page 28: MCMC: Gibbs Sampler

[Figure: scatterplot of posterior samples, b[2] vs b[1], for the centered model]
Page 29: MCMC: Gibbs Sampler

Outline

● Different numerical techniques for sampling from the posterior
  – Rejection Sampling
  – Inverse Distribution Sampling
  – Markov Chain Monte Carlo (MCMC)
    ● Metropolis
    ● Metropolis-Hastings
    ● Gibbs sampling
● Sampling conditionals vs. full model
● Flexibility to specify complex models

Page 30: MCMC: Gibbs Sampler

Where things stand...

● We've now filled our “toolbox” with the basic tools that will enable us to explore more advanced topics
  – Probability rules & distributions
  – Data models & Likelihoods
  – Process models
  – Parameter models & Prior distributions
  – Analytical and Numerical Methods in ML and Bayes

Page 31: MCMC: Gibbs Sampler

General pattern - Bayes

● Write down likelihood
  – Data model
  – Process model
● Write down priors for the free parameters in the likelihood
  – Parameter model
● Solve for posterior distribution of model parameters
  – Analytical
  – Numerical

Page 32: MCMC: Gibbs Sampler

General Pattern - Maximum Likelihood

● Step 1: Write a likelihood function describing the likelihood of the observation

● Step 2: Find the value of the model parameter that maximized the likelihood– Calculate -lnL

– Analytical● Take derivative with respect to EACH parameter● Set = 0 , solve for parameter

– Numerical● Find minimum of -lnL surface

dLd

=0

Page 33: MCMC: Gibbs Sampler

Where we're going...

● Section 2 is the “meat” of the course:
  – Interval estimation
    ● Confidence and Credible intervals
    ● Predictive intervals
  – Model selection and hypothesis testing
  – GLMs
  – Nonlinear models
  – Hierarchical models and random effects
  – Measurement error and missing data

● Section 3 is more advanced applications

Page 34: MCMC: Gibbs Sampler