
Data Analysis and Probabilistic Inference

Approximate Inference: Sampling

Recommended reading: Bishop, Chapter 11; Iain Murray's tutorial: http://tinyurl.com/jxb6t7f; Murphy, Chapters 23–24

Marc Deisenroth
Department of Computing
Imperial College London

March 1, 2018


Monte Carlo Methods—Motivation

§ Monte Carlo methods are computational techniques that make use of random numbers

§ Two typical problems:

  1. Problem 1: Generate samples {x^(s)} from a given probability distribution p(x), e.g., for simulation (generative models) or representations of distributions

  2. Problem 2: Compute expectations of functions under that distribution:

     $\mathbb{E}[f(x)] = \int f(x)\,p(x)\,\mathrm{d}x$

     Example: Means/variances of distributions, marginal likelihood
     Complication: The integral cannot be evaluated analytically


Approximate Integration

§ Numerical integration (low-dimensional problems)

§ Bayesian quadrature, e.g., O'Hagan (1987, 1991); Rasmussen & Ghahramani (2003)

§ Variational Bayes, e.g., Jordan et al. (1999)

§ Expectation Propagation, Opper & Winther (2001); Minka (2001)

§ Monte Carlo methods, e.g., Gilks et al. (1996), Robert & Casella (2004), Bishop (2006)


Problem 2: Monte Carlo Estimation

§ Computing expectations via statistical sampling:

  $\mathbb{E}[f(x)] = \int f(x)\,p(x)\,\mathrm{d}x \approx \frac{1}{S}\sum_{s=1}^{S} f(x^{(s)}), \quad x^{(s)} \sim p(x)$

§ Making predictions (e.g., Bayesian regression with inputs x and targets y):

  $p(y\,|\,x) = \int p(y\,|\,\theta, x)\,\underbrace{p(\theta)}_{\text{parameter distribution}}\,\mathrm{d}\theta \approx \frac{1}{S}\sum_{s=1}^{S} p(y\,|\,\theta^{(s)}, x), \quad \theta^{(s)} \sim p(\theta)$

§ Key problem: Generating samples from p(x) or p(θ)
  Need to solve Problem 1
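For concreteness, a minimal Python sketch of this estimator; the target p(x) = N(0, 1) and the choice f(x) = x² are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_expectation(f, sampler, S):
    """Monte Carlo estimate of E[f(x)]: average f over S samples from p."""
    samples = sampler(S)           # x^(s) ~ p(x)
    return np.mean(f(samples))     # (1/S) sum_s f(x^(s))

# Illustrative choice: p(x) = N(0, 1), f(x) = x^2, so E[f(x)] = 1
estimate = mc_expectation(lambda x: x**2,
                          lambda S: rng.normal(0.0, 1.0, S),
                          S=10_000)
print(estimate)  # close to 1; the error shrinks like 1/sqrt(S)
```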


Properties of Monte Carlo Sampling

$\mathbb{E}[f(x)] = \int f(x)\,p(x)\,\mathrm{d}x \approx \frac{1}{S}\sum_{s=1}^{S} f(x^{(s)}), \quad x^{(s)} \sim p(x)$

§ Estimator is asymptotically consistent, i.e.,

  $\lim_{S \to \infty} \frac{1}{S}\sum_{s=1}^{S} f(x^{(s)}) = \mathbb{E}[f(x)] + \varepsilon$

§ Error ε is normal (Gaussian) and its variance shrinks ∝ 1/S, independent of the dimensionality

§ Estimator is unbiased


Monte Carlo Estimation

$\mathbb{E}[f(x)] = \int f(x)\,p(x)\,\mathrm{d}x \approx \frac{1}{S}\sum_{s=1}^{S} f(x^{(s)}), \quad x^{(s)} \sim p(x)$

§ How do we get these samples?

Need to solve Problem 1

§ Sampling from simple distributions

§ Sampling from complicated distributions


Sampling Discrete Values

[Figure: the unit interval divided into segments of length 0.3 (a), 0.2 (b), and 0.5 (c); the draw u = 0.55 falls into segment c]

§ u ~ U[0, 1], where U is the uniform distribution

§ u = 0.55 ⇒ x = c
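A minimal Python sketch of this inverse-CDF trick, using the outcomes and probabilities from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
outcomes, probs = ['a', 'b', 'c'], np.array([0.3, 0.2, 0.5])
cdf = np.cumsum(probs)                    # [0.3, 0.5, 1.0]

u = rng.uniform()                         # u ~ U[0, 1]
x = outcomes[np.searchsorted(cdf, u)]     # first segment whose cumulative mass exceeds u
# e.g., u = 0.55 lands in segment 'c', since 0.5 <= 0.55 < 1.0
```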


Continuous Variables

[Figure: an unnormalized density p̃(z) over z]

More complicated. Geometric intuition: sample uniformly from the area under the curve.


Rejection Sampling: Setting

[Figure: an unnormalized density p̃(z) over z]

§ Assume:
  § Sampling from p(z) is difficult
  § Evaluating p̃(z) = Z p(z) is easy (and Z may be unknown)

§ Find a simpler distribution (proposal distribution) q(z) from which we can easily draw samples (e.g., Gaussian, Laplace)

§ Find an upper bound k q(z) ≥ p̃(z)


Rejection Sampling: Algorithm

[Figure: envelope k q(z) over p̃(z); a pair (z0, u0) lands in the acceptance area under p̃(z) or in the rejection area between p̃(z) and k q(z). Adapted from PRML (Bishop, 2006)]

1. Generate z0 ~ q(z)

2. Generate u0 ~ U[0, k q(z0)]

3. If u0 > p̃(z0), reject the sample. Otherwise, retain z0
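A minimal Python sketch of these three steps; the unnormalized target p̃(z) = exp(-z⁴), the standard-normal proposal, and the bound k = 3 are illustrative assumptions (k = 3 does satisfy k q(z) ≥ p̃(z) here, since max_z p̃(z)/q(z) ≈ 2.67):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    return np.exp(-z**4)                               # unnormalized target, Z unknown

def q_pdf(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)    # N(0, 1) proposal density

k = 3.0   # chosen so that k * q(z) >= p_tilde(z) for all z

samples = []
while len(samples) < 1000:
    z0 = rng.normal()                         # 1. z0 ~ q(z)
    u0 = rng.uniform(0.0, k * q_pdf(z0))      # 2. u0 ~ U[0, k q(z0)]
    if u0 <= p_tilde(z0):                     # 3. accept iff u0 <= p_tilde(z0)
        samples.append(z0)
```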


Properties

[Figure: as on the previous slide. Adapted from PRML (Bishop, 2006)]

§ Accepted pairs (z, u) are uniformly distributed under the curve of p̃(z)

§ The marginal probability density of the z-coordinates of accepted points must therefore be proportional to p̃(z)

§ Samples are independent samples from p(z)


Sampling in High Dimensions

Example:

§ p(x) = N(0, σ_p² I), q(x) = N(0, σ_q² I), where σ_q = 1.01 σ_p

§ What is the value of k if x ∈ R^1000?

§ q(0) = 1/(2π σ_q²)^500. For k q ≥ p we need to set

  $k \ge \frac{p(0)}{q(0)} = \frac{(\sigma_q^2)^{500}}{(\sigma_p^2)^{500}} = \exp\big(1000 \ln \tfrac{\sigma_q}{\sigma_p}\big) = \exp(1000 \ln 1.01) \approx 20{,}000$

§ The acceptance rate is the ratio of the volume under p to the volume under kq. In our example: 1/k = 1/20,000.

§ In high dimensions the factor k is probably huge
  Low acceptance rate

§ Finding k is tricky
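The arithmetic in this bound is easy to check numerically, as in this tiny Python sketch:

```python
import numpy as np

d = 1000                                # dimensionality of x
sigma_ratio = 1.01                      # sigma_q / sigma_p
k = np.exp(d * np.log(sigma_ratio))     # minimal valid bound k = p(0)/q(0)
print(k, 1 / k)                         # k ~ 2.1e4, acceptance rate ~ 4.8e-5
```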


Shortcomings

[Figure: as on the rejection-sampling slides. Adapted from PRML (Bishop, 2006)]

§ Finding the upper bound k is tricky

§ In high dimensions the factor k is probably huge

§ Low acceptance rate/high rejection rate of samples


Importance Sampling

Key idea: Do not throw away all rejected samples, but give them lower weight, by rewriting the integral as an expectation under a simpler distribution q (the proposal distribution):

  $\mathbb{E}_p[f(x)] = \int f(x)\,p(x)\,\mathrm{d}x = \int f(x)\,\frac{p(x)}{q(x)}\,q(x)\,\mathrm{d}x = \mathbb{E}_q\Big[\frac{f(x)\,p(x)}{q(x)}\Big]$

If we choose q in a way that we can easily sample from it, we can approximate this last expectation by Monte Carlo:

  $\mathbb{E}_q\Big[\frac{f(x)\,p(x)}{q(x)}\Big] \approx \frac{1}{S}\sum_{s=1}^{S} \frac{f(x^{(s)})\,p(x^{(s)})}{q(x^{(s)})} = \frac{1}{S}\sum_{s=1}^{S} w_s\,f(x^{(s)}), \quad x^{(s)} \sim q(x)$
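A minimal Python sketch of this weighted estimator; the target p = N(0, 1), the wider proposal q = N(0, 2²), and f(x) = x² are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
S = 10_000

# Illustrative choice: target p = N(0, 1), proposal q = N(0, 2^2), f(x) = x^2
x = rng.normal(0.0, 2.0, S)                  # x^(s) ~ q(x)
w = norm.pdf(x, 0, 1) / norm.pdf(x, 0, 2)    # importance weights w_s = p(x^(s)) / q(x^(s))
estimate = np.mean(w * x**2)                 # (1/S) sum_s w_s f(x^(s)) ~= E_p[x^2] = 1
print(estimate)
```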


Properties

§ Unbiased if q > 0 wherever p > 0 and if we can evaluate p

§ Breaks down if we do not have enough samples (puts nearly all weight on a single sample)
  Degeneracy (see also Particle Filtering (Thrun et al., 2005))

§ Many draws from the proposal density q are required, especially in high dimensions

§ Requires being able to evaluate the true p. A generalization exists for p̃; this generalization is biased (but consistent).

§ Does not scale to interesting (high-dimensional) problems

A different approach is needed to sample from complicated (high-dimensional) distributions


Markov Chain Monte Carlo


Objective
Generate samples from an unknown target distribution.


Markov Chains

Key idea: Instead of generating independent samples x^(1), x^(2), ..., use a proposal density q that depends on the previous sample (state) x^(t)
  Samples are dependent

§ Markov property: $p(x^{(t+1)}\,|\,x^{(1)}, \ldots, x^{(t)}) = p(x^{(t+1)}\,|\,x^{(t)}) = T(x^{(t+1)}\,|\,x^{(t)})$, i.e., the chain only depends on its previous setting/state

§ T is called a transition operator

§ Example: $T(x^{(t+1)}\,|\,x^{(t)}) = \mathcal{N}(x^{(t+1)}\,|\,x^{(t)}, \sigma^2 I)$

§ Samples x^(1), ..., x^(t) form a Markov chain

§ Samples x^(1), ..., x^(t) are no longer independent, but unbiased
  We can still average them
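As a quick illustration, a minimal Python sketch of this Gaussian transition operator (the step size σ = 0.5 is an arbitrary assumption); iterating it produces a chain of dependent samples:

```python
import numpy as np

rng = np.random.default_rng(0)

def transition(x_t, sigma=0.5):
    """Gaussian transition operator T: x^(t+1) ~ N(x^(t), sigma^2 I)."""
    return x_t + sigma * rng.standard_normal(x_t.shape)

x = np.zeros(2)
chain = [x]
for _ in range(1000):
    x = transition(x)     # each state depends only on the previous one (Markov property)
    chain.append(x)
```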


Behavior of Markov Chains

From Iain Murray's MCMC tutorial

Four different behaviors of Markov chains:

§ Diverge (e.g., random-walk diffusion, where x^(t+1) ~ N(x^(t), I))

§ Converge to an absorbing state

§ Converge to a (deterministic) limit cycle

§ Converge to an equilibrium distribution p*: the Markov chain remains in a region, bouncing around in a random way


Converging to an Equilibrium Distribution

§ Remember the objective: Explore/sample parameters that may have generated our data (generate samples from the posterior)
  Bouncing around in an equilibrium distribution is a good thing

§ Design the Markov chain such that the equilibrium distribution is the desired distribution p(x)

§ Generate a Markov chain that converges to that equilibrium distribution (independent of the start state)

§ Although successive samples are dependent, we can effectively generate independent samples by running the Markov chain long enough: discard most of the samples and retain only every Mth sample


Conditions for Converging to an Equilibrium Distribution

Two Markov chain conditions:

§ Invariance/Stationarity: If you run the chain for a long time and you are in the equilibrium distribution, you stay in equilibrium if you take another step.
  Self-consistency property

§ Ergodicity: Any state can be reached from any state.
  The equilibrium distribution is the same no matter where we start

Property
Ergodic Markov chains have only one equilibrium distribution

Use ergodic and stationary Markov chains to generate samples from the equilibrium distribution


Invariance and Detailed Balance

§ Invariance: Each step leaves the distribution p* invariant (we stay in p*):

  $p^*(x') = \sum_x T(x'\,|\,x)\,p^*(x) \quad\text{(discrete)} \qquad p^*(x') = \int T(x'\,|\,x)\,p^*(x)\,\mathrm{d}x \quad\text{(continuous)}$

  Once we sample from p*, the transition operator will not change this, i.e., we do not fall back to some funny distribution p ≠ p*

§ Sufficient condition for p* being invariant: detailed balance,

  $p^*(x)\,T(x'\,|\,x) = p^*(x')\,T(x\,|\,x')$

  which also ensures that the Markov chain is reversible
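Why detailed balance suffices for invariance: sum the detailed-balance condition over x (the continuous case replaces the sum by an integral):

```latex
\sum_{x} T(x'\,|\,x)\, p^*(x)
  = \sum_{x} T(x\,|\,x')\, p^*(x')   % detailed balance, applied term by term
  = p^*(x') \underbrace{\sum_{x} T(x\,|\,x')}_{=\,1}
  = p^*(x'),
```

so one application of T leaves p* unchanged.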


Metropolis-Hastings

§ Assume that p̃ = Z p can be evaluated easily (in practice: log p̃)

§ The proposal density q(x'|x^(t)) depends on the last sample x^(t).
  Example: Gaussian with mean x^(t): $q(x'\,|\,x^{(t)}) = \mathcal{N}(x'\,|\,x^{(t)}, \Sigma)$

Metropolis-Hastings Algorithm

1. Generate proposal x' ~ q(x'|x^(t))

2. If

   $\frac{q(x^{(t)}\,|\,x')\,\tilde{p}(x')}{q(x'\,|\,x^{(t)})\,\tilde{p}(x^{(t)})} \ge u, \quad u \sim \mathcal{U}[0, 1],$

   accept the sample: x^(t+1) = x'. Otherwise set x^(t+1) = x^(t).

§ If the proposal distribution is symmetric: Metropolis algorithm (Metropolis et al., 1953); otherwise: Metropolis-Hastings algorithm (Hastings, 1970)
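A minimal Python sketch of this algorithm for the 1-D demo target used on the following slides, log p̃(x) = -x²/2, with a symmetric Gaussian random-walk proposal (so the q terms cancel and this is the plain Metropolis algorithm; σ is the step size to tune):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p_tilde(x):
    return -0.5 * x**2          # unnormalized standard Gaussian, in log space

def metropolis(log_p_tilde, x0=0.0, sigma=1.0, T=10_000):
    x, chain = x0, []
    for _ in range(T):
        x_prop = x + sigma * rng.standard_normal()         # symmetric proposal
        log_accept = log_p_tilde(x_prop) - log_p_tilde(x)  # log of p_tilde(x') / p_tilde(x)
        if np.log(rng.uniform()) <= log_accept:            # accept with prob min(1, ratio)
            x = x_prop                                     # accept: move to the proposal
        chain.append(x)                                    # reject: repeat the current state
    return np.array(chain)

samples = metropolis(log_p_tilde)
```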


Example

[Figure from Iain Murray's MCMC tutorial]


Step-Size Demo

§ Explore p(x) = N(x | 0, 1) for different step sizes σ.

§ We can only evaluate log p̃(x) = -x²/2

§ Proposal distribution q: a Gaussian $\mathcal{N}(x^{(t+1)}\,|\,x^{(t)}, \sigma^2)$ centered at the current state, for various step sizes σ

§ Expect to explore the space between -2 and 2 with high probability


Step-Size Demo: Discussion

§ The acceptance rate depends on the step size of the proposal distribution
  Exploration parameter

§ If we do not reject enough, the method does not work.

§ In rejection sampling we do not like rejections, but in MH rejections tell you where the target distribution is.

§ Theoretical results: acceptance rates of about 44% in 1D and about 25% in higher dimensions give good mixing properties

§ Tune the step size


Properties

§ Samples are correlated
  Adaptive rejection sampling generates independent samples

§ Unlike rejection sampling, the previous sample is re-used: if a proposal is discarded, the chain repeats its current state

§ If q > 0, we will end up in the equilibrium distribution: $p^{(t)}(x) \xrightarrow{t \to \infty} p^*(x)$

§ Explore the state space by random walk
  May take a while in high dimensions

§ No further catastrophic problems in high dimensions


Gibbs Sampling (Geman & Geman, 1984)

§ Assumption: p(x) = p(x_1, ..., x_n) is too complicated to draw samples from directly, but its conditionals $p(x_i\,|\,x_{\setminus i})$ are tractable to work with

§ Any distribution "with a name" (Gaussian, Laplace, Bernoulli, Gamma, Wishart, ...) is easy to sample from (standard libraries)


Algorithm

Assuming n parameters x_1, ..., x_n, Gibbs sampling samples individual variables conditioned on all others:

1. $x_1^{(t+1)} \sim p(x_1\,|\,x_2^{(t)}, \ldots, x_n^{(t)})$

2. $x_2^{(t+1)} \sim p(x_2\,|\,x_1^{(t+1)}, x_3^{(t)}, \ldots, x_n^{(t)})$

   ⋮

n. $x_n^{(t+1)} \sim p(x_n\,|\,x_1^{(t+1)}, \ldots, x_{n-1}^{(t+1)})$

[Figure: Gibbs sampling updates in a 2-D (z1, z2) space. From PRML (Bishop, 2006)]


Gibbs Sampling: Ergodicity

§ p(x) is invariant

§ Ergodicity: Sufficient to show that all conditionals are greater than 0.
  Then any point in x-space can be reached from any other point (potentially with low probability) in a finite number of steps involving one update of each of the component variables.


Finding the Conditionals

1. Write down the (log-) joint distribution p(x_1, ..., x_n)

2. For each x_i:
   2.1 Throw away all terms that do not depend on the current sampling variable
   2.2 Pretend this is the density for your variable of interest, with all other variables fixed. What distribution does the log-density remind you of?
   2.3 That is your conditional sampling density $p(x_i\,|\,x_{\setminus i})$


Example

§ Model:

  $y_i \sim \mathcal{N}(\mu, \tau^{-1}), \quad \mu \sim \mathcal{N}(0, 1), \quad \tau \sim \text{Gamma}(2, 1)$

§ Objective: Generate samples from the parameter posterior p(μ, τ | y)

§ Then

  $p(y, \mu, \tau) = \prod_{i=1}^{n} p(y_i\,|\,\mu, \tau)\,p(\mu)\,p(\tau) \propto \tau^{n/2} \exp\big(-\tfrac{\tau}{2}\sum_i (y_i - \mu)^2\big)\,\exp\big(-\tfrac{1}{2}\mu^2\big)\,\tau\,\exp(-\tau)$

  $p(\mu\,|\,\tau, y) = \mathcal{N}\Big(\frac{\tau \sum_i y_i}{1 + n\tau},\; (1 + n\tau)^{-1}\Big)$

  $p(\tau\,|\,\mu, y) = \text{Gamma}\Big(2 + \frac{n}{2},\; 1 + \frac{1}{2}\sum_i (y_i - \mu)^2\Big)$
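A minimal Python sketch of the resulting Gibbs sampler. Note that NumPy's gamma sampler is parameterized by shape and scale, so the rate above becomes scale = 1/rate; the synthetic data set is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(1.5, 0.7, size=50)       # synthetic data, illustrative only
n, y_sum = len(y), y.sum()

mu, tau = 0.0, 1.0                      # arbitrary initialization
samples = []
for t in range(5000):
    # mu | tau, y ~ N(tau * sum(y) / (1 + n*tau), (1 + n*tau)^{-1})
    prec = 1.0 + n * tau
    mu = rng.normal(tau * y_sum / prec, np.sqrt(1.0 / prec))
    # tau | mu, y ~ Gamma(2 + n/2, rate = 1 + 0.5 * sum((y_i - mu)^2))
    rate = 1.0 + 0.5 * np.sum((y - mu) ** 2)
    tau = rng.gamma(shape=2.0 + n / 2.0, scale=1.0 / rate)
    samples.append((mu, tau))
```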


Gibbs Sampling: Properties

§ Gibbs is Metropolis-Hastings with acceptance probability 1: the sequence of proposal distributions q is defined in terms of the conditional distributions of the joint p(x)
  Converges to the equilibrium distribution: $p^{(t)}(x) \xrightarrow{t \to \infty} p(x)$
  Exploration by random-walk behavior can be slow

§ No adjustable parameters (e.g., step size)

§ Applicability depends on how easy it is to draw samples from the conditionals

§ May not work well if the variables are correlated

§ Statistical software derives the conditionals of the model and works out how to do the updates: STAN¹, WinBUGS², JAGS³

¹ http://mc-stan.org/
² http://www.mrc-bsu.cam.ac.uk/software/bugs/
³ http://mcmc-jags.sourceforge.net/


Flavors of Gibbs Sampling

§ Collapsed Gibbs sampler: Analytically integrate out some parameters and sample the rest.
  Tends to be much more efficient, with smaller variance (see Rao-Blackwellization in the state-estimation literature)

§ Block Gibbs sampler: Sample groups of variables at a time instead of single-site updating


Slice Sampling (Neal, 2003)

[Figure: an unnormalized density p̃(z) over z]

§ Idea: Sample a point (random walk) uniformly under the curve p̃(x)

§ Introduce an additional variable u and define the joint p̂(x, u):

  $\hat{p}(x, u) = \begin{cases} 1/Z_p & \text{if } 0 \le u \le \tilde{p}(x) \\ 0 & \text{otherwise} \end{cases}, \qquad Z_p = \int \tilde{p}(x)\,\mathrm{d}x$

§ The marginal distribution over x is then

  $\int \hat{p}(x, u)\,\mathrm{d}u = \int_0^{\tilde{p}(x)} \frac{1}{Z_p}\,\mathrm{d}u = \frac{\tilde{p}(x)}{Z_p} = p(x)$

  Obtain samples from the unknown p(x) by sampling from p̂(x, u) and then ignoring the u values

§ Gibbs sampling: Update one variable at a time


Slice Sampling Algorithm

[Figure: one slice sampling step — given x^(t), a height u is drawn under p̃(x), and x^(t+1) is drawn uniformly from the slice, bracketed by [x_min, x_max]. Adapted from PRML (Bishop, 2006)]

§ Repeat the following steps:
  1. Draw u | x^(t) ~ U[0, p̃(x^(t))]
  2. Draw x^(t+1) | u ~ U[{x : p̃(x) > u}] (the "slice")

§ In practice, we sample x^(t+1) | u uniformly from an interval [x_min, x_max] around x^(t).

§ The interval is found adaptively (see Neal (2003) for details)
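A minimal one-dimensional sketch, assuming an unnormalised log-density log p̃ is available (the function names and the fixed-width bracket are illustrative choices, not from the slides). It uses a randomly positioned bracket with shrinkage, one of the valid interval schemes in Neal (2003); the full adaptive stepping-out procedure is omitted.

import numpy as np

def slice_sample_1d(log_p_tilde, x0, n_samples, width=2.0, rng=None):
    # 1-D slice sampling sketch: randomly positioned fixed-width bracket
    # plus shrinkage (Neal's adaptive stepping-out is omitted).
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    samples = np.empty(n_samples)
    for s in range(n_samples):
        # 1. Draw the slice height u | x ~ U[0, p_tilde(x)] (in log space)
        log_u = log_p_tilde(x) + np.log(rng.uniform())
        # 2. Randomly position a bracket of size `width` around x
        lo = x - width * rng.uniform()
        hi = lo + width
        # 3. Sample uniformly from the bracket, shrinking on rejection
        while True:
            x_new = rng.uniform(lo, hi)
            if log_p_tilde(x_new) > log_u:   # inside the slice: accept
                x = x_new
                break
            if x_new < x:                    # outside the slice: shrink
                lo = x_new                   # the bracket towards x
            else:
                hi = x_new
        samples[s] = x
    return samples

# Unnormalised standard Gaussian, log p_tilde(x) = -x^2 / 2
samples = slice_sample_1d(lambda x: -0.5 * x**2, x0=0.0, n_samples=5000)
print(samples.mean(), samples.std())         # roughly 0 and 1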


Relation to other Sampling Methods

Similar to:

§ Metropolis: Just need to be able to evaluate p̃(x)
  More robust to the choice of parameters (e.g., the step size is automatically adapted)

§ Gibbs: 1-dimensional transitions in state space
  No longer required that we can easily sample from the 1-D conditionals

§ Rejection: Asymptotically draws samples from the volume under the curve described by p̃
  No upper-bounding of p̃ required


Properties

§ Slice sampling can be applied to multivariate distributions by repeatedly sampling each variable/dimension in turn (similar to Gibbs sampling); a coordinate-wise sketch follows this list.

See (Neal, 2003; Murray et al., 2010) for more details

§ This requires computing a function that is proportional to p(x_i | x_{\i}) for all variables x_i.

§ No rejections

§ Adaptive step sizes

§ Easy to implement

§ Broadly applicable
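The coordinate-wise scheme mentioned above could look roughly as follows (an illustrative sketch under the same assumptions as the 1-D example, not code from the slides): each dimension is updated in turn with a 1-D slice step on the joint log p̃ with all other coordinates held fixed, which is proportional to the conditional p(x_i | x_{\i}).

import numpy as np

def slice_sample_coordinatewise(log_p_tilde, x0, n_samples, width=2.0, rng=None):
    # Multivariate slice sampling sketch: update one coordinate at a time,
    # each with a 1-D bracket-and-shrink slice step (cf. Gibbs sampling).
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    for s in range(n_samples):
        for i in range(x.size):
            # log p_tilde as a function of x_i alone (others fixed) is
            # proportional to the conditional p(x_i | x_{\i})
            log_u = log_p_tilde(x) + np.log(rng.uniform())
            lo = x[i] - width * rng.uniform()
            hi = lo + width
            while True:
                x_prop = x.copy()
                x_prop[i] = rng.uniform(lo, hi)
                if log_p_tilde(x_prop) > log_u:   # inside the slice: accept
                    x = x_prop
                    break
                if x_prop[i] < x[i]:              # outside: shrink towards x_i
                    lo = x_prop[i]
                else:
                    hi = x_prop[i]
        samples[s] = x
    return samples

# Strongly correlated 2-D Gaussian as the unnormalised target
P = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))   # precision matrix
samples = slice_sample_coordinatewise(lambda x: -0.5 * x @ P @ x,
                                      np.zeros(2), n_samples=3000)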


MCMC: Correlated Samples

§ Samples from the Markov chain before the equilibrium distribution is reached should be discarded (burn-in phase)

§ MCMC generates dependent samples
  This introduces additional variance in the Monte Carlo estimator

  $$\frac{1}{S} \sum_{s=1}^{S} f(x^{(s)}), \qquad x^{(s)} \sim p(x)$$

  due to the correlation of the samples
§ If we want independent samples, take only every K-th sample (thinning)
  Does not decrease the efficiency of the sampler, but reduces the memory footprint

§ Autocorrelation is an indicator for choosing K
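A sketch of how autocorrelation can guide the choice of K (the function names and the 0.1 threshold are illustrative assumptions): estimate the chain's empirical autocorrelation and pick K as the first lag at which it has decayed to a small value.

import numpy as np

def autocorrelation(chain, max_lag=100):
    # Empirical autocorrelation of a 1-D chain up to max_lag (sketch)
    x = np.asarray(chain, dtype=float) - np.mean(chain)
    var = x.var()
    return np.array([1.0] + [np.mean(x[:-k] * x[k:]) / var
                             for k in range(1, max_lag + 1)])

def thin(chain, burn_in, K):
    # Discard the burn-in phase, then keep only every K-th sample
    return chain[burn_in::K]

# AR(1) chain as a stand-in for correlated MCMC output
rng = np.random.default_rng(0)
chain = np.zeros(10000)
for t in range(1, chain.size):
    chain[t] = 0.9 * chain[t - 1] + rng.standard_normal()

rho = autocorrelation(chain)
K = int(np.argmax(rho < 0.1))     # first lag where autocorr. drops below 0.1
print(K, thin(chain, burn_in=1000, K=K).size)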


MCMC Diagnostics: Trace Plots

[Figure: trace plots over 1000 iterations for two samplers on the same target — left: Metropolis-Hastings with proposal N(0, 1.000²), R̂ = 1.493; right: Gibbs, R̂ = 1.007. Figure from Murphy (2012)]

§ Mixing time: The amount of time it takes the Markov chain to converge to the stationary distribution and forget its initial state.

§ Trace plots: Run multiple chains from very different starting points and plot the samples of the variables of interest. If the chain has mixed, the trace plots should converge to the same distribution.
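The R̂ values in the figure are the Gelman-Rubin potential scale reduction factor. A basic sketch of its computation (the function name is an assumption) compares the between-chain and within-chain variances of multiple chains started from over-dispersed points; values far above 1 (such as 1.493 for the Metropolis-Hastings run) signal that the chains have not yet mixed.

import numpy as np

def gelman_rubin_rhat(chains):
    # Basic Gelman-Rubin potential scale reduction factor (sketch).
    # `chains` has shape (M, N): M chains of length N, run from
    # over-dispersed starting points, burn-in already removed.
    chains = np.asarray(chains, dtype=float)
    M, N = chains.shape
    B = N * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    var_plus = (N - 1) / N * W + B / N         # pooled variance estimate
    return np.sqrt(var_plus / W)               # ≈ 1 once the chains have mixed

# Example: two well-mixed chains vs. two chains stuck in different modes
rng = np.random.default_rng(1)
good = rng.standard_normal((2, 1000))
bad = rng.standard_normal((2, 1000)) + np.array([[0.0], [5.0]])
print(gelman_rubin_rhat(good), gelman_rubin_rhat(bad))   # ~1 vs. >> 1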


References I

[1] C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer-Verlag, 2006.

[2] S. Geman and D. Geman. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721–741, 1984.

[3] W. Hastings. Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika, 57(1):97–109, 1970.

[4] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An Introduction to Variational Methods for Graphical Models. Machine Learning, 37:183–233, 1999.

[5] J. S. Liu, W. H. Wong, and A. Kong. Covariance Structure of the Gibbs Sampler with Applications to the Comparisons of Estimators and Augmentation Schemes. Biometrika, 81(1):27–40, 1994.

[6] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equations of State Calculations by Fast Computing Machines. Journal of Chemical Physics, 21(6):1087–1092, 1953.

[7] T. P. Minka. A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, January 2001.

[8] K. P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge, MA, USA, 2012.

[9] I. Murray, R. P. Adams, and D. J. MacKay. Elliptical Slice Sampling. In Y. W. Teh and M. Titterington, editors, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, JMLR: W&CP 9, pages 541–548, 2010.

[10] R. M. Neal. Slice Sampling. Annals of Statistics, 31(3):705–767, 2003.

[11] A. O’Hagan. Monte Carlo is Fundamentally Unsound. The Statistician, 36(2/3):247–249, 1987.

[12] A. O’Hagan. Bayes-Hermite Quadrature. Journal of Statistical Planning and Inference, 29:245–260, 1991.

[13] M. Opper and O. Winther. Adaptive and Self-averaging Thouless-Anderson-Palmer Mean-field Theory for Probabilistic Modeling. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 64:056131, Oct 2001.

[14] C. E. Rasmussen and Z. Ghahramani. Bayesian Monte Carlo. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 489–496. The MIT Press, Cambridge, MA, USA, 2003.

[15] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. The MIT Press, Cambridge, MA, USA, 2005.
