Kernel Sequential Monte Carlo

Ingmar Schuster* (Paris Dauphine), Heiko Strathmann* (University College London), Brooks Paige (Oxford), Dino Sejdinovic (Oxford)

* equal contribution

April 25, 2016


Section 1

Outline


1 Introduction
    Importance Sampling, PMC and SMC
    Intractable likelihoods
    Kernel emulators

2 Kernel SMC

3 Implementation Details

4 Evaluation

5 Conclusion


Section 2

Introduction


Importance Sampling, PMC and SMC

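Importance Sampling estimators

Importance Sampling identity:

$$H = \int \pi(x) h(x)\,dx = \int \frac{\pi(x)}{q(x)} h(x)\, q(x)\,dx \approx \frac{1}{w_\Sigma} \sum_{i=1}^{N} w(X_i) h(X_i)$$

where $X_i \sim q$ iid, $w(X) = \pi(X)/q(X)$ is called the unnormalized importance weight, and $w_\Sigma = \sum_{i=1}^{N} w(X_i)$.

PMC identity: for any law $g$ over proposals,

$$H = \iint \frac{\pi(x)}{q_t(x)} h(x)\, dq_t(x)\, dg(q_t) \approx \frac{1}{w_\Sigma} \sum_{t=1}^{T} \sum_{i=1}^{N} w_t(X_{t,i}) h(X_{t,i})$$

where $w_t(X) = \pi(X)/q_t(X)$, $X_{t,i} \sim q_t$, and $w_\Sigma$ here sums the weights over all $t$ and $i$.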
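The estimator above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it uses the self-normalized form, dividing by the weight sum $w_\Sigma$, so both densities only need to be known up to a constant. The function name `self_normalized_is` and the toy target and proposal are assumptions made for the example.

```python
import numpy as np

def self_normalized_is(log_target, log_proposal, sample_proposal, h, n, rng):
    """Estimate the expectation of h under the target using n proposal draws."""
    x = sample_proposal(n, rng)                  # X_i ~ q, iid
    log_w = log_target(x) - log_proposal(x)      # log w(X_i) = log pi(X_i) - log q(X_i)
    w = np.exp(log_w - np.max(log_w))            # rescale for numerical stability
    return np.sum(w * h(x)) / np.sum(w)          # divide by the weight sum w_Sigma

# Toy setup (assumed): target N(1, 1) and proposal N(0, 2^2), both up to a constant.
rng = np.random.default_rng(0)
estimate = self_normalized_is(
    log_target=lambda x: -0.5 * (x - 1.0) ** 2,
    log_proposal=lambda x: -0.5 * (x / 2.0) ** 2,
    sample_proposal=lambda n, rng: 2.0 * rng.standard_normal(n),
    h=lambda x: x,
    n=200_000,
    rng=rng,
)
```

With $h(x) = x$ the estimate should land close to the target mean of 1. Working with log weights avoids overflow when the target is sharply peaked.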


Proposal fatter than target

[Figure: target $\pi(x)$, proposal $q(x)$ and weight function $w(x) = \pi(x)/q(x)$, plotted for $x$ between -4 and 4.]


Proposal thinner than target

[Figure: target $\pi(x)$, proposal $q(x)$ and weight function $w(x) = \pi(x)/q(x)$, plotted for $x$ between -4 and 4.]
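The contrast between the two cases can be checked numerically. The sketch below uses the effective sample size, $(\sum_i w_i)^2 / \sum_i w_i^2$, a standard diagnostic that does not appear on the slides. The standard normal target and the proposal scales 2.0 (fatter) and 0.5 (thinner) are assumptions chosen for illustration.

```python
import numpy as np

def effective_sample_size(log_w):
    """ESS = (sum w)^2 / sum w^2, computed from log weights."""
    w = np.exp(log_w - np.max(log_w))
    return w.sum() ** 2 / np.sum(w ** 2)

def ess_for_proposal_scale(scale, n, rng):
    """ESS when a N(0, scale^2) proposal targets a standard normal."""
    x = scale * rng.standard_normal(n)
    log_pi = -0.5 * x ** 2                           # unnormalized target
    log_q = -0.5 * (x / scale) ** 2 - np.log(scale)  # proposal up to a constant
    return effective_sample_size(log_pi - log_q)

rng = np.random.default_rng(1)
ess_fat = ess_for_proposal_scale(2.0, 10_000, rng)   # proposal fatter than target
ess_thin = ess_for_proposal_scale(0.5, 10_000, rng)  # proposal thinner than target
```

With the thinner proposal the weights grow in the tails, a few samples dominate the sum, and the effective sample size drops.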


Population Monte Carlo (Cappé et al., 2004)

Input: initial proposal density q0, unnormalized density π, population size N, sample size m
Output: lists P, W of m samples and weights
Initialize P = List()
Initialize W = List()
while len(P) < m do
    construct proposal distribution qt
    generate set of N samples Xt from qt and append it to P
    for all X ∈ Xt append weights π(X)/qt(X) to W
end while
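A runnable sketch of this loop in one dimension follows. The slides leave the step "construct proposal distribution qt" open, so the rule used here is purely an assumption for illustration: a Gaussian moment-matched to all weighted samples so far, with its standard deviation inflated by 1.5 to keep the proposal fatter than the target. The function name and the toy target are hypothetical as well.

```python
import numpy as np

def population_monte_carlo(log_target, mean0, std0, pop_size, n_samples, rng):
    """PMC with an assumed Gaussian, moment-matched proposal update."""
    samples, log_weights = [], []
    mean, std = mean0, std0
    while len(samples) < n_samples:
        x = mean + std * rng.standard_normal(pop_size)        # X_t ~ q_t
        log_q = -0.5 * ((x - mean) / std) ** 2 - np.log(std)  # log q_t up to a shared constant
        samples.extend(x)
        log_weights.extend(log_target(x) - log_q)             # log of pi(X) / q_t(X)
        # Assumed adaptation rule: refit q_{t+1} to all weighted samples so far.
        all_x, lw = np.asarray(samples), np.asarray(log_weights)
        w = np.exp(lw - lw.max())
        w /= w.sum()
        mean = np.sum(w * all_x)
        std = max(1.5 * np.sqrt(np.sum(w * (all_x - mean) ** 2)), 1e-3)
    return np.asarray(samples), np.asarray(log_weights)

# Toy target (assumed): N(3, 0.5^2) up to a constant, started from a wide N(0, 5^2) proposal.
rng = np.random.default_rng(2)
xs, log_ws = population_monte_carlo(
    lambda x: -0.5 * ((x - 3.0) / 0.5) ** 2, 0.0, 5.0, 200, 4000, rng)
w = np.exp(log_ws - log_ws.max())
w /= w.sum()
post_mean = np.sum(w * xs)
```

Every weight uses the density of the proposal that actually generated the sample, so samples from all iterations can be pooled in a single self-normalized estimate.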


Sequential Monte Carlo Samplers

- Approximate integrals with respect to target distribution $\pi_T$
- Build upon Importance Sampling: approximate the integral of $h$ wrt density $\pi_T$ using samples following density $q$ (under certain conditions):

$$\int h(x)\, d\pi_T(x) = \int h(x) \frac{\pi_T(x)}{q(x)}\, dq(x)$$

- Given prior $\pi_0$, build a sequence $\pi_0, \dots, \pi_i, \dots, \pi_T$ such that
    - $\pi_{i+1}$ is closer to $\pi_T$ than $\pi_i$ ($\delta(\pi_{i+1}, \pi_T) < \delta(\pi_i, \pi_T)$ for some divergence $\delta$)
    - a sample from $\pi_i$ can approximate $\pi_{i+1}$ well using the importance weight function $w(\cdot) = \pi_{i+1}(\cdot)/\pi_i(\cdot)$


At $i = 0$

- Using proposal density $q_0$, generate particles $\{(w_{0,j}, X_{0,j})\}_{j=1}^N$ where $w_{0,j} = \pi_0(X_{0,j})/q_0(X_{0,j})$
- importance resampling, resulting in $N$ equally weighted particles $\{(1/N, X_{0,j})\}_{j=1}^N$
- rejuvenation move for each $X_{0,j}$ by a Markov kernel leaving $\pi_0$ invariant

At $i > 0$

- approximate $\pi_i$ by $\{(\pi_i(X_{i-1,j})/\pi_{i-1}(X_{i-1,j}), X_{i-1,j})\}_{j=1}^N$
- resampling
- rejuvenation leaving $\pi_i$ invariant
- if $\pi_i \neq \pi_T$, repeat
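One iteration for $i > 0$ (reweight, resample, rejuvenate) might look as follows for scalar particles. This is a sketch under assumptions: multinomial resampling and a single random-walk Metropolis-Hastings move stand in for whatever resampling scheme and Markov kernel are actually used, and the names `smc_step` and `rw_scale` are hypothetical.

```python
import numpy as np

def smc_step(particles, log_prev, log_next, rw_scale, rng):
    """Move equally weighted particles from pi_{i-1} to pi_i (1-D sketch)."""
    n = particles.shape[0]
    # 1) Reweight with w = pi_i / pi_{i-1} evaluated at the current particles.
    log_w = log_next(particles) - log_prev(particles)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # 2) Multinomial resampling proportional to the weights.
    particles = particles[rng.choice(n, size=n, p=w)]
    # 3) Rejuvenation: one random-walk MH move per particle, leaving pi_i invariant.
    proposal = particles + rw_scale * rng.standard_normal(n)
    accept = np.log(rng.uniform(size=n)) < log_next(proposal) - log_next(particles)
    return np.where(accept, proposal, particles), log_w, accept.mean()

# Toy check (assumed densities): move particles from N(0, 2^2) to N(0, 1).
rng = np.random.default_rng(3)
start = 2.0 * rng.standard_normal(5000)
moved, log_w, acc_rate = smc_step(
    start, lambda x: -0.5 * (x / 2.0) ** 2, lambda x: -0.5 * x ** 2, 1.0, rng)
```

The unnormalized incremental weights `log_w` are returned because their average is what enters the evidence estimate. The acceptance rate is returned so that the proposal scale can be adapted between iterations.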


A visual SMC iteration

[Figure: four panels over $x$ between -4 and 4: target $\pi_i(x)$ and proposal $\pi_{i-1}(x)$; weighted samples; resampling proportional to weights; MCMC rejuvenation.]


- estimate evidence $Z_T$ of $\pi_T$ (aka normalizing constant, marginal likelihood) by

$$Z_T \approx Z_0 \prod_{i=1}^{T} \frac{1}{N} \sum_{j} w_{i,j}$$

- Can be adaptive in rejuvenation steps without the diminishing adaptation required in adaptive MCMC
- Will construct rejuvenation using RKHS-embedding of particles
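The product of averaged weights is best accumulated on the log scale. The sketch below assumes that each iteration supplies the log of its unnormalized incremental weights $w_{i,j}$; the function name `log_evidence` and the one-step toy check are illustrative assumptions.

```python
import numpy as np

def log_evidence(log_z0, incremental_log_weights):
    """log Z_T ~= log Z_0 + sum_i log( (1/N) sum_j w_{i,j} )."""
    log_z = log_z0
    for log_w in incremental_log_weights:
        m = np.max(log_w)
        log_z += m + np.log(np.mean(np.exp(log_w - m)))  # stable log of the mean weight
    return log_z

# Toy check (assumed): gamma_0(x) = exp(-x^2 / 8) with Z_0 = 2 sqrt(2 pi),
# gamma_1(x) = exp(-x^2 / 2) with Z_1 = sqrt(2 pi), one reweighting step.
rng = np.random.default_rng(7)
x = 2.0 * rng.standard_normal(100_000)        # exact draws from pi_0 = N(0, 2^2)
log_w = -0.5 * x ** 2 + x ** 2 / 8.0          # log gamma_1(x) - log gamma_0(x)
log_z = log_evidence(np.log(2.0 * np.sqrt(2.0 * np.pi)), [log_w])
```

In this toy case the result should be close to $\log\sqrt{2\pi} \approx 0.919$.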


Intractable likelihoods

Intractable Likelihoods and Evidence

- intractable likelihoods arise in many models (e.g. nonconjugate latent variable models)
- for unbiased likelihood estimates, SMC/PMC are still valid
- simple case: estimate the likelihood using IS or SMC, which leads to IS² (Tran et al., 2013) and SMC² (Chopin et al., 2011)
- results in noisy importance weights, but the approximation of the evidence (probability of the data given the model) is still valid (Tran et al., 2013, Lemma 3)
- cannot easily use information on the geometry of $\pi$ for efficient inference (e.g. gradients unavailable)


Kernel emulators

- In the following: adapt RKHS-based emulators to PMC and SMC in intractable likelihood settings, in order to adapt to the target geometry
- Using a positive definite kernel $k(\cdot, \cdot)$ we can
    - adapt to local covariance (Sejdinovic et al., 2014)
    - use gradient information of an infinite exponential family approximation to $\pi$ (Strathmann et al., 2015)
- Emulators are used for constructing proposals $q_t$; for samples $X \sim q_t$ we then use
    - importance correction in PMC
    - Metropolis-Hastings correction within SMC rejuvenation moves


Kernel Emulators

Local covariance: let $k'(y, x) = \nabla_y k(y, x)$ and $\mu(y) = \int k'(y, x)\, d\pi(x)$, then

$$K(y) = \int (k'(y, x) - \mu(y))^2\, d\pi(x)$$

Gradient emulation: fit the infinite exponential family approximation

$$q(y) = \exp(f(y) - A(f))$$

where $f(y) = \langle f, k(y, \cdot) \rangle_{\mathcal{H}}$ is the inner product between natural parameters $f$ and sufficient statistics $k(y, \cdot)$ in $\mathcal{H}$, by minimizing

$$\int (\nabla_y \log \pi(y) - \nabla_y f(y))^2\, d\pi(y)$$

and use gradient information of $\log q$ in proposals.
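A Monte Carlo version of the local covariance $K(y)$ can be sketched by replacing the integrals over $\pi$ with averages over samples. Two assumptions are made here that the slides do not state: the kernel is a Gaussian RBF with bandwidth `sigma`, and the square of the vector-valued term is read as an outer product. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_grad(y, xs, sigma):
    """Gradient with respect to y of k(y, x) = exp(-||y - x||^2 / (2 sigma^2)), for each row x of xs."""
    diff = y - xs                                              # shape (n, d)
    k = np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * sigma ** 2))
    return -(diff / sigma ** 2) * k[:, None]                   # shape (n, d)

def local_covariance(y, xs, sigma):
    """Empirical K(y): covariance of the kernel gradients k'(y, x_i) over the samples."""
    g = gaussian_kernel_grad(y, xs, sigma)
    centred = g - g.mean(axis=0)                               # subtract the empirical mu(y)
    return centred.T @ centred / xs.shape[0]                   # (d, d) average outer product

rng = np.random.default_rng(3)
xs = rng.standard_normal((200, 2))    # stand-in for samples from pi
y = np.array([0.5, -0.2])
K = local_covariance(y, xs, sigma=1.0)
```

The result is a symmetric positive semi-definite $d \times d$ matrix that changes with $y$, which is what allows a proposal covariance to follow the local shape of the sample cloud.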


Section 3

Kernel SMC


Kernel Gradient Importance Sampling

- Use $\mathcal{N}(\cdot \mid X + \delta_1 \nabla_X \log q(X), \delta_2 C)$ proposals with importance weighting in PMC
- $C$ is a fit to the global covariance of target $\pi$
- resulting in Kernel Gradient Importance Sampling (KGRIS)
- variant of Gradient Importance Sampling (Schuster, 2015)
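Drawing from such a proposal and evaluating its density, which the importance weight needs, could be sketched as below. The function `kgris_propose` is hypothetical, and `surrogate_grad` stands in for the gradient of the fitted emulator $\log q$, whose construction is not shown here.

```python
import numpy as np

def kgris_propose(x, surrogate_grad, delta1, delta2, cov, rng):
    """Draw from N(x + delta1 * grad log q(x), delta2 * cov) and return the draw and its log density."""
    d = x.shape[0]
    mean = x + delta1 * surrogate_grad(x)          # drift along the emulated gradient
    chol = np.linalg.cholesky(delta2 * cov)
    x_new = mean + chol @ rng.standard_normal(d)
    z = np.linalg.solve(chol, x_new - mean)        # whitened residual
    log_q = -0.5 * z @ z - np.log(np.diag(chol)).sum() - 0.5 * d * np.log(2.0 * np.pi)
    return x_new, log_q

# Toy usage (assumed): the surrogate gradient of a standard normal is -x.
rng = np.random.default_rng(4)
x0 = np.array([1.0, -2.0])
x_new, log_q = kgris_propose(x0, lambda x: -x, 0.5, 1.0, np.eye(2), rng)
```

The importance weight of `x_new` would then be the (possibly estimated) target density at `x_new` divided by `exp(log_q)`.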


Kernel Adaptive SMC Sampler

- Use an artificial sequence of distributions leading from prior $\pi_0$ to posterior $\pi_T$
- rejuvenation with MH moves using $\mathcal{N}(\cdot \mid X, \delta K(X))$ proposals
- resulting in Kernel Adaptive SMC (KASMC)
- similar to the Adaptive SMC sampler (Fearnhead and Taylor, 2013), which is recovered as a special case when using a linear kernel
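Because the proposal covariance $\delta K(X)$ depends on the current point, the proposal is not symmetric and the Metropolis-Hastings ratio has to include both proposal densities. The sketch below illustrates such a move; `local_cov` is a hypothetical callable that returns a positive definite matrix (in practice it would come from the kernel emulator, possibly with added jitter), and the toy covariance used here is an assumption.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, x - mean)
    return -0.5 * z @ z - np.log(np.diag(chol)).sum() - 0.5 * len(x) * np.log(2.0 * np.pi)

def kasmc_mh_move(x, log_target, local_cov, delta, rng):
    """One MH step with proposal N(. | x, delta * K(x)) and the full Hastings correction."""
    cov_x = delta * local_cov(x)
    y = x + np.linalg.cholesky(cov_x) @ rng.standard_normal(len(x))
    cov_y = delta * local_cov(y)                   # covariance of the reverse proposal
    log_ratio = (log_target(y) + gaussian_logpdf(x, y, cov_y)
                 - log_target(x) - gaussian_logpdf(y, x, cov_x))
    if np.log(rng.uniform()) < log_ratio:
        return y, True
    return x, False

# Toy usage (assumed): standard normal target, covariance growing with ||x||^2.
rng = np.random.default_rng(8)
log_target = lambda x: -0.5 * x @ x
local_cov = lambda x: (0.5 + 0.1 * x @ x) * np.eye(2)
state, chain = np.zeros(2), []
for _ in range(10_000):
    state, _ = kasmc_mh_move(state, log_target, local_cov, 1.0, rng)
    chain.append(state)
chain = np.asarray(chain)
```

Dropping the two `gaussian_logpdf` terms would make the move target the wrong distribution whenever $K$ varies with position.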


KASMC versus ASMC

[Figure: green: ASMC / KASMC with linear kernel; red: KASMC with Gaussian RBF kernel.]


Section 4

Implementation Details


Construction of Target Sequence

- For the artificial distribution sequence we used a geometric bridge

$$\pi_i \propto \pi_0^{1-\rho_i}\, \pi_T^{\rho_i}$$

  where $(\rho_i)_{i=1}^T$ is an increasing sequence satisfying $\rho_T = 1$
- another standard choice in Bayesian inference is adding datapoints one after another

$$\pi_i(X) = \pi(X \mid d_1, \dots, d_{\lfloor \rho_i D \rfloor})$$

  resulting in Iterated Batch Importance Sampling (IBIS; Chopin, 2002)
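On the log scale the geometric bridge is a convex combination of the two log densities. The sketch below builds the list of unnormalized $\log \pi_i$; the evenly spaced $\rho_i$ and the toy prior and target are assumptions, since the slides only require an increasing sequence ending at 1.

```python
import numpy as np

def geometric_bridge(log_prior, log_target, rhos):
    """Unnormalized log densities log pi_i = (1 - rho_i) log pi_0 + rho_i log pi_T."""
    def make(rho):
        return lambda x: (1.0 - rho) * log_prior(x) + rho * log_target(x)
    return [make(rho) for rho in rhos]

rhos = np.linspace(0.0, 1.0, 21)[1:]              # rho_1 < ... < rho_20 = 1
log_prior = lambda x: -0.5 * (x / 3.0) ** 2       # toy pi_0 (assumed)
log_target = lambda x: -0.5 * (x - 2.0) ** 2      # toy pi_T (assumed)
bridge = geometric_bridge(log_prior, log_target, rhos)
```

The inner `make` function binds each `rho` at creation time, so every element of `bridge` keeps its own exponent.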


Stochastic approximation, variance reduction

- Free scaling parameters can be tuned for optimal scaling of MCMC using the stochastic approximation framework of Andrieu and Thoms (2008)
    - asymptotically optimal acceptance rate for Random Walk MH is $\alpha_{opt} = 0.234$ (Rosenthal, 2011)
    - tune single parameter $\delta_i$ by

$$\delta_{i+1} = \delta_i + \lambda_i (\alpha_i - \alpha_{opt})$$

      for non-increasing $\lambda_1, \dots, \lambda_T$
- used Random Fourier Features (Rahimi and Recht, 2007) for efficient on-line updates of emulators
- used weighted updates and Rao-Blackwellization for variance reduction in estimated emulators
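The scale update can be tried on a toy problem. In the sketch below the acceptance rate $\alpha_i$ is measured for a one-dimensional random-walk proposal on a standard normal target; that toy target, the constant learning rate of 1.0 and the small positive floor on $\delta$ are assumptions added for illustration.

```python
import numpy as np

ALPHA_OPT = 0.234

def update_scale(delta, acceptance_rate, learning_rate, floor=1e-6):
    """delta_{i+1} = delta_i + lambda_i * (alpha_i - alpha_opt), kept positive by an assumed floor."""
    return max(delta + learning_rate * (acceptance_rate - ALPHA_OPT), floor)

def rw_acceptance_rate(delta, n, rng):
    """Empirical acceptance rate of random-walk MH with step delta on a N(0, 1) target."""
    x = rng.standard_normal(n)
    y = x + delta * rng.standard_normal(n)
    return np.mean(np.log(rng.uniform(size=n)) < 0.5 * (x ** 2 - y ** 2))

rng = np.random.default_rng(5)
delta = 1.0
for i in range(300):
    alpha = rw_acceptance_rate(delta, 2000, rng)
    delta = update_scale(delta, alpha, learning_rate=1.0)   # constant, hence non-increasing
final_rate = rw_acceptance_rate(delta, 50_000, rng)
```

When acceptance is above the target rate the scale grows, and when it is below the scale shrinks, so the iteration settles where the measured rate is near 0.234.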


Section 5

Evaluation


Synthetic nonlinear target (Banana)

Synthetic target: Banana distribution in 8 dimensions, i.e. Gaussian with twisted second dimension

[Figure: plot of the Banana target; horizontal axis from -20 to 20, vertical axis from -4 to 8.]


- Compare performance of Random-Walk rejuvenation with asymptotically optimal scaling ($\nu = 2.38/\sqrt{d}$), ASMC and KASMC with Gaussian RBF kernel
- Fixed learning rate of $\lambda = 0.1$ to adapt the scale parameter using stochastic approximation
- Geometric bridge of length 20
- 30 Monte Carlo runs
- Report Maximum Mean Discrepancy (MMD) using a polynomial kernel of order 3: distance of moments up to order 3 between ground truth samples and samples produced by each method
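A sample-based MMD with a cubic polynomial kernel can be computed as below. This is a sketch, not the evaluation code: the kernel form $(1 + x^\top y)^3$ (offset 1) and the biased V-statistic estimator are assumptions, as the slides only specify a polynomial kernel of order 3.

```python
import numpy as np

def poly_kernel(a, b, degree=3):
    """Polynomial kernel (1 + <a_i, b_j>)^degree for all pairs of rows."""
    return (1.0 + a @ b.T) ** degree

def mmd_squared(x, y, degree=3):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (poly_kernel(x, x, degree).mean()
            + poly_kernel(y, y, degree).mean()
            - 2.0 * poly_kernel(x, y, degree).mean())

rng = np.random.default_rng(6)
reference = rng.standard_normal((500, 2))        # stand-in for ground truth samples
same = rng.standard_normal((500, 2))             # same distribution
shifted = rng.standard_normal((500, 2)) + 1.0    # mean-shifted distribution
mmd_same = mmd_squared(reference, same)
mmd_shifted = mmd_squared(reference, shifted)
```

Since the feature map of this kernel consists of all monomials up to degree 3, the statistic compares empirical mixed moments up to order 3, which is the interpretation given on the slide.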


Figure: Improved convergence of all mixed moments up to order 3 of KASMC compared to ASMC and RW-SMC.


Evidence approximation for intractable likelihoods

- in classification using Gaussian Processes (GP), the logistic transformation renders the likelihood intractable
- the likelihood can be estimated without bias using Importance Sampling from an EP approximation
- estimate model evidence when using an ARD kernel in the GP
- particularly hard because noisy likelihoods mean noisy importance weights
- ground truth obtained by averaging the evidence estimate over 20 long-running SMC algorithms


[Figure: estimation variance (log scale, $10^0$ to $10^3$) against number of particles (0 to 500); legend text as extracted: "KASSASMC".]

Figure: Monte Carlo Variance, KASMC in blue, ASMC in green.


Stochastic volatility model with intractable likelihood

- Stochastic volatility is a particularly challenging class of Bayesian inverse problems
- time series as a high-dimensional nuisance variable
- models have to capture the non-linearities in the data (Barndorff-Nielsen and Shephard, 2001)
- concentrate on the prediction of daily volatility of asset prices, reusing the model and dataset studied by Chopin et al. (2011) (nuisance of dimension $d = 753$)
- report RMSE of target covariance estimate


KGRIS with Stochastic volatility

[Figure: RMSE of the covariance estimate (vertical axis, 0.0052 to 0.0064) against population size (horizontal axis, 5 to 50); legend text as extracted: "AMKGIS".]


Section 6

Conclusion


Conclusion (1)

- Developed Kernel SMC framework
- KSMC exploits kernel emulators of target structure
- combines these with general SMC/PMC advantages for multimodal targets and evidence estimation
- especially attractive when likelihoods are intractable


Conclusion (2)

- evaluated on several challenging models where it clearly improved statistical efficiency
    - KASMC exhibits better MMD for Banana
    - less MC variance than ASMC in evidence estimation for GP classification
    - KGRIS clearly improves covariance estimates in the Stochastic Volatility model


Thanks!


Literature I

Andrieu, C. and Thoms, J. (2008). A tutorial on adaptive MCMC. Statistics and Computing, 18(November):343–373.

Barndorff-Nielsen, O. E. and Shephard, N. (2001). Non-Gaussian Ornstein-Uhlenbeck-Based Models and Some of Their Uses in Financial Economics. Journal of the Royal Statistical Society. Series B, 63(2):167–241.

Cappé, O., Guillin, A., Marin, J. M., and Robert, C. P. (2004). Population Monte Carlo. Journal of Computational and Graphical Statistics, 13(4):907–929.

Chopin, N. (2002). A sequential particle filter method for static models. Biometrika, 89(3):539–552.


Literature II

Chopin, N., Jacob, P. E., and Papaspiliopoulos, O. (2011). SMC²: an efficient algorithm for sequential analysis of state-space models. 0(1):1–27.

Fearnhead, P. and Taylor, B. M. (2013). An Adaptive Sequential Monte Carlo Sampler. Bayesian Analysis, (2):411–438.

Rahimi, A. and Recht, B. (2007). Random Features for Large Scale Kernel Machines. In Neural Information Processing Systems, number 1, pages 1–8.

Rosenthal, J. S. (2011). Optimal Proposal Distributions and Adaptive MCMC. In Handbook of Markov Chain Monte Carlo, chapter 4, pages 93–112. Chapman & Hall.


Literature III

Schuster, I. (2015). Consistency of Importance Sampling estimates based on dependent sample sets and an application to models with factorizing likelihoods. arXiv preprint, pages 1–14.

Sejdinovic, D., Strathmann, H., Garcia, M. L., Andrieu, C., and Gretton, A. (2014). Kernel Adaptive Metropolis-Hastings. arXiv, 32.

Strathmann, H., Sejdinovic, D., Livingstone, S., Szabo, Z., and Gretton, A. (2015). Gradient-free Hamiltonian Monte Carlo with efficient Kernel Exponential Families. In Neural Information Processing Systems.

Tran, M.-N., Scharth, M., Pitt, M. K., and Kohn, R. (2013). Importance sampling squared for Bayesian inference in latent variable models.
