Expectation Propagation and Generalized EP Methods for Inference in Switching LDSs
Onno Zoeter & Tom Heskes
Bayesian Time Series Models Seminar, 2012.07.19
Summarized and Presented by Heo, Min-Oh
(c)2012, Biointelligence Lab, http://bi.snu.ac.kr
Contents
- Basic Model: SLDS
- Motivation: Complexity for Posteriors
- Methods
  - Assumed Density Filtering (cf. clique tree inference in HMMs)
  - EP in a Nutshell
  - Expectation Propagation for Smoothing in SLDSs
  - Generalized EP
- Experiments
- Appendix: Canonical form and the corresponding operations
Model: Switching Linear Dynamical System (SLDS)
- Also known as: conditionally Gaussian state-space model, switching Kalman filter model, hybrid model
- Graphical notation: ellipse = Gaussian, rectangle = multinomial, shading = observed
- Components: observation model, transition model, switch part
Complexity for Posteriors
- The posterior distribution for filtering problems must consider all possible sequences of S_1:T.
- P(X_t | S_t, y_1:T) is a mixture of M^(T-1) Gaussians, where M is the number of switch states.
Complexity for Posteriors: Example
- For i = 2, if we consider P(X1, X2), the number of Gaussian components in P(X2) without approximation is 4.
- In general, P(X_i) is a mixture of 2^i Gaussians (for M = 2 switch states).
- Representing the correct marginal distribution in a hybrid network can require space exponential in the size of the network.
- Exact inference in CLG networks that include standard discrete networks is NP-hard (even in polytrees); even computing the probability of a single discrete variable is NP-hard.
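The blow-up above can be made concrete with a short sketch (the function name `exact_component_count` is ours, not the paper's): the exact filtered posterior in an SLDS with M switch states needs M times more Gaussian components after every time step.

```python
# Number of Gaussian components needed to represent the exact filtered
# posterior p(x_t | y_1:t) per switch state in an SLDS with M switch
# states: each transition multiplies the count by M, giving M**(T-1).
def exact_component_count(M, T):
    count = 1                 # at t = 1 there is a single Gaussian per switch state
    for _ in range(T - 1):
        count *= M            # each switch transition splits every component M ways
    return count

for T in (1, 2, 4, 8):
    print(T, exact_component_count(2, T))  # 1, 2, 8, 128
```

Even for M = 2 and a sequence of length 8, exact filtering already needs 128 components, which motivates the projection-based approximations that follow.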
Exact Inference: Filtering as Clique Tree Propagation
- The recursive filtering process is message passing in a clique tree, with the belief state as the message.
- HMM case: the forward pass of a sum-product clique tree algorithm computes
  P(S^(1)) → P(S^(1) | o^(1)) → P(S^(2) | o^(1)) → P(S^(2) | o^(1), o^(2)) → …
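The message sequence above can be traced with a toy two-state HMM (the parameter values are made up for illustration; `hmm_filter` is our own helper, not an algorithm from the paper):

```python
# Toy HMM forward (filtering) pass producing the message sequence
# P(S1|o1), P(S2|o1,o2), ... by alternating condition and predict steps.
def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def hmm_filter(prior, trans, emit, obs):
    """prior[i] = P(S1=i), trans[i][j] = P(S'=j | S=i), emit[i][o] = P(o | S=i)."""
    belief = prior
    for o in obs:
        # condition on the observation:  P(S_t | o_1..t)
        belief = normalize([b * emit[i][o] for i, b in enumerate(belief)])
        yield belief
        # predict one step ahead:        P(S_{t+1} | o_1..t)
        belief = [sum(belief[i] * trans[i][j] for i in range(len(belief)))
                  for j in range(len(trans[0]))]

prior = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit  = [[0.8, 0.2], [0.3, 0.7]]
beliefs = list(hmm_filter(prior, trans, emit, [0, 1]))
print(beliefs)
```

In the discrete HMM these messages stay the same size at every step, which is exactly what fails in the SLDS.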
Assumed Density Filtering (ADF)
- ADF forces the belief state to live in some restricted family F, e.g., a product of histograms, or a Gaussian.
- Given a prior α̂^(t−1) ∈ F, do one step of exact Bayesian updating to get α̂_u^(t), which generally lies outside F. Then do a projection step to find the closest approximation in the family:
  α̂^(t) = argmin_{q ∈ F} D(α̂_u^(t) || q)
- If F is the exponential family, we can solve the KL minimization by moment matching.
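The projection step can be illustrated in one dimension, where the KL projection onto a single Gaussian reduces to matching mean and variance (a minimal sketch; the `collapse` helper and the mixture values are illustrative, not from the paper):

```python
# ADF projection for a 1-D Gaussian mixture: the KL projection onto a
# single Gaussian matches the first two moments (mean and variance).
def collapse(weights, means, variances):
    m = sum(w * mu for w, mu in zip(weights, means))
    # E[x^2] - E[x]^2, with E[x^2] = sum_k w_k (var_k + mu_k^2)
    v = sum(w * (var + mu * mu) for w, mu, var in zip(weights, means, variances)) - m * m
    return m, v

# hypothetical two-component belief after one exact update step
m, v = collapse([0.5, 0.5], [-1.0, 1.0], [0.25, 0.25])
print(m, v)  # 0.0 1.25
```

Note that the collapsed variance (1.25) exceeds each component's variance (0.25): the projection absorbs the spread between the components.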
Assumed Density Filtering (ADF): Moment Matching
- Minimizing KL(p||q) with respect to an exponential-family q(z) = h(z) g(η) exp(ηᵀ u(z)):
  - Setting the gradient w.r.t. η to zero gives −∇ ln g(η) = E_p[u(z)].
  - For a general exponential family, −∇ ln g(η) = E_q[u(z)].
  - So E_q[u(z)] = E_p[u(z)]: the optimal solution matches the expected sufficient statistics (moment matching).
Potentials in the Sum-Product Algorithm
- Forward message
- Approximating the forward-pass message: for filtering only!
One Example for DBN
EP in a Nutshell
- Approximate a function by a simpler one:
  p(x) = ∏_a f_a(x)  ≈  q(x) = ∏_a f̃_a(x)
- Each f̃_a(x) lives in a parametric, exponential family (e.g., Gaussian).
- The factors f_a can be conditional distributions in a Bayesian network.
EP Algorithm
- Iterate the fixed-point equations:
  f̃_a(x) = argmin D( f_a(x) q^{\a}(x) || f̃_a(x) q^{\a}(x) ),  where q^{\a}(x) = ∏_{b≠a} f̃_b(x)
- q^{\a}(x) specifies where the approximation needs to be good.
- This yields coordinated local approximations.
(Loopy) Belief Propagation
- Specialize EP to factorized approximations: f̃_a(x) = ∏_i f̃_{ai}(x_i), where the f̃_{ai} are the "messages".
- Minimizing the KL divergence = matching the marginals of f_a(x) q^{\a}(x) (partially factorized) and f̃_a(x) q^{\a}(x) (fully factorized), i.e., "sending messages".
EP versus BP
- The EP approximation can be in a restricted family, e.g., Gaussian.
- The EP approximation does not have to be factorized.
- EP applies to many more problems, e.g., mixtures of discrete and continuous variables.
Expectation Propagation
- The EP approximate smoothing algorithm: the smoother is a backward (smoothing) version built on the assumed density filter.
- It considers the forward and backward passes together, replacing the exact backward message with its approximation.
Expectation Propagation
Convergence in EP for SLDS
- Sometimes the approximation may fail.
- How to resolve:
  - Iteration: repeat steps 1 to 4 of ADF to find local approximations that are as consistent as possible.
  - Use damped messages: normalisability in step 4 of ADF is guaranteed if the sum of the respective inverse covariance matrices is positive definite. The damped message is computed in canonical space (see appendix).
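Damping can be sketched on 1-D canonical parameters (K, h); the `damp` helper and the numbers are illustrative, not from the paper:

```python
# Damped EP message update in canonical (natural-parameter) space:
# new = eps * proposed + (1 - eps) * old, applied to (K, h) jointly.
def damp(old, proposed, eps):
    K_old, h_old = old
    K_new, h_new = proposed
    return (eps * K_new + (1 - eps) * K_old,
            eps * h_new + (1 - eps) * h_old)

old      = (2.0, 1.0)    # 1-D canonical message: precision K, shift h
proposed = (-1.0, 3.0)   # proposed update with a negative precision
K, h = damp(old, proposed, 0.5)
print(K, h)  # 0.5 2.0 -- damping keeps the precision positive
```

With eps = 1 this is the undamped update (K would become −1, i.e., not normalisable); shrinking eps trades convergence speed for stability.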
Generalized EP
- A more accurate approximation, similar to Kikuchi's extension of the Bethe free energy.
- Outer clusters: larger than the cliques of a junction tree.
- Overlaps between the outer clusters.
K = 1 case: the clusters form the cliques and separators in a junction tree.
- Outer clusters: counting number 1.
- Overlaps: counting number −1 (1 − 2 = −1), then 0 (1 − (3 − 2) = 0), then 0 (1 − (4 − 3 + 0) = 0).
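The counting-number recursion behind these values can be checked with a small script (the region names O1..O4, A..E form a hypothetical containment structure chosen to reproduce the numbers above):

```python
# Kikuchi/region-graph counting numbers: c_r = 1 - sum of c_s over all
# regions s that strictly contain r; outer clusters get c = 1.
def counting_numbers(supersets):
    """supersets[r] = regions strictly containing r; every container must
    appear (as a key) before the regions it contains."""
    c = {}
    for r, sup in supersets.items():
        c[r] = 1 - sum(c[s] for s in sup)
    return c

supersets = {
    "O1": [], "O2": [], "O3": [], "O4": [],          # outer clusters
    "A": ["O1", "O2"],                               # overlap of 2 outers
    "B": ["O2", "O3"],
    "C": ["O3", "O4"],
    "D": ["O1", "O2", "O3", "A", "B"],               # deeper overlap
    "E": ["O1", "O2", "O3", "O4", "A", "B", "C", "D"],
}
c = counting_numbers(supersets)
print(c["A"], c["D"], c["E"])  # -1 0 0
```

This reproduces the slide's arithmetic: 1 − 2 = −1 for A, 1 − (3 − 2) = 0 for D, and 1 − (4 − 3 + 0) = 0 for E.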
Alternative Backward Pass (ABP)
- An approximation to the smoothed posteriors, based on the traditional Kalman smoother form.
- Treats discrete and continuous latent states separately.
Experiments – Comparison with Exact Posteriors
- 100 models, generated by drawing parameters from conjugate priors.
- Dataset: a generated sequence of length 8.
Experiments – Comparison with Exact Posteriors and Gibbs Sampling
Experiments – Effect of larger outer clusters
APPENDIX
Canonical Form
- Represents the intermediate result as a log-quadratic form exp(Q(x)):
  C(x; K, h, g) = exp(−½ xᵀ K x + hᵀ x + g)
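A quick 1-D sanity check (the helper names are ours) that this canonical form reproduces a Gaussian density when K = 1/σ², h = μ/σ², and g absorbs the normalizer:

```python
import math

# 1-D canonical form C(x; K, h, g) = exp(-0.5*K*x^2 + h*x + g)
def canonical(x, K, h, g):
    return math.exp(-0.5 * K * x * x + h * x + g)

# standard 1-D Gaussian density N(x; mu, var)
def gaussian(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

mu, var = 1.5, 0.4
K, h = 1 / var, mu / var
g = -0.5 * (mu * mu / var + math.log(2 * math.pi * var))  # normalizer folded into g
print(abs(canonical(0.7, K, h, g) - gaussian(0.7, mu, var)) < 1e-12)  # True
```

Unlike a normalized Gaussian, a canonical form need not integrate to one (or at all), which is why intermediate results of inference are kept in this representation.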
Operations on Canonical Forms (1/4): Multiplication
- The product of two canonical form factors adds the parameters:
  C(K1, h1, g1) · C(K2, h2, g2) = C(K1 + K2, h1 + h2, g1 + g2)
Operations on Canonical Forms (2/4): Division
- Division subtracts the parameters:
  C(K1, h1, g1) / C(K2, h2, g2) = C(K1 − K2, h1 − h2, g1 − g2)
- Vacuous canonical form: C(0, 0, 0); it has no effect under multiplication and division.
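Both operations reduce to componentwise arithmetic on (K, h, g); a minimal sketch with illustrative 1-D values:

```python
# Canonical-form multiplication adds (K, h, g) componentwise; division
# subtracts; the vacuous form C(0, 0, 0) is the identity for both.
def multiply(a, b):
    return tuple(x + y for x, y in zip(a, b))

def divide(a, b):
    return tuple(x - y for x, y in zip(a, b))

VACUOUS = (0.0, 0.0, 0.0)

c1 = (2.0, 1.0, -0.5)    # (K, h, g) of a 1-D canonical form
c2 = (1.0, -3.0, 0.25)

print(multiply(c1, c2))                    # (3.0, -2.0, -0.25)
print(divide(multiply(c1, c2), c2) == c1)  # True: division undoes multiplication
print(multiply(c1, VACUOUS) == c1)         # True: vacuous form is the identity
```

Division exactly undoing multiplication is what EP relies on when it removes an old approximate factor before inserting a refined one.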
Operations on Canonical Forms (3/4): Marginalization
- Integrating Y out of C(X, Y; K, h, g) yields a canonical form over X with
  K′ = K_XX − K_XY K_YY⁻¹ K_YX,  h′ = h_X − K_XY K_YY⁻¹ h_Y
  (and a corresponding update to g).
- The integral is finite iff K_YY is positive definite.
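A numerical check of the marginalization rule (a sketch using NumPy; `marginalize` is our illustrative helper): for a normalized joint Gaussian, K′ must equal the inverse of the marginal covariance Σ_XX, by the Schur-complement identity.

```python
import numpy as np

# Marginalize Y out of a canonical form over (X, Y):
#   K' = K_XX - K_XY K_YY^{-1} K_YX,  h' = h_X - K_XY K_YY^{-1} h_Y
def marginalize(K, h, keep, drop):
    Kxx = K[np.ix_(keep, keep)]
    Kxy = K[np.ix_(keep, drop)]
    Kyy = K[np.ix_(drop, drop)]
    K_new = Kxx - Kxy @ np.linalg.solve(Kyy, Kxy.T)  # K_YX = K_XY^T (K symmetric)
    h_new = h[keep] - Kxy @ np.linalg.solve(Kyy, h[drop])
    return K_new, h_new

# joint covariance -> precision; marginal precision must invert Sigma_XX
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
K = np.linalg.inv(Sigma)
h = K @ np.array([1.0, -1.0])                # mean (1, -1) in canonical form
K_new, h_new = marginalize(K, h, keep=[0], drop=[1])
print(np.allclose(K_new, 1 / Sigma[0, 0]))   # True: K' = 1 / Sigma_XX
print(np.allclose(h_new, 1.0 / Sigma[0, 0])) # True: h' = mu_X / Sigma_XX
```

The positive-definiteness condition on K_YY is exactly what `np.linalg.solve` needs for the Schur complement to be well defined here.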
Operations on Canonical Forms (4/4): Reduction
- Reduces a canonical form to the context representing evidence.
- If Y = y, the reduced form over X has
  K′ = K_XX,  h′ = h_X − K_XY y,  g′ = g + h_Yᵀ y − ½ yᵀ K_YY y
Sum-Product Algorithms
- Inference in linear Gaussian networks: variable elimination and clique tree algorithms can be adapted using canonical forms.
- The marginalization operation is well defined for an arbitrary canonical form.
- Reduction instantiates a continuous variable (cf. the discrete case: simply zero the entries that are not consistent with Z = z).
- Computational complexity: linear in the number of cliques, and at most cubic in the size of the largest clique.