Expectation Propagation and Generalized EP Methods for Inference in Switching LDSs
Onno Zoeter & Tom Heskes
Bayesian Time Series Models Seminar, 2012.07.19
Summarized and Presented by Heo, Min-Oh
(c)2012, Biointelligence Lab, http://bi.snu.ac.kr
Contents
- Basic Model: SLDS
- Motivation: Complexity for Posteriors
- Methods
  - Assumed Density Filtering (cf. clique tree inference in HMMs)
  - EP in a Nutshell
  - Expectation Propagation for Smoothing in SLDSs
  - Generalized EP
- Experiments
- Appendix: Canonical form and the corresponding operations
Model: Switching Linear Dynamical System (SLDS)
- Also known as: conditionally Gaussian state-space model, switching Kalman filter model, hybrid model
- Graphical notation: ellipse = Gaussian, rectangle = multinomial, shading = observed
- Components: observation model, transition model, switch part
Complexity for Posteriors
- The posterior distribution for filtering problems must consider all possible sequences of S_1:T.
- P(X_t | S_t, y_1:T) is a mixture of M^(T-1) Gaussians, where M is the number of switch states.
Complexity for Posteriors: Example
- For i = 2, if we consider P(X1, X2), the number of Gaussian components in P(X2) without approximation is 4.
- In general, P(X_i) is a mixture of 2^i Gaussians (for M = 2 switch states).
- Representing the correct marginal distribution in a hybrid network can require space exponential in the size of the network.
- Exact inference in CLG networks that include standard discrete networks is NP-hard (even in polytrees); even computing the probability of a single discrete variable is NP-hard.
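The blow-up above can be made concrete with a short sketch (the function name `exact_component_count` is ours, not the paper's): the exact filtered posterior in an SLDS with M switch states needs M times more Gaussian components after every time step.

```python
# Number of Gaussian components needed to represent the exact filtered
# posterior p(x_t | y_1:t) per switch state in an SLDS with M switch
# states: each transition multiplies the count by M, giving M**(T-1).
def exact_component_count(M, T):
    count = 1                 # at t = 1 there is a single Gaussian per switch state
    for _ in range(T - 1):
        count *= M            # each switch transition splits every component M ways
    return count

for T in (1, 2, 4, 8):
    print(T, exact_component_count(2, T))  # 1, 2, 8, 128
```

Even for M = 2 and a sequence of length 8, exact filtering already needs 128 components, which motivates the projection-based approximations that follow.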
Exact Inference: Filtering as Clique Tree Propagation
- The recursive filtering process is message passing in a clique tree, with the belief state as the message.
- HMM case: the forward pass of a sum-product clique tree algorithm computes
  P(S^(1)) → P(S^(1) | o^(1)) → P(S^(2) | o^(1)) → P(S^(2) | o^(1), o^(2)) → …
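The message sequence above can be traced with a toy two-state HMM (the parameter values are made up for illustration; `hmm_filter` is our own helper, not an algorithm from the paper):

```python
# Toy HMM forward (filtering) pass producing the message sequence
# P(S1|o1), P(S2|o1,o2), ... by alternating condition and predict steps.
def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def hmm_filter(prior, trans, emit, obs):
    """prior[i] = P(S1=i), trans[i][j] = P(S'=j | S=i), emit[i][o] = P(o | S=i)."""
    belief = prior
    for o in obs:
        # condition on the observation:  P(S_t | o_1..t)
        belief = normalize([b * emit[i][o] for i, b in enumerate(belief)])
        yield belief
        # predict one step ahead:        P(S_{t+1} | o_1..t)
        belief = [sum(belief[i] * trans[i][j] for i in range(len(belief)))
                  for j in range(len(trans[0]))]

prior = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit  = [[0.8, 0.2], [0.3, 0.7]]
beliefs = list(hmm_filter(prior, trans, emit, [0, 1]))
print(beliefs)
```

In the discrete HMM these messages stay the same size at every step, which is exactly what fails in the SLDS.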
Assumed Density Filtering (ADF)
- ADF forces the belief state to live in some restricted family F, e.g., a product of histograms, or a Gaussian.
- Given a prior α̂^(t−1) ∈ F, do one step of exact Bayesian updating to get α̂_u^(t), which generally lies outside F. Then do a projection step to find the closest approximation in the family:
  α̂^(t) = argmin_{q ∈ F} D(α̂_u^(t) || q)
- If F is the exponential family, we can solve the KL minimization by moment matching.
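The projection step can be illustrated in one dimension, where the KL projection onto a single Gaussian reduces to matching mean and variance (a minimal sketch; the `collapse` helper and the mixture values are illustrative, not from the paper):

```python
# ADF projection for a 1-D Gaussian mixture: the KL projection onto a
# single Gaussian matches the first two moments (mean and variance).
def collapse(weights, means, variances):
    m = sum(w * mu for w, mu in zip(weights, means))
    # E[x^2] - E[x]^2, with E[x^2] = sum_k w_k (var_k + mu_k^2)
    v = sum(w * (var + mu * mu) for w, mu, var in zip(weights, means, variances)) - m * m
    return m, v

# hypothetical two-component belief after one exact update step
m, v = collapse([0.5, 0.5], [-1.0, 1.0], [0.25, 0.25])
print(m, v)  # 0.0 1.25
```

Note that the collapsed variance (1.25) exceeds each component's variance (0.25): the projection absorbs the spread between the components.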
Assumed Density Filtering (ADF): Moment Matching
- Minimizing KL(p||q) with respect to an exponential-family q(z) = h(z) g(η) exp(ηᵀ u(z)):
  - Setting the gradient w.r.t. η to zero gives −∇ ln g(η) = E_p[u(z)].
  - For a general exponential family, −∇ ln g(η) = E_q[u(z)].
  - So E_q[u(z)] = E_p[u(z)]: the optimal solution matches the expected sufficient statistics (moment matching).
Potentials in the Sum-Product Algorithm
- Forward message
- Approximating the forward-pass message: for filtering only!
One Example for DBN
EP in a Nutshell
- Approximate a function by a simpler one:
  p(x) = ∏_a f_a(x)  ≈  q(x) = ∏_a f̃_a(x)
- Each f̃_a(x) lives in a parametric, exponential family (e.g., Gaussian).
- The factors f_a can be conditional distributions in a Bayesian network.
EP Algorithm
- Iterate the fixed-point equations:
  f̃_a(x) = argmin D( f_a(x) q^{\a}(x) || f̃_a(x) q^{\a}(x) ),  where q^{\a}(x) = ∏_{b≠a} f̃_b(x)
- q^{\a}(x) specifies where the approximation needs to be good.
- This yields coordinated local approximations.
(Loopy) Belief Propagation
- Specialize EP to factorized approximations: f̃_a(x) = ∏_i f̃_{ai}(x_i), where the f̃_{ai} are the "messages".
- Minimizing the KL divergence = matching the marginals of f_a(x) q^{\a}(x) (partially factorized) and f̃_a(x) q^{\a}(x) (fully factorized), i.e., "sending messages".
EP versus BP
- The EP approximation can be in a restricted family, e.g., Gaussian.
- The EP approximation does not have to be factorized.
- EP applies to many more problems, e.g., mixtures of discrete and continuous variables.
Expectation Propagation
- The EP approximate smoothing algorithm: the smoother is a backward (smoothing) version built on the assumed density filter.
- It considers the forward and backward passes together, replacing the exact backward message with its approximation.
Expectation Propagation
Convergence in EP for SLDS
- Sometimes the approximation may fail.
- How to resolve:
  - Iteration: repeat steps 1 to 4 of ADF to find local approximations that are as consistent as possible.
  - Use damped messages: normalisability in step 4 of ADF is guaranteed if the sum of the respective inverse covariance matrices is positive definite. The damped message is computed in canonical space (see appendix).
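Damping can be sketched on 1-D canonical parameters (K, h); the `damp` helper and the numbers are illustrative, not from the paper:

```python
# Damped EP message update in canonical (natural-parameter) space:
# new = eps * proposed + (1 - eps) * old, applied to (K, h) jointly.
def damp(old, proposed, eps):
    K_old, h_old = old
    K_new, h_new = proposed
    return (eps * K_new + (1 - eps) * K_old,
            eps * h_new + (1 - eps) * h_old)

old      = (2.0, 1.0)    # 1-D canonical message: precision K, shift h
proposed = (-1.0, 3.0)   # proposed update with a negative precision
K, h = damp(old, proposed, 0.5)
print(K, h)  # 0.5 2.0 -- damping keeps the precision positive
```

With eps = 1 this is the undamped update (K would become −1, i.e., not normalisable); shrinking eps trades convergence speed for stability.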
Generalized EP
- A more accurate approximation, similar to Kikuchi's extension of the Bethe free energy.
- Outer clusters: larger than the cliques of a junction tree.
- Overlaps between the outer clusters.
K = 1 case: the clusters form the cliques and separators in a junction tree.
- Outer clusters: counting number 1.
- Overlaps: counting number −1 (1 − 2 = −1), then 0 (1 − (3 − 2) = 0), then 0 (1 − (4 − 3 + 0) = 0).
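The counting-number recursion behind these values can be checked with a small script (the region names O1..O4, A..E form a hypothetical containment structure chosen to reproduce the numbers above):

```python
# Kikuchi/region-graph counting numbers: c_r = 1 - sum of c_s over all
# regions s that strictly contain r; outer clusters get c = 1.
def counting_numbers(supersets):
    """supersets[r] = regions strictly containing r; every container must
    appear (as a key) before the regions it contains."""
    c = {}
    for r, sup in supersets.items():
        c[r] = 1 - sum(c[s] for s in sup)
    return c

supersets = {
    "O1": [], "O2": [], "O3": [], "O4": [],          # outer clusters
    "A": ["O1", "O2"],                               # overlap of 2 outers
    "B": ["O2", "O3"],
    "C": ["O3", "O4"],
    "D": ["O1", "O2", "O3", "A", "B"],               # deeper overlap
    "E": ["O1", "O2", "O3", "O4", "A", "B", "C", "D"],
}
c = counting_numbers(supersets)
print(c["A"], c["D"], c["E"])  # -1 0 0
```

This reproduces the slide's arithmetic: 1 − 2 = −1 for A, 1 − (3 − 2) = 0 for D, and 1 − (4 − 3 + 0) = 0 for E.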
Alternative Backward Pass (ABP)
- An approximation to the smoothed posteriors, based on the traditional Kalman smoother form.
- Treats discrete and continuous latent states separately.
Experiments – Comparison with Exact Posteriors
- 100 models, generated by drawing parameters from conjugate priors.
- Dataset: a generated sequence of length 8.
Experiments – Comparison with Exact Posteriors and Gibbs Sampling
Experiments – Effect of larger outer clusters
APPENDIX
Canonical Form
- Represents the intermediate result as a log-quadratic form exp(Q(x)):
  C(x; K, h, g) = exp(−½ xᵀ K x + hᵀ x + g)
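A quick 1-D sanity check (the helper names are ours) that this canonical form reproduces a Gaussian density when K = 1/σ², h = μ/σ², and g absorbs the normalizer:

```python
import math

# 1-D canonical form C(x; K, h, g) = exp(-0.5*K*x^2 + h*x + g)
def canonical(x, K, h, g):
    return math.exp(-0.5 * K * x * x + h * x + g)

# standard 1-D Gaussian density N(x; mu, var)
def gaussian(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

mu, var = 1.5, 0.4
K, h = 1 / var, mu / var
g = -0.5 * (mu * mu / var + math.log(2 * math.pi * var))  # normalizer folded into g
print(abs(canonical(0.7, K, h, g) - gaussian(0.7, mu, var)) < 1e-12)  # True
```

Unlike a normalized Gaussian, a canonical form need not integrate to one (or at all), which is why intermediate results of inference are kept in this representation.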
Operations on Canonical Forms (1/4): Multiplication
- The product of two canonical form factors adds the parameters:
  C(K1, h1, g1) · C(K2, h2, g2) = C(K1 + K2, h1 + h2, g1 + g2)
Operations on Canonical Forms (2/4): Division
- Division subtracts the parameters:
  C(K1, h1, g1) / C(K2, h2, g2) = C(K1 − K2, h1 − h2, g1 − g2)
- Vacuous canonical form: C(0, 0, 0); it has no effect under multiplication and division.
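Both operations reduce to componentwise arithmetic on (K, h, g); a minimal sketch with illustrative 1-D values:

```python
# Canonical-form multiplication adds (K, h, g) componentwise; division
# subtracts; the vacuous form C(0, 0, 0) is the identity for both.
def multiply(a, b):
    return tuple(x + y for x, y in zip(a, b))

def divide(a, b):
    return tuple(x - y for x, y in zip(a, b))

VACUOUS = (0.0, 0.0, 0.0)

c1 = (2.0, 1.0, -0.5)    # (K, h, g) of a 1-D canonical form
c2 = (1.0, -3.0, 0.25)

print(multiply(c1, c2))                    # (3.0, -2.0, -0.25)
print(divide(multiply(c1, c2), c2) == c1)  # True: division undoes multiplication
print(multiply(c1, VACUOUS) == c1)         # True: vacuous form is the identity
```

Division exactly undoing multiplication is what EP relies on when it removes an old approximate factor before inserting a refined one.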
Operations on Canonical Forms (3/4): Marginalization
- Integrating Y out of C(X, Y; K, h, g) yields a canonical form over X with
  K′ = K_XX − K_XY K_YY⁻¹ K_YX,  h′ = h_X − K_XY K_YY⁻¹ h_Y
  (and a corresponding update to g).
- The integral is finite iff K_YY is positive definite.
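A numerical check of the marginalization rule (a sketch using NumPy; `marginalize` is our illustrative helper): for a normalized joint Gaussian, K′ must equal the inverse of the marginal covariance Σ_XX, by the Schur-complement identity.

```python
import numpy as np

# Marginalize Y out of a canonical form over (X, Y):
#   K' = K_XX - K_XY K_YY^{-1} K_YX,  h' = h_X - K_XY K_YY^{-1} h_Y
def marginalize(K, h, keep, drop):
    Kxx = K[np.ix_(keep, keep)]
    Kxy = K[np.ix_(keep, drop)]
    Kyy = K[np.ix_(drop, drop)]
    K_new = Kxx - Kxy @ np.linalg.solve(Kyy, Kxy.T)  # K_YX = K_XY^T (K symmetric)
    h_new = h[keep] - Kxy @ np.linalg.solve(Kyy, h[drop])
    return K_new, h_new

# joint covariance -> precision; marginal precision must invert Sigma_XX
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
K = np.linalg.inv(Sigma)
h = K @ np.array([1.0, -1.0])                # mean (1, -1) in canonical form
K_new, h_new = marginalize(K, h, keep=[0], drop=[1])
print(np.allclose(K_new, 1 / Sigma[0, 0]))   # True: K' = 1 / Sigma_XX
print(np.allclose(h_new, 1.0 / Sigma[0, 0])) # True: h' = mu_X / Sigma_XX
```

The positive-definiteness condition on K_YY is exactly what `np.linalg.solve` needs for the Schur complement to be well defined here.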
Operations on Canonical Forms (4/4): Reduction
- Reduces a canonical form to the context representing evidence.
- If Y = y, the reduced form over X has
  K′ = K_XX,  h′ = h_X − K_XY y,  g′ = g + h_Yᵀ y − ½ yᵀ K_YY y
Sum-Product Algorithms
- Inference in linear Gaussian networks: variable elimination and clique tree algorithms can be adapted using canonical forms.
- The marginalization operation is well defined for an arbitrary canonical form.
- Reduction instantiates a continuous variable (cf. the discrete case: simply zero the entries that are not consistent with Z = z).
- Computational complexity: linear in the number of cliques, and at most cubic in the size of the largest clique.