Expectation Propagation in Dynamical Systems
Marc Peter Deisenroth
Joint Work with Shakir Mohamed (UBC)
August 10, 2012
Motivation
Figure: Complex time series: motion capture, GDP, climate

Time series in economics, robotics, motion capture, etc. have unknown dynamical structure and are high-dimensional and noisy
Flexible and accurate models: nonlinear (Gaussian process) dynamical systems (GPDS)
Accurate inference in (GP)DS is important for better knowledge about latent structures and for parameter learning
Outline
1 Inference in Time Series Models
    Filtering and Smoothing
    Expectation Propagation
    Approximating the Partition Function
    Relation to Smoothing
2 EP in Gaussian Process Dynamical Systems
    Gaussian Processes
    Filtering/Smoothing in GPDS
    Expectation Propagation in GPDS
3 Results
Inference in Time Series Models: Filtering and Smoothing
Time Series Models
[Figure: graphical model with latent states x_{t-1}, x_t, x_{t+1} and observations z_{t-1}, z_t, z_{t+1}]

x_t = f(x_{t-1}) + w,  w ∼ N(0, Q)
z_t = g(x_t) + v,  v ∼ N(0, R)

Latent state x ∈ R^D
Measurement/observation z ∈ R^E
Transition function f
Measurement function g
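To make the model concrete, here is a minimal simulation sketch of the system above; the particular choices of f, g, Q, and R are illustrative assumptions, not taken from the talk.

```python
# Minimal simulation sketch of x_t = f(x_{t-1}) + w, z_t = g(x_t) + v.
# The transition f, measurement g, and noise variances Q, R below are
# illustrative assumptions, not the ones used in the talk.
import numpy as np

rng = np.random.default_rng(0)

def f(x):                      # assumed nonlinear transition function
    return 0.9 * np.sin(x)

def g(x):                      # assumed nonlinear measurement function
    return 0.5 * x**2

Q, R, T = 0.1, 0.05, 100       # process/measurement noise variances, horizon
x = np.zeros(T)                # latent states
z = np.zeros(T)                # observations
x[0] = rng.normal(0.0, 1.0)    # initial state drawn from N(0, 1)
for t in range(T):
    if t > 0:
        x[t] = f(x[t - 1]) + rng.normal(0.0, np.sqrt(Q))  # w ~ N(0, Q)
    z[t] = g(x[t]) + rng.normal(0.0, np.sqrt(R))          # v ~ N(0, R)
```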
Inference in Time Series Models
Objective: posterior distribution over the latent variables x_t
Filtering (forward inference): compute p(x_t | z_{1:t}) for t = 1, ..., T
Smoothing (forward-backward inference):
    compute p(x_t | z_{1:t}) for t = 1, ..., T (forward sweep)
    compute p(x_t | z_{1:T}) for t = T, ..., 1 (backward sweep)
Examples:
    Linear systems: Kalman filter/smoother (Kalman, 1960)
    Nonlinear systems: approximate inference
        Extended Kalman filter/smoother (Kalman, 1959–1961)
        Unscented Kalman filter/smoother (Julier & Uhlmann, 1997)
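For intuition, a minimal sketch of the Kalman filter for the linear special case x_t = A x_{t-1} + w, z_t = C x_t + v; the function name and interface are assumed for illustration.

```python
# Minimal Kalman filter sketch for the linear-Gaussian special case
# x_t = A x_{t-1} + w, w ~ N(0, Q);  z_t = C x_t + v, v ~ N(0, R).
import numpy as np

def kalman_filter(zs, A, C, Q, R, mu0, S0):
    """Return the filter distributions p(x_t | z_{1:t}) = N(mu_t, S_t)."""
    mu, S = mu0, S0
    filtered = []
    for z in zs:
        # Time update (prediction): p(x_t | z_{1:t-1})
        mu_p = A @ mu
        S_p = A @ S @ A.T + Q
        # Measurement update (correction) with Kalman gain K
        K = S_p @ C.T @ np.linalg.inv(C @ S_p @ C.T + R)
        mu = mu_p + K @ (z - C @ mu_p)
        S = S_p - K @ C @ S_p
        filtered.append((mu.copy(), S.copy()))
    return filtered
```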
Machine Learning Perspective
Treat filtering/smoothing as an inference problem in graphical models with hidden variables
Allows for efficient local message passing (distributed computation)
Messages are unnormalized probability distributions
Iterative refinement of the posterior marginals p(x_t), t = 1, ..., T
Multiple forward-backward sweeps until global consistency (convergence)
Here: Expectation Propagation (Minka, 2001)
Inference in Time Series Models: Expectation Propagation
Expectation Propagation
[Figure: graphical model and corresponding factor graph, with transition factor p(x_{t+1} | x_t) between latent states x_t, x_{t+1} and measurement factors p(z_t | x_t), p(z_{t+1} | x_{t+1})]
Inference in factor graphs
p(x_t) = ∏_{i=1}^n t_i(x_t)
q(x_t) = ∏_{i=1}^n t̃_i(x_t)
Approximate factors t̃i are members of the Exponential Family (e.g., Multinomial, Gamma, Gaussian)
Find a good approximation such that q ≈ p
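For Gaussian factors, the product structure above is especially convenient: natural parameters of the factors simply add. A minimal sketch with made-up numbers:

```python
# Sketch: a product of Gaussian factors is again Gaussian, and its
# natural parameters (precision-mean, precision) are the sums of the
# factors' natural parameters. The numbers below are made up.
factors = [(0.0, 2.0), (1.0, 1.0), (-0.5, 4.0)]   # (mean, variance) of each factor
eta = sum(m / v for m, v in factors)              # sum of precision-means
lam = sum(1.0 / v for m, v in factors)            # sum of precisions
mu_q, var_q = eta / lam, 1.0 / lam                # moments of the product q(x_t)
print(mu_q, var_q)
```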
Expectation Propagation
Figure: Moment matching vs. mode matching. Borrowed from Bishop (2006)

EP locally minimizes KL(p||q), where p is the true distribution and q is an approximation to it from the Exponential Family.
EP = moment matching (unlike Variational Bayes [“mode matching”], which minimizes KL(q||p))
EP exploits properties of the Exponential Family: moments of distributions can be computed via derivatives of the log-partition function
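A small numerical sketch of moment matching: the Gaussian minimizing KL(p||q) matches the mean and variance of p. The "tilted" distribution below (standard normal prior times a sigmoid factor) is an illustrative choice, not from the talk.

```python
# Moment-matching sketch: the Gaussian q minimizing KL(p||q) has the
# same mean and variance as p. Here p is a standard normal prior times
# a sigmoid factor, an illustrative non-Gaussian "tilted" distribution.
import numpy as np

xs = np.linspace(-10.0, 10.0, 20001)
prior = np.exp(-0.5 * xs**2) / np.sqrt(2.0 * np.pi)  # N(0, 1)
factor = 1.0 / (1.0 + np.exp(-4.0 * xs))             # sigmoid "likelihood" factor
p = prior * factor
p /= np.trapz(p, xs)                                 # normalize the tilted distribution

mean = np.trapz(xs * p, xs)                          # first moment of p
var = np.trapz((xs - mean)**2 * p, xs)               # second central moment of p
# q(x) = N(mean, var) is the KL(p||q)-optimal Gaussian approximation
```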
Expectation Propagation
[Figure: factor graph (left) and fully factored factor graph (right), with measurement messages q_M(x_t), forward messages q_B(x_t), and backward messages q_C(x_t) at each latent state]
Write down the (fully factored) factor graph
p(x_t) = ∏_{i=1}^n t_i(x_t)
q(x_t) = ∏_{i=1}^n t̃_i(x_t)
Find approximate factors t̃_i such that KL(p||q) is minimized
Multiple sweeps through the graph until global consistency of the messages is assured
Messages in a Dynamical System
[Figure: fully factored factor graph with messages q_B, q_M, q_C at x_t and x_{t+1}]
Approximate (factored) marginal: q(x_t) = ∏_i t̃_i(x_t)

Here, our messages t̃_i have names:
    Measurement message q_M
    Forward message q_B
    Backward message q_C

Define the cavity distribution: q^{\i}(x_t) = q(x_t) / t̃_i(x_t) = ∏_{k≠i} t̃_k(x_t)
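For Gaussian messages, this division is a subtraction of natural parameters. A minimal sketch; the helper names are assumptions for illustration.

```python
# Cavity computation by Gaussian division, sketched in natural
# parameters (eta = S^{-1} mu, Lam = S^{-1}); dividing Gaussians
# subtracts natural parameters. Helper names are illustrative.
import numpy as np

def to_natural(mu, S):
    Lam = np.linalg.inv(S)
    return Lam @ mu, Lam

def from_natural(eta, Lam):
    S = np.linalg.inv(Lam)
    return S @ eta, S

def cavity(mu_q, S_q, mu_i, S_i):
    # q^{\i} = q / t_i~; note that Lam_q - Lam_i need not be positive
    # definite, which real EP implementations must guard against.
    eta_q, Lam_q = to_natural(mu_q, S_q)
    eta_i, Lam_i = to_natural(mu_i, S_i)
    return from_natural(eta_q - eta_i, Lam_q - Lam_i)
```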
Gaussian EP in More Detail
1 Write down the factor graph
2 Initialize all messages t̃_i, i = M, B, C
Until convergence:
3 For all latent variables x_t and corresponding messages t̃_i(x_t) do
    1 Compute the cavity distribution q^{\i}(x_t) = N(x_t | μ_t^{\i}, Σ_t^{\i}) by Gaussian division
    2 Compute the moments of t_i(x_t) q^{\i}(x_t) → updated moments of q(x_t)
    3 Compute the updated message t̃_i(x_t) = q(x_t) / q^{\i}(x_t)
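Putting the three steps together, a high-level sketch of the loop in natural parameters; project_moments stands in for the per-factor moment computations (which depend on the transition and measurement models) and its interface is an assumption, not code from the talk.

```python
# High-level Gaussian EP loop sketch. q[t] and messages[(i, t)] hold
# natural parameters (eta, Lam); project_moments(i, t, cav) must return
# the natural parameters of the Gaussian whose moments match
# t_i(x_t) * q^{\i}(x_t) -- that per-factor step is assumed given.
import numpy as np

def ep(q, messages, project_moments, max_sweeps=50, tol=1e-6):
    for _ in range(max_sweeps):                      # forward-backward sweeps
        max_delta = 0.0
        for t in sorted(q):                          # all latent variables x_t
            for i in ("M", "B", "C"):                # measurement/forward/backward
                eta_q, Lam_q = q[t]
                eta_i, Lam_i = messages[(i, t)]
                # 1. cavity by Gaussian division (subtract naturals)
                cav = (eta_q - eta_i, Lam_q - Lam_i)
                # 2. moment matching against the true factor t_i
                eta_new, Lam_new = project_moments(i, t, cav)
                # 3. updated message: t_i~ = q_new / cavity
                messages[(i, t)] = (eta_new - cav[0], Lam_new - cav[1])
                max_delta = max(max_delta, float(np.max(np.abs(eta_new - eta_q))))
                q[t] = (eta_new, Lam_new)
        if max_delta < tol:                          # messages globally consistent
            break
    return q, messages
```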