
  • Expectation Propagation in Dynamical Systems

    Marc Peter Deisenroth

    Joint Work with Shakir Mohamed (UBC)

    August 10, 2012


  • Motivation

    Figure: Complex time series: motion capture, GDP, climate

    Time series in economics, robotics, motion capture, etc. often have unknown dynamical structure and are high-dimensional and noisy

    Flexible and accurate models: nonlinear (Gaussian process) dynamical systems (GPDS)

    Accurate inference in (GP)DS is important for better knowledge about latent structures and for parameter learning


  • Outline

    1 Inference in Time Series Models: Filtering and Smoothing; Expectation Propagation; Approximating the Partition Function; Relation to Smoothing

    2 EP in Gaussian Process Dynamical Systems: Gaussian Processes; Filtering/Smoothing in GPDS; Expectation Propagation in GPDS

    3 Results


  • Time Series Models

    [Graphical model: latent chain xt−1 → xt → xt+1 with observations zt−1, zt, zt+1]

    xt = f(xt−1) + w,  w ∼ N(0, Q)
    zt = g(xt) + v,  v ∼ N(0, R)

    Latent state x ∈ R^D
    Measurement/observation z ∈ R^E
    Transition function f
    Measurement function g
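
    To make the generative model concrete, here is a minimal NumPy sketch that samples one trajectory from it. The specific f, g, and noise levels are illustrative assumptions, not taken from the slides:

```python
# Sample a trajectory from x_t = f(x_{t-1}) + w, z_t = g(x_t) + v.
# f, g, Q, R below are illustrative choices (assumptions).
import numpy as np

rng = np.random.default_rng(0)
D, E, T = 1, 1, 100                      # latent dim, obs dim, horizon
Q = 0.1 * np.eye(D)                      # process noise covariance
R = 0.5 * np.eye(E)                      # measurement noise covariance

def f(x):                                # hypothetical transition function
    return 0.9 * x + np.sin(x)

def g(x):                                # hypothetical measurement function
    return x ** 2

x, z = np.zeros((T, D)), np.zeros((T, E))
x[0] = rng.multivariate_normal(np.zeros(D), np.eye(D))   # prior on x_1
for t in range(T):
    if t > 0:
        x[t] = f(x[t - 1]) + rng.multivariate_normal(np.zeros(D), Q)
    z[t] = g(x[t]) + rng.multivariate_normal(np.zeros(E), R)
```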


  • Inference in Time Series Models

    [Graphical model as above: latent chain xt−1 → xt → xt+1 with observations zt−1, zt, zt+1]

    Objective: posterior distribution over the latent variables xt

    Filtering (forward inference): compute p(xt | z1:t) for t = 1, …, T

    Smoothing (forward-backward inference): compute p(xt | z1:t) for t = 1, …, T (forward sweep), then p(xt | z1:T) for t = T, …, 1 (backward sweep)

    Examples:

    Linear systems: Kalman filter/smoother (Kalman, 1960)

    Nonlinear systems: approximate inference, e.g.,
    Extended Kalman filter/smoother (Kalman, 1960–1961)
    Unscented Kalman filter/smoother (Julier & Uhlmann, 1997)
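
    For the linear special case xt = A xt−1 + w, zt = C xt + v, filtering is exact. A minimal NumPy sketch of the Kalman filter, assuming known A, C, Q, R and a Gaussian prior N(mu0, S0):

```python
# Kalman filter: exact computation of p(x_t | z_{1:t}) in the
# linear-Gaussian case (sketch; no numerical safeguards).
import numpy as np

def kalman_filter(z, A, C, Q, R, mu0, S0):
    T, D = len(z), mu0.shape[0]
    mus, Ss = np.zeros((T, D)), np.zeros((T, D, D))
    mu, S = mu0, S0
    for t in range(T):
        if t > 0:                                      # time update (predict)
            mu, S = A @ mu, A @ S @ A.T + Q
        K = S @ C.T @ np.linalg.inv(C @ S @ C.T + R)   # Kalman gain
        mu = mu + K @ (z[t] - C @ mu)                  # measurement update
        S = S - K @ C @ S
        mus[t], Ss[t] = mu, S                          # filter marginal at t
    return mus, Ss
```

    The smoother adds a backward sweep (e.g., the Rauch-Tung-Striebel recursion) that turns the filter marginals p(xt | z1:t) into p(xt | z1:T).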


  • Machine Learning Perspective

    [Graphical model as above: latent chain xt−1 → xt → xt+1 with observations zt−1, zt, zt+1]

    Treat filtering/smoothing as an inference problem in graphical models with hidden variables

    Allows for efficient, distributed local message passing

    Messages are unnormalized probability distributions

    Iterative refinement of the posterior marginals p(xt), t = 1, …, T: multiple forward-backward sweeps until global consistency (convergence)

    Here: Expectation Propagation (Minka, 2001)


  • Expectation Propagation

    [Factor graph: variables xt and xt+1 linked by the transition factor p(xt+1|xt), with measurement factors p(zt|xt) and p(zt+1|xt+1)]

    Inference in factor graphs

    p(xt) = ∏_{i=1}^n ti(xt)

    q(xt) = ∏_{i=1}^n t̃i(xt)

    Approximate factors t̃i are members of the Exponential Family (e.g., Multinomial, Gamma, Gaussian)

    Find a good approximation q such that q ≈ p


  • Expectation Propagation

    Figure: Moment matching vs. mode matching. Borrowed from Bishop (2006).

    EP locally minimizes KL(p||q), where p is the true distribution and q is an approximation to it from the exponential family.

    EP = moment matching, unlike variational Bayes ("mode matching"), which minimizes KL(q||p).

    EP exploits properties of the exponential family: moments of distributions can be computed via derivatives of the log-partition function.
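
    A quick numerical illustration of moment matching: the Gaussian q minimizing KL(p||q) matches the mean and variance of p. The bimodal p below is an illustrative assumption:

```python
# Moment-matched Gaussian approximation to a two-component mixture.
# Minimizing KL(p||q) over Gaussians q <=> matching mean and variance of p.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
comp = rng.random(n) < 0.3                       # mixture weights 0.3 / 0.7
p_samples = np.where(comp,
                     rng.normal(-2.0, 0.5, n),   # mode 1
                     rng.normal(1.5, 1.0, n))    # mode 2

mu_q, var_q = p_samples.mean(), p_samples.var()  # matched moments
print(f"q(x) = N(x | {mu_q:.2f}, {var_q:.2f})")  # q covers both modes
```

    Mode matching (minimizing KL(q||p), as in variational Bayes) would instead lock onto one of the two modes.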


  • Expectation Propagation

    [Factor graphs with forward, measurement, and backward messages qB(xt), qM(xt), qC(xt) attached to each latent variable xt]

    Figure: Factor graph (left) and fully factored factor graph (right).

    Write down the (fully factored) factor graph

    p(xt) = ∏_{i=1}^n ti(xt)

    q(xt) = ∏_{i=1}^n t̃i(xt)

    Find approximate factors t̃i such that KL(p||q) is minimized. Multiple sweeps through the graph until global consistency of the messages is reached.


  • Messages in a Dynamical System

    [Fully factored factor graph as above, with messages qB, qM, qC at each latent variable xt]

    Approximate (factored) marginal: q(xt) = ∏_i t̃i(xt)

    Here, our messages t̃i have names:
    Measurement message qM
    Forward message qB
    Backward message qC

    Define the cavity distribution: q\i(xt) = q(xt)/t̃i(xt) = ∏_{k≠i} t̃k(xt)
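
    For Gaussian messages, this division is a subtraction in natural parameters (precision Λ = Σ⁻¹ and shift η = Σ⁻¹µ). A minimal sketch; the function name and interface are my own:

```python
# Gaussian division q / t_i in natural parameters: precisions and
# shifts subtract. Note the resulting cavity "covariance" can be
# indefinite, which EP implementations must detect and handle.
import numpy as np

def gaussian_divide(mu, S, mu_i, S_i):
    """Return mean and covariance of N(mu, S) / N(mu_i, S_i)."""
    Lam = np.linalg.inv(S) - np.linalg.inv(S_i)        # cavity precision
    eta = np.linalg.solve(S, mu) - np.linalg.solve(S_i, mu_i)
    S_cav = np.linalg.inv(Lam)
    return S_cav @ eta, S_cav                          # cavity mean, cov
```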


  • Gaussian EP in More Detail

    [Fully factored factor graph as above, with messages qB, qM, qC at each latent variable xt]

    1 Write down the factor graph

    2 Initialize all messages t̃i, i ∈ {M, B, C}

    3 Until convergence, for all latent variables xt and corresponding messages t̃i(xt) do:

    3.1 Compute the cavity distribution q\i(xt) = N(xt | µ\i_t, Σ\i_t) by Gaussian division
    3.2 Compute the moments of ti(xt) q\i(xt) → updated moments of q(xt)
    3.3 Compute the updated message t̃i(xt) = q(xt)/q\i(xt)
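
    A schematic version of this loop in 1-D natural parameters (η = µ/σ², λ = 1/σ²). The broad-Gaussian stub standing in for the true factors ti is an assumption to keep the sketch self-contained; in real EP, step 3.2 computes the moments of the tilted distribution ti(xt) q\i(xt) from the transition and measurement factors:

```python
# Schematic Gaussian EP sweep over messages i in {M, B, C} per latent x_t.
import numpy as np

T, n_sweeps = 10, 5
# One (eta, lam) row per time step; weak initialization N(0, 1).
msgs = {i: np.tile([0.0, 1.0], (T, 1)) for i in "MBC"}

def tilted_moments(eta_cav, lam_cav):
    # Placeholder for step 3.2: moments of t_i(x_t) * cavity. The true
    # factor is stubbed as a broad Gaussian N(0, 10) for illustration.
    lam, eta = lam_cav + 0.1, eta_cav
    return eta / lam, 1.0 / lam                         # mean, variance

for sweep in range(n_sweeps):                           # until convergence
    for t in range(T):
        for i in "MBC":
            eta_q = sum(msgs[j][t, 0] for j in "MBC")   # q = product of msgs
            lam_q = sum(msgs[j][t, 1] for j in "MBC")
            eta_cav = eta_q - msgs[i][t, 0]             # 3.1: cavity division
            lam_cav = lam_q - msgs[i][t, 1]
            m, v = tilted_moments(eta_cav, lam_cav)     # 3.2: new moments
            msgs[i][t] = [m / v - eta_cav, 1.0 / v - lam_cav]  # 3.3: update
```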
