Kalman filtering & smoothing
40-957 Special Topics in Artificial Intelligence:
Probabilistic Graphical Models
Sharif University of Technology
Soleymani
Spring 2014
From latent variable models to dynamic models
Two categories of latent variable models that we have seen in previous lectures:
- Discrete latent variables: mixture models
- Continuous latent variables: factor analysis

Dynamical generalizations of these models:
- Mixture models -> HMM: the HMM is the dynamical generalization of a mixture model; the latent variables are discrete, but with arbitrary emission probability distributions.
- Factor analysis -> Kalman filter: the Kalman filter is the dynamical generalization of factor analysis.
State-Space Model (SSM)
The latent variables z_1, …, z_T form a chain.
Independence relationships (the same as for the HMM):
- Given the state at one moment in time, the states in the future are conditionally independent of those in the past.
- Observing the observation nodes does not d-separate any of the state nodes.
[Graphical model: state chain z_1 → z_2 → … → z_{T−1} → z_T, with an observation x_t emitted from each z_t]
Linear Dynamical System (LDS)
z_1 = u,                        u ~ N(0, Σ_0)
z_{t+1} = A z_t + G w_{t+1},    w_t ~ N(0, Q)
x_t = C z_t + v_t,              v_t ~ N(0, R)

Equivalently:
P(z_1) = N(z_1 | 0, Σ_0)
P(z_{t+1} | z_t) = N(z_{t+1} | A z_t, G Q G^T)
P(x_t | z_t) = N(x_t | C z_t, R)
This is a linear-Gaussian model.
General (nonlinear) state-space model:
z_{t+1} = f(z_t) + G w_{t+1},   w_t ~ N(0, Q)
x_t = g(z_t) + v_t,             v_t ~ N(0, R)
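To make the generative process concrete, here is a minimal NumPy sketch (my own illustration, not from the slides; the function name and example parameters are made up) that samples a trajectory from the linear-Gaussian SSM above:

```python
import numpy as np

def simulate_lds(A, G, C, Q, R, Sigma0, T, seed=None):
    """Sample one trajectory (z_1..z_T, x_1..x_T) from the LDS defined above."""
    rng = np.random.default_rng(seed)
    n, m, q = A.shape[0], C.shape[0], Q.shape[0]
    z = np.zeros((T, n))
    x = np.zeros((T, m))
    z[0] = rng.multivariate_normal(np.zeros(n), Sigma0)   # z_1 ~ N(0, Sigma0)
    for t in range(T):
        if t > 0:
            w = rng.multivariate_normal(np.zeros(q), Q)   # process noise w_{t+1}
            z[t] = A @ z[t - 1] + G @ w                   # z_{t+1} = A z_t + G w_{t+1}
        v = rng.multivariate_normal(np.zeros(m), R)       # observation noise v_t
        x[t] = C @ z[t] + v                               # x_t = C z_t + v_t
    return z, x

# Example: 2-D state (position, velocity), noisy 1-D observation of position.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
G = np.eye(2); Q = 0.01 * np.eye(2)
C = np.array([[1.0, 0.0]]); R = np.array([[0.1]])
z, x = simulate_lds(A, G, C, Q, R, Sigma0=np.eye(2), T=50, seed=0)
```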
Kalman filter applications
The Kalman filter has been widely used in many real-time tracking applications.
Many other applications include:
- Navigation and guidance systems (e.g., Simultaneous Localization and Mapping)
- Control systems
- Time-series processing
Inference in LDS
Inference is the calculation of the posterior probability of the states given an observation sequence.
We will see two types of inference problems on the LDS:
- Filtering (online inference): P(z_t | x_1, …, x_t)
- Smoothing (offline inference): P(z_t | x_1, …, x_T)
Inference in LDS
The graphical model of the LDS is tree-structured, so inference can be performed efficiently using the sum-product algorithm:
- Filtering: the forward recursions, analogous to the α messages of the HMM, are known as the Kalman filter equations.
- Smoothing: the backward recursions, analogous to the β messages, are known as the Kalman smoother equations.
Inference algorithms on SSMs are thus similar to the inference algorithms on HMMs.
Message passing
The joint distribution (of a linear-Gaussian network) is multivariate Gaussian.
- Thus, marginal and conditional distributions will also be Gaussian.
We can use message passing in Gaussian networks to solve the inference problems of the LDS.
- We will focus only on computing means and covariances.
Kalman filter: messages
Filtering is similar to (but not the same as) the forward algorithm in the HMM:

α̂_t(z_t) = P(z_t | x_{1:t}) = P(z_t, x_{1:t}) / P(x_{1:t}) = α_t(z_t) / P(x_{1:t})

The distribution P(z_t | x_{1:t}) is N(z_t | μ_{z_t|x_{1:t}}, Σ_{z_t|x_{1:t}}).
Assuming we have calculated α̂_t(z_t), we need to calculate α̂_{t+1}(z_{t+1}) = P(z_{t+1} | x_{1:t+1}).
[Message passing on the chain: the messages α̂_1(.), α̂_2(.), …, α̂_T(.) are computed left to right along the state chain]
Kalman filter
To find P(z_{t+1} | x_{1:t+1}), we use two recursive updates:
- Predict step (time update): the best guess before seeing the new measurement.
  Compute P(z_{t+1} | x_{1:t}) from P(z_t | x_{1:t}).
- Measurement update step: after the measurement, we find the new posterior in which x_{t+1} is also given as evidence.
  Compute P(z_{t+1} | x_{1:t+1}) from P(z_{t+1} | x_{1:t}).
Kalman filter: prediction step
In this step, we have P(z_t | x_{1:t}) = N(z_t | μ_{t|t}, Σ_{t|t}) and we find P(z_{t+1} | x_{1:t}) and P(x_{t+1} | x_{1:t}):

P(z_{t+1} | x_{1:t}) = N(z_{t+1} | μ_{t+1|t}, Σ_{t+1|t})

μ_{t+1|t} ≡ μ_{z_{t+1}|x_{1:t}} = E[z_{t+1} | x_{1:t}]
          = E[A z_t + G w_{t+1} | x_{1:t}]
          = A μ_{t|t}

Σ_{t+1|t} ≡ Σ_{z_{t+1}|x_{1:t}} = E[(z_{t+1} − μ_{t+1|t})(z_{t+1} − μ_{t+1|t})^T | x_{1:t}]
          = E[(A z_t + G w_{t+1} − μ_{t+1|t})(A z_t + G w_{t+1} − μ_{t+1|t})^T | x_{1:t}]
          = A Σ_{t|t} A^T + G Q G^T

(Notation: μ_{t|t} ≡ μ_{z_t|x_{1:t}}, Σ_{t|t} ≡ Σ_{z_t|x_{1:t}}.)
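In code, the time update is a two-line computation; this sketch assumes NumPy arrays and the naming is mine:

```python
import numpy as np

def kf_predict(mu, Sigma, A, G, Q):
    """Time update: P(z_{t+1} | x_{1:t}) from P(z_t | x_{1:t})."""
    mu_pred = A @ mu                            # mu_{t+1|t} = A mu_{t|t}
    Sigma_pred = A @ Sigma @ A.T + G @ Q @ G.T  # Sigma_{t+1|t} = A Sigma_{t|t} A^T + G Q G^T
    return mu_pred, Sigma_pred
```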
Similarly, we find the predictive distribution of the next observation:

P(x_{t+1} | x_{1:t}) = N(x_{t+1} | μ_{x_{t+1}|x_{1:t}}, Σ_{x_{t+1}|x_{1:t}})

μ_{x_{t+1}|x_{1:t}} = E[x_{t+1} | x_{1:t}] = E[C z_{t+1} + v_{t+1} | x_{1:t}] = C μ_{t+1|t}

Σ_{x_{t+1}|x_{1:t}} = E[(x_{t+1} − μ_{x_{t+1}|x_{1:t}})(x_{t+1} − μ_{x_{t+1}|x_{1:t}})^T | x_{1:t}]
  = E[(C z_{t+1} + v_{t+1} − μ_{x_{t+1}|x_{1:t}})(C z_{t+1} + v_{t+1} − μ_{x_{t+1}|x_{1:t}})^T | x_{1:t}]
  = C Σ_{t+1|t} C^T + R
Kalman filter: measurement update
We want to find P(z_{t+1} | x_{1:t}, x_{t+1}); for this purpose, we first find the joint distribution P(z_{t+1}, x_{t+1} | x_{1:t}).

The cross-covariance between x_{t+1} and z_{t+1} is:

Σ_{x_{t+1},z_{t+1}|x_{1:t}} = E[(x_{t+1} − μ_{x_{t+1}|x_{1:t}})(z_{t+1} − μ_{z_{t+1}|x_{1:t}})^T | x_{1:t}]
  = E[(C z_{t+1} + v_{t+1} − μ_{x_{t+1}|x_{1:t}})(z_{t+1} − μ_{z_{t+1}|x_{1:t}})^T | x_{1:t}]
  = C Σ_{t+1|t}

Hence the joint is:

P(z_{t+1}, x_{t+1} | x_{1:t}) = N( [z_{t+1}; x_{t+1}] | [μ_{t+1|t}; C μ_{t+1|t}],
                                   [Σ_{t+1|t},   Σ_{t+1|t} C^T;
                                    C Σ_{t+1|t}, C Σ_{t+1|t} C^T + R] )
To condition this joint on x_{t+1}, recall the general formulas for conditioning in a joint Gaussian:

P(x_1, x_2) = N( [x_1; x_2] | [μ_1; μ_2], [Σ_11, Σ_12; Σ_21, Σ_22] )
⇒ P(x_1 | x_2) = N(x_1 | μ_{1|2}, Σ_{1|2})
  μ_{1|2} = μ_1 + Σ_12 Σ_22^{-1} (x_2 − μ_2)
  Σ_{1|2} = Σ_11 − Σ_12 Σ_22^{-1} Σ_21

Applying these to the joint above, with x_{t+1} as the conditioning variable:

μ_{t+1|t+1} ≡ μ_{z_{t+1}|x_{1:t+1}}
  = μ_{t+1|t} + Σ_{t+1|t} C^T (C Σ_{t+1|t} C^T + R)^{-1} (x_{t+1} − C μ_{t+1|t})

Σ_{t+1|t+1} ≡ Σ_{z_{t+1}|x_{1:t+1}}
  = Σ_{t+1|t} − Σ_{t+1|t} C^T (C Σ_{t+1|t} C^T + R)^{-1} C Σ_{t+1|t}
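The conditioning formulas are easy to sanity-check numerically; a toy sketch (all values made up for the example):

```python
import numpy as np

def gaussian_condition(mu1, mu2, S11, S12, S22, x2):
    """P(x1 | x2) for a joint Gaussian over (x1, x2) split into blocks."""
    K = S12 @ np.linalg.inv(S22)        # Sigma_12 Sigma_22^{-1}
    mu_cond = mu1 + K @ (x2 - mu2)      # mu_{1|2}
    S_cond = S11 - K @ S12.T            # Sigma_{1|2}, using Sigma_21 = Sigma_12^T
    return mu_cond, S_cond

mu1, mu2 = np.array([0.0]), np.array([1.0])
S11, S12, S22 = np.array([[2.0]]), np.array([[0.5]]), np.array([[1.0]])
print(gaussian_condition(mu1, mu2, S11, S12, S22, x2=np.array([2.0])))
# mu_{1|2} = 0 + 0.5 * (2 - 1) = 0.5 ;  Sigma_{1|2} = 2 - 0.25 = 1.75
```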
We also found in the time update:

μ_{t+1|t} = A μ_{t|t}
Σ_{t+1|t} = A Σ_{t|t} A^T + G Q G^T

The recursion is initialized with the prior on z_1:

μ_{1|0} = μ_0
Σ_{1|0} = Σ_0

(with the zero-mean prior above, μ_0 = 0)
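The measurement update derived above, translated into code (again an illustrative sketch with my own names):

```python
import numpy as np

def kf_update(mu_pred, Sigma_pred, x, C, R):
    """Measurement update: P(z_{t+1} | x_{1:t+1}) from P(z_{t+1} | x_{1:t})."""
    S = C @ Sigma_pred @ C.T + R              # predicted observation covariance
    K = Sigma_pred @ C.T @ np.linalg.inv(S)   # gain (named on the next slide)
    mu = mu_pred + K @ (x - C @ mu_pred)      # mu_{t+1|t+1}
    Sigma = Sigma_pred - K @ C @ Sigma_pred   # Sigma_{t+1|t+1}
    return mu, Sigma
```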
Kalman filter: measurement update
Updates written in terms of the Kalman gain matrix

K_{t+1} ≡ Σ_{t+1|t} C^T (C Σ_{t+1|t} C^T + R)^{-1}:

μ_{t+1|t+1} = μ_{t+1|t} + K_{t+1} (x_{t+1} − C μ_{t+1|t})
Σ_{t+1|t+1} = Σ_{t+1|t} − K_{t+1} C Σ_{t+1|t} = (I − K_{t+1} C) Σ_{t+1|t}

- The update takes a linear combination of the predicted observation C μ_{t+1|t} and the actual observation x_{t+1}, weighted by the predicted covariance.
  The Kalman filter can thus be viewed as a process of making successive predictions C μ_{t+1|t} and then correcting these predictions in the light of the new observations x_{t+1}.
- The covariance update is independent of the observed measurements: it depends only on the LDS parameters and can be computed offline.
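A numerical aside (my suggestion, not from the slides): in practice the gain is usually computed with a linear solve rather than an explicit inverse, since C Σ_{t+1|t} C^T + R is symmetric:

```python
import numpy as np

def kalman_gain(Sigma_pred, C, R):
    """K = Sigma_pred C^T (C Sigma_pred C^T + R)^{-1} via a linear solve."""
    S = C @ Sigma_pred @ C.T + R
    # K^T = S^{-1} C Sigma_pred (S and Sigma_pred are symmetric), so:
    return np.linalg.solve(S, C @ Sigma_pred).T
```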
Kalman filter: update equations
μ_{t+1|t} = A μ_{t|t}
Σ_{t+1|t} = A Σ_{t|t} A^T + G Q G^T

μ_{t+1|t+1} = μ_{t+1|t} + K_{t+1} (x_{t+1} − C μ_{t+1|t})
Σ_{t+1|t+1} = Σ_{t+1|t} − K_{t+1} C Σ_{t+1|t} = (I − K_{t+1} C) Σ_{t+1|t}

Substituting the prediction into the update gives the one-step recursion:

μ_{t+1|t+1} = A μ_{t|t} + K_{t+1} (x_{t+1} − C A μ_{t|t})
Σ_{t+1|t+1} = (I − K_{t+1} C) (A Σ_{t|t} A^T + G Q G^T)
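Putting the pieces together into a full forward pass; this sketch reuses the kalman_gain helper from above, with illustrative names:

```python
def kalman_filter(xs, A, G, C, Q, R, mu0, Sigma0):
    """Return filtered means mu_{t|t} and covariances Sigma_{t|t}, t = 1..T."""
    mu_pred, Sigma_pred = mu0, Sigma0          # mu_{1|0}, Sigma_{1|0}
    mus, Sigmas = [], []
    for x in xs:                               # observations x_1, ..., x_T
        K = kalman_gain(Sigma_pred, C, R)
        mu = mu_pred + K @ (x - C @ mu_pred)               # measurement update
        Sigma = Sigma_pred - K @ C @ Sigma_pred
        mus.append(mu); Sigmas.append(Sigma)
        mu_pred = A @ mu                                   # time update
        Sigma_pred = A @ Sigma @ A.T + G @ Q @ G.T
    return mus, Sigmas
```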
Smoothing
Smoothing is offline inference in the LDS:
- It combines the forward recursion with a backward recursion.
- It is the Gaussian analog of the forward-backward (alpha-gamma) algorithm on the HMM.
Rauch-Tung-Striebel (RTS) smoother
The messages in the backward pass are P(z_t | z_{t+1}, x_{1:t}); we first find P(z_t, z_{t+1} | x_{1:t}), and then obtain each message as a conditional of this joint distribution.

From the forward pass we have:
P(z_t | x_{1:t}) = N(z_t | μ_{t|t}, Σ_{t|t})
P(z_{t+1} | x_{1:t}) = N(z_{t+1} | μ_{t+1|t}, Σ_{t+1|t})

The cross-covariance is:
Σ_{z_t,z_{t+1}|x_{1:t}} = E[(z_t − μ_{t|t})(z_{t+1} − μ_{t+1|t})^T | x_{1:t}] = Σ_{t|t} A^T

Hence:
P(z_t, z_{t+1} | x_{1:t}) = N( [z_t; z_{t+1}] | [μ_{t|t}; μ_{t+1|t}],
                               [Σ_{t|t},   Σ_{t|t} A^T;
                                A Σ_{t|t}, Σ_{t+1|t}] )

All of the above quantities are available after a forward pass (Kalman filter).
We find the conditional distribution P(z_t | z_{t+1}, x_{1:t}) from this joint distribution, using the Gaussian conditioning formulas again:

P(z_t | z_{t+1}, x_{1:t}) = N(z_t | μ_{z_t|z_{t+1},x_{1:t}}, Σ_{z_t|z_{t+1},x_{1:t}})

μ_{z_t|z_{t+1},x_{1:t}} = μ_{t|t} + L_t (z_{t+1} − μ_{t+1|t})
Σ_{z_t|z_{t+1},x_{1:t}} = Σ_{t|t} − L_t Σ_{t+1|t} L_t^T

where L_t ≡ Σ_{t|t} A^T Σ_{t+1|t}^{-1} is the smoother gain.
Also, according to the structure of the model, given z_{t+1} the state z_t is conditionally independent of the future observations, so

P(z_t | z_{t+1}, x_{1:t}) = P(z_t | z_{t+1}, x_{1:T})

and the conditional above is exactly the backward message we need.
RTS smoother: update equations
P(z_t | x_{1:T}) = N(z_t | μ_{t|T}, Σ_{t|T})

μ_{t|T} = μ_{t|t} + L_t (μ_{t+1|T} − μ_{t+1|t})
Σ_{t|T} = Σ_{t|t} + L_t (Σ_{t+1|T} − Σ_{t+1|t}) L_t^T

where L_t ≡ Σ_{t|t} A^T Σ_{t+1|t}^{-1} as before.

Derivation for the mean, using the laws of total expectation and variance
E[X | Z] = E[ E[X | Y, Z] | Z ]
Var[X | Z] = Var[ E[X | Y, Z] | Z ] + E[ Var[X | Y, Z] | Z ]:

μ_{t|T} = E[z_t | x_{1:T}] = E[ E[z_t | z_{t+1}, x_{1:t}] | x_{1:T} ]
        = E[ μ_{t|t} + L_t (z_{t+1} − μ_{t+1|t}) | x_{1:T} ]
        = μ_{t|t} + L_t (μ_{t+1|T} − μ_{t+1|t})

Σ_{t|T} can be found similarly.

μ_{T|T} and Σ_{T|T} are initialized from the filtering pass.
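The backward pass then runs from t = T − 1 down to 1, consuming the filtered estimates; a minimal sketch under the same naming assumptions as the filtering code above:

```python
import numpy as np

def rts_smoother(mus, Sigmas, A, G, Q):
    """Return smoothed means mu_{t|T} and covariances Sigma_{t|T}."""
    mus_s, Sigmas_s = [mus[-1]], [Sigmas[-1]]              # initialize at t = T
    for t in range(len(mus) - 2, -1, -1):
        Sigma_pred = A @ Sigmas[t] @ A.T + G @ Q @ G.T     # Sigma_{t+1|t}
        L = Sigmas[t] @ A.T @ np.linalg.inv(Sigma_pred)    # smoother gain L_t
        mu_s = mus[t] + L @ (mus_s[0] - A @ mus[t])        # mu_{t|T}
        Sigma_s = Sigmas[t] + L @ (Sigmas_s[0] - Sigma_pred) @ L.T
        mus_s.insert(0, mu_s); Sigmas_s.insert(0, Sigma_s)
    return mus_s, Sigmas_s
```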
Example
In an LDS, the sequence of individually most probable latent values is the same as the most probable latent sequence (for a Gaussian posterior, the marginal modes and the joint mode all coincide with the means).
- There is no need for an analogue of the Viterbi algorithm in the LDS.
- [Figure from [Murphy]: red crosses show the most probable sequences obtained using the filtering (b) and smoothing (c) algorithms.]
Learning of LDS
Learning with the EM algorithm:
- E-step: the expected sufficient statistics are found from the smoothed posteriors:
  E[z_t], E[z_t z_t^T], E[z_t z_{t-1}^T]
- M-step: the updates of C and R are similar to the M-step of factor analysis.
(A sketch of the E-step quantities follows below.)
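For illustration, the E-step statistics can be assembled from the smoother outputs; the cross-moment uses the standard result Cov[z_t, z_{t−1} | x_{1:T}] = Σ_{t|T} L_{t−1}^T (not derived in these slides), and the names are mine:

```python
import numpy as np

def estep_stats(mus_s, Sigmas_s, Ls):
    """Expected sufficient statistics from smoothed means/covariances and gains L_t."""
    Ez = mus_s                                                       # E[z_t]
    Ezz = [S + np.outer(m, m) for S, m in zip(Sigmas_s, mus_s)]      # E[z_t z_t^T]
    Ezz1 = [Sigmas_s[t] @ Ls[t - 1].T + np.outer(mus_s[t], mus_s[t - 1])
            for t in range(1, len(mus_s))]                           # E[z_t z_{t-1}^T]
    return Ez, Ezz, Ezz1
```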
LDS: summary
SSMs are dynamical models that allow continuous states (latent variables).
The LDS is a linear-Gaussian SSM.
Inference problems in an LDS can be solved using message passing:
- The Kalman filter solves the filtering problem.
- The RTS smoother solves the smoothing problem.