Kalman filtering & smoothing
40-957 Special Topics in Artificial Intelligence:
Probabilistic Graphical Models
Sharif University of Technology
Soleymani
Spring 2014
From latent variable models to dynamic models
Two categories of latent variable models that we have seen in previous lectures:
- Discrete latent variables: mixture models
- Continuous latent variables: factor analysis

Dynamical generalizations of these models:
- Mixture models -> HMM: the HMM is the dynamical generalization of a mixture model; the latent variables are discrete, but with arbitrary emission probability distributions.
- Factor analysis -> Kalman filter: the Kalman filter is the dynamical generalization of factor analysis.
State-Space Model (SSM)
The latent variables z_1, …, z_T form a chain.
Independence relationships (the same as for the HMM):
- Given the state at one moment in time, the states in the future are conditionally independent of those in the past.
- Observing the observation nodes does not d-separate any of the state nodes.
[Graphical model: state chain z_1 → z_2 → … → z_{T−1} → z_T, with an observation x_t emitted from each z_t]
Linear Dynamical System (LDS)
z_1 = u,                        u ~ N(0, Σ_0)
z_{t+1} = A z_t + G w_{t+1},    w_t ~ N(0, Q)
x_t = C z_t + v_t,              v_t ~ N(0, R)

Equivalently:
P(z_1) = N(z_1 | 0, Σ_0)
P(z_{t+1} | z_t) = N(z_{t+1} | A z_t, G Q G^T)
P(x_t | z_t) = N(x_t | C z_t, R)
This is a linear-Gaussian model.
General (nonlinear) state-space model:
z_{t+1} = f(z_t) + G w_{t+1},   w_t ~ N(0, Q)
x_t = g(z_t) + v_t,             v_t ~ N(0, R)
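To make the generative process concrete, here is a minimal NumPy sketch (my own illustration, not from the slides; the function name and example parameters are made up) that samples a trajectory from the linear-Gaussian SSM above:

```python
import numpy as np

def simulate_lds(A, G, C, Q, R, Sigma0, T, seed=None):
    """Sample one trajectory (z_1..z_T, x_1..x_T) from the LDS defined above."""
    rng = np.random.default_rng(seed)
    n, m, q = A.shape[0], C.shape[0], Q.shape[0]
    z = np.zeros((T, n))
    x = np.zeros((T, m))
    z[0] = rng.multivariate_normal(np.zeros(n), Sigma0)   # z_1 ~ N(0, Sigma0)
    for t in range(T):
        if t > 0:
            w = rng.multivariate_normal(np.zeros(q), Q)   # process noise w_{t+1}
            z[t] = A @ z[t - 1] + G @ w                   # z_{t+1} = A z_t + G w_{t+1}
        v = rng.multivariate_normal(np.zeros(m), R)       # observation noise v_t
        x[t] = C @ z[t] + v                               # x_t = C z_t + v_t
    return z, x

# Example: 2-D state (position, velocity), noisy 1-D observation of position.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
G = np.eye(2); Q = 0.01 * np.eye(2)
C = np.array([[1.0, 0.0]]); R = np.array([[0.1]])
z, x = simulate_lds(A, G, C, Q, R, Sigma0=np.eye(2), T=50, seed=0)
```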
Kalman filter applications
The Kalman filter has been widely used in many real-time tracking applications.
Many other applications include:
- Navigation and guidance systems (e.g., Simultaneous Localization and Mapping)
- Control systems
- Time-series processing
Inference in LDS
Inference is the calculation of the posterior probability of the states given an observation sequence.
We will see two types of inference problems on the LDS:
- Filtering (online inference): P(z_t | x_1, …, x_t)
- Smoothing (offline inference): P(z_t | x_1, …, x_T)
Inference in LDS
The graphical model of the LDS is tree-structured, so inference can be performed efficiently using the sum-product algorithm:
- Filtering: the forward recursions, analogous to the α messages of the HMM, are known as the Kalman filter equations.
- Smoothing: the backward recursions, analogous to the β messages, are known as the Kalman smoother equations.
Inference algorithms on SSMs are thus similar to the inference algorithms on HMMs.
Message passing
The joint distribution (of a linear-Gaussian network) is multivariate Gaussian.
- Thus, marginal and conditional distributions will also be Gaussian.
We can use message passing in Gaussian networks to solve the inference problems of the LDS.
- We will focus only on computing means and covariances.
Kalman filter: messages
Filtering is similar to (but not the same as) the forward algorithm in the HMM:

α̂_t(z_t) = P(z_t | x_{1:t}) = P(z_t, x_{1:t}) / P(x_{1:t}) = α_t(z_t) / P(x_{1:t})

The distribution P(z_t | x_{1:t}) is N(z_t | μ_{z_t|x_{1:t}}, Σ_{z_t|x_{1:t}}).
Assuming we have calculated α̂_t(z_t), we need to calculate α̂_{t+1}(z_{t+1}) = P(z_{t+1} | x_{1:t+1}).
[Message passing on the chain: the messages α̂_1(.), α̂_2(.), …, α̂_T(.) are computed left to right along the state chain]
Kalman filter
To find P(z_{t+1} | x_{1:t+1}), we use two recursive updates:
- Predict step (time update): the best guess before seeing the new measurement.
  Compute P(z_{t+1} | x_{1:t}) from P(z_t | x_{1:t}).
- Measurement update step: after the measurement, we find the new posterior in which x_{t+1} is also given as evidence.
  Compute P(z_{t+1} | x_{1:t+1}) from P(z_{t+1} | x_{1:t}).
Kalman filter: prediction step
In this step, we have P(z_t | x_{1:t}) = N(z_t | μ_{t|t}, Σ_{t|t}) and we find P(z_{t+1} | x_{1:t}) and P(x_{t+1} | x_{1:t}):

P(z_{t+1} | x_{1:t}) = N(z_{t+1} | μ_{t+1|t}, Σ_{t+1|t})

μ_{t+1|t} ≡ μ_{z_{t+1}|x_{1:t}} = E[z_{t+1} | x_{1:t}]
          = E[A z_t + G w_{t+1} | x_{1:t}]
          = A μ_{t|t}

Σ_{t+1|t} ≡ Σ_{z_{t+1}|x_{1:t}} = E[(z_{t+1} − μ_{t+1|t})(z_{t+1} − μ_{t+1|t})^T | x_{1:t}]
          = E[(A z_t + G w_{t+1} − μ_{t+1|t})(A z_t + G w_{t+1} − μ_{t+1|t})^T | x_{1:t}]
          = A Σ_{t|t} A^T + G Q G^T

(Notation: μ_{t|t} ≡ μ_{z_t|x_{1:t}}, Σ_{t|t} ≡ Σ_{z_t|x_{1:t}}.)
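In code, the time update is a two-line computation; this sketch assumes NumPy arrays and the naming is mine:

```python
import numpy as np

def kf_predict(mu, Sigma, A, G, Q):
    """Time update: P(z_{t+1} | x_{1:t}) from P(z_t | x_{1:t})."""
    mu_pred = A @ mu                            # mu_{t+1|t} = A mu_{t|t}
    Sigma_pred = A @ Sigma @ A.T + G @ Q @ G.T  # Sigma_{t+1|t} = A Sigma_{t|t} A^T + G Q G^T
    return mu_pred, Sigma_pred
```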
Similarly, we find the predictive distribution of the next observation:

P(x_{t+1} | x_{1:t}) = N(x_{t+1} | μ_{x_{t+1}|x_{1:t}}, Σ_{x_{t+1}|x_{1:t}})

μ_{x_{t+1}|x_{1:t}} = E[x_{t+1} | x_{1:t}] = E[C z_{t+1} + v_{t+1} | x_{1:t}] = C μ_{t+1|t}

Σ_{x_{t+1}|x_{1:t}} = E[(x_{t+1} − μ_{x_{t+1}|x_{1:t}})(x_{t+1} − μ_{x_{t+1}|x_{1:t}})^T | x_{1:t}]
  = E[(C z_{t+1} + v_{t+1} − μ_{x_{t+1}|x_{1:t}})(C z_{t+1} + v_{t+1} − μ_{x_{t+1}|x_{1:t}})^T | x_{1:t}]
  = C Σ_{t+1|t} C^T + R
Kalman filter: measurement update
We want to find P(z_{t+1} | x_{1:t}, x_{t+1}); for this purpose, we first find the joint distribution P(z_{t+1}, x_{t+1} | x_{1:t}).

The cross-covariance between x_{t+1} and z_{t+1} is:

Σ_{x_{t+1},z_{t+1}|x_{1:t}} = E[(x_{t+1} − μ_{x_{t+1}|x_{1:t}})(z_{t+1} − μ_{z_{t+1}|x_{1:t}})^T | x_{1:t}]
  = E[(C z_{t+1} + v_{t+1} − μ_{x_{t+1}|x_{1:t}})(z_{t+1} − μ_{z_{t+1}|x_{1:t}})^T | x_{1:t}]
  = C Σ_{t+1|t}

Hence the joint is:

P(z_{t+1}, x_{t+1} | x_{1:t}) = N( [z_{t+1}; x_{t+1}] | [μ_{t+1|t}; C μ_{t+1|t}],
                                   [Σ_{t+1|t},   Σ_{t+1|t} C^T;
                                    C Σ_{t+1|t}, C Σ_{t+1|t} C^T + R] )
To condition this joint on x_{t+1}, recall the general formulas for conditioning in a joint Gaussian:

P(x_1, x_2) = N( [x_1; x_2] | [μ_1; μ_2], [Σ_11, Σ_12; Σ_21, Σ_22] )
⇒ P(x_1 | x_2) = N(x_1 | μ_{1|2}, Σ_{1|2})
  μ_{1|2} = μ_1 + Σ_12 Σ_22^{-1} (x_2 − μ_2)
  Σ_{1|2} = Σ_11 − Σ_12 Σ_22^{-1} Σ_21

Applying these to the joint above, with x_{t+1} as the conditioning variable:

μ_{t+1|t+1} ≡ μ_{z_{t+1}|x_{1:t+1}}
  = μ_{t+1|t} + Σ_{t+1|t} C^T (C Σ_{t+1|t} C^T + R)^{-1} (x_{t+1} − C μ_{t+1|t})

Σ_{t+1|t+1} ≡ Σ_{z_{t+1}|x_{1:t+1}}
  = Σ_{t+1|t} − Σ_{t+1|t} C^T (C Σ_{t+1|t} C^T + R)^{-1} C Σ_{t+1|t}
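The conditioning formulas are easy to sanity-check numerically; a toy sketch (all values made up for the example):

```python
import numpy as np

def gaussian_condition(mu1, mu2, S11, S12, S22, x2):
    """P(x1 | x2) for a joint Gaussian over (x1, x2) split into blocks."""
    K = S12 @ np.linalg.inv(S22)        # Sigma_12 Sigma_22^{-1}
    mu_cond = mu1 + K @ (x2 - mu2)      # mu_{1|2}
    S_cond = S11 - K @ S12.T            # Sigma_{1|2}, using Sigma_21 = Sigma_12^T
    return mu_cond, S_cond

mu1, mu2 = np.array([0.0]), np.array([1.0])
S11, S12, S22 = np.array([[2.0]]), np.array([[0.5]]), np.array([[1.0]])
print(gaussian_condition(mu1, mu2, S11, S12, S22, x2=np.array([2.0])))
# mu_{1|2} = 0 + 0.5 * (2 - 1) = 0.5 ;  Sigma_{1|2} = 2 - 0.25 = 1.75
```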
We also found in the time update:

μ_{t+1|t} = A μ_{t|t}
Σ_{t+1|t} = A Σ_{t|t} A^T + G Q G^T

The recursion is initialized with the prior on z_1:

μ_{1|0} = μ_0
Σ_{1|0} = Σ_0

(with the zero-mean prior above, μ_0 = 0)
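The measurement update derived above, translated into code (again an illustrative sketch with my own names):

```python
import numpy as np

def kf_update(mu_pred, Sigma_pred, x, C, R):
    """Measurement update: P(z_{t+1} | x_{1:t+1}) from P(z_{t+1} | x_{1:t})."""
    S = C @ Sigma_pred @ C.T + R              # predicted observation covariance
    K = Sigma_pred @ C.T @ np.linalg.inv(S)   # gain (named on the next slide)
    mu = mu_pred + K @ (x - C @ mu_pred)      # mu_{t+1|t+1}
    Sigma = Sigma_pred - K @ C @ Sigma_pred   # Sigma_{t+1|t+1}
    return mu, Sigma
```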
Kalman filter: measurement update
Updates written in terms of the Kalman gain matrix

K_{t+1} ≡ Σ_{t+1|t} C^T (C Σ_{t+1|t} C^T + R)^{-1}:

μ_{t+1|t+1} = μ_{t+1|t} + K_{t+1} (x_{t+1} − C μ_{t+1|t})
Σ_{t+1|t+1} = Σ_{t+1|t} − K_{t+1} C Σ_{t+1|t} = (I − K_{t+1} C) Σ_{t+1|t}

- The update takes a linear combination of the predicted observation C μ_{t+1|t} and the actual observation x_{t+1}, weighted by the predicted covariance.
  The Kalman filter can thus be viewed as a process of making successive predictions C μ_{t+1|t} and then correcting these predictions in the light of the new observations x_{t+1}.
- The covariance update is independent of the observed measurements: it depends only on the LDS parameters and can be computed offline.
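A numerical aside (my suggestion, not from the slides): in practice the gain is usually computed with a linear solve rather than an explicit inverse, since C Σ_{t+1|t} C^T + R is symmetric:

```python
import numpy as np

def kalman_gain(Sigma_pred, C, R):
    """K = Sigma_pred C^T (C Sigma_pred C^T + R)^{-1} via a linear solve."""
    S = C @ Sigma_pred @ C.T + R
    # K^T = S^{-1} C Sigma_pred (S and Sigma_pred are symmetric), so:
    return np.linalg.solve(S, C @ Sigma_pred).T
```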
Kalman filter: update equations
μ_{t+1|t} = A μ_{t|t}
Σ_{t+1|t} = A Σ_{t|t} A^T + G Q G^T

μ_{t+1|t+1} = μ_{t+1|t} + K_{t+1} (x_{t+1} − C μ_{t+1|t})
Σ_{t+1|t+1} = Σ_{t+1|t} − K_{t+1} C Σ_{t+1|t} = (I − K_{t+1} C) Σ_{t+1|t}

Substituting the prediction into the update gives the one-step recursion:

μ_{t+1|t+1} = A μ_{t|t} + K_{t+1} (x_{t+1} − C A μ_{t|t})
Σ_{t+1|t+1} = (I − K_{t+1} C) (A Σ_{t|t} A^T + G Q G^T)
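Putting the pieces together into a full forward pass; this sketch reuses the kalman_gain helper from above, with illustrative names:

```python
def kalman_filter(xs, A, G, C, Q, R, mu0, Sigma0):
    """Return filtered means mu_{t|t} and covariances Sigma_{t|t}, t = 1..T."""
    mu_pred, Sigma_pred = mu0, Sigma0          # mu_{1|0}, Sigma_{1|0}
    mus, Sigmas = [], []
    for x in xs:                               # observations x_1, ..., x_T
        K = kalman_gain(Sigma_pred, C, R)
        mu = mu_pred + K @ (x - C @ mu_pred)               # measurement update
        Sigma = Sigma_pred - K @ C @ Sigma_pred
        mus.append(mu); Sigmas.append(Sigma)
        mu_pred = A @ mu                                   # time update
        Sigma_pred = A @ Sigma @ A.T + G @ Q @ G.T
    return mus, Sigmas
```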
Smoothing
Smoothing is offline inference in the LDS:
- It combines the forward recursion with a backward recursion.
- It is the Gaussian analog of the forward-backward (alpha-gamma) algorithm on the HMM.
Rauch-Tung-Striebel (RTS) smoother
The messages in the backward pass are P(z_t | z_{t+1}, x_{1:t}); we first find P(z_t, z_{t+1} | x_{1:t}), and then obtain each message as a conditional of this joint distribution.

From the forward pass we have:
P(z_t | x_{1:t}) = N(z_t | μ_{t|t}, Σ_{t|t})
P(z_{t+1} | x_{1:t}) = N(z_{t+1} | μ_{t+1|t}, Σ_{t+1|t})

The cross-covariance is:
Σ_{z_t,z_{t+1}|x_{1:t}} = E[(z_t − μ_{t|t})(z_{t+1} − μ_{t+1|t})^T | x_{1:t}] = Σ_{t|t} A^T

Hence:
P(z_t, z_{t+1} | x_{1:t}) = N( [z_t; z_{t+1}] | [μ_{t|t}; μ_{t+1|t}],
                               [Σ_{t|t},   Σ_{t|t} A^T;
                                A Σ_{t|t}, Σ_{t+1|t}] )

All of the above quantities are available after a forward pass (Kalman filter).
We find the conditional distribution P(z_t | z_{t+1}, x_{1:t}) from this joint distribution, using the Gaussian conditioning formulas again:

P(z_t | z_{t+1}, x_{1:t}) = N(z_t | μ_{z_t|z_{t+1},x_{1:t}}, Σ_{z_t|z_{t+1},x_{1:t}})

μ_{z_t|z_{t+1},x_{1:t}} = μ_{t|t} + L_t (z_{t+1} − μ_{t+1|t})
Σ_{z_t|z_{t+1},x_{1:t}} = Σ_{t|t} − L_t Σ_{t+1|t} L_t^T

where L_t ≡ Σ_{t|t} A^T Σ_{t+1|t}^{-1} is the smoother gain.
Also, according to the structure of the model, given z_{t+1} the state z_t is conditionally independent of the future observations, so

P(z_t | z_{t+1}, x_{1:t}) = P(z_t | z_{t+1}, x_{1:T})

and the conditional above is exactly the backward message we need.
RTS smoother: update equations
P(z_t | x_{1:T}) = N(z_t | μ_{t|T}, Σ_{t|T})

μ_{t|T} = μ_{t|t} + L_t (μ_{t+1|T} − μ_{t+1|t})
Σ_{t|T} = Σ_{t|t} + L_t (Σ_{t+1|T} − Σ_{t+1|t}) L_t^T

where L_t ≡ Σ_{t|t} A^T Σ_{t+1|t}^{-1} as before.

Derivation for the mean, using the laws of total expectation and variance
E[X | Z] = E[ E[X | Y, Z] | Z ]
Var[X | Z] = Var[ E[X | Y, Z] | Z ] + E[ Var[X | Y, Z] | Z ]:

μ_{t|T} = E[z_t | x_{1:T}] = E[ E[z_t | z_{t+1}, x_{1:t}] | x_{1:T} ]
        = E[ μ_{t|t} + L_t (z_{t+1} − μ_{t+1|t}) | x_{1:T} ]
        = μ_{t|t} + L_t (μ_{t+1|T} − μ_{t+1|t})

Σ_{t|T} can be found similarly.

μ_{T|T} and Σ_{T|T} are initialized from the filtering pass.
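The backward pass then runs from t = T − 1 down to 1, consuming the filtered estimates; a minimal sketch under the same naming assumptions as the filtering code above:

```python
import numpy as np

def rts_smoother(mus, Sigmas, A, G, Q):
    """Return smoothed means mu_{t|T} and covariances Sigma_{t|T}."""
    mus_s, Sigmas_s = [mus[-1]], [Sigmas[-1]]              # initialize at t = T
    for t in range(len(mus) - 2, -1, -1):
        Sigma_pred = A @ Sigmas[t] @ A.T + G @ Q @ G.T     # Sigma_{t+1|t}
        L = Sigmas[t] @ A.T @ np.linalg.inv(Sigma_pred)    # smoother gain L_t
        mu_s = mus[t] + L @ (mus_s[0] - A @ mus[t])        # mu_{t|T}
        Sigma_s = Sigmas[t] + L @ (Sigmas_s[0] - Sigma_pred) @ L.T
        mus_s.insert(0, mu_s); Sigmas_s.insert(0, Sigma_s)
    return mus_s, Sigmas_s
```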
Example
In an LDS, the sequence of individually most probable latent values is the same as the most probable latent sequence (for a Gaussian posterior, the marginal modes and the joint mode all coincide with the means).
- There is no need for an analogue of the Viterbi algorithm in the LDS.
- [Figure from [Murphy]: red crosses show the most probable sequences obtained using the filtering (b) and smoothing (c) algorithms.]
Learning of LDS
Learning with the EM algorithm:
- E-step: the expected sufficient statistics are found from the smoothed posteriors:
  E[z_t], E[z_t z_t^T], E[z_t z_{t-1}^T]
- M-step: the updates of C and R are similar to the M-step of factor analysis.
(A sketch of the E-step quantities follows below.)
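For illustration, the E-step statistics can be assembled from the smoother outputs; the cross-moment uses the standard result Cov[z_t, z_{t−1} | x_{1:T}] = Σ_{t|T} L_{t−1}^T (not derived in these slides), and the names are mine:

```python
import numpy as np

def estep_stats(mus_s, Sigmas_s, Ls):
    """Expected sufficient statistics from smoothed means/covariances and gains L_t."""
    Ez = mus_s                                                       # E[z_t]
    Ezz = [S + np.outer(m, m) for S, m in zip(Sigmas_s, mus_s)]      # E[z_t z_t^T]
    Ezz1 = [Sigmas_s[t] @ Ls[t - 1].T + np.outer(mus_s[t], mus_s[t - 1])
            for t in range(1, len(mus_s))]                           # E[z_t z_{t-1}^T]
    return Ez, Ezz, Ezz1
```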
LDS: summary
SSMs are dynamical models that allow continuous states (latent variables).
The LDS is a linear-Gaussian SSM.
Inference problems in an LDS can be solved using message passing:
- The Kalman filter solves the filtering problem.
- The RTS smoother solves the smoothing problem.