Lecture 5
Latent-state models: Adaptive models
Latent variables and state-space
We consider a set of latent (hidden) variables which we cannot directly observe. Instead, we observe some data (the observation sequence) generated by the latent state. The latent variables have their own intrinsic dynamics.
Continuous state models
Key concepts
• Stability
• Rapidity (so as to keep computational costs low)
• Sample-by-sample adaptation, as opposed to 'rolling window' methods
The Kalman filter (and smoother) offers a simple yet very widely used introduction to adaptive models
Linear and Gaussian; can be seen as a special case of a sequential Gaussian process
The Kalman process
The observations process
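For reference, the linear-Gaussian form assumed throughout (the symbol names used here are the conventional ones, and may differ from those on the original slides):

    x_t = F x_{t-1} + w_t,   w_t ~ N(0, Q)    (state / Kalman process)
    y_t = H x_t + v_t,       v_t ~ N(0, R)    (observation process)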
The update process - 1
The update process - 2
The Kalman Gain
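A minimal numpy sketch of one predict/update cycle under the model above; the gain K weighs the innovation against the predictive uncertainty:

    import numpy as np

    def kalman_step(x, P, y, F, H, Q, R):
        """One predict/update cycle of the standard Kalman filter."""
        # Predict: propagate mean and covariance through the plant matrix
        x_pred = F @ x
        P_pred = F @ P @ F.T + Q
        # Innovation: observation minus its prediction
        e = y - H @ x_pred
        S = H @ P_pred @ H.T + R              # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
        # Update: correct the prediction by the gain-weighted innovation
        x_new = x_pred + K @ e
        P_new = (np.eye(len(x)) - K @ H) @ P_pred
        return x_new, P_new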
If the state and observation noise processes are not stationary, then we have to infer the noise variances
• The re-estimation (ML-II) method (Jazwinski) has the advantage of being causal (a simplified sketch follows this list)
• EM-based methods are very effective, but anti-causal
• A variational Bayes approach can be used, but entails extra computation
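To illustrate the causal re-estimation idea, here is a simplified scalar-observation sketch in the spirit of Jazwinski, not the exact formulation from the lecture: whenever the squared innovation exceeds what the current model explains, the excess is attributed to the state noise.

    import numpy as np

    def reestimate_q(e, H, P_pred_no_q, R):
        """Causal ML-II style re-estimate of the state-noise variance q.

        e           : scalar innovation y - H x_pred
        H           : (1, D) observation row vector
        P_pred_no_q : F P F' (predicted covariance before adding Q)
        R           : scalar observation-noise variance
        """
        # Innovation variance explained with no state noise at all
        explained = float(H @ P_pred_no_q @ H.T) + R
        # Attribute any unexplained innovation energy to q (floored at zero)
        return max(0.0, (e**2 - explained) / float(H @ H.T))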
The KF in action
(Figure: filtered estimates shown with 95% intervals)
Synthetic example: two Brownian streams
• 1 < t < 2000: no correlation
• 2001 < t < 4000: low correlation (r = 0.5)
• 4001 < t < 6000: high correlation (r = 0.85)
Running correlation window (T = 50 samples) gives poor correlation estimates; the baseline is sketched below.
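A sketch of this baseline, assuming two equal-length streams x and y:

    import numpy as np

    def rolling_corr(x, y, T=50):
        """Windowed Pearson correlation: one estimate per sample."""
        r = np.full(len(x), np.nan)
        for t in range(T, len(x)):
            r[t] = np.corrcoef(x[t - T:t], y[t - T:t])[0, 1]
        return r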
Full state-space model: significantly improved estimates
Markov dependency
Add Markov dependency between streams for 5001 < t < 6000
Model-free estimators cannot pick this up
State-space model can pick this up and reduce covariance components accordingly
0th order & 1st order models
Predictive variance is reduced, as the green stream is highly predictable from the blue (pink band: ±2 s.d.)
Incorporating explicit dynamics
Often the state vector is a lagged set of samples from the timeseries
If we know the observations come from, e.g., a physical system, we may have knowledge of the explicit equations of motion
For example, the constant-acceleration model is based around a state vector of position, velocity and acceleration, with a corresponding "plant" matrix F (previously set to the identity I), sketched below.
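A sketch of the textbook constant-acceleration plant matrix, with dt the sampling interval:

    import numpy as np

    def constant_acceleration_F(dt):
        """Plant matrix for the state (position, velocity, acceleration)."""
        return np.array([
            [1.0, dt,  0.5 * dt**2],  # position  += v dt + a dt^2 / 2
            [0.0, 1.0, dt],           # velocity  += a dt
            [0.0, 0.0, 1.0],          # acceleration assumed constant
        ])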
Dynamic decisions
We can extend the standard Kalman framework such that our outcome variable is the posterior probability over a decision indicator. We can also project the observations into a non-linear basis.
• Can be run using the EKF
• As with the standard KF, we obtain closed-form expressions, so it is fast
• Can handle missing data, from feedback to observations, and bit errors in decisions
Uncertainty in Binomial
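A rough sketch of one common construction of such a dynamic classifier (not necessarily the exact model used in the lecture): the state is a weight vector following a random walk, the observation is a binary decision through a logistic link, and the EKF linearises the link at each step. Missing labels simply skip the update.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def ekf_classifier_step(w, P, phi, label, q=1e-3):
        """One EKF step for a random-walk logistic classifier.

        w, P  : weight posterior mean / covariance
        phi   : basis-projected input vector
        label : observed 0/1 decision, or None if missing
        """
        # Predict: random-walk dynamics (plant matrix = I)
        P = P + q * np.eye(len(w))
        p = sigmoid(w @ phi)               # predicted decision probability
        if label is None:
            return w, P                    # missing label: prediction only
        # Linearise the logistic link; the Bernoulli variance acts as R
        H = p * (1 - p) * phi              # Jacobian of p w.r.t. w
        S = H @ P @ H + p * (1 - p)        # innovation variance
        K = P @ H / S                      # Kalman gain
        w = w + K * (label - p)            # closed-form update
        P = P - np.outer(K, H) @ P
        return w, P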
Quick example
Tracking decision boundary
Improved decision performance
Example from real-time depth of anaesthesia monitoring
Static classifier
Dynamic classifier
60% of labels missing
Active label requests
Label requests
Inferred high error points
Kalman state-space models: summary
• Computationally very efficient
• Infer posteriors over variables of interest
• Handle missing data and corruptions at all levels
• How to infer system parameters on-line?
  – Maximum likelihood re-estimation (causal)
  – Approximate Bayes (variational)
  – Successive EM
Kalman – some extras
• Off-line, we can run the filter forwards and backwards; this produces the Rauch–Tung–Striebel (RTS) smoother (see the sketch after this list)
• Related to the extended Kalman filter is the unscented Kalman filter, which handles non-linear transformations better
• All Kalman processes are Gaussian processes; hence we can convert a GP kernel function into a plant matrix and state vector, which can produce huge speed-ups
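A compact sketch of the RTS backward pass, assuming the filtered and one-step-predicted moments were stored during the forward run:

    import numpy as np

    def rts_smoother(xf, Pf, xp, Pp, F):
        """Backward pass of the Rauch-Tung-Striebel smoother.

        xf[t], Pf[t] : filtered mean / covariance at t
        xp[t], Pp[t] : one-step predicted mean / covariance for t
                       (i.e. F xf[t-1] and F Pf[t-1] F' + Q)
        """
        T = len(xf)
        xs, Ps = xf.copy(), Pf.copy()
        for t in range(T - 2, -1, -1):
            # Smoother gain combines filtered and predicted covariances
            G = Pf[t] @ F.T @ np.linalg.inv(Pp[t + 1])
            xs[t] = xf[t] + G @ (xs[t + 1] - xp[t + 1])
            Ps[t] = Pf[t] + G @ (Ps[t + 1] - Pp[t + 1]) @ G.T
        return xs, Ps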
Discrete state models
The Hidden Markov Process
The observation (emission) model
State transitions
The posterior
Inference
The VB priors
• M-dimensional Dirichlet prior (initial state probabilities)
• M × M transition matrix: one M-dimensional Dirichlet prior per row
• The priors over the observation model depend on the choice of model
The ML solution
Has numerical stability issues. These are often not discussed, but involve underflow when dealing with long chains. Most software works in the log domain because of this.
For ML we are interested only in the most probable state, so further numerical stability can be obtained by re-scaling the state probabilities at each step
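To make the log-domain point concrete, a minimal forward recursion using log-sum-exp so that long chains do not underflow (array names are illustrative); the re-scaling alternative instead normalises the state probabilities at each step:

    import numpy as np
    from scipy.special import logsumexp

    def forward_log(log_pi, log_A, log_B):
        """HMM forward recursion in the log domain (avoids underflow).

        log_pi : (M,) log initial probabilities
        log_A  : (M, M) log transitions, log_A[i, j] = log P(j | i)
        log_B  : (T, M) log observation likelihoods per state
        """
        T, M = log_B.shape
        log_alpha = np.zeros((T, M))
        log_alpha[0] = log_pi + log_B[0]
        for t in range(1, T):
            # logsumexp over previous states replaces the unstable sum
            log_alpha[t] = logsumexp(log_alpha[t - 1][:, None] + log_A,
                                     axis=0) + log_B[t]
        return log_alpha   # log P(y_1..t, s_t); normalise for filtering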
The ML solution is notoriously sensitive to initialisation
VB / sampling
Sampling works very well for HMMs
VB performs excellently, and has the benefit of computational efficiency and potential sequential processing
Natural shrinkage of states occurs: over-complex models are ‘self-pruned’
Shrinkage really helps
(Figure: data, followed by fits from ML with 10 states and VB with 2, 4, and 10 states)
An example: FX returns
Coupled Hidden Markov Models (cHMMs)
Why?
Each chain can have a different observation model. This can be useful if a joint observation model is difficult to arrive at.
The total state space is the Cartesian product of the states from each chain; this enables large state spaces without parameter explosion
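Concretely, in one common parameterisation the coupling enters through the transition model: each chain c conditions on the previous states of all chains,

    P(s_t^{(c)} | s_{t-1}^{(1)}, ..., s_{t-1}^{(C)})

so a joint over the product space of M_1 × ... × M_C states is represented by C per-chain conditionals rather than a single (∏_c M_c) × (∏_c M_c) transition matrix.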
Biomedical example
Blood pressure and respiratory coupling
Summary
Latent variable models, continuous or discrete, model explicit dynamics in the observed data via a set of state variables
Inference proceeds iteratively by working with the joint probability of the state and observed variables
Factoring and keeping to exponential family pdfs allow rapid sequential processing, either ML-II (EM) or variational Bayes
Latent-state models account for a large fraction of timeseries models of Markov form