Lecture 5
Latent-state models: Adaptive models
Latent variables and state-space
We consider a set of latent (hidden) variables which we cannot directly observe. Instead, we observe some data (the observation sequence) generated by the latent state. The latent variables have their own intrinsic dynamics.
Continuous state models
Key concepts
• Stability
• Rapidity (so as to keep computational costs low)
• Sample-by-sample adaptation, as opposed to 'rolling window' methods
The Kalman filter (and smoother) offers a simple yet very widely used introduction to adaptive models
Linear and Gaussian; can be seen as a special case of a sequential Gaussian process
The Kalman process
The observations process
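For reference, the linear-Gaussian form assumed throughout (the symbol names used here are the conventional ones, and may differ from those on the original slides):

    x_t = F x_{t-1} + w_t,   w_t ~ N(0, Q)    (state / Kalman process)
    y_t = H x_t + v_t,       v_t ~ N(0, R)    (observation process)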
The update process - 1
The update process - 2
The Kalman Gain
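A minimal numpy sketch of one predict/update cycle under the model above; the gain K weighs the innovation against the predictive uncertainty:

    import numpy as np

    def kalman_step(x, P, y, F, H, Q, R):
        """One predict/update cycle of the standard Kalman filter."""
        # Predict: propagate mean and covariance through the plant matrix
        x_pred = F @ x
        P_pred = F @ P @ F.T + Q
        # Innovation: observation minus its prediction
        e = y - H @ x_pred
        S = H @ P_pred @ H.T + R              # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
        # Update: correct the prediction by the gain-weighted innovation
        x_new = x_pred + K @ e
        P_new = (np.eye(len(x)) - K @ H) @ P_pred
        return x_new, P_new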
If the state and observation noise processes are not stationary, then we have to infer the noise variances
• The re-estimation (ML-II) method (Jazwinski) has the advantage of being causal (a simplified sketch follows this list)
• EM-based methods are very effective, but anti-causal
• A variational Bayes approach can be used, but entails extra computation
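To illustrate the causal re-estimation idea, here is a simplified scalar-observation sketch in the spirit of Jazwinski, not the exact formulation from the lecture: whenever the squared innovation exceeds what the current model explains, the excess is attributed to the state noise.

    import numpy as np

    def reestimate_q(e, H, P_pred_no_q, R):
        """Causal ML-II style re-estimate of the state-noise variance q.

        e           : scalar innovation y - H x_pred
        H           : (1, D) observation row vector
        P_pred_no_q : F P F' (predicted covariance before adding Q)
        R           : scalar observation-noise variance
        """
        # Innovation variance explained with no state noise at all
        explained = float(H @ P_pred_no_q @ H.T) + R
        # Attribute any unexplained innovation energy to q (floored at zero)
        return max(0.0, (e**2 - explained) / float(H @ H.T))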
The KF in action
(Figure: filtered estimates shown with 95% intervals)
Synthetic example: two Brownian streams
• 1 < t < 2000: no correlation
• 2001 < t < 4000: low correlation (r = 0.5)
• 4001 < t < 6000: high correlation (r = 0.85)
Running correlation window (T = 50 samples) gives poor correlation estimates; the baseline is sketched below.
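A sketch of this baseline, assuming two equal-length streams x and y:

    import numpy as np

    def rolling_corr(x, y, T=50):
        """Windowed Pearson correlation: one estimate per sample."""
        r = np.full(len(x), np.nan)
        for t in range(T, len(x)):
            r[t] = np.corrcoef(x[t - T:t], y[t - T:t])[0, 1]
        return r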
Full state-space model: significantly improved estimates
Markov dependency
Add Markov dependency between streams for 5001 < t < 6000
Model-free estimators cannot pick this up
State-space model can pick this up and reduce covariance components accordingly
0th order & 1st order models
Predictive variance is reduced, as the green stream is highly predictable from the blue (pink band: ±2 s.d.)
Incorporating explicit dynamics
Often the state vector is a lagged set of samples from the timeseries
If we know the observations come from, e.g., a physical system, we may have knowledge of the explicit equations of motion
For example, the constant-acceleration model is based around a state vector of position, velocity and acceleration, with a corresponding "plant" matrix F (previously set to the identity I), sketched below.
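A sketch of the textbook constant-acceleration plant matrix, with dt the sampling interval:

    import numpy as np

    def constant_acceleration_F(dt):
        """Plant matrix for the state (position, velocity, acceleration)."""
        return np.array([
            [1.0, dt,  0.5 * dt**2],  # position  += v dt + a dt^2 / 2
            [0.0, 1.0, dt],           # velocity  += a dt
            [0.0, 0.0, 1.0],          # acceleration assumed constant
        ])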
Dynamic decisions
We can extend the standard Kalman framework such that our outcome variable is the posterior probability over a decision indicator. We can also project the observations into a non-linear basis.
• Can be run using the EKF
• As with the standard KF, we obtain closed-form expressions, so it is fast
• Can handle missing data, from feedback to observations, and bit errors in decisions
Uncertainty in Binomial
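A rough sketch of one common construction of such a dynamic classifier (not necessarily the exact model used in the lecture): the state is a weight vector following a random walk, the observation is a binary decision through a logistic link, and the EKF linearises the link at each step. Missing labels simply skip the update.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def ekf_classifier_step(w, P, phi, label, q=1e-3):
        """One EKF step for a random-walk logistic classifier.

        w, P  : weight posterior mean / covariance
        phi   : basis-projected input vector
        label : observed 0/1 decision, or None if missing
        """
        # Predict: random-walk dynamics (plant matrix = I)
        P = P + q * np.eye(len(w))
        p = sigmoid(w @ phi)               # predicted decision probability
        if label is None:
            return w, P                    # missing label: prediction only
        # Linearise the logistic link; the Bernoulli variance acts as R
        H = p * (1 - p) * phi              # Jacobian of p w.r.t. w
        S = H @ P @ H + p * (1 - p)        # innovation variance
        K = P @ H / S                      # Kalman gain
        w = w + K * (label - p)            # closed-form update
        P = P - np.outer(K, H) @ P
        return w, P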
Quick example
Tracking decision boundary
Improved decision performance
Example from real-time depth of anaesthesia monitoring
Static classifier
Dynamic classifier
60% of labels missing
Active label requests
Label requests
Inferred high error points
Kalman state-space models: summary
• Computationally very efficient
• Infer posteriors over variables of interest
• Handle missing data and corruptions at all levels
• How to infer system parameters on-line?
  – Maximum likelihood re-estimation (causal)
  – Approximate Bayes (variational)
  – Successive EM
Kalman – some extras
• Off-line, we can run the filter forwards and backwards; this produces the Rauch–Tung–Striebel (RTS) smoother (see the sketch after this list)
• Related to the extended Kalman filter is the unscented Kalman filter, which handles non-linear transformations better
• All Kalman processes are Gaussian processes; hence we can convert a GP kernel function into a plant matrix and state vector, which can produce huge speed-ups
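A compact sketch of the RTS backward pass, assuming the filtered and one-step-predicted moments were stored during the forward run:

    import numpy as np

    def rts_smoother(xf, Pf, xp, Pp, F):
        """Backward pass of the Rauch-Tung-Striebel smoother.

        xf[t], Pf[t] : filtered mean / covariance at t
        xp[t], Pp[t] : one-step predicted mean / covariance for t
                       (i.e. F xf[t-1] and F Pf[t-1] F' + Q)
        """
        T = len(xf)
        xs, Ps = xf.copy(), Pf.copy()
        for t in range(T - 2, -1, -1):
            # Smoother gain combines filtered and predicted covariances
            G = Pf[t] @ F.T @ np.linalg.inv(Pp[t + 1])
            xs[t] = xf[t] + G @ (xs[t + 1] - xp[t + 1])
            Ps[t] = Pf[t] + G @ (Ps[t + 1] - Pp[t + 1]) @ G.T
        return xs, Ps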
Discrete state models
The Hidden Markov Process
The observation (emission) model
State transitions
The posterior
Inference
The VB priors
• M-dimensional Dirichlet prior (initial state probabilities)
• M × M transition matrix: one M-dimensional Dirichlet prior per row
• The priors over the observation model depend on the choice of model
The ML solution
Has numerical stability issues. These are often not discussed, but involve underflow when dealing with long chains. Most software works in the log domain because of this.
For ML we are interested only in the most probable state, so further numerical stability can be obtained by re-scaling the state probabilities at each step
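To make the log-domain point concrete, a minimal forward recursion using log-sum-exp so that long chains do not underflow (array names are illustrative); the re-scaling alternative instead normalises the state probabilities at each step:

    import numpy as np
    from scipy.special import logsumexp

    def forward_log(log_pi, log_A, log_B):
        """HMM forward recursion in the log domain (avoids underflow).

        log_pi : (M,) log initial probabilities
        log_A  : (M, M) log transitions, log_A[i, j] = log P(j | i)
        log_B  : (T, M) log observation likelihoods per state
        """
        T, M = log_B.shape
        log_alpha = np.zeros((T, M))
        log_alpha[0] = log_pi + log_B[0]
        for t in range(1, T):
            # logsumexp over previous states replaces the unstable sum
            log_alpha[t] = logsumexp(log_alpha[t - 1][:, None] + log_A,
                                     axis=0) + log_B[t]
        return log_alpha   # log P(y_1..t, s_t); normalise for filtering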
The ML solution is notoriously sensitive to initialisation
VB / sampling
Sampling works very well for HMMs
VB performs excellently, and has the benefit of computational efficiency and potential sequential processing
Natural shrinkage of states occurs: over-complex models are ‘self-pruned’
Shrinkage really helps
(Figure: data, followed by fits from ML with 10 states and VB with 2, 4, and 10 states)
An example: FX returns
Coupled Hidden Markov Models (cHMMs)
Why?
Each chain can have a different observation model. This can be useful if a joint observation model is difficult to arrive at.
The total state space is the Cartesian product of the states from each chain; this enables large state spaces without parameter explosion
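Concretely, in one common parameterisation the coupling enters through the transition model: each chain c conditions on the previous states of all chains,

    P(s_t^{(c)} | s_{t-1}^{(1)}, ..., s_{t-1}^{(C)})

so a joint over the product space of M_1 × ... × M_C states is represented by C per-chain conditionals rather than a single (∏_c M_c) × (∏_c M_c) transition matrix.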
Biomedical example
Blood pressure and respiratory coupling
Summary
Latent variable models, continuous or discrete, model explicit dynamics in the observed data via a set of state variables
Inference proceeds iteratively by working with the joint probability of the state and observed variables
Factoring and keeping to exponential family pdfs allow rapid sequential processing, either ML-II (EM) or variational Bayes
Latent-state models account for a large fraction of timeseries models of Markov form