
Rob J Hyndman

State space models

2: Structural models

Outline

1 Simple structural models

2 Linear Gaussian state space models

3 Kalman filter

4 Kalman smoothing

5 Time varying parameter models




State space models

[Diagram: state transition graph x_{t−1} → x_t → x_{t+1} → ⋯, with each observation y_t linked to the previous state x_{t−1}.]

ETS state vector: x_t = (ℓ_t, b_t, s_t, s_{t−1}, …, s_{t−m+1})

ETS models
• y_t depends on x_{t−1}.
• The same error process affects x_t | x_{t−1} and y_t | x_{t−1}.

State space models

[Diagram: state transition graph x_t → x_{t+1} → ⋯, with each observation y_t linked to the current state x_t.]

Structural models
• y_t depends on x_t.
• A different error process affects x_t | x_{t−1} and y_t | x_t.

Local level model

Stochastically varying level (a random walk) observed with noise:

y_t = ℓ_t + ε_t
ℓ_t = ℓ_{t−1} + ξ_t

• ε_t and ξ_t are independent Gaussian white noise processes.
• Compare ETS(A,N,N), where ξ_t = αε_{t−1}.
• Parameters to estimate: σ²_ε and σ²_ξ.
• If σ²_ξ = 0, then y_t ∼ NID(ℓ_0, σ²_ε).

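As an illustrative sketch (the variances and series length are assumed values), the R code below simulates a local level series and recovers the two variances with StructTS:

set.seed(123)
n <- 200
sigma2_eps <- 1     # assumed observation noise variance
sigma2_xi  <- 0.1   # assumed level noise variance

# Simulate l_t = l_{t-1} + xi_t and y_t = l_t + eps_t
level <- cumsum(rnorm(n, sd = sqrt(sigma2_xi)))
y <- ts(level + rnorm(n, sd = sqrt(sigma2_eps)))

# Fit the local level model; coef() returns the estimated variances
fit <- StructTS(y, type = "level")
coef(fit)   # estimated state and observation variances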

Local linear trend model

Dynamic trend observed with noise:

y_t = ℓ_t + ε_t
ℓ_t = ℓ_{t−1} + b_{t−1} + ξ_t
b_t = b_{t−1} + ζ_t

• ε_t, ξ_t and ζ_t are independent Gaussian white noise processes.
• Compare ETS(A,A,N), where ξ_t = (α + β)ε_{t−1} and ζ_t = βε_{t−1}.
• Parameters to estimate: σ²_ε, σ²_ξ and σ²_ζ.
• If σ²_ζ = σ²_ξ = 0, then y_t = ℓ_0 + t b_0 + ε_t.
• The model is a time-varying linear regression.

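A minimal sketch of the special case (all values below are assumed): simulate the recursions with the state variances set to zero and check that the level is exactly the deterministic trend ℓ_0 + t b_0.

set.seed(1)
n <- 100
sigma2_eps <- 0.5; sigma2_xi <- 0; sigma2_zeta <- 0   # state noise switched off
l0 <- 10; b0 <- 0.5                                   # assumed starting values

l <- b <- numeric(n)
l_prev <- l0; b_prev <- b0
for (t in 1:n) {
  l[t] <- l_prev + b_prev + rnorm(1, sd = sqrt(sigma2_xi))
  b[t] <- b_prev + rnorm(1, sd = sqrt(sigma2_zeta))
  l_prev <- l[t]; b_prev <- b[t]
}
y <- l + rnorm(n, sd = sqrt(sigma2_eps))

# With sigma2_xi = sigma2_zeta = 0 the level is exactly the deterministic trend
all.equal(l, l0 + (1:n) * b0)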

Basic structural model

y_t = ℓ_t + s_{1,t} + ε_t
ℓ_t = ℓ_{t−1} + b_{t−1} + ξ_t
b_t = b_{t−1} + ζ_t
s_{1,t} = −∑_{j=1}^{m−1} s_{j,t−1} + η_t
s_{j,t} = s_{j−1,t−1},   j = 2, …, m−1

• ε_t, ξ_t, ζ_t and η_t are independent Gaussian white noise processes.
• Compare ETS(A,A,A).
• Parameters to estimate: σ²_ε, σ²_ξ, σ²_ζ and σ²_η.
• Deterministic seasonality if σ²_η = 0.

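A minimal sketch of the seasonal recursion (m, the variance and the starting states are assumed), showing that any m consecutive seasonal terms sum to approximately η_t, i.e. close to zero:

set.seed(42)
m <- 4                      # assumed seasonal period (quarterly)
n <- 60
sigma2_eta <- 0.05          # assumed seasonal disturbance variance

s <- c(1, -0.5, 0.2)        # assumed initial states (s_{1,0}, ..., s_{m-1,0})
seasonal <- numeric(n)
for (t in 1:n) {
  s1_new <- -sum(s) + rnorm(1, sd = sqrt(sigma2_eta))  # s_{1,t}
  s <- c(s1_new, s[1:(m - 2)])                         # shift: s_{j,t} = s_{j-1,t-1}
  seasonal[t] <- s[1]
}

# Sums over consecutive blocks of m seasonal terms stay close to zero
round(sapply(seq(1, n - m + 1, by = m),
             function(i) sum(seasonal[i:(i + m - 1)])), 2)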

Trigonometric models

y_t = ℓ_t + ∑_{j=1}^{J} s_{j,t} + ε_t
ℓ_t = ℓ_{t−1} + b_{t−1} + ξ_t
b_t = b_{t−1} + ζ_t
s_{j,t} = cos(λ_j) s_{j,t−1} + sin(λ_j) s*_{j,t−1} + ω_{j,t}
s*_{j,t} = −sin(λ_j) s_{j,t−1} + cos(λ_j) s*_{j,t−1} + ω*_{j,t}

where λ_j = 2πj/m.

• ε_t, ξ_t, ζ_t, ω_{j,t} and ω*_{j,t} are independent Gaussian white noise processes.
• ω_{j,t} and ω*_{j,t} have the same variance σ²_{ω,j}.
• Equivalent to the BSM when σ²_{ω,j} = σ²_ω and J = m/2.
• Choose J < m/2 for fewer degrees of freedom.

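A minimal sketch (m and J are assumed example values) of the block-diagonal transition matrix implied by the cos/sin recursion above, one 2 × 2 rotation block per harmonic:

m <- 12                     # assumed seasonal period (monthly)
J <- 3                      # keep only the first three harmonics (J < m/2)

lambda <- 2 * pi * (1:J) / m
blocks <- lapply(lambda, function(l)
  matrix(c(cos(l), -sin(l), sin(l), cos(l)), nrow = 2))  # [cos sin; -sin cos]

# Block-diagonal transition matrix for the 2J seasonal states
G_seas <- matrix(0, 2 * J, 2 * J)
for (j in 1:J) {
  idx <- (2 * j - 1):(2 * j)
  G_seas[idx, idx] <- blocks[[j]]
}
round(G_seas, 3)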

ETS vs Structural models

• ETS models are much more general as they allow non-linear (multiplicative) components.
• ETS allows automatic forecasting due to its larger model space.
• Additive ETS models are almost equivalent to the corresponding structural models.
• ETS models have a larger parameter space. Structural model parameters are always non-negative (variances).
• Structural models are much easier to generalize (e.g., add covariates).
• It is easier to handle missing values with structural models.


Structural models in R

library(fpp)   # oil, ausair and austourists are in the fpp package

StructTS(oil, type = "level")
StructTS(ausair, type = "trend")
StructTS(austourists, type = "BSM")

fit <- StructTS(austourists, type = "BSM")
decomp <- cbind(austourists, fitted(fit))
colnames(decomp) <- c("data", "level", "slope", "seasonal")
plot(decomp, main = "Decomposition of International visitor nights")

Structural models in R

[Figure: "Decomposition of International visitor nights" — panels for data, level, slope and seasonal components, 2000–2010.]

ETS decomposition

[Figure: "Decomposition by ETS(A,A,A) method" — panels for observed, level, slope and season components, 2000–2010.]


Linear Gaussian SS models

Observation equation:  y_t = f′x_t + ε_t
State equation:        x_t = G x_{t−1} + w_t

• State vector x_t of length p; G is a p × p matrix; f is a vector of length p.
• ε_t ∼ NID(0, σ²), w_t ∼ NID(0, W).

Local level model: f = G = 1, x_t = ℓ_t.

Local linear trend model: f′ = [1 0],

x_t = [ℓ_t, b_t]′,   G = [1 1; 0 1],   W = diag(σ²_ξ, σ²_ζ)

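A minimal sketch connecting the general form to the local linear trend case (the variances and initial state are assumed): define f, G and W and simulate from the two equations.

set.seed(2)
n <- 150
f <- c(1, 0)
G <- matrix(c(1, 0, 1, 1), nrow = 2)     # [1 1; 0 1]
W <- diag(c(0.05, 0.01))                 # diag(sigma2_xi, sigma2_zeta), assumed
sigma2 <- 1                              # assumed observation variance

x <- c(10, 0.2)                          # assumed initial state (l_0, b_0)
y <- numeric(n)
for (t in 1:n) {
  x <- as.vector(G %*% x) + rnorm(2, sd = sqrt(diag(W)))   # state equation
  y[t] <- sum(f * x) + rnorm(1, sd = sqrt(sigma2))         # observation equation
}
plot(ts(y), main = "Simulated local linear trend")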

Basic structural model

Linear Gaussian state space form:

y_t = f′x_t + ε_t,   ε_t ∼ N(0, σ²)
x_t = G x_{t−1} + w_t,   w_t ∼ N(0, W)

f′ = [1 0 1 0 ⋯ 0],   W = diag(σ²_ξ, σ²_ζ, σ²_η, 0, …, 0)

x_t = (ℓ_t, b_t, s_{1,t}, s_{2,t}, s_{3,t}, …, s_{m−1,t})′

G = [ 1  1   0   0  ⋯   0   0
      0  1   0   0  ⋯   0   0
      0  0  −1  −1  ⋯  −1  −1
      0  0   1   0  ⋯   0   0
      0  0   0   1  ⋯   0   0
      ⋮  ⋮            ⋱      ⋮
      0  0   0   ⋯   0   1   0 ]

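A minimal sketch that builds f and G above for a given seasonal period (m = 4 is an assumed example value):

m <- 4
p <- m + 1                          # states: level, slope, s_1, ..., s_{m-1}

f <- c(1, 0, 1, rep(0, m - 2))      # picks out the level and the current seasonal

G <- matrix(0, p, p)
G[1, 1:2] <- c(1, 1)                # l_t = l_{t-1} + b_{t-1}
G[2, 2]   <- 1                      # b_t = b_{t-1}
G[3, 3:p] <- -1                     # s_{1,t} = -(s_{1,t-1} + ... + s_{m-1,t-1})
for (j in seq_len(p - 3)) G[j + 3, j + 2] <- 1   # s_{j,t} = s_{j-1,t-1}
G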



Kalman filter

Notation:

x_{t|t} = E[x_t | y_1, …, y_t]          P_{t|t} = V[x_t | y_1, …, y_t]
x_{t|t−1} = E[x_t | y_1, …, y_{t−1}]    P_{t|t−1} = V[x_t | y_1, …, y_{t−1}]
y_{t|t−1} = E[y_t | y_1, …, y_{t−1}]    v_{t|t−1} = V[y_t | y_1, …, y_{t−1}]

Forecasting:
y_{t|t−1} = f′x_{t|t−1}
v_{t|t−1} = f′P_{t|t−1}f + σ²

Updating or state filtering:
x_{t|t} = x_{t|t−1} + P_{t|t−1}f v_{t|t−1}^{−1}(y_t − y_{t|t−1})
P_{t|t} = P_{t|t−1} − P_{t|t−1}f v_{t|t−1}^{−1}f′P_{t|t−1}

State prediction:
x_{t+1|t} = G x_{t|t}
P_{t+1|t} = G P_{t|t}G′ + W

• Iterate for t = 1, …, T.
• Assume we know x_{1|0} and P_{1|0}.
• These are just conditional expectations, so they give minimum MSE estimates.
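A minimal base-R sketch of these recursions (the example matrices and variances are assumed; this is not the StructTS implementation). It uses a diffuse start P_{1|0} = kI with a large k, and skips the updating step when y_t is missing, as discussed on the following slides.

kalman_filter <- function(y, f, G, W, sigma2,
                          x0 = rep(0, length(f)),
                          P0 = 1e7 * diag(length(f))) {
  n <- length(y); p <- length(f)
  x_pred <- x0; P_pred <- P0                 # x_{1|0} and P_{1|0}
  xf <- matrix(NA, n, p); v <- e <- numeric(n)
  for (t in 1:n) {
    # Forecasting
    v[t] <- as.numeric(t(f) %*% P_pred %*% f) + sigma2
    e[t] <- y[t] - sum(f * x_pred)
    # Updating or state filtering (skipped if y_t is missing)
    if (is.na(y[t])) {
      x_filt <- x_pred; P_filt <- P_pred
    } else {
      K <- P_pred %*% f / v[t]
      x_filt <- x_pred + as.vector(K) * e[t]
      P_filt <- P_pred - K %*% t(f) %*% P_pred
    }
    xf[t, ] <- x_filt
    # State prediction
    x_pred <- as.vector(G %*% x_filt)
    P_pred <- G %*% P_filt %*% t(G) + W
  }
  list(filtered = xf, e = e, v = v)
}

# Example: filter a simulated series with the local linear trend matrices
set.seed(3)
y <- cumsum(rnorm(100, 0.1, 0.3)) + rnorm(100, sd = 0.5)
out <- kalman_filter(y, f = c(1, 0), G = matrix(c(1, 0, 1, 1), 2),
                     W = diag(c(0.05, 0.01)), sigma2 = 0.25)
plot(ts(y)); lines(out$filtered[, 1], col = "blue")   # filtered level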

Kalman recursions

[Diagram: the filter cycles through 1. state prediction (filtered state at time t−1 → predicted state at time t), 2. forecasting (predicted state → forecast observation at time t), and 3. state filtering (forecast combined with the observation at time t → filtered state at time t).]

Initializing the Kalman filter

• Need x_{1|0} and P_{1|0} to get started.
• Common approach for structural models: set x_{1|0} = 0 and P_{1|0} = kI for a very large k.
• There are lots of research papers on optimal initialization choices for Kalman recursions.
• The ETS approach was to estimate x_{1|0} and avoid P_{1|0} by assuming the error processes are identical.
• A random x_{1|0} could be used with ETS models, and then a form of Kalman filter would be required for estimation and forecasting. This gives more realistic prediction intervals.


Local level model

y_t = ℓ_t + ε_t,   ε_t ∼ NID(0, σ²)
ℓ_t = ℓ_{t−1} + u_t,   u_t ∼ NID(0, q²)

Kalman recursions:

y_{t|t−1} = ℓ_{t−1|t−1}
v_{t|t−1} = p_{t|t−1} + σ²
ℓ_{t|t} = ℓ_{t−1|t−1} + p_{t|t−1}v_{t|t−1}^{−1}(y_t − y_{t|t−1})
p_{t+1|t} = p_{t|t−1}(1 − v_{t|t−1}^{−1}p_{t|t−1}) + q²

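The same recursions written as a short scalar loop (illustrative; the variance values used for the Nile data below are rough guesses, not fitted values):

local_level_filter <- function(y, sigma2, q2, l0 = 0, p0 = 1e7) {
  n <- length(y)
  l <- v <- numeric(n)
  l_prev <- l0; p_pred <- p0                    # l_{0|0} and p_{1|0}
  for (t in 1:n) {
    v[t] <- p_pred + sigma2                              # v_{t|t-1}
    l[t] <- l_prev + p_pred * (y[t] - l_prev) / v[t]     # l_{t|t}
    p_pred <- p_pred * (1 - p_pred / v[t]) + q2          # p_{t+1|t}
    l_prev <- l[t]
  }
  list(level = l, v = v)
}

# Rough variance guesses for the Nile data (shipped with base R)
f1 <- local_level_filter(Nile, sigma2 = 15000, q2 = 1500)
plot(Nile); lines(ts(f1$level, start = start(Nile)), col = "blue")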


Handling missing values

Forecasting:
y_{t|t−1} = f′x_{t|t−1}
v_{t|t−1} = f′P_{t|t−1}f + σ²

Updating or state filtering:
x_{t|t} = x_{t|t−1} + P_{t|t−1}f v_{t|t−1}^{−1}(y_t − y_{t|t−1})
P_{t|t} = P_{t|t−1} − P_{t|t−1}f v_{t|t−1}^{−1}f′P_{t|t−1}

State prediction:
x_{t|t−1} = G x_{t−1|t−1}
P_{t|t−1} = G P_{t−1|t−1}G′ + W

• Iterate for t = 1, …, T, starting with x_{1|0} and P_{1|0}.
• If y_t is missing, ignore the correction terms (greyed out on the original slide), so that x_{t|t} = x_{t|t−1} and P_{t|t} = P_{t|t−1}.


Multi-step forecasting

Forecasting:
y_{t|t−1} = f′x_{t|t−1}
v_{t|t−1} = f′P_{t|t−1}f + σ²

Updating or state filtering:
x_{t|t} = x_{t|t−1} + P_{t|t−1}f v_{t|t−1}^{−1}(y_t − y_{t|t−1})
P_{t|t} = P_{t|t−1} − P_{t|t−1}f v_{t|t−1}^{−1}f′P_{t|t−1}

State prediction:
x_{t|t−1} = G x_{t−1|t−1}
P_{t|t−1} = G P_{t−1|t−1}G′ + W

• Iterate for t = T+1, …, T+h, starting with x_{T|T} and P_{T|T}.
• Treat future values as missing.
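A minimal sketch of multi-step forecasting (the matrices, x_{T|T} and P_{T|T} are assumed values): iterate only the state prediction step h times and read off forecast means and variances.

h <- 8
f <- c(1, 0); G <- matrix(c(1, 0, 1, 1), 2)      # local linear trend
W <- diag(c(0.05, 0.01)); sigma2 <- 0.25         # assumed variances
xT <- c(50, 0.5); PT <- diag(c(1, 0.1))          # assumed x_{T|T} and P_{T|T}

fc_mean <- fc_var <- numeric(h)
x <- xT; P <- PT
for (j in 1:h) {
  x <- as.vector(G %*% x)                         # x_{T+j|T}
  P <- G %*% P %*% t(G) + W                       # P_{T+j|T}
  fc_mean[j] <- sum(f * x)                        # point forecast
  fc_var[j]  <- as.numeric(t(f) %*% P %*% f) + sigma2
}
cbind(forecast = fc_mean,
      lower = fc_mean - 1.96 * sqrt(fc_var),
      upper = fc_mean + 1.96 * sqrt(fc_var))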

Kalman filter

What's so special about the Kalman filter?

• Very general equations for any model in state space format.
• Any model in state space format can easily be generalized.
• Optimal MSE forecasts.
• Easy to handle missing values.
• Easy to compute the likelihood.


Likelihood calculation

θ = all unknown parameters
f_θ(y_t | y_1, y_2, …, y_{t−1}) = one-step forecast density

Likelihood:

L(y_1, …, y_T; θ) = ∏_{t=1}^{T} f_θ(y_t | y_1, …, y_{t−1})

Gaussian log-likelihood:

log L = −(T/2) log(2π) − (1/2) ∑_{t=1}^{T} log v_{t|t−1} − (1/2) ∑_{t=1}^{T} e_t² / v_{t|t−1}

where e_t = y_t − y_{t|t−1}.

All terms are obtained from the Kalman filter equations.

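A small helper (illustrative) that evaluates this log-likelihood from the one-step errors e_t and variances v_{t|t−1} returned by a Kalman filter pass such as the kalman_filter sketch above; in practice it is maximised numerically over the unknown variances.

gaussian_loglik <- function(e, v) {
  ok <- !is.na(e)                     # skip missing observations
  Tn <- sum(ok)
  -Tn / 2 * log(2 * pi) - 0.5 * sum(log(v[ok])) - 0.5 * sum(e[ok]^2 / v[ok])
}

# e.g. (assuming the kalman_filter sketch above and a series y):
# negll <- function(par) {
#   out <- kalman_filter(y, f = c(1, 0), G = matrix(c(1, 0, 1, 1), 2),
#                        W = diag(exp(par[1:2])), sigma2 = exp(par[3]))
#   -gaussian_loglik(out$e, out$v)
# }
# opt <- optim(c(0, 0, 0), negll)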

Structural models in R

fit <- StructTS(austourists, type = "BSM")
fc <- forecast(fit)
plot(fc)

[Figure: "Forecasts from Basic structural model" — austourists forecasts, 2000–2012.]


Kalman smoothing

We want an estimate of x_t | y_1, …, y_T where t < T, that is, x_{t|T}.

x_{t|T} = x_{t|t} + A_t (x_{t+1|T} − x_{t+1|t})
P_{t|T} = P_{t|t} + A_t (P_{t+1|T} − P_{t+1|t}) A′_t

where A_t = P_{t|t}G′ (P_{t+1|t})^{−1}.

• Uses all the data, not just previous data.
• Useful for estimating missing values: y_{t|T} = f′x_{t|T}.
• Useful for seasonal adjustment when one of the states is a seasonal component.

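A minimal sketch of the smoothing recursion for the scalar local level case (the Nile variance values are rough guesses); the forward pass is the Kalman filter, the backward pass applies the formulas above:

local_level_smooth <- function(y, sigma2, q2, l0 = 0, p0 = 1e7) {
  n <- length(y)
  lf <- pf <- lp <- pp <- numeric(n)    # filtered/predicted means and variances
  l_prev <- l0; p_prev <- p0
  for (t in 1:n) {                      # forward pass: Kalman filter
    lp[t] <- l_prev; pp[t] <- p_prev + q2               # l_{t|t-1}, p_{t|t-1}
    v <- pp[t] + sigma2
    lf[t] <- lp[t] + pp[t] * (y[t] - lp[t]) / v         # l_{t|t}
    pf[t] <- pp[t] - pp[t]^2 / v                        # p_{t|t}
    l_prev <- lf[t]; p_prev <- pf[t]
  }
  ls <- lf; ps <- pf                    # backward pass: l_{t|T}, p_{t|T}
  for (t in (n - 1):1) {
    A <- pf[t] / pp[t + 1]                              # A_t = p_{t|t} G / p_{t+1|t}
    ls[t] <- lf[t] + A * (ls[t + 1] - lp[t + 1])
    ps[t] <- pf[t] + A^2 * (ps[t + 1] - pp[t + 1])
  }
  list(filtered = lf, smoothed = ls)
}

sm <- local_level_smooth(Nile, sigma2 = 15000, q2 = 1500)   # rough guesses
plot(Nile); lines(ts(sm$smoothed, start = start(Nile)), col = "blue")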

Kalman smoothing in R

fit <- StructTS(austourists, type = "BSM")
sm <- tsSmooth(fit)

plot(austourists)
lines(sm[, 1], col = "blue")            # smoothed level (tsSmooth)
lines(fitted(fit)[, 1], col = "red")    # filtered level (fitted)
legend("topleft", col = c("blue", "red"), lty = 1,
       legend = c("Smoothed level", "Filtered level"))

Kalman smoothing in R

[Figure: austourists with the smoothed level and filtered level overlaid, 2000–2010.]

Kalman smoothing in R

fit <- StructTS(austourists, type = "BSM")
sm <- tsSmooth(fit)

plot(austourists)

# Seasonally adjusted data
aus.sa <- austourists - sm[, 3]
lines(aus.sa, col = "blue")

Kalman smoothing in R

[Figure: austourists with the seasonally adjusted series overlaid, 2000–2010.]

Kalman smoothing in R

x <- austourists
miss <- sample(1:length(x), 5)
x[miss] <- NA
fit <- StructTS(x, type = "BSM")
sm <- tsSmooth(fit)
estim <- sm[, 1] + sm[, 3]

plot(x, ylim = range(austourists))
points(time(x)[miss], estim[miss], col = "red", pch = 1)
points(time(x)[miss], austourists[miss], col = "black", pch = 1)
legend("topleft", pch = 1, col = c(2, 1),
       legend = c("Estimate", "Actual"))

Kalman smoothing in R

[Figure: austourists with five observations removed; smoothed estimates and the actual values are marked at the missing times, 2000–2010.]


Time varying parameter models

Linear Gaussian state space model:

y_t = f′_t x_t + ε_t,   ε_t ∼ N(0, σ²_t)
x_t = G_t x_{t−1} + w_t,   w_t ∼ N(0, W_t)

Kalman recursions:

y_{t|t−1} = f′_t x_{t|t−1}
v_{t|t−1} = f′_t P_{t|t−1}f_t + σ²_t
x_{t|t} = x_{t|t−1} + P_{t|t−1}f_t v_{t|t−1}^{−1}(y_t − y_{t|t−1})
P_{t|t} = P_{t|t−1} − P_{t|t−1}f_t v_{t|t−1}^{−1}f′_t P_{t|t−1}
x_{t|t−1} = G_t x_{t−1|t−1}
P_{t|t−1} = G_t P_{t−1|t−1}G′_t + W_t


Structural models with covariates

Local level with covariate:

y_t = ℓ_t + βz_t + ε_t
ℓ_t = ℓ_{t−1} + ξ_t

f′_t = [1  z_t],   x_t = [ℓ_t, β]′,   G = [1 0; 0 1],   W_t = diag(σ²_ξ, 0)

• Assumes z_t is fixed and known (as in regression).
• The estimate of β is given by x_{T|T}.
• Equivalent to simple linear regression with a time-varying intercept.
• Easy to extend to multiple regression with additional terms.

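A minimal sketch of this setup with invented data (covariate, variances and true β are all assumed): filter with the time-varying f_t = (1, z_t) and read the estimate of β from x_{T|T}.

set.seed(4)
n <- 200
z <- rnorm(n)                                    # assumed covariate
beta <- 2
level <- cumsum(rnorm(n, sd = 0.2))              # assumed true level path
y <- level + beta * z + rnorm(n, sd = 0.5)

G <- diag(2); W <- diag(c(0.04, 0)); sigma2 <- 0.25   # assumed variances
x <- c(0, 0); P <- 1e7 * diag(2)                      # diffuse start
for (t in 1:n) {
  x <- as.vector(G %*% x); P <- G %*% P %*% t(G) + W  # state prediction
  ft <- c(1, z[t])                                    # f_t = (1, z_t)
  v <- as.numeric(t(ft) %*% P %*% ft) + sigma2
  K <- P %*% ft / v
  x <- x + as.vector(K) * (y[t] - sum(ft * x))        # state filtering
  P <- P - K %*% t(ft) %*% P
}
x[2]    # estimate of beta (second element of x_{T|T})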

Time varying regression

Simple linear regression with time-varying parameters:

y_t = ℓ_t + β_t z_t + ε_t
ℓ_t = ℓ_{t−1} + ξ_t
β_t = β_{t−1} + ζ_t

f′_t = [1  z_t],   x_t = [ℓ_t, β_t]′,   G = [1 0; 0 1],   W_t = diag(σ²_ξ, σ²_ζ)

• Allows a linear regression with parameters that change slowly over time.
• Parameters follow independent random walks.
• Estimates of the parameters are given by x_{t|t} or x_{t|T}.


Updating (“online”) regression

The same idea can be used to estimate a regression iteratively as new data arrive.

Simple linear regression with updating parameters:

y_t = ℓ_t + β_t z_t + ε_t
ℓ_t = ℓ_{t−1} + ξ_t
β_t = β_{t−1} + ζ_t

f′_t = [1  z_t],   x_t = [ℓ_t, β_t]′,   G = [1 0; 0 1],   W_t = [0 0; 0 0]

• Updated parameter estimates are given by x_{t|t}.
• Recursive residuals are given by y_t − y_{t|t−1}.

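A minimal sketch of the W_t = 0 case with invented data: the recursions reduce to recursive least squares, and the final filtered state agrees closely with an ordinary least squares fit.

set.seed(5)
n <- 300
z <- rnorm(n)
y <- 1.5 + 0.8 * z + rnorm(n)            # assumed true intercept and slope

sigma2 <- 1
x <- c(0, 0); P <- 1e7 * diag(2)          # diffuse start for (intercept, beta)
beta_path <- matrix(NA, n, 2)             # updated estimates x_{t|t}
for (t in 1:n) {
  ft <- c(1, z[t])
  v <- as.numeric(t(ft) %*% P %*% ft) + sigma2
  K <- P %*% ft / v
  x <- x + as.vector(K) * (y[t] - sum(ft * x))   # recursive update
  P <- P - K %*% t(ft) %*% P                      # W_t = 0: no state noise added
  beta_path[t, ] <- x
}
round(rbind(kalman = x, ols = coef(lm(y ~ z))), 4)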
