Time Series Analysis, Lecture 2, 2019
Contents

1 Wold's Decomposition
2 VAR
3 MLE and Hypothesis Testing for VAR
4 Estimating the Effects of Shocks to the Economy
5 Identification Problem
6 Variance Decomposition
7 Standard Error for Impulse Response Functions
  7.1 Confidence Intervals and the Bootstrap
  7.2 VAR Diagnostics
8 Granger Causality
9 Kalman Filter
  9.1 State-Space Representation
  9.2 Kalman Filter Algorithm
  9.3 Innovation Representation
  9.4 Convergence Results
  9.5 Serially Correlated Measurement Errors
  9.6 MLE Estimation of the Parameters
  9.7 Smoothing
  9.8 Statistical Inference with the Kalman Filter
  9.9 Applications

Nan Li, Department of Finance, ACEM, SJTU
1 Wold’s Decomposition
Every stationary ARMA model can be written in the form
$$x_t = \sum_{j=0}^{\infty} \theta_j \varepsilon_{t-j}$$
where $\varepsilon_t$ is the white-noise error one would make in forecasting $x_t$ as a linear function of lagged $x_t$, and where the $\theta_j$ are square summable with $\theta_0 = 1$.
• Wold's Decomposition Theorem says this result is in fact fundamental for any covariance-stationary time series, not just stationary ARMAs!
Theorem 1 (Wold's Decomposition) Any zero-mean covariance-stationary process $x_t$ can be represented in the form
$$x_t = \sum_{j=0}^{\infty} \theta_j \varepsilon_{t-j} + \eta_t$$
where

1. $\theta_0 = 1$ and $\sum_{j=0}^{\infty} \theta_j^2 < \infty$,
2. $\varepsilon_t$ is white noise and $\varepsilon_t = x_t - E(x_t \mid x_{t-1}, x_{t-2}, \ldots)$,
3. all the roots of $\theta(L)$ are on or outside the unit circle, i.e. (unless the process has a unit root) the MA polynomial is invertible,
4. the value $\eta_t$ is uncorrelated with $\varepsilon_{t-j}$ for any $j$ and is linearly deterministic, i.e. $\eta_t = E(\eta_t \mid x_{t-1}, x_{t-2}, \ldots)$.
Remark 1 $\sum_{j=0}^{\infty} \theta_j \varepsilon_{t-j}$ is called the linearly indeterministic component; if $\eta_t = 0$ the process is called purely linearly indeterministic (linearly regular).
Remark 2 $E(\varepsilon_t \mid x_{t-1}, x_{t-2}, \ldots) = 0$.
Remark 3 The $\theta_j$ and $\varepsilon_t$ are unique. Idea of proof: rewrite $x_t$ as a sum of its forecast errors.
Remark 4 Extension to nonstationary time series: same as above, except that $\eta_t$ is a linear combination of its own past (not necessarily deterministic).
Remark 5 $\varepsilon_t$ need NOT be normally distributed, nor i.i.d.
Remark 6 $E(x_t \mid x_{t-1}, x_{t-2}, \ldots) \neq E[x_t \mid x_{t-1}, x_{t-2}, \ldots]$ in general: the Wold forecast is the linear projection, not the (possibly nonlinear) conditional expectation.
Remark 7 $\varepsilon_t$ need not be the true structural shock.
Remark 8 Wold's decomposition is the unique linear representation in which the shocks are linear forecast errors; this does not hold for nonlinear representations.
Example 1 (Non-invertible shocks)
$$x_t = \eta_t + 2\eta_{t-1}, \quad \eta_t \text{ i.i.d.}, \ \sigma^2_\eta = 1$$
$x_t$ is stationary, but the MA polynomial is not invertible, hence $\eta_t$ cannot be expressed as a forecast error of $x_t$.
Solution: any MA($\infty$) can be expressed as an invertible MA($\infty$), which is unique; its shocks are said to be the fundamental innovations of $x_t$.
• The Wold MA($\infty$) representation is the fundamental representation: if two time series have the same Wold representation, they are the same time series up to second moments/linear forecast errors.
2 VAR
• Proposed by Chris Sims in the 1970s and 1980s
• Major subsequent contributions by others (Bernanke, Blanchard-Watson, Blanchard-Quah)
• Useful for organizing data
  — VARs serve as a "battleground" between alternative economic theories
  — VARs can be used to quantitatively evaluate a particular model
• Question that can (in principle) be addressed by a VAR:
  — How does the economy respond to a particular shock?
  — The answer can be useful:
    ∗ for discriminating between models
    ∗ for estimating parameters of a given model
• By themselves, VARs can't actually address such a question:
  — Identification problem
  — Need extra assumptions: structural VAR (SVAR)
3 MLE and Hypothesis Testing for VAR
1. The conditional likelihood function for a vector autoregression
$$Y_t = c + \Phi_1 Y_{t-1} + \Phi_2 Y_{t-2} + \ldots + \Phi_p Y_{t-p} + \varepsilon_t, \quad \varepsilon_t \sim \text{i.i.d. } N(0, \Sigma)$$
Given the initial $p$ observations, the conditional likelihood of $\theta = (c, \Phi_1, \Phi_2, \ldots, \Phi_p, \Sigma)'$ is
$$f(Y_T, Y_{T-1}, \ldots, Y_1 \mid Y_0, Y_{-1}, \ldots, Y_{-p+1}; \theta) = \prod_{t=1}^{T} f(Y_t \mid Y_{t-1}, Y_{t-2}, \ldots, Y_{-p+1}; \theta)$$
and
$$Y_t \mid Y_{t-1}, Y_{t-2}, \ldots, Y_{-p+1} \sim N(\Pi' X_t, \Sigma)$$
where
$$X_t = \begin{bmatrix} 1 & Y'_{t-1} & Y'_{t-2} & \ldots & Y'_{t-p} \end{bmatrix}'_{(np+1)\times 1}, \qquad \Pi' = \begin{bmatrix} c & \Phi_1 & \Phi_2 & \ldots & \Phi_p \end{bmatrix}_{n \times (np+1)}$$
Hence
$$\mathcal{L}(\theta) = -\frac{Tn}{2}\log(2\pi) + \frac{T}{2}\log|\Sigma^{-1}| - \frac{1}{2}\sum_{t=1}^{T}(Y_t - \Pi' X_t)'\Sigma^{-1}(Y_t - \Pi' X_t)$$

2. Maximum likelihood estimates of $\Pi$ and $\Sigma$:
$$\hat{\Pi}' = \left(\sum_{t=1}^{T} Y_t X'_t\right)\left(\sum_{t=1}^{T} X_t X'_t\right)^{-1}$$
which is the same as OLS regression equation-by-equation, and
$$\hat{\Sigma} = \frac{1}{T}\sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}'_t, \qquad \hat{\varepsilon}_t = Y_t - \hat{\Pi}' X_t$$
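As a minimal numerical sketch of the estimator above (simulated data; the true coefficient matrix and all variable names are illustrative), the MLE of $\Pi$ is just OLS equation-by-equation:

```python
# Sketch: conditional MLE of a VAR(1) = OLS equation-by-equation.
# Data are simulated; the true Phi_1 and Sigma = I are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
T, n = 500, 2
Phi1 = np.array([[0.5, 0.1],
                 [0.0, 0.3]])
Y = np.zeros((T + 1, n))
for t in range(1, T + 1):
    Y[t] = Phi1 @ Y[t - 1] + rng.normal(size=n)

# X_t = [1, Y_{t-1}']' stacked over t; Pi_hat solves the OLS normal equations
X = np.column_stack([np.ones(T), Y[:-1]])     # T x (np+1)
Yt = Y[1:]                                    # T x n
Pi_hat = np.linalg.solve(X.T @ X, X.T @ Yt)   # (np+1) x n, one column per equation
eps_hat = Yt - X @ Pi_hat
Sigma_hat = eps_hat.T @ eps_hat / T           # MLE of Sigma (divide by T, not T-k)
```

Each column of `Pi_hat` is exactly what a single-equation OLS regression of that variable on a constant and all lags would give.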
3. Likelihood ratio test
$$\mathcal{L}(\hat{\Sigma}, \hat{\Pi}) = -\frac{Tn}{2}\log(2\pi) + \frac{T}{2}\log|\hat{\Sigma}^{-1}| - \frac{1}{2}\sum_{t=1}^{T}(Y_t - \hat{\Pi}' X_t)'\hat{\Sigma}^{-1}(Y_t - \hat{\Pi}' X_t)$$
$$= -\frac{Tn}{2}\log(2\pi) + \frac{T}{2}\log|\hat{\Sigma}^{-1}| - \frac{Tn}{2}$$
$$2(\mathcal{L}_1 - \mathcal{L}_0) = T(\log|\hat{\Sigma}_0| - \log|\hat{\Sigma}_1|) \sim \chi^2(l)$$
where $l$ is the number of restrictions; for example, in a test of $p_1$ ($H_1$) vs. $p_0$ ($H_0$) lags ($p_1 > p_0$) in an $n$-variable VAR, $l = n^2(p_1 - p_0)$.

Modified likelihood ratio test for small-sample bias (Sims 1980):
$$(T - k)(\log|\hat{\Sigma}_0| - \log|\hat{\Sigma}_1|) \sim \chi^2(l), \quad k = 1 + np_1$$
which is less likely to reject the null hypothesis in small samples.

4. Asymptotic distribution of $\hat{\Pi}$ and $\hat{\Sigma}$: both are consistent, and
$$\begin{bmatrix} \sqrt{T}(\text{vec}(\hat{\Pi}_T) - \text{vec}(\Pi)) \\ \sqrt{T}(\text{vech}(\hat{\Sigma}_T) - \text{vech}(\Sigma)) \end{bmatrix} \xrightarrow{L} N\left(0, \begin{bmatrix} \Sigma \otimes Q^{-1} & 0 \\ 0 & 2D^+_n(\Sigma \otimes \Sigma)(D^+_n)' \end{bmatrix}\right)$$
where $Q = E(X_t X'_t)$, $D_n$ is the unique matrix s.t. $D_n\text{vech}(\Sigma) = \text{vec}(\Sigma)$, and $D^+_n = (D'_n D_n)^{-1}D'_n$, so that $D^+_n D_n = I$.
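The lag-length LR test above can be sketched as follows (simulated bivariate data whose true model is a VAR(1); all numbers are illustrative, and both fits condition on the same sample):

```python
# Illustrative sketch: LR test of p1 = 2 vs. p0 = 1 lags in a bivariate VAR,
# with Sims's small-sample correction (T - k in place of T).
import numpy as np

rng = np.random.default_rng(1)
T, n = 400, 2
A = np.array([[0.4, 0.1],
              [0.2, 0.3]])                  # true model is a VAR(1)
y = np.zeros((T + 2, n))
for t in range(1, T + 2):
    y[t] = A @ y[t - 1] + rng.normal(size=n)

def resid_cov(y, p, start):
    """OLS fit of a VAR(p) on y[start:]; returns the MLE residual covariance."""
    Yt = y[start:]
    Teff = len(Yt)
    X = np.column_stack([np.ones(Teff)] +
                        [y[start - j - 1:len(y) - j - 1] for j in range(p)])
    Pi = np.linalg.lstsq(X, Yt, rcond=None)[0]
    e = Yt - X @ Pi
    return e.T @ e / Teff

p0, p1 = 1, 2
S0 = resid_cov(y, p0, start=p1)             # restricted fit (same sample)
S1 = resid_cov(y, p1, start=p1)             # unrestricted fit
Teff = len(y) - p1
k = 1 + n * p1                              # parameters per equation
stat = (Teff - k) * (np.log(np.linalg.det(S0)) - np.log(np.linalg.det(S1)))
df = n ** 2 * (p1 - p0)                     # chi-square degrees of freedom
```

The statistic is compared with a $\chi^2(df)$ critical value; since the true process here is a VAR(1), `stat` should typically be small.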
5. Wald test of a general hypothesis of the form $R\,\text{vec}(\Pi) = r$:
$$\sqrt{T}(R\,\text{vec}(\hat{\Pi}_T) - r) \xrightarrow{L} N(0,\ R(\hat{\Sigma}_T \otimes \hat{Q}_T^{-1})R')$$
hence
$$T(R\,\text{vec}(\hat{\Pi}_T) - r)'\left[R(\hat{\Sigma}_T \otimes \hat{Q}_T^{-1})R'\right]^{-1}(R\,\text{vec}(\hat{\Pi}_T) - r) \sim \chi^2(m)$$
where $m$ is the number of restrictions (rows of $R$).
4 Estimating the Effects of Shocks to the Economy
• Vector autoregression for an $N \times 1$ vector of observed variables:
$$X_t = A_1 X_{t-1} + \ldots + A_p X_{t-p} + \varepsilon_t, \qquad E\varepsilon_t\varepsilon'_t = V$$
• The $A$'s, $\varepsilon$, and $V$ can easily be obtained by OLS
• Problem: $\varepsilon$ consists of statistical innovations
  — We want impulse response functions to fundamental economic shocks $\eta_t$:
$$\varepsilon_t = C\eta_t, \qquad E\eta_t\eta'_t = I, \qquad CC' = V$$
  — Impulse response to the $i$th shock (with $C_i$ the $i$th column of $C$):
$$X_t - E_{t-1}X_t = C_i\eta_{it}$$
$$E_t X_{t+1} - E_{t-1}X_t = A_1 C_i\eta_{it}$$
5 Identification Problem
• We know the $A$'s and $V$; we need to get $C$
• Identification problem: not enough restrictions to pin down $C$
  — $N^2$ unknown elements in $C$
  — Only $N(N+1)/2$ equations in $CC' = V$
  — Need more identifying restrictions!
  — Ambiguity of the impulse response function for a VAR (or VMA):
$$\text{VAR}: \ A(L)X_t = \varepsilon_t, \quad A(0) = I, \quad E(\varepsilon_t\varepsilon'_t) = \Sigma$$
$$\text{VMA}: \ X_t = B(L)\varepsilon_t, \quad B(0) = I, \quad E(\varepsilon_t\varepsilon'_t) = \Sigma$$
where $B(L) = A(L)^{-1}$. If $\Sigma$ is not diagonal, the system is in general unidentified: the shocks and impulse responses are not identified. To show this, for any full-rank $Q$ such that $QQ' = I$, we have
$$X_t = B(L)\varepsilon_t = \tilde{B}(L)\eta_t$$
where $\tilde{B}(L) = B(L)Q^{-1}$ and $\eta_t = Q\varepsilon_t$. Hence $B(L)\varepsilon_t$ and $\tilde{B}(L)\eta_t$ are observationally equivalent but have different impulse responses.
• Orthogonalization assumptions:

1. Sims orthogonalization: $B(0)$ is lower triangular and $E(\eta_t\eta'_t) = I$
(a) The first variable is affected only by its own shock contemporaneously; the other variable absorbs all the contemporaneous correlation between the additional shocks and the first shock. Note that in the original system $B(0) = I$ (or $A(0) = I$) restricts each shock to affect only its own variable contemporaneously, which is inconsistent with orthogonal shocks unless $\Sigma$ is diagonal.
(b) In terms of the MA representation,
$$\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} = B(L)\eta_t = \begin{bmatrix} B^0_{11} & 0 \\ B^0_{21} & B^0_{22} \end{bmatrix}\begin{bmatrix} \eta_{1t} \\ \eta_{2t} \end{bmatrix} + B^1\eta_{t-1} + \ldots$$
(c) In terms of the AR representation, $A(L) = B(L)^{-1}$; $B(0)$ lower triangular implies that $A(0)$ is lower triangular, or
$$A^0_{11}x_{1t} = -A^1_{11}x_{1t-1} - A^1_{12}x_{2t-1} + \ldots + \eta_{1t}$$
$$A^0_{22}x_{2t} = -A^0_{21}x_{1t} - A^1_{21}x_{1t-1} - A^1_{22}x_{2t-1} + \ldots + \eta_{2t}$$
that is, estimate the system by OLS with contemporaneous $x_{1t}$ in the $x_{2t}$ equation, but not vice versa.
• Homework: show that the OLS residuals $\eta_{1t}$ and $\eta_{2t}$ are uncorrelated.
(d) How to find $A^0$ or $B^0$? A Cholesky decomposition does the job.
• Example:
$$X_t = AX_{t-1} + \varepsilon_t, \qquad E(\varepsilon_t\varepsilon'_t) = \Sigma$$
Let $\eta_t = C\varepsilon_t$; then $C$ should satisfy $E(\eta_t\eta'_t) = C\Sigma C' = I$, with $C$ lower triangular. The Cholesky decomposition of $\Sigma$ gives us $C^{-1}$.

Note: if $\Sigma$ is Hermitian (symmetric) and positive definite, then $\Sigma$ can be decomposed as $\Sigma = PP^*$, where $P$ is a lower triangular matrix with strictly positive diagonal entries and $P^*$ denotes the (conjugate) transpose of $P$. This is the Cholesky decomposition (`chol` in Matlab).
(e) Order of variables in VAR matters for interpretation of IRs, ideally determinedby economic theory
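A minimal sketch of this Cholesky step (the covariance matrix and VAR coefficients below are illustrative, not estimates from data):

```python
# Sketch of Sims orthogonalization via Cholesky decomposition.
import numpy as np

Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])             # illustrative innovation covariance
P = np.linalg.cholesky(Sigma)              # lower triangular, P P' = Sigma
C = np.linalg.inv(P)                       # in the notes' notation chol(Sigma) = C^{-1},
                                           # so eta_t = C eps_t has E(eta eta') = I

# Orthogonalized impulse responses for an illustrative VAR(1) X_t = A X_{t-1} + eps_t:
A = np.array([[0.5, 0.1],
              [0.0, 0.3]])
irf = [np.linalg.matrix_power(A, h) @ P for h in range(4)]  # responses to unit eta shocks
```

The impact matrix `irf[0] = P` is lower triangular, so the second variable's shock has no contemporaneous effect on the first, exactly the recursive ordering described above.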
2. Example: recursiveness assumption
(a) Fed's policy rule:
$$R_t = f(\Omega_t) + e^R_t$$
where $f$ is a linear function, $\Omega_t$ is the set of variables the Fed looks at, and $e^R_t$ is the time-$t$ policy shock.
(b) What does this rule represent?
• Literal interpretation: the structural policy rule of the central bank
• A combination of the structural rule and other "stuff" (see Clarida-Gertler)
$$\text{True policy rule: } R_t = \alpha E[X_{t+1} \mid F_t] + e^R_t = f(z_t) + e^R_t$$
where $z_t$ is all the time-$t$ data that generate the information set $F_t$ in $E(\cdot \mid F_t)$
(c) What is a monetary policy shock?
• Shocks to the preferences of the monetary authority
• Strategic considerations can lead to exogenous variation in policy (self-fulfilling expectation traps in Albanesi, Chari and Christiano)
• Technical factors like measurement error (Bernanke and Mihov)
(d) Problem: not enough assumptions to identify $e^R_t$
• Assume:
  — policy shocks $e^R_t$ are orthogonal to $\Omega_t$
  — $\Omega_t$ contains current prices, wages, aggregate quantities, and lagged variables
• Economic content of this assumption:
  — The Fed sees prices and output when it makes its choice of $R_t$
  — Prices and output don't respond at time $t$ to $e^R_t$
• The response of other variables can be obtained by regressing them on current and lagged $e^R_t$
• In the VAR:
$$A(L)X_t = \varepsilon_t, \quad A(L) = I - A_1 L - A_2 L^2 - \ldots - A_p L^p, \quad \varepsilon_t = C\eta_t, \quad CC' = \Sigma$$
To think about the recursiveness assumption, it is convenient to work with
$$A_0 = C^{-1}, \qquad A_0^{-1}A_0^{-1\prime} = \Sigma, \qquad \bar{A}(L) = A_0 A(L)$$
The recursiveness assumption is then represented as
$$X_t = \begin{bmatrix} X_{1t} \\ R_t \\ X_{2t} \end{bmatrix} \quad \text{and} \quad A_0 = \begin{bmatrix} A_{11} & 0 & 0 \\ \vec{a}_{21} & a_{22} & 0 \\ A_{31} & \vec{a}_{32} & A_{33} \end{bmatrix} \quad (**)$$
where $R_t$ is the interest rate (the middle equation is the policy rule), $X_{1t}$ contains the $k_1$ variables whose current and lagged values do appear in the policy rule, and $X_{2t}$ contains the $k_2$ variables whose current values do not appear in the policy rule.
(e) Zero restrictions on $A_0$ implied by the recursiveness assumption:
  — Zeros in the middle row: current values of $X_{2t}$ do not appear in the policy rule
  — Zeros in the first block of rows ensure that the monetary policy shock does not affect $X_{1t}$: the first block of zeros prevents a direct effect via $R_t$; the second block of zeros prevents an indirect effect via $X_{2t}$
(f) There are many $A_0$ which satisfy the zero restrictions and
$$A_0^{-1}A_0^{-1\prime} = \Sigma \quad (*)$$
• One normalization: lower triangular $A_0$ with positive diagonal elements
• $A_0^{-1}$ is then the lower triangular Cholesky decomposition of $\Sigma$
(g) Proposition:
• All $A_0$ matrices that satisfy (*) and the zero restrictions imply the same value for the column of $A_0^{-1}$ corresponding to $e^R_t$, so we can work with the lower triangular Cholesky decomposition of $\Sigma$ without loss of generality
• If we change the ordering of the variables within $X_{1t}$ and $X_{2t}$, but always pick the lower triangular Cholesky decomposition of $\Sigma$, the dynamic impulse responses to $e^R_t$ are unaffected.
3. Blanchard-Quah orthogonalization (long-run identification):
(a) Restrict the long-run response of one variable to the other shock to be zero, i.e. $B(1)$ to be lower triangular:
$$X_t = B(L)\eta_t, \quad E(\eta_t\eta'_t) = I, \qquad \sum_{j=0}^{\infty}\frac{\partial X_{t+j}}{\partial \eta_t} = B(1)$$
(b) Why do we care? For a system specified in changes,
$$\Delta X_t = B(L)\eta_t, \qquad \lim_{j\to\infty}\frac{\partial X_{t+j}}{\partial \eta_t} = \sum_{j=0}^{\infty}B_j = B(1)$$
$B(1)$ gives the (limiting) long-run response of the level of $X_t$ to $\eta$ shocks.
(c) E.g., in a DSGE model, the technology shock is the only shock that has a long-run impact on the level of labor productivity; in the long-run risk model, only the "permanent shock" has a long-run impact on the level of consumption and dividends; "demand shocks" have no long-run effect on GNP
• There are two types of technology shocks: neutral and capital-embodied
$$Y_t = Z_{1t}F(K_t, L_t), \qquad K_{t+1} = (1-\delta)K_t + Z_{2t}I_t$$
• These are the only shocks that can affect the long-run log level of labor productivity
• The only shock which also has a long-run effect on the relative price of capital is a capital-embodied technology shock ($Z_{2t}$)
• These identification strategies require that the variables in the VAR be covariance stationary
• Advantage of this approach:
  — No need for the usual assumptions required to construct Solow-residual-based measures of technology shocks, such as a functional form for the production function and corrections for labor hoarding, capital utilization, and time-varying markups
• Disadvantage: some models don't satisfy the identification assumption
  — Endogenous growth models where all shocks affect productivity in the long run
  — Standard models when there are permanent shocks to the tax rate on capital income
• Reference: Francis, Owyang and Theodorou (2003)
(d) Implementation: suppose you estimate the system by OLS and get $\hat{A}$ and $\hat{\Sigma}$:
$$X_t = A_1 X_{t-1} + \ldots + A_p X_{t-p} + \varepsilon_t$$
Let $\eta_t = C^{-1}\varepsilon_t$, such that $E(\eta_t\eta'_t) = I$ and
$$X_t = A_1 X_{t-1} + \ldots + A_p X_{t-p} + C\eta_t$$
Define $B(L) = A(L)^{-1} = (I - A_1 L - A_2 L^2 - \ldots - A_p L^p)^{-1}$; then
$$\sum_{j=0}^{\infty}\frac{\partial X_{t+j}}{\partial \eta_t} = B(1)C = A(1)^{-1}C$$
$C$ should satisfy the following restrictions:
• (exclusion restriction) $B(1)C$ is lower triangular
• $CC' = \Sigma$
• (sign restriction) the (1,1) element of $B(1)C$ is positive
Solution: get the Cholesky decomposition $B(1)\Sigma B(1)' = PP'$, and let $C = B(1)^{-1}P$.
(e) In particular, for a VAR(1),
$$X_t = AX_{t-1} + C\eta_t, \qquad \sum_{j=0}^{\infty}\frac{\partial X_{t+j}}{\partial \eta_t} = (I - A)^{-1}C$$
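The long-run identification step can be sketched directly for the VAR(1) case (the matrices $A$ and $\Sigma$ below are illustrative, standing in for OLS estimates):

```python
# Sketch of Blanchard-Quah identification for a VAR(1), given illustrative
# estimates A and Sigma.
import numpy as np

A = np.array([[0.5, 0.2],
              [0.1, 0.4]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.8]])

B1 = np.linalg.inv(np.eye(2) - A)          # B(1) = A(1)^{-1} for a VAR(1)
P = np.linalg.cholesky(B1 @ Sigma @ B1.T)  # factor the long-run covariance
C = np.linalg.solve(B1, P)                 # C = B(1)^{-1} P
long_run = B1 @ C                          # = P: lower triangular by construction
```

By construction `C @ C.T` recovers $\Sigma$, the long-run response `B1 @ C` is lower triangular (exclusion restriction), and its (1,1) element is positive (sign restriction), since Cholesky factors have positive diagonals.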
6 Variance Decomposition
How much of the k step ahead forecast error variance is due to a specified variable?
$$X_t = C(L)\eta_t, \qquad E(\eta_t\eta'_t) = I$$
$$\text{var}_t(X_{t+k}) = C_0 C'_0 + C_1 C'_1 + \ldots + C_{k-1}C'_{k-1}$$
Decompose $C_j C'_j$ as $\sum_{\tau=1}^{n} C_j I_\tau C'_j$, where $I_\tau$ has a one in the $(\tau,\tau)$ position and zeros elsewhere; then we have
$$\text{var}_t(X_{t+k}) = \sum_{\tau=1}^{n}\left(\sum_{j=0}^{k-1}C_j I_\tau C'_j\right) = \sum_{\tau=1}^{n}\nu_{k,\tau}$$
Let $k \to \infty$:
$$\text{var}(X_t) = \sum_{\tau=1}^{n}\nu_\tau = \sum_{\tau=1}^{n}\left(\sum_{j=0}^{\infty}C_j I_\tau C'_j\right)$$
• VAR(1) representation
$$Y_t = AY_{t-1} + C\eta_t, \quad E(\eta_t\eta'_t) = I, \qquad \frac{\partial Y_{t+k}}{\partial \eta_t} = A^k C$$
$$\text{var}_t(Y_{t+k}) = \sum_{j=0}^{k-1}A^j CC'A^{j\prime} = \sum_{\tau=1}^{n}\nu_{k,\tau}, \qquad \nu_{k,\tau} = \sum_{j=0}^{k-1}A^j C I_\tau C'A^{j\prime}$$
$$\text{var}(Y_t) = \sum_{j=0}^{\infty}A^j CC'A^{j\prime} = \sum_{\tau=1}^{n}\nu_\tau,$$
$$\nu_\tau = \sum_{j=0}^{\infty}A^j C I_\tau C'A^{j\prime} = C I_\tau C' + A\left(\sum_{j=1}^{\infty}A^{j-1}C I_\tau C'A^{(j-1)\prime}\right)A' = C I_\tau C' + A\nu_\tau A'$$
Alternatively, we can compute $\nu_{k,\tau}$ recursively:
$$\nu_{k+1,\tau} = C I_\tau C' + A\nu_{k,\tau}A', \quad k \geq 1, \qquad \nu_{1,\tau} = C I_\tau C'$$
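The recursion above can be sketched numerically (the matrices $A$ and $C$ are illustrative):

```python
# Sketch of the recursive forecast-error variance decomposition for a VAR(1)
# Y_t = A Y_{t-1} + C eta_t with E(eta eta') = I; A and C are illustrative.
import numpy as np

A = np.array([[0.5, 0.1],
              [0.0, 0.3]])
C = np.linalg.cholesky(np.array([[1.0, 0.4],
                                 [0.4, 1.0]]))
n, K = 2, 10

# v[k, tau] = contribution of shock tau to the (k+1)-step forecast error variance
v = np.zeros((K, n, n, n))
for tau in range(n):
    I_tau = np.zeros((n, n))
    I_tau[tau, tau] = 1.0
    v[0, tau] = C @ I_tau @ C.T                         # nu_{1,tau}
    for k in range(1, K):
        v[k, tau] = C @ I_tau @ C.T + A @ v[k - 1, tau] @ A.T

# Sanity check: the shock contributions sum to the total forecast-error variance
total = sum(np.linalg.matrix_power(A, j) @ C @ C.T @ np.linalg.matrix_power(A, j).T
            for j in range(K))
```

Summing `v[K-1]` over shocks reproduces `total`, because $\sum_\tau I_\tau = I$.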
7 Standard Error for Impulse Response Functions
- Analytically, from the distribution of the AR parameters
- By Monte Carlo, for Gaussian residuals
- By bootstrap, for small samples and non-Gaussian residuals
7.1 Confidence Intervals and the Bootstrap
• Estimation produces:
$$X_t = \hat{A}(L)X_{t-1} + \hat{\varepsilon}_t, \qquad \hat{\varepsilon}_t, \ t = 1, 2, \ldots, T$$
• Bootstrap
1. Generate $r = 1, \ldots, R$ artificial data sets, each of length $T$. For the $r$th data set:
$$\lambda^r_t \in \text{Uniform}[0, 1], \quad t = 1, \ldots, T$$
— Convert to integers in $\{1, 2, \ldots, T\}$:
$$\lambda^r_t = \text{integer}(\lambda^r_t \times T), \quad t = 1, \ldots, T$$
— Draw shocks $\hat{\varepsilon}_{\lambda^r_1}, \ldots, \hat{\varepsilon}_{\lambda^r_T}$
— Generate artificial data:
$$X^r_t = \hat{A}(L)X^r_{t-1} + \hat{\varepsilon}_{\lambda^r_t}, \quad t = 1, \ldots, T$$
2. Suppose the statistic of interest is $\phi$ (could be a vector of impulse response functions, serial correlation coefficients, etc.):
$$\phi^r = f(X^r_1, \ldots, X^r_T), \quad r = 1, 2, \ldots, R$$
— Compute
$$\sigma_\phi = \left[\frac{1}{R}\sum_{r=1}^{R}(\phi^r - \hat{\phi})^2\right]^{1/2}$$
— Report
$$\hat{\phi} \pm 2 \times \sigma_\phi$$
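The two steps above can be sketched for the simplest case, a scalar AR(1) whose statistic $\phi$ is the lag coefficient itself (data simulated; all numbers illustrative):

```python
# Minimal sketch of the residual bootstrap for an AR(1); the statistic of
# interest phi is taken to be the AR coefficient itself for brevity.
import numpy as np

rng = np.random.default_rng(2)
T, R = 300, 200
x = np.zeros(T + 1)
for t in range(1, T + 1):
    x[t] = 0.6 * x[t - 1] + rng.normal()

def ols_ar1(x):
    return np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)

a_hat = ols_ar1(x)
eps = x[1:] - a_hat * x[:-1]          # fitted residuals

phis = np.empty(R)
for r in range(R):
    idx = rng.integers(0, T, size=T)  # resample residual indices with replacement
    xr = np.zeros(T + 1)
    for t in range(1, T + 1):
        xr[t] = a_hat * xr[t - 1] + eps[idx[t - 1]]
    phis[r] = ols_ar1(xr)             # re-estimate phi on artificial data

se = phis.std(ddof=0)                 # sigma_phi
ci = (a_hat - 2 * se, a_hat + 2 * se) # report phi_hat +/- 2 sigma_phi
```

For a vector of impulse responses, `phis` would simply become an `R x H` array with one row per bootstrap replication.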
7.2 VAR Diagnostics
• Whether or not to take first differences is important; for example: hours and productivity, consumption and dividends
• Choosing the VAR lag length:
— Construct $s(p)$:
$$\text{Akaike}: \ s(p) = \log(\det\hat{\Sigma}_p) + (m + m^2 p)\frac{2}{T}$$
$$\text{Hannan-Quinn}: \ s(p) = \log(\det\hat{\Sigma}_p) + (m + m^2 p)\frac{2\log(\log(T))}{T}$$
$$\text{Schwarz}: \ s(p) = \log(\det\hat{\Sigma}_p) + (m + m^2 p)\frac{\log(T)}{T}$$
where $T$ is the sample size, $m$ is the number of variables, and $p$ is the number of lags
— Choose the optimal $p$:
$$\hat{p} = \arg\min_p s(p)$$
— With $T = 170$:
$$\frac{2}{T} = 0.0118; \qquad \frac{2\log(\log(T))}{T} = 0.0192; \qquad \frac{\log(T)}{T} = 0.0302$$
— Akaike penalizes $p$ the least
  ∗ Hannan-Quinn and Schwarz (or the Bayesian information criterion, BIC) are consistent
  ∗ In population, Akaike has positive probability of overshooting the true $p$
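The three penalty factors quoted for $T = 170$ can be checked directly:

```python
# Penalty factors per parameter for the three information criteria at T = 170.
import numpy as np

T = 170
akaike  = 2 / T                        # Akaike
hq      = 2 * np.log(np.log(T)) / T    # Hannan-Quinn
schwarz = np.log(T) / T                # Schwarz / BIC
```

The ordering `akaike < hq < schwarz` is what makes Akaike the least aggressive at penalizing extra lags.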
— These ICs can be used to compare estimated models only when the numerical values of the dependent variable are identical for all estimates being compared. The models being compared need not be nested, unlike the case when models are compared using an F or likelihood ratio test.
8 Granger Causality
1. Basic idea
• A forecasting relation
• Different from a "precede effect"
2. Definition: $w_t$ Granger causes $y_t$ if $w_t$ helps to forecast $y_t$ given past $y_t$, i.e. for some $s > 0$
$$MSE[E(y_{t+s} \mid y_t, y_{t-1}, \ldots)] > MSE[E(y_{t+s} \mid y_t, y_{t-1}, \ldots, w_t, w_{t-1}, \ldots)]$$
]• Autoregressive presentation
yt = a(L)yt−1 + b(L)wt−1 + δt
wt = c(L)yt−1 + d(L)wt−1 + νt
wt does not Granger cause yt iff b(L) = 0, or
A(L)
[ytwt
]=
[δtνt
]
A(L) =
[I − La(L) −Lb(L)Lc(L) I − Ld(L)
]≡[a∗(L) b∗(L)c∗(L) d∗(L)
]Nan Li, Department of Finance, ACEM, SJTU
Time Series Analysis, Lecture 2, 2019 32
wt does not Granger cause yt iff b∗(L) = 0
• MA representation:
$$\begin{bmatrix} y_t \\ w_t \end{bmatrix} = A(L)^{-1}\begin{bmatrix} \delta_t \\ \nu_t \end{bmatrix} = \frac{1}{a^*(L)d^*(L) - b^*(L)c^*(L)}\begin{bmatrix} d^*(L) & -b^*(L) \\ -c^*(L) & a^*(L) \end{bmatrix}\begin{bmatrix} \delta_t \\ \nu_t \end{bmatrix} \equiv \begin{bmatrix} \bar{a}(L) & \bar{b}(L) \\ \bar{c}(L) & \bar{d}(L) \end{bmatrix}\begin{bmatrix} \delta_t \\ \nu_t \end{bmatrix}$$
$w_t$ does not Granger cause $y_t$ iff the Wold moving-average matrix lag polynomial is lower triangular.
$w_t$ does not Granger cause $y_t$ iff $y$'s bivariate Wold representation is the same as its univariate Wold representation.
• Univariate representation: consider the pair of univariate Wold representations
$$y_t = e(L)\xi_t, \quad \xi_t = y_t - E(y_t \mid y_{t-1}, y_{t-2}, \ldots)$$
$$w_t = f(L)\mu_t, \quad \mu_t = w_t - E(w_t \mid w_{t-1}, w_{t-2}, \ldots)$$
$w_t$ does not Granger cause $y_t$ iff $E(\mu_t\xi_{t+j}) = 0$ for all $j > 0$, i.e. the univariate innovations of $w_t$ are uncorrelated with the future univariate innovations of $y_t$.
Proof: if $w_t$ does not Granger cause $y_t$,
$$y_t = \bar{a}(L)\delta_t = e(L)\xi_t, \qquad w_t = \bar{c}(L)\delta_t + \bar{d}(L)\nu_t = f(L)\mu_t$$
$\Rightarrow$ $\mu_t$, as a combination of current and past $\delta_t$ and $\nu_t$, is uncorrelated with $\delta_{t+j}$, hence uncorrelated with $\xi_{t+j}$ for all $j > 0$.
Note: $E(\mu_t\xi_{t+j}) = 0$ $\Rightarrow$ past $\mu$ do not help to forecast $\xi$ $\Rightarrow$ $\mu$ do not help to forecast $y_t$ $\Rightarrow$ $w_t = f(L)\mu_t$ does not help to forecast $y_t$.
If $w_t$ does not Granger cause $y_t$, then the response of $y$ to $w$ shocks is zero.
• Effect on projections: $w$ does not Granger cause $y$ iff $E(w_t \mid \text{all } y_t) = E(w_t \mid \text{current and past (no future) } y_t)$.
Proof:
$$w_t = \bar{c}(L)\bar{a}(L)^{-1}y_t + \bar{d}(L)\nu_t$$
3. Test of Granger causality: F test of
$$H_0: b_1 = b_2 = \ldots = b_p = 0$$
Run an unconstrained regression and save
$$RSS_1 = \sum_{t=1}^{T}\hat{u}^2_t$$
Run a constrained regression, which is a univariate AR(p) for $y$, and save
$$RSS_0 = \sum_{t=1}^{T}\hat{e}^2_t$$
Let
$$F = \frac{(RSS_0 - RSS_1)/p}{RSS_1/(T - 2p - 1)} \sim F(p, T - 2p - 1)$$
which is asymptotically equivalent to
$$S_2 = \frac{T(RSS_0 - RSS_1)}{RSS_1} \sim \chi^2(p)$$
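The F test can be sketched on simulated data in which $w$ genuinely helps forecast $y$ (one lag, $p = 1$; all parameter values illustrative):

```python
# Sketch of the Granger-causality F test with p = 1 lag on simulated data.
import numpy as np

rng = np.random.default_rng(3)
T = 400
w = np.zeros(T + 1)
y = np.zeros(T + 1)
for t in range(1, T + 1):
    w[t] = 0.5 * w[t - 1] + rng.normal()
    y[t] = 0.4 * y[t - 1] + 0.5 * w[t - 1] + rng.normal()  # w Granger causes y

def rss(Y, X):
    """Residual sum of squares from an OLS regression of Y on X."""
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    e = Y - X @ b
    return e @ e

p = 1
Y = y[1:]
X1 = np.column_stack([np.ones(T), y[:-1], w[:-1]])  # unconstrained
X0 = np.column_stack([np.ones(T), y[:-1]])          # constrained: b(L) = 0
RSS1, RSS0 = rss(Y, X1), rss(Y, X0)
F = ((RSS0 - RSS1) / p) / (RSS1 / (T - 2 * p - 1))  # compare with F(p, T-2p-1)
```

With the lagged-$w$ coefficient set to 0.5 and $T = 400$, the statistic lands far in the rejection region.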
4. Interpreting Granger-causality tests
• It is not necessary that one variable of a pair Granger cause the other and vice versa; example: money growth and GNP
— Question: the fed funds rate and the stock market?
• Warning: Granger causality is not causality!
5. Granger causality in a multivariate context
• Estimation:
$$y_{1t} = c_1 + A'_1 x_{1t} + A'_2 x_{2t} + \varepsilon_{1t}$$
$$y_{2t} = c_2 + B'_1 x_{1t} + B'_2 x_{2t} + \varepsilon_{2t}$$
$$H_0: A_2 = 0$$
which is equivalent to estimating
$$y_{1t} = c_1 + A'_1 x_{1t} + A'_2 x_{2t} + \varepsilon_{1t}$$
$$y_{2t} = d + D'_0 y_{1t} + D'_1 x_{1t} + D'_2 x_{2t} + \nu_{2t}$$
$$H_0: A_2 = 0$$
Proof:
$$f(y_t \mid x_t; \theta) = f(y_{1t} \mid x_t; \theta)f(y_{2t} \mid y_{1t}, x_t; \theta)$$
$$\text{var}(y_{2t} \mid y_{1t}, x_t) = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$
$$E(y_{2t} \mid y_{1t}, x_t) = E(y_{2t} \mid x_t) + \Sigma_{21}\Sigma_{11}^{-1}[y_{1t} - E(y_{1t} \mid x_t)]$$
• Test: likelihood ratio test
$$2[\mathcal{L}(\hat{\theta}) - \mathcal{L}(\hat{\theta}^{(0)})] = T(\log|\hat{\Sigma}_{11}(0)| - \log|\hat{\Sigma}_{11}|) \sim \chi^2(\text{number of restrictions})$$
• If $A_2 = 0$ and $B_1 = 0$, does it mean there is no relation between $y_{1t}$ and $y_{2t}$ at all? Not necessarily:
— Contemporaneous linear dependence may be present
— Geweke's test of linear dependence and decomposition of linear dependence
9 Kalman Filter
• An algorithm for sequentially updating a linear projection for the system
• Deduces the restrictions that the models of the economy and of data collection impose on the "innovation representation" of the dynamics of the variables of interest
• Using the Kalman filter we can
  — calculate exact finite-sample forecasts
  — compute the exact likelihood function for a Gaussian ARMA process
  — factorize the spectral density*
  — estimate a VAR with coefficients that change over time*
9.1 State-Space Representation
state equation: $\xi_{t+1} = F\xi_t + Cv_{t+1}$
observation equation: $y_t = A'x_t + H'\xi_t + w_t$

$\xi_t$: vector of state variables
$y_t$: vector of observed variables
$x_t$: vector of exogenous or predetermined variables (provides no information about $\xi_{t+s}$ or $v_{t+s}$)
$v_t$: vector of white-noise (or martingale-difference) shocks
$w_t$: vector of white-noise (or martingale-difference) measurement errors
$$E(v_t v'_\tau) = \begin{cases} I & t = \tau \\ 0 & \text{otherwise} \end{cases}, \qquad E(w_t w'_\tau) = \begin{cases} R & t = \tau \\ 0 & \text{otherwise} \end{cases}$$
$$E(v_t w'_\tau) = 0 \text{ for all } t \text{ and } \tau, \qquad CC' = Q$$
We need assumptions about $\xi_1$:
$$E(v_t\xi'_1) = 0, \quad E(w_t\xi'_1) = 0, \quad t = 1, 2, \ldots, T$$
In addition, assume that $\xi_1$ is a random vector with known mean and covariance matrix:
$$E(\xi_1) = \bar{\xi}_1, \qquad E[(\xi_1 - E(\xi_1))(\xi_1 - E(\xi_1))'] = \Sigma_1$$
Example of state-space representation: AR(p)
$$(y_t - \mu) = \phi_1(y_{t-1} - \mu) + \phi_2(y_{t-2} - \mu) + \ldots + \phi_p(y_{t-p} - \mu) + \varepsilon_t$$
$$\xi_t = \begin{bmatrix} y_t - \mu \\ y_{t-1} - \mu \\ \vdots \\ y_{t-p+1} - \mu \end{bmatrix}, \quad F = \begin{bmatrix} \phi_1 & \phi_2 & \ldots & \phi_{p-1} & \phi_p \\ 1 & 0 & \ldots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \ldots & 1 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} \sigma \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad v_{t+1} = \varepsilon_{t+1}/\sigma$$
$$y_t = y_t, \quad A' = \mu, \quad x_t = 1, \quad H' = \begin{bmatrix} 1 & 0 & \ldots & 0 \end{bmatrix}, \quad w_t = 0, \quad R = 0$$
MA(1)
$$y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}$$
$$\xi_t = \begin{bmatrix} \varepsilon_t \\ \varepsilon_{t-1} \end{bmatrix}, \quad F = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} \sigma \\ 0 \end{bmatrix}, \quad v_{t+1} = \varepsilon_{t+1}/\sigma$$
$$y_t = y_t, \quad A' = \mu, \quad x_t = 1, \quad H' = \begin{bmatrix} 1 & \theta \end{bmatrix}, \quad w_t = 0, \quad R = 0$$
or
$$\xi_t = \begin{bmatrix} \varepsilon_t + \theta\varepsilon_{t-1} \\ \theta\varepsilon_t \end{bmatrix}, \quad F = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} \sigma \\ \sigma\theta \end{bmatrix}, \quad v_{t+1} = \varepsilon_{t+1}/\sigma$$
$$y_t = y_t, \quad A' = \mu, \quad x_t = 1, \quad H' = \begin{bmatrix} 1 & 0 \end{bmatrix}, \quad w_t = 0, \quad R = 0$$
• Applications in finance and macroeconomics
— Real interest rate (Fama and Gibbons 1982), business cycle (Stock and Watson 1991), market expectation of inflation (Hamilton 1985), capital stock (Li 2005), etc.
— Estimation of a model specified at a finer time interval than pertains to the available data

Kalman Filter Algorithm

Observed data: $\{y_t\}_{t=1}^T$, $\{x_t\}_{t=1}^T$; we want to construct linear least-squares forecasts of $\xi_t$ and $y_t$ based on the data observed through date $t$:
$$\hat{\xi}_{t+1|t} = E(\xi_{t+1} \mid \mathbf{y}_t)$$
where $\mathbf{y}_t = (y'_t, \ldots, y'_1, x'_t, \ldots, x'_1)'$. The MSE of the forecast is
$$P_{t+1|t} = E\left[(\xi_{t+1} - \hat{\xi}_{t+1|t})(\xi_{t+1} - \hat{\xi}_{t+1|t})'\right]$$
• Idea: construct an innovation process $\tilde{y}_t$ such that $[\tilde{\mathbf{y}}_t, E(\xi_1)]$ forms an orthogonal basis for the information set $[\mathbf{y}_t, E(\xi_1)]$, and then recursively calculate the projection of $\xi_{t+1}$ on $[\tilde{\mathbf{y}}_t, E(\xi_1)]$. The orthogonal basis for $[\mathbf{y}_t, E(\xi_1)]$ is constructed using the Gram-Schmidt process.
Let $\tilde{y}_1$ be the residual from a regression of $y_1$ on $E(\xi_1) = \hat{\xi}_{1|0}$ (and $x_1$):
$$\tilde{y}_1 = y_1 - A'x_1 - H'\hat{\xi}_{1|0}$$
We can check that $E[\tilde{y}_1] = 0$, and that $[\tilde{y}_1, \hat{\xi}_{1|0}]$ and $[y_1, \hat{\xi}_{1|0}]$ span the same linear space. Note that here we use the exogeneity of $x_t$:
$$E[\xi_t \mid x_t, \mathbf{y}_{t-1}] = E[\xi_t \mid \mathbf{y}_{t-1}]$$
Next, form $\tilde{y}_2$ as the residual from a regression of $y_2$ on $[\tilde{y}_1, \hat{\xi}_{1|0}]$:
$$\tilde{y}_2 = y_2 - E(y_2 \mid \tilde{y}_1, \hat{\xi}_{1|0})$$
Then $E[\tilde{y}_2] = 0$, $E[\tilde{y}_2\tilde{y}'_1] = 0$ and $E[\tilde{y}_2\hat{\xi}'_{1|0}] = 0$; $[\tilde{\mathbf{y}}_2, \hat{\xi}_{1|0}]$ and $[\mathbf{y}_2, \hat{\xi}_{1|0}]$ span the same linear space. Continuing in this way, form
$$\tilde{y}_t = y_t - E(y_t \mid \tilde{\mathbf{y}}_{t-1}, \hat{\xi}_{1|0})$$
$\tilde{y}_t$ is the innovation representation of $y_t$, and $[\tilde{\mathbf{y}}_t, E(\xi_1)]$ forms an orthogonal basis for the information set $[\mathbf{y}_t, E(\xi_1)]$.
9.2 Kalman Filter Algorithm:
— Step 0 (starting point): if $E(\xi_1)$ and $\Sigma_1 = E[(\xi_1 - E(\xi_1))(\xi_1 - E(\xi_1))']$ are known,
$$\hat{\xi}_{1|0} = E(\xi_1), \qquad P_{1|0} = \Sigma_1$$
Otherwise, if the eigenvalues of $F$ are all inside the unit circle, then $\xi_t$ is weakly stationary, and we can solve for $E(\xi_1)$ and $\Sigma_1$ directly:
$$E(\xi_{t+1}) = FE(\xi_t) \implies E(\xi_t) = 0$$
$$\Sigma = F\Sigma F' + Q \implies \text{vec}(\Sigma) = [I - F \otimes F]^{-1}\text{vec}(Q)$$
$$\implies \hat{\xi}_{1|0} = 0, \qquad \text{vec}(P_{1|0}) = [I - F \otimes F]^{-1}\text{vec}(Q)$$
— Step 1: construct $\tilde{y}_t$ from $\hat{\xi}_{t|t-1}$ and $P_{t|t-1}$:
$$E(y_t \mid x_t, \xi_t) = A'x_t + H'\xi_t$$
$$\hat{y}_{t|t-1} \equiv E(y_t \mid x_t, \mathbf{y}_{t-1}) = A'x_t + H'E(\xi_t \mid x_t, \mathbf{y}_{t-1}) = A'x_t + H'\hat{\xi}_{t|t-1}$$
$$\tilde{y}_t = y_t - \hat{y}_{t|t-1} = H'(\xi_t - \hat{\xi}_{t|t-1}) + w_t$$
with MSE
$$E[\tilde{y}_t\tilde{y}'_t] = H'P_{t|t-1}H + R$$
— Step 2: update the inference about $\xi_t$:
$$\hat{\xi}_{t|t} = E(\xi_t \mid x_t, y_t, \mathbf{y}_{t-1}) = E(\xi_t \mid \tilde{\mathbf{y}}_t) = \hat{\xi}_{t|t-1} + \Gamma_t\tilde{y}_t$$
$$\Gamma_t = E[(\xi_t - \hat{\xi}_{t|t-1})\tilde{y}'_t]\,E(\tilde{y}_t\tilde{y}'_t)^{-1} = P_{t|t-1}H(H'P_{t|t-1}H + R)^{-1}$$
where
$$E[(\xi_t - \hat{\xi}_{t|t-1})\tilde{y}'_t] = E[(\xi_t - \hat{\xi}_{t|t-1})(H'(\xi_t - \hat{\xi}_{t|t-1}) + w_t)'] = P_{t|t-1}H$$
— Step 3: forecast $\xi_{t+1}$ based on $\tilde{\mathbf{y}}_t$:
$$\hat{\xi}_{t+1|t} = F\hat{\xi}_{t|t} = F\hat{\xi}_{t|t-1} + F(\hat{\xi}_{t|t} - \hat{\xi}_{t|t-1})$$
where
$$\hat{\xi}_{t|t} - \hat{\xi}_{t|t-1} = \Gamma_t\tilde{y}_t = \Gamma_t(y_t - A'x_t - H'\hat{\xi}_{t|t-1})$$
$$\implies \hat{\xi}_{t+1|t} = F\hat{\xi}_{t|t-1} + K_t\tilde{y}_t$$
where $K_t$ is the "Kalman gain matrix":
$$K_t = F\Gamma_t = FP_{t|t-1}H(H'P_{t|t-1}H + R)^{-1}$$
Note that
$$\hat{\xi}_{t+1|t} = F^t E(\xi_1) + \sum_{j=1}^{t}F^{t-j}K_j\tilde{y}_j$$
— Step 4: update $P_{t+1|t} = E[(\xi_{t+1} - \hat{\xi}_{t+1|t})(\xi_{t+1} - \hat{\xi}_{t+1|t})']$:
$$\xi_{t+1} - \hat{\xi}_{t+1|t} = (F - K_tH')(\xi_t - \hat{\xi}_{t|t-1}) + Cv_{t+1} - K_tw_t$$
Noting that $\tilde{y}_t = H'(\xi_t - \hat{\xi}_{t|t-1}) + w_t$, hence
$$P_{t+1|t} = (F - K_tH')P_{t|t-1}(F - K_tH')' + CC' + K_tRK'_t = FP_{t|t-1}F' - K_tH'P_{t|t-1}F' + Q$$
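Steps 1-4 can be sketched for a scalar state-space model (no exogenous $x_t$; all parameter values illustrative, data simulated):

```python
# Minimal scalar sketch of the filter recursion for
# xi_{t+1} = f xi_t + c v_{t+1}, y_t = h xi_t + w_t; parameters illustrative.
import numpy as np

rng = np.random.default_rng(4)
f, c, h, R = 0.8, 1.0, 1.0, 0.5
Q = c * c
T = 200
xi_true = np.zeros(T + 1)
y = np.zeros(T + 1)
for t in range(1, T + 1):
    xi_true[t] = f * xi_true[t - 1] + c * rng.normal()
    y[t] = h * xi_true[t] + np.sqrt(R) * rng.normal()

xi_pred = 0.0                 # xi_{1|0} = 0 (stationary start, Step 0)
P = Q / (1 - f * f)           # P_{1|0} = stationary state variance
xi_filt = np.zeros(T + 1)
for t in range(1, T + 1):
    innov = y[t] - h * xi_pred                   # Step 1: innovation y~_t
    S = h * P * h + R                            # innovation variance
    K = f * P * h / S                            # Kalman gain K_t
    xi_filt[t] = xi_pred + (P * h / S) * innov   # Step 2: xi_{t|t}
    xi_pred = f * xi_pred + K * innov            # Step 3: xi_{t+1|t}
    P = f * P * f - K * h * P * f + Q            # Step 4: Riccati update
```

The filtered estimates track the unobserved state closely, and `P` converges to the steady-state value discussed in the convergence results below.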
9.3 Innovation Representation
$$\hat{\xi}_{t+1|t} = F\hat{\xi}_{t|t-1} + K_t\tilde{y}_t$$
$$y_t = A'x_t + H'\hat{\xi}_{t|t-1} + \tilde{y}_t$$
$$E[\tilde{y}_t\tilde{y}'_t] = H'P_{t|t-1}H + R$$
is a time-varying innovation representation of the original state-space representation, starting from the initial conditions $\hat{\xi}_{1|0}$ and $P_{1|0}$.
Or:
$$\tilde{y}_t = y_t - A'x_t - H'\hat{\xi}_{t|t-1}$$
$$\hat{\xi}_{t+1|t} = F\hat{\xi}_{t|t-1} + K_t\tilde{y}_t$$
recursively filters out a record of innovations $\{\tilde{y}_t\}_{t=1}^T$ from $\hat{\xi}_{1|0}$ and $\{y_t\}_{t=1}^T$. This is called a "whitening filter", which transforms the serially correlated process $y_t$ into a serially uncorrelated (i.e. "white") process $\{\tilde{y}_t\}_{t=1}^T$.
$\{\tilde{y}_t\}_{t=1}^T$ is called a fundamental white noise for the $\{y_t\}_{t=1}^T$ process.
9.4 Convergence Results
• If $F$ has eigenvalues inside the unit circle, and $Q$ and $R$ are positive semidefinite symmetric matrices (at least one strictly positive definite), then $\{P_{t+1|t}\}$ is a monotonically nonincreasing sequence that converges as $t \to \infty$ to a steady-state matrix $P$ (which is unique), with
$$P = F\left[P - PH(H'PH + R)^{-1}H'P\right]F' + Q$$
and the steady-state value of the Kalman gain matrix
$$K = FPH(H'PH + R)^{-1}$$
has the property that the eigenvalues of $(F - KH')$ all lie on or inside the unit circle.
• Use the Kalman filter to find the Wold decomposition:
$$\hat{\xi}_{t+1|t} = F\hat{\xi}_{t|t-1} + K(y_t - H'\hat{\xi}_{t|t-1}) = (F - KH')\hat{\xi}_{t|t-1} + Ky_t$$
$$\implies \hat{\xi}_{t+1|t} = [I - (F - KH')L]^{-1}Ky_t$$
$$\hat{y}_{t+1|t} = H'\hat{\xi}_{t+1|t} = H'[I - (F - KH')L]^{-1}Ky_t$$
$$\tilde{y}_{t+1} = y_{t+1} - \hat{y}_{t+1|t} = \left\{I - H'[I - (F - KH')L]^{-1}KL\right\}y_{t+1}$$
$$\implies y_{t+1} = \left\{I - H'[I - (F - KH')L]^{-1}KL\right\}^{-1}\tilde{y}_{t+1} = \left\{I + H'[I - FL]^{-1}KL\right\}\tilde{y}_{t+1}$$
9.5 Serially Correlated Measurement Errors
If the measurement error $w_t$ is serially correlated:
state equation: $\xi_{t+1} = F\xi_t + Cv_{t+1}$
observation equation: $y_t = H'\xi_t + w_t$
$$w_t = Dw_{t-1} + \eta_t, \qquad E(\eta_t\eta'_t) = R, \qquad E[v_t\eta'_s] = 0 \text{ for all } t \text{ and } s$$
Idea: transform $y_t$ to $\bar{y}_t$ such that the corresponding measurement error is serially uncorrelated. Define
$$\bar{y}_t = y_{t+1} - Dy_t = H'\xi_{t+1} + w_{t+1} - DH'\xi_t - Dw_t = (H'F - DH')\xi_t + H'Cv_{t+1} + \eta_{t+1}$$
Thus $\{\xi_t, \bar{y}_t\}$ is governed by the state-space system
state equation: $\xi_{t+1} = F\xi_t + Cv_{t+1}$
observation equation: $\bar{y}_t = \bar{H}'\xi_t + \bar{w}_t$
where $\bar{H}' = H'F - DH'$, and $\bar{w}_t = H'Cv_{t+1} + \eta_{t+1}$ is the new "measurement noise", which is contemporaneously correlated with $v_{t+1}$ but not serially correlated:
$$E(Cv_{t+1}\bar{w}'_t) = CC'H = QH, \qquad E(\bar{w}_t\bar{w}'_t) = H'QH + R$$
We can do the same thing to find the innovation representation $u_t$ of $\bar{y}_t$:
$$\hat{\xi}_{t+1|t} = F\hat{\xi}_{t|t-1} + K_tu_t, \qquad \bar{y}_t = \bar{H}'\hat{\xi}_{t|t-1} + u_t$$
The only thing we need to change is
$$E(u_tu'_t) = \bar{H}'P_{t|t-1}\bar{H} + (H'QH + R)$$
$$P_{t+1|t} = (F - K_t\bar{H}')P_{t|t-1}(F - K_t\bar{H}')' + Q + K_t(H'QH + R)K'_t - QHK'_t$$
$$= FP_{t|t-1}F' - K_t\bar{H}'P_{t|t-1}F' + Q + K_tH'QHK'_t - QHK'_t$$
For the original process $\{y_t\}$, an alternative state-space representation is given by combining the innovation representation of $\bar{y}_t$ with
$$y_{t+1} = Dy_t + \bar{y}_t$$
hence
state equation:
$$\begin{bmatrix} \hat{\xi}_{t+1|t} \\ y_{t+1} \end{bmatrix} = \begin{bmatrix} F & 0 \\ \bar{H}' & D \end{bmatrix}\begin{bmatrix} \hat{\xi}_{t|t-1} \\ y_t \end{bmatrix} + \begin{bmatrix} K_t \\ I \end{bmatrix}u_t$$
observation equation:
$$y_t = \begin{bmatrix} 0 & I \end{bmatrix}\begin{bmatrix} \hat{\xi}_{t|t-1} \\ y_t \end{bmatrix} + [0]u_t$$
9.6 MLE estimation of the parameters
• Let $\theta$ be the vector of parameters in $F, H, A, Q, R$; the estimates are obtained by maximizing the log-likelihood of $\theta$ given $\{y_t\}_{t=1}^T$:
$$f(y_T, y_{T-1}, \ldots, y_1; \theta) = f_T(y_T \mid y_{T-1}, \ldots, y_1)f_{T-1}(y_{T-1} \mid y_{T-2}, \ldots, y_1)\cdots f_1(y_1)$$
$$y_1 \sim N(H'\hat{\xi}_{1|0},\ H'P_{1|0}H + R), \qquad y_t \mid y_{t-1}, \ldots, y_1 \sim N(H'\hat{\xi}_{t|t-1},\ H'P_{t|t-1}H + R)$$
On the other hand,
$$\tilde{y}_t \sim N(0,\ H'P_{t|t-1}H + R)$$
so if $\xi_t$ is stationary with $E(\xi_1) = 0$, then $f_1(y_1) = g_1(\tilde{y}_1)$, and since
$$y_t = H'\hat{\xi}_{t|t-1} + \tilde{y}_t$$
we have
$$f_t(y_t \mid y_{t-1}, \ldots, y_1) = g_t(\tilde{y}_t)$$
Hence the log-likelihood of $\mathbf{y}_T$ is
$$\sum_{t=1}^{T}\log g_t(\tilde{y}_t) = -\frac{1}{2}\sum_{t=1}^{T}\left[n\log(2\pi) + \log|H'P_{t|t-1}H + R| + \tilde{y}'_t(H'P_{t|t-1}H + R)^{-1}\tilde{y}_t\right]$$
• Initialization:
  — Stationary process: $\hat{\xi}_{1|0} = E(\xi_1)$ and $P_{1|0} = \Sigma_1$
  — *Nonstationary process: put a diffuse prior on $\xi_1$
• Identification: without restrictions on $F, H, A, Q, R$, the parameters of the state-space representation are unidentified. Example:
$$\xi_{t+1} = \begin{bmatrix} \varepsilon_{1,t+1} \\ \varepsilon_{2,t+1} \end{bmatrix}, \qquad y_t = \varepsilon_{1t} + \varepsilon_{2t}$$
— Global identification at $\theta_0$: for any other $\theta$, there exists $\mathbf{y}_T$ such that $f(\mathbf{y}_T; \theta) \neq f(\mathbf{y}_T; \theta_0)$
— Local identification at $\theta_0$: the information matrix is nonsingular in a neighborhood of $\theta_0$
9.7 Smoothing
We want to form an inference about the state variables based not only on historical data but on the entire available data set, that is, $\hat{\xi}_{t|T} = E(\xi_t \mid \mathbf{y}_T)$.
Step 1:
$$E(\xi_t \mid \xi_{t+1}, \mathbf{y}_t) = \hat{\xi}_{t|t} + J_t(\xi_{t+1} - \hat{\xi}_{t+1|t})$$
$$J_t = E[(\xi_t - \hat{\xi}_{t|t})(\xi_{t+1} - \hat{\xi}_{t+1|t})'] \times E[(\xi_{t+1} - \hat{\xi}_{t+1|t})(\xi_{t+1} - \hat{\xi}_{t+1|t})']^{-1} = P_{t|t}F'P_{t+1|t}^{-1}$$
Step 2:
$$E(\xi_t \mid \xi_{t+1}, \mathbf{y}_T) = E(\xi_t \mid \xi_{t+1}, \mathbf{y}_t) = \hat{\xi}_{t|t} + J_t(\xi_{t+1} - \hat{\xi}_{t+1|t})$$
Step 3:
$$\hat{\xi}_{t|T} = E(\xi_t \mid \mathbf{y}_T) = \hat{\xi}_{t|t} + J_t(\hat{\xi}_{t+1|T} - \hat{\xi}_{t+1|t})$$
Step 4:
$$P_{t|T} = P_{t|t} + J_t(P_{t+1|T} - P_{t+1|t})J'_t$$
To summarize: the smoothed sequence is generated by a backward recursion, starting from $\hat{\xi}_{T|T}$ and $P_{T|T}$.
9.8 Statistical Inference with the Kalman Filter
We assumed that the true value of $\theta$ is used to construct $\hat{\xi}$ and $P$, but in practice we use estimates of $\theta$ instead:
$$E[(\xi_t - \hat{\xi}_{t|T}(\hat{\theta}))(\xi_t - \hat{\xi}_{t|T}(\hat{\theta}))' \mid \mathbf{y}_T] = \underbrace{E[(\xi_t - \hat{\xi}_{t|T}(\theta_0))(\xi_t - \hat{\xi}_{t|T}(\theta_0))' \mid \mathbf{y}_T]}_{\text{filter uncertainty}} + \underbrace{E[(\hat{\xi}_{t|T}(\theta_0) - \hat{\xi}_{t|T}(\hat{\theta}))(\hat{\xi}_{t|T}(\theta_0) - \hat{\xi}_{t|T}(\hat{\theta}))' \mid \mathbf{y}_T]}_{\text{parameter uncertainty}}$$
To measure these two parts of uncertainty we can use Monte Carlo simulation:
$$\theta \mid \mathbf{y}_T \sim N\left(\hat{\theta}, \frac{1}{T}\hat{I}^{-1}\right)$$
Take $M$ draws $\theta^{(j)}$ from this distribution, and calculate
$$\frac{1}{M}\sum_{j=1}^{M}\left[(\hat{\xi}_{t|T}(\theta^{(j)}) - \hat{\xi}_{t|T}(\hat{\theta}))(\hat{\xi}_{t|T}(\theta^{(j)}) - \hat{\xi}_{t|T}(\hat{\theta}))' \mid \mathbf{y}_T\right]$$
to estimate the "parameter uncertainty", and use
$$\frac{1}{M}\sum_{j=1}^{M}P_{t|T}(\theta^{(j)})$$
to estimate the "filter uncertainty"; the sum of the two estimates the MSE of $\hat{\xi}_{t|T}(\hat{\theta})$ around the true value of $\xi_t$.
9.9 Applications
• Aggregation over time
— Averaging over time:
$$\xi_{t+1} = F\xi_t + v_{t+1}, \quad t = 0, 1, 2, \ldots, \qquad y_t = H'\xi_t$$
Expand the state space by including enough lags, $X_t = [\xi_t\ \xi_{t-1}, \ldots, \xi_{t-p}]$:
$$X_{t+1} = FX_t + Cw_{t+1}, \qquad y_t = GX_t$$
— Skip sampling: the data are sampled every $\tau > 0$ periods:
$$\xi_{t+\tau} = F_\tau\xi_t + v^\tau_{t+\tau}, \quad t = 0, \tau, 2\tau, \ldots, \qquad y_t = H'\xi_t$$
where
$$F_\tau = F^\tau, \qquad v^\tau_{t+\tau} = F^{\tau-1}Cw_{t+1} + F^{\tau-2}Cw_{t+2} + \ldots + Cw_{t+\tau}$$
$$E[v^\tau_{t+\tau}v^{\tau\prime}_{t+\tau}] = CC' + FCC'F' + \ldots + F^{\tau-1}CC'(F^{\tau-1})'$$
represented in a state-space system as
$$\xi_{s+1} = F_\tau\xi_s + v^\tau_{s+1}, \quad s = 0, 1, 2, \ldots, \qquad y_s = H'\xi_s$$
• Estimating the dynamics of unobserved variables in the economy: the real interest rate (Fama and Gibbons 1982), the business cycle (Stock and Watson 1991), market expectations of inflation (Hamilton 1985), the intangible capital stock (Li 2005), etc.