Time Series Analysis, Lecture 2, 2019
Contents

1 Wold's Decomposition
2 VAR
3 MLE and Hypothesis Testing for VAR
4 Estimating the Effects of Shocks to the Economy
5 Identification Problem
6 Variance Decomposition
7 Standard Error for Impulse Response Functions
  7.1 Confidence Intervals and the Bootstrap
  7.2 VAR Diagnostics
8 Granger Causality
9 Kalman Filter
  9.1 State-Space Representation
  9.2 Kalman Filter Algorithm
  9.3 Innovation Representation
  9.4 Convergence Results
  9.5 Serially Correlated Measurement Errors
  9.6 MLE Estimation of the Parameters
  9.7 Smoothing
  9.8 Statistical Inference with the Kalman Filter
  9.9 Applications

Nan Li, Department of Finance, ACEM, SJTU
1 Wold’s Decomposition
Every stationary ARMA model can be written in the form
$$x_t = \sum_{j=0}^{\infty} \theta_j \varepsilon_{t-j}$$
where $\varepsilon_t$ is the white-noise error one would make in forecasting $x_t$ as a linear function of lagged $x_t$, and where the $\theta_j$ are square summable with $\theta_0 = 1$.
• Wold's Decomposition Theorem says this result is in fact fundamental for any covariance-stationary time series, not just stationary ARMAs!
Theorem 1 (Wold's Decomposition) Any zero-mean covariance-stationary process $x_t$ can be represented in the form
$$x_t = \sum_{j=0}^{\infty} \theta_j \varepsilon_{t-j} + \eta_t$$
where

1. $\theta_0 = 1$ and $\sum_{j=0}^{\infty} \theta_j^2 < \infty$,
2. $\varepsilon_t$ is white noise and $\varepsilon_t = x_t - E(x_t \mid x_{t-1}, x_{t-2}, \ldots)$,
3. all the roots of $\theta(L)$ are on or outside the unit circle, i.e. (unless the process has a unit root) the MA polynomial is invertible,
4. the value $\eta_t$ is uncorrelated with $\varepsilon_{t-j}$ for any $j$ and is linearly deterministic, i.e. $\eta_t = E(\eta_t \mid x_{t-1}, x_{t-2}, \ldots)$.
Remark 1 $\sum_{j=0}^{\infty} \theta_j \varepsilon_{t-j}$ is called the linearly indeterministic component; if $\eta_t = 0$ the process is called purely linearly indeterministic (linearly regular).
Remark 2 $E(\varepsilon_t \mid x_{t-1}, x_{t-2}, \ldots) = 0$.
Remark 3 The $\theta_j$ and $\varepsilon_t$ are unique. Idea of proof: rewrite $x_t$ as a sum of its forecast errors.
Remark 4 Extension to nonstationary time series: same as above, except that $\eta_t$ is a linear combination of its own past (not necessarily deterministic).
Remark 5 $\varepsilon_t$ need NOT be normally distributed, nor i.i.d.
Remark 6 $E(x_t \mid x_{t-1}, x_{t-2}, \ldots) \neq E[x_t \mid x_{t-1}, x_{t-2}, \ldots]$ in general: the Wold forecast is the linear projection, not the (possibly nonlinear) conditional expectation.
Remark 7 $\varepsilon_t$ need not be the true structural shock.
Remark 8 Wold's decomposition is the unique linear representation in which the shocks are linear forecast errors; this does not hold for nonlinear representations.
Example 1 (Non-invertible shocks)
$$x_t = \eta_t + 2\eta_{t-1}, \quad \eta_t \text{ i.i.d.}, \ \sigma^2_\eta = 1$$
$x_t$ is stationary, but the MA polynomial is not invertible, hence $\eta_t$ cannot be expressed as a forecast error of $x_t$.
Solution: any MA($\infty$) can be expressed as an invertible MA($\infty$), which is unique; its shocks are said to be the fundamental innovations of $x_t$.
• The Wold MA($\infty$) representation is the fundamental representation: if two time series have the same Wold representation, they are the same time series up to second moments/linear forecast errors.
2 VAR
• Proposed by Chris Sims in the 1970s and 1980s
• Major subsequent contributions by others (Bernanke, Blanchard-Watson, Blanchard-Quah)
• Useful for organizing data
  — VARs serve as a "battleground" between alternative economic theories
  — VARs can be used to quantitatively evaluate a particular model
• Question that can (in principle) be addressed by a VAR:
  — How does the economy respond to a particular shock?
  — The answer can be useful:
    ∗ for discriminating between models
    ∗ for estimating parameters of a given model
• By themselves, VARs can't actually address such a question:
  — Identification problem
  — Need extra assumptions: structural VAR (SVAR)
3 MLE and Hypothesis Testing for VAR
1. The conditional likelihood function for a vector autoregression
$$Y_t = c + \Phi_1 Y_{t-1} + \Phi_2 Y_{t-2} + \ldots + \Phi_p Y_{t-p} + \varepsilon_t, \quad \varepsilon_t \sim \text{i.i.d. } N(0, \Sigma)$$
Given the initial $p$ observations, the conditional likelihood of $\theta = (c, \Phi_1, \Phi_2, \ldots, \Phi_p, \Sigma)'$ is
$$f(Y_T, Y_{T-1}, \ldots, Y_1 \mid Y_0, Y_{-1}, \ldots, Y_{-p+1}; \theta) = \prod_{t=1}^{T} f(Y_t \mid Y_{t-1}, Y_{t-2}, \ldots, Y_{-p+1}; \theta)$$
and
$$Y_t \mid Y_{t-1}, Y_{t-2}, \ldots, Y_{-p+1} \sim N(\Pi' X_t, \Sigma)$$
where
$$X_t = \begin{bmatrix} 1 & Y'_{t-1} & Y'_{t-2} & \ldots & Y'_{t-p} \end{bmatrix}'_{(np+1)\times 1}, \qquad \Pi' = \begin{bmatrix} c & \Phi_1 & \Phi_2 & \ldots & \Phi_p \end{bmatrix}_{n \times (np+1)}$$
Hence
$$\mathcal{L}(\theta) = -\frac{Tn}{2}\log(2\pi) + \frac{T}{2}\log|\Sigma^{-1}| - \frac{1}{2}\sum_{t=1}^{T}(Y_t - \Pi' X_t)'\Sigma^{-1}(Y_t - \Pi' X_t)$$

2. Maximum likelihood estimates of $\Pi$ and $\Sigma$:
$$\hat{\Pi}' = \left(\sum_{t=1}^{T} Y_t X'_t\right)\left(\sum_{t=1}^{T} X_t X'_t\right)^{-1}$$
which is the same as OLS regression equation-by-equation, and
$$\hat{\Sigma} = \frac{1}{T}\sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}'_t, \qquad \hat{\varepsilon}_t = Y_t - \hat{\Pi}' X_t$$
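As a minimal numerical sketch of the estimator above (simulated data; the true coefficient matrix and all variable names are illustrative), the MLE of $\Pi$ is just OLS equation-by-equation:

```python
# Sketch: conditional MLE of a VAR(1) = OLS equation-by-equation.
# Data are simulated; the true Phi_1 and Sigma = I are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
T, n = 500, 2
Phi1 = np.array([[0.5, 0.1],
                 [0.0, 0.3]])
Y = np.zeros((T + 1, n))
for t in range(1, T + 1):
    Y[t] = Phi1 @ Y[t - 1] + rng.normal(size=n)

# X_t = [1, Y_{t-1}']' stacked over t; Pi_hat solves the OLS normal equations
X = np.column_stack([np.ones(T), Y[:-1]])     # T x (np+1)
Yt = Y[1:]                                    # T x n
Pi_hat = np.linalg.solve(X.T @ X, X.T @ Yt)   # (np+1) x n, one column per equation
eps_hat = Yt - X @ Pi_hat
Sigma_hat = eps_hat.T @ eps_hat / T           # MLE of Sigma (divide by T, not T-k)
```

Each column of `Pi_hat` is exactly what a single-equation OLS regression of that variable on a constant and all lags would give.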
3. Likelihood ratio test
$$\mathcal{L}(\hat{\Sigma}, \hat{\Pi}) = -\frac{Tn}{2}\log(2\pi) + \frac{T}{2}\log|\hat{\Sigma}^{-1}| - \frac{1}{2}\sum_{t=1}^{T}(Y_t - \hat{\Pi}' X_t)'\hat{\Sigma}^{-1}(Y_t - \hat{\Pi}' X_t)$$
$$= -\frac{Tn}{2}\log(2\pi) + \frac{T}{2}\log|\hat{\Sigma}^{-1}| - \frac{Tn}{2}$$
$$2(\mathcal{L}_1 - \mathcal{L}_0) = T(\log|\hat{\Sigma}_0| - \log|\hat{\Sigma}_1|) \sim \chi^2(l)$$
where $l$ is the number of restrictions; for example, in a test of $p_1$ ($H_1$) vs. $p_0$ ($H_0$) lags ($p_1 > p_0$) in an $n$-variable VAR, $l = n^2(p_1 - p_0)$.

Modified likelihood ratio test for small-sample bias (Sims 1980):
$$(T - k)(\log|\hat{\Sigma}_0| - \log|\hat{\Sigma}_1|) \sim \chi^2(l), \quad k = 1 + np_1$$
which is less likely to reject the null hypothesis in small samples.

4. Asymptotic distribution of $\hat{\Pi}$ and $\hat{\Sigma}$: both are consistent, and
$$\begin{bmatrix} \sqrt{T}(\text{vec}(\hat{\Pi}_T) - \text{vec}(\Pi)) \\ \sqrt{T}(\text{vech}(\hat{\Sigma}_T) - \text{vech}(\Sigma)) \end{bmatrix} \xrightarrow{L} N\left(0, \begin{bmatrix} \Sigma \otimes Q^{-1} & 0 \\ 0 & 2D^+_n(\Sigma \otimes \Sigma)(D^+_n)' \end{bmatrix}\right)$$
where $Q = E(X_t X'_t)$, $D_n$ is the unique matrix s.t. $D_n\text{vech}(\Sigma) = \text{vec}(\Sigma)$, and $D^+_n = (D'_n D_n)^{-1}D'_n$, so that $D^+_n D_n = I$.
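The lag-length LR test above can be sketched as follows (simulated bivariate data whose true model is a VAR(1); all numbers are illustrative, and both fits condition on the same sample):

```python
# Illustrative sketch: LR test of p1 = 2 vs. p0 = 1 lags in a bivariate VAR,
# with Sims's small-sample correction (T - k in place of T).
import numpy as np

rng = np.random.default_rng(1)
T, n = 400, 2
A = np.array([[0.4, 0.1],
              [0.2, 0.3]])                  # true model is a VAR(1)
y = np.zeros((T + 2, n))
for t in range(1, T + 2):
    y[t] = A @ y[t - 1] + rng.normal(size=n)

def resid_cov(y, p, start):
    """OLS fit of a VAR(p) on y[start:]; returns the MLE residual covariance."""
    Yt = y[start:]
    Teff = len(Yt)
    X = np.column_stack([np.ones(Teff)] +
                        [y[start - j - 1:len(y) - j - 1] for j in range(p)])
    Pi = np.linalg.lstsq(X, Yt, rcond=None)[0]
    e = Yt - X @ Pi
    return e.T @ e / Teff

p0, p1 = 1, 2
S0 = resid_cov(y, p0, start=p1)             # restricted fit (same sample)
S1 = resid_cov(y, p1, start=p1)             # unrestricted fit
Teff = len(y) - p1
k = 1 + n * p1                              # parameters per equation
stat = (Teff - k) * (np.log(np.linalg.det(S0)) - np.log(np.linalg.det(S1)))
df = n ** 2 * (p1 - p0)                     # chi-square degrees of freedom
```

The statistic is compared with a $\chi^2(df)$ critical value; since the true process here is a VAR(1), `stat` should typically be small.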
5. Wald test of a general hypothesis of the form $R\,\text{vec}(\Pi) = r$:
$$\sqrt{T}(R\,\text{vec}(\hat{\Pi}_T) - r) \xrightarrow{L} N(0,\ R(\hat{\Sigma}_T \otimes \hat{Q}_T^{-1})R')$$
hence
$$T(R\,\text{vec}(\hat{\Pi}_T) - r)'\left[R(\hat{\Sigma}_T \otimes \hat{Q}_T^{-1})R'\right]^{-1}(R\,\text{vec}(\hat{\Pi}_T) - r) \sim \chi^2(m)$$
where $m$ is the number of restrictions (rows of $R$).
4 Estimating the Effects of Shocks to the Economy
• Vector autoregression for an $N \times 1$ vector of observed variables:
$$X_t = A_1 X_{t-1} + \ldots + A_p X_{t-p} + \varepsilon_t, \qquad E\varepsilon_t\varepsilon'_t = V$$
• The $A$'s, $\varepsilon$, and $V$ can easily be obtained by OLS
• Problem: $\varepsilon$ consists of statistical innovations
  — We want impulse response functions to fundamental economic shocks $\eta_t$:
$$\varepsilon_t = C\eta_t, \qquad E\eta_t\eta'_t = I, \qquad CC' = V$$
  — Impulse response to the $i$th shock (with $C_i$ the $i$th column of $C$):
$$X_t - E_{t-1}X_t = C_i\eta_{it}$$
$$E_t X_{t+1} - E_{t-1}X_t = A_1 C_i\eta_{it}$$
5 Identification Problem
• We know the $A$'s and $V$; we need to get $C$
• Identification problem: not enough restrictions to pin down $C$
  — $N^2$ unknown elements in $C$
  — Only $N(N+1)/2$ equations in $CC' = V$
  — Need more identifying restrictions!
  — Ambiguity of the impulse response function for a VAR (or VMA):
$$\text{VAR}: \ A(L)X_t = \varepsilon_t, \quad A(0) = I, \quad E(\varepsilon_t\varepsilon'_t) = \Sigma$$
$$\text{VMA}: \ X_t = B(L)\varepsilon_t, \quad B(0) = I, \quad E(\varepsilon_t\varepsilon'_t) = \Sigma$$
where $B(L) = A(L)^{-1}$. If $\Sigma$ is not diagonal, the system is in general unidentified: the shocks and impulse responses are not identified. To show this, for any full-rank $Q$ such that $QQ' = I$, we have
$$X_t = B(L)\varepsilon_t = \tilde{B}(L)\eta_t$$
where $\tilde{B}(L) = B(L)Q^{-1}$ and $\eta_t = Q\varepsilon_t$. Hence $B(L)\varepsilon_t$ and $\tilde{B}(L)\eta_t$ are observationally equivalent but have different impulse responses.
• Orthogonalization assumptions:

1. Sims orthogonalization: $B(0)$ is lower triangular and $E(\eta_t\eta'_t) = I$
(a) The first variable is affected only by its own shock contemporaneously; the other variable absorbs all the contemporaneous correlation between the additional shocks and the first shock. Note that in the original system $B(0) = I$ (or $A(0) = I$) restricts each shock to affect only its own variable contemporaneously, which is inconsistent with orthogonal shocks unless $\Sigma$ is diagonal.
(b) In terms of the MA representation,
$$\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} = B(L)\eta_t = \begin{bmatrix} B^0_{11} & 0 \\ B^0_{21} & B^0_{22} \end{bmatrix}\begin{bmatrix} \eta_{1t} \\ \eta_{2t} \end{bmatrix} + B^1\eta_{t-1} + \ldots$$
(c) In terms of the AR representation, $A(L) = B(L)^{-1}$; $B(0)$ lower triangular implies that $A(0)$ is lower triangular, or
$$A^0_{11}x_{1t} = -A^1_{11}x_{1t-1} - A^1_{12}x_{2t-1} + \ldots + \eta_{1t}$$
$$A^0_{22}x_{2t} = -A^0_{21}x_{1t} - A^1_{21}x_{1t-1} - A^1_{22}x_{2t-1} + \ldots + \eta_{2t}$$
that is, estimate the system by OLS with contemporaneous $x_{1t}$ in the $x_{2t}$ equation, but not vice versa.
• Homework: show that the OLS residuals $\eta_{1t}$ and $\eta_{2t}$ are uncorrelated.
(d) How to find $A^0$ or $B^0$? A Cholesky decomposition does the job.
• Example:
$$X_t = AX_{t-1} + \varepsilon_t, \qquad E(\varepsilon_t\varepsilon'_t) = \Sigma$$
Let $\eta_t = C\varepsilon_t$; then $C$ should satisfy $E(\eta_t\eta'_t) = C\Sigma C' = I$, with $C$ lower triangular. The Cholesky decomposition of $\Sigma$ gives us $C^{-1}$.

Note: if $\Sigma$ is Hermitian (symmetric) and positive definite, then $\Sigma$ can be decomposed as $\Sigma = PP^*$, where $P$ is a lower triangular matrix with strictly positive diagonal entries and $P^*$ denotes the (conjugate) transpose of $P$. This is the Cholesky decomposition (`chol` in Matlab).
(e) Order of variables in VAR matters for interpretation of IRs, ideally determinedby economic theory
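A minimal sketch of this Cholesky step (the covariance matrix and VAR coefficients below are illustrative, not estimates from data):

```python
# Sketch of Sims orthogonalization via Cholesky decomposition.
import numpy as np

Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])             # illustrative innovation covariance
P = np.linalg.cholesky(Sigma)              # lower triangular, P P' = Sigma
C = np.linalg.inv(P)                       # in the notes' notation chol(Sigma) = C^{-1},
                                           # so eta_t = C eps_t has E(eta eta') = I

# Orthogonalized impulse responses for an illustrative VAR(1) X_t = A X_{t-1} + eps_t:
A = np.array([[0.5, 0.1],
              [0.0, 0.3]])
irf = [np.linalg.matrix_power(A, h) @ P for h in range(4)]  # responses to unit eta shocks
```

The impact matrix `irf[0] = P` is lower triangular, so the second variable's shock has no contemporaneous effect on the first, exactly the recursive ordering described above.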
2. Example: recursiveness assumption
(a) Fed's policy rule:
$$R_t = f(\Omega_t) + e^R_t$$
where $f$ is a linear function, $\Omega_t$ is the set of variables the Fed looks at, and $e^R_t$ is the time-$t$ policy shock.
(b) What does this rule represent?
• Literal interpretation: the structural policy rule of the central bank
• A combination of the structural rule and other "stuff" (see Clarida-Gertler)
$$\text{True policy rule: } R_t = \alpha E[X_{t+1} \mid F_t] + e^R_t = f(z_t) + e^R_t$$
where $z_t$ is all the time-$t$ data that generate the information set $F_t$ in $E(\cdot \mid F_t)$
(c) What is a monetary policy shock?
• Shocks to the preferences of the monetary authority
• Strategic considerations can lead to exogenous variation in policy (self-fulfilling expectation traps in Albanesi, Chari and Christiano)
• Technical factors like measurement error (Bernanke and Mihov)
(d) Problem: not enough assumptions to identify $e^R_t$
• Assume:
  — policy shocks $e^R_t$ are orthogonal to $\Omega_t$
  — $\Omega_t$ contains current prices, wages, aggregate quantities, and lagged variables
• Economic content of this assumption:
  — The Fed sees prices and output when it makes its choice of $R_t$
  — Prices and output don't respond at time $t$ to $e^R_t$
• The response of other variables can be obtained by regressing them on current and lagged $e^R_t$
• In the VAR:
$$A(L)X_t = \varepsilon_t, \quad A(L) = I - A_1 L - A_2 L^2 - \ldots - A_p L^p, \quad \varepsilon_t = C\eta_t, \quad CC' = \Sigma$$
To think about the recursiveness assumption, it is convenient to work with
$$A_0 = C^{-1}, \qquad A_0^{-1}A_0^{-1\prime} = \Sigma, \qquad \bar{A}(L) = A_0 A(L)$$
The recursiveness assumption is then represented as
$$X_t = \begin{bmatrix} X_{1t} \\ R_t \\ X_{2t} \end{bmatrix} \quad \text{and} \quad A_0 = \begin{bmatrix} A_{11} & 0 & 0 \\ \vec{a}_{21} & a_{22} & 0 \\ A_{31} & \vec{a}_{32} & A_{33} \end{bmatrix} \quad (**)$$
where $R_t$ is the interest rate (the middle equation is the policy rule), $X_{1t}$ contains the $k_1$ variables whose current and lagged values do appear in the policy rule, and $X_{2t}$ contains the $k_2$ variables whose current values do not appear in the policy rule.
(e) Zero restrictions on $A_0$ implied by the recursiveness assumption:
  — Zeros in the middle row: current values of $X_{2t}$ do not appear in the policy rule
  — Zeros in the first block of rows ensure that the monetary policy shock does not affect $X_{1t}$: the first block of zeros prevents a direct effect via $R_t$; the second block of zeros prevents an indirect effect via $X_{2t}$
(f) There are many $A_0$ which satisfy the zero restrictions and
$$A_0^{-1}A_0^{-1\prime} = \Sigma \quad (*)$$
• One normalization: lower triangular $A_0$ with positive diagonal elements
• $A_0^{-1}$ is then the lower triangular Cholesky decomposition of $\Sigma$
(g) Proposition:
• All $A_0$ matrices that satisfy (*) and the zero restrictions imply the same value for the column of $A_0^{-1}$ corresponding to $e^R_t$, so we can work with the lower triangular Cholesky decomposition of $\Sigma$ without loss of generality
• If we change the ordering of the variables within $X_{1t}$ and $X_{2t}$, but always pick the lower triangular Cholesky decomposition of $\Sigma$, the dynamic impulse responses to $e^R_t$ are unaffected.
3. Blanchard-Quah orthogonalization (long-run identification):
(a) Restrict the long-run response of one variable to the other shock to be zero, i.e. $B(1)$ to be lower triangular:
$$X_t = B(L)\eta_t, \quad E(\eta_t\eta'_t) = I, \qquad \sum_{j=0}^{\infty}\frac{\partial X_{t+j}}{\partial \eta_t} = B(1)$$
(b) Why do we care? For a system specified in changes,
$$\Delta X_t = B(L)\eta_t, \qquad \lim_{j\to\infty}\frac{\partial X_{t+j}}{\partial \eta_t} = \sum_{j=0}^{\infty}B_j = B(1)$$
$B(1)$ gives the (limiting) long-run response of the level of $X_t$ to $\eta$ shocks.
(c) E.g., in a DSGE model, the technology shock is the only shock that has a long-run impact on the level of labor productivity; in the long-run risk model, only the "permanent shock" has a long-run impact on the level of consumption and dividends; "demand shocks" have no long-run effect on GNP
• There are two types of technology shocks: neutral and capital-embodied
$$Y_t = Z_{1t}F(K_t, L_t), \qquad K_{t+1} = (1-\delta)K_t + Z_{2t}I_t$$
• These are the only shocks that can affect the long-run log level of labor productivity
• The only shock which also has a long-run effect on the relative price of capital is a capital-embodied technology shock ($Z_{2t}$)
• These identification strategies require that the variables in the VAR be covariance stationary
• Advantage of this approach:
  — No need for the usual assumptions required to construct Solow-residual-based measures of technology shocks, such as a functional form for the production function and corrections for labor hoarding, capital utilization, and time-varying markups
• Disadvantage: some models don't satisfy the identification assumption
  — Endogenous growth models where all shocks affect productivity in the long run
  — Standard models when there are permanent shocks to the tax rate on capital income
• Reference: Francis, Owyang and Theodorou (2003)
(d) Implementation: suppose you estimate the system by OLS and get $\hat{A}$ and $\hat{\Sigma}$:
$$X_t = A_1 X_{t-1} + \ldots + A_p X_{t-p} + \varepsilon_t$$
Let $\eta_t = C^{-1}\varepsilon_t$, such that $E(\eta_t\eta'_t) = I$ and
$$X_t = A_1 X_{t-1} + \ldots + A_p X_{t-p} + C\eta_t$$
Define $B(L) = A(L)^{-1} = (I - A_1 L - A_2 L^2 - \ldots - A_p L^p)^{-1}$; then
$$\sum_{j=0}^{\infty}\frac{\partial X_{t+j}}{\partial \eta_t} = B(1)C = A(1)^{-1}C$$
$C$ should satisfy the following restrictions:
• (exclusion restriction) $B(1)C$ is lower triangular
• $CC' = \Sigma$
• (sign restriction) the (1,1) element of $B(1)C$ is positive
Solution: get the Cholesky decomposition $B(1)\Sigma B(1)' = PP'$, and let $C = B(1)^{-1}P$.
(e) In particular, for a VAR(1),
$$X_t = AX_{t-1} + C\eta_t, \qquad \sum_{j=0}^{\infty}\frac{\partial X_{t+j}}{\partial \eta_t} = (I - A)^{-1}C$$
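The long-run identification step can be sketched directly for the VAR(1) case (the matrices $A$ and $\Sigma$ below are illustrative, standing in for OLS estimates):

```python
# Sketch of Blanchard-Quah identification for a VAR(1), given illustrative
# estimates A and Sigma.
import numpy as np

A = np.array([[0.5, 0.2],
              [0.1, 0.4]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.8]])

B1 = np.linalg.inv(np.eye(2) - A)          # B(1) = A(1)^{-1} for a VAR(1)
P = np.linalg.cholesky(B1 @ Sigma @ B1.T)  # factor the long-run covariance
C = np.linalg.solve(B1, P)                 # C = B(1)^{-1} P
long_run = B1 @ C                          # = P: lower triangular by construction
```

By construction `C @ C.T` recovers $\Sigma$, the long-run response `B1 @ C` is lower triangular (exclusion restriction), and its (1,1) element is positive (sign restriction), since Cholesky factors have positive diagonals.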
6 Variance Decomposition
How much of the k step ahead forecast error variance is due to a specified variable?
$$X_t = C(L)\eta_t, \qquad E(\eta_t\eta'_t) = I$$
$$\text{var}_t(X_{t+k}) = C_0 C'_0 + C_1 C'_1 + \ldots + C_{k-1}C'_{k-1}$$
Decompose $C_j C'_j$ as $\sum_{\tau=1}^{n} C_j I_\tau C'_j$, where $I_\tau$ has a one in the $(\tau,\tau)$ position and zeros elsewhere; then we have
$$\text{var}_t(X_{t+k}) = \sum_{\tau=1}^{n}\left(\sum_{j=0}^{k-1}C_j I_\tau C'_j\right) = \sum_{\tau=1}^{n}\nu_{k,\tau}$$
Let $k \to \infty$:
$$\text{var}(X_t) = \sum_{\tau=1}^{n}\nu_\tau = \sum_{\tau=1}^{n}\left(\sum_{j=0}^{\infty}C_j I_\tau C'_j\right)$$
• VAR(1) representation
$$Y_t = AY_{t-1} + C\eta_t, \quad E(\eta_t\eta'_t) = I, \qquad \frac{\partial Y_{t+k}}{\partial \eta_t} = A^k C$$
$$\text{var}_t(Y_{t+k}) = \sum_{j=0}^{k-1}A^j CC'A^{j\prime} = \sum_{\tau=1}^{n}\nu_{k,\tau}, \qquad \nu_{k,\tau} = \sum_{j=0}^{k-1}A^j C I_\tau C'A^{j\prime}$$
$$\text{var}(Y_t) = \sum_{j=0}^{\infty}A^j CC'A^{j\prime} = \sum_{\tau=1}^{n}\nu_\tau,$$
$$\nu_\tau = \sum_{j=0}^{\infty}A^j C I_\tau C'A^{j\prime} = C I_\tau C' + A\left(\sum_{j=1}^{\infty}A^{j-1}C I_\tau C'A^{(j-1)\prime}\right)A' = C I_\tau C' + A\nu_\tau A'$$
Alternatively, we can compute $\nu_{k,\tau}$ recursively:
$$\nu_{k+1,\tau} = C I_\tau C' + A\nu_{k,\tau}A', \quad k \geq 1, \qquad \nu_{1,\tau} = C I_\tau C'$$
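The recursion above can be sketched numerically (the matrices $A$ and $C$ are illustrative):

```python
# Sketch of the recursive forecast-error variance decomposition for a VAR(1)
# Y_t = A Y_{t-1} + C eta_t with E(eta eta') = I; A and C are illustrative.
import numpy as np

A = np.array([[0.5, 0.1],
              [0.0, 0.3]])
C = np.linalg.cholesky(np.array([[1.0, 0.4],
                                 [0.4, 1.0]]))
n, K = 2, 10

# v[k, tau] = contribution of shock tau to the (k+1)-step forecast error variance
v = np.zeros((K, n, n, n))
for tau in range(n):
    I_tau = np.zeros((n, n))
    I_tau[tau, tau] = 1.0
    v[0, tau] = C @ I_tau @ C.T                         # nu_{1,tau}
    for k in range(1, K):
        v[k, tau] = C @ I_tau @ C.T + A @ v[k - 1, tau] @ A.T

# Sanity check: the shock contributions sum to the total forecast-error variance
total = sum(np.linalg.matrix_power(A, j) @ C @ C.T @ np.linalg.matrix_power(A, j).T
            for j in range(K))
```

Summing `v[K-1]` over shocks reproduces `total`, because $\sum_\tau I_\tau = I$.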
7 Standard Error for Impulse Response Functions
- Analytically, from the distribution of the AR parameters
- By Monte Carlo, for Gaussian residuals
- By bootstrap, for small samples and non-Gaussian residuals
7.1 Confidence Intervals and the Bootstrap
• Estimation produces:
$$X_t = \hat{A}(L)X_{t-1} + \hat{\varepsilon}_t, \qquad \hat{\varepsilon}_t, \ t = 1, 2, \ldots, T$$
• Bootstrap
1. Generate $r = 1, \ldots, R$ artificial data sets, each of length $T$. For the $r$th data set:
$$\lambda^r_t \in \text{Uniform}[0, 1], \quad t = 1, \ldots, T$$
— Convert to integers in $\{1, 2, \ldots, T\}$:
$$\lambda^r_t = \text{integer}(\lambda^r_t \times T), \quad t = 1, \ldots, T$$
— Draw shocks $\hat{\varepsilon}_{\lambda^r_1}, \ldots, \hat{\varepsilon}_{\lambda^r_T}$
— Generate artificial data:
$$X^r_t = \hat{A}(L)X^r_{t-1} + \hat{\varepsilon}_{\lambda^r_t}, \quad t = 1, \ldots, T$$
2. Suppose the statistic of interest is $\phi$ (could be a vector of impulse response functions, serial correlation coefficients, etc.):
$$\phi^r = f(X^r_1, \ldots, X^r_T), \quad r = 1, 2, \ldots, R$$
— Compute
$$\sigma_\phi = \left[\frac{1}{R}\sum_{r=1}^{R}(\phi^r - \hat{\phi})^2\right]^{1/2}$$
— Report
$$\hat{\phi} \pm 2 \times \sigma_\phi$$
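The two steps above can be sketched for the simplest case, a scalar AR(1) whose statistic $\phi$ is the lag coefficient itself (data simulated; all numbers illustrative):

```python
# Minimal sketch of the residual bootstrap for an AR(1); the statistic of
# interest phi is taken to be the AR coefficient itself for brevity.
import numpy as np

rng = np.random.default_rng(2)
T, R = 300, 200
x = np.zeros(T + 1)
for t in range(1, T + 1):
    x[t] = 0.6 * x[t - 1] + rng.normal()

def ols_ar1(x):
    return np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)

a_hat = ols_ar1(x)
eps = x[1:] - a_hat * x[:-1]          # fitted residuals

phis = np.empty(R)
for r in range(R):
    idx = rng.integers(0, T, size=T)  # resample residual indices with replacement
    xr = np.zeros(T + 1)
    for t in range(1, T + 1):
        xr[t] = a_hat * xr[t - 1] + eps[idx[t - 1]]
    phis[r] = ols_ar1(xr)             # re-estimate phi on artificial data

se = phis.std(ddof=0)                 # sigma_phi
ci = (a_hat - 2 * se, a_hat + 2 * se) # report phi_hat +/- 2 sigma_phi
```

For a vector of impulse responses, `phis` would simply become an `R x H` array with one row per bootstrap replication.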
7.2 VAR Diagnostics
• Whether or not to take first differences is important; for example: hours and productivity, consumption and dividends
• Choosing the VAR lag length:
— Construct $s(p)$:
$$\text{Akaike}: \ s(p) = \log(\det\hat{\Sigma}_p) + (m + m^2 p)\frac{2}{T}$$
$$\text{Hannan-Quinn}: \ s(p) = \log(\det\hat{\Sigma}_p) + (m + m^2 p)\frac{2\log(\log(T))}{T}$$
$$\text{Schwarz}: \ s(p) = \log(\det\hat{\Sigma}_p) + (m + m^2 p)\frac{\log(T)}{T}$$
where $T$ is the sample size, $m$ is the number of variables, and $p$ is the number of lags
— Choose the optimal $p$:
$$\hat{p} = \arg\min_p s(p)$$
— With $T = 170$:
$$\frac{2}{T} = 0.0118; \qquad \frac{2\log(\log(T))}{T} = 0.0192; \qquad \frac{\log(T)}{T} = 0.0302$$
— Akaike penalizes $p$ the least
  ∗ Hannan-Quinn and Schwarz (or the Bayesian information criterion, BIC) are consistent
  ∗ In population, Akaike has positive probability of overshooting the true $p$
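The three penalty factors quoted for $T = 170$ can be checked directly:

```python
# Penalty factors per parameter for the three information criteria at T = 170.
import numpy as np

T = 170
akaike  = 2 / T                        # Akaike
hq      = 2 * np.log(np.log(T)) / T    # Hannan-Quinn
schwarz = np.log(T) / T                # Schwarz / BIC
```

The ordering `akaike < hq < schwarz` is what makes Akaike the least aggressive at penalizing extra lags.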
— These ICs can be used to compare estimated models only when the numerical values of the dependent variable are identical for all estimates being compared. The models being compared need not be nested, unlike the case when models are compared using an F or likelihood ratio test.
8 Granger Causality
1. Basic idea
• A forecasting relation
• Different from a "precede effect"
2. Definition: $w_t$ Granger causes $y_t$ if $w_t$ helps to forecast $y_t$ given past $y_t$, i.e. for some $s > 0$
$$MSE[E(y_{t+s} \mid y_t, y_{t-1}, \ldots)] > MSE[E(y_{t+s} \mid y_t, y_{t-1}, \ldots, w_t, w_{t-1}, \ldots)]$$
]• Autoregressive presentation
yt = a(L)yt−1 + b(L)wt−1 + δt
wt = c(L)yt−1 + d(L)wt−1 + νt
wt does not Granger cause yt iff b(L) = 0, or
A(L)
[ytwt
]=
[δtνt
]
A(L) =
[I − La(L) −Lb(L)Lc(L) I − Ld(L)
]≡[a∗(L) b∗(L)c∗(L) d∗(L)
]Nan Li, Department of Finance, ACEM, SJTU
Time Series Analysis, Lecture 2, 2019 32
wt does not Granger cause yt iff b∗(L) = 0
• MA representation:
$$\begin{bmatrix} y_t \\ w_t \end{bmatrix} = A(L)^{-1}\begin{bmatrix} \delta_t \\ \nu_t \end{bmatrix} = \frac{1}{a^*(L)d^*(L) - b^*(L)c^*(L)}\begin{bmatrix} d^*(L) & -b^*(L) \\ -c^*(L) & a^*(L) \end{bmatrix}\begin{bmatrix} \delta_t \\ \nu_t \end{bmatrix} \equiv \begin{bmatrix} \bar{a}(L) & \bar{b}(L) \\ \bar{c}(L) & \bar{d}(L) \end{bmatrix}\begin{bmatrix} \delta_t \\ \nu_t \end{bmatrix}$$
$w_t$ does not Granger cause $y_t$ iff the Wold moving-average matrix lag polynomial is lower triangular.
$w_t$ does not Granger cause $y_t$ iff $y$'s bivariate Wold representation is the same as its univariate Wold representation.
• Univariate representation: consider the pair of univariate Wold representations
$$y_t = e(L)\xi_t, \quad \xi_t = y_t - E(y_t \mid y_{t-1}, y_{t-2}, \ldots)$$
$$w_t = f(L)\mu_t, \quad \mu_t = w_t - E(w_t \mid w_{t-1}, w_{t-2}, \ldots)$$
$w_t$ does not Granger cause $y_t$ iff $E(\mu_t\xi_{t+j}) = 0$ for all $j > 0$, i.e. the univariate innovations of $w_t$ are uncorrelated with the future univariate innovations of $y_t$.
Proof: if $w_t$ does not Granger cause $y_t$,
$$y_t = \bar{a}(L)\delta_t = e(L)\xi_t, \qquad w_t = \bar{c}(L)\delta_t + \bar{d}(L)\nu_t = f(L)\mu_t$$
$\Rightarrow$ $\mu_t$, as a combination of current and past $\delta_t$ and $\nu_t$, is uncorrelated with $\delta_{t+j}$, hence uncorrelated with $\xi_{t+j}$ for all $j > 0$.
Note: $E(\mu_t\xi_{t+j}) = 0$ $\Rightarrow$ past $\mu$ do not help to forecast $\xi$ $\Rightarrow$ $\mu$ do not help to forecast $y_t$ $\Rightarrow$ $w_t = f(L)\mu_t$ does not help to forecast $y_t$.
If $w_t$ does not Granger cause $y_t$, then the response of $y$ to $w$ shocks is zero.
• Effect on projections: $w$ does not Granger cause $y$ iff $E(w_t \mid \text{all } y_t) = E(w_t \mid \text{current and past (no future) } y_t)$.
Proof:
$$w_t = \bar{c}(L)\bar{a}(L)^{-1}y_t + \bar{d}(L)\nu_t$$
3. Test of Granger causality: F test of
$$H_0: b_1 = b_2 = \ldots = b_p = 0$$
Run an unconstrained regression and save
$$RSS_1 = \sum_{t=1}^{T}\hat{u}^2_t$$
Run a constrained regression, which is a univariate AR(p) for $y$, and save
$$RSS_0 = \sum_{t=1}^{T}\hat{e}^2_t$$
Let
$$F = \frac{(RSS_0 - RSS_1)/p}{RSS_1/(T - 2p - 1)} \sim F(p, T - 2p - 1)$$
which is asymptotically equivalent to
$$S_2 = \frac{T(RSS_0 - RSS_1)}{RSS_1} \sim \chi^2(p)$$
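The F test can be sketched on simulated data in which $w$ genuinely helps forecast $y$ (one lag, $p = 1$; all parameter values illustrative):

```python
# Sketch of the Granger-causality F test with p = 1 lag on simulated data.
import numpy as np

rng = np.random.default_rng(3)
T = 400
w = np.zeros(T + 1)
y = np.zeros(T + 1)
for t in range(1, T + 1):
    w[t] = 0.5 * w[t - 1] + rng.normal()
    y[t] = 0.4 * y[t - 1] + 0.5 * w[t - 1] + rng.normal()  # w Granger causes y

def rss(Y, X):
    """Residual sum of squares from an OLS regression of Y on X."""
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    e = Y - X @ b
    return e @ e

p = 1
Y = y[1:]
X1 = np.column_stack([np.ones(T), y[:-1], w[:-1]])  # unconstrained
X0 = np.column_stack([np.ones(T), y[:-1]])          # constrained: b(L) = 0
RSS1, RSS0 = rss(Y, X1), rss(Y, X0)
F = ((RSS0 - RSS1) / p) / (RSS1 / (T - 2 * p - 1))  # compare with F(p, T-2p-1)
```

With the lagged-$w$ coefficient set to 0.5 and $T = 400$, the statistic lands far in the rejection region.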
4. Interpreting Granger-causality tests
• It is not necessary that one variable of a pair Granger cause the other and vice versa; example: money growth and GNP
— Question: the fed funds rate and the stock market?
• Warning: Granger causality is not causality!
5. Granger causality in a multivariate context
• Estimation:
$$y_{1t} = c_1 + A'_1 x_{1t} + A'_2 x_{2t} + \varepsilon_{1t}$$
$$y_{2t} = c_2 + B'_1 x_{1t} + B'_2 x_{2t} + \varepsilon_{2t}$$
$$H_0: A_2 = 0$$
which is equivalent to estimating
$$y_{1t} = c_1 + A'_1 x_{1t} + A'_2 x_{2t} + \varepsilon_{1t}$$
$$y_{2t} = d + D'_0 y_{1t} + D'_1 x_{1t} + D'_2 x_{2t} + \nu_{2t}$$
$$H_0: A_2 = 0$$
Proof:
$$f(y_t \mid x_t; \theta) = f(y_{1t} \mid x_t; \theta)f(y_{2t} \mid y_{1t}, x_t; \theta)$$
$$\text{var}(y_{2t} \mid y_{1t}, x_t) = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$
$$E(y_{2t} \mid y_{1t}, x_t) = E(y_{2t} \mid x_t) + \Sigma_{21}\Sigma_{11}^{-1}[y_{1t} - E(y_{1t} \mid x_t)]$$
• Test: likelihood ratio test
$$2[\mathcal{L}(\hat{\theta}) - \mathcal{L}(\hat{\theta}^{(0)})] = T(\log|\hat{\Sigma}_{11}(0)| - \log|\hat{\Sigma}_{11}|) \sim \chi^2(\text{number of restrictions})$$
• If $A_2 = 0$ and $B_1 = 0$, does it mean there is no relation between $y_{1t}$ and $y_{2t}$ at all? Not necessarily:
— Contemporaneous linear dependence may be present
— Geweke's test of linear dependence and decomposition of linear dependence
9 Kalman Filter
• An algorithm for sequentially updating a linear projection for the system
• Deduces the restrictions that the models of the economy and of data collection impose on the "innovation representation" of the dynamics of the variables of interest
• Using the Kalman filter we can
  — calculate exact finite-sample forecasts
  — compute the exact likelihood function for a Gaussian ARMA process
  — factorize the spectral density*
  — estimate a VAR with coefficients that change over time*
9.1 State-Space Representation
state equation: $\xi_{t+1} = F\xi_t + Cv_{t+1}$
observation equation: $y_t = A'x_t + H'\xi_t + w_t$

$\xi_t$: vector of state variables
$y_t$: vector of observed variables
$x_t$: vector of exogenous or predetermined variables (provides no information about $\xi_{t+s}$ or $v_{t+s}$)
$v_t$: vector of white-noise (or martingale-difference) shocks
$w_t$: vector of white-noise (or martingale-difference) measurement errors
$$E(v_t v'_\tau) = \begin{cases} I & t = \tau \\ 0 & \text{otherwise} \end{cases}, \qquad E(w_t w'_\tau) = \begin{cases} R & t = \tau \\ 0 & \text{otherwise} \end{cases}$$
$$E(v_t w'_\tau) = 0 \text{ for all } t \text{ and } \tau, \qquad CC' = Q$$
We need assumptions about $\xi_1$:
$$E(v_t\xi'_1) = 0, \quad E(w_t\xi'_1) = 0, \quad t = 1, 2, \ldots, T$$
In addition, assume that $\xi_1$ is a random vector with known mean and covariance matrix:
$$E(\xi_1) = \bar{\xi}_1, \qquad E[(\xi_1 - E(\xi_1))(\xi_1 - E(\xi_1))'] = \Sigma_1$$
Example of state-space representation: AR(p)
$$(y_t - \mu) = \phi_1(y_{t-1} - \mu) + \phi_2(y_{t-2} - \mu) + \ldots + \phi_p(y_{t-p} - \mu) + \varepsilon_t$$
$$\xi_t = \begin{bmatrix} y_t - \mu \\ y_{t-1} - \mu \\ \vdots \\ y_{t-p+1} - \mu \end{bmatrix}, \quad F = \begin{bmatrix} \phi_1 & \phi_2 & \ldots & \phi_{p-1} & \phi_p \\ 1 & 0 & \ldots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \ldots & 1 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} \sigma \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad v_{t+1} = \varepsilon_{t+1}/\sigma$$
$$y_t = y_t, \quad A' = \mu, \quad x_t = 1, \quad H' = \begin{bmatrix} 1 & 0 & \ldots & 0 \end{bmatrix}, \quad w_t = 0, \quad R = 0$$
MA(1)
$$y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}$$
$$\xi_t = \begin{bmatrix} \varepsilon_t \\ \varepsilon_{t-1} \end{bmatrix}, \quad F = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} \sigma \\ 0 \end{bmatrix}, \quad v_{t+1} = \varepsilon_{t+1}/\sigma$$
$$y_t = y_t, \quad A' = \mu, \quad x_t = 1, \quad H' = \begin{bmatrix} 1 & \theta \end{bmatrix}, \quad w_t = 0, \quad R = 0$$
or
$$\xi_t = \begin{bmatrix} \varepsilon_t + \theta\varepsilon_{t-1} \\ \theta\varepsilon_t \end{bmatrix}, \quad F = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} \sigma \\ \sigma\theta \end{bmatrix}, \quad v_{t+1} = \varepsilon_{t+1}/\sigma$$
$$y_t = y_t, \quad A' = \mu, \quad x_t = 1, \quad H' = \begin{bmatrix} 1 & 0 \end{bmatrix}, \quad w_t = 0, \quad R = 0$$
• Applications in finance and macroeconomics
— Real interest rate (Fama and Gibbons 1982), business cycle (Stock and Watson 1991), market expectation of inflation (Hamilton 1985), capital stock (Li 2005), etc.
— Estimation of a model specified at a finer time interval than pertains to the available data

Kalman Filter Algorithm

Observed data: $\{y_t\}_{t=1}^T$, $\{x_t\}_{t=1}^T$; we want to construct linear least-squares forecasts of $\xi_t$ and $y_t$ based on the data observed through date $t$:
$$\hat{\xi}_{t+1|t} = E(\xi_{t+1} \mid \mathbf{y}_t)$$
where $\mathbf{y}_t = (y'_t, \ldots, y'_1, x'_t, \ldots, x'_1)'$. The MSE of the forecast is
$$P_{t+1|t} = E\left[(\xi_{t+1} - \hat{\xi}_{t+1|t})(\xi_{t+1} - \hat{\xi}_{t+1|t})'\right]$$
• Idea: construct an innovation process $\tilde{y}_t$ such that $[\tilde{\mathbf{y}}_t, E(\xi_1)]$ forms an orthogonal basis for the information set $[\mathbf{y}_t, E(\xi_1)]$, and then recursively calculate the projection of $\xi_{t+1}$ on $[\tilde{\mathbf{y}}_t, E(\xi_1)]$. The orthogonal basis for $[\mathbf{y}_t, E(\xi_1)]$ is constructed using the Gram-Schmidt process.
Let $\tilde{y}_1$ be the residual from a regression of $y_1$ on $E(\xi_1) = \hat{\xi}_{1|0}$ (and $x_1$):
$$\tilde{y}_1 = y_1 - A'x_1 - H'\hat{\xi}_{1|0}$$
We can check that $E[\tilde{y}_1] = 0$, and that $[\tilde{y}_1, \hat{\xi}_{1|0}]$ and $[y_1, \hat{\xi}_{1|0}]$ span the same linear space. Note that here we use the exogeneity of $x_t$:
$$E[\xi_t \mid x_t, \mathbf{y}_{t-1}] = E[\xi_t \mid \mathbf{y}_{t-1}]$$
Next, form $\tilde{y}_2$ as the residual from a regression of $y_2$ on $[\tilde{y}_1, \hat{\xi}_{1|0}]$:
$$\tilde{y}_2 = y_2 - E(y_2 \mid \tilde{y}_1, \hat{\xi}_{1|0})$$
Then $E[\tilde{y}_2] = 0$, $E[\tilde{y}_2\tilde{y}'_1] = 0$ and $E[\tilde{y}_2\hat{\xi}'_{1|0}] = 0$; $[\tilde{\mathbf{y}}_2, \hat{\xi}_{1|0}]$ and $[\mathbf{y}_2, \hat{\xi}_{1|0}]$ span the same linear space. Continuing in this way, form
$$\tilde{y}_t = y_t - E(y_t \mid \tilde{\mathbf{y}}_{t-1}, \hat{\xi}_{1|0})$$
$\tilde{y}_t$ is the innovation representation of $y_t$, and $[\tilde{\mathbf{y}}_t, E(\xi_1)]$ forms an orthogonal basis for the information set $[\mathbf{y}_t, E(\xi_1)]$.
9.2 Kalman Filter Algorithm:
— Step 0 (starting point): if $E(\xi_1)$ and $\Sigma_1 = E[(\xi_1 - E(\xi_1))(\xi_1 - E(\xi_1))']$ are known,
$$\hat{\xi}_{1|0} = E(\xi_1), \qquad P_{1|0} = \Sigma_1$$
Otherwise, if the eigenvalues of $F$ are all inside the unit circle, then $\xi_t$ is weakly stationary, and we can solve for $E(\xi_1)$ and $\Sigma_1$ directly:
$$E(\xi_{t+1}) = FE(\xi_t) \implies E(\xi_t) = 0$$
$$\Sigma = F\Sigma F' + Q \implies \text{vec}(\Sigma) = [I - F \otimes F]^{-1}\text{vec}(Q)$$
$$\implies \hat{\xi}_{1|0} = 0, \qquad \text{vec}(P_{1|0}) = [I - F \otimes F]^{-1}\text{vec}(Q)$$
— Step 1: construct $\tilde{y}_t$ from $\hat{\xi}_{t|t-1}$ and $P_{t|t-1}$:
$$E(y_t \mid x_t, \xi_t) = A'x_t + H'\xi_t$$
$$\hat{y}_{t|t-1} \equiv E(y_t \mid x_t, \mathbf{y}_{t-1}) = A'x_t + H'E(\xi_t \mid x_t, \mathbf{y}_{t-1}) = A'x_t + H'\hat{\xi}_{t|t-1}$$
$$\tilde{y}_t = y_t - \hat{y}_{t|t-1} = H'(\xi_t - \hat{\xi}_{t|t-1}) + w_t$$
with MSE
$$E[\tilde{y}_t\tilde{y}'_t] = H'P_{t|t-1}H + R$$
— Step 2: update the inference about $\xi_t$:
$$\hat{\xi}_{t|t} = E(\xi_t \mid x_t, y_t, \mathbf{y}_{t-1}) = E(\xi_t \mid \tilde{\mathbf{y}}_t) = \hat{\xi}_{t|t-1} + \Gamma_t\tilde{y}_t$$
$$\Gamma_t = E[(\xi_t - \hat{\xi}_{t|t-1})\tilde{y}'_t]\,E(\tilde{y}_t\tilde{y}'_t)^{-1} = P_{t|t-1}H(H'P_{t|t-1}H + R)^{-1}$$
where
$$E[(\xi_t - \hat{\xi}_{t|t-1})\tilde{y}'_t] = E[(\xi_t - \hat{\xi}_{t|t-1})(H'(\xi_t - \hat{\xi}_{t|t-1}) + w_t)'] = P_{t|t-1}H$$
— Step 3: forecast $\xi_{t+1}$ based on $\tilde{\mathbf{y}}_t$:
$$\hat{\xi}_{t+1|t} = F\hat{\xi}_{t|t} = F\hat{\xi}_{t|t-1} + F(\hat{\xi}_{t|t} - \hat{\xi}_{t|t-1})$$
where
$$\hat{\xi}_{t|t} - \hat{\xi}_{t|t-1} = \Gamma_t\tilde{y}_t = \Gamma_t(y_t - A'x_t - H'\hat{\xi}_{t|t-1})$$
$$\implies \hat{\xi}_{t+1|t} = F\hat{\xi}_{t|t-1} + K_t\tilde{y}_t$$
where $K_t$ is the "Kalman gain matrix":
$$K_t = F\Gamma_t = FP_{t|t-1}H(H'P_{t|t-1}H + R)^{-1}$$
Note that
$$\hat{\xi}_{t+1|t} = F^t E(\xi_1) + \sum_{j=1}^{t}F^{t-j}K_j\tilde{y}_j$$
— Step 4: update $P_{t+1|t} = E[(\xi_{t+1} - \hat{\xi}_{t+1|t})(\xi_{t+1} - \hat{\xi}_{t+1|t})']$:
$$\xi_{t+1} - \hat{\xi}_{t+1|t} = (F - K_tH')(\xi_t - \hat{\xi}_{t|t-1}) + Cv_{t+1} - K_tw_t$$
Noting that $\tilde{y}_t = H'(\xi_t - \hat{\xi}_{t|t-1}) + w_t$, hence
$$P_{t+1|t} = (F - K_tH')P_{t|t-1}(F - K_tH')' + CC' + K_tRK'_t = FP_{t|t-1}F' - K_tH'P_{t|t-1}F' + Q$$
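Steps 1-4 can be sketched for a scalar state-space model (no exogenous $x_t$; all parameter values illustrative, data simulated):

```python
# Minimal scalar sketch of the filter recursion for
# xi_{t+1} = f xi_t + c v_{t+1}, y_t = h xi_t + w_t; parameters illustrative.
import numpy as np

rng = np.random.default_rng(4)
f, c, h, R = 0.8, 1.0, 1.0, 0.5
Q = c * c
T = 200
xi_true = np.zeros(T + 1)
y = np.zeros(T + 1)
for t in range(1, T + 1):
    xi_true[t] = f * xi_true[t - 1] + c * rng.normal()
    y[t] = h * xi_true[t] + np.sqrt(R) * rng.normal()

xi_pred = 0.0                 # xi_{1|0} = 0 (stationary start, Step 0)
P = Q / (1 - f * f)           # P_{1|0} = stationary state variance
xi_filt = np.zeros(T + 1)
for t in range(1, T + 1):
    innov = y[t] - h * xi_pred                   # Step 1: innovation y~_t
    S = h * P * h + R                            # innovation variance
    K = f * P * h / S                            # Kalman gain K_t
    xi_filt[t] = xi_pred + (P * h / S) * innov   # Step 2: xi_{t|t}
    xi_pred = f * xi_pred + K * innov            # Step 3: xi_{t+1|t}
    P = f * P * f - K * h * P * f + Q            # Step 4: Riccati update
```

The filtered estimates track the unobserved state closely, and `P` converges to the steady-state value discussed in the convergence results below.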
9.3 Innovation Representation
$$\hat{\xi}_{t+1|t} = F\hat{\xi}_{t|t-1} + K_t\tilde{y}_t$$
$$y_t = A'x_t + H'\hat{\xi}_{t|t-1} + \tilde{y}_t$$
$$E[\tilde{y}_t\tilde{y}'_t] = H'P_{t|t-1}H + R$$
is a time-varying innovation representation of the original state-space representation, starting from the initial conditions $\hat{\xi}_{1|0}$ and $P_{1|0}$.
Or:
$$\tilde{y}_t = y_t - A'x_t - H'\hat{\xi}_{t|t-1}$$
$$\hat{\xi}_{t+1|t} = F\hat{\xi}_{t|t-1} + K_t\tilde{y}_t$$
recursively filters out a record of innovations $\{\tilde{y}_t\}_{t=1}^T$ from $\hat{\xi}_{1|0}$ and $\{y_t\}_{t=1}^T$. This is called a "whitening filter", which transforms the serially correlated process $y_t$ into a serially uncorrelated (i.e. "white") process $\{\tilde{y}_t\}_{t=1}^T$.
$\{\tilde{y}_t\}_{t=1}^T$ is called a fundamental white noise for the $\{y_t\}_{t=1}^T$ process.
9.4 Convergence Results
• If $F$ has eigenvalues inside the unit circle, and $Q$ and $R$ are positive semidefinite symmetric matrices (at least one strictly positive definite), then $\{P_{t+1|t}\}$ is a monotonically nonincreasing sequence that converges as $t \to \infty$ to a steady-state matrix $P$ (which is unique), with
$$P = F\left[P - PH(H'PH + R)^{-1}H'P\right]F' + Q$$
and the steady-state value of the Kalman gain matrix
$$K = FPH(H'PH + R)^{-1}$$
has the property that the eigenvalues of $(F - KH')$ all lie on or inside the unit circle.
• Use the Kalman filter to find the Wold decomposition:
$$\hat{\xi}_{t+1|t} = F\hat{\xi}_{t|t-1} + K(y_t - H'\hat{\xi}_{t|t-1}) = (F - KH')\hat{\xi}_{t|t-1} + Ky_t$$
$$\implies \hat{\xi}_{t+1|t} = [I - (F - KH')L]^{-1}Ky_t$$
$$\hat{y}_{t+1|t} = H'\hat{\xi}_{t+1|t} = H'[I - (F - KH')L]^{-1}Ky_t$$
$$\tilde{y}_{t+1} = y_{t+1} - \hat{y}_{t+1|t} = \left\{I - H'[I - (F - KH')L]^{-1}KL\right\}y_{t+1}$$
$$\implies y_{t+1} = \left\{I - H'[I - (F - KH')L]^{-1}KL\right\}^{-1}\tilde{y}_{t+1} = \left\{I + H'[I - FL]^{-1}KL\right\}\tilde{y}_{t+1}$$
9.5 Serially Correlated Measurement Errors
If the measurement error $w_t$ is serially correlated:
state equation: $\xi_{t+1} = F\xi_t + Cv_{t+1}$
observation equation: $y_t = H'\xi_t + w_t$
$$w_t = Dw_{t-1} + \eta_t, \qquad E(\eta_t\eta'_t) = R, \qquad E[v_t\eta'_s] = 0 \text{ for all } t \text{ and } s$$
Idea: transform $y_t$ to $\bar{y}_t$ such that the corresponding measurement error is serially uncorrelated. Define
$$\bar{y}_t = y_{t+1} - Dy_t = H'\xi_{t+1} + w_{t+1} - DH'\xi_t - Dw_t = (H'F - DH')\xi_t + H'Cv_{t+1} + \eta_{t+1}$$
Thus $\{\xi_t, \bar{y}_t\}$ is governed by the state-space system
state equation: $\xi_{t+1} = F\xi_t + Cv_{t+1}$
observation equation: $\bar{y}_t = \bar{H}'\xi_t + \bar{w}_t$
where $\bar{H}' = H'F - DH'$, and $\bar{w}_t = H'Cv_{t+1} + \eta_{t+1}$ is the new "measurement noise", which is contemporaneously correlated with $v_{t+1}$ but not serially correlated:
$$E(Cv_{t+1}\bar{w}'_t) = CC'H = QH, \qquad E(\bar{w}_t\bar{w}'_t) = H'QH + R$$
We can do the same thing to find the innovation representation $u_t$ of $\bar{y}_t$:
$$\hat{\xi}_{t+1|t} = F\hat{\xi}_{t|t-1} + K_tu_t, \qquad \bar{y}_t = \bar{H}'\hat{\xi}_{t|t-1} + u_t$$
The only thing we need to change is
$$E(u_tu'_t) = \bar{H}'P_{t|t-1}\bar{H} + (H'QH + R)$$
$$P_{t+1|t} = (F - K_t\bar{H}')P_{t|t-1}(F - K_t\bar{H}')' + Q + K_t(H'QH + R)K'_t - QHK'_t$$
$$= FP_{t|t-1}F' - K_t\bar{H}'P_{t|t-1}F' + Q + K_tH'QHK'_t - QHK'_t$$
For the original process $\{y_t\}$, an alternative state-space representation is given by combining the innovation representation of $\bar{y}_t$ with
$$y_{t+1} = Dy_t + \bar{y}_t$$
hence
state equation:
$$\begin{bmatrix} \hat{\xi}_{t+1|t} \\ y_{t+1} \end{bmatrix} = \begin{bmatrix} F & 0 \\ \bar{H}' & D \end{bmatrix}\begin{bmatrix} \hat{\xi}_{t|t-1} \\ y_t \end{bmatrix} + \begin{bmatrix} K_t \\ I \end{bmatrix}u_t$$
observation equation:
$$y_t = \begin{bmatrix} 0 & I \end{bmatrix}\begin{bmatrix} \hat{\xi}_{t|t-1} \\ y_t \end{bmatrix} + [0]u_t$$
9.6 MLE estimation of the parameters
• Let $\theta$ be the vector of parameters in $F, H, A, Q, R$; the estimates are obtained by maximizing the log-likelihood of $\theta$ given $\{y_t\}_{t=1}^T$:
$$f(y_T, y_{T-1}, \ldots, y_1; \theta) = f_T(y_T \mid y_{T-1}, \ldots, y_1)f_{T-1}(y_{T-1} \mid y_{T-2}, \ldots, y_1)\cdots f_1(y_1)$$
$$y_1 \sim N(H'\hat{\xi}_{1|0},\ H'P_{1|0}H + R), \qquad y_t \mid y_{t-1}, \ldots, y_1 \sim N(H'\hat{\xi}_{t|t-1},\ H'P_{t|t-1}H + R)$$
On the other hand,
$$\tilde{y}_t \sim N(0,\ H'P_{t|t-1}H + R)$$
so if $\xi_t$ is stationary with $E(\xi_1) = 0$, then $f_1(y_1) = g_1(\tilde{y}_1)$, and since
$$y_t = H'\hat{\xi}_{t|t-1} + \tilde{y}_t$$
we have
$$f_t(y_t \mid y_{t-1}, \ldots, y_1) = g_t(\tilde{y}_t)$$
Hence the log-likelihood of $\mathbf{y}_T$ is
$$\sum_{t=1}^{T}\log g_t(\tilde{y}_t) = -\frac{1}{2}\sum_{t=1}^{T}\left[n\log(2\pi) + \log|H'P_{t|t-1}H + R| + \tilde{y}'_t(H'P_{t|t-1}H + R)^{-1}\tilde{y}_t\right]$$
• Initialization:
  — Stationary process: $\hat{\xi}_{1|0} = E(\xi_1)$ and $P_{1|0} = \Sigma_1$
  — *Nonstationary process: put a diffuse prior on $\xi_1$
• Identification: without restrictions on $F, H, A, Q, R$, the parameters of the state-space representation are unidentified. Example:
$$\xi_{t+1} = \begin{bmatrix} \varepsilon_{1,t+1} \\ \varepsilon_{2,t+1} \end{bmatrix}, \qquad y_t = \varepsilon_{1t} + \varepsilon_{2t}$$
— Global identification at $\theta_0$: for any other $\theta$, there exists $\mathbf{y}_T$ such that $f(\mathbf{y}_T; \theta) \neq f(\mathbf{y}_T; \theta_0)$
— Local identification at $\theta_0$: the information matrix is nonsingular in a neighborhood of $\theta_0$
9.7 Smoothing
We want to form an inference about the state variables based not only on historical data but on the entire available data set, that is, $\hat{\xi}_{t|T} = E(\xi_t \mid \mathbf{y}_T)$.
Step 1:
$$E(\xi_t \mid \xi_{t+1}, \mathbf{y}_t) = \hat{\xi}_{t|t} + J_t(\xi_{t+1} - \hat{\xi}_{t+1|t})$$
$$J_t = E[(\xi_t - \hat{\xi}_{t|t})(\xi_{t+1} - \hat{\xi}_{t+1|t})'] \times E[(\xi_{t+1} - \hat{\xi}_{t+1|t})(\xi_{t+1} - \hat{\xi}_{t+1|t})']^{-1} = P_{t|t}F'P_{t+1|t}^{-1}$$
Step 2:
$$E(\xi_t \mid \xi_{t+1}, \mathbf{y}_T) = E(\xi_t \mid \xi_{t+1}, \mathbf{y}_t) = \hat{\xi}_{t|t} + J_t(\xi_{t+1} - \hat{\xi}_{t+1|t})$$
Step 3:
$$\hat{\xi}_{t|T} = E(\xi_t \mid \mathbf{y}_T) = \hat{\xi}_{t|t} + J_t(\hat{\xi}_{t+1|T} - \hat{\xi}_{t+1|t})$$
Step 4:
$$P_{t|T} = P_{t|t} + J_t(P_{t+1|T} - P_{t+1|t})J'_t$$
To summarize: the smoothed sequence is generated by a backward recursion, starting from $\hat{\xi}_{T|T}$ and $P_{T|T}$.
9.8 Statistical Inference with the Kalman Filter
We assumed that the true value of $\theta$ is used to construct $\hat{\xi}$ and $P$, but in practice we use estimates of $\theta$ instead:
$$E[(\xi_t - \hat{\xi}_{t|T}(\hat{\theta}))(\xi_t - \hat{\xi}_{t|T}(\hat{\theta}))' \mid \mathbf{y}_T] = \underbrace{E[(\xi_t - \hat{\xi}_{t|T}(\theta_0))(\xi_t - \hat{\xi}_{t|T}(\theta_0))' \mid \mathbf{y}_T]}_{\text{filter uncertainty}} + \underbrace{E[(\hat{\xi}_{t|T}(\theta_0) - \hat{\xi}_{t|T}(\hat{\theta}))(\hat{\xi}_{t|T}(\theta_0) - \hat{\xi}_{t|T}(\hat{\theta}))' \mid \mathbf{y}_T]}_{\text{parameter uncertainty}}$$
To measure these two parts of uncertainty we can use Monte Carlo simulation:
$$\theta \mid \mathbf{y}_T \sim N\left(\hat{\theta}, \frac{1}{T}\hat{I}^{-1}\right)$$
Take $M$ draws $\theta^{(j)}$ from this distribution, and calculate
$$\frac{1}{M}\sum_{j=1}^{M}\left[(\hat{\xi}_{t|T}(\theta^{(j)}) - \hat{\xi}_{t|T}(\hat{\theta}))(\hat{\xi}_{t|T}(\theta^{(j)}) - \hat{\xi}_{t|T}(\hat{\theta}))' \mid \mathbf{y}_T\right]$$
to estimate the "parameter uncertainty", and use
$$\frac{1}{M}\sum_{j=1}^{M}P_{t|T}(\theta^{(j)})$$
to estimate the "filter uncertainty"; the sum of the two estimates the MSE of $\hat{\xi}_{t|T}(\hat{\theta})$ around the true value of $\xi_t$.
9.9 Applications
• Aggregation over time
— Averaging over time:
$$\xi_{t+1} = F\xi_t + v_{t+1}, \quad t = 0, 1, 2, \ldots, \qquad y_t = H'\xi_t$$
Expand the state space by including enough lags, $X_t = [\xi_t\ \xi_{t-1}, \ldots, \xi_{t-p}]$:
$$X_{t+1} = FX_t + Cw_{t+1}, \qquad y_t = GX_t$$
— Skip sampling: the data are sampled every $\tau > 0$ periods:
$$\xi_{t+\tau} = F_\tau\xi_t + v^\tau_{t+\tau}, \quad t = 0, \tau, 2\tau, \ldots, \qquad y_t = H'\xi_t$$
where
$$F_\tau = F^\tau, \qquad v^\tau_{t+\tau} = F^{\tau-1}Cw_{t+1} + F^{\tau-2}Cw_{t+2} + \ldots + Cw_{t+\tau}$$
$$E[v^\tau_{t+\tau}v^{\tau\prime}_{t+\tau}] = CC' + FCC'F' + \ldots + F^{\tau-1}CC'(F^{\tau-1})'$$
represented in a state-space system as
$$\xi_{s+1} = F_\tau\xi_s + v^\tau_{s+1}, \quad s = 0, 1, 2, \ldots, \qquad y_s = H'\xi_s$$
• Estimating the dynamics of unobserved variables in the economy: the real interest rate (Fama and Gibbons 1982), the business cycle (Stock and Watson 1991), market expectations of inflation (Hamilton 1985), the intangible capital stock (Li 2005), etc.