Applied Econometrics
B.Sc. Economics (elective)
Winter term 2012/13
Prof. Dr. Jörg Breitung
Textbook recommendations
• Heij, C., P. de Boer, P.H. Franses, T. Kloek and H.K. van Dijk (2004), Econometric Methods with Applications in Business and Economics, Oxford University Press.
• Verbeek, M. (2012), A Guide to Modern Econometrics, Wiley, 4th ed.
• Greene, W.H. (2008), Econometric Analysis, 6th ed., Pearson.
• Asteriou, D. and S.G. Hall (2006), Applied Econometrics: A Modern Approach, Palgrave Macmillan.
• Vogelvang, B. (2005), Econometrics: Theory and Applications with EViews, Prentice Hall.
• Enders, W. (2004), Applied Econometric Time Series, 2nd ed., Wiley.
• Lütkepohl, H. (2005), New Introduction to Multiple Time Series Analysis, Berlin: Springer.
• Lütkepohl, H. and M. Krätzig (2004), Applied Time Series Econometrics, Cambridge University Press.
1.1. Basic results for the classical econometric model
• The linear model
    y_t = x_t′β + u_t
  or in matrix notation
    y = Xβ + u
  where y and u are n × 1 vectors, β is k × 1 and X is n × k
• Assumptions:
  (ia) X is deterministic
  (ib) X is stochastic
  (ii) X is of full column rank
  (iii) u ∼ N(0, σ²Iₙ) or u|X ∼ N(0, σ²Iₙ)
Multiple Regression
• OLS estimator:
    β̂ = argmin_β (y − Xβ)′(y − Xβ) = (X′X)⁻¹X′y
• Estimator for σ²:
    σ̂² = (1/(n − k)) (y − Xβ̂)′(y − Xβ̂)
• Maximum likelihood (ML) estimator
  Log-likelihood function assuming a normal distribution:
    ln L(β, σ²) = −(n/2) ln 2π − (n/2) ln σ² − (1/(2σ²)) (y − Xβ)′(y − Xβ)
• The ML and OLS estimators of β are identical under normality
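As an illustration (not part of the original slides; the design and seed below are made up), the OLS formulas can be checked numerically with a minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

# OLS: beta_hat = (X'X)^{-1} X'y (solve the normal equations, don't invert)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

s2 = u_hat @ u_hat / (n - k)   # unbiased estimator of sigma^2
s2_ml = u_hat @ u_hat / n      # ML estimator, biased by the factor (n-k)/n
```

The ML variance estimator is always smaller than the degrees-of-freedom corrected one, which mirrors E(σ̃²) = σ²(n − k)/n.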
• Frisch-Waugh-Lovell theorem:
    y = X₁β₁ + X₂β₂ + u
  Two-step estimator of β₂:
    β̂₂ = (X₂′M₁X₂)⁻¹X₂′M₁y
  where M₁ = Iₙ − P₁ and P₁ = X₁(X₁′X₁)⁻¹X₁′
• Scatterplot of (M₁y) against (M₁X₂)
• The R² between (M₁y) and (M₁X₂) is the partial R²
• Goodness of fit:
    R² = ESS/TSS = 1 − SSR/TSS = 1 − û′û / (y′y − nȳ²)
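The Frisch-Waugh-Lovell result can be verified directly: the coefficient on X₂ from the full regression equals the coefficient from regressing the residualized M₁y on M₁X₂. A small numpy sketch (illustrative data, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 1)) + 0.5 * X1[:, [1]]   # correlated with X1
y = X1 @ np.array([1.0, 0.5]) + 2.0 * X2[:, 0] + rng.normal(size=n)

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

# full regression on [X1, X2]
b_full = ols(np.hstack([X1, X2]), y)

# two-step FWL: residualize y and X2 on X1, then regress
P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
M1 = np.eye(n) - P1
b2 = ols(M1 @ X2, M1 @ y)
```

Both routes give the same estimate of β₂.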
Properties of the OLS estimator
a) Expectation:
  • E(β̂) = β
  • E(σ̂²) = σ² for σ̂² = û′û/(n − k)
  • the ML estimator σ̃² = û′û/n is biased: E(σ̃²) = σ²(n − k)/n
b) Distribution (assuming u ∼ N(0, σ²I)):
    β̂ ∼ N(β, Σ_β̂),  Σ_β̂ = σ²(X′X)⁻¹
    (n − k)σ̂²/σ² ∼ χ²_{n−k}
c) Efficiency:
  • β̂ is BLUE
  • under normality: β̂ and σ̂² are MVUE
Asymptotic properties
β̂ and σ̂² are consistent:
    β̂ →p β,  σ̂² →p σ²
Asymptotic distributions:
    √n(β̂ − β) →d N(0, Σ_β)
    √n(σ̂² − σ²) →d N(0, 2σ⁴)
where
    Σ_β = σ² (plim_{n→∞} (1/n) X′X)⁻¹
1.2 Testing Hypotheses
Basic concepts:
  - parameter space: θ ∈ Ω
  - null hypothesis H₀: θ ∈ Ω₀
  - alternative: Hₐ: θ ∈ Ω₁
  - observed test statistic: λ(y)
  - reject if λ(y) ∈ C (critical region)
• Neyman-Pearson lemma:
    H₀: θ = θ₀ vs. Hₐ: θ = θ₁
  The most powerful test at a given significance level α (size) is:
    reject H₀ if λ(y) = L(θ₁; y)/L(θ₀; y) > c_α
• Critical region: C = {λ(y) > c_α} with P[λ > c_α | H₀] = α
• Power: probability to reject under the alternative: P[λ > c_α | Hₐ]
• The NP lemma is used to compute the power envelope
• If there exists a monotonic transformation λ* = f(λ) such that the distribution under H₀ does not depend on θ₁: UMP test
• p-value (marginal significance level):
    p-value = 1 − F⁰_λ[λ(y)]
• Under the null hypothesis the p-value is U(0, 1) distributed
• The p-value is NOT the probability that the null hypothesis is correct!
• Testing j linear hypotheses on β:
    H₀: Rβ = r,  r is j × 1
  or: β = Hφ + h,  φ is (k − j) × 1
  where R = H⊥′ and r = H⊥′h
• Estimation under H₀:
    β̂_r = β̂ + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂)
• Inserting the null hypothesis gives:
    y − Xh = XHφ + u
  ⇒ OLS estimator of φ
• LR statistic:
    λ(y, X) = max_{θ∈Ω} L(θ; y, X) / max_{θ∈Ω₀} L(θ; y, X),  with Ω₀ ⊂ Ω
• Maximum of the likelihood function max_θ L(θ; y, X):
    L(θ̂) = (2πσ̂²)^{−n/2} exp(−n/2)
  where σ̂² = n⁻¹û′û. It follows that
    λ(y, X) = (σ̂²_r / σ̂²)^{n/2} = (û_r′û_r / û′û)^{n/2}
• Transformation of the test statistic gives:
    F = ((n − k)/j)(λ^{2/n} − 1) = ((n − k)/j) · (û_r′û_r − û′û)/û′û ∼ F_{j, n−k}
• Generalized LR test:
    2[ℓ(θ̂) − ℓ(θ̂_r)] = n[log(σ̂²_r) − log(σ̂²)] ∼ χ²_j
  where ℓ(·) = log L(·)
• Wald test: under H₀,
    Rβ̂ − r ∼ N(0, σ²R(X′X)⁻¹R′)
  it follows that
    (Rβ̂ − r)′[σ²R(X′X)⁻¹R′]⁻¹(Rβ̂ − r) ∼ χ²_j
• LM (score) test:
    LM = (∂ℓ(θ̂_r)/∂θ′) I(θ̂_r)⁻¹ (∂ℓ(θ̂_r)/∂θ)
       = (1/σ̂²_r) û_r′X(X′X)⁻¹X′û_r = n·R²_u
  All parameters are estimated under the null hypothesis!
Bootstrap test
• Idea: replace the unknown distribution of u by its empirical distribution
• Random variables with the empirical distribution are generated by drawing with replacement
• Estimate the model under the null hypothesis and generate pseudo data as
    y* = Xβ̂_r + u*
  where u* is obtained by resampling from the residual vector û = y − Xβ̂
• Estimate β from B pseudo datasets of the model: β̂⁽ᵇ⁾, b = 1, ..., B, and form the test statistic λ⁽ᵇ⁾
• Critical values (or p-values) are obtained from the (1 − α) quantile of the distribution of λ⁽ᵇ⁾
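The residual-bootstrap recipe above can be sketched in a few lines of numpy (an illustration with made-up data; here H₀ is a zero slope, so the restricted model is a regression on the constant only):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + rng.normal(size=n)          # H0: slope = 0 is true in this DGP

def tstat_slope(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ b
    s2 = u @ u / (len(y) - X.shape[1])
    V = s2 * np.linalg.inv(X.T @ X)
    return b[1] / np.sqrt(V[1, 1])

t_obs = tstat_slope(X, y)

# estimate under H0 (slope = 0): regress y on the constant only
b_r = y.mean()
u_r = y - b_r

B = 499
t_boot = np.empty(B)
for b in range(B):
    u_star = rng.choice(u_r, size=n, replace=True)  # draw WITH replacement
    y_star = b_r + u_star                           # pseudo data under H0
    t_boot[b] = tstat_slope(X, y_star)

crit = np.quantile(np.abs(t_boot), 0.95)  # bootstrap 5% critical value
reject = abs(t_obs) > crit
```

The bootstrap critical value sits close to the usual t critical value here because the errors are in fact normal.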
1.4 Specification tests
(i) Durbin-Watson test
    dw = Σ_{t=2}^n (û_t − û_{t−1})² / Σ_{t=1}^n û²_t ≈ 2(1 − ρ̂)
where ρ̂ is the OLS estimator of ρ in u_t = ρu_{t−1} + e_t:
    ρ̂ = Σ_{t=2}^n û_t û_{t−1} / Σ_{t=2}^n û²_{t−1}
• Problem: the distribution of dw depends on X
  ⇒ tables with lower and upper bounds
• Critical values:
    P[dw < d_α(X)] = α,  e.g. α = 0.05
Durbin-Watson test: tables of critical bounds (omitted)
• Range of uncertainty:
    d_L = min_X d_α(X),  d_U = max_X d_α(X)
(ii) Breusch-Godfrey test
• Autoregressive error model of order m:
    u_t = ρ₁u_{t−1} + ... + ρ_m u_{t−m} + ε_t
  Inserting residuals yields:
    y_t = x_t′β + ρ₁û_{t−1} + ... + ρ_m û_{t−m} + e_t
• Test of ρ₁ = ... = ρ_m = 0 (LM version)
• y_t may also be replaced by û_t
(iii) Box-Pierce test:
    Q_m = n Σ_{i=1}^m ρ̂²_i ∼a χ²_m
  Test of autocorrelation up to lag order m
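Both autocorrelation diagnostics can be sketched with numpy (illustrative DGP with AR(1) errors, not from the slides; the Breusch-Godfrey variant below regresses the residuals û_t on x_t and m lagged residuals):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 300, 2
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                 # AR(1) errors, rho = 0.6
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + u

X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
uh = y - X @ b

# Durbin-Watson statistic, dw ~= 2(1 - rho_hat)
dw = np.sum(np.diff(uh) ** 2) / np.sum(uh ** 2)

# Breusch-Godfrey LM test: auxiliary regression on x_t and m lagged residuals
lags = np.column_stack([uh[m - j:n - j] for j in range(1, m + 1)])
Z = np.column_stack([X[m:], lags])
target = uh[m:]
g = np.linalg.solve(Z.T @ Z, Z.T @ target)
e = target - Z @ g
R2 = 1 - e @ e / np.sum((target - target.mean()) ** 2)
LM = (n - m) * R2                     # asymptotically chi2(m) under H0
```

With ρ = 0.6 the DW statistic falls well below 2 and the LM statistic is far in the rejection region of χ²(2).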
Tests for heteroskedasticity
Goldfeld-Quandt statistic, based on 2 groups of observations:
    GQ = σ̂₁²/σ̂₂² ∼ F_{n₁−k, n₂−k}
Ordering according to time, size, etc.
Breusch-Pagan/Koenker: σ²_t = h(α₀ + z_t′α)
h(·): arbitrary monotonic function
    q_t = (û²_t − σ̂²) = z_t′α + e_t
    q = Zα + e
Test statistic:
    LM = nR²_u = q′Z(Z′Z)⁻¹Z′q / (q′q/n) ∼a χ²_m
For normally distributed errors: q′q/n →p 2σ⁴
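The Koenker form of the test is a two-line auxiliary regression; a numpy sketch with an invented heteroskedastic DGP (variance driven by a single z_t, so the statistic is χ²(1) under H₀):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
z = rng.normal(size=n)
x = rng.normal(size=n)
sigma_t = np.exp(0.5 * z)                 # error s.d. depends on z_t
y = 1.0 + x + sigma_t * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
uh = y - X @ b

# centred squared residuals regressed on z_t; LM = n * R^2
q = uh ** 2 - uh @ uh / n
Z = np.column_stack([np.ones(n), z])
g = np.linalg.solve(Z.T @ Z, Z.T @ q)
e = q - Z @ g
R2 = 1 - e @ e / np.sum((q - q.mean()) ** 2)
LM = n * R2                               # asymptotically chi2(1) under H0
```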
• White's information matrix test
  Idea: under H₀ we have
    (X′X)⁻¹X′ΩX(X′X)⁻¹ − σ²(X′X)⁻¹ = 0
  or E[(σ²_t − σ²)x_{it}x_{jt}] = 0 for all i, j ∈ {1, ..., k}
  This gives rise to choosing z_t as all possible cross products:
    z_t′ = [x²_{1t}, (x_{1t}x_{2t}), x²_{2t}, (x_{1t}x_{3t}), (x_{2t}x_{3t}), ...]
  ⇒ special case of the Breusch-Pagan/Koenker test
• ARCH LM test:
    E(u²_t | I_{t−1}) = α₀ + α₁u²_{t−1} + ... + α_p u²_{t−p}
  ⇒ popular in time series applications
  Special case with z_t = [û²_{t−1}, ..., û²_{t−p}]′
Test of normality
• The asymptotic properties of the OLS estimator do not depend on the validity of the normality assumption
• Deviations from the normal distribution are only relevant in very small samples
• Outliers may be modeled by mixing distributions
• Tests for normality are very sensitive to outliers
• Under the null hypothesis E(u³_t) = 0 and E(u⁴_t) = 3σ⁴
• Jarque-Bera test statistic:
    JB = n [ (1/6) m₃² + (1/24)(m₄ − 3)² ] →d χ²₂
  where
    m₃ = (1/(Tσ̂³)) Σ_{t=1}^T û³_t,   m₄ = (1/(Tσ̂⁴)) Σ_{t=1}^T û⁴_t
• Other tests: χ² and Kolmogorov-Smirnov tests
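The JB statistic is simple enough to compute by hand; a numpy sketch (illustrative samples, not from the slides) comparing normal draws with fat-tailed t(3) draws:

```python
import numpy as np

def jarque_bera(u):
    """JB = n * (m3^2/6 + (m4 - 3)^2/24) with ML scale estimate."""
    n = len(u)
    s = np.sqrt(np.mean(u ** 2))
    m3 = np.mean(u ** 3) / s ** 3    # standardized third moment
    m4 = np.mean(u ** 4) / s ** 4    # standardized fourth moment
    return n * (m3 ** 2 / 6 + (m4 - 3) ** 2 / 24)

rng = np.random.default_rng(5)
jb_norm = jarque_bera(rng.normal(size=2000))          # should be small
jb_fat = jarque_bera(rng.standard_t(df=3, size=2000)) # fat tails: large JB
```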
Tests for structural breaks
• Parameter change at period t = T*:
    y_t = x_t′β₁ + u_t  for t = 1, ..., T*
    y_t = x_t′β₂ + u_t  for t = T* + 1, ..., n
• In matrix notation:
    [y₁; y₂] = [X₁ 0; 0 X₂] [β₁; β₂] + [u₁; u₂]
• OLS estimator:
    β̂ = [X₁′X₁ 0; 0 X₂′X₂]⁻¹ [X₁′y₁; X₂′y₂]
       = [(X₁′X₁)⁻¹X₁′y₁; (X₂′X₂)⁻¹X₂′y₂] = [β̂₁; β̂₂]
  ⇒ separate estimation for both parts of the sample
• Chow test: test of the linear restriction
    H₀: β₁ = β₂
  F test:
    F = ((n − 2k)/k) (SSE₀ − SSE)/SSE ∼ F_{k, n−2k}
  with
    SSE₀ = û*′û*
    SSE = û₁′û₁ + û₂′û₂
  where
    û* = residuals from the model without structural break
    û₁ = residuals from the 1st subsample
    û₂ = residuals from the 2nd subsample
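A numpy sketch of the Chow test with a known break date (illustrative DGP in which the slope changes at T*; not from the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, Tstar = 200, 2, 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([0.5, 1.0]) + rng.normal(size=n)
y[Tstar:] += 1.0 * x[Tstar:]          # slope jumps from 1.0 to 2.0 at T*

def ssr(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ b
    return u @ u

sse0 = ssr(X, y)                                       # pooled (restricted)
sse = ssr(X[:Tstar], y[:Tstar]) + ssr(X[Tstar:], y[Tstar:])
F = (n - 2 * k) / k * (sse0 - sse) / sse               # ~ F(k, n-2k) under H0
```

Because the restricted model is nested, SSE₀ ≥ SSE always holds; the break in the DGP makes F fall far beyond the F(2, 196) critical value.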
Quandt-Andrews test
• T* is unknown
• Supremum tests:
    sup-LM = max_{T*} LM(T*)
• Relative break date: τ* = T*/n
• Searching in the interval [τ₀, 1 − τ₀] (often τ₀ = 0.10)
• The limiting distribution can be represented as
    sup_{τ∈[τ₀,1−τ₀]} [τW(1) − W(τ)]′[τW(1) − W(τ)] / (τ(1 − τ))
• Critical values are presented in Andrews (1993). They are considerably larger than those of the χ²_k distribution
Critical values for the Quandt-Andrews test (table omitted)
1.4 Searching for the "Correct Specification"
a) Should "insignificant variables" be dropped from the regression?
• Including irrelevant variables increases standard errors
• "General-to-specific" approach ("PcGets", Hendry/Krolzig 2001):
  1. Ascertain that the general statistical model is congruent.
  2. Eliminate a variable that satisfies the simplification criteria.
  3. Check that the simplified model remains congruent.
  4. Continue steps 2 and 3 until none of the remaining variables can be eliminated.
• The final model is the result of a random process and is path-dependent
• Selection criteria (e.g. a significance level of 5 percent) are arbitrary and not "optimal"
• Pretest bias: (a) the estimator is biased toward zero; (b) t-statistics are oversized if variables are correlated
MSE of the pretest estimator (figures omitted)
b) Multicollinearity
• Variables are nearly collinear (highly correlated)
• Some eigenvalue of X′X is very small (condition index λ_max/λ_min > 30)
• High R², low t-statistics
• Estimators are still unbiased!
• In a regression model with a constant and 2 variables:
    var(β̂₁) = 1/(1 − r²₁₂) · S₁
  where 1/(1 − r²₁₂) is the variance inflation factor and S₁ is the variance of the estimator that ignores x₂
• Possible strategies: (1) differences (2) ratios (3) prior information (4) biased estimators (Ridge, Stein)
1.5 The Generalized Regression Model
• Nonscalar (nonspherical) covariance matrix:
    E(uu′) = Ω ≠ σ²Iₙ
• Heteroskedastic errors:
    Ω = diag(σ₁², σ₂², ..., σₙ²)
• Autocorrelated errors:
    Ω = σ²_u ·
        [ 1        ρ₁       ...  ρ_{n−1} ]
        [ ρ₁       1        ...  ρ_{n−2} ]
        [ ...               ...  ...     ]
        [ ρ_{n−1}  ρ_{n−2}  ...  1       ]
• Properties of the OLS estimator:
    E(β̂) = β
    Σ_β̂ = E(β̂ − β)(β̂ − β)′ = (X′X)⁻¹X′ΩX(X′X)⁻¹
  ⇒ unbiased (and consistent) but inefficient
• Standard errors and t-statistics are biased
• Robust standard errors for heteroskedastic errors (White 1980):
    (1/n) Σ_{t=1}^n û²_t x_t x_t′ →p lim_{n→∞} (1/n) X′ΩX
  where û_t is the residual from the OLS estimation
• Extension to autocorrelated errors ("HAC standard errors")
GLS estimator
• Factorization:
    Ω⁻¹ = ΨΨ′,  where Ψ = Ω^{−1/2}
• Transformation:
    Ψ′y = Ψ′Xβ + Ψ′u
    y* = X*β + u*
• Covariance matrix of u*:
    E(u*u*′) = Ψ′(ΨΨ′)⁻¹Ψ = Iₙ
• GLS estimator:
    β̂ = (X*′X*)⁻¹X*′y* = (X′ΨΨ′X)⁻¹X′ΨΨ′y = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y
• GLS is BLUE and MVUE (under normality)
• Feasible (estimated) GLS estimator (FGLS, EGLS):
  ⇒ replace Ω by Ω̂
• GLS transformation of the heteroskedastic model, Ω = diag(σ₁², ..., σₙ²):
    (1/σ_t) y_t = ((1/σ_t) x_t)′β + (1/σ_t) u_t
• Estimating the variance function
    σ²_t = z_t′α
  Using E(û²_t) ≈ σ²_t we have
    û²_t = z_t′α + v*_t
• OLS estimation yields:
    σ̂²_t = z_t′α̂
• FGLS estimator:
    (1/σ̂_t) y_t = ((1/σ̂_t) x_t)′β + (1/σ̂_t) u_t
• A Monte Carlo experiment:
    y_t = βx_t + u_t
  with two subsamples:
    t = 1, ..., n/2:      u_t ∼ N(0, 1), x_t ∼ N(0, 1)
    t = n/2 + 1, ..., n:  u_t ∼ N(0, λ), x_t ∼ N(0, 2)
  n = 100, 5000 MC replications
Results: standard deviations

  λ   OLS     GLS     σ²(X′X)⁻¹  White   (X′Ω̂⁻¹X)⁻¹
  1   0.0645  0.0648  0.0638     0.0622  0.0629
  2   0.0863  0.0834  0.0780     0.0830  0.0809
  3   0.1005  0.0928  0.0900     0.0994  0.0917
  4   0.1178  0.1012  0.1005     0.1131  0.0990
  5   0.1270  0.1066  0.1102     0.1259  0.1048
 10   0.1838  0.1232  0.1489     0.1754  0.1199
 20   0.2559  0.1310  0.2056     0.2460  0.1322
• Estimating the autocorrelated model:
  AR(1) error process
    u_t = ρu_{t−1} + ε_t
  where the process starts at t = −∞ and |ρ| < 1,
    E(ε_t) = 0 for all t
    E(ε²_t) = σ²_ε for all t
    E(ε_t ε_s) = 0 for all t ≠ s
It follows that
    Var(u_t) ≡ σ²_u = σ²_ε / (1 − ρ²)
• Transformed regression:
    y_t − ρy_{t−1} = (x_t − ρx_{t−1})′β + ε_t  for t = 2, 3, ..., n
• First observation:
    √(1 − ρ²) y₁ = (√(1 − ρ²) x₁)′β + u*₁
  where Var(u*₁) = σ²_ε
• 2-step estimator:
    û_t = y_t − x_t′β̂
  and OLS regression to obtain ρ̂:
    û_t = ρû_{t−1} + e_t
  ⇒ perform the GLS transformation and re-estimate the model using OLS
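The 2-step procedure (including the Prais-Winsten treatment of the first observation) can be sketched in numpy; the DGP below is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n, rho = 300, 0.7
x = rng.normal(size=n)
eps = rng.normal(size=n)
u = np.zeros(n)
u[0] = eps[0] / np.sqrt(1 - rho ** 2)   # stationary start
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

# step 1: OLS residuals, estimate rho by regressing u_t on u_{t-1}
b0 = ols(X, y)
uh = y - X @ b0
rho_hat = (uh[1:] @ uh[:-1]) / (uh[:-1] @ uh[:-1])

# step 2: quasi-difference the data (first row scaled by sqrt(1-rho^2))
c = np.sqrt(1 - rho_hat ** 2)
ys = np.concatenate([[c * y[0]], y[1:] - rho_hat * y[:-1]])
Xs = np.vstack([c * X[0], X[1:] - rho_hat * X[:-1]])
b_fgls = ols(Xs, ys)
```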
Relative efficiency: AR(1) model
• Data generating process:
    y_t = x_t + u_t
  where u*_t = λu*_{t−1} + ε_t and u_t = u*_t/σ_{u*}
• Results of the Monte Carlo experiment:

Standard deviations
  λ     OLS     GLS     σ²(X′X)⁻¹  N-W(6)  (X′Ω̂⁻¹X)⁻¹
  0.0   0.0880  0.0891  0.0879     0.0819  0.0870
  0.2   0.1035  0.1023  0.0934     0.0943  0.0983
  0.4   0.1401  0.1219  0.1142     0.1246  0.1183
  0.6   0.2318  0.1583  0.1701     0.2004  0.1557
  0.8   0.6048  0.2742  0.3966     0.5051  0.2666
  0.9   1.6395  0.4838  1.0131     1.3501  0.4882
1.6 Nonlinear models
Nonlinear relationship between y_t and x_t:
    y_t = h(x_t, β) + u_t
    y = h(X, β) + u
where u_t is i.i.d. N(0, σ²)
• Linear Taylor series expansion:
    h(X, β) ≈ h(X, β₀) + (∂h(X, β₀)/∂β′)(β − β₀)
  Pseudo-linear model:
    y_t(β₀) = z_t(β₀)′β + u_t
    y(β₀) = Z(β₀)β + u
where
    y(β₀) = y − h(X, β₀) + Z(β₀)β₀
    Z(β₀) = ∂h(X, β₀)/∂β′
    β̂ = [Z(β₀)′Z(β₀)]⁻¹Z(β₀)′y(β₀)
       = β₀ + [Z(β₀)′Z(β₀)]⁻¹Z(β₀)′[y − h(X, β₀)]
• Nonlinear LS estimation:
    S(β) = [y − h(X, β)]′[y − h(X, β)]
  First-order condition:
    ∂S(β)/∂β = −2Z(β)′[y − h(X, β)] = 0
  ⇒ nonlinear LS estimation is equivalent to estimation of the pseudo-linear model
• Gauss-Newton (or scoring) algorithm:
  From the pseudo-linear model:
    β_{m+1} = β_m + [Z(β_m)′Z(β_m)]⁻¹Z(β_m)′[y − h(X, β_m)]
  ⇒ iterate until |β_{m+1} − β_m| → 0
• Distribution of the LS estimator:
    β̂ ∼a N(β, Σ̂_β̂)
  where
    Σ̂_β̂ = σ̂²[Z(β̂)′Z(β̂)]⁻¹
  and
    σ̂² = (1/(n − k)) [y − h(X, β̂)]′[y − h(X, β̂)]
2. Instrumental variables and GMM estimation
"Extremum estimators":
    θ̂ = argmax_{θ∈Ω} m(θ, y, X)
Alternative objective functions:

  estimator   m(θ; y, X)
  ML          ℓ(θ)
  GMM         m̄(θ)′Λ m̄(θ)
  MD          [π̂ − g(θ)]′Σ̂_π̂⁻¹[π̂ − g(θ)]

2.1 IV estimator
• IV estimator for the linear regression
    y = Xβ + u
  for the case E(X′u) ≠ 0
• Instrumental variables: Z is n × ℓ (ℓ ≥ k)
• Moment condition: E(Z′u) = 0
• Transformed equation:
    Z′y = Z′Xβ + Z′u
    y* = X*β + u*
  where E(u*u*′) = σ²E(Z′Z)
  GLS yields:
    β̂_IV = [X*′(Z′Z)⁻¹X*]⁻¹X*′(Z′Z)⁻¹y*
          = [X′Z(Z′Z)⁻¹Z′X]⁻¹X′Z(Z′Z)⁻¹Z′y
• If
    lim_{n→∞} (1/n)E(Z′u) = 0  and  lim_{n→∞} (1/n)E(Z′X) = Ψ
  with rk(Ψ) = k, then the IV estimator is consistent
Two-stage least-squares interpretation
• The IV estimator is identical to the (Gaussian) ML estimator in
    y = Xβ + u
    X = ZΠ + V  with  u = Vδ + ε
• Replacing V by V̂ = M_z X and using I = M_z + P_z yields
    y = Xβ + V̂δ + e = P_z Xβ + V̂(δ + β) + e = X̂β + u*
  where X̂ = ZΠ̂ is the LS prediction of X based on Z
• Note that plim (1/n)Z′u* = 0
  ⇒ replacing X by X̂ yields a consistent estimator
• Since P_z is idempotent we have
    β̂_2s = (X̂′X̂)⁻¹X̂′y = (X′P_z X)⁻¹X′P_z y = β̂_iv
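A numpy sketch of the 2SLS/IV equivalence with an invented endogenous DGP (the error u loads on the first-stage disturbance v, so OLS is biased while IV is not):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # instruments
v = rng.normal(size=n)
eps = rng.normal(size=n)
x = 1.0 * z1 + 0.5 * z2 + v                       # first stage
u = 0.8 * v + eps                                 # E(x u) != 0
y = 2.0 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)            # projection on Z
b_iv = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
```

The OLS slope is pulled away from the true value 2 by the endogeneity, while the IV slope is close to it.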
• Just-identified case: ℓ = k
    β̂_IV = (Z′X)⁻¹Z′y
• Asymptotic distribution:
    √n(β̂_IV − β) →d N(0, σ²[S_xz S_zz⁻¹ S_zx]⁻¹)
  where S_ab = plim_{n→∞} (1/n) Σ_{t=1}^n a_t b_t′
• Sargan test: test of H₀: E(Z′u) = 0
  LM test of γ = 0 in the auxiliary regression
    û = Zγ + e
  yielding
    S = (1/σ̂²) û′Z(Z′Z)⁻¹Z′û
  The test is χ² distributed with ℓ − k degrees of freedom
Test for exogenous regressors
• H₀: X₂ in y = X₁β + X₂γ + u is exogenous:
    H₀: E(X₂′u) = 0  or  δ = 0 in u = Vδ + ε
  given the set of all exogenous variables Z = [X₁, X₃]
• Durbin-Wu-Hausman test:
    (β̂_iv − β̂)′V̂_Δ⁻¹(β̂_iv − β̂) ∼ χ²_k
  where V_Δ = Var(β̂_iv) − Var(β̂)
• OLS is equivalent to IV if δ = 0; inserting u = Vδ + ε and replacing V by the first-stage residuals V̂ gives
    y = X₁β + X₂γ + V̂δ + error
  ⇒ test whether the coefficients on the residuals V̂ are significant
2.2 GMM estimator
• ℓ × 1 moment condition, e.g.
    E[m_t(β)] = E[z_t(y_t − x_t′β)] = 0
• Minimization of the criterion function:
    θ̂_gmm = argmin_{θ∈Ω} [Σ_{t=1}^n m_t(θ)]′ W_n [Σ_{t=1}^n m_t(θ)]
• With m_t(β) = z_t(y_t − x_t′β):
    θ̂_gmm = (X′Z W_n Z′X)⁻¹X′Z W_n Z′y
  For nonlinear moment conditions: replace Z′X by ∂Σ m_t(θ)/∂θ′
• Distribution:
    Var(θ̂_gmm) = (X′Z W_n Z′X)⁻¹ X′Z W_n Σ_m W_n Z′X (X′Z W_n Z′X)⁻¹
  where Σ_m = E[m_t(θ₀)m_t(θ₀)′]
• Optimal weight matrix:
    W_n = [E Σ_{t=1}^n m_t(θ₀)m_t(θ₀)′]⁻¹
  Note: if Z is independent of u we have
    E Σ_{t=1}^n m_t(β)m_t(β)′ = E(Z′ΩZ)
  where E(uu′) = Ω
• Estimated weight matrix:
    Ŵ = [Σ_{t=1}^n m_t(θ̂)m_t(θ̂)′]⁻¹
  such that for the linear model:
    Var(θ̂_gmm) = (X′Z Ŵ Z′X)⁻¹
  ⇒ if Ω = σ²I, GMM = IV
• Serially correlated errors:
    Ŵ = (E[Σ_{t=1}^n m_t(θ)][Σ_{t=1}^n m_t(θ)′])⁻¹
  estimated by
    Ŵ⁻¹ = Σ_{j=−ℓ}^{ℓ} g_j Ĉ_j
  where for j ≥ 0: Ĉ_j = Σ_{t=j+1}^n m_t(θ̂)m_{t−j}(θ̂)′ and Ĉ_j = Ĉ′_{−j} for j < 0,
  and g_j is some weight function, e.g. g_j = (ℓ − |j| + 1)/(ℓ + 1)
2.3 Tests based on IV/GMM
(i) Hansen-Sargan statistic:
  H₀: the model is correctly specified,
    E[m_t(θ)] = 0
• Test statistic:
    Q(θ̂) = [Σ_{t=1}^n m_t(θ̂)]′ Ŵ [Σ_{t=1}^n m_t(θ̂)] →d χ²_{ℓ−k}
• Pseudo LR statistic:
    Q(θ̂_r) − Q(θ̂) →d χ²_j
• Wald test: straightforward
2.4 Weak instruments
• Two-stage representation:
    y = Xβ + u
    X = ZΠ + V,  u = Vγ + ε
• Strong instruments imply rk(Π) = k
• Weak instruments: Π = C/√n, where rk(C) = k
  ⇒ the IV estimator is inconsistent and not asymptotically normally distributed
• Indication of a weak instrument (k = 1): the concentration parameter
    μ² = (1/σ²_v) Π′Z′ZΠ
• F-statistic for H₀: Π = 0:
    F = μ²/ℓ
• Distribution if there is NO identification:
    β̂_iv = β + Z′u/Z′v,  which is not N(0, σ²_β̂) distributed
• Stock-Yogo test: H₀: μ²/√n ≤ c₀
  ⇒ instruments are "too weak"
• Generalization to k > 1: test of H₀: rk(Π) ≥ k (e.g. Kleibergen/Paap)
• Testing hypotheses under weak/no identification:
    y − Xβ₀ = X(β − β₀) + u = ZΠ(β − β₀) + u* = Zφ + u*
• Anderson-Rubin statistic: the F test of φ = 0 is χ²_ℓ distributed
• The LM test of φ = 0 is χ²_k distributed (Kleibergen 2002)
3.1 Univariate time series models
Basic concepts
• Stochastic processes: Y_t(ω) with t = 1, 2, ..., T
  ω: (vector of) random variables
• White noise process:
    E(Y_t) = μ (usually μ = 0)
    var(Y_t) = σ²
    E(Y_t − μ)(Y_s − μ) = 0 for t ≠ s
• Weak stationarity:
    E(Y_t) = μ for all t
    E(Y_t − μ)² = σ² for all t
    E[(Y_t − μ)(Y_s − μ)] depends only on |t − s|, for all t and s
• Strict stationarity:
    f(Y_t, Y_{t−1}, ..., Y_{t−m}) = f(Y_{t−h}, Y_{t−h−1}, ..., Y_{t−h−m})
  for all m and h
• The autocovariance function of a stationary time series:
    γ_k = E(Y_t − μ)(Y_{t−k} − μ)
  γ₀ is the variance
• Autocorrelation function:
    ρ_k = γ_k/γ₀
• Estimation of γ_k:
    c_k = (1/T) Σ_{t=k+1}^T (y_t − ȳ)(y_{t−k} − ȳ)
    r_k = c_k/c₀
• Modifications:
  - correction for the k missing observations
  - degrees of freedom correction
Are economic data stationary?
Usually some transformations are required to obtain stationary series:
• logarithm (to stabilize variances)
• first differences (to remove the trend)
• annual differences (to remove seasonally changing means)
• seasonal adjustment
• deviations from an estimated trend
• adjustment for structural breaks (dummy variables)
• normalization relative to a scale variable
3.2 Seasonality and trends
• Component models:
    Y_t = G_t + S_t + u_t  or  Y_t = G_t · S_t · u_t
a) G_t: trend component
  • polynomial trend: G_t = a₀ + a₁t + ... + a_q t^q
b) S_t: seasonal component
  • deterministic model: s_t = β₁d_{1,t} + ... + β₁₂d_{12,t},
    where d_{j,t} is a seasonal dummy variable
  • stochastic seasonals: s_t = −s_{t−1} − ... − s_{t−11} + ε_t,
    where ε_t is an error term
Seasonal adjustment procedure (CENSUS)
• First version of 1954 (U.S. Bureau of the Census)
• Version X-11 of 1965, X-12-ARIMA of 1998
(1) Compute the raw seasonal component by applying centered 12-month moving averages:
    s*_t = y_t / D₁₂(y_t) ≈ s_t u_t
(2) Smoothing of the raw components: ŝ_t.
(3) Outlier adjustment: if
    |s*_t − ŝ_t| > 2σ̂(ŝ_t)
  ⇒ replace by the average of neighboring values.
(4) Smoothing of the outlier-adjusted seasonal component and computation of the (preliminary) seasonally adjusted series:
    yᵃ_t = y_t / ŝ_t
(5) Application of the 15-point trend filter (Spencer):
    ĝ_t = F¹⁵(y_t)
  Computation of new raw seasonal components:
    y_t / ĝ_t ≈ s_t · u_t
  Continue with (2).
Hodrick-Prescott (HP) filter:
    Σ_{t=1}^T (y_t − g_t)²  ["fit"]  +  λ Σ_{t=2}^{T−1} (Δg_{t+1} − Δg_t)²  ["smoothness"]  → min!
• Usual values:
    λ = 100 for annual data
    λ = 1600 for quarterly data
    λ = 14400 for monthly data
• Differentiation with respect to g_t yields:
    t = 1:        y₁ = (1 + λ)g₁ − 2λg₂ + λg₃
    t = 2:        y₂ = −2λg₁ + (1 + 5λ)g₂ − 4λg₃ + λg₄
    2 < t < T−1:  y_t = λg_{t−2} − 4λg_{t−1} + (1 + 6λ)g_t − 4λg_{t+1} + λg_{t+2}
    t = T − 1:    y_{T−1} = λg_{T−3} − 4λg_{T−2} + (1 + 5λ)g_{T−1} − 2λg_T
    t = T:        y_T = λg_{T−2} − 2λg_{T−1} + (1 + λ)g_T
  ⇒ solve for g = [g₁, ..., g_T]′
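The first-order conditions above are exactly the linear system (I + λD′D)g = y, where D is the (T−2) × T second-difference matrix; a numpy sketch with an invented trend-plus-cycle series:

```python
import numpy as np

def hp_filter(y, lam=1600.0):
    """HP trend: solve (I + lam * D'D) g = y, D = second-difference matrix.
    The rows of I + lam*D'D reproduce the slide's first-order conditions."""
    T = len(y)
    D = np.zeros((T - 2, T))
    for t in range(T - 2):
        D[t, t], D[t, t + 1], D[t, t + 2] = 1.0, -2.0, 1.0
    return np.linalg.solve(np.eye(T) + lam * D.T @ D, y)

rng = np.random.default_rng(9)
t = np.arange(120)
y = 0.05 * t + np.sin(t / 6) + 0.3 * rng.normal(size=120)
g = hp_filter(y, lam=1600.0)
```

The extracted trend g is much smoother than y: its second differences are heavily penalized.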
3.3 Autoregressive models
• White noise process: ε_t with
    E(ε_t) = 0 and E(ε²_t) = σ²
• Autoregressive AR(p) process:
    Y_t = α₁Y_{t−1} + α₂Y_{t−2} + ... + α_p Y_{t−p} + ε_t
• The lag operator:
    L Y_t ≡ Y_{t−1},  L^k Y_t ≡ Y_{t−k}
• Notation using lag polynomials:
    α(L)Y_t = ε_t
  where α(L) = 1 − α₁L − ... − α_p L^p
The AR(1) model
    Y_t = αY_{t−1} + ε_t
• Successive substitution yields:
    Y_t = ε_t + αε_{t−1} + α²ε_{t−2} + α³ε_{t−3} + ...
• Properties for |α| < 1:
    γ₀ = var(Y_t) = σ²/(1 − α²)
    γ_k = cov(Y_t, Y_{t−k}) = α^k σ²/(1 − α²)
    autocorrelation: ρ_k = γ_k/γ₀ = α^k
• Properties for |α| = 1 ("random walk"):
    Y_t = Y₀ + ε₁ + ε₂ + ... + ε_t
    var(Y_t) = var(Y₀) + tσ²
    cov(Y_t, Y_{t−k}) = var(Y₀) + (t − k)σ²
3.4 MA and ARMA models
• Moving-average MA(q) process:
    Y_t = ε_t − β₁ε_{t−1} − ... − β_q ε_{t−q}
    Y_t = β(L)ε_t
  where β(L) = 1 − β₁L − ... − β_q L^q
• Special case MA(1): Y_t = ε_t − βε_{t−1}
    γ₀ = E(Y²_t) = (1 + β²)σ²
    γ_k = E(Y_t Y_{t−k}) = −βσ² for k = 1, and 0 for k ≥ 2
• ARMA(p, q) process:
    Y_t = α₁Y_{t−1} + ... + α_p Y_{t−p} + ε_t − β₁ε_{t−1} − ... − β_q ε_{t−q}
    α(L)Y_t = β(L)ε_t
Wold representation
• AR representation:
    β(L)⁻¹α(L)Y_t = γ(L)Y_t = ε_t
  where γ(L) = 1 − γ₁L − γ₂L² − ...
• All stationary time series have an MA representation:
    Y_t = ε_t + φ₁ε_{t−1} + φ₂ε_{t−2} + ... = φ(L)ε_t
  where
    ε_t = Y_t − E(Y_t | F_{t−1})
  and F_t = {Y_t, Y_{t−1}, ...}
• ARMA models: φ(L) = β(L)/α(L)
3.5 Unit root tests
• Test of the null hypothesis α = 1 in the AR(1) model:
    Y_t = αY_{t−1} + ε_t
    ΔY_t = (α − 1)Y_{t−1} + ε_t = φY_{t−1} + ε_t
• Problem: φ̂ (and its t statistic) is not normally distributed
• Let W(a) denote a Brownian motion. Then
    Tφ̂ →d [∫₀¹ W(a)dW(a)] / [∫₀¹ W(a)²da]
    t_φ →d [∫₀¹ W(a)dW(a)] / √(∫₀¹ W(a)²da)
• For critical values see e.g. Hamilton (1994)
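The Dickey-Fuller regression is a one-line least-squares fit; a numpy sketch (simulated series, not from the slides) contrasting a random walk with a stationary AR(1):

```python
import numpy as np

def df_tstat(y):
    """Dickey-Fuller regression (no constant): dy_t = phi * y_{t-1} + e_t.
    Returns the t statistic of phi; it is NOT standard normal under phi=0."""
    dy, ylag = np.diff(y), y[:-1]
    phi = (ylag @ dy) / (ylag @ ylag)
    e = dy - phi * ylag
    s2 = e @ e / (len(dy) - 1)
    return phi / np.sqrt(s2 / (ylag @ ylag))

rng = np.random.default_rng(10)
T = 500
rw = np.cumsum(rng.normal(size=T))     # random walk: unit root holds
ar = np.zeros(T)                       # stationary AR(1), alpha = 0.5
for t in range(1, T):
    ar[t] = 0.5 * ar[t - 1] + rng.normal()

t_rw, t_ar = df_tstat(rw), df_tstat(ar)
```

For the stationary series the t statistic is far below the Dickey-Fuller 5% bound (about −1.95 for this specification), while for the random walk it typically is not.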
Extensions
• Deterministic terms:
    ΔY_t = c + φY_{t−1} + ε_t
    ΔY_t = c + βt + φY_{t−1} + ε_t
  ⇒ the null distribution changes
• AR(p) process:
    H₀: α₁ + ... + α_p = 1, or α(1) = 0
  Test of φ = 0 in the model:
    ΔY_t = φY_{t−1} + γ₁ΔY_{t−1} + ... + γ_{p−1}ΔY_{t−p+1} + ε_t
  ⇒ the null distribution does not change
3.6 Dynamic regression equations
• ADL(p, q) regression model:
    α(L)y_t = β(L)x_t + u_t
  Assumptions: (i) the roots of α(z) = 0 lie outside the unit circle, (ii) x_t is stationary
• Partial adjustment:
  • desired (long-run) level:
      y*_t = βx_t + u_t
  • partial adjustment:
      y_t = (1 − α)y*_t + αy_{t−1}
  • inserting yields
      y_t = αy_{t−1} + γx_t + u*_t
    with γ = (1 − α)β
3.7 The spurious regression problem
• Assume:
    y_t ∼ I(1),  x_t ∼ I(1)
  ⇒ in general y_t − x_tβ is also I(1)
• Spurious regression: if y_t and x_t are independent random walks:
  • t-values are often significant
  • large R²
  • low Durbin-Watson statistic
• Common trend model ("cointegration"):
    x_t = r_t + ε_{1t} ∼ I(1)
    y_t = βr_t + ε_{2t} ∼ I(1)
    y_t − βx_t = ε_{2t} − βε_{1t} = u_t ∼ I(0)
3.8 Cointegration: single equation approach
• Properties of OLS:
  • β̂ is "super-consistent"
  • robust against endogenous x_t
  • efficient only if (i) x_t is exogenous and (ii) e_t is serially uncorrelated
  • t statistics are generally invalid
• Test for cointegration:
  Step 1: ADF tests of y_t and x_t
  Step 2: ADF test of the residuals û_t = y_t − x_t β̂
• Critical values depend also on k
Dynamic OLS estimator (DOLS)
• Decomposition of the errors:
    u_t = Σ_{j=−q}^{p} γ_j′ Δx_{t−j} + ε_t
• Inserting into the regression equation yields:
    y_t = β′x_t + Σ_{j=−q}^{p} γ_j′ Δx_{t−j} + ε_t
  ⇒ the DOLS estimator is asymptotically efficient
• Autocorrelation of ε_t:
  - Newey-West HAC standard errors (Saikkonen 1991)
  - GLS based on AR(p) errors (Stock-Watson 2003)
Engle-Granger two-step approach
• Error correction representation:
    y_t = αy_{t−1} + β₀′x_t + β₁′x_{t−1} + ε_t
  yields:
    Δy_t = γ(y_{t−1} − β′x_{t−1}) + δ′Δx_t + ε_t
  where (y_{t−1} − β′x_{t−1}) is the error correction term
• Replace β by β̂ (E/G 2-step estimator)
• EC test for cointegration:
    Δy_t = γ₁y_{t−1} + γ₂′x_{t−1} + δ′Δx_t + lags + ε_t
  t-test of H₀: γ₁ = 0
  Critical values are tabulated in Banerjee et al. (1999)
3.9 Dynamic systems: VAR models
• Time series vector:
    y_t = (y_{1t}, y_{2t}, ..., y_{Kt})′
• Vector autoregressive (VAR) model:
    y_t = A₁y_{t−1} + A₂y_{t−2} + ... + A_p y_{t−p} + u_t
    A(L)y_t = u_t
  where A(L) = I − A₁L − ... − A_p L^p
• Assumptions:
    E(u_t) = 0
    E(u_t u_t′) = Σ = [σ₁₁ σ₁₂ ... σ_{1K}; σ₂₁ σ₂₂ ... σ_{2K}; ...; σ_{K1} σ_{K2} ... σ_{KK}]
Estimation
• Regression format:
    y_t = c + A₁y_{t−1} + ... + A_p y_{t−p} + ε_t = Bx_t + ε_t
  where B = [c, A₁, ..., A_p] and x_t = (1, y_{t−1}′, ..., y_{t−p}′)′
• Differentiating with respect to B and Σ yields:
    B̂ = (Σ_{t=1}^T y_t x_t′)(Σ_{t=1}^T x_t x_t′)⁻¹
    Σ̂ = (1/T) Σ_{t=1}^T (y_t − B̂x_t)(y_t − B̂x_t)′
  Single-equation OLS is consistent and asymptotically efficient
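The multivariate LS formula B̂ = (Σ y_t x_t′)(Σ x_t x_t′)⁻¹ can be sketched in numpy for a simulated VAR(1) (coefficients invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(11)
T, K, p = 400, 2, 1
A1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])             # stable VAR(1) coefficients
y = np.zeros((T, K))
for t in range(1, T):
    y[t] = A1 @ y[t - 1] + rng.normal(size=K)

# regression format: y_t = B x_t + eps_t with x_t = (1, y_{t-1}')'
Y = y[p:].T                              # K x (T-p)
Xlag = np.vstack([np.ones(T - p), y[:-p].T])
B = (Y @ Xlag.T) @ np.linalg.inv(Xlag @ Xlag.T)
resid = Y - B @ Xlag
Sigma = resid @ resid.T / (T - p)
A1_hat = B[:, 1:]                        # slope block of B = [c, A1]
```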
Granger causality
• Bivariate process:
    [a₁₁(L) a₁₂(L); a₂₁(L) a₂₂(L)] [x_t; y_t] = [ε₁t; ε₂t]
  y_t is not causal for x_t if
    H₀: a₁₂(L) = 0
• Test of the hypothesis using the first equation:
    x_t = α₁x_{t−1} + ... + α_p x_{t−p} + β₁y_{t−1} + ... + β_p y_{t−p} + ε₁t
• F test of β₁ = ... = β_p = 0
3.10 Structural (identified) VAR models
• Forecast errors ("innovations"):
    u_{t+1} = y_{t+1} − E(y_{t+1} | y_t, y_{t−1}, ...)
  Orthogonal shocks (recursive scheme):
    u₁t = ε₁t
    u₂t = ϱ₂₁ε₁t + ε₂t
    u₃t = ϱ₃₁ε₁t + ϱ₃₂ε₂t + ε₃t
    ...
    u_Kt = ϱ_{K1}ε₁t + ϱ_{K2}ε₂t + ... + ε_Kt
  where E(ε_it ε_jt) = 0 for i ≠ j
• Choleski decomposition:
    R = [1 0 ... 0; ϱ₂₁ 1 ... 0; ϱ₃₁ ϱ₃₂ 1 ... 0; ...; ϱ_{K1} ϱ_{K2} ... ϱ_{K,K−1} 1]
and u_t = Rε_t with
    E(ε_t ε_t′) = D = diag{var(ε₁t), ..., var(ε_Kt)}
so that
    Σ = E(u_t u_t′) = RDR′ = PP′
where P = RD^{1/2}
⇒ P is obtained from a Choleski factorization Σ = PP′
• Impulse response function:
    y_t = u_t + Φ₁u_{t−1} + Φ₂u_{t−2} + ...
        = Rε_t + Φ₁Rε_{t−1} + Φ₂Rε_{t−2} + ...
        = Θ₀ε_t + Θ₁ε_{t−1} + Θ₂ε_{t−2} + ...
where Θ_h = Φ_h R and
    θ_ij(h) = ∂y_{i,t+h}/∂ε_{jt}
⇒ the IRF θ_ij(h) represents the dynamic effect of the "shock" ε_{jt} on y_{i,t+h}
• Estimation of the impulse response function:
  1. Estimate the VAR model: Â₁, ..., Â_p, Σ̂
  2. Compute the MA representation using
       Φ̂_i = Σ_{j=1}^{i} Φ̂_{i−j} Â_j  and  Φ̂₀ = I_K
  3. Choleski decomposition: Σ̂ = P̂P̂′
  4. Impulse response matrices:
       Θ̂_h = Φ̂_h R̂
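The recursion Φ_h and the Choleski orthogonalization can be sketched in numpy; the VAR(1) coefficient and covariance below are assumed example values, not estimates from the slides. For a VAR(1), Φ_h = A₁^h, and using P (one-standard-deviation shocks) instead of R gives the same triangular impact pattern:

```python
import numpy as np

# assumed inputs: estimated VAR(1) coefficient and error covariance
A1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

H = 10
K = A1.shape[0]

# MA coefficients of the VAR(1): Phi_0 = I, Phi_h = A1 @ Phi_{h-1}
Phi = [np.eye(K)]
for h in range(1, H + 1):
    Phi.append(A1 @ Phi[-1])

# Choleski factor P (= R D^{1/2}); orthogonalized IRFs Theta_h = Phi_h P
P = np.linalg.cholesky(Sigma)
Theta = [Ph @ P for Ph in Phi]
# Theta[0] is lower triangular: shock 2 has no impact effect on y1
```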