Applied Econometrics
B.Sc. Economics (elective)
Winter term 2012/13
Prof. Dr. Jörg Breitung
Textbook recommendations
• Heij, C., P. de Boer, P.H. Franses, T. Kloek and H.K. van Dijk (2004), Econometric Methods with Applications in Business and Economics, Oxford University Press.
• Verbeek, M. (2012), A Guide to Modern Econometrics, Wiley, 4th ed.
• Greene, W.H. (2008), Econometric Analysis, 6th ed., Pearson.
• Asteriou, D. and S.G. Hall (2006), Applied Econometrics: A Modern Approach, Palgrave Macmillan.
• Vogelvang, B. (2005), Econometrics: Theory and Applications with EViews, Prentice Hall.
• Enders, W. (2004), Applied Econometric Time Series, 2nd ed., Wiley.
• Lütkepohl, H. (2005), New Introduction to Multiple Time Series Analysis, Berlin: Springer.
• Lütkepohl, H. and M. Krätzig (2004), Applied Time Series Econometrics, Cambridge University Press.
1.1. Basic results for the classical econometric model
• The linear model
    y_t = x_t′β + u_t
  or in matrix notation
    y = Xβ + u
  where y and u are n × 1 vectors, β is k × 1 and X is n × k
• Assumptions:
  (ia) X is deterministic
  (ib) X is stochastic
  (ii) X is of full column rank
  (iii) u ∼ N(0, σ²Iₙ) or u|X ∼ N(0, σ²Iₙ)
Multiple Regression
• OLS estimator:
    β̂ = argmin_β (y − Xβ)′(y − Xβ) = (X′X)⁻¹X′y
• Estimator for σ²:
    σ̂² = (1/(n − k)) (y − Xβ̂)′(y − Xβ̂)
• Maximum likelihood (ML) estimator
  Log-likelihood function assuming a normal distribution:
    ln L(β, σ²) = −(n/2) ln 2π − (n/2) ln σ² − (1/(2σ²)) (y − Xβ)′(y − Xβ)
• The ML and OLS estimators of β are identical under normality
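As an illustration (not part of the original slides; the design and seed below are made up), the OLS formulas can be checked numerically with a minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

# OLS: beta_hat = (X'X)^{-1} X'y (solve the normal equations, don't invert)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

s2 = u_hat @ u_hat / (n - k)   # unbiased estimator of sigma^2
s2_ml = u_hat @ u_hat / n      # ML estimator, biased by the factor (n-k)/n
```

The ML variance estimator is always smaller than the degrees-of-freedom corrected one, which mirrors E(σ̃²) = σ²(n − k)/n.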
• Frisch-Waugh-Lovell theorem:
    y = X₁β₁ + X₂β₂ + u
  Two-step estimator of β₂:
    β̂₂ = (X₂′M₁X₂)⁻¹X₂′M₁y
  where M₁ = Iₙ − P₁ and P₁ = X₁(X₁′X₁)⁻¹X₁′
• Scatterplot of (M₁y) against (M₁X₂)
• The R² between (M₁y) and (M₁X₂) is the partial R²
• Goodness of fit:
    R² = ESS/TSS = 1 − SSR/TSS = 1 − û′û / (y′y − nȳ²)
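The Frisch-Waugh-Lovell result can be verified directly: the coefficient on X₂ from the full regression equals the coefficient from regressing the residualized M₁y on M₁X₂. A small numpy sketch (illustrative data, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 1)) + 0.5 * X1[:, [1]]   # correlated with X1
y = X1 @ np.array([1.0, 0.5]) + 2.0 * X2[:, 0] + rng.normal(size=n)

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

# full regression on [X1, X2]
b_full = ols(np.hstack([X1, X2]), y)

# two-step FWL: residualize y and X2 on X1, then regress
P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
M1 = np.eye(n) - P1
b2 = ols(M1 @ X2, M1 @ y)
```

Both routes give the same estimate of β₂.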
Properties of the OLS estimator
a) Expectation:
  • E(β̂) = β
  • E(σ̂²) = σ² for σ̂² = û′û/(n − k)
  • the ML estimator σ̃² = û′û/n is biased: E(σ̃²) = σ²(n − k)/n
b) Distribution (assuming u ∼ N(0, σ²I)):
    β̂ ∼ N(β, Σ_β̂),  Σ_β̂ = σ²(X′X)⁻¹
    (n − k)σ̂²/σ² ∼ χ²_{n−k}
c) Efficiency:
  • β̂ is BLUE
  • under normality: β̂ and σ̂² are MVUE
Asymptotic properties
β̂ and σ̂² are consistent:
    β̂ →p β,  σ̂² →p σ²
Asymptotic distributions:
    √n(β̂ − β) →d N(0, Σ_β)
    √n(σ̂² − σ²) →d N(0, 2σ⁴)
where
    Σ_β = σ² (plim_{n→∞} (1/n) X′X)⁻¹
1.2 Testing Hypotheses
Basic concepts:
  - parameter space: θ ∈ Ω
  - null hypothesis H₀: θ ∈ Ω₀
  - alternative: Hₐ: θ ∈ Ω₁
  - observed test statistic: λ(y)
  - reject if λ(y) ∈ C (critical region)
• Neyman-Pearson lemma:
    H₀: θ = θ₀ vs. Hₐ: θ = θ₁
  The most powerful test at a given significance level α (size) is:
    reject H₀ if λ(y) = L(θ₁; y)/L(θ₀; y) > c_α
• Critical region: C = {λ(y) > c_α} with P[λ > c_α | H₀] = α
• Power: probability to reject under the alternative: P[λ > c_α | Hₐ]
• The NP lemma is used to compute the power envelope
• If there exists a monotonic transformation λ* = f(λ) such that the distribution under H₀ does not depend on θ₁: UMP test
• p-value (marginal significance level):
    p-value = 1 − F⁰_λ[λ(y)]
• Under the null hypothesis the p-value is U(0, 1) distributed
• The p-value is NOT the probability that the null hypothesis is correct!
• Testing j linear hypotheses on β:
    H₀: Rβ = r,  r is j × 1
  or: β = Hφ + h,  φ is (k − j) × 1
  where R = H⊥′ and r = H⊥′h
• Estimation under H₀:
    β̂_r = β̂ + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂)
• Inserting the null hypothesis gives:
    y − Xh = XHφ + u
  ⇒ OLS estimator of φ
• LR statistic:
    λ(y, X) = max_{θ∈Ω} L(θ; y, X) / max_{θ∈Ω₀} L(θ; y, X),  with Ω₀ ⊂ Ω
• Maximum of the likelihood function max_θ L(θ; y, X):
    L(θ̂) = (2πσ̂²)^{−n/2} exp(−n/2)
  where σ̂² = n⁻¹û′û. It follows that
    λ(y, X) = (σ̂²_r / σ̂²)^{n/2} = (û_r′û_r / û′û)^{n/2}
• Transformation of the test statistic gives:
    F = ((n − k)/j)(λ^{2/n} − 1) = ((n − k)/j) · (û_r′û_r − û′û)/û′û ∼ F_{j, n−k}
• Generalized LR test:
    2[ℓ(θ̂) − ℓ(θ̂_r)] = n[log(σ̂²_r) − log(σ̂²)] ∼ χ²_j
  where ℓ(·) = log L(·)
• Wald test: under H₀,
    Rβ̂ − r ∼ N(0, σ²R(X′X)⁻¹R′)
  it follows that
    (Rβ̂ − r)′[σ²R(X′X)⁻¹R′]⁻¹(Rβ̂ − r) ∼ χ²_j
• LM (score) test:
    LM = (∂ℓ(θ̂_r)/∂θ′) I(θ̂_r)⁻¹ (∂ℓ(θ̂_r)/∂θ)
       = (1/σ̂²_r) û_r′X(X′X)⁻¹X′û_r = n·R²_u
  All parameters are estimated under the null hypothesis!
Bootstrap test
• Idea: replace the unknown distribution of u by its empirical distribution
• Random variables with the empirical distribution are generated by drawing with replacement
• Estimate the model under the null hypothesis and generate pseudo data as
    y* = Xβ̂_r + u*
  where u* is obtained by resampling from the residual vector û = y − Xβ̂
• Estimate β from B pseudo datasets of the model: β̂⁽ᵇ⁾, b = 1, ..., B, and form the test statistic λ⁽ᵇ⁾
• Critical values (or p-values) are obtained from the (1 − α) quantile of the distribution of λ⁽ᵇ⁾
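The residual-bootstrap recipe above can be sketched in a few lines of numpy (an illustration with made-up data; here H₀ is a zero slope, so the restricted model is a regression on the constant only):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + rng.normal(size=n)          # H0: slope = 0 is true in this DGP

def tstat_slope(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ b
    s2 = u @ u / (len(y) - X.shape[1])
    V = s2 * np.linalg.inv(X.T @ X)
    return b[1] / np.sqrt(V[1, 1])

t_obs = tstat_slope(X, y)

# estimate under H0 (slope = 0): regress y on the constant only
b_r = y.mean()
u_r = y - b_r

B = 499
t_boot = np.empty(B)
for b in range(B):
    u_star = rng.choice(u_r, size=n, replace=True)  # draw WITH replacement
    y_star = b_r + u_star                           # pseudo data under H0
    t_boot[b] = tstat_slope(X, y_star)

crit = np.quantile(np.abs(t_boot), 0.95)  # bootstrap 5% critical value
reject = abs(t_obs) > crit
```

The bootstrap critical value sits close to the usual t critical value here because the errors are in fact normal.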
1.4 Specification tests
(i) Durbin-Watson test
    dw = Σ_{t=2}^n (û_t − û_{t−1})² / Σ_{t=1}^n û²_t ≈ 2(1 − ρ̂)
where ρ̂ is the OLS estimator of ρ in u_t = ρu_{t−1} + e_t:
    ρ̂ = Σ_{t=2}^n û_t û_{t−1} / Σ_{t=2}^n û²_{t−1}
• Problem: the distribution of dw depends on X
  ⇒ tables with lower and upper bounds
• Critical values:
    P[dw < d_α(X)] = α,  e.g. α = 0.05
Durbin-Watson test: tables of critical bounds (omitted)
• Range of uncertainty:
    d_L = min_X d_α(X),  d_U = max_X d_α(X)
(ii) Breusch-Godfrey test
• Autoregressive error model of order m:
    u_t = ρ₁u_{t−1} + ... + ρ_m u_{t−m} + ε_t
  Inserting residuals yields:
    y_t = x_t′β + ρ₁û_{t−1} + ... + ρ_m û_{t−m} + e_t
• Test of ρ₁ = ... = ρ_m = 0 (LM version)
• y_t may also be replaced by û_t
(iii) Box-Pierce test:
    Q_m = n Σ_{i=1}^m ρ̂²_i ∼a χ²_m
  Test of autocorrelation up to lag order m
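Both autocorrelation diagnostics can be sketched with numpy (illustrative DGP with AR(1) errors, not from the slides; the Breusch-Godfrey variant below regresses the residuals û_t on x_t and m lagged residuals):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 300, 2
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                 # AR(1) errors, rho = 0.6
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + u

X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
uh = y - X @ b

# Durbin-Watson statistic, dw ~= 2(1 - rho_hat)
dw = np.sum(np.diff(uh) ** 2) / np.sum(uh ** 2)

# Breusch-Godfrey LM test: auxiliary regression on x_t and m lagged residuals
lags = np.column_stack([uh[m - j:n - j] for j in range(1, m + 1)])
Z = np.column_stack([X[m:], lags])
target = uh[m:]
g = np.linalg.solve(Z.T @ Z, Z.T @ target)
e = target - Z @ g
R2 = 1 - e @ e / np.sum((target - target.mean()) ** 2)
LM = (n - m) * R2                     # asymptotically chi2(m) under H0
```

With ρ = 0.6 the DW statistic falls well below 2 and the LM statistic is far in the rejection region of χ²(2).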
Tests for heteroskedasticity
Goldfeld-Quandt statistic, based on 2 groups of observations:
    GQ = σ̂₁²/σ̂₂² ∼ F_{n₁−k, n₂−k}
Ordering according to time, size, etc.
Breusch-Pagan/Koenker: σ²_t = h(α₀ + z_t′α)
h(·): arbitrary monotonic function
    q_t = (û²_t − σ̂²) = z_t′α + e_t
    q = Zα + e
Test statistic:
    LM = nR²_u = q′Z(Z′Z)⁻¹Z′q / (q′q/n) ∼a χ²_m
For normally distributed errors: q′q/n →p 2σ⁴
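The Koenker form of the test is a two-line auxiliary regression; a numpy sketch with an invented heteroskedastic DGP (variance driven by a single z_t, so the statistic is χ²(1) under H₀):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
z = rng.normal(size=n)
x = rng.normal(size=n)
sigma_t = np.exp(0.5 * z)                 # error s.d. depends on z_t
y = 1.0 + x + sigma_t * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
uh = y - X @ b

# centred squared residuals regressed on z_t; LM = n * R^2
q = uh ** 2 - uh @ uh / n
Z = np.column_stack([np.ones(n), z])
g = np.linalg.solve(Z.T @ Z, Z.T @ q)
e = q - Z @ g
R2 = 1 - e @ e / np.sum((q - q.mean()) ** 2)
LM = n * R2                               # asymptotically chi2(1) under H0
```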
• White's information matrix test
  Idea: under H₀ we have
    (X′X)⁻¹X′ΩX(X′X)⁻¹ − σ²(X′X)⁻¹ = 0
  or E[(σ²_t − σ²)x_{it}x_{jt}] = 0 for all i, j ∈ {1, ..., k}
  This gives rise to choosing z_t as all possible cross products:
    z_t′ = [x²_{1t}, (x_{1t}x_{2t}), x²_{2t}, (x_{1t}x_{3t}), (x_{2t}x_{3t}), ...]
  ⇒ special case of the Breusch-Pagan/Koenker test
• ARCH LM test:
    E(u²_t | I_{t−1}) = α₀ + α₁u²_{t−1} + ... + α_p u²_{t−p}
  ⇒ popular in time series applications
  Special case with z_t = [û²_{t−1}, ..., û²_{t−p}]′
Test of normality
• The asymptotic properties of the OLS estimator do not depend on the validity of the normality assumption
• Deviations from the normal distribution are only relevant in very small samples
• Outliers may be modeled by mixing distributions
• Tests for normality are very sensitive to outliers
• Under the null hypothesis E(u³_t) = 0 and E(u⁴_t) = 3σ⁴
• Jarque-Bera test statistic:
    JB = n [ (1/6) m₃² + (1/24)(m₄ − 3)² ] →d χ²₂
  where
    m₃ = (1/(Tσ̂³)) Σ_{t=1}^T û³_t,   m₄ = (1/(Tσ̂⁴)) Σ_{t=1}^T û⁴_t
• Other tests: χ² and Kolmogorov-Smirnov tests
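The JB statistic is simple enough to compute by hand; a numpy sketch (illustrative samples, not from the slides) comparing normal draws with fat-tailed t(3) draws:

```python
import numpy as np

def jarque_bera(u):
    """JB = n * (m3^2/6 + (m4 - 3)^2/24) with ML scale estimate."""
    n = len(u)
    s = np.sqrt(np.mean(u ** 2))
    m3 = np.mean(u ** 3) / s ** 3    # standardized third moment
    m4 = np.mean(u ** 4) / s ** 4    # standardized fourth moment
    return n * (m3 ** 2 / 6 + (m4 - 3) ** 2 / 24)

rng = np.random.default_rng(5)
jb_norm = jarque_bera(rng.normal(size=2000))          # should be small
jb_fat = jarque_bera(rng.standard_t(df=3, size=2000)) # fat tails: large JB
```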
Tests for structural breaks
• Parameter change at period t = T*:
    y_t = x_t′β₁ + u_t  for t = 1, ..., T*
    y_t = x_t′β₂ + u_t  for t = T* + 1, ..., n
• In matrix notation:
    [y₁; y₂] = [X₁ 0; 0 X₂] [β₁; β₂] + [u₁; u₂]
• OLS estimator:
    β̂ = [X₁′X₁ 0; 0 X₂′X₂]⁻¹ [X₁′y₁; X₂′y₂]
       = [(X₁′X₁)⁻¹X₁′y₁; (X₂′X₂)⁻¹X₂′y₂] = [β̂₁; β̂₂]
  ⇒ separate estimation for both parts of the sample
• Chow test: test of the linear restriction
    H₀: β₁ = β₂
  F test:
    F = ((n − 2k)/k) (SSE₀ − SSE)/SSE ∼ F_{k, n−2k}
  with
    SSE₀ = û*′û*
    SSE = û₁′û₁ + û₂′û₂
  where
    û* = residuals from the model without structural break
    û₁ = residuals from the 1st subsample
    û₂ = residuals from the 2nd subsample
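A numpy sketch of the Chow test with a known break date (illustrative DGP in which the slope changes at T*; not from the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, Tstar = 200, 2, 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([0.5, 1.0]) + rng.normal(size=n)
y[Tstar:] += 1.0 * x[Tstar:]          # slope jumps from 1.0 to 2.0 at T*

def ssr(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ b
    return u @ u

sse0 = ssr(X, y)                                       # pooled (restricted)
sse = ssr(X[:Tstar], y[:Tstar]) + ssr(X[Tstar:], y[Tstar:])
F = (n - 2 * k) / k * (sse0 - sse) / sse               # ~ F(k, n-2k) under H0
```

Because the restricted model is nested, SSE₀ ≥ SSE always holds; the break in the DGP makes F fall far beyond the F(2, 196) critical value.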
Quandt-Andrews test
• T* is unknown
• Supremum tests:
    sup-LM = max_{T*} LM(T*)
• Relative break date: τ* = T*/n
• Searching in the interval [τ₀, 1 − τ₀] (often τ₀ = 0.10)
• The limiting distribution can be represented as
    sup_{τ∈[τ₀,1−τ₀]} [τW(1) − W(τ)]′[τW(1) − W(τ)] / (τ(1 − τ))
• Critical values are presented in Andrews (1993). They are considerably larger than those of the χ²_k distribution
Critical values for the Quandt-Andrews test (table omitted)
1.4 Searching for the "Correct Specification"
a) Should "insignificant variables" be dropped from the regression?
• Including irrelevant variables increases standard errors
• "General-to-specific" approach ("PcGets", Hendry/Krolzig 2001):
  1. Ascertain that the general statistical model is congruent.
  2. Eliminate a variable that satisfies the simplification criteria.
  3. Check that the simplified model remains congruent.
  4. Continue steps 2 and 3 until none of the remaining variables can be eliminated.
• The final model is the result of a random process and is path-dependent
• Selection criteria (e.g. a significance level of 5 percent) are arbitrary and not "optimal"
• Pretest bias: (a) the estimator is biased toward zero; (b) t-statistics are oversized if variables are correlated
MSE of the pretest estimator (figures omitted)
b) Multicollinearity
• Variables are nearly collinear (highly correlated)
• Some eigenvalue of X′X is very small (condition index λ_max/λ_min > 30)
• High R², low t-statistics
• Estimators are still unbiased!
• In a regression model with a constant and 2 variables:
    var(β̂₁) = 1/(1 − r²₁₂) · S₁
  where 1/(1 − r²₁₂) is the variance inflation factor and S₁ is the variance of the estimator that ignores x₂
• Possible strategies: (1) differences (2) ratios (3) prior information (4) biased estimators (Ridge, Stein)
1.5 The Generalized Regression Model
• Nonscalar (nonspherical) covariance matrix:
    E(uu′) = Ω ≠ σ²Iₙ
• Heteroskedastic errors:
    Ω = diag(σ₁², σ₂², ..., σₙ²)
• Autocorrelated errors:
    Ω = σ²_u ·
        [ 1        ρ₁       ...  ρ_{n−1} ]
        [ ρ₁       1        ...  ρ_{n−2} ]
        [ ...               ...  ...     ]
        [ ρ_{n−1}  ρ_{n−2}  ...  1       ]
• Properties of the OLS estimator:
    E(β̂) = β
    Σ_β̂ = E(β̂ − β)(β̂ − β)′ = (X′X)⁻¹X′ΩX(X′X)⁻¹
  ⇒ unbiased (and consistent) but inefficient
• Standard errors and t-statistics are biased
• Robust standard errors for heteroskedastic errors (White 1980):
    (1/n) Σ_{t=1}^n û²_t x_t x_t′ →p lim_{n→∞} (1/n) X′ΩX
  where û_t is the residual from the OLS estimation
• Extension to autocorrelated errors ("HAC standard errors")
GLS estimator
• Factorization:
    Ω⁻¹ = ΨΨ′,  where Ψ = Ω^{−1/2}
• Transformation:
    Ψ′y = Ψ′Xβ + Ψ′u
    y* = X*β + u*
• Covariance matrix of u*:
    E(u*u*′) = Ψ′(ΨΨ′)⁻¹Ψ = Iₙ
• GLS estimator:
    β̂ = (X*′X*)⁻¹X*′y* = (X′ΨΨ′X)⁻¹X′ΨΨ′y = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y
• GLS is BLUE and MVUE (under normality)
• Feasible (estimated) GLS estimator (FGLS, EGLS):
  ⇒ replace Ω by Ω̂
• GLS transformation of the heteroskedastic model, Ω = diag(σ₁², ..., σₙ²):
    (1/σ_t) y_t = ((1/σ_t) x_t)′β + (1/σ_t) u_t
• Estimating the variance function
    σ²_t = z_t′α
  Using E(û²_t) ≈ σ²_t we have
    û²_t = z_t′α + v*_t
• OLS estimation yields:
    σ̂²_t = z_t′α̂
• FGLS estimator:
    (1/σ̂_t) y_t = ((1/σ̂_t) x_t)′β + (1/σ̂_t) u_t
• A Monte Carlo experiment:
    y_t = βx_t + u_t
  with two subsamples:
    t = 1, ..., n/2:      u_t ∼ N(0, 1), x_t ∼ N(0, 1)
    t = n/2 + 1, ..., n:  u_t ∼ N(0, λ), x_t ∼ N(0, 2)
  n = 100, 5000 MC replications
Results: standard deviations

  λ   OLS     GLS     σ²(X′X)⁻¹  White   (X′Ω̂⁻¹X)⁻¹
  1   0.0645  0.0648  0.0638     0.0622  0.0629
  2   0.0863  0.0834  0.0780     0.0830  0.0809
  3   0.1005  0.0928  0.0900     0.0994  0.0917
  4   0.1178  0.1012  0.1005     0.1131  0.0990
  5   0.1270  0.1066  0.1102     0.1259  0.1048
 10   0.1838  0.1232  0.1489     0.1754  0.1199
 20   0.2559  0.1310  0.2056     0.2460  0.1322
• Estimating the autocorrelated model:
  AR(1) error process
    u_t = ρu_{t−1} + ε_t
  where the process starts at t = −∞ and |ρ| < 1,
    E(ε_t) = 0 for all t
    E(ε²_t) = σ²_ε for all t
    E(ε_t ε_s) = 0 for all t ≠ s
It follows that
    Var(u_t) ≡ σ²_u = σ²_ε / (1 − ρ²)
• Transformed regression:
    y_t − ρy_{t−1} = (x_t − ρx_{t−1})′β + ε_t  for t = 2, 3, ..., n
• First observation:
    √(1 − ρ²) y₁ = (√(1 − ρ²) x₁)′β + u*₁
  where Var(u*₁) = σ²_ε
• 2-step estimator:
    û_t = y_t − x_t′β̂
  and OLS regression to obtain ρ̂:
    û_t = ρû_{t−1} + e_t
  ⇒ perform the GLS transformation and re-estimate the model using OLS
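The 2-step procedure (including the Prais-Winsten treatment of the first observation) can be sketched in numpy; the DGP below is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n, rho = 300, 0.7
x = rng.normal(size=n)
eps = rng.normal(size=n)
u = np.zeros(n)
u[0] = eps[0] / np.sqrt(1 - rho ** 2)   # stationary start
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

# step 1: OLS residuals, estimate rho by regressing u_t on u_{t-1}
b0 = ols(X, y)
uh = y - X @ b0
rho_hat = (uh[1:] @ uh[:-1]) / (uh[:-1] @ uh[:-1])

# step 2: quasi-difference the data (first row scaled by sqrt(1-rho^2))
c = np.sqrt(1 - rho_hat ** 2)
ys = np.concatenate([[c * y[0]], y[1:] - rho_hat * y[:-1]])
Xs = np.vstack([c * X[0], X[1:] - rho_hat * X[:-1]])
b_fgls = ols(Xs, ys)
```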
Relative efficiency: AR(1) model
• Data generating process:
    y_t = x_t + u_t
  where u*_t = λu*_{t−1} + ε_t and u_t = u*_t/σ_{u*}
• Results of the Monte Carlo experiment:

Standard deviations
  λ     OLS     GLS     σ²(X′X)⁻¹  N-W(6)  (X′Ω̂⁻¹X)⁻¹
  0.0   0.0880  0.0891  0.0879     0.0819  0.0870
  0.2   0.1035  0.1023  0.0934     0.0943  0.0983
  0.4   0.1401  0.1219  0.1142     0.1246  0.1183
  0.6   0.2318  0.1583  0.1701     0.2004  0.1557
  0.8   0.6048  0.2742  0.3966     0.5051  0.2666
  0.9   1.6395  0.4838  1.0131     1.3501  0.4882
1.6 Nonlinear models
Nonlinear relationship between y_t and x_t:
    y_t = h(x_t, β) + u_t
    y = h(X, β) + u
where u_t is i.i.d. N(0, σ²)
• Linear Taylor series expansion:
    h(X, β) ≈ h(X, β₀) + (∂h(X, β₀)/∂β′)(β − β₀)
  Pseudo-linear model:
    y_t(β₀) = z_t(β₀)′β + u_t
    y(β₀) = Z(β₀)β + u
where
    y(β₀) = y − h(X, β₀) + Z(β₀)β₀
    Z(β₀) = ∂h(X, β₀)/∂β′
    β̂ = [Z(β₀)′Z(β₀)]⁻¹Z(β₀)′y(β₀)
       = β₀ + [Z(β₀)′Z(β₀)]⁻¹Z(β₀)′[y − h(X, β₀)]
• Nonlinear LS estimation:
    S(β) = [y − h(X, β)]′[y − h(X, β)]
  First-order condition:
    ∂S(β)/∂β = −2Z(β)′[y − h(X, β)] = 0
  ⇒ nonlinear LS estimation is equivalent to estimation of the pseudo-linear model
• Gauss-Newton (or scoring) algorithm:
  From the pseudo-linear model:
    β_{m+1} = β_m + [Z(β_m)′Z(β_m)]⁻¹Z(β_m)′[y − h(X, β_m)]
  ⇒ iterate until |β_{m+1} − β_m| → 0
• Distribution of the LS estimator:
    β̂ ∼a N(β, Σ̂_β̂)
  where
    Σ̂_β̂ = σ̂²[Z(β̂)′Z(β̂)]⁻¹
  and
    σ̂² = (1/(n − k)) [y − h(X, β̂)]′[y − h(X, β̂)]
2. Instrumental variables and GMM estimation
"Extremum estimators":
    θ̂ = argmax_{θ∈Ω} m(θ, y, X)
Alternative objective functions:

  estimator   m(θ; y, X)
  ML          ℓ(θ)
  GMM         m̄(θ)′Λ m̄(θ)
  MD          [π̂ − g(θ)]′Σ̂_π̂⁻¹[π̂ − g(θ)]

2.1 IV estimator
• IV estimator for the linear regression
    y = Xβ + u
  for the case E(X′u) ≠ 0
• Instrumental variables: Z is n × ℓ (ℓ ≥ k)
• Moment condition: E(Z′u) = 0
• Transformed equation:
    Z′y = Z′Xβ + Z′u
    y* = X*β + u*
  where E(u*u*′) = σ²E(Z′Z)
  GLS yields:
    β̂_IV = [X*′(Z′Z)⁻¹X*]⁻¹X*′(Z′Z)⁻¹y*
          = [X′Z(Z′Z)⁻¹Z′X]⁻¹X′Z(Z′Z)⁻¹Z′y
• If
    lim_{n→∞} (1/n)E(Z′u) = 0  and  lim_{n→∞} (1/n)E(Z′X) = Ψ
  with rk(Ψ) = k, then the IV estimator is consistent
Two-stage least-squares interpretation
• The IV estimator is identical to the (Gaussian) ML estimator in
    y = Xβ + u
    X = ZΠ + V  with  u = Vδ + ε
• Replacing V by V̂ = M_z X and using I = M_z + P_z yields
    y = Xβ + V̂δ + e = P_z Xβ + V̂(δ + β) + e = X̂β + u*
  where X̂ = ZΠ̂ is the LS prediction of X based on Z
• Note that plim (1/n)Z′u* = 0
  ⇒ replacing X by X̂ yields a consistent estimator
• Since P_z is idempotent we have
    β̂_2s = (X̂′X̂)⁻¹X̂′y = (X′P_z X)⁻¹X′P_z y = β̂_iv
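A numpy sketch of the 2SLS/IV equivalence with an invented endogenous DGP (the error u loads on the first-stage disturbance v, so OLS is biased while IV is not):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # instruments
v = rng.normal(size=n)
eps = rng.normal(size=n)
x = 1.0 * z1 + 0.5 * z2 + v                       # first stage
u = 0.8 * v + eps                                 # E(x u) != 0
y = 2.0 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)            # projection on Z
b_iv = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
```

The OLS slope is pulled away from the true value 2 by the endogeneity, while the IV slope is close to it.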
• Just-identified case: ℓ = k
    β̂_IV = (Z′X)⁻¹Z′y
• Asymptotic distribution:
    √n(β̂_IV − β) →d N(0, σ²[S_xz S_zz⁻¹ S_zx]⁻¹)
  where S_ab = plim_{n→∞} (1/n) Σ_{t=1}^n a_t b_t′
• Sargan test: test of H₀: E(Z′u) = 0
  LM test of γ = 0 in the auxiliary regression
    û = Zγ + e
  yielding
    S = (1/σ̂²) û′Z(Z′Z)⁻¹Z′û
  The test is χ² distributed with ℓ − k degrees of freedom
Test for exogenous regressors
• H₀: X₂ in y = X₁β + X₂γ + u is exogenous:
    H₀: E(X₂′u) = 0  or  δ = 0 in u = Vδ + ε
  given the set of all exogenous variables Z = [X₁, X₃]
• Durbin-Wu-Hausman test:
    (β̂_iv − β̂)′V̂_Δ⁻¹(β̂_iv − β̂) ∼ χ²_k
  where V_Δ = Var(β̂_iv) − Var(β̂)
• OLS is equivalent to IV if δ = 0; inserting u = Vδ + ε and replacing V by the first-stage residuals V̂ gives
    y = X₁β + X₂γ + V̂δ + error
  ⇒ test whether the coefficients on the residuals V̂ are significant
2.2 GMM estimator
• ℓ × 1 moment condition, e.g.
    E[m_t(β)] = E[z_t(y_t − x_t′β)] = 0
• Minimization of the criterion function:
    θ̂_gmm = argmin_{θ∈Ω} [Σ_{t=1}^n m_t(θ)]′ W_n [Σ_{t=1}^n m_t(θ)]
• With m_t(β) = z_t(y_t − x_t′β):
    θ̂_gmm = (X′Z W_n Z′X)⁻¹X′Z W_n Z′y
  For nonlinear moment conditions: replace Z′X by ∂Σ m_t(θ)/∂θ′
• Distribution:
    Var(θ̂_gmm) = (X′Z W_n Z′X)⁻¹ X′Z W_n Σ_m W_n Z′X (X′Z W_n Z′X)⁻¹
  where Σ_m = E[m_t(θ₀)m_t(θ₀)′]
• Optimal weight matrix:
    W_n = [E Σ_{t=1}^n m_t(θ₀)m_t(θ₀)′]⁻¹
  Note: if Z is independent of u we have
    E Σ_{t=1}^n m_t(β)m_t(β)′ = E(Z′ΩZ)
  where E(uu′) = Ω
• Estimated weight matrix:
    Ŵ = [Σ_{t=1}^n m_t(θ̂)m_t(θ̂)′]⁻¹
  such that for the linear model:
    Var(θ̂_gmm) = (X′Z Ŵ Z′X)⁻¹
  ⇒ if Ω = σ²I, GMM = IV
• Serially correlated errors:
    Ŵ = (E[Σ_{t=1}^n m_t(θ)][Σ_{t=1}^n m_t(θ)′])⁻¹
  estimated by
    Ŵ⁻¹ = Σ_{j=−ℓ}^{ℓ} g_j Ĉ_j
  where for j ≥ 0: Ĉ_j = Σ_{t=j+1}^n m_t(θ̂)m_{t−j}(θ̂)′ and Ĉ_j = Ĉ′_{−j} for j < 0,
  and g_j is some weight function, e.g. g_j = (ℓ − |j| + 1)/(ℓ + 1)
2.3 Tests based on IV/GMM
(i) Hansen-Sargan statistic:
  H₀: the model is correctly specified,
    E[m_t(θ)] = 0
• Test statistic:
    Q(θ̂) = [Σ_{t=1}^n m_t(θ̂)]′ Ŵ [Σ_{t=1}^n m_t(θ̂)] →d χ²_{ℓ−k}
• Pseudo LR statistic:
    Q(θ̂_r) − Q(θ̂) →d χ²_j
• Wald test: straightforward
2.4 Weak instruments
• Two-stage representation:
    y = Xβ + u
    X = ZΠ + V,  u = Vγ + ε
• Strong instruments imply rk(Π) = k
• Weak instruments: Π = C/√n, where rk(C) = k
  ⇒ the IV estimator is inconsistent and not asymptotically normally distributed
• Indication of a weak instrument (k = 1): the concentration parameter
    μ² = (1/σ²_v) Π′Z′ZΠ
• F-statistic for H₀: Π = 0:
    F = μ²/ℓ
• Distribution if there is NO identification:
    β̂_iv = β + Z′u/Z′v,  which is not N(0, σ²_β̂) distributed
• Stock-Yogo test: H₀: μ²/√n ≤ c₀
  ⇒ instruments are "too weak"
• Generalization to k > 1: test of H₀: rk(Π) ≥ k (e.g. Kleibergen/Paap)
• Testing hypotheses under weak/no identification:
    y − Xβ₀ = X(β − β₀) + u = ZΠ(β − β₀) + u* = Zφ + u*
• Anderson-Rubin statistic: the F test of φ = 0 is χ²_ℓ distributed
• The LM test of φ = 0 is χ²_k distributed (Kleibergen 2002)
3.1 Univariate time series models
Basic concepts
• Stochastic processes: Y_t(ω) with t = 1, 2, ..., T
  ω: (vector of) random variables
• White noise process:
    E(Y_t) = μ (usually μ = 0)
    var(Y_t) = σ²
    E(Y_t − μ)(Y_s − μ) = 0 for t ≠ s
• Weak stationarity:
    E(Y_t) = μ for all t
    E(Y_t − μ)² = σ² for all t
    E[(Y_t − μ)(Y_s − μ)] depends only on |t − s|, for all t and s
• Strict stationarity:
    f(Y_t, Y_{t−1}, ..., Y_{t−m}) = f(Y_{t−h}, Y_{t−h−1}, ..., Y_{t−h−m})
  for all m and h
• The autocovariance function of a stationary time series:
    γ_k = E(Y_t − μ)(Y_{t−k} − μ)
  γ₀ is the variance
• Autocorrelation function:
    ρ_k = γ_k/γ₀
• Estimation of γ_k:
    c_k = (1/T) Σ_{t=k+1}^T (y_t − ȳ)(y_{t−k} − ȳ)
    r_k = c_k/c₀
• Modifications:
  - correction for the k missing observations
  - degrees of freedom correction
Are economic data stationary?
Usually some transformations are required to obtain stationary series:
• logarithm (to stabilize variances)
• first differences (to remove the trend)
• annual differences (to remove seasonally changing means)
• seasonal adjustment
• deviations from an estimated trend
• adjustment for structural breaks (dummy variables)
• normalization relative to a scale variable
3.2 Seasonality and trends
• Component models:
    Y_t = G_t + S_t + u_t  or  Y_t = G_t · S_t · u_t
a) G_t: trend component
  • polynomial trend: G_t = a₀ + a₁t + ... + a_q t^q
b) S_t: seasonal component
  • deterministic model: s_t = β₁d_{1,t} + ... + β₁₂d_{12,t},
    where d_{j,t} is a seasonal dummy variable
  • stochastic seasonals: s_t = −s_{t−1} − ... − s_{t−11} + ε_t,
    where ε_t is an error term
Seasonal adjustment procedure (CENSUS)
• First version of 1954 (U.S. Bureau of the Census)
• Version X-11 of 1965, X-12-ARIMA of 1998
(1) Compute the raw seasonal component by applying centered 12-month moving averages:
    s*_t = y_t / D₁₂(y_t) ≈ s_t u_t
(2) Smoothing of the raw components: ŝ_t.
(3) Outlier adjustment: if
    |s*_t − ŝ_t| > 2σ̂(ŝ_t)
  ⇒ replace by the average of neighboring values.
(4) Smoothing of the outlier-adjusted seasonal component and computation of the (preliminary) seasonally adjusted series:
    yᵃ_t = y_t / ŝ_t
(5) Application of the 15-point trend filter (Spencer):
    ĝ_t = F¹⁵(y_t)
  Computation of new raw seasonal components:
    y_t / ĝ_t ≈ s_t · u_t
  Continue with (2).
Hodrick-Prescott (HP) filter:
    Σ_{t=1}^T (y_t − g_t)²  ["fit"]  +  λ Σ_{t=2}^{T−1} (Δg_{t+1} − Δg_t)²  ["smoothness"]  → min!
• Usual values:
    λ = 100 for annual data
    λ = 1600 for quarterly data
    λ = 14400 for monthly data
• Differentiation with respect to g_t yields:
    t = 1:        y₁ = (1 + λ)g₁ − 2λg₂ + λg₃
    t = 2:        y₂ = −2λg₁ + (1 + 5λ)g₂ − 4λg₃ + λg₄
    2 < t < T−1:  y_t = λg_{t−2} − 4λg_{t−1} + (1 + 6λ)g_t − 4λg_{t+1} + λg_{t+2}
    t = T − 1:    y_{T−1} = λg_{T−3} − 4λg_{T−2} + (1 + 5λ)g_{T−1} − 2λg_T
    t = T:        y_T = λg_{T−2} − 2λg_{T−1} + (1 + λ)g_T
  ⇒ solve for g = [g₁, ..., g_T]′
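The first-order conditions above are exactly the linear system (I + λD′D)g = y, where D is the (T−2) × T second-difference matrix; a numpy sketch with an invented trend-plus-cycle series:

```python
import numpy as np

def hp_filter(y, lam=1600.0):
    """HP trend: solve (I + lam * D'D) g = y, D = second-difference matrix.
    The rows of I + lam*D'D reproduce the slide's first-order conditions."""
    T = len(y)
    D = np.zeros((T - 2, T))
    for t in range(T - 2):
        D[t, t], D[t, t + 1], D[t, t + 2] = 1.0, -2.0, 1.0
    return np.linalg.solve(np.eye(T) + lam * D.T @ D, y)

rng = np.random.default_rng(9)
t = np.arange(120)
y = 0.05 * t + np.sin(t / 6) + 0.3 * rng.normal(size=120)
g = hp_filter(y, lam=1600.0)
```

The extracted trend g is much smoother than y: its second differences are heavily penalized.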
3.3 Autoregressive models
• White noise process: ε_t with
    E(ε_t) = 0 and E(ε²_t) = σ²
• Autoregressive AR(p) process:
    Y_t = α₁Y_{t−1} + α₂Y_{t−2} + ... + α_p Y_{t−p} + ε_t
• The lag operator:
    L Y_t ≡ Y_{t−1},  L^k Y_t ≡ Y_{t−k}
• Notation using lag polynomials:
    α(L)Y_t = ε_t
  where α(L) = 1 − α₁L − ... − α_p L^p
The AR(1) model
    Y_t = αY_{t−1} + ε_t
• Successive substitution yields:
    Y_t = ε_t + αε_{t−1} + α²ε_{t−2} + α³ε_{t−3} + ...
• Properties for |α| < 1:
    γ₀ = var(Y_t) = σ²/(1 − α²)
    γ_k = cov(Y_t, Y_{t−k}) = α^k σ²/(1 − α²)
    autocorrelation: ρ_k = γ_k/γ₀ = α^k
• Properties for |α| = 1 ("random walk"):
    Y_t = Y₀ + ε₁ + ε₂ + ... + ε_t
    var(Y_t) = var(Y₀) + tσ²
    cov(Y_t, Y_{t−k}) = var(Y₀) + (t − k)σ²
3.4 MA and ARMA models
• Moving-average MA(q) process:
    Y_t = ε_t − β₁ε_{t−1} − ... − β_q ε_{t−q}
    Y_t = β(L)ε_t
  where β(L) = 1 − β₁L − ... − β_q L^q
• Special case MA(1): Y_t = ε_t − βε_{t−1}
    γ₀ = E(Y²_t) = (1 + β²)σ²
    γ_k = E(Y_t Y_{t−k}) = −βσ² for k = 1, and 0 for k ≥ 2
• ARMA(p, q) process:
    Y_t = α₁Y_{t−1} + ... + α_p Y_{t−p} + ε_t − β₁ε_{t−1} − ... − β_q ε_{t−q}
    α(L)Y_t = β(L)ε_t
Wold representation
• AR representation:
    β(L)⁻¹α(L)Y_t = γ(L)Y_t = ε_t
  where γ(L) = 1 − γ₁L − γ₂L² − ...
• All stationary time series have an MA representation:
    Y_t = ε_t + φ₁ε_{t−1} + φ₂ε_{t−2} + ... = φ(L)ε_t
  where
    ε_t = Y_t − E(Y_t | F_{t−1})
  and F_t = {Y_t, Y_{t−1}, ...}
• ARMA models: φ(L) = β(L)/α(L)
3.5 Unit root tests
• Test of the null hypothesis α = 1 in the AR(1) model:
    Y_t = αY_{t−1} + ε_t
    ΔY_t = (α − 1)Y_{t−1} + ε_t = φY_{t−1} + ε_t
• Problem: φ̂ (and its t statistic) is not normally distributed
• Let W(a) denote a Brownian motion. Then
    Tφ̂ →d [∫₀¹ W(a)dW(a)] / [∫₀¹ W(a)²da]
    t_φ →d [∫₀¹ W(a)dW(a)] / √(∫₀¹ W(a)²da)
• For critical values see e.g. Hamilton (1994)
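The Dickey-Fuller regression is a one-line least-squares fit; a numpy sketch (simulated series, not from the slides) contrasting a random walk with a stationary AR(1):

```python
import numpy as np

def df_tstat(y):
    """Dickey-Fuller regression (no constant): dy_t = phi * y_{t-1} + e_t.
    Returns the t statistic of phi; it is NOT standard normal under phi=0."""
    dy, ylag = np.diff(y), y[:-1]
    phi = (ylag @ dy) / (ylag @ ylag)
    e = dy - phi * ylag
    s2 = e @ e / (len(dy) - 1)
    return phi / np.sqrt(s2 / (ylag @ ylag))

rng = np.random.default_rng(10)
T = 500
rw = np.cumsum(rng.normal(size=T))     # random walk: unit root holds
ar = np.zeros(T)                       # stationary AR(1), alpha = 0.5
for t in range(1, T):
    ar[t] = 0.5 * ar[t - 1] + rng.normal()

t_rw, t_ar = df_tstat(rw), df_tstat(ar)
```

For the stationary series the t statistic is far below the Dickey-Fuller 5% bound (about −1.95 for this specification), while for the random walk it typically is not.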
Extensions
• Deterministic terms:
    ΔY_t = c + φY_{t−1} + ε_t
    ΔY_t = c + βt + φY_{t−1} + ε_t
  ⇒ the null distribution changes
• AR(p) process:
    H₀: α₁ + ... + α_p = 1, or α(1) = 0
  Test of φ = 0 in the model:
    ΔY_t = φY_{t−1} + γ₁ΔY_{t−1} + ... + γ_{p−1}ΔY_{t−p+1} + ε_t
  ⇒ the null distribution does not change
3.6 Dynamic regression equations
• ADL(p, q) regression model:
    α(L)y_t = β(L)x_t + u_t
  Assumptions: (i) the roots of α(z) = 0 lie outside the unit circle, (ii) x_t is stationary
• Partial adjustment:
  • desired (long-run) level:
      y*_t = βx_t + u_t
  • partial adjustment:
      y_t = (1 − α)y*_t + αy_{t−1}
  • inserting yields
      y_t = αy_{t−1} + γx_t + u*_t
    with γ = (1 − α)β
3.7 The spurious regression problem
• Assume:
    y_t ∼ I(1),  x_t ∼ I(1)
  ⇒ in general y_t − x_tβ is also I(1)
• Spurious regression: if y_t and x_t are independent random walks:
  • t-values are often significant
  • large R²
  • low Durbin-Watson statistic
• Common trend model ("cointegration"):
    x_t = r_t + ε_{1t} ∼ I(1)
    y_t = βr_t + ε_{2t} ∼ I(1)
    y_t − βx_t = ε_{2t} − βε_{1t} = u_t ∼ I(0)
3.8 Cointegration: single equation approach
• Properties of OLS:
  • β̂ is "super-consistent"
  • robust against endogenous x_t
  • efficient only if (i) x_t is exogenous and (ii) e_t is serially uncorrelated
  • t statistics are generally invalid
• Test for cointegration:
  Step 1: ADF tests of y_t and x_t
  Step 2: ADF test of the residuals û_t = y_t − x_t β̂
• Critical values depend also on k
Dynamic OLS estimator (DOLS)
• Decomposition of the errors:
    u_t = Σ_{j=−q}^{p} γ_j′ Δx_{t−j} + ε_t
• Inserting into the regression equation yields:
    y_t = β′x_t + Σ_{j=−q}^{p} γ_j′ Δx_{t−j} + ε_t
  ⇒ the DOLS estimator is asymptotically efficient
• Autocorrelation of ε_t:
  - Newey-West HAC standard errors (Saikkonen 1991)
  - GLS based on AR(p) errors (Stock-Watson 2003)
Engle-Granger two-step approach
• Error correction representation:
    y_t = αy_{t−1} + β₀′x_t + β₁′x_{t−1} + ε_t
  yields:
    Δy_t = γ(y_{t−1} − β′x_{t−1}) + δ′Δx_t + ε_t
  where (y_{t−1} − β′x_{t−1}) is the error correction term
• Replace β by β̂ (E/G 2-step estimator)
• EC test for cointegration:
    Δy_t = γ₁y_{t−1} + γ₂′x_{t−1} + δ′Δx_t + lags + ε_t
  t-test of H₀: γ₁ = 0
  Critical values are tabulated in Banerjee et al. (1999)
3.9 Dynamic systems: VAR models
• Time series vector:
    y_t = (y_{1t}, y_{2t}, ..., y_{Kt})′
• Vector autoregressive (VAR) model:
    y_t = A₁y_{t−1} + A₂y_{t−2} + ... + A_p y_{t−p} + u_t
    A(L)y_t = u_t
  where A(L) = I − A₁L − ... − A_p L^p
• Assumptions:
    E(u_t) = 0
    E(u_t u_t′) = Σ = [σ₁₁ σ₁₂ ... σ_{1K}; σ₂₁ σ₂₂ ... σ_{2K}; ...; σ_{K1} σ_{K2} ... σ_{KK}]
Estimation
• Regression format:
    y_t = c + A₁y_{t−1} + ... + A_p y_{t−p} + ε_t = Bx_t + ε_t
  where B = [c, A₁, ..., A_p] and x_t = (1, y_{t−1}′, ..., y_{t−p}′)′
• Differentiating with respect to B and Σ yields:
    B̂ = (Σ_{t=1}^T y_t x_t′)(Σ_{t=1}^T x_t x_t′)⁻¹
    Σ̂ = (1/T) Σ_{t=1}^T (y_t − B̂x_t)(y_t − B̂x_t)′
  Single-equation OLS is consistent and asymptotically efficient
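The multivariate LS formula B̂ = (Σ y_t x_t′)(Σ x_t x_t′)⁻¹ can be sketched in numpy for a simulated VAR(1) (coefficients invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(11)
T, K, p = 400, 2, 1
A1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])             # stable VAR(1) coefficients
y = np.zeros((T, K))
for t in range(1, T):
    y[t] = A1 @ y[t - 1] + rng.normal(size=K)

# regression format: y_t = B x_t + eps_t with x_t = (1, y_{t-1}')'
Y = y[p:].T                              # K x (T-p)
Xlag = np.vstack([np.ones(T - p), y[:-p].T])
B = (Y @ Xlag.T) @ np.linalg.inv(Xlag @ Xlag.T)
resid = Y - B @ Xlag
Sigma = resid @ resid.T / (T - p)
A1_hat = B[:, 1:]                        # slope block of B = [c, A1]
```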
Granger causality
• Bivariate process:
    [a₁₁(L) a₁₂(L); a₂₁(L) a₂₂(L)] [x_t; y_t] = [ε₁t; ε₂t]
  y_t is not causal for x_t if
    H₀: a₁₂(L) = 0
• Test of the hypothesis using the first equation:
    x_t = α₁x_{t−1} + ... + α_p x_{t−p} + β₁y_{t−1} + ... + β_p y_{t−p} + ε₁t
• F test of β₁ = ... = β_p = 0
3.10 Structural (identified) VAR models
• Forecast errors ("innovations"):
    u_{t+1} = y_{t+1} − E(y_{t+1} | y_t, y_{t−1}, ...)
  Orthogonal shocks (recursive scheme):
    u₁t = ε₁t
    u₂t = ϱ₂₁ε₁t + ε₂t
    u₃t = ϱ₃₁ε₁t + ϱ₃₂ε₂t + ε₃t
    ...
    u_Kt = ϱ_{K1}ε₁t + ϱ_{K2}ε₂t + ... + ε_Kt
  where E(ε_it ε_jt) = 0 for i ≠ j
• Choleski decomposition:
    R = [1 0 ... 0; ϱ₂₁ 1 ... 0; ϱ₃₁ ϱ₃₂ 1 ... 0; ...; ϱ_{K1} ϱ_{K2} ... ϱ_{K,K−1} 1]
and u_t = Rε_t with
    E(ε_t ε_t′) = D = diag{var(ε₁t), ..., var(ε_Kt)}
so that
    Σ = E(u_t u_t′) = RDR′ = PP′
where P = RD^{1/2}
⇒ P is obtained from a Choleski factorization Σ = PP′
• Impulse response function:
    y_t = u_t + Φ₁u_{t−1} + Φ₂u_{t−2} + ...
        = Rε_t + Φ₁Rε_{t−1} + Φ₂Rε_{t−2} + ...
        = Θ₀ε_t + Θ₁ε_{t−1} + Θ₂ε_{t−2} + ...
where Θ_h = Φ_h R and
    θ_ij(h) = ∂y_{i,t+h}/∂ε_{jt}
⇒ the IRF θ_ij(h) represents the dynamic effect of the "shock" ε_{jt} on y_{i,t+h}
• Estimation of the impulse response function:
  1. Estimate the VAR model: Â₁, ..., Â_p, Σ̂
  2. Compute the MA representation using
       Φ̂_i = Σ_{j=1}^{i} Φ̂_{i−j} Â_j  and  Φ̂₀ = I_K
  3. Choleski decomposition: Σ̂ = P̂P̂′
  4. Impulse response matrices:
       Θ̂_h = Φ̂_h R̂
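The recursion Φ_h and the Choleski orthogonalization can be sketched in numpy; the VAR(1) coefficient and covariance below are assumed example values, not estimates from the slides. For a VAR(1), Φ_h = A₁^h, and using P (one-standard-deviation shocks) instead of R gives the same triangular impact pattern:

```python
import numpy as np

# assumed inputs: estimated VAR(1) coefficient and error covariance
A1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

H = 10
K = A1.shape[0]

# MA coefficients of the VAR(1): Phi_0 = I, Phi_h = A1 @ Phi_{h-1}
Phi = [np.eye(K)]
for h in range(1, H + 1):
    Phi.append(A1 @ Phi[-1])

# Choleski factor P (= R D^{1/2}); orthogonalized IRFs Theta_h = Phi_h P
P = np.linalg.cholesky(Sigma)
Theta = [Ph @ P for Ph in Phi]
# Theta[0] is lower triangular: shock 2 has no impact effect on y1
```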