Slides 2014 Panel Data

An Introduction to Panel Data

Econometrics

Joan Llull

Advanced Time Series and Panel DataBarcelona GSE

Spring Term 2014

joan.llull [at] movebarcelona [dot] eu

Outline of the course

1. Introduction to Panel Data

2. Static Models

2.1 The xed eects model. Within groups estimation.

2.2 The random eects model. Error components.

2.3 Testing for correlated individual eects.

3. Dynamic Models

3.1 Autoregressive models with individual eects.

3.2 Dierenced GMM estimation.

3.3 System GMM estimation.

An Introduction to Panel Data Econometrics. Spring Term 2013 2

Introduction to Panel Data


Panel data

Panel data consists of repeated observations of a given cross-

section of individuals over time. Examples.

Individuals can be persons, households, rms, countries,...

It is dierent from repeated cross-sections.

Main advantages of panel data:

Unobserved heterogeneity

Dynamic responses and error components


Micro and macro panel data

Micro panel data usually has large N , small T (e.g. household

surveys).

Macro panel data usually have longer T , but smaller N (e.g.

countries).

In this course, we will review asymptotic results only for the case

of xed T , N→∞ (micro panels).

Hence, the results that we are after are closer to cross-section

approaches than to time series.


Employment equations for U.K. rms

We will use the same example all over the course.

Consider the following equation for rm i demand of employ-

ment in year t:

nit = β0 + β1wit + β2kit + ηi + vit,

where

nit is log employment

wit is log wage

kit is log capital

ηi + vit is unobserved

Available in Gretl: data from Arellano and Bond (1991).


General notation

We will consider the following model:

yit = x′itβ + (ηi + vit)

where xit and β are vectors:

xit =

x1it...

xKit

and β =

β1...

βK

yit and xit are observed, and ηi and vit are unobserved


Matrix notation

We can write the model piling observations for each individual:

yi = Xiβ + (ηi + vi)

where yi, β, and vi are vectors, and Xi is a matrix:

yi =

yi1...yiT

, Xi =

x1i1 . . . xKi1...

. . ....

x1iT . . . xKiT

,β =

β1...

βK

, and vi =

vi1...viT

,

and ηi = ηiιT , where ιT is a size T vector of ones.


Matrix notation (cont'd)

And, on top we can pile over all observations:

y = Xβ + (η + v)

where yi, β, η, and vi are vectors, and Xi is a matrix:

y =

y11...

y1T...

yN1...

yNT

=

y1...

yN

, X =

X1...

XN

,η =

η1...

ηN

, and v =

v1...

vN

.


Pooled OLS

Dene: u ≡ η + v ⇔ uit ≡ ηi + vit.

Then, we have y = Xβ + u.

A simple approach would be to do standard OLS:

βOLS = (X ′X)−1X ′y.

We assume that E[xitvit] = 0 (xit is predetermined).

Therefore, the properties of βOLS depend on E[xitηi]


Pooled OLS in Employment equations

In our previous example, we would redene our regression as:

nj = β0 + β1wj + β2kj + uj ,

where I used subindex j to emphasize that each observation it isconsidered as one independent observation.

In Gretl we would do:

ols n const w k --robust


Properties of Pooled OLS

The properties of βOLS depend on E[xitηi].

If E[xitηi] = 0 ⇒E[xitvit]=0

E[xjuj ] = 0 (random eects):

βOLS is consistent as N→∞, or T →∞, or both.

it is ecient only if σ2η = 0.

cross-section produce (asymptotically) the same results.

If E[xitηi] 6= 0⇒ E[xjuj ] 6= 0 (xed eects):

βOLS is inconsistent as N→∞, or T →∞, or both.

cross-section results are inconsistent as well (but panel helpsin constructing a consistent estimator).

Panel data is particularly very useful in the second case.


Employment equations for U.K. rms

In our example of employment in U.K. rms:

Most productive rms may pay higher wages and yet hire

more workers.

Larger plants can use larger amounts of capital and yet hire

more workers.

...


Static Models


General assumptions for static models

Our model is given by:

yit = x′itβ + ηi + vit

with

E[xitηi] 6= 0 (xed e.) or E[xitηi] = 0 (random e.).

E[xitvis] = 0 ∀ s, t (strict exogeneity): rules out eects ofpast vis on current xit (and thus lagged dependent variables).

E[ηi] = E[vit] = E[ηivit] = 0 (error components).

E[vitvis] = 0 ∀ s, t (serially uncorrelated shocks).

ηi ∼ iid(0, σ2η), vit ∼ iid(0, σ2

v) (iid+homoskedasticity).


The xed eects model.

Within groups estimation.


Within groups transformation

Consider the following transformation of the model:

yit = yit − yi yi =1

T

T∑t=1

yit.

Notice that:

ηi =1

T

T∑t=1

ηi =1

TTηi = ηi.

Therefore:

yit = (xit − xi)′β + (ηi − ηi) + (vit − vi) = x′itβ + vit.


Within groups estimator

Given the previous assumptions,

E[xitvit] = 0.

Therefore, OLS on the transformed model,

βWG =(X ′X

)−1X ′y,

provide a consistent estimator regardless of whether E[xitηi] 6= 0or E[xitηi] = 0

This consistency is an advantage of βWG compared to cross-

section OLS, pooled OLS or GLS (to be seen below).


Disadvantages of within groups estimator

This advantage comes at a cost:

Not ecient:

When N→∞ but T is xed, less ecient that e.g. βGLS ifE[xitηi] = 0.

If E[xitηi] 6= 0, it is only ecient when all regressors arecorrelated with ηi.

It does not allow to identify coecients for (even almost)

time-invariant regressors:

yit = x′itβ +w′iγ + ηi + vit


The role of strict exogeneity

In the case where N→∞ and T is xed, consistency depends on

strict exogeneity.

To see it, recall that

xit = xit −1

T(xi1 + ...+ xiT ) and vit = vit −

1

T(vi1 + ...+ viT ).

Therefore E[xitvit] = 0 requires E[xitvis] = 0 ∀ s, t unless T →∞.

This has motivated the development of dynamic panel data

models, to relax this assumption.


Within Groups in Employment equations

In our example, Gretl results from the pooled estimation were:

ols n const w k --robust

If we now estimate βWG, we get:

panel n const w k --fixed-effects --robust


Least Squares Dummy Variables (LSDV)

The Within Groups estimator can also be obtained by including

a set of N individual dummy variables:

yit = x′itβ + η1D1i + ...+ ηNDNi + vit,

where Dhi = 1h = i (e.g. D1i takes the value of 1 for the

observations on individual 1 and 0 for all other observations).

OLS estimation of this model gives numerically equivalent es-

timates to WG (that's why βWG is a.k.a. βLSDV ).

This gives intuition on why WG is not very ecient if there is

only limited time-series variation (degrees of freedom are NT −K −N = N(T − 1)−K)


LSDV in Employment equations

In Gretl, we can generate individual (rm) dummies and estimate

by OLS, to check that it delivers the same results:

discrete unit

dummify unit

ols n w k Dunit ∗ --robust


First-dierenced OLS (FDLS)

Another transformation we can consider is rst dierences:

∆yit = ∆x′itβ + ∆vit, for i = 1, ..., N ; t = 2, ..., T

where ∆yit = yit − yit−1.

It is an alternative way to get rid of time-invariant individual

eects (∆ηi = ηi − ηi = 0), so OLS on the dierenced model is

consistent.


Some properties of FDLS

Consistency requires E[∆xit∆vit] = 0 which is (implied by but)

weaker than strict exogeneity.

WG is more ecient than FDLS under classical assumptions.

FDLS is more ecient if vit is random walk (i.e. ∆vit = εit ∼iid(0, σ2

ε)).

FDLS (and not WG) is consistent if vi1, ..., vit−2, vit+1, ..., viTare correlated with xit, as long as vit and vit−1 are uncorrelated

with xit.


FDLS in Employment equations

In Gretl, we can generate st dierences and estimate by OLS,

to get FDLS:

diff n w k

ols d n d w d k --robust


The random eects model.

Error components.


Uncorrelated eects

Now we go back to the assumption of uncorrelated or random

eects: E[xitηi] = 0.

In this case, pooled (and cross-section) OLS estimates are con-

sistent, but not ecient.

The ineciency is provided by the serial correlation induced

by ηi:

E[uituis] = E[(ηi + vit)(ηi + vis)] = E[η2i ] = σ2

η

The variance of the unobservables (under classical assumptions) is:

E[u2it] = E[η2

i ] + E[v2it] = σ2

η + σ2v


Error structure

Therefore, the variance-covariance matrix of the unobservables is

E[uiu′i] =

σ2η + σ2

v σ2η . . . σ2

η

σ2η σ2

η + σ2v . . . σ2

η...

.... . .

...

σ2η σ2

η . . . σ2η + σ2

v

= Ωi,

whose dimensions are T × T , and E[uiu′h] = 0 ∀ i 6= h, or

E[uu′] =

Ω1 0 . . . 00 Ω2 . . . 0...

.... . .

...

0 0 . . . ΩN

= Ω,

which is block-diagonal with dimension NT ×NT .


Generalized Least Squares (GLS)

Under the classical assumptions, GLS (or random eects) estima-

tor is consistent and ecient if E[xitηi] = 0:

βGLS =(X ′Ω−1X

)−1X ′Ω−1y.

If E[xitηi] 6= 0 GLS is inconsistent as N→∞ and T is xed.

This estimator is unfeasible because we do not know σ2η and σ

2v .


Theta-dierencing

βGLS is equivalent to OLS on the following transformation of

the model:

y∗it = x∗it′β + u∗it,

where

y∗it = yit − (1− θ)yi,

and

θ2 =σ2v

σ2v + Tσ2

η

.

Two special cases:

If σ2η = 0 (i.e. no individual eect), OLS is ecient.

If T →∞, then θ→ 0, and y∗it→ yit = yit − yi: WG is ecient.


Between Groups (BG)

A consistent but inecient way to estimate β if E[xitηi] = 0 is

to do OLS on the following cross-section:

yi = x′iβ + ηi + vi, i = 1, ..., N

Between groups is not ecient, and, in practice, it is only used

to estimate σ2η when implementing feasible GLS.


Feasible GLS (FGLS)

βGLS is unfeasible because we do not know σ2η and σ

2v .

A consistent estimator of σ2v is provided by the WG residuals:

ˆvit = yit − x′itβWG

σ2v =

ˆv′ ˆv

N(T − 1)−K

Then, a consistent estimator of σ2η is given by the BG residuals:

ûi = yi − x′iβBG

σ2u =

(σ2η +

1

Tσ2v) =

û′ û

N −K

σ2η = σ2

u −1

Tσ2v


Feasible GLS in Employment equations

In our previous example, Gretl results from the WG estimation

were:

panel n const w k --fixed-effects --robust

If we now estimate βFGLS , we get:

panel n const w k --random-effects --robust


Testing for correlated individual eects.


Testing for correlated individual eects

With xed T , it is useful to test whether some of the included

explanatory variables are correlated with the unobserved eects.

βWG is consistent regardless of E[xitηi] being equal to zero or

not.

βFGLS is consistent only if E[xitηi] = 0.

⇒ we can test whether both estimates are similar!


Hausman test

The Hausman test does exactly this comparison:

h = q′[avar(q)]−1qa∼ χ2(K)

under the null hypothesis E[xitηi] = 0, where

q = βWG − βFGLS ,

and

avar(q) = avar(βWG

)− avar

(βFGLS

).

The test require classical assumptions (under which the FGLS

is more ecient that WG).


Hausman test in Employment equations

Gretl output for random eects includes the Hausman test:

panel n const w k --random-effects --robust


Dynamic Models


Autoregressive models with individual

eects.


Autorregressive panel data model

We will consider the following model:

yit = αyit−1 + ηi + vit |α| < 1.

Other regressors can be included, but main results unaected.

We assume that we observe yi0, and, hence, this model is dened

for i = 1, ..., N and t = 1, ..., T .

Two important properties:

E[yit−1ηi] = σ2η

(1−αt−1

1−α

)> 0 since yit−1 includes ηi.

E[yit−1vit−1] = σ2v > 0 for the same reason.

Hence, yit−1 is correlated with the individual eects, and it

is not strictly exogenous.An Introduction to Panel Data Econometrics. Spring Term 2013 41

Properties of pooled OLS and WG estimators

Even assuming E[yit−1vit] = 0, still pooled OLS delivers

plimN→∞

αOLS > α,

because E[yit−1ηi] = σ2η

(1−αt−1

1−α

)> 0.

Doing the within groups transformation we see that

plimN→∞

αWG < α

because E[yit−1vit] = −Aσ2v < 0.

(A =

(1−α)(1+T

(1−αt−1−αT−1−t

))+αT

(1−αT−1

)T2(1−α)2

)

WG bias vanishes as T →∞ (bias not small even with T = 15).

Supposedly consistent estimators that give α >> αOLS or α << αWG should

be seen with suspicion.


Instrumental variables

Consider the model in rst dierences:

∆yit = α∆yit−1 + ∆vit.

OLS on the rst dierences is inconsistent:

E[∆yit−1∆vit] = −σ2v < 0.

However, if E[yit−1vit] = 0 (serially uncorrelated shocks), then

yit−2 or ∆yit−2 are valid instruments for ∆yit−1:

Relevance: E[∆yit−2∆yit−1] = E[yit−2∆yit−1] = E[−y2it−2] 6= 0.

Orthogonality: E[∆yit−2∆vit] = E[yit−2∆vit] = 0.


Anderson and Hsiao

The 2SLS estimators of this type were suggested by Anderson

and Hsiao (1981):

αAH =(

∆y′−1∆y−1

)−1∆y′−1∆y,

where

∆y−1 = Z(Z ′Z

)−1Z ′∆y−1,

where Z can be y−2 or ∆y−2.

A minimum of three periods (T = 2 plus yi0) is needed to

implement it.

This approach is only ecient if T = 2 (otherwise, additional

instruments are available).


Dierenced GMM estimation.


GMM in 3 slides (I): the setup

GMM nds parameter estimates that come as close as possible

to satisfy orthogonality conditions (or moment conditions) in

the sample.

E.g., consider K regressors β and L instruments zi:

ui = yi − f(xi,β) zi = g(xi).

The model species L moment conditions: E[ziui] = 0.

Sample analogue:

bN (β) =1

N

N∑i=1

ziui(β)


GMM in 3 slides (II): the estimation

Two possible cases cases:

L = K (just identied): bN (βGMM ) = 0.

L > K (overidentied): βGMM = arg minβ bN (β)′WNbN (β).

WN is a positive denite weighting matrix.

Optimal GMM (ecient) uses(

1N

∑Ni=1 E[ziuiu

′iz′i])−1

, the in-

verse of the variance-covariance matrix, as weighting matrix.


GMM in 3 slides (III): asymptotic properties

βGMM is a consistent estimator of β.

It is asymptotically normal, with the following variance:

avar(βGMM ) = (D′WD)−1D′WSWWD(D′WD)−1

where

D = plimN→∞

∂bN (β)

∂β′

W = plimN→∞

WN

SW =1

N

N∑i=1

E[ziuiu′iz′i]


Back to the AR(1) panel data: assumptions

Our model is given by:

yit = αyit−1 + ηi + vit, i = 1, ..., N ; t = 1, ..., T

with

E[ηi] = E[vit] = E[ηivit] = 0 (error components).

E[vitvis] = 0 ∀ s, t (serially uncorrelated shocks).

E[yi0vit] = 0 ∀ t = 1, ..., T (predeterm. initial cond.).


Moment conditionsGiven previous assumptions, several moment conditions:

Equation Instruments Orthogonality cond.

∆yi2 = α∆yi1 + ∆vi2 yi0 E[∆vi2yi0] = 0

∆yi3 = α∆yi2 + ∆vi3 yi0, yi1 E[∆vi3

(yi0yi1

)]= 0

∆yi4 = α∆yi3 + ∆vi4 yi0, yi1, yi2 E

∆vi4

yi0yi1yi2

= 0

......

...

∆yiT = α∆yiT−1 + ∆viT yi0, yi1, yi2, ..., yiT−2 E

∆viT

yi0yi1yi2...

yiT−2

= 0

We end up with (T − 1)T/2 moment conditions.An Introduction to Panel Data Econometrics. Spring Term 2013 50

Moment conditions in matrix form

We can write these moment conditions as E[Z′i∆vi] = 0, where

Zi =

yi0 0 0 0 . . . 0 0 . . . 00 yi0 yi1 0 . . . 0 0 . . . 0...

......

.... . .

...... . . .

...0 0 0 0 . . . yi0 yi1 . . . yiT−2

and ∆vi =

∆vi2∆vi3...

∆viT

,

and the sample analogue is

bN (α) =1

N

N∑i=1

Z′i∆vi(α)

Hence, the GMM estimator (proposed by Arellano and Bond, 1991) is

αGMM = arg minα

(1

N

N∑i=1

∆v′i(α)Zi

)WN

(1

N

N∑i=1

Z′i∆vi(α)

)=

= (∆y′−1ZWNZ′∆y−1)−1∆y′−1ZWNZ

′∆y


Optimal weighting matrix

The optimal weighting matrix (ecient GMM) is:

WN =

(1

N

N∑i=1

E[Z′i∆viv′iZi]

)−1

The sample analogue is obtained in a two-step procedure:

WN =

(1

N

N∑i=1

[Z′i∆vi(α)∆v′i(α)Zi]

)−1

Windeijer (2005) proposes a nite sample correction of the variance thataccounts for α being estimated.

The most common one-step (and rst-step) matrix uses the structure ofE[∆vi∆v

′i]:

E[∆vi∆v′i] = σ2

v

2 −1 0 0 . . . 0−1 2 −1 0 . . . 0...

......

. . ....

0 0 0 0 . . . 2


GMM on Employment equationsBy default, Gretl estimates a one-step GMM:

dpanel 1; n

However, we can also do the two-step GMM:

dpanel 1; n --two-step --asymptotic

When doing two-step, we want to do small sample correction:

dpanel 1; n --two-step


GMM on Employment equations (cont'd)The command is exible enough so that we can estimate Anderson-Hsiao:

dpanel 1; n; GMM(n,2,2)

Let us also compute OLS:

lags 1; n

ols n n 1 --robust

and Within Groups:

ols n n 1 Dunit ∗ --robust


Potential limitations of Arellano-Bond

Weak instruments:

When α→ 1, relevance of the instrument decreases.

Instruments are still valid, but have poor small sample properties.

Monte Carlo evidence shows that with α > 0.8, estimator behavespoorly unless huge samples available.

There are alternatives in the literature.

Overtting:

Too many instruments if T relative to N is relatively large.

We might want to restrict the number of instruments used.

It is always good practice to check robustness to dierent combina-tions of instruments.


Dynamic linear model

Once we include regressors, the model is

yit = αyit−1 + x′itβ + ηi + vit |α| < 1.

We maintain the previous assumptions: error components, se-

rially uncorrelated shocks, and predetermined initial conditions.

Therefore, moment conditions of the kind

E[yit−s∆vit] = 0 t = 2, ..., T, s ≥ 2

are still valid.


Assumptions on regressors

Dierent assumptions regarding xit will enable dierent addi-

tional orthogonality conditions:

xit can be correlated or uncorrelated with ηi.

xit can endogenous, predetermined, or strictly exogenous

with respect to vit.

For instance, if assumptions are analogous to those for yit−1, we

may use xit−1 (and longer lags) as instruments:

Zi =

yi0 x′i0 x′i1 . . . 0 . . . 0 0′ . . . 0′

......

.... . .

... . . ....

......

...0 0′ 0′ . . . yi0 . . . yiT−2 x′i0 . . . x′iT−1

.


Employment equations with regressorsGretl command is easily extended to include regressors:

dpanel 1; n w k

And we can do OLS:

ols n n 1 w k --robust

and Within Groups in order to compare:

ols n n 1 w k Dunit ∗ --robust


Specication tests

There are several relevant aspects for the validity of the estima-

tion that can be tested formally.

Orthogonality conditions: are they small enough to not

reject that they are zero (overidentifying restrictions)

Validity of a subset of more restrictive assumptions (dier-

ence Sargan test, Hausman test)

Serial correlation in the data: vital for the validity of

the instruments (Arellano-Bond test).


Sargan/Hansen overidentifying restrictions test

The null hypothesis is whether the orthogonality conditions are satised(i.e. moments are equal to zero).

The test can only be implemented if the problem is overidentied (other-wise the sample moments are exactly zero by construction).

The test is:

S = NJN (β) = N

1

N

N∑i=1

û′iZi

(1

N

N∑i=1

Z′iuiu′iZi

)−1

1

N

N∑i=1

Z′i ûi

,

where u are predicted residuals from the rst step and û are those predictedfrom the second stage,and

Sa∼ χ2(L−K).


Testing stronger assumptions

Some of the assumptions that we make are stronger than others.

If the problem is overidentied, we can test whether results change

if we include or exclude the orthogonality conditions generated

by them.

If they are true, increase eciency, but if not, inconsistent!

Two ways of testing it:

Overidentifying restrictions (dierence in Sargan should

be close to zero: ∆S = S − SA a∼ χ2(L− LA))

Dierences in coecients (Hausman test: q = δAGMM −

δGMM )


Direct test for serial correlation

The test was proposed by Arellano-Bond (1991).

Tests for the presence of second order autocorrelation in the

rst-dierenced residuals.

If dierences in residuals are second-order correlated, some in-

struments would not be valid!

The test is:

m2 =∆v−2

′∆v∗

se

a∼ N (0, 1),

where ∆v−2 is the second lagged residual in dierences, and ∆v∗is the part of the vector of contemporaneous rst dierences for

the periods that overlap with the second lagged vector.

Values close to zero do not reject the hypothesis of no serial

correlation.


System GMM estimation.


Additional orthogonality conditions

Without loss of generality, we will go back to the autoregressive

model:

yit = αyit−1 + ηi + vit |α| < 1

Recall our (T − 1)T/2 moment conditions:

E[yit−s∆vit] = 0 t = 2, ..., T ; s ≥ 2

System GMM (Arellano and Bover, 1995) uses additional con-

ditions:

E[∆yisηi] = 0,

or, alternatively,

E[∆yisuiT ] = 0, uiT = ηi + uiT


The System GMM estimator

Analogously to the rst-dierenced GMM, the estimator will be

given by E[(Z∗)′u∗i ] = 0:

αSys−GMM =((X∗)′Z∗WN (Z∗)′X∗

)−1X∗Z∗WN (Z∗)′y∗,

where,

Z∗i =

(Zi 0 . . . 00 ∆yi1 . . . ∆yiT−1

),u∗i =

(ui

ηi + viT

), X∗i =

(∆y−1i

yiT−1

)and y∗i =

(∆yiyiT

),

This estimator is more ecient, as it uses additional moment conditions.

It reduces small sample bias, especially when α→ 1


System-GMM on Employment equationsRemember our Gretl for the two-step rst-dierenced GMM with smallsample correction:

dpanel 1; n --two-step

Two-step system-GMM with small sample correction delivers:

dpanel 1; n --system --two-step

And the one-step system-GMM delivers:

dpanel 1; n --system


System GMM in employment equations (cont'd)And with regressors, two-step System-GMM:

dpanel 1; n w k --two-step --system

Versus OLS:

ols n n 1 w k --robust

and Within Groups estimates:

ols n n 1 w k Dunit ∗ --robust


Documents

Slides 2014 Panel Data