Multiple Equation GMM with Common
Coefficients: Panel Data
Eric Zivot
Winter 2013
Multi-equation GMM with common coefficients
Example (panel wage equation)
$$LW69_i = \phi + \beta\,S69_i + \gamma\,IQ_i + \delta\,EXPR69_i + \varepsilon_{i1}$$
$$LW80_i = \phi + \beta\,S80_i + \gamma\,IQ_i + \delta\,EXPR80_i + \varepsilon_{i2}$$

Note: the coefficients are common across the two equations, and $E[\varepsilon_{i1}\varepsilon_{i2}] \neq 0$.
The general multi-equation model with common coefficients $\delta$ is

$$y_{im} = \underset{(1\times L)}{z_{im}'}\,\underset{(L\times 1)}{\delta} + \varepsilon_{im}, \quad m = 1,\dots,M; \; i = 1,\dots,n$$

$$E[z_{im}\varepsilon_{im}] \neq 0 \text{ for some elements of } z_{im} \text{ (endogenous regressors)}$$
The model for the $m$-th equation is

$$\underset{(n\times 1)}{y_m} = \underset{(n\times L)}{Z_m}\,\underset{(L\times 1)}{\delta} + \underset{(n\times 1)}{\varepsilon_m}, \quad m = 1,\dots,M$$
Note:

$$\delta_1 = \cdots = \delta_M = \delta \quad \text{(common coefficients)}$$
The stacked system (across equations) is
$$\left.\begin{array}{c} y_1 = Z_1\delta + \varepsilon_1 \\ \vdots \\ y_M = Z_M\delta + \varepsilon_M \end{array}\right\}
\;\Rightarrow\;
\begin{bmatrix} y_1 \\ \vdots \\ y_M \end{bmatrix} =
\begin{bmatrix} Z_1 \\ \vdots \\ Z_M \end{bmatrix}\delta +
\begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_M \end{bmatrix}$$
This is in the form of a giant regression (with common coefficients)
$$\underset{(Mn\times 1)}{y} = \underset{(Mn\times L)}{Z}\,\underset{(L\times 1)}{\delta} + \underset{(Mn\times 1)}{\varepsilon}$$
Assume each equation has instruments $x_{im}$ ($K_m \times 1$) satisfying

$$E[x_{im}\varepsilon_{im}] = 0$$

Very often the same instruments $x_i$ are used for each equation.
The moment conditions are
$$\underset{(K\times 1)}{g_i(\delta)} =
\begin{bmatrix} x_{i1}\varepsilon_{i1} \\ \vdots \\ x_{iM}\varepsilon_{iM} \end{bmatrix} =
\begin{bmatrix} x_{i1}(y_{i1} - z_{i1}'\delta) \\ \vdots \\ x_{iM}(y_{iM} - z_{iM}'\delta) \end{bmatrix}$$

There are a total of $K = \sum_{m=1}^M K_m$ moment conditions.
Identification
$$E[g_i(\delta)] = 0 \text{ for } \delta = \delta_0; \quad E[g_i(\delta)] \neq 0 \text{ for } \delta \neq \delta_0$$

Identification implies that

$$\begin{bmatrix} E[x_{i1}y_{i1}] \\ \vdots \\ E[x_{iM}y_{iM}] \end{bmatrix} -
\begin{bmatrix} E[x_{i1}z_{i1}'] \\ \vdots \\ E[x_{iM}z_{iM}'] \end{bmatrix}\delta_0 = 0$$

or

$$\underset{(K\times 1)}{\sigma_{xy}} - \underset{(K\times L)}{\Sigma_{xz}}\,\underset{(L\times 1)}{\delta_0} = \underset{(K\times 1)}{0}$$

For $\delta_0$ to be the unique solution requires the rank condition

$$\mathrm{rank}(\Sigma_{xz}) = L$$
Remarks:
• In the model with common coefficients, some equations may be individually unidentified but identified within the system.

• A sufficient condition for identification is that $E[x_{im}z_{im}']$ have full column rank $L$ for some equation $m$.
The moments $g_i(\delta)$ form an MDS with

$$E[g_i(\delta_0)g_i(\delta_0)'] = S =
\begin{pmatrix} S_{11} & \cdots & S_{1M} \\ \vdots & \ddots & \vdots \\ S_{M1} & \cdots & S_{MM} \end{pmatrix},
\quad S_{mh} = E[\varepsilon_{im}\varepsilon_{ih}\,x_{im}x_{ih}']$$
Efficient GMM estimation
$$\min_{\delta}\; J(\delta, \hat{S}^{-1}) = n\,g_n(\delta)'\hat{S}^{-1}g_n(\delta)$$

$$g_n(\delta) = s_{xy} - S_{xz}\delta$$

$$s_{xy} = \begin{bmatrix} s_{x_1y_1} \\ \vdots \\ s_{x_My_M} \end{bmatrix}, \quad
S_{xz} = \begin{bmatrix} S_{x_1z_1} \\ \vdots \\ S_{x_Mz_M} \end{bmatrix}, \quad
s_{x_my_m} = \frac{1}{n}\sum_{i=1}^n x_{im}y_{im}, \quad
S_{x_mz_m} = \frac{1}{n}\sum_{i=1}^n x_{im}z_{im}'$$

Straightforward algebra gives

$$\hat{\delta}(\hat{S}^{-1}) = (S_{xz}'\hat{S}^{-1}S_{xz})^{-1}S_{xz}'\hat{S}^{-1}s_{xy}$$
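To make the formula concrete, here is a minimal numerical sketch of the two-step efficient GMM estimator with common coefficients. The data are simulated, the regressors are taken to be exogenous so that $x_{im} = z_{im}$, and $\hat{S}$ is built from first-step residuals; all names are illustrative, not from the text.

    # Minimal sketch: two-step efficient GMM with common coefficients.
    # Simulated data; exogenous regressors, so the instruments are x_im = z_im.
    import numpy as np

    rng = np.random.default_rng(0)
    n, M, L = 500, 2, 3                      # individuals, equations, coefficients
    delta0 = np.array([1.0, -0.5, 0.25])
    z = rng.normal(size=(n, M, L))           # z[i, m, :] = regressors for eq. m
    eps = rng.multivariate_normal(np.zeros(M), [[1.0, 0.4], [0.4, 1.0]], size=n)
    y = np.einsum('nml,l->nm', z, delta0) + eps

    # Stacked sample moments: s_xy (K x 1) and S_xz (K x L), with K = M*L here
    s_xy = np.concatenate([(z[:, m, :] * y[:, [m]]).mean(axis=0) for m in range(M)])
    S_xz = np.vstack([(z[:, m, :, None] * z[:, m, None, :]).mean(axis=0)
                      for m in range(M)])

    # First step with W = I, then S_hat from first-step residuals
    d1 = np.linalg.solve(S_xz.T @ S_xz, S_xz.T @ s_xy)
    e = y - np.einsum('nml,l->nm', z, d1)
    g = np.concatenate([z[:, m, :] * e[:, [m]] for m in range(M)], axis=1)
    S_hat = g.T @ g / n                      # estimate of S = E[g_i g_i']
    A = S_xz.T @ np.linalg.solve(S_hat, S_xz)
    d_gmm = np.linalg.solve(A, S_xz.T @ np.linalg.solve(S_hat, s_xy))
    print(d_gmm)                             # close to delta0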
Special cases
1. 3SLS with common coefficients
conditional homoskedasticity
$$x_{i1} = x_{i2} = \cdots = x_{iM} = x_i$$
$$S = S_{3SLS} = \Sigma \otimes E[x_ix_i']$$

2. SUR with common coefficients (Random effects)

conditional homoskedasticity
$$x_{i1} = x_{i2} = \cdots = x_{iM} = x_i = \text{union}(z_{i1},\dots,z_{iM})$$
$$S = S_{SUR} = \Sigma \otimes E[x_ix_i']$$
Random Effects Estimator (Efficient GMM with common coefficients)
The SUR model with common coefficients is the giant regression
$$\underset{(Mn\times 1)}{y} = \underset{(Mn\times L)}{Z}\,\underset{(L\times 1)}{\delta} + \varepsilon, \quad
Z = \begin{bmatrix} Z_1 \\ \vdots \\ Z_M \end{bmatrix}, \quad
E[\varepsilon\varepsilon'] = \Sigma \otimes I_n$$

$$\Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1M} \\ \vdots & \ddots & \vdots \\ \sigma_{M1} & \cdots & \sigma_{MM} \end{pmatrix}$$

The FGLS estimator is the random effects (RE) estimator and has the form

$$\hat{\delta}_{RE} = \left(Z'(\hat{\Sigma}^{-1}\otimes I_n)Z\right)^{-1}Z'(\hat{\Sigma}^{-1}\otimes I_n)y$$

The elements of $\hat{\Sigma}$ are usually estimated using the pooled OLS estimator of the giant regression

$$\hat{\delta}_{pooled} = (Z'Z)^{-1}Z'y$$

and forming

$$\hat{\sigma}_{mh} = \frac{1}{n}(y_m - Z_m\hat{\delta}_{pooled})'(y_h - Z_h\hat{\delta}_{pooled})$$

Note: Hayashi uses a somewhat non-standard version of the RE estimator. He assumes that $\Sigma$ is unstructured. Most other treatments of the RE estimator assume a special structure for $\Sigma$ based on an error components representation for $\varepsilon_{im}$. In this case, the FGLS estimator uses a different estimator for $\Sigma$.
Simplifying the RE estimator
It turns out that the RE estimator may also be derived from an alternative representation of the giant regression.

In the alternative representation, the system is stacked by observations instead of by equations. We can do this because $\delta$ is the same across equations. Define

$$\underset{(M\times 1)}{y_i} = \begin{bmatrix} y_{i1} \\ \vdots \\ y_{iM} \end{bmatrix}, \quad
\underset{(M\times L)}{Z_i} = \begin{bmatrix} z_{i1}' \\ \vdots \\ z_{iM}' \end{bmatrix}, \quad
\underset{(M\times 1)}{\varepsilon_i} = \begin{bmatrix} \varepsilon_{i1} \\ \vdots \\ \varepsilon_{iM} \end{bmatrix}$$

Then the model for the $i$-th individual across all equations is

$$\underset{(M\times 1)}{y_i} = \underset{(M\times L)}{Z_i}\,\underset{(L\times 1)}{\delta} + \underset{(M\times 1)}{\varepsilon_i}, \quad i = 1,\dots,n$$

$$E[\varepsilon_i\varepsilon_i'] = \underset{(M\times M)}{\Sigma}, \quad E[\varepsilon_i\varepsilon_j'] = 0 \text{ for } i \neq j$$
Example: 2-period panel wage equation

$$y_i = \begin{pmatrix} LW69_i \\ LW80_i \end{pmatrix}, \quad
Z_i = \begin{pmatrix} 1 & S69_i & IQ_i & EXPR69_i \\ 1 & S80_i & IQ_i & EXPR80_i \end{pmatrix}$$

$$\delta = (\phi,\, \beta,\, \gamma,\, \delta)', \quad \varepsilon_i = (\varepsilon_{i1},\, \varepsilon_{i2})'$$

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}$$
The giant regression stacked by observations has the form

$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix} Z_1 \\ \vdots \\ Z_n \end{bmatrix}\delta +
\begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

or

$$\underset{(Mn\times 1)}{y} = \underset{(Mn\times L)}{Z}\,\underset{(L\times 1)}{\delta} + \underset{(Mn\times 1)}{\varepsilon}$$

$$\underset{(Mn\times Mn)}{E[\varepsilon\varepsilon']} =
\begin{pmatrix} E[\varepsilon_1\varepsilon_1'] & \cdots & E[\varepsilon_1\varepsilon_n'] \\ \vdots & \ddots & \vdots \\ E[\varepsilon_n\varepsilon_1'] & \cdots & E[\varepsilon_n\varepsilon_n'] \end{pmatrix}
= \begin{pmatrix} \Sigma & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \Sigma \end{pmatrix} = I_n \otimes \Sigma$$
The FGLS or RE estimator is

$$\hat{\delta}_{RE} = \left(Z'(I_n\otimes\hat{\Sigma}^{-1})Z\right)^{-1}Z'(I_n\otimes\hat{\Sigma}^{-1})y$$

Now

$$Z'(I_n\otimes\hat{\Sigma}^{-1})Z =
\begin{bmatrix} Z_1' & \cdots & Z_n' \end{bmatrix}
\begin{pmatrix} \hat{\Sigma}^{-1} & & 0 \\ & \ddots & \\ 0 & & \hat{\Sigma}^{-1} \end{pmatrix}
\begin{bmatrix} Z_1 \\ \vdots \\ Z_n \end{bmatrix}
= \sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}Z_i$$

and

$$Z'(I_n\otimes\hat{\Sigma}^{-1})y =
\begin{bmatrix} Z_1' & \cdots & Z_n' \end{bmatrix}
\begin{pmatrix} \hat{\Sigma}^{-1} & & 0 \\ & \ddots & \\ 0 & & \hat{\Sigma}^{-1} \end{pmatrix}
\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}
= \sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}y_i$$

Therefore,

$$\hat{\delta}_{RE} = \left(\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}Z_i\right)^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}y_i$$
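A minimal sketch of this stacked-by-observation RE (FGLS) computation, assuming simulated data and the two-step recipe above (pooled OLS, then $\hat{\Sigma}$ from residuals, then FGLS); names are illustrative.

    # Sketch: pooled OLS, Sigma_hat from residuals, then RE (FGLS)
    import numpy as np

    rng = np.random.default_rng(1)
    n, M, L = 400, 2, 3
    delta0 = np.array([0.5, 1.0, -1.0])
    Z = rng.normal(size=(n, M, L))
    eps = rng.multivariate_normal(np.zeros(M), [[1.0, 0.6], [0.6, 2.0]], size=n)
    y = np.einsum('nml,l->nm', Z, delta0) + eps

    # Step 1: pooled OLS on the giant regression
    ZZ = np.einsum('nml,nmk->lk', Z, Z)
    d_pool = np.linalg.solve(ZZ, np.einsum('nml,nm->l', Z, y))

    # Step 2: Sigma_hat from pooled-OLS residuals
    e = y - np.einsum('nml,l->nm', Z, d_pool)
    Sigma_hat = e.T @ e / n

    # Step 3: RE = FGLS = (sum Z_i'S^-1 Z_i)^-1 sum Z_i'S^-1 y_i
    Si = np.linalg.inv(Sigma_hat)
    A = np.einsum('nml,mh,nhk->lk', Z, Si, Z)
    b = np.einsum('nml,mh,nh->l', Z, Si, y)
    print(np.linalg.solve(A, b))               # close to delta0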
Asymptotic Properties of the RE estimator
Consistency of the RE estimator follows from the usual manipulations:

$$\hat{\delta}_{RE} - \delta = \left(n^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}Z_i\right)^{-1} n^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}\varepsilon_i$$

As long as

$$n^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}\varepsilon_i \overset{p}{\to} 0$$

the RE estimator will be consistent. This will occur, for example, if $Z_i$ is uncorrelated with $\varepsilon_i$ (no endogeneity).

Asymptotic normality also follows from the usual manipulations:

$$\sqrt{n}(\hat{\delta}_{RE} - \delta) = \left(n^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}Z_i\right)^{-1} n^{-1/2}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}\varepsilon_i$$

Under the SUR model assumptions

$$n^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}Z_i \overset{p}{\to} E[Z_i'\Sigma^{-1}Z_i]$$

$$n^{-1/2}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}\varepsilon_i \overset{d}{\to} N\left(0,\; E[Z_i'\Sigma^{-1}Z_i]\right)$$

so that

$$\sqrt{n}(\hat{\delta}_{RE} - \delta) \overset{d}{\to} N\left(0,\; E[Z_i'\Sigma^{-1}Z_i]^{-1}\right)$$
Pooled OLS again

$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix} Z_1 \\ \vdots \\ Z_n \end{bmatrix}\delta +
\begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

or

$$\underset{(Mn\times 1)}{y} = \underset{(Mn\times L)}{Z}\,\underset{(L\times 1)}{\delta} + \underset{(Mn\times 1)}{\varepsilon}$$

The pooled OLS estimator may also be computed using the giant regression stacked by observations:

$$\hat{\delta}_{pooled} = (Z'Z)^{-1}Z'y = \left(\sum_{i=1}^n Z_i'Z_i\right)^{-1}\sum_{i=1}^n Z_i'y_i$$

The asymptotic properties of the OLS estimator are straightforward to derive. For asymptotic normality,

$$\sqrt{n}(\hat{\delta}_{pooled} - \delta) = \left(n^{-1}\sum_{i=1}^n Z_i'Z_i\right)^{-1} n^{-1/2}\sum_{i=1}^n Z_i'\varepsilon_i$$

$$n^{-1}\sum_{i=1}^n Z_i'Z_i \overset{p}{\to} E[Z_i'Z_i]$$

$$n^{-1/2}\sum_{i=1}^n Z_i'\varepsilon_i \overset{d}{\to} N\left(0,\; E[Z_i'\varepsilon_i\varepsilon_i'Z_i]\right)$$

Note, by iterated expectations and conditional homoskedasticity,

$$E[Z_i'\varepsilon_i\varepsilon_i'Z_i] = E\left[E[Z_i'\varepsilon_i\varepsilon_i'Z_i \mid Z_i]\right] = E\left[Z_i'E[\varepsilon_i\varepsilon_i' \mid Z_i]Z_i\right] = E[Z_i'\Sigma Z_i]$$

Therefore,

$$\sqrt{n}(\hat{\delta}_{pooled} - \delta) \overset{d}{\to} N\left(0,\; E[Z_i'Z_i]^{-1}E[Z_i'\Sigma Z_i]E[Z_i'Z_i]^{-1}\right)$$

Equivalently,

$$\hat{\delta}_{pooled} \overset{A}{\sim} N\left(\delta,\; n^{-1}\widehat{\mathrm{avar}}(\hat{\delta}_{pooled})\right)$$

$$\widehat{\mathrm{avar}}(\hat{\delta}_{pooled}) =
\left(n^{-1}\sum_{i=1}^n Z_i'Z_i\right)^{-1}
\left(n^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}Z_i\right)
\left(n^{-1}\sum_{i=1}^n Z_i'Z_i\right)^{-1}$$
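A sketch of pooled OLS with its sandwich asymptotic variance, again on simulated SUR-type data; all names and the particular $\Sigma$ are illustrative.

    # Sketch: pooled OLS with sandwich avar under conditional homoskedasticity
    import numpy as np

    rng = np.random.default_rng(6)
    n, M, L = 400, 2, 2
    delta0 = np.array([1.0, 2.0])
    Z = rng.normal(size=(n, M, L))
    eps = rng.multivariate_normal(np.zeros(M), [[1.0, 0.7], [0.7, 1.5]], size=n)
    y = np.einsum('nml,l->nm', Z, delta0) + eps

    A = np.einsum('nml,nmk->lk', Z, Z) / n                 # (1/n) sum Z_i'Z_i
    d_pool = np.linalg.solve(A, np.einsum('nml,nm->l', Z, y) / n)
    e = y - np.einsum('nml,l->nm', Z, d_pool)
    Sigma_hat = e.T @ e / n
    B = np.einsum('nml,mh,nhk->lk', Z, Sigma_hat, Z) / n   # (1/n) sum Z_i'S Z_i
    avar = np.linalg.inv(A) @ B @ np.linalg.inv(A)
    print(d_pool, np.sqrt(np.diag(avar) / n))              # estimates and SEs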
Question: When is pooled OLS efficient?
Introduction to Panel Data
$$y_{im} = z_{im}'\delta + \varepsilon_{im}$$
$$i = 1,\dots,n \;\text{(individuals)}; \quad m = 1,\dots,M \;\text{(time periods)}$$

$$\underset{(M\times 1)}{y_i} = \underset{(M\times L)}{Z_i}\,\underset{(L\times 1)}{\delta} + \varepsilon_i, \quad \{y_i, Z_i\} \text{ is i.i.d. across } i$$

Main question: Is $z_{im}$ uncorrelated with $\varepsilon_{im}$?

1. If yes, then we have a SUR type model with common coefficients. Under conditional homoskedasticity, the appropriate estimator is the RE estimator.

2. If no, then we have a multi-equation system with endogenous regressors. We need to use an estimation procedure to deal with the endogeneity. In the panel set-up, under certain assumptions, we can deal with the endogeneity without using instruments, using the so-called fixed effects (FE) estimator.
Error Components Assumption
$$\varepsilon_{im} = \alpha_i + \eta_{im}, \quad \alpha_i = \text{unobserved fixed effect}$$

$$E[z_{im}\eta_{ih}] = 0 \text{ for all } m, h$$

$$E[\varepsilon_i\varepsilon_i' \mid x_i] = \Sigma, \quad x_i = \text{union}(z_{i1},\dots,z_{iM})$$

The fixed effect component $\alpha_i$ (which is actually an unobserved random variable) captures unobserved heterogeneity across individuals that is fixed over time.

With the error components assumption, the RE and FE models are defined as follows:

RE model: $E[z_{im}\alpha_i] = 0$

FE model: $E[z_{im}\alpha_i] \neq 0$

Note: Hayashi's treatment of the error components model is slightly non-standard.
Example: Panel wage equation
$$LW69_i = \phi + \beta\,S69_i + \gamma\,IQ_i + \delta\,EXPR69_i + \alpha_i + \eta_{i,69}$$
$$LW80_i = \phi + \beta\,S80_i + \gamma\,IQ_i + \delta\,EXPR80_i + \alpha_i + \eta_{i,80}$$

$$E[S_{it}\,\alpha_i] \neq 0$$

Here $\alpha_i$ captures unobserved ability that is correlated with schooling $S_{it}$. It is assumed that ability does not vary over time.

Note: The existence of $\alpha_i$ guarantees across-equation error correlation:

$$E[\varepsilon_{i,69}\varepsilon_{i,80}] = E\left[(\alpha_i + \eta_{i,69})(\alpha_i + \eta_{i,80})\right]
= E[\alpha_i^2] + E[\alpha_i\eta_{i,80}] + E[\alpha_i\eta_{i,69}] + E[\eta_{i,69}\eta_{i,80}] \neq 0$$

Remark: With panel data, the endogeneity due to the unobserved fixed effect (i.e., $E[z_{im}\alpha_i] \neq 0$) can be eliminated without the use of instruments. To see this, consider the difference in log-wages over time:

$$LW80_i - LW69_i = (\phi - \phi) + \beta(S80_i - S69_i) + \gamma(IQ_i - IQ_i) + \delta(EXPR80_i - EXPR69_i) + (\alpha_i - \alpha_i) + (\eta_{i,80} - \eta_{i,69})$$
$$= \beta(S80_i - S69_i) + \delta(EXPR80_i - EXPR69_i) + (\eta_{i,80} - \eta_{i,69})$$

Hence, we can consistently estimate $\beta$ and $\delta$ by using the first differenced data!
Fixed Effects Estimation
Key insight: With panel data, $\delta$ can be consistently estimated without using instruments.
There are 3 equivalent approaches
1. Within group estimator
2. Least squares dummy variable estimator
3. First difference estimator
Within group estimator
To illustrate the within group estimator, consider the simplified panel regression with a single regressor:

$$y_{it} = \beta x_{it} + \alpha_i + \eta_{it}, \quad E[x_{it}\alpha_i] \neq 0, \quad E[x_{it}\eta_{is}] = 0$$

Trick to remove the fixed effect $\alpha_i$: First, for each $i$ average over time:

$$\bar{y}_i = \beta\bar{x}_i + \alpha_i + \bar{\eta}_i$$

$$\bar{y}_i = \frac{1}{M}\sum_{t=1}^M y_{it}, \quad \bar{x}_i = \frac{1}{M}\sum_{t=1}^M x_{it}, \quad \bar{\eta}_i = \frac{1}{M}\sum_{t=1}^M \eta_{it}$$

Second, form the transformed regression

$$y_{it} - \bar{y}_i = \beta(x_{it} - \bar{x}_i) + (\eta_{it} - \bar{\eta}_i)$$

or

$$\tilde{y}_{it} = \beta\tilde{x}_{it} + \tilde{\eta}_{it}$$

The FE estimator is pooled OLS on the transformed regression (stacked by observation):

$$\hat{\beta}_{FE} = (\tilde{x}'\tilde{x})^{-1}\tilde{x}'\tilde{y} = \left(\sum_{i=1}^n \tilde{x}_i'\tilde{x}_i\right)^{-1}\sum_{i=1}^n \tilde{x}_i'\tilde{y}_i$$
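A minimal sketch of this within-group estimator, with a simulated fixed effect that is correlated with the regressor (so pooled OLS is inconsistent but the within estimator is not); names are illustrative.

    # Sketch: within-group estimator vs. (biased) pooled OLS on simulated data
    import numpy as np

    rng = np.random.default_rng(2)
    n, M = 300, 4
    beta0 = 2.0
    alpha = rng.normal(size=(n, 1))            # fixed effect
    x = 0.8 * alpha + rng.normal(size=(n, M))  # x correlated with alpha
    y = beta0 * x + alpha + rng.normal(size=(n, M))

    x_t = x - x.mean(axis=1, keepdims=True)    # x_it - xbar_i
    y_t = y - y.mean(axis=1, keepdims=True)    # y_it - ybar_i
    beta_fe = (x_t * y_t).sum() / (x_t ** 2).sum()
    beta_pool = (x * y).sum() / (x ** 2).sum() # inconsistent: ignores alpha
    print(beta_fe, beta_pool)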
Remarks
1. If $z_{it}$ does not vary with $t$ (e.g. $z_{it} = z_i$), then $\tilde{z}_{it} = 0$ and we cannot estimate $\delta$.

2. Must be careful computing the degrees of freedom for the FE estimator. There are $nM$ total observations and $L$ parameters in $\delta$, so it appears that there are $nM - L$ degrees of freedom. However, you lose 1 degree of freedom for each fixed effect eliminated. So the actual degrees of freedom are $nM - n - L = n(M-1) - L$.
Hayashi Notation for Within Estimator
Consider the general model (assume all variables vary with $i$ and $t$)

$$y_{it} = z_{it}'\delta + \alpha_i + \eta_{it}$$

Stack the observations over $t$, giving

$$\underset{(M\times 1)}{y_i} = \underset{(M\times L)}{Z_i}\,\underset{(L\times 1)}{\delta} + \underset{(1\times 1)}{\alpha_i}\,\underset{(M\times 1)}{1_M} + \underset{(M\times 1)}{\eta_i}$$

Define

$$\underset{(M\times M)}{Q} = I_M - 1_M(1_M'1_M)^{-1}1_M' = I_M - P$$
$$P = 1_M(1_M'1_M)^{-1}1_M' = M^{-1}1_M1_M'$$

Note

$$P1_M = 1_M, \quad Q1_M = 0$$
$$Py_i = 1_M(1_M'1_M)^{-1}1_M'y_i = 1_M\bar{y}_i$$
$$Qy_i = (I_M - P)y_i = y_i - 1_M\bar{y}_i = \tilde{y}_i$$

The transformed error components model is then

$$Qy_i = QZ_i\delta + Q1_M\alpha_i + Q\eta_i \;\Rightarrow\; \tilde{y}_i = \tilde{Z}_i\delta + \tilde{\eta}_i$$
The giant regression (stacked by observation) is

$$\begin{bmatrix} \tilde{y}_1 \\ \vdots \\ \tilde{y}_n \end{bmatrix} =
\begin{bmatrix} \tilde{Z}_1 \\ \vdots \\ \tilde{Z}_n \end{bmatrix}\delta +
\begin{bmatrix} \tilde{\eta}_1 \\ \vdots \\ \tilde{\eta}_n \end{bmatrix}$$

or

$$\underset{(Mn\times 1)}{\tilde{y}} = \underset{(Mn\times L)}{\tilde{Z}}\,\underset{(L\times 1)}{\delta} + \underset{(Mn\times 1)}{\tilde{\eta}}$$

Note: Unless $E[\tilde{\eta}_i\tilde{\eta}_i'] = \sigma_\eta^2 I_M$, the pooled OLS estimator $\hat{\delta}_{FE}$ is not efficient.
The FE estimator is again pooled OLS on the transformed system:

$$\hat{\delta}_{FE} = \left(\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}\sum_{i=1}^n \tilde{Z}_i'\tilde{y}_i
= \left(\sum_{i=1}^n (QZ_i)'QZ_i\right)^{-1}\sum_{i=1}^n (QZ_i)'Qy_i
= \left(\sum_{i=1}^n Z_i'QZ_i\right)^{-1}\sum_{i=1}^n Z_i'Qy_i$$

since $Q$ is symmetric and idempotent.
Least Squares Dummy Variable Model

Consider the general model

$$y_{it} = z_{it}'\delta + \alpha_i + \eta_{it}$$

Stack the observations over $t$, giving

$$\underset{(M\times 1)}{y_i} = Z_i\delta + \alpha_i 1_M + \eta_i$$

Now create the giant regression

$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix} Z_1 \\ \vdots \\ Z_n \end{bmatrix}\delta +
\begin{bmatrix} 1_M & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & 1_M \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix} +
\begin{bmatrix} \eta_1 \\ \vdots \\ \eta_n \end{bmatrix}$$

or

$$\underset{(Mn\times 1)}{y} = \underset{(Mn\times L)}{Z}\,\underset{(L\times 1)}{\delta} + \underset{(Mn\times n)}{D}\,\underset{(n\times 1)}{\alpha} + \underset{(Mn\times 1)}{\eta} = Z\delta + (I_n\otimes 1_M)\alpha + \eta$$

$$D = I_n \otimes 1_M$$
Aside: Partitioned Regression
Consider the partitioned regression equation

$$\underset{(n\times 1)}{y} = \underset{(n\times k_1)}{X_1}\,\underset{(k_1\times 1)}{\beta_1} + \underset{(n\times k_2)}{X_2}\,\underset{(k_2\times 1)}{\beta_2} + \varepsilon$$

The LS estimators for $\beta_1$ and $\beta_2$ can be expressed as

$$\hat{\beta}_1 = (X_1'Q_2X_1)^{-1}X_1'Q_2y, \quad Q_2 = I - P_{X_2}$$
$$\hat{\beta}_2 = (X_2'Q_1X_2)^{-1}X_2'Q_1y, \quad Q_1 = I - P_{X_1}$$

where

$$P_{X_1} = X_1(X_1'X_1)^{-1}X_1', \quad P_{X_2} = X_2(X_2'X_2)^{-1}X_2'$$
Result: The FE estimator is the OLS estimator of $\delta$ in the giant regression:

$$\hat{\delta}_{FE} = (Z'Q_D Z)^{-1}Z'Q_D y$$
$$Q_D = I_{Mn} - P_D, \quad P_D = D(D'D)^{-1}D', \quad D = I_n\otimes 1_M$$

Now,

$$P_D = (I_n\otimes 1_M)\left[(I_n\otimes 1_M)'(I_n\otimes 1_M)\right]^{-1}(I_n\otimes 1_M)'
= (I_n\otimes 1_M)\left[I_n\otimes 1_M'1_M\right]^{-1}(I_n\otimes 1_M')$$
$$= (I_n\otimes 1_M)\left[I_n\otimes (1_M'1_M)^{-1}\right](I_n\otimes 1_M') = I_n\otimes P$$

Therefore,

$$Q_D = I_{Mn} - P_D = I_{Mn} - (I_n\otimes P) = I_n\otimes Q$$

As a result,

$$\hat{\delta}_{FE} = (Z'Q_DZ)^{-1}Z'Q_Dy = \left(Z'(I_n\otimes Q)Z\right)^{-1}Z'(I_n\otimes Q)y
= \left(\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}\sum_{i=1}^n \tilde{Z}_i'\tilde{y}_i$$

which is exactly the result we got when we transformed the model by subtracting off group means.
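A small numerical check (simulated data, illustrative names) that LSDV, i.e. OLS on $[Z, D]$ with $D = I_n \otimes 1_M$, reproduces the within estimator:

    # Check: LSDV equals the within estimator (Frisch-Waugh)
    import numpy as np

    rng = np.random.default_rng(3)
    n, M = 50, 3
    alpha = rng.normal(size=n)
    x = alpha[:, None] + rng.normal(size=(n, M))
    y = 1.5 * x + alpha[:, None] + rng.normal(size=(n, M))

    xt = x - x.mean(axis=1, keepdims=True)
    yt = y - y.mean(axis=1, keepdims=True)
    b_within = (xt * yt).sum() / (xt ** 2).sum()

    # LSDV giant regression of y on [x, D], D = I_n kron 1_M
    X = np.column_stack([x.reshape(-1, 1), np.kron(np.eye(n), np.ones((M, 1)))])
    coef, *_ = np.linalg.lstsq(X, y.reshape(-1), rcond=None)
    print(np.isclose(b_within, coef[0]))       # True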
Recovering Estimates of α
With the LSDV approach, it is straightforward to deduce estimates of the fixed effects $\alpha_i$. By examining the normal equations for $\alpha$ in the giant regression, one can deduce that

$$\hat{\alpha}_i = \bar{y}_i - \bar{z}_i'\hat{\delta}_{FE}$$

$$\bar{y}_i = \frac{1}{M}\sum_{t=1}^M y_{it}, \quad \bar{z}_i = \frac{1}{M}\sum_{t=1}^M z_{it}$$
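Continuing the LSDV sketch above (so `coef`, `x`, and `y` are assumed to be in scope), the dummy coefficients are the $\hat{\alpha}_i$ and match $\bar{y}_i - \bar{x}_i\hat{\beta}$:

    alpha_hat = coef[1:]                       # one dummy coefficient per i
    print(np.allclose(alpha_hat,
                      y.mean(axis=1) - x.mean(axis=1) * coef[0]))  # True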
FE as a First Difference Estimator
Consider the error components model

$$y_{it} = z_{it}'\delta + \alpha_i + \eta_{it}, \quad E[\eta_i\eta_i'] = \sigma_\eta^2 I_M$$

Note: The spherical error assumption is made so that GLS estimation is possible.

Take first differences over $t$:

$$\Delta y_{i2} = y_{i2} - y_{i1} = \Delta z_{i2}'\delta + \Delta\eta_{i2}$$
$$\vdots$$
$$\Delta y_{iM} = y_{iM} - y_{i,M-1} = \Delta z_{iM}'\delta + \Delta\eta_{iM}$$

Notice that differencing removes the fixed effect.

In matrix form the model is

$$\underset{((M-1)\times 1)}{C'y_i} = C'Z_i\delta + C'\eta_i$$

or, in transformed notation,

$$\Delta y_i = \Delta Z_i\delta + \Delta\eta_i$$

where $\Delta y_i = C'y_i$, $\Delta Z_i = C'Z_i$, $\Delta\eta_i = C'\eta_i$, and

$$\underset{((M-1)\times M)}{C'} =
\begin{pmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 1 \end{pmatrix}$$

Note that

$$E[\Delta\eta_i\,\Delta\eta_i'] = E[C'\eta_i\eta_i'C] = C'E[\eta_i\eta_i']C = \sigma_\eta^2\,C'C$$
The transformed giant regression is

$$\begin{bmatrix} \Delta y_1 \\ \vdots \\ \Delta y_n \end{bmatrix} =
\begin{bmatrix} \Delta Z_1 \\ \vdots \\ \Delta Z_n \end{bmatrix}\delta +
\begin{bmatrix} \Delta\eta_1 \\ \vdots \\ \Delta\eta_n \end{bmatrix}$$

or

$$\underset{((M-1)n\times 1)}{\Delta y} = \underset{((M-1)n\times L)}{\Delta Z}\,\underset{(L\times 1)}{\delta} + \underset{((M-1)n\times 1)}{\Delta\eta}$$

Notice that

$$\underset{((M-1)n\times(M-1)n)}{E[\Delta\eta\,\Delta\eta']} =
\begin{pmatrix} \sigma_\eta^2\,C'C & 0 & \cdots & 0 \\ 0 & \sigma_\eta^2\,C'C & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_\eta^2\,C'C \end{pmatrix}
= \sigma_\eta^2\left[I_n\otimes(C'C)\right]$$
Pooled GLS on the transformed regression gives

$$\hat{\delta}_{GLS} = \left[\Delta Z'\left(I_n\otimes(C'C)^{-1}\right)\Delta Z\right]^{-1}\Delta Z'\left(I_n\otimes(C'C)^{-1}\right)\Delta y$$
$$= \left(\sum_{i=1}^n Z_i'C(C'C)^{-1}C'Z_i\right)^{-1}\sum_{i=1}^n Z_i'C(C'C)^{-1}C'y_i$$
$$= \left(\sum_{i=1}^n Z_i'P_C Z_i\right)^{-1}\sum_{i=1}^n Z_i'P_C y_i$$
$$= \left(\sum_{i=1}^n Z_i'QZ_i\right)^{-1}\sum_{i=1}^n Z_i'Qy_i$$
$$= \left(\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}\sum_{i=1}^n \tilde{Z}_i'\tilde{y}_i = \hat{\delta}_{FE}$$

Here

$$\Delta Z'\left(I_n\otimes(C'C)^{-1}\right)\Delta Z =
\begin{bmatrix} \Delta Z_1' & \cdots & \Delta Z_n' \end{bmatrix}
\begin{pmatrix} (C'C)^{-1} & & 0 \\ & \ddots & \\ 0 & & (C'C)^{-1} \end{pmatrix}
\begin{bmatrix} \Delta Z_1 \\ \vdots \\ \Delta Z_n \end{bmatrix}
= \sum_{i=1}^n \Delta Z_i'(C'C)^{-1}\Delta Z_i$$
$$= \sum_{i=1}^n (C'Z_i)'(C'C)^{-1}C'Z_i = \sum_{i=1}^n Z_i'C(C'C)^{-1}C'Z_i = \sum_{i=1}^n Z_i'P_C Z_i, \quad P_C = C(C'C)^{-1}C'$$
The result

$$P_C = C(C'C)^{-1}C' = Q = I_M - 1_M(1_M'1_M)^{-1}1_M'$$

follows from the fact that

$$C'1_M =
\begin{pmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} =
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

That is, $P_C = C(C'C)^{-1}C'$ is an idempotent matrix satisfying $P_C 1_M = 0$. It projects onto the space orthogonal to $1_M$, and this is exactly what $Q$ does.
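A quick numerical verification of this identity for $M = 4$:

    # Check: C (C'C)^{-1} C' equals Q = I_M - 1 1'/M, for M = 4
    import numpy as np

    M = 4
    Q = np.eye(M) - np.ones((M, M)) / M
    Ct = np.zeros((M - 1, M))                  # C': (M-1) x M difference matrix
    for t in range(M - 1):
        Ct[t, t], Ct[t, t + 1] = -1.0, 1.0
    C = Ct.T
    P_C = C @ np.linalg.inv(Ct @ C) @ Ct       # C (C'C)^{-1} C'
    print(np.allclose(P_C, Q))                 # True
    print(Ct @ np.ones(M))                     # C'1 = 0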
The FE estimator is a GMM estimator
Using the first-difference transformation of the error components model

$$\Delta y_i = \Delta Z_i\delta + \Delta\eta_i$$

the FE estimator can be viewed as a GMM estimator of the form

$$\hat{\delta}(\hat{W}) = \left(\bar{S}_{xz}'\hat{W}\bar{S}_{xz}\right)^{-1}\bar{S}_{xz}'\hat{W}\bar{s}_{xy}$$

where

$$\bar{S}_{xz} = \frac{1}{n}\sum_{i=1}^n \Delta Z_i\otimes x_i, \quad x_i = \text{union of } (z_{i1},\dots,z_{iM})$$

$$\bar{s}_{xy} = \frac{1}{n}\sum_{i=1}^n \Delta y_i\otimes x_i$$

$$\hat{W} = (C'C)^{-1}\otimes S_{xx}^{-1}, \quad S_{xx} = \frac{1}{n}\sum_{i=1}^n x_ix_i'$$
Asymptotic Distribution of the FE Estimator

Assumptions:

1. $y_i = Z_i\delta + \varepsilon_i$, $i = 1,\dots,n$

2. $\varepsilon_i = \alpha_i 1_M + \eta_i$ (error components)

3. $\{y_i, Z_i\}$ is i.i.d.

4. $E[z_{im}\alpha_i] \neq 0$ (endogeneity)

5. $E[z_{im}\eta_{ih}] = 0$ for $m, h = 1, 2, \dots, M$

It follows that

$$\sqrt{n}\left(\hat{\delta}_{FE} - \delta\right) = \left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n \tilde{Z}_i'\tilde{\eta}_i
\overset{d}{\to} E[\tilde{Z}_i'\tilde{Z}_i]^{-1}\,N\left(0,\; E[\tilde{Z}_i'\tilde{\eta}_i\tilde{\eta}_i'\tilde{Z}_i]\right) \equiv N\left(0,\; \mathrm{avar}(\hat{\delta}_{FE})\right)$$

where

$$\mathrm{avar}(\hat{\delta}_{FE}) = E[\tilde{Z}_i'\tilde{Z}_i]^{-1}E[\tilde{Z}_i'\tilde{\eta}_i\tilde{\eta}_i'\tilde{Z}_i]E[\tilde{Z}_i'\tilde{Z}_i]^{-1}$$

Therefore,

$$\hat{\delta}_{FE} \overset{A}{\sim} N\left(\delta,\; n^{-1}\widehat{\mathrm{avar}}(\hat{\delta}_{FE})\right)$$

Remark: The sandwich form of $\mathrm{avar}(\hat{\delta}_{FE})$ suggests that it is an inefficient GMM estimator.
General Case: Conditional Heteroskedasticity
$$\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i \overset{p}{\to} E[\tilde{Z}_i'\tilde{Z}_i]$$

$$\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\hat{\tilde{\eta}}_i\hat{\tilde{\eta}}_i'\tilde{Z}_i \overset{p}{\to} E[\tilde{Z}_i'\tilde{\eta}_i\tilde{\eta}_i'\tilde{Z}_i]$$

where $\hat{\tilde{\eta}}_i = \tilde{y}_i - \tilde{Z}_i\hat{\delta}_{FE}$. Then a consistent estimate for $\mathrm{avar}(\hat{\delta}_{FE})$ is

$$\widehat{\mathrm{avar}}(\hat{\delta}_{FE}) =
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\hat{\tilde{\eta}}_i\hat{\tilde{\eta}}_i'\tilde{Z}_i\right)
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}$$

Remarks

• The formula for $\widehat{\mathrm{avar}}(\hat{\delta}_{FE})$ is called the panel robust covariance estimate. It is not the same as the usual White correction for heteroskedasticity in a pooled OLS regression: the White correction does not account for serial correlation. It is also not the Newey-West correction for heteroskedasticity and autocorrelation.

• The formula for $\widehat{\mathrm{avar}}(\hat{\delta}_{FE})$ is also known as the cluster robust covariance estimate when the clustering variable is $i$ (each individual is a cluster).
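A sketch of the panel (cluster-by-$i$) robust calculation on simulated data with heteroskedastic, serially correlated $\eta_{it}$; all names are illustrative.

    # Sketch: FE estimator with panel (cluster-by-i) robust standard errors
    import numpy as np

    rng = np.random.default_rng(4)
    n, M, L = 200, 5, 2
    delta0 = np.array([1.0, -1.0])
    Z = rng.normal(size=(n, M, L))
    u = rng.normal(size=(n, M))
    eta = u + 0.5 * np.roll(u, 1, axis=1)      # serially correlated within i
    y = np.einsum('nml,l->nm', Z, delta0) + eta

    Q = np.eye(M) - np.ones((M, M)) / M        # demeaning matrix
    Zt = np.einsum('mh,nhl->nml', Q, Z)        # Zt_i = Q Z_i
    yt = y @ Q
    A = np.einsum('nml,nmk->lk', Zt, Zt) / n
    d_fe = np.linalg.solve(A, np.einsum('nml,nm->l', Zt, yt) / n)
    e = yt - np.einsum('nml,l->nm', Zt, d_fe)
    # middle term: (1/n) sum_i Zt_i' e_i e_i' Zt_i  (not inverted)
    B = np.einsum('nml,nm,nh,nhk->lk', Zt, e, e, Zt) / n
    avar = np.linalg.inv(A) @ B @ np.linalg.inv(A)
    print(d_fe, np.sqrt(np.diag(avar) / n))    # estimates, cluster-robust SEs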
Special Case: Conditional Homoskedasticity
Assume: $E[\varepsilon_i\varepsilon_i' \mid x_i] = \Sigma$, $x_i = \text{union}(z_{i1},\dots,z_{iM})$.

Then it can be shown that (Hayashi, page 332)

$$E[\tilde{Z}_i'\tilde{\eta}_i\tilde{\eta}_i'\tilde{Z}_i] = E\left[\tilde{Z}_i'E[\tilde{\eta}_i\tilde{\eta}_i']\tilde{Z}_i\right] = E[\tilde{Z}_i'V\tilde{Z}_i], \quad V = E[\tilde{\eta}_i\tilde{\eta}_i']$$

and

$$\hat{V} = \frac{1}{n}\sum_{i=1}^n \hat{\tilde{\eta}}_i\hat{\tilde{\eta}}_i' \overset{p}{\to} E[\tilde{\eta}_i\tilde{\eta}_i'], \quad \hat{\tilde{\eta}}_i = \tilde{y}_i - \tilde{Z}_i\hat{\delta}_{FE}$$

Then a consistent estimate for $\mathrm{avar}(\hat{\delta}_{FE})$ is

$$\widehat{\mathrm{avar}}(\hat{\delta}_{FE}) =
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\hat{V}\tilde{Z}_i\right)
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}$$
Special Case: When η is Spherical
$$E[\eta_i\eta_i'] = \sigma_\eta^2 I_M \;\Rightarrow\; E[\tilde{\eta}_i\tilde{\eta}_i'] = \sigma_\eta^2 Q$$

This implies that there is no serial correlation (correlation across time).

Then, since $Q\tilde{Z}_i = \tilde{Z}_i$,

$$E[\tilde{Z}_i'\tilde{\eta}_i\tilde{\eta}_i'\tilde{Z}_i] = E\left[\tilde{Z}_i'E[\tilde{\eta}_i\tilde{\eta}_i']\tilde{Z}_i\right] = \sigma_\eta^2 E[\tilde{Z}_i'Q\tilde{Z}_i] = \sigma_\eta^2 E[\tilde{Z}_i'\tilde{Z}_i]$$

and

$$\mathrm{avar}(\hat{\delta}_{FE}) = \sigma_\eta^2 E[\tilde{Z}_i'\tilde{Z}_i]^{-1}$$

Then a consistent estimate for $\mathrm{avar}(\hat{\delta}_{FE})$ is

$$\widehat{\mathrm{avar}}(\hat{\delta}_{FE}) = \hat{\sigma}_\eta^2\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}$$

$$\hat{\sigma}_\eta^2 = \frac{1}{nM - n - L}\sum_{i=1}^n \hat{\tilde{\eta}}_i'\hat{\tilde{\eta}}_i, \quad \hat{\tilde{\eta}}_i = \tilde{y}_i - \tilde{Z}_i\hat{\delta}_{FE}$$

Remark: In this case the FE estimator is an efficient GMM estimator!
Hausman Specification Test for FE vs. RE
The hypotheses to be tested are
$$H_0: E[z_{im}\alpha_i] = 0 \quad \text{(RE estimation)}$$
$$H_1: E[z_{im}\alpha_i] \neq 0 \quad \text{(FE estimation)}$$

Hausman and Taylor (1981, Econometrica) considered a test statistic based on

$$\hat{q} = \hat{\delta}_{FE} - \hat{\delta}_{RE}$$

in the context of maximum likelihood estimation. Newey (1985; see also Hayashi, Chapter 3, Analytic Exercise 9) studied the test in the context of GMM.

Intuition: Under $H_0$, both $\hat{\delta}_{FE}$ and $\hat{\delta}_{RE}$ are consistent (but $\hat{\delta}_{RE}$ is efficient), so that

$$\hat{q} \overset{p}{\to} 0$$

Under $H_1$, $\hat{\delta}_{RE}$ is not consistent but $\hat{\delta}_{FE}$ is consistent, so that

$$\hat{q} \overset{p}{\nrightarrow} 0$$

Therefore, consider the test statistic

$$H = n\,\hat{q}'\left(\widehat{\mathrm{avar}}(\hat{q})\right)^{-1}\hat{q}$$

If $H$ is big then reject $H_0$; otherwise do not reject $H_0$.

Q1: What is $\widehat{\mathrm{avar}}(\hat{q})$?

Q2: What is the asymptotic distribution of $H$?
To answer these questions, consider the general linear GMM model

$$y_i = z_i'\delta + \varepsilon_i, \quad i = 1,\dots,n$$
$$E[x_i\varepsilon_i] = 0, \quad E[x_ix_i'\varepsilon_i^2] = S$$

Let $\hat{\delta}_1(\hat{W}_1)$ denote an inefficient GMM estimator with weight matrix $\hat{W}_1 \overset{p}{\to} W_1 \neq S^{-1}$.

Let $\hat{\delta}_2(\hat{W}_2)$ denote another GMM estimator with weight matrix $\hat{W}_2 \overset{p}{\to} W_2 \neq W_1$.
The following results are based on Newey (1985); see also Hayashi, Chapter 3, Analytic Exercise 9.

Result 1:

$$\begin{bmatrix} \sqrt{n}\left(\hat{\delta}_1(\hat{W}_1) - \delta\right) \\ \sqrt{n}\left(\hat{\delta}_2(\hat{W}_2) - \delta\right) \end{bmatrix}
\overset{d}{\to} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\;
\begin{pmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{pmatrix}\right)$$

where

$$A_{11} = \mathrm{avar}(\hat{\delta}_1(\hat{W}_1)), \quad
A_{22} = \mathrm{avar}(\hat{\delta}_2(\hat{W}_2)), \quad
A_{12} = \mathrm{acov}(\hat{\delta}_1(\hat{W}_1), \hat{\delta}_2(\hat{W}_2))$$

Define

$$\hat{q} = \hat{\delta}_1(\hat{W}_1) - \hat{\delta}_2(\hat{W}_2)$$

Result 2:

$$\sqrt{n}\,\hat{q} \overset{d}{\to} N(0, \mathrm{avar}(\hat{q}))$$
$$\mathrm{avar}(\hat{q}) = A_{11} + A_{22} - A_{12} - A_{12}'
= \mathrm{avar}(\hat{\delta}_1(\hat{W}_1)) + \mathrm{avar}(\hat{\delta}_2(\hat{W}_2))
- \mathrm{acov}(\hat{\delta}_1(\hat{W}_1), \hat{\delta}_2(\hat{W}_2)) - \mathrm{acov}(\hat{\delta}_1(\hat{W}_1), \hat{\delta}_2(\hat{W}_2))'$$

Result 3: Set $\hat{W}_2 = \hat{S}^{-1}$ where $\hat{S} \overset{p}{\to} S$. Then

$$A_{12} = A_{12}' = A_{22} = \mathrm{avar}(\hat{\delta}_2(\hat{S}^{-1}))$$

and

$$\mathrm{avar}(\hat{q}) = A_{11} + A_{22} - 2A_{22} = A_{11} - A_{22}
= \mathrm{avar}(\hat{\delta}_1(\hat{W}_1)) - \mathrm{avar}(\hat{\delta}_2(\hat{S}^{-1}))$$

Remark: Because $\hat{\delta}_2(\hat{S}^{-1})$ is the efficient GMM estimator, $\mathrm{avar}(\hat{q}) \geq 0$.
Applying Newey's result to

$$\hat{q} = \hat{\delta}_{FE} - \hat{\delta}_{RE}$$

gives

$$\mathrm{avar}(\hat{q}) = \mathrm{avar}(\hat{\delta}_{FE}) - \mathrm{avar}(\hat{\delta}_{RE})$$

and

$$H = n\,\hat{q}'\left(\widehat{\mathrm{avar}}(\hat{q})\right)^{-1}\hat{q} \overset{A}{\sim} \chi^2(L)$$

Therefore, we reject $H_0: E[z_{im}\alpha_i] = 0$ at the $\alpha\times 100\%$ level if

$$H > \chi^2_{1-\alpha}(L)$$

Remark: Hayashi shows (Appendix to Chapter 5) that $\mathrm{avar}(\hat{q}) > 0$, so that $\mathrm{avar}(\hat{q})^{-1}$ is well defined.
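A self-contained sketch of the test in the spherical-error special case, simulated under $H_0$ so that the statistic should be small; names are illustrative, and `scipy` is used only for the $\chi^2$ p-value.

    # Sketch: Hausman test of FE vs. RE, simulated under H0 (RE is valid)
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    n, M, L = 500, 4, 2
    delta0 = np.array([1.0, -0.5])
    Z = rng.normal(size=(n, M, L))
    alpha = rng.normal(size=(n, 1))            # independent of Z, so H0 holds
    y = np.einsum('nml,l->nm', Z, delta0) + alpha + rng.normal(size=(n, M))

    # FE estimator; spherical case, avar = sigma2 * (n^-1 sum Zt'Zt)^-1
    Q = np.eye(M) - np.ones((M, M)) / M
    Zt = np.einsum('mh,nhl->nml', Q, Z)
    yt = y @ Q
    A_fe = np.einsum('nml,nmk->lk', Zt, Zt) / n
    d_fe = np.linalg.solve(A_fe, np.einsum('nml,nm->l', Zt, yt) / n)
    e_fe = yt - np.einsum('nml,l->nm', Zt, d_fe)
    s2 = (e_fe ** 2).sum() / (n * (M - 1) - L)
    avar_fe = s2 * np.linalg.inv(A_fe)

    # RE estimator; avar = (n^-1 sum Z' Sigma^-1 Z)^-1
    ZZ = np.einsum('nml,nmk->lk', Z, Z)
    d_pool = np.linalg.solve(ZZ, np.einsum('nml,nm->l', Z, y))
    e = y - np.einsum('nml,l->nm', Z, d_pool)
    Si = np.linalg.inv(e.T @ e / n)
    A_re = np.einsum('nml,mh,nhk->lk', Z, Si, Z) / n
    d_re = np.linalg.solve(A_re, np.einsum('nml,mh,nh->l', Z, Si, y) / n)
    avar_re = np.linalg.inv(A_re)

    # avar_fe - avar_re should be positive semi-definite in large samples
    q = d_fe - d_re
    H = n * q @ np.linalg.solve(avar_fe - avar_re, q)
    print(H, stats.chi2.sf(H, L))              # large p-value expected under H0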
Remarks
The form of $\mathrm{avar}(\hat{q}) = \mathrm{avar}(\hat{\delta}_{FE}) - \mathrm{avar}(\hat{\delta}_{RE})$ depends on assumptions about heteroskedasticity.

Typical case: Assume conditional homoskedasticity. Then

$$\sqrt{n}(\hat{\delta}_{RE} - \delta) \overset{d}{\to} N\left(0,\; E[Z_i'\Sigma^{-1}Z_i]^{-1}\right)$$

$$\sqrt{n}\left(\hat{\delta}_{FE} - \delta\right) \overset{d}{\to} N\left(0,\; E[\tilde{Z}_i'\tilde{Z}_i]^{-1}E\left[\tilde{Z}_i'E[\tilde{\eta}_i\tilde{\eta}_i']\tilde{Z}_i\right]E[\tilde{Z}_i'\tilde{Z}_i]^{-1}\right)$$

Then

$$\widehat{\mathrm{avar}}(\hat{\delta}_{FE}) =
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\hat{V}\tilde{Z}_i\right)
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}$$

$$\widehat{\mathrm{avar}}(\hat{\delta}_{RE}) =
\left(n^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}Z_i\right)^{-1}$$