Multiple Equation GMM with Common
Coefficients: Panel Data
Eric Zivot
Winter 2013
Multi-equation GMM with common coefficients
Example (panel wage equation)
$$LW69_i = \phi + \beta\,S69_i + \gamma\,IQ_i + \delta\,EXPR69_i + \varepsilon_{i1}$$
$$LW80_i = \phi + \beta\,S80_i + \gamma\,IQ_i + \delta\,EXPR80_i + \varepsilon_{i2}$$

Note: the coefficients are common across the two equations, and $E[\varepsilon_{i1}\varepsilon_{i2}] \neq 0$.
The general multi-equation model with common coefficients $\delta$ is

$$y_{im} = \underset{(1\times L)}{z_{im}'}\,\underset{(L\times 1)}{\delta} + \varepsilon_{im}, \quad m = 1,\dots,M; \; i = 1,\dots,n$$

$$E[z_{im}\varepsilon_{im}] \neq 0 \text{ for some elements of } z_{im} \text{ (endogenous regressors)}$$
The model for the $m$-th equation is

$$\underset{(n\times 1)}{y_m} = \underset{(n\times L)}{Z_m}\,\underset{(L\times 1)}{\delta} + \underset{(n\times 1)}{\varepsilon_m}, \quad m = 1,\dots,M$$
Note:

$$\delta_1 = \cdots = \delta_M = \delta \quad \text{(common coefficients)}$$
The stacked system (across equations) is
$$\left.\begin{array}{c} y_1 = Z_1\delta + \varepsilon_1 \\ \vdots \\ y_M = Z_M\delta + \varepsilon_M \end{array}\right\}
\;\Rightarrow\;
\begin{bmatrix} y_1 \\ \vdots \\ y_M \end{bmatrix} =
\begin{bmatrix} Z_1 \\ \vdots \\ Z_M \end{bmatrix}\delta +
\begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_M \end{bmatrix}$$
This is in the form of a giant regression (with common coefficients)
$$\underset{(Mn\times 1)}{y} = \underset{(Mn\times L)}{Z}\,\underset{(L\times 1)}{\delta} + \underset{(Mn\times 1)}{\varepsilon}$$
Assume each equation has instruments $x_{im}$ ($K_m \times 1$) satisfying

$$E[x_{im}\varepsilon_{im}] = 0$$

Very often the same instruments $x_i$ are used for each equation.
The moment conditions are
$$\underset{(K\times 1)}{g_i(\delta)} =
\begin{bmatrix} x_{i1}\varepsilon_{i1} \\ \vdots \\ x_{iM}\varepsilon_{iM} \end{bmatrix} =
\begin{bmatrix} x_{i1}(y_{i1} - z_{i1}'\delta) \\ \vdots \\ x_{iM}(y_{iM} - z_{iM}'\delta) \end{bmatrix}$$

There are a total of $K = \sum_{m=1}^M K_m$ moment conditions.
Identification
$$E[g_i(\delta)] = 0 \text{ for } \delta = \delta_0; \quad E[g_i(\delta)] \neq 0 \text{ for } \delta \neq \delta_0$$

Identification implies that

$$\begin{bmatrix} E[x_{i1}y_{i1}] \\ \vdots \\ E[x_{iM}y_{iM}] \end{bmatrix} -
\begin{bmatrix} E[x_{i1}z_{i1}'] \\ \vdots \\ E[x_{iM}z_{iM}'] \end{bmatrix}\delta_0 = 0$$

or

$$\underset{(K\times 1)}{\sigma_{xy}} - \underset{(K\times L)}{\Sigma_{xz}}\,\underset{(L\times 1)}{\delta_0} = \underset{(K\times 1)}{0}$$

For $\delta_0$ to be the unique solution requires the rank condition

$$\mathrm{rank}(\Sigma_{xz}) = L$$
Remarks:
• In the model with common coefficients, some equations may be individually unidentified but identified within the system.

• A sufficient condition for identification is that $E[x_{im}z_{im}']$ have full column rank $L$ for some equation $m$.
The moments $g_i(\delta)$ form an MDS with

$$E[g_i(\delta_0)g_i(\delta_0)'] = S =
\begin{pmatrix} S_{11} & \cdots & S_{1M} \\ \vdots & \ddots & \vdots \\ S_{M1} & \cdots & S_{MM} \end{pmatrix},
\quad S_{mh} = E[\varepsilon_{im}\varepsilon_{ih}\,x_{im}x_{ih}']$$
Efficient GMM estimation
$$\min_{\delta}\; J(\delta, \hat{S}^{-1}) = n\,g_n(\delta)'\hat{S}^{-1}g_n(\delta)$$

$$g_n(\delta) = s_{xy} - S_{xz}\delta$$

$$s_{xy} = \begin{bmatrix} s_{x_1y_1} \\ \vdots \\ s_{x_My_M} \end{bmatrix}, \quad
S_{xz} = \begin{bmatrix} S_{x_1z_1} \\ \vdots \\ S_{x_Mz_M} \end{bmatrix}, \quad
s_{x_my_m} = \frac{1}{n}\sum_{i=1}^n x_{im}y_{im}, \quad
S_{x_mz_m} = \frac{1}{n}\sum_{i=1}^n x_{im}z_{im}'$$

Straightforward algebra gives

$$\hat{\delta}(\hat{S}^{-1}) = (S_{xz}'\hat{S}^{-1}S_{xz})^{-1}S_{xz}'\hat{S}^{-1}s_{xy}$$
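To make the formula concrete, here is a minimal numerical sketch of the two-step efficient GMM estimator with common coefficients. The data are simulated, the regressors are taken to be exogenous so that $x_{im} = z_{im}$, and $\hat{S}$ is built from first-step residuals; all names are illustrative, not from the text.

    # Minimal sketch: two-step efficient GMM with common coefficients.
    # Simulated data; exogenous regressors, so the instruments are x_im = z_im.
    import numpy as np

    rng = np.random.default_rng(0)
    n, M, L = 500, 2, 3                      # individuals, equations, coefficients
    delta0 = np.array([1.0, -0.5, 0.25])
    z = rng.normal(size=(n, M, L))           # z[i, m, :] = regressors for eq. m
    eps = rng.multivariate_normal(np.zeros(M), [[1.0, 0.4], [0.4, 1.0]], size=n)
    y = np.einsum('nml,l->nm', z, delta0) + eps

    # Stacked sample moments: s_xy (K x 1) and S_xz (K x L), with K = M*L here
    s_xy = np.concatenate([(z[:, m, :] * y[:, [m]]).mean(axis=0) for m in range(M)])
    S_xz = np.vstack([(z[:, m, :, None] * z[:, m, None, :]).mean(axis=0)
                      for m in range(M)])

    # First step with W = I, then S_hat from first-step residuals
    d1 = np.linalg.solve(S_xz.T @ S_xz, S_xz.T @ s_xy)
    e = y - np.einsum('nml,l->nm', z, d1)
    g = np.concatenate([z[:, m, :] * e[:, [m]] for m in range(M)], axis=1)
    S_hat = g.T @ g / n                      # estimate of S = E[g_i g_i']
    A = S_xz.T @ np.linalg.solve(S_hat, S_xz)
    d_gmm = np.linalg.solve(A, S_xz.T @ np.linalg.solve(S_hat, s_xy))
    print(d_gmm)                             # close to delta0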
Special cases
1. 3SLS with common coefficients
conditional homoskedasticity
$$x_{i1} = x_{i2} = \cdots = x_{iM} = x_i$$
$$S = S_{3SLS} = \Sigma \otimes E[x_ix_i']$$

2. SUR with common coefficients (Random effects)

conditional homoskedasticity
$$x_{i1} = x_{i2} = \cdots = x_{iM} = x_i = \text{union}(z_{i1},\dots,z_{iM})$$
$$S = S_{SUR} = \Sigma \otimes E[x_ix_i']$$
Random Effects Estimator (Efficient GMM with common coefficients)
The SUR model with common coefficients is the giant regression
$$\underset{(Mn\times 1)}{y} = \underset{(Mn\times L)}{Z}\,\underset{(L\times 1)}{\delta} + \varepsilon, \quad
Z = \begin{bmatrix} Z_1 \\ \vdots \\ Z_M \end{bmatrix}, \quad
E[\varepsilon\varepsilon'] = \Sigma \otimes I_n$$

$$\Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1M} \\ \vdots & \ddots & \vdots \\ \sigma_{M1} & \cdots & \sigma_{MM} \end{pmatrix}$$

The FGLS estimator is the random effects (RE) estimator and has the form

$$\hat{\delta}_{RE} = \left(Z'(\hat{\Sigma}^{-1}\otimes I_n)Z\right)^{-1}Z'(\hat{\Sigma}^{-1}\otimes I_n)y$$

The elements of $\hat{\Sigma}$ are usually estimated using the pooled OLS estimator of the giant regression

$$\hat{\delta}_{pooled} = (Z'Z)^{-1}Z'y$$

and forming

$$\hat{\sigma}_{mh} = \frac{1}{n}(y_m - Z_m\hat{\delta}_{pooled})'(y_h - Z_h\hat{\delta}_{pooled})$$

Note: Hayashi uses a somewhat non-standard version of the RE estimator. He assumes that $\Sigma$ is unstructured. Most other treatments of the RE estimator assume a special structure for $\Sigma$ based on an error components representation for $\varepsilon_{im}$. In this case, the FGLS estimator uses a different estimator for $\Sigma$.
Simplifying the RE estimator
It turns out that the RE estimator may also be derived from an alternative representation of the giant regression.

In the alternative representation, the system is stacked by observations instead of by equations. We can do this because $\delta$ is the same across equations. Define

$$\underset{(M\times 1)}{y_i} = \begin{bmatrix} y_{i1} \\ \vdots \\ y_{iM} \end{bmatrix}, \quad
\underset{(M\times L)}{Z_i} = \begin{bmatrix} z_{i1}' \\ \vdots \\ z_{iM}' \end{bmatrix}, \quad
\underset{(M\times 1)}{\varepsilon_i} = \begin{bmatrix} \varepsilon_{i1} \\ \vdots \\ \varepsilon_{iM} \end{bmatrix}$$

Then the model for the $i$-th individual across all equations is

$$\underset{(M\times 1)}{y_i} = \underset{(M\times L)}{Z_i}\,\underset{(L\times 1)}{\delta} + \underset{(M\times 1)}{\varepsilon_i}, \quad i = 1,\dots,n$$

$$E[\varepsilon_i\varepsilon_i'] = \underset{(M\times M)}{\Sigma}, \quad E[\varepsilon_i\varepsilon_j'] = 0 \text{ for } i \neq j$$
Example: 2-period panel wage equation

$$y_i = \begin{pmatrix} LW69_i \\ LW80_i \end{pmatrix}, \quad
Z_i = \begin{pmatrix} 1 & S69_i & IQ_i & EXPR69_i \\ 1 & S80_i & IQ_i & EXPR80_i \end{pmatrix}$$

$$\delta = (\phi,\, \beta,\, \gamma,\, \delta)', \quad \varepsilon_i = (\varepsilon_{i1},\, \varepsilon_{i2})'$$

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}$$
The giant regression stacked by observations has the form

$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix} Z_1 \\ \vdots \\ Z_n \end{bmatrix}\delta +
\begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

or

$$\underset{(Mn\times 1)}{y} = \underset{(Mn\times L)}{Z}\,\underset{(L\times 1)}{\delta} + \underset{(Mn\times 1)}{\varepsilon}$$

$$\underset{(Mn\times Mn)}{E[\varepsilon\varepsilon']} =
\begin{pmatrix} E[\varepsilon_1\varepsilon_1'] & \cdots & E[\varepsilon_1\varepsilon_n'] \\ \vdots & \ddots & \vdots \\ E[\varepsilon_n\varepsilon_1'] & \cdots & E[\varepsilon_n\varepsilon_n'] \end{pmatrix}
= \begin{pmatrix} \Sigma & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \Sigma \end{pmatrix} = I_n \otimes \Sigma$$
The FGLS or RE estimator is

$$\hat{\delta}_{RE} = \left(Z'(I_n\otimes\hat{\Sigma}^{-1})Z\right)^{-1}Z'(I_n\otimes\hat{\Sigma}^{-1})y$$

Now

$$Z'(I_n\otimes\hat{\Sigma}^{-1})Z =
\begin{bmatrix} Z_1' & \cdots & Z_n' \end{bmatrix}
\begin{pmatrix} \hat{\Sigma}^{-1} & & 0 \\ & \ddots & \\ 0 & & \hat{\Sigma}^{-1} \end{pmatrix}
\begin{bmatrix} Z_1 \\ \vdots \\ Z_n \end{bmatrix}
= \sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}Z_i$$

and

$$Z'(I_n\otimes\hat{\Sigma}^{-1})y =
\begin{bmatrix} Z_1' & \cdots & Z_n' \end{bmatrix}
\begin{pmatrix} \hat{\Sigma}^{-1} & & 0 \\ & \ddots & \\ 0 & & \hat{\Sigma}^{-1} \end{pmatrix}
\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}
= \sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}y_i$$

Therefore,

$$\hat{\delta}_{RE} = \left(\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}Z_i\right)^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}y_i$$
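A minimal sketch of this stacked-by-observation RE (FGLS) computation, assuming simulated data and the two-step recipe above (pooled OLS, then $\hat{\Sigma}$ from residuals, then FGLS); names are illustrative.

    # Sketch: pooled OLS, Sigma_hat from residuals, then RE (FGLS)
    import numpy as np

    rng = np.random.default_rng(1)
    n, M, L = 400, 2, 3
    delta0 = np.array([0.5, 1.0, -1.0])
    Z = rng.normal(size=(n, M, L))
    eps = rng.multivariate_normal(np.zeros(M), [[1.0, 0.6], [0.6, 2.0]], size=n)
    y = np.einsum('nml,l->nm', Z, delta0) + eps

    # Step 1: pooled OLS on the giant regression
    ZZ = np.einsum('nml,nmk->lk', Z, Z)
    d_pool = np.linalg.solve(ZZ, np.einsum('nml,nm->l', Z, y))

    # Step 2: Sigma_hat from pooled-OLS residuals
    e = y - np.einsum('nml,l->nm', Z, d_pool)
    Sigma_hat = e.T @ e / n

    # Step 3: RE = FGLS = (sum Z_i'S^-1 Z_i)^-1 sum Z_i'S^-1 y_i
    Si = np.linalg.inv(Sigma_hat)
    A = np.einsum('nml,mh,nhk->lk', Z, Si, Z)
    b = np.einsum('nml,mh,nh->l', Z, Si, y)
    print(np.linalg.solve(A, b))               # close to delta0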
Asymptotic Properties of the RE estimator
Consistency of the RE estimator follows from the usual manipulations:

$$\hat{\delta}_{RE} - \delta = \left(n^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}Z_i\right)^{-1} n^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}\varepsilon_i$$

As long as

$$n^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}\varepsilon_i \overset{p}{\to} 0$$

the RE estimator will be consistent. This will occur, for example, if $Z_i$ is uncorrelated with $\varepsilon_i$ (no endogeneity).

Asymptotic normality also follows from the usual manipulations:

$$\sqrt{n}(\hat{\delta}_{RE} - \delta) = \left(n^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}Z_i\right)^{-1} n^{-1/2}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}\varepsilon_i$$

Under the SUR model assumptions

$$n^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}Z_i \overset{p}{\to} E[Z_i'\Sigma^{-1}Z_i]$$

$$n^{-1/2}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}\varepsilon_i \overset{d}{\to} N\left(0,\; E[Z_i'\Sigma^{-1}Z_i]\right)$$

so that

$$\sqrt{n}(\hat{\delta}_{RE} - \delta) \overset{d}{\to} N\left(0,\; E[Z_i'\Sigma^{-1}Z_i]^{-1}\right)$$
Pooled OLS again

$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix} Z_1 \\ \vdots \\ Z_n \end{bmatrix}\delta +
\begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

or

$$\underset{(Mn\times 1)}{y} = \underset{(Mn\times L)}{Z}\,\underset{(L\times 1)}{\delta} + \underset{(Mn\times 1)}{\varepsilon}$$

The pooled OLS estimator may also be computed using the giant regression stacked by observations:

$$\hat{\delta}_{pooled} = (Z'Z)^{-1}Z'y = \left(\sum_{i=1}^n Z_i'Z_i\right)^{-1}\sum_{i=1}^n Z_i'y_i$$

The asymptotic properties of the OLS estimator are straightforward to derive. For asymptotic normality,

$$\sqrt{n}(\hat{\delta}_{pooled} - \delta) = \left(n^{-1}\sum_{i=1}^n Z_i'Z_i\right)^{-1} n^{-1/2}\sum_{i=1}^n Z_i'\varepsilon_i$$

$$n^{-1}\sum_{i=1}^n Z_i'Z_i \overset{p}{\to} E[Z_i'Z_i]$$

$$n^{-1/2}\sum_{i=1}^n Z_i'\varepsilon_i \overset{d}{\to} N\left(0,\; E[Z_i'\varepsilon_i\varepsilon_i'Z_i]\right)$$

Note, by iterated expectations and conditional homoskedasticity,

$$E[Z_i'\varepsilon_i\varepsilon_i'Z_i] = E\left[E[Z_i'\varepsilon_i\varepsilon_i'Z_i \mid Z_i]\right] = E\left[Z_i'E[\varepsilon_i\varepsilon_i' \mid Z_i]Z_i\right] = E[Z_i'\Sigma Z_i]$$

Therefore,

$$\sqrt{n}(\hat{\delta}_{pooled} - \delta) \overset{d}{\to} N\left(0,\; E[Z_i'Z_i]^{-1}E[Z_i'\Sigma Z_i]E[Z_i'Z_i]^{-1}\right)$$

Equivalently,

$$\hat{\delta}_{pooled} \overset{A}{\sim} N\left(\delta,\; n^{-1}\widehat{\mathrm{avar}}(\hat{\delta}_{pooled})\right)$$

$$\widehat{\mathrm{avar}}(\hat{\delta}_{pooled}) =
\left(n^{-1}\sum_{i=1}^n Z_i'Z_i\right)^{-1}
\left(n^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}Z_i\right)
\left(n^{-1}\sum_{i=1}^n Z_i'Z_i\right)^{-1}$$
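A sketch of pooled OLS with its sandwich asymptotic variance, again on simulated SUR-type data; all names and the particular $\Sigma$ are illustrative.

    # Sketch: pooled OLS with sandwich avar under conditional homoskedasticity
    import numpy as np

    rng = np.random.default_rng(6)
    n, M, L = 400, 2, 2
    delta0 = np.array([1.0, 2.0])
    Z = rng.normal(size=(n, M, L))
    eps = rng.multivariate_normal(np.zeros(M), [[1.0, 0.7], [0.7, 1.5]], size=n)
    y = np.einsum('nml,l->nm', Z, delta0) + eps

    A = np.einsum('nml,nmk->lk', Z, Z) / n                 # (1/n) sum Z_i'Z_i
    d_pool = np.linalg.solve(A, np.einsum('nml,nm->l', Z, y) / n)
    e = y - np.einsum('nml,l->nm', Z, d_pool)
    Sigma_hat = e.T @ e / n
    B = np.einsum('nml,mh,nhk->lk', Z, Sigma_hat, Z) / n   # (1/n) sum Z_i'S Z_i
    avar = np.linalg.inv(A) @ B @ np.linalg.inv(A)
    print(d_pool, np.sqrt(np.diag(avar) / n))              # estimates and SEs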
Question: When is pooled OLS efficient?
Introduction to Panel Data
$$y_{im} = z_{im}'\delta + \varepsilon_{im}$$
$$i = 1,\dots,n \;\text{(individuals)}; \quad m = 1,\dots,M \;\text{(time periods)}$$

$$\underset{(M\times 1)}{y_i} = \underset{(M\times L)}{Z_i}\,\underset{(L\times 1)}{\delta} + \varepsilon_i, \quad \{y_i, Z_i\} \text{ is i.i.d. across } i$$

Main question: Is $z_{im}$ uncorrelated with $\varepsilon_{im}$?

1. If yes, then we have a SUR type model with common coefficients. Under conditional homoskedasticity, the appropriate estimator is the RE estimator.

2. If no, then we have a multi-equation system with endogenous regressors. We need to use an estimation procedure to deal with the endogeneity. In the panel set-up, under certain assumptions, we can deal with the endogeneity without using instruments, using the so-called fixed effects (FE) estimator.
Error Components Assumption
$$\varepsilon_{im} = \alpha_i + \eta_{im}, \quad \alpha_i = \text{unobserved fixed effect}$$

$$E[z_{im}\eta_{ih}] = 0 \text{ for all } m, h$$

$$E[\varepsilon_i\varepsilon_i' \mid x_i] = \Sigma, \quad x_i = \text{union}(z_{i1},\dots,z_{iM})$$

The fixed effect component $\alpha_i$ (which is actually an unobserved random variable) captures unobserved heterogeneity across individuals that is fixed over time.

With the error components assumption, the RE and FE models are defined as follows:

RE model: $E[z_{im}\alpha_i] = 0$

FE model: $E[z_{im}\alpha_i] \neq 0$

Note: Hayashi's treatment of the error components model is slightly non-standard.
Example: Panel wage equation
$$LW69_i = \phi + \beta\,S69_i + \gamma\,IQ_i + \delta\,EXPR69_i + \alpha_i + \eta_{i,69}$$
$$LW80_i = \phi + \beta\,S80_i + \gamma\,IQ_i + \delta\,EXPR80_i + \alpha_i + \eta_{i,80}$$

$$E[S_{it}\,\alpha_i] \neq 0$$

Here $\alpha_i$ captures unobserved ability that is correlated with schooling $S_{it}$. It is assumed that ability does not vary over time.

Note: The existence of $\alpha_i$ guarantees across-equation error correlation:

$$E[\varepsilon_{i,69}\varepsilon_{i,80}] = E\left[(\alpha_i + \eta_{i,69})(\alpha_i + \eta_{i,80})\right]
= E[\alpha_i^2] + E[\alpha_i\eta_{i,80}] + E[\alpha_i\eta_{i,69}] + E[\eta_{i,69}\eta_{i,80}] \neq 0$$

Remark: With panel data, the endogeneity due to the unobserved fixed effect (i.e., $E[z_{im}\alpha_i] \neq 0$) can be eliminated without the use of instruments. To see this, consider the difference in log-wages over time:

$$LW80_i - LW69_i = (\phi - \phi) + \beta(S80_i - S69_i) + \gamma(IQ_i - IQ_i) + \delta(EXPR80_i - EXPR69_i) + (\alpha_i - \alpha_i) + (\eta_{i,80} - \eta_{i,69})$$
$$= \beta(S80_i - S69_i) + \delta(EXPR80_i - EXPR69_i) + (\eta_{i,80} - \eta_{i,69})$$

Hence, we can consistently estimate $\beta$ and $\delta$ by using the first differenced data!
Fixed Effects Estimation
Key insight: With panel data, $\delta$ can be consistently estimated without using instruments.
There are 3 equivalent approaches
1. Within group estimator
2. Least squares dummy variable estimator
3. First difference estimator
Within group estimator
To illustrate the within group estimator, consider the simplified panel regression with a single regressor:

$$y_{it} = \beta x_{it} + \alpha_i + \eta_{it}, \quad E[x_{it}\alpha_i] \neq 0, \quad E[x_{it}\eta_{is}] = 0$$

Trick to remove the fixed effect $\alpha_i$: First, for each $i$ average over time:

$$\bar{y}_i = \beta\bar{x}_i + \alpha_i + \bar{\eta}_i$$

$$\bar{y}_i = \frac{1}{M}\sum_{t=1}^M y_{it}, \quad \bar{x}_i = \frac{1}{M}\sum_{t=1}^M x_{it}, \quad \bar{\eta}_i = \frac{1}{M}\sum_{t=1}^M \eta_{it}$$

Second, form the transformed regression

$$y_{it} - \bar{y}_i = \beta(x_{it} - \bar{x}_i) + (\eta_{it} - \bar{\eta}_i)$$

or

$$\tilde{y}_{it} = \beta\tilde{x}_{it} + \tilde{\eta}_{it}$$

The FE estimator is pooled OLS on the transformed regression (stacked by observation):

$$\hat{\beta}_{FE} = (\tilde{x}'\tilde{x})^{-1}\tilde{x}'\tilde{y} = \left(\sum_{i=1}^n \tilde{x}_i'\tilde{x}_i\right)^{-1}\sum_{i=1}^n \tilde{x}_i'\tilde{y}_i$$
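A minimal sketch of this within-group estimator, with a simulated fixed effect that is correlated with the regressor (so pooled OLS is inconsistent but the within estimator is not); names are illustrative.

    # Sketch: within-group estimator vs. (biased) pooled OLS on simulated data
    import numpy as np

    rng = np.random.default_rng(2)
    n, M = 300, 4
    beta0 = 2.0
    alpha = rng.normal(size=(n, 1))            # fixed effect
    x = 0.8 * alpha + rng.normal(size=(n, M))  # x correlated with alpha
    y = beta0 * x + alpha + rng.normal(size=(n, M))

    x_t = x - x.mean(axis=1, keepdims=True)    # x_it - xbar_i
    y_t = y - y.mean(axis=1, keepdims=True)    # y_it - ybar_i
    beta_fe = (x_t * y_t).sum() / (x_t ** 2).sum()
    beta_pool = (x * y).sum() / (x ** 2).sum() # inconsistent: ignores alpha
    print(beta_fe, beta_pool)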
Remarks
1. If $z_{it}$ does not vary with $t$ (e.g. $z_{it} = z_i$), then $\tilde{z}_{it} = 0$ and we cannot estimate $\delta$.

2. Must be careful computing the degrees of freedom for the FE estimator. There are $nM$ total observations and $L$ parameters in $\delta$, so it appears that there are $nM - L$ degrees of freedom. However, you lose 1 degree of freedom for each fixed effect eliminated. So the actual degrees of freedom are $nM - n - L = n(M-1) - L$.
Hayashi Notation for Within Estimator
Consider the general model (assume all variables vary with $i$ and $t$)

$$y_{it} = z_{it}'\delta + \alpha_i + \eta_{it}$$

Stack the observations over $t$, giving

$$\underset{(M\times 1)}{y_i} = \underset{(M\times L)}{Z_i}\,\underset{(L\times 1)}{\delta} + \underset{(1\times 1)}{\alpha_i}\,\underset{(M\times 1)}{1_M} + \underset{(M\times 1)}{\eta_i}$$

Define

$$\underset{(M\times M)}{Q} = I_M - 1_M(1_M'1_M)^{-1}1_M' = I_M - P$$
$$P = 1_M(1_M'1_M)^{-1}1_M' = M^{-1}1_M1_M'$$

Note

$$P1_M = 1_M, \quad Q1_M = 0$$
$$Py_i = 1_M(1_M'1_M)^{-1}1_M'y_i = 1_M\bar{y}_i$$
$$Qy_i = (I_M - P)y_i = y_i - 1_M\bar{y}_i = \tilde{y}_i$$

The transformed error components model is then

$$Qy_i = QZ_i\delta + Q1_M\alpha_i + Q\eta_i \;\Rightarrow\; \tilde{y}_i = \tilde{Z}_i\delta + \tilde{\eta}_i$$
The giant regression (stacked by observation) is

$$\begin{bmatrix} \tilde{y}_1 \\ \vdots \\ \tilde{y}_n \end{bmatrix} =
\begin{bmatrix} \tilde{Z}_1 \\ \vdots \\ \tilde{Z}_n \end{bmatrix}\delta +
\begin{bmatrix} \tilde{\eta}_1 \\ \vdots \\ \tilde{\eta}_n \end{bmatrix}$$

or

$$\underset{(Mn\times 1)}{\tilde{y}} = \underset{(Mn\times L)}{\tilde{Z}}\,\underset{(L\times 1)}{\delta} + \underset{(Mn\times 1)}{\tilde{\eta}}$$

Note: Unless $E[\tilde{\eta}_i\tilde{\eta}_i'] = \sigma_\eta^2 I_M$, the pooled OLS estimator $\hat{\delta}_{FE}$ is not efficient.
The FE estimator is again pooled OLS on the transformed system:

$$\hat{\delta}_{FE} = \left(\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}\sum_{i=1}^n \tilde{Z}_i'\tilde{y}_i
= \left(\sum_{i=1}^n (QZ_i)'QZ_i\right)^{-1}\sum_{i=1}^n (QZ_i)'Qy_i
= \left(\sum_{i=1}^n Z_i'QZ_i\right)^{-1}\sum_{i=1}^n Z_i'Qy_i$$

since $Q$ is symmetric and idempotent.
Least Squares Dummy Variable Model

Consider the general model

$$y_{it} = z_{it}'\delta + \alpha_i + \eta_{it}$$

Stack the observations over $t$, giving

$$\underset{(M\times 1)}{y_i} = Z_i\delta + \alpha_i 1_M + \eta_i$$

Now create the giant regression

$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix} Z_1 \\ \vdots \\ Z_n \end{bmatrix}\delta +
\begin{bmatrix} 1_M & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & 1_M \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix} +
\begin{bmatrix} \eta_1 \\ \vdots \\ \eta_n \end{bmatrix}$$

or

$$\underset{(Mn\times 1)}{y} = \underset{(Mn\times L)}{Z}\,\underset{(L\times 1)}{\delta} + \underset{(Mn\times n)}{D}\,\underset{(n\times 1)}{\alpha} + \underset{(Mn\times 1)}{\eta} = Z\delta + (I_n\otimes 1_M)\alpha + \eta$$

$$D = I_n \otimes 1_M$$
Aside: Partitioned Regression
Consider the partitioned regression equation

$$\underset{(n\times 1)}{y} = \underset{(n\times k_1)}{X_1}\,\underset{(k_1\times 1)}{\beta_1} + \underset{(n\times k_2)}{X_2}\,\underset{(k_2\times 1)}{\beta_2} + \varepsilon$$

The LS estimators for $\beta_1$ and $\beta_2$ can be expressed as

$$\hat{\beta}_1 = (X_1'Q_2X_1)^{-1}X_1'Q_2y, \quad Q_2 = I - P_{X_2}$$
$$\hat{\beta}_2 = (X_2'Q_1X_2)^{-1}X_2'Q_1y, \quad Q_1 = I - P_{X_1}$$

where

$$P_{X_1} = X_1(X_1'X_1)^{-1}X_1', \quad P_{X_2} = X_2(X_2'X_2)^{-1}X_2'$$
Result: The FE estimator is the OLS estimator of $\delta$ in the giant regression:

$$\hat{\delta}_{FE} = (Z'Q_D Z)^{-1}Z'Q_D y$$
$$Q_D = I_{Mn} - P_D, \quad P_D = D(D'D)^{-1}D', \quad D = I_n\otimes 1_M$$

Now,

$$P_D = (I_n\otimes 1_M)\left[(I_n\otimes 1_M)'(I_n\otimes 1_M)\right]^{-1}(I_n\otimes 1_M)'
= (I_n\otimes 1_M)\left[I_n\otimes 1_M'1_M\right]^{-1}(I_n\otimes 1_M')$$
$$= (I_n\otimes 1_M)\left[I_n\otimes (1_M'1_M)^{-1}\right](I_n\otimes 1_M') = I_n\otimes P$$

Therefore,

$$Q_D = I_{Mn} - P_D = I_{Mn} - (I_n\otimes P) = I_n\otimes Q$$

As a result,

$$\hat{\delta}_{FE} = (Z'Q_DZ)^{-1}Z'Q_Dy = \left(Z'(I_n\otimes Q)Z\right)^{-1}Z'(I_n\otimes Q)y
= \left(\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}\sum_{i=1}^n \tilde{Z}_i'\tilde{y}_i$$

which is exactly the result we got when we transformed the model by subtracting off group means.
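A small numerical check (simulated data, illustrative names) that LSDV, i.e. OLS on $[Z, D]$ with $D = I_n \otimes 1_M$, reproduces the within estimator:

    # Check: LSDV equals the within estimator (Frisch-Waugh)
    import numpy as np

    rng = np.random.default_rng(3)
    n, M = 50, 3
    alpha = rng.normal(size=n)
    x = alpha[:, None] + rng.normal(size=(n, M))
    y = 1.5 * x + alpha[:, None] + rng.normal(size=(n, M))

    xt = x - x.mean(axis=1, keepdims=True)
    yt = y - y.mean(axis=1, keepdims=True)
    b_within = (xt * yt).sum() / (xt ** 2).sum()

    # LSDV giant regression of y on [x, D], D = I_n kron 1_M
    X = np.column_stack([x.reshape(-1, 1), np.kron(np.eye(n), np.ones((M, 1)))])
    coef, *_ = np.linalg.lstsq(X, y.reshape(-1), rcond=None)
    print(np.isclose(b_within, coef[0]))       # True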
Recovering Estimates of α
With the LSDV approach, it is straightforward to deduce estimates of the fixed effects $\alpha_i$. By examining the normal equations for $\alpha$ in the giant regression, one can deduce that

$$\hat{\alpha}_i = \bar{y}_i - \bar{z}_i'\hat{\delta}_{FE}$$

$$\bar{y}_i = \frac{1}{M}\sum_{t=1}^M y_{it}, \quad \bar{z}_i = \frac{1}{M}\sum_{t=1}^M z_{it}$$
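Continuing the LSDV sketch above (so `coef`, `x`, and `y` are assumed to be in scope), the dummy coefficients are the $\hat{\alpha}_i$ and match $\bar{y}_i - \bar{x}_i\hat{\beta}$:

    alpha_hat = coef[1:]                       # one dummy coefficient per i
    print(np.allclose(alpha_hat,
                      y.mean(axis=1) - x.mean(axis=1) * coef[0]))  # True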
FE as a First Difference Estimator
Consider the error components model

$$y_{it} = z_{it}'\delta + \alpha_i + \eta_{it}, \quad E[\eta_i\eta_i'] = \sigma_\eta^2 I_M$$

Note: The spherical error assumption is made so that GLS estimation is possible.

Take first differences over $t$:

$$\Delta y_{i2} = y_{i2} - y_{i1} = \Delta z_{i2}'\delta + \Delta\eta_{i2}$$
$$\vdots$$
$$\Delta y_{iM} = y_{iM} - y_{i,M-1} = \Delta z_{iM}'\delta + \Delta\eta_{iM}$$

Notice that differencing removes the fixed effect.

In matrix form the model is

$$\underset{((M-1)\times 1)}{C'y_i} = C'Z_i\delta + C'\eta_i$$

or, in transformed notation,

$$\Delta y_i = \Delta Z_i\delta + \Delta\eta_i$$

where $\Delta y_i = C'y_i$, $\Delta Z_i = C'Z_i$, $\Delta\eta_i = C'\eta_i$, and

$$\underset{((M-1)\times M)}{C'} =
\begin{pmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 1 \end{pmatrix}$$

Note that

$$E[\Delta\eta_i\,\Delta\eta_i'] = E[C'\eta_i\eta_i'C] = C'E[\eta_i\eta_i']C = \sigma_\eta^2\,C'C$$
The transformed giant regression is

$$\begin{bmatrix} \Delta y_1 \\ \vdots \\ \Delta y_n \end{bmatrix} =
\begin{bmatrix} \Delta Z_1 \\ \vdots \\ \Delta Z_n \end{bmatrix}\delta +
\begin{bmatrix} \Delta\eta_1 \\ \vdots \\ \Delta\eta_n \end{bmatrix}$$

or

$$\underset{((M-1)n\times 1)}{\Delta y} = \underset{((M-1)n\times L)}{\Delta Z}\,\underset{(L\times 1)}{\delta} + \underset{((M-1)n\times 1)}{\Delta\eta}$$

Notice that

$$\underset{((M-1)n\times(M-1)n)}{E[\Delta\eta\,\Delta\eta']} =
\begin{pmatrix} \sigma_\eta^2\,C'C & 0 & \cdots & 0 \\ 0 & \sigma_\eta^2\,C'C & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_\eta^2\,C'C \end{pmatrix}
= \sigma_\eta^2\left[I_n\otimes(C'C)\right]$$
Pooled GLS on the transformed regression gives

$$\hat{\delta}_{GLS} = \left[\Delta Z'\left(I_n\otimes(C'C)^{-1}\right)\Delta Z\right]^{-1}\Delta Z'\left(I_n\otimes(C'C)^{-1}\right)\Delta y$$
$$= \left(\sum_{i=1}^n Z_i'C(C'C)^{-1}C'Z_i\right)^{-1}\sum_{i=1}^n Z_i'C(C'C)^{-1}C'y_i$$
$$= \left(\sum_{i=1}^n Z_i'P_C Z_i\right)^{-1}\sum_{i=1}^n Z_i'P_C y_i$$
$$= \left(\sum_{i=1}^n Z_i'QZ_i\right)^{-1}\sum_{i=1}^n Z_i'Qy_i$$
$$= \left(\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}\sum_{i=1}^n \tilde{Z}_i'\tilde{y}_i = \hat{\delta}_{FE}$$

Here

$$\Delta Z'\left(I_n\otimes(C'C)^{-1}\right)\Delta Z =
\begin{bmatrix} \Delta Z_1' & \cdots & \Delta Z_n' \end{bmatrix}
\begin{pmatrix} (C'C)^{-1} & & 0 \\ & \ddots & \\ 0 & & (C'C)^{-1} \end{pmatrix}
\begin{bmatrix} \Delta Z_1 \\ \vdots \\ \Delta Z_n \end{bmatrix}
= \sum_{i=1}^n \Delta Z_i'(C'C)^{-1}\Delta Z_i$$
$$= \sum_{i=1}^n (C'Z_i)'(C'C)^{-1}C'Z_i = \sum_{i=1}^n Z_i'C(C'C)^{-1}C'Z_i = \sum_{i=1}^n Z_i'P_C Z_i, \quad P_C = C(C'C)^{-1}C'$$
The result

$$P_C = C(C'C)^{-1}C' = Q = I_M - 1_M(1_M'1_M)^{-1}1_M'$$

follows from the fact that

$$C'1_M =
\begin{pmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} =
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

That is, $P_C = C(C'C)^{-1}C'$ is an idempotent matrix satisfying $P_C 1_M = 0$. It projects onto the space orthogonal to $1_M$, and this is exactly what $Q$ does.
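A quick numerical verification of this identity for $M = 4$:

    # Check: C (C'C)^{-1} C' equals Q = I_M - 1 1'/M, for M = 4
    import numpy as np

    M = 4
    Q = np.eye(M) - np.ones((M, M)) / M
    Ct = np.zeros((M - 1, M))                  # C': (M-1) x M difference matrix
    for t in range(M - 1):
        Ct[t, t], Ct[t, t + 1] = -1.0, 1.0
    C = Ct.T
    P_C = C @ np.linalg.inv(Ct @ C) @ Ct       # C (C'C)^{-1} C'
    print(np.allclose(P_C, Q))                 # True
    print(Ct @ np.ones(M))                     # C'1 = 0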
The FE estimator is a GMM estimator
Using the first-difference transformation of the error components model

$$\Delta y_i = \Delta Z_i\delta + \Delta\eta_i$$

the FE estimator can be viewed as a GMM estimator of the form

$$\hat{\delta}(\hat{W}) = \left(\bar{S}_{xz}'\hat{W}\bar{S}_{xz}\right)^{-1}\bar{S}_{xz}'\hat{W}\bar{s}_{xy}$$

where

$$\bar{S}_{xz} = \frac{1}{n}\sum_{i=1}^n \Delta Z_i\otimes x_i, \quad x_i = \text{union of } (z_{i1},\dots,z_{iM})$$

$$\bar{s}_{xy} = \frac{1}{n}\sum_{i=1}^n \Delta y_i\otimes x_i$$

$$\hat{W} = (C'C)^{-1}\otimes S_{xx}^{-1}, \quad S_{xx} = \frac{1}{n}\sum_{i=1}^n x_ix_i'$$
Asymptotic Distribution of the FE Estimator

Assumptions:

1. $y_i = Z_i\delta + \varepsilon_i$, $i = 1,\dots,n$

2. $\varepsilon_i = \alpha_i 1_M + \eta_i$ (error components)

3. $\{y_i, Z_i\}$ is i.i.d.

4. $E[z_{im}\alpha_i] \neq 0$ (endogeneity)

5. $E[z_{im}\eta_{ih}] = 0$ for $m, h = 1, 2, \dots, M$

It follows that

$$\sqrt{n}\left(\hat{\delta}_{FE} - \delta\right) = \left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n \tilde{Z}_i'\tilde{\eta}_i
\overset{d}{\to} E[\tilde{Z}_i'\tilde{Z}_i]^{-1}\,N\left(0,\; E[\tilde{Z}_i'\tilde{\eta}_i\tilde{\eta}_i'\tilde{Z}_i]\right) \equiv N\left(0,\; \mathrm{avar}(\hat{\delta}_{FE})\right)$$

where

$$\mathrm{avar}(\hat{\delta}_{FE}) = E[\tilde{Z}_i'\tilde{Z}_i]^{-1}E[\tilde{Z}_i'\tilde{\eta}_i\tilde{\eta}_i'\tilde{Z}_i]E[\tilde{Z}_i'\tilde{Z}_i]^{-1}$$

Therefore,

$$\hat{\delta}_{FE} \overset{A}{\sim} N\left(\delta,\; n^{-1}\widehat{\mathrm{avar}}(\hat{\delta}_{FE})\right)$$

Remark: The sandwich form of $\mathrm{avar}(\hat{\delta}_{FE})$ suggests that it is an inefficient GMM estimator.
General Case: Conditional Heteroskedasticity
$$\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i \overset{p}{\to} E[\tilde{Z}_i'\tilde{Z}_i]$$

$$\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\hat{\tilde{\eta}}_i\hat{\tilde{\eta}}_i'\tilde{Z}_i \overset{p}{\to} E[\tilde{Z}_i'\tilde{\eta}_i\tilde{\eta}_i'\tilde{Z}_i]$$

where $\hat{\tilde{\eta}}_i = \tilde{y}_i - \tilde{Z}_i\hat{\delta}_{FE}$. Then a consistent estimate for $\mathrm{avar}(\hat{\delta}_{FE})$ is

$$\widehat{\mathrm{avar}}(\hat{\delta}_{FE}) =
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\hat{\tilde{\eta}}_i\hat{\tilde{\eta}}_i'\tilde{Z}_i\right)
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}$$

Remarks

• The formula for $\widehat{\mathrm{avar}}(\hat{\delta}_{FE})$ is called the panel robust covariance estimate. It is not the same as the usual White correction for heteroskedasticity in a pooled OLS regression: the White correction does not account for serial correlation. It is also not the Newey-West correction for heteroskedasticity and autocorrelation.

• The formula for $\widehat{\mathrm{avar}}(\hat{\delta}_{FE})$ is also known as the cluster robust covariance estimate when the clustering variable is $i$ (each individual is a cluster).
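A sketch of the panel (cluster-by-$i$) robust calculation on simulated data with heteroskedastic, serially correlated $\eta_{it}$; all names are illustrative.

    # Sketch: FE estimator with panel (cluster-by-i) robust standard errors
    import numpy as np

    rng = np.random.default_rng(4)
    n, M, L = 200, 5, 2
    delta0 = np.array([1.0, -1.0])
    Z = rng.normal(size=(n, M, L))
    u = rng.normal(size=(n, M))
    eta = u + 0.5 * np.roll(u, 1, axis=1)      # serially correlated within i
    y = np.einsum('nml,l->nm', Z, delta0) + eta

    Q = np.eye(M) - np.ones((M, M)) / M        # demeaning matrix
    Zt = np.einsum('mh,nhl->nml', Q, Z)        # Zt_i = Q Z_i
    yt = y @ Q
    A = np.einsum('nml,nmk->lk', Zt, Zt) / n
    d_fe = np.linalg.solve(A, np.einsum('nml,nm->l', Zt, yt) / n)
    e = yt - np.einsum('nml,l->nm', Zt, d_fe)
    # middle term: (1/n) sum_i Zt_i' e_i e_i' Zt_i  (not inverted)
    B = np.einsum('nml,nm,nh,nhk->lk', Zt, e, e, Zt) / n
    avar = np.linalg.inv(A) @ B @ np.linalg.inv(A)
    print(d_fe, np.sqrt(np.diag(avar) / n))    # estimates, cluster-robust SEs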
Special Case: Conditional Homoskedasticity
Assume: $E[\varepsilon_i\varepsilon_i' \mid x_i] = \Sigma$, $x_i = \text{union}(z_{i1},\dots,z_{iM})$.

Then it can be shown that (Hayashi, page 332)

$$E[\tilde{Z}_i'\tilde{\eta}_i\tilde{\eta}_i'\tilde{Z}_i] = E\left[\tilde{Z}_i'E[\tilde{\eta}_i\tilde{\eta}_i']\tilde{Z}_i\right] = E[\tilde{Z}_i'V\tilde{Z}_i], \quad V = E[\tilde{\eta}_i\tilde{\eta}_i']$$

and

$$\hat{V} = \frac{1}{n}\sum_{i=1}^n \hat{\tilde{\eta}}_i\hat{\tilde{\eta}}_i' \overset{p}{\to} E[\tilde{\eta}_i\tilde{\eta}_i'], \quad \hat{\tilde{\eta}}_i = \tilde{y}_i - \tilde{Z}_i\hat{\delta}_{FE}$$

Then a consistent estimate for $\mathrm{avar}(\hat{\delta}_{FE})$ is

$$\widehat{\mathrm{avar}}(\hat{\delta}_{FE}) =
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\hat{V}\tilde{Z}_i\right)
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}$$
Special Case: When η is Spherical
$$E[\eta_i\eta_i'] = \sigma_\eta^2 I_M \;\Rightarrow\; E[\tilde{\eta}_i\tilde{\eta}_i'] = \sigma_\eta^2 Q$$

This implies that there is no serial correlation (correlation across time).

Then, since $Q\tilde{Z}_i = \tilde{Z}_i$,

$$E[\tilde{Z}_i'\tilde{\eta}_i\tilde{\eta}_i'\tilde{Z}_i] = E\left[\tilde{Z}_i'E[\tilde{\eta}_i\tilde{\eta}_i']\tilde{Z}_i\right] = \sigma_\eta^2 E[\tilde{Z}_i'Q\tilde{Z}_i] = \sigma_\eta^2 E[\tilde{Z}_i'\tilde{Z}_i]$$

and

$$\mathrm{avar}(\hat{\delta}_{FE}) = \sigma_\eta^2 E[\tilde{Z}_i'\tilde{Z}_i]^{-1}$$

Then a consistent estimate for $\mathrm{avar}(\hat{\delta}_{FE})$ is

$$\widehat{\mathrm{avar}}(\hat{\delta}_{FE}) = \hat{\sigma}_\eta^2\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}$$

$$\hat{\sigma}_\eta^2 = \frac{1}{nM - n - L}\sum_{i=1}^n \hat{\tilde{\eta}}_i'\hat{\tilde{\eta}}_i, \quad \hat{\tilde{\eta}}_i = \tilde{y}_i - \tilde{Z}_i\hat{\delta}_{FE}$$

Remark: In this case the FE estimator is an efficient GMM estimator!
Hausman Specification Test for FE vs. RE
The hypotheses to be tested are
$$H_0: E[z_{im}\alpha_i] = 0 \quad \text{(RE estimation)}$$
$$H_1: E[z_{im}\alpha_i] \neq 0 \quad \text{(FE estimation)}$$

Hausman and Taylor (1981, Econometrica) considered a test statistic based on

$$\hat{q} = \hat{\delta}_{FE} - \hat{\delta}_{RE}$$

in the context of maximum likelihood estimation. Newey (1985; see also Hayashi, Chapter 3, Analytic Exercise 9) studied the test in the context of GMM.

Intuition: Under $H_0$, both $\hat{\delta}_{FE}$ and $\hat{\delta}_{RE}$ are consistent (but $\hat{\delta}_{RE}$ is efficient), so that

$$\hat{q} \overset{p}{\to} 0$$

Under $H_1$, $\hat{\delta}_{RE}$ is not consistent but $\hat{\delta}_{FE}$ is consistent, so that

$$\hat{q} \overset{p}{\nrightarrow} 0$$

Therefore, consider the test statistic

$$H = n\,\hat{q}'\left(\widehat{\mathrm{avar}}(\hat{q})\right)^{-1}\hat{q}$$

If $H$ is big then reject $H_0$; otherwise do not reject $H_0$.

Q1: What is $\widehat{\mathrm{avar}}(\hat{q})$?

Q2: What is the asymptotic distribution of $H$?
To answer these questions, consider the general linear GMM model

$$y_i = z_i'\delta + \varepsilon_i, \quad i = 1,\dots,n$$
$$E[x_i\varepsilon_i] = 0, \quad E[x_ix_i'\varepsilon_i^2] = S$$

Let $\hat{\delta}_1(\hat{W}_1)$ denote an inefficient GMM estimator with weight matrix $\hat{W}_1 \overset{p}{\to} W_1 \neq S^{-1}$.

Let $\hat{\delta}_2(\hat{W}_2)$ denote another GMM estimator with weight matrix $\hat{W}_2 \overset{p}{\to} W_2 \neq W_1$.
The following results are based on Newey (1985); see also Hayashi, Chapter 3, Analytic Exercise 9.

Result 1:

$$\begin{bmatrix} \sqrt{n}\left(\hat{\delta}_1(\hat{W}_1) - \delta\right) \\ \sqrt{n}\left(\hat{\delta}_2(\hat{W}_2) - \delta\right) \end{bmatrix}
\overset{d}{\to} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\;
\begin{pmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{pmatrix}\right)$$

where

$$A_{11} = \mathrm{avar}(\hat{\delta}_1(\hat{W}_1)), \quad
A_{22} = \mathrm{avar}(\hat{\delta}_2(\hat{W}_2)), \quad
A_{12} = \mathrm{acov}(\hat{\delta}_1(\hat{W}_1), \hat{\delta}_2(\hat{W}_2))$$

Define

$$\hat{q} = \hat{\delta}_1(\hat{W}_1) - \hat{\delta}_2(\hat{W}_2)$$

Result 2:

$$\sqrt{n}\,\hat{q} \overset{d}{\to} N(0, \mathrm{avar}(\hat{q}))$$
$$\mathrm{avar}(\hat{q}) = A_{11} + A_{22} - A_{12} - A_{12}'
= \mathrm{avar}(\hat{\delta}_1(\hat{W}_1)) + \mathrm{avar}(\hat{\delta}_2(\hat{W}_2))
- \mathrm{acov}(\hat{\delta}_1(\hat{W}_1), \hat{\delta}_2(\hat{W}_2)) - \mathrm{acov}(\hat{\delta}_1(\hat{W}_1), \hat{\delta}_2(\hat{W}_2))'$$

Result 3: Set $\hat{W}_2 = \hat{S}^{-1}$ where $\hat{S} \overset{p}{\to} S$. Then

$$A_{12} = A_{12}' = A_{22} = \mathrm{avar}(\hat{\delta}_2(\hat{S}^{-1}))$$

and

$$\mathrm{avar}(\hat{q}) = A_{11} + A_{22} - 2A_{22} = A_{11} - A_{22}
= \mathrm{avar}(\hat{\delta}_1(\hat{W}_1)) - \mathrm{avar}(\hat{\delta}_2(\hat{S}^{-1}))$$

Remark: Because $\hat{\delta}_2(\hat{S}^{-1})$ is the efficient GMM estimator, $\mathrm{avar}(\hat{q}) \geq 0$.
Applying Newey's result to

$$\hat{q} = \hat{\delta}_{FE} - \hat{\delta}_{RE}$$

gives

$$\mathrm{avar}(\hat{q}) = \mathrm{avar}(\hat{\delta}_{FE}) - \mathrm{avar}(\hat{\delta}_{RE})$$

and

$$H = n\,\hat{q}'\left(\widehat{\mathrm{avar}}(\hat{q})\right)^{-1}\hat{q} \overset{A}{\sim} \chi^2(L)$$

Therefore, we reject $H_0: E[z_{im}\alpha_i] = 0$ at the $\alpha\times 100\%$ level if

$$H > \chi^2_{1-\alpha}(L)$$

Remark: Hayashi shows (Appendix to Chapter 5) that $\mathrm{avar}(\hat{q}) > 0$, so that $\mathrm{avar}(\hat{q})^{-1}$ is well defined.
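A self-contained sketch of the test in the spherical-error special case, simulated under $H_0$ so that the statistic should be small; names are illustrative, and `scipy` is used only for the $\chi^2$ p-value.

    # Sketch: Hausman test of FE vs. RE, simulated under H0 (RE is valid)
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    n, M, L = 500, 4, 2
    delta0 = np.array([1.0, -0.5])
    Z = rng.normal(size=(n, M, L))
    alpha = rng.normal(size=(n, 1))            # independent of Z, so H0 holds
    y = np.einsum('nml,l->nm', Z, delta0) + alpha + rng.normal(size=(n, M))

    # FE estimator; spherical case, avar = sigma2 * (n^-1 sum Zt'Zt)^-1
    Q = np.eye(M) - np.ones((M, M)) / M
    Zt = np.einsum('mh,nhl->nml', Q, Z)
    yt = y @ Q
    A_fe = np.einsum('nml,nmk->lk', Zt, Zt) / n
    d_fe = np.linalg.solve(A_fe, np.einsum('nml,nm->l', Zt, yt) / n)
    e_fe = yt - np.einsum('nml,l->nm', Zt, d_fe)
    s2 = (e_fe ** 2).sum() / (n * (M - 1) - L)
    avar_fe = s2 * np.linalg.inv(A_fe)

    # RE estimator; avar = (n^-1 sum Z' Sigma^-1 Z)^-1
    ZZ = np.einsum('nml,nmk->lk', Z, Z)
    d_pool = np.linalg.solve(ZZ, np.einsum('nml,nm->l', Z, y))
    e = y - np.einsum('nml,l->nm', Z, d_pool)
    Si = np.linalg.inv(e.T @ e / n)
    A_re = np.einsum('nml,mh,nhk->lk', Z, Si, Z) / n
    d_re = np.linalg.solve(A_re, np.einsum('nml,mh,nh->l', Z, Si, y) / n)
    avar_re = np.linalg.inv(A_re)

    # avar_fe - avar_re should be positive semi-definite in large samples
    q = d_fe - d_re
    H = n * q @ np.linalg.solve(avar_fe - avar_re, q)
    print(H, stats.chi2.sf(H, L))              # large p-value expected under H0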
Remarks
The form of $\mathrm{avar}(\hat{q}) = \mathrm{avar}(\hat{\delta}_{FE}) - \mathrm{avar}(\hat{\delta}_{RE})$ depends on assumptions about heteroskedasticity.

Typical case: Assume conditional homoskedasticity. Then

$$\sqrt{n}(\hat{\delta}_{RE} - \delta) \overset{d}{\to} N\left(0,\; E[Z_i'\Sigma^{-1}Z_i]^{-1}\right)$$

$$\sqrt{n}\left(\hat{\delta}_{FE} - \delta\right) \overset{d}{\to} N\left(0,\; E[\tilde{Z}_i'\tilde{Z}_i]^{-1}E\left[\tilde{Z}_i'E[\tilde{\eta}_i\tilde{\eta}_i']\tilde{Z}_i\right]E[\tilde{Z}_i'\tilde{Z}_i]^{-1}\right)$$

Then

$$\widehat{\mathrm{avar}}(\hat{\delta}_{FE}) =
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\hat{V}\tilde{Z}_i\right)
\left(\frac{1}{n}\sum_{i=1}^n \tilde{Z}_i'\tilde{Z}_i\right)^{-1}$$

$$\widehat{\mathrm{avar}}(\hat{\delta}_{RE}) =
\left(n^{-1}\sum_{i=1}^n Z_i'\hat{\Sigma}^{-1}Z_i\right)^{-1}$$