GENERALIZED LEAST SQUARES INFERENCE IN PANEL AND MULTILEVEL MODELS WITH SERIAL CORRELATION AND FIXED EFFECTS

CHRISTIAN B. HANSEN†

Abstract. In this paper, I consider generalized least squares (GLS) estimation in fixed

effects panel and multilevel models with autocorrelation. The presence of fixed effects

complicates implementation of GLS as they will typically render standard estimators of

the covariance parameters necessary for obtaining feasible GLS estimates inconsistent. I

focus on the case where the disturbances follow an AR(p) process and offer a simple to

implement bias-correction for the AR coefficients. The usefulness of GLS and the derived

bias-correction for the parameters of the autoregressive process is illustrated through a

simulation study which uses data from the Current Population Survey.

Keywords: efficiency, panel, autocorrelation, higher-order, bias reduction

JEL Codes: C12, C13, C23

Date: 14 March 2003. This draft: 7 March 2006.

I would like to thank Josh Angrist, Jim Poterba, Byron Lutz, Joanna Lahey, and David Lyle as well

as seminar participants at Brigham Young University, the University of Chicago, Boston University, Brown

University, the University of Michigan, the University of Illinois, and Stanford University for helpful

comments and suggestions. Anonymous referees and a co-editor also provided valuable suggestions that greatly

improved the paper. I am especially grateful to my advisors, Whitney Newey and Victor Chernozhukov,

for comments, suggestions, and support provided throughout the development of this paper. This work was

partially supported by the William S. Fishman Faculty Research Fund at the Graduate School of Business,

the University of Chicago. All remaining errors are mine.

† University of Chicago, Graduate School of

Business, 5807 S. Woodlawn Ave., Chicago, IL 60637. Email: [email protected].

1. Introduction

Many economic analyses are characterized by regressions involving both aggregate and

individual level data, that is, multilevel data. This is especially prevalent in

differences-in-differences (DD) estimation,1 and policy analysis more generally, where the dependent

variable is often an individual level outcome and the covariate of interest is a policy which

applies to all individuals within a group. For example, in a study of the impact of the

minimum wage on employment, the dependent variable may be the employment in a firm

in group s at time t, and the policy would be the minimum wage in group s at time

t. While this sampling design does not pose any serious problems for estimation of the

linear model, it may lead to serious problems for inference. In particular, the sampling

design gives rise to potential sources of correlation between observations, termed here the

“clustering problem” and the “policy autocorrelation problem”, that would be ignored in

computing conventional least squares standard errors. The clustering problem is caused by

the presence of a common unobserved random shock at the group level that will lead to

correlation between all observations within each group. The policy autocorrelation problem

arises if the groups (not necessarily the individuals within the groups) are followed over

time and the group level shocks are serially correlated, which will result in correlation

between individuals from the same group at different time periods.2 In general, ignoring

these correlations will bias conventional least squares standard errors and lead to misleading

inference. The purpose of this paper is to provide accurate, powerful, and easily computable

inference methods for data which are potentially affected by both the clustering problem

and the policy autocorrelation problem.

The clustering problem has long been recognized in the econometric literature on panel

data, and has more recently been emphasized in economics in other contexts which involve

multi-stage sampling or multilevel data. There are a number of methods for dealing with

this problem which are available in most statistical packages. The most common approach is

to estimate a linear model with OLS and then correct the standard errors for the intracluster

correlation as in Moulton (1986), Arellano (1987), or Kezdi (2002). Feasible Generalized

Least Squares (FGLS)3 estimation may also be performed easily and will asymptotically

result in a more efficient estimator and more powerful tests than OLS.

1DD estimation deals with common unobserved effects via differencing. As a simple example, suppose

we observe treatment and control groups before and after a treatment. Then, assuming linearity, differences

in average outcomes before and after the treatment in both groups remove any common group effects, and

the difference in these differences then removes any common time trend leaving the effect of the treatment

unconfounded with time and group effects. In a regression framework, this differencing may be achieved via

a regression of the outcome on group effects, time effects, and a treatment variable.

2For simplicity, I will typically refer to time periods as years.

3Throughout, I refer to GLS as the infeasible estimator which assumes that the variance matrix of the

disturbances is known and to FGLS as a feasible estimator which uses estimates of the elements of the

variance matrix.

The policy autocorrelation problem has received considerably less attention from applied

economic researchers. In a survey of DD papers published in six leading applied economics

journals from 1990-2000,4 Bertrand, Duflo, and Mullainathan (2004) found that only five

of 65 articles with a potential serial correlation problem explicitly address it; and in

simulations based on individual level Current Population Survey Merged Outgoing Rotation

Group (CPS-MORG) data, they found a 44% rejection rate for a 5% level test using

standard techniques which correct only for the intragroup correlation. To focus on the policy

autocorrelation problem, they also performed a simulation based on data from the

CPS-MORG aggregated to the group-year level. In the aggregate data, they found that using

simple parametric models for the serial correlation did not correct the size distortion, but

that tests based on the OLS estimator which flexibly account for serial correlation, such

as the bootstrap or using a variance matrix robust to arbitrary correlation at the group

instead of the group-year level, had approximately correct size. However, while these tests

appear to have correct size, the results of Bertrand, Duflo, and Mullainathan (2004) also

suggest that they have low power against relevant alternatives.

In this paper, I contribute to the existing literature by offering computationally

attractive FGLS-based estimation and inference procedures that deliver accurate and powerful

inference in settings that are subject to both the clustering problem and the policy

autocorrelation problem. In particular, I explicitly consider FGLS estimation in a general model

for grouped individual and aggregate level data which incorporates standard DD and panel

models under the assumption that the group-year shock follows a stationary AR(p) process.

FGLS estimation and inference based on parametric time series models for the error process

is complicated by the relatively short time dimension available in many policy analyses and

the inclusion of group-level fixed effects which results in bias in conventional estimators of

the time series model’s parameters. Intuitively, this bias is introduced by the subtraction

of the group means from the data to eliminate the fixed effects which alters the variance

structure of the data when the time dimension is short. As a result, conventional

estimators that fail to account for this distortion of the variance structure will be biased for the

parameters of the underlying time series model.

To help alleviate this bias, I derive a bias correction for the coefficients of an AR(p) model

and develop asymptotic inference results for the bias-corrected coefficients and the

corresponding FGLS estimator. I also consider higher-order properties of the FGLS estimator

in asymptotics where the number of groups and time periods go to infinity jointly, showing

that there are higher-order efficiency gains to using bias-corrected AR parameters. The

usefulness of the bias correction and the FGLS procedure is then demonstrated through

a simulation study based on the CPS-MORG.

4The journals are The American Economic Review, Industrial and Labor Relations Review, The Journal

of Labor Economics, The Journal of Political Economy, The Journal of Public Economics, and The Quarterly

Journal of Economics.

The results from the simulation study strongly support the use of the FGLS procedure

with bias-corrected AR coefficients for performing inference in settings with combined

individual and grouped data where the groups are potentially autocorrelated. As in Bertrand,

Duflo, and Mullainathan (2004), I find that conventional OLS and, to a lesser extent,

conventional FGLS suffer from severe size distortions in the presence of the policy autocorrelation

and clustering problems. This size distortion is essentially removed by OLS with standard

errors clustered by group and the bias-corrected FGLS procedure. However, the FGLS

procedure clearly dominates OLS with standard errors robust to arbitrary correlation within

groups in terms of both power and confidence interval length. For example, in a simulation

performed by resampling directly from the CPS-MORG, I find that conventional OLS has

a rejection rate of 37% for a 5% level test. In contrast, OLS with standard errors clustered

by group rejects 6.6% of the time, and FGLS based on bias-corrected AR(3) coefficients

has a 6.4% rejection rate. At the same time, the power of the bias-corrected FGLS-based

procedure versus the alternative that the treatment increases the dependent variable by 2%

is 0.788 compared to 0.344 from OLS with clustered standard errors. Similarly, the length

of the FGLS confidence interval is 0.028 compared to the OLS interval length of 0.050.

The remainder of this paper is organized as follows. In Section 2, I briefly review GLS

estimation in settings involving both individual and aggregate data and present a

computationally attractive procedure for obtaining the GLS estimates which will be valid as the

group size grows large within each group-year cell. Section 3 presents a bias-correction

for fixed effects estimates of the parameters of a pth order autoregressive model which will

be used in FGLS estimation and outlines the asymptotic properties of the AR parameter

estimators and the FGLS estimators based upon them. Simulation results comparing the

FGLS estimator to other estimators are presented in Section 4, and Section 5 concludes.

2. GLS Estimation in Multilevel Data

2.1. Overview of GLS with Correlated Error Components. Estimates in DD and

policy analysis studies are often obtained using a linear model defined by

$$y_{ist} = w_{ist}'\beta_0 + C_{st} + u_{ist} \tag{1}$$

and

$$C_{st} = x_{st}'\beta_1 + z_{st}'\beta_2^s + v_{st}, \tag{2}$$

where s = 1, ..., S, t = 1, ..., T , i = 1, ..., Nst for each s and t, Cst are group-year effects, wist

are covariates that vary at the individual level, xst are covariates that vary at the group-

year level and have constant coefficients, zst are covariates that vary at the group-year

level and have group-specific coefficients5, yist is the outcome of interest which varies at the

individual level, and vst and uist are unobserved random variables which are uncorrelated

with the observed explanatory variables and with each other and have zero means. Typical

5For the theoretical development, zst will be assumed to be nonstochastic and identical across groups.

specifications of zst include zst = 1, the fixed effects model, and zst = [1, t], the fixed

effects model with group-specific time trends. It is also standard in DD models to include

time effects in $x_{st}$. In addition, it is often assumed that $E[v_{st}v_{s't'}] = 0$ for all $s \neq s'$.

Conventionally, estimation and inference are performed on the model formed by combining

(1) and (2) as

$$y_{ist} = w_{ist}'\beta_0 + x_{st}'\beta_1 + z_{st}'\beta_2^s + \epsilon_{ist} \tag{3}$$

where $\epsilon_{ist} = v_{st} + u_{ist}$. The clustering problem then results from the fact that $E[\epsilon_{ist}\epsilon_{jst}] = \sigma_v^2$

for all $i \neq j$, and the policy autocorrelation problem arises from $E[\epsilon_{ist}\epsilon_{js(t-k)}] = \gamma(k) \neq 0$

if $v_{st}$ is serially correlated. In most studies, the model is estimated using OLS, and the

estimated standard errors are adjusted to account for the presence of correlation between

individuals within group-year cells. While this approach has a number of appealing

features, it will yield incorrect standard error estimates and tests if there are other sources of

correlation, such as correlation within groups over time due to a correlated group-specific

shock. In addition, if the errors are correlated, OLS is not the Gauss-Markov estimator,

and more efficient estimates and more powerful tests may be obtained through GLS.

To facilitate discussion of the GLS estimator, note that equations (3) may be stacked

and represented in matrix form as

$$Y = \Phi\theta + \epsilon, \tag{4}$$

where $\Phi = [W, X, Z]$, $Y = (Y_1', \ldots, Y_S')'$, $Y_s = (Y_{s1}', \ldots, Y_{sT}')'$, $Y_{st} = (y_{1st}, \ldots, y_{N_{st}st})'$, and $W$,

$X$, $Z$, and $\epsilon$ are defined similarly. $\epsilon$ may be written as $DV + U$ for $V = (v_{11}, v_{12}, \ldots, v_{ST})'$,

$U$ defined analogously to $Y$, and $D = [d_{11}\ d_{12}\ \cdots\ d_{ST}]$ with $d_{st}$ a dummy variable indicating the

observation belongs to group $s$ at time $t$, so under the assumption that $V$ and $U$ are

uncorrelated, $E[\epsilon\epsilon'] = \Sigma = D\Omega D' + \Lambda$ where $E[VV'] = \Omega$ and $E[UU'] = \Lambda$. Given the

parameters of $\Omega$ and $\Lambda$, the best linear unbiased estimator of $\theta$ is the GLS estimator

$$\hat{\theta}_{GLS} = (\Phi'\Sigma^{-1}\Phi)^{-1}\Phi'\Sigma^{-1}Y. \tag{5}$$
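For a small data set, the direct computation in (5) can be sketched as follows. This is only an illustration under stated assumptions: the function name `gls_error_components`, the integer cell index `cell`, and the diagonal form of Λ are choices made for the example and are not part of the paper.

```python
import numpy as np

def gls_error_components(Phi, Y, cell, Omega, lam):
    """GLS for Y = Phi theta + eps with eps = D V + U (illustrative sketch).

    Phi   : (n, k) stacked regressor matrix [W, X, Z]
    Y     : (n,) stacked outcomes
    cell  : (n,) integer index in 0..ST-1 of each observation's group-year cell
    Omega : (ST, ST) covariance matrix of the group-year shocks V
    lam   : (n,) individual-level variances (Lambda assumed diagonal here)
    """
    n, n_cells = len(Y), Omega.shape[0]
    D = np.zeros((n, n_cells))
    D[np.arange(n), cell] = 1.0                # dummy for the observation's cell
    Sigma = D @ Omega @ D.T + np.diag(lam)     # Sigma = D Omega D' + Lambda
    Si_Phi = np.linalg.solve(Sigma, Phi)       # Sigma^{-1} Phi without forming the inverse
    # (Phi' Sigma^{-1} Phi)^{-1} Phi' Sigma^{-1} Y, using the symmetry of Sigma
    return np.linalg.solve(Phi.T @ Si_Phi, Si_Phi.T @ Y)
```

The sketch forms the full n × n matrix Σ, so it is only practical when the total number of observations is moderate.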

Given $\Omega$ and $\Lambda$ ($\hat{\Omega}$ and $\hat{\Lambda}$), implementation of the GLS (FGLS) estimator may proceed

in a straightforward fashion for moderately sized data sets by numerically obtaining $\Sigma^{-1}$

($\hat{\Sigma}^{-1}$) and computing $\hat{\theta}_{GLS}$ ($\hat{\theta}_{FGLS}$) directly. However, for larger scale problems, such as

the one considered in the simulation section, this procedure is computationally burdensome

due to the size of Σ. Fortunately, there are also numerically convenient approaches available

for computation of $\hat{\theta}_{GLS}$. Amemiya (1978) uses the fact that equation (3) is equivalent to

the model defined by (1) and (2) to reduce the dimension of the problem of finding the

GLS estimates of $\beta_1$ and $\beta_2^s$ from $\sum_s \sum_t N_{st}$ to $ST$.6 In addition, this approach provides

intuition for asymptotic results regarding the parameters that vary at the group-year level

6Another approach considered in Hansen (2004) recognizes that the structure of the problem implies that

$\hat{\theta}_{GLS}$ may be computed as a least squares regression on quasi-differenced data. This method will provide the

GLS estimates of all parameters in θ and will generally reduce the computational burden from computing

(5) directly.

and suggests a simple estimation method that will be asymptotically equivalent to GLS

when Nst is sufficiently large.

2.2. Modeling the Group-Year Fixed Effects. A convenient approach which will

provide the GLS estimates of $\beta_1$ and $\beta_2^s$, the coefficients on the covariates that vary at the

group-year level, is based on the decomposition of equation (3) into equations (1) and (2).

Amemiya (1978) demonstrated that estimates of β1 and βs2 from the following two-step

procedure are numerically identical to the GLS estimates obtained from estimating model

(3):

1. Estimate equation (1), $y_{ist} = w_{ist}'\beta_0 + C_{st} + u_{ist}$, by GLS to obtain estimates of $C_{st}$, $\hat{C}_{st}$.

2. Obtain estimates of $\beta_1$ and $\beta_2^s$ by estimating the equation $\hat{C}_{st} = x_{st}'\beta_1 + z_{st}'\beta_2^s + \nu_{st}$,

where $\nu_{st} = v_{st} + (\hat{C}_{st} - C_{st})$, by GLS.

This approach will typically be computationally easier than directly computing Σ−1. If, as

is conventionally assumed, Λ is diagonal, the first step may be computed by weighted least

squares, and the second step only requires inversion of an ST × ST matrix.

In addition to providing a tractable method for obtaining GLS estimates of $\beta_1$, this

approach also clearly illustrates that $\hat{\beta}_1$ is not consistent as $N_{st} \to \infty$ with $S$ and $T$ fixed.

In particular, we see that consistency and asymptotic normality of $\hat{\beta}_1$ require that $ST \to \infty$.

The use of Amemiya’s (1978) approach and the inconsistency of estimates of $\beta_1$ in

asymptotics with S and T fixed has been emphasized in work of Donald and Lang (2001)

in the context of DD estimation when serial correlation is not present.

Finally, Amemiya’s (1978) results suggest a simple estimation strategy which will be

equivalent to GLS as Nst → ∞ for all s and t:

1’. Estimate equation (1), $y_{ist} = w_{ist}'\beta_0 + C_{st} + u_{ist}$, by OLS or GLS to obtain estimates

of $C_{st}$, $\hat{C}_{st}$.

2’. Obtain estimates of $\beta_1$ and $\beta_2^s$ by estimating the equation $\hat{C}_{st} = x_{st}'\beta_1 + z_{st}'\beta_2^s + v_{st}$

by GLS.

Note that 1’ and 2’ differ from 1 and 2 above in that the first step may be estimated

by OLS regardless of the assumption about Λ and 2’ ignores the fact that the dependent

variable in the second step, $\hat{C}_{st}$, was estimated. The equivalence of this approach to GLS

for estimating $\beta_1$ and $\beta_2^s$ when $N_{st}$ is sufficiently large follows from numeric equivalence of

Amemiya’s (1978) two-step approach to GLS and consistency of $\hat{C}_{st}$ for $C_{st}$ as $N_{st} \to \infty$.

This result also implies that estimation of Cst may be ignored and GLS estimates of β1 may

be obtained through standard panel methods when Nst is large. This is particularly useful

since data used in many DD problems are characterized by rather large cell sizes, and the

approach outlined above is easy to implement.
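Steps 1’ and 2’ can be sketched in a few lines. The sketch assumes that Ω is known, that the first step is run by OLS with a full set of group-year dummies, and that the group-specific regressors have already been expanded into the columns of `Z` (for example, group dummies in the fixed effects model). The function and argument names are hypothetical.

```python
import numpy as np

def two_step_aggregate(y, w, cell, X, Z, Omega):
    """Steps 1' and 2': OLS cell effects, then GLS on the ST cell-level equation.

    y     : (n,) individual outcomes
    w     : (n, k0) individual-level covariates
    cell  : (n,) integer index in 0..ST-1 of each observation's group-year cell
    X     : (ST, k1) group-year covariates with constant coefficients
    Z     : (ST, k2) expanded group-specific regressors (e.g. group dummies)
    Omega : (ST, ST) covariance matrix of the group-year shocks v
    """
    n_cells = X.shape[0]
    D = np.zeros((len(y), n_cells))
    D[np.arange(len(y)), cell] = 1.0
    # Step 1': OLS of y on w and a full set of group-year dummies
    coef, *_ = np.linalg.lstsq(np.hstack([w, D]), y, rcond=None)
    beta0, C_hat = coef[: w.shape[1]], coef[w.shape[1]:]
    # Step 2': GLS of C_hat on [X, Z], treating C_hat as if it were observed
    R = np.hstack([X, Z])
    Oi_R = np.linalg.solve(Omega, R)
    beta = np.linalg.solve(R.T @ Oi_R, Oi_R.T @ C_hat)
    return beta0, C_hat, beta
```

The second step only involves ST × ST matrices, so its cost does not grow with the cell sizes.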

3. Bias Correction for pth Order Autoregressive Coefficients in Fixed Effects Models

In order to operationalize GLS estimation and inference in practice, parameters of the

covariance matrices Ω and Λ must be estimated. Under the usual assumption that the uist

are i.i.d., estimation of the parameters of Λ may proceed in a straightforward fashion from

equation (1), and when the Nst are large, estimation of these parameters may be bypassed

completely using the aggregation method discussed above. Thus, I focus on estimation of Ω

in the case where estimation of Cst is ignored. I present a modification of the basic results

that accounts for estimation of Cst in Section 3.3.

Let $\hat{v}_{st}$ denote the least squares residuals from estimating

$$C_{st} = x_{st}'\beta_1 + z_{st}'\beta_2^s + v_{st} \tag{6}$$

and suppose $\Omega = \Omega(\alpha)$ is characterized by a finite dimensional parameter vector $\alpha$. An

obvious approach to estimating $\alpha$ would be to use the $\hat{v}_{st}$ and a conventional time series

estimator to recover $\alpha$; for example, regressing $\hat{v}_{st}$ on $\hat{v}_{s(t-1)}$ to recover the autocorrelation

coefficient in an AR(1) model.7 These estimates of α will be consistent as T approaches

infinity, but in practice, estimation of α is complicated by the presence of zst in equation (6)

and the fairly short time series dimension available in most applications which may result

in substantial bias in the estimates of α.8

The intuition for this bias may be seen by considering the fixed effects model. In this

model, $\hat{v}_{st} \approx v_{st} - \bar{v}_s$ where $\bar{v}_s = \frac{1}{T}\sum_{t=1}^{T} v_{st}$ as $S$ gets large with $T$ fixed. Thus, for small

$T$, the residuals do not behave like the underlying errors but behave like the difference

between these errors and their within group means. This behavior alters the correlation

structure of the residuals when $T$ is small and results in inconsistency of conventional

estimators of the time series parameters which fail to account for this difference. As a

simple example, note that the natural estimator of the covariance between $v_{st}$ and $v_{s(t-1)}$,

$\frac{1}{S(T-1)}\sum_{s=1}^{S}\sum_{t=2}^{T}\hat{v}_{st}\hat{v}_{s(t-1)}$, converges to

$$E\Bigl[\sum_{t=2}^{T}(v_{st}-\bar{v}_s)(v_{s(t-1)}-\bar{v}_s)/(T-1)\Bigr] = E[v_{st}v_{s(t-1)}] - \frac{E[\sum_{t=2}^{T} v_{st}\bar{v}_s]}{T-1} - \frac{E[\sum_{t=2}^{T}\bar{v}_s v_{s(t-1)}]}{T-1} + E[\bar{v}_s^2],$$

which will generally be equal to $E[v_{st}v_{s(t-1)}] + O(1/T)$ as $S \to \infty$. Even when the $v_{st}$ are iid with mean 0 and variance

$\sigma_v^2$, $E[\sum_{t=2}^{T}(v_{st}-\bar{v}_s)(v_{s(t-1)}-\bar{v}_s)/(T-1)] = -\sigma_v^2/T = 0 + O(1/T)$, and for small or

moderate T , the bias from failing to account for the difference in the correlation structure in

the observed residuals and the underlying process may be substantial. Similar behavior also

occurs in the more general model (6) where the least squares residuals behave like residuals

from the within-group regression of V on Z.
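The distortion from demeaning can be evaluated exactly for any error covariance matrix, since the demeaned errors have covariance MΓM with M the within-group demeaning matrix. The following sketch does this for the fixed effects case; the function name is illustrative and not taken from the paper.

```python
import numpy as np

def demeaned_lag1_autocov(Gamma):
    """E[ sum_{t=2}^T (v_t - vbar)(v_{t-1} - vbar) ] / (T - 1) when Var(V_s) = Gamma.

    Uses Cov(M V_s) = M Gamma M, where M = I - 11'/T removes the group mean.
    """
    T = Gamma.shape[0]
    M = np.eye(T) - np.ones((T, T)) / T        # within-group demeaning matrix
    G = M @ Gamma @ M                          # covariance of the demeaned errors
    return np.trace(G, offset=-1) / (T - 1)    # average of the first sub-diagonal

# iid errors with unit variance: the true lag-one covariance is zero,
# but the demeaned errors are negatively correlated for every finite T
for T in (5, 10, 20, 40):
    print(T, demeaned_lag1_autocov(np.eye(T)))
```

The printed values are negative and shrink toward zero at rate 1/T, which is the pattern described above.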

7Estimating the model using maximum likelihood poses similar problems to those described below.

8Bertrand, Duflo, and Mullainathan (2004), in a survey of published differences-in-differences papers, find

an average time series length of only 16.5 periods. They also find significant bias in autoregressive parameter

estimates in their simulations.

Given this, there are a number of approaches that could be pursued. In the case of

the fixed effects model, a potential solution suggested by MaCurdy (1982) in a frequentist

setting and Lancaster (2002) in a Bayesian setting is to first-difference the data to remove

the fixed effects which will produce a known transformation of the original process. One

may then use a likelihood or moments based estimator and the transformed error process

to estimate the parameters of the original process. Of course, in models where zst is not a

constant, this approach will generally not resolve the problem; but for more general Z, one

may make use of the fact that least squares residuals behave like residuals from the

within-group regression of V on Z which also produces a known transformation of the original

error process to form a moments or likelihood-based estimator.

The chief drawback of the transformation approaches mentioned above is that the

implied transformations may be quite complicated. Solving for the parameters of the error

process will typically involve non-linear estimation and may be difficult in some cases. An

alternative approach is to allow for an unconstrained covariance matrix within groups as

in Kiefer (1980) and Hausman and Kuersteiner (2003). This approach is appealing as it

flexibly models the covariance matrix while bypassing the need for nonlinear optimization.9

The chief drawback of the approach is that it relies heavily on S being much greater than

T as one needs to estimate an unconstrained (T − 1) × (T − 1) covariance matrix from S

observations. In samples of sizes common in differences-in-differences estimation, this may

lead to poor finite sample performance as illustrated in the simulation section.10

In this paper, I consider a somewhat different approach that builds on the early work of

Nickell (1981) and Solon (1984) and makes use of the simple idea that rather than working

with a transformed error process, one may calculate the bias of a conventional estimator

and then remove an estimate of this bias from the estimator. Specifically, I derive the bias

of the least squares estimator of α as S → ∞ under the assumption that vst follows a

stationary pth order autoregressive process,

$$v_{st} = \sum_{j=1}^{p} \alpha_j v_{s(t-j)} + \eta_{st}.$$

I then subtract an estimate of this bias from the least squares estimator of $\alpha$ to form a

bias-corrected estimator of $\alpha$. A simple one-step estimator based on this strategy removes the

bias from the asymptotic distribution of the estimator of $\alpha$ as long as $S/T^3 \to 0$. In addition,

I show that an iterative procedure is consistent as S → ∞ even with T fixed and that the

approach produces an estimator that has the same asymptotic variance as the conventional

least squares estimator of the AR parameters in asymptotics where S and T go to infinity

jointly. The basic approach is similar to that of Hahn and Newey (2004) and Hahn and

9For details, see Kiefer (1980) or Hausman and Kuersteiner (2003).

10Hausman and Kuersteiner (2003) provide corrections for the inference based on higher-order in S

asymptotics. Simulation results in Hausman and Kuersteiner (2003) suggest the correction is effective for

moderate T , but they still find substantial size distortions with S = 50 and T = 20 which is not uncommon

in differences-in-differences estimation.

Kuersteiner (2002). In fact, for the AR(1) model, it is straightforward to show that the

difference between the Hahn and Kuersteiner (2002) bias reduction and the one-step bias

reduction derived here is $O(1/T^2)$.

Working with an AR(p) model is convenient for a number of reasons. The AR(p), and

specifically the AR(1), is a common model for the error process employed in empirical

research that offers a reasonably general yet parsimonious model for the correlation structure.

The structure of the least squares estimator of α makes it particularly easy to analyze in

the present setting and allows derivation of reasonably simple exact expressions for the

bias of the estimator which would not be possible for a more general model that required

nonlinear estimation. This linearity is exploited in both deriving the bias and in providing

bias corrections that involve only linear operations and so are quite simple to implement.

The parsimony of the model also makes it attractive in cases where T is moderate and

allowing an unconstrained covariance matrix is problematic.

In the results presented below, I focus on the case where the data have been aggregated

to the group-year level using the approach of Amemiya (1978), though the method outlined

below could be extended to treat other cases. The results here may also be adapted to

bias-correct AR coefficients in dynamic panel models without covariates.11 Throughout the

remainder of the development it is assumed that vst has zero mean and constant variance

which does not depend on X or Z for all s and t.12

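3.1. The Bias Correction. The least squares estimator of $\alpha$ using the residuals from the

estimation of equation (6), $\hat{v}_{st}$, is

$$\hat{\alpha} = \left(\frac{1}{S(T-p)}\sum_{s=1}^{S}\sum_{t=p+1}^{T}\hat{v}_{st}^{-}\hat{v}_{st}^{-\prime}\right)^{-1}\left(\frac{1}{S(T-p)}\sum_{s=1}^{S}\sum_{t=p+1}^{T}\hat{v}_{st}^{-}\hat{v}_{st}\right) \tag{7}$$

where $\hat{v}_{st}^{-\prime} = (\hat{v}_{s(t-p)}, \ldots, \hat{v}_{s(t-1)})$. Let $E[v_{st}v_{s(t-k)}] = \gamma_k(\alpha)$, and let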

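$$\Gamma_p = \begin{pmatrix} \gamma_0(\alpha) & \gamma_1(\alpha) & \cdots & \gamma_{p-1}(\alpha) \\ \gamma_1(\alpha) & \gamma_0(\alpha) & \cdots & \gamma_{p-2}(\alpha) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{p-1}(\alpha) & \gamma_{p-2}(\alpha) & \cdots & \gamma_0(\alpha) \end{pmatrix}. \tag{8}$$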
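The autocovariances in (8) are functions of α alone once the innovation variance is fixed. One way to compute them, sketched here under the assumption of a stationary AR(p) with innovation variance `sigma2_eta`, is to solve for the stationary covariance of the companion form and extend to higher lags with the Yule-Walker recursion. The function names are illustrative.

```python
import numpy as np

def ar_autocovariances(alpha, n_lags, sigma2_eta=1.0):
    """gamma_0, ..., gamma_{n_lags} of a stationary AR(p):
    v_t = sum_j alpha_j v_{t-j} + eta_t, Var(eta_t) = sigma2_eta."""
    alpha = np.asarray(alpha, dtype=float)
    p = alpha.size
    F = np.zeros((p, p))                     # companion matrix
    F[0, :] = alpha
    F[1:, :-1] = np.eye(p - 1)
    Q = np.zeros((p, p))
    Q[0, 0] = sigma2_eta
    # Stationary state covariance P solves P = F P F' + Q
    vecP = np.linalg.solve(np.eye(p * p) - np.kron(F, F), Q.ravel())
    P = vecP.reshape(p, p)
    gam = list(P[0, :])                      # gamma_0, ..., gamma_{p-1}
    for k in range(p, n_lags + 1):           # Yule-Walker recursion for higher lags
        gam.append(sum(alpha[j] * gam[k - 1 - j] for j in range(p)))
    return np.array(gam[: n_lags + 1])

def gamma_p(alpha, sigma2_eta=1.0):
    """The p x p Toeplitz matrix in (8)."""
    p = len(alpha)
    g = ar_autocovariances(alpha, p - 1, sigma2_eta)
    idx = np.arange(p)
    return g[np.abs(idx[:, None] - idx[None, :])]
```

For an AR(1) with coefficient 0.5 and unit innovation variance, `ar_autocovariances([0.5], 2)` reproduces the familiar values $\gamma_k = 0.5^k/(1 - 0.25)$.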

Then, using calculations similar to those found in Nickell (1981) and Solon (1984) and

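assuming the regularity conditions collected in Assumption 2 in the appendix hold, one can

show that as $S \to \infty$ with $T$ fixed, $\hat{\alpha} \xrightarrow{p} \alpha_T(\alpha) = \bigl(\Gamma_p(\alpha) + \frac{1}{T-p}\Delta_\Gamma(\alpha)\bigr)^{-1}\bigl(A(\alpha) + \frac{1}{T-p}\Delta_A(\alpha)\bigr)$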

11Note that the approaches in Kiviet (1995) and Alvarez and Arellano (2003) for correcting AR coefficients

in dynamic panel models could also be adapted to the present setting.

12When discussing estimates of α and the bias-correction, the variance of vst is normalized to 1 to

simplify some notation as it does not affect the location of the estimators. This normalization is dropped

when discussing the asymptotic distributions of the estimators.

with

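$$A(\alpha) = (\gamma_1(\alpha), \ldots, \gamma_p(\alpha))', \tag{9}$$

$\Delta_A(\alpha)$ a $p \times 1$ vector, $\Delta_\Gamma(\alpha)$ a $p \times p$ matrix, and $\Delta(\alpha) = [\Delta_A(\alpha)\ \Delta_\Gamma(\alpha)]$ a $p \times (p+1)$ matrix

with $i,j$ element

$$\begin{aligned} [\Delta(\alpha)]_{[i,j]} = {} & \operatorname{trace}\bigl(Z_s'\Gamma(\alpha)Z_s(Z_s'Z_s)^{-1}Z_{s,-i}'Z_{s,-(j-1)}(Z_s'Z_s)^{-1}\bigr) \\ & - \operatorname{trace}\bigl(Z_s'\Gamma_{-i}(\alpha)Z_{s,-(j-1)}(Z_s'Z_s)^{-1}\bigr) - \operatorname{trace}\bigl(Z_s'\Gamma_{-(j-1)}(\alpha)Z_{s,-i}(Z_s'Z_s)^{-1}\bigr) \end{aligned} \tag{10}$$

where $\Gamma(\alpha) = E[V_sV_s']$, $\Gamma_{-k}(\alpha) = E[V_sV_{s,-k}']$, $V_{s,-k} = (v_{s(p+1-k)}, v_{s(p+2-k)}, \ldots, v_{s(T-k)})'$, and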

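$Z_{s,-k}$ is defined similarly. Thus, the asymptotic bias of $\hat{\alpha}$ is $-\alpha + \alpha_T(\alpha)$, which suggests

that the bias of $\hat{\alpha}$ may be estimated as $-\hat{\alpha} + \alpha_T(\hat{\alpha})$ and that a bias corrected estimator of

$\alpha$ may be constructed as

$$\hat{\alpha}^{(1)} = \hat{\alpha} - [-\hat{\alpha} + \alpha_T(\hat{\alpha})]. \tag{11}$$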

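In addition, $\hat{\alpha} \xrightarrow{p} \alpha_T(\alpha)$ suggests that a consistent (in $S$ alone) estimate of $\alpha$ may be

obtained by inverting $\alpha_T(\alpha)$ to obtain $\hat{\alpha}^{(\infty)} = \alpha_T^{-1}(\hat{\alpha})$. This estimator can be calculated

by iterating $\hat{\alpha}^{(k+1)} = \hat{\alpha} - [\alpha_T(\hat{\alpha}^{(k)}) - \hat{\alpha}^{(k)}]$ to convergence, since, denoting $\hat{\alpha}^{(\infty)}$ as the

point that the procedure converges to, $\hat{\alpha}^{(\infty)} = \hat{\alpha} - [\alpha_T(\hat{\alpha}^{(\infty)}) - \hat{\alpha}^{(\infty)}] \Rightarrow \alpha_T(\hat{\alpha}^{(\infty)}) = \hat{\alpha} \Rightarrow \hat{\alpha}^{(\infty)} = \alpha_T^{-1}(\hat{\alpha})$.

Bhargava, Franzini, and Narendranathan (1982) suggest a similar iterative

procedure based on the Durbin-Watson statistic to remove the bias of autoregressive

parameter estimates in AR(1) models with fixed effects, though no formal asymptotic results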

are presented and the extension to models beyond the AR(1) is not clear.
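The one-step and iterated corrections can be sketched for the simplest case, an AR(1) error in the fixed effects model. As an assumption of this illustration, the limit $\alpha_T(\alpha)$ is computed directly from the covariance of the demeaned errors rather than from expression (10), and all function names are hypothetical.

```python
import numpy as np

def alpha_T(a, T):
    """Large-S limit of the least squares AR(1) coefficient computed from
    demeaned (fixed effects) residuals, for true coefficient a and T periods."""
    idx = np.arange(T)
    Gamma = a ** np.abs(idx[:, None] - idx[None, :])  # AR(1) covariance up to scale
    M = np.eye(T) - np.ones((T, T)) / T                # within-group demeaning matrix
    G = M @ Gamma @ M                                  # covariance of the residuals
    num = np.trace(G, offset=-1)                       # sum over t=2..T of E[v_t v_{t-1}]
    den = np.trace(G) - G[-1, -1]                      # sum over t=2..T of E[v_{t-1}^2]
    return num / den

def one_step(a_hat, T):
    """One-step correction: subtract the estimated bias alpha_T(a_hat) - a_hat."""
    return a_hat - (alpha_T(a_hat, T) - a_hat)

def iterated(a_hat, T, tol=1e-12, max_iter=1000):
    """Iterate a_{k+1} = a_hat - [alpha_T(a_k) - a_k] until the step is below tol."""
    a = a_hat
    for _ in range(max_iter):
        a_new = a_hat - (alpha_T(a, T) - a)
        if abs(a_new - a) < tol:
            return a_new
        a = a_new
    raise RuntimeError("bias-correction iteration did not converge")

# Feed in the large-S limit of the uncorrected estimator for alpha = 0.5, T = 10
a_hat = alpha_T(0.5, 10)
print(a_hat, one_step(a_hat, 10), iterated(a_hat, 10))
```

In this population exercise the iterated value returns to the true coefficient, while the one-step value removes most but not all of the distortion. In a finite sample the iteration may fail to converge to a stationary value, in which case the sketch raises an error rather than returning a number.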

Asymptotic properties of the estimators of α are collected in two propositions which will

make use of the following notation. Representing equation (2) in vector notation, let

Cs = Xsβ1 + Zsβs2 + Vs (12)

where Cs = [Cs1, . . . , CsT ]′ is T × 1, Xs = [xs1, . . . , xsT ]′ is T × k1, Zs = [zs1, . . . , zsT ]′ is

T × k2, and Vs = [vs1, . . . , vsT ]′ is T × 1. Define

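$$\tilde{v}_{st} = v_{st} - z_{st}'(Z_s'Z_s)^{-1}Z_s'V_s, \tag{13}$$

and $\tilde{V}_s = [\tilde{v}_{s1}, \ldots, \tilde{v}_{sT}]'$. Let $v_{st}^{-}$ be a $p \times 1$ vector with $v_{st}^{-} = [v_{s(t-p)}, \ldots, v_{s(t-1)}]'$, and define

$\tilde{v}_{st}^{-}$ similarly.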

The first proposition provides asymptotic results as S → ∞ with T fixed, and the second

provides asymptotic results as S and T go to infinity jointly. Regularity conditions are

collected in Assumption 2 in the appendix.

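Proposition 1. Suppose $\alpha_T(\alpha)$ is continuously differentiable in $\alpha$ and that the derivative

matrix of $\alpha_T(\alpha)$ in $\alpha$, $H$, is invertible for all $\alpha$ such that Assumption 2.(i) is satisfied.

Then, if conditions (i)-(v) of Assumption 2 are satisfied, $\hat{\alpha}^{(\infty)} - \alpha \xrightarrow{p} 0$ and

$$\sqrt{S}(\hat{\alpha}^{(\infty)} - \alpha) \xrightarrow{d} \frac{1}{T-p}H^{-1}\Bigl(\Gamma_p(\alpha) + \frac{1}{T-p}\Delta_\Gamma(\alpha)\Bigr)^{-1}\chi,$$

where $\chi \sim N(0, \Xi_T)$,

$$\Xi_T = E\Bigl[\sum_{t_1=p+1}^{T}\sum_{t_2=p+1}^{T}\tilde{v}_{st_1}^{-}\mu_{st_1}\mu_{st_2}\tilde{v}_{st_2}^{-\prime}\Bigr]$$

and $\mu_{st} = \tilde{v}_{st} - \tilde{v}_{st}^{-\prime}\alpha_T(\alpha)$.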

Remark 3.1. The condition that αT (α) is continuously differentiable in α and that H is

invertible guarantees the existence of the inverse of αT (α) and seems reasonable in many

settings.13 For example, it is satisfied in the fixed effects model. However, this is an

asymptotic result, and even if all the assumptions are satisfied, sampling variation in $\hat{\alpha}$ may

result in there not being a solution to $\alpha_T(\alpha) = \hat{\alpha}$ that satisfies the stationarity assumption,

Assumption 2.(i), in any given sample. The effects of this are illustrated in the simulation

section.

Proposition 1 verifies that $\hat{\alpha}^{(\infty)}$ is consistent and asymptotically normal as $S \to \infty$ even

if T is fixed, demonstrating that the inconsistency in the uncorrected estimator may be

completely removed through the use of an iterative bias-correction. The result may also be

of interest in the dynamic panel context, where it provides a simple alternative to GMM

methods, although it does rely on strong exogeneity assumptions. Proposition 2 provides a

similar result under asymptotics where S and T jointly approach infinity.

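Proposition 2. If conditions (i)-(iii), (vi), and (vii) of Assumption 2 are satisfied,

(i) $\sqrt{ST}(\hat{\alpha} - \alpha) \xrightarrow{d} N(\rho B(\alpha), \Gamma_p^{-1}\Xi\Gamma_p^{-1})$ if $S/T \to \rho \geq 0$, where

$$\Xi = \lim_{T\to\infty}\frac{1}{T-p}\sum_{t_1=p+1}^{T}\sum_{t_2=p+1}^{T}E[v_{st_1}^{-}\eta_{st_1}\eta_{st_2}v_{st_2}^{-\prime}].$$

In addition, if $\eta_{st}$ are independent for all $s$ and $t$, $\Xi = \sigma_\eta^2\Gamma_p$.

(ii) $\sqrt{ST}(\hat{\alpha}^{(1)} - \alpha) \xrightarrow{d} N(0, \Gamma_p^{-1}\Xi\Gamma_p^{-1})$ for $\Xi$ defined in Proposition 2(i) if $S/T \to \rho \geq 0$.

In addition, if $B(\alpha, T) \equiv \alpha_T(\alpha) - \alpha$ is continuously differentiable in $\alpha$ with bounded

derivative uniformly in $T$ for all $\alpha$ satisfying the stationarity condition given in

Assumption 2.(i), then $\sqrt{ST}(\hat{\alpha}^{(1)} - \alpha) \xrightarrow{d} N(0, \Gamma_p^{-1}\Xi\Gamma_p^{-1})$ if $S/T^3 \to 0$.

(iii) $\sqrt{ST}(\hat{\alpha}^{(\infty)} - \alpha) \xrightarrow{d} N(0, \Gamma_p^{-1}\Xi\Gamma_p^{-1})$ for $\Xi$ defined in Proposition 2(i) if

$B(\alpha, T) \equiv \alpha_T(\alpha) - \alpha$ is continuously differentiable in $\alpha$ and the derivative matrix of $\alpha_T(\alpha)$ in

$\alpha$, $H$, is invertible for all $\alpha$ satisfying the stationarity condition given in Assumption

2.(i) uniformly in $T$ as $S, T \to \infty$ jointly.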

Remark 3.2. The conditions stated in (ii) and (iii) impose that the derivatives of αT (α)

are well-behaved as T → ∞. The additional condition in (iii) guarantees that the inverse

of $\alpha_T(\alpha)$ exists asymptotically. The additional conditions in (ii) and (iii) seem to be

reasonable and are satisfied, for example, in the fixed effects model. As above, even if all the

13Note this condition is analogous to the usual full rank condition on the derivative of the scores in GMM

estimation.

11

assumptions are satisfied, sampling variation in α may result in there not being a solution

to αT (α) = α that satisfies the stationarity assumption, Assumption 2.(i), in finite samples.

Remark 3.3. Conclusion (ii) demonstrates that $\hat\alpha^{(1)}$ removes the bias from the asymptotic distribution of $\hat\alpha$ as long as S grows more slowly than $T^3$. $S/T^3 \to 0$ may be a good approximation in many situations, such as the CPS data examined in the simulation section. Conclusion (iii) verifies that iterating the bias-correction to convergence removes the bias from the limiting distribution as S and T grow large for any S, T sequence.
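As a rough check on the rate condition in Conclusion (ii), the panel dimensions used in the simulation section (S = 51 with T equal to 23, 12, or 6) give the ratios below. This is only arithmetic on those dimensions; it does not establish how small the ratio must be for the approximation to work well.

$$\frac{S}{T} = \frac{51}{23} \approx 2.22, \qquad \frac{S}{T^3} = \frac{51}{23^3} = \frac{51}{12167} \approx 0.0042, \qquad \frac{51}{12^3} = \frac{51}{1728} \approx 0.030, \qquad \frac{51}{6^3} = \frac{51}{216} \approx 0.236.$$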

Conclusions (i) and (ii) of Proposition 2 mirror similar results from Hahn and Kuersteiner

(2002) and Hahn and Newey (2004), demonstrating that bias remains in the limiting dis-

tribution of α even when T grows as fast as S, but that this bias is removed if the one-step

bias correction is used as long as T grows fast enough relative to S. Hahn and Newey (2004)

also suggest iterating their bias correction for nonlinear fixed effects models, though they

find that it does not lead to any improvement asymptotically in a general non-linear model.

It is important to note that Proposition 2 ignores the estimation of time effects, which would further complicate the analysis. The time effects will be $\sqrt{S}$-, not $\sqrt{ST}$-, consistent, which will generally add an $O(1/S)$ bias to the estimator. This will not affect the stated results as long as $S/T \to \infty$. However, if S and T grow at the same rate, the inclusion of the

time effects will result in bias in the limiting distribution of all the estimators, including the

bias-corrected ones. In many applications, this should not be a large source of bias as S is

typically larger than T . In addition, the neglected term is a sum over groups not over time,

so the additional bias will include only contemporaneous correlations. Overall, it seems

that any bias coming from the time fixed effect is likely to be small, and the simulations

in the next section also suggest that ignoring this source of bias provides a reasonable

approximation.

3.2. Implications for FGLS Estimation. While the bias-correction results presented

above may be interesting for a number of reasons, the chief motivation for their development

in this paper is for use in FGLS estimation as outlined in Section 2. In asymptotics where

S → ∞ with T fixed, the need for using a corrected estimator of α when performing FGLS

estimation is clear as the uncorrected estimator is not consistent for α and so the FGLS

estimator which is produced when the uncorrected estimator is used will not converge to

the GLS estimator. Under asymptotics where S and T are both large, the need for the

bias correction is less obvious as even the uncorrected estimator is consistent in this setting.

However, intuition would suggest that using an improved estimator of α when forming

the FGLS estimator would produce an estimator of β with better finite sample properties,

which is verified in Proposition 3 below. In addition, since an estimate of α appears in the

estimator of the variance of the FGLS estimator of β, it is reasonable to believe that using

a bias-corrected estimate of α would reduce the bias in the estimate of the variance of the

FGLS estimator of β. This intuition is also confirmed below.

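To develop the properties of the GLS estimator of β1, define $\tilde X_s = \Gamma(\alpha)^{-1/2}X_s$ and $\tilde Z_s = \Gamma(\alpha)^{-1/2}Z_s$ where $\Gamma(\alpha) = E[V_sV_s']$. Then, under standard conditions and using conventional arguments, it follows that the GLS estimator of β1, $\hat\beta_1(\alpha)$, is consistent and that

$$\sqrt{S}(\hat\beta_1(\alpha) - \beta_1) \xrightarrow{d} N(0, A(\alpha)^{-1}), \qquad A(\alpha) = E[\tilde X_s'\tilde X_s - \tilde X_s'\tilde Z_s(\tilde Z_s'\tilde Z_s)^{-1}\tilde Z_s'\tilde X_s],$$

as S → ∞ with T fixed and that $\hat\beta_1(\alpha)$ is consistent and that

$$\sqrt{ST}(\hat\beta_1(\alpha) - \beta_1) \xrightarrow{d} N(0, A(\alpha)^{-1}), \qquad A(\alpha) = \lim_{T\to\infty} E[\tilde X_s'\tilde X_s - \tilde X_s'\tilde Z_s(\tilde Z_s'\tilde Z_s)^{-1}\tilde Z_s'\tilde X_s],$$

as S, T → ∞. Furthermore, the GLS estimator is the Gauss-Markov estimator and hence is efficient among linear estimators.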
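The GLS computation can be sketched for a deliberately simple case. The code below assumes an AR(1) error, a single regressor, and a state fixed effect as the only Z variable; it uses the Prais-Winsten transformation as a stand-in for multiplying by Γ(α)^{-1/2} (the two agree up to a scale factor under a stationary AR(1)). All function names are hypothetical.

```python
import math


def prais_winsten(series, alpha):
    """Whiten one state's series under a stationary AR(1) with coefficient alpha."""
    first = math.sqrt(1.0 - alpha ** 2) * series[0]
    rest = [series[t] - alpha * series[t - 1] for t in range(1, len(series))]
    return [first] + rest


def gls_beta(y_by_state, x_by_state, alpha):
    """GLS slope on x with state fixed effects, given the AR(1) coefficient alpha.

    Returns (beta, a) where a = sum_s x~' M_z~ x~, the sample analogue of A(alpha)
    before dividing by S (or ST).
    """
    num = 0.0
    den = 0.0
    for y, x in zip(y_by_state, x_by_state):
        yt = prais_winsten(y, alpha)
        xt = prais_winsten(x, alpha)
        zt = prais_winsten([1.0] * len(y), alpha)  # transformed state intercept
        zz = sum(z * z for z in zt)
        # Partial the transformed intercept out of the transformed x and y.
        bx = sum(z * v for z, v in zip(zt, xt)) / zz
        by = sum(z * v for z, v in zip(zt, yt)) / zz
        xr = [v - bx * z for v, z in zip(xt, zt)]
        yr = [v - by * z for v, z in zip(yt, zt)]
        num += sum(a * b for a, b in zip(xr, yr))
        den += sum(a * a for a in xr)
    return num / den, den
```

Replacing `alpha` by an estimate gives the corresponding FGLS estimator; with α = 0 the transformation leaves the data unchanged and the function reduces to the usual within estimator.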

Then letting $\hat\beta_1(\hat\alpha)$ denote the FGLS estimate of β1 when the covariance matrix is constructed using $\hat\alpha$ and using standard arguments, it is straightforward to show that $\sqrt{S}(\hat\beta_1(\hat\alpha^{(\infty)}) - \hat\beta_1(\alpha)) = o_p(1)$ in asymptotics where S → ∞ and that $\sqrt{ST}(\hat\beta_1(\hat\alpha) - \hat\beta_1(\alpha)) = o_p(1)$ for any $\hat\alpha \xrightarrow{p} \alpha$ in asymptotics where S, T → ∞. This result indicates that, as S → ∞ with T fixed, FGLS based on the iteratively bias-corrected estimator of α will yield efficiency gains relative to the other estimators of α considered above, but that there is no efficiency gain from using a bias-corrected estimate of α in asymptotics with S, T → ∞.

While these results suggest that there is little motivation to use bias-corrected estimates of α in performing FGLS estimation and inference in circumstances where S, T → ∞ asymptotics provide a good approximation, there are at least two reasons that using the bias-correction may be preferable. First, the estimator of the covariance matrix of the FGLS estimator will be biased to the same order as the estimator of α. To see this, note that

$$\mathrm{vec}\big(\sqrt{ST}(\hat A(\hat\alpha) - A(\alpha))\big) = \mathrm{vec}\Big(\sum_{i=1}^{p} \frac{\partial \hat A(\tilde\alpha)}{\partial \alpha_i}\sqrt{ST}(\hat\alpha_i - \alpha_i) + \sqrt{ST}(\hat A(\alpha) - A(\alpha))\Big)$$

where $\tilde\alpha$ is an intermediate value between $\hat\alpha$ and α. It then follows that, under regularity conditions, $\hat A(\hat\alpha)$ is biased to the same order as $\hat\alpha$.

Second, it seems likely that there would be higher-order improvements to the FGLS

estimator of β when bias-corrected estimates of α are used. In particular, that bias-corrected

estimates of α will tend to be closer to α than estimates without correction suggests that

there will be higher-order efficiency gains to using the bias-corrected estimates. This higher-

order efficiency gain is confirmed in Proposition 3 which provides the higher-order bias and

variance of the estimator. Further evidence on the extent of the efficiency gain is provided

in the simulation results reported in Section 4.

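Assumption 1 (Higher-Order Asymptotics). Suppose the data are generated by model (12) and

HO1. $v_{st} = v_{st}^{-\prime}\alpha + \eta_{st}$, where $\eta_{st}$ are iid $N(0, \sigma_\eta^2)$ random variables, and $E[V_sV_s'] = \Gamma(\alpha)$ is a T × T positive definite matrix with minimum eigenvalue bounded away from 0 and maximum eigenvalue bounded away from infinity uniformly in T.

HO2. (i) $X_s$, $Z_s$ are nonstochastic with $Z_s$ identical across s. (ii) [X, Z], where $X = [X_1', \ldots, X_S']'$ and $Z = \mathrm{diag}(Z_1, \ldots, Z_S)$, has full rank, and the minimum eigenvalue of $X'X - X'Z(Z'Z)^{-1}Z'X$ is bounded away from zero. (iii) $|x_{sth}| \le \Delta$ and $|z_{sth}| \le \Delta$.

HO3. Let $M_z(\alpha) = (I \otimes \Gamma(\alpha)^{-1}) - (I \otimes \Gamma(\alpha)^{-1})Z(Z'(I \otimes \Gamma(\alpha)^{-1})Z)^{-1}Z'(I \otimes \Gamma(\alpha)^{-1})$, $M = I - X(X'M_z(\alpha)X)^{-1}X'M_z(\alpha)$, $A(\alpha) = \frac{1}{ST}X'M_z(\alpha)X$, and $\Psi(\alpha) = \frac{1}{\sqrt{ST}}X'M_z(\alpha)MV$. For i = 1, ..., p, j = 1, ..., p, and k = 1, ..., p, (i) each matrix in $A(\alpha)$, $A_i(\alpha) = \frac{\partial A}{\partial\tilde\alpha_i}\big|_{\alpha}$, and $A_{ij} = \frac{\partial^2 A}{\partial\tilde\alpha_i\partial\tilde\alpha_j}\big|_{\alpha}$ approaches a limit as S, T → ∞, and $\lim A(\alpha)$ is nonsingular, (ii) the covariance matrices for the vectors in $\Psi_i(\alpha) = \frac{\partial\Psi}{\partial\tilde\alpha_i}\big|_{\alpha}$ and $\Psi_{ij}(\alpha) = \frac{\partial^2\Psi}{\partial\tilde\alpha_i\partial\tilde\alpha_j}\big|_{\alpha}$ approach limits as S, T → ∞, and (iii) all the matrices in $A(\tilde\alpha)$, $A_i(\tilde\alpha)$, $A_{ij}(\tilde\alpha)$, $A_{ijk}(\tilde\alpha) = \frac{\partial^3 A}{\partial\tilde\alpha_i\partial\tilde\alpha_j\partial\tilde\alpha_k}\big|_{\tilde\alpha}$, $\Psi(\tilde\alpha)$, $\Psi_i(\tilde\alpha)$, $\Psi_{ij}(\tilde\alpha)$, and $\Psi_{ijk}(\tilde\alpha) = \frac{\partial^3\Psi}{\partial\tilde\alpha_i\partial\tilde\alpha_j\partial\tilde\alpha_k}\big|_{\tilde\alpha}$ are bounded in probability uniformly for $\tilde\alpha \in \mathcal{A}$ where $\mathcal{A}$ is a neighborhood of α.

HO4. B(α, T) is three times continuously differentiable in α with the first three derivatives bounded uniformly in T for all α satisfying the stationarity condition given in NT1.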

Remark 3.4. While weaker conditions are certainly available, the conditions imposed in

Assumption 1 are quite similar to those in Rothenberg (1984), with the exception of HO4,

and are sufficient for deriving the higher-order bias and variance of the FGLS estimator of

β based on bias-corrected and uncorrected estimates of α. HO4 is necessary for establishing

the properties of the iteratively bias-corrected estimator of α and basically requires that

the first three derivatives of αT (α) are well-behaved.

Remark 3.5. Under additional regularity conditions, including continuous distributions,

the higher-order bias and variance derived below will correspond to that of an Edgeworth

approximation to the distribution of the FGLS estimator. In addition, even when the data

are discrete so the Edgeworth approximation is not valid, the results may still be used for

higher-order efficiency comparisons as in Pfanzagl and Wefelmeyer (1978).

The conditions in Assumption 1 are sufficient to establish the following result.

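Proposition 3. Let $\hat\alpha$ be the least squares estimator of α, $\hat\alpha^{(\infty)}$ be the iteratively bias-corrected estimator, and $\hat\beta(\hat\alpha)$ and $\hat\beta(\hat\alpha^{(\infty)})$ be the corresponding FGLS estimators. If Assumption 1 is satisfied and $S/T \to \rho$ as S, T → ∞ jointly, the higher-order bias and variance of $\hat\beta(\hat\alpha)$ and $\hat\beta(\hat\alpha^{(\infty)})$ are

$$\mathrm{Bias}(\hat\beta(\hat\alpha)) = \mathrm{Bias}(\hat\beta(\hat\alpha^{(\infty)})) = 0 \qquad (14)$$

$$\mathrm{Var}(\sqrt{ST}(\hat\beta(\hat\alpha) - \beta)) = A^{-1} + \Upsilon/ST + \frac{1}{ST}\sum_{i=1}^{p}\sum_{j=1}^{p}\zeta_{ij}\frac{S}{T}B(\alpha, T)_iB(\alpha, T)_j + O(1/ST^{2}) \qquad (15)$$

$$\mathrm{Var}(\sqrt{ST}(\hat\beta(\hat\alpha^{(\infty)}) - \beta)) = A^{-1} + \Upsilon/ST + O(1/ST^{2}) \qquad (16)$$

for $\zeta_{ij} = E[\Psi_i\Psi_j']$ and $\Upsilon = \sum_{i=1}^{p}\sum_{j=1}^{p}\zeta_{ij}\big(\frac{ST}{S(T-p)}\Gamma_p^{-1}\big)_{[i,j]}$ where $\big(\frac{ST}{S(T-p)}\Gamma_p^{-1}\big)_{[i,j]}$ is the [i, j] element of matrix $\frac{ST}{S(T-p)}\Gamma_p^{-1}$. Also,

$$\mathrm{Var}(\sqrt{ST}(\hat\beta(\hat\alpha) - \beta)) - \mathrm{Var}(\sqrt{ST}(\hat\beta(\hat\alpha^{(\infty)}) - \beta)) = \frac{1}{ST}\sum_{i=1}^{p}\sum_{j=1}^{p}\zeta_{ij}\frac{S}{T}B(\alpha, T)_iB(\alpha, T)_j \ge 0.$$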

Remark 3.6. Proposition 3 presents the higher-order bias and variance of the FGLS estima-

tors based on bias-corrected and uncorrected estimates of the serial correlation parameters,

α. As would be expected, the bias of the FGLS estimator does not depend on whether

the bias-corrected estimator of α is used. The variance, on the other hand, depends on the

mean squared error of the asymptotic distribution of the estimator of α, implying that the

use of a bias-corrected estimator of α results in a higher-order efficiency gain. The presence

of the fixed effects also results in the presence of an $O(1/ST^{2})$ term in the variance of the FGLS estimator which is a complicated function of Z and Ω; however, the term is the same regardless of whether $\hat\alpha$ or $\hat\alpha^{(\infty)}$ is used and so cancels out of the efficiency comparison.
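For intuition about the size of the efficiency gain, the variance difference in Proposition 3 can be simplified by cancelling S. The p = 1 line is a special case written out here for illustration; it is not a separate result.

$$\frac{1}{ST}\sum_{i=1}^{p}\sum_{j=1}^{p}\zeta_{ij}\frac{S}{T}B(\alpha, T)_iB(\alpha, T)_j = \frac{1}{T^{2}}\sum_{i=1}^{p}\sum_{j=1}^{p}\zeta_{ij}B(\alpha, T)_iB(\alpha, T)_j, \qquad p = 1: \; \frac{\zeta_{11}B(\alpha, T)_1^{2}}{T^{2}}.$$

With p = 1 the nonnegativity is immediate, since $\zeta_{11} = E[\Psi_1\Psi_1']$ is positive semidefinite and it is multiplied by a squared scalar.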

Finally, it should be observed that the validity of the results presented here requires strin-

gent assumptions about the exact nature of the error process. In practice, a researcher may

be concerned that these assumptions are not satisfied. For example, one may suspect that

there is temporal heteroskedasticity or that the AR process is not constant across groups.

In these cases, the FGLS estimates obtained assuming homoskedasticity and constant AR

coefficients will still generally be consistent and asymptotically normal and may still offer

efficiency gains over OLS, although use of a robust variance matrix will be necessary for

correct inference.14 This approach is examined in the simulation section.
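A robust variance matrix of the kind mentioned here can be sketched for a single coefficient. The code assumes its inputs are the GLS-transformed regressor with the fixed effects already partialled out and the corresponding FGLS residuals, both grouped by state; it applies no small-sample adjustment. The function name is hypothetical.

```python
def cluster_robust_variance(x_by_state, resid_by_state):
    """Sandwich variance for one coefficient, clustering on state.

    x_by_state: transformed, partialled-out regressor values, one list per state.
    resid_by_state: matching residuals, one list per state.
    """
    bread = sum(sum(x * x for x in xs) for xs in x_by_state)
    meat = 0.0
    for xs, es in zip(x_by_state, resid_by_state):
        score = sum(x * e for x, e in zip(xs, es))  # within-state sum of x * residual
        meat += score ** 2
    return meat / bread ** 2


v = cluster_robust_variance([[1.0, 0.0], [0.0, 2.0]], [[1.0, 1.0], [0.5, -0.5]])
print(v)
```

Because the residual products are summed within each state before squaring, arbitrary correlation and heteroskedasticity within a state are allowed, while states are treated as independent.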

3.3. Bias Correction when $C_{st}$ is Estimated. An additional complication will arise if estimation of $C_{st}$ is considered. In this case, the error in equation (6) is $v_{st} + \hat C_{st} - C_{st}$, not $v_{st}$, and implementation of the FGLS estimator will require estimation of $E[(V_s + \hat C_s - C_s)(V_s + \hat C_s - C_s)'] = \sigma_v^2\Gamma(\alpha) + E[(\hat C_s - C_s)(\hat C_s - C_s)']$. The presence of $\hat C_{st} - C_{st}$ adds an additional $O(1/N)$ bias to the OLS estimates of α, where for simplicity I have assumed $N_{st} = N$ for all s and t. For moderate or large N, this bias will likely be a small concern, and ignoring it may be preferable.15 However, if N is small, the researcher may wish to account for this bias.

14The use of FGLS with a robust variance matrix has also been suggested by Wooldridge (2003) and Liang and Zeger (1986).

15For a discussion in a different but related context, see Dickens (1990).

To this end, note that if the groups are distinct then the only correlation between $\hat C_{st}$ across states comes from the fact that β1 is estimated rather than known, and the covariance between $\hat C_{st}$ and $\hat C_{s't'}$ will be zero if a separate estimate of β1 is calculated within each group. Let $\hat C_s$ denote the vector of time effects for state s obtained in this manner, and define $V_1(\hat C)$ as a p × p matrix, $V_2(\hat C)$ as a p × 1 vector, and $V(\hat C) = [V_2(\hat C) \; V_1(\hat C)]$ as a p × (p + 1) matrix with [i, j] element

$$\begin{aligned} V_{1[i,j]}(\hat C) = {} & \frac{1}{S(T-p)}\sum_{s=1}^{S}\sum_{t=p+1}^{T}\widehat{\mathrm{Cov}}(\hat C_{s(t-i)}, \hat C_{s(t-(j-1))}) \\ & - \frac{1}{T-p}\mathrm{trace}\big([Z_s'V_{s,-i}(\hat C)Z_{s,-(j-1)}(Z_s'Z_s)^{-1}]\big) \\ & - \frac{1}{T-p}\mathrm{trace}\big([Z_s'V_{s,-(j-1)}(\hat C)Z_{s,-i}(Z_s'Z_s)^{-1}]\big) \\ & + \frac{1}{T-p}\mathrm{trace}\big([Z_s'V_s(\hat C)Z_s(Z_s'Z_s)^{-1}Z_{s,-i}'Z_{s,-(j-1)}(Z_s'Z_s)^{-1}]\big) \end{aligned}$$

where $\widehat{\mathrm{Cov}}(\hat C_{st}, \hat C_{s't'})$ estimates the covariance between $\hat C_{st}$ and $\hat C_{s't'}$, $V_s(\hat C)$ estimates $E[(\hat C_s - C_s)(\hat C_s - C_s)']$, $V_{s,-k}(\hat C)$ estimates $E[(\hat C_s - C_s)(\hat C_{s,-k} - C_{s,-k})']$, and $C_{s,-k} = (C_{s(p+1-k)}, C_{s(p+2-k)}, \ldots, C_{s(T-k)})'$. It is then straightforward to demonstrate that

$$\hat\alpha = \Big(\frac{1}{S(T-p)}\sum_{s=1}^{S}\sum_{t=p+1}^{T}\hat v_{st}^{-}\hat v_{st}^{-\prime} - V_1(\hat C)\Big)^{-1}\Big(\frac{1}{S(T-p)}\sum_{s=1}^{S}\sum_{t=p+1}^{T}\hat v_{st}^{-}\hat v_{st} - V_2(\hat C)\Big) \xrightarrow{p} \alpha_T(\alpha)$$

as S → ∞ with T fixed as long as a law of large numbers applies and $\widehat{\mathrm{Cov}}(\hat C_{st}, \hat C_{s't'})$ is an unbiased estimate of the covariance between $\hat C_{st}$ and $\hat C_{s't'}$. The use of a consistent estimate instead of an unbiased estimate will remove the $O(1/N)$ bias.

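As opposed to the case where estimation of $C_{st}$ is ignored, $\sigma_v^2$ also needs to be estimated in this case before FGLS estimation may be performed as it will no longer cancel out from the numerator and denominator of the estimator. A natural estimator of $\sigma_v^2$ may be found by noting that

$$\tilde\sigma_v^2 = \frac{1}{ST}\sum_{s=1}^{S}\sum_{t=1}^{T}\hat v_{st}'\hat v_{st} - \frac{1}{ST}\sum_{s=1}^{S}\sum_{t=1}^{T}\hat E(\hat C_{st} - C_{st})^2 + \frac{1}{T}\mathrm{trace}\big(Z_s'V_s(\hat C)Z_s(Z_s'Z_s)^{-1}\big) \xrightarrow{p} \sigma_v^2 - \frac{\sigma_v^2}{T}\mathrm{trace}\big(Z_s'\Gamma(\alpha)Z_s(Z_s'Z_s)^{-1}\big)$$

from which a consistent estimator $\hat\sigma_v^2 = \tilde\sigma_v^2/\big(1 - \frac{1}{T}\mathrm{trace}(Z_s'\Gamma(\hat\alpha)Z_s(Z_s'Z_s)^{-1})\big)$ may easily be recovered.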

4. Monte Carlo Evidence

In order to provide evidence on the performance of the proposed methods, I performed a Monte Carlo experiment using data drawn from the CPS-MORG. The data are for women in their fourth interview month for the years 1979 to 2001, and the sample is restricted to women aged 25 to 50 who report positive weekly earnings.16 With these restrictions imposed, the total sample size is 600,941 observations in 1173 state-year cells, which generates an average cell size of approximately 512 observations. The dependent variable, $y_{ist}$, is defined as the log of the weekly wage, and covariates include a quartic in age, four education dummies, and state and time fixed effects. Iteratively bias-corrected AR(4) parameter estimates (standard errors) in the actual data are α1 = 0.397 (0.032), α2 = 0.268 (0.034), α3 = 0.146 (0.034), and α4 = 0.058 (0.032), where the last coefficient is not significant at the 5% level.

16These are the same sample selection criteria as used in Bertrand, Duflo, and Mullainathan (2004), though the Bertrand, Duflo, and Mullainathan (2004) study only had data from 1979 to 1999. In addition, the data are aggregated to state-year cells using different methods in their paper. However, the OLS and clustered results I report are similar to those in Bertrand, Duflo, and Mullainathan (2004).

I consider two different simulation designs. In Design 1, I draw from the actual data by

resampling states and include a randomly generated treatment which varies at the state-

year level, xst, as a regressor which enters the model with β1 = 0. In Design 2, I aggregate

the data to state-year cells by estimating equation (1) and saving the estimated fixed effects.

I then regress the estimated fixed effects on all regressors which are constant within state-

year cells and treat these coefficient estimates as the “true” parameters. Then for each

simulation iteration, I construct Cst from model (2) using the parameters estimated in the

previous step and a randomly generated treatment, xst which enters with coefficient β1 = 0.

I assume vst follows an AR(1) with α1 = 0.8, and I construct the error term so that its

variance is similar to the empirical variance in the sample. All estimates are constructed

ignoring estimation of Cst. In both Design 1 and Design 2, I generate the treatment by

randomly selecting 26 states to be treated and then randomly selecting a start date for

the treatment, which may be any but the first period. The treatment variable is a dummy

variable which equals one in the treatment year and all years following, and I allow the

treatment date to be different in each treated state. In most of the simulations, I use

S = 51 and I consider three different values for T : 6, 12, and 23. When simulating from

the actual data with T = 12 (T = 6), I use the most recent 12 (6) years of data. In the

simulated data, the time blocks are drawn randomly.
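The placebo treatment described above can be generated along the following lines. This is a minimal sketch: the function name and the 0-based period indexing are choices made here for illustration, and the defaults simply mirror S = 51, T = 23, and 26 treated states.

```python
import random


def placebo_treatment(n_states=51, n_periods=23, n_treated=26, rng=None):
    """Return one 0/1 treatment path per state.

    Treated states switch on at a random start period (never the first period)
    and stay on through the end of the sample; untreated states are all zeros.
    """
    rng = rng or random.Random()
    treated = set(rng.sample(range(n_states), n_treated))
    paths = []
    for s in range(n_states):
        if s in treated:
            start = rng.randrange(1, n_periods)  # index 0 is the first period, so skip it
            paths.append([1 if t >= start else 0 for t in range(n_periods)])
        else:
            paths.append([0] * n_periods)
    return paths


x = placebo_treatment(rng=random.Random(0))
print(sum(1 for path in x if any(path)))  # number of treated states
```

Because the treatment is drawn independently of the outcome and enters with β1 = 0, the rejection frequency of a test of β1 = 0 across simulation draws estimates the size of that test.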

Design 1 resamples from actual data in CPS and so should be similar to data that re-

searchers actually employ. In this case, I do not constrain the errors to follow any particular

process but, by resampling states, maintain the within state correlation structure found in

the data.17 Given this and the good performance of the AR models in the simulation sec-

tion, it appears that an AR model provides a reasonable approximation to the underlying

error process in the data. Data drawn using Design 1 are used in Tables 3 and 4.

Design 2 is constructed to represent conventional panel data with AR error processes.

In this sense, it corresponds to the ideal situation for employing the correction proposed in

Section 3. Design 2 makes use of an AR(1) for the error process, and I estimate an AR(1)

and AR(2) allowing examination of the properties of the FGLS estimator under correct

specification and when the model is slightly overfit. Design 2 is used in Tables 1, 2, and 5.18

4.1. Bias of AR(p) Parameter Estimates. Before turning to inference on the treatment

effect, it is useful to consider the bias of uncorrected and bias-corrected estimates of the

AR parameters. Table 1 contains the bias and MSE (in parentheses) when the model is specified as (2) and the model is simulated using Design 2. In the columns corresponding to $\hat\alpha^{(\infty)}$, the number in brackets represents the number of instances in which $\alpha_T^{-1}(\hat\alpha)$ failed to converge in the unit interval. In these cases, $\hat\alpha^{(\infty)}$ was set equal to $\hat\alpha^{(1)}$.

17By independently resampling states, I impose no cross-state correlation.

18I also considered a design similar to Design 2 in which the error process followed an AR(2). The results were broadly consistent with those from Designs 1 and 2 and are available upon request.

[Insert Table 1 about here]

The results in Table 1 clearly demonstrate that the uncorrected estimates suffer from

substantial bias, even when T is reasonably large. In addition, the results show that both

the one-step and iterative bias-corrections eliminate a large portion of the bias in all cases

considered, though a sizable bias remains in the one-step estimator for small T . The results

also illustrate the consistency of the iterative bias-correction as S → ∞ with T fixed, though

the bias goes away much more slowly in S than in T . Overall, the results suggest that both

of the derived bias-corrections are effective in removing a large component of the bias in

the AR parameter estimates with the iterative procedure dominating in terms of both bias

and MSE.
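The downward bias of the uncorrected estimator is easy to reproduce in a stripped-down simulation. The sketch below assumes a pure AR(1) error with α1 = 0.8, no regressors, and only a state fixed effect, which is simpler than Design 2; the function names and the choice S = 2000, T = 6 are illustrative.

```python
import math
import random


def simulate_ar1_panel(S, T, alpha, sigma_eta=1.0, rng=None):
    """Draw S independent stationary AR(1) series of length T."""
    rng = rng or random.Random()
    sd0 = sigma_eta / math.sqrt(1.0 - alpha ** 2)  # stationary standard deviation
    panel = []
    for _ in range(S):
        v = [rng.gauss(0.0, sd0)]
        for _ in range(1, T):
            v.append(alpha * v[-1] + rng.gauss(0.0, sigma_eta))
        panel.append(v)
    return panel


def within_ar1_estimate(panel):
    """Uncorrected AR(1) estimate from residuals demeaned state by state."""
    num = den = 0.0
    for v in panel:
        mean = sum(v) / len(v)
        r = [u - mean for u in v]  # removes the state fixed effect
        num += sum(r[t - 1] * r[t] for t in range(1, len(r)))
        den += sum(r[t - 1] ** 2 for t in range(1, len(r)))
    return num / den


panel = simulate_ar1_panel(2000, 6, 0.8, rng=random.Random(1))
print(within_ar1_estimate(panel))  # far below the true value of 0.8
```

Increasing S in this sketch tightens the estimate around its biased limit rather than around 0.8, which is the inconsistency under fixed T that the bias correction is designed to remove.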

4.2. Inference on the Treatment Effect. Results for inference about the treatment

parameter are contained in Tables 2 to 5. In each table, the first three columns use the

full sample of 51 states and 23 years, while the middle three columns use 12 years of data

and the last three use only 6 years of data. Rows labeled OLS contain test results from the

OLS estimates without any adjustment to the standard errors, and rows labeled cluster use

variance matrices that are robust to correlation within groups at the specified levels. For example,

a row labeled “Cluster by State” uses a variance matrix which is robust to arbitrary

correlation among all observations within a state. The use of robust variance matrices in

the individual level data allowing for correlation at the level of the aggregate data (the

state-year level in this case) is probably the most commonly used method for accounting for

possible correlations arising due to the use of aggregate and individual level data. Bertrand,

Duflo, and Mullainathan (2004) suggest using this correction, clustering at the state level

instead of the state-year level, and find that this procedure yields tests with approximately

correct size in their simulation study. The row labeled random effects reports results from

the standard random effects estimator allowing for correlation at the state-year level, and

rows labeled “FGLS-U” use the FGLS approach suggested in Kiefer (1980) which does not

constrain the variance matrix over time within states but assumes the variance matrix is

identical across states. The remaining rows contain test results based on FGLS where the

state-year shock is assumed to follow the specified process; the “bc” subscript indicates the

use of the iteratively bias-corrected AR parameter estimates in the FGLS estimation and

inference. The rows designated “AR(p)-Cluster by state” estimate the model using FGLS

based on an AR(p) process and then use a robust variance matrix clustered at the state level

for inference, while the rows labeled “AR(p)” use the standard GLS formula to estimate the

variance matrix. Within each table, I report results from conventional inference methods

in Panel A and results which use the bias-correction procedure developed in this paper in

Panel B.

[Insert Table 2 about here]

Table 2 contains results regarding the variance of the estimated treatment parameter, $\hat\beta_1$. The columns labeled $\hat\sigma^2$ report the mean of the estimated variance of $\hat\beta_1$, while the corresponding asymptotic variance and variance of the simulation estimates of β1 are contained in the columns labeled $\sigma_a^2$ and $\sigma_s^2$, respectively. Simulation results are for data generated using Design 2 described above. For readability, all results are multiplied by 1000.

The results summarized in Table 2 provide strong evidence supporting the use of FGLS

estimation with bias-corrected estimators of the AR-parameters. While the difference be-

tween the asymptotic variance and the mean of the estimated variance is small for all of

the estimators considered with the exception of “FGLS-U” and the unadjusted OLS esti-

mator, the variances estimated from FGLS with bias-corrected AR-parameters are always

approximately unbiased for the asymptotic variance of the estimator. Unsurprisingly, it

appears that the asymptotic approximation of the FGLS estimator performs substantially

better when the FGLS is based on the bias-corrected AR parameters than when the uncor-

rected estimates are used. The results also clearly indicate the efficiency gain due to using

bias-corrected estimates of the AR coefficients when forming the FGLS estimates. With

T = 6, the variance of the FGLS estimator based on uncorrected AR parameter estimates

is 1.3 times as large as the variance of the FGLS estimator which uses bias-corrected AR

coefficient estimates; and even with T as large as 23, the variance of the FGLS estimator

which uses uncorrected AR parameter estimates remains 1.08 times as large as the variance

of the FGLS estimator based on bias-corrected coefficients.
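To put these variance ratios on the scale of standard errors, one can take square roots. This is a back-of-the-envelope conversion of the two ratios quoted above, not an additional figure from Table 2.

$$\sqrt{1.3} \approx 1.14, \qquad \sqrt{1.08} \approx 1.04,$$

so the standard deviation of the FGLS estimator based on uncorrected AR estimates is roughly 14% larger with T = 6 and roughly 4% larger with T = 23.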




Tables 3 to 5, which contain results for size and power of hypothesis tests about the

treatment parameter as well as confidence interval lengths, provide further evidence on

the potential gains to using FGLS with bias-corrected AR parameters. In all cases, size

and power are for 5% level tests, and power is versus the alternative that β1 = 0.02.19

The reported interval length is the confidence interval length divided by two. Reference

distributions used in obtaining critical values for the tests and confidence intervals vary

depending on the design and are given in each table’s caption.
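The size, power, and interval-length summaries can be computed from the simulation output along the following lines. The sketch assumes a standard normal critical value of 1.96 purely for illustration (the reference distributions actually used vary by design), and the function names are hypothetical.

```python
def rejection_rate(estimates, std_errors, null_value=0.0, critical_value=1.96):
    """Share of simulation draws in which H0: beta1 = null_value is rejected."""
    rejections = sum(
        1
        for b, se in zip(estimates, std_errors)
        if abs((b - null_value) / se) > critical_value
    )
    return rejections / len(estimates)


def half_length(std_error, critical_value=1.96):
    """Confidence interval length divided by two."""
    return critical_value * std_error


draws = [0.0, 0.03, -0.01, 0.05]
ses = [0.01, 0.01, 0.01, 0.01]
print(rejection_rate(draws, ses), half_length(0.01))
```

Applied to draws generated with β1 = 0 and `null_value=0.0`, the first function returns the empirical size; since it reports a raw rejection frequency, comparisons of power are most informative across tests with similar size.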

Tables 3 and 4 summarize the results for the simulation based on Design 1 outlined above.

Table 3 reports results from estimation in the individual level data, while Table 4 contains

the results from estimation in data aggregated using the aggregation method of Amemiya

(1978) outlined in Section 2.2 and ignoring the first stage estimation of Cst.

19The dependent variable is the log of the weekly wage, so an impact of .02 represents an approximate

2% increase in weekly wages. This is the magnitude of the effect considered in Bertrand, Duflo, and

Mullainathan (2004). The reported power is also the raw rejection frequency, not size-adjusted power,

so the most meaningful comparisons are across tests with similar sizes.


The results in Table 3 clearly illustrate the potential pitfalls in using individual level

data with aggregate level variables. As expected, the uncorrected OLS estimates have large

size distortions for moderate T , though the size distortion is modest when T = 6. The

rejection rates for a 5% level test are 0.594 with T = 23, 0.398 with T = 12, and 0.072

with T = 6. Mirroring results from Bertrand, Duflo, and Mullainathan (2004), I also find

that, for T = 12 and T = 23, tests which allow for correlation within state-year cells but

not over time suffer from severe size distortions, but that tests based on OLS with standard

errors clustered at the state level remove much of the distortion, rejecting 7.8% of the

time for a 5% level test in both cases. For T = 23, the tests based on parametric FGLS

with bias-corrected coefficients also remove much of the size distortion, producing similar

rejection rates to tests based on OLS with clustered standard errors. For T = 12, the FGLS

estimates remain more distorted than the OLS-based test using robust standard errors,

though the robust FGLS tests have similar size to the robust OLS-based test. As would be

anticipated, all the FGLS-based tests, including those which use robust standard errors, do

have substantially more power against an alternative of .02 than the test using OLS and

clustering standard errors at the state level. In addition, the confidence intervals of OLS

with standard errors clustered by state are substantially longer than the FGLS intervals.

It is interesting that, with T = 6, serial correlation does not appear to play much of a role.

In this case, none of the size distortions are large, and the test using the random effects

estimator has correct size and good power relative to the other tests.

The results summarized in Table 4 follow a similar pattern to those in Table 3, though

in most cases the size distortions are smaller. In general, tests based on OLS with clustered

standard errors, tests based on bias-corrected FGLS, and tests based on FGLS with robust

standard errors have similar size. However, the FGLS tests are more powerful against

the alternative that β1 = .02 and have shorter confidence intervals. In many cases, tests

based on FGLS with bias-corrected AR parameters and robust standard errors are more

size distorted than the corresponding tests without robust standard errors. This distortion

seems likely to be due to the small sample bias of the robust standard errors discussed in Bell

and McCaffrey (2002) and illustrated in Table 2. Also, as in Table 3, serial correlation does

not seem to pose a serious problem to inference with T = 6. In this case, the unadjusted

OLS has correct size as does the OLS test which uses clustered standard errors. Finally, it

is interesting that FGLS estimation using a variance matrix which is unconstrained within

states (“FGLS-U”) does poorly in all cases. While this is unsurprising for moderate T ,

the poor performance with T = 6 suggests that even with a reasonably short time series

dimension the added variability induced by estimating an unconstrained variance matrix

poses a serious problem for inference. Also, comparing across Tables 3 and 4, it appears

that the loss of efficiency due to aggregating is small and that tests performed in the

aggregate data suffer from smaller size distortions, suggesting that performing inference in

the aggregate data may be preferable to using the individual level data.


Table 5 summarizes the results from the simulation models based on Design 2. These

data are simulated without taking into account estimation of Cst and so are representative

of standard panel data. The results follow the same general pattern as those presented in

Table 4, though the rejection rates are generally closer to the nominal size of the test. In particular, the

results show a substantial bias in the uncorrected OLS tests which is largely eliminated by

clustering or the use of FGLS with bias-corrected AR coefficients. A comparison of the power

and interval lengths of FGLS and OLS with clustered standard errors clearly demonstrates

the large potential efficiency gain to using FGLS, and the results also indicate that the use

of an unconstrained variance matrix is problematic even for small T .

Overall, the simulation results support the use of FGLS methods for performing inference

in the type of models examined here. Tests based on bias-corrected FGLS do not appear

to be substantially more size-distorted than the OLS tests with standard errors robust

to arbitrary correlation within states but have much higher power and shorter confidence

intervals in the majority of cases. This improved performance also appears to hold when

estimation is performed with FGLS and robust standard errors are used, though in some

cases this does result in a larger size distortion to the test. It would be interesting to see if

performance in these cases could be further improved using the bias-reduction and degrees

of freedom corrections outlined in Bell and McCaffrey (2002).

5. Conclusion

Many policy analyses rely on data which vary at both the individual and aggregate

level. The grouped structure of the data gives rise to many potential sources of correlation

between individual observations. In particular, the presence of group level shocks will result

in correlation among all individuals within a group. In addition, if groups are followed over

time, correlation between individuals in the same group at different times may arise due

to serial correlation in the group level shock. While there are numerous solutions to the

first source of correlation, relatively little attention has been paid to the potential problems

which may be caused by the second. Bertrand, Duflo, and Mullainathan (2004) illustrate

that serial correlation in the group level shock may cause conventional tests to be highly

misleading, and offer several OLS-based strategies which yield tests with correct size, but

have low power against relevant alternatives.

In this paper, I explore FGLS estimation in data with a grouped structure where the

groups may be autocorrelated and present a simple method for obtaining the FGLS esti-

mates which will be valid as the number of individual observations within each aggregate cell

grows large. I then focus on the case where the group level shock follows an AR(p) process.

In this case, standard estimates of the AR coefficients will typically be biased due to the

incidental parameters problem. I offer a simple bias correction for these coefficients which

will be valid in the presence of fixed effects or other variables with coefficients that vary at

the group level. The usefulness of FGLS and the derived bias-correction for the AR param-

eters is demonstrated through a simulation study based on data from the CPS-MORG. The

simulation results show that the proposed bias-correction removes a substantial portion of

the bias from the AR parameter estimates. The results also demonstrate that tests based

on FGLS using bias-corrected AR parameter estimates have approximately correct size. In

addition, the simulations confirm that the FGLS-based tests have much higher power and

yield much shorter confidence intervals than their OLS-based counterparts.

Appendix

Note that proofs of all propositions are available in an additional Technical Appendix from the

author upon request.

In addition to the notation defined in Section 3, let $x_{sth}$ be the $h$th element of $x_{st}$ so that $x_{st}' = [x_{st1}, \dots, x_{stk_1}]$, and define $z_{sth}$ similarly. Define

$$\ddot{x}_{st}' = x_{st}' - z_{st}'(Z_s'Z_s)^{-1}Z_s'X_s, \qquad (17)$$

and $\ddot{X}_s = [\ddot{x}_{s1}, \dots, \ddot{x}_{sT}]'$.
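The transformation in (17) removes from each row of $X_s$ its linear projection on $Z_s$. A minimal sketch of that operation follows; the function name `partial_out` and the example design (an intercept and a linear trend in $Z_s$) are assumptions for illustration only.

```python
import numpy as np

def partial_out(X_s, Z_s):
    """Return X_s - Z_s (Z_s'Z_s)^{-1} Z_s' X_s, the part of X_s orthogonal to Z_s."""
    # lstsq yields (Z_s'Z_s)^{-1} Z_s' X_s without forming an explicit inverse
    coef, *_ = np.linalg.lstsq(Z_s, X_s, rcond=None)
    return X_s - Z_s @ coef

rng = np.random.default_rng(0)
T, k1 = 23, 2
X_s = rng.normal(size=(T, k1))
Z_s = np.column_stack([np.ones(T), np.arange(1, T + 1)])  # intercept and trend
X_dd = partial_out(X_s, Z_s)
```

When $Z_s$ contains only an intercept, the same function reduces to subtracting the within-group time mean.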

To establish the asymptotic properties of the estimators of α, I impose the following conditions

in addition to model (12).

Assumption 2. Suppose the data are generated by model (12) and

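i. $v_{st} = v_{st}^{-\prime}\alpha + \eta_{st}$, where $\eta_{st}$ is strictly stationary in $t$ for each $s$, $E[\eta_{st}^2] = \sigma_\eta^2$, $E[\eta_{st}\eta_{s\tau}] = 0$ for $t \neq \tau$, and the roots of $1 - \alpha_1 z - \alpha_2 z^2 - \dots - \alpha_p z^p = 0$ have modulus greater than 1.

ii. $\{X_s, V_s, \eta_s\}$ are iid across $s$. $Z_s$ are nonstochastic and identical across $s$.

iii. $E[V_s|X_s] = 0$, $E[V_sV_s'|X_s] = \Gamma(\alpha)$.

iv. (i) $\mathrm{Rank}(\sum_{t=1}^{T} E[\ddot{x}_{st}\ddot{x}_{st}']) = \mathrm{Rank}(E[\ddot{X}_s'\ddot{X}_s]) = k_1$. (ii) $\mathrm{Rank}(Z_s'Z_s) = k_2$ $\forall\, s$.

v. $E[\eta_{st}^4] = \mu_4 < \infty$ and $E[x_{sth}^4] \leq \Delta < \infty$ $\forall\, s, t, h$.

vi. (i) $[X, Z]$, where $X = [X_1', \dots, X_S']'$ and $Z = \mathrm{diag}(Z_1, \dots, Z_S)$, has full rank. (ii) $Z_s'Z_s$ is uniformly positive definite with minimum eigenvalue $\lambda_s \geq \lambda > 0$ for all $s$.

vii. $\{X_{st}, V_{st}, \eta_{st}\}$ is $\alpha$-mixing of size $-\frac{3r}{r-4}$, $r > 4$, and $z_{ith}^2 \leq \Delta < \infty$, $E|x_{ith}^2|^{r+\delta} \leq \Delta < \infty$, and $E|\eta_{it}^2|^{r+\delta} \leq \Delta < \infty$ for some $\delta > 0$ and all $i, t, h$.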
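The root condition in part (i) of Assumption 2 can be checked numerically for any candidate vector of AR coefficients. The sketch below does so with `numpy.roots`; the helper name `ar_is_stationary` is an assumption introduced for this example.

```python
import numpy as np

def ar_is_stationary(alpha):
    """True if all roots of 1 - a1*z - ... - ap*z^p lie outside the unit circle."""
    alpha = np.atleast_1d(np.asarray(alpha, dtype=float))
    # np.roots expects coefficients ordered from the highest power to the constant
    coeffs = np.concatenate([-alpha[::-1], [1.0]])
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(ar_is_stationary([0.8]))        # AR(1) design used in the simulations
print(ar_is_stationary([0.8, 0.0]))   # AR(2) fit to an AR(1) process
print(ar_is_stationary([0.5, 0.6]))   # violates the condition
```

Trailing zero coefficients, as in the AR(2) parameterization with α2 = 0, lower the polynomial degree and are handled by `numpy.roots` automatically.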

Remark A.1. Conditions (i)-(v) are used for establishing results as S → ∞ with T fixed, and

conditions (i)-(iii), (vi), and (vii) are used to establish results as S, T → ∞ jointly. The majority

of the conditions imposed in Assumption 2 are standard for fixed effects panel models, with the

key difference being the imposition of the AR(p) structure on the error term. In addition, the

conditions are quite strong in that they rule out intertemporal heteroskedasticity and also require

full stationarity of initial observations, neither of which is innocuous in this context. Note that

existence of absolute moments of order 2(r + δ) and strict stationarity of ηst in (vii) imply the

existence of absolute moments of order 2(r + δ) for vst under the stationarity condition (i).

Remark A.2. A simple modification of model (12) is necessary for the results to accommodate

trends in asymptotics where T is not fixed. In particular, by redefining the coefficient on the trend

as $\beta^{sT}_{2h} = T\beta^{s}_{2h}$ and the trend in each time period $t$ as $t/T$, the trend becomes a uniform variable
and the conditions in (vii) apply. It is straightforward to verify that the estimates of $\beta_1$, $\alpha$, and
other components of $\beta^{s}_{2}$ obtained with the transformed data are numerically identical to the original
estimates of $\beta_1$, $\alpha$, and $\beta^{s}_{2}$. In addition, standard results for the coefficient on the trend are obtained
by considering $\hat{\beta}^{s}_{2h} - \beta^{s}_{2h} = \frac{1}{T}(\hat{\beta}^{sT}_{2h} - \beta^{sT}_{2h})$.
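The invariance claimed in Remark A.2 is easy to confirm numerically in a stripped-down setting. The sketch below uses plain OLS for a single group with one regressor, an intercept, and a trend, so it only illustrates the rescaling argument and not the full FGLS estimator; all variable names and the simulated data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 23
t = np.arange(1, T + 1, dtype=float)
x = rng.normal(size=T)
y = 0.5 * x + 1.0 + 0.03 * t + rng.normal(size=T)

def ols(y, *cols):
    """Least squares coefficients for the regressors given as columns."""
    W = np.column_stack(cols)
    return np.linalg.lstsq(W, y, rcond=None)[0]

b_raw = ols(y, x, np.ones(T), t)         # trend entered as t
b_scaled = ols(y, x, np.ones(T), t / T)  # trend entered as t/T

# Coefficients on x and the intercept agree; the trend coefficient scales by T.
print(np.allclose(b_raw[:2], b_scaled[:2]), np.isclose(b_scaled[2], T * b_raw[2]))
```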

References

Alvarez, J., and M. Arellano (2003): “The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators,” Econometrica, 71(4), 1121–1159.

Amemiya, T. (1978): “A Note on a Random Coefficient Model,” International Economic Review, 19(3), 793–796.

Arellano, M. (1987): “Computing Robust Standard Errors for Within-Groups Estimators,” Oxford Bulletin of Economics and Statistics, 49(4), 431–434.

Bell, R. M., and D. F. McCaffrey (2002): “Bias Reduction in Standard Errors for Linear Regression with Multi-Stage Samples,” Mimeo RAND.

Bertrand, M., E. Duflo, and S. Mullainathan (2004): “How Much Should We Trust Differences-in-Differences Estimates?,” Quarterly Journal of Economics, 119(1), 249–275.

Bhargava, A., L. Franzini, and W. Narendranathan (1982): “Serial Correlation and the Fixed Effects Model,” Review of Economic Studies, 49, 533–549.

Dickens, W. T. (1990): “Error Components in Grouped Data: Is It Ever Worth Weighting?,” Review of Economics and Statistics, 72(2), 328–333.

Donald, S., and K. Lang (2001): “Inference with Difference in Differences and Other Panel Data,” Mimeo.

Hahn, J., and G. M. Kuersteiner (2002): “Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both N and T Are Large,” Econometrica, 70(4), 1639–1657.

Hahn, J., and W. K. Newey (2004): “Jackknife and Analytical Bias Reduction for Nonlinear Panel Models,” Econometrica, 72(4), 1295–1319.

Hansen, C. B. (2004): Inference in Linear Panel Data Models with Serial Correlation and an Essay on the Impact of 401(k) Participation on the Wealth Distribution. Ph.D. Dissertation, Massachusetts Institute of Technology.

Hausman, J., and G. Kuersteiner (2003): “Differences in Differences Meets Generalized Least Squares: Higher Order Properties of Hypothesis Tests,” Mimeo.

Kezdi, G. (2002): “Robust Standard Errors Estimation in Fixed-Effects Panel Models,” Mimeo.

Kiefer, N. M. (1980): “Estimation of Fixed-Effects Models for Time Series of Cross-Sections with Arbitrary Intertemporal Covariance,” Journal of Econometrics, 14, 195–202.

Kiviet, J. F. (1995): “On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models,” Journal of Econometrics, 68, 53–78.

Lancaster, T. (2002): “Orthogonal Parameters and Panel Data,” Review of Economic Studies, 69, 647–666.

Liang, K.-Y., and S. Zeger (1986): “Longitudinal Data Analysis Using Generalized Linear Models,” Biometrika, 73(1), 13–22.

MaCurdy, T. E. (1982): “The Use of Time Series Processes to Model the Error Structure of Earnings in a Longitudinal Data Analysis,” Journal of Econometrics, 18(1), 83–114.

Moulton, B. R. (1986): “Random Group Effects and the Precision of Regression Estimates,” Journal of Econometrics, 32(3), 385–397.

Nickell, S. (1981): “Biases in Dynamic Models with Fixed Effects,” Econometrica, 49(6), 1417–1426.

Pfanzagl, J., and W. Wefelmeyer (1978): “A Third-Order Optimum Property of the Maximum Likelihood Estimator,” Journal of Multivariate Analysis, 8, 1–29.

Rothenberg, T. J. (1984): “Approximate Normality of Generalized Least Squares Estimates,” Econometrica, 52(4), 811–825.

Solon, G. (1984): “Estimating Autocorrelations in Fixed Effects Models,” NBER Technical Working Paper No. 32.

Wooldridge, J. M. (2003): “Cluster-Sample Methods in Applied Econometrics,” American Economic Review, 93(2), 133–138.


Table 1. Bias and MSE of α and αBC with Fixed Effects

                   AR(1) Model                                  AR(2) Model
                     α1 = 0.8                                α1 = 0.80, α2 = 0
     S   T         α̂      α̂(1)      α̂(∞)        α̂1        α̂2     α̂1(1)     α̂2(1)     α̂1(∞)     α̂2(∞)

    51  23    -0.099    -0.013    -0.003    -0.061    -0.058    -0.009    -0.008    -0.002    -0.002
             (0.010)  (0.0008)  (0.0007)   (0.005)   (0.004)  (0.0012)  (0.0011)  (0.0012)  (0.0011)
                                     [0]                                               [0]
   204  23    -0.098    -0.011    -0.001    -0.060    -0.056    -0.008    -0.007    -0.001  -0.00003
             (0.010)  (0.0003)  (0.0002)   (0.004)   (0.003)  (0.0003)  (0.0003)  (0.0003)  (0.0003)
                                     [0]                                               [0]
  1020  23    -0.096    -0.010   -0.0001    -0.059    -0.056    -0.007    -0.007    0.0001   -0.0001
             (0.009)  (0.0001) (0.00003)   (0.004)   (0.003)  (0.0001)  (0.0001)  (0.0001)  (0.0001)
                                     [0]                                               [0]
    51  12    -0.216    -0.052    -0.010    -0.149    -0.129    -0.045    -0.033    -0.009    -0.002
             (0.048)   (0.005)   (0.003)   (0.025)   (0.018)   (0.005)   (0.003)   (0.003)   (0.003)
                                     [0]                                               [0]
   204  12    -0.210    -0.044    -0.002    -0.142    -0.127    -0.039    -0.031    -0.002   -0.0003
             (0.044)   (0.002)   (0.001)   (0.021)   (0.017)   (0.002)   (0.002)   (0.001)   (0.001)
                                     [0]                                               [0]
  1020  12    -0.208    -0.043   -0.0003    -0.141    -0.127    -0.037    -0.031   -0.0002   -0.0001
             (0.044)   (0.002)  (0.0001)   (0.020)   (0.016)   (0.002)   (0.001)  (0.0001)  (0.0001)
                                     [0]                                               [0]
    51   6    -0.491    -0.201    -0.023    -0.424    -0.325    -0.246    -0.164    -0.036    -0.020
             (0.244)   (0.046)   (0.010)   (0.187)   (0.109)   (0.068)   (0.033)   (0.012)   (0.009)
                                    [30]                                              [67]
   204   6    -0.484    -0.192    -0.005    -0.418    -0.321    -0.238    -0.158    -0.005    0.0007
             (0.235)   (0.038)   (0.003)   (0.176)   (0.104)   (0.058)   (0.027)   (0.003)   (0.003)
                                     [0]                                               [4]
  1020   6    -0.481    -0.189   -0.0007    -0.415    -0.322    -0.234    -0.159   -0.0006    0.0001
             (0.232)   (0.036)  (0.0006)   (0.172)   (0.104)   (0.055)   (0.026)  (0.0005)  (0.0005)
                                     [0]                                               [0]

Monte Carlo results for simulation model based on aggregate CPS-MORG data with fixed effects. S is the number of aggregate cross-sectional observations, and T denotes the number of time series observations. S = 51 and T = 23 were chosen to match CPS data, and remaining sample sizes were chosen to explore behavior for small T and large S. α̂, α̂(1), and α̂(∞) are, respectively, least squares, one-step bias-corrected least squares, and iteratively bias-corrected least squares estimates of the autocorrelation coefficients. MSE is given in parentheses, and the number of times $\alpha_T^{-1}(\hat{\alpha})$ failed to exist in the unit interval is given in brackets. The number of simulations is 1000.
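To make the idea of inverting the probability limit of the least squares estimator concrete, the sketch below treats a simplified case: a stationary AR(1) error with unit innovation variance and only group-specific intercepts (no time effects or covariates). It computes the large-S limit of the least squares AR coefficient from time-demeaned errors and inverts that mapping by bisection. This is an illustration under those stated assumptions, not the paper's exact expression; the names `plim_ar1_fe` and `invert_plim` are hypothetical.

```python
import numpy as np

def plim_ar1_fe(alpha, T):
    """Large-S limit of the LS AR(1) coefficient fitted to time-demeaned errors."""
    idx = np.arange(T)
    gamma = alpha ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - alpha**2)
    M = np.eye(T) - np.ones((T, T)) / T   # subtracts the group time mean
    omega = M @ gamma @ M                 # covariance of the demeaned errors
    num = np.trace(omega, offset=1)       # sum over t of E[v_t v_{t-1}] (demeaned)
    den = np.trace(omega[:-1, :-1])       # sum over t of E[v_{t-1}^2] (demeaned)
    return num / den

def invert_plim(alpha_hat, T, lo=-0.99, hi=0.99, tol=1e-10):
    """Solve plim_ar1_fe(a, T) = alpha_hat for a by bisection on [lo, hi]."""
    f_lo = plim_ar1_fe(lo, T) - alpha_hat
    f_hi = plim_ar1_fe(hi, T) - alpha_hat
    if f_lo * f_hi > 0:
        raise ValueError("no solution in the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = plim_ar1_fe(mid, T) - alpha_hat
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

limit = plim_ar1_fe(0.8, 23)        # limit of the uncorrected estimator
recovered = invert_plim(limit, 23)  # inverting the mapping returns alpha
```

In this simplified setting the limit lies below the true α, and for small T the mapping stays well below 1 over the whole search interval, so a sufficiently large estimate has no preimage and `invert_plim` raises an error. That is one way failures of the kind counted in brackets could arise.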


Table 2. Estimated Variance of Treatment Parameter in Simulated AR(1) Model with α = .8.

                                       S = 51, T = 23             S = 51, T = 12             S = 51, T = 6
                                       σ̂²     σ²_a     σ²_s       σ̂²     σ²_a     σ²_s       σ̂²     σ²_a     σ²_s

A. Conventional Inference Methods
OLS                                 0.024    0.108    0.107    0.035    0.101    0.109    0.046    0.078    0.081
OLS - Cluster by state              0.102    0.108    0.107    0.096    0.101    0.109    0.075    0.078    0.081
FGLS-U                              0.020    0.037    0.062    0.030    0.040    0.049    0.038    0.044    0.051
AR(1)                               0.034    0.037    0.039    0.038    0.040    0.046    0.045    0.044    0.060
AR(1) - Cluster by state            0.037    0.037    0.039    0.044    0.040    0.046    0.054    0.044    0.060
AR(2)                               0.034    0.037    0.040    0.038    0.040    0.046    0.045    0.044    0.060
AR(2) - Cluster by state            0.038    0.037    0.040    0.045    0.040    0.046    0.055    0.044    0.060

B. FGLS with Bias-Corrected AR Coefficients
AR(1)bc                             0.037    0.037    0.037    0.040    0.040    0.038    0.044    0.044    0.046
AR(1)bc - Cluster by state          0.035    0.037    0.037    0.039    0.040    0.038    0.041    0.044    0.046
AR(2)bc                             0.037    0.037    0.037    0.040    0.040    0.038    0.044    0.044    0.046
AR(2)bc - Cluster by state          0.035    0.037    0.037    0.038    0.040    0.038    0.041    0.044    0.046

Monte Carlo results for simulation model based on aggregate CPS-MORG data. Data are simulated using Design 2 discussed in the text. S is the number of aggregate cross-sectional observations, and T denotes the number of time series observations. S = 51 and T = 23 correspond to the full sample of CPS data, and remaining sample sizes were chosen to explore behavior for small T. Results are for the variance of the treatment parameter, β1, only. σ̂² is 1000 times the mean of the estimated variances, σ²_a is 1000 times the asymptotic variance, and σ²_s is 1000 times the variance of the β1's estimated in the simulation. The number of simulations is 1000.


Table 3. Monte Carlo Results from CPS-MORG Microdata

                                       S = 51, T = 23             S = 51, T = 12             S = 51, T = 6
                                     Size    Power   Length     Size    Power   Length     Size    Power   Length

OLS                                 0.594    0.860    0.006    0.398    0.866    0.009    0.072    0.850    0.012
                                  (0.022)  (0.016)           (0.022)  (0.015)           (0.012)  (0.016)
OLS - Cluster by state              0.078    0.354    0.024    0.078    0.548    0.019    0.064    0.798    0.013
                                  (0.012)  (0.021)           (0.012)  (0.022)           (0.011)  (0.018)
Random Effects - state × year       0.392    0.778    0.010    0.222    0.870    0.011    0.050    0.808    0.013
                                  (0.022)  (0.019)           (0.019)  (0.015)           (0.009)  (0.018)
OLS - Cluster by state × year       0.398    0.766    0.011    0.280    0.804    0.011    0.076    0.844    0.012
                                  (0.022)  (0.019)           (0.020)  (0.018)           (0.012)  (0.016)

AR(2)bc                             0.078    0.714    0.015    0.152    0.880    0.012    0.050    0.804    0.014
                                  (0.012)  (0.020)           (0.016)  (0.015)           (0.010)  (0.018)
AR(2)bc - Cluster by state          0.078    0.728    0.016    0.070    0.880    0.015    0.058    0.792    0.014
                                  (0.012)  (0.020)           (0.011)  (0.015)           (0.010)  (0.018)
AR(3)bc                             0.076    0.738    0.015    0.142    0.888    0.012
                                  (0.012)  (0.020)           (0.016)  (0.014)
AR(3)bc - Cluster by state          0.076    0.744    0.015    0.072    0.774    0.015
                                  (0.012)  (0.020)           (0.012)  (0.019)

Monte Carlo results for simulation model using individual CPS-MORG data. Data are simulated using Design 1 discussed in the text. Data are manufactured by resampling states from actual CPS-MORG data. S is the number of aggregate cross-sectional observations, and T denotes the number of time series observations. S = 51 and T = 23 correspond to the full sample of CPS data, and remaining sample sizes were chosen to explore behavior for small T. Results are for the treatment parameter which enters the model with a true coefficient of β1 = 0. Size and power are for 5% level tests, and power is versus the alternative that β1 = .02. Length is the 95% confidence interval length divided by two. Interval lengths are based on a $t_{50}$ for tests with standard errors clustered at the state level, a $t_{ST-1}$ for tests with standard errors clustered at the state × year level, and a N(0, 1) for the remaining tests. Simulation standard errors are reported in parentheses. (For interval length, the simulation standard error is negligible, so it is not reported.) The number of simulations is 500.
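The simulation standard errors in parentheses appear consistent with the usual binomial formula for a rejection frequency, sqrt(p(1 − p)/R) with R simulation draws. This is an inference from the reported numbers rather than something stated in the note, and the helper name `mc_standard_error` is an assumption. A short check against the first OLS row of Table 3:

```python
import math

def mc_standard_error(p_hat, n_sims):
    """Binomial standard error of a simulated rejection frequency."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_sims)

# OLS row of Table 3 with S = 51, T = 23 and 500 simulations
se_size = mc_standard_error(0.594, 500)   # reported size 0.594
se_power = mc_standard_error(0.860, 500)  # reported power 0.860
print(round(se_size, 3), round(se_power, 3))  # 0.022 0.016
```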


Table 4. Monte Carlo Results from Aggregate CPS-MORG Data

                                       S = 51, T = 23             S = 51, T = 12             S = 51, T = 6
                                     Size    Power   Length     Size    Power   Length     Size    Power   Length

OLS                                 0.374    0.762    0.011    0.158    0.846    0.012    0.052    0.730    0.015
                                  (0.022)  (0.019)           (0.016)  (0.016)           (0.010)  (0.020)
OLS - Cluster by state              0.066    0.344    0.025    0.062    0.656    0.017    0.052    0.744    0.015
                                  (0.011)  (0.021)           (0.011)  (0.021)           (0.010)  (0.020)
FGLS-U                              0.362    0.962    0.007    0.112    0.932    0.010    0.086    0.804    0.013
                                  (0.021)  (0.009)           (0.014)  (0.011)           (0.013)  (0.018)
AR(2)                               0.104    0.772    0.014    0.116    0.844    0.014    0.080    0.774    0.014
                                  (0.014)  (0.019)           (0.014)  (0.016)           (0.012)  (0.019)
AR(2) - Cluster by state            0.064    0.662    0.016    0.062    0.726    0.016    0.048    0.716    0.015
                                  (0.011)  (0.021)           (0.011)  (0.020)           (0.010)  (0.020)
AR(3)                               0.106    0.770    0.014    0.132    0.834    0.013
                                  (0.014)  (0.019)           (0.015)  (0.017)
AR(3) - Cluster by state            0.072    0.688    0.016    0.068    0.708    0.016
                                  (0.012)  (0.021)           (0.011)  (0.020)

AR(2)bc                             0.080    0.750    0.015    0.064    0.814    0.014    0.042    0.720    0.015
                                  (0.012)  (0.019)           (0.011)  (0.017)           (0.009)  (0.020)
AR(2)bc - Cluster by state          0.062    0.754    0.015    0.062    0.800    0.014    0.060    0.734    0.015
                                  (0.011)  (0.019)           (0.011)  (0.018)           (0.011)  (0.020)
AR(3)bc                             0.064    0.766    0.014    0.066    0.810    0.014
                                  (0.011)  (0.019)           (0.011)  (0.018)
AR(3)bc - Cluster by state          0.076    0.788    0.014    0.060    0.808    0.014
                                  (0.012)  (0.018)           (0.011)  (0.018)

Monte Carlo results for simulation model using aggregate CPS-MORG data. Data are simulated using Design 1 discussed in the text. Data are manufactured by resampling states from actual CPS-MORG data and aggregating data to the state-year level using the method of Amemiya (1978) outlined in the text. S is the number of aggregate cross-sectional observations, and T denotes the number of time series observations. S = 51 and T = 23 correspond to the full sample of CPS data, and remaining sample sizes were chosen to explore behavior for small T. Results are for the treatment parameter which enters the model with a true coefficient of β1 = 0. Size and power are for 5% level tests, and power is versus the alternative that β1 = .02. Length is the 95% confidence interval length divided by two. Interval lengths are based on a $t_{50}$ for tests with standard errors clustered at the state level and a $t_{ST-S-T}$ for the remaining tests. Simulation standard errors are reported in parentheses. (For interval length, the simulation standard error is negligible, so it is not reported.) The number of simulations is 500.


Table 5. Monte Carlo Results from Simulated AR(1) Model with α = .8.

                                       S = 51, T = 23             S = 51, T = 12             S = 51, T = 6
                                     Size    Power   Length     Size    Power   Length     Size    Power   Length

OLS                                 0.356    0.842    0.010    0.285    0.794    0.012    0.141    0.730    0.013
                                  (0.015)  (0.011)           (0.014)  (0.013)           (0.011)  (0.014)
OLS - Cluster by state              0.056    0.490    0.020    0.065    0.520    0.020    0.058    0.574    0.017
                                  (0.007)  (0.016)           (0.008)  (0.016)           (0.007)  (0.016)
FGLS-U                              0.269    0.915    0.009    0.121    0.900    0.011    0.091    0.834    0.012
                                  (0.014)  (0.008)           (0.011)  (0.009)           (0.009)  (0.012)
AR(1)                               0.078    0.914    0.011    0.077    0.878    0.012    0.083    0.786    0.013
                                  (0.008)  (0.009)           (0.008)  (0.011)           (0.008)  (0.013)
AR(1) - Cluster by state            0.057    0.884    0.012    0.059    0.830    0.013    0.064    0.728    0.015
                                  (0.007)  (0.010)           (0.008)  (0.012)           (0.008)  (0.014)
AR(2)                               0.080    0.912    0.011    0.081    0.874    0.012    0.093    0.778    0.013
                                  (0.008)  (0.009)           (0.008)  (0.011)           (0.009)  (0.013)
AR(2) - Cluster by state            0.062    0.882    0.012    0.060    0.828    0.013    0.064    0.724    0.015
                                  (0.008)  (0.010)           (0.008)  (0.012)           (0.008)  (0.014)

AR(1)bc                             0.056    0.908    0.012    0.044    0.880    0.012    0.062    0.818    0.013
                                  (0.007)  (0.009)           (0.006)  (0.011)           (0.008)  (0.012)
AR(1)bc - Cluster by state          0.061    0.904    0.012    0.050    0.874    0.012    0.069    0.822    0.013
                                  (0.008)  (0.009)           (0.007)  (0.011)           (0.008)  (0.012)
AR(2)bc                             0.057    0.904    0.012    0.044    0.882    0.012    0.066    0.822    0.013
                                  (0.007)  (0.009)           (0.006)  (0.010)           (0.008)  (0.012)
AR(2)bc - Cluster by state          0.065    0.900    0.012    0.049    0.880    0.012    0.070    0.824    0.013
                                  (0.008)  (0.009)           (0.007)  (0.011)           (0.008)  (0.012)

Monte Carlo results for simulation model based on aggregate CPS-MORG data. Data are simulated using Design 2 discussed in the text. S is the number of aggregate cross-sectional observations, and T denotes the number of time series observations. S = 51 and T = 23 correspond to the full sample of CPS data, and remaining sample sizes were chosen to explore behavior for small T. Results are for the treatment parameter which enters the model with a true coefficient of β1 = 0. Size and power are for 5% level tests, and power is versus the alternative that β1 = .02. Length is the 95% confidence interval length divided by two. Interval lengths are based on a $t_{50}$ for tests with standard errors clustered at the state level and a $t_{ST-S-T}$ for the remaining tests. Simulation standard errors are reported in parentheses. (For interval length, the simulation standard error is negligible, so it is not reported.) The number of simulations is 1000.
