fix

Practical fixed effects estimation methods for the

three-way error components model

Martyn Andrews

University of Manchester

Thorsten Schank

Universitat ErlangenNurnberg

Richard Upward

University of Nottingham

July 21, 2005

Abstract

Methods for fixed effects estimation of the three-way error component model arenot yet standard. In this paper, we make the fixed effects methods developed originallyby Abowd, Kramarz & Margolis (1999) for linked worker-firm data more accessible,where possible, and show how they can be implemented in Stata. There is a caveatto these methods, however. If the researcher wishes to recover estimates of the errorcomponents themselves, and the number of units at the higher level of aggregation islarge, memory or matrix constraints may make estimation of the components them-selves infeasible using Stata.

1 Introduction

Panel data with three or more dimensions of variation are increasingly available to re-searchers in various fields. One might have data on workers observed over time and alsothe firms in which they work. Alternatively, one might have children in schools or patientsin hospitals. These data are also commonly described as being multilevel or hierarchical.Note that, in the examples given here, the lower-level units (workers, children, patients)are not merely nested in the higher-level. They may also move between the higher-levelunits. Workers change their employer, for example.

In the econometric literature the linear error components model is frequently used forpanel data. If the fixed (over time) error components are assumed to be uncorrelated withobserved explanatory variables then a random effects estimator may be used. These maybe estimated in Stata in a variety of ways, including the standard xtreg command forthe two-way model and the new xtmixed command for models with a more complicatedhierarchical structure. However, in many cases it may not be desirable to impose the

1

assumption that the error components are uncorrelated with the observed explanatoryvariables, in which case fixed effects methods are required to estimate the parameters ofinterest.

It is well known that, in the linear model, fixed error components can be modelled us-ing dummies, for example individuals, firms and time. It is also well known that, in atwo-way model with individual and time dummies, algebraic solutions are available forthe estimation of all parameters of interest, including those associated with both sets ofdummies Baltagi (2005, Section 3.2). However, in the case of data strutures such as thoseconsidered here, there is no algebraic transformation which sweeps away all the fixed errorcomponents in one go, and which allows them to be recovered subsqeuently.

Although there is a growing literature in economics, much of it based on the work ofAbowd, Kramarz & Margolis (AKM), the analysis of data with three dimensions of vari-ation (such as linked employer-employee panel data) is not yet routine. There are variouseconometric issues to overcome, which mean that routine techniques and packages cannotbe used. AKMs papers suggest these issues are quite technical. The objective of thispaper, therefore, is to make these fixed effects methods more accessible, where possible,and then show how they can be implemented in Stata.

To illustrate these methods we focus on a dataset of workers who are observed annually,together with the identity of the firms they work for. In Section 2, we set out the genericmodel that best represents the econometrics of fixed effects models using these type ofdata. In Section 3 we describe the various methods that can be used to estimate thisgeneric model. In Section 4, we present some illustrative results using an example datasetfrom the Institut fur Arbeitsmarkt und Berufsforschung (IAB) in Germany.

2 A generic model

Consider the following linear three-way error components model:

yit = xit +wjt + ui + qj+ i + j + t + it (1)

Workers are indexed i = 1, . . . , N . They are observed once per period t = 1, . . . , Ti in firmj = 1, . . . , J . Workers can change firms over time. yit is the dependent variable, xit and uiare vectors of observable i-level covariates and wjt and qj are vectors of observable j-levelcovariates. Both workers and firms are assumed to enter and exit the panel, which meanswe have an unbalanced panel with Ti observations per worker. There are N =

Ni=1 Ti

observations (worker-periods) in total.

The error components (or unobserved heterogeneities) comprise i for the worker and jfor the firm. The third component, t, represents the unobserved time effect. The error

2

components may be correlated with each other and with any of the observable covari-ates. We assume it is strictly exogenous, which implies, inter alia, that workers mobilitydecisions are independent of it.

Note that both i and ui are variables that are time-invariant for workers and similarly jand qj are fixed over time for firms. xit, on the other hand, varies across i and t, and wjtvaries across j and t. However, because the data are recorded at the (i, t) level, firm-levelcovariates also vary at that level. Thus we should strictly write wJ(i,t)t and qJ(i,t), whereJ(i, t) is the function which maps worker i to firm j at time t.

Equation (1) therefore contains all four possible types of covariate which a researcher mighthave about workers and firms. There are K observed covariates in total.

We emphasise that it is usual to assume that the heterogeneity terms i and j arecorrelated with the observables. This means that random effects methods are inconsistent,and so fixed effects methods are needed to estimate the parameters of interest. This, inturn, means that [,], the parameter vector associated with the time-invariant variables,is not identified. Rather than dropping [ui,qj ], it is usual to define

i i + ui (2)

andj j + qj (3)

givingyit = xit +wjt + i + j + it. (4)

In the next section we describe how to estimate the parameters of Equation (4) usingvarious fixed effects methods. We assume throughout that the unobserved time componentt is to be treated as fixed and estimated directly by using time dummies.1 This is why wehave dropped t from (4); we have subsumed these time dummies into one of the vectorsof observable covariates. This means that, hereafter, we are essentally analysing a two-waymodel.

3 Econometric methods

3.1 Spell Fixed Effects

If one is not interested in the estimates of i and j themselves, or in estimating theparameters on the time-invariant variables ui and qj , consistent estimates of and fromEquation (4) are straightforward to obtain by taking differences or by time-demeaning

1This will always be practical so long as the time dimension of the panel is relatively short, which itusually is with these kinds of data.

3

within each unique worker-firm combination (or spell). This is because for each spellof a worker within a firm neither i nor j vary. Defining s i + j as spell-levelheterogeneity, which is swept out by subtracting averages at the spell-level, both i andj have disappeared:

yit ys = (xit xs) + (wjt ws) + (it s). (5)

The effects of ui and qj are not identified, because ui us = 0 and qj qs = 0. Inaddition, any variable xit or wjt which is constant within a spell will also not be identified.One observation per spell is used up in identifying each spell fixed effect.

This is essentially the first method that AKM discuss (Abowd et al. 1999, Section 3.3),except they use differences rather than mean-deviations. We call this method Spell FixedEffects or FE(s). Because one cannot separate the worker and firm heterogeneities, AKMdo not pursue this method further. Given estimates s, one cannot recover i and j .

It is worth emphasising, however, that for many researchers this spell fixed effects methodis a practical and simple solution which does not present any computational difficulty,providing the researcher is not interested in analysing the heterogeneity post-estimation.It is trivial to implement in Stata using the xtreg, fe command.

3.2 Least squares dummy variables (LSDV)

The spell fixed effects method outlined above is not useful if one wishes to recover estimatesof i and j . This might be important if one wants to analyse these terms themselves,or if one wants to recover estimates of and using Equations (2) and (3) respectively.An alternative is to use a Least Squares Dummy Variables (LSDV) estimator. The LSDVestimates of i are inconsistent, although unbiased.2 The estimates of j have the sameproperties as those of [,].

However, direct estimation of (4) using dummy variables when the dataset is large will notusually be feasible, since this is a model with approximately K + N + J parameters. Inthe two-way model this problem is circumvented by using the within transformation whichsweeps out the i-level heterogeneity. But in this model there is no algebraic transformationof the observables that sweeps away both heterogeneity terms in one go and which allowsthem to be recovered subsequently. This is because of the lack of patterning betweenworkers and the firms they work for.3 To circumvent this problem, AKM note that ex-plicitly including dummy variables for the firm heterogeneity, but sweeping out the workerheterogeneity algebraically, gives exactly the same solution as the LSDV estimator.4

2See Wooldridge (2002, ch. 10) for assumptions and properties of panel data models.3More precisely, sort the data by workers and the firm dummies are unpatterned; sort the data by firms,

and the worker dummies are unpatterned.4In linear models, there is no distinction between removing the heterogeneity algebraically or adding

two full sets of dummy variables, for workers and firms, and so the terminology LSDV applies to both.

4

More precisely, the researcher must generate a dummy variable for each firm:

F jit = 1(J(i, t) = j) j = 1, . . . , J,

where 1( ) is the dummy variable indicator function and the function J(i, t) = j mapsworker i at time t to firm j. Now substitute

J(i,t) =J

j=1

jFjit (6)

into Equation (4). The i are removed by time-demeaning (or differencing) over i:

yit yi = (xit xi) + (wjt wi) +J

j=1

j(Fjit F ji ) + it. (7)

This means that J de-meaned (or differenced) firm dummies actually need creating.5 Todistinguish this estimator from LSDV above, we label this estimator FEiLSDVj. They areidentical estimators, but differ in how they are computed.

It is worth emphasising that firm dummies are no different from any multi-categorydummy, so long as workers can move from one category to another over time (eg re-gion dummies, but not ethnicity dummies). F jit F ji will be zero for all J dummies forany worker i who does not change firm. Furthermore, if we have a sample of firms (ratherthan the population) F jit F ji will only be non-zero for workers who change from one firmwithin the sample to another firm in the sample. This means that in many cases onlya tiny proportion of workers have any non-zero terms. Identification of j is driven bythe total number of such movers in each firm j. Some small firms may have no movers,in which case j is not identified. Other small firms may have only a very few movers,in which case estimates of j will be very imprecise. This means that it may be not besensible to estimate j for small firms, and instead one should group small firms together(this is what AKM and others do.)

There are two potential computational problems with this estimator. The first is thenumber of firms J , because the software needs to invert a matrix of dimension (K + J)(K+J). For many applications, the number of firms is sufficiently small that FEiLSDVj iscomputationally feasible. However, some datasets have tens of thousands of firms, or evenhundreds of thousands. The second problem is the requirement that one must create andstore J mean-deviations forN observations, meaning that the data matrix isN(K+J).This may be prohibitively large for software packages which store all data in memory, suchas Stata.

5Differencing is ignored hereafter. There are various reasons why it is easier to implement the covariancetransformation. Normally, the decision whether to estimate the model in first differences or use thecovariance transform depends on which give the more efficient estimates. Both estimators are consistent.See Wooldridge (2002, Section 10.6.3).

5

Some improvement in the storage efficiency of the J mean-deviated firm dummies can beachieved in Stata by using the lowest common multiple of all values of Ti if the panel issufficiently short. For example, if the data span a maximum of 5 years then Ti can beany value from [1, 2, 3, 4, 5]. Multiplying F jit F ji by the lowest common multiple (in thiscase 60) yields a set of integers which can be stored in Stata as single bytes rather than 4-or 8-byte fractions.6 To implement this method the researcher would need to create themean deviations manually and use OLS on the transformed data, rather than relying onxtreg.

The memory requirements of the data matrix for the FEiLSDVj estimator are then ap-proximately (NJ) + 4[N(K + 1)] bytes. We require NJ bytes for the mean-deviatedfirm dummies and 4[N(K + 1)] bytes for the remaining K explanatory variables and thedependent variable, assuming each is stored as 4-bytes.

3.3 A Classical Minimum Distance (CMD) method

The FEiLSDVj method requires the inversion of a potentially very large (K+J)(K+J)cross-product matrix, and, in addition, enough memory to store J mean-deviated firmdummies across N observations. To circumvent the second of these problems, we proposethe following method, based on the fact that only movers between firms identify firmeffects.

We separate the model into observations for movers and non-movers, subscripted by 1and 2 respectively. There are N1 mover observations and N2 non-mover observations.We then write Equation (4) in matrix notation, where each model is estimated separately:7

y1 = X11 + F11 + 1 (8)

y2 = X22 + 2. (9)

Note that y1, y2, X1, X2 and F1 have all been mean-deviated and defined viz y1 =MDy1etc, where MD I D(DD)1D. Denote the variances of the two error terms as 21and 22. We now drop all columns of F1 that are the zero vector, that is the J2 firms thathave no turnover. By definition, F2 0.Because there are often very few movers, eliminating F2 0 from the model means that,by estimating the model for movers and non-movers separately, the memory constraintsnoted above are sided-stepped. The Classical Minimum Distance (CMD) estimator formsa restricted estimator for and from 1, 2 and 1.8

In general, denote pi as the S 1 unrestricted parameter vector and denote as the6Storing the mean-deviated firm dummies as integers also appears to improve the accuracy of the matrix

inversion procedure.7We dispense with the distinction between xit variables and wjt variables.8See Wooldridge (2002, ch. 14.6) for further details.

6

P 1 restricted parameter vector. The constraint is pi = h(). In CMD estimation, oneestimates pi and then finds a such that the distance between pi and h() is minimised.An efficient CMD estimator uses any consistent estimator of asymptotic covariance matrixV to act as weighting matrix for the distance between pi and h(), denoted V. In otherwords, Efficient CMD solves:

min[pi h()]V1[pi h()],

whose solution is = (HV1H)1HV1pi,

when the mapping from to pi is linear: pi = H. Also, the appropriate estimator ofAvar() with which to conduct inference is

Avar() = [HAvar(pi)1H]1 = [HV1H]1.

A test of the validity of the restrictions is given by Wooldridge (2002, Eqn. 14.76):

[pi h()]V1[pi h()] 2(S P ).

For the model at hand, the constraint pi = H is written: 112

= IK 00 IJIK 0

(

)

where pi is (2K + J) 1, is (K + J) 1, and H is (2K + J) (K + J). The appropriateasymptotic covariance matrix is:

V =

[V1 0

0 V2

]=

21(

X1X1 X1F1F1X1 F1F1

)0

0 22 (X2X2)

.Given the general expressions immediately above, it follows that the restricted estimator = [HV1H]1HV1pi is given by:

=

(

)=

[V11 +

(V12 00 0

)]1 [V11

(1

1

)+ V12

(2

0

)](10)

and that

Avar() = [HV1H]1 =

[V11 +

(V12 00 0

)]1(11)

a (K + J) (K + J) matrix. It should be emphasised that these expressions use standard

7

(unrobust) covariance matrices. A robust version of this covariance matrix replaces V1and V2 in Equation (11) by robust equivalents. It is wrong to do this in (10), however,because if the true constraint could be imposed upon the model, one would not end upwith the LSDV estimator.

A standard criticism is that movers and non-movers are different groups of individualsand so one should model them separately. Before imposing H0 : 1 = 2, one should testthese restrictions, although this rarely happens. Under H0:(

1 1

)V11

(1 1

)+ (2 )V12 (2 ) 2(K). (12)

It should be emphasised that the only price paid with this approach is that one cannotconstrain 21 =

22. The only difference between this and the LSDV estimator is because

V11 and V12 come from separate regressions.

3.4 Post-estimation analysis of the error components

Once estimates of (,) have been made (either using FEiLSDVj or CMD methods), theresearcher can recover estimates of the error components themselves. First compute

J(i,t) =J

j=1

jFjit (13)

and theni = yi i xi wi (14)

where i averages J(i,t) over t for each i.

Identification of firm effects is only possible within a group, where a group is defined bythe movement of workers between firms. A group contains all the workers who have everworked for any of the firms in that group, and all the firms at which any of the workerswere employed. A second (unconnected) group is defined only if no firm in the first grouphas ever employed any workers in the second, and no firm in the second group has everemployed any workers in the first. It is not possible to identify one firm per group becausewithin each group the mean-deviated firm dummies sum to zero. Some normalisation istherefore required between groups. In Section 4 we show how to identify groups and howto perform this normalisation.

Identifying the effects of time-invariant variables

AKM suggest that one can recover estimates of i and j by estimating Equations (2, 3)as follows. Thus, one can analyse distributions of j , i, specifically to see whether they

8

are correlated. First, run the auxiliary regressions:

i = const + ui + error (15)

j = const + qj+ error (16)

which give consistent estimates of , (AKM 1999, Section 3.4.4). Because i is droppedfrom (2), the identifying assumption is that Cov(ui, i) = 0 or else there is omittedvariables bias. Similarly, Cov(qj , j) = 0 is assumed in (3). One only needs N observationsto estimate (2) and J observations to estimate (3). AKM estimate these equations byGLS, because of the aggregation to the firm-level. Because there are other causes ofheteroskedasticity, one could use OLS and adjust the covariance matrix for clustering atthe firm-level. Second, the researcher computes

i = i ui (17)j = j qj. (18)

and can be defined at three levels of aggregation:

i, t i replicated Ti times J(i,t)

i i i =Ti

t=1 J(i,t)/Ti

j j =

(it)j i/Nj j

(Nj is the total number of worker-years observed in firm j.) AKM show that statisticsbased on aggregating i and i to the level of the firm are consistent as Ti goes to infinityChamberlain (1984, see also).

4 An illustrative example

To illustrate these methods we use data from a linked worker-firm dataset made availableby the IAB. The firm data comprise a panel of 4,376 establishments (or plants) fromthe former West Germany observed over the period 19931997. The worker data comprisea panel of 1,930,260 workers who are employed in these plants. A common establishmentidentifier is available in both datasets, allowing them to be linked.9 After eliminating obser-vations with missing or incomplete information, the resulting linked dataset has 5,145,098

9Kolling (2000) provides more information on the IAB establishment panel, Bender, Haas & Klose(2000) has details on the worker data and Alda, Bender & Gartner (2005) has details on the linked data.

9

worker-year observations (the i, t level). For each row in the data the identity j of theplant is recorded.

Table 1: Sample sizesN N J S

Whole sample 5,145,098 1,930,260 4,376 1,953,774

Workers who move to other IABplants

72,253 23,393 1,821 46,907

Workers who dont move to otherIAB plants

5,072,845 1,906,867 4,376 1,906,867

Workers in plants with move-ment to other IAB plants

4,883,331 1,816,368 1,821 1,839,882

The first row of Table 1 reports the total sample size in terms of the total number of rowsin the data (N), the number of workers (N), the number of plants (J) and the number ofunique worker-firm combinations, or spells (S). Notice that the total number of spellsis only slightly greater than the total number of workers. This is a consequence of havinga sample of plants: a worker will be observed with more than one spell only if they movefrom one plant in the sample to another, which is actually very unlikely.

Identification of unobserved plant-effects is driven only by those workers who change plants.Thus an important sub-sample comprises those workers who have two or more spells(Si > 1) in IAB plants (IAB movers). There are only 23,393 of these movers, and theywork in 1,821 plants. However, those 1,821 plants employ over 1.8 million workers becausethey tend to be larger than plants who employ no IAB movers.

To illustrate our methods, we estimate log-wage equations using a a small set of covariatesof each type (x, w, q and u). The dependent variable yit is the log daily wage in Pfennigen.Table 2 gives the sample means for the relevant variables.

Table 2: Sample meansDescription Variable Variable Mean S.D.

type name

log daily wage in Pfennigen yit lw 9.763 0.278female ui female 0.213 0.409married xit married 0.624 0.484age xit age 39.643 10.539age2/100 xit age 2 16.827 8.628single-plant enterprise qj single 0.269 0.443log plant employment wjt lN 7.702 1.454(log plant employment)2 wjt lN 2 61.429 22.283

A simple OLS estimate of Equation (1) provides a useful benchmark. This estimate treatsi and j as part of the error term, while t is estimated using a dummy for each year.

10

. reg lw female married age age_2 single lN lN_2 year2-year5

Source SS df MS Number of obs = 5145098

F( 11,5145086) = .

Model 97058.6434 11 8823.51304 Prob > F = 0.0000

Residual 300561.3595145086 .058417169 R-squared = 0.2441

Adj R-squared = 0.2441

Total 397620.0035145097 .077281342 Root MSE = .2417

lw Coef. Std. Err. t P>|t| [95% Conf. Interval]

female -.1773087 .0002664 -665.56 0.000 -.1778308 -.1767865

married .0126403 .000244 51.81 0.000 .0121621 .0131185

age .0351288 .0000828 424.48 0.000 .0349666 .035291

age_2 -.0356353 .0000999 -356.55 0.000 -.0358312 -.0354395

single -.0317087 .0002526 -125.51 0.000 -.0322038 -.0312135

lN .0793973 .0004695 169.10 0.000 .0784771 .0803176

lN_2 -.0028624 .0000306 -93.53 0.000 -.0029224 -.0028024

year2 .0295777 .0003233 91.49 0.000 .0289441 .0302114

year3 .0723557 .0003402 212.66 0.000 .0716888 .0730226

year4 .0983264 .0003471 283.26 0.000 .0976461 .0990068

year5 .1078701 .0003587 300.75 0.000 .1071671 .108573

_cons 8.514487 .0023888 3564.36 0.000 8.509805 8.519169

We now wish to investigate whether any of these parameter estimates are likely to bebiased because of potential correlation between the unobserved error components and thevariables of interest.

It is straightforward to estimate two-way fixed-effects models by using the xtreg command.The within-i transformation eliminates the unobserved worker-level component of the errorterm, i:

11

. xtreg lw female married age age_2 single lN lN_2 year2-year5, fe i(i)

Fixed-effects (within) regression Number of obs = 5145098

Group variable (i): id Number of groups = 1930260

R-sq: within = 0.3029 Obs per group: min = 1

between = 0.1088 avg = 2.7

overall = 0.0964 max = 5

F(9,3214829) = 155200.61

corr(u_i, Xb) = -0.6637 Prob > F = 0.0000


female (dropped)

married .0061755 .0002202 28.05 0.000 .005744 .006607

age .0582696 .0001161 501.69 0.000 .058042 .0584973

age_2 -.0353585 .0001397 -253.12 0.000 -.0356323 -.0350847

single -.0049728 .0010414 -4.78 0.000 -.0070139 -.0029316

lN -.0016697 .0010828 -1.54 0.123 -.0037919 .0004526

lN_2 .0010928 .0000734 14.89 0.000 .0009489 .0012366

(output omitted )

sigma_u .35916844

sigma_e .06832495

rho .96507601 (fraction of variance due to u_i)

F test that all u_i=0: F(1930259, 3214829) = 31.69 Prob > F = 0.0000

Note that we cannot estimate a coefficient on female because it does not vary within-i.There are some significant changes in some of the parameter estimates. For example, theeffect of age has gone up from 0.035 to 0.058, while the effect of age2 remains the same.This means that the estimated quadratic wage-age profile is steeper, with a much olderturning point. This implies a negative correlation between i and age older workershave lower (unobserved) earning power. This is reflected in the large correlation (0.6637)between i and the estimated effect of all the covariates.

12

The within-j transformation eliminates the unobserved plant-level component, j :

. xtreg lw female married age age_2 single lN lN_2 year2-year5, fe i(j)


Group variable (i): idnum Number of groups = 4376


between = 0.0000 avg = 1175.8

overall = 0.1810 max = 102106

F(10,5140712) = 140884.11

corr(u_i, Xb) = 0.0100 Prob > F = 0.0000


female -.1779654 .0002496 -712.90 0.000 -.1784547 -.1774761

married .0247614 .0002188 113.16 0.000 .0243325 .0251902

age .0327271 .0000717 456.32 0.000 .0325865 .0328676

age_2 -.0336805 .0000866 -389.10 0.000 -.0338502 -.0335109

single (dropped)

lN -.0694038 .0031901 -21.76 0.000 -.0756563 -.0631512

lN_2 .0042823 .000225 19.03 0.000 .0038412 .0047233

(output omitted )

sigma_u .35985478

sigma_e .20663177



The coefficient estimates on female, married and age are all very similar to those fromthe OLS regression, suggesting that the correlation between individual-level variables andj is not very important. However, the wage effect of plant size is now quite different fromthose estimated by either the OLS or FE(i) models.

13

We now wish to eliminate both the unobserved worker- and plant-level error components.The simplest way to do this is to estimate Equation (5) using FE(s):

. egen s=group(i j)

.

. xtreg lw female married age age_2 single lN lN_2 year2-year5, fe i(s)


Group variable (i): s Number of groups = 1953774


between = 0.1119 avg = 2.6

overall = 0.0994 max = 5

F(8,3191316) = 173364.57

corr(u_i, Xb) = -0.6669 Prob > F = 0.0000


female (dropped)

married .0057338 .0002203 26.02 0.000 .005302 .0061657

age .0582328 .0001164 500.43 0.000 .0580048 .0584609

age_2 -.0350417 .0001396 -250.97 0.000 -.0353154 -.034768

single (dropped)

lN -.0102442 .0012252 -8.36 0.000 -.0126456 -.0078428

lN_2 .0021531 .0000847 25.43 0.000 .0019871 .002319

(output omitted )

sigma_u .36006301

sigma_e .067745



Note that by defining a spell s in this way we treat as a single spell all unique combinationsof i and j. Therefore a worker who has two periods of employment with employer A,separated by a period with employer B is treated as having just two spells in total.

If the correct model is given by Equation (4), and if the error components are correlatedwith the observed data then these estimates should be preferred to either of the standardFE or OLS estimates. In this example, estimates from FE(s) are generally very close tothose from FE(i). However, we are now unable to estimate either of the coefficients onfemale or single.

We now wish to estimate Equation (4), but we also want to subsequently recover ourestimates of i and j . If we had sufficient memory, we could use the LSDV methodsoutlined in Section 3.2. In our example we have N = 5, 145, 098, J = 1, 821 and K = 11,meaning that we require about 10GB of memory to proceed. At the time of writing thisamount of memory is not available to us (or to many researchers) and so we must use theCMD method described in Section in Section 3.3.

The dummy variable mover identifies workers who change plant during the sample period.The variable plantin counts the number of workers in each plant who move.

14

sort i j

by i: gen byte mover = j[1]!=j[_N]

egen plantin = sum(mover), by(j)

save cmd, replace

Only workers with mover=1 contribute to estimates of j , and one cannot estimate jfor plants with plantin=0. To estimate Equation (8) we use xtreg only for movers, andinclude a full set of firm dummies:

keep if mover==1

tab j, gen(F_)

local J1 = r(r)

xtreg lw married age age_2 single lN lN_2 year2-year4 F_*, fe i(i)

We then save the coefficient estimates 1 and and the variance covariance matrix V1,removing the constant from both:

matrix B1 = e(b)

matrix B1 = B1["x".."F_J1",1]

matrix V1 = e(V)

matrix V1 = V1["married".."F_J1","married".."F_J1"]

The process is then repeated for the non-movers. Note that we do not issue a clearcommand on its own. Instead use, clear loads in the data without destroying any of therelevant matrices in memory.

use cmd if mover==0, clear

xtreg y married age age_2 lN lN_2 year2-year4, fe i(i)

matrix BETA2 = e(b)

matrix BETA2 = BETA2["married".."year4",1]

local K = rowsof(BETA2)

matrix V2 = e(V)

matrix V2 = V2["married".."year4","married".."year4"]

Now we can compute the restricted estimator , given by Equation (10). To do this weneed to construct the vector (

2

0

)and the matrix (

V12 00 0

)

which is achieved by adding blocks of zeros to 2 and V2.

matrix B2 = BETA2\J(J1,1,0)

matrix V2inv = J(J1+K,J1+K,0)

matrix V2inv[1,1] = syminv(V2)

Equation (10) is then computed:

matrix DELTA = syminv(syminv(V1)+V2inv)*((syminv(V1)*B1)+(V2inv*B2))

and the variance-covariance matrix of is given by Equation (11):

15

matrix VARDELTA = syminv(syminv(V1)+V2inv)

We can label the resulting matrices using the variable names of (1, V1), and we can thendisplay the results in the usual format.

local rownames: rownames B1

matrix rownames DELTA = rownames

matrix rownames VARDELTA = rownames

matrix colnames VARDELTA = rownames

matrix DELTA = DELTA

ereturn post DELTA VARDELTA

ereturn display

Coef. Std. Err. z P>|z| [95% Conf. Interval]

married .0057685 .0002198 26.24 0.000 .0053377 .0061992

age .0582999 .000116 502.42 0.000 .0580724 .0585273

age_2 -.0351287 .0001392 -252.35 0.000 -.0354016 -.0348559

lN -.0103505 .0012258 -8.44 0.000 -.0127531 -.0079479

lN_2 .0021577 .0000847 25.47 0.000 .0019917 .0023237

year2 -.0060567 .0000902 -67.13 0.000 -.0062335 -.0058798

year3 .0145369 .0000931 156.21 0.000 .0143545 .0147193

year4 .0073687 .0000972 75.80 0.000 .0071782 .0075593

F_1 .1060651 .2684473 0.40 0.693 -.420082 .6322121

F_2 .1250478 .2576193 0.49 0.627 -.3798769 .6299724

(output omitted )

F_1821 .4486626 .2417074 1.86 0.063 -.0250753 .9224005

The 2 statistic to test whether the restriction imposed by pooling movers and non-moversis given by Equation (12).

. matrix DELTA = e(b)

. matrix VARDELTA = e(V)

. matrix BETA = DELTA["married".."year4",1]

. matrix PSI = DELTA["F_1".."F_$J1",1]

. matrix CHI2 = ((B1-DELTA)*syminv(V1)*(B1-DELTA) + (BETA2-BETA)*V2inv[K,K]*(BETA2-BETA))

. display as text "Chi^2 statistic: " el(CHI2,1,1)

Chi^2 statistic: 317.33385

. display as text "P-value: " chi2tail(K,el(CHI2,1,1))

P-value: 8.383e-64

Our test statistic strongly rejects the pooling hypothesis H0 : 1 = 2, namely that themodels for movers and non-movers are the same. Therefore, it would be wrong to estimatethis model by LSDV, even if one could.

16

Post-estimation analysis of error components

The CMD method allows us to recover the estimates of i and j so they can be analysedand possibly used in auxilliary regressions such as (15) and (16).

We have a vector which contains the firm-level error component for each firm which hasa mover. Using Equation (13) we can map this back to the data in the following way.

use cmd, clear

egen j1 = group(j) if plantin>0

generate psi=.

forvalues j=1(1)J1 {quietly replace psi = PSI[j,1] if j1==j

}assert psi==. if plantin==0

The variable psi now contains the appropriate value of . Note that we need a newvariable j1 which contains the index only for those firms with movement. We then assertthat plants with no movers do not have an estimated value of j .

An additional complication is that estimates of j cannot be directly compared acrossgroups, as defined on Page 8. This is because it is arbitrary which j is set equal tozero for normalisation in each group. The same issue applies to the resulting i. Abowd,Creecy & Kramarz therefore suggest normalising estimates of j so they have the samemean across groups. To do this we must first define the groups. We have written an adofile which creates a new variable which records which group each firm is in. The syntax ofgrouping is very simple:

grouping newvarname , ivar(varname) jvar(varname)

In our data we have 33 groups, as shown below.

. grouping g, ivar(i) jvar(j)

New variable g contains grouping indicator

Group 1: 4857672 person-years allocated to groups






(output omitted )





But note that almost all rows in the data belong to group 1. All those workers in plantswith movement to other plants are allocated a group. To normalise the estimates of jacross groups:

17

egen psigbar = mean(psi), by(g)

replace psi = psi-psigbar

To recover estimates of i we can use Equation (14). The easiest way to implement thisin Stata is to use the matrix score command:

matrix x = DELTA["married".."year4",1]

matrix score xb = x

gen theta_it = lw - xb - psi

egen theta = mean(theta_it), by(i)

Finally, we can estimate the auxilliary regressions and see whether the components arethemselves correlated.

. bysort i: gen t=_n

. regress theta female if t==1, robust

Regression with robust standard errors Number of obs = 1816368

F( 1,1816366) =10679.67

Prob > F = 0.0000

R-squared = 0.0071

Root MSE = .35695

Robust

theta Coef. Std. Err. t P>|t| [95% Conf. Interval]

female -.0724758 .0007013 -103.34 0.000 -.0738503 -.0711012

_cons 7.986986 .000287 . 0.000 7.986423 7.987548

. bysort j: gen n=_n

. regress psi single if n==1, robust

Regression with robust standard errors Number of obs = 1821

F( 1, 1819) = 8.06

Prob > F = 0.0046

R-squared = 0.0048

Root MSE = .17472

Robust

psi Coef. Std. Err. t P>|t| [95% Conf. Interval]

single -.024473 .0086226 -2.84 0.005 -.0413842 -.0075618

_cons .0105364 .0046327 2.27 0.023 .0014505 .0196223

The first regression sample comprises those 1,816,368 workers who work in the 1,821 plantsfor which we can estimate j (see Table 1). The second regression sample comprises oneobservation for each of these 1,821 plants.

The coefficients on female and single are both smaller than those estimated from theoriginal OLS regression, suggesting that these original estimates were biased. However, oneshould be aware that these auxilliary regressions impose the usual identifying assumptions

18

that the unobserved component of the error is uncorrelated with the observed component,so Cov(ui, i) = 0 and Cov(qj , j) = 0.

5 Conclusion

We have shown how, using standard Stata code, it is possible to estimate fixed effectsthree-way error components models.

There are two points worth emphasising. The first is that researchers who are interested inestimating unobserved i and j-level heterogeneity, and who have a large number of j-levelunits must use the Direct Least Squares algorithm of Abowd, Creecy and Kramarz. Inthis paper we explain how the researcher can make the feasible number of plants as largeas possible without having to resort to the Direct Least Squares algorithm. Our CMDmethod is virtually identical to the correct FEiLSDVj method, and only differs becausethe error variances are different in the mover and non-mover regressions.10

The second point is that it is important to emphasise the estimates of j rely entirely onworkers who change plants, as in any fixed-effects model. If one has a sample of plants,as here, there are very few movers (we have 1.9 million workers, but only 23,000 movers).The estimates of j therefore need interpreting with caution.

If researchers who are not interested in estimating the worker and firm heterogeneitiesthemselves, but merely wish to control for them, Spell-level fixed effects is very straight-forward to use.

Acknowledgments

The authors thank the IAB (Institut fur Arbeitsmarkt und Berufsforschung, Nurnberg)for kindly supplying the data, in particular, Lutz Bellmann and Stephan Bender. Financialsupport from the British Academy under Grant SG-35691 is also gratefully acknowledged.The views expressed in this paper are solely those of the authors and are not those ofthe IAB. Comments from presentations at the Symposium of Multisource Databases, Uni-versitat ErlangenNurnberg, July 2004, the 10th Annual Stata Users Group conference,London 2004, the IAB, the Institute of Social and Economic Research at Essex, and theDepartments of Economics at Aberdeen, Kent, Manchester, and Warwick are gratefullyacknowledged. The usual disclaimer applies. All calculations were performed with Stata9 SE and the code is available fromhttp://www.nottingham.ac.uk/economics/staff/details/richard upward.html.

10In fact, these are estimated as 0.08512 and 0.0682 respectively.

19

References

Abowd, J., Creecy, R. & Kramarz, F. (2002), Computing person and firm effects usinglinked longitudinal employer-employee data, Technical Paper 2002-06, U.S. CensusBureau.

Abowd, J., Kramarz, F. & Margolis, D. (1999), High wage workers and high wage firms,Econometrica 67, 251333.

Alda, H., Bender, S. & Gartner, H. (2005), The linked employer-employee dataset of theIAB (LIAB), IAB Discussion Paper 06/2005.

Baltagi, B. (2005), Econometric analysis of panel data, 3rd edn, John Riley.

Bender, S., Haas, A. & Klose, C. (2000), The IAB employment subsample 1975-1995:Opportunities for analysis provided by the anonymised sample, DP 117, IZA.

Chamberlain, G. (1984), Panel data, in Z. Griliches & M. Intrilligator, eds, Handbookof Econometrics, Vol. 2, Elsevier, Amsterdam, chapter 22, pp. 1247318.

Kolling, A. (2000), The IAB establishment panel, Schmollers Jahrbuch: Zeitschrift furWirtschafts- und Sozialwissenschaften 120, 291300.

Wooldridge, J. (2002), The econometric analysis of cross section and panel data, MITPress.

20

IntroductionA generic modelEconometric methodsSpell Fixed EffectsLeast squares dummy variables (LSDV)A Classical Minimum Distance (CMD) methodPost-estimation analysis of the error components

An illustrative exampleConclusion

Documents

fix