
2. Fixed Effects Models

• 2.1 Basic fixed-effects model
• 2.2 Exploring panel data
• 2.3 Estimation and inference
• 2.4 Model specification and diagnostics
• 2.5 Model extensions
• Appendix 2A - Least squares estimation

2.1 Basic fixed effects model

Basic elements

• Subject i is observed on $T_i$ occasions;
  – $i = 1, \ldots, n$,
  – $T_i \le T$, the maximal number of time periods.

• The response of interest is $y_{it}$.

• The K explanatory variables are $\mathbf{x}_{it} = (x_{it1}, x_{it2}, \ldots, x_{itK})'$, a vector of dimension $K \times 1$.

• The population parameters are $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_K)'$, a vector of dimension $K \times 1$.

Observables Representation of the Linear Model

• $\mathrm{E}\, y_{it} = \alpha + \beta_1 x_{it,1} + \beta_2 x_{it,2} + \cdots + \beta_K x_{it,K}$.

• $\{x_{it,1}, \ldots, x_{it,K}\}$ are nonstochastic variables.

• $\mathrm{Var}\, y_{it} = \sigma^2$.

• $\{y_{it}\}$ are independent random variables.

• $\{y_{it}\}$ are normally distributed.

• The observable variables are $\{x_{it,1}, \ldots, x_{it,K}, y_{it}\}$.

• Think of $\{x_{it,1}, \ldots, x_{it,K}\}$ as defining a stratum.
  – We take a random draw, $y_{it}$, from each stratum.
  – Thus, we treat the x's as nonstochastic.
  – We are interested in the distribution of y, conditional on the x's.

Error Representation of the Linear Model

• $y_{it} = \alpha + \beta_1 x_{it,1} + \beta_2 x_{it,2} + \cdots + \beta_K x_{it,K} + \varepsilon_{it}$,

where $\mathrm{E}\, \varepsilon_{it} = 0$.

• $\{x_{it,1}, \ldots, x_{it,K}\}$ are nonstochastic variables.

• $\mathrm{Var}\, \varepsilon_{it} = \sigma^2$.

• $\{\varepsilon_{it}\}$ are independent random variables.

• This representation is based on the Gaussian theory of errors; it is centered on the unobservable variable $\varepsilon_{it}$.

• Here, $\varepsilon_{it}$ are i.i.d., mean-zero random variables.

Heterogeneous model

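• We now introduce a subscript on the intercept term, to account for heterogeneity.

• $\mathrm{E}\, y_{it} = \alpha_i + \beta_1 x_{it,1} + \beta_2 x_{it,2} + \cdots + \beta_K x_{it,K}$.

• For short-hand, we write this as

$\mathrm{E}\, y_{it} = \alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta}$.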

Analysis of covariance model

• The intercept parameter, $\alpha_i$, varies by subject.
• The population parameters $\boldsymbol{\beta}$ do not vary by subject, but control for the common effect of the covariates x.
• Because the errors are mean zero, the expected response is

$\mathrm{E}\, y_{it} = \alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta}$.

Parameters of interest

• The common effects of the explanatory variables are dictated by the sign and magnitude of the betas ($\beta$'s).
  – These are the parameters of interest.

• The intercept parameters vary by subject and account for different behavior of subjects.
  – The intercept parameters control for the heterogeneity of subjects.
  – Because they are of secondary interest, the intercepts are called nuisance parameters.

Time-specific analysis of covariance

• The basic model is also a traditional analysis of covariance model.

• The basic fixed-effects model focuses on the mean response and assumes:
  – no serial correlation (correlation over time),
  – no cross-sectional (contemporaneous) correlation (correlation between subjects).
• Hence, no special relationship between subjects and time is assumed.
• By interchanging i and t, we may consider the model

$y_{it} = \lambda_t + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}$.

• The parameters $\lambda_t$ are time-specific variables that do not depend on subjects.

Subject and time heterogeneity

• Typically, the number of subjects, n, substantially exceeds the maximal number of time periods, T.

• Typically, the heterogeneity among subjects explains a greater proportion of variability than the heterogeneity among time periods.

• Thus, we begin with the “basic” model $y_{it} = \alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}$.

– This model allows explicit parameterization of the subject-specific heterogeneity.

– By using binary variables for the time dimension, we can easily incorporate time-specific parameters.

2.2 Exploring panel data

Why explore?

• Many important features of the data can be summarized numerically or graphically without reference to a model.

• Data exploration provides hints of the appropriate model

– Many social science data sets are observational; they do not arise as the result of a designed experiment.

– The data collection mechanism does not dictate the model selection process.

– To draw reliable inferences from the modeling procedure, it is important that the data be congruent with the model.

• Exploring the data also alerts us to any unusual observations and/or subjects.

Data exploration techniques

• Panel data is a special case of regression data.

– Techniques applicable to regression are also useful for panel data.

– Some commonly used techniques include:

• Summarize the distribution of y and each x:

– Graphically, through histograms and other density estimators.

– Numerically, through basic summary statistics (mean, median, standard deviation, minimum and maximum).

• Summarize the relation between y and each x:

– Graphically, through scatter plots

– Numerically, through correlation statistics

• Summary statistics by time period may be useful for detecting temporal patterns.

• Three more specialized (for panel data) techniques are:

– Multiple time series plots

– Scatterplots with symbols

– Added variable plots.

• Section 2.4 discusses additional techniques; these are performed after fitting a preliminary model.
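Summary statistics by time period can be computed with a few lines of code. The sketch below is a minimal illustration, assuming NumPy and data in "long" format (one row per subject-time observation, with arrays `t` and `y`); the function name `summarize_by_time` is hypothetical.

```python
import numpy as np

def summarize_by_time(t, y):
    """Mean, median, standard deviation, minimum and maximum of y for each time period.

    t and y are equal-length 1-D arrays in long format (one entry per observation).
    Returns a dict mapping each time period to a dict of summary statistics.
    """
    t = np.asarray(t)
    y = np.asarray(y, dtype=float)
    summary = {}
    for period in np.unique(t):
        y_t = y[t == period]  # all responses observed in this period
        summary[period.item()] = {
            "n": y_t.size,
            "mean": y_t.mean(),
            "median": float(np.median(y_t)),
            "sd": y_t.std(ddof=1) if y_t.size > 1 else float("nan"),
            "min": y_t.min(),
            "max": y_t.max(),
        }
    return summary
```

A steady drift in the period means, for example, suggests including time-specific terms in the model.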

Multiple time series plots

• Plot of the response, $y_{it}$, versus time t.
• Serially connect observations among common subjects.
• This graph helps us:
  – detect patterns over time,
  – detect unusual observations and/or subjects,
  – visualize the heterogeneity.
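The key data step for a multiple time series plot is to split the long-format data into one time-ordered series per subject, so each can be drawn as a connected line. A minimal sketch in plain Python; the name `series_by_subject` is hypothetical.

```python
from collections import defaultdict

def series_by_subject(subject, t, y):
    """Group long-format panel data into one time-ordered series per subject.

    Returns {subject: (times, responses)} with times sorted in increasing order.
    """
    groups = defaultdict(list)
    for i, time, resp in zip(subject, t, y):
        groups[i].append((time, resp))
    out = {}
    for i, pairs in groups.items():
        pairs.sort()  # order each subject's observations by time
        times, resps = zip(*pairs)
        out[i] = (list(times), list(resps))
    return out
```

With matplotlib, for example, calling `plt.plot(times, resps)` once per subject serially connects each subject's observations.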

Scatterplots with symbols

• Plot of the response, $y_{it}$, versus an explanatory variable, $x_{itj}$.
• Use a plotting symbol to encode the subject number “i”.
• See the relationship between the response and explanatory variable yet account for the varying intercepts.
• Variation: if there is a separation in the x's, such as increasing over time,
  – then we can serially connect the observations;
  – we do not need a separate plotting symbol for each subject.

Basic added variable plot

• This is a plot of $y_{it} - \bar{y}_i$ versus $x_{itj} - \bar{x}_{ij}$, where $\bar{y}_i$ and $\bar{x}_{ij}$ are the averages of subject i's responses and jth explanatory variable over time.

• Motivation: Typically, the subject-specific parameters account for a large portion of the variability.

• This plot allows us to visualize the relationship between y and each x, without forcing our eye to adjust for the heterogeneity of the subject-specific intercepts.
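The coordinates of the basic added variable plot are within-subject deviations. A minimal sketch, assuming NumPy and long-format arrays; the helper name `within_deviations` is hypothetical.

```python
import numpy as np

def within_deviations(subject, v):
    """Return v_it minus the average of v over time for the same subject.

    Applied to y and to x_j, this gives the vertical and horizontal
    coordinates of the basic added variable plot.
    """
    subject = np.asarray(subject)
    v = np.asarray(v, dtype=float)
    _, idx = np.unique(subject, return_inverse=True)  # subject -> 0..n-1
    sums = np.bincount(idx, weights=v)
    counts = np.bincount(idx)
    return v - (sums / counts)[idx]  # subtract each subject's own average
```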

Trellis Plot

[Figure: trellis plot of CCPD (vertical axis) versus YEAR (horizontal axis), one panel per jurisdiction: PR, UN, VI, MD, ID, SD, IA, WY, ND, WV, MS, MT, OR, AR, WA, NM, WI, OK, KY, MN, VT, IN, UT, ME, NE, GA, TN, OH, NC, KS, RI, VA, MA, NH, SC, CO, AK, MO, MI, AL, LA, DE, NY, IL, AZ, CT, TX, FL, PA, NJ, CA, HI, NV, DC.]

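2.3 Estimation and inference

Least squares estimates

• By the Gauss-Markov theorem, the best linear unbiased estimates are the ordinary least squares (ols) estimates.
• These are given by

$$\mathbf{b} = \left( \sum_{i=1}^{n} \sum_{t=1}^{T_i} (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)(\mathbf{x}_{it} - \bar{\mathbf{x}}_i)' \right)^{-1} \sum_{i=1}^{n} \sum_{t=1}^{T_i} (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)(y_{it} - \bar{y}_i)$$

• and

$$a_i = \bar{y}_i - \bar{\mathbf{x}}_i' \mathbf{b}.$$

• Here, $\bar{y}_i$ and $\bar{\mathbf{x}}_i$ are averages of $\{y_{it}\}$ and $\{\mathbf{x}_{it}\}$ over time.
• Time-constant x's prevent one from getting unique estimates of b: for such a variable, $x_{it} - \bar{x}_i = 0$ for every t, so the matrix being inverted is singular.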
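The formulas for b and $a_i$ translate directly into code. The sketch below assumes NumPy and long-format inputs (a label array `subject`, an N × K array `X`, a response `y`); the function name `fixed_effects_ols` is hypothetical. It solves only a K × K system, yet by the Frisch-Waugh-Lovell result it reproduces the slopes and intercepts of a regression on n subject dummy variables plus X.

```python
import numpy as np

def fixed_effects_ols(subject, X, y):
    """Basic fixed-effects ols estimates via the within (time-demeaned) transformation.

    subject : length-N array of subject labels
    X       : N x K array of explanatory variables (each must vary over time)
    y       : length-N response
    Returns (b, a): the K slope estimates and a dict {subject: a_i}.
    """
    subject = np.asarray(subject)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    labels, idx = np.unique(subject, return_inverse=True)
    counts = np.bincount(idx)
    # Subject averages: x-bar_i (n x K) and y-bar_i (n,)
    xbar = np.column_stack(
        [np.bincount(idx, weights=X[:, k]) for k in range(X.shape[1])]
    ) / counts[:, None]
    ybar = np.bincount(idx, weights=y) / counts
    Xd = X - xbar[idx]  # x_it - x-bar_i
    yd = y - ybar[idx]  # y_it - y-bar_i
    # Only a K x K system is solved, even though there are n + K parameters
    b = np.linalg.solve(Xd.T @ Xd, Xd.T @ yd)
    a = ybar - xbar @ b  # a_i = y-bar_i - x-bar_i' b
    return b, dict(zip(labels.tolist(), a))
```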

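Estimation details

• Although there are n + K unknown parameters, the calculation of the ols estimates requires inversion of only a K × K matrix.
• The ols estimate of $\boldsymbol{\beta}$ can also be expressed as a weighted average of estimates of subject-specific parameters.
  – Suppose that all parameters are subject-specific, so that the model is $y_{it} = \alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta}_i + \varepsilon_{it}$.
  – The ols estimate of $\boldsymbol{\beta}_i$ turns out to be

$$\mathbf{b}_i = \left( \sum_{t=1}^{T_i} (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)(\mathbf{x}_{it} - \bar{\mathbf{x}}_i)' \right)^{-1} \sum_{t=1}^{T_i} (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)(y_{it} - \bar{y}_i).$$

  – Define the weighting matrix

$$\mathbf{W}_i = \sum_{t=1}^{T_i} (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)(\mathbf{x}_{it} - \bar{\mathbf{x}}_i)'.$$

  – With this weight, we can express the ols estimate of $\boldsymbol{\beta}$ as

$$\mathbf{b} = \left( \sum_{i=1}^{n} \mathbf{W}_i \right)^{-1} \sum_{i=1}^{n} \mathbf{W}_i \mathbf{b}_i,$$

    a weighted average of subject-specific parameter estimates.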
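A quick numerical check of the weighted-average representation: since $\mathbf{W}_i \mathbf{b}_i$ equals $\sum_t (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)(y_{it} - \bar{y}_i)$, both expressions agree. This sketch assumes NumPy and that every $\mathbf{W}_i$ is invertible (each subject has $T_i > K$ and time-varying x's); the name `weighted_average_form` is hypothetical.

```python
import numpy as np

def weighted_average_form(subject, X, y):
    """Return (b_pooled, b_weighted) to illustrate b = (sum W_i)^{-1} sum W_i b_i."""
    subject = np.asarray(subject)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    K = X.shape[1]
    W_sum = np.zeros((K, K))
    Wb_sum = np.zeros(K)
    xy_sum = np.zeros(K)
    for i in np.unique(subject):
        rows = subject == i
        Xd = X[rows] - X[rows].mean(axis=0)
        yd = y[rows] - y[rows].mean()
        W_i = Xd.T @ Xd                        # weighting matrix W_i
        b_i = np.linalg.solve(W_i, Xd.T @ yd)  # subject-specific ols slope b_i
        W_sum += W_i
        Wb_sum += W_i @ b_i
        xy_sum += Xd.T @ yd
    b_pooled = np.linalg.solve(W_sum, xy_sum)    # fixed-effects ols estimate b
    b_weighted = np.linalg.solve(W_sum, Wb_sum)  # weighted average of the b_i
    return b_pooled, b_weighted
```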

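Properties of estimates

• Both $a_i$ and $\mathbf{b}$ have the usual properties of ols regression estimators.
  – They are unbiased estimators.
  – By the Gauss-Markov theorem, they have minimum variance among the class of linear unbiased estimators.

• To see this, consider an expression of the ols estimate of $\boldsymbol{\beta}$ (the terms involving $\bar{y}_i$ drop out because $\sum_t (\mathbf{x}_{it} - \bar{\mathbf{x}}_i) = \mathbf{0}$):

$$\mathbf{b} = \sum_{i=1}^{n} \sum_{t=1}^{T_i} \mathbf{W}_{1,it}\, y_{it}, \qquad \mathbf{W}_{1,it} = \left( \sum_{j=1}^{n} \mathbf{W}_j \right)^{-1} (\mathbf{x}_{it} - \bar{\mathbf{x}}_i).$$

  – That is, b is a linear combination of responses.
  – If the responses are normally distributed, then so is b.

• The variance of b turns out to be

$$\mathrm{Var}\, \mathbf{b} = \sigma^2 \left( \sum_{i=1}^{n} \mathbf{W}_i \right)^{-1}.$$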

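ANOVA and standard errors

• This follows the usual regression set-up.

• We define the residuals as $e_{it} = y_{it} - (a_i + \mathbf{x}_{it}'\mathbf{b})$.

• The error sum of squares is Error SS $= \sum_{it} e_{it}^2$.

• The mean square error is

$$s^2 = \frac{\text{Error SS}}{N - (n + K)} = \text{MS Error},$$

where $N = \sum_{i=1}^{n} T_i$ is the total number of observations;

• the residual standard deviation is s.

• The standard errors of the slope estimates are the square roots of the diagonal elements of the estimated variance matrix

$$\widehat{\mathrm{Var}}\, \mathbf{b} = s^2 \left( \sum_{i=1}^{n} \mathbf{W}_i \right)^{-1}.$$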
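The ANOVA quantities follow directly from the within-transformed data. A minimal sketch, assuming NumPy and long-format inputs; the function name `fe_standard_errors` is hypothetical. It uses the fact that $y_{it} - (a_i + \mathbf{x}_{it}'\mathbf{b}) = (y_{it} - \bar{y}_i) - (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)'\mathbf{b}$.

```python
import numpy as np

def fe_standard_errors(subject, X, y):
    """Slope estimates, s^2 and standard errors for the basic fixed-effects model.

    Uses MS Error = Error SS / (N - (n + K)) and Var b = s^2 (sum_i W_i)^{-1}.
    """
    subject = np.asarray(subject)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N, K = X.shape
    labels, idx = np.unique(subject, return_inverse=True)
    n = labels.size
    Xd, yd = X.copy(), y.copy()
    for g in range(n):
        rows = idx == g
        Xd[rows] -= X[rows].mean(axis=0)
        yd[rows] -= y[rows].mean()
    W = Xd.T @ Xd                     # sum of the W_i
    b = np.linalg.solve(W, Xd.T @ yd)
    e = yd - Xd @ b                   # e_it = y_it - (a_i + x_it' b)
    s2 = (e @ e) / (N - (n + K))      # MS Error
    se = np.sqrt(np.diag(s2 * np.linalg.inv(W)))
    return b, s2, se
```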

Consistency of estimates

• As the number of subjects (n) gets large, b approaches $\boldsymbol{\beta}$.

– Specifically, weak consistency means approaching (convergence) in probability.

– This is a direct result of the unbiasedness and an assumption that $\sum_i \mathbf{W}_i$ grows without bound, so that $\mathrm{Var}\, \mathbf{b} = \sigma^2 (\sum_i \mathbf{W}_i)^{-1}$ shrinks to zero.

• As n gets large, the intercept estimates $a_i$ do not approach $\alpha_i$.

– They are inconsistent.

– Intuitively, this is because we assume that the number of repeated measurements of $\alpha_i$ is $T_i$, a bounded number.

Other large sample approximations

• Typically, the number of subjects is large relative to the number of time periods observed.
• Thus, in deriving large sample approximations of the sampling distributions of estimators, assume that $n \to \infty$ although T remains fixed.
• With this assumption, we have a central limit theorem for the slope estimator.
  – That is, b is approximately normally distributed even though the responses are not.
  – The approximation improves as n becomes large.

• Unlike the usual regression set-up, this is not true for the intercepts. If the responses are not normally distributed, then the $a_i$ are not even approximately normal.

2.4 Model specification and diagnostics

• Pooling Test

• Added variable plots

• Influence diagnostics

• Cross-sectional correlations

• Heteroscedasticity

Pooling test

• Test whether the intercepts take on a common value, say $\alpha$.

• Using notation, we wish to test the null hypothesis

$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_n = \alpha.$$

• This can be done using the following partial F- (Chow) test:

– Run the “full model” $y_{it} = \alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}$ to get Error SS and $s^2$.

– Run the “reduced model” $y_{it} = \alpha + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}$ to get (Error SS)$_{\text{reduced}}$.

– Compute the partial F-statistic,

$$F\text{-ratio} = \frac{(\text{Error SS})_{\text{reduced}} - \text{Error SS}}{(n - 1)\, s^2}.$$

– Reject $H_0$ if the F-ratio exceeds a quantile from an F-distribution with numerator degrees of freedom $df_1 = n - 1$ and denominator degrees of freedom $df_2 = N - (n + K)$.
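The pooling test amounts to fitting two ols regressions and comparing their error sums of squares. A minimal sketch, assuming NumPy and long-format inputs; the name `pooling_f_test` is hypothetical, and the full model is fit with explicit subject indicator variables for clarity rather than speed.

```python
import numpy as np

def pooling_f_test(subject, X, y):
    """Partial F (Chow) statistic for H0: alpha_1 = ... = alpha_n.

    Full model: subject-specific intercepts; reduced model: one common intercept.
    Returns (F, df1, df2); compare F with a quantile of F(df1, df2).
    """
    subject = np.asarray(subject)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N, K = X.shape
    labels, idx = np.unique(subject, return_inverse=True)
    n = labels.size
    D = np.eye(n)[idx]  # N x n matrix of subject indicator variables

    def error_ss(design):
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return resid @ resid

    ss_full = error_ss(np.hstack([D, X]))
    ss_reduced = error_ss(np.hstack([np.ones((N, 1)), X]))
    df1, df2 = n - 1, N - (n + K)
    s2 = ss_full / df2
    F = (ss_reduced - ss_full) / (df1 * s2)
    return F, df1, df2
```

If SciPy is available, `scipy.stats.f.sf(F, df1, df2)` gives the corresponding p-value.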

Added variable plot

• An added variable plot (also called a partial regression plot) is a standard graphical device used in regression analysis.

• Purpose: To view the relationship between a response and an explanatory variable, after controlling for the linear effects of other explanatory variables.

• Added variable plots allow us to visualize the relationship between y and each x, without forcing our eye to adjust for the differences induced by the other x’s.

• The basic added variable plot is a special case.

Procedure for making an added variable plot

• Select an explanatory variable, say $x_j$.
• Run a regression of y on the other explanatory variables (omitting $x_j$).
  – Calculate the residuals from this regression. Call these residuals $e_1$.
• Run a regression of $x_j$ on the other explanatory variables (omitting $x_j$).
  – Calculate the residuals from this regression. Call these residuals $e_2$.
• The plot of $e_1$ versus $e_2$ is an added variable plot.
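The three steps above can be sketched with two least squares fits. This assumes NumPy; the name `added_variable_residuals` is hypothetical. In the fixed-effects model, the "other explanatory variables" include the subject indicator variables. A useful check (the Frisch-Waugh-Lovell result) is that the slope of $e_1$ on $e_2$ equals $b_j$ from the full regression.

```python
import numpy as np

def added_variable_residuals(X_other, x_j, y):
    """Residuals e1 (y on the other x's) and e2 (x_j on the other x's).

    X_other must contain every other regressor, including the intercept
    (or, in the fixed-effects model, the subject indicator variables).
    """
    X_other = np.asarray(X_other, dtype=float)

    def resid(v):
        coef, *_ = np.linalg.lstsq(X_other, v, rcond=None)
        return v - X_other @ coef

    e1 = resid(np.asarray(y, dtype=float))
    e2 = resid(np.asarray(x_j, dtype=float))
    return e1, e2
```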

Correlations and added variable plots

• Let corr($e_1, e_2$) be the correlation between the two sets of residuals.
  – It is related to the t-statistic of $x_j$, $t(b_j)$, from the full regression equation (including $x_j$) through

$$\mathrm{corr}(e_1, e_2) = \frac{t(b_j)}{\sqrt{t(b_j)^2 + N - (n + K)}}.$$

  – Here, n + K is the number of regression coefficients in the full regression equation and N is the number of observations.
• Thus, the t-statistic can be used to determine the correlation coefficient of the added variable plot without running the three-step procedure.
• However, unlike correlation coefficients, the added variable plot allows us to visualize potential nonlinear relationships between y and $x_j$.
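The identity between the t-statistic and the added variable correlation can be checked numerically. A minimal sketch, assuming NumPy; the helper names `avp_correlation_from_t` and `t_statistic_last` are hypothetical, and the error degrees of freedom are N minus the number of columns in the full design.

```python
import numpy as np

def avp_correlation_from_t(t_stat, df_error):
    """corr(e1, e2) implied by t(b_j): t / sqrt(t^2 + df_error), df_error = N - (n + K)."""
    return t_stat / np.sqrt(t_stat ** 2 + df_error)

def t_statistic_last(design, y):
    """t-statistic of the last column's coefficient in an ols fit of y on design.

    Returns (t, error degrees of freedom).
    """
    N, p = design.shape
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    s2 = resid @ resid / (N - p)
    var = s2 * np.linalg.inv(design.T @ design)
    return coef[-1] / np.sqrt(var[-1, -1]), N - p
```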

Influence diagnostics

• Influence diagnostics allow the analyst to understand the impact of individual observations and/or subjects on the estimated model.
• Traditional diagnostic statistics are observation-level:
  – of less interest in panel data analysis;
  – the effect of unusual observations is absorbed by subject-specific parameters.
• Of greater interest is the impact that an entire subject has on the population parameters.
• We use the statistic

$$B_i(\mathbf{b}) = \frac{(\mathbf{b} - \mathbf{b}_{(i)})' \left( \sum_{j=1}^{n} \mathbf{W}_j \right) (\mathbf{b} - \mathbf{b}_{(i)})}{K}.$$

• Here, $\mathbf{b}_{(i)}$ is the ols estimate b calculated with the ith subject omitted.
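$B_i(\mathbf{b})$ can be computed by refitting the within estimator with each subject left out in turn. A minimal sketch, assuming NumPy and long-format inputs; the names `within_slope` and `subject_influence` are hypothetical. The weighting matrix in the quadratic form is the full-data $\sum_j \mathbf{W}_j$.

```python
import numpy as np

def within_slope(subject, X, y):
    """Fixed-effects ols slope b and sum_i W_i, from time-demeaned data."""
    Xd = np.array(X, dtype=float)
    yd = np.array(y, dtype=float)
    for i in np.unique(subject):
        rows = subject == i
        Xd[rows] -= Xd[rows].mean(axis=0)
        yd[rows] -= yd[rows].mean()
    W = Xd.T @ Xd
    return np.linalg.solve(W, Xd.T @ yd), W

def subject_influence(subject, X, y):
    """B_i(b) = (b - b_(i))' (sum_j W_j) (b - b_(i)) / K for each subject i."""
    subject = np.asarray(subject)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    K = X.shape[1]
    b, W = within_slope(subject, X, y)
    B = {}
    for i in np.unique(subject):
        keep = subject != i  # omit every observation of subject i
        b_i, _ = within_slope(subject[keep], X[keep], y[keep])
        d = b - b_i
        B[i.item()] = float(d @ W @ d) / K
    return B
```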

Calibration of influence diagnostic

• The panel data influence diagnostic is similar to Cook's distance for regression.
  – Cook's distance is calculated at the observation level, whereas $B_i(\mathbf{b})$ is at the subject level.
• The statistic $B_i(\mathbf{b})$ has an approximate $\chi^2$ (chi-square) distribution with K degrees of freedom.
  – Observations with a “large” value of $B_i(\mathbf{b})$ may be influential on the parameter estimates.
  – Use quantiles of the $\chi^2$ to quantify the adjective “large.”
• Influential observations warrant further investigation; they may need correction, additional variable specification to accommodate differences, or deletion from the data set.

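Cross-sectional correlations

• The basic model assumes independence between subjects.
  – Looking at a cross-section of subjects, we assume zero cross-sectional correlation, that is, $\rho_{ij} = \mathrm{Corr}(y_{it}, y_{jt}) = 0$ for $i \ne j$.
• Suppose that the “true” model is $y_{it} = \lambda_t + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}$, where $\lambda_t$ is a random temporal effect that is common to all subjects.
  – This yields $\mathrm{Var}\, y_{it} = \sigma_\lambda^2 + \sigma_\varepsilon^2$.
  – The covariance between observations at the same time but from different subjects is $\mathrm{Cov}(y_{it}, y_{jt}) = \sigma_\lambda^2$, $i \ne j$.
  – Thus, the cross-sectional correlation is

$$\mathrm{Corr}(y_{it}, y_{jt}) = \frac{\sigma_\lambda^2}{\sigma_\lambda^2 + \sigma_\varepsilon^2}.$$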

Testing for cross-sectional correlations

• To test $H_0: \rho_{ij} = 0$ for all $i \ne j$, assume that $T_i = T$.

– Calculate model residuals $\{e_{it}\}$.

– For each subject i, calculate the ranks of each residual.
  • That is, define $\{r_{i,1}, \ldots, r_{i,T}\}$ to be the ranks of $\{e_{i,1}, \ldots, e_{i,T}\}$.
  • Ranks will vary from 1 to T, so the average rank is $(T+1)/2$.

– For the ith and jth subject, calculate the rank correlation coefficient (Spearman's correlation)

$$sr_{ij} = \frac{\sum_{t=1}^{T} \left(r_{i,t} - (T+1)/2\right)\left(r_{j,t} - (T+1)/2\right)}{\sum_{t=1}^{T} \left(r_{i,t} - (T+1)/2\right)^2}.$$

– Calculate the average Spearman's correlation and the average squared Spearman's correlation

$$R_{AVE} = \frac{1}{n(n-1)/2} \sum_{\{i<j\}} sr_{ij}, \qquad R_{AVE}^2 = \frac{1}{n(n-1)/2} \sum_{\{i<j\}} sr_{ij}^2.$$

  • Here, $\sum_{\{i<j\}}$ means sum over $i = 1, \ldots, j-1$ and $j = 2, \ldots, n$.
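$R_{AVE}$ and $R_{AVE}^2$ can be computed from an n × T matrix of residuals. A minimal sketch, assuming NumPy, a balanced panel and no tied residuals within a subject (ranks are assigned by sort order); the name `spearman_cross_section` is hypothetical. Without ties, every subject has the same denominator, $T(T^2-1)/12$.

```python
import numpy as np

def spearman_cross_section(E):
    """Average Spearman correlation R_AVE and average squared correlation R^2_AVE.

    E is an n x T array of residuals e_it from a balanced panel (T_i = T).
    """
    E = np.asarray(E, dtype=float)
    n, T = E.shape
    ranks = E.argsort(axis=1).argsort(axis=1) + 1  # r_{i,t} in 1..T
    c = ranks - (T + 1) / 2                        # centered ranks
    denom = (c[0] ** 2).sum()                      # equals T(T^2 - 1)/12 without ties
    S = (c @ c.T) / denom                          # S[i, j] = sr_ij
    sr = S[np.triu_indices(n, k=1)]                # pairs with i < j
    scale = 2 / (n * (n - 1))                      # 1 / (n(n-1)/2)
    return scale * sr.sum(), scale * (sr ** 2).sum()
```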

Calibration of cross-sectional correlation test

• We compare $R_{AVE}^2$ to a distribution that is a weighted sum of chi-square random variables (Frees, 1995).

• Specifically, define

$$Q = a(T)\left(\chi_1^2 - (T-1)\right) + b(T)\left(\chi_2^2 - T(T-3)/2\right).$$

– Here, $\chi_1^2$ and $\chi_2^2$ are independent chi-square random variables with $T-1$ and $T(T-3)/2$ degrees of freedom, respectively.

– The constants are

$$a(T) = \frac{4(T+2)}{5(T-1)^2(T+1)}$$

and

$$b(T) = \frac{2(5T+6)}{5T(T-1)(T+1)}.$$
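Quantiles of Q are easy to approximate by simulation. A minimal Monte Carlo sketch, assuming NumPy and T > 3 (so that $T(T-3)/2 > 0$); the names `q_constants` and `simulate_q` are hypothetical, and simulated cut-offs carry Monte Carlo error.

```python
import numpy as np

def q_constants(T):
    """Weights a(T) and b(T) of the Q distribution (requires T > 3)."""
    a = 4 * (T + 2) / (5 * (T - 1) ** 2 * (T + 1))
    b = 2 * (5 * T + 6) / (5 * T * (T - 1) * (T + 1))
    return a, b

def simulate_q(T, size=100_000, seed=0):
    """Monte Carlo draws of Q = a(T)(chi2_1 - (T-1)) + b(T)(chi2_2 - T(T-3)/2)."""
    a, b = q_constants(T)
    df1, df2 = T - 1, T * (T - 3) / 2
    rng = np.random.default_rng(seed)
    chi1 = rng.chisquare(df1, size)  # chi-square with T-1 degrees of freedom
    chi2 = rng.chisquare(df2, size)  # independent chi-square, T(T-3)/2 degrees of freedom
    return a * (chi1 - df1) + b * (chi2 - df2)
```

For example, `np.quantile(simulate_q(5), 0.95)` approximates the 95th percentile of Q when T = 5. Each chi-square term is centered at its mean, so Q has mean zero.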

Calculation short-cuts

• Rule of thumb for cut-offs for the Q distribution.

• To calculate $R_{AVE}^2$:

– Define

$$Z_{i,t,u} = \frac{12}{T^3 - T} \left(r_{i,t} - (T+1)/2\right)\left(r_{i,u} - (T+1)/2\right).$$

– For each t, u, calculate $\sum_i Z_{i,t,u}$ and $\sum_i Z_{i,t,u}^2$.

– We have

$$R_{AVE}^2 = \frac{1}{n(n-1)} \sum_{\{t,u\}} \left[ \left( \sum_i Z_{i,t,u} \right)^2 - \sum_i Z_{i,t,u}^2 \right].$$

– Here, $\sum_{\{t,u\}}$ means sum over $t = 1, \ldots, T$ and $u = 1, \ldots, T$.

– Although more complex in appearance, this is a much faster computational form for $R_{AVE}^2$.

– Main drawback: the asymptotic distribution is only available for balanced data.
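The short-cut form can be coded with array broadcasting. Its cost grows like $nT^2$ rather than the $n^2T$ of looping over all pairs of subjects. A minimal sketch, assuming NumPy, a balanced panel and no ties; the name `r2_ave_shortcut` is hypothetical.

```python
import numpy as np

def r2_ave_shortcut(E):
    """R^2_AVE via Z_{i,t,u} = 12/(T^3 - T) (r_it - (T+1)/2)(r_iu - (T+1)/2).

    E is an n x T array of residuals from a balanced panel.
    """
    E = np.asarray(E, dtype=float)
    n, T = E.shape
    c = E.argsort(axis=1).argsort(axis=1) + 1 - (T + 1) / 2  # centered ranks
    Z = 12 / (T ** 3 - T) * c[:, :, None] * c[:, None, :]    # Z[i, t, u]
    sum_Z = Z.sum(axis=0)                                    # sum_i Z_{i,t,u}
    sum_Z2 = (Z ** 2).sum(axis=0)                            # sum_i Z_{i,t,u}^2
    return (sum_Z ** 2 - sum_Z2).sum() / (n * (n - 1))
```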

Heteroscedasticity

• Carroll and Ruppert (1988) provide a broad treatment.

• Here is a test due to Breusch and Pagan (1980).

– $H_a$: $\mathrm{Var}\, \varepsilon_{it} = \sigma^2 + \boldsymbol{\gamma}' \mathbf{w}_{it}$, where $\mathbf{w}_{it}$ is a known vector of weighting variables and $\boldsymbol{\gamma}$ is a p-dimensional vector of parameters.

– $H_0$: $\mathrm{Var}\, \varepsilon_{it} = \sigma^2$. The procedure is:

• Fit a regression model and calculate the model residuals, $\{r_{it}\}$.

• Calculate squared standardized residuals, $r_{it}^{*2} = r_{it}^2 / (\text{Error SS}/N)$.

• Fit a regression model of $r_{it}^{*2}$ on $\mathbf{w}_{it}$.

• The test statistic is $LM = (\text{Regress SS}_w)/2$, where Regress SS$_w$ is the regression sum of squares from the model fit in the third step.

• Reject the null hypothesis if LM exceeds a percentile from a chi-square distribution with p degrees of freedom. The percentile is one minus the significance level of the test.
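The Breusch-Pagan steps after the initial fit can be sketched as follows, assuming NumPy, residuals from an already-fitted mean model, and a weighting-variable array `w` without a constant column (an intercept is added in the auxiliary regression); the name `breusch_pagan_lm` is hypothetical.

```python
import numpy as np

def breusch_pagan_lm(resid, w):
    """LM = (regression SS from regressing r*^2_it on w_it) / 2.

    resid : residuals r_it from the fitted mean model (length N)
    w     : N x p array of known weighting variables (no constant column)
    Under H0 the statistic is approximately chi-square with p degrees of freedom.
    """
    r = np.asarray(resid, dtype=float)
    N = r.size
    w = np.asarray(w, dtype=float).reshape(N, -1)
    r_star2 = r ** 2 / (r @ r / N)  # squared standardized residuals (Error SS = r'r)
    design = np.column_stack([np.ones(N), w])
    coef, *_ = np.linalg.lstsq(design, r_star2, rcond=None)
    fitted = design @ coef
    regress_ss = ((fitted - r_star2.mean()) ** 2).sum()
    return regress_ss / 2
```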

2.5 Model extensions

• In panel data, subjects are measured repeatedly over time. Panel data analysis is useful for studying subject changes over time.
• Repeated measurements of a subject tend to be intercorrelated.

– Up to this point, we have used time-varying covariates to account for the presence of time in the mean response.

– However, as in time series analysis, it is also useful to measure the tendencies in time patterns through a correlation structure.

Timing of observations

• We now specify the time periods when the observations are made.
  – We assume that we have at most T observations on each subject.
    • These observations are made at time periods $t_1, t_2, \ldots, t_T$.
  – Each subject has observations made at a subset of these T time periods, labeled $t_1, t_2, \ldots, t_{T_i}$.
  – The subset may vary by subject and thus could be denoted by $t_{1i}, t_{2i}, \ldots, t_{T_i i}$.
  – For brevity, we use the simpler notation scheme and drop the second i subscript.
• This framework, although notationally complex, allows for missing data and incomplete observations.

Temporal covariance matrix

• For a full set of observations, let R denote the $T \times T$ temporal (time) variance-covariance matrix.
  – This is defined by $\mathbf{R} = \mathrm{Var}\, \boldsymbol{\varepsilon}_i$.
• Let $R_{rs} = \mathrm{Cov}(\varepsilon_{ir}, \varepsilon_{is})$ denote the element in the rth row and sth column of R.
  – There are at most $T(T+1)/2$ unknown elements of R.
  – Denote this dependence of R on parameters using $\mathbf{R}(\boldsymbol{\tau})$. Here, $\boldsymbol{\tau}$ is the vector of unknown parameters of R.

• For the ith subject, we have $\mathrm{Var}\, \boldsymbol{\varepsilon}_i = \mathbf{R}_i(\boldsymbol{\tau})$, a $T_i \times T_i$ matrix.
  – The matrix $\mathbf{R}_i(\boldsymbol{\tau})$ can be determined by removing certain rows and columns of the matrix $\mathbf{R}(\boldsymbol{\tau})$.
  – We assume that $\mathbf{R}_i(\boldsymbol{\tau})$ is positive-definite and only depends on i through its dimension.

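Special cases of R

• $\mathbf{R} = \sigma^2 \mathbf{I}$, where I is a $T \times T$ identity matrix. This is the case of no serial correlation.
• $\mathbf{R} = \sigma^2 \left( (1-\rho)\mathbf{I} + \rho\mathbf{J} \right)$, where J is a $T \times T$ matrix of 1's. This is the uniform correlation model (also called “compound symmetry”).
  – Consider the model $y_{it} = \alpha_i + \varepsilon_{it}$, where $\alpha_i$ is a random cross-sectional effect.
  – This yields $R_{tt} = \mathrm{Var}\, y_{it} = \sigma_\alpha^2 + \sigma_\varepsilon^2$.
  – For $r \ne s$, consider $R_{rs} = \mathrm{Cov}(y_{ir}, y_{is}) = \sigma_\alpha^2$.
    • To write this in terms of $\sigma^2 = \sigma_\alpha^2 + \sigma_\varepsilon^2$, note that $\mathrm{Corr}(y_{ir}, y_{is}) = \sigma_\alpha^2 / (\sigma_\alpha^2 + \sigma_\varepsilon^2) = \rho$.
    • Thus, $R_{rs} = \sigma^2 \rho$.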

More special cases of R

• $R_{rs} = \sigma^2 \exp(-\phi\, |t_r - t_s|)$.
  – In the case of observations equally spaced in time, we may assume that $t_{r+1} - t_r = 1$. Thus, $R_{rs} = \sigma^2 \rho^{|r-s|}$, where $\rho = \exp(-\phi)$.
  – This is the autoregressive model of order one, denoted by AR(1).

• More generally, for observations equally spaced in time, assume $\mathrm{Cov}(\varepsilon_{ir}, \varepsilon_{is}) = \mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{ik})$ for $|r-s| = |j-k|$.
  – This is a stationarity assumption.
  – It implies homoscedasticity.
  – There are only T unknown elements of R, a “Toeplitz” matrix.

• Assume only homoscedasticity.
  – There are $1 + T(T-1)/2$ unknown elements of R, corresponding to the variance and the correlation matrix.

• Make no assumptions on R.
  – There are $T(T+1)/2$ unknown elements of R.
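The special cases of R are simple to construct, and $\mathbf{R}_i$ follows by keeping only the rows and columns for the periods a subject was observed. A minimal sketch, assuming NumPy and 0-based period indices; all function names are hypothetical.

```python
import numpy as np

def r_uniform(T, sigma2, rho):
    """Compound symmetry: sigma^2 ((1 - rho) I + rho J)."""
    return sigma2 * ((1 - rho) * np.eye(T) + rho * np.ones((T, T)))

def r_exponential(times, sigma2, phi):
    """R_rs = sigma^2 exp(-phi |t_r - t_s|); unit-spaced times give AR(1) with rho = exp(-phi)."""
    t = np.asarray(times, dtype=float)
    return sigma2 * np.exp(-phi * np.abs(t[:, None] - t[None, :]))

def r_toeplitz(gammas):
    """Stationary R with R_rs = gammas[|r - s|]: T unknown elements."""
    g = np.asarray(gammas, dtype=float)
    lags = np.abs(np.arange(g.size)[:, None] - np.arange(g.size)[None, :])
    return g[lags]

def r_subject(R, observed):
    """R_i: keep the rows and columns of R for the periods subject i was observed."""
    idx = np.asarray(observed)
    return R[np.ix_(idx, idx)]
```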

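Subject-specific slopes

• Let one or more slopes vary by subject.
• The fixed effects linear panel data model is

$$y_{it} = \mathbf{z}_{it}'\boldsymbol{\alpha}_i + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}.$$

  – The q explanatory variables are $\mathbf{z}_{it} = (z_{it1}, z_{it2}, \ldots, z_{itq})'$, a vector of dimension $q \times 1$.
  – The subject-specific parameters are $\boldsymbol{\alpha}_i = (\alpha_{i1}, \ldots, \alpha_{iq})'$, a vector of dimension $q \times 1$.
  – This is short-hand notation for the model

$$y_{it} = \alpha_{i1} z_{it1} + \cdots + \alpha_{iq} z_{itq} + \beta_1 x_{it1} + \cdots + \beta_K x_{itK} + \varepsilon_{it}.$$

• The responses between subjects are independent.
• We allow for temporal correlation through the assumption that $\mathrm{Var}\, \boldsymbol{\varepsilon}_i = \mathbf{R}_i(\boldsymbol{\tau})$.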

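Assumptions of the Fixed Effects Linear Longitudinal Data Model

• $\mathrm{E}\, \mathbf{y}_i = \mathbf{Z}_i \boldsymbol{\alpha}_i + \mathbf{X}_i \boldsymbol{\beta}$.
• $\{x_{it,1}, \ldots, x_{it,K}\}$ and $\{z_{it,1}, \ldots, z_{it,q}\}$ are nonstochastic.
• $\mathrm{Var}\, \mathbf{y}_i = \mathbf{R}_i(\boldsymbol{\tau}) = \mathbf{R}_i$.
• $\{\mathbf{y}_i\}$ are independent random vectors.
• $\{y_{it}\}$ are normally distributed.

Here, $\mathbf{y}_i = (y_{i1}, \ldots, y_{iT_i})'$ is the $T_i \times 1$ vector of responses,

$$\mathbf{X}_i = \begin{pmatrix} \mathbf{x}_{i1}' \\ \mathbf{x}_{i2}' \\ \vdots \\ \mathbf{x}_{iT_i}' \end{pmatrix} = \begin{pmatrix} x_{i1,1} & x_{i1,2} & \cdots & x_{i1,K} \\ x_{i2,1} & x_{i2,2} & \cdots & x_{i2,K} \\ \vdots & \vdots & \ddots & \vdots \\ x_{iT_i,1} & x_{iT_i,2} & \cdots & x_{iT_i,K} \end{pmatrix},$$

and $\mathbf{Z}_i$ is the $T_i \times q$ matrix built in the same way from the $\mathbf{z}_{it}$.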

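Least Squares Estimates

• The estimates are derived in Appendix 2A.2.

• They are given by

$$\mathbf{b}_{FE} = \left( \sum_{i=1}^{n} \mathbf{X}_i' \mathbf{R}_i^{-1/2} \mathbf{Q}_{Z,i} \mathbf{R}_i^{-1/2} \mathbf{X}_i \right)^{-1} \sum_{i=1}^{n} \mathbf{X}_i' \mathbf{R}_i^{-1/2} \mathbf{Q}_{Z,i} \mathbf{R}_i^{-1/2} \mathbf{y}_i$$

• and

$$\mathbf{a}_{FE,i} = \left( \mathbf{Z}_i' \mathbf{R}_i^{-1} \mathbf{Z}_i \right)^{-1} \mathbf{Z}_i' \mathbf{R}_i^{-1} \left( \mathbf{y}_i - \mathbf{X}_i \mathbf{b}_{FE} \right).$$

• Here,

$$\mathbf{Q}_{Z,i} = \mathbf{I}_i - \mathbf{R}_i^{-1/2} \mathbf{Z}_i \left( \mathbf{Z}_i' \mathbf{R}_i^{-1} \mathbf{Z}_i \right)^{-1} \mathbf{Z}_i' \mathbf{R}_i^{-1/2}.$$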
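The estimators $\mathbf{b}_{FE}$ and $\mathbf{a}_{FE,i}$ can be coded directly when each $\mathbf{R}_i$ is treated as known. This is a sketch under that assumption (in practice $\boldsymbol{\tau}$ would itself be estimated). It assumes NumPy, uses a symmetric inverse square root computed by eigen-decomposition, and takes per-subject lists as input; the names `inv_sqrt` and `fixed_effects_gls` are hypothetical.

```python
import numpy as np

def inv_sqrt(R):
    """Symmetric inverse square root R^{-1/2} of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(R)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def fixed_effects_gls(ys, Xs, Zs, Rs):
    """b_FE and a_FE,i for E y_i = Z_i alpha_i + X_i beta with Var y_i = R_i (R_i known).

    ys, Xs, Zs, Rs: one entry per subject, y_i (T_i,), X_i (T_i x K),
    Z_i (T_i x q) and R_i (T_i x T_i).
    """
    K = Xs[0].shape[1]
    A = np.zeros((K, K))
    c = np.zeros(K)
    for y, X, Z, R in zip(ys, Xs, Zs, Rs):
        Rh = inv_sqrt(R)
        Zt = Rh @ Z  # R_i^{-1/2} Z_i, so Zt'Zt = Z_i' R_i^{-1} Z_i
        Q = np.eye(len(y)) - Zt @ np.linalg.solve(Zt.T @ Zt, Zt.T)  # Q_{Z,i}
        A += X.T @ Rh @ Q @ Rh @ X
        c += X.T @ Rh @ Q @ Rh @ y
    b = np.linalg.solve(A, c)
    a = []
    for y, X, Z, R in zip(ys, Xs, Zs, Rs):
        Rinv = np.linalg.inv(R)
        a.append(np.linalg.solve(Z.T @ Rinv @ Z, Z.T @ Rinv @ (y - X @ b)))
    return b, a
```

These match generalized least squares on the stacked model, in which each subject's data are pre-multiplied by $\mathbf{R}_i^{-1/2}$ and ols is applied with a separate block of $\mathbf{Z}$ columns per subject.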

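Robust estimation of standard errors

• It is common practice to ignore serial correlation and heteroscedasticity, so that one assumes $\mathbf{R}_i = \sigma^2 \mathbf{I}_i$.

• Thus,

$$\mathbf{b} = \left( \sum_{i=1}^{n} \mathbf{X}_i' \mathbf{Q}_i \mathbf{X}_i \right)^{-1} \sum_{i=1}^{n} \mathbf{X}_i' \mathbf{Q}_i \mathbf{y}_i,$$

• where

$$\mathbf{Q}_i = \mathbf{I}_i - \mathbf{Z}_i \left( \mathbf{Z}_i' \mathbf{Z}_i \right)^{-1} \mathbf{Z}_i'.$$

• Huber (1967), White (1980), and Liang and Zeger (1986) suggested replacing $\mathbf{R}_i$ by $\mathbf{e}_i \mathbf{e}_i'$. Here, $\mathbf{e}_i$ is the vector of residuals. Thus, a robust standard error of $b_j$ is

$$se(b_j) = \sqrt{ j\text{th diagonal element of } \left( \sum_{i=1}^{n} \mathbf{X}_i' \mathbf{Q}_i \mathbf{X}_i \right)^{-1} \left( \sum_{i=1}^{n} \mathbf{X}_i' \mathbf{Q}_i \mathbf{e}_i \mathbf{e}_i' \mathbf{Q}_i \mathbf{X}_i \right) \left( \sum_{i=1}^{n} \mathbf{X}_i' \mathbf{Q}_i \mathbf{X}_i \right)^{-1} }.$$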
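The robust ("sandwich") standard errors can be sketched with per-subject lists, assuming NumPy; the name `robust_se` is hypothetical. The residual vector for subject i is $\mathbf{e}_i = \mathbf{y}_i - \mathbf{Z}_i\mathbf{a}_i - \mathbf{X}_i\mathbf{b} = \mathbf{Q}_i(\mathbf{y}_i - \mathbf{X}_i\mathbf{b})$, since $\mathbf{a}_i = (\mathbf{Z}_i'\mathbf{Z}_i)^{-1}\mathbf{Z}_i'(\mathbf{y}_i - \mathbf{X}_i\mathbf{b})$.

```python
import numpy as np

def robust_se(ys, Xs, Zs):
    """ols fixed-effects b (R_i = sigma^2 I_i) with Huber / White / Liang-Zeger standard errors.

    ys, Xs, Zs: one entry per subject, y_i (T_i,), X_i (T_i x K), Z_i (T_i x q).
    """
    K = Xs[0].shape[1]
    A = np.zeros((K, K))
    c = np.zeros(K)
    Qs = []
    for y, X, Z in zip(ys, Xs, Zs):
        Q = np.eye(len(y)) - Z @ np.linalg.solve(Z.T @ Z, Z.T)  # Q_i
        Qs.append(Q)
        A += X.T @ Q @ X
        c += X.T @ Q @ y
    b = np.linalg.solve(A, c)
    meat = np.zeros((K, K))
    for y, X, Q in zip(ys, Xs, Qs):
        e = Q @ (y - X @ b)     # residuals e_i = y_i - Z_i a_i - X_i b
        u = X.T @ Q @ e
        meat += np.outer(u, u)  # X_i' Q_i e_i e_i' Q_i X_i
    A_inv = np.linalg.inv(A)
    V = A_inv @ meat @ A_inv    # robust (sandwich) variance estimate
    return b, np.sqrt(np.diag(V))
```

With $\mathbf{Z}_i$ a column of ones this reduces to the basic fixed-effects model, and the robust variance remains valid when the errors are heteroscedastic or serially correlated within a subject.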
