Common Errors: How to (and Not to) Control for Unobserved Heterogeneity Lecture slides by Todd Gormley

Common Errors: How to (and Not to) Control

for Unobserved Heterogeneity

Lecture slides by Todd Gormley

Slides by Gormley Panel Data & Common Errors

The following slides are a combination of lecture slides used by Todd Gormley in his Ph.D. course on “Empirical Methods in Corporate Finance” at The Wharton School

For more details about the issues discussed in these slides, please see the below article

Gormley, T. and D. Matsa, 2014, “Common Errors: How to (and Not to) Control for Unobserved Heterogeneity,” Review of Financial Studies 27(2): 617-61.

What are these slides?


Controlling for unobserved heterogeneity is a fundamental challenge in empirical finance

Unobservable factors affect corporate policies and prices

These factors may be correlated with variables of interest

Important sources of unobserved heterogeneity are often common across groups of observations

Demand shocks across firms in an industry, differences in local economic environments, etc.

Motivation [Part 1]


E.g. consider a the firm-level estimation

where leverage is debt/assets for firm i, operating in industry j in year t, and profit is the firms net income/assets

What might be some unobservable omitted variables in this estimation?

Motivation [Part 2]

, , 0 1 , , 1 , ,i j t i j t i j tleverage profit u


Motivation [Part 3]

Oh, there are so, so many…

Managerial talent and/or risk aversion Cost of capital Industry supply and/or demand shock Regional demand shocks And so on…

Easy to think of ways these might be affect leverage and be correlated with profits

Sadly, this is easy to do with other dependent or independent variables…


Panel data to the rescue…

Thankfully, panel data can help us with a particular type of unobserved variable…

What type of unobserved variable does panel data help us with, and why?

Answer = It helps with any unobserved variable that doesn’t vary within groups of observations


Panel data and fixed effects (FE) How not to control for unobserved

heterogeneity General implications Benefits and limitations of FE model Estimating high-dimensional FE models

Outline for lecture


Panel data

Panel data = whenever you have multiple observations per unit of observation i (e.g. you observe each firm over multiple years)

Let’s assume N units i And, J observations per unit i [i.e.

balanced panel]

E.g., You observe 5,000 firms in Compustat over a twenty year period [i.e. N=5,000, J=20]


When unobserved heterogeneity is thought to be present, researcher implicitly assumes the following:

i indexes groups of observations (e.g. industry); j indexes observations within each group (e.g. firm)

yi,j = dependent variable

Xi,j = independent variable of interest

fi = unobserved group heterogeneity

= error term

The underlying model [Part 1]

, , ,i j i j i i jy X f

, i j


The following standard assumptions are made:

2

2

2

var( ) , 0

var( ) , 0

var( ) , 0

f f

X X

f

X


Simplifies some expressions, but doesn’t change any results

N groups, J observations per group, where J is small and N is largeX and ε are i.i.d. across groups, but not necessarily i.i.d. within groups


Finally, the following assumptions are made:

,

, , , ,

,

cov( , ) 0

co v( , ) co v( , ) 0

cov( , ) 0

i i j

i j i j i j i j

i j i Xf

f

X X

X f


Source of identification concern

What do these imply?Answer = Model is correct in that if we can control for f, we’ll properly identify effect of X; but if we don’t control for f there will be omitted variable bias


, , , i j i j i i jy X f

, , ,OLS OLS

i j i j i jy X u

By failing to control for group effect, fi, OLS suffers from omitted variable bias

True model is:

But OLS estimates:

OLS estimate of β is inconsistent

2ˆ XfOLS

X

Alternative estimation strategies are required…


First, notice that if you take the population mean of the dependent variable for each unit of observation, i, you get…

where

Can solve this by transforming data

Again, I assumed there are J obs. per unit i


Now, if we subtract from , we have

And look! The unobserved variable, fi , is gone (as is the constant) because it is group-invariant

With our earlier assumptions, easy to see that is uncorrelated with the new disturbance, , which means…

Transforming data [Part 2]

iy ,i ty

,i t ix x

?


Fixed effects (or within) estimator

Answer: OLS estimation of transformed model will yield a consistent estimate of β

The prior transformation is called the “within transformation” because it demeans all variables within their group

This is also called the FE estimator


Least Squares Dummy Variable (LSDV)

Another way to do the FE estimation is by adding indicator (dummy) variables

I.e. create a dummy variable for each group i, and add it to the regression

This is least squares dummy variable model

Now, our estimation equation exactly matches the true underlying model


LSDV versus FE [Part 1]

Why do both approaches work? Well… Frisch-Waugh-Lovell Theorem shows us there

are two ways to estimate the below β1…

Estimate directly; i.e. regress y onto both x and z OR we can just partial z out from both y and x

before regressing y on x (i.e. regress residuals from regression of y on z onto residuals from regression of x on z)


LSDV versus FE [Part 2]

Can show that LSDV and within-transformation of FE are identical because demeaned variables of within regression are the residuals from a regression onto group dummies!




Outline for lecture


Gormley and Matsa (RFS 2014) notes that existing literature uses various other strategies to control for unobserved group-level heterogeneity…

Their questions – How do each of the approaches differ? And, when are they consistent?

Their answer – Some popular strategies can distort inferences and should not be used; FE estimator should be used instead

Other approaches…


They focus on two popular strategies

“Adjusted-Y” (AdjY) – dependent variable is demeaned within groups [e.g. ‘industry-adjust’]

“Average effects” (AvgE) – uses group mean of dependent variable as control [e.g. ‘state-year’ control]


AdjY & AvgE are widely used

In Journal of Finance, Journal of Financial Economics, and Review of Financial Studies

Used since at least the late 1980s Still used, 60+ papers published in 2008-

2010 Variety of subfields; asset pricing, banking,

capital structure, governance, M&A, etc.

Also been used in papers published in the American Economic Review, Journal of Political Economy, and Quarterly Journal of Economics


As Gormley and Matsa (RFS 2014) shows…

Both can be more biased than OLS Both can get opposite sign as true

coefficient In practice, bias is likely and trying to

predict its sign or magnitude will typically impractical

But, AdjY and AvgE are inconsistent


Other, related strategies should also not be used

“Characteristically-adjusted” stock returns in AP

“Adjusted” stock returns when trying to estimate firms’ internal value of cash

Simple comparisons of benchmark-adjusted outcomes before & after events (like M&A)

“Diversification discount” Using group average of an independent

variable as instrumental variable

Now, let’s see why…

More implications of GM (RFS 2014)


AdjY estimates:

Tries to remove unobserved group heterogeneity by demeaning the dependent variable within groups

, , ,AdjY AdjY

i j i i j i jy y X u

Adjusted-Y (AdjY)

, , i

1i i k i i k

k group

y X fJ

where

Note: Researchers often exclude observation at hand when calculating group mean or use a group median, but both modifications will yield similarly inconsistent estimates


Example AdjY estimation

One example – firm value regression:

= Tobin’s Q for firm j, industry i, year t

= mean of Tobin’s Q for industry i in year t

Xi,j,t = vector of variables thought to affect value

Researchers might also include firm & year FE

, , , , , ,'i j t i t i, j t i j tQ Q β X

,i tQ, ,i j tQ

Anyone know why AdjY is going to be inconsistent?


Rewriting the group mean, we have:

Therefore, AdjY transforms the true data to:

,i i i iy f X

Here is why…

, , ,i j i i j i i j iy y X X

What is the AdjY estimation forgetting?


But, AdjY estimates:

can be inconsistent when

By failing to control for , AdjY suffers from omitted variable bias when

0XX

, , ,i j i i j i i j iy y X X

, , ,AdjY AdjY

i j i i j i jy y X u

AdjY has omitted variable bias

True model:

2ˆ AdjY XX

X

iX

In practice, a positive covariance between X and will be very common

X

adjY 0


Further analysis of AdjY estimate

Bias doesn’t disappear as group size J increases

Can be inconsistent even when OLS is not; this happens when σXf = 0 and

Bias is more complicated with two variables…

2ˆ AdjY XX

X

0XX


Suppose, there are instead two RHS variables

Use same assumptions as before, but add:

, , , ,

2

, ,

,

cov( , ) cov( , ) 0

var( ) , 0

cov( , )

cov( , )

i j i j i j i j

Z Z

i j i j XZ

i j i Zf

Z Z

Z

X Z

Z f

AdjY estimates with 2 variables

, , , ,i j i j i j i i jy X Z f True model:


AdjY estimates with 2 variables [Part 2]

With a bit of algebra, it is shown that:

2 2

2 2 2

2 2

2 2 2

ˆ

ˆ

XZ Z XZ ZZX XX ZZ XZ

AdjYZ X XZ

AdjYXZ X XZ XXX ZX XZ ZZ

Z X XZ

Estimates of both β and γ can be inconsistent

Determining sign and magnitude of bias will typically be difficult


AvgE uses group mean of dependent variable as control for unobserved heterogeneity

, , ,AvgE AvgE AvgE

i j i j i i jy X y u

Average effects (AvgE)

AvgE estimates:


Example AvgE estimation

Following profit regression is an AvgE example:

ROAs,t = mean of ROA for state s in year t

Xi,s,t = vector of variables thought to profits

Researchers might also include firm & year FE

, , , , , ,'i s t i,s t s t i s tROA ROA β X

Anyone know why AvgE is going to be inconsistent?


, , ,i j i j i i jy X f

AvgE uses group mean of dependent variable as control for unobserved heterogeneity

, , ,AvgE AvgE AvgE

i j i j i i jy X y u

Average effects (AvgE)

Recall, true model:

AvgE estimates:

Problem is that measures fi with error

iy


,i i i iy f X Recall that group mean is given by

Therefore, measures fi with error

As is well known, even classical measurement error causes all estimated coefficients to be inconsistent

Bias here is complicated because error can be correlated with both mismeasured variable, , and with Xi,j when

AvgE has measurement error bias

i iX iy

0XX

if


2 2 2 2

22 2 2 2 2

ˆ2

Xf ffX X XX fXAvgE

X f XffX X XX

AvgE estimate of β with one variable

With a bit of algebra, it is shown that:

Determining magnitude and direction of bias is difficult

Covariance between X and again problematic, but not needed for AvgE estimate to be inconsistent

Even non-i.i.d. nature of errors can affect bias!

X


How common will the bias be?

First, we look at when by separating Xi,j into it’s group and idiosyncratic components

0XX

, ,i j i i jX x w

Assume group means are i.i.d. with mean zero and variance

Idiosyncratic component distributed with mean 0 and variance

2x

2w

And, assume ,cov( , ) 0i i jx w


AdjY and AvgE bias very common

Both AdjY and AvgE biased when But with prior setup, we can show

that…

0XX

Bias whenever different means across groups!

Or, bias whenever observations within groups are not independent!

, ,

2,i j i jx w wXX

* Solved excluding observation at hand (most common approach)


Analytical comparisons

Next, we use analytical solutions to compare relative performance of OLS, AdjY, and AvgE

To do this, we re-express solutions…

We use correlations (e.g. solve bias in terms of correlation between X and f, , instead of )

We also assume i.i.d. errors [just makes bias of AvgE less complicated]

XfXf


00.5

11.5

2

-0.75 -0.5 -0.25 0 0.25 0.5 0.75

, ,1, 0.25, 0.5, 10.

i j i jf X X x w w w J

Xf

Estimate,

ρXf has large effect on performance

OLS

AdjY

AvgE

True β = 1

(from Figure 1A)

Other parameters held constant

AdjY more biased than OLS, except for large values for ρXf

AvgE worst for low correlations, best for high


/x w

00.2

50.5

0.7

51

1.2

5

0 .5 1 1.5 2

Relative variation across groups key

OLSEstimate,

AdjY

AvgE

(from Figure 1B)

, ,1, 0.25, 0.5, 10.

i j i jf X X Xf w w J


0.5

0.7

51

1.2

5

0 5 10 15 20 25

More observations need not help!

OLS

Estimate,

AdjY

AvgE

J

(from Figure 1F)

, ,1, 0.25, 0.5, 10.

i j i jf X X x w Xf w w J


Summary of OLS, AdjY, and AvgE

In general, all three estimators are inconsistent in presence of unobserved group heterogeneity

AdjY and AvgE may not be an improvement over OLS; depends on various parameter values

AdjY and AvgE can yield estimates with opposite sign of the true coefficient


To estimate effect of X on Y controlling for Z

One could regress Y onto both X and Z…

Or, regress residuals from regression of Y on Z onto residuals from regression of X on Z

AdjY and AvgE aren’t the same as finding the effect of X on Y controlling for Z because...

AdjY only partials Z out from Y AvgE uses fitted values of Y on Z as

control

Comparing FE, AdjY, and AvgE

Add group FE

Within-group transformation!


The differences matter! Example #1

Consider the following capital structure regression:

(D/A)it = book leverage for firm i, year t

Xi,t = vector of variables thought to affect leverage

fi = firm fixed effect

We now run this regression for each approach to deal with firm fixed effects, using 1950-2010 data, winsorizing at 1% tails…

, ,( / ) i t i,t i i tD A fβX


Estimates vary considerablyDependent variable = book leverage

OLS Adj Y Avg E FE

Fixed Assets/ Total Assets 0.270*** 0.066*** 0.103*** 0.248***(0.008) (0.004) (0.004) (0.014)

Ln(sales) 0.011*** 0.011*** 0.011*** 0.017***(0.001) 0.000 0.000 (0.001)

Return on Assets -0.015*** 0.051*** 0.039*** -0.028***(0.005) (0.004) (0.004) (0.005)

Z-score -0.017*** -0.010*** -0.011*** -0.017***0.000 (0.000) (0.000) (0.001)

Market-to-book Ratio -0.006*** -0.004*** -0.004*** -0.003***(0.000) (0.000) (0.000) (0.000)

Observations 166,974 166,974 166,974 166,974R-squared 0.29 0.14 0.56 0.66

(from Table 2)



Consider the following firm value regression:

Q = Tobin’s Q for firm i, industry j, year t Xi,j,t = vector of variables thought to

affect value fj,t = industry-year fixed effect

We now run this regression for each approach to deal with industry-year fixed effects…

, , , , , ,' i j t i, j t j t i j tQ fβ X


OLS Adj Y Avg E FE

Delaware Incorporation 0.100*** 0.019 0.040 0.086**(0.036) (0.032) (0.032) (0.039)

Ln(sales) -0.125*** -0.054*** -0.072*** -0.131***(0.009) (0.008) (0.008) (0.011)

R&D Expenses / Assets 6.724*** 3.022*** 3.968*** 5.541***(0.260) (0.242) (0.256) (0.318)

Return on Assets -0.559*** -0.526*** -0.535*** -0.436***(0.108) (0.095) (0.097) (0.117)

Observations 55,792 55,792 55,792 55,792R-squared 0.22 0.08 0.34 0.37

Dependent Variable = Tobin's Q

Estimates vary considerably(from Table 4)



It also matters in literature on antitakeover laws

Past papers used AvgE to control for unobserved, time-varying differences across states & industries

Gormley and Matsa (2014) show that properly using industry-year, state-year, and firm FE estimator changes estimates considerably

E.g., using this framework, they show that managers have an underlying preference to “Play it Safe”

For details, see http://ssrn.com/abstract=2465632




Outline for lecture


General implications

With this framework, easy to see that other commonly used estimators will be biased

AdjY-type estimators in M&A, asset pricing, etc.

AvgE-type instrumental variables


Other AdjY estimators are problematic

Same problem arises with other AdjY estimators

Subtracting off median or value-weighted mean

Subtracting off mean of matched control sample [as is customary in studies if diversification “discount”]

Comparing industry-adjusted means for treated firms pre- versus post-event [as often done in M&A studies]

Characteristically adjusted returns [as used in asset pricing]


AdjY-type estimators in asset pricing

Common to sort and compare stock returns across portfolios based on a variable thought to affect returns

But, returns are often first “characteristically adjusted”

I.e. researcher subtracts the average return of a benchmark portfolio containing stocks of similar characteristics

This is equivalent to AdjY, where “adjusted returns” are regressed onto indicators for each portfolio

Approach fails to control for how avg. independent variable varies across benchmark portfolios


Asset pricing example; sorting returns based on R&D expenses / market value of equity

Asset Pricing AdjY – Example

(0.003) (0.009) (0.008) (0.007) (0.013) (0.006)

Q4 Q5

-0.012*** -0.033*** -0.023*** -0.002 0.008 0.020***

Characteristically adjusted returns by R&D Quintile (i.e., Adj Y)Missing Q1 Q2 Q3

We use industry-size benchmark portfolios and sorted using R&D/market value

Difference between Q5 and Q1 is 5.3 percentage points


R&D Quintile 2

R&D Quintile 3

R&D Quintile 4

R&D Quintile 5

ObservationsR 2

R&D Missing

Adj Y

0.021**(0.009)

0.01(0.013)

0.032***(0.012)

0.041***(0.015)

0.053***(0.011)

144,5920.00

FE

0.030***(0.010)

0.019

Dependent Variable = Yearly Stock Return

(0.019)

144,5920.47

(0.014)

0.051***(0.018)

0.068***(0.020)

0.094***

Estimates vary considerably(from Table 5)

Same AdjY result, but in regression format; quintile 1 is excluded

Use benchmark-period FE to transform both returns and R&D; this is equivalent to double sort


AvgE IV estimators also problematic

Many researchers try to instrument problematic Xi,j with group mean, , excluding observation j

Argument is that is correlated with Xi,j but not error

But, this is typically going to be problematic

Any correlation between Xi,,j and an unobserved hetero-geneity, fi, causes exclusion restriction to not hold

Can’t add FE to fix this since IV only varies at group level

iX

iX


What if AdjY or AvgE is true model?

If data exhibits structure of AvgE estimator, this would be a peer effects model [i.e. group mean affects outcome of other members]

In this case, none of the estimators (OLS, AdjY, AvgE, or FE) reveal the true β [Manski 1993; Leary and Roberts 2010]

Even if interested in studying , AdjY only consistent if Xi,j does not affect yi,j !

,i j iy y




Outline for lecture


FE Estimator – Benefits [Part 1]

There are many benefits of FE estimator

Allows for arbitrary correlation between each fixed effect, fi, and each x within group i

I.e. it is very general and not imposing much structure on what the underlying data must look like

Very intuitive interpretation; coefficient is identified using only changes within cross-sections


FE Estimator – Benefits [Part 2]

It is also very flexible and can help us control for many types of unobserved heterogeneities

Can add year FE if worried about unobserved heterogeneity across time [e.g. macroeconomic shocks]

Can add CEO FE if worried about unobserved heterogeneity across CEOs [e.g. talent, risk aversion]

Add industry-by-year FE if worried about unobserved heterogeneity across industries over time [e.g. investment opportunities, demand shocks]


FE Estimator – Limitations

But, FE estimator also has its limitations

Can’t identify variables that don’t vary within group

Subject to potentially large measurement error bias

Can be hard to estimate in some cases


Limitation #1 – Can’t est. some var.

If no within-group variation in the independent var., x, of interest, can’t disentangle it from group FE

It is collinear with group FE; and will be dropped by computer or swept out in the within transformation

In some cases, IV can be used to obtain estimates for variables that do not vary within groups [see Hausman and Taylor 1981]


Limitation #2 – Noisy ind. variables

If some within-group variation is noise, then variation being exploited that is noise rises in FE

Think of there being two types of variation

Good (meaningful) variation Noise variation because we don’t perfectly

measure the underlying variable of interest

Adding FE can sweep out a lot of the good variation; fraction of remaining variation coming from noise goes up [What will this do?]


Noisy independent variables [Part 2]

Answer: Attenuation bias on mismeasured (i.e. noisy) independent variable will go up!

Practical advice: Be careful in interpreting ‘zero’ coefficients on potentially mismeasured regressors; might just be attenuation bias!

Note… sign of bias on other coefficients will be generally difficult to know


Noisy independent variables [Part 3]

Problem can also apply even when all variables are perfectly measured [How?]

Answer: Adding FE might throw out relevant variation; e.g. y in firm FE model might respond to sustained changes in x, rather than transitory changes [see McKinnish 2008 for more details]

With FE you’d only have the transitory variation leftover; might find x uncorrelated with y in FE estimation even though sustained changes in x is most important determinant of y


Possible solutions for Limitation #2

Standard solutions for measurement error apply (e.g. IV), but in practice, hard to fix

For examples on how to deal with measurement error, see following papers

Griliches and Hausman (JoE 1986) Biorn (Econometric Reviews 2000) Erickson and Whited (JPE 2000, RFS 2012) Almeida, Campello, and Galvao (RFS 2010)


Researchers occasionally motivate using AdjY and AvgE because FE estimator is computationally difficult to do when there are more than one FE of high-dimension

Now, let’s see why this is (and isn’t) a problem…

Limitation #3 – Computation issues


Computational issues [Part 1]

Estimating a model with multiple types of FE can be computationally difficult

When more than one type of FE, you cannot remove both using within-transformation

Generally, you can only sweep one away with within-transformation; other FE dealt with by adding dummy variable to model

E.g. firm and year fixed effects [See next slide]



Consider below model:

To estimate this in Stata, we’d use a command something like the following…

, , ,i t i t t i i ty x f u Firm FE

Year FE

xtset firmxi: xtreg y x i.year, fe

Tells Stata that panel dimension is given by firm variableTells Stata to remove FE for panels (i.e. firms) by doing within-transformation

Tells Stata to create and add dummy variables for year variable


Dummies not swept away in within-transformation are actually estimated

With year FE, this isn’t problem because there aren’t that many years of data

If had to estimate 1,000s of firm FE, however, it might be a problem…



Why is this a problem?

Estimating FE model with many dummies can require a lot of computer memory

E.g., estimation with both firm and 4-digit industry-year FE requires ≈ 40 GB of memory

Most researchers don’t have this much memory; hence, we don’t see these regressions being used


This is growing problem

Multiple unobserved heterogeneities increasingly argued to be important

Manager and firm fixed effects in executive compensation and other CF applications [Graham, Li, and Qui 2011, Coles and Li 2011]

Firm, industry×year, state×year FE to control for industry- and state-level shocks [Gormley and Matsa 2014]


There exist two techniques that can be used to arrive at consistent FE estimates without requiring as much memory

#1 – Interacted fixed effects #2 – Memory saving procedures

But, there are solutions!




Outline for lecture


Combine multiple fixed effects into one-dimensional set of fixed effect, and remove using within transformation

E.g. firm and industry-year FE could be replaced with firm-industry-year FE

But, there are limitations…

Can severely limit parameters you can estimate

Could have serious attenuation bias

#1 – Interacted fixed effects


#2 – Memory-saving procedures

Use properties of sparse matrices to reduce required memory, e.g. Cornelissen (2008)

Or, instead iterate to a solution, which eliminates memory issue entirely, e.g. Guimaraes and Portugal (2010)

See Gormley and Matsa (RFS 2014) for details of how each method works

Both can be done in Stata using user-written commands FELSDVREG and REGHDFE


These latter techniques work…

Estimated typical capital structure regression with firm and 4-digit industry×year dummies

Standard FE approach would not work; my computer did not have enough memory…

Sparse matrix procedure took 8 hours… Iterative procedure took 5 minutes

See new Gormley and Matsa “Playing it Safe” working paper for example application http://ssrn.com/abstract=2465632


See website for more details…

For examples of SAS, STATA, and R code one can use to estimate these high-dimensional FE estimations, please see our website

http://finance.wharton.upenn.edu/~tgormley/papers/fe.html


Concluding remarks

Unobserved heterogeneity across groups is common identification concern in empirical finance

Despite heavy use, AdjY and AvgE are typically biased

Can lead to very misleading inferences, including estimates with opposite sign of true effect

Problem also applies to other, ad hoc transformations of dep. var. used in literature

FE is best way to account for unobserved heterogeneity; limitations can easily be overcome


Practical advice… the punch lines

Don’t use AdjY or AvgE! Don’t use group averages as

instruments! But, do use fixed effects

Should use benchmark portfolio-period FE in asset pricing rather than char-adjusted returns

Use iteration techniques to estimate models with multiple high-dimensional FE


In addition to Gormley and Matsa (RFS 2014), other sources used to construct these slides are…

Chapter 10 of Wooldridge, Jeffrey M., 2010, Econometric Analysis of Cross-Section and Panel Data, MIT Press, Massachusetts, Second Edition

Chapter 11 of Greene, William H., 2011, Econometric Analysis, Prentice Hall, N.J., Seventh Edition.

Sections 5.1 of Angrist, Joshua D., and Jorn-Steffen Pischke, 2009, Mostly Harmless Econometrics, Princeton University Press, New Jersey

Additional sources

Documents

Common Errors: How to (and Not to) Control for Unobserved Heterogeneity Lecture slides by Todd Gormley