Econometrics Course: Endogeneity & Simultaneity
Mark W. Smith

Overview

• Endogeneity
  – Sources
  – Responses
• Omitted Variables
• Measurement Error
• Proxy Variables
• Method of Instrumental Variables
  – Properties
  – Validity and strength of instruments

Definition of Endogeneity

Suppose we have a regression equation

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

The variable x1 is endogenous if it is correlated with ε.

Note that this is related to, but not identical to, the heuristic definition that “x1 is determined within the model.”

Sources of Endogeneity

1. Omitted variables

If the true model underlying the data is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$$

but you estimate the model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon^*$$

then variable x1 will be endogenous if it is correlated with x3. Why? Because the new error term is a function of x3:

$$\varepsilon^* = \beta_3 x_3 + \varepsilon$$
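To see the omitted-variable argument in numbers, here is a minimal simulation sketch in Python with NumPy. Everything in it is an assumption for illustration: the coefficients (β1 = 2, β2 = −1, β3 = 1.5), the way x3 depends on x1 (slope 0.8), and the sample size. Fitting the model with and without x3 should give an estimate of β1 near 2 in the first case and near β1 + β3 × 0.8 = 3.2 in the second.

```python
import numpy as np

# Simulated data; all parameter values are illustrative assumptions
rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.8 * x1 + rng.normal(size=n)      # x3 is correlated with x1
eps = rng.normal(size=n)
beta0, beta1, beta2, beta3 = 1.0, 2.0, -1.0, 1.5
y = beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3 + eps

def ols(X, y):
    """OLS coefficients for y on X (X already contains a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

const = np.ones(n)
full = ols(np.column_stack([const, x1, x2, x3]), y)   # true model
short = ols(np.column_stack([const, x1, x2]), y)      # x3 omitted

print("beta1, full model:", round(full[1], 3))   # close to 2.0
print("beta1, x3 omitted:", round(short[1], 3))  # close to 2.0 + 1.5 * 0.8 = 3.2
```

The gap between the two estimates is the omitted-variable bias: the coefficient on x3 times the slope of x3 on x1.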

Sources of Endogeneity

2. Measurement error

Suppose the true model underlying the data is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

but you estimate the model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2^* + \varepsilon$$

where x2* = x2 + ν is the mismeasured value of x2 and ν is the measurement error.

Sources of Endogeneity

2. Measurement error - continued

Variable x2 will be endogenous if ν depends on x2.

Example: Suppose that x2 measures hospital size (number of beds), and that the measurement error is greater for larger hospitals. Then as x2 grows, so does ν. Thus ν is correlated with x2, causing endogeneity.

Sources of Endogeneity

2. Measurement error - continued

Rearranging the equation, we have

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2^* + \varepsilon$$

$$y = \beta_0 + \beta_1 x_1 + \beta_2 (x_2 + \nu) + \varepsilon$$

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + (\beta_2 \nu + \varepsilon)$$

If ν = f(x2), then the error term (β2ν + ε) is correlated with x2, causing endogeneity.
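The hospital-size example can be mimicked with a short simulation. This is a sketch under assumed values: the measurement error is taken to be ν = 0.5·x2 + noise, so it grows with the true value, and all coefficients are hypothetical. With the true x2 the estimate of β2 should be close to 2; with the mismeasured x2* = x2 + ν it should drift to about 2 × 1.5 / 2.5 = 1.2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # true value (e.g., standardized bed count)
eps = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + eps      # true model, beta2 = 2

# Hypothetical error process: the error depends on the true value, nu = f(x2)
nu = 0.5 * x2 + 0.5 * rng.normal(size=n)
x2_star = x2 + nu                        # the value actually recorded

def ols(X, y):
    """OLS coefficients for y on X (X already contains a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

const = np.ones(n)
b_true = ols(np.column_stack([const, x1, x2]), y)
b_meas = ols(np.column_stack([const, x1, x2_star]), y)
print("beta2 using true x2:    ", round(b_true[2], 3))  # close to 2.0
print("beta2 using measured x2:", round(b_meas[2], 3))  # close to 1.2
```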

Sources of Endogeneity

3. Simultaneity

A system of simultaneous equations occurs when two or more left-hand side variables are functions of each other (there are other ways of stating it, too):

$$y_1 = \beta_0 + \beta_1 x_1 + \beta_2 y_2 + \varepsilon_1$$

$$y_2 = \gamma_0 + \gamma_1 x_1 + \gamma_2 y_1 + \varepsilon_2$$

Sources of Endogeneity

3. Simultaneity

With some algebra you can solve the two equations for their “reduced form,” which expresses each left-hand side variable in terms of x1 and the error terms alone. The reduced form for y2 contains ε1, so y2 is correlated with the error term of the y1 equation: in that equation, y2 is an endogenous regressor.
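A quick simulation illustrates why y2 ends up correlated with ε1. The sketch assumes the system y1 = b0 + b1·x1 + b2·y2 + e1 and y2 = g0 + g1·x1 + g2·y1 + e2 with made-up parameter values (the names b0 … g2 are hypothetical). It generates y2 from its reduced form, confirms that both structural equations hold, and then checks that y2 is correlated with e1 and that OLS misses the true b2 = 0.5.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
# Hypothetical structural parameters
b0, b1, b2 = 1.0, 1.0, 0.5     # y1 = b0 + b1*x1 + b2*y2 + e1
g0, g1, g2 = 0.0, 1.0, 0.8     # y2 = g0 + g1*x1 + g2*y1 + e2
x1 = rng.normal(size=n)
e1 = rng.normal(size=n)
e2 = rng.normal(size=n)

# Reduced form: substitute the y1 equation into the y2 equation and solve for y2
y2 = (g0 + g2 * b0 + (g1 + g2 * b1) * x1 + g2 * e1 + e2) / (1 - g2 * b2)
y1 = b0 + b1 * x1 + b2 * y2 + e1

# The generated data satisfy the y2 structural equation as well
assert np.allclose(y2, g0 + g1 * x1 + g2 * y1 + e2)

print("corr(y2, e1):", round(np.corrcoef(y2, e1)[0, 1], 3))
X = np.column_stack([np.ones(n), x1, y2])
b_ols, *_ = np.linalg.lstsq(X, y1, rcond=None)
print("OLS estimate of b2:", round(b_ols[2], 3), "(true value 0.5)")
```

The reduced-form line makes the mechanism explicit: the term g2 * e1 puts e1 inside y2, so OLS on the y1 equation attributes part of e1 to y2.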

Pretesting for Endogeneity

The most famous test is Hausman (1978). Many others are described in Nakamura and Nakamura (1998).

Idea: the method of instrumental variables (IV) uses two-stage least squares (2SLS). If there is no endogeneity, OLS is more efficient. If there is endogeneity, OLS is inconsistent, so 2SLS is preferred. The Hausman test compares the OLS and 2SLS estimates: a large difference between them is evidence of endogeneity.
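One common way to compute a Hausman-type test is the regression-based (Durbin-Wu-Hausman) version: add the 1st-stage residual to the OLS equation and test whether its coefficient is zero. The sketch below is an illustration under assumptions, not the only implementation: the data are simulated so that x2 is endogenous through an unobserved shock u, z is a hypothetical valid instrument, and the standard errors are conventional (homoskedastic) ones computed with NumPy.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients, conventional (homoskedastic) standard errors, residuals."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ (X.T @ y)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * XtX_inv))
    return coef, se, resid

rng = np.random.default_rng(3)
n = 5_000
x1 = rng.normal(size=n)
z = rng.normal(size=n)                   # hypothetical valid instrument
u = rng.normal(size=n)                   # unobserved shock shared by x2 and y
x2 = 0.5 * x1 + 1.0 * z + u + rng.normal(size=n)
y = 1.0 + 1.0 * x1 + 2.0 * x2 + 1.0 * u + rng.normal(size=n)

const = np.ones(n)
# 1st stage: x2 on the exogenous regressor and the instrument
_, _, v_hat = ols_with_se(np.column_stack([const, x1, z]), x2)
# Augmented OLS: original regressors plus the 1st-stage residual
coef, se, _ = ols_with_se(np.column_stack([const, x1, x2, v_hat]), y)
t_stat = coef[3] / se[3]
print("t statistic on 1st-stage residual:", round(t_stat, 2))
# Under exogeneity this t is approximately standard normal; a large |t| signals endogeneity
```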

Pretesting for Endogeneity

Problem: the tests all have low power, particularly when 2SLS would cause a significant loss of efficiency.

In practice, many people use a Hausman test, fail to reject the null hypothesis of no endogeneity, and then use OLS.

A more statistically reliable approach is to base judgments of endogeneity on how the system under study works.

Responses to Endogeneity

What if you are unsure whether a variable is endogenous?

Approach #1: ignore it

Approach #2: use instrumental variables (IV) -- described later -- for every possibly endogenous variable

Approach #3: subtract out the variable using time-series (panel) data

Responses to Endogeneity

Approach #1: ignore it
-- Not advisable: true endogeneity causes OLS to be inconsistent.

Approach #2: use IV on every possibly endogenous variable
-- Not advisable: it will cause a loss of efficiency (and hence wider confidence intervals) and may lead to bias.

Responses to Endogeneity

Approach #3: Difference it out

Suppose that the source of endogeneity, such as measurement error or an omitted variable, is fixed over time. Further, suppose that we observe data in two time periods.

A difference-in-difference (DD) model can be used: subtract values at time 1 (“before”) from values at time 2 (“after”), and the fixed source of endogeneity will drop out.
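With two periods, the differencing step can be sketched as follows. The data are simulated under assumed values: a hypothetical time-invariant unobserved factor a raises both x and y, so pooled OLS is biased, while regressing the change in y on the change in x removes a entirely. The names and parameters are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
a = rng.normal(size=n)                       # time-invariant unobserved factor
x_t1 = a + rng.normal(size=n)                # x is correlated with a in both periods
x_t2 = a + rng.normal(size=n)
beta = 1.5                                   # true effect of x on y
y_t1 = 0.5 + beta * x_t1 + 2.0 * a + rng.normal(size=n)
y_t2 = 1.0 + beta * x_t2 + 2.0 * a + rng.normal(size=n)  # intercept shifts over time

def slope(x, y):
    """Slope from an OLS fit of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Pooled OLS ignores a, which is correlated with x
pooled = slope(np.concatenate([x_t1, x_t2]), np.concatenate([y_t1, y_t2]))
# Differencing: "after" minus "before"; the 2.0 * a term cancels
diff = slope(x_t2 - x_t1, y_t2 - y_t1)
print("pooled OLS slope:  ", round(pooled, 3))  # close to 2.5 (biased)
print("differenced slope: ", round(diff, 3))    # close to 1.5
```

If a varied over time, its change would remain in the differenced error, which is the second limitation listed on the next slide.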

Responses to Endogeneity

Approach #3: Difference it out -- continued

Limitations:

- DD models will not eliminate selection bias.

- DD models only eliminate fixed variables; sometimes endogenous variables change values over time.

Dealing with Omitted Variables

The investigator should have a conceptual model of the process under study. Guided by this understanding, there are a few options for dealing with omitted variables.

1. Find additional data so that every relevant variable is included.

2. Ignore it

- Acceptable only if omitted variable is uncorrelated with all included variables; otherwise the coefficient estimates will be biased up or down.

Dealing with Omitted Variables

3. Find proxy variable

Suppose the following:

y is the outcome

q is the omitted variable

z is the proxy for q

What properties should the proxy z have?

Dealing with Omitted Variables

a. Proxy z should be strongly correlated with q.

b. Proxy z must be redundant (= ignorable)

E (y | x, q, z) = E (y | x, q)

c. Omitted q must be uncorrelated with the other regressors conditional on z:

$$\operatorname{corr}(q, x_j \mid z) = 0 \quad \text{for each } x_j$$

Dealing with Omitted Variables

The last two conditions (b and c) mean roughly that q and z provide similar information about the outcome.

You don’t observe q, so how can you prove these conditions are met? Either argue it from theory or test the assumption using other data.
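A simulated case where the proxy conditions hold by construction may help. The sketch assumes q = z + r with r independent of everything else, so z is redundant once q is known (condition b) and q is unrelated to x once z is controlled for (condition c); x depends on q only through z. All values are hypothetical. Leaving q out biases the coefficient on x, and adding the proxy z recovers the true value of 2.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
z = rng.normal(size=n)                   # observed proxy
q = z + rng.normal(size=n)               # omitted variable: q = z + r
x = 0.6 * z + rng.normal(size=n)         # regressor related to q only through z
y = 1.0 + 2.0 * x + 1.0 * q + rng.normal(size=n)   # q, not z, enters the true model

const = np.ones(n)
b_omit, *_ = np.linalg.lstsq(np.column_stack([const, x]), y, rcond=None)
b_proxy, *_ = np.linalg.lstsq(np.column_stack([const, x, z]), y, rcond=None)
print("coefficient on x, q omitted:  ", round(b_omit[1], 3))   # about 2.44, biased
print("coefficient on x, proxy z in: ", round(b_proxy[1], 3))  # close to 2.0
```

If x were also related to r (violating condition c), the proxy regression would remain biased, which is the situation the later proxy-variable slides address.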

Dealing with Measurement Error

1. Improve measurement

- DSS improved by rejecting extreme outlier values

- NPPD improved by requiring more complete data

2. Argue that the degree of error is small

- Use outside data for validation

3. Argue that the error is uncorrelated with the included variables

Dealing with Proxy Variables

1. What if proxy variable z is correlated with a regressor x?

OLS is inconsistent, but one can hope and argue that the inconsistency is less than if z is omitted.

Dealing with Proxy Variables

2. Consider using a lagged dependent variable as a proxy variable.

Example: If you believe that omitted variable $q_t$ strongly affects outcome $y_t$, then a lagged value of y (such as $y_{t-2}$) is probably correlated with $q_t$ as well.

Problem: $y_{t-2}$ may be correlated with other x’s as well, leading to inconsistency.

Dealing with Proxy Variables

3. Consider using multiple proxy variables for a single omitted variable.

How? Simply put all proxy variables in the equation.

Note: they all must meet the requirements for proxies.

Dealing with Proxy Variables

4. What if omitted variable q interacts with a regressor x?

$$y = \beta_0 + \beta_1 x + \beta_2 q + \beta_3 q x + \varepsilon$$

$$\frac{\partial y}{\partial x} = \beta_1 + \beta_3 q$$

The marginal effect of x on y involves q, which is unobserved.

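Dealing with Proxy Variables

Demean z: take every value of z and subtract the grand (overall) average value. Call it zd.

$$y = \beta_0 + \beta_1 x + \beta_2 z_d + \beta_3 z_d x + \varepsilon$$

$$\frac{\partial y}{\partial x} = \beta_1 + \beta_3 z_d$$

$$E\left[\frac{\partial y}{\partial x}\right] = \beta_1 \quad \text{because } E[z_d] = 0$$

So after demeaning, β1 is the average marginal effect of x on y.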
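The effect of demeaning can be checked numerically. The sketch assumes a hypothetical setup: q = z + r, the proxy z has mean 5, and the true model is y = 1 + 2x + q + 0.5·q·x + noise, so the average marginal effect of x is 2 + 0.5 × 5 = 4.5. With raw z, the coefficient on x is the effect at z = 0 (about 2); with demeaned z it is the average effect (about 4.5).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
x = rng.normal(size=n)
z = rng.normal(loc=5.0, size=n)          # proxy with a nonzero mean
q = z + rng.normal(size=n)               # unobserved q, proxied by z
y = 1.0 + 2.0 * x + 1.0 * q + 0.5 * q * x + rng.normal(size=n)
# Average marginal effect of x: 2.0 + 0.5 * E[q] = 2.0 + 0.5 * 5.0 = 4.5

def fit(zvar):
    """OLS of y on a constant, x, the proxy, and the proxy-by-x interaction."""
    X = np.column_stack([np.ones(n), x, zvar, zvar * x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_raw = fit(z)                           # coefficient on x = effect at z = 0
zd = z - z.mean()                        # demeaned proxy
b_dm = fit(zd)                           # coefficient on x = average marginal effect
print("coef on x, raw z:     ", round(b_raw[1], 3))  # close to 2.0
print("coef on x, demeaned z:", round(b_dm[1], 3))   # close to 4.5
```

The two fits have the same fitted values; demeaning only reparameterizes the model so that the coefficient on x equals the raw-z coefficient plus the interaction coefficient times the mean of z.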

Instrumental Variables

Method of Instrumental Variables

Often used to deal with simultaneity.

More generally, IV applies whenever a regressor x is correlated with the error term ε.

IV Definition

Model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

Suppose that x2 is endogenous to y. An instrumental variable is one that

(a) is correlated with the endogenous variable x2

(b) is uncorrelated with the error term ε

(c) should not enter the main equation (i.e., does not explain y directly)

Two-Stage Least Squares

Two-stage least squares (2SLS) approach

Stage 1:

Predict x2 as a function of all other variables plus an IV (call it z):

$$x_2 = a + \delta_1 x_1 + \delta_2 z + u$$

Create predicted values of x2; call them x2p.

Two-Stage Least Squares

Two-stage least squares (2SLS) approach

Stage 2:

Predict y as a function of x2p and all other variables (but not z):

$$y = a + \beta_1 x_1 + \beta_2 x_2^p + \varepsilon$$

Note: adjust the standard errors to account for the fact that x2p is predicted. The residuals used for the standard errors should be computed with the actual x2, not x2p; dedicated 2SLS routines make this adjustment automatically.
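The two stages, and the standard-error adjustment, can be written out by hand in NumPy. This is a teaching sketch on simulated data with hypothetical values (true β2 = 2, unobserved confounder u, instrument z), assuming homoskedastic errors; in practice a dedicated 2SLS routine would be used. It also checks the claim on the 2SRI slides that, when Stage 2 is linear, keeping x2 and adding the Stage 1 residual gives exactly the 2SLS coefficients.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
x1 = rng.normal(size=n)
z = rng.normal(size=n)                   # hypothetical instrument
u = rng.normal(size=n)                   # unobserved confounder
x2 = 0.5 * x1 + 0.8 * z + u + rng.normal(size=n)
y = 1.0 + 1.0 * x1 + 2.0 * x2 + 1.5 * u + rng.normal(size=n)

const = np.ones(n)
X = np.column_stack([const, x1, x2])     # main-equation regressors (x2 endogenous)
Z = np.column_stack([const, x1, z])      # exogenous regressors plus the IV

# Stage 1: regress x2 on x1 and z, keep the predicted values x2p
g, *_ = np.linalg.lstsq(Z, x2, rcond=None)
x2p = Z @ g

# Stage 2: regress y on x1 and x2p (z itself is left out)
Xp = np.column_stack([const, x1, x2p])
b_2sls, *_ = np.linalg.lstsq(Xp, y, rcond=None)

# Adjusted standard errors: residuals use the actual x2, not x2p
k = X.shape[1]
XpXp_inv = np.linalg.inv(Xp.T @ Xp)
resid = y - X @ b_2sls
se_2sls = np.sqrt(np.diag(resid @ resid / (n - k) * XpXp_inv))

# Unadjusted standard errors that a plain Stage 2 OLS would report
resid_naive = y - Xp @ b_2sls
se_naive = np.sqrt(np.diag(resid_naive @ resid_naive / (n - k) * XpXp_inv))

# Linear 2SRI: keep x2 and add the Stage 1 residual v as a regressor
v = x2 - x2p
b_2sri, *_ = np.linalg.lstsq(np.column_stack([const, x1, x2, v]), y, rcond=None)

print("2SLS beta2:", round(b_2sls[2], 3),
      "adjusted SE:", round(se_2sls[2], 4), "naive SE:", round(se_naive[2], 4))
print("linear 2SRI beta2:", round(b_2sri[2], 3))   # same as 2SLS
```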

Two-Stage Residual Inclusion

2SLS is only consistent when the Stage 2 equation is linear.

If Stage 2 is nonlinear, use the two-stage residual inclusion (2SRI) method:

- Stage 1 as in 2SLS, leading to predicted x2p

- Compute the residuals v = x2 - x2p

Two-Stage Residual Inclusion

- Stage 2:

Predict y as a function of x1, x2 (not x2p), and the new residuals v:

$$y = f(a + \beta_1 x_1 + \beta_2 x_2 + \beta_3 v) + \varepsilon$$

where f(.) is a nonlinear function.

Note that if Stage 2 is linear, then 2SRI yields the same coefficient estimates as 2SLS.
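Here is a sketch of 2SRI with a binary outcome, taking f(.) to be the logistic CDF (a logit Stage 2). The data-generating process is an assumption chosen so that 2SRI is consistent: the Stage 1 error v also enters the outcome, and the remaining error is logistic. The logit is fit with a small hand-written Newton-Raphson routine to avoid depending on any particular library API. The naive logit that ignores endogeneity should miss the true β2 = 1, while the 2SRI logit should land close to it.

```python
import numpy as np

def logit_fit(X, y, iters=25):
    """Logistic regression coefficients by Newton-Raphson (no regularization)."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        grad = X.T @ (y - p)                          # score of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        b = b + np.linalg.solve(hess, grad)
    return b

rng = np.random.default_rng(8)
n = 50_000
x1 = rng.normal(size=n)
z = rng.normal(size=n)                   # hypothetical instrument
v = rng.normal(size=n)                   # Stage 1 error, also shifts the outcome
x2 = 0.5 * x1 + 1.0 * z + v
latent = -0.5 + 0.5 * x1 + 1.0 * x2 + 1.0 * v + rng.logistic(size=n)
y = (latent > 0).astype(float)           # binary outcome

const = np.ones(n)
Z = np.column_stack([const, x1, z])
g, *_ = np.linalg.lstsq(Z, x2, rcond=None)
v_hat = x2 - Z @ g                       # Stage 1 residuals

b_naive = logit_fit(np.column_stack([const, x1, x2]), y)          # ignores endogeneity
b_2sri = logit_fit(np.column_stack([const, x1, x2, v_hat]), y)    # Stage 2 of 2SRI
print("naive logit beta2:", round(b_naive[2], 3))
print("2SRI logit beta2: ", round(b_2sri[2], 3), "(true value 1.0)")
```

Standard errors from the Stage 2 fit ignore that v_hat is itself estimated; bootstrapping both stages together is one common remedy.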

Multiple IVs

What if you have multiple endogenous variables?

1. The number of IVs must equal or exceed the number of endogenous variables

2. Estimate a separate 1st-stage regression for each endogenous variable

3. Every 1st-stage regression should contain all IVs

IV Issues

Two issues plague the IV method:

1. No IV is available

2. A potential IV is found, but its quality is uncertain

IV Issues

What if there is no IV?

State that no IV exists and forge ahead anyway, arguing that any bias in OLS is likely to be small:

- Argue on theoretical grounds that the endogeneity is weak.

- Argue that external data indicate that the bias from OLS is likely to be small.

IV Properties

What if you have an IV of unknown quality?

Two characteristics mark a good IV:

1. Validity

2. Strength

IV Validity

Validity has several components:

a. Non-zero correlation with x2

b. Uncorrelated with error term

c. Uncorrelated with y except through x2

d. Monotonicity: as z increases, x2 increases (or stays the same) for every individual; z never pushes x2 in the opposite direction

IV Validity

There are several ways to show validity of an IV:

• Non-zero correlation with the endogenous variable can be shown directly.

• Robustness: do alternative IVs yield similar results?

• No correlation with the error term of the 2nd stage (that is, z affects the outcome only through x2). This point must be argued from theory, that is, from an understanding of how the system under study works.

IV Validity

Warning: one cannot simply add a candidate IV to the main model (i.e., the 2nd stage) to see whether it is significant. The result is biased.

BUT

If there are multiple IVs, one can use a test of over-identifying restrictions.

IV Validity

Overidentification: number of candidate IVs exceeds number of endogenous variables.

Suppose that

(a) You have one endogenous variable and three candidate IVs

(b) You know that one of the IVs is truly valid.

Use the known-valid IV in the 1st stage and put the remaining two IVs in the 2nd stage.

IV Validity

Over-identification test, continued

If the two remaining IVs are jointly insignificant in the 2nd stage, then this supports their use as alternative IVs.

Problem: this only works if the IV(s) in the 1st stage are truly valid – and you don’t know that!

IV Validity

Over-identification test, continued

Partial solution: use Sargan’s (1984) test, which assumes only that one or more of your IVs are valid – you don’t have to specify which. This method fails only if none of the IVs is valid.

In the end, you must argue for validity on conceptual grounds at a minimum.
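A Sargan-type statistic can be sketched in a few lines: compute the 2SLS residuals, regress them on all exogenous variables and IVs, and multiply n by the R² of that regression; under the null that the IVs are valid it is approximately chi-square with (number of IVs minus number of endogenous variables) degrees of freedom. The simulation mirrors the three-IV, one-endogenous-variable example above with hypothetical values; in the second dataset z3 is made invalid by letting it affect y directly. For 2 degrees of freedom the chi-square upper-tail probability is exactly exp(−S/2), which avoids a statistics library.

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS coefficients. X: main-equation regressors; Z: exogenous regressors plus IVs."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

def sargan(y, X, Z):
    """Sargan statistic n * R^2 and its degrees of freedom (#IVs - #endogenous)."""
    resid = y - X @ two_sls(y, X, Z)
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    return len(y) * r2, Z.shape[1] - X.shape[1]

rng = np.random.default_rng(9)
n = 5_000
x1 = rng.normal(size=n)
z1, z2, z3 = rng.normal(size=(3, n))     # three candidate IVs
u = rng.normal(size=n)
x2 = 0.5 * x1 + 0.6 * (z1 + z2 + z3) + u + rng.normal(size=n)
y_ok = 1.0 + x1 + 2.0 * x2 + 1.5 * u + rng.normal(size=n)
y_bad = y_ok + 0.5 * z3                  # z3 now affects y directly: invalid IV

const = np.ones(n)
X = np.column_stack([const, x1, x2])
Z = np.column_stack([const, x1, z1, z2, z3])
stat_ok, df = sargan(y_ok, X, Z)
stat_bad, _ = sargan(y_bad, X, Z)
print(f"all IVs valid: Sargan = {stat_ok:.1f}, df = {df}, p = {np.exp(-stat_ok / 2):.3g}")
print(f"z3 invalid:    Sargan = {stat_bad:.1f}, df = {df}, p = {np.exp(-stat_bad / 2):.3g}")
```

As the slide warns, the test has power here only because z1 and z2 are valid; if all three IVs were invalid in the same way, the statistic could stay small.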

IV Validity

Conceptual arguments:

1. Explain why z should influence x2

2. Explain why z should not influence y directly

3. Anticipate objections about omitted variables that link z to the error term ε. Show that z is not related to those omitted variables, perhaps using outside data. For example, use data on non-veterans to support a claim about how veterans act.

IV Properties

Two characteristics mark a good instrumental variable:

1. Validity

2. Strength

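Strong IVs

A strong instrument has a high correlation with the endogenous variable.

How strong a correlation? Staiger & Stock (1997) recommend a partial F statistic of 5 or greater.

- Run the 1st stage with and without the IV(s).

- Compute the partial F statistic for the IV(s), i.e., the F test of their joint significance, from the change in the residual sum of squares between the two runs (not from the difference in the overall F statistics). A value of 5 or more is sufficient evidence of strength.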
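The partial F computation can be sketched directly from the two 1st-stage fits. The sketch assumes homoskedastic errors and uses simulated data with a deliberately weak, hypothetical instrument z; the helper names rss and partial_f are illustrative, not from any library.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

def partial_f(x_endog, exog, ivs):
    """Partial F for the excluded IVs in the 1st stage (homoskedastic errors assumed)."""
    X_r = exog                             # 1st stage without the IV(s)
    X_u = np.column_stack([exog, ivs])     # 1st stage with the IV(s)
    q = X_u.shape[1] - X_r.shape[1]        # number of IVs being tested
    rss_r, rss_u = rss(X_r, x_endog), rss(X_u, x_endog)
    return ((rss_r - rss_u) / q) / (rss_u / (len(x_endog) - X_u.shape[1]))

rng = np.random.default_rng(10)
n = 1_000
x1 = rng.normal(size=n)
z = rng.normal(size=n)                     # hypothetical, deliberately weak IV
x2 = 0.5 * x1 + 0.1 * z + rng.normal(size=n)

exog = np.column_stack([np.ones(n), x1])
F = partial_f(x2, exog, z)
print("partial F for z:", round(F, 2))
```

With a single IV, this partial F equals the squared t statistic on z in the 1st-stage regression, which is a handy cross-check against regression output.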

Weak IVs

If the IVs are weak,
• 2SLS and 2SRI are consistent, but there can be considerable bias even in large samples
• standard errors are too small
• 2SLS and 2SRI perform poorly

Weak IVs

What to do if IVs are weak?

If there is a single endogenous variable, use a conditional likelihood ratio (CLR) test:

* perform a regular likelihood ratio test

* adjust the critical values

* available in Stata; see Stata Journal, 3, 57-70 and http://elsa.berkeley.edu/wp/marcelo.pdf by Moreira and Poi

Weak IVs

What if there are multiple endogenous variables and only weak IVs?

A solution has not been developed … yet!

Selected References

JM Wooldridge. Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press, 2002.

A graduate-level econometrics textbook with lengthy textual descriptions of practical issues.

HS Bloom, ed. Learning more from social experiments: evolving analytic approaches. Russell Sage.

A largely non-technical exploration of how instrumental variables are found and used, with examples from welfare reform studies.

MP Murray. Avoiding invalid instruments and coping with weak instruments. Journal of Economic Perspectives 2006;20(4):111-132.

A superb reference with relatively few equations. Has an extensive reference list.

A Nakamura, M Nakamura. Model specification and endogeneity. Journal of Econometrics 1998;83:213-237.

Presents major endogeneity tests, explores approaches to endogeneity testing. Somewhat iconoclastic.

M McClellan, B McNeil, J Newhouse. Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? Analysis using instrumental variables. JAMA 1994;272(11):859-66.

Classic paper using IV in health, but challenging to read.

J Newhouse, M McClellan. Econometrics in outcomes research: the use of instrumental variables. Ann Rev Pub Health 1998;19:17-34.

Non-technical introduction to IV.

J Terza, A Basu, P Rathouz. Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics 2008;27:531-543.

Explains two-stage residual inclusion models and contrasts them with two-stage least squares. Moderately technical.

Acknowledgements

Much of the content of this presentation is derived from Wooldridge (2002), Murray (2006), and Nakamura and Nakamura (1998).

Helpful comments were also provided by HERC staff.

Questions?