Using the Instrumental Variables Technique in Educational Research By Larry V. Hedges Northwestern University


Outline

• The place of IV in educational research methodology

• The classical econometric justification of IV

• The modern statistical approach to IV and causal inference

• Implementing IV analyses

• What can go wrong

• Practical problems in IV

Disclaimer

This talk is intended to be non-technical, therefore:

No matrix algebra will be used

Some technical details will be glossed over

For example, I will speak of bias and accuracy in situations where the actual moments of estimates do not exist

The object is to build intuition and understanding, not to be rigorously technically correct

Estimating Treatment Effects

Consider treatment assignment (dummy variable) X and outcome Y

Regress Y on X

Yi = β0 + β1Xi + εi

The estimate of β1 is just the difference between the mean Y for X = 1 (the treatment group) and the mean Y for X = 0 (the control group)

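The group means are

$$\bar{Y}_1 = \beta_0 + \beta_1 + \bar{\varepsilon}_1$$

$$\bar{Y}_0 = \beta_0 + \bar{\varepsilon}_0$$

Thus the OLS estimate is

$$\bar{Y}_1 - \bar{Y}_0 = \beta_1 + (\bar{\varepsilon}_1 - \bar{\varepsilon}_0)$$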
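The claim that the OLS slope on a dummy variable equals the difference in group means can be checked numerically. The sketch below is only an illustration: the six data values are hypothetical, and the use of NumPy is an assumption rather than something from the talk.

```python
import numpy as np

# Hypothetical toy data: x is a 0/1 treatment dummy, y is the outcome.
x = np.array([0, 0, 0, 1, 1, 1], dtype=float)
y = np.array([2.0, 3.0, 4.0, 5.0, 7.0, 9.0])

# OLS fit of y on x; for degree 1, np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# Difference between the treatment-group mean and the control-group mean.
mean_diff = y[x == 1].mean() - y[x == 0].mean()

print("OLS slope:      ", slope)
print("Mean difference:", mean_diff)
```

The two printed values agree up to floating-point rounding, and the fitted intercept is the control-group mean.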

Estimating Treatment Effects (With Random Assignment)

If the treatment is randomly assigned, then X is uncorrelated with ε (X is exogenous)

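X is uncorrelated with ε if and only if $E\{\bar{\varepsilon}_1\} = E\{\bar{\varepsilon}_0\}$ (where $\bar{\varepsilon}_1$ and $\bar{\varepsilon}_0$ are the mean residuals in the treatment and control groups)

But if $E\{\bar{\varepsilon}_1\} = E\{\bar{\varepsilon}_0\}$, then the expected mean difference is

$$E\{\bar{Y}_1 - \bar{Y}_0\} = \beta_1 + E\{\bar{\varepsilon}_1 - \bar{\varepsilon}_0\} = \beta_1$$

This implies that standard methods (OLS) give an unbiased estimate of β1, which is the average treatment effect

That is, the treatment-control mean difference is an unbiased estimate of β1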

What goes wrong without randomization? (Simple Case)

If we do not have randomization, there is no guarantee that X is uncorrelated with ε (X may be endogenous)

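Thus the OLS estimate is still

$$\bar{Y}_1 - \bar{Y}_0 = \beta_1 + (\bar{\varepsilon}_1 - \bar{\varepsilon}_0)$$

If X is correlated with ε, then $E\{\bar{\varepsilon}_1\} \neq E\{\bar{\varepsilon}_0\}$

Hence $\bar{Y}_1 - \bar{Y}_0$ does not estimate β1, but some other quantity that depends on the correlation of X and ε

If X is correlated with ε, then standard methods give a biased estimate of β1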

What goes wrong without randomization?

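When you regress Y on X, Y = β0 + β1X + ε and the OLS estimate of β1 can be described as

$$b_{OLS} = \frac{\mathrm{Cov}\{Y, X\}}{\mathrm{Cov}\{X, X\}} = \frac{\mathrm{Cov}\{\beta_0 + \beta_1 X + \varepsilon, X\}}{\mathrm{Cov}\{X, X\}}$$

$$= \frac{\beta_1 \mathrm{Cov}\{X, X\} + \mathrm{Cov}\{\varepsilon, X\}}{\mathrm{Cov}\{X, X\}} = \beta_1 + \frac{\mathrm{Cov}\{\varepsilon, X\}}{\mathrm{Cov}\{X, X\}}$$

But since X and ε are correlated, bOLS does not estimate β1 but some other quantity that depends on the correlation of X and ε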
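A small simulation can make the size of this bias concrete. The sketch below is a hypothetical setup, not data from the talk: an unobserved confounder u feeds into both X and ε, so that Cov{ε, X} = 1 and Cov{X, X} = 2, and the OLS slope should drift to roughly β1 + 0.5. The coefficient values, seed, and sample size are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta0, beta1 = 1.0, 2.0           # assumed "true" coefficients

u = rng.normal(size=n)            # unobserved confounder
x = u + rng.normal(size=n)        # X depends on u, so Var(X) is about 2
eps = u + rng.normal(size=n)      # the error also depends on u: Cov(eps, X) is about 1
y = beta0 + beta1 * x + eps


def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b)[0, 1]


b_ols = cov(y, x) / cov(x, x)           # OLS slope
bias_term = cov(eps, x) / cov(x, x)     # Cov{eps, X} / Cov{X, X}

print("b_OLS:            ", b_ols)
print("beta1 + bias term:", beta1 + bias_term)
```

In the sample, b_OLS equals β1 plus the bias term exactly (up to rounding), which is the identity in the derivation; the bias term itself does not shrink as n grows.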

Instrumental Variables

Natural experiments are naturally occurring situations where we want to know the effect of variable X on Y and there is a variable Z related to X, but not ε

Another way to say this is: Z affects Y only through X

This variable Z is called an instrumental variable

It can be shown that

$$b_{IV} = \frac{\mathrm{Cov}\{Y, Z\}}{\mathrm{Cov}\{X, Z\}}$$

is an unbiased estimator of β1 in large samples but not in small samples (bIV is consistent)

Instrumental Variables

One way to see this is in terms of two regression equations

Yi = β0 + β1Xi + εi

Xi = γ0 + γ1Zi + ηi

Note that, in this model X is endogenous (may be correlated with ε)

The instrumental variables model requires that:

1. γ1 ≠ 0 so that Z predicts X, and

2. Z uncorrelated with ε (Z is exogenous) [Cov{ε, Z} = 0]

Instrumental Variables

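You can see the logic of IV as follows

$$b_{IV} = \frac{\mathrm{Cov}\{Y, Z\}}{\mathrm{Cov}\{X, Z\}} = \frac{\mathrm{Cov}\{\beta_0 + \beta_1 X + \varepsilon, Z\}}{\mathrm{Cov}\{X, Z\}}$$

$$= \frac{\beta_1 \mathrm{Cov}\{X, Z\} + \mathrm{Cov}\{\varepsilon, Z\}}{\mathrm{Cov}\{X, Z\}} = \beta_1 + \frac{\mathrm{Cov}\{\varepsilon, Z\}}{\mathrm{Cov}\{X, Z\}}$$

$$= \beta_1 \quad \text{(because } \mathrm{Cov}\{\varepsilon, Z\} = 0\text{)}$$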
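To see the covariance-ratio estimator at work, here is a minimal simulation sketch. Everything in it is an assumption made for illustration (coefficients, seed, sample size, and a normally distributed instrument): X is endogenous through a confounder u, while Z shifts X but is independent of ε.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta0, beta1 = 1.0, 2.0           # assumed outcome-equation coefficients
gamma0, gamma1 = 0.0, 1.0         # assumed first-stage coefficients

z = rng.normal(size=n)            # instrument: independent of u and of all noise
u = rng.normal(size=n)            # unobserved confounder
x = gamma0 + gamma1 * z + u + rng.normal(size=n)
eps = u + rng.normal(size=n)      # correlated with X through u, but not with Z
y = beta0 + beta1 * x + eps


def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b)[0, 1]


b_ols = cov(y, x) / cov(x, x)     # biased: Cov{eps, X} is not zero
b_iv = cov(y, z) / cov(x, z)      # consistent: Cov{eps, Z} is zero in the population

print("b_OLS:", b_ols)
print("b_IV: ", b_iv)
```

With this much data the IV estimate sits close to the assumed β1 = 2, while OLS settles near 2.33 (that is, β1 plus Cov{ε, X}/Cov{X, X} = 1/3 in this setup).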

Instrumental Variables

Recall the two regression equations

Yi = β0 + Xiβ1 + εi

Xi = γ0 + Ziγ1 + ηi

This is why instrumental variables methods are associated with simultaneous equations methods in econometrics

In this formulation, Zi and Xi can be vectors, so you can have

• several X variables, only some of which are endogenous and

• several Z variables, only some of which are instruments (but you must have at least as many instruments as endogenous X variables)

The instrumental variables model requires that γ1 ≠ 0 and Z uncorrelated with ε

Instrumental Variables

Remember: To be an instrument Z must be:

• Relevant (Z must be related to the endogenous variable X)

• Exogenous (Z must be related to the outcome Y only through X)

Failure of either condition is a problem!

But both conditions can be hard to satisfy at the same time

Example

Experiments with imperfect compliance
Effect of intent to treat, versus treatment on the treated

Intent to treat estimate

Compare Y for all those assigned to treatment 1 to those assigned to treatment 0

This estimates the causal effect on Y of assignment to treatment

It does not measure the effect of actually receiving the treatment unless there is perfect compliance

Experimental methods cannot estimate the effect of receiving the treatment, because that cannot be randomly assigned (without perfect compliance)

For example, families that use vouchers may be systematically different from those who do not in ways that affect Y

Example

Experiments with imperfect compliance
Voucher experiments

We may want to know the causal effect of using vouchers

But not all families assigned vouchers use them

Because use of vouchers is not randomly assigned, it may be correlated with residuals

Random assignment to receive vouchers is a plausible instrument because

• Voucher assignment is related to voucher use

• Voucher assignment may affect school achievement only through voucher use

Example

Experiments with imperfect compliance

This same idea can be applied to study the effect of receiving treatment (the effect of treatment on the treated) in many settings

It can also be used to study the effect of the “active ingredients” in imperfectly implemented treatments

It can (more cautiously) be used to study effects of a treatment where there is an instrument that does not arise via random assignment

Other examples of IV Studies

Note: See Angrist & Krueger (2001)

Y                    X                  Z
College enrollment   Financial aid      Thresholds in aid
Test scores          Class size         Thresholds of maximum class size
Health               Heart surgery      Proximity to hospital
Earnings             Years of school    Quarter of birth
Birth weight         Maternal smoking   State cigarette taxes

Estimating Causal Effects

The Rubin-Holland-Rosenbaum model starts with 2 potential responses for each unit

r1i = outcome unit i experiences in treatment 1

r0i = outcome unit i experiences in treatment 0

The causal effect of treatment 1 versus 0 on unit i is defined as

τi = r1i – r0i

You can’t estimate τi directly, but you can estimate the average causal effect in some circumstances, like a randomized experiment

Estimating Causal Effects (Randomized Experiments)

Let Z = {0, 1} be a variable that expresses treatment assignment

In a perfectly implemented randomized experiment, treatment assignment (Z) is uncorrelated with both r1i and r0i, so

E{r1i | treatment 1 (Z = 1)} = E{r1i}

E{r0i | treatment 0 (Z = 0)} = E{r0i}

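Thus E{r1 | Z = 1} – E{r0 | Z = 0} = E{r1 – r0} = τ, the average causal effect

So the estimate of the treatment effect, $\bar{r}_1 - \bar{r}_0$ (the difference between the observed treatment and control group means), is unbiased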
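The potential-outcomes argument can be illustrated with a simulation in which both responses are known for every unit, something never possible with real data. The sketch below is hypothetical throughout: the outcome scale, the distribution of unit-level effects, and the seed are assumptions chosen only to show that random assignment recovers the average causal effect even though each unit reveals just one potential response.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

r0 = rng.normal(50.0, 10.0, size=n)     # potential response under treatment 0
tau_i = rng.normal(5.0, 2.0, size=n)    # unit-level causal effects (assumed)
r1 = r0 + tau_i                         # potential response under treatment 1

z = rng.integers(0, 2, size=n)          # random assignment, independent of r0 and r1
y_obs = np.where(z == 1, r1, r0)        # only one potential response is observed

true_ate = tau_i.mean()                                    # known only in a simulation
estimate = y_obs[z == 1].mean() - y_obs[z == 0].mean()     # treatment-control difference

print("Average causal effect:", true_ate)
print("Mean difference:      ", estimate)
```

The observed mean difference lands close to the average of the unit-level effects, although no individual τi is ever observed.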

Estimating Causal Effects (IV Studies)

Consider IV within randomized experiments

Random assignment Z, with endogenous X (believed to be the efficacious causal component of treatment)

We want to know the causal effect of the endogenous variable X on outcome Y

For example

• Effect of voucher use in randomized choice studies

• Effect of treatment implementation

• Effect of using specific instructional methods

Estimating Causal Effects (IV Studies)

IV can estimate causal effects of X on Y, if the following assumptions hold:

1. SUTVA

2. Random assignment of Z

3. Exclusion restriction (exogeneity of Z)

4. Nonzero causal effect of Z on X

5. Monotonicity (no defiers)

Then the IV estimate is an estimate of the average treatment effect for those who comply with assignment

Units’ Reactions to Treatment

We can classify units’ reactions to treatment assignment into four categories

• Compliers (do what they are assigned to do)

• Always takers (get treatment regardless of assignment)

• Never takers (never get treatment regardless of assignment)

• Defiers (always do the opposite of what is assigned)
  [Note that we ruled out defiers by hypothesis]

Note that we cannot necessarily identify which individuals are which

Estimating Causal Effects (IV Studies)

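Note that the causal effect of treatment assignment (Z) on always takers and never takers is 0: assignment does not change the treatment they receive (by definition), so under the exclusion restriction it does not change their outcome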

We can also see the IV estimate as the ratio of two causal effects (two intent to treat estimates)

$$b_{IV} = \frac{\text{Causal effect of } Z \text{ on } Y}{\text{Causal effect of } Z \text{ on } X}$$
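The ratio of two intent-to-treat estimates can be sketched with a simulated experiment with imperfect compliance. All numbers below are hypothetical assumptions: the shares of never takers, compliers, and always takers (0.3, 0.5, 0.2), a treatment effect of 4 for compliers and 1 for everyone else, and baseline outcomes that differ by type so that treatment receipt is endogenous. There are no defiers, matching the monotonicity assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Compliance types (assumed shares): 0 = never taker, 1 = complier, 2 = always taker.
types = rng.choice(3, size=n, p=[0.3, 0.5, 0.2])
z = rng.integers(0, 2, size=n)                      # random assignment

# Treatment actually received: always takers get it, compliers follow z.
x = np.where(types == 2, 1, np.where(types == 1, z, 0))

# Baseline outcomes differ by type, so receiving treatment is not random.
baseline = rng.normal(size=n) + 3.0 * (types == 2) - 1.0 * (types == 0)
effect = np.where(types == 1, 4.0, 1.0)             # complier effect 4, others 1 (assumed)
y = baseline + effect * x

itt_y = y[z == 1].mean() - y[z == 0].mean()         # causal effect of Z on Y
itt_x = x[z == 1].mean() - x[z == 0].mean()         # causal effect of Z on X
b_iv = itt_y / itt_x                                # ratio of the two ITT estimates
naive = y[x == 1].mean() - y[x == 0].mean()         # treated vs untreated comparison

print("ITT on Y:", itt_y, " ITT on X:", itt_x)
print("IV estimate:", b_iv, " naive comparison:", naive)
```

In this setup the ITT effect on X is about the complier share (0.5), and the IV ratio comes out near the assumed complier effect of 4, not the effect for always takers or never takers. The naive treated-versus-untreated comparison is larger because it mixes in the baseline differences between types.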

Carrying Out IV Analyses

Recall the description of IV in terms of two regression equations

Yi = β0 + β1Xi + εi

Xi = γ0 + γ1Zi + ηi

Two-stage least squares estimation involves

• Regressing X on Z to get predicted values of X, $\hat{X}$

• Regressing Y on $\hat{X}$ to get an estimate of β1

Specialized programs are also available in many packages (e.g., Stata or SAS)

There are also other, more complex procedures (such as LIML)
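The two regression stages can be written out by hand as a sketch. The data-generating values below are hypothetical, and `ols` is a helper name introduced here for illustration; it simply wraps NumPy's least-squares solver. With a single instrument, the second-stage slope coincides with the covariance ratio Cov{Y, Z}/Cov{X, Z}.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
z = rng.normal(size=n)                         # instrument
u = rng.normal(size=n)                         # unobserved confounder
x = 1.0 * z + u + rng.normal(size=n)           # endogenous regressor (assumed gamma1 = 1)
y = 1.0 + 2.0 * x + u + rng.normal(size=n)     # outcome (assumed beta1 = 2)


def ols(design, target):
    """Least-squares coefficients for target regressed on the design matrix."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


ones = np.ones(n)

# Stage 1: regress X on Z (with intercept) and form the predicted values.
stage1 = np.column_stack([ones, z])
x_hat = stage1 @ ols(stage1, x)

# Stage 2: regress Y on the predicted values; the slope estimates beta1.
b_2sls = ols(np.column_stack([ones, x_hat]), y)[1]

# With one instrument this matches the covariance-ratio form of b_IV.
b_ratio = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]

print("2SLS slope:      ", b_2sls)
print("Covariance ratio:", b_ratio)
```

One caution about doing the stages by hand: the standard errors reported by an ordinary second-stage regression are not the correct 2SLS standard errors, which is one reason to prefer the specialized routines for real analyses.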

What Can Go Wrong In the Use of IV

Failure of the assumptions!

Failure of exogeneity (Z influences Y through other variables than X)

Failure of relevance (Z has only a weak relation to X)

Both of these kinds of failures are quantitative, not qualitative

Choice of instruments may involve a tradeoff between these two kinds of failures

But also, IV is a large sample procedure: even when assumptions are met it is only guaranteed to be unbiased in large samples

Violation of IV Assumptions

It is important to distinguish between two situations:

1. The assumption of exogeneity is met exactly and the relevance may be small (but nonzero) [weak instruments]
   In this case the only bias is due to small sample bias in estimation

2. The exogeneity assumption is not met exactly
   In this case there is additional (large sample) bias due to the direct causal effect of Z on Y

The analysis of bias is quite different in these two cases!

Exogenous, but Weak Instruments

Even when assumptions are perfectly met, IV is not unbiased in small (finite) samples

Finite sample bias can be non-negligible (e.g., 20 - 30%), even when the sample size is over 100,000 if the instrument is weak (Z is only weakly correlated with X)

The relative bias of bIV (versus bOLS) is approximately 1/F where F is the F-statistic for testing the relation between the instrument (Z) and endogenous variable (X)

A small value of F, even if it is large enough to be statistically significant, signals possible large bias in bIV

Exogenous, but Weak Instruments

Measuring strength of instruments: The concentration parameter λ

[Definition of λ not recoverable from the source: a ratio of two variances involving the first-stage residual η]

One interpretation of the concentration parameter comes from the F-statistic in the regression of X on Z, which tests the hypothesis that γ = 0:

k(F – 1) estimates λ

where k is the number of instruments

The accuracy of the bIV (2SLS) estimate depends on λ (λ functions like a sample size)

Testing for Weak Instruments

It is not sufficient that the relation between Z and X is statistically significant

Need to test whether λ/k exceeds a threshold (below which instruments are weak enough to imperil inference)

Two definitions of ‘weak enough to imperil inference,’ and both can be tested with the first-stage F for the relation of Z and X (Stock & Yogo, 2005):

1. Bias of bIV exceeds 10% of the bias of bOLS
   Ruling this out requires F > 10

2. Actual level of a nominal 5% significance test exceeds 15%
   Ruling this out requires F > 24
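The first-stage F is easy to compute directly. The sketch below is an illustration under assumptions: `first_stage_f` is a hypothetical helper name, the simulated first-stage coefficients (0.5 for a strong instrument, 0.02 for a weak one) are invented, and the F > 10 comparison is the rule of thumb quoted above rather than a formal test.

```python
import numpy as np


def first_stage_f(x, instruments):
    """F-statistic for H0: all instrument coefficients are zero in the
    regression of x on the instruments (with an intercept).
    Returns (F, k), where k is the number of instruments."""
    x = np.asarray(x, dtype=float)
    zmat = np.asarray(instruments, dtype=float)
    if zmat.ndim == 1:
        zmat = zmat[:, None]
    n, k = zmat.shape
    design = np.column_stack([np.ones(n), zmat])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    resid = x - design @ coef
    rss = resid @ resid                      # residual sum of squares
    tss = ((x - x.mean()) ** 2).sum()        # total sum of squares
    f_stat = ((tss - rss) / k) / (rss / (n - k - 1))
    return f_stat, k


rng = np.random.default_rng(5)
n = 1_000
z = rng.normal(size=n)
eta = rng.normal(size=n)

f_strong, k_strong = first_stage_f(0.5 * z + eta, z)    # assumed strong first stage
f_weak, k_weak = first_stage_f(0.02 * z + eta, z)       # assumed weak first stage

for label, f_stat, k in [("strong", f_strong, k_strong), ("weak", f_weak, k_weak)]:
    lam_hat = k * (f_stat - 1.0)             # k(F - 1) as an estimate of lambda
    print(label, "F =", round(f_stat, 2), "k(F-1) =", round(lam_hat, 2),
          "passes F > 10:", f_stat > 10)
```

Note that k(F – 1) is only an estimate and can come out negative when F falls below 1, which happens by chance with very weak instruments.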

Exogenous, but Weak Instruments

Exact (small sample) results are available, but very complex (almost to the point of being uninformative)

In general, more instruments increases the relevance of the instrument set (increases the first stage F)

But, too many instruments increases small sample bias (compared to few instruments)

In general it is best to have as few instruments as possible, and for them to be strongly correlated with X (the endogenous variable)
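A small Monte Carlo sketch can illustrate the cost of piling on instruments. The setup is hypothetical: one modestly relevant instrument, twenty irrelevant ones, n = 200, and an assumed β1 = 1 with positive OLS bias. `tsls_slope` is a helper name introduced here. The comparison uses medians across replications because, as noted in the disclaimer, moments of IV estimates may not exist.

```python
import numpy as np


def tsls_slope(y, x, instruments):
    """2SLS slope of y on x using the given instrument(s), with intercepts."""
    n = len(y)
    first = np.column_stack([np.ones(n), instruments])
    x_hat = first @ np.linalg.lstsq(first, x, rcond=None)[0]
    second = np.column_stack([np.ones(n), x_hat])
    return np.linalg.lstsq(second, y, rcond=None)[0][1]


rng = np.random.default_rng(6)
n, reps, beta1 = 200, 300, 1.0               # assumed design values
one, many = [], []

for _ in range(reps):
    z_good = rng.normal(size=n)              # the single relevant instrument
    z_junk = rng.normal(size=(n, 20))        # 20 instruments unrelated to X
    u = rng.normal(size=n)                   # confounder: makes X endogenous
    x = 0.3 * z_good + u + rng.normal(size=n)
    y = beta1 * x + u
    one.append(tsls_slope(y, x, z_good))
    many.append(tsls_slope(y, x, np.column_stack([z_good, z_junk])))

median_one = float(np.median(one))
median_many = float(np.median(many))

print("Median 2SLS, 1 instrument:  ", median_one)
print("Median 2SLS, 21 instruments:", median_many)
```

In this setup the estimate based on the single relevant instrument is centered near β1, while the 21-instrument estimate is pulled toward the (upward-biased) OLS value, even though every instrument is exogenous.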

There are Several IV Methods

I focused on 2SLS, the most widely used IV method

There are more complex competitors, such as the Limited Information Maximum Likelihood (LIML) estimation

Analyses of these methods are difficult too. Large sample methods can help, but there are at least 4 different large sample (asymptotic) models for analyzing IV (and they often give different results)

One of these suggests that 2SLS is equivalent to LIML

Small sample studies (not definitive) suggest that LIML may be superior to 2SLS in small samples

There are Several IV Methods

But the full story is not completely clear (e.g., how much this finding depends on normality) and it is not simple

Although it is generally found that 2SLS has particularly poor finite sample behavior, each alternative estimator seems to have its own pathologies when instruments are weak. (Andrews & Stock, 2005, p. 2)

Failure of Exogeneity

Let H_i be the direct causal effect of Z on Y for unit i

Then if the exclusion restriction (exogeneity) is violated, the (large sample, large λ) bias in bIV is

$$E\{H_i \mid i \text{ is a complier}\} + E\{H_i \mid i \text{ is a noncomplier}\} \cdot \frac{P\{i \text{ is a noncomplier}\}}{P\{i \text{ is a complier}\}}$$

This shows that bias is reduced when the instrument is relevant (strong correlation between Z and X), so the odds of being a noncomplier are small
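A little arithmetic shows how strongly the odds of noncompliance scale this bias. The sketch below assumes the large-sample bias takes the form given by Angrist, Imbens, & Rubin (1996): the mean direct effect among compliers, plus the mean direct effect among noncompliers multiplied by the odds of being a noncomplier. The function name and the numerical inputs are hypothetical.

```python
def exclusion_bias(h_complier, h_noncomplier, p_complier):
    """Large-sample bias in b_IV when Z has a direct effect H on Y.

    h_complier:    mean direct effect of Z on Y among compliers
    h_noncomplier: mean direct effect of Z on Y among noncompliers
    p_complier:    proportion of compliers (must be in (0, 1])
    """
    if not 0.0 < p_complier <= 1.0:
        raise ValueError("p_complier must be in (0, 1]")
    odds_noncomplier = (1.0 - p_complier) / p_complier
    return h_complier + h_noncomplier * odds_noncomplier


# Same small direct effect (0.1) among noncompliers, different compliance rates.
bias_strong = exclusion_bias(0.0, 0.1, p_complier=0.9)   # 0.1 * (0.1 / 0.9)
bias_weak = exclusion_bias(0.0, 0.1, p_complier=0.1)     # 0.1 * (0.9 / 0.1)

print("Bias with 90% compliers:", bias_strong)
print("Bias with 10% compliers:", bias_weak)
```

With 90% compliers the direct effect is shrunk by a factor of nine; with 10% compliers it is magnified by a factor of nine, so a weak instrument amplifies even a small violation of exogeneity.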

Failure of Exogeneity

Failure of exogeneity may introduce large biases that are hard to quantify precisely because they depend on unobservables

Usually, this assumption will be (somewhat) false

The best we can do is often to be skeptical and to make sure exogeneity is highly plausible in the setting to which we apply IV

IV Can Provide Valid Estimates

There are applications in which IV does provide credible estimates

Krueger’s (1999) IV estimate of the effects of actual class size on achievement using randomization as an instrument

Howell et al.’s (2000) IV estimate of the effects of using school vouchers on achievement using randomization as an instrument

Bloom et al.’s (1997) IV estimate of the effects of JTPA on earnings using randomization as an instrument

Practical Problems with IV

How do we know if Z is exogenous?

Isn’t randomization always a good instrument?

No!

Consider a randomized experiment to change instruction (using many sites or schools)

Practical Problems with IV

Z is assignment to treatment to change instruction

X is a measure of the instruction targeted by treatment

Is Z relevant (a strong instrument)?

Hard to tell a priori (e.g., if Z is dichotomous, X is continuous, Z may not explain much variance in X)

Is Z exogenous?

Why should Z not influence Y through other unmeasured instructional practices?

Practical Problems with IV

Possible Solution

Include other instructional practices as covariates or endogenous variables

But there must be at least as many instruments as endogenous variables, so now we need more instruments

We could include Z-by-site interactions as instruments

But now we have increased the number of instruments, which may increase bias

Practical Problems with IV

Assignment may have direct effects on Y if volunteers want the treatment (Shadish, Cook, & Campbell, 2002)

Assignment may influence units to get alternatives

• Tutoring

• Teacher induction

• Health care

• After school programs

Assignment may have a discouraging effect on control group

Conclusions

IV can make possible estimates of causal effects without random assignment in some cases

But it is no panacea

Often, it will be difficult to find instruments that are both relevant (strong enough) and exogenous

IV estimation is a complicated subject and good theory for all of the relevant issues is not available

For example, all of the theory I have mentioned assumes simple random sampling, so it does not take clustered sampling (of the kind in most education experiments) into account

Select Bibliography

Causal Inference

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688-701.

Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91, 444-455.

Imbens, G. W., & Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62, 467-475.

Natural Experiments

Angrist, J. D., & Krueger, A. B. (2001). Instrumental variables and the search for identification: From supply and demand to natural experiments. The Journal of Economic Perspectives, 15, 69-85.

Select Bibliography

Weak Instruments

Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association, 90, 443-450.

Staiger, D., & Stock, J. H. (1997). Instrumental variables regression with weak instruments. Econometrica, 65, 557-586.

Nelson, C. R., & Startz, R. (1990). Some further results on the exact small sample properties of the instrumental variable estimator. Econometrica, 58, 967-976.

Stock, J. H., Wright, J. H., & Yogo, M. (2002). A survey of weak instruments and weak identification in generalized method of moments. Journal of Business and Economic Statistics, 20, 518-529.

Buse, A. (1992). The bias of instrumental variable estimators. Econometrica, 60, 173-180.