SREE workshop

march 2010, sean f. reardon

using instrumental variables in education research


outline

a little background on the potential outcomes framework

what is an instrumental variable? and what’s it good for?

assumptions needed to use instrumental variables

practical methods of estimating IV models

sources of bias in IV models

additional topics

© 2010 by sean f. reardon. all rights reserved.

potential outcomes framework

a stylized example

what is the effect of receiving tutoring in math on student math achievement?

some made-up data for illustration:

Observed Student Treatment and Achievement Data

ID   Treatment Condition   Test Score
1    no tutoring           55
2    no tutoring           60
3    no tutoring           65
4    tutoring              60
5    tutoring              72
6    tutoring              63




Observed and Unobserved Potential Achievement Data

(? = not observed)

Student ID          Treatment Condition   Score if not Tutored   Score if Tutored   Observed Score   Tutoring Effect
1                   no tutoring           55                     ?                  55               ?
2                   no tutoring           60                     ?                  60               ?
3                   no tutoring           65                     ?                  65               ?
Untutored Average                         60                     ?                  60               ?
4                   tutoring              ?                      60                 60               ?
5                   tutoring              ?                      72                 72               ?
6                   tutoring              ?                      63                 63               ?
Tutored Average                           ?                      65                 65               ?
Overall Average                           ?                      ?                  62.5             ?


Definition of an “effect”

The effect, $\delta_i$, [on some outcome Y] [for some unit i] [of some treatment condition t relative to some other condition c] is defined as the difference between the value of Y that would be observed if unit i were exposed to treatment t and the value of Y that would be observed if unit i were exposed to treatment c.

More formally, we define the effect of t relative to c on Y for unit i as:

$\delta_i = Y_i^t - Y_i^c$

We define the average effect of t relative to c in a population P as:

$\delta = E[\delta_i \mid i \in P] = E[Y_i^t - Y_i^c \mid i \in P]$

The “Fundamental Problem of Causal Inference” (Holland, 1986)

Although both $Y_i^t$ and $Y_i^c$ are defined in principle, it is impossible to observe both of them for the same unit (because any given unit can be exposed to only one of t or c).

Thus, the causal effect $\delta_i$ cannot be observed.

The problem of causal inference is thus a problem of missing data. The outcome Yi under its “counterfactual” condition is never observed.

How can we construct unbiased estimates of the average potential outcomes $E[Y^t]$ and $E[Y^c]$ under the counterfactual conditions?



Observed and Possible Unobserved Potential Achievement Data

Student ID          Treatment Condition   Score if not Tutored   Score if Tutored   Observed Score   Tutoring Effect
1                   no tutoring           55                     60                 55               +5
2                   no tutoring           60                     72                 60               +12
3                   no tutoring           65                     63                 65               -2
Untutored Average                         60                     65                 60               +5
4                   tutoring              55                     60                 60               +5
5                   tutoring              60                     72                 72               +12
6                   tutoring              65                     63                 63               -2
Tutored Average                           60                     65                 65               +5
Overall Average                           60                     65                 62.5             +5




Observed and Possible Unobserved Potential Achievement Data

Student ID          Treatment Condition   Score if not Tutored   Score if Tutored   Observed Score   Tutoring Effect
1                   no tutoring           55                     60                 55               +5
2                   no tutoring           60                     55                 60               -5
3                   no tutoring           65                     65                 65               0
Untutored Average                         60                     60                 60               0
4                   tutoring              55                     60                 60               +5
5                   tutoring              70                     72                 72               +2
6                   tutoring              70                     63                 63               -7
Tutored Average                           65                     65                 65               0
Overall Average                           62.5                   62.5               62.5             0
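
a minimal Python sketch (not part of the original slides) of the point the two tables make: both sets of potential outcomes produce exactly the same observed scores, and so the same tutored-minus-untutored difference, yet the true average effect is +5 in one and 0 in the other. The names `scenario_a` and `scenario_b` are labels chosen here for the two tables.

```python
# Each pair is (score if not tutored, score if tutored), copied from the two tables.
tutored = [False, False, False, True, True, True]

scenario_a = [(55, 60), (60, 72), (65, 63), (55, 60), (60, 72), (65, 63)]
scenario_b = [(55, 60), (60, 55), (65, 65), (55, 60), (70, 72), (70, 63)]


def mean(values):
    return sum(values) / len(values)


def observed_scores(potential, tutored):
    # A student reveals only the potential outcome for the condition received.
    return [y1 if t else y0 for (y0, y1), t in zip(potential, tutored)]


def naive_difference(potential, tutored):
    # Mean observed score of the tutored minus mean observed score of the untutored.
    obs = observed_scores(potential, tutored)
    treated = [y for y, t in zip(obs, tutored) if t]
    control = [y for y, t in zip(obs, tutored) if not t]
    return mean(treated) - mean(control)


def true_average_effect(potential):
    # Requires both potential outcomes, which real data never provide.
    return mean([y1 - y0 for y0, y1 in potential])


for name, scenario in [("A", scenario_a), ("B", scenario_b)]:
    print(name, observed_scores(scenario, tutored),
          naive_difference(scenario, tutored), true_average_effect(scenario))
```

the observed data alone cannot distinguish the two scenarios; only an assumption about how treatment was assigned can.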


What if we can’t conduct an RCT?

If we can randomize students to receive either tutoring or no tutoring, and ensure that every student complies with his or her assigned treatment status, the randomization will allow us to estimate the effect of tutoring very easily.

but what if students don’t comply with their treatment assignment?

some assigned to tutoring don’t go to tutoring

some assigned to no tutoring get tutored anyway

this means tutoring is no longer randomly assigned: at least some of the variation in treatment status is potentially endogenous

so a comparison of those assigned to tutoring and no tutoring won’t give us an estimate of the effect of tutoring (but only the effect of being assigned to tutoring)

this is one case where instrumental variables are useful

instrumental variables models

What is an instrumental variable?

an instrumental variable is an exogenous factor that causes some of the variation in treatment status (it need not cause all of it)

we use it to identify the portion of variation in treatment that is exogenous and then only rely on that exogenous variation to estimate the effect of treatment


A general structural model

T: treatment status

Y: outcome measure

X: observed confounders

U: unobserved confounders

W: observed ignorable causes of Y

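$\varepsilon_Y$: unobserved ignorable causes of Y

$\varepsilon_T$: unobserved ignorable causes of T

Z: instrument (observed ignorable cause of T)

[Figure: path diagram. Z → T, X → T, X → Y, U → T, U → Y, W → Y, $\varepsilon_T$ → T, $\varepsilon_Y$ → Y, T → Y]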

Relating treatments and outcomes

we would like to estimate the effect of T on Y

this involves seeing how T and Y are related

but to infer a causal relationship from the covariance of T and Y, we need to understand the source of variation in T

why do some people get different types/degrees of the treatment?

[Figure: path diagram. T → Y]


variation in T may be caused by factors unrelated to the outcome Y

these may be observed (Z) or unobserved ($\varepsilon_T$)

if the only variation in T comes from factors unrelated to Y, then T is as good as randomly assigned, so getting a causal estimate is easy

[Figure: path diagram. Z → T, $\varepsilon_T$ → T, T → Y]


variation in T may be caused, in part, by observed factors that are related to the outcome Y

observed confounders (X)

as long as there is some variation in T that is caused by some (not necessarily observable) ignorable cause (Z or $\varepsilon_T$), we can still easily get an estimate of the effect of T

statistically control for X (compute relationship between T and Y, conditional on X)

[Figure: path diagram. X → T, X → Y, $\varepsilon_T$ → T, T → Y]


variation in T may be caused, in part, by observed and unobserved factors that are related to the outcome Y

observed confounders (X)

unobserved confounders (U)

reverse causality (Y affects T)

here, we cannot get an unbiased estimate of the effect of T

statistical control can’t adjust for U

the ignorable cause ($\varepsilon_T$) is not observed

[Figure: path diagram. X → T, X → Y, U → T, U → Y, $\varepsilon_T$ → T, T → Y]


if we cannot observe all the confounders (or if Y affects T), then we need some observed factor that affects T but does not otherwise affect Y

this (Z) is called an instrument (or instrumental variable).

because the part of the variation in T that is induced by Z is ignorable (as good as random), we can use this part of the variation in T to identify the effect of T on Y

[Figure: path diagram. Z → T, X → T, X → Y, U → T, U → Y, $\varepsilon_T$ → T, T → Y]

Tutoring example, revisited

the observed data is not sufficient to estimate the average effect of tutoring

what if we can’t do an experiment, or if we do an experiment and not everyone complies?


tutoring voucher as an instrument

randomly assign eligible students to receive either a voucher allowing them to receive free tutoring (Z=1) or no voucher (Z=0).

observe whether students attend tutoring (T=1) or not (T=0).

note: this choice is not random; students may choose tutoring or not, regardless of voucher status (so Ti may differ from Zi).

observe later achievement (Y)

we want to estimate the effect of T (tutoring vs no tutoring) on Y (achievement).


Four subpopulations (angrist, imbens, & rubin, 1996)

compliers: those who would comply with treatment assignment (those for whom Ti=Zi)

non-compliers:

always-takers: those who would always receive the treatment, regardless of assignment (those for whom Ti=1)

never-takers: those who would never receive the treatment, regardless of assignment (those for whom Ti=0)

defiers: those who would always do the opposite of treatment assignment (those for whom Ti=1-Zi)
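
a small Python sketch (not from the original slides) that restates the four definitions as a function of a student's two potential treatment statuses. The function name and argument names are made up for illustration. In real data only one of the two arguments is ever observed for a given student, which is why individual students cannot be labeled this way.

```python
def compliance_type(t_if_no_voucher, t_if_voucher):
    """Classify a student from the pair (T when Z=0, T when Z=1), each coded 0 or 1."""
    pair = (t_if_no_voucher, t_if_voucher)
    if pair == (0, 1):
        return "complier"        # T = Z
    if pair == (1, 1):
        return "always-taker"    # T = 1 regardless of Z
    if pair == (0, 0):
        return "never-taker"     # T = 0 regardless of Z
    if pair == (1, 0):
        return "defier"          # T = 1 - Z
    raise ValueError("potential treatment statuses must be 0 or 1")


for pair in [(0, 1), (1, 1), (0, 0), (1, 0)]:
    print(pair, compliance_type(*pair))
```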


Observed Outcomes

N=100, 50% receive vouchers, but not all comply with assignment (only 60% comply):

Offered     Tutored            Proportion
Voucher     No       Yes       Tutored
No          45       5         .10
Yes         15       35        .70
Total       60       40        .40


Observed Outcomes

no voucher, not tutored (45 students): might be compliers or never-takers

voucher, not tutored (15 students): might be defiers or never-takers

no voucher, tutored (5 students): might be defiers or always-takers

voucher, tutored (35 students): might be compliers or always-takers


estimating the proportion of compliers

assume there are no defiers

then everyone with Z=1, T=0 is a never-taker (15 of 50 (30%) with Z=1 in our example)

there should be the same proportion (30%) of never-takers among those with Z=0, because Z is random

the same logic implies there are 10% of the population who are always-takers

thus, 60% (100% - 30% - 10%) are compliers

Estimating the proportion of compliers

we can also estimate this by regressing the treatment variable on the instrument

tutor = G0 + G1*voucher + e

tutor = .10 + 0.60*voucher

Thus, the average effect of being assigned a voucher on tutoring status is +0.60, meaning that the average student’s probability of receiving tutoring increases by 0.60 if assigned a voucher (which means that 60% of the students comply with the voucher assignment).
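
the arithmetic above can be checked with a short Python sketch (not part of the original slides). It uses the counts from the voucher-by-tutoring table and assumes, as the slides do, that there are no defiers. With a binary instrument, the first-stage intercept and slope are just the tutoring rate without a voucher and the difference in tutoring rates.

```python
# counts[z][t] = number of students with voucher status z and tutoring status t.
counts = {0: {0: 45, 1: 5}, 1: {0: 15, 1: 35}}


def share_tutored(z):
    """Proportion tutored among students with voucher status z."""
    n = counts[z][0] + counts[z][1]
    return counts[z][1] / n


# Assuming no defiers:
always_takers = share_tutored(0)        # tutored even without a voucher
never_takers = 1 - share_tutored(1)     # untutored even with a voucher
compliers = 1 - always_takers - never_takers

# First-stage regression tutor = G0 + G1*voucher + e with a binary voucher:
g0 = share_tutored(0)
g1 = share_tutored(1) - share_tutored(0)

print(round(always_takers, 2), round(never_takers, 2), round(compliers, 2))
print(round(g0, 2), round(g1, 2))
```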


Observed Outcomes

Estimated effect of the voucher offer on test scores = 56.6 – 50.5 = +6.1

Mean test score

Offered     Tutored
Voucher     No       Yes      Total
No          48.3     70.0     50.5
Yes         44.9     61.6     56.6
Total       47.5     62.6     53.5


Observed Outcomes

no voucher, not tutored (48.3): average outcome among untutored compliers and never-takers

voucher, tutored (61.6): average outcome among tutored compliers and always-takers

here we’re assuming no defiers (later we will see why this is necessary)


OLS estimates

OLS yields:

test = 47.5 + 15.1*(tutored)

the estimated effect of tutoring is +15.1 points

but we should worry about whether this is biased, because some students chose whether to get tutoring or not.

the tutored group includes compliers and always-takers; the control group includes compliers and never-takers; so they are not equivalent groups
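
a Python sketch (not from the original slides) showing where the OLS numbers come from: with a single binary regressor, the OLS intercept is the mean score of the untutored and the slope is the tutored-minus-untutored difference in means. The inputs are the rounded cell counts and cell means from the tables, so the results agree with 47.5 and +15.1 only up to rounding.

```python
# cells[(z, t)] = (number of students, mean test score) from the tables above.
cells = {
    (0, 0): (45, 48.3),
    (0, 1): (5, 70.0),
    (1, 0): (15, 44.9),
    (1, 1): (35, 61.6),
}


def group_mean(keep):
    """Count-weighted mean score over the cells selected by keep(z, t)."""
    selected = [(n, m) for (z, t), (n, m) in cells.items() if keep(z, t)]
    total = sum(n for n, _ in selected)
    return sum(n * m for n, m in selected) / total


untutored = group_mean(lambda z, t: t == 0)     # OLS intercept
tutored_mean = group_mean(lambda z, t: t == 1)
ols_slope = tutored_mean - untutored            # OLS coefficient on tutored

print(round(untutored, 2), round(ols_slope, 2))
```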


The Wald IV estimator

if we are willing to assume that the voucher offer had no effect on the outcome of the non-compliers (because it did not alter their treatment status and does not affect their outcome through any other way), then we can estimate the effect of tutoring like this:

The average effect of the voucher in the population is estimated to be +6.1

but only 60% of students’ decisions about whether to get tutoring were affected by the voucher offer (only 60% of sample are compliers)


Wald estimator

average effect of the voucher in population (+6.1)

= average effect on compliers (?) x proportion who are compliers (.60)

+ average effect on non-compliers (0) x proportion who are non-compliers (.40)


Wald estimator

this says that the average effect of the treatment among the compliers equals the average effect of the voucher offer in the population divided by the proportion of the population who are compliers

thus, the average effect among the compliers is +6.1/.60 = +10.1
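
the Wald calculation can be written as a one-line ratio. The Python sketch below is not from the original slides and the function name is made up. Fed the rounded means from the tables it returns about 10.17; the slide reports +10.1, and the small gap is presumably a matter of rounding in the published figures.

```python
def wald_estimate(mean_y_z1, mean_y_z0, mean_t_z1, mean_t_z0):
    """Effect of the instrument on the outcome divided by its effect on the treatment."""
    first_stage = mean_t_z1 - mean_t_z0
    if first_stage == 0:
        raise ValueError("instrument has no effect on treatment; the ratio is undefined")
    reduced_form = mean_y_z1 - mean_y_z0
    return reduced_form / first_stage


# Mean test scores and tutoring rates with and without a voucher offer.
late = wald_estimate(56.6, 50.5, 0.70, 0.10)
print(round(late, 2))
```

the guard on a zero first stage mirrors the requirement that the instrument affect the treatment for at least some people.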

What have we learned?

An instrumental variable allows us to estimate the average effect of the treatment among those whose treatment status is affected by the instrument (“compliers”)

this is called the “local average treatment effect” (LATE)

note that we can’t identify who the compliers are

We can’t estimate the average treatment effect in the population, because we can’t estimate the effect among non-compliers

because the instrument doesn’t affect their treatment status, there is no exogenous variation in their treatment status that we can use.


What assumptions have we made?

the instrument only affects the outcome through its impact on the treatment (this is called the exclusion restriction)

the instrument is ignorably (randomly) assigned

this allows us to estimate the effect of the instrument on the outcome and on the treatment

the instrument affects the treatment for at least some people

otherwise there are no compliers

there are no defiers

more general IV models

what if treatment is not binary?

above we assumed the treatment (tutoring) was binary

but not all treatments are binary

we could offer vouchers of different amounts

students could receive different amounts of tutoring

as a result, compliance may take on many values

for some students, the amount of tutoring received may be strongly affected by the instrument; for others, it may be weakly affected or not at all affected.


a more general model of the IV estimator

for a given individual i, $\beta_i$ is the effect of Z on Y

this effect may vary across individuals

we would like to estimate the average effect, $E[\beta_i]$

[Figure: path diagram. Zi → Yi ($\beta_i$)]

1. exclusion restriction

if the only way that Z affects Y is through its effect on T, then we have $\beta_i = \gamma_i \delta_i$, where $\gamma_i$ is the effect of Z on T and $\delta_i$ is the effect of T on Y for individual i.

or, put differently, the effect of Z on Y is the product of the effect of Z on T and the effect of T on Y.

the assumption that the only way that Z affects Y is through its effect on T is called the exclusion restriction.

[Figure: path diagram. Zi → Ti ($\gamma_i$), Ti → Yi ($\delta_i$)]

2. zero compliance-effect covariance

we can write the average effect of Z on Y as

$E[\beta_i] = E[\gamma_i \delta_i] = E[\gamma_i]E[\delta_i] + Cov(\gamma_i, \delta_i)$

if we assume $Cov(\gamma_i, \delta_i) = 0$, then we have

$E[\beta_i] = E[\gamma_i]E[\delta_i]$

the assumption that $Cov(\gamma_i, \delta_i) = 0$ is called the zero compliance-effect covariance assumption.

3. instrument relevance

as long as $E[\gamma_i] \neq 0$, we can rewrite the above as

$E[\delta_i] = \frac{E[\beta_i]}{E[\gamma_i]}$

the assumption that $E[\gamma_i] \neq 0$ is sometimes called the instrument relevance assumption; or sometimes just referred to as the assumption that the instrument affects the treatment.

if $E[\gamma_i]$ is small (close to zero), we say that the instrument is a weak instrument.

4. the instrument is ignorably assigned

if the above three assumptions are met, we have

$E[\delta_i] = \frac{E[\beta_i]}{E[\gamma_i]}$

if Z is ignorably assigned, then we can easily estimate both $E[\beta_i]$ (the average effect of Z on Y) and $E[\gamma_i]$ (the average effect of Z on T).

the assumption of ignorable assignment thus makes estimation of the effect of T on Y possible.

what do these assumptions mean?

exclusion restriction: the offer of a tutoring voucher does not affect students’ achievement except by affecting the amount of tutoring they receive

zero compliance-effect covariance: there is no correlation between how strongly a voucher offer affects the amount of tutoring a student gets and how effective tutoring is for that student


what do these assumptions mean?

instrument relevance: the offer of a voucher has some effect, on average, on the amount of tutoring students receive (at least one student is affected by the offer).

ignorable assignment of the instrument: the voucher offer is randomly assigned (this would be violated, for example, if the principal gave vouchers to students she deemed most in need of tutoring).
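
a simulation sketch in Python with NumPy (not part of the original slides) of a world in which all four assumptions hold. Every number in it is invented: each simulated student has a compliance (the effect of the voucher on the amount of tutoring) and a tutoring effect drawn independently of each other, plus an unobserved confounder that raises both tutoring and achievement. The ratio of the voucher's effect on achievement to its effect on tutoring should land near the average tutoring effect (10 here), while OLS should be pulled upward by the confounder.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical person-specific parameters (assumed values, not from the slides).
compliance = rng.uniform(0.0, 1.0, n)     # effect of Z on T
effect = rng.normal(10.0, 3.0, n)         # effect of T on Y, independent of compliance
u = rng.normal(0.0, 1.0, n)               # unobserved confounder of T and Y

z = rng.integers(0, 2, n)                 # randomly assigned instrument
t = compliance * z + u + rng.normal(0.0, 1.0, n)
y = effect * t + 5.0 * u + rng.normal(0.0, 1.0, n)   # Z affects Y only through T

effect_z_on_y = y[z == 1].mean() - y[z == 0].mean()
effect_z_on_t = t[z == 1].mean() - t[z == 0].mean()
iv_estimate = effect_z_on_y / effect_z_on_t

ols_estimate = np.cov(t, y)[0, 1] / np.var(t, ddof=1)

print(round(iv_estimate, 2), round(ols_estimate, 2), round(effect.mean(), 2))
```

making `effect` depend on `compliance`, or adding `z` directly to the equation for `y`, is a quick way to see what violating the zero-covariance or exclusion assumptions does to the ratio.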


some examples

NYC voucher experiment (howell et al, 2002; krueger & zhu, 2004)

Effect of schooling on wages, using quarter of birth as instrument (angrist & krueger, 1991).

Effect of teacher absence on student achievement, using snowfall as instrument (miller, murnane & willett, 2007)

Effects of segregation on educational attainment and wages, using railroads as an instrument (ananat 2007)

estimating IV models

estimating IV models in practice

in practice, we don’t usually compute the effect of Z on Y and Z on T and divide them

because we may need more complex models (if we want to include other covariates in the model, for example)

because we need to compute standard errors

the most common method of estimating IV models is two-stage least squares (TSLS or 2SLS).


Three relevant equations

1: $Y_i = \beta_0 + \beta_i Z_i + e_i$ (the “reduced form” equation); $\beta_i$ is the person-specific effect of Z on Y.

2: $T_i = \gamma_0 + \gamma_i Z_i + u_i$ (the “first stage” equation); $\gamma_i$ is the person-specific effect of Z on T.

but the equation we really are interested in is

3: $Y_i = \delta_0 + \delta_i T_i + v_i$ (the “second stage” equation); $\delta_i$ is the person-specific effect of T on Y.


two-stage least squares

fit the first-stage equation (estimate the effect of Z on T); compute fitted values:

$\hat{T}_i = \hat{\gamma}_0 + \hat{\gamma} Z_i$

fit the second-stage equation, using predicted values of T in place of observed values of T:

$Y_i = \delta_0 + \delta \hat{T}_i + v_i$

because the predicted values of T from the first-stage equation include only the variation in T that is caused by the instrument, the estimated coefficient $\hat{\delta}$ from the second-stage equation will be a consistent estimate of the effect of T on Y (as long as the 4 IV assumptions are met).

if you do this by hand, you’ll get the wrong standard errors; statistical software usually has built-in routines (e.g., -ivregress- command in Stata) to compute correct standard errors.
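
for illustration only, the two stages can be done by hand in Python with NumPy (this sketch is not from the original slides). The individual-level data are made up: each student is given the mean score of his or her cell in the voucher tables, which is enough to reproduce the point estimate (about 10.2 from the rounded cell means) but says nothing about standard errors. As noted above, standard errors from a hand-run second stage are wrong, so this is not a substitute for a built-in IV routine.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_least_squares(z, t, y):
    """Point estimate of the effect of t on y, using z as the instrument."""
    const = np.ones_like(z, dtype=float)
    # Stage 1: regress T on Z and keep the fitted values.
    X1 = np.column_stack([const, z])
    t_hat = X1 @ ols(X1, t)
    # Stage 2: regress Y on the fitted values of T.
    X2 = np.column_stack([const, t_hat])
    return ols(X2, y)[1]


# (voucher, tutored, number of students, mean score) from the tables above.
cells = [(0, 0, 45, 48.3), (0, 1, 5, 70.0), (1, 0, 15, 44.9), (1, 1, 35, 61.6)]
z = np.concatenate([np.full(n, zv, dtype=float) for zv, tv, n, m in cells])
t = np.concatenate([np.full(n, tv, dtype=float) for zv, tv, n, m in cells])
y = np.concatenate([np.full(n, m, dtype=float) for zv, tv, n, m in cells])

estimate = two_stage_least_squares(z, t, y)
print(round(estimate, 2))
```

with one binary instrument and no covariates, this estimate is the same as the ratio of the two differences in means.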


Effects of attending charter school

we can’t randomize students to charter or traditional public schools

Abdulkadiroglu, et al (2009) examine students who apply to oversubscribed charter schools, whose admission is determined by lottery (randomization)

instrument is winning the lottery

treatment is # of years in a charter school


example: effect of charter schooling

[Table columns: first stage (compliance); reduced form (effect of winning lottery on ach.); 2sls (effect of a year in charter)]


are the IV assumptions valid in this study?

exclusion restriction?

zero compliance-effect covariance?

instrument relevance?

ignorable assignment?


sources of bias in IV models

sources of bias in IV

failure of exclusion restriction assumption

failure of ignorability assumption

failure of zero compliance-effect covariance assumption

finite sample bias

weak instruments cause 3 problems:

exacerbate bias due to failure of assumptions (exclusion restriction, ignorability, zero covariance)

exacerbate finite sample bias

lead to incorrect estimation of standard errors when using two-stage least squares


failure of the exclusion restriction

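recall that the exclusion restriction says that the only way that Z affects Y is through its effect on T.

as a result, we can write

$\beta_i = \gamma_i \delta_i$

($\beta_i$: effect of Z on Y; $\gamma_i$: effect of Z on T; $\delta_i$: effect of T on Y, all for person i)

[Figure: path diagram. Zi → Ti ($\gamma_i$), Ti → Yi ($\delta_i$)]

failure of the exclusion restriction

if the exclusion restriction is violated, then there is some other path through which Z affects Y

as a result, we can write

$\beta_i = \gamma_i \delta_i + \theta_i$

where $\theta_i$ is the effect of Z on Y that does not operate through T

[Figure: path diagram. Zi → Ti ($\gamma_i$), Ti → Yi ($\delta_i$), and a direct path Zi → Yi ($\theta_i$)]

failure of the zero covariance assumption

averaging the above in the population

$E[\beta_i] = E[\gamma_i \delta_i] + E[\theta_i] = E[\gamma_i]E[\delta_i] + Cov(\gamma_i, \delta_i) + E[\theta_i]$

now, dividing through by $E[\gamma_i]$, we get

$\frac{E[\beta_i]}{E[\gamma_i]} = E[\delta_i] + \frac{E[\theta_i]}{E[\gamma_i]} + \frac{Cov(\gamma_i, \delta_i)}{E[\gamma_i]}$

the second term is the bias due to failure of the exclusion restriction

the third term is the bias due to failure of the zero compliance-effect covariance assumption

so the IV estimator (the ratio of the average effect of Z on Y to the average effect of Z on T) will be biased

if $E[\gamma_i]$ is small, the biases will be larger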
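
a numerical sketch in Python (not from the original slides) of why both biases grow as the instrument weakens. It treats the effect of Z on Y as the sum of a path through T and a direct path, and evaluates what the ratio of average effects converges to. The function name, the argument names, and all of the input values are made up for illustration.

```python
def iv_estimand(mean_effect, mean_compliance, cov_compliance_effect=0.0, mean_direct=0.0):
    """Value of (average effect of Z on Y) / (average effect of Z on T).

    mean_effect:            average effect of T on Y
    mean_compliance:        average effect of Z on T
    cov_compliance_effect:  covariance of person-specific compliance and effect
    mean_direct:            average effect of Z on Y not operating through T
    """
    if mean_compliance == 0:
        raise ValueError("instrument is irrelevant; the ratio is undefined")
    exclusion_bias = mean_direct / mean_compliance
    covariance_bias = cov_compliance_effect / mean_compliance
    return mean_effect + exclusion_bias + covariance_bias


# Same true effect (10) and same small direct effect (1), two instrument strengths.
strong = iv_estimand(10.0, 0.6, mean_direct=1.0)
weak = iv_estimand(10.0, 0.1, mean_direct=1.0)
print(round(strong, 2), round(weak, 2))
```

the same violation of the exclusion restriction moves the estimand by under 2 points with the stronger instrument and by 10 points with the weaker one.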



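if all the assumptions except the zero compliance-effect covariance assumption are met, we have

$\frac{E[\beta_i]}{E[\gamma_i]} = \frac{E[\gamma_i \delta_i]}{E[\gamma_i]} = E[\delta_i] + \frac{Cov(\gamma_i, \delta_i)}{E[\gamma_i]}$

where $\gamma_i$ is the person-specific effect of Z on T and $\delta_i$ is the person-specific effect of T on Y

so the IV model will estimate the compliance-weighted average treatment effect (CWATE).

if T is binary and there are no defiers, this will be the same as the average effect among the compliers (LATE), because non-compliers will get 0 weight.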
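
a tiny Python sketch with NumPy (not from the original slides) of the compliance-weighted average. The five students and their effects are invented. With binary compliance and no defiers, the weights are 1 for compliers and 0 for everyone else, so the weighted average equals the average effect among compliers even though it differs from the population average.

```python
import numpy as np


def cwate(compliance, effect):
    """Compliance-weighted average treatment effect: E[compliance*effect] / E[compliance]."""
    compliance = np.asarray(compliance, dtype=float)
    effect = np.asarray(effect, dtype=float)
    if compliance.sum() == 0:
        raise ValueError("average compliance is zero; the weighted average is undefined")
    return float((compliance * effect).sum() / compliance.sum())


# Hypothetical population of five students (made-up values).
compliance = np.array([1, 1, 1, 0, 0])               # 1 = complier, 0 = non-complier
effect = np.array([10.0, 12.0, 8.0, 30.0, -5.0])     # person-specific effect of T on Y

weighted = cwate(compliance, effect)
late = float(effect[compliance == 1].mean())          # average effect among compliers
ate = float(effect.mean())                            # average effect in the population
print(weighted, late, ate)
```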


failure of the ignorability assumption

if the instrument is not ignorably assigned, then we cannot obtain unbiased estimates of the effect of Z on Y or of the effect of Z on T.

Thus, the ratio of the two may be biased.


weak instruments

weak instruments do not, strictly speaking, violate any of the IV assumptions, but they do exacerbate the bias that results when other assumptions fail

rule of thumb: an instrument is weak if the F-statistic on the instrument(s) from the first stage equation is <10.
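
a sketch in Python with NumPy (not from the original slides) of the first-stage F-statistic for a single instrument and no covariates. It computes the classical F, which assumes homoskedastic errors; applied work often reports a robust version instead. The data rebuild the voucher example: 5 of 50 students tutored without a voucher and 35 of 50 with one.

```python
import numpy as np


def first_stage_f(z, t):
    """Classical F-statistic for the instrument in the regression t = g0 + g1*z + e."""
    z = np.asarray(z, dtype=float)
    t = np.asarray(t, dtype=float)
    n = len(z)
    X = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(X, t, rcond=None)
    resid = t - X @ coef
    rss_full = resid @ resid                        # residual sum of squares with z
    rss_null = np.sum((t - t.mean()) ** 2)          # residual sum of squares, intercept only
    return (rss_null - rss_full) / (rss_full / (n - 2))   # 1 restriction, n-2 residual df


z = np.array([0] * 50 + [1] * 50)
t = np.array([1] * 5 + [0] * 45 + [1] * 35 + [0] * 15)
f_stat = first_stage_f(z, t)
print(round(f_stat, 1))
```

by the rule of thumb above, the voucher would not count as a weak instrument in this example.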


weak instruments and bias in the IV estimator

weak instruments cause 3 problems with the IV estimator:

exacerbate bias due to failure of the exclusion restriction, ignorability, and monotonicity (no defiers)

exacerbate finite sample bias

lead to incorrect estimation of standard errors when using two-stage least squares

finite sample bias

even if the 4 IV assumptions are met, IV estimation is biased unless using an infinite sample

most pronounced with weak instruments and small samples


additional uses

mediation models

suppose we randomly assign a treatment (e.g., teacher professional development) that we think will affect student learning by affecting instructional practice

we can treat the PD as an instrument, and the mediator (instructional practice) as the ‘treatment’ and use IV to estimate the effect of instructional practice (which can’t be randomized) on learning

but worry about exclusion restriction (are there other ways that the PD could affect learning?)


multiple mediator models

suppose we randomize students to 3 treatment conditions.

two first stage equations:

second stage equation:


IV to correct for measurement error

suppose we want to estimate the effect of cognitive skill on wages:

wage = b0 + b1*skill + e

if cognitive skill is measured with error by ACH, OLS will give a biased estimate of b1.

if we have a second test of skills, we can use one test as an instrument for the second test, and then use the predicted value of the second test in the wage equation.

this is called an “errors-in-variables” (EIV) model.
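
a simulation sketch in Python with NumPy (not part of the original slides) of the measurement-error case. All values are invented, and it assumes classical measurement error: each test equals true skill plus noise, and the noise in the two tests is independent of each other and of wages. Under those assumptions the OLS slope on a single test is shrunk toward zero, while using the other test as an instrument recovers the true slope (2.0 here).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

skill = rng.normal(0.0, 1.0, n)                  # true cognitive skill (never observed)
test1 = skill + rng.normal(0.0, 1.0, n)          # first noisy test, used as the instrument
test2 = skill + rng.normal(0.0, 1.0, n)          # second noisy test, independent error
wage = 2.0 * skill + rng.normal(0.0, 1.0, n)     # true effect of skill on wages is 2.0

# OLS of wage on one noisy test: attenuated by the test's reliability (0.5 here).
ols_slope = np.cov(test2, wage)[0, 1] / np.var(test2, ddof=1)

# IV with test1 as the instrument for test2: ratio of reduced form to first stage.
iv_slope = np.cov(test1, wage)[0, 1] / np.cov(test1, test2)[0, 1]

print(round(ols_slope, 2), round(iv_slope, 2))
```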
