SREE workshop
march 2010 sean f reardon
using instrumental variables in education research
[Figure: path diagram relating Z, X, W, U, T, and Y]
outline
a little background on the potential outcomes framework
what is an instrumental variable? and what’s it good for?
assumptions needed to use instrumental variables
practical methods of estimating IV models
sources of bias in IV models
additional topics
© 2010 by sean f. reardon. all rights reserved.
potential outcomes framework
a stylized example
what is the effect of receiving tutoring in math on student math achievement?
some made-up data for illustration:
Observed Student Treatment and Achievement Data
ID   Treatment Condition   Test Score
1    no tutoring           55
2    no tutoring           60
3    no tutoring           65
4    tutoring              60
5    tutoring              72
6    tutoring              63
Observed and Unobserved Potential Achievement Data
(? marks a value that is not observed)
Student ID          Treatment Condition   Score if not Tutored   Score if Tutored   Observed Score   Tutoring Effect
1                   no tutoring           55                     ?                  55               ?
2                   no tutoring           60                     ?                  60               ?
3                   no tutoring           65                     ?                  65               ?
Untutored Average                         60                     ?                  60               ?
4                   tutoring              ?                      60                 60               ?
5                   tutoring              ?                      72                 72               ?
6                   tutoring              ?                      63                 63               ?
Tutored Average                           ?                      65                 65               ?
Overall Average                           ?                      ?                  62.5             ?
Definition of an “effect”
The effect, $\delta_i$, [on some outcome Y] [for some unit i] [of some treatment condition t relative to some other condition c] is defined as the difference between the value of Y that would be observed if unit i were exposed to treatment t and the value of Y that would be observed if unit i were exposed to treatment c.
More formally, we define the effect of t relative to c on Y for unit i as:
$\delta_i = Y_i^t - Y_i^c$
We define the average effect of t relative to c in a population P as:
$\delta = E[\delta_i \mid i \in P] = E[Y_i^t - Y_i^c \mid i \in P]$
The “Fundamental Problem of Causal Inference” (Holland, 1986)
Although both $Y_i^t$ and $Y_i^c$ are defined in principle, it is impossible to observe both of them for the same unit (because any given unit can be exposed to only one of t or c).
Thus, the causal effect $\delta_i$ cannot be observed.
The problem of causal inference is thus a problem of missing data. The outcome Yi under its “counterfactual” condition is never observed.
How can we construct unbiased estimates of the average potential outcomes $E[Y_i^t]$ and $E[Y_i^c]$ under the counterfactual conditions?
Observed and Possible Unobserved Potential Achievement Data
Student ID          Treatment Condition   Score if not Tutored   Score if Tutored   Observed Score   Tutoring Effect
1                   no tutoring           55                     60                 55               +5
2                   no tutoring           60                     72                 60               +12
3                   no tutoring           65                     63                 65               -2
Untutored Average                         60                     65                 60               +5
4                   tutoring              55                     60                 60               +5
5                   tutoring              60                     72                 72               +12
6                   tutoring              65                     63                 63               -2
Tutored Average                           60                     65                 65               +5
Overall Average                           60                     65                 62.5             +5
Observed and Possible Unobserved Potential Achievement Data
Student ID          Treatment Condition   Score if not Tutored   Score if Tutored   Observed Score   Tutoring Effect
1                   no tutoring           55                     60                 55               +5
2                   no tutoring           60                     55                 60               -5
3                   no tutoring           65                     65                 65               0
Untutored Average                         60                     60                 60               0
4                   tutoring              55                     60                 60               +5
5                   tutoring              70                     72                 72               +2
6                   tutoring              70                     63                 63               -7
Tutored Average                           65                     65                 65               0
Overall Average                           62.5                   62.5               62.5             0
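The point of the last table can be checked with a few lines of Python. This is only an illustrative sketch: the potential outcomes are copied from that table, and the variable names (`students`, `naive_difference`, `true_average_effect`) are made up for the example. It compares the difference in observed group means with the average of the individual effects, which we could only compute if both potential outcomes were visible.

```python
# Rows: (student id, tutored?, score if not tutored, score if tutored),
# copied from the last hypothetical table above.
students = [
    (1, False, 55, 60),
    (2, False, 60, 55),
    (3, False, 65, 65),
    (4, True, 55, 60),
    (5, True, 70, 72),
    (6, True, 70, 63),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# What we can actually observe: one potential outcome per student.
observed = [(tutored, y1 if tutored else y0) for _, tutored, y0, y1 in students]
naive_difference = (
    mean(y for tutored, y in observed if tutored)
    - mean(y for tutored, y in observed if not tutored)
)

# What we would need to see: both potential outcomes for every student.
true_average_effect = mean(y1 - y0 for _, _, y0, y1 in students)

print("difference in observed means:", naive_difference)
print("average of individual effects:", true_average_effect)
```

The observed data are identical in both hypothetical tables, so the observed difference in means is the same in each, even though the underlying average effects differ.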
What if we can’t conduct an RCT?
If we can randomize students to receive either tutoring or no tutoring, and ensure that every student complies with his or her assigned treatment status, the randomization will allow us to estimate the effect of tutoring very easily.
but what if students don’t comply with their treatment assignment?
  some assigned to tutoring don’t go to tutoring
  some assigned to no tutoring get tutored anyway
  this means tutoring is no longer randomly assigned – at least some of the variation in treatment status is potentially endogenous
so a comparison of those assigned to tutoring and no tutoring won’t give us an estimate of the effect of tutoring (but only the effect of being assigned to tutoring)
this is one case where instrumental variables are useful
instrumental variables models
What is an instrumental variable?
an instrumental variable is an exogenous factor that causes some of the variation in treatment status (though it need not cause all of it)
we use it to identify the portion of variation in treatment that is exogenous and then only rely on that exogenous variation to estimate the effect of treatment
A general structural model
T: treatment status
Y: outcome measure
X: observed confounders
U: unobserved confounders
W: observed ignorable causes of Y
$\varepsilon_Y$: unobserved ignorable causes of Y
$\varepsilon_T$: unobserved ignorable causes of T
Z: instrument (observed ignorable cause of T)
[Figure: path diagram of the structural model relating Z, X, U, W, $\varepsilon_T$, $\varepsilon_Y$, T, and Y]
Relating treatments and outcomes
we would like to estimate the effect of T on Y
this involves seeing how T and Y are related
but to infer a causal relationship from the covariance of T and Y, we need to understand the source of variation in T
  why do some people get different types/degrees of the treatment?
[Figure: path diagram, T → Y]
Relating treatments and outcomes
variation in T may be caused by factors unrelated to the outcome Y
  these may be observed (Z) or unobserved ($\varepsilon_T$)
if the only variation in T comes from factors unrelated to Y, then T is as good as randomly assigned, so getting a causal estimate is easy
[Figure: path diagram with Z, $\varepsilon_T$, T, and Y]
Relating treatments and outcomes
variation in T may be caused, in part, by observed factors that are related to the outcome Y
  observed confounders (X)
as long as there is some variation in T that is caused by some (not necessarily observable) ignorable cause (Z or $\varepsilon_T$), we can still easily get an estimate of the effect of T
  statistically control for X (compute relationship between T and Y, conditional on X)
[Figure: path diagram with X, $\varepsilon_T$, T, and Y]
Relating treatments and outcomes
variation in T may be caused, in part, by observed and unobserved factors that are related to the outcome Y
  observed confounders (X)
  unobserved confounders (U)
  reverse causality (Y affects T)
here, we cannot get an unbiased estimate of the effect of T
  statistical control can’t adjust for U
  the ignorable cause ($\varepsilon_T$) is not observed
[Figure: path diagram with X, U, $\varepsilon_T$, T, and Y]
Relating treatments and outcomes
if we cannot observe all the confounders (or if Y affects T), then we need some observed factor that affects T but does not otherwise affect Y
this (Z) is called an instrument (or instrumental variable).
because the part of the variation in T that is induced by Z is ignorable (as good as random), we can use this part of the variation in T to identify the effect of T on Y
[Figure: path diagram with Z, X, U, $\varepsilon_T$, T, and Y]
Tutoring example, revisited
the observed data is not sufficient to estimate the average effect of tutoring
what if we can’t do an experiment, or if we do an experiment and not everyone complies?
tutoring voucher as an instrument
randomly assign eligible students to receive either a voucher allowing them to receive free tutoring (Z=1) or no voucher (Z=0).
observe whether students attend tutoring (T=1) or not (T=0).
  note: this choice is not random; students may choose tutoring or not, regardless of voucher status (Ti≠Zi for some students).
observe later achievement (Y)
we want to estimate the effect of T (tutoring vs no tutoring) on Y (achievement).
Four subpopulations (angrist, imbens, & rubin, 1996)
compliers
  those who would comply with treatment assignment (those for whom Ti=Zi)
non-compliers
  always-takers
    those who would always receive the treatment, regardless of assignment (those for whom Ti=1)
  never-takers
    those who would never receive the treatment, regardless of assignment (those for whom Ti=0)
  defiers
    those who would always do the opposite of treatment assignment (those for whom Ti=1-Zi)
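The four groups are defined by a student's pair of potential treatment statuses: what they would do without the voucher and what they would do with it. As a small sketch (the function name `compliance_type` is hypothetical, chosen for this example), the classification can be written as a lookup:

```python
def compliance_type(t_without_offer, t_with_offer):
    """Classify a student by tutoring status under Z=0 and under Z=1."""
    types = {
        (0, 1): "complier",      # T = Z
        (1, 1): "always-taker",  # T = 1 regardless of Z
        (0, 0): "never-taker",   # T = 0 regardless of Z
        (1, 0): "defier",        # T = 1 - Z
    }
    return types[(t_without_offer, t_with_offer)]


for pair in [(0, 1), (1, 1), (0, 0), (1, 0)]:
    print(pair, compliance_type(*pair))
```

In real data only one element of each pair is observed, which is why an individual student's type cannot be read off directly.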
Observed Outcomes
N=100, 50% receive vouchers, but not all comply with assignment (only 60% comply):
Number of students, by voucher offer and tutoring status
Offered Voucher   Not Tutored   Tutored   Proportion Tutored
No                45            5         .10
Yes               15            35        .70
Total             60            40        .40
students with no voucher who are not tutored (45) might be compliers or never-takers
students with a voucher who are not tutored (15) might be defiers or never-takers
students with no voucher who are tutored (5) might be defiers or always-takers
students with a voucher who are tutored (35) might be compliers or always-takers
estimating the proportion of compliers
assume there are no defiers
  then everyone with Z=1, T=0 is a never-taker (15 of 50 (30%) with Z=1 in our example)
  there should be the same proportion (30%) of never-takers among those with Z=0, because Z is random
the same logic implies that 10% of the population are always-takers
thus, 60% (100% - 30% - 10%) are compliers
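The arithmetic above can be reproduced from the table of counts. This is a minimal sketch that assumes, as the slide does, that there are no defiers; the names `counts` and `share_tutored` are made up for the example.

```python
# counts[z][t] = number of students with voucher status z
# (0 = no offer, 1 = offer) and tutoring status t (0 = no, 1 = yes).
counts = {0: {0: 45, 1: 5}, 1: {0: 15, 1: 35}}


def share_tutored(z):
    """Proportion tutored among students with voucher status z."""
    row = counts[z]
    return row[1] / (row[0] + row[1])


# With no defiers, untutored students who were offered a voucher must be
# never-takers, and tutored students who were not offered one must be
# always-takers. Random assignment lets each share stand for the population.
never_takers = 1 - share_tutored(1)
always_takers = share_tutored(0)
compliers = 1 - never_takers - always_takers

print(round(never_takers, 2), round(always_takers, 2), round(compliers, 2))
```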
Estimating the proportion of compliers
we can also estimate this by regressing the treatment variable on the instrument
tutor = G0 + G1*voucher + e
tutor = .10 + 0.60*voucher
Thus, the average effect of being assigned a voucher on tutoring status is +0.60, meaning that the average student’s probability of receiving tutoring increases by 0.60 if assigned a voucher (which means that 60% of the students comply with the voucher assignment).
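As a check on the regression above, the 100 student records can be rebuilt from the table of counts and the first-stage line fitted by ordinary least squares. The helper `ols_line` is a hypothetical name written for this sketch; it implements the textbook one-regressor formula rather than any particular package's routine.

```python
# Rebuild the 100 (voucher, tutored) pairs from the table of counts.
data = [(0, 0)] * 45 + [(0, 1)] * 5 + [(1, 0)] * 15 + [(1, 1)] * 35


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


voucher = [z for z, _ in data]
tutor = [t for _, t in data]
g0, g1 = ols_line(voucher, tutor)

print(round(g0, 2), round(g1, 2))
```

With a binary instrument the slope is simply the difference in the proportion tutored between the two voucher groups.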
Observed Outcomes
Estimated effect of the voucher offer on test scores = 56.6 – 50.5 = +6.1
Average test score, by voucher offer and tutoring status
Offered Voucher   Not Tutored   Tutored   Total
No                48.3          70.0      50.5
Yes               44.9          61.6      56.6
Total             47.5          62.6      53.5
average outcome among students with no voucher who are not tutored (48.3): untutored compliers and never-takers
average outcome among students with a voucher who are tutored (61.6): tutored compliers and always-takers
here we’re assuming no defiers (later we will see why this is necessary)
OLS estimates
OLS yields:
test = 47.5 + 15.1*(tutored)
the estimated effect of tutoring is +15.1 points
but we should worry about whether this is biased, because some students chose whether to get tutoring or not.
the tutored group includes compliers and always-takers; the untutored group includes compliers and never-takers; so they are not equivalent groups
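With a single binary regressor, the OLS slope is just the difference between the tutored and untutored group means, so it can be recovered from the cell counts and cell means alone. The sketch below uses the rounded cell means from the tables, so its result can differ from the figures on the slide in the last decimal place; `group_mean` is a hypothetical helper.

```python
# Cell counts and mean test scores, keyed by (voucher, tutored).
counts = {(0, 0): 45, (0, 1): 5, (1, 0): 15, (1, 1): 35}
means = {(0, 0): 48.3, (0, 1): 70.0, (1, 0): 44.9, (1, 1): 61.6}


def group_mean(keep):
    """Count-weighted mean score over the cells selected by keep(z, t)."""
    cells = [cell for cell in counts if keep(*cell)]
    total = sum(counts[c] * means[c] for c in cells)
    return total / sum(counts[c] for c in cells)


untutored = group_mean(lambda z, t: t == 0)  # OLS intercept
tutored = group_mean(lambda z, t: t == 1)
ols_slope = tutored - untutored              # OLS coefficient on tutored

print(round(untutored, 2), round(ols_slope, 2))
```

This comparison ignores the voucher entirely, which is why it mixes the effect of tutoring with differences between the kinds of students who choose tutoring.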
The Wald IV estimator
if we are willing to assume that the voucher offer had no effect on the outcome of the non-compliers (because it did not alter their treatment status and does not affect their outcome in any other way), then we can estimate the effect of tutoring like this:
  the average effect of the voucher in the population is estimated to be +6.1
  but only 60% of students’ decisions about whether to get tutoring were affected by the voucher offer (only 60% of the sample are compliers)
Wald estimator
average effect in population = (average effect on compliers x proportion who are compliers) + (average effect on non-compliers x proportion who are non-compliers)
Wald estimator
with the average effect on non-compliers assumed to be zero, this says that the average effect of the treatment among the compliers equals the average effect in the population divided by the proportion of the population who are compliers
thus, the average effect among the compliers is
+6.1/.60 = +10.1
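The Wald calculation can be sketched directly from the two tables. The inputs here are the rounded cell means, so the result (about 10.2) differs slightly from the slide's figure, which may have been computed from unrounded data; the function names are made up for the example.

```python
# Cell counts and mean test scores, keyed by (voucher, tutored).
counts = {(0, 0): 45, (0, 1): 5, (1, 0): 15, (1, 1): 35}
means = {(0, 0): 48.3, (0, 1): 70.0, (1, 0): 44.9, (1, 1): 61.6}


def mean_score(z):
    """Average test score among students with voucher status z."""
    n = counts[(z, 0)] + counts[(z, 1)]
    return (counts[(z, 0)] * means[(z, 0)] + counts[(z, 1)] * means[(z, 1)]) / n


def share_tutored(z):
    """Proportion tutored among students with voucher status z."""
    return counts[(z, 1)] / (counts[(z, 0)] + counts[(z, 1)])


# Effect of the voucher offer on scores, and on tutoring.
reduced_form = mean_score(1) - mean_score(0)
first_stage = share_tutored(1) - share_tutored(0)

# Wald estimator: ratio of the two effects.
wald_estimate = reduced_form / first_stage

print(round(reduced_form, 2), round(first_stage, 2), round(wald_estimate, 2))
```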
What have we learned?
An instrumental variable allows us to estimate the average effect of the treatment among those whose treatment status is affected by the instrument (“compliers”)
  called the “local average treatment effect” (LATE)
  note that we can’t identify who the compliers are
We can’t estimate the average treatment effect in the population, because we can’t estimate the effect among non-compliers
  because the instrument doesn’t affect their treatment status, there is no exogenous variation in their treatment status that we can use.
What assumptions have we made?
the instrument only affects the outcome through its impact on the treatment (this is called the exclusion restriction)
the instrument is ignorably (randomly) assigned
  this allows us to estimate the effect of the instrument on the outcome and on the treatment
the instrument affects the treatment for at least some people
  otherwise there are no compliers
there are no defiers
more general IV models
what if treatment is not binary?
above we assumed the treatment (tutoring) was binary
but not all treatments are binary
  we could offer vouchers of different amounts
  students could receive different amounts of tutoring
as a result, compliance may take on many values
  for some students, the amount of tutoring received may be strongly affected by the instrument; for others, it may be weakly affected or not at all affected.
a more general model of the IV estimator
for a given individual i, $\beta_i$ is the effect of Z on Y
this effect may vary across individuals
we would like to estimate the average effect
[Figure: path diagram, $Z_i \to Y_i$, with effect $\beta_i$]
1. exclusion restriction
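if the only way that Z affects Y is through its effect on T, then we have $\beta_i = \gamma_i \delta_i$, where $\beta_i$ is the effect of Z on Y, $\gamma_i$ is the effect of Z on T, and $\delta_i$ is the effect of T on Y.
or, put differently, the effect of Z on Y is the product of the effect of Z on T and the effect of T on Y.
the assumption that the only way that Z affects Y is through its effect on T is called the exclusion restriction.
[Figure: path diagram, $Z_i \to T_i \to Y_i$, with effects $\gamma_i$ (Z on T) and $\delta_i$ (T on Y)]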
2. zero compliance-effect covariance
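we can write the average effect of Z on Y as
$E[\beta_i] = E[\gamma_i \delta_i] = E[\gamma_i]\,E[\delta_i] + \mathrm{Cov}(\gamma_i, \delta_i)$
if we assume $\mathrm{Cov}(\gamma_i, \delta_i) = 0$, then we have
$E[\beta_i] = E[\gamma_i]\,E[\delta_i]$
the assumption that $\mathrm{Cov}(\gamma_i, \delta_i) = 0$ is called the zero compliance-effect covariance assumption.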
3. instrument relevance
as long as $E[\gamma_i] \neq 0$, we can rewrite the above as
$E[\delta_i] = E[\beta_i] / E[\gamma_i]$
the assumption that $E[\gamma_i] \neq 0$ is sometimes called the instrument relevance assumption; or sometimes just referred to as the assumption that the instrument affects the treatment.
if $E[\gamma_i]$ is small (close to zero), we say that the instrument is a weak instrument.
4. the instrument is ignorably assigned
if the above three assumptions are met, we have
$E[\delta_i] = E[\beta_i] / E[\gamma_i]$
if Z is ignorably assigned, then we can easily estimate both $E[\beta_i]$ (the average effect of Z on Y) and $E[\gamma_i]$ (the average effect of Z on T).
the assumption of ignorable assignment thus makes estimation of the effect of T on Y possible.
what do these assumptions mean?
exclusion restriction: the offer of a tutoring voucher does not affect students’ achievement except by affecting the amount of tutoring they receive
zero compliance-effect covariance: there is no correlation between how strongly a voucher offer affects the amount of tutoring a student gets and how effective tutoring is for that student
what do these assumptions mean?
instrument relevance: the offer of a voucher has some effect, on average, on the amount of tutoring students receive (at least one student is affected by the offer).
ignorable assignment of the instrument: the voucher offer is randomly assigned (this would be violated, for example, if the principal gave vouchers to students she deemed most in need of tutoring).
some examples
NYC voucher experiment (howell et al, 2002; krueger & zhu, 2004)
Effect of schooling on wages, using quarter of birth as instrument (angrist & krueger, 1991).
Effect of teacher absence on student achievement, using snowfall as instrument (miller, murnane & willett, 2007)
Effects of segregation on educational attainment and wages, using railroads as an instrument (ananat 2007)
estimating IV models
estimating IV models in practice
in practice, we don’t usually compute the effect of Z on Y and Z on T and divide them
  because we may need more complex models (if we want to include other covariates in the model, for example)
  because we need to compute standard errors
the most common method of estimating IV models is two-stage least squares (TSLS or 2SLS).
Three relevant equations
1: $Y_i = \beta_0 + \beta_i Z_i + e_i$ (the “reduced form” equation); $\beta_i$ is the person-specific effect of Z on Y.
2: $T_i = \gamma_0 + \gamma_i Z_i + u_i$ (the “first stage” equation); $\gamma_i$ is the person-specific effect of Z on T.
but the equation we really are interested in is
3: $Y_i = \delta_0 + \delta_i T_i + v_i$ (the “second stage” equation); $\delta_i$ is the person-specific effect of T on Y.
two-stage least squares
fit the first-stage equation (estimate the effect of Z on T); compute fitted values:
$\hat{T}_i = \hat{\gamma}_0 + \hat{\gamma}_1 Z_i$
fit the second-stage equation, using predicted values of T in place of observed values of T:
$Y_i = \delta_0 + \delta_1 \hat{T}_i + e_i$
two-stage least squares
because the predicted values of T from the first-stage equation include only the variation in T that is caused by the instrument, the estimated coefficient from the second-stage equation will be unbiased (as long as the 4 IV assumptions are met).
if you do this by hand, you’ll get the wrong standard errors; statistical software usually has built-in routines (e.g., -ivregress- command in Stata) to compute correct standard errors.
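The two stages can be sketched by hand on the tutoring example. Because only cell means are available, this illustration assigns every student their cell's mean score, which is an assumption made purely so that there is student-level data to regress on; `ols_line` is a hypothetical helper implementing one-regressor least squares. As the slide warns, this by-hand approach gives a point estimate only, not valid standard errors.

```python
# (voucher, tutored) -> (number of students, mean test score in that cell)
cells = {(0, 0): (45, 48.3), (0, 1): (5, 70.0), (1, 0): (15, 44.9), (1, 1): (35, 61.6)}

# Illustrative student-level data: each student gets their cell's mean score.
z, t, y = [], [], []
for (zi, ti), (n, cell_mean) in cells.items():
    z += [zi] * n
    t += [ti] * n
    y += [cell_mean] * n


def ols_line(x, outcome):
    """Return (intercept, slope) of the least-squares line outcome = a + b*x."""
    n = len(x)
    mean_x, mean_o = sum(x) / n, sum(outcome) / n
    sxy = sum((xi - mean_x) * (oi - mean_o) for xi, oi in zip(x, outcome))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_o - slope * mean_x, slope


# Stage 1: regress T on Z and compute fitted values of T.
g0, g1 = ols_line(z, t)
t_hat = [g0 + g1 * zi for zi in z]

# Stage 2: regress Y on the fitted values instead of the observed T.
_, tsls_estimate = ols_line(t_hat, y)

# For comparison: reduced-form slope divided by first-stage slope (Wald).
wald_estimate = ols_line(z, y)[1] / g1

print(round(tsls_estimate, 2), round(wald_estimate, 2))
```

With one binary instrument and no covariates, the second-stage slope coincides with the Wald ratio.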
Effects of attending charter school
we can’t randomize students to charter or traditional public schools
Abdulkadiroglu et al. (2009) examine students who apply to oversubscribed charter schools, whose admission is determined by lottery (randomization)
  instrument is winning the lottery
  treatment is # of years in a charter school
example: effect of charter schooling
[Table: estimates in three columns: first stage (compliance); reduced form (effect of winning lottery on ach.); 2sls (effect of a year in charter)]
are the IV assumptions valid in this study?
exclusion restriction?
zero compliance-effect covariance?
instrument relevance?
ignorable assignment?
sources of bias in IV models
sources of bias in IV
failure of exclusion restriction assumption
failure of ignorability assumption
failure of zero compliance-effect covariance assumption
finite sample bias
weak instruments cause 3 problems:
  exacerbate bias due to failure of assumptions (exclusion restriction, ignorability, zero covariance)
  exacerbate finite sample bias
  lead to incorrect estimation of standard errors when using two-stage least squares
failure of the exclusion restriction
recall that the exclusion restriction says that the only way that Z affects Y is through its effect on T.
as a result, we can write
$\beta_i = \gamma_i \delta_i$
[Figure: path diagram, $Z_i \to T_i \to Y_i$, with effects $\gamma_i$ (Z on T) and $\delta_i$ (T on Y)]
failure of the exclusion restriction
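if the exclusion restriction is violated, then there is some other path through which Z affects Y
as a result, we can write
$\beta_i = \gamma_i \delta_i + \theta_i$
where $\theta_i$ is the effect of Z on Y that does not operate through T
[Figure: path diagram, $Z_i \to T_i \to Y_i$ with effects $\gamma_i$ and $\delta_i$, plus a direct path $Z_i \to Y_i$ with effect $\theta_i$]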
failure of the zero covariance assumption
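averaging the above in the population (with $\theta_i$ the effect of Z on Y that does not operate through T)
$E[\beta_i] = E[\gamma_i]\,E[\delta_i] + \mathrm{Cov}(\gamma_i, \delta_i) + E[\theta_i]$
now, dividing through by $E[\gamma_i]$, we get
$\frac{E[\beta_i]}{E[\gamma_i]} = E[\delta_i] + \frac{\mathrm{Cov}(\gamma_i, \delta_i)}{E[\gamma_i]} + \frac{E[\theta_i]}{E[\gamma_i]}$
so the IV estimator (the ratio of the average effect of Z on Y to the average effect of Z on T) will be biased
  if $E[\gamma_i]$ is small, the biases will be larger
  $E[\theta_i] / E[\gamma_i]$: bias due to failure of the exclusion restriction
  $\mathrm{Cov}(\gamma_i, \delta_i) / E[\gamma_i]$: bias due to failure of the zero compliance-effect covariance assumption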
failure of the zero covariance assumption
if all the assumptions except the zero compliance-effect covariance assumption are met, we have
$\frac{E[\beta_i]}{E[\gamma_i]} = \frac{E[\gamma_i \delta_i]}{E[\gamma_i]}$
so the IV model will estimate the compliance-weighted average treatment effect (CWATE).
  if T is binary and there are no defiers, this will be the same as the average effect among the compliers (LATE), because non-compliers will get 0 weight.
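A tiny numerical illustration may help here. The three groups below are entirely hypothetical (equal-sized groups with made-up compliance and effect values); the sketch simply shows that when compliance and effect are positively related, the IV ratio equals the compliance-weighted average effect rather than the simple average effect.

```python
# Three equally sized hypothetical groups of students.
gamma = [0.0, 0.5, 1.0]   # effect of Z on T (compliance) in each group
delta = [2.0, 4.0, 10.0]  # effect of T on Y in each group


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Under the exclusion restriction, the effect of Z on Y is gamma * delta.
beta = [g * d for g, d in zip(gamma, delta)]

iv_estimand = mean(beta) / mean(gamma)   # what the IV ratio converges to
ate = mean(delta)                        # simple average treatment effect
cwate = sum(g * d for g, d in zip(gamma, delta)) / sum(gamma)
covariance = mean(beta) - mean(gamma) * mean(delta)

print(round(iv_estimand, 3), round(cwate, 3), round(ate, 3))
print(round(ate + covariance / mean(gamma), 3))  # ATE plus the covariance term
```

The group with zero compliance gets zero weight, which mirrors the way non-compliers drop out of the LATE in the binary case.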
failure of the ignorability assumption
if the instrument is not ignorably assigned, then we cannot obtain unbiased estimates of the effect of Z on Y or of the effect of Z on T.
Thus, the ratio of the two may be biased.
weak instruments
weak instruments do not, strictly speaking, violate any of the IV assumptions, but they do exacerbate the bias from failures of the other assumptions
rule of thumb: an instrument is weak if the F-statistic on the instrument(s) from the first stage equation is <10.
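To make the rule of thumb concrete, the first-stage F-statistic can be computed for the tutoring voucher example. This sketch uses the standard comparison of an intercept-only model with the model that adds the instrument; the data are rebuilt from the earlier table of counts, and all variable names are made up for the example.

```python
# Rebuild the 100 (voucher, tutored) pairs from the table of counts.
data = [(0, 0)] * 45 + [(0, 1)] * 5 + [(1, 0)] * 15 + [(1, 1)] * 35
t = [ti for _, ti in data]
n = len(data)


def mean(values):
    return sum(values) / len(values)


# Restricted first stage: intercept only.
t_bar = mean(t)
rss_restricted = sum((ti - t_bar) ** 2 for ti in t)

# Unrestricted first stage: with a binary instrument, the fitted value is
# the mean of T within each value of Z.
group_mean = {v: mean([ti for zi, ti in data if zi == v]) for v in (0, 1)}
rss_unrestricted = sum((ti - group_mean[zi]) ** 2 for zi, ti in data)

q = 1  # number of instruments being tested
k = 2  # parameters in the unrestricted model (intercept and slope)
f_stat = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k))

print(round(f_stat, 1), "weak" if f_stat < 10 else "not weak by the rule of thumb")
```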
weak instruments and bias in the IV estimator
weak instruments cause 3 problems with the IV estimator:
  exacerbate bias due to failure of the exclusion restriction, ignorability, and monotonicity
  exacerbate finite sample bias
  lead to incorrect estimation of standard errors when using two-stage least squares
finite sample bias
  even if the 4 IV assumptions are met, IV estimation is biased unless using an infinite sample
  most pronounced with weak instruments and small samples
additional uses
mediation models
suppose we randomly assign a treatment (e.g., teacher professional development) that we think will affect student learning by affecting instructional practice
we can treat the PD as an instrument, and the mediator (instructional practice) as the ‘treatment’, and use IV to estimate the effect of instructional practice (which can’t be randomized) on learning
  but worry about exclusion restriction (are there other ways that the PD could affect learning?)
multiple mediator models
suppose we randomize students to 3 treatment conditions.
two first stage equations:
second stage equation:
IV to correct for measurement error
suppose we want to estimate the effect of cognitive skill on wages.
if cognitive skill is measured with error by ACH, OLS will give a biased estimate of this effect.
if we have a second test of skills, we can use one test as an instrument for the second test, and then use the predicted value of the second test in the wage equation.
called “errors-in-variables” (EIV) model.
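A small simulation can illustrate the idea. Everything in it is hypothetical: the sample size, the true effect of 2.0, and the assumption that each test equals true skill plus independent noise with the same variance as skill itself. Under those assumptions OLS on one noisy test is attenuated toward zero, while using the other test as an instrument roughly recovers the true effect.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000
true_effect = 2.0       # hypothetical effect of skill on wages

skill = [rng.gauss(0, 1) for _ in range(n)]
test_a = [s + rng.gauss(0, 1) for s in skill]   # first noisy test
test_b = [s + rng.gauss(0, 1) for s in skill]   # second noisy test
wage = [true_effect * s + rng.gauss(0, 1) for s in skill]


def cov(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


# OLS of wage on one noisy test: attenuated by the measurement error.
ols_estimate = cov(wage, test_b) / cov(test_b, test_b)

# IV: test_a instruments for test_b. The two tests' errors are independent,
# so test_a is related to test_b only through true skill.
iv_estimate = cov(wage, test_a) / cov(test_b, test_a)

print(round(ols_estimate, 2), round(iv_estimate, 2))
```

The key requirement is that the two tests' measurement errors are unrelated to each other and to the wage equation's error; if they share a common error component, the instrument fails the exclusion restriction.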