Model/Assumptions Identification Estimation
The Econometric Evaluation
of Policy Design:
Part III:
Selection Models and the MTE
Edward Vytlacil, Yale University
Renmin University, March 2019
Lectures primarily drawing upon:
Heckman and Vytlacil (2001b), “Local Instrumental Variables”
Heckman, Vytlacil and Urzua (2006), “Understanding Instrumental Variables in Models with Essential Heterogeneity”
while also drawing on other work to a lesser extent.
Imbens and Angrist (1994) establish that IV can identify an interpretable parameter in the model with essential heterogeneity. Their parameter is instrument dependent, and whether it is interesting depends on the context.
Their parameter is a discrete approximation to the marginal gain parameter of Bjorklund and Moffitt (1987).
Their assumptions are equivalent to imposing a nonparametric selection model (Vytlacil, 2002).
These observations motivate the Heckman and Vytlacil MTE framework.
Selection Models
Heckman, Vytlacil and co-authors
Impose Nonparametric Selection Model
Goals:
Unify literature with a common set of underlying parameters interpretable across studies.
To understand relationship between selection and treatment effect heterogeneity.
Consider strategies other than linear IV, and parameters other than LATE.
Today
Selection Model, Assumptions.
The Marginal Treatment Effect
Identification Analysis
Estimation Analysis
Interpreting the Instrumental Variables Estimand
Testing for Essential Heterogeneity
Applications
Will continue to cover additional topics related to the MTE framework in the next lecture.
Model for outcomes
Y1 = µ1 (X ,U1) (1.1)
Y0 = µ0 (X ,U0) .
X are observed and (U1, U0) are unobserved by the analyst.
Allow X to be dependent on (U0, U1): $X \not\perp\!\!\!\perp (U_0, U_1)$.
∆ = Y1 − Y0 = µ1(X, U1) − µ0(X, U0) (Treatment Effect)
Model for outcomes
A special case that links our analysis to standard models in econometrics:
Y1 = Xβ1 + U1 and
Y0 = Xβ0 + U0; so
∆ = X (β1 − β0) + (U1 − U0).
In the case of separable outcomes, heterogeneity in ∆ arises because in general U1 ≠ U0 and people differ in their X.
We will not require linearity or any parametric form on µ1(X), µ0(X), but linearity is useful in practice for estimation to increase precision and allow for high-dimensional X.
Conditioning on Observed Covariates
For purposes of identification, will conduct analysis conditional on Xi.
For now, I will suppress conditioning on Xi, leaving it implicit.
We will make conditioning on Xi explicit when discussing estimation issues.
Threshold Crossing Model for Di
Heckman-Vytlacil impose:
Di = 1 [D∗i > 0] ,
D∗i = µD(Zi)− Vi ,
with
Zi ⊥⊥ Vi ,
Zi observed random vector,
Vi continuous, unobserved random variable.
µD(Zi) − Vi can be interpreted as a net utility for a person with characteristics (Zi, Vi).
Additive Separability in Threshold Crossing Model for D
D = 1 [µD(Z )− V > 0] .
Separability between V and Z in latent index of choice equation is conventional.
Plays a critical role in the properties of instrumental variable estimators in models with essential heterogeneity.
Wider class of latent index models will have a representation in this form (Vytlacil, 2006).
Vytlacil (2002) shows that independence and monotonicity of Imbens and Angrist (1994) are equivalent to this model with Zi ⊥⊥ Vi.
Propensity Score, Normalization
Define P(z) as the propensity score:
P(z) = Pr(Di = 1 | Zi = z) = Pr(µD(z) > Vi) = FV(µD(z)).
As normalization, can then rewrite model as:
Di = 1 [µD (Zi)− Vi ≥ 0]
= 1 [FV (µD (Zi)) ≥ FV (Vi)]
= 1 [P (Zi) ≥ UDi ] ,
with UDi ≡ FV (Vi) ∼ Unif[0, 1].
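A minimal simulation sketch of this normalization, assuming a hypothetical linear index µD(z) = 0.5 + z and a standard logistic V (both are illustrative choices, not taken from the lecture). It checks that the latent-index rule and the normalized rule 1[P(Z) ≥ UD] select the same individuals, and that UD = FV(V) behaves like a Unif[0, 1] draw.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def logistic_cdf(x):
    """CDF of the standard logistic distribution, used here as F_V."""
    return 1.0 / (1.0 + np.exp(-x))

z = rng.normal(size=n)        # hypothetical scalar instrument Z
v = rng.logistic(size=n)      # V drawn independently of Z
mu_d = 0.5 + 1.0 * z          # hypothetical index mu_D(Z)

d_latent = mu_d - v >= 0      # D = 1[mu_D(Z) - V >= 0]
p = logistic_cdf(mu_d)        # P(Z) = F_V(mu_D(Z))
u_d = logistic_cdf(v)         # U_D = F_V(V)
d_normalized = p >= u_d       # D = 1[P(Z) >= U_D]

agreement = np.mean(d_latent == d_normalized)
print("share of identical choices:", agreement)
print("mean of U_D:", u_d.mean(), " Pr(U_D <= 0.25):", np.mean(u_d <= 0.25))
```

Because FV is strictly increasing, applying it to both sides of the inequality leaves the choice unchanged; only the scale of the unobservable is normalized.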
Threshold Crossing Model as Reduced Form for Roy Model
Special Case: Generalized Roy model:
D = 1[Y1 − Y0 − C > 0],
where
Outcomes: Y1 = µ1(X ) + U1,
Y0 = µ0(X ) + U0,
Costs: C = µC (W ) + UC ,
Z = (X ,W ) .
Implying following reduced form:
⇒ D = 1[µD(Z) − V ≥ 0], with
µD(Z) = µ1(X) − µ0(X) − µC(W),
V = −(U1 − U0 − UC).
Threshold Crossing Model Nests Parametric Examples: Probit Model
Special Case: D determined by Probit Model
D = 1 [µD(Z )− V > 0].
Probit Model special case with
µD(Z ) = γZ ,
V ∼ N(0, 1).
⇒ P(z) = Φ(γz), with Φ(·) std. normal cdf,
⇒ D = 1[γZ − V ≥ 0] = 1[Φ(γZ) − Φ(V) ≥ 0].
If Yj = Xβj + Uj and (V, U0, U1) joint normal, then classic Heckman normal selection model (see, e.g., Heckman (1978)).
If Yj = 1[Xβj + Uj ≥ 0] and (V, U0, U1) joint normal, then classic bivariate probit model with structural shift.
Threshold Crossing Model Nests Parametric Examples: Logit Model
Logit Model special case with
µD(Z ) = γZ ,
V ∼ standard logistic .
⇒ P(z) = Λ(γz), with Λ(·) std. logistic cdf,
⇒ D = 1[γZ − V ≥ 0] = 1[Λ(γZ) − Λ(V) ≥ 0].
We will not impose any parametric functional form or distributional assumption for identification theory, but such assumptions may be useful in practice to increase precision and allow for multidimensional Z.
See, e.g.,
Heckman, Tobias, and Vytlacil (2003)
Aakvik, Heckman and Vytlacil (2005).
MTE
Key, unifying parameter, Marginal Treatment Effect (MTE):
MTE (u) = E (Y1 − Y0 | UD = u).
MTE and the local average treatment effect (LATE) parameter are closely related (MTE can be seen as limit of LATE parameters).
Generalizes the marginal gain parameter of Bjorklund and Moffitt (1987).
How MTE(u) varies with u uncovers relationship between selection and treatment effect heterogeneity.
Provides average effect at given quantile of unobserved desire for treatment.
Figure: MTE for Effect of DI on Employment
From Maestas, Mullen and Strand (2013, AER)
Excerpt shown on the slide (AER vol. 103, no. 5, p. 1825):
... the predicted probability of SSDI receipt. Specifically, we regress initial allowance decisions on indicators for type of impairment, age group, decision month, and DDS, as well as a measure of average prior earnings, and construct the residual, Z, which by construction is orthogonal to the case mix controls and varies systematically only with EXALLOW. Then we estimate a probit of ultimate SSDI receipt on the residualized Z. This is our measure of the predicted probability of SSDI receipt, P(Z). Next we estimate a local quadratic regression of employment on predicted SSDI receipt and compute the numerical derivative of this function to estimate ∂E[y]/∂P(Z).
Figure 7 shows the MTE as a function of unobserved severity, where severity is reverse ordered and measured in percentiles (see definition of u in Section IVA), along with bootstrapped 95 percent confidence intervals. Applicants on the margin for an examiner with a predicted SSDI receipt rate of 65 percent (the mean rate) are in the sixty-fifth percentile of the unobserved (reverse) severity distribution. That is, they have an impairment that is less severe than 65 percent of applicants, and more severe than 35 percent of applicants. Since we estimate that 57 percent of applicants are always takers (that is, they would receive SSDI benefits regardless of initial examiner assignment), the MTE is not identified for applicants on the margin of SSDI receipt rates less than 57 percent. Similarly, the MTE is not identified for applicants on the margin of SSDI receipt rates greater than 80 percent (= 57 + 23, the fraction of marginal applicants). As a result, we are only able to trace the MTE for applicants between the fifty-seventh and eightieth percentiles of the unobserved (reverse) severity distribution (or the twentieth to forty-third percentiles of the actual unobserved severity distribution s). The estimates become imprecise at the more extreme ends of the distribution since there are relatively small numbers of examiners with margins at these points.
[Figure 7. Marginal Treatment Effect on Employment. x-axis: percentile of (reverse) unobserved severity distribution, 0.55 to 0.8; y-axis: −2 to 0.]
Notes: Ninety-five percent confidence intervals shown with dashed lines. Bandwidth is 0.084.
Source: DIODS data for 2005 and 2006.
Source: Maestas, Mullen and Strand, “Does Disability Insurance Receipt Discourage Work? Using Examiner Assignment to Estimate Causal Effects of SSDI Receipt.” (2013, AER)
Figure: MTE for Effect of Norwegian VR Training on Employment
From Aakvik, Heckman and Vytlacil (2005, JOE)
Excerpt shown on the slide (Journal of Econometrics 125 (2005) 15–51, p. 38):
7.3. Heterogeneity in observables
The estimated treatment effects vary substantially with observed characteristics. For example, the variance of E(∆ | X) is 0.0064 (standard error = 0.08), compared to its mean of −0.014. The variance of E(∆ | X, D = 1) is 0.0085 (standard error = 0.092) compared to its mean of −0.11. The degree to which the treatment effect varies with observable characteristics can also be seen by studying the marginal effect of each observable characteristic on the expected treatment effect. The marginal effects on the treatment parameters are reported in Table 4. For example, being older, having lower pre-program income, having lower spouse’s income, and having young children are all associated with a larger treatment effect for all definitions of mean treatment effects. We develop this point further after we analyze distributional treatment parameters.
7.4. Estimated distributional treatment parameters
The distributional treatment effect parameters capture an additional type of treatment effect heterogeneity beyond that previously discussed for mean treatment effects. We now report estimates of the distributional treatment parameters. Table 5 reports the distributional versions of ATE, TT, and MTE evaluated at selected values of UD. We find that if a random applicant is assigned to training, with probability 0.225 the applicant benefits from the training, that is, will be employed after receiving the training but would have been unemployed without the training. However, with probability 0.24 the applicant will be hurt by receiving the training, ...
[Fig. 1. Estimated marginal treatment effect.]
Source: Aakvik, Heckman and Vytlacil, “Estimating treatment effects for discrete outcomes when responses to treatment vary: an application to Norwegian vocational rehabilitation programs.” (2005, JOE)
Figure: MTE for Effect of Fertility on Schooling of First Born
Brinch, Mogstad, and Wiswall (JPE, 2017)
MTE for Effect of Additional Child on Yrs of Schooling of First Born
[Figure 5: MTE estimates with same-sex instrument. x-axis: p, 0.2 to 0.7; y-axis: E(Y1 − Y0 | p, X), −2 to 3.]
Note: This figure displays the MTE estimates from the semiparametric generalized Roy model based on Assumptions 1 and 2, with “Same sex, first and second” as instrument. We construct P(Z) using the parameter estimates from the logit model with average derivatives reported in Table 2. We use the same specification for the covariates as reported in Table 2. The MTE estimates are based on double residual regression separately for the treated and non-treated, using local quadratic regression with rectangular kernel and bandwidth of 0.055. The 95 percent confidence interval is computed from a non-parametric bootstrap with 100 bootstrap replications. The y-axis measures the value of the MTE in years of schooling, whereas the x-axis represents the unobserved component of parents' net gain from having 3 or more children rather than 2 children. A high value of p means that a family is less likely to have 3 or more children.
Source: “Beyond LATE with a Discrete Instrument: Heterogeneity in the Quantity-Quality Interaction of Children”, by Brinch, Mogstad, and Wiswall (JPE, 2017).
Connection: LATE and MTE
LATE and MTE closely connected.
For (z, z′) such that P(z) > P(z′), LATE is:
\[
\begin{aligned}
E(Y_1 - Y_0 \mid D(z) = 1, D(z') = 0)
&= E(Y_1 - Y_0 \mid P(z') < U_D \le P(z)) \\
&= \frac{1}{P(z) - P(z')} \int_{P(z')}^{P(z)} \Delta^{MTE}(u)\,du,
\end{aligned}
\]
using that UD ∼ Unif[0, 1] and
D(z) = 1 ⇔ UD ≤ P(z),
D(z′) = 0 ⇔ UD > P(z′).
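A small simulation sketch of this connection, under assumptions that are purely illustrative: a hypothetical linear MTE(u) = 2 − 3u, a binary instrument with P(z′) = 0.3 and P(z) = 0.7, and standard normal noise in Y0. The Wald (IV) estimand computed from the simulated data should be close to the average of the MTE over (P(z′), P(z)], which is 0.5 for these choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def mte(u):
    """Hypothetical MTE curve, chosen only for illustration."""
    return 2.0 - 3.0 * u

p_low, p_high = 0.3, 0.7                   # assumed values of P(z') and P(z)
z = rng.integers(0, 2, size=n)             # binary instrument
p = np.where(z == 1, p_high, p_low)
u_d = rng.uniform(size=n)                  # U_D ~ Unif[0, 1], independent of Z
d = (u_d <= p).astype(float)               # D = 1[U_D <= P(Z)]

y0 = rng.normal(size=n)
y1 = y0 + mte(u_d)                         # so E(Y1 - Y0 | U_D = u) = mte(u)
y = d * y1 + (1.0 - d) * y0

# Wald / IV estimand for the binary instrument.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())

# Average of the MTE over (P(z'), P(z)] using a midpoint rule.
grid = p_low + (np.arange(1_000) + 0.5) * (p_high - p_low) / 1_000
late_from_mte = mte(grid).mean()

print("Wald estimate:", wald, " average MTE on (0.3, 0.7]:", late_from_mte)
```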
MTE
Marginal Treatment Effect (MTE):
MTE (u) = E (Y1 − Y0 | UD = u).
Many treatment effect parameters can be represented as weighted averages of MTE.
Broad class of policy counterfactuals can be represented as weighted averages of MTE.
Table: Treatment effects and IV estimands as weighted averages of MTE

A. Parameters
\[
\begin{aligned}
\text{ATE} &= E(Y_1 - Y_0) = \int_0^1 \text{MTE}(u_D)\,du_D \\
\text{TT} &= E(Y_1 - Y_0 \mid D = 1) = \int_0^1 \text{MTE}(u_D)\,\omega_{TT}(u_D)\,du_D \\
\text{TUT} &= E(Y_1 - Y_0 \mid D = 0) = \int_0^1 \text{MTE}(u_D)\,\omega_{TUT}(u_D)\,du_D \\
\text{PRTE} &= E(Y_{a'}) - E(Y_a) = \int_0^1 \text{MTE}(u_D)\,\omega_{PRTE}(u_D)\,du_D \\
\text{IV}_J &= \int_0^1 \text{MTE}(u_D)\,\omega^{J}_{IV}(u_D)\,du_D, \quad \text{given instrument } J(Z)
\end{aligned}
\]

B. Weights
\[
\begin{aligned}
\omega_{ATE}(u_D) &= 1 \\
\omega_{TT}(u_D) &= \frac{1 - F_{P(Z)}(u_D)}{E(P(Z))} \\
\omega_{TUT}(u_D) &= \frac{F_{P(Z)}(u_D)}{E(1 - P(Z))} \\
\omega_{PRTE}(u_D) &= \frac{F_{P_{a'}}(u_D) - F_{P_a}(u_D)}{\Delta P}, \quad \text{where } \Delta P = E(P_a) - E(P_{a'}) \\
\omega^{J}_{IV}(u_D) &= \frac{E\big(J(Z) - E(J(Z)) \mid P(Z) > u_D\big)\,\big(1 - F_{P(Z)}(u_D)\big)}{\mathrm{Cov}(J(Z), D)}
\end{aligned}
\]

The weights in the table all integrate to one.
Weights for ATE, TT, and TUT will be nonnegative.
Weights for IV will be nonnegative if J(Z) is a monotonic function of P(Z), but need not be otherwise.
Policy Relevant Treatment Effect (PRTE) is effect of policy counterfactual that changes incentives for treatment with resulting change in distribution of fitted probabilities from FPa to FPa′.
Source: Heckman and Vytlacil (2005).
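A numerical sketch of the weights, assuming hypothetical Beta(2, 3) draws for P(Z) and taking J(Z) = P(Z) as the instrument (illustrative choices only). The IV weight is computed as E(J − E(J) | P(Z) > u) · Pr(P(Z) > u) / Cov(J, D), and the code uses Cov(P(Z), D) = Var(P(Z)), which follows from E(D | P(Z)) = P(Z). It checks that the ATE, TT, TUT and IV weights each integrate to roughly one and that the IV weights are nonnegative when J(Z) = P(Z).

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.sort(rng.beta(2.0, 3.0, size=5_000))   # hypothetical draws of P(Z), sorted
n = p.size
grid = (np.arange(1_000) + 0.5) / 1_000        # midpoints of a grid on [0, 1]

# Empirical F_P(u): share of draws with P(Z) <= u.
count_below = np.searchsorted(p, grid, side="right")
cdf = count_below / n

w_ate = np.ones_like(grid)
w_tt = (1.0 - cdf) / p.mean()
w_tut = cdf / (1.0 - p.mean())

# IV weights with J(Z) = P(Z). tail_sums[k] is the sum of the centered draws
# from sorted position k onward, so tail_sums[count_below] / n equals
# E[(P - E(P)) * 1{P > u}] = E(P - E(P) | P > u) * Pr(P > u).
centered = p - p.mean()
tail_sums = np.concatenate([np.cumsum(centered[::-1])[::-1], [0.0]])
w_iv = (tail_sums[count_below] / n) / p.var()

# Averaging over the midpoint grid approximates each integral over [0, 1].
integrals = {"ATE": w_ate.mean(), "TT": w_tt.mean(),
             "TUT": w_tut.mean(), "IV": w_iv.mean()}
print(integrals)
```

With a different J(Z) that is not monotone in P(Z), the same tail-sum calculation can produce negative values of the IV weight at some u.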
Figure 4: Weights on MTE for Alternative Parameters (Hypothetical Example)
[Figure: weights ω(uD) for ATE, TT, and TUT plotted together with the MTE curve against uD on [0, 1].]
Source: Heckman and Vytlacil (2005)
Figure: Weights on MTE for Empirical Example (Brinch et al)
Weights on MTE for Alternative Parameters
[Figure 3: Weight of MTE for treatment effects parameters and instruments. Panels: (a) ATT, ATUT, and ATE; (b) IV with Z− as instrument; (c) IV with P(Z) as instrument. Panels (b) and (c) show curves for both instruments, samesex only, and twins only. x-axis: p, 0 to 1; y-axis: weight, −0.5 to 4.]
Note: The upper panel graphs MTE weights associated with the average treatment effect on the treated (ATT), the average treatment effect (ATE), and the average treatment effect on the untreated (ATUT). The middle panel (Z− as instrument) and lower panel (P(Z) as instrument) graph MTE weights associated with the IV estimates presented in Table 3. To compute the weights, we use the weight formulas described in the Appendix. The y-axis measures the density of the distribution of weights, whereas the x-axis represents the unobserved component of parents' net gain from having 3 or more children rather than 2 children. A high value of p means that a family is less likely to have 3 or more children.
Source: “Beyond LATE with a Discrete Instrument: Heterogeneity in the Quantity-Quality Interaction of Children”, by Brinch, Mogstad, and Wiswall (JPE, 2017).
Figure: Weights on MTE for Empirical Example (Brinch et al)
Weights on MTE for Alternative Instruments
[Figure 3: Weight of MTE for treatment effects parameters and instruments. Panels: (a) ATT, ATUT, and ATE; (b) IV with Z− as instrument; (c) IV with P(Z) as instrument. x-axis: p; y-axis: weight.]
Source: “Beyond LATE with a Discrete Instrument: Heterogeneity in the Quantity-Quality Interaction of Children”, by Brinch, Mogstad, and Wiswall (JPE, 2017).
Figure: Weights on MTE for Empirical Example (Brinch et al)
Weights on MTE for Alternative Instruments, P(Z)
[Figure 3: Weight of MTE for treatment effects parameters and instruments. Panels: (a) ATT, ATUT, and ATE; (b) IV with Z− as instrument; (c) IV with P(Z) as instrument. x-axis: p; y-axis: weight.]
Source: “Beyond LATE with a Discrete Instrument: Heterogeneity in the Quantity-Quality Interaction of Children”, by Brinch, Mogstad, and Wiswall (JPE, 2017).
Identification of the MTE
Different parameters can be seen as different weighted averages of MTE; IV is a weighted average of MTE.
If can identify MTE, can:
1 Integrate MTE to obtain other parameters of interest
2 Understand connection between selection into treatment and individual effects.
How to identify MTE?
Identification of the MTE (cont’d)
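\[
\begin{aligned}
E(Y \mid P(Z) = p) &= E(D Y_1 + (1 - D) Y_0 \mid P(Z) = p) \\
&= E(Y_0) + E(D (Y_1 - Y_0) \mid P(Z) = p) \\
&= E(Y_0) + E(Y_1 - Y_0 \mid D = 1, P(Z) = p) \cdot \Pr(D = 1 \mid P(Z) = p) \\
&= E(Y_0) + \int_0^p E(Y_1 - Y_0 \mid U_D = u)\,du.
\end{aligned}
\]
\[
\Rightarrow \underbrace{\frac{\partial}{\partial p} E(Y \mid P(Z) = p)}_{\text{LIV}} = \underbrace{E(Y_1 - Y_0 \mid U_D = p)}_{\text{MTE}}.
\]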
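As a worked check of this derivation, suppose (purely for illustration, not from the lecture) that the MTE is linear, MTE(u) = a − bu, with hypothetical constants a and b. Then:

```latex
\begin{aligned}
E(Y \mid P(Z) = p) &= E(Y_0) + \int_0^p (a - b u)\,du = E(Y_0) + a p - \tfrac{b}{2} p^2, \\
\frac{\partial}{\partial p} E(Y \mid P(Z) = p) &= a - b p = \text{MTE}(p).
\end{aligned}
```

So a conditional expectation that is quadratic in p corresponds to a linear MTE, and a conditional expectation that is linear in p (b = 0) corresponds to a constant MTE.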
Identification of the MTE (cont’d)
LIV (Local Instrumental Variables) identifies MTE
\[
\underbrace{\frac{\partial}{\partial p} E(Y \mid P(Z) = p)}_{\text{LIV}} = \underbrace{E(Y_1 - Y_0 \mid U_D = p)}_{\text{MTE}}. \tag{2.1}
\]
Suppose P(Z) is continuous (requires at least one component of Z be continuous).
Then ∆MTE(u) identified by LIV for u ∈ Supp(P(Z)).
The greater the variation in P(Z), the greater the range over which MTE is identified.
Using MTE for Identification of Treatment Effects
\[
\text{Treatment Parameter}(j) = \int_0^1 \Delta^{MTE}(u)\,\omega_j(u)\,du.
\]
Identification using this relationship requires identification of ∆MTE(u) for u such that ωj(u) ≠ 0.
We identify ∆MTE(u) for u ∈ Supp(P(Z)) (supposing P(Z) continuous).
Thus, to integrate MTE to identify treatment parameter, require Supp(P(Z)) ⊇ {u : ωj(u) ≠ 0}.
Strong requirement for traditional treatment parameters, typically “identification at infinity” requirement.
Using MTE for Identification of Treatment Effects (cont’d)
To integrate MTE to identify treatment parameter, require Supp(P(Z)) ⊇ {u : ωj(u) ≠ 0}.
For example:
For ATE, need Supp(P(Z)) = [0, 1],
For TT, need Supp(P) = [0, pu],
For TUT, need Supp(P) = [pl, 1].
Even with identification at infinity, estimation of traditional parameters involves estimation on thin sets, slow rate of convergence.
Same issue as Andrews and Schafgans (1998).
Using MTE for Identification of Treatment Effects (cont’d)
Can identify without integrating MTE to obtain parameter under slightly weaker conditions, but still require identification at infinity without imposing more structure.
For example, for ATE, require Supp(P(Z)) ⊇ {0, 1}, instead of Supp(P(Z)) = [0, 1].
Can follow bounding/partial-identification approach if support of P(Z) does not allow point identification.
Partial Identification
Partial identification analysis for traditional treatment parameters developed by Heckman and Vytlacil (2001).
Suppose potential outcomes are bounded, Pr[yl ≤ Yj ≤ yu] = 1, j = 0, 1.
They develop sharp bounds on traditional treatment parameters.
Width of bounds on ATE depends linearly on distance of maximum propensity score from one and minimum propensity score from zero.
Width of bounds on TT depends linearly on distance of minimum propensity score from zero.
Relation to Balke and Pearl (1997)?
See also Mogstad, Santos and Torgovitsky (2018).
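To see where the linear dependence for ATE comes from, here is a sketch of bounds of this type, written directly from the threshold-crossing structure. Here pu and pl denote the maximum and minimum of Supp(P(Z)); the precise statement and the sharpness result should be checked against Heckman and Vytlacil (2001).

```latex
\begin{aligned}
E(Y_1) &= E(DY \mid P(Z) = p^u) + (1 - p^u)\,E(Y_1 \mid D = 0, P(Z) = p^u), \\
E(Y_0) &= E((1 - D)Y \mid P(Z) = p^l) + p^l\,E(Y_0 \mid D = 1, P(Z) = p^l).
\end{aligned}
```

The first term in each line is identified from the data, while the unidentified conditional means lie in [yl, yu]. Replacing them by yl and yu gives an interval for ATE = E(Y1) − E(Y0) of width

```latex
\left[(1 - p^u) + p^l\right](y_u - y_l),
```

which shrinks to zero as pu → 1 and pl → 0.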
Partial Identification of MTE
Heckman, Li, Oka and Vytlacil (2017): “Identification of Treatment Effects Under Discrete Variation in the Propensity Score.”
Suppose that
potential outcomes are bounded: Pr[yl ≤ Yj ≤ yu] = 1, j = 0, 1,
know a priori that the MTE is a (weakly) monotonic function,
know the direction of the monotonicity.
Develop sharp bounds on MTE for any given support of the propensity score, including the propensity score being a discrete random variable.
Bounds are nontrivial, even if the distribution of the propensity score is degenerate.
Without imposing monotonicity on MTE, and without imposing other assumptions, the sharp bounds on MTE can be trivial.
Using MTE for Identification of Treatment Effects (cont’d)
Without large support, can still:
1 Bound conventional parameters as discussed above.
2 Understand treatment effect for some groups of individuals, and understand part of the connection between selection and individual effects, by examining MTE over identified values.
3 We will show that one can still nonparametrically identify average effect for those on margin of indifference, and effect of marginal policy changes, without large support requirements.
Alternative Parameters of Interest: AMTE, MPRTE.
Carneiro, Heckman and Vytlacil (2010 ECMA, 2011 AER) consider following parameters:
Average Marginal Treatment Effect (AMTE):
E(Y1 − Y0 | P(Z) = UD).
Marginal Policy Relevant Treatment Effect (MPRTE)
Consider policy counterfactuals that change incentives for treatment in particular direction. Consider limit of such policy counterfactuals for infinitesimal change in incentives.
The AMTE parameter is not uniquely defined (Borel Paradox).
CHV show that alternative definitions of AMTE are equivalent to alternative versions of MPRTE corresponding to infinitesimal policy changes in alternative directions.
Change Parameter of Interest: AMTE, MPRTE
CHV show that
AMTE and MPRTE parameters can be written as weighted averages of MTE, with weights that depend on particular definition of AMTE, equivalently, depend on direction of marginal policy change.
Identification of AMTE, MPRTE parameters depends on having a continuous instrument, but does not otherwise depend on support of P(Z). No need for identification at infinity.
Nonparametric estimation of AMTE, MPRTE parameters fundamentally easier than estimation of traditional treatment parameters, can be consistently estimated at √N-rate.
Estimation
Possible parameters of interest:
1 The MTE function itself, E [Y1 − Y0 | UD = u].
2 Some parameter that is a functional of MTE,
1 Traditional parameters, e.g., ATE, TT, etc.
2 Non-traditional parameters, e.g., AMTE, MPRTE, etc.
3 Probability limit of IV, interpret.
First consider estimation of MTE.
Nonparametric Estimation of MTE
\[
\underbrace{\frac{\partial}{\partial p} E(Y \mid P(Z) = p)}_{\text{LIV}} = \underbrace{E(Y_1 - Y_0 \mid U_D = p)}_{\text{MTE}}.
\]
Suppose no need to condition on other covariates for instruments to be valid (or at least no other continuous covariates).
Suppose at least one element of Z is continuous, and resulting P(Z) is continuous.
Then can nonparametrically estimate ∂/∂p E(Y | P(Z) = p), for example, through local polynomial regression of Y on P(Z), with P(Z) estimated in a first step.
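A minimal sketch of the local polynomial step, under assumptions chosen for illustration: a local quadratic fit with an Epanechnikov kernel, a fixed bandwidth of 0.25, and simulated data with a hypothetical MTE(u) = 2 − 3u. For brevity the propensity score is treated as known rather than estimated in a first step, and `local_quadratic_slope` is a helper name introduced here, not a library function.

```python
import numpy as np

def local_quadratic_slope(p, y, p0, bandwidth):
    """Derivative at p0 from a kernel-weighted quadratic fit of y on (p - p0)."""
    t = (p - p0) / bandwidth
    weights = np.where(np.abs(t) < 1.0, 0.75 * (1.0 - t**2), 0.0)  # Epanechnikov kernel
    keep = weights > 0
    x = p[keep] - p0
    design = np.column_stack([np.ones(x.size), x, x**2])
    root_w = np.sqrt(weights[keep])
    # Weighted least squares via ordinary least squares on sqrt-weighted data.
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y[keep] * root_w, rcond=None)
    return coef[1]  # coefficient on (p - p0): estimate of dE(Y | P = p)/dp at p0

# Simulated data with a known, hypothetical MTE(u) = 2 - 3u.
rng = np.random.default_rng(0)
n = 200_000
p = rng.uniform(0.05, 0.95, size=n)        # stands in for a first-step estimate of P(Z)
u_d = rng.uniform(size=n)
d = u_d <= p                               # D = 1[U_D <= P(Z)]
y0 = 1.0 + rng.normal(scale=0.5, size=n)
y = np.where(d, y0 + 2.0 - 3.0 * u_d, y0)  # Y = D*Y1 + (1 - D)*Y0

mte_hat = {u: local_quadratic_slope(p, y, u, bandwidth=0.25) for u in (0.3, 0.5, 0.7)}
for u, estimate in mte_hat.items():
    print(f"u = {u}: estimated MTE {estimate:.3f}, true MTE {2.0 - 3.0 * u:.3f}")
```

In an application the bandwidth would be chosen by a data-driven rule, and inference would need to account for the estimated propensity score.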
Nonparametric Estimation of MTE
\[
\underbrace{\frac{\partial}{\partial p} E(Y \mid P(Z) = p)}_{\text{LIV}} = \underbrace{E(Y_1 - Y_0 \mid U_D = p)}_{\text{MTE}}.
\]
Explosion of recent analysis using as instruments judges’/administrators’ proclivity to assign treatment.
Resulting instruments are
plausibly exogenous without conditioning on additional covariates,
approximately continuous.
Concerns with estimation, inference?
Such papers often nonparametrically estimate ∂/∂p E(Y | P(Z) = p).
Nonparametric Estimation of MTE
Often need to condition on X for Z to be plausibly exogenous:
\[
\underbrace{\frac{\partial}{\partial p} E(Y \mid P(Z) = p, X = x)}_{\text{LIV}} = \underbrace{E(Y_1 - Y_0 \mid X = x, U_D = p)}_{\text{MTE}}.
\]
Z understood to possibly include some or all elements of X.
Z | X varies due to elements of Z that are not elements of X; those elements are the instruments.
Nonparametric Estimation of MTE
Often need to condition on X for Z to be plausibly exogenous:
\[
\underbrace{\frac{\partial}{\partial p} E(Y \mid P(Z) = p, X = x)}_{\text{LIV}} = \underbrace{E(Y_1 - Y_0 \mid X = x, U_D = p)}_{\text{MTE}}.
\]
In theory, can still nonparametrically estimate ∂/∂p E(Y | P(Z) = p, X). However, . . .
Nonparametric Estimation of MTE
Problem: Curse of Dimensionality
If X contains continuous elements, especially multiple continuous elements, pointwise estimation of E(Y | P(Z) = p, X = x) will be very poor.
Formally: very slow rate of convergence. Expect large bias and high imprecision in finite samples. Expect asymptotics to be poor guide.
Pointwise estimation of derivative of E(Y | P(Z) = p, X = x) should be even more difficult.
All of above problems, but more so.
Nonparametric Estimation of MTE
\[
\underbrace{\frac{\partial}{\partial p} E(Y \mid P(Z) = p, X = x)}_{\text{LIV}} = \underbrace{E(Y_1 - Y_0 \mid X = x, U_D = p)}_{\text{MTE}}.
\]
Additional Issue: Often Z discrete, or discrete variation conditional on X, in which case cannot nonparametrically identify (much less estimate) E(Y | P(Z) = p, X = x).
Estimation of MTE
Options if continuous X and/or Z discrete:
1 Impose semiparametric structure.
For example, impose linear regression model on Y1, Y0 resulting in semiparametric, partially linear regression model for E[Y | X, P(Z)].
2 Impose parametric model on (D, Y0, Y1) | (X, Z).
For example, classical Heckman selection model with joint normality.
3 Impose parametric functional form restrictions directly on E(Y1 | X = x, UD = u), E(Y0 | X = x, UD = u).
E(Y1 | X = x, UD = u), E(Y0 | X = x, UD = u) called “Marginal Treatment Response” (MTR) functions.
See Brinch, Mogstad, and Wiswall (JPE, 2017)
Nonparametric Estimation of Treatment Parameters through MTE
Estimation of Other Treatment Parameters as a Functional of MTE:
Additional Problem: Support Problem, Irregular Estimation
To estimate MTE nonparametrically for all evaluation points, need support of P(Z) conditional on X to be full unit interval.
Requires extremely powerful instrument.
To integrate up MTE to traditional parameters, require MTE over broad support.
Traditional treatment parameters are “non-smooth” functions of MTE, expect slower than √N estimation.
Nonparametric Estimation of MTE, Treatment Parameters
Realistically, would need extremely large samples and extremely strong instruments to have nonparametric estimation of MTE and of traditional treatment parameters to be feasible, even if X is low dimensional.
What is feasible?
Estimation of average effect for those on margin of indifference, and effect of marginal policy changes, fundamentally easier than for traditional parameters.
Can estimate IV, interpret.
Can follow bounding approach.
Can incorporate some parametric functional form restrictions, follow semiparametric or parametric estimation approaches.
Semiparametric Estimation of MTE
When need to condition on covariates for validity of instruments, most common method is to follow Heckman, Vytlacil and Urzua (2006) and Carneiro, Heckman and Vytlacil (2011), impose:
Y1 = Xβ1 + U1,
Y0 = Xβ0 + U0,
⇒ Y = Xβ0 + DX (β1 − β0) + D(U1 − U0) + U0.
Semiparametric Estimation of MTE
Assume (X ,Z ) ⊥⊥ (UD ,U0,U1).
Y = Xβ0 + DX (β1 − β0)
+D(U1 − U0) + U0,
⇒ E (Y | X ,P(Z )) = Xβ0 + P(Z )X (β1 − β0) + K (P(Z )),
where
K (P(Z )) = E (D(U1 − U0)|P(Z ))
= P(Z )E (U1 − U0|D = 1,P(Z ))
= P(Z )E (U1 − U0|UD ≤ P(Z )).
Semiparametric Estimation of MTE (cont’d)
E (Y | X ,P(Z )) = Xβ0 + P(Z )X (β1 − β0) + K (P(Z )),
K (P(Z )) = P(Z )E (U1 − U0|UD ≤ P(Z )).
Thus, imposing linear model on potential outcomes results in a partially linear model for the observed outcome.
If impose joint normality assumptions on error terms, or other joint parametric distributional assumption on error terms, then K is a known function (possibly up to finite dimensional parameter vector), and we have a standard non-linear parametric regression model.
Without imposing parametric distributional assumption on error terms, K(·) is an unknown, nonparametric function.
Semiparametric Estimation of MTE (cont’d)
E (Y | X ,P(Z )) = Xβ0 + P(Z )X (β1 − β0) + K (P(Z )),
K(·) unknown function, suggests semiparametric multistep estimation strategy.
1 Estimate P(Z) in first step, either parametrically or semi/nonparametrically (using, e.g., Ichimura 1993; Klein and Spady, 1993; Ahn, Ichimura and Powell 2004).
Most applications use a parametric model for P(Z), a probit or a logit.
2 Flexibly estimate E(Y | X, P(Z)) using estimated P(Z).
Semiparametric Estimation of MTE (cont’d)
2 Flexibly estimate E(Y | X, P(Z)) using estimated P(Z), for example, using:
Partial linear regression/nonparametric double residual regression techniques, as in Robinson (1988), or
Regress Y on X, P(Z)X, and a series in P(Z), adapting Das, Newey and Vella (2003), Newey, Powell and Vella (1999).
Note dimension reduction.
See also Cattaneo, Jansson, and Ma (2019), “Two-Step Estimation and Inference with Possibly Many Included Covariates,” (Review of Economic Studies, forthcoming)
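The two steps can be sketched as follows, using the series-regression variant. Everything specific here is an assumption made for illustration: the simulated design (β1 − β0 = 0.5 and k(p) = 1 − 2p), a logit first step fitted by a hand-written Newton-Raphson routine, and a cubic polynomial as the series in P(Z). The helper names `fit_logit_probabilities` and `mte_hat` are hypothetical, and no standard errors are computed.

```python
import numpy as np

def fit_logit_probabilities(features, d, iterations=25):
    """First step: logit for P(Z) by Newton-Raphson; returns fitted probabilities."""
    design = np.column_stack([np.ones(len(d)), features])
    gamma = np.zeros(design.shape[1])
    for _ in range(iterations):
        prob = 1.0 / (1.0 + np.exp(-design @ gamma))
        gradient = design.T @ (d - prob)
        hessian = (design * (prob * (1.0 - prob))[:, None]).T @ design
        gamma = gamma + np.linalg.solve(hessian, gradient)
    return 1.0 / (1.0 + np.exp(-design @ gamma))

# Hypothetical data-generating process: beta0 = 1.0, beta1 = 1.5, k(p) = 1 - 2p.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)                      # covariate X
w = rng.normal(size=n)                      # excluded instrument, so Z = (X, W)
v = rng.logistic(size=n)
d = (0.8 * x + w - v >= 0).astype(float)    # D = 1[mu_D(Z) - V >= 0]
u_d = 1.0 / (1.0 + np.exp(-v))              # U_D = F_V(V)
u0 = rng.normal(scale=0.5, size=n)
y0 = 1.0 * x + u0
y1 = 1.5 * x + u0 + (1.0 - 2.0 * u_d)       # U1 - U0 = 1 - 2*U_D
y = d * y1 + (1.0 - d) * y0

# Step 1: estimate P(Z).
p_hat = fit_logit_probabilities(np.column_stack([x, w]), d)

# Step 2: regress Y on X, P*X, and a cubic series in P (the order is a tuning choice).
design = np.column_stack([x, p_hat * x, np.ones(n), p_hat, p_hat**2, p_hat**3])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_diff = coef[1]                         # estimate of beta1 - beta0

def mte_hat(x_value, p_value):
    """x*(beta1 - beta0) + k(p), with k the derivative of the fitted series for K."""
    k = coef[3] + 2.0 * coef[4] * p_value + 3.0 * coef[5] * p_value**2
    return x_value * beta_diff + k

print("beta1 - beta0:", beta_diff)
print("MTE at x = 0, p = 0.5:", mte_hat(0.0, 0.5))
```

The dimension reduction is visible in step 2: the only nonparametric component is a function of the scalar P(Z), however many covariates enter X.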
Implications of Semiparametric Model
E[Y | X = x, P(Z) = p] = xβ0 + p · x(β1 − β0) + K(p),
E(Y1 − Y0 | X = x, UD = p) = ∂/∂p E(Y | X = x, P(Z) = p)
= x(β1 − β0) + k(p),
where k(p) = K′(p).
With the partially linear restriction, X shifts the MTE function up and down by a constant but does not change its slope or curvature in p.
Extrapolation based on Semiparametric Model
Nonparametrically, identify E[Y | X = x, P(Z) = p] for (x, p) ∈ Supp(X, P(Z)).
Exploiting the semiparametric, partially linear structure, identify E[Y | X = x, P(Z) = p] for (x, p) ∈ Supp(X) × Supp(P(Z)).
Supp(P(Z)) is typically much larger than Supp(P(Z) | X).
P(Z) may be continuous even when P(Z) | X is discrete.
Semiparametric, partially linear regression thus allows considerable extrapolation, which is somewhat controversial.
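A small numerical illustration of the support argument, under purely hypothetical choices (a logit propensity score with made-up coefficients, a five-point X and a binary excluded instrument): conditional on each x the propensity score takes only two values, while the marginal support pools them across x.

```python
import math


def propensity(x, z, gamma_x=1.0, gamma_z=0.8):
    """Hypothetical logit propensity score; the coefficients are illustrative only."""
    return 1.0 / (1.0 + math.exp(-(gamma_x * x + gamma_z * z)))


x_values = [-2, -1, 0, 1, 2]   # support of X
z_values = [0, 1]              # binary excluded instrument

# Support of P(Z) given X = x: two points for every x.
cond_support = {x: sorted({round(propensity(x, z), 4) for z in z_values})
                for x in x_values}

# Marginal support of P(Z): the union over x.
marg_support = sorted({p for pts in cond_support.values() for p in pts})

for x, pts in cond_support.items():
    print(x, pts)
print(len(marg_support), marg_support[0], marg_support[-1])
```

Identification on Supp(X) × Supp(P(Z)) uses all of the pooled points for every x, which is exactly where the partially linear structure is doing the work.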
Example: college attendance on wages for high school graduates
Table 2. Definitions of the Variables Used in the Empirical Analysis

Variable   Definition
Y          Log wage in 1991 (average of all non-missing wages between 1989 and 1993)
S = 1      If ever enrolled in college by 1991; zero otherwise
X          AFQT (a), mother’s education, number of siblings, average log earnings 1979-2000 in county of residence at 17, average unemployment 1979-2000 in state of residence at 17, urban residence at 14, cohort dummies, years of experience in 1991, average local log earnings in 1991, local unemployment in 1991
Z\X (b)    Presence of a college at age 14 (Card, 1993; Cameron and Taber, 2004), local earnings at 17 (Cameron and Heckman, 1998; Cameron and Taber, 2004), local unemployment at 17 (Cameron and Heckman, 1998), local tuition in public 4-year colleges at 17 (Kane and Rouse, 1995)

Notes: (a) We use a measure of this score corrected for the effect of schooling attained by the participant at the date of the test, since at the date the test was taken, in 1981, different individuals have different amounts of schooling and the effect of schooling on AFQT scores is important. We use a version of the nonparametric method developed in Hansen, Heckman, et al. (2004). We perform this correction for all demographic groups in the population and then standardize the AFQT to have mean 0 and variance 1. See Table A-2. (b) The papers in parentheses are papers that previously used these instruments.
Source: Carneiro, Heckman and Vytlacil (2009)
Example: college attendance on wages for high school graduates
Figure 2: Support of P Conditional on X
[Figure: surface plot of the density f(P | X) against P (from 0 to 1) and the X index.]
Note: P is the estimated probability of going to college. It is estimated from a logit regression of college attendance on corrected AFQT, mother’s education, number of siblings, urban residence at 14, permanent earnings in the county of residence at 17, permanent unemployment in the state of residence at 17, cohort dummies, a dummy variable indicating the presence of a college in the county of residence at age 14, average log earnings in the county of residence at age 17, and average state unemployment in the state of residence at age 17 (see Table 3). X corresponds to an index of variables in the outcome equation.
Example: college attendance on wages for high school graduates
E(Y | X, P) as a function of P for average X
[Figure: horizontal axis P (from 0 to 1); vertical axis E(Y | P).]
Source: Carneiro, Heckman and Vytlacil (2009)
Example: college attendance on wages for high school graduates
E(Y1 − Y0 | X, US) estimated using locally quadratic regression (averaged over X)
[Figure: horizontal axis US (from 0 to 1); vertical axis E(Y1 − Y0 | X, US).]
Source: Carneiro, Heckman and Vytlacil (2009)
Parametric Estimation of MTE
Alternatively, one can impose a parametric model on (D, Y0, Y1) | (X, Z) for estimation:
Much less data intensive; reasonably precise estimation is feasible with smaller sample sizes.
Naturally provides extrapolation outside of the support; can estimate the MTE over the full unit interval and estimate all treatment parameters.
Negative: less flexible; the parametric structure might be incorrect.
Parametric Examples:
Assume
Yj = Xβj + Uj , j = 0, 1,
D = 1 [Z ′γ − V > 0]
with (U0, U1, V) ⊥⊥ (X, Z) and (U0, U1, V) ∼ Fθ, a distribution known up to a finite-dimensional unknown parameter vector θ.
Again we have
E[Y | X = x, P(Z) = p] = xβ0 + p · x(β1 − β0) + Kθ(p),
E(Y1 − Y0 | X = x, UD = p) = x(β1 − β0) + kθ(p),
but with Kθ(·) and kθ(·) now known functions (up to the finite-dimensional parameter vector θ).
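As one concrete instance, the jointly normal case can be sketched as below. The conventions here are assumptions made for the sketch: V is standard normal, D = 1[Z′γ − V > 0] so that UD = Φ(V), and σ1V, σ0V denote Cov(U1, V) and Cov(U0, V). Papers that sign the selection error or the covariances differently will show the opposite sign in front of Φ⁻¹. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def k_normal(p, sigma_1v, sigma_0v):
    """k_theta(p) = E(U1 - U0 | U_D = p) under joint normality with Var(V) = 1."""
    return (sigma_1v - sigma_0v) * STD_NORMAL.inv_cdf(p)


def K_normal(p, sigma_1v, sigma_0v):
    """K_theta(p) = p * E(U1 - U0 | U_D <= p), the integral of k_theta over (0, p)."""
    return -(sigma_1v - sigma_0v) * STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p))


def mte_normal(x_gain, p, sigma_1v, sigma_0v):
    """MTE(x, p) = x(beta1 - beta0) + k_theta(p); x_gain stands for x(beta1 - beta0)."""
    return x_gain + k_normal(p, sigma_1v, sigma_0v)


# Numerical check that k_theta is the derivative of K_theta (illustrative values).
s1, s0, p0, h = 0.2, -0.1, 0.3, 1e-5
numeric_derivative = (K_normal(p0 + h, s1, s0) - K_normal(p0 - h, s1, s0)) / (2 * h)
print(numeric_derivative, k_normal(p0, s1, s0))
```

Because Φ⁻¹(p) diverges as p approaches 0 or 1, the normal MTE is unbounded at the ends of the unit interval, so plots of it are usually restricted to an interior range.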
Parametric Examples (cont’d)
If (V, U0, U1) is jointly normal, then this is the classic Heckman (1978) normal selection model.
Used for estimation of the MTE by, e.g., Heckman, Vytlacil and Urzua (2006) and Carneiro, Heckman and Vytlacil (2011).
Parametric Examples (cont’d)
Heckman, Tobias and Vytlacil (2003) estimate the MTE while considering other parametric distributions for (V, U0, U1).
For example, Student-t distributions (with ν degrees of freedom) instead of joint normality.
Adapts results from Lee (1982, 1983).
Parametric Examples (cont’d)
Assume
Yj = 1 [Xβj + Uj ≥ 0] , j = 0, 1,
D = 1 [Z ′γ − V > 0]
with (U0, U1, V) ⊥⊥ (X, Z) and (U0, U1, V) ∼ Fθ, a parametric distribution known up to an unknown parameter vector θ.
If (V, U0, U1) is jointly normal, then this is a bivariate probit model with structural shift.
Developed for the MTE by Aakvik, Heckman and Vytlacil (2005). They also consider factor-model generalizations of joint normality.
Example: MTE for Effect of Vocational Rehabilitation on Employment
MTE for Effect of Vocational Rehabilitation on Employment
Source: Aakvik, Heckman and Vytlacil (2005)
Example: Effect of Vocational Rehabilitation on Employment
Excerpt (A. Aakvik et al., Journal of Econometrics 125 (2005) 15–51, p. 40):

7.5. Cream-skimming: the relationship between selection into the program and outcomes

A central question in the analysis of a program like VR is whether those who benefit the most from it are those most likely to participate in it. We have already noted that ATE is greater than TT; i.e., that randomly selected persons benefit more from the program than those who participate in it. This suggests that the combinations of UD and Z values that promote program participation are perversely associated with the observed and unobserved factors associated with gains from the program.
In order to determine the extent of cream-skimming on both observables and unobservables, it is necessary to relate Δ (as defined by the various means and distributional parameter analogues) to ZβD and UD. We have estimated relationships among Δ and (Xβ1, Xβ0, U1, U0), however. So the problem is how to go from the relationships we have estimated to determine the relationships between gains and ZβD and UD.
Given the factor structure model, we can easily determine how variation in UD affects U1 and U0 (see Eq. (12)). By virtue of independence assumption (iii), the factor relationship does not depend on values of ZβD, Xβ1 and Xβ0. We have used this relationship in computing Fig. 1 and in inferring that selection into the program [...]

Table 5. Mean and distributional treatment parameters

Parameter           Mean (standard error)              Pr[Δ = 1 | ·]   Pr[Δ = 0 | ·]   Pr[Δ = −1 | ·]
ATE                 E(Δ) = −0.014 (0.08)               0.225           0.532           0.240
TT                  E(Δ | D = 1) = −0.110 (0.09)       0.178           0.534           0.288
MTE with UD = 2     E(Δ | UD = 2) = 0.224 (0.17)       0.350           0.524           0.126
MTE with UD = 0     E(Δ | UD = 0) = −0.014 (0.07)      0.219           0.549           0.233
MTE with UD = −2    E(Δ | UD = −2) = −0.255 (0.16)     0.119           0.508           0.373

The distributional columns condition on the same event as the mean in that row (no conditioning for ATE, D = 1 for TT, UD = u for MTE).
Source: Aakvik, Heckman and Vytlacil (2005)
Example: Effect of Year of College on Wages (Parametric)
MTE for Effect of Year of College on Wages
Excerpt (Carneiro et al., “Estimating Marginal Returns to Education,” vol. 101, no. 6, p. 2767):
[...] US). Individuals choose the schooling sector in which they have comparative advantage. The magnitude of the heterogeneity in returns on which agents select is substantial: returns can vary from −15.6 percent (for high US persons, who would lose from attending college) to 28.8 percent per year of college (for low US persons).16 The magnitude of total heterogeneity is likely to be even higher since the MTE is the average gain at that quantile of desire to attend college. In general, there will be a distribution of returns centered at each value of the MTE. Furthermore, once we account for variation in X and its impact on returns through X(δ1 − δ0), we observe returns as low as −31.56 percent and as high as 51.02 percent.
Using the weights presented in online Appendix Table A-1B, we can construct the standard treatment parameters from the MTE. We present the results in the first column of Table 5 (standard errors are bootstrapped). These include marginal returns to the three different policies considered in Table 1 (MPRTE), which are all [...]
16 One unattractive feature of the normal model is that (for our estimates of σ1V and σ0V) MTE(x, 0) = +∞ and MTE(x, 1) = −∞. In order to get finite values at the extremes of the normal MTE, we restrict the support of US to be between 0.0001 and 0.9999.
[Figure: MTE (vertical axis) plotted against US (horizontal axis, from 0 to 1).]
Figure 1. MTE Estimated from a Normal Selection Model
Notes: To estimate the function plotted here, we estimate a parametric normal selection model by maximum likelihood. The figure is computed using the following formula:
ΔMTE(x, uS) = μ1(x) − μ0(x) − (σ1V − σ0V) Φ⁻¹(uS),
where σ1V and σ0V are the covariances between the unobservables of the college and high school equation and the unobservable in the selection equation; and X includes experience, current average earnings in the county of residence, current average unemployment in the state of residence, AFQT, mother’s education, number of siblings, urban residence at 14, permanent local earnings in the county of residence at 17, permanent unemployment in the state of residence at 17, and cohort dummies. We plot 90 percent confidence bands.
Source: Carneiro, Heckman and Vytlacil (2011)
Example: Effect of Year of College on Wages (Semi-Parametric)
MTE for Effect of Year of College on Wages
Excerpt (Carneiro et al., “Estimating Marginal Returns to Education,” vol. 101, no. 6, p. 2771):
[...] mean values in the sample. As above, we annualize the MTE. Our estimates show that, in agreement with the normal model, E(U1 − U0 | US = uS) is declining in uS, i.e., students with high values of uS have lower returns than those with low values of uS.
Even though the semiparametric estimate of the MTE has larger standard errors than the estimate based on the normal model, we still reject the hypothesis that its slope is zero. We have already discussed the rejection of the hypothesis that MTE is constant in uS, based on the test results reported in Table 4, panel A. But we can also directly test whether the semiparametric MTE is constant in uS or not. We evaluate the MTE at 26 points, equally spaced between 0 and 1 (with intervals of 0.04). We construct pairs of nonoverlapping adjacent intervals (0–0.04, 0.08–0.12, 0.16–0.20, 0.24–0.28, …), and we take the mean of the MTE for each pair. These are LATEs defined over different sections of the MTE. We compare adjacent LATEs. Table 4, panel B, reports the outcome of these comparisons. For example, the first column reports that
E(Y1 − Y0 | X = x̄, 0 ≤ US ≤ 0.04) − E(Y1 − Y0 | X = x̄, 0.08 ≤ US ≤ 0.12) = 0.0689.
[Figure: MTE (vertical axis) plotted against US (horizontal axis, from 0 to 1).]
Figure 4. E(Y1 − Y0 | X, US) with 90 Percent Confidence Interval: Locally Quadratic Regression Estimates
Notes: To estimate the function plotted here, we first use a partially linear regression of log wages on polynomials in X, interactions of polynomials in X and P, and K(P), a locally quadratic function of P (where P is the predicted probability of attending college), with a bandwidth of 0.32; X includes experience, current average earnings in the county of residence, current average unemployment in the state of residence, AFQT, mother’s education, number of siblings, urban residence at 14, permanent local earnings in the county of residence at 17, permanent unemployment in the state of residence at 17, and cohort dummies. The figure is generated by evaluating the derivative of (9) at the average value of X. Ninety percent standard error bands are obtained using the bootstrap (250 replications).
Source: Carneiro, Heckman and Vytlacil (2011)
Example: Effect of Year of College on Wages
Effect of Year of College on Wages
Excerpt (The American Economic Review, October 2011, p. 2768):
[...] below the return to the average student (TT = E(β | S = 1)), the average person (ATE = E(β)), and the IV estimate. But it is not clear if these estimates are reliable, given the strong normality assumption used to generate them. We next corroborate these estimates of marginal returns using a more robust semiparametric approach.

C. Estimating the MTE and Marginal Policy Effects Using Local Instrumental Variables

An alternative and more robust approach for estimating the MTE estimates E(Y | X, P(Z) = p) semiparametrically and then computes its derivative with respect to p, as shown in the analysis of equations (5) and (6). If all we are willing to assume is that (U0, U1, V) is independent of Z given X, then it is only possible to estimate the MTE over the support of P conditional on X. Figure 2 plots f(P | X), the density of P given X (P is estimated by a logit). Since X is multidimensional, we use an index of X (X[δ1 − δ0]). It is striking how small the support of P is for each value of the X index. It is not possible to estimate MTE over the full unit interval, and as a consequence, it is not possible to estimate conventional treatment parameters such as the average treatment effect (E(β)) or the effect of treatment on the treated (E(β | S = 1)). It is still possible, however, to estimate MPRTE, since this parameter only puts positive weight over sections of the MTE that are identified within the support of f(P | X).
Empirically, it is very difficult to apply the procedure described in Section I while conditioning on X nonparametrically. We first proceed by invoking the stronger assumption that (X, Z) is independent of (U0, U1, US). We relax it below. Under this [...]

Table 5. Returns to a Year of College

Parameter                                        Normal              Semiparametric
ATE = E(β)                                       0.0670 (0.0378)     Not identified
TT = E(β | S = 1)                                0.1433 (0.0346)     Not identified
TUT = E(β | S = 0)                               −0.0066 (0.0707)    Not identified
MPRTE, by policy perturbation (metric):
  Z_α^k = Z^k + α    (|Zγ − V| < e)              0.0662 (0.0373)     0.0802 (0.0424)
  P_α = P + α        (|P − U| < e)               0.0637 (0.0379)     0.0865 (0.0455)
  P_α = (1 + α)P     (|P/U − 1| < e)             0.0363 (0.0569)     0.0148 (0.0589)
Linear IV (using P(Z) as the instrument)         0.0951 (0.0386)
OLS                                              0.0836 (0.0068)

Notes: This table presents estimates of various returns to college, for the semiparametric and the normal selection models: average treatment effect (ATE), treatment on the treated (TT), treatment on the untreated (TUT), and different versions of the marginal policy relevant treatment effect (MPRTE). The linear IV estimate uses P as the instrument. Standard errors (in parentheses) are bootstrapped (250 replications). See online Appendix Table A-1 for the exact definitions of the weights. See Table 1 for the weights for MPRTE. For more discussion of MPRTE, see Carneiro, Heckman, and Vytlacil (2010).
Source: Carneiro, Heckman and Vytlacil (2011)
Parametric Estimation of MTE, Structure on MTR
Alternative parametric approach: Brinch, Mogstad, and Wiswall (JPE, 2017), “Beyond LATE with a Discrete Instrument”
Place parametric structure directly on E(Y1 | U), E(Y0 | U) (the MTRs).
Can identify a model with linear E(Y1 | U), E(Y0 | U), and hence a linear MTE, with only a binary instrument.
The greater the variation in Z, the richer the parametric model on E(Y1 | U), E(Y0 | U) that can be estimated.
Can test the external validity of LATE even with binary Z.
See also Kowalski (2016).
Parametric Estimation of MTE, Structure on MTR (cont’d)
Suppose Yj = Xβj + Uj, j = 0, 1, with (U0, U1, UD) ⊥⊥ (X, Z), and define
k0(p) = E[U0 | UD = p]
k1(p) = E[U1 | UD = p]
k(p) = k1(p) − k0(p)
so that
E[Y0 | X, UD = p] = Xβ0 + k0(p)
E[Y1 | X, UD = p] = Xβ1 + k1(p)
MTE(X, p) = X(β1 − β0) + k(p)
Brinch et al. (2017) impose parametric structure directly on k1(p), k0(p).
Parametric Estimation of MTE, Structure on MTR (cont’d)
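\[
E[Y_0 \mid D = 0, X, P] = X\beta_0 + K_0(P), \qquad
E[Y_1 \mid D = 1, X, P] = X\beta_1 + K_1(P),
\]
where, since D = 1 corresponds to UD ≤ P(Z) and UD is uniform on [0, 1],
\[
K_0(p) \equiv E[U_0 \mid U_D > p] = \frac{1}{1-p}\int_p^1 k_0(u)\,du,
\qquad
K_1(p) \equiv E[U_1 \mid U_D \le p] = \frac{1}{p}\int_0^p k_1(u)\,du .
\]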
Parametric Estimation of MTE, Structure on MTR (cont’d)
General idea:
1. Impose parametric structure on k0(·), k1(·).
2. Estimate the parameters by estimating E[Y0 | D = 0, X, P] and E[Y1 | D = 1, X, P], using the parametric form implied by the parametric restrictions on k0(·), k1(·).
3. Use the resulting parameters to estimate the MTE and functionals of the MTE.
Parametric Estimation of MTE, Structure on MTR (cont’d)
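For example, suppose there are no X covariates, and assume:
\[
k_0(p) = \alpha_0\, p - \tfrac{1}{2}\alpha_0, \qquad
k_1(p) = \alpha_1\, p - \tfrac{1}{2}\alpha_1,
\]
which implies
\[
K_0(p) = \tfrac{1}{2}\alpha_0\, p, \qquad
K_1(p) = \tfrac{1}{2}\alpha_1 (p - 1),
\]
and thus
\[
E[Y_0 \mid D = 0, P = p] = \mu_0 + \tfrac{1}{2}\alpha_0\, p, \qquad
E[Y_1 \mid D = 1, P = p] = \mu_1 + \tfrac{1}{2}\alpha_1 (p - 1).
\]
Can estimate α0, α1, then use them to form the MTE. Can do so even with Z binary.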
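The linear example can be worked through numerically. The sketch below assumes a binary instrument that moves the propensity score between two known values and takes the four conditional means as given (population values rather than estimates); the function name and the illustrative parameter values are hypothetical. With two values of p, each conditional mean function is a line through two points, which pins down (µ0, α0) and (µ1, α1).

```python
def linear_mte_from_binary_z(p_lo, p_hi, y0_lo, y0_hi, y1_lo, y1_hi):
    """
    p_lo, p_hi   : the two propensity score values, p_lo < p_hi
    y0_lo, y0_hi : E[Y | D = 0, P = p_lo], E[Y | D = 0, P = p_hi]
    y1_lo, y1_hi : E[Y | D = 1, P = p_lo], E[Y | D = 1, P = p_hi]
    """
    # E[Y0 | D = 0, P = p] = mu0 + 0.5 * alpha0 * p
    alpha0 = 2.0 * (y0_hi - y0_lo) / (p_hi - p_lo)
    mu0 = y0_lo - 0.5 * alpha0 * p_lo
    # E[Y1 | D = 1, P = p] = mu1 + 0.5 * alpha1 * (p - 1)
    alpha1 = 2.0 * (y1_hi - y1_lo) / (p_hi - p_lo)
    mu1 = y1_lo - 0.5 * alpha1 * (p_lo - 1.0)

    def mte(u):
        # MTE(u) = mu1 + k1(u) - mu0 - k0(u), with kj(u) = alpha_j * (u - 1/2)
        return (mu1 - mu0) + (alpha1 - alpha0) * (u - 0.5)

    return mu0, alpha0, mu1, alpha1, mte


# Illustrative population values: mu0 = 1, alpha0 = 0.4, mu1 = 2, alpha1 = -0.6.
p_lo, p_hi = 0.3, 0.7
y0_lo, y0_hi = 1 + 0.2 * p_lo, 1 + 0.2 * p_hi
y1_lo, y1_hi = 2 - 0.3 * (p_lo - 1), 2 - 0.3 * (p_hi - 1)
mu0, alpha0, mu1, alpha1, mte = linear_mte_from_binary_z(
    p_lo, p_hi, y0_lo, y0_hi, y1_lo, y1_hi)

# The Wald/LATE estimand equals the average of the MTE over (p_lo, p_hi),
# which for a linear MTE is its value at the midpoint.
ey_lo = p_lo * y1_lo + (1 - p_lo) * y0_lo
ey_hi = p_hi * y1_hi + (1 - p_hi) * y0_hi
wald = (ey_hi - ey_lo) / (p_hi - p_lo)
print(mu0, alpha0, mu1, alpha1, wald, mte(0.5 * (p_lo + p_hi)))
```

The slope of the recovered MTE, α1 − α0, is what allows a check on the external validity of LATE: if it is zero, the MTE is flat and LATE coincides with ATE under the linear specification.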
What Does Linear IV Estimate?
Consider J(Z) as an instrument, a scalar function of Z:
\[
\Delta^{IV}_J = \frac{\mathrm{Cov}(Y, J(Z))}{\mathrm{Cov}(D, J(Z))}.
\]
How can this be expressed as a weighted average of the MTE?
\[
\Delta^{IV}_J = \int_0^1 \Delta^{MTE}(u)\,\omega^J_{IV}(u)\,du \tag{4.1}
\]
\[
\omega^J_{IV}(u) = \frac{E\big(J(Z) - E[J(Z)] \mid P(Z) > u\big)\,\Pr(P(Z) > u)}{\mathrm{Cov}(J(Z), D)} \tag{4.2}
\]
J(Z) and P(Z) do not have to be continuous random variables.
The functional forms of P(Z) and J(Z) are general.
The weights are always non-negative if J(Z) is monotonic in the scalar P(Z).
The possibility of negative weights arises only when J(Z) is not a monotonic function of P(Z).
This may arise, e.g., when there are two or more instruments and the analyst computes estimates with only one instrument, or with a combination of the Z instruments that is not a monotonic function of P(Z), so that J(Z) and P(Z) are not perfectly dependent.
If we use P(Z) as the instrument, J(Z) = P(Z), then
The weights are everywhere non-negative.
The weighting function is maximal for u = E(P(Z) | X = x) and minimal for u = 0, 1.
IV weights MTE more where density of P(Z) is higher.
The weights can be constructed from data on (J, P, D).
The weights on ∆MTE(u) that generate ∆IV are different from the weights corresponding to ∆TT, and different from the weights corresponding to other standard treatment parameters.
IV gives one weighted version of the MTE; conventional treatment parameters give other weighted versions.
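A sample analogue of the IV weights can be sketched as follows, replacing the expectation, the probability and the covariance by sample averages over observations of (J, P, D). The function name and the small deterministic data set are hypothetical and chosen only so that the result can be checked by hand; here J(Z) = P(Z), so the weights should be non-negative and integrate to one.

```python
def iv_weights(j, p, d, grid):
    """Sample analogue of the IV weight omega(u) on a grid of u values."""
    n = len(j)
    j_bar = sum(j) / n
    d_bar = sum(d) / n
    cov_jd = sum((ji - j_bar) * (di - d_bar) for ji, di in zip(j, d)) / n
    # E[(J - E J) 1{P > u}] equals E(J - E J | P > u) * Pr(P > u).
    return [sum(ji - j_bar for ji, pi in zip(j, p) if pi > u) / n / cov_jd
            for u in grid]


# Deterministic toy sample: four propensity values, ten people at each,
# with U_D spread evenly over (0, 1) and D = 1 when U_D <= P.
p_obs, d_obs = [], []
for p_val in (0.2, 0.4, 0.6, 0.8):
    for k in range(10):
        u_d = 0.05 + 0.1 * k
        p_obs.append(p_val)
        d_obs.append(1 if u_d <= p_val else 0)
j_obs = list(p_obs)  # use the propensity score itself as the instrument

grid = [0.1, 0.3, 0.5, 0.7, 0.9]  # midpoints of intervals of width 0.2
weights = iv_weights(j_obs, p_obs, d_obs, grid)
print(weights, sum(w * 0.2 for w in weights))
```

In this toy sample the weights peak at the middle of the unit interval and vanish outside the range of observed propensity scores, which is the pattern described for J(Z) = P(Z).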
Discrete instruments J (Z )
Discrete Case
The support of the distribution of P(Z) contains a finite number of values p1 < p2 < · · · < pK.
The support of the instrument J(Z) is also discrete, taking I distinct values, j1 < j2 < · · · < jI.
E(J(Z) | P(Z) ≥ u) is constant in u for u within any (pℓ, pℓ+1) interval, and Pr(P(Z) ≥ u) is constant in u for u within any (pℓ, pℓ+1) interval.
Discrete instruments J (Z )
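\[
\begin{aligned}
\Delta^{IV}_J &= \int E(Y_1 - Y_0 \mid U_D = u)\,\omega^J_{IV}(u)\,du \qquad (4.3)\\
&= \sum_{\ell=1}^{K-1} \lambda_\ell \int_{p_\ell}^{p_{\ell+1}} E(Y_1 - Y_0 \mid U_D = u)\,\frac{1}{p_{\ell+1} - p_\ell}\,du \\
&= \sum_{\ell=1}^{K-1} \Delta^{LATE}(p_\ell, p_{\ell+1})\,\lambda_\ell ,
\end{aligned}
\]
where
\[
\lambda_\ell = (p_{\ell+1} - p_\ell)\,
\frac{\sum_{i=1}^{I} \big(j_i - E(J)\big) \sum_{t>\ell}^{K} \Pr[J = j_i, P = p_t]}{\mathrm{Cov}(J(Z), D)} \qquad (4.4)
\]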
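Formula (4.4) can be evaluated directly from the joint distribution of (J, P). The sketch below uses a hypothetical three-point distribution in which J is not monotone in P, and it computes Cov(J(Z), D) as Cov(J(Z), P(Z)), which holds because E(D | Z) = P(Z). The function name is an assumption made for the illustration.

```python
def discrete_iv_weights(joint):
    """
    joint : dict mapping (j_value, p_value) -> probability
    Returns a list of (p_l, p_next, lambda_l), following (4.4).
    """
    p_vals = sorted({p for (_, p) in joint})
    e_j = sum(j * pr for (j, _), pr in joint.items())
    # Cov(J, D) = Cov(J, P) because E(D | Z) = P(Z).
    cov_jd = sum((j - e_j) * p * pr for (j, p), pr in joint.items())
    out = []
    for p_l, p_next in zip(p_vals, p_vals[1:]):
        tail = sum((j - e_j) * pr for (j, p), pr in joint.items() if p > p_l)
        out.append((p_l, p_next, (p_next - p_l) * tail / cov_jd))
    return out


# J is high at the lowest and highest propensity values and low in between,
# so it is not a monotonic function of P.
joint = {(2.0, 0.2): 1 / 3, (0.0, 0.5): 1 / 3, (3.0, 0.8): 1 / 3}
lambdas = [lam for _, _, lam in discrete_iv_weights(joint)]
print(lambdas, sum(lambdas))
```

In this example the LATE over (0.2, 0.5) receives a negative weight while the weights still sum to one, so the IV estimand is not a convex combination of the two LATEs.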
Discrete instruments J (Z )
Generalizes the expression presented by Imbens and Angrist (1994) and Yitzhaki (1989).
Their analysis of the case of vector Z only considers the case where J(Z) and P(Z) are perfectly dependent because J(Z) is a monotonic function of P(Z).
The weights can be positive or negative for any ℓ, but they must sum to 1 over ℓ.
The central role of the propensity score
For the IV weight to be correctly constructed and interpreted, we need to know the correct model for P(Z).
IV depends on:
1. the choice of the instrument J(Z),
2. its dependence with P(Z),
3. the specification of the propensity score (i.e., what variables go into Z).
“Structural” LATE or MTE is identified by P(Z).
Can derive all other instruments in terms of this.
Comparing IV and OLS
In comparison to IV, what is plim of OLS?
Y = Y0 + D(Y1 − Y0).
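\[
\begin{aligned}
E(Y \mid D = 1) - E(Y \mid D = 0)
&= \underbrace{E(Y_1 - Y_0 \mid D = 1)}_{\text{TT}}
 + \underbrace{E(Y_0 \mid D = 1) - E(Y_0 \mid D = 0)}_{\text{Selection Bias}} \\
&= E(Y_1 - Y_0)
 + \big\{E(Y_1 - Y_0 \mid D = 1) - E(Y_1 - Y_0)\big\}
 + \big\{E(Y_0 \mid D = 1) - E(Y_0 \mid D = 0)\big\} \\
&= \text{ATE} + \text{Sorting Gain} + \text{Selection Bias}
\end{aligned}
\]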
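The decomposition can be checked on a toy population in which both potential outcomes are known for everyone, which is of course never the case in real data. The function name and the numbers are hypothetical.

```python
from statistics import fmean


def decompose(y0, y1, d):
    """Split E(Y | D = 1) - E(Y | D = 0) into ATE, sorting gain and selection bias."""
    treated = [i for i, di in enumerate(d) if di == 1]
    control = [i for i, di in enumerate(d) if di == 0]
    gains = [b - a for a, b in zip(y0, y1)]
    ate = fmean(gains)
    tt = fmean(gains[i] for i in treated)
    selection_bias = fmean(y0[i] for i in treated) - fmean(y0[i] for i in control)
    naive = fmean(y1[i] for i in treated) - fmean(y0[i] for i in control)
    return {"naive": naive, "ATE": ate,
            "sorting_gain": tt - ate, "selection_bias": selection_bias}


# Four people; the first two take the treatment.
parts = decompose(y0=[1, 2, 3, 4], y1=[3, 3, 3, 5], d=[1, 1, 0, 0])
print(parts)
```

Here the treated gain more than average (positive sorting gain) but have lower untreated outcomes (negative selection bias), so the naive comparison is negative even though ATE and TT are both positive.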
Comparing IV and OLS (cont’d)
If ATE is the parameter of interest, OLS suffers from both the Sorting Gain and Selection Bias.
If TT is the parameter of interest, OLS suffers from Selection Bias.
Using IV removes Selection Bias, but changes the parameter being estimated (neither ATE nor TT in general).
When will MTE be a constant?
Important question: Does ∆MTE(u) = E(Y1 − Y0 | UD = u) vary with u?
If E(Y1 − Y0 | UD = u) does not vary with u: “standard case.” This implies:
ATE = TT = LATE = policy counterfactuals = plim IV.
When will E (Y1 − Y0 | UD = u) not vary with u?
1. If Y1 = Y0 + β for some constant β.
2. More generally, if Y1 − Y0 is mean independent of UD, so that treatment effect heterogeneity is allowed but individuals do not act upon their own idiosyncratic effect.
If Y1 − Y0 is not mean independent of UD, so that treatment effect heterogeneity is present and individuals do act upon their own idiosyncratic effect, the MTE will vary with u, and the treatment parameters will differ.
ATE is only identified in limit sets (P(z) = 1 and P(z′) = 0).
TT requires a limit set that sets P(z) = 0 for each X, so that E(Y0 | X) can be recovered from untreated outcomes.
“Identification at infinity” stalks the IV and control function enterprise.
We can test to see if these complications are needed.
Testing for essential heterogeneity
Since ∂/∂p E(Y | P(Z) = p) = E(Y1 − Y0 | UD = p), we have:
Y1 − Y0 ⊥⊥ D
⇒ E(Y1 − Y0 | UD = u) = E(Y1 − Y0)
⇒ ∂/∂p E(Y | P(Z) = p) = E(Y1 − Y0)
⇒ E(Y | P(Z) = p) is linear in p.
Thus,
Y1 − Y0 ⊥⊥ D ⇒ E(Y | P(Z) = p) = a + bp,
where b = ∆MTE = ∆ATE = ∆TT.
Testing for essential heterogeneity (cont’d)
Y1 − Y0 ⊥⊥ D ⇒ E (Y | P(Z ) = p) = a + bp
If we cannot reject that E(Y | P(Z) = p) is linear in p, then we cannot reject that either Y1 − Y0 is constant or Y1 − Y0 ⊥⊥ D. With no essential heterogeneity, the analysis simplifies tremendously.
If E(Y | P(Z) = p) is nonlinear in p, this is evidence of essential heterogeneity: the returns to treatment vary in the population, and individuals act upon them.
Testing for essential heterogeneity (cont’d)
Test E(Y | P(Z) = p) = a + bp as a test of essential heterogeneity?
Simple testing strategy from Carneiro, Heckman and Vytlacil (2011):
Regress Y on a polynomial in P(Z), and test that the higher-order terms of the polynomial are jointly zero.
Not an omnibus test, but a valid test with power in some directions.
A sequential version of the test is developed in Heckman, Schmierer, and Urzua (2010).
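The polynomial test can be sketched as a comparison of a restricted (linear in P) and an unrestricted (polynomial in P) regression. The sketch below only computes the classical F statistic for the joint exclusion of the higher-order terms; it ignores covariates and the fact that P(Z) is estimated, it does not compute a p-value (the source table uses bootstrap-based critical values), and the helper names and toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ssr(rows, y):
    """Sum of squared residuals from OLS of y on the columns in rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def linearity_f_stat(y, p, degree=3):
    """F statistic for H0: coefficients on P^2, ..., P^degree are jointly zero."""
    restricted = [[1.0, pi] for pi in p]
    unrestricted = [[pi ** j for j in range(degree + 1)] for pi in p]
    ssr_r, ssr_u = ssr(restricted, y), ssr(unrestricted, y)
    q = degree - 1                  # number of restrictions
    dof = len(y) - (degree + 1)     # residual degrees of freedom
    return ((ssr_r - ssr_u) / q) / (ssr_u / dof)


# Toy data: the same deterministic disturbance added to a linear and a curved mean.
p = [0.05 * (i + 1) for i in range(19)]
noise = [0.05 * (-1) ** i for i in range(19)]
y_linear = [1 + 2 * pi + e for pi, e in zip(p, noise)]
y_curved = [1 + 2 * pi - 3 * pi ** 2 + e for pi, e in zip(p, noise)]
f_linear = linearity_f_stat(y_linear, p)
f_curved = linearity_f_stat(y_curved, p)
print(f_linear, f_curved)
```

A large statistic, as for the curved mean here, is evidence against linearity of E(Y | P(Z) = p) and hence in favor of essential heterogeneity.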
Example: Effect of Year of College on Wages
Table 4A - Test of Linearity of E(Y | X, P = p) using polynomials in P (a)

Degree of polynomial                        2       3       4       5
p-value of joint test of nonlinear terms    0.013   0.018   0.032   0.028
Adjusted critical value: 0.026
Outcome of test: Reject

Table 4B - Test of Equality of LATEs (H0: LATE1(US^1L, US^1H) − LATE1(US^2L, US^2H) = 0) - Baseline Model (b)

Range of US for LATE1    Range of US for LATE2    Difference in LATEs    P-value
(0, 0.04)                (0.08, 0.12)             0.0689                 0.0240
(0.08, 0.12)             (0.16, 0.20)             0.0629                 0.0280
(0.12, 0.20)             (0.24, 0.28)             0.0577                 0.0280
(0.24, 0.28)             (0.32, 0.36)             0.0531                 0.0320
(0.32, 0.36)             (0.40, 0.44)             0.0492                 0.0320
(0.40, 0.44)             (0.48, 0.52)             0.0459                 0.0520
(0.48, 0.52)             (0.56, 0.60)             0.0431                 0.0520
(0.56, 0.60)             (0.64, 0.68)             0.0408                 0.0760
(0.64, 0.68)             (0.72, 0.76)             0.0385                 0.0960
(0.72, 0.76)             (0.80, 0.84)             0.0364                 0.1320
(0.80, 0.84)             (0.88, 0.92)             0.0339                 0.1800
(0.88, 0.92)             (0.96, 1)                0.0311                 0.2400
Joint P-value: 0.0520

Notes: (a) The size of the test is controlled using a critical value constructed by the bootstrap method of Romano and Wolf (2005). (b) In order to compute the numbers in this table we construct groups of values of US and average the MTE within these groups, by computing E(Y1 − Y0 | X = x, US^L ≤ US ≤ US^H), where US^L and US^H are the lowest and highest values of US for a given group. Then we compare the average MTE across adjacent groups and test whether the difference is equal to zero (using the bootstrap with 250 replications).
Source: Carneiro, Heckman and Vytlacil (2011)
Application: Foster Care and Adult Crime
Application: Doyle (2008), “Child Protection and Adult Crime: Using Investigator Assignment to Estimate Causal Effects of Foster Care” (Journal of Political Economy)
Foster care in the U.S.:
The U.S. spends $20 billion per year on foster care.
800,000 children per year spend some time in foster care.
20% of the U.S. prison population under age 30 had been in foster care.
Investigate the effect of foster care on later adult crime, including “at the margin.”
Application: Foster Care and Adult Crime
Data from Illinois, linking Illinois State Police data to child abuse investigation data.
Cases first investigated 1990-2003, children aged 4-16.
Crime data 2000-2005, ages 18-31 in 2005.
23,254 observations.
16% of cases result in foster care placement.
26% of children are later arrested as adults by 2005.
Table: Summary Statistics, Foster Care
Excerpt (“Child Protection and Adult Crime,” p. 753):

Table 1. Summary Statistics

Variable                          Mean      Standard Deviation
Foster care placement             .16       .36
Race:
  White                           .71       .46
  African American                .25       .43
  Hispanic                        .03       .18
Initial reporter:
  Physician                       .07       .25
  School                          .17       .38
  Police                          .21       .41
  Family                          .18       .38
  Neighbor                        .07       .25
  Other government                .14       .35
  Anonymous                       .12       .33
  Other reporter                  .03       .17
Age at report                     11.0      3.1
Sex: boy                          .50       .50
Allegation:
  Lack of supervision             .26       .44
  Environmental neglect           .11       .31
  Other neglect                   .06       .24
  Substantial risk of harm        .35       .48
  Physical abuse                  .20       .40
  Other abuse                     .02       .16
Observations                      23,254

Note: The statistics pertain to children investigated outside of Cook County between July 1, 1990, and June 30, 2003, and who were at least 18 years old in 2005.

[...] not at risk for an adult arrest. This results in a sample of children between the ages of 4 and 16 at the time of the initial child protection report: children who would be between the ages of 18 and 31 in 2005. Thus, the results focus on older, poorer children than the population of children who are investigated for abuse or neglect. In addition, sexual abuse cases (8 percent of the total) are excluded, since these cases do not enter into the rotational assignment of investigators.
The analysis sample includes over 23,000 children. To better understand the types of allegations, reporters, and child characteristics in the child protection system, table 1 reports summary statistics: 16 percent of the children investigated were eventually placed in foster care (approximately 10 percent of investigated children are placed in foster care in the United States as a whole, and the higher placement rate here largely reflects the restriction of the sample to children who received Public Assistance at some point prior to the abuse report); 71 percent of the investigated children are white, compared to 87 percent of the population aged 5–14 in 2000 in Illinois outside of Cook County (figure from the U.S. Census of Population); the reporters of the abuse or neglect are typically school officials, police, and family members; and [...]
Source: Doyle (2008, JPE), “Child Protection and Adult Crime: Using Investigator Assignment to Estimate Causal Effects of Foster Care”
96 / 72
Model/Assumptions Identification Estimation
Table: Summary Statistics, Foster Care
child protection and adult crime 753
TABLE 1Summary Statistics
Variable MeanStandardDeviation
Foster care placement .16 .36Race:
White .71 .46African American .25 .43Hispanic .03 .18
Initial reporter:Physician .07 .25School .17 .38Police .21 .41Family .18 .38Neighbor .07 .25Other government .14 .35Anonymous .12 .33Other reporter .03 .17
Age at report 11.0 3.1Sex: boy .50 .50Allegation:
Lack of supervision .26 .44Environmental neglect .11 .31Other neglect .06 .24Substantial risk of harm .35 .48Physical abuse .20 .40Other abuse .02 .16
Observations 23,254
Note.—The statistics pertain to children investigated outside of Cook Countybetween July 1, 1990, and June 30, 2003, and who were at least 18 years old in 2005.
not at risk for an adult arrest. This results in a sample of childrenbetween the ages of 4 and 16 at the time of the initial child protectionreport: children who would be between the ages of 18 and 31 in 2005.Thus, the results focus on older, poorer children than the populationof children who are investigated for abuse or neglect. In addition, sexualabuse cases (8 percent of the total) are excluded, since these cases donot enter into the rotational assignment of investigators.
The analysis sample includes over 23,000 children. To better under-stand the types of allegations, reporters, and child characteristics in thechild protection system, table 1 reports summary statistics: 16 percentof the children investigated were eventually placed in foster care (ap-proximately 10 percent of investigated children are placed in foster carein the United States as a whole, and the higher placement rate herelargely reflects the restriction of the sample to children who receivedPublic Assistance at some point prior to the abuse report); 71 percentof the investigated children are white, compared to 87 percent of thepopulation aged 5–14 in 2000 in Illinois outside of Cook County (figurefrom the U.S. Census of Population); the reporters of the abuse orneglect are typically school officials, police, and family members; and
Source: Doyle (2008, JPE), “Child Protection and Adult Crime: Using Investigator Assignment to Estimate Causal Effects of Foster Care”
Application: Foster Care and Adult Crime
Children suspected of being abused are reported to the Illinois Department of Children and Family Services (DCFS) by physicians, educators, police, and family members.
Once reported, the allegation is assigned to a “case manager”.
Doyle argues that assignment is essentially randomized (except for special cases, including alleged sexual abuse).
Case managers investigate and decide whether the charges are unsubstantiated or whether to bring the case to a judge.
Case managers differ in the fraction of their cases that result in foster care placement.
Application: Foster Care and Adult Crime
Doyle uses the placement rate of the assigned case manager as an instrument:
He argues that case managers are essentially randomly assigned, which supports instrument exogeneity.
Case managers differ in the fraction of their cases that result in foster care placement, which supports instrument relevance.
In particular, he constructs a JIVE-type instrument:
For each case, compute the assigned case manager’s placement differential, defined in a “leave one out” manner so that it does not include the particular case.
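The leave-one-out construction can be sketched as follows. This is a minimal illustration, not Doyle's code: the function name, the argument names, and the choice to measure each manager's rate relative to a comparison pool (`pool_ids`, for example all cases handled by the same team in the same year) are assumptions made for the example.

```python
from collections import defaultdict


def leave_one_out_differential(manager_ids, pool_ids, placed):
    """Leave-one-out placement differential for each case.

    For case i, take the placement rate of its case manager over all of
    that manager's OTHER cases, and subtract the placement rate of the
    comparison pool over all OTHER cases in the pool. Case i itself is
    excluded from both rates. Returns None for a case whose manager or
    pool has no other cases.
    """
    mgr_sum, mgr_n = defaultdict(float), defaultdict(int)
    pool_sum, pool_n = defaultdict(float), defaultdict(int)
    for m, g, r in zip(manager_ids, pool_ids, placed):
        mgr_sum[m] += r
        mgr_n[m] += 1
        pool_sum[g] += r
        pool_n[g] += 1

    differentials = []
    for m, g, r in zip(manager_ids, pool_ids, placed):
        if mgr_n[m] < 2 or pool_n[g] < 2:
            differentials.append(None)  # no other cases to average over
            continue
        manager_rate = (mgr_sum[m] - r) / (mgr_n[m] - 1)
        pool_rate = (pool_sum[g] - r) / (pool_n[g] - 1)
        differentials.append(manager_rate - pool_rate)
    return differentials


# Hypothetical data: two managers in one pool, 1 = placed in foster care.
example = leave_one_out_differential(
    ["a", "a", "a", "b", "b", "b"], ["t"] * 6, [1, 1, 0, 0, 0, 1]
)
```

Excluding the case's own outcome keeps the instrument from being mechanically correlated with that case's placement, which is the motivation for the JIVE-type construction.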
Figure: Placement Differential
Fig. 1: Foster care placement (actual and predicted) and arrest indicator vs. case manager placement differential. Local linear estimates, evaluated at each percentile of the case manager placement differential. Pilot bandwidth chosen by cross-validation is 0.034 for the actual and predicted placement rates. For the arrest rate the bandwidth is 0.056.
Source: Doyle (2008, JPE), “Child Protection and Adult Crime: Using Investigator Assignment to Estimate Causal Effects of Foster Care”
Table: Determinants of Case Worker Placement Differential

TABLE 3: Case Manager Assignment and Foster Care Placement
Dependent Variable: Foster Care Placement

                                      Coefficient  Standard Error  Coefficient  Standard Error
                                      (1)          (2)             (3)          (4)
Case manager placement differential   .229         .036**          .233         .035**
Race:
  White                                                            −.002        .029
  African American                                                 .093         .029**
  Hispanic                                                         −.030        .031
Initial reporter:
  Physician                                                        .043         .018*
  School                                                           .025         .015
  Police                                                           .073         .016**
  Family                                                           .016         .015
  Neighbor                                                         −.013        .016
  Other government                                                 .084         .016**
  Anonymous                                                        .002         .016
Age at report:
  Age 6                                                            −.027        .018
  Age 7                                                            .001         .016
  Age 8                                                            .008         .017
  Age 9                                                            .014         .017
  Age 10                                                           .016         .017
  Age 11                                                           .016         .017
  Age 12                                                           .020         .017
  Age 13                                                           .020         .018
  Age 14                                                           .016         .017
  Age 15                                                           −.007        .018
  Age 16                                                           −.017        .018
Sex: boy                                                           −.016        .005**
Allegation:
  Physical abuse                                                   −.172        .015**
  Substantial risk                                                 −.180        .015**
  Other abuse                                                      −.162        .019**
  Lack of supervision                                              −.152        .015**
  Environmental neglect                                            −.188        .016**
Mean of dependent variable            .16
Observations                          23,254

Note: Models are estimated by OLS. Data are for school-aged children outside of Cook County. Standard errors are clustered at the case manager level. All models include year indicators.
* Significant at 5 percent. ** Significant at 1 percent.

Crime Outcomes
To compare crime outcomes, Y, empirical models for child i investigated by case manager c in subteam j during year t are of the form
\[ Y_{icj} = \alpha_0 + \alpha_1 R_{icj} + \alpha_2 X_i + \sum_k \delta_k 1(t_i = k) + \varepsilon_{icj}. \tag{7} \]
This model is estimated separately for each outcome by OLS and two-
Source: Doyle (2008, JPE), “Child Protection and Adult Crime: Using Investigator Assignment to Estimate Causal Effects of Foster Care”
Results: OLS, TSLS

TABLE 4: Foster Care Placement and Crime Outcomes: 2000–2005

Model                         OLS       OLS       2SLS      2SLS      LIML      LIML
                              (1)       (2)       (3)       (4)       (5)       (6)

A. Dependent Variable: Arrested
Foster care placement         .075      .060      .388      .391      .226      .217
                              (.008)**  (.008)**  (.189)*   (.182)*   (.113)*   (.111)*
Mean of dependent variable    .260
Full controls                 No        Yes       No        Yes       No        Yes
Observations                  23,254    23,254    23,254    23,254    22,691    22,632

B. Dependent Variable: Sentence of Guilty/Withheld
Foster care placement         .045      .039      .403      .405      .236      .241
                              (.007)**  (.007)**  (.160)*   (.154)**  (.092)**  (.092)**
Mean of dependent variable    .151
Full controls                 No        Yes       No        Yes       No        Yes
Observations                  23,254    23,254    23,254    23,254    22,691    22,632

C. Dependent Variable: Sentenced to Prison
Foster care placement         .035      .031      .219      .225      .176      .176
                              (.005)**  (.005)**  (.104)*   (.102)*   (.070)*   (.070)**
Mean of dependent variable    .066
Full controls                 No        Yes       No        Yes       No        Yes
Observations                  23,254    23,254    23,254    23,254    22,691    22,632

Note: Data are for children investigated for abuse or neglect outside of Cook County. Standard errors are clustered at the case manager level. All models include year indicators. Columns 1 and 2 report estimates from models estimated by OLS. Columns 3 and 4 report estimates from models estimated by 2SLS, with the case manager removal differential as the excluded instrument. Columns 5 and 6 report estimates from models estimated by LIML, with individual investigator indicators as the excluded instruments and subteam by year fixed effects. The LIML models use a sample limited to investigators with at least five investigations in the analysis sample.
* Significant at 5 percent. ** Significant at 1 percent.
stage least squares (2SLS), with the case manager placement differential, $Z_{icj}$, used as an instrument for the indicator for removal, $R_{icj}$.
Similarly, the following model is estimated by LIML:
\[ Y_{icj} = \alpha_0 + \alpha_1 R_{icj} + \alpha_2 X_i + \sum_k \theta_k 1(T_i = k) + \varepsilon_{icj}, \tag{8} \]
where $T_i$ is a subteam indicator to estimate the model within the pool of investigators who could have been assigned to a child, and case manager indicators are used as excluded instruments. The LIML estimates are restricted to the sample of investigators with at least five investigations in the analysis sample.
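With a single instrument and no covariates, the 2SLS estimator reduces to a ratio of covariances, which makes the contrast with OLS easy to see. The sketch below uses made-up data and hypothetical function names; it omits the controls, year indicators, and clustered standard errors used in the paper.

```python
def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def ols_slope(y, r):
    """Slope from regressing outcome y on treatment r (with an intercept)."""
    return covariance(r, y) / covariance(r, r)


def iv_slope(y, r, z):
    """Just-identified IV slope: Cov(z, y) / Cov(z, r)."""
    return covariance(z, y) / covariance(z, r)


# Toy data, purely illustrative: z is a binary instrument,
# r the removal indicator, y the outcome.
z = [0, 0, 0, 0, 1, 1, 1, 1]
r = [0, 0, 0, 1, 0, 1, 1, 1]
y = [1, 1, 1, 3, 1, 3, 3, 4]

ols_estimate = ols_slope(y, r)
iv_estimate = iv_slope(y, r, z)
```

In this toy sample the OLS slope is 2.25 and the IV slope is 2.5. With a binary instrument the IV slope equals the difference in mean outcomes across instrument values (1.25) divided by the difference in treatment rates (0.5).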
Table 4 reports the crime outcome results. As discussed above, the mean arrest rate in the sample is 26 percent. In terms of the OLS results, children who were placed in foster care have higher crime outcomes,
Source: Doyle (2008, JPE), “Child Protection and Adult Crime: Using Investigator Assignment to Estimate Causal Effects of Foster Care”
Results: Subgroups
TABLE 5: Results across Child Characteristics

Subgroup columns: Sex: (1) Boy, (2) Girl. Allegation/Reporter: (3) Abuse, (4) Neglect. Race: (5) White, (6) African American. Location (Matched by Name and Date of Birth): (7) Non-Cook, (8) Cook.

                                      (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)

A. Dependent Variable: Placed in Foster Care
Case manager placement differential   .211      .302      .276      .24       .213      .367      .256      .295
                                      (.052)**  (.047)**  (.047)**  (.058)**  (.040)**  (.081)**  (.040)**  (.058)
Relative to overall first stage       .92       1.32      1.21      1.05      .93       1.60      1.12      1.29

B. Dependent Variable: Arrested, 2000–2005
Foster care placement:
  OLS                                 .062      .060      .051      .070      .059      .057      .079      .061
                                      (.012)**  (.011)**  (.012)**  (.013)**  (.011)**  (.010)**  (.009)    (.008)
  2SLS                                .221      .509      .385      .389      .541      .591      .235      .201
                                      (.298)    (.187)**  (.203)    (.297)    (.263)*   (.249)*   (.210)    (.111)
Mean of dependent variable            .305      .215      .261      .259      .249      .248      .344      .403
Observations                          11,673    11,581    13,149    10,105    16,402    17,459    23,899    22,357

Note: Panel A reports the first-stage coefficients for each subgroup and the relative first stage to the overall first-stage coefficient reported in table 3, with the exception of cols. 7 and 8, which report the first-stage coefficient in each set of counties relative to the first stage from a pooled sample of all counties. The OLS and 2SLS cells report coefficients on foster care placement, with the second set instrumented by the case manager placement differential. Standard errors clustered at the case manager level are reported. In cols. 7 and 8, the match using name and date of birth allows the use of individuals with missing social security numbers, resulting in larger sample sizes.
* Significant at 5 percent. ** Significant at 1 percent.
Source: Doyle (2008, JPE), “Child Protection and Adult Crime: Using Investigator Assignment to Estimate Causal Effects of Foster Care”
Figure: Adult Crime on Fitted Probability of Foster Care (Doyle, 2008)
Fig. 2: A, Arrested vs. P(Placement|Z). B, Arrested marginal treatment effects. Local quadratic and associated derivative estimates, evaluated at each percentile of P(Placement|Z): the predicted placement from a probit model that includes only the case manager placement differential. Dashed lines report 5–95 percent bootstrapped confidence intervals. Pilot bandwidth chosen by cross-validation is 0.031.
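Panel B plots the derivative of the arrest rate with respect to the predicted placement probability, which is the local IV estimand for the MTE. The following is a minimal sketch of a local quadratic fit and its derivative at a point. It assumes a rectangular kernel (the caption does not state the kernel), and all function and variable names are hypothetical.

```python
def solve_3x3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    solution = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][j] * solution[j] for j in range(i + 1, 3))
        solution[i] = (m[i][3] - tail) / m[i][i]
    return solution


def local_quadratic(p_obs, y_obs, p0, bandwidth):
    """Fit y = b0 + b1 (p - p0) + b2 (p - p0)^2 on points within bandwidth of p0.

    Returns (b0, b1): the estimated level of E[Y | P = p0] and its
    derivative with respect to p at p0 (the MTE estimate at p0).
    """
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    used = 0
    for p, y in zip(p_obs, y_obs):
        d = p - p0
        if abs(d) > bandwidth:  # rectangular kernel: equal weight inside the window
            continue
        used += 1
        basis = (1.0, d, d * d)
        for i in range(3):
            xty[i] += basis[i] * y
            for j in range(3):
                xtx[i][j] += basis[i] * basis[j]
    if used < 3:
        raise ValueError("need at least three observations inside the bandwidth")
    b0, b1, _ = solve_3x3(xtx, xty)
    return b0, b1


# Hypothetical noise-free data where E[Y | P = p] = 0.1 + 0.2 p + 0.3 p^2.
p_grid = [i / 100 for i in range(101)]
y_vals = [0.1 + 0.2 * p + 0.3 * p * p for p in p_grid]
level, slope = local_quadratic(p_grid, y_vals, 0.5, 0.1)
```

Because the toy conditional mean is exactly quadratic, the fit recovers the level 0.275 and the derivative 0.5 at p = 0.5 up to floating-point error. With real data the bandwidth would be chosen by a procedure such as the cross-validation mentioned in the caption.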
VI. Conclusion
Foster care placement is a far-reaching intervention in the lives of children who are at high risk of arrests and incarceration as adults. The analysis here uses the effective randomization of families to child protection investigators to estimate causal effects of foster care placement on crime outcomes. The results suggest that children placed in care have two to three times higher arrest, conviction, and imprisonment rates than children who remained at home. The point estimates are large and relatively imprecisely estimated, however, which suggests some
Source: Doyle (2008, JPE), “Child Protection and Adult Crime: Using Investigator Assignment to Estimate Causal Effects of Foster Care”
Figure: MTE, Foster Care on Adult Crime (Doyle, 2008)
Fig. 2: A, Arrested vs. P(Placement|Z). B, Arrested marginal treatment effects. Local quadratic and associated derivative estimates, evaluated at each percentile of P(Placement|Z): the predicted placement from a probit model that includes only the case manager placement differential. Dashed lines report 5–95 percent bootstrapped confidence intervals. Pilot bandwidth chosen by cross-validation is 0.031.
Source: Doyle (2008, JPE), “Child Protection and Adult Crime: Using Investigator Assignment to Estimate Causal Effects of Foster Care”
Application: Quantity-Quality
Alternative Parametric Approach:
Brinch, Mogstad, and Wiswall (2017, JPE), “Beyond LATE with a Discrete Instrument: Heterogeneity in the Quantity-Quality Interaction of Children”
Examine the effect of having two or more siblings rather than one sibling on the years of schooling of the first born.
Use administrative data from Statistics Norway.
Entire resident population of Norway, children aged at least 25 in 2000.
Sample of 514,049 children.
Use two instruments: the first two children are the same sex, and the second birth resulted in twins.
Table: Sample Statistics
Brinch, Mogstad, and Wiswall (JPE, 2017)

Table 1: Descriptive Statistics

                                 Mean      Std. Dev.
Outcome:
  Years of schooling             12.3      2.7
Instruments:
  Same sex, 1st and 2nd child    0.501     0.5
  Twins at second birth          0.0096    0.097
Endogenous regressor:
  At least three children        0.5021    0.5
Covariates:
  Female                         0.47      0.50
  Age in 2000                    39.5      9.2
  Mother's age at first birth    24.0      4.2
  Father's age at first birth    26.8      4.5
  Mother's years of schooling    10.0      1.4
  Father's years of schooling    10.1      2.6

Note: Descriptive statistics are for 514,049 children. All children are first born with at least one sibling. Twins at first birth are excluded from the sample. All children, parents and siblings are aged between 16 and 74 years at some point between 1986 and 2000.
Source: “Beyond LATE with a Discrete Instrument: Heterogeneity in the Quantity-Quality Interaction of Children”, by Brinch, Mogstad, and Wiswall (JPE, 2017).
Table: Brinch, Mogstad, and Wiswall (JPE, 2017)
Table 2: Fertility decision model - Average Derivatives

                                 Average effect   (std. err.)
Covariates:
  Age in 2000                    0.0163           (0.0011)
  Mother's age at first birth    -0.0161          (0.0013)
  Father's age at first birth    0.0007           (0.0008)
  Mother's years of schooling    0.0030           (0.0016)
  Father's years of schooling    -0.0038          (0.0019)
  Female                         -0.0016          (0.0018)
Instruments:
  Same sex, first and second     0.0567           (0.0012)
  Twins at 2nd parity            0.5179           (0.0007)

Note: This table reports the average partial effect (average treatment effect for binary variables) from a logit model for the probability of being in a family with 2 or more siblings rather than 1 sibling. The model is specified in the following way: We use a third order polynomial in “Age in 2000”, “Mother's age at first birth”, “Father's age at first birth”, “Mother's years of schooling” and “Father's years of schooling”; We include interactions between the first order terms of all covariates; “Same sex, first and second” enters the model without interaction terms; “Twins at 2nd parity” is interacted with all covariates (including higher order terms and interactions) to ensure that the model is consistent with the fact that there are no never takers with twins. Standard errors in parentheses are computed by nonparametric bootstrap with 100 bootstrap replications.

Table 3: OLS and IV estimates

                         Z− as instrument   P(Z) as instrument
IV:
  Same-sex instrument    0.174              -0.208
                         (0.115)            (0.104)
  Twins instrument       0.050              -0.060
                         (0.063)            (0.063)
  Both instruments       0.076              -0.015
                         (0.055)            (0.054)
OLS                      -0.052
                         (0.007)

Note: This table reports OLS and IV estimates of the effect of family size on the educational attainment of first born children. The first column (Z− as instrument) uses the first stage equation (16). The second column (P(Z) as instrument) uses the first stage equation (17). We construct P(Z) using the parameter estimates from the logit model with average derivatives reported in Table 2. The second stage is given by equation (15). We use the same specification for the covariates as reported in Table 2. The first row uses the “Same sex, first and second” instrument, the second row uses the “Twins at 2nd parity” instrument, and the third row uses both instruments. The OLS estimate is reported in the fourth row. Standard errors in parentheses are heteroskedasticity-robust.
Source: “Beyond LATE with a Discrete Instrument: Heterogeneity in the Quantity-Quality Interaction of Children”, by Brinch, Mogstad, and Wiswall (JPE, 2017).
Figure: MTE for Effect of Fertility on Schooling of First Born
Brinch, Mogstad, and Wiswall (JPE, 2017)
MTE for Effect of Additional Child on Yrs of Schooling of First Born
Figure 5: MTE estimates with same-sex instrument
[Figure omitted. x-axis: p, from 0.2 to 0.7; y-axis: E(Y1 − Y0 | p, X), from −2 to 3.]
Note: This figure displays the MTE estimates from the semiparametric generalized Roy model based on Assumptions 1 and 2, with “Same sex, first and second” as instrument. We construct P(Z) using the parameter estimates from the logit model with average derivatives reported in Table 2. We use the same specification for the covariates as reported in Table 2. The MTE estimates are based on double residual regression separately for the treated and non-treated, using local quadratic regression with rectangular kernel and bandwidth of 0.055. The 95 percent confidence interval is computed from a non-parametric bootstrap with 100 bootstrap replications. The y-axis measures the value of the MTE in years of schooling, whereas the x-axis represents the unobserved component of parents' net gain from having 3 or more children rather than 2 children. A high value of p means that a family is less likely to have 3 or more children.
Source: “Beyond LATE with a Discrete Instrument: Heterogeneity in the Quantity-Quality Interaction of Children”, by Brinch, Mogstad, and Wiswall (JPE, 2017).
Table: Brinch, Mogstad, and Wiswall (JPE, 2017)
Table 5: Tests of constant MTE: Comparing LATEs at different propensity score ranges

LATE over interval       (0.20,0.25)  (0.30,0.35)  (0.40,0.45)  (0.50,0.55)  (0.60,0.65)
- LATE over interval     (0.30,0.35)  (0.40,0.45)  (0.50,0.55)  (0.60,0.65)  (0.70,0.75)
point est.               1.109        1.285        0.053        -0.752       -1.239
std. err.                0.441        0.371        0.294        0.285        0.390
p-value                  0.012        0.001        0.857        0.008        0.002
joint p-value            0.000

Note: This table reports tests of constant MTE of family size on the educational attainment of first born children. The MTE estimates are from the semiparametric generalized Roy model based on Assumptions 1 and 2, with “Same sex, first and second” as instrument (see Figure 5). We construct P(Z) using the parameter estimates from the logit model with average derivatives reported in Table 2. We use the same specification for the covariates as reported in Table 2. The MTE estimates are based on double residual regression separately for the treated and non-treated, using local quadratic regression with rectangular kernel and bandwidth of 0.055. The LATEs are derived from the MTE estimates by integrating over the indicated intervals. Standard errors are based on nonparametric bootstrap (of both estimation stages) with 100 bootstrap replications.
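The two steps behind each column of Table 5, averaging the MTE over an interval to get a LATE and then testing whether two such LATEs differ, can be sketched as follows. The MTE function here is hypothetical, and the normal approximation for the p-value is an assumption; the paper's standard errors come from a bootstrap and its p-values may be computed differently.

```python
import math


def late_over_interval(mte, lower, upper, steps=1000):
    """Average of mte(u) over [lower, upper], using the trapezoid rule."""
    width = (upper - lower) / steps
    total = 0.5 * (mte(lower) + mte(upper))
    for i in range(1, steps):
        total += mte(lower + i * width)
    return total * width / (upper - lower)


def two_sided_p_value(estimate, std_err):
    """Two-sided p-value for H0: estimate = 0 under a normal approximation."""
    z = abs(estimate / std_err)
    return math.erfc(z / math.sqrt(2.0))


def example_mte(u):
    # Hypothetical, linearly declining MTE used only for illustration.
    return 2.0 - 4.0 * u


# Difference between LATEs on two adjacent-but-one intervals, as in Table 5.
difference = (late_over_interval(example_mte, 0.20, 0.25)
              - late_over_interval(example_mte, 0.30, 0.35))

# Normal-approximation p-value for the first column of Table 5.
p_first_column = two_sided_p_value(1.109, 0.441)
```

Under a constant MTE every such difference would be zero. Applied to the first column of Table 5 (point estimate 1.109, standard error 0.441), the normal approximation gives a p-value of about 0.012, in line with the reported value.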
Table 6: Treatment effect parameters using same-sex instrument

model               ATE       ATT       ATUT
lower bound         0.232     0.313     0.194
                    (0.060)   (0.086)   (0.061)
rescaled support    0.423     0.756     0.553
                    (0.110)   (0.171)   (0.150)

Note: This table reports ATE, ATET, and ATUT of family size on the educational attainment of first born children. Lower bound: We use estimates of MTE in the region (0.20,0.75). In the regions (0,0.20) and (0.75,1) the MTE is set equal to 0. Rescaled support: We use estimates of MTE in the region (0.20,0.75), and rescale the weights to integrate to one over this region. In both cases, the MTE estimates are from the semiparametric generalized Roy model based on Assumptions 1 and 2, with “Same sex, first and second” as instrument (see Figure 5). We construct P(Z) using the parameter estimates from the logit model with average derivatives reported in Table 2. We use the same specification for the covariates as reported in Table 2. The MTE estimates are based on double residual regression separately for the treated and non-treated, using local quadratic regression with rectangular kernel and bandwidth of 0.055. Standard errors are based on nonparametric bootstrap (of both estimation stages), with 100 bootstrap replications.
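The difference between the two rows of Table 6 is easiest to see for the ATE, whose MTE weight is uniform on (0,1). A minimal sketch, with a hypothetical MTE function and hypothetical function names:

```python
def integrate(func, lower, upper, steps=1000):
    """Trapezoid-rule integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width


def ate_lower_bound(mte, lower=0.20, upper=0.75):
    """ATE with unit weight on [lower, upper] and the MTE set to 0 elsewhere."""
    return integrate(mte, lower, upper)


def ate_rescaled_support(mte, lower=0.20, upper=0.75):
    """ATE with the uniform weight rescaled to integrate to one on [lower, upper]."""
    return integrate(mte, lower, upper) / (upper - lower)


def example_mte(u):
    # Hypothetical MTE used only for illustration.
    return 2.0 - 4.0 * u


lower_bound = ate_lower_bound(example_mte)
rescaled = ate_rescaled_support(example_mte)

# The same relation applied to the reported ATE in Table 6:
implied_rescaled_ate = 0.232 / 0.55
```

For the ATE the rescaled value is the "lower bound" value divided by the length of the support, 0.75 − 0.20 = 0.55. The reported figures are consistent with this: 0.232 / 0.55 ≈ 0.422, against 0.423 in the table, with the gap attributable to rounding. The ATT and ATUT weights are not uniform, so no such simple ratio applies to those columns.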
Source: “Beyond LATE with a Discrete Instrument: Heterogeneity in the Quantity-Quality Interaction of Children”, by Brinch, Mogstad, and Wiswall (JPE, 2017).
Figure: Weights on MTE for Empirical Example (Brinch et al)
Weights on MTE for Alternative Parameters
Figure 3: Weight of MTE for treatment effects parameters and instruments
[Figure omitted. Three panels, each with x-axis p (0 to 1) and y-axis weight (−0.5 to 4): (a) ATT, ATUT, and ATE; (b) IV with Z− as instrument, for both instruments, same-sex only, and twins only; (c) IV with P(Z) as instrument, for both instruments, same-sex only, and twins only.]
Note: The upper panel graphs MTE weights associated with the average treatment effect on the treated (ATT), the average treatment effect (ATE), and the average treatment effect on the untreated (ATUT). The middle panel (Z− as instrument) and lower panel (P(Z) as instrument) graph MTE weights associated with the IV estimates presented in Table 3. To compute the weights, we use the weight formulas described in the Appendix. The y-axis measures the density of the distribution of weights, whereas the x-axis represents the unobserved component of parents' net gain from having 3 or more children rather than 2 children. A high value of p means that a family is less likely to have 3 or more children.
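The ATE, ATT, and ATUT weights in panel (a) depend only on the distribution of the propensity score. The sketch below assumes the standard Heckman and Vytlacil weight formulas without covariates: 1 for the ATE, Pr(P(Z) > u) / E[P(Z)] for the ATT, and Pr(P(Z) ≤ u) / E[1 − P(Z)] for the ATUT. The paper's own formulas are in its Appendix, which is not reproduced here, and the function name and sample scores are hypothetical.

```python
def mte_weights(pscores, grid):
    """MTE weights for ATE, ATT, and ATUT at each point u of the grid.

    pscores: sample of estimated propensity scores P(Z).
    grid: values of u in (0, 1) at which to evaluate the weights.
    """
    n = len(pscores)
    mean_p = sum(pscores) / n
    rows = []
    for u in grid:
        share_above = sum(1 for p in pscores if p > u) / n  # Pr(P(Z) > u)
        rows.append({
            "u": u,
            "ate": 1.0,
            "att": share_above / mean_p,
            "atut": (1.0 - share_above) / (1.0 - mean_p),
        })
    return rows


# Hypothetical propensity scores, for illustration only.
weights = mte_weights([0.2, 0.4, 0.6, 0.8], [0.3, 0.5, 0.7])
```

The ATT puts more weight on low values of u (individuals with low unobserved resistance to treatment), and the ATUT puts more weight on high values, which is the pattern the upper panel of the figure displays.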
Source: “Beyond LATE with a Discrete Instrument: Heterogeneity in the Quantity-Quality Interaction of Children”, by Brinch, Mogstad, and Wiswall (JPE, 2017).
Figure: Weights on MTE for Empirical Example (Brinch et al)
Weights on MTE for Alternative Instruments
Figure 3: Weight of MTE for treatment effects parameters and instruments
[Figure omitted. Three panels, each with x-axis p (0 to 1) and y-axis weight (−0.5 to 4): (a) ATT, ATUT, and ATE; (b) IV with Z− as instrument; (c) IV with P(Z) as instrument.]
Source: “Beyond LATE with a Discrete Instrument: Heterogeneity in the Quantity-Quality Interaction of Children”, by Brinch, Mogstad, and Wiswall (JPE, 2017).
Figure: Weights on MTE for Empirical Example (Brinch et al)
Weights on MTE for Alternative Instruments, P(Z)
Figure 3: Weight of MTE for treatment effects parameters and instruments
[Figure omitted. Three panels, each with x-axis p (0 to 1) and y-axis weight (−0.5 to 4): (a) ATT, ATUT, and ATE; (b) IV with Z− as instrument; (c) IV with P(Z) as instrument.]
Source: “Beyond LATE with a Discrete Instrument: Heterogeneity in the Quantity-Quality Interaction of Children”, by Brinch, Mogstad, and Wiswall (JPE, 2017).