IV Quantile Regression for Group-level Treatments, with an ...web.stanford.edu/~bjlarsen/Grouped_IV_Quantile_Regression_slides.… · • Effect of law, varying at state-by-year level,

IV Quantile Regression for Group-levelTreatments, with an Application to the

Distributional Effects of Trade

Denis Chetverikov Brad Larsen Christopher Palmer

UCLA Stanford & NBER UC Berkeley

May 2015

1 / 37

MotivationMethod to study effect of group-level treatment on distribution of outcomes in group

In many applied micro settings, researcher has data onmicro-level outcomes within a group and wishes to study effectof group-level treatment

Examples:• Effect of law, varying at state-by-year level, on individual

wages within a state-by-year cell• Effect of school-level policy on student outcomes within a

school• Effect of market-level regulation on outcomes of firms

within a market

OLS of outcome variable on group-level treatment measureseffect of treatment on average outcome in group

We want computationally simple approach to estimate effect ondistribution of outcomes

2 / 37

The most basic model we considerA group-level treatment and micro-level data on outcomes within group

• For a fixed quantile u ∈ (0, 1),

Qyig|xg(u) = x′gβ(u) + εg(u)

• yig : outcome for individual i in group g• xg : treatment for group g (contains constant too)• εg(u) : group-level unobservables

• For now, assume xg ⊥ εg(u)• If εg(u) = 0, basic quantile estimation works (Koenker and

Bassett 1978):

β̂(u) = arg minβ

G

∑g=1

Ng

∑i=1

ρu(yig − x′gβ)

where ρu(x) = (u− 1{x < 0})x3 / 37

Downsides to standard quantile regression

• Standard quantile regression inconsistent if εg(u) 6= 0,even when xg ⊥ εg(u)

• εg(u) akin to left-hand side measurement error or omittedvariables

• LHS measurement error biases quantile regression(Hausman 2000; Hausman, Luo, and Palmer 2014)

• When dimension of xg large, standard quantile regressionextremely slow (ex: group is state-by-year cell and modelincludes state and year effects)

• Standard errors in quantile regression computationallyburdensome (no simple analytic approaches to handlingheteroskedasticity, clustering, etc.)

4 / 37

Our estimator: Grouped quantile regression

• Our estimator in this simple case:1 Compute u quantile within each group (e.g. median wage

in each state-by-year cell)2 OLS regression of group-level quantile on xg (a regression

at the group-level)

• In Stata, for u = 0.1 (10th percentile), as simple as

· collapse xvar (p10) yvar_p10 = yvar, by(group_id)· reg yvar_p10 xvar

• Benefits:• εg(u) 6= 0 not a problem, handled in second step (OLS)• Much faster to compute; large-dimensional xg handled in

second step (OLS)• Under large G, N asymptotics, can use traditional

heteroskedasticity and clustering appproaches for standarderrors

5 / 37

More general cases of our model/estimatorxg is endogenous


Qyig|xg(u) = x′gβ(u) + εg(u)

where xg, εg are not independent

• Estimator:1 Compute u quantile within each group2 2SLS regression of group-level quantile on xg,

instrumenting with wg, group-level instrument

• In Stata, for u = 0.1 (10th percentile),

· collapse xvar (p10) yvar_p10 = yvar, by(group_id)· ivregress 2sls yvar_p10 xvar1 (xvar2 = wvar)

6 / 37

Our IV quantile estimator applies to different settingsthat other IV quantile estimators

• IV quantile approaches (Abadie, Angrist, and Imbens 2002;Chernozhukov and Hansen 2005) differ:

• Model has RHS variable of interest (such as xg) correlatedwith unobserved quantile (u, considered a randomvariable)

• No unobserved, additively separable variables

• Our model:• u ∈ (0, 1) is a fixed quantile of interest (or a vector of

indices of interest, U , potentially the entire intervalU = (0, 1))

• Unobserved, additively separable variables do exist (εg(u))• RHS variable of interest xg is correlated with εg(u)

7 / 37

More general cases of our model/estimatorRight-hand side contains micro-level covariates


Qyig|xg(u) = z′igγ(u) + x

′gβ(u) + εg(u)

where xg, εg correlated and zig micro-level covariates

• Estimator:1 In each group, run quantile regression and save coefficient

on the constant2 2SLS regression of coefficients on xg, instrumenting with wg

8 / 37

Comparison of our model to other quantile panelmodelsWith micro-level covariates, our model looks similar to other panel quantile methods,but we can estimate group-level effects

• Model of Kato, Galvao, and Montes-Rojas (2012), Kato andGalvao (2011), Koenker (2004):

Qyig|zig,αg(u) = z′igγ(u) + αg(u)

• Provide estimator for γ(u)• Can’t estimate our β(u) because xg would be absorbed by

group-level fixed effects• These papers do not consider endogeneity

9 / 37

Our estimator is quantile extension of Hausman andTaylor (1981)

• Hausman and Taylor (1981) linear panel model

yig = z′igγ + x′gβ + εg + νig

• If xg correlated with εg, need group-level fixed effects αg toidentify γ (“within regression”)

• To estimate β, can regress fixed effect estimates, αg, on xg(“between regression”)

• Similar to steps of our estimator

• Hausman and Taylor (1981) point out ‘internalinstruments” (such as z̄g). Also works here:

• within variation of zig is used to estimate γ• between variation of z̄g is used to instrument for xg

10 / 37

More general cases of our model/estimatorInteraction effects of group-level treatment with micro-level covariates

• Now consider model

Qyig|xg(u) = γ0(u) + zig(x′gβ(u) + εg(u))

where xg, εg correlated and zig micro-level covariate (scalar)

• For example, researcher might be interested in how astate-by-year-varying policy differentially affected wagesfor individuals of differing education levels (whereindividual educ level is contained in zig)

• Estimator:1 In each group, run quantile regression of yig on zig and save

coefficient on zig2 2SLS regression of coefficients on xg, instrumenting with wg

11 / 37

Theoretical results

• Consistent and asymptotically normal under growthcondition: as G→ ∞,

G2/3(log NG)/NG → 0

• Number of groups (G) and number of individuals pergroup (NG) both grow large

• Mild growth condition compared to other nonlinear paneldata models, which typically require at least

G/N → c > 0

• Under growth condition, first-stage error negligible⇒ Traditional heteroskedasticity-robust or clusteredstandard errors can be used in second stage

12 / 37

Additional theoretical results

• Derive joint asymptotic behavior of our estimator over allindices u ∈ U and provide an estimator of asymptoticcovariance function

• Derive confidence bands for β(u) that hold uniformly overU (i.e. for inference over multiple quantilessimultaneously)

• Derive approach for uniform inference over set {αg,1(u)}

13 / 37

Monte Carlo simulation

• Let

yig = zigγ(uig) + xgβ(uig) + ε(uig, ηg)

• The variable xg is correlated with ηg, where

xg = πwg + ηg + νg

• wg, νg, zig ∼ exp(0.25∗N[0, 1]); uig, ηg ∼ U[0, 1], ε(u, η) = uη• Generate data with G ∈ {25, 200}, N ∈ {25, 200}

• Estimate β(·) using traditional quantile regression andusing grouped IV quantile regression

• Also examine case where xg is exogenous (ηg doesn’t enterfirst stage) and case with ε = 0 (no group-levelunobservables)

14 / 37

Bias of grouped IV quantile estimator relative tostandard quantile regression

(N,G) Q regGrouped IV

Q. Reg. Q reg

Grouped IV

Q. Reg. Q reg

Grouped IV

Q. Reg.

(25,25) 0.197 0.108 0.010 0.017 0.004 0.023

(200,25) 0.195 0.037 0.010 0.007 0.002 0.004

(25,200) 0.193 0.008 0.009 0.014 0.001 0.004

(200,200) 0.195 0.003 0.010 0.003 0.000 0.004

Endogenous x Exogenous x

No group-level

unobservables

15 / 37

Several recent papers apply our estimatorExample applications

1 Angrist and Lang (2004), studies Boston’s Metco program,looks at impact on lower tail of student outcomes by school

2 Palmer (2012) studies effects of suburbanization at the citylevel on within-city distribution of outcomes

3 Larsen (2014) studies effect of occupational licensing ondistribution of teacher quality

4 Backus (2015) studies question of whether competitionincreases productivity through weeding outless-productive firms (affecting mainly lower tail ofproductivity) or increasing productivity of all firms

16 / 37

Our application: The effect of increased importcompetition on the distribution of local wages

Background:• Wage inequality increased drastically over past 40 years• Heated debates as to cause (globalization vs. skill-biased

technological change vs. declining real minimum wage)• Autor, Dorn, and Hansen (2013) (ADH) show local labor

markets with greater emphasis on manufacturing hadgreater decrease in average local wage

• ADH instrument for Chinese import competition in USwith Chinese import competition in other developedcountries

17 / 37

Applying grouped IV quantile regression in the ADHframework

• A “group” is local labor market (“commuting zone”)• ADH have micro-level data on individual wages for many

workers in each group• ADH compute average wage in group, regress change in

group-level average wages on Chinese import competitionvia 2SLS

• Our approach: compute group-level quantiles rather thanaverage, then follow ADH

⇒ We can quantify effect of Chinese import competition ondistribution of local wages

18 / 37

ADH regression of interest

• Regression of interest given by

∆ln wg = β1∆IPWUg + X′gβ2 + εg

where• ∆ln wg: change in average individual log weekly wage in a

given CZ in a given decade• ∆IPWUg : change in Chinese imports per US worker for the

CZ and decade corresponding to group g• Xg: characteristics of the CZ and decade, including decade

indicators

• ADH instrument for ∆IPWUg with ∆IPWOg , change inChinese imports to other similarly developed nations forthe same industry

• We replace ∆ln wg with change in quantile of log wages ingroup g

19 / 37

Effect of Chinese Import Competition on ConditionalWage Distribution: Full SampleUnits = change in log points due to $1,000 change in Chinese imports per US worker

20 / 37

Effect of Chinese Import Competition on ConditionalWage Distribution: Males OnlyUnits = change in log points due to $1,000 change in Chinese imports per US worker

21 / 37

Effect of Chinese Import Competition on ConditionalWage Distribution: Females OnlyUnits = change in log points due to $1,000 change in Chinese imports per US worker

22 / 37

ConclusionComputationally simple estimator for effects of group-level treatment on distributionof outcomes within group

• When researcher has outcome data on individuals within agroup, and the variable of interest varies at the group level,estimator is

1 In each group, run quantile regression and save coefficienton the constant

2 2SLS regression of coefficients on xg, instrumenting with wg• If no micro-level covariates, step (1) replaced by simply

computing quantile (e.g. median, 20th percentile, etc.)within group

• If no endogeneity, step (2) replaced by OLS• Standard errors simple: standard approaches for

OLS/2SLS• Much faster than standard quantile regression even when

both valid

23 / 37

Appendix:

Example from Larsen (2014)

General case of estimator

Additional notes on theoretical properties

24 / 37

Example: Larsen (2014) (teacher licensing)

• Proponents of occupational licensing argue it weeds outlow-quality candidates from profession

• Opponents argue it drives our high-quality candidates• Test by quantile regression of quality measure on licensing

stringency measure• Let Qst(u) be the uth quantile of teacher quality within state

s and year t (here an (s, t) combination = a group)

Qst(u) = γs(u) + λt(u) + Law′stδ(u) + εst(u)

whereγs is a state fixed effectλt is a fixed effect for year tLawst is indicator for whether candidates required to passlicensing test in state s in year t

25 / 37

If increasing licensing stringency leads to increasedquality...

• Could be due to weeding out low-quality candidates, orimproving whole distribution

0

Teacher quality

Den

sity

→

Teacher qualityD

ensi

ty

→

• Left tail effect is what proponents argue exists; hasn’t beentested

• Previous literature looks only at average—unable todistinguish difference

26 / 37

If increasing licensing stringency actually decreasesquality...

• Could be due to driving away high quality candidates, ordecreasing whole distribution

Teacher quality

Den

sity

←

Teacher quality

Den

sity

←

• Averages alone not sufficient to distinguish

27 / 37

Effects on distribution of input qualityDoes licensing raise lower tail of quality, drive out high-quality candidates?

21

Figure 4: Effects of certification test laws on input quality distribution

Notes: Effects of subject test law, basic skills test law, and professional knowledge test law on quantiles of teacher

input quality distribution. Panels on the left display first-year teacher sample and on the right display pooled

teacher sample. Robust, pointwise 95% confidence bands are displayed by dashed lines.

The pooled sample of teachers yields a significant effect of subject test laws on the

distribution of teacher qualifications, as shown in panel (b) of Figure 4. The effect of licensing

is positive, implying that the sample of teachers who remain in the occupation for multiple

years is of higher quality when subject test laws are in place than when they are not.

Interestingly, this result is relatively flat across the distribution, indicating that subject test laws

0.2 0.4 0.6 0.8-0.4

-0.2

0

0.2

Quantile of quality

First-year teachers sample

(a)

Subje

ct

test

Avg u

nderg

rad S

AT

(std

)

0.2 0.4 0.6 0.8-0.4

-0.2

0

0.2

Quantile of quality

(c)

Basic

skill

s t

est

Avg u

nderg

rad S

AT

(std

)

0.2 0.4 0.6 0.8-0.4

-0.2

0

0.2

Quantile of quality

(e)

Pro

f. k

now

ledge t

est

Avg u

nderg

rad S

AT

(std

)

0.2 0.4 0.6 0.8-0.4

-0.2

0

0.2

Quantile of quality

Pooled teachers sample

(b)

Avg u

nderg

rad S

AT

(std

)

0.2 0.4 0.6 0.8-0.4

-0.2

0

0.2

Quantile of quality

(d)

Avg u

nderg

rad S

AT

(std

)

0.2 0.4 0.6 0.8-0.4

-0.2

0

0.2

Quantile of quality

(f)

Avg u

nderg

rad S

AT

(std

)

28 / 37

Most general case of model• Most general case given by

Qyig|zig,xg,αg(u) = z′igαg(u)

αg,1(u) = x′gβ(u) + εg(u)

• Estimator:1 For each group g, run quantile regression of yig on zig using

α̂g(u) = arg mina∈Rdz

Ng

∑i=1

ρu(yig − z′iga),

Denote α̂g(u) = (α̂g,1(u), . . . , α̂g,dz)′

2 2SLS regression of α̂g,1(u) on xg using wg as instrument,that is,

β̂(u) =(X′PWX

)−1 (X′PWÂ(u))where X = (x1, ..., xG)′, W = (w1, ..., wG)′,Â(u) = (α̂1,1(u), . . . , α̂G,1(u))′, and PW = W(W′W)−1W′

29 / 37

Theoretical properties of the estimator: substantialconditions

1 Design (i) Observations are independent across groups.(ii) For all g, the pairs (zig, yig) are i.i.d. across i = 1, . . . , Nconditional on (xg, εg).

2 Instruments (i) E[wgεg(u)] = 0. (ii)G−1 ∑Gg=1 E[xgw′g]→ Qxw and G−1 ∑Gg=1 E[wgw′g]→ Qww.(iii) The matrices Qxw and Qww have singular valuesbounded from below and from above. (iv) yig isindependent of wg conditional on (zig, xg, αg). (v)E[‖wg‖4+δ] is finite.

3 Growth Condition G2/3(log N)/N → 0.

30 / 37

Theoretical properties of the estimator: otherregularity conditions

4 Covariates (i) Random vectors zig and xg are bounded. (ii)All eigenvalues of Eg[z1gz′1g] are bounded.

5 Coefficients ‖αg(u2)− αg(u1)‖ ≤ CL|u2 − u1|.

6 Noise (i) E[supu∈U |εg(u)|4+δ] is finite. (ii) For some(matrix-valued) function J : U × U → Rdw×dw ,G−1 ∑Gg=1 E[εg(u1)εg(u2)wgw′g]→ J(u1, u2) uniformly overu1, u2 ∈ U . (iii) |εg(u2)− εg(u1)| ≤ CL|u2 − u1|.

7 Density Some standard conditions on the density of yigappearing in the quantile regression literature.

8 Quantile indices The set of quantile indices U is a compactset included in (0, 1).

31 / 37

Theoretical properties of the estimatorTheorem (Main convergence result)Let Assumptions 1-8 hold. Then

√G(β̂(·)− β(·))⇒ G(·), in `∞(U )

where G(·) is a zero-mean Gaussian process with uniformlycontinuous sample paths and covariance functionC(u1, u2) = SJ(u1, u2)S′ where

S =(

QxwQ−1wwQ′xw

)−1QxwQ−1ww

J(u1, u2) = limG→∞

1G

G

∑g=1

E[εg(u1)εg(u2)wgw′g]

Qxw = limG→∞

1G

G

∑g=1

E[xgw′g], Qww = limG→∞1G

G

∑g=1

E[wgw′g].

32 / 37

Main growth conditionTheorem requires that

G2/3(log N)/N → 0The number of observations per group is allowed to be smallerthan the number of groups.

• This is interesting because nonlinear panel data modelstudies typically require at least

G/N → c > 0.This is achieved by employing asymptotic unbiasedness of thequantile regression estimator via the Bahadur representation:

α̂g(u)− αg(u) =1N

N

∑i=1

ψig(u)+OP(N−3/4), where E[ψig] = 0, and so

1√G

G

∑g=1

wg(α̂g(u)− α(u)) =1

N√

G

G

∑g=1

N

∑i=1

wgψig(u)+OP

( √G

N3/4

),

which is oP(1), yielding the growth condition.33 / 37

Estimation of covarianceLet

Ĉ(u1, u2) = ŜĴ(u1, u2)Ŝ′

Ŝ = (Q̂xwQ̂−1wwQ̂′xw)−1Q̂xwQ̂−1ww

Ĵ(u1, u2) =1G

G

∑g=1

((α̂g(u1)− x′g β̂(u1))(α̂g(u2)− x′g β̂(u2))wgw′g

)Q̂xw =

1G

G

∑g=1

xgw′g, and Q̂ww =1G

G

∑g=1

wgw′g.

We show that Ĉ(u1, u2) is consistent for C(u1, u2) uniformlyover u1, u2 ∈ U .Theorem (Estimating C(·, ·))Under the same conditions as those in Theorem 1,

Ĉ(u1, u2)− C(u1, u2) = op(1)

uniformly over u1, u2 ∈ U .34 / 37

Simultaneous confidence bandsThus, point-wise standard errors for our estimator can beconstructed using traditional heteroscedasticity robustapproaches for 2SLS estimator (extension to clustered standarderrors is also available)

We can also construct simultaneous confidence bands coveringthe whole function {βj(u), u ∈ U}. Indeed, take a statistic

T = supu∈U

√G|β̂j(u)− βj(u)|√Ĉjj(u, u)

Simultaneous confidence bands with coverage probability α areβ̂j(u)− cα√Ĉjj(u, u)

G, β̂j(u) + cα

√Ĉjj(u, u)

G

where cα is the (1− α)th quantile of T.

35 / 37

Simultaneous confidence bands: multiplier bootstrapprocedure

The bands above are infeasible because cα is unknown. We usethe multiplier bootstrap method to estimate it:

1 Generate i.i.d. sequence of N(0, 1) random variables{ei, 1 ≤ i ≤ n} that are independent of the data

2 Define the multiplier bootstrap statistic

TMB = supu∈U

1√GĈjj(u, u)

G

∑g=1

(eg(α̂g − x′g β̂(u)) · (Ŝwg)j)

)3 Define the multiplier bootstrap estimate of cα

ĉα = (1− α) quantile of distribution of TMB given the data

Using results in Chernozhukov, Chetverikov, Kato (2013, 2014a,2014b, 2015), we can show that ĉα is a good estimator of cα

36 / 37

Simultaneous confidence bands

Theorem (Validity of Simultaneous Confidence BandsBased on MB Procedure)

Let Assumptions 1-8 hold. In addition, suppose that all eigenvalues ofJ(u, u) are bounded away from zero uniformly over all u ∈ U . Then

P

β̂j(u)− ĉ1−α√Ĉjj(u,u)

G ≤ βj(u) ≤ β̂j(u) + ĉ1−α

√Ĉjj(u,u)

Gfor all u ∈ U

→ 1− α.

37 / 37

Background

Documents

IV Quantile Regression for Group-level Treatments, with an ...web.stanford.edu/~bjlarsen/Grouped_IV_Quantile_Regression_slides.… · • Effect of law, varying at state-by-year level,