This article was downloaded by: [University of California Santa Cruz] On: 14 November 2014, At: 10:26 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Econometric Reviews Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lecr20 Class Size and Educational Policy: Who Benefits from Smaller Classes? Esfandiar Maasoumi a , Daniel L. Millimet a & Vasudha Rangaprasad a a Department of Economics , Southern Methodist University , Dallas, Texas, USA Published online: 06 Feb 2007. To cite this article: Esfandiar Maasoumi , Daniel L. Millimet & Vasudha Rangaprasad (2005) Class Size and Educational Policy: Who Benefits from Smaller Classes?, Econometric Reviews, 24:4, 333-368, DOI: 10.1080/07474930500405485 To link to this article: http://dx.doi.org/10.1080/07474930500405485 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. 
Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

Class Size and Educational Policy: Who Benefits from Smaller Classes?



Econometric Reviews, 24(4):333–368, 2005
Copyright © Taylor & Francis, Inc.
ISSN: 0747-4938 print/1532-4168 online
DOI: 10.1080/07474930500405485

CLASS SIZE AND EDUCATIONAL POLICY: WHO BENEFITS FROM SMALLER CLASSES?

Esfandiar Maasoumi, Daniel L. Millimet, and Vasudha Rangaprasad
Department of Economics, Southern Methodist University, Dallas, Texas, USA

The impact of class size on student achievement remains an open question despite hundreds of empirical studies and the perception among parents, teachers, and policymakers that larger classes are a significant detriment to student development. This study sheds new light on this ambiguity by utilizing nonparametric tests for stochastic dominance to analyze unconditional and conditional test score distributions across students facing different class sizes. Analyzing the conditional distributions of test scores (purged of observables using class-size specific returns), we find that there is little causal effect of marginal reductions in class size on test scores within the range of 20 or more students. However, reductions in class size from above 20 students to below 20 students, as well as marginal reductions in classes with fewer than 20 students, increase test scores for students below the median, but decrease test scores above the median. This nonuniform impact of class size suggests that compensatory school policies, whereby lower-performing students are placed in smaller classes and higher-performing students are placed in larger classes, improve the academic achievement of not just the lower-performing students but also the higher-performing students.

Keywords Class size; Program evaluation; Quantile treatment effects; School quality; Stochastic dominance; Student achievement.

JEL Classification C14; C33; I21; I28.

1. INTRODUCTION

The effect of school quality, notably class size, on student achievement is one of the most debated educational policy issues (e.g., Hanushek, 1986, 1996, 2003; Krueger, 2003). While a number of researchers have analyzed the issue, no consistent relationship between class size and student outcomes has been identified. In a recent review, Akerhielm (1995) noted that of 112 studies in the literature, 23 documented a statistically significant relationship between class size and student achievement, 14 finding a negative relationship and nine showing a positive relationship.¹ Consequently, Krueger (2003, p. F61) labels the effect “subtle and easily obscured,” Fertig (2003b, p. 2) calls the mixed results “disconcerting,” and Card and Krueger (1996, p. 47) note that “decisions about educational resources and reform have to be made in an environment of much uncertainty.” In spite of these ambiguous findings, the common perception among parents, teachers, and policymakers is that larger classes are a significant detriment to student development. Several states, including Texas, have mandated maximum class size levels. As a result, Dustmann (2003, p. F1) labels class size reduction policies as “easily the most popular policy for school improvements in the US,” although efficacy of such policies is “an issue of ongoing debate.” Hanushek (2002, p. 61) concludes, “Despite the political popularity of overall class size reduction, the scientific support of such policies is weak to nonexistent.”

Received August 2004; Accepted September 2005
Address correspondence to Daniel L. Millimet, Department of Economics, Box 0496, Southern Methodist University, Dallas, TX 75275-0496, USA; Fax: (214) 768-1821; E-mail: [email protected].

Given the importance of student development, future labor market success, racial achievement and wage gaps, economic growth, and the limited budgets that school districts confront, improved understanding of any benefits of reductions in class size, and of who benefits, is crucial to informed policymaking. Unfortunately, all existing empirical studies (to our knowledge) are lacking in at least one of three important respects, contributing to the inconsistent results noted above. First, in the past, most studies utilized the average pupil–teacher ratio at an individual’s school to proxy for the actual class size faced by the individual, given the absence of data on actual class size. However, the relationship between student performance and average pupil–teacher ratios is likely to be weaker (Ehrenberg et al., 2001). Second, nearly all studies focus on only one aspect of the distribution of student achievement: the (conditional) mean.² This is helpful only to a decision maker who places equal weights on all student groups. Finally, class size is treated as an intercept effect/shift, potentially missing important interactions between class size and other educational inputs. For example, it may be that effects of smaller classes arise because they increase the productivity of other inputs, such as teacher quality or parental involvement.

To account simultaneously for these three shortcomings, we use a nationally representative sample of public high school students in the US.

¹ More recently, Todd and Wolpin (2003, p. F4) state that despite “hundreds of empirical studies of the school quality–achievement relationship,” such studies “do not appear to be converging toward a consensus.” See also Hanushek (2002).

² Notable exceptions are Eide and Showalter (1998), Levin (2001), and Fertig (2003a), who use quantile regression methods to analyze the effect of school attributes on student achievement at select points in the distribution. However, Eide and Showalter (1998) only have data on the pupil–teacher ratio at the school level, not actual class size, and Levin (2001) and Fertig (2003a) analyze Dutch and German data, respectively.

The data, taken from the National Educational Longitudinal Study of 1988 (NELS), contain several test score measures, the class size corresponding to the subject being tested, as well as an exhaustive set of conditioning variables. We examine the entire distribution of tenth and twelfth grade test scores, both unconditional and conditional, across several class size groupings, controlling for a host of student, family, and school attributes and incorporating the logic of the Oaxaca–Blinder decomposition to allow for interaction effects between class size and the returns to other educational inputs.

Our comparisons of the distributions of test scores rely on recently developed nonparametric tests for stochastic dominance (SD). Such tests of first and second order stochastic dominance (hereafter FSD and SSD, respectively), as well as of higher orders, were examined in McFadden (1989) and Kaur et al. (1994). Our implementation of these tests treats the distribution of student test scores as an unknown to be estimated nonparametrically, and it draws upon bootstrap techniques developed in Maasoumi and Heshmati (2000) and Linton et al. (2005) to assess the level of statistical confidence regarding various relations.

Testing for SD is a useful and insightful companion to standard regression analysis for several reasons. First, it allows one to examine effects of the “treatment” in question at different parts of the distribution, as viewed by very different evaluation functions. Because many policy interventions have different effects on high-, middle-, and low-achieving students, any summary (index) measure of this distributed outcome must assign implicit weights to these different groups. For instance, averages assign equal weights, while focusing on the lowest quintile places zero weight on others, etc. Second, SD analysis lends context to averages and estimators of Quantile Treatment Effects (QTEs), which exposes the welfare functions, or policy evaluation functions, underlying the assessment of the differences in the distributions of the outcomes of different groups exposed to different treatments (see, e.g., Bitler et al., 2005). For example, SD analysis may make clear that the only way two situations can be (could have been) ranked is by a summary measure that prefers higher scores and cares about “inequality” or certain trade-offs between groups. Third, finding a SD ranking (of a particular order) indicates that all utility/welfare functions belonging to a particular class would agree that one policy is preferable to another. Thus summarized comparisons based on specific indices (e.g., mean comparisons in the case of first order SD) are only needed for quantification of the treatment effect. In addition, the inability to infer a dominance relation is equally informative, indicating that any (implicit) welfare ordering based on a particular index (such as the conditional mean) is highly subjective; different indices, even within the same general class of utility/welfare functions, will yield different substantive conclusions. As a result, summary measures may be used for complete, strong rankings of student outcomes across classes of varying size, consistent with explicitly revealed welfare preferences. More sophisticated techniques are needed for weak yet uniform ranking of student outcomes over large classes of welfare functions. Such uniform rankings (i.e., rankings robust to the specific choice of preference function and distribution of outcomes) are needed for “majority” rankings of policy outcomes. Empirical examination of such uniform rankings, based on the notion of stochastic dominance (SD), is the approach utilized in this paper.

The power of such SD relations, combined with the recently developed theory necessary to conduct rigorous statistical tests for the presence of such relations, has led to their growing application. For example, Maasoumi and Millimet (2005) examine changes in US pollution distributions over time and across regions at a point in time. Maasoumi and Heshmati (2005) analyze changes in the Swedish income distribution over time as well as across different population subgroups. Fisher et al. (1998) compare the distribution of returns to different length US Treasury Bills. Particularly relevant to the analysis at hand are previous applications of SD to the analysis of various treatment effects on the distribution of the outcome of interest. For instance, Amin et al. (2003) analyze the effect of a microcredit program in Bangladesh on the distribution of consumption of participants versus nonparticipants. Abadie (2002) analyzes the impact of veteran status on the distribution of civilian earnings. Bishop et al. (2000) compare the distribution of nutrition levels across populations exposed to two different types of food stamp programs. Anderson (1996) compares pre- and posttax income distributions in Canada over several years.

The results are quite striking. In particular, we reach four main conclusions. First, we find that the unconditional distributions of test scores favor students in larger classes below the 70th percentile or so, and students in smaller classes in the upper tail. Specifically, while marginal changes in class size in classes with at least 20 students are not associated with changes in the unconditional test score distributions, the unconditional test scores of low-achieving (high-achieving) students are superior in classes with at least (less than) 20 students. Second, analyzing the distribution of test scores purged of a multitude of individual, family, class, teacher, and school attributes, including lagged test scores and grade point average, using class-size specific returns, we find that there is little causal effect of marginal reductions in class size on test scores within the range of 20 or more students. However, reductions in class size from above 20 students to below 20 students, as well as marginal reductions in classes with fewer than 20 students, increase test scores for students below the median, but decrease test scores for students above the median. Third, our conclusions hinge on the fact that we allow the returns to observables to vary by class size. In fact, unknown heretofore, we conclude that most of the beneficial impacts of class size reduction arise because of the productivity-enhancing effect it has on other educational inputs. However, the improvement in the returns to observables is nonmonotonic; the overall returns are maximized (in some sense) in classes with 16–19 students. Finally, class size reductions rarely have a uniform impact across the test score distributions (i.e., the QTEs vary across the distributions in terms of sign and magnitude), so that the current focus on mean treatment effects is misleading. Only by expanding one’s analysis to also incorporate the dispersion of test scores can policymakers hope to arrive at uniform rankings of class sizes.

In terms of informing educational policy, our results provide two key insights. First, since marginal reductions in class size in classes with more than 20 students have little impact on test scores, absent other considerations (e.g., student discipline), class size reductions in schools with current class sizes well above 20 should only be undertaken if schools are willing to reduce class size close to or below 20 students per class. Second, the nonuniform impact of class size suggests that compensatory school policies, whereby lower-performing students are placed in smaller classes and higher-performing students are placed in larger classes, improve the academic achievement of not just the lower-performing students but also the higher-performing students.

The remainder of the paper is organized as follows. Section 2 presents an overview of the literature. Section 3 defines the various dominance relations and describes the tests used to identify such relations in the data. Section 4 discusses the data. Section 5 presents the initial results, while Section 6 presents some further detailed analysis. Section 7 offers some concluding remarks.

2. LITERATURE REVIEW

Given that student achievement has been linked to variation in economic growth across countries (Barro, 2001; Hanushek and Kimko, 2000), and that racial disparities in student achievement have been well-documented (Cook and Evans, 2000; Hanushek, 2001; Jenks and Phillips, 1998), a considerable literature has developed attempting to understand the factors contributing to student development. The vast majority of previous studies focus on the pupil–teacher ratio at the school level to proxy for class size (e.g., Chubb and Moe, 1990; Coleman et al., 1996; Eide and Showalter, 1998), with the resulting link between student achievement and pupil–teacher ratios found to be tenuous at best (Hanushek, 1986, 1989). More recent work has utilized actual measures of class size (e.g., Akerhielm, 1995; Boozer and Rouse, 2001).

In early work, Summers and Wolfe (1977) analyzed data on 627 sixth grade elementary school students across 103 randomly selected elementary schools from the Philadelphia School District in 1970 and 1971, as well as 553 eighth grade students and 716 twelfth grade students. The authors concluded that disadvantaged students and students from low socioeconomic backgrounds benefited from smaller class sizes. Several more recent studies have reaffirmed this conclusion. Angrist and Lavy (1999) find a beneficial impact of smaller classes on the achievement of fourth and fifth grade students in Israel using an instrumental variable (IV) approach. Rivkin et al. (2002) use a complex fixed effects model and data on class size within Texas schools across multiple cohorts, documenting some significant positive effects of smaller classes on fourth and fifth grade students, although the effects disappear by sixth grade and in any event are of much smaller magnitude than the impact of teacher quality. Akerhielm (1995) and Boozer and Rouse (2001) use data from the NELS, relying on IV methods to control for potential endogeneity of class size. Both find statistically significant, negative effects of larger classes on test scores. Utilizing experimental evidence on kindergarten through third grade students from Tennessee’s Project STAR (Student/Teacher Achievement Ratio), Krueger (1999) finds a modest beneficial impact of smaller classes during kindergarten and first grade but no impact on subsequent achievement.³ Krueger and Whitmore (2002), also examining the Project STAR data, document a positive impact of class size reductions, particularly for minorities.

³ Interestingly, Hanushek (2002) implicitly suggests the benefit of examining achievement distributions, rather than average outcomes, noting that while overall average kindergarten achievement is higher in smaller classes under the STAR experiment, this ranking only holds in 40 of 79 schools participating in the experiment.

Conversely, several studies have offered compelling empirical support for the notion that (i) smaller classes are not as beneficial as the above studies suggest, or (ii) larger classes are actually beneficial for students. Jepsen and Rivkin (2002) examine the effectiveness of California’s Class Size Reduction Program (CSRP) on student achievement using longitudinal data from the 1990s. Employing a difference-in-differences approach, the authors find that smaller class sizes raised third grade mathematics and reading test scores, particularly for low income students. However, the reduction in class size lowered the quality of teachers on the margin, and the deterioration in teacher quality at least partially offsets the gains from smaller classes (see also Rivkin et al., 2002 for a similar argument). Hoxby (2000a) employs a similar approach to many of the above studies, relying on exogenous variation in class size in Connecticut arising from idiosyncratic variation in the population, and finds no significant relationship between class size and student performance on tests in fourth and sixth grade. Goldhaber and Brewer (1997) control for school and teacher fixed and random effects, finding a positive impact of class size on mathematics achievement of tenth grade students using the NELS. Wößmann (2003) uses international data from the Trends in International Mathematics and Science Study (TIMSS) on seventh and eighth grade students, spanning 39 countries, and presents statistically significant evidence (using IV and other estimation techniques) that smaller class size is related to lower student achievement in mathematics and science. Fertig (2003a), using German data on the reading achievement of 15 and 16 year old students, also finds a statistically significant positive impact of larger classes using both OLS and quantile regression methods and controlling for a host of other attributes, including the homogeneity (in terms of ability) of the school population. The author argues that heterogeneity, not class size, is a more important detriment to student performance (see also Fertig, 2003b and Lazear, 2001). Finally, Dobbelsteen et al. (2002, p. 36) utilize Dutch data on fourth, sixth, and eighth grade students and exogenous variation in class size arising from rules linking total school enrollment to the number of teachers. The authors find that “after correcting for endogeneity, pupils in large classes do no worse—and sometimes even better—than identical pupils in small classes.” Consonant with Fertig (2003a,b), the authors test and find support for their hypothesis that students learn from other students of similar ability; thus larger classes increase the probability of being surrounded by others from whom one may learn (see also Levin, 2001).

Finally, Krueger (2003) performs a meta-analysis of the literature, finding in general a small, not particularly robust, positive impact of smaller class sizes. Hanushek (1996, 2002, 2003) also performs several meta-analyses, concluding that there is no consistent relationship between resources and student achievement. This lack of consensus suggests that examination of the QTEs, along with the application of SD testing, may be particularly fruitful. By contrast, it is worth reemphasizing that practically all the above studies utilizing regression-based inferences face two limitations: (i) the assumed structure of the various educational production functions estimated may be too simplistic as they restrict class size to only an intercept shift and (ii) the focus is on the conditional mean of the distribution of scores/performance and the impact of different conditioning variables thereon.

3. EMPIRICAL METHODOLOGY

3.1. Test Statistics

Our distributional comparisons are based on the notion of SD. Several tests for SD have been proposed in the literature; the approach herein is based on a generalized Kolmogorov–Smirnov test.⁴ To begin, let $X$ and $Y$ denote two outcome (test score) variables being compared (e.g., $X$ ($Y$) might refer to test scores of students exposed to a short (long) school year). $\{x_i\}_{i=1}^{N}$ is a vector of $N$ possibly dependent observations of $X$; $\{y_i\}_{i=1}^{M}$ is an analogous vector of realizations of $Y$. In the spirit of the historical development of such two-sample tests, $\{x_i\}_{i=1}^{N}$ and $\{y_i\}_{i=1}^{M}$ each constitute one sample. Thus we refer to dependence between $x_i$ and $x_j$, $i \neq j$, as within-sample dependence (similarly for observations of $Y$), and dependence between $X$ and $Y$ as between-sample dependence.

⁴ Maasoumi and Heshmati (2000) provide a brief review of the development of alternative tests.

Assuming general von Neumann–Morgenstern conditions, let $\mathcal{U}_1$ denote the class of (increasing) utility functions $u$ such that utility is increasing in test scores (i.e., $u' \geq 0$) and $\mathcal{U}_2$ the class of social welfare functions in $\mathcal{U}_1$ such that $u'' \leq 0$ (i.e., concavity). Concavity represents risk aversion, or an aversion to inequality in the achievement of students; a high concentration of both high- and low-achieving students is undesirable. Let $F(x)$ and $G(y)$ represent the cumulative distribution functions (CDFs) of $X$ and $Y$, respectively, which are assumed to be continuous and differentiable.

Under this notation, $X$ first order stochastically dominates $Y$ (denoted $X$ FSD $Y$) iff $E[u(X)] \geq E[u(Y)]$ for all $u \in \mathcal{U}_1$, with strict inequality for some $u$.⁵ Equivalently,

\[
F(z) \leq G(z) \quad \forall z \in \mathcal{Z}, \text{ with strict inequality for some } z, \tag{1}
\]

where $\mathcal{Z}$ denotes the union of the supports of $X$ and $Y$. If $X$ FSD $Y$, then the expected welfare from $X$ is at least as great as that from $Y$ for all increasing welfare functions, with strict inequality holding for some utility function(s) in the class. Equivalent conditions may be given in terms of quantiles. Loosely speaking, all quantiles of $F(\cdot)$ will be at least as large as those of $G(\cdot)$. The distribution of $X$ second order stochastically dominates $Y$ (denoted as $X$ SSD $Y$) iff $E[u(X)] \geq E[u(Y)]$ for all $u \in \mathcal{U}_2$, with strict inequality for some $u$. Equivalently,

\[
\int_{-\infty}^{z} F(v)\,dv \leq \int_{-\infty}^{z} G(v)\,dv \quad \forall z \in \mathcal{Z}, \text{ with strict inequality for some } z. \tag{2}
\]

If $X$ SSD $Y$, then the expected social welfare from $X$ is at least as great as that from $Y$ for all increasing and concave utility functions in the class $\mathcal{U}_2$, with strict inequality holding for some utility function(s) in the class. FSD implies SSD and higher orders, and SSD is equivalent to generalized Lorenz dominance.
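As a purely illustrative check of conditions (1) and (2), consider a hypothetical pair of score distributions that is not taken from the paper. The example is discrete, whereas the text assumes continuous CDFs, so it should be read only as a sketch of how the two conditions differ. Suppose $X$ equals 2 with certainty, while $Y$ equals 1 or 3 with probability one half each. The CDFs cross, so neither variable dominates the other at first order, yet the integrated CDFs are ordered, so that $X$ SSD $Y$:

```latex
\[
F(z) = \begin{cases} 0, & z < 2 \\ 1, & z \geq 2 \end{cases}
\qquad
G(z) = \begin{cases} 0, & z < 1 \\ \tfrac{1}{2}, & 1 \leq z < 3 \\ 1, & z \geq 3 \end{cases}
\]
% The CDFs cross, so condition (1) fails in both directions:
\[
F(1.5) - G(1.5) = -\tfrac{1}{2} < 0, \qquad F(2.5) - G(2.5) = \tfrac{1}{2} > 0 .
\]
% The integrated CDFs satisfy condition (2) with X dominating:
\[
\int_{-\infty}^{z} F(v)\,dv = \max(z - 2,\, 0),
\qquad
\int_{-\infty}^{z} G(v)\,dv =
\begin{cases} 0, & z < 1 \\ \tfrac{z-1}{2}, & 1 \leq z < 3 \\ z - 2, & z \geq 3 \end{cases}
\]
\[
\max(z - 2,\, 0) \leq \tfrac{z-1}{2} \ \text{ for } 1 \leq z < 3
\quad (\text{since } z - 2 \leq \tfrac{z-1}{2} \iff z \leq 3),
\ \text{ with strict inequality on } (1, 3).
\]
```

Both variables have mean 2 and $Y$ is a mean-preserving spread of $X$, so an evaluator who only prefers higher scores cannot rank the two, while any evaluator who is also averse to dispersion weakly prefers $X$.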

⁵ Note that SD relations offer insights into which distribution provides greater welfare considering only the outcome of interest, without regard for other considerations such as cost. If the treatment is costly, a separate cost–benefit analysis is required to decide if the welfare gains exceed the costs.

Define the following generalizations of the Kolmogorov–Smirnov test criteria:

\[
d = \sqrt{\frac{NM}{N+M}}\; \min \sup_{z \in \mathcal{Z}} \left[ F(z) - G(z) \right] \tag{3}
\]

\[
s = \sqrt{\frac{NM}{N+M}}\; \min \sup_{z \in \mathcal{Z}} \int_{-\infty}^{z} \left[ F(u) - G(u) \right] du, \tag{4}
\]

where min is taken over $F - G$ and $G - F$, in effect performing two tests in order to leave no ambiguity between the “equal” and “unrankable” cases. Our nonparametric tests for FSD and SSD are based on the empirical counterparts of $d$ and $s$ using the empirical CDFs, where the empirical CDF for $X$ is given by

\[
F_N(x) = \frac{1}{N} \sum_{i=1}^{N} I(X_i \leq x), \tag{5}
\]

where $I(\cdot)$ is an indicator function; $G_M(y)$ is defined similarly for $Y$. If $d \leq 0$ ($s \leq 0$) to a degree of statistical confidence, then the null hypothesis of FSD (SSD) is not rejected (see Appendix A for details).
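The sample analogues of the criteria in (3) and (4) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the suprema are evaluated at the pooled sample points, the integral of the empirical CDF is computed from the identity that it equals the sample mean of max(z - x_i, 0), and the bootstrap step used for inference (Appendix A) is omitted.

```python
import numpy as np


def ecdf(sample, grid):
    """Empirical CDF of `sample`, as in (5), evaluated at each point of `grid`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(sample, grid, side="right") / sample.size


def integrated_ecdf(sample, grid):
    """Integral of the empirical CDF from -inf to z, for each z in `grid`.

    For a step-function ECDF this integral equals mean(max(z - x_i, 0)).
    """
    sample = np.asarray(sample, dtype=float)
    return np.maximum(grid[:, None] - sample[None, :], 0.0).mean(axis=1)


def sd_statistics(x, y):
    """Return sample versions of (d, s) from (3) and (4) (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = x.size, y.size
    grid = np.sort(np.concatenate([x, y]))  # pooled sample points
    scale = np.sqrt(n * m / (n + m))

    diff_cdf = ecdf(x, grid) - ecdf(y, grid)
    diff_int = integrated_ecdf(x, grid) - integrated_ecdf(y, grid)

    # "min" is taken over the two orderings, F - G and G - F.
    # Both ECDFs equal 1 at the largest pooled point (and both integrals are 0
    # at the smallest), so each sample supremum is at least 0.
    d = scale * min(diff_cdf.max(), (-diff_cdf).max())
    s = scale * min(diff_int.max(), (-diff_int).max())
    return d, s
```

For example, `sd_statistics([2, 2], [1, 3])` gives a strictly positive `d` (the empirical CDFs cross) together with `s` equal to zero (the integrated empirical CDFs are ordered), which is the sample pattern one would associate with a second order but not a first order ranking.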

To this point $X$ and $Y$ have represented two unconditional test score variables. However, dependence between class size and other determinants of student achievement may confound the effect of class size with the impact of other characteristics (Dustmann, 2003; Hanushek, 1979; Hanushek et al., 1996). In particular, student achievement may be related to family background variables (such as family structure and income), teacher attributes (such as salary and experience), and school characteristics (such as school size and the ability of peers).⁶ Concern over the ability to control for all such reasonable determinants of student achievement, however, has led researchers to focus on the various IV methods cited in the previous section. In particular, two potential sources of endogeneity of class size are frequently cited in the literature (Boozer and Rouse, 2001; Dobbelsteen et al., 2002; Ehrenberg et al., 2001; Hoxby, 2000b; Todd and Wolpin, 2003). First, family attributes may be correlated with both student achievement and class size through endogenous residential decisions (so-called Tiebout, 1956 sorting). Second, schools may have compensatory policies specifically designed to assign less able students to smaller classes and/or to superior teachers.

To control for the myriad of determinants of student performance, aswell as circumvent the potential endogeneity issue, we follow in the spiritof Dearden et al. (2002) and perform dominance tests on conditional test

6For a theoretical account of the cognitive development of students, see Todd and Wolpin(2003).


score distributions derived via two approaches. Under the first approach, we control for a host of observable attributes that may generate a spurious correlation between class size and test scores and conduct dominance tests on the distributions of test scores purged of the “average” effects of these attributes. Under the second approach, we incorporate differential average effects by class size into the conditional distributions. In both cases, the conditioning covariates (discussed below) represent individual (such as race, gender, and lagged test scores), family (such as parental education, socioeconomic status, and family composition), class (such as subject and average student ability), teacher (such as race, gender, experience, and education), and school (such as enrollment, number of teachers, and average teacher salaries) attributes. Because our conditioning set is quite exhaustive, the implicit selection on observables assumption required to identify the treatment effect on the distribution of test scores appears reasonable.

To proceed with the first approach, we estimate separate educational production function models for students in each size category, obtain the intercept-adjusted residuals, and perform the dominance tests on these residuals.⁷ Specifically, in the first stage, we estimate

$$t_{ijk} = \alpha_k + h_{ij}\beta_k + \varepsilon_{ijk}, \quad k = 1, \ldots, K, \tag{6}$$

where $t_{ijk}$ is the test score for individual i in school j in class size group k, h is a lengthy vector of individual, family, class, teacher, and school attributes, $\varepsilon$ is the error term, and there are K class size categories (K = 3 in the application). In the second stage, we analyze the distributions of $\tilde{\varepsilon}_{ijk} \equiv \hat{\alpha}_k + \hat{\varepsilon}_{ijk}$, which correspond to test scores net of all observable characteristics (evaluated at the size specific returns, $\hat{\beta}_k$).⁸

Under the above approach, the intercept-adjusted residuals, $\tilde{\varepsilon}_{ijk}$, reflect test scores net of all observable characteristics evaluated at the size specific returns, $\hat{\beta}_k$. Since this method nets out test score differences due to observables as well as the size specific returns to such observables, we refer to these tests as being based on partial residuals (PR). As an alternative, we implement a second approach based on the full residuals (FR), where we denote the full residuals as inclusive of differences in the return to observables. Specifically, we rewrite the first-stage regression (6) for class size k as

$$\begin{aligned}
t_{ijk} &= \alpha_k + h_{ij}\beta_k + \varepsilon_{ijk} \\
&= \alpha_k + h_{ij}\beta_k + \varepsilon_{ijk} + (h_{ij}\beta_{k'} - h_{ij}\beta_{k'}) \\
&= \alpha_k + h_{ij}\beta_{k'} + h_{ij}(\beta_k - \beta_{k'}) + \varepsilon_{ijk},
\end{aligned} \tag{7}$$

⁷ The intercepts are included as part of the residuals, otherwise the conditional distributions will all be mean zero, precluding the possibility of first order dominance.

⁸ Note that controls for class size are omitted from (6), thereby allowing the error term to capture the residual effect of class size not captured by the included regressors.


where class size k′ is implicitly treated as the dominant category (Neuman and Oaxaca, 2004). Consequently, we amend the residual tests to compare the previous intercept-adjusted residual distribution of $\tilde{\varepsilon}_{ijk'}$ with $\tilde{\varepsilon}^{FR}_{ijk} \equiv (\hat{\alpha}_k + h_{ij}(\hat{\beta}_k - \hat{\beta}_{k'}) + \hat{\varepsilon}_{ijk})$, $k \neq k'$.
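The two residual constructions can be sketched in a few lines of Python. This is a minimal illustration assuming unweighted OLS and hypothetical helper names (`ols`, `partial_residuals`, `full_residuals`); the paper's estimates use sample weights and a much larger covariate vector.

```python
import numpy as np

def ols(h, t):
    """Unweighted OLS of t on [1, h]; returns (intercept, slopes, residuals)."""
    design = np.column_stack([np.ones(len(t)), h])
    coef, *_ = np.linalg.lstsq(design, t, rcond=None)
    resid = t - design @ coef
    return coef[0], coef[1:], resid

def partial_residuals(h, t):
    """PR for one class-size group k: alpha_hat_k + eps_hat_ijk."""
    alpha, _, resid = ols(h, t)
    return alpha + resid

def full_residuals(h_k, t_k, beta_ref):
    """FR for group k against reference group k': t_ijk - h_ij * beta_hat_k'.

    By (7), this equals alpha_hat_k + h_ij (beta_hat_k - beta_hat_k') + eps_hat_ijk.
    """
    return t_k - h_k @ beta_ref
```

The PR distribution of the reference group k′ is then compared with the FR distribution of group k, so that both are evaluated at the same returns.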

To aid in the comparison of the various residual dominance results, we also present the results from standard Oaxaca–Blinder parametric decompositions. Specifically, the mean test score gap between any two class sizes, k and k′, may be expressed as

$$\bar{t}_k - \bar{t}_{k'} = \underbrace{(\hat{\alpha}_k - \hat{\alpha}_{k'})}_{U} + \underbrace{(\bar{h}_k - \bar{h}_{k'})\hat{\beta}_{k'}}_{E} + \underbrace{\bar{h}_k(\hat{\beta}_k - \hat{\beta}_{k'})}_{C} \tag{8}$$

where class k′ is treated as the dominant category. If the difference in returns (term C) in (8) is large in absolute value, then the two residual tests may be expected to yield disparate results. Moreover, if the three sets of results (two sets of residual tests and the one set of unconditional tests) offer different inferences, one may infer a significant association between class size, the distribution of test scores, the set of conditioning variables, and the returns to the conditioning variables.
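The decomposition in (8) is an exact identity when each group's regression includes an intercept, which a short numerical sketch can confirm. The function name and the use of unweighted OLS are assumptions for illustration only.

```python
import numpy as np

def oaxaca_blinder(h_k, t_k, h_ref, t_ref):
    """Split mean(t_k) - mean(t_ref) into the U, E, and C terms of Eq. (8).

    Group `ref` plays the role of the dominant category k'.
    """
    def fit(h, t):
        design = np.column_stack([np.ones(len(t)), h])
        coef, *_ = np.linalg.lstsq(design, t, rcond=None)
        return coef[0], coef[1:]

    a_k, b_k = fit(h_k, t_k)
    a_ref, b_ref = fit(h_ref, t_ref)
    hbar_k, hbar_ref = h_k.mean(axis=0), h_ref.mean(axis=0)
    u = a_k - a_ref                  # unexplained intercept gap
    e = (hbar_k - hbar_ref) @ b_ref  # differences in endowments
    c = hbar_k @ (b_k - b_ref)       # differences in returns
    return u, e, c
```

Because OLS with an intercept passes through the sample means, u + e + c reproduces the raw mean gap up to floating-point error.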

3.2. Inference

The asymptotic null distribution of the test statistics, d and s, depends on the unknown distributions F and G. In the analysis below, we first approximate the empirical distribution of the test statistics using simple bootstrap methods as in Maasoumi and Heshmati (2000) and Maasoumi and Millimet (2005) and report the estimated significance level. To evaluate the null Ho: d ≤ 0, we first report in our tables whether the observed empirical distributions are seemingly rankable by FSD or SSD; we present the sample values of max{d1}, max{d2}, d, max{s1}, max{s2}, and s (see Appendix A). We then obtain bootstrap estimates of the probability that d lies in the nonpositive interval (i.e., Pr{d ≤ 0}) using the relative frequency of {d* ≤ 0}, where d* is the bootstrap estimate of d (500 repetitions are used).⁹ If this interval has a large probability, say 0.90 or higher, and d ≤ 0, we may infer dominance to a desirable degree of confidence. If this interval has a low probability, say 0.10 or smaller, and d > 0, we may infer the presence of significant crossings of the empirical CDFs, implying an inability to rank the outcomes. Finally, if the probability lies in the intermediate range, say between 0.10 and 0.90, there is insufficient evidence to distinguish between equal and unrankable distributions. This is a classic confidence interval test; specifically, we are assessing the

⁹ Note that we also report simple bootstrap estimates of the Pr{d* ≥ d}. These are provided to facilitate visualization of the simple bootstrap distribution.


likelihood that the event d ≤ 0 has occurred. Similarly, we estimate Pr{s ≤ 0} to evaluate the second order dominance proposition given by Ho: s ≤ 0.¹⁰
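The simple bootstrap rule described above can be sketched as follows. The interior quantile grid, the scaling of the statistic, the seed, and the helper names are assumptions of this sketch rather than the authors' implementation; the 0.90 and 0.10 thresholds and the within-sample resampling of N and M observations follow the text.

```python
import numpy as np

def fsd_stat(x, y, grid):
    """d = min(max(F - G), max(G - F)) on a fixed grid, scaled by sqrt(NM/(N+M))."""
    f = np.searchsorted(np.sort(x), grid, side="right") / x.size
    g = np.searchsorted(np.sort(y), grid, side="right") / y.size
    scale = np.sqrt(x.size * y.size / (x.size + y.size))
    return scale * min((f - g).max(), (g - f).max())

def simple_bootstrap_prob(x, y, reps=500, seed=0):
    """Return (d, relative frequency of {d* <= 0}) using within-sample resampling."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # Interior grid (assumed): pooled quantiles, which avoids the extreme
    # endpoints where both empirical CDFs coincide trivially.
    grid = np.quantile(np.concatenate([x, y]), np.linspace(0.05, 0.95, 91))
    d = fsd_stat(x, y, grid)
    d_star = np.array([
        fsd_stat(rng.choice(x, x.size), rng.choice(y, y.size), grid)
        for _ in range(reps)
    ])
    return d, float(np.mean(d_star <= 0))

def classify(d, prob, hi=0.90, lo=0.10):
    """Three-way reading of the simple bootstrap result, following the text."""
    if prob >= hi and d <= 0:
        return "dominance"
    if prob <= lo and d > 0:
        return "unrankable"
    return "inconclusive"
```

For example, `classify(*simple_bootstrap_prob(x, y))` returns one of the three labels for a pair of samples; the same pattern applies to s with an integrated-CDF statistic in place of `fsd_stat`.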

As an alternative, we also evaluate the less decisive dominance proposition Ho: d = 0 via the Linton et al. (2005) recentered bootstrap procedure, which the authors demonstrate provides a consistent test. It is known that the sample estimate of d converges to its population value under general conditions (likewise for the SSD statistic). However, under the null Ho: d = 0, centering of computations around their corresponding sample values introduces second order errors that are negligible for first order (asymptotic) approximations, but it is desirable for removing some uncertainties due to estimation of unknown parameters and distributions. This is the source of improvement in bootstrap power gained from recentering. The other source of improvement arising from recentering pertains to the technique’s robustness to within-sample dependence.

Utilizing the algorithm detailed in Appendix A, we obtain recentered bootstrap p-values in the classical sense as the relative frequency of {d** > d}, where d** is the recentered bootstrap estimate of d. If the Pr{d** > d} is low, say 0.10 or smaller, we reject the null Ho: d = 0; if this p-value is greater than 0.10, we do not reject the null.¹¹ It is important to emphasize, however, that while rejection of the null provides valuable insight in the recentered bootstrap case, failure to reject the null provides less information. If we reject the null and d < 0, we may infer dominance to a desirable degree of confidence. Conversely, if we reject the null and d > 0, we may infer unequal, but unrankable, distributions. These are both strong findings, as the former (latter) indicates that all (not all) increasing social welfare functions will concur on the relative rankings of the distributions in question. On the other hand, failure to reject the null merely implies that we cannot eliminate the possibility that F = G; strict dominance also cannot be ruled out to some degree of confidence. Seen in this light, the recentered bootstrap is a conservative test. In the discussion of the results, we focus more heavily on the more decisive simple bootstrap for inference. Similarly, we report the relative frequency of {s** > s} and {s** > 0} to evaluate the null Ho: s = 0.
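A hedged sketch of a recentered bootstrap p-value is given below. It assumes one common form of recentering, in which the sample CDF gap is subtracted from each bootstrap CDF gap before the statistic is computed; the algorithm in Appendix A is not reproduced here and may differ in its details. All function names, the grid, and the scaling are illustrative choices.

```python
import numpy as np

def ecdf_gap(x, y, grid):
    """F_N - G_M evaluated on `grid`."""
    f = np.searchsorted(np.sort(x), grid, side="right") / x.size
    g = np.searchsorted(np.sort(y), grid, side="right") / y.size
    return f - g

def recentered_pvalue(x, y, reps=500, seed=0, n_grid=100):
    """Return (d, relative frequency of {d** > d}) under the assumed recentering."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    grid = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), n_grid)
    scale = np.sqrt(x.size * y.size / (x.size + y.size))
    gap = ecdf_gap(x, y, grid)
    d = scale * min(gap.max(), (-gap).max())
    exceed = 0
    for _ in range(reps):
        gap_star = ecdf_gap(rng.choice(x, x.size), rng.choice(y, y.size), grid)
        centered = gap_star - gap  # recenter around the sample values
        d_rc = scale * min(centered.max(), (-centered).max())
        exceed += int(d_rc > d)
    return d, exceed / reps
```

A p-value at or below 0.10 would lead to rejection of Ho: d = 0 under the rule in the text; the sign of d then distinguishes dominance from unrankable distributions.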

¹⁰ Note that we do not impose the least favorable case (LFC) of equality of the distributions. This could be done by combining the data on X and Y and bootstrapping from the combined sample (e.g., Abadie, 2002). Our bootstrap samples still contain N (M) observations from X (Y). As argued in Linton et al. (2005), working under LFC has some undesirable power consequences as it can produce biased tests that are not similar on the boundary of the null. This happens when the boundary of the null itself is composite. The bootstrap methods employed herein, combined with a fixed critical value at zero (the boundary of the null hypothesis), render our tests “asymptotically similar” and unbiased on the boundary (Maasoumi and Heshmati, 2005).

¹¹ We also report the Pr{d** ≤ 0} in the tables. This allows the reader to see the significance level (size) of the test associated with the special critical value zero. In our tables, these are obtained simply as Pr{d** > 0} = 1 − Pr{d** ≤ 0}.


A final, necessary comment pertains to inference in the FR tests (i.e., those incorporating the Oaxaca–Blinder decomposition). Owing to the usage of a common set of coefficient estimates in obtaining both residual distributions being compared, there necessarily exists between-sample dependence. For example, the FR test using data on small (k = 1) and medium (k = 2) classes compares the distributions of $\tilde{\varepsilon}_{ij1}$ and $\tilde{\varepsilon}^{FR}_{ij2}$. The former depends on $\{t_{ij1}, h_{ij1}, \hat{\beta}_1(t_1, h_1)\}$, where $t_1$ and $h_1$ represent the full data vector for t and h for the sample of students in small classes; the latter, $\tilde{\varepsilon}^{FR}_{ij2} = t_{ij2} - h_{ij2}\hat{\beta}_1$, depends on $\{t_{ij2}, h_{ij2}, \hat{\beta}_1(t_1, h_1)\}$. This source of dependence is atypical. Between-sample dependence usually arises when the same individuals appear in the two samples being compared (e.g., distributions of pre- and posttax incomes for a sample of individuals). To handle this more common type of between-sample dependence, pairwise (or clustered) bootstrap samples are drawn in order to maintain the dependence in the resampled data (Linton et al., 2005). In the current situation, the between-sample dependence is maintained by reestimating the first-stage equations (6) and (7) on each bootstrap resample. Specifically, by resampling N observations $\{t^*_{ij1}, h^*_{ij1}\}$ and M observations $\{t^*_{ij2}, h^*_{ij2}\}$ nonparametrically and reestimating (6), we obtain the resampled distributions of $\tilde{\varepsilon}^*_{ij1}$ and $\tilde{\varepsilon}^{FR*}_{ij2}$, where the former depends on $\{t^*_{ij1}, h^*_{ij1}, \hat{\beta}^*_1(t^*_1, h^*_1)\}$ and the latter depends on $\{t^*_{ij2}, h^*_{ij2}, \hat{\beta}^*_1(t^*_1, h^*_1)\}$. Thus, as in the usual pairwise bootstrap case, the source of between-sample dependence is maintained in the resampling procedure.
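One bootstrap draw for the FR test, with the first stage reestimated on the resample, might look like the following. This is a sketch under simplifying assumptions (unweighted OLS, hypothetical function names, group 1 as the reference category); it is meant only to show how the same reestimated coefficient vector enters both resampled residual distributions.

```python
import numpy as np

def fit(h, t):
    """Unweighted OLS of t on [1, h]; returns (intercept, slopes)."""
    design = np.column_stack([np.ones(len(t)), h])
    coef, *_ = np.linalg.lstsq(design, t, rcond=None)
    return coef[0], coef[1:]

def fr_bootstrap_draw(h1, t1, h2, t2, rng):
    """One draw of (PR for group 1, FR for group 2 at group 1's returns)."""
    i1 = rng.integers(0, len(t1), len(t1))  # resample N rows of (t, h), group 1
    i2 = rng.integers(0, len(t2), len(t2))  # resample M rows of (t, h), group 2
    h1s, t1s, h2s, t2s = h1[i1], t1[i1], h2[i2], t2[i2]
    _, beta1_star = fit(h1s, t1s)           # first stage reestimated on the resample
    pr1_star = t1s - h1s @ beta1_star       # equals alpha*_1 + eps*_ij1
    fr2_star = t2s - h2s @ beta1_star       # the same beta*_1 enters both residuals
    return pr1_star, fr2_star
```

Repeating the draw (for instance 500 times, as in the paper's bootstrap) and computing the dominance statistic on each pair yields the bootstrap distribution used for inference.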

4. DATA

The data are obtained from the National Education Longitudinal Study (NELS) of 1988, a large-scale longitudinal study of high school students conducted by the National Center for Education Statistics (NCES). The NELS contains a nationally representative sample of eighth graders first surveyed in the spring of 1988. Follow-up surveys were administered to the respondents in 1990 and 1992. The original sample was chosen by initially sampling some 1,000 public and private schools from a universe of approximately 40,000 schools containing eighth grade students, and then drawing random samples of approximately 24–26 eighth grade students per school. The original sample, therefore, contains roughly 25,000 eighth grade students.

The students were administered cognitive tests in reading, social studies, mathematics and science during the base year (1988), first follow-up (1990), and second follow-up (1992). The grade specific tests contained material appropriate for each grade but included sufficient overlap with the exams for the other grades to permit measurement of academic


growth.¹² For each sampled student, the teachers from two of the four subjects were surveyed, thereby yielding information on class size. Thus, we have two observations for each student in each wave of the survey after we match the class size with the test score for that particular subject. The NELS also supplies general descriptive information about the school obtained from the chief administrator of each school in the sample.

Following Boozer and Rouse (2001), we construct two samples, each including only students attending public schools. The first sample uses test score results from the first follow-up (tenth grade) and information on eighth grade class size, and contains 12,412 students from 762 schools (23,549 total observations).¹³ The second sample is analogously defined using information from tenth and twelfth grades, and contains 8,685 students from 756 schools (14,796 total observations), considerably smaller than the previous sample due to attrition (either dropping out of the survey or relocating to an out-of-sample school). All results are weighted using the appropriate sample weights.

Because there is a continuum of class sizes to which students belong, we classify each student-test observation in each sample into one of three groups to make the number of SD tests manageable. The groupings are: small (19 or fewer students), medium (20–30 students), and large (more than 30 students).
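The grouping rule can be written as a small helper. The function name and the string labels are illustrative; the cutoffs are those stated above.

```python
def class_size_group(n_students):
    """Map a class size to the three groups used for the SD tests."""
    if n_students <= 19:
        return "small"    # 19 or fewer students
    if n_students <= 30:
        return "medium"   # 20-30 students
    return "large"        # more than 30 students
```

With three groups, each sample requires only three pairwise comparisons (small versus medium, small versus large, and medium versus large).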

To obtain the residual test scores, we utilize an extensive set of individual, family, class, teacher, and school characteristics available from the NELS. Specifically, the vector h in (6) includes controls for the following (in addition to a constant term):

Individual: race, gender, an indicator for limited English proficiency (LEP), and lagged test scores;

Family: father’s education, mother’s education, family composition, number of siblings, an indicator for the presence of a home computer, family socioeconomic status, and dummy variables indicating if family composition and number of siblings are missing;

Class: subject and the overall relative ability of the class;

Teacher: race, gender, experience, education, indicators if the student and teacher are of the same race and of the same gender;

School: urban/rural status, region, total school enrollment, grade-level enrollment, number of full-time teachers, number of teachers by race,

¹² We follow Goldhaber and Brewer (1997) and Boozer and Rouse (2001) and utilize the raw item response theory (IRT) scores for each test.

¹³ The exams may be administered as early as January, midway through the school year. As a result, we follow the lead of Boozer and Rouse (2001) and initially examine the effect of previous class size on current test scores. See also Hoxby (2000a).


level of student disruptions in class, percentage of minority students in the school, teacher salaries, length of school year, percentage of students in remedial reading, remedial math, and bilingual education, and dummy variables indicating if teacher salary information, school year length, the number of full-time teachers by race, and the percentage of students in remedial reading, remedial math, and bilingual education are missing.

Summary statistics for the samples are available upon request.

Before continuing, a few comments are warranted. First, the inclusion of lagged test scores proxies for innate ability, following the strategy of Eide and Showalter (1998), Dearden et al. (2002), and others. Lagged test scores also control for all previous inputs into the educational production process, giving the results a “value-added” interpretation (Goldhaber and Brewer, 1997; Rivkin et al., 2002; Todd and Wolpin, 2003). In addition, we also condition on the teacher’s subjective assessment of the overall ability of the class from which the test score is taken.¹⁴ Such ability controls are vital to circumventing the potential endogeneity arising from endogenous residential choice and nonrandomness in the assignment of students to classes by schools (Betts and Shkolnik, 2000). The ability level of the class also captures peer effects that have been shown to be important (Dobbelsteen et al., 2002; Levin, 2001). Second, to minimize further any potential spurious correlation between class size and other determinants of student achievement, we condition on a fairly substantial vector of school attributes. Since actual class size is only observed ex post, while school attributes are observable ex ante, controlling for school characteristics not only removes their direct effect on student performance but also proxies for family background traits and parental involvement.¹⁵ Third, as argued in Hanushek (1979), controlling for family attributes such as socioeconomic status and parental education levels also severely mitigates any bias resulting from endogenous residential choice.

5. RESULTS

5.1. Unconditional SD Tests

The initial SD tests involve comparing the unconditional test score distributions across students differentiated by class size. Results are provided in Table 1; Panel A examines tenth grade test scores as a

¹⁴ The response options are (i) the class is of above average ability (relative to the school), (ii) the class is of average ability, (iii) the class is of below average ability, or (iv) the class has students of widely varying abilities. Figlio and Paige (2002) also make use of this variable.

¹⁵ For instance, when looking at homes for sale at http://www.realtor.com, links are provided for (virtually) every house in the US in order to view the average pupil–teacher ratio and total enrollment for that location (along with average SAT scores and percentage of students continuing to college). See also Todd and Wolpin (2003).


TABLE 1 Unconditional stochastic dominance tests

First order dominance

X       Y       Ranking   Bootstrap   d1,MAX    d2,MAX    d         Pr{d1*≤0}   Pr{d2*≤0}   Pr{d*≤0}    Pr{d1*≥d1}  Pr{d2*≥d2}  Pr{d*≥d}
A. 8th Grade Class Size: 10th Grade Test Scores
Small   Medium  None      simple      2.150     1.533     1.533     0.000       0.000       0.000       0.622       0.598       0.514
                          recentered  -         -         -         0.016       0.012       0.028       0.030       0.090       0.000
Small   Large   None      simple      1.636     0.804     0.804     0.002       0.000       0.002       0.658       0.814       0.752
                          recentered  -         -         -         0.014       0.028       0.042       0.136       0.486       0.126
Medium  Large   None      simple      0.714     0.648     0.648     0.000       0.002       0.002       0.774       0.818       0.640
                          recentered  -         -         -         0.020       0.018       0.038       0.608       0.618       0.308
B. 10th Grade Class Size: 12th Grade Test Scores
Small   Medium  M SSD S   simple      2.720     1.169     1.169     0.000       0.000       0.000       0.658       0.658       0.658
                          recentered  -         -         -         0.008       0.010       0.018       0.000       0.132       0.000
Small   Large   L SSD S   simple      4.149     0.333     0.333     0.000       0.012       0.012       0.608       0.728       0.728
                          recentered  -         -         -         0.010       0.004       0.014       0.000       0.838       0.704
Medium  Large   L FSD M   simple      2.935     0.044     0.044     0.000       0.134       0.134       0.578       0.804       0.804
                          recentered  -         -         -         0.008       0.008       0.016       0.000       0.984       0.968

Second order dominance

X       Y       Ranking   Bootstrap   s1,MAX    s2,MAX    s         Pr{s1*≤0}   Pr{s2*≤0}   Pr{s*≤0}    Pr{s1*≥s1}  Pr{s2*≥s2}  Pr{s*≥s}
A. 8th Grade Class Size: 10th Grade Test Scores
Small   Medium  None      simple      246.826   9.049     9.049     0.000       0.266       0.266       0.480       0.530       0.528
                          recentered  -         -         -         0.148       0.130       0.278       0.092       0.666       0.336
Small   Large   None      simple      153.163   4.385     4.385     0.096       0.122       0.218       0.538       0.634       0.522
                          recentered  -         -         -         0.220       0.328       0.548       0.242       0.602       0.326
Medium  Large   None      simple      26.211    41.308    26.211    0.354       0.100       0.454       0.538       0.594       0.198
                          recentered  -         -         -         0.266       0.328       0.594       0.586       0.456       0.114
B. 10th Grade Class Size: 12th Grade Test Scores
Small   Medium  M SSD S   simple      373.404   −0.439    −0.439    0.000       0.882       0.882       0.524       0.598       0.598
                          recentered  -         -         -         0.198       0.184       0.382       0.004       0.992       0.986
Small   Large   L SSD S   simple      547.169   −0.174    −0.174    0.000       0.870       0.870       0.542       0.496       0.496
                          recentered  -         -         -         0.152       0.190       0.342       0.000       0.940       0.892
Medium  Large   L FSD M   simple      477.344   0.112     0.112     0.000       0.408       0.408       0.518       0.532       0.532
                          recentered  -         -         -         0.184       0.176       0.360       0.000       0.808       0.602

Small classes have fewer than 20 students; medium classes have between 20 and 30 students; large classes have 31 or more students. All results use appropriate panel weights. Probabilities obtained via 500 bootstrap repetitions (first row: simple bootstrap; second row: recentered bootstrap). No observed ranking implies only that the distributions are not rankable in the first- or second-degree sense. See text for further details.


FIGURE 1 CDFs and integrated CDFs: unconditional 10th grade test scores by 8th grade class size. Small, <20 students; Medium, 20–30 students; Large, >30 students.

function of eighth grade class size, and Panel B examines twelfth grade test scores as a function of tenth grade class size. The corresponding CDFs, integrated CDFs, and differences in the CDFs are plotted in Figures 1 and 2, respectively.

Comparing the empirical distributions of tenth grade test scores across small, medium, and large classes (Panel A), we find no instance where we are able to rank the distributions in either the first- or second-degree sense, although a clear ordering exists for mean test scores.¹⁶ Moreover, the simple bootstrap indicates that the crossings of the CDFs in each case are statistically meaningful at the p < 0.01 confidence level. The recentered bootstrap confirms this finding, indicating rejection of the null Ho: d = 0 when comparing small and medium classes (p-value = 0.000), rejecting strict dominance and equality of the distributions; the null is nearly rejected when comparing small and large classes as well (p-value = 0.126). The recentered bootstrap also nearly rejects the null Ho: s = 0 when comparing medium and large classes (p-value = 0.114), rejecting strict second order dominance and equality of the distributions.

The lack of first- and second-degree SD is an extremely powerful result. For example, if the test score level for Small FSD Medium (or Large), then any policymaker with a social welfare function increasing in test scores would prefer smaller class sizes. Similarly, a finding of SSD would imply that any policymaker with a social welfare function that is increasing and averse to dispersion in test scores would prefer smaller class sizes. However, we find no such dominance relations; individuals with different preference functions in the class $U_1$ or $U_2$ can reasonably disagree about the efficacy of smaller classes.

Examining the actual plots (Figure 1), we see that the three CDFs and integrated CDFs are extremely similar, never differing by more than

¹⁶ The average test score is highest in small classes (31.24), followed by large classes (31.21), and then medium classes (31.17).


FIGURE 2 CDFs and integrated CDFs: unconditional 12th grade test scores by 10th grade class size. Small, <20 students; Medium, 20–30 students; Large, >30 students.

four test points in any part of the distribution.¹⁷ Nonetheless, the plot of the differences in the CDFs (corresponding to estimates of the QTEs) is still revealing, indicating that medium and large classes are most similar, while small classes are quite distinct. Specifically, small classes outperform medium and large classes at the upper end of the distribution (above the 70th percentile), with the converse holding below the 70th percentile.
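A quantile-by-quantile comparison of this kind can be sketched as the difference between the sample quantiles of two groups. The function name, the default percentile grid, and the absence of sample weights are assumptions of this sketch, not the procedure used to produce the figures.

```python
import numpy as np

def quantile_differences(x, y, taus=None):
    """Quantile of x minus quantile of y at each tau in (0, 1)."""
    if taus is None:
        taus = np.linspace(0.05, 0.95, 19)  # 5th to 95th percentile, assumed grid
    taus = np.asarray(taus, dtype=float)
    return taus, np.quantile(x, taus) - np.quantile(y, taus)
```

Positive differences at high values of tau together with negative differences at low values would correspond to the pattern described above, in which one group outperforms only in the upper part of the distribution.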

The twelfth grade test scores (Panel B) reveal several instances where the empirical distributions of test scores are rankable, and in all such cases it is the distribution from the larger class size that dominates the empirical distribution from the smaller class size.¹⁸ However, not all of these rankings are statistically meaningful at conventional levels, highlighting the need for formal statistical testing. Specifically, we observe Medium SSD Small, Large SSD Small, and Large FSD Medium. The simple bootstrap yields a marginally significant Pr(s ≤ 0) = 0.882 and 0.870 in the first two cases, and a Pr(d ≤ 0) = 0.134 (Pr(s ≤ 0) = 0.408) in the final case. Moreover, the recentered bootstrap fails to reject the null Ho: s = 0 in any of the three cases; it does reject the null Ho: d = 0 in the test of small versus medium classes (p-value = 0.000), thereby rejecting strict dominance and equality of the distributions. Thus there is at best modest evidence that uniform rankings favor larger classes when (i) unconditional test scores and (ii) the dispersion of test scores are considered.

Examining the actual plots (Figure 2), we see that, as in Figure 1, the three CDFs and integrated CDFs are extremely similar, never differing by more than four points in any part of the distribution.¹⁹ However, as above, the plot of the differences in the CDFs (i.e., the QTEs) is informative, indicating that large classes are preferable to medium classes over the

¹⁷ The standard deviation of tenth grade test scores is 12.08.

¹⁸ Focusing on the mean test scores by class size grouping also favors larger classes. The average twelfth grade test score is 36.6 in large classes, 35.5 in medium classes, and 35.0 in small classes.

¹⁹ The standard deviation of twelfth grade test scores is 13.60.


entire distribution (consonant with the observed FSD ranking) and that the gains from large classes (relative to medium classes) are fairly uniform across the distribution. Furthermore, as was the case with eighth grade class size, small classes outperform medium and large classes at the upper end of the distribution (roughly above the 70th percentile), with the reverse holding at the lower end.

In sum, the unconditional tests assessing the impact of eighth grade class size refute the existence of a uniform ranking (a ranking robust to the choice of specific preference function) of test score distributions across class size categories. This finding highlights the false sense of decisiveness that one gets from summary comparisons, such as those based on mean treatment effects. The unconditional tests based on tenth grade class size, however, provide modest evidence of a uniform ranking of test score distributions across class size categories. However, such rankings, to the extent that they exist, are only obtained when one incorporates dispersion of test scores into the welfare criteria. Moreover, the distributional approach reveals exactly who gains and who loses (in the unconditional sense) from smaller classes: students in the upper (lower) tail of the distribution gain from smaller (larger) classes. These findings are merely suggestive, however, as they fail to control for observables correlated with both class size and student achievement. For instance, if low-achieving students are allocated to smaller classes, this may explain the lower test scores found in smaller classes below the median. To determine if the unconditional results hold once we purge test scores of such potential confounders, we turn to the conditional SD tests.

5.2. Conditional SD Tests

5.2.1. Tenth Grade Test Scores

The PR and FR dominance test results for the pairwise comparisons involving tenth grade test scores are displayed in Table 2.²⁰ The corresponding plots are displayed in Figures 3–5. Examining the PR results (Panel A), we find that in every pairwise comparison, the empirical distribution for larger classes first order dominates the corresponding distributions for smaller and medium classes. Moreover, both rankings are statistically significant at conventional levels according to the simple bootstrap (Large FSD Small: Pr(d ≤ 0) = 0.986; Large FSD Medium: Pr(d ≤ 0) = 0.924). The recentered bootstrap fails to reject the null Ho: d = 0 or s = 0 (Large FSD Small: p-value = 0.922; Large FSD Medium: p-value = 0.874). In addition, we observe that Medium FSD Small, although this ranking is only marginally significant according to the simple

²⁰ First-stage regression results are not presented but are available from the authors upon request.


TABLE 2 Conditional stochastic dominance tests

First order dominance

X       Y       Ranking   Bootstrap   d1,MAX    d2,MAX    d         Pr{d1*≤0}   Pr{d2*≤0}   Pr{d*≤0}    Pr{d1*≥d1}  Pr{d2*≥d2}  Pr{d*≥d}
A. 8th Grade Class Size: 10th Grade Test Scores (Partial Residual)
Small   Medium  M FSD S   simple      15.571    −0.902    −0.902    0.148       0.734       0.882       0.594       0.466       0.438
                          recentered  -         -         -         0.324       0.358       0.682       0.366       0.780       0.604
Small   Large   L FSD S   simple      30.681    −1.084    −1.084    0.008       0.978       0.986       0.552       0.584       0.584
                          recentered  -         -         -         0.252       0.148       0.400       0.000       0.930       0.922
Medium  Large   L FSD M   simple      29.703    −0.945    −0.945    0.042       0.882       0.924       0.420       0.504       0.498
                          recentered  -         -         -         0.380       0.158       0.538       0.002       0.934       0.874
B. 8th Grade Class Size: 10th Grade Test Scores (Full Residual)
Small   Medium  S SSD M   simple      1.256     1.615     1.256     0.000       0.000       0.000       0.978       0.896       0.962
                          recentered  -         -         -         0.000       0.000       0.000       0.816       0.534       0.584
Small   Large   None      simple      1.220     1.484     1.220     0.000       0.000       0.000       0.902       0.828       0.800
                          recentered  -         -         -         0.004       0.000       0.004       0.796       0.636       0.558
Medium  Large   M SSD L   simple      0.212     3.084     0.212     0.052       0.000       0.052       0.702       0.442       0.702
                          recentered  -         -         -         0.000       0.004       0.004       1.000       0.052       0.988
C. 10th Grade Class Size: 12th Grade Test Scores (Partial Residual)
Small   Medium  S FSD M   simple      −0.108    1.760     −0.108    0.574       0.216       0.790       0.450       0.590       0.260
                          recentered  -         -         -         0.272       0.450       0.722       0.908       0.372       0.606
Small   Large   L FSD S   simple      4.854     −0.165    −0.165    0.174       0.650       0.824       0.660       0.400       0.250
                          recentered  -         -         -         0.272       0.414       0.686       0.504       0.662       0.430
Medium  Large   L FSD M   simple      8.381     −0.573    −0.573    0.122       0.644       0.766       0.586       0.464       0.420
                          recentered  -         -         -         0.300       0.298       0.598       0.384       0.874       0.774
D. 10th Grade Class Size: 12th Grade Test Scores (Full Residual)
Small   Medium  None      simple      1.523     0.582     0.582     0.000       0.000       0.000       0.894       0.976       0.976
                          recentered  -         -         -         0.002       0.000       0.002       0.450       0.946       0.886
Small   Large   None      simple      1.898     0.366     0.366     0.000       0.006       0.006       0.722       0.864       0.864
                          recentered  -         -         -         0.008       0.002       0.010       0.200       0.966       0.894
Medium  Large   None      simple      0.867     0.452     0.452     0.000       0.004       0.004       0.840       0.746       0.720
                          recentered  -         -         -         0.000       0.000       0.000       0.672       0.948       0.872

Second order dominance

X       Y       Ranking   Bootstrap   s1,MAX    s2,MAX    s         Pr{s1*≤0}   Pr{s2*≤0}   Pr{s*≤0}    Pr{s1*≥s1}  Pr{s2*≥s2}  Pr{s*≥s}
A. 8th Grade Class Size: 10th Grade Test Scores (Partial Residual)
Small   Medium  M FSD S   simple      3697.722  −0.985    −0.985    0.196       0.742       0.938       0.576       0.440       0.426
                          recentered  -         -         -         0.358       0.486       0.844       0.334       0.598       0.368
Small   Large   L FSD S   simple      7391.206  −1.084    −1.084    0.008       0.986       0.994       0.518       0.582       0.582
                          recentered  -         -         -         0.378       0.370       0.748       0.002       0.736       0.728
Medium  Large   L FSD M   simple      7014.720  −0.945    −0.945    0.052       0.920       0.972       0.404       0.502       0.484
                          recentered  -         -         -         0.462       0.316       0.778       0.014       0.818       0.754
B. 8th Grade Class Size: 10th Grade Test Scores (Full Residual)
Small   Medium  S SSD M   simple      −0.422    194.296   −0.422    0.526       0.000       0.526       0.620       0.890       0.620
                          recentered  -         -         -         0.156       0.026       0.182       0.918       0.212       0.914
Small   Large   None      simple      9.341     144.512   9.341     0.398       0.002       0.400       0.536       0.788       0.530
                          recentered  -         -         -         0.256       0.056       0.312       0.672       0.422       0.588
Medium  Large   M SSD L   simple      −0.076    461.819   −0.076    0.586       0.000       0.586       0.474       0.428       0.474
                          recentered  -         -         -         0.122       0.182       0.304       0.880       0.006       0.738
C. 10th Grade Class Size: 12th Grade Test Scores (Partial Residual)
Small   Medium  S FSD M   simple      −0.362    394.168   −0.362    0.620       0.226       0.846       0.410       0.580       0.272
                          recentered  -         -         -         0.418       0.480       0.898       0.740       0.332       0.438
Small   Large   L FSD S   simple      1121.477  −0.291    −0.291    0.188       0.746       0.934       0.630       0.314       0.178
                          recentered  -         -         -         0.312       0.518       0.830       0.466       0.596       0.396
Medium  Large   L FSD M   simple      1916.552  −0.599    −0.599    0.132       0.758       0.890       0.574       0.402       0.350
                          recentered  -         -         -         0.356       0.396       0.752       0.334       0.818       0.720
D. 10th Grade Class Size: 12th Grade Test Scores (Full Residual)
Small   Medium  None      simple      231.479   48.165    48.165    0.086       0.000       0.086       0.442       0.968       0.818
                          recentered  -         -         -         0.392       0.006       0.398       0.056       0.834       0.218
Small   Large   None      simple      346.350   19.359    19.359    0.074       0.032       0.106       0.430       0.792       0.706
                          recentered  -         -         -         0.294       0.086       0.380       0.038       0.820       0.242
Medium  Large   None      simple      123.885   2.296     2.296     0.110       0.270       0.380       0.514       0.670       0.532
                          recentered  -         -         -         0.154       0.090       0.244       0.192       0.834       0.604

Smallclasseshave

fewer

than

20stud

ents;med

ium

classeshave

between

20an

d30

stud

ents;large

classeshave

31or

more

stud

ents.All

resultsuse

approp

riate

panel

weigh

ts.Prob

abilities

obtained

via

500

bootstrap

repe

tition

s(fi

rstrow:simple

bootstrap;

seco

nd

row:recentered

bootstrap)

.No

observed

ranking

implieson

lythat

the

distribu

tion

sare

not

rankable

inthe

first-

orseco

nd-de

gree

sense.First-stage

regression

sinclud

erace

dummies,

gende

rdu

mmy,

limited

English

profi

cien

cy(L

EP)

dummy,

father’s

educ

ation

dummies,

mother’s

educ

ation

dummies,

hom

eco

mpu

terdu

mmy,

family

compo

sition

dummies,

family

socioe

conom

icstatus,num

berof

siblings

dummies,

dummiesforteacher

race,teacher

gende

rdu

mmy,

teacher

expe

rien

cedu

mmies,

teacher

educ

ation

dummies,

average

classab

ility

dummies,

dummies

indicating

stud

entan

dteacher

are

ofthe

same

race,du

mmyindicating

stud

entan

dteacher

are

ofthe

same

gende

r,du

mmiesforam

ountof

stud

entdisrup

tion

sin

class,

region

aldu

mmies,

urba

nan

drural

dummies,

schoo

len

rollm

entdu

mmies,

grad

e-levelen

rollm

entdu

mmies,

dummiesforthe

percen

tage

ofminoritystud

ents

inschoo

l,du

mmiesfornum

berof

totalfull-time

teachersas

wellas

byrace,teacher

salary

dummies,

percen

tage

ofstud

ents

inschoo

lin

remed

ialread

ing,

percen

tage

ofstud

ents

inschoo

lin

remed

ialmath,pe

rcen

tage

ofstud

ents

inschoo

lin

bilin

gual

educ

ation,length

ofschoo

lyear

dummies,

lagg

edtest

score,

dummiesindicating

iffamily

compo

sition

,num

berof

siblings,teacher

salaries,schoo

lyear

length,num

berof

full-time

teachersby

race,an

dpe

rcen

tage

ofstud

ents

inremed

ial

read

ing,

remed

ialmath,an

dbilin

gual

educ

ation

are

missing.

Partialresidu

altest

compa

risonsba

sed

onEqu

ation

(7);

full

residu

alco

mpa

risonsba

sed

onEqu

ation

(8).

See

text

forfurther

details.


FIGURE 3 CDFs and integrated CDFs: partial residual 10th grade test scores by 8th grade class size. Small, <20 students, Medium, 20–30 students, Large, >30 students.

bootstrap (simple: Pr(d ≤ 0) = 0.882), the recentered bootstrap fails to reject the null of equality (p-value = 0.604). However, the second order relation is statistically significant at conventional levels according to the simple bootstrap (Pr(s ≤ 0) = 0.938). Examination of Figure 3 reveals that disparities in PR test scores (the QTEs) are fairly uniform across the entire distribution: at any quantile of the distribution, students in large (medium) classes score roughly ten (two) points higher than students in small classes.
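To make the notion of a QTE concrete, the following minimal sketch computes quantile-by-quantile differences between two samples. It assumes unweighted data (the estimates in the paper use panel weights), and the function name `quantile_treatment_effects` and the synthetic scores are hypothetical, chosen only for illustration.

```python
import numpy as np

def quantile_treatment_effects(group_a, group_b, quantiles):
    """Difference between the quantiles of group_a and those of group_b."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    return np.quantile(a, quantiles) - np.quantile(b, quantiles)

# Synthetic residual scores (hypothetical, not the NELS data).
rng = np.random.default_rng(0)
small = rng.normal(loc=0.0, scale=8.0, size=2000)
large = small + 10.0  # pure location shift: the same gap at every quantile

quantiles = np.linspace(0.05, 0.95, 19)
qte = quantile_treatment_effects(large, small, quantiles)
print(np.round(qte, 3))
```

A "fairly uniform" disparity of the kind described for Figure 3 corresponds to a QTE vector that is roughly constant across quantiles, as in this constructed example.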

These striking results indicate that not only do smaller classes not lead to improvements in the subsequent distribution of test scores, net of a host of individual, family, class, teacher, and school attributes, but they actually result in inferior student performance according to the PR tests. Furthermore, this finding is robust across any social welfare function that is increasing in test scores. Finally, the corresponding unconditional distributions are unrankable (Table 1, Panel A), which indicates that observable attributes that are positively associated with test scores are negatively correlated with class size, consonant with positive selection into smaller classes through Tiebout sorting.

FIGURE 4 CDFs and integrated CDFs: full residual 10th grade test scores by 8th grade class size. Small class size is dominant category. Small, <20 students, Medium, 20–30 students, Large, >30 students.


FIGURE 5 CDFs and integrated CDFs: full residual 10th grade test scores by 8th grade class size. Medium class size is dominant category. Medium, 20–30 students, Large, >30 students.

As noted previously, however, a potential shortcoming of the PR tests is that differences in the class size specific returns to observables are not included in the PR distributions. As a result, any differences in the returns to observable characteristics are not attributable to the effects of class size, which may yield misleading inferences. The results of FR dominance tests, which account for such impacts of class size, are displayed in Panel B.[21]

The results indicate that the empirical distributions are rankable for two of the three comparisons: Small SSD Medium and Medium SSD Large.[22] However, neither ranking is statistically significant according to the simple bootstrap. In the former (latter) case, the simple bootstrap yields Pr(s ≤ 0) = 0.526 (0.586). The recentered bootstrap gives p-values above 0.70 in both cases, failing to reject the null H0: s = 0. In terms of the comparison of small and large classes, the FR distributions are unrankable.[23]
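A simple-bootstrap probability such as Pr(s ≤ 0) can be approximated along the following lines. This is a minimal sketch under stated assumptions, not the authors' implementation: it ignores the panel weights, resamples each group independently with replacement, and builds the grid of support points from the pooled sample. The function names `sd_statistics` and `simple_bootstrap_probabilities` are hypothetical, and the recentered bootstrap is not shown.

```python
import numpy as np

def sd_statistics(x, y, n_points=500):
    """Return (d, s) computed on an equally spaced grid of support points."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    # Assumption: the grid spans the 1st-99th percentiles of the pooled sample.
    lo, hi = np.percentile(np.concatenate([x, y]), [1, 99])
    grid = np.linspace(lo, hi, n_points)
    F = np.searchsorted(x, grid, side="right") / x.size  # empirical CDF of X
    G = np.searchsorted(y, grid, side="right") / y.size  # empirical CDF of Y
    d1, d2 = F - G, G - F
    s1, s2 = np.cumsum(d1), np.cumsum(d2)
    scale = np.sqrt(x.size * y.size / (x.size + y.size))
    d = scale * min(d1.max(), d2.max())
    s = scale * min(s1.max(), s2.max())
    return d, s

def simple_bootstrap_probabilities(x, y, n_boot=500, seed=0):
    """Share of bootstrap resamples with d* <= 0 and with s* <= 0."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d_star = np.empty(n_boot)
    s_star = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x, size=x.size, replace=True)
        yb = rng.choice(y, size=y.size, replace=True)
        d_star[b], s_star[b] = sd_statistics(xb, yb)
    return float(np.mean(d_star <= 0)), float(np.mean(s_star <= 0))

# Synthetic example (hypothetical data): X is shifted modestly to the right of Y.
rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=300)
x = rng.normal(0.3, 1.0, size=300)
pr_d, pr_s = simple_bootstrap_probabilities(x, y, n_boot=200)
print(pr_d, pr_s)
```

Read this way, a value such as Pr(s ≤ 0) = 0.526 says that an SSD ranking appears in only about half of the resamples, which is why the observed ranking is not regarded as statistically significant.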

Examination of Figures 4 and 5 yields three additional findings. First, while small classes second order dominate medium classes, but not large classes, according to the empirical distributions, there is little difference in actuality between the observed distributions for medium and large classes (when small classes are the dominant group). Second, small classes fare better than medium and large classes at the lower tail of the distribution

[21] In the tables of FR results, the distribution in the X column is treated as the dominant category; see Equation (7).

[22] While the CDFs for small and medium classes appear to cross in the extreme lower tail in the top row of Figure 3 (favoring medium classes and thereby precluding an SSD ranking in favor of small classes), Table 2 reports a finding of Small SSD Medium because of a “trimming” procedure. Specifically, in the empirical implementation, the support points used to obtain the test statistics (see Appendix A) are chosen to be equally spaced, beginning at the first percentile and ending at the 99th percentile of the empirical support. This process focuses attention away from extreme outliers.

[23] Note that SSD relations, especially using the FR distributions, do not obey a transitivity property; Small SSD Medium and Medium SSD Large is not sufficient to guarantee Small SSD Large. While this is true in general for SSD (not FSD), it is especially true in the FR tests since the Small versus Medium and Small versus Large comparisons utilize Small as the dominant category, while the Medium versus Large comparison uses Medium as the dominant group.


(below the median), while the converse holds above the median. This fact highlights the concavity assumption explicitly required of the social welfare function under the SSD ranking. Finally, medium classes outperform large classes (when medium classes are the dominant group) across the majority of the distribution; if not for a few crossings in the extreme tails, one would have observed an FSD ranking. However, the difference is of low magnitude as test scores differ by less than half a point (in absolute value) over the majority of the distribution.

These results, which suggest a modest, but statistically insignificant, monotonic ranking in favor of smaller classes, reverse the findings from the PR tests and indicate the importance of allowing the returns to observable attributes to vary by class size, as well as incorporating these differential returns into the residual distributions being compared. To verify that this is indeed the case, Table 3 presents the results from standard parametric Oaxaca–Blinder decompositions. Consonant with the change in results from the PR to the FR tests, Panel A indicates that the coefficients differ considerably across the three class size groups, with a significant advantage belonging to small classes (followed by medium classes). In particular, these results are driven primarily by greater returns to teacher race and education, school size, and lagged test scores. Thus an important benefit of smaller classes (in terms of subsequent test scores) appears to be a greater return to other observable attributes such as teacher education and students’ innate ability and/or previous educational inputs.
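An Oaxaca–Blinder decomposition of a mean gap into endowment, intercept, and coefficient components can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses unweighted OLS with an intercept, prices endowment differences at group Y's coefficients, and evaluates coefficient differences at group X's means; which weighting the paper uses is not stated in this section. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def ols_with_intercept(X, y):
    """OLS of y on a constant and X; returns (intercept, slopes)."""
    Z = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[0], coef[1:]

def oaxaca_blinder(X_x, y_x, X_y, y_y):
    """Split mean(y_x) - mean(y_y) into three additive components."""
    a_x, b_x = ols_with_intercept(X_x, y_x)
    a_y, b_y = ols_with_intercept(X_y, y_y)
    xbar_x, xbar_y = X_x.mean(axis=0), X_y.mean(axis=0)
    return {
        "observed_gap": y_x.mean() - y_y.mean(),
        "endowments": (xbar_x - xbar_y) @ b_y,   # different attributes
        "intercepts": a_x - a_y,                 # different constants
        "coefficients": xbar_x @ (b_x - b_y),    # different returns
    }

# Synthetic groups (hypothetical data, not the NELS sample).
rng = np.random.default_rng(0)
X_small = rng.normal(size=(400, 3))
X_large = rng.normal(loc=0.2, size=(500, 3))
y_small = 1.0 + X_small @ np.array([2.0, 1.0, 0.5]) + rng.normal(size=400)
y_large = 0.5 + X_large @ np.array([1.5, 1.0, 0.2]) + rng.normal(size=500)

parts = oaxaca_blinder(X_small, y_small, X_large, y_large)
for name, value in parts.items():
    print(f"{name}: {value:.3f}")
```

Because OLS with an intercept fits each group's mean exactly, the three components sum to the observed gap, and positive values favor group X.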

In the end, we believe that FR tests best isolate the effects of class size on the distribution of test scores and that the simple bootstrap is the more informative method of inference. Accordingly, we conclude that the unconditional distributions of test scores favor students in classes with 20 or more students below the 70th percentile, and students in classes with fewer than 20 students in the upper tail. However, after purging test scores

TABLE 3 Oaxaca–Blinder decompositions of mean test score gaps

Class size            Observed     Portion of observed gap due to differences in
X         Y           gap          Endowments    Intercepts    Coefficients

A. 8th Grade Class Size: 10th Grade Test Scores
Small     Medium       0.067        0.254         −3.067         2.880
Small     Large        0.029        0.863        −10.047         9.213
Medium    Large       −0.038        0.032         −6.979         6.910

B. 10th Grade Class Size: 12th Grade Test Scores
Small     Medium      −0.385       −0.267          1.272        −1.391
Small     Large       −1.939       −1.742         −2.085         1.888
Medium    Large       −1.554       −1.380         −3.357         3.183

Positive numbers in columns 4–6 indicate advantage to X; negative numbers indicate advantage to Y. See Table 2 for further details.
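As a quick arithmetic check on Table 3, the three components in each row should sum to the observed gap up to rounding. The short script below, which simply re-enters the printed values (the helper name `rounding_residuals` is hypothetical), reports the difference for each comparison.

```python
# (observed gap, endowments, intercepts, coefficients) as printed in Table 3.
table3 = {
    "A: Small vs. Medium": (0.067, 0.254, -3.067, 2.880),
    "A: Small vs. Large": (0.029, 0.863, -10.047, 9.213),
    "A: Medium vs. Large": (-0.038, 0.032, -6.979, 6.910),
    "B: Small vs. Medium": (-0.385, -0.267, 1.272, -1.391),
    "B: Small vs. Large": (-1.939, -1.742, -2.085, 1.888),
    "B: Medium vs. Large": (-1.554, -1.380, -3.357, 3.183),
}

def rounding_residuals(rows):
    """Observed gap minus the sum of the three components, per comparison."""
    return {
        name: round(gap - (endow + inter + coef), 3)
        for name, (gap, endow, inter, coef) in rows.items()
    }

residuals = rounding_residuals(table3)
for name, resid in residuals.items():
    print(f"{name}: residual {resid:+.3f}")
```

For example, in the first row 0.254 − 3.067 + 2.880 = 0.067, which matches the observed gap; no row differs by more than 0.001, consistent with rounding to three decimals.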


of the average impact of a host of observable attributes using class size specific returns to observables and invoking the selection on observables assumption, we find that test scores exhibit a monotonic second order SD relationship, albeit statistically insignificant at conventional levels, favoring smaller classes. Moreover, the advantage for small classes is confined solely to below the median and is mainly attributable to the advantageous returns to observables enjoyed by students in smaller classes. As a result, uniform rankings are only possible when the debate over class size is broadened to incorporate the dispersion of test scores into the evaluation of educational policy. Furthermore, these results suggest the possibility of a Pareto-improving reallocation of students whereby lower performing students are placed in smaller classes and higher performing students are placed in larger classes.

5.2.2. Twelfth Grade Test Scores

The PR and FR results for twelfth grade test scores are given in Panels C and D of Table 2, respectively. The corresponding plots are displayed in Figures 6–8. The PR results suggest a nonmonotonic effect of class size. Specifically, we observe Small FSD Medium, Large FSD Small, and Large FSD Medium. However, all three observed rankings are statistically insignificant at conventional levels according to the simple bootstrap. A second order ranking of Large over Small is statistically significant (Pr(s ≤ 0) = 0.934), while the other two comparisons are only marginally significant (Small SSD Medium: Pr(s ≤ 0) = 0.846; Large SSD Medium: Pr(s ≤ 0) = 0.890). Examination of Figure 6 confirms the previous results based on eighth grade class size (Figure 3): disparities in PR test scores (i.e., the QTEs) are fairly uniform across the entire distribution. Specifically, at any quantile of the distribution, students in large (small) classes score roughly two (one-quarter) points higher on twelfth grade tests than students in medium classes.

FIGURE 6 CDFs and integrated CDFs: partial residual 12th grade test scores by 10th grade class size. Small, <20 students, Medium, 20–30 students, Large, >30 students.


FIGURE 7 CDFs and integrated CDFs: full residual 12th grade test scores by 10th grade class size. Small class size is dominant category. Small, <20 students, Medium, 20–30 students, Large, >30 students.

As documented in the eighth grade class size results, however, the PR tests ignore a potentially important difference across class sizes: the returns to observable determinants of student achievement. The results of the FR dominance tests, displayed in Panel D, reveal that the three distributions are unrankable in the first- and second-degree sense. Examination of Figures 7 and 8 yields two additional findings. First, small classes fare better than medium and large classes only in the extreme lower tail of the distribution (roughly below the 15th percentile) when small classes are treated as the dominant group. Second, large classes outperform medium classes across the majority of the distribution, although the difference is of low magnitude as test scores differ by less than one-half point (in absolute value) over the majority of the distribution. This result is invariant to the choice of small or medium classes as the dominant group.

These results, which suggest little impact of class size on subsequent test scores, refute the findings from the PR tests and confirm our previous finding that the returns to observable attributes vary in an important

FIGURE 8 CDFs and integrated CDFs: full residual 12th grade test scores by 10th grade class size. Medium class size is dominant category. Medium, 20–30 students, Large, >30 students.


manner by class size. Specifically, Panel B of Table 3 verifies that, according to the standard parametric Oaxaca–Blinder decomposition, the return to observables is most favorable to medium classes, followed by small and then large classes. In particular, these results are driven by differences in the returns to teacher education and salary, teacher and student race, number of full-time teachers, and lagged test scores.

As stated previously, we believe that the FR tests best isolate the effects of class size on the distribution of test scores, and that the simple bootstrap is the more informative method of inference. Accordingly, the analysis using twelfth grade test scores largely reaffirms the conclusions we drew from the analysis of tenth grade test scores. Specifically, we conclude that the unconditional distribution of test scores favors smaller classes roughly only above the 70th percentile. However, after purging test scores of a host of observable attributes using class size specific returns, we conclude that the distribution of subsequent test scores is largely unaffected by class size, although students in the extreme lower tail are, if anything, aided by larger classes.

5.3. Disaggregation of “Small” Classes

As noted in Ehrenberg et al. (2001), target class size varies across states, with some states targeting classes with 15 students and others targeting classes of 20 students. In addition, the few experiments that have been conducted in the US typically involve changes in class size at the lower end of the class size distribution. For example, Tennessee’s Project STAR compared classes of 13–17 students with classes of 22–26 students. Wisconsin’s SAGE (Student Achievement Guarantee in Education) program reduced many classes from between 21 and 25 students to only 12 to 15 students. California’s CSRP reduced classes from roughly 30 to 20 students. To see if marginal changes in this range matter, we define two new class size groupings: Small I (10–15 students) and Small II (16–19 students).

5.3.1. Tenth Grade Test Scores

The results pertaining to the tenth grade test score distributions are displayed in Panels A–C of Table 4. The CDFs, integrated CDFs, and differences in the CDFs are given in Figures 9–11. In terms of the unconditional empirical distribution (Panel A), we observe Small II SSD Small I. The observed dominance relation favoring the relatively larger class size is consonant with the previous unconditional tests examining tenth grade class size (Table 1). However, the second order ranking is not statistically significant at conventional levels according to the simple bootstrap (Pr(s ≤ 0) = 0.696). The recentered bootstrap fails to reject the null H0: s = 0 (p-value = 0.926).


TABLE 4 Unconditional and conditional stochastic dominance tests: detailed comparison of small classes

Distributions are X = Small I and Y = Small II throughout. Columns d1, d2, and d report d1,MAX, d2,MAX, and d (first order dominance); columns s1, s2, and s report s1,MAX, s2,MAX, and s (second order dominance). Rows labeled Pr(* ≤ 0) and Pr(* ≥ obs.) give the bootstrap probabilities Pr(d1* ≤ 0), Pr(d1* ≥ d1), and so on, for the statistic in that column.

                              d1         d2         d          s1         s2         s

A. 8th Grade Class Size: 10th Grade Test Scores (Unconditional); observed ranking: II SSD I
  Statistic                   1.943      0.278      0.278      282.157    −0.285     −0.285
  Simple: Pr(* ≤ 0)           0.000      0.114      0.114      0.002      0.694      0.696
  Simple: Pr(* ≥ obs.)        0.564      0.692      0.692      0.536      0.578      0.576
  Recentered: Pr(* ≤ 0)       0.016      0.008      0.024      0.172      0.200      0.372
  Recentered: Pr(* ≥ obs.)    0.016      0.898      0.802      0.042      0.946      0.926

B. 8th Grade Class Size: 10th Grade Test Scores (Partial Residual); observed ranking: I FSD II
  Statistic                   −0.360     9.592      −0.360     −0.369     2263.625   −0.369
  Simple: Pr(* ≤ 0)           0.794      0.122      0.916      0.824      0.136      0.960
  Simple: Pr(* ≥ obs.)        0.330      0.640      0.262      0.316      0.626      0.234
  Recentered: Pr(* ≤ 0)       0.284      0.260      0.544      0.472      0.330      0.802
  Recentered: Pr(* ≥ obs.)    0.800      0.416      0.664      0.606      0.330      0.462

C. 8th Grade Class Size: 10th Grade Test Scores (Full Residual); observed ranking: None
  Statistic                   2.296      1.191      1.191      239.626    96.319     96.319
  Simple: Pr(* ≤ 0)           0.000      0.000      0.000      0.084      0.000      0.084
  Simple: Pr(* ≥ obs.)        0.898      0.904      0.904      0.486      0.944      0.724
  Recentered: Pr(* ≤ 0)       0.000      0.000      0.000      0.238      0.012      0.250
  Recentered: Pr(* ≥ obs.)    0.448      0.890      0.792      0.128      0.786      0.130

D. 10th Grade Class Size: 12th Grade Test Scores (Unconditional); observed ranking: None
  Statistic                   0.966      0.755      0.755      103.464    11.552     11.552
  Simple: Pr(* ≤ 0)           0.000      0.000      0.000      0.016      0.400      0.416
  Simple: Pr(* ≥ obs.)        0.752      0.684      0.564      0.550      0.500      0.426
  Recentered: Pr(* ≤ 0)       0.018      0.016      0.034      0.176      0.200      0.376
  Recentered: Pr(* ≥ obs.)    0.250      0.378      0.056      0.270      0.596      0.248

E. 10th Grade Class Size: 12th Grade Test Scores (Partial Residual); observed ranking: I FSD II
  Statistic                   −0.558     8.300      −0.558     −0.585     2025.631   −0.585
  Simple: Pr(* ≤ 0)           0.794      0.088      0.882      0.840      0.100      0.940
  Simple: Pr(* ≥ obs.)        0.516      0.608      0.508      0.528      0.576      0.498
  Recentered: Pr(* ≤ 0)       0.206      0.304      0.510      0.312      0.408      0.720
  Recentered: Pr(* ≥ obs.)    0.932      0.272      0.802      0.870      0.220      0.718

F. 10th Grade Class Size: 12th Grade Test Scores (Full Residual); observed ranking: None
  Statistic                   2.068      1.104      1.104      150.144    132.353    132.353
  Simple: Pr(* ≤ 0)           0.000      0.000      0.000      0.146      0.000      0.146
  Simple: Pr(* ≥ obs.)        0.978      0.986      0.986      0.556      0.976      0.580
  Recentered: Pr(* ≤ 0)       0.000      0.000      0.000      0.202      0.000      0.202
  Recentered: Pr(* ≥ obs.)    0.586      0.922      0.880      0.202      0.722      0.034

Small I classes have 10–15 students; Small II classes have 16–19 students. See Table 1 for further details.


FIGURE 9 CDFs and integrated CDFs: unconditional 10th grade test scores by 8th grade class sizes. Small I, 10–15 students, Small II, 16–19 students.

Examining the actual plots (Figure 9), we see that the Small II distribution lies to the right of the Small I distribution over the majority of the support. In fact, if not for a few crossings, we would have observed a first order ranking. Moreover, it is interesting to note that the unconditional test score gap favoring Small II classes gets wider in the upper tail (i.e., the QTEs are increasing across the quantiles). This contrasts with previous unconditional eighth grade class size results in Figure 1, where (the previously defined) small classes fared better than medium and large classes only in the upper tail.

The PR test result is displayed in Panel B. Now, we observe Small I FSD Small II. Given the reversal in ranking from the unconditional comparison, this implies that attributes associated with higher test scores are positively correlated with class size over the range being analyzed, consistent with compensatory policies in this range by schools. Moreover, the first order ranking is statistically significant at conventional levels according to the simple bootstrap (Pr(d ≤ 0) = 0.916). The recentered bootstrap fails to reject the null H0: d = 0 or s = 0 (p-values above 0.46). Examination of Figure 10 reveals sizeable disparities in PR test scores that are fairly uniform

FIGURE 10 CDFs and integrated CDFs: partial residual 10th grade test scores by 8th grade class sizes. Small I, 10–15 students, Small II, 16–19 students.


FIGURE 11 CDFs and integrated CDFs: full residual 10th grade test scores by 8th grade class sizes. Small I class as dominant category. Small I, 10–15 students, Small II, 16–19 students.

across the entire distribution: at any quantile of the distribution, students in classes with 10 to 15 students score roughly four points higher than students in classes with 16 to 19 students.

The results of the FR dominance tests are displayed in Panel C. Examination of the results reveals that the first order ranking found in the PR test disappears. The elimination of the previous FSD ranking implies that the returns to observables favor classes with 16 to 19 students. Indeed, this is confirmed in Panel A of Table 5, where the returns on the whole favor Small II classes. In particular, the returns to teacher education and salary, as well as the number of white teachers, yield the primary discrepancies in favor of classes with 16 to 19 students. Thus the improvement in returns as class size diminishes documented in the previous section (Panel A, Table 3) is not monotonic.

Viewing the actual plots in Figure 11 shows that the disparity in test scores favors classes with 10 to 15 students in the lower tail of the distribution (below the 30th percentile). However, the area between the CDFs (i.e., the cumulative sum, in absolute terms, of the QTEs) is greater above the 30th percentile, and this precludes a finding of SSD in favor of classes with 10 to 15 students. In the end, then, given our stated preference for the FR tests, disaggregation of eighth grade small classes indicates

TABLE 5 Oaxaca–Blinder decompositions of mean test score gaps

Class size            Observed     Portion of observed gap due to differences in
X         Y           gap          Endowments    Intercepts    Coefficients

A. 8th Grade Class Size: 10th Grade Test Scores
Small I   Small II     1.430        1.332          1.079        −0.982

B. 10th Grade Class Size: 12th Grade Test Scores
Small I   Small II    −0.409       −0.023          6.061        −6.447

See Table 3.


FIGURE 12 CDFs and integrated CDFs: unconditional 12th grade test scores by 10th grade class sizes. Small I, 10–15 students, Small II, 16–19 students.

that reduction in class size below 16 students only aids the lowest scoring of students and does not enjoy unambiguous support by all preference functions in the class U2.

5.3.2. Twelfth Grade Test Scores

The results analyzing the twelfth grade test score distributions are given in Panels D–F of Table 4. The CDFs, integrated CDFs, and differences in the CDFs are given in Figures 12–14. In terms of the unconditional distributions, we are unable to rank the distributions in either the first- or second-degree sense. Noteworthy, however, is that the recentered bootstrap rejects the null H0: d = 0 (p-value = 0.056), rejecting both strict dominance and equality of the distributions. Examining Figure 12, we see that, consonant with the earlier results comparing small, medium, and large classes, the relatively large Small II classes outperform Small I classes up to approximately the 70th percentile. Thus, as is the case for eighth grade class size, there is modest evidence of an unconditional advantage

FIGURE 13 CDFs and integrated CDFs: partial residual 12th grade test scores by 10th grade class sizes. Small I, 10–15 students, Small II, 16–19 students.


for classes with 16 to 19 students over those with 10 to 15 students, except for the highest performing students.

The PR dominance test results are displayed in Panel E. As with tenth grade test scores, we find that Small I FSD Small II. This continues to imply a positive correlation between attributes improving test scores and class size for students in classes with fewer than 20 students. However, unlike the previous results for eighth grade class size, this ranking is only marginally statistically significant according to the simple bootstrap (Pr(d ≤ 0) = 0.882); the second order ranking is statistically significant (Pr(s ≤ 0) = 0.940). The recentered bootstrap fails to reject the null H0: d = 0 or s = 0.

Examination of Figure 13 reveals a sizeable, fairly uniform disparity in PR test scores: at any quantile of the distribution, students in tenth grade classes with 10 to 15 students score roughly three to four points higher than students in classes with 16 to 19 students. However, the distributions come close to crossing in the extreme lower tail, and apparently do cross sufficiently frequently in the bootstrap resamples to preclude a stronger statistically significant FSD ranking.

Lastly, the FR test results are displayed in Panel F. As with the previous section examining tenth grade test scores, we are unable to rank the empirical distributions. Moreover, the recentered bootstrap rejects the null H0: s = 0 (p-value = 0.034), rejecting both strict second order dominance and equality of the distributions. The elimination of the FSD ranking obtained in Panel E implies that the returns to observables favor classes with 16 to 19 students. Panel B in Table 5 confirms this.[24] Figure 14 shows further consistency with the previous tenth grade test score results: the disparity in test scores favors classes with 10 to 15 students in the lower tail of the distribution (below the 40th percentile). However, since the area between the CDFs (i.e., the cumulative sum (in absolute terms) of the QTEs) is greater above the 40th percentile, this precludes a finding of SSD in favor of classes with 10 to 15 students. Consequently, as stated above, preferences for classes of 10 to 15 or 16 to 19 students may differ among individuals with different welfare functions in the class U2.

6. CONCLUSION

Identifying the factors most relevant to student achievement is important to parents, as well as policymakers concerned with maximizing the use of funds available for public schools. However, the impact of improved student achievement is potentially even more far-reaching, extending beyond school walls through its effect on completed education,

[24] Some of the returns that differed most across class size groupings are for size of the tenth grade student body, student race, the number of white teachers, and teacher experience.


FIGURE 14 CDFs and integrated CDFs: full residual 12th grade test scores by 10th grade class sizes. Small I class size is dominant category. Small I, 10–15 students, Small II, 16–19 students.

future earnings, racial disparities, and economic competitiveness. While the factor most consistently discussed is class size, given the ease with which it may be manipulated by policy, previous empirical examinations of the impact of class size reductions have been mixed, with the results suggesting at best a modest impact. Despite this lack of convincing support, the US federal government allocated $12 billion (over a seven-year period) to reduce class sizes (Hoxby, 2000a), the state of California has spent over $3.6 billion on class size reduction since 1996, 20 US states are currently undertaking or discussing policies to reduce class sizes, and the Dutch government decided to allocate approximately $500 million (in US dollars) to reduce class sizes (Levin, 2001).

To investigate these previous mixed results, we analyze the relationship between class size and student achievement using recently developed tests for stochastic dominance. The tests are nonparametric and utilize information on the entire distribution of test scores. Moreover, through the use of bootstrap techniques, we are able to report the results of the dominance tests to a degree of statistical certainty. Thus a finding of first or second degree dominance is extremely powerful, implying that any social welfare function that is increasing in test score levels (FSD) or increasing and concave (SSD) will prefer one distribution over another. This type of analysis is useful for policy decisions as it lends itself to broad-based, consensus ranking of outcomes. Furthermore, the absence of such rankings is equally informative, implying that any judgment of one distribution over another is subjective.

Using data from the National Education Longitudinal Study of 1988, we estimate the effects of eighth and tenth grade class size on the unconditional and conditional distributions of test scores in reading, social studies, mathematics, and science. The results shed some light on the previous ambiguous findings and should prove quite informative for future educational policy discussions. First, we find that the unconditional


distributions of test scores favor students in larger classes below the 70th percentile or so, and students in smaller classes in the upper tail. In particular, while marginal changes in class size in classes with at least 20 students are not associated with changes in the unconditional test score distributions, the unconditional test scores of low achieving (high achieving) students are superior in classes with at least (less than) 20 students. Second, analyzing the conditional distribution of test scores, using class-size specific returns, we find little causal effect of marginal reductions in class size on test scores within the range of 20 or more students. However, reductions in class size from above 20 students to below 20 students, as well as marginal reductions in classes with fewer than 20 students, increase test scores for students below the median, but decrease test scores for students above the median.

Our methodology and findings point to several possible culprits when trying to make sense of the current empirical evidence on class size. First, reducing class size by one student does not have a constant effect: reductions in classes above 20 students have essentially no impact, while reductions in classes with 20 or fewer students raise the test scores of some students (those below the median) and lower the test scores of other students (those above the median). Second, whereas the majority of the beneficial impact of class size reduction arises because of the productivity-enhancing effect that it has on other educational inputs, the majority of previous studies allow, at most, the impact of class size to vary by race or gender.

A. APPENDIX: TECHNICAL DETAILS

A.1. Computation of d and s

The test for FSD requires

(i) Computing the values of $F(z_j)$ and $G(z_j)$ for $z_j$, $j = 1, \ldots, J$, where $J$ denotes the number of points in the support that are utilized ($J = 500$ in the application, where the points are equally spaced beginning at the first percentile and ending at the 99th percentile of the empirical support, to focus attention away from extreme values)

(ii) Computing the differences $d_1(z_j) = F(z_j) - G(z_j)$ and $d_2(z_j) = G(z_j) - F(z_j)$

(iii) Finding $d = \sqrt{\frac{NM}{N+M}} \, \min\{\max\{d_1\}, \max\{d_2\}\}$.

If $d \le 0$ (to a degree of statistical certainty), then the null of FSD is not rejected. Furthermore, if $d \le 0$ and $\max\{d_1\} < 0$, then $X$ FSD $Y$. On the other hand, if $d \le 0$ and $\max\{d_2\} < 0$, then $Y$ FSD $X$. If $d = \max\{d_1\} = \max\{d_2\} = 0$, then the (estimated) distributions of $X$ and $Y$ are identical.
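As an illustration only, steps (i) to (iii) can be sketched in Python. This is not the authors' code. The function names (`ecdf`, `percentile`, `make_grid`, `fsd_statistic`) are hypothetical, the CDFs are plain empirical step functions, and the grid is assumed to run from the 1st to the 99th percentile of the pooled sample, which is one possible reading of "the empirical support".

```python
import math
from bisect import bisect_right


def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at every grid point."""
    s = sorted(sample)
    return [bisect_right(s, z) / len(s) for z in grid]


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def make_grid(x, y, J=500):
    """J equally spaced points between the 1st and 99th percentiles.

    Assumption: percentiles are taken over the pooled sample of x and y.
    """
    pooled = sorted(list(x) + list(y))
    a, b = percentile(pooled, 1), percentile(pooled, 99)
    return [a + (b - a) * j / (J - 1) for j in range(J)]


def fsd_statistic(x, y, grid):
    """Return (d, max d1, max d2) following steps (i)-(iii)."""
    F, G = ecdf(x, grid), ecdf(y, grid)           # step (i)
    d1 = [f - g for f, g in zip(F, G)]            # step (ii)
    d2 = [g - f for f, g in zip(F, G)]
    n, m = len(x), len(y)
    scale = math.sqrt(n * m / (n + m))
    d = scale * min(max(d1), max(d2))             # step (iii)
    return d, max(d1), max(d2)
```

With `x` and `y` as lists of test scores, `fsd_statistic(x, y, make_grid(x, y))` returns $d$ together with $\max\{d_1\}$ and $\max\{d_2\}$, the three quantities the decision rule above inspects.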

The test for SSD requires the following additional steps:

(i) Calculating the sums $s_{1j} = \sum_{k=1}^{j} d_1(z_k)$ and $s_{2j} = \sum_{k=1}^{j} d_2(z_k)$, $j = 1, \ldots, J$

(ii) Finding $s = \sqrt{\frac{NM}{N+M}} \, \min\{\max\{s_{1j}\}, \max\{s_{2j}\}\}$

If $s \le 0$ (to a degree of statistical certainty), then the null of SSD is not rejected. Moreover, if $s \le 0$ and $\max\{s_{1j}\} < 0$, then $X$ SSD $Y$; otherwise, if $\max\{s_{2j}\} < 0$, then $Y$ SSD $X$.
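The additional SSD steps amount to running sums of the CDF differences over the grid. The sketch below is a hedged illustration, not the authors' implementation: `ecdf` and `ssd_statistic` are hypothetical names, the toy data are made up, and the sums are left unscaled by the grid spacing, as in the description above.

```python
import math
from bisect import bisect_right
from itertools import accumulate


def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at every grid point."""
    s = sorted(sample)
    return [bisect_right(s, z) / len(s) for z in grid]


def ssd_statistic(x, y, grid):
    """Return (s, max s1j, max s2j) following the two SSD steps."""
    F, G = ecdf(x, grid), ecdf(y, grid)
    d1 = [f - g for f, g in zip(F, G)]
    d2 = [g - f for f, g in zip(F, G)]
    s1 = list(accumulate(d1))                     # step (i): running sums
    s2 = list(accumulate(d2))
    n, m = len(x), len(y)
    scale = math.sqrt(n * m / (n + m))
    s = scale * min(max(s1), max(s2))             # step (ii)
    return s, max(s1), max(s2)


# Toy data (made up): X is concentrated at 6, Y is split between 0 and 10.
x = [6] * 100
y = [0, 10] * 50
grid = list(range(11))
s, max_s1, max_s2 = ssd_statistic(x, y, grid)
```

In this toy case the two empirical CDFs cross, so neither distribution first-order dominates the other, yet $\max\{s_{1j}\} = -0.5 < 0$, which under the rule above corresponds to $X$ SSD $Y$.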

A.2. The Recentered Bootstrap

To obtain recentered bootstrap p-values, we compute the relative frequency of $\{d^{**} > d\}$, where $d^{**}$ is the recentered bootstrap estimate of $d$. The recentering algorithm requires

(i) Generating bootstrap samples of size $N$ ($M$) from $X$ ($Y$)

(ii) Computing the values of $F^*(z_j)$ and $G^*(z_j)$ for $z_j$, $j = 1, \ldots, J$, where the values of $z_j$ used to analyze the original sample are utilized

(iii) Computing the differences $d_1^c(z_j) = [F^*(z_j) - G^*(z_j)] - [F(z_j) - G(z_j)]$ and $d_2^c(z_j) = [G^*(z_j) - F^*(z_j)] - [G(z_j) - F(z_j)]$

(iv) Finding $d^{**} = \sqrt{\frac{NM}{N+M}} \, \min\{\max\{d_1^c\}, \max\{d_2^c\}\}$

We then compute the relative frequency of $\{d^{**} > d\}$ across the bootstrap samples, where $d$ is the estimate obtained from the original sample.
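A minimal sketch of the recentering loop for the FSD statistic follows. It is not the authors' code and rests on several assumptions: resampling is with replacement and independent across the two samples, the names `ecdf`, `min_max_stat`, and `recentered_bootstrap_pvalue` are hypothetical, and the defaults `reps=200` and `seed=0` are arbitrary illustrative choices.

```python
import math
import random
from bisect import bisect_right


def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at every grid point."""
    s = sorted(sample)
    return [bisect_right(s, z) / len(s) for z in grid]


def min_max_stat(diff, n, m):
    """sqrt(NM/(N+M)) * min(max(diff), max(-diff))."""
    scale = math.sqrt(n * m / (n + m))
    return scale * min(max(diff), max(-v for v in diff))


def recentered_bootstrap_pvalue(x, y, grid, reps=200, seed=0):
    """Return (d, relative frequency of d** > d)."""
    rng = random.Random(seed)
    n, m = len(x), len(y)
    # F - G on the original sample; G - F is its negation.
    base = [f - g for f, g in zip(ecdf(x, grid), ecdf(y, grid))]
    d = min_max_stat(base, n, m)
    exceed = 0
    for _ in range(reps):
        xb = rng.choices(x, k=n)                  # step (i): resample X
        yb = rng.choices(y, k=m)                  # step (i): resample Y
        star = [f - g for f, g in zip(ecdf(xb, grid), ecdf(yb, grid))]  # (ii)
        dc1 = [s - b for s, b in zip(star, base)]  # step (iii): recenter
        if min_max_stat(dc1, n, m) > d:            # step (iv), compared with d
            exceed += 1
    return d, exceed / reps
```

Because $d_2^c = -d_1^c$ at every grid point, the sketch builds only $d_1^c$ and negates it inside `min_max_stat`. The same loop could in principle be applied to $s$ by recentering the running sums, though that extension is not spelled out in the text above.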

8. ACKNOWLEDGMENTS

The authors are extremely grateful to David Drukker for assistance; to Julian Betts, Eric Hanushek, and Christopher Jepsen for comments; and to seminar participants at the 8th Annual Texas Camp Econometrics, Georgetown University, University of Texas, Arlington, and the SOLE Labor Economics Internet Seminar.

REFERENCES

Abadie, A. (2002). Bootstrap tests for distributional treatment effects in instrumental variablemodels. Journal of the American Statistical Association 97:284–292.

Akerhielm, K. (1995). Does class size matter?, Economics of Education Review 14:229–241.Amin, S., Rai, A. S., Topa, G. (2003). Does microcredit reach the poor and vulnerable? Evidence

from Northern Bangladesh. Journal of Development Economics 70:59–82.Anderson, G. (1996). Nonparametric tests of stochastic dominance in income distributions.

Econometrica 64:1183–1193.

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alif

orni

a Sa

nta

Cru

z] a

t 10:

26 1

4 N

ovem

ber

2014

Page 36: Class Size and Educational Policy: Who Benefits from Smaller Classes?

Class Size and Educational Policy 367

Angrist, J., Lavy, V. (1999). Using Maimonides’ rule to estimate the effect of class size on scholasticachievement. Quarterly Journal of Economics 114:533–575.

Barro, R. J. (2001). Human capital and growth. American Economic Review 91:12–17.Betts, J. R., Shkolnik, J. L. (2000). The effects of ability grouping on student math achievement and

resource allocation in secondary schools. Economics of Education Review 19:1–15.Bishop, J. A., Formby, J. P., Zeager, L. A. (2000). The effect of food stamp cashout on

undernutrition. Economics Letters 67:75–85.Bitler, M. P., Gelbach, J. B., Hoynes, H. H. (2005). Distributional impacts of the self-sufficiency

project. NBER Working Paper No. 11626.Boozer, M. A., Rouse, C. (2001). Intraschool variation in class size: patterns and implications. Journal

of Urban Economics 50:163–189.Card, D., Krueger, A. B. (1996). School resources and student outcomes: an overview of the

literature and new evidence from North and South Carolina. Journal of Economics Perspectives10:31–50.

Chubb, J. E., Moe, T. M. (1990). Politics, Markets and America’s Schools. Washington, DC: BrookingsInstitution Press.

Coleman, J. S., et al. (1966). Equality of Educational Opportunity. Washington, DC: Department ofHealth, Education, and Welfare.

Cook, M. D., Evans, W. E. (2000). Families or schools? Explaining the convergence in white andblack academic performance. Journal of Labor Economics 18:729–754.

Dearden, L., Ferri, J., Meghir, C. (2002). The effect of school quality on educational attainmentand wages. Review of Economics and Statistics 84:1–20.

Dobbelsteen, S., Levin, J., Oosterbeek, H. (2002). The causal effect of class size on scholasticachievement: distinguishing the pure class size effect from the effect of changes in classcomposition. Oxford Bulletin of Economics and Statistics 64:17–38.

Dustmann, C. (2003). The class size debate and educational mechanisms: editorial. Economic Journal113:F1–F2.

Ehrenberg, R. G., Brewer, D. J., Gamoran, A., Willms, J. D. (2001). Class size and studentachievement. Psychological Science in the Public Interest 2:1–30.

Eide, E., Showalter, M. H. (1998). The effect of school quality on student performance: a quantileregression approach. Economics Letters 58:345–350.

Fertig, M. (2003a). Who’s to blame? The determinants of German students’ achievement in thePISA 2000 study. IZA Discussion Paper 739.

Fertig, M. (2003b). Educational production, endogenous peer group formation and classcomposition—evidence from the PISA 2000 study. IZA Discussion Paper 714.

Figlio, D. N., Paige, M. E. (2002). School choice and the distributional effects of ability tracking:does separation increase inequality? Journal of Urban Economics 51:497–514.

Fisher, G., Wilson, D., Xu, K. (1998). An empirical analysis of term premiums using significancetests for stochastic dominance. Economics Letters 60:195–203.

Goldhaber, D., Brewer, D. (1997). Why don’t schools and teachers seem to matter? Assessing theimpact of unobservables on educational productivity. Journal of Human Resources 32m:505–523.

Hanushek, E. A. (1979). Conceptual and empirical issues in the estimation of educationalproduction functions. Journal of Human Resources 14:351–388.

Hanushek, E. A. (1986). The economics of schooling: production and efficiency in public schools.Journal of Economic Literature 24:1141–1177.

Hanushek, E. A. (1989). The impact of differential expenditures on school performance. EducationalResearcher 18:45–51.

Hanushek, E. A. (1996). School resources and student performance. In: Burtless, G., ed. Does MoneyMatter? The Effect of School Resources on Student Achievement and Adult Success. Washington, DC:Brookings Institution, pp. 43–73.

Hanushek, E. A. (2001). Black-white achievement differences and government interventions.American Economic Review 91:24–28.

Hanushek, E. A. (2002). Evidence, politics, and the class size debate. In: Mishel, L., Rothstein, R.,eds. The Class Size Debate. Washington, DC: Economic Policy Institute, pp. 37–65.

Hanushek, E. A. (2003). The failure of input-based schooling policies. Economic Journal 113:F64–F98.Hanushek, E. A., Kimko, D. D. (2000). Schooling, labor-force quality, and the growth of nations.

American Economic Review 90:1184–1208.

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alif

orni

a Sa

nta

Cru

z] a

t 10:

26 1

4 N

ovem

ber

2014

Page 37: Class Size and Educational Policy: Who Benefits from Smaller Classes?

368 E. Maasoumi et al.

Hanushek, E. A., Rivkin, S. G., Taylor, L. L. (1996). Aggregation and estimated effects of schoolresources. Review of Economics and Statistics 78:611–627.

Hoxby, C. M. (2000a). The effects of class size on student achievement: new evidence frompopulation variation. Quarterly Journal of Economics 90:1239–1285.

Hoxby, C. M. (2000b). Does competition among public schools benefit students and taxpayers?American Economic Review 90:1209–1238.

Jenks, C., Phillips, M., eds. (1998). The Black-White Test Score Gap. Washington, D.C.: BrookingsInstitution Press.

Jepsen, C., Rivkin, S. (2002). What is the tradeoff between smaller classes and teacher quality? NBERWorking Paper 9205.

Kaur, A., Prakasa Rao, B. L. S., Singh, H. (1994). Testing for second-order stochastic dominance oftwo distributions. Econometric Theory 10:849–866.

Krueger, A. B. (1999). Experimental estimates of education production functions. Quarterly Journalof Economics 1144:497–532.

Krueger, A. B. (2003). Economic considerations and class size. Economic Journal 113:F34–F63.Krueger, A. B., Whitmore, D. M. (2002). Would smaller classes help close the black-white

achievement gap? In: Chub, J., Loveless, T., eds. Bridging the Achievement Gap. Washington,DC: Brookings Institute Press.

Lazear, E. P. (2001). Educational production function. Quarterly Journal of Economics 116:777–801.Levin, J. (2001). For whom the reductions count: a quantile regression analysis of class size and

peer effects on stochastic achievement. Empirical Economics 26:221–246.Linton, O., Maasoumi, E., Whang, Y. J. (2005). Consistent testing for stochastic dominance: a

subsampling approach. Review of Economic Studies 72:735–765.Maasoumi, E., Heshmati, A. (2000). Stochastic dominance amongst swedish income distributions.

Econometric Reviews 19:287–320.Maasoumi, E., Heshmati, A. (2005). Evaluating dominance ranking of PSID incomes by various

household attributes. IZA Discussion Paper No. 1727.Maasoumi, E., Millimet, D. L. (2005). Robust inference concerning recent trends in U.S.

environmental quality. Journal of Applied Econometrics 20:55–77.McFadden, D. (1989). Testing for stochastic dominance. In: Part II of Fomby, T., Seo, T. K., eds.

Studies in the Economics of Uncertainty (in honor of J. Hadar). Springer-Verlag, pp. 113–134.Neuman, S., Oaxaca, R. L. (2004). Wage decompositions with selectivity corrected wage equations:

A methodological note. Journal of Economic Equality 2:3–10.Rivkin, S., Hanushek, E. A., Kain, J. (2002). Teachers, schools, and academic achievement.

Unpublished manuscript, Hoover Institution, Stanford University.Summers, A., Wolfe, B. (1977). Do schools make a difference? American Economic Review 56:639–652.Tiebout, C. M. (1956). A pure theory of local expenditures. Journal of Political Economy 64:416–424.Todd, P. E., Wolpin, K. I. (2003). On the specification and estimation of the production function

for cognitive achievement. Economic Journal 113:F3–F33.Wößmann, L. (2003). Schooling resources, educational institutions and student performance: the

international evidence. Oxford Bulletin of Economics and Statistics 65:117–170.

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alif

orni

a Sa

nta

Cru

z] a

t 10:

26 1

4 N

ovem

ber

2014