The impact of teacher subject knowledge on student achievement: Evidence from within-teacher within-student variation
Johannes Metzler a, Ludger Woessmann b,⁎

a University of Munich and Monitor Group, Germany
b University of Munich and Ifo Institute for Economic Research, CESifo and IZA, Poschingerstr. 5, 81679 Munich, Germany

Journal of Development Economics 99 (2012) 486–496. © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.jdeveco.2012.06.002

Article history: Received 18 October 2011; received in revised form 8 June 2012; accepted 9 June 2012.
JEL classification: I20; O15.
Keywords: Teacher knowledge; Student achievement; Peru.

Abstract

Teachers differ greatly in how much they teach their students, but little is known about which teacher attributes account for this. We estimate the causal effect of teacher subject knowledge on student achievement using within-teacher within-student variation, exploiting a unique Peruvian 6th-grade dataset that tested both students and their teachers in two subjects. Observing teachers teaching both subjects in one-classroom-per-grade schools, we circumvent omitted-variable and selection biases using a correlated random effects model that identifies from differences between the two subjects. After measurement-error correction, one standard deviation in subject-specific teacher achievement increases student achievement by about 9% of a standard deviation in math. Effects in reading are significantly smaller and mostly not significantly different from zero. Effects depend on the teacher-student match in ability and gender.

⁎ Corresponding author. E-mail addresses: jo.metzler@gmail.com (J. Metzler), woessmann@ifo.de (L. Woessmann). URL: http://www.cesifo.de/woessmann (L. Woessmann).

We thank Sandy Black, Jere Behrman, David Card, Paul Glewwe, Eric Hanushek, Patrick Kline, Michael Kremer, Javier Luque, Lant Pritchett, Martin West, the editor Duncan Thomas, two anonymous referees, and seminar participants at UC Berkeley, Harvard University, the University of Munich, the Ifo Institute, the Glasgow congress of the European Economic Association, the CESifo Area Conference on Economics of Education, the IZA conference on Cognitive and Non-Cognitive Skills, and the Frankfurt conference of the German Economic Association for helpful discussion and comments. Special gratitude goes to Andrés Burga León of the Ministerio de Educación del Perú for providing us with reliability statistics for the teacher tests. Woessmann is thankful for the support and hospitality by the W. Glenn Campbell and Rita Ricardo-Campbell National Fellowship of the Hoover Institution, Stanford University. Support by the Pact for Research and Innovation of the Leibniz Association and by the Regional Bureau for Latin America and the Caribbean of the United Nations Development Programme is also gratefully acknowledged.

1. Introduction

One of the biggest puzzles in educational production today is the teacher quality puzzle: while there is clear evidence that teacher quality is a key determinant of student learning, little is known about which specific observable characteristics of teachers can account for this impact (e.g., Hanushek and Rivkin, 2010; Rivkin et al., 2005; Rockoff, 2004). In particular, there is little evidence that those characteristics most often used in hiring and salary decisions, namely teachers' education and experience, are crucial for teacher quality. Virtually the only attribute that has been shown to be more frequently significantly correlated with student achievement is teachers' academic skills measured by scores on achievement tests (cf. Eide et al., 2004; Hanushek and Rivkin, 2006; Wayne and Youngs, 2003). The problem with the latter evidence, however, is that issues of omitted variables and non-random selection are very hard to address when estimating causal effects of teacher characteristics.¹ In this paper, we estimate the impact of teachers' academic skills on student achievement, circumventing problems from unobserved teacher traits and non-random sorting of students to teachers.

¹ Recently, Lavy (2011) estimates the effect of teaching practices on student achievement.

We use a unique primary-school dataset from Peru that contains test scores in two academic subjects not only for each student, but also for each teacher. This allows us to identify the impact of teachers' academic performance in a specific subject on students' academic performance in the subject, while at the same time holding constant any student characteristics and any teacher characteristics that do not differ across subjects. We can observe whether the same student taught by the same teacher in two different academic subjects performs better in one of the subjects if the teacher's knowledge is relatively better in this subject. We develop a correlated random effects model that generalizes conventional fixed-effects models with fixed effects for students, teachers, and subjects. Our model identifies the effect based on within-teacher within-student variation and allows us to test the overidentification restrictions implicit in the fixed-effects models. We can additionally restrict our analysis to small, mostly rural schools with at most one teacher per grade, excluding any remaining possibility that parents may have chosen a specific teacher for their students in both subjects based on the teacher's specific knowledge in one subject, or of bias from non-random classroom assignments more generally (cf. Rothstein, 2010).

We find that teacher subject knowledge exerts a statistically and quantitatively significant impact on student achievement. Once corrected for measurement error in teacher test scores, our results suggest that a one standard deviation increase in teacher test scores raises student test scores by about 9% of a standard deviation in math. Results in reading are significantly smaller and indistinguishable from zero in most specifications. We also find evidence that the effect depends on the ability and gender match between teachers and students, in that above-median students are not affected by variation in teacher subject knowledge when taught by below-median teachers and that female students are not affected by variation in teacher subject knowledge when taught by male teachers. The average math result means that if a student was moved, for example, from a teacher at the 5th percentile of the distribution of teacher math knowledge to a teacher at the median of this distribution, the student's math achievement would increase by 14.3% of a standard deviation by the end of the school year. Thus, teacher subject knowledge is a relevant observable factor that is part of overall teacher quality.

Methodologically, our identification strategy for teacher effects extends the within-student comparisons in two different subjects proposed by Dee (2005, 2007) as a way of holding student characteristics constant.² Because the teacher characteristics analyzed by Dee (gender, race, and ethnicity) do not vary within teachers, that approach uses variation across different teachers, so that the identifying variation may still be affected by selection and unobserved teacher characteristics. By contrast, our approach uses variation within individual teachers, a feature made possible by the fact that subject knowledge does vary within teachers. In our correlated random effects model, we actually reject the overidentification restriction implied in the fixed-effects model that teacher knowledge effects are the same across subjects. While in addition to teacher subject knowledge per se, the identified estimate may also capture the effect of other features correlated with teacher subject knowledge, such as passion for the subject or experience in teaching the subject curriculum, robustness tests confirm that results are not driven by observed between-subject differences in instruction time, teaching methods, and student motivation.

² Further examples using this between-subject method to identify effects of specific teacher attributes such as gender, ethnicity, credentials, teaching methods, instruction time, and class size include Altinok and Kingdon (2012), Ammermüller and Dolton (2006), Clotfelter et al. (2010), Dee and West (2011), Lavy (2010), and Schwerdt and Wuppermann (2011).

The vast literature on education production functions hints at teacher knowledge as one factor, possibly the only one, quite consistently associated with growth in student achievement.³ In his early review of the literature, Hanushek (1986, p. 1164) concluded that the closest thing to a consistent finding among the studies is that smarter teachers, ones who perform well on verbal ability tests, do better in the classroom, although even on this account the existing evidence is not overwhelming.⁴ Important studies estimating the association of teacher test scores with student achievement gains in the United States include Ehrenberg and Brewer (1995), Ferguson (1998), Ferguson and Ladd (1996), Hanushek (1971, 1992), Murnane and Phillips (1981), Rockoff et al. (2011), Rowan et al. (1997), and Summers and Wolfe (1977). Examples of education production function estimates with teacher test scores in developing countries include Harbison and Hanushek (1992) in rural Northeast Brazil, Tan et al. (1997) in the Philippines, Bedi and Marshall (2002) in Honduras, and Behrman et al. (2008) in rural Pakistan.⁵

³ The Coleman et al. (1966) report, which initiated this literature, first found that verbal skills of teachers were associated with better student achievement.

⁴ A decade later, based on 41 estimates of teacher test-score effects, Hanushek (1997, p. 144) finds that of all the explicit measures [of teachers and schools] that lend themselves to tabulation, stronger teacher test scores are most consistently related to higher student achievement. Eide et al. (2004, p. 233) suggest that, compared to more standard measures of teacher attributes, a stronger case can be made for measures of teachers' academic performance or skill as predictors of teachers' effectiveness. See also Hanushek and Rivkin (2006).

⁵ See Behrman (2010) and Glewwe and Kremer (2006) for recent reviews of the literature on education production functions in developing countries.

However, the existing evidence is still likely to suffer from bias due to unobserved student characteristics, omitted school and teacher variables, and non-random sorting and selection into classrooms and schools (cf. Glewwe and Kremer, 2006). Obvious examples of such bias include incidents where better-motivated teachers incite more learning and also accrue more subject knowledge; where parents with a strong preference for educational achievement choose schools or classrooms with better teachers and also further their children's learning in other ways; and where principals assign students with higher learning gains to better teachers. Estimates of teacher effects that convincingly overcome such biases are so far missing in the literature.⁶

⁶ Note that because of the association of teacher subject knowledge with other teacher traits, even an experimental setup that randomly assigns teachers to students would not be able to identify the effect of teacher subject knowledge.

An additional problem when estimating teacher effects is measurement error, which is likely to attenuate estimated effects (cf. Glewwe and Kremer, 2006, p. 989). The available measure may proxy only poorly for the concept of teachers' subject knowledge, as in most existing studies the examined skill is not subject-specific. Furthermore, any specific test will measure teacher subject knowledge only with considerable noise. In this paper, we have access to separate tests of teachers' subject-specific knowledge in math and reading that are each generated from batteries of test questions using psychometric modeling. These provide consistent estimates of the internal reliability metric of the tests, which we use to adjust for biases due to measurement error based on classical measurement error theory.

Understanding the causes of student achievement is important not least because of its substantial economic importance. Differences in student achievement tests can account for an important part of the difference in long-run economic growth across countries, in particular between Latin America and other world regions as well as within Latin America (Hanushek and Woessmann, 2011, 2012). Peru has been doing comparatively well in terms of school attendance, but not in terms of educational quality as measured by achievement tests (cf. Luque, 2008; World Bank, 2007). Its average performance on international student achievement tests is dismal compared to developed countries (cf. OECD, 2003) and just below the average of other Latin American countries. Of the 16 countries participating in a Latin American comparative study in 2006, Peruvian 6th-graders ranked 9 and 10, respectively, in math and reading (LLECE, 2008).

In what follows, Section 2 derives our econometric identification strategy. Section 3 describes the dataset and descriptive statistics. Section 4 reports our results and robustness tests.

2. Empirical identification

2.1. The correlated random effects model

We consider an education production function with an explicit focus on teacher skills:

y_i1 = β_1 T_t1 + γ U_t1 + δ Z_i + φ X_i1 + μ_i + ν_t1 + ε_i1    (1a)
y_i2 = β_2 T_t2 + γ U_t2 + δ Z_i + φ X_i2 + μ_i + ν_t2 + ε_i2    (1b)

where y_i1 and y_i2 are test scores of student i in subjects 1 and 2 (math and reading in our application below), respectively.
Teachers t are characterized by subject-specific teacher knowledge T_tj and non-subject-specific teacher characteristics U_t such as pedagogical skills and motivation. (The latter differ across the two equations only if the student is taught by different teachers in the two subjects.) Additional factors are non-subject-specific (Z_i) and subject-specific (X_ij) characteristics of students and schools. The error term is composed of a student-specific component μ_i, a teacher-specific component ν_t, and a subject-specific component ε_ij.

The coefficient vectors β_1, β_2, and γ characterize the impact of all subject-specific and non-subject-specific teacher characteristics that constitute the overall teacher quality effect as estimated by value-added studies (cf. Hanushek and Rivkin, 2010). As indicated earlier, several sources of endogeneity bias, stemming from unobserved student and teacher characteristics or from non-random sorting of students to schools and teachers, are likely to hamper identification of the effect of teacher subject knowledge in Eqs. (1a) and (1b).

Following Chamberlain (1982), we model the potential correlation of the unobserved student effect μ_i with the observed inputs in a general setup:

μ_i = λ_1 T_t1 + λ_2 T_t2 + θ_1 U_t1 + θ_2 U_t2 + κ Z_i + ρ (X_i1 + X_i2) + ω_i    (2)

where ω_i is uncorrelated with the observed inputs. We allow the λ and θ parameters to differ across subjects, but assume that the parameters ρ on the subject-specific student characteristics are the same across subjects.

Substituting Eq. (2) into Eqs. (1a) and (1b) and collecting terms yields the following correlated random effects model:⁷

y_i1 = (β_1 + λ_1) T_t1 + λ_2 T_t2 + (γ + θ_1) U_t1 + θ_2 U_t2 + (δ + κ) Z_i + (φ + ρ) X_i1 + ρ X_i2 + ν_t1 + ε̃_i1    (3a)
y_i2 = λ_1 T_t1 + (β_2 + λ_2) T_t2 + θ_1 U_t1 + (γ + θ_2) U_t2 + (δ + κ) Z_i + ρ X_i1 + (φ + ρ) X_i2 + ν_t2 + ε̃_i2    (3b)

where ε̃_ij = ε_ij + ω_i. In this model, teacher scores in each subject enter the reduced-form equation of both subjects. The λ coefficients capture the extent to which standard models would be biased due to the omission of unobserved teacher factors, while the β parameters represent the effect of teacher subject knowledge.

⁷ This correlated random effects specification is similar to siblings models of earnings effects of education (Ashenfelter and Krueger, 1994; Card, 1999). We thank David Card for alerting us to this interpretation of our model.

The model setup with correlated random effects allows us to test the overidentification restrictions implicit in conventional fixed-effects models, which are nested within the unrestricted correlated random effects model (see Ashenfelter and Zimmerman, 1997). Since the seminal contribution to cross-subject identification of teacher effects by Dee (2005), available fixed-effects estimators (commonly implemented in first-differenced estimations) implicitly impose that teacher effects are the same across subjects. Rather than initially imposing such a restriction, after estimating the system of Eqs. (3a) and (3b) we can straightforwardly test whether β_1 = β_2 = β and whether λ_1 = λ_2 = λ. Only if these overidentification restrictions cannot be rejected can we also specify correlated random effects models that restrict the β coefficients, the λ coefficients, or both of them to be the same across subjects in Eqs. (3a) and (3b).

A correlated random effects model that enacts both of these restrictions is equivalent to conventional fixed-effects models that perform within-student comparisons in two subjects to eliminate bias from unobserved non-subject-specific student characteristics. These have been applied to estimate effects of non-subject-specific teacher attributes such as gender, ethnicity, and credentials (e.g., Clotfelter et al., 2010; Dee, 2005, 2007). Note, however, that in order to identify β and γ in specifications like Eqs. (3a) and (3b) that restrict coefficients to be the same across subjects, it has to be assumed that the assignment of students to teachers is random and, in particular, is not related to students' subject-specific propensity for achievement. Otherwise, omitted teacher characteristics such as pedagogical skills and motivation, comprised in the teacher-specific error component ν_t, could bias estimates of the observed teacher attributes.

In order to avoid such bias from omitted teacher characteristics, we propose to restrict the analysis to samples of students who are taught by the same teacher in the two subjects. In such a setting, U_t1 = U_t2 = U_t and ν_t1 = ν_t2 = ν_t, so that the education production functions simplify to:

y_i1 = (β_1 + λ_1) T_t1 + λ_2 T_t2 + (γ + θ_1 + θ_2) U_t + (δ + κ) Z_i + (φ + ρ) X_i1 + ρ X_i2 + ν_t + ε̃_i1    (4a)
y_i2 = λ_1 T_t1 + (β_2 + λ_2) T_t2 + (γ + θ_1 + θ_2) U_t + (δ + κ) Z_i + ρ X_i1 + (φ + ρ) X_i2 + ν_t + ε̃_i2    (4b)

With the additional restrictions that β_1 = β_2 = β and λ_1 = λ_2 = λ, this model has a straightforward first-differenced representation:

y_i1 − y_i2 = β (T_t1 − T_t2) + φ (X_i1 − X_i2) + (ε_i1 − ε_i2)    (5)

which is equivalent to including both student and teacher fixed effects in a pooled specification. In this specification, any teacher characteristic that is not subject-specific (U_t and ν_t) drops out.

Identification of teacher effects is still possible in our setting because teacher subject knowledge varies across subjects. Note that this specification can be identified only in samples where students are taught by the same teacher in the two subjects. This makes it impossible to identify teacher effects for attributes that do not vary across the two subjects within the same teacher. Thus, this specification cannot identify effects of teacher attributes that are not subject-specific, such as gender and race.⁸ But it allows eliminating bias from standard omitted teacher variables when estimating the effect of teacher subject knowledge.

⁸ The within-teacher identification also accounts for possible differences between teachers in their absenteeism from the classroom, an important topic of recent research on teachers in developing countries (cf. Banerjee and Duflo, 2006). Note also that in a group of six developing countries with data, teacher absenteeism rates are lowest in Peru (Chaudhury et al., 2006).

A final remaining concern might be that the matching of students and teachers is not random with respect to their respective relative performance in the two subjects. For example, if there are two classrooms in the same grade in a school, in principle it is conceivable that students who are better in math than in reading are assigned to one of the classes and that a teacher who is better in math than in reading is assigned to teach them. Therefore, to avoid bias from non-random allocation of teachers to students, we finally restrict our main sample to schools that have only one classroom per grade. This eliminates any bias from sorting between classrooms within the grade of a school. While this restriction comes at the cost of identification from a sample of relatively small and mostly rural schools that are not necessarily representative of the total population, additional analyses suggest that the sample restriction is not driving our results. By restricting the analysis to schools in mostly rural regions where parents have access to only one school, the restricted sample additionally eliminates any possible concern of non-random selection of schools by parents based on the relative performance in math vs. reading of the schools and the students. Note that, by eliminating non-random teacher selection, the sample restriction also rules out bias from prior differences in student achievement, as students cannot be allocated on grounds of within-student performance differences between the subjects to appropriate teachers.⁹

⁹ A remaining potential source of non-random sorting might be selective drop-out based on differences in teacher subject knowledge between the two subjects. Such selectivity seems quite unlikely, though, and drop-out by grade 6 is very uncommon in Peru. In the 2004 Demographic and Health Survey, 97.0% of 12-year-olds (the relevant age for grade 6) are currently enrolled in school, and 93.4% of those aged 15–19 have attained grade 6 (Filmer, 2007).

One way to test that student–teacher assignment is indeed random in our sample with respect to their relative capabilities in the two subjects is to include measures of subject-specific student characteristics (X_ij) in the model. Therefore, in robustness specifications we will include a subject-specific index of student motivation in the model. Furthermore, the observed between-subject variation in teacher subject knowledge may correlate with other, partly unobserved subject-specific teacher characteristics, such as passion for the subject or experience in teaching the curriculum of the subject. The estimates of our models are thus best interpreted as the effect of teacher subject knowledge and other subject-specific traits that go with it. To the extent that features like subject-specific passion are the outcome of the knowledge level, rather than vice versa, estimates that do not control for other subject-specific teacher characteristics capture the full reduced-form effect of teacher subject knowledge. But to test to what extent subject-specific teacher knowledge may capture more than just plain knowledge, we also present robustness specifications that include teacher characteristics other than subject knowledge that might vary between subjects, namely subject-specific measures of teaching hours and teaching methods. Finally, teachers whose first language is not Spanish may not only be relatively better in math than in reading, but may also put relatively less effort into teaching Spanish. Since the native language is likely correlated between students and their teachers, we will test for robustness in a sample restricted to students whose first language is Spanish.
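The within-teacher within-student logic of the first-differenced specification in Eq. (5) can be illustrated with a small simulation. This is a sketch with invented parameter values, not the paper's data or code: teacher knowledge is deliberately made to correlate with unobserved student and teacher effects (non-random sorting), which biases a levels regression, while differencing the two subjects within the same student and teacher removes both unobserved components.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000  # students, each taught by one teacher in both subjects

mu = rng.normal(size=n)  # unobserved student effect mu_i
nu = rng.normal(size=n)  # unobserved teacher effect nu_t (same teacher in both subjects)

# Non-random sorting: teacher subject knowledge correlates with mu and nu
T1 = 0.5 * mu + 0.5 * nu + rng.normal(size=n)  # teacher math knowledge
T2 = 0.5 * mu + 0.5 * nu + rng.normal(size=n)  # teacher reading knowledge

beta = 0.09  # true effect of teacher subject knowledge (illustrative value)
y1 = beta * T1 + mu + nu + rng.normal(size=n)  # student math score
y2 = beta * T2 + mu + nu + rng.normal(size=n)  # student reading score

# Naive OLS in levels: upward-biased because T1 is correlated with mu + nu
c = np.cov(y1, T1)
b_naive = c[0, 1] / c[1, 1]

# First-differenced estimator of Eq. (5): mu and nu drop out of the difference
c = np.cov(y1 - y2, T1 - T2)
b_fd = c[0, 1] / c[1, 1]

print(f"naive: {b_naive:.3f}, first-differenced: {b_fd:.3f}")
```

With these parameter values the naive levels estimate comes out several times larger than the true effect of 0.09, while the first-differenced estimate recovers it up to sampling noise.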
2.2. Correcting for measurement error

Any given test can only be an imperfect measure of our explanatory variable of interest, teachers' subject knowledge T. A first source of measurement error arises if the available measure proxies only poorly for the concept of teachers' subject knowledge. In most existing studies, the tested teacher skills can be only weakly tied to the academic knowledge in the subject in which student achievement is examined and cannot distinguish between subject-specific cognitive and general non-cognitive teacher skills (see Rockoff et al., 2011 for a notable exception). Often, the examined skill is not subject-specific, either when based on a rudimentary measure of verbal ability (e.g., Coleman et al., 1966) or when aggregated across a range of skills including verbal and pedagogic ability as well as knowledge in different subjects (e.g., Summers and Wolfe, 1977). A second source of measurement error arises because any specific test will measure teacher subject knowledge only with considerable noise. This is most evident when the result of one single math question is used as an indicator of math skills (cf. Rowan et al., 1997), but it applies more generally because the reliability of any given item battery is not perfect.

As is well known, classical measurement error in the explanatory variable will give rise to downward bias in the estimated effects. Glewwe and Kremer (2006, p. 989) argue that this is likely to be a serious problem in estimating school and teacher effects. Assume that, in any given subject s, true teacher test scores T are measured with classical measurement error e:

T̃_is = T_is + e_is,   E(e_is) = Cov(T_is, e_is) = 0    (6)

Suppose that T is the only explanatory variable in an education production function like Eqs. (1a) and (1b) with mean-zero variables, and e and ε are uncorrelated. Classical measurement error theory tells us that using measured T̃ instead of true T as the explanatory variable leads to well-known attenuation bias, where the true effect is asymptotically attenuated by the reliability ratio r_s (e.g., Angrist and Krueger, 1999):

y_is = β̃_s T̃_is + ε̃_is,   r_s = Cov(T̃_is, T_is) / Var(T̃_is) = Var(T_is) / (Var(T_is) + Var(e_is))    (7)

Thus, the unbiased effect of T on y is given by β_s = β̃_s / r_s.¹⁰ The problem is that in general, we do not know the reliability with which a variable is measured.

But in the current case, where the explanatory variable is a test score derived using psychometric modeling, the underlying psychometric test theory in fact provides estimates of the reliability ratio, the most commonly used statistic being Cronbach's α (Cronbach, 1951). The method underlying Cronbach's α is based on the idea that reliability can be estimated by splitting the underlying test items in two halves and treating them as separate measures of the underlying concept. Cronbach's α is the mean of all possible split-half coefficients resulting from different splittings of a test, and as such estimates reliability as the ratio of true variance to observed variance of a given test. It is routinely calculated as a measure of the reliability of test metrics, and provided as one of the psychometric properties in our dataset.¹¹

Of course, such reliability measures cannot inform us about the test's validity: how good a measure it is of the underlying concept (teacher subject knowledge in our case). But they provide us with information on the measurement error contained within the test metric derived from the underlying test items. Note that differences between the test metric and the underlying concept only reduce total reliability further, rendering our adjustments still a conservative estimate of the true effect of teacher subject knowledge on student achievement.

¹⁰ For a derivation of the measurement-error correction formula in the first-differenced model, see the working-paper version of this paper (Metzler and Woessmann, 2010). While in contrast to the standard first-differenced model, our correlated random effects model has more than one explanatory variable, results of the correlated random effects model without controls are very close to the ones reported here.

¹¹ In Rasch modeling, there are additional alternative measures of the reliability ratio, such as the person separation reliability. In the EN 2004 teacher test, these are very close to Cronbach's α (at 0.73–0.77 in math and 0.58–0.62 in reading), so that applying them provides very similar (even slightly larger) estimates of the true effect.
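The attenuation in Eq. (7) and its correction can be sketched numerically. The simulation below uses invented sizes and loadings, not the EN 2004 tests: it builds a 20-item test, estimates Cronbach's α from the item responses, and divides the attenuated regression coefficient by α to recover the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100_000, 20  # teachers and test items (illustrative sizes)
beta = 0.09         # true effect of teacher knowledge (illustrative value)

T = rng.normal(size=n)                              # true subject knowledge
items = T[:, None] + 2.0 * rng.normal(size=(n, k))  # noisy item responses
score = items.mean(axis=1)                          # observed test score = T + noise
y = beta * T + rng.normal(size=n)                   # outcome (one student per teacher, for simplicity)

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score),
# an estimate of the reliability ratio r of the k-item score
item_vars = items.var(axis=0, ddof=1).sum()
total_var = items.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_vars / total_var)

c = np.cov(y, score)
b_attenuated = c[0, 1] / c[1, 1]    # plim is r * beta, as in Eq. (7)
b_corrected = b_attenuated / alpha  # beta = b_hat / r, with alpha estimating r

print(f"alpha: {alpha:.3f}, attenuated: {b_attenuated:.3f}, corrected: {b_corrected:.3f}")
```

The same arithmetic underlies the paper's headline numbers: an estimated math effect of about 6.4% of a standard deviation, divided by a reliability of roughly 0.7, yields the corrected effect of about 9%.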
While in contrast to the standard rst-differenced model, our cor-related random effects model has more than one explanatory variable, results of thecorrelated random effects model without controls are very close to the ones reportedhere.11 In Rasch modeling, there are additional alternative measures of the reliability ratio,such as the person separation reliability. In the EN 2004 teacher test, these are veryclose to Cronbach's (at 0.730.77 in math and 0.580.62 in reading), so that applyingthem provides very similar (even slightly larger) estimates of the true effect.12 Multi-grade schools, in which several grades are served in the same class by thesame teacher, are a wide-spread phenomenon in many developing countries whereparts of the population live in sparsely populated areas (cf. Hargreaves et al., 2001).For example, the remoteness of communities in the Andes and the Amazon basinmeans a lack of critical mass of students to provide complete schools that have at leastone classroom per grade.the two do not have common items. While in most studies in the exis-ting literature, tested teacher skills can be only weakly tied to theacademic knowledge in the subject in which student achievementis examined (cf. Wayne and Youngs, 2003), the separate subject-specic teacher tests in EN 2004 allow for an encompassing measure-ment of teacher subject knowledge in two specic subjects.13 Bothstudent and teacher tests in both subjects were scaled using Raschmodeling by the Unit for Quality Measurement (UMC) of the PeruvianMinistry of Education.(similar values in reading). This suggests that, at least on the availablemeasures, the teacher population appears more homogeneous thanthe student population in the ruralurban comparison.4. Results4.1. 
Main resultsAs a reference point for the existing literature and the subsequentestimates on the differing-subject teacher scores are positive in both models in that sam-ple, they are smaller than the same-subject coefcients and statistically insignicant in the490 J. Metzler, L. Woessmann / Journal of Development Economics 99 (2012) 486496Of the 12,165 students in the nationally representative dataset,56% (6819 students) were taught by the same teacher in math andreading. Of the latter, 63% (4302 students) are in schools that haveonly one class in 6th grade. The combination of being served by thesame teacher in both subjects in schools with only one classroom in6th grade is quite frequent, as the Peruvian school system is charac-terized by many small schools spread out over the country to servethe dispersed population. As the main focus of our analyses will beon this same-teacher one-classroom (STOC) sample (although wealso report cross-sectional estimates for the full sample below), wefocus on this sample here.14Table A1 in Appendix A reports descriptive statistics. Both studentand teacher test scores are scaled to a mean of 0 and standard devia-tion of 1. The rst language was Spanish for 87% of the students, and anative language for the others. Descriptive test scores reveal thatSpanish-speaking students fare signicantly better than studentswhose rst language is not Spanish, although their teachers performonly slightly better. Both students and teachers score better in urban(compared to rural) areas, in private (compared to public) schools,and in complete (compared tomulti-grade) schools. In terms of teachereducation, 28% of the teachers hold a university degree, compared to adegree from a teaching institute. Both math and reading is taught atan average of about 6 h per week.Subject-specic inspection (not shown) reveals only a few sub-stantive subject-specic differences in sub-populations. 
Boys andgirls score similar in reading, but boys outperform girls on averagein math. While male and female teachers score similarly on themath test, male teachers score worse than female teachers in reading.Teachers with a university degree score worse in math but slightlybetter in reading compared to teachers with a degree from aninstitute.Throughout the paper, we express results as standard deviationsof student achievement per standard deviation of teacher achieve-ment. While no external benchmark is available to provide contextto the size of the achievement variation for either test in EN 2004, itis worth noting that on international tests, the average learning gainof students during one year of schooling (the grade-level equiva-lent) is usually equivalent to roughly 2535% of a standard deviationof student achievement. For Peruvian 15-year-olds in 10th and 11thgrade in PISA 2009, the grade-level equivalent equals 31.0% of a stan-dard deviation (no similar data are available for 6th-graders). Apartfrom this, we can contextualize the test-score variations in terms ofthe observed differences in teacher and student test scores betweensub-groups of the population. For example, the difference in teachermath knowledge between rural and urban areas is 41.3% of a teacherstandard deviation, whereas the difference in student knowledge be-tween rural and urban areas is 79.2% of a student standard deviation13 Evidence on content-related validity comes from the fact that both teacher test mea-sures do not show a signicant departure from unidimensionality. Principal componentanalysis of the Rasch standardized residuals shows a rst eigenvalue of 1.7 in math and1.5 in reading, indicating that both tests are measuring a dominant dimension mathand reading subject knowledge, respectively. 
Additional validity stems from the fact that at least five subject experts in math and reading, respectively, certified adequacy of test items for both teacher tests.

14 While the restricted STOC sample is certainly not a random sample of the full population of Peruvian 6th-graders (see Table A2 in Appendix A), a matching procedure reported below indicates that the restriction is not driving our results.

16 If we denote the coefficient on the same-subject teacher test score in Eqs. (3a) and (3b) by β̃j, this test is implemented by testing β̃1 − λ1 = β̃2 − λ2, where the λs are estimated as the coefficients on the other-subject teacher test score in each equation. Given that β̃j = βj + λj, this provides a test of β1 = β2.

As a reference point for the existing literature and the subsequent analyses, Table 1 reports cross-sectional regressions of student test scores on teacher test scores and different sets of regressors in the two subjects. Throughout the paper, standard errors are clustered at the classroom level. As the first four columns show, there are statistically significant associations between student test scores and teacher test scores in both subjects in the full sample, both in a bivariate setting and after adding a set of controls. The associations are substantially smaller when controlling for the background characteristics, in particular the three school characteristics (urban location, private operation, and being a multi-grade school), and the reading estimate remains only marginally significant. Student achievement in both subjects is significantly positively associated with Spanish as a first language, urban location, private operation, being a complete school, and a student motivation index. Female students perform better in reading but worse in math, and reading (but not math) achievement is significantly positively associated with having a female teacher.
As the subsequent columns reveal, the significant associations between student and teacher test scores hold also in the sample of students taught by the same teacher in both subjects and in the same-teacher one-classroom (STOC) sample, although point estimates are reduced (and become insignificant in reading in the STOC sample), indicating possible selection bias in the full sample.

Table 2 presents the central result of our paper, the correlated random effects model of Eqs. (3a) and (3b) estimated by seemingly unrelated regressions. By restricting the student sample to those who are taught by the same teacher in both subjects in schools with only one classroom per grade (STOC sample), we avoid bias stemming from within-school sorting.15 In our model, the effect of teacher subject knowledge on student achievement in math, β1, is given by the regression coefficient on the teacher math test score in the math equation minus the regression coefficient on the teacher math test score in the reading equation (see Eqs. (3a) and (3b)), and vice versa for reading. In the STOC sample, the point estimates are surprisingly close to the cross-sectional estimates with basic controls, supporting the view that the teacher-student assignment is indeed as good as random in this sample.

The estimates of this within-teacher within-student specification indicate that an increase in teacher math test scores by one standard deviation raises student math test scores by about 6.4% of a standard deviation. A χ² test reveals that this effect is statistically highly significant. In reading, albeit also positive, the estimate is much lower and does not reach statistical significance. In fact, the tests for over-identification restrictions reject the hypothesis that the effect of interest is the same in the two subjects, advising against estimating the simplified first-differenced model of Eq. (5).16
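The identification logic behind this implied-β computation can be illustrated with a stylized simulation (all parameter values are invented for illustration; this is not the authors' data or code): student ability is sorted toward better teachers, which biases each single-equation coefficient upward, but the same sorting term λ loads on the other-subject teacher score, so the cross-equation difference recovers the subject-specific effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Teacher subject knowledge: a common quality component plus subject-specific parts.
q = rng.normal(size=n)                      # overall teacher quality
t_math = q + rng.normal(scale=0.7, size=n)  # teacher math test score
t_read = q + rng.normal(scale=0.7, size=n)  # teacher reading test score

# Non-random sorting: abler students end up with better teachers.
ability = 0.5 * q + rng.normal(size=n)

beta_math, beta_read = 0.06, 0.02           # true subject-specific effects
y_math = beta_math * t_math + ability + rng.normal(size=n)
y_read = beta_read * t_read + ability + rng.normal(size=n)

# One regression per subject, each including BOTH teacher test scores.
X = np.column_stack([np.ones(n), t_math, t_read])
b_math, *_ = np.linalg.lstsq(X, y_math, rcond=None)  # [const, t_math, t_read]
b_read, *_ = np.linalg.lstsq(X, y_read, rcond=None)

# The same-subject coefficient is biased upward by the sorting term lambda,
# but lambda also appears as the other-subject coefficient, so the
# cross-equation difference recovers the true effect.
naive_math = b_math[1]
implied_beta_math = b_math[1] - b_read[1]
implied_beta_read = b_read[2] - b_math[2]
```

With these invented parameters the naive math coefficient comes out at roughly four times the true effect, while the implied β returns close to 0.06. The actual estimation in the paper is by SUR with control variables, which this single-equation OLS sketch omits.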
15 Results on the λs are qualitatively similar in the larger same-teacher sample without restriction to schools with only one classroom per grade (see Table A3 in Appendix A). In that sample, student math scores are significantly related to teacher math scores after controlling for teacher reading scores, and student reading scores are significantly related to teacher reading scores after controlling for teacher math scores. While the coefficient estimates on the differing-subject teacher scores are positive in both models in that sample, they are smaller than the same-subject coefficients and statistically insignificant in the unrestricted model.

Table 1
Cross-sectional regressions.

                     Full sample                      Same-teacher sample              Same-teacher one-classroom (STOC)
                     OLS             SUR              OLS             SUR              OLS             SUR
                     Math    Read.   Math    Read.    Math    Read.   Math    Read.    Math    Read.   Math    Read.
                     (1)     (2)     (3)     (4)      (5)     (6)     (7)     (8)      (9)     (10)    (11)    (12)
Teacher test score   0.264   0.219   0.094   0.027    0.147   0.182   0.073   0.032    0.119   0.158   0.061   0.017
                    (0.027) (0.023) (0.019) (0.018)  (0.041) (0.032) (0.022) (0.018)  (0.054) (0.042) (0.025) (0.023)

Dependent variable: student test score in math and reading, respectively. Columns (1)-(2), (5)-(6), and (9)-(10) are bivariate specifications; columns (3)-(4), (7)-(8), and (11)-(12) additionally include controls for student gender, student 1st language Spanish, urban area, private school, complete school, teacher gender, teacher university degree, student motivation, teaching hours, and eight teaching-method indicators. Clustered standard errors (adjusted for clustering at classroom level) in parentheses; standard errors in the SUR models are estimated by maximum likelihood. Significance at 1, 5, and 10%.

Table 2
The effect of teacher test scores: correlated random effects models.

                                      Unrestricted model        Model restricting λ1 = λ2
                                      (1)                       (2)
                                      Math        Reading       Math        Reading
Implied β subject                     0.064       0.014         0.060       0.019
χ² (β subject = 0)                    13.138      0.599         9.585       1.095
Prob > χ²                             [0.0003]    [0.439]       [0.002]     [0.295]
χ² (λ math = λ reading)               0.585
Prob > χ²                             [0.444]
χ² (β math = β reading)               6.948                     3.477
Prob > χ²                             [0.008]                   [0.062]
χ² (model)                            353.45                    354.57
Observations (students)               3936                      3936
Classrooms (clusters)                 316                       316
β subject, corrected for
measurement error                     0.087       0.022         0.081       0.029
ρ (Cronbach's α)                      0.74        0.64          0.74        0.64

Dependent variable: student test score in math and reading, respectively. Regressions in the two subjects estimated by seemingly unrelated regressions (SUR). Sample: same-teacher one-classroom (STOC). Implied β represents the effect of the teacher test score, given by the difference between the estimate on the teacher test score in the respective subject between the equation of the student test score in the respective subject and the equation of the student test score in the other subject (see Eqs. (3a) and (3b)). Regressions include controls for student gender, student 1st language, urban area, private school, complete school, teacher gender, and teacher university degree. Clustered standard errors in the SUR models are estimated by maximum likelihood. Robust standard errors (adjusted for clustering at classroom level) in parentheses: significance at 1, 5, and 10%.

We can only speculate about the reasons why results differ markedly between math and reading. A first option is that teacher subject knowledge may be genuinely more relevant in the production function of math rather than reading skills. Another possibility is that by 6th grade, reading skills are already far developed and less responsive to teacher influence, whereas math skills still develop strongly. It could be that the teacher reading test has more noise than the math test, so that estimates suffer from greater attenuation bias. It could also be that the underlying true distribution of teacher knowledge is tighter in reading than in math (relative to the student distribution in each subject), so that a one standard-deviation change in teacher scores reflects smaller variation in actual knowledge in reading than in math.

As discussed in Section 2.2 above, the estimates still suffer from attenuation bias due to measurement error in the teacher test scores. Following Eq. (7), the unbiased effect of teacher subject knowledge on student achievement is given by βs = β̂s / ρs. In the EN 2004, the reliability ratio (Cronbach's α) of the teacher math test is ρM = 0.74, and the reliability ratio of the teacher reading test is ρR = 0.64 (in line with the argument just made).
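The correction can be spelled out numerically. A minimal sketch using the rounded point estimates reported in the text (because of rounding, the math figure comes out at about 0.086 here, where the paper's computation from unrounded inputs gives 0.087):

```python
# Reliability ratios (Cronbach's alpha) of the teacher tests, as reported.
rho = {"math": 0.74, "reading": 0.64}

# Implied beta estimates from the unrestricted model (rounded, Table 2).
beta_hat = {"math": 0.064, "reading": 0.014}

# Eq. (7): scale each estimate up by the inverse of the test's reliability.
beta_corrected = {s: beta_hat[s] / rho[s] for s in rho}
# beta_corrected["math"]    ≈ 0.086  (0.087 in the paper, from unrounded inputs)
# beta_corrected["reading"] ≈ 0.022
```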
Thus, as reported at the bottom of Table 2, the true effect of teacher math knowledge on student math achievement turns out to be 0.087: an increase in student math achievement by 8.7% of a standard deviation for a one standard deviation increase in teacher math knowledge.

The overidentification tests do not reject that the selection term λ (given by the coefficient on other-subject teacher scores in each equation) is the same in the two subjects, allowing us to estimate a restricted model that assumes λ1 = λ2 in column (2). Results are hardly different in this restricted model, although similarity of results across subjects is only marginally rejected in this model.17 The estimate of λ is close to zero in the STOC sample, a further indication that selection effects may indeed be eliminated by the sample restriction to students who have the same teacher in both subjects in schools that have only one classroom in the grade. In the larger same-teacher sample (shown in Table A3 in Appendix A), the estimate of λ is positive and (marginally) significant, indicating positive selection effects in that sample.

17 Table A4 in Appendix A shows results of basic first-differenced specifications, which would be warranted when effects do not differ across subjects and which effectively represent an average of our estimates in math and in reading. The working-paper version (Metzler and Woessmann, 2010) provides further details on first-differenced specifications.

4.2. Evidence on heterogeneous effects

To test whether the main effect hides effect heterogeneity in sub-populations, we estimate our model separately by stratified sub-samples (Table 3). We start by looking at the following sub-groups: female vs. male students, students with Spanish vs. a native language as their first language, urban vs. rural areas, private vs. public schools, complete vs. multi-grade schools, male vs. female teachers, and teachers with a university vs. institute degree. With only two exceptions, both math and reading results are robust within the sub-samples, and the difference between the respective split samples is not statistically significant.18 The exceptions are that the math results are no longer statistically significant in the two smallest sub-samples, private schools and teachers with a university degree. While the small sample sizes are most likely behind this pattern, the point estimates in these two sub-samples are also close to zero, and the difference of the effect for teachers with a university degree compared to the effect for teachers without a university degree is statistically significant. With this exception, there is little evidence that the effect of teacher subject knowledge differs across these sub-populations.

18 We also do not find that the effect of teacher subject knowledge interacts significantly with teaching hours.

The exception of the non-effect for teachers holding a university degree might indicate that the effect of teacher subject knowledge flattens out at higher levels of teacher skills. To test whether the effect of teacher subject knowledge is non-linear, Table 3 also reports results for sub-samples of teachers below and above the median in terms of their knowledge level, averaged across the two subjects. There is no evidence for heterogeneous effects. The same is true in models that interact the teacher test-score difference with a continuous measure of the average test-score level or with test-score levels of the two subjects separately. We also experimented with additional types of non-linear specifications, with similar results. At least within our sample of Peruvian teachers, there is no indication that the effect of teacher subject knowledge levels out at higher knowledge levels.

Table 3 also reports results separately for students achieving below and above the median achievement level, averaged across the two subjects. The math results are significant in both sub-samples. The effect in reading becomes marginally significant in the sub-sample of students above the median level and is significantly higher than the effect for low-performing students.

Table 3
The effect of teacher test scores in sub-samples.

                                     Characteristic not true              Characteristic true                 Differences
                                     Implied β  Prob>χ²   Obs.  Clusters  Implied β  Prob>χ²   Obs.  Clusters  Diff.   Prob>χ²
                                     (1)        (2)       (3)   (4)       (5)        (6)       (7)   (8)       (9)     (10)
Math:
Student female                       0.078      [0.0002]  2048  205       0.050      [0.043]   1888  301       0.028   [0.346]
Student 1st language Spanish         0.096      [0.025]   530   114       0.060      [0.003]   3406  293       0.036   [0.461]
Urban area                           0.076      [0.001]   1859  203       0.063      [0.014]   2077  113       0.013   [0.710]
Private school                       0.073      [0.0001]  3436  283       0.022      [0.753]   500   33        0.094   [0.177]
Complete school                      0.071      [0.006]   1823  213       0.067      [0.005]   2113  103       0.005   [0.898]
Teacher male                         0.082      [0.0004]  2100  153       0.039      [0.131]   1836  163       0.043   [0.214]
Teacher university degree            0.087      [0.00003] 2815  227       0.003      [0.931]   1121  89        0.084   [0.030]
Teacher average score above median   0.059      [0.036]   2062  167       0.103      [0.0003]  1874  149       0.044   [0.263]
Student average score above median   0.057      [0.010]   1970  290       0.076      [0.006]   1966  242       0.019   [0.585]
Reading:
Student female                       0.005      [0.846]   2048  205       0.033      [0.180]   1888  301       0.038   [0.250]
Student 1st language Spanish         0.026      [0.637]   530   114       0.011      [0.551]   3406  293       0.015   [0.795]
Urban area                           0.007      [0.805]   1859  203       0.025      [0.268]   2077  113       0.018   [0.623]
Private school                       0.016      [0.417]   3436  283       0.016      [0.745]   500   33        0.032   [0.550]
Complete school                      0.037      [0.219]   1823  213       0.000      [0.990]   2113  103       0.037   [0.320]
Teacher male                         0.024      [0.313]   2100  153       0.005      [0.863]   1836  163       0.019   [0.600]
Teacher university degree            0.020      [0.350]   2815  227       0.003      [0.919]   1121  89        0.017   [0.626]
Teacher average score above median   0.006      [0.872]   2062  167       0.009      [0.771]   1874  149       0.015   [0.756]
Student average score above median   0.017      [0.520]   1970  290       0.035      [0.109]   1966  242       0.052   [0.100]

Dependent variable: student test score in math and reading, respectively. For each sub-sample, regressions in the two subjects are estimated jointly by seemingly unrelated regressions (SUR). Sample: same-teacher one-classroom (STOC), stratified in two sub-samples based on whether the characteristic in the head column is true or not. Implied β represents the effect of the teacher test score, given by the difference between the estimate on the teacher test score in the respective subject between the equation of the student test score in the respective subject and the equation of the student test score in the other subject (see Eqs. (3a) and (3b)). Regressions include controls for student gender, student 1st language, urban area, private school, complete school, teacher gender, and teacher university degree. Clustered standard errors (adjusted for clustering at classroom level) in the SUR models are estimated by maximum likelihood. Significance at 1, 5, and 10%.

To analyze this pattern further, the left panel of Table 4 looks deeper into the ability match between teachers and students. It provides estimates for four sub-samples based on whether students and teachers, respectively, have below or above median achievement. Here, the sub-sample of above-median students taught by below-median teachers falls out of the general pattern in that the point estimate in math is small and statistically insignificant. This suggests that the teacher-student ability match matters: If teachers are relatively low-performing, their subject knowledge does not matter much for students who are already relatively high-performing. This type of heterogeneity is an interesting aspect of education production functions that is frequently overlooked.

Similarly, the right panel of Table 4 looks at the teacher-student gender match. Given the apparent descriptive gender differences in math vs. reading achievement, and given the recent evidence that being assigned to a same-gender teacher may affect student outcomes (Carrell et al., 2010; Dee, 2005, 2007), it shows estimates for similar sub-sample models based on the gender match between teachers and students. Results suggest that female students do not profit from higher teacher math knowledge when they are taught by a male teacher. They do so when taught by a female teacher, as do male students independent of the teachers' gender. This suggests that the importance of teacher subject knowledge also depends on the teacher-student gender match: For girls, student-teacher gender interactions may be important in facilitating the transmission of math knowledge from teachers to students.

Although the findings generally suggest that results can be generalized to most sub-populations, the fact that our main sample of analysis, the same-teacher one-classroom (STOC) sample, is not a representative sample of the Peruvian student population raises the question whether results derived on the STOC sample can be generalized to the Peruvian student population at large. Schools with only one 6th-grade class that employ the same teacher to teach both subjects are more likely to be found in rural rather than urban areas and in multi-grade schools, and accordingly on average perform lower on the tests (see Table A2 in Appendix A). To test whether the sample specifics prevent generalization, we perform the following analysis. We first draw a 25-percent random sample from the original EN 2004 population, which is smaller than the STOC sample. We then employ nearest-neighbor propensity score matching (caliper 0.01) to pick those observations from the STOC sample that are comparable to the initial population. The variables along which this STOC sub-sample is made comparable to the initial population are student test scores in both subjects, teacher test scores in both subjects, and dummies for different school types (urban, public, and multi-grade). On this sub-sample of the STOC sample that is comparable to the whole population, we estimate our within-teacher within-student specification. In a bootstrapping exercise, we repeated this procedure 1000 times. The regression analyses yield an average teacher math score effect of 0.065, with an average χ² of 7.524 (statistically significant at the 3% level), for an average number of 1739 observations (detailed results available on request). These results suggest that the teacher test-score effect of our main specification is not an artifact of the STOC sample, but is likely to hold at a similar magnitude in the Peruvian 6th-grade student population at large.

4.3. Specification tests

As a test for the randomness of the student-teacher assignment, we can add subject-specific measures of student and teacher attributes to the within-teacher within-student specification. Such extended models also address the question to what extent subject-specific teacher knowledge may capture other subject-specific features correlated with it. Results are reported in Table A5 in Appendix A.

We first introduce an index of subject-specific student motivation (column (1)).19 Subject-specific differences in student motivation may be problematic if they are pre-existing and systematically correlated with the difference in teacher subject knowledge between subjects. On the other hand, they may also be the result of the quality of teaching in the respective subject, so that controlling for differences in student motivation between the two subjects would lead to an underestimation of the full effect of teacher subject knowledge. In any event, although student motivation is significantly associated with student test scores in both subjects, and although the motivation effect is significantly larger in math than in reading, its inclusion hardly affects the estimated effects of teacher subject knowledge in the two subjects.

19 The student motivation index, scaled 0–1, is calculated as a simple average of five survey questions corresponding to each subject such as "I like to read in my free time" (reading) or "I have fun solving mathematical problems" (math), on which students can choose disagree/sometimes true/agree. The resulting answer is coded as 0 for the answer displaying low motivation, 0.5 for medium motivation, and 1 for high motivation.

Next, we introduce subject-specific teacher measures (column (2) of Table A5). First, we add weekly teaching hours to control for the possibility that teachers may prefer to teach more in the subject they know better, or get to know the subject better which they teach more. The effect of teaching hours is, however, not significant in either subject.
Second, we add controls for how teachers teach the curriculum in the two subjects.20 Teachers report for each subject whether they use subject-specific books, work books, local, regional, or national guidelines, and other help to design their subject curriculum. The eight indicators of teaching methods enter jointly statistically significantly in both subjects. However, the effect of teacher subject knowledge is again unaffected by controlling for these subject-specific teacher characteristics. Together, the models with subject-specific controls confirm the significant causal effect of teacher subject knowledge in math.

20 EN 2004 asks teachers to describe which items they use to design the subject curriculum: working books, school curriculum, institutional educational projects, regional educational projects, 3rd cycle basic curriculum structure, and/or readjusted curricular programs.

Another concern with the identification, given the relatively large share (13%) of students for whom Spanish is not the main language, may lie in the fact that teachers who are not native Spanish speakers may not only be relatively worse in teaching Spanish compared to math, but may also exert less effort in teaching Spanish (even after controlling for their knowledge of Spanish). As long as the native language is correlated between students and their teachers (due to local concentrations of indigenous peoples), one way to test for such bias is to compare the full-sample estimate to an estimate on the sample of native Spanish-speaking students only (shown in Table 3). The effect of teacher test scores on student achievement in this reduced sample is 0.06 (very close to the full sample) and highly significant, indicating
Prob>2(5) (6) (7) (8) (9) (10)0.050 [0.043] 1888 301 0.028 [0.346]0.060 [0.003] 3406 293 0.036 [0.461]0.063 [0.014] 2077 113 0.013 [0.710]0.022 [0.753] 500 33 0.094 [0.177]0.067 [0.005] 2113 103 0.005 [0.898]0.039 [0.131] 1836 163 0.043 [0.214]0.003 [0.931] 1121 89 0.084 [0.030]0.103 [0.0003] 1874 149 0.044 [0.263]0.076 [0.006] 1966 242 0.019 [0.585]0.033 [0.180] 1888 301 0.038 [0.250]0.011 [0.551] 3406 293 0.015 [0.795]0.025 [0.268] 2077 113 0.018 [0.623]0.016 [0.745] 500 33 0.032 [0.550]0.000 [0.990] 2113 103 0.037 [0.320]0.005 [0.863] 1836 163 0.019 [0.600]0.003 [0.919] 1121 89 0.017 [0.626]0.009 [0.771] 1874 149 0.015 [0.756]0.035 [0.109] 1966 242 0.052 [0.100], regressions in the two subjects are estimated jointly by seemingly unrelated regressionshether characteristic in head column is true or not. Implied represents the effect of therespective subject between the equation of the student test score in the respective subjections include controls for student gender, student 1st language, urban area, private school,sted for clustering at classroom level) in the SURmodels are estimated bymaximum like-4.4. Interpretation of effect sizeIn order to assess whether the overall effect stems from an expo-sure of a student to a teacher for one year or for several years, wecan observe in the data how long each teacher has been with the test-ed class. A caveat with such an analysis is that 6th-grade teachers'score differences between math and reading may be correlated withprevious teachers' score differences between subjects, although it isnot obvious that this a substantial issue in the subject-differencedanalysis. It turns out that 39% of the STOC sample has been taughtby the specic teacher for only one year, an additional 38% for twoyears, and only the remaining 23% for more than two years.Table A6 in Appendix A reports results of estimating the model sep-arately for the three sub-groups. 
The point estimate in the sub-sample of students who have been taught by the teacher for only one year is even larger than the point estimate in our main specification, suggesting that the estimated effect mostly captures a one-year effect. Note that in this sub-sample, there is actually also a (marginally) significant effect of teacher subject knowledge on student achievement in reading. At the same time, the math effects are of a similar magnitude in the sub-samples of teachers that have been with the class for longer than one year. With limited statistical power in these smaller samples, it is however hard to reject that the effect increases with tenure, at least when there is some fadeout of teacher effects (Kane and Staiger, 2008; see Metzler and Woessmann, 2010 for a more extensive discussion).

When the estimates of our main specification are viewed as the effects of having been with a teacher of a certain subject knowledge for one year, the measurement-error corrected estimate of the effect in math means that if a student who is taught by a teacher at, e.g., the 5th percentile of the subject knowledge distribution was moved to a teacher with median subject knowledge, the student's math achievement would be raised by 14.3% of a standard deviation by the end of the school year. If one raised all 6th-grade teachers who are currently below the 95th (75th, 50th) percentile of the subject knowledge distribution to that percentile, average student performance would increase by 13.9 (7.0, 3.5) percent of a standard deviation. Additional improvements could be expected from raising teacher knowledge in grades 1 to 5.
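Assuming teacher subject knowledge is approximately normally distributed, the 14.3% figure can be reproduced from the corrected estimate (a back-of-the-envelope sketch, not the authors' code):

```python
from statistics import NormalDist

beta = 0.087                 # measurement-error corrected math effect
z = NormalDist().inv_cdf     # standard normal quantile function

# Moving a teacher from the 5th percentile to the median raises teacher
# knowledge by z(0.50) - z(0.05) ≈ 1.645 teacher standard deviations.
gain = beta * (z(0.50) - z(0.05))
# gain ≈ 0.143, i.e. 14.3% of a student standard deviation
```

The counterfactuals that raise all teachers below a given percentile to that percentile work the same way, with the quantile difference replaced by the mean shortfall to the target percentile over the truncated distribution.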
Of course, this would mean substantial improvements in teacher knowledge, but ones that appear conceivable considering the low average teacher performance in the given developing-country context.

It is also illuminating to compare our estimated size of the effect of teacher subject knowledge to the size of the recent estimates of the overall teacher effect based on relative student test score increases exerted by different teachers. Rockoff (2004, pp. 247–248) concludes for the U.S. school system that a one-standard-deviation increase in teacher quality raises test scores by approximately 0.1 standard deviations in reading and math on nationally standardized distributions of achievement. Rivkin et al. (2005) find a similar magnitude. It is impossible to relate our results directly to this finding, both because the Peru finding may not generalize to the United States and because the line-up of teachers along the dimension of student test-score increases will not perfectly match the line-up along the dimension of teacher subject knowledge, so that the two metrics do not necessarily match. But it is still interesting to note that our measurement-error corrected estimate of the effect of teacher subject knowledge falls in the same ballpark.

Table 4
The effect of teacher test scores in sub-samples matching students and teachers on achievement and gender.

                        Achievement level                       Gender
                        Implied β  Prob>χ²  Obs.   Clusters     Implied β  Prob>χ²  Obs.   Clusters
                        (1)        (2)      (3)    (4)          (5)        (6)      (7)    (8)
Math:
Teacher low level                                               Teacher female
  Student low level     0.091      [0.009]  1132   159            Student female   0.093  [0.002]  1012  146
  Student high level    0.017      [0.642]  930    117            Student male     0.073  [0.010]  1088  148
Teacher high level                                              Teacher male
  Student low level     0.068      [0.130]  838    131            Student female   0.014  [0.698]  876   155
  Student high level    0.133      [0.002]  1036   125            Student male     0.083  [0.005]  960   157
Reading:
Teacher low level                                               Teacher female
  Student low level     0.039      [0.358]  1132   159            Student female   0.037  [0.230]  1012  146
  Student high level    0.040      [0.472]  930    117            Student male     0.011  [0.713]  1088  148
Teacher high level                                              Teacher male
  Student low level     0.022      [0.612]  838    131            Student female   0.030  [0.390]  876   155
  Student high level    0.020      [0.619]  1036   125            Student male     0.028  [0.454]  960   157

Dependent variable: student test score in math and reading, respectively. For each sub-sample, regressions in the two subjects are estimated jointly by seemingly unrelated regressions (SUR). Sample: same-teacher one-classroom (STOC), stratified in four sub-samples based on whether the combined teacher-student characteristic in the head column is true or not. Teacher/student low/high level is defined by whether teacher/student average test score is below/above median. Implied β represents the effect of the teacher test score, given by the difference between the estimate on the teacher test score in the respective subject between the equation of the student test score in the respective subject and the equation of the student test score in the other subject (see Eqs. (3a) and (3b)). Regressions include controls for student gender, student 1st language, urban area, private school, complete school, teacher gender, and teacher university degree. Clustered standard errors (adjusted for clustering at classroom level) in the SUR models are estimated by maximum likelihood. Significance at 1, 5, and 10%.

5. Conclusion

The empirical literature analyzing determinants of student learning has tackled the issue of teacher impacts from two sides: measuring the impact of teachers as a whole and measuring the impact of distinct teacher characteristics. Due to recent advances in the first stream of literature using newly available rich panel datasets, it has been firmly established that overall teacher quality is an important determinant of student outcomes, i.e., that teachers differ strongly in their impact on student learning (Hanushek and Rivkin, 2010). The second stream of literature examines which specific teacher characteristics may be responsible for these strong effects and constitute the unobserved bundle of overall teacher quality. Answers to this question are particularly useful for educational administration and policy-making. In the empirical examination of teacher attributes such as education, experience, salaries, test scores, and certification, only teacher knowledge measured by test scores has reasonably consistently been found to be associated with student achievement (Hanushek, 1986; Hanushek and Rivkin, 2006). However, identification of causal effects of teacher characteristics is often hampered by omitted student and teacher characteristics and by non-random selection into classrooms.

This paper proposed a new way to identify the effect of teacher subject knowledge on student achievement, drawing on the variation in subject matter knowledge of teachers across two subjects and the commensurate achievement across the subjects by their individual students. By restricting the sample to students who are taught by the same teacher in both subjects, and in schools that have only one classroom per grade, this identification approach is able to circumvent the usual bias from omitted student and teacher variables and from non-random selection and placement. Furthermore, possible bias from measurement error was addressed by reverting to psychometric test statistics on reliability ratios of the underlying teacher tests.

Drawing on data on math and reading achievement of 6th-grade students and their teachers in Peru, we find a significant effect of teacher math knowledge on student math achievement, but (with few exceptions of sub-groups) not in reading. A one-standard-deviation increase in teacher math knowledge raises student achievement by about 9% of a standard deviation.
The results suggest that teacher subject knowledge is indeed one observable factor that is part of what makes up as-yet unobserved teacher quality, at least in math.

While the results are robust in most sub-populations, we also find interesting signs of effect heterogeneity depending on the teacher-student match. First, the effect does not seem to be present when high-performing students are taught by low-performing teachers. Second, female students are not affected by the level of teacher math knowledge when taught by male teachers. Thus, effects of teacher subject knowledge may depend on the subject, the ability and gender match between teachers and students. Such aspects of education production are frequently overlooked, and understanding the mechanisms behind them provides interesting directions for future research.

Overall, the results suggest that teacher subject knowledge should be on the agenda of educational administrators and policy-makers. Attention to teacher subject knowledge seems to be in order in recruitment policies, teacher training practices, and compensation schemes. However, additional knowledge about the relative cost of improving teacher knowledge (both across different means of improving teacher knowledge and compared to other means of improving student achievement) is needed before policy priorities can be established.

The results have to be interpreted in the given developing-country context with relatively low academic achievement. Although average student achievement in Peru is close to the average of Latin American countries (LLECE, 2008), it is far below that of the average developed country. For example, when Peruvian 15-year-olds took the Programme for International Student Assessment (PISA) test in 2002, their average math performance was a full two standard deviations below the average OECD country, at the very bottom of the 41 (mostly developed) countries that had taken the test by that time (OECD, 2003). The extent to which the current results generalize to developed countries thus remains an open question for future research.

Appendix A. Supplementary material

Supplementary material to this article can be found online at http://dx.doi.org/10.1016/j.jdeveco.2012.06.002.

References

Altinok, Nadir, Kingdon, Geeta, 2012. New evidence on class size effects: a pupil fixed effects approach. Oxford Bulletin of Economics and Statistics 74 (2), 203-234.
Ammermüller, Andreas, Dolton, Peter, 2006. Pupil-teacher gender interaction effects on scholastic outcomes in England and the USA. ZEW Discussion Paper 06-060. Centre for European Economic Research, Mannheim.
Angrist, Joshua D., Krueger, Alan B., 1999. Empirical strategies in labor economics. In: Ashenfelter, Orley, Card, David (Eds.), Handbook of Labor Economics, vol. 3A. North-Holland, Amsterdam, pp. 1277-1366.
Ashenfelter, Orley, Krueger, Alan B., 1994. Estimates of the economic return to schooling from a new sample of twins. The American Economic Review 84 (5), 1157-1173.
Ashenfelter, Orley, Zimmerman, David J., 1997. Estimates of the returns to schooling from sibling data: fathers, sons, and brothers. The Review of Economics and Statistics 79 (1), 1-9.
Banerjee, Abhijit, Duflo, Esther, 2006. Addressing absence. The Journal of Economic Perspectives 20 (1), 117-132.
Bedi, Arjun S., Marshall, Jeffery H., 2002. Primary school attendance in Honduras. Journal of Development Economics 69 (1), 129-153.
Behrman, Jere R., 2010. Investment in education: inputs and incentives. In: Rodrik, Dani, Rosenzweig, Mark (Eds.), Handbook of Development Economics, vol. 5. North-Holland, Amsterdam, pp. 4883-4975.
Behrman, Jere R., Ross, David, Sabot, Richard, 2008. Improving quality versus increasing the quantity of schooling: estimates of rates of return from rural Pakistan. Journal of Development Economics 85 (1-2), 94-104.
Card, David, 1999. The causal effect of education on earnings. In: Ashenfelter, Orley, Card, David (Eds.), Handbook of Labor Economics, vol. 3A. North-Holland, Amsterdam, pp. 1801-1863.
Carrell, Scott E., Page, Marianne E., West, James E., 2010. Sex and science: how professor gender perpetuates the gender gap. Quarterly Journal of Economics 125 (3), 1101-1144.
Chamberlain, Gary, 1982. Multivariate regression models for panel data. Journal of Econometrics 18 (1), 5-46.
Chaudhury, Nazmul, Hammer, Jeffrey, Kremer, Michael, Muralidharan, Karthik, Rogers, F. Halsey, 2006. Missing in action: teacher and health worker absence in developing countries. The Journal of Economic Perspectives 20 (1), 91-116.
Clotfelter, Charles T., Ladd, Helen F., Vigdor, Jacob L., 2010. Teacher credentials and student achievement in high school: a cross-subject analysis with student fixed effects. The Journal of Human Resources 45 (3), 655-681.
Coleman, James S., Campbell, Ernest Q., Hobson, Carol F., McPartland, James M., Mood, Alexander M., Weinfeld, Frederic D., et al., 1966. Equality of Educational Opportunity. Government Printing Office, Washington, D.C.
Cronbach, Lee J., 1951. Coefficient alpha and the internal structure of tests. Psychometrika 16 (3), 297-334.
Dee, Thomas S., 2005. A teacher like me: does race, ethnicity, or gender matter? The American Economic Review 95 (2), 158-165.
Dee, Thomas S., 2007. Teachers and the gender gaps in student achievement.
The Journal of Human Resources 42 (3), 528-554.
Dee, Thomas S., West, Martin R., 2011. The non-cognitive returns to class size. Educational Evaluation and Policy Analysis 33 (1), 23-46.
Ehrenberg, Ronald G., Brewer, Dominic J., 1995. Did teachers' verbal ability and race matter in the 1960s? Coleman revisited. Economics of Education Review 14 (1), 1-21.
Eide, Eric, Goldhaber, Dan, Brewer, Dominic, 2004. The teacher labour market and teacher quality. Oxford Review of Economic Policy 20 (2), 230-244.
Ferguson, Ronald F., 1998. Can schools narrow the black-white test score gap? In: Jencks, Christopher, Phillips, Meredith (Eds.), The Black-White Test Score Gap. Brookings Institution, Washington, D.C., pp. 318-374.
Ferguson, Ronald F., Ladd, Helen F., 1996. How and why money matters: an analysis of Alabama schools. In: Ladd, Helen F. (Ed.), Holding Schools Accountable: Performance-based Reform in Education. Brookings Institution, Washington, D.C., pp. 265-298.
Filmer, Deon, 2007. Educational Attainment and Enrollment around the World. Development Research Group, The World Bank (Peru 2004 data last updated 25 Sep 2007) [accessed 24 March 2012]. Available from econ.worldbank.org/projects/edattain.
Glewwe, Paul, Kremer, Michael, 2006. Schools, teachers, and education outcomes in developing countries. In: Hanushek, Eric A., Welch, Finis (Eds.), Handbook of the Economics of Education, vol. 2. North-Holland, Amsterdam, pp. 945-1017.
Hanushek, Eric A., 1971. Teacher characteristics and gains in student achievement: estimation using micro data. The American Economic Review 61 (2), 280-288.
Hanushek, Eric A., 1986. The economics of schooling: production and efficiency in public schools. Journal of Economic Literature 24 (3), 1141-1177.
Hanushek, Eric A., 1992. The trade-off between child quantity and quality. Journal of Political Economy 100 (1), 84-117.
Hanushek, Eric A., 1997. Assessing the effects of school resources on student performance: an update.
Educational Evaluation and Policy Analysis 19 (2), 141-164.
Hanushek, Eric A., Rivkin, Steven G., 2006. Teacher quality. In: Hanushek, Eric A., Welch, Finis (Eds.), Handbook of the Economics of Education, vol. 2. North-Holland, Amsterdam, pp. 1051-1078.
Hanushek, Eric A., Rivkin, Steven G., 2010. Generalizations about using value-added measures of teacher quality. The American Economic Review 100 (2), 267-271.
Hanushek, Eric A., Woessmann, Ludger, 2011. The economics of international differences in educational achievement. In: Hanushek, Eric A., Machin, Stephen, Woessmann, Ludger (Eds.), Handbook of the Economics of Education, vol. 3. North-Holland, Amsterdam, pp. 89-200.
Hanushek, Eric A., Woessmann, Ludger, 2012. Schooling, educational achievement, and the Latin American growth puzzle. Journal of Development Economics 99 (2), 497-512.
Harbison, Ralph W., Hanushek, Eric A., 1992. Educational performance of the poor: lessons from rural northeast Brazil. A World Bank Book. Oxford University Press, Oxford.
Hargreaves, Eleanore, Montero, C., Chau, N., Sibli, M., Thanh, T., 2001. Multigrade teaching in Peru, Sri Lanka and Vietnam: an overview. International Journal of Educational Development 21 (6), 499-520.
Kane, Thomas J., Staiger, Douglas O., 2008. Estimating teacher impacts on student achievement: an experimental evaluation. NBER Working Paper 14607. National Bureau of Economic Research, Cambridge, MA.
Laboratorio Latinoamericano de Evaluación de la Calidad de la Educación (LLECE), 2008. Los Aprendizajes de los Estudiantes de América Latina y el Caribe: Primer Reporte de los Resultados del Segundo Estudio Regional Comparativo y Explicativo. Oficina Regional de Educación de la UNESCO para América Latina y el Caribe (OREALC/UNESCO), Santiago, Chile.
Lavy, Victor, 2010. Do differences in schools' instruction time explain international achievement gaps in math, science, and reading? Evidence from developed and developing countries. NBER Working Paper 16227.
National Bureau of Economic Research, Cambridge, MA.
Lavy, Victor, 2011. What makes an effective teacher? Quasi-experimental evidence. NBER Working Paper 16885. National Bureau of Economic Research, Cambridge, MA.
Luque, Javier, 2008. The Quality of Education in Latin America and the Caribbean: The Case of Peru. Abt Associates, San Isidro.
Metzler, Johannes, Woessmann, Ludger, 2010. The impact of teacher subject knowledge on student achievement: evidence from within-teacher within-student variation. CESifo Working Paper 3111. CESifo, Munich.
Murnane, Richard J., Phillips, Barbara R., 1981. What do effective teachers of inner-city children have in common? Social Science Research 10 (1), 83-100.
Organisation for Economic Co-operation and Development (OECD), 2003. Literacy Skills for the World of Tomorrow: Further Results from PISA 2000. OECD, Paris.
Rivkin, Steven G., Hanushek, Eric A., Kain, John F., 2005. Teachers, schools, and academic achievement. Econometrica 73 (2), 417-458.
Rockoff, Jonah E., 2004. The impact of individual teachers on student achievement: evidence from panel data. The American Economic Review 94 (2), 247-252.
Rockoff, Jonah E., Jacob, Brian A., Kane, Thomas J., Staiger, Douglas O., 2011. Can you recognize an effective teacher when you recruit one? Education Finance and Policy 6 (1), 43-74.
Rothstein, Jesse, 2010. Teacher quality in educational production: tracking, decay, and student achievement. Quarterly Journal of Economics 125 (1), 175-214.
Rowan, Brian, Chiang, Fang-Shen, Miller, Robert J., 1997. Using research on employees' performance to study the effects of teachers on students' achievement. Sociology of Education 70 (4), 256-285.
Schwerdt, Guido, Wuppermann, Amelie C., 2011. Is traditional teaching really all that bad? A within-student between-subject approach. Economics of Education Review 30 (2), 365-379.
Summers, Anita A., Wolfe, Barbara L., 1977. Do schools make a difference?
The American Economic Review 67 (4), 639-652.
Tan, Jee-Peng, Lane, Julia, Coustère, Paul, 1997. Putting inputs to work in elementary schools: what can be done in the Philippines? Economic Development and Cultural Change 45 (4), 857-879.
Wayne, Andrew J., Youngs, Peter, 2003. Teacher characteristics and student achievement gains: a review. Review of Educational Research 73 (1), 89-122.
World Bank, 2007. Toward high-quality education in Peru: standards, accountability, and capacity building. A World Bank Country Study. The World Bank, Washington, D.C.