Teachers and the Gender Gap in Reading Achievement∗

Esteban M. Aucejo†

Arizona State University

Jane Cooley Fruehwirth‡

University of North Carolina

Sean Kelly§

University of Pittsburgh

June 30, 2020

Abstract

Boys persistently lag behind girls in English/language arts. We find that heterogeneity in teachers' relative boy-specific value-added explains a large proportion of this gap. We exploit multifaceted measures of effective teaching, including popular teacher observation protocols, principal ratings and student perceptions of teaching practices, to explain this heterogeneity. We find no evidence of heterogeneous effects of these teacher measures by gender. Instead, we show that gender gaps in student evaluations of teaching practices capture meaningful differences in the quality of instruction boys and girls receive from the same teacher, explaining from a third to all of the value-added gender gap.

Keywords: gender gap, teaching practices, teacher effectiveness

JEL Classification Codes: I2, I20, I21

∗This research was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305A170269 to University of North Carolina, Chapel Hill. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education. We thank Robert Bringe for his excellent research assistance and Ken Bollen, Cassie Guarino, Laura Hamilton, Spyros Konstantopoulos, and Lindsay Matsumara for helpful comments.

†Dept of Economics, Arizona State University, CEP & NBER. Esteban.Aucejo@asu.edu

‡Dept of Economics & Carolina Population Center, UNC. jane fruehwirth@unc.edu

§School of Education, University of Pittsburgh. spkelly@pitt.edu

1 Introduction

Boys persistently lag behind girls in reading achievement, by as much as 0.29 of a standard deviation in National Assessment of Educational Progress scores, which is associated with approximately a year of learning (Loveless, 2015). Reardon et al. (2016) find that the gap in state standardized tests of reading achievement (or English language arts, ELA) is on average 0.23 of a standard deviation, or about two-thirds of a year of learning. Reading skills are essential building blocks for learning, and early struggles with basic literacy skills set the stage for a process of cumulative disadvantage (Chatterji, 2006; Northrop, 2017; Senechal and LeFevre, 2002). By the end of high school, verbal skills have been shown to be more important determinants of college attendance than math skills (Aucejo and James, 2016). Given the importance of teachers to student learning (Koedel et al., 2015; Jackson et al., 2014; Konstantopoulos, 2014), we study how much teachers contribute to the gender gap in ELA performance in elementary school.

To this end, we implement a multi-pronged approach that relies on multiple measures of teacher and student performance. First, we use student test scores to recover gender-specific teacher value-added (VA) estimates. In particular, we explore the extent to which certain teachers are more effective with boys than with girls by documenting how their value-added varies across gender groups, using the estimator proposed in Chetty et al. (2014) broken out by student gender. This analysis indicates how much potential teachers have, overall, to close the ELA gap. Second, we exploit rich objective measures of teacher effectiveness, based on popular protocols designed to assess effective instruction using trained raters (i.e., CLASS and FFT, along with the ELA-specific protocol PLATO), to determine whether boys and girls receive different marginal benefits from having teachers who score highly on these evaluations. Third, we exploit summative measures of teacher quality/knowledge, based on principal evaluations of overall teacher quality, teacher training, experience, and the Content Knowledge for Teaching assessment of teacher knowledge, to see whether boys or girls receive different marginal benefits from teacher quality/knowledge. Fourth, we exploit data on student evaluations of teaching practices/effectiveness based on Ferguson's Tripod 7Cs (Ferguson, 2008) to assess whether boys and girls experience different teaching practices within the same classroom.1 Based on this diverse and rich set of teacher performance measures, we believe this paper is the first to use a large-scale, multi-perspective database to rigorously examine how instructional processes might mediate gendered learning outcomes.

In our specifications, we focus on gender gaps in ELA value-added, conditioning on both prior math and prior ELA performance (consistent with the literature measuring teacher value-added) in order to help isolate the effect of the current teacher. We refer to this throughout the paper as the gender gap in ELA. This gap is 0.08 of a standard deviation in test scores; while still sizable, it is only half the size of the raw gender gap in ELA. Even after focusing on value-added, several identification challenges remain in determining the role of teachers in explaining this gap. First, if boys (relative to girls) are systematically matched to certain types of teachers, we may confound teacher effects with student characteristics. Balancing tests suggest that this is not the case: none of the rich measures of effective teaching predict student gender. Moreover, our estimates of the achievement gap remain robust after controlling for school-grade fixed effects, measures of classroom composition, and class fixed effects. The gender gap would not be stable across these regressions if boys were systematically matched to less effective teachers.2

1Tripod is designed to capture multiple dimensions of teaching, including several aspects of the student-teacher relationship. For instance, teachers may struggle to relate to boys in the way they relate to girls, and boys may therefore be less happy in class or find schoolwork less interesting.

2It is important to clarify that while a subsample of the MET data comes from teachers who were randomly allocated to classrooms within school-grade cells (i.e., randomization blocks), most of our analysis does not rely on it, for two main reasons. First, restricting to a high-compliance "random sample" (one in which a sufficiently large share of students did not re-shuffle across classes after teachers were randomly allocated) would reduce our final sample by more than 60%, substantially decreasing the variation in the data. Second, matching based on gender is less concerning given that classes tend to be gender balanced. This empirical regularity, combined with the fact that our specifications control for students' lagged reading achievement and school-grade fixed effects (or class fixed effects when possible), makes concerns about matching based on gender much less important. Nevertheless, we also present results for our main specification based on the random sample. The key coefficients of interest are of similar magnitude to those obtained from the non-random sample, although the standard errors are much larger.

Second, measurement error in our teacher-effectiveness measures may make it difficult to detect heterogeneous effects of teachers by student gender. This may be particularly problematic for the observation protocols (CLASS, FFT and PLATO), where inter-rater reliability suggests significant measurement error (Kelly et al., 2020). To overcome this problem, we implement an instrumental variable approach in which contemporaneous teacher measures are instrumented with lagged measures based on the previous class the teacher taught.

Finally, to use student survey responses to analyze whether teaching practices are applied differently to boys and girls in the same classroom, we must deal with the important issue of confounding unobservable characteristics of the student (e.g., student bias) with actions of the teacher. For instance, if a student reports that schoolwork is less engaging, this could be because he or she does not like school regardless of the teacher, or it could reflect what the teacher is doing in the classroom. We separate the teacher effect from the student effect by instrumenting the student report with the teacher's average rating among boys and among girls in the teacher's classroom in the previous year. Absent matching within schools, these instruments are independent of the student unobservable characteristics that might also determine survey responses and ELA achievement. Because the model is over-identified, a test of overidentifying restrictions can provide evidence that matching on unobservables is not driving our findings.

We find that the heterogeneity in teachers' relative male-specific value-added is of the same magnitude as the gender gap, suggesting that some teachers do a particularly good job of improving the performance of boys. In fact, the standard deviation of teacher male-specific value-added is 0.08, which is comparable to overall estimates of teacher value-added in the literature. Despite these large differences in male-specific value-added, we find no evidence that the marginal benefits of teacher effectiveness vary by gender, whether we measure teacher effectiveness through observation protocols of effective instruction, principal survey ratings, average student survey ratings or teacher aptitude, and even after dealing with potential measurement error. We also find no evidence that teacher gender or experience has different marginal benefits for boys. While this is bad news for explaining the gender gap, on the positive side it suggests that popular teacher evaluation protocols may not be systematically biased toward practices that favor girls over boys.

Finally, we find gender disparities in Tripod evaluations to be a striking feature of the MET data, consistent with prior research suggesting that boys tend to evaluate teachers less positively than girls (e.g., Entwisle et al., 1997). However, we break from the literature in suggesting that these gaps may reflect real differences in experience that translate into learning, rather than student bias.3 After isolating the teacher-related component of these gaps, we find that gender gaps in teaching practices within the classroom explain from a third to all of the ELA gap, depending on the domain of instruction. Moreover, we present evidence that the two leading practices for explaining the achievement gap, captivate (e.g., homework and schoolwork are interesting) and confer (e.g., "my teacher wants me to explain my answers"), contribute to student engagement as well as to ELA achievement.4 Overall, our findings suggest that teachers could play an important role in closing the gender gap in ELA by applying practices that enhance boys' engagement with schooling.

3For example, Mengel et al. (2019) show that women university instructors receive systematically lower teaching evaluations from their students than their male colleagues.

4Appendix Table A.1 provides the questions that make up these domains.

Our study relates to a significant body of research that attempts to explain gender achievement gaps. First, Reardon et al. (2016) document substantial heterogeneity in gender gaps across school districts, which could suggest a role for school policies in contributing to literacy gaps. One explanation that has received some support in the literature is that males learn differently from females (Gurian and Stevens, 2004; Gurian, 2010). This would suggest that some teaching practices might benefit females more than males and vice versa. However, we do not find that the instructional processes promoted by current teacher observation and evaluation protocols (i.e., CLASS, FFT and/or PLATO) inadvertently favor females.

Second, our research connects with the literature on gender gaps in socio-emotional skills. For example, Bertrand and Pan (2013), Figlio et al. (2019) and Aucejo and James (2019) show that gender differences in non-cognitive skills play an important role in explaining gender gaps in schooling progression. In a similar vein, Cornwell et al. (2013) conclude that girls display a more developed attitude toward learning, which is consistent with the lower levels of school engagement among boys that we find in our data. However, these studies do not analyze how gender differences in skills could be explicitly overcome. Our results contribute by focusing on how teachers, through their instruction, could reach boys more effectively and compensate for their lower levels of school engagement.

Third, a strand of the literature argues that gender interactions between teachers and students have an important effect on educational outcomes. In particular, Dee (2007) shows that a large fraction of the gender gap in reading reflects classroom dynamics associated with the fact that boys' reading teachers are mostly female. While we do not recover a similar pattern in our data, our research contributes by focusing on teaching practices that might be used to better engage boys regardless of teacher gender.5 By identifying the likely channels through which some teachers are more successful at teaching boys, we open up the possibility of training teachers to be more effective with boys rather than relying solely on the more limited policy of matching boys to male teachers.6

5We do not have many male teachers in our sample, which could explain the difference in findings.

Finally, the literature has also explored the role of teacher bias. If present, bias is perhaps most readily revealed in grading practices, where teachers have discretion. Research comparing blind and non-blind classroom assessments finds mixed evidence on whether males or females are favored by teachers. Lavy and Sand (2015) show that teacher bias (measured as the difference between blind (national) and non-blind (classroom) assessments) helps explain the math/science gender gap in favor of males. Terrier (2015), also comparing blind and non-blind assessments, argues that girls benefit from positive discrimination in math but not in French. However, given that we focus on state assessments, our analysis does not speak to direct bias in the form of reduced ratings of students' academic work. Nevertheless, bias could operate indirectly, and more pervasively, in ways that detract from learning. For instance, if boys consistently receive less enthusiastic feedback during class discussions in ELA, they may come to believe that they are not good at ELA tasks, underperform, and subsequently get even less attention from the teacher.7 We do not directly analyze teacher bias, other than to acknowledge that it is one of possibly many ways in which teachers may differ in their capacity to teach boys. Disparities in teacher attention within the classroom could also result from other factors, such as teacher training and optimal responses to student effort or behavior.8

The rest of the paper proceeds as follows. Section 2 describes the data, including all our measures of teacher effectiveness. Section 3 further characterizes the gender gap in our sample and presents teacher value-added estimates by gender. Section 4 studies the presence of heterogeneous teacher effects. Section 5 analyzes how teaching practices are applied to females and males within the classroom and their impact on the gender gap. Section 6 explores plausible mechanisms. Finally, Section 7 concludes.

6The literature has also pointed to the role of the cultural environment. Legewie and DiPrete (2012) argue that boys' underachievement derives from cultural norms that define school achievement, and especially ELA content, as not masculine. As one of Smith and Wilhelm (2002)'s participants put it, "Reading Don't Fix No Chevys." The lower levels of male engagement with schooling activities that we find in our data are consistent with this explanation; however, our goal is to identify whether and how teachers can also compensate for those cultural mandates.

7Bassi et al. (2018) show, in the context of the Chilean education system, that teachers display an imbalance in their attention and interactions that favors boys.

8Jackson (2016) finds some evidence that teachers receive higher ratings on individual attention in single-sex schools, as well as higher warmth for both males and females, which he interprets as evidence of teachers' focus effects in single-sex settings. This could point to a gendered aspect of teaching that relates to our findings in interesting ways.

2 Data

We use the Measures of Effective Teaching (MET) Longitudinal Database for its extensive information about student outcomes, teacher evaluation protocols, student assessments of teachers and classroom composition. The data come from six large urban public school districts in the United States over two academic years (2009-2010 and 2010-2011).9 The data are also linked to administrative records with detailed information about students and teachers. For students, we have current and prior measures of achievement based on state standardized test scores, as well as background characteristics: age, race/ethnicity, gender, gifted status, and English language learner (ELL) status. Students are linked to their teachers. We also have self-reported measures of engagement, including whether the student is happy in class, homework completion and effort, as described further in Table A.1.

We focus on elementary students (grades 4 and 5), since they are primarily in self-contained classrooms and have more sustained exposure to their given teacher. In addition, we rely on the second year of the data (2010/11) because lagged measures of teacher effectiveness are an important part of our identification strategy. This yields a potential sample of 13,552 students. We restrict the sample to students with available information from the Tripod survey, which drops 2,734 observations. We lose an additional 929 observations because of missing ELA or math test scores, and another 20 because the student's gender or age is not reported. We also limit the sample to students whose ELA teacher is observed in the MET study, which drops an additional 1,277 observations. Finally, we lose 3 observations for lack of classmates after these restrictions. This brings the estimation sample to 8,589 students.

Randomization of teachers to classrooms was an important part of the MET study, but it applies only to a much smaller part of the sample, about a third to a half, depending on whether one restricts to high-compliance randomization blocks. We choose not to focus on the randomization sample here because of the loss in power and because we judge it unnecessary for identification in our setting.

9The six original districts are the New York City Department of Education, Charlotte-Mecklenburg Schools, Denver Public Schools, Memphis City Schools, Dallas Independent School District, and Hillsborough County Public Schools, but only five have video observation data of teaching practice. Kane and Staiger (2012) provide a detailed description of how schools were selected to participate in the MET project. More importantly, Kane and Staiger (2012) argue that MET teachers are comparable by most measures to their non-MET peers in the district, suggesting that they are representative of the districts included.
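The sample restrictions above can be checked with a short tally. The sketch assumes the restrictions are applied in the order listed; the step labels paraphrase the text and the helper name `sample_flow` is hypothetical. It also confirms that the final count matches the 4,227 boys and 4,362 girls reported in Table 1.

```python
START = 13_552  # potential 2010/11 sample of grade 4-5 students

# (restriction, observations dropped), in the order described in the text
DROPS = [
    ("no Tripod survey response", 2_734),
    ("missing ELA or math test score", 929),
    ("gender or age not reported", 20),
    ("ELA teacher not observed in MET", 1_277),
    ("no remaining classmates", 3),
]

def sample_flow(start, drops):
    """Return (reason, dropped, remaining) for each sequential restriction."""
    remaining = start
    rows = []
    for reason, n_dropped in drops:
        remaining -= n_dropped
        rows.append((reason, n_dropped, remaining))
    return rows

flow = sample_flow(START, DROPS)
for reason, n_dropped, remaining in flow:
    print(f"{reason:<35} -{n_dropped:>5,}  -> {remaining:>6,}")
final_n = flow[-1][2]
```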

2.1 Measuring Teacher Effectiveness

The MET includes rich data on diverse measures of teacher effectiveness, ranging from observation protocols designed to measure effective instruction to summative assessments of teacher quality/knowledge, teacher characteristics and student evaluations of teaching practices. The Content Knowledge for Teaching (CKT) assessment measures teachers' ELA knowledge and the specialized content knowledge needed to teach ELA effectively. Principals are also asked to rate each teacher's overall effectiveness on a scale of 1 to 7. Finally, the data include measures of teacher experience, which has frequently been shown to matter for student achievement; however, experience is measured only within the district and is therefore very noisy. We describe the observation-based and student-survey-based measures in more detail below.

Teacher Observation Protocol

For this analysis we focus on three observation protocols: the Framework for Teaching (FFT), the Classroom Assessment Scoring System (CLASS) and the Protocol for Language Arts Teaching Observation (PLATO). These instruments provide scores from trained raters on several dimensions of teaching. All are grounded in rigorous research and aligned with established teaching practices and standards; as such, we refer to them as measures of effective instruction. Trained raters scored teachers' lessons by watching video recordings, and reliability checks were performed at different times to ensure that videos were being rated appropriately.

FFT was designed for use in a variety of academic subjects and grade levels as an instrument for general teaching principles. It includes four domains, two of which are evaluated in MET: (1) the classroom environment and (2) instruction. These domains were scored on a total of eight components (subdomains) on a four-point scale: unsatisfactory, basic, proficient, or distinguished (Danielson, 2011).

CLASS is a standardized observational system that, like FFT, is designed for use in a variety of subjects and grades as an instrument for general teaching principles (Hamre et al., 2013). CLASS focuses in particular on the quality of teacher-student interactions and is organized into three domains: (1) emotional support, (2) classroom organization, and (3) instructional support. Teachers are rated on a 7-point scale ranging from low to high. While its constructs overlap with those measured by FFT, CLASS places an especially strong focus on emotional support, which it defines as a separate domain (Hamre et al., 2013).

PLATO is a classroom observation tool designed to assess ELA instruction in grades four through nine (Grossman et al., 2013). PLATO focuses on 13 elements of instruction, 8 of which are included in MET: intellectual challenge, modeling, strategy use and instruction, guided practice, classroom discourse, text-based instruction, behavior management and time management. The elements are scored on a 4-point scale (almost no evidence, limited evidence, evidence with some weaknesses, consistent strong evidence). While there is conceptual overlap with the other observational protocols, the PLATO elements originate specifically in research on English language arts instruction and are closely linked to literacy learning.

We create standardized versions of these measures for the analysis by averaging across the different domains and raters. Each teacher was rated by multiple trained raters, and each domain is made up of several subdomains. Averaging over these multiple measures helps to deal with measurement error, though we discuss further in Section 4 the potential for measurement error to bias our results and our empirical strategy for testing this.10
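A minimal sketch of how rater-by-domain observation scores might be collapsed into one standardized score per teacher follows. It assumes a simple unweighted mean over all rater-domain scores, followed by a z-score across teachers using the population standard deviation; the text does not specify weighting or the standard-deviation convention, and the helper name `standardized_teacher_scores` and the toy FFT-style scores are illustrative.

```python
from statistics import fmean, pstdev

def standardized_teacher_scores(ratings):
    """Hypothetical helper.

    ratings: dict mapping teacher_id -> list of (rater_id, domain, score).
    Returns dict mapping teacher_id -> z-score of the teacher's mean rating.
    """
    # Unweighted average of every rater-by-domain score for each teacher.
    means = {t: fmean([score for _, _, score in rows]) for t, rows in ratings.items()}
    values = list(means.values())
    center, spread = fmean(values), pstdev(values)
    # Standardize across teachers: mean 0, population SD 1.
    return {t: (m - center) / spread for t, m in means.items()}

ratings = {
    "T1": [("r1", "environment", 2), ("r1", "instruction", 3),
           ("r2", "environment", 3), ("r2", "instruction", 2)],
    "T2": [("r1", "environment", 3), ("r1", "instruction", 3),
           ("r3", "environment", 4), ("r3", "instruction", 3)],
    "T3": [("r2", "environment", 2), ("r2", "instruction", 2),
           ("r3", "environment", 2), ("r3", "instruction", 2)],
}
z = standardized_teacher_scores(ratings)
print(z)
```

Averaging over more raters and domains shrinks the noise in each teacher's mean, which is the motivation given in the text, but it cannot remove rater error entirely.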

Student Survey Data

Our student-survey-based measures of teaching practices come from Ferguson (2008)'s Tripod, or 7Cs, survey. The survey was designed to measure student perceptions of classroom instruction in seven dimensions: care, control, clarify, challenge, captivate, confer, and consolidate. Care measures students' perceptions of whether

they feel encouraged and cared for by the teacher. Control measures student perceptions

about classroom behavior. Clarify measures student perceptions of teaching practices aimed

at helping them better understand classroom material. Challenge measures student percep-

tions of rigor and effort needed in the classroom. Captivate measures whether students find

schoolwork and homework to be interesting/enjoyable. Confer measures how much students

perceive they are encouraged to participate in class. Finally, Consolidate measures whether

students perceive that teachers explain how they can do better and summarize what they

learn each day.

Each dimension of the Tripod survey is associated with a set of statements (from two to eight statements per dimension). Students rate their agreement on a five-point scale (1 = Totally Untrue to 5 = Totally True). Appendix Table A.1 describes the full set of survey questions for each dimension. We create standardized versions of each of the 7 domains by averaging across a given student's responses within the domain and then standardizing. We also create a composite overall 7C score by averaging across all domains and then standardizing.11

10We also considered each subdomain separately, but focus only on the overall rating, as the results were similar across subdomains.
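The Tripod construction can be sketched in the same spirit. The sketch assumes each student's responses arrive as a mapping from domain to a list of 1-5 item ratings with no missing items, that each domain average is standardized across students using the population standard deviation, and that the composite averages the raw domain averages before being standardized again. These conventions and the helper names `zscores` and `tripod_scores` are assumptions, not details given in the text.

```python
from statistics import fmean, pstdev

def zscores(values):
    """Standardize a list to mean 0 and population SD 1."""
    center, spread = fmean(values), pstdev(values)
    return [(v - center) / spread for v in values]

def tripod_scores(students, domains):
    """Hypothetical helper.

    students: list of dicts mapping domain -> list of item responses (1-5).
    Returns (standardized domain scores, standardized composite), one entry per student.
    """
    # Step 1: each student's average response within each domain.
    raw = {d: [fmean(s[d]) for s in students] for d in domains}
    # Step 2: standardize each domain across students.
    standardized = {d: zscores(raw[d]) for d in domains}
    # Step 3: composite = mean of raw domain averages, then standardized.
    composite = zscores(
        [fmean([raw[d][i] for d in domains]) for i in range(len(students))]
    )
    return standardized, composite

domains = ["captivate", "confer"]
students = [
    {"captivate": [4, 5, 3], "confer": [5, 4]},
    {"captivate": [2, 3, 3], "confer": [3, 3]},
    {"captivate": [5, 5, 4], "confer": [4, 5]},
]
standardized, composite = tripod_scores(students, domains)
print(composite)
```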

2.2 Summary Statistics

Table 1 provides summary statistics for student characteristics, test scores and engagement (Panel A), classroom characteristics (Panel B), and Tripod responses (Panel C). The first 3 columns present the mean, standard deviation and number of observations for males, the next 3 the same statistics for females, and the last 2 columns the difference in means and the p-value from a test of whether the boy and girl means are statistically significantly different.12 Girls have statistically significantly higher ELA and lagged ELA test scores (by 0.16 of a standard deviation) and demonstrate higher engagement with school on all 3 measures.13 That said, girls are not more likely to be placed in gifted classes, nor are they less likely to be designated English language learners. As might be expected, most aspects of student background are similar for boys and girls, with the exception that boys are statistically significantly older on average, by 0.07 of a year (possibly consistent with higher rates of retention or red-shirting), suggesting that controlling for age may be important.14
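Footnote 14's treatment of missing control variables (often called the missing-indicator method) amounts to a simple transformation of each control before it enters a regression. The helper name `with_missing_indicator` and the use of `None` to mark a missing value are assumptions for this sketch.

```python
from typing import Optional

def with_missing_indicator(values: list[Optional[float]], fill: float = 0.0):
    """Replace missing entries with `fill` and return a 0/1 missingness flag.

    Both returned lists enter the regression as controls, so observations
    with a missing value are kept rather than dropped.
    """
    filled = [fill if v is None else v for v in values]
    missing = [1 if v is None else 0 for v in values]
    return filled, missing

# Toy free/reduced-price lunch (FRPL) indicator with two missing students.
frpl = [1.0, None, 0.0, None, 1.0]
frpl_filled, frpl_missing = with_missing_indicator(frpl)
print(frpl_filled, frpl_missing)
```

Including the flag lets students with a missing value have their own intercept shift, so the zero fill does not get read as a genuine zero.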

Figure 1 presents kernel density plots for ELA and math achievement separately for males and females. Girls outperform boys in ELA across the distribution. By contrast, the math density plot shows similar performance for males and females across the distribution. We test for statistical differences between the female and male distributions using the Kolmogorov-Smirnov test and find that they are statistically significantly different for ELA, with a p-value below 0.001, but not for math, with a p-value of 0.15.

11Since many teachers in these grades taught both math and ELA, their classes were randomly split, with half of the class filling out the survey for math and the other half for ELA. Because we did not see much evidence that student responses varied by subject, and our results were similar, we do not distinguish in the results reported here whether a student was reporting for ELA or math.

12Appendix Table A.2 shows analogous summary statistics for the unrestricted sample (i.e., 13,552 students). A comparison between samples indicates that the distributions of most demographics and gender differences follow very similar patterns.

13This is consistent with previous findings in the literature (Cornwell et al., 2013; Bertrand and Pan, 2013; Figlio et al., 2019; Aucejo and James, 2019).

14Note that some of these control variables have fewer observations due to missing values. In regressions, we address this using the standard technique of replacing missing values with 0's and controlling for an indicator that they are missing.
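The two-sample Kolmogorov-Smirnov test used for Figure 1 compares the empirical cumulative distribution functions of the male and female score distributions and takes their largest vertical gap, D, as the test statistic. A self-contained sketch follows; its p-value uses the asymptotic Kolmogorov distribution with a common small-sample correction (the one given in Numerical Recipes), which may differ slightly from the software the authors used. In practice an established routine such as SciPy's `scipy.stats.ks_2samp` would normally be preferred; the function names below are hypothetical.

```python
import math
from bisect import bisect_right

def kolmogorov_q(lam):
    """Survival function of the Kolmogorov distribution, Q(lam)."""
    if lam < 0.2:  # Q(lam) is essentially 1 here
        return 1.0
    total = sum(2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
                for j in range(1, 101))
    return min(max(total, 0.0), 1.0)

def ks_two_sample(a, b):
    """Return (D, approximate p-value) for two samples of real numbers."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # D: largest gap between the two empirical CDFs, checked at every data point.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    en = math.sqrt(n * m / (n + m))
    return d, kolmogorov_q((en + 0.12 + 0.11 / en) * d)

same = ks_two_sample([0.1, 0.4, 0.7], [0.1, 0.4, 0.7])
apart = ks_two_sample(list(range(10)), list(range(10, 20)))
print(same, apart)
```

With samples as large as the roughly 4,200 boys and 4,400 girls here, even a modest shift in the ELA distribution produces a large scaled statistic and hence a very small p-value.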

Panel B of Table 1 shows that classroom characteristics are quite balanced between males and females, suggesting that non-random assignment of students to classrooms based on gender may not be an issue.15 Table 2 shows summary statistics of teacher characteristics and measures of teacher effectiveness for boys and girls.16 Most importantly, these characteristics and measures of teacher effectiveness do not differ significantly between boys and girls, again suggesting that boys and girls are not systematically assigned to different teachers or classroom characteristics. Overall, a little under 10% of the students in the sample have a male teacher. About a third of the teachers are black and only 6% are Hispanic, despite black and Hispanic students making up about 41% and 26% of the sample, respectively.17

Figure 1: ELA and Math Densities for Males and Females
(Panels: Density for ELA Achievement by Gender; Density for Math Achievement by Gender. Horizontal axes: ELA Achievement and Math Achievement, from −4 to 4; separate curves for Male and Female.)

15The only significant difference is in percent male, which arises by construction: because the measure excludes the student's own observation, it is mechanically higher for females. The balancing tests in Table 3 show that boys are not systematically assigned to different types of classrooms.

16Appendix Table A.3 shows analogous summary statistics for the unrestricted sample (i.e., 13,552 students). We do not find evidence of large differences between samples.

17Appendix Table A.4 shows the correlations between these measures.

Table 1: Student Summary Statistics by Gender

                                     Male                  Female                Male–Female
                                     Mean    SD     N      Mean    SD     N      Diff.    P-value

Panel A: Student Characteristics
ELA (2009)                           0.01    0.97   4227   0.17    0.93   4362   -0.16    0.00
ELA (2010)                           0.03    0.97   4227   0.19    0.91   4362   -0.16    0.00
Effort                               3.97    1.07   4225   4.15    1.01   4356   -0.18    0.00
Happy                                3.94    1.05   4194   4.10    1.01   4343   -0.15    0.00
Homework Complete                    0.73    0.44   4113   0.81    0.39   4285   -0.08    0.00
Age                                  9.41    0.93   4227   9.34    0.91   4362   0.07     0.00
Gifted                               0.09    0.29   4227   0.10    0.30   4362   -0.01    0.37
English Language Learner (ELL)       0.14    0.35   4227   0.13    0.34   4362   0.01     0.06
Free Reduced Price Lunch (FRPL)      0.49    0.50   3020   0.50    0.50   3113   -0.01    0.56
White                                0.25    0.43   4199   0.24    0.43   4305   0.00     0.72
Black                                0.41    0.49   4199   0.41    0.49   4305   0.00     0.76
Hispanic                             0.26    0.44   4199   0.26    0.44   4305   0.00     0.74
Asian                                0.07    0.25   4199   0.06    0.24   4305   0.01     0.29
Race Other                           0.02    0.15   4199   0.02    0.15   4305   0.00     0.44
Grade Level                          4.54    0.50   4227   4.54    0.50   4362   0.01     0.63

Panel B: Class Characteristics
Avg. Lag Math                        0.06    0.51   4227   0.08    0.51   4362   -0.02    0.09
Avg. Lag ELA                         0.06    0.50   4227   0.08    0.50   4362   -0.02    0.07
Avg. Age                             9.40    0.82   4227   9.40    0.82   4362   0.00     0.94
% Male                               0.50    0.10   4227   0.50    0.10   4362   -0.01    0.01
% Black                              0.42    0.36   4199   0.41    0.36   4305   0.01     0.49
% Hispanic                           0.26    0.26   4199   0.25    0.26   4305   0.00     0.93
% Asian                              0.06    0.11   4199   0.06    0.11   4305   0.00     0.99
% Race Other                         0.02    0.04   4199   0.02    0.04   4305   0.00     0.30
% Gifted                             0.09    0.17   4227   0.09    0.17   4362   -0.01    0.10
% ELL                                0.14    0.18   4227   0.14    0.18   4362   0.00     0.29
% FRPL                               0.49    0.31   3020   0.49    0.31   3113   0.00     0.98

Panel C: Student Tripod Responses
Clarify                              4.20    0.58   4227   4.27    0.56   4362   -0.07    0.00
Care                                 4.13    0.74   4227   4.23    0.73   4362   -0.10    0.00
Challenge                            4.26    0.70   4227   4.31    0.69   4362   -0.05    0.00
Consolidate                          3.86    0.95   4227   3.89    0.97   4362   -0.03    0.11
Captivate                            3.60    0.84   4227   3.74    0.80   4362   -0.14    0.00
Control                              3.52    0.73   4227   3.53    0.73   4362   0.00     0.76
Confer                               4.22    0.60   4227   4.31    0.57   4362   -0.10    0.00
All 7Cs                              3.97    0.54   4227   4.04    0.53   4362   -0.07    0.00

Notes: The sample sizes of 4,362 for females and 4,227 for males refer to the core sample; some variables in this table have fewer observations, as discussed in Section 2. The last column reports the p-value from a test of whether the means differ significantly between males and females. The classroom-characteristic variables in Panel B are calculated excluding each individual student. The Tripod survey domains in Panel C were constructed by averaging over the relevant questions. All 7Cs averages over the 7 different domains.
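The table notes state that classroom characteristics exclude each student's own observation, and footnote 15 attributes the small % Male difference to that construction. A minimal sketch of the leave-one-out calculation, using a hypothetical 20-student class with 10 boys, shows why the share is mechanically lower for boys than for girls; the helper name `leave_one_out_means` is illustrative.

```python
def leave_one_out_means(values):
    """Mean of each student's classmates, excluding the student's own value."""
    n, total = len(values), sum(values)
    if n < 2:
        raise ValueError("need at least two students in the class")
    return [(total - v) / (n - 1) for v in values]

# 1 = boy, 0 = girl: a perfectly balanced class of 20.
is_male = [1] * 10 + [0] * 10
share_male_classmates = leave_one_out_means(is_male)
boy_share, girl_share = share_male_classmates[0], share_male_classmates[-1]
# A boy sees 9/19 (about 0.474) male classmates; a girl sees 10/19 (about 0.526).
print(round(boy_share, 3), round(girl_share, 3))
```

Averaged over many classes of varying size, this mechanical gap shrinks to the roughly 0.01 difference reported in Panel B.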

Table 2: Teacher Descriptive Statistics by Gender

                                     Male                  Female                Male–Female
                                     Mean    SD     N      Mean    SD     N      Diff.    P-value
7C (2010)                            4.01    0.25   4227   4.01    0.25   4362   0.00     0.49
FFT (2010)                           2.67    0.25   3541   2.67    0.25   3625   0.00     0.79
CLASS (2010)                         4.57    0.36   3541   4.58    0.36   3625   -0.01    0.18
PLATO (2010)                         2.70    0.23   3541   2.70    0.23   3625   0.00     0.47
7C (2009)                            3.95    0.27   4093   3.95    0.26   4224   0.01     0.30
FFT (2009)                           2.66    0.24   3508   2.66    0.24   3598   0.00     0.49
CLASS (2009)                         4.57    0.40   3522   4.57    0.40   3611   0.00     0.85
PLATO (2009)                         2.66    0.27   3487   2.66    0.26   3576   0.00     0.91
Principal Survey                     4.32    1.16   3528   4.36    1.15   3652   -0.04    0.14
CKT Score                            -0.02   1.00   3628   0.02    1.00   3702   -0.04    0.13
Years of Exp.                        6.49    5.89   2808   6.38    6.01   2899   0.10     0.52
Male                                 0.09    0.29   4066   0.08    0.28   4202   0.01     0.14
Black                                0.32    0.47   4066   0.32    0.47   4202   0.00     0.72
White                                0.60    0.49   4066   0.61    0.49   4202   -0.01    0.30
Hispanic                             0.06    0.25   4066   0.06    0.24   4202   0.00     0.35

Notes: The Male and Female columns refer to the gender of the students. The p-value in the last column tests whether the male and female means are statistically significantly different. The 7C variable is the average student score by class. FFT, CLASS and PLATO are also calculated as averages across all raters and domains. For this analysis, we consider every possible response when calculating the teacher average, so we include both ELA and math responses when the teacher instructs both subjects in one class.

Finally, Panel C of Table 1 shows striking raw differences in Tripod survey responses between boys and girls. While we present raw measures here to illustrate the magnitudes, our empirical strategy standardizes these measures so that effect sizes are easier to interpret. In most cases, girls respond more favorably than boys, with the exception of Control and Consolidate.18 Recall that a rating of 5 indicates more positive regard for the teacher in that dimension, 3 is neutral and 1 is the lowest rating. The averages are above 4 for Clarify, Care, Challenge and Confer, but fall to between 3.5 and 4 for Captivate, Consolidate and Control. The statistically significant gender gaps range from 0.14 for Captivate to 0.05 for Challenge. These differences could result from teacher actions or from unobserved student attributes, a possibility we explore further in Section 5.

18The lack of a gender gap in these measures likely stems from the fact that all the questions target classroom behavior rather than individual students' behavior or their perceptions of the teacher's actions toward them.

2.3 Balancing tests

We provide further evidence that boys are not systematically matched to teachers or classrooms in our sample through simple balancing tests in Table 3. We regress each measure of teaching practice on whether the student is male, conditional on school-grade fixed effects. Previous literature has shown that a potential problem with contemporaneous measures of teaching practice (whether from observation protocols or from student or principal evaluations) is that they may be directly affected by classroom composition (Campbell and Ronfeldt, 2018; Steinberg and Garrett, 2016; Kelly et al., 2020), either because raters are biased, because teachers adapt to their students, or because the measures intrinsically capture some features of the classroom as well as of the teacher. Thus, our analysis relies more heavily on lagged measures of practice, for which the main endogeneity concern is matching. To this end, the balancing tests report both lagged and contemporaneous practices. We find that none of our measures of teacher effectiveness, quality or practice is statistically significantly related to the male student dummy, and the coefficients are very small in magnitude, whether we use contemporaneous or lagged measures.
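A minimal sketch of one such balancing regression, in Python with NumPy, is below. It uses simulated data in which students are assigned to classrooms independently of gender, absorbs school-grade fixed effects by demeaning within cells, and clusters standard errors by school without a small-sample correction. The function and variable names (`balancing_test`, `school_grade`, and so on) are illustrative assumptions, not the authors' code. The sketch regresses the male dummy on the teacher measure; with a single regressor and conventional standard errors the t-statistic would be identical in the other direction, and with the clustered errors used here it is only approximately so.

```python
import numpy as np

def demean_by_group(values, groups):
    """Subtract group means: the within transformation that absorbs group fixed effects."""
    _, idx = np.unique(groups, return_inverse=True)
    means = np.bincount(idx, weights=values) / np.bincount(idx)
    return values - means[idx]

def balancing_test(male, measure, school_grade, school):
    """Slope from regressing the male dummy on a teacher measure with school-grade
    fixed effects, plus a CR0 standard error clustered by school."""
    y = demean_by_group(male.astype(float), school_grade)
    x = demean_by_group(measure.astype(float), school_grade)
    beta = (x @ y) / (x @ x)
    resid = y - beta * x
    _, cluster = np.unique(school, return_inverse=True)
    score = np.bincount(cluster, weights=x * resid)   # per-school sum of x_i * e_i
    se = np.sqrt(score @ score) / (x @ x)
    return beta, se

# Simulated data: 400 classrooms in 100 school-grade cells (two grades per school),
# with gender assigned independently of the teacher measure (hypothetical values).
rng = np.random.default_rng(1)
n_class, n_students = 400, 8000
class_sg = rng.integers(0, 100, n_class)
class_measure = rng.normal(size=n_class)          # e.g., a standardized protocol score
student_class = rng.integers(0, n_class, n_students)
male = rng.integers(0, 2, n_students)
sg = class_sg[student_class]
school = sg // 2
beta, se = balancing_test(male, class_measure[student_class], sg, school)
print(f"coefficient = {beta:.4f}, clustered SE = {se:.4f}")
```

Because gender is independent of the measure by construction, the coefficient should be small relative to its standard error, which is the pattern the balancing tests look for in the real data.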

3 Gender Gap in ELA

In this section, we further develop evidence on the extent to which teachers may help explain the gender gap in ELA. We first show how the gap changes with various controls that are not directly based on the teacher. We then provide an estimate of the heterogeneity in teacher effectiveness for boys and girls.


3.1 Conditional Gender Gap

Table 4 presents a set of OLS regressions in which the dependent variable is the ELA score and the independent variable of interest is gender; across the columns we add different sets of controls, including prior achievement, student background characteristics, classroom composition, and school, grade or classroom fixed effects (depending on the specification). Column (1) shows that the raw gender gap in ELA is 0.16 of a standard deviation. Column (2) controls for lagged ELA and math scores to show how controlling for prior performance affects the overall gap. Column (3) adds other student characteristics, including English language learner status, race/ethnicity, age, gifted status and free/reduced-price lunch status, to determine whether they help explain the gap. Column (4) adds school-grade fixed effects to see whether variation at the school-grade level explains some of the gap. Column (5) adds controls for classroom composition, namely the classroom peer averages of all the individual controls, including lagged ELA and math. This is our main specification throughout. Column (6) replaces the classroom composition controls with class fixed effects to capture any unobservables at the class level that may explain the achievement gap.

Interestingly, these results show that controlling for lagged achievement explains about half of the raw gap. After that, the gap is stable. The fact that classroom composition measures, school-grade fixed effects and classroom fixed effects have virtually no additional explanatory power matches the intuition that males are not systematically matched to higher- or lower-"quality" schools or classrooms once we account for lagged test scores.19 Importantly, the results also suggest that any teacher-based explanation of the achievement gap must involve boys and girls within the same classroom not receiving the same benefits from their teacher.

19The size of the ELA gender gap remains quite stable when comparing specifications with only class fixed effects or with only school-grade fixed effects and no other controls. Importantly, this means that conditioning on prior test scores is not needed to obtain this evidence of stability. These results are consistent with there being no within-school matching of students to classrooms, based either on gender or on prior achievement.

Table 4: The Conditional Gender Gap in ELA (N=8589)

                            (1)        (2)        (3)        (4)        (5)        (6)
Male                        -0.160***  -0.084***  -0.083***  -0.077***  -0.075***  -0.081***
                            (0.022)    (0.015)    (0.015)    (0.014)    (0.014)    (0.014)
ELA_{t−1}                              0.557***   0.526***   0.513***   0.510***   0.515***
                                       (0.013)    (0.013)    (0.013)    (0.013)    (0.013)
Math_{t−1}                             0.283***   0.252***   0.239***   0.239***   0.229***
                                       (0.010)    (0.011)    (0.009)    (0.009)    (0.009)
Avg Peer Math_{t−1}                                                     0.181**
                                                                        (0.073)
Avg Peer ELA_{t−1}                                                      -0.076
                                                                        (0.077)
% Male                                                                  0.122*
                                                                        (0.070)
ELA_{t−1}, Math_{t−1}                  X          X          X          X          X
Additional Student Controls                       X          X          X          X
Class Controls                                                          X
School-Grade FE                                              X          X
Class FE                                                                           X
R2                          0.007      0.633      0.641      0.576      0.578      0.559

Notes: *** denotes significance at the 1%, ** at the 5% and * at the 10% levels. Standard errors are clustered at the school level. Student controls include lagged ELA and math, age, race, grade level, English Language Learner (ELL) status, gifted status, and free and reduced price lunch (FRPL) status. To deal with missing values for race and free/reduced price lunch, we replace missing values with 0's and control for indicator variables for missingness. Classroom controls in column (5) include average lagged peer achievement (both math and ELA), average age, %male, %black, %hispanic, %race other, %gifted, %ELL, and %FRPL.
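To see why conditioning on lagged achievement can absorb roughly half of the raw gap while further controls change little, the sketch below simulates a hypothetical data-generating process in which boys enter with lower prior scores and also gain slightly less each year, with no role for schools or classrooms. All parameter values are assumptions chosen for illustration; with them the population raw gap is -0.11 and the gap conditional on lagged ELA is -0.05.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients with an intercept; returns [const, b1, b2, ...]."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(2)
n = 200_000
male = rng.integers(0, 2, n).astype(float)
ela_lag = -0.10 * male + rng.normal(size=n)                 # boys enter behind (assumed)
ela = 0.6 * ela_lag - 0.05 * male + rng.normal(scale=0.6, size=n)

raw_gap = ols(ela, male)[1]             # analogous to column (1)
cond_gap = ols(ela, male, ela_lag)[1]   # analogous to column (2)
print(f"raw gap {raw_gap:.3f}, conditional on lagged ELA {cond_gap:.3f}")
```

In this toy setting the raw gap mixes the inherited gap (0.6 times -0.10) with the new gap of -0.05, and controlling for the lagged score strips out the inherited part.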

3.2 Teacher Male-Specific Value-Added

To further assess whether teacher-based explanations of the gap are plausible, we next estimate male-specific teacher value-added, which indicates the extent to which within-class gender disparities in teacher value-added contribute to the gap. Let $i$ index students, $c = c(i,t)$ classrooms and $j = j(c(i,t))$ teachers, where each classroom is assumed to have a single teacher. $Y_{it}$ denotes a student's ELA achievement on the end-of-grade assessment and $M_i$ an indicator that the student is male. Let $\phi_j$ denote the contribution of teacher $j$ to student $i$'s achievement. If the student is male, the teacher makes an additional contribution $\phi^M_j$, i.e.,

$$Y_{it} = \beta_0 + \beta_1 M_i + X_{it}\beta_2 + \phi_j + \phi^M_j M_i + \nu_{it}. \qquad (1)$$

Note that if $\phi^M_j = 0$, then males and females receive the same benefit from a given teacher. $X_{it}$ includes math and reading prior achievement; indicators for race, gifted status, English language learner status and free/reduced-price lunch status; and the corresponding classroom averages along with percent male.20

We estimate value-added in several steps, following Chetty et al. (2014) and extending their approach to recover the differential male effect using the logic of a difference-in-differences estimator. First, we estimate equation (1) with teacher fixed effects, which yields estimates of $\beta_0$, $\beta_1$ and $\beta_2$.21 Second, we solve for $\phi_j$ by averaging over all females in a given class:

$$\phi_j + \hat{\nu}^F_{jt} = \bar{Y}^F_{jt} - \hat{\beta}_0 - \bar{X}^F_{jt}\hat{\beta}_2 \equiv \tilde{Y}^F_{jt},$$

where the $F$ superscript denotes that classroom averages are taken over the females in the classroom. This gives us an approximation of $\phi_j$ with error $\hat{\nu}^F_{jt}$. Third, to deal with noise, we apply a standard shrinkage estimator (denoted by superscript $S$), resulting in $\tilde{Y}^{FS}_{jt}$. Fourth, we regress $\tilde{Y}^{FS}_{jt}$ on $\tilde{Y}^{FS}_{j,t-1}$ with coefficient $\delta$, which gives us $\phi_j = \delta\tilde{Y}^{FS}_{j,t-1}$, assuming $\mathrm{Cov}(\hat{\nu}^F_{jt}, \hat{\nu}^F_{j,t-1}) = 0$. This is a standard assumption in value-added models, based on the rationale that $X_{it}$ provides sufficient controls for selection, so that what remains in the residual after removing the shared teacher component is just random measurement error.

20Because free/reduced-price lunch status is missing in some cases, we replace missing values with 0 and control for a missing indicator so that these observations are not dropped from our regression.

21Controlling for teacher fixed effects helps eliminate bias from the matching of students to teachers.

We follow the same procedure for the male sample, first solving for the residualized test score:

$$\phi_j + \phi^M_j + \hat{\nu}^M_{jt} = \bar{Y}^M_{jt} - \hat{\beta}_0 - \hat{\beta}_1 - \bar{X}^M_{jt}\hat{\beta}_2 \equiv \tilde{Y}^M_{jt}.$$

Following the same steps as above (shrinkage and projection on the lag), we recover $\phi_j + \phi^M_j = \delta^M\tilde{Y}^{MS}_{j,t-1}$, where $\delta^M$ is estimated from the projection of $\tilde{Y}^{MS}_{jt}$ on $\tilde{Y}^{MS}_{j,t-1}$.

As a final step, we estimate the relative male-specific value-added, $\phi^M_j$, as the difference between the male and female value-added estimates, $\phi^M_j = \delta^M\tilde{Y}^{MS}_{j,t-1} - \delta\tilde{Y}^{FS}_{j,t-1}$. Chetty et al. (2014), Kane et al. (2013) and others make a compelling case that the above procedure is sufficient to recover value-added even in the presence of sorting; the recovery of male-specific value-added has the additional advantage of being based on a within-class comparison, so it is arguably even more robust to sorting concerns.
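The four steps can be sketched as follows. This Python/NumPy example is a simplified illustration, not the authors' code: it simulates residualized scores $Y - \hat{\beta}_0 - \hat{\beta}_1 M - X\hat{\beta}_2$ directly (so the first step is assumed already done), computes teacher-by-year means separately by gender, applies a simple empirical Bayes shrinkage (our assumption for the unspecified "standard shrinkage estimator"), forecasts each teacher's value-added from the prior year's shrunk mean, and differences the male and female forecasts to recover $\phi^M_j$. All names and parameter values are hypothetical.

```python
import numpy as np

def cell_means(resid, teacher, year, n_teachers, n_years):
    """Teacher-by-year means of residualized scores, cell sizes, and the pooled
    within-cell (student-level) noise variance."""
    sums = np.zeros((n_teachers, n_years))
    counts = np.zeros((n_teachers, n_years))
    np.add.at(sums, (teacher, year), resid)
    np.add.at(counts, (teacher, year), 1)
    means = sums / counts
    sq_dev = np.zeros((n_teachers, n_years))
    np.add.at(sq_dev, (teacher, year), (resid - means[teacher, year]) ** 2)
    noise_var = sq_dev.sum() / (counts.sum() - counts.size)
    return means, counts, noise_var

def shrink(means, counts, noise_var):
    """Empirical Bayes shrinkage toward the grand mean; smaller cells shrink more."""
    grand = means.mean()
    signal_var = max(means.var() - np.mean(noise_var / counts), 1e-12)
    reliability = signal_var / (signal_var + noise_var / counts)
    return grand + reliability * (means - grand)

def forecast_from_lag(shrunk):
    """Regress year-1 shrunk means on year-0 shrunk means (no intercept), as in
    phi_j = delta * tilde_Y_{j,t-1}, and return the forecast for year 1."""
    lag, cur = shrunk[:, 0], shrunk[:, 1]
    delta = (lag @ cur) / (lag @ lag)
    return delta * lag

def gender_va(resid, teacher, year, n_teachers, n_years=2):
    means, counts, noise_var = cell_means(resid, teacher, year, n_teachers, n_years)
    return forecast_from_lag(shrink(means, counts, noise_var))

rng = np.random.default_rng(0)
J = 2000
phi = rng.normal(0, 0.10, J)      # effect shared by all students (hypothetical)
phi_m = rng.normal(0, 0.10, J)    # extra effect for boys: the target phi^M_j

def simulate(extra):
    """Residualized scores for one gender over two years of classes."""
    teacher, year, resid = [], [], []
    for t in range(2):
        n = rng.integers(8, 26, J)                  # class-by-gender sizes
        tj = np.repeat(np.arange(J), n)
        teacher.append(tj)
        year.append(np.full(tj.size, t))
        resid.append(phi[tj] + extra[tj] + rng.normal(0, 0.5, tj.size))
    return np.concatenate(resid), np.concatenate(teacher), np.concatenate(year)

va_female = gender_va(*simulate(np.zeros(J)), J)   # estimates phi_j
va_male = gender_va(*simulate(phi_m), J)           # estimates phi_j + phi^M_j
phi_m_hat = va_male - va_female                    # relative male-specific value-added
print(np.corrcoef(phi_m_hat, phi_m)[0, 1])
```

Because the data are simulated, the only check here is that the recovered $\phi^M_j$ correlates positively with the true values; shrinkage deliberately pulls individual estimates toward zero, so their dispersion is smaller than that of the truth.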

Table 5 shows that the standard deviation of relative male-specific value-added across teachers, 0.08 of a standard deviation in ELA achievement, is similar in magnitude to that of overall value-added. We also explore a generalization by estimating equation (1) separately for males and females. This approach gives slightly higher estimates of the standard deviation of male-specific value-added (0.10). Finally, we contrast these with estimates that do not apply shrinkage; again, the estimates are slightly larger. Overall, these findings suggest that teachers likely have significant capacity to close the male/female ELA gap and that some teachers are better at doing so than others.

Table 5: Teacher Value-Added Components (N=448)

                                  Mean    SD      Min     Max
Restricted parameter estimates
  Shrunk
    $\phi^M_j$                    0.01    0.08    -0.28   0.24
    $\phi_j$                      -0.01   0.08    -0.23   0.32
  Not shrunk
    $\phi^M_j$                    0.01    0.10    -0.34   0.28
    $\phi_j$                      -0.01   0.09    -0.27   0.40
Unrestricted parameter estimates
  Shrunk
    $\phi^M_j$                    0.01    0.10    -0.37   0.27
    $\phi_j$                      0.00    0.10    -0.30   0.42
  Not shrunk
    $\phi^M_j$                    0.01    0.12    -0.46   0.33
    $\phi_j$                      0.00    0.12    -0.36   0.53

Notes: $\phi^M_j$ and $\phi_j$ are value-added variables, constructed using the process discussed in Section 3.2. The regressions used to construct $\phi^M_j$ and $\phi_j$ include the following controls: lagged ELA and math, race, English Language Learner (ELL) status, gifted status, free and reduced price lunch (FRPL) status, year, and indicator variables for missing FRPL status. We also control for the following classroom composition variables: average lagged peer achievement (both math and ELA), %male, %black, %hispanic, %race other, %gifted, %ELL, and %FRPL. Unrestricted parameter estimates are obtained when equation (1) is estimated separately for males and females, whereas the restricted model constrains the parameter estimates to be the same for males and females.

Given that $\phi^M_j \neq 0$, it could be (1) that the teacher input is the same for males and females, but males have a different marginal benefit from that input, i.e., $\phi^M_j = \rho\phi_j$. Alternatively, it could be (2) that males and females within the same classroom experience different inputs, e.g., they feel more cared for or more engaged by what the teacher is doing. We test these alternative hypotheses below. First, we exploit our rich teacher- and classroom-based measures of teacher effectiveness to test (1). We then use the individual-specific Tripod measures of practice to test (2).

4 Heterogeneous Teacher Effects

Value-added estimates reveal that some teachers are more effective at teaching boys relative

to girls, suggesting that differences in teacher effectiveness could explain the achievement gap.

A natural next step is to consider whether popular teacher evaluation protocols or other measures of teacher aptitude and quality, as measured in the MET data, have heterogeneous returns across males and females. Let $P_j$ represent these different measures of effective teaching, which may or may not be time-varying. We estimate

$$Y_{it} = \beta_0 + \beta_1 M_i + X_{it}\beta_2 + M_i P_j\beta_3 + P_j\beta_4 + \beta_{sg} + \varepsilon_{it}, \qquad (2)$$

where $\beta_{sg}$ denotes school ($s = s(i,t)$) by grade ($g = g(i,t)$) fixed effects. The fixed effects are important for controlling for differences in end-of-grade assessments across schools and grades in our sample. They could play a secondary role in helping to control for matching, though the evidence presented in Section 2.3 suggests that matching of males to teachers or classrooms within school-grade cells is not a likely confound. Our key parameter of interest is $\beta_3$. Because $P_j$ is standardized to have mean 0, $\beta_1$ retains its interpretation as the achievement gap in the average classroom when we include the interaction with our measure of teacher effectiveness. Thus, the relevant focus is the magnitude of the interaction: if $\beta_3$ is large relative to $\beta_1$, there is significant potential for heterogeneity in teacher effectiveness to explain the achievement gap. While $P_j$ could represent a vector of teaching practices or qualities, we estimate separate equations for each teacher measure to give the best chance of finding statistically significant interactions.

Our teacher measures include the standardized average ratings from each of the video observation protocols (FFT, CLASS, PLATO), the standardized average of the 7Cs student evaluations (this specification does not exploit individual student-level variation in Tripod, as we do in Section 5), the principal survey teacher rating, CKT, and teacher characteristics, including experience and whether the teacher is male. We focus our analysis on contemporaneous measures of effective teaching, but we show robustness to potential bias from measurement error and/or matching, as discussed below.
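Equation (2) can be estimated by demeaning every variable within school-grade cells (which absorbs $\beta_{sg}$) and running OLS on the demeaned data. The sketch below, a simplified illustration rather than the authors' specification, does this on simulated data in which the true interaction $\beta_3$ is zero; it standardizes $P_j$ over the student sample so that $\beta_1$ is the gap at the average value of the measure, and it uses a single lagged score in place of the full $X_{it}$. The data-generating values and function names are assumptions, and standard errors are omitted.

```python
import numpy as np

def within(matrix, groups):
    """Demean each column within groups; equivalent to including group fixed effects."""
    _, idx = np.unique(groups, return_inverse=True)
    counts = np.bincount(idx)
    means = np.column_stack([np.bincount(idx, weights=col) / counts for col in matrix.T])
    return matrix - means[idx]

def estimate_eq2(y, male, lagged, measure, school_grade):
    """OLS of Y on M, a lagged score, M*P and P with school-grade fixed effects."""
    p = (measure - measure.mean()) / measure.std()     # standardize P to mean 0, SD 1
    X = np.column_stack([male, lagged, male * p, p])
    Xw = within(X, school_grade)
    yw = within(y[:, None], school_grade)[:, 0]
    coef, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return dict(zip(["beta1", "beta2", "beta3", "beta4"], coef))

# Simulated data with a true interaction of zero (all values hypothetical).
rng = np.random.default_rng(3)
n_class, n = 2000, 100_000
class_sg = rng.integers(0, 400, n_class)               # school-grade cell of each class
class_p = rng.normal(size=n_class)                      # teacher measure, e.g. a protocol score
cls = rng.integers(0, n_class, n)
sg = class_sg[cls]
male = rng.integers(0, 2, n).astype(float)
lag = rng.normal(size=n)
sg_shift = rng.normal(scale=0.3, size=400)[sg]          # e.g. test differences across cells
y = (0.5 * lag - 0.08 * male + 0.10 * class_p[cls] + sg_shift
     + rng.normal(scale=0.6, size=n))
est = estimate_eq2(y, male, lag, class_p[cls], sg)
print({k: round(float(v), 3) for k, v in est.items()})
```

Under these assumptions the estimated $\beta_3$ is close to zero and $\beta_1$ stays near the simulated gap of -0.08, illustrating why adding the interaction with a mean-zero $P_j$ leaves the interpretation of $\beta_1$ intact.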

4.1 Results

The top panels of Tables 6 and 7 show estimates of equation (2) for each measure of teacher

effectiveness and teacher characteristics separately. We find that across the board β3 is not

statistically significantly different from 0 and that the point estimates are small. This is

true whether we include the practice measures individually or in the same regression (not

shown).22 Interestingly, we also do not find that boys benefit more from having male teachers,

though the lack of significance may be driven by the small number of male teachers in our

sample.

22Bassi et al. (2016) also show that, in the context of the Chilean education system, teachers with higher CLASS scores do not have a differential impact on boys' and girls' reading performance.


One reason we may fail to uncover significant heterogeneity in the effects of teachers for males is measurement error. For instance, Kelly et al. (2020) and other studies discuss the somewhat low inter-rater reliability of these protocols. Furthermore, existing research suggests that FFT and CLASS scales are correlated with characteristics of the classroom (Campbell and Ronfeldt, 2018; Kelly et al., 2020; Garrett and Steinberg, 2015), so measurement error may not be random. This could result either from rater bias created by confounding teacher and classroom characteristics or from teachers adapting how they teach to classroom characteristics. A similar concern applies to the student-based 7C's ratings. We address these types of measurement error in Panel B of Tables 6 and 7 by instrumenting the contemporaneous measures of teacher effectiveness ($P_{jt}$ and $P_{jt}M_i$) with lagged measures ($P_{j,t-1}$ and $P_{j,t-1}M_i$).23 Even after accounting for measurement error, there is no evidence of heterogeneity in teacher effectiveness by gender.24
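The logic of instrumenting a noisy contemporaneous rating with its lag can be illustrated with a small simulation: if this year's and last year's ratings each equal a latent teacher practice plus independent noise, OLS on the current rating is attenuated, while two-stage least squares using the lagged rating (and its interaction with $M_i$) recovers the latent effects. The Python/NumPy sketch below assumes a nonzero interaction purely to make the correction visible, and it treats the independence of rating errors across years as an assumption. It reports point estimates only, since second-stage OLS standard errors are not valid 2SLS standard errors.

```python
import numpy as np

def two_sls(y, exog, endog, instruments):
    """Two-stage least squares point estimates (second-stage SEs would not be valid)."""
    Z = np.column_stack([exog, instruments])
    first_stage = np.linalg.lstsq(Z, endog, rcond=None)[0]
    X_hat = np.column_stack([exog, Z @ first_stage])
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

rng = np.random.default_rng(4)
n_teach, per = 3000, 40
true_p = rng.normal(size=n_teach)                     # latent teacher practice
p_now = true_p + rng.normal(size=n_teach)             # noisy current-year rating
p_lag = true_p + rng.normal(size=n_teach)             # independent noise in prior year
teacher = np.repeat(np.arange(n_teach), per)
male = rng.integers(0, 2, teacher.size).astype(float)
y = (-0.08 * male + 0.20 * true_p[teacher] + 0.10 * male * true_p[teacher]
     + rng.normal(size=teacher.size))

exog = np.column_stack([np.ones_like(y), male])
endog = np.column_stack([p_now[teacher], male * p_now[teacher]])   # P_jt, P_jt * M_i
inst = np.column_stack([p_lag[teacher], male * p_lag[teacher]])    # P_j,t-1, P_j,t-1 * M_i
ols = np.linalg.lstsq(np.column_stack([exog, endog]), y, rcond=None)[0]
iv = two_sls(y, exog, endog, inst)
print("OLS  P, M*P:", ols[2:].round(3))
print("2SLS P, M*P:", iv[2:].round(3))
```

With a reliability of 0.5 built into the simulated ratings, the OLS coefficients are roughly halved, whereas the 2SLS coefficients sit near the assumed values of 0.20 and 0.10.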

Another potential reason that the estimates of $\beta_3$ could be biased is that boys may be matched to teachers who are better suited to teaching boys. Our balancing tests reveal no evidence of matching based on observable measures of teacher effectiveness. Furthermore, we believe that, if anything, such matching would bias our estimates of $\beta_3$ away from 0. Nevertheless, we check robustness by controlling for matching with class fixed effects.25 These results, reported in Appendix Tables A.5 and A.6, again show no evidence of heterogeneity in teacher effectiveness by gender.26

23Note that we could also use multiple contemporaneous measures as potential instruments. We do not pursue this route because of evidence that contemporaneous measures may be biased by aspects of the current classroom, as discussed above.

24Similar results apply if we replace contemporaneous with lagged measures of teacher effectiveness, i.e., the reduced form of Panel B. We also experimented with principal component scores instead of averages of the different domains, and the lack of heterogeneous effects remains robust.

25We repeated this analysis on the subsample where teachers were randomly allocated to classrooms within randomization blocks (i.e., approximately within school-grade level). Reassuringly, we also find small point estimates that are not statistically significantly different from 0.

Overall, this evidence indicates that the effects of teaching practices do not differ by gender, whether we use observation protocol-based measures, student survey or principal-based measures, or teacher knowledge. We further find no evidence that boys' learning in ELA is better supported by teachers with more experience or by male teachers. The good news is that the practices emphasized as effective instruction by the observation-based protocols do not appear to favor girls at the expense of boys. This is important because these protocols are increasingly used by districts in high-stakes settings. Yet the puzzle of how teachers contribute to the gender gap in ELA remains.

5 Teacher Differentiation and Student Evaluations

The remaining potential explanation for the heterogeneity in teacher effectiveness by gender revealed by our value-added estimates has to do with teacher differentiation within the classroom. Teacher differentiation could occur directly, through teachers targeting boys and girls differently within the classroom, or indirectly, through the same teaching practice being received differently by boys and girls. For instance, boys may be called on to answer questions more often than girls, or may find different teacher actions to be caring or clarifying. Given the evidence that boys learn in different ways than girls (Gurian and Stevens, 2004), some within-classroom differentiation of instruction between them may even be an important aspect of teacher effectiveness. Importantly, none of the observation protocols is designed to capture differentiation, but we can begin to explore aspects of it using Tripod's student evaluations of teaching practice. Tripod is an unusually rich measure for large-scale data, offering some sense of the differentiation of teacher inputs within the classroom.

26Instruments are stronger in this case because we only instrument for the interaction of the measure of teacher effectiveness with the male dummy, as the level effect of the teacher is absorbed by the class fixed effects.

First, we perform a similar analysis to the conditional ELA gap in Table 4, but with

student-level 7C’s as the dependent variable. We do this to see whether our controls for

student and classroom characteristics explain the gender gap, which would eliminate the

student-level variation in 7C’s as a likely explanation for the ELA gender gap. Each row of

Table 8 corresponds to treating a different domain of the 7C’s as the outcome variable and

the cells report the coefficient on the male dummy from separate regressions with different

controls. Column (1) is the raw gap; column (2) conditions on lagged performance. Column

(3) controls for other student characteristics, and column (4) for school-grade fixed effects

to deal with any differences at the school-grade level or potential matching to schools. To

deal with potential matching at the classroom/teacher level, column (5) adds controls for

classroom characteristics and column (6) for classroom fixed effects.

Interestingly, unlike ELA, for most of the 7C's the gap is not explained at all by lagged ELA and math performance. However, as with ELA, the gender gap stays constant with the addition of other student controls, school-grade fixed effects, classroom controls and classroom fixed effects. An exception is Challenge, for which the gap drops from -0.07 in column (1) to -0.035 in column (5), after controlling for school-grade fixed effects and class characteristics, and is no longer statistically significantly different from zero. Thus, despite significant raw gaps in Challenge, the lack of a robust conditional gender gap suggests it is unlikely to explain the gender gap in ELA. Interestingly, the gender gap in Control is consistently close to zero across specifications, likely because it measures students' perceptions of classroom behavior rather than anything that would signal differentiation. Consolidate also exhibits a comparatively small gap throughout, suggesting it is an unlikely explanation. The persistent gaps in Clarify, Care, Captivate and Confer make these the most likely candidate explanations for the ELA gender gap. That said, the gaps in these measures could arise because (1) males perceive different degrees of support, care or engagement than females, perhaps through teacher differentiation of curriculum, teacher bias or differential student responses to inputs, or (2) males simply evaluate teachers on a different scale. The first explanation could help explain the ELA achievement gap, whereas the latter would fit better with several studies suggesting that male/female gaps in teacher ratings are not "productive" but rather evidence of student bias (MacNell et al., 2015). We explore this further below.

5.1 Do the Tripod Gaps Matter for Achievement?

To test this differentiation hypothesis, suppose there is some student-specific component of teacher effectiveness, $\tau_{ij}$, which could be either uni- or multi-dimensional, such that

$$Y_{it} = \beta_0 + \beta_1 M_i + X_{it}\beta_2 + \tau_{ij}\beta_3 + \nu_{it}.$$

To the extent that $\tau_{ij}$ explains the gender gap in ELA, we expect that controlling for it would shrink $\beta_1$ toward 0. Suppose that $\tau_{ij}$ is captured at least in part by the student evaluations. Let $E_{ijk}$ denote the evaluation of teacher $j$ by student $i$ on question $k$,

$$E_{ijk} = \tau_{ij} + e_{ijk}, \qquad (3)$$

where $e_{ijk}$ captures measurement error, student characteristics or non-productive aspects of the student's perception of the teacher.

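Substituting each student's average evaluation of teacher $j$ in a given dimension (or dimensions), $E_{ij}$, as a proxy for $\tau_{ij}$, with $e_{ij}$ the corresponding average of $e_{ijk}$, we have

$$Y_{it} = \beta_0 + \beta_1 M_i + X_{it}\beta_2 + E_{ij}\beta_3 + \nu_{it} - \beta_3 e_{ij}. \qquad (4)$$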

We cannot be certain that the measurement error is random in this case, so it is difficult to sign the bias in $\beta_3$ associated with this measurement error. The individual evaluation of the teacher could be correlated with achievement for reasons not directly related to the teaching practice. For instance, engaged students may rate teachers more highly, raising the possible concern of reverse causality.

Table 8: Conditional Gender Gap in 7C's Student Evaluations (N=8589)

Response                (1)        (2)        (3)        (4)        (5)        (6)
Clarify
  Male                  -0.122***  -0.121***  -0.119***  -0.120***  -0.114***  -0.125***
                        (0.021)    (0.021)    (0.021)    (0.022)    (0.022)    (0.022)
Care
  Male                  -0.137***  -0.137***  -0.134***  -0.136***  -0.132***  -0.142***
                        (0.023)    (0.023)    (0.023)    (0.023)    (0.024)    (0.023)
Challenge
  Male                  -0.069***  -0.051**   -0.051**   -0.042**   -0.035     -0.048**
                        (0.020)    (0.020)    (0.020)    (0.020)    (0.021)    (0.020)
Consolidate
  Male                  -0.033*    -0.037**   -0.037**   -0.043**   -0.036*    -0.051***
                        (0.019)    (0.019)    (0.018)    (0.019)    (0.020)    (0.020)
Captivate
  Male                  -0.154***  -0.172***  -0.163***  -0.169***  -0.170***  -0.171***
                        (0.019)    (0.019)    (0.019)    (0.020)    (0.021)    (0.020)
Control
  Male                  -0.007     -0.006     -0.016     0.013      -0.001     0.021
                        (0.022)    (0.022)    (0.023)    (0.020)    (0.020)    (0.019)
Confer
  Male                  -0.153***  -0.148***  -0.144***  -0.148***  -0.144***  -0.152***
                        (0.020)    (0.020)    (0.020)    (0.021)    (0.021)    (0.021)
All 7Cs
  Male                  -0.131***  -0.132***  -0.130***  -0.127***  -0.124***  -0.132***
                        (0.020)    (0.021)    (0.021)    (0.022)    (0.022)    (0.022)
ELA_{t−1}, Math_{t−1}              X          X          X          X          X
Other Student                                 X          X          X          X
Class Controls                                                      X
School-Grade FE                                          X          X
Class FE                                                                       X

Notes: *** denotes significance at the 1%, ** at the 5% and * at the 10% levels. Standard errors are clustered at the school level. Although this table only shows the coefficient estimates for the male variable, other controls were included. Starting with column (3), all individual student controls were included. These include lagged ELA and math, age, race, grade level, English Language Learner (ELL) status, gifted status, free and reduced price lunch (FRPL) status, and indicator variables showing whether any of the previous variables were missing (we impute to maintain a consistent sample size). In column (5), additional classroom characteristic controls were included: average lagged peer achievement (both math and ELA), average age, %male, %black, %hispanic, %race other, %gifted, %ELL, and %FRPL. These classroom characteristic variables were calculated for each student excluding themselves.

To address these concerns, we instrument $E_{ij}$ with the lag of teacher $j$'s average evaluation from her or his $t-1$ class, computed separately for females and males, $E^F_{jt,t-1}$ and $E^M_{jt,t-1}$.27 These instruments are arguably independent of unobservables at the student level that contribute to either achievement or certain types of evaluations. The main reason our instrumental variable strategy might fail to address the endogeneity concern is if students of certain types are matched to teachers with certain practices or levels of effectiveness, but recall that the balancing tests suggest this is not a concern. Because the model is over-identified (there are more instruments than endogenous regressors), we can also test the over-identifying restrictions; passing this test would be unlikely if matching were relevant.
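A compact sketch of this over-identified IV estimator, together with a Sargan test of the over-identifying restriction, is below. It simulates a hypothetical setting in which a student trait raises both the student's evaluation and achievement (so OLS on $E_{ij}$ is biased upward here), while the teacher's lagged female and male class averages, $E^F_{jt,t-1}$ and $E^M_{jt,t-1}$, are valid instruments. The Sargan statistic is $nR^2$ from regressing the 2SLS residuals on all exogenous variables and instruments; with two instruments and one endogenous regressor it has one degree of freedom, so its p-value can be computed with `math.erfc`. Parameter values and names are assumptions, and the homoskedastic Sargan form is our choice; a heteroskedasticity- or cluster-robust variant (Hansen's J) would be more appropriate with real classroom data.

```python
import math
import numpy as np

def iv_with_sargan(y, exog, endog, instruments):
    """2SLS coefficients, the Sargan statistic (n * R^2 of 2SLS residuals on all
    exogenous variables and instruments) and its degrees of freedom."""
    Z = np.column_stack([exog, instruments])
    X = np.column_stack([exog, endog])
    fitted = Z @ np.linalg.lstsq(Z, endog, rcond=None)[0]
    coef = np.linalg.lstsq(np.column_stack([exog, fitted]), y, rcond=None)[0]
    u = y - X @ coef                      # residuals use the actual E, not its fitted value
    u_fit = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]
    r2 = 1.0 - np.sum((u - u_fit) ** 2) / np.sum((u - u.mean()) ** 2)
    return coef, len(y) * r2, instruments.shape[1] - endog.shape[1]

def chi2_sf_df1(stat):
    """Upper-tail p-value for chi-squared(1): P(Z^2 > s) = erfc(sqrt(s / 2))."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

rng = np.random.default_rng(6)
n_teach, per_gender = 2000, 20
tau_f = rng.normal(size=n_teach)                              # input as perceived by girls
tau_m = tau_f - 0.2 + rng.normal(scale=0.5, size=n_teach)     # boys perceive less (assumed)
lag_f = tau_f + rng.normal(scale=0.3, size=n_teach)           # prior-year female class average
lag_m = tau_m + rng.normal(scale=0.3, size=n_teach)           # prior-year male class average
teacher = np.repeat(np.arange(n_teach), 2 * per_gender)
male = np.tile(np.repeat([0.0, 1.0], per_gender), n_teach)
tau = np.where(male == 1.0, tau_m[teacher], tau_f[teacher])
trait = rng.normal(scale=0.5, size=teacher.size)              # raises both E and Y
E = tau + trait + rng.normal(scale=0.3, size=teacher.size)
y = -0.05 * male + 0.25 * tau + 0.50 * trait + rng.normal(scale=0.5, size=teacher.size)

exog = np.column_stack([np.ones_like(y), male])
coef, sargan, df = iv_with_sargan(y, exog, E[:, None],
                                  np.column_stack([lag_f[teacher], lag_m[teacher]]))
ols = np.linalg.lstsq(np.column_stack([exog, E]), y, rcond=None)[0]
print(f"OLS beta3 = {ols[2]:.3f}, 2SLS beta3 = {coef[2]:.3f}, male = {coef[1]:.3f}")
print(f"Sargan = {sargan:.2f} (df = {df}), p = {chi2_sf_df1(sargan):.3f}")
```

In this simulation the raw achievement gap is -0.10 (the direct -0.05 plus 0.25 times the -0.2 perception gap), so controlling for the instrumented evaluation roughly halves the male coefficient, which mirrors the logic of equation (4).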

5.2 Results

Table 9 shows estimates of equation (4), first without any control for teaching practice (column 1), then with the overall 7C's (column 2), and then with each of the 7C's that shows a statistically significant raw gender gap: Clarify, Care, Captivate, Challenge and Confer (columns 3 to 7). These specifications follow the instrumental variable strategy described above. Comparing columns (1) and (2), we see that the overall 7C's explains a little less than half of the gap, 0.03 of a standard deviation. A 1 standard deviation increase in overall 7C's increases achievement by 0.27 of a standard deviation. Weak instruments do not appear to be a problem, and we pass the test of over-identifying restrictions, providing supportive evidence that these results are not driven by matching or other unobservable characteristics of the current student population. Among the five domains of the 7C's that we explore, Captivate and Confer explain the largest proportion of the gap. Column (5) shows that Captivate explains all of the achievement gap and column (7) that Confer explains about half.28 Increasing Captivate by one standard deviation raises ELA achievement by 0.41 of a standard deviation; increasing Confer by one standard deviation raises ELA achievement by a smaller amount, 0.28.29

27We have also explored instrumenting $E_{ij}$ with $E_{jt,t-1}$ (i.e., the average lagged evaluation, not gender specific). Results are almost identical in terms of explaining the gender gap; if anything, they become stronger, as we discuss in the following subsection.

To account for correlations among our measures, Column (8) includes our two leading explanatory factors, Confer and Captivate, in the same regression. Combined, these measures explain all of the gender gap. In this case, the marginal effects of Captivate and Confer are comparable in magnitude, about 0.21, and they are jointly significant at the 1% level.

To test whether these findings are robust to controlling for other measures of teacher effectiveness, Appendix Table A.7 repeats the analysis in Table 9, the only change being the inclusion of controls for CLASS, FFT, PLATO, principal survey ratings, years of experience, teacher gender and CKT. Findings for Captivate and Confer remain robust; they remain jointly significant at the 3% level.

As a final robustness check, Table 10 explores the extent to which the gender gap in Tripod explains gender differences in SAT9 performance. SAT9 is a useful comparison because it is a low-stakes English/language arts test: students took it only for the purpose of the MET study. It differs substantively by including a writing component. As shown in column (1), the conditional gap is also much larger, 0.29 of a standard deviation. This may be partly because of the writing element, which has been shown to have larger male/female gaps (Schwabe et al., 2015), but also likely because we are not able to control for lagged SAT9 in this sample and instead control for lagged ELA. The overall 7C's (column 2) explain 0.05 of a standard deviation, slightly more than for ELA but considerably less as a proportion of the gap. Controlling for Captivate alone (column 5) explains 0.12 of a standard deviation of the gap (almost half), whereas Confer (column 7) explains 0.06 of a standard deviation. Combined, Confer and Captivate again explain just under half of the gap and are jointly significant at the 0.001 level.30

28If instead we instrumented $E_{ij}$ with $E_{jt,t-1}$ (i.e., the average lagged evaluation, not gender specific), the male coefficients in columns (5) and (7) of Table 9 would be 0.023 (positive but not statistically significant) and -0.035, respectively.

29We also experimented with principal component scores (instead of averages) of the different Tripod questions corresponding to each of the 7Cs, but results do not change substantially; if anything, in the specification with all 7C's the male coefficient becomes slightly smaller and is not statistically significantly different from 0.

To get a sense of the magnitude of the contributions of Confer and Captivate to the

overall ELA gap by fourth grade, we consider how much boys would gain if their average

Confer and Captivate increased to the level of girls. Assuming the same marginal effect at

different grade levels and taking the persistence of 0.52 estimated in column (8) of Table 9,

we find that the average boy would gain 0.11 of a standard deviation in ELA, which comes

close to closing the raw gap, 0.16.
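The calculation is not spelled out in the text, so the following is only one plausible reading, sketched in Python: each year's gain equals the sum over domains of marginal effect times the boy-girl gap, and a gain made in one year persists into later years at rate 0.52. The inputs (the roughly 0.21 marginal effects from column (8) of Table 9, the column (5) gaps from Table 8 treated as standardized units, and the number of years of exposure) are assumptions, and the output is not meant to reproduce the 0.11 figure exactly.

```python
def cumulative_gain(per_year_gain, persistence, years):
    """Sum of yearly gains when a gain made k years ago retains persistence**k of its value."""
    return sum(per_year_gain * persistence ** k for k in range(years))

# Illustrative inputs (assumptions, not the authors' exact calculation)
marginal_effect = {"Captivate": 0.21, "Confer": 0.21}   # Table 9, column (8)
boy_girl_gap = {"Captivate": 0.170, "Confer": 0.144}     # Table 8, column (5), in SD units
per_year = sum(marginal_effect[d] * boy_girl_gap[d] for d in marginal_effect)

for years in range(1, 5):   # hypothetical number of years of exposure
    print(years, round(cumulative_gain(per_year, 0.52, years), 3))
```

The geometric structure makes clear why the cumulative effect levels off: with persistence of 0.52, gains from more than two or three years earlier contribute little to the fourth-grade gap.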

6 Mechanisms Exploration

One natural way that Captivate and Confer may operate to close the achievement gap is through student engagement. Thus, we explore the effects of Confer and Captivate on our measures of student engagement, including self-reported effort, homework completion and happiness in class.31 Recall from Table 1 that boys have statistically significantly lower levels of engagement than girls in all the dimensions we measure.

30Appendix Table A.8 repeats the analysis in Table 9 on a subsample in which teachers were randomly allocated to classrooms within randomization blocks (i.e., approximately within school-grade level). After further restricting this subsample to classrooms with more than 50% student compliance (i.e., students remaining in their originally assigned classrooms), we are left with 3072 observations. Results show that while the standard errors increase substantially (mainly due to a weak first stage), the pattern that Captivate and Confer play a key role in explaining the gender gap remains.

31We do not have access to teacher assessments of student engagement. The particular questions that refer to these domains are described in Appendix Table A.1.


To characterize this gap in engagement, we study whether it can be explained by a large set of student and classroom controls, and to what extent exposure to teachers with higher levels of Captivate and Confer could help close it. Table 11 reports nine specifications: three models for each of the three engagement proxies as the dependent variable. Columns (1), (4) and (7) present OLS baseline estimates in which we regress homework completion, the index of student effort, and the index of student happiness, respectively, on an indicator for gender. Columns (2), (5), and (8) add a large set of student and classroom controls to see how much of the gap they explain. Finally, columns (3), (6) and (9) report IV specifications that add our key teaching practices, instrumenting Captivate and Confer as in the previous section. Surprisingly, our rich set of student and classroom controls explains little of the male/female engagement gap. However, after adding Captivate and Confer, the gender gap in homework completion is reduced by almost half, and the gaps in effort and happiness are no longer statistically significantly different from 0. Interestingly, these results also show that Captivate plays a dominant role for homework completion and effort, whereas Confer dominates for whether students report being happy in class.
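The IV columns rest on two-stage least squares. The Python sketch below, on simulated data, illustrates why the instrument matters: when a student’s report of a teaching practice is correlated with an unobserved student attribute that also raises test scores, OLS overstates the practice’s effect, whereas 2SLS using a teacher-level prior-classroom measure recovers it. Everything here is an assumption for illustration: the data-generating process and all parameter values are hypothetical, there is a single endogenous regressor (the paper instruments both Captivate and Confer, using gender-specific prior-classroom averages), and the paper’s controls, school-grade fixed effects, and clustered standard errors are omitted.

```python
"""Two-stage least squares on simulated data (illustrative sketch, not the paper's estimator)."""
import numpy as np


def two_stage_least_squares(y: np.ndarray, X: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Point estimates beta = (X' P_Z X)^{-1} X' P_Z y, computed in two OLS stages."""
    # First stage: project every column of X onto Z (columns already in Z reproduce themselves).
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress the outcome on the fitted regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


def simulate_and_estimate(n: int = 100_000, seed: int = 0):
    """Simulate a student-reported practice that is correlated with an unobserved
    attribute, then return (OLS, 2SLS) coefficients. All parameters are hypothetical."""
    rng = np.random.default_rng(seed)
    male = rng.integers(0, 2, n).astype(float)
    prior_practice = rng.normal(0.0, 1.0, n)  # instrument: teacher's prior-classroom score
    ability = rng.normal(0.0, 1.0, n)         # unobserved; drives both the report and the score
    practice = 0.6 * prior_practice - 0.1 * male + 0.5 * ability + rng.normal(0.0, 1.0, n)
    ela = -0.05 * male + 0.30 * practice + 0.8 * ability + rng.normal(0.0, 1.0, n)

    X = np.column_stack([np.ones(n), male, practice])        # constant, Male, practice
    Z = np.column_stack([np.ones(n), male, prior_practice])  # Male serves as its own instrument
    ols = np.linalg.lstsq(X, ela, rcond=None)[0]
    iv = two_stage_least_squares(ela, X, Z)
    return ols, iv


ols, iv = simulate_and_estimate()
print("OLS  practice coefficient:", round(ols[2], 3))  # biased upward by the omitted attribute
print("2SLS practice coefficient:", round(iv[2], 3))   # close to the true 0.30
```

Computing 2SLS as two OLS passes gives correct point estimates but not correct standard errors; in applied work a dedicated IV routine would be used, which also reports first-stage diagnostics such as the Kleibergen-Paap statistic shown in Table 12.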

We finally consider the extent to which our engagement variables correlate with ELA

performance. Column (1) of Table 12 reports, for comparison purposes, the baseline gender

gap in ELA. Column (2) adds to our baseline specification all the engagement measures.

Finally, column (3) corresponds to an IV specification in which we further include Captivate and Confer as controls, instrumenting for them as in previous specifications. Results indicate that the engagement proxies are positively and jointly significantly correlated with test score performance. The only measure that is not statistically significant on its own is the index for happiness in class; however, this reflects its correlation with


Table 12: Gender Gap in ELA with Engagement and Teaching Practice Controls (N=8236)

              (1)                 (2)                 (3)
Captivate                                             0.267 (0.197)
Confer                                                0.389*** (0.148)
Male          -0.071*** (0.015)   -0.055*** (0.015)   -0.005 (0.028)
HW Complete                       0.062*** (0.016)    -0.058 (0.058)
Effort                            0.058*** (0.008)    0.037** (0.017)
Happy                             0.010 (0.007)       -0.248** (0.110)

Test for joint significance of student engagement, p-value: column (2) 0.000; column (3) 0.000
Test for joint significance of Captivate and Confer, p-value: column (3) 0.025
KP F-statistic†: column (3) 4.673
Test of overidentifying restrictions, p-value‡: column (3) 0.718

Notes: *** denotes significance at the 1%, ** at the 5%, and * at the 10% level. Standard errors are clustered at the school level. All regressions control for school-grade fixed effects, student lagged achievement (ELA and math), age, gender, race, English Language Learner (ELL) status, gifted status, free and reduced price lunch (FRPL) status, indicator variables for whether any of the previous variables were missing (including the teaching practice), average lagged peer achievement (both math and ELA), average age, %male, %black, %hispanic, %race other, %gifted, %ELL, and %FRPL. Column 3 instruments for the student report of the teaching practice with a measure of the teaching practice based on the average of the teacher’s prior classroom, broken out by gender. † Reports the Kleibergen-Paap rk Wald statistic for a weak-instrument test. ‡ Reports the p-value from Hansen’s J test of overidentifying restrictions.

the effort measures.32 Moreover, our estimates show that the student engagement measures can account for an important share (23%) of the gender gap in ELA.33 Finally, column (3) indicates that the teaching practices explain a significant part of the relationship between engagement and ELA, which we interpret as evidence that the practices may work partly through student engagement.34 That said, the engagement measures cannot explain the relationship between teaching practices and ELA. This is not surprising, given that some aspects of engagement are likely unobserved and that our measures contain measurement error. Further exploration would benefit from richer measures of engagement and behavior to develop a better sense of the key mediators underlying the relationship between these teaching practices and ELA.
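The 23% figure can be read off Table 12 as the proportional shrinkage of the Male coefficient between columns (1) and (2). The sketch below assumes that this ratio is how the share is defined (the function name is ours); the coefficients are rounded to three decimals in the table, and the column (3) comparison mixes an OLS baseline with an IV estimate, so that second number is only indicative.

```python
"""Share of the baseline gender gap absorbed by added controls (illustrative sketch)."""


def share_of_gap_explained(baseline_coef: float, controlled_coef: float) -> float:
    """1 - (controlled gap / baseline gap): the proportional shrinkage of the Male coefficient."""
    return 1.0 - controlled_coef / baseline_coef


# Male coefficients from Table 12 (as rounded in the table).
MALE_COL1 = -0.071  # baseline
MALE_COL2 = -0.055  # + engagement measures
MALE_COL3 = -0.005  # + Captivate and Confer (IV)

print(f"engagement measures: {share_of_gap_explained(MALE_COL1, MALE_COL2):.0%}")     # 23%
print(f"engagement + practices: {share_of_gap_explained(MALE_COL1, MALE_COL3):.0%}")  # 93%
```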

7 Conclusion

We find that in our sample there is considerable heterogeneity in teacher value-added for boys and girls, between 0.08 and 0.10 of a standard deviation, enough to explain the conditional achievement gap. That said, we find little evidence that this heterogeneity in value-added measures of effectiveness is explained by heterogeneous returns to measures related to teacher effectiveness, including video observation protocols, principal-survey-based and student-survey-based measures, teaching knowledge, experience, or gender. These results are robust to adjustments for measurement error and do not appear to be driven by matching. On the positive side, observation protocols designed to measure effective instruction (FFT, CLASS and PLATO) do not appear to be unfairly biased toward practices that favor girls’ achievement at the expense of boys’ achievement or vice versa. Combined with evidence that boys may

32If the two effort variables are not included, happiness becomes positive and statistically significant. Appendix Table A.9 reports specifications similar to those in Table 12 but includes each of the engagement variables separately.

33Importantly, we make no causal claim here; we claim only that our engagement measures are related to ELA performance.

34Interestingly, in column (3) homework completion is no longer significant, the effort coefficient shrinks substantially, and the happiness coefficient changes sign.

learn differently from girls (Gurian, 2010), this is good news, given that these protocols are growing in importance in many districts for both teacher development and accountability.

On the other hand, we find that the gender gap in Captivate, the 7C’s student-survey measure of teaching practice that captures the extent to which students find school and homework engaging, by itself fully explains the gender ELA gap. Confer, which measures the student’s perception of the teacher’s encouragement of student discussion, also explains about half of the ELA gender gap. Combined, these two measures appear to matter similarly for producing ELA value-added and explain the entire conditional ELA gap. We rule out the possibility that this is driven by reverse causality or unobserved student attributes by using teachers’ prior Tripod scores as instrumental variables. We interpret this as evidence of meaningful differentiation within the classroom. Back-of-the-envelope calculations suggest that raising the amount of Captivate and Confer that boys receive to that of girls would increase boys’ achievement by 4th grade (over 5 years) by 0.11 of a standard deviation in ELA. This is about two-thirds of the overall ELA gap for 4th and 5th graders in our sample, 0.16. Finally, we find suggestive evidence that Captivate and Confer operate in part by increasing boys’ engagement in schooling activities.

Our findings have several important implications for practice. First, though boys are

lagging behind girls in ELA, some teachers appear to be highly effective in improving boys’

ELA outcomes. Second, observational measures that evaluate teachers based on a set of

well-accepted practices do not appear to be biased toward practices that favor girls. More

work would be useful to determine if the practices favored by these protocols, which generally

favor a student-centered approach to teaching, are unevenly applied to boys and girls in the

classroom in ways that are simply not captured by the protocol. Third, student surveys

capture useful information about teacher differentiation that can help explain achievement.

Our results indicate the need for caution in interpreting heterogeneity in student reports as evidence of student bias; rather, gender gaps appear to capture meaningful differences in the

learning that is occurring in the classroom. Finally, our findings on Captivate and Confer

suggest that focusing on practices that move the needle on boys’ interest in school/homework

and create an environment where they feel welcome to interrogate ideas will be most fruitful

in narrowing achievement gaps.

References

Aucejo, Esteban and Jonathan James, “The Path to College Education: The Role of Math and Verbal Skills,” Technical Report 1602, California Polytechnic State University, Department of Economics, 2016.

Aucejo, Esteban M. and Jonathan James, “Catching up to girls: Understanding the gender imbalance in educational attainment within race,” Journal of Applied Econometrics, 2019, 34 (4), 502–525.

Bassi, Marina, Costas Meghir, and Ana Reynoso, “Education quality and teaching practices,” Technical Report, National Bureau of Economic Research, 2016.

Bassi, Marina, Mercedes Mateo Diaz, Rae Lesser Blumberg, and Ana Reynoso, “Failing to notice? Uneven teachers’ attention to boys and girls in the classroom,” IZA Journal of Labor Economics, November 2018, 7 (1), 9.

Bertrand, Marianne and Jessica Pan, “The trouble with boys: Social influences and the gender gap in disruptive behavior,” American Economic Journal: Applied Economics, 2013, 5 (1), 32–64.

Campbell, Shanyce L. and Matthew Ronfeldt, “Observational Evaluation of Teachers: Measuring More Than We Bargained for?,” American Educational Research Journal, 2018, 55 (6), 1233–1267.

Chatterji, Madhabi, “Reading achievement gaps, correlates, and moderators of early reading achievement: Evidence from the Early Childhood Longitudinal Study (ECLS) kindergarten to first grade sample,” Journal of Educational Psychology, 2006, 98 (3), 489–507.

Chetty, Raj, John N. Friedman, and Jonah E. Rockoff, “Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates,” The American Economic Review, 2014, 104 (9), 2593–2632.

Cornwell, Christopher, David B. Mustard, and Jessica Van Parys, “Noncognitive Skills and the Gender Disparities in Test Scores and Teacher Assessments: Evidence from Primary School,” Journal of Human Resources, January 2013, 48 (1), 236–264.

Danielson, Charlotte, The Framework for Teaching Evaluation Instrument, The Danielson Group, 2011.

Dee, Thomas S., “Teachers and the gender gaps in student achievement,” Journal of Human Resources, 2007, 42 (3), 528–554.

Entwisle, Doris R., Karl L. Alexander, and Linda Steffel Olson, Children, Schools, and Inequality, Social Inequality Series, Boulder, CO: Westview Press, 1997.

Ferguson, R., “The Tripod Project framework,” The Tripod Project, 2008.

Figlio, David, Krzysztof Karbownik, Jeffrey Roth, Melanie Wasserman et al., “Family disadvantage and the gender gap in behavioral and educational outcomes,” American Economic Journal: Applied Economics, 2019, 11 (3), 338–81.

Garrett, Rachel and Matthew P. Steinberg, “Examining teacher effectiveness using classroom observation scores: Evidence from the randomization of teachers to students,” Educational Evaluation and Policy Analysis, 2015, 37 (2), 224–242.

Grossman, Pam, Susanna Loeb, Julie Cohen, and James Wyckoff, “Measure for Measure: The Relationship between Measures of Instructional Practice in Middle School English Language Arts and Teachers’ Value-Added Scores,” American Journal of Education, May 2013, 119 (3), 445–470.

Gurian, Michael, Boys and Girls Learn Differently! A Guide for Teachers and Parents: Revised 10th Anniversary Edition, John Wiley & Sons, October 2010.

Gurian, Michael and Kathy Stevens, “With Boys and Girls in Mind,” Educational Leadership, 2004, 62 (3), 21–26.

Hamre, Bridget K., Robert C. Pianta, Jason T. Downer, Jamie DeCoster, Andrew J. Mashburn, Stephanie M. Jones, Joshua L. Brown, Elise Cappella, Marc Atkins, Susan E. Rivers, Marc A. Brackett, and Aki Hamagami, “Teaching through Interactions: Testing a Developmental Framework of Teacher Effectiveness in over 4,000 Classrooms,” The Elementary School Journal, 2013, 113 (4), 461–487.

Jackson, C. Kirabo, “The Effect of Single-Sex Education on Test Scores, School Completion, Arrests, and Teen Motherhood: Evidence from School Transitions,” Technical Report w22222, National Bureau of Economic Research, Cambridge, MA, May 2016.

Jackson, C. Kirabo, Jonah E. Rockoff, and Douglas O. Staiger, “Teacher Effects and Teacher-Related Policies,” Annual Review of Economics, 2014, 6 (1), 801–825.

Kane, Thomas J. and Douglas O. Staiger, “Gathering Feedback for Teaching: Combining High-Quality Observations with Student Surveys and Achievement Gains,” Research Paper, MET Project, Bill & Melinda Gates Foundation, 2012.

Kane, Thomas J., Daniel F. McCaffrey, Trey Miller, and Douglas O. Staiger, “Have we identified effective teachers? Validating measures of effective teaching using random assignment,” Research Paper, MET Project, Bill & Melinda Gates Foundation, 2013.

Kelly, Sean, Robert Bringe, Esteban Aucejo, and Jane Cooley Fruehwirth, “Using global observation protocols to inform research on teaching effectiveness and school improvement: Strengths and emerging limitations,” Education Policy Analysis Archives, 2020, 28, 62.

Koedel, Cory, Kata Mihaly, and Jonah E. Rockoff, “Value-added modeling: A review,” Economics of Education Review, 2015, 47, 180–195.

Konstantopoulos, Spyros, “Teacher Effects, Value-Added Models and Accountability,” Teachers College Record, 2014, 116 (1).

Lavy, Victor and Edith Sand, “On The Origins of Gender Human Capital Gaps: Short and Long Term Consequences of Teachers’ Stereotypical Biases,” Technical Report 20909, National Bureau of Economic Research, January 2015.

Legewie, Joscha and Thomas A. DiPrete, “School Context and the Gender Gap in Educational Achievement,” American Sociological Review, June 2012, 77 (3), 463–485.

Loveless, Tom, “Girls, boys, and reading,” November 2015.

MacNell, Lillian, Adam Driscoll, and Andrea N. Hunt, “What’s in a Name: Exposing Gender Bias in Student Ratings of Teaching,” Innovative Higher Education, August 2015, 40 (4), 291–303.

Mengel, Friederike, Jan Sauermann, and Ulf Zolitz, “Gender bias in teaching evaluations,” Journal of the European Economic Association, 2019, 17 (2), 535–566.

Northrop, L., “Breaking the Cycle: Cumulative Disadvantage in Literacy,” Reading Research Quarterly, 2017, 52 (4), 391–396.

Reardon, Sean, Erin Fahle, Demetra Kalogrides, Anne Podolsky, and Rosalia Zarate, Geographic Variation of District-Level Gender Achievement Gaps within the United States, Society for Research on Educational Effectiveness, 2016.

Schwabe, Franziska, Nele McElvany, and Matthias Trendtel, “The school age gender gap in reading achievement: Examining the influences of item format and intrinsic reading motivation,” Reading Research Quarterly, 2015, 50 (2), 219–232.

Senechal, Monique and Jo-Anne LeFevre, “Parental Involvement in the Development of Children’s Reading Skill: A Five-Year Longitudinal Study,” Child Development, 2002, 73 (2), 445–460.

Smith, Michael and Jeffrey D. Wilhelm, Reading Don’t Fix No Chevys: Literacy in the Lives of Young Men, 1st ed., Portsmouth, NH: Heinemann, March 2002.

Steinberg, Matthew P. and Rachel Garrett, “Classroom composition and measured teacher performance: What do teacher observation scores really measure?,” Educational Evaluation and Policy Analysis, 2016, 38 (2), 293–317.

Terrier, Camille, “Giving a Little Help to Girls? Evidence on Grade Discrimination and its Effect on Students’ Achievement,” Technical Report dp1341, Centre for Economic Performance, LSE, March 2015.

A Appendix Tables

Table A.1: Student Survey Questions: Engagement and Tripod 7Cs

Dimension: Example Question Prompts

Care: My teacher in this class makes me feel that he/she really cares about me. The teacher in this class encourages me to do my best. My teacher gives us time to explain our ideas. My teacher seems to know if something is bothering me. If I am sad or angry, my teacher helps me feel better. My teacher is nice to me when I ask questions. I like the way my teacher treats me when I need help.

Control: Our class stays busy and does not waste time. Students behave so badly in this class that it slows down our learning. Everybody knows what they should be doing and learning in this class. My classmates behave the way my teacher wants them to.

Clarify: If you don’t understand something, my teacher explains it another way. My teacher has several good ways to explain each topic that we cover in this class. My teacher explains difficult things clearly. This class is neat, everything has a place and things are easy to find. My teacher explains things in very orderly ways. In this class, we learn to correct our mistakes. My teacher knows when the class understands, and when we do not. I understand what I am supposed to be learning in this class.

Challenge: My teacher pushes us to think hard about things we read. In this class, my teacher accepts nothing less than our full effort. In this class we have to think hard about the writing we do. My teacher pushes everybody to work hard.

Captivate: We have interesting homework. Homework helps me learn. School work is not very enjoyable. School work is interesting.

Confer: My teacher wants me to explain my answers. When he/she is teaching us, my teacher asks us whether we understand. My teacher tells us what we are learning and why. My teacher asks questions to be sure we are following along when he/she is teaching. My teacher checks to make sure we understand what he/she is teaching us. My teacher wants us to share our thoughts. Students speak up and share their ideas about class work.

Consolidate: When my teacher marks my work, he/she writes on my papers to help me understand how to do better. My teacher takes the time to summarize what we learn each day.

Effort: I have pushed myself hard to understand my lessons in this class. When doing schoolwork for this class, I try to learn as much as I can and I don’t worry about how long it takes.

Happy: Being in this class makes me feel sad or angry (reverse-coded). This class is a happy place for me to be.

Homework: When homework is assigned in this class, how much of it do you usually complete?

Effort and Happy items use the scale no/never (1); mostly not (2); maybe/sometimes (3); mostly yes (4); yes, always (5). Homework Complete equals 1 if the student responded “all” and 0 otherwise. Tripod questions use a 5-point scale (1 = totally untrue to 5 = totally true).
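The domain indices are built from these 1-5 item responses by averaging over the relevant questions (per the notes to Table A.2), with negatively worded items such as the first Happy item reverse-coded. A minimal Python sketch of that scoring follows; the item keys and function names are hypothetical, and ignoring skipped items is our assumption, since the paper does not say how missing responses are handled.

```python
"""Scoring survey domains by item averaging (sketch with hypothetical item keys)."""
from __future__ import annotations

from statistics import mean

SCALE_MAX = 5  # Tripod, Effort, and Happy items all use 1-5 scales


def reverse_code(response: int, scale_max: int = SCALE_MAX) -> int:
    """Map 1..scale_max onto scale_max..1 for negatively worded items."""
    return scale_max + 1 - response


def domain_score(responses: dict[str, int | None], items: list[str],
                 reversed_items: frozenset[str] = frozenset()) -> float | None:
    """Average a student's answers over one domain's items, reverse-coding where flagged.
    Skipped items (missing or None) are ignored; returns None if nothing was answered."""
    values = [
        reverse_code(responses[q]) if q in reversed_items else responses[q]
        for q in items
        if responses.get(q) is not None
    ]
    return float(mean(values)) if values else None


def homework_complete(answer: str) -> int:
    """1 if the student reports completing all assigned homework, 0 otherwise."""
    return 1 if answer == "all" else 0


def all_7cs(domain_scores: list[float]) -> float:
    """Overall Tripod score: the simple average of the seven domain scores."""
    return float(mean(domain_scores))


happy_items = ["sad_or_angry", "happy_place"]  # hypothetical keys for the two Happy items
student = {"sad_or_angry": 2, "happy_place": 4}
print(domain_score(student, happy_items, frozenset({"sad_or_angry"})))  # 4.0
```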

Table A.2: Student Summary Statistics by Gender (Unrestricted Sample)

                                         Male              Female         Male-Female
                                    Mean    SD     N    Mean    SD     N    Mean P-value
Panel A: Student Characteristics
ELA (2009)                         -0.04  0.98  5928    0.13  0.93  6047   -0.17    0.00
ELA (2010)                         -0.04  0.98  6290    0.13  0.92  6437   -0.17    0.00
Effort                              3.96  1.08  5375    4.13  1.03  5450   -0.17    0.00
Happy                               3.94  1.05  5305    4.11  1.00  5416   -0.17    0.00
Homework Complete                   0.73  0.44  5203    0.81  0.39  5340   -0.08    0.00
Age                                 9.42  0.93  6607    9.34  0.90  6637    0.08    0.00
Gifted                              0.08  0.27  6611    0.08  0.27  6638    0.00    0.46
English Language Learner (ELL)      0.15  0.35  6611    0.13  0.34  6637    0.01    0.02
Free Reduced Price Lunch (FRPL)     0.48  0.50  4919    0.48  0.50  4911    0.00    0.90
White                               0.24  0.43  6558    0.24  0.43  6558    0.00    0.92
Black                               0.43  0.50  6558    0.44  0.50  6558    0.00    0.78
Hispanic                            0.25  0.43  6558    0.24  0.43  6558    0.00    0.58
Asian                               0.05  0.23  6558    0.05  0.23  6558    0.00    0.94
Race Other                          0.02  0.15  6558    0.02  0.15  6558    0.00    0.65
Grade Level                         4.54  0.50  6611    4.54  0.50  6638    0.00    0.97

Panel B: Class Characteristics
Avg. Lag Math                       0.04  0.51  5998    0.05  0.51  6109   -0.01    0.15
Avg. Lag ELA                        0.04  0.50  5928    0.06  0.51  6047   -0.02    0.08
Avg. Age                            9.39  0.81  5783    9.39  0.82  5806   -0.01    0.57
% Male                              0.52  0.10  5787    0.48  0.10  5807    0.04    0.00
% Black                             0.45  0.37  5750    0.45  0.37  5743    0.00    0.62
% Hispanic                          0.24  0.26  5750    0.24  0.26  5743    0.00    0.32
% Asian                             0.06  0.10  5750    0.06  0.11  5743    0.00    0.51
% Race Other                        0.02  0.04  5750    0.02  0.04  5743    0.00    0.19
% Gifted                            0.08  0.16  5787    0.09  0.17  5807    0.00    0.12
% ELL                               0.14  0.18  5787    0.14  0.18  5807    0.01    0.09
% FRPL                              0.47  0.31  4180    0.47  0.31  4174    0.00    0.57

Panel C: Student Tripod Responses
Clarify                             4.20  0.58  5396    4.27  0.56  5464   -0.07    0.00
Care                                4.12  0.75  5395    4.23  0.73  5465   -0.11    0.00
Challenge                           4.23  0.71  5392    4.29  0.70  5461   -0.06    0.00
Consolidate                         3.86  0.95  5337    3.89  0.97  5437   -0.03    0.09
Captivate                           3.60  0.86  5390    3.73  0.81  5462   -0.13    0.00
Control                             3.52  0.72  5382    3.53  0.73  5459   -0.01    0.59
Confer                              4.21  0.61  5383    4.31  0.57  5459   -0.10    0.00
All 7Cs                             3.96  0.54  5396    4.03  0.53  5465   -0.07    0.00

Notes: The sample sizes for males and females refer to the unrestricted sample, in which we only limit the sample to students in the second wave of the survey and in grades 4-5; some variables in this table have fewer observations. The last column reports the p-value from a test of whether the male and female means differ. The classroom-characteristic variables in Panel B are calculated excluding each individual student. The Tripod survey domains in Panel C were constructed by averaging over the relevant questions; All 7Cs averages over the seven domains.
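Because the Panel B classroom characteristics exclude the student’s own value, each student sees the average of his or her classmates only. A minimal Python sketch of such leave-one-out averages follows; the function name is hypothetical, and returning None for a student with no classmates is our assumption (the paper does not describe how such cases or missing values are treated).

```python
"""Leave-one-out classroom averages (illustrative sketch)."""
from __future__ import annotations

from collections import defaultdict
from typing import Hashable, Sequence


def leave_one_out_means(class_ids: Sequence[Hashable], values: Sequence[float]) -> list[float | None]:
    """For each student, the mean of `values` over the other students in the same class.
    Students who are alone in their class get None, since they have no classmates."""
    totals: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for c, v in zip(class_ids, values):
        totals[c] += v
        counts[c] += 1
    return [
        (totals[c] - v) / (counts[c] - 1) if counts[c] > 1 else None
        for c, v in zip(class_ids, values)
    ]


# Example: share of male classmates ("% Male" in Panel B) for five students in two classes.
classes = ["a", "a", "a", "b", "b"]
is_male = [1, 0, 1, 0, 0]
print(leave_one_out_means(classes, is_male))  # [0.5, 1.0, 0.5, 0.0, 0.0]
```

Accumulating class totals once and subtracting each student’s own value keeps the computation linear in the number of students, rather than re-averaging every classroom for each student.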

Table A.3: Teacher Descriptive Statistics by Student Gender (Unrestricted Sample)

                           Male              Female         Male-Female
                      Mean    SD     N    Mean    SD     N    Mean P-value
7C (2010)             4.01  0.25  5539    4.01  0.25  5538    0.00    0.71
FFT (2010)            2.66  0.25  4839    2.67  0.25  4750    0.00    0.47
CLASS (2010)          4.58  0.36  4839    4.59  0.36  4750   -0.01    0.08
PLATO (2010)          2.70  0.23  4839    2.69  0.23  4750    0.00    0.46
7C (2009)             3.96  0.26  5597    3.95  0.27  5605    0.01    0.07
FFT (2009)            2.66  0.24  4793    2.65  0.24  4717    0.00    0.65
CLASS (2009)          4.57  0.41  4810    4.57  0.41  4732    0.00    0.91
PLATO (2009)          2.65  0.27  4763    2.65  0.27  4688    0.00    0.74
Principal Survey      4.32  1.13  4679    4.34  1.14  4699   -0.03    0.24
CKT Score            -0.04  0.99  4676   -0.01  0.99  4666   -0.03    0.16
Years of Exp.         6.26  5.79  3694    6.23  5.92  3662    0.03    0.85
Male                  0.10  0.30  5326    0.09  0.28  5343    0.01    0.12
Black                 0.34  0.48  5326    0.34  0.47  5343    0.01    0.40
White                 0.58  0.49  5326    0.60  0.49  5343   -0.02    0.10
Hispanic              0.06  0.24  5326    0.06  0.23  5343    0.01    0.12

Notes: The sample sizes for males and females refer to the unrestricted sample, in which we only limit the sample to students in the second wave of the survey and in grades 4-5. The P-value in the last column tests whether the male and female means are statistically significantly different. The 7C variable is the average student score by class. FFT, CLASS and PLATO are also calculated as averages across all raters and domains. For this analysis, we consider every possible response when calculating the teacher average, so we include both ELA and math responses when the teacher instructs both subjects in one class.
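The last columns of Tables A.2 and A.3 test whether male and female means differ, but the notes do not say exactly how. The Python sketch below is one simple version, computed from the reported summary statistics under our own assumptions: independent observations, unequal variances, and a normal approximation to the p-value. Because students are clustered in classrooms (and every student of a teacher shares that teacher’s characteristics in Table A.3), and because the tables round means to two decimals, it will not reproduce the reported p-values exactly; it does confirm, for example, that the Captivate gap in Table A.2 is far from zero.

```python
"""Male-female difference in means from summary statistics (illustrative sketch)."""
from __future__ import annotations

import math


def diff_in_means_test(mean_m: float, sd_m: float, n_m: int,
                       mean_f: float, sd_f: float, n_f: int) -> tuple[float, float, float]:
    """Return (difference, z, two-sided p) for an unequal-variance (Welch-type) test,
    treating students as independent and using the normal approximation."""
    diff = mean_m - mean_f
    se = math.sqrt(sd_m ** 2 / n_m + sd_f ** 2 / n_f)
    z = diff / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return diff, z, p_two_sided


# Captivate row of Table A.2 (males vs. females).
diff, z, p = diff_in_means_test(3.60, 0.86, 5390, 3.73, 0.81, 5462)
print(f"diff = {diff:.2f}, z = {z:.2f}, p = {p:.1e}")
```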
