Teacher Quality, Quality Teaching, and Student Outcomes: Measuring the Relationships
Heather C. Hill
Deborah Ball, Hyman Bass, Merrie Blunk, Katie Brach, Charalambos Charalambous, Carolyn Dean, Séan Delaney, Imani Masters Goffney, Jennifer Lewis, Geoffrey Phelps, Laurie Sleep, Mark Thames, Deborah Zopf
Measuring teachers and teaching
Traditionally done at entry to profession (e.g., PRAXIS) and later ‘informally’ by principals
Increasing push to measure teachers and teaching for specific purposes:
  Paying bonuses to high-performing teachers
  Letting go of under-performing (pre-tenure) teachers
  Identifying specific teachers for professional development
  Identifying instructional leaders, coaches, etc.
Methods for identification
Value-added scores
  Average of teachers’ students’ performance this year differenced from same group of students’ performance last year
  In a super-fancy statistical model
  Typically used for pay-for-performance schemes
  Problems
Self-report / teacher-initiated
  Typically used for leadership positions, professional dev.
  However, poor correlation with mathematical knowledge
  R = 0.25
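Stripped of the statistical model, the value-added idea above reduces to a simple gain score. The sketch below is only an illustration under assumptions: the function name `gain_score` and the sample scores are hypothetical, and actual value-added models adjust for far more than last year's average.

```python
from statistics import mean

def gain_score(this_year, last_year):
    """This year's class average minus the same students' average last year."""
    if len(this_year) != len(last_year):
        raise ValueError("need one prior-year score per student")
    return mean(this_year) - mean(last_year)

# Hypothetical scale scores for one teacher's five students (not real data).
last_year = [610, 625, 640, 598, 655]
this_year = [628, 641, 652, 615, 660]

print(round(gain_score(this_year, last_year), 1))
```

A raw gain like this ignores differences in student background and measurement error, which is part of why the statistical model is needed in practice.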
Identification: Alternative Methods
Teacher characteristics
  NCLB’s definition of “highly qualified”
  More direct measures
  Educational production function literature
Direct measures of instruction
  CLASS (UVA): general pedagogy
  Danielson, Saphier, TFA: ditto
  But what about mathematics-specific practices?
Purpose of talk
To discuss two related efforts at measuring mathematics teachers and mathematics instruction
To highlight the potential uses of these instruments
  Research
  Policy?
Begin With Practice
Clips from two lessons on the same content: subtracting integers
What do you notice about the instruction in each mathematics classroom?
How would you develop a rubric for capturing differences in the instruction?
What kind of knowledge would a teacher need to deliver this instruction?
How would you measure that knowledge?
Bianca
Teaching material for the first time (Connected Mathematics)
Began day by solving 5 - 7 with chips
  Red chips are a negative unit; blue chips are positive
Now moved to 5 - (-7)
  Set up problem, asked students to use chips
  Given student work time
Question
What seems mathematically salient about this instruction?
What mathematical knowledge is needed to support this instruction?
Mercedes
Early in teaching career
Also working on integer subtraction with chips from CMP
Mercedes started this lesson previous day, returns to it again
Find the missing part for this chip problem.
What would be a number sentence for this problem?
Start With    Rule          End With
              Add 5
              Subtract 3
Questions
What seems salient about this instruction?
What mathematical knowledge is needed to support this instruction?
What is the same about the instruction?
Both teachers can correctly solve the problems with chips
Both teachers have well-controlled classrooms
Both teachers ask students to think about the problem and try to solve it for themselves
What is different?
Mathematical knowledge
Instruction
Observing practice…
Led to the genesis of “mathematical knowledge for teaching”
Led to “mathematical quality of instruction”
Mathematical Knowledge for Teaching
Source: Ball, Thames & Phelps, JTE 2008
MKT Items
2001-2008: created an item bank for K-8 mathematics in specific areas (see www.sitemaker.umich.edu/lmt) (Thanks NSF)
  About 300 items
Items mainly capture subject matter knowledge side of the egg
Provide items to field to measure professional growth of teachers
  NOT for hiring, merit pay, etc.
MKT Findings
Cognitive validation, face validity, content validity
Have successfully shown growth as a result of prof’l development
Connections to student achievement - SII
  Questionnaire consisting of 30 items (scale reliability .88)
  Model: Student Terra Nova gains predicted by:
    Student descriptors (family SES, absence rate)
    Teacher characteristics (math methods/content, content knowledge)
  Teacher MKT significant
  Small effect (< 1/10 standard deviation): 2 - 3 weeks of instruction
  But student SES has about the same size effect on achievement
  (Hill, Rowan, and Ball, AERJ, 2005)
What’s the connection to mathematical quality of instruction?
History of Mathematical Quality of Instruction (MQI)
Originally designed to validate our mathematical knowledge for teaching (MKT) assessments
Initial focus: How is teachers’ mathematical knowledge visible in classroom instruction?
Transitioning to: What constitutes quality in mathematics instruction?
  Disciplinary focus
Two-year initial development cycle (2003-05)
Two versions since then
MQI: Sample Domains and Codes
Richness of the mathematics
  e.g., Presence of multiple (linked) representations, explanation, justification, multiple solution methods
Mathematical errors or imprecisions
  e.g., Computational, misstatement of mathematical ideas, lack of clarity
Responding to students
  e.g., Able to understand unusual student-generated solution methods; noting and building upon students’ mathematical contributions
Cognitive level of student work
Mode of instruction
Initial study: Elementary validation
Questions:
  Do higher MKT scores correspond with higher-quality mathematics in instruction?
  NOT about “reform” vs. “traditional” instruction
  Instead, interested in the mathematics that appears
Method
10 K-6 teachers took our MKT survey
Videotaped 9 lessons per teacher
  3 lessons each in May, October, May
Associated post-lesson interviews, clinical interviews, general interviews
Elementary validation study
Coded tapes blind to teacher MKT score
Coded at each code
  Every 5 minutes
  Two coders per tape
Also generated an “overall” code for each lesson – low, medium, high knowledge use in teaching
Also ranked teachers prior to uncovering MKT scores
Projected Versus Actual Rankings of Teachers
Projected ranking of teachers:
Actual ranking of teachers (using MKT scores):
Correlation of .79 (p < .01)
Hill, H.C. et al., (2008) Cognition and Instruction
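One common way to compare a projected ranking with an actual ranking is a Spearman rank correlation. The slides do not say which coefficient produced the .79, so the sketch below is an assumption about method, and the two rankings in it are hypothetical, not the study's data.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two untied rankings of the same items."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have the same length (at least 2)")
    # Sum of squared differences between the paired ranks.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical rankings of ten teachers (1 = highest).
projected = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
actual = [2, 1, 4, 3, 5, 8, 6, 7, 10, 9]

print(round(spearman_rho(projected, actual), 3))
```

The shortcut formula used here only holds when there are no tied ranks; with ties, a Pearson correlation on the ranks is the usual fallback.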
Correlations of Video Code Constructs to Teacher Survey Scores
Construct (Scale)          Correlation to MKT scores
Responds to students       0.65*
Errors total               -0.83*
Richness of mathematics    0.53
*significant at the .05 level
Validation Study II: Middle School
Recruited 4 schools by value-added scores
  High (2), Medium, Low
Recruited every math teacher in the school
  All but two participated for a total of 24
Data collection
  Student scores (“value-added”)
  Teacher MKT/survey
  Interviews
  Six classroom observations
    Four required to generalize MQI; used 6 to be sure
Validation study II: Coding
Revised instrument contained many of same constructs
  Rich mathematics
  Errors
  Responding to students
Lesson-based guess at MKT for each lesson (averaged)
Overall MQI for each lesson (averaged to teacher)
G-study reliability: 0.90
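To see why averaging over more lessons raises reliability, here is a minimal sketch of a one-facet generalizability coefficient (teachers crossed with lessons). It is a simplification under assumptions: it ignores the rater facet, the function name `g_coefficient` is hypothetical, and the variance components are invented for illustration rather than taken from the study.

```python
def g_coefficient(var_teacher, var_residual, n_lessons):
    """Reliability of a teacher's mean score across n_lessons observed lessons."""
    if n_lessons < 1:
        raise ValueError("need at least one lesson")
    # Lesson-to-lesson noise shrinks in proportion to the number of lessons.
    return var_teacher / (var_teacher + var_residual / n_lessons)

# Hypothetical variance components: half the variance lies between teachers.
for n in (1, 2, 4, 6):
    print(n, round(g_coefficient(0.5, 0.5, n), 2))
```

Under these made-up components, reliability climbs quickly over the first few lessons and then levels off, which is the usual shape behind a choice like observing six lessons when four would suffice.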
Validation Study II: Value-added scores
All district middle school teachers (n=222) used
Model with random teacher effects, no school effects
  Thus teachers are normed vis-à-vis performance of the average student in the district
Scores analogous to ranks
Ran additional models; similar results*
Our study teachers’ value-added scores extracted from this larger dataset
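A model of the general kind described (random teacher effects, no school effects) can be written as below. This is a sketch under assumptions: the slides do not list the covariates, so a single prior-score term stands in for them, and all symbols are introduced here for illustration.

```latex
y_{ij} = \beta_0 + \beta_1\, y^{\mathrm{prior}}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \tau^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
```

Here y_{ij} is the current score of student i taught by teacher j, u_j is the random teacher effect whose estimate serves as the value-added score, and there is no school-level term, so each teacher is compared against the district average.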
Results
                     MKT    MQI      Lesson-based MKT   Value-added score*
MKT                  1.0    0.53**   0.72**             0.41*
MQI                         1.0      0.85**             0.45*
Lesson-based MKT                     1.0                0.66**
Value-added score                                       1.0
* Significant at p<.05
** Significant at p<.01
Source: Hill, H.C., Umland, K. & Kapitula, L. (in progress) Validating Value-Added Scores: A Comparison with Characteristics of Instruction. Harvard GSE: Authors.
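The significance markers in the correlation table can be checked with the standard t-test for a Pearson correlation. The sketch below assumes all 24 participating teachers contribute to every correlation, which the slides do not confirm; the critical values are the usual two-tailed table values for 22 degrees of freedom, and the helper names are hypothetical.

```python
import math

# Two-tailed critical t values for df = 22 (n = 24), from standard t tables.
T_CRIT_05 = 2.074
T_CRIT_01 = 2.819

def t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def stars(r, n=24):
    """Return '**', '*', or '' using the df = 22 critical values (assumes n = 24)."""
    t = abs(t_statistic(r, n))
    if t >= T_CRIT_01:
        return "**"
    if t >= T_CRIT_05:
        return "*"
    return ""

# Correlations with the value-added score, as reported in the table.
for label, r in [("MKT", 0.41), ("MQI", 0.45), ("Lesson-based MKT", 0.66)]:
    print(f"{label}: r = {r:.2f}{stars(r)}")
```

With so few teachers, a correlation has to be fairly large before it clears the .05 threshold, so non-significant entries should not be read as evidence of no relationship.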
Additional Value-Added Notes
Value-added and average of:
  Connecting classroom work to math: 0.23
  Student cognitive demand: 0.20
  Errors and mathematical imprecision: -0.70**
  Richness: 0.37*
As you add covariates to the model, most associations decrease
  Probably result of nesting of teachers within schools
Our results show a very large amount of “error” in value-added scores
Lesson-based MKT vs. VAM score
Proposed Uses of Instrument
Research
  Determine which factors associate with student outcomes
  Correlate with other instruments (PRAXIS, Danielson)
  Instrument included as part of the National Center for Teacher Effectiveness, Math Solutions DRK-12 and Gates value-added studies (3)
Practice??
  Pre-tenure reviews, rewards
  Putting best teachers in front of most at-risk kids
  Self or peer observation, professional development
Problems
Instrument still under construction and not finalized
G-study with master coders indicates we could agree more among ourselves
Training only done twice, with excellent/needs work results
Even with strong correlations, significant amount of “error”
Standards required for any non-research use are high
KEY: Not yet a teacher evaluation tool
Next
Constructing grade 4-5 student assessment to go with MKT items
Keep an eye on use and its complications
Questions?