DRAFT – PLEASE DO NOT CITE OR DISTRIBUTE WITHOUT CONSENT 1
A “Jarring” Experience?
Exploring how Changes to Standardized Tests Impact Teacher Experience Effects
Mark Chin
Harvard Graduate School of Education
Author Note
Mark Chin, Harvard Graduate School of Education, Harvard University. Correspondence concerning this article should be addressed to Mark Chin, Center for Education Policy Research, 50 Church Street 4th Floor, Cambridge, MA 02138. E-mail: [email protected]
Abstract
Experience has long been used by states and districts to indicate a teacher’s quality. More time in
the classroom theoretically leads to the development of skills and the improved implementation
of instructional practices key to student learning. Empirical evidence links teacher experience to
student test performance. Test familiarity, and, subsequently, veteran teachers’ more effective
tailoring of instruction to the content and format of test items, may also contribute to this
relationship. If the teacher experience effect is in part explained by teacher test-experience, this
could lead to non-persistent student learning, and to misallocation of resources or misguided
personnel decisions. I used administrative data from Kentucky before and after the state switched
standardized tests to test whether test experience does factor into the teacher experience effect. I
found that the teacher experience effect on mathematics attenuated following the change,
supporting this hypothesis, as both novice and more veteran teachers became test-inexperienced.
Keywords: teacher quality, teacher experience, test preparation, standardized tests
A “Jarring” Experience?
Exploring how Changes to Standardized Tests Impact Teacher Experience Effects
Experience has long been used by states and districts to indicate a teacher’s quality. As
such, teacher compensation contracts typically connect pay to years of experience (Schools and
Staffing Survey, 2012). Recent federal policies, however, have pushed policymakers to employ
other metrics to measure teacher effectiveness in their updated evaluation systems. One such
measure includes teacher impacts on students’ standardized test outcomes, or “value-added”
measures. Theory would suggest that students taught by more veteran teachers should
demonstrate higher test score growth. For example, the additional years of experience in the
classroom likely translate to better student outcomes through the development of key
proficiencies or the improved implementation of important instructional practices, such as better
understanding of student learning pathways or improved ability to minimize unproductive
classroom time (e.g., Leinhardt, 1989; Scribner & Akiba, 2010). This hypothesis has largely
played out in empirical analyses; most extant research suggests positive within-teacher returns to
experience, particularly in the earlier years of a teacher’s time in the classroom (e.g., Harris &
Sass, 2011; Papay & Kraft, 2015; Rice, 2013; Rockoff, 2004).
Another explanation as to why students taught by novice teachers perform worse than
expected on standardized tests, however, may be the lack of familiarity such teachers have with
the format or content of exam items. Under this paradigm, higher test performance demonstrated
by students taught by more veteran teachers would in part be explained by these teachers’
improved implementation of narrower, test-specific instructional practices gained from
familiarity. Such a finding might provide further evidence that test-based accountability systems
do not necessarily yield lasting learning outcomes (e.g., Corcoran, Jennings, & Beveridge, 2011;
Jacob, 2005), and, subsequently, would challenge the utility of teacher experience as an indicator
of quality or for decisions surrounding hiring, retention, and compensation. With many states
already having switched or contemplating a switch to use of standardized tests more aligned to
the Common Core (e.g., the Smarter Balanced assessments or PARCC), understanding how
testing regime changes impact this traditional indicator of teacher quality is essential.
In my analysis, I explored the possibility that the relationship between a student’s test
outcomes and the experience level of his or her teacher (henceforth referred to as “the teacher
experience effect” for simplicity) may in part be test-specific. To do so, I utilize administrative
data from the state of Kentucky, which changed its high-stakes standardized tests between the
2010-11 and 2011-12 academic years. If the teacher experience effect were to mainly capture the
development of test-independent teaching skills or improved implementation of other test-
independent instructional behaviors over time, I would expect the experience effect to be similar
to those observed in prior research, and to be consistent in the year before and after the change in
the standardized testing regime in the state. Alternatively, if the teacher experience effect were to
mainly capture familiarity with the state’s original standardized test, I would expect the
difference in value added between novice and more veteran teachers to be consistent with prior
research before the change, but to attenuate in the year immediately following the change. This
change might thus capture the inexperience of both novice and veteran teachers in implementing
narrow, effective test-specific instruction the year following a change.
Results from standard value-added models controlling for the teacher experience effect
provided evidence that would support the latter hypothesis. I found that this effect on
mathematics achievement in Kentucky significantly attenuated the year following the
standardized test change. This difference was particularly pronounced for the sample of schools
with the highest proportion of students eligible for free- or reduced-price lunch in the year prior
to the test change, as expected, given prior research documenting higher prevalence of test-
focused instruction in schools serving the most disadvantaged students. The difference observed
in the main analysis remained even when restricting the sample of teachers to those teaching in
both school years (i.e., to account for potential attrition of ineffective teachers from the sample).
Furthermore, analyses suggested that new teachers hired in the years before and after the test
change were not significantly different from one another on other characteristics that would
indicate their effectiveness, alleviating some concern regarding the influence of “vintage effects”
(see Murnane & Phillips, 1981) on my results. I found similar patterns for the teacher experience
effect on student English language arts (ELA) achievement, though the effects overall were
(unsurprisingly) smaller and differences were insignificant.
These initial findings suggest that the positive returns to experience for student test
outcomes observed in prior literature may in part result from increased exposure and familiarity
with standardized tests; if such teachers more capably implement test-specific teaching practices
to improve outcomes, the effect may result in impacts on student learning that fail to persist, or
“fade out”, over time (e.g., Kane & Staiger, 2008). Basing personnel decisions on teacher
experience may thus be misguided if persistent student learning is the goal of such policies.
In what follows, I describe the setting for my investigation, the data used in my analyses,
and my methodology for exploring the teacher experience effect on student test achievement. I
then present results from analyses, and conclude by discussing the practical implications of my
findings.
Setting
In March 2009, the governor of Kentucky signed Senate Bill 1, which called for a
comprehensive revision to the state’s academic standards for student learning in several subjects.
The revised standards were intended to be more rigorous than the originals, with the
revisions targeting increased college and career readiness of the state’s students; Kentucky
officially adopted the Common Core State Standards (CCSS) in 2010. Senate Bill 1 also
mandated that new high-stakes standardized tests be designed to reflect and align with the
changes to academic standards, and for administration of these tests to begin starting in the
academic year of 2011-12. Prior to this transition year, and starting in the 1998-99 academic
year, students in grades three to eight were tested using the Kentucky Core Content Tests
(KCCT). In 2011-12, the state began administering the Kentucky Performance Report for
Educational Progress (K-PREP) tests. Though the bill suspended state school accountability
based on student KCCT performance starting in the 2008-09 academic year and through the first
administration of the K-PREP tests, to meet federal regulations stemming from No Child Left
Behind, the KCCT was still administered through the 2010-11 academic year. Notably, the
Kentucky Department of Education did not develop new items designed to assess students on the
standards tied to the KCCT, and the KCCT administered in 2010-11 included the same items as
the one administered in 2008-09 (Bynum & Thacker, 2011).
Data
In my analyses, I used Kentucky statewide student- and teacher-level data collected in the
2010-11 and 2011-12 academic years. Student-level data analyzed included: (a) student
demographic data, including gender, race or ethnicity, eligibility for free- or reduced-price lunch
(FRPL), eligibility for special education (SPED), a designation for limited English proficiency,
and other academic-level indicators (i.e., classification as “gifted”, being retained in a grade, or
receiving supplemental instruction in mathematics or ELA); (b) current and prior scaled score
performance on either the KCCT or K-PREP mathematics and ELA state standardized tests, and;
(c) student links to teachers, classrooms, schools, and districts. For my analyses, I focused on
students in grades four through eight, as students in third grade did not have prior test scores—an
important covariate in my models exploring the teacher experience effect. Teacher-level data
analyzed included a variable for in-state teaching experience and an indicator for whether the
teacher held an advanced degree within a given academic year.
To ensure the stability of results and avoid the misattribution of effects when estimating
the teacher experience effect (described in more detail below), I restricted the sample of students.
Specifically, students had to: (a) be reliably linked to a single teacher for primary mathematics
(or ELA) instruction (i.e., the course of instruction fit the typical course progression, or only one
link existed); (b) have data on all controls included in my analysis models, and; (c) not be linked
to an atypical classroom (i.e., those containing fewer than five students, greater than 40 students,
greater than 50% of students missing prior achievement scores, or greater than 50% of students
being SPED). Following these restrictions, my final sample for mathematics contained 282,833
students taught by 4,517 teachers in 984 Kentucky schools across the two years. For ELA, this
final sample included 304,770 students taught by 5,239 teachers in 991 Kentucky schools across
the two years.
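For concreteness, restriction (c) on atypical classrooms can be sketched with pandas; the column names (class_id, student_id, prior_score, sped) are hypothetical stand-ins for fields in the administrative data:

```python
import pandas as pd

def apply_classroom_restrictions(df: pd.DataFrame) -> pd.DataFrame:
    """Drop students linked to atypical classrooms: fewer than five or more
    than 40 students, >50% missing prior scores, or >50% SPED students.
    Column names here are hypothetical stand-ins for the actual data."""
    stats = df.groupby("class_id").agg(
        n_students=("student_id", "nunique"),
        share_missing_prior=("prior_score", lambda s: s.isna().mean()),
        share_sped=("sped", "mean"),
    )
    keep = stats[
        (stats.n_students >= 5)
        & (stats.n_students <= 40)
        & (stats.share_missing_prior <= 0.5)
        & (stats.share_sped <= 0.5)
    ].index
    return df[df["class_id"].isin(keep)]
```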
Methods
To explore whether the teacher experience effect on student test performance in
Kentucky changed after the state’s move from the KCCTs to the K-PREP tests, I estimated the
following student-level model (Equation 1) using OLS regression, clustering standard errors at
the school level:
y_ijkgst = Y_i,t−1 α + D_it β + P_jkgst δ + C_gst γ + κ_gt + η_s + f(EXP_k^2010-11) μ + f(EXP_k^2011-12) ν + ε_ijkgst
The outcome in Equation 1, y_ijkgst, captures the performance of student i in class j taught by
teacher k in grade g in school s in academic year t on either the mathematics or ELA test
from the KCCT or the K-PREP assessment.¹ This performance (i.e., the student’s scaled score)
was standardized within grade and year to have a mean of zero and standard deviation of one.
The model controlled for a vector of student baseline ability levels (Y_i,t−1), including
a cubic function for prior test achievement; a vector of the student demographic characteristics
described above (D_it); the aggregate of the two covariate vectors for a student’s classroom peers
(P_jkgst); the aggregate of the two covariate vectors for a student’s grade-level cohort (C_gst); and
grade-by-year fixed effects (κ_gt).
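As an illustration only, a pared-down version of Equation 1 can be estimated with statsmodels, with standard errors clustered at the school level. Column names are hypothetical, only a subset of the controls described above is included, and the helper implements the within-grade-and-year standardization of scaled scores:

```python
import pandas as pd
import statsmodels.formula.api as smf

def standardize_within(df: pd.DataFrame, col: str, by=("grade", "year")) -> pd.Series:
    """Z-score a column within grade-by-year cells (mean zero, SD one)."""
    g = df.groupby(list(by))[col]
    return (df[col] - g.transform("mean")) / g.transform("std")

def fit_equation_1(df: pd.DataFrame):
    """Sketch of Equation 1: cubic in prior achievement, one stand-in
    demographic control, grade-by-year and school fixed effects, and
    experience-bucket-by-year interactions, with school-clustered SEs."""
    formula = (
        "score_std ~ prior_std + I(prior_std**2) + I(prior_std**3)"
        " + frpl"                  # stand-in for the demographic vector
        " + C(grade):C(year)"      # grade-by-year fixed effects
        " + C(school_id)"          # school fixed effects
        " + C(exp_bin):C(year)"    # experience effects by year (mu, nu)
    )
    return smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["school_id"]}
    )
```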
My coefficients of interest in Equation 1 are the effects (modeled using different
functional forms described below) of being taught by a teacher k with experience EXP on the
outcome in different years, captured by 𝜇 for 2010-11 (i.e., the teacher experience effect on
KCCT performance) and by 𝜈 for 2011-12 (i.e., the teacher experience effect on K-PREP
performance). If the teacher experience effect on outcomes is largely independent of the specific
test, I would not expect the difference between the coefficients 𝜇 and 𝜈 to be statistically
significant. This result would contradict the theory that the positive returns to teacher experience
on outcomes seen in extant literature might be caused by test familiarity (and, subsequently,
¹ Exploration into the distributions of scaled scores for students on standardized tests in 2010-11 showed a significant ceiling effect and a minor floor effect. To ensure that my results were not influenced by the loss of information regarding students’ actual ability levels at these extremes, I dropped students attaining the highest or lowest possible scale score on all tests from my analyses. Sensitivity checks suggested that inclusion of these students had minor impacts on estimates, with overall patterns remaining the same.
ability to teach to the test well) of more veteran teachers. However, if the effect of experience is
in part explained by the test familiarity of more veteran teachers, as I would hypothesize, I
expect the difference between the two coefficients to be statistically significant. Specifically, I
would expect the effect of additional experience to attenuate in 2011-12, as both novice and
more veteran teachers would be equally (un)familiar with the new K-PREP assessments.
Key to my hypothesis is that the KCCT and the K-PREP assessments are sufficiently
different from one another as to cause a “drop” in test familiarity for more veteran teachers
following the standardized test change. Exploration into the alignment of items on the KCCT to
the CCSS (which were aligned to the K-PREP) suggested that, though items on the old
standardized tests did assess many of the new adopted standards, gaps still existed in terms of the
content and the depth of knowledge assessed (Taylor, Thacker, Koger, Koger, & Dickinson,
2010). Anecdotal evidence also indicated the K-PREP to be more difficult than the KCCT, which
was supported empirically by observations that far fewer students scored the highest possible
scaled score on the new exams.
Another key to my hypothesis is that evidence exists documenting the implementation of
narrow, test-specific instruction by teachers during the time period that the KCCT was
administered. Though anecdotal report does provide such evidence, some quantitative analyses
into the relationship between changes in school-level test performance across years with school-
level averages of FRPL in Kentucky have also supported this notion. Specifically, this
relationship, insignificant before the switch to the K-PREP, is negative and significant (i.e.,
schools with higher proportions of disadvantaged students worsen in their average performance
across years) in 2011-12 (Dickinson, Levinson, & Thacker, 2013). Though several reasons might
explain this observed relationship, extant research has found higher incidence of narrow, test-
specific instructional practices in schools with larger proportions of disadvantaged students (e.g.,
Herman & Golan, 1993). Thus, it stands that such schools would be “hurt” most in terms of
student achievement by the switch from the KCCT to the K-PREP assessments. These findings
informed supplemental analyses (described below) exploring how differences between the
teacher experience effect in 2010-11 and teacher experience effect in 2011-12 varied across
different types of schools.
A notable inclusion in the model represented in Equation 1 is the control for school fixed
effects (η_s). Though the appropriate specification for modeling teacher effects on student
achievement is still being debated (see Goldhaber & Theobald, 2012), I opt to include school
fixed effects, as other researchers investigating returns to teacher experience have done in the
past (e.g., Papay & Kraft, 2015), and because prior research has provided evidence for the
systematic sorting of teachers—in particular, inexperienced ones—to certain types of schools
(see Rice, 2013). Further, other literature has documented heterogeneous effects of experience
across schools (Kraft & Papay, 2014; Loeb, Kalogrides, & Béteille, 2013; Sass, Hannaway, Xu,
Figlio, & Feng, 2012).
A notable exclusion in the model represented in Equation 1 is a control for teacher fixed
effects. Many researchers interested in exploring the effect on student outcomes of being taught
by teachers with varying experience levels have (rightfully) noted that cross-sectional analyses
fail to account for certain effect-biasing factors. Specifically, cross-sectional investigations fail to
account for selection biases and vintage effects (Murnane & Phillips, 1981). The former factor
suggests that more experienced teachers may yield larger gains for students on test outcomes
because the least effective teachers leave the teaching profession altogether. The latter factor
argues that teachers from different hired cohorts vary in their latent effectiveness such that the
effect of experience is confounded. Though several estimation methods exist that attempt to
address these issues (see Papay & Kraft, 2015), researchers most commonly include teacher
fixed effects into their models investigating the teacher experience effect, essentially comparing
more experienced teachers to their less-experienced selves. Inclusion of teacher fixed effects into
Equation 1, however, would result in an unidentifiable model, due to the existing inclusion of both
grade-by-year fixed effects, to account for year-to-year “shocks” to test performance, and my
variables of interest—experience-by-year interactions. Thus, I opted to exclude teacher fixed
effects from my model. In attempts to alleviate some concern around selection bias and vintage
effects influencing my results, I conducted sensitivity analyses. First, as I am interested not
necessarily in the underlying trajectory of returns to teacher experience but am instead interested
in whether the teacher experience effect differs between academic years, I restricted the sample
of students to those only taught by teachers teaching students in both 2010-11 and 2011-12.²
Second, I explored whether the novice teachers in the 2010-11 and 2011-12 cohorts might have
differed significantly in other measures of teacher quality. Specifically, I explored whether
teachers in one year were more likely to hold an advanced degree than those in the other year. I
also used data from the Common Core of Data to estimate Equation 1 controlling for changes in
enrollment from the prior year in 2010-11 and 2011-12 within each district (see Murnane &
Phillips, 1981, for a similar analysis). By doing so, I explored whether or not the quality of
incoming teachers, proxied by changes in enrollment (i.e., in years with larger increases in
enrollment, the demand for teachers will increase such as to reduce the overall quality of newly
hired teachers), might have explained my results.
Results
² I include students taught by novice teachers in 2011-12, as such teachers did not instruct in the prior year.
First, in order to provide a sense of the distribution of experience of teachers across years
in Kentucky, I show histograms of this measure in 2010-11 and 2011-12. As Figure 1 shows, a
large proportion of teachers have fewer than 10 years of in-state teaching experience, and
districts did hire a significant number of novice teachers in both years.
[Insert Figure 1 here.]
The distribution of teacher experience informed my categorization of teachers into different
“buckets” of experience for initial model estimation of Equation 1 for mathematics and ELA
student outcomes. Specifically, I compared the performance of students taught by novice
teachers to those taught by teachers of each additional year of experience up to nine (i.e., I
included dummy indicators for each year of experience from one to nine) and to those taught by
teachers with 10 or more years of experience. I argue for this model simplification because, in
many studies, very experienced teachers (who are also not the main focus of my exploration) do
not demonstrate significantly larger effects on achievement than those with a few years of
experience (see Kraft & Papay, 2014, for a similar simplification). Furthermore, as noted early,
administration of the KCCT began in the 1998-99 academic year, which would suggest that very
experienced teachers might not be that different from relatively experienced teachers in “test-
specific experience”, the key variable in my hypothesis.
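This bucketing can be sketched as a simple recoding of the in-state experience variable; the string labels here are my own shorthand, not the administrative data’s coding:

```python
import pandas as pd

def bucket_experience(years: pd.Series) -> pd.Series:
    """Recode years of experience as: 0 (novice), 1 through 9 individually,
    and a single '10plus' category for 10 or more years."""
    return (
        years.clip(upper=10).astype(int).astype(str).replace({"10": "10plus"})
    )
```

Dummy indicators for each bucket, with novices ("0") as the omitted reference category, can then be generated with pd.get_dummies.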
Mathematics
Figure 2 presents a visual representation of 𝜇 and 𝜈 from estimation of Equation 1 for
student performance on Kentucky’s mathematics standardized tests for each individual year of
experience up to 10-plus years.
[Insert Figure 2 here.]
The two trajectories depicted in Figure 2 reveal a few patterns. First, the effect on KCCT
mathematics performance (i.e., the solid line) of being taught by non-novice teachers is positive;
in fact, the actual “trajectory” of effects as teacher experience increases through 10-plus years is
very similar to the trajectory seen in Papay and Kraft (2015), despite those authors specifically
modeling within-teacher returns to experience. However, this trajectory is much flatter in the
year following Kentucky’s switch to the K-PREP tests.
I used these initial exploratory results to further refine the controls for teacher experience
in my model formally testing differences in the experience effect across years. Specifically, I
employed only indicators (interacted with year) for being taught by a teacher with one year of
experience and for being taught by a teacher with two-plus years of experience. I reduced the
number of experience controls in estimating Equation 1 to limit the number of cross-year
comparisons of effects, and because Figure 2 indicated that, following positive increases in
effects for the first two years of experience, the cross-sectional “returns” to additional experience
were relatively stable. Table 1 shows the results from my regression estimates with these specific
controls for experience.
[Insert Table 1 here.]
Column 1 in Table 1 shows the results from my base analyses. Being taught by a teacher
with one year of experience in 2011, as opposed to a novice teacher in that same year, is
associated with a 0.09 SD (p<0.01) relative increase in KCCT mathematics achievement.
Students taught by teachers with two or more years of experience in that same year
experienced a relative growth of 0.12 SD (p<0.01) more than their peers taught by a novice
teacher. Comparatively, in 2012, the first year of K-PREP administration, the effect of being
taught by a teacher with one year of experience over a novice teacher was smaller and
insignificant (𝛽 = 0.00, p>0.1). The effect of being taught by a teacher with two or more years
of experience over a novice teacher, though still significant, was smaller as well (𝛽 = 0.05,
p<0.05). Importantly, these differences across years in the effect of experience were statistically
significant (see the F-test rows in Column 1)—the pattern of attenuation matched my hypothesis.
I investigated whether differences in the teacher experience effect across years might
vary depending on school-level averages for student FRPL. Prior research has found in some
settings that schools with larger proportions of such students demonstrate higher incidence of
narrow, test-specific instructional practices. Thus, I hypothesized that schools with lower
FRPL rates would witness smaller differences between the teacher experience effects across
years, as teachers in such schools would be less likely to enact the test-specific practices whose
effect on performance is most disrupted by a test change. Similarly, I hypothesized that schools
with higher FRPL rates would witness larger differences. Columns 2 and 3 of Table 1 depict the
results from this analysis. In column 2, I looked at the teacher experience effect across years for
schools in the bottom tercile of FRPL rates (i.e., between two and 65 percent of students in a
school being eligible for FRPL in 2010-11), and indeed saw smaller differences across years. In
column 3, I looked at the teacher experience effect across years for schools in the top tercile of
FRPL rates (i.e., 80 percent or more of students in a school being eligible for FRPL in 2010-11),
and again
found my hypothesis to be supported. Though standard errors on estimates increased due to
the restricted sample in both analyses, affecting comparison tests, the patterns and magnitudes of
the observed effects matched what I expected.
Sensitivity analyses. As noted above, the literature investigating the teacher experience
effect on student outcomes has raised several concerns regarding bias in estimated effects—
specifically, the bias caused by selection (of teachers out of the teaching profession) and vintage
effects. As such, I conducted two sets of sensitivity analyses that try to address the issues raised
by prior work investigating the teacher experience effect on student outcomes.
First, I explored the effects of being taught by a teacher with one or two-plus years of
experience (versus no experience) on mathematics achievement for the sample of students taught
by teachers who remained in the sample for both years or were novices in 2011-12. This
restriction should help alleviate the worry that departure of less-effective teachers from the
sample between the 2010-11 and 2011-12 school years may have biased my results. In column 4
of Table 1, however, we see that this sample restriction does not impact the substantive
conclusion of my analyses.
Second, I followed Murnane and Phillips (1981) and attempted to account for vintage
effects by controlling for changes in student enrollment in my model. As noted earlier, this
control proxies for teacher effectiveness, as the expectation is that years with larger increases in
enrollment will increase the demand for teachers, and subsequently decrease the overall quality
of newly hired teachers. The quality of novice teachers might be contributing to the effects I
observed; for example, if new hires in 2011-12 were particularly strong relative to those in 2010-
11, this could also attenuate the relationship between increased experience and test scores
following the test change. However, inclusion of the control for changes in enrollment again did
not impact the substantive interpretation of my results (see column 5 of Table 1). Furthermore, a
paired t-test examining whether novice teachers in 2011-12 were more likely than novice
teachers in 2010-11 to hold advanced degrees (another commonly used indicator of teacher
quality, one also tied to teacher salary) did not find significant differences
(results not shown). I acknowledge the limitation of using just these observable characteristics to
justify the equivalence of the quality of new hires across school years; however, I argue that
these sensitivity analyses at least provide some reassurance that vintage effects did not cause my
main findings to emerge.
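As one way to sketch the cohort comparison, a two-proportion z-test (an illustrative stand-in for the paired t-test reported above, with hypothetical counts) compares the shares of novice teachers holding advanced degrees in the two years:

```python
from statsmodels.stats.proportion import proportions_ztest

def compare_advanced_degree_rates(n_adv_2011, n_novice_2011,
                                  n_adv_2012, n_novice_2012):
    """Test whether the share of novice teachers with advanced degrees
    differs between the 2010-11 and 2011-12 cohorts."""
    stat, pval = proportions_ztest(
        [n_adv_2011, n_adv_2012], [n_novice_2011, n_novice_2012]
    )
    return stat, pval
```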
ELA
I followed the same analytic strategy used in my analyses of mathematics outcomes when
investigating the effect of teacher experience on students’ ELA test outcomes. Figure 3
depicts the results of my initial exploratory
estimation of Equation 1.
[Insert Figure 3 here.]
Compared to the effect depicted for mathematics in Figure 2, the teacher experience
effect depicted for ELA in Figure 3 is much smaller. The magnitude of these
results, however, matched those seen in prior work. Furthermore, the figure does show a flatter
trajectory of effects for experience in the year following Kentucky’s standardized test regime
change. When modeling the relationships formally (see Table 2), however, I was unable to reject
the null hypothesis that the coefficients for experience on ELA outcomes were the same across
years, despite the patterns across years arguably being similar to those seen in mathematics.
[Insert Table 2 here.]
Discussion
Teacher experience has been used for decades as an indicator for teacher effectiveness in
school districts across the country. Expectations are that increased time in the classroom allows
teachers to develop skills and refine instructional practices that increase their positive impact on
student learning. This hypothesis has been largely borne out in empirical studies linking teacher
experience and student test performance, one indicator for student learning. Perhaps
unsurprisingly, contracts typically tie teacher salaries to their experience and districts often base
personnel decisions on the time a teacher has been in the classroom.
The utility of experience as a proxy for a teacher’s effectiveness in generating persistent
student growth, however, depends in part on whether the skills and abilities gained with
experience are test-independent. Numerous studies have documented the narrowing of teachers’
instruction to focus on exam-specific item formats and content when high stakes are attached to
student performance on standardized tests. Thus, the observed teacher experience effect might in
reality reflect the development of effective test-specific instructional behaviors gained from
additional experience with standardized tests. This hypothesis could help explain the surprising
fade out of teacher impacts over time.
A switch from the KCCTs to the K-PREP assessments in Kentucky allowed me to
investigate the possibility that the effect of teacher experience on student outcomes might in part
be explained by test familiarity. I found that students taught by teachers with more experience
saw more growth in mathematics in the year before the standardized test switch than those taught
by teachers with the same amount of experience in the year after. Furthermore, the difference
was more pronounced in schools that served more disadvantaged populations; test-focused
instruction has been documented as more prevalent in such schools. These findings support
the hypothesis of a teacher test-experience effect contributing to the overall teacher experience
effect, as novices and more veteran teachers are both inexperienced with test item formats and
content following a testing change.
Two points about Kentucky's switch from the KCCT to the K-PREP assessments are
necessary to put my findings in context. First, the items used in the final administration of the
KCCT (in the 2010-11 academic year) were the same as those used in the 2008-09
administration. This repetition particularly benefits teachers with more experience with the
formats and content of state test items, and would produce even larger differentials between
novice and more veteran teachers in terms of test performance impacts; thus, the difference I
observed between teacher experience effects across years may be an upper bound. On the other
hand, the state had also suspended school test-based accountability for that final administration
and had officially adopted the Common Core State Standards (CCSS) two years before the test
switch. Had schools and teachers been under typical accountability pressures, the state could
have witnessed an even greater narrowing of instruction, and the novice-veteran differential
would have been larger. Both of these contextual factors likely influenced my estimates of the
teacher experience effect on mathematics in Kentucky.
What do my findings mean for policy? Teachers' salaries in Kentucky, like those in other
states, are tied in part to experience. If compensation is partly tied to experience because of its
expected relationship to student test outcomes, states and districts will need to consider whether
the potentially test-specific component of the teacher experience effect should be rewarded, and
to what extent. For example, I observed the difference in student test score impacts between
novice teachers and teachers with two years of experience in Kentucky to be approximately 0.10
standard deviations in 2010-11, but only 0.05 standard deviations in 2011-12 (see Figure 2);
policymakers may consider basing compensation policies on the latter differential.
Replication in other settings is essential to corroborate my work and to develop a better
sense of how much of the teacher experience effect is captured by teacher test-experience. With
many states now considering a move, or having recently moved, from their original standardized
tests to tests more aligned to the CCSS, replication should be a much more practical endeavor.
Analyses investigating the relationship of experience to other measures of teacher quality can
also help evaluate my hypothesis; for example, even if the teacher experience effect is in part
test-specific, I would still expect differences between novice and more veteran teachers' impacts
on students' non-test-score outcomes to remain following a test change. If experience yields
positive effects on these outcomes and those effects persist, states and districts may be justified
in continuing to reward teachers based on their experience.
Figure 1. Distribution of teacher experience in 2011 and 2012 in Kentucky.
Figure 2. Plotted regression coefficients of teachers’ experience on student mathematics achievement growth. Effect of teachers with more than 10 years of experience collapsed into the 10-year category.
[Figure omitted: y-axis, Student Achievement (0 to .2); x-axis, Experience (0 to 10 years); separate series for 2011 and 2012.]
Table 1. Regression coefficients for teachers' experience on students' mathematics achievement growth

                                     (1)         (2)         (3)         (4)         (5)
1-year Exp. Effect in 2011        0.0856***   0.0739**    0.119**     0.0840***   0.0881***
                                  (0.0227)    (0.0351)    (0.0491)    (0.0255)    (0.0234)
1-year Exp. Effect in 2012       -0.000827    0.0282      0.0231      0.00621    -0.00163
                                  (0.0251)    (0.0403)    (0.0584)    (0.0298)    (0.0252)
2-plus-years Exp. Effect in 2011  0.120***    0.0863***   0.111***    0.120***    0.120***
                                  (0.0170)    (0.0190)    (0.0332)    (0.0195)    (0.0174)
2-plus-years Exp. Effect in 2012  0.0495**    0.0609**    0.0159      0.0445**    0.0514**
                                  (0.0208)    (0.0287)    (0.0482)    (0.0218)    (0.0207)
Controls
  Student Demographics               x           x           x           x           x
  Prior Achievement                  x           x           x           x           x
  Grade-by-year Fixed Effects        x           x           x           x           x
  Cohort Aggregates                  x           x           x           x           x
  Peer Aggregates                    x           x           x           x           x
  School Fixed Effects               x           x           x           x           x
Sensitivity Checks
  Low-FRPL Schools                               x
  High-FRPL Schools                                          x
  Selection Sample                                                       x
  Control for Change in
    Enrollment from Prior Year                                                       x
Observations                      282833      125847      58362       224451      280223
R-squared                         0.510       0.518       0.429       0.516       0.510
F-tests
  p-value: 2011 vs. 2012
    1-year Exp. Effect            0.0171      0.419       0.224       0.0705      0.0148
  p-value: 2011 vs. 2012
    2-plus-years Exp. Effect      0.00948     0.466       0.0795      0.00893     0.0121

Note: School-level clustered standard errors reported in parentheses. The Selection Sample includes only students taught by teachers who teach in both 2011 and 2012 (or are novices in 2012). ***p<0.01, **p<0.05, *p<0.1
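The experience-by-year effects reported in Table 1 are consistent with an estimating equation of roughly the following form. This is a sketch reconstructed from the controls listed in the table, not necessarily the author's exact specification; the notation (binned experience indicators interacted with year, novices as the omitted category) is an assumption.

```latex
A_{isgt} = \sum_{e \geq 1} \beta_{et}\, \mathbf{1}\{\mathrm{Exp}_{j(i,t)} = e\}
         + \gamma' X_{it} + \delta_{gt} + \mu_{s} + \varepsilon_{isgt}
```

Here A_{isgt} denotes achievement for student i in school s, grade g, and year t; the indicators flag the experience bin e of student i's teacher j, with coefficients beta_{et} allowed to differ between 2011 and 2012; X_{it} collects student demographics, prior achievement, and the cohort and peer aggregates; delta_{gt} are grade-by-year fixed effects; and mu_s are school fixed effects, with standard errors clustered at the school level.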
Figure 3. Plotted regression coefficients of teachers’ experience on student ELA achievement growth. Effect of teachers with more than 10 years of experience collapsed into the 10-year category.
[Figure omitted: y-axis, Student Achievement (0 to .2); x-axis, Experience (0 to 10 years); separate series for 2011 and 2012.]
Table 2. Regression coefficients for teachers' experience on students' ELA achievement growth

1-year Exp. Effect in 2011          0.0473*
                                    (0.0248)
1-year Exp. Effect in 2012          0.0308*
                                    (0.0173)
2-year Exp. Effect in 2011          0.0152
                                    (0.0238)
2-year Exp. Effect in 2012          0.0319*
                                    (0.0179)
3-year Exp. Effect in 2011          0.0208
                                    (0.0230)
3-year Exp. Effect in 2012          0.0104
                                    (0.0180)
4-plus-years Exp. Effect in 2011    0.0525***
                                    (0.0164)
4-plus-years Exp. Effect in 2012    0.0271**
                                    (0.0134)
Controls
  Student Demographics              x
  Prior Achievement                 x
  Grade-by-year Fixed Effects       x
  Cohort Aggregates                 x
  Peer Aggregates                   x
  School Fixed Effects              x
Observations                        304770
R-squared                           0.492
F-tests
  p-value: 2011 1-year Exp. Effect vs. 2012 1-year Exp. Effect              0.578
  p-value: 2011 2-year Exp. Effect vs. 2012 2-year Exp. Effect              0.576
  p-value: 2011 3-year Exp. Effect vs. 2012 3-year Exp. Effect              0.712
  p-value: 2011 4-plus-years Exp. Effect vs. 2012 4-plus-years Exp. Effect  0.222

Note: School-level clustered standard errors reported in parentheses. ***p<0.01, **p<0.05, *p<0.1