
The District-Level Achievement Impacts of Benchmark Assessments: Year 1 Outcomes of CDDRE

Geoffrey D. Borman, Deven Carlson, and Michelle Robinson
University of Wisconsin—Madison

Presentation Prepared for the IES Research Conference
National Harbor, MD
June 30, 2010

Introduction

Student assessments, accountability programs, and the use of associated data systems have recently emerged as central strategies for improving the nation’s public schools

Impetus for such strategies can be traced to NCLB

Relatively little rigorous empirical work evaluating the impacts of adopting more proactive uses of data and data systems on student and school achievement

CDDRE is one of the most prominent research and development efforts in this area

Research Question

This presentation reports the results of an experimental evaluation of one component of the CDDRE intervention

Specifically, we report results from a multi-state, district-level cluster randomized trial of the impact of quarterly benchmark assessments on reading and mathematics achievement

Research Question: Does the implementation and administration of benchmark assessments, coupled with the provision of consulting services to assist in the interpretation of the resulting data, bring about district-wide changes in student achievement relative to a control condition in which districts operated as usual without benchmark assessments and associated services?

Existing Literature

Data-driven reform involves collecting, interpreting, and disseminating data in a manner that is intended to inform and guide district and school improvement efforts

Bernhardt (2003) identifies four categories of data that practitioners can analyze to inform reform efforts:
1. Student learning and assessment
2. Demographics
3. School processes
4. Teacher perceptions

Perie, Marion, & Gong (2009) discuss three possible uses of assessment results:
1. Instructional
2. Evaluative
3. Predictive

Existing Literature

The empirical literature on data-driven reform can be classified into two main groups of studies:
1. Teacher responses to the provision of student data
2. Effects of data-driven decision-making on student outcomes

1. Teacher responses to the provision of student data

Teachers believe interim tests assist them in identifying areas in which their students would benefit from additional instruction

Teachers report that they alter instruction in response to assessment results

These results come from teacher self-reports

Objective assessments of teacher data use reach somewhat less optimistic conclusions

Existing Literature

2. Effects of data-driven decision-making on student outcomes

Three large-scale empirical studies:

1. Quint, Sepanik, & Smith (2008) evaluated a data-driven instruction program, which included benchmark assessments, implemented in 21 public schools in Boston during the 2005-06 and 2006-07 school years
   No statistical or practical differences in achievement

2. Henderson et al. (2007) examined the effect of benchmark assessments in 8 MA school districts
   No statistically significant effects. Point estimates were positive, but the study was somewhat underpowered

3. May and Robinson (2008) conducted an RCT evaluation of Ohio’s Personalized Assessment Reporting System for the Ohio Graduation Test
   No effects for first-time test takers, but positive effects for re-takers

Several in-depth case studies attribute achievement improvements to data use

Existing Literature

On the whole, there is some empirical evidence that implementation of data-driven reforms, including benchmark assessments, can lead to improvement in achievement, but the evidence is far from conclusive

A group of RAND researchers note that more research on the topic is clearly needed:

“...experimental studies are needed to more rigorously measure the effects of enhanced provision of data and supports to use it. Standardized interventions can be developed and tested in randomized trials. For example, studies might examine whether the provision of interim progress test data or value-added measures, combined with ongoing professional development for teachers on how to use the information, leads to better instruction and higher achievement than do classrooms without such data and training” (Marsh, Pane, & Hamilton, 2006, p. 12).

Method: Sample Recruitment

Sample recruitment took place over three school years, 2004-05 to 2006-07, across seven states: AL, AZ, IN, MS, OH, PA, and TN

State departments of education nominated districts with large numbers of low-performing schools to participate

This resulted in 59 districts agreeing to participate

District officials decided which schools to target, generally a subset of the lowest-performing schools in the district

The final sample included 549 schools across 59 districts

The sample has geographic, racial/ethnic, and socioeconomic diversity

Method: Randomization Procedures

District-level randomized block design

For each cohort, districts were blocked by state and then randomly assigned to the proximate treatment condition (treatment group) or the delayed treatment condition (control group) with a selection probability of approximately 50 percent (see the sketch below)

Several advantages to the randomized block design:
1. The intervention will be distributed in a fair and equitable manner across states
2. Enhances the face validity of the study
3. Increases statistical precision and power

No statistically significant differences between the treatment and control groups on any pretreatment covariates
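The blocked assignment procedure is straightforward to illustrate. Below is a minimal Python sketch of within-state blocking with roughly 50 percent assignment to treatment; the district names, seed, and cohort handling are hypothetical illustrations, not the study's actual assignment code.

```python
import random
from collections import defaultdict

def block_randomize(districts, seed=42):
    """Assign districts to treatment or control within state blocks.

    districts: list of (district_name, state) tuples.
    Returns a dict mapping district_name -> "treatment" or "control".
    """
    rng = random.Random(seed)  # fixed seed for a reproducible illustration
    blocks = defaultdict(list)
    for name, state in districts:
        blocks[state].append(name)

    assignment = {}
    for state, members in blocks.items():
        rng.shuffle(members)
        cutoff = len(members) // 2  # ~50% per block; odd blocks get one extra control
        for i, name in enumerate(members):
            assignment[name] = "treatment" if i < cutoff else "control"
    return assignment

# Hypothetical example: four districts in two state blocks
example = [("District A", "OH"), ("District B", "OH"),
           ("District C", "PA"), ("District D", "PA")]
print(block_randomize(example))
```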

Results of Randomization


Method: Implementation of Treatment and Control Conditions

The CDDRE intervention includes several components implemented across a three-year period

We test the impacts of the Year 1 components:

1. Quarterly benchmark assessments tied to state standards and assessments in reading, writing, and mathematics, used to predict performance on state assessments, determine needs for specific interventions, and evaluate students’ progress toward state goals

2. Reviews of state test data, benchmark assessments, questionnaires from educators at all levels, and other indicators to identify areas of need for schools not meeting AYP goals or at risk of AYP failure

3. Training for state, district, and building leaders in interpreting and using data, managing resources to focus on areas of need, and leading a data-driven reform process

“Business as usual” counterfactual during the one-year treatment delay

Method: Outcome Measure

The outcome measure is school-level performance on state-administered achievement tests

Advantage: Every student is tested at state expense, and grade-level or school-level data are generally publicly available

All data in this analysis were collected from state websites

Advantage: The 4sight benchmark assessments were developed from state assessment blueprints, so state assessments are the ideal outcome measure

Disadvantage: Comparability across grades, subjects, and especially states can be difficult

Solution: Transform all assessment results into z-scores (see the sketch below)

Separate analyses for reading and math
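Because each state reports results on its own scale, the z-score transformation is what makes pooling across grades, subjects, and states possible. A minimal pandas sketch of the idea, with hypothetical column names and raw scores:

```python
import pandas as pd

# Hypothetical school-level results; the study collected these from state websites
df = pd.DataFrame({
    "state":   ["OH", "OH", "OH", "PA", "PA", "PA"],
    "subject": ["math"] * 6,
    "grade":   [5] * 6,
    "score":   [412.0, 398.0, 405.0, 1520.0, 1475.0, 1501.0],  # incomparable raw scales
})

# Standardize within each state-grade-subject cell so schools are comparable
grp = df.groupby(["state", "grade", "subject"])["score"]
df["z_score"] = (df["score"] - grp.transform("mean")) / grp.transform("std")
print(df)
```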

Method: Covariates

Baseline school-level achievement
  Increases the precision of impact estimates
  Also in the form of z-scores

School-level demographics
  Percent minority
  Percent eligible for free- or reduced-price lunch

District-level demographics
  Percent minority
  Percent eligible for free- or reduced-price lunch
  Percent receiving special education services

Method: Missing Data Procedures

Reading analysis

School-level achievement outcomes were unavailable for 25 of the 549 schools in our sample (approximately 5%)
  Eight schools in treatment districts, 17 schools in control districts
  These schools were excluded from the analysis

Five schools with outcome data were missing baseline achievement data
  Following the guidance of Puma et al. (2009), we assign the district’s average achievement to these schools and include a dummy code in the analytic model

Three schools with outcome data were missing free- or reduced-price lunch eligibility data
  We assign the district’s average free/reduced lunch eligibility to these schools and include a dummy code in the analytic model (see the sketch below)

Analytic sample: 524 schools located in 59 districts
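The Puma et al. (2009) procedure amounts to within-district mean imputation plus a missingness indicator that enters the model as a covariate. A minimal sketch of that step, with hypothetical column names and values:

```python
import numpy as np
import pandas as pd

# Hypothetical analytic file: one row per school, baseline achievement in z-scores
df = pd.DataFrame({
    "district": ["D1", "D1", "D1", "D2", "D2"],
    "base_ach": [0.3, np.nan, 0.1, -0.2, np.nan],
})

# Dummy code flags which schools received an imputed value
df["base_ach_miss"] = df["base_ach"].isna().astype(int)

# Replace missing values with the district mean of the observed schools
district_mean = df.groupby("district")["base_ach"].transform("mean")
df["base_ach"] = df["base_ach"].fillna(district_mean)
print(df)
```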

Method: Missing Data Procedures

Math analysis

Three districts did not administer benchmark assessments in math (both OH districts and SE Delco in PA), reducing the number of schools in the math analysis to 538

School-level achievement outcomes were unavailable for 29 of the 538 schools (approximately 5%)
  Seven schools in treatment districts, 22 schools in control districts
  These schools were excluded from the analysis

Three schools with outcome data were missing baseline achievement or free/reduced lunch data
  As in the reading analysis, we assign the average district value and include a dummy in the analytic model

Analytic sample: 509 schools located in 56 districts

Method: Analytic Model

Multilevel modeling approach
  Appropriate for the analysis of data from cluster randomized trials such as this one

We estimate four models for each subject:

1. An unconditional model to determine the amount of variation in school-level achievement attributable to school-level factors versus district-level factors

2. A model with only the treatment indicator and randomization blocks

3. A model with the treatment indicator, randomization blocks, and baseline achievement

4. A fully specified model: treatment indicator, randomization blocks, baseline achievement, and baseline demographics

Method: Analytic Model

The fully specified level-one (school-level) model is written as:

Y_ij = β_0j + β_1(Base Ach.)_ij + β_2(FRPL)_ij + β_3(Pct. Min.)_ij + β_4(Base Ach. Miss)_ij + β_5(FRPL Miss)_ij + ε_ij

Y represents average school-level achievement and i and j index schools and districts, respectively

Method: Analytic Model

The fully specified level-two (district-level) model is written as:

β_0j = γ_00 + γ_01(Treat)_j + γ_02(Pct. Min.)_j + γ_03(FRPL)_j + γ_04(Pct. Sp. Ed.)_j + δ(Rand. Block)_j + τ_j

The treatment indicator enters at the district level
Randomization blocks are modeled as fixed effects
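Substituting the level-two equation into level one gives a standard random-intercept mixed model, which off-the-shelf software can estimate. Below is a minimal sketch using statsmodels' MixedLM on synthetic data; the variable names are hypothetical, and only a subset of the covariates is included for brevity.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data standing in for the school-level analytic file
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "district":       rng.integers(0, 20, n).astype(str),
    "base_ach":       rng.normal(size=n),
    "frpl_school":    rng.uniform(0, 100, n),
    "pct_min_school": rng.uniform(0, 100, n),
})
df["treat"] = (df["district"].astype(int) < 10).astype(int)  # assigned at district level
df["rand_block"] = df["district"].astype(int) % 4            # stand-in for state blocks
df["achievement"] = (0.9 * df["base_ach"] + 0.05 * df["treat"]
                     + rng.normal(scale=0.3, size=n))

# The district random intercept plays the role of tau_j;
# randomization blocks enter as fixed effects via C(rand_block)
model = smf.mixedlm(
    "achievement ~ treat + base_ach + frpl_school + pct_min_school + C(rand_block)",
    data=df,
    groups=df["district"],
)
print(model.fit().summary())
```

Models (2) and (3) in the sequence above correspond to the same call with the relevant covariates dropped from the formula.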

Results: Mathematics

Fixed Effect           (1)             (2)             (3)               (4)
Treatment                              -0.003 (0.074)  0.058** (0.027)   0.057** (0.029)
Baseline Ach.                                          0.885*** (0.024)  0.802*** (0.028)
Pct. Min. - School                                                       -0.002*** (0.001)
Pct. FRPL - School                                                       -0.0009* (0.0004)
Pct. Min. - Dist.                                                        -0.0001 (0.001)
Pct. FRPL - Dist.                                                        0.001 (0.001)
Pct. Sp. Ed. - Dist.                                                     0.001 (0.005)

Random Effect          (1)             (2)             (3)               (4)
District (Intercept)   0.072 (0.017)   0.058 (0.015)   0.006 (0.002)     0.006 (0.002)
Residual               0.080 (0.005)   0.080 (0.005)   0.023 (0.002)     0.022 (0.001)

Standard errors in parentheses.

Results: Reading

Fixed Effect           (1)             (2)             (3)               (4)
Treatment                              -0.044 (0.062)  0.027 (0.020)     0.033 (0.020)
Baseline Ach.                                          0.893*** (0.025)  0.802*** (0.032)
Pct. Min. - School                                                       -0.0013** (0.0006)
Pct. FRPL - School                                                       -0.0009** (0.0005)
Pct. Min. - Dist.                                                        0.0001 (0.001)
Pct. FRPL - Dist.                                                        0.0001 (0.001)
Pct. Sp. Ed. - Dist.                                                     0.004 (0.004)

Random Effect          (1)             (2)             (3)               (4)
District (Intercept)   0.056 (0.013)   0.042 (0.011)   0.006 (0.002)     0.006 (0.002)
Residual               0.073 (0.005)   0.073 (0.005)   0.023 (0.002)     0.022 (0.001)

Standard errors in parentheses.

Results: Summary and Magnitude of Effects

The fully specified model indicated a treatment impact on mathematics achievement of d = 0.21
  Comparable to reducing school-level free/reduced lunch eligibility by 60 percentage points

In reading, the impact approached, but did not reach, conventional levels of significance (p-value = 0.10)
  Equivalent to an effect size of d = 0.14
  Comparable to reducing school-level free/reduced lunch eligibility by 35 percentage points (see the arithmetic sketch below)
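A quick back-of-the-envelope check of those equivalences (approximate, since the table coefficients are rounded): divide the treatment coefficient from the fully specified model by the magnitude of the school-level FRPL coefficient.

```python
# Treatment effect expressed in school-level FRPL percentage points (model 4)
print(0.057 / 0.0009)  # math: ~63, close to the reported 60 points
print(0.033 / 0.0009)  # reading: ~37, close to the reported 35 points
```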

Conclusions

One of the first large-scale, district-level experimental evaluations of benchmark assessments

Results suggest that the implementation and administration of benchmark assessments, coupled with appropriate professional development, can raise student achievement, particularly in math

The geographic and socioeconomic diversity of our sample gives us confidence that the results of this CRT are generalizable