NCLB and Growth Models: In Conflict or in Concert?

Susan L. Rigney, United States Department of Education
Joseph A. Martineau, Michigan Department of Education


Presented at the MARCES conference on Longitudinal Modeling of Student Achievement, College Park, MD

November 7, 2005

Introduction

“In response to your concerns about giving schools credit for improving student achievement, we are also considering the idea of a growth model…”

Margaret Spellings

9/13/05

Author Perspectives

Sue Rigney: Education Specialist in the office of Student Assessment and School Accountability (Title I) at the U.S. Department of Education.

Primary responsibility = monitoring state compliance with the standards, assessment and accountability requirements of NCLB

Secondary responsibility = contributing to ongoing discussion, clarification and implementation of policies related to assessment and accountability.

Author Perspectives

Joseph Martineau: Psychometrician for the Michigan Office of Educational Assessment and Accountability.

Primary concerns = congruence of accountability systems with the values of educational research, and adequacy of statistical and psychometric methodology

Secondary concerns = philosophy and policy of accountability in terms of both practicality and feasibility

Authorship should not be construed as an endorsement of NCLB as a whole.

In conflict?

CRS (Congressional Research Service) says: “Substantial interest…in the possible use of individual/cohort growth models… Such AYP models are not consistent with certain statutory provisions of NCLB as currently interpreted by USED”

But NCLB (Sec. 4) says: “The Secretary shall take such steps as are necessary to provide for the orderly transition to, and implementation of, programs authorized by this Act”

In concert?

USED Growth Model Study Group

IES grant for longitudinal data systems

State Accountability Workbook

Amendments

Types of Models

Definitions developed by a State collaborative through CCSSO (Goldschmidt et al., 2005)

Definitions

Cross-sectional models: Status Models, Improvement Models

Longitudinal models: Growth Models, Residual Growth (RG) Models

RG models are commonly labeled “Value Added” Models: why we use the term RG
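A minimal Python sketch may help pin down the four definitions. It assumes made-up student IDs, scale scores, and a hypothetical cut score of 420; the function names are illustrative, and the residual-growth function uses a single-predictor least-squares regression as a stand-in for the far richer RG/value-added models used in practice.

```python
from statistics import mean

CUT = 420  # hypothetical proficiency cut score

# Hypothetical scale scores keyed by made-up student IDs.
grade4_2004 = {"a": 410, "b": 395, "c": 430}
grade4_2005 = {"d": 420, "e": 400, "f": 445}  # a different cohort of 4th graders
grade5_2005 = {"a": 440, "b": 415, "c": 455}  # the 2004 4th graders, one year later


def status(scores, cut=CUT):
    """Status (cross-sectional): percent proficient in a single year."""
    return 100 * sum(s >= cut for s in scores.values()) / len(scores)


def improvement(earlier_cohort, later_cohort, cut=CUT):
    """Improvement (cross-sectional): change in status between two
    different cohorts tested in the same grade."""
    return status(later_cohort, cut) - status(earlier_cohort, cut)


def mean_growth(year1, year2):
    """Growth (longitudinal): average change for the same students,
    matched across years by ID."""
    ids = year1.keys() & year2.keys()
    return mean(year2[i] - year1[i] for i in ids)


def residual_growth(year1, year2):
    """Residual growth (longitudinal): each matched student's current score
    minus the score predicted from the prior score by ordinary least squares."""
    ids = sorted(year1.keys() & year2.keys())
    x = [year1[i] for i in ids]
    y = [year2[i] for i in ids]
    mx, my = mean(x), mean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    intercept = my - slope * mx
    return {i: yi - (intercept + slope * xi) for i, xi, yi in zip(ids, x, y)}


print(status(grade4_2005))                    # about 66.7
print(improvement(grade4_2004, grade4_2005))  # about 33.3
print(mean_growth(grade4_2004, grade5_2005))  # 25
print(residual_growth(grade4_2004, grade5_2005))
```

Least-squares residuals average to zero by construction, so under this kind of model some students always land "below expectation," however much everyone learned.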

The Intersection of Policy and Growth Models

3-8 Assessments Provide Longitudinal Data

Safe Harbor

Use of Improvement Index in AYP

CCSSO SCASS Activities

USED Assistant Secretary Luce

Systemic Coherence: A Standard for Evaluating Models

Three broad principles of systemic coherence:

Models are consistent with policy goals

Models are integrated as a part of a consistent system of content standards, assessments, performance standards, and accountability criteria

Models are implemented in a manner consistent with the values of educational research

1. Standards-based

Assessments must cover depth and breadth

Results expressed in terms of performance levels

% Proficient is the most influential component of AYP

2. All Students

Participate (95% rule)

Results reported for all

AYP = not all visible: Full Academic Year, minimum n, LEP exemption for ELA test

Held to same standards; alternate assessments based on alternate achievement standards
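The participation and visibility rules above can be sketched as two checks on a subgroup. This is a simplified illustration: the `Subgroup` fields, the helper names, and the minimum n of 30 are hypothetical (each state sets its own minimum n), and the sketch ignores other provisions such as the LEP exemption for the ELA test.

```python
from dataclasses import dataclass

MIN_N = 30  # hypothetical minimum group size; each state sets its own


@dataclass
class Subgroup:
    name: str
    enrolled: int    # students enrolled during the testing window
    tested: int      # students who actually took the assessment
    fay_tested: int  # tested students enrolled for a full academic year


def meets_participation(g: Subgroup) -> bool:
    """95% rule: at least 95% of enrolled students must be tested.
    Integer arithmetic avoids floating-point edge cases at exactly 95%."""
    return 100 * g.tested >= 95 * g.enrolled


def visible_for_proficiency(g: Subgroup, min_n: int = MIN_N) -> bool:
    """Proficiency results count toward AYP only when enough
    full-academic-year students were tested to meet the minimum n."""
    return g.fay_tested >= min_n


groups = [Subgroup("All students", 400, 392, 370), Subgroup("LEP", 25, 24, 20)]
for g in groups:
    print(g.name, meets_participation(g), visible_for_proficiency(g))
```

In this made-up example the LEP group meets the 95% rule but is too small to be visible in the proficiency calculation, which is exactly the "not all visible" situation the slide describes.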

3. School Improvement

Annual Measurable Objectives

Increased in 2004-05

Adjustment for transition in 2005-06

School accountable for subgroups

More visible in 2005-06

Consequences

Can/should growth moderate consequences?
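Safe Harbor, listed earlier, is the existing statutory route by which improvement already moderates an AYP determination. Under NCLB, a group that misses the AMO can still make AYP if its percentage of students below proficient drops by at least 10 percent of the prior year's percentage and the group also makes progress on another academic indicator. The sketch below checks only the first two conditions; the function names and percentages are hypothetical.

```python
def meets_amo(pct_proficient: float, amo: float) -> bool:
    """Status test: the group's percent proficient meets the annual measurable objective."""
    return pct_proficient >= amo


def meets_safe_harbor(pct_prof_prior: float, pct_prof_current: float) -> bool:
    """Safe harbor (simplified): percent NOT proficient must fall by at least
    10% of last year's percent not proficient, i.e. current <= 0.9 * prior.
    Multiplying through by 10 keeps the comparison exact for whole-number inputs.
    The additional-indicator condition is omitted."""
    below_prior = 100 - pct_prof_prior
    below_current = 100 - pct_prof_current
    return 10 * below_current <= 9 * below_prior


def group_makes_ayp_on_achievement(pct_prof_prior, pct_prof_current, amo):
    """Hypothetical helper: AMO met outright, or reached through safe harbor."""
    return meets_amo(pct_prof_current, amo) or meets_safe_harbor(
        pct_prof_prior, pct_prof_current
    )


# Hypothetical group: 40% proficient last year, 46% this year, AMO of 50%.
print(meets_amo(46, 50))          # False
print(meets_safe_harbor(40, 46))  # True: 54% not proficient <= 0.9 * 60%
```

Safe harbor compares this year's group with last year's group, so in the terms of the CCSSO definitions it is an improvement mechanism rather than a growth model.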

Consistency of Content Standards, Assessments, Performance Standards, and Accountability Criteria

Accountability based on academic indicators

Peer Review of State Assessment Systems: alignment; performance descriptors; alternate assessments

Coherent Assessment System

State assessments: rational, coherent design; relative contribution of different tests; matrix forms equivalent; comparability (English vs. Spanish; computer vs. paper & pencil)

Local assessments: aligned; equivalent; comparable results for subgroups; aggregable

Results understandable

Educators know what to do

Articulation across grades

Articulation across performance levels

A “progression matrix” that shows:

Proficient is different from basic because…

Proficient in third grade is different from proficient in fourth grade because…

Administrators know how to allocate resources

Consistency with Values of Educational Research

As defined by Gregory N. Derry¹:

Free flow of information & curiosity

Replicability, thorough peer review, improvement

Honesty and open-mindedness: willingness to consider multiple alternatives; scrupulous investigations of weaknesses; flexibility to adopt feasible improvements

1 Professor of Physics at Loyola University and author of What Science Is and How It Works (Princeton University Press, 1999)

Attributes of Systemic Coherence Applicable in this Context

1. Alignment of standards and assessments
2. The same performance standards for all
3. Inclusion of all student groups
4. Explicit tracking of achievement gaps
5. Appropriate statistical and psychometric models
6. A program of ongoing research
7. Consistency of reports with all other attributes

1. Alignment of Standards and Assessments

Foundation of validity of school accountability decisions

USED expects independent verification of:

Full range of content standards?

Address content and process skills?

Same degree and pattern of emphasis?

Scores reflect full range of achievement?

Procedures to maintain/improve?

Alignment methods

Alignment methodology: Webb (SCASS TILSA); Porter (SCASS SEC); Achieve; Buros

Methods do not address articulation across grades

JM: Current instantiations of “independent review” may underestimate alignment
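One way to quantify "same degree and pattern of emphasis" is the alignment index associated with Porter's Surveys of Enacted Curriculum work: express the standards and the assessment as proportions over the same topic-by-cognitive-demand cells and compute 1 - (sum of |x - y|)/2. The sketch assumes that form of the index; the topics, demand categories, counts, and helper names are hypothetical, and operational SEC analyses use much finer content grids.

```python
def proportions(counts):
    """Convert raw emphasis counts (e.g., items or objectives per cell) to proportions."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {cell: c / total for cell, c in counts.items()}


def porter_alignment_index(standards_counts, assessment_counts):
    """1 - (sum over cells of |x - y|) / 2, where x and y are the proportions of
    emphasis the standards and the assessment place on each cell.
    Ranges from 0 (no overlap) to 1 (identical emphasis)."""
    cells = standards_counts.keys() | assessment_counts.keys()
    x = proportions(standards_counts)
    y = proportions(assessment_counts)
    return 1 - sum(abs(x.get(c, 0.0) - y.get(c, 0.0)) for c in cells) / 2


# Hypothetical (topic, cognitive demand) cells.
standards = {("number", "recall"): 25, ("number", "apply"): 25,
             ("geometry", "recall"): 25, ("geometry", "apply"): 25}
test_form = {("number", "recall"): 40, ("number", "apply"): 30,
             ("geometry", "recall"): 20, ("geometry", "apply"): 10}
print(round(porter_alignment_index(standards, test_form), 2))  # 0.8
```

Like the methods listed above, a single-grade index of this kind says nothing about articulation across grades.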

2. The Same Standards for All Students

Grade-level achievement standards, except for students with the most significant cognitive disabilities (1%)

All students proficient by 2013-14

What about growth toward proficient?

What about length of time in system?

Proposals to balance fairness toward both educators and student groups should also be a part of any plan to implement growth models for accountability purposes. Fairness toward one should not be sacrificed for fairness toward the other.


JM: The NCLB expectation that all students will be proficient by a given date seems unreasonable. The recognition that there will always be individual differences among students (and aggregate differences across schools in their intake populations) should also be incorporated in setting policy targets.

SR: Safe harbor recognizes that adequate yearly progress may be met with less than 100% meeting annual and long-range goals.

JM: The safe harbor provision of NCLB is a good beginning, but does not fully account for these realities.


JM: The punitive nature of NCLB consequences can actually undermine policy objectives by adding turbulence to schools serving low-achieving students.

SR: The pressures of accountability have resulted in remarkable successes (Ed Trust), and there are multiple safeguards to prevent Type I error.

JM: The multiple safeguards are an important start, but policies that bring more assistance to low-achieving schools and attract highly effective educators to them are more likely to support the policy objectives.

SR: NCLB funds are available for recruitment and retention bonuses, and data indicate that states are beginning to use these funds in this way.

Implications for growth models

Expectation of same growth for all maintains achievement gap

Expectation of 12 months growth in 1 year maintains achievement gap

Expectation of normative growth maintains achievement gap
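The arithmetic behind these three bullets is simple: if the lower-scoring group is expected to gain no more per year than the higher-scoring group, the starting gap carries forward. The sketch treats each of the three expectations as an equal expected gain for both groups, an assumption made for illustration; the grade-equivalent units and the `project_gap` helper are hypothetical.

```python
def project_gap(start_a, start_b, growth_a, growth_b, years):
    """Gap (group A mean minus group B mean) at the end of each year,
    when each group gains a constant amount per year."""
    gaps = []
    a, b = start_a, start_b
    for _ in range(years):
        a += growth_a
        b += growth_b
        gaps.append(a - b)
    return gaps


# Hypothetical grade-equivalent units: group A starts on grade (4.0),
# group B starts two years behind (2.0).
print(project_gap(4.0, 2.0, 1.0, 1.0, 4))  # equal growth: gap stays at 2.0
print(project_gap(4.0, 2.0, 1.0, 1.5, 4))  # faster growth for B: gap closes in 4 years
```

Only an expectation of more than a year's growth per year for the lower-scoring group closes the gap, which is why growth targets and gap tracking have to be designed together.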

3. Inclusion of All Student Groups

Missing data means missing students

How many missing students does it take to compromise validity?

Robustness to missing data does not imply that it is OK to leave out data where it can reasonably be obtained
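A small illustration of why missing students matter: if the students who go missing are disproportionately low achievers, any summary computed on the remaining students is biased, and the bias grows with each student left out. The gains below are invented, the helper name is hypothetical, and the sketch assumes missingness is tied to prior achievement (data not missing at random).

```python
from statistics import mean

# Hypothetical gains for eight students, listed from lowest to highest prior score.
# In this made-up data, lower-achieving students also gain less.
gains_by_prior = [5, 8, 10, 12, 15, 18, 20, 22]


def bias_from_missing(gains, k_missing):
    """Change in the mean gain when the k lowest-achieving students are missing."""
    return mean(gains[k_missing:]) - mean(gains)


for k in range(4):
    print(k, round(bias_from_missing(gains_by_prior, k), 2))
```

Losing a single student here inflates the mean gain by 1.25 points; losing three inflates it by about 4 points.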

4. Explicitly Tracking Achievement Gaps

Closing the achievement gap is:

A policy objective

A matter of ethics

Attainable

Tracking the achievement gap makes inequities publicly visible

4. Explicitly Tracking Achievement Gaps, continued…

Separate models from those used to track attainment of growth targets

Include variables defining policy-defined subgroups in the model

Interaction of grade with subgroup variables

Simple graphical representation of the results
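A minimal sketch of the grade-by-subgroup interaction, on hypothetical mean scale scores. In a regression of score on grade, a subgroup indicator, and their product, the least-squares fit is equivalent to fitting a separate line for each group: the subgroup main effect is the difference in intercepts and the interaction is the difference in slopes. The sketch exploits that equivalence instead of calling a regression library; the data and names are illustrative only.

```python
from statistics import mean

# Hypothetical mean scale scores by grade for a reference group and a
# policy-defined subgroup.
reference = {3: 400, 4: 420, 5: 440}
subgroup = {3: 370, 4: 395, 5: 420}


def fit_line(points):
    """Ordinary least squares of score on grade; returns (intercept, slope)."""
    grades = list(points)
    scores = [points[g] for g in grades]
    mg, ms = mean(grades), mean(scores)
    slope = sum((g - mg) * (s - ms) for g, s in zip(grades, scores)) / sum(
        (g - mg) ** 2 for g in grades
    )
    return ms - slope * mg, slope


ref_intercept, ref_slope = fit_line(reference)
sub_intercept, sub_slope = fit_line(subgroup)
subgroup_effect = sub_intercept - ref_intercept  # subgroup main effect
interaction = sub_slope - ref_slope              # grade x subgroup interaction

for grade in sorted(reference):
    # Gap predicted by the model: reference minus subgroup.
    print(grade, -(subgroup_effect + interaction * grade))
```

A positive interaction here means the modeled gap is 5 points smaller at each successive grade; plotting the two fitted lines is one simple way to show that to non-technical audiences.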

5. Appropriate Statistical and Psychometric Models

Statistical concerns:

Match of model to data structure

Violations of assumptions

Do random effects models “cheat”?

How do we integrate results from alternate assessments?

What is the sample, and what is the population?

Different models needed for different purposes: meeting growth targets; tracking achievement gaps; primary research
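The question of whether random effects models "cheat" usually concerns shrinkage. Under a simple random-intercept model, the empirical Bayes estimate for a school pulls its observed mean toward the grand mean using the weight lambda = tau^2 / (tau^2 + sigma^2 / n), where tau^2 is the between-school variance, sigma^2 the within-school variance, and n the number of students. The sketch treats the variance components as known (in practice they are estimated); all numbers and function names are hypothetical.

```python
def reliability(tau2, sigma2, n):
    """Weight given to a school's own mean: tau^2 / (tau^2 + sigma^2 / n)."""
    return tau2 / (tau2 + sigma2 / n)


def shrunken_mean(school_mean, grand_mean, tau2, sigma2, n):
    """Empirical Bayes estimate of a school's mean under a random-intercept
    model with known variance components."""
    lam = reliability(tau2, sigma2, n)
    return grand_mean + lam * (school_mean - grand_mean)


GRAND, TAU2, SIGMA2 = 400.0, 100.0, 900.0  # hypothetical values
for n in (9, 900):
    # Same observed mean of 430, very different school sizes.
    print(n, round(shrunken_mean(430.0, GRAND, TAU2, SIGMA2, n), 2))
```

A 9-student school with a mean of 430 is estimated at 415, while a 900-student school with the same observed mean stays near 430. Whether that pull toward the mean is "cheating" or an appropriate correction for unreliability is exactly the kind of question stakeholders need to see made explicit.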

5. Appropriate Statistical and Psychometric Models

Statistical concerns: Are the models correlational or causal? The mandated data collection is correlational.

JM: The mandated policy uses are more causal. The descriptive statistics are used to label schools as in need of improvement, and if students are not achieving reasonable goals, it is hard to argue with this label. However, many people are unlikely to grasp or appreciate the distinction between a school in need of improvement and ineffective educators, and the nature of NCLB consequences invites this unfounded interpretation.

SR: The statute provides substantial resources for professional development and instructional materials in order to help educators meet the extraordinary needs of the children they serve.

5. Appropriate Statistical and Psychometric Models, continued…

Unwarranted assumptions:

No equating error

Vertical: Doran (2005)

Horizontal: not studied, but most assessments have only a few anchor items in common across years

Interval-level scale

If using scale scores, most models assume equal-interval measurement

Psychometrically suspect; effects not well studied
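Two of these assumptions can be probed with a few lines of Python on hypothetical scores. First, equating (linking) error is shared by every student on a form, so unlike student-level measurement error it does not average out as more students are included. Second, the equal-interval assumption matters because a strictly increasing rescaling preserves every student's rank yet can reverse which student appears to grow more. The scores, the `squared` rescaling, and the helper names are made up for illustration.

```python
from statistics import mean

# Hypothetical true scale scores for the same students in two years.
year1 = [300, 320, 340, 360]
year2 = [315, 330, 360, 375]


def mean_gain(before, after, linking_error=0.0):
    """Mean gain when every year-2 score carries the same linking error.
    The error is shared by all students, so it shifts the mean gain by
    exactly that amount, however many students are averaged."""
    return mean(post + linking_error - pre for pre, post in zip(before, after))


print(mean_gain(year1, year2))                     # true mean gain
print(mean_gain(year1, year2, linking_error=5.0))  # shifted by the full 5 points


def gain(before, after, scale=lambda s: s):
    """Gain measured on a given metric (identity = the reported scale)."""
    return scale(after) - scale(before)


def squared(s):
    """Hypothetical alternative metric, strictly increasing for positive scores."""
    return s * s / 100


low_student = (200, 220)   # gains 20 points on the reported scale
high_student = (400, 415)  # gains 15 points on the reported scale
print(gain(*low_student) > gain(*high_student))                    # True
print(gain(*low_student, squared) > gain(*high_student, squared))  # False
```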


Unwarranted assumptions, continued…

A single continuous scale on the same construct across grades (vertical or developmental scales)

Mathematical demonstrations (Martineau, 2004, in press):

We purposely build content shift into our assessments across grades

High correlations among sub-constructs do not take care of the problem

Students whose growth is occurring outside the curriculum-defined range for the grade are not measured well

Effects of prior schools/grades become attributed to later schools/grades

Practically significant effects of the misattributions occur in all reasonably conceivable assessment scenarios

Empirical validation (Lockwood et al., under peer review):

Subscales of a math assessment: greater variability within teacher across subscales than across teachers within subscale

Low correlations in “value added” across subscales

The sub-content matters tremendously
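A toy two-subskill illustration of how content shift across grades can misattribute learning. Assume, hypothetically, that the grade 4 test weights computation 0.8 and fractions 0.2, while the grade 5 test reverses those weights. A student who learned fractions during grade 4 (outside the grade 4 emphasis) and learned nothing new in grade 5 still shows a score gain in grade 5. This is an illustration under those stated assumptions, not a reproduction of the Martineau derivations; the weights, skill levels, and names are invented.

```python
# Hypothetical composite weights: share of each test's score driven by each subskill.
WEIGHTS = {
    4: {"computation": 0.8, "fractions": 0.2},
    5: {"computation": 0.2, "fractions": 0.8},
}


def composite_score(skills, grade):
    """Score on a grade's test as a weighted sum of the student's subskill levels."""
    return sum(w * skills[name] for name, w in WEIGHTS[grade].items())


# Skill levels at the end of grade 4; by assumption nothing changes during grade 5.
end_of_grade4 = {"computation": 0.0, "fractions": 10.0}
end_of_grade5 = dict(end_of_grade4)

apparent_gain = composite_score(end_of_grade5, 5) - composite_score(end_of_grade4, 4)
print(apparent_gain)  # positive, although no learning occurred in grade 5
```

Under these assumptions the grade 5 teacher would be credited with 6 points of "growth" that actually happened in grade 4.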


Unwarranted assumptions, continued…

We need to account for equating error

We need to study the effects of the interval-level measurement assumption and either validate the assumption or not make the assumption

We need to either develop psychometric models that can account for change in content across grades, or not assume the same content across grades

Analytical models that avoid scale assumptions: Hill’s Value Table approach (this conference); Betebenner transition matrix approach (2005); standards-based interpretations; can use baseline data
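Both approaches named above work with performance levels rather than scale-score differences. A transition matrix cross-tabulates each student's level in one year against the next; a value table attaches points to each cell and averages them. The sketch uses three performance levels (Basic, Proficient, Advanced), and its point values are invented for illustration; they are not the values from Hill's or Betebenner's work, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean

LEVELS = ["Basic", "Proficient", "Advanced"]

# Hypothetical points for each (prior level, current level) transition.
VALUE_TABLE = {
    ("Basic", "Basic"): 0, ("Basic", "Proficient"): 150, ("Basic", "Advanced"): 200,
    ("Proficient", "Basic"): 0, ("Proficient", "Proficient"): 100,
    ("Proficient", "Advanced"): 150,
    ("Advanced", "Basic"): 0, ("Advanced", "Proficient"): 100,
    ("Advanced", "Advanced"): 150,
}


def transition_matrix(prior, current):
    """Rows = prior-year level, columns = current-year level, cells = student counts."""
    counts = Counter(zip(prior, current))
    return [[counts[(p, c)] for c in LEVELS] for p in LEVELS]


def value_table_score(prior, current, table=VALUE_TABLE):
    """Average points per matched student."""
    return mean(table[(p, c)] for p, c in zip(prior, current))


prior = ["Basic", "Basic", "Proficient", "Proficient", "Advanced"]
current = ["Basic", "Proficient", "Proficient", "Advanced", "Advanced"]
for row_label, row in zip(LEVELS, transition_matrix(prior, current)):
    print(row_label, row)
print(value_table_score(prior, current))  # 110
```

Because only level membership enters, neither calculation relies on equal-interval scale scores; the trade-off is that movement within a level is invisible.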

6. An Ongoing Program of Research

A turbulent field (“in its adolescence,” to quote Lissitz)

Large-scale implementation in a turbulent field requires extraordinary flexibility to keep up with the state of the art

And yet, too much flexibility can thwart useful interpretation of trend data

7. Consistency of Reports with Other Attributes

Responsive to instruction?

Understandable to stakeholders?

Grounded in policy aims?

Valid & reliable?

Setting standards for growth

What’s reasonable?

vs

What do we hope to accomplish?

What’s fair?

Growth & school consequences

Achievement \ Growth   Less than 1 year   1 year   More than 1 year
Advanced               OK (?)             OK       Great
Proficient             Not OK             OK       OK
Basic                  Not OK             Not OK   OK (?)
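The growth-and-consequences table can be read as a lookup from (achievement level, growth category) to a judgment. A minimal sketch follows; the cells are transcribed from the table, while the band that decides what counts as less than, about, or more than one year of growth is hypothetical, since the slide does not define it.

```python
# Cells transcribed from the growth-and-consequences table.
CONSEQUENCES = {
    "Advanced":   {"less": "OK (?)", "one": "OK", "more": "Great"},
    "Proficient": {"less": "Not OK", "one": "OK", "more": "OK"},
    "Basic":      {"less": "Not OK", "one": "Not OK", "more": "OK (?)"},
}

# Hypothetical band (in years of growth) treated as "1 year".
ONE_YEAR_BAND = (0.9, 1.1)


def growth_category(years_of_growth, band=ONE_YEAR_BAND):
    """Classify growth as less than, about, or more than one year."""
    low, high = band
    if years_of_growth < low:
        return "less"
    if years_of_growth > high:
        return "more"
    return "one"


def judge(achievement_level, years_of_growth):
    """Look up the table's judgment for a level and an amount of growth."""
    return CONSEQUENCES[achievement_level][growth_category(years_of_growth)]


print(judge("Basic", 1.5))     # OK (?)
print(judge("Proficient", 1))  # OK
```

The width of the "1 year" band is itself a policy choice: widening it moves students between cells without any change in their achievement.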

Conclusions

Can we add growth?

Yes!

Should we add growth?

Yes, where there is an evaluative framework tied to policy objectives, a systemic approach, and alignment with the values of educational research

Must we add growth?

An option, not a requirement, because of the extraordinary infrastructure necessary

Recommendations for Policymakers

Understand the basic differences between models: run simulations with real data

Understand the limitations

Listen to practitioners

Listen to methodologists

Anticipate costs/benefits

Lack of stability corrupts meaning

Do not over-specify the details in statute

This field moves ahead quickly

Flexibility to implement advances is key

Recommendations for Accountability Implementation Staff

State Directors: give your staff time to write it up!!

Require greater detail in the Technical Manuals to allow comprehensive review of the procedures

Explain it (as much as you can) to your legislators and Congresspersons

Challenge assumptions: “status quo is good”; “change is good”; resource assumptions; claims of proponents

Recommendations for Technical Researchers

Validity need not conflict with transparency

Validity: maintain sufficient complexity to produce valid results

Transparency for non-technical stakeholders: simple but accurate reports; grounded interpretations

Transparency for technical stakeholders: comprehensive documentation of the entire system, including psychometric and statistical models; facilitation of replication; facilitation of primary research on strengths and weaknesses

Recommendations for Technical Researchers

Pay systematic attention to:

Assumptions of psychometric models

Assumptions of content standard models

Assumptions of statistical models

Think carefully about what the models can tell us and cannot tell us about instruction, curriculum, and student development

Develop simple graphical representations of the model and its important concepts for policymaker consumption

Become involved in public policy forums as a community lobby in order to promote appropriate interpretation of data. We cannot give our cautions, wash our hands of how the data is used, and stand on the outside of the political process

Recommendations for All Stakeholders

Realize that with all of the high stakes surrounding accountability uses of student achievement data, there are forces that can work against community interests: economic benefits, reputations, and other personal investments can cause proponents of specific systems to avoid scrupulous investigations of the shortcomings of those systems and/or the benefits of competing approaches

Willingness to be, and accountability for being, rigorously honest and open-minded about multiple approaches is an essential part of improving and evaluating growth-based accountability systems
