How is Testing Supposed to Improve Schooling?

Edward Haertel

April 15, 2012

NCME Career Award Address

Vancouver, British Columbia

How Many Purposes… ?

[Figure: Number of Test Uses (vertical axis, 0 to 16) plotted against Year of Talk (horizontal axis, 2004 to 2013)]

Purposes for Educational Testing

Measuring
◦ Learning: Instructional Guidance
◦ Learners: Student Placement and Selection
◦ Methods: Informing Comparisons Among Educational Approaches
◦ Actors: Educational Management

Influencing
◦ Directing Student Effort
◦ Focusing the System
◦ Shaping Public Perceptions

Measuring versus Influencing

Measuring
◦ Relies directly on informational content of specific test scores

Influencing
◦ Effects intended to flow from testing per se, independent of specific test results
    Deliberate efforts to raise test scores
    Changing perceptions or ideas

Example: Weekly Spelling Test

Measuring
◦ Note words often missed (guides reteaching)
◦ Assign grades
◦ Guide students’ review following testing

Influencing
◦ Motivate studying
◦ Convey importance of spelling proficiency

Leap from measuring to influencing

Arguments … claim … program will lead to improvements in school effectiveness and student achievement by focusing … attention … on demanding content.

Yet, the validity arguments … attend only to the descriptive part of the interpretive argument …. The validity evidence … tends to focus on scoring and generalization to the content domain for the test.

The claim that the imposition of the accountability requirements will improve the overall performance of schools and students is taken for granted.

Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational Measurement (4th ed., pp. 17-64).

Interpretive Argument

Scoring
◦ Alignment, DIF, scaling, norming, equating, …

Generalization
◦ Score precision, reliability, generalizability, …

Extrapolation
◦ Score as reflection of intended construct

Decision or Implication
◦ Use in guiding action or informing description

“Appropriate test use and sound interpretation of test scores are likely to remain primarily the responsibility of the test user.”

Standards for Educational and Psychological Testing, p. 111

Not our concern?

Process too linear?

Curriculum Framework
Test Specification
Item Writing
Forms Assembly
Tryout and Revision
Administration
Scaling

Today’s Focus

Achievement tests taken by students
◦ Some attention to aptitude tests as well
◦ Exclude tests taken by teachers
◦ Include uses of student test scores to evaluate teachers
◦ Exclude testing for individual diagnosis of special needs

Testing and Prior Instruction

Curriculum-Dependent Test Question
◦ May assume prior knowledge and skills
◦ May probe reasoning with what is already known
◦ May “drill deeper,” testing application of concepts

Curriculum-Neutral Test Question
◦ Must include requisite information with item
◦ Must set up context in order to probe reasoning
◦ Often limited to testing knowledge of concept definitions

Seven Broad Purposes of Testing

Purpose | Primary Users | Construct | Linkage to Curriculum | Interpretation
--- | --- | --- | --- | ---
Instructional Guidance | Teachers, Students | Narrow Achievement Targets | Strong | CR, Individual
Student Placement and Selection | School Administrators (and Others) | Aptitude, Achievement | Varies | Varies, Individual
Informing Comparisons Among Educational Approaches | School Administrators, Researchers | Achievement (curriculum-specific; curriculum-neutral) | Varies | NR, Group
Educational Management | Public, Elected Officials, Administrators | Achievement | Grows higher | NR (may look like CR), Group
Directing Student Effort | Students | Aptitude, Achievement | Varies (should be strong) | Mostly CR, Individual
Focusing the System | Teachers, School Administrators | Achievement | Grows higher | NR (may look like CR), Group
Shaping Public Perceptions | Public, Elected Officials, Administrators | Achievement | Should be strong | NR, Group

CR = criterion-referenced; NR = norm-referenced.

Instructional Guidance

Formative Assessment (informal)
◦ Scoring
    Sound items adequately sampling domain?
◦ Generalization
    Test scores with adequate precision?
◦ Extrapolation
    Mastery extends beyond test per se?
◦ Decision or Implication
    Used to adapt teaching work to meet learning needs?

Instructional Guidance

Formative Assessment (highly structured)
◦ Winnetka Plan
◦ Programmed Instruction approaches
◦ Benjamin Bloom’s Mastery Learning
◦ Pittsburgh LRDC’s IPI Math Curriculum
◦ Criterion-Referenced Testing movement

• Scoring
• Generalization
• Extrapolation
• Decision or Implication

Instructional Guidance

Formative Assessment (highly structured)
◦ Scoring
    Questions mapped well to behavioral objectives
◦ Generalization
    Multiple items highly redundant
◦ Extrapolation
    ??? Assume decomposability, decontextualization
◦ Decision or Implication
    Relied on cut scores, simple rules; insufficient attention to actual effects

Student Placement and Selection

IQ-based tracking
GATE programs
English Learner status (Entry / Exit)
MCTs / HSEEs
Advanced Placement / International Baccalaureate
SAT / ACT
…

IQ-Based Tracking

Rationale
◦ Teachers deliver uniform instruction to all students in a classroom
◦ Students learn at different rates
    Or, have different “capacities”
◦ Grouping students by ability will improve efficiency because all will receive content at a rate appropriate to their ability
    This will reduce wasted effort and frustration

IQ-Based Tracking

Context
◦ Increasing immigration (since late 19th century)
◦ Perceived success of Army Alpha
◦ Scientific School Management movement
◦ Prevailing hereditarian views

IQ-Based Tracking

Scoring
◦ Scores free from bias and distortion?

Generalization
◦ High correlations across forms and occasions

Extrapolation
◦ Assumed based on strong theory, some criterion-related validity evidence

Decision or Implication
◦ Largely unexamined

Student Placement and Selection

IQ-based tracking
GATE programs
English Learner status (Entry / Exit)
MCTs / HSEEs
Advanced Placement (AP) / International Baccalaureate (IB)
SAT / ACT
…

Comparing Educational Approaches

ESEA-mandated Project Head Start evaluations
Evaluations of NSF-sponsored science curricula
National Diffusion Network
What Works Clearinghouse
Both RCTs and quasi-experimental research

Educational Management

Measuring Schools
◦ NCLB Adequate Yearly Progress (AYP) determinations
    Intervention for schools “in need of improvement”

Measuring Teachers
◦ “Value-Added” Models

“Measuring” purpose (Educational Management) is only part of the story. “Influencing” interacts with “measuring.”

“Value-Added” Models for Teacher Evaluation

Scoring
◦ May require vertical scaling
◦ Bias due to violations of model assumptions

Generalization
◦ Extra error due to student sampling and sorting

Extrapolation
◦ Score gains as proxy for teacher effectiveness / teaching quality broadly defined

Decision or Implication
◦ Largely unexamined

Influencing

Purposes of directing effort, focusing the system, and shaping perceptions rarely stand alone
◦ Direct use of test scores for measuring is always included
◦ Influencing purposes may nonetheless be more significant

Shaping Public Perceptions

"Test results can be reported to the press. … Based on past experience, policymakers can reasonably expect increases in scores in the first few years of a program … with or without real improvement in the broader achievement constructs that tests … are intended to measure."

R. L. Linn (2000, p. 4)

Attending to Influencing Purposes in Test Validation

Importance
◦ Influence as ultimate rationale for testing
◦ Place in the interpretive argument where unintended consequences arise

Challenge
◦ Purposes not clearly articulated
◦ Required data not available for years
◦ Required research methods unfamiliar
◦ Disincentives to look closely
◦ Expensive, may not matter

Clarity of Purpose

SBAC and PARCC Consortia must have:

“A theory of action that describes in detail the causal relationships between specific actions or strategies … and … desired outcomes …, including improvement in student achievement and college- and career-readiness.”

Availability of Data

Familiar problem in literature on program evaluation
◦ Plan ahead
◦ Attend to implementation cycle
◦ Do not ask for results too soon

Plan for “audit” tests?

Phased implementation?

Expanded Methods and Theories

Can we view testing phenomena through other disciplinary lenses?

Validation requires both empirical evidence and theoretical rationales
◦ Common sense gets us part way there
◦ Where does theory for “Influencing” purposes come from?
◦ What research methods can we borrow?

Costs and Incentives

Need increased investment in comprehensive validation

Need help from agents, agencies beyond test makers, test administrators

Need more explicit press for comprehensive validation in RFPs, public discourse

Thank you