CAN INSTRUCTIONALLY INSENSITIVE ACCOUNTABILITY TESTS EVER EVALUATE EDUCATORS FAIRLY?


W. James Popham, University of California, Los Angeles
Winter Conference, Washington Educational Research Association and Office of Superintendent of Public Instruction, Seattle, December 4, 2008


In nations where students’ scores on “accountability” tests play a pivotal role in the evaluation of schools, it is assumed that students’ performances on such tests accurately reflect instructional quality.

But what if students’ scores on educational accountability tests did not accurately reflect instructional quality?

A DEFINITION OF INSTRUCTIONAL SENSITIVITY

The degree to which students’ performances on a test accurately reflect the quality of instruction specifically provided to promote students’ mastery of what is being assessed.

A Continuum of Instructional Sensitivity: a 10-point scale running from 1 (Completely Insensitive) to 10 (Totally Sensitive)

Accountability tests, such as many of the assessments used in the U.S., differ in their ability to detect instructional quality.

WHY MIGHT A TEST ITEM BE INSTRUCTIONALLY INSENSITIVE?

Alignment Leniency
Excessive Easiness
Excessive Difficulty
Confusion-Engendering Item Flaws
Socioeconomic Status (SES) Links
Academic Aptitude Links

ALIGNMENT LENIENCY

Many items on accountability tests, when judged as to their alignment with the curricular aims they are supposed to be measuring, will be regarded as aligned with those aims (skills and/or knowledge) even if the items are only tangentially related to the curricular aim being assessed.

An Example of Lenient Alignment

Item 23

Using the bus schedule on the adjacent page, if your purpose was to determine the shortest time to reach Boston from Denver on a Monday, on which bus should you begin your journey?

A. Bus 214
B. Bus 197
C. Bus 110
D. Bus 202

Was the item aligned?

If the curricular aim had been for students to be able to use appropriate functional texts, such as train or bus schedules.

If the curricular aim had been for students to be able to determine whether given functional texts would fulfill their purpose for using such texts.

EXCESSIVE EASINESS

If an item is so easy that even completely untaught students would answer it correctly, then the item can’t distinguish between well taught and poorly taught students.

E.g., How many letters are there in the word seven?

EXCESSIVE DIFFICULTY

If an item is so difficult that even marvelously instructed students might not answer it correctly, then the item can’t distinguish between well taught and poorly taught students.

E.g., Without using your computer, what is the square root of 1,522,756?
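Both conditions can be checked numerically once item responses are in hand: an item is suspect as too easy when untaught students mostly answer it correctly, and as too hard when well-taught students mostly miss it. The Python sketch below assumes responses coded 1 (correct) and 0 (incorrect); the cutoffs of 0.90 and 0.20 are purely illustrative, since the presentation gives no numeric thresholds, and the names p_value and flag_item are hypothetical.

```python
from typing import List

# Illustrative cutoffs only; the presentation does not specify numeric thresholds.
EASY_CUTOFF = 0.90  # untaught students correct at least this often -> too easy
HARD_CUTOFF = 0.20  # well-taught students correct at most this often -> too hard


def p_value(responses: List[int]) -> float:
    """Proportion of students answering the item correctly (1 = correct, 0 = incorrect)."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(responses) / len(responses)


def flag_item(untaught: List[int], well_taught: List[int],
              easy_cutoff: float = EASY_CUTOFF,
              hard_cutoff: float = HARD_CUTOFF) -> List[str]:
    """Return the insensitivity flags (possibly none) raised by one item's responses."""
    flags = []
    if p_value(untaught) >= easy_cutoff:
        flags.append("excessively easy")
    if p_value(well_taught) <= hard_cutoff:
        flags.append("excessively difficult")
    return flags


# Every untaught student answers correctly, so the item cannot separate the groups.
print(flag_item(untaught=[1, 1, 1, 1], well_taught=[1, 1, 1, 1]))  # ['excessively easy']
```

In practice the cutoffs would have to be set by the people using the test; the point of the sketch is only that each condition compares one group's proportion correct against a bound.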

ITEM FLAWS

Items embodying serious flaws (e.g., ambiguities, garbled syntax, more than one correct answer, or no correct answer) will prevent well-taught students from answering correctly and hence make it impossible for the item to distinguish accurately between effectively and ineffectively taught students.

SOCIOECONOMIC STATUS (SES) LINKS

If an item gives a meaningful advantage to students from higher SES families, then the item will tend to measure what students bring to school rather than how well they are taught once they get there.

A 6th-Grade Science Item:

A plant’s fruit always contains seeds. Which of the items below is not a fruit?

A. orange
B. pumpkin
C. apple
D. celery

A 4th-Grade Reading Item:

My father’s field is computer graphics.

In which of the sentences below does the word field mean the same thing as in the sentence above?

A. The shortstop knew how to field his position.
B. We prepared the field by plowing it.
C. What field do you plan to enter when you graduate?
D. The nurse examined my field of vision.

ACADEMIC APTITUDE LINKS

If an item gives a meaningful advantage to students who possess greater inherited quantitative, verbal, or spatial aptitudes, then the item will tend to measure what students bring to school rather than how well they are taught once they get there.

A 6th-Grade Social Studies Item:

If someone really wants to conserve resources, one good way to do so is to:

A. leave lights on even if they are not needed.
B. wash small loads instead of large loads in a clothes-washing machine.
C. write on both sides of a piece of paper.
D. place used newspapers in the garbage.

A 3rd-Grade Mathematics Item:

The secret number is inside the circle. It is also inside the square. It is NOT inside the triangle. Which of these is the secret number?

[Figure: a circle, a square, and a triangle enclosing the numbers 1 through 7]

A. 2
B. 3
C. 5
D. 7

A 4th-Grade Mathematics Item:

Which of the letters below, when folded in half, will have two parts that match exactly?

A. F
B. Z
C. S
D. B

WHY MIGHT A TEST ITEM BE INSTRUCTIONALLY INSENSITIVE?

Alignment Leniency
Excessive Easiness
Excessive Difficulty
Confusion-Engendering Item Flaws
Socioeconomic Status (SES) Links
Academic Aptitude Links

A LESSON TO BE LEARNED:

When the measurement community became convinced that assessment bias in our high-stakes tests was threatening validity, we set out to (1) detect assessment bias and (2) reduce it. We were successful. We can be equally successful in coping with instructional insensitivity.

TWO STRATEGIES FOR DETERMINING INSTRUCTIONAL SENSITIVITY

A Judgmental Strategy, whereby seasoned, well-trained educators supply item-by-item ratings using a rigorous item-evaluation rubric

An Empirical Strategy, contrasting per-item performances of (1) taught versus untaught students or (2) effectively taught versus ineffectively taught students

JUDGMENTAL DETERMINATION OF AN ITEM’S INSTRUCTIONAL SENSITIVITY

An Illustrative Review Question: “If a teacher has provided reasonably effective instruction related to the objective measured by this item, is it likely a substantial majority of the teacher’s students will respond correctly to the item?”
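One way to turn reviewers’ answers to a question like this into an item-level judgment is to tally the share of reviewers who answer yes. The Python sketch below assumes each reviewer gives a simple yes/no answer per item and that an item counts as judged sensitive when at least two-thirds of reviewers say yes; the yes/no coding, the two-thirds cutoff, the item IDs, and the name judged_sensitive are all hypothetical, not details given in the presentation.

```python
from typing import Dict, List

# Assumed cutoff: an item counts as judged sensitive when at least
# two-thirds of reviewers answer "yes" to the review question.
AGREEMENT_CUTOFF = 2 / 3


def judged_sensitive(ratings: Dict[str, List[bool]],
                     cutoff: float = AGREEMENT_CUTOFF) -> Dict[str, bool]:
    """Map each item ID to True if a large enough share of reviewers answered 'yes'."""
    verdicts = {}
    for item_id, answers in ratings.items():
        if not answers:
            raise ValueError(f"item {item_id!r} has no reviewer ratings")
        share_yes = sum(answers) / len(answers)  # True counts as 1, False as 0
        verdicts[item_id] = share_yes >= cutoff
    return verdicts


# Hypothetical ratings from four reviewers per item.
ratings = {
    "item_a": [True, True, True, False],    # 75% yes
    "item_b": [True, False, False, False],  # 25% yes
}
print(judged_sensitive(ratings))  # {'item_a': True, 'item_b': False}
```

A real rubric would likely use more than one review question and more than two response options; the tally above only shows how per-reviewer answers could be combined into one verdict per item.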

EMPIRICAL DETERMINATION OF AN ITEM’S INSTRUCTIONAL SENSITIVITY

Contrasting per-item performances of taught versus untaught students

Contrasting per-item performances of effectively taught versus ineffectively taught students
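Either contrast can be summarized per item as the difference between the two groups’ proportions correct, an approach similar in spirit to the pre-to-post difference index used in criterion-referenced item analysis. The Python sketch below is one simple option, not necessarily the presenter’s method. It assumes responses coded 1 (correct) and 0 (incorrect), and that both groups answered the same items. The names proportion_correct and difference_index are hypothetical.

```python
from typing import Dict, List


def proportion_correct(responses: List[int]) -> float:
    """Share of a group answering correctly (1 = correct, 0 = incorrect)."""
    if not responses:
        raise ValueError("group has no responses for this item")
    return sum(responses) / len(responses)


def difference_index(higher: Dict[str, List[int]],
                     lower: Dict[str, List[int]]) -> Dict[str, float]:
    """Per-item p(higher group) minus p(lower group).

    'higher' is the taught (or effectively taught) group and 'lower' the
    untaught (or ineffectively taught) group. Values near zero mean the
    item barely registers the instructional difference.
    """
    return {item: proportion_correct(higher[item]) - proportion_correct(lower[item])
            for item in higher}


taught = {"item_a": [1, 1, 1, 0], "item_b": [1, 1, 1, 1]}
untaught = {"item_a": [0, 0, 1, 0], "item_b": [1, 1, 1, 1]}
print(difference_index(taught, untaught))  # {'item_a': 0.5, 'item_b': 0.0}
```

Here item_b, answered correctly by everyone in both groups, shows no difference at all, which is the empirical signature of the excessive-easiness problem described earlier.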

INSTRUCTIONAL INSENSITIVITY: UNFAIR AND HARMFUL

When we allow educators’ quality to be determined on the basis of accountability tests incapable of performing that task, we are being profoundly unfair to those educators.

Far worse, some wrongly evaluated, desperation-driven educators will engage in classroom practices that are educationally harmful to children.

Presenter’s e-mail address: wpopham@ucla.edu

Reactions or suggestions regarding this topic will be welcomed.
