Steven L. Wise Senior Research Fellow Evaluating Test-Taking Effort: When is a Growth Score Not Really a Growth Score?


Page 1: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

Steven L. Wise

Senior Research Fellow

Evaluating Test-Taking Effort: When is a Growth Score Not Really a Growth Score?

Page 2: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

• The ability to measure student growth across time is a key feature of MAP.

– Growth = RIT_Time2 – RIT_Time1

• A valid growth score, however, requires valid test scores at each time point.

• If either component score is invalid, the growth score is untrustworthy.

• How can we evaluate the validity of individual scores?

Measuring Student Growth


Page 3: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

• A valid test score requires:

– a well-constructed test with standardized administration procedures

– test performance that is not meaningfully affected by construct-irrelevant factors, which introduce construct-irrelevant variance (CIV).

• Individual score validity (ISV): how trustworthy is the score?

• Low-ISV scores are distorted by construct-irrelevant factors.

Individual Score Validity (ISV)


Page 4: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

• One student, one proctor (e.g., school psychologist).

• The proctor has been trained to observe the student’s test-taking behavior during the test event.

• If warranted, the proctor will terminate the test, take corrective action, or invalidate the score.

• Potential reasons: lack of motivation, anxiety, illness, changes in testing environment.

• Examinees, items, and context can all be sources of construct-irrelevant factors.

Scenario 1: Individually Administered Achievement Test


Page 5: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

• Many-to-one relationship between students and proctors.

• Monitoring responsibilities of proctors typically do not extend beyond

– maintaining standardized administration

– deterring cheating

• Looking for CIV is not usually part of the proctor’s role (and is impractical).

Scenario 2: Group Administered Achievement Test


Page 6: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

• Student effort has been found to be a key construct-irrelevant factor affecting MAP scores.

• We implicitly assume that students give good effort when administered our MAP test.

• When they don’t, the resulting RITs tend to underestimate true proficiency.

Test-Taking Effort and MAP


Page 7: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

• If low effort occurs at Time1 but not at Time2:

– Growth will be positively affected.

– Possibly unrealistically high positive growth

• If low effort occurs at Time2 but not at Time1:

– Growth will be negatively affected.

– Possibly negative growth

How Low Effort Distorts Growth Scores
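The two cases above can be illustrated with a small numeric sketch. It uses the growth definition from the earlier slide (RIT at Time2 minus RIT at Time1). The student, the true RIT values, and the size of the low-effort underestimate are all hypothetical numbers chosen for illustration, not figures from the presentation.

```python
def growth(rit_time1: float, rit_time2: float) -> float:
    """Growth score: RIT at Time2 minus RIT at Time1."""
    return rit_time2 - rit_time1


# Hypothetical student whose true proficiency rises from 200 to 206 RIT.
TRUE_T1, TRUE_T2 = 200.0, 206.0

# Assumed size of the underestimate caused by low effort (illustrative only).
LOW_EFFORT_PENALTY = 15.0

# Good effort at both time points: growth reflects the true change.
valid_growth = growth(TRUE_T1, TRUE_T2)

# Low effort at Time1 only: the Time1 RIT is too low, so growth is inflated.
low_effort_t1 = growth(TRUE_T1 - LOW_EFFORT_PENALTY, TRUE_T2)

# Low effort at Time2 only: the Time2 RIT is too low, so growth is deflated.
low_effort_t2 = growth(TRUE_T1, TRUE_T2 - LOW_EFFORT_PENALTY)

print(valid_growth)   # 6.0
print(low_effort_t1)  # 21.0
print(low_effort_t2)  # -9.0
```

With these assumed numbers, the same true gain of 6 RIT points shows up as 21 points or as a loss of 9 points, depending only on when the low effort occurred.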


Page 8: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

Table 1. Percentages of fall-to-spring growth scores that are negative, with magnitude exceeding two RIT standard errors

                Grade
Content Area    2     3     4     5     6     7     8     9
Math            1%    1%    2%    2%    5%    6%    8%    15%
Reading         1%    3%    5%    6%    9%    11%   12%   16%

How Often Do Negative Growth Scores Occur?
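The criterion behind Table 1 (negative growth whose magnitude exceeds two RIT standard errors) might be checked along the following lines. The slides do not say exactly which standard error is used, so the `rit_se` parameter, the function name, and the example scores are assumptions made for illustration.

```python
def is_large_negative_growth(rit_time1: float, rit_time2: float, rit_se: float) -> bool:
    """True when growth is negative and its magnitude exceeds two standard errors.

    rit_se is an assumed, caller-supplied standard error in RIT points.
    """
    growth = rit_time2 - rit_time1
    return growth < 0 and abs(growth) > 2 * rit_se


# Hypothetical (fall, spring) RIT pairs, with an assumed standard error of 3.0.
examples = [(210, 215), (210, 206), (210, 198)]
flags = [is_large_negative_growth(fall, spring, rit_se=3.0) for fall, spring in examples]

print(flags)  # [False, False, True]
```

Only the third pair is counted: a drop of 12 RIT points exceeds the assumed two-standard-error band of 6 points, while a drop of 4 does not.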


Page 9: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

• Students who have become disengaged from their test and have stopped giving effort show two types of behaviors:

– They tend to answer questions very rapidly.

– Their answers tend to be correct at about a chance level (as opposed to the expected .50 rate characteristic of an adaptive test).

• Data on these behaviors can be objectively and unobtrusively collected by the computer.

Assessing Student Effort
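The accuracy contrast described above can be sketched as follows. The response records are made up, and the four-option item format (which sets chance at .25) is an assumption; the slides do not state the number of answer options.

```python
from typing import Sequence


def accuracy(is_correct: Sequence[bool]) -> float:
    """Proportion of responses that are correct."""
    if not is_correct:
        raise ValueError("need at least one response")
    return sum(is_correct) / len(is_correct)


# Assumed item format: four answer options, so chance accuracy is 1/4.
N_OPTIONS = 4
chance_level = 1 / N_OPTIONS

# Hypothetical records: (answered_rapidly, answered_correctly).
rapid = [(True, c) for c in [True, False, False, False, True, False, False, False]]
effortful = [(False, c) for c in [True, False] * 5]
responses = rapid + effortful

rapid_accuracy = accuracy([c for was_rapid, c in responses if was_rapid])
effortful_accuracy = accuracy([c for was_rapid, c in responses if not was_rapid])

print(chance_level)        # 0.25
print(rapid_accuracy)      # 0.25
print(effortful_accuracy)  # 0.5
```

In this made-up data the rapid responses sit at the chance level, while the remaining responses are correct about half the time, as expected on an adaptive test.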


Page 10: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

• Two types of response behaviors

– Rapid-guessing behavior: the student responds before he or she would have been able to read and consider the item.

– Solution behavior: all other behaviors.

• RTE equals the proportion of items for which the examinee exhibited solution behavior.

• Ranges from 0.0 (low) to 1.0 (high).

• RTE measures the effort a student expends on a test.

Response Time Effort (RTE)
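A minimal sketch of the RTE calculation defined above: classify each response as rapid guessing or solution behavior, then take the proportion of solution behavior. The slides do not say how "too fast to have read the item" is determined, so the per-item time thresholds, the function name, and the sample data below are assumptions.

```python
from typing import Sequence


def response_time_effort(response_times: Sequence[float],
                         thresholds: Sequence[float]) -> float:
    """Proportion of items answered with solution behavior.

    A response counts as a rapid guess when its time is below that item's
    threshold (an assumed, caller-supplied value in seconds); every other
    response counts as solution behavior.
    """
    if len(response_times) != len(thresholds):
        raise ValueError("need one threshold per response")
    if not response_times:
        raise ValueError("need at least one response")
    solution_count = sum(
        1 for seconds, threshold in zip(response_times, thresholds)
        if seconds >= threshold
    )
    return solution_count / len(response_times)


# Hypothetical five-item test event with an assumed 3-second threshold per item.
times = [12.0, 1.5, 30.2, 2.0, 18.7]
thresholds = [3.0] * len(times)

rte = response_time_effort(times, thresholds)
print(rte)  # 0.6
```

Two of the five responses fall below the assumed threshold, so three of five items show solution behavior and RTE is 0.6.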


Page 11: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

• We developed five flagging criteria for spotting test events whose scores indicate low ISV.

• Flags based on both RTE and response accuracy.

• They take into account that students often behave non-effortfully during only a portion of the test event.

• We will call “invalid” any test event that triggered at least one of the flags.

Five Effort Flags


Page 12: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

• Consider a test event as a student seeing a series of items in a particular context.

• Student factors: gender, grade

• Item factors: content area, amount of reading, presence of a table, figure or graph.

• Context factors: item position, time of day, test stakes, heat/cold, noise distractions

Correlates of Student Effort


Page 13: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

Average RTE in Math

[Figure: mean RTE (vertical axis, 0.93 to 1.00) by grade (horizontal axis, grades 3 to 9), with separate series for females and males.]

Page 14: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

Average RTE in Reading

[Figure: mean RTE (vertical axis, 0.93 to 1.00) by grade (horizontal axis, grades 3 to 9), with separate series for females and males.]

Page 15: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

Invalid Test Events in Math, By Time of Day

[Figure: percent invalid scores (vertical axis, 0 to 25) by time of day (horizontal axis, 7:00 a.m. to 2:00 p.m.), with separate series for grades 3 through 9.]

Page 16: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

Invalid Test Events in Reading, By Time of Day

[Figure: percent invalid scores (vertical axis, 0 to 25) by time of day (horizontal axis, 7:00 a.m. to 2:00 p.m.), with separate series for grades 3 through 9.]

Page 17: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

Student   Spr10 RIT   Fa10 RIT   Spr11 RIT
John      229         242        245
Paul      168         190        215
George    201         210        170
Ringo     229         174        241
Yoko      201         159        225

Some Actual MAP Test Events

Which of the score patterns seem reasonable? Which do not?

Page 18: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

Student   Measure   Spr10   Fa10   Spr11
John      RIT       229     242    245
          RTE       1.0     1.0    1.0
Paul      RIT       168     190    215
          RTE       .72     .90    1.0
George    RIT       201     210    170
          RTE       .80     .68    .52
Ringo     RIT       229     174    241
          RTE       1.0     .28    1.0
Yoko      RIT       201     159    225
          RTE       .56     .42    .92

Considering the Test Events in Light of RTE Information

MAP scores with low RTEs are not trustworthy.
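As a rough sketch, the RIT and RTE values on this slide can be combined so that each growth score is reported together with whether both of its component test events showed high effort. The RTE cutoff of 0.90 is a hypothetical value chosen for illustration; the slides do not give a cutoff, and this is not one of the five effort flags.

```python
# Hypothetical cutoff: an RTE below this marks a test event as suspect.
RTE_CUTOFF = 0.90

TERMS = ["Spr10", "Fa10", "Spr11"]

# (RIT, RTE) per term, copied from the slide.
events = {
    "John":   [(229, 1.00), (242, 1.00), (245, 1.00)],
    "Paul":   [(168, 0.72), (190, 0.90), (215, 1.00)],
    "George": [(201, 0.80), (210, 0.68), (170, 0.52)],
    "Ringo":  [(229, 1.00), (174, 0.28), (241, 1.00)],
    "Yoko":   [(201, 0.56), (159, 0.42), (225, 0.92)],
}


def growth_report(events, cutoff):
    """Return (student, period, growth, trusted) for each consecutive pair of terms.

    A growth score is marked trusted only when both component RTEs
    meet the cutoff.
    """
    rows = []
    for student, series in events.items():
        for i in range(len(series) - 1):
            (rit1, rte1), (rit2, rte2) = series[i], series[i + 1]
            period = f"{TERMS[i]}->{TERMS[i + 1]}"
            rows.append((student, period, rit2 - rit1, min(rte1, rte2) >= cutoff))
    return rows


report = growth_report(events, RTE_CUTOFF)
for student, period, growth, trusted in report:
    print(f"{student:<7}{period:<14}{growth:>4}  {'ok' if trusted else 'suspect'}")
```

Under this assumed cutoff, only John's two growth scores and Paul's Fa10 to Spr11 growth pass; every growth score that looked unreasonable on the previous slide involves at least one low-RTE test event.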

Page 19: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

• Identify suspect RITs on our reports.

• Try to preempt non-effortful responding by developing a smart test that monitors effort and displays messages to students and/or proctors.

• Develop methods for adjusting RITs for the amount of non-effortful behavior in a test event.

Addressing the Problem: What NWEA Can Do

Page 20: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

• Explain to students the importance of giving their best effort on MAP.

• Administer MAP in a setting that is free from construct-irrelevant factors.

• Administer MAP in the morning when possible.

Addressing the Problem: What You Can Do


Page 21: Evaluating Test Taking Effort - When is a Growth Score Not Really a Growth Score

[email protected]

Thank you for your attention.