Large-scale testing: Uses and abuses Richard P. Phelps Universidad Finis Terrae, Santiago, Chile January 7, 2014

Page 1: Large-scale testing:  Uses and abuses

Large-scale testing: Uses and abuses

Richard P. Phelps

Universidad Finis Terrae, Santiago, Chile

January 7, 2014

Page 2: Large-scale testing:  Uses and abuses

Large-scale testing: Uses and abuses

1. 3 types of large-scale tests
2. Measuring test quality
3. A chronology of mistakes
4. Economists misunderstand testing
5. How SIMCE is affected

Page 3: Large-scale testing:  Uses and abuses

1. Three types of large-scale tests

- Achievement
- Aptitude
- Non-cognitive

Page 4: Large-scale testing:  Uses and abuses

Achievement tests

Historically, they were larger versions of classroom tests.

~1900 - “scientific” achievement tests developed (Germany & USA)

- J.M. Rice: systematically analyzed test structures & effects
- E.L. Thorndike: developed scoring scales

SOURCE: Phelps, Standardized Testing Primer, 2007

Page 5: Large-scale testing:  Uses and abuses

Achievement tests

Purpose: to measure how much you know and can recall

Developed using: content coverage analysis

How validated: retrospective or concurrent validity (correlation with past measures, such as high school grades)

Requires a mastery of content prior to test.

Fairness assumes that all test-takers have had the same opportunity to learn the content

Coachable – specific content is known in advance

SOURCE: Phelps, Standardized Testing Primer, 2007

Page 6: Large-scale testing:  Uses and abuses

Aptitude tests

1890s – A. Binet & T. Simon (France)

- Pre-school children with mental disabilities
- achievement test not possible
- developed content-free test of mental abilities (association, attention, memory, motor skills, reasoning)

1917 – Adapted by U.S. Army to select and assign soldiers in World War I

1930s – Harvard University president J. Conant

- wanted new admission test to identify students from lower social classes with the potential to succeed at Harvard
- developed the first Scholastic Aptitude Test (SAT)

SOURCE: Phelps, Standardized Testing Primer, 2007

Page 7: Large-scale testing:  Uses and abuses

Aptitude tests

Purpose: predict how much can be learned

Developed using: skills/job analysis

How validated: predictive validity, correlation with future activity (e.g., university or job evaluations)

Content independent. Measures:

- what student does with content provided
- how student applies skills & abilities developed over a lifetime

Not easily coachable – the content is either:

- not known in advance,
- basic, broad, commonly known by all, curriculum-free;
- less dependent on the quality of schools

SOURCE: Phelps, Standardized Testing Primer, 2007

Page 8: Large-scale testing:  Uses and abuses

Aptitude tests

Aptitude tests can identify:

- Students bored in school who study what interests them on their own

- Students not well adapted to high school, but well adapted to university

- Students of high ability stuck in poor schools

SOURCE: Phelps, Standardized Testing Primer, 2007

Page 9: Large-scale testing:  Uses and abuses

Comparing Achievement & Aptitude tests

              Achievement         Aptitude
Measure       past learning       potential
Development   content analysis    job/skills analysis
Validation    retrospective       predictive
Content       dependent           independent
Coachable?    very much           not much

Page 10: Large-scale testing:  Uses and abuses

Non-cognitive tests

More recently developed – measure values, attitudes, preferences

Types:

- integrity tests
- career exploration
- matchmaking
- employment “fit”

Page 11: Large-scale testing:  Uses and abuses

Non-cognitive tests

Purpose: to identify “fit” with others or a situation

Developed using: surveys, personal interviews

How validated? success rate in future activities

Content is personal, not learned

“Faking” can be an issue (e.g., “honesty” tests)

Page 12: Large-scale testing:  Uses and abuses

Comparing Achievement, Aptitude, & Non-Cognitive Tests

              Achievement         Aptitude              Non-Cognitive
Measure       past learning       potential             attitudes, values, preferences
Development   content analysis    job/skills analysis   surveys
Validation    retrospective       predictive            predictive
Content       dependent           independent           independent
Coachable?    very much           very little           can be faked

Page 13: Large-scale testing:  Uses and abuses

2. Measuring test quality

3 measures are important:

1. Predictive validity
2. Content coverage
3. Sub-group differences

Test reports can be “data dumps”

Page 14: Large-scale testing:  Uses and abuses

Predictive validity (values from -1.0 to +1.0)

…measures how well higher scores on an admission test match better outcomes at university (e.g., grades, completion)

A test with low predictive validity provides little information.

Page 15: Large-scale testing:  Uses and abuses

[Figure: scatter plot showing a positive correlation between two measures]

Source: NIST, Engineering Statistics Handbook

Page 16: Large-scale testing:  Uses and abuses

[Figure: scatter plot showing a negative correlation between two measures]

Source: NIST, Engineering Statistics Handbook

Page 17: Large-scale testing:  Uses and abuses

[Figure: scatter plot showing no correlation between two measures]

Source: NIST, Engineering Statistics Handbook

Page 18: Large-scale testing:  Uses and abuses

How does one measure predictive capacity?

Correlation coefficient: ranges from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation)
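As a rough illustration of the correlation coefficient as a measure of predictive capacity, the sketch below computes Pearson's r between admission test scores and later university grades. The function name `pearson_r` and both data lists are hypothetical, invented only for this example; they are not taken from the SAT, the PSU, or any report cited in these slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data, invented for illustration only (not real test results)
admission_scores = [450, 520, 580, 610, 700, 740]
first_year_grades = [4.1, 4.6, 4.4, 5.2, 5.5, 6.1]

r = pearson_r(admission_scores, first_year_grades)
print(f"predictive validity (r) = {r:.2f}")
```

A value near +1 would mean higher admission scores go with better university outcomes; a value near 0 would mean the test says little about those outcomes.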

Page 19: Large-scale testing:  Uses and abuses

Predictive validities: SAT and PSU

[Figure: bar chart comparing predictive validities of the SAT and the PSU 2010; vertical axis from 0 to 0.6]

SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013

Page 20: Large-scale testing:  Uses and abuses

Predictive validities: SAT and PSU (faculty: Administracion)

[Figure: bar chart comparing predictive validities of the SAT and the PSU (Administracion); categories: Language, Mathematics, SAT Writing, PSU Social Science; vertical axis from 0 to 0.6]

SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013