CTT OR IRT OR CTT AND IRT – ITEM ANALYSIS AND SCORING ON THE SWESAT
Per-Erik Lyrén, Umeå University
Transparency and Validity in Selection to Higher Education
Seminar at Universidad de Chile – DEMRE, 5–6 December 2017
UMEÅ UNIVERSITY
OUTLINE
• Overview of the SweSAT
• CTT and IRT
• Item- and test-level analysis
• Scoring and equating
• Why use CTT? Why use IRT?
• Why do we not use IRT on the SweSAT?
• The future
THE SWESAT SINCE 2011

[Diagram of the test structure:]
• Total score (0.00–2.00), composed of a Verbal score (0.0–2.0) and a Quant. score (0.0–2.0)
• Verbal booklets: V Booklet 1 (40 items), V Booklet 2 (40 items)
  o Verbal subtests: WORD (20 items), SEC (20 items), READ (20 items), ERC (20 items)
• Quantitative booklets: Q Booklet 1 (40 items), Q Booklet 2 (40 items)
  o Quantitative subtests: DTM (24 items), DS (12 items), QC (20 items), XYZ (24 items)
• Exp. Booklet (40 items)
WHAT ARE CTT AND IRT?
• Different frameworks for analysing and scoring tests
• CTT = Classical Test Theory
• IRT = Item Response Theory
  o Also called Modern Test Theory
SWESAT: ITEM ANALYSIS
• CTT
  o p (proportion correct; difficulty)
  o rbis/rpbis (item-total correlation; discrimination)
  o p differences (group differences)
  o "Item characteristic curves" (proportion correct vs. observed test score)
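As a minimal sketch (not the SweSAT's operational code), the two core CTT item statistics above can be computed from a 0/1 scored response matrix as follows; the corrected item-rest correlation is used for rpbis to avoid part-whole inflation:

```python
import numpy as np

def item_stats(responses):
    """CTT item statistics for dichotomously (0/1) scored items.

    responses: (n_persons, n_items) array of 0/1 item scores.
    Returns (p, rpbis): proportion correct (difficulty) and the
    corrected point-biserial correlation (item vs. rest score,
    i.e., total minus the item itself) as discrimination.
    """
    x = np.asarray(responses, dtype=float)
    p = x.mean(axis=0)                       # proportion correct per item
    total = x.sum(axis=1)                    # number-correct per person
    rpbis = np.empty(x.shape[1])
    for j in range(x.shape[1]):
        rest = total - x[:, j]               # exclude the item from its own criterion
        rpbis[j] = np.corrcoef(x[:, j], rest)[0, 1]
    return p, rpbis
```

In practice an item with low p and low rpbis would be flagged for review; the function names and the item-rest correction are illustrative choices, not a description of the SweSAT's exact procedures.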
SWESAT: TEST ANALYSIS
• CTT
  o Score means and SDs
  o Reliability estimates (KR-20; split half)
  o Score differences (groups based on gender, age, education)
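The KR-20 reliability estimate mentioned above has a closed form for dichotomous items; a sketch (assuming the standard population-variance convention for total scores):

```python
import numpy as np

def kr20(responses):
    """Kuder-Richardson formula 20 reliability for 0/1 scored items.

    KR-20 = k/(k-1) * (1 - sum(p*q) / var(total)),
    where p is the proportion correct per item, q = 1 - p,
    and var(total) is the variance of the number-correct scores.
    """
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]                             # number of items
    p = x.mean(axis=0)
    var_total = x.sum(axis=1).var()            # population variance of totals
    return (k / (k - 1)) * (1 - np.sum(p * (1 - p)) / var_total)
```

KR-20 is the special case of coefficient alpha for dichotomous items, which is why it pairs naturally with the number-correct scoring described later.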
SWESAT: SCORING + EQUATING
• CTT
  o Raw scores: number-correct
    § No formula scoring/penalty for guessing
  o Scale scores: non-linear transformation of raw scores
  o Equating: traditional equipercentile equating
    § IRT has been used
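The core idea of equipercentile equating is that a score on form X is mapped to the form-Y score with the same percentile rank. A minimal unsmoothed sketch (operational equating would typically add presmoothing and handle the discrete score scale more carefully):

```python
import numpy as np

def equipercentile(x_scores, y_scores, score):
    """Map a raw score on form X to the form-Y score whose
    percentile rank matches (unsmoothed empirical version).

    x_scores, y_scores: observed raw scores on the two forms.
    """
    x = np.sort(np.asarray(x_scores, dtype=float))
    y = np.sort(np.asarray(y_scores, dtype=float))
    # Percentile rank of `score` in form X: proportion strictly below
    # plus half the proportion exactly at the score (midpoint rule).
    pr = (np.sum(x < score) + 0.5 * np.sum(x == score)) / len(x)
    # Form-Y score at that quantile (linear interpolation).
    return np.quantile(y, pr)
```

If form Y is uniformly five points harder, a score of 50 on X maps to roughly 55 on Y, which is the intuition behind "same percentile rank, different raw score".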
WHY USE CTT?
• Easy to use
• Works fine for many applications
• Decent estimates with low Ns
• The definition of reliability is intuitive
WHY USE IRT?
• Persons (ability) and items (difficulty) on the same scale
• Testable assumptions (model fit)
• Attractive features
  o Graphical representations of items and fit
  o Test information functions
  o Bank of calibrated items → equating alternate forms is trivial
• Investigating/detecting aberrant response patterns
WHY USE BOTH?
• More information → better decisions (?)
• Compare the outcomes from using the different frameworks to solve a practical testing problem
  o Similar solutions strengthen the validity argument
  o Different solutions → why?
• Does any testing program use only IRT?
  o Reliability estimates?
WHY NOT IRT ON THE SWESAT?
• General issue
  o Resources
    § Calibrated items → item banking procedures
• Item analysis and test assembly
  o Tradition
  o Training of the test developers
    § Content experts rather than psychometricians
WHY NOT IRT ON THE SWESAT?
• Equating
  o IRT methods not necessarily better (less error) than CTT methods
  o Model fit?
  o Other parts of test development are CTT-based
• Scoring
  o Tradition
  o Communication: the scores are what the public sees
    § Probably less of an issue than we might think, especially when considering cheating issues
THE FUTURE
• The SweSAT will (also) use IRT
  o Or maybe Optimal Scoring (e.g., Ramsay & Wiberg, 2016)?
• Computer-based testing
  o CAT-ish?
• Cheating
• Additional resources