CTT OR IRT OR CTT AND IRT – ITEM ANALYSIS AND SCORING ON THE SWESAT
Per-Erik Lyrén, Umeå University
Transparency and Validity in Selection to Higher Education
Seminar at Universidad de Chile – DEMRE, 5–6 December 2017
UMEÅ UNIVERSITY
OUTLINE
• Overview of the SweSAT
• CTT and IRT
• Item- and test-level analysis
• Scoring and equating
• Why use CTT? Why use IRT?
• Why do we not use IRT on the SweSAT?
• The future
THE SWESAT SINCE 2011

[Diagram of the test structure:]
• Total score (0.00–2.00), composed of a Verbal score (0.0–2.0) and a Quant. score (0.0–2.0)
• Verbal booklets: V Booklet 1 (40 items), V Booklet 2 (40 items)
  o Verbal subtests: WORD (20 items), SEC (20 items), READ (20 items), ERC (20 items)
• Quantitative booklets: Q Booklet 1 (40 items), Q Booklet 2 (40 items)
  o Quantitative subtests: DTM (24 items), DS (12 items), QC (20 items), XYZ (24 items)
• Exp. Booklet (40 items)
WHAT ARE CTT AND IRT?
• Different frameworks for analysing and scoring tests
• CTT = Classical Test Theory
• IRT = Item Response Theory
  o Also called Modern Test Theory
SWESAT: ITEM ANALYSIS
• CTT
  o p (proportion correct; difficulty)
  o rbis/rpbis (item-total correlation; discrimination)
  o p differences (group differences)
  o "Item characteristic curves" (proportion correct vs. observed test score)
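As a minimal sketch (not the SweSAT's operational code), the two core CTT item statistics above can be computed from a 0/1 scored response matrix as follows; the corrected item-rest correlation is used for rpbis to avoid part-whole inflation:

```python
import numpy as np

def item_stats(responses):
    """CTT item statistics for dichotomously (0/1) scored items.

    responses: (n_persons, n_items) array of 0/1 item scores.
    Returns (p, rpbis): proportion correct (difficulty) and the
    corrected point-biserial correlation (item vs. rest score,
    i.e., total minus the item itself) as discrimination.
    """
    x = np.asarray(responses, dtype=float)
    p = x.mean(axis=0)                       # proportion correct per item
    total = x.sum(axis=1)                    # number-correct per person
    rpbis = np.empty(x.shape[1])
    for j in range(x.shape[1]):
        rest = total - x[:, j]               # exclude the item from its own criterion
        rpbis[j] = np.corrcoef(x[:, j], rest)[0, 1]
    return p, rpbis
```

In practice an item with low p and low rpbis would be flagged for review; the function names and the item-rest correction are illustrative choices, not a description of the SweSAT's exact procedures.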
SWESAT: TEST ANALYSIS
• CTT
  o Score means and SDs
  o Reliability estimates (KR-20; split half)
  o Score differences (groups based on gender, age, education)
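The KR-20 reliability estimate mentioned above has a closed form for dichotomous items; a sketch (assuming the standard population-variance convention for total scores):

```python
import numpy as np

def kr20(responses):
    """Kuder-Richardson formula 20 reliability for 0/1 scored items.

    KR-20 = k/(k-1) * (1 - sum(p*q) / var(total)),
    where p is the proportion correct per item, q = 1 - p,
    and var(total) is the variance of the number-correct scores.
    """
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]                             # number of items
    p = x.mean(axis=0)
    var_total = x.sum(axis=1).var()            # population variance of totals
    return (k / (k - 1)) * (1 - np.sum(p * (1 - p)) / var_total)
```

KR-20 is the special case of coefficient alpha for dichotomous items, which is why it pairs naturally with the number-correct scoring described later.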
SWESAT: SCORING + EQUATING
• CTT
  o Raw scores: number-correct
    § No formula scoring/penalty for guessing
  o Scale scores: non-linear transformation of raw scores
  o Equating: traditional equipercentile equating
    § IRT has been used
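The core idea of equipercentile equating is that a score on form X is mapped to the form-Y score with the same percentile rank. A minimal unsmoothed sketch (operational equating would typically add presmoothing and handle the discrete score scale more carefully):

```python
import numpy as np

def equipercentile(x_scores, y_scores, score):
    """Map a raw score on form X to the form-Y score whose
    percentile rank matches (unsmoothed empirical version).

    x_scores, y_scores: observed raw scores on the two forms.
    """
    x = np.sort(np.asarray(x_scores, dtype=float))
    y = np.sort(np.asarray(y_scores, dtype=float))
    # Percentile rank of `score` in form X: proportion strictly below
    # plus half the proportion exactly at the score (midpoint rule).
    pr = (np.sum(x < score) + 0.5 * np.sum(x == score)) / len(x)
    # Form-Y score at that quantile (linear interpolation).
    return np.quantile(y, pr)
```

If form Y is uniformly five points harder, a score of 50 on X maps to roughly 55 on Y, which is the intuition behind "same percentile rank, different raw score".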
WHY USE CTT?
• Easy to use
• Works fine for many applications
• Decent estimates with low Ns
• The definition of reliability is intuitive
WHY USE IRT?
• Persons (ability) and items (difficulty) on the same scale
• Testable assumptions (model fit)
• Attractive features
  o Graphical representations of items and fit
  o Test information functions
  o Bank of calibrated items → equating alternate forms is trivial
• Investigating/detecting aberrant response patterns
WHY USE BOTH?
• More information → better decisions (?)
• Compare the outcomes from using the different frameworks to solve a practical testing problem
  o Similar solutions strengthen the validity argument
  o Different solutions → why?
• Does any testing program use only IRT?
  o Reliability estimates?
WHY NOT IRT ON THE SWESAT?
• General issue
  o Resources
    § Calibrated items → item banking procedures
• Item analysis and test assembly
  o Tradition
  o Training of the test developers
    § Content experts rather than psychometricians
WHY NOT IRT ON THE SWESAT?
• Equating
  o IRT methods not necessarily better (less error) than CTT methods
  o Model fit?
  o Other parts of test development are CTT-based
• Scoring
  o Tradition
  o Communication: the scores are what the public sees
    § Probably less of an issue than we might think, especially when considering cheating issues
THE FUTURE
• The SweSAT will (also) use IRT
  o Or maybe Optimal Scoring (e.g., Ramsay & Wiberg, 2016)?
• Computer-based testing
  o CAT-ish?
• Cheating
• Additional resources