Evaluation of in-house item banks by administering actual CATs

Tetsuo KIMURA (Niigata Seiryo University)

TERA-PROMS 2013, Kaohsiung, Taiwan
August 5, 2013

Outline

• Background & Previous Studies
▫ What is CAT?
▫ UCAT and Moodle UCAT
▫ Construction of Item Bank

• Current Study
▫ Sample & Method
▫ Results
▫ Conclusion

What is CAT?

Paper-Pencil Test

Computerized Test

Computer Adaptive Test

What is CAT?

Paper-Pencil Test

Computerized Test

Computer Adaptive Test

Interview Test

Self-scoring flexilevel test (Lord, 1971)

Binet’s IQ test (Binet & Simon, 1905)

Adaptive Test

Binet’s IQ test (Binet & Simon, 1905)

The First Adaptive Test

Flexilevel Test (Lord, 1971)

The middle difficulty item, number 11 in difficulty-order　①　②　③　④

Easier items:
1. A slightly easier item, number 10 in difficulty-order　①　②　③　④
2. A slightly easier item, number 9 in difficulty-order　①　②　③　④
3. ・・・
10. The easiest item, number 1 in difficulty-order　①　②　③　④

Harder items:
1. A slightly harder item, number 12 in difficulty-order　①　②　③　④
2. A slightly harder item, number 13 in difficulty-order　①　②　③　④
3. ・・・
10. The hardest item, number 21 in difficulty-order　①　②　③　④
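The routing behind a 21-item flexilevel sheet can be sketched in Python. This is a minimal sketch, assuming the rule usually attributed to Lord (1971): start at the middle item, move to the next unused harder item after a correct answer and to the next unused easier item after a wrong one, and stop after (N + 1) / 2 items. The function name `run_flexilevel` and the `answers_correctly` callback are hypothetical, introduced only for illustration.

```python
from typing import Callable, List, Tuple


def run_flexilevel(
    answers_correctly: Callable[[int], bool], n_items: int = 21
) -> List[Tuple[int, bool]]:
    """Simulate one flexilevel administration.

    Items are numbered 1..n_items in difficulty order (1 = easiest).
    Returns the (item number, answered correctly) pairs in the order taken.
    """
    if n_items % 2 == 0:
        raise ValueError("n_items must be odd so that a single middle item exists")

    middle = (n_items + 1) // 2        # item 11 when n_items == 21
    test_length = (n_items + 1) // 2   # assumed stopping rule: 11 items of 21
    next_easier = middle - 1           # next unused item on the easier side
    next_harder = middle + 1           # next unused item on the harder side

    current = middle
    administered: List[Tuple[int, bool]] = []
    for _ in range(test_length):
        correct = answers_correctly(current)
        administered.append((current, correct))
        if correct:
            current = next_harder
            next_harder += 1
        else:
            current = next_easier
            next_easier -= 1
    return administered


# Hypothetical test taker who can answer items 1-13 and fails anything harder.
path = run_flexilevel(lambda item: item <= 13)
items_taken = [item for item, _ in path]
```

For this hypothetical test taker the path climbs from item 11 to item 14, then alternates between the easier and harder sides, so the sheet is worked from the middle outwards.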

What is CAT?

Individualization of test

Efficiency of measurement

1. item selection suitable to each test taker
2. shortening of test administration time
3. improvement of measurement accuracy

Previous Studies

• Rasch-based CAT program
▫ Linacre (1987). UCAT: a BASIC computer-adaptive testing program.
▫ Kimura, Ohnishi & Nagaoka (2012). Moodle UCAT: a computer-adaptive test module for Moodle based on the Rasch model.
  ⇒ ACP (SG) & Version2 (JP) cooperative project

• Construction of item banks for CAT
▫ Kimura (2009). Construction of a Moodle-based placement test and possibility of a Moodle-based computer adaptive test.
▫ Kimura & Nagaoka (2010). Towards the construction of item banks for moodle-based in-house computer adaptive English tests.

Construction of item bank

Pretesting

Item analysis & elimination of misfit

More pretests with new items and anchored items

Item bank

Calibrated items

Anchored items

Types of items used in the study

All the items were adopted from the Eiken Test, Grade Pre-1 to Grade 3, with the permission of the Society for Testing English Proficiency (STEP).

Listening comprehension (Lng)

Reading comprehension (Rdg)

Vocabulary and grammar (Vgm)

Listening comprehension with dialogue (Dlg)

Listening comprehension with monologue (Mlg)

Construction of item bank: Common Person Linking

Dlg & Mlg ⇒ Lng

r = .86

Mlg =Dlg × 1.18 ＋ 0.06

r = 0.89

Dlg =Mlg × 0.85 ＋ 0.05
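The two linking equations above can be applied directly to move a measure from one listening scale to the other. The following is a minimal sketch that only restates the slide's coefficients; the function names `dlg_to_mlg` and `mlg_to_dlg` are hypothetical, and values are assumed to be in logits.

```python
def dlg_to_mlg(dlg_logit: float) -> float:
    """Express a Dlg-scale measure on the Mlg scale (Mlg = Dlg x 1.18 + 0.06)."""
    return dlg_logit * 1.18 + 0.06


def mlg_to_dlg(mlg_logit: float) -> float:
    """Express an Mlg-scale measure on the Dlg scale (Dlg = Mlg x 0.85 + 0.05)."""
    return mlg_logit * 0.85 + 0.05


# A measure of 1.0 logit on each scale, expressed on the other scale.
dlg_on_mlg = dlg_to_mlg(1.0)
mlg_on_dlg = mlg_to_dlg(1.0)
```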

Current item banks

Vgm          N     AVG     SD
G1.5 (B2)    73    1.57    0.84
G2 (B1)      69    0.52    0.81
G2.5 (A2)    67   -0.47    0.91
G3 (A1)      49   -1.41    0.80
Total       258    0.19    1.37

Lng          N     AVG     SD
G1.5 (B2)    44    1.26    1.42
G2 (B1)     109    0.77    1.11
G2.5 (A2)    75    0.35    1.05
G3 (A1)      80   -0.90    1.33
Total       308    0.30    1.43

CAT Algorithm: Item Selection (logit bias)

Moodle UCAT

LL and UL can be adjusted by adding a logit value to the Logit bias box in the CAT setting window.

[Diagram: Bias applied to UL gives UL_Biased; Bias applied to LL gives LL_Biased]

A positive logit value decreases the chance of answering correctly.
A negative logit value increases the chance of answering correctly.
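The effect of the logit bias on the chance of a correct answer can be illustrated with the Rasch model. This is a sketch under a stated assumption: that the bias shifts the targeted item difficulty to the current ability estimate plus the bias. The slide does not show how Moodle UCAT implements this, and the function names below are hypothetical.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Rasch probability of a correct answer, with both arguments in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def targeted_success_probability(logit_bias: float) -> float:
    """Success probability on an item targeted at (ability estimate + logit_bias).

    Assumption for illustration only: the bias is added to the targeted difficulty.
    The result does not depend on the ability estimate itself, only on the bias.
    """
    ability_estimate = 0.0
    return rasch_probability(ability_estimate, ability_estimate + logit_bias)


p_no_bias = targeted_success_probability(0.0)    # items matched to ability
p_positive = targeted_success_probability(1.0)   # harder items targeted
p_negative = targeted_success_probability(-1.0)  # easier items targeted
```

Under this assumption a bias of 0 targets a success probability of 0.5, a positive bias pushes it below 0.5, and a negative bias pushes it above 0.5.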

Current Study: Sample & Method

Test takers: About 160 Japanese university freshmen majoring in nursing and social welfare

Some students were eliminated from the data because they had not completed the CAT properly.

Eiken grade   Item banks
              Vgm    Lng
Pre 1st       115    113
2nd           105    108
Pre 2nd        95    104
3rd            85     91

CAT conditions
• Initial ability estimate: 0.0 logits (100 units)
• Ending condition: number of items (16 items)
  S.E. can theoretically reach as low as 0.5 logits (Linacre, 2006)
• Logit bias: 0 (the targeted probability of answering correctly is 0.5)
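The 0.5-logit figure for a 16-item CAT follows from Rasch item information. A minimal sketch, assuming every item is perfectly targeted (probability of a correct answer 0.5), so that each item contributes the maximum information of 0.25; the function names are hypothetical.

```python
import math


def rasch_item_information(p_correct: float) -> float:
    """Fisher information of one dichotomous Rasch item: p * (1 - p)."""
    return p_correct * (1.0 - p_correct)


def standard_error(n_items: int, p_correct: float = 0.5) -> float:
    """S.E. of the ability estimate when every item has the same success probability.

    S.E. = 1 / sqrt(total information). With p_correct = 0.5 this is the
    theoretical minimum for a test of n_items items.
    """
    total_information = n_items * rasch_item_information(p_correct)
    return 1.0 / math.sqrt(total_information)


se_16_items = standard_error(16)  # 1 / sqrt(16 * 0.25) = 0.5 logits
```

Any mistargeting lowers the information per item, so an actual 16-item CAT ends with an S.E. somewhat above 0.5 logits.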

Current Study: results

Vgm: More than 90% of 157 test takers ended their CAT with S.E. less than .55 logits.
Lng: More than 90% of 130 test takers ended their CAT with S.E. less than .55 logits.

[Figure: Item exposure rate (frequency per 100 test takers); panels: Vgm, Lng]

Current Study: results

[Figure: Item exposure rate (frequency per 100 test takers) and current item banks; panels: Vgm, Lng]
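An item exposure rate expressed as frequency per 100 test takers can be tallied from CAT records along the following lines. This is a minimal sketch: the input format (one list of administered item IDs per test taker) and the function name are assumptions, not the Moodle UCAT log format.

```python
from collections import Counter
from typing import Dict, Iterable, List


def exposure_rate_per_100(sessions: Iterable[List[str]]) -> Dict[str, float]:
    """Return, for each item, how many test takers per 100 were shown it.

    Each element of `sessions` is the list of item IDs one test taker received.
    An item is counted at most once per test taker.
    """
    session_list = list(sessions)
    if not session_list:
        return {}
    counts = Counter(item for session in session_list for item in set(session))
    return {item: 100.0 * n / len(session_list) for item, n in counts.items()}


# Hypothetical records for four test takers.
rates = exposure_rate_per_100([
    ["v001", "v002"],
    ["v001", "v003"],
    ["v001", "v002"],
    ["v004", "v002"],
])
```

Items missing from the returned dictionary were never administered, which is one way to spot parts of the bank the CAT does not reach.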

Current Study: Conclusions

•More items with lower difficulty should be added to both item banks.

•If the CATs were administered to students at an advanced level, more items with higher difficulty would need to be added to both item banks.

•If the cut point of the test is set between 0 and 3 logits for Vgm and between -1 and 3 logits for Lng, the current item banks can serve the CAT well.

Thank you for listening.

Tetsuo Kimura 　　　　　 [email protected]

Files for Moodle UCAT https://github.com/VERSION2-Inc/moodle_ucat

Acknowledgements: A part of the present study was supported by a Grant-in-Aid for Scientific Research for 2010-2012 (No. 22520590) from the Japan Society for the Promotion of Science.
