Test Equating Zhang Zhonghua Chinese University of Hong Kong


Page 1: Test Equating

Test Equating

Zhang Zhonghua

Chinese University of Hong Kong

Page 2: Test Equating

Question?

• Two standardized tests, A and B, measure the same trait.

• A and B were administered separately to two groups of students (Group 1 and Group 2): Group 1 took only Test A, and Group 2 took only Test B.

• The mean score on Test A for Group 1 is 84, and the mean score on Test B for Group 2 is 80. A t-test indicated a statistically significant difference between the two group means (p < 0.05).

• Can we then conclude that Group 1 students are better than Group 2 students on the trait the two tests measure?

Page 3: Test Equating

Why Equate?

• To compare scores on different forms of a test (strictly speaking, parallel forms) that measure the same latent trait

• To construct an item bank/pool

• To support computerized adaptive testing (CAT)

Page 4: Test Equating

What’s Equating?

• “Equating is a statistical process that is used to adjust scores on test forms so that scores on the forms can be used interchangeably. Equating adjusts for differences in difficulty among forms that are built to be similar in difficulty and content” (Kolen & Brennan, 2004).

• The two alternate forms to be equated must be built to the same content and statistical specifications

• Properties of equating: equity, symmetry, and group invariance

Page 5: Test Equating

• Lord’s Equity Property

Examinees with a given true score have identical means, standard deviations, and distributional shapes of converted scores on Form X and of scores on Form Y.

• First-order Equity Property

Examinees with a given true score have the same mean converted score on Form X as they have on Form Y.
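In symbols, the two properties can be written as follows. The notation is not on the slide and follows Kolen & Brennan (2004): τ is the true score, and eq_Y(X) is the Form Y equivalent of a Form X score.

```latex
% Lord's equity property: for every true score tau, the conditional
% distribution of converted Form X scores equals that of Form Y scores
f[\mathrm{eq}_Y(X) \mid \tau] = f[Y \mid \tau] \quad \text{for all } \tau
% First-order equity property: only the conditional means must agree
E[\mathrm{eq}_Y(X) \mid \tau] = E[Y \mid \tau] \quad \text{for all } \tau
```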

Page 6: Test Equating

Form Y Raw   Form X1 Raw   Form X2 Raw
1            2             4
2            3             5
...          ...           ...
13           14            16
14           15            17
15           16            18
16           17            19
17           18            20
18           19            21
…            …             …

Page 7: Test Equating

Equating Design

• Single Group

• Random Groups

• Single Group with Counterbalance

• Anchored/Common-item Nonequivalent Groups

• Preequating

Page 8: Test Equating

• Single Group

Sample   Form X   Form Y
G1       √        √

Page 9: Test Equating

• Single Group with Counterbalancing

Sample   Time 1   Time 2
G1       Form X   Form Y
G2       Form Y   Form X

Page 10: Test Equating

• Random Groups

Sample   Form X   Form Y
G1       √
G2                √

Page 11: Test Equating

• Common-item Nonequivalent Groups

Sample   Form X   Form Y   Common Items V
G1       √                 √
G2                √        √

Page 13: Test Equating

• Preequating

Precalibrated IRT item parameter bank:

• Items from the bank (operational items)

• New items (non-operational items)

Page 14: Test Equating

Equating Methods

• Based on Classical Test Theory (CTT)

• Based on Item Response Theory (IRT)

Page 15: Test Equating

Downloadable Equating Procedures

• Equating/Linking Programs

http://www.education.uiowa.edu/casma/EquatingLinkingPrograms.htm

• IRT Scale Transformation Programs

http://www.education.uiowa.edu/casma/IRTPrograms.htm

Page 16: Test Equating

Equating Methods Based on CTT

• Mean Equating

• Linear Equating

• Equipercentile Equating

Page 17: Test Equating

CTT-Mean Equating

• In mean equating, Form X is considered to differ in difficulty from Form Y by a constant amount along the score scale. That amount is the difference between the two forms' mean scores.

• Example:

M_X = 70, M_Y = 75.

Let Form X be the base form and Form Y the target form.

A score of 80 on Form Y corresponds to an equated score on the Form X scale of 80 - (75 - 70) = 75.
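The mean-equating rule in this example can be sketched in Python as follows. The function name `mean_equate` is hypothetical; the function only shifts a score by the difference between the two form means.

```python
def mean_equate(score: float, mean_from: float, mean_to: float) -> float:
    """Mean equating: shift a score by the difference between form means.

    score     -- raw score on the form being converted
    mean_from -- mean score on that form
    mean_to   -- mean score on the form whose scale we convert to
    """
    return score - mean_from + mean_to


# Slide example: a Form Y score of 80 expressed on the Form X scale
# (M_X = 70, M_Y = 75).
equated = mean_equate(80, mean_from=75, mean_to=70)
print(equated)  # 75
```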

Page 18: Test Equating

CTT-Linear Equating

• In Linear Equating, scores that are an equal distance from their means in standard deviation units are set equal.

$$\frac{x-\mu(X)}{\sigma(X)} = \frac{y-\mu(Y)}{\sigma(Y)}$$

$$l_Y(x) = y = \frac{\sigma(Y)}{\sigma(X)}\,[x-\mu(X)] + \mu(Y)$$
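A minimal Python sketch of the linear equating function l_Y(x). The function names and the moments in the example are hypothetical. The sample-based helper assumes the two score samples come from comparable groups, as in a random-groups design, and uses population standard deviations (`statistics.pstdev`).

```python
from statistics import mean, pstdev


def linear_equate(x, mu_x, sigma_x, mu_y, sigma_y):
    """Linear equating: l_Y(x) = sigma_y / sigma_x * (x - mu_x) + mu_y.

    A Form X score x is mapped to the Form Y score that lies the same
    number of standard deviations from the Form Y mean.
    """
    return sigma_y / sigma_x * (x - mu_x) + mu_y


def linear_equate_from_samples(x, x_scores, y_scores):
    """Same rule, with the moments estimated from two score samples."""
    return linear_equate(x, mean(x_scores), pstdev(x_scores),
                         mean(y_scores), pstdev(y_scores))


# Hypothetical moments: mu(X) = 70, sigma(X) = 10, mu(Y) = 75, sigma(Y) = 12.
# A Form X score one SD above the Form X mean maps one SD above the Form Y mean.
print(linear_equate(80, 70, 10, 75, 12))  # 87.0
```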

Page 19: Test Equating

CTT-Equipercentile

• For a given Form X score, find the percentage of examinees earning scores at or below that Form X score.

• Find the Form Y score that has the same percentage of examinees at or below it.

• The Form X and Form Y scores are considered to be equivalent.

• Example: 70% of the examinees scored 75 or below on Form X, and 70% of the examinees scored 80 or below on Form Y. A Form X score of 75 would then be considered to represent the same level of achievement as a Form Y score of 80.
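The steps on this slide can be sketched in Python in a deliberately crude form. The sketch assumes discrete integer scores and uses no interpolation or smoothing; textbook equipercentile equating (e.g., Kolen & Brennan, 2004) works with continuized percentile ranks and often presmoothing instead. The function names and data are hypothetical.

```python
def prop_at_or_below(scores, value):
    """Proportion of examinees whose score is at or below `value`."""
    return sum(1 for s in scores if s <= value) / len(scores)


def equipercentile_equate(x, x_scores, y_scores):
    """Return the smallest observed Form Y score whose at-or-below
    proportion reaches the at-or-below proportion of x on Form X.
    (Discrete approximation: no interpolation or smoothing.)"""
    p = prop_at_or_below(x_scores, x)
    for y in sorted(set(y_scores)):
        if prop_at_or_below(y_scores, y) >= p:
            return y


# Hypothetical data mirroring the slide: 70% score 75 or below on Form X,
# and 70% score 80 or below on Form Y.
form_x = [55, 60, 65, 70, 72, 74, 75, 85, 90, 95]
form_y = [60, 65, 70, 72, 75, 78, 80, 88, 92, 98]
print(equipercentile_equate(75, form_x, form_y))  # 80
```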

Page 20: Test Equating

Equating Methods Based on IRT

• IRT Item Parameter Equating

• IRT Observed Score and IRT True Score Equating

Page 21: Test Equating

Item Response Theory

• Take the IRT three-parameter logistic (3PL) model as an example.

• Item parameters: item discrimination (a), item difficulty (b), and guessing (c). For examinee i and item j:

$$p_{ij}(\theta_i; a_j, b_j, c_j) = c_j + (1-c_j)\,\frac{\exp[Da_j(\theta_i - b_j)]}{1+\exp[Da_j(\theta_i - b_j)]}$$
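A minimal Python version of the 3PL response function. The scaling constant D = 1.7 is the conventional choice but is an assumption here, since the slide does not give its value. The name `p3pl` is hypothetical.

```python
import math

D = 1.7  # scaling constant; assumed value, not given on the slide


def p3pl(theta, a, b, c):
    """3PL probability of a correct response for ability theta and item
    parameters a (discrimination), b (difficulty), c (guessing)."""
    z = D * a * (theta - b)
    # exp(z) / (1 + exp(z)) written in the equivalent form 1 / (1 + exp(-z))
    return c + (1.0 - c) / (1.0 + math.exp(-z))


# At theta == b the logistic part equals 0.5, so P = c + (1 - c) / 2.
print(p3pl(0.0, a=1.0, b=0.0, c=0.2))  # approximately 0.6
```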

Page 24: Test Equating

[Figure: item characteristic curves for Item 1 and Item 2; x-axis: scale score (150 to 300), y-axis: probability (0.0 to 1.0), with the difficulty of each item marked]

Page 29: Test Equating

Item Parameter Equating

• Linking Separate Calibration (Mean/Mean Method, Mean/Sigma Method, Stocking-Lord Method, Haebara Method)

• Concurrent Calibration

• Fixed Common-Precalibrated Item Parameter Method

Page 30: Test Equating

IRT-Linking Separate Calibration

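Let the ability of examinee i and the parameters of item j on scale I be linearly related to those on scale J:

$$\theta_{Ji} = A\,\theta_{Ii} + B, \qquad b_{Jj} = A\,b_{Ij} + B, \qquad a_{Jj} = \frac{a_{Ij}}{A}, \qquad c_{Jj} = c_{Ij}$$

Then

$$p_{ij}(\theta_{Ji}; a_{Jj}, b_{Jj}, c_{Jj}) = c_{Jj} + (1-c_{Jj})\,\frac{\exp[Da_{Jj}(\theta_{Ji}-b_{Jj})]}{1+\exp[Da_{Jj}(\theta_{Ji}-b_{Jj})]}$$

$$= c_{Ij} + (1-c_{Ij})\,\frac{\exp\{D\frac{a_{Ij}}{A}[(A\theta_{Ii}+B)-(Ab_{Ij}+B)]\}}{1+\exp\{D\frac{a_{Ij}}{A}[(A\theta_{Ii}+B)-(Ab_{Ij}+B)]\}}$$

$$= c_{Ij} + (1-c_{Ij})\,\frac{\exp[Da_{Ij}(\theta_{Ii}-b_{Ij})]}{1+\exp[Da_{Ij}(\theta_{Ii}-b_{Ij})]} = p_{ij}(\theta_{Ii}; a_{Ij}, b_{Ij}, c_{Ij})$$

So the probability of a correct response is the same on both scales.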
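The invariance can be checked numerically with a short Python sketch. It transforms an ability and one item's parameters with θ_J = Aθ_I + B, b_J = Ab_I + B, a_J = a_I/A, c_J = c_I and confirms that the 3PL probability does not change, up to floating-point rounding. D = 1.7, the parameter values, and the helper names are assumptions for illustration.

```python
import math

D = 1.7  # assumed scaling constant


def p3pl(theta, a, b, c):
    """3PL response probability."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def to_scale_j(theta, a, b, c, A, B):
    """Move an ability and one item's parameters from scale I to scale J:
    theta_J = A*theta_I + B, a_J = a_I/A, b_J = A*b_I + B, c_J = c_I."""
    return A * theta + B, a / A, A * b + B, c


# Hypothetical values on scale I and an arbitrary linear transformation.
theta_I, a_I, b_I, c_I = 0.4, 1.3, -0.2, 0.2
A, B = 0.8, 0.5
p_scale_I = p3pl(theta_I, a_I, b_I, c_I)
p_scale_J = p3pl(*to_scale_j(theta_I, a_I, b_I, c_I, A, B))
print(abs(p_scale_I - p_scale_J) < 1e-12)  # True
```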

Page 31: Test Equating

IRT-Moment Methods

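• Mean/Mean Method

$$A_M = \frac{\mu(a_T)}{\mu(a_B)}, \qquad B_M = \mu(b_B) - A_M\,\mu(b_T)$$

• Mean/Sigma Method

$$A_S = \frac{\sigma(b_B)}{\sigma(b_T)}, \qquad B_S = \mu(b_B) - A_S\,\mu(b_T)$$

Subscript T denotes the target form and subscript B the base form. The means μ and standard deviations σ are computed over the common items.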
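A minimal Python sketch of the two moment methods. The function names and the common-item estimates are hypothetical. The target estimates are built from base estimates with known constants A = 1.2 and B = 0.3, so both methods should recover those values. Using `pstdev` instead of `stdev` does not change A_S, because the ratio is unaffected when both forms share the same common items.

```python
from statistics import mean, pstdev


def mean_mean(a_base, b_base, a_target, b_target):
    """Mean/Mean constants from common-item estimates:
    A = mean(a_T) / mean(a_B), B = mean(b_B) - A * mean(b_T)."""
    A = mean(a_target) / mean(a_base)
    return A, mean(b_base) - A * mean(b_target)


def mean_sigma(b_base, b_target):
    """Mean/Sigma constants from common-item difficulties:
    A = sd(b_B) / sd(b_T), B = mean(b_B) - A * mean(b_T)."""
    A = pstdev(b_base) / pstdev(b_target)
    return A, mean(b_base) - A * mean(b_target)


# Hypothetical common items: base = target placed on the base scale
# with b_B = A*b_T + B and a_B = a_T / A.
a_target = [1.0, 1.4, 0.8]
b_target = [-1.0, 0.2, 1.1]
A_true, B_true = 1.2, 0.3
a_base = [a / A_true for a in a_target]
b_base = [A_true * b + B_true for b in b_target]

print([round(v, 6) for v in mean_mean(a_base, b_base, a_target, b_target)])  # [1.2, 0.3]
print([round(v, 6) for v in mean_sigma(b_base, b_target)])  # [1.2, 0.3]
```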

Page 32: Test Equating

IRT-Characteristic Curve Method

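• Stocking-Lord method (test characteristic curves of the common items):

$$F_{ST} = \frac{1}{N}\sum_{i=1}^{N}\left[\sum_{j=1}^{n} p_{ij}(\theta_i; a_{jB}, b_{jB}, c_{jB}) - \sum_{j=1}^{n} p_{ij}\!\left(\theta_i; \frac{a_{jT}}{A}, A b_{jT} + B, c_{jT}\right)\right]^2$$

• Haebara method (item characteristic curves of the common items):

$$F_H = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n}\left[p_{ij}(\theta_i; a_{jB}, b_{jB}, c_{jB}) - p_{ij}\!\left(\theta_i; \frac{a_{jT}}{A}, A b_{jT} + B, c_{jT}\right)\right]^2$$

Here N is the number of ability points θ_i and n is the number of common items. A and B are the values that minimize F_ST (or F_H).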
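The two characteristic-curve criteria can be sketched in Python under several assumptions:

* D = 1.7.
* An evenly spaced set of ability points stands in for the θ_i.
* The common-item parameters are hypothetical.
* A coarse grid search replaces the iterative numerical optimizers that real linking programs use.

All function names are hypothetical. The target items are built from the base items with known constants A = 0.9 and B = 0.2, so both criteria should be minimized at that grid point.

```python
import math

D = 1.7  # assumed scaling constant


def p3pl(theta, a, b, c):
    """3PL response probability."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def haebara_loss(A, B, base, target, thetas):
    """F_H: squared item-level differences, summed over common items,
    averaged over the ability points."""
    total = 0.0
    for th in thetas:
        for (aB, bB, cB), (aT, bT, cT) in zip(base, target):
            total += (p3pl(th, aB, bB, cB) - p3pl(th, aT / A, A * bT + B, cT)) ** 2
    return total / len(thetas)


def stocking_lord_loss(A, B, base, target, thetas):
    """F_ST: squared differences between the common-item test
    characteristic curves, averaged over the ability points."""
    total = 0.0
    for th in thetas:
        tcc_base = sum(p3pl(th, a, b, c) for a, b, c in base)
        tcc_target = sum(p3pl(th, a / A, A * b + B, c) for a, b, c in target)
        total += (tcc_base - tcc_target) ** 2
    return total / len(thetas)


def grid_search(loss, base, target, thetas, a_grid, b_grid):
    """Coarse stand-in for a numerical optimizer: the (A, B) grid point
    with the smallest loss."""
    return min(((A, B) for A in a_grid for B in b_grid),
               key=lambda AB: loss(AB[0], AB[1], base, target, thetas))


# Hypothetical common items (a, b, c) on the base scale, and the same items
# on a target scale built with known constants A = 0.9, B = 0.2.
base = [(1.0, -0.5, 0.20), (1.5, 0.3, 0.15), (0.8, 1.2, 0.25)]
A_true, B_true = 0.9, 0.2
target = [(a * A_true, (b - B_true) / A_true, c) for a, b, c in base]
thetas = [-4 + 0.2 * k for k in range(41)]               # ability points
a_grid = [round(0.5 + 0.05 * k, 2) for k in range(21)]   # 0.50 .. 1.50
b_grid = [round(-1 + 0.05 * k, 2) for k in range(41)]    # -1.00 .. 1.00

hae = grid_search(haebara_loss, base, target, thetas, a_grid, b_grid)
sl = grid_search(stocking_lord_loss, base, target, thetas, a_grid, b_grid)
print(hae, sl)  # both should recover (0.9, 0.2)
```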

Page 33: Test Equating

Example

• Take Form Y as the base test and Form X as the target test.

• Item 1 on Form X: item difficulty b = 1.0; item discrimination a = 1.896; guessing c = 0.18.

• The parameters of Item 1 on Form X, equated onto the scale of base Form Y, are computed as follows (using the Stocking-Lord constants):

Constant   Stocking-Lord   Haebara   Mean/Mean   Mean/Sigma
B          -0.057          -0.063    -0.087      0.028
A          0.948           0.942     0.943       0.770

$$b_{X\mathrm{eq}}(1) = A\,b_X(1) + B = 0.948 \times 1.0 + (-0.057) = 0.891$$

$$a_{X\mathrm{eq}}(1) = \frac{a_X(1)}{A} = \frac{1.896}{0.948} = 2.0$$

$$c_{X\mathrm{eq}}(1) = c_X(1) = 0.18$$
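The same three transformations can be applied with each set of constants in the table, as in the small Python sketch below. Only the Stocking-Lord row is worked out on the slide, so the equated values for the other three methods are computed by this sketch and are not results reported by the author. The helper name `equate_item` is hypothetical.

```python
# Linking constants (A, B) from the slide's table (target Form X -> base Form Y).
constants = {
    "Stocking-Lord": (0.948, -0.057),
    "Haebara": (0.942, -0.063),
    "Mean/Mean": (0.943, -0.087),
    "Mean/Sigma": (0.770, 0.028),
}

# Item 1 on Form X: discrimination a, difficulty b, guessing c.
a_x, b_x, c_x = 1.896, 1.0, 0.18


def equate_item(a, b, c, A, B):
    """Place 3PL item parameters on the base scale:
    a* = a / A, b* = A * b + B, c* = c."""
    return a / A, A * b + B, c


results = {m: equate_item(a_x, b_x, c_x, A, B) for m, (A, B) in constants.items()}
for method, (a_eq, b_eq, c_eq) in results.items():
    print(f"{method:13s} a={a_eq:.3f} b={b_eq:.3f} c={c_eq:.2f}")
# The Stocking-Lord line prints a=2.000 b=0.891 c=0.18, matching the slide.
```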

Page 34: Test Equating

IRT- Concurrent Calibration

• The concurrent calibration method estimates item and ability parameters simultaneously in a single computer run. Items not taken by one group of examinees are treated as not reached (missing data), and the parameters of all items on the two test forms are estimated together. This single estimation run places the item parameters of both forms on the same scale (Kim & Hanson, 2002; Kim & Cohen, 1998).

• Example

Page 35: Test Equating

Concurrent Calibration for Replication 16
>COMMENTS
Horizontal Equating
Concurrent Calibration for Replication 16
>GLOBAL NPARM=3,DFNAME='D:\RESEARCH\REP16\CONH-16\CONH-16.DAT',SAVE;
>SAVE PARM='D:\RESEARCH\REP16\CONH-16\CONH-16.PAR';
>LENGTH NITEMS=140;
>INPUT NTOTAL=80,SAMPLE=2000,NALT=4,NIDCH=4,FORMS=2;
(4X,4A1,6X,I1,1X,80A1)
>FORM1 LENGTH=80,ITEMS=(1(1)80);
>FORM2 LENGTH=80,ITEMS=(1(1)20,81(1)140);
>TEST ITEMS=(1(1)140),LINK=(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0);
>CALIB CYCLES=20;
>SCORE;
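The FORM1 and FORM2 statements in this control file determine which of the 140 items each group answered. The Python sketch below shows the resulting layout: items 1 to 20 appear on both forms, and each examinee's unanswered items become missing values, which is how concurrent calibration treats them. The item ranges are read from ITEMS=(1(1)80) and ITEMS=(1(1)20,81(1)140). The helper `expand` and the example examinee are hypothetical.

```python
N_ITEMS = 140  # LENGTH NITEMS=140

# ITEMS=(1(1)80) and ITEMS=(1(1)20,81(1)140): "start(step)end" ranges.
form1_items = list(range(1, 81))
form2_items = list(range(1, 21)) + list(range(81, 141))

common = sorted(set(form1_items) & set(form2_items))
print(len(common), common[0], common[-1])  # 20 1 20


def expand(responses, form_items, n_items=N_ITEMS):
    """Spread one examinee's responses (in form order) over the full
    140-item vector; items not on the form become None (missing)."""
    full = [None] * n_items
    for item, r in zip(form_items, responses):
        full[item - 1] = r
    return full


# Hypothetical Group 2 examinee who answered every Form 2 item correctly.
row = expand([1] * 80, form2_items)
print(sum(r is None for r in row))  # 60 items (21 to 80) not presented
```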

Page 36: Test Equating

IRT-Fixed Common-Item Parameters

• This procedure combines features of the concurrent calibration and linking separate calibration methods. The item parameters for the two test forms are estimated separately. Unlike linking separate calibration, however, the common-item parameters in the target-test calibration are fixed at their estimated values from the base test.

• Example

Page 37: Test Equating

Fixed Common Item Parameters for Replication 16
>COMMENTS
FCIP for Replication 16
Target Test Form B with N (0,1)
>GLOBAL NPARM=3,DFNAME='D:\RESEARCH\REP16\FIXV-16\B11-16.DAT',SAVE;
>SAVE PARM='D:\RESEARCH\REP16\FIXV-16\FIXV-16.PAR';
>LENGTH NITEMS=(80);
>INPUT NTOTAL=80,SAMPLE=1000,NALT=4,NIDCH=4;
(4A1,1X,80A1)
>TEST ITEMS=(1(1)80);
>CALIB TPRIOR,SPRIOR,GPRIOR,READPRI,CYCLES=20;
>PRIORS TMU=(-0.639,1.041,1.701,0.482,-1.144,-0.023,0.616,1.133,0.668,0.577,-0.257,0.029,0.904,0.232,1.602,1.642,0.537,-0.228,1.439,0.517,0.0(0)60),
  TSIGMA=(0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,2.0(0)60),
  SMU=(-0.688,0.011,-0.810,0.614,-0.811,-0.445,-0.142,-0.387,0.292,-0.449,0.040,-0.522,0.080,0.660,0.301,0.408,-0.689,-0.079,0.294,-0.174,0.0(0)60),
  SSIGMA=(0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.5(0)60),
  ALPHA=(2882.990,1329.080,3540.010,4092.470,2694.080,2652.900,2314.660,2532.500,2336.870,3725.870,2364.700,2545.460,2358.110,2307.760,3583.990,3117.190,2569.460,1817.030,1057.210,2544.350,6(0)60),
  BETA=(7119.010,8672.920,6461.990,5909.530,7307.920,7349.100,7687.340,7469.500,7665.130,6276.130,7637.300,7456.540,7643.890,7694.240,6418.010,6884.810,7432.540,8184.970,8944.790,7457.650,16(0)60);
>SCORE;
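In this file, the first 20 (common) items get prior standard deviations of 0.001 (TSIGMA, SSIGMA), which effectively pins those parameters near the supplied means. Their ALPHA and BETA values all satisfy α + β = 10002. That pattern is consistent with beta priors for the guessing parameter built as α = 10000c + 1 and β = 10000(1 − c) + 1, whose mode equals a base-form estimate c. This is an inference from the numbers; the slide does not state it. The Python sketch below computes each prior's mode and mean so the pattern can be checked.

```python
# Beta-prior parameters for the guessing parameter of the 20 common items,
# copied from ALPHA= and BETA= in the control file.
alpha = [2882.990, 1329.080, 3540.010, 4092.470, 2694.080, 2652.900,
         2314.660, 2532.500, 2336.870, 3725.870, 2364.700, 2545.460,
         2358.110, 2307.760, 3583.990, 3117.190, 2569.460, 1817.030,
         1057.210, 2544.350]
beta = [7119.010, 8672.920, 6461.990, 5909.530, 7307.920, 7349.100,
        7687.340, 7469.500, 7665.130, 6276.130, 7637.300, 7456.540,
        7643.890, 7694.240, 6418.010, 6884.810, 7432.540, 8184.970,
        8944.790, 7457.650]

# Mode of Beta(al, be) is (al - 1) / (al + be - 2) for al, be > 1.
modes = [(al - 1) / (al + be - 2) for al, be in zip(alpha, beta)]

for j, (al, be, mode) in enumerate(zip(alpha, beta, modes), start=1):
    prior_mean = al / (al + be)
    print(f"item {j:2d}: alpha+beta={al + be:.1f} "
          f"mode={mode:.4f} mean={prior_mean:.4f}")
```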

Page 38: Test Equating

Comparison of Different Equating Methods

• No consensus has been reached on which method is best.

• Methods based on CTT can be used to equate tests; methods based on IRT are essential for constructing an item bank/pool.

• Among the IRT-based methods, some studies indicate that the concurrent calibration method can produce more accurate equating results than the linking separate calibration and FCIP methods.

Page 39: Test Equating

Thank You Very Much!