Transcript
Page 1: DOCUMENT RESUME, ED 262 071 Skaggs, Gary; Lissitz, …DOCUMENT RESUME, ED 262 071 TM 850 548 AUTHOR Skaggs, Gary; Lissitz, Robert W. TITLE An Exploration of the Robustness of Four

DOCUMENT RESUME

ED 262 071 TM 850 548

AUTHOR Skaggs, Gary; Lissitz, Robert W.

TITLE An Exploration of the Robustness of Four Test Equating Models.

PUB DATE Apr 85

NOTE 47p.; Paper presented at the Annual Meeting of the American Educational Research Association (69th, Chicago, IL, March 31-April 4, 1985). Small print in tables 1-3.

PUB TYPE Speeches/Conference Papers (150) -- Reports - Research/Technical (143)

EDRS PRICE MF01/PC02 Plus Postage.

DESCRIPTORS Computer Simulation; *Equated Scores; *Error of Measurement; *Goodness of Fit; Latent Trait Theory; *Mathematical Models; Scaling; Standardized Tests; Statistical Analysis; Statistical Studies

IDENTIFIERS Equipercentile Equating; Linear Models; Rasch Model; Robustness; Three Parameter Model

ABSTRACT

This study examined how four commonly used test equating procedures (linear, equipercentile, Rasch model, and three-parameter) would respond to situations in which the properties of the two tests being equated were different. Data for two tests plus an external anchor test were generated from a three-parameter model in which mean test differences in difficulty, discrimination, and lower asymptote were manipulated. In each case two data sets were generated, each consisting of the responses of 2,000 examinees to a 35-item test plus a 15-item anchor test. Each test equating case was comprised of 4,000 examinees and 85 items. The robustness with respect to violations of assumptions was tested for the linear and Rasch equating methods. For equipercentile equating, the results showed how the method responded to various conditions. For the three-parameter model, the study primarily tested LOGIST's simultaneous estimation procedure. Results indicated that equipercentile equating was very stable across the cases studied. Linear and Rasch model equating were very sensitive to violations of their models' assumptions. Results for the three-parameter model were disappointing. Paying close attention to test item properties was advised when selecting an equating method. (BS)

***********************************************************************

* Reproductions supplied by EDRS are the best that can be made *

* from the original document. *

***********************************************************************


An Exploration of the Robustness of Four Test Equating Models

Gary Skaggs

Robert W. Lissitz

University of Maryland

April 1985

U.S. DEPARTMENT OF EDUCATION

NATIONAL INSTITUTE OF EDUCATION

EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

This document has been reproduced as received from the person or organization originating it.

Minor changes have been made to improve reproduction quality.

Points of view or opinions stated in this document do not necessarily represent official NIE position or policy.

"PERMISSION TO REPRODUCE THIS MATERIAL HAS BEEN GRANTED BY

[signature illegible]

TO THE EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)."

Paper presented at the Annual Meeting of the American Educational Research Association

Chicago, IL

April 1985


AN EXPLORATION OF THE ROBUSTNESS OF FOUR TEST EQUATING MODELS

Gary Skaggs

Robert W. Lissitz

University of Maryland

The application of item response theory (IRT) to many measurement problems has been one of the major psychometric breakthroughs of the past twenty years. IRT methodology is currently being used in many large standardized testing programs, and at the same time, a great deal of research is being done to evaluate the robustness of the procedures under a variety of conditions. One of the most important applications of this model is in the area of test equating.

The purpose of test equating is to determine the relationship between raw scores on two tests that measure the same ability. Equating can be horizontal, between tests of equivalent difficulty and content, or vertical, between tests of intentionally different difficulties.

Most conventional approaches to test equating have been described as either linear or equipercentile methods (see Angoff, 1971). In contrast, approaches using item response theory (IRT) (Lord, 1980; Wright & Stone, 1979) are now widely used. In these IRT techniques, a mathematical relationship between raw scores on tests is modeled. This relationship is based on estimates of item parameters from the two tests and placement of these estimates on the same scale.

A number of equating studies using IRT methods have appeared in recent years. A majority of these have dealt with the one-parameter logistic, or Rasch, model. This model is the simplest of the IRT models but also the most demanding one in terms of its assumptions.

Several studies have found the Rasch model to be useful and appropriate for item calibration and linking (Tinsley & Dawis, 1975; Rentz & Bashaw, 1977; Guskey, 1981; Forsyth, Saisangjan, & Gilmer, 1981). On the other hand, a number of researchers have noticed problems with the Rasch model for vertical equating (Whitely & Dawis, 1974; Slinde & Linn, 1978, 1979; Loyd & Hoover, 1980; Holmes, 1982).

Some of the inconsistency can be attributed to different equating designs, different types of tests being equated, and different procedures used to analyze the results. Regardless of the cause, there are some fundamental concerns about the Rasch model. The most frequently postulated arguments concern the failure of the model to account for chance scoring, unequal discrimination, and multidimensionality. The last concern applies to more complex IRT models as well.

Research using the three-parameter logistic and other models has not been as plentiful as with the Rasch model, but it suggests the same interpretative difficulties. Most of the studies have examined the three-parameter model in the context of horizontal equating of general ability tests. This work has generally supported the use of the three-parameter model (Marco, Petersen, & Stewart, 1979; Kolen, 1981; Petersen, Cook, & Stocking, 1981). For vertical equating, results have been more mixed, with some studies finding the three-parameter model to be more effective than the Rasch model (Marco et al., 1979; Kolen, 1981). However, the comparison between the models has been shown to depend largely on the content of the tests being equated (Kolen, 1981; Petersen et al., 1981; Holmes & Doody-Bogan, 1983).

With all of this conflicting research, it is very difficult to make decisions about how to use IRT equating or whether to use it at all. The purpose of the present study is to explore how test equating results can be affected by the parameters of the items that make up the tests being equated.

Four methods of equating were chosen, representing popular versions of linear, equipercentile, Rasch model, and three-parameter model techniques. Data for two tests plus an external anchor test were generated from a three-parameter model in which mean test differences in difficulty, discrimination, and lower asymptote were manipulated. For Rasch model and linear equating, this study is an exploration of robustness when the model's assumptions are violated. For the three-parameter model, this study amounts to an examination of the parameter estimation strategy. For equipercentile equating, this study explores its effectiveness under a variety of test conditions.

METHOD

Data in this study were generated from the three-parameter logistic model:

$$P(u_{ij} = 1 \mid \theta_j, a_i, b_i, c_i) = c_i + (1 - c_i)\left[1 + \exp\left(-1.702\, a_i (\theta_j - b_i)\right)\right]^{-1} \qquad (1)$$

The response to item i by person j, a 0 or 1, was determined by comparing the probability defined by equation 1 to a random number drawn from a (0,1) uniform distribution. If the probability of a correct response exceeded the random number, the item was scored as correct. Otherwise, the item was scored as incorrect. The random numbers were produced from the GGUBS (IMSL, 1980) generator.
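The generation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: NumPy's default generator stands in for the IMSL routine, the function names `p_correct` and `simulate_responses` are hypothetical, and the example item parameters (difficulties uniform on +/- 2 logits, discriminations near .8, zero lower asymptotes) are one plausible configuration rather than the authors' actual values.

```python
import numpy as np

def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response.

    theta: abilities, shape (n_persons,); a, b, c: item parameters, shape (n_items,).
    Returns an (n_persons, n_items) array of probabilities.
    """
    z = -1.702 * a[None, :] * (theta[:, None] - b[None, :])
    return c[None, :] + (1.0 - c[None, :]) / (1.0 + np.exp(z))

def simulate_responses(theta, a, b, c, rng):
    """Score an item as correct (1) when its probability exceeds a uniform(0,1) draw."""
    p = p_correct(theta, a, b, c)
    u = rng.uniform(0.0, 1.0, size=p.shape)
    return (p > u).astype(int)

# Illustrative configuration (assumed, not taken from the paper's tables).
rng = np.random.default_rng(0)
n_items = 35
b = rng.uniform(-2.0, 2.0, n_items)   # difficulties across +/- 2 logits
a = rng.uniform(0.7, 0.9, n_items)    # discriminations around .8
c = np.zeros(n_items)                 # no chance scoring
theta = rng.normal(0.0, 1.0, 2000)    # abilities matched to mean difficulty
responses = simulate_responses(theta, a, b, c, rng)
raw_scores = responses.sum(axis=1)
```

Summing each row of `responses` gives the raw scores that the equating methods operate on.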

In all simulation cases, an external anchor test design was used. In each case, two data sets were generated. Each data set consisted of the responses of 2,000 examinees to a 35-item test plus an anchor test of 15 items. Each test equating case was therefore comprised of 4,000 examinees and 85 items. This size was chosen to be large enough to provide stable parameter estimates for both IRT models (Lissak, Hulin, & Drasgow, 1982).

Item and Ability Parameters

The item parameters used to generate the data were determined by manipulating lower asymptotes and mean test difficulty and discrimination of the tests being equated. For ease of reference, the two tests will be referred to as Test A and Test B.


Item difficulty was studied at three levels: 1) $\bar{b}_A = \bar{b}_B = 0$; 2) $\bar{b}_A = -.5$, $\bar{b}_B = .5$; and 3) $\bar{b}_A = -1.0$, $\bar{b}_B = 1.0$. For each test, difficulties were uniformly distributed across a range of +/- 2 logits.

Item discrimination was also examined at three levels: 1) $\bar{a}_A = \bar{a}_B = .8$; 2) $\bar{a}_A = .5$, $\bar{a}_B = 1.1$; and 3) $\bar{a}_A = 1.1$, $\bar{a}_B = .5$. In each test, discriminations were uniformly distributed across a range of +/- .1. Difficulties and discriminations were randomly paired.

Lower asymptote values were manipulated in four ways: 1) $c_A = c_B = 0$; 2) $c_A = c_B = .2$; 3) $c_A = 0$, $c_B = .2$; and 4) $c_A = .2$, $c_B = 0$. In each case, lower asymptote values were the same for all items within a test.

In this study, a complete crossing of all levels produced 36 cells, or cases, of pairs of tests to be equated. All anchor test items had a mean difficulty of zero and a mean discrimination of .8. For vertical equating, the anchor items represented an overlap in difficulty between Tests A and B. Lower asymptote values for the anchor items were all zero except in the cases where the values were .2 for both tests. In these cases, lower asymptotes were .2 for the anchor test items.
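The crossed design can be enumerated mechanically, which makes the count of 36 cases easy to check. The sketch below is illustrative only: the variable names are hypothetical, and the second difficulty level (-.5, .5) and second discrimination level (.5, 1.1) are assumptions inferred from later passages of the paper, because those values are damaged in this transcript.

```python
from itertools import product

# (Test A mean, Test B mean) for each factor level.
# Level 2 of difficulty and discrimination is an inferred assumption.
difficulty_levels = [(0.0, 0.0), (-0.5, 0.5), (-1.0, 1.0)]
discrimination_levels = [(0.8, 0.8), (0.5, 1.1), (1.1, 0.5)]
asymptote_levels = [(0.0, 0.0), (0.2, 0.2), (0.0, 0.2), (0.2, 0.0)]

cases = []
for (b_a, b_b), (a_a, a_b), (c_a, c_b) in product(
    difficulty_levels, discrimination_levels, asymptote_levels
):
    # Anchor items get c = .2 only when both tests have c = .2.
    anchor_c = 0.2 if (c_a == 0.2 and c_b == 0.2) else 0.0
    cases.append({
        "mean_b": (b_a, b_b),
        "mean_a": (a_a, a_b),
        "c": (c_a, c_b),
        "anchor": {"mean_b": 0.0, "mean_a": 0.8, "c": anchor_c},
        "vertical": b_a != b_b,
    })

print(len(cases))  # 3 x 3 x 4 crossing
```

Twelve of the resulting cases are horizontal (equal mean difficulty) and twenty-four are vertical.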

These item parameters were chosen to reflect a typical test equating between tests of either equal or unequal difficulty (to simulate horizontal or vertical equating). The abilities of the examinees were chosen to match each test's difficulty, an ideal situation and one found commonly in achievement testing. Each sample of 2,000 examinees was selected from a normal distribution with a mean equal to the mean difficulty of the test and a standard deviation of one. The GGNML (IMSL, 1980) generator was used to generate ability parameters for each sample.

Equating Methods

One linear, one equipercentile, and two item response theory (IRT) equating methods were chosen for this study on the basis of their popularity. In all cases, an external anchor test design was used, and Test B was equated to Test A. That is, for each raw score on Test B, an equivalent was found on the raw score scale of Test A. For vertical equating, Test B was always the more difficult test.

The linear equating method has been described by Angoff (1971) as Design IVA-1 and is a procedure derived by Levine (1955) for equally reliable tests. Equipercentile equating was accomplished using Levine's (1958) method, which has been described by Angoff as Design IVB. Cureton & Tukey's (1951) rolling weighted average method was used to smooth the cumulative distributions.

One of the IRT equating methods is based on the Rasch model. Item parameters were estimated using BICAL (Wright, Mead, & Bell, 1980), and the equating was done using procedures outlined by Wright & Stone (1979).

For three-parameter model equating, parameter estimates were obtained from LOGIST V (Wingersky, Barton, & Lord, 1982). Many versions of this program exist. For this study, a version adapted by ETS for a UNIVAC 1100 was used. For each equating case, item parameters for both tests and the anchor test were estimated simultaneously by employing LOGIST's option for not reached items. The equating then followed Lord's (1980) estimated true score equating procedure. For below-chance raw scores, Lord's (1980, p. 210) method of linear extrapolation was used.

Analysis Procedures

Since the data were generated from a known three-parameter model, these initial item parameters were used to develop a criterion for the test equating cases. This criterion was simply a pairing of raw scores corresponding to the same ability estimates:

$$\xi_A = \sum_i P_i(\theta), \qquad \xi_B = \sum_j P_j(\theta)$$

where the first sum runs over the items of Test A and the second over the items of Test B.
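One way to compute such a criterion numerically is to evaluate both test characteristic curves on a fine ability grid and interpolate. The following is a minimal sketch under assumptions, not the authors' procedure: the function names, the grid limits, and the example parameters (a vertical case with Test B one logit harder, equal discriminations, no chance scoring) are all hypothetical choices made for illustration.

```python
import numpy as np

def true_score(theta, a, b, c):
    """Expected raw score at each ability: the sum of 3PL item probabilities."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    p = c + (1.0 - c) / (1.0 + np.exp(-1.702 * a * (theta[:, None] - b)))
    return p.sum(axis=1)

def criterion_equating(scores_b, items_a, items_b, lo=-8.0, hi=8.0, n=4001):
    """Pair each Test B score with the Test A score at the same ability.

    items_a and items_b are (a, b, c) tuples of parameter arrays.
    Scores outside the range reached on the grid (for example, below the
    chance level) are clamped to the end points by np.interp.
    """
    grid = np.linspace(lo, hi, n)
    ts_a = true_score(grid, *items_a)
    ts_b = true_score(grid, *items_b)
    # ts_b increases with ability, so it can serve as the interpolation axis.
    return np.interp(scores_b, ts_b, ts_a)

# Hypothetical vertical case: Test B is one logit harder than Test A.
a = np.full(35, 0.8)
c = np.zeros(35)
b_a = np.linspace(-2.5, 1.5, 35)
b_b = np.linspace(-1.5, 2.5, 35)
scores_b = np.arange(1, 35)
equated = criterion_equating(scores_b, (a, b_a, c), (a, b_b, c))
```

Because Test B is harder in this example, every Test B score maps to a higher Test A equivalent, and the resulting function is curvilinear rather than a straight line.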

This equating function was then compared to the equating functions produced by the four equating methods. Besides plotting these results, two summary statistics were used to interpret the results. These statistics are very similar to mean square error statistics used in other equating studies (e.g., Marco, Petersen, & Stewart, 1979; Petersen, Cook, & Stocking, 1981). These indices are referred to here as the weighted and unweighted mean square error (MSE) and can be stated as follows:

$$\text{weighted MSE} = \frac{\sum_{i=1}^{k-1} f_i \,(X_E - X_{crit})^2}{\sum_{i=1}^{k-1} f_i \; S_B^2}$$

$$\text{unweighted MSE} = \frac{\sum_{i=1}^{k-1} (X_E - X_{crit})^2}{S_B^2}$$

where k equals the number of items on Test B, $S_B^2$ equals the raw score variance for Test B, $X_{crit}$ is the criterion test score equivalent on Test A for raw score i on Test B, $X_E$ is the Test A equivalent for raw score i that is produced by one of the equating methods, and $f_i$ is the frequency of raw score i on Test B. The summation is over raw score values, except that for the weighted MSE, the summation is only across that part of the scale where extrapolation was not necessary. Zero and perfect scores were excluded from all IRT equatings but included in both conventional equatings.
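A small sketch may help make the two indices concrete. It assumes the reading in which the weighted index divides the frequency-weighted sum of squared differences by the total frequency times the Test B raw score variance, and the unweighted index divides the plain sum of squared differences by that variance. The printed formulas are badly damaged in this transcript, so these normalizers are an assumption, and the function name `mse_indices` is hypothetical.

```python
import numpy as np

def mse_indices(x_equated, x_criterion, freq, var_b):
    """Return (weighted, unweighted) error indices for one equating.

    x_equated:   Test A equivalents produced by an equating method
    x_criterion: criterion Test A equivalents for the same Test B scores
    freq:        frequency of each Test B raw score
    var_b:       raw score variance of Test B
    The normalizers are assumed, as described in the lead-in.
    """
    d2 = (np.asarray(x_equated, float) - np.asarray(x_criterion, float)) ** 2
    freq = np.asarray(freq, float)
    weighted = np.sum(freq * d2) / (np.sum(freq) * var_b)
    unweighted = np.sum(d2) / var_b
    return weighted, unweighted

# Toy numbers, chosen only to show the arithmetic.
weighted, unweighted = mse_indices([1, 2, 3], [1, 1, 1], [1, 1, 2], 4.0)
print(weighted, unweighted)  # 0.5625 1.25
```

In practice the arrays would be restricted beforehand to the score range each index covers, for example by dropping zero and perfect scores for the IRT methods.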

RESULTS

Raw score means and standard deviations for all data sets are shown in Table 1. Raw score means ranged from approximately 17.5 to 21.3 and standard deviations from 5.0 to 7.7. By looking at data sets generated under similar item parameters, it is clear that the generation procedure produced very consistent results. An examination of the frequency distributions for each data set also revealed a high degree of consistency in the shapes of the distributions. This in turn suggests that there was a high degree of stability in the equatings.

As expected, a higher degree of item discrimination in the generating item parameters produced more dispersion in the raw score distributions. The reverse was true for low discrimination. Non-zero lower asymptotes produced negatively skewed raw score distributions.

The summaries of the two mean square error indices are presented in Tables 2 and 3. The first case, where test difficulties and discriminations were equal and lower asymptotes were zero, was a situation where the data fit the Rasch model. From a psychometric point of view, too, this represented an ideal (easy) equating situation. The best result for all four methods, almost perfect equating, was found in this situation. The worst results for all methods occurred where mean test difficulties and discriminations were unequal, where levels of chance scoring were unequal, and where low discrimination was paired with non-zero lower asymptotes on the more difficult test.

In general, the error indices for the equipercentile method were the lowest across all cases. This was followed by the three-parameter model. Values for the Rasch model and linear equating tended, naturally, to be relatively large in situations where their assumptions were violated.

To aid in the interpretation of Tables 2 and 3, repeated measures analyses of variance were performed on the two MSE indices. The results for the weighted MSE appear in Table 4 and for the unweighted MSE in Table 5. All effects involving a comparison of the four methods were significant. Figures 1 to 4 show cell plots of all means for the first and second order interactions between the four equating methods and independent variables. In each plot, the values shown are means pooled across the variable(s) not included in the plot.

Finally, the actual equating functions for each case are shown in Figure 5. In each plot, the solid line represents the criterion equating based on the initial item parameters. The four broken lines represent the results from the four equating procedures. The criterion equating in most cases was curvilinear, making linear equating clearly inappropriate. In most of the plots, Rasch and linear equating were the most deviant, while the equipercentile equating line was closest to the criterion, thus visually confirming the MSE values in Tables 2 and 3.

DISCUSSION

From a statistical viewpoint, the robustness with respect to violations of assumptions was tested in this study only for the linear and Rasch equating methods. For equipercentile equating, the results showed how the method responded to a variety of conditions. In the case of the three-parameter model, this study was primarily a test of LOGIST's simultaneous estimation procedure.

Linear Equating

The assumptions of the linear equating model are violated whenever the shapes of the raw score distributions differ for the two tests being equated. This occurred when the mean test discrimination and/or level of lower asymptotes differed between the two tests. The appropriateness of linear equating could be gauged by the degree of curvilinearity in the criterion equating function. The total error for linear equating was the smallest for horizontal equating with equally discriminating tests. Chance scoring did not affect the equating in these cases since the criterion equating function was still linear. Linear equating was clearly inappropriate for all vertical equating cases and for horizontal equating where mean test discriminations were unequal.


Equipercentile Equating

As can be seen in Tables 2 and 3, equating error for equipercentile equating was generally the lowest of all four methods. All MSE values were below .25, and it provided the smallest values in 30 of the 36 cases and in all the vertical equating cases. In the most extreme situations, this method was the only one of the four to produce what we would consider acceptable results. Perhaps one reason for this is that it is the only one of the four approaches not based on a model. It is simply the best fit of the data at hand. The issue of cross-validation might be important in some situations, but in this case, our preliminary work (not reported here) shows that the results are very stable.

In this version of equipercentile equating, a total group cumulative distribution was estimated for both tests based on the response of the combined samples to the anchor test items. That this estimation in conjunction with a smoothing routine worked so well was somewhat surprising.
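The core idea of equipercentile equating, matching scores that share a percentile rank, can be shown with a deliberately simplified sketch. It is not the anchor-based Design IVB procedure used in the study: it assumes both score distributions are observed directly on one group, it applies no smoothing, and the function names and toy frequencies are hypothetical.

```python
import numpy as np

def percentile_ranks(freq):
    """Midpoint percentile rank of each raw score 0..k from its frequency."""
    freq = np.asarray(freq, dtype=float)
    cum = np.cumsum(freq)
    return 100.0 * (cum - 0.5 * freq) / freq.sum()

def equipercentile(freq_b, freq_a):
    """Test A equivalent of each Test B raw score.

    Each Test B score is mapped to the (interpolated) Test A score that has
    the same percentile rank. Assumes few empty score cells, so that the
    Test A percentile ranks are increasing.
    """
    pr_b = percentile_ranks(freq_b)
    pr_a = percentile_ranks(freq_a)
    scores_a = np.arange(len(freq_a))
    return np.interp(pr_b, pr_a, scores_a)

# Toy five-point score scales; Test B is the easier test here.
freq_b = np.array([5, 20, 40, 25, 10])
freq_a = np.array([10, 25, 40, 20, 5])
eq = equipercentile(freq_b, freq_a)
```

In this toy case a Test B score of 3 maps to 2.75 on Test A, and the mapping is free to bend wherever the two distributions differ in shape, which is what a linear method cannot do.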

Rasch Model Equating

An examination of the results in Tables 2 and 3 suggests that the Rasch model was not very robust to violations of its assumptions. The first case in those tables and in Figure 5 shows a situation where the data fit the Rasch model for horizontal equating. The Rasch model, as expected, performed extremely well, as did the other three methods. In the second case in the tables and in Figure 5, all items had a lower asymptote of .2. Yet, the equating was still quite good for all methods.

In subsequent cases, where the level of chance scoring was unequal in the two tests and where test discriminations were unequal, the Rasch model performed very poorly. In situations where low discrimination was paired with non-zero lower asymptotes, the total error was relatively large. An explanation for these results can be found by looking at the estimation and linking procedure. When the BICAL program is faced with a data set, a metric is chosen so that the mean difficulty equals zero and all discriminations equal one. When BICAL runs are done for two tests with different properties, the resulting metrics are different, and estimated item difficulties for one test are more or less compressed than they should be. The use of an equating constant does not alter the underlying metric, and a bias is introduced into the equating.

That this bias can be severe can be seen in the next-to-last plot in Figure 5, in the case where low discrimination and chance scoring are both present in the more difficult test. The BICAL estimates for this case revealed a range of difficulty of -4.0 to 3.1 for Test A, but for Test B, the range was -1.5 to 1.2. Both tests were generated with a range of +/- 2 logits. Obviously, the metrics are quite different.

For vertical equating, where the data for each test fit the Rasch model, Rasch equating produced adequate results where the test difficulties differed by one logit. However, where tests differed by two logits in difficulty, the equating was not as good. One possible reason for this involves the anchoring procedure. Since the anchor items represented an overlap in the difficulty ranges of both tests, these items were very difficult for those taking Test A and very easy for those taking Test B. Consequently, estimation was not as accurate. An anchor test with a wider range of difficulty (see Loyd, 1983) might have alleviated this problem.

In the vertical equating cases, the error introduced by unequal mean discrimination and chance scoring was even more pronounced. Even where the same degree of chance scoring occurred on the two tests, Rasch equating was clearly inadequate. These results therefore corroborate, from a different methodological perspective, empirical results that advise against using the Rasch model to equate vertically whenever chance scoring is a possibility. These results also advise against using the Rasch model in any situation when mean test discriminations are unequal.

These problems would be especially difficult to overcome when one is constructing alternate forms from an item bank. To ensure that all tests formed from the bank had the same mean discrimination, all items in the bank would have to have the same discrimination, or a complicated algorithm for item selection would have to be programmed. Similarly, chance scoring would not be a problem only if all items had the same degree of chance scoring and the forms to be equated were of comparable difficulty. This is a difficult task for any test developer.

Three-Parameter Model Equating

Since the data were generated from a three-parameter model, one would expect three-parameter model estimation and equating to be quite accurate. An examination of the values in Tables 2 and 3 indicates that this was not always the case.

The plots in Figures 1 to 4 suggest that three-parameter equating was relatively unaffected by levels of test difficulty or chance scoring. However, the equating was affected by unequal discrimination. There was also an interaction between unequal discrimination and test difficulty and chance scoring. The greatest errors occurred where the more difficult test also had the lower discrimination and where a higher degree of chance scoring was paired with lower discrimination.

Since the data actually fit the model, the LOGIST estimation procedure as programmed should be held responsible for the success of the equating. In this study, simultaneous estimation was used. A single LOGIST run was used for each test equating by employing the "not reached" option. In every case for this study, the LOGIST estimation converged. However, for unequal discrimination and chance scoring, the program typically took at least 35 stages to converge. (For practical reasons, it was decided to extend the limit on the number of stages rather than produce continuation runs.)

That LOGIST was unable to recover the initial metric can be illustrated by looking at the parameter estimates from one of the cases. For the situation where the initial parameters for Tests A and B were $\bar{b}_A = -.5$, $\bar{b}_B = .5$, $\bar{a}_A = 1.1$, $\bar{a}_B = .5$, $c_A = 0$, and $c_B = .2$, the weighted and unweighted MSEs were .616 and .563, respectively. For Test A, the LOGIST difficulty estimates ranged from -3.14 to 1.72, while the original difficulties ranged from -2.5 to 1.5. After linearly transforming the LOGIST estimates to the original metric, however, the estimates ranged from -4.03 to 2.52. The LOGIST discriminations for Test A ranged from .8 to 1.3, compared to the original 1.0 to 1.2. After transformation, the range becomes .6 to .9.

For Test B, the LOGIST difficulties after transformation ranged from -1.37 to

2.55. The original span was from -1.5 to 2.5. However, the difficulties were

poorly estimated for the easiest half of the test. The LOGIST discriminations after

transformation ranged from .5 to 1.0, the original range being .4 to .6.
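The paper does not say how the linear transformation to the original metric was computed. As a generic illustration only, the sketch below uses the common mean/sigma approach: if abilities are rescaled as A·θ + B, difficulties transform as A·b + B and discriminations as a/A, which leaves every a(θ - b) term, and therefore every response probability, unchanged. The function names and the distorted example scale are hypothetical.

```python
import numpy as np

def mean_sigma_coefficients(b_source, b_target):
    """Slope A and intercept B that match the mean and SD of two difficulty sets."""
    A = np.std(b_target) / np.std(b_source)
    B = np.mean(b_target) - A * np.mean(b_source)
    return A, B

def rescale_items(a_source, b_source, A, B):
    """Move item parameters onto the target metric; lower asymptotes are unaffected."""
    return a_source / A, A * b_source + B

# Hypothetical example: estimates that differ from the generating values
# only by a linear change of scale are recovered exactly.
b_true = np.linspace(-2.5, 1.5, 35)
a_true = np.linspace(1.0, 1.2, 35)
b_est = (b_true - 0.4) / 1.25
a_est = a_true * 1.25
A, B = mean_sigma_coefficients(b_est, b_true)
a_back, b_back = rescale_items(a_est, b_est, A, B)
```

When estimates differ from the generating values by more than a change of scale, as the ranges reported above suggest, no single slope and intercept can bring difficulties and discriminations back into line at the same time.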

Ironically, the lower asymptotes were estimated reasonably well. The default options were used, and default values were obtained for the easiest items on both tests. Yet, no item had a c value greater than .1 on Test A, and only six items on Test B had c values less than .1.

Another peculiarity was observed in the LOGIST results across all cases. In each test, parameters for a few items (one to three out of 35) were estimated extremely poorly. These tended to occur more frequently on tests with weaker discriminations. No apparent reason for these outliers could be found, as all item responses were generated from the same function. Still, an erroneous decision on the quality of an item could be made from these results.


Clearly, in cases of unequal discrimination, LOGIST was unable to reproduce the original metric, and equating was therefore biased. The differences in discrimination were quite severe in this study, and it is not known how well LOGIST would respond to milder differences. On the other hand, the parameters for this study (sample size, test length, and ability distribution) were chosen to yield stable, reproducible estimates. The results suggest therefore that the use of the simultaneous estimation procedure of LOGIST is questionable in circumstances such as these. Some other method for transforming estimates to the same scale should be considered.

Comments on Analysis Procedures

A review of published equating studies reveals a wide variety of evaluation procedures and summary statistics. The degree to which methodology affected conclusions is not known in these studies. In this study, the weighted mean square error statistic was chosen because it has appeared frequently in the literature (e.g., Marco et al., 1979; Petersen et al., 1981). When the results from these statistics were compared to graphs of the equating functions (Figure 5), the weighted MSE values did not seem to represent some of the cases accurately. This was because there were relatively few persons in the raw score ranges where the greatest equating errors occurred, at the lower end of the distribution.

Because of this, the unweighted MSE was also computed. Because each raw score counted equally with this statistic, the values tended to be higher than for the weighted statistics. In Figures 1 to 4, the weighted MSE values appear on the left-hand side and the unweighted values on the right. A comparison of the two sets of plots suggests that the two sets of MSE values turned out to be very similar for equipercentile and three-parameter model methods. For linear and Rasch model equating, quite different results appeared in some of the plots.


Other abnormalities appear when looking at the plots in Figures 1 to 4. For example, in Figure 1, the symmetry in the study's design does not appear in the MSE values for levels of the discrimination and lower asymptote independent variables. Tests A and B alternate between mean discriminations of .5 and 1.1 and lower asymptotes of 0 and .2. Yet, the higher MSE values occur when Test A has the higher discrimination. The same thing occurs when there is more chance scoring on Test B than on Test A. If one examines symmetrical designs (third, fourth, fifth, and ninth plots) in Figure 5, the plots appear to be mirror images of one another.

The paradox can be explained by the fact that the MSE statistic uses vertical distances from the plots. If horizontal distances were used (i.e., Test A equated to Test B), the pattern of results would be reversed. This analysis calls into question the use of MSE statistics for this purpose. A great deal of theoretical statistical work is needed in the area of proper error indexing.

CONCLUSIONS

The purpose of this study was to examine how four commonly used test equating procedures would respond to situations in which the properties of the two tests being equated were different. The results indicated that equipercentile equating was very stable across the cases studied. Linear and Rasch model equating were very sensitive to violations of their models' assumptions. Rasch model equating showed robustness only for horizontal equating where the degree of chance scoring was the same for both tests.

When data fit the Rasch model, three-parameter model equating and Rasch equating achieved comparable and accurate results. In all other cases, three-parameter equating was far better than the Rasch model but generally not as good as equipercentile equating. The results for three-parameter model equating were disappointing since the data were generated from a three-parameter model. Simultaneous estimation using LOGIST seemed unable to recover the original metric, especially when mean test discriminations were unequal.

All of the equating methods were affected by some situations. Where the tests
being equated differed in difficulty, mean discrimination, and in their degree of
chance scoring, the equating error was the largest for all four methods. This
suggests that equating tests should not be attempted under such extreme conditions.
None of the equating methods could completely overcome the effect of such divergence
in item type.

The use of the MSE statistics produced several paradoxes in the results.
These could be resolved by examining the equating functions themselves. Certainly,
more statistical work needs to be done in the comparison of test characteristic
curves.

Finally, all of the data for this study were generated from a unidimensional
three-parameter model. Real data do not exactly conform to this model, although it
seems reasonable in a wide variety of situations. How these methods would respond
to multidimensional data is not known, but problems for both IRT methods were
uncovered in the unidimensional case.
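
For readers unfamiliar with the generating model, the following is a minimal sketch of how dichotomous responses can be simulated from a unidimensional three-parameter logistic model. The scaling constant D = 1.7, the normal distributions for ability and difficulty, the random seed, and all function names are assumptions made for illustration; the generation procedure actually used in this study may differ in its details.

```python
import numpy as np

D = 1.7  # conventional logistic scaling constant (assumed here)

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def simulate_responses(thetas, a, b, c, rng):
    """Return an examinee-by-item matrix of 0/1 responses."""
    p = p_3pl(thetas[:, None], a[None, :], b[None, :], c[None, :])
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(0)
n_examinees, n_items = 2000, 35

thetas = rng.normal(0.0, 1.0, n_examinees)  # hypothetical ability distribution
a = np.full(n_items, 0.8)                   # discrimination
b = rng.normal(0.0, 1.0, n_items)           # hypothetical difficulty spread
c = np.full(n_items, 0.2)                   # lower asymptote (chance scoring)

responses = simulate_responses(thetas, a, b, c, rng)
raw_scores = responses.sum(axis=1)
print(responses.shape, raw_scores.mean())
```

Setting every c to 0 and every a to a common value reduces this to a model of the Rasch form, which is why unequal discriminations and chance scoring count as violations of the Rasch assumptions.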

This study supported other research that found the Rasch model
inappropriate for use in vertical equating situations. The three-parameter model
procedure used here also did not generally produce acceptable results in more
complex situations where we might have expected it to do so. The best advice at
this point would seem to be to pay very close attention to the properties of the
test items. If the tests differ very much in their properties, then classic
equipercentile equating is suggested.
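
The recommended method can be sketched as follows. This is a bare-bones, unsmoothed version that assumes a midpoint definition of percentile rank and linear interpolation between observed scores; operational equipercentile equating usually adds smoothing, and the function names and toy data are hypothetical.

```python
import numpy as np

def percentile_rank(scores, x):
    """Proportion of scores below x plus half the proportion equal to x."""
    scores = np.asarray(scores)
    below = np.sum(scores < x)
    equal = np.sum(scores == x)
    return (below + 0.5 * equal) / scores.size

def equipercentile_equate(x, scores_a, scores_b):
    """Return the Test B score that has the same percentile rank as x has on Test A."""
    pr = percentile_rank(scores_a, x)
    return float(np.quantile(scores_b, pr))

# Toy data: Test B scores are Test A scores shifted up by three points.
scores_a = np.arange(0, 41)
scores_b = scores_a + 3

print(equipercentile_equate(20, scores_a, scores_b))
```

Because the method only matches percentile ranks, it makes no assumption about the shape of the relationship between the two score scales, which is consistent with the stability reported in this study.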


REFERENCES

Angoff, W.H. (1971) Scales, norms, and equivalent scores. In R. L. Thorndike (ed.), Educational Measurement (2nd ed.). Washington, D.C.: American Council on Education.

Cureton, E.E. & Tukey, J.W. (1951) Smoothing frequency distributions, equating tests, and preparing norms. American Psychologist, 6, 404. (abstract)

Divgi, D.R. (1981) Does the Rasch model really work? Not if you look closely. Paper presented at annual meeting of American Educational Association, Los Angeles.

Forsyth, R., Saisangjan, U., & Gilmer, J. (1981) Some empirical results related to the robustness of the Rasch model. Applied Psychological Measurement, 5, 175-186.

Guskey, T.R. (1981) Comparison of a Rasch model scale and the grade-equivalent scale for vertical equating of test scores. Applied Psychological Measurement, 5, 187-201.

Holmes, S.E. (1982) Unidimensionality and vertical equating with the Rasch model. Journal of Educational Measurement, 19, 139-147.

Holmes, S.E. & Doody-Bogan, E.N. (1983) An empirical study of vertical equating methods using the three-parameter logistic model. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal.

Hulin, C.L., Lissak, R.I., & Drasgow, F. (1982) Recovery of two- and three-parameter logistic item characteristic curves: A Monte Carlo study. Applied Psychological Measurement, 6, 249-260.

IMSL, Inc. (1980) International Mathematical and Statistical Libraries Reference Manual. Houston, TX.

Kolen, M.J. (1981) Comparison of traditional and item response theory methods for equating tests. Journal of Educational Measurement, 18, 1-11.

Levine, R.S. (1955) Equating the score scales of alternate forms administered to samples of different ability. Research Bulletin No. 23. Princeton, NJ: Educational Testing Service.

Levine, R.S. (1958) Estimated national norms for the Scholastic Aptitude Test. Statistical Report No. 1. Princeton, NJ: Educational Testing Service.

Lord, F.M. (1980) Practical application of item response theory. Hillsdale, NJ: Lawrence Erlbaum.

Loyd, B.H. (1983) A comparison of the stability of selected vertical equating methods. Paper presented at the annual meeting of the American Educational Research Association, Montreal.

Loyd, B.H. & Hoover, H.D. (1980) Vertical equating using the Rasch model. Journal of Educational Measurement, 17, 179-193.

Marco, G.L., Petersen, N.S., & Stewart, E.E. (1979) A test of the adequacy of curvilinear score equating model. Paper presented at the Computerized Adaptive Testing Conference, Minneapolis.

Petersen, N.S., Cook, L.L., & Stocking, M.L. (1981) IRT versus conventional equating methods: A comparative study of scale drift. Paper presented at annual meeting of American Educational Research Association, Los Angeles.

Slinde, J.A. & Linn, R.L. (1978) An exploration of the adequacy of the Rasch model for the problem of vertical equating. Journal of Educational Measurement, 15, 23-35.

Slinde, J.A. & Linn, R.L. (1979) A note on vertical equating via the Rasch model for groups of quite different ability and tests of quite different difficulty. Journal of Educational Measurement, 16, 159-165.

Tinsley, H.E. & Dawis, R.V. (1975) An investigation of the Rasch simple logistic model: Sample-free item and test calibration. Educational and Psychological Measurement, 35, 325-339.

Whitely, S.E. & Dawis, R.V. (1974) The nature of objectivity with the Rasch model. Journal of Educational Measurement, 11, 163-178.

Wingersky, M.S., Barton, M.H., & Lord, F.M. (1983) LOGIST: A computer program for estimating examinee ability and item characteristic curve parameters. LOGIST 5, version 1. Princeton, NJ: Educational Testing Service.

Wright, B.D., Mead, R.J., & Bell, S.R. (1980) BICAL: Calibrating items with the Rasch model. Research Memorandum No. 23C. Chicago: Statistical Laboratory, Department of Education, University of Chicago.

Wright, B.D. & Stone, M.H. (1979) Best test design. Chicago, IL: Mesa Press.


Table 1

Raw Score Means and Standard Deviations for Generated Data

Test A                                   Test B

e Haan $ .0 ; : e 16.J...n Lai0 .1 .0 17.54 7.23 0 .1 .0 7.5S 1 210 .1 :2 21.01 5.19 0 A 210 Al .0 17.79 7 .12 ' 0

.2. .15 .10.1 .2 XJ5 s.00

0 1 .2 21.20 5j$ : 0 .1 .0 17 .6 ., 0 30

o .5 .0 17.31 1j.3 0 1.1 .0, *

0 .20 .5- .0

20.55 4.5117.75 6.00 -- 0

01.1---.T2---2-.04 6.241.1 .2 21.01 6j11. *

0 .S .2 20_45-5-.09- 0 1.1 .0 17.3* 7.5Q

--------- ---0 1 .1 .0 17.41 7 .55 . 0 .5 Ji 17.3*0 1.1 .2 20.53 1.31 0. .5 .2 21.14 5.00 1.1 .0

1.117.12 7.12 0 .5 .2 2035 5.020 .2 21.20 1..36 0 .5 .0 17.63 5.50

.1 .0.5.1 .2

.0 .0

17.34 7.0120.52 5417.57 7.21

.s

.5

.5

.1 .o 17.42 7.05

.9 .2 20.3* 5 A0.9 .2 21.01 5 Ag

-.s .2 21.04 5.17 .5 .1 .0 37.35 47-.5 5 .0-a .5 .2

17.i321.16 5.07

.5.5

1.1 .0 17.41 7.601.1 .2 20.51 E.37

-.5 .5 .0 3.7. 6.04 .5 1.1 .2 21.16 6.27-15 -5 .2 20.97 5.10 .5 1.1 .0 17.55 7.70

....5 1.1 .0...5 1.1 .2

17.51 _ .4021.03 32

.5.55 6.5.5

.22217..0571

....7 1 .1 .01.1 .2

17.49 7.,20.89 6.42

.5 .5 ..2 20.63 5.15.3 .0 17 .39

-1.0 -1 .0 17.41 7.00 1.0 .1 .0 17 .37 7 44-1.0 .1 .2 20.14 5.50 1.0 . .1 .3 20.11 .6 .02-1.0 .1 .0 17.21 7.06 1 .0 .1 .2 21.04-1.0 9 .2 21.11 S AI 1.0 .9 .0 17.13 7.0$-1.0 .51, .0 12.47 1.12 1.0 1.1 .0 17 AO 7 .40-1.0 -5 -2 21.00 5.14 1.0 1.1 .2 20.75 .32-1.0 .5 .0 17.43 1.05 1.0 1.1 .2 20.55 - .41

A .5 .2 21.14 5.14 1.0 1.1 .0 17.60 .7.41

-1.0 1.1 .0 17.65 715 1.0 .5 .0 17.13-1.0 1.1 .2 20.51 1.46 1.0 .5 .2 20.57 5.15.-2 A 1 J. .0 17.55 7.41 1.0 .5 .2 2054-140 1.1 .2 n.ss 6.32 1.0 .5 .0 17.40 6.00


Table 2

Unweighted Mean Square Error for Test Equating Cases

   Test A        Test B
   a     c       a     c      LINEAR   EQ %ILE    RASCH   3-PARA

bA = 0.0; bB = 0.0
   .8    .0      .8    .0       .000     .000      .000     .000
   .8    .2      .8    .2       .003     .004      .000     .001
   .8    .0      .8    .2       .088     .010      .231     .045
   .8    .2      .8    .0       .003     .016      .172     .037
   .5    .0     1.1    .0       .037     .043      .501     .146
   .5    .2     1.1    .2       .086     .036      .309     .133
   .5    .0     1.1    .2       .041     .042      .140     .082
   .5    .2     1.1    .0       .028     .039      .890     .204
  1.1    .0      .5    .0       .094     .074      .854     .271
  1.1    .2      .5    .2       .195     .079      .422     .214
  1.1    .0      .5    .2       .705     .132     1.955     .563
  1.1    .2      .5    .0       .060     .041      .158     .111

bA = -.5; bB = .5
   .8    .0      .8    .0       .227     .008      .054      ---
   .8    .2      .8    .2       .369     .055      .185     .043
   .8    .0      .8    .2       .391     .023      .826     .120
   .8    .2      .8    .0       .163     .008      .120     .058
   .5    .0     1.1    .0       .160     .040       ---      ---
   .5    .2     1.1    .2       .479     .069       ---      ---
   .5    .0     1.1    .2       .387     .078       ---      ---
   .5    .2     1.1    .0       .128     .036       ---      ---
  1.1    .0      .5    .0       .605     .107      .945     .288
  1.1    .2      .5    .2       .427     .095      .982     .270
  1.1    .0      .5    .2       .819     .175     2.817     .563
  1.1    .2      .5    .0       .426     .040      .352     .131

bA = -1.0; bB = 1.0
   .8    .0      .8    .0      1.326     .011      .161     .133
   .8    .2      .8    .2      1.693     .072      .672     .142
   .8    .0      .8    .2      2.383     .112     1.787     .228
   .8    .2      .8    .0      1.109     .013      .183     .117
   .5    .0     1.1    .0       .916     .059      .443     .275
   .5    .2     1.1    .2      1.386     .178      .420     .182
   .5    .0     1.1    .2      1.807     .090     1.174     .343
   .5    .2     1.1    .0       .740     .058      .442     .212
  1.1    .0      .5    .0      3.099     .122     1.286     .519
  1.1    .2      .5    .2      2.455     .107     2.000     .477
  1.1    .0      .5    .2      3.967     .234     4.585     .478
  1.1    .2      .5    .0      2.168     .032      .807     .216

--- = value illegible in the scanned source


Table 3

Weighted Mean Square Error for Test Equating Cases

   Test A        Test B
   a     c       a     c      LINEAR   EQ %ILE    RASCH   3-PARA

bA = 0.0; bB = 0.0
   .8    .0      .8    .0       .000     .000      .000     .000
   .8    .2      .8    .2       .001     .002      .000     .001
   .8    .0      .8    .2       .004     .004      .132     .030
   .8    .2      .8    .0       .002     .001      .014     .005
   .5    .0     1.1    .0        ---     .023      .382     .104
   .5    .2     1.1    .2       .029     .023      .251     .119
   .5    .0     1.1    .2       .021     .022      .120     .073
   .5    .2     1.1    .0       .019     .019      .521     .097
  1.1    .0      .5    .0       .081     .085      .132     .251
  1.1    .2      .5    .2       .081     .015      .390     .244
  1.1    .0      .5    .2        ---     .151     1.430     .510
  1.1    .2      .5    .0       .040     .029      .129     .101

bA = -.5; bB = .5
   .8    .0      .8    .0       .108     .010      .058     .042
   .8    .2      .8    .2       .113     .002      .091     .059
   .8    .0      .8    .2       .198     .014      .215     .109
   .8    .2      .8    .0       .104     .008      .144     .057
   .5    .0     1.1    .0       .132     .031      .444     .190
   .5    .2     1.1    .2       .100     .025      .273     .144
   .5    .0     1.1    .2       .191     .039      .358     .193
   .5    .2     1.1    .0       .111     .024      .394     .110
  1.1    .0      .5    .0       .183     .047      .494     .258
  1.1    .2      .5    .2       .190     .039      .512     .245
  1.1    .0      .5    .2       .417     .144     1.413     .414
  1.1    .2      .5    .0       .121     .034      .291     .114

bA = -1.0; bB = 1.0
   .8    .0      .8    .0       .870     .014      .140     .153
   .8    .2      .8    .2       .775     .008      .194     .155
   .8    .0      .8    .2      1.484     .028      .444     .221
   .8    .2      .8    .0       .779     .013      .230     .123
   .5    .0     1.1    .0       .830     .048      .514      ---
   .5    .2     1.1    .2       .557     .021      .358      ---
   .5    .0     1.1    .2      1.047     .034      .792     .410
   .5    .2     1.1    .0       .441     .038      .480     .272
  1.1    .0      .5    .0      1.317     .031      .340     .215
  1.1    .2      .5    .2      1.001     .033      .543     .338
  1.1    .0      .5    .2      2.134     .093     1.142     .485
  1.1    .2      .5    .0       .910     .014      .387     .157

--- = value illegible in the scanned source


Table 4

Analysis of Variance of Unweighted Mean Square Error

Source                      df       MS           F

Among Ss
  Difficulty (DI)            2     5.9676     3140.642
  Discrimination (DS)        2     3.6789     1936.263
  Lower Asymptote (L)        3     1.7128      901.474
  DI x DS                    4      .4974      261.789
  DI x L                     6      .3305      173.947
  DS x L                     6      .5190      273.158
  DI x DS x L               12      .0019         .132

Within Ss
  Method (m)                 3     5.2413      363.979
  DI x m                     6     2.3454      162.875
  DS x m                     6      .7985       55.451
  L x m                      9      .4800       33.333
  DI x DS x m               12      .2033       14.118
  DI x L x m                18      .1141        7.924
  DS x L x m                18      .2231       15.493
  DI x DS x L x m           36      .0144
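
The F ratios printed in Table 4 can be checked with a few lines of arithmetic. The sketch below assumes, as an inference from the printed values rather than a statement in the text, that each between-case effect is tested against the highest-order interaction (DI x DS x L, mean square .0019). The variable names are hypothetical.

```python
# Mean squares and F ratios as printed for the between-case effects in Table 4.
ms_error = 0.0019  # DI x DS x L, assumed here to serve as the error term
effects = {
    "Difficulty (DI)": (5.9676, 3140.642),
    "Discrimination (DS)": (3.6789, 1936.263),
    "Lower Asymptote (L)": (1.7128, 901.474),
    "DI x DS": (0.4974, 261.789),
    "DI x L": (0.3305, 173.947),
    "DS x L": (0.5190, 273.158),
}

# Relative difference between the recomputed and the printed F ratio.
rel_diffs = {}
for name, (ms, f_printed) in effects.items():
    f_computed = ms / ms_error
    rel_diffs[name] = abs(f_computed - f_printed) / f_printed
    print(f"{name:22s} computed F = {f_computed:9.3f}   printed F = {f_printed:9.3f}")
```

Under that assumption the recomputed ratios agree with the printed ones to within about one part in ten thousand, which is small enough to be explained by rounding or by a misread digit in the scan.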


Table 5

Analysis of Variance of Weighted Mean Square Error

Source                      df       MS           F

Among Ss
  Difficulty (DI)            2     1.345       336.657**
  Discrimination (DS)        2      .712       174.510**
  Lower Asymptote (L)        3      .412       100.980**
  DI x DS                    4      .002          .490
  DI x L                     6      .034         8.333**
  DS x L                     6      .156        38.235**
  DI x DS x L               12      .004

Within Ss
  Method (m)                 3     1.178       136.028**
  DI x m                     6      .762        87.991**
  DS x m                     6      .124        14.319**
  L x m                      9      .055         6.351**
  DI x DS x m               12      .041         4.734**
  DI x L x m                18      .026         3.002**
  DS x L x m                18      .030         3.464**
  DS x DI x L x m           36      .009


Figure 1. Cellplots of interactions between equating methods and difficulty, discrimination, and lower asymptote

[Plots not reproduced. Panels: difficulty, discrimination, and lower asymptotes. Vertical axes: weighted MSE and unweighted MSE. Lines: linear, equipercentile, Rasch, three-parameter.]


Figure 2. Cellplots of interaction between equating methods, difficulty, and discrimination

[Plots not reproduced. Vertical axes: weighted MSE and unweighted MSE. Cells are labeled by the mean discriminations and mean difficulties of Tests A and B. Lines: linear, equipercentile, Rasch, three-parameter.]


Figure 3. Cellplots of interaction between equating methods, discrimination, and lower asymptote

[Plots not reproduced. Vertical axes: weighted MSE and unweighted MSE. Cells are labeled by the lower asymptotes and mean discriminations of Tests A and B.]


Figure 4. Cellplots of the interaction between equating methods, difficulty, and lower asymptote

[Plots not reproduced. Vertical axes: weighted MSE and unweighted MSE. Cells are labeled by the lower asymptotes and mean difficulties of Tests A and B.]


Figure 5

Equating functions for four equating methods

[Thirty-six plots, two per page, not reproduced. Each plot shows the equating functions with Test A raw score on the vertical axis and Test B raw score on the horizontal axis, both running from 0 to 40. Each plot is labeled with the mean difficulty (b), mean discrimination (a), lower asymptote (c), and mean ability (θ) for Test A and for Test B, for example "Test A: b = -.5; a = .8; c = .2; θ = -.5. Test B: b = .5; a = .8; c = .2; θ = .5".]

The plots appear in three blocks of twelve:

  Plots 1-12:   Test A: b = 0, θ = 0.0;       Test B: b = 0, θ = 0.0
  Plots 13-24:  Test A: b = -.5, θ = -.5;     Test B: b = .5, θ = .5
  Plots 25-36:  Test A: b = -1.0, θ = -1.0;   Test B: b = 1.0, θ = 1.0

Within each block the twelve plots follow this order:

        Test A        Test B
        a     c       a     c
   1    .8    .0      .8    .0
   2    .8    .2      .8    .2
   3    .8    .0      .8    .2
   4    .8    .2      .8    .0
   5    .5    .0     1.1    .0
   6    .5    .2     1.1    .2
   7    .5    .0     1.1    .2
   8    .5    .2     1.1    .0
   9   1.1    .0      .5    .0
  10   1.1    .2      .5    .2
  11   1.1    .0      .5    .2
  12   1.1    .2      .5    .0