26
Running Head: Fit indices for unidimensionality An Assessment of Various Item Response Model and Structural Equation Model Fit Indices To Detect Unidimensionality Sam Elias University of Western Australia John Hattie University of Auckland Graham Douglas University of Western Australia Address all correspondence to: John Hattie Faculty of Education, The University of Auckland Private Bag 92019 Auckland New Zealand [email protected]

An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

  • Upload
    vodien

  • View
    216

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Running Head: Fit indices for unidimensionality

An Assessment of Various Item Response Model and

Structural Equation Model Fit Indices

To Detect Unidimensionality

Sam Elias University of Western Australia

John Hattie University of Auckland

Graham Douglas University of Western Australia

Address all correspondence to:

John Hattie

Faculty of Education, The University of Auckland

Private Bag 92019

Auckland

New Zealand

[email protected]

Page 2: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

Abstract

A fundamental problem in structural modeling and item response modeling is

to determine whether a set items is unidimensional. Both models have a plethora of

fit indicators, and the aim of this study is to compare the Stout T-statistic derived

within item response modeling with many competing indices used in structural

equation modeling to assess unidimensionality. Given that it is rare to find a

“perfectly” unidimensional test, a further aim is to ascertain how many items from a

second dimension are necessary before the various indices indicate that the item set

is no longer “essentially” unidimensional.

Elias, Hattie & Douglas Page 2

Page 3: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

Within both item response theory (IRT) and structural equation modeling

(SEM) there has been a plethora of indicators aimed to help the user make decisions

about the adequacy of fit of data to various models. From the IRT standpoint, one of

the more successful indicators of dimensionality, or number of factors, is the Stout

T-statistic (described below). It has a theoretical justification, and has been shown to

most powerful at detecting one versus more than one dimension/factor. A major aim

of the present study is to assess the performance of the Stout T statistic with many of

the more defensible fit statistics commonly used within structural modeling, and to

suggest that this statistic may be of much value to SEM users. The article also

indicates how various attributes of the items (such as item difficulty/threshold,

discrimination, and guessing) can affect the performances of many of the commonly

used fit indices.

Under the common factor or structural equation models, responses to a set of

observed variables by N subjects are summarized by a sample covariance matrix. A

hypothesized population covariance matrix summarized by a number of parameters

(in CFA the chosen parameters may for example include restrictions on the residual

variances, factor loadings, factor variances and covariances) is then compared to the

sample covariance matrix for fit. A chi-square based statistic is used to provide the

goodness-of-fit test to assess whether the residual differences from the hypothesized

and observed covariance matrices are sufficiently small, and whether the

hypothesized covariance matrix should be accepted. The problems of this chi-square

statistic are well documented (Bentler & Bonett, 1980), with particular reference to

the undue influence of the sample size on the magnitude of the statistic.

With few exceptions, most of the currently used goodness-of-fit statistics are

ad hoc, and it is rare to find a theoretically justifiable criterion. Exceptions include

Elias, Hattie & Douglas Page 3

Page 4: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

the more recent RMSEA or Dk statistic, and the Stout T-statistic. In a study aiming

to assess the performance of a large set of goodness-of-fit statistics, Marsh, Balla and

McDonald (1980) reported five such indices that appeared to be relatively

independent of sample size. These indices are the fitting function incremental type 2

index (FFI2), the likelihood ratio incremental type 2 index (LHRI2), the chi-square

incremental type 2 index (CHII2), the Tucker-Lewis index (TLI), and the Cudeck

and Brown rescaled Akaike information criterion incremental type 2 index (CAKI2).

In a later publication, Marsh and Balla (1994) recommended the use of five further

indices, McDonald’s Dk index (also referred to as RMSEA), McDonald’s Mc index,

the goodness-of-fit index (GFI), the adjusted goodness-of-fit index (AGFI), and the

relative noncentrality index (RNI). These ten indices are based on variations of the

basic chi-square statistic, and each index is scaled to lie on an interval in which the

end points are defined for a perfect model fit and for no model fit. Table 1 presents

the ten effective CFA indices, the original investigator(s) of each index, and the

formula.

Four of these indices Dk, Mc, GFI and AGFI are considered “stand-alone”

indices, as they are based on a hypothesized (target) model fitting the sample data.

The remaining are considered Incremental Type 2 Indices as they are based on the

difference between a target model and an alternative model such as a null model. A

Type 2 index is defined as |t-n|/|e-n|, where t is the value of the stand-alone index for

the target (hypothesized) model, n is the value of the index for the null model (all the

variables are assumed to be uncorrelated), and e is the expected value of the stand-

alone index if the target model is true. The six incremental Type 2 indices are scaled

to vary on the interval 0-1, with 1 representing a perfect fit and 0 representing the fit

Elias, Hattie & Douglas Page 4

Page 5: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

of a null model; however, sampling fluctuations may produce values outside this

range.

A major purpose of fit indices is to detect model misspecification rather than

the adequacy of modeling the null hypothesis. Those who use structural models and

factor analysis need indices to assist in detecting well-fitting from badly-fitting

models (see Maitai & Mukherjee, 1991). Most of the evaluations of these indices,

however, have been related to the sensitivity of these indices to sample size or to the

estimation method. There are far fewer evaluations as to their sensitivity to model

misspecification. Hu and Bentler (1998) assessed various indices under differing

conditions of under-parameterization and concluded that CAK, GFI, and AGFI are

not sensitive to misspecification; and TLI, RNI, Mc and RMSEA are moderately

sensitive to simple model misspecification (they also found interaction effects with

method of estimation). They called for more studies on the performance of fit

indices under various types of correct and misspecified models.

The Stout T-statistic

The principle of local independence requires that responses to different items

are uncorrelated when θ (the person’s ability) is fixed, although it does not require

that items be uncorrelated over groups in which θ varies. As many have

demonstrated, the statement of the principle of local independence contains the

mathematical definition of latent traits. That is, θ1, .. .,θk are latent traits, if and only

if they are quantities characterizing examinees such that, in a subpopulation in which

they are fixed, the ability scores of the examinees are mutually statistically

independent. Thus, a latent trait can be interpreted as a quantity that the items

Elias, Hattie & Douglas Page 5

Page 6: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

measure in common, since it serves to explain all mutual statistical dependencies

among the items. As it is possible for two items to be uncorrelated and yet not be

entirely statistically independent, the principle is more stringent than the factor

analytic principle that their residuals be uncorrelated. If the principle of local

independence is rejected in favor of some less restrictive principle then it is not

possible to retain the definition of latent traits, since it is by that principle that latent

traits are defined. It is possible, however, to reject or modify assumptions as to the

number and distributions of the latent traits and the form of the regression function

(e.g., make it nonlinear instead of linear), without changing the definition of latent

traits.

A set of items can be said to be unidimensional when it is possible to find a

vector of values θ = (θi ) such that the probability of correctly answering an item k

is π ik = f k(θ i ) and local independence holds for each value of θ . This definition is

not equating unidimensionality with local independence, because it can further

require that it is necessary to condition only on one θ dimension and that the

probabilities pik can be expressed in terms on only one dimension.

McDonald (1981) outlined ways in which it is possible to weaken the strong

principle, in his terminology, of local independence. The strong principle implies

that not only are the partial correlations of the test items zero when the latent traits

are partialled out, but also the distinct items are then mutually statistically

independent and their higher joint moments are products of their univariate moments.

A weaker form is to ignore moments beyond the second order and test the

dimensionality of test scores by assessing whether the residual covariances are zero

(see also Lord & Novick, 1968, pp. 225, 544–545). McDonald (1979, 1981) argued

Elias, Hattie & Douglas Page 6

Page 7: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

that this weakening of the principle does not create any important change in anything

that can be said about the latent traits, though strictly it weakens their definition.

Stout (1987, 1990) also used this weaker form to develop his arguments for

“essential unidimensionality”. He devised a statistical index based on the

fundamental principle that local independence should hold approximately when

sampling from a subpopulation of examinees of approximately equal ability. He

defined essential unidimensionality as: a test (U1,. .. ,UN ) of length N is said to be

essentially unidimensional if there exists a latent variable θ such that for all values

of θ ,

1N(N −1)

Cov(Ui ,Uj |θ )1<i≠ j<N∑ ≈ 0. (1)

That is, on average, the conditional covariances over all item pairs must be

small in magnitude (for more detail regarding the theoretical developments see

Junker, 1991; Junker & Stout 1991; Nandakumar, 1987, 1991; Nandakumar & Stout,

1993; Stout, 1990). Essential unidimensionality thus replaces local independence

and is based on assessing only the dominant dimensions.

Stout then developed an empirical notion of unidimensionality to match his

weaker form. Either a subjective analysis of item content or an exploratory factor

analysis is used to develop a core set of items, and this set is termed by Stout the

assessment subtest. The remaining set of items, termed the partitioning subtest, is

used to partition examinees into groups for a stratified analysis. When the total set

of items is unidimensional, then the assessment and partitioning set are both

unidimensional, but when the dimensionality is greater than one, then “the

Elias, Hattie & Douglas Page 7

Page 8: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

partitioning subtest will contain many items that load heavily on at least one other

dimension not measured by the assessment subtest”.

Stout (1990) defined essential dimensionality (dE) of an item pool as the

minimum dimensionality necessary to satisfy the assumption of essential

independence. When dE=1, essential unidimensionality is said to hold. Essential

dimensionality only counts the dominant traits that are required to satisfy the

assumption of essential independence. Stout introduced a T statistic, which will be

small when the assumption of essential independence holds within subgroups and H0

will be accepted (the test is essentially unidimensional). When the assumption of

essential independence fails within subgroups, T will be large and H0 will be

rejected.

There have been recent reviews of the Stout procedure that have demonstrated

the power of the procedure (Hattie et al., 1996), and refinements that have improved

some of the steps within the Stout procedure (Stout, 1996, 1997). A more detailed

description of these improvements to the Stout method can be found in Elias (1997),

Kim, Zhang, & Stout (in press), Nandakumar and Stout (1993), and Nandakumar,

Yu, Li, and Stout (in press).

Purposes of the present study

The aim of this presentation is threefold. First, it is possible to convert many

of the goodness of fit indices developed within the common factor model or

structural equation modeling for use in item response theory and vice versa, and then

compare them to the Stout T-statistic (see Gessaroli & de Champlain, 1996).

Second, an aim is to ascertain how many items from a second dimension are

necessary before the various indices determine that the item set is no longer

Elias, Hattie & Douglas Page 8

Page 9: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

essentially unidimensional. Third, most simulations merely use items with

unknown/unspecified psychometric properties to assess the effects on the various

indices. Instead, in this study items of varying qualities are added to demonstrate the

importance of considering items with particular qualities, especially demonstrating

their effects on the various indices.

It is noted that, while these goodness of fit indicators can be compared, there

are important distinctions between the IRT and SEM models. The former is

fundamentally a non-linear model, as the probability of getting an item correct is

usually non-linearly related to the underlying ability (θ) (typically a normal ogive

function), and the latter is commonly used as a linear model—although there are

many instances of non-linear SEM models (refs,).

Method

A unidimensional set of 15 items was generated. It is not uncommon for test

developers to use approximately 15 items to measure a single ability, and longer tests

often contain shorter subtests about this length. With 15 items it is possible to define

up to 5 dimensions given the claim that at least three items are needed to define a

dimension (McDonald & Mulaik, 1979).

To generate the data (the core items sets representing one dimension and the

added items representing the second dimension are correlated), a multidimensional

compensatory three parameter model as outlined in Hattie (1984) was used, whereby

the probability of a correct response is given by

P(xij =1 | ai ,bi ,θ j) = ci +1− ci

1+ exp{−d (aikθ jk − bik )}k∑

Elias, Hattie & Douglas Page 9

Page 10: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

where P(xij=1) is the probability of a correct response to item i by person j; aik

is a vector of discrimination parameters for item i on dimension k; bik is a vector of

difficulty parameters for item i on dimension k; ci is the guessing parameter for item

i; θjk is a vector of ability parameters for person j on dimension k; d is a scaling

factor with a value of 1.7.

Using this model, it is possible to vary the item difficulty, discrimination and

guessing parameters, as well as the number of dimensions underlying the simulated

data. For this study, data sets were generated with varying levels of item difficulty,

guessing and discrimination. The item parameter levels were chosen to be as

realistic as possible, so that the results from the simulation can be of practical

significance for teacher-made and nationally normed tests.

To the core unidimensional 15 item sets, a further 15 items from a second

dimension were sequentially added. With each addition of an item from the second

dimension, it was intended that the dominance of the first dimension would

progressively decrease, and that this would be reflected in the values of the eleven

indices which were calculated at each sequential item addition. The aim of adding

these items sequentially was to assess at what point the Stout-T index and other

indices indicate that the first unidimensional set was no longer essentially

unidimensional.

For the core unidimensional items, two sets of difficulty values were chosen (-

1, -.5, 0, .5, 1; and -2, -1, 0, 1, 2), and guessing was either 0, or .15. The 4

parameters, relating to the sequentially added 15 items, were completely crossed to

produce 36 generating models: Difficulty (narrow [-1, 1] or wide [-2, 2]) x guessing

(0, .15), Discrimination/correlation (.1, .3, .5) x difficulty of the added items (all -1,

Elias, Hattie & Douglas Page 10

Page 11: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

0, or 1). For each of these 36 models, item sets using 1000 examinees (assuming

ability was normally distributed) and 15 replications were employed to estimate each

index (details are in Elias, 1997; Hattie, 1984, 1985). That only 15 replications was

used is a limitation of this study, although this number of replications was

sufficiently powerful to find reliable differences, and the computer costs of

increasing the number of replications was prohibitive (given that the number of runs

increased factorially relating to the number of addition items). NOHARM (Fraser,

1981), DIMENSION (Krakowski & Hattie, 1992), and DIMTEST (Stout, et al.

1991) were used.

Results

The first section of the results summarizes the results for the structural model-

based indices. In all cases, these indices varied within their expected limits, although

in no case did the index vary over the complete potential range nor was there an

observable transition point indicating a change from a uni- to two-dimensional set of

items. Only for the Tucker-Lewis procedure did the index fall in the range

considered indicative of “good fit” (in this case >.90; Bentler & Bonett, 1980,

although Bentler, 1999 argued for > .95) when there was a unidimensional set, but it

also remained in this range for the two-dimensional set. Only three of the indices

increased (or decreased) as expected when the dimensions were close to orthogonal

(r = .1), and the LHRI2 and CAK12 did not decrease under any condition.

Five of these indices were able to discriminate in the expected direction under

all conditions (RNI, CHI2, MC, Dk, & Stout T). The other indices moved in the

opposite direction than expected either when there was a low relationship between

Elias, Hattie & Douglas Page 11

Page 12: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

the dimensions, or when easy or average difficulty items were added (Table 2). Only

the Stout T index provided an indication of transition from one to two dimensions.

As it is a typical and often highly recommend index within structural equation

modeling (and for space reasons), the performance of the Tucker-Lewis index is used

as an illustration of the effects of adding items. The following series of graphs show

the TLI variation for each parameter level as items are added to the unidimensional

set (Figures 1 to 4). The average value of the TLI for the set of 15 items was .994. It

increased in value when very difficult items were added, and thus falsely implies that

the set is becoming more unidimensional as the number of these items from a second

factor is added, and only decreased when easy or medium difficult items were added

(Figure 1). TLI also increased as the correlation between factors is .5 or greater but

only after at least half the items from the second dimension are added, otherwise it

decreased as desired. The changes in slopes were similar when there was and was

not guessing but the magnitude of the TLI increased in the presence of guessing. A

major problem is that in all conditions the TLI remains above the .9 level, and even

when it performs as desired the differences are reflected only at the second and third

decimal point.

Table 3 presents the manner in which each index was affected by the addition

of certain types of items (note, all except Dk and Stout are expected to decrease as

items are added). None of the Type II indices nor the Mc index performed in a

satisfactory manner, and tended to perform in the expected direction only when the

more difficult items were added. The Cudeck and Browne rescaled Akaike

Information Criterion decreased as items from the second dimension were added, but

after approximately half the items were added it tend to increase again. While the

GFI and AGFI generally performed satisfactory, the range of values was extremely

Elias, Hattie & Douglas Page 12

Page 13: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

narrow (.97 to .99) and thus for practical purposes are of little value. Only the

McDonald Dk and Stout T statistic performed satisfactorily, and the Dk index had no

identifiable transition point(s).

The Stout T-statistic satisfied all criteria, in that it varied within its expected

limits, varied over the complete potential range, increased as expected, and there was

a clear indication of the transition from unidimensional to multidimensional item

sets. The Stout index is expected to have a small unidimensional value close to 0.0,

and a large multidimensional upper value. The index values increased as expected,

and the lower and upper values of -0.29 and 4.74 representing the 15 item

unidimensional set and the 30 item multidimensional set respectively (see Figure 5).

Across all conditions of added items, there were minor variations in the performance

of the Stout T statistic. As items were added from a second dimension that were

more difficult, and correlated less than .5 with the first dimension, then the Stout

index was more effective at determining uni- and two-dimensional item sets. Across

all conditions, the sharp transition occurred when more than one-third the number of

items in the first dimension was added.

Conclusions

Two indices performed in the desired manner, in that they both increased as the

number of items from a second dimension were added to the core unidimensional set

of items. Although worthy of inclusion as a goodness-of-fit measure, the McDonald

Dk or RMSEA index had no transition point such that a user could claim there was

evidence for a uni- or two-dimensional data set. The results from these simulations

provide strong evidence, however, that the Stout T statistic is an effective index for

assessing the degree of unidimensionality of a set of items. The decision to accept a

Elias, Hattie & Douglas Page 13

Page 14: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

set of items as unidimensional was not affected by various item parameters such as

difficulty, guessing, or correlation between factors. The Stout index more readily

detects multidimensionality under all conditions, and particularly when: there is no

guessing on the added items; there is a correlation less than .3 between the

dimensions; and when items of low or medium difficulty are added to the core item

set. After approximately one-third more than the number of items in the

unidimensional set are added, then the Stout T statistic indicates the presence of a

second dimension.

Future research could profitably investigate a generalization of the Stout T

index to detect the dimensionality, rather than only essential unidimensionality.

Kim, Zhang, and Stout (in press) have developed DETECT (Dimensionality

Evaluation To Enumerating Contributing Traits) for simultaneously assessing the

latent dimensionality structure and to determine the degree of multidimensionality

present in test data. Similarly, converting the Stout indices for use in structural

equation modeling, and investigating its properties (such as sensitivity to sample

size) could be most fruitful. Most critically, this study has pointed to the importance

of considering the effects of the nature of items and their effects on the performance

of the indices. Most studies that have investigated the properties of SEM based

indices have ignored the importance of the difficulty, discrimination, and guessing.

Elias, Hattie & Douglas Page 14

Page 15: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE

Transactions on Automatic Control, 19, 716-723.

Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317-332.

Bentler, P. M. (1990). Comparative fit indices in structural models. Psychological

Bulletin, 107, 238-246.

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the

analysis of covariance structures. Psychological Bulletin, 88, 588-606.

Cudeck, R., & Browne, M. W. (1983). Cross-validation of covariance structures.

Multivariate Behavioral Research, 18, 147-167.

Elias, S. (1997). The effectiveness of Stout’s T and other selected indices of

assessing essential unidimensionality. Unpublished Masters thesis, University

of Western Australia, Nedlands, WA.

Fraser, C. (1981). NOHARM: A FORTRAN program for non-linear analysis by a

robust method for estimating the parameters of 1-, 2-, and 3-parameter latent

trait models. Armidale, Australia: University of New England, Centre for

Behavioural Studies in Education.

Gessaroli, M.E., & de Champlain, A.F. (1996). Using an approximate chi-square

statistic to test the number of dimensions underlying the responses to a set of

items. Journal of Educational Measurement, 33, 157-180.

Hattie, J. A. (1984). An empirical study of various indices for determining

unidimensionality. Multivariate Behavioral Research, 19, 49-78.

Hattie, J. A. (1985). Methodological review: Assessing unidimensionality of tests

and items. Applied Psychological Measurement, 9, 139-164.

Elias, Hattie & Douglas Page 15

Page 16: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

Hattie, J. A., Krakowski, K., Rogers, H. J., & Swaminathan, H. (1996). An

assessment of Stout’s index of essential dimensionality. Applied

Psychological Measurement, 20, 1-14.

Hu, L.T., & Bentler, P.M. (1998). Fit indices in covariance structure modeling:

Sensitivity to underparameterized model misspecification. Psychological

Methods, 3, 424-453.

Joreskog, K. G., & Sorbom, D. (1981). LISREL V: Analysis of linear structural

relationships by the method of maximum likelihood. Chicago: National

Educational Resources.

Junker, B. (1991). Essential independence and likelihood-based ability estimation

for polytomous items. Psychometrika, 56, 255-278.

Junker, B., & Stout, W. F. (1991, July). Structural robustness of ability estimates in

item response theory. Paper presented at the 7th European Meeting of the

Psychometric Society, Trier, Germany.

Krakowski, K., & Hattie, J. A. (1993). DIMENSION: A program to generate

unidimensional and multidimensional item data. Applied Psychological

Measurement, 17, 252.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores.

Reading MA: Addison-Wesley.

Maitai, S.S., & Mukherjee, B.N. (1991). Two new goodness-of-fit indices for

covariance matrices with linear structures. British Journal of Mathematical

and Statistical Psychology, 28, 205-214.

Marsh, H. W., & Balla, J. R. (1994). Goodness-of-fit in confirmatory factor

analysis: The effects of sample size and model parsimony. Quality and

Quantity, 28, 185-217.

Elias, Hattie & Douglas Page 16

Page 17: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indexes in

confirmatory factor analysis: The effect of sample size. Psychological

Bulletin, 103, 391-410.

McDonald, R. P. (1979). The structural analysis of multivariate data: A sketch of

general theory. Multivariate Behavioral Research, 14, 21-38.

McDonald, R. P. (1981). The dimensionality of tests and items. British Journal of

Mathematical and Statistical Psychology, 34, 100-117.

McDonald, R. P. (1982). Linear versus nonlinear models in item response theory.

Applied Psychological Measurement, 6, 379-396.

McDonald, R. P. (1990). An index of goodness-of-fit based on noncentrality.

Journal of Classification, 6, 97-103.

McDonald, R. P., & Marsh, H. W. (1990). Choosing a multivariate model:

Noncentrality and goodness-of-fit. Psychological Bulletin, 107, 247-255.

McDonald, R. P., & Muliak, S. (1979). Determinancy of common factors: A non-

technical review. Psychological Bulletin, 86, 297-306.

Miller, T. R., & Hirsch, T. M. (1992). Cluster analysis of angular data in

applications of multidimensional item response theory. Applied Measurement

in Education, 5, 193-211.

Mokken, R.J. (1971). A theory and procedure of scale analysis. With applications

in political research. New York, Berlin: Walter de Gruyter, Mouton.

Nandakumar, R. (1987). Refinements of Stout’s procedure for assessing latent trait

dimensionality. Unpublished doctoral dissertation, University of Illinois,

Urbana-Champaign.

Nandakumar, R. (1991). Traditional dimensionality verses essential dimensionality.

Journal of Educational Measurement, 28, 99-117.

Elias, Hattie & Douglas Page 17

Page 18: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

Elias, Hattie & Douglas Page 18

Nandakumar, R., & Stout, W. F. (1993). Refinements of Stout’s procedure for

assessing latent trait dimensionality. Journal of Educational Statistics, 18, 41-

68.

Steiger, J. H. (1989). EzPATH: A supplementary module for SYSTAT and

SYSGRAPH. Evanston, IL: SYSTAT, Inc.

Steiger, J. H. (1990). Structure model evaluation and modification: An interval

estimation approach. Multivariate Behavioral Research, 25, 173-180.

Stout, W. F. (1987). A nonparametric approach for assessing latent trait

unidimensionality. Psychometrika, 52, 589-617.

Stout, W. F. (1990). A new item response theory modeling approach with

applications to unidimensionality assessment and ability estimation.

Psychometrika, 55, 293-325.

Stout, W. F., Nandakumar, R., Junker, B., Chang, H. H., & Steidinger, D. (1991).

DIMTEST and TESTSIM. University of Illinois, Champaign, IL.

Yen, W.M. (1985). Increasing item complexity: A possible cause of scale shrinkage

for unidimensional item response theory. Psychometrika, 50, 399-410.

Page 19: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

Elias, Hattie & Douglas Page 19

Table 1. Effective CFA Indices For Assessing Model Fit.

Symbol Original Investigators Formula

Stand-alone indices

Dk McDonald, 1990; McDonald & Marsh, 1990 [χ2/(N-1)] - df/N Mc McDonald, 1990; McDonald & Marsh, 1990 exp(-Dk /2)

GFI Joreskog & Sorbom, 1981; Steiger, 1989, 1990 p/(2 Dk +p)

AGFI Joreskog & Sorbom, 1981; Steiger, 1989, 1990 1-(p x (p+1)/2df) x (1-GFI )

Incremental Type 2 Indices

FFI2 Joreskog & Sorbom, 1981 |(FFt-FFn)| / |(FFe-FFn)|

LHRI2 Joreskog & Sorbom, 1981 |(LHRt-LHRn)| / |ABS(LHRe-LHRn)|

CHII2 Joreskog & Sorbom, 1981 |( �2t- �2n)| / |( �2e- �2n)|

TLI Tucker & Lewis, 1973 (�n2/dfn-�t2/dft)/(�n2/dfn-1.0)

CAKI2 Akaike, 1974, 1987; Cudeck & Browne, 1983 |(CAKt-CAKn)| / |(CAKe-CAKn)|

RNI(DkI2) Bentler, 1990; McDonald & Marsh, 1990 (Dkn-Dkt)/(Dkn-0)

where, χ2 is the chi-square value, df is the number of degrees of freedom, N is the sample size, p=tr(E-1S), E is the population covariance

matrix, S is the sample covariance matrix, FF= χ2/(N-1); N is the sample size, LHR=exp(-.5FF), CAK=FF+2q/N; q is the number of estimated

parameters. .

Page 20: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

Elias, Hattie & Douglas Page 20

Table 2. Performance of the fit-indices

(y=yes as expected, n=no)

Difficulty range

r between dimensions

Difficulty of added item Guessing

-1 1 -2 2 .1 .3 .5 -1.0 .0 1.0 .0 .15

FFI2 y y n y y n y y n y

LHRI2 n n n n n y n n n n

TLI y y n y y n n y y y

CAKI2 n n n n n n n n n n

RNI y y n y y y y y y y

CHI2 y y y y y y y y y y

MC y y y y y y y y y y

GFI y y n y y y y y y y

AGFI y y n y y y y y y y

Dk y y y y y y y y y y

Stout T y y y y y y y y y y

Total 8 8 3 8 8 7 7 8 7 8

Page 21: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

Elias, Hattie & Douglas Page 21

Table 3

Manner in which the indices varied as items of various types were added.

Difficulty of items r between dimensions Range of difficulty Level of guessing

Index mid/easy difficult .1/.3 .5 -1 1 -2 2 .0 .15 FFI2 decrease stable decrease stable stable stable decrease stable

LHRI2 stable increase stable stable stable stable stable stable

TLI stable decrease increase decrease stable decrease decrease stable

CAKI2 stable de+increase de+increase de+increase de+increase de+increase de+increase de+increase

RNI stable decrease stable decrease stable stable stable stable

CHII2 stable increase stable stable stable stable stable stable

Mc stable stable stable stable stable stable stable stable

GFI decrease decrease stable decrease decrease decrease decrease decrease

AGFI decrease decrease stable decrease decrease decrease decrease decrease

Dk increase increase increase increase increase increase increase increase

Stout increase increase increase increase increase increase increase increase

Page 22: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

Figure 1

TLI Variation for Additional Items According to Difficulty of Added Item Levels.

0.984

0.986

0.988

0.990

0.992

0.994

0.996

0.998

15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Index

Additional Items

DAI=-1 DAI=0 DAI=1

Elias, Hattie & Douglas Page 22

Page 23: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

Figure 2

TLI Variation for Additional Items According to Correlation Levels.

0.99050.99100.99150.99200.99250.99300.99350.99400.99450.9950

15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Index

Additional Items

COR=0.1 COR=0.3 COR=0.5

Elias, Hattie & Douglas Page 23

Page 24: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

Figure 3

TLI Variation for Additional Items According to Difficulty Range Levels.

0.9920

0.9925

0.9930

0.9935

0.9940

0.9945

15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Index

Additional Items

DR=-1 1 DR=-2 2

Elias, Hattie & Douglas Page 24

Page 25: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

Figure 4

TLI Variation for Additional Items According to Guessing Levels

0.99100.99150.99200.99250.99300.99350.99400.99450.9950

15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Index

Additional Items

GU=0.0 GU=0.15

Elias, Hattie & Douglas Page 25

Page 26: An Assessment Of Various Fit Indices To Detect For ... · Hu and Bentler (1998) assessed various indices under differing conditions of under-parameterization and concluded that CAK,

Fit indices for unidimensionality

Figure 5

Performance of the Stout T-index as items are added to the 15-item core set

-1

0

1

2

3

4

5

6

15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37

Series 1

Elias, Hattie & Douglas Page 26