
THE JOURNAL OF SPECIAL EDUCATION VOL 26/NO. 2/1992/pp. 139-161

REDEFINING THE WISC-R: IMPLICATIONS FOR PROFESSIONAL PRACTICE

AND PUBLIC POLICY

Gregg M. Macmann The Devereux Foundation

David W. Barnett University of Cincinnati

Address: Gregg M. Macmann, University of Iowa, 361 Lindquist Center, Iowa City, IA 52242-1529.

The factor structure of the Wechsler Intelligence Scale for Children-Revised (Wechsler, 1974) was examined in the standardization sample using methodological advances in exploratory and confirmatory factor analysis. Three competing models were evaluated: (a) a one-factor model, (b) an oblique (or correlated) two-factor model, and (c) an oblique three-factor model. Because of high correlations across factors, the discriminant validity of the two- and three-factor models was problematic. The substantial overlap across factors was most parsimoniously represented by a single general factor, consistent with the interpretations of O'Grady (1989). The competing factor models are discussed in relation to treatment utility and alternative assessment practices. Given the widespread use of the scale, the issues are of critical importance to public policy regarding the purposes and outcomes of special education assessments.

The factor structure of the Wechsler Intelligence Scale for Children-Revised (WISC-R) (Wechsler, 1974) has been the subject of extensive research, spanning innumerable studies of the standardization sample (e.g., Kaufman, 1975; O'Grady, 1989; Wallbrown, Blaha, Wallbrown, & Engin, 1975) and diverse clinical and ethnic groups (e.g., Hale, 1983; Kaufman, 1979a, 1979b; Sattler, 1988). Although Kaufman (1981) suggested that little more can be learned and that attention should be directed instead to demonstrating the utility of WISC-R factors for educational and treatment planning, research has failed to support the treatment utility of the WISC-R for all but the most crude predictive purposes (e.g., Hale, 1983; Kramer, Henning-Stout, Ullman, & Schellenberg, 1987; Witt & Gresham, 1985). Testimonial evidence notwithstanding (e.g., Kaufman, 1979a, 1990; Sattler, 1988), the existing validity data translate tenuously, if at all, to meaningful instructional decisions (e.g., Cronbach & Snow, 1977; Heller, Holtzman, & Messick, 1982; Snow, 1986). More generally, the constructs measured may entirely neglect information of critical importance to the design and evaluation of interventions (e.g., Barnett & Zucker, 1990; Hayes, Nelson, & Jarrett, 1986, 1987; Haynes, 1986; Rosenfield & Reynolds, 1990).

Wechsler (1958) defined intelligence as an "aggregate or global capacity" (p. 7), but more recent theories have stressed the development of dynamic, multifaceted competencies and information processing capabilities (e.g., Snow & Lohman, 1989). Kaufman's (e.g., 1979a, 1979b, 1981) analyses and recommendations for professional practice with the WISC-R have implied an intermediate stance, placing assessments of aggregate intellectual functioning within the context of unique profile patterns or presumably differentiated, specific abilities. A fundamental premise of WISC-R profile analysis is that the Verbal and Performance scales are factorially sound, and that the associated subtests have meaningful and dependable variance (e.g., Conger, Conger, Farrell, & Ward, 1979; McDermott, Glutting, Jones, Watkins, & Kush, 1989). Given the time required to administer, score, and interpret the WISC-R, there may be a strong inclination to overemphasize the clinical value and possibilities of the technique (e.g., Witt & Gresham, 1985).

The consensus at present is that the WISC-R is defined by three factors: verbal comprehension (Factor 1; Information, Similarities, Vocabulary, Comprehension subtests), perceptual organization (Factor 2; Picture Completion, Picture Arrangement, Block Design, Object Assembly, Mazes), and freedom from distractibility (Factor 3; Arithmetic, Digit Span, Coding) (Kaufman, 1979a; Sattler, 1988). Empirical support for the three-factor model has accrued through replication of the structure across subgroups of the standardization sample (e.g., Carlson, Reynolds, & Gutkin, 1983; Kaufman, 1975; Reynolds & Gutkin, 1980; Reynolds & Harding, 1983) and, to a lesser extent, across clinical and ethnic groups (e.g., Kronenberg & ten Berge, 1987; O'Grady, 1989). The two-factor model (e.g., Hale, 1983) (defined by the Verbal and Performance scales of the WISC-R) has gained support through exploratory and confirmatory analyses of the verbal-performance dichotomy (e.g., Ramanaih, O'Donnell, & Ribich, 1976; Silverstein, 1982; Wallbrown et al., 1975) and, indirectly, through evidence of the limited reliability and inconsistent replication of the third factor (freedom from distractibility) (e.g., Conger et al., 1979; Peterson & Hart, 1979; Tingstrom & Pfeiffer, 1988).

A recent confirmatory analysis by O'Grady (1989) has raised the possibility of a single-factor solution to the WISC-R. A key finding was that orthogonal (uncorrelated) two- and three-factor models for the WISC-R provided less satisfactory goodness of fit than a one-factor model. O'Grady also suggested that the incremental gain in fit associated with oblique (correlated) multiple-factor models was minimal. Although the results did not provide incontrovertible support for either a one-, two-, or three-factor model (i.e., all of the structural models were at least partially incorrect), O'Grady (1989) concluded that "a large proportion of WISC-R performance can be explained by a general intellectual factor" (p. 191).

O'Grady's (1989) findings were particularly striking given evidence that the fit indices used to support these conclusions are downwardly biased as a function of sample size (e.g., Bentler, 1990; Marsh, Balla, & McDonald, 1988; McDonald & Marsh, 1990). At least potentially, O'Grady's (1989) argument for a single-factor solution might be strengthened through the use of alternative estimates of model fit (e.g., Bentler, 1990). In addition, Kaufman's (1975) original analyses of the WISC-R, and many subsequent factor analyses, have relied on the Kaiser rule to determine the number of factors to retain (i.e., eigenvalues greater than 1). Monte Carlo studies (using computer simulations of known factor structures) have shown that the Kaiser rule typically overestimates the number of factors to retain (e.g., Crawford & Koopman, 1973; Zwick & Velicer, 1982, 1986). Relatively accurate criteria for determining the number of factors to retain (e.g., Velicer & Jackson, 1990; Zwick & Velicer, 1986) generally have not been applied to the WISC-R.

Despite the controversies, assessments of intellectual functioning are a ubiquitous aspect of professional practices associated with the evaluation and educational classification of children. The purpose of the present study was to reexamine the factor structure of the WISC-R in light of recent methodological advances in exploratory and confirmatory factor analysis. The study addresses in part the question of what psychologists can say with confidence based on the array of "information" that ostensibly is revealed through WISC-R profiles (e.g., Conger et al., 1979; McDermott et al., 1989; Reynolds & Kaufman, 1985). Given the widespread use of the scale, the boundaries of appropriate inference are critical to public policy regarding special education assessments (e.g., Heller et al., 1982; Messick, 1989; Rosenfield & Reynolds, 1990).

METHOD

Data

The intercorrelations and standard deviations of the 12 WISC-R subtests were examined for (a) each of the 11 age groups of the standardization sample (n = 200 per group) (Wechsler, 1974, Table 14) and (b) the average intercorrelation matrix across the 11 age groups (N = 2,200) (Wechsler, 1974, Table 15).

Data Analyses

Parallel and Average Partial Analyses. The key limitation of the Kaiser rule is that the procedure assumes an infinitely large sample size (i.e., population rather than sample data). Horn (1965) proposed that the Kaiser rule might be adjusted through the parallel analysis (PA) of randomly generated data. The PA criterion is calculated by averaging the eigenvalues that result from the principal-component analysis of 30 or more random data matrices for a given number of variables (e.g., p = 12) and subjects (e.g., n = 200). Silverstein (1987) used a variation of the PA criterion to study WISC-R structure, but the procedure was based on principal-axis rather than principal-component analysis (i.e., squared multiple correlations rather than ones in the diagonal of the correlation matrix). Although the principal-axis version of the PA criterion suggested a three-factor solution for the WISC-R (Silverstein, 1987), Crawford and Koopman (1973) found that the PA criterion did not perform well when eigenvalues were estimated through principal-axis factoring.
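To make the computation concrete, the following minimal sketch implements Horn's PA criterion in Python with numpy. (The original analyses were run with SPSS and EQS; the code itself is ours, using the study's dimensions of p = 12 subtests, n = 200 cases, and 60 random matrices as described below.)

```python
import numpy as np

def parallel_analysis(p=12, n=200, n_matrices=60, seed=0):
    """Horn's (1965) PA criterion: average, across many random data sets,
    the eigenvalues from a principal-component analysis of each data set's
    correlation matrix."""
    rng = np.random.default_rng(seed)
    eigenvalues = np.empty((n_matrices, p))
    for i in range(n_matrices):
        data = rng.standard_normal((n, p))           # random normal data
        r = np.corrcoef(data, rowvar=False)          # p x p correlation matrix
        eigenvalues[i] = np.sort(np.linalg.eigvalsh(r))[::-1]
    return eigenvalues.mean(axis=0)                  # one criterion per component
```

Observed eigenvalues are retained only while they exceed the corresponding criterion values, which for random data hover above and below 1 rather than sitting at the Kaiser rule's fixed cutoff of 1.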

The minimum average partial (MAP) criterion (Velicer, 1976) provides another relatively accurate means of estimating the number of factors to retain (Reddon, 1985; Zwick & Velicer, 1982, 1986). The intercorrelations within the matrix of factored variables are examined after the variance attributable to each successive, unrotated principal component (i.e., one, two, three, etc.) has been extracted or partialled from the variables. The average squared partial correlation (associated with the extraction of each principal component) decreases progressively until a point is reached at which additional components fail to reduce the unknown variance among the variables. The point at which the minimum average squared partial correlation is reached indicates the number of factors to retain, consistent with common-factor theory (Velicer, 1976).
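A corresponding sketch of the MAP criterion, under the same assumptions (numpy only; `r` is a p × p correlation matrix), is as follows; the number of factors to retain is the position of the minimum:

```python
import numpy as np

def map_criterion(r):
    """Velicer's (1976) MAP criterion: average squared partial correlation
    after partialling out 1, 2, ... unrotated principal components."""
    p = r.shape[0]
    values, vectors = np.linalg.eigh(r)
    order = np.argsort(values)[::-1]                 # largest components first
    values, vectors = values[order], vectors[:, order]
    averages = []
    for m in range(1, p - 1):                        # partial out m components
        loadings = vectors[:, :m] * np.sqrt(values[:m])
        partial_cov = r - loadings @ loadings.T      # residual covariances
        d = 1.0 / np.sqrt(np.diag(partial_cov))
        partial_r = partial_cov * np.outer(d, d)     # rescale to correlations
        off_diagonal = partial_r[~np.eye(p, dtype=bool)]
        averages.append(np.mean(off_diagonal ** 2))
    return np.array(averages)                        # retain argmin(...) + 1 factors
```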

Although imperfect, the PA and MAP criteria may be used with reasonable confidence to bracket a plausible range of solutions for the number of factors to retain (Velicer & Jackson, 1990; Zwick & Velicer, 1986). In the present study, Horn's (1965) PA criterion was implemented through the principal-component analysis of 60 random data matrices (p = 12; n = 200). In contrast, given that eigenvalue shrinkage is negligible for N = 2,200, the eigenvalues resulting from the principal-component analysis of the average intercorrelation matrix (Wechsler, 1974, Table 15) were treated as population values (i.e., the PA criterion converges toward the Kaiser rule as sample size increases). The MAP criterion (Velicer, 1976) was evaluated for each of the 11 age groups of the standardization sample, and for the average intercorrelation matrix.
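Bracketing then amounts to comparing the two criteria for a given matrix. A hypothetical usage example, building on the two sketches above (`r` stands for one age group's 12 × 12 observed correlation matrix):

```python
# Hypothetical usage of the two sketches above for one age group.
pa_criterion = parallel_analysis(p=12, n=200, n_matrices=60)
observed = np.sort(np.linalg.eigvalsh(r))[::-1]
n_pa = int(np.sum(observed > pa_criterion))       # PA: eigenvalues above criterion
n_map = int(np.argmin(map_criterion(r))) + 1      # MAP: position of the minimum
print(n_pa, n_map)   # a plausible range of solutions is bracketed by these two
```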

Sampling Error for Eigenvalue Estimation. Assuming that the average intercorrelation matrix (Wechsler, 1974, Table 15) roughly approximates the population matrix, the influence of sampling error on variability in observed eigenvalues may be estimated through simulation methods. The simulation feature of the EQS (Bentler, 1989) program was used to generate 55 samples of n = 200 (with the average intercorrelation matrix designated as a matrix of population values). The distribution of eigenvalues (i.e., 95th and 98th percentile ranks) resulting from the principal-component analysis of each of the 55 samples was used to evaluate variability in observed eigenvalues within the standardization sample. The null hypothesis was that observed variability across the 11 age groups of the standardization sample would not exceed sampling error (as designated by the percentile estimates for the 55 samples). Given 11 comparisons, the experimentwise error rate for the 98th percentile criterion was 22% (i.e., 11 × .02), a level generally considered tolerable (e.g., Keppel, 1982).
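The study used the simulation feature of EQS for this step; the logic can be expressed in the same numpy terms as the sketches above, with `pop_r` standing for the average intercorrelation matrix treated as a population matrix:

```python
import numpy as np

def eigenvalue_percentiles(pop_r, n=200, n_samples=55,
                           percentiles=(95, 98), seed=0):
    """Draw samples from a designated population correlation matrix and
    return percentile criteria for each component's eigenvalue."""
    rng = np.random.default_rng(seed)
    p = pop_r.shape[0]
    eigenvalues = np.empty((n_samples, p))
    for i in range(n_samples):
        data = rng.multivariate_normal(np.zeros(p), pop_r, size=n)
        r = np.corrcoef(data, rowvar=False)
        eigenvalues[i] = np.sort(np.linalg.eigvalsh(r))[::-1]
    return np.percentile(eigenvalues, percentiles, axis=0)
```

An observed eigenvalue in a single age group is then flagged as exceeding sampling error only if it falls beyond the chosen percentile criterion.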

Confirmatory Factor Analyses. Using confirmatory factor-analytic techniques, three competing models were evaluated: (a) a one-factor model, (b) an oblique (or correlated) two-factor model, and (c) an oblique three-factor model. O'Grady (1989) demonstrated that the orthogonal (uncorrelated) two- and three-factor models provided poorer estimates of fit than a single-factor model; consequently, the uncorrelated factor models were not estimated for the present study. The competing models were evaluated and compared through the use of three measures of goodness of fit: (a) the normed fit index (NFI) (Bentler & Bonett, 1980), (b) the comparative fit index (CFI) (Bentler, 1990), and (c) the parsimonious comparative fit index (PCFI) (James, Mulaik, & Brett, 1982; Mulaik et al., 1989).

Research has indicated that many indices of model fit (such as the chi-square statistic and normed fit index) are influenced by sample size (e.g., Bentler & Bonett, 1980; Marsh et al., 1988; Tanaka, 1987). The concerns parallel general dissatisfaction with the practice of significance testing (i.e., "significance" depends on sample size) and, consequently, have spurred efforts to develop more robust, descriptive estimates of model fit, analogous to the population estimates of "effect size" available for other facets of the general linear model (e.g., Cohen, 1988). The CFI (Bentler, 1990) is a statistic, based on the concept of "noncentrality reduction" (e.g., Bentler, 1990; McDonald & Marsh, 1990), that has been shown to minimize the downward bias associated with small sample size for the NFI. If a no factor ("null" or "independence") model is used as the baseline for comparison, then the normed fit index (and its population counterpart, the CFI) "reveals, in relation to the observed covariance matrix, the proportional degree to which the many relationships observed between variables within that matrix are reproduced by the model" (Mulaik et al., 1989, p. 434). Although the use of a no factor baseline model has been criticized (e.g., Sobel & Bohrnstedt, 1985; Waller & Waldman, 1990), Mulaik et al. (1989) contended that "it is the more or less absolute, overall fit of these models to the same data that is important information for comparing them" (p. 434). The use of an alternative baseline model provides increased sensitivity to between-model distinctions, but the gains in sensitivity are achieved through a loss of perspective on total information accounted for (e.g., Mulaik et al., 1989).
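Stated formally (these are the standard definitions from Bentler & Bonett, 1980, and Bentler, 1990, reproduced here for reference rather than taken from the original article), with $\chi^2_0$ and $df_0$ denoting the chi-square and degrees of freedom of the null model and $\chi^2_M$ and $df_M$ those of a target model:

```latex
\mathrm{NFI} = \frac{\chi^2_0 - \chi^2_M}{\chi^2_0},
\qquad
\mathrm{CFI} = 1 - \frac{\max\left(\chi^2_M - df_M,\, 0\right)}
                        {\max\left(\chi^2_0 - df_0,\ \chi^2_M - df_M,\, 0\right)}
```

The CFI's numerator estimates the target model's noncentrality; dividing by the null model's noncentrality is what makes the index a "noncentrality reduction" measure.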

The relative complexity of competing structural models is another dimension along which models may be compared (e.g., Bentler & Mooijaart, 1989; James et al., 1982; Mulaik et al., 1989). An estimate of "parsimonious fit" may be obtained by multiplying the CFI by the parsimony ratio for each model (i.e., the model degrees of freedom divided by the independence model degrees of freedom) (James et al., 1982; Mulaik et al., 1989). The parsimony adjustment is a simple but controversial technique; models with fewer degrees of freedom (i.e., more free parameters to be estimated) are less parsimonious. Unfortunately, logic and the mathematics of the procedure do not necessarily correspond (e.g., McDonald & Marsh, 1990). For example, a one-factor model for the WISC-R has the same degrees of freedom as either an uncorrelated two- or three-factor model, even though the latter models clearly are more theoretically complex. In the absence of a readily available alternative (e.g., Bentler & Mooijaart, 1989; Mulaik et al., 1989), however, the parsimony ratio was used to quantify the loss of parsimony associated with correlated multiple-factor solutions for the WISC-R (O'Grady, 1989).
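A worked sketch of the computation: for 12 subtests there are 78 distinct variances and covariances, so standard CFA arithmetic (stated here as an illustration, not taken from the article) gives df = 66 for the independence model, 54 for the one-factor model, 53 for the oblique two-factor model, and 51 for the oblique three-factor model. The target-model chi-square below is hypothetical; the null-model value matches the mean ML estimate reported in the Results.

```python
def cfi(chi2_m, df_m, chi2_0, df_0):
    """Comparative fit index (Bentler, 1990), a noncentrality-based index."""
    noncentrality_m = max(chi2_m - df_m, 0.0)
    noncentrality_0 = max(chi2_0 - df_0, noncentrality_m)
    return 1.0 - noncentrality_m / noncentrality_0

def pcfi(chi2_m, df_m, chi2_0, df_0):
    """Parsimonious CFI: the CFI weighted by the parsimony ratio df_m/df_0."""
    return (df_m / df_0) * cfi(chi2_m, df_m, chi2_0, df_0)

# Hypothetical one-factor model: chi2 = 180 on df = 54, against a null
# model with chi2 = 1034.4 on df = 66:
#   CFI  = 1 - 126/968.4  = .870
#   PCFI = (54/66) * .870 = .712
```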

Two methods of covariance structure analysis were used to derive parameter estimates and "fit functions" for the three competing models: maximum likelihood (ML) and generalized least squares (GLS) estimation. Both methods involve an iterative reweighting of the covariance matrix in terms of variable uniqueness (or specificity) and share similar properties (e.g., Bentler & Bonett, 1980). Although the two methods are expected to converge toward identical parameter estimates in large samples (Mulaik et al., 1989), Tanaka (1987) found wide variability in estimates of model fit across ML and GLS estimation (thus suggesting the need for multiple methods).
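For reference, the two discrepancy functions are the standard ones (they are not reproduced in the original article); $S$ is the sample covariance matrix, $\Sigma(\theta)$ the model-implied matrix, and $p$ the number of variables:

```latex
F_{\mathrm{ML}} = \ln\lvert \Sigma(\theta) \rvert - \ln\lvert S \rvert
  + \operatorname{tr}\!\left[ S\,\Sigma(\theta)^{-1} \right] - p,
\qquad
F_{\mathrm{GLS}} = \tfrac{1}{2}\operatorname{tr}\!\left\{ \left[ \left( S - \Sigma(\theta) \right) S^{-1} \right]^{2} \right\}
```

Because GLS weights residuals by $S^{-1}$ whereas ML effectively weights them by $\Sigma(\theta)^{-1}$, the two functions can assign very different chi-square values to the same baseline model, a point that becomes important in the Results.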

Completely standardized, or "path analytic," solutions (Bentler, 1989) were generated for the three competing models. Although O'Grady (1989) noted correctly that the hierarchical two- and three-factor models for the WISC-R are statistically equivalent to the correlated two- and three-factor models (e.g., they result in identical estimates of model fit), and thus represent "trivial" variations from the standpoint of model comparison, the model distinctions are not trivial theoretically. Thus, parameter estimates (factor-variable loadings, etc.) were calculated for the hierarchical two- and three-factor models. The hierarchical two-factor model was identified by (a) specifying an equality constraint for the second-order loadings on the verbal and performance factors and (b) fixing the variances for the disturbances of the first-order factors at 1.0. Similarly, fixed variances (1.0) for the factor disturbances were used to identify the hierarchical three-factor model. The modeling of hierarchical factor structures permitted the estimation of second-order factor loadings and indirect effects (i.e., relationships between the second-order factors and measured variables) with no loss of information or distortion of the first-order structural equations and factor structures.
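The identification logic can be sketched in standardized form (our notation, offered as an illustration rather than the original model syntax). Writing each first-order factor as a function of the general factor $G$,

```latex
F_j = \gamma_j G + d_j, \qquad
\operatorname{corr}(F_1, F_2) = \gamma_1 \gamma_2 = \phi, \qquad
\gamma_1 = \gamma_2 \;\Rightarrow\; \gamma = \sqrt{\phi}
```

The equality constraint thus determines the second-order loadings directly from the first-order factor correlation $\phi$, and the indirect effect of $G$ on subtest $i$ through factor $j$ is the product $\gamma_j \lambda_{ij}$. With the factor correlations reported in the Results ($\phi$ = .76/.79), each first-order factor shares $\gamma^2 = \phi$, or 76% to 79%, of its variance with the second-order factor.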

The standardized parameter estimates for the three competing models were developed from the average intercorrelation matrix (Wechsler, 1974, Table 15) rather than separately for each age group. Analyses of the average intercorrelation matrix have been discouraged because of the questionable status of supporting assumptions regarding the equality of covariance matrices across the 11 age groups (e.g., Conger et al., 1979; O'Grady, 1989). Nonetheless, a statistical rejection of the assumption of equal covariance matrices is problematic because of the Type II error rates associated with all pairwise comparisons across 11 groups, each with n = 200 (e.g., Bentler, 1989; Stevens, 1986; Tanaka, 1987). In fact, Conger et al. (1979) concluded that, although "differences among age groups were manifest in the subscale reliabilities and true score covariance matrices" (p. 424, emphasis added), the observed score correlation/covariance matrices were essentially equivalent. In light of the inordinate power to detect "statistically significant" but practically inconsequential differences across the 11 age groups, the less stringent finding of equivalent observed score correlation/covariance matrices (Conger et al., 1979) was viewed as sufficient justification to use the average intercorrelation matrix to estimate parameters.

As recommended by McDonald (1985), both the factor-pattern and factor-structure matrices were examined. The elements of the factor-pattern matrix are specified by the system of structural equations used to define a model. For example, in the two-factor model, the loading for Vocabulary on the verbal factor is estimated, but the loading for Vocabulary on the performance factor is fixed at zero. In contrast, the factor-structure matrix is the reproduced correlation matrix for measured and latent variables (Bentler, 1989), which represents the optimal estimate of the population matrix under the assumption that the hypothesized factor model is correct. "The fact that a regression weight is zero [in the factor-pattern matrix] does not imply that the corresponding correlation [in the factor-structure matrix] is zero" (McDonald, 1985, p. 39). This can reveal "interpretive problems" for confirmatory models that have highly correlated factors, because it may be difficult to "untangle the contributions of the factors to the variance of each variable" (McDonald, 1985, p. 101). Simply, the factor-structure matrix may be used to evaluate whether "the correlations of variables with 'wrong' factors . . . are . . . too high" (McDonald, 1985, p. 101).
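The relation between the two matrices is the standard identity for oblique models (our notation): with factor-pattern matrix $\Lambda$ and factor correlation matrix $\Phi$, the factor-structure matrix is the product

```latex
\text{Structure} = \Lambda\,\Phi
```

so a subtest's correlation with the "wrong" factor equals its pattern loading multiplied by the factor correlation. For illustration, a Vocabulary pattern loading of about .86 on the verbal factor, combined with the verbal-performance correlation of .76 reported in the Results, implies a structure coefficient of roughly .86 × .76 ≈ .65 on the performance factor; high factor correlations therefore guarantee high cross-factor correlations.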

The data analyses were conducted with the Statistical Package for the Social Sciences (SPSS) (Norusis/SPSS Inc., 1988) and EQS (Bentler, 1989).


RESULTS

Parallel analysis (PA) and minimum average partial (MAP) analysis for the 11 age groups are reported in Table 1. The PA and MAP criteria converged at 10 of the 11 age levels, suggesting a one-factor solution for ages 6½ through 12½ (seven age levels) and a two-factor solution at ages 13½, 15½, and 16½ (three age levels). The results at age 14½ were mixed, but generally supported a one-factor solution since the extraction of a second factor did not reduce the unknown variance in the matrix. Alternatively, on the basis of the average intercorrelation matrix (Wechsler, 1974, Table 15), the MAP criterion suggested a one-factor solution, and the PA criterion suggested a two-factor solution (given that the PA criterion for N = 2,200 closely approximates the Kaiser rule).

As evidenced by variability in the magnitude of eigenvalue estimates across the 11 age groups, Conger et al. (1979) observed that, although studies should find "comparable across-age factors . . . , the saliency of factors may vary from age to age" (p. 434). To clarify this issue, computer-simulated data were generated to evaluate the plausibility of subject sampling as a source of observed variability in the magnitude of eigenvalue estimates. Using the average intercorrelation matrix as a matrix of population values, five replications of the WISC-R standardization sample (55 groups of n = 200) were simulated. Normative expectations for sampling error were derived from the distribution of eigenvalues resulting from the principal-component analysis of each of the 55 samples. For Factors 1, 2, and 3, the 95th percentile criterion values were 5.87, 1.41, and 1.11, respectively; the 98th percentile criterion values were 6.15, 1.54, and 1.22, respectively. Although the observed eigenvalues for Factors 2 and 3 at age 13½ (i.e., 1.43 and 1.11, respectively) exceeded the 95th percentile criterion (experimentwise error rate [EER] = 55%), the eigenvalue estimates across the 11 age groups did not exceed the 98th percentile criterion (which provided better protection for experimentwise error; EER = 22%). The results suggested that sampling error could not be rejected as a plausible source of age-related variability in the saliency of factors. Despite the implication that statistical analyses might justifiably be focused on the average intercorrelation matrix, inferences regarding the number of factors to retain were still problematic because the designated population matrix might support either a one- or two-factor model (on the basis of the PA/Kaiser and MAP criteria).

The goodness-of-fit indices under ML and GLS estimation are reported in Table 2. In general, the fit indices under ML estimation supported a two-factor model. As might be expected for n = 200, the NFIs (O'Grady, 1989, p. 181) were downwardly biased in relation to the population estimates of model fit provided by the CFIs (e.g., Bentler, 1990). Yet, despite the overall increase in the magnitude of fit, the corrections for sample size bias were roughly comparable across the three competing models. As a consequence, there were no important differences between the estimates of incremental fit based on the CFI and those based on the NFI. Although O'Grady (1989) described the increments in model fit associated with the NFIs as "small" (e.g., .07 for the two-factor model relative to the one-factor model), the two-factor model showed satisfactory fit (CFI > .90) at each of the 11 age levels (mean CFI = .951) and was the preferred model in terms of parsimonious comparative fit (PCFI).


TABLE 1
PARALLEL AND AVERAGE PARTIAL ANALYSES FOR THE 11 AGE GROUPS

                                                        Factor
Age group    Criterion                           One       Two       Three
             Mean eigenvalues for
               random data (N = 200)             1.43      1.30      1.22
6½           Observed eigenvalues                5.37*     1.07       .92
             Average squared partial             .019**    .024      .037
7½           Observed eigenvalues                5.28*     1.16       .94
             Average squared partial             .024**    .027      .044
8½           Observed eigenvalues                4.97*     1.26      1.11
             Average squared partial             .025**    .031      .035
9½           Observed eigenvalues                5.79*     1.19       .93
             Average squared partial             .028**    .030      .040
10½          Observed eigenvalues                5.09*     1.21      1.07
             Average squared partial             .024**    .034      .039
11½          Observed eigenvalues                5.54*     1.25       .87
             Average squared partial             .028**    .035      .041
12½          Observed eigenvalues                5.84*     1.11       .92
             Average squared partial             .024**    .033      .039
13½          Observed eigenvalues                5.42*     1.43*     1.11
             Average squared partial             .039      .031**    .041
14½          Observed eigenvalues                5.64*     1.20      1.00
             Average squared partial             .030**    .030**    .043
15½          Observed eigenvalues                5.09*     1.33*     1.03
             Average squared partial             .035      .031**    .042
16½          Observed eigenvalues                5.38*     1.32*      .88
             Average squared partial             .032      .031**    .042

Average intercorrelation matrix (N = 2,200)
             Observed eigenvalues                5.40a     1.16a      .97
             Average squared partial             .024**    .026      .034

aThe observed eigenvalue was greater than 1 (i.e., the parallel analysis criterion approximates the Kaiser rule in very large samples).
*The observed eigenvalue exceeded the mean eigenvalue for random data.
**The minimum average squared partial correlation.

Nevertheless, the CFI estimates for the one-factor model were of sufficient magnitude (e.g., mean CFI = .876) to appreciably strengthen the argument that the one-factor model might be "salvaged" through modifications (O'Grady, 1989). For example, a "minor" factor indicative of shared method variance (e.g., see Newcomb & Bentler, 1988) might be added to the one-factor model by fitting correlated residuals for selected subtests (e.g., Block Design and Object Assembly). In contrast, although the three-factor model produced consistently high estimates of model fit (mean CFI = .965), the average increment relative to the two-factor model was only .017.


TABLE 2
[Goodness-of-fit indices (NFI, CFI, and PCFI) for the three competing models under maximum likelihood and generalized least squares estimation; the body of this table is not legible in the source scan.]


A distinctly different recommendation for model selection emerged from the fit indices obtained under GLS estimation. The goodness-of-fit estimates for the one-factor model under GLS were consistently in the high .90s. Given the magnitude of fit for the one-factor model (mean CFI = .990), there was little opportunity for improvement in fit through the estimation of either correlated residuals or additional major factors.

The disparities between the ML and the GLS estimates of goodness of fit were primarily a function of differences in the statistical definition of a "null" or "independence" model (i.e., the "norm" for the normed fit index). The chi-square, or lack-of-fit, estimates for the no factor null model were substantially higher under GLS than under ML estimation (mean chi-square under GLS = 6,206.3; mean chi-square under ML = 1,034.4). Primarily as a consequence of these baseline differences, the proportional reduction in unexplained covariation was much higher under GLS estimation. Unfortunately, because there was no basis for determining which no factor null model might provide the most reasonable baseline for comparison (e.g., Bentler, 1989; Bentler & Bonett, 1980; Mulaik et al., 1989; Tanaka, 1987), the differences in goodness-of-fit estimates across the two methods essentially were uninterpretable. As was the case for PA and MAP analyses, the results might support either a one- or a two-factor model.

The standardized solutions for the one-factor and hierarchical two-factor models (as estimated from the average intercorrelation matrix) are detailed in Table 3. In contrast to the conflicting estimates of goodness of fit, the model parameter estimates (factor-variable loadings, across-factor correlations, etc.) were highly consistent across ML and GLS estimation. The key feature of the hierarchical two-factor solution was the high correlation across factors (r = .76/.79), which was reflected in the factor-variable correlations. Most notably, although only one Performance subtest (i.e., Block Design) was moderately correlated with the verbal factor (.600 range), four of the Verbal subtests—Information (.607/.633), Similarities (.599/.624), Vocabulary (.651/.681), and Comprehension (.560/.592)—were as highly correlated with the performance factor as were Picture Completion (.657/.665), Picture Arrangement (.607/.619), Coding (.398/.417), and Mazes (.519/.516).

Estimates for the three-factor model provided a similar set of results (Table 4). The hierarchical second-order factor was defined primarily by verbal comprehension (r = .934/.942), but each of the first-order factors was a "good" measure of the second-order factor (as defined by loadings in the .80 range) (e.g., Kaufman, 1979a; Sattler, 1988). Relatedly, the verbal comprehension factor was highly correlated with both perceptual organization (r = .742/.770) and freedom from distractibility (r = .790/.815). Especially problematic from the standpoint of discriminant validity, the four verbal comprehension subtests—Information (.627/.653), Similarities (.623/.643), Vocabulary (.680/.703), and Comprehension (.584/.608)—were as highly correlated with freedom from distractibility as were Digit Span (.567/.564) and Coding (.456/.464).

DISCUSSION

Capitalizing on recent methodological advances in exploratory and confirmatory factor analysis, the standardization data of the WISC-R were systematically reanalyzed.


TABLE 3
[Standardized solutions for the one-factor and hierarchical two-factor models; the body of this table is not legible in the source scan.]

TABLE 4
[Standardized solution for the hierarchical three-factor model; the body of this table is not legible in the source scan.]


Departures from previous WISC-R research included the use of (a) relatively accurate exploratory criteria for determining the number of factors to retain, (b) computer-simulated data to evaluate the potential contributions of sampling error to observed variability in the saliency of factors across the 11 age groups, (c) estimates of comparative and parsimonious fit (CFI and PCFI) under both maximum likelihood and generalized least squares estimation, and (d) standardized factor-structure matrices for the three competing models. In general, the results lent further credence to O'Grady's (1989) arguments in favor of a one-factor model.

Although we think that a one-factor model is plausible, it is philosophically impossible to demonstrate that a one-, two-, or three-factor model is "true" in a strict sense (e.g., Bentler, 1989; Breckler, 1990; McDonald & Marsh, 1990). A myriad of alternative factor models might be supported on the basis of adequately reproducing the correlations among the WISC-R subtests, including the four-factor model proposed by Horn (1985). Although truth is not necessarily constrained by parsimony (see Messick, 1989), reasonable (value-biased) arguments nonetheless can be advanced on the basis of converging lines of evidence regarding the theoretical plausibility and potential usefulness of alternative factor solutions (e.g., Cronbach, 1988; House, 1977; Messick, 1989). In the discussion that follows, the concept of parsimony initially is applied to analyses of the discriminant validity and incremental explanatory power of multiple-factor solutions. The logic subsequently is extended to analyses of the functional relations between assessment activity and treatment outcome. It is argued that a one-factor model for the WISC-R is both plausible and parsimonious, and represents the most defensible factor solution of those considered. Critical issues related to the predictive and treatment utility of the general factor are explored, as well as alternatives to the use of general IQ in educational settings.

The One-Factor Model: An Argument Based on Parsimony

Alternative criteria for determining the number of factors to retain provided a somewhat conflicted set of results. On the basis of the average intercorrelation matrix, the one-factor model was supported by the minimum average partial (MAP) criterion and estimates of model fit under generalized least squares (GLS) estimation, and the two-factor model was supported by the parallel analysis (PA/Kaiser) criterion and estimates of model fit under maximum likelihood (ML) estimation. Although satisfactory estimates of comparative fit also were obtained for the three-factor model, this model (a) was not supported by exploratory criteria and (b) failed to make a substantial, incremental contribution to the level of fit obtained for the two-factor model.

Analytic support for the one-factor model was drawn from the factor-structure matrices. Regardless of the conflicting guidelines for model selection, the plausibility of the two- and three-factor models was constrained primarily by the high correlations across factors. The estimated correlation between the verbal and performance factors meant that from 76% (ML estimation) to 79% (GLS estimation) of the variance in verbal-performance factor scores was accounted for by a second-order factor, an implicit hierarchical relationship (Kenny, 1979, p. 112) that was illustrated by the second-order loadings for the two-factor model. For the hierarchical three-factor model, the second-order loadings ranged from .795/.818 (Factor 2) to .934/.942 (Factor 1). A counterargument might be raised on the grounds that the across-factor correlations were less than unity (i.e., an extremely lenient test of discriminant validity that has been widely used in the literature) (e.g., Anderson & Gerbing, 1988; Kenny, 1979), but the counterargument neglects the "practical significance" of the second-order factor loadings and high correlations across factors (see Anderson & Gerbing, 1988, p. 416).

The factor-structure matrices showed that the hypothesized factors could not be unambiguously defined in terms of a specific set of WISC-R subtests. Simply, a meaningful differentiation (or definition) of the latent variables was not possible; too many subtests were too highly correlated with the "wrong" factors (e.g., McDonald, 1985). For example, it is difficult to conceptualize how the construct of spatial or perceptual organization abilities might be as highly correlated with scores on the Information (.607/.633) and Vocabulary (.651/.681) subtests as with scores on the Picture Completion (.657/.665) and Picture Arrangement (.607/.619) subtests. Identical analytic problems were symptomatic of the three-factor model (which was unambiguously rejected on the basis of multiple, formal criteria). The acceptance of a two-factor model requires acceptance of the theoretical anomalies. The high levels of variable complexity were most parsimoniously represented by a single, general factor.

The results of the present study did not dispute the demonstrated replicability of the multiple-factor solutions (e.g., Kronenberg & ten Berge, 1987; O'Grady, 1989; Reynolds & Harding, 1983), but rather drew attention to the triviality of the replicated factors. Triviality was inferred from the "interpretive problems" evident in the factor-structure matrices for the multiple-factor solutions. Given O'Grady's (1989) demonstration that uncorrelated multiple-factor solutions for the WISC-R are implausible, the complexity of the variables was fundamental to the models considered, and cannot be dismissed in favor of an uncorrelated factor solution (which would suppress variable loadings on more than one factor). As with Kendall's (1959) Hiawatha, whose arrows consistently missed the mark, little solace can be gained from the replication of theoretically trivial factors.

The strength of a general factor for the Wechsler series has been observed by numerous researchers (e.g., Conger et al., 1979; Kaufman, 1975, 1979a, 1979b; Sattler, 1988; Silverstein, 1980).

Averaged across the three instruments [Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R), WISC-R, and Wechsler Adult Intelligence Scale-Revised (WAIS-R)], the first unrotated component accounted for 50% of the total variance, and the first unrotated factor accounted for 98% of the common variance. There is, indeed, a large general component/factor in Wechsler's scales. (Silverstein, 1987, p. 383)

In fact, Silverstein's (1987) estimates of common variance accounted for (i.e., 98%) are essentially equivalent to the goodness-of-fit estimates under GLS (e.g., mean NFI = .981). Although the issues have not as yet been adequately addressed for the WPPSI-R (e.g., Gyurke, Stone, & Beyer, 1990), confirmatory support for a single-factor solution to the WAIS-R was provided by O'Grady (1983).


Moreover, a recent clustering of subtest profiles based on the WISC-R standardization sample (McDermott et al., 1989) has provided evidence of convergence across alternative methods of structural analysis. McDermott et al. (1989) described "core profile types" that essentially were defined in terms of global or Full Scale IQ.

Perhaps the strongest argument for a one-factor solution may be drawn from research on the predictive validity of multiple-factor solutions. Hale and associates (e.g., Hale, 1981; Hale & Raymond, 1981; Hale & Saxe, 1983) found that multiple-factor solutions for the WISC-R had limited incremental validity with respect to the concurrent prediction of academic achievement. Studies that have suggested a contribution of third-factor scores to the prediction of achievement typically have not taken account of either Full Scale IQ or the Arithmetic subtest. In fact, given the pattern of factor-variable loadings obtained in the present study, "mathematical achievement" represents a plausible rival hypothesis for the meaning of the third factor (i.e., essentially a "singlet" factor, trivial from the standpoint of common-factor theory, but more credible than freedom from distractibility from the standpoint of discriminant validity). The aptitude-achievement distinction may be valid theoretically, but is highly undesirable within a scale that is intended to measure scholastic aptitude. The key point is that decades of research on both the WISC-R and its predecessor, the WISC, have failed to establish the treatment utility of multiple-factor solutions or subtest profiles (e.g., Frank, 1983; Kramer et al., 1987; Witt & Gresham, 1985). Thus, in both a statistical and practical sense, there is little reason to believe that discrepancies between Verbal and Performance scale scores represent anything more clinically useful than either random measurement error or method variance (e.g., persons who do not speak English will perform poorly on the Verbal scale; persons with visual and motor impairments will perform poorly on the Performance scale).

Predictive and Treatment Utility of the General Factor

In contrast to the meager incremental validity of the multiple-factor solutions, the correlations between measures of general IQ and school performance (generally in the .50 range) are among the most impressive "real-life" predictive relationships that one may encounter in the behavioral sciences (e.g., Cohen, 1988; Heller et al., 1982; Travers, 1982). Although the predictive validity of the WISC-R has been extensively documented (e.g., Hale, 1983; Sattler, 1988), the implications of these data for treatment planning are still controversial (e.g., Heller et al., 1982; Messick, 1989; Travers, 1982).

In the context of educational decision making it is not enough to know that IQ tests predict future classroom performance, nor would it be enough to know that they measure general ability. It is necessary to ask whether IQ tests provide information that leads to more effective instruction than would otherwise be possible. (Heller et al., 1982, p. 53)

On the basis of an exhaustive review of the literature, Cronbach and Snow (1977) concluded that


the treatment for students low in general ability should differ from that for able students; . . . a person with poor verbal development [i.e., low general scholastic aptitude] is handicapped in every field until the verbal load is consciously reduced. (p. 521)

Thus, there is some empirical support for historical assumptions regarding the treatment utility of general IQ, at least in a very broad, and correspondingly limited, sense (e.g., Cronbach & Snow, 1977; Snow, 1986; Travers, 1982). Cronbach and Snow (1977), appreciative of the limitations, cautioned that the use of general IQ for educational decision making was problematic both in terms of (a) an obvious lack of specificity for the design of instructional interventions and (b) a lack of adequate dependability for important, life-affecting decisions.

In terms of the dependability of decisions, inferences about classification status and educational program based on the WISC-R need to be considered in relation to alternative measures of general scholastic aptitude (e.g., the Stanford-Binet Intelligence Scale) and other relevant parameters of score variation (time, examiner, etc.) (e.g., Barnett & Macmann, 1990). For example, Cronbach and Snow (1977) suggested that "aptitude distinctions that guide educational decisions must be based on clear evidence that such distinctions are stable over the period governed by the decision" (p. 161). In a computer simulation of the dependability of classification decisions, using two measures correlated at a .80 level, less than 50% of individual cases were consistently classified as "low" performers across alternative measures (Macmann, Barnett, Lombard, Belton-Kocher, & Sharpe, 1989). The problems may be attributed to both (a) the indeterminacy or limited generalizability of scores within a construct domain (e.g., McDonald, 1985) and (b) the loss of information inherent in translating a graduated distribution of scores to either a dichotomous (e.g., eligible vs. not eligible) or otherwise limited range of decision outcomes (e.g., Barnett & Macmann, 1990; Cohen, 1983; Macmann et al., 1989).
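The simulation logic reported by Macmann et al. (1989) can be illustrated with a minimal numpy sketch. (The cutoff below, 1 SD below the mean, and the sample size are our illustrative choices, not the original study's parameters.)

```python
import numpy as np

def low_classification_agreement(r=0.80, cutoff=-1.0, n=100_000, seed=0):
    """Of the cases classified 'low' on either of two measures correlated
    at r, what proportion is classified 'low' on both?"""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, r], [r, 1.0]])
    scores = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    low = scores < cutoff                 # n x 2 boolean array
    both = np.sum(low[:, 0] & low[:, 1])
    either = np.sum(low[:, 0] | low[:, 1])
    return both / either

print(low_classification_agreement())    # under these settings, below .50
```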

Given the aforementioned problems, a satisfactory level of decision agreement (i.e., > .90) is virtually impossible to achieve over the 3-year time span that governs special education decisions (Macmann et al., 1989). A study by McDermott et al. (1989) provides a partial illustration of the difficulties. McDermott et al. suggested that "core profile types" (essentially defined in terms of Full Scale IQ) might provide a foundation for future validity research and professional practices with the WISC-R. Yet, on the basis of test-retest data spanning a period of merely 3 to 5 weeks, the overall estimate of WISC-R classification or "typal stability" was only 64.7% (57.5%, corrected for chance), revealing high rates of classification error over a short time period. Of critical importance, the full-scale typology investigated by McDermott et al. (1989) was the most stable configuration possible, illustrating the precarious, time-bound limits of clinical inferences associated with less stable patterns, such as verbal-performance contrasts (e.g., Conger et al., 1979) and ipsative subtest profiles (e.g., Kaufman, 1979a). These issues might appear tangential within the context of a factor-analytic study, but are fundamental to evaluating the validity of putative constructs from the standpoint of clinical use (e.g., American Educational Research Association, American Psychological Association, National Council on Measurement in Education, 1985; Barnett & Zucker, 1990; Messick, 1989).


Parsimony Revisited: Alternatives to the Use of General IQ

Disenchantment with the social and educational consequences of IQ testing has spawned decades of legal and professional controversy (e.g., Cronbach, 1988; Heller et al., 1982; Reschly, 1988). In this section, we frame the broad parameters of policy debate through a critique of two major alternatives to the use of general IQ in educational settings: (a) assessments of cognitive-processing skills and (b) assessments of instructional outcomes. The critique is decidedly biased by our preference for assessment strategies that are conducive to treatment-focused decision making in institutional settings (e.g., Barnett & Zucker, 1990; Fishman & Neigher, 1987; Hayes et al., 1986, 1987; Paul, 1986, 1987; Peterson, 1987). Although we think that the value bias is reasoned and appropriate, the reader should note that many other perspectives on the validity of assessment practices are possible (e.g., Cronbach, 1988; Messick, 1989).

Superimposing a Substantive Theory. One response to the problems associated with measures of general IQ has been to design alternative measures of intellectual competencies based on "process theories" of intelligence (e.g., Cronbach & Snow, 1977; Snow, 1986; Snow & Lohman, 1989). Although the approach would potentially yield psychometric tools of some practical value, contemporary efforts to redefine the WISC-R and other measures of general scholastic aptitude in terms of process theory (e.g., Kaufman, 1979a; Naglieri, 1989; Reynolds & Kaufman, 1985; Sattler, 1988), spanning colorful interpretations of "more than 75 different patterns of subtest variation" (McDermott et al., 1989, p. 292), have no demonstrated treatment utility. Moreover, strong tests of intervention-related effectiveness (e.g., Hayes et al., 1986, 1987) are likely to founder on the unreliable classifications of extreme performance derived from process-related cognitive measures and profile patterns (e.g., Macmann & Barnett, 1985; Macmann et al., 1989; McDermott et al., 1989). The results of the present study indicated that the myriad specific abilities, hypothesized to underlie performance on the WISC-R, were not effectively differentiable from general scholastic aptitude. The "specific variance" that may be ascribed to individual WISC-R subtests is both unreliable and uninterpretable (e.g., Boehm, 1985; McDermott, Fantuzzo, & Glutting, 1990; Thorndike, 1975). Despite understandable protests (e.g., Kaufman, 1990), the suggestion that this empirical morass can be unraveled through recourse to clinical judgment is wishful thinking, at best (e.g., Barnett, 1988; Barnett & Zucker, 1990).

Direct and Repeated Measurements of Criterion Performance. An alternative strategy for coping with the dearth of treatment-relevant information available from the WISC-R involves the planning and adaptation of instruction based on intensive monitoring of a student's progress and responses to instructional intervention (e.g., Cronbach & Snow, 1977; Fuchs & Fuchs, 1986; Gickling & Havertape, 1981; Shapiro, 1990; Shinn, Rosenfield, & Knutson, 1989). Rather than relying on a crude predictor, such as general IQ, the effectiveness of alternative instructional interventions can be evaluated directly and idiographically through the analysis of repeated measures of criterion performance (e.g., curriculum-based measures of academic performance) using single-case research designs (e.g., alternating treatments) (e.g., Wacker, Steege, & Berg, 1988). Although the methodology is vulnerable to generic criticisms of assessment practice (including problems with the dependability of decisions and the ever-present threat of deleterious, long-term social and educational outcomes) (e.g., Barnett & Macmann, 1990; Cronbach & Snow, 1977; Derr & Shapiro, 1989), the methodology already has made important contributions toward the desiderata for educational decision making outlined by Cronbach and Snow (1977) and others (e.g., Heller et al., 1982; Hobbs, 1975, 1980; Reynolds, 1984). From the standpoint of incremental treatment utility (e.g., Hayes et al., 1986, 1987; Heller et al., 1982; Rosenfield & Reynolds, 1990), it is difficult to imagine a plausible circumstance in which inferences about a student's intellectual processing (based on the analysis of WISC-R Full Scale, Verbal, Performance, or assorted subtest scores) could markedly improve the quality of instruction that may be afforded through sound instructional principles and the alteration of methods and materials based on careful monitoring of intervention outcomes.

CONCLUSIONS

In summary, the conventional paradigm for the factor structure of the WISC-R (i.e., verbal comprehension, perceptual organization, and freedom from distractibility) (e.g., Kaufman, 1975, 1979a, 1979b) may be challenged on both logical and empirical grounds. Although it would be absurd to suggest that a general and specific variance theory of intelligence has been resurrected by the intercorrelations among WISC-R subtests (i.e., the findings relate primarily to the structure of the test rather than the structure of the intellect) (e.g., Gould, 1981; Snow, 1986; Snow & Lohman, 1989), the one-factor model was empirically defensible and at least theoretically consistent with historical and resurging contemporary interest in the predictive power of general intelligence (e.g., McNemar, 1964; Spearman, 1904; Thorndike, 1985). Kaufman (1990) has argued that a one-factor model is bereft of both theoretical and instructional value. Although we agree, we also find it difficult to believe (for the reasons outlined in the preceding discussion) that the extraction of either additional factors or ipsative profile patterns will enhance the contributions of the WISC-R to professional practices and child outcomes.

Undoubtedly, we have failed to present an argument that is sufficiently compelling to rule out all opposing arguments and points of view (Cronbach, 1988). Our interpretations of the validity evidence are biased by a functional perspective (e.g., Cronbach, 1988; Peterson, 1987) and a scale of worth defined by meaningful treatment outcomes rather than administrative convenience (e.g., Barnett & Zucker, 1990; Hayes et al., 1986, 1987; Messick, 1989). Although measures of general scholastic aptitude, such as the WISC-R, have demonstrated predictive validity (e.g., Thorndike, 1985), "prediction alone . . . is insufficient evidence of the test's educational utility" (Heller et al., 1982, p. 61). Aside from regulatory or administrative requirements (e.g., see Reschly, Kicklighter, & McKee, 1988), there is little justification for the measurement of a crude predictor when alternative means are readily available to permit more precise (albeit still imperfect) decisions about the types of instruction most effective in promoting an individual's skill development or behavioral competencies. Given the available alternatives, the arguments regarding the relative advantages of either a one- or a two-factor solution for the WISC-R are moot: Neither factor solution nor the cherished belief that the limitations of the Wechsler series can be overcome through clinical judgment (e.g., Kaufman, 1990) is likely to improve the quality of professional decisions that can be made on the basis of alternative assessment techniques. In sum, the traditional paradigm for the use of the WISC-R (i.e., to guide differential diagnoses and educational placement decisions) also should be challenged (e.g., Heller et al., 1982; Hobbs, 1975, 1980; Reschly, 1988; Rosenfield & Reynolds, 1990; Witt & Gresham, 1985).

Authors' Note

The authors gratefully acknowledge the comments of Roger Stuebing, Wayne Velicer, Fran Mues, and three anonymous reviewers on an earlier draft of the manuscript. Steffani Burd provided valuable assistance with the statistical analyses.

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

Anderson, J.C., & Gerbing, D.W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103, 411-423.

Barnett, D.W. (1988). Professional judgment: A critical appraisal. School Psychology Review, 17, 658-672.

Barnett, D.W., & Macmann, G.M. (1990, August). Decision reliability and validity: Contributions and limitations of alternative assessment strategies. Paper presented at the annual meeting of the American Psychological Association, Boston.

Barnett, D.W., & Zucker, K. (1990). The personal and social assessment of children: An analysis of current status and professional practice issues. Boston: Allyn & Bacon.

Bentler, P.M. (1989). EQS structural equations program manual. Los Angeles: BMDP Statistical Software.

Bentler, P.M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.

Bentler, P.M., & Bonett, D.G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.

Bentler, P.M., & Mooijaart, A. (1989). Choice of structural model via parsimony: A rationale based on precision. Psychological Bulletin, 106, 315-317.

Boehm, A. (1985). Educational applications of intelligence testing. In B.B. Wolman (Ed.), Handbook of intelligence: Theories, measurements, and applications (pp. 933-964). New York: Wiley.

Breckler, S.J. (1990). Applications of covariance structure modeling in psychology: Cause for concern? Psychological Bulletin, 107, 260-273.

Carlson, L., Reynolds, C.R., & Gutkin, T.B. (1983). Consistency of the factorial validity of the WISC-R for upper and lower SES groups. Journal of School Psychology, 21, 319-326.

Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7, 249-253.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.


Conger, A.J., Conger, J.C., Farrell, A.D., & Ward, D. (1979). What can the WISC-R measure? Applied Psychological Measurement, 3, 421-436.

Crawford, C.B., & Koopman, P. (1973). A note on Horn's test for the number of factors in factor analysis. Multivariate Behavioral Research, 8, 117-125.

Cronbach, L.J. (1988). Five perspectives on validity argument. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 3-17). Hillsdale, NJ: Erlbaum.

Cronbach, L.J., & Snow, R.E. (1977). Aptitudes and instructional methods: A handbook for research on interactions. New York: Irvington.

Derr, T.F., & Shapiro, E.S. (1989). A behavioral evaluation of curriculum-based assessment of reading. Journal of Psychoeducational Assessment, 7, 148-160.

Fishman, D.B., & Neigher, W.D. (1987). Technological assessment: Tapping a "third culture" for decision-focused psychological measurement. In D.R. Peterson & D.B. Fishman (Eds.), Assessment for decision (pp. 44-76). New Brunswick, NJ: Rutgers University Press.

Frank, G. (1983). The Wechsler enterprise: An assessment of the development, structure, and use of the Wechsler tests of intelligence. New York: Pergamon Press.

Fuchs, L.S., & Fuchs, D. (1986). Linking assessment to instructional intervention: An overview. School Psychology Review, 15, 318-323.

Gickling, E.E., & Havertape, J.F. (1981). Curriculum-based assessment. In J.A. Tucker (Ed.), Non-test based assessment. Minneapolis: The National School Psychology Inservice Training Network, University of Minnesota.

Gould, S.J. (1981). The mismeasure of man. New York: Norton.

Gyurke, J.S., Stone, B.J., & Beyer, M. (1990). A confirmatory factor analysis of the WPPSI-R. Journal of Psychoeducational Assessment, 8, 15-21.

Hale, R.L. (1981). Concurrent validity of the WISC-R factor scores. Journal of School Psychology, 19, 274-278.

Hale, R.L. (1983). Intellectual assessment. In M. Hersen, A.E. Kazdin, & A.S. Bellack (Eds.), The clinical psychology handbook (pp. 345-376). New York: Pergamon Press.

Hale, R.L., & Raymond, M.R. (1981). Wechsler intelligence scale for children-revised (WISC-R) patterns of strengths and weaknesses as predictors of the intelligence-achievement relationship. Diagnostique, 7, 35-42.

Hale, R.L., & Saxe, J. (1983). Profile analysis of the Wechsler intelligence scale for children-revised. Journal of Psychoeducational Assessment, 1, 155-162.

Hayes, S.C., Nelson, R.O., & Jarrett, R.B. (1986). Evaluating the quality of behavioral assessment. In R.O. Nelson & S.C. Hayes (Eds.), Conceptual foundations of behavioral assessment (pp. 461-503). New York: Guilford.

Hayes, S.C., Nelson, R.O., & Jarrett, R.B. (1987). The treatment utility of assessment: A functional approach to evaluating assessment quality. American Psychologist, 42, 963-974.

Haynes, S.N. (1986). The design of intervention programs. In R.O. Nelson & S.C. Hayes (Eds.), Conceptual foundations of behavioral assessment (pp. 385-429). New York: Guilford.

Heller, K.A., Holtzman, W.H., & Messick, S. (Eds.). (1982). Placing children in special education: A strategy for equity. Washington, DC: National Academy.

Hobbs, N. (1975). The futures of children. San Francisco: Jossey-Bass.

Hobbs, N. (1980). An ecologically oriented, service-based system for the classification of handicapped children. In S. Salzinger, J. Antrobus, & J. Glick (Eds.), The ecosystem of the "sick" child (pp. 271-290). New York: Academic Press.

Horn, J.L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179-185.

Horn, J.L. (1985). Remodeling old models of intelligence. In B.B. Wolman (Ed.), Handbook of intelligence: Theories, measurements, and applications (pp. 267-300). New York: Wiley.

House, E.R. (1977). The logic of evaluative argument. Los Angeles: University of California Press.

James, L.R., Mulaik, S.A., & Brett, J.M. (1982). Causal analysis: Assumptions, models, and data. Beverly Hills, CA: Sage.


Kaufman, A.S. (1975). Factor analysis of the WISC-R at 11 age levels between 6½ and 16½ years. Journal of Consulting and Clinical Psychology, 43, 135-147.

Kaufman, A.S. (1979a). Intelligent testing with the WISC-R. New York: Wiley.

Kaufman, A.S. (1979b). WISC-R research: Implications for interpretation. School Psychology Digest, 8, 5-27.

Kaufman, A.S. (1981). The WISC-R and learning disabilities assessment: State of the art. Journal of Learning Disabilities, 14, 520-526.

Kaufman, A.S. (1990). Assessing adolescent and adult intelligence. Boston: Allyn & Bacon.

Kendall, M.G. (1959). Hiawatha designs an experiment. American Statistician, 13, 23-24.

Kenny, D.A. (1979). Correlation and causality. New York: Wiley.

Keppel, G. (1982). Design and analysis: A researcher's handbook (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.

Kramer, J.J., Henning-Stout, M., Ullman, D.P., & Schellenberg, R.P. (1987). The viability of scatter analysis on the WISC-R and SBIS: Examining a vestige. Journal of Psychoeducational Assessment, 5, 37-47.

Kroonenberg, P.M., & ten Berge, J.M.F. (1987). Cross-validation of the WISC-R factorial structure using three-mode principal components analysis and perfect congruence analysis. Applied Psychological Measurement, 11, 195-210.

Macmann, G.M., & Barnett, D.W. (1985). Discrepancy score analysis: A computer simulation of classification stability. Journal of Psychoeducational Assessment, 4, 363-375.

Macmann, G.M., Barnett, D.W., Lombard, T.J., Belton-Kocher, E., & Sharpe, M.N. (1989). On the actuarial classification of children: Fundamental studies of classification agreement. The Journal of Special Education, 23, 127-149.

Marsh, H.W., Balla, J.R., & McDonald, R.P. (1988). Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103, 391-410.

McDermott, P.A., Fantuzzo, J.W., & Glutting, J.J. (1990). Just say no to subtest analysis: A critique on Wechsler theory and practice. Journal of Psychoeducational Assessment, 8, 280-302.

McDermott, P.A., Glutting, J.J., Jones, J.N., Watkins, M.W., & Kush, J. (1989). Core profile types in the WISC-R national standardization sample: Structure, membership, and applications. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 1, 292-299.

McDonald, R.P. (1985). Factor analysis and related methods. Hillsdale, NJ: Erlbaum.

McDonald, R.P., & Marsh, H.W. (1990). Choosing a multivariate model: Noncentrality and goodness of fit. Psychological Bulletin, 107, 247-255.

McNemar, Q. (1964). Lost: Our intelligence? Why? American Psychologist, 19, 871-882.

Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: Macmillan.

Mulaik, S.A., James, L.R., Van Alstine, J.V., Bennett, N., Lind, S., & Stilwell, C.D. (1989). Evaluation of goodness-of-fit indices for structural equation models. Psychological Bulletin, 105, 430-445.

Naglieri, J.A. (1989). A cognitive processing theory for the measurement of intelligence. Educational Psychologist, 24, 185-206.

Newcomb, M.D., & Bentler, P.M. (1988). Consequences of adolescent drug use: Impact on the lives of young adults. Newbury Park, CA: Sage.

Norusis, M.J./SPSS Inc. (1988). SPSS/PC+ V3.0 update manual. Chicago: SPSS.

O'Grady, K.E. (1983). A confirmatory maximum likelihood factor analysis of the WAIS-R. Journal of Consulting and Clinical Psychology, 51, 826-831.

O'Grady, K.E. (1989). Factor structure of the WISC-R. Multivariate Behavioral Research, 24, 177-193.

Paul, G.L. (Ed.). (1986). Assessment in residential treatment settings (Vol. 1): Principles and methods to support cost-effective quality operations. Champaign, IL: Research Press.

Paul, G.L. (1987). Rational operations in residential treatment settings through ongoing assessment of client and staff functioning. In D.R. Peterson & D.B. Fishman (Eds.), Assessment for decision (pp. 145-203). New Brunswick, NJ: Rutgers University Press.

Peterson, C.R., & Hart, D.H. (1979). Factor structure of the WISC-R for a clinic-referred population and specific subgroups. Journal of Consulting and Clinical Psychology, 47, 643-645.


Peterson, D.R. (1987). The role of assessment in professional psychology. In D.R. Peterson & D.B. Fishman (Eds.), Assessment for decision (pp. 5-43). New Brunswick, NJ: Rutgers University Press.

Ramanaih, N.V., O'Donnell, J.P., & Ribich, F. (1976). Multiple-group factor analysis of the Wechsler intelligence scale for children. Journal of Clinical Psychology, 32, 829-831.

Reddon, J.R. (1985). MAPF and MAPS: Subroutines for the number of principal components. Applied Psychological Measurement, 9, 97.

Reschly, D. (1988). Special education reform: School psychology revolution. School Psychology Review, 17, 465-481.

Reschly, D.J., Kicklighter, R., & McKee, P. (1988). Recent placement litigation: Part III. Analysis of differences in Larry P., Marshall, and S-1 and implications for future practices. School Psychology Review, 17, 39-50.

Reynolds, C.R., & Gutkin, T.B. (1980). Stability of the WISC-R factor structure across sex at two age levels. Journal of Clinical Psychology, 36, 775-777.

Reynolds, C.R., & Harding, R.E. (1983). Outcome in two large sample studies of factorial similarity under six methods of comparison. Educational and Psychological Measurement, 43, 723-728.

Reynolds, C.R., & Kaufman, A.S. (1985). Clinical assessment of children's intelligence with the Wechsler scales. In B.B. Wolman (Ed.), Handbook of intelligence: Theories, measurements, and applications (pp. 601-661). New York: Wiley.

Reynolds, M.C. (1984). Classification of students with handicaps. In E.W. Gordon (Ed.), Review of research in education (Vol. 11, pp. 63-92). Washington, DC: American Educational Research Association.

Rosenfield, S., & Reynolds, M.C. (1990). Mainstreaming school psychology: A proposal to develop and evaluate alternative assessment methods and intervention strategies. School Psychology Quarterly, 5, 55-65.

Sattler, J.M. (1988). Assessment of children (3rd ed.). San Diego: Author.

Shapiro, E.S. (1990). An integrated model for curriculum-based assessment. School Psychology Review, 19, 331-349.

Shinn, M.R., Rosenfield, S., & Knutson, N. (1989). Curriculum-based assessment: A comparison of models. School Psychology Review, 18, 299-316.

Silverstein, A.B. (1980). Estimating the general factor in the WISC-R. Psychological Reports, 47, 1185-1186.

Silverstein, A.B. (1982). Alternative multiple group solutions for the WISC and the WISC-R. Journal of Clinical Psychology, 38, 166-168.

Silverstein, A.B. (1987). Multidimensional scaling vs. factor analysis of Wechsler's intelligence scales. Journal of Clinical Psychology, 43, 381-386.

Snow, R.E. (1986). Individual differences and the design of educational programs. American Psychologist, 41, 1029-1039.

Snow, R.E., & Lohman, D.F. (1989). Implications of cognitive psychology for educational measurement. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 263-331). New York: Macmillan.

Sobel, M.E., & Bohrnstedt, G.W. (1985). Use of null models in evaluating fit of covariance structure models. In N.B. Tuma (Ed.), Sociological methodology (pp. 152-178). San Francisco: Jossey-Bass.

Spearman, C. (1904). General intelligence, objectively determined and measured. American Journal of Psychology, 15, 201-293.

Stevens, J. (1986). Applied multivariate statistics for the social sciences. Hillsdale, NJ: Erlbaum.

Tanaka, J.S. (1987). "How big is big enough?" Sample size and goodness of fit in structural equation models with latent variables. Child Development, 58, 134-146.

Thorndike, R.L. (1975). Mr. Binet's test 70 years later. Educational Researcher, 5, 3-7.

Thorndike, R.L. (1985). The central role of general ability in prediction. Intelligence, 20, 241-254.

Tingstrom, D.H., & Pfeiffer, S.I. (1988). WISC-R factor structure in a referred pediatric population. Journal of Clinical Psychology, 44, 799-802.

Travers, J.R. (1982). Testing in educational placement: Issues and evidence. In K.A. Heller, W. Holtzman, & S. Messick (Eds.), Placing children in special education: A strategy for equity (pp. 230-261). Washington, DC: National Academy.


Velicer, W.F. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41, 321-327.

Velicer, W.F., & Jackson, D.N. (1990). Component analysis versus common factor analysis: Some issues in selecting an appropriate procedure. Multivariate Behavioral Research, 25, 1-28.

Wacker, D.P., Steege, M., & Berg, W.K. (1988). Use of single-case designs to evaluate manipulable influences on school performance. School Psychology Review, 17, 651-657.

Wallbrown, F.H., Blaha, J., Wallbrown, J.D., & Engin, A.W. (1975). The hierarchical factor structure of the Wechsler intelligence scale for children-revised. The Journal of Psychology, 89, 223-235.

Waller, N.G., & Waldman, I.D. (1990). A reexamination of the WAIS-R factor structure. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 2, 139-144.

Wechsler, D. (1958). The measurement and appraisal of adult intelligence (4th ed.). San Antonio, TX: Psychological Corp.

Wechsler, D. (1974). Manual for the Wechsler intelligence scale for children-revised. San Antonio, TX: Psychological Corp.

Witt, J.C., & Gresham, F.M. (1985). Review of Wechsler intelligence scale for children-revised. In J.V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (Vol. 2, pp. 1716-1719). Lincoln: University of Nebraska.

Zwick, W.R., & Velicer, W.F. (1982). Factors influencing four rules for determining the number of components to retain. Multivariate Behavioral Research, 17, 253-269.

Zwick, W.R., & Velicer, W.F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99, 432-442.
