Grouped Versus Randomized Format: An Investigation of Scale Convergent and Discriminant Validity Using LISREL Confirmatory Factor Analysis

Chester A. Schriesheim, University of Miami
Esther Solomon, Fordham University
Richard E. Kopelman, Baruch College, City University of New York

Applied Psychological Measurement, Vol. 13, No. 1, March 1989, pp. 19-32. © 1989 Applied Psychological Measurement Inc.

LISREL maximum likelihood confirmatory factor analyses (Jöreskog & Sörbom, 1984) were conducted to explore the effects of two questionnaire formats (grouping versus randomizing items) on the convergent and discriminant validity of two sets of questionnaire measures. The first set of measures consisted of satisfaction scales that had demonstrated acceptable psychometric properties in earlier studies; the second set of scales were job characteristics measures that had shown discriminant validity problems in previous research. Correlational data were collected from two groups of employed business administration students (N = 80 in each group) concurrently (Study 1) and at two points in time (Study 2). The results of the analyses showed that the grouped format was superior to the random format, particularly with respect to the weaker measures (the job characteristics scales). The results also illustrated and supported the usefulness of LISREL confirmatory factor analysis in studies of convergent and discriminant validity. Index terms: confirmatory factor analysis, convergent validity, discriminant validity, LISREL analysis, questionnaire formats, scale validity.

Questionnaires are widely used in applied psychological research and, of course, thousands of different instruments exist. The typical format presents items measuring different constructs to respondents in a randomized, unlabeled manner. Other item presentation formats are rarely used. There has been little research on whether format exerts meaningful effects on the psychometric adequacy of various measurement instruments, despite the fact that format is probably one of the easiest characteristics that might be changed on existing measures. Potentially, format changes could result in psychometric improvements without requiring the redevelopment of instruments or the expenditure of much time or effort.

Randomization of the order of questionnaire items is often assumed as necessary or desirable for measuring instruments, without fully considering other options or the possible consequences of randomization. Schriesheim and DeNisi (1980) have noted that randomizing items makes sense when it is necessary to disguise the true purpose of a measure to reduce the impact of various response biases. However, randomizing items could also be dysfunctional. If respondents are trying to provide accurate answers, randomization makes the task more difficult and may therefore impair the validity of responses. This suggests that grouping and labeling items measuring the same constructs may be advantageous; such a format may break monotony and lessen fatigue effects, for example, by presenting respondents with many short questionnaire subsections as opposed to several long subsections. It may also help convince respondents that many items are not simply duplicates, designed to "catch" them in inconsistencies (Schriesheim & DeNisi, 1980).

Although not extensive, some research bears on the effects of different item presentation formats.


Baehr (1953) assessed the impact of grouping versus randomly distributing items measuring various job-related attitudes. Almost identical attitude profiles were obtained using both formats, suggesting no differences in measurement between the two modes.

Schriesheim and DeNisi (1980) and Schriesheim (1981a, 1981b) examined the effects of grouped and labeled versus randomized items on the reliability and validity of two leadership instruments. Grouping and labeling items resulted in impaired discriminant validity (Schriesheim & DeNisi, 1980) and heightened leniency response bias (Schriesheim, 1981a). Additionally, follow-up analyses suggested that the impaired discriminant validity was due to the heightened leniency effects (Schriesheim, 1981b).

Solomon and Kopelman (1984) examined the effects of grouping, grouping and labeling, and randomizing items on internal consistency reliabilities. Using four samples and measures of perceived job characteristics and personal satisfaction (job, life, family, and self), modestly to moderately higher internal consistency reliabilities were obtained for the grouped format and the grouped and labeled format than for the randomized format.

The results of these studies appear reasonably consistent: Grouping or grouping and labeling items measuring the same constructs may result in heightened response biases (such as leniency), producing higher reliabilities and also impairing discriminant validity (due to the effects of a common underlying response tendency). These studies suggest that the randomized format is superior for use in applied psychological research, because although reliability may be enhanced by grouping or grouping and labeling, systematic error may be increased as well.

The prior studies do not, however, provide adequate evidence for drawing firm and widely generalizable conclusions about the effects of item presentation format. The Baehr (1953) study involved only a limited comparison of grouped versus randomized item formats. The Solomon and Kopelman (1984) study used four different samples; thus differences in format effects could have been due, at least in part, to sample differences. Finally, the Schriesheim and DeNisi (1980) and Schriesheim (1981a, 1981b) studies all used the same small sample and examined format effects only for leadership scales. Because the leadership scales used in these studies suffer from substantial leniency effects (Schriesheim, Kinicki, & Schriesheim, 1979), generalization of the results to other, less response bias-prone research instruments is unclear.

Given that the existing empirical evidence merely suggests that the randomized item presentation format may be superior to grouped formats, the current investigation was undertaken. Two different studies were conducted to explore format effects on convergent and discriminant validity within the same group at one point in time (Study 1), and on convergent and discriminant validity within two randomly-assigned subgroups at two points in time (Study 2).

Study 1

Participants and Procedure

Participants in the first study were 80 students enrolled in business courses at an Eastern university. All participants had worked full- or part-time within the previous year, and their most recent job experiences were used as the referents in completing the study instruments. Participation was voluntary and respondents were guaranteed anonymity; assignment to the two conditions was on a random basis.

Study 1 compared responses to grouped and randomized measures obtained from the same individuals, thereby controlling for respondent effects. In this study, scale formats were treated as methods. In half the cases (N = 40), the first part of the questionnaire included randomized satisfaction scales and grouped job characteristics scales; the second part consisted of grouped satisfaction scales and randomized job characteristics scales. In the other half of the cases (N = 40), the first section consisted of grouped satisfaction scales and randomized job characteristics scales; the second section consisted of randomized satisfaction scales and grouped job characteristics scales. (Within these two groups, the instruments were also counterbalanced with respect to presentation order: first the satisfaction scales and then the job characteristics scales, and vice-versa.)

Measures

Job characteristics scales. Two sets of measures were used. The first consisted of five job characteristics scales from the Job Diagnostic Survey (JDS; Hackman & Oldham, 1975). These scales have had notable discriminant validity problems in previous research (Dunham, Aldag, & Brief, 1977; Pierce & Dunham, 1978); including them permitted the examination of format effects for scales with known psychometric problems. The five JDS scales measure the perceived "core job dimensions" of Autonomy (A), Variety (V), Significance (S), Identity (I), and Feedback from the job (F). In the grouped version, items relating to each job dimension were grouped together and labeled; in the randomized version, the 15 items were randomly distributed, so that no indication was given of the dimension each was measuring.

Satisfaction measures. The second set of measures consisted of four satisfaction scales examining Life, Self, Job, and Family satisfaction. All scales used a five-point (Likert) response format, with alternatives ranging from "strongly agree" to "strongly disagree". These scales were selected for investigation because they were expected not to suffer from discriminant validity problems (unlike the JDS scales), as they came from fairly diverse content domains and had not previously shown psychometric deficiencies.

Life satisfaction (LS) was measured by two items similar to those used by the Survey Research Center's quality-of-employment surveys (e.g., "In general, I am satisfied with the way I spend my time these days"; Robinson & Shaver, 1973), plus two additional items which were developed in prior research (Kopelman, Greenhaus, & Connolly, 1983). Self-satisfaction (SS) was measured by two items, one identical to an item from the Rosenberg (1965) scale and one similar to an item from that scale. Job satisfaction (JS) was measured by the three-item JDS general satisfaction scale (Hackman & Oldham, 1975). Family satisfaction (FS) was measured by three items which basically restated the general job satisfaction scale, substituting the word "family" for "job" (e.g., "I frequently think I would like to change my family situation"). In the grouped version of the questionnaire, items pertaining to each of the four satisfaction areas were grouped by area and labeled; this was not done in the randomized version.

Data Analysis

Internal consistency reliability. Coefficient alpha internal consistency reliabilities were first computed for all the measures of Study 1; no statistically significant differences were found between the grouped and randomized scale reliabilities. Of the 18 computed coefficients, the 3 lowest were .59, .65, and .69; all others exceeded .70. Thus, while 3 coefficients were lower than desired, the scales were generally of adequate reliability for meaningful data analyses (Nunnally, 1978).
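For readers unfamiliar with the computation, coefficient alpha is a simple function of the item and total-score variances. The sketch below is illustrative only; it assumes a respondents-by-items score matrix and uses simulated Likert responses rather than the study's actual data.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of the total)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the scale total
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated data: 80 respondents answering a 3-item five-point Likert scale
rng = np.random.default_rng(0)
true_score = rng.normal(size=(80, 1))
items = np.clip(np.round(3 + true_score + rng.normal(scale=0.8, size=(80, 3))), 1, 5)
print(round(cronbach_alpha(items), 2))
```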

Convergent and discriminant validity. The multitrait-multimethod matrix framework first suggested by Campbell and Fiske (1959) for examining the validity of an instrument was employed. However, rather than using Campbell and Fiske's largely nonstatistical assessment criteria, the analyses were performed by LISREL VI maximum likelihood confirmatory factor analysis (Jöreskog & Sörbom, 1984), employing the procedures outlined by Widaman (1985) and recommended by Schmitt and Stults (1986). Confirmatory factor analysis was employed because it allows the examination of the grouped and random scales separately. This is not possible in rival approaches, such as Kavanagh, MacKinney, and Wollins' (1971) analysis-of-variance approach (Schmitt & Stults, 1986).

First, a full matrix model (Model 3C in Widaman's taxonomy) was fit to the multitrait-multimethod correlation matrices,¹ using Rindskopf's (1983) parameterization technique (Widaman, 1985, pp. 10-11). This full model consisted of three independent sets of factor components: trait, method, and unique factors. Each measured variable was specified as having a loading on one trait, one method, and one unique factor, with no inappropriate or cross-loadings (see, e.g., Table 1). Following convention, the trait factors were allowed to be correlated with each other but not with the method or unique factors, and the methods were allowed to be correlated only between themselves. The unique (random error) factors were specified as uncorrelated among themselves or with any other factors.

¹To conserve space, the input correlation matrices are not presented here; they are given in a preliminary analysis of these data (Schriesheim, Kopelman, & Solomon, in press) which used weaker and more traditional analyses (and reached different conclusions than those presented here).
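In matrix form, this trait-method-unique specification implies the standard confirmatory MTMM covariance decomposition. The notation below is the generic LISREL-style form, supplied here for clarity rather than taken from the article:

```latex
% Implied covariance matrix of the full model (Model 3C):
%   Lambda_T = trait loadings,  Phi_T = trait intercorrelations,
%   Lambda_M = method loadings, Phi_M = method intercorrelations,
%   Theta    = diagonal matrix of unique (error) variances.
% Trait-method covariances are fixed at zero, so no cross term appears.
\Sigma = \Lambda_T \Phi_T \Lambda_T^{\prime}
       + \Lambda_M \Phi_M \Lambda_M^{\prime}
       + \Theta
```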

The goodness of fit of this full model to the data was assessed by a chi-square likelihood test (Jöreskog & Sörbom, 1984) and by Bentler and Bonett's (1980) rho and delta indices. Rho (ρ) is a relative index of the off-diagonal covariation among observed variables which is explained by a model; delta (Δ) is an absolute measure representing the proportion of off-diagonal covariance independent of degrees of freedom (see Schmitt & Stults, 1986, pp. 13-14). Following Widaman (1985) and Schmitt and Stults (1986), the full model was considered acceptable if the chi-square probability exceeded .05 and both ρ and Δ were .90 or greater.
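For reference, the standard Bentler-Bonett definitions of these two indices (stated here for the reader's convenience; the article describes them only verbally) are, with subscript 0 denoting the null model and m the model under test:

```latex
% Normed fit index (delta) and non-normed fit index (rho),
% Bentler & Bonett (1980); 0 = null model, m = model under test
\Delta = \frac{\chi^{2}_{0} - \chi^{2}_{m}}{\chi^{2}_{0}}
\qquad
\rho = \frac{\chi^{2}_{0}/df_{0} - \chi^{2}_{m}/df_{m}}{\chi^{2}_{0}/df_{0} - 1}
```

Because ρ is not normed, it can exceed 1 for very well-fitting models; this is why values such as 1.09 and 1.10 appear in the results reported below.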

To test for convergent and discriminant validity and method bias effects, the full model described above was then compared to three additional models by first estimating those models and then calculating the differences in χ², ρ, and Δ between each rival model and the full model. The first rival model consisted of only method and unique factors, with no trait factors (Widaman's Model 1C), and assessed the lack of convergent validity. The second rival model (Model 2C) was identical to the initial full model except that all trait intercorrelations were constrained to 1; this model assessed the lack of discriminant validity. Finally, the third rival model (Model 3A) included no method factors and assessed the presence of method bias effects. Following Widaman (1985), χ² differences with probabilities in excess of .05 or differences of .01 or greater in ρ or Δ were considered to indicate that a rival model provided a more meaningful fit to the data than did the initial model.
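A nested-model difference test of this kind is mechanical once each model's χ² and degrees of freedom are in hand. The sketch below is a generic illustration; the numbers are placeholders, not values from the article's tables.

```python
from scipy.stats import chi2

def chisq_difference_test(chisq_rival: float, df_rival: float,
                          chisq_full: float, df_full: float):
    """Likelihood-ratio difference between a constrained rival model
    and the full trait-method model (the rival is nested in the full)."""
    d_chisq = chisq_rival - chisq_full
    d_df = df_rival - df_full
    p_value = chi2.sf(d_chisq, d_df)  # upper-tail probability of the difference
    return d_chisq, d_df, p_value

# Placeholder values, for illustration only
print(chisq_difference_test(chisq_rival=40.2, df_rival=11,
                            chisq_full=8.86, df_full=5))
```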

Results

Satisfaction measures. Tables 1 and 2 present the LISREL-estimated full model results for the satisfaction and job characteristics scales, respectively; Table 3 presents the analyses comparing these models to those indicating a lack of convergent and discriminant validity and the presence of method bias.

As shown in Tables 1 and 3, the confirmatory analysis of the satisfaction scales indicates an excellent fit of the initial full model to the data (χ² = 8.86, p = .12; ρ = .94; Δ = .98), and indicates that meaningful convergent and discriminant validity and method bias effects are present (all ρ and Δ differences were −.07 or less). Furthermore, examination of the trait intercorrelations (Table 1) does not suggest any need to eliminate or combine trait factors (i.e., to hypothesize and estimate other initial structural models).

Averaging and squaring the factor loadings shown in Table 1 (to examine variance attributable to trait, method, and unique factors) shows that for the eight satisfaction measures, trait factor variance accounts for 75% of the total variance. Furthermore, method factor variance averages only 12% and error factor variance only 14% (the sum exceeds 100% due to rounding error). The grouped scales have a higher average percentage of trait variance than do the random scales (78% vs. 71%, respectively), and less method variance (7% vs. 18%). However, the grouped scales show slightly more error (16% vs. 12%). Overall, this analysis suggests some advantage for the grouped format for the satisfaction scales, as they have greater trait and less method variance than do the random scales.
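The arithmetic behind these percentages is straightforward: for standardized indicators, a squared loading is the proportion of that indicator's variance explained by the corresponding factor, and the reported figures are presumably means of such squared loadings across measures (the text's "averaging and squaring"). A minimal illustration; the loadings below are invented for demonstration and are not the Table 1 estimates:

```python
import numpy as np

# Hypothetical standardized loadings for eight measures on their
# trait, method, and unique factors (illustrative values only)
loadings = {
    "trait":  np.array([0.88, 0.85, 0.90, 0.84, 0.86, 0.82, 0.87, 0.83]),
    "method": np.array([0.25, 0.30, 0.28, 0.35, 0.40, 0.38, 0.32, 0.36]),
    "unique": np.array([0.40, 0.38, 0.35, 0.42, 0.33, 0.41, 0.37, 0.39]),
}

for source, lam in loadings.items():
    # mean squared loading = average proportion of variance due to this source
    print(source, f"{np.mean(lam ** 2):.0%}")
```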

Job characteristics scales. Tables 2 and 3 present the confirmatory analyses relevant to the job characteristics measures. Again, these results indicate an excellent fit of the initial full model to the data (χ² = 5.02, p = .99; ρ = 1.09; Δ = .99), and indicate that meaningful convergent and discriminant validity and method bias effects are present (all ρ and Δ differences were .10 or less).


Table 1
Parameters Estimated in the Confirmatory Analysis of the Study 1 Satisfaction Scales With Four Trait and Two Method Factors
*Fixed values.

The trait factor intercorrelations (see Table 2) show that it was necessary in this analysis to constrain the Variety-Feedback trait factor intercorrelation at .99 to obtain parameter estimates within bounds (i.e., with absolute values less than or equal to 1),² and that the Autonomy-Feedback intercorrelation was estimated to be 1. Although this suggests that several other models might be examined, further analyses indicated that they did not provide meaningfully better fit to the data. Constraining the Variety-Feedback parameter at 1 did not produce a better fit. Poorer fits were obtained by eliminating the Feedback trait factor and examining three new models with both the grouped and random Feedback measures loading on (1) the Variety trait factor, (2) the Autonomy trait factor, and (3) both the Variety and Autonomy trait factors. This suggests that the original full model was an acceptable portrayal of the job characteristics data.

²The footnotes to Tables 3, 6, and 8 show all parameters in the analyses that were constrained so as to produce within-bounds estimates (Widaman, 1985); other than in this instance (the .99 Variety-Feedback correlation), they all involved the method correlations of the rival models (which were all constrained at 1; other values uniformly produced poorer model fits).

Analysis of the results in Table 2 shows that the job characteristics scales are of poorer psychometric quality than are the satisfaction scales. Here, trait factor variance accounts for 43% of the total variance in the 10 job characteristics measures, while method factor variance accounts for 35% and error explains 23%. This poor showing relative to the satisfaction scales is due largely to the random job scales rather than the grouped scales. The grouped scales average 51% trait variance, as compared to only 35% for the random scales. Both have approximately equal method variance (36% for the grouped and 34% for the random), while the grouped job scales have only 16% error variance, as compared to 30% for the random scales. This pattern is similar to, but stronger than, that found for the satisfaction scales, indicating a considerable advantage for the grouped job characteristics scales over the random scales.


Table 2
Parameters Estimated in the Confirmatory Analysis of the Study 1 Job Characteristics Scales With Five Trait and Two Method Factors
*Fixed values.

Discussion

The results presented above show that grouping may have a beneficial effect on psychometric adequacy, particularly for measures with a history of convergent and discriminant validity problems (Dunham et al., 1977; Pierce & Dunham, 1978). However, because the findings of a single study may be due to various artifacts, a second study was undertaken.

Study 2

As Schneider and Dachler (1978) have noted, "when a measure is employed in a time-based mode, stability ('retest reliability'), not internal consistency, is the critical index of reliability" (p. 650). For this reason, and because it is desirable to replicate confirmatory analysis results (Schmitt & Stults, 1986), Study 2 explored format effects over time.

Schneider and Dachler (1978) conceptualized two administrations of an instrument (separated by time) as two measurement methods or sources; therefore, a standard multitrait-multimethod matrix analysis can be used to examine test-retest reliability as "convergent stability" ("convergent validity" in Campbell and Fiske's terminology) by assessing overall respondent agreement across administrations of an instrument and its scales. Discriminant validity can be assessed by examining within- and across-administration scale intercorrelations, and method or source bias can be assessed on the basis of respondent × administration variance over scales.

The principal advantage of treating test-retest administrations in this manner is that it allows an assessment of the relative magnitudes of score stability and scale independence (discrimination) in comparison with source or method bias. This, of course, is useful as another way of assessing the merits of grouping or randomizing questionnaire items on psychometric quality.


Table 3
Goodness-of-Fit Indices for the Study 1 Confirmatory Models
ᵃFor this analysis, the correlation between methods was constrained at 1.0.
ᵇFor this analysis, the Variety-Feedback factor correlation was constrained at .99 (see Table 2).

Participants and Procedure

The participants in the second study were 80 different business students from the same Eastern university as the first study. The questionnaire referents, anonymity guarantees, and administration procedures were identical, and the respondents were randomly assigned to the two conditions.

In this study, format effects on scale score stabilities were examined by treating instrument administration times as methods. Participants were twice administered the same set of randomized or grouped questionnaire versions at 5- to 8-week intervals. Half of the participants (N = 40) received a randomized satisfactions instrument and a grouped job characteristics instrument for both administrations, while the other half (N = 40) received a grouped satisfactions instrument and a randomized job characteristics instrument at both administrations (within these two groups, the instruments were also counterbalanced with respect to presentation order).

Measures and Analysis

The measures used in Study 2 were identical to those employed in the first study. The data-analytic methods used were also identical, except as noted below.

Results

Internal consistency reliabilities. The coefficient alpha reliability estimates obtained for Study 2 were very similar to those of Study 1. Again, no reliability differences were statistically significant, so that neither format appeared preferable with respect to internal consistency reliability.


Satisfaction scale results. With respect to convergent and discriminant validity, the general results of Study 1 were further supported by the confirmatory analyses of Study 2. Tables 4 through 6 present these results for the satisfaction measures, while Tables 7 through 9 present results for the job characteristics scales.

As shown in Table 6, the goodness of fit of the initial model to the grouped satisfaction scales is quite satisfactory (χ² = 7.21, p = .21; ρ = .92; Δ = .96), as is that for the random measures (χ² = 2.11, p = .83; ρ = 1.10; Δ = .99). The intercorrelations among the grouped satisfaction trait factors shown in Table 4 do not suggest the necessity of trying other initial models, nor do those shown in Table 5 for the random satisfaction scales (testing several alternative initial models supported this conclusion). Testing the initial grouped and random models (Tables 4 and 5) against the three rival models shows meaningful convergent and discriminant validity, as well as meaningful method bias effects (see Table 6).

The factor loadings shown in Table 4 indicate that 73% of the grouped satisfaction scale variance is accounted for by the trait factors, but only 13% is attributable to method factors and 15% to random error. Furthermore, the average Fisher z-transformed intercorrelation (McNemar, 1969) between trait factors is only .44 (the highest r = .84), indicating reasonable discriminant validity. Almost identical results were obtained for the randomized satisfaction scales. Here (see Table 5), 72% of the total variance is attributable to the trait factors, with only 15% due to methods and 14% to error. The average trait factor r in this analysis is identical to that for the grouped scales (.44), and the highest r is a more modest .73. These results, then, suggest good measurement properties for the satisfaction measures as a set and no advantage for either the grouped or random formats.
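Fisher's r-to-z averaging, used for the mean trait intercorrelations above, is easy to reproduce. A minimal sketch follows; the correlations fed in are placeholders, not the article's full set of estimates:

```python
import numpy as np

def mean_r_fisher(rs) -> float:
    """Average correlations via Fisher's z transform:
    r -> z = arctanh(r), average in z-space, back-transform with tanh."""
    z = np.arctanh(np.asarray(rs, dtype=float))
    return float(np.tanh(z.mean()))

# Placeholder trait-factor intercorrelations
print(round(mean_r_fisher([0.84, 0.31, 0.28, 0.45, 0.39, 0.25]), 2))
```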

Job characteristics scale results. The results for the job characteristics measures, as shown in Tables 7 through 9, are quite different from those for satisfaction.

Table 4
Parameters Estimated in the Confirmatory Analysis of the Study 2 Grouped Satisfaction Scales With Four Trait and Two Method Factors
*Fixed values.

Table 5
Parameters Estimated in the Confirmatory Analysis of the Study 2 Random Satisfaction Scales With Four Trait and Two Method Factors
*Fixed values.

Here, the initial model fits the grouped job scales well (χ² = 15.92, p = .32; ρ = .97; Δ = .94), and the pattern of trait factor intercorrelations is very moderate (mean = .17; highest = .67). This does not suggest the necessity of searching for better-fitting initial models and, as a consequence, none was examined.

Testing the initial grouped job scales model (Table 7) against the three rival models (Table 8) shows them to have meaningful convergent and discriminant validity, as well as meaningful method bias. The trait factors account for 48% of the total grouped job scale variance, with the method factors explaining 35% and random error 17%. Although these are less favorable results than those for both the grouped and random satisfaction measures, they are very close to the variance percentages estimated for the Study 1 analysis of the grouped job scales (the corresponding results were 51%, 34%, and 16%). Again, these data suggest that the satisfaction scales are better measures than the job characteristics scales.

The results for the random job characteristics scales further support this conclusion, as well as the possibility that grouping may improve the psychometric quality of weaker measures. The initial model does not fit the random job scales very well. In fact, the statistics presented in Table 8 show a rather poor fit (χ² = 171.26, p = .00; ρ = −.08; Δ = .67). In an attempt to find a satisfactory initial model, more than 100 additional separate LISREL analyses were undertaken. These were guided by the results from exploratory factor and component analyses, beginning with models that were closest to the initial model, and progressing to simpler ones (involving various combinations of five, four, three, and two trait factors, with two, one, and zero method factors).

These exploratory factor analyses used both the principal factor (with R² communality estimates) and principal components methods, with varimax and promax (Hendrickson & White, 1964) rotations as suggested by Gorsuch (1974), Harman (1976), and Rummel (1970) for complex data. The principal factor analyses produced ultra-Heywood (1931) solutions when attempts were made to extract more than three factors.

Table 6
Goodness-of-Fit Indices for the Study 2 Satisfaction Scales Confirmatory Models
ᵃFor this analysis, the correlation between methods was constrained at 1.0.

The factor eigenvalues (5.63, 1.36, 1.01, .59, .48, .42, .23, .14, .13, and .01) and a scree plot (Cattell, 1966) also suggested the presence of only three factors.
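The three-factor reading of these eigenvalues is easy to verify. The values below are the ones reported in the text; the eigenvalue-greater-than-one count happens to agree with the authors' scree-plot judgment here, though the article itself relies on the scree test rather than the Kaiser rule:

```python
import numpy as np

# Eigenvalues reported for the Study 2 random job characteristics data
eigenvalues = np.array([5.63, 1.36, 1.01, 0.59, 0.48, 0.42, 0.23, 0.14, 0.13, 0.01])

# Kaiser criterion: retain factors with eigenvalues greater than 1
n_factors = int((eigenvalues > 1.0).sum())
print(n_factors)  # -> 3
```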

Because none of these analyses produced both fully acceptable within-bounds estimates and a satisfactory fit to the data, various additional "patch-up" designs were tried, including models with five, four, three, and two trait factors. These additional designs included
1. Kenny's (1976) method that eliminated the method factors and allowed two sets of correlated within-method error;
2. A modification of the Kenny (1976) procedure with error also allowed to be equally correlated across the unique factors;
3. Marsh and Hocevar's (1983) procedure that sets uniqueness factor loadings equal to communalities estimated from principal components analyses (two-, three-, four-, and five-factor communality models were examined); and
4. Combinations of the third approach with the first two approaches.

None of these analyses produced satisfactory results. Because an acceptable initial confirmatory model could not be identified, Table 9 instead presents the results of a promax-rotated three-factor principal components analysis of these data, as well as a five-factor solution (for these analyses, varimax rotations produced virtually identical results). As indicated above, the three-factor results better portray the data, but the five-factor results are shown because of the a priori dimensionality of the job characteristics scales (principal components results are shown for comparability between the three- and five-factor results because of the problems encountered with the principal factor analyses).³


Table 7
Parameters Estimated in the Confirmatory Analysis of the Study 2 Grouped Job Characteristics Scales With Five Trait and Two Method Factors
*Fixed values.

Table 9 presents the oblique-rotated pattern loadings and the factor intercorrelations for the three- and five-factor solutions (the loadings in excess of 1.0 are due to their obliqueness). As shown in Table 9, a clear lack of discriminant validity occurs among the Variety, Autonomy, and Feedback scales for the three-factor solution. In the three-factor solutions, these scales (at both administrations) have high (.50 or greater) loadings on the same factor (I); the Administration 1 Feedback scale also has a high loading on Factor II, and the Administration 2 Variety scale has a moderate Factor III loading.

³More detailed information concerning the various attempted analyses is available from the authors on request.

The Identity scales fare much better, with both administrations' measures having only large loadings on Factor III (and none on the other factors); although the Administration 2 Variety scale also loads on this factor (III), the loading is moderate. A similar pattern is evident with the two administrations of the Significance scales, as these have their sole meaningful loadings on Factor II and only the Administration 2 Feedback scale loads highly on this factor as well.

The five-factor results shown in Table 9 appear in some ways even poorer. Not only are confounded factors obtained, but several of the scales do not meaningfully load on the same factors across the two administrations (the Variety and Identity scales, in particular).


Table 8
Goodness-of-Fit Indices for the Study 2 Job Scales Confirmatory Models
ᵃFor this analysis, the correlation between methods was constrained at 1.0.

Table 9
Promax-Rotated Principal Components Analyses for Study 2 Random Job Characteristics Scales


These results, then, taken in conjunction with those for both the grouped job scales and both sets of satisfaction scales, indicate clearly that the grouped format generally improved measurement quality in Study 2, particularly for the psychometrically weaker job characteristics scales. Although difficulties were encountered in analyzing the random job scales in Study 2, the overall pattern of results seems sufficiently clear to draw this conclusion.

Discussion and Conclusions

Previous research has suggested that grouping items may improve scale internal consistency reliabilities while impairing scale discriminant validity. The results of the current two studies do not support these conclusions.

The present evidence suggests that grouping items does not significantly enhance internal consistency reliabilities, nor does it necessarily result in impaired discriminant validity. In fact, the confirmatory factor analyses suggested that grouping may enhance the convergent validities of measures without impairing discriminant validity or seriously increasing method bias effects. Overall, these results suggested a reasonable improvement in psychometric quality (both convergent and discriminant validity) for scales with previously documented problems (the job characteristics scales).

Because the present evidence suggests that grouping items may improve the psychometric properties of weak measures, further studies of format effects should be conducted. In particular, research might profitably explore format effects in different measurement domains (other than satisfaction and job characteristics) and under different conditions, perhaps for scales with varying levels of psychometric adequacy.

The current research also supports the assertions of Schmitt and Stults (1986), Widaman (1985), and others: that confirmatory factor analysis is a sensitive analytic technique for this type of research, and that future research in this domain would be well advised to use this approach (perhaps as the main data-analytic technique). However, regardless of the methodology employed, researchers should not be content with the current psychometric properties of their instruments; they should seek to improve all measures by whatever means possible (Schriesheim & DeNisi, 1980). Because improvements in psychometric adequacy appear possible from use of the grouped questionnaire format, future research exploring its usefulness in other domains seems both necessary and desirable.

References

Baehr, M. E. (1953). A simplified procedure for the measurement of employee attitudes. Journal of Applied Psychology, 37, 163-167.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245-276.
Dunham, R. B., Aldag, R. J., & Brief, A. P. (1977). Dimensionality of task design as measured by the Job Diagnostic Survey. Academy of Management Journal, 20, 209-223.
Gorsuch, R. L. (1974). Factor analysis. Philadelphia: Saunders.
Hackman, J. R., & Oldham, G. R. (1975). Development of the Job Diagnostic Survey. Journal of Applied Psychology, 60, 159-170.
Harman, H. H. (1976). Modern factor analysis (3rd ed.). Chicago: University of Chicago Press.
Hendrickson, A. E., & White, P. O. (1964). Promax: A quick method for rotation to oblique simple structure. British Journal of Statistical Psychology, 17, 65-70.
Heywood, H. B. (1931). On finite sequences of real numbers. Proceedings of the Royal Society of London, 134, 486-501.
Jöreskog, K. G., & Sörbom, D. (1984). LISREL VI: Analysis of linear structural relationships by maximum likelihood, instrumental variables, and least squares methods. Mooresville IN: Scientific Software.
Kavanagh, M. J., MacKinney, A. C., & Wollins, L. (1971). Issues in managerial performance: Multitrait-multimethod analyses of ratings. Psychological Bulletin, 75, 34-49.
Kenny, D. A. (1976). An empirical application of confirmatory factor analysis to the multitrait-multimethod matrix. Journal of Experimental Social Psychology, 12, 247-252.
Kopelman, R. E., Greenhaus, J. H., & Connolly, T. F. (1983). A model of work, family, and inter-role conflict: A construct validation study. Organizational Behavior and Human Performance, 32, 198-215.
Marsh, H. W., & Hocevar, D. (1983). Confirmatory factor analyses of multitrait-multimethod matrices. Journal of Educational Measurement, 20, 231-248.
McNemar, Q. (1969). Psychological statistics (4th ed.). New York: Wiley.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Pierce, J. L., & Dunham, R. B. (1978). The measurement of perceived job characteristics: The Job Diagnostic Survey versus the Job Characteristics Inventory. Academy of Management Journal, 21, 123-128.
Rindskopf, D. (1983). Parameterizing inequality constraints on unique variances in linear structural models. Psychometrika, 48, 73-83.
Robinson, J. P., & Shaver, P. R. (1973). Measures of social psychological attitudes (rev. ed.). Ann Arbor: University of Michigan, Institute for Social Research, Survey Research Center.
Rosenberg, M. (1965). Society and the adolescent self-image. Princeton NJ: Princeton University Press.
Rummel, R. J. (1970). Applied factor analysis. Evanston IL: Northwestern University Press.
Schmitt, N., & Stults, D. M. (1986). Methodology review: Analysis of multitrait-multimethod matrices. Applied Psychological Measurement, 10, 1-22.
Schneider, B., & Dachler, H. P. (1978). A note on the stability of the Job Descriptive Index. Journal of Applied Psychology, 63, 650-653.
Schriesheim, C. A. (1981a). The effect of grouping or randomizing items on leniency response bias. Educational and Psychological Measurement, 41, 401-411.
Schriesheim, C. A. (1981b). Leniency effects on convergent and discriminant validity for grouped questionnaire items: A further investigation. Educational and Psychological Measurement, 41, 1093-1099.
Schriesheim, C. A., & DeNisi, A. S. (1980). Item presentation as an influence on questionnaire validity: A field experiment. Educational and Psychological Measurement, 40, 175-182.
Schriesheim, C. A., Kinicki, A. J., & Schriesheim, J. F. (1979). The effect of leniency on leader behavior descriptions. Organizational Behavior and Human Performance, 23, 1-29.
Schriesheim, C. A., Kopelman, R. E., & Solomon, E. (in press). The effect of grouped versus randomized questionnaire format on scale reliability and validity: A three-study investigation. Educational and Psychological Measurement.
Solomon, E., & Kopelman, R. E. (1984). Questionnaire format and scale reliability: An examination of three modes of item presentation. Psychological Reports, 54, 447-452.
Widaman, K. F. (1985). Hierarchically nested covariance structure models for multitrait-multimethod data. Applied Psychological Measurement, 9, 1-26.

Acknowledgments

This research was supported by grants from the Research Council and Social Science Committee of the University of Miami and by the Corporate Affiliate Research Program of the School of Business Administration, University of Miami. The authors thank the Editor and two anonymous reviewers of earlier drafts of this paper for their helpful comments, David Rindskopf for his advice and suggestions on some of the LISREL analyses, and Linda A. Tetrault for her assistance in performing many of the analyses.

Author’s Address

Send requests for reprints or further information to Chester A. Schriesheim, Management Department, School of Business Administration, University of Miami, 414 Jenkins Building, Coral Gables FL 33124-9145, U.S.A.
