
This article was downloaded by: [North Carolina State University]
On: 02 May 2013, At: 06:33
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Structural Equation Modeling: A Multidisciplinary Journal
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/hsem20

Longitudinal Invariance of Self-Esteem and Method Effects Associated With Negatively Worded Items
Robert W. Motl & Christine DiStefano
Published online: 19 Nov 2009.

To cite this article: Robert W. Motl & Christine DiStefano (2002): Longitudinal Invariance of Self-Esteem and Method Effects Associated With Negatively Worded Items, Structural Equation Modeling: A Multidisciplinary Journal, 9:4, 562-578

To link to this article: http://dx.doi.org/10.1207/S15328007SEM0904_6


Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden.

The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.

Downloaded by [North Carolina State University] at 06:33 02 May 2013

Longitudinal Invariance of Self-Esteem and Method Effects Associated With Negatively Worded Items

Robert W. Motl
Department of Exercise Science
The University of Georgia

Christine DiStefano
Department of Educational Leadership, Research, & Counseling
Louisiana State University

When developing self-report instruments, researchers often have included both positively and negatively worded items to reduce the possibility of response bias. Unfortunately, this strategy may interfere with examinations of the latent structure of self-report instruments by introducing method effects, particularly among negatively worded items. The substantive nature of these method effects remains unclear and requires examination. Building on recommendations from previous researchers (Tomás & Oliver, 1999), this study examined the longitudinal invariance of method effects associated with negatively worded items using a self-report measure of global self-esteem. Data were obtained from the National Education Longitudinal Study (NELS; Ingels et al., 1992) across 3 waves, each separated by 2 years, and the longitudinal invariance of the method effects was tested using LISREL 8.20 with weighted least squares estimation on polychoric correlations and an asymptotic variance/covariance matrix. Our results indicated that method effects associated with negatively worded items exhibited longitudinal invariance of the factor structure, factor loadings, item uniquenesses, factor variances, and factor covariances. Therefore, method effects associated with negatively worded items demonstrated invariance across time, similar to measures of personality traits, and should be considered of potential substantive importance. One possible substantive interpretation is a response style.

STRUCTURAL EQUATION MODELING, 9(4), 562–578
Copyright © 2002, Lawrence Erlbaum Associates, Inc.

Requests for reprints should be sent to Robert W. Motl, Department of Exercise Science, The University of Georgia, Ramsey Student Center, 300 River Road, Athens, GA 30602–6554. E-mail: [email protected]

When developing self-report instruments, researchers have utilized several strategies to prevent possible response bias (i.e., a tendency to respond in a particular way or style to items on a test that yields systematic, construct-irrelevant error in test scores; American Educational Research Association, 1999). One strategy has involved including positively and negatively worded items within the same instrument. The intent of wording items positively and negatively has been to avoid acquiescence, affirmation, or agreement bias (DeVellis, 1991; Nunnally, 1978). Yet this strategy may interfere with examinations of the latent structure of self-report instruments and of their relations with other psychological constructs measured with positively and negatively worded items (Marsh, 1996; Tomás & Oliver, 1999).

The impact of item wording on the latent structure of self-report instruments has received considerable attention, particularly among researchers of global self-esteem (e.g., Bachman & O’Malley, 1986; Carmines & Zeller, 1974, 1979; Corwyn, 2000; Goldsmith, 1986; Hensley & Roberts, 1976; Kaufman, Rasinski, Lee, & West, 1991; Kohn, 1977; Marsh, 1996; Tomás & Oliver, 1999; Wang, Siegal, Falck, & Carlson, 2001). For example, researchers have applied factor analyses to responses from Rosenberg’s Self-Esteem scale and reported a two-factor solution differentiating positively and negatively worded items (e.g., Carmines & Zeller, 1974, 1979; Hensley & Roberts, 1976; Kaufman et al., 1991; Kohn, 1977). The substantive nature of the two-factor solution, however, was not clear and required further examination to determine whether it was meaningful or a function of method effects associated with item wording.

Carmines and Zeller (1979) employed exploratory factor analysis and a construct validation approach to examine the latent structure of Rosenberg’s 10-item Self-Esteem scale. The exploratory factor analysis yielded a two-factor solution in which positively worded items loaded on one factor and negatively worded items loaded on the other factor. The construct validation approach indicated that the two factors never differed statistically in their relations with 16 different criteria (p > .25). Thus, Rosenberg’s Self-Esteem scale was best represented by a single substantive factor, and the two-factor solution was the result of method effects associated with item wording.

Several researchers have combined confirmatory factor analysis (CFA) with correlated traits, correlated uniquenesses (CTCU) or correlated traits, correlated methods (CTCM) frameworks, or a combination of these, to assist in the separation and estimation of substantive and method components underlying responses to Rosenberg’s Self-Esteem scale (e.g., Corwyn, 2000; Marsh, 1996; Tomás & Oliver, 1999; Wang et al., 2001).1 For example, Marsh employed CFA and the CTCU framework to test the fit of six models to a seven-item version of Rosenberg’s Self-Esteem scale that was administered in the 1988 and 1990 waves of the National Education Longitudinal Study. Three models represented prior substantive interpretations of the Rosenberg Self-Esteem scale, and three models represented a single self-esteem factor with method effects attributable to positive or negative item wording. The substantive models contained a single substantive self-esteem factor (Model 1), two factors representing general and transient self-evaluations (Model 2; Kaufman et al., 1991), and two factors representing either positively or negatively worded items (Model 3). The three “method effect” models based on the CTCU framework contained a single substantive self-esteem factor plus correlated uniquenesses among negatively worded items (Model 4), positively worded items (Model 5), and negatively and selected positively worded items (Model 6). CFA provided evidence that the Rosenberg Self-Esteem scale was best described by a single substantive factor and “substantively irrelevant method effects” (Marsh, 1996, p. 815) associated primarily with the negatively worded items.

1 The CTCU framework infers method effects from a set of correlated uniquenesses among similarly worded items. These correlated uniquenesses posit that there are some systematic, residual covariances among items that cannot be explained by a substantive latent variable. The CTCM framework infers method effects from a latent variable underlying similarly worded items. Similar to the CTCU framework, the latent variable in the CTCM framework posits that there is some source of systematic, residual covariance among items that cannot be explained by a substantive latent variable. Hence, both frameworks posit a source of systematic, residual covariance among similarly worded items but offer different means of specifying and estimating method effects. The CTCM framework assumes that method effects can be explained by a single latent variable and, by implication, that method effects are unidimensional; the CTCU framework does not assume that method effects are unidimensional. This distinction in dimensionality is testable when the number of items underlying a method effect exceeds three; when the number of items is three, the CTCM and CTCU frameworks are formally equivalent (Marsh & Grayson, 1995). If the CTCU framework provides a better fit than the CTCM framework when more than three items underlie a method effect, then the method effect might be multidimensional.

Tomás and Oliver (1999) further evaluated the factor structure of global self-esteem using a sample of Spanish junior and high school students and a Spanish version of the Rosenberg Self-Esteem scale. Tomás and Oliver employed CFA and both the CTCU and CTCM frameworks to test a sequence of nine models. The models included the six models employed by Marsh (1996), plus three additional models derived from the CTCM framework. The three additional models contained a single substantive factor plus a factor that represented method effects among negatively worded items (Model 7), a factor that represented method effects among positively worded items (Model 8), or two factors that represented method effects among both negatively and positively worded items (Model 9). CFA indicated that the Spanish version of the Rosenberg Self-Esteem scale was best described by a single substantive factor, with method effects primarily among the negatively worded items.

Although method effects consistently have been associated with negatively worded items, the substantive nature of the method effects remains unclear. Some researchers have considered the method effects to be substantively irrelevant (e.g., Marsh, 1996), essentially nothing more than noise contaminating an instrument’s latent structure. Other researchers have discussed the need to consider whether method effects represent something that is substantively meaningful (e.g., Motl, Conroy, & Horan, 2000; Tomás & Oliver, 1999).2
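Footnote 1 notes that when exactly three items underlie a method effect, as with the three negatively worded items here, the CTCM and CTCU specifications are formally equivalent (Marsh & Grayson, 1995). A minimal Python sketch of why, assuming all three residual covariances are positive: three correlated uniquenesses are three free parameters, and a unit-variance method factor has three free loadings that can be solved exactly from those three covariances, so both versions reproduce the data identically. The function name is hypothetical, and the input loadings (.604, .769, .250) are the Model 4 standardized method loadings from Table 2, used only as illustrative values.

```python
import math


def method_loadings_from_covariances(c12, c13, c23):
    """Solve a unit-variance method factor (CTCM-style) for three items from
    their three residual covariances (CTCU-style).

    The factor implies c12 = l1*l2, c13 = l1*l3 and c23 = l2*l3, hence
    l1**2 = c12*c13/c23 and likewise for l2 and l3.
    """
    if min(c12, c13, c23) <= 0:
        raise ValueError("this sketch assumes positive residual covariances")
    l1 = math.sqrt(c12 * c13 / c23)
    l2 = math.sqrt(c12 * c23 / c13)
    l3 = math.sqrt(c13 * c23 / c12)
    return l1, l2, l3


# Residual covariances implied by illustrative method loadings (Table 2, Model 4).
true = (0.604, 0.769, 0.250)
c12, c13, c23 = true[0] * true[1], true[0] * true[2], true[1] * true[2]

recovered = method_loadings_from_covariances(c12, c13, c23)
print([round(v, 3) for v in recovered])  # [0.604, 0.769, 0.25]
```

With four or more items, the six or more residual covariances generally cannot all be matched by a single set of loadings, which is what makes the dimensionality of a method effect testable in that case.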

Tomás and Oliver (1999) recommended several avenues for future research evaluating whether method effects associated with negatively worded items are substantively important or merely noise. Two of the recommendations included examining the presence of method effects in other social and personality scales and across different ages.

Identifying method effects in other personality scales (cf. Conroy, 2002; Motl & Conroy, 2000; Motl et al., 2000) satisfies one of the recommendations for future research provided by Tomás and Oliver (1999). For example, Motl and Conroy tested for the presence of method effects associated with item wording in a measure of social physique anxiety in a sample of college-age men and women. Using CFA and the CTCU framework, Motl and Conroy reported that the measure of social physique anxiety was best represented by a single substantive factor and that there were method effects among negatively worded items. Similar results were reported by Motl et al. when using four different samples of college-age men and women and both the CTCU and CTCM frameworks.

The second recommendation of Tomás and Oliver (1999), examining whether method effects associated with negatively worded items are present across different ages, can be addressed with an analysis of longitudinal invariance using responses from the same individuals across time. An analysis of longitudinal invariance directly tests the equivalence of the factor structure, factor loadings, item uniquenesses, factor variances, and factor covariances across time (i.e., at different ages of the same individuals). It therefore provides a direct test of the stationarity and stability of method effects associated with negatively worded items. Stationarity demonstrates that the same construct is being measured across time and is based on the longitudinal invariance of the factor structure and factor loadings (Pitts, West, & Tein, 1996; Tisak & Meredith, 1990). Stability demonstrates that the relative ordering of individuals on the construct remains constant across time and is based on the longitudinal invariance of the factor covariances (Pitts et al., 1996; Tisak & Meredith, 1990).

Based on the recommendations of Tomás and Oliver (1999), this study examined the longitudinal invariance of method effects associated with negatively worded items using a self-report measure of global self-esteem. The test of longitudinal invariance was performed using CFA and the CTCM framework (i.e., two uncorrelated latent variables representing substantive and method effects in a measure of global self-esteem). The CTCM framework was selected because it accounts for method effects as a latent variable and therefore permits a direct test of the factorial invariance of the measurement of method effects associated with negatively worded items across time. Moreover, a recent Monte Carlo simulation study demonstrated that previously reported problems with the CTCM framework (cf. Marsh & Grayson, 1995) were specific to models that employed a single indicator per trait–method combination; CTCM models that contained two or more indicators per trait–method combination exhibited fewer methodological problems than their CTCU counterparts (Tomás, Hontangas, & Oliver, 2000). The analysis of longitudinal invariance involved successive tests of the equivalence of the factor structure, factor loadings, item uniquenesses, factor variances, and factor covariances across time (Vandenberg & Lance, 2000). Such analyses test the stationarity and stability of method effects associated with negatively worded items and have implications for whether method effects should be interpreted as substantively meaningful or as substantively irrelevant noise. If method effects among negatively worded items are measured consistently across time and exhibit some degree of stability, we believe an argument can be made for a substantive interpretation of those method effects.

2 Although Marsh (1996) employed a CTCU framework and reported that method effects appear to be substantively irrelevant, and Tomás and Oliver (1999) employed both CTCU and CTCM frameworks and reported that method effects might represent something that is substantively meaningful, adopting either the CTCU or CTCM framework does not in and of itself imply that method effects are meaningless contamination or substantively important.

METHOD

Data

Data were drawn from the National Education Longitudinal Study (NELS). The NELS is a large, longitudinal data set collected by the National Center for Education Statistics (Ingels et al., 1992). The NELS data set contains measures of student characteristics related to academic achievement and provides opportunities to study social, behavioral, and educational variables related to student development.

The NELS data were collected at 2-year intervals from 1988 through 1994. The invariance analyses were conducted using responses from the NELS database collected in 1988, 1990, and 1992. Data from 1994 were not used because the postsecondary follow-up did not include the self-esteem measure employed in the previous three waves of NELS data. As recommended in the NELS user’s manual (Ingels et al., 1992), analyses were performed using the 1988, 1990, and 1992 data (i.e., when students were enrolled in the 8th, 10th, and 12th grades) weighted by the weighting variables (BYQWT, F1QWT, and F2QWT). The weighting variables take into account disproportionate sampling of specific subgroups (Ingels et al., 1992). After applying the weighting variables and retaining only respondents with no missing data across the three time periods, the final sample consisted of 3,950 students.


Analyses were performed on the same seven items from Rosenberg’s Self-Esteem scale as employed by Marsh (1996). Four of the items were positively worded, and three items were negatively worded. All items were rated on a 4-point Likert scale ranging from 1 (strongly disagree) to 4 (strongly agree). The negatively worded items were reverse coded. The items and NELS variable names are provided in the appendix.
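On the 1-to-4 scale used here, reverse coding maps a response x to (1 + 4) - x, so that agreement with a negatively worded item becomes a low score and higher scores consistently indicate higher self-esteem. A minimal Python sketch; the function name and the responses are hypothetical and are not NELS variables.

```python
def reverse_code(score, low=1, high=4):
    """Reverse-code one Likert response on a low..high scale (1<->4, 2<->3)."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside {low}-{high}")
    return (low + high) - score


# Hypothetical responses to one negatively worded item (1 = strongly disagree).
responses = [1, 2, 3, 4, 2]
print([reverse_code(r) for r in responses])  # [4, 3, 2, 1, 3]
```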

Statistical Analysis

The longitudinal factorial invariance analyses were performed using LISREL (version 8.20; Jöreskog & Sörbom, 1996a) with weighted least squares estimation on polychoric correlations and an asymptotic variance/covariance matrix computed using PRELIS (version 2.20; Jöreskog & Sörbom, 1996b). Weighted least squares estimation was selected because of the categorical nature of the data and violations of multivariate normality (Mardia’s normalized estimates of multivariate skewness and kurtosis were 77.75 and 88.08, respectively). The size of the sample was adequate to estimate and test the models using weighted least squares estimation (Jöreskog & Sörbom, 1996a).
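Because the items are ordinal, the analysis works from polychoric correlations: each pair of 4-point items is treated as a discretized bivariate normal pair, thresholds are estimated from the marginal proportions, and the correlation is chosen to maximize the likelihood of the observed contingency table. The Python sketch below implements that common two-step approach with SciPy. It is an illustration under stated assumptions, not PRELIS’s exact algorithm, and it does not compute the asymptotic covariance matrix that weighted least squares additionally requires; the function names, simulated data, and cutpoints are hypothetical.

```python
import math

import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.special import ndtr, ndtri  # standard normal CDF and its inverse

BIG = 8.0  # finite stand-in for +/- infinity at the outer thresholds


def thresholds(x, k):
    """Outer-padded thresholds for an item scored 1..k, from marginal proportions."""
    counts = np.bincount(x, minlength=k + 1)[1:]
    cum = np.cumsum(counts)[:-1] / x.size
    return np.concatenate(([-BIG], ndtri(cum), [BIG]))


def bvn_cdf(a, b, rho):
    """P(Z1 <= a, Z2 <= b) for a standard bivariate normal with correlation rho."""
    s = math.sqrt(1.0 - rho * rho)

    def integrand(z):
        # density of Z1 at z times P(Z2 <= b | Z1 = z)
        return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi) * ndtr((b - rho * z) / s)

    value, _ = quad(integrand, -BIG, a)
    return value


def polychoric(x, y, k=4):
    """Two-step polychoric correlation: fix thresholds, then maximize over rho."""
    a, b = thresholds(x, k), thresholds(y, k)
    table = np.zeros((k, k))
    np.add.at(table, (x - 1, y - 1), 1)  # observed k x k contingency table

    def neg_loglik(rho):
        grid = np.array([[bvn_cdf(ai, bj, rho) for bj in b] for ai in a])
        # Cell probabilities are second differences of the joint CDF grid.
        probs = np.diff(np.diff(grid, axis=0), axis=1)
        return -np.sum(table * np.log(np.clip(probs, 1e-12, None)))

    return minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded").x


# Simulated example: latent correlation 0.6, cut into four ordered categories.
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=3000)
cuts = [-1.0, 0.0, 0.8]
x = np.digitize(z[:, 0], cuts) + 1
y = np.digitize(z[:, 1], cuts) + 1

r_poly = polychoric(x, y)
r_pearson = np.corrcoef(x, y)[0, 1]
print(round(r_poly, 2), round(r_pearson, 2))
```

The Pearson correlation of the category codes is typically attenuated relative to the latent correlation, which is one reason polychoric correlations are preferred for items with only four response options.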

Model specification. We initially tested a base model within each of the three waves separately. The base model was specified using a CTCM framework and consisted of two uncorrelated factors. The two factors were a substantive factor representing self-esteem and a method factor accounting for wording effects among the negatively phrased items. No correlations were specified between content and method factors.
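Under this base model, the correlation between any two items is reproduced by their loadings alone: every item loads on the self-esteem factor, the three negatively worded items also load on the method factor, the two factors are uncorrelated, and each uniqueness absorbs the rest of an item’s unit variance. The numpy sketch below builds the implied matrix Σ = ΛΦΛ′ + Θ with Φ = I, using the Model 4 standardized loadings from Table 2 purely as illustrative values; it shows that the method factor adds shared covariance among Neg1, Neg2, and Neg3 only. Variable and function names are hypothetical.

```python
import numpy as np

items = ["Pos1", "Pos2", "Pos3", "Pos4", "Neg1", "Neg2", "Neg3"]

# Columns: substantive self-esteem factor, method factor (negative wording).
# Values: Model 4 standardized loadings from Table 2 (illustrative only).
loadings = np.array([
    [0.810, 0.000],
    [0.776, 0.000],
    [0.712, 0.000],
    [0.816, 0.000],
    [0.524, 0.604],
    [0.567, 0.769],
    [0.623, 0.250],
])

phi = np.eye(2)  # standardized, uncorrelated substantive and method factors
common = loadings @ phi @ loadings.T
theta = np.diag(1.0 - np.diag(common))  # uniquenesses give unit item variances
sigma = common + theta


def r(a, b):
    """Model-implied correlation between two named items."""
    return sigma[items.index(a), items.index(b)]


print(round(r("Pos1", "Pos2"), 3))  # 0.629: substantive factor only
print(round(r("Pos1", "Neg1"), 3))  # 0.424: substantive factor only
print(round(r("Neg1", "Neg2"), 3))  # 0.762: substantive plus method factor
```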

The model for the subsequent invariance analyses is presented in Figure 1. The model consisted of six factors: three substantive factors and three method factors. Substantive factors represented self-esteem, and method factors accounted for wording effects among negatively phrased items. No correlations were specified between content and method factors. Correlations linked the self-esteem factors across all three time-points and, separately, the method effect factors across all three time-points (i.e., First-Order Multiple Indicator Model; Marsh, 1993; Marsh & Grayson, 1994). Because the same items were used at each of the three time-points, the uniquenesses of each item were allowed to correlate across all possible pairs of time-points, not only adjacent ones (i.e., not an AR(1) error model; Pitts et al., 1996).

The invariance routine involved testing and comparing five models that imposed successive restrictions on model parameters. Model 1 tested the equality of the overall structure (i.e., same dimensions and same patterns of fixed, freed, and constrained elements in the matrices containing factor loadings, factor variances and covariances, and item uniquenesses). Model 2 included the restrictions from Model 1 plus the additional constraint of equal factor loadings across all three waves. Model 3 included the restrictions from Model 2 plus the additional constraint of equal item uniquenesses across all three waves. Model 4 included the restrictions from Model 3 plus the additional constraint of equal factor variances across all three waves. Model 5 included the restrictions from Model 4 plus the additional constraint of equal factor covariances across all three waves. The invariance routine was ordered similarly to the recommendations of Vandenberg and Lance (2000).
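The degrees of freedom reported in Table 1 follow from counting free parameters. The Python sketch below reconstructs that count under our reading of the specification: 7 items per wave (21 observed variables), one marker loading fixed to 1.0 per factor, uncorrelated factors within a wave, cross-wave correlations among the three substantive factors and among the three method factors, and each item’s uniqueness correlated across all three pairs of waves; “equal factor covariances” is read as one common value per factor type. The bookkeeping and names are ours, not the authors’, but they reproduce every df in Table 1.

```python
N_ITEMS, N_NEG, N_WAVES = 7, 3, 3

# Per-wave building blocks (one marker loading fixed to 1.0 per factor).
LOADINGS = (N_ITEMS - 1) + (N_NEG - 1)  # 6 substantive + 2 method = 8
VARIANCES = 2                           # substantive and method factor
UNIQUENESSES = N_ITEMS                  # one per item


def moments(n_vars):
    """Distinct elements of an n_vars x n_vars correlation/covariance matrix."""
    return n_vars * (n_vars + 1) // 2


def free_params(loadings=False, uniq=False, variances=False, covariances=False):
    """Free parameters of the three-wave model; True = held equal across waves."""
    def per_wave(count, equal):
        return count if equal else count * N_WAVES

    factor_covs = 2 if covariances else 2 * 3  # T1-T2, T1-T3, T2-T3 per factor type
    uniq_covs = N_ITEMS * 3                    # same item, all three pairs of waves
    return (per_wave(LOADINGS, loadings) + per_wave(UNIQUENESSES, uniq)
            + per_wave(VARIANCES, variances) + factor_covs + uniq_covs)


models = {
    "Model 1": {},
    "Model 2": {"loadings": True},
    "Model 3": {"loadings": True, "uniq": True},
    "Model 4": {"loadings": True, "uniq": True, "variances": True},
    "Model 5": {"loadings": True, "uniq": True, "variances": True, "covariances": True},
}

single_wave = moments(N_ITEMS) - (LOADINGS + VARIANCES + UNIQUENESSES)
print("Single wave: df =", single_wave)  # 11, as for Waves 1-3
for name, equal in models.items():
    t = free_params(**equal)
    print(f"{name}: free = {t}, df = {moments(N_ITEMS * N_WAVES) - t}")
# df: 153, 169, 183, 187, 191, matching Table 1
```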

The issues of evaluating model fit and comparing the fit of competing models remain unresolved. Accordingly, we employed the chi-square statistic and subjective indexes of fit to evaluate and compare the fit of the models (Cheung & Rensvold, 2002; Marsh & Grayson, 1994; Vandenberg & Lance, 2000). The chi-square statistic assesses the absolute fit of the model to the data, but it is sensitive to sample size, and its test assumes that the model is exactly correct (Bollen, 1989; Jöreskog, 1993; Jöreskog & Sörbom, 1996a). Because virtually no restrictive model with positive degrees of freedom fits real data exactly, such models often will be rejected by a formal test of significance when the sample is sufficiently large (Cudeck & Browne, 1983; Marsh, 1996). Accordingly, other subjective indexes of fit were employed to judge and compare the fit of the models (Cheung & Rensvold, 2002; Marsh & Grayson, 1994; Vandenberg & Lance, 2000). The Goodness-of-Fit Index (GFI) is an absolute fit index; it measures the amount of variance/covariance in the sample matrix that is predicted by the model-implied variance/covariance matrix. Both the Non-Normed Fit Index (NNFI) and the comparative fit index (CFI) are incremental fit indexes and test the proportionate improvement in fit by comparing the target model to a baseline model with no correlations among observed variables (Bentler, 1990; Bentler & Bonett, 1980). GFI, NNFI, and CFI values approximating or exceeding 0.95 were considered indicative of good fit (Hu & Bentler, 1999). The standardized root mean squared residual (SRMR) is the square root of the average squared standardized residual between the specified and obtained variance/covariance matrices (Bollen, 1989; Jöreskog & Sörbom, 1996a). The SRMR value should approximate or be less than .08 (Hu & Bentler, 1999). The root mean squared error of approximation (RMSEA) represents closeness of fit (Browne & Cudeck, 1993; Steiger & Lind, 1980). The RMSEA value should approximate or be less than .05 to demonstrate close fit of the model, and a 90% confidence interval around the RMSEA point estimate whose lower bound falls below .05 indicates that close fit is possible (Browne & Cudeck, 1993). The Expected Cross-Validation Index (ECVI) is a single-sample estimate of how well the current solution would fit in an independently drawn sample, and it can be employed to compare the fit of competing models, with smaller values indicating better expected cross-validation (Browne & Cudeck, 1993).

FIGURE 1 Model for the longitudinal invariance analyses specified using a CTCM framework.
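Several of these indexes are simple functions of χ², df, and sample size, so Table 1 can be checked directly. The Python sketch below assumes the conventional formulas with n = N - 1 = 3,949 (N = 3,950 from the Data section): RMSEA = sqrt(max(χ² - df, 0) / (df · n)); its 90% confidence interval and the test of close fit (P close) come from the noncentral χ² distribution; and ECVI = (χ² + 2t) / n, where t is the number of free parameters (distinct matrix elements minus df). These formulas are our assumption about what LISREL 8.20 computed, but they reproduce the reported RMSEA and ECVI point estimates; the function names are hypothetical.

```python
import math

from scipy.optimize import brentq
from scipy.stats import ncx2

N = 3950
n = N - 1


def rmsea(chi2, df):
    return math.sqrt(max(chi2 - df, 0.0) / (df * n))


def rmsea_ci(chi2, df, level=0.90):
    """Invert the noncentral chi-square CDF to bound the noncentrality."""
    def bound(q):
        f = lambda lam: ncx2.cdf(chi2, df, lam) - q
        if f(1e-8) < 0:  # even (almost) zero noncentrality is too large
            return 0.0
        return brentq(f, 1e-8, 10 * chi2)

    lo = bound((1 + level) / 2)  # CDF = .95 gives the lower limit
    hi = bound((1 - level) / 2)  # CDF = .05 gives the upper limit
    return math.sqrt(lo / (df * n)), math.sqrt(hi / (df * n))


def p_close(chi2, df, eps=0.05):
    """p value for H0: RMSEA <= eps (test of close fit)."""
    return ncx2.sf(chi2, df, eps**2 * df * n)


def ecvi(chi2, df, n_vars):
    t = n_vars * (n_vars + 1) // 2 - df  # free parameters
    return (chi2 + 2 * t) / n


table1 = {  # name: (chi-square, df, number of observed variables)
    "Wave 1": (106.85, 11, 7), "Wave 2": (113.69, 11, 7), "Wave 3": (121.24, 11, 7),
    "Model 1": (574.64, 153, 21), "Model 2": (646.38, 169, 21),
    "Model 3": (689.01, 183, 21), "Model 4": (708.50, 187, 21),
    "Model 5": (843.08, 191, 21),
}
for name, (chi2, df, p) in table1.items():
    lo, hi = rmsea_ci(chi2, df)
    print(f"{name}: RMSEA {rmsea(chi2, df):.3f} ({lo:.3f}-{hi:.3f}), "
          f"P close {p_close(chi2, df):.2f}, ECVI {ecvi(chi2, df, p):.3f}")
```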

RESULTS

The fit indexes for the models tested in the invariance analyses are presented in Table 1. The standardized and unstandardized factor loadings are presented in Table 2. The standardized and unstandardized factor covariances are presented in Table 3.

We first examined the fit of the base model (i.e., two uncorrelated factors representing self-esteem and method effects among negatively worded items) within each of the three waves separately. The base model represented a good model–data fit within each of the three waves. Although the chi-square statistics were significant, all of the subjective fit indexes exceeded recommended criteria and suggested good model–data fit. Hence, there was evidence that method effects associated with the negatively worded items were present within each of the three time-points.3

We examined the fit of the models in the invariance routine. All five of the models within the invariance routine represented a good model–data fit. Although the chi-square statistics were significant, all of the subjective fit indexes exceeded recommended criteria and suggested good model–data fit.

We then compared the fit of the models in the invariance routine. There was evidence for the invariance of the overall structure (Model 1), factor loadings (Model 2), item uniquenesses (Model 3), factor variances (Model 4), and factor covariances (Model 5) across the three time-points. Although the chi-square difference tests reported in Table 1 were significant, the subjective fit indexes were nearly identical across all five nested models. For the RMSEA and the Expected Cross-Validation Index, the point estimates, the 90% confidence intervals, or both overlapped across the models. The GFI, NNFI, and CFI were nearly identical across the five models. Moreover, the CFI (and, for that matter, the GFI and NNFI) did not decrease by more than .01 across the five models (ΔCFI = CFI_constrained - CFI_unconstrained ≥ -.01); this criterion has been reported to be robust for testing the multigroup invariance of CFA models (Cheung & Rensvold, 2002). The SRMR was similar across four of the five models. Hence, the subjective fit indexes provided evidence that latent variables accounting for global self-esteem and method effects among negatively worded items demonstrated invariance of the factor structure, factor loadings, item uniquenesses, factor variances, and factor covariances across three waves, each separated by 2 years.

3 We tested for the presence of method effects among positively worded items within each of the three waves separately. The model consisted of two uncorrelated factors representing self-esteem and method effects among positively worded items. The model represented a good model–data fit within each of the three waves, but it did not fit as well as the model with method effects among negatively worded items. Hence, method effects were primarily associated with the negatively worded items within each of the three time-points.

TABLE 1
Fit Indices From the Analyses Testing the Longitudinal Invariance of Substantive and Method Factors to a Seven-Item Measure of Global Self-Esteem

| Model | χ² | df | GFI | NNFI | CFI | SRMR | RMSEA (90% CI) | P close | ECVI (90% CI) |
|---|---|---|---|---|---|---|---|---|---|
| Wave 1 | 106.85 | 11 | .997 | .994 | .997 | .035 | .047 (.039–.055) | 0.71 | .036 (.028–.045) |
| Wave 2 | 113.69 | 11 | .996 | .994 | .997 | .038 | .049 (.041–.057) | 0.59 | .037 (.030–.047) |
| Wave 3 | 121.24 | 11 | .996 | .995 | .997 | .038 | .050 (.043–.059) | 0.45 | .039 (.031–.049) |
| Model 1 | 574.64 | 153 | .994 | .995 | .996 | .043 | .026 (.024–.029) | 1.00 | .185 (.168–.205) |
| Model 2 | 646.38 | 169 | .993 | .995 | .996 | .045 | .027 (.025–.029) | 1.00 | .195 (.177–.216) |
| Model 3 | 689.01 | 183 | .993 | .995 | .995 | .046 | .026 (.024–.029) | 1.00 | .199 (.179–.202) |
| Model 4 | 708.50 | 187 | .993 | .995 | .995 | .048 | .027 (.025–.029) | 1.00 | .202 (.182–.223) |
| Model 5 | 843.08 | 191 | .991 | .994 | .994 | .064 | .029 (.027–.031) | 1.00 | .234 (.211–.257) |

Model comparisons

| Comparison | χ²diff | df | p |
|---|---|---|---|
| Model 1 vs. 2 | 71.74 | 16 | <.05 |
| Model 2 vs. 3 | 42.63 | 14 | <.05 |
| Model 3 vs. 4 | 19.49 | 4 | <.05 |
| Model 4 vs. 5 | 134.58 | 4 | <.05 |

Note. GFI = Goodness-of-Fit Index; NNFI = Nonnormed Fit Index; CFI = comparative fit index; SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation; CI = confidence interval; P close = p value for the test of close fit (H0: RMSEA ≤ .05); ECVI = expected cross-validation index; Wave 1 = 1988; Wave 2 = 1990; Wave 3 = 1992; Model 1 = equality of the overall structure; Model 2 = Model 1 plus equality of the factor loadings; Model 3 = Model 2 plus equality of the item uniquenesses; Model 4 = Model 3 plus equality of factor variances; Model 5 = Model 4 plus equality of factor covariances.
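The model comparisons in Table 1 are differences between successive nested models. The Python sketch below recomputes them from the tabled χ² and df values, adds exact p values via SciPy, and applies the ΔCFI ≥ -.01 criterion of Cheung and Rensvold (2002) to the reported CFI values. It assumes, as the table does, that the difference between two nested χ² values is itself referred to a χ² distribution; the variable names are hypothetical.

```python
from scipy.stats import chi2

# (chi-square, df, CFI) for the nested invariance models in Table 1
fits = {
    "Model 1": (574.64, 153, 0.996),
    "Model 2": (646.38, 169, 0.996),
    "Model 3": (689.01, 183, 0.995),
    "Model 4": (708.50, 187, 0.995),
    "Model 5": (843.08, 191, 0.994),
}

names = list(fits)
for less, more in zip(names, names[1:]):  # less vs. more constrained model
    c0, d0, cfi0 = fits[less]
    c1, d1, cfi1 = fits[more]
    diff, ddf = c1 - c0, d1 - d0
    p = chi2.sf(diff, ddf)
    delta_cfi = cfi1 - cfi0  # constrained minus less constrained
    print(f"{less} vs. {more}: chi2diff = {diff:.2f}, df = {ddf}, p = {p:.2g}, "
          f"dCFI = {delta_cfi:+.3f}, meets dCFI >= -.01: {delta_cfi >= -0.01}")
```

Every χ² difference is significant at p < .05, yet no ΔCFI falls below -.01, which is the pattern the text describes: with N = 3,950 the difference tests are highly powered, while the change in practical fit is negligible.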

DISCUSSION

Based on the recommendations of previous researchers (Tomás & Oliver, 1999), we employed CFA and the CTCM framework to examine whether method effects associated with negatively worded items were invariant over three waves, each separated by 2 years. Our results showed that the method effects exhibited invariance of factor structure, factor loadings, item uniquenesses, factor variances, and factor covariances across the three waves. These results demonstrate that method effects associated with negatively worded items exhibit stationarity and stability across time, similar to measures of other personality traits. Stationarity demonstrates that the same construct is being measured across time (Pitts et al., 1996; Tisak & Meredith, 1990). Stability demonstrates that the relative ordering of individuals on the construct remains constant across time (Pitts et al., 1996; Tisak & Meredith, 1990). Hence, method effects should be considered of potential substantive importance rather than simply substantively irrelevant noise. Substantively irrelevant noise would not likely be measured similarly across time, and the relative ordering of individuals would not likely remain consistent across time.

If method effects associated with negatively worded items are substantively meaningful, one possible interpretation might be a response style. According to Bentler, Jackson, and Messick (1971), a response style is a potentially measurable personality trait that can be “identified by the existence of a latent variable” and described as “a behavioral consistency operating across measures of several conceptually distinct content traits” (p. 188). The idea that a response style is identified by a latent variable is directly related to the CFA models utilized to represent and esti-


TABLE 2
Factor Loadings From the Analyses Testing the Longitudinal Invariance of Substantive and Method Factors to a Seven-Item Measure of Global Self-Esteem

| Model | Item | W1 Subst. Stan. | W1 Subst. Unstan. | W1 Meth. Stan. | W1 Meth. Unstan. | W2 Subst. Stan. | W2 Subst. Unstan. | W2 Meth. Stan. | W2 Meth. Unstan. | W3 Subst. Stan. | W3 Subst. Unstan. | W3 Meth. Stan. | W3 Meth. Unstan. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model 1 | Pos1 | .808 | 1.000 | | | .816 | 1.000 | | | .824 | 1.000 | | |
| | Pos2 | .700 | .866 | | | .811 | .995 | | | .831 | 1.008 | | |
| | Pos3 | .622 | .771 | | | .713 | .874 | | | .788 | .956 | | |
| | Pos4 | .838 | 1.037 | | | .801 | .982 | | | .825 | 1.001 | | |
| | Neg1 | .454 | .562 | .610 | 1.000 | .549 | .673 | .599 | 1.000 | .567 | .689 | .588 | 1.000 |
| | Neg2 | .524 | .648 | .788 | 1.291 | .594 | .729 | .749 | 1.250 | .600 | .729 | .759 | 1.290 |
| | Neg3 | .590 | .730 | .226 | .370 | .652 | .799 | .233 | .388 | .637 | .773 | .274 | .466 |
| Model 2 | Pos1 | .771 | 1.000 | | | .821 | 1.000 | | | .847 | 1.000 | | |
| | Pos2 | .741 | .962 | | | .790 | .962 | | | .815 | .962 | | |
| | Pos3 | .677 | .879 | | | .722 | .879 | | | .745 | .879 | | |
| | Pos4 | .774 | 1.005 | | | .825 | 1.005 | | | .851 | 1.005 | | |
| | Neg1 | .502 | .651 | .597 | 1.000 | .535 | .651 | .600 | 1.000 | .552 | .651 | .600 | 1.000 |
| | Neg2 | .546 | .709 | .762 | 1.276 | .582 | .709 | .766 | 1.276 | .601 | .709 | .765 | 1.276 |
| | Neg3 | .594 | .770 | .243 | .407 | .633 | .770 | .244 | .407 | .653 | .770 | .244 | .407 |
| Model 3 | Pos1 | .806 | 1.000 | | | .812 | 1.000 | | | .814 | 1.000 | | |
| | Pos2 | .773 | .959 | | | .779 | .959 | | | .782 | .959 | | |
| | Pos3 | .707 | .880 | | | .714 | .880 | | | .718 | .880 | | |
| | Pos4 | .812 | 1.007 | | | .817 | 1.007 | | | .820 | 1.007 | | |
| | Neg1 | .520 | .649 | .604 | 1.000 | .527 | .649 | .602 | 1.000 | .531 | .649 | .601 | 1.000 |
| | Neg2 | .565 | .704 | .771 | 1.274 | .572 | .704 | .767 | 1.274 | .575 | .704 | .764 | 1.274 |
| | Neg3 | .618 | .769 | .249 | .412 | .625 | .769 | .248 | .412 | .629 | .769 | .247 | .412 |
| Model 4 | Pos1 | .810 | 1.000 | | | .810 | 1.000 | | | .810 | 1.000 | | |
| | Pos2 | .776 | .958 | | | .776 | .958 | | | .776 | .958 | | |
| | Pos3 | .712 | .880 | | | .712 | .880 | | | .712 | .880 | | |
| | Pos4 | .816 | 1.008 | | | .816 | 1.008 | | | .816 | 1.008 | | |
| | Neg1 | .524 | .647 | .604 | 1.000 | .524 | .647 | .604 | 1.000 | .524 | .647 | .604 | 1.000 |
| | Neg2 | .567 | .700 | .769 | 1.273 | .567 | .700 | .769 | 1.273 | .567 | .700 | .769 | 1.273 |
| | Neg3 | .623 | .769 | .250 | .414 | .623 | .769 | .250 | .414 | .623 | .769 | .250 | .414 |
| Model 5 | Pos1 | .808 | 1.000 | | | .808 | 1.000 | | | .808 | 1.000 | | |
| | Pos2 | .768 | .950 | | | .768 | .950 | | | .768 | .950 | | |
| | Pos3 | .710 | .879 | | | .710 | .879 | | | .710 | .879 | | |
| | Pos4 | .810 | 1.002 | | | .810 | 1.002 | | | .810 | 1.002 | | |
| | Neg1 | .519 | .642 | .600 | 1.000 | .519 | .642 | .600 | 1.000 | .519 | .642 | .600 | 1.000 |
| | Neg2 | .558 | .690 | .775 | 1.291 | .558 | .690 | .775 | 1.291 | .558 | .690 | .775 | 1.291 |
| | Neg3 | .614 | .759 | .248 | .413 | .614 | .759 | .248 | .413 | .614 | .759 | .248 | .413 |

Note. W1, W2, W3 = Waves 1, 2, 3; Subst. = substantive factor; Meth. = method factor; Stan. = standardized estimates of factor loadings; Unstan. = unstandardized estimates of factor loadings; Pos = positively worded item; Neg = negatively worded item; Model 1 = equality of the overall structure; Model 2 = Model 1 plus equality of the factor loadings; Model 3 = Model 2 plus equality of the item uniquenesses; Model 4 = Model 3 plus equality of factor variances; Model 5 = Model 4 plus equality of factor covariances. All factor loadings were significant, p < .05.

Dow

nloa

ded

by [

Nor

th C

arol

ina

Stat

e U

nive

rsity

] at

06:

33 0

2 M

ay 2

013

574

TABLE 3Factor Covariances From the Analyses Testing the Longitudinal Invariance of Substantive and Method Factors to a

Seven-Item Measure of Global Self-Esteem

Factor Covariance

Self1–Self2 Self2–Self3 Self1–Self3 Neg1–Neg2 Neg2–Neg3 Neg1–Neg3

Model Stan. Unstan. Stan. Unstan. Stan. Unstan. Stan. Unstan. Stan. Unstan. Stan. Unstan.

Model 1 .571 .376 .691 .465 .492 .328 .379 .139 .465 .164 .296 .106Model 2 .565 .358 .688 .479 .486 .318 .390 .140 .465 .167 .308 .110Model 3 .564 .366 .686 .459 .483 .316 .390 .141 .456 .166 .310 .112Model 4 .564 .370 .685 .449 .482 .316 .390 .142 .454 .166 .311 .113Model 5 .612 .400 .612 .400 .612 .400 .393 .141 .393 .141 .393 .141

Note. Stan. = standardized estimates of factor covariances; Unstan. = unstandardized estimates of factor covariances; Self1 = self-esteem, wave 1;Self2 = self-esteem, wave 2; Self3 = self-esteem, wave 3; Neg1 = method factor, wave 1; Neg2 = method factor, wave 2; Neg3 = method factor, wave 3;Model 1 = equality of the overall structure; Model 2 = Model 1 plus equality of the factor loadings; Model 3 = Model 2 plus equality of the itemuniquenesses; Model 4 = Model 3 plus equality of factor variances; Model 5 = Model 4 plus equality of factor covariances. All factor covariances weresignificant p < .05.

Dow

nloa

ded

by [

Nor

th C

arol

ina

Stat

e U

nive

rsity

] at

06:

33 0

2 M

ay 2

013

mate method effects among negatively worded items on self-report instruments(Marsh & Grayson, 1995; Tomás & Oliver, 1999). The notion that a response styleoperates across measures of distinct content traits has been supported by previousstudies of self-esteem (Tomás & Oliver, 1999), social physique anxiety (Motl etal., 2000), and fear of failure (Conroy, 2002). This study indicated that the methodeffects associated with negatively worded items demonstrated longitudinalinvariance and therefore represent a behavioral consistency operating across time(Billiet & McClendon, 2000). Accordingly, response style might provide a sub-stantive interpretation for the method effects associated with negatively wordeditems.
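The claim that a method factor “identifies” a response style can be made concrete with the Model 5 standardized loadings in Table 2. In a standardized solution where the substantive and method factors within a wave are uncorrelated (an assumption made here for illustration; the tables list no substantive–method covariances), the model-implied correlation between two items is the sum, over factors, of the products of their loadings. The sketch below, with hypothetical names, shows that only pairs of negatively worded items pick up the extra method-factor term.

```python
# Sketch: how a method factor adds shared covariance among negatively worded
# items. Loadings are the Model 5 standardized estimates from Table 2
# (identical across waves). ASSUMPTION: substantive and method factors are
# uncorrelated within a wave; this is an illustrative simplification.

SUBSTANTIVE = {"Pos1": .808, "Pos2": .768, "Pos3": .710, "Pos4": .810,
               "Neg1": .519, "Neg2": .558, "Neg3": .614}
METHOD = {"Neg1": .600, "Neg2": .775, "Neg3": .248}  # positives do not load


def implied_correlation(item_a: str, item_b: str) -> float:
    """Model-implied correlation between two distinct items.

    With orthogonal, standardized factors, the implied correlation is the
    trait-loading product plus the method-loading product (zero unless both
    items load on the method factor).
    """
    shared_trait = SUBSTANTIVE[item_a] * SUBSTANTIVE[item_b]
    shared_method = METHOD.get(item_a, 0.0) * METHOD.get(item_b, 0.0)
    return shared_trait + shared_method


if __name__ == "__main__":
    for a, b in [("Pos1", "Pos4"), ("Pos1", "Neg1"), ("Neg1", "Neg2")]:
        print(a, b, round(implied_correlation(a, b), 3))
```

Under these assumptions the implied correlation is about .65 for Pos1–Pos4 and .42 for Pos1–Neg1, but about .75 for Neg1–Neg2, of which .465 comes from the method factor alone. That systematic extra covariance, shared only by negatively worded items, is what the latent method variable represents.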

Interestingly, Wang et al. (2001) collected responses to Rosenberg’s Self-Esteem Scale via face-to-face interviews with crack-cocaine drug users and reported the existence of a method effect primarily among positively worded items. This might appear to contradict the findings of negative wording effects for self-esteem among adolescents reported in this study and elsewhere (e.g., Corwyn, 2000; Marsh, 1996; Tomás & Oliver, 1999). However, the divergent results might be attributable to differences in instrument administration (i.e., self-administration vs. face-to-face interviews) and are perhaps even consistent with a substantive interpretation of method effects. Wang et al. noted that for the special population of crack-cocaine drug users, who are regularly involved in illegal and socially stigmatizing activities, the positively worded items may have made respondents feel uncomfortable in the interviews, and their responses might represent “something that deviates substantially from their real lives” (p. 284). Wang et al. further noted that the respondents may have tried to “appear more accepting of themselves than they really are in order not to ‘lose face’” (p. 284). Hence, the method effect associated with the positively worded items might be substantive in nature and also represent a response style.

What might cause a response style underlying responses to negatively worded items? One possible explanation involves social desirability. Social desirability has been described as a motivation to present oneself positively (i.e., a tendency to err on the flattering side in self-descriptions; Crowne & Marlowe, 1964), and it has been considered a common source of systematic measurement error because individuals might be more or less responsive to the social desirability characteristics of items on a self-report instrument (DeVellis, 1991). Accordingly, responses to negatively worded items might reflect both global self-esteem and an individual’s unwillingness to admit to self-descriptions of low self-esteem as a protective self-presentation mechanism. This possibility could be evaluated using covariance modeling techniques on responses from self-report instruments measuring global self-esteem and social desirability (e.g., Marlowe–Crowne Social Desirability scale; Crowne & Marlowe, 1964).
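A small simulation can show the pattern such a covariance-modeling study would look for. Everything in the sketch below is hypothetical: it assumes one self-esteem factor driving all items, a separate self-presentation tendency that pushes reverse-scored negatively worded items toward favorable answers, and an external “desirability” score that partly reflects that tendency. The loadings, sample size, and function names are illustrative assumptions, not estimates from this study or properties of the Marlowe–Crowne scale.

```python
# Hypothetical simulation: a self-presentation tendency that affects only the
# (reverse-scored) negatively worded items. Loadings, sample size, and the
# "desirability" score are illustrative assumptions, not study estimates.
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, trait_load=0.7, style_load=0.5, sd_load=0.6, seed=1):
    """Generate two positive items, two negative items, and a desirability score.

    Every observed variable has unit population variance. Negative items are
    assumed to be reverse-scored, so higher always means higher self-esteem.
    """
    rng = random.Random(seed)
    pos_resid = math.sqrt(1 - trait_load ** 2)
    neg_resid = math.sqrt(1 - trait_load ** 2 - style_load ** 2)
    sd_resid = math.sqrt(1 - sd_load ** 2)
    pos = ([], [])
    neg = ([], [])
    desirability = []
    for _ in range(n):
        esteem = rng.gauss(0, 1)  # substantive self-esteem factor
        style = rng.gauss(0, 1)   # hypothetical self-presentation tendency
        for k in range(2):
            pos[k].append(trait_load * esteem + pos_resid * rng.gauss(0, 1))
            neg[k].append(trait_load * esteem + style_load * style
                          + neg_resid * rng.gauss(0, 1))
        desirability.append(sd_load * style + sd_resid * rng.gauss(0, 1))
    return pos, neg, desirability


if __name__ == "__main__":
    pos, neg, desirability = simulate()
    # Population values under the assumed loadings are given in comments.
    print("pos-pos", round(pearson(pos[0], pos[1]), 2))          # pop. .49
    print("pos-neg", round(pearson(pos[0], neg[0]), 2))          # pop. .49
    print("neg-neg", round(pearson(neg[0], neg[1]), 2))          # pop. .74
    print("SD-neg ", round(pearson(desirability, neg[0]), 2))    # pop. .30
    print("SD-pos ", round(pearson(desirability, pos[0]), 2))    # pop. .00
```

If the social desirability account held, a covariance model fit to real data of this kind would be expected to show the method factor, rather than the substantive factor, correlating with the desirability measure.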

Another possible explanation involves approach or avoidance behavioral tendencies. Emotion researchers have identified approach and avoidance as two basic circuits mediating motivation and emotion (Davidson, 1998). The approach system facilitates appetitive behaviors and generates positive affective responses oriented toward goal attainment. The avoidance system facilitates withdrawal behaviors and generates negative affective responses oriented toward avoiding aversive stimuli. Such behavioral tendencies are manifested in differences in prefrontal cortex activation asymmetry as measured by electroencephalography (Davidson, 1998) and might influence responses to positively and negatively worded items on self-report instruments. This possibility could be evaluated using covariance modeling on responses from self-report instruments of global self-esteem and approach or avoidance behavioral tendencies (e.g., Behavioral Inhibition and Behavioral Activation scales; Carver & White, 1994).

The presence of a response style associated with negatively worded items also may be related to sentence syntax or semantics and neural processing. Distinct parts of the left frontal cortex are associated with processing syntactic and semantic information (Vigliocco, 2000). Perhaps the left frontal cortex also influences responses to negatively worded items, which would have implications for identifying a neuroanatomical correlate of method effects. We await future research delineating the correlates, causes, and influences of method effects associated with negatively worded items.

REFERENCES

American Educational Research Association. (1999). Standards for educational and psychological testing. Washington, DC: Author.

Bachman, J. G., & O’Malley, P. M. (1986). Self-concept, self-esteem, and educational experiences: The frog pond revisited (again). Journal of Personality and Social Psychology, 50, 35–46.

Bentler, P. M. (1990). Comparative fit indices in structural models. Psychological Bulletin, 107, 238–246.

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588–606.

Bentler, P. M., Jackson, D. N., & Messick, S. (1971). Identification of content and style: A two-dimensional interpretation of acquiescence. Psychological Bulletin, 76, 186–204.

Billiet, J. B., & McClendon, M. J. (2000). Modeling acquiescence in measurement models for balanced sets of items. Structural Equation Modeling, 7, 608–628.

Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.

Carmines, E. G., & Zeller, R. A. (1974). On establishing the empirical dimensionality of theoretical terms: An analytical example. Political Methodology, 1, 75–96.

Carmines, E. G., & Zeller, R. A. (1979). Reliability and validity assessment. Beverly Hills, CA: Sage.

Carver, C. S., & White, T. L. (1994). Behavioral inhibition, behavioral activation, and affective responses to impending reward and punishment: The BIS/BAS scales. Journal of Personality and Social Psychology, 67, 319–333.


Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indices for testing measurement invariance. Structural Equation Modeling, 9, 233–255.

Conroy, D. E. (2002). Progress in the development of a multidimensional measure of fear of failure: The Performance Failure Appraisal Inventory (PFAI). Anxiety, Stress, and Coping, 14, 431–452.

Corwyn, R. F. (2000). The factor structure of global self-esteem among adolescents and adults. Journal of Research in Personality, 34, 357–379.

Crowne, D., & Marlowe, D. (1964). The approval motive. New York: Wiley.

Cudeck, R., & Browne, M. W. (1983). Cross-validation of covariance structures. Multivariate Behavioral Research, 18, 147–167.

Davidson, R. J. (1998). Affective style and affective disorders: Perspectives from affective neuroscience. Cognition and Emotion, 12, 307–330.

DeVellis, R. F. (1991). Scale development: Theory and applications. Newbury Park, CA: Sage.

Goldsmith, R. E. (1986). Dimensionality of the Rosenberg self-esteem scale. Journal of Social Behavior and Personality, 1, 253–264.

Hensley, W. E., & Roberts, M. K. (1976). Dimensions of Rosenberg’s Self-Esteem Scale. Psychological Reports, 38, 583–584.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.

Ingels, S. J., Scott, L. A., Lindmark, J. T., Frankel, M. R., Myers, S. L., & Wu, S. (1992). National Education Longitudinal Study of 1988. First follow-up: Student component data file user’s manual (Vol. 1). Washington, DC: U.S. Department of Education.

Jöreskog, K. G. (1993). Testing structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 294–316). Newbury Park, CA: Sage.

Jöreskog, K. G., & Sörbom, D. (1996a). LISREL 8: User’s reference guide. Chicago: Scientific Software International.

Jöreskog, K. G., & Sörbom, D. (1996b). PRELIS 2: User’s reference guide. Chicago: Scientific Software International.

Kaufman, P., Rasinski, K. A., Lee, R., & West, J. (1991). National Education Longitudinal Study of 1988. Quality of the responses of eighth-grade students in NELS88. Washington, DC: U.S. Department of Education.

Kohn, M. L. (1977). Class and conformity: A study of values (2nd ed.). Chicago: University of Chicago Press.

Marsh, H. W. (1993). Stability of individual differences in multiwave panel studies: Comparisons of simplex models and one-factor models. Journal of Educational Measurement, 30, 157–183.

Marsh, H. W. (1996). Positive and negative global self-esteem: A substantively meaningful distinction or artifactors? Journal of Personality and Social Psychology, 70, 810–819.

Marsh, H. W., & Grayson, D. (1994). Longitudinal stability of latent means and individual differences: A unified approach. Structural Equation Modeling, 1, 317–359.

Marsh, H. W., & Grayson, D. (1995). Latent variable models of multitrait–multimethod data. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 177–198). Thousand Oaks, CA: Sage.

Motl, R. W., & Conroy, D. E. (2000). Validity and factorial invariance of the Social Physique Anxiety Scale. Medicine and Science in Sports and Exercise, 32, 1007–1017.

Motl, R. W., Conroy, D. E., & Horan, P. M. (2000). The Social Physique Anxiety Scale: An example of the potential consequences of negatively worded items in factorial validity studies. Journal of Applied Measurement, 1, 327–345.

Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.

Pitts, S. C., West, S. G., & Tein, J. (1996). Longitudinal measurement models in evaluation research: Examining stability and change. Evaluation and Program Planning, 19, 333–350.


Steiger, J. H., & Lind, J. (1980, May). Statistically based tests for the number of common factors. Paper presented at the meeting of the Psychometric Society, Iowa City, IA.

Tisak, J., & Meredith, W. (1990). Longitudinal factor analysis. In A. von Eye (Ed.), Statistical methods in longitudinal research: Vol. 1. Principles and structuring change (pp. 125–149). San Diego, CA: Academic.

Tomás, J. M., Hontangas, P. M., & Oliver, A. (2000). Linear confirmatory factor models to evaluate multitrait–multimethod matrices: The effects of number of indicators and correlations among methods. Multivariate Behavioral Research, 35, 469–499.

Tomás, J. M., & Oliver, A. (1999). Rosenberg’s self-esteem scale: Two factors or method effects. Structural Equation Modeling, 6, 84–98.

Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3, 4–69.

Vigliocco, G. (2000). Language processing: The anatomy of meaning and syntax. Current Biology, 10, R78–R80.

Wang, J., Siegal, H. A., Falck, R. S., & Carlson, R. G. (2001). Factorial structure of Rosenberg’s self-esteem scale among crack-cocaine drug users. Structural Equation Modeling, 8, 275–286.


APPENDIX

Item Wording (Variable Names)

Pos1 I feel good about myself. (BYS44A, F1S62A, F2S66A)
Pos2 I feel I am a person of worth, the equal of other people. (BYS44D, F1S62D, F2S66D)
Pos3 I am able to do things as well as most other people. (BYS44E, F1S62E, F2S66E)
Pos4 On the whole, I am satisfied with myself. (BYS44H, F1S62H, F2S66H)
Neg1 I certainly feel useless at times. (BYS44I, F1S62I, F2S66I)
Neg2 At times I think I am no good at all. (BYS44J, F1S62J, F2S66J)
Neg3 I feel I do not have much to be proud of. (BYS44L, F1S62L, F2S66L)
