27
The Integration of Continuous and Discrete Latent Variable Models: Potential Problems and Promising Opportunities Daniel J. Bauer North Carolina State University Patrick J. Curran University of North Carolina at Chapel Hill Structural equation mixture modeling (SEMM) integrates continuous and discrete latent variable models. Drawing on prior research on the relationships between continuous and discrete latent variable models, the authors identify 3 conditions that may lead to the estimation of spurious latent classes in SEMM: misspecifica- tion of the structural model, nonnormal continuous measures, and nonlinear rela- tionships among observed and/or latent variables. When the objective of a SEMM analysis is the identification of latent classes, these conditions should be considered as alternative hypotheses and results should be interpreted cautiously. However, armed with greater knowledge about the estimation of SEMMs in practice, re- searchers can exploit the flexibility of the model to gain a fuller understanding of the phenomenon under study. In recent years, many exciting developments have taken place in structural equation modeling, but per- haps none more so than the development of structural equation models that account for unobserved popula- tion heterogeneity, or mixtures of unobserved groups. Structural equation mixture models (SEMMs) are ide- ally suited for testing theories where qualitatively distinct types of individuals are thought to be charac- terized by different structural relationships between variables (and/or their means) and where group mem- bership cannot be observed directly. For instance, in the context of marketing research, Jedidi, Jagpal, and DeSarbo (1997a) noted that market segments must be defined by the differential appeal of product features, rather than by a priori classification. Simi- larly, Arminger and Stein (1997) noted the desirabil- ity of identifying unobserved heterogeneity in indi- vidual lifestyles in sociological research. More recently, drawing on the work of Nagin and Land (1993) and Verbeke and Lesaffre (1996), B. O. Mu- the ´n (2001) demonstrated how SEMMs could be used in psychological and educational research to capture heterogeneity in developmental pathways. More formally, in SEMM, latent groups or classes are defined at the distributional level. Each group is composed of a collection of individuals who may dif- fer in their individual scores but are homogeneous with respect to the distribution from which they were sampled. Specifically, their observed scores are thought to be drawn from a common multivariate nor- mal distribution, such that the same structural rela- tionships among variables hold for all individuals within the group. Population heterogeneity is then indicated by the presence of two or more latent groups in the population characterized by different distribu- tions. Analyzing this type of population heterogeneity has been the traditional domain of finite normal mix- ture models (Everitt & Hand, 1981; McLachlan & Peel, 2000; Titterington, Smith, & Makov, 1985). The more recent synthesis of finite normal mixture mod- eling with structural equation modeling represents a major increase in the analytic capabilities of both models (Arminger & Stein, 1997; Dolan & van der Maas, 1998; Jedidi et al., 1997a; Jedidi, Jagpal, & Editor’s Note. Scott Maxwell served as action editor for this article.—SGW Daniel J. Bauer, Department of Psychology, North Caro- lina State University; Patrick J. Curran, Department of Psy- chology, University of North Carolina at Chapel Hill. This work was funded in part by National Institute on Drug Abuse (NIDA) Fellowship DA06062, awarded to Daniel J. Bauer, and NIDA Grant DA13148, awarded to Patrick J. Curran. We thank Kenneth Bollen and the mem- bers of the Carolina Structural Equations Modeling Group for their valuable input throughout this project. Correspondence concerning this article should be ad- dressed to Daniel J. Bauer, Department of Psychology, North Carolina State University, Raleigh, NC 27695-7801. E-mail: [email protected] Psychological Methods Copyright 2004 by the American Psychological Association 2004, Vol. 9, No. 1, 3–29 1082-989X/04/$12.00 DOI: 10.1037/1082-989X.9.1.3 3

The Integration of Continuous and Discrete Latent Variable

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The Integration of Continuous and Discrete Latent Variable

The Integration of Continuous and DiscreteLatent Variable Models:

Potential Problems and Promising Opportunities

Daniel J. BauerNorth Carolina State University

Patrick J. CurranUniversity of North Carolina at Chapel Hill

Structural equation mixture modeling (SEMM) integrates continuous and discretelatent variable models. Drawing on prior research on the relationships betweencontinuous and discrete latent variable models, the authors identify 3 conditionsthat may lead to the estimation of spurious latent classes in SEMM: misspecifica-tion of the structural model, nonnormal continuous measures, and nonlinear rela-tionships among observed and/or latent variables. When the objective of a SEMManalysis is the identification of latent classes, these conditions should be consideredas alternative hypotheses and results should be interpreted cautiously. However,armed with greater knowledge about the estimation of SEMMs in practice, re-searchers can exploit the flexibility of the model to gain a fuller understanding ofthe phenomenon under study.

In recent years, many exciting developments havetaken place in structural equation modeling, but per-haps none more so than the development of structuralequation models that account for unobserved popula-tion heterogeneity, or mixtures of unobserved groups.Structural equation mixture models (SEMMs) are ide-ally suited for testing theories where qualitativelydistinct types of individuals are thought to be charac-terized by different structural relationships betweenvariables (and/or their means) and where group mem-bership cannot be observed directly. For instance,in the context of marketing research, Jedidi, Jagpal,and DeSarbo (1997a) noted that market segmentsmust be defined by the differential appeal of product

features, rather than by a priori classification. Simi-larly, Arminger and Stein (1997) noted the desirabil-ity of identifying unobserved heterogeneity in indi-vidual lifestyles in sociological research. Morerecently, drawing on the work of Nagin and Land(1993) and Verbeke and Lesaffre (1996), B. O. Mu-then (2001) demonstrated how SEMMs could be usedin psychological and educational research to captureheterogeneity in developmental pathways.

More formally, in SEMM, latent groups or classesare defined at the distributional level. Each group iscomposed of a collection of individuals who may dif-fer in their individual scores but are homogeneouswith respect to the distribution from which they weresampled. Specifically, their observed scores arethought to be drawn from a common multivariate nor-mal distribution, such that the same structural rela-tionships among variables hold for all individualswithin the group. Population heterogeneity is thenindicated by the presence of two or more latent groupsin the population characterized by different distribu-tions. Analyzing this type of population heterogeneityhas been the traditional domain of finite normal mix-ture models (Everitt & Hand, 1981; McLachlan &Peel, 2000; Titterington, Smith, & Makov, 1985). Themore recent synthesis of finite normal mixture mod-eling with structural equation modeling represents amajor increase in the analytic capabilities of bothmodels (Arminger & Stein, 1997; Dolan & van derMaas, 1998; Jedidi et al., 1997a; Jedidi, Jagpal, &

Editor’s Note. Scott Maxwell served as action editor forthis article.—SGW

Daniel J. Bauer, Department of Psychology, North Caro-lina State University; Patrick J. Curran, Department of Psy-chology, University of North Carolina at Chapel Hill.

This work was funded in part by National Institute onDrug Abuse (NIDA) Fellowship DA06062, awarded toDaniel J. Bauer, and NIDA Grant DA13148, awarded toPatrick J. Curran. We thank Kenneth Bollen and the mem-bers of the Carolina Structural Equations Modeling Groupfor their valuable input throughout this project.

Correspondence concerning this article should be ad-dressed to Daniel J. Bauer, Department of Psychology,North Carolina State University, Raleigh, NC 27695-7801.E-mail: [email protected]

Psychological Methods Copyright 2004 by the American Psychological Association2004, Vol. 9, No. 1, 3–29 1082-989X/04/$12.00 DOI: 10.1037/1082-989X.9.1.3

3

Page 2: The Integration of Continuous and Discrete Latent Variable

DeSarbo, 1997b). B. O. Muthen (2001) even sug-gested that SEMM represents a second-generation la-tent variable model because it integrates both continu-ous and discrete latent variable models. Indeed, wemay view SEMM as composed of two submodels, acontinuous latent variable submodel involving latentfactors and a discrete latent variable submodel com-posed of latent classes, both of which are estimatedsimultaneously. Although this new modeling architec-ture is exciting and highly promising, much work re-mains to be done to determine the specific strengthsand potential limitations of the model.

Our goals here are fourfold. First, we highlight thatSEMM represents the end result of more than a cen-tury of analytic developments on continuous and dis-crete latent variable models, each of which, in isola-tion, has been carefully evaluated. We trace thedevelopment of SEMM from these more traditionallatent variable models and discuss how SEMM mayserve as an integrative framework. Second, we arguethat although the assumptions of continuous and dis-crete latent variable models are relatively well under-stood when considered independently, the implica-tions of synthesizing the two types of models have notyet been fully investigated. We examine this issue byidentifying several analytical relationships that existbetween discrete and continuous latent variables thatmay portend difficulty for some SEMM analyses.Third, we formally extend these analytical develop-ments to SEMM to demonstrate the conditions underwhich the model may not optimally recover the popu-lation structure; specifically, we focus on the spuriousidentification of unobserved groups that do not actu-ally exist in the population. Here our goal is only todelineate general principles that may lead to the iden-tification of spurious latent classes without attemptingto thoroughly evaluate the scope or potential impactof these issues under specific sampling conditions.Finally, we use these same general principles to iden-tify several new opportunities for exploring complexempirical relationships with these models when thereis little explicit interest in identifying “true” sub-groups within the population.

Classical Continuous and Discrete LatentVariable Models

We begin with an overview of two “first-generation” latent variable models for continuous out-comes, the common factor model of Thurstone (1947)and the latent profile model of Gibson (1959). We

open with these models because they were developedexpressly as alternative approaches for latent variableanalysis. As such, much work has been conducted todemonstrate their analytical similarities and differ-ences. In addition, these models represent the initialsteps in a long line of research ultimately resulting inthe SEMM. We thus subsequently consider how theanalytical relationships revealed by these models gen-eralize to SEMM.

Factor Analysis

Since its inception, one of the interpretations of thecorrelation coefficient has been that it indicates twovariables share a common cause (e.g., Galton, 1888,p. 135; see Stigler, 1986). Factor analysis, originatingwith Spearman (1904) and greatly extended by Thur-stone (1935, 1947), was developed as a tool for iden-tifying these unobserved common causes or factors.Traditionally, the common factor model assumes thatall of the shared variance (covariance or correlation)among a set of observed variables reflects the pres-ence of continuously distributed latent factors with acommon influence on the variables.1 For example,suppose that a study of high school youth revealed apositive correlation between truancy and illicit druguse. From the perspective of the common factormodel, this correlation might be viewed as evidencefor a continuously distributed underlying antisocialbehavior dimension. After the presence of this under-lying common factor is accounted for, the residualrelationship between truancy and drug use would ap-proach zero. This illustrates an idea known as theaxiom of local independence, which holds that onceall of the common factors are estimated, the residualassociations among the observed variables should bezero, within sampling error.

The population model for the common factor modelmay be expressed formally as

∑∑ = ���� + �, (1)

where ∑∑ is a q × q covariance matrix of the observedvariables, � is a q × m factor loading matrix, � is anm × m covariance matrix of the common factors, and� is a q × q covariance matrix of the residuals net ofthe common factors. The assumption of local inde-

1 Though he is best known for developing a model formultiple continuous common factors, Thurstone (1935, pp.51–52; 1947, p. 343) also considered the possibility of di-chotomous latent factors.

BAUER AND CURRAN4

Page 3: The Integration of Continuous and Discrete Latent Variable

pendence is formalized by constraining � to be adiagonal matrix. With this assumption, the varianceson the diagonal of ∑∑ are partitioned into commonfactor variance and residual (unique) variance, but theoff-diagonal covariances in ∑∑ are reproduced exclu-sively by the relationship of the observed variables tothe common factors.

Two other assumptions are key to the common fac-tor model. The first is that the observed variables canbe expressed as a linear combination of the latentfactors and the residuals. Nonlinearities in these rela-tionships may force the estimation of difficulty or nui-sance factors that do not reflect the true dimension-ality of the factor space (Gibson, 1959; McDonald,1967; Wherry & Gaylord, 1944). Second is the as-sumption that the relationships among the measuresare sufficiently summarized by the second-order mo-ments (variances and covariances), which is strictlytrue only if the data are multivariate normally distrib-uted. Because the observed variables are linear com-binations of the latent factors and residuals, thiswould imply that the latent factors and residuals arealso multivariate normally distributed.2

Latent Profile Analysis

Latent profile analysis is based on a second inter-pretation of the correlation coefficient, namely, thatthe correlation reflects the presence of discrete groupsin the population, each characterized by differentmean levels on the observed variables. To understandhow this interpretation differs from the common fac-tor model, consider again the positive correlation be-tween truancy and illicit drug use. The common factormodel posited that this correlation reflects the pres-ence of an underlying antisocial behavior dimension.Alternatively, the same observed correlation could in-stead indicate the presence of two qualitatively dis-tinct types of adolescents, one characterized by highlevels of truancy and illicit drug use, and the othercharacterized by low levels of truancy and illicit druguse. When mixed together, the total population ofadolescents would exhibit the observed positive cor-relation between truancy and drug use.

Latent profile analysis was developed by Gibson(1959) as a means for identifying these latent groups,and was explicitly presented as an alternative to Thur-stone’s (1947) common factor model. Thus, ratherthan postulating that continuous latent factors explainthe observed associations, the latent profile modelholds that the associations are a by-product of differ-ences in the means of the continuous measures over

the latent groups or classes. Paralleling the assump-tion of local independence in the common factormodel, Gibson assumed that, conditional on classmembership, the residual association between the ob-served variables would be zero, again within samplingvariability. In this context, local independence is in-voked based on the philosophy that the mean vector(or centroid) of a latent class represents the true scoresfor all individuals of that type. Any deviation from themean vector should therefore be random and indepen-dent.

The fundamental equations of the latent profilemodel for the variances and covariances of the ob-served variables were expressed by Lazarsfeld andHenry (1968, chap. 8) in scalar form as

�ii2 = �

k=1

K

�k ��ik − �i�2 + �

k=1

K

�k�iik2 , (2)

and

�ij = �k=1

K

�k ��ik − �i� ��jk − �j�, (3)

where i and j (i � j) are index specific variables andk designates a specific latent class, so that �ik

repre-sents the mean and �2

iikrepresents the variance for

variable i in group k, K is the total number of latentclasses, �k indicates the proportion of cases belongingto each class (where ∑∑K

k=1 �k � 1), and the grandmean of each variable is calculated as a weightedaverage of the class means, or

�i = �k=1

K

�k�ik. (4)

Lazarsfeld and Henry (pp. 235–236) noted that Equa-tion 2 represents the familiar decomposition of total(or aggregated) variance into between-class andwithin-class components that is used in conventionalanalysis of variance models. The difference in thiscase is that group membership is not observed but isestimated instead.

2 The assumption of normality is made explicit in maxi-mum-likelihood estimation. Other methods of estimationmay permit the inclusion of ordinal indicators and/or relaxthe assumption of normality, but typically the latent factorsare still assumed to be continuously distributed and linearlyrelated, and the second-order moments are treated as suffi-cient summary statistics.

PROBLEMS AND OPPORTUNITIES 5

Page 4: The Integration of Continuous and Discrete Latent Variable

Estimation of the model relies in part on the as-sumption of local independence embodied in Equa-tion 3. Note that in contrast to Equation 2, there is nowithin-class component to Equation 3. This meansthat any association between measures i and j thatexists in the aggregate population must be accountedfor by the between-class component, or the classmean differences. By implication, i and j are assumedto be independent (i.e., orthogonal) within each classk. Of importance, unlike the common factor model,the latent profile model also traditionally uses thehigher order moments of the data, including the tripleproducts among three variables that are likewise as-sumed to be zero (implying higher order indepen-dence as opposed to simply pairwise independence;for further detail, see Gibson, 1959; Lazarsfeld &Henry, 1968).

To facilitate further comparison to the common fac-tor model, we can rewrite Equations 2 and 3 morecompactly in matrix form in terms of the class covari-ance matrices and mean vectors so that the aggregatedcovariance matrix is

∑ = �k=1

K

�l�k+1

K

�k�l�µk − µl� �µk − µl�� + �k=1

K

�k∑k,

(5)

where local independence dictates that ∑∑k is diagonalfor all k and µk is the centroid of variable means forclass k.3 Note that the two terms in Equation 5 directlyparallel those in Equation 1. Whereas the factor analy-sis model decomposes the covariance matrix intocommon factors and residual variances, the latent pro-file model decomposes the same covariance matrixinto mean differences between discrete latent classesand within-class residual variances.

A Comparison of the Two Models

There are important similarities between the latentprofile model and the common factor model, but alsokey differences. We discuss these relationships hereconceptually and illustrate the primary points withseveral demonstrative artificial data sets. These datasets are used for pedagogical purposes only and arenot intended to represent a comprehensive simulationstudy of the issues at hand.

One similarity between the latent profile model andthe common factor model is that both models takelocal independence to be axiomatic. Because of thisshared assumption, both models decompose the co-variance matrix into shared variance accounted for bythe latent variables and uncorrelated residuals. In fact,

under certain conditions, the two decompositions ofthe covariance matrix are analytically equivalent.Specifically, a covariance matrix generated to be con-sistent with an m-factor model can be perfectly repro-duced with a K � m + 1 class latent profile model andvice versa (Bartholomew, 1987; Gibson, 1959; Mc-Donald, 1967; Molenaar & von Eye, 1994). This facthas led some methodologists to take the stance thatthe two models should be regarded as mutuallycomplementary (e.g., B. O. Muthen, 2003; B. O. Mu-then & Muthen, 2000; but see Meehl, 1995). Fromthis perspective, the common factor model decom-poses the covariances to highlight relationshipsamong the variables, whereas the latent profile modeldecomposes the covariances to highlight relationshipsamong individuals. Because one never knows the truegenerating model, and because each model canequivalently reproduce the covariances, it could beargued that neither model is superior to the other (i.e.,Cudeck & Henly, 2003; B. Muthen, 2003), though wedo not necessarily take this view ourselves (Bauer &Curran, 2003b; see also Meehl, 1995).

This equivalency of the two models is representedgraphically in Figure 1. The ellipses presented in thefigure are 95% confidence ellipses based on the as-sumption of bivariate normality. The artificially gen-erated sample data consist of 100 cases (N � 100),and the sample Pearson product–moment correlationis 67. First consider Figure 1A, representing the com-mon factor model. Here the variables are related toone another in a continuous and linear fashion. Thepositive correlation between x and y, captured by theupward slope of the major axis of the ellipse, is as-sumed to reflect the presence of a common underlyinglatent factor. In comparison, Figure 1B is consistentwith a latent profile model (where bivariate normality

3 Equation 5 also represents the multivariate generaliza-tion of Meehl’s (1965, 1968; Meehl & Golden, 1982) gen-eral covariance mixture theorem. Of interest, Meehl’s co-herent cut kinetics method for discriminating latent taxa,like the latent profile model, makes the assumption of in-dependence within categories (i.e., that the covariances be-tween variables are zero within classes). Because the ana-lytic model of coherent cut kinetics appears to be identicalto the latent profile model, any empirical differences thatarise between the two techniques can probably be ascribedsolely to differences in the methods of estimation. Giventhis relationship between the two models, it is likely that theconclusions we make here regarding the latent profile modelcan be generalized directly to coherent cut kinetics as well.

BAUER AND CURRAN6

Page 5: The Integration of Continuous and Discrete Latent Variable

is assumed within components for convenience). Notethat in Figure 1B there is no tilt to the ellipses for thetwo latent classes, reflecting the assumption of localindependence (that x and y are uncorrelated withinclasses). The overall correlation between x and y inthe aggregate data is instead captured by the differentclass centroids, or mean vectors. Specifically, if onewere to draw a line between the class centroids inFigure 1B, it would parallel the major axis of the tiltedellipse in Figure 1A, illustrating that the two modelsequivalently reproduce the correlation between x and y.

Given this fundamental relationship between thetwo models, Molenaar and von Eye (1994) remarkedthat the choice of continuous versus discrete scalingfor the latent variables is essentially arbitrary, if weconfine our analysis to first and second moments (i.e.,means and covariances). The common factor model,because it assumes multivariate normality and linear-ity, only involves the analysis of these moments. Incontrast, as Lazarsfeld and Henry (1968, pp. 228–229) noted, the latent profile model uses the higherorder moments of the data and so does not assumemultivariate normality of the aggregate distribution(though normality within classes may be assumed;e.g., Arminger & Stein, 1997; Lazarsfeld & Henry,1968, pp. 235–239). For the same reason, unlike thecommon factor model, the latent profile model does

not require that the variables be linearly related (Gib-son, 1959).

Figures 2 and 3 highlight these differences betweenthe common factor and latent profile models. Figure 2presents the same contrast as in Figure 1, but here thedata were generated to be slightly skewed (N � 100;r � .75; skewx � .64; skewy � .82). Most of the datapoints are observed in the lower left quadrant, andboth the sparseness and the spread of the data increaseas we move to the upper right quadrant. The confi-dence ellipse in Figure 2A does not optimally repre-sent the characteristics of the observed data because itassumes bivariate normality. In contrast, the latentprofile model in Figure 2B provides a better summaryof the data. The dense data in the lower left quadrantare captured by one large group with small variancesfor x and y, and the disparate data in the upper rightquadrant are captured by a smaller group with largervariances for x and y. Note that both models equiva-lently recover the overall correlation between x and y,but the latent profile model also captures the nonnor-mality of the data.

Finally, Figure 3 demonstrates the performance ofthe two models when x and y are nonlinearly related(which also implies that bivariate normality will beviolated). The generated data show a strong positiverelationship (N � 100, r � .74), but y increases ex-

Figure 1. A and B are scatter plots containing 100 data points (N � 100) generated froma bivariate normal distribution. The 95% confidence ellipse in A shows how a common factormodel would reproduce the data. The 95% confidence ellipses in B show how a two-classlatent profile model would reproduce the data (the percentage of cases in each class isindicated).

PROBLEMS AND OPPORTUNITIES 7

Page 6: The Integration of Continuous and Discrete Latent Variable

Figure 3. A and B are scatter plots containing 100 data points (N � 100) generated to showa nonlinear relationship. The 95% confidence ellipse in A shows how a common factor modelwould reproduce the data. The 95% confidence ellipses in B show how a three class latentprofile model would reproduce the data (the percentage of cases in each class is indicated).

Figure 2. A and B are scatter plots containing 100 data points (N � 100) generated froma skewed bivariate distribution. The 95% confidence ellipse in A shows how a common factormodel would reproduce the data. The 95% confidence ellipses in B show how a two-classlatent profile model would reproduce the data (the percentage of cases in each class isindicated).

BAUER AND CURRAN8

Page 7: The Integration of Continuous and Discrete Latent Variable

ponentially with x—specifically, E(y|x) � −.70 +.42ex in the population. The confidence ellipse in Fig-ure 3A represents the common factor model, with itsassumptions of normality and linearity. This ellipsecaptures the overall positive relationship between thetwo variables but fails to account for the nonlinearform of this relationship. In contrast, the latent profilemodel depicted in Figure 3B, including three latentclasses, captures the nonlinear relationship between xand y well. This result is explained analytically byMcDonald (1967), who noted that a nonlinear factormodel approximated well by a polynomial of degreem can be equally represented by a latent profile modelwith K � m + 1 points (a result that is related to thefact that m + 1 points is the minimum number neededto define a polynomial of degree m), at least at thelevel of the second-order moments.

In summary, both the common factor model and thelatent profile model have the ability to account for themean and covariance structure of a set of continuousmeasures. However, the two models are not equiva-lent beyond the level of the first and second moments.Because the latent profile model uses informationabout the higher order moments, it can more flexiblycapture nonnormality and nonlinearity in the data. Ofimportance, these differences between the commonfactor model and latent profile model reflect a generalasymmetry between continuous and discrete latentvariables. This asymmetry has important implicationsfor models such as the SEMM that involve the simul-taneous estimation of both continuous and discretelatent variables. In the next section, we cast theSEMM as a hybrid model that may be viewed as anextension of more traditional continuous and discretelatent variable models (see also Arminger & Stein,1997; B. O. Muthen, 2002). The common basis ofSEMM with these models suggests that the principleswe highlighted here for the common factor model andthe latent profile model also hold in SEMM. We dem-onstrate this point with several examples illustratinghow the asymmetrical capabilities of discrete and con-tinuous latent variables may influence SEMM analy-ses.

Structural Equation Mixture Models

Relation to the Common Factor Model

The intellectual progression from the common fac-tor model to SEMM is relatively direct, and we thusprovide only a cursory review of this developmenthere. See Bollen (1989, pp. 4–9) and Kaplan (2000,

chap. 1) for further details on various steps in thisprogression. First, the common factor model may beviewed as the antecedent of confirmatory factoranalysis (CFA). CFA is predicated on the same no-tions as the common factor model and follows thesame basic form as Equation 1. Where CFA differsfrom the common factor model is in the imposition ofrestrictions on the parameters (typically the factorloadings). These restrictions are used to achievemodel identification, (Bollen, 1989, pp. 238–253) andmay allow the axiom of local independence to berelaxed.4 In turn, the CFA model may be viewed asthe antecedent of the structural equation model.Whereas in CFA the correlations among latent factorsare typically freely estimated (that is, unrestricted), instructural equation modeling these correlations maybe structured to reflect causal relations (Joreskog,1973; see also Bollen, 1989).

A key assumption of standard CFA and structuralequation modeling is that the population is homoge-neous, so that a single covariance matrix can be usedto summarize the relationships among the variables.To overcome this assumption, Joreskog (1971) ex-tended the model to multiple samples. These modelsallow for the simultaneous estimation of structuralequation models in two or more groups, each charac-terized by their own mean vector and covariance ma-trix. Multiple-groups structural equation modeling isparticularly useful for testing hypotheses about theinvariance of a model over samples (e.g., the factorstructure of a test for males and females). Althoughthe multiple-groups framework explicitly accounts forpopulation heterogeneity, its limitation is the require-ment that the groups be defined a priori. SEMM maythen be viewed as the logical next step, extending themultiple-groups model to the case in which groupmembership is not observed and must instead be in-ferred inductively from the data (Jedidi et al., 1997a;Yung, 1997).

Relation to the Latent Profile Model

Surprisingly, the latent profile model appears tohave had little direct influence on the development ofSEMM. It is instructive to better understand why this

4 For instance, it is common to intercorrelate the residualsof observed variables that share some common measure-ment property. From the perspective of the common factormodel, such a correlation would be viewed as evidence of acommon method factor.

PROBLEMS AND OPPORTUNITIES 9

Page 8: The Integration of Continuous and Discrete Latent Variable

was the case. As with the common factor model, thelatent profile model was predicated on the axiom oflocal independence. The two models were thus mutu-ally contradictory: Either the observed correlationsreflect the presence of continuous underlying factors,or they reflect the presence of unobserved groups, butnot both. The mutual reliance of the two models onthis axiom precluded their integration.

The development of SEMM may be traced moredirectly to a different kind of discrete latent variablemodel, namely, finite normal mixture models. Finitenormal mixture models assume that the populationconsists of a mixture of unobserved groups, eachcharacterized by its own normal distribution for thecontinuous measures. The univariate normal mixturemodel was considered by Pearson (1894), with prac-tical applications to multivariate settings awaiting thecomputer revolution (Wolfe, 1970, 1971). The prob-ability density function (PDF) of the multivariate nor-mal mixture model is

f�x� = �k=1

K

�k�k�x; µk, ∑k�, (6)

where x � (x1, x2, � � �, xq) is a vector of q continuousrandom variables, �k are again the class proportions(where ∑∑K

k�1 �k � 1), and the variables within eachclass follow a multivariate normal PDF (denoted �k)with mean vector µk and covariance matrix ∑∑k. Forfurther detail on this model (as well as mixtures ofother distributions), see Everitt and Hand (1981), Tit-terington et al. (1985), or McLachlan and Peel (2000).

There are two key differences between finite nor-mal mixture models and the latent profile model.First, in the finite normal mixture model, the paramet-ric assumption of normality within groups is explicitand is relied upon to estimate the model parameters.Second, given this, the axiom of local independenceneed not be invoked for the model to be estimable. Assuch, the off-diagonal elements of the within-classcovariance matrices ∑∑k may be freely estimated (as-suming the model is identified). Despite these differ-ences, the basic formulas for calculating the means µand covariances ∑∑ of the aggregate distribution f(x)remain identical to Equations 4 and 5 presented earlierfor the latent profile model. By implication, finitenormal mixture models have many of the same basicproperties as latent profile analysis (i.e., the latentclasses can serve many of the same functions).

These differences between the latent profile modeland finite normal mixture model are deceptively

simple, for they actually represent a radical differencein the underlying conceptual model. In the latent pro-file model, the assumption of local independence wasimposed not simply for analytical convenience but onthe philosophy that the latent class variable shouldexplain the associations in the data. That is, control-ling for the mean differences between the groups,there should be no residual association between thevariables. This notion was based on the conceptualmodel that all individuals within a class should becharacterized by identical true scores as representedby the class centroid. Apart from measurement error,all individuals within a class would then be essentiallyinterchangeable and thus truly homogeneous. Re-sidual variability within a class, reflecting only ran-dom measurement error, would be uncorrelated bydefinition.

In contrast, the finite normal mixture model in-volves a much different conceptual model. Specifi-cally, homogeneity is no longer defined in terms of acommon set of true scores (i.e., interchangeability).Instead, homogeneity is defined at the distributionallevel, by a common mean vector and covariance ma-trix. From this perspective, the latent class variable isnot an explanatory variable (as it was in the latentprofile model) but is instead a moderator variable (Je-didi et al., 1997a, 1997b). That is, the magnitude anddirection of the relationships between variables mayvary as a function of class. In terms of our earlierexample, truancy and drug use may be positively cor-related in one group of adolescents, orthogonal in an-other, and negatively correlated in a third (e.g., forwhom the school provides a context for drug use);thus, the relation between truancy and drug use de-pends on class membership, but it is not explained byclass membership.

It is this latter definition of homogeneity that per-mits the integration of finite normal mixture modelswith contemporary continuous latent variable models.Specifically, given that each latent class is defined byits own mean vector and covariance matrix, a logicalnext step is to structure these moments to be consis-tent with a continuous latent variable model. This stepwas first taken by Blåfield (1980) and Yung (1997),who demonstrated that a mixture of factor analysismodels could be estimated by specifying that eachwithin-class covariance matrix ∑∑k is structured ac-cording to Equation 1. By doing so, researchers canestimate latent classes that may differ in factor struc-ture, factor means, variances or covariances, factorloadings, and/or residual variances (subject to identi-

BAUER AND CURRAN10

Page 9: The Integration of Continuous and Discrete Latent Variable

fiability constraints). Jedidi et al. (1997a, 1997b),Arminger and Stein (1997), and Dolan and van derMaas (1998) each independently generalized this ap-proach to mixtures of structural equation models withlatent variables. The basic form for SEMM is thesame as Equation 6, with the only difference beingthat the class PDFs are rewritten,

�k [x; µk(�k), ∑∑k(�k)], (7)

to indicate that both the class mean vector µk and theclass covariance matrix ∑∑k are modeled as a functionof a class-specific vector of model parameters �k.

5

The analytic goal is thus to fit a structural model to thewithin-class means and covariances. This basic modelwas recently extended by B. Muthen and Shedden(1999) to include categorical outcomes and covariatesof latent class membership (for greater detail, seeL. K. Muthen & Muthen, 1998).

SEMM as an Integrative Framework

Implicit in our preceding discussion is that SEMMsrepresent a synthesis and generalization of many otherlatent variable models. This is highlighted by consid-ering the specific conditions under which SEMM re-duces to the other models we have discussed. Forinstance, if K � 1, then SEMM reduces to a conven-tional single-group CFA or structural equation model.Alternatively, if K > 1, and both µk and ∑∑k are fullyunconstrained, the model becomes a standard multi-variate finite normal mixture model (Jedidi et al.,1997a, 1997b). Finally, if K > 1, µk is unconstrained,and ∑∑k is constrained to be a diagonal matrix, themodel reduces to the latent profile model (with a for-mal assumption of normality within classes; Arminger& Stein, 1997). Thus, both conceptualizations of ho-mogeneity discussed above (interchangeability anddistributional homogeneity) can be accommodatedwithin the SEMM. The relation of SEMM to tradi-tional multiple-groups CFA or structural equationmodeling is also straightforward. Specifically, in theestimation process, probabilities of group member-ship (posterior probabilities) are estimated for eachcase and are used to weight each case’s contributionto the estimation of the model parameters for eachclass. If the grouping variable was observed, theseprobabilities could be replaced by 1s (indicating in-clusion in the group) or 0s (indicating exclusion), andthe model would reduce to the standard multiple-groups structural equation model (Jedidi et al., 1997b;L. K. Muthen & Muthen, 1998, Appendix 9; Zhu &

Lee, 2001). SEMM may thus be thought of as mul-tiple-groups structural equation modeling wheregroup membership is an unobserved or latent variable.

To summarize thus far, we note that the develop-ment of SEMM may be viewed as the end result ofconverging lines of research on continuous and dis-crete latent variable models that began more than acentury ago. However, little research has yet beenconducted to examine the implications of combiningthe two models. It is to this issue that we now turn.

Combining Continuous and DiscreteLatent Variables

As we noted in our discussion of the common fac-tor model and the latent profile model, the relation-ship between continuous and discrete latent variablesis both complex and asymmetrical. Whereas continu-ous and discrete latent variables are equivalently ca-pable of reproducing the first two moments of the data(means and covariances), discrete latent variables canalso accommodate nonnormal data and nonlinear re-lationships in a way that continuous latent variablestypically cannot. We now show how these same prop-erties of continuous and discrete latent variables sur-face in important ways in the SEMM. Paralleling ourdiscussion of the common factor model and the latentprofile model, we focus on three issues: model speci-fication, distributional assumptions, and the assump-tion of linearity. We show that, apart from the pres-ence of true latent groups, misspecification of any ofthese aspects of the model may lead to the estimationof multiple latent classes.

Of importance, although we both analytically andempirically demonstrate that spurious latent classescan be extracted for any one of these reasons, wemake no conclusions about the extent to which theseproblems will arise in applied research. To address the

5 For simplicity, we assume all variables in the SEMMare endogenous. Arminger and Stein (1997) and Arminger,Stein, and Wittenberg (1999) have noted that models withcategorical or nonnormal exogenous predictors (e.g., gen-der, marital status) should be based on conditional finitenormal mixtures. Otherwise, the distribution of the exog-enous variables will influence latent class estimation andpotentially bias within-class parameter estimates. The con-ditional finite normal mixture differs in important waysfrom the unconditional model we describe here, but we donot dwell on the distinction here because our points gener-alize directly to the conditional case.

PROBLEMS AND OPPORTUNITIES 11

Page 10: The Integration of Continuous and Discrete Latent Variable

latter issue would require comprehensive simulationstudies beyond the scope of the current article to de-termine more precisely the conditions under whichspurious latent classes will be estimated when theseassumption violations occur, either alone or, as seemsmore likely, in combination.

In the examples that follow, we approach the prob-lem of identifying the optimal number of latentclasses from a largely exploratory vantage point. Usedin an exploratory way, the analysis begins with thehypothesis that multiple groups may exist and focuseson identification of the optimal number of classesbased on the comparative fit of models with succes-sively more latent classes. Here we are concernedwith whether the number decided upon includes spu-rious latent classes. A more confirmatory approach tothe problem would involve a strong theoretical pre-diction that the population consists of a specific num-ber of groups. It may be argued that in this confirma-tory mode, our theory would in some way protect usfrom accepting a model with too many (or too few)latent classes. However, even in this case, the theorymust be supported by showing that the model with thepredicted number of classes fits optimally when com-pared with models with fewer or more latent classes.We could, of course, elect to estimate the predictednumber of classes perforce, but this would reverse thetypical practice of inferential statistics. That is, wewould be using theory to dictate the optimal statisticalmodel for the data rather than using the statisticalmodel to test the theory against the data (Bauer &Curran, 2003a, 2003b). For this reason, we regard theissues we raise as being equally important for explor-atory and confirmatory SEMM analyses.

Of course, this raises the question, How does onedefine the optimally fitting model? This matter iscomplicated by the fact that regularity conditions donot hold for conducting standard likelihood ratio testsbetween models with different numbers of latentclasses (see McLachlan & Peel, 2000, for a review). Amodified version of the likelihood ratio test, based ona sum of chi-square distributions, was recently pro-posed by Lo, Mendall, and Rubin (2001) and has beenadvocated by B. Muthen (2003), though we cautionthat the performance of this test has yet to be inves-tigated in complex models of the kind considered here(Bauer & Curran, 2003b).

In practice, it is still common to determine the op-timal number of latent classes through the comparisonof information criteria. Commonly used measures offit include Akaike’s information criterion (AIC),

Bayes’s information criterion (BIC), the consistentAIC (CAIC), the classification likelihood criterion(CLC), the normalized entropy criterion (NEC), andthe integrated completed likelihood criterion with BICapproximation (ICL-BIC). These criteria all balancethe improvement in fit associated with adding classesto the model against specific penalty factors. Whereasthe AIC, BIC, and CAIC all penalize for the numberof parameters in the model (overparameterization),the CLC and NEC instead involve penalties based onthe model entropy, which increases as the degree ofseparation between the latent classes decreases (i.e.,when individual probabilities of class membershipstray far from zero or one). The ICL-BIC implementsboth types of penalty and is the most conservative ofthe criteria presented here, generally favoring fewerclasses. For each index, the model with the minimumvalue is considered optimal. More detail on thesemeasures can be obtained from McLachlan and Peel(2000) or Bauer and Curran (2003a).

These measures too must be used with some cau-tion. Although the behavior of these indices has beenwell studied for unrestricted finite normal mixtures, ithas not yet been investigated thoroughly in an SEMMcontext. This is especially true of fit indices that de-pend on the model entropy, such as the CLC, NEC,and ICL-BIC, because they tend to favor models inwhich classes are distinguished by well-separatedmean vectors, as it is typically in these cases that onecould assign cases to classes with the most precision(Biernacki, Celeux, & Govaert, 1999, 2000; Celeux &Soromenho, 1996; Ramaswamy, DeSarbo, Reibstein,& Robinson, 1993). In SEMM analyses, however, at-tention is often focused instead on the different co-variance matrices of the classes (i.e., how the latentclasses moderate the relationships among the ob-served variables). For this reason, Jedidi et al. (1997a,1997b) recommended determining the number ofclasses based on the BIC and CAIC, with entropycriteria used only subsequently to judge the degree ofseparation between classes.

In addition to these indices, following Jedidi et al.(1997a), we also present the standardized root-mean-square residual (SRMR) for each model. In theSEMM context, the SRMR is calculated by first com-puting the model-implied aggregate covariance matrixaccording to Equation 10. The usual formula for theSRMR is then applied, given as

SRMR =� 2

q�q + 1� �i=1

q

�j=1

i � �sij − �ij�

�sii �sjj�2

, (8)

BAUER AND CURRAN12

Page 11: The Integration of Continuous and Discrete Latent Variable

where sij are elements of the observed aggregate co-variance matrix, �ij are elements of the model-impliedaggregate covariance matrix, and q is the total numberof observed variables. Hu and Bentler (1999) sug-gested that an SRMR of .08 or below represents goodfit in traditional structural equation models. Becausethe covariances are not sufficient statistics for normalmixtures, Jedidi et al. (1997a) argued that the SRMRshould not be used to determine the number of latentclasses in an SEMM. We agree. However, in juxta-position with the other fit statistics, the SRMR doesallow us to see whether the improvement in fit be-tween successive models takes place primarily at thelevel of the covariance matrix (in which case theSRMR should decrease) as opposed to the higher or-der moments (in which case the SRMR should bestable).

We now consider in turn how model misspecifica-tions, violations of distributional assumptions, or un-modeled nonlinearity can all lead to the situation inwhich the optimally fitting model includes spuriouslatent classes.

Model Specification

Consider again the data presented in Figure 1. Sup-pose we know that these data were generated to beconsistent with a bivariate normal distribution for asingle group. We may then view the ellipse in Figure1A as symbolic of a properly specified model. In thiscase, the overall association between the two vari-ables is captured using just one ellipse. However,what if we misspecified the model—for instance, byimposing the constraint that x and y are uncorrelated?Then we could not account for the association be-tween x and y using just one ellipse; we would requiretwo, as in Figure 1B. In other words, we can com-pensate for the lack of fit of the within-class model(which incorrectly imposes orthogonality) by alsomisspecifying the between-class model (estimatingmore classes than groups in the population). In a veryreal sense, the spurious latent classes “absorb” themisspecification of the model.

This relationship between the two parts of themodel can be expressed more formally by consideringthe implied means and covariances of the aggregatedistribution:

µ��,�� = �k=1

K

�kµk��k�, (9)

∑∑��,�� = �k=1

K

�l = k+1

K

�k�l��µk�k� − µl��l���µk��k� − µl(�l���

+ �k=1

K

�k∑∑k��k�, (10)

where � is the vector of class proportions, � is avector of model parameters from all latent classes,and all other notation remains as before. Note thatthese equations follow directly from Equations 4 and5 given for the latent profile model, demonstrating ageneral property of discrete latent variable models forcontinuous data. Of importance, as before, the aggre-gate covariance matrix is partitioned into two compo-nents. The first term in Equation 10 is the between-class component, reflecting the mean differencesbetween the classes, whereas the second term reflectsthe within-class component, or the covariance struc-ture within each class.

There is a fundamental interdependency betweenthe two components of Equation 10. Recall that bothdiscrete latent classes and continuous latent factorshold the potential to reproduce the aggregate covari-ances. This interdependency becomes problematic ifthe structural model applied within the latent classesis misspecified. In this case, the within-class compo-nent will fail to fully reproduce the covariances, andthe model will not fit well. One possible way to im-prove the fit of the model would be to modify thewithin-class portion, for instance, by adding newpaths. However, we could also compensate for thepoor fit of the within-class model by estimating ad-ditional latent classes in the between-class compo-nent.6

We provide an example of this point with a singleartificial data set generated from the linear latentcurve model presented in Figure 4. The repeated mea-sures span five equally spaced time points, they aremultivariate normally distributed in the population,and 50% of their variance is accounted for by theunderlying growth process. Note that the fixed valuesof the factor loadings imply linear change over time.The means of the latent factors indicate that the av-erage trajectory is increasing over time. Both the in-tercept and slope parameters have significant vari-ances and are positively correlated. A single sampleof 600 cases was generated from the model, and the

6 Additional latent classes may also be required when themean structure is misspecified, though we do not explorethat possibility here.

PROBLEMS AND OPPORTUNITIES 13

Page 12: The Integration of Continuous and Discrete Latent Variable

resulting sample statistics are presented in Table 1.Fitting a properly specified single group latent curvemodel to the data yielded good fit, �2(10) � 4.67,p � .91; root-mean-square error of the approximation(RMSEA) � .00.7 We then extended this model totwo classes, and as shown in Table 2, the addition ofa second class was rejected by all measures of fitexcept the AIC. Thus, as one would expect, underproper specification of the within-class structuralmodel, misspecification of the number of latentclasses was not supported by the data.

We next fit a misspecified model to this samesample, allowing between one and eight latent classes.The misspecification was to restrict the covariance

matrix of the latent growth factors to a null matrix,that is, to permit no individual variability in eitherintercepts or slopes within classes. This specificationcorresponds to one popular variant of the SEMM,referred to as latent class growth analysis (LCGA) byB. O. Muthen (2001), and inspired by the semi-parametric groups-based trajectory model of Naginand colleagues (Nagin, 1999). By imposing these re-strictions to the factor covariance matrix, the LCGAmodel implies that all of the individual variability ingrowth is captured by the class mean trajectories (i.e.,fixed effects), and any individual deviations from theclass mean trajectories are random error. This as-sumption is commensurate with the latent profilemodel conceptualization of homogeneity as inter-changeability: Individuals of a given class share asingle trajectory of change over time, and any indi-vidual deviations from that trajectory are construed asrandom error. In fact, the LCGA can be viewed as arestricted latent profile model for repeated measuresthat constrains the within-class means to follow a spe-cific time trend (e.g., linear).

Given the characteristics of the population gener-ating model, the single class LCGA model containsthe correct number of groups but a misspecified struc-tural model that does not allow for systematic indi-vidual variability around the mean trajectory. LCGAmodels with more classes are then doubly misspeci-fied, as they are estimated with both the incorrectnumber of groups and a misspecified within-classstructural model. Intuitively, one might expect thatthis additional misspecification would lead to a dec-rement in model fit. In reality, as Equation 10 dem-onstrates, estimation of additional latent classes (i.e.,the between-class component) predictably allows usto better recover the aggregate covariance matrix ofthe observed data in the presence of a misspecifiedstructural model. We should thus expect the overallmodel fit to improve as we estimate more latent tra-jectory classes even though we are increasingly mis-specifying the number of classes in the model.

The results of fitting the LCGA models are pro-vided in Table 2. Large and significant improvementsin the log-likelihood were obtained with the additionsof a second and third latent trajectory class, but theimprovements thereafter were nonsignificant and in-creasingly negligible. It is also worth noting that with

7 Mplus (Version 2.12) was used to estimate this modeland all models reported hereafter (L. K. Muthen & Muthen,1998).

Table 1Intercorrelations, Means, Standard Deviations, Skew, andKurtosis of Data Generated From the Linear LatentCurve Model (N = 600)

Variable 1 2 3 4 5

1. x0 —2. x1 .477 —3. x2 .403 .507 —4. x3 .368 .504 .533 —5. x4 .291 .440 .493 .510 —

M 1.031 1.851 2.618 3.438 4.235SD 1.362 1.671 2.226 2.621 3.139Skew −.180 −.055 −.152 .078 .094Kurtosis .207 −.086 −.230 −.111 .106

Figure 4. Path diagram displaying the population latentcurve model used to generate data for the demonstration thatmodel misspecification may induce the estimation of latentclasses. � designates the factor mean and designates thefactor variance.

BAUER AND CURRAN14

Page 13: The Integration of Continuous and Discrete Latent Variable

five or more latent classes the log-likelihood actuallydipped below the value of the correctly specified one-group latent curve model, suggesting that the latentclass model was overfitting the data (capturing ran-dom sampling fluctuations). The fit criteria varied interms of their indication of the optimal model, rangingfrom seven classes (the AIC) to two classes (the ICL-BIC). This lack of consensus among fit criteria is notuncommon in empirical applications and simply re-flects differences in the penalty factors applied.

Our immediate concern, however, is not to arrive ata definitive conclusion regarding the optimal numberof latent classes but to show how our ability to repro-duce the aggregate correlation matrix improves asnew classes are added to the model (even though suchclasses do not exist in the population). This is mostdirectly measured by the SRMR. The SRMR of theone-class model indicates that, on average, we areeither under- or overestimating the aggregate correla-tions in Table 1 by about .375. The reason is that theone-class model implies that the off-diagonal ele-ments of the aggregate correlation matrix are zero,when in reality (i.e., within the population) they arenot. The additional latent classes present in the two- toeight-class models allow these off-diagonal elementsto take on nonzero values, where these values areimplied by the between-class mean differences asspecified in Equation 10. Predictably, we see that the

precision with which we recover the aggregate corre-lations improves rapidly as we add latent classes, andwe attain remarkably good fit with three or fourclasses (SRMR3 � .06; SRMR4 � .05).

These results provide a demonstration of the keydependency between the within-class and between-class components of an SEMM that we identified ana-lytically in Equation 10. In our example, a properlyspecified single-group latent curve model providedgood fit to the data (as would be expected given itscorrespondence to the population model). However,when the variance–covariance matrix of the latentgrowth factors was constrained to be zero, multipletrajectory classes were needed to attain comparablefit. This difference is evident in Figure 5. Figure 5Apresents the 95% confidence bands for the individualtrajectories implied by the correctly specified latentcurve model. For comparison, Figure 5B presents theresults of the four-class LCGA. Because individualvariability around the mean trajectories is not permit-ted in LCGA, multiple classes are required to coarselyapproximate the essential features of the correctlyspecified model in Figure 5A.

It is worth noting that a different number of latentclasses might have appeared optimal had we imposeda different model misspecification. For instance, inthe context of LCGA, White, Johnson, and Buyske(2000) suggested placing an autoregressive structure

Table 2Fit Statistics of Correctly Specified One- and Two-Class Latent Curve Models and Latent Class Growth Analyses WithOne Through Eight Classes

Model LL SRMR AIC BIC CAIC CLC NEC ICL-BIC

Latent curve model (individual trajectories distributed around mean trajectory)

One class −6,050.73 .027 12,121.46 12,165.43 12,175.43 12,101.46 1.00 12,165.43Two classes −6,038.69 .026 12,119.38 12,211.72 12,232.72 12,383.52 12.72 12,517.86

Latent class growth model (no individual trajectories—fixed class mean trajectories only)

One class −6,492.43 .374 12,998.85 13,029.63 13,036.62 12,984.84 1.00 13,029.63Two classes −6,175.36* .110 12,380.72 12,446.67 12,461.67 12,570.83 0.35 12,666.78Three classes −6,089.23* .060 12,224.47 12,325.60 12,348.59 12,544.36 0.45 12,691.49Four classes −6,059.81 .050 12,181.62 12,317.93 12,348.92 12,553.69 0.50 12,751.99Five classes −6,043.50 .035 12,165.00 12,336.48 12,375.48 12,659.14 0.64 12,908.61Six classes −6,030.25 .033 12,154.50 12,361.15 12,408.16 12,755.87 0.75 13,056.53Seven classes −6,014.40 .029 12,138.79 12,380.63 12,435.63 12,738.86 0.74 13,090.69Eight classes −6,009.77 .028 12,145.54 12,422.54 12,485.55 12,769.38 0.78 13,172.38

Note. For AIC, BIC, CAIC, CLC, NEC, and ICL-BIC, the minimum value obtained from the series of models is in boldface to indicate themodel with optimal fit. LL � log-likelihood; SRMR � standardized root-mean-square residual; AIC � Akaike’s information criterion; BIC� Bayes’s information criterion; CAIC � consistent AIC; CLC � classification likelihood criterion; NEC � normalized entropy criterion;ICL-BIC � integrated completed likelihood criterion using BIC approximation. Asterisks attached to LL values indicate a significantimprovement in log-likelihood over the preceding model as judged by the test of Lo, Mendell, and Rubin (2001).

PROBLEMS AND OPPORTUNITIES 15

Page 14: The Integration of Continuous and Discrete Latent Variable

on the covariance matrix of the residuals withinclasses. Because this model would allow for nonzerooff-diagonal covariances within classes, it wouldlikely require fewer latent classes to adequately re-cover the aggregate covariances of a latent curvemodel. More generally, our first observation is thatthe number of latent classes needed to optimally fit agiven set of data will depend on the degree of mis-specification of the within-class structural model.

Distributional AssumptionsAs we saw in Figure 2, unlike most standard con-

tinuous latent variable models, discrete latent variablemodels are capable of capturing nonnormality in theobserved variables. This point is widely recognized inthe general literature on finite normal mixture models.For instance, one key use of finite normal mixtures isto approximate nonnormal distributions of an unspeci-

fied form (e.g., Escobar & West, 1995; Ferguson,1983; Land & Nagin, 1996; Nagin, 1999; Roeder &Wasserman, 1997; for a review, see Everitt & Hand,1981, pp. 118–124; Titterington et al., 1985, pp. 18–34). By extension, when the normality assumption isviolated in a standard one-group structural equationmodel, elaboration of the model into an SEMM withmultiple latent classes may improve the ability of themodel to reproduce the observed data distribution. Inthis case, even if the structural equation model is notstructurally misspecified, the additional latent classeswill improve the fit of the model because the summa-tion of the normally distributed class distributions willprovide a better approximation to the nonnormality ofthe aggregate distribution. Bauer and Curran (2003a)extensively explored this point both analytically andempirically in the context of growth mixture model-ing. We thus illustrate the role of nonnormality here(in the broader context of SEMM) with a single em-pirical example.

Our example is a CFA model of two correlatedfactors, each with three indicators. The populationmodel and associated parameter values are presentedin Figure 6. The indicators (x1–x6) were assigned unitloadings and zero intercepts, and 60% of their vari-ance was explained by the associated latent factor.The residuals were independently and normally dis-tributed with variances of .67, and the latent factorswere given zero means, variances of one, and a cor-relation of .71 (2 � .50). All of these parametervalues were selected to be reflective of a model thatmight commonly be encountered in applied research.Of importance, the population distributions of the la-tent variables were modestly nonnormal, with univari-ate skew � 2 and kurtosis � 8. As shown in Figure7, these distributions were marked by a high leftwardpeak and long right tail. A total of 400 cases weregenerated from the model. Data were first generatedfor the latent variables (including the residuals), andthese values were then used to construct the model-implied observed scores (the technique of Vale &Maurelli, 1983, was used to generate values for thenonnormal latent factors). The sample skew and kur-tosis for the first latent factor, �1, were 1.63 and 4.69,respectively, whereas the corresponding values for thesecond latent factor, �2, were 2.02 and 6.42. Thesample statistics for the observed variables are pro-vided in Table 3. Note that the sample skew and kur-tosis of the observed measures are well within therange commonly encountered in applied psychologi-cal research (Micceri, 1989).

Figure 5. Comparison of single-group latent trajectorymodel (A; shaded region is within the implied nonsimulta-neous 95% confidence bands of the individual trajectories)and the four-class latent trajectory class analysis (B; thepercentage of cases in each class is indicated).

BAUER AND CURRAN16

Page 15: The Integration of Continuous and Discrete Latent Variable

We began our analysis by fitting the populationgenerating model to the data. The factors were scaledby setting the factor loadings for x1 and x3 to one andtheir intercepts to zero. All other factor loadings andintercepts were estimated, as were the factor means.Even with the nonnormality of the data, the overall fitof the model was quite good, �2(8) � 3.70, p � .88;RMSEA � .00. This is not surprising, as the degree

of nonnormality in the observed variables was mod-est, and at the level of the first and second moments,the fitted model was correctly specified.

We then proceeded to fit SEMMs with two andthree latent classes (with the knowledge that the popu-lation consists of only one group). The within-classstructural models were of the same form as the popu-lation model (i.e., correctly specified), but the param-

Figure 7. Frequency histogram displaying the distribution of the latent factors (�) in theconfirmatory factor model with nonnormal latent factors (N � 10,000).

Figure 6. Path diagram displaying the population confirmatory factor model used to gen-erate data for the demonstration that nonnormality may induce the estimation of latent classes.In the estimated models the loadings with asterisks were fixed to scale the latent factors. �designates the factor mean and designates the factor variance.

PROBLEMS AND OPPORTUNITIES 17

Page 16: The Integration of Continuous and Discrete Latent Variable

eter values freely varied over classes. Thus, the latentclass variable served to moderate the levels and rela-tionships of the observed variables, consistent withthe view that a homogeneous latent group is definedby its own distinct distribution. The change in log-likelihood between the one- and two-class models waslarge and significant; however, the improvement inlog-likelihood between two and three classes was not.As Table 4 shows, the AIC, BIC, CAIC, CLC, andNEC all also supported the two-class model over theone-class model, though the more conservative ICL-BIC did not. The CLC and NEC, which do not pe-nalize for overparameterization, both indicated thethree-class model to be optimal, though only slightlybetter than the one- or two-class alternative models.Further inspection of the three-class model revealedan improper (negative) residual variance estimate. Onthe basis of the sum of these results, we concludedthat the two-class model was optimal for the data.8

Given the proper specification of the within-classmodel, the improvement in fit associated with esti-mating two classes was due almost entirely to the

ability of the model to approximate the nonnormalityof the data rather than to improved recovery of thecovariance structure, and this was reflected in the sta-bility of the SRMR.

Figure 8 further clarifies how the presence of thetwo latent classes allows the model to capture thenonnormality of the data. Figure 8, A and B, plots thedistribution of �1 that is implied by the standard CFAmodel and the two-class model, respectively, and iscomparable to the histogram presented earlier in Fig-ure 7. Figure 8A shows that the usual assumption ofnormality provides a poor representation of the actualdistribution of the latent variable. In contrast, in Fig-ure 8B, the weighted component distributions of thetwo latent classes, each with a different mean andvariance, combine to imply a nonnormal aggregatedistribution with a much greater resemblance to thehistogram in Figure 7. In fact, the model-impliedskew and kurtosis for the first factor were 1.13 and2.02, respectively, whereas for the second factor, they

8 It is common to estimate mixtures of CFA models con-straining the factor loading matrices to be invariant overclasses so that the latent variables will have the same mean-ing in each class (e.g., Arminger & Stein, 1997; Armingeret al., 1999; Jedidi et al., 1997a, 1997b; Yung, 1997). In-variance constraints are sometimes also placed on the itemintercepts and residual variances. Both sets of constraintswere found tenable for the present model. The fit indiceswe examined changed predictably with the equality con-straints in place. Those indices that balance fit with parsi-mony (e.g., AIC, BIC, CAIC) began to favor the three-classmodel, as far fewer parameters were required per class. Incontrast, those indices that balance fit with classificationquality (e.g., NEC, CLC) began to favor the one-classmodel, as the equality constraints served to make the classesless distinctive from one another. These results point to theinherent complexities associated with judging the fit ofSEMMs.

Table 3Intercorrelations, Means, Standard Deviations, Skew, andKurtosis of Data Generated From the NonnormalCorrelated Factors Model (N = 400)

Variable 1 2 3 4 5 6

1. x1 —2. x2 .589 —3. x3 .604 .586 —4. x4 .439 .469 .454 —5. x5 .423 .450 .438 .641 —6. x6 .443 .468 .443 .642 .688 —

M .048 −.037 .059 −.060 .068 .012SD 1.339 1.284 1.241 1.329 1.328 1.344Skew 0.635 0.973 0.816 1.042 1.057 1.159Kurtosis 0.902 2.799 1.690 2.281 3.453 3.085

Table 4Fit Statistics of a Properly Specified Confirmatory Factor Model With One to Three Classes Applied to NonnormallyDistributed Data Generated From a Homogeneous Population

Model LL SRMR AIC BIC CAIC CLC NEC ICL-BIC

One class −3,498.54 .013 7,035.08 7,110.92 7,129.92 6,997.08 1.00 7,110.92Two classes −3,405.12* .012 6,888.24 7,043.90 7,082.90 6,993.23 0.98 7,226.89Three classes −3,388.11 .013 6,894.21 7,129.71 7,188.71 6,991.54 0.97 7,345.04

Note. For AIC, BIC, CAIC, CLC, NEC, and ICL-BIC, the minimum value obtained from the series of models is in boldface to indicate themodel with optimal fit. LL � log-likelihood; SRMR � standardized root-mean-square residual; AIC � Akaike’s information criterion; BIC� Bayes’s information criterion; CAIC � consistent AIC; CLC � classification likelihood criterion; NEC � normalized entropy criterion;ICL-BIC � integrated completed likelihood criterion using BIC approximation. Asterisks attached to LL values indicate a significantimprovement in log-likelihood over the preceding model as judged by the test of Lo, Mendell, and Rubin (2001).

BAUER AND CURRAN18

Page 17: The Integration of Continuous and Discrete Latent Variable

were 1.81 and 3.91. Although the implied skew andkurtosis values of the two-class model underestimatethe actual sample values (which in practice would notbe known), they are clearly closer to the sample val-ues than the values of zero assumed by the single-class model. In turn, the presence of the two classesbetter approximates the nonnormality of the observedvariable distributions, and this leads to the superior fitof the two-class model. We thus see how latentclasses can serve to capture nonnormality in the ob-served data, though only one group actually exists inthe population.

This single artificial data example provides asimple illustration of the effect of nonnormality onlatent class estimation, showing how the presence oflatent classes can function to approximate a nonnor-mal multivariate distribution (i.e., capture the higherorder moments) and thereby improve model fit, evenwhen only one group truly exists in the population. Ifmodel fit is used as a guide for inferring the correct

number of classes, this may lead to the identificationof classes that do not correspond to true groups in thepopulation. This leads us to our second observation:Nonnormality can induce the estimation of latentclasses even when the structural model is correctlyspecified and only one group exists in the population.

The Assumption of Linearity

In some ways, nonlinearity may be viewed as aspecial case of either model misspecification or non-normality. First, in the presence of nonlinearity, wemay view a standard continuous latent variable modelas misspecified. We may see the assumption of lin-earity as a misspecification of the functional form ofthe relationship, or we may view the misspecificationas one of omitted variables (i.e., polynomial terms)that, if included, would serve to better capture therelationship (McDonald, 1967). It is rare, however,that polynomial terms are tested in latent variablemodels, in part because of the complexity this adds tothe modeling process (Schumacker & Marcoulides,1998). We also consider this case separately for thefundamental reason that, unlike the model misspeci-fications considered before, the population structureof a nonlinear model cannot be recovered by simplyrearranging the linear relationships among the vari-ables; at the least we must add new terms to the modelto approximate the nonlinearities. Second, we mayview nonlinearity as a special case of nonnormality.Multivariate normality implies linearity; hence, non-linearity implies multivariate nonnormality. We con-sider nonlinearity separately, however, because, fromthe standpoint of the continuous latent variable model,it involves a simultaneous violation of both distribu-tional assumptions and the functional form of themodeled relationships.

As Figure 3 illustrates, discrete latent variables cancapture nonlinear relations among the observed mea-sures, so we should expect the presence of (unmod-eled) nonlinear relationships to induce the estimationof latent classes in an SEMM as well. To demonstratethis point, we generated a single sample of 500 casesof data conforming to the population structural equa-tion model given in Figure 9.9 Three indicators (x1–x3) loaded on the exogenous factor �, and three indi-cators (y1–y3) loaded on the endogenous latent factor�. For simplicity (and without loss of generality), the

9 Consistent with Bollen (1995), we represent the nonlin-ear effect in the path model with a saw-toothed arrow.

Figure 8. The distribution of the latent factor �1 implied bya single-group confirmatory factor model (A) comparedwith a two-class model (B; the percentage of cases in eachclass is indicated).

PROBLEMS AND OPPORTUNITIES 19

Page 18: The Integration of Continuous and Discrete Latent Variable

unconditional means of � and � were set to zero andthe total variances of the factors were set to one. Theindicators were assigned unit loadings and zero inter-cepts, and 75% of their variance was explained by theassociated latent factor. The uniquenesses of the in-dicators, the factor �i, and the disturbance of the en-dogenous factor ( i) were independent and normallydistributed. Most important, the relationship betweenthe exogenous factor � and the endogenous factor �was curvilinear; specifically �i � −.5 + .5�i + .5�i

2 + i, where i ∼ N(0, .25). This function implies that75% of the variance in �i is explained by �i. If a linearapproximation is used, only about 25% of the vari-ance in �i is explained by �i. The sample statistics forthe data are presented in Table 5. Note that the non-linear relationship between � and � naturally inducesskew and kurtosis in y1–y3.

We first fit the data with a standard one-groupstructural equation model of the same form as in Fig-ure 9 but with the key exception that the relationbetween � and � was modeled as linear. To scale thelatent variables, we set the loadings of x1 and y1 to oneand their intercepts to zero. All other parameters (in-cluding intercepts–means) were estimated. Usingconventional criteria, we found that the overall fit ofthis model was quite good, despite failure to modelthe nonlinear relationship, �2(8) � 2.43, p � .97;RMSEA � .00. This is not surprising, as conven-tional fit indices take no account of possible nonlinearrelationships and the degree of nonnormality in thedata was quite modest, so would not be expected toinflate the test statistic.

We then extended the model by estimating two andthree latent classes. The form of each within-classstructural model was identical to that of the modelused in the prior analysis, but the parameter valueswere free to vary over classes. As in the prior ex-ample, the latent class variable thus served to moder-ate the parameter values within classes, consistentwith the conceptualization of latent groups as beingcharacterized by different distributions (i.e., meanlevels and relationships between variables). The re-sults of fitting these models are displayed in Table6.10 A significant improvement in the log-likelihood

10 Although we present detailed results for these modelsonly, we did again fit models with invariance constraints onthe factor loadings and the item intercepts and residual vari-ances. As with the previous nonnormal example, the AIC,

Figure 9. Path diagram displaying the population structural equation model used to generatedata for the demonstration that nonlinearity may induce the estimation of latent classes. Thesaw-toothed arrow represents the nonlinear effect of latent factor � on latent factor �. In theestimated models the loadings with asterisks were fixed to scale the latent factors, and theeffect of � on � was assumed to be linear. � designates the mean of �, designates thevariance of �, and � designates the intercept of �.

Table 5Intercorrelations, Means, Standard Deviations, Skew, andKurtosis of Data Generated From the NonlinearTwo-Factor Structural Equation Model (N = 500)

Variable 1 2 3 4 5 6

1. x1 —2. x2 .751 —3. x3 .770 .751 —4. y1 .409 .375 .371 —5. y2 .410 .382 .377 .698 —6. y3 .419 .395 .397 .746 .736 —

M −0.001 −0.007 0.033 0.075 0.109 0.093SD 1.158 1.169 1.179 1.171 1.176 1.134Skew −0.002 −.131 −.173 .726 .820 .933Kurtosis −0.278 −0.245 −0.199 1.349 1.661 2.176

BAUER AND CURRAN20

Page 19: The Integration of Continuous and Discrete Latent Variable

of the model was obtained with the addition of thesecond latent class, but not with the addition of thethird latent class. The AIC, BIC, and CAIC all alsoindicated that the two-class model was superior to thestandard one-group structural equation model. Ofthese, only the AIC also supported the estimation ofthree classes. The three-class model, however, con-tained an improper (negative) residual variance esti-mate. As in the strictly nonnormal data exampleabove, the SRMR showed high stability, again indi-cating that the improvement in fit associated with add-ing latent classes did not occur at the level of thecovariance matrix (which captures only linear asso-ciations). None of the three criteria involving entropypenalties (i.e., the CLC, NEC, and ICL-BIC) sup-ported more than one class, although in fact the mean-estimated probability of class membership of the in-dividuals within each class was approximately .83 forthe two-class solution and .80 for the three-class so-lution. From these results we concluded that twoclasses were optimal for the data but that the classeswere likely not well separated.

To further clarify the results, in Figure 10 we pre-sent a graph of the 95% confidence ellipses from thetwo-class model overlaid on a scatter plot of the truescores for the two latent factors. Note that althoughthe relationship between � and � is linear withinclasses, together the two ellipses provide an almostpiecewise approximation of the true nonlinear rela-tionship. The approximation is visually somewhatcrude but is clearly better than what a single ellipsewould allow (e.g., Figure 3A), and this is more for-mally reflected in the significantly better log-

likelihood and superior AIC, BIC, and CAIC of thetwo-class model. Figure 10 also reveals why the two-class model is not supported by the entropy-basedcriteria. Despite the fact that the relationship betweenthe latent variables is significantly negative in oneclass and significantly positive in the other (i.e., mod-erated by the latent class variable), the classes are notconsidered well separated because the densest area ofpoints is found in the middle of the graph where the

BIC, and CAIC of the two- and three-class models all im-proved as more invariance constraints were imposed, but inthis case the entropy-based CLC and NEC changed little.

Figure 10. Scatter plot of the true scores for latent factors� and �. The dashed line plots the nonlinear effect of latentfactor � on latent factor � in the population, and the 95%confidence ellipses (estimated from the sample) show howthe nonlinear effect can be captured via the discrete latentclass variable (the percentage of cases in each class is in-dicated).

Table 6Fit Statistics of a Linear Structural Equation Model With One to Three Classes Applied to Data Generated From aHomogeneous Population Model That Includes a Nonlinear Relationship Between Two Latent Variables

Model LL SRMR AIC BIC CAIC CLC NEC ICL-BIC

One class −3,741.86 .009 7,521.72 7,601.80 7,620.80 7,483.72 1.00 7,601.80Two classes −3,623.92* .005 7,325.83 7,490.20 7,529.20 7,578.46 1.40 7,820.83Three classes −3,580.29 .009 7,278.57 7,527.23 7,586.23 7,549.48 1.20 7,916.14

Note. For AIC, BIC, CAIC, CLC, NEC, and ICL-BIC, the minimum value obtained from the series of models is in boldface, to indicate themodel with optimal fit. LL � log-likelihood; SRMR � standardized root-mean-square residual; AIC � Akaike’s information criterion; BIC� Bayes’s information criterion; CAIC � consistent AIC; CLC � classification likelihood criterion; NEC � normalized entropy criterion;ICL-BIC � integrated completed likelihood criterion using BIC approximation. Asterisks attached to LL values indicate a significantimprovement in log-likelihood over the preceding model as judged by the test of Lo, Medell, and Rubin (2001).

PROBLEMS AND OPPORTUNITIES 21

Page 20: The Integration of Continuous and Discrete Latent Variable

two ellipses overlap. This specific pattern is a conse-quence of our choice of a normal distribution for � andis certainly not a necessary feature of nonlinear rela-tionships. Hence, we should not expect entropy-basedcriteria to offer general protection against the estima-tion of spurious latent classes when linearity assump-tions are violated.

In summary, the nonlinear relationship between �and � necessitated estimation of two latent classes toachieve optimal fit to the data (by most measures)even though the population consisted of just onegroup. This was true because the standard structuralequation model, assuming linearity, could not fullycapture the relationship between � and �. The nonlin-earity of this relationship could, however, be approxi-mated with the estimation of a second latent class.Our third and final observation is thus that unmodelednonlinearity can induce the estimation of latentclasses even when no latent groups are present.

Potential Problems

In this section we consider some of the potentiallyproblematic implications of the observations expli-cated above, particularly when these models are usedto make inferences about the presence of qualitativelydistinct unobserved groups in the population. Specifi-cally, there are several alternative and competing ex-planations for the estimation of multiple latentclasses, including (a) that the structural model is mis-specified, (b) that the data is nonnormal, and (c) thatthe relationships among the latent and/or observedvariables are nonlinear (or some combination of thethree). Each of these alternative explanations impliesthat a core assumption of traditional structural equa-tion modeling has been violated and is compensatedfor by latent classes that do not reflect true groups inthe population. We now discuss how the complica-tions these issues raise can be mitigated, at least tosome degree.

Model Specification

The fact that misspecification of the within-classstructural model may induce the estimation of spuri-ous latent classes is especially disconcerting if weconsider that, in practice, we never know the truegenerating model. Indeed, Cudeck and Browne (1983)and MacCallum, Browne, and Sugawara (1996) haveremarked that we may reasonably regard all structuralequation models as misspecified because all modelsare approximations of reality (see also Meehl, 1967).

Given this, how can we have confidence that the la-tent classes in any given SEMM are not due simply toa poorly specified structural model?

At issue is the fundamental interdependency be-tween the within-class and between-class componentsof the model that we identified in Equation 10. Al-though we focused on the possibility that misfit of thewithin-class model would induce the estimation ofspurious latent classes, the interdependence of thesetwo components of the model works both ways. AsMeredith and Horn (2001) recently noted, failure tomodel heterogeneity due to the mixture of unobservedgroups may lead to model misfit even if the within-class model structure is identical for all groups andproperly specified, a phenomenon known more gen-erally as Simpson’s paradox (Pearl, 2000). In thiscase, the aggregate covariance matrix is augmentedby apparent “common variance” due to the unmod-eled between-class mean differences. Post hoc modelmodifications to include additional latent factorsmight then absorb this additional common varianceand lead to good fit, though the added factors wouldbe an entirely spurious consequence of the failure tomodel the true heterogeneity of the population.

This interdependency between the within-class andbetween-class components of the model was dealtwith explicitly in the latent profile model through useof a clear decision rule for the partitioning of covari-ances—namely, that the observed covariances shouldbe reproduced (that is, explained) exclusively by thebetween-class component of the model (the discretelatent variable). We propose consideration of a similardecision rule that would prevent estimating latentclasses solely to compensate for misspecifications ofthe structural model. The rule we propose is to deter-mine the optimal number of latent classes for anSEMM by using a model that is, by definition, notstructurally misspecified: an unrestricted (or satu-rated) model.

Using this decision rule, the modeling processwould consist of two steps. First, a series of unre-stricted multivariate finite normal mixture models (orsaturated SEMMs) would be estimated, correspond-ing to the model in Equation 6.11 The number of latent

11 If exogenous predictors were included in the model, anunrestricted conditional finite normal mixture model wouldbe required instead. In this case, the unrestricted modelwould include all possible regression paths from the exog-enous predictors to the endogenous variables, and the mix-

BAUER AND CURRAN22

Page 21: The Integration of Continuous and Discrete Latent Variable

classes would then be determined from the compara-tive fit of these unrestricted models. Because no re-strictions are imposed on the class means and covari-ances, we know a priori that the latent classes are notestimated solely to compensate for model misspecifi-cations. In essence, Step 1 answers the question, Howmany covariance matrices and/or mean vectors areneeded to adequately reproduce the observed data?Consistent with the conceptualization of a homogene-nous latent group as being defined by a shared distri-bution, this question may be translated to mean, Howmany latent groups do the data suggest are present?

The second step then addresses the question, As-suming the number of latent classes identified in Step1 is correct, does the hypothesized model structureadequately reproduce the within-class means and co-variances? If we set the number of classes to be equalto the optimal number identified in Step 1, the hy-pothesized SEMM would then be estimated with theprimary goal of recovering the class means and co-variances from the unrestricted model. This corre-sponds to the usual objective of more traditional SEManalyses to recover the observed sample means andcovariances. Of importance, just as in traditionalstructural equation modeling, we can formally testhow well the SEMM in Step 2 succeeds in this task.As Yung (1997) pointed out, the unrestricted (or satu-rated) model chosen in Step 1 and the fitted model inStep 2 are formally nested, allowing for the compu-tation of a standard chi-square difference test of over-all model fit, much as is done in traditional multiple-groups structural equation modeling. In fact, on thebasis of this result, Arminger and Stein (1997) advo-cated general use of this two-step approach to esti-mating SEMMs for the express purpose of testing thefit of the structural model (assuming the number oflatent classes is correct). The current results suggestthat the two-step approach may be equally importantfor avoiding the estimation of spurious latent classesdue to misfit of the structural model.

Although theoretically compelling, this two-stepstrategy has several practical drawbacks. First, the un-restricted models estimated in Step 1 will generallyuse far more parameters than the eventual SEMM. Assuch, statistics such as the AIC, BIC, and ICL-BICthat penalize for the number of estimated parameters

may tend to be overly conservative (i.e., identifyfewer latent classes than a correctly specified, orgood-fitting, SEMM with fewer parameters). A re-lated problem concerns estimation. Unrestricted mul-tivariate finite normal mixture models are known tohave very poorly behaved likelihood functions, espe-cially when variance–covariance parameters are per-mitted to differ between classes (McLachlan & Peel,2000, pp. 94–96). One advantage of SEMM is that therestrictions imposed in the within-class structuralmodel may make the model more stable (Blåfield,1980; Dolan & van der Maas, 1998; Jedidi et al.,1997b). Beginning SEMM analyses with an unre-stricted model would then potentially counteract thisimportant advantage. Given these complexities, itmay be fruitful to consider alternative approaches tothe same issue, such as the development and use ofsensitivity indices that would signal latent class esti-mation due to misspecification of the structuralmodel, or the use of residual diagnostics rather thanmodel fit indices to identify the optimal number oflatent classes (e.g., Lindsay & Roeder, 1992).

Nonnormality and Nonlinearity

The fact that latent classes may be estimated toaccommodate nonnormality in the observed data ornonlinear relationships between variables is a lesstractable problem. The reason is that nonnormalityand/or nonlinearity in the aggregate data is a naturaloutcome of the mixture of unobserved groups, evenwhen the data are normally distributed and the vari-ables are linearly related within groups (Teicher,1960; see Bauer & Curran, 2003a, for a more detaileddiscussion of this point). This would be the case if, forinstance, the two class distributions in Figure 8B, orthe two ellipses in Figure 10, corresponded to truegroups. Of course, there are countless other reasonsthat nonnormality or nonlinearities could arise withina truly homogeneous population. Nonnormality couldbe a natural feature of the distribution of the phenom-enon of interest (e.g., alcohol use in adolescence),could be a product of the sampling design (e.g., un-even selection from a normal population distribution),or could be due to measurement properties (e.g., flooror ceiling effects, analysis of counts or proportions,etc.), among other possibilities. Similarly, nonlinearbivariate and trivariate relationships are ubiquitous inpsychological theory, such as the case in which twovariables interact in the prediction of a third.

The two-step procedure advocated above would beineffective in discriminating between these alternative

ture would be based on the residual covariance matrices andmean vectors of the latent classes (see Arminger & Stein,1997, for further detail).

PROBLEMS AND OPPORTUNITIES 23

Page 22: The Integration of Continuous and Discrete Latent Variable

explanations, as it would incorrectly assume that nor-mality and linearity hold within the latent groups. An-other possibility is to apply additional consistencychecks to the model. For instance, whereas some non-normal distributions may be consistent with a mixtureof normals, others may not be. On the basis of thisidea, Rindskopf (2003) noted that, from a Bayesianapproach, one might consider applying posterior pre-dictive model assessment techniques to gauge theability of the model to reproduce the data. Similarly,B. Muthen (2003) formulated a test designed to assessthe fit of an SEMM to the higher order moments ofthe observed data. If the implied multivariate skewand kurtosis of the SEMM do not correspond to theobserved values, then this would be cause for rejec-tion of the model. One interpretation of this resultwould be that the mixture hypothesis is incorrect andthat the nonnormality in the data likely originatesthrough some other mechanism than the pooling ofunobserved groups. Another interpretation, however,is that too few latent classes have been estimated, andhence the approximation that the mixture distributionprovides to the observed distribution is not yet ad-equate (Bauer & Curran, 2003b). Clearly, further re-search on this issue is needed.

Promising Opportunities

Up to this point, we have considered the chief goalof SEMM analyses to be the identification of distinc-tive subgroups within the population for whom dif-ferent models may hold. This presentation is consis-tent with the introduction of SEMMs by Jedidi et al.(1997a, 1997b) and Arminger and colleagues (Arm-inger & Stein, 1997; Arminger, Stein, & Wittenberg,1999) and was made most explicit by Dolan and vander Maas (1998), who made the following statement:

In so-called direct applications of finite mixture model-ing, . . . the aim is to determine the number and the typeof components in the mixture, to estimate the unknownparameters, and to assign the cases to their respectivecomponents. In indirect applications the finite mixturemodel is employed as a mathematical device, for ex-ample, to approximate intractable heavy-tailed distribu-tions. In such applications, the underlying componentsdo not necessarily have a physical interpretation. Herewe are concerned with fitting multivariate normal finitemixtures in direct applications subject to structural equa-tion modeling. . . . (pp. 227–228)

What we have shown thus far is that it is possiblethat the SEMM is really serving an indirect functionwhen a direct interpretation is desired. However, it

may be argued that in empirical applications we cannever really know which function the model is serv-ing. From this perspective, attempting to recover truelatent subgroups is a fool’s errand and should not bethe primary focus of the investigation. Rather, themerit of the model should be based on its ability torepresent the data in a theoretically meaningful way(Cudeck & Henly, 2003). As we have expressed else-where (Bauer & Curran, 2003b), we do not fully sharethis view; however, we do feel that, in some cases,indirect applications of the SEMM can yield scientifi-cally useful information. It is in this spirit that we nowreinterpret our observations on SEMM analyses interms of the opportunities they may provide to over-come some of the traditional limiting assumptions ofstandard continuous latent variable models, particu-larly when one is willing to relinquish direct interpre-tations of the latent classes as true population sub-groups.

Model Specification

There may be some situations in which one wouldprefer to summarize an admittedly continuous latentvariable using discrete latent classes. A salient ex-ample is the LCGA model. A one-class LCGA modelholds that there is a single fixed trajectory in thesample and that there are no systematic individualdifferences around that trajectory over time. For mostgrowth modeling applications this would be an unre-alistic assumption, as there is often reason to believethat individuals differ both in overall level and rate ofchange over time (e.g., Curran, 2000; Curran, Stice, &Chassin, 1997). However, by imposing this assump-tion, we require additional latent classes in order torecover the observed covariances (as in the latent pro-file model). By requiring that the between-class por-tion of the model account for the relations among therepeated measures over time, we can summarize thepossibly continuous distribution of individual trajec-tories using a small set of class mean trajectories.

The value of this approach was emphasized byLand and Nagin (1996) and Nagin (1999), who sug-gested that the class trajectories be interpreted aspoints of support for the continuous distribution ofindividual differences, analogous to the bars of asample histogram. From this perspective, Figure 5Byields much important summary information. It cor-rectly suggests that the individual trajectories aresymmetrically distributed and that initial levels andrates of change are positively correlated. On the basisof these observations, we might feel more confident

BAUER AND CURRAN24

Page 23: The Integration of Continuous and Discrete Latent Variable

estimating a standard one-group latent curve model,which assumes the individual intercepts and slopesare normally distributed and linearly related. Ofcourse, it is possible that such a plot would revealasymmetries in the distribution of the trajectories, orthat the individual intercepts and slopes of the trajec-tories are nonlinearly related, calling into question theusefulness of a standard latent curve model and sug-gesting the need for a semiparametric or nonparamet-ric analysis (Nagin, 1999; Vermunt & van Dijk,2001). In our opinion, it is under conditions such asthese that SEMMs offer the most exciting (and po-tentially controversial) new modeling opportunities.12

Nonnormality and Nonlinearity

Traditional structural equation models (and argu-ably all techniques under the umbrella of the generallinear model) are often applied under the tacit as-sumptions that the data are multivariate normal andthat the variables of the model are linearly related. Aswe have shown, SEMMs do not make these assump-tions. Through the inclusion of latent classes, SEMMsmay accommodate both nonnormally distributed andnonlinearly related continuous variables. Given this(and assuming sufficient power), if only one latentclass is needed to optimally summarize the data (andhere we assume a properly specified or unrestrictedmodel), then it is likely that the data are approxi-mately normally distributed and that the relationshipsamong variables are close to linear. Alternatively, ifmore than one class is required (again assuming aproperly specified or unrestricted model), then theseassumptions of traditional structural equation model-ing must be rejected.

We are not proposing the use of SEMM as simplyanother tool for testing the assumptions of traditionalstructural equation modeling. Other tools for this taskare readily available (e.g., Mardia, 1985). Further,techniques exist aside from SEMM for explicitly ac-commodating violation of these assumptions. For in-stance, Browne (1984), Satorra and Bentler (1994),and Bollen (1996) have discussed robust or distribu-tion-free methods for estimating structural equationmodels in the presence of nonnormal data, andSchumacker and Marcoulides (1998) presented sev-eral approaches to modeling nonlinear relationships instructural equation modeling. However, we submitthat SEMM offers new and unique opportunities tomore fully explore these issues. For instance, currentmethods for estimating structural equation modelswith nonnormal outcomes typically only provide a

means of obviating or correcting for the problem ofnonnormality. SEMM instead capitalizes on the addi-tional information available with nonnormal out-comes and provides a means to explicitly map out thedistributions of the latent variables (as in Figure 8B).Multimodality of these distributions, although not anecessary requirement of a normal mixture, wouldstrongly suggest the presence of more than one popu-lation (McDonald, 1967, p. 60; Waller & Meehl,1998, chap. 5). Similarly, the techniques reviewed inSchumacker and Marcoulides for modeling nonlinearrelationships among latent variables are currentlycumbersome to implement and require strong theoryabout the functional form of the relationships. In con-trast, SEMM can be used in a more exploratory mode.For instance, plotting the results of the SEMM (as inFigure 10) may suggest the presence of a nonlinearrelationship between the continuous latent variables.In both of these cases, SEMM yields new informationon the phenomenon of interest that could not be ob-tained easily from a traditional one-group structuralequation model.

Recommendations

The observations that we have explored in this ar-ticle have important implications for the practical useof SEMMs in applied research. We conclude by draw-ing these implications together into an explicit list ofrecommendations for using SEMMs in practice. Ofimportance, these recommendations should be re-garded as tentative and subject to debate and laterrevision in light of additional research on these mod-els.

Recommendation 1: Determine what function thelatent classes are intended to serve. By this we meanthat applied researchers should decide a priori whethera direct interpretation of the latent classes, in terms ofqualitatively distinct latent subgroups, is desired or ifthe latent classes are to serve one of the indirect func-tions noted above. In indirect applications, we person-ally believe inferences should be reserved for the ag-gregate population (rather than within-class parameterestimates), though others may disagree. In direct ap-plications, the latent classes are assumed to represent

12 The potential controversy over these uses of the modelwas reflected in two reviews of this article with equallystrong but opposing reactions (with one reviewer feeling wehad gone too far and the other feeling we had not gone farenough).

PROBLEMS AND OPPORTUNITIES 25

Page 24: The Integration of Continuous and Discrete Latent Variable

true groups in the population, a stronger inferencerequiring a greater burden of proof. Our remainingrecommendations are given primarily for direct appli-cations of the model.

Recommendation 2: Attempt to ascertain whatfunction the latent classes are actually serving. Toavoid estimating latent classes to accommodate mis-specifications of the structural model, consider usingthe two-step procedure outlined above to determinethe optimal number of latent classes. After selectingthe optimal number of latent classes, plot the model-implied distributions of the latent and/or observedvariables (as in Figures 8 and 10) to better understandhow the latent classes are capturing nonnormality ornonlinearity in the data and apply formal tests to as-sess the fit between the model and data. Considerwhether, aside from the mixture of unobservedgroups, there are other possible sources contributingto the nonnormality or nonlinearity of the measuredvariables. Attempt to rule out these alternativesources. For instance, examine univariate distribu-tions to see if they exhibit floor or ceiling effects.Evaluate the tenability of the assumption that normal-ity represents homogeneity (e.g., infrequent countvariables would be expected to show nonnormalityeven in the absence of a mixture, so a finite normalmixture would be inappropriate). Also consider thenature of the sampling design and whether it mayhave distorted the shape of the sample distributionrelative to the population distribution (e.g., individu-als self-selecting into the study with probability re-lated to their level on the construct of interest).

Recommendation 3: Embark on a program of con-struct validation. Accumulate evidence in favor of thehypothesis that the population indeed consists of amixture of unobserved latent groups. Are the charac-teristics of the latent groups consistent with a prioritheoretical predictions? Are the latent groups distin-guished in multiple samples and/or by multiple mea-sures? Can they be distinguished by differential rela-tions to precursor variables, or do they show distinctlydifferent sequelae in subsequent assessments that areconsistent with the motivating theoretical model? SeeBauer and Curran (2003b) for further discussion onthe problem of construct validation in SEMMs, with aparticular emphasis on growth mixture models.

Recommendation 4: Most important, take care indrawing conclusions from the fit of a single model. AsRecommendation 3 implies, the construct validationof the latent classes will often require a programmaticseries of research studies. Determining that the opti-

mal model for a given sample includes several latentclasses on the basis of model fit statistics is not suf-ficient evidence for concluding that the population isa heterogeneous mixture of latent groups. The latentclasses may be serving an indirect function even whena direct interpretation is desired; thus, the conclusionsdrawn from a single model should be guarded. Alter-native hypotheses for the latent classes should benoted and, where possible, evaluated.

Concluding Remarks

We have attempted to elucidate some of the impli-cations that may arise from integrating continuous anddiscrete latent variable models. Although these mod-els have historically been viewed as alternative waysof representing individual differences, the more recentSEMM allows for the simultaneous estimation of bothcontinuous and discrete latent variables. It is the in-herent flexibility of the SEMM that leads both to po-tential problems and to promising new opportunities.This flexibility allows for violation of assumptions inone part of the model (i.e., the structural model) to beaccommodated by changes to the other (i.e., the latentclass model). This is problematic if the goal of theanalysis is the identification of qualitatively distinctgroups, as it suggests that the latent classes may in-stead serve simply to accommodate assumption vio-lations of the structural model. Alternatively, we mayeschew conclusions about the true structure of indi-vidual differences (i.e., whether latent groups exist inthe population or not) and knowingly capitalize on theflexibility of the model to capture phenomena that arenot readily tractable otherwise. Using the models inthis way avoids reification of the latent classes yetretains the ability to explore complex features of thedata in ways that up to now have not been easilyaccessible.

References

Arminger, G., & Stein, P. (1997). Finite mixtures of covari-ance structure models with regressors. Sociological Meth-ods and Research, 26, 148–182.

Arminger, G., Stein, P., & Wittenberg, J. (1999). Mixturesof conditional mean- and covariance-structure models.Psychometrika, 64, 475–494.

Bartholomew, D. J. (1987). Latent variable models and fac-tor analysis. London: Griffin.

Bauer, D. J., & Curran P. J. (2003a). Distributional assump-tions of growth mixture models: Implications for overex-

BAUER AND CURRAN26

Page 25: The Integration of Continuous and Discrete Latent Variable

traction of latent trajectory classes. Psychological Meth-ods, 8, 338–363.

Bauer, D. J., & Curran, P. J. (2003b). Overextraction of la-tent trajectory classes: Much ado about nothing? Reply toRindskopf (2003), Muthén (2003), and Cudeck andHenly (2003). Psychological Methods, 8, 384–383.

Biernacki, C., Celeux, G., & Govaert, G. (1999). An im-provement of the NEC criterion for assessing the numberof clusters in a mixture model. Pattern Recognition Let-ters, 20, 267–272.

Biernacki, C., Celeux, G., & Govaert, G. (2000). Assessinga mixture model for clustering with the integrated com-pleted likelihood. IEEE Transactions on Pattern Analysisand Machine Intelligence, 22, 719–725.

Blåfield, E. (1980). Jyvaskyla studies in computer science,economics & statistics 2: Clustering of observations fromfinite mixtures with structural information. Jyvaskyla,Finland: Jyvaskyla University.

Bollen, K. A. (1989). Structural equations with latent vari-ables. New York: Wiley.

Bollen, K. A. (1995). Structural equation models that arenonlinear in latent variables: A least-squares estimator. InP. M. Marsden (Ed.), Sociological methodology (pp.223–252). Oxford, England: Blackwell.

Bollen, K. A. (1996). An alternative two stage least squares(2SLS) estimator for latent variable equations. Psy-chometrika, 61, 109–121.

Browne, M. W. (1984). Asymptotic distribution free meth-ods in analysis of covariance structures. British Journalof Mathematical and Statistical Psychology, 37, 62–83.

Celeux, G., & Soromenho, G. (1996). An entropy criterionfor assessing the number of clusters in a mixture model.Journal of Classification, 13, 195–212.

Cudeck, R., & Browne, M. W. (1983). Cross-validation ofcovariance structures. Multivariate Behavioral Research,18, 141–167.

Cudeck, R., & Henly, S. J. (2003). A realistic perspective onpattern representation in growth data: Comment on Bauerand Curran (2003). Psychological Methods, 8, 378–383.

Curran, P. J. (2000). A latent curve framework for studyingdevelopmental trajectories of adolescent substance use. InJ. Rose, L. Chassin, C. Presson, & J. Sherman (Eds.),Multivariate applications in substance use research (pp.1–42). Hillsdale, NJ: Erlbaum.

Curran, P. J., Stice, E., & Chassin, L. (1997). The relationbetween adolescent and peer alcohol use: A longitudinalrandom coefficients model. Journal of Consulting andClinical Psychology, 65, 130–140.

Dolan, C. V., & van der Maas, H. L. J. (1998). Fitting mul-tivariate normal finite mixtures subject to structural equa-tion modeling. Psychometrika, 63, 227–253.

Escobar, M. D., & West, M. (1995). Bayesian density esti-mation and inference using mixtures. Journal of theAmerican Statistical Association, 90, 577–588.

Everitt, B. S., & Hand, D. J. (1981). Finite mixture distri-butions. London: Chapman & Hall.

Ferguson, T. S. (1983). Bayesian density estimation viamixtures of normal distributions. In M. H. Rizvi, J. S.Rustagi, & D. Siegmund (Eds.), Recent advances in sta-tistics (pp. 287–302). New York: Academic Press.

Galton, F. R. S. (1888). Co-relations and their measure-ment, chiefly from anthropometric data. Proceedings ofthe Royal Society of London, 45, 135–145.

Gibson, W. A. (1959). Three multivariate models: Factoranalysis, latent structure analysis, and latent profileanalysis. Psychometrika, 24, 229–252.

Hu, L., & Bentler, P. M. (1999). Cutoff criterion for fitindices in covariance structure analysis: Conventionalcriteria versus new alternatives. Structural EquationModeling, 6, 1–55.

Jedidi, K., Jagpal, H. S., & DeSarbo, W. S. (1997a). Finite-mixture structural equation models for response-basedsegmentation and unobserved heterogeneity. MarketingScience, 16, 39–59.

Jedidi, K., Jagpal, H. S., & DeSarbo, W. S. (1997b).STEMM: A general finite mixture structural equationmodel. Journal of Classification, 14, 23–50.

Joreskog, K. G. (1971). Simultaneous factor analysis in sev-eral populations. Psychometrika, 36, 409–426.

Joreskog, K. G. (1973). A general method for estimating alinear structural equation system. In A. S. Goldberger &O. D. Duncan (Eds.), Structural equation models in thesocial sciences (pp. 85–112). New York: AcademicPress.

Kaplan, D. (2000). Structural equation modeling: Founda-tions and extensions. Thousand Oaks, CA: Sage.

Land, K. C., & Nagin, D. S. (1996). Micro-models of crimi-nal careers: A synthesis of the criminal careers and lifecourse approaches via semiparametric mixed Poissonmodels with empirical applications. Journal of Quantita-tive Criminology, 12, 163–191.

Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structureanalysis. Boston: Houghton Mifflin.

Lindsay, B. G., & Roeder, K. (1992). Residual diagnosticsfor mixture models. Journal of the American StatisticalAssociation, 87, 785–794.

Lo, Y., Mendell, N. R., & Rubin, D. B. (2001). Testing thenumber of components in a normal mixture. Biometrika,88, 767–778.

MacCallum, R. C., Browne, M. W., & Sugawara, H. M.(1996). Power analysis and determination of sample size

PROBLEMS AND OPPORTUNITIES 27

Page 26: The Integration of Continuous and Discrete Latent Variable

for covariance structure modeling. Psychological Meth-ods, 1, 130–149.

Mardia, K. V. (1985). Mardia’s test of multinormality. In S.Kotz & N. L. Johnson (Eds.), Encyclopedia of statisticalsciences (Vol. 5, pp. 217–221). New York: Wiley.

McDonald, R. P. (1967). Nonlinear factor analysis (Psy-chometric Monograph No. 15). Richmond, VA: ByrdPress.

McLachlan, G., & Peel, D. (2000). Finite mixture models.New York: Wiley.

Meehl, P. E. (1965). Detecting latent clinical taxa by fal-lible quantitative indicators lacking an accepted criterion(Report No. PR-65-2). Minneapolis: University of Min-nesota, Research Laboratories of the Department of Psy-chiatry.

Meehl, P. E. (1967). Theory testing in psychology and phys-ics: A methodological paradox. Philosophy of Science,34, 103–115.

Meehl, P. E. (1968). Detecting latent clinical taxa: II. Asimplified procedure, some additional hitmax cut loca-tors, a single-indicator method, and miscellaneous theo-rems (Report No. PR-68-4). Minneapolis: University ofMinnesota, Research Laboratories of the Department ofPsychiatry.

Meehl, P. E. (1995). Bootstrap taxometrics: Solving theclassification problem in psychopathology. AmericanPsychologist, 50, 266–275.

Meehl, P. E., & Golden, R. R. (1982). Taxometric methods.In P. Kendall & J. Butcher (Eds.), Handbook of researchmethods in clinical psychology (pp. 127–182). NewYork: Wiley.

Meredith, W., & Horn, J. (2001). The role of factorial in-variance in modeling growth and change. In A. Sayer &L. Collins (Eds.), New methods for the analysis of change(pp. 203–240). Washington, DC: American Psychologi-cal Association.

Micceri, T. (1989). The unicorn, the normal curve, and otherimprobable creatures. Psychological Bulletin, 105, 156–166.

Molenaar, P. C. M., & von Eye, A. (1994). On the arbitrarynature of latent variables. In A. von Eye & C. C. Clogg(Eds.), Latent variable analysis: Applications for devel-opmental research (pp. 226–242). Thousand Oaks, CA:Sage.

Muthen, B. (2003). Statistical and substantive checking ingrowth mixture modeling: Comment on Bauer and Cur-ran (2003). Psychological Methods, 8, 369–377.

Muthen, B. O. (2001). Second-generation structural equa-tion modeling with a combination of categorical and con-tinuous latent variables: New opportunities for latentclass/latent growth modeling. In A. Sayer & L. Collins

(Eds.), New methods for the analysis of change (pp. 291–322). Washington, DC: American Psychological Associa-tion.

Muthen, B. O. (2002). Beyond SEM: General latent variablemodeling. Behaviormetrika, 29, 81–117.

Muthen, B. O., & Muthén, L. K. (2000). Integrating person-centered and variable-centered analyses: Growth mixturemodeling with latent trajectory classes. Alcoholism:Clinical and Experimental Research, 24, 882–891.

Muthen, B., & Shedden, K. (1999). Finite mixture modelingwith mixture outcomes using the EM algorithm. Biomet-rics, 55, 463–469.

Muthen, L. K., & Muthen, B. O. (1998). Mplus (Version 2)[Computer software and manual]. Los Angeles: Muthen& Muthen.

Nagin, D. (1999). Analyzing developmental trajectories: Asemi-parametric, group-based approach. PsychologicalMethods, 4, 139–157.

Nagin, D., & Land, K. C. (1993). Age, criminal careers, andpopulation heterogeneity: Specification and estimation ofa nonparametric, mixed Poisson model. Criminology, 31,327–362.

Pearl, J. (2000). Causality. Cambridge, England: Cam-bridge University Press.

Pearson, K. (1894). Contributions to the mathematicaltheory of evolution. Philosophical Transactions of theRoyal Society of London, A, 185, 71–110.

Ramaswamy, V., DeSarbo, W. S., Reibstein, D. J., & Rob-inson, W. T. (1993). An empirical pooling approach forestimating marketing mix elasticities with PIMS data.Marketing Science, 12, 103–124.

Rindskopf, D. (2003). Mixture or homogeneous? Commenton Bauer and Curran (2003). Psychological Methods, 8,364–368.

Roeder, K., & Wasserman, L. (1997). Practical Bayesiandensity estimation using mixtures of normals. Journal ofthe American Statistical Association, 92, 894–902.

Satorra, A., & Bentler, P. M. (1994). Corrections to teststatistics and standard errors in covariance structureanalysis. In A. von Eye & C. C. Clogg (Eds.), Latentvariable analysis: Applications for developmental re-search (pp. 399–419). Thousand Oaks, CA: Sage.

Schumacker, R. E., & Marcoulides, G. A. (1998). Interac-tion and nonlinear effects in structural equation model-ing. Mahwah, NJ: Erlbaum.

Spearman, C. (1904). General intelligence, objectively de-termined and measured. American Journal of Psychol-ogy, 15, 201–293.

Stigler, S. M. (1986). The history of statistics: The measure-ment of uncertainty before 1900. Cambridge, MA: Har-vard University Press.

BAUER AND CURRAN28

Page 27: The Integration of Continuous and Discrete Latent Variable

Teicher, H. (1960). On the mixture of distributions. Annalsof Mathematical Statistics, 31, 55–73.

Thurstone, L. L. (1935). The vectors of the mind. Chicago:University of Chicago Press.

Thurstone, L. L. (1947). Multiple-factor analysis. Chicago:University of Chicago Press.

Titterington, D. M., Smith, A. F. M., & Makov, U. E.(1985). Statistical analysis of finite mixture distributions.Chichester, England: Wiley.

Vale, C. D., & Maurelli, V. A. (1983). Simulating multi-variate nonnormal distributions. Psychometrika, 48, 465–471.

Verbeke, G., & Lesaffre, E. (1996). A linear mixed-effectsmodel with heterogeneity in the random-effects popula-tion. Journal of the American Statistical Association, 91,217–221.

Vermunt, J. K., & van Dijk, L. A. (2001). A nonparametricrandom coefficient approach: The latent class regressionmodel. Multilevel Modelling Newsletter, 13, 6–13.

Waller, N. G., & Meehl, P. E. (1998). Multivariate taxomet-ric procedures: Distinguishing types from continua.Thousand Oaks, CA: Sage.

Wherry, R. J., & Gaylord, R. H. (1944). Factor pattern oftest items and tests as a function of the correlation coef-

ficient: Content, difficulty, and constant error factors.Psychometrika, 9, 237–244.

White, H. R., Johnson, V., & Buyske, S. (2000). Parentalmodeling and parental behavior effects on offspring al-cohol and cigarette use: A growth curve analysis. Journalof Substance Abuse, 12, 287–310.

Wolfe, J. H. (1970). Pattern clustering by multivariate mix-ture analysis. Multivariate Behavioral Research, 5, 329–350.

Wolfe, J. H. (1971). A Monte Carlo study of the samplingdistribution of the likelihood ratio for mixtures of multi-normal distributions (Tech. Bulletin No. STB 72-2). SanDiego, CA: U.S. Naval Personnel and Training ResearchLaboratory.

Yung, Y.-F. (1997). Finite mixtures in confirmatory factor-analysis models. Psychometrika, 62, 297–330.

Zhu, H.-T., & Lee, S.-Y. (2001). A Bayesian analysis offinite mixtures in the LISREL model. Psychometrika, 66,133–152.

Received June 13, 2002Revision received May 21, 2003

Accepted July 21, 2003 �

PROBLEMS AND OPPORTUNITIES 29