19
Psychological Methods 1999, Vol. i. No. 2, 139-157 Copyright 1999 by the American Psychological Association, Inc. 1082-989X/99/$.VOO Analyzing Developmental Trajectories: A Semiparametric, Group-Based Approach Daniel S. Nagin Carnegie Mellon University A developmental trajectory describes the course of a behavior over age or time. A group-based method for identifying distinctive groups of individual trajectories within the population and for profiling the characteristics of group members is demonstrated. Such clusters might include groups of "increasers." "decreasers," and "no changers." Suitably defined probability distributions are used to handle 3 data types—count, binary, and psychometric scale data. Four capabilities are dem- onstrated: (a) the capability to identify rather than assume distinctive groups of trajectories, (b) the capability to estimate the proportion of the population following each such trajectory group, (c) the capability to relate group membership probabil- ity to individual characteristics and circumstances, and (d) the capability to use the group membership probabilities for various other purposes such as creating profiles of group members. Over the past decade, major advances have been made in methodology for analyzing individual-level developmental trajectories. The two main branches of methodology are hierarchical modeling (Bryk & Raudenbush, 1987, 1992; Goldstein, 1995) and latent curve analysis (McArdle & Epstein, 1987; Meredith &Tisak, 1990;Muthen, 1989; Willett& Sayer, 1994). These advanced methods allow researchers to move beyond the use of ad hoc categorization procedures for constructing developmental trajectories. Although these two classes of methodology differ in very important respects, they also have important commonalities (MacCallum, Kim, Malarkey, & Kiecolt-Glaser, 1997; Muthen & Curran, 1997; Raudenbush, in press; Willett & Sayer, 1994). For the purposes of this article, one is key: Both model the This work was supported by National Science Foundation Grant SBR-9511412 and by the National Consortium for Violence Research. 1 thank Robert Brame, Lisa Broidy, Pat- rick Curran, Kenneth Dodge, Laura Dugan, Anne Garvin, Don Lynam, Steve Raudenbush, Kathryn Roeder, Robert Siegler, and Larry Wasserman for helpful comments on previous versions of this article. Correspondence concerning this article should be ad- dressed to Daniel S. Nagin, H. John Heinz III School of Public Policy & Management, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213. unconditional and conditional population distribution of growth curves based on continuous distribution functions. Unconditional models estimate two key features of the population distribution of growth curve parameters—their mean and covariance struc- ture. The former defines average growth within the population, and the latter calibrates the variances of growth throughout the population. The conditional models are designed to explain this variability by re- lating growth parameters to one or more explanatory variables. This article demonstrates a distinct semiparametric, group-based approach for modeling developmental trajectories. The modeling strategy is intended to pro- vide a flexible and easily applied approach for iden- tifying distinctive clusters of individual trajectories within the population and for profiling the character- istics of individuals within the clusters. Using mix- tures of suitably defined probability distributions, the method identifies distinctive groups of developmental trajectories within the population (Land & Nagin, 1996; Nagin, Harrington, & Moffitt, 1995; Nagin & Land, 1993; Nagin & Tremblay, in press; Roeder, Lynch, & Nagin, in press). Thus, whereas the hierar- chical and latent curve methodologies model popula- tion variability in growth with multivariate continuous distribution functions, the group-based approach uses a multinomial modeling strategy and is designed to 139

Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

Embed Size (px)

DESCRIPTION

Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach, political science

Citation preview

Page 1: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

Psychological Methods1999, Vol. i. No. 2, 139-157

Copyright 1999 by the American Psychological Association, Inc.1082-989X/99/$.VOO

Analyzing Developmental Trajectories: A Semiparametric,Group-Based Approach

Daniel S. NaginCarnegie Mellon University

A developmental trajectory describes the course of a behavior over age or time. Agroup-based method for identifying distinctive groups of individual trajectorieswithin the population and for profiling the characteristics of group members isdemonstrated. Such clusters might include groups of "increasers." "decreasers,"and "no changers." Suitably defined probability distributions are used to handle 3data types—count, binary, and psychometric scale data. Four capabilities are dem-onstrated: (a) the capability to identify rather than assume distinctive groups oftrajectories, (b) the capability to estimate the proportion of the population followingeach such trajectory group, (c) the capability to relate group membership probabil-ity to individual characteristics and circumstances, and (d) the capability to use thegroup membership probabilities for various other purposes such as creating profilesof group members.

Over the past decade, major advances have beenmade i n methodology for analyzing individual-leveldevelopmental trajectories. The two main branches ofmethodology are hierarchical modeling (Bryk &Raudenbush, 1987, 1992; Goldstein, 1995) and latentcurve analysis (McArdle & Epstein, 1987; Meredith&Tisak, 1990;Muthen, 1989; Willett& Sayer, 1994).These advanced methods allow researchers to movebeyond the use of ad hoc categorization proceduresfor constructing developmental trajectories.

Although these two classes of methodology differin very important respects, they also have importantcommonalities (MacCallum, Kim, Malarkey, &Kiecolt-Glaser, 1997; Muthen & Curran, 1997;Raudenbush, in press; Willett & Sayer, 1994). For thepurposes of this article, one is key: Both model the

This work was supported by National Science FoundationGrant SBR-9511412 and by the National Consortium forViolence Research. 1 thank Robert Brame, Lisa Broidy, Pat-rick Curran, Kenneth Dodge, Laura Dugan, Anne Garvin,Don Lynam, Steve Raudenbush, Kathryn Roeder, RobertSiegler, and Larry Wasserman for helpful comments onprevious versions of this article.

Correspondence concerning this article should be ad-dressed to Daniel S. Nagin, H. John Heinz III School ofPublic Policy & Management, Carnegie Mellon University,Pittsburgh, Pennsylvania 15213.

unconditional and conditional population distributionof growth curves based on continuous distributionfunctions. Unconditional models estimate two keyfeatures of the population distribution of growthcurve parameters—their mean and covariance struc-ture. The former defines average growth within thepopulation, and the latter calibrates the variances ofgrowth throughout the population. The conditionalmodels are designed to explain this variability by re-lating growth parameters to one or more explanatoryvariables.

This article demonstrates a distinct semiparametric,group-based approach for modeling developmentaltrajectories. The modeling strategy is intended to pro-vide a flexible and easily applied approach for iden-tifying distinctive clusters of individual trajectorieswithin the population and for profiling the character-istics of individuals within the clusters. Using mix-tures of suitably defined probability distributions, themethod identifies distinctive groups of developmentaltrajectories within the population (Land & Nagin,1996; Nagin, Harrington, & Moffitt, 1995; Nagin &Land, 1993; Nagin & Tremblay, in press; Roeder,Lynch, & Nagin, in press). Thus, whereas the hierar-chical and latent curve methodologies model popula-tion variability in growth with multivariate continuousdistribution functions, the group-based approach usesa multinomial modeling strategy and is designed to

139

Page 2: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

140 NAGIN

identify relatively homogeneous clusters of develop-mental trajectories.

The group-based modeling strategy demonstratedin this article is presaged in the work of Rindskopf(1990). Rindskopf's methodology is ful ly nonpara-metric. It is designed for analysis of repeated mea-surements of dichotomous response data. The re-peated measurements may come in the form ofresponses to a series of test items or, alternatively, inthe form of panel data. Rindskopf's method is de-signed to identify distinct groups of response se-quences across sampled individuals. The method de-scribed here expands on Rindskopf's innovative workin several important respects—it increases the varietyof response variables to which a group-based model-ing strategy can be applied, it provides a basis forlinking group membership probability to relevant in-dividual-level characteristics, and it describes a for-mal approach for determining the optimal number ofgroups.

Rationale for a Group-Based Modeling Strategy

Before discussing the technical features of themodel, I briefly speak to the underlying rationale fora group-based modeling strategy. There is a long tra-dition in psychology of group-based theorizing aboutdevelopment. Examples include theories of personal-ity development (Caspi, 1998), drug use (Kandel,1975), learning (Holyoak & Spellman, 1993), lan-guage and conceptual development (Markman, 1989),development of prosocial behaviors such as altruism,and development of antisocial behaviors such as de-linquency (Loeber, 1991; Moffitt, 1993; Patterson,DeBaryshe, & Ramsey, 1989). This group-based ap-proach is well suited to analyzing questions aboutdevelopmental trajectories that are inherently cat-egorical—do certain types of people tend to have dis-tinctive developmental trajectories?

For example, Moffitt (1993) and Patterson et al.(1989) have argued that early onset of delinquency isa distinctive feature of the development trajectories ofchronically criminal and antisocial adults but not ofindividuals whose offending is limited to their ado-lescence. The group-based modeling strategy can testwhether the developmental trajectories predicted bythe theories are actually present in the population. Itcan also be used to test key predictions such as wheth-er the chronic trajectory is typified by an earlier onsetof delinquent behavior than the adolescent limited tra-jectory. In contrast, because hierarchical and latent

curve modeling assume a continuous distribution oftrajectories within the population, these methods arenot designed to identify distinct clusters of trajecto-ries. Rather they are designed to describe how pat-terns of growth vary continuously throughout thepopulation. As a result, it is awkward to use thesemethods to address research questions that contrastdistinct categories of developmental trajectories.

The group-based modeling approach assumes thatthe population is composed of a mixture of distinctgroups defined by their developmental trajectories.The assumption that the population is composed ofdistinct groups is not likely literally correct. Unlikebiological or physical phenomena, in which popula-tions may be composed of actually distinct groupssuch as different types of animal or plant species,population differences in developmental trajectoriesof behavior are unlikely to reflect such bright-linedifferences (although biology has a long tradition ofdebates concerning classification; see Appel, 1987).

To be sure, taxonomic theories predict different tra-jectories of development across subpopulations, butthe purpose of such taxonomies is generally to drawattention to differences in the causes and conse-quences of different developmental trajectories withinthe population rather than to suggest that the popula-tion is composed of literally distinct groups. In thisrespect, developmental toxonomies are analogous toprototypal diagnostic classifications in clinical psy-chology. Unlike classical classifications, prototypalclassifications acknowledge fuzziness in classifica-tion, employ polythetic diagnostic criteria, and recog-nize within-group variation in the prevalence and se-verity of symptoms (Cantor & Genero, 1986; Widiger& Frances, 1985).

The group-based modeling strategy is prototypal indesign and provides a methodological complement totheories that predict prototypal developmental etiolo-gies and trajectories within the population. The mod-eling strategy explicitly recognizes uncertainty ingroup membership, allows an examination of the im-pact of multiple factors on probability of group mem-bership, and anoints no set of factors as necessary andsufficient in determining group membership. Withthis approach the basic elements of taxonomic theo-ries can be directly tested: Are the relatively homo-geneous developmental trajectories and etiologiespredicted by the theory actually present in the popu-lation?

In so doing, the method provides an alternative tousing assignment rules based on subjective categori-

Page 3: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

ANALYZING DEVELOPMENTAL TRAJECTORIES 141

zation criteria to construct categories of developmen-tal trajectories. Although such assignment rules aregenerally reasonable, there are limitations and pitfallsattendant to their use. One is that the existence of thevarious developmental trajectories that underlie thetaxonornic theory cannot be tested; they must be as-sumed a priori. A related pitfall of constructinggroups with subjective classification procedures isoverfilling—creation of trajectory groups that reflectonly random variation. Second, ex ante specified rulesprovide no basis for calibrating the precision of indi-vidual classifications to the various groups that com-pose th;: taxonomy. The semiparametric, group-basedmethod presented in this article avoids each of theselimitations. It provides a formal basis for determiningthe number of groups that best fits the data and alsoprovides an explicit metric, the posterior probabilityof group membership, for evaluating the precision ofgroup assignments.

Data and Three Illustrative Examples

Throughout, key methodological points are illus-trated with findings from the analysis of two well-known longitudinal studies. One is the CambridgeStudy of Delinquent Development, which tracked asample of 411 British males from a working-classarea of London. Data collection began in 1961-62,when most of the boys were age 8. Criminal involve-ment is measured by conviction for criminal offensesand is available for all individuals in the samplethrough age 32, with the exception of the 8 individu-als who died prior to this age. Between ages 10 and 32a wealth of data was assembled on each individual'spsychological makeup, family circumstances includ-ing parental behaviors, and performance in school andwork. For a complete discussion of the data set, seeFarrington and West (1990).

The other data set is from a prospective longitudi-nal study based in Montreal, Canada. This study hasbeen tracking 1,037 White males of French ancestry.Participants were selected in 1984 from kindergartenclasses in low-socioeconomic-status Montreal neigh-borhoods. Following the assessment at age 6, the boysand other informants were interviewed annually fromages 10 to 18. Like the Cambridge study, assessmentswere made on a wide range of factors. Among these isphysical aggression, which was measured on the basisof teacher ratings using the Social Behavior Question-naire (Tremblay, Desmarais-Gervais, Gagnon, &Charlebois, 1987). The boys were rated on threeitems: kicking, biting, and hitting other children;

fighting with other children; and bullying and intimi-dating other children. See Tremblay et al. for furtherdetails on this study.

Figures 1, 2, and 3 show the results of trajectoryanalyses based on these two data sets. The softwareused to estimate these models is a customized SASprocedure that was developed with the SAS productSAS/TOOLKIT. The procedure, which is currentlyavailable for the PC platform only, is a program writ-ten in the C programming language and is designed tointerface with the SAS system to perform model fit-ting. It accommodates missing data. Thus, individualswith incomplete assessment histories do not have tobe dropped from the analysis. Also, time betweenassessment periods does not have to be equallyspaced, and assessment periods do not have to beidentical across participants. The software, which iseasily used and incorporated into the standard SASpackage, is available on request and is described indetail in Jones, Nagin, and Roeder (1998).

Figure 1 displays the results of the analysis of theCambridge data. The response variable in this analy-sis is a count, the number of convictions for each2-year period from ages 10 to 31 years. The solid linesrepresent actual behavior, and the dashed lines repre-sent predicted behavior.1 Three groups were identi-fied: A group called "the never convicted" was com-posed of respondents who, but for a few individualswith a single conviction, were not convicted through-out the observation period. This group accounts for anestimated 71 % of the sampled population. A secondgroup of individuals who ceased their offending (asmeasured by conviction) by their early 20s was la-beled "adolescent limited" after Moffitt (1993). Thisgroup is estimated to constitute 22% of the popula-tion. Finally, a chronic group was identified and isestimated to make up 7% of the population. Individu-

1 Predicted behavior is calculated as the expected value ofthe random variable depicting each group's behavior. Ex-pected values are computed based on model coefficient es-timates. For the application depicted in Figure !, this ex-pectation equals the antilog of Equation 1. For theapplication depicted in Figure 2, it is calculated according tothe relationship provided in footnote 4. For the applicationdepicted in Figure 3, it equals the binomial probability ascomputed by Equation 3. Actual behavior is computed asthe mean behavior of all persons assigned to the variousgroups identified in estimation. As described hereinafter,the assignments are based on the posterior probability ofgroup membership.

Page 4: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

142 NAGIN

2.5 4

- Never-actual-Never-pred.

-Adol. Limited-actual-Adol. Limited-pred.

-Chronic-actual- Chronic-pred.

10 12 14 16 18 20 22 24 26 28

Age

Figure I . Trajectories of number of convictions (Cambridge sample). Adol. = adolescent; pred. = predicted.

30

als in this group offended at a high level throughoutihe observation period.

Figure 2 displays results based on the Montrealdata. In this analysis the response variable was a psy-chometric scale of physical aggression. A four-groupmodel was found to best fit the data. A group called"nevers" is composed of individuals who never dis-play physically aggressive behavior to any substantialdegree. This group is estimated to make up about 15%of the sample population. A second group that con-stitutes about 50% of the population is best labeled"low-level desisters." At age 6 boys in this groupdisplayed modest levels of physical aggression, but byage 10 they had largely desisted. A third group, con-stituting about 30% of the population, was labeled''high-level near desisters." This group started offscoring high on physical aggression at age 6 but byage 15 scored far lower. Notwithstanding this markeddecline, at age 15 they continued to display a modestlevel of physical aggression. Finally, there is a smallchronic group, constituting about 5% of the popula-tion, who displayed high levels of physical aggressionthroughout the observation period.

Figure 3 displays another variant of the analysis ofthe physical aggression data. For this analysis the ag-gression data for each assessment period was trans-formed from an intensity scale over the interval 0 to 6to a binary indicator equal to 1 if the individual dis-played any evidence of physical aggression in thatperiod (i.e., if his score was greater than 0). Such asymptom indicator variable is an example of binarydata—a type of data that is widely analyzed in thesocial sciences. It is also the type of data Rindskopf sgroup-based method was designed to analyze. Threesymptom trajectories are depicted in Figure 3. Onegroup, constituting about 20% of the population,never displayed any symptoms. This group is thecounterpart of the nevers in Figure 2. A second groupincludes about 50% of the population. It started offwith a fairly high probability of displaying symptomsof physical aggression, about .6, but this probabilitydeclined rapidly to near 0 by age 15. The final group,which is estimated to account for about 30% of thepopulation, began with a probability of physical ag-gression that is only modestly higher than that of thesecond group, but the subsequent rate of decline was

Page 5: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

ANALYZING DEVELOPMENTAL TRAJECTORIES 143

- Never-actual•Never-pred.

- Low desister-actual• Low desister-pred.

High desister-actualHigh desister-pred.

- Chronic-actual-Chronic-pred.

Figure 2. Trajectories of physical aggression (Montreal sample), pred. = predicted.

far slower. Even at age 15, the probability of their dis-playing symptoms of physical aggression is about 4.

Model

Figure 4 provides an overview of the model and itskey outputs. Model estimation does not require the exante sorting of the data among the various groupsdepicted in Figures 1 ,2 , and 3. To the contrary, thedata are used to identify the number of groups thatbest fit-, the data and the shape of the trajectory foreach group. The data also provide an estimate of theproportion of the population whose measured behav-ior conforms most closely to each trajectory group.

In the discussion that follows, I first lay out thegeneral form of the model for any given number ofgroups. Next, I discuss the determination of the opti-mal number of groups and the shape of trajectory foreach group. I then move to a discussion of how themodel's parameter estimates can be used to computethe final element depicted in Figure 4—the probabil-ity that each individual / in the estimation samplebelongs to each of the trajectory groups identified inestimation. These capabilities are illustrated with re-

sults from the Cambridge and Montreal data sets. Forthe Montreal data the censored normal analysis isused for this purpose. However, all of these capabili-ties are equally well employed in the analysis of bi-nary data.

Differences in the form of the response variable inthe analyses depicted in Figures 1, 2, and 3—a countvariable, a psychometric scale, and a binary vari-able—necessitate technical differences in the statisti-cal model used in each analysis. For the count data theunderlying mixture model builds from the Poissondistribution and its even more general relative, thezero-inflated Poisson (Lambert, 1993). The Poissonfamily of distributions is widely used in the analysisof count data (Cameron & Trivedi, 1986; Parzen,1960). For the psychometric scale data the underlyingmodel is based on the censored normal distribution.The censored normal distribution is well suited toaccommodate a common feature of psychometricscale data. Typically, a sizable contingent of thesample exhibits none of the behaviors measured bythe scale. The result is a clustering of data at the scaleminimum. Also, there is usually a smaller contingentthat exhibits all of the behaviors measured by thescale. The result is another cluster of data at the scale

Page 6: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

144 NAGIN

- Desister-actual- Desister-pred.

Non-desister-actualNon-desister-pred.

Figure 3. Trajectories of symptoms of physical aggression (Montreal sample), pred. = predicted.

maximum. Finally, the binary logit distribution isused to model binary data. See Maddala (1983) orGreene (1990) for a thorough discussion of the cen-sored normal and binary logit distributions.

Like hierarchical and latent curve modeling, a poly-nomial relationship is used to model the link betweenage and behavior. Specifically, a quadratic relation-ship is assumed.2 For the Poisson-based model it isassumed that

log(\/,) = ft, + ft Age, + ft Age?,, (1)

where X-J, is the expected number of occurrences of theevent of interest (e.g., convictions) of subjecti at time

/ given membership in group/ Age,, is subject i's ageat time t, and Age2, is the square of subject fs age attime r.3 The model's coefficients—(3{,, fj',, and ft—determine the shape of the trajectory and are super-scripted by j to denote that the coefficients are notconstrained to be the same across the j groups. Theconditional probability of the actual number of events,P(^it\j), given j is assumed to follow the well-knownPoisson distribution.4

For the censored normal model, the linkage be-tween age and behavior is established by means of alatent variable, >',*,', that can be thought of as measur-

Optimal Number of Groups&

Trajectory Shapes

I Proportion of Populationin Each Group

Probability that Individual iBelongs to Trajectory Group j

Figure 4. Overview of the model.

2 The software package used in estimating the modelsreported here allows for estimation of up to a cubic poly-nomial in age. For ease of exposition I limit the discussionto examples where the maximum order considered is qua-dratic.

•* A log-linear relationship between X', and age is assumedto ensure that the requirement that \j, > 0 is fulfilled inmodel estimation.

4 In an even more general version of this model, P(v/,[/) isassumed to follow the zero-inflated Poisson distribution.See Land, McCall, & Nagin (1996), Nagin and Land (1993),or Roeder, Lynch, and Nagin (in press) for the developmentof this more general case.

Page 7: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

ANALYZING DEVELOPMENTAL TRAJECTORIES 145

ing potential for engaging in the behavior of inter-est—say, physical aggression—for individual ;'s ageat time t given membership in group/ Again, a qua-dratic relationship is assumed between >•'/ and age:

B

v*/=(yo+p' I (2)

where Age,-, and Age2,, are as previously defined ande is a disturbance assumed to be normally distributedwith zero mean and constant variance a2.

The latent variable, y*/, is linked to its observed butcensored counterpart, y,-,, as follows. Let 5min andSmax, respectively, denote the minimum and maxi-mum possible score on the measurement scale. Themodel assumes

y'ii ~ -^min 1' y / r ^min'

y/; = y*,' if 5mm < y*' < Smax, and

.Vi> = -Xmix " }'it -"" ^max-

In words, if the latent variable, y*', is less than 5min, itis assumed that observed behavior equals this mini-mum. Likewise, if the latent variable, y*/, is greaterthan Smax, it is assumed that observed behavior equalsthis maximum. Only if y*/ is within the scale mini-mum and maximum does y,-, = y*'.5'6

Finally, consider the case where the measured re-sponse y,,, is binary. In this case it is assumed thatconditional on membership in group; the probabilitythat y,, = 1, ot'ip follows the binary logit distribution:

(3)1 +<?"

In the Poisson, censored normal, and binary logitformulations, the parameters defining the shape of thetrajectory—$J

0. (3',, and (3;2—are left free to differ

across groups. This flexibility is a key feature of themodel because it allows for easy identification ofpopulation heterogeneity not only in the level of be-havior at a given age but also in its development overtime. Figure 5 illustrates two hypothetical possibili-ties. A single peaked trajectory—Trajectory A—isimplied if p, > 0 and p? < 0. Thus, if data collectionbegan at age 1, the trajectory would imply that for thisgroup the occurrence of the behavior rose steadilyuntil age 6 and then began a steady decline. Alterna-tively, if data collection began at age 6, as was thecase in the Montreal study, generally it would be in-appropriate to extrapolate backward to a younger ageoutside the period of measurement. Thus, for a model

10 11 12 Age

Figure 5. Two hypothetical trajectories. A = singlepeaked; B = chronic trajectory.

based on data from age 6 onward, this trajectorywould imply a steady decline in the behavior follow-ing the initial assessment. Such a trajectory wouldtypify desistance from the behavior. The second tra-jectory (B) depicted in Figure 5 has no curvature.Rather it remains constant over age. This trajectory isimplied if (^ = 0 and |32 = 0. If that stable level ofthe behavior is high, this trajectory would typify agroup that chronically engages in the behavior. Otherinteresting possibilities include trajectories in whichgrowth is either steadily accelerating or decelerating.The former would be characterized by a trajectory inwhich both (3j and (32

are positive and the latter byboth being negative.

The trajectories depicted in Figures 1, 2, and 3 arethe product of maximum-likelihood estimation. Thelikelihood function was constructed as follows. Letthe vector y, = {y,,, y/i, • • • , y,T} denote the longi-tudinal sequence of individual /"s behavioral measure-

5 The expected value of the latent variable y*' equals $nj

+ (3{ Age,, + (3^ Agef,, which is denoted below by p.v;. Theexpected value of the measured quantity, £(v',). assuming jwas observed, is E(Y>,) = VmtaSmin + P^*!™ - *!,„„) +(r(ct>ilin - 4>,',u») + (1 - 4>L*)Smax, where *(*) denotes thecumulative normal distribution function and <t>min and 4>max,respectively, denote <t>'[(Sm,n - (iO/cr)] and <l>'[(Smax -PX',)/O-)], with corresponding definitions for 4>min and <j>max,where <j> denotes the normal density function. This relation-ship was used to compute the predicted components of Fig-ure 1.

6 I use the term latent variable to describe y*' because itis not fully observed. Thus, my use of the term latent isdifferent from that in the psychometric literature, where theterm latent factor refers to an unobservable construct that isassumed to give rise to multiple manifest variables.

Page 8: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

146 NAGIN

merits over the T periods of measurement, P(Yt) de-note the probability of Y/ given membership in groupj, and TTj denote the probability of membership ingroup/ For count data P!(Y/) is constructed from thePoisson distribution, for psychometric scale data fromthe censored normal distribution, and for the binarydata from the binary logit distribution. If group mem-bership was observed, the sampled individuals couldbe sorted by group membership and their trajectoryparameters estimated with readily available Poisson,censored normal (tobit), and logit regression softwareroutines (e.g., STATA, 1995 or LIMDEP, Greene,1991).7

However, group membership is not observed. In-deed the proportion of the population composinggroup/ iTy, is an important parameter of interest in itsown right. Thus, construction of the likelihood re-quires the aggregation of the J conditional likeli-hoods, f'(K/), to form the unconditional probability ofthe data, K,:

(4)

where P(Yt) is the unconditional probability of ob-serving individual fs longitudinal sequence of behav-ioral measurements. It equals the sum across the Jgroups of the probability of K, given membership iniiroupy weighted by the proportion of the populationin group / The log of the likelihood for the entiresample is thus the sum across all individuals that com-pose the sample of the log of Equation 4 evaluated foreach individual /. The parameters of interest—(3(,, (B7,,|:12, TTJ, and, in addition, cr for the censored normal—can be estimated by maximization of this log likeli-hood. As previously mentioned, SAS-based softwarefor accomplishing this task is available on request.

For a derivation of the likelihood see the appendix,but intuitively, the estimation procedure works as fol-lows. Suppose unbeknownst to us there were two dis-tinct groups in the population: youth offenders, con-stituting 50% of the population, who up to age 18have an expected offending rate, X, of 5 and who afterage 18 have a X of 1, and adult offenders, constitutingthe other 50% of the population, whose offendingtrajectory is the reverse of that of the youth offend-ers—through age 18 their X = I, and after age 18their X increases to 5. If we had longitudinal data onthe recorded offenses of a sample of individuals fromthis population, we would observe two distinctgroups: a clustering of about 50% of the sample whogenerally have many offenses prior to age 18 and

relatively few offenses after age 18 and another 50%clustering with just the reverse pattern.

Suppose these data were analyzed under the as-sumption that the relationship between age and X wasidentical across all individuals. The estimated value ofX would be a "compromise" estimate of about 3 for allages from which we would mistakenly conclude thatin this population the rate of offending is invariantwith age. If the data were analyzed using the approachdescribed here, which specifies the likelihood func-tion as a mixing distribution, no such mathematicalcompromise would be necessary. The parameters ofone component of the mixture would effectively beused to accommodate (i.e., match) the youth offend-ing portion of the data whose offending declines withage, and another component of the mixing distributionwould be available to accommodate the adult offenderdata whose offending increases with age.

Model Selection

This section addresses two important issues inmodel selection: (a) determination of the optimalnumber of groups to compose the mixture and (b)determination of the appropriate order of the polyno-mial used to model each group's trajectory. Here or-der refers to the degree of the polynomial used tomodel the group's trajectory, where a second-ordertrajectory is defined by a quadratic equation, a first-order trajectory is defined by a linear equation inwhich (32 is set equal 0, and a zero-order trajectory isdefined by a flat line in which (}, and (32 are set equalto zero. I discuss these two issues in turn.

One possible choice for testing the optimality of aspecified number of groups is the likelihood ratio test.However, the null hypothesis (i.e., three componentsvs. more than three components) is on the boundary ofthe parameter space, and hence the classical asymp-totic results that underlie the likelihood ratio test donot hold (Erdfelder, 1990; Ghosh & Sen, 1985; Tit-terington, Smith, & Makov, 1985). The likelihood ra-tio test is suitable only for model selection problemsin which the alternative models are nested. In mixturemodels, a k group model is not nested within a k + 1

7 In the econometric literature censored normal regres-sion is generally called tobit regression after its originator,James Tobin (Tobin, 1958).

Page 9: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

ANALYZING DEVELOPMENTAL TRAJECTORIES 147

group model, and, therefore, it is not appropriate touse the likelihood ratio test for model selection.8

Given these problems with the use of the likelihoodratio test for model selection, we follow the lead ofD'Unger, Land, McCall, and Nagin (1998) and usethe Bayesian information criterion (BIC) as a basis forselecting the optimal model. For a given model, BICis calculated as follows:

BIC = log(L) - 0.5*logOz)*(/t), (5)

where L is the value of the model's maximized like-lihood, n is the sample size, and k is the number ofparameters in the model. Kass and Raftery (1995) andRaftery (1995) have argued that BIC can be used forcomparison of both nested and unnested models underfairly general circumstances. When prior informationon the correct model is limited, they recommendedselection of the model with the maximum BIC. Notethat BIC is always negative, so the maximum BICwill be the least negative value. In even more recentwork, Keribin (1997) demonstrated that BIC identi-fies the optimal number of groups in finite mixturemodels. Her result is specifically relevant for the mix-ture models demonstrated here.

For insight into the usefulness of BIC as a criterionfor model selection, consider its calculation. The firstterm of Equation 5 is always negative. For a modelthat predicts the data perfectly it equals zero. As thequality of the model's fit with the data declines, thisterm declines (i.e., becomes more negative). One wayto improve fit and thereby reduce the first term is toadd more parameters to the model. The second termextracts a penalty proportional to the log of the samplesize for the addition of more parameters. Thus, on thebasis of the BIC criterion, expansion of the model byaddition of a trajectory group is desirable only if theresulting improvement in the log likelihood exceedsthe penalty for more parameters. As Kass and Raftery(1995) noted, the BIC rewards parsimony. For thisapplication the BIC criterion will tend to favor modelswith fewer groups.

Table 1 shows BIC scores for models with varyingnumbers of groups. For the Cambridge data the pat-tern is seemingly very distinct—BIC appears to reacha clear maximum at three groups. I say "seemingly,"because, without a concrete standard for calibratingthe magnitude of the change in BIC, it is difficult tocalibrate what constitutes a distinct maximum.

Kass and Wasserman (1995) and Schwarz (1978)have provided such a standard. Let Btj denote theBayes factor comparing models i and j, where for

Table 1BIC-Based Calculations of the Probability That a "j"Group Model Is the Correct Model for Different Numbersof Groups of Quadratic Trajectories

Data set

Cambridge

No. ofgroups

2345

BIC

-1583.43-1552.62-1569.42-1586.21

Probabilitycorrectmodel

.001.00.00.00

Montreal

BIC

-7325.26-7289.52-7289.27-7292.54

Probabilitycorrectmodel

.00

.43

.55

.02

Note. BIC = Bayesian information criterion.

this application model i might be a two-group modeland j a three-group model. The Bayes factor measuresthe odds of each of the two competing models beingthe correct model. It is computed as the ratio of theprobability of i being the correct model to j beingthe correct model. Thus, a Bayes factor of 1 impliesthat the models are equally likely, whereas a Bayesfactor of 10 implies that model i is 10 times morelikely than/ Table 2 shows Jeffreys's scale of evi-dence for Bayes factors as reported by Wasserman(1997).9

Computation of the Bayes factor is in general verydifficult and indeed commonly impossible. Schwarz(1978) and Kass and Wasserman (1995), however,have shown that eBIC/ ~ BIC' is a good approximation ofthe Bayes factor for problems in which equal weightis placed on the prior probabilities of models i and /On the basis of this approximation, for the Cambridge

8 The problem is most easily illustrated with an example.The likelihood ratio test is computed as minus two times thedifference in the likelihood of the two nested models. Thisstatistic is asymptotically distributed as chi-squared withdegrees of freedom equal to the difference in number ofparameters between the nested models. For our mixturemodel, the degrees of freedom is indeterminate, because agroup can become superfluous in two ways. One is by theproportion of the population in that group, itj, approachingzero. Alternatively, the three parameters defining the trajec-tory of one group can collapse onto those for another group.What then is the appropriate degrees of freedom—one orthree?

9 Jeffreys was an early and very prominent contributer toBayesian statistics.

Page 10: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

148 NAGIN

Table 2Jeffreys's Scale of Evidence for Bayes Factors

Bayes factor Interpretation

B;; < 1/101/10 < B,j <1/3 <Bij< 11 < BJJ < 33 <B]j< 10B,, > 10

1/3Strong evidence for model jModerate evidence for model jWeak evidence for model jWeak evidence for model /'Moderate evidence for model /'Strong evidence for model ;

Note. Adapted from "Bayesian Model Selection and Model Av-eraging" by L. Wasserman. 1997, Working Paper No. 666, Carne-gie Mellon University, Department of Statistics. Copyright 1997 byL. Wasserman. Adapted wi th permission.

data the odds of the three-group model compared with:he two- or four-group model far exceed 1000 to 1.Thus, according to Jeffreys's scale this is very strongevidence in favor of the three-group model.10

Schwarz (1978) and Kass and Wasserman (1995)have also provided a related metric for comparingmore than two models. Let pf denote the posteriorprobability that model j is the correct model, where ingeneral j is greater than 2. They show that p: is rea--onably approximated by the following:

(6)

where BICmax is the maximum B1C score of the mod-els under consideration.

Also shown in Table 1 are the probabilities, ascomputed using Equation 6, that the models withvarying numbers of groups are the true model. Withthis BIC-based probability approximation, for theCambridge data the probability of the three-groupmodel is near 1.

Consider now the BIC scores for the models esti-mated with the Montreal data. The four-group modelhas the best BIC score, but the probability of its beingthe correct model, .55, is far less than 1. Although thefour-group model is far more likely than the five-group model, the three-group model is a close com-petitor. Its probability is .43. On the basis of Jeffreys'sscale, the four-group model is strongly preferred tothe five-group model. The odds ratio in favor of thefour-group model is 26 to l(i.e., .55/.02). However,th: edge compared with the three-group model isslight with an odds ratio of only 1.28 (i.e., .551.43).Thus, for models in which each trajectory is describedby a full quadratic specification, the three- and four-group models fit the data about equally well.

The Montreal analysis illustrates a situation inwhich the mechanical application of the BIC modelselection criterion does not result in an unambiguousdetermination of the "best" model. This brings me tothe issue of determination of the appropriate order formodeling the trajectory of each group that makes upthe mixture. The best model will not necessarily in-volve a mixture in which all trajectories are of thesame order (e.g., quadratic). In principle one couldexhaustively explore all possible combinations of or-ders for a model. In practice this approach is generallynot practical. For a four-group model alone, there are81 (i.e., 34) possible combinations of trajectoriesmodels, the number of possibilities in a four-groupmodel is 256. Further, a full search would requireestimating models over varying numbers of groups.To reduce the number of alternatives, an analyst willgenerally have to use knowledge of the problem do-main to limit the model search process. For instance,in the Montreal example substantive considerationssuggest that an improved, more parsimonious modelshould restrict the number of parameters used to de-scribe the never and chronic trajectories. Descriptionof a never trajectory does not require a three-parameter model: a zero-order model with a "verynegative" intercept should suffice. Indeed the largestandard errors (not reported) associated with the pa-rameters of the never trajectory strongly suggest thatthe trajectory is overparameterized.

Table 3 shows the probabilities that various formsof three- and four-group models are the correct model.We refer to models in which the trajectories for allgroups are quadratic as Type A models. Three- andfour-group models in which the never trajectory isdescribed by a zero-order model are labeled Type Bmodels. Also shown are three- and four-group modelsin which, in addition, the chronic trajectory is de-scribed by a zero-order model (Type C). The infor-mation was prompted by the theories of Moffitt(1993) and Patterson et al. (1989), which predict theexistence of a small group of chronically antisocialindividuals. The BIC-based probability calculationsprovide strong support for the four-group, Type Cmodel in Table 3. Compared with the other options

10 Using the BIC criterion, D'Unger et al. (1998) andRoeder, Lynch, and Nagin (in press) showed that a four-group model is best. In those analyses the model is fit basedon the more general zero-inflated Poisson.

Page 11: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

ANALYZING DEVELOPMENTAL TRAJECTORIES 149

Table 3BIC-Based Calculation of Probability of Correct Modelfor Models Combining Quadratic and NonquadraticTrajectories: Montreal Data

/W) — (7)

No. of groups

•\

'\

A

.00

.00

Model type

B

.00

.07

c.00.93

Note. ::1IC = Bayesian information criterion; Model Type A =all trajectories quadratic; Model Type B = one single parametertrajectory, remaining quadratic; Model Type C = two single pa-rameter trajectories, remaining quadratic.

listed in the table, the probability of this being thecorrect model is .93. The closest competitor is thefour-group. Type B model. However, the posteriorprobability of it being the correct model is only .07.

As the Montreal analysis illustrates, the determina-tion o: the optimal number of groups is not always aclear-cut process. Although Bayesian statisticianshave made important strides in developing theoreti-cally grounded criteria for model selection, the resultsare recent and not yet widely used. Further, as illus-trated by the Montreal example, the model search pro-cess may also be guided in part by nonstatistical con-siderations. This injects a degree of subjectivity intomodel selection. Still, it is important to recognize thatuse of the BIC criterion for model selection adds avery significant degree of statistical objectivity to thedetermination of the optimal number of groups. Thisprovides an important check against the spurious in-terpretation of random fluctuations in the data as re-flecting systematic patterns of behavior.

Calculation and Use of Posterior GroupMembership Probabilities

It is not possible to determine definitively an indi-vidual's group membership. However, it is possible tocalculate the probability of his or her membership inthe various groups that make up the model. Theseprobabilities, the posterior probabilities of groupmembership, are among the most useful products ofthe group-based modeling approach. Specifically,based on the model coefficient estimates, for eachindividual i the probability of membership in group jis calculated on the basis of the individual's longitu-dinal pattern of behavior, K,. We denote this probabil-ity by Pij'iy,-). It is computed as follows:

where P(Yj\j) is the estimated probability of observing;"s actual behavioral trajectory, K,, given member-ship in j, and fr; is the estimated proportion of thepopulation in group / The quantity P(K,-ly) can becalculated postestimation based on the maximum-likelihood estimates of the trajectory parameters.

The posterior probability calculations provide theresearcher with an objective basis for assigning indi-viduals to the development trajectory group that bestmatches their behavior. Individuals can be assigned tothe group to which their posterior membership prob-ability is largest. On the basis of this maximum pos-terior probability assignment rule, an individual fromthe Cambridge sample with a cluster of offenses inadolescence but none thereafter, in all likelihood, willbe assigned to the adolescent limited category. Alter-natively, an individual who offends at a compar-atively high rate throughout the observation periodwil l likely be assigned to the chronic group.

Table 4 shows the mean assignment probability forthe Montreal and Cambridge data. For example, in theCambridge data the mean chronic group posteriorprobability for the 21 individuals assigned to thisgroup was very high—.95. The counterpart averagefor the 77 individuals assigned to the adolescent lim-ited group is similarly large—.94. For the Montrealdata, classification certainly is not so high but stillseems reasonably good. Across the four groups itranges from .73 to .93.

One informative use of the posterior probability-based classifications is to create profiles of the ''av-erage" individual following the trajectory character-ized by each group. Table 5 shows summary statisticson individual characteristics and behaviors of eachgroup for both data sets. The profiles conform withlongstanding findings on risk and protective factorsfor antisocial and problem behaviors. In the Cam-bridge data those in the chronic group, on average,were most likely to have lived in a low-income house-hold, to have had at least one parent with a criminalrecord, to have been subjected to poor parenting, andto have displayed a high propensity to engage in riskyactivities. Conversely, the never group was lowest onthese risk factors. The pattern is similar for physicalaggression. Members of the chronic group had theleast well-educated parents and most frequentlyscored in the lowest quartile of the measured IQ dis-tribution of the sample, whereas the nevers were high-

Page 12: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

150 NAGIN

Table 4Average Assignment Probability Conditional on Assignment by Maximum Probability Rule

Group

Assigned group Never Adolescent limited Chronic

NeverAdolescent limitedChronic

NeverLow desisterHigh desisterChronic

Never.73.00.00.00

Cambridge data.94.030

Montreal dataLow desister

.27

.93

.09

.00

.06

.94

.05

High desister.00.10.84.24

.00

.03

.95

Chronic.00.00.06.76

est on these protective factors. Further, 90% of thosein the chronic group failed to reach the eighth gradeon schedule, and 13% had a juvenile record by age 18;only 19% of the nevers had fallen behind by theeighth grade, and none had a juvenile record. In be-tween are the low-level and high-level desisters, whothemselves order as expected on these characteristicsand behaviors.

Aside from providing a basis for group assignment,the posterior probabilities can provide an objectivecriterion for selecting subsamples for follow-up datacollection. For example, in an analysis based on theMontreal data, Nagin and Tremblay (in press) foundthat individuals following the chronic physical ag-gression trajectory depicted in Figure 3 displayed

heightened levels of violence at age 18, controllingfor chronic opposition and hyperactivity. Conversely,it was found that chronic opposition and hyperactivitydid not predict violence at age 18, controlling forchronic physical aggression. Nagin and Tremblayconcluded that chronic physical aggression was a dis-tinct risk factor for later violence. A follow-up studythat is currently under way at the time of this writingwas devised to examine whether self-regulatory pro-cesses were a possible explanation for the findings.The study involves a series of laboratory assessmentsof selected individuals in the Montreal study. Indi-viduals were recruited for this study based on theirposterior probabilites of membership in the varioustrajectory groups for physical aggression, opposition,

Table 5Group Profile

Group

Variable

Low household income (%)Poor parenting (%)High risk taking (%)Parents with criminal record (%)

Years of school — motherYears of school — fatherLow lQa (%)Completed eighth grade on time (%)Juvenile record (%)No. of sexual partners at age 17 (past year)

Never

Cambridge data16.417.421.417.4

Montreal data

Never11 .111.521.680.30.01.2

Adolescent

35.133.842.942.9

Lowdesister

10.810.726.864.6

2.01.7

limited

Highdesister

9.89.8

44.531.86.02.2

Chronic

55.651.974.163.0

Chronic8.49.1

46.46.5

13.33.5

1 Measured IQ score is in the lowest quartile of the sample.

Page 13: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

ANALYZING DEVELOPMENTAL TRAJECTORIES 151

and hvperactivity. The purpose was to recruit strate-gically to ensure that the sample included personswith distinct combinations of development in exter-nalizing behaviors (e.g., high probability of chronicphysical aggression but low probability of chronicopposition). The posterior probabilities provide objec-tive, quantified criteria for such strategic sampling.

Statistically Linking Group Membershipto Covariates

What individual, familial, and environmental fac-tors distinguish the populations of the various trajec-tory groups? Are the factors consistent with receivedtheory on developmental trajectories? What statisticalprocedures are most appropriate to test whether suchfactors distinguish among the trajectory groups?Questons such as these follow naturally from theidentification of distinct groups of developmental tra-jectories.

The profiles reported in Table 5 are a first step inaddressing these questions but indeed only a begin-ning. First, the profiles are simply a collection of uni-variate contrasts. For the purpose of constructing amore parsimonious list of predictors or for causal in-ference, a multivariate procedure is required to sort outredundant predictors and to control for potential con-found-. Second, the profiles are based on group iden-tifications that are probabilistic, not certain. Conven-tional statistical methods to test for cross-groupdifferences, such as F- and chi-square-based tests, as-sume no classification error in group identification(Roeder et al., in press). Thus, in general they aretechnically inappropriate in this setting."

The mixture model of developmental trajectoriescomprises two basic components: (a) an expected tra-jectory given membership in group j and (b) a prob-ability of group membership denoted by IT;. Thus far,the dKcussion has focused on the former component.The discussion of the test for factors that distinguishgroups turns our focus to the latter component. Theprofiles in Table 5 show that the trajectory groups arecomposed of individuals who differ in substantial andpredictable ways. Provision of the capacity to testformally for whether and by what degree such factorsdistinguish the groups requires that the model be gen-eralized to allow itj to depend on characteristics of theindividual. Heretofore, ir; has been described as theproportion of the population following each trajectorygroup /. Equivalently, it can be described as the prob-ability that a randomly chosen individual from the

population under study belongs to trajectory group j.By allowing T^ to vary with individual characteristics,it is possible to test whether and by how much aspecified factor affects probability of group member-ship controlling for the level of other factors that po-tentially affect TT;.

Let Xj denote a vector of factors measuring indi-vidual, familial, or environmental factors that poten-tially are associated with group membership, and letTfj(Xj) denote the probability of membership in groupj given Xj. For a two-group model the logit model is anatural candidate for modeling group membershipprobability as a function of *,-. For this special case,we need only estimate Tr;U,) for one group, say Group1, because irfjc,) = 1 - TT,(J:,), where

e*f>

For the more general case, in which there are morethan two groups, the logit model generalizes to themultinomial logit model (Maddala, 1983):

(8)

where the parameters of the multinomial logit model,0y, capture the impact of the covariates of interest, *,-,on probability of group membership. Without loss ofgenerality, 9; for one "contrast" group can be set equalto zero. The coefficient estimates for the remaininggroups should be interpreted as measuring the impactof covariates on group membership relative to thecontrast group.

Table 6 illustrates the application of the extendedmodel to the Montreal data. Results are reported for afour-group model in which probability of group mem-bership is related to four variables: one parent havingless than a ninth-grade education, both parents havingless than a ninth-grade education, having a motherwho began childbearing as a teenager, and scoring inthe lowest quartile of the measured IQ of the sampledboys. The first panel of Table 6 shows coefficientestimates and r statistics. For this example, the never-convicted group serves as the contrast group. Con-

1 ' Roeder et al. (in press) explored the impact of classi-fication error on statistical inference. One finding is notsurprising. When classification is highly certain, as is thecase in the Cambridge data, errors in inference using con-ventional methods are small.

Page 14: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

152 NAGIN

Table 6The Impact of Parental Education. Low IQ, and Teen Onset of Motherhood on Group Membership Probabilities:Montreal Data

Group

Variable-condition Never

Multinomial logit coefficientsConstant —Low education" — parent —Low education" — both parents —Low IQb —Teen momc —

Predicted membership probabilities basedNo risk factorsLow education" — both parents — onlyLow lQh onlyTeen mom1' only

Lowdesister

Highdesister Chronic

(with ± statistics given in parentheses)1.04(6.36)0.44 ( 1 .46)0.18(0.48)0.07 (0.22)0.89 (2.03)

on multinomial logit.20.15.15.08

Low education" — both parents — and low IQh and teen momc .03

-0.02 (-0.10)0.69(2.28)0.67(1.83)0.85 (2.85)1.10(2.57)

model coefficient.57.51.46.58.29

estimates.20.29.35.25.44

-2. 14 (-4.49)-0.1 9 (-0.30)

0.98(1.73)0.92(1.83)2.27 (3.89)

.02

.05

.04

.09

.25

Parent has less than a ninlh-grade education.Measured IQ score is in the lowest quartile of the sample.Participant's mother began childbearing as a teenager.

sider first the parental low-education variables. Rela-t ive to the never group, having one or more poorlyeducated parents does not significantly increase theprobability of membership in the low-level desistergroup (a = .05, one-tailed test). However, probabilityof membership in the high-level near-desister andchronic groups is significantly increased if both par-ents are poorly educated. Low IQ has a similar impacton group membership probability. Compared with thenever group, probability of membership in the low-level group is not significantly affected, whereas prob-ability of membership in the two higher aggressiontrajectories is significantly increased. Finally, the teenn'iom variable is associated with a significant increasein all group membership probabilities relative to thenever group.

The second panel of Table 6 shows calculations ofthe predicted probabilities of group membershipbased on the coefficient estimates. The calculationswere performed by substituting the coefficient esti-mates into Equation 8 and then computing groupmembership probabilities for assumed values of x/.The calculations show that if none of the risk factorsare present, the probability of the never or low-levelgroups is .77 and the probability of the chronic groupis only .02. The largest single factor affecting thechronic group membership probability is the teenmom variable. For the case in which an individual hasnone of the other risk factors except for a mother who

began childbearing as a teenager, the combined prob-ability of the never and the low-level desister groupdeclines to .66 and the probability of the chronicgroup increases to .09. Also reported is a high-riskscenario in which three risk factors are present—bothparents poorly educated, low IQ. and teen mother-hood. In this case the predicted probability of thechronic group is .25, an order of magnitude largerthan the no-risk-factor case. Also, the predicted prob-ability of the high-level near-desister group of .44 isdouble the probability for the no-risk-factor case.

Simultaneous estimation of the group-specific tra-jectory parameters and the group membership prob-abilities conditional on individual factors is easily ac-complished with the above-referenced software. Theefficiency of estimation is also relatively good. Al-though estimation time will depend on data set size,number of groups and covariates, and the speed of thecomputer, computation time is generally 5 to 10 min.For analyses in which hypotheses are well formed,such computation time is inconsequential. However,in analyses in which hypotheses are less well formedor analyses that are generally exploratory, much moreeffort will be put into mode! fitting. In these circum-stances computation times of 5 to 10 min per run maybe very cumbersome.

Two alternatives for highly efficient exploratorymodel fitting are recommended. Both are two-stageapproaches with the same first stage—identification

Page 15: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

ANALYZING DEVELOPMENTAL TRAJECTORIES 153

of the best fitting model in terms of number of groups.The first stage involves a search for the best fittingmodel. This search is conducted without covariatesdistinguishing group membership. One alternative forthe second-stage analysis makes use of the groupmembership assignments based on the maximum pos-terior probability rule. Multinomial logit models arethen estimated relating group assignment to indi-vidual-level factors. A second alternative for the sec-ond-stage analysis is to regress group membershipprobabilities on the individual-level factors of inter-est. Both of these approaches for identifying candi-date factors that distinguish group membership arehighly efficient and easily conducted with conven-tional statistical software. Personal experience hasshown that the results of an exploratory analysis con-ducted by either of these methods provide a goodguide for specification of the proper model in whichtrajectories and the impact of covariates on member-ship probabilities are jointly estimated.12

Comparative Advantages and Disadvantages ofGroup-Based Modeling

In this article I have described a group-based mod-eling strategy for analyzing developmental trajecto-ries. Alternative methods generally treat the popula-tion distribution of development as continuous. Thetwo leading examples of this continuous modelingstrategy are hierarchical modeling and latent curvemodeling.'3 The group-based and continous modes ofanalysis each have their distinctive strengths. Growthcurve modeling, whether in the hierarchical or latentvariable tradition, is designed to identify average de-velopmental tendencies, to calibrate variability aboutthe average, and to explain that variability in terms ofcovariates of interest. By contrast, the mixture mod-eling strategy is designed to identify distinctive, pro-totypal developmental trajectories within the popula-tion, to calibrate the probability of populationmembers following each such trajectory, and to relatethose probabilities to covariates of interest.

Raudenbush (in press) offered a valuable perspec-tive on the types of problems for which these twobroad classes of modeling strategies are most suitable.He observed, "In many studies it is reasonable to as-sume that all participants are growing according tosome common function but that the growth param-eters vary in magnitude" (p. 30). He offered children'svocabulary growth curves as an example of such agrowth process. Two distinctive features of such de-

velopmental processes are (a) they are generallymonotonic—thus, the term growth—and (b) they varyregularly within the population. For such processes itis natural to ask, "What is the typical pattern ofgrowth within the population and how does this typi-cal growth pattern vary across population members?"Hierarchical and latent curve modeling are specifi-cally designed to answer such a question.

Raudenbush (in press) also offered an example of adevelopmental process—namely, depression—thatdoes not generally change monotonically over timeand does not vary regularly through the population.He observed, "It makes no sense to assume thateveryone is increasing (or decreasing) in depres-sion. . . . many persons will never be high in depres-sion, others will always be high, while others willbecome increasingly depressed (p. 30). For problemssuch as this, he recommended the use of a multino-mial-type method such as that demonstrated here. Thereason for this recommendation is that the develop-ment does not vary regularly across population mem-bers. Instead trajectories vary greatly across popula-tion subgroups both in terms of the level of behaviorat the outset of the measurement period and in the rateof growth and decline over time. For such problems amodeling strategy designed to identify averages andexplain variability about that average is far less usefulthan a group-based strategy designed to identify dis-tinctive clusters of trajectories and to calibrate howcharacteristics of individuals and their circumstancesaffect membership in these clusters.

Summary and Discussion

This article has demonstrated a semiparametric,group-based approach for analyzing developmental

12 Although either of the two-stage procedures providesan efficient approach for exploring the potential impact ofcovariates on group membership, they are not a substitutefor a final analysis based on the jointly estimated model.Roeder, Lynch, and Nagin (in press) have found that thefirst described two-stage procedure in particular tends tooverstate the statistical significance of covariates on groupmembership. The probable reason is that the procedure ig-nores the uncertainty about group assignments.

u Recent advances in latent curve modeling have adaptedthe conventional assumption of a continuous distribution ofgrowth curves to accommodate the group-based approachdescribed here. For an excellent summary of this advance inlatent curve modeling, see Muthen (in press).

Page 16: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

154 NAGIN

trajectories. Technically, the model is simply a mix-ture of probability distributions that are suitably speci-fied to describe the data to be analyzed. Three suchdistributions were illustrated: the Poisson, which issuitable for analyzing count data; the censored nor-mal, which is suitable for analyzing psychometricscale data with clusters of observations at the scalemaximum or minimum or both; and the binary logit,which is suitable for analyzing binary data. For maxi-mum flexibility the parameters that characterize thetrajectory of each group are specified to vary freelyacross groups. This allows for substantial cross-groupdifferences in the shape of trajectories.

The examples used to illustrate the method wereselected to demonstrate four of the method's mostimportant capabilities: (a) the capability to identifyrather than assume distinctive developmental trajec-tories; (b) the capability to estimate the proportion ofthe population best approximated by the various tra-jectories so identified; (c) the capability to relate theprobability of membership in the various trajectorygroups to characteristics of the individual and his orher circumstances; and (d) the capability to use theposterior probabilities of group membership for vari-ous other purposes such as sampling, creating profilesof group membership, or serving as regressors in mul-livariate statistical analyses.

The modeling strategy described here has only re-cently been developed. Opportunities for extensionabound. Three extensions seem particularly worth-while. Further development of approaches for decid-ing on the best fitting number of groups would beextremely valuable. Although BIC scores are veryuseful for this purpose, in addition it would be helpfulto have measures of the goodness of fit between actualand predicted trajectories. A second useful extensioninvolves the joint estimation of trajectories of differ-ent behaviors—for example, joint estimation of psy-chological well-being in childhood and employmentstatus over adulthood. This would provide the capac-ity to examine the relationship between the develop-ment patterns of distinct but likely linked behaviors.A third valuable extension involves developing diag-nostics for calibrating the degree to which develop-mental trajectories cluster into distinct groups or varyregularly according to some specified parametric dis-tribution (e.g., multivariate normal). Nagin and Trem-blay (in press) described another rationale that is notdiscussed here for the group-based modeling strat-egy—to avoid making strong and generally untestableassumptions about the population distribution of de-

velopmental trajectories. The semiparametric mixturecan be thought of as approximating an unspecified butlikely continuous distribution of population heteroge-neity in developmental trajectories. In so doing, astandard procedure in nonparametric and semipara-metric statistics of approximating a continuous distri-bution by a discrete mixture is adopted (Follman &Lambert, 1989; Heckman & Singer, 1984; Lindsay,1995). A difficult but valuable research effort wouldbe to develop metrics for calibrating the degree towhich trajectory groups are actually closely approxi-mating a specified continuous distribution.

References

Appel, T. A. (1987). The Cuvier-Geoffroy debate: Frenchbiology in the decades before Darwin. New York: OxfordUniversity Press.

Bryk, A. S., & Raudenbush, S. W. (1987). Application ofhierarchical linear models to assessing change. Psycho-logical Bulletin, 101, 147-158.

Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical lin-ear models for social and behavioral research: Applica-tion and data analysis methods. Newbury Park, CA:Sage.

Cameron, A. C., &Trivedi, P. K. (1986). Econometric mod-els based on count data: Comparisons and applications ofsome estimators and tests. Journal of Applied Economet-rics, I, 29-53.

Cantor, N., & Genero, N. (1986). Psychiatric diagnosis andnatural categorization: A close analogy. In T. Millon &G. Klerman (Eds.), Contemporary directions in psycho-pathology (pp. 233-256). New York: Guilford Press.

Caspi, A. (1998). Personality development across the lifecourse. In W. Daom (Series Ed.) & N. Eisenberg (Vol.Ed.), Handbook of child psychology: Vol. 3. Social, emo-tional, and personality development (5th ed., pp. 311-388). New York: Wiley.

D'Unger, A., Land. K., McCall, P., & Nagin, D. (1998).How many latent classes of deliquent/criminal careers?Results from mixed Poisson regression analyses of theLondon, Philadelphia, and Racine cohorts studies. Ameri-can Journal of Sociology, 103, 1593-1630.

Erdfelder, E. (1990). Deterministic developmental hypoth-eses, probabilistic rules of manifestation, and the analysisof finite mixture distributions. In A. von Eye (Ed.), Sta-tistical methods in longitudinal research: Time series andcategorical longitudinal data (Vol. 2, pp. 471-509). Bos-ton: Academic Press

Farrington, D. P., & West, D. J. (1990). The Cambridgestudy in delinquent development: A prospective longitu-

Page 17: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

ANALYZING DEVELOPMENTAL TRAJECTORIES 155

dinal study of 411 males. In H. Kerner & G. Kaiser(Eds ), Criminality: Personality, behavior, and life his-tory ipp. 115-138). New York: Springer-Verlag.

Follmari, D. A., & Lambert, D. (1989). Generalized logisticregression by nonparametric mixing. Journal of theAmerican Statistical Association, 84, 295-300.

Ghosh, J. K., & Sen, P. K. (1985). On the asymptotic per-formance of the log-likelihood ratio statistic for the mix-ture model and related results. In L. M. LeCam & R. A.Olshen (Eds.), Proceedings of the Berkeley Conference inHonor of Jerzy Neyman and Jack Kiefer (Vol. 2, pp.789-806). Monterey, CA: Wadsworth.

Goldstein, H. (1995). Multilevel statistical models (2nd ed.).London: Edward Arnold.

Greene. W. H. (1990). Econometric analysis. New York:Macmillan.

Greene. W. H. (1991). LIMDEP User's Manual (Version6.0) 'computer software], Bellport, NY: EconometricsSoftware, Inc.

Heckrriin, J., & Singer, B. (1984). A method for minimizingthe i npact of distributional assumptions in econometricmodels for duration data. Econometrica, 52, 271-320.

Holyoak, K., & Spellman, B. (1993). Thinking. Annual Re-view of Psychology, 44, 265-315.

Jones, B.L., Nagin, D. S., & Roeder, K. (1998). A SASprocedure based on mixture models for estimating devel-opmental trajectories (Working Paper No. 684). CarnegieMellon University, Department of Statistics.

Kandel. D. B. (1975). Stages in adolescent involvement indrug use. Science, 190, 912-914.

Kass, R. E., & Raftery, A. E. (1995). Bayes factor. Journalof th,- American Statistical Association, 90, 773-795.

Kass, R E., & Wasserman, L. (1995). A reference Bayesiantest :or nested hypotheses and its relationship to theSchwarz criterion. Journal of the American StatisticalAssociation, 90, 928-934.

Keribin. C. (1997). Consistent estimation of the order ofmixture models (Working Paper No. 61). Universited'Evry-Val d'Essonne, Laboratorie Analyse et Proba-bilite.

Lambeu, D. (1993). Zero-inflated Poisson regression, withan application to defects in manufacturing. Technomet-rics, 34, 1-13.

Land, K., McCall, P.. & Nagin, D. (1996). A comparison ofPoisson, negative binomial, and semiparametric mixedPoisson regression models with empirical applications tocriminal careers data. Sociological Methods & Research,24, 387^40.

Land, K., & Nagin, D. (1996). Micro-models of criminalcareers: A synthesis of the criminal careers and life

course approaches via semiparametric mixed Poissonmodels with empirical applications. Journal of Quantita-tive Criminology, 12, 163-191.

Lindsay, B. G. (1995). Mixture models: Theory, geometry,and applications. Hayward, CA: Institute of Mathemati-cal Statistics.

Loeber, R. (1991). Questions and advances in the study ofdevelopmental pathways. In D. Cicchetti & S. Toth(Eds.), Models and integrations. Rochester Symposiumon developmental psychopathology (Vol. 3, pp. 97-115).Rochester, NY: University of Rochester Press.

MacCallum, R. C., Kim, C., Malarkey, W. B., & Kiecolt-Glaser, J. K. (1997). Studying multivariate change usingmultilevel models and latent curve models. MultivariateBehavioral Research, 32, 215-253.

Maddala, G. S. (1983). Limited dependent and qualitativevariables in econometrics. New York: Cambridge Uni-versity Press.

Markman, E. M. (1989). Categorization and naming in chil-dren: Problems of induction. Cambridge, MA: MITPress.

McArdle, J. J., & Epstein, D. (1987). Latent growth curveswithin developmental structural equation models. ChildDevelopment, 58, 110-133.

Meredith, W., & Tisak, J. (1990). Latent curve analysis.Psychometrika, 55(1), 107-122.

Moffitt, T. E. (1993). Adolescence-limited and life-coursepersistent antisocial behavior: A developmental tax-onomy. Psychological Review, 100, 674-701.

Muthen, B. O. (1989). Latent variable modeling in hetero-geneous populations. Psychometrika, 54(4), 557-585.

Muthen, B. O. (in press). Second-generation structuralequation modeling with a combination of categorical andcontinuous latent variables: New opportunities for latentclass/latent curve modeling. In A. Sayers & L. Collins(Eds.), New methods for the analysis of change. Wash-ington, DC: American Psychological Association.

Muthen, B. O., & Curran, P. (1997). General longitudinalmodeling of individual differences in experimental de-sign: A latent variable framework for analysis and powerestimation. Psychological Methods, 2, 371-402.

Nagin, D., Farrington, D., & Moffitt, T. (1995). Life-coursetrajectories of different types of offenders. Criminology,33, 111-139.

Nagin, D., & Land, K. (1993). Age, criminal careers, andpopulation heterogeneity: Specification and estimation ofa nonparametric, mixed Poisson model. Criminology, 31,327-362.

Page 18: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

156 NAGIN

Nagin, D., & Tremblay, R. E. (in press). Trajectories ofboys' physical aggression, opposition, and hyperactivityon the path to physically violent and nonviolent juveniledelinquency. Child Development.

Parzen, E. (1960). Modern probability theory and its appli-cations. New York: Wiley.

Patterson, G. R., DeBaryshe, B. D., & Ramsey, E. (1989). Adevelopmental perspective on antisocial behavior. Ameri-can Psychologist, 44. 329-335.

Raftery, A. E. (1995). Bayesian model selection in socialresearch. Sociological Methodology, 25, 111-164.

Raudenbush, S. W. (in press). Toward a coherent frame-work for comparing trajectories of individual change. InA. Sayers & L. Collins (Eds.), New methods far theanalysis of change. Washington DC: American Psycho-logical Association.

Rindskopf, D. (1990). Testing developmental models usinglatent class analysis. In A. von Eye (Ed.), Statisticalmethods in longitudinal research: Time series and cat-egorical longitudinal data (Vol. 2, pp. 443^469). Boston:Academic Press.

Roeder, K., Lynch, K., & Nagin, D. (in press). Modelinguncertainty in latent class membership: A case study fromcriminology. Journal of the American Statistical Associa-

Schwarz, G. (1978). Estimating dimensions of a model. An-nals of Statistics, 6, 461-464.

STATA 4.0 [Computer software]. (1995). College Station,TX: STATA Corporation.

Titterington, D. M., Smith, A. F. M., & Madov, U. E.(1985). Statistical analysis of finite mixture distributions.New York: Wiley.

Tobin, J. (1958). Estimation of relationships for limited de-pendent variables. Econometrica, 26, 24-36.

Tremblay, R. E., Desmarais-Gervais, L., Gagnon, C., &Charlebois, P. (1987). The preschool behavior question-naire: Stability of its factor structure between culture,sexes, ages and socioeconomic classes. InternationalJournal of Behavioral Development, 10, 467-484.

Wasserman, L. (1997). Bavesian model selection and modelaveraging (Working Paper No. 666). Carnegie MellonUniversity, Department of Statistics.

Widiger, T. A., & Frances, A. (1985). The DSM-Hl person-ality disorders. Archives of General Psvchiatry, 42, 615-623.

Willett, J. B., & Sayer, A. G. (1994). Using covariancestructure analysis to detect correlates and predictors ofindividual change over time. Psychological Bulletin, 116,363-381.

Appendix

Derivation of Likelihood

As described in the main text, the form of the like-lihood for each individual i is

where P(Y-) is the unconditional probability of ob-serving individual /'s longitudinal sequence of behav-ioral measurements — Yt, P'(Y,) is the probability of Y/jnven membership inj, and TT; is the probability of j.Thus, the likelihood for the entire sample of N indi-viduals is

N

L =

For given j, conditional independence is assumedfor the sequential realizations of the elements of Y/, y,,over the T ages of measurement. Thus,

T

where pi(yit) is the probability distribution function ofv,, given membership in group /

For the censored normal, /^'(v,-,) equals

^min P Xil

(T ' \ afor S,nin := y,, < ax' and

where cj> and <J> are, respectively, the density functionand cumulative distribution function of a normal ran-

Page 19: Analyzing Developmental Trajectories- A Semiparametric, Group-Based Approach

ANALYZING DEVELOPMENTAL TRAJECTORIES 157

dom variable with mean ^xit = (jj, + $\ Age,, +P^Agef, and standard deviation cr, and Smin and Smax

are, respectively, the scale minimum and maximum.For the Poisson-based model,

/%,,)=-

more general form of the Poisson-based model thatmakes use of the zero-inflated Poisson distribution.

Finally, for the binary logit-based model,

Also, see Land, McCall, and Nagin (1996), Naginand Land (1993), or Roeder, Lynch, and Nagin (inpress) for a derivation of the likelihood for the still

Received June 15, 1998Revision received October 16, 1998

Accepted January 21, 1999

Members of Underrepresented Groups:Reviewers for Journal Manuscripts Wanted

If you are interested in reviewing manuscripts for APA journals, the APA Publicationsand Communications Board would like to invite your participation. Manuscript reviewersare vital to the publications process. As a reviewer, you would gain valuable experiencein publishing. The P&C Board is particularly interested in encouraging members ofUnderrepresented groups to participate more in this process.

If you are interested in reviewing manuscripts, please write to Demarie Jackson at theaddress below. Please note the following important points:

• To be selected as a reviewer, you must have published articles in peer-reviewedjournals. The experience of publishing provides a reviewer with the basis forpreparing a thorough, objective review.

• To select the appropriate reviewers for each manuscript, the editor needs detailedinformation. Please include with your letter your vita. In your letter, please identifywhich APA journal you are interested in, and describe your area of expertise. Be asspecific as possible. For example, "social psychology" is not sufficient—youwould need to specify "social cognition" or "attitude change" as well.

• Reviewing a manuscript takes time. If you are selected to review a manuscript, beprepared to invest the necessary time to evaluate the manuscript thoroughly.

Write to Demarie Jackson, Journals Office, American Psychological Association, 750First Street, NE, Washington, DC 20002-4242.