Supplemental Materials

Normative Development of Physical Aggression From 8 to 26 Months

by Ane Nærde et al., 2014, Developmental Psychology

http://dx.doi.org/10.1037/a0036324

Appendix A

Descriptive Statistics for Predictor Variables After Expectation Maximization Imputation (N = 1,148)

Variable                         M (SD) or %     Range
Boy                              52%
Same age siblings                61%
Two parents                      96%
Young mother (≤25)               14%
Young father (≤25)               6%
Maternal education               14.30 (2.60)    9–18
Paternal education               13.86 (2.61)    9–18
Maternal mental distress         1.34 (0.37)     1.00–3.77
Paternal mental distress         1.28 (0.28)     1.00–4.00
Child Distress to Limitations    1.69 (0.31)     1.00–2.82
Child Activity                   2.07 (0.28)     1.09–2.83
Child Soothability               2.63 (0.29)     1.56–3.00




Appendix B

Rasch Item, Category, and Person Estimates

We employed Rasch scaling (Bond & Fox, 2007) to construct a single linear scale that

could accommodate and equate response patterns with varying numbers of completed items on

different response formats. The Rasch model estimates locations of items and persons on one

single linear latent dimension expressed in logits, with probabilistic equations linking

manifest data (i.e., item endorsement by individuals) to the latent model. Thus, by means of

the Rasch scaling, the aggression items, as well as interview or questionnaire response

vectors, were arranged along an underlying dimension representing less to more aggression,

where each unit on the scale represents an equal increase in aggression.

Item estimates for dichotomous items represent the point where item endorsement is

as likely as not. For Likert-type items, both items and thresholds between response

categories are estimated. The estimate for a Likert-type item plus the estimate of the

threshold between two response categories represents the point where endorsement of the

higher category is as likely as not (Bond & Fox, 2007). Persons (or, in our

case, response sets) receive estimates on the same underlying dimension. A person estimate

exceeding an item estimate (or item estimate plus category threshold) indicates that the

person is more likely than not to endorse the particular item (or category). Rasch modeling

does not involve interpolation in the case of missing data; pairs of items are compared based

on persons who contributed reports on both items, and pairs of persons are compared based

on items that both had responded to.
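
As an illustration that is not part of the original analyses, the relations described above can be sketched in a few lines of Python. The sketch assumes the standard dichotomous Rasch model and the Andrich rating scale parameterization; the function names are hypothetical, while the item and threshold values are those reported in this appendix and in Table SB2.

```python
import math

def p_endorse(theta: float, delta: float) -> float:
    """Dichotomous Rasch model: probability of a "yes" for a person at theta (logits)
    on an item located at delta (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def category_probs(theta: float, delta: float, taus: list[float]) -> list[float]:
    """Rating scale model: probabilities of categories 1..len(taus)+1."""
    # Running sums of (theta - delta - tau_j); the lowest category has an empty sum (0).
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

TAUS = [-3.62, -0.20, 3.82]  # category thresholds reported in this appendix
HITS_YOU_TEL = -2.63         # "Hits you", dichotomous telephone form (Table SB2)
HITS_YOU_Q = -0.98           # "Hits you", 4-point questionnaire form (Table SB2)

# A person located exactly at the item estimate endorses the item with probability .5.
print(p_endorse(HITS_YOU_TEL, HITS_YOU_TEL))

# At item estimate + threshold, the two adjacent categories are equally likely.
probs = category_probs(HITS_YOU_Q + TAUS[1], HITS_YOU_Q, TAUS)
print([round(p, 3) for p in probs])
```

In the second call the person is placed at the item estimate plus the second threshold, so Categories 2 and 3 receive the same probability, which is the sense in which the higher category is "as likely as not."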

Rasch modeling consequently allows estimating items on a common scale even when

not all persons have data on all items, as long as there is a substantial number of persons

who have overlapping data for subsets of items (Wright & Stone, 1999). We thus combined

information from personal interviews, where aggression items were rated on a frequency

scale, and information from telephone interviews with a yes or no response format. The

Rasch analyses equated these different response formats and produced item and person

estimates on a common scale.

In order to obtain Rasch item location and category threshold estimates in a sample

where measurements were independent (i.e., each person was represented only once) and in

which it would be possible to equate telephone and questionnaire responses, we created a

separate calibration subsample. This was done in a five-step procedure:

1. First, we identified 105 response sets for unique children from parents who had

completed both the 12-month questionnaire and a telephone interview within a 30-day time

frame, including both the questionnaire and the telephone-interview responses on aggression

in the response set.

2. Next, we identified 83 response sets for unique children (in no case the same as

those included in the previous step) from parents who had completed both the 24-month

questionnaire and a telephone interview within a 65-day time frame, again including

both the questionnaire and the telephone-interview responses in the set.

3. The remaining 969 participating children (out of the total of 1,157 not withdrawn

from the study at the time of the analyses, and not selected in either of the two previous steps)

were randomized into three groups, I, II, and III, using a random number generator and

prespecified cutoffs intended to yield an approximate distribution of 50%, 25%, and

25% across these groups, so as to achieve a roughly equal balance between

telephone-interview and questionnaire response sets in the calibration subsample. For the children

randomized to Group I, one single telephone-interview response set (from any age) was

selected, if available, resulting in 387 included interviews.

4. From the children randomized to Group II, one single response set from the 12-

month questionnaire was included if available, resulting in 245 included sets.

5. From the children randomized to Group III, one single response set from the 24-

month questionnaire was included if available, resulting in 218 included sets.

In all, our calibration sample thus came to be composed of 1,038 response sets from

unique children. That 119 children were not represented in the calibration sample was a result

of some families not having contributed a 12-month, a 24-month, or any telephone interview

to the database, depending on their selection into Group I, II, or III in Steps 3-5 above

(participation in all parts of the project was optional; a minority of families had opted to not

take part in telephone interviews; at each age some families were unable to participate even

though they remained in the project; a small proportion of families chose to discontinue their

participation in data collection up to the age span included here; and in very few exceptional

cases individual interviews had been lost due to technical failures at data collection).
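
The bookkeeping of the five-step procedure can be checked with a short sketch. The counts are those reported above; the `assign_group` helper, its cutoffs at .50 and .75, and the seed are hypothetical stand-ins for the random number generator and prespecified cutoffs mentioned in Step 3, whose exact values are not given here.

```python
import random

N_NOT_WITHDRAWN = 1157
STEP_COUNTS = {
    "Step 1: 12-month questionnaire + telephone interview": 105,
    "Step 2: 24-month questionnaire + telephone interview": 83,
    "Step 3: Group I, telephone interview": 387,
    "Step 4: Group II, 12-month questionnaire": 245,
    "Step 5: Group III, 24-month questionnaire": 218,
}

remaining = N_NOT_WITHDRAWN - 105 - 83         # children randomized in Step 3
calibration_n = sum(STEP_COUNTS.values())      # response sets in the calibration sample
not_represented = N_NOT_WITHDRAWN - calibration_n

def assign_group(u: float) -> str:
    """Map a uniform draw in [0, 1) to a group (hypothetical cutoffs at .50 and .75)."""
    if u < 0.50:
        return "I"
    if u < 0.75:
        return "II"
    return "III"

rng = random.Random(2014)  # arbitrary seed, used only to make the sketch reproducible
groups = [assign_group(rng.random()) for _ in range(remaining)]

print(remaining, calibration_n, not_represented)
print({g: groups.count(g) for g in ("I", "II", "III")})
```

The three derived quantities reproduce the 969 randomized children, the 1,038 response sets, and the 119 children not represented in the calibration sample.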

A series of Rasch analyses were performed on the calibration set with respect to

measurement precision and the underlying assumption of a unidimensional latent variable,

equally valid across the time span and for boys and girls. First, we addressed the rating scale

format for questionnaire responses. The original response format was a 7-point scale with 1

(never/not in the past year), 2 (on single occasions), 3 (1-3 times per month), 4 (once per

week), 5 (2-3 times per week), 6 (1-2 times per day), and 7 (3 times per day or more).

Rating-scale analysis results, including small and in one case non-monotonic increases in thresholds

as well as indistinct category-response probabilities across the variable span, suggested

that the number of categories was larger than optimal. Several possible solutions for

collapsing the original 7-point scale, shown in Table SB1, were inspected for category

frequencies, average measures (which increased monotonically over categories for all

solutions), threshold estimates, probability curves (i.e., distinct), category fit (i.e., maximum

mean square outfit statistic), and person reliability (Bond & Fox, 2007; Linacre, 1999, 2013).

The best result, shown as the last solution in Table SB1, was obtained by collapsing

Categories 2, 3, and 4 into one and Categories 5 and 6 into one, which resulted in a clear

distinction of average measures across categories, wide and equal spacing of thresholds, clear

distinction of category probabilities across the variable continuum, and a maximum outfit

mean square of 1.33 for any category. The resulting 4-point scale used in all subsequent

analyses was thus 1 (never/not in the past year), 2 (on single occasions/1-3 times per

month/once per week), 3 (2-3 times per week/1-2 times per day), and 4 (3 times per day or

more).
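
The effect of a collapsing solution on category frequencies can be reproduced from the observed counts given in the note to Table SB1. The following sketch is illustrative only; the `collapse` helper is a hypothetical name, and the solution strings follow the notation of Table SB1.

```python
# Observed counts in the original Categories 1 through 7 (note to Table SB1).
ORIGINAL_COUNTS = [1580, 896, 543, 283, 329, 171, 31]

def collapse(counts: list[int], solution: str) -> list[int]:
    """Sum original category counts into the recoded categories named by `solution`."""
    collapsed = [0] * int(max(solution))
    for count, new_category in zip(counts, solution):
        collapsed[int(new_category) - 1] += count
    return collapsed

final = collapse(ORIGINAL_COUNTS, "1222334")  # the solution adopted for all later analyses
print(final, min(final), max(final))          # [1580, 1722, 500, 31] 31 1722
```

The minimum and maximum frequencies of 31 and 1,722 agree with the row for solution 1222334 in Table SB1.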

Next, we inspected items for differential item functioning (DIF) across age. Several of

the items were only applied to a narrow age band (i.e., only to the 24-month questionnaire, or

in telephone interviews from 18 to 23 months) and needed only one age estimate (see Table

SB2). DIF analyses were applied to inspect the remaining items for the need for separate item

location estimates for younger (i.e., personal interview at 12 months and telephone interviews

up to 17 months) versus older (i.e., personal interview at 24 months and telephone interviews

from 18 months onward) ages. We applied conservative criteria of a DIF contrast of at least

0.5 logits and a t test significance level of p < .05. The results suggested that one item (biting)

needed to be estimated separately for younger versus older children in both telephone

interviews and questionnaires (the DIF contrast was 1.0 for telephone interview data and 1.1

for questionnaire data). When we accordingly re-analyzed the data set with separate item

estimates for biting for younger versus older children, it became clear that the item for

younger children fit poorly within the measurement model for both telephone interview and

questionnaire data (the outfit mean square was 1.6 for both items), suggesting that something

other than physical aggression played an important role in levels of biting for young children.

We found one plausible explanation in the fact that children develop teeth at varying ages and

that many children have a period of frequent biting coinciding with the arrival of teeth.

Having identified, although post hoc, a plausible reason for biting not being an optimal

indicator of physical aggression at young ages, we eliminated the item for measurements

under 18.0 months in all further analyses. After excluding biting for younger children, there

were no further DIF findings for age by the chosen criteria.

Finally, DIF analyses were applied to reveal the need for separate item location

estimates for boys versus girls. These analyses led to the finding that the item concerning

throwing things at others should be estimated separately for boys and girls in telephone as

well as personal interview data (the DIF contrast was 0.9 for telephone interview data and 0.6

for questionnaire data). We accordingly estimated this item separately for boys and girls in

both telephone-interview and questionnaire data. There were no further significant DIF

findings for child gender.

Item location estimates for dichotomous and rating-scale items based on the response

sets in the calibration subsample after the above modifications of the scale are shown in

Table SB2. The estimates of category thresholds were –3.62, –0.20, and 3.82, respectively,

for the thresholds between the four response categories in the questionnaire response

format. The mean standard error for item estimates was 0.15. The Rasch item reliability

estimate was .99 (both “model” and “real”). Item separation was 9.62. Commonly reported

Rasch-model fit statistics include infit and outfit indices for items and persons. Infit refers to

information-weighted or inlier-sensitive fit, sensitive to the coherence of item response

patterns and persons when item difficulty and person ability match more closely. Outfit refers to

outlier-sensitive fit, sensitive to the coherence of item response patterns and persons when

item difficulty and person ability are further apart. Infit and outfit are reported as mean

squares with departures from the expected value of 1.0 indicating the amount of randomness

in data, as well as standardized (z-transformed) values with departures from the expected

value of 0.0 indicating how improbable the observed fit would be if the data fit the model perfectly (Linacre, 2002).

The average item infit mean square measure was 0.99, and the mean standardized infit –0.1

with a standard deviation of 1.8. The average item outfit mean square measure was 0.97, and

the mean standardized outfit –0.2 with a standard deviation of 1.7. Visual inspection of ICC

plots suggested good to reasonable fit of actual reported level relative to theoretical ICC for

all items. The item “Pushing someone to get his/her will,” from questionnaire data, displayed

the highest (i.e., most underfitting) infit (mean square), 1.21, as well as outfit (mean square),

1.23 (within the range of values commonly found in questionnaire items); this was the only

item with a standardized outfit (3.7) exceeding 2.0. Deleting this item from the scale would

result in an increase in minimum-estimated person measurements and a decrease in person

reliability; also, Rasch factor analysis (below) did not suggest that this item contributed

important variance contrary to an underlying unidimensional construct; on these grounds and

because we could not identify a theoretical rationale to exclude the item, we decided to keep

it in the scale.
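
For readers unfamiliar with the fit indices discussed above, the following sketch computes infit and outfit mean squares for a single dichotomous item, assuming the conventional definitions (outfit as the unweighted mean of squared standardized residuals, infit as the sum of squared residuals divided by the sum of model variances). The person locations and responses are made up for illustration and are not study data.

```python
import math

def item_fit(thetas, delta, responses):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    squared_residuals, variances, z_squared = [], [], []
    for theta, x in zip(thetas, responses):
        p = 1.0 / (1.0 + math.exp(-(theta - delta)))  # model-expected response
        variance = p * (1.0 - p)                       # model variance of the response
        squared_residuals.append((x - p) ** 2)
        variances.append(variance)
        z_squared.append((x - p) ** 2 / variance)      # squared standardized residual
    outfit = sum(z_squared) / len(z_squared)           # unweighted, outlier-sensitive
    infit = sum(squared_residuals) / sum(variances)    # information-weighted
    return infit, outfit

thetas = [-3.0, -1.0, 0.0, 1.0, 3.0]                   # hypothetical person locations
orderly = item_fit(thetas, 0.0, [0, 0, 1, 1, 1])       # responses follow person order
surprising = item_fit(thetas, 0.0, [1, 0, 1, 1, 1])    # unexpected "yes" from the lowest person

print("orderly    infit, outfit:", orderly)
print("surprising infit, outfit:", surprising)
```

The orderly pattern yields mean squares below 1.0 (overfit), whereas the single unexpected response far from the item location raises outfit more than infit, which is why outfit is described as outlier-sensitive.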

The items “Hits you” and “Hits siblings” from questionnaire data had standardized

outfits less than –2.0 (for both the dichotomous and rating-scale forms) suggesting overfitting

or “better than expected” performance of these items with regard to the underlying

measurement model. It has been suggested that overfit may have no practical implications in

common human-science applications (Bond & Fox, 2007, p. 240).

The mean standard error for person estimates was 0.98 excluding measures for

extreme scores, and 1.21 including measures for extreme scores. The Rasch “model” person

reliability estimate (considered to be an upper bound to a standard-error based reliability

estimate equivalent to traditional “test” reliability), including measures for extreme scores,

was .56, and the “real” (lower bound) person reliability estimate was .51. These modest-size

estimates are consistent with many response sets consisting of only two or three dichotomous

items, as was further confirmed in separate calculations of model reliability estimates in

response sets of varying lengths in the calibration sample (i.e., the mean-squared SE estimate

in each group of response sets divided by the sample variance). For nine-item questionnaire

response sets at age 24 months, the model-based reliability estimate was .88, while the

corresponding estimate for three-item questionnaire response sets at age 12 months was .48. The

reliability of six-item dichotomous telephone interview response sets starting at 18 months

was estimated at .58, while the reliability for the three-item form for the younger ages

calculated in the same fashion was .26. The reliability for composite response sets (1

telephone interview + 1 questionnaire from the same respondent within a short time frame)

was .89 to .92 for 13- to 15-item sets at age about 24 months, and .58 to .59 for four- to six-

item sets at age about 12 months.

The person separation was 1.02. The mean person infit measure (mean square) was

0.98, and the mean standardized infit –0.1 with a standard deviation of 1.1. The proportion of

persons with a standardized infit exceeding the –2 to +2 range was 3.3%. The mean person

outfit measure (mean square) was 0.98, and the mean standardized outfit 0.0 with a standard

deviation of 1.1. The proportion of persons with a standardized outfit exceeding the –2 to +2

range was 3.6%. These results suggest a good fit of the measurement model for persons,

albeit with low measurement precision.
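
The link between the reliability and separation figures reported above can be sketched as follows. The sketch assumes the conventional Rasch definitions, with reliability taken as 1 minus the ratio of mean squared standard error to observed variance and separation as the square root of R / (1 - R). The function names are hypothetical, and the use of the sample (n - 1) variance is an assumption.

```python
import math

def model_reliability(standard_errors, measures):
    """Reliability = 1 - (mean squared SE) / (observed variance of the measures)."""
    mse = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    mean = sum(measures) / len(measures)
    variance = sum((m - mean) ** 2 for m in measures) / (len(measures) - 1)
    return 1.0 - mse / variance

def separation(reliability):
    """Separation index: sqrt(R / (1 - R))."""
    return math.sqrt(reliability / (1.0 - reliability))

# The "real" person reliability of .51 implies a separation of about 1.02.
print(round(separation(0.51), 2))
```

Under these definitions, the reported person separation of 1.02 is what the "real" person reliability estimate of .51 implies.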

Rasch factor analysis was performed to investigate the dimensionality of the

measured construct. The latent variable of the Rasch factor analysis accounted for 45.8% of

the variance (26.7% persons; 19.1% items; all three empirical estimates differed by no more

than 0.1% from model estimates). The unexplained variance in the first contrast accounted

for 5.6% of the total variance (10.3% of the unexplained variance). Thus, the notion of

unidimensionality of the measure was supported by the Rasch factor analysis results.
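
The percentages in this paragraph are related by simple arithmetic, shown here as a check using only the figures reported above (the variable names are illustrative).

```python
explained_by_persons = 26.7   # % of total variance explained by person measures
explained_by_items = 19.1     # % of total variance explained by item measures
first_contrast_total = 5.6    # % of total variance in the first contrast

explained = explained_by_persons + explained_by_items
unexplained = 100.0 - explained
first_contrast_share = 100.0 * first_contrast_total / unexplained

print(round(explained, 1), round(unexplained, 1), round(first_contrast_share, 1))
```

The first contrast's 5.6% of total variance corresponds to about 10.3% of the 54.2% that the latent variable leaves unexplained.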

The item and threshold estimates from the composite sample of unique response sets

were used to estimate Rasch-scaled person locations (i.e., person estimates) for all response

sets in our entire data set; these estimates were used as the measure of physical aggression in

our further analyses. Person location estimates, excluding minimum and maximum estimates,

ranged from –3.35 to +1.15 logits based on dichotomous items, and from –5.49 to +3.86

logits based on 4-point rating scale items. Including minimum and maximum estimates,

person estimates ranged from –6.86 to +5.06.

Inspection of person/item maps in the calibration subsample as well as of the

distribution of estimates (including minimum and maximum estimates) in the entire sample

revealed that discrimination at the top of the continuum was excellent for 4-point

questionnaire measurements at 12 and 24 months, with a fair number of threshold estimates

at the top of the continuum to discriminate among persons and, as a result, very few

maximum estimates (0.1%) based on such data. Poor discrimination with more minimum

estimated measures occurred most often for the children estimated to be low on physical

aggression in the earlier part of the age span covered, that is, up to 17 months, and

particularly when measured by telephone interview, which in most of these cases included

only two or three dichotomous items. Telephone interviews with few items up to 17 months

also accounted for almost all of the maximum person estimates. All in all, the discrimination

was better at the top of the continuum (most aggressive) than at the bottom (least aggressive).

If the measurement had been based on single measurements, it might have been preferable to

exclude minimum and maximum estimates, which would not be very informative of

children’s aggression levels. However, in the current context of a longitudinal study with

several measurements per child, minimum as well as maximum estimates add meaningful

information to the estimation of individual as well as group-level development (Bond & Fox,

2007).

In order to lessen the influence of a small proportion of extreme data points (4%; in

all but 10 cases these were minimum or maximum estimated scores) on the growth curve model

estimates, scores above 3.86 (the highest non-maximum-estimated score) were recoded to

3.87, and score estimates below –5.09 (the lowest non-minimum-estimated score based on

the 12-month personal interview) were recoded to –5.10.
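
The recoding rule can be written as a small function. This is an illustrative sketch using the cutoffs and replacement values stated above; the function name is hypothetical.

```python
UPPER_CUTOFF, UPPER_RECODE = 3.86, 3.87
LOWER_CUTOFF, LOWER_RECODE = -5.09, -5.10

def recode_extreme(score: float) -> float:
    """Pull scores beyond the cutoffs in to just outside the non-extreme range."""
    if score > UPPER_CUTOFF:
        return UPPER_RECODE
    if score < LOWER_CUTOFF:
        return LOWER_RECODE
    return score

print([recode_extreme(s) for s in (-6.86, -5.09, -1.70, 3.86, 5.06)])
```

Scores exactly at a cutoff are left unchanged; only scores beyond it are recoded.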

Rasch analyses were performed in Winsteps, Version 3.80.1 (Linacre, 2013). The

third author wrote software to compute Rasch-model person estimates from known item

and category estimates, using the algorithm presented by Linacre (1998).

References

Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in

the human sciences (2nd ed.). New York, NY: Routledge.

Linacre, J. M. (1998). Estimating Rasch measures with known polytomous item difficulties.

Rasch Measurement Transactions, 12, 638.

Linacre, J. M. (1999). Investigating rating scale category utility. Journal of Outcome

Measurement, 3, 103-122.

Linacre, J. M. (2002). What do infit and outfit, mean-square and standardized mean? Rasch

Measurement Transactions, 16, 878.

Linacre, J. M. (2013). Winsteps® Rasch measurement (Version 3.80.1) [Computer software].

Chicago, IL: Winsteps.com.

Wright, B., & Stone, M. (1999). Measurement essentials (2nd ed.). Wilmington, DE: Wide

Range.

Table SB1

Diagnostic Statistics for Rasch Rating Scale Collapsing Procedure

                          Category frequency   Category threshold increase   Probability curve   Max. outfit   Person
Solution                  Min.      Max.       Min.          Max.            distinct peaks      mean square   reliability

7 categories (original)
  1234567                 31        1580       Negative      1.47            No                  1.45          .36

6 categories
  1223456                 31        1580       Negative      2.45            No                  1.51          .42
  1233456                 31        1580       0.27          1.65            No                  2.10          .39
  1234456                 31        1580       Negative      1.89            No                  1.40          .38
  1234566                 202       1580       Negative      1.02            No                  1.14          .36

5 categories
  1223345                 31        1580       1.38          1.97            Yes, even, flat     1.53          .45
  1223445                 31        1580       Negative      4.11            No                  1.32          .43
  1223455                 202       1580       Negative      2.47            No                  1.21          .42
  1222345                 31        1722       0.00          3.07            No                  1.59          .45
  1233445                 31        1580       0.37          3.09            Barely              1.27          .40
  1233455                 202       1580       0.18          1.42            Barely              1.11          .39
  1233345                 31        1580       0.14          3.00            Barely              1.37          .41
  1234455                 202       1580       Negative      1.81            No                  1.08          .38
  1234445                 31        1580       Negative      4.32            No                  1.20          .39

4 categories
  1223344                 202       1580       1.32          2.01            Yes, even, flat     1.09          .45
  1223334                 31        1580       1.95          4.05            Yes, uneven         1.24          .45
  1222334                 31        1722       2.97          3.12            Yes, even, high     1.33          .47

Note. Solutions are represented by their sequence of recoded categories; for example,

1223456 indicates a solution where the original Categories 2 and 3 have been collapsed into

one. Rating scale collapsing analyses were performed using the calibration subsample, which

included 3833 rating-scale observations. The observed frequency count was 1580, 896, 543,

283, 329, 171, and 31, respectively, in the original response Categories 1 through 7. In all the

solutions tested, the average measure increased monotonically over categories.

Table SB2

Descriptive Statistics and Rasch Item Measure Estimates for Dichotomous Telephone Interview Responses and 4-Point Questionnaire Responses on Physical Aggression Items

Dichotomous telephone interview responsesᵃ; 4-point questionnaire responsesᵇ

Telephone columns: “Yes” for behavior in past 14 days, by age in months (-9, 10-11, 12-13, 14-15, 16-17, 18-19, 20-21, 22-23), then Rasch i.est.ᵉ

Questionnaire columns: 12-month formᶜ M (SD), 24-month formᵈ M (SD), then Rasch i.est.ᶠ

Hits you 36% 37% 34% 43% 55% 61% 60% 56% -2.63 1.81 0.78 2.22 0.57 -0.98

Hits siblingsᵍ 22% 24% 27% 36% 51% 59% 63% 59% -2.46 1.70 0.74 2.31 0.67 -0.98

Pushes someone to get will 18% 18% 31% 37% 47% 48% 54% 54% -1.89 1.56 0.70 2.08 0.66 -0.14

Boys - Throws things at others 40% 38% 46% 26% -1.55 2.02 0.66 0.17

Girls - Throws things at others 19% 24% 29% 25% -0.41 1.70 0.61 0.86

Hits other adults 1.67 0.61 1.26

Bites someoneʰ 29% 25% 23% -0.14 1.57 0.59 1.54

Pulls hair 1.57 0.65 1.74

Pinches someone 1.58 0.67 1.77

Kicks someone 8% 8% 10% 11% 1.22 1.35 0.55 2.60

Nⁱ 379 873 381 1348 756 887 1140 102 1216 1140

Note. i.est = item measure estimate.

ᵃ Participants were asked to respond yes to behavior descriptions that fit the child now or in the past 2 weeks. Age categories indicate child’s age in whole months (e.g., 10-11 includes children aged 10.00 until 11.99 months). Aggression items should have been administered from age 10 months, but in some cases were administered at earlier ages depending on choices made by the interviewer. For similar reasons, a small number of participants may have responded to items not intended for the age category (percentages for infrequently administered items [all with n < 8] are not displayed).

ᵇ Participants were asked to indicate the frequency on a 7-point scale, which was reduced to a 4-point scale for the current set of analyses based on Rasch rating scale analysis results: 1 (never/not in the past year), 2 (on single occasions/1-3 times per month/once per week), 3 (2-3 times per week/1-2 times per day), and 4 (3 times per day or more).

ᶜ The 12-month interview was to take place as close as possible to the child’s first birthday. The mean age at the interview was 12.2 months (SD = 0.6 months). 95% of interviews took place between ages 11.3 and 13.5 months.

ᵈ The 24-month interview was to take place as close as possible to the child’s second birthday. The mean age at the interview was 24.2 months (SD = 0.8 months). 95% of interviews took place between ages 23.3 and 25.6 months.

ᵉ Rasch item measure estimates for dichotomous items in joint analysis equating dichotomous and 4-point response test forms.

ᶠ Rasch rating scale model item measure estimates for 4-point items in joint analysis equating dichotomous and 4-point response test forms. The estimates of category thresholds were -3.62, -0.20, and 3.82, respectively.

ᵍ Answered when the child had sibling(s).

ʰ Responses on biting prior to 18 months were excluded from analysis.

ⁱ Highest number of valid responses for any of the items listed. There may be more than one interview per child (more than one telephone interview with either parent may have been conducted within the time frame, and both parents may have completed questionnaires).

Appendix C

Unconditional Growth Model

We built the unconditional growth model stepwise, modeling the assumed non-

linearity in the early development of physical aggression (PA) in accordance with theory,

prior research, and our initial plotting of mean levels of PA over time. Our first model

consisted of the average level of PA at 16.59 months only (i.e., intercept; Model 1 in Table

SC1). Successively, we added linear growth over time (i.e., age; Model 2), a change in this

growth over time (i.e., age squared; Model 3), a second change in growth over time (i.e., age

cubed; Model 4), and between-child heterogeneity around the

average linear growth (i.e., age random; Model 5). Each increasingly complex model showed

significantly improved model fit according to -2 log likelihood deviance statistics (Hox,

2002). When adding a random quadratic growth term in addition (Model 6, not tabled), the

model did not converge. Consequently, we considered the model presented as Model 5 in

Table SC1 as the final unconditional model including the intercept, the linear, quadratic and

cubic growth parameters, and the random variation around the linear growth parameter. Note

that in Models 4 and 5, the negative quadratic and cubic growth estimates were highly

significant albeit very small. Although these coefficients seem small, they are multiplied by

the child’s centered age to the power of two and three, respectively, in the full equation. Thus,

the small coefficients have considerable impact on the shape of the growth curve.
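
The stepwise deviance comparisons can be reproduced from the -2 log likelihood values and degrees of freedom in Table SC1. The sketch below is illustrative rather than the original analysis code; it uses closed-form chi-square tail probabilities, which are valid only for the 1- and 2-df differences that occur here, and it treats each comparison as an ordinary likelihood ratio test.

```python
import math

# (label, df, -2 log likelihood), taken from Table SC1
MODELS = [
    ("Model 1", 3, 31431.39),
    ("Model 2", 4, 30005.56),
    ("Model 3", 5, 29927.92),
    ("Model 4", 6, 29784.13),
    ("Model 5", 8, 29602.66),
]

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")

comparisons = []
for (_, df0, dev0), (label, df1, dev1) in zip(MODELS, MODELS[1:]):
    drop = dev0 - dev1                      # reduction in deviance
    comparisons.append((label, round(drop, 2), df1 - df0, chi2_sf(drop, df1 - df0)))

for label, drop, df_diff, p in comparisons:
    print(f"{label}: deviance drop = {drop}, df = {df_diff}, p = {p:.3g}")
```

Each step reduces the deviance by far more than the critical value for its degrees of freedom, consistent with the statement that every added term improved model fit.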

This model suggested that the average normative development of PA, as modeled in

our data, takes a nonlinear form (see Figure SC1). On average, there is little growth, and in

fact a small decrease, in the frequency of PA prior to 10 months. This is followed by a rapid

increase with an estimated peak around 20 to 22 months, and eventually a decrease toward 26

months. The estimated scale score at 8 months (i.e., -3.2; see Table SC2), corresponds to very

low frequencies of manifest PA (cf. Table 1). At 10 months, the average scale score

(i.e., -3.4) is at its lowest level, and in the range that corresponds to, for instance, 36% reports

of “Hits you.” Thereafter, estimated PA scores increase to peak level; around 20 to 22

months, the average PA score (i.e., -1.7) is in the range that corresponds to, for instance, 74%

of the children having reports of “Hits you.” After this age, PA decreases until 26 months, at

which the level equals that of 17- to 18-month-olds. Importantly, such complex models tend

to be less precise in the periphery; hence results regarding levels of PA at 26 months should

be interpreted with due caution.
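
The shape of the average curve can be sketched from the fixed part of Model 5 in Table SC1. Because the tabled coefficients are rounded to three decimals, the values computed this way only approximate the point estimates in Table SC2, and they diverge noticeably toward the edges of the age range; the sketch is meant to show how the cubic terms produce the early dip and the later peak, not to reproduce the table.

```python
# Fixed-part coefficients of Model 5 (Table SC1), rounded as tabled.
INTERCEPT, LINEAR, QUADRATIC, CUBIC = -2.419, 0.226, -0.003, -0.002
MEAN_AGE = 16.59  # age in months on which the models were centered

def predicted_pa(age_months: float) -> float:
    """Average predicted physical aggression (logits) at a given age."""
    a = age_months - MEAN_AGE
    return INTERCEPT + LINEAR * a + QUADRATIC * a ** 2 + CUBIC * a ** 3

curve = {age: predicted_pa(age) for age in range(8, 27)}
lowest_age = min(curve, key=curve.get)
peak_age = max(curve, key=curve.get)

for age, value in curve.items():
    print(age, round(value, 2))
print("lowest at", lowest_age, "months; peak at", peak_age, "months")
```

With the rounded coefficients, the curve is lowest at 10 months and highest at 22 months, in line with the description of a small early decrease followed by a peak around 20 to 22 months.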

The random part of the final model suggests that there was considerable residual

variance both within children over time (σ²e = 1.597, p < .001) and between children at

intercept (σ²u0 = 0.927, p < .001), indicating that there was variance in the development of

PA at both levels (within- and between-child) yet to be explained in the model. There was

also significant, albeit not very strong, variance in the distribution of the coefficients for age

between children (σ²u1 = 0.023, p < .001), suggesting that the linear slope of PA differs

somewhat between children. Note that we were only able to estimate this residual variance

around linear age (as the model with random age² did not converge). Thus, the σ²e and σ²u1

coefficients are probably over-estimating the true residual variance in the full model. Finally,

the initial level of PA is positively related to the rate of growth; that is, higher levels of PA are

associated with steeper growth (σu01 = 0.007, p < .001). Figure SC1 shows the estimated

unconditional growth curve and the mean Rasch estimates of PA scores, demonstrating a fairly

high overlap.

The analyses presented above are based on reports by both parents. To ensure that

systematic reporter bias was not confounding our results, we re-estimated the models separately with

mother reports only and with father reports only. This did not change the coefficients substantively, though the

intercept was slightly higher for mothers. We also added a dummy variable as a fixed effect

indicating whether the mother or father reported. On average, mothers’ reports of PA were

0.146 units higher than fathers’ (p <.001), which is comparable to 1 month’s average linear

growth in PA. Thus, we maintained the parent dummy as covariate in the remaining models.

Table SC1

Estimates for Unconditional Growth Curve Models of Physical Aggression from Age 8 to 24 Months, Age Centered on Mean (16.59 Months; N = 1,148)

              Model 1             Model 2             Model 3             Model 4             Model 5
              intercept           + age               + age²              + age³              + age random
              Est.        SE      Est.        SE      Est.        SE      Est.        SE      Est.        SE

Fixed part
Intercept     -2.535***   0.032   -2.533***   0.032   -2.383***   0.036   -2.419***   0.036   -2.419***   0.036
Age (cent)                        0.128***    0.003   0.135***    0.003   0.224***    0.008   0.226***    0.008
Age²                                                  -0.007***   0.001   -0.003**    0.001   -0.003***   0.001
Age³                                                                      -0.002***   0.000   -0.002***   0.000

Random part
σ²e           2.234***    0.032   1.828***    0.031   1.808***    0.030   1.773***    0.030   1.597***    0.029
σ²u0          0.845***    0.049   0.902***    0.049   0.905***    0.049   0.904***    0.050   0.927***    0.049
σ²u1                                                                                          0.023***    0.004
σu01                                                                                          0.007***    0.001

df            3                   4                   5                   6                   8
-2LL          31431.39            30005.56            29927.92            29784.13            29602.66

Note. Est. = physical aggression estimate; cent = centered; -2LL = -2 log likelihood statistic.

**p < .01. ***p < .001.

Table SC2

Point Estimates for the Unconditional Growth Curve (Model 5 in Table SC1) and Raw Mean Physical Aggression Scores, Corresponding to Figure SC1

Child age (months)    Point estimate for PA    Mean PA
8                     -3.2                     -3.4
9                     -3.3                     -3.3
10                    -3.4                     -3.3
11                    -3.3                     -3.4
12                    -3.2                     -3.3
13                    -3.1                     -3.2
14                    -2.9                     -2.8
15                    -2.7                     -2.8
16                    -2.4                     -2.3
17                    -2.2                     -2.2
18                    -2.0                     -1.9
19                    -1.8                     -1.6
20                    -1.7                     -1.7
21                    -1.7                     -1.8
22                    -1.7                     -2.1
23                    -1.8                     -1.9
24                    -2.0                     -1.9
25                    -2.3                     -2.1
26                    -2.7                     -2.0

Note. PA = physical aggression.




Figure SC1. Mean raw scores and estimated unconditional growth curve of physical

aggression.

Appendix D

Point Estimates of Physical Aggression for the Conditional Growth Curves for Prototypical Cases With Average Levels of All Risk Factors, High Levels, and Low Levels, Corresponding to Figure 1

Child age (months)    Average risk    High risk    Low risk
8                     –3.2            –2.7         –4.0
9                     –3.3            –2.7         –4.2
10                    –3.3            –2.6         –4.2
11                    –3.2            –2.5         –4.2
12                    –3.1            –2.3         –4.2
13                    –2.9            –2.1         –4.1
14                    –2.7            –1.8         –3.9
15                    –2.5            –1.6         –3.7
16                    –2.3            –1.3         –3.5
17                    –2.0            –1.1         –3.3
18                    –1.8            –0.9         –3.1
19                    –1.7            –0.7         –3.0
20                    –1.5            –0.6         –2.9
21                    –1.5            –0.5         –2.8
22                    –1.5            –0.5         –2.8
23                    –1.6            –0.6         –2.9
24                    –1.8            –0.9         –3.1
25                    –2.1            –1.2         –3.3
26                    –2.6            –1.7         –3.7

Appendix E

Raw PA Scores and Final Model Estimated Individual Growth Curves of Physical Aggression for Six Example Children

Figure SE1. Child age in months is shown on the x-axis, and Rasch-scaled physical

aggression (PA) measure on the y-axis. Circles represent raw (Rasch-scaled) PA scores at

measurement time points. The line represents the model predicted individual growth curve,

with crosses at measurement time points. All example children have data data points (sample

mean = 7.16) and include one 12-month and one 24-month personal interview. The example

children represent a variation with regard to elevation and shape of the curves. Children 1, 4,

and 6 represent some more common individual growth curves in the study sample. Children 2

(high), 3 (low), and 5 (flat) represent more rare individual growth curves. Children 1, 2, and 3

represent growth curves with smaller residuals (i.e., distances of raw PA scores from

predicted curve) being at the 10th, 26th, and 13th percentile, respectively, of the study

sample’s distribution of average squared residuals. Children 4, 5, and 6 represent growth

curves with larger residuals at the 62nd, 92nd, and 98th percentile, respectively, of the sample

distribution of average squared residuals.
