European Journal of Personality

Eur. J. Pers. 24: 258–277 (2010)

Published online in Wiley InterScience

(www.interscience.wiley.com) DOI: 10.1002/per.760

Construct Validation Using Multitrait-Multimethod-Twin Data: The Case of a General Factor of Personality

RAINER RIEMANN* and CHRISTIAN KANDLER

Department of Psychology, University of Bielefeld, Bielefeld, Germany

Abstract

We describe a behavioural genetic extension of the classic multitrait-multimethod study

design that allows estimating genetic and environmental influences on method effects in

twin studies (MTMM-T). Genetic effects and effects of the environment shared by siblings

are interpreted as indicators of convergent validity. In an application of the MTMM study

design, we used self- and peer report data to examine the higher-order structure of the

NEO-PI-R. Structural equation modelling did not support a general factor of personality in

multimethod data. The higher-order factor Stability turns out to be, at most, a weak trait

factor. Genetic effects on method factors indicate that especially self-reports but also peer

reports show convergent validity between twins but not between methods. Copyright ©

2010 John Wiley & Sons, Ltd.

Key words: personality traits; construct validity; behavioural genetics; twin; general

factor of personality

INTRODUCTION

When Campbell and Fiske (1959) introduced their model to validate psychological

measures, the focus was on the relative influence of two sources of variance: trait variance

and method variance. Psychological tests were conceptualized as trait-method units

emphasizing that any psychological measurement confounds trait variance with method

variance. Only by varying methods and traits systematically, both sources of variance can

be disentangled. The present paper focuses on the two most frequently used methods of

personality measurement – self- and peer reports – in order to show that behavioural

genetic designs to collect personality data provide an important extension of the classical

multitrait-multimethod (MTMM) analysis. We demonstrate the usefulness of this model in

a study of a general factor of personality (GFP) in self- and peer reports on the Revised

NEO Personality Inventory (NEO-PI-R; Costa & McCrae, 1992) using a German twin

sample.

*Correspondence to: Rainer Riemann, Department of Psychology, Bielefeld University, Universitätsstr. 25, D-33615 Bielefeld, Germany. E-mail: [email protected]

Received 28 October 2009

Revised 20 January 2010

Accepted 21 January 2010

In MTMM analyses, method variance refers to all effects a procedure of data collection

has on the covariation of traits measured by this procedure. Although there are some basic

categories for methods of personality measurement (like self-report, observer rating and

objective test), the definition of method and as a consequence the degree to which method

effects are controlled for depends on the aims of a particular study. Method effects can be

studied within these categories, by contrasting, for example positively or negatively scored

items of self-report questionnaires or between categories using data from multiple sources.

In addition, if we consider a broader sample of personality traits, a single mode of data

collection (e.g. observer ratings) may be linked to more than one method factor. For

example observers’ knowledge of target persons’ physical characteristics may distort

ratings on a number of activity or extraversion related traits, whereas knowledge about

targets’ mental capacities may differentially affect ability related trait judgments. Thus, the

pattern of correlations among traits measured by observer ratings may give rise to two

separate method factors.

Thus, estimates of method effects in MTMM analyses obviously depend on the specific

combination of methods chosen for a particular study (Eid, Lischetzke, Nussbeck, &

Trierweiler, 2003). For Campbell and Fiske (1959) method variance has been almost

inevitably invoked by irrelevant features of the measurement procedure. However, they

also emphasize that the distinction between trait and method is ‘relative to the test

constructor’s intent. What is an unwanted response set for one tester may be a trait for

another. . .’ (Campbell & Fiske, 1959, p. 85). While, for example, much of the covariance

among explicit self-reported personality traits may reflect method variance in comparison

to implicit personality measures and/or physiological data (Egloff, Wilhelm, Neubauer,

Mauss, & Gross, 2002; Grumm & von Collani, 2007; Schultheiss & Brunstein, 2001), there

is a longstanding tradition in personality research to study self-concepts.

The scientific study of personality requires the use of objective measures that capture the

nature of a person independent of the particular scientist who applies the measure.

Objectivity is not an issue for most current measures of personality – like standardized tests

– as long as the attribute that is measured is defined operationally. As soon as we interpret

objectively registered responses as indicators of hypothetical or latent constructs the

objectivity of these inferences is a controversial matter. To ensure the objectivity of such

inferences is the central concern of construct validation procedures (Cronbach & Meehl,

1955) and the demonstration of convergent validity of independent measures is the central

means to achieve it. Campbell and Fiske’s (1959) emphasis on discriminant validity, in

addition, helps to maintain an organized system of psychological concepts, by requiring

that hetero-method measures of the same constructs correlate more strongly than mono-method

measures of different constructs. This has important consequences for personality research

that go beyond the mere validation of personality measures.

We doubt that there will ever be a general agreement upon ‘master measures’ of

personality constructs (like, e.g. a set of gene loci) that serve as points of reference for the

calibration of other more economic measures. Thus, personality theory must rely on that

part of the measured variance that is shared by methods, which reflects consensually

validated variance of a personality construct (i.e. trait variance, Campbell & Fiske, 1959,

also denoted as universe variance in generalizability theory by Cronbach, Gleser, Nanda, &

Rajaratnam, 1972, or simply target variance, Hoyt, 2000). It is not sufficient to demonstrate

convergent and discriminant validity of a personality measure and then to proceed, as if

there is no method variance. For example if molecular genetic studies find an association

between a genetic polymorphism and a self-report measure of some trait – given the

average effect size of these associations – there is no way to decide whether the association

is with the method or with the trait.

Hofstee (1994) argues against the use of self-reports in personality research, because

they are inherently subjective. Acknowledging that individuals may have a privileged

access to information about their motives, preferences or secrets, he concludes: ‘If we

would emphasize subjective aspects, we would withdraw personality from scientific study.

In a scientific context, personality is by definition a public phenomenon’. (Hofstee, 1994, p.

155). Hofstee advocates the use of mean peer report measures, since judgment errors

cannot be averaged out in self-report measures.

In sum, the classical MTMM design in combination with modern structural equation

modelling allows one to quantify the relative influence of method effects and traits on

personality measures as well as to test specific models of the structure of latent personality

constructs adjusted for method distortions. Combining the MTMM analysis with

genetically informative data, on the one hand, offers insight into the sources of personality

traits that are not confounded with method effects and, on the other hand, allows a

decomposition of method effects into genetic and environmental effects, which hints at the

nature of method variance. We limit our discussion to self- and peer report measures of

personality, because the meaning of method and trait variance cannot be analysed without

taking specific method–trait combinations into account, although a behavioural genetic

extension of the MTMM analyses can meaningfully be applied to many method–trait

combinations (e.g. the measurement of intelligence by standard IQ-tests and elementary

cognitive tasks; Neubauer, Spinath, Riemann, Borkenau, & Angleitner, 2000).

We name our extension of the MTMM study design multitrait-multimethod-twin model

(MTMM-T). As in the classic MTMM design there are at least two methods (raters) for

each trait. In addition, measures are collected in a genetically informative study design

using mono- and dizygotic twins reared together (see Figure 1). The variance of observed

variables (T1 for trait 1 and T2 for trait 2 measured by two methods M1 and M2 for each twin

sibling X and for the co-twin Y) is decomposed into a trait component (T1/T2), a method

component (M1/M2) and a random error component (e). Thus, if we focus on the data for a

single sibling, we have the classical MTMM analysis. The genetically informative study

design enables us to decompose both the method and trait factors into biometric

components, for example into an additive genetic component (A), an environmental

component shared by twins (C) and an environmental component not shared by

twins (E) in Figure 1. The combination of MTMM analysis and biometric modelling adds

substantially to the complexity of the structural equation model, but basically we add four

(Figure 1; two traits and two method factors) univariate behavioural genetic analyses to the

MTMM model.

Genetic and shared environmental effects on methods and traits contribute to the

covariation of measures between twin siblings, whereas E has no effect on the twin

covariation. E is a residual effect, which is estimated from the comparison of correlations

within individuals (twin sibling) with correlations within twin pairs. As is true for the

classical MTMM analysis, methods affect correlations between different traits measured

by the same method, and trait variance is the source for correlations of traits across

methods. Thus, A and C method effects increase hetero-trait mono-method correlations

between the two siblings of a twin pair and A as well as C trait effects contribute to mono-

trait hetero-method correlations between twins. If there is method variance that has neither

a genetic nor a shared environmental component, it must by definition be due to nonshared

environmental influences.
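The expectations described above can be sketched in a few lines of Python. This is a minimal illustration, not the fitted model: it assumes standardized, mutually uncorrelated trait and method factors, equal loadings for all measures, and variance components that are purely hypothetical. The names `Biometric`, `FACTORS`, `TRAIT_LOADING`, `METHOD_LOADING` and `cross_twin_cov` are introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Biometric:
    """Standardized variance components of one latent factor (a2 + c2 + e2 = 1)."""
    a2: float  # additive genetic share (A)
    c2: float  # shared environmental share (C)
    e2: float  # nonshared environmental share (E)

    def twin_cov(self, r_g: float) -> float:
        # E is unshared by definition, so only A and C carry over to the co-twin.
        return r_g * self.a2 + self.c2


# Hypothetical values, chosen only for illustration (not estimates from any study).
FACTORS = {
    "T1": Biometric(a2=0.6, c2=0.0, e2=0.4),
    "T2": Biometric(a2=0.6, c2=0.0, e2=0.4),
    "M1": Biometric(a2=0.5, c2=0.0, e2=0.5),
    "M2": Biometric(a2=0.0, c2=0.0, e2=1.0),
}
TRAIT_LOADING = 0.7   # assumed equal for all measures
METHOD_LOADING = 0.5  # assumed equal for all measures


def cross_twin_cov(x, y, r_g):
    """Expected covariance of measure x (twin X) with measure y (co-twin Y).

    x and y are (trait, method) labels; r_g is 1.0 for MZ and 0.5 for DZ pairs.
    """
    cov = 0.0
    if x[0] == y[0]:  # same trait: the trait factor contributes
        cov += TRAIT_LOADING ** 2 * FACTORS[x[0]].twin_cov(r_g)
    if x[1] == y[1]:  # same method: the method factor contributes
        cov += METHOD_LOADING ** 2 * FACTORS[x[1]].twin_cov(r_g)
    return cov


for label, r_g in (("MZ", 1.0), ("DZ", 0.5)):
    mono_trait_hetero_method = cross_twin_cov(("T1", "M1"), ("T1", "M2"), r_g)
    hetero_trait_mono_method = cross_twin_cov(("T1", "M1"), ("T2", "M1"), r_g)
    print(label, round(mono_trait_hetero_method, 4), round(hetero_trait_mono_method, 4))
```

Under these assumptions, a method factor with only nonshared environmental variance (M2 here) contributes nothing to any cross-twin covariance, whereas a method factor with a genetic component (M1) produces hetero-trait mono-method covariance between twins that is larger in MZ than in DZ pairs.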

Figure 1. The minimal multimethod-multitrait-twin (MTMM-T) model. Two methods (M1 and M2) are shown for every trait (T1 and T2) × twin (X and Y) combination. T = convergent valid trait factor; M = method factor; e = measurement error; X = observed variable of one twin sibling; Y = observed variable of co-twin; a = 1.0 for MZ twins and 0.5 for DZ twins; D = 1.0 for all twins; A = additive genetic factors; C = shared environmental factors; E = nonshared environmental factors.

As can be seen from Figure 1, univariate or mono-trait mono-method behavioural

genetic analyses confound trait and method effects. As a rule of thumb, the estimates of

genetic and environmental effects from mono-method studies equal the averaged

environmental and genetic effects on method and trait factors in multimethod studies,

weighted by the relative strength of method and trait factors. In general, estimates from

mono-method studies may be grossly misleading, if method and trait factors differ in their

aetiology. However, the few existing studies using self- and peer report personality

measures show convergent estimates (e.g. Riemann, Angleitner, & Strelau, 1997), while

some but not all observational studies of personality (see Borkenau, Riemann, Spinath, &

Angleitner, 2000, for a review) indicate effects of the shared environment that are not found

in self- and peer report studies.
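The rule of thumb stated above can be written as a simple weighted average. The sketch below is an approximation under explicit assumptions: it ignores random measurement error (which would add to the nonshared environmental estimate in a mono-method analysis), and both the function name `mono_method_estimate` and all numbers are hypothetical.

```python
def mono_method_estimate(trait_share, trait_component, method_component):
    """Approximate a mono-method biometric estimate as a weighted average.

    trait_share      -- proportion of systematic variance due to the trait factor
    trait_component  -- e.g. heritability of the trait factor
    method_component -- the same component estimated for the method factor
    """
    if not 0.0 <= trait_share <= 1.0:
        raise ValueError("trait_share must lie between 0 and 1")
    return trait_share * trait_component + (1.0 - trait_share) * method_component


# Hypothetical case 1: trait and method factors share the same aetiology,
# so the mono-method estimate is not misleading.
same_aetiology = mono_method_estimate(0.5, 0.60, 0.60)

# Hypothetical case 2: the method factor is far less heritable than the trait,
# so the mono-method estimate understates the heritability of the trait factor.
different_aetiology = mono_method_estimate(0.5, 0.60, 0.20)

print(round(same_aetiology, 2), round(different_aetiology, 2))
```

The second case illustrates why mono-method estimates can mislead when trait and method factors differ in their aetiology: the weaker the trait factor relative to the method factor, the closer the mono-method estimate moves towards the method factor's value.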

The decomposition of trait variance (controlled for method effects) into genetic and

environmental components is important from a behavioural genetic perspective. But there

are two caveats for the interpretation of these components. First, depending on the

convergent validity of the specific methods used, trait variance common to different

measures may explain only a small fraction of the observed variance. If we measure

personality with reliable instruments via self- and peer reports (averaged across two peers

who know the target person very well) roughly one-third of the variance is shared by both

measures since the correlation between these measures usually is in the range of .50–.60.

Second, from the MTMM-T analysis we have little information about the aetiology of

correspondence among measures. To study traits independent of method effects does not

necessarily imply that we tap on the core of traits more directly. As we have argued before,

convergence of methods is a central requirement for the scientific study of personality

concepts and there are costs associated with scientific rigor. Interesting and theoretically

important facets of personality captured by only one method require additional validation

processes. It may, for example well be that the convergence in self- and peer ratings

actually captures those aspects of personality that are most under our conscious control

reflecting those aspects of personality we choose to present (Johnson, 24th August 2009).

Correlations between self-report measures from independent individuals who share their

genetic makeup and were reared in the same family (monozygotic twins) indicate a lower

limit of the convergent validity of these measures. The convergent validity is

underestimated to the degree that environmental experiences not shared by the twins

have an effect on the measure. In the Jena Twin Study of Social Attitudes (Stößel, Kämpfe,

& Riemann, 2006), for example, convergence between twins’ self-reports for the highly

reliable domain scores of the NEO-PI-R (Costa & McCrae, 1992) was .61, .58, .60, .52 and .57

(intra-class correlations, N = 226) for Neuroticism, Extraversion, Openness, Agreeableness

and Conscientiousness, while the corresponding correlations between self- and

averaged peer reports were on average slightly lower (r = .50, .62, .58, .44 and .53;

averaged across two twin siblings, N = 394). Self-report method variance, controlled for

trait and random error effects (i.e. the self-report method factor), may on the one hand

reflect all kinds of self-rater biases (see Paulhus & John, 1998, for a review), like individual

differences in self presentation or stereotypes shared by twins. On the other hand, it may

capture important aspects of the twins’ personality that are not accessible to the peer

judgment (Kraemer, Measelle, Ablow, Essex, Boyce, & Kupfer, 2003). The MTMM-T

allows decomposing this variance into genetic and environmental components.
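The comparison of the two sets of correlations reported above can be checked with a short calculation. The sketch only restates the coefficients quoted in the text; the variable names are introduced here for illustration, and simple arithmetic means are used (averaging via Fisher's z would be an alternative).

```python
from statistics import mean

DOMAINS = ["Neuroticism", "Extraversion", "Openness",
           "Agreeableness", "Conscientiousness"]

# Coefficients as quoted in the text for the Jena Twin Study of Social Attitudes.
twin_self_report_icc = [0.61, 0.58, 0.60, 0.52, 0.57]  # twin-twin, self-reports
self_peer_r = [0.50, 0.62, 0.58, 0.44, 0.53]           # self vs. averaged peers

mean_twin = mean(twin_self_report_icc)
mean_self_peer = mean(self_peer_r)
print(f"mean twin-twin self-report ICC: {mean_twin:.3f}")
print(f"mean self-peer correlation:     {mean_self_peer:.3f}")

# Per-domain difference: positive values mean the twin-twin coefficient is larger.
for domain, icc, r in zip(DOMAINS, twin_self_report_icc, self_peer_r):
    print(f"{domain:<18} {icc - r:+.2f}")
```

The difference holds on average but not for every domain: for Extraversion the self-peer correlation (.62) exceeds the twin-twin coefficient (.58).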

Idiosyncratic judgment processes as well as specific environmental influences on the

use of self-report inventories (e.g. mediated by verbal comprehension), on self-concepts,

and/or on judgmental biases are reflected in the nonshared environmental component of the

self-report method factor. Genetic and shared environmental effects on the self-report

method factor may reflect (validated) deviations from the trait score that are only accessible

to self-report measures on the one hand or rater bias (shared by twins) that has a genetic or

environmental basis on the other hand. From the variance decomposition alone, little can

be said about the nature of these effects since they may reflect diverse mechanisms like

influences of cognitive abilities on the measure, shared forms of self-presentation, self-

perception or mere response style variance.

The interpretation of systematic effects on the peer report method factor is easier, if

independent peers provide the measures. In this case, rater biases should not contribute to the

correlation between twin siblings. The most reasonable interpretation of biases which

contribute to genetic and shared environmental effects on peer report method factors is that

stereotypes shared within the population from which the raters are drawn (Letzring, Wells, &

Funder, 2006) and again (validated) deviations from the trait score are the source for these

effects. However, as quality and quantity of personality-relevant information should reduce

stereotypes (Funder, Kolar, & Blackman, 1995; Letzring et al., 2006), well-informed

acquaintances should not provide personality assessments affected by stereotypes.

In sum, we have outlined here a behavioural genetic extension of the MTMM analysis.

Both mono-trait as well as hetero-trait twin correlations (cross-correlations) offer insight

into the sources of method variance. This is especially important for methods based on self-

reports, because correlations between self-reports provided by monozygotic twin siblings

are estimates of the lower bound of their convergent validity. Fitting structural equation

models derived from the basic MTMM-T model to empirical data will usually require

additional specifications of correlations among traits or their higher-order structure as well

as specifications of the method factor structure. Our model shares the basic idea to use

behavioural genetic data for the study of fundamental measurement problems with Bartels,

Boomsma, Hudziak, van Beijsterveldt and van den Oord’s (2007) model of rater (dis-)

agreement. It should be noted, however, that the model elaborated by Bartels et al. requires

ratings of both twin siblings provided by the same observer (e.g. mother and father rate

both twin children) while our model requires methods to be independent (i.e. peer reports).

McCrae et al. (2008, Study 3) used a reduced version of the MTMM-T model to examine

the higher-order structure of the NEO-PI-R analysing the covariation among the NEO-PI-R

domain scores in a twin sample. In addition to five trait factors (Neuroticism, Extraversion,

Openness, Agreeableness and Conscientiousness), two higher-order trait factors – Digman’s

(1997) α, or socialization, and β, or personal growth – were considered. The α-factor was

defined by low Neuroticism (N) and high Agreeableness (A) and Conscientiousness (C); β

was defined by Extraversion (E) and Openness (O). Based on previous research (Biesanz &

West, 2004; Paulhus & John, 1998) two correlated method factors were included in the model,

designated A-bias and B-bias. Comparisons between nested models indicated that both

method factors as well as higher-order trait factors were important to account for the

covariation among scale scores. Additional analyses revealed genetic and nonshared

environmental influences on the self-report method factors. However, McCrae et al. did not

report a full decomposition of trait and method factors into genetic and environmental sources

of variance.

In a study of the structure of the five personality domains measured with the NEO-PI-R

Kandler, Riemann, Spinath and Angleitner (in press) applied the full MTMM-T model. In

each of the five analyses (one for each domain) the variance and covariance of the six facets

was decomposed into six trait factors (one for each facet) and two method factors (self- vs.

peer report). To account for the correlations among the facets a higher-order trait-factor

was included in the model. On average, self- and peer reports – measured at the facet level

of the Five Factor model – shared 19.4% of their variance (r = .44 on average). Genetic

effects on the domain factors, controlled for facet-specific, method-specific and error

variance, explained on average about 63% of the variance, nonshared environmental effects

the remaining 37%. The self-report method factors were moderately (51%) influenced by

genetic effects whereas environmental influences shared by twins were negligible on

average (5%). For the peer report method factors substantially different sources were

found. Genetic influences were small (18%), shared environmental effects were again

negligible (1%) and the nonshared environment explained the remaining variance. On the

one hand, genetic effects on the self-report method factor may reflect genetic effects on

response distortions (e.g. response styles, self-enhancement). On the other hand, self-

reports may be partially based on information that is not accessible to peers or weighted

less in peers’ personality judgments (like motives, mental states). Genetic effects on

averaged peer reports should reflect real individual differences. They may result from more

accurate comparisons of the target person with other persons that are not accessible to self-

raters. Again, differences in weighting the same information may play a role. Finally, it

cannot be ruled out that stereotypes shared by observers (e.g. inferences based on physical

characteristics; Kenny, 1994) contribute to a genetic component in peer reports.

An application of the multitrait-multimethod-twin model

In recent years, there has been a renewed interest in a general factor of the Big Five (Musek,

2007). However, the practical utility and empirical substance of a postulated GFP is

strongly questionable (Ashton, Lee, Goldberg, & de Vries, 2009; Bäckström, Björklund, &

Larsson, 2009; Biesanz & West, 2004; DeYoung, 2006). We used self- and peer reports on

the NEO-PI-R in a study of twins reared together to illustrate the MTMM-T model and

extended the higher-order factor MTMM-T analyses of McCrae et al. (2008) to hierarchical

analyses comparing different models with different levels of trait generality (Big Five and

Big Two) and a general factor at the highest level (Big One). In addition, we established full

variance decomposition in genetic and environmental effects on each trait-level as well as

on A- and B-bias method factors and correlations between them.

The focus of our analyses is on the hierarchical structure of personality factors. We did

not take into account that correlations among the NEO-PI-R domain scales may be due to

an unbalanced representation of facet scales in personality inventories that measure

blends of orthogonal factors (Ashton et al., 2009). However, applying a model formally

analogous to the blended variables model suggested by Ashton et al., we were able to test,

whether the higher order structure resulted from isolated correlations among domain

scores that were assigned to different second order factors (e.g. between Neuroticism and

Extraversion).

METHOD

For the present analyses we used the same data as in Kandler et al. (in press). Thus, we

summarize the data collection just briefly here.

Participants

We combined data from the third wave of the Bielefeld Longitudinal Study of Adult Twins

(BiLSAT; Spinath, Angleitner, Borkenau, Riemann, & Wolf, 2002) and the sample of the

Jena Twin Study of Social Attitudes (JeTSSA; Stößel et al., 2006). In contrast to a previous

analysis, using a combined sample of BiLSAT and JeTSSA (McCrae et al., 2008), we

included data from unmatched twin pairs (UM), when data were available for one twin

sibling without zygosity diagnosis (N = 81). The resulting sample consisted of 1615

individuals from 919 twin pairs (433 MZ, 263 DZ and 223 UM), who were between 17 and

82 years old (M = 36.3, SD = 13.1); 1243 (77%) were females. Twins were instructed to

ask acquaintances, who knew them but not their twin sibling very well, to provide the peer

judgments. Thus, there were different peers for each twin sibling. Most peers were friends

and spouses. For 92.6% of the twins at least one peer report was available and for 86.1%

two peer reports were available.

Measures

Twins completed the self-report and peers the peer report version of the German NEO

Personality Inventory Revised (NEO-PI-R; Ostendorf & Angleitner, 2004), which measures

the personality domains Neuroticism, Extraversion, Openness, Agreeableness and

Conscientiousness (Cronbach’s α ranges between .87 for Agreeableness and .92 for

Neuroticism, Ostendorf and Angleitner, German normative sample). Peer reports were

averaged for each target.

Since age and sex effects can bias twin correlations, self- and averaged peer reports of

the domain scores were adjusted for sex and linear as well as quadratic age effects using a

regression procedure, and the regression residuals were used in subsequent analyses. We

estimated variance–covariance matrices for each group (MZ, DZ and UM) using an

Expectation Maximization (EM) algorithm (Little & Rubin, 2002) for handling missing

values.
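The adjustment for sex and age described above can be sketched as an ordinary least-squares residualization. The text does not say which software was used for this step, so the following is only an illustration in plain Python; the function names `solve_linear_system` and `residualize` and the 0/1 coding of sex are assumptions. Age is standardized before squaring purely for numerical stability, which does not change the residuals.

```python
from statistics import mean, pstdev


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def residualize(scores, sex, age):
    """Residuals of scores after regressing on sex, age and age squared."""
    age_mean, age_sd = mean(age), pstdev(age)  # assumes ages are not all equal
    z = [(a - age_mean) / age_sd for a in age]
    # Design matrix columns: intercept, sex (coded 0/1), linear age, quadratic age.
    rows = [[1.0, float(s), zi, zi * zi] for s, zi in zip(sex, z)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(rows, scores)]


# Hypothetical toy data, for illustration only.
sex = [0, 1, 0, 1, 1, 0, 1, 0]
age = [17, 25, 36, 48, 60, 82, 30, 55]
scores = [48.0, 55.5, 51.0, 60.2, 47.3, 52.8, 58.1, 49.9]
adjusted = residualize(scores, sex, age)
print([round(v, 2) for v in adjusted])
```

By construction the residuals have a mean of zero and are uncorrelated with sex, age and squared age, so twin correlations computed from them no longer reflect these effects.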

Structural equation modelling

As outlined before, we decomposed the variance of the observed variables X (phenotype of

one twin sibling) and Y (phenotype of the other) into trait, method and random error

components. In this application, however, the basic MTMM-T model (Figure 1) was

extended with respect to both the method and the trait structure. Figure 2 shows the full

extended phenotypic model without overlaying the genetically informative structure. Two

correlated method factors were included for each mode of data collection (self- vs. peer-

report): A-bias and B-bias (McCrae et al., 2008). A-bias is the tendency to distort self-

descriptions in a way to enhance approval by others and is also called moralistic bias

(Paulhus & John, 1998). It is related to negative valence (Tellegen, Grove, & Waller, 1991)

and has loadings on Neuroticism, Agreeableness and Conscientiousness. B-bias – called

egoistic bias by Paulhus and John – designates people’s tendency to describe themselves in

glowing terms, is related to the need for power (Paulhus & John), and has loadings on

Extraversion and Openness.

Although these method factors are primarily discussed as distortions of self-descriptions

(Paulhus & John, 1998), informant-specific effects noted by Biesanz and West (2004)

suggest that they may operate in observers, too. Although there is some discussion, to what

degree A-bias and B-bias might reflect true personality variance, in our model they are

entered as method (bias) factors. Self-report measures (manifest variables, rectangles in

Figure 2, with indices of SR) of reflected Neuroticism scores (Emotional Stability, ES),

Agreeableness (A) and Conscientiousness (C) load on self-report A-bias (A-BiasSR), the

respective peer report measures (manifest variables, rectangles in Figure 2, with indices of

PR) load on peer report A-bias (A-BiasPR). The measures of Extraversion (E) and

Openness (O) have loadings on self-report and peer report B-bias (B-BiasSR and B-BiasPR).

A- and B-biases are allowed to correlate within methods (sSR and sPR) but not between

methods. Thus, these method factors affect correlations among self-report or peer report

measures, but not between self-report and peer report measures. To the degree that they

have genetic or shared environmental sources, they contribute to the mono-method (self- or

peer) correlation between twins but not to hetero-method correlations.

Eight hierarchically organized trait factors were considered in our model (see the right

side of Figure 2). Five trait factors are at the lowest level of the trait hierarchy representing

the domains of the NEO-PI-R (ES, E, O, A and C). Self- and peer report measures have

Figure 2. The full phenotypic Big 5 + 2 + 1 model. SR, self-report; PR, peer report; E, Extraversion; O, Openness; ES, Emotional Stability; A, Agreeableness; C, Conscientiousness; GFP, general factor of personality; empty arrows, random measurement error; sSR, covariance between self-report-specific A- and B-bias; sPR, covariance between peer-report-specific A- and B-bias; dashed arrows, blended variable loadings from measures of E on the ES-factor; further description in the text.

loadings on the corresponding trait factor. At the next level there are two trait factors:

Stability and Plasticity (DeYoung, 2006). The E and O trait factors have loadings on

Plasticity, while ES, A and C trait factors load on Stability. At the highest level of the trait

hierarchy we included a GFP (Musek, 2007), which results from a correlation between

Stability and Plasticity.

Previous research on the higher-order structure of personality has not considered small

but significant correlations between Emotional Stability (Neuroticism) and Extraversion

(e.g. DeYoung, Peterson, & Higgins, 2002; Digman, 1997). However, the correlation has

often been found across different measures of the Big Five, different methods and several

languages (Bäckström et al., 2009; Costa & McCrae, 1992; Graziano & Ward, 1992; John,

Goldberg, & Angleitner, 1984; Ostendorf & Angleitner, 2004; Yik & Bond, 1993). Since this

correlation is not considered by the two-factor structure of Plasticity and Stability, it may

artificially lead to a significant GFP solution, although all other correlations which are

assumed to be zero in the two-factor solution (between E and A, E and C, O and ES, O and

A, O and C) may in fact be zero. We took the correlation between E and ES explicitly into

account by allowing for secondary loadings (formally similar to blended facet scales in

Ashton et al., 2009) from measures of E (SRE and PRE) on the ES-factor (dashed arrows in

Figure 2). We denote these models as blended factor models.

Construct validation using MTMM-T data 267

The model presented in Figure 2 reflects the full phenotypic model which was named

Big 5+2+1* model (the asterisk stands for blended factor models allowing for secondary

loadings from measures of E on the ES-factor). Reduced model modifications are nested in

the full model: a model allowing only for the two-factor solution (Big 5+2*), and a model

without any higher-order structure (Big 5*). A model allowing only for the GFP

(Big 5+1*) with direct paths to the Big Five was also tested. These four models were also

analysed without the blended factor loadings. Thus, eight models were compared whereby

the random error and systematic method components were included in all models. The

models were fitted to the EM variance–covariance matrices via maximum likelihood using

the statistical software package AMOS 17.0 (Arbuckle, 2007). Nested models were

compared using the χ²-difference test. Non-nested models were compared descriptively

by the comparative fit index (CFI). The overall model fit was evaluated by the root-mean-

square error of approximation (RMSEA) in conjunction with its 90% confidence interval.

Not shown in Figure 2, each trait and each method factor was decomposed into an

additive genetic component (a²), a genetic dominance component (d²) and a nonshared

environmental component (e²). Consequently, adding or removing one phenotypic

factor changes the model by three degrees of freedom. Since previous research on genetic

and environmental effects on NEO-PI-R domain scores (see Bouchard & Loehlin, 2001,

for a review), and – more importantly – an inspection of the twin correlations do not

indicate systematic effects of the environment shared by siblings, we did not consider

shared environmental effects in our models. To the degree that assortative mating of twins’

parents, gene–environment interaction and gene–environment correlation affect the

phenotypes observed in our study, parameter estimates will be distorted (see Neale &

Maes, 2004, for more details).
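To illustrate the decomposition described above, the following minimal Python sketch computes the twin correlations expected under a standardized ADE model. It assumes the textbook expectations for twins reared together (MZ twins share all additive and dominance effects; DZ twins share half of the additive and a quarter of the dominance effects). The function name and the example values are hypothetical and are not taken from the AMOS model specification.

```python
def expected_twin_correlations(a2, d2):
    """Return (r_MZ, r_DZ, e2) implied by standardized ADE variance components.

    a2: additive genetic share, d2: dominance share; the nonshared
    environmental share e2 is whatever remains of the unit variance.
    """
    e2 = 1.0 - a2 - d2
    if a2 < 0 or d2 < 0 or e2 < -1e-12:
        raise ValueError("components must be non-negative and sum to at most 1")
    r_mz = a2 + d2                 # MZ twins share all genetic effects
    r_dz = 0.5 * a2 + 0.25 * d2    # DZ twins share 1/2 of A and 1/4 of D
    return r_mz, r_dz, max(e2, 0.0)


# Hypothetical example: a latent factor with a2 = .40 and d2 = .20
r_mz, r_dz, e2 = expected_twin_correlations(0.40, 0.20)
print(f"r_MZ = {r_mz:.2f}, r_DZ = {r_dz:.2f}, e2 = {e2:.2f}")
```

A DZ correlation clearly below half the MZ correlation, as in this hypothetical case, is the pattern that dominance effects would produce. With d² = 0 the DZ correlation is exactly half the MZ correlation.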

RESULTS

In order to provide an impression of trait and method effects, the classic MTMM

correlation matrix is presented in the Appendix. Since this matrix is quite complex, we will

only sketch the main trends here and then focus on SEM results. Since we want to explore

the higher order structure of the NEO-PI-R domain scores, our inspection of the Appendix

focuses on hetero-trait correlations. Two points are quite obvious: First, as expected,

phenotypic mono-method hetero-trait correlations within twin siblings are substantially

higher than the corresponding hetero-method correlations, indicating substantial method

effects. Judged from averaging hetero-trait correlations within mono- versus hetero-

method same twin-sibling versus cross-twin blocks, hetero-method correlations calculated

within twin siblings are of about the same size as mono-method correlations across twin

siblings. Second, there is hardly any evidence for a substantial heritable GFP. A heritable

GFP that generalizes across methods of measurement would result in substantial

correlations of phenotypic trait scores between self- and peer reports across monozygotic

twin siblings (Table 1). However, only four of the 10 averaged correlations among the

NEO-PI-R domain scores are greater than or equal to .10, while the mono-trait correlations

are larger than .25. This pattern severely questions the validity of a GFP.
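The averaged coefficients referred to here were obtained with Fisher's r to z transformation (see the note to Table 1). A minimal sketch of that averaging step is given below. The input values are hypothetical placeholders, since the individual coefficients are reported only in the Appendix, and the function name is our own.

```python
import math


def average_correlations(correlations):
    """Average correlations via Fisher's r-to-z transformation."""
    if not correlations:
        raise ValueError("need at least one correlation")
    z_values = [math.atanh(r) for r in correlations]   # r -> z
    mean_z = sum(z_values) / len(z_values)
    return math.tanh(mean_z)                           # z -> r


# Hypothetical hetero-method cross-twin coefficients for one trait pair
example = [0.05, 0.12, 0.15, 0.10]
print(round(average_correlations(example), 3))
```

Averaging on the z scale rather than on the raw correlations reduces the bias that arises because the sampling distribution of r is skewed. For small coefficients such as those in Table 1 the two averages differ only slightly.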

SEM fit statistics are presented in Table 2. All phenotypic models showed an acceptable

to good overall model fit (upper limits of RMSEA < 0.08; Browne & Cudeck, 1993).

Starting from the simplest model (Big 5), a model allowing for a GFP (Big 5+1) and a

model allowing for Plasticity and Stability (Big 5+2) on the second order fitted the data

Table 1. Averaged cross-correlations among observed NEO-PI-R domain scores for MZ twins

                     Emotional stability  Extraversion  Openness  Agreeableness  Conscientiousness
Emotional stability  0.26
Extraversion         0.11                 0.43
Openness             0.03                 0.15          0.45
Agreeableness        0.01                 0.04          0.10      0.31
Conscientiousness    0.12                 0.05          -0.02     -0.02          0.39

Note: Correlations are averaged hetero-method (self- versus peer report) coefficients across twin siblings (twin A versus B). Correlations were averaged using Fisher’s r to z transformation. The complete correlation matrix is given in the Appendix.

significantly better than the reduced model (Δχ² = 61.85, Δd.f. = 3, p < .05 and

Δχ² = 103.20, Δd.f. = 6, p < .05). A model allowing for both a GFP and the Big Two

(Big 5+2+1) fitted the data significantly better than the Big 5+2 model (Δχ² = 14.85,

Δd.f. = 3, p < .05).

All these models were compared with their corresponding blended factor models

(marked by an asterisk in Table 2). Consistently, the blended factor models fitted the data

significantly better (see Ashton et al., 2009, for similar results referring to the facet level).

Within the blended factor models, the model comparisons showed the same pattern as within

the models without secondary loadings except for one comparison: The full model

(Big 5+2+1*) did not fit the data significantly better than the Big 5+2* model (Δχ² = 0.03,

Δd.f. = 3, p > .05). The Big 5+2* model also achieved the best overall model fit (the

smallest RMSEA and the highest CFI). As supposed, the evidence of a GFP rested

exclusively upon a significant correlation between ES and E, when method effects were

controlled for.
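The nested comparisons above can be recomputed from the χ² and d.f. values in Table 2. The sketch below is an illustration only: it is not the AMOS procedure used in the study, the helper names are our own, and the upper-tail probability is computed with the closed-form expressions for integer degrees of freedom so that only the Python standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    df = int(df)
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_i (x/2)^(i-1/2) / Gamma(i+1/2)
    total = 0.0
    term = math.sqrt(half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= half / (i - 0.5)
        total += term
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# (chi-square, d.f.) as reported in Table 2
FIT = {
    "Big 5": (1475.23, 430),
    "Big 5+1": (1413.38, 427),
    "Big 5+2": (1372.03, 424),
    "Big 5+2+1": (1357.18, 421),
    "Big 5+2*": (1153.26, 423),
    "Big 5+2+1*": (1153.23, 420),
}


def difference_test(restricted, full):
    """Return (delta chi-square, delta d.f., p) for two nested models."""
    chi_r, df_r = FIT[restricted]
    chi_f, df_f = FIT[full]
    d_chi, d_df = chi_r - chi_f, df_r - df_f
    return d_chi, d_df, chi2_sf(d_chi, d_df)


for restricted, full in [("Big 5", "Big 5+1"), ("Big 5", "Big 5+2"),
                         ("Big 5+2", "Big 5+2+1"), ("Big 5+2*", "Big 5+2+1*")]:
    d_chi, d_df, p = difference_test(restricted, full)
    print(f"{restricted} vs {full}: d_chi2 = {d_chi:.2f}, d_df = {d_df}, p = {p:.4f}")
```

The first three comparisons yield p < .05, whereas adding the GFP to the blended Big 5+2* model yields p > .05, in line with the pattern reported in the text.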

As factor loadings are useful to illustrate the correlation of lower-order variables with

higher-order variables, standardized factor loadings from five of the eight model variants

are presented in Table 3. A factor loading of .50 implies that 25% of variance in the

lower-order variable is attributable to the higher-order factor. Not surprisingly, factor loadings

and the implied convergent valid trait variance were larger in the mean peer reports

Table 2. Model fit statistics

Model       χ²        d.f.  CFI     RMSEA   90% CI
Big 5       1475.23   430   0.831   0.052   0.049–0.054
Big 5+1     1413.38   427   0.840   0.050   0.047–0.053
Big 5+2     1372.03   424   0.847   0.049   0.046–0.052
Big 5+2+1   1357.18   421   0.849   0.049   0.046–0.052
Big 5*      1302.89   429   0.859   0.047   0.044–0.050
Big 5+1*    1255.53   426   0.866   0.046   0.043–0.049
Big 5+2*    1153.26   423   0.882   0.043   0.040–0.046
Big 5+2+1*  1153.23   420   0.881   0.044   0.041–0.047

*Blended factor models allowing for secondary loadings from SRE and PRE on the latent ES-factor. The best fitting model is Big 5+2*. Each latent variable is decomposed into two genetic and one environmental variance components and thus the exclusion of one latent variable comprises three degrees of freedom (d.f.).

Table 3. Standardized factor loadings (selected models)

Factor loading      Big 5      Big 5+2    Big 5+2+1  Big 5+2*   Big 5+2+1*

Convergent valid traits
SRE on E            0.72       0.74       0.74       0.68       0.68
PRE on E            0.82       0.85       0.85       0.76       0.76
SRO on O            0.64       0.67       0.68       0.67       0.67
PRO on O            0.78       0.82       0.83       0.83       0.83
SRES on ES          0.64       0.65       0.65       0.66       0.66
PRES on ES          0.75       0.74       0.75       0.75       0.75
SRA on A            0.67       0.69       0.69       0.69       0.69
PRA on A            0.71       0.73       0.73       0.72       0.72
SRC on C            0.70       0.71       0.71       0.71       0.71
PRC on C            0.72       0.73       0.73       0.73       0.73
E on Plasticity     -          0.51       0.53       0.60       0.60
O on Plasticity     -          0.60       0.60       0.65       0.65
ES on Stability     -          0.28       0.31       0.28       0.28
A on Stability      -          0.35       0.39       0.36       0.36
C on Stability      -          0.31       0.34       0.31       0.31
Stability on GFP    -          -          0.84       -          0.19
Plasticity on GFP   -          -          0.48       -          0.09
SRE on ES           -          -          -          0.34       0.34
PRE on ES           -          -          -          0.38       0.38

Self-report specificity
SRE on B-BiasSR     0.50       0.47       0.47       0.46       0.46
SRO on B-BiasSR     0.52       0.50       0.50       0.50       0.50
SRES on A-BiasSR    0.32       0.31       0.31       0.29       0.29
SRA on A-BiasSR     0.44       0.42       0.41       0.40       0.40
SRC on A-BiasSR     0.39       0.37       0.37       0.36       0.36

Peer report specificity
PRE on B-BiasPR     0.57       0.55       0.55       0.56       0.56
PRO on B-BiasPR     0.43       0.34       0.33       0.33       0.33
PRES on A-BiasPR    0.48       0.38       0.37       0.38       0.38
PRA on A-BiasPR     0.28       0.24       0.23       0.26       0.26
PRC on A-BiasPR     0.51       0.49       0.48       0.49       0.49

Note: SR, self-report; PR, averaged peer report; -, loading not part of the model.
*Blended factor models allowing for secondary loadings from SRE and PRE on the latent ES-factor. The best fitting model is Big 5+2*.

than for self-reports, since averages over two judges minimize individual biases and

increase reliability and as a consequence also convergent validity (Campbell & Fiske,

1959; Hofstee, 1994).

The latent Big Five factors accounted for 41% (self-reports on Emotional Stability,

SRES) to 67% (peer reports on Extraversion, PRE) of variance in the manifest variables

(based on the Big 5 model, Table 3). Based on the best fitting model (Big 5+2*), 8% in ES,

13% in A and 10% in C was explained by Stability and 36% in E and 39% in O by Plasticity.

Consequently, the Stability factor accounted for 3% (SRES) to 7% (PRA) and Plasticity

accounted for 17% (SRE) to 29% (PRO) of the total variance in the observed variables.
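The percentages for the observed variables follow from the usual path-tracing rule for standardized hierarchical models: the share of variance that a higher-order factor explains in a measure is the squared product of the loadings along the connecting path. The sketch below applies this rule to the Big 5+2* loadings from Table 3. The function name is our own, and the rule is the standard one rather than a procedure described in the text.

```python
def explained_variance(*loadings):
    """Share of variance transmitted along a chain of standardized loadings."""
    product = 1.0
    for loading in loadings:
        product *= loading
    return product ** 2


# Loadings from Table 3, best fitting model (Big 5+2*)
paths = {
    "Stability -> ES -> SRES": (0.28, 0.66),
    "Stability -> A -> PRA": (0.36, 0.72),
    "Plasticity -> E -> SRE": (0.60, 0.68),
    "Plasticity -> O -> PRO": (0.65, 0.83),
}

for label, chain in paths.items():
    print(f"{label}: {100 * explained_variance(*chain):.0f}%")
```

With these rounded loadings the four paths reproduce the 3%, 7%, 17% and 29% quoted above. Small deviations from other reported percentages can arise because the published loadings are rounded to two decimals.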

For the sake of completeness, we additionally considered the case of the GFP based on

the best fitting of the unblended models (Big 5+2+1). The GFP accounted

for 71% of variance in Stability and 23% of variance in Plasticity. However, the GFP

explained only 6% (E) to 11% (A) of variance in the Big Five. Consequently, just about 2%

(SRES) to 6% (PRA) of variance in the manifest variables was accounted for by the GFP.

That is, although the Big 5+2+1 model fitted the data significantly better than the

Big 5+2 model, the GFP accounted for a marginal proportion of variance in measured

personality domains (unless the specific correlation between E and ES is taken into

account).
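As a worked check of these figures, the GFP-related proportions can be traced through the three-level hierarchy using the standardized Big 5+2+1 loadings from Table 3. The path-tracing rule (squared product of the loadings along a path) is the standard one for standardized models and is assumed here. Results are approximate because the published loadings are rounded.

```latex
\begin{align*}
\text{GFP} \to \text{Stability}:\quad & 0.84^2 \approx 0.71 \\
\text{GFP} \to \text{Plasticity}:\quad & 0.48^2 \approx 0.23 \\
\text{GFP} \to \text{Plasticity} \to E:\quad & (0.48 \times 0.53)^2 \approx 0.06 \\
\text{GFP} \to \text{Stability} \to A:\quad & (0.84 \times 0.39)^2 \approx 0.11 \\
\text{GFP} \to \text{Stability} \to A \to \text{PRA}:\quad & (0.84 \times 0.39 \times 0.73)^2 \approx 0.06
\end{align*}
```

Each additional level multiplies in another loading below one, which is why a GFP that explains 71% of the variance in Stability still explains only a few percent of the variance in any observed score.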

Factor loadings from self- and mean peer report measures of the Big Five on self- and

peer-report-specific A-Bias and B-Bias factors are not easily convertible into variance

components of manifest variables, because A-Bias and B-Bias are allowed to correlate

across the models. Based on the best fitting model (Big 5+2*), the correlation between

A-Bias and B-Bias was rSR = .51 for self-reports and rPR = .35 for peer reports, indicating the

presence of self- and peer-report-specific general method components which might have

yielded a markedly significant GFP within mono-method studies (Musek, 2007; Rushton &

Irwing, 2008).
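How correlated method factors can mimic a general factor within a single method is easiest to see from the correlation they induce between two measures. The sketch below is illustrative and uses the standard tracing rule for standardized models: two measures on the same method factor share the product of their loadings, and two measures on different method factors share that product weighted by the factor correlation. The function name is hypothetical; the loadings are the self-report values for the Big 5+2* model in Table 3.

```python
def method_induced_correlation(loading_1, loading_2, factor_correlation=1.0):
    """Correlation between two measures implied by their method factors alone.

    Use factor_correlation=1.0 when both measures load on the same method
    factor, and the A-Bias/B-Bias correlation when they load on different ones.
    """
    return loading_1 * factor_correlation * loading_2


R_SR = 0.51  # correlation between A-BiasSR and B-BiasSR (Big 5+2*)

# SRE loads on B-BiasSR (.46), SRES on A-BiasSR (.29): different factors
across = method_induced_correlation(0.46, 0.29, R_SR)

# SRA (.40) and SRC (.36) both load on A-BiasSR: same factor
within = method_induced_correlation(0.40, 0.36)

print(f"SRE-SRES via method factors: {across:.3f}")
print(f"SRA-SRC via method factors:  {within:.3f}")
```

These method-induced components are positive for every pair of self-report scales. In a mono-method study they cannot be separated from trait covariance and would therefore feed into an apparent general factor.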

In addition to the phenotypic model comparisons, we compared different genetically

informative models based on the best fitting phenotypic model (Big 5+2*). First, we tested

for the significance of nonadditive genetic effects (d² = 0), and second, for the significance

of additive genetic effects (a² = 0). The restricted model in which all genetic effects were

assumed to be additive did not fit the data significantly worse than the full twin model

(Δχ² = 10.19, Δd.f. = 13, p > .05, CFI = 0.882, RMSEA = 0.043). However, the reduced

model in which all genetic effects were assumed to be due to dominance deviations fitted

the data significantly worse than the full twin model (Δχ² = 29.29, Δd.f. = 13, p < .05,

CFI = 0.879, RMSEA = 0.043). Thus, genetic effects on different levels of personality

were assumed to be additive.

Additive genetic and nonshared environmental variance components for each latent

variable based on the best fitting model are presented in Table 4. Common variance of self-

and averaged peer reports, which was assumed to have a genetic basis (McCrae et al.,

2000), showed a clear genetic influence (on average 81%). Genetic effects on convergent

valid second-order traits Plasticity and Stability were smaller, amounting to about half of the

Table 4. Genetic (a²) and environmental (e²) effects on trait and method factors

Variance components in %

Latent variable      a²    e²
Extraversion         86    14
Openness             92    8
Emotional Stability  59    41
Agreeableness        85    15
Conscientiousness    81    19
Plasticity           50    50
Stability            40    60
A-BiasSR             83    17
B-BiasSR             64    36
A-BiasPR             32    68
B-BiasPR             0     100
sSR                  53    47
sPR                  0     100

Note: Parameter estimates derived from the best fitting model.

variance. This indicates that common variance in O and E as well as in ES, A and C appears

to be genetically as well as environmentally influenced.

Self-report-specific method factors showed a clear genetic basis, whereas of the peer

report method factors only the A-Bias was affected by genetic variance. The correlation

between A-BiasSR and B-BiasSR was attributable to genetic as well as environmental

effects indicating a self-report-specific general factor, which has almost equal genetic and

environmental sources. The correlation between A-BiasPR and B-BiasPR was exclusively

due to environmental effects indicating a peer-report-specific environmentally influenced

general factor.

DISCUSSION

The application of the MTMM-T design to self- and peer report data on the NEO-PI-R

supports four conclusions: (1) When method effects are controlled, there is no support for a

GFP and the second order factor Stability turns out to be a weak trait factor. (2) The

correlations between the two self-report-specific and between the two peer-report-specific

method factors indicate method-specific general factors that are not validated across

methods. (3) If we consider the variance shared by self- and peer reports on the NEO-PI-R

domains, this is largely genetically determined (h² > 0.80) for all domains but Emotional

Stability. (4) Genetic effects on method factors indicate that especially self-reports but also

peer reports capture variance that shows convergent validity between twins but not between

methods. Each of these conclusions will be briefly discussed in the remainder.

Campbell and Fiske (1959) have argued that mono-method studies are severely limited

with respect to the scientific study of personality. Our analyses strongly support this view.

While there is support for a common source of variance both in self- and in peer report data,

the conjecture of a meaningful GFP is not supported in the analysis of variance shared by

self- and peer reports and controlled for method-specific effects. Among the simpler

models without blended factors, including a GFP in the higher-order structure

significantly improved model fit, yet the GFP explains only

marginal proportions of the observed variance. In addition, in this model, the putative GFP

basically reflects the comparably high correlation between Emotional Stability and

Extraversion (for a similar conclusion see Musek, 2007). This correlation is comparably

high even across methods and between twin siblings (e.g. Twin A’s self-report on

Emotional Stability correlated with Twin B’s peer-report on Extraversion). Thus, a blended

factor model that takes this correlation into account at the first order factor (domain score)

level fits the data substantially better and supports this conclusion convincingly. We used

the blended factor models here only as a convenient way to test whether the correlation

between Plasticity and Stability is mainly due to the correlation between ES and E. These

models were inspired by Ashton et al.’s (2009) blended variables model.

While there is a notable correlation between Extraversion and Openness that gives rise to

the higher-order factor Plasticity, the Stability trait factor explains only a marginal

proportion of the variance in phenotypic measures of Emotional Stability, Agreeableness and

Conscientiousness, which make up this factor. A similar pattern of loadings on Stability and

Plasticity was found in a comparable analysis of American data (McCrae et al., 2008; study

2). Stability explained more variance in the American adult sample than in our analyses;

however, it was quite weak in a sample of American adolescents. Similarly, Anusic,

Schimmack, Pinkus and Lockwood (2009) were not able to identify Stability as a second

order factor in some of their analyses of Big Five measures. Thus, while our results are in

line with a hypothesized Plasticity factor, our German MTMM data question the usefulness

of the Stability factor. Since the Plasticity factor is based on a single correlation,

independent evidence is needed to clarify whether Plasticity is indeed a meaningful higher

order factor and not a misrepresentation of causal effects between E and O.

In addition, it should be kept in mind that we did not test blended variables models since

this would have resulted in overly complicated SEMs. Thus, the alternative explanation (Ashton

et al., 2009) that correlations among the NEO-PI-R domain scores are due to facets that

represent same-signed blends of orthogonal factors has not been addressed here. Ashton et al.

argue that certain combinations of orthogonal factors are socially more important for

judging persons than other combinations.

The two correlated method factors A-bias (loading on ES, C and A) and B-bias (loading

on E and O) explain more phenotypic variance on average than the second order trait

factors. The loadings of self- and peer reports on the method factors as well as the

correlation between method factors are in line with Anusic et al.’s (2009) finding that halo

error may be responsible for mono-method correlations among Big Five traits.

Furthermore, this documents the importance of multiple sources of data for studies of

the structure of personality constructs. The method factor structure implies that in both

mono-method self-report (e.g. Rushton & Irwing, 2009) and mono-method other

report studies (e.g. Rushton, Bons, & Hur, 2008, study 3) robust higher-order factors will

be found, because these confound method and trait effects.

The behavioural genetic analysis of trait factors yields very high heritability estimates

for all first order trait factors except Emotional Stability. This replicates similar results

reported by Kandler et al. (in press) for NEO-PI-R facets and Riemann et al. (1997) for the

NEO-FFI (Costa & McCrae, 1989; Borkenau & Ostendorf, 1993). Obviously, those aspects

of personality that contribute to the conjoint basis of self- and peer reports have a strong

genetic basis. Genetic effects on the second order factors Stability and Plasticity are

substantially smaller. This reflects that the relative size of genetic effects does not simply

result from the aggregation of measures but rather depends on the sources of covariance

among personality constructs. The environment not shared by twin siblings seems to

contribute substantially to the covariation between Extraversion and Openness as well as

among the traits loading on Stability.

Most important in the present context is the behavioural genetic analysis of method

factors. The self-report method factors A-bias and B-bias show a very high heritability,

indicating that self-reports capture specific variance that is validated by the convergence

across twin siblings. Thus we can conclude that self-reports measure something that is not

assessed by peer reports. Results for the peer report method factors differ markedly. Peer

report A-bias is only moderately influenced by genetic variance and peer report B-bias

shows no genetic effects at all. With the exception of the moderate systematic variance

observed for A-bias, we may conclude that there is an asymmetrical relation between both

methods. Almost all systematic variance captured by peer reports is shared with self-

reports but not vice versa. These findings extend and support the results of Kandler et al.

(in press).

The difference in the sources of variance in self- and peer report data is in line with a

response bias explanation. Since in our study two independent peers rated each twin

sibling, rater bias may only correlate between twins, if it is triggered by a common feature

of the twins and shared by all raters (e.g. shared stereotypes). On the other hand, genetic

influences on all kinds of self-report response biases contribute to twin correlations. One

way to test this explanation is to analyse control scales that are included in numerous

questionnaires (e.g. Lie scale) or other traits (e.g. Narcissism) that may distort ratings on

traits actually measured. Loehlin and Martin (2001) found genetic effects on the Lie scale

of the EPQ (Eysenck, Eysenck, & Barrett, 1985). Vernon, Villani, Vickers and Harris

(2008) report substantial genetic influences on self-reports of Narcissism (about 0.60).

However, the test of a bias explanation requires not only estimating the

heritability of control scales in self-reports, but also analysing the pattern of

correlations between control scales and trait scores in self- and peer reports or,

alternatively, statistically controlling trait scores for control scale variance. For example, in

their analysis of self- and peer reports of the EPQ-RS (Eysenck & Eysenck, 1991),

Angleitner, Riemann and Strelau (1997) included the EPQ-RS Lie scale. The lie scale

showed a heritability of 0.40 for self-reports. For peer reports effects of the shared

environment were stronger than genetic effects. Although agreement on the lie scale

among two peers was rather low (0.41), self- and peer report scores correlated 0.47. Thus,

the EPQ-RS lie scale reflects not only self-report specific response bias but also variance

common to self- and peer reports. We are not aware of any data that provide a test of the

bias explanation.

Alternatively, the strong genetic influence on self-report method factors might be also

interpreted as supporting the view that self-report measures of personality traits provide an

extended perspective on personality. That is, self-reports might capture aspects of traits that

are not readily accessible to external observers. Commenting on the relation between life

record data (including peer reports) and self-report measures, Cattell (1950) concluded that

‘presumably these [self-report data] deal with responses too confined to the ideational and

introspective fields to have anything but oblique representation in the basic, universal,

behavioural manifestation of the personality sphere’ (p. 83). Allport (1937) already

emphasized that personality can be described appropriately only if a multitude of measures

is combined. He emphasized that ‘there are a great many legitimate methods of studying

personality, each with a proper place in the armamentarium of the psychologist’ (p. 369). If

we consider modern physiological, neuroscience, genetic or information processing

approaches to personality measurement, this view seems even more plausible. Personality

assessment, however, becomes more complex if different measures capture different

aspects of personality, since the variance common to different measures of personality may

not represent central or core features of individual differences, but just the overlap that two

or more variables happen to measure conjointly. As anticipated by Allport, personality

description requires a thorough understanding of what measure captures what aspect of

personality or, as McCrae (2009) put it, ‘Because traits are not easily manipulated, trait

psychology depends on the thoughtful interpretation of careful observations’ (p. 673). The

MTMM-T study design is an important tool since it allows conclusions about the valid

variance specific to measures, and thus provides new insights into the interplay of

substance and artefact in personality measurement.

REFERENCES

Allport, G. W. (1937). Personality: A psychological interpretation. New York: Holt.

Angleitner, A., Riemann, R., & Strelau, J. (1997). Genetic and environmental influences on the EPQ-RS scales: A twin study using self- and peer reports. Poster presented at the 8th Meeting of the International Society for the Study of Individual Differences, Århus, Denmark, July 19–23.

Anusic, I., Schimmack, U., Pinkus, R. T., & Lockwood, P. (2009). The nature and structure of correlations among Big Five ratings: The Halo-Alpha-Beta model. Journal of Personality and Social Psychology, 97, 1142–1156.

Arbuckle, J. L. (2007). AMOS users’ guide 17.0. Chicago: SPSS.

Ashton, M. C., Lee, K., Goldberg, L. R., & de Vries, R. E. (2009). Higher order factors of personality: Do they exist? Personality and Social Psychology Review, 13, 79–91.

Bäckström, M., Björklund, F., & Larsson, M. R. (2009). Five-factor inventories have a major general factor related to social desirability which can be reduced by framing items neutrally. Journal of Research in Personality, 43, 335–344.

Bartels, M., Boomsma, D. I., Hudziak, J. J., van Beijsterveldt, T. C. E. M., & van den Oord, E. J. C. G. (2007). Twins and the study of rater (dis)agreement. Psychological Methods, 12, 451–466.

Biesanz, J. C., & West, S. G. (2004). Toward understanding assessments of the Big Five: Multitrait-multimethod analyses of convergent and discriminant validity across measurement occasion and type of observer. Journal of Personality, 72, 845–876.

Borkenau, P., & Ostendorf, F. (1993). NEO-Fünf-Faktoren-Inventar (NEO-FFI) [NEO five-factor inventory]. Göttingen: Hogrefe.

Borkenau, P., Riemann, R., Spinath, F. M., & Angleitner, A. (2000). Behavior genetics of personality: The case of observational studies. In S. Hampson (Ed.), Advances in personality psychology (Volume 1, pp. 107–137). Hove, England: Psychology Press.

Bouchard, T. J. Jr., & Loehlin, J. (2001). Genes, evolution, and personality. Behavior Genetics, 31, 243–273.

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen, & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park: Sage.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.

Cattell, R. B. (1950). Personality: A systematic theoretical and factual study. New York: McGraw-Hill.

Costa, P. T., & McCrae, R. R. (1989). NEO personality inventory: Manual form S and form R. Odessa, FL: Psychological Assessment Resources.

Costa, P. T., Jr., & McCrae, R. R. (1992). Revised NEO personality inventory (NEO-PI-R) and NEO five-factor inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources.

Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.

DeYoung, C. G. (2006). Higher-order factors of the Big Five in a multiinformant sample. Journal of Personality and Social Psychology, 91, 1138–1151.

DeYoung, C. G., Peterson, J. B., & Higgins, D. M. (2002). Higher-order factors of the Big Five predict conformity: Are there neuroses of health? Personality and Individual Differences, 33, 533–552.

Digman, J. M. (1997). Higher-order factors of the Big Five. Journal of Personality and Social Psychology, 73, 1246–1256.

Egloff, B., Wilhelm, F. H., Neubauer, D. H., Mauss, I. B., & Gross, J. J. (2002). Implicit anxiety measure predicts cardiovascular reactivity to an evaluated speaking task. Emotion, 2, 3–11.

Eid, M., Lischetzke, T., Nussbeck, F. W., & Trierweiler, L. I. (2003). Separating trait effects from trait-specific method effects in multitrait-multimethod models: A multiple-indicator CT-C(M-1) model. Psychological Methods, 8, 38–60.

Eysenck, H. J., & Eysenck, S. B. G. (1991). Manual of the Eysenck personality scales. London: Hodder & Stoughton.

Eysenck, S. B. G., Eysenck, H. J., & Barrett, P. (1985). A revised version of the Psychoticism scale. Personality and Individual Differences, 6, 21–29.

Funder, D. C., Kolar, D. C., & Blackman, M. C. (1995). Agreement among judges of personality: Interpersonal relations, similarity, and acquaintanceship. Journal of Personality and Social Psychology, 69, 656–672.

Graziano, W. G., & Ward, D. (1992). Probing the Big Five in adolescence: Personality and adjustment during the developmental transition. Journal of Personality, 60, 425–439.

Grumm, M., & von Collani, G. (2007). Measuring Big Five personality dimensions with the implicitassociation test – Implicit personality traits or self-esteem? Personality and Individual Differences,43, 2205–2217.

Hofstee, W. K. B. (1994). Who should own the definition of personality? European Journal ofPersonality, 8, 149–162.

Hoyt, W. T. (2000). Rater bias in psychological research: When is it a problem and what can we doabout it? Psychological Methods, 5, 64–86.

John, O. P., Goldberg, L. R., & Angleitner, A. (1984). Better than the alphabet: Taxonomies ofpersonality-descriptive terms in English, Dutch, and German. In H. Bonarius, G. van Heck, & N.Smid (Eds.), Personality psychology in Europe: Theoretical and empirical developments. Lisse:Swets & Zeitlinger.

Kandler, C., Riemann, R., Spinath, F., & Angleitner, A. (in press) Sources of variance in personalityfacets: A twin study of self-self, peer-peer, and self-peer (dis-)agreement. Journal of Personality.

Kenny, D. A. (1994). Interpersonal perception: A social relations analysis. New York: Guilford.Kraemer, H. C., Measelle, J. R., Ablow, J. C., Essex, M. J., Boyce, W. T., & Kupfer, D. J. (2003).A new approach to integrating data from multiple informants in psychiatric assessment andresearch: Mixing and matching contexts and perspectives. American Journal of Psychiatry, 160,1566–1577.

Letzring, T. D.,Wells, S.M., & Funder, D. C. (2006). Information quantity and quality affect the realisticaccuracy of personality judgment. Journal of Personality and Social Psychology, 9, 111–123.

Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (second edition). NewJersey: Wiley.

Loehlin, J. C., &Martin, N. G. (2001). Age changes in personality traits and their heritabilities duringthe adult years: Evidence from Australian twin registry samples. Personality and IndividualDifferences, 30, 1147–1160.

McCrae, R. R. (2009). The physics and chemistry of personality. Theory & Psychology, 19, 670–687.McCrae, R. R., Costa, P. T., Jr., Hrebıckova, M., Ostendorf, F., Angleitner, A., Avia, M. D., et al.(2000). Nature over nurture: Temperament, personality, and life span development. Journal ofPersonality and Social Psychology, 78, 173–186.

McCrae, R. R., Yamagata, S., Jang, K. L., Riemann, R., Ando, J., Ono, Y., et al. (2008). Substance andartifact in the higher-order factors of the Big Five. Journal of Personality and Social Psychology,95, 442–455.

Musek, J. (2007). A general factor of personality: Evidence for the Big One in the five-factor model.Journal of Research in Personality, 41, 1213–1233.

Neale, M. C., & Maes, H. H. M. (2004). Methodology for genetic studies of twins and families. Dordrecht: Kluwer Academic Publishers B.V.

Neubauer, A. C., Spinath, F. M., Riemann, R., Borkenau, P., & Angleitner, A. (2000). Genetic and environmental influences on two measures of speed of information processing and their relation to psychometric intelligence: Evidence from the German observational study of adult twins. Intelligence, 28, 267–289.

Ostendorf, F., & Angleitner, A. (2004). NEO-Persönlichkeitsinventar, revidierte Form, NEO-PI-R nach Costa und McCrae [Revised NEO personality inventory, NEO-PI-R of Costa and McCrae]. Göttingen, Germany: Hogrefe.

Paulhus, D. L., & John, O. P. (1998). Egoistic and moralistic biases in self-perception: The interplay of self-deceptive styles with basic traits and motives. Journal of Personality, 66, 1025–1060.

Riemann, R., Angleitner, A., & Strelau, J. (1997). Genetic and environmental influences on personality: A study of twins reared together using the self- and peer report NEO-FFI scales. Journal of Personality, 65, 449–475.

Rushton, J. P., Bons, T. A., & Hur, Y.-M. (2008). The genetics and evolution of the general factor of personality. Journal of Research in Personality, 42, 1173–1185.

Rushton, J. P., & Irwing, P. (2008). A general factor of personality (GFP) from two meta-analyses of the Big Five: Digman (1997) and Mount, Barrick, Scullen, and Rounds (2005). Personality and Individual Differences, 45, 679–683.

Rushton, J. P., & Irwing, P. (2009). A general factor of personality in the Comrey personality scales, the Minnesota multiphasic personality inventory-2, and the multicultural personality questionnaire. Personality and Individual Differences, 46, 437–442.


Schultheiss, O. C., & Brunstein, J. C. (2001). Assessment of implicit motives with a research version of the TAT: Picture profiles, gender differences, and relations to other personality measures. Journal of Personality Assessment, 77, 71–86.

Spinath, F. M., Angleitner, A., Borkenau, P., Riemann, R., & Wolf, H. (2002). German observational study of adult twins (GOSAT): A multimodal investigation of personality, temperament and cognitive ability. Twin Research and Human Genetics, 5, 372–375.

Stößel, K., Kämpfe, N., & Riemann, R. (2006). The Jena twin registry and the Jena twin study of social attitudes (JeTSSA). Twin Research and Human Genetics, 9, 783–786.

Tellegen, A., Grove, W. M., & Waller, N. G. (1991). Inventory of personal characteristics #7 (IPC7). Unpublished materials, University of Minnesota.

Vernon, P. A., Villani, V. C., Vickers, L. C., & Harris, J. A. (2008). A behavioural genetic investigation of the dark triad and the big five. Personality and Individual Differences, 44, 445–452.

Yik, M. S. M., & Bond, M. H. (1993). Exploring the dimensions of Chinese person perception with indigenous and imported constructs: Creating a culturally balanced scale. International Journal of Psychology, 28, 75–95.


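APPENDIX

Multitrait-multimethod-twin correlation matrix

Rows and columns are labelled by twin (A, B), method (Self, Peer), and trait (ES, E, O, A, C).

| Twin | Method | Trait | A Self ES | A Self E | A Self O | A Self A | A Self C | A Peer ES | A Peer E | A Peer O | A Peer A | A Peer C | B Self ES | B Self E | B Self O | B Self A | B Self C | B Peer ES | B Peer E | B Peer O | B Peer A | B Peer C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | Self | ES |  | 0.44 | -0.02 | 0.04 | 0.42 | 0.48 | 0.20 | -0.10 | -0.11 | 0.09 | 0.20 | 0.15 | -0.07 | -0.01 | 0.11 | 0.19 | 0.15 | -0.04 | -0.01 | 0.12 |
| A | Self | E | 0.39 |  | 0.26 | 0.01 | 0.30 | 0.21 | 0.60 | 0.09 | -0.17 | 0.06 | 0.07 | 0.32 | 0.06 | 0.07 | 0.10 | 0.09 | 0.31 | 0.04 | 0.10 | 0.09 |
| A | Self | O | 0.07 | 0.45 |  | -0.08 | -0.10 | -0.06 | -0.12 | 0.59 | 0.02 | -0.04 | 0.01 | 0.03 | 0.28 | 0.02 | 0.03 | -0.02 | -0.10 | 0.17 | 0.03 | 0.04 |
| A | Self | A | 0.17 | 0.03 | -0.03 |  | 0.04 | -0.06 | 0.02 | 0.15 | 0.48 | 0.00 | -0.01 | 0.07 | 0.01 | 0.27 | 0.01 | 0.04 | 0.01 | 0.01 | 0.20 | 0.01 |
| A | Self | C | 0.42 | 0.25 | 0.03 | 0.11 |  | 0.25 | 0.11 | -0.19 | -0.10 | 0.52 | 0.12 | 0.16 | -0.04 | 0.13 | 0.28 | 0.11 | 0.06 | -0.08 | 0.12 | 0.21 |
| A | Peer | ES | 0.40 | 0.24 | 0.04 | 0.01 | 0.25 |  | 0.27 | -0.11 | 0.03 | 0.44 | 0.01 | 0.01 | -0.04 | 0.06 | 0.07 | 0.15 | -0.02 | -0.11 | 0.08 | 0.04 |
| A | Peer | E | 0.24 | 0.64 | 0.21 | 0.05 | 0.13 | 0.30 |  | 0.22 | -0.11 | 0.06 | 0.08 | 0.21 | 0.06 | 0.01 | 0.03 | 0.07 | 0.30 | 0.07 | 0.02 | 0.03 |
| A | Peer | O | 0.10 | 0.24 | 0.56 | 0.06 | 0.00 | 0.12 | 0.42 |  | 0.17 | -0.01 | 0.05 | -0.02 | 0.17 | -0.03 | -0.02 | 0.13 | 0.01 | 0.14 | 0.00 | 0.09 |
| A | Peer | A | 0.03 | -0.05 | 0.01 | 0.45 | 0.01 | 0.16 | 0.00 | 0.18 |  | 0.05 | 0.03 | -0.03 | -0.01 | 0.27 | 0.08 | 0.00 | -0.08 | 0.07 | 0.30 | 0.02 |
| A | Peer | C | 0.19 | 0.04 | 0.02 | 0.03 | 0.54 | 0.45 | 0.12 | 0.10 | 0.16 |  | -0.05 | 0.03 | 0.06 | 0.09 | 0.11 | 0.11 | 0.00 | -0.13 | 0.05 | 0.11 |
| B | Self | ES | 0.53 | 0.17 | 0.01 | 0.04 | 0.24 | 0.22 | 0.13 | 0.04 | -0.02 | 0.09 |  | 0.43 | 0.10 | 0.09 | 0.39 | 0.61 | 0.34 | 0.04 | -0.07 | 0.34 |
| B | Self | E | 0.21 | 0.52 | 0.19 | 0.11 | 0.16 | 0.06 | 0.43 | 0.12 | -0.01 | -0.01 | 0.32 |  | 0.40 | 0.18 | 0.29 | 0.26 | 0.67 | 0.23 | -0.01 | 0.13 |
| B | Self | O | 0.06 | 0.25 | 0.56 | 0.08 | 0.03 | 0.05 | 0.20 | 0.49 | 0.06 | 0.02 | 0.05 | 0.40 |  | 0.18 | 0.02 | -0.03 | 0.18 | 0.55 | 0.03 | -0.05 |
| B | Self | A | 0.15 | 0.09 | 0.03 | 0.53 | 0.01 | 0.01 | 0.09 | 0.07 | 0.27 | -0.09 | 0.18 | 0.15 | 0.16 |  | 0.09 | 0.01 | 0.06 | 0.15 | 0.60 | 0.05 |
| B | Self | C | 0.23 | 0.07 | -0.01 | 0.00 | 0.55 | 0.11 | 0.09 | -0.06 | -0.02 | 0.37 | 0.42 | 0.23 | 0.06 | 0.06 |  | 0.25 | 0.15 | -0.03 | 0.06 | 0.61 |
| B | Peer | ES | 0.30 | 0.10 | 0.01 | 0.01 | 0.15 | 0.37 | 0.14 | 0.10 | 0.03 | 0.17 | 0.48 | 0.17 | 0.02 | 0.04 | 0.24 |  | 0.36 | 0.00 | 0.13 | 0.48 |
| B | Peer | E | 0.15 | 0.42 | 0.10 | 0.09 | 0.05 | 0.11 | 0.51 | 0.13 | -0.03 | -0.02 | 0.19 | 0.64 | 0.23 | 0.14 | 0.06 | 0.30 |  | 0.34 | 0.04 | 0.22 |
| B | Peer | O | 0.01 | 0.16 | 0.41 | 0.17 | -0.10 | -0.04 | 0.26 | 0.49 | 0.15 | -0.04 | -0.06 | 0.23 | 0.59 | 0.23 | -0.12 | 0.00 | 0.46 |  | 0.20 | 0.12 |
| B | Peer | A | 0.02 | -0.02 | 0.08 | 0.35 | -0.09 | 0.10 | 0.00 | 0.12 | 0.46 | 0.06 | 0.02 | 0.00 | 0.12 | 0.46 | -0.06 | 0.23 | 0.10 | 0.29 |  | 0.22 |
| B | Peer | C | 0.14 | 0.06 | 0.05 | 0.11 | 0.40 | 0.20 | 0.05 | 0.03 | 0.09 | 0.43 | 0.17 | 0.07 | 0.04 | 0.08 | 0.48 | 0.43 | 0.12 | 0.05 | 0.20 |  |

Note: Above the diagonal, the MTMM-T correlation matrix of DZ twins is shown; below the diagonal, that of MZ twins. In the original table, correlations > .25 are printed in boldface.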
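Because the appendix packs two correlation matrices into one square (DZ above the diagonal, MZ below), anyone reusing the values would first have to separate them. The following is only a minimal sketch of that step, not the authors' procedure. The function name `split_triangles` is hypothetical, the three-variable excerpt covers only the Twin A self-report scores for ES, E, and O, and filling the diagonal with 1.0 is an assumption, since the appendix leaves those cells empty.

```python
# Toy excerpt of the appendix: Twin A, self-report, traits ES, E, O.
# Cells above the diagonal are DZ correlations, cells below are MZ.
# None marks the empty diagonal cells.
combined = [
    [None, 0.44, -0.02],
    [0.39, None, 0.26],
    [0.07, 0.45, None],
]


def split_triangles(matrix):
    """Return (mz, dz) as full symmetric matrices (hypothetical helper)."""
    n = len(matrix)
    # Assumption: unit diagonal, as for any correlation matrix.
    mz = [[1.0] * n for _ in range(n)]
    dz = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            # matrix[i][j] with i > j lies below the diagonal (MZ).
            mz[i][j] = mz[j][i] = matrix[i][j]
            # The mirrored cell matrix[j][i] lies above the diagonal (DZ).
            dz[i][j] = dz[j][i] = matrix[j][i]
    return mz, dz


mz, dz = split_triangles(combined)
```

The same function would apply unchanged to the full 20 x 20 matrix once its rows are entered as lists in the order given in the appendix.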
