
Furr, Special Topics - 1

Chapter 7

Special Topics in Social Psychological Measurement:

Profile Similarity and Difference Scores

This chapter addresses two important but somewhat misunderstood special topics in

social psychological measurement – profile similarity (i.e., profile agreement) and difference

scores. Both are used to reflect important social psychological phenomena. For example,

researchers studying personality similarity and relationship satisfaction might use sets of

psychological scales to obtain profiles of psychological characteristics for wives and husbands,

interpreting the similarity between wife-husband profiles in terms of “personality similarity.”

Similarly, researchers studying attraction and body types might ask participants to rate the

“thinnest” body type they find attractive and to rate the “heaviest” body type they find attractive,

computing the difference between these two ratings, and interpreting the difference in terms of

the “range of body sizes that each participant finds attractive.”

Although such measurement strategies are appealing, they entail complexities that, if not

understood and managed, can compromise researchers’ conclusions. For example, if not based

upon appropriately-conducted analyses, an apparent association between psychological similarity

and relationship satisfaction could reflect the relatively simple and mundane fact that well-

adjusted people generally have good relationships. Such results would say little, if anything,

about the supposed link between psychological similarity and relationship satisfaction. Similarly,

correlations between “desirability of women’s psychological attributes” and “attractiveness

range” difference scores may appear to indicate that men are attracted to a wider range of body

sizes among women who have desirable psychological attributes than among women having less


desirable psychological attributes. However, careful analysis may reveal a simpler, clearer, and

more fundamental message that men’s attraction to heavier women increases in response to

desirable psychological attributes, but their attraction to thinner women is unchanged. The

current chapter addresses these measurement issues, articulating the problems and introducing

solutions.

Profile Similarity

Social psychologists often study profiles of psychological characteristics when examining

phenomena such as psychological similarity, accuracy of social judgments, person-environment

fit, cross-situational behavioral consistency, and developmental stability across time. Such

examinations often arise from analysis of profile similarity – the degree to which two profiles of

characteristics are similar to each other (e.g., Baker & Block, 1957; Bernieri, Zuckerman,

Koestner, & Rosenthal, 1994; Biesanz, West, & Millevoi, 2007; Blackman & Funder, 1998;

Colvin, 1993; Furr, Dougherty, Marsh, & Mathias, 2007; Furr & Funder, 2004; Gonzaga,

Campos, & Bradbury, 2007; Letzring, Wells, & Funder, 2005; No, Hong, Liao, Lee, Wood, &

Chao, 2008; Starzyk, Holden, Fabrigar, & MacDonald, 2006).

For example, Luo and Klohnen (2005) examined associations between couples’

personality similarity and marital satisfaction, evaluating the hypothesis that people select mates

who are relatively similar to themselves. Participants completed self-report personality ratings

across several characteristics, and Luo and Klohnen computed a similarity correlation between

each wife’s profile of ratings and her husband’s profile of ratings. As an index of profile

similarity, a strong positive correlation indicates that a wife’s personality profile is similar to her

husband’s profile.

To illustrate, Figure 7.1a presents hypothetical profiles for a couple. The profiles’ shapes


are quite similar to each other, with corresponding patterns of highs and lows. Thus, the

characteristics that the wife views as relatively descriptive of her personality (as compared to

other characteristics) are the characteristics that the husband views as relatively descriptive of his

personality. The Pearson correlation between the two sets of scores is large, r = .70, quantifying

the considerable similarity between the profiles’ shapes.

--------Insert Figure 7.1----------

Profile analysis is appealing in at least three ways. First, it reflects similarity (or stability,

or consistency, or agreement, or fit) across a wide range of characteristics, providing an

ostensibly “holistic” perspective on similarity. Rather than defining similarity in terms of a

single variable (e.g., the degree to which couples have similar levels of Extraversion), it reflects

similarity across multiple characteristics. Second, it reflects similarity at the couple-level or

person-level, rather than at a sample-level. In the context of wife-husband similarity, it produces

a similarity score for each couple. Third, profile analysis provides a relatively straightforward

method of examining correlates, predictors, or consequences of similarity (or agreement, or

stability, etc.). For example, after similarity indices are computed for each couple, they can be

correlated with couples’ relationship satisfaction scores. A positive correlation might be

interpreted as indicating that wife-husband pairs with relatively similar personality profiles are

relatively satisfied with each other. Such questions can be addressed in other ways, but they

require complex analytic strategies (e.g., Furr et al., 2007).

Despite this appeal, at least two complexities potentially obscure analyses of profile

similarity. First is the need to understand and accommodate profile elevation, scatter, and shape.

Second is the need to understand and differentiate between profile normativeness and

distinctiveness.


Elevation, Scatter, and Shape

Thus far, discussion has highlighted one facet of a profile – its shape, in terms of the

pattern of high and low values (see Figure 7.1b). Indeed, for many applications of profile

similarity, profile shape may be the most interesting and psychologically meaningful facet.

However, there are two additional facets of a profile (Cronbach & Gleser, 1953; Furr,

2009b). Profile elevation refers to the average score across all variables in the profile, and

profile scatter refers to the variability across the variables – the degree of spread among the

scores. As shown in Table 7.1 and Figure 7.1b, the wife in couple 1 has an elevation of 3.8 and a

scatter (as indexed by the standard deviation) of 1.47. Setting aside ceiling or floor effects, the

three elements of a profile are independent.
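The three facets can be computed directly from a profile of scores. The following is a minimal sketch, not the chapter's own procedure: the scores are hypothetical values chosen so that the wife's elevation and scatter come out near the 3.8 and 1.47 quoted above (the actual Table 7.1 entries are not reproduced here), and the use of the population standard deviation (`ddof=0`) is an assumption. It also illustrates that a Pearson correlation between two profiles is unchanged when one profile's elevation and scatter are altered.

```python
import numpy as np

def elevation(profile):
    """Mean score across all variables in the profile."""
    return float(np.mean(profile))

def scatter(profile):
    """Spread of scores within the profile (population SD, ddof=0; an assumption)."""
    return float(np.std(profile, ddof=0))

def shape_similarity(profile_a, profile_b):
    """Pearson correlation between two profiles, reflecting similarity of shape."""
    return float(np.corrcoef(profile_a, profile_b)[0, 1])

# Hypothetical five-variable profiles (not the actual Table 7.1 entries).
wife = np.array([4.0, 5.0, 1.0, 4.0, 5.0])
husband = np.array([3.0, 5.0, 2.0, 4.0, 4.0])

# Same shape as the husband's profile, but higher elevation and larger scatter.
husband_rescaled = 2.0 * husband + 1.0

print("wife elevation:", round(elevation(wife), 2))
print("wife scatter:", round(scatter(wife), 2))
print("shape similarity:", round(shape_similarity(wife, husband), 2))
print("after rescaling:", round(shape_similarity(wife, husband_rescaled), 2))
```

The last two printed values agree because a linear rescaling changes elevation and scatter but leaves the pattern of highs and lows intact.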

--------Insert Table 7.1----------

Because each profile has multiple elements, similarity between profiles can be gauged in

multiple ways. That is, profile similarity can be indexed in terms of one, two, or all three

elements. In fact, psychologists have debated the methods of quantifying profile similarity for

more than 60 years (e.g., Carroll & Field, 1974; Cattell, 1949; Cohen, 1969; Cronbach & Gleser,

1953; du Mas, 1946; Furr, 2009b; Kenny, Kashy, & Cook, 2006; McCrae, 1993, 2008; Nunnally,

1962).

Currently, researchers use either of two primary techniques to quantify profile similarity.

The first and simplest is a Pearson correlation between two profiles, as described earlier. The

simple correlation reflects the similarity between profiles’ shapes – that is, it reflects similarity

between profiles’ patterns of highs and lows, being unaffected by similarity or dissimilarity in

the profiles’ elevation and scatter. In contrast, the double-entry intraclass correlation is an

“omnibus” index of profile similarity, affected by all three elements of profile similarity (Furr,


2009b). The double-entry intraclass correlation is computed by: a) creating two “doubly-entered”

profiles by appending each profile of scores to the end of the other, and b) computing a Pearson

correlation between these two doubly-entered profiles (see Furr, 2009b for a detailed example).

Some researchers recommend the double-entry intraclass correlation over the Pearson correlation

because it blends the three elements (McCrae, 2008). However, such recommendations have

been questioned in several ways – in terms of conceptual ambiguity associated with blending of

independent elements, in terms of significant technical problems and potential confusions, and in

terms of a lack of clear empirical benefit over an approach focusing on each element separately

(Furr, 2009b).
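The two-step double-entry procedure described above can be sketched as follows. This is an illustrative implementation under the description given in the text; the function name and the example profiles are hypothetical. The example contrasts two profiles that have identical shape but different elevation, a case where the Pearson correlation and the double-entry intraclass correlation diverge.

```python
import numpy as np

def double_entry_icc(profile_a, profile_b):
    """Double-entry intraclass correlation between two profiles.

    Step a: append each profile to the end of the other.
    Step b: compute a Pearson correlation between the two doubled profiles.
    """
    a = np.asarray(profile_a, dtype=float)
    b = np.asarray(profile_b, dtype=float)
    first = np.concatenate([a, b])
    second = np.concatenate([b, a])
    return float(np.corrcoef(first, second)[0, 1])

# Hypothetical profiles: same shape, but the second is three points higher.
low = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
high = low + 3.0

pearson = float(np.corrcoef(low, high)[0, 1])
omnibus = double_entry_icc(low, high)

print("Pearson (shape only):", round(pearson, 2))
print("Double-entry ICC:", round(omnibus, 2))
```

Because the double-entry index blends elevation and scatter with shape, it falls well below the Pearson value here even though the two shapes are identical.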

Currently, the simple Pearson correlation seems to be the most commonly-used index of

profile similarity. However, researchers should be familiar with profile elevation, scatter, and

shape, and they should understand their effects on any index of profile similarity.

Normativeness and Distinctiveness

Profile similarity’s second complexity is the “normativeness problem” (Furr, 2008). A

profile’s normativeness is the degree to which it reflects an average profile – the similarity

between the shape of an individual profile and the shape of a group’s normative profile. Figure

7.2 illustrates this, with Figure 7.2a presenting three wives’ self-rated personality profiles. There

are clear commonalities across the three individual profiles – all have higher Extraversion than

Neuroticism, and all have lower Openness than Agreeableness and Conscientiousness. Figure

7.2b presents these profiles alongside a “normative” profile. The normative profile reflects scores

for each variable averaged across all wives’ self-ratings. Each wife’s profile is at least somewhat

similar to the normative wife profile, suggesting that each wife is normative to some degree.

--------Insert Figure 7.2----------


Although there has been little empirical examination of normative profiles, there are at

least three likely properties of normativeness (Furr, 2008). First, many individual profiles are

likely to be quite normative, as suggested in Figure 7.2b. Second, two normative profiles in an

analysis are likely to be very similar to each other. For example, the normative wife profile in

Figure 7.2b is likely very similar to a normative husband profile. Third, a normative profile is

probably psychologically meaningful as social desirability, psychological well-being, or

adaptation to an environment (Wood, Gosling, & Potter, 2007). That is, the variables having

relatively high scores within a normative profile are likely to be socially desirable, and those

having relatively low scores are likely to be undesirable.

Together, these properties have implications for profile similarity. First, any two profiles

are likely to be similar, even with no intrinsic connection between them. For example, a wife’s

profile is likely to be somewhat, perhaps very, similar to the profile of a husband from another

couple. Second, profile similarity can represent social desirability, adjustment, or adaptiveness.

For example, high similarity between a wife’s profile and her husband’s profile might indicate

simply that both people are well-adjusted.

These normativeness implications create two problems for analyses of profile similarity.

First, they obscure interpretations of average levels of profile similarity. For example, social

psychologists might wish to interpret the average level of wife-husband similarity (i.e., averaged

across all couples) as indicating the degree to which people marry people with similar

psychological characteristics. Unfortunately, this interpretation is clouded by the fact that any

given wife’s profile is probably similar to any given husband’s profile. That is, researchers

would likely find psychological similarity between women and men in general, even among

people from different couples. The second normativeness problem concerns interpretations of


antecedents, consequences, or correlates of profile similarity. For example, social psychologists

might examine wife-husband similarity and relationship satisfaction, find a positive correlation

between the two, and wish to conclude that psychological similarity contributes to (or at least is

associated with) satisfying relationships. However, wife-husband similarity may partially reflect

psychological adjustment, arising from connections between individual profiles, normativeness,

and desirability or adjustment. Therefore, a correlation between wife-husband similarity and

relationship satisfaction might indicate simply that well-adjusted people generally have

satisfying relationships. This may be an important psychosocial finding, but it reflects no

intrinsic link between psychological similarity and relationship satisfaction.

There are at least two methods for handling normativeness problems (Furr, 2008). The

first is a sample-level method, in which researchers create profile similarity values for random

pairs of profiles, in order to gauge the “normative” level of similarity in a sample. For example,

Luo and Klohnen (2005) examined wife-husband similarity and noted that “individuals, on

average, tend to be more similar than dissimilar” (p. 311). To address this normativeness issue,

they created “random couples” by pairing each wife’s profile with the profile of a husband from

a different couple, and they computed a similarity correlation for each random couple. They then

interpreted the mean “random couple” similarity correlation as reflecting “the average similarity

between men and women” (p. 311). Finally, they computed the average similarity correlation

between “real” couples and contrasted it with the mean “random couple” correlation, interpreting

the difference as the degree to which real couples are more similar than are random pairs of men

and women. In the context of couples, this approach has been referred to as “pseudo-couple

analysis” (Kenny et al., 2006, pp. 335-337). This approach facilitates analysis of average levels of


profile similarity, but it cannot address problems associated with antecedents, consequences, or

correlates of similarity. The second approach addresses both issues.
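The sample-level comparison of real and random couples might be sketched as below. Several details are assumptions made for illustration: the data are hypothetical, every cross-couple pairing is used rather than a single random pairing per wife, and similarity correlations are averaged directly without any transformation. None of these choices is claimed to match the procedure of Luo and Klohnen (2005).

```python
import numpy as np

def similarity(profile_a, profile_b):
    """Pearson correlation between two profiles."""
    return float(np.corrcoef(profile_a, profile_b)[0, 1])

def real_vs_random_similarity(wives, husbands):
    """Compare mean similarity of real couples with that of cross-couple pairs.

    wives, husbands: arrays of shape (couples, variables); row i of each
    array belongs to the same couple.
    """
    wives = np.asarray(wives, dtype=float)
    husbands = np.asarray(husbands, dtype=float)
    n = len(wives)
    real = [similarity(wives[i], husbands[i]) for i in range(n)]
    # Assumption: use all wife-husband pairings drawn from different couples.
    pseudo = [similarity(wives[i], husbands[j])
              for i in range(n) for j in range(n) if i != j]
    real_mean = float(np.mean(real))
    pseudo_mean = float(np.mean(pseudo))
    return {"real_mean": real_mean,
            "random_mean": pseudo_mean,
            "difference": real_mean - pseudo_mean,
            "n_random_pairs": len(pseudo)}

# Hypothetical ratings for three couples on five characteristics.
wives = [[4, 5, 1, 4, 5], [3, 6, 2, 5, 4], [5, 6, 3, 6, 5]]
husbands = [[3, 5, 2, 4, 4], [2, 6, 3, 5, 3], [4, 5, 2, 6, 6]]

result = real_vs_random_similarity(wives, husbands)
print(result)
```

The `difference` entry corresponds to the contrast described in the text: how much more similar real couples are than arbitrary pairs of women and men.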

The second method is a pair-level method in which similarity is decomposed for each

pair of profiles (Cronbach, 1955; Furr, 2008). The association between two profiles can be

partitioned into components representing blends of similarity, normativeness, and distinctiveness.

This decomposition can be done in several ways (Furr, 2008), and Tables 7.1 and 7.2 illustrate

one way applied to three hypothetical couples.

--------Insert Table 7.2----------

The pair-level process begins by decomposing each profile into two component profiles –

a normative profile reflecting group means, and a distinctive profile reflecting differences

between an individual and the group on each variable. As shown in Table 7.1, the normative

wife profile is the mean of all wives (for each variable), and the normative husband profile is the

mean of all husbands. A distinctive profile includes “deviation scores” reflecting the difference

between an individual’s score on a variable and a group’s mean score on the variable. For

example, Marge’s distinctive profile (Table 7.1) reveals that she is exactly as Neurotic as the

average wife (4-4=0), somewhat less Extraverted than the average wife (5-5.67=-.67), and so on.

Following the decomposition of individual profiles, several indices can be computed for

each pair of profiles (see Table 7.2). Overall Similarity is the correlation between two raw,

unadjusted profiles as discussed earlier (e.g., the overall similarity between Marge and Homer

is .70, see Couple 1 in Table 7.2). Distinctive similarity is the correlation between a pair of

distinctive profiles, reflecting the degree to which the pair shares a pattern of non-normativeness.

For example, the correlation between Marge’s distinctive profile and Homer’s distinctive profile

is .19, indicating that, to a slight degree, the ways in which Marge has unusually high or low


levels of specific characteristics are similar to the ways in which Homer has unusually high or low

levels of those characteristics. That is, the ways that Marge is distinctive are somewhat similar

to the ways that Homer is distinctive. Generalized Normative Similarity is the degree of

similarity between two normative profiles, such as the correlation between the normative wife

and normative husband profiles, r = .92 in Table 7.2 (note that this value is the same across all

pairs of profiles). Pair-level normativeness indices can also be computed, for example, between

each wife’s profile and the normative wife profile and between each husband’s profile and the

normative husband profile. For example, Table 7.2 indicates that Marge is extremely like the

average wife (r = .94) and that Homer is very much like the average husband (r = .77). See Furr

(2008) for additional possibilities in decomposing profiles and for psychometric details of these

decompositions.
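The pair-level indices described above can be sketched in a few lines. This is one possible implementation under the definitions given in the text, not the code behind Tables 7.1 and 7.2; the ratings are hypothetical and the dictionary keys are names introduced here for illustration.

```python
import numpy as np

def corr(a, b):
    """Pearson correlation between two profiles."""
    return float(np.corrcoef(a, b)[0, 1])

def decompose_pair(wives, husbands, couple):
    """Similarity and normativeness indices for one couple.

    wives, husbands: arrays of shape (couples, variables).
    couple: row index of the couple to analyze.
    """
    wives = np.asarray(wives, dtype=float)
    husbands = np.asarray(husbands, dtype=float)
    # Normative profiles: group means on each variable.
    norm_wife = wives.mean(axis=0)
    norm_husband = husbands.mean(axis=0)
    # Distinctive profiles: deviation of each individual from the group mean.
    dist_wife = wives[couple] - norm_wife
    dist_husband = husbands[couple] - norm_husband
    return {
        "overall_similarity": corr(wives[couple], husbands[couple]),
        "distinctive_similarity": corr(dist_wife, dist_husband),
        "generalized_normative_similarity": corr(norm_wife, norm_husband),
        "wife_normativeness": corr(wives[couple], norm_wife),
        "husband_normativeness": corr(husbands[couple], norm_husband),
        "wife_distinctive_profile": dist_wife,
    }

# Hypothetical ratings for three couples on five characteristics.
wives = [[4, 5, 1, 4, 5], [3, 6, 2, 5, 4], [5, 6, 3, 6, 5]]
husbands = [[3, 5, 2, 4, 4], [2, 6, 3, 5, 3], [4, 5, 2, 6, 6]]

first = decompose_pair(wives, husbands, 0)
second = decompose_pair(wives, husbands, 1)
for name, value in first.items():
    print(name, np.round(value, 2))
```

As the text notes, generalized normative similarity is identical for every couple because it depends only on the two group-level profiles, whereas the other indices vary from couple to couple.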

A pair-level approach handles both normativeness problems – the “average-level”

problem and the “antecedents, consequences, and correlates” problem. By examining distinctive

similarity alongside overall similarity and normativeness, researchers gain insight into general

levels of wife-husband similarity, into the similarity between wives’ and husbands’ distinctive

qualities, and into the possibility that relationship satisfaction is associated either with

normativeness (and thereby potentially adjustment) or with the similarity between wives’ and

husbands’ distinctive qualities. Such differentiated analyses can produce interesting insights,

such as the insight that high acquaintanceship enhances people’s understanding of each other’s

distinctive personality qualities while minimizing their reliance on normative personality

information in social judgments (Biesanz et al., 2007).

Summary

Analysis of profile similarity is an appealing method for assessing and examining social


psychological phenomena such as psychological similarity, judgmental accuracy, cross-

situational behavioral consistency, and person-environment fit. However, to realize the full

potential of this method, researchers must account for important complexities. Specifically, they

should recognize that any index of profile similarity is affected by one or more of the three

elements of similarity – shape, elevation, and/or scatter. Furthermore, they should be aware of

the elements affecting a specific index, and they should understand the costs, benefits, and

meaning of each potential index. Finally, they should understand normativeness and its potential

effects on profile similarity, and they should implement appropriate analytic strategies to account

for these effects. When conducted with appropriate analytic strategies, profile similarity can be a

useful tool for social psychologists.

Difference Scores

Many interesting social psychological phenomena can be seen as differences between

two “component” phenomena. For example, actual-ideal discrepancy might be viewed as the

difference between a participant’s actual standing on a variable and his or her preferred standing

on that variable. Similarly, “attractiveness range” might be viewed as the difference between the

thinnest body type (say as indexed in terms of Body Mass Index) that a person finds attractive

and the heaviest body type that he or she finds attractive. For such phenomena, researchers

might be tempted to use difference scores – measuring each component variable (e.g., BMI of

thinnest body type that a person finds attractive and BMI of the largest body type he or she finds

attractive) and then subtracting one value from another to produce a difference score (also called

change scores, gain scores, and discrepancy scores).


Difference scores are intuitively appealing. They seem to fit well with

phenomena such as actual-ideal discrepancy, psychological change, and attractiveness range, and

they arise from simple subtraction:

$$D_i = X_i - Y_i$$

Equation 7.1

An individual’s difference score is the difference between his or her score on two components –

variable X and variable Y. Given this intuitive appeal and simplicity, difference scores have

been used in many areas of psychology, including social psychology.

Unfortunately, this intuitive appeal masks psychometric issues potentially compromising

psychological conclusions based upon difference scores. These issues have been discussed for

decades (e.g., Collins, 1996; Cronbach & Furby, 1970; Overall & Woodward, 1975; Rogosa,

1995; Zimmerman, Williams, & Zumbo, 1993; Zumbo, 1999), but non-optimal use of difference

scores persists. If these issues are ignored when using difference scores, then research quality

suffers – perhaps producing conclusions that are misinformed or that simply miss more

fundamental phenomena.

This section presents psychometric and statistical properties of difference scores,

important implications of these properties, problems arising from these implications, and

measurement-based recommendations regarding the potential use of difference scores. The

practical take-home message is twofold.

1. Researchers should consider avoiding difference scores, instead focusing on the

component variables from which difference scores are computed.

2. If researchers use difference scores, then they should do so with thorough

examination of the component variables and with serious attention to psychometric

quality.


Properties of Difference Scores

There are at least two fundamental psychometric properties of difference scores –

properties that, we shall see, have implications for the meaning of difference scores and,

ultimately, for the psychological meaning of conclusions based upon difference scores. The

properties concern the reliability and variability of difference scores.

Reliability. Observed difference scores are treated as indicators of “true” psychological

difference scores. That is, the observed difference between measured variable X and measured

variable Y is taken as an indicator of the difference between a person’s true score on variable X

and his or her true score on variable Y. Thus, it is important to recognize the factors affecting

the reliability of observed difference scores – factors affecting the degree to which variability in

observed difference scores reflects variability in true difference scores.

In theoretical terms, the reliability of observed difference scores (rDD) is affected by true

score variability in the component variables, the true correlation between component variables,

and the reliability of the measures of the component variables:

$$r_{DD} = \frac{s^2_{T_X} + s^2_{T_Y} - 2\,s_{T_X} s_{T_Y} r_{T_X T_Y}}{\dfrac{s^2_{T_X}}{r_{XX}} + \dfrac{s^2_{T_Y}}{r_{YY}} - 2\,s_{T_X} s_{T_Y} r_{T_X T_Y}}$$

Equation 7.2

In this equation, $s^2_{T_X}$, $s^2_{T_Y}$, $s_{T_X}$, and $s_{T_Y}$ are the true score variances and standard deviations of

variables X and Y (again, the two components of the difference score), rXX and rYY are

reliabilities of the measures of X and Y, and $r_{T_X T_Y}$ is the true correlation between X and Y.¹ As

we shall see, this equation has important implications for the meaning and utility of difference

scores.
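Equation 7.2 is straightforward to evaluate numerically. The function below is a small sketch of that calculation; its name and argument names are introduced here for illustration. The example calls use the settings the chapter describes for Figure 7.3 (equal true variances, with the true correlation and reliabilities varied).

```python
def difference_score_reliability(var_tx, var_ty, r_true, r_xx, r_yy):
    """Reliability of observed difference scores (Equation 7.2).

    var_tx, var_ty: true score variances of X and Y.
    r_true: true correlation between X and Y.
    r_xx, r_yy: reliabilities of the measures of X and Y.
    """
    sd_tx = var_tx ** 0.5
    sd_ty = var_ty ** 0.5
    covariance_term = 2 * sd_tx * sd_ty * r_true
    numerator = var_tx + var_ty - covariance_term
    denominator = var_tx / r_xx + var_ty / r_yy - covariance_term
    return numerator / denominator

# Equal true variances, true correlation of .50, reliabilities varied.
low_reliability = difference_score_reliability(1.0, 1.0, 0.50, 0.40, 0.40)
high_reliability = difference_score_reliability(1.0, 1.0, 0.50, 0.90, 0.90)

# Equal true variances, reliabilities of .80, true correlation varied.
weak_correlation = difference_score_reliability(1.0, 1.0, 0.10, 0.80, 0.80)
strong_correlation = difference_score_reliability(1.0, 1.0, 0.70, 0.80, 0.80)

for label, value in [("rXX = rYY = .40", low_reliability),
                     ("rXX = rYY = .90", high_reliability),
                     ("true r = .10", weak_correlation),
                     ("true r = .70", strong_correlation)]:
    print(label, round(value, 3))
```

The function takes variances rather than standard deviations as inputs so that the equal-variance case can be expressed simply by passing the same value twice.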

Variability. Most analyses of difference scores focus on their observed variability. In an

experimental context, one might evaluate whether experimentally-induced differences in one or


more IVs explain variability in participants’ observed difference scores. In a non-experimental

context, one might evaluate whether naturally-occurring differences in one or more predictor

variables are associated with variability in participants’ observed difference scores. The

importance of variability in difference scores requires an understanding of the factors producing

that variability. Specifically, variability in observed difference scores reflects variability in

component measures and the correlation between those measures:

$$s^2_D = s^2_X + s^2_Y - 2\,r_{XY} s_X s_Y$$

Equation 7.3

where $s^2_D$ is the variance of observed difference scores, $s^2_X$, $s^2_Y$, $s_X$, and $s_Y$ are the observed

variances and standard deviations of measures of variables X and Y, and rXY is the correlation

between those measures.
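Equation 7.3 is an algebraic identity, so it can be checked against any data set. The sketch below does this with simulated scores; the data-generating choices (sample size, seed, and the way Y is built from X) are arbitrary assumptions made only to produce two correlated variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated component scores: Y is partly determined by X, so they correlate.
x = rng.normal(loc=5.0, scale=2.0, size=1000)
y = 0.5 * x + rng.normal(loc=0.0, scale=1.0, size=1000)

# Difference scores computed directly (Equation 7.1).
d = x - y
variance_direct = d.var(ddof=1)

# Variance of difference scores rebuilt from the components (Equation 7.3).
s_x = x.std(ddof=1)
s_y = y.std(ddof=1)
r_xy = np.corrcoef(x, y)[0, 1]
variance_formula = s_x**2 + s_y**2 - 2 * r_xy * s_x * s_y

print("direct: ", round(float(variance_direct), 4))
print("formula:", round(float(variance_formula), 4))
```

The two values agree up to floating-point error as long as the same variance convention (here `ddof=1`) is used throughout.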

Implications of These Properties

These two properties reflect fundamental statistical and psychometric qualities of

difference scores, and they have several implications. These implications, in turn, affect the

meaningfulness and utility of difference scores.

Unreliable scales produce unreliable difference scores. There is much debate about the

reliability of difference scores. Some researchers believe that difference scores are inherently

unreliable, while others note that difference scores can, in fact, be reasonably reliable. In reality,

difference scores can indeed be reliable, but they are unreliable when their component measures

are unreliable.

Based upon Equation 7.2, Figure 7.3a reflects the reliability of difference scores as a

function of the reliability of component measures. Values were generated by holding constant

the true correlation between X and Y (arbitrarily) at $r_{T_X T_Y}$ = .50, holding equal the true variances of

X and Y (i.e., $s^2_{T_X}$ = $s^2_{T_Y}$), and assuming that measures of X and Y are equally reliable. As the


figure shows, components with low reliability produce difference scores with very poor

reliability – e.g., if rXX and rYY are .40, then rDD=.25. However, components with strong

reliability can produce difference scores that are reasonably reliable – e.g., if rXX and rYY are .90,

then rDD=.82. Thus, despite some widespread belief to the contrary, difference scores can be

reliable, but only if one (or both) component is highly reliable. Nevertheless, the fact remains

that, when component measures are unreliable, difference scores are unreliable.

--------Insert Figure 7.3----------

Highly-correlated components produce unreliable difference scores. Perhaps somewhat

counterintuitively, the reliability of difference scores is reduced when components are positively

correlated with each other. Equation 7.2 reflects the reliability of difference scores as a function

of the true correlation between the two components (i.e., $r_{T_X T_Y}$), and Figure 7.3b illustrates this

across a range of true correlations. This figure (which assumes that the measures of X and Y

have reliabilities of .80 and that they have equal true variances) shows that larger

inter-component correlations produce smaller reliabilities of difference scores. For example, it shows

that components truly correlated with each other at only $r_{T_X T_Y}$ = .10 can produce difference scores

with good reliability of rDD=.78; however, components correlated more robustly with each other

at $r_{T_X T_Y}$ = .70 produce difference scores with substantially lower reliability of rDD=.54. That is,

even though both component measures might be highly reliable, difference scores have lower

reliability when components are highly positively correlated. This fact leads some researchers to

question the reliability – perhaps even the utility in general – of difference scores.
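Under the simplifying assumptions used for Figure 7.3 (equal true variances and equally reliable measures), Equation 7.2 reduces to a compact form that makes the effect of the true correlation easy to see. The shorthand below is introduced here for convenience and is not the chapter's notation: $s^2$ is the common true variance, $r$ the common reliability, and $\rho$ the true correlation between the components.

```latex
\begin{gather*}
r_{DD}
  = \frac{2s^2 - 2s^2\rho}{\dfrac{2s^2}{r} - 2s^2\rho}
  = \frac{1 - \rho}{\dfrac{1}{r} - \rho}
  = \frac{r\,(1 - \rho)}{1 - r\rho} \\[6pt]
r = .80,\ \rho = .10: \quad
  r_{DD} = \frac{.80 \times .90}{1 - .08} = \frac{.72}{.92} \approx .78 \\[6pt]
r = .80,\ \rho = .70: \quad
  r_{DD} = \frac{.80 \times .30}{1 - .56} = \frac{.24}{.44} \approx .545
\end{gather*}
```

As $\rho$ approaches 1 the numerator shrinks toward zero faster than the denominator, which is why strongly correlated components yield unreliable difference scores even when each component is measured well. The second worked value corresponds to the figure of .54 quoted in the text.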

Difference scores can simply reflect one of the components. Because difference scores

arise from two component variables, variability in difference scores can simply reflect variability

in one component. That is, under some circumstances, difference scores reflect – or largely


reflect – one component. This implication is apparent in the correlation (rXD) between the

difference scores and one component (in this case, variable X):

$$r_{XD} = \frac{s_X - r_{XY} s_Y}{\sqrt{s^2_X + s^2_Y - 2\,r_{XY} s_X s_Y}}$$

Equation 7.4

where $s^2_X$, $s^2_Y$, $s_X$, $s_Y$, and rXY are as defined above. This equation shows that the link

between a difference score and a component is largely a function of the difference between the

variabilities of the two components. That is, for any given level of association between

components (i.e., holding rXY constant), difference scores are more strongly correlated with the

component having greater variability. In sum, if components have different variabilities, then the

one with greater variability will have greater impact on the difference scores (see Equation 7.3)

and thus will be more strongly correlated with difference scores (Equation 7.4).

This effect can be seen in Figure 7.4, which presents correlations between difference

scores and each component variable (i.e., rXD and rYD). Correlations are presented as a function of

the ratio of variability in component X to variability in component Y (i.e., sX/sY) and (arbitrarily)

setting the components correlated with each other at rXY = .40. When component X has less

variability than component Y (e.g., when sX/sY = .2), difference scores are less strongly correlated

with component X than with component Y (i.e., rXD = -.21 and rYD = -.98). In contrast, when

component X has greater variability than component Y (e.g., when sX/sY = 2), difference scores

are more strongly correlated with component X than with component Y (i.e., rXD = .87 and rYD =

-.10). It is only when the components have equal variability that they are equally correlated with

difference scores (i.e., if sX/sY = 1, then |rXD| = |rYD|).
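The pattern described for Figure 7.4 can be explored with a short calculation based on Equation 7.4. The function below is an illustrative sketch with hypothetical names. The expression for rYD is not given in the chapter; it is derived here by applying the same reasoning to component Y (the covariance of Y with X - Y, divided by the product of their standard deviations), so it should be read as an assumption of this example.

```python
import math

def component_correlations(s_x, s_y, r_xy):
    """Correlations of difference scores D = X - Y with each component.

    s_x, s_y: observed standard deviations of X and Y.
    r_xy: observed correlation between X and Y.
    Returns (r_xd, r_yd).
    """
    # Standard deviation of the difference scores (square root of Equation 7.3).
    s_d = math.sqrt(s_x**2 + s_y**2 - 2 * r_xy * s_x * s_y)
    r_xd = (s_x - r_xy * s_y) / s_d   # Equation 7.4
    r_yd = (r_xy * s_x - s_y) / s_d   # analogous expression for Y (derived here)
    return r_xd, r_yd

# Components correlated at .40, with the ratio s_x / s_y varied.
for ratio in (0.2, 1.0, 2.0):
    r_xd, r_yd = component_correlations(s_x=ratio, s_y=1.0, r_xy=0.40)
    print(f"sX/sY = {ratio}: rXD = {r_xd:.3f}, rYD = {r_yd:.3f}")
```

Only the ratio of the two standard deviations matters for these correlations, so fixing `s_y` at 1.0 and varying `s_x` covers the cases discussed in the text.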

--------Insert Figure 7.4----------


Issues in Application of Difference Scores

Thus far, this section has articulated psychometric properties of difference scores along

with three implications of those properties. In addition, my experience as a reviewer, editor, and

reader suggests three observations of the way that difference scores have been applied in social

psychological research.

First, difference scores are derived occasionally from “single-item” component measures,

with little, if any, attention to psychometric implications. This problem compounds an issue

mentioned earlier in this volume – single-item measures are relatively likely to have poor

psychometric quality. Thus, difference scores based upon single-item component measures likely

have very poor psychometric quality, with potentially serious consequences for the quality of

subsequent analyses and psychological interpretation.

Second, despite long-held concerns regarding the reliability of difference scores and

despite the importance of knowing the psychometric quality of any variables being examined,

researchers often seem to ignore the reliability of difference scores. When difference scores are

examined, researchers occasionally focus only on the psychometric quality of component

measures, paying little or no attention to the psychometric quality of the difference scores.

Third, despite the strong dependence of difference scores upon their components,

researchers often seem to ignore the components when examining difference scores. That is,

researchers seem to move quickly to analysis of difference scores, apparently ignoring the fact

that difference scores simply reflect their components to varying degrees.

Potential Problems Arising From Difference Scores


Taken together, the properties, implications, and applied issues produce significant

concerns with the use of difference scores in social psychological research. Two problems are

particularly important, potentially compromising research based upon difference scores.

Poor reliability may obscure real effects. Although reliability might be less pervasively

problematic than sometimes supposed, there are legitimate concerns about the reliability of

difference scores in some research. As mentioned earlier, the reliability of difference scores

suffers when components are correlated positively with each other and when they are measured

with poor reliability. In many, perhaps most, applications of difference scores, components are

likely to be robustly correlated with each other. When combined with the possibility that

difference scores are sometimes derived from components having poor (or unknown) reliability,

this creates significant potential problems with the reliability of difference scores.

If difference scores have poor reliability, then analyses may miss meaningful and real

effects. As discussed earlier (Chapter 4), poor reliability reduces observed effect sizes, which

reduces the power of inferential analyses, which increases the likelihood of Type II errors.

These effects are as true for difference scores as they are for any dependent variable in any social

psychological study.
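The chain from reliability to observed effect size can be illustrated with the classical test theory attenuation relation, in which the expected observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The sketch below is only an illustration: the function name and all numeric values are hypothetical, not results from any study.

```python
import math

def attenuated_correlation(true_r, rel_a, rel_b):
    """Expected observed correlation between two measures under
    classical test theory, given their reliabilities."""
    return true_r * math.sqrt(rel_a * rel_b)

# Hypothetical values: a true correlation of .40 between a predictor
# (reliability .80) and a difference score of varying reliability.
for rel_diff in (0.90, 0.60, 0.30):
    observed = attenuated_correlation(0.40, 0.80, rel_diff)
    print(rel_diff, round(observed, 3))
```

As the reliability of the difference score falls, the expected observed correlation shrinks toward zero, which is the mechanism by which poor reliability lowers power.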

Lack of discriminant validity, producing obscured results. Perhaps the most subtle

problem is that difference scores may lack discriminant validity, potentially obscuring

psychological conclusions based upon their analysis. Because difference scores can be

influenced heavily by a component having relatively large variance, they can simply reflect one

component. This psychometric situation could produce conclusions that poorly reflect

psychological reality – that are misleading, overly complex, obscure, or that simply miss deeper

psychological messages.²


For example, a recent study of perceived physical attractiveness presented personality

correlates of “attractiveness range” difference scores (Swami et al., in press). In this research,

male participants viewed images of nine increasingly-large female figures, which were

interpreted as a 1-to-9 interval scale of size. Each participant identified the smallest and largest

figures he found attractive, and the difference (in terms of the size-scale values) was interpreted

as a participant’s “attractiveness range” (AR). These AR difference scores were then correlated

with scores from personality scales, producing a significant negative correlation between AR and

Extraversion. This seemingly suggests a potentially-meaningful connection between males’

Extraversion and the range of body sizes they find attractive. Specifically, it suggests that males

with high levels of Extraversion are attracted to a “narrower range of body sizes” than are males

with low levels of Extraversion.

However, deeper analysis clarified and fully explained the apparent “attractiveness range”

findings. Specifically, across all males, the “largest attractive” ratings had much greater

variability than the “smallest attractive” ratings. That is, males varied much more dramatically

in the largest-sized figures they found attractive than in the smallest-sized figures they found

attractive. As discussed earlier, if one component of a difference score has greater variability

than the other, then the difference score largely reflects the one with greater variability.

Consequently, AR difference scores largely reflected the “largest attractive” ratings, and this fact

was verified with an extremely high correlation (r=.86) between the difference scores and the

“largest attractive” ratings (Swami et al., in press). Fortunately, the researchers examined the

components (i.e., smallest attractive and largest attractive ratings) alongside the difference scores

(i.e., AR scores), revealing the confounding of AR scores with one of their components. This

allowed the researchers to make more informative and psychologically-meaningful


interpretations than would have been possible with only the analysis of difference scores.

Reliance upon only the AR scores would have produced conclusions that were limited reflections

of the psychological reality otherwise readily apparent in the analysis of the two components.

This potential obscuring effect can be seen in the correlation (rPD) between a predictor or

independent variable (e.g., Extraversion) and a difference score (e.g., AR):

\[
r_{PD} = \frac{r_{PX}\, s_X - r_{PY}\, s_Y}{\sqrt{s_X^2 + s_Y^2 - 2\, r_{XY}\, s_X s_Y}} \qquad \text{(Equation 7.5)}
\]

In this equation, rPX and rPY are correlations between the predictor and component variables, and

sX², sY², sX, sY, and rXY are as defined earlier. The numerator reveals that the association

between a predictor/IV and a difference score is largely a blend of correlations between the

predictor/IV and the components. More deeply, the association largely reflects a link between

the predictor/IV variable and whichever component has greater variability. Thus, in the AR

study, the association between Extraversion and AR largely reflected a link between

Extraversion and the “largest attractive” ratings.
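A minimal sketch of Equation 7.5 may help show how unequal component variability shapes rPD. The function name and the input values are hypothetical; only the formula itself comes from the text, with D defined as X minus Y.

```python
import math

def r_pd(r_px, r_py, s_x, s_y, r_xy):
    """Correlation between a predictor P and D = X - Y (Equation 7.5)."""
    s_d = math.sqrt(s_x**2 + s_y**2 - 2 * r_xy * s_x * s_y)
    return (r_px * s_x - r_py * s_y) / s_d

# Hypothetical inputs: P correlates -.30 with X and .00 with Y; rXY = .30.
equal_variability = r_pd(-0.30, 0.0, s_x=1.0, s_y=1.0, r_xy=0.30)
x_more_variable = r_pd(-0.30, 0.0, s_x=3.0, s_y=1.0, r_xy=0.30)
print(round(equal_variability, 3), round(x_more_variable, 3))
```

As sX grows relative to sY, rPD moves toward rPX, which is the sense in which the difference score comes to reflect the more variable component.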

Recommendations and Alternatives

Given the potential problems associated with difference scores, researchers might

consider several alternatives and suggestions. There are at least two alternatives to difference

scores, and there are several recommendations that should be strongly considered if difference

scores are used.

Examine each component as a dependent variable. Researchers might avoid difference

scores altogether, focusing instead on the components. For example, researchers might conduct

analyses twice – once with each component as a dependent variable. Anything potentially


revealed in an analysis of difference scores might be revealed more clearly, simply, and directly

by analysis of the two components.

Consider a regression context, though the same principles apply in an ANOVA context.

Rather than examining a single “difference score model” in which difference scores are predicted

by a predictor variable (i.e., Di=a+bPD(Pi)), researchers could examine two “component models,”

Xi=a+bPX(Pi) and Yi=a+bPY(Pi), predicting each component (X and Y) from the predictor

variable (P). The slopes from the component models reflect the association between each

component and the predictor, and the difference between these slopes is identical to the slope

obtained from the difference score model (i.e., bPD=bPX-bPY). Similarly, as shown earlier

(Equation 7.5), the correlation between a predictor variable and a difference score (rPD) largely

reflects the difference between the two component-predictor correlations (rPX and rPY, weighted

by the components’ standard deviations). Similar ANOVA approaches could be used, or perhaps

even more usefully, an ANOVA approach can be translated into a regression analysis. Important

generalizations of this approach are presented by Edwards (1995).
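The identity bPD = bPX - bPY can be checked numerically. The sketch below uses made-up scores for six participants (illustration only) and a hypothetical helper, `ols_slope`, that computes the slope of a one-predictor regression.

```python
def ols_slope(p, y):
    """Slope from a simple OLS regression of y on a single predictor p."""
    n = len(p)
    mean_p, mean_y = sum(p) / n, sum(y) / n
    s_py = sum((pi - mean_p) * (yi - mean_y) for pi, yi in zip(p, y))
    s_pp = sum((pi - mean_p) ** 2 for pi in p)
    return s_py / s_pp

# Made-up data (not from any study)
P = [1, 2, 3, 4, 5, 6]   # predictor
X = [7, 6, 6, 4, 3, 2]   # first component
Y = [2, 3, 2, 3, 2, 3]   # second component
D = [x - y for x, y in zip(X, Y)]  # difference score

b_px, b_py, b_pd = ols_slope(P, X), ols_slope(P, Y), ols_slope(P, D)
print(b_px, b_py, b_pd, b_px - b_py)
```

The last two printed values agree up to floating-point error, so the difference score model adds no information beyond the two component models, while the component models show which component carries the association.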

Such examination of components rather than difference scores avoids problems

associated with difference scores. For example, if the predictor is related to only one component,

then a separate-component analysis would reveal this fact. Similarly, if components differ in

their variability, then separate-components analysis avoids the discriminant validity problem

arising with difference scores (i.e., the difference score primarily reflecting the component

having greater variability).

Examine both components as predictors in a regression model. Another separate-

component approach is to enter components as predictors in regression models predicting a

variable of interest. As noted by others (e.g., Peter, Churchill, & Brown, 1993), this approach


requires a slight reconceptualization to a question of incremental variance. For example, the

attractiveness example described earlier (Swami et al., in press) could be framed via a

hierarchical set of questions – 1) do male extraverts differ from introverts in the smallest size

figure they find attractive, and 2) after accounting for any such extraversion-related preferences

in small-sized figures, are there any remaining extraversion-related differences in preferences for

larger-sized figures? Such questions could be addressed through hierarchical regression with

two simple models:

1: Extraversion=a+b1(Smallest-size deemed attractive)

2: Extraversion=a+b1(Smallest-size deemed attractive)+b2(Largest-size deemed attractive).

The size and significance of b1 from Model 1 address the first question, and the size and

significance of b2 from Model 2 addresses the second. A large and significant b2 from Model 2

indicates that, for two males having identical “small-size” attraction preferences, the one with

higher Extraversion is likely to have a different “large-sized” attraction preference than the one

with lower Extraversion. Conceptually, this is quite similar to a “difference score” type of

finding that Extraversion is correlated with “attractiveness range,” but the two-phase analytic

strategy provides more information of potential importance. For example, it avoids the

possibility that an apparent “difference score” effect simply reflects one component, and it

provides information about the combined effect of the components.
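The two-model strategy can be sketched on simulated data. Everything below is an assumption made for illustration: the data are synthetic (not from Swami et al.), the helper names are hypothetical, and b2 is obtained by residualizing both Extraversion and the "largest" ratings on the "smallest" ratings (the Frisch-Waugh-Lovell approach), which yields the same b2 and residuals as fitting Model 2 directly. Significance tests are omitted.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simple_fit(x, y):
    """Intercept and slope from an OLS regression of y on one predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(x, y):
    """Part of y that is not linearly predictable from x."""
    intercept, slope = simple_fit(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def r_squared(resid, y):
    my = mean(y)
    return 1 - sum(e * e for e in resid) / sum((b - my) ** 2 for b in y)

# Synthetic data for 200 simulated participants (illustration only)
rng = random.Random(0)
smallest = [rng.gauss(3.0, 0.5) for _ in range(200)]
largest = [s + rng.gauss(3.0, 1.5) for s in smallest]
extraversion = [-0.4 * l + rng.gauss(0.0, 1.0) for l in largest]

# Model 1: Extraversion = a + b1(smallest)
_, b1 = simple_fit(smallest, extraversion)
resid_model1 = residuals(smallest, extraversion)
r2_model1 = r_squared(resid_model1, extraversion)

# Model 2: add "largest"; b2 is the slope relating the residualized variables
largest_resid = residuals(smallest, largest)
_, b2 = simple_fit(largest_resid, resid_model1)
resid_model2 = residuals(largest_resid, resid_model1)
r2_model2 = r_squared(resid_model2, extraversion)

print("b1 (Model 1):", round(b1, 3))
print("b2 (Model 2):", round(b2, 3))
print("R^2 change:", round(r2_model2 - r2_model1, 3))
```

The change in R² from Model 1 to Model 2 is the incremental variance associated with the "largest" ratings after the "smallest" ratings have been taken into account.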

Examine difference scores along with their components. After considering alternatives,

some researchers may remain interested in difference scores. In such cases, analysis is likely

to be more informative if the difference scores are analyzed in addition to the

components, rather than in place of them. Indeed, if difference


scores are used, then they should be used only when accompanied by careful examination of

component scores.

At a minimum, researchers using difference scores should present fundamental

psychometric information and descriptive statistics of the components and of difference scores.

Specifically, they should present: a) reliability estimates of the components and of difference

scores, b) the means and variabilities of the components and of difference scores, and c) the

correlation between the two components and the correlation between each component and the

difference scores. Researchers can estimate the reliability of difference scores via:

\[
\text{estimated } r_{DD} = \frac{s_X^2\,(\text{estimated } r_{XX}) + s_Y^2\,(\text{estimated } r_{YY}) - 2\, s_X s_Y r_{XY}}{s_X^2 + s_Y^2 - 2\, s_X s_Y r_{XY}} \qquad \text{(Equation 7.6)}
\]

where sX², sY², sX, sY, and rXY are as defined earlier. Thus, researchers can use the components’

basic descriptive and psychometric information to estimate the reliability of difference scores.
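Equation 7.6 is straightforward to compute from reported descriptive statistics. The sketch below is a minimal illustration; the function name and the input values are hypothetical.

```python
def estimated_r_dd(s_x, s_y, r_xx, r_yy, r_xy):
    """Estimated reliability of D = X - Y (Equation 7.6)."""
    cov_xy = r_xy * s_x * s_y
    true_variance = s_x**2 * r_xx + s_y**2 * r_yy - 2 * cov_xy
    observed_variance = s_x**2 + s_y**2 - 2 * cov_xy
    return true_variance / observed_variance

# Hypothetical components: both reliabilities .80, correlated .50
print(round(estimated_r_dd(1.0, 1.0, 0.80, 0.80, 0.50), 2))  # prints 0.6
print(round(estimated_r_dd(2.0, 1.0, 0.80, 0.80, 0.50), 2))  # prints 0.67
```

With equal standard deviations, the function returns the same value as the equal-variability equation given in Footnote 1; with unequal standard deviations the two can diverge.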

Such information allows researchers and readers to gauge potential problems with

reliability and discriminant validity. If reliabilities appear low, then researchers should consider

the resulting limitations upon their ability to detect meaningful results. Further, if one

component has substantially-greater variability than the other, then researchers should recognize

the resulting lack of discriminant validity between the “larger variability” component and the

difference score. A lack of discriminant validity would also be apparent in a large correlation

between that component and the difference score. If such validity-related concerns exist, then

readers and researchers should interpret analysis of difference scores very cautiously. In fact,

such findings might motivate researchers to avoid difference scores altogether, returning to a

separate-component approach.
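The correlation between each component and the difference score can be anticipated from the components' standard deviations and their intercorrelation. The expressions in the sketch below follow from covariance algebra applied to D = X - Y; the function name and input values are hypothetical.

```python
import math

def component_difference_correlations(s_x, s_y, r_xy):
    """Return (rXD, rYD) for D = X - Y, from component SDs and rXY."""
    s_d = math.sqrt(s_x**2 + s_y**2 - 2 * r_xy * s_x * s_y)
    r_xd = (s_x - r_xy * s_y) / s_d
    r_yd = (r_xy * s_x - s_y) / s_d
    return r_xd, r_yd

# Hypothetical: rXY fixed at .50 while the ratio sX/sY varies
for ratio in (0.5, 1.0, 2.0):
    r_xd, r_yd = component_difference_correlations(ratio, 1.0, 0.50)
    print(ratio, round(r_xd, 2), round(r_yd, 2))
```

When the two standard deviations are equal, the components correlate with D equally strongly in opposite directions; as sX/sY increases, rXD grows and rYD moves toward zero, signaling that D largely reflects X.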

Going further, researchers interested in using difference scores should strongly consider

running all analyses with both components along with the difference score (i.e., running three


sets of analysis). For example, the “attractiveness range” research presented one set of

ANCOVAs with the difference score as the dependent variable, another ANCOVA with the

“thinnest-size deemed attractive” component as the DV, and another with the “largest-size

deemed attractive” component as the DV (Swami et al., in press). Results revealed no effects for

the “thinnest” component, significant and robust effects for the “largest” component, and

significant and robust effects for the difference score. The latter finding is fully predictable from

the fact that previously-reported correlational analysis revealed an extremely large association

between the “largest” component and the difference scores. That is, the IV’s effects on the

“attractiveness range” difference score simply reflect the IV’s effects on the “largest” component.

The authors noted this important fact when discussing their results.

Summary

The intuitive appeal of difference scores masks a psychometrically-thorny set of

problems. The current section introduced some psychometric properties of difference scores,

noted important implications and potential problems arising from these properties, and presented

recommendations regarding the analysis of difference scores. Two psychometric issues are

particularly relevant – the potential lack of discriminant validity and the potential for low

reliability. The discriminant validity issue is perhaps the more serious, though less-appreciated,

problem with difference scores. That is, many researchers are familiar with concerns about the

reliability of difference scores, but fewer may be aware that a difference score could simply

reflect one of its components. In sum, analysis of difference scores – if conducted at all – should

be conducted only alongside analysis of the components producing the difference score, and only

with careful attention to core psychometric issues.


Footnotes

1. Equation 7.2 differs from some equations articulating the reliability of difference scores, such

as this commonly-presented equation:

\[
r_{DD} = \frac{\tfrac{1}{2}\,(r_{XX} + r_{YY}) - r_{XY}}{1 - r_{XY}}
\]

This equation is accurate only when the two component measures have equal variabilities – an

assumption that, though sometimes valid, bypasses some crucial psychometric facts. In contrast,

Equation 7.2 follows the basic tenets of classical test theory, with no additional assumptions, and

it reveals implications regarding the links between variability and reliability.

2. Interestingly, difference scores are the basis of some familiar statistical procedures, such as the

test of an interaction in a split-plot analysis. Note that such analysis requires attention to the

homogeneity of variances of the within-subjects factor, corresponding to concern about the

similarity of variances of the two components of a difference score. Furthermore, researchers

would rarely limit analysis to a significant interaction, more likely proceeding to decompose the

interaction into simple main effects. Such informative, important, and quite standard follow-up

analysis parallels the examination of the two components of a difference score.


Table 7.1

Example Data for Profile Similarity

Couple 1
Trait               Marge   Homer   Distinctive Marge   Distinctive Homer
Neuroticism           4       3           .00                -1.00
Extraversion          5       7          -.67                 1.67
Openness              1       2          -.33                -1.67
Agreeableness         5       4          -.33                -1.00
Conscientiousness     4       3         -1.67                -2.00
Mean                3.80    3.80         -.60                 -.80
Std Dev             1.47    1.72          .57                 1.29

Couple 2
Trait               Wilma   Fred    Distinctive Wilma   Distinctive Fred
Neuroticism           2       4         -2.00                 .00
Extraversion          5       4          -.67               -1.33
Openness              1       5          -.33                1.33
Agreeableness         6       5           .67                 .00
Conscientiousness     7       6          1.33                1.00
Mean                4.20    4.80         -.20                 .20
Std Dev             2.32     .75         1.15                 .93

Couple 3
Trait               Betty   Barney  Distinctive Betty   Distinctive Barney
Neuroticism           6       5          2.00                1.00
Extraversion          7       5          1.33                -.33
Openness              2       4           .67                 .33
Agreeableness         5       6          -.33                1.00
Conscientiousness     6       6           .33                1.00
Mean                5.20    5.20          .80                 .60
Std Dev             1.72     .75          .81                 .53

Norms
Trait               Normative Wife   Normative Husband
Neuroticism              4.00              4.00
Extraversion             5.67              5.33
Openness                 1.33              3.67
Agreeableness            5.33              5.00
Conscientiousness        5.67              5.00
Mean                     4.40              4.60
Std Dev                  1.65               .65


Table 7.2

Profile Similarity Components

Couple   Overall      Distinctive   Generalized Normative   Wife            Husband
         Similarity   Similarity    Similarity              Normativeness   Normativeness
1          .70          .19           .92                     .94             .77
2          .48          .37           .92                     .89             .11
3          .59         -.29           .92                     .89             .72
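The overall and distinctive similarity values in Table 7.2 can be recomputed from the raw scores in Table 7.1. The sketch below rests on two assumptions that are consistent with the tabled values: each similarity index is a Pearson correlation across the five traits, and each normative profile is the mean profile across the three wives (or three husbands). The helper names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)

def normative(profiles):
    """Trait-by-trait mean across a set of profiles."""
    return [sum(col) / len(col) for col in zip(*profiles.values())]

def distinctive(profile, norm):
    """Raw profile minus the normative profile."""
    return [p - n for p, n in zip(profile, norm)]

# Raw scores from Table 7.1, ordered N, E, O, A, C
wives = {"Marge": [4, 5, 1, 5, 4], "Wilma": [2, 5, 1, 6, 7], "Betty": [6, 7, 2, 5, 6]}
husbands = {"Homer": [3, 7, 2, 4, 3], "Fred": [4, 4, 5, 5, 6], "Barney": [5, 5, 4, 6, 6]}

norm_wife, norm_husband = normative(wives), normative(husbands)

results = []
for wife, husband in zip(wives, husbands):
    overall = pearson(wives[wife], husbands[husband])
    dist = pearson(distinctive(wives[wife], norm_wife),
                   distinctive(husbands[husband], norm_husband))
    results.append((round(overall, 2), round(dist, 2)))
    print(wife, husband, round(overall, 2), round(dist, 2))

print("Generalized normative similarity:", round(pearson(norm_wife, norm_husband), 2))
```

Under these assumptions the printed values match the Overall Similarity, Distinctive Similarity, and Generalized Normative Similarity columns of Table 7.2.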


Figure 7.1 Hypothetical profiles

7.1a: Wife and Husband Personality Profiles (x-axis: Trait [Neur., Ext., Open., Agree., Consc.]; y-axis: Score, 0 to 8; series: Wife, Husband)

7.1b: Wife 1 Personality Profile (x-axis: Trait; y-axis: Score, .00 to 8.00; annotations: Elevation, Shape, Scatter)


Figure 7.2

7.2a: Wives' self-rated profiles (x-axis: Trait; y-axis: Score, .00 to 8.00; series: Marge, Wilma, Betty)

7.2b: Wives' self-rated profiles, with the Normative Wife profile (x-axis: Trait; y-axis: Score, .00 to 8.00; series: Marge, Wilma, Betty, Normative Wife)


Figure 7.3 Reliability of difference scores as a function of: a) reliabilities of components, and b) true correlations between components

7.3a: x-axis: Reliability of measures of X and Y (rXX and rYY), .00 to 1.00; y-axis: Reliability of Difference Scores (rDD), .00 to 1.00

7.3b: x-axis: True correlation between X and Y (rTXTY), .00 to 1.00; y-axis: Reliability of Difference Scores (rDD), .00 to 1.00


Figure 7.4

x-axis: sX/sY, .20 to 2.20; y-axis: rXD or rYD, -1.00 to 1.00; series: rXD, rYD
