Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

7/23/2019 Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

1/13

Psychological

Methods

2000,

Vol.

5, No. 3,

343-335

Copyright

2000

b> the American Ps>chologiciil Association, Inc.

I082-989X/00/S5.00 DOI: IO.I037//I082 989X.5.3.343

Coefficient Alpha and Related Internal Consistency

Reliability Coefficients

H. G. Osburn

University of Houston

The

author studied

the

conditions

under

which coefficient alpha

and

10

related

internal consistency reliability

coefficients underestimate the

reliability

of amea-

sure.

Simulated datashowed

that

alpha,

though

reasonably robust

when computed

onncomponentsin

moderately

heterogeneousdata,can

under

certainconditions

seriously

underestimate

the reliability of ameasure.

Consequently, alpha,

when

used incorrectionsfor

attenuation,

can result in nontrivial overestimation of the

corrected correlation.Mostof the

coefficients

studied,

including

Iambda2, did not

improve the estimate to anygreatextent when thedata wereheterogeneous. The

exceptions were stratified alpha

and

maximal

reliability, which

performed well

whenthe components weregrouped intotwo subsets, each measuring adifferent

factor,

and maximized

Iambda4,

which

provided

the

most

consistently

accurate

estimate of the

reliability

in all simulations

studied.

Coefficient alpha is an appropriate reliability esti-

mator for composite measures containing multiple

components. A component may be a

test

item, a

judge,

aThematic Apperception

Test

(TAT)

card,

a

survey

question, a subtest, or a test that is being com-

bined intoa composite test battery. Multiplecompo-

nents may be homogeneous in thesenseof measuring

a single latent variable, or they may be heterogeneous

in

the

sense

of

measuring

two or

more factors

or

latent

variables.Because of coefficient alpha's flexibility,

its use is

ubiquitous

in

mostareas

of

psychology

as

well

as in

many other disciplines.

For

example,

Cortina (1993)reported that a review of the Social

Sciences Citation Index for the period

from

1966 to

1990 revealed that Cronbach's 1951 article on coef-

ficient alpha had been referenced approximately 60

times

per

year

in a

total

of 278

differentjournals.

Although

equivalent-forms

estimators

are

often

preferred, they are appropriate only in very

special-

ized circumstances

in

which

two or

more test forms

constructed to be parallel are available.Test-retest

coefficients

are

problematical

for

paper-and-pencil

tests because of memory and practiceeffects.Internal

consistencyreliability

estimators

arevery flexibleand

Correspondence concerning this article should be ad-

dressedto H. G.

Osburn, Department

of Psychology,Uni-

versityofHouston,4800Calhoun,

Houston,Texas

77004.

Electronic

mail

may be sent to

[email protected].

appropriate for a wide variety of circumstances in

which an estimate of reliability is needed. Examples

are (a) a single administration of a composite test

containing n components, (b) two or more subtests

combinedtoformacomposite

score,

(c) two ormore

tests combined in a battery

score,

and (d) two or more

judges' ratings combined into a

composite.

Several investigators have noted that internal con-

sistency

reliabilitycoefficients, including coefficient

alpha, based on asingleadministration of a test may

overestimate reliability because such coefficients as-

sign transient error due to differences in test admin-

istration, temporary changes in the examinee, and so

on

to true-score variance(Guion, 1965; Schmidt and

Hunter, 1996). Inaddition, if the

errors

made in re-

spondingto test items are positively correlated, coef-

ficient

alphamay beinflated(Komaroff, 1997). Also,

reliabilitymay be

overestimated

ifcoefficient alphais

erroneously used to estimate the reliability of a

speeded

test. Although it is true that coefficient alpha

may

sometimes be inflated because of the conditions

mentioned above, a potentially more serious problem

is

the

tendency

of

coefficient alpha

to

underestimate

the true reliability

when

the data are multidimen-

sional.If the items of a test conform to classical test-

theory assumptions, coefficient alpha

is a

lower

bound to the true reliability of a measure and fre-

quently is an

underestimate

of

reliability

if the

mea-

sure contains a small number of heterogeneous items.

This point was made as far back as

1945

by

Guttman,

343


2/13

344

OSBURN

who noted, "Itseemsplausible that in a vast number

of

cases the

hypothesis

of

equivalence

(of the

split

halves)

is

dubious,

and

that therefore

the

reliability

of

a great many tests

has

been seriously underesti-

mated."Agoodexample of this phenomenon are the

numerous studies of the internal consistency reliabil-

ityof the McClelland/Atkinson TAT measure of need

for

achievement. Coefficient alpha applied to this

measure consistently shows low reliabilities. How-

ever,

as

McClelland

(1980)has

pointed out, this mea-

sure shows moderate

to

high validity when predicting

operant measures of performance. TAT investigators

have formulated fairlyelaborate scenarios to explain

this

discrepancy.

For

example,

it hasbeen

suggested

that responses to TATcardsareinconsistent because

subjectsfeel they should notrepeatthemselves. Some

TAT investigators have even argued

that

the construct

validityof operant measuresdoesnot require internal

consistency

as

estimated

by

coefficient alpha.

The

point

is

that

it is

very likely that

the

discrepancy

is due

to the heterogeneity of the TAT data that

results

in a

negatively biased reliability estimate. This explana-

tion has been hinted at in the comments mentioned

above,

but not explicitly recognized.

That coefficient alpha may underestimatereliability

was

also recognized by Cronbach (1951),who noted

that a test containing distinct clusters of items might

have

a parallel split coefficient considerably higher

thancoefficient alpha. A major purpose of this study

was

to examine the conditions under which coeffi-

cient alpha underestimates

the

true reliability

of a

measure

that is not tau

equivalent, that

is, is

either

congeneric equivalent

or

multidimensional.

A

related

purpose was to examinealternative internal consis-

tency reliability estimates that may improve the esti-

mate

of

reliability compared withcoefficient alpha.

Corrections for Attenuation

If

reliability

is

underestimated, attenuation

is

over-

estimated (Guttman,

1945).

Because of their tendency

to

underestimate

the

reliability

of

heterogeneous

tests,

internal consistency estimators have been especially

controversial when used incorrection forattenuation

formulas

(Cureton,

1958;Lord&

Novick, 1968; Nun-

nally, 1978).The use of such reliability estimates to

correct

for

attenuation

will in

many

cases result in

overestimation

of the correlation between the true

scores on the two measures and in some cases may

result

in

substantial overestimation.

In a

recentarticle,

Muchinsky

(1996)argued that correcting for errors of

measurement in the criterion remains controversial in

meta-analyses,criterion-related validity studies, and

utilityanalyses. Finally, Schmidt

and

Hunter (1996),

in

a discussion of using various reliability coefficients

incorrections for attenuation, largely ignored the ten-

dency of coefficient alpha to underestimate the reli-

ability because of

item

heterogeneity. A second pur-

pose

of

this

study

was to

examine

the

extent

to

which

correlation coefficients, corrected forunreliability,are

overestimated when using coefficient alpha to esti-

mate reliability.

Internal Consistency Reliability Estimators

Although

coefficient

alpha iswidely used inprac-

ticalwork, a significant body of literature has devel-

oped in which several less familiar internal consis-

tency reliability coefficients have beenproposed.

All

of these coefficients have in common an attempt to

reduce

the underestimation of reliability

when

the

components of a measure are not equivalent. All ex-

cept standardized alpha and maximal reliability are

discussed

by

Feldt

and

Brennan(1989).However,

in

the

interests of clarity and

reader

convenience, a

brief

discussionofeachestimator's important

properties

is

presented

in

this section, along

witha

computing for-

mula for

each coefficient.

It

should be noted that this article does not specifi-

callyaddress the generalizability approach to reliabil-

ity,

which

goes

far beyond classical reliability theory.

However, in certain designs, such as ap x i single-

facet

study,thegeneralizability coefficientis thesame

as coefficient alpha, so the findings of the present

studyapply.

Coefficient Alpha

Inthe theory of composite tests, thereliabilityof a

measure,

p~

XT

,

is defined as theratioof the variance

of the truescores (T) of the measure to the variance

of theobserved

scores

(X). In the framework of clas-

sical reliability

theoryand forclarity,wecallp the

true reliabilityof a composite measure. As mentioned

above, coefficient alpha is a lower bound to the true

reliability

(Outtman,1945).However, under very

re-

stricted conditions coefficient alpha is equal to the

true reliability.In his derivation of the

formula

for

coefficient

alpha, Gulliksen (1950) considered twon-

item tests that are parallel

item-for-item

and assumed

that

the

average

covarianceof an

item

in one

testwith

theparallel item in the other test isequal to the av-


3/13

COEFFICIENTALPHA

345

eragecovariance of pairs of items

within

a single test,

that

is,

wherep,/o-,

2

is the covariance between parallel items

and p y T f O y is the

covariance between nonparallel

items.

If

Equation

1

holds, then coefficient alpha

is

equal

to the true reliability. Novick and Lewis (1967)

showed that Equation 1holdsif and only if the com-

ponents of the test are essentially tau equivalent,

that

is,

Ti

= a

v

+ Tj, (2)

for

all /andj.

Tau-equivalent

measures have equal true scores,

and as

indicated

in

Equation

2,

essentially tau-

equivalentmeasures have truescoresthaidiffer only

by anadditive constant. In this article both essential

tau

equivalenceand tauequivalence leadto the same

conclusions. Therefore, essential

tau

equivalence

will

be referred

to

simply

as

tau

equivalence.

Both

tau

equivalence andessential tauequivalence imply that

the correlations between true

scores

are all unityand

that the variances and covariances of the true scores

onthe

components

of the

measure

are all

equal.

Tau

equivalencealso implies that the observed component

intercorrelations corrected for attenuation are

unity

and

that

the

measure

is

unidimensional

with

equal

factor loadings

but

possibly

unequalerror

variances.

There are two important advantages of a test whose

components are tau equivalent: (a) The test can be

interpreted as

measuring

a single underlying construct

and (b) coefficient alpha computed on the components

of the

test

is

exactly equal

to the

true reliability.

Congeneric equivalence is closely related to tau

equivalence. If the components of a measure are con-

generic equivalent, the correlations between the true

scoresare

unity

but

variances

of the

truescores

may

vary(Joreskog

&

Sorbom.

1990).Congeneric equiva-

lence implies that

the

measure

is

unidimensional,

but

the factor loadings and

error

variances may vary.

Congeneric tests share the two advantages of tau-

equivalenl

tests because such

tests

measure

a

single

underlying

construct,

and the

data presented

in

this

article show that coefficient alpha

for

such tests

is

reasonablyclose

to the true

reliability

of the

compos-

ite measure.

Finally,

if the

components

of a

measure

are

parallel,

the correlations between true scores are all

unity

and

the variance and covariance of the component true

scoresof the measure are allequal. In addition, the

components are equally reliable. Parallel components

are unidimensional with equal factor loadings and

equal

error variances.

Coefficient

alpha

computed on

heterogeneous

components may seriously underestimate the true re-

liability.This issue is closely related to the

multidi-

mensionality of the

latent variables underlying

the

components.

If the

components

are

unidimensional,

coefficient alpha willbe anadequate estimate of the

true

reliability in allcases.If the components are char-

acterized by two or more moderately correlated fac-

tors

(i.e.,

inthe

range

of

.2-.4),coefficient alpha

may

be

seriously underestimated.

Coefficient alpha may be computed using variance

components

but is

ordinarily computed

by the

follow-

ing

equation:

a = n(l- 2o-,

2

/o-

2

.)/(n- 1 ,

3

where of is thevarianceof the;'th component, of is

the variance of the test, and the total score on the

test is the sum of the ncomponent scores. It is im-

portant to

note

that coefficient alpha can be comput-

ed on the n

components

of a

measure,

thencompo-

nentsgrouped into split halves, or thencomponents

grouped into three or more parts. In all oftheseap-

plicationscoefficient alpha is still a lower bound to

the

true reliability (Guttman,1945).

Analternate formula

for

coefficient alpha

is

a =

n

2

Ave(o-,j)/(nAve(of)

+n(n-

l)Ave(o-,

y

)),

4

whereAve(a,y)

is the

average covariance among com-

ponents and

Ave(of)

is the average variance of the

components

in the

measure. Several conclusions

about coefficient alphacan be drawn from this for-

mula:

(a) If Ave(a,y) islarge relative to Ave(of), co-

efficient

alpha will

be

large;

(b) if

there

are

many

components in the measure,the n(n 1)averageCo-

variances willdominate the

average variances and

coefficient alpha willtend to belarge;(c) if

Ave(o-,

7

)

is

small relative to Ave(of) either because the covari-

ances are

uniformly

low or because the covariances

are

very variable,

coefficient

alphawilltend

to be low

to moderate.

Raju s Coefficient

Raju

(1977)

derived

a

formula

for

estimatingreli-

ability

whendifferent numbersofitemsare assigned

topartsof a

test

so

that

thepartsare at

most conge-

neric.

Raju's coefficient

is

defined

by


4/13

346

OSBURN

p =(o-

2

-2o-

2

)/((l

(5)

where X , =

,/S;

and ,-is the number of items in the

ithpart.If theparts contain equalnumbers of items,

all X , equal Mr(where k is the number ofparts) and

Raju's

coefficient

equals

coefficient alpha.

Angoff-Feldt

Coefficient

Feldt (1975) derived

this

coefficient for the situa-

tion

in

which

a

measuring instrument could

be

divided

into only two

parts

of arbitrary length. He assumed

that the available

parts were

homogeneous in content

but

that because of the unequal length of the

parts

the

scorescould only

be

congeneric equivalent.He also

made

the

additional assumption that

the two

parts

were differentially

shortened

versions of the total

test.

This latter assumption amounts to assuming that the

sum

of the

error

variances for the two parts

is

equal to

the

error variance

of the

total test.Feldt'scoefficient

will

exceed coefficient alpha whenever the part-test

variances

areunequal.The

coefficient

is not alower

bound

to the

true reliability

and in

some circum-

stances

may

overestimate

the

true

reliability.

Angoff

(1953) and

Kristof

(1971)

have

also derived versions

that

ultimately lead to the same formula. Feldt and

Brennan

(1989) have designated

this

coefficient as the

Angoff-Feldt

coefficient:

p =4o-.V(o-

2

-

((a*

-


5/13

COEFFICIENTALPHA

347

Ten Berge and

Zegers

computed

U

0

, u,, and u

2

on

college test data. In their data for a four-item test

(similar

to our

four-component simulations), coeffi-

cient

alpha

was

.4359,\

2

was .4602, and

u

2

was

.4673.

It was

decided

not toinclude dataon u

2

inthis

article

because of the

small difference between

u

2

and\

2

-

Maximized Lambda4

Cronbach

(1951)

showed that coefficient alpha

is

the

expectation

of

coefficient alpha computed

on all

possible

l/2(2) /(n )

2

split

halvesof a

test

containing

2nitems. If the data are tau equivalent, all coefficient

alphas computed on the split halves

will

be equal to

coefficientalpha computed on the

In

items. However,

if

the data are not tau

equivalent, some split halves

will

give higher reliability

estimates

than coefficient

alpha computedon the

In

componentsof themeasure.

Because

coefficient alpha

is a

lower bound

to the

true

reliability, coefficient alpha

computed

on some split

halves

may

provide lower bounds that

are

closer

to the

true reliability. This

propertywaspointedout by

Gutt-

man(1945) andusedbyCallenderand Osburn (1977)

inan

attempt

to

improve reliability estimates

for

cor-

rectionsforattenuation.Theterm

Iambdu4

wasintro-

duced byGuttman. Lambda4 is the same as coeffi-

cient alpha computed

on

split halves

of the

test:

=2 1 - 0-2

13)

whereu\and


6/13

348

OSBURN

Standardized

Alpha

Use of

standardized alpha implies that observed

scores on the

components

are

standardized before

summing. However, it is important to note that the

true reliability of a measure is the same for both stan-

dardized and unstandardized observed scores. Stan-

dardized alpha is not a lower bound to the true reli-

abilityand may yield estimates that are either lower or

higherthan the true reliability. When the components

of a

composite measure

are

congeneric, standardized

alpha

will

always exceed

the true

reliability. Also,

if

the components are tau equivalent and reliabilities of

the

components

differ,

standardized alpha

will

always

exceed the true reliability. Standardized alpha is com-

puted

using the

correlations rather than

the covari-

anccs:

Standardized a = n(l

n/(n

j

(15)

where

py

is the correlation between the

(th

and>th

items. It is easily shown that standardized alpha is the

same as the generalizedSpearman-Brownformulafor

lengthening

a test n times when the average correla-

tion (p) is used in the formula, thatis,

Standardizeda = p/(l + (n - l)p), (16)

wherep is the

average

of then(n - 1)

correlations.

Maximal

Reliability

Maximal reliability

provides

an estimate of reliabil-

ity

whenthe components in a test can be groupedinto

subsets.Maximal

reliability(R

K

)

assumes that

(a) all

components within

a

subset

are

parallel, that

is, the

components

in a

subtest have equal reliabilities

and

equalvariances; and (b) the components in different

subtests may havedifferingreliabilities and variances.

If

these

two assumptions are met, the components in

all subtests are congeneric (Li,

Rosenthal,

& Rubin,

1996).

Maximal reliability is a generalization of the

Spearman-Brown approach

and

provides

a

compo-

nent-weighting procedure to achieve maximal reli-

ability.

Maximalreliabilityis easy to

compute

usinga

handcalculator:

Rf

=

A/(K/(\ +(K

-

1)p) + A),

17)

whereK = numberofsubsetsorsubtests,A =n,p|/

(1 - p|) +

2

P2/d - PT ) + +"iPt/(l ~ Pi t .Pis

tne

common correlation among the subsets,n

t

is the num-

ber of typeicomponents, and p,is the reliability of a

type i item. For two subsets, p can be estimated by

r

l2

/V(r,r

2

), where

r

l2

is the average correlation be-

tween

Type

1

components

and

Type

2

components.

For more than two subsets, p can be estimated for

each pair

of

subsets

and

then averaged.

Study ofCoefficient AlphaandRelated

Reliability

Estimates

This study focused on two types of demonstration

data. The first data type simulated measures

with four

components and relatively large correlations among

the components (p = approximately .50). Examples

ofthis

data

type include supervisor ratings, cognitive

abilitysubtests, and personality dimensions. The sec-

ond

data

type simulated measureswith eight compo-

nents and

relatively

low

correlations among

the

com-

ponents (p =

approximately .25). Examples

of this

data

type include questionnaire measures of organi-

zational commitment, role ambiguity, personal pref-

erences, perception,

and

personality facets. Various

scenarios weresimulated,and theextenttowhichthe

variousreliabilityestimators under- or overestimated

the true reliability was determined.

Demonstration Data

Demonstration data were created to study estima-

tion

of

true reliability

by the

various internal consis-

tencycoefficients. The general method was to gener-

atesimulatedpopulation covariancematricesfor both

trueand observed component

scores.

From

these

data

the

true

reliability and 11

estimators

of reliability

couldbe computed for varying

degrees

of heteroge-

neity among the components. A description of the

simulation methodis in the Appendix.

The following data sets were simulated:(a) homo-

geneous, one-factor data

with

equal factor loadings

and equal

error

variances (parallel components); (b)

homogeneous, one-factor datawith equalfactor load-

ings

but

unequal error variances(tau-equivalentcom-

ponents);(c)homogeneous, one-factor data

with

un-

equal

factor loadings

and

unequal

error

variances

(congeneric-equivalent components); and (d) hetero-

geneous, two-factor data

with

decreasing correlations

between the factors (heterogeneous components).

Three

types of situations with which an investigator

might be confronted were simulated:(a)ungrouped

data such as a test containing n items, (b) data

grouped into two subsets such as two subtests of a

larger test, and (c)

data

grouped into three subsets. If

the data are ungrouped, an investigator has a choice of

computing awide varietyof

reliability

coefficients by


7/13

COEFFICIENT

ALPHA

349

analyzing

the

ungrouped data

as is or by

grouping

the

data in various ways to find thelargestlower-bound

reliabilityestimate.

As

shown

in

this

demonstration,

an improved lower-bound estimate

can

sometimes

be

obtained by grouping thedataand computing a reli-

ability

estimate

on the

groupeddata.

If the

data

are

grouped to start with, aninvestigator

still

has a num-

ber of lower-bound reliability estimates to

choose

from.In the case of two groups, six

different

estimates

are

available.

The

downside

in

computing

a

large

number of the

lower-bound alternatives

to

coefficient

alpha

is the

possibility

of

capitalizing

on

sampling

error. However,

as

shown

in

this demonstration,many

of the

alternative coefficients give similar results,

so it

is

not

necessary

to compute a large number of esti-

mates.

Results for

Four-Component

Data

Analysis

of

four components ungrouped.

As can

be

seen from Table

1,

when

the

components were

Table

1

Underestimation

of

Reliability: Four-Component Data

parallel, all coefficients were equal to the true

reli-

ability.When the components were tau equivalent, all

coefficients

were equal

to the

true reliability except

for standardized alpha, which overestimated

the

true

reliability

by a small amount. When the components

were

congeneric,

onlyFeldt-Gilmerandmaximized

Iambda4 were equal to the true reliability, although

coefficient

alpha and

Iambda2

were

very

close,

and

standardized alpha and theFeldtcoefficient slightly

overestimated

the true reliability. When the compo-

nents were generated by two factors, allcoefficients

except maximized

Iambda4

underestimated

the

true

reliabilityto a greater or lesser extent. For example,

whenthe components were moderately heterogeneous

coefficient

alpha

was

about

92% of the

true reliability,

and

when

the

components were strongly

heteroge-

neous coefficient alpha

was

about

78% of the

true

reliability.In the strongly heterogeneouscasethe cor-

relation between the two factors was only .2. This

degree

of

heterogeneity

is

probably rare

in

practical

Degreeofheterogeneity

Reliability coefficient

True reliability

Four components ungrouped

Coefficient

alpha

Feldt

Feldt-Gilmer

Lambda2

Maximized

Iambda4

Standardizedalpha

Components

1 & 2

vs.

Components3 & 4

Coefficient alpha

Angoff-Feldt

Lambda2

Stratified

alpha

Standardized

alpha

Maximal

reliability

Components 1

& 2 vs.

Component3 vs.Component4

Coefficient

alpha

Raju

Feldt

Kristof

Lambda2

Standardizedalpha

Maximal reliability

Parallel

.800

.800

.800

.800

.800

.800

.800

.800

.800

.800

.800

.800

.800

.750

.800

.800

.800

.760

.787

.800

Tau

eq.

.798

.798

.798

.798

.798

.798

.799

.798

.799

.798

.798

.799

.801

.748

.798

.808

.798

.758

.797

.804

Congen.

.786

.778

.792

.786

.782

.786

.799

.763

.806

.763

.786

.799

.801

.673

.718

.820

.786

.690

.797

.804

Slight

.781

.752

.752

.752

.753

.781

.753

.694

.695

.694

.781

.694

.784

.683

.694

.738

.703

.688

.742

.749

Moderate

.760

.696

.697

.696

.702

.760

.698

.570

.570

.570

.760

.570

.764

.605

.615

.655

.608

.606

.677

.688

Slrong

.703

.547

.548

.547

.586

.703

.549

.234

.235

.234

.703

.235

.708

.396

.402

.430

.422

.410

.508

.549

Note.

To

assessthe

degree of heterogeneity in the two-factor data, the observed covariances of thesimulateddata were

factor analyzed using

confirmatory

factor

analysis. The correlationsamong thefactors

were .80, .40,

and

.20,

respectively, in

slightly,

moderately,and

strongly

heterogeneousdata.Tau eq. = tauequivalent;

congen.

=

congeneric

equivalent.


8/13

350

OSBURN

work. However, the strongly

heterogeneous

condition

was

useful for determining whether alternatives to

coefficient alpha could improve the estimate of reli-

ability

even

in thisextreme condition.

Analysis

of two

halves:

Twocomponents

each

half.

In

the heterogeneous split-half simulations coefficient

alpha strongly underestimated the true reliability(see

Table

1). Coefficient

alpha was 89% of

true

reliability

when

the

components

were slightly heterogeneous,

75% of the true reliability when the components were

moderately heterogeneous, and only 33% of true re-

liability

when the components were strongly hetero-

geneous.

However, in this situation stratified alpha

was exactlyequalto the true reliability. Maximal re-

liabilityvery slightly overestimated

the

true validity.

Analysis of three

pans:

Part I, Components

1-2;

Part

2,

Component

3;

Part

4,

Component

4.

As can

be

seen from

Table 1, when the components were

parallel

in

three-part simulations with

an

unequal

number of components in the parts, only the

Raju,

Feldt,and Kristof coefficients and maximal reliability

were equal to

the

true reliability.This is because the

parts were only congeneric equivalent even though

the

components were parallel. When the components

were

tau

equivalent, only Raju

and

Kristof coeffi-

cients were equal to the true reliabilitybecause the

partswereonly congeneric. The Feldt coefficient did

notequalthe truereliabilitybecause it assumes

that

the components are parallel. When the components

were congeneric equivalent, onlyKristof'scoefficient

was exactly equal to the true reliability.All

coeffi-

cients underestimated true

reliability

when

the

com-

ponents

were generated by two

factors. Coefficient

alpha

was about 87% of the true reliability when the

components were slightly heterogeneous, and 80%

and

56% of truereliabilitywhen the components were

moderately and strongly

heterogeneous,

respectively.

Maximal reliability was thebestestimator for these

data, but in the strongly heterogeneous condition

maximal reliabilitywas

still

only78% oftrue reliability.

Summary of

simulations

of

four-component

data. In these simulations coefficient alpha was

fairly robust

in

ungrouped single-component data.

However, eveninthis condition

coefficient

alphawas

only

92% of the true

reliability

when the data were

moderately heterogeneous. Coefficient alphacom-

puted on two-part splitswhen the components in each

part loaded

on

separate factorstended

to

strongly

un-

derestimate the true reliability in the heterogeneous

conditions,

but

stratified alpha

was

equal

to the

true

reliability

and

maximal reliability overestimated

the

true reliability only very slightly in all such condi-

tions. MaximizedIambda4was equal to true reliabil-

ity

in all simulations.Because

with four

components

there

are

only threepossible split halves, maximized

Iambda4 should routinely be computed when studying

thereliabilityof the average of four components.

Results

for

Eight-Component Data

Analysis of eight components ungrouped. Results

for the eight-component simulations are presented in

Table 2. On the whole the results mirrored the four-

component simulations,

that

is, allcoefficients were

equal

to thetruereliability when the components were

parallel, all except

standardized

alpha were equal to

the true reliability when the components were tau

equivalent,

and

only

the

Feldt-Gilmercoefficient

and

maximizedIambda4

equaled the true reliability when

the

components were congeneric equivalent.How-

ever,

in

these simulations coefficient alpha underesti-

mated

the true reliability

only

to a moderate

degree.

Coefficient

alpha

was 98% of the

true reliability

in

slightlyheterogeneous condition and 96% and 90% of

the true reliability in the moderate and stronglyhet-

erogeneous conditions, respectively. Lambda2

was a

slightlybetter estimatorthancoefficient alpha,but the

differences were not large.

Analysis oft\vo halves:Fourcomponents each half.

As in the four-component data, coefficient alpha did

not perform nearly as well in these simulations. Co-

efficient

alpha was 89% of true reliability in the

slightlyheterogeneous

case

and 75% and 33% oftrue

reliability

in the moderately and strongly heteroge-

neous conditions, respectively. However, stratified al-

pha was

exactly equal

to the

truereliability,

and

maxi-

mal

reliability

overestimated the

true reliability

only

very slightly

in the

heterogeneous conditions.

The

other estimatorsexceptfor maximized Iambda4 were

veryclose

to

coefficient alpha

and did not

improve

the

estimate

to any

greatextent.

Analysis of threeparts: Part I, Components 13;

Part

2, Components

4-6;

Part 3, Components

7-8.

Results on three-part data are presented in

Table 2.

Raju,

Feldt, and Kristof coefficients and

stratified alpha

and

maximal reliabilitywere equal

to

the true reliability when the components were paral-

lel, and

only Raju, Kristof,

and

stratified alpha coef-

ficients

were equal to the

true reliability when

the

components were

tau equivalent,

rendering

the

three

parts only congeneric

equivalent.

Coefficient alpha

was 95 , 95 ,and 77 ,respectively, of the true

reliability in the slightly, moderately, and strongly


9/13

COEFFICIENTALPHA

351

Table 2

Underestimation

of

Reliability: Eight-Component Data

Degree

of heterogeneity

Reliability

coefficient

True

reliability

Eight

componentsungrouped

Coefficient alpha

Feldt

Feldt-Gilmer

Lambda2

Maximized

Iambda4

Standardized alpha

Components

\^

vs.Components 5-8

Coefficient

alpha

Angoff-Feldt

Lambda2

Stratified

alpha

Standardized alpha

Maximal reliability

Components

1-3

vs.

Components

46

vs.Components 7 & 8

Coefficient

alpha

Raju

Feldt

Kristof

Lambda2

Stratified

alpha

Standardized alpha

Maximalreliability

Parallel

.727

.727

.727

.727

.727

.727

.727

.727

.727

.727

.727

.727

.727

.716

.727

.727

.727

.721

.727

.723

.727

Tau eq.

.725

.725

.725

.725

.725

.725

.727

.725

.726

.725

.725

.726

.728

.714

.725

.728

.725

.719

.725

.726

.728

Congen.

.712

.709

.715

.712

.711

.712

.727

.691

.735

.691

.712

.726

.728

.683

.694

.726

.712

.695

.711

.723

.728

Slight

.704

.692

.693

,692

.693

.704

.694

.625

.626

.625

.704

.626

.706

.667

.678

.681

.680

.672

.699

.679

.702

Moderate

.679

.654

.654

.654

.657

.679

.656

.509

.509

.509

.679

.509

.679

.612

.622

.625

.633

.618

.668

.625

.674

Strong

.613

.554

.555

.554

.571

.613

.556

.204

.445

.204

.613

.204

.613

.449

.477

.480

.597

.484

.578

.487

.612

?re.

To

assess

the

degree

of

heterogeneity

in the

two-factor

data, the

observed covariances

of the

simulateddata were

factor

analyzed

using

mfirmatory factor analysis.Thecorrelations amongthe factorswere .80, .40, and .20, respectively, in

slightly,

moderately,and strongly

Note.

heterogeneous data.Tau eq. = tauequivalent;congen. congenericequivalent.

heterogeneousconditions. Stratified alpha performed

reasonably well in these simulations. Maximal reli-

ability

performed best, yielding estimates that were

veryclose

to the true reliability in all

three-part

simu-

lations.

Summary ofsimulations for eight-component data.

Coefficient

alpha performed reasonably well

in the

eight-part simulations, equaling 95% of true reliabil-

ity

in moderately heterogeneous data. When the com-

ponents were grouped into two parts with separate

factors ineach part,

coefficient

alphawas only 75% of

true reliability

in

moderately heterogeneous data.

However,

in

this condition,

stratified alpha was ex-

actlyequal

to the

true reliability

and

maximal reliabil-

ity overestimated the true reliability only by a very

small margin.

It is

clear that coefficient alpha com-

puted

on individualcomponents was abetter lower

bound than that computed on components grouped

into two halves that were not highly correlated. Maxi-

mizedIambda4was equal to the true reliability in all

simulations.

ImplicationsforCorrections forAttenuation

In

four-component simulations

with

moderate het-

erogeneity, coefficient alpha ranged

from

75% to 92%

of

true reliability. Under these conditions, using

co-

efficient

alpha

in

single-variable corrections

for at-

tenuation would result

in

overcorrections

of 4% to

15%.For

example,

if true

validity were

actually

.50,

estimated

true validity would range from

.52 to

.58.

In

the worst case scenario, whenthe components in each

part loaded on separate factors and the data were

strongly

heterogeneous, coefficient alpha

was

only

33% of

true reliability,

and if

used

in the

correction

forattenuation would overestimate true reliability by

74%.


10/13

352

OSBURN

In

eight-component simulationswithmoderate het-

erogeneity,

coefficient

alpha ranged

from75% to 96%

of

true reliability. Under these conditions, using co-

efficient

alpha

in

corrections

for

attenuation

in one

variable

could result

in an

overcorrection

of 2% to

15%.

For example, if the true validity was

actually

.50, estimated true validity would range from

.51 to

.58.

In the

worst-case

scenario,in

which

the

data

were

strongly heterogeneous and the components in each

split

loaded on separate factors, coefficient alpha was

only

33% of

true reliability.

In this

extreme case,

if

the

true

validitywereactually .50, estimated trueva-

lidity would be .87.

Finally, if

coefficient alpha

was

used

to

correct

both independent

and

dependent variables

for unreli-

ability and if

both variables were moderately hetero-

geneous,

in the

worst-case scenario

the

double cor-

rection would overestimate thetruevaliditybyabout

33%.

Discussion and Recommendations

One-Factor Models

When the components of a measure were parallel

and

each

part

contained

an

equal number

of

compo-

nents,

all

estimators were equal

to the

true reliability.

However, this conditionisseldomrealizedinreal data

because

it

means that

the

components (items, subtests,

etc.)

have equal factor loadings and equal error vari-

ances. When

the

parts contained

an

unequal number

of parallel components, the

parts

were only conge-

neric equivalent,

and

only Raju, Feldt,

and

Kristof

coefficients and

stratified alpha

and

maximal reliabil-

ity estimates equaled thetrue reliability.

When the components were tau equivalent and

there were an equal number ofcomponents in each

part, all

estimates were equal

to the true reliability

except

Angoff-Feldt,

standardized alpha,

and

maxi-

mal

reliability,

which overestimated true

reliabilityby

a

very small amount.

Whenthecomponents were congeneric equivalent,

onlyFeldt-Giliner,maximized Iambda4, stratified

al-

pha, and Kristof coefficients were equal to the true

reliability. However, when

the

data were congeneric,

almost all

coefficients could

be

used

as

adequate

es-

timates of the true

reliability.

The exception was

co-

efficient

alpha computed

on

three-part splits

withan

unequal

number

of

components

in the

splits.

Two-Factor

Models

When

the data were generated by two

underlying

factors and the

components

in one

subset measured

one

factor

and the

components

in the

other subset

measured a different factor,

coefficient

alpha seri-

ously underestimated the true reliability. Lambda2

tended

to be

slightly larger thancoefficient alpha

but

did not

improve

the

estimate

of

true reliability

to any

great extent. However,

in

these conditions stratified

alpha andmaximalreliability both provided adequate

estimates

of thetrue

reliability.

Maximized

Iambda4 was clearly the

most consis-

tentlyaccurate estimator

of the

true reliability

in all

the simulations

that were studied.

For

measures

with

few

components, if one suspects that data are hetero-

geneous, Iambda4

should

be

computed

for all

possible

splits and maximized

Iambda4

used to estimate the

true reliability.For

measureswith

6 to 10

components

the

situation

is

problematical.

The

approximation

to

maximizedIambda4 presented

in

this article could

be

computed

and

used

to

estimate true

reliability.

How-

ever, maximized Iambda4

may

slightly overestimate

thetrue

reliability

because

of

capitalizing

on

sampling

Assessing

Heterogeneity

Unfortunately,

thecharacteristicsof empirical data

(tau

equivalent, congeneric,

ormultifactorial)are

sel-

dom

known precisely. However, confirmatory factor

analysis(CFA)isuseful formodelinganddistinguish-

ing

between parallel,

tau-equivalent,

and congeneric

data providing

that

the assumptions of this approach

are

reasonably approximated (see Turban, Sanders,

Frances, & Osburn, 1989).Anothergood reason for

using

CFA is the possibility of

correlated

errors

among the components.

Komaroff

(1997) has shown

that

correlated errors

can

inflatecoefficient alpha

and

thatsucherrors interact with lackof tau equivalence.

Correlated errors can be detected with CFA, and if

necessary alpha

can be

adjusted

for this

effect. Also,

there has been agooddeal of work on ways to deter-

mine

whether data

isuni-or

multidimensional,

and a

largenumberof indices have beenproposed(see Hal-

tie, 1985).Factoranalysis (either exploratory or con-

firmatory)of the

components

is

recommended

to de-

termine theheterogeneity of thedata. Ifthe data arc

not

continuous,

it is

also important

to

factor

the poly-

choric correlations among the components. If the data

set is characterized by two or more moderately cor-

related underlying factors (in the range of .20 to .40 or

so), aninvestigatorshould be concerned. The excep-

tion to this rule is the

situation

in which there are

multiple

factors

but the first

factor

is

dominant

in the


11/13

COEFFICIENTALPHA 353

sense of accounting for the bulk of the variance

among

the components (Cronbach, 1951).

It

is important to note that the average intercompo-

nent correlation is not a reliable indicator ofhetero-

geneity.

The

average correlation among components

maybe low, but the data may stillbe tau equivalent,

and

therefore true reliability

is

accurately estimated

bycoefficient

alpha (Green, Lissitz, &

Mulaik, 1977).

Variations in the correlations amongcomponentsmay

also be a somewhat misleadingindicatorof heteroge-

neity.Even if

there

are variations in intercomponent

correlations, the data may still be tau equivalent.

However,

intercomponent correlations that cluster

into subgroups indicate heterogeneity (Cortina, 1993).

If

the data are very heterogeneous, components can

be grouped on the basis of similar content, and strati-

fied alpha

or

maximal reliability along with

coeffi-

cientalpha should

be

computed

and the

largest value

used

to estimate the true reliability. If there is no basis

for

grouping and there are a small number of compo-

nents,

Iambda4

should

be

computed

for

all possible

splits.

Many Versus Few Components

This

studyassessedcoefficient alpha computed on

onlyfour

and

eight components.

The

reason

for

study-

ing such

a

small number

of

components

is

that coef-

ficientalpha

is

moreoften than

not

computed

on

eight

or

fewer components (Peterson, 1994)

and

because

coefficient alphaapproaches

the

true

reliabilityas the

number of positively correlated components in-

creases.

Coefficient alpha becomes less and less of an

underestimate of the true reliability because the per-

cent increase in coefficient alpha as the number of

components increases is greater than the percent in-

crease

in true reliability. For example, in the present

study,

to limit overestimation ofcorrectedvalidity to

about 2%, coefficient alpha must be at least 95% of

the

true reliability. In the worst-case scenario(strong

heterogeneity)

for

eight components ungrouped,

co-

efficient alpha

was 90% of

true reliability,

but in

this

caseonly 20 components would be needed for coef-

ficient alpha

to

reach

95% of

true reliability. In

the

worst-case scenario

for

four components ungrouped,

coefficient alpha was 78% of true reliability,but in

this

case

only 28 components would be

needed

for

coefficient alpha to reach 95% of true reliability. It is

concluded that when estimatingthereliabilityofques-

tionnaire

or rating data, even if the data are very het-

erogeneous, only

20 to 30

components

are

needed

to

be reasonably assured of an adequate estimate of true

reliability. However, in the

case

of

grouped data

in-

creasing, the number of components in each group

does

not necessarily make coefficient alpha a better

lower

bound for

alternative

estimates of the true reli-

ability to be used. For example, in theworst-case

scenario for two groups of four components in each

group, coefficient alpha was only 33% of true reli-

ability. However,

in

this

case

one can

compute strati-

fied alpha, which

is

100%

of

true reliability.

References

Angoff,

W. H. (1953).

Test

reliability and

effective

test

length. Psychometrika, 18, 1-14.

Callender, J. C., & Osburn, H. G. (1977). A method for

maximizing

andcross-validatingsplit-half reliabilityco-

efficients.

Educational

&

Psychological Measurement,

37,

819-826.

Callender,

J. C., & Osburn, H. G. (1979).Anempirical

comparisonof coefficient alpha,

Gunman's

Iambda2 and

MSPLITmaximized split-half

reliability

estimates.Jour-

nal

of Educational Measurement, J6,89-99.

Cortina,

J. M. (1993).Whatiscoefficient alpha? An exami-

nation oftheoryandapplications.

Journal

of

Applied

Psy-

chology,

78,

96-104.

Cronbach, L.J. (1951).

Coefficient

alpha and the internal

structure

of tests.Psychometrika, 16, 297-334.

Cronbach, L.J.,

Schoneman,

P., &

McKie,

D. (1965).Alpha

coefficient

for stratified-parallel

tests. Educational

&


25,

291-312.

Cureton,

E. E.(1958).Thedefinitionand

estimation

of test

reliability.

Educational

&


18, 715-738.

Feldt,

L. S.

(1975).

Estimation of the reliability of a test

divided

into

twopansof unequal

length.

Psychometrika,

40,

557-561.

Feldt, L.

S.,

& Brennan, R. L. (1989). Reliability.InR. L.

Linn(Ed.),Educational Measurement(3rd ed.:pp. 105-

146). New York: Macmillan.

Gilmer,

J.

S.,

&

Feldt,

L. S.

(1983). Reliability

estimations

for atestwith

parts

of unknown lengths.Psychometrika,

48,

99-111.

Green, S. B.,Lissitz,R.

W.,

& Mulaik, S. A. (1977).Limi-

tationsof coefficient alphaas an

index

of test

unidimen-

sionality. Educational

&


37.

827-838.

Guion.

R. M. (1965). Personnel testing. New York: Mc-

Graw-Hill.


12/13

354

OSBURN

Gulliksen, H. (1950). Theory

of mental

tests. New York:

Wiley.

Guttman,L.

(1945).A

basis

for analyzing

test-relest reli-

ability.

Psychometrika, 10,255-282.

Hattie, J. (1985).

Methodology

review: Assessing unidi-

mensionality

of tests and items.Applied Psychological

Measurement, 9, 139164.

Joreskog,

K.

G., &

Sorbom,

D.

(1990).

L1SREL

7: A

guide

to

the

program

and

applications (2nded.).Chicago:

Na-

tional EducationPress.

Komaroff, E. (1997).

Effect

of simultaneous violationsof

essential tau-equivalence and uncorrelatederror.Applied


21,337-348.

Kristof, W. (1971). On the theory of a set of

tests

which

differ only in length.Psychometrika, 36,207-225.

Kristof,

W. (1974).

Estimation

of

reliability

and

truescore

variance from a split of a test intothreearbitraryparts.

Psychometrika,

39,

491-499.

Li,

H.,

Rosenthal, R., &Rubin,D. B. (1996). Reliability of

measurement in psychology: From Spearman-Brown to

maximal reliability.

Psychological

Methods,

I,

98-107.

Lord, F. M., & Novick, M. R.(1968).Statistical theories

of

mental testscores.MenloPark,CA: Addison-Wesley.

McClelland, D. C.

(1980).

Motive dispositions: The merits

ofoperant and respondent measures. In L. Wheeler

(Ed.),

Review of personality

and

social psychology (pp.10-41).

Beverly Hills,CA:Sage.

Muchinsky,

P. M.

(1996).

The

correlation

for

attenuation.

Educational &Psychological Measurement, 56,63-75.

Novick,

M.R., &

Lewis,

C. L.

(1967). Coefficient alpha

and the

reliability

of

measurements. Psychnmetriku, 32,

1-13.

Nunnally,

J. C.(1978).Psychometric theory(2nd ed.).New

York: McGraw-Hill.

Peterson,R. A.

(1994).

A

meta-analysis

ofCronbach'sco-

efficient alpha.Journal of Consumer Research, 21,381-

391.

Raju, N. S. (1977). A generalization of coefficient alpha.

Psychometrika, 42,549-565.

Schmidt,

F.

L.,

&

Hunler,i.

E. (1996).

Measurement error

inpsychologicalresearch: Lessonsfrom26 researchsce-

narios. Psychological Methods,

I,

199-223.

Ten

Berge, J. M.F., &

Zegers,

F. E. (1978). A seriesof

lower bounds to the reliability of atest. Psychometrika,

43,575-579.

Turban,

D.

B.. Sanders,

P.

A., Frances,

D.

J.,

&

Osburn,

H. G. (1989). Construct equivalence as an approach to

replacing validated cognitive ability selection tests. Jour-

nal ofApplied Psychology, 74,62-71.

Appendix

Simulation

Method

1.Tau-equivalentdatawere simulated

as

follows:

a. The correlations among the component true scores

were set to1.0.

b. The variances of the componenttruescoreswere set

to1.0.

These two operations

generate

a true-score

covariance

matrix

with every element unity.The matrix is 4 x 4 for

four-component dataand8x8foreight-component data.

c. To generate the corresponding tau-equivalent ob-

served correlation matrix, the reliabilities of the components

were

systematically varied.

For

example, in the four-

component

case

the reliabilities were set to.45, .46, .52,and

.55. The

observed-score

correlationbetween the < th andj'th

component observed scoreswas computed by

multiplying

the true-score correlation between

the

ith and

jlh

compo-

nentsby the square root of

the

product oftheirrespective

reliabilities. For example, in the four-componentcase the

observed-score correlation between Components

I and 2

(.465)was computed bymultiplying 1.0 by the square root

of

.45 x

.48.

d. Because reliabilityistrue-scorevariance divided by

observed-score variance, theobserved-scorevariance of the

ithcomponent was computed by dividing the true variance

of the component by itsrespectivereliability.For example,

the

observed variance

of

Component

1 in the

four-

component case (2.15) was computed by dividing 1.0 by

.465.

2.

Parallel data were simulated

the

same

way as for

tau-

equivalent

data except

that the reliabilities of the

compo-

nents

were kept constant.

3. Congeneric-equivalent data were

simulated the

same

way

as for

tau-equivalentdata exceptthattrue variances

of

the components were systematically varied.

4.Truereliabilitywas computed bydividingthe variance

of the summed true

score

on the components by the variance

of the summedobserved score.


13/13

COEFFICIENTALPHA 355

5. Supervisor-rating data with

four

components were

simulated

as

follows:

a.To simulate paralleldata,the reliability of each com-

ponent was set at .50. This resulted in parallel data with

correlations of .50 among the observed scores.

b.To simulatetau-equivalentdata, the reliabilities of

the components were

set to

.45, .48, .52,

and

.55. This

resulted in tau-equivalent data with correlations that were

variable (.465

to

.535) among

the

components

but

averaging

approximately .50.

c. To simulate congeneric-equivalent data, the true

variance of Components 1 and 2 was set at 2.0 and the true

variance of Components 3 and 4 was set at 1.0.

d. To generateslightlyheterogeneous data, true-score

correlations between Components

1 and 2

withComponents

3 and 4

were

set to .8; for

moderately heterogeneous

data

0.80swere replaced by

0.60s.

For strongly heterogeneous

data 0.60s were replaced by 0.20s. Confirmatory

factor

analysis (CFA) applied to all three observed

covariance

ma-

trices resulted in a two-factor solution. Components 1 and 2

loaded

1.00

on Factor 1 and 0 on Factor 2. Components 3

and

4 loaded 0 on Factor 1 and 1.00 on

Factor

2. For

slightly

heterogeneous data,

the

correlation between

Factors1 and 2

was

.80. For moderately heterogeneous data the correlation

was

.60, and for strongly heterogeneous data the correlation

was .20.

e.Four-pan datawere simulated byX =XI+ X2 + X3

+ X4; two-part data were simulated by Xa = X I

+ X2,

Xb

= X3 +

X4;

three-part data were simulated by Xa

=

XI +

X2,

Xb = X3, Xc = X4.

6. Questionnaire data w i t height components were simu-

lated as follows:

a.To simulate paralleldata,the reliability of each com-

ponent was set at .25. This resulted in correlations of .25

among

the

observed scores.

b. To simulate tau-equivalent data, the reliabilities of

the components wereset to .22, .22, .24, .24, .26, .26, .28,

and

.28. This produced tau-equivalent data with observed

score correlations ranging

from

.22 to .28 but averaging

approximately .25.

c. To simulate congeneric-equivalent data, the true

variances wereset to 2.0 forComponents 1, 2, 3, and 4, and

to 1.0 for Components 5, 6, 7 and 8.

d. To

generate slightlyheterogeneous data, true-score

correlations

between

Components 1,2, 3, and 4with Com-

ponents 5, 6, 7, and 8 were set to .8, for moderatelyhetero-

geneous data

0.80s

were replaced

by

0.60s.

For

strongly

heterogeneous data

0.60s

were replaced by 0.20s. These

data were factored using CFA.

The

factor loadings were 1.0

and 0.0.

and

correlations among

the two

factors were

the

same

as for

four-component data, that

is, .8, .6, and .2.

e. Eight-part data were simulated by X =

X I

+ X2 + X3

+ X4 + X5 + X6 + X7 + X8;

two-part data were simulated

by

Xa

= XI + X2 + X3 + X4, Xb = X5 + X6

-f

X7 + X8;

three-part data were simulated by Xa= XI +X2 + X3, Xb

= X4 + X5 + X6, Xc = X7 + X8.

ReceivedOctober 11, 1996

RevisionreceivedJuly

20,

1999

Accepted March

22

2000

Documents

Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf