Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

Embed Size (px)

Citation preview

  • 7/23/2019 Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

    1/13

    Psychological

    Methods

    2000,

    Vol.

    5, No. 3,

    343-335

    Copyright

    2000

    b> the American Ps>chologiciil Association, Inc.

    I082-989X/00/S5.00 DOI: IO.I037//I082 989X.5.3.343

    Coefficient Alpha and Related Internal Consistency

    Reliability Coefficients

    H. G. Osburn

    University of Houston

    The

    author studied

    the

    conditions

    under

    which coefficient alpha

    and

    10

    related

    internal consistency reliability

    coefficients underestimate the

    reliability

    of amea-

    sure.

    Simulated datashowed

    that

    alpha,

    though

    reasonably robust

    when computed

    onncomponentsin

    moderately

    heterogeneousdata,can

    under

    certainconditions

    seriously

    underestimate

    the reliability of ameasure.

    Consequently, alpha,

    when

    used incorrectionsfor

    attenuation,

    can result in nontrivial overestimation of the

    corrected correlation.Mostof the

    coefficients

    studied,

    including

    Iambda2, did not

    improve the estimate to anygreatextent when thedata wereheterogeneous. The

    exceptions were stratified alpha

    and

    maximal

    reliability, which

    performed well

    whenthe components weregrouped intotwo subsets, each measuring adifferent

    factor,

    and maximized

    Iambda4,

    which

    provided

    the

    most

    consistently

    accurate

    estimate of the

    reliability

    in all simulations

    studied.

    Coefficient alpha is an appropriate reliability esti-

    mator for composite measures containing multiple

    components. A component may be a

    test

    item, a

    judge,

    aThematic Apperception

    Test

    (TAT)

    card,

    a

    survey

    question, a subtest, or a test that is being com-

    bined intoa composite test battery. Multiplecompo-

    nents may be homogeneous in thesenseof measuring

    a single latent variable, or they may be heterogeneous

    in

    the

    sense

    of

    measuring

    two or

    more factors

    or

    latent

    variables.Because of coefficient alpha's flexibility,

    its use is

    ubiquitous

    in

    mostareas

    of

    psychology

    as

    well

    as in

    many other disciplines.

    For

    example,

    Cortina (1993)reported that a review of the Social

    Sciences Citation Index for the period

    from

    1966 to

    1990 revealed that Cronbach's 1951 article on coef-

    ficient alpha had been referenced approximately 60

    times

    per

    year

    in a

    total

    of 278

    differentjournals.

    Although

    equivalent-forms

    estimators

    are

    often

    preferred, they are appropriate only in very

    special-

    ized circumstances

    in

    which

    two or

    more test forms

    constructed to be parallel are available.Test-retest

    coefficients

    are

    problematical

    for

    paper-and-pencil

    tests because of memory and practiceeffects.Internal

    consistencyreliability

    estimators

    arevery flexibleand

    Correspondence concerning this article should be ad-

    dressedto H. G.

    Osburn, Department

    of Psychology,Uni-

    versityofHouston,4800Calhoun,

    Houston,Texas

    77004.

    Electronic

    mail

    may be sent to

    [email protected].

    appropriate for a wide variety of circumstances in

    which an estimate of reliability is needed. Examples

    are (a) a single administration of a composite test

    containing n components, (b) two or more subtests

    combinedtoformacomposite

    score,

    (c) two ormore

    tests combined in a battery

    score,

    and (d) two or more

    judges' ratings combined into a

    composite.

    Several investigators have noted that internal con-

    sistency

    reliabilitycoefficients, including coefficient

    alpha, based on asingleadministration of a test may

    overestimate reliability because such coefficients as-

    sign transient error due to differences in test admin-

    istration, temporary changes in the examinee, and so

    on

    to true-score variance(Guion, 1965; Schmidt and

    Hunter, 1996). Inaddition, if the

    errors

    made in re-

    spondingto test items are positively correlated, coef-

    ficient

    alphamay beinflated(Komaroff, 1997). Also,

    reliabilitymay be

    overestimated

    ifcoefficient alphais

    erroneously used to estimate the reliability of a

    speeded

    test. Although it is true that coefficient alpha

    may

    sometimes be inflated because of the conditions

    mentioned above, a potentially more serious problem

    is

    the

    tendency

    of

    coefficient alpha

    to

    underestimate

    the true reliability

    when

    the data are multidimen-

    sional.If the items of a test conform to classical test-

    theory assumptions, coefficient alpha

    is a

    lower

    bound to the true reliability of a measure and fre-

    quently is an

    underestimate

    of

    reliability

    if the

    mea-

    sure contains a small number of heterogeneous items.

    This point was made as far back as

    1945

    by

    Guttman,

    343

  • 7/23/2019 Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

    2/13

    344

    OSBURN

    who noted, "Itseemsplausible that in a vast number

    of

    cases the

    hypothesis

    of

    equivalence

    (of the

    split

    halves)

    is

    dubious,

    and

    that therefore

    the

    reliability

    of

    a great many tests

    has

    been seriously underesti-

    mated."Agoodexample of this phenomenon are the

    numerous studies of the internal consistency reliabil-

    ityof the McClelland/Atkinson TAT measure of need

    for

    achievement. Coefficient alpha applied to this

    measure consistently shows low reliabilities. How-

    ever,

    as

    McClelland

    (1980)has

    pointed out, this mea-

    sure shows moderate

    to

    high validity when predicting

    operant measures of performance. TAT investigators

    have formulated fairlyelaborate scenarios to explain

    this

    discrepancy.

    For

    example,

    it hasbeen

    suggested

    that responses to TATcardsareinconsistent because

    subjectsfeel they should notrepeatthemselves. Some

    TAT investigators have even argued

    that

    the construct

    validityof operant measuresdoesnot require internal

    consistency

    as

    estimated

    by

    coefficient alpha.

    The

    point

    is

    that

    it is

    very likely that

    the

    discrepancy

    is due

    to the heterogeneity of the TAT data that

    results

    in a

    negatively biased reliability estimate. This explana-

    tion has been hinted at in the comments mentioned

    above,

    but not explicitly recognized.

    That coefficient alpha may underestimatereliability

    was

    also recognized by Cronbach (1951),who noted

    that a test containing distinct clusters of items might

    have

    a parallel split coefficient considerably higher

    thancoefficient alpha. A major purpose of this study

    was

    to examine the conditions under which coeffi-

    cient alpha underestimates

    the

    true reliability

    of a

    measure

    that is not tau

    equivalent, that

    is, is

    either

    congeneric equivalent

    or

    multidimensional.

    A

    related

    purpose was to examinealternative internal consis-

    tency reliability estimates that may improve the esti-

    mate

    of

    reliability compared withcoefficient alpha.

    Corrections for Attenuation

    If

    reliability

    is

    underestimated, attenuation

    is

    over-

    estimated (Guttman,

    1945).

    Because of their tendency

    to

    underestimate

    the

    reliability

    of

    heterogeneous

    tests,

    internal consistency estimators have been especially

    controversial when used incorrection forattenuation

    formulas

    (Cureton,

    1958;Lord&

    Novick, 1968; Nun-

    nally, 1978).The use of such reliability estimates to

    correct

    for

    attenuation

    will in

    many

    cases result in

    overestimation

    of the correlation between the true

    scores on the two measures and in some cases may

    result

    in

    substantial overestimation.

    In a

    recentarticle,

    Muchinsky

    (1996)argued that correcting for errors of

    measurement in the criterion remains controversial in

    meta-analyses,criterion-related validity studies, and

    utilityanalyses. Finally, Schmidt

    and

    Hunter (1996),

    in

    a discussion of using various reliability coefficients

    incorrections for attenuation, largely ignored the ten-

    dency of coefficient alpha to underestimate the reli-

    ability because of

    item

    heterogeneity. A second pur-

    pose

    of

    this

    study

    was to

    examine

    the

    extent

    to

    which

    correlation coefficients, corrected forunreliability,are

    overestimated when using coefficient alpha to esti-

    mate reliability.

    Internal Consistency Reliability Estimators

    Although

    coefficient

    alpha iswidely used inprac-

    ticalwork, a significant body of literature has devel-

    oped in which several less familiar internal consis-

    tency reliability coefficients have beenproposed.

    All

    of these coefficients have in common an attempt to

    reduce

    the underestimation of reliability

    when

    the

    components of a measure are not equivalent. All ex-

    cept standardized alpha and maximal reliability are

    discussed

    by

    Feldt

    and

    Brennan(1989).However,

    in

    the

    interests of clarity and

    reader

    convenience, a

    brief

    discussionofeachestimator's important

    properties

    is

    presented

    in

    this section, along

    witha

    computing for-

    mula for

    each coefficient.

    It

    should be noted that this article does not specifi-

    callyaddress the generalizability approach to reliabil-

    ity,

    which

    goes

    far beyond classical reliability theory.

    However, in certain designs, such as ap x i single-

    facet

    study,thegeneralizability coefficientis thesame

    as coefficient alpha, so the findings of the present

    studyapply.

    Coefficient Alpha

    Inthe theory of composite tests, thereliabilityof a

    measure,

    p~

    XT

    ,

    is defined as theratioof the variance

    of the truescores (T) of the measure to the variance

    of theobserved

    scores

    (X). In the framework of clas-

    sical reliability

    theoryand forclarity,wecallp the

    true reliabilityof a composite measure. As mentioned

    above, coefficient alpha is a lower bound to the true

    reliability

    (Outtman,1945).However, under very

    re-

    stricted conditions coefficient alpha is equal to the

    true reliability.In his derivation of the

    formula

    for

    coefficient

    alpha, Gulliksen (1950) considered twon-

    item tests that are parallel

    item-for-item

    and assumed

    that

    the

    average

    covarianceof an

    item

    in one

    testwith

    theparallel item in the other test isequal to the av-

  • 7/23/2019 Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

    3/13

    COEFFICIENTALPHA

    345

    eragecovariance of pairs of items

    within

    a single test,

    that

    is,

    wherep,/o-,

    2

    is the covariance between parallel items

    and p y T f O y is the

    covariance between nonparallel

    items.

    If

    Equation

    1

    holds, then coefficient alpha

    is

    equal

    to the true reliability. Novick and Lewis (1967)

    showed that Equation 1holdsif and only if the com-

    ponents of the test are essentially tau equivalent,

    that

    is,

    Ti

    = a

    v

    + Tj, (2)

    for

    all /andj.

    Tau-equivalent

    measures have equal true scores,

    and as

    indicated

    in

    Equation

    2,

    essentially tau-

    equivalentmeasures have truescoresthaidiffer only

    by anadditive constant. In this article both essential

    tau

    equivalenceand tauequivalence leadto the same

    conclusions. Therefore, essential

    tau

    equivalence

    will

    be referred

    to

    simply

    as

    tau

    equivalence.

    Both

    tau

    equivalence andessential tauequivalence imply that

    the correlations between true

    scores

    are all unityand

    that the variances and covariances of the true scores

    onthe

    components

    of the

    measure

    are all

    equal.

    Tau

    equivalencealso implies that the observed component

    intercorrelations corrected for attenuation are

    unity

    and

    that

    the

    measure

    is

    unidimensional

    with

    equal

    factor loadings

    but

    possibly

    unequalerror

    variances.

    There are two important advantages of a test whose

    components are tau equivalent: (a) The test can be

    interpreted as

    measuring

    a single underlying construct

    and (b) coefficient alpha computed on the components

    of the

    test

    is

    exactly equal

    to the

    true reliability.

    Congeneric equivalence is closely related to tau

    equivalence. If the components of a measure are con-

    generic equivalent, the correlations between the true

    scoresare

    unity

    but

    variances

    of the

    truescores

    may

    vary(Joreskog

    &

    Sorbom.

    1990).Congeneric equiva-

    lence implies that

    the

    measure

    is

    unidimensional,

    but

    the factor loadings and

    error

    variances may vary.

    Congeneric tests share the two advantages of tau-

    equivalenl

    tests because such

    tests

    measure

    a

    single

    underlying

    construct,

    and the

    data presented

    in

    this

    article show that coefficient alpha

    for

    such tests

    is

    reasonablyclose

    to the true

    reliability

    of the

    compos-

    ite measure.

    Finally,

    if the

    components

    of a

    measure

    are

    parallel,

    the correlations between true scores are all

    unity

    and

    the variance and covariance of the component true

    scoresof the measure are allequal. In addition, the

    components are equally reliable. Parallel components

    are unidimensional with equal factor loadings and

    equal

    error variances.

    Coefficient

    alpha

    computed on

    heterogeneous

    components may seriously underestimate the true re-

    liability.This issue is closely related to the

    multidi-

    mensionality of the

    latent variables underlying

    the

    components.

    If the

    components

    are

    unidimensional,

    coefficient alpha willbe anadequate estimate of the

    true

    reliability in allcases.If the components are char-

    acterized by two or more moderately correlated fac-

    tors

    (i.e.,

    inthe

    range

    of

    .2-.4),coefficient alpha

    may

    be

    seriously underestimated.

    Coefficient alpha may be computed using variance

    components

    but is

    ordinarily computed

    by the

    follow-

    ing

    equation:

    a = n(l- 2o-,

    2

    /o-

    2

    .)/(n- 1 ,

    3

    where of is thevarianceof the;'th component, of is

    the variance of the test, and the total score on the

    test is the sum of the ncomponent scores. It is im-

    portant to

    note

    that coefficient alpha can be comput-

    ed on the n

    components

    of a

    measure,

    thencompo-

    nentsgrouped into split halves, or thencomponents

    grouped into three or more parts. In all oftheseap-

    plicationscoefficient alpha is still a lower bound to

    the

    true reliability (Guttman,1945).

    Analternate formula

    for

    coefficient alpha

    is

    a =

    n

    2

    Ave(o-,j)/(nAve(of)

    +n(n-

    l)Ave(o-,

    y

    )),

    4

    whereAve(a,y)

    is the

    average covariance among com-

    ponents and

    Ave(of)

    is the average variance of the

    components

    in the

    measure. Several conclusions

    about coefficient alphacan be drawn from this for-

    mula:

    (a) If Ave(a,y) islarge relative to Ave(of), co-

    efficient

    alpha will

    be

    large;

    (b) if

    there

    are

    many

    components in the measure,the n(n 1)averageCo-

    variances willdominate the

    average variances and

    coefficient alpha willtend to belarge;(c) if

    Ave(o-,

    7

    )

    is

    small relative to Ave(of) either because the covari-

    ances are

    uniformly

    low or because the covariances

    are

    very variable,

    coefficient

    alphawilltend

    to be low

    to moderate.

    Raju s Coefficient

    Raju

    (1977)

    derived

    a

    formula

    for

    estimatingreli-

    ability

    whendifferent numbersofitemsare assigned

    topartsof a

    test

    so

    that

    thepartsare at

    most conge-

    neric.

    Raju's coefficient

    is

    defined

    by

  • 7/23/2019 Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

    4/13

    346

    OSBURN

    p =(o-

    2

    -2o-

    2

    )/((l

    (5)

    where X , =

    ,/S;

    and ,-is the number of items in the

    ithpart.If theparts contain equalnumbers of items,

    all X , equal Mr(where k is the number ofparts) and

    Raju's

    coefficient

    equals

    coefficient alpha.

    Angoff-Feldt

    Coefficient

    Feldt (1975) derived

    this

    coefficient for the situa-

    tion

    in

    which

    a

    measuring instrument could

    be

    divided

    into only two

    parts

    of arbitrary length. He assumed

    that the available

    parts were

    homogeneous in content

    but

    that because of the unequal length of the

    parts

    the

    scorescould only

    be

    congeneric equivalent.He also

    made

    the

    additional assumption that

    the two

    parts

    were differentially

    shortened

    versions of the total

    test.

    This latter assumption amounts to assuming that the

    sum

    of the

    error

    variances for the two parts

    is

    equal to

    the

    error variance

    of the

    total test.Feldt'scoefficient

    will

    exceed coefficient alpha whenever the part-test

    variances

    areunequal.The

    coefficient

    is not alower

    bound

    to the

    true reliability

    and in

    some circum-

    stances

    may

    overestimate

    the

    true

    reliability.

    Angoff

    (1953) and

    Kristof

    (1971)

    have

    also derived versions

    that

    ultimately lead to the same formula. Feldt and

    Brennan

    (1989) have designated

    this

    coefficient as the

    Angoff-Feldt

    coefficient:

    p =4o-.V(o-

    2

    -

    ((a*

    -

  • 7/23/2019 Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

    5/13

    COEFFICIENTALPHA

    347

    Ten Berge and

    Zegers

    computed

    U

    0

    , u,, and u

    2

    on

    college test data. In their data for a four-item test

    (similar

    to our

    four-component simulations), coeffi-

    cient

    alpha

    was

    .4359,\

    2

    was .4602, and

    u

    2

    was

    .4673.

    It was

    decided

    not toinclude dataon u

    2

    inthis

    article

    because of the

    small difference between

    u

    2

    and\

    2

    -

    Maximized Lambda4

    Cronbach

    (1951)

    showed that coefficient alpha

    is

    the

    expectation

    of

    coefficient alpha computed

    on all

    possible

    l/2(2) /(n )

    2

    split

    halvesof a

    test

    containing

    2nitems. If the data are tau equivalent, all coefficient

    alphas computed on the split halves

    will

    be equal to

    coefficientalpha computed on the

    In

    items. However,

    if

    the data are not tau

    equivalent, some split halves

    will

    give higher reliability

    estimates

    than coefficient

    alpha computedon the

    In

    componentsof themeasure.

    Because

    coefficient alpha

    is a

    lower bound

    to the

    true

    reliability, coefficient alpha

    computed

    on some split

    halves

    may

    provide lower bounds that

    are

    closer

    to the

    true reliability. This

    propertywaspointedout by

    Gutt-

    man(1945) andusedbyCallenderand Osburn (1977)

    inan

    attempt

    to

    improve reliability estimates

    for

    cor-

    rectionsforattenuation.Theterm

    Iambdu4

    wasintro-

    duced byGuttman. Lambda4 is the same as coeffi-

    cient alpha computed

    on

    split halves

    of the

    test:

    =2 1 - 0-2

    13)

    whereu\and

  • 7/23/2019 Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

    6/13

    348

    OSBURN

    Standardized

    Alpha

    Use of

    standardized alpha implies that observed

    scores on the

    components

    are

    standardized before

    summing. However, it is important to note that the

    true reliability of a measure is the same for both stan-

    dardized and unstandardized observed scores. Stan-

    dardized alpha is not a lower bound to the true reli-

    abilityand may yield estimates that are either lower or

    higherthan the true reliability. When the components

    of a

    composite measure

    are

    congeneric, standardized

    alpha

    will

    always exceed

    the true

    reliability. Also,

    if

    the components are tau equivalent and reliabilities of

    the

    components

    differ,

    standardized alpha

    will

    always

    exceed the true reliability. Standardized alpha is com-

    puted

    using the

    correlations rather than

    the covari-

    anccs:

    Standardized a = n(l

    n/(n

    j

    (15)

    where

    py

    is the correlation between the

    (th

    and>th

    items. It is easily shown that standardized alpha is the

    same as the generalizedSpearman-Brownformulafor

    lengthening

    a test n times when the average correla-

    tion (p) is used in the formula, thatis,

    Standardizeda = p/(l + (n - l)p), (16)

    wherep is the

    average

    of then(n - 1)

    correlations.

    Maximal

    Reliability

    Maximal reliability

    provides

    an estimate of reliabil-

    ity

    whenthe components in a test can be groupedinto

    subsets.Maximal

    reliability(R

    K

    )

    assumes that

    (a) all

    components within

    a

    subset

    are

    parallel, that

    is, the

    components

    in a

    subtest have equal reliabilities

    and

    equalvariances; and (b) the components in different

    subtests may havedifferingreliabilities and variances.

    If

    these

    two assumptions are met, the components in

    all subtests are congeneric (Li,

    Rosenthal,

    & Rubin,

    1996).

    Maximal reliability is a generalization of the

    Spearman-Brown approach

    and

    provides

    a

    compo-

    nent-weighting procedure to achieve maximal reli-

    ability.

    Maximalreliabilityis easy to

    compute

    usinga

    handcalculator:

    Rf

    =

    A/(K/(\ +(K

    -

    1)p) + A),

    17)

    whereK = numberofsubsetsorsubtests,A =n,p|/

    (1 - p|) +

    2

    P2/d - PT ) + +"iPt/(l ~ Pi t .Pis

    tne

    common correlation among the subsets,n

    t

    is the num-

    ber of typeicomponents, and p,is the reliability of a

    type i item. For two subsets, p can be estimated by

    r

    l2

    /V(r,r

    2

    ), where

    r

    l2

    is the average correlation be-

    tween

    Type

    1

    components

    and

    Type

    2

    components.

    For more than two subsets, p can be estimated for

    each pair

    of

    subsets

    and

    then averaged.

    Study ofCoefficient AlphaandRelated

    Reliability

    Estimates

    This study focused on two types of demonstration

    data. The first data type simulated measures

    with four

    components and relatively large correlations among

    the components (p = approximately .50). Examples

    ofthis

    data

    type include supervisor ratings, cognitive

    abilitysubtests, and personality dimensions. The sec-

    ond

    data

    type simulated measureswith eight compo-

    nents and

    relatively

    low

    correlations among

    the

    com-

    ponents (p =

    approximately .25). Examples

    of this

    data

    type include questionnaire measures of organi-

    zational commitment, role ambiguity, personal pref-

    erences, perception,

    and

    personality facets. Various

    scenarios weresimulated,and theextenttowhichthe

    variousreliabilityestimators under- or overestimated

    the true reliability was determined.

    Demonstration Data

    Demonstration data were created to study estima-

    tion

    of

    true reliability

    by the

    various internal consis-

    tencycoefficients. The general method was to gener-

    atesimulatedpopulation covariancematricesfor both

    trueand observed component

    scores.

    From

    these

    data

    the

    true

    reliability and 11

    estimators

    of reliability

    couldbe computed for varying

    degrees

    of heteroge-

    neity among the components. A description of the

    simulation methodis in the Appendix.

    The following data sets were simulated:(a) homo-

    geneous, one-factor data

    with

    equal factor loadings

    and equal

    error

    variances (parallel components); (b)

    homogeneous, one-factor datawith equalfactor load-

    ings

    but

    unequal error variances(tau-equivalentcom-

    ponents);(c)homogeneous, one-factor data

    with

    un-

    equal

    factor loadings

    and

    unequal

    error

    variances

    (congeneric-equivalent components); and (d) hetero-

    geneous, two-factor data

    with

    decreasing correlations

    between the factors (heterogeneous components).

    Three

    types of situations with which an investigator

    might be confronted were simulated:(a)ungrouped

    data such as a test containing n items, (b) data

    grouped into two subsets such as two subtests of a

    larger test, and (c)

    data

    grouped into three subsets. If

    the data are ungrouped, an investigator has a choice of

    computing awide varietyof

    reliability

    coefficients by

  • 7/23/2019 Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

    7/13

    COEFFICIENT

    ALPHA

    349

    analyzing

    the

    ungrouped data

    as is or by

    grouping

    the

    data in various ways to find thelargestlower-bound

    reliabilityestimate.

    As

    shown

    in

    this

    demonstration,

    an improved lower-bound estimate

    can

    sometimes

    be

    obtained by grouping thedataand computing a reli-

    ability

    estimate

    on the

    groupeddata.

    If the

    data

    are

    grouped to start with, aninvestigator

    still

    has a num-

    ber of lower-bound reliability estimates to

    choose

    from.In the case of two groups, six

    different

    estimates

    are

    available.

    The

    downside

    in

    computing

    a

    large

    number of the

    lower-bound alternatives

    to

    coefficient

    alpha

    is the

    possibility

    of

    capitalizing

    on

    sampling

    error. However,

    as

    shown

    in

    this demonstration,many

    of the

    alternative coefficients give similar results,

    so it

    is

    not

    necessary

    to compute a large number of esti-

    mates.

    Results for

    Four-Component

    Data

    Analysis

    of

    four components ungrouped.

    As can

    be

    seen from Table

    1,

    when

    the

    components were

    Table

    1

    Underestimation

    of

    Reliability: Four-Component Data

    parallel, all coefficients were equal to the true

    reli-

    ability.When the components were tau equivalent, all

    coefficients

    were equal

    to the

    true reliability except

    for standardized alpha, which overestimated

    the

    true

    reliability

    by a small amount. When the components

    were

    congeneric,

    onlyFeldt-Gilmerandmaximized

    Iambda4 were equal to the true reliability, although

    coefficient

    alpha and

    Iambda2

    were

    very

    close,

    and

    standardized alpha and theFeldtcoefficient slightly

    overestimated

    the true reliability. When the compo-

    nents were generated by two factors, allcoefficients

    except maximized

    Iambda4

    underestimated

    the

    true

    reliabilityto a greater or lesser extent. For example,

    whenthe components were moderately heterogeneous

    coefficient

    alpha

    was

    about

    92% of the

    true reliability,

    and

    when

    the

    components were strongly

    heteroge-

    neous coefficient alpha

    was

    about

    78% of the

    true

    reliability.In the strongly heterogeneouscasethe cor-

    relation between the two factors was only .2. This

    degree

    of

    heterogeneity

    is

    probably rare

    in

    practical

    Degreeofheterogeneity

    Reliability coefficient

    True reliability

    Four components ungrouped

    Coefficient

    alpha

    Feldt

    Feldt-Gilmer

    Lambda2

    Maximized

    Iambda4

    Standardizedalpha

    Components

    1 & 2

    vs.

    Components3 & 4

    Coefficient alpha

    Angoff-Feldt

    Lambda2

    Stratified

    alpha

    Standardized

    alpha

    Maximal

    reliability

    Components 1

    & 2 vs.

    Component3 vs.Component4

    Coefficient

    alpha

    Raju

    Feldt

    Kristof

    Lambda2

    Standardizedalpha

    Maximal reliability

    Parallel

    .800

    .800

    .800

    .800

    .800

    .800

    .800

    .800

    .800

    .800

    .800

    .800

    .800

    .750

    .800

    .800

    .800

    .760

    .787

    .800

    Tau

    eq.

    .798

    .798

    .798

    .798

    .798

    .798

    .799

    .798

    .799

    .798

    .798

    .799

    .801

    .748

    .798

    .808

    .798

    .758

    .797

    .804

    Congen.

    .786

    .778

    .792

    .786

    .782

    .786

    .799

    .763

    .806

    .763

    .786

    .799

    .801

    .673

    .718

    .820

    .786

    .690

    .797

    .804

    Slight

    .781

    .752

    .752

    .752

    .753

    .781

    .753

    .694

    .695

    .694

    .781

    .694

    .784

    .683

    .694

    .738

    .703

    .688

    .742

    .749

    Moderate

    .760

    .696

    .697

    .696

    .702

    .760

    .698

    .570

    .570

    .570

    .760

    .570

    .764

    .605

    .615

    .655

    .608

    .606

    .677

    .688

    Slrong

    .703

    .547

    .548

    .547

    .586

    .703

    .549

    .234

    .235

    .234

    .703

    .235

    .708

    .396

    .402

    .430

    .422

    .410

    .508

    .549

    Note.

    To

    assessthe

    degree of heterogeneity in the two-factor data, the observed covariances of thesimulateddata were

    factor analyzed using

    confirmatory

    factor

    analysis. The correlationsamong thefactors

    were .80, .40,

    and

    .20,

    respectively, in

    slightly,

    moderately,and

    strongly

    heterogeneousdata.Tau eq. = tauequivalent;

    congen.

    =

    congeneric

    equivalent.

  • 7/23/2019 Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

    8/13

    350

    OSBURN

    work. However, the strongly

    heterogeneous

    condition

    was

    useful for determining whether alternatives to

    coefficient alpha could improve the estimate of reli-

    ability

    even

    in thisextreme condition.

    Analysis

    of two

    halves:

    Twocomponents

    each

    half.

    In

    the heterogeneous split-half simulations coefficient

    alpha strongly underestimated the true reliability(see

    Table

    1). Coefficient

    alpha was 89% of

    true

    reliability

    when

    the

    components

    were slightly heterogeneous,

    75% of the true reliability when the components were

    moderately heterogeneous, and only 33% of true re-

    liability

    when the components were strongly hetero-

    geneous.

    However, in this situation stratified alpha

    was exactlyequalto the true reliability. Maximal re-

    liabilityvery slightly overestimated

    the

    true validity.

    Analysis of three

    pans:

    Part I, Components

    1-2;

    Part

    2,

    Component

    3;

    Part

    4,

    Component

    4.

    As can

    be

    seen from

    Table 1, when the components were

    parallel

    in

    three-part simulations with

    an

    unequal

    number of components in the parts, only the

    Raju,

    Feldt,and Kristof coefficients and maximal reliability

    were equal to

    the

    true reliability.This is because the

    parts were only congeneric equivalent even though

    the

    components were parallel. When the components

    were

    tau

    equivalent, only Raju

    and

    Kristof coeffi-

    cients were equal to the true reliabilitybecause the

    partswereonly congeneric. The Feldt coefficient did

    notequalthe truereliabilitybecause it assumes

    that

    the components are parallel. When the components

    were congeneric equivalent, onlyKristof'scoefficient

    was exactly equal to the true reliability.All

    coeffi-

    cients underestimated true

    reliability

    when

    the

    com-

    ponents

    were generated by two

    factors. Coefficient

    alpha

    was about 87% of the true reliability when the

    components were slightly heterogeneous, and 80%

    and

    56% of truereliabilitywhen the components were

    moderately and strongly

    heterogeneous,

    respectively.

    Maximal reliability was thebestestimator for these

    data, but in the strongly heterogeneous condition

    maximal reliabilitywas

    still

    only78% oftrue reliability.

    Summary of

    simulations

    of

    four-component

    data. In these simulations coefficient alpha was

    fairly robust

    in

    ungrouped single-component data.

    However, eveninthis condition

    coefficient

    alphawas

    only

    92% of the true

    reliability

    when the data were

    moderately heterogeneous. Coefficient alphacom-

    puted on two-part splitswhen the components in each

    part loaded

    on

    separate factorstended

    to

    strongly

    un-

    derestimate the true reliability in the heterogeneous

    conditions,

    but

    stratified alpha

    was

    equal

    to the

    true

    reliability

    and

    maximal reliability overestimated

    the

    true reliability only very slightly in all such condi-

    tions. MaximizedIambda4was equal to true reliabil-

    ity

    in all simulations.Because

    with four

    components

    there

    are

    only threepossible split halves, maximized

    Iambda4 should routinely be computed when studying

    thereliabilityof the average of four components.

    Results

    for

    Eight-Component Data

    Analysis of eight components ungrouped. Results

    for the eight-component simulations are presented in

    Table 2. On the whole the results mirrored the four-

    component simulations,

    that

    is, allcoefficients were

    equal

    to thetruereliability when the components were

    parallel, all except

    standardized

    alpha were equal to

    the true reliability when the components were tau

    equivalent,

    and

    only

    the

    Feldt-Gilmercoefficient

    and

    maximizedIambda4

    equaled the true reliability when

    the

    components were congeneric equivalent.How-

    ever,

    in

    these simulations coefficient alpha underesti-

    mated

    the true reliability

    only

    to a moderate

    degree.

    Coefficient

    alpha

    was 98% of the

    true reliability

    in

    slightlyheterogeneous condition and 96% and 90% of

    the true reliability in the moderate and stronglyhet-

    erogeneous conditions, respectively. Lambda2

    was a

    slightlybetter estimatorthancoefficient alpha,but the

    differences were not large.

    Analysis oft\vo halves:Fourcomponents each half.

    As in the four-component data, coefficient alpha did

    not perform nearly as well in these simulations. Co-

    efficient

    alpha was 89% of true reliability in the

    slightlyheterogeneous

    case

    and 75% and 33% oftrue

    reliability

    in the moderately and strongly heteroge-

    neous conditions, respectively. However, stratified al-

    pha was

    exactly equal

    to the

    truereliability,

    and

    maxi-

    mal

    reliability

    overestimated the

    true reliability

    only

    very slightly

    in the

    heterogeneous conditions.

    The

    other estimatorsexceptfor maximized Iambda4 were

    veryclose

    to

    coefficient alpha

    and did not

    improve

    the

    estimate

    to any

    greatextent.

    Analysis of threeparts: Part I, Components 13;

    Part

    2, Components

    4-6;

    Part 3, Components

    7-8.

    Results on three-part data are presented in

    Table 2.

    Raju,

    Feldt, and Kristof coefficients and

    stratified alpha

    and

    maximal reliabilitywere equal

    to

    the true reliability when the components were paral-

    lel, and

    only Raju, Kristof,

    and

    stratified alpha coef-

    ficients

    were equal to the

    true reliability when

    the

    components were

    tau equivalent,

    rendering

    the

    three

    parts only congeneric

    equivalent.

    Coefficient alpha

    was 95 , 95 ,and 77 ,respectively, of the true

    reliability in the slightly, moderately, and strongly

  • 7/23/2019 Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

    9/13

    COEFFICIENTALPHA

    351

    Table 2

    Underestimation

    of

    Reliability: Eight-Component Data

    Degree

    of heterogeneity

    Reliability

    coefficient

    True

    reliability

    Eight

    componentsungrouped

    Coefficient alpha

    Feldt

    Feldt-Gilmer

    Lambda2

    Maximized

    Iambda4

    Standardized alpha

    Components

    \^

    vs.Components 5-8

    Coefficient

    alpha

    Angoff-Feldt

    Lambda2

    Stratified

    alpha

    Standardized alpha

    Maximal reliability

    Components

    1-3

    vs.

    Components

    46

    vs.Components 7 & 8

    Coefficient

    alpha

    Raju

    Feldt

    Kristof

    Lambda2

    Stratified

    alpha

    Standardized alpha

    Maximalreliability

    Parallel

    .727

    .727

    .727

    .727

    .727

    .727

    .727

    .727

    .727

    .727

    .727

    .727

    .727

    .716

    .727

    .727

    .727

    .721

    .727

    .723

    .727

    Tau eq.

    .725

    .725

    .725

    .725

    .725

    .725

    .727

    .725

    .726

    .725

    .725

    .726

    .728

    .714

    .725

    .728

    .725

    .719

    .725

    .726

    .728

    Congen.

    .712

    .709

    .715

    .712

    .711

    .712

    .727

    .691

    .735

    .691

    .712

    .726

    .728

    .683

    .694

    .726

    .712

    .695

    .711

    .723

    .728

    Slight

    .704

    .692

    .693

    ,692

    .693

    .704

    .694

    .625

    .626

    .625

    .704

    .626

    .706

    .667

    .678

    .681

    .680

    .672

    .699

    .679

    .702

    Moderate

    .679

    .654

    .654

    .654

    .657

    .679

    .656

    .509

    .509

    .509

    .679

    .509

    .679

    .612

    .622

    .625

    .633

    .618

    .668

    .625

    .674

    Strong

    .613

    .554

    .555

    .554

    .571

    .613

    .556

    .204

    .445

    .204

    .613

    .204

    .613

    .449

    .477

    .480

    .597

    .484

    .578

    .487

    .612

    ?re.

    To

    assess

    the

    degree

    of

    heterogeneity

    in the

    two-factor

    data, the

    observed covariances

    of the

    simulateddata were

    factor

    analyzed

    using

    mfirmatory factor analysis.Thecorrelations amongthe factorswere .80, .40, and .20, respectively, in

    slightly,

    moderately,and strongly

    Note.

    heterogeneous data.Tau eq. = tauequivalent;congen. congenericequivalent.

    heterogeneousconditions. Stratified alpha performed

    reasonably well in these simulations. Maximal reli-

    ability

    performed best, yielding estimates that were

    veryclose

    to the true reliability in all

    three-part

    simu-

    lations.

    Summary ofsimulations for eight-component data.

    Coefficient

    alpha performed reasonably well

    in the

    eight-part simulations, equaling 95% of true reliabil-

    ity

    in moderately heterogeneous data. When the com-

    ponents were grouped into two parts with separate

    factors ineach part,

    coefficient

    alphawas only 75% of

    true reliability

    in

    moderately heterogeneous data.

    However,

    in

    this condition,

    stratified alpha was ex-

    actlyequal

    to the

    true reliability

    and

    maximal reliabil-

    ity overestimated the true reliability only by a very

    small margin.

    It is

    clear that coefficient alpha com-

    puted

    on individualcomponents was abetter lower

    bound than that computed on components grouped

    into two halves that were not highly correlated. Maxi-

    mizedIambda4was equal to the true reliability in all

    simulations.

    ImplicationsforCorrections forAttenuation

    In

    four-component simulations

    with

    moderate het-

    erogeneity, coefficient alpha ranged

    from

    75% to 92%

    of

    true reliability. Under these conditions, using

    co-

    efficient

    alpha

    in

    single-variable corrections

    for at-

    tenuation would result

    in

    overcorrections

    of 4% to

    15%.For

    example,

    if true

    validity were

    actually

    .50,

    estimated

    true validity would range from

    .52 to

    .58.

    In

    the worst case scenario, whenthe components in each

    part loaded on separate factors and the data were

    strongly

    heterogeneous, coefficient alpha

    was

    only

    33% of

    true reliability,

    and if

    used

    in the

    correction

    forattenuation would overestimate true reliability by

    74%.

  • 7/23/2019 Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

    10/13

    352

    OSBURN

    In

    eight-component simulationswithmoderate het-

    erogeneity,

    coefficient

    alpha ranged

    from75% to 96%

    of

    true reliability. Under these conditions, using co-

    efficient

    alpha

    in

    corrections

    for

    attenuation

    in one

    variable

    could result

    in an

    overcorrection

    of 2% to

    15%.

    For example, if the true validity was

    actually

    .50, estimated true validity would range from

    .51 to

    .58.

    In the

    worst-case

    scenario,in

    which

    the

    data

    were

    strongly heterogeneous and the components in each

    split

    loaded on separate factors, coefficient alpha was

    only

    33% of

    true reliability.

    In this

    extreme case,

    if

    the

    true

    validitywereactually .50, estimated trueva-

    lidity would be .87.

    Finally, if

    coefficient alpha

    was

    used

    to

    correct

    both independent

    and

    dependent variables

    for unreli-

    ability and if

    both variables were moderately hetero-

    geneous,

    in the

    worst-case scenario

    the

    double cor-

    rection would overestimate thetruevaliditybyabout

    33%.

    Discussion and Recommendations

    One-Factor Models

    When the components of a measure were parallel

    and

    each

    part

    contained

    an

    equal number

    of

    compo-

    nents,

    all

    estimators were equal

    to the

    true reliability.

    However, this conditionisseldomrealizedinreal data

    because

    it

    means that

    the

    components (items, subtests,

    etc.)

    have equal factor loadings and equal error vari-

    ances. When

    the

    parts contained

    an

    unequal number

    of parallel components, the

    parts

    were only conge-

    neric equivalent,

    and

    only Raju, Feldt,

    and

    Kristof

    coefficients and

    stratified alpha

    and

    maximal reliabil-

    ity estimates equaled thetrue reliability.

    When the components were tau equivalent and

    there were an equal number ofcomponents in each

    part, all

    estimates were equal

    to the true reliability

    except

    Angoff-Feldt,

    standardized alpha,

    and

    maxi-

    mal

    reliability,

    which overestimated true

    reliabilityby

    a

    very small amount.

    Whenthecomponents were congeneric equivalent,

    onlyFeldt-Giliner,maximized Iambda4, stratified

    al-

    pha, and Kristof coefficients were equal to the true

    reliability. However, when

    the

    data were congeneric,

    almost all

    coefficients could

    be

    used

    as

    adequate

    es-

    timates of the true

    reliability.

    The exception was

    co-

    efficient

    alpha computed

    on

    three-part splits

    withan

    unequal

    number

    of

    components

    in the

    splits.

    Two-Factor

    Models

    When

    the data were generated by two

    underlying

    factors and the

    components

    in one

    subset measured

    one

    factor

    and the

    components

    in the

    other subset

    measured a different factor,

    coefficient

    alpha seri-

    ously underestimated the true reliability. Lambda2

    tended

    to be

    slightly larger thancoefficient alpha

    but

    did not

    improve

    the

    estimate

    of

    true reliability

    to any

    great extent. However,

    in

    these conditions stratified

    alpha andmaximalreliability both provided adequate

    estimates

    of thetrue

    reliability.

    Maximized

    Iambda4 was clearly the

    most consis-

    tentlyaccurate estimator

    of the

    true reliability

    in all

    the simulations

    that were studied.

    For

    measures

    with

    few

    components, if one suspects that data are hetero-

    geneous, Iambda4

    should

    be

    computed

    for all

    possible

    splits and maximized

    Iambda4

    used to estimate the

    true reliability.For

    measureswith

    6 to 10

    components

    the

    situation

    is

    problematical.

    The

    approximation

    to

    maximizedIambda4 presented

    in

    this article could

    be

    computed

    and

    used

    to

    estimate true

    reliability.

    How-

    ever, maximized Iambda4

    may

    slightly overestimate

    thetrue

    reliability

    because

    of

    capitalizing

    on

    sampling

    Assessing

    Heterogeneity

    Unfortunately,

    thecharacteristicsof empirical data

    (tau

    equivalent, congeneric,

    ormultifactorial)are

    sel-

    dom

    known precisely. However, confirmatory factor

    analysis(CFA)isuseful formodelinganddistinguish-

    ing

    between parallel,

    tau-equivalent,

    and congeneric

    data providing

    that

    the assumptions of this approach

    are

    reasonably approximated (see Turban, Sanders,

    Frances, & Osburn, 1989).Anothergood reason for

    using

    CFA is the possibility of

    correlated

    errors

    among the components.

    Komaroff

    (1997) has shown

    that

    correlated errors

    can

    inflatecoefficient alpha

    and

    thatsucherrors interact with lackof tau equivalence.

    Correlated errors can be detected with CFA, and if

    necessary alpha

    can be

    adjusted

    for this

    effect. Also,

    there has been agooddeal of work on ways to deter-

    mine

    whether data

    isuni-or

    multidimensional,

    and a

    largenumberof indices have beenproposed(see Hal-

    tie, 1985).Factoranalysis (either exploratory or con-

    firmatory)of the

    components

    is

    recommended

    to de-

    termine theheterogeneity of thedata. Ifthe data arc

    not

    continuous,

    it is

    also important

    to

    factor

    the poly-

    choric correlations among the components. If the data

    set is characterized by two or more moderately cor-

    related underlying factors (in the range of .20 to .40 or

    so), aninvestigatorshould be concerned. The excep-

    tion to this rule is the

    situation

    in which there are

    multiple

    factors

    but the first

    factor

    is

    dominant

    in the

  • 7/23/2019 Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

    11/13

    COEFFICIENTALPHA 353

    sense of accounting for the bulk of the variance

    among

    the components (Cronbach, 1951).

    It

    is important to note that the average intercompo-

    nent correlation is not a reliable indicator ofhetero-

    geneity.

    The

    average correlation among components

    maybe low, but the data may stillbe tau equivalent,

    and

    therefore true reliability

    is

    accurately estimated

    bycoefficient

    alpha (Green, Lissitz, &

    Mulaik, 1977).

    Variations in the correlations amongcomponentsmay

    also be a somewhat misleadingindicatorof heteroge-

    neity.Even if

    there

    are variations in intercomponent

    correlations, the data may still be tau equivalent.

    However,

    intercomponent correlations that cluster

    into subgroups indicate heterogeneity (Cortina, 1993).

    If

    the data are very heterogeneous, components can

    be grouped on the basis of similar content, and strati-

    fied alpha

    or

    maximal reliability along with

    coeffi-

    cientalpha should

    be

    computed

    and the

    largest value

    used

    to estimate the true reliability. If there is no basis

    for

    grouping and there are a small number of compo-

    nents,

    Iambda4

    should

    be

    computed

    for

    all possible

    splits.

    Many Versus Few Components

    This

    studyassessedcoefficient alpha computed on

    onlyfour

    and

    eight components.

    The

    reason

    for

    study-

    ing such

    a

    small number

    of

    components

    is

    that coef-

    ficientalpha

    is

    moreoften than

    not

    computed

    on

    eight

    or

    fewer components (Peterson, 1994)

    and

    because

    coefficient alphaapproaches

    the

    true

    reliabilityas the

    number of positively correlated components in-

    creases.

    Coefficient alpha becomes less and less of an

    underestimate of the true reliability because the per-

    cent increase in coefficient alpha as the number of

    components increases is greater than the percent in-

    crease

    in true reliability. For example, in the present

    study,

    to limit overestimation ofcorrectedvalidity to

    about 2%, coefficient alpha must be at least 95% of

    the

    true reliability. In the worst-case scenario(strong

    heterogeneity)

    for

    eight components ungrouped,

    co-

    efficient alpha

    was 90% of

    true reliability,

    but in

    this

    caseonly 20 components would be needed for coef-

    ficient alpha

    to

    reach

    95% of

    true reliability. In

    the

    worst-case scenario

    for

    four components ungrouped,

    coefficient alpha was 78% of true reliability,but in

    this

    case

    only 28 components would be

    needed

    for

    coefficient alpha to reach 95% of true reliability. It is

    concluded that when estimatingthereliabilityofques-

    tionnaire

    or rating data, even if the data are very het-

    erogeneous, only

    20 to 30

    components

    are

    needed

    to

    be reasonably assured of an adequate estimate of true

    reliability. However, in the

    case

    of

    grouped data

    in-

    creasing, the number of components in each group

    does

    not necessarily make coefficient alpha a better

    lower

    bound for

    alternative

    estimates of the true reli-

    ability to be used. For example, in theworst-case

    scenario for two groups of four components in each

    group, coefficient alpha was only 33% of true reli-

    ability. However,

    in

    this

    case

    one can

    compute strati-

    fied alpha, which

    is

    100%

    of

    true reliability.

    References

    Angoff,

    W. H. (1953).

    Test

    reliability and

    effective

    test

    length. Psychometrika, 18, 1-14.

    Callender, J. C., & Osburn, H. G. (1977). A method for

    maximizing

    andcross-validatingsplit-half reliabilityco-

    efficients.

    Educational

    &

    Psychological Measurement,

    37,

    819-826.

    Callender,

    J. C., & Osburn, H. G. (1979).Anempirical

    comparisonof coefficient alpha,

    Gunman's

    Iambda2 and

    MSPLITmaximized split-half

    reliability

    estimates.Jour-

    nal

    of Educational Measurement, J6,89-99.

    Cortina,

    J. M. (1993).Whatiscoefficient alpha? An exami-

    nation oftheoryandapplications.

    Journal

    of

    Applied

    Psy-

    chology,

    78,

    96-104.

    Cronbach, L.J. (1951).

    Coefficient

    alpha and the internal

    structure

    of tests.Psychometrika, 16, 297-334.

    Cronbach, L.J.,

    Schoneman,

    P., &

    McKie,

    D. (1965).Alpha

    coefficient

    for stratified-parallel

    tests. Educational

    &

    Psychological Measurement,

    25,

    291-312.

    Cureton,

    E. E.(1958).Thedefinitionand

    estimation

    of test

    reliability.

    Educational

    &

    Psychological Measurement,

    18, 715-738.

    Feldt,

    L. S.

    (1975).

    Estimation of the reliability of a test

    divided

    into

    twopansof unequal

    length.

    Psychometrika,

    40,

    557-561.

    Feldt, L.

    S.,

    & Brennan, R. L. (1989). Reliability.InR. L.

    Linn(Ed.),Educational Measurement(3rd ed.:pp. 105-

    146). New York: Macmillan.

    Gilmer,

    J.

    S.,

    &

    Feldt,

    L. S.

    (1983). Reliability

    estimations

    for atestwith

    parts

    of unknown lengths.Psychometrika,

    48,

    99-111.

    Green, S. B.,Lissitz,R.

    W.,

    & Mulaik, S. A. (1977).Limi-

    tationsof coefficient alphaas an

    index

    of test

    unidimen-

    sionality. Educational

    &

    Psychological Measurement,

    37.

    827-838.

    Guion.

    R. M. (1965). Personnel testing. New York: Mc-

    Graw-Hill.

  • 7/23/2019 Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

    12/13

    354

    OSBURN

    Gulliksen, H. (1950). Theory

    of mental

    tests. New York:

    Wiley.

    Guttman,L.

    (1945).A

    basis

    for analyzing

    test-relest reli-

    ability.

    Psychometrika, 10,255-282.

    Hattie, J. (1985).

    Methodology

    review: Assessing unidi-

    mensionality

    of tests and items.Applied Psychological

    Measurement, 9, 139164.

    Joreskog,

    K.

    G., &

    Sorbom,

    D.

    (1990).

    L1SREL

    7: A

    guide

    to

    the

    program

    and

    applications (2nded.).Chicago:

    Na-

    tional EducationPress.

    Komaroff, E. (1997).

    Effect

    of simultaneous violationsof

    essential tau-equivalence and uncorrelatederror.Applied

    Psychological Measurement,

    21,337-348.

    Kristof, W. (1971). On the theory of a set of

    tests

    which

    differ only in length.Psychometrika, 36,207-225.

    Kristof,

    W. (1974).

    Estimation

    of

    reliability

    and

    truescore

    variance from a split of a test intothreearbitraryparts.

    Psychometrika,

    39,

    491-499.

    Li,

    H.,

    Rosenthal, R., &Rubin,D. B. (1996). Reliability of

    measurement in psychology: From Spearman-Brown to

    maximal reliability.

    Psychological

    Methods,

    I,

    98-107.

    Lord, F. M., & Novick, M. R.(1968).Statistical theories

    of

    mental testscores.MenloPark,CA: Addison-Wesley.

    McClelland, D. C.

    (1980).

    Motive dispositions: The merits

    ofoperant and respondent measures. In L. Wheeler

    (Ed.),

    Review of personality

    and

    social psychology (pp.10-41).

    Beverly Hills,CA:Sage.

    Muchinsky,

    P. M.

    (1996).

    The

    correlation

    for

    attenuation.

    Educational &Psychological Measurement, 56,63-75.

    Novick,

    M.R., &

    Lewis,

    C. L.

    (1967). Coefficient alpha

    and the

    reliability

    of

    measurements. Psychnmetriku, 32,

    1-13.

    Nunnally,

    J. C.(1978).Psychometric theory(2nd ed.).New

    York: McGraw-Hill.

    Peterson,R. A.

    (1994).

    A

    meta-analysis

    ofCronbach'sco-

    efficient alpha.Journal of Consumer Research, 21,381-

    391.

    Raju, N. S. (1977). A generalization of coefficient alpha.

    Psychometrika, 42,549-565.

    Schmidt,

    F.

    L.,

    &

    Hunler,i.

    E. (1996).

    Measurement error

    inpsychologicalresearch: Lessonsfrom26 researchsce-

    narios. Psychological Methods,

    I,

    199-223.

    Ten

    Berge, J. M.F., &

    Zegers,

    F. E. (1978). A seriesof

    lower bounds to the reliability of atest. Psychometrika,

    43,575-579.

    Turban,

    D.

    B.. Sanders,

    P.

    A., Frances,

    D.

    J.,

    &

    Osburn,

    H. G. (1989). Construct equivalence as an approach to

    replacing validated cognitive ability selection tests. Jour-

    nal ofApplied Psychology, 74,62-71.

    Appendix

    Simulation

    Method

    1.Tau-equivalentdatawere simulated

    as

    follows:

    a. The correlations among the component true scores

    were set to1.0.

    b. The variances of the componenttruescoreswere set

    to1.0.

    These two operations

    generate

    a true-score

    covariance

    matrix

    with every element unity.The matrix is 4 x 4 for

    four-component dataand8x8foreight-component data.

    c. To generate the corresponding tau-equivalent ob-

    served correlation matrix, the reliabilities of the components

    were

    systematically varied.

    For

    example, in the four-

    component

    case

    the reliabilities were set to.45, .46, .52,and

    .55. The

    observed-score

    correlationbetween the < th andj'th

    component observed scoreswas computed by

    multiplying

    the true-score correlation between

    the

    ith and

    jlh

    compo-

    nentsby the square root of

    the

    product oftheirrespective

    reliabilities. For example, in the four-componentcase the

    observed-score correlation between Components

    I and 2

    (.465)was computed bymultiplying 1.0 by the square root

    of

    .45 x

    .48.

    d. Because reliabilityistrue-scorevariance divided by

    observed-score variance, theobserved-scorevariance of the

    ithcomponent was computed by dividing the true variance

    of the component by itsrespectivereliability.For example,

    the

    observed variance

    of

    Component

    1 in the

    four-

    component case (2.15) was computed by dividing 1.0 by

    .465.

    2.

    Parallel data were simulated

    the

    same

    way as for

    tau-

    equivalent

    data except

    that the reliabilities of the

    compo-

    nents

    were kept constant.

    3. Congeneric-equivalent data were

    simulated the

    same

    way

    as for

    tau-equivalentdata exceptthattrue variances

    of

    the components were systematically varied.

    4.Truereliabilitywas computed bydividingthe variance

    of the summed true

    score

    on the components by the variance

    of the summedobserved score.

  • 7/23/2019 Osburn, H.G. (2000) Coefficient alpha and related internal consistency reliability coefficients.pdf

    13/13

    COEFFICIENTALPHA 355

    5. Supervisor-rating data with

    four

    components were

    simulated

    as

    follows:

    a.To simulate paralleldata,the reliability of each com-

    ponent was set at .50. This resulted in parallel data with

    correlations of .50 among the observed scores.

    b.To simulatetau-equivalentdata, the reliabilities of

    the components were

    set to

    .45, .48, .52,

    and

    .55. This

    resulted in tau-equivalent data with correlations that were

    variable (.465

    to

    .535) among

    the

    components

    but

    averaging

    approximately .50.

    c. To simulate congeneric-equivalent data, the true

    variance of Components 1 and 2 was set at 2.0 and the true

    variance of Components 3 and 4 was set at 1.0.

    d. To generateslightlyheterogeneous data, true-score

    correlations between Components

    1 and 2

    withComponents

    3 and 4

    were

    set to .8; for

    moderately heterogeneous

    data

    0.80swere replaced by

    0.60s.

    For strongly heterogeneous

    data 0.60s were replaced by 0.20s. Confirmatory

    factor

    analysis (CFA) applied to all three observed

    covariance

    ma-

    trices resulted in a two-factor solution. Components 1 and 2

    loaded

    1.00

    on Factor 1 and 0 on Factor 2. Components 3

    and

    4 loaded 0 on Factor 1 and 1.00 on

    Factor

    2. For

    slightly

    heterogeneous data,

    the

    correlation between

    Factors1 and 2

    was

    .80. For moderately heterogeneous data the correlation

    was

    .60, and for strongly heterogeneous data the correlation

    was .20.

    e.Four-pan datawere simulated byX =XI+ X2 + X3

    + X4; two-part data were simulated by Xa = X I

    + X2,

    Xb

    = X3 +

    X4;

    three-part data were simulated by Xa

    =

    XI +

    X2,

    Xb = X3, Xc = X4.

    6. Questionnaire data w i t height components were simu-

    lated as follows:

    a.To simulate paralleldata,the reliability of each com-

    ponent was set at .25. This resulted in correlations of .25

    among

    the

    observed scores.

    b. To simulate tau-equivalent data, the reliabilities of

    the components wereset to .22, .22, .24, .24, .26, .26, .28,

    and

    .28. This produced tau-equivalent data with observed

    score correlations ranging

    from

    .22 to .28 but averaging

    approximately .25.

    c. To simulate congeneric-equivalent data, the true

    variances wereset to 2.0 forComponents 1, 2, 3, and 4, and

    to 1.0 for Components 5, 6, 7 and 8.

    d. To

    generate slightlyheterogeneous data, true-score

    correlations

    between

    Components 1,2, 3, and 4with Com-

    ponents 5, 6, 7, and 8 were set to .8, for moderatelyhetero-

    geneous data

    0.80s

    were replaced

    by

    0.60s.

    For

    strongly

    heterogeneous data

    0.60s

    were replaced by 0.20s. These

    data were factored using CFA.

    The

    factor loadings were 1.0

    and 0.0.

    and

    correlations among

    the two

    factors were

    the

    same

    as for

    four-component data, that

    is, .8, .6, and .2.

    e. Eight-part data were simulated by X =

    X I

    + X2 + X3

    + X4 + X5 + X6 + X7 + X8;

    two-part data were simulated

    by

    Xa

    = XI + X2 + X3 + X4, Xb = X5 + X6

    -f

    X7 + X8;

    three-part data were simulated by Xa= XI +X2 + X3, Xb

    = X4 + X5 + X6, Xc = X7 + X8.

    ReceivedOctober 11, 1996

    RevisionreceivedJuly

    20,

    1999

    Accepted March

    22

    2000