
ACCURACY OF MEASUREMENT

VALIDITY AND RELIABILITY

ACCURACY OF MEASUREMENT
Having dealt with many sources of measurement error, you may wish to know how successful you have been in eliminating/reducing measurement error.

How would you assess the accuracy of a measurement instrument? For example,…
QUESTION: How would you decide/judge if your bathroom scale is operating accurately (free of measurement error)?

ACCURACY OF MEASUREMENT

There are two yardsticks against which we judge the accuracy/precision of a measurement instrument/procedure and its relative success in measuring a variable: Validity and Reliability.

ACCURACY OF MEASUREMENT

Validity? The degree to which a measurement instrument actually measures what it is supposed/designed to measure.

Reliability? The degree of dependability, stability, consistency, and predictability of a measurement instrument.

Source: Adapted from Keith K. Cox and Ben M. Enis, The Marketing Research Process (Pacific Palisades, CA: Goodyear, 1972), 353-355, and from Fred N. Kerlinger, Foundations of Behavioral Research, 44, copyright ©1973 by Holt, Rinehart and Winston, Inc.

[Figure: three target shot patterns (Rifle A, Rifle B, Rifle C), labeled Neither Valid Nor Reliable, Valid and Reliable, and Reliable But Not Valid]

Which pattern can be characterized as valid, reliable, or neither?

ACCURACY OF MEASUREMENT: VALIDITY

Let's focus on validity first!

VALIDITY:
– Face Validity
– Content Validity
– Construct Validity
  a. Convergent Validity
  b. Discriminant Validity
– Criterion Validity
  a. Predictive
  b. Concurrent

Face and content validity are (should be) of concern at the time of designing the instrument; construct and criterion validity are (can be) used to assess the quality of data obtained after using the instrument.

ACCURACY OF MEASUREMENT: FACE VALIDITY

• FACE VALIDITY:

– Most subjective of all types of validity

– The measurement instrument is intuitively judged for its presumed relevance to the attribute being measured.

ACCURACY OF MEASUREMENT: CONTENT VALIDITY

CONTENT VALIDITY:
– Many constructs represent complex, abstract, and elusive qualities that cannot be directly observed/measured, but have to be inferred from their multiple indicators.

– Content validity is concerned with whether or not the measurement instrument contains a fair sampling of the construct's content domain, i.e., the universe of the issues it is supposed to represent.

Example: Course Exam? Stress or Anxiety
What are some of the symptoms of stress? (see next slide…)

[Diagram: the construct Stress with dimensions Physical Tension, Emotional or Psychological Tension, Mental Tension, Nervousness, and Anxiety, each inferred from multiple indicators]
– Could be measured by: blood pressure, pulse rate
– Could be measured by: extent of sleeplessness, sweating, stomach upsets
– Could be measured by: headaches, fatigue, confusion, fear

ACCURACY OF MEASUREMENT: CONTENT VALIDITY

– Ensuring content validity is a subjective process. It often involves obtaining expert judgment on the adequacy of the overlap between the instrument and the domain of the concept being measured.

– Concern about content validity should receive due attention during the survey construction phase (rather than when data collection has been completed).

Assessing the quality of scores after they have been obtained is the concern of criterion validity and construct validity.

– Steps to ensure content validity when constructing a measurement instrument will be discussed later in this presentation.

ACCURACY OF MEASUREMENT: CONSTRUCT VALIDITY

Construct Validity is based on the way scores on a variable relate to scores on other variables within a theoretical system/framework.

A. Refers to the degree to which scores obtained from the instrument show relationships with other variables that are consistent with those substantiated by prior research, e.g.,

Organizational Loyalty relates positively (+) to Job Satisfaction and negatively (−) to Absenteeism.

ACCURACY OF MEASUREMENT: CONSTRUCT VALIDITY

B. Alternatively, construct validity can be viewed as the degree to which scores obtained using a measurement instrument are consistent with scores obtained from an alternative instrument (designed to measure the same concept). For example: subjective self-report measures of firm performance and the firm's objective ROI.

Evidence of construct validity:
1. Scores obtained on the same construct using different measurement instruments/procedures must converge--i.e., must share variance (convergent validity).
2. Measures of different constructs must be empirically distinguishable--i.e., must NOT converge (discriminant validity).

Also see the multitrait-multimethod correlation matrix for:
– Job Satisfaction Survey--JSS (Spector 1985), and
– Job Descriptive Index--JDI (Smith et al., 1969)

Multitrait-Multimethod Correlation Matrix for Three JSS (1985) Versus JDI (1969) Subscales

Subscales:       JDI Work   JDI Pay   JDI Super   JSS Work   JSS Pay
1. JDI Work
2. JDI Pay         .27
3. JDI Super       .31        .23
4. JSS Work        .66        .24       .24
5. JSS Pay         .33        .62       .34         .29
6. JSS Super       .25        .27       .80         .22        .34
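The convergent/discriminant logic can be checked numerically against the matrix above. A minimal sketch (the correlations are the ones shown in the matrix; the grouping into "convergent" and "discriminant" sets is this sketch's own bookkeeping): the three same-trait, different-method correlations (JDI vs. JSS for Work, Pay, Supervision) should be noticeably larger than the different-trait correlations.

```python
# Convergent vs. discriminant evidence from the JSS/JDI multitrait-multimethod matrix.

# Same trait measured by two different methods (monotrait-heteromethod):
convergent = {"Work": 0.66, "Pay": 0.62, "Super": 0.80}

# Different traits (heterotrait correlations), pooled across methods:
discriminant = [0.27, 0.31, 0.23, 0.24, 0.24, 0.33, 0.34, 0.29,
                0.25, 0.27, 0.22, 0.34]

mean_convergent = sum(convergent.values()) / len(convergent)
mean_discriminant = sum(discriminant) / len(discriminant)

# Convergent validity: same-construct correlations are high;
# discriminant validity: different-construct correlations stay low.
print(f"mean convergent r   = {mean_convergent:.2f}")
print(f"mean discriminant r = {mean_discriminant:.2f}")
```

The gap between the two means (roughly .69 versus .28 here) is what supports both convergent and discriminant validity for the two instruments.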

ACCURACY OF MEASUREMENT: CRITERION VALIDITY

Criterion Validity: The degree to which scores obtained from an instrument can predict a related practical outcome (e.g., GMAT and academic performance in an MBA program--r = .48 with 1st-year GPA).

Predictive Validity--if the prediction involves a future outcome, e.g., GMAT.

Concurrent Validity--if the prediction involves a present outcome or state of affairs--e.g., score on a political liberalism/conservatism scale predicting political party affiliation.

Evidence of Criterion Validity of SAT

                                          Correlation (r) with Freshman GPA
SAT-Critical Reading                                   0.48
SAT-Math                                               0.47
SAT-Writing                                            0.51
SAT (all components combined)                          0.53
High School GPA                                        0.54
High School GPA and all SAT Components                 0.62

Source: College Board's news conference on "new" SAT validity, June 17, 2008

ACCURACY OF MEASUREMENT: CRITERION VALIDITY

Criterion validity of alternative predictors of job performance:

Entry-level + Training:
Cognitive ability tests                 .53
Job tryout                              .44
Biographical inventories                .37
Reference checks                        .26
Experience                              .18
Interview                               .14
Ratings of training and experience      .13
Academic achievement                    .11
Amount of education                     .10
Interest                                .10
Age                                    -.01

Experienced Workers:
Work-sample tests                       .54
Cognitive ability tests                 .53
Peer ratings                            .49
Ratings of the quality of performance
in past work experience (behavioral
consistency ratings)                    .49
Job knowledge tests                     .48
Assessment centers                      .43

___________________________________________________

SOURCE: Hunter, J.E., & Hunter, R.E., 1984, Validity and Utility of Alternative Predictors of Job Performance, Psychological Bulletin, 96, pp. 72-98.

ACCURACY OF MEASUREMENT: RELIABILITY

Dependability, consistency, stability, predictability

RELIABILITY:
– Stability
  a. Test-Retest Reliability
  b. Parallel-Form Reliability
– Internal Consistency
  a. Split-Half Reliability
  b. Inter-Item Consistency
  c. Inter-Rater Reliability

To better understand RELIABILITY, let's first review different types of measurement error…

ACCURACY OF MEASUREMENT: RELIABILITY

• Measurement, especially in the social sciences, is often not exact; it involves approximating and estimating, i.e., involves measurement error.

Measurement Error = True Score - Observed Score

• Measurement Error:
– Constant (repeated) error: reflects error that appears consistently in repeated measurements.
– Random (unsystematic) error: reflects error that appears sporadically in repeated measurements.


[Figure repeated: three target shot patterns (Rifle A, Rifle B, Rifle C), labeled Neither Valid Nor Reliable, Valid and Reliable, and Reliable But Not Valid]

Which pattern represents what type of measurement error?

CONCLUSION: Reliability is only concerned with random error of measurement.

• Reliability can be defined as the extent to which measurements obtained are free from random error.

QUESTION: Relationship between validity and reliability?

ACCURACY OF MEASUREMENT: RELIABILITY

How would you assess the reliability of a measurement instrument (say, a bathroom scale)?

A. Measure the same person many times and use the standard deviation of the scores as an index of stability over repeated measures (e.g., weight).

Another way?

ACCURACY OF MEASUREMENT: RELIABILITY

B. Measure many individuals twice using the same instrument and look for stability of scores. That is, compute the correlation coefficient r (Test-Retest Reliability).

– Most applicable for fairly stable attributes (e.g., personality traits).

(See next slide for how to compute the correlation coefficient r)

Computing the Reliability Coefficient

r = Σ(X − X̄)(Y − Ȳ) / √[ Σ(X − X̄)² · Σ(Y − Ȳ)² ]

X = Each subject's 1st score
Y = Each subject's 2nd score

  X     Y    X − X̄   Y − Ȳ   (X − X̄)(Y − Ȳ)   (X − X̄)²   (Y − Ȳ)²
 11    12     −4      −4           16             16         16
 17    17      2       1            2              4          1
 16    14      1      −2           −2              1          4
  .     .      .       .            .              .          .
X̄ = 15  Ȳ = 16                Σ(X − X̄)(Y − Ȳ)   Σ(X − X̄)²   Σ(Y − Ȳ)²

r = Test-Retest reliability coefficient

Another way to assess reliability of an instrument when an alternative instrument also exists?
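The tabular computation above maps directly onto a few lines of code. A minimal sketch (the formula is the one on the slide; the six score pairs are hypothetical, chosen so the means match the slide's X̄ = 15 and Ȳ = 16):

```python
import math

def pearson_r(x, y):
    """Correlation coefficient r = Σ(X-X̄)(Y-Ȳ) / √[Σ(X-X̄)² · Σ(Y-Ȳ)²]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Test-retest: each person's 1st score (X) and 2nd score (Y).
first  = [11, 17, 16, 14, 12, 20]   # hypothetical 1st administration
second = [12, 17, 14, 15, 17, 21]   # hypothetical 2nd administration

print(f"test-retest r = {pearson_r(first, second):.2f}")
```

The same function computes the parallel-form and split-half coefficients on later slides; only the meaning of X and Y changes.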

ACCURACY OF MEASUREMENT: RELIABILITY

C. Measure many individuals with two instruments (the focal as well as an alternative instrument), and look for consistency of scores across the two instruments, i.e., compute r (Parallel-Form Reliability).

(See next slide for how to compute the correlation coefficient r)

X = Each subject's score on the new instrument (form A)
Y = Each subject's score on the alternative instrument (form B)

  X     Y    X − X̄   Y − Ȳ   (X − X̄)(Y − Ȳ)   (X − X̄)²   (Y − Ȳ)²
 11    12     −4      −4           16             16         16
 17    17      2       1            2              4          1
 16    14      1      −2           −2              1          4
  .     .      .       .            .              .          .
X̄ = 15  Ȳ = 16                Σ(X − X̄)(Y − Ȳ)   Σ(X − X̄)²   Σ(Y − Ȳ)²

Computing the Reliability Coefficient

r = Σ(X − X̄)(Y − Ȳ) / √[ Σ(X − X̄)² · Σ(Y − Ȳ)² ]

r = parallel-form reliability coefficient

Another way to assess reliability for multi-item summated scales?

ACCURACY OF MEASUREMENT: RELIABILITY

D. In the case of multi-item summated rating scales, you can artificially create two alternative instruments (parallel forms) by splitting the multiple items into two halves and then look for consistency of scores across the two halves.
• Compute r between pairs of summated scores (Split-Half Reliability).

Let's see an EXAMPLE.

ACCURACY OF MEASUREMENT: RELIABILITY
Summated multi-item scale given to 340 individuals to measure Self-Esteem

KEY: 1 = Never true, 2 = Seldom true, 3 = Sometimes true, 4 = Often true, 5 = Almost always true

1. I feel that I have a number of good qualities
2. I wish I could have more respect for myself (R)
3. I feel that I'm a person of worth, at least on an equal plane with others
4. I feel I do not have much to be proud of (R)
5. I take a positive attitude toward myself
6. I certainly feel useless at times (R)
7. All in all, I'm inclined to feel that I am a failure (R)
8. I am able to do things as well as most other people
9. At times I think I am no good at all (R)
10. On the whole, I am satisfied with myself

R = Reverse items--these must be reverse-coded before any subsequent analysis is performed.

______________________________
SOURCE: Carmines, E.G., & Zeller, R.A., 1979, Reliability and Validity Assessment, Beverly Hills, CA: Sage Publications.
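On a 1-5 key, reverse-coding an (R) item amounts to replacing each score with 6 − score. A minimal sketch (the set of (R) item numbers is from the scale above; the respondent's raw scores are hypothetical):

```python
# Reverse-code (R) items on a 1..5 scale: 1<->5, 2<->4, 3 stays 3.
REVERSE_ITEMS = {2, 4, 6, 7, 9}  # items flagged (R) in the self-esteem scale

def reverse_code(responses, reverse_items=REVERSE_ITEMS, max_key=5):
    """responses: dict {item_number: raw score}. Returns recoded scores."""
    return {item: (max_key + 1 - score) if item in reverse_items else score
            for item, score in responses.items()}

# One hypothetical respondent:
raw = {1: 4, 2: 2, 3: 5, 4: 1, 5: 4, 6: 2, 7: 1, 8: 5, 9: 2, 10: 4}
coded = reverse_code(raw)
self_esteem = sum(coded.values())  # summated scale score
print("recoded scores:", coded)
print("summated self-esteem score:", self_esteem)
```

Skipping this step would make high self-esteem answers on (R) items pull the summated score down, wrecking both the scale score and any reliability coefficient computed from it.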


ACCURACY OF MEASUREMENT: RELIABILITY

Instrument A: summated score A
Instrument B: summated score B

X = Each subject's summated score A (on the 1st half of items, e.g., items 1-5)
Y = Each subject's summated score B (on the 2nd half of items, e.g., items 6-10)

  X     Y    X − X̄   Y − Ȳ   (X − X̄)(Y − Ȳ)   (X − X̄)²   (Y − Ȳ)²
 11    12     −4      −4           16             16         16
 17    17      2       1            2              4          1
 16    14      1      −2           −2              1          4
  .     .      .       .            .              .          .
X̄ = 15  Ȳ = 16                Σ(X − X̄)(Y − Ȳ)   Σ(X − X̄)²   Σ(Y − Ȳ)²

Computing the Reliability Coefficient

r = Σ(X − X̄)(Y − Ȳ) / √[ Σ(X − X̄)² · Σ(Y − Ȳ)² ]

r = split-half reliability coefficient
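The split-half procedure above (sum the first half of the items, sum the second half, correlate the two summated scores) can be sketched as follows. The respondent rows are hypothetical, already reverse-coded:

```python
import math

def corr(x, y):
    """Pearson r between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_r(score_matrix):
    """score_matrix: one row of (reverse-coded) item scores per respondent.
    Splits the items into first and second halves, sums each half, correlates."""
    half = len(score_matrix[0]) // 2
    first  = [sum(row[:half]) for row in score_matrix]   # summated score A
    second = [sum(row[half:]) for row in score_matrix]   # summated score B
    return corr(first, second)

# Hypothetical 10-item responses for five respondents:
data = [
    [4, 4, 5, 5, 4, 4, 5, 5, 4, 4],
    [3, 2, 3, 3, 2, 3, 2, 3, 3, 2],
    [5, 5, 4, 5, 5, 4, 5, 5, 5, 4],
    [2, 3, 2, 2, 3, 2, 2, 3, 2, 3],
    [4, 3, 4, 4, 3, 4, 3, 4, 4, 3],
]
print(f"split-half r = {split_half_r(data):.2f}")
```

Note that this sketch uses one particular split (items 1-5 vs. 6-10); the next slide's objection is precisely that a different split would give a different coefficient.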

ACCURACY OF MEASUREMENT: RELIABILITY
Problem with the Split-Half Reliability Coefficient?

• It is unstable--it is not reflected in a single coefficient. Solution?

Take into account all possible ways that a scale can be split into two halves and average all possible split-half reliability coefficients. That is, compute Cronbach's Alpha (Inter-Item Consistency Reliability):
– An index of homogeneity/consistency/congruence among the multiple items designed to measure the same concept.
– It shows how well the items measuring a particular construct hang together.
– If the items do not generate consistent results, suspect that they may be measuring different constructs.

Alpha (α) = n·r̄ / [1 + r̄(n − 1)]
Where, n = number of items in the scale, and
r̄ = the mean of correlations among all items.

ACCURACY OF MEASUREMENT: RELIABILITY

Correlation Coefficients Among Self-Esteem Items:

Items    1      2      3      4      5      6      7      8      9      10
1      1.00   .185   .451   .399   .413   .263   .394   .352   .361   .204
2             1.00   .048   .209   .248   .246   .230   .050   .277   .270
3                    1.00   .350   .399   .209   .381   .427   .276   .332
4                           1.00   .369   .415   .469   .280   .358   .221
5                                  1.00   .338   .446   .457   .317   .425
6                                         1.00   .474   .214   .502   .189
7                                                1.00   .315   .577   .311
8                                                       1.00   .299   .374
9                                                              1.00   .233
10                                                                    1.00

NOTE: Reverse items must be reverse-coded before computing the above correlations.

The 1.00s in the diagonal cells of the correlation matrix must NOT be included in the computation of the reliability coefficient α.

ACCURACY OF MEASUREMENT: RELIABILITY

Alpha (α) = n·r̄ / [1 + r̄(n − 1)]

r̄ = (.185 + .451 + .399 + . . . + .299 + .374 + .233) / 45 = .32

Note: The 1.00s in the diagonal cells of the correlation matrix should NOT be included in the above computation.

Alpha (α) = 10(.32) / [1 + .32(9)] = .82

NOTE: Cronbach's Alpha can be computed and reported ONLY for summated multi-item scales.

REMEMBER--it shows how well the multiple items measuring a particular construct hang together.

-- EXAMPLE: Let's see the SPSS OUTPUT for a 4-item measure of organizational loyalty.
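The hand computation above can be reproduced from the full correlation matrix. A sketch (the upper-triangle values are copied from the self-esteem matrix; everything else is mechanical):

```python
# Upper-triangle inter-item correlations for the 10 self-esteem items
# (diagonal 1.00s excluded, as the slide warns).
upper = [
    [.185, .451, .399, .413, .263, .394, .352, .361, .204],
    [.048, .209, .248, .246, .230, .050, .277, .270],
    [.350, .399, .209, .381, .427, .276, .332],
    [.369, .415, .469, .280, .358, .221],
    [.338, .446, .457, .317, .425],
    [.474, .214, .502, .189],
    [.315, .577, .311],
    [.299, .374],
    [.233],
]

rs = [r for row in upper for r in row]              # all 45 correlations
n_items = 10
r_bar = sum(rs) / len(rs)                           # mean inter-item correlation
alpha = n_items * r_bar / (1 + r_bar * (n_items - 1))  # Cronbach's alpha

print(f"{len(rs)} correlations, r-bar = {r_bar:.2f}, alpha = {alpha:.2f}")
```

Carrying the unrounded mean (r̄ ≈ .3235) through the formula gives α ≈ .827; the slide's .82 comes from rounding r̄ to .32 before applying the formula.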

Item-Total Statistics

Item                                                        Scale Mean   Scale Variance   Corrected     Squared       Cronbach's
                                                            if Item      if Item          Item-Total    Multiple      Alpha if
                                                            Deleted      Deleted          Correlation   Correlation   Item Deleted
1. If I were completely free to choose, I would
   continue working for my current employer                    9.81                          .789          .633          .849
2. I feel a strong sense of loyalty to the org I work for      9.71        12.803           .720          .539          .874
3. I often think of leaving the org. I work for               10.06        11.857           .721          .541          .876
4. I don't have a strong personal desire to continue
   working for my current employer                             9.76        11.447           .816          .668          .838

Inter-Item Correlation Matrix--a 4-Item Summated Scale for Measuring ORG. LOYALTY (n = 518)

                                                        1        2        3        4
1. If I were completely free to choose, I would
   continue working for my current employer           1.000    .690     .646     .743
2. I feel a strong sense of loyalty to the org
   I work for                                          .690   1.000     .573     .674
3. I often think of leaving the org. I work for        .646    .573    1.000     .711
4. I don't have a strong personal desire to
   continue working for my current employer            .743    .674     .711    1.000

Response Options: 7-Point Scales (1 = Strongly Disagree, 7 = Strongly Agree)

Reliability Statistics

Cronbach's Alpha   Cronbach's Alpha Based on Standardized Items   N of Items
      .891                            .892                             4

r̄ = .6728, n = 4
α = n·r̄ / [1 + r̄(n − 1)] = 4(.6728) / [1 + 3(.6728)] = .8916

Use the item-total statistics for item analysis, i.e., determining the quality of individual items.
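The standardized alpha in the SPSS output can be reproduced from the inter-item correlation matrix above. A sketch (the matrix values are the ones shown; the mean off-diagonal correlation is the r̄ = .6728 the slide quotes):

```python
# Inter-item correlations for the 4 organizational-loyalty items.
R = [
    [1.000, .690, .646, .743],
    [.690, 1.000, .573, .674],
    [.646, .573, 1.000, .711],
    [.743, .674, .711, 1.000],
]

n = len(R)
# Mean of the off-diagonal correlations (the diagonal 1.000s are excluded):
off_diag = [R[i][j] for i in range(n) for j in range(n) if i != j]
r_bar = sum(off_diag) / len(off_diag)

alpha_std = n * r_bar / (1 + r_bar * (n - 1))
print(f"r-bar = {r_bar:.4f}, standardized alpha = {alpha_std:.4f}")
```

This matches SPSS's "Cronbach's Alpha Based on Standardized Items" (.892); the plain "Cronbach's Alpha" (.891) differs slightly because it is computed from item covariances on the raw scores rather than from correlations.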

ACCURACY OF MEASUREMENT: CONTENT VALIDITY

Steps to ensure content validity and higher reliability when constructing a multi-item scale:

• Define the construct theoretically.
– Clearly determine what specifically the instrument/scale should be measuring.

• Search the literature to understand the concept's domain.
– Identify all the potential dimensions, issues, and elements to be included.

• Develop a list of items (questions/statements) for measuring the concept.

• Ask a few experts to judge/rate each item for its relevance to the concept's domain.
– Watch for/identify items causing discussion or requiring explanations.
– Also, seek suggestions for item additions/deletions.

• Modify based on feedback.

(continued)

ACCURACY OF MEASUREMENT: CONTENT VALIDITY

Steps to ensure content validity and higher reliability when constructing a multi-item scale (continued):

• Pretest the scale.
– Test it on a group similar to the population being studied.
– Encourage thinking aloud/indicating their thoughts as they consider each instruction/item, to identify problematic items.
– Encourage suggestions and criticisms--don't get defensive.
– Examine descriptive statistics for scale means too close to minimum/maximum values; they may signal range restriction and can be candidates for modification.

• Do reliability analysis to identify items that don't hang together with the rest of the items.
– For each item, compare:
(a) Cronbach's alpha of the scale if that particular item were to be deleted from the scale, with
(b) The multi-item scale's overall Cronbach's alpha when all items are included.
Whenever alpha increases as a result of deleting an item (i.e., a > b), that item is a candidate for deletion/revision.

• Revise/delete ambiguous/problematic items/instructions.
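The compare-(a)-to-(b) rule in the reliability-analysis step can be sketched with standardized alphas computed from an inter-item correlation matrix. The 4-item loyalty matrix from the SPSS example is reused here; note that SPSS's own "Alpha if Item Deleted" column uses raw-score covariances, so it will differ slightly from this correlation-based version.

```python
def std_alpha(R, items):
    """Standardized Cronbach's alpha over the given item indices,
    using alpha = n*r_bar / (1 + r_bar*(n-1))."""
    n = len(items)
    rs = [R[i][j] for i in items for j in items if i < j]
    r_bar = sum(rs) / len(rs)
    return n * r_bar / (1 + r_bar * (n - 1))

# Inter-item correlations for the 4 organizational-loyalty items:
R = [
    [1.000, .690, .646, .743],
    [.690, 1.000, .573, .674],
    [.646, .573, 1.000, .711],
    [.743, .674, .711, 1.000],
]

all_items = list(range(len(R)))
overall = std_alpha(R, all_items)              # (b) alpha with all items

for item in all_items:
    reduced = [i for i in all_items if i != item]
    alpha_if_deleted = std_alpha(R, reduced)   # (a) alpha without this item
    flag = "candidate for deletion" if alpha_if_deleted > overall else "keep"
    print(f"item {item + 1}: alpha if deleted = {alpha_if_deleted:.3f} ({flag})")

print(f"overall alpha = {overall:.3f}")
```

For this scale every deletion lowers alpha (each "alpha if deleted" is below the overall .892), so all four items are kept, which is consistent with the SPSS output shown earlier.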

QUESTIONS OR COMMENTS?