Reliability and Validity. Hatim Al-Jifree, MB;ChB(Hon), FRCSC, GOC, MMedEd

Reliability and validity

DESCRIPTION

Reliability and validity by Dr. Hatim Al-Jifri as part of the 5th Research Summer School - Jeddah at KAIMRC - WR

Page 1: Reliability and validity

Reliability and Validity

Hatim Al-Jifree, MB;ChB(Hon), FRCSC, GOC, MMedEd

Page 2: Reliability and validity

Lecture objectives

To review the definitions of reliability and validity

To review methods of evaluating reliability and validity in survey research

EBM perspective

Page 3: Reliability and validity

Reliability

Page 4: Reliability and validity

Definition

The degree of stability exhibited when a measurement is repeated under identical conditions

Lack of reliability may arise from divergences between observers or instruments of measurement or instability of the attribute being measured

(from Last. Dictionary of Epidemiology)

Page 5: Reliability and validity

Assessment of reliability

Reliability is assessed in 3 forms

1. Test-retest reliability

2. Alternate-form reliability

3. Internal consistency reliability

Page 6: Reliability and validity

Test-retest reliability

Most common form in surveys

Same respondents complete a survey at two different points in time

Usually quantified with a correlation coefficient (r value)

r values are considered good if r ≥ 0.70
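As a rough illustration of the r value mentioned above, the sketch below computes a Pearson correlation between two administrations of the same survey. The scores are made-up numbers, and the helper name `pearson_r` is an assumption for illustration; only the 0.70 rule of thumb comes from the slide.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical survey scores for six respondents at two time points (made-up data).
time_1 = [12, 15, 9, 20, 17, 11]
time_2 = [13, 14, 10, 19, 18, 10]

r = pearson_r(time_1, time_2)
print(f"test-retest r = {r:.2f}")
print("meets the 0.70 rule of thumb" if r >= 0.70 else "below the 0.70 rule of thumb")
```

The same helper could be applied to a single question or to the whole instrument's summed score, depending on what is being retested.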

Page 7: Reliability and validity

Test-retest reliability (2)

If data are recorded by an observer, you can have the same observer make two separate measurements

The comparison between the two measurements is intraobserver reliability

What does a difference mean?

Page 8: Reliability and validity

Test-retest reliability (3)

You can test-retest specific questions or the entire survey instrument

Variables likely to change over a short period of time, such as energy, happiness, and anxiety:

Test-retest these over very short periods of time

Page 9: Reliability and validity

Test-retest reliability (4)

Potential problem with test-retest is the practice effect

Individuals become familiar with the items

What effect does this have on your reliability estimates?

It inflates the reliability estimate

Page 10: Reliability and validity

Alternate-form reliability

Use differently worded forms to measure the same attribute

Questions or responses are reworded

Or their order is changed

To produce two items that are similar but not identical

Page 11: Reliability and validity

Alternate-form reliability (2)

Two items address:

The same aspect of behavior

Same vocabulary

Same level of difficulty

Items should differ in wording only

It is common to simply change the order of the response alternatives

This reduces practice effect

Page 12: Reliability and validity

Example: Assessment of depression

Circle one item

Version A:

During the past 4 weeks, I have felt downhearted:

Every day 1

Some days 2

Never 3

Version B:

During the past 4 weeks, I have felt downhearted:

Never 1

Some days 2

Every day 3
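Because the two versions above number the same answers in opposite order, the codes from one version have to be reversed before the forms can be compared. The sketch below is one way this might be done; the respondent answers are invented, and the function name `b_to_a` is an assumption for illustration, not something from the lecture.

```python
# Response codes as printed on the two versions of the item.
VERSION_A = {"Every day": 1, "Some days": 2, "Never": 3}
VERSION_B = {"Never": 1, "Some days": 2, "Every day": 3}

def b_to_a(code_b):
    """Convert a Version B code to the Version A coding (three codes, reversed order)."""
    if code_b not in (1, 2, 3):
        raise ValueError(f"unexpected response code: {code_b}")
    return 4 - code_b

# Hypothetical circled codes from five respondents who completed both versions.
answers_a = [1, 2, 2, 3, 1]
answers_b = [3, 2, 2, 1, 2]

# Put Version B on the Version A scale, then count identical answers.
recoded_b = [b_to_a(code) for code in answers_b]
matches = sum(a == b for a, b in zip(answers_a, recoded_b))
print(f"{matches} of {len(answers_a)} respondents gave the same answer on both forms")
```

With real data the recoded responses would normally be compared with a correlation coefficient rather than a simple count of matches.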

Page 13: Reliability and validity

Alternate-form reliability (3)

You could also change the wording of the response alternatives without changing the meaning

Page 14: Reliability and validity

Example: Assessment of urinary function

Version A:

During the past week, how often did you usually empty your bladder?

1 to 2 times per day

3 to 4 times per day

5 to 8 times per day

12 times per day

More than 12 times per day

Page 15: Reliability and validity

Example: Assessment of urinary function

Version B:

During the past week, how often did you usually empty your bladder?

Every 12 to 24 hours

Every 6 to 8 hours

Every 3 to 5 hours

Every 2 hours

More often than every 2 hours

Page 16: Reliability and validity

Alternate-form reliability (4)

You could also change the actual wording of the question

The two items must be equivalent

Items with different degrees of difficulty do not measure the same attribute

What might they measure?

Reading comprehension or cognitive function

Page 17: Reliability and validity

Example: Assessment of loneliness

Version A:

How often in the past month have you felt alone in the world?

Every day

Some days

Occasionally

Never

Version B:

During the past 4 weeks, how often have you felt a sense of loneliness?

All of the time

Sometimes

From time to time

Never

Page 18: Reliability and validity

Example of nonequivalent item rewording

Version A:

When your boss blames you for something you did not do, how often do you stick up for yourself?

All the time

Some of the time

None of the time

Version B:

When presented with difficult professional situations where a superior censures you for an act for which you are not responsible, how frequently do you respond in an assertive way?

All of the time

Some of the time

None of the time

Page 19: Reliability and validity

Alternate-form reliability (5)

You can measure alternate-form reliability at the same timepoint or separate timepoints

If large enough sample:

You can split it in half and administer one item to each half

Then compare the two halves

This is called a split-halves method

Can split into thirds and administer three forms of the item

Page 20: Reliability and validity

Internal consistency reliability

Applied to groups of items that are thought to measure different aspects of the same concept

Cronbach's coefficient alpha

Measures internal consistency reliability

It is a reflection of how well the different items complement each other

Interpret like a correlation coefficient (≥ 0.70 is good)

Page 21: Reliability and validity

Example: Assessment of physical function

Response scale: 1 = Limited a lot, 2 = Limited a little, 3 = Not limited

Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports 1 2 3

Moderate activities, such as moving a table, pushing a vacuum cleaner, bowling, or playing golf 1 2 3

Lifting or carrying groceries 1 2 3

Climbing several flights of stairs 1 2 3

Bending, kneeling, or stooping 1 2 3

Walking more than a mile 1 2 3

Walking several blocks 1 2 3

Walking one block 1 2 3

Bathing or dressing yourself 1 2 3

Page 22: Reliability and validity

Calculation of Cronbach's coefficient alpha

Example: Assessment of emotional health

During the past month:                                                 Yes   No

Have you been a very nervous person?                                   1     0

Have you felt downhearted and blue?                                    1     0

Have you felt so down in the dumps that nothing could cheer you up?    1     0

Page 23: Reliability and validity

Results

Patient               Item 1     Item 2     Item 3     Summed scale score

1                     0          1          1          2

2                     1          1          1          3

3                     0          0          0          0

4                     1          1          1          3

5                     1          1          0          2

Percentage positive   3/5 = .6   4/5 = .8   3/5 = .6

Page 24: Reliability and validity

Calculations

Mean score = 2

Sample variance = 1.5

\[
\text{CC alpha}
= \frac{k}{k-1}\left(1 - \frac{\sum_i (\%\text{pos}_i)(\%\text{neg}_i)}{\text{Var}}\right)
= \frac{3}{2}\left(1 - \frac{(.6)(.4) + (.8)(.2) + (.6)(.4)}{1.5}\right)
= 0.86
\]

Conclude that this scale has good reliability
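The calculation above can be checked with a short script. This is a minimal sketch that follows the slide's formula for yes/no items, using the five patients' responses from the Results table; the function name `coefficient_alpha_binary` is an assumption chosen for illustration.

```python
from statistics import variance

# Rows are patients 1 to 5; columns are items 1 to 3 (1 = yes, 0 = no),
# copied from the Results table.
responses = [
    [0, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]

def coefficient_alpha_binary(rows):
    """Coefficient alpha for yes/no items, following the slide's formula:
    k/(k-1) * (1 - sum(%pos_i * %neg_i) / Var)."""
    k = len(rows[0])                      # number of items
    n = len(rows)                         # number of respondents
    totals = [sum(row) for row in rows]   # summed scale score per respondent
    total_var = variance(totals)          # sample variance (n - 1 denominator)
    pos_neg_sum = 0.0
    for i in range(k):
        pos = sum(row[i] for row in rows) / n   # proportion answering yes
        pos_neg_sum += pos * (1 - pos)          # %pos * %neg for this item
    return (k / (k - 1)) * (1 - pos_neg_sum / total_var)

alpha = coefficient_alpha_binary(responses)
print(f"alpha = {alpha:.2f}")
```

For items scored on more than two levels, the per-item term would be the item's variance rather than the product of the two proportions; the version here is specific to yes/no items.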

Page 25: Reliability and validity

Internal consistency reliability (2)

If internal consistency is low:

You can add more items

Re-examine existing items for clarity

Page 26: Reliability and validity

Interobserver reliability

How well two evaluators agree in their assessment of a variable

Use correlation coefficient to compare data between observers

May be used as property of the test or as an outcome variable

Page 27: Reliability and validity

Validity

Page 28: Reliability and validity

Definition

How well a survey measures what it sets out to measure

Page 29: Reliability and validity

Assessment of validity

Validity is measured in four forms

Face validity

Content validity

Criterion validity

Construct validity

Page 30: Reliability and validity

Face validity

Cursory review of survey items by untrained judges

Ex. Showing the survey to untrained individuals to see whether they think the items look okay

Very casual, soft

Many don't really consider this as a measure of validity at all

Page 31: Reliability and validity

Content validity

Subjective measure of how appropriate the items seem to a set of reviewers who have some knowledge of the subject matter

Usually consists of an organized review of the survey's contents

Still very qualitative

Page 32: Reliability and validity

Criterion validity

Measure of how well one instrument stacks up against another instrument or predictor

Concurrent: assess your instrument against a "gold standard"

Predictive: assess the ability of your instrument to forecast future events, behavior, attitudes, or outcomes

Assess with correlation coefficient

Page 33: Reliability and validity

Construct validity

Most valuable and most difficult measure of validity

Basically, it is a measure of how meaningful the scale or instrument is when it is in practical use

Page 34: Reliability and validity

Construct validity (2)

Convergent: Implies that several different methods for obtaining the same information about a given trait or concept produce similar results

Evaluation is analogous to alternate-form reliability except that it is more theoretical and requires a great deal of work, usually by multiple investigators with different approaches

Page 35: Reliability and validity

Construct validity (3)

Divergent: The ability of a measure to estimate the underlying truth in a given area; it must be shown not to correlate too closely with similar but distinct concepts or traits
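One way to picture convergent and divergent evidence together is to correlate a new scale with an existing measure of the same trait and with a measure of a distinct trait. The sketch below uses invented scores for six respondents, and all variable names are hypothetical. It only illustrates the pattern to look for and is not a full construct validation, which the lecture notes takes far more work.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up scores for six respondents (illustration only).
new_scale = [10, 14, 8, 18, 12, 16]         # instrument being validated
same_trait_other_method = [11, 15, 9, 17, 12, 17]   # different method, same trait
distinct_trait = [5, 9, 7, 6, 8, 4]         # similar but distinct concept

convergent_r = pearson_r(new_scale, same_trait_other_method)
divergent_r = pearson_r(new_scale, distinct_trait)

# Convergent evidence: the two methods should agree closely.
print(f"convergent r = {convergent_r:.2f}")
# Divergent evidence: the new scale should not track the distinct trait closely.
print(f"divergent r = {divergent_r:.2f}")
```

How high or low these correlations need to be is a judgment call that depends on the traits involved; the slides give no cutoff for this comparison.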

Page 36: Reliability and validity

EBM Perspective

Page 37: Reliability and validity

Introduction

Three Steps in Using Medical Literature Articles:

Are the results of the study valid?

What are the results?

How can I apply these results to patient care?

Page 38: Reliability and validity

Introduction

Four types of papers:

Therapy

Diagnostic Intervention

Prognosis

Systematic review

Page 39: Reliability and validity

Therapy

Study design: RCT

Were Patients Randomized?

Was Randomization Concealed?

Were Patients Analyzed in the Groups to Which They Were Randomized?

Intention to treat analysis

Page 40: Reliability and validity

Therapy

Were Patients in The Treatment And Control Groups Similar With Respect to Known Prognostic Factors?

Were Patients Aware of Group Allocation?

Page 41: Reliability and validity

Therapy

Were Clinicians Aware of Group Allocation?

Were Outcome Assessors Aware of Group Allocation?

Was Follow-up Complete?

Was Follow-up Long Enough?

Page 42: Reliability and validity

Diagnostic Intervention

Study Design: Cross-sectional

Was there an independent, blind comparison with a reference standard?

• Spectrum of patients

• Did the results of the test being evaluated influence the decision to perform the reference standard?

• Did the description of the methods permit replication?

Page 43: Reliability and validity

Prognosis

• Study design: Cohort

• Was a
  – defined,
  – representative sample of patients
  – assembled at a common point in the course of their disease?

• Inception cohort; early

• Late stage prognosis

• Patients equal in all prognostic factors

• Stratified analysis?

• Follow-up complete and long enough

• Valid and reliable data collection

Page 44: Reliability and validity

Thank You