
Lecture 3: Validity of screening and diagnostic tests

• Reliability: kappa coefficient

• Criterion validity:
  – “Gold” or criterion/reference standard
  – Sensitivity, specificity, predictive value
  – Relationship to prevalence
  – Likelihood ratio
  – ROC curve
  – Diagnostic odds ratio


Clinical/public health applications

• Screening:
  – for asymptomatic disease (e.g., Pap test, mammography)
  – for risk (e.g., family history of breast cancer)

• Case-finding: testing of patients for diseases unrelated to their complaint

• Diagnostic: to help make a diagnosis in symptomatic disease or to follow up on a screening test


Evaluation of screening and diagnostic tests

• Performance characteristics:
  – test alone

• Effectiveness (on outcomes of disease):
  – test + intervention


Criteria for test selection

• Reliability

• Validity

• Feasibility

• Simplicity

• Cost

• Acceptability


Measures of inter- and intra-rater reliability: categorical data

• Percent agreement
  – limitation: value is affected by prevalence (higher when prevalence is very low or very high)

• Kappa statistic
  – takes chance agreement into account
  – defined as the agreement beyond chance, expressed as a fraction of the maximum possible agreement beyond chance
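The prevalence problem with percent agreement can be made concrete with a small sketch. It assumes two hypothetical raters who each call the same proportion p of subjects positive but decide independently of each other, so every agreement is due to chance alone; their expected agreement is then p² + (1 − p)². The function name `chance_agreement` is illustrative only.

```python
def chance_agreement(p_positive: float) -> float:
    """Expected proportion of agreement between two hypothetical raters who
    each call a proportion p_positive of subjects 'positive', independently
    of each other (so all agreement is chance agreement)."""
    both_positive = p_positive * p_positive
    both_negative = (1 - p_positive) * (1 - p_positive)
    return both_positive + both_negative


for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"proportion called positive = {p:.2f} -> chance agreement = {chance_agreement(p):.3f}")
```

Chance alone gives 50% agreement when half the subjects are called positive, but about 90.5% when only 5% (or 95%) are, which is the inflation that kappa is designed to discount.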


Kappa statistic

$$\text{Kappa} = \frac{p(\text{obs}) - p(\text{exp})}{1 - p(\text{exp})}$$

p(obs): proportion of observed agreement

p(exp): proportion of agreement expected by chance


Example of Computation of Kappa

Agreement between the First and the Second Readings to Identify Atherosclerosis Plaque in the Left Carotid Bifurcation by B-Mode Ultrasound Examination in the Atherosclerosis Risk in Communities (ARIC) Study

                         First reading
Second reading     Plaque    Normal    Total
Plaque                140        52      192
Normal                 69       725      794
Total                 209       777      986

Observed agreement = (140 + 725)/986 = 0.877

Chance agreement for plaque–plaque cell = (209 × 192)/986 = 40.7

Chance agreement for normal–normal cell = (777 × 794)/986 = 625.7

Total chance agreement = (40.7 + 625.7)/986 = 0.676

$$\text{Kappa} = \frac{0.877 - 0.676}{1 - 0.676} = 0.62$$
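The hand calculation above can be reproduced with a short sketch that works directly from the agreement table. The function `cohen_kappa` is a hypothetical helper written for this illustration; it generalizes the 2 × 2 steps (diagonal for observed agreement, row total × column total for chance agreement) to any square table. For per-subject label vectors rather than a summary table, scikit-learn provides `sklearn.metrics.cohen_kappa_score`.

```python
from typing import Sequence


def cohen_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of subjects placed in category i by one reading
    and in category j by the other reading.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of subjects on the diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: for each category, (row total x column total) / n,
    # summed over categories and divided by n again to give a proportion.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_exp = sum(row_totals[c] * col_totals[c] for c in range(k)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# ARIC example: rows = second reading, columns = first reading,
# category order = (plaque, normal).
aric = [[140, 52],
        [69, 725]]
print(round(cohen_kappa(aric), 2))  # 0.62
```

Working from unrounded intermediate values gives kappa ≈ 0.621, which agrees with the slide's 0.62.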


Interpretation of kappa

• Various interpretations have been suggested

• Example (Landis & Koch; Fleiss):
  – excellent: over 0.75
  – fair to good: 0.40 to 0.75
  – poor: less than 0.40
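As a minimal sketch, the three bands above can be applied mechanically. The function name `interpret_kappa` is hypothetical. It also assumes that a value of exactly 0.75 counts as "fair to good", since the slide reserves "excellent" for values over 0.75.

```python
def interpret_kappa(kappa: float) -> str:
    """Label a kappa value using the three bands listed on the slide."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"


print(interpret_kappa(0.62))  # fair to good
```

On this scale, the ARIC plaque readings (kappa = 0.62) show fair to good agreement.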


Validity (accuracy) of screening/diagnostic tests

• Face validity, content validity: judgement of whether the content of the measurement is appropriate

• Criterion validity:
  – concurrent
  – predictive


Normal vs abnormal

• Statistical definition:
  – “Gaussian” or “normal” distribution

• Clinical definition:
  – using a criterion


Selection of criterion (“gold” or criterion standard)

• Concurrent:
  – salivary screening test for HIV
  – history of cough for more than 2 weeks (for TB)

• Predictive:
  – APACHE (Acute Physiology and Chronic Health Evaluation) instrument for ICU patients
  – blood lipid level
  – maternal height


Sensitivity and specificity

Assess correct classification of:

• People with the disease (sensitivity)

• People without the disease (specificity)

                             "True" disease status
Screening test results      Present                  Absent
Positive                    "True positives" (A)     "False positives" (B)
Negative                    "False negatives" (C)    "True negatives" (D)

Sensitivity of screening test = $\frac{A}{A + C}$

Specificity of screening test = $\frac{D}{B + D}$

Predictive value of positive test = $\frac{A}{A + B}$

Predictive value of negative test = $\frac{D}{C + D}$
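The four formulas can be checked on a concrete table with a short sketch. The class `TwoByTwo` is hypothetical, and so are the counts. They describe 40 people and were chosen to reproduce the figures in this lecture's likelihood-ratio example: 25% prevalence, 80% sensitivity and 90% specificity. Lowercase a, b, c, d correspond to cells A, B, C, D.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Screening test results against the criterion ("gold") standard.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)  # diseased correctly called positive

    @property
    def specificity(self) -> float:
        return self.d / (self.b + self.d)  # non-diseased correctly called negative

    @property
    def ppv(self) -> float:
        return self.a / (self.a + self.b)  # positives who truly have the disease

    @property
    def npv(self) -> float:
        return self.d / (self.c + self.d)  # negatives who are truly disease-free


# Hypothetical counts: 10 diseased and 30 non-diseased people.
t = TwoByTwo(a=8, b=3, c=2, d=27)
print(f"Se={t.sensitivity:.2f} Sp={t.specificity:.2f} PPV={t.ppv:.3f} NPV={t.npv:.3f}")
```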


Predictive value

• More relevant to clinicians and patients

• Affected by prevalence
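The dependence on prevalence can be seen by holding the test fixed and varying prevalence. This sketch assumes that sensitivity and specificity do not change with prevalence, and borrows the values from this lecture's likelihood-ratio example (80% and 90%). The function name `predictive_values` is hypothetical.

```python
def predictive_values(prevalence: float, sensitivity: float,
                      specificity: float) -> tuple[float, float]:
    """Return (PPV, NPV) when a test with the given sensitivity and
    specificity is applied to a population with the given prevalence."""
    tp = sensitivity * prevalence              # true positives, as a proportion of everyone tested
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


for prev in (0.01, 0.10, 0.25, 0.50):
    ppv, npv = predictive_values(prev, sensitivity=0.80, specificity=0.90)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At 25% prevalence the PPV is 8/11 (about 73%). At 1% prevalence the same test gives a PPV below 8%, because false positives from the large disease-free group outnumber the true positives.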


Choice of cut-point

If a higher score increases the probability of disease:

• Lower cut-point:
  – increases sensitivity, reduces specificity

• Higher cut-point:
  – reduces sensitivity, increases specificity
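The trade-off can be illustrated with a small sketch. Both the scores and the names `diseased`, `non_diseased` and `se_sp_at` are hypothetical, made up for this illustration. A result is called positive when the score is at or above the cut-point.

```python
# Hypothetical test scores (higher = more likely diseased); illustrative only.
diseased = [4.1, 5.0, 5.6, 6.2, 6.8, 7.5, 8.0, 8.9]
non_diseased = [2.0, 2.7, 3.1, 3.9, 4.4, 5.2, 5.9, 6.5]


def se_sp_at(cut_point: float) -> tuple[float, float]:
    """Call a result positive when score >= cut_point; return (sensitivity, specificity)."""
    se = sum(s >= cut_point for s in diseased) / len(diseased)
    sp = sum(s < cut_point for s in non_diseased) / len(non_diseased)
    return se, sp


for cut in (3.0, 5.0, 7.0):
    se, sp = se_sp_at(cut)
    print(f"cut-point {cut}: sensitivity={se:.3f}, specificity={sp:.3f}")
```

As the cut-point rises from 3.0 to 7.0, sensitivity falls (1.000, 0.875, 0.375) and specificity rises (0.250, 0.625, 1.000).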


Considerations in selection of cut-point

Implications of false positive results

• burden on follow-up services

• labelling effect

Implications of false negative results

• Failure to intervene


Receiver operating characteristic (ROC) curve

• Evaluates test over range of cut-points

• Plot of sensitivity against 1-specificity

• Area under curve (AUC) summarizes performance:
  – AUC of 0.5 = no better than chance
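The following sketch traces an ROC curve across every possible cut-point and computes the AUC with the trapezoidal rule. The scores and function names are hypothetical. As a cross-check, it also computes the AUC as the probability that a randomly chosen diseased person scores higher than a randomly chosen non-diseased person (ties count one half), which is a standard equivalent interpretation.

```python
# Hypothetical scores (higher = more likely diseased); illustrative only.
diseased = [4.1, 5.0, 5.6, 6.2, 6.8, 7.5, 8.0, 8.9]
non_diseased = [2.0, 2.7, 3.1, 3.9, 4.4, 5.2, 5.9, 6.5]


def roc_points(pos: list[float], neg: list[float]) -> list[tuple[float, float]]:
    """(1 - specificity, sensitivity) at each cut-point; positive if score >= cut."""
    # Start above the highest score (everyone negative), then lower the
    # cut-point through each observed score until everyone is positive.
    cuts = [float("inf")] + sorted(set(pos + neg), reverse=True)
    points = []
    for cut in cuts:
        sensitivity = sum(s >= cut for s in pos) / len(pos)
        false_pos_rate = sum(s >= cut for s in neg) / len(neg)  # = 1 - specificity
        points.append((false_pos_rate, sensitivity))
    return points


def auc_trapezoid(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def auc_pairwise(pos: list[float], neg: list[float]) -> float:
    """Probability that a diseased score exceeds a non-diseased score (ties = 1/2)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


pts = roc_points(diseased, non_diseased)
print(f"AUC (trapezoid) = {auc_trapezoid(pts):.3f}")
print(f"AUC (pairwise)  = {auc_pairwise(diseased, non_diseased):.3f}")
```

Both calculations give 54/64 ≈ 0.844 for these made-up scores, well above the 0.5 of a test that is no better than chance.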


Likelihood ratio

• Likelihood ratio for a positive test (LR) = $\frac{\text{sensitivity}}{1 - \text{specificity}}$

• Used to compute post-test odds of disease from pre-test odds:

post-test odds = pre-test odds × LR

• pre-test odds derived from prevalence

• post-test odds can be converted to predictive value of positive test
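Written out, the conversions these bullets rely on are shown below. The block assumes, as in this lecture's example, that the pre-test probability is the prevalence $p$ and that LR is the likelihood ratio for a positive test; Se and Sp abbreviate sensitivity and specificity. Substituting the first two lines into the third gives the expected proportion of true positives over all positives, which is the table formula $A/(A + B)$ for the predictive value of a positive test.

$$
\begin{aligned}
\text{pre-test odds} &= \frac{p}{1 - p} \\
\text{post-test odds} &= \text{pre-test odds} \times LR = \frac{p}{1 - p} \times \frac{\text{Se}}{1 - \text{Sp}} \\
\text{PPV} &= \frac{\text{post-test odds}}{1 + \text{post-test odds}} = \frac{\text{Se}\, p}{\text{Se}\, p + (1 - \text{Sp})(1 - p)}
\end{aligned}
$$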


Example of LR

• prevalence of disease in a population is 25%

• sensitivity is 80%

• specificity is 90%

• pre-test odds = $\frac{0.25}{1 - 0.25} = \frac{1}{3}$

• likelihood ratio = $\frac{0.80}{1 - 0.90} = 8$


Example of LR (cont)

• If prevalence of disease in a population is 25%

• pre-test odds = $\frac{0.25}{1 - 0.25} = \frac{1}{3}$

• post-test odds = 1/3 × 8 = 8/3

• predictive value of positive result = $\frac{8/3}{1 + 8/3} = \frac{8}{3 + 8} = \frac{8}{11} = 73\%$
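The worked example can be reproduced step by step with a short sketch. It uses Python's `fractions.Fraction` so the intermediate values come out as the exact fractions on the slides (1/3, 8, 8/3, 8/11) rather than rounded decimals. The function name `lr_worked_example` is hypothetical.

```python
from fractions import Fraction


def lr_worked_example(prevalence: Fraction, sensitivity: Fraction,
                      specificity: Fraction) -> tuple[Fraction, Fraction, Fraction, Fraction]:
    """Return (pre-test odds, LR, post-test odds, PPV) for a positive result."""
    pre_test_odds = prevalence / (1 - prevalence)
    lr_positive = sensitivity / (1 - specificity)
    post_test_odds = pre_test_odds * lr_positive
    ppv = post_test_odds / (1 + post_test_odds)  # convert odds back to a probability
    return pre_test_odds, lr_positive, post_test_odds, ppv


pre, lr, post, ppv = lr_worked_example(Fraction(25, 100), Fraction(80, 100), Fraction(90, 100))
print(pre, lr, post, ppv)       # 1/3 8 8/3 8/11
print(f"PPV = {float(ppv):.0%}")  # PPV = 73%
```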


Diagnostic odds ratio

• Ratio of the odds of a positive test in the diseased to the odds of a positive test in the non-diseased:

$$\text{DOR} = \frac{a/c}{b/d} = \frac{a \times d}{b \times c}$$

• From the previous example (40 people: a = 8, b = 3, c = 2, d = 27):

$$\text{DOR} = \frac{8 \times 27}{3 \times 2} = 36$$
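The following sketch computes the diagnostic odds ratio in two ways. The first uses the 2 × 2 counts above. The second uses sensitivity and specificity directly, following the definition: the odds of a positive test in the diseased, Se/(1 − Se), divided by the odds of a positive test in the non-diseased, (1 − Sp)/Sp. Both function names are hypothetical. `Fraction` keeps the results exact, and a zero b or c cell would raise `ZeroDivisionError`.

```python
from fractions import Fraction


def dor_from_counts(a: int, b: int, c: int, d: int) -> Fraction:
    """Diagnostic odds ratio from 2x2 counts: (a*d)/(b*c)."""
    return Fraction(a * d, b * c)


def dor_from_se_sp(sensitivity: Fraction, specificity: Fraction) -> Fraction:
    """Odds of a positive test in the diseased divided by the odds of a
    positive test in the non-diseased."""
    odds_pos_diseased = sensitivity / (1 - sensitivity)
    odds_pos_non_diseased = (1 - specificity) / specificity
    return odds_pos_diseased / odds_pos_non_diseased


print(dor_from_counts(a=8, b=3, c=2, d=27))               # 36
print(dor_from_se_sp(Fraction(8, 10), Fraction(9, 10)))  # 36
```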


Summary: LR and DOR

• Values:
  – 1 indicates that the test performs no better than chance
  – >1 indicates better than chance
  – <1 indicates worse than chance

• Relationship to prevalence?


Applications of LR and DOR

• Likelihood ratio: primarily in the clinical context, when the interest is in how much a particular test increases the likelihood of disease

• Diagnostic odds ratio: primarily in research, when the interest is in factors associated with test performance (e.g., using logistic regression)
