1 Interpreting Diagnostic Tests Ian McDowell Department of Epidemiology & Community Medicine January 2010 Note to users: you may find the additional notes

Interpreting Diagnostic Tests

Ian McDowell

Department of Epidemiology & Community Medicine

January 2010

Note to users: you may find the additional notes & explanations in the ppt notes panel helpful.

Objectives

• To understand sources of error in typical measurements

• To understand sensitivity, specificity

• To explain the implications of false positives and false negatives

• To understand predictive values and likelihood ratios

Road map to date

This session considers the interpretation of diagnostic tests, a daily issue in clinical practice.

It builds on some of the ideas introduced last term:

• Measurements: validity, bias, determinants of bias

• Applying conclusions from a study sample to an individual patient

• Evidence-based practice

• Contrasts between research on hospital patients and community practice

The Challenge of Clinical Measurement

• Diagnoses are based on information, from formal measurements and/or from your clinical judgment.

• This information is seldom perfectly accurate:
– Random errors can occur (machine not working?)
– Biases in judgment or measurement can occur (“this kid doesn’t look sick”)
– Due to biological variability, this patient may not fit the general rule
– Diagnosis (e.g., hypertension) involves a categorical judgment; this often requires dividing a continuous score (blood pressure) into categories. Choosing the cutting-point is challenging.

Therefore…

• You need to be aware:
– That diagnostic judgments are based on probabilities;
– That using a quantitative approach is better than just guessing!
– That you will gradually become familiar with the typical accuracy of measurements in your chosen clinical field;
– That the principles apply to both diagnostic and screening tests;
– Of some of the ways to describe the accuracy of a measurement.

Why choose one test and not another?

• Reliability: consistency or reproducibility; this considers chance or random errors (which sometimes increase, sometimes decrease, scores). “Is it measuring something?”

• Validity: “Is it measuring what it is supposed to measure?” By extension, “what diagnostic conclusion can I draw from a particular score on this test?” Validity may be affected by bias, which refers to systematic errors (these fall in a certain direction)

• Safety, acceptability, cost, etc.

Reliability and Validity

[Figure: 2 x 2 grid of dot patterns, with reliability (low, high) on one axis and validity (low, high) on the other. Annotations: “Biased result!” and “Average of these inaccurate results is not bad. This is probably how screening questionnaires (e.g., for depression) work.”]

Ways of Assessing Validity

• Content or “Face” validity: does it make clinical or biological sense? Does it include the relevant symptoms?

• Criterion: comparison to a “gold standard” definitive measure (e.g., biopsy, autopsy)
– Expressed as sensitivity and specificity

• Construct validity (this is used with abstract themes, such as “quality of life” for which there is no definitive standard)

Criterion validation: “Gold Standard”

The criterion that your clinical observation or simple test is judged against:

– more definitive (but expensive or invasive) tests, such as a complete work-up, or

– the clinical outcome (for screening tests, when workup of well patients is unethical).

Sensitivity and specificity are calculated from a research study comparing the test to a gold standard.

“2 x 2” table for validating a test

                        Gold standard
                 Disease present   Disease absent
Test positive    a (TP)            b (FP)
Test negative    c (FN)            d (TN)

TP = true positive; FP = false positive…

Validity:
Sensitivity = a/(a+c) = TP/Diseased
Specificity = d/(b+d) = TN/Healthy

Golden Rule: always calculate based on the gold standard
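The two column-based formulas can be sketched in a few lines of Python. This is only an illustration: the function name and the counts are hypothetical and do not come from any study.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the four cells of a 2 x 2 table."""
    diseased = tp + fn   # column total a + c (gold standard: disease present)
    healthy = fp + tn    # column total b + d (gold standard: disease absent)
    return tp / diseased, tn / healthy


# Hypothetical counts, chosen only for illustration
sens, spec = sensitivity_specificity(tp=80, fp=30, fn=20, tn=170)
print(f"Sensitivity = {sens:.0%}, Specificity = {spec:.0%}")
```

Both results are divided by a gold-standard column total, which is the “Golden Rule” in practice.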

A Bit More on Sensitivity

= Test’s ability to detect disease when it is present

a/(a+c) = TP/(TP+FN) = TP/disease

Mnemonics:
– A sensitive person is one who is aware of your feelings
– (1 – seNsitivity) = false Negative rate = how many cases are missed by the screening test?

…and More on Specificity

Precision of the test:
– A specific test would identify only that type of disease. “Nothing else looks like this”
– A highly specific test generates few false positives. So,
– If the result is positive, you can be confident the patient has this diagnosis.

• Mnemonic: (1 – sPecificity) = false Positive rate (how many are falsely classified as having the disease?)

Problems Resulting from Test Errors

• False positives can arise due to other factors (such as taking other medications, diet, etc.). They entail the cost and danger of further investigations, labeling, worry for the patient.
– This is similar to Type I or alpha error in a test of statistical significance (the possibility of falsely concluding that there is an effect of an intervention).

• False negatives imply missed cases, so potentially bad outcomes if untreated
– Cf. Type II or beta error: the chance of missing a true difference

Most Tests Provide a Continuous Score. Selecting a Cutting Point

[Figure: distributions of test scores for a healthy population and a sick population, with the score axis divided into “Healthy scores” and “Pathological scores” by a possible cut-point. Moving the cut-point one way increases sensitivity (includes more of the sick group); moving it the other way increases specificity (excludes healthy people).]

Crucial issue: changing cut-point can improve sensitivity or specificity, but never both
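The trade-off can be seen numerically in a short sketch. The scores below are invented for illustration, and the sketch assumes that higher scores are more pathological and that a score at or above the cut-point counts as a positive test.

```python
def sens_spec_at_cutoff(sick_scores, healthy_scores, cutoff):
    """Call a score positive when it is >= cutoff; return (sensitivity, specificity)."""
    tp = sum(score >= cutoff for score in sick_scores)
    tn = sum(score < cutoff for score in healthy_scores)
    return tp / len(sick_scores), tn / len(healthy_scores)


# Hypothetical test scores (higher = more pathological); the two groups overlap
healthy = [1, 2, 2, 3, 3, 4, 4, 5, 6, 7]
sick = [4, 5, 6, 6, 7, 7, 8, 8, 9, 10]

for cutoff in (4, 6, 8):
    sens, spec = sens_spec_at_cutoff(sick, healthy, cutoff)
    print(f"cut-point {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

With these made-up scores, raising the cut-point lowers sensitivity while raising specificity, and lowering it does the reverse.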

Clinical applications

• A specific test can be useful to rule in a disease. Why?
– Very specific tests give few false positives. So, if the result is positive, you can be sure the patient has the condition (‘nothing else would give this result’): “SpPin”

• A sensitive test can be useful for ruling a disease out:
– A negative result on a very sensitive test (which detects all true cases) reassures you that the patient does not have the disease: “SnNout”

        D +   D –
T +     a     b
T –     c     d

Your Patient’s Question: “Doctor, how likely am I to have this disease?”

This introduces Predictive Values

• Sensitivity & specificity don’t answer this, because they work from the gold standard.

• Now you need to work from the test result, but you won’t know whether this person is a true positive or a false positive (or a true or false negative). Hmmm…

How accurately does a positive (or negative) result predict disease (or health)?

Start from Prevalence

• Before you do any test, the best guide you have to a diagnosis is based on prevalence:

– Common conditions (in this population) are the more likely diagnosis

• Prevalence indicates the ‘pre-test probability of disease’

2 x 2 table: Prevalence

                 Disease present   Disease absent   Total
Test positive    a                 b                a+b
Test negative    c                 d                c+d
Total            a+c               b+d              N

Prevalence = (a+c) / N

Positive and Negative Predictive Values

• Based on rows, not columns

• Positive Predictive Value (PPV) = a/(a+b) = Probability that a positive score is a true positive

• NPV = d/(c+d); same for a negative test result

• BUT… there’s a big catch:

• We are now working across the columns, so PPV & NPV depend on how many cases of disease there are (prevalence).

• As prevalence goes down, PPV goes down (it’s harder to find the smaller number of cases) and NPV rises.

• So, PPV and NPV must be determined for each clinical setting,

• But they are immediately useful to the clinician: they reflect this population, so they tell us about this patient

        D +   D –
T +     a     b
T –     c     d

Prevalence and Predictive Values

A. Specialist referral hospital

        D +    D –
T +     50     10
T –     5      100

Sensitivity = 50/55 = 91%
Specificity = 100/110 = 91%
Prevalence = 55/165 = 33%
PPV = 50/60 = 83%
NPV = 100/105 = 95%

B. Primary care

        D +    D –
T +     50     100
T –     5      1000

Sensitivity = 50/55 = 91%
Specificity = 1000/1100 = 91%
Prevalence = 55/1155 = 5%
PPV = 50/150 = 33%
NPV = 1000/1005 = 99.5%
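As a check on the arithmetic, the row-based predictive values for the two settings can be recomputed with a small sketch. The function name is an illustrative choice; the counts are the ones in the two tables above.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV), which are calculated along the rows of the 2 x 2 table."""
    ppv = tp / (tp + fp)   # a / (a + b)
    npv = tn / (tn + fn)   # d / (c + d)
    return ppv, npv


# Counts from the two example settings
referral = predictive_values(tp=50, fp=10, fn=5, tn=100)
primary = predictive_values(tp=50, fp=100, fn=5, tn=1000)

for name, (ppv, npv) in (("Referral hospital", referral), ("Primary care", primary)):
    print(f"{name}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Sensitivity and specificity are the same in both settings; only the number of people without the disease differs, and that alone moves the PPV from roughly 83% to roughly 33%.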

Predictive Values

• High specificity = few FPs: Sp = TN/(TN+FP). FPs also drive PPV: PPV = TP/(TP+FP). So, with a highly specific test, the clinician is more certain that a patient with a positive test has the disease (it rules in the disease).

• The higher the sensitivity, the higher the NPV: Sn = TP/(TP+FN); NPV = TN/(TN+FN). The clinician can be more confident that a patient with a negative score does not have the diagnosis (because there are few false negatives). So, high NPV can rule out a disease.

From the literature you can get sensitivity and specificity.

To work out PPV and NPV for your practice, you need to guess prevalence, then work backwards.

Fill cells in the following order:

                          “Truth”
            Disease present   Disease absent   Total   Predictive values
Test pos    4th               7th              8th     10th
Test neg    5th               6th              9th     11th
Total       2nd               3rd              1st

2nd: from estimated prevalence; 4th: from sensitivity; 6th: from specificity
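Working backwards from published sensitivity and specificity plus a guessed prevalence can be sketched as follows. The function name, the notional total of 1000 patients, and the input values are all assumptions made for illustration; only the order of the steps follows the slide.

```python
def table_from_accuracy(sensitivity, specificity, prevalence, n=1000):
    """Rebuild a 2 x 2 table from test accuracy and an estimated prevalence."""
    diseased = prevalence * n          # column total a + c (from prevalence)
    healthy = n - diseased             # column total b + d
    tp = sensitivity * diseased        # a (from sensitivity)
    fn = diseased - tp                 # c
    tn = specificity * healthy         # d (from specificity)
    fp = healthy - tn                  # b
    ppv = tp / (tp + fp)               # row-based: a / (a + b)
    npv = tn / (tn + fn)               # row-based: d / (c + d)
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn, "PPV": ppv, "NPV": npv}


# Hypothetical inputs: 90% sensitivity and specificity, 10% guessed prevalence
result = table_from_accuracy(sensitivity=0.90, specificity=0.90, prevalence=0.10)
print(f"PPV = {result['PPV']:.1%}, NPV = {result['NPV']:.1%}")
```

The choice of n does not affect PPV or NPV; it only makes the cell counts easier to read.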

Gasp…! Isn’t there an easier way to do all this…?

Yes (good!)

But first, you need a couple more concepts (less good…)

• We said that before you apply a test, prevalence gives your best guess about the chances that this patient has the disease.

• This is known as “Pretest Probability of Disease”: (a+c) / N in the 2 x 2 table.

• It can also be expressed as odds of disease: (a+c) / (b+d). The odds are close to the probability as long as the disease is rare.

        D +   D –
T +     a     b
T –     c     d      (total N)
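The relationship between probability and odds can be illustrated with a minimal sketch. The helper names are hypothetical; the conversions themselves are the standard ones (odds = p / (1 – p), and back again).

```python
def probability_to_odds(p):
    """Odds = probability of disease / probability of no disease."""
    return p / (1 - p)


def odds_to_probability(odds):
    """Inverse conversion: probability = odds / (1 + odds)."""
    return odds / (1 + odds)


# Odds and probability are nearly equal for rare conditions, but diverge
# quickly as the condition becomes common.
for p in (0.01, 0.10, 0.50):
    print(f"probability {p:.2f} -> odds {probability_to_odds(p):.3f}")
```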

This Leads to … Likelihood Ratios

• Defined as the odds that a given level of a diagnostic test result would be expected in a patient with the disease, as opposed to a patient without: true positive rate / false positive rate [TP rate / FP rate]

• Advantages:
– Combines sensitivity and specificity into one number
– Can be calculated for many levels of the test
– Can be turned into predictive values

• LR for positive test = Sensitivity / (1 – Specificity)

• LR for negative test = (1 – Sensitivity) / Specificity
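The two likelihood ratio formulas translate directly into code. The sketch below uses a hypothetical function name and, as input, the sensitivity and specificity from the earlier referral-hospital example (50/55 and 100/110).

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with a single positive/negative cut-point."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity and specificity from the referral-hospital table
lr_pos, lr_neg = likelihood_ratios(50 / 55, 100 / 110)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

For that test, a positive result is about ten times as likely in a patient with the disease as in one without, and a negative result is about one tenth as likely.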

Practical application: a Nomogram

1) You need the LR for this test

2) Plot the likelihood ratio on the center axis (e.g., LR+ = 20)

3) Select the pretest probability (prevalence) on the left axis (e.g., prevalence = 30%)

4) Draw a line through these points to the right axis to indicate the post-test probability of disease

Example: post-test probability ≈ 90%
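The nomogram is a graphical shortcut for a calculation on the odds scale: convert the pretest probability to odds, multiply by the LR, and convert back. A minimal sketch follows, with a hypothetical function name and the example values from this slide.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Post-test odds = pretest odds x LR, converted back to a probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Example values from the slide: prevalence 30%, LR+ = 20
probability = post_test_probability(0.30, 20)
print(f"Post-test probability = {probability:.3f}")
```

The calculated value is just under 0.90, which is close to what can be read off the nomogram; a reading from a printed scale is only approximate.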

There is another way to combine sensitivity and specificity: meet Receiver Operating Characteristic (ROC) curves

Work out sensitivity and specificity for every possible cut-point, then plot these. The area under the curve indicates the information provided by the test.

[Figure: ROC curve. Horizontal axis: 1 – Specificity (= false positives), from 0 to 1. Vertical axis: Sensitivity, from 0 to 1.]

In an ideal test, the blue line would reach the top left corner. For a useless test it would lie along the diagonal: no better than guessing.
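The construction of a ROC curve can be sketched without any plotting library. The scores below are invented for illustration; the sketch assumes higher scores are more pathological and uses the trapezoidal rule for the area, which is one common choice.

```python
def roc_points(sick_scores, healthy_scores):
    """Return sorted (1 - specificity, sensitivity) pairs for every cut-point."""
    cutoffs = sorted(set(sick_scores) | set(healthy_scores))
    points = [(1.0, 1.0)]  # cut-point below every score: everyone tests positive
    for cutoff in cutoffs:
        # A score counts as positive when it is strictly above the cut-point
        sens = sum(s > cutoff for s in sick_scores) / len(sick_scores)
        fpr = sum(h > cutoff for h in healthy_scores) / len(healthy_scores)
        points.append((fpr, sens))
    return sorted(points)


def area_under_curve(points):
    """Trapezoidal area under the ROC curve (1.0 = ideal, 0.5 = guessing)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test scores; the two groups overlap
healthy = [1, 2, 2, 3, 3, 4, 4, 5, 6, 7]
sick = [4, 5, 6, 6, 7, 7, 8, 8, 9, 10]

auc = area_under_curve(roc_points(sick, healthy))
print(f"AUC = {auc:.3f}")
```

If the two groups had identical scores, the points would fall on the diagonal and the area would be 0.5.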

Chaining LRs Together (1)

• Example: 45 year-old woman presents with “chest pain”

– Based on her age, pretest probability that a vague chest pain indicates CAD is about 1%

• Take a fuller history. She reports a 1-month history of intermittent chest pain, suggesting angina (substernal pain; radiating down arm; induced by effort; relieved by rest…)

– LR of this history for angina is about 100

The previous example: 1. From the History

[Figure: nomogram. She’s young; pretest probability about 1%. LR 100. Probability rises to about 50% based on history.]

Chaining LRs Together (2)

45-year-old woman with 1-month history of intermittent chest pain…

After the history, post-test probability is now about 50%. What will you do?

A more precise (but also more costly) test:

• Record an ECG
– Results = 2.2 mm ST-segment depression. LR for ECG 2.2 mm result = 10.
– This raises post-test probability to > 90% for coronary artery disease (see next slide)

The previous example: ECG Results

[Figure: nomogram.] Now start from the pretest probability (i.e., 50%, prior to ECG, based on history).

Post-test probability now rises to about 91%
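The two-step example can be reproduced arithmetically by feeding the post-test probability from the history into the ECG step as the new pretest probability. The function name is hypothetical, and the sketch assumes the two findings can be treated as independent pieces of evidence, which is what chaining likelihood ratios requires.

```python
def update(probability, likelihood_ratio):
    """Apply one likelihood ratio to a probability via the odds scale."""
    odds = probability / (1 - probability)
    odds *= likelihood_ratio
    return odds / (1 + odds)


# Values from the example: pretest 1%, history LR about 100, ECG LR 10
after_history = update(0.01, 100)
after_ecg = update(after_history, 10)

print(f"After history: {after_history:.1%}")
print(f"After ECG:     {after_ecg:.1%}")
```

The first step lands at about 50% and the second at about 91%, matching the nomogram readings.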
