Evaluation of Diagnostic Tests Presenter: Akash Ranjan Moderator: Dr Chetna Maliye


Framework

Introduction
Determining useful Diagnostic Test
Evaluation of Diagnostic Test
Gold Standard
Measure of Diagnostic Accuracy
ROC Curve
Multiple Testing
Reliability of Test
Relationship between Reliability and Validity
References

Correctly classifying individuals by Disease Status

Tests are used in medical diagnosis, screening and research to classify subjects into diseased or non-diseased groups

Ideally, all subjects who have the disease should be classified as “having the disease” and all those without it as “not having the disease”

Diagnostic Test and Screening Test

A diagnostic test is used to determine the presence or absence of a disease when a subject shows signs or symptoms of a disease

A screening test identifies asymptomatic individuals who may have the disease

The diagnostic test is performed after a positive screening test to establish a definitive diagnosis

Useful Diagnostic Test

Reproducibility
Accuracy
Feasibility
Effects on clinical decisions
Outcomes

Evaluation of Diagnostic Test

Ability to classify individuals into the correct disease status in a reliable manner

Helps in making decisions about the use and interpretation of tests

Done by determining validity and reliability:
Validity
  Internal validity
  External validity
Reliability

Simplify Data

Many test results are ordinal or continuous variables
Complex data are reduced to a simple dichotomy:
Present / Absent
Abnormal / Normal
Disease / Well

Distribution of Systolic Blood Pressures: Males, Ages 40–64

Gold Standard

The accuracy of a test is established by independent comparison with a “gold standard”

Ideally, the gold standard is a 100% accurate test
In practice, its sensitivity and specificity approach 100%

Examples:
Histopathology
Cytopathology
Radiologic contrast procedures
Prolonged follow-up
Autopsy

[Figure: the gold standard separates all people with disease from all people without disease]

Measure of Diagnostic Accuracy

Comparison of disease status: gold standard test and index test

                       Disease present (+)     Disease absent (-)
Index test positive    a (true positives)      b (false positives)
Index test negative    c (false negatives)     d (true negatives)

Sensitivity

Proportion of people with the disease who have a positive test result for the disease

A sensitive test will rarely miss people with the disease

Used when there is an important penalty for missing the disease, e.g. cancer of the cervix, breast cancer, HIV

Sensitivity = a / (a + c)

Specificity

The proportion of people without the disease who have a negative test result

Useful to confirm (“rule in”) a diagnosis

Used for screening a prevalent disease like diabetes mellitus (DM)

Used when false positive results can harm the patient physically and financially, e.g. cancer chemotherapy

Specificity = d / (b + d)
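The two formulas can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the counts are the ones from the deep vein thrombosis 2x2 table shown later in this deck (a = 34, b = 168, c = 1, d = 282).

```python
def sensitivity(a: int, c: int) -> float:
    """Proportion of diseased people who test positive: a / (a + c)."""
    return a / (a + c)


def specificity(b: int, d: int) -> float:
    """Proportion of non-diseased people who test negative: d / (b + d)."""
    return d / (b + d)


# Cells of the 2x2 table: a = true positives, b = false positives,
# c = false negatives, d = true negatives (DVT example from this deck).
a, b, c, d = 34, 168, 1, 282

print(f"Sensitivity = {sensitivity(a, c):.0%}")
print(f"Specificity = {specificity(b, d):.0%}")
```

With these counts the results round to the 97% and 63% quoted with the DVT table.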

Factors establishing Sensitivity and Specificity

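Spectrum of patients: a test may not distinguish diseased from non-diseased patients when the differences between them are subtle

Bias: the Sn and Sp of a test should be assessed independently; the test result should not be part of the information used in making the diagnosis, e.g. X-ray

Chance: estimates of Sn and Sp from a small sample are imprecise, so they should be reported with a confidence interval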
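The role of chance can be made concrete with a confidence interval around an observed sensitivity. The sketch below uses the Wilson score interval, which is one common choice and is not named in the slides; the counts (9 of 10 and 90 of 100 diseased patients testing positive) are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical studies with the same observed sensitivity (90%)
# but different numbers of diseased patients.
small = wilson_interval(9, 10)
large = wilson_interval(90, 100)

print(f"n = 10:  {small[0]:.2f} to {small[1]:.2f}")
print(f"n = 100: {large[0]:.2f} to {large[1]:.2f}")
```

Both studies observe the same sensitivity, but the interval from the smaller study is much wider, which is why sample size matters when judging a reported Sn or Sp.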

Trade-off between Sensitivity and Specificity

Sensitivity can be increased only at the expense of Specificity

Trade-off between Sensitivity and Specificity when diagnosing Diabetes

[Table columns: Blood sugar after an 8-hour fast; Sensitivity (%); Specificity (%)]

ROC Curve


Drawn by plotting sensitivity against the false positive rate (1 - Sp) over a range of cut-off values

Tests that discriminate well crowd towards the upper left corner of the plot

Tests that perform less well have curves that fall closer to the diagonal running from lower left to upper right

Shows how severe the trade-off between Sn and Sp is

Helps decide where the best cut-off point should be: generally near the shoulder of the ROC curve, unless there are clinical reasons for minimizing either false negatives or false positives

ROC Curve

Useful in comparing alternative tests for the same diagnosis
Area under the ROC curve: the larger the area, the better the test
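How an ROC curve and its area are obtained can be sketched as follows. The blood sugar values, the cut-offs and the function names are all hypothetical, and the area is estimated with the simple trapezoidal rule; this is an illustration, not the method used for any figure in the slides.

```python
def roc_points(diseased, healthy, cutoffs):
    """Return (false positive rate, sensitivity) pairs, one per cut-off.

    A value at or above the cut-off counts as a positive test.
    """
    points = []
    for cut in cutoffs:
        sn = sum(v >= cut for v in diseased) / len(diseased)
        fpr = sum(v >= cut for v in healthy) / len(healthy)
        points.append((fpr, sn))
    return sorted(points)


def area_under_curve(points):
    """Trapezoidal area under the ROC curve, with (0, 0) and (1, 1) added."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


# Hypothetical fasting blood sugar values (mg/dL).
diseased = [110, 125, 130, 140, 150, 160]
healthy = [80, 90, 95, 100, 115, 120]

points = roc_points(diseased, healthy, range(80, 170, 10))
print(f"AUC = {area_under_curve(points):.2f}")
```

An area close to 1 means the curve hugs the upper left corner; a test no better than chance lies on the diagonal and has an area of 0.5.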

Predictive Accuracy (“Clinician’s dilemma”)

Positive predictive value (PPV): probability of disease in a patient with a positive test result

Reflects the diagnostic power of a test
Depends on Sn and Sp
Increases with disease prevalence in the population

PPV = a / (a + b)

Predictive Accuracy

PPV = (Sn x Pr) / [(Sn x Pr) + (1 - Sp) x (1 - Pr)]

where Sn = sensitivity, Sp = specificity and Pr = prevalence


Predictive Accuracy

Negative predictive value (NPV): probability that a patient with a negative test result does not have the disease

Reflects the diagnostic power of a test
Depends on Sn and Sp
Decreases as disease prevalence in the population rises

NPV = d / (c + d)
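The dependence of predictive values on prevalence can be explored with a short sketch. The PPV function follows the formula given in this deck; the NPV function is the analogous Bayes expression, written out here as an assumption since the slides give NPV only as d / (c + d). Sn = 0.97 and Sp = 0.63 are the rounded DVT figures from this deck; the prevalence values are hypothetical.

```python
def ppv(sn: float, sp: float, pr: float) -> float:
    """Positive predictive value from sensitivity, specificity, prevalence."""
    return (sn * pr) / (sn * pr + (1 - sp) * (1 - pr))


def npv(sn: float, sp: float, pr: float) -> float:
    """Negative predictive value from sensitivity, specificity, prevalence."""
    return (sp * (1 - pr)) / (sp * (1 - pr) + (1 - sn) * pr)


sn, sp = 0.97, 0.63  # rounded values from the DVT example

# Same test, three hypothetical prevalence settings.
for pr in (0.01, 0.07, 0.30):
    print(f"prevalence {pr:.0%}: PPV = {ppv(sn, sp, pr):.1%}, "
          f"NPV = {npv(sn, sp, pr):.1%}")
```

Running it shows PPV rising and NPV falling as prevalence increases, even though Sn and Sp stay fixed.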

Likelihood Ratios

Positive likelihood ratio (LR+): proportion of diseased people with a positive test result (Sn) divided by the proportion of non-diseased people with a positive test result (1 - Sp)

Negative likelihood ratio (LR-): proportion of diseased people with a negative test result (1 - Sn) divided by the proportion of non-diseased people with a negative test result (Sp)

LR+ = Sn / (1 - Sp)

LR- = (1 - Sn) / Sp
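As a small sketch, both ratios can be computed directly from Sn and Sp. The function name is made up for this illustration, and the inputs are derived from the DVT counts that appear in this deck (34 of 35 diseased test positive, 282 of 450 non-diseased test negative).

```python
def likelihood_ratios(sn: float, sp: float):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sn / (1 - sp)
    lr_neg = (1 - sn) / sp
    return lr_pos, lr_neg


sn = 34 / 35     # sensitivity from the DVT table
sp = 282 / 450   # specificity from the DVT table

lr_pos, lr_neg = likelihood_ratios(sn, sp)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

The results round to the LR+ of 2.6 and LR- of 0.05 quoted for the DVT example.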

Likelihood Ratios

Example: A positive test is about 2.6 times more likely to be found in the presence of DVT (deep vein thrombosis) than in its absence.

Advantages of LRs
Do not change with changes in prevalence
Can be used at multiple levels of test results
Describe the overall odds of disease when a series of diagnostic tests is used

Likelihood Ratios

Techniques of using LRs
Mathematical approach
Using a likelihood ratio nomogram

                Disease +    Disease -
Test +             34           168
Test -              1           282

Sn = 97%, Sp = 63%, Prevalence = 7%, PPV = 17%, NPV = 100%, LR+ = 2.6, LR- = 0.05

Step 1: Convert pretest probability to pretest odds: Odds = 0.07 / (1 - 0.07) = 0.075
Step 2: Post-test odds = Pretest odds x LR+ = 0.075 x 2.6 = 0.195
Step 3: Convert post-test odds to post-test probability: P = 0.195 / (1 + 0.195) = 16%
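The three steps of the mathematical approach translate directly into code. This is a minimal sketch with illustrative function names; it takes the 7% prevalence as the pretest probability and uses the LR values quoted for the DVT example.

```python
def probability_to_odds(p: float) -> float:
    """Step 1: odds = p / (1 - p)."""
    return p / (1 - p)


def odds_to_probability(odds: float) -> float:
    """Step 3: p = odds / (1 + odds)."""
    return odds / (1 + odds)


def post_test_probability(pretest_p: float, lr: float) -> float:
    """Steps 1 to 3: pretest probability -> odds -> times LR -> probability."""
    post_odds = probability_to_odds(pretest_p) * lr  # Step 2
    return odds_to_probability(post_odds)


pretest = 0.07  # prevalence in the DVT example, used as pretest probability

print(f"After a positive test (LR+ = 2.6):  {post_test_probability(pretest, 2.6):.0%}")
print(f"After a negative test (LR- = 0.05): {post_test_probability(pretest, 0.05):.1%}")
```

The positive-test result agrees with the 16% worked out by hand; the same function handles a negative result by passing LR- instead of LR+.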

Likelihood Ratios

Using a likelihood ratio nomogram

Multiple Tests

A single test frequently results in a probability of disease that is neither very high nor very low

The physician needs to raise or lower the probability of disease in such situations
Multiple tests help the clinician in this regard
Applied in two basic ways:
Parallel testing (all at once)
Serial testing (consecutive)

Multiple Tests

Parallel testing (all at once)
A positive result on any test is considered evidence of disease
Used when rapid assessment is needed, e.g. hospitalized or emergency patients
Useful when a very sensitive strategy is needed
Net effect is a more sensitive diagnostic strategy, at the cost of lower specificity

Serial testing (consecutive)
The decision to order the next test in the series is based on the result of the previous test
All tests must give a positive result for the diagnosis to be made
Maximizes Sp and PPV, but lowers Sn and NPV
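The net effect of combining two tests can be sketched numerically. The formulas below assume the two tests are conditionally independent given disease status, which is a simplifying assumption not stated in the slides and often optimistic in practice; the Sn and Sp values for tests A and B are hypothetical.

```python
def parallel(sn_a: float, sp_a: float, sn_b: float, sp_b: float):
    """Positive if EITHER test is positive (assumes independent tests)."""
    net_sn = 1 - (1 - sn_a) * (1 - sn_b)  # missed only if both tests miss
    net_sp = sp_a * sp_b                  # negative only if both are negative
    return net_sn, net_sp


def serial(sn_a: float, sp_a: float, sn_b: float, sp_b: float):
    """Positive only if BOTH tests are positive (assumes independent tests)."""
    net_sn = sn_a * sn_b                  # detected only if both detect
    net_sp = 1 - (1 - sp_a) * (1 - sp_b)  # false positive needs both to err
    return net_sn, net_sp


# Hypothetical tests: A has Sn 80%, Sp 90%; B has Sn 90%, Sp 80%.
for name, combine in (("Parallel", parallel), ("Serial", serial)):
    net_sn, net_sp = combine(0.80, 0.90, 0.90, 0.80)
    print(f"{name}: net Sn = {net_sn:.0%}, net Sp = {net_sp:.0%}")
```

Under these assumptions the parallel strategy is more sensitive than either test alone and the serial strategy is more specific than either test alone.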


Reliability of a test

Reliability / repeatability: the test is able to give the same result again and again

Regardless of the Sn and Sp of a test, if the test result cannot be reproduced, the value and usefulness of the test are minimal

Factors that contribute to the variation between test results:
Intra-subject variation (within individual subjects)
Intra-observer variation
Inter-observer variation (variation between those reading the test result)

Reliability of a test

Intra subject variation

Table: Examples showing variation in blood pressure readings during a 24-hour period. Blood pressure (mmHg)

                Female, aged 27 yr    Female, aged 62 yr    Male, aged 33 yr
Basal           110/70                132/82                152/109
Lowest hour     86/47                 102/61                123/78
Highest hour    126/79                172/94                153/107
Casual          108/64                155/93                157/109

Therefore, in evaluating any test result, it is important to consider the conditions under which the test was performed, including the time of day

Reliability of test

Intra-observer variation
Variation occurs between two observations made by the same observer
E.g. a radiologist who reads the same group of X-rays at two different times may read one or more of them differently the second time.
Tests and examinations differ in the degree to which subjective factors enter into the observer's conclusions; the greater the subjective element in the reading, the greater the intra-observer variation is likely to be.

Reliability of test

Inter-observer variation
Variation between observers
Measures the extent to which observers agree or disagree, in quantitative terms.
Kappa statistic (kappa measure of agreement)
The difference between observed agreement (Io) and agreement expected by chance (Ie), expressed as a fraction of the maximum possible difference.
Since the maximum value of Io is 1, this gives K = (Io - Ie) / (1 - Ie)
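The kappa calculation can be sketched for two observers who each classify the same films as positive or negative. The function name and the counts are hypothetical; expected agreement is computed from each observer's marginal proportions, which is the usual definition for Cohen's kappa.

```python
def cohen_kappa(both_pos: int, a_pos_b_neg: int,
                a_neg_b_pos: int, both_neg: int) -> float:
    """Kappa = (observed - expected agreement) / (1 - expected agreement)."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / n

    # Proportion of readings each observer calls positive.
    a_pos = (both_pos + a_pos_b_neg) / n
    b_pos = (both_pos + a_neg_b_pos) / n

    # Agreement expected by chance alone, given those proportions.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical readings of 100 X-rays by observers A and B.
kappa = cohen_kappa(both_pos=40, a_pos_b_neg=10, a_neg_b_pos=5, both_neg=45)
print(f"kappa = {kappa:.2f}")
```

In this example the observers agree on 85% of films, but half of that agreement would be expected by chance, so kappa is well below 0.85.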

Relationship between Validity and Reliability

Reliability / repeatability: the test is able to give the same result again and again

Validity: the test is able to measure what it is intended to measure

Comparison of reliability and validity using graphical presentation

When the reliability of a test is poor, the validity of the test for a given individual will also be poor

References

Beaglehole R, Bonita R, Kjellstrom T. Basic Epidemiology. Geneva: World Health Organization; 1993.

Fletcher RH, Fletcher SW. Clinical Epidemiology: The Essentials. 3rd ed. Baltimore: Lippincott Williams and Wilkins; 1996. p. 35-56.

Gordis L. Epidemiology. Pennsylvania: Elsevier Saunders; 2004. p. 71-94.

Armitage P, Berry G. Statistical Methods in Medical Research. 3rd ed. London: Blackwell Scientific Publications; 1994. p. 445.
