Selective cutoff reporting in studies of diagnostic test accuracy of depression screening tools: Comparing traditional meta-analysis to individual patient data meta-analysis

Brooke Levis, MSc, PhD Candidate
Jewish General Hospital and McGill University
Montreal, Quebec, Canada



Does Selective Reporting of Data-driven Cutoffs Exaggerate Accuracy?

The Hockey Analogy


What is Screening?

Illustration: This information was originally developed by the UK National Screening Committee/NHS Screening Programmes (www.screening.nhs.uk) and is used under the Open Government Licence v1.0

Purpose: to identify otherwise unrecognisable disease

By sorting out apparently well persons who probably have a condition from those who probably do not

Not diagnostic

Positive tests require referral for diagnosis and, as appropriate, treatment

A program – of which a test is one component


Patient Health Questionnaire (PHQ-9)

Depression screening tool

Scores range from 0 to 27

Higher scores = more severe symptoms

The Patient Health Questionnaire (PHQ-9) depression screening tool


Selective Reporting of Results Using Data-Driven Cutoffs

Extreme scenarios:

Cutoff of ≥ 0
All subjects above cutoff
sensitivity = 100%

Cutoff of ≥ 27
All subjects below cutoff
specificity = 100%
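The trade-off behind these extreme scenarios can be sketched in a few lines of Python. This is only an illustration: the scores and diagnoses below are made up, and `sens_spec` is a hypothetical helper name, not something taken from the presentation.

```python
def sens_spec(scores, is_case, cutoff):
    """Return (sensitivity, specificity) when score >= cutoff is a positive screen."""
    pairs = list(zip(scores, is_case))
    tp = sum(1 for s, c in pairs if c and s >= cutoff)       # cases screened positive
    fn = sum(1 for s, c in pairs if c and s < cutoff)        # cases missed
    tn = sum(1 for s, c in pairs if not c and s < cutoff)    # non-cases screened negative
    fp = sum(1 for s, c in pairs if not c and s >= cutoff)   # non-cases screened positive
    return tp / (tp + fn), tn / (tn + fp)


# Made-up PHQ-9 scores (0 to 27) and MDD status, for illustration only.
scores = [2, 5, 8, 9, 11, 12, 14, 16, 19, 23]
is_case = [False, False, False, True, False, True, False, True, True, True]

for cutoff in (0, 10, 27):
    sens, spec = sens_spec(scores, is_case, cutoff)
    print(f"cutoff >= {cutoff:2d}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With this toy sample, a cutoff of 0 flags everyone (sensitivity 1.00, specificity 0.00) and a cutoff of 27 flags no one (sensitivity 0.00, specificity 1.00); any cutoff in between trades one against the other.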


Does Selective Reporting of Data-driven Cutoffs Exaggerate Accuracy?

Sensitivity increases from cutoff of 8 to cutoff of 11

For standard cutoff of 10, missing 897 cases (13%)

For cutoffs of 7-9 and 11, missing 52-58% of data

Manea et al., CMAJ, 2012


Questions

Does selective cutoff reporting lead to exaggerated estimates of accuracy?

Can we identify predictable patterns of selective cutoff reporting?

Why does selective cutoff reporting appear to impact sensitivity, but not specificity?

Does selective cutoff reporting transfer high heterogeneity in sensitivity due to small numbers of cases to heterogeneity in cutoff scores, but homogeneous accuracy estimates?


Methods

Data source:
Studies included in published traditional meta-analysis on the diagnostic accuracy of the PHQ-9 (Manea et al., CMAJ 2012)

Inclusion criteria:
Unique patient sample
Published diagnostic accuracy for MDD for at least one PHQ-9 cutoff

Data transfer:
Invited authors of the eligible studies to contribute their original patient data (de-identified)
Received data from 13 of 16 eligible datasets (80% of patients, 94% of MDD cases)


Methods

Data preparation
For each dataset, extracted PHQ-9 scores and MDD diagnostic status for each patient, and information pertaining to weighting

Statistical analyses (2 sets performed)

Traditional meta-analysis
For each cutoff between 7 and 15, included data from the studies that reported accuracy results for the respective cutoff in the original publication

IPD meta-analysis
For each cutoff between 7 and 15, included data from all studies
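The difference between the two sets of analyses comes down to which studies contribute at each cutoff. A minimal sketch of that selection rule follows, assuming a hypothetical mapping from study to the cutoffs its publication reported; the study names and cutoff sets are invented for illustration.

```python
# Hypothetical record of which cutoffs each original publication reported.
published_cutoffs = {
    "study_A": {10},
    "study_B": {8, 9, 10, 11},
    "study_C": {10, 12},
}


def studies_for_cutoff(cutoff, published_cutoffs, use_ipd):
    """Return the studies that contribute data at this cutoff."""
    if use_ipd:
        # IPD meta-analysis: every study contributes at every cutoff.
        return sorted(published_cutoffs)
    # Traditional meta-analysis: only studies that published this cutoff.
    return sorted(s for s, cuts in published_cutoffs.items() if cutoff in cuts)


for cutoff in range(7, 16):
    n_trad = len(studies_for_cutoff(cutoff, published_cutoffs, use_ipd=False))
    n_ipd = len(studies_for_cutoff(cutoff, published_cutoffs, use_ipd=True))
    print(f"cutoff {cutoff:2d}: traditional n = {n_trad}, IPD n = {n_ipd}")
```

Under the traditional rule the set of contributing studies changes from cutoff to cutoff, so estimates at neighbouring cutoffs may come from different samples; under the IPD rule the sample is the same throughout.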


Comparison of data availability

         Published data (traditional MA)              All data (IPD MA)
Cutoff   # of studies   # of patients   # MDD cases   # of studies   # of patients   # MDD cases
7        4              2094            550           13             4589            1037
8        4              2094            550           13             4589            1037
9        4              1579            309           13             4589            1037
10       11             3794            723           13             4589            1037
11       5              1253            216           13             4589            1037
12       6              1388            261           13             4589            1037
13       4              1073            186           13             4589            1037
14       3              977             150           13             4589            1037
15       4              1075            193           13             4589            1037
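To make the gap in the table above concrete, the share of the full IPD sample that the publications cover at each cutoff can be computed directly. The counts are copied from the table; the `coverage` helper is a hypothetical name used only for this sketch.

```python
# cutoff: (studies, patients, MDD cases) available from publications, per the table.
published = {
    7: (4, 2094, 550), 8: (4, 2094, 550), 9: (4, 1579, 309),
    10: (11, 3794, 723), 11: (5, 1253, 216), 12: (6, 1388, 261),
    13: (4, 1073, 186), 14: (3, 977, 150), 15: (4, 1075, 193),
}
IPD_PATIENTS, IPD_CASES = 4589, 1037  # totals available at every cutoff with IPD


def coverage(cutoff):
    """Fraction of IPD patients and MDD cases covered by published results."""
    _, patients, cases = published[cutoff]
    return patients / IPD_PATIENTS, cases / IPD_CASES


for cutoff in sorted(published):
    pat, cas = coverage(cutoff)
    print(f"cutoff {cutoff:2d}: {pat:.0%} of patients, {cas:.0%} of MDD cases")
```

Only the standard cutoff of 10 comes close to the full sample (about 83% of patients and 70% of MDD cases); every other cutoff rests on less than half of the patients.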


Methods

Model: Bivariate random-effects* meta-analysis models

Models sensitivity and specificity at the same time

Accounts for clustering by study

Provides an overall pooled sensitivity and specificity for each cutoff, for the 2 sets of analyses

Within each set of analyses, each cutoff requires its own model

Estimates between-study heterogeneity

Note: model accounts for correlation between sensitivity and specificity at each threshold, but not for correlation of parameters across thresholds

*Random-effects model: sensitivity & specificity assumed to vary across primary studies
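For readers who want the model written out, one common formulation of a bivariate random-effects model for a single cutoff is sketched below. The slides do not state which variant was fitted, so the binomial within-study likelihood and all symbols here are assumptions made for illustration. For study i, n_{1i} is the number of MDD cases, n_{0i} the number of non-cases, TP_i the cases scoring at or above the cutoff, and TN_i the non-cases scoring below it.

```latex
\begin{align*}
TP_i &\sim \mathrm{Binomial}\bigl(n_{1i},\ \mathrm{Se}_i\bigr) \\
TN_i &\sim \mathrm{Binomial}\bigl(n_{0i},\ \mathrm{Sp}_i\bigr) \\
\begin{pmatrix} \operatorname{logit}(\mathrm{Se}_i) \\ \operatorname{logit}(\mathrm{Sp}_i) \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{\mathrm{Se}} \\ \mu_{\mathrm{Sp}} \end{pmatrix},\
\begin{pmatrix}
\sigma_{\mathrm{Se}}^{2} & \rho\,\sigma_{\mathrm{Se}}\,\sigma_{\mathrm{Sp}} \\
\rho\,\sigma_{\mathrm{Se}}\,\sigma_{\mathrm{Sp}} & \sigma_{\mathrm{Sp}}^{2}
\end{pmatrix}
\right)
\end{align*}
```

In this formulation the pooled sensitivity and specificity are the inverse logits of \mu_{Se} and \mu_{Sp}, the variances \sigma_{Se}^2 and \sigma_{Sp}^2 capture between-study heterogeneity, and \rho is the within-threshold correlation between sensitivity and specificity mentioned in the note. Fitting a separate model per cutoff is what leaves correlation across thresholds unmodelled.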


Questions

Does selective cutoff reporting lead to exaggerated estimates of accuracy?

Can we identify predictable patterns of selective cutoff reporting?

Why does selective cutoff reporting appear to impact sensitivity, but not specificity?

Does selective cutoff reporting transfer high heterogeneity in sensitivity due to small numbers of cases to heterogeneity in cutoff scores, but homogeneous accuracy estimates?


Comparison of Diagnostic Accuracy

         Published data (traditional MA)   All data (IPD MA)
Cutoff   N studies   Sens   Spec           N studies   Sens   Spec
7        4           0.85   0.73           13          0.97   0.73
8        4           0.79   0.78           13          0.93   0.78
9        4           0.78   0.82           13          0.89   0.83
10       11          0.85   0.88           13          0.87   0.88
11       5           0.92   0.90           13          0.83   0.90
12       6           0.82   0.92           13          0.77   0.92
13       4           0.82   0.94           13          0.67   0.94
14       3           0.71   0.97           13          0.59   0.96
15       4           0.61   0.98           13          0.52   0.97
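The size and direction of the discrepancy are easier to see as differences. The short sketch below subtracts the IPD estimate from the published estimate at each cutoff, using the values in the table above; the `differences` helper is a hypothetical name used only here.

```python
# cutoff: (published sens, published spec, IPD sens, IPD spec), from the table.
accuracy = {
    7: (0.85, 0.73, 0.97, 0.73), 8: (0.79, 0.78, 0.93, 0.78),
    9: (0.78, 0.82, 0.89, 0.83), 10: (0.85, 0.88, 0.87, 0.88),
    11: (0.92, 0.90, 0.83, 0.90), 12: (0.82, 0.92, 0.77, 0.92),
    13: (0.82, 0.94, 0.67, 0.94), 14: (0.71, 0.97, 0.59, 0.96),
    15: (0.61, 0.98, 0.52, 0.97),
}


def differences(cutoff):
    """Published minus IPD estimate, as (sensitivity diff, specificity diff)."""
    pub_se, pub_sp, ipd_se, ipd_sp = accuracy[cutoff]
    return round(pub_se - ipd_se, 2), round(pub_sp - ipd_sp, 2)


for cutoff in sorted(accuracy):
    d_se, d_sp = differences(cutoff)
    print(f"cutoff {cutoff:2d}: sensitivity {d_se:+.2f}, specificity {d_sp:+.2f}")
```

In these numbers, published sensitivity is lower than the IPD estimate at cutoffs 7 to 10 and higher at cutoffs 11 to 15, while specificity never differs by more than 0.01.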


Comparison of ROC Curves


Questions

Does selective cutoff reporting lead to exaggerated estimates of accuracy?

Can we identify predictable patterns of selective cutoff reporting?

Why does selective cutoff reporting appear to impact sensitivity, but not specificity?

Does selective cutoff reporting transfer high heterogeneity in sensitivity due to small numbers of cases to heterogeneity in cutoff scores, but homogeneous accuracy estimates?


Publishing trends by study


Comparison of Sensitivity by Cutoff


Questions

Does selective cutoff reporting lead to exaggerated estimates of accuracy?

Can we identify predictable patterns of selective cutoff reporting?

Why does selective cutoff reporting appear to impact sensitivity, but not specificity?

Does selective cutoff reporting transfer high heterogeneity in sensitivity due to small numbers of cases to heterogeneity in cutoff scores, but homogeneous accuracy estimates?


Comparison of Diagnostic Accuracy

         Published data (traditional MA)   All data (IPD MA)
Cutoff   N studies   Sens   Spec           N studies   Sens   Spec
7        4           0.85   0.73           13          0.97   0.73
8        4           0.79   0.78           13          0.93   0.78
9        4           0.78   0.82           13          0.89   0.83
10       11          0.85   0.88           13          0.87   0.88
11       5           0.92   0.90           13          0.83   0.90
12       6           0.82   0.92           13          0.77   0.92
13       4           0.82   0.94           13          0.67   0.94
14       3           0.71   0.97           13          0.59   0.96
15       4           0.61   0.98           13          0.52   0.97


Why Sensitivity Changes with Moving Cutoffs, but Not Specificity


Questions

Does selective cutoff reporting lead to exaggerated estimates of accuracy?

Can we identify predictable patterns of selective cutoff reporting?

Why does selective cutoff reporting appear to impact sensitivity, but not specificity?

Does selective cutoff reporting transfer high heterogeneity in sensitivity due to small numbers of cases to heterogeneity in cutoff scores, but homogeneous accuracy estimates?


Heterogeneity


Summary

Selective cutoff reporting in depression screening tool diagnostic test accuracy (DTA) studies may distort accuracy across cutoffs.

It will lead to exaggerated estimates of accuracy.

These distortions were relatively minor in the PHQ-9, but would likely be much larger for other measures where standard cutoffs are less consistently reported and more data-driven reporting seems to occur (e.g., HADS).

IPD meta-analysis can address this and will allow subgroup-based accuracy evaluation.


Summary

STARD undergoing revision:

Needs to require precision-based sample size calculation to avoid very small samples – particularly number of cases – and unstable estimates

Needs to require reporting of spectrum of cutoffs, which is easily done with online appendices
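As an illustration of what a precision-based sample size calculation can look like, the sketch below uses the textbook normal-approximation (Wald) formula for a proportion. This is one of several possible approaches and is not specified in the slides; the expected sensitivity, target half-width, and prevalence are assumed values chosen only for the example, and both function names are hypothetical.

```python
import math


def cases_needed(expected_sens, half_width, z=1.96):
    """Cases needed so a Wald 95% CI for sensitivity has the given half-width."""
    return math.ceil(z ** 2 * expected_sens * (1 - expected_sens) / half_width ** 2)


def total_sample(expected_sens, half_width, prevalence, z=1.96):
    """Participants to recruit so the expected number of cases is sufficient."""
    return math.ceil(cases_needed(expected_sens, half_width, z) / prevalence)


# Assumed inputs: sensitivity around 0.85, CI of +/- 0.05, MDD prevalence of 15%.
print(cases_needed(0.85, 0.05))        # number of MDD cases required
print(total_sample(0.85, 0.05, 0.15))  # total participants required
```

Under these assumed inputs the formula asks for 196 MDD cases, or about 1307 participants at 15% prevalence, which is far more cases than many individual primary studies include.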


Acknowledgements

Brett Thombs, Andrea Benedetti, Roy Ziegelstein, Pim Cuijpers, Simon Gilbody, John Ioannidis

Alex Levis, Danielle Rice

Scott Patten, Dean McMillan, Ian Shrier, Russell Steele, Lorie Kloda

DEPRESSD Investigators

Other Contributors
