

Journal of Evidence-Based Medicine ISSN 1756-5391

METHODOLOGY

Understanding GRADE: an introduction
Gabrielle Goldet and Jeremy Howick

Department of Primary Health Sciences, University of Oxford, Oxford, UK

Keywords

Evidence-based medicine; GRADE; randomized controlled trial; systematic review.

Correspondence

Jeremy Howick, Department of Primary Care Health Sciences, New Radcliffe House, University of Oxford, Oxford OX2 2GG, UK.
Tel: 01865 289 258
Fax: 01865 289 287
Email: [email protected]

Financial aid

J Howick is funded by the National Institute of Health (UK). G Goldet received no funding for this work.

Author contributions

This was a collaborative effort. J Howick (guarantor) wrote the first draft, and the paper was developed in meetings and correspondence between J Howick and G Goldet.

Conflict of interest

The authors declare that they have no conflicts of interest in the research.

Received 24 November 2012; accepted for publication 11 January 2013.

doi: 10.1111/jebm.12018

Abstract

Objective: Grading of recommendations, assessment, development, and evaluations (GRADE) is arguably the most widely used method for appraising studies to be included in systematic reviews and guidelines. In order to use the GRADE system, or to know how to interpret it when reading reviews, reading several articles and attending a workshop are required. Moreover, the GRADE system is not covered in standard medical textbooks. Here, we explain GRADE concisely, with the use of examples, so that students and other researchers can understand it.
Background: In order to use or interpret the GRADE system, reading several articles and attending a workshop is currently required. Moreover, the GRADE system is not covered in standard medical textbooks.
Methods: We read, synthesized, and digested the GRADE publications and contacted GRADE contributors for explanations where required. We composed a digested version of the system, in a concise way that a general medical audience could understand.
Results: We were able to explain the GRADE basics clearly and completely in under 1500 words.
Conclusions: While advanced critical appraisal requires judgment, training, and practice, it is possible for a non-specialist to grasp the GRADE basics very quickly.

Introduction

The Grading of recommendations, assessment, development, and evaluations (GRADE) system is emerging as the dominant method for appraising controlled studies and making recommendations for systematic reviews and guidelines (1–12). Reading the series of publications dedicated to explaining GRADE (2–12), and perhaps attending a workshop, is usually required to grasp the GRADE system in sufficient detail to be able to use it for critical appraisal. There are no concise explanations of GRADE intended for a general medical audience, and medical textbooks do not include explanations of GRADE. GRADE may thus elude students, doctors, and policy makers who do not have the time to study the literature or attend the workshops. Yet anyone reading a systematic review or evaluating guideline recommendations needs to understand GRADE in order to appraise the review or guideline. Taking all appraisal skills out of the hands of clinicians and placing them in the hands of expert reviewers is undesirable from an Evidence-Based Medicine perspective. Therefore a clear, brief, and accurate description of GRADE is required.

What is GRADE?

GRADE is a method used by systematic reviewers and guideline developers to assess the quality of evidence and decide whether to recommend an intervention (1). GRADE differs from other appraisal tools for three reasons: (i) it separates the quality of evidence from the strength of recommendation, (ii) the quality of evidence is assessed for each outcome, and (iii) observational studies can be 'upgraded' if they meet certain criteria.

Using GRADE

The GRADE method involves five distinct steps that we explain here with examples.

Step 1
Assign an a priori ranking of 'high' to randomized controlled trials and 'low' to observational studies. Randomized controlled trials are initially assigned a higher grade because they are usually less prone to bias than observational studies (13, 14).

Step 2
'Downgrade' or 'upgrade' the initial ranking. It is common for randomized controlled trials and observational studies to be downgraded because they suffer from identifiable bias (15). Observational studies can also be upgraded when multiple high-quality studies show consistent results (16–19).

Reasons to ‘downgrade’

• Risk of bias
  ◦ Lack of a clearly randomized allocation sequence
  ◦ Lack of blinding
  ◦ Lack of allocation concealment
  ◦ Failure to adhere to intention-to-treat analysis
  ◦ Trial cut short
  ◦ Large losses to follow-up

There is strong evidence supporting the view that lack of randomization, lack of allocation concealment, and lack of blinding all bias results (20). Intention-to-treat analysis is also important to avoid the bias that arises when those dropping out of trials due to harmful effects are not accounted for (21). Similarly, interim results from trials that are cut short often lead to overestimated effect sizes (22). Large losses to follow-up also lead to exaggerated effect estimates. For example, in a randomized controlled trial comparing weight loss programs in obese participants, 27% of participants were lost to follow-up within a year. This putatively conferred a positive bias on the study, as participants benefiting from the treatment were more likely to stay in the trial (23).

• Inconsistency, when there is significant and unexplained variability in the results of different trials. For example, a meta-analysis of early pharmacotherapy for the prevention of type 2 diabetes identified heterogeneity in the results from different trials and could not determine whether the treatment benefits outweighed the harms (24).

• Indirectness of evidence, which can refer to several things.
  ◦ An indirect comparison of two drugs. If there are no trials comparing drugs A and B directly, we infer a comparison based on separate trials comparing drug A with placebo and drug B with placebo. However, the validity of such an inference depends on the often unjustified assumption that the two trial populations were similar.
  ◦ An indirect comparison of population, outcome, or intervention; for example, the studies in the review investigated one intervention, in one population, with a certain outcome, but the conclusions of the studies are intended to be applicable to related interventions, populations, or outcomes. For example, the American College of Chest Physicians (ACCP) downgraded the quality of the evidence for compression stocking use in trauma patients from high to moderate because all the randomized controlled trials had been performed on the general population (25).

• Imprecision, when wide confidence intervals mar the quality of the data.

• Publication bias, when studies with 'negative' findings remain unpublished. This can bias the review outcome (26). For example, Turner et al examined all antidepressant trials registered with the US Food and Drug Administration. Of 38 trials with positive results, all but one were published, whereas of the 36 trials with negative results, 22 were unpublished (27); a systematic review of all published studies would therefore give a biased result.
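To make the size of this distortion concrete, here is a back-of-the-envelope sketch (ours, for illustration only) based on the Turner et al figures quoted above:

```python
# Illustration of the Turner et al. figures cited above (27).
# Variable names are ours; the numbers come from the text: 38 positive trials
# (all but one published) and 36 negative trials (22 unpublished).
positive_total, positive_published = 38, 37
negative_total, negative_published = 36, 36 - 22  # 14 negative trials published

share_positive_registered = positive_total / (positive_total + negative_total)
share_positive_published = positive_published / (positive_published + negative_published)

print(f"Positive trials among all registered trials: {share_positive_registered:.0%}")  # ~51%
print(f"Positive trials among published trials:      {share_positive_published:.0%}")  # ~73%
# A review restricted to published studies therefore sees a much rosier picture
# than the full body of registered trials.
```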

Reasons to ‘upgrade’

• Large effect, when the effect is so large that the biases common to observational studies cannot plausibly account for the result. For example, observational studies examining the effect of antithrombotic therapy on thromboembolic events in patients who had undergone cardiac valve replacement revealed an 80% relative risk reduction, which the ACCP used to rank the quality of evidence as high (28).

• Dose–response relationship, when the result is proportional to the degree of exposure. For example, the Nurses' Health Study demonstrated a strong negative correlation between physical activity and stroke risk; that is, the more physically active the nurses were, the smaller their risk of stroke (29).

• All plausible biases only reducing an apparent treatment effect, when all possible confounders would only diminish the observed effect. It is thus likely that the actual effect is larger than the data suggest. For example, a systematic review of studies in 26,000 hospitals found that mortality was higher in for-profit private hospitals compared to non-profit public hospitals. Investigators suspected that teaching hospitals might have confounded the result because they are non-profit and were suspected to have low mortality. When teaching hospitals were removed from the analysis, however, the relative mortality of for-profit hospitals actually increased (30).

Step 3
Assign a final grade for the quality of evidence as 'high', 'moderate', 'low', or 'very low' for all the critically important outcomes (Box 1) (31). Upgrading and downgrading require careful judgment and thought. A rule of thumb is to move up or down one category per issue. For example, the quality of evidence for an outcome studied in randomized controlled trials starts as 'high'; it might move to 'moderate' because of a high risk of bias in the included studies, further down to 'low' if there is significant unexplained heterogeneity between the trials, and even further down to 'very low' if few events lead to confidence intervals that include both important benefits and harms.

Box 1 Final GRADE ranking
High: we are very confident that the effect in the study reflects the actual effect.
Moderate: we are quite confident that the effect in the study is close to the true effect, but it is also possible that it is substantially different.
Low: the true effect may differ significantly from the estimate.
Very low: the true effect is likely to be substantially different from the estimated effect.
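For readers who find the bookkeeping of Steps 1 to 3 easier to follow in code, the following minimal Python sketch models the rule of thumb described above. It is purely illustrative and not part of GRADE itself: the function and variable names are ours, and in practice deciding whether a concern is serious enough to move a category (or, for very serious concerns, two categories) is a matter of judgment rather than counting.

```python
# Illustrative sketch of Steps 1-3 only; GRADE itself requires judgment, not arithmetic.
# Function and variable names are ours, not part of the GRADE publications.

GRADE_LEVELS = ["very low", "low", "moderate", "high"]

def initial_rank(study_design):
    """Step 1: a priori ranking by study design."""
    return "high" if study_design == "randomized controlled trial" else "low"

def final_grade(study_design, downgrades, upgrades):
    """Steps 2-3: move down one level per downgrade concern (risk of bias,
    inconsistency, indirectness, imprecision, publication bias) and up one
    level per upgrade criterion (large effect, dose-response, all plausible
    biases reducing the effect), clamped to the four GRADE levels."""
    level = GRADE_LEVELS.index(initial_rank(study_design))
    level = level - len(downgrades) + len(upgrades)
    return GRADE_LEVELS[max(0, min(level, len(GRADE_LEVELS) - 1))]

# Example from the text: an RCT-based outcome with a high risk of bias and
# unexplained inconsistency starts 'high' and ends up 'low'.
print(final_grade("randomized controlled trial",
                  downgrades=["risk of bias", "inconsistency"],
                  upgrades=[]))  # -> low
```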

Step 4
Consider other factors that affect the strength of a recommendation for a course of action. High-quality evidence does not always imply a strong recommendation. Recommendations must consider factors besides the quality of evidence (3). The first factor to consider is the balance between desirable and undesirable effects. Some interventions, such as antibiotics to prevent urinary infections associated with certain urologic procedures, have clear benefits and few side effects, and therefore a strong recommendation is uncontroversial (32, 33). In cases where the benefit-to-harm ratio is less clear, patient values and preferences, as well as costs, need to be considered carefully (Figure 1).

Step 5
Make a 'strong' or 'weak' recommendation (34). When the net benefit of the treatment is clear, patient values and circumstances are unlikely to affect the decision to take a treatment, and a strong recommendation is appropriate. For example, high-dose chemotherapy for testicular cancer is often strongly recommended because of the increased likelihood of survival, in spite of chemotherapy's toxicity (32, 35). Otherwise, if the evidence is weak or the balance of positive and negative effects is unclear, a weak recommendation is appropriate. Consider the following example (borrowed from the GRADE literature) concerning whether to recommend screening for melanoma in primary care in the United States (36, 37). The baseline risk for this population is 13.3 per 100,000; there is weak evidence for the accuracy of screening and for its effect on the outcome of lethal melanoma; and evidence on the potential harms of screening is lacking, although these are likely to include the harmful consequences of false positive tests (38). Based on this lack of strong evidence, the recommendation to screen or not to screen will not be strong, and may require input from patients, friends, and family members about the different treatment options. Some people might choose to recommend against screening because of the lack of strong evidence for benefit and the potential harms. Others might recommend screening because they value the potential benefits more highly (34).
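The following toy sketch (ours, not a rule from the GRADE literature) shows one way the Step 4 considerations might be tallied into a Step 5 output; in practice, weighing values, preferences, and costs is a matter of judgment rather than a formula, and the thresholds below are hypothetical.

```python
# Toy illustration of Steps 4-5; thresholds and names are ours and hypothetical.
# Real recommendations weigh patient values, preferences, and costs with judgment.

def recommendation(quality, net_benefit_clear, favours_intervention):
    """Combine evidence quality (Step 3) with the clarity of the
    benefit-harm balance (Step 4) into a Step 5 recommendation."""
    strong = quality in ("high", "moderate") and net_benefit_clear
    strength = "strong" if strong else "weak"
    direction = "for" if favours_intervention else "against"
    return f"{strength} recommendation {direction} the intervention"

# Melanoma-screening style example from the text: weak evidence and an unclear
# balance of benefits and harms lead to a weak recommendation that depends
# heavily on patient values.
print(recommendation("low", net_benefit_clear=False, favours_intervention=True))
# -> weak recommendation for the intervention
```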

[Figure 1 appears here. The flow diagram summarizes the five steps: Step 1, a priori ranking (randomized controlled trial: high; observational study: low); Step 2, upgrade/downgrade (downgrade for risk of bias, inconsistency, indirectness, imprecision, or publication bias; upgrade for a large consistent effect, a dose–response relationship, or confounders that would only reduce the size of the effect); Step 3, assign a final grade (high, moderate, low, or very low); Step 4, consider factors affecting the recommendation (balance of desirable and undesirable effects, cost-effectiveness, patient preferences); Step 5, make the recommendation (strong or weak, for or against using the intervention).]

Figure 1 How GRADE is used to make recommendations; steps 1 to 3 are repeated for each critical outcome.


Summary

GRADE is becoming increasingly popular with guideline developers and systematic reviewers. It is thus important for anyone (students, clinicians, and policy makers) to understand GRADE. Advanced critical appraisal (using GRADE or any other system) requires sound judgment, training, and practice. However, it is possible to grasp the GRADE basics using this concise and accurate summary.

Acknowledgement

We are very grateful to Gunn Vist, who provided expert advice on the GRADE system and editorial suggestions.

References

1. Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008; 336(7650): 924–6.
2. Schunemann HJ, Oxman AD, Vist GE, et al. Interpreting results and drawing conclusions. Cochrane Handbook for Systematic Reviews of Interventions 2008; 359–87.
3. Balshem H, Helfand M, Schunemann HJ, et al. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol 2011; 64(4): 401–6.
4. Guyatt G, Oxman AD, Akl EA, et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol 2011; 64(4): 383–94.
5. Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines: 2. Framing the question and deciding on important outcomes. J Clin Epidemiol 2011; 64(4): 395–400.
6. Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines 6. Rating the quality of evidence—imprecision. J Clin Epidemiol 2011; 64(12): 1283–93.
7. Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines: 7. Rating the quality of evidence—inconsistency. J Clin Epidemiol 2011; 64(12): 1294–302.
8. Guyatt GH, Oxman AD, Montori V, et al. GRADE guidelines: 5. Rating the quality of evidence—publication bias. J Clin Epidemiol 2011; 64(12): 1277–82.
9. Guyatt GH, Oxman AD, Schunemann HJ, Tugwell P, Knottnerus A. GRADE guidelines: a new series of articles in the Journal of Clinical Epidemiology. J Clin Epidemiol 2011; 64(4): 380–2.
10. Guyatt GH, Oxman AD, Sultan S, et al. GRADE guidelines: 9. Rating up the quality of evidence. J Clin Epidemiol 2011; 64(12): 1311–6.
11. Guyatt GH, Oxman AD, Vist G, et al. GRADE guidelines: 4. Rating the quality of evidence—study limitations (risk of bias). J Clin Epidemiol 2011; 64(4): 407–15.
12. Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines: 8. Rating the quality of evidence—indirectness. J Clin Epidemiol 2011; 64(12): 1303–10.
13. Odgaard-Jensen J, Vist GE, Timmer A, et al. Randomisation to protect against selection bias in healthcare trials. Cochrane Database of Systematic Reviews 2011; (4): MR000012.
14. Howick J. The Philosophy of Evidence-Based Medicine. Oxford: Wiley-Blackwell, 2011.
15. Ioannidis JP. Why most published research findings are false. PLoS Med 2005; 2(8): e124.
16. Doll R, Hill AB. Smoking and carcinoma of the lung; preliminary report. BMJ 1950; 2(4682): 739–48.
17. Doll R, Hill AB. A study of the aetiology of carcinoma of the lung. Pak J Health 1953; 3(2): 65–94.
18. Doll R, Hill AB. The mortality of doctors in relation to their smoking habits; a preliminary report. BMJ 1954; 1(4877): 1451–5.
19. Doll R, Hill AB. Lung cancer and other causes of death in relation to smoking; a second report on the mortality of British doctors. BMJ 1956; 2(5001): 1071–81.
20. Savovic J, Jones HE, Altman DG, et al. Influence of reported study design characteristics on intervention effect estimates from randomized, controlled trials. Ann Intern Med 2012; 157(6): 429–38.
21. Montori VM, Guyatt GH. Intention-to-treat principle. CMAJ 2001; 165(10): 1339–41.
22. Guyatt GH, Briel M, Glasziou P, Bassler D, Montori VM. Problems of stopping trials early. BMJ 2012; 344: e3863.
23. Heshka S, Anderson JW, Atkinson RL, et al. Weight loss with self-help compared with a structured commercial program: a randomized trial. JAMA 2003; 289(14): 1792–8.
24. Krentz AJ. Some oral antidiabetic medications may prevent type 2 diabetes in people at high risk for it, but the evidence is insufficient to determine whether the potential benefits outweigh the possible risks. Evid Based Med 2011; 28: 948–64.
25. Guyatt G, Gutterman D, Baumann MH, et al. Grading strength of recommendations and quality of evidence in clinical guidelines: report from an American College of Chest Physicians task force. Chest 2006; 129(1): 174–81.
26. Scherer RW, Langenberg P, von Elm E. Full publication of results initially presented in abstracts. Cochrane Database of Systematic Reviews 2007, Issue 2. Available from http://www.mrw.interscience.wiley.com/cochrane/clsysrev/articles/MR000005/frame.html.
27. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med 2008; 358(3): 252–60.
28. Hirsh J, Guyatt G, Albers GW, Harrington R, Schunemann HJ; American College of Chest Physicians. Antithrombotic and thrombolytic therapy: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines (8th Edition). Chest 2008; 133(6 Suppl): 110S–2S.
29. Hu FB, Stampfer MJ, Colditz GA, et al. Physical activity and risk of stroke in women. JAMA 2000; 283(22): 2961–7.
30. Devereaux PJ, Choi PT, Lacchetti C, et al. A systematic review and meta-analysis of studies comparing mortality rates of private for-profit and private not-for-profit hospitals. CMAJ 2002; 166(11): 1399–406.
31. Guyatt GH, Oxman AD, Kunz R, et al. What is "quality of evidence" and why is it important to clinicians? BMJ 2008; 336(7651): 995–8.
32. Canfield SE, Dahm P. Rating the quality of evidence and the strength of recommendations using GRADE. World J Urol 2011; 29(3): 311–7.
33. Wolf JS, Bennett CJ, Dmochowski RR, et al. Best practice policy statement on urologic surgery antimicrobial prophylaxis. J Urol 2008; 179(4): 1379–90.
34. Guyatt GH, Oxman AD, Kunz R, et al. Going from evidence to recommendations. BMJ 2008; 336(7652): 1049–51.
35. Feldman DR, Bosl GJ, Sheinfeld J, Motzer RJ. Medical treatment of advanced testicular cancer. JAMA 2008; 299(6): 672–84.
36. Atkins D, Best D, Briss PA, et al. Grading quality of evidence and strength of recommendations. BMJ 2004; 328(7454): 1490–498.
37. Helfand M, Mahon S, Eden K. Screening for skin cancer. Am J Prev Med 2001; 20(3): 47–58.
38. Evans I, Thornton H, Chalmers I. Testing Treatments: Better Research for Better Healthcare. London: Pinter and Martin, 2010.
