Whole genome sequencing for epidemiological studies of tuberculosis: a systematic review of

reporting practices and factors associated with reporting quality of STROME-ID

Brianna Cheng

Department of Epidemiology, Biostatistics, and Occupational Health

McGill University, Montréal, Canada

April 2020

A thesis submitted to McGill University in partial fulfillment of the requirements of the degree

of Master of Science

© Brianna Cheng, 2020

Abstract

Background: Whole-genome sequencing (WGS) has the potential to improve the understanding

of tuberculosis (TB) epidemiology. However, standardized reporting is necessary to enhance the

reproducibility and interpretation of WGS results in genomic epidemiology

studies of TB to better inform public health decision-making. In 2014, guidelines called

STROME-ID were published to provide recommendations for reporting in genomic

epidemiology studies. Reporting practices before and after its publication were compared, and

the correlation between STROME-ID reporting quality and study-level characteristics was also

explored.

Methods: This study is registered on PROSPERO (CRD42017064395). MEDLINE, Embase

Classic and Embase were searched on May 3, 2017 (updated April 23, 2019). 976 titles and

abstracts were screened, with 114 full-texts eligible for inclusion. The proportion of STROME-

ID criteria reported was tabulated for each article, and differences in means were compared

before and after STROME-ID’s publication date using a t-test. A 6-month lag period after

STROME-ID was included to account for articles in-press; sensitivity analyses were also

performed. Quasi-Poisson and tobit regression were used to assess whether h-index (HI), journal

impact factor (IF), sample size (SS), and geographic region of the senior author’s primary

affiliation were correlated with the count and proportion of STROME-ID criteria met.

Results: The proportion of applicable criteria met in included articles ranged from 16.3-75.0%

(mean 49.9% ± 11.88%), with no difference in the mean proportion of criteria met

before and after guideline publication. HI was not included in the adjusted regression analysis.

Only SS was significantly associated with a greater proportion of STROME-ID criteria met.

Conclusion: Reporting quality in genomic epidemiology studies of tuberculosis is variable,

despite publication of STROME-ID guidelines. Future studies should investigate factors

affecting adherence to these guidelines to improve the value and utility of evidence. Journal

endorsement may be needed to support this.

Résumé

Background: Whole-genome sequencing (WGS) has the potential to improve the understanding of

the epidemiology of tuberculosis (TB). However, standardized reporting is needed to improve the

reproducibility and interpretation of WGS results in genomic epidemiology studies of TB, which

can better inform public health decision-making. In 2014, guidelines called STROME-ID were

published to provide reporting recommendations for genomic epidemiology studies. Reporting

practices before and after their publication were compared, and the correlation between

STROME-ID reporting quality and study-level characteristics was also explored.

Methods: This study is registered on PROSPERO (CRD42017064395). The MEDLINE, Embase

Classic, and Embase databases were searched on May 3, 2017 (updated April 23, 2019). 976 titles

and abstracts were screened, of which 114 full texts were included. The proportion of STROME-ID

criteria reported was tabulated for each publication, and differences in means before and after

the publication of STROME-ID were compared using a t-test. A 6-month period after the

publication of STROME-ID was included to account for articles in press. In addition, sensitivity

analyses were performed. Quasi-Poisson and tobit regressions were used to determine whether

h-index (HI), journal impact factor (IF), sample size (SS), and the geographic region of the senior

author's affiliation were correlated with the number and proportion of STROME-ID criteria met.

Results: The proportion of applicable criteria met in the articles ranged from 16.3% to 75.0%

(mean 49.9% ± 11.88%), with no difference in the mean proportion of criteria met before and

after publication of the STROME-ID guidelines. HI was not included in the adjusted regression

analysis. Only SS was significantly associated with a greater proportion of STROME-ID criteria met.

Conclusion: The reporting quality of genomic epidemiology studies is variable, despite the

publication of the STROME-ID guidelines. Future studies should investigate the factors

responsible for sustaining adherence to the STROME-ID guidelines in order to increase the

quality and usefulness of the data. Journal support may be needed to increase adherence to the

STROME-ID guidelines.

Acknowledgements

I would like to thank my co-supervisors, Dr. Marcel Behr and Dr. Robyn Lee for providing high-

quality mentorship and dedicated supervision throughout my graduate training. Their

commitment to scientific inquiry and public health, especially during the coronavirus pandemic,

has made a lasting impression on me. I am certain I am a better researcher and communicator

because of how they have challenged and encouraged me throughout this process. Amid the

challenges and milestones, they were quick to listen and offer feedback, which has greatly

informed this work. Having worked with Robyn closely, I cannot express enough gratitude for

her patience and willingness to go above and beyond to guide me throughout this entire process,

and to make sure I met required deadlines. I am grateful that they both took an interest in my

long-term goals, and for providing me with learning opportunities along the way.

My heartfelt gratitude extends to Dr. Jim Hanley for sharing his expertise about the statistical

methods used in this thesis, Jaryd Sullivan for providing French translation services, and Fiona

McIntosh for her lab support. Thank you to the rest of the Behr lab for providing a welcoming,

stimulating learning environment, and the administrative staff of the Epidemiology department.

Thank you to my peers who brought joy and balance into my life. Kacper, David, and Jiameng

for sharing their stories. Talia, for celebrating the outdoors with me. Most of all, I could not have

done this without my twin sister, Breagh, who completed her own epidemiology degree in

tandem with me. To my friends and family, thank you for your ongoing trust and support.

Acknowledgement of financial support

I would like to acknowledge and thank the Canadian Institutes of Health Research for the

funding of this work, via a Frederick Banting and Charles Best Canada Graduate Scholarship

(CIHR-CGS-M). An Operating Grant awarded to Marcel also funded my stipend during my

degree. I would also like to thank Marcel and the Epidemiology department for providing

financial support to attend The Union NAR conference in Chicago.

Preface and contribution of authors

This thesis contains 7 chapters. Chapter 1 provides a rationale for the research and outlines the

main objectives of the thesis. Chapter 2 is a literature review summarizing the epidemiology of

tuberculosis (TB), whole-genome sequencing (WGS) and its epidemiological applications, and

the current reporting issues in TB epidemiology studies using WGS. Chapter 3 explains the study

methodology, and results are elaborated upon in Chapter 4. The results of the thesis are presented in

the form of a manuscript in Chapter 5, which will be submitted to The Lancet Microbe. Chapter

6 reports findings additional to those in the manuscript, and their interpretation is discussed in

Chapter 7. The master reference list is provided at the end of the thesis.

This thesis is presented in manuscript-based format. The results are given in the following

manuscript, which has been prepared for submission to a peer-reviewed journal:

Manuscript I. Whole genome sequencing for epidemiological studies of tuberculosis: a

systematic review of applications and reporting practices

Authors: Brianna Cheng, Marcel A Behr, Ben P Howden, Ted Cohen, Robyn S Lee

Status: In preparation for submission to The Lancet Microbe

BC screened abstracts and titles for inclusion, performed data extraction and statistical analysis, created the

tables and figures, interpreted the data, and wrote the first draft of the manuscript. RSL

conceived and led the study, designed the protocol and ran the searches, screened abstracts and

titles for inclusion, guided statistical analyses and interpretation of the data, wrote the first draft

of the manuscript with BC and co-supervised BC. MAB assisted with data interpretation,

reviewed manuscript drafts, and co-supervised BC. BPH and TC contributed to protocol

development, and reviewed the final manuscript draft. TC also served as arbitrator for

disagreement in study inclusion.

Author initials: Brianna Cheng (BC), Dr. Marcel A Behr (MAB), Dr. Robyn S Lee (RSL), Dr.

Benjamin P Howden (BPH), and Dr. Ted Cohen (TC).

Table of contents

Abstract

Résumé

Acknowledgements

Preface and contribution of authors

Table of contents

List of tables

List of appendices

List of abbreviations

Chapter 1. Introduction

Chapter 2. Literature review

2.1. Overview of the pathogenesis of TB

2.1.1. Diagnosis and clinical treatment of TB

2.1.2. Prevention and control of TB

2.1.3. Epidemiology of TB

2.2. Genotyping methods of TB strains

2.2.1. Traditional genotyping methods

2.2.2. Whole genome sequencing (WGS)

2.3. Reporting guidelines for genomic epidemiology studies

2.4. Data-sharing of genomic data

2.5. Reproducibility of next-generation sequencing research

2.6. Factors correlated with reporting quality

2.6.1. Geographic affiliation of the authors

2.6.2. Journal Impact Factor (IF)

2.6.3. H-index (HI)

2.6.4. Sample size (SS)

Chapter 3. Overview of study data and methodology

3.1. Systematic review

3.2. Statistical analysis

3.2.1. Descriptive statistics

3.2.2. Missing data

3.2.3. Sensitivity analyses

3.2.4. Quasi-Poisson

3.2.5. Tobit regression

3.2.6. Model fit

Chapter 4. Results

4.1. Statistical analyses

4.2. Model fit

Preamble to Manuscript I

Chapter 5. Manuscript I

5.1. Summary

5.2. Introduction

5.3. Methods

5.4. Results

5.5. Discussion

5.7. Acknowledgements

Manuscript references

Manuscript figures and tables

Supplemental materials

Chapter 6. Discussion

6.1. Summary

6.2. Strengths and limitations

Chapter 7. Conclusions

References

Appendix

List of tables

Chapter 3

Table 1. Number of genomic epidemiology of tuberculosis papers per publication year

Table 2. Descriptive statistics of independent variables

Table 3. Descriptive statistics for dependent variables

Table 4. Mean and variance of count data suggest over-dispersion

Chapter 4

Table 1. Summary of included studies

Table 2. Mean proportions of STROME-ID criteria met pre- and post-guideline publication

Table 3. Quasi-Poisson univariate and multivariate analyses of impact factor, H-index, continent, and sample size of isolates

Table 4. Univariate and multivariate tobit analysis of impact factor, H-index, continent, and sample size of isolates

List of figures

Chapter 3

Figure 1. Pairwise correlation matrix for impact factor (IF), H-index (HI), sample size of isolates (SS), and sample size of patients (SP)

Chapter 4

Supplemental Figure 1. Count of "not applicable" papers per STROME-ID criterion, pre-publication

Supplemental Figure 2. Count of "not applicable" papers per STROME-ID criterion, post-publication

Supplemental Figure 3. Distribution of impact factors for included papers

Supplemental Figure 4. Distribution of sample size of isolates in included papers

Supplemental Figure 5. Proportion of STROME-ID criteria met with 12-month lag pre-publication

Supplemental Figure 6. Proportion of STROME-ID criteria met with 12-month lag post-publication

Supplemental Figure 7. Proportion of STROME-ID criteria met post-publication, excluding articles from 12-month lag

List of appendices

Appendix 1. Distribution of H-index in included papers

Appendix 2. Distribution of count of eligible criteria met in included papers

Appendix 3. Distribution of proportion of all criteria met in included papers

List of abbreviations

AIC Akaike information criterion

CI confidence interval

CONSORT Consolidated Standards of Reporting Trials

DNA deoxyribonucleic acid

HI h-index

IF impact factor

IQR interquartile range

MDR multi-drug resistant M. tuberculosis

M. tuberculosis Mycobacterium tuberculosis

NGS next generation sequencing

RFLP restriction fragment length polymorphism

SS sample size of isolates

SP sample size of patients

STARD Standards for Reporting of Diagnostic Accuracy Studies

STROBE Strengthening the Reporting of Observational Studies in Epidemiology

STROME-ID Strengthening the Reporting of Molecular Epidemiology for Infectious Diseases

TB tuberculosis

VNTR variable number tandem repeat

WGS whole-genome sequencing

WHO World Health Organization

XDR extensively drug-resistant M. tuberculosis

Chapter 1. Introduction

Tuberculosis (TB) is the world’s leading cause of mortality attributed to a communicable disease

(1). Despite available treatments and advances in public health infrastructure, it remains a major

global public health challenge.1 Recent advances in technology and cost have enabled the

widespread use of whole-genome sequencing (WGS). WGS is a next-generation sequencing

technique that is being increasingly applied in TB epidemiological research, with the potential to

inform clinical and public-health decision-making.

The greatest potential of WGS is realized when its results are reproducible. Lack of

standardization in analytical approaches and poor reporting make it difficult to compare WGS

results across studies, as different analytical approaches can influence final data interpretation.2

Recent studies have also revealed poor reporting and sharing practices of genomic data, which

further hinders the ability to assess the validity and biases of studies.3 At the time of writing, no

studies have examined reporting quality and its correlates per reporting guidelines called

STROME-ID among genomic epidemiology studies of TB.

This thesis assesses the reporting quality of genomic epidemiology studies of TB before and

after the publication of STROME-ID in 2014. It also investigates the relationship between

reporting quality and study characteristics; specifically, it examines whether h-index (HI), sample

size of isolates (SS), geographic region of the senior author’s primary affiliation, and journal

impact factor (IF) are associated with the count and the proportion of STROME-ID criteria met. It

was hypothesized that reporting quality would increase after guideline publication, and that IF, SS,

and the continent of the senior author’s primary affiliation would be associated with reporting quality.

Chapter 2. Literature review

2.1. Overview of the pathogenesis of TB

Tuberculosis (TB) is an infectious respiratory disease caused by the bacterium Mycobacterium

tuberculosis.1 It is the number one infectious cause of mortality in the world.1 The infection is

spread primarily by aerosol transmission between humans, whereby the bacterium gains access to

alveolar macrophages in the lung.4 Bacilli that survive attack from immune system cells (e.g.,

macrophages, granulocytes) establish primary infection, giving rise to granuloma or the “Ghon

focus,” a hallmark feature of TB caused by aggregates of inflammatory cells.5,6 The degeneration

of granuloma results in active TB infection.7 In cases where the bacilli migrate to organs outside

of the lungs, this is known as extrapulmonary TB.6

Until recently, the scientific community classified TB infection as

either latent or active.8 Under these definitions, latent TB describes

individuals who show no clinical symptoms of active TB yet display positive

immunologic markers for the disease.8 Of those who are immune reactive, the risk of progression

to active TB is highest in the first 2 years, then declines thereafter.9

2.1.1. Diagnosis and clinical treatment of TB

Current diagnostic procedures for active TB include chest radiography, microbiologic cultures,

phenotypic drug-susceptibility testing, sputum smear microscopy, and other rapid screening

tests.10 Latent TB may be diagnosed using the tuberculin skin test (TST) or interferon-γ release

assays (IGRA), in addition to evaluating the patient’s medical history and the results of their

physical examination.11 Chest x-ray is used to rule out active TB disease before initiating

treatment of latent TB.12

For adults with active TB, standard treatment involves 6 months of anti-TB drugs.13 Patients

with drug-sensitive TB are treated according to the standard therapy of isoniazid, rifampin,

pyrazinamide, and ethambutol for the first 2 months, followed by isoniazid and rifampin for 4

more months.13

2.1.2. Prevention and control of TB

Prevention and control of TB relies on the timely and accurate identification of active cases.

Accurate strain differentiation, which allows for the identification of the source and transmission

pathway of TB, is thus important for public health and clinical decision-making.14,15

2.1.3. Epidemiology of TB

An estimated 10 million new cases of TB occur worldwide annually.16 Although the

global incidence of TB has declined over the past decade, efforts are still needed to achieve

worldwide TB elimination.16 Globally, TB incidence remains highest among Asian countries,

including Bangladesh, China, India, Indonesia, and Pakistan, which collectively account for half

of new cases each year.17 One third of the world population is estimated to have latent TB,

although this may be overestimated.1,18 TB disproportionately affects the poorest, the most

vulnerable, and marginalized population groups wherever it occurs.19 For example, in 2016, TB

incidence rates were almost 300 times greater among Inuit than in the non-Indigenous

Canadian-born population.20

The World Health Organization (WHO) has set global TB elimination targets to reduce the number of

new TB cases by 80% by the year 2030.21 Furthermore, there has been a renewed urgency in addressing

TB as a public health priority due to the emerging global spread of drug-resistant TB, including

multi-drug resistant (MDR) and extensively drug-resistant (XDR) TB. According to drug

surveillance data from the WHO, the global incidence of MDR-TB in

2018 was estimated at 600 000 new cases, of which 6.2% were XDR-TB.22

2.2. Genotyping methods of TB strains

2.2.1. Traditional genotyping methods

Molecular genotyping is a laboratory technique for studying the spread and evolution of

diseases.23 Traditional genotyping relies on different genetic markers for analysis, such as strain-

specific banding patterns (IS6110 fingerprinting), numerical patterns (24-locus MIRU-VNTR

typing), or barcode-like patterns (spoligotyping).24 These tools have broad applications when

applied to the study of bacterial pathogens, such as M. tuberculosis. For example, they can help

discern whether two unrelated individuals are part of the same chain of TB transmission; this is more

likely when their isolates share similar genetic patterns.24

2.2.2. Whole genome sequencing (WGS)

WGS is an alternative sequencing method that is increasingly being used for detecting genomic

variability.25 In contrast to traditional genotyping, it analyzes the entire deoxyribonucleic acid

(DNA) genome.26 Based on parallel sequencing technologies called next-generation sequencing

(NGS), WGS identifies genomic positions at which individual nucleotide bases differ, called

single nucleotide polymorphisms (SNPs).25,26 TB transmission can be inferred by analyzing the

genetic distances (number of SNPs) between patients’ bacterial isolates; closely related isolates

may provide evidence of recent transmission.24
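
The genetic-distance idea described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming the isolates have already been aligned to the same reference so that positions are comparable. The isolate names, the toy sequences, and the threshold passed to closely_related are hypothetical; real studies derive distances from SNP-calling pipelines and choose thresholds to suit their setting.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned sequences differ.

    Positions where either sequence has an ambiguous call ('N') or a gap ('-')
    are skipped. This is one possible convention, assumed here for illustration.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in skip and b not in skip
    )


def pairwise_distances(isolates):
    """Return the SNP distance for every unordered pair of isolate names."""
    return {
        (name_a, name_b): snp_distance(isolates[name_a], isolates[name_b])
        for name_a, name_b in combinations(sorted(isolates), 2)
    }


def closely_related(distances, threshold):
    """List the pairs whose SNP distance is at or below a chosen threshold."""
    return [pair for pair, dist in distances.items() if dist <= threshold]


# Hypothetical toy alignment: three isolates, ten positions each.
ISOLATES = {
    "patient_A": "ACGTACGTAC",
    "patient_B": "ACGTACGTAT",
    "patient_C": "TCGAACGNAT",
}

DISTANCES = pairwise_distances(ISOLATES)

if __name__ == "__main__":
    for pair, dist in DISTANCES.items():
        print(pair, dist)
    print("closely related:", closely_related(DISTANCES, threshold=1))
```

A small distance between two isolates is compatible with recent transmission but does not establish it by itself; as noted in this chapter, the genomic result is interpreted together with epidemiological data.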

Specifically, WGS analyzes approximately 90% of the genome, compared with about 1% for traditional

genotyping methods such as spoligotyping and VNTR.27 WGS thus provides added resolution that is

useful for understanding recent TB transmission, as well as drug-resistance evolution and strain

characterization.27 Previous transmission and systematic review studies show that WGS-based

genotyping can identify strains with greater accuracy and higher resolution compared to

traditional genotyping methods.28-30 Despite these advantages, intensive bioinformatic resources

are required to process and interpret raw genomic data.32 After sequencing, WGS data must be

analyzed using SNP-calling pipelines,2 which have been described in detail by other review

papers.33,34 This genotyping information can then be combined with epidemiological data to

determine whether cases are indicative of recent transmission.24 Future improvements in

cost-effectiveness and ease of data interpretation will allow WGS to become a gold standard in

routine practice among diagnostic and reference laboratories.31

2.3. Reporting guidelines for genomic epidemiology studies

There are currently no reporting guidelines specific to WGS. Sandve et al. proposed ten reporting

guidelines for computational biology research, although these were general suggestions about

reporting software versions and data sharing.35 Lubin et al. provided recommendations for

standardizing the content of NGS variant files in clinical settings, which also discussed reporting

software versions and their parameters.36 To expand upon these limited works, specific reporting

guidelines for WGS are needed.

Formal reporting guidelines do exist for observational studies, however. In 2007, reporting

guidelines for observational studies in epidemiology, called Strengthening the Reporting of

Observational Studies in Epidemiology (STROBE), were published by an international organization called the

EQUATOR network.37 STROBE guidelines consist of a checklist of 22 items that provide

recommendations relating to the abstract, methods, results, and discussion sections of the

article.37

In 2014, reporting guidelines called Strengthening the Reporting of Molecular Epidemiology for

Infectious Diseases (STROME-ID) were published to provide tailored recommendations for

infectious disease studies, including TB.38 These guidelines were developed to increase

transparency of reporting, interpretability of results, and to encourage data-sharing.

STROME-ID guidelines extended the original list of STROBE criteria with 20 more items that

are tailored to genomic epidemiology studies. In total, the STROME-ID guidelines5 comprise 42

criteria for which specific details are recommended in the methods, results or analysis.

Evidence that STROBE guidelines affect reporting quality is mixed. While some systematic

reviews found improved reporting post-publication,6,7 others did not find STROBE to

significantly impact reporting compliance.8,9 Other systematic reviews found that guidelines are

not being appropriately applied, which suggests confusion about the intended use of STROBE

guidelines.10 To date, no studies have investigated adherence to STROME-ID guidelines for

genomic epidemiology studies of TB.

2.4. Data-sharing of genomic data

Pathogen genomics first emerged with the commercial introduction of NGS technology in

2005.39 The falling cost of NGS and its capacity for parallel sequencing have generated an

enormous number of reads; this has been further facilitated by Illumina sequencing platforms,

which currently offer the lowest per-base cost.26

In this era of “big data” where increasingly larger quantities of information are being produced,

open-access biological databases provide researchers access to these datasets for conducting their

own independent analyses.40 Repositories for depositing WGS data include the National Center

for Biotechnology Information’s (NCBI) GenBank database and the European Nucleotide Archive,

which partners with the NCBI and consists of three databases, including the Sequence Read

Archive.41 Despite the importance of data-sharing in advancing bioinformatic and TB

epidemiology research, there is still inadequate data-sharing.3

2.5. Reproducibility of next-generation sequencing research

Studies continue to under-report methods and study limitations,3 which contributes to wasted

time and resources.42,43 The extent of this “reproducibility crisis” was described in a 2016 Nature

survey of over 1500 researchers, in which approximately 70% were unable to replicate

another scientist's experiments.42 Surprisingly, 50% of respondents, who come from scientific,

engineering, and medical disciplines, failed to even reproduce their own data.42 These findings

suggest that greater attention is needed to allow for the verification and transparency of

biomedical research. There are also moral and ethical reasons for openly communicating study

methods and biases, given that public funds are often used to support scientific

research.44

Reproducibility is of particular concern for WGS studies, where there is presently widespread

heterogeneity of WGS analytical pipelines.33,45 These analytical pipelines rely on various

commercial or publicly available bioinformatic software or programs, which all perform to

different standards.28,46 Thus, given the range of tools available for bioinformatic analysis of

WGS data, it is necessary to know the specific version of the base software in order to replicate

and execute the workflow successfully. For instance, a specific version of Java (version 1.8) is

required to execute tools from the GATK or Picard toolkits and workflows built on

this software.47 Moreover, different pipelines may lead to discrepancies in

variant calling, as suggested by studies comparing unique SNP-calling pipelines.48,49 Thus,

understanding the specific characteristics of WGS pipelines (e.g., types of bioinformatic tools

used, and their versions) facilitates reproducibility, and the assessment of bias. Standardization

better allows for comparisons and investigations of cross-border outbreaks and other public

health initiatives.49

2.6. Factors correlated with reporting quality

Several study characteristics and bibliometric indicators have been suggested to be correlated

with reporting quality. In this section, the evidence for correlates of reporting quality will be

discussed.

2.6.1. Geographic affiliation of the authors

No studies have yet examined this using the STROBE framework. Selman et al. did not find

a significant correlation between geographic region of publication and percent compliance with the

Standards for Reporting of Diagnostic Accuracy Studies (STARD) guidelines.50 Ghimire et al.

reported a similar finding using a different framework, the Consolidated Standards of

Reporting Trials (CONSORT).51

2.6.2. Journal Impact Factor (IF)

Often perceived as a metric of study quality by funding agencies and academic employers,52

journal IF is a bibliometric measure that conveys the average number of citations of recent

articles published in a particular journal.53 The few studies examining the association of IF with

reporting quality using STROBE have found mixed results. A systematic review observed that

journals with lower IF (<5) had a greater increase in their STROBE reporting score than journals

with higher IF (> 5) when comparing time periods before and after STROBE publication in

2007.54 Another study found that even prestigious medical journals with the top IF reported only

69.3% of STROBE criteria three years after guideline publication.55 A systematic review that

analyzed this correlation using STARD did not observe a significant relationship between the

number of criteria reported and five-year journal IF.56

2.6.3. H-index (HI)

The HI was developed in 2005 as an alternative indicator of scientific quality to journal IF.57

Briefly, its calculation involves ordering an author’s publications so the most cited is listed first.

The final HI is the largest number h for which the author’s h most-cited papers each have at least h

citations.57 Although the HI is thought to be a clearer and more objective measure of the quality

of papers published by an author,58 its correlation with reporting quality has been poorly

described in the literature. One study has suggested that perceptions of journal quality are

correlated with higher HI.59
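The HI calculation described in this section can be sketched in a few lines. This is a minimal illustration only: the function name and the citation counts are hypothetical and are not drawn from the study data.

```python
def h_index(citations):
    """Return the h-index for a list of per-paper citation counts."""
    # Order papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        # The paper at this rank must have at least `rank` citations.
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for one author's five papers.
print(h_index([10, 8, 5, 4, 3]))  # prints 4
```

In this example, four papers have at least four citations each, but there are not five papers with at least five citations, so the HI is 4.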

2.6.4. Sample size (SS)

Small sample size (e.g., of genomic isolates) may not be representative of the entire target

population, and may lead to bias if inappropriate sampling methods are used, affecting overall

conclusions of WGS drug-susceptibility human studies.28 Simulation studies suggest that larger

sample sizes are correlated with higher reporting quality.60,61 This has been observed among

randomized control studies per CONSORT guidelines,62-64 and per STARD guidelines.50

Chapter 3. Overview of study data and methodology

In this section, the variables used in the dataset will be described. The rationale for the

systematic review and statistical approaches will be discussed. All statistical analyses were

performed using RStudio (version 1.1.456). Ethics approval was not needed for the purposes of

this study.

3.1. Systematic review

In response to the proliferation of pathogen genomic studies in the past decade, this thesis was

conducted to understand their level of reporting. Although systematic and narrative reviews about

WGS have previously been conducted,2,29,65 these studies have not examined reporting quality in

light of their reproducibility issues. Instead, they focused on outbreaks,29 the methodology of

WGS,65 and general issues with WGS-based bioinformatic pipelines.2 These studies are also

limited in their scope, given their small sample sizes of three and twenty-five.29,65

This systematic review will examine the reporting quality of WGS papers based on the

STROME-ID reporting checklist. The detailed methods, including inclusion and exclusion

criteria, are discussed in the registered PROSPERO protocol (CRD42017064395). The theme of

these papers will be discussed. Moreover, variables defined a priori were extracted, as detailed

in Supplemental Dataset 1 (https://drive.google.com/file/d/1r0JIu4q1XfQxlFJWE-VZzyv9da1V_QnM/view?usp=sharing),

including information about bioinformatic tools used,

their versions, and whether or not changes were made to public health interventions based on

genomic data.

This systematic review will also examine whether certain study characteristics are associated

with reporting quality, which is measured as both the count of eligible criteria met and the proportion of

all criteria met.

3.2. Statistical analysis

3.2.1. Descriptive statistics

The median, minimum, maximum, and interquartile range (IQR) for continuous variables IF, HI,

and SS are presented in Table 2. After assessing the distribution of these continuous variables, IF

and SS were categorized. HI was analyzed as a continuous variable (Appendix 1). The categories

for IF were selected arbitrarily, and based on author experience with the metric. Quartiles of SS

were chosen to reflect the low frequency range across the included studies (Supplemental Figure

4), and were also informed by previously defined quartiles by similar studies.66 Due to low

counts of individual countries of the senior author’s primary affiliation, this was analyzed by

continent instead.

Collinearity was tested between all continuous variables. Using a pairwise correlation matrix,

continuous variables, including sample size of patients (SP), were checked graphically for

collinearity (Figure 1). To test this empirically (Table 4), Spearman’s non-parametric correlation

test was done to determine whether both or only one of these should be included in analysis.
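As an illustration of the empirical check, Spearman's rho can be computed as the Pearson correlation of the ranked values. The sketch below uses only the Python standard library; the function names and input values are hypothetical, and the analysis in this thesis was run in RStudio rather than with this code.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired values (e.g., number of isolates and number of patients).
isolates = [1, 2, 3, 4, 5]
patients = [2, 1, 4, 3, 5]
print(round(spearman_rho(isolates, patients), 2))
```

A rho close to 1 or -1 between two candidate predictors would suggest retaining only one of them in the model.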

Dependent variables include the count of eligible criteria met, and the proportion of all criteria

met. The number of STROME-ID criteria met out of all eligible STROME-ID criteria was

tabulated to determine the count of eligible criteria. Proportions were obtained by dividing the

number of criteria met by each paper by the total number of criteria.

3.2.2. Missing data

There was a non-negligible amount (n= 15, 13.16%) of missing IF values, which was assumed to

be missing at random. To address this, the IF for the previous year was used after the

variation of IF between 2013-2018 was assessed (Supplemental Table 3). Given the

little variation across this five-year time period (<1.00 IF), negligible selection bias is expected

for using this simple imputation method compared to complete case analysis to address missing

data.67
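The simple imputation described above can be sketched as follows. The function name, the dictionary layout, and the impact factor values are assumptions made for illustration; they are not taken from the study dataset.

```python
def impute_previous_year(if_by_year, year):
    """Return (value, was_imputed) for a journal's impact factor in `year`.

    if_by_year maps year -> impact factor, with None (or no key) where unavailable.
    """
    value = if_by_year.get(year)
    if value is not None:
        return value, False
    previous = if_by_year.get(year - 1)
    # If the previous year is also missing, the value stays missing.
    return previous, previous is not None


# Hypothetical journal whose 2019 impact factor is not yet available.
journal = {2017: 4.2, 2018: 4.5, 2019: None}
print(impute_previous_year(journal, 2019))  # prints (4.5, True)
```

Returning a flag alongside the value makes it possible to identify imputed records later, for example when comparing against a complete case analysis.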

3.2.3. Sensitivity analyses

Sensitivity analyses were conducted for each of the publication time periods. Articles published

during the six-month and twelve-month lag were excluded to acknowledge that work under

review or in-press at the time of STROME-ID publication would be less likely to be able to

apply the reporting guidelines.
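One way to express the lag-period logic is sketched below. It assumes that articles published during the lag are treated as pre-publication in the main analysis and dropped in the sensitivity analysis. The guideline date used in the example is a placeholder rather than the date used in this thesis, and the helper clamps the day of the month to 28 as a simplification.

```python
from datetime import date


def add_months(d, months):
    """Shift a date forward by whole months, clamping the day to 28 to stay valid."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    return date(year, month, min(d.day, 28))


def classify_period(published, guideline_date, lag_months=6, exclude_lag=False):
    """Label an article 'pre', 'post', or 'excluded' relative to a guideline."""
    cutoff = add_months(guideline_date, lag_months)
    if published >= cutoff:
        return "post"
    if exclude_lag and published >= guideline_date:
        # Sensitivity analysis: drop articles published during the lag.
        return "excluded"
    return "pre"


# Placeholder guideline date (an assumption for illustration only).
guideline = date(2014, 4, 1)
print(classify_period(date(2014, 8, 15), guideline))                    # prints pre
print(classify_period(date(2014, 8, 15), guideline, exclude_lag=True))  # prints excluded
print(classify_period(date(2014, 10, 1), guideline))                    # prints post
```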

These six- and twelve-month lag periods were chosen arbitrarily, as there are no defined standards

regarding publication uptake time. One systematic review used a lag time of eighteen months.68

Other studies did not account for lag time in their analysis, although they analyzed reporting

levels one54,69 and three years70 post-guideline publication.

A sensitivity analysis was also conducted when examining the correlation of geographic region

of last authors’ affiliation with reporting quality. This was done to address instances in which

there was more than one senior author with equal contributions, who were from different

continents. In total, seven papers with multiple last authors whose primary affiliations were from

different continents were identified.

3.2.4. Quasi-Poisson

Poisson regression was used to model the count data in this thesis. Poisson regression is part of a

family of generalized linear models that is used to model count data.71 The general log-linear

regression equation takes the form of Equation 1, given explanatory variable x, and outcome

variable Y. Equation 2 shows the exponentiated form of Equation 1, which is transformed for

easier interpretation of parameter estimates (incidence rate ratio):71

Equation 1: $\log\left(E[Y_i \mid x_i]\right) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$

Equation 2: $E[Y_i \mid x_i] = \exp(\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4)$
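To illustrate the exponentiated form, a coefficient estimated on the log scale can be converted to an incidence rate ratio with a Wald-type 95% confidence interval. The coefficient and standard error below are hypothetical values chosen for illustration, not estimates from this thesis.

```python
import math


def incidence_rate_ratio(beta, se, z=1.96):
    """Exponentiate a log-scale coefficient and its Wald interval limits."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical log-scale coefficient and standard error.
irr, lower, upper = incidence_rate_ratio(beta=0.10, se=0.05)
print(round(irr, 3), round(lower, 3), round(upper, 3))
```

An incidence rate ratio above 1 indicates a higher expected count of criteria met relative to the reference category, and an interval that spans 1 indicates that the association is not statistically significant at the 5% level.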

A central assumption of this statistical model is that the mean (μ) equals the variance (σ²).71

When this assumption is not met, either over- or under-dispersion occurs. Under-dispersion

occurs when μ > σ².

In the presence of under-dispersion, the standard Poisson model is inadequate and can result in

misleading conclusions about the effects of experimental factors or covariates of interest.72

Under-dispersion can be accounted for during quasi-Poisson prediction and/or inference by

estimating the scale parameter,73 whereby the variance is multiplied by a scale factor to allow for

over- or under-dispersion.72 In contrast to traditional Poisson, this offers a more flexible

modelling strategy that allows variances to differ from the expected values.74

3.2.5. Tobit regression

Two-limit tobit regression was used to model the proportion outcome data in this thesis to

account for values between zero and one. There is no commonly accepted method to analyze

proportions. Health economics research has traditionally used generalized linear models

(GLMs);75,76 however, linear modeling may not guarantee that the fitted values will be

constrained within the upper and lower thresholds of the data interval, especially if the data is

naturally bounded or theoretically “censored” above or below certain values. Use of regression

models based on the binomial distribution are also not appropriate for non-binary, continuous

proportions, such as the case for this data set.77

The tobit model allows for the modeling of continuous proportions of values that are restricted to

a closed interval.78 The model works best when there are no excessive values at the endpoints of this

interval, as is the case with this data set; an excess of such values can lead to erroneous inferences.75
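The structure of the two-limit tobit model can be made concrete through its log-likelihood, which treats observations at the limits as censored and observations inside the interval as normally distributed. The following is a minimal sketch under the standard textbook formulation; the function and argument names are assumptions, and it does not reproduce the fitting routine used in this thesis.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def two_limit_tobit_loglik(y, mu, sigma, lower=0.0, upper=1.0):
    """Log-likelihood of observations y under a two-limit tobit model.

    mu holds the linear predictor for each observation and sigma is the
    standard deviation of the latent normal error.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi <= lower:
            # Censored at the floor: P(latent value <= lower).
            total += math.log(_STD_NORMAL.cdf((lower - mi) / sigma))
        elif yi >= upper:
            # Censored at the ceiling: P(latent value >= upper).
            total += math.log(1.0 - _STD_NORMAL.cdf((upper - mi) / sigma))
        else:
            # Uncensored: normal density of the observed value.
            total += math.log(_STD_NORMAL.pdf((yi - mi) / sigma) / sigma)
    return total


# Hypothetical proportions and fitted means, all strictly inside (0, 1).
print(two_limit_tobit_loglik([0.45, 0.50, 0.58], [0.48, 0.50, 0.55], sigma=0.1))
```

Because uncensored observations contribute a density rather than a probability, a small sigma can make each contribution greater than 1, so the total log-likelihood can be positive when the outcome is a proportion.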

Applications of tobit regression to bounded or semi-bounded data have mostly been described in

statistical modeling papers79,80 instead of in observational studies.81 This evidence suggests that

tobit regression performs better or the same as linear regression methods, with the added benefit

of addressing outcome data with particular floor and ceiling limits. In papers comparing

traditional linear mixed models and tobit regression, the tobit model was found to be

superior in model fit, based on model fit parameters and residual plots.78 In other statistical

modeling papers, tobit estimates produced comparable coefficient estimates with other

regression methods.76

3.2.6. Model fit

Pseudo-R2, the Akaike information criterion (AIC), and log-likelihood were used to assess the fit

of quasi-Poisson models (Table 6) and tobit regression (Table 7). For quasi-Poisson models,

which are estimated by quasi-maximum likelihood,82 the pseudo-R2 has been proposed as an

alternative indicator of model fit to the traditional R2 value.83 This value can be interpreted as

the relative reduction in deviance due to the covariates added to the model.84 The formula is

shown in Equation 3.

Equation 3: $\text{pseudo-}R^2 = 1 - \frac{\text{Residual deviance}}{\text{Null deviance}}$

For tobit, the formula for the pseudo-R2 is calculated as shown in Equation 4:

Equation 4: $\text{pseudo-}R^2 = 1 - \frac{L_1}{L_0}$

where $L_1$ is the log-likelihood of the full model, and $L_0$ is that of the constant-only model.85

Chapter 4: Results

This section further elaborates on the results presented in the manuscript.

4.1. Statistical analyses

The total number of included papers ranged in publication years from 2009 to 2019, with most

published in 2015 and onwards (Table 1). The median, minimum, maximum, and IQR for the

independent and dependent variables are presented in Table 2 and 3, respectively. Histogram

plots for the count and proportion are displayed in Appendix 2 and 3 respectively. In the

histogram of counts (Appendix 2), the distribution appears to be right skewed, with a small tail

extending to the right, which is characteristic of the Poisson distribution. Appendix 3 shows that

the proportion data lies within a range of 0 and 1. Testing showed that the count data used in this

study is under-dispersed (Table 4). The ratio of the variance and mean is also <1, which further

confirms under-dispersion.

Table 1. Number of genomic epidemiology of tuberculosis papers per publication year

Publication year of paper Total count of papers

2009 1

2010 2

2011 1

2012 1

2013 8

2014 6

2015 18

2016 12

2017 17

2018 34

2019 13

Table 2. Descriptive statistics of independent variables

Independent variables      Descriptive statistics

Impact factor              Median           4.85
                           IQR              (3.06, 9.09)
                           Min              1.67
                           Max              79.26

H-index                    Median           34.5
                           IQR              (19, 51)
                           Min              1.0
                           Max              88.0

Continent                  North America    32
                           South America    1
                           Africa           6
                           Asia             13
                           Europe           53
                           Oceania          8

Sample size of isolates    Median           83.0
                           IQR              (30, 277)
                           Min              2.0
                           Max              5715

IQR= Interquartile range, Min= minimum, Max= maximum

Visual and empirical tests of collinearity are displayed in Figure 1 and Table 4. Upon visual

inspection, the matrix scatterplot does not suggest any collinearity between IF, HI, and SS.

However, SS and SP appeared to be correlated given the graph’s slight linear trend (third row,

fourth column in Figure 1). Based on Spearman’s rho (0.86, P-value <0.01), and greater

completeness of data for SS, SP was excluded from statistical analysis.

Table 3. Descriptive statistics for dependent variables

Dependent variables    Descriptive statistics

Count                  Median    8
                       Min       4
                       Max       14
                       IQR       (7, 9)

Proportion             Median    0.5
                       Min       0.16
                       Max       0.75
                       IQR       (0.41, 0.58)

IQR= Interquartile range, Min= minimum, Max= maximum

Figure 1. Pairwise correlation matrix for impact factor (IF), H-index (HI), sample size of isolates

(SS), and sample size of patients (SP).

Table 4. Mean and variance of count data suggest under-dispersion

Mean Variance Ratio of variance and mean

8.01 3.44 0.43
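The ratio in Table 4 is the sample variance divided by the sample mean, which would be expected to be close to 1 under a Poisson distribution. A short sketch of this check is given below; the helper function is hypothetical, and only the summary values from Table 4 come from this thesis.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 is expected under Poisson."""
    return variance(counts) / mean(counts)


# Summary values reported in Table 4.
table4_mean, table4_variance = 8.01, 3.44
print(round(table4_variance / table4_mean, 2))  # prints 0.43
```

This unadjusted ratio ignores covariates. In a fitted quasi-Poisson model the scale parameter is instead estimated from the model residuals, so the two values need not coincide.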

4.2. Model fit

The AIC and log-likelihood for both quasi-Poisson and tobit models support the final

multivariate model in Table 6 and 7 respectively. The residual deviance values that were used to

calculate the pseudo-R2 for quasi-Poisson analysis are displayed in Table 5. The pseudo-R2

supported the full quasi-Poisson model.

Table 5. Residual deviance for independent variables

Independent variables Residual deviance

Impact factor 43.11

H-index 46.29

Continent 45.48

Sample size of isolates 45.23

IF + SS 42.96

IF + Continent + SS 41.72

Note: Null deviance for all models is 46.64. IF= Impact factor, HI= H-index, SS= sample size of isolates
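As a worked example, the deviance-based pseudo-R2 can be recomputed from the residual deviances in Table 5 and the null deviance given in the note beneath it. The function name is an assumption; the deviance values are those reported in Table 5.

```python
def deviance_pseudo_r2(residual_deviance, null_deviance):
    """Relative reduction in deviance compared with the null model."""
    return 1.0 - residual_deviance / null_deviance


NULL_DEVIANCE = 46.64  # from the note under Table 5
residual_deviances = {
    "Impact factor": 43.11,
    "Continent": 45.48,
    "Sample size of isolates": 45.23,
    "IF + SS": 42.96,
    "IF + Continent + SS": 41.72,
}
for model, deviance in residual_deviances.items():
    print(model, round(deviance_pseudo_r2(deviance, NULL_DEVIANCE), 2))
```

For example, the impact factor model gives 1 - 43.11/46.64, which rounds to 0.08, and the full model gives 1 - 41.72/46.64, which rounds to 0.11.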

Table 6. Tests of model fit for quasi-Poisson model

Quasi-Poisson model AIC Pseudo-R2 Log-likelihood DF

Impact factor 492.88 0.08 -242.44 4

Continent 497.25 0.02 -243.62 5

Sample size of isolates 495.00 0.03 -243.50 4

IF + SS 498.73 0.08 -242.36 7

IF + SS + Continent 505.50 0.11 -241.74 11

AIC: Akaike information criteria, DF = degrees of freedom, IF= impact factor, SS= sample size of isolates

Table 7. Tests of model fit for tobit model

Tobit model AIC Pseudo-R2 Log-likelihood DF

Impact factor -165.12 -0.07 87.56 221

Continent -155.55 -0.03 83.77 220

Sample size of isolates -167.61 -0.09 88.81 221

IF + SS -166.31 -0.12 91.15 218

IF + SS + Continent -163.05 -0.15 93.53 214

AIC: Akaike information criteria, DF = degrees of freedom, IF= impact factor, SS= sample size of isolates
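The negative pseudo-R2 values in Table 7 can be understood with a short sketch. It assumes that the likelihood-based pseudo-R2 is computed as one minus the ratio of the full-model log-likelihood to the constant-only log-likelihood. The constant-only log-likelihood is not reported in this thesis, so the values used for it below are hypothetical.

```python
def likelihood_pseudo_r2(ll_full, ll_null):
    """One minus the ratio of full-model to constant-only log-likelihood."""
    return 1.0 - ll_full / ll_null


# Full-model log-likelihood for the impact factor model (Table 7) paired with
# a hypothetical constant-only log-likelihood; both are positive.
print(likelihood_pseudo_r2(87.56, 80.0) < 0)      # prints True

# Hypothetical negative log-likelihoods, as are typical for discrete outcomes.
print(likelihood_pseudo_r2(-100.0, -120.0) > 0)   # prints True
```

When both log-likelihoods are positive, as can happen with a continuous bounded outcome, a better-fitting model has a larger log-likelihood, the ratio exceeds 1, and the statistic becomes negative. A more negative value then indicates a larger improvement over the constant-only model.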

Preamble to Manuscript I

The results of this thesis are presented in one manuscript. The manuscript presents a systematic

review of genomic epidemiology studies of TB, as well as the association between study

characteristics and reporting quality.

The appendix to the manuscript is included at the end of the chapter, and provides additional

information on the dataset and study methodology.

The results have previously been presented at:

The Union North American Regional (NAR) Meeting, February 2020. Chicago, Illinois,

United States of America. Poster presentation.

Chapter 5: Manuscript I

Whole genome sequencing for epidemiological studies of tuberculosis: a systematic review

of applications and reporting practices

Cheng B1, Behr MA1, 2, Howden BP3, Cohen T4, Lee RS5,6*

1McGill University, Department of Epidemiology, Biostatistics and Occupational Health, Montreal, Canada

2Infectious Diseases and Immunity in Global Health Program, Research Institute of the McGill University Health Centre, Montreal, Quebec; McGill International TB Centre, Montreal, Quebec, Canada

3The Microbiological Diagnostic Unit Public Health Laboratory, University of Melbourne, Melbourne, Australia

4Yale University, New Haven, United States of America

5University of Toronto, Dalla Lana School of Public Health, Epidemiology Division, Toronto, Canada

6Harvard School of Public Health, Center for Communicable Disease Dynamics, Boston, United States of America

*Address correspondence to:

Dr. Robyn Lee, PhD

Epidemiology Division

Dalla Lana School of Public Health

Health Sciences Building

155 College Street, 6th floor

Toronto, ON M5T3M7

[email protected]

Word count: 3561

Figures: 4

Tables: 4

5.1. Summary

Background: As pathogen genomics become increasingly important in infectious disease

epidemiology and public health, it is essential to assess the quality of studies that use these

approaches. Here, we investigate the reporting practices in genomic epidemiology studies of

tuberculosis (TB) using the 'Strengthening the Reporting of Molecular Epidemiology for

Infectious Diseases’ (STROME-ID) guidelines as a benchmark.

Methods: MEDLINE, Embase Classic and Embase were searched on April 23, 2019. Two

reviewers determined eligibility, and completeness of STROME-ID criteria were extracted. A

pre-post publication analysis of the mean proportion of STROME-ID criteria was done using a

two-tailed t-test. Quasi-Poisson and tobit regression were used to examine associations between

study characteristics and the number, and proportion of criteria completed, respectively.

Results: 976 titles and abstracts were screened; 114 full-texts (2009-2019) met inclusion criteria.

The mean proportion of criteria met was 49·9% (range 16·3-75·0%). Reporting quality did not

change significantly before vs. after STROME-ID publication (0·51 vs. 0·46, P=0·26). The

number of criteria reported (among those applicable to all studies) was not associated with

impact factor, h-index, country of affiliation of the senior author, or sample size (SS). In terms of

reproducibility, 87·7% (n=100) of studies reported which bioinformatic tools were used, but only

33% (n= 33) reported corresponding version numbers. Sequencing data was available for 75·4%

(n= 86).

Conclusion: STROME-ID criteria were not fully met in the majority of genomic epidemiology

studies of TB. The high proportion of studies without sequencing data available highlights a key

concern for reproducibility in this field.

5.2. Introduction

Whole genome sequencing (WGS) has been increasingly used in genomic epidemiological

studies of tuberculosis (TB). Its superior resolution compared to classical genotyping methods

(e.g., restriction fragment length polymorphism or mycobacterial interspersed repetitive units

variable number tandem repeats) provides the opportunity to gain new insights into transmission

of TB and evolution of drug resistance, and potentially inform public health interventions.1-4

However, the ability of WGS to serve these purposes depends on the quality of the studies that

use this technology. This is true not only for TB, but for other pathogens as well. Presently,

heterogeneity of WGS bioinformatic pipelines poses challenges to the standardized reporting and

interpretation of results across genomic epidemiological studies.5, 6 Standardized reporting of

data and software would further facilitate comparison of WGS-based findings, and enable

researchers to assess the validity of published data.7

In 2007, guidelines called 'Strengthening the Reporting of Observational studies in

Epidemiology (STROBE)’ were published. These consisted of 22 criteria8 intended to help

readers better understand and assess the validity of observational studies. More recently, a new

set of guidelines was released in 2014, called the Strengthening the Reporting of Molecular

Epidemiology for Infectious Diseases (STROME-ID).9 These extended the original 22 STROBE

criteria with 20 additional criteria on study design, and reporting of results, that were specific to

genomic epidemiology studies (Supplemental Table 1). In this paper, unless otherwise reported,

we define STROME-ID as the combined set of STROBE and STROME-ID criteria.

The impact of STROBE guidelines on reporting quality has been inconsistent. Some systematic

reviews found improved reporting post-publication of STROBE guidelines,10, 11 while others

found it was not associated with improved reporting.12, 13 One systematic review suggested that

guidelines were not being appropriately applied, even when used, suggesting the guidelines may

lack clarity or be otherwise difficult to fulfill.14 To date, no study has investigated reporting

quality using STROME-ID for pathogen genomic epidemiology. To address this gap, we

systematically reviewed genomic epidemiology studies, using TB as an example, to determine

the extent to which STROME-ID criteria have been reported, and investigated whether specific

study characteristics were associated with reporting practices.

5.3. Methods

Search strategy

This study is registered on PROSPERO (CRD42017064395) and followed Preferred Reporting

Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.15 We initially searched

MEDLINE, Embase Classic and Embase on May 3, 2017 using the terms “tuberculosis” and

“genom* sequencing”. We then updated this search on April 23, 2019. No restrictions were

placed on the start date or geographic location. We also systematically searched the pre-print

server bioRxiv. References of included articles were also hand-searched to ensure no eligible

articles were missed.

Inclusion and exclusion criteria

To be eligible for inclusion, studies needed to include patients with microbiologically-confirmed

TB and needed to have used WGS for typing of strains. Studies must have been published in

English, French or Spanish. As suggested by Field et al.,9 we considered studies to be genomic

epidemiology papers if they investigated the distribution or transmission dynamics of TB across

time, a particular population, or a geographic location in order to inform outbreaks, evaluate

infection control practices or perform surveillance. Studies were also included if they examined

risk factors for transmission (e.g., clustering), or if they distinguished between recurrent cases of

TB as relapse or reinfection. If studies described the evolution of TB strains and drug resistance,

or if they identified and classified new TB strains or lineages, they were included as well.

Finally, studies were included if they investigated the association between strain types or

mutations and clinical outcomes (e.g., death, treatment failure, relapse).

We excluded non-human studies, studies that were exclusively experimental (e.g., in-vitro or

in-vivo animal studies), or those that were purely diagnostic. The latter included studies where WGS

was exclusively used for predicting phenotypic drug resistance, without epidemiological aims.

We also excluded studies whose primary aim was to use WGS to develop a SNP-based typing

method (unless the overall analysis and description of the epidemiology still relied on WGS),

studies that exclusively compared typing methods, and studies with fewer than two patients.

Conference abstracts, editorials, and literature reviews were also excluded.

Data extraction

To determine if manuscripts met eligibility criteria for STROME-ID, two reviewers

independently reviewed titles and abstracts (BC and RSL). Discrepancies were resolved by

discussion and third-party arbitration (TC). One reviewer (BC) was responsible for data

extraction. A second reviewer (RSL) independently checked a random sample consisting of 5%

of all eligible papers, with data extraction of these compared and discussed to clarify

discrepancies prior to extraction for the remaining articles (see Supplemental Methods for

detail).

Each STROME-ID variable was assessed, and scored as ‘complete’ or ‘incomplete’ (or assigned

‘not applicable’, where appropriate). The number of STROME-ID criteria met, and the proportion of

criteria met out of all criteria, were then tabulated for each article, with the denominator for the

proportions excluding criteria that were not applicable (e.g., specific to a different study design).
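The scoring rule described above can be sketched as follows. The function name and the example assessments are hypothetical; the criterion labels are used only as dictionary keys and do not reflect how any particular article was scored.

```python
def score_article(assessments):
    """Return (count_complete, proportion_complete) for one article.

    assessments maps a criterion label to 'complete', 'incomplete',
    or 'not applicable'.
    """
    # Criteria that are not applicable are removed from the denominator.
    applicable = [v for v in assessments.values() if v != "not applicable"]
    met = sum(1 for v in applicable if v == "complete")
    proportion = met / len(applicable) if applicable else None
    return met, proportion


# Hypothetical assessments for a single article.
example = {
    "STROBE-3": "complete",
    "STROME-3.1": "complete",
    "STROME-4.1": "incomplete",
    "STROBE-16a": "not applicable",
}
count, proportion = score_article(example)
print(count, round(proportion, 2))  # prints 2 0.67
```

Here two of the three applicable criteria are complete, so the count is 2 and the proportion is 2/3.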

In addition to this, we analyzed whether certain study characteristics were associated with the

number and proportion of fulfilled STROME-ID criteria, which were specified a priori. These

included sample size (SS), the journal impact factor (IF), the geographic region of senior

author’s primary affiliation, and h-index of the senior author. See Supplemental Methods for

rationale and details on how these data were collected/extracted, and analyzed.

Statistical Analysis

To assess differences in reporting following STROME-ID’s publication, the mean proportions of

completed criteria were compared before and after its publication date. A 6-month lag period

was included to account for articles that were already in press when STROME-ID was published.

Sensitivity analyses were also performed using a 12-month lag period, and excluding articles

published 6 and 12 months post-STROME-ID publication. Differences in mean proportions of

criteria were compared pre- and post-publication using a two-tailed t-test in RStudio

(version 1.1.456). The least and most reported STROME-ID criteria were also qualitatively

assessed to explore differences between periods, excluding criteria that were not eligible for >

20% of articles (Supplemental Figure 1 and 2).

To examine the association between study characteristics and reporting, two main approaches

were used. First, we used quasi-Poisson regression (to account for under-dispersion) with the

number of criteria completed as the dependent variable. Given not all criteria were applicable

across every study, this analysis was restricted to criteria that were applicable across all studies.

Second, we used tobit regression (censored between 0 and 1) to assess the association with the

proportion of criteria that were completed, including all studies in the analysis. The distribution

of IFs from all papers is shown in Supplemental Figure 3; IF was used as a categorical variable,

with categories chosen based on our experience with the metric, and previous studies that

examined correlates with IF.16, 17 For SS, we categorized this into quartiles due to low counts

across a wide range of data (Supplemental Figure 4). HI was analyzed as a linear variable.

Variables that had a P-value of < 0·20 in univariate analyses were included in the final model for

each analysis. Pseudo-R2, the Akaike information criterion, and log-likelihood were calculated to

assist with model selection and evaluate fit.

Missing data

The number of patients was missing for 18·4% (n=21) of articles. IF was also not available for one

article published during the first year of the journal (2013), which we excluded from further

analysis, and from 15 articles published in 2019 (13.16%). For the latter, trends in IF were

reviewed over time to assess the degree of variation (Supplemental Table 2).

Role of the funding source

The funder of the study had no role in study design, data collection, data analysis, data

interpretation, or writing of the report. The corresponding author had full access to all the data in

the study and had final responsibility for the decision to submit for publication.

5.4. Results

A total of 976 titles and abstracts were screened. After full-text review and removal of

duplicates, 114 full-texts from among the original list of articles were eligible for inclusion

(Figure 1). No studies were excluded due to language of publication. A summary of the main

characteristics of included papers is presented in Table 1 (detailed information in Supplemental

Dataset 1). These were classified into themes based on their overall aims (not mutually-

exclusive: transmission, evolution, strain identification and clinical outcomes; Supplemental

Results).

Reporting practices of included articles

Overall, we found that the proportion of applicable STROME-ID criteria met among the

included papers in this study ranged from 16·3-75·0% (mean 49·9%, ± 11·88%). There was no

significant difference between the average proportion of criteria before and after guideline

publication (Table 2). For both the pre- and post-publication periods, STROME-4.1 (definitions

for molecular terminology) and STROME-ID 8.1 (methods used to detect multiple-strain

infections) were among the least reported criteria (pre-publication: 0%, post-publication:

11.34%, and pre-publication: 5.88%, post-publication: 7.45%, respectively). Across both time

periods, STROBE-3 (study objectives and hypotheses) and STROME-3.1 (the epidemiological

objectives of using molecular typing) were among the most reported criteria (pre-publication:

94.12%, post-publication: 96.91%, and pre-publication: 100%, post-publication: 94.85%,

respectively). The same fifteen

criteria were ‘not eligible’ in ≥ 20% of papers for both pre- and post-publication periods

(Supplemental Figures 1 and 2); of these, 12 (80%) were from the original STROBE guidelines,

and pertained to specific epidemiological study designs and/or statistical analyses that are less

likely to be used in genomic epidemiology studies.
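
A minimal sketch of how the per-article proportion of applicable criteria could be tabulated, assuming each criterion is scored as complete, incomplete, or not eligible, and that 'not eligible' criteria are dropped from the denominator. The label strings, the function name, and the example article are hypothetical.

```python
def proportion_met(scores):
    """Percentage of applicable criteria met for one article.

    scores maps a criterion name to 'complete', 'incomplete', or 'not eligible'.
    Criteria scored 'not eligible' are excluded from the denominator.
    Returns None when no criterion is applicable.
    """
    applicable = [s for s in scores.values() if s != "not eligible"]
    if not applicable:
        return None
    met = sum(1 for s in applicable if s == "complete")
    return 100 * met / len(applicable)


# Hypothetical article: two of four applicable criteria met.
example_article = {
    "STROBE-3": "complete",
    "STROME-ID 3.1": "complete",
    "STROME-ID 4.1": "incomplete",
    "STROME-ID 8.1": "incomplete",
    "STROBE-16a": "not eligible",
}
print(proportion_met(example_article))  # 50.0
```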

The average proportion for each individual STROME-ID criterion is shown in Figures 3 and 4 for

pre- and post-publication periods, respectively. During the pre-publication period (Figure 3),

there were 6 STROME-ID criteria that were not completed at all, while during the post-publication

period, a single criterion was not completed (STROBE-16a). Similar results were obtained in

sensitivity analyses employing a 12-month lag and those excluding articles published during the

lag period, respectively (Supplemental Figures 5, 6, and 7).

Association of Study Characteristics

We initially considered sample size both in terms of the number of isolates and number of

individual patients. However, Spearman’s rho suggested evidence of collinearity between these

variables (0.86, P-value <0.01). In light of this, and missing data for the number of patients

(n=21 articles, 18·4%), the sample size of isolates was used for further analysis (SS). When

examining the IF from 2013 to 2018 for articles published in 2019, there was minimal variation

across these years (Supplemental Table 2); therefore, the 2018 values were used. One paper in

2013 did not have an IF; this was excluded from the analysis. Moreover, due to low individual

country counts, we analyzed author affiliation by continent; there was only one count of South

America, which was subsequently combined with North America in the category “Americas”.

See Supplemental Table 3 for counts of papers per continent.
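
The collinearity check described above can be sketched with a rank correlation computed from first principles. This is only an illustration: the isolate and patient counts below are invented, and the helper names are assumptions rather than the analysis code actually used.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-article counts (not taken from the included studies).
n_isolates = [10, 25, 40, 120, 300]
n_patients = [8, 30, 20, 100, 250]
rho = spearman_rho(n_isolates, n_patients)
```

A rho close to 1, as reported above for isolates and patients, indicates that the two sample-size measures carry largely the same information, so only one needs to enter the model.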

Univariate and multivariate analyses for quasi-Poisson and tobit regression models are presented

in Tables 3 and 4, respectively. As shown, HI did not meet criteria for inclusion in the full

multivariate model for either quasi-Poisson or tobit regression models. There was no association

between SS, IF, or geographic region of the senior author and the number of STROME-ID

criteria completed. Similar results were found in the multivariate tobit regression analysis,

although SS ≥ 277 was significantly associated with proportion of criteria met (P-value < 0.01).

There were seven papers with equal last authors whose primary affiliations were from different

continents; sensitivity analyses excluding these manuscripts yielded similar results

(Supplemental Tables 4 and 5).

Data-Sharing

As STROME-ID aims to support transparent reporting practices9, which is important for

reproducibility, we also investigated 1) whether authors reported the bioinformatics tools used,

along with corresponding version numbers for software, and 2) whether studies had uploaded

their genomic data to an open-access sequence archive. 87.7% (n=100) of articles reported the

names of bioinformatic tools; however, only 33 (33%) of these articles provided version numbers for

all of them (Supplemental Dataset 1). 75.4% (n=86) of papers reported accession numbers for

their raw genomic data (Supplemental Table 6).
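
The percentages in this paragraph can be re-derived from the reported counts. The check below assumes the 114 included articles as the denominator, except for the version-number figure, which appears to be calculated among the 100 articles that named their tools; the helper name is illustrative.

```python
TOTAL_ARTICLES = 114  # full texts included in the review


def pct(count, total, digits=1):
    """Percentage of count out of total, rounded to the given number of digits."""
    return round(100 * count / total, digits)


named_tools = pct(100, TOTAL_ARTICLES)    # articles naming bioinformatic tools
all_versions = pct(33, 100)               # of those, articles versioning every tool
with_accession = pct(86, TOTAL_ARTICLES)  # articles reporting accession numbers

print(named_tools, all_versions, with_accession)  # 87.7 33.0 75.4
```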

Effect on clinical and public health interventions

Given that genomic epidemiology studies aim to inform public health, we investigated whether

any articles reported clinical or public health actions as a result of their findings. Possibly due to

the retrospective nature of most of these studies, only 3 (2.6%) of included studies reported such

changes; specifically, WGS results helped identify linked cases, guide tailored drug treatment

based on drug-resistance analysis, and inform epidemiological investigations.18-20 It was noted

that one study reported their WGS findings to national tuberculosis surveillance programs, but

subsequent public health intervention was not possible because of the region’s political

instability.21

5.5. Discussion

STROME-ID was developed by an interdisciplinary team with expertise in infection control and

infectious diseases9 to facilitate reporting of study variables that were considered to be critical

for assessment of bias and study quality. Herein, we have used STROME-ID as the framework to

evaluate the reporting and transparency of genomic epidemiology studies of tuberculosis. This

comprehensive systematic review explored the application of WGS to genomic epidemiological

studies, completion of STROME-ID guidelines, and the association of specific study

characteristics with the degree of reporting.

We initially hypothesized that there would be improved reporting following the publication of

STROME-ID guidelines; however, we found no evidence of this in the current study. Only

~50%, on average, of STROME-ID criteria were completed before and after their publication, a

finding similar to that from other systematic reviews that have evaluated reporting quality post-

publication of STROBE. The proportion of criteria completed in these reviews ranged from

51·4%-76·5%.11, 12, 21, 22 While the proportions of criteria completed pre and post STROME-ID

publication were similar, we note that more criteria were completed, at least to a small

degree, in post-publication, as there were fewer criteria that were never completed in the post-

publication period (Figures 2 and 3). However, this could simply be due to undocumented

temporal changes, such as increased demand for reproducibility, and therefore unrelated to

STROME-ID.

There may be several reasons for the observed low reporting of STROME-ID criteria. That only

one included article specifically cited these guidelines20 suggests a lack of awareness may be an

issue.23 Some studies have also shown that formal journal endorsement of STROBE reporting

guidelines improves reporting adherence,24, 25 but to our knowledge, no publishers require

authors to follow and report adherence to STROME-ID guidelines. Other practical limitations, such

as article word count and lack of online supplements, could have also influenced reporting

practices. That we did not find a single article that completed all STROME-ID criteria also

suggests that many of the criteria in these guidelines may be too vague and/or difficult to employ

in practice. Further investigation is needed to evaluate this.

In terms of which criteria were less likely to be reported, we found STROME-ID criteria that

concerned key definitions, methods and potential limitations to be more poorly reported. While it

may seem trivial that the least completed STROME-ID criteria related to the defining of

molecular terminology, we would argue that standardization of basic microbiological

terminology is essential to allow for clear comparisons between studies and correct interpretation

of results for public health. Despite this, it has been suggested that, even in the same academic

field, terms such as strain, isolate, and clone may be used differently by different researchers.26

In addition to this, we note that STROME-ID 8·1 (methods for detecting multi-strain infections)

was also reported poorly across the entire study period; while this criterion was investigated by

some of the included papers, methods for discriminating within-host diversity using WGS data

are an area of active research,27, 28 which may explain why these were less frequently discussed.

Journal IF has been frequently used as an indicator of quality,29 by funding organizations30, 31 and

even for academic promotion.31 However, our analyses suggest that reporting quality is not linked

to IF, adjusting for SS and geographic region of publication. Similarly, we found no association

between HI and reporting quality. This reinforces the limitations of such indicators as correlates

of the quality of scientific publications, supporting other recent studies.30, 32, 33 Moreover, SS was

not found to be associated with the number of criteria completed; studies with 153-276 isolates

completed a similar mean number of criteria as those with ≥ 277 isolates. While SS ≥ 277 was

associated with a higher proportion of criteria being reported, this was equivalent to a < 10%

increase compared to the reference of < 30 samples, and only a 2.0% difference from 153-276,

the adjacent category. Therefore, while this is statistically significant, we suspect this is not an

epidemiologically meaningful difference.

In addition to STROME-ID criteria, we also investigated whether bioinformatics tools (at a

minimum) were well-documented in TB genomic epidemiology papers as reproducibility is a

critical concern in genomic studies.34, 35 Although we found articles frequently reported the name

of the tool, we found that their corresponding version number of the software was reported much

less frequently - consistent with a recent analysis of RNA-seq methodology.36 The inclusion of

version numbers is essential to evaluate bias, reproduce workflows and compare results across

studies, which, as Simoneau et al. propose, suggests the need for standardized reporting of these

methodologic details.36 Even more surprisingly, we found that nearly 1/4 of studies did not

provide a Sequence Read Archive or GenBank accession number for their sequencing data -

with no improvement across the whole study period (Supplemental Dataset 1). This is

problematic; it not only prevents others from reproducing analyses and verifying others’

results,37 but in the context of infectious diseases, this can hinder public health investigations that

rely on global strain depositions for genomic context and/or for evaluation of cross-jurisdictional

transmission. We therefore suggest that data deposition should be a requirement for publication,

rather than just a ‘social norm’ in genomic epidemiology. However, such a change will be

unlikely without collaboration (and enforcement) by publishers.34

Overall, this study has a number of strengths. First, this represents a comprehensive review of

reporting practices in TB genomic epidemiology studies, starting with the first publication in TB

genomic epidemiology (2009)38 and including a search of unpublished literature. Using

STROME-ID guidelines, we have identified key gaps in current reporting practices which may

affect interpretation of results; this adds to recent work that highlighted the implications of

differences in analytic pipelines.4 To our knowledge, this is the first study to examine the

application of STROME-ID guidelines - to TB or any other pathogen - and may serve as a

model for other such investigations. In terms of analysis, we employed a rigorous analytic

approach, and conducted numerous sensitivity analyses to assess the robustness of our results,

lending further support to our inferences. Finally, in addition to STROME-ID criteria, we also

examined variables related to reproducibility - highlighting that even in a field that has

(arguably) embraced open-science, a large proportion of studies continue to not share their

underlying genomic data.

There are several limitations of this work. First, we note that, given that the STROME-ID

guidelines were only published in 2014, there may not have been enough time for widespread

uptake of these reporting guidelines at the time this study was conducted. However, as we did

not observe improved reporting even in 2019 - five years after publication - we

consider this to be somewhat unlikely. This view is supported by other studies suggesting low

adherence to STROBE post-publication.12, 13, 39 Furthermore, due to the limited number of

studies in each time period, we were not able to conduct an analysis controlling for secular trends

(e.g., an interrupted time-series). However, as we did not see evidence of any such trends on

visual assessment by year, this is unlikely to influence our pre/post comparison, and in our

regression analyses, we specifically accounted for time by using IF for the year of publication.

We also note that, as bioinformatics pipelines are not yet standardized,4 our review of reporting

bioinformatics tools was qualitative and did not require adherence to a specific pipeline or set of

steps. Had we required a minimum set of tools and/or analytic steps be reported, we expect this

would have painted an even worse picture of reproducibility of these pipelines. Finally, we did

not separate STROME-ID criteria that required multiple pieces of information (e.g., STROBE-

19, which required reporting of both limitations and direction of bias); thus, if the entire criterion

was not met, it was assigned “incomplete”. Similarly, for bioinformatics version numbers, we

considered reporting to be complete only if steps were reported with versions for all included

tools; there may be differences in reporting version numbers across steps in the analysis.

5.6. Conclusion

In this comprehensive review, we systematically examined reporting quality using STROME-ID

as a benchmark. We have shown that, in general, only ~50% of STROME-ID criteria were

completed. While the current study is limited to TB, we anticipate that many of these reporting

and transparency issues also apply to genomic epidemiology studies of other pathogens.

The reasons underlying this low level of reporting are unclear; similar reporting practices have

been found with other guidelines for other types of studies.40, 41 Possible reasons include

adherence to strict word limits, low author awareness and/or understanding of guidelines, and

possibly, resistance to change. Alternatively, these guidelines may be too difficult

to implement in practice. Further study is warranted to investigate these hypotheses.

Finally, in addition to STROME-ID, we also identified key reproducibility issues in many

studies, pertaining to methods of analysis and data sharing. For the latter, we suggest that data

deposition should be more than a ‘social norm’ in genomic epidemiology - it should be a

requirement for publication. This will require active support from journals, with real consequences

for failing to meet this obligation.36

5.7. Acknowledgements

Contributions

BC was responsible for screening abstracts and titles for inclusion, data extraction, statistical

analysis, making the tables and figures, interpreting the data, and writing the first draft of the

manuscript. MAB assisted with interpreting the data, reviewed drafts of the manuscript, and co-

supervised BC. BPH and TC contributed to the protocol development, and reviewed the final

draft of the manuscript. TC also served as arbitrator for disagreement in study inclusion. RSL

conceived and led the study, designed the protocol and ran the searches, screened abstracts and

titles for inclusion, guided statistical analyses and interpretation of the data, wrote the first draft

of the manuscript with BC, and co-supervised BC.

Funding

BC was supported by a CIHR Frederick Banting and Charles Best graduate award. MAB holds a

CIHR Foundation Grant (FDN-148362). BPH holds a Practitioner Fellowship from the

National Health and Medical Research Council (Australia). TC holds grants from the National

Institutes of Health, USA (R01 AI112438 and U54GM088558).

Declaration of interest

The authors declare no conflicts of interest.

Manuscript references

1. Roetzer A, Diel R, Kohl TA, et al. Whole genome sequencing versus traditional

genotyping for investigation of a Mycobacterium tuberculosis outbreak: A longitudinal molecular

epidemiological study. PLoS Medicine 2013; 10(2): e1001387.

2. Berg JS, Khoury MJ, Evans JP. Deploying whole genome sequencing in clinical practice

and public health: Meeting the challenge one bin at a time. Genet Med 2011; 13(6): 499-504.

3. Lee RS, Behr MA. The implications of whole-genome sequencing in the control of

tuberculosis. Ther Adv Infect Dis 2016; 3(2): 47-62.

4. Meehan CJ, Goig GA, Kohl TA, et al. Whole genome sequencing of Mycobacterium

tuberculosis: current standards and open issues. Nat Rev Microbiol 2019; 17(9): 533-45.

5. Peng RD, Dominici F, Zeger SL. Reproducible epidemiologic research. Am J Epidemiol

2006; 163(9): 783-9.

6. Kwong JC, McCallum N, Sintchenko V, Howden BP. Whole genome sequencing in

clinical and public health microbiology. Pathology 2015; 47(3): 199-210.

7. Phelan J, O’Sullivan DM, Machado D, et al. The variability and reproducibility of whole

genome sequencing technology for detecting resistance to anti-tuberculous drugs. Genome

Medicine 2016; 8(1): 132.

8. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. The

Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement:

guidelines for reporting observational studies. J Clin Epidemiol 2008; 61(4): 344-9.

9. Field N, Cohen T, Struelens MJ, et al. Strengthening the Reporting of Molecular

Epidemiology for Infectious Diseases (STROME-ID): an extension of the STROBE statement.

Lancet Infect Dis 2014; 14(4): 341-52.

10. Sorensen AA, Wojahn RD, Manske MC, Calfee RP. Using the strengthening the

reporting of observational studies in epidemiology (STROBE) statement to assess reporting of

observational trials in hand surgery. J Hand Surg Am 2013; 38(8): 1584-9.

11. Agha RA, Fowler AJ, Limb C, et al. Impact of the mandatory implementation of

reporting guidelines on reporting quality in a surgical journal: A before and after study. Int J

Surg 2016; 30: 169-72.

12. Bastuji-Garin S, Sbidian E, Gaudy-Marqueste C, et al. Impact of STROBE statement

publication on quality of observational study reporting: interrupted time series versus before-

after analysis. PLoS ONE 2013; 8(8): e64733.

13. Rao A, Brück K, Methven S, et al. Quality of reporting and study design of CKD cohort

studies assessing mortality in the elderly before and after STROBE: a systematic review. PLoS

ONE 2016; 11(5): e0155078.

14. da Costa BR, Cevallos M, Altman DG, Rutjes AWS, Egger M. Uses and misuses of the

STROBE statement: bibliographic study. BMJ Open 2011; 1(1).

15. Liberati A, Altman DG, Tetzlaff J, et al. The PRISMA statement for reporting systematic

reviews and meta-analyses of studies that evaluate health care interventions: explanation and

elaboration. Ann Intern Med 2009; 151(4): W65-W94.

16. Kuroki LM, Allsworth JE, Peipert JF. Methodology and analytic techniques used in

clinical research: associations with journal impact factor. Obstet Gynecol 2009; 114(4): 877-84.

17. Falagas ME, Kouranos VD, Michalopoulos A, Rodopoulou SP, Batsiou MA,

Karageorgopoulos DE. Comparison of the distribution of citations received by articles published

in high, moderate, and low impact factor journals in clinical medicine. Intern Med J 2010; 40(8):

587-91.

18. Cabibbe AM, Trovato A, De Filippo MR, Ghodousi A, Rindi L, Garzelli C, et al.

Countrywide implementation of whole genome sequencing: an opportunity to improve

tuberculosis management, surveillance and contact tracing in low incidence countries. Eur

Respir J 2018; 51(6): 1800387.

19. Genestet C, Tatai C, Berland JL, Claude JB, Westeel E, Hodille E, et al. Prospective

whole-genome sequencing in tuberculosis outbreak investigation, France, 2017-2018. Emerg

Infect Dis 2019; 25(3): 589-92.

20. Walker TM, Merker M, Knoblauch AM, et al. A cluster of multidrug-resistant

Mycobacterium tuberculosis among patients arriving in Europe from the Horn of Africa: a

molecular epidemiological study. Lancet Infect Dis 2018; 18(4): 431-40.

21. Parsons NR, Hiskens R, Price CL, Achten J, Costa ML. A systematic survey of the

quality of research reporting in general orthopaedic journals. J Bone Joint Surg Br 2011; 93(9):

1154-9.

22. Hendriksma M, Joosten MHMA, Peters JPM, Grolman W, Stegeman I. Evaluation of the

quality of reporting of observational studies in otorhinolaryngology - Based on the STROBE

statement. PLoS ONE 2017; 12(1): e0169316.

23. Sharp MK, Bertizzolo L, Rius R, Wager E, Gómez G, Hren D. Using the STROBE

statement: survey findings emphasized the role of journals in enforcing reporting guidelines. J

Clin Epidemiol 2019; 116: 26-35.

24. Sharp MK, Tokalić R, Gómez G, Wager E, Altman DG, Hren D. A cross-sectional

bibliometric study showed suboptimal journal endorsement rates of STROBE and its extensions.

J Clin Epidemiol 2019; 107: 42-50.

25. Sharp MK, Utrobicic A, Gomez G, Cobo E, Wager E, Hren D. The STROBE extensions:

protocol for a qualitative assessment of content and a survey of endorsement. BMJ Open 2017;

7(10).

26. Van Belkum A, Tassios PT, Dijkshoorn L, et al. Guidelines for the validation and

application of typing methods for use in bacterial epidemiology. Clin Microbiol Infect 2007;

13(s3): 1-46.

27. Wyllie DH, Davidson JA, Grace Smith E, Rathod P, Crook DW, Peto TEA, et al. A

Quantitative evaluation of MIRU-VNTR typing against whole-genome sequencing for

identifying Mycobacterium tuberculosis transmission: a prospective observational cohort study.

EBioMedicine 2018; 34: 122-30.

28. Martin MA, Lee RS, Cowley LA, Gardy JL, Hanage WP. Within-host Mycobacterium tuberculosis diversity and its utility for

inferences of transmission. Microb Genom 2018; 4(10).

29. Lee KP, Schotland M, Bacchetti P, Bero LA. Association of journal quality indicators

with methodological quality of clinical research articles. JAMA 2002; 287(21): 2805-8.

30. Bornmann L, Williams R. Can the journal impact factor be used as a criterion for the

selection of junior researchers? A large-scale empirical study based on ResearcherID data. J

Informetr 2017; 11(3): 788-99.

31. Retzer V, Jurasinski G. Towards objectivity in research evaluation using bibliometric

indicators – A protocol for incorporating complexity. Basic Appl Ecol 2009; 10(5): 393-400.

32. Oswald A. An examination of the reliability of prestigious scholarly journals: evidence

and implications for decision-makers. Economica 2007; 74(293): 21-31.

33. Waltman L, Costas R, Jan van Eck N. Some Limitations of the H Index: A Commentary

on Ruscio and Colleagues' Analysis of Bibliometric Indices. Measurement 2012; 10(3): 172-5.

34. Nekrutenko A, Taylor J. Next-generation sequencing data interpretation: enhancing

reproducibility and accessibility. Nature Reviews Genetics 2012; 13(9): 667-72.

35. Reality check on reproducibility. Nature 2016; 533(7604).

36. Simoneau J, Dumontier S, Gosselin R, Scott MS. Current RNA-seq methodology

reporting limits reproducibility. Brief Bioinform 2019.

37. Miyakawa T. No raw data, no science: another possible source of the reproducibility

crisis. Mol Brain 2020; 13(1): 24.

38. Bryant JM, Schürch AC, van Deutekom H, et al. Inferring patient to patient transmission

of Mycobacterium tuberculosis from whole genome sequencing data. BMC Infect Dis 2013;

13(1): 110.

39. Pouwels KB, Widyakusuma NN, Groenwold RHH, Hak E. Quality of reporting of

confounding remained suboptimal after the STROBE guideline. J Clin Epidemiol 2016; 69: 217-

24.

40. Ghimire S, Kyung E, Lee H, Kim E. Oncology trial abstracts showed suboptimal

improvement in reporting: a comparative before-and-after evaluation using CONSORT for

Abstract guidelines. J Clin Epidemiol 2014; 67(6): 658-66.

41. Fleming PS, Buckley N, Seehra J, Polychronopoulou A, Pandis N. Reporting quality of

abstracts of randomized controlled trials published in leading orthodontic journals from 2006 to

2011. Am J Orthod Dentofacial Orthop 2012; 142(4): 451-8.

42. Al-Ghafli H, Kohl TA, Merker M, et al. Drug-resistance profiling and transmission

dynamics of multidrug-resistant Mycobacterium tuberculosis in Saudi Arabia revealed by whole

genome sequencing. Infect Drug Resist 2018; 11: 2219-29.

43. Alaridah N, Hallback ET, Tangrot J, et al. Transmission dynamics study of tuberculosis

isolates with whole genome sequencing in southern Sweden. Sci Rep 2019; 9.

44. Arandjelovic I, Merker M, Richter E, et al. Longitudinal Outbreak of Multidrug-Resistant

Tuberculosis in a Hospital Setting, Serbia. Emerg Infect Dis 2019; 25(3): 555-8.

45. Arnold A, Witney AA, Vergnano S, et al. XDR-TB transmission in London: Case

management and contact tracing investigation assisted by early whole genome sequencing. J

Infect 2016; 73(3): 210-8.

46. Auld SC, Shah NS, Mathema B, et al. Extensively drug-resistant tuberculosis in South

Africa: genomic evidence supporting transmission in communities. Eur Respir J 2018; 52(4).

47. Ayabina D, Ronning JO, Alfsnes K, et al. Genome-based transmission modeling

separates imported tuberculosis from recent transmission within an immigrant population.

Microb Genom 2018; 4(10): e000219.

48. Bainomugisa A, Lavu E, Hiashiri S, et al. Multi-clonal evolution of multi-drug-

resistant/extensively drug-resistant Mycobacterium tuberculosis in a high-prevalence setting of

Papua New Guinea for over three decades. Microb Genom 2018; 4(2): 000147.

49. Bouzouita I, Cabibbe AM, Trovato A, et al. Whole-genome sequencing of drug-resistant

Mycobacterium tuberculosis strains, Tunisia, 2012-2016. Emerg Infect Dis 2019; 25(3): 547-50.

50. Bjorn-Mortensen K, Soborg B, Koch A, et al. Tracing Mycobacterium tuberculosis

transmission by whole genome sequencing in a high incidence setting: a retrospective

population-based study in East Greenland. Sci Rep 2016; 6: 33180.

51. Black PA, de Vos M, Louw GE, et al. Whole genome sequencing reveals genomic

heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates. BMC Genomics

2015; 16(1): 857.

52. Brown TS, Narechania A, Walker JR, et al. Genomic epidemiology of Lineage 4

Mycobacterium tuberculosis subpopulations in New York City and New Jersey, 1999-2009.

BMC Genomics 2016; 17(1): 947.

53. Bryant JM, Harris SR, Parkhill J, et al. Whole-genome sequencing to establish relapse or

re-infection with Mycobacterium tuberculosis: A retrospective observational study. Lancet

Respir Med 2013; 1(10): 786-92.

54. Bui DP, Oren E, Roe DJ, et al. A Case-Control Study to Identify Community Venues

Associated with Genetically-clustered, Multidrug-resistant Tuberculosis Disease in Lima, Peru.

Clin Infect Dis 2018; 68(9): 1547-55.

55. Casali N, Nikolayevskyy V, Balabanova Y, et al. Microevolution of extensively drug-

resistant tuberculosis in Russia. Genome Research 2012; 22(4): 735-45.

56. Casali N, Nikolayevskyy V, Balabanova Y, et al. Evolution and transmission of drug-

resistant tuberculosis in a Russian population. Nature Genetics 2014; 46(3): 279-86.

57. Casali N, Broda A, Harris SR, Parkhill J, Brown T, Drobniewski F. Whole Genome

Sequence Analysis of a Large Isoniazid-Resistant Tuberculosis Outbreak in London: A

Retrospective Observational Study. PLoS Medicine 2016; 13(10): e1002137.

58. Chatterjee A, Nilgiriwala K, Saranath D, Rodrigues C, Mistry N. Whole genome

sequencing of clinical strains of Mycobacterium tuberculosis from Mumbai, India: A potential

tool for determining drug-resistance and strain lineage. Tuberculosis 2017; 107: 63-72.

59. Clark TG, Mallard K, Coll F, et al. Elucidating emergence and transmission of multidrug-

resistant tuberculosis in treatment experienced patients by whole genome sequencing. PLoS ONE

2013; 8(12): e83012.

60. Cohen KA, Abeel T, Manson McGuire A, et al. Evolution of Extensively Drug-Resistant

Tuberculosis over Four Decades: Whole Genome Sequencing and Dating Analysis of

Mycobacterium tuberculosis Isolates from KwaZulu-Natal. PLoS Medicine 2015; 12(9):

e1001880.

61. Comas I, Hailu E, Kiros T, et al. Population Genomics of Mycobacterium tuberculosis in

Ethiopia Contradicts the Virgin Soil Hypothesis for Human Tuberculosis in Sub-Saharan Africa.

Current Biology 2015; 25(24): 3260-6.

62. Comas I, Coscolla M, Luo T, et al. Out-of-Africa migration and Neolithic coexpansion of

Mycobacterium tuberculosis with modern humans. Nat Genet 2013; 45(10): 1176-82.

63. Coscolla M, Barry PM, Oeltmann JE, et al. Genomic epidemiology of multidrug-resistant

Mycobacterium tuberculosis during transcontinental spread. J Infect Dis 2015; 212(2): 302-10.

64. Dheda K, Limberis JD, Pietersen E, et al. Outcomes, infectiousness, and transmission

dynamics of patients with extensively drug-resistant tuberculosis and home-discharged patients

with programmatically incurable tuberculosis: a prospective cohort study. Lancet Respir Med

2017; 5(4): 269-81.

65. Dixit A, Freschi L, Vargas R, et al. Whole genome sequencing identifies bacterial factors

affecting transmission of multidrug-resistant tuberculosis in a high-prevalence setting. Sci Rep

2019; 9.

66. Doroshenko A, Pepperell CS, Heffernan C, et al. Epidemiological and genomic

determinants of tuberculosis outbreaks in First Nations communities in Canada. BMC Med 2018;

16.

67. Eldholm V, Monteserin J, Rieux A, et al. Four decades of transmission of a multidrug-

resistant Mycobacterium tuberculosis outbreak strain. Nature Communications 2015; 6.

68. Fiebig L, Kohl TA, Popovici O, et al. A joint cross-border investigation of a cluster of

multidrug-resistant tuberculosis in Austria, Romania and Germany in 2014 using classic,

genotyping and whole genome sequencing methods: Lessons learnt. Eurosurveillance 2017;

22(2).

69. Gardy JL, Johnston JC, Sui SJ, et al. Whole-genome sequencing and social-network

analysis of a tuberculosis outbreak. N Engl J Med 2011; 364(8): 730-9.

70. Gautam SS, Aogain MM, Cooley LA, et al. Molecular epidemiology of tuberculosis in

Tasmania and genomic characterisation of its first known multi-drug resistant case. PLoS ONE

2018; 13(2): e0192351.

71. Gautam SS, Mac Aogain M, Bower JE, Basu I, O'Toole RF. Differential carriage of

virulence-associated loci in the New Zealand Rangipo outbreak strain of Mycobacterium

tuberculosis. Infectious Diseases 2017; 49(9): 680-8.

72. Glynn JR, Guerra-Assuncao JA, Houben RM, et al. Whole Genome Sequencing Shows a

Low Proportion of Tuberculosis Disease Is Attributable to Known Close Contacts in Rural

Malawi. PLoS ONE 2015; 10(7): e0132840.

73. Guerra-Assuncao JA, Crampin AC, Houben RM, et al. Large-scale whole genome

sequencing of M. tuberculosis provides insights into transmission in a high prevalence area. eLife

2015; 4: 03.

74. Guerra-Assunçao JA, Houben RMGJ, Crampin AC, et al. Recurrence due to relapse or

reinfection with Mycobacterium tuberculosis: a whole-genome sequencing approach in a large,

population-based cohort with a high HIV infection prevalence and active follow-up. J Infect Dis

2015; 211(7): 1154-63.

75. Guthrie JL, Delli Pizzi A, Roth D, et al. Genotyping and Whole-Genome Sequencing to

Identify Tuberculosis Transmission to Pediatric Patients in British Columbia, Canada, 2005-

2014. J Infect Dis 2018; 218(7): 1155-63.

76. Ho ZJM, Chee CBE, Ong RTH, et al. Investigation of a cluster of multi-drug resistant

tuberculosis in a high-rise apartment block in Singapore. Int J Infect Dis 2018; 67: 46-

51.

77. Holden KL, Bradley CW, Curran ET, et al. Unmasking leading to a healthcare worker

Mycobacterium tuberculosis transmission. Journal of Hospital Infection 2018; 100(4): E226-

E32.

78. Holt KE, McAdam P, Thai PVK, et al. Frequent transmission of the Mycobacterium

tuberculosis Beijing lineage and positive selection for the EsxW Beijing variant in Vietnam. Nat

Genet 2018; 50(6): 849-56.

79. Huang H, Ding N, Yang T, et al. Cross-sectional whole-genome sequencing and

epidemiological study of multidrug-resistant Mycobacterium tuberculosis in China. Clin Infect

Dis 2019; 69(30): 405-13.

80. Ioerger TR, Koo S, No E-G, et al. Genome analysis of multi- and extensively-drug-

resistant tuberculosis from KwaZulu-Natal, South Africa. PLoS ONE 2009; 4(11): e7778.

81. Ioerger TR, Feng Y, Chen X, et al. The non-clonality of drug resistance in Beijing-

genotype isolates of Mycobacterium tuberculosis from the Western Cape of South Africa. BMC

Genomics 2010; 11:670.

82. Ismail NA, Omar SV, Joseph L, et al. Defining bedaquiline susceptibility, resistance,

cross-resistance and associated genetic determinants: a retrospective cohort study. EBioMedicine

2018; 28: 136-42.

83. Jajou R, de Neeling A, Rasmussen EM, et al. A predominant variable-number tandem-

repeat cluster of Mycobacterium tuberculosis isolates among asylum seekers in the Netherlands

and Denmark, deciphered by whole-genome sequencing. J Clin Microbiol 2018; 56(2).

84. Jajou R, De Neeling A, Van Hunen R, et al. Epidemiological links between tuberculosis

cases identified twice as efficiently by whole genome sequencing than conventional molecular

typing: A population-based study. PLoS ONE 2018; 13(4): e0195413.

85. Jiang Q, Lu L, Wu J, et al. Assessment of tuberculosis contact investigation in Shanghai,

China: An 8-year cohort study. Tuberculosis 2018; 108: 10-5.

86. Kato-Maeda M, Ho C, Passarelli B, et al. Use of whole genome sequencing to determine

the microevolution of Mycobacterium tuberculosis during an outbreak. PLoS ONE 2013;

8(3): e58235.

87. Koster K, Largen A, Foster JT, et al. Whole genome SNP analysis suggests unique

virulence factor differences of the Beijing and Manila families of Mycobacterium tuberculosis

found in Hawaii. PLoS ONE 2018; 13(7).


88. Koster KJ, Largen A, Foster JT, et al. Genomic sequencing is required for identification

of tuberculosis transmission in Hawaii. BMC Infect Dis 2018; 18.

89. Kato-Miyazawa M, Miyoshi-Akiyama T, Kanno Y, Takasaki J, Kirikae T, Kobayashi N.

Genetic diversity of Mycobacterium tuberculosis isolates from foreign-born and Japan-born

residents in Tokyo. Clinical Microbiology and Infection 2015; 21(3): 248.

90. Korhonen V, Smit PW, Haanpera M, et al. Whole genome analysis of Mycobacterium

tuberculosis isolates from recurrent episodes of tuberculosis, Finland, 1995-2013. Clin Microbiol

Infect 2016; 22(6): 549-54.

91. Lalor MK, Casali N, Walker TM, et al. The use of whole-genome sequencing in cluster

investigation of a multidrug-resistant tuberculosis outbreak. Eur Resp J 2018; 51(6).

92. Lanzas F, Karakousis PC, Sacchettini JC, Ioerger TR. Multidrug-resistant tuberculosis in

Panama is driven by clonal expansion of a multidrug-resistant Mycobacterium tuberculosis strain

related to the KZN extensively drug-resistant M. tuberculosis strain from South Africa. J Clin

Microbiol 2013; 51(10): 3277-85.

93. Lee RS, Radomski N, Proulx JF, et al. Reemergence and amplification of tuberculosis in

the Canadian Arctic. J Infect Dis 2015; 211(12): 1905-14.

94. Lee RS, Radomski N, Proulx J-F, et al. Population genomics of Mycobacterium

tuberculosis in the Inuit. Proc Natl Acad Sci USA 2015; 112(44): 13609-14.

95. Luo T, Comas I, Luo D, et al. Southern East Asian origin and coexpansion of

Mycobacterium tuberculosis Beijing family with Han Chinese. Proc Natl Acad Sci U S A 2015;

112(26): 8136-41.

96. Luo T, Yang C, Peng Y, et al. Whole-genome sequencing to detect recent transmission of

Mycobacterium tuberculosis in settings with a high burden of tuberculosis. Tuberculosis 2014;

94(4): 434-40.

97. Ma MJ, Yang Y, Wang HB, et al. Transmissibility of tuberculosis among school

contacts: An outbreak investigation in a boarding middle school, China. Infect Genet Evol 2015;

32: 148-55.

98. Macedo R, Pinto M, Borges V, et al. Evaluation of a gene-by-gene approach for

prospective whole-genome sequencing-based surveillance of multidrug resistant Mycobacterium

tuberculosis. Tuberculosis 2019; 115: 81-8.


99. Madrazo-Moya CF, Cancino-Munoz I, Cuevas-Cordoba B, et al. Whole genomic

sequencing as a tool for diagnosis of drug and multidrug-resistance tuberculosis in an endemic

region in Mexico. PLoS ONE 2019; 14(6).

100. Mai TQ, Martinez E, Menon R, et al. Mycobacterium tuberculosis Drug Resistance and

Transmission among Human Immunodeficiency Virus-Infected Patients in Ho Chi Minh City,

Vietnam. Am J Trop Med Hyg 2018; 99(6): 1397-406.

101. Makhado NA, Matabane E, Faccin M, et al. Outbreak of multidrug-resistant tuberculosis

in South Africa undetected by WHO-endorsed commercial tests: an observational study. Lancet

Infect Dis 2018; 18(12): 1350-9.

102. Malm S, Linguissi LSG, Tekwu EM, et al. New Mycobacterium tuberculosis complex

sublineage, Brazzaville, Congo. Emerg Infect Dis 2017; 23(3): 423-9.

103. Manson AL, Abeel T, Galagan JE, et al. Mycobacterium tuberculosis whole genome

sequences from Southern India suggest novel resistance mechanisms and the need for region-

specific diagnostics. 2017; 64(11): 1494-501.

104. Manson AL, Cohen KA, Abeel T, et al. Genomic analysis of globally diverse

Mycobacterium tuberculosis strains provides insights into the emergence and spread of

multidrug resistance. Nature Genetics 2017; 49(3): 395-402.

105. Mehaffy C, Guthrie JL, Alexander DC, Stuart R, Rea E, Jamieson FB. Marked

microevolution of a unique Mycobacterium tuberculosis Strain in 17 years of ongoing

transmission in a high risk population. PLoS ONE 2014; 9(11): e0112928.

106. Merker M, Blin C, Mona S, et al. Evolutionary history and global spread of the

Mycobacterium tuberculosis Beijing lineage. Nature Genetics 2015; 47(3): 242-9.

107. Merker M, Barbier M, Cox H, et al. Compensatory evolution drives multidrug-resistant

tuberculosis in Central Asia. elife 2018; 7.

108. Merker M, Kohl TA, Roetzer A, et al. Whole genome sequencing reveals complex

evolution patterns of multidrug-resistant Mycobacterium tuberculosis Beijing strains in patients.

PLoS ONE 2013; 8(12): e82551.

109. Mizukoshi F, Miyoshi-Akiyama T, Iwai H, et al. Genetic diversity of Mycobacterium

tuberculosis isolates from Tochigi prefecture, a local region of Japan. BMC Infect Dis 2017;

17(1): 365.


110. Mokrousov I, Shitikov E, Skiba Y, Kolchenko S, Chernyaeva E, Vyazovaya A. Emerging

peak on the phylogeographic landscape of Mycobacterium tuberculosis in West Asia: Definitely

smoke, likely fire. Mol Phylogenetics Evol 2017; 116: 202-12.

111. Mortimer TD, Weber AM, Pepperell CS. Signatures of selection at drug resistance loci in

Mycobacterium tuberculosis. mSystems 2018; 3(1).

112. Nelson KN, Shah NS, Mathema B, et al. Spatial patterns of extensively drug-resistant

tuberculosis transmission in KwaZulu-Natal, South Africa. J Infect Dis 2018; 218(12): 1964-73.

113. Norheim G, Seterelv S, Arnesen TM, et al. Tuberculosis outbreak in an educational

institution in Norway. J Clin Microbiol 2017; 55(5): 1327-33.

114. Ocheretina O, Shen L, Escuyer VE, et al. Whole genome sequencing investigation of a

tuberculosis outbreak in Port-au-Prince, Haiti caused by a strain with a "Low-Level" rpoB

Mutation L511P - Insights into a Mechanism of Resistance Escalation. PLoS ONE 2015; 10(6):

e0129207.

115. O'Neill MB, Shockey A, Zarley A, et al. Lineage specific histories of Mycobacterium

tuberculosis dispersal in Africa and Eurasia. Mol Ecol 2019; 28(13): 3241-56.

116. Otchere ID, Coscolla M, Sanchez-Buso L, Asante-Poku A, Meehan C, Osei-Wusu S, et

al. Comparative genomics of Mycobacterium africanum Lineage 5 and Lineage 6 from Ghana

suggests different ecological niches. Sci Rep 2018; 8: 11269.

117. Outhred AC, Holmes N, Sadsad R, et al. Identifying likely transmission pathways within

a 10-year community outbreak of tuberculosis by high-depth whole genome sequencing. PLoS

ONE 2016; 11(3): e0150550.

118. Packer S, Green C, Brooks-Pollock E, Chaintarli K, Harrison S, Beck CR. Social network

analysis and whole genome sequencing in a cohort study to investigate TB transmission in an

educational setting. BMC Infect Dis 2019; 19.

119. Panossian B, Salloum T, Araj GF, Khazen G, Tokajian S. First insights on the genetic

diversity of MDR Mycobacterium tuberculosis in Lebanon. BMC Infect Dis 2018; 18.

120. Parvaresh L, Crighton T, Martinez E, Bustamante A, Chen S, Sintchenko V. Recurrence

of tuberculosis in a low-incidence setting: a retrospective cross-sectional study augmented by

whole genome sequencing. BMC Infect Dis 2018; 18.

121. Perdigao J, Silva H, Machado D, et al. Unraveling genomic diversity and evolution in

Lisbon, Portugal, a highly drug resistant setting. BMC Genomics 2014; 15(1): 991.


122. Perez-Lago L, Comas I, Navarro Y, et al. Whole genome sequencing analysis of

intrapatient microevolution in Mycobacterium tuberculosis: potential impact on the inference of

tuberculosis transmission. J Infect Dis 2014; 209(1): 98-108.

123. Regmi SM, Chaiprasert A, Kulawonganunchai S, et al. Whole genome sequence analysis

of multidrug-resistant Mycobacterium tuberculosis Beijing isolates from an outbreak in

Thailand. Mol Genet Genomics 2015; 290(5): 1933-41.

124. Roycroft E, O'Toole RF, Fitzgibbon MM, et al. Molecular epidemiology of multi- and

extensively-drug-resistant Mycobacterium tuberculosis in Ireland, 2001-2014. J Infect 2018;

76(1): 55-67.

125. Ruesen C, Chaidir L, van Laarhoven A, et al. Large-scale genomic analysis shows

association between homoplastic genetic variation in Mycobacterium tuberculosis genes and

meningeal or pulmonary tuberculosis. BMC Genomics 2018; 19(1): 122.

126. Rutaihwa LK, Menardo F, Stucki D, et al. Multiple introductions of Mycobacterium

tuberculosis Lineage 2–Beijing into Africa over centuries. Front Ecol and Evol 2019; 7(112).

127. Saelens JW, Lau-Bonilla D, Moller A, et al. Whole genome sequencing identifies

circulating Beijing-lineage Mycobacterium tuberculosis strains in Guatemala and an associated

urban outbreak. Tuberculosis 2015; 95(6): 810-6.

128. Satta G, Witney AA, Shorten RJ, Karlikowska M, Lipman M, McHugh TD. Genetic

variation in Mycobacterium tuberculosis isolates from a London outbreak associated with

isoniazid resistance. BMC Med 2016; 14: 1-9.

129. Schurch AC, Kremer K, Daviena O, et al. High-resolution typing by integration of

genome sequencing data in a large tuberculosis cluster. J Clin Microbiol 2010; 48(9): 3403-6.

130. Senghore M, Otu J, Witney A, et al. Whole-genome sequencing illuminates the evolution

and spread of multidrug-resistant tuberculosis in Southwest Nigeria. PLoS ONE 2017; 12(9):

e0184510.

131. Seraphin MN, Didelot X, Nolan DJ, et al. Genomic Investigation of a Mycobacterium

tuberculosis Outbreak Involving Prison and Community Cases in Florida, United States. Am J

Trop Med Hyg 2018; 99(4): 867-74.

132. Shah NS, Auld SC, Brust JCM, et al. Transmission of extensively drug-resistant

tuberculosis in South Africa. N Engl J Med 2017; 376(3): 243-53.


133. Smit PW, Vasankari T, Aaltonen H, et al. Enhanced tuberculosis outbreak investigation

using whole genome sequencing and IGRA. Eur Resp J 2015; 45(1): 276-9.

134. Sobkowiak B, Glynn JR, Houben R, et al. Identifying mixed Mycobacterium tuberculosis

infections from whole genome sequence data. BMC Genomics 2018; 19(1): 613.

135. Stucki D, Ballif M, Bodmer T, et al. Tracking a tuberculosis outbreak over 21 years:

strain-specific single-nucleotide polymorphism typing combined with targeted whole-genome

sequencing. J Infect Dis 2015; 211(8): 1306-16.

136. Stucki D, Ballif M, Egger M, et al. Standard Genotyping Overestimates Transmission of

Mycobacterium tuberculosis among Immigrants in a Low-Incidence Country. J Clin Microbiol

2016; 54(7): 1862-70.

137. Stucki D, Brites D, Jeljeli L, et al. Mycobacterium tuberculosis lineage 4 comprises

globally distributed and geographically restricted sublineages. Nat Genet 2016; 48(12): 1535-43.

138. Tyler AD, Randell E, Baikie M, et al. Application of whole genome sequence analysis to

the study of Mycobacterium tuberculosis in Nunavut, Canada. PLoS ONE 2017; 12(10):

e0185656.

139. Vaziri F, Kohl TA, Ghajavand H, et al. Genetic Diversity of Multi- and Extensively

Drug-Resistant Mycobacterium tuberculosis Isolates in the Capital of Iran, Revealed by Whole-

Genome Sequencing. J Clin Microbiol 2019; 57(1).

140. Walker TM, Ip CL, Harrell RH, et al. Whole-genome sequencing to delineate

Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect Dis

2013; 13(2): 137-46.

141. Walker TM, Lalor MK, Broda A, et al. Assessment of Mycobacterium tuberculosis

transmission in Oxfordshire, UK, 2007-12, with whole pathogen genome sequences: An

observational study. Lancet Respir Med 2014; 2(4): 285-92.

142. Winglee K, Manson McGuire A, Maiga M, et al. Whole genome sequencing of

Mycobacterium africanum strains from Mali provides insights into the mechanisms of geographic

restriction. PLoS Negl Trop Dis 2016; 10(1): e0004332.

143. Witney AA, Bateson AL, Jindani A, et al. Use of whole-genome sequencing to

distinguish relapse from reinfection in a completed tuberculosis clinical trial. BMC Med 2017;

15(1): 71.


144. Wollenberg KR, Desjardins CA, Zalutskaya A, et al. Whole-genome sequencing of

Mycobacterium tuberculosis provides insight into the evolution and genetic composition of drug-

resistant tuberculosis in Belarus. J Clin Microbiol 2017; 55(2): 457-60.

145. Wyllie DH, Davidson JA, Smith EG, et al. A Quantitative Evaluation of MIRU-VNTR

Typing Against Whole-Genome Sequencing for Identifying Mycobacterium tuberculosis

Transmission: A Prospective Observational Cohort Study. EBioMedicine 2018; 34: 122-30.

146. Yang C, Luo T, Shen X, et al. Transmission of multidrug-resistant Mycobacterium

tuberculosis in Shanghai, China: a retrospective observational study using whole-genome

sequencing and epidemiological investigation. Lancet Infect Dis 2017; 17(3): 275-84.

147. Yang CG, Lu LP, Warren JL, et al. Internal migration and transmission dynamics of

tuberculosis in Shanghai, China: an epidemiological, spatial, genomic analysis. Lancet Infect Dis

2018; 18(7): 788-95.

148. Yimer SA, Namouchi A, Zegeye ED, et al. Deciphering the recent phylogenetic

expansion of the originally deeply rooted Mycobacterium tuberculosis lineage 7. BMC Evol Biol

2016; 16(1): 146.


Manuscript figures and tables

Figure 1. PRISMA diagram of the stages of the systematic review.

Search and selection process during the systematic review. Full texts were excluded because the study

was a conference abstract or case report (n=3), lacked epidemiological aims (n=12), focused on

drug-resistance prediction (n=2), made inadequate or no use of WGS (n=5), or did not meet the inclusion criteria (n=2).


Figure 2. Proportion of STROME-ID criteria met with 6-month lag pre-publication.

The average proportion of STROME-ID criteria met was variable across articles prior to

guideline publication, accounting for a six-month lag. Overall, the four most frequently

completed STROME-ID criteria included explaining the scientific background and rationale

(STROBE-2), stating the epidemiological objectives of using molecular typing (STROME-3.1),

stating the study’s overarching objectives and hypotheses (STROBE-3), and summarizing key

results with reference to study objectives (STROBE-18). The criterion requiring definitions for

molecular terminology (STROME-4.1) was not completed. The two other least completed

criteria were methods used to detect multiple-strain infections (STROME-ID 8.1) and

discussion of limitations and direction of bias (STROBE-19).


Figure 3. Proportion of STROME-ID criteria met with 6-month lag post-publication.

The average proportion of STROME-ID criteria met was also variable post-guideline

publication, accounting for a six-month lag. The four most frequently completed STROME-ID

criteria were: explaining the scientific background and rationale (STROBE-2), stating the

epidemiological objectives of using molecular typing (STROME-3.1), stating source of

participants and specimens, including sampling frame (STROME-ID 6.1), and summarizing key

results (STROBE-18). The three least reported criteria were: definitions for molecular

terminology (STROME-4.1), methods used to detect multiple-strain infections (STROME-ID

8.1), and alternative explanations for transmission chain results (STROME-ID 19.1).


Figure 4. Proportion of STROME-ID criteria met post-publication, excluding articles from the

first 6 months following guideline publication.

A sensitivity analysis for the post-guideline publication period was conducted, excluding

articles published during the six-month lag following guideline publication. Six criteria were not completed. This

was also observed in the pre-guideline publication period that accounted for the six-month lag.

The least completed STROME-ID criterion required definitions for molecular terminology

(STROME-ID 4.1). The most completed STROME-ID criterion required stating the

epidemiological objectives of using molecular typing (STROME-ID 3.1).


Table 5. Summary of included studies

First author | Study year | Study aims | Country or countries | Sample size (isolates) | Sample size (patients) | Sequencing platform(s)
Al-Ghafli 1 | 2018 | Elucidate transmission dynamics and describe resistance-conferring mutations. | Saudi Arabia | 205 | NR | Illumina NextSeq
Alaridah 2 | 2019 | Compare genotype techniques to determine transmission in low incidence country. | Sweden | 100 | 52 | Illumina HiSeq
Arandjelovic 3 | 2019 | Explore countrywide transmission routes, strain dynamics, and bacterial evolution. | Serbia | 103 | 110 | Illumina MiSeq and HiSeq
Arnold 4 | 2016 | Describe XDR-TB cluster in the UK. | England, Scotland, Wales, Ireland | 4 | 35 | Not described
Auld 5 | 2018 | Determine genomic transmission links between individuals without an epidemiologic link. | South Africa | 342 | 386 | Illumina MiSeq
Ayabina 6 | 2018 | Infer if cases represent imported or local transmission. | Norway | 129 | 127 | Illumina MiSeq and NextSeq
Bainomugisa 7 | 2018 | Describe strains driving the epidemic and associated drug resistance mutations. | Daru Island, Papua New Guinea | 100 | NR | Illumina MiSeq
Bouzouita 8 | 2019 | Investigate transmission of drug-resistant strains. | Tunisia | 46 | 46 | Illumina MiniSeq
Bjorn-Mortensen 9 | 2016 | Examine transmission in remote, TB high-incidence region. | Greenland | 182 | 182 | Illumina MiSeq, HiSeq, NextSeq
Black 10 | 2017 | Distinguish outbreak cases of relapse from reactivation in the UK. | England | 17 | 25 | Illumina MiSeq
Brown 11 | 2016 | Describe genomic epidemiology of subpopulations in two cities. | United States of America | 71 | NR | Illumina HiSeq
Bryant 12 | 2013 | Estimate usefulness of the molecular clock to refute and affirm epidemiological links. | Amsterdam, Estonia | 199 | 199 | Illumina Genome Analyzer GAIIx
Bui 13 | 2019 | Assess association between exposure to community settings and MDR-TB infection. | Peru | 59 | 59 | Not described
Cabibbe 14 | 2018 | Describe WGS-based model for TB diagnosis and surveillance. | Italy | 298 | 56 | Illumina MiniSeq
Casali 15 | 2012 | Examine microevolution of Beijing strains and spread of drug resistance. | Russian Federation | 2348 | 2348 | Illumina Genome Analyzer GAII
Casali 16 | 2014 | Explore molecular mechanisms determining transmissibility and prevalence of drug-resistant strains. | Russia | 1000 | 2348 | Illumina Genome Analyzer GAII, HiSeq
Casali 17 | 2016 | Compare WGS and MIRU-VNTR to resolve the transmission network within outbreak. | England | 344 | 501 | Illumina HiSeq
Chatterjee 18 | 2017 | Characterize genotypic drug resistance. | India | 74 | NR | Illumina MiSeq
Clark 19 | 2013 | Understand emergence and acquisition of MDR-TB among treated TB patients. | Uganda | 51 | 41 | Illumina HiSeq
Cohen 20 | 2015 | Describe evolution of XDR-TB. | Africa | 337 | 337 | Illumina HiSeq
Comas 21 | 2015 | Describe population genomics in Africa, and evolutionary origin of TB. | Ethiopia | 285 | 2151 | Illumina HiSeq
Comas 22 | 2013 | Describe evolutionary history of human and TB. | 46 countries | 259 | 259 | Illumina, didn't specify
Coscolla 23 | 2015 | Describe the genomic epidemiology of MDR-TB among refugees in USA. | United States of America | 57 | 45 | Illumina HiSeq
Dheda 24 | 2017 | Analyze transmission dynamics of patients with XDR-TB. | Africa | 149 | 237 | Illumina HiSeq
Dixit 25 | 2019 | Study evolution of isolates within MDR-TB cluster. | Lima, Peru | 61 | 60 | Illumina HiSeq
Doroshenko 26 | 2018 | Describe the epidemiological and genomic determinants of two outbreaks. | Canada | 75 | 75 | Illumina HiSeq
Eldholm 27 | 2015 | Determine timeline of drug-resistance evolution during an outbreak. | Argentina | 252 | NR | Illumina HiSeq (244 samples), MiSeq (8 samples)
Fiebig 28 | 2017 | Investigate cross-border MDR-TB transmission. | Austria, Romania, Germany | 10 | 13 | Illumina MiSeq
Gardy 29 | 2011 | Describe outbreak transmission with WGS and social network analysis. | Canada | 36 | 41 | Illumina Genome Analyzer II
Gautam 30 | 2018 | Describe the genomic epidemiology of TB in Tasmania. | Tasmania | 18 | 18 | Illumina MiSeq
Gautam 31 | 2017 | Analyze the genomic content of the Rangipo strain. | New Zealand | 9 | NR | Illumina MiSeq
Genestet 32 | 2019 | Describe tracing of linked cases in an outbreak using WGS. | France | 14 | 14 | Illumina MiSeq
Glynn 33 | 2015 | Assess cases attributed to transmission from close contacts. | Malawi | 406 | 1907 | Illumina HiSeq
Guerra-Assunção 34 | 2015 | Conduct district-wide analysis to examine transmission over time. | Malawi | 1687 | 2332 | Illumina HiSeq
Guerra-Assunção 35 | 2015 | Assess effect of different factors on the rate of recurrence due to reinfection or relapse. | Malawi | 1933 | 903 | Illumina HiSeq
Gurjav 35 | 2016 | Understand local TB transmission in low-incidence setting. | Australia | 30 | 1692 | Ion Torrent Personal Genome
Guthrie 36 | 2018 | Understand transmission dynamics of pediatric TB in a low-incidence setting. | Canada | 49 | 49 | Illumina HiSeq
Ho 37 | 2018 | Describe extent of transmission based on mass-screening exercise. | Singapore | 10 | 6 | Illumina, didn't specify
Holden 38 | 2018 | Describe results of an outbreak investigation. | England | 2 | 2 | Illumina HiSeq
Holt 39 | 2018 | Examine transmission dynamics. | Vietnam | 1635 | 2091 | Illumina HiSeq
Huang 40 | 2019 | Describe the epidemiological and drug-resistance characteristics of MDR-TB. | China | 357 | 357 | Illumina HiSeq
Ioerger 41 | 2009 | Investigate the causes and evolution of drug-resistance. | South Africa | 11 | NR | Illumina GAII
Ioerger 42 | 2010 | Understand the mechanism of drug-resistance among a subgroup of the Beijing strain. | South Africa | 14 | NR | Illumina, didn't specify
Ismail 43 | 2018 | Determine drug-resistance, and assess criteria against putative resistance associated with variants. | South Africa | 391 | 401 | Illumina MiSeq
Jajou 44 | 2018 | Analyze transmission dynamics among asylum seekers, and assess precision of VNTR typing vs WGS. | Netherlands | 40 | 40 | Illumina NextSeq
Jajou 45 | 2018 | Investigate if WGS more accurately predicts epidemiological links between patients than VNTR. | Netherlands | 535 | 527 | Illumina HiSeq
Jiang 46 | 2018 | Determine incidence of TB in close contacts and transmission. | China | 4584 | 1765 | Not described
Kato-Maeda 47 | 2018 | Describe the microevolution during outbreak of drug-susceptible TB. | United States of America | 9 | 11 | Illumina, didn't specify
Koster 48 | 2013 | Identify genomic differences between Beijing and Manila families. | United States of America | 82 | NR | Illumina MiSeq
Koster 49 | 2019 | Investigate TB transmission clusters using WGS vs VNTR typing. | United States of America | 16 | 15 | Illumina MiSeq
Kato-Miyazawa 50 | 2018 | Characterize genomic diversity of foreign-born and Japan-born residents in Tokyo. | Japan | 259 | 91 | Illumina MiSeq
Korhonen 51 | 2015 | Determine whether recurrent cases were caused by relapse vs re-infection. | Finland | 21 | 21 | Illumina MiSeq
Lalor 52 | 2016 | Delineate transmission networks and investigate benefits of WGS during cluster investigation. | England | 22 | 22 | Illumina MiSeq, Illumina Genome Analyzer GAII, Illumina HiSeq
Lanzas 53 | 2018 | Determine extent of primary acquired MDR-TB cases. | South Africa | 97 | NR | Illumina Genome Analyzer IIx
Lee 54 | 2015 | Explore epidemiological links during an outbreak. | Canada | 42 | 933 | Illumina MiSeq
Lee 55 | 2015 | Describe genomic features of an epidemiologically successful strain over time. | Canada | 163 | NR | Illumina MiSeq
Luo 56 | 2015 | Characterize global diversity of 358 Beijing strains. | China | 908 | NR | Illumina HiSeq
Luo 57 | 2015 | Compare VNTR and WGS to study the transmission in a high burden setting. | China | 32 | 42 | Illumina HiSeq
Ma 58 | 2014 | Explore transmission dynamics of an outbreak in a boarding school. | China | 33 | 46 | Ion Torrent
Macedo 59 | 2015 | Compare WGS and classical genotyping methods to determine transmission chains. | Portugal | 83 | 83 | Illumina MiSeq
Madrazo-Moya 60 | 2019 | Identify drug-resistant mutations in an endemic region. | Mexico | 91 | 91 | Illumina NextSeq
Mai 61 | 2019 | Examine transmission dynamics and drug resistance-conferring mutations among TB/HIV co-infected patients. | Vietnam | 200 | 200 | Illumina NextSeq
Makhado 62 | 2018 | Determine if MDR-TB strains genotypically similar to those in eSwatini were also present in South Africa. | South Africa | 277 | 277 | Illumina HiSeq, MiSeq
Malm 63 | 2018 | Determine the population structure and transmission dynamics. | Congo | 75 | 211 | Illumina MiSeq
Manson 64 | 2017 | Describe prevalence of strains, and evolution of drug-resistance mutations. | India | 223 | 196 | Illumina HiSeq
Manson 65 | 2017 | Determine acquisition timeline of MDR-drug resistance mutations. | 48 countries | 5310 | NR | Illumina, didn't specify
Martin 66 | 2017 | Use WGS data to identify within-host heterogeneity amongst patients in British Columbia. | Canada | 25 | NR | Illumina HiSeq
Mehaffy 67 | 2018 | Identify transmission events associated with cases due to ON-A strain. | Canada | 61 | 57 | Illumina, didn't specify
Merker 68 | 2014 | Reconstruct evolutionary history of Beijing lineage. | 99 countries | 4987 | NR | Illumina MiSeq
Merker 69 | 2015 | Analyze evolutionary history of drug-resistance and transmission networks of MDR-TB isolates. | Uzbekistan | 277 | 277 | Illumina MiSeq, HiSeq
Merker 70 | 2018 | Examine mutation rates in Beijing strains from regions with MDR-TB. | Germany, Republic of Georgia, Uzbekistan | NR | 3 | Illumina, didn't specify
Mizukoshi 71 | 2013 | Describe molecular epidemiology of TB patients living in localized area. | Japan | 169 | 169 | Illumina MiSeq
Mokrousov 72 | 2017 | Describe evolutionary origin of NEW-1 family in the Euro-American lineage. | China, Tibet, Iran, Russia, Kazakhstan | 5715 | NR | Illumina MiSeq
Mortimer 73 | 2017 | Characterize population genetics of known drug resistance loci. | Russia, South Africa | 1161 | NR | Illumina HiSeq
Nelson 74 | 2018 | Evaluate XDR-TB transmission within and between municipal districts in KwaZulu-Natal. | South Africa | 344 | 344 | Illumina MiSeq
Norheim 75 | 2018 | Report use of WGS to delineate an outbreak. | Norway | 22 | 24 | Illumina MiSeq, NextSeq
Ocheretina 76 | 2017 | Investigate suspected outbreak of 8 cases. | Haiti | 8 | 8 | Illumina HiSeq
O'Neill 77 | 2019 | Reconstruct lineage specific patterns of spread in Africa and Eurasia. | 51 countries | 552 | NR | Not described
Otchere 78 | 2018 | Compare evolution of TB and influence of human migration from two lineages. | Ghana | 214 | NR | Illumina HiSeq, NextSeq
Outhred 79 | 2018 | Clarify transmission pathways and explore the evolution of an outbreak. | Australia | 23 | 23 | Illumina HiSeq
Packer 80 | 2016 | Investigate the transmission of TB within an educational institution. | England | 5 | 10 | Illumina MiSeq
Panossian 81 | 2019 | Evaluate genetic makeup of TB lineages circulating in the Middle East. | Lebanon | 13 | 13 | Illumina MiSeq
Parvaresh 82 | 2018 | Analyze reinfection and reactivation rates. | Australia | 15 | 18 | Illumina NextSeq
Perdigao 83 | 2018 | Determine genomic diversity and microevolution of MDR- and XDR-TB. | Portugal | 56 | NR | Illumina HiSeq
Perez-Lago 84 | 2014 | Examine microevolution of TB within intrapatient and interpatient scenarios. | Spain | 36 | NR | Illumina HiSeq
Regmi 85 | 2014 | Investigate outbreak of MDR-TB. | Thailand | 64 | 148 | Illumina HiSeq
Roetzer 86 | 2015 | Identify outbreak-related transmission chains. | Germany | 86 | 86 | Illumina, didn't specify
Roycroft 87 | 2013 | Examine acquisition and spread of MDR-TB. | Ireland | 42 | 41 | Illumina MiSeq
Ruesen 88 | 2018 | Examine association between TB genotype and susceptibility to TBM. | Indonesia | 106 | 322 | Illumina HiSeq
Rutaihwa 89 | 2018 | Determine geographical origin of Beijing strain and spread across Africa. | Africa | 781 | 781 | Illumina HiSeq
Saelens 90 | 2019 | Assess distribution of Beijing-lineage. | Guatemala | 5 | 5 | Illumina HiSeq, MiSeq
Satta 91 | 2015 | Examine genetic variation of outbreak samples. | England | 16 | NR | Illumina HiSeq
Schurch 92 | 2016 | Use WGS to study epidemiology of an outbreak. | Netherlands | 3 | NR | Genome Sequencer
Senghore 93 | 2010 | Understand epidemiology and genetics of MDR-TB. | Nigeria | 63 | 5 | Illumina MiSeq
Seraphin 94 | 2017 | Define recent transmission clusters and timing of transmission. | United States of America | 21 | 82 | Illumina MiSeq
Shah 95 | 2018 | Describe population-level transmission of XDR-TB. | South Africa | 298 | 404 | Illumina MiSeq
Smit 96 | 2017 | Describe outbreak using WGS and IGRA. | Finland | 12 | 14 | Not described
Sobkowiak 97 | 2014 | Assess prevalence of mixed infection and correlation with patient characteristics and outcomes. | Malawi, Portugal | 48 | 10 | Illumina HiSeq (168 samples), Illumina MiSeq (10 samples)
Stucki 98 | 2018 | Study outbreak dynamics. | Switzerland | 69 | 68 | Illumina, didn't specify
Stucki 99 | 2015 | Assess transmission among Swiss- and foreign-born TB patients. | Switzerland | 90 | 93 | Illumina HiSeq, MiSeq, NextSeq
Stucki 100 | 2016 | Understand global population structure of Lineage 4 and its evolution. | 100 countries | 293 | NR | Illumina MiSeq, HiSeq, NextSeq
Tyler 101 | 2016 | Characterize genomic diversity of outbreak clusters. | Canada | 233 | NR | Illumina NextSeq
Vaziri 102 | 2017 | Explore drug resistance and transmission dynamics. | Iran | 38 | 13,892 | Illumina NextSeq
Walker 103 | 2019 | Estimate genetic diversity of related strains, and investigate community outbreaks. | England | 390 | 254 | Illumina HiSeq
Walker 104 | 2013 | Explore epidemiology of TB transmission. | England | 247 | 269 | Illumina HiSeq
Walker 105 | 2014 | Describe origin of transmission cluster. | Germany, Switzerland, France, England, Somalia, Ethiopia, Eritrea | 58 | 29 | Illumina, Ion Torrent
Winglee 106 | 2018 | Understand geographic distribution of Lineages 5 and 6. | Mali | 92 | NR | Illumina, didn't specify
Witney 107 | 2016 | Determine proportion of cases attributable to relapse and reinfection. | South Africa, Zimbabwe, Botswana, Zambia | 36 | 51 | Illumina HiSeq
Wollenberg 108 | 2017 | Understand evolution of MDR- and XDR-TB. | Belarus | 138 | 97 | Illumina HiSeq
Wyllie 109 | 2017 | Determine proportion of linked TB isolates that are closely genomically related. | England | 1999 | 1999 | Illumina MiSeq
Yang 110 | 2018 | Assess transmission of MDR-TB and identify transmission risk factors. | China | 324 | 324 | Illumina HiSeq
Yang 111 | 2017 | Describe transmission dynamics in an urban setting. | China | 218 | NR | Illumina HiSeq
Yimer 112 | 2018 | Identify genomic features of Lineage 7 strains. | Ethiopia | 30 | NR | Illumina MiSeq

Note: NR = Not reported


Table 6. Mean proportions of STROME-ID criteria met pre- and post-guideline publication

Exposure | Pre-STROME-ID | SD | Post-STROME-ID | SD | P-value
6 Months | 0·51 | 0·11 | 0·46 | 0·14 | 0·26
12 Months (a) | 0·48 | 0·14 | 0·51 | 0·11 | 0·52
6 Months Exclusion (b) | 0·46 | 0·14 | 0·46 | 0·14 | 0·98
12 Months Exclusion (b) | 0·48 | 0·14 | 0·49 | 0·14 | 0·71

SD = Standard deviation, STROME-ID = Strengthening the Reporting of Molecular Epidemiology for Infectious Diseases

(a) Papers published within 12 months following STROME-ID were classified as ‘unexposed’, i.e., we considered that authors may not have seen the guidelines or had the opportunity to incorporate them.

(b) Papers published in this time period following the STROME-ID publication date were excluded from the analysis altogether.

Table 7. Quasi-Poisson univariate and multivariate analyses of impact factor, H-index, continent,

and sample size of isolates

Univariate Multivariate

Variables IRR 95% CI P-value IRR 95% CI P-value

IF

0-4.9999*

5-9.9999 1·10 1·00, 1·21 0·06 1·09 0·98, 1·22 0·11

10-19.9999 1·20 1·03, 1·38 0·02 1·18 1·00, 1·39 0·06

≥20 1·13 1·00, 1·28 0·05 1·11 0·97, 1·28 0·14

HI 1·00 1·00, 1·00 0·37

Continent

Americas *†

Africa 0·97 0·79, 1·18 0·79 0·98 0·80, 1·19 0·83

Asia 0·93 0·81, 1·08 0·37 0·96 0·30, 1·12 0·62

Europe 0·93 0·84, 1·02 0·13 0·92 0·83, 1·01 0·09

Oceania 0·91 0·76, 1·09 0·30 0·95 0·79, 1·14 0·60

SS

<30*

30-152 1·03 0·92, 1·15 0·65 1·00 0·89, 1·13 0·97

153-276 1·05 0·90, 1·22 0·53 1·01 0·86, 1·18 0·95

≥277 1·11 0·99, 1·25 0·09 1·04 0·91, 1·19 0·55

*Reference level

†Combined North America and South America; only 1 country from South America

IRR= Incidence rate ratio, CI= confidence interval, IF= impact factor, HI= H-index, SS= sample size of isolates

Table 8. Univariate and multivariate tobit analysis of impact factor, H-index, continent, and

sample size of isolates

Univariate Multivariate

Variables Coefficients 95% CI P-value Coefficients 95% CI P-value

IF

0-4.9999*

5-9.9999 0·05 -0·001, 0·10 0·06 0·04 -0·02, 0·09 0·18

10-19.9999 0·08 0·006, 0·16 0·04 0·06 -0·02, 0·14 0·14

≥20 0·10 0·03, 0·16 0·003 0·06 -0·01, 0·13 0·09

HI 0·0002 0·001, 0·001 0·75

Continent

Americas *†

Africa 0·04 -0·06, 0·14 0·40 0·03 -0·01, 0·12 0·54

Asia -0·04 -0·11, 0·04 0·32 -0·03 -0·10, 0·04 0·34

Europe -0·04 -0·09, 0·01 0·15 -0·04 -0·01, 0·01 0·08

Oceania -0·04 -0·13, 0·05 0·42 -0·01 -0·01, 0·08 0·92

SS

<30*

30-152 0·07 0·01, 0·12 0·02 0·05 0·00, 0·11 0·05

153-276 0·09 0·02, 0·16 0·02 0·07 -0·01, 0·14 0·08

≥277 0·11 0·05, 0·17 < 0·0001 0·09 0·02, 0·15 0·01

*Reference level

†Combined North America and South America; only 1 country from South America

IRR= Incidence rate ratio, CI= confidence interval, IF= impact factor, HI= H-index, SS= sample size of isolates

Supplemental materials

Search strategy

This study is registered on PROSPERO (CRD42017064395) and followed Preferred Reporting Items for Systematic

Reviews and Meta-Analyses (PRISMA) guidelines.1 We initially searched MEDLINE, Embase Classic and Embase on May

3, 2017 using the terms “tuberculosis” and “genom* sequencing”. We then updated this search on April 23, 2019. No

restrictions were placed on the start date or geographic location. We also systematically searched the pre-print server

bioRxiv. References of included articles were also hand-searched to ensure no eligible articles were missed.

Inclusion and exclusion criteria

To be eligible for inclusion, studies needed to include patients with microbiologically-confirmed TB and needed to have

used WGS for typing of strains. Studies must have been published in English, French or Spanish. As suggested by Field et

al.,2 we considered studies to be genomic epidemiology papers if they investigated the distribution or transmission dynamics

of TB across time, a particular population, or a geographic location in order to inform outbreaks, evaluate infection control

practices or perform surveillance. Studies were also included if they examined risk factors for transmission (e.g., clustering),

or if they distinguished between recurrent cases of TB as relapse or reinfection. If studies described the evolution of TB

strains and drug resistance, or if they identified and classified new TB strains or lineages, they were included as well.

Finally, studies were included if they investigated the association between strain types or mutations and clinical outcomes

(e.g., death, treatment failure, relapse).

We excluded non-human studies, studies that were exclusively experimental (e.g., in-vitro or in-vivo animal studies), or

those that were purely diagnostic. The latter included studies where WGS was exclusively used for predicting phenotypic

drug resistance, without epidemiological aims. We also excluded studies whose primary aim was to use WGS to develop a

SNP-based typing method (unless the overall analysis and description of the epidemiology still relied on WGS), studies that

exclusively compared typing methods, and studies with fewer than two patients. Conference abstracts, editorials, and

literature reviews were also excluded.

Data extraction

To determine whether manuscripts met the eligibility criteria for STROME-ID, two reviewers independently reviewed titles and

abstracts (BC and RSL). Discrepancies were resolved by discussion and third-party arbitration (TC). One reviewer (BC)

was responsible for data extraction based on STROME-ID criteria, as well as additional variables of interest (specified a priori), including whether the bioinformatic tools used were reported (along with corresponding version numbers) and

whether WGS data were made openly available, to assess reproducibility. For papers that reported sequence accession numbers,

all accession numbers were checked to confirm that the raw data had been uploaded. A second reviewer (RSL)

independently checked a random sample consisting of 5% of all eligible papers; data extraction for these papers was

compared between BC and RSL prior to data extraction for the remaining articles, with discussion to clarify any

discrepancies.

Following data extraction, overall themes of the articles were synthesized and described. Each STROME-ID variable was

assessed, and scored as ‘complete’ or ‘incomplete’ (or assigned ‘not applicable’, where appropriate). The number of

STROME-ID criteria and proportion of those out of all criteria were then tabulated for each article, with the denominator

for the proportions excluding criteria that were not applicable (e.g., specific to a different study design).
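
The scoring rule described above can be illustrated with a short Python sketch. It is not the authors' code: the function name, the label strings, and the example article are hypothetical, and the only rule taken from the text is that 'not applicable' criteria are dropped from the denominator.

```python
def proportion_met(scores):
    """Return (number complete, proportion complete) for one article.

    `scores` maps a criterion label to 'complete', 'incomplete', or
    'not applicable'. Criteria scored 'not applicable' are excluded
    from the denominator, as described in the text.
    """
    applicable = [s for s in scores.values() if s != "not applicable"]
    n_complete = sum(1 for s in applicable if s == "complete")
    if not applicable:
        # No applicable criteria: the proportion is undefined.
        return n_complete, None
    return n_complete, n_complete / len(applicable)


# Hypothetical article scored on four criteria (illustration only).
example = {
    "STROBE-2": "complete",
    "STROBE-3": "complete",
    "STROME-ID 4.1": "incomplete",
    "STROBE-6(b)": "not applicable",
}

n_complete, proportion = proportion_met(example)
print(n_complete, round(proportion, 3))
```

Here three criteria are applicable and two are complete, so the proportion is 2/3 rather than 2/4.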

In addition, we analyzed whether certain study characteristics, specified a priori, were associated with the number and

proportion of fulfilled STROME-ID criteria. Few studies have specifically examined factors correlated

with STROBE reporting quality,3,4 although this has been analyzed for other reporting frameworks (e.g., CONSORT,

STARD).5-7 The characteristics examined were sample size (SS), the journal impact factor (IF), and the geographic region of the senior author’s

primary affiliation. For sample size, the numbers of patients and of isolates were extracted from each article. We

anticipated that the sample size of patients and that of isolates would be highly correlated and assessed this using the

Spearman’s non-parametric correlation test to determine whether both or only one of these should be included. IF

was obtained from Journal Citation Reports (https://jcr.clarivate.com) for the year of each article’s publication. When IF

could not be located in this database, SciJournal (https://scijournal.org) was searched. The continent of the senior

author’s primary affiliation was determined by examining the geographic region of the last author, which typically

represents the senior author in genomic epidemiology as well as other fields.8,9 When authors had multiple affiliations, the

continent from which the study samples were obtained was assigned as the primary affiliation (Supplemental Table 2). In

addition to these, we also included the current h-index (HI) of the senior author. This was obtained using Scopus

(https://www.scopus.com).
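
As a rough illustration of the Spearman check on the two sample-size measures, the sketch below computes Spearman's rho as the Pearson correlation of ranks. The function names and the toy counts are hypothetical, and assigning average ranks to ties is an assumption (the text does not describe tie handling).

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical patient and isolate counts for five articles.
patients = [12, 40, 95, 250, 1000]
isolates = [12, 44, 90, 260, 1200]
print(spearman_rho(patients, isolates))
```

A rho close to 1, as in this toy data, would suggest that the two measures carry largely the same information and that only one needs to enter the model.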

Statistical Analysis

To assess differences in reporting following STROME-ID’s publication, the mean proportions of completed criteria were

compared before and after its publication date. A 6-month lag period was included to account for articles that were already

in press when STROME-ID was published. Sensitivity analyses were also performed using a 12-month lag period, and

excluding articles published 6 and 12 months post-STROME-ID publication. Differences in mean proportions of criteria

were compared pre- and post-publication with a two-tailed t-test in R software (version 1.1.456). The least and most

reported STROME-ID criteria were also qualitatively assessed to explore differences between periods, excluding criteria

that were not applicable to > 20% of articles (Supplemental Figures 1 and 2).
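
The lag-based classification and comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: the function names and toy proportions are hypothetical, the guideline date is a placeholder (the reference list gives only the April 2014 issue), and Welch's unequal-variance t statistic is used because the text does not say which variant of the t-test was applied.

```python
import calendar
from datetime import date
from math import sqrt
from statistics import mean, variance


def add_months(d, months):
    """Return the date `months` calendar months after `d`."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def classify_exposure(pub_date, guideline_date, lag_months=6, exclude_lag=False):
    """Classify an article as 'pre' or 'post' guideline, or None if excluded.

    Articles published during the lag window count as 'pre' (unexposed) in
    the lag analyses and are dropped in the exclusion sensitivity analyses.
    """
    if pub_date < guideline_date:
        return "pre"
    if pub_date < add_months(guideline_date, lag_months):
        return None if exclude_lag else "pre"
    return "post"


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Placeholder guideline date (assumption; exact day not given in the text).
GUIDELINE = date(2014, 4, 1)

print(classify_exposure(date(2014, 6, 1), GUIDELINE))                    # within 6-month lag
print(classify_exposure(date(2014, 6, 1), GUIDELINE, exclude_lag=True))  # dropped
print(classify_exposure(date(2014, 11, 1), GUIDELINE, lag_months=12))    # within 12-month lag

# Hypothetical proportions of criteria met in each period.
pre = [0.55, 0.48, 0.60, 0.41]
post = [0.50, 0.44, 0.39, 0.52, 0.47]
print(welch_t(pre, post))
```

The two-tailed P-value would then come from a t distribution with Welch-Satterthwaite degrees of freedom, which is omitted here to keep the sketch within the standard library.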

To examine the association between study characteristics and reporting, two main approaches were used. First, we used

quasi-Poisson regression (to account for under-dispersion) with the number of criteria completed as the dependent variable.

Given not all criteria were applicable across every study, this analysis was restricted to criteria that were applicable across

all studies. Second, we used tobit regression (censored between 0 and 1) to assess the association with the proportion of

criteria that were completed, including all studies in the analysis. The distribution of IFs from all papers is shown in

Supplemental Figure 3; IF was used as a categorical variable, with categories chosen based on our experience with the

metric, and previous studies that examined correlates with IF.10,11 For SS, we categorized this into quartiles due to low

counts across a wide range of data (Supplemental Figure 4). HI was analyzed as a linear variable.
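
To make the two modelling choices concrete, the sketch below shows (i) the Pearson dispersion estimate that motivates a quasi-Poisson model and (ii) the log-likelihood of a two-limit tobit model censored at 0 and 1. It is a minimal illustration with hypothetical function names and toy inputs; the fitted means `mu` are assumed to come from a regression fit that is not shown, and none of this is the authors' code.

```python
from math import erf, log, pi, sqrt


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Values below 1 indicate under-dispersion relative to a Poisson model;
    a quasi-Poisson fit rescales standard errors by the square root of this.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def tobit_loglik(y, mu, sigma, lower=0.0, upper=1.0):
    """Log-likelihood of a two-limit tobit model.

    Observations at a limit contribute the probability mass beyond that
    limit; interior observations contribute the normal density.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi <= lower:
            total += log(norm_cdf((lower - mi) / sigma))
        elif yi >= upper:
            total += log(1.0 - norm_cdf((upper - mi) / sigma))
        else:
            z = (yi - mi) / sigma
            total += -0.5 * z * z - 0.5 * log(2.0 * pi) - log(sigma)
    return total


# Toy counts of completed criteria with an intercept-only fitted mean.
counts = [10, 11, 9, 10]
fitted = [10.0, 10.0, 10.0, 10.0]
print(pearson_dispersion(counts, fitted, n_params=1))

# Toy proportions with fitted means from a hypothetical tobit fit.
print(tobit_loglik([0.45, 0.60, 1.0], [0.50, 0.55, 0.90], sigma=0.12))
```

In the toy counts the dispersion estimate is well below 1, the situation in which a quasi-Poisson model is preferred over a plain Poisson model.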

Variables that had a P-value of < 0·20 in univariate analyses were included in the final model for each analysis. Pseudo-R2,

the Akaike information criterion, and log-likelihood were calculated to assist with model selection and evaluate fit.
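
The inclusion rule can be illustrated with the univariate P-values reported in Table 7. The sketch below assumes that a categorical variable qualifies when at least one of its levels has P < 0·20; the text does not state how multi-level variables were handled, so that rule and the function name are assumptions.

```python
# Univariate P-values from Table 7 (quasi-Poisson analysis), one per
# non-reference level of each variable.
UNIVARIATE_P = {
    "IF": [0.06, 0.02, 0.05],
    "HI": [0.37],
    "Continent": [0.79, 0.37, 0.13, 0.30],
    "SS": [0.65, 0.53, 0.09],
}


def select_variables(p_values, threshold=0.20):
    """Keep variables whose smallest univariate P-value is below the threshold.

    Using the minimum across levels is an assumption about how categorical
    variables were screened.
    """
    return [name for name, ps in p_values.items() if min(ps) < threshold]


print(select_variables(UNIVARIATE_P))
```

Under this reading, IF, continent, and SS enter the multivariate model and HI does not, which matches the variables with multivariate estimates in Table 7.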

Missing data

The number of patients was missing for 18·4% (n=21) of articles. IFs were also not available for articles published during the

first year of the journal (n=1, 2013) or published in 2019 (n=15, 13·16%). To address this, IF was reviewed for all available

years. If the variation in IF between years was minor, the most recent value was used (Supplemental Table 3).

Supplemental Results

Themes of the included articles

135 studies used WGS to investigate transmission. When compared to classical genotyping methods, these studies

demonstrated WGS’ superior ability to identify and confirm epidemiological linkages between different subgroups in an

outbreak, and their patterns of transmission.12-15 This was seen across populations in both high-incidence15 and low-

incidence settings,16-18 as well as between different groups, such as foreign-born and locally born individuals.19,20 Authors

also found that WGS provided additional resolution to distinguish between recurrent TB cases due to relapse or re-

infection.14,21,22 36 studies used WGS to examine the evolution of TB and drug resistance. Several studies characterized

genomic differences using WGS in order to describe the mechanisms of the microevolution of drug-resistant TB and its

transmission.23-28 Studies also described the evolution of outbreaks in various settings, using WGS to reconstruct the

timeline of resistance-conferring mutations.28-30 Lastly, studies broadly examined the evolution of TB in comparison to

human migration patterns.31-34 Eight studies used WGS to investigate strains and/or lineages of TB. WGS-based genotyping

provided additional resolution to elucidate strain diversity,35,36 and identify genomic characteristics of different strains.37,38

WGS was also used to identify strains and the sub-lineages present in a particular region’s transmission network.33,39,40 One

of the studies also described a new sub-lineage.41 Finally, two studies examined associations of TB strains or mutations with

clinical outcomes, which included rates of relapse, treatment status, death or loss to follow-up.22,42

Supplemental figures

Supplemental Figure 1. Count of "not applicable" papers per STROME-ID criterion, pre-publication.

Number of papers (of n = 17) scored “not applicable” (NA) for each STROME-ID criterion prior to guideline publication, accounting

for a six-month lag. The criterion with the most NA papers required translating estimates of relative risk into

absolute risk (STROBE-16c).

Supplemental Figure 2. Count of "not applicable" papers per STROME-ID criterion, post-publication.

Number of papers (of n = 97) scored “not applicable” (NA) for each STROME-ID criterion in the post-publication reporting period,

accounting for a six-month lag. The criterion with the most NA papers required stating the eligibility criteria and

methods of participant selection (STROBE-6b).

Supplemental Figure 3. Distribution of impact factors for included papers

Frequency distribution of journal impact factor (IF). Most IFs in the data set are less than 20, with low counts of IFs greater

than 20.

Supplemental Figure 4. Distribution of sample size of isolates in included papers

Frequency distribution of sample size of isolates (SS). Most SS are less than 1000, with low counts of SS greater than 1000.

There were no counts of SS between 3000 and 4000 isolates.

Supplemental Figure 5. Proportion of STROME-ID criteria met with 12-month lag pre-publication.

The average proportion of STROME-ID criteria met across articles prior to guideline publication, accounting for a twelve-month

lag, was similarly variable to that observed with the six-month lag. The least frequently completed STROME-ID

criterion required defining key molecular terms (STROME-ID 4.1). The three most frequently reported criteria were:

explaining the scientific background and rationale (STROBE-2), stating the epidemiological objectives of using molecular

typing (STROME-ID 3.1), and stating study objectives and hypotheses (STROBE-3).

Supplemental Figure 6. Proportion of STROME-ID criteria met with 12-month lag post-publication.

All criteria were completed at least once in this reporting period. The least frequently completed STROME-ID criterion

required defining key molecular terms (STROME-ID 4.1). The three most frequently completed STROME-ID criteria were:

stating the study’s overarching objectives and hypotheses (STROBE-3), stating the epidemiological objectives of using

molecular typing (STROME-ID 3.1), and stating the source of participants, clinical specimens and the sampling frame

(STROME-ID 6.1).

Supplemental Figure 7. Proportion of STROME-ID criteria met post-publication, excluding articles from the 12-month lag.

A sensitivity analysis of the post-guideline period was conducted, excluding articles

published during the twelve-month lag. Five criteria were not met; these were the same criteria not

met in the pre-publication period accounting for the 12-month lag.

Supplemental tables

Supplemental Table 1. STROME-ID criteria, adapted from Field et al.2

Criteria Description of criteria

STROBE-1(a) Denote study’s design using a term in the title or the abstract

STROBE-1(b) Briefly describe methods and results in the abstract

STROME-ID 1.1 The term molecular epidemiology is mentioned in the title or abstract and the keywords

STROBE-2 Describe the scientific context and rationale of the methods used

STROME-ID 2.1 Discuss the pathogen population and the distribution of pathogen strains within the host population

STROBE-3 State study objectives and any prespecified hypotheses

STROME-ID 3.1 State the epidemiological objectives of using molecular typing

STROBE-4 Discuss study design early in the paper

STROME-ID 4.1 Define key molecular terminologies used in the study

STROME-ID 4.2 Define the molecular markers using a standard nomenclature

STROME-ID 4.3 Provide definitions for infectious-disease cases

STROME-ID 4.4 Discuss methods about sample collection, laboratory techniques, and minimizing cross-contamination. Provide criteria for

identifying strains

STROBE-5 Provide information about the locations, dates, participant recruitment, exposure, follow-up, and data collection

STROME-ID 5.1 Mention the timeframe of the study. Discuss the molecular clock of markers if known, and its natural history

STROBE-6(a) Cohort study—Provide eligibility criteria, and the sources and methods for including participants. Explain follow-up methods

Case-control study— Provide eligibility criteria, and the sources and methods of case ascertainment and control selection.

Provide explanation for use of cases and controls

Cross-sectional study— Provide eligibility criteria, and the sources and methods of participant selection

STROBE-6(b) Cohort study—For matched studies, state matching criteria and number of exposed and unexposed participants

Case-control study—For matched studies, state matching criteria and controls per case

STROME-ID 6.1 Discuss source of participants and clinical specimens. State sampling frame and strategy

STROBE-7 Report all outcomes, exposures, predictors, potential confounders, and effect modifiers. Give diagnostic criteria, if applicable

STROBE-8 For each variable of interest, state data sources and methods of assessment. If more than one group, state comparability of

assessment methods.

STROME-ID 8.1 Explain detection of multiple-strain infections

STROBE-9 Explain methods to address potential sources of bias

STROME-ID 9.1 Explain how discovery or ascertainment bias was addressed

STROBE-10 Explain the rationale for study size

STROME-ID 10.1 Report unique restrictions placed on the study sample size

STROBE-11 Describe the analyses of quantitative variables. If relevant, describe rationale for groupings

STROBE-12(a) Describe all statistical methods, including those used to control for confounding

STROBE-12(b) Describe any methods used to examine subgroups and interactions

STROBE-12(c) Discuss methods for addressing missing data

STROBE-12(d) Cohort study—if applicable, explain how loss to follow-up was addressed

Case-control study—if applicable, explain how matching of cases and controls was addressed

Cross-sectional study—if applicable, describe analytical methods taking account of sampling strategy

STROBE-12(e) Describe any sensitivity analyses

STROME-ID 12.1 State how the study took account of the non-independence of sample data, if appropriate

STROME-ID 12.2 Explain methods for addressing missing data

STROBE-13(a) Discuss count of individuals at each stage of the study

STROBE-13(b) Provide rationale for non-participation at each stage

STROBE-13(c) Uses a flow diagram

STROME-ID 13.1 Report numbers of participants and samples at each stage of the study (e.g., number of samples, the number typed, and the

number yielding data)

STROME-ID 13.2 If molecular clusters are investigated, report the sampling fraction, cluster sizes, and the study population turnover, if known

STROBE-14(a) Provide characteristics of study participants, including details about exposures and potential confounders

STROBE-14(b) Denote the number of individuals with missing data for each variable of interest

STROBE-14(c) Cohort study-summarise follow-up time

STROME-ID 14.1 Give information by strain type if appropriate, with use of standardised nomenclature

STROBE-15 Cohort study—state numbers of outcome events or summary measures over time

Case-control study—state count of each exposure category

Cross-sectional study—state numbers of outcome events or summary measures

STROBE-16(a) Provide unadjusted estimates and, if relevant, confounder-adjusted estimates and their precision. Explain which confounders

were adjusted for and why

STROBE-16(b) State category boundaries for continuous variables that were categorised

STROBE-16(c) If relevant, convert relative risk into absolute risk

STROME-ID 16.1 Illustrate molecular similarity among strains with a dendrogram or phylogenetic tree

STROBE-17 Report other analyses done, such as subgroup analyses

STROBE-18 Discuss key results that consider study objectives

STROBE-19 Report limitations, including potential bias or imprecision, and their direction and magnitude.

STROME-ID 19.1 Consider other possible explanations for findings about transmission chains if relevant. State the consistency between molecular

and epidemiological evidence

STROBE-20 Discussion of results that consider objectives, limitations, and other studies’ results

STROBE-21 Explain the generalizability of study results

STROBE-22 Provide funding sources and their role

STROME-ID 23.1 State ethical considerations and implications for infectious-disease molecular epidemiology

Supplemental Table 2. Count of papers per continent of senior author’s primary affiliation.

Continent of senior author’s primary affiliation Count of papers

North America 32

South America 1

Africa 6

Asia 13

Europe 54

Oceania 8

Note: Due to low individual country counts, countries were grouped by continent, where South America was

included with North America for the category “Americas” because it had only one count.

Supplemental Table 3. Standard deviation of journal IF from 2013-2018, shown for the journals

corresponding to an article published in 2019.

Journal SD

BMC Genomics 0·12

BMC Infectious Diseases 0·07

Clinical Infectious Diseases 0·30

Emerging Infectious Diseases 0·46

J Clin Microbiol 0·44

Molecular Ecology 0·20

Nature Scientific Reports 0·58

PLOS One 0·17

Tuberculosis 0·09

Supplemental Table 4. Sensitivity univariate and multivariate analysis for quasi-Poisson, excluding seven

papers with senior authors from >1 continent.

Univariate Multivariate

Variables IRR 95% CI P-value IRR 95% CI P-value

IF

0-4.9999*

5-9.9999 1·10 1·00, 1·21 0·06 1·10 0·99, 1·22 0·09

10-19.9999 1·19 1·02, 1·39 0·03 1·18 0·99, 1·40 0·07

≥20 1·17 1·02, 1·33 0·02 1·15 0·99, 1·34 0·08

HI 1·00 1·00, 1·00 0·31

Continent

Americas *†

Africa 0·98 0·79, 1·22 0·89 1·00 0·80, 1·24 0·97

Asia 0·91 0·77, 1·06 0·23 0·95 0·81, 1·11 0·53

Europe 0·91 0·82, 1·00 0·07 0·91 0·82, 1·00 0·06

Oceania 0·92 0·76, 1·11 0·40 1·00 0·81, 1·21 0·96

SS

<30*

30-152 1·03 0·92, 1·16 0·59 1·00 0·89, 1·13 0·93

153-276 1·07 0·91, 1·25 0·40 1·03 0·88, 1·22 0·69

≥277 1·12 0·99, 1·26 0·08 1·05 0·91, 1·21 0·49

*Reference level

†Combined North America and South America; only 1 country from South America

IRR= Incidence rate ratio, CI= confidence interval, IF= impact factor, HI= H-index, SS= sample size of isolates

Supplemental Table 5. Sensitivity univariate and multivariate analysis for tobit regression, excluding seven

papers with senior authors from >1 continent.

Univariate Multivariate

Variables Coefficient 95% CI P-value Coefficient 95% CI P-value

IF

0-4.9999*

5-9.9999 0·04 -0·01, 0·09 0·14 0·02 -0·03, 0·08 0·34

10-19.9999 0·07 -0·02, 0·15 0·12 0·04 -0·04, 0·12 0·34

≥20 0·09 0·02, 0·16 0·01 0·06 -0·02, 0·13 0·13

HI 0·0001 -0·001, 0·001 0·86

Continent

Americas *†

Africa 0·04 -0·07, 0·15 0·45 0·03 -0·07, 0·13 0·59

Asia -0·05 -0·13, 0·02 0·17 -0·04 -0·12, 0·03 0·24

Europe -0·04 -0·10, 0·01 0·08 -0·05 -0·10, 0·00 0·05

Oceania -0·05 -0·15, 0·04 0·27 -0·01 -0·10, 0·08 0·85

SS

<30*

30-152 0·07 0·02, 0·13 0·01 0·06 0·01, 0·11 0·03

153-276 0·07 -0·01, 0·14 0·07 0·06 -0·02, 0·14 0·12

≥277 0·10 0·05, 0·16 < 0·0001 0·09 0·02, 0·15 0·01

*Reference level

†Combined North America and South America; only 1 country from South America

IRR= Incidence rate ratio, CI= confidence interval, IF= impact factor, HI= H-index, SS= sample size of isolates

Supplemental Table 6. Number of papers with unavailable raw genomic data per year.

Publication year Papers with unavailable raw genomic data Total papers

2009 0 1

2010 1 2

2011 0 1

2012 0 1

2013 0 9

2014 0 6

2015 3 18

2016 2 12

2017 3 17

2018 13 34

2019 3 13
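
The counts in Supplemental Table 6 can be turned into yearly and overall proportions with a few lines of Python. The sketch below only restates the table's values; the variable and function names are hypothetical.

```python
# year: (papers with unavailable raw genomic data, total papers),
# transcribed from Supplemental Table 6.
RAW_DATA_UNAVAILABLE = {
    2009: (0, 1),
    2010: (1, 2),
    2011: (0, 1),
    2012: (0, 1),
    2013: (0, 9),
    2014: (0, 6),
    2015: (3, 18),
    2016: (2, 12),
    2017: (3, 17),
    2018: (13, 34),
    2019: (3, 13),
}


def summarise(counts):
    """Return per-year proportions and the overall (unavailable, total) counts."""
    per_year = {year: u / t for year, (u, t) in counts.items()}
    overall = (
        sum(u for u, _ in counts.values()),
        sum(t for _, t in counts.values()),
    )
    return per_year, overall


per_year, overall = summarise(RAW_DATA_UNAVAILABLE)
for year in sorted(per_year):
    print(year, f"{per_year[year]:.1%}")
print("overall", overall, f"{overall[0] / overall[1]:.1%}")
```

The totals sum to 114 papers, of which 25 did not make raw genomic data available.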

Supplemental references

1. Liberati A, Altman DG, Tetzlaff J, et al. The PRISMA statement for reporting systematic reviews and

meta-analyses of studies that evaluate health care interventions: explanation and elaboration. Ann Intern Med 2009;

151(4): W65-W94.

2. Field N, Cohen T, Struelens MJ, et al. Strengthening the reporting of molecular epidemiology for infectious

diseases (STROME-ID): an extension of the STROBE statement. Lancet Infect Dis 2014; 14(4): 341-52.

3. Rao A, Brück K, Methven S, et al. Quality of reporting and study design of CKD cohort studies assessing

mortality in the elderly before and after STROBE: a systematic review. PLoS ONE 2016; 11(5): e0155078.

4. Adams AD, Benner RS, Riggs TW, Chescheir NC. Use of the STROBE checklist to evaluate the reporting

quality of observational research in obstetrics. Obstet Gynecol 2018; 132(2): 507-12.

5. Ghimire S, Kyung E, Lee H, Kim E. Oncology trial abstracts showed suboptimal improvement in reporting:

a comparative before-and-after evaluation using CONSORT for abstract guidelines. J Clin Epidemiol 2014; 67(6):

658-66.

6. Selman TJ, Morris RK, Zamora J, Khan KS. The quality of reporting of primary test accuracy studies in

obstetrics and gynaecology: application of the STARD criteria. BMC Women's Health 2011; 11(1): 8.

7. Mackinnon S, Drozdowska BA, Hamilton M, Noel-Storr AH, McShane R, Quinn T. Are methodological

quality and completeness of reporting associated with citation-based measures of publication impact? A secondary

analysis of a systematic review of dementia biomarker studies. BMJ Open 2018; 8(3): e020331.

8. Bhopal R, Rankin J, McColl E, et al. The vexed question of authorship: views of researchers in a British

medical faculty. BMJ 1997; 314(7086): 1009-12.

9. Riesenberg D, Lundberg GD. The order of authorship: who’s on first? JAMA 1990; 264(14): 1857.

10. Kuroki LM, Allsworth JE, Peipert JF. Methodology and analytic techniques used in clinical research:

associations with journal impact factor. Obstet Gynecol 2009; 114(4): 877-84.

11. Falagas ME, Kouranos VD, Michalopoulos A, Rodopoulou SP, Batsiou MA, Karageorgopoulos DE.

Original article: comparison of the distribution of citations received by articles published in high, moderate, and low

impact factor journals in clinical medicine. Intern Med J 2010; 40(8): 587-91.

12. Jajou R, de Neeling A, van Hunen R, et al. Epidemiological links between tuberculosis cases identified

twice as efficiently by whole genome sequencing than conventional molecular typing: A population-based study.

PLoS ONE 2018; 13(5).

13. Ocheretina O, Shen L, Escuyer VE, et al. Whole genome sequencing investigation of a tuberculosis

outbreak in Port-au-Prince, Haiti caused by a strain with a “low-level" rpoB mutation L511P - insights into a

mechanism of resistance escalation. PLoS ONE 2015; 10(6): e0129207.

14. Witney AA, Bateson AL, Jindani A, et al. Use of whole-genome sequencing to distinguish relapse from

reinfection in a completed tuberculosis clinical trial. BMC Med 2017; 15(1): 71.

15. Wyllie D, Davidson J, Walker T, et al. A quantitative evaluation of MIRU-VNTR typing against whole-

genome sequencing for identifying Mycobacterium tuberculosis transmission: a prospective observational cohort

study. EBioMedicine 2018; 34: 122-30.

16. Cabibbe AM, Trovato A, De Filippo MR, et al. Countrywide implementation of whole genome sequencing:

an opportunity to improve tuberculosis management, surveillance and contact tracing in low incidence countries.

Eur Respir J 2018.

17. Gurjav U, Outhred AC, Jelfs P, et al. Whole genome sequencing demonstrates limited transmission within

identified Mycobacterium tuberculosis clusters in New South Wales, Australia. PLoS ONE 2016; 11(10): e0163612.

18. Genestet C, Tatai C, Berland JL, et al. Prospective whole-genome sequencing in tuberculosis outbreak

investigation, France, 2017-2018. Emerg Infect Dis 2019; 25(3): 589-92.

19. Auld SC, Shah NS, Mathema B, et al. Extensively drug-resistant tuberculosis in South Africa: genomic

evidence supporting transmission in communities. Eur Respir J 2018; 52(4).

20. Mizukoshi F, Miyoshi-Akiyama T, Iwai H, et al. Genetic diversity of Mycobacterium tuberculosis isolates

from Tochigi prefecture, a local region of Japan. BMC Infect Dis 2017; 17(1): 365.

21. Bryant JM, Harris SR, Parkhill J, et al. Whole-genome sequencing to establish relapse or re-infection with

Mycobacterium tuberculosis: a retrospective observational study. Lancet Respir Med 2013; 1(10): 786-92.

22. Guerra-Assunçao JA, Houben RMGJ, Crampin AC, et al. Recurrence due to relapse or reinfection with

Mycobacterium tuberculosis: a whole-genome sequencing approach in a large, population-based cohort with a high

HIV infection prevalence and active follow-up. J Infect Dis 2015; 211(7): 1154-63.

23. Perez-Lago L, Comas I, Navarro Y, et al. Whole genome sequencing analysis of intrapatient

microevolution in Mycobacterium tuberculosis: potential impact on the inference of tuberculosis transmission. J Infect Dis 2014; 209(1): 98-108.

24. Casali N, Nikolayevskyy V, Balabanova Y, et al. Microevolution of extensively drug-resistant tuberculosis

in Russia. Genome Res 2012; 22(4): 735-45.

25. Kato-Maeda M, Ho C, Passarelli B, et al. Use of whole genome sequencing to determine the

microevolution of Mycobacterium tuberculosis during an outbreak. PLoS ONE 2013; 8(3): e58235.

26. Mehaffy C, Guthrie JL, Alexander DC, Stuart R, Rea E, Jamieson FB. Marked microevolution of a unique

Mycobacterium tuberculosis strain in 17 years of ongoing transmission in a high risk population. PLoS ONE 2014;

9(11): 0112928.

27. Ioerger TR, Koo S, No E-G, et al. Genome analysis of multi- and extensively-drug-resistant tuberculosis

from KwaZulu-Natal, South Africa. PLoS ONE 2009; 4(11): e7778.

28. Cohen KA, Abeel T, Manson McGuire A, et al. Evolution of Extensively Drug-Resistant Tuberculosis over

Four Decades: Whole Genome Sequencing and Dating Analysis of Mycobacterium tuberculosis Isolates from

KwaZulu-Natal. PLoS Medicine 2015; 12(9): e1001880.

29. Ioerger TR, Feng Y, Chen X, et al. The non-clonality of drug resistance in Beijing-genotype isolates of

Mycobacterium tuberculosis from the Western Cape of South Africa. BMC Genomics 2010; 11: 670.

30. Casali N, Broda A, Harris SR, Parkhill J, Brown T, Drobniewski F. Whole genome sequence analysis of a

large isoniazid-resistant tuberculosis outbreak in London: a retrospective observational study. PLoS Medicine 2016;

13(10): e1002137.

31. Eldholm V, Monteserin J, Rieux A, et al. Four decades of transmission of a multidrug-resistant

Mycobacterium tuberculosis outbreak strain. Nat Commun 2015; 6.

32. Comas I, Coscolla M, Luo T, et al. Out-of-Africa migration and Neolithic coexpansion of Mycobacterium

tuberculosis with modern humans. Nat Genet 2013; 45(10): 1176-82.

33. Luo T, Comas I, Luo D, et al. Southern East Asian origin and coexpansion of Mycobacterium tuberculosis

Beijing family with Han Chinese. Proc Natl Acad Sci U S A 2015; 112(26): 8136-41.

34. O'Neill MB, Shockey A, Zarley A, et al. Lineage specific histories of Mycobacterium tuberculosis dispersal

in Africa and Eurasia. Mol Ecol 2019; 28(13): 3241-56.

35. Stucki D, Brites D, Jeljeli L, et al. Mycobacterium tuberculosis lineage 4 comprises globally distributed and

geographically restricted sublineages. Nat Genet 2016; 48(12): 1535-43.

36. Chatterjee A, Nilgiriwala K, Saranath D, Rodrigues C, Mistry N. Whole genome sequencing of clinical

strains of Mycobacterium tuberculosis from Mumbai, India: A potential tool for determining drug-resistance and

strain lineage. Tuberculosis 2017; 107: 63-72.

37. Gautam SS, Mac Aogain M, Bower JE, Basu I, O'Toole RF. Differential carriage of virulence-associated

loci in the New Zealand Rangipo outbreak strain of Mycobacterium tuberculosis. Infect Dis 2017; 49(9): 680-8.

38. Koster K, Largen A, Foster JT, et al. Whole genome SNP analysis suggests unique virulence factor

differences of the Beijing and Manila families of Mycobacterium tuberculosis found in Hawaii. PLoS ONE 2018;

13(7).

39. Casali N, Nikolayevskyy V, Balabanova Y, et al. Evolution and transmission of drug-resistant tuberculosis

in a Russian population. Nat Genet 2014; 46(3): 279-86.

40. Winglee K, Manson McGuire A, Maiga M, et al. Whole genome sequencing of Mycobacterium africanum

strains from Mali provides insights into the mechanisms of geographic restriction. PLoS Negl Trop Dis 2016; 10(1):

e0004332.

41. Malm S, Linguissi LSG, Tekwu EM, et al. New Mycobacterium tuberculosis complex sublineage,

Brazzaville, Congo. Emerg Infect Dis 2017; 23(3): 423-9.

42. Sobkowiak B, Glynn JR, Houben R, et al. Identifying mixed Mycobacterium tuberculosis infections from

whole genome sequence data. BMC Genomics 2018; 19(1): 613.

Chapter 6. Discussion

6.1. Summary

The objectives of this thesis were to systematically assess the extent to which genomic

epidemiology studies of TB reported STROME-ID criteria, as well as whether this improved

after publication of STROME-ID reporting guidelines, and to investigate whether there was an

association between reporting quality and study characteristics. The following was achieved:

1. The extent of STROME-ID guideline reporting was assessed among TB genomic

epidemiology studies, from the first published manuscript in this area in 2009 through to

2019. The average proportion of the completeness of STROME-ID reporting before and

after its publication was 51% (±11%) and 46% (±14%), respectively (Table 2). Overall,

completeness of reporting ranged from 16·3-75·0% (mean 49·9%, ± 11·88%).

2. HI did not meet the criteria for inclusion in the multivariate analysis. Significant

associations were not identified between reporting quality and IF, SS, and continent of

senior author’s primary affiliation. In the tobit model, only a minor association was

observed between larger samples (SS≥277) and a greater proportion of eligible criteria

completed in one analysis.

Although larger samples were found to be significantly associated with a higher proportion of

criteria met, this was not interpreted as an epidemiologically meaningful difference. This result

was only found in one of the analyses, and a significant association was not observed for

categories of other sample sizes. Moreover, as discussed in the manuscript, this association

represented only a minor increase of less than 10% compared with the

reference of <30 samples.
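
The completeness figures summarized above are per-article proportions of eligible STROME-ID criteria reported, averaged across articles. A minimal sketch of that calculation follows, assuming each article is represented by a pair of counts (criteria reported, criteria eligible). The function names and the example counts are hypothetical and are not data from this thesis.

```python
from statistics import mean, stdev


def completeness(reported: int, eligible: int) -> float:
    """Proportion of eligible STROME-ID criteria that an article reported."""
    if eligible <= 0:
        raise ValueError("an article needs at least one eligible criterion")
    if not 0 <= reported <= eligible:
        raise ValueError("reported must lie between 0 and eligible")
    return reported / eligible


def summarize(proportions):
    """Return (mean, sample standard deviation) of per-article proportions."""
    return mean(proportions), stdev(proportions)


# Hypothetical (reported, eligible) counts for three articles.
articles = [(20, 40), (15, 30), (27, 36)]
props = [completeness(r, e) for r, e in articles]
avg, sd = summarize(props)
print(f"mean completeness {avg:.1%} (SD {sd:.1%})")
```

Dividing by the number of eligible criteria, rather than by the full checklist length, keeps articles from being penalized for criteria that do not apply to their design.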

Only one article explicitly referred to STROME-ID guidelines. 105 As briefly mentioned in the

manuscript, this may suggest lack of awareness regarding STROME-ID guidelines. Although

this has not been specifically examined for STROME-ID, a survey of author attitudes towards

STROBE guidelines found that 185 (18.2%) of participants were not aware of guidelines. 113

This is surprising given that STROBE guidelines had already been in place for twelve years at the

time of the survey. Furthermore, this survey found that the majority of participants

(70.7%, n = 718) were not aware of any STROBE extensions, including STROME-ID, which

had by then been available for five years. 113 These findings reinforce that formal

journal endorsement may be needed to enforce reproducibility, which aligns with other studies

that have investigated CONSORT guidelines, including a review 114 and a randomized controlled

trial. 115

Although few studies have examined quality per STROBE guidelines and the impact of journal

endorsement 116,117, they suggest that journal endorsement alone is not an effective policy

strategy. To achieve high reporting levels, constant enforcement by journal editors, reviewers,

and support by senior research investigators is likely needed to cultivate a strong culture of

responsibility, as use of reporting guidelines is influenced by a variety of individual,

professional, environmental and logistical factors.118

6.2. Strengths and limitations

A strength of this thesis is that it has addressed a neglected area of research by systematically

examining reporting levels in genomic epidemiology studies. This thesis has extended the work

of the few studies that have examined reporting quality per STROBE among observational

studies for infectious diseases.119,120

A limitation of this study is that this work did not account for possible “within-journal”

clustering, although it included articles from 43 unique journals. Another limitation concerns the

lag time period. In this thesis, a lag time of six months was used to account for articles in-press

after guideline publication, and a sensitivity analysis was done with a twelve-month lag time.

However, these time periods may not have been long enough for guideline dissemination and

uptake, especially if there were other reporting frameworks that were also being promoted

around the same time. Eight other extensions were published in the same year after the release of

STROME-ID guidelines, according to the EQUATOR network.
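
The lag-time approach described above can be sketched as a simple date cutoff. This is an illustration only: the publication date used here is a placeholder taken from the April 2014 issue date of the guideline, and treating articles published inside the lag window as pre-guideline is an assumption, not necessarily the rule applied in this thesis. The names `GUIDELINE_DATE`, `add_months` and `period` are hypothetical.

```python
from datetime import date

# Placeholder publication date for the guideline (assumption for illustration).
GUIDELINE_DATE = date(2014, 4, 1)


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, keeping the day of month.

    Safe here because the placeholder date falls on day 1; days 29-31
    could be invalid in the target month and would raise ValueError.
    """
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, d.day)


def period(published: date, lag_months: int) -> str:
    """Label an article 'pre' or 'post' guideline, given a lag in months."""
    cutoff = add_months(GUIDELINE_DATE, lag_months)
    return "post" if published >= cutoff else "pre"


# Hypothetical article: its label changes between the two lag choices.
article_date = date(2014, 12, 15)
for lag in (6, 12):
    print(lag, period(article_date, lag))
```

Running the classification under both lag values shows which articles change group, which is the set that drives any difference between the main and sensitivity analyses.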

6.3. Future directions for research

It is unclear why there is low reporting among genomic epidemiology studies of TB. It is

possible that authors are still unaware of STROME-ID guidelines, or, if they are aware, that the

guidelines are otherwise too difficult or time-consuming to complete. To investigate these

reasons, qualitative studies should be conducted to assess authors’ attitudes towards STROME-

ID guidelines, including any perceived barriers or facilitators to reporting compliance. These

qualitative studies may also help to identify other possible correlates of reporting quality. Lastly,

researchers should consider developing mandatory reporting guidelines for genomic data to

encourage higher reporting levels beyond social norm expectations among the scientific

community. Strengthening the evidence base about reporting quality would provide authors,

guideline developers and journal editors with more compelling reasons for supporting and using

STROME-ID reporting guidelines.118

Chapter 7. Conclusions

In order to realize the benefits of WGS technology in public health decision-making regarding

TB prevention and control, reporting quality among genomic epidemiology studies needs to be

improved. The impact of reporting guidelines on reporting quality and its correlates that were

investigated in this thesis highlights current gaps in STROME-ID reporting. Reproducibility and

data-sharing may be encouraged through more focused reporting criteria for WGS studies,

formal journal endorsement, and mandatory reporting of STROME-ID guidelines.

References

1. World Health Organization. Global tuberculosis control [WHO report]. Geneva,

Switzerland: World Health Organization; 2018. Available from:

http://www.who.int/tb/publications/global_report/en/.

2. Meehan CJ, Goig GA, Kohl TA, Verboven L, Dippenaar A, Ezewudo M, et al. Whole

genome sequencing of Mycobacterium tuberculosis: current standards and open issues. Nat Rev

Microbiol. 2019;17(9):533-45.

3. Miyakawa T. No raw data, no science: another possible source of the reproducibility

crisis. Mol Brain. 2020;13(1):24.

4. McDonough KA, Kress Y, Bloom BR. Pathogenesis of tuberculosis: interaction of

Mycobacterium tuberculosis with macrophages. Infect Immun. 1993;61(7):2763-73.

5. Lin PL, Flynn JL. Understanding latent tuberculosis: a moving target. J Immunol.

2010;185(1):15-22.

6. Getahun H, Matteelli A, Chaisson RE, Raviglione M. Latent Mycobacterium tuberculosis

infection. N Engl J Med. 2015;372(22):2127-35.

7. Orme IM, Basaraba RJ. The formation of the granuloma in tuberculosis infection. Semin

Immunol. 2014;26(6):601-9.

8. Schraufnagel DE. “Latent tuberculosis infection” is a term that should go dormant, and

the significance of latent tuberculosis should be rethought. Ann Am Thorac Soc. 2016;13(5):593-

4.

9. Vynnycky E, Fine PEM. Lifetime risks, incubation period, and serial interval of

tuberculosis. Am J Epidemiol. 2000;152(3):247-63.

10. Horsburgh CR, Barry CE, Lange C. Treatment of tuberculosis. N Engl J Med.

2015;373(22):2149-60.

11. Piccazzo R, Paparo F, Garlaschi G. Diagnostic accuracy of chest radiography for the

diagnosis of tuberculosis (TB) and its role in the detection of latent TB Infection: a systematic

review. J Rheumatol. 2014;91:32-40.

12. World Health Organization. Chest radiography in tuberculosis detection: summary of

current WHO recommendations and guidance on programmatic approaches. Switzerland; 2016.

13. World Health Organization. Treatment of tuberculosis: guidelines for national programs.

4th edition. Geneva: WHO; 2010.

14. Daley CL. Molecular epidemiology: A tool for understanding control of tuberculosis

transmission. Clin Chest Med. 2005;26(2):217-31.

15. Hasnain SE, O'Toole RF, Grover S, Ehtesham NZ. Whole genome sequencing: A new

paradigm in the surveillance and control of human tuberculosis. Tuberculosis. 2015;95(2):91-4.

16. World Health Organization. WHO guidelines on tuberculosis infection prevention and

control: 2019 update. Geneva: World Health Organization; 2019.

17. Glaziou P, Floyd K, Raviglione MC. Global epidemiology of tuberculosis. Semin Respir

Crit Care Med. 2018;39(3):271-85.

18. Behr MA, Edelstein PH, Ramakrishnan L. Is Mycobacterium tuberculosis infection life

long? BMJ. 2019;367:l5770.

19. Sulis G, Roggi A, Matteelli A, Raviglione MC. Tuberculosis: epidemiology and control.

Mediterr J Hematol Infect Dis. 2014;6(1):e2014070-e.

20. Vachon J, Gallant V, Siu W. Tuberculosis in Canada, 2016. Can Commun Dis Rep.

2018;44(3-4):75-81.

21. Floyd K, Glaziou P, Houben RMGJ, Sumner T, White RG, Raviglione M. Global

tuberculosis targets and milestones set for 2016-2035: definition and rationale. Int J Tuberc Lung

Dis. 2018;22(7):723-30.

22. Huynh J, Marais BJ. Multidrug-resistant tuberculosis infection and disease in children: a

review of new and repurposed drugs. Ther Adv Infect Dis. 2019;6:2049936119864737-.

23. Al-Obaidi MJM, Suhali ZS, Desa MJM. Genotyping approaches for identification and

characterization of Staphylococcus aureus. In: Abdurakhmonov I, editor. Genotyping:

IntechOpen; 2018.

24. Niemann S, Supply P. Diversity and evolution of Mycobacterium tuberculosis: moving to

whole-genome-based approaches. Cold Spring Harb Perspect Med. 2014;4(12):a021188-a.

25. Schürch AC, Arredondo-Alonso S, Willems RJL, Goering RV. Whole genome

sequencing options for bacterial strain typing and epidemiologic analysis based on single

nucleotide polymorphism versus gene-by-gene–based approaches. Clin Microbiol Infect.

2018;24(4):350-4.

26. van Dijk EL, Auger H, Jaszczyszyn Y, Thermes C. Ten years of next-generation

sequencing technology. Trends Genet. 2014;30(9):418-26.

27. National Tuberculosis Controllers Association/Centers for Disease Control and

Prevention Advisory Group on Tuberculosis Genotyping. Guide to the application of genotyping

to tuberculosis prevention and control. Atlanta, GA: US Department of Health and Human

Services; 2004.

28. Papaventsis D, Casali N, Kontsevaya I, Drobniewski F, Cirillo DM, Nikolayevskyy V.

Whole genome sequencing of Mycobacterium tuberculosis for detection of drug resistance: a

systematic review. Clin Microbiol Infect. 2017;23(2):61-8.

29. Hatherell H-A, Colijn C, Stagg HR, Jackson C, Winter JR, Abubakar I. Interpreting

whole genome sequencing for investigating tuberculosis transmission: a systematic review. BMC

Med. 2016;14(1):21.

30. Roetzer A, Diel R, Kohl TA, Ruckert C, Nubel U, Blom J, et al. Whole genome

sequencing versus traditional genotyping for investigation of a Mycobacterium tuberculosis

outbreak: a longitudinal molecular epidemiological study. PLoS Med. 2013;10(2):e1001387.

31. Köser CU, Ellington MJ, Cartwright EJP, Gillespie SH, Brown NM, Farrington M, et al.

Routine use of microbial whole genome sequencing in diagnostic and public health

microbiology. PLoS Pathog. 2012;8(8):e1002824-e.

32. Oakeson KF, Wagner JM, Mendenhall M, Rohrwasser A, Atkinson-Dunn R.

Bioinformatic analyses of whole-genome sequence data in a public health laboratory. Emerg

Infect Dis. 2017;23(9):1441-5.

33. Kwong JC, McCallum N, Sintchenko V, Howden BP. Whole genome sequencing in

clinical and public health microbiology. Pathology. 2015;47(3):199-210.

34. Ekblom R, Wolf JBW. A field guide to whole-genome sequencing, assembly and

annotation. Evolutionary Applications. 2014;7(9):1026-42.

35. Sandve GK, Nekrutenko A, Taylor J, Hovig E. Ten simple rules for reproducible

computational research. PLoS Comput Biol. 2013;9(10):e1003285.

36. Lubin IM, Aziz N, Babb LJ, Ballinger D, Bisht H, Church DM, et al. Principles and

recommendations for standardizing the use of the next-generation sequencing variant file in

clinical settings. J Mol Diagn. 2017;19(3):417-26.

37. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. The

strengthening the reporting of observational studies in epidemiology (STROBE) statement:

guidelines for reporting observational studies. J Clin Epidemiol. 2008;61(4):344-9.

38. Field N, Cohen T, Struelens MJ, Palm D, Cookson B, Glynn JR, et al. Strengthening the

reporting of molecular epidemiology for infectious diseases (STROME-ID): an extension of the

STROBE statement. Lancet Infect Dis. 2014;14(4):341-52.

39. Armstrong GL, MacCannell DR, Taylor J, Carleton HA, Neuhaus EB, Bradbury RS, et

al. Pathogen genomics in public health. N Engl J Med. 2019;381(26):2569-80.

40. Baxevanis AD, Bateman A. The importance of biological databases in biological

discovery. Curr Protoc Bioinformatics. 2015;50(1).

41. Leinonen R, Akhtar R, Birney E, Bower L, Cerdeno-Tarraga A, Cheng Y, et al. The

European Nucleotide Archive. Nucleic Acids Res. 2011;39:D28-31.

42. Baker M. Is there a reproducibility crisis? A Nature survey lifts the lid on how

researchers view the 'crisis' rocking science and what they think will help. Nature. 2016;533:452.

43. Glasziou P, Altman DG, Bossuyt P, Boutron I, Clarke M, Julious S, et al. Reducing waste

from incomplete or unusable reports of biomedical research. Lancet. 2014;383(9913):267-76.

44. Moher D. Reporting research results: a moral obligation for all researchers. Can J

Anaesth. 2007;54(5):331.

45. Phelan J, O’Sullivan DM, Machado D, Ramos J, Whale AS, O’Grady J, et al. The

variability and reproducibility of whole genome sequencing technology for detecting resistance

to anti-tuberculous drugs. Genome Med. 2016;8(1):132.

46. Wyres KL, Conway TC, Garg S, Queiroz C, Reumann M, Holt K, et al. WGS analysis

and interpretation in clinical and public health microbiology laboratories: what are the

requirements and how do existing tools compare? Pathogens. 2014;3(2):437-58.

47. Kanwal S, Khan FZ, Lonie A, Sinnott RO. Investigating reproducibility and tracking

provenance – A genomic workflow case study. BMC Bioinform. 2017;18(1):337.

48. O'Rawe J, Jiang T, Sun G, Wu Y, Wang W, Hu J, et al. Low concordance of multiple

variant-calling pipelines: practical implications for exome and genome sequencing. Genome

Med. 2013;5(3):28.

49. Jajou R, Kohl TA, Walker T, Norman A, Cirillo DM, Tagliani E, et al. Towards

standardisation: comparison of five whole genome sequencing (WGS) analysis pipelines for

detection of epidemiologically linked tuberculosis cases. Euro Surveill. 2019;24(50):1900130.

50. Selman TJ, Morris RK, Zamora J, Khan KS. The quality of reporting of primary test

accuracy studies in obstetrics and gynaecology: application of the STARD criteria. BMC

Women's Health. 2011;11(1):8.

51. Ghimire S, Kyung E, Lee H, Kim E. Oncology trial abstracts showed suboptimal

improvement in reporting: a comparative before-and-after evaluation using CONSORT for

Abstract guidelines. J Clin Epidemiol. 2014;67(6):658-66.

52. Jefferson T, Di Pietrantonj C, Debalini MG, Rivetti A, Demicheli V. Relation of study

quality, concordance, take home message, funding, and impact in studies of influenza vaccines:

systematic review. BMJ. 2009;338:b354.

53. Glujovsky D, Boggino C, Riestra B, Coscia A, Sueldo CE, Ciapponi A. Quality of

reporting in infertility journals. Fertil Steril. 2015;103(1):236-41.

54. Rao A, Brück K, Methven S, Evans R, Stel VS, Jager KJ, et al. Quality of reporting and

study design of CKD cohort studies assessing mortality in the elderly before and after STROBE:

a systematic review. PLoS ONE. 2016;11(5):e0155078.

55. Poorolajal J, Cheraghi Z, Irani AD, Rezaeian S. Quality of cohort studies reporting post

the strengthening the reporting of observational studies in epidemiology (STROBE) statement.

Epidemiol Health. 2011;33:e2011005-e.

56. Mackinnon S, Drozdowska BA, Hamilton M, Noel-Storr AH, McShane R, Quinn T. Are

methodological quality and completeness of reporting associated with citation-based measures of

publication impact? A secondary analysis of a systematic review of dementia biomarker studies.

BMJ Open. 2018;8(3):e020331.

57. Hirsch JE. An index to quantify an individual's scientific research output. PNAS USA.

2005;102(46):16569-72.

58. Costas R, Bordons M. The h-index: Advantages, limitations and its relation with other

bibliometric indicators at the micro level. J Informetr. 2007;1(3):193-203.

59. Hodge DR, Lacasse JR. Evaluating journal quality: is the h-index a better measure than

impact factors? Res Soc Work Pract. 2011;21(2):222-30.

60. Maleki F, Ovens K, McQuillan I, Kusalik AJ. Size matters: how sample size affects the

reproducibility and specificity of gene set analysis. Hum Genomics. 2019;13(1):42.

61. Fumagalli M. Assessing the effect of sequencing depth and sample size in population

genetics inferences. PLoS ONE. 2013;8(11):e79667-e.

62. Farrokhyar F, Chu R, Whitlock R, Thabane L. A systematic review of the quality of

publications reporting coronary artery bypass grafting trials. Can J Surg. 2007;50(4):266-77.

63. Lai TYY, Wong VWY, Lam RF, Cheng ACO, Lam DSC, Leung GM. Quality of

reporting of key methodological items of randomized controlled trials in clinical ophthalmic

journals. Ophthal Epidemiol. 2007;14(6):390-8.

64. Rios LP, Odueyungbo A, Moitri MO, Rahman MO, Thabane L. Quality of reporting of

randomized controlled trials in general endocrinology literature. J Clin Endocrinol Metab.

2008;93(10):3810-6.

65. van der Werf MJ, Ködmön C. Whole-genome sequencing as tool for investigating

international tuberculosis outbreaks: a systematic review. Front Public Health. 2019;7:87-.

66. Real J, Forné C, Roso-Llorach A, Martínez-Sánchez JM. Quality reporting of

multivariable regression models in observational studies: review of a representative sample of

articles published in biomedical journals. Medicine. 2016;95(20):e3653-e.

67. White IR, Carlin JB. Bias and efficiency of multiple imputation compared with complete-

case analysis for missing covariate values. Stat Med. 2010;29(28):2920-31.

68. Madden K, Phillips M, Solow M, McKinnon V, Bhandari M. A systematic review of

quality of reporting in registered intimate partner violence studies: where can we improve? J Inj

Violence Res. 2019;11(2):123-36.

69. Bastuji-Garin S, Sbidian E, Gaudy-Marqueste C, Ferrat E, Roujeau J-C, Richard M-A, et

al. Impact of STROBE statement publication on quality of observational study reporting:

interrupted time series versus before-after analysis. PLoS ONE. 2013;8(8):e64733.

70. Pouwels KB, Widyakusuma NN, Groenwold RHH, Hak E. Quality of reporting of

confounding remained suboptimal after the STROBE guideline. J Clin Epidemiol. 2016;69:217-

24.

71. Coxe S, West SG, Aiken LS. The analysis of count data: a gentle introduction to Poisson

regression and its alternatives. J Pers Assess. 2009;91(2):121-36.

72. Zeviani WM, Ribeiro PJ, Bonat WH, Shimakura SE, Muniz JA. The Gamma-count

distribution in the analysis of experimental underdispersed data. J Appl Stat. 2014;41(12):2616-

26.

73. Sellers KF, Morris DS. Underdispersion models: models that are “under the radar”.

Commun Stat-Theor M. 2017;46(24):12075-86.

74. Zeviani WM, Ribeiro PJ, Jr., Bonat WH, Shimakura SE, Muniz JA. The Gamma-count

distribution in the analysis of experimental underdispersed data. J Appl Stat. 2014;41(12):2616-

26.

75. Ramalho EA, Ramalho JJS, Murteira JMR. Alternative estimating and testing empirical

strategies for fractional regression models. J Econ Surv. 2011;25(1):19-68.

76. Moeller MM. Methods for analyzing proportions [Thesis]: The University of Texas;

2013.

77. Schmid M, Wickler F, Maloney KO, Mitchell R, Fenske N, Mayr A. Boosted beta

regression. PLoS ONE. 2013;8(4):e61623.

78. Twisk J, Rijmen F. Longitudinal tobit regression: A new approach to analyze outcome

variables with floor or ceiling effects. J Clin Epidemiol. 2009;62(9):953-8.

79. Hussain A, Rigby R, Stasinopoulos M, Enea M. A flexible approach for modelling a

proportion response variable: Loss given default. Proceedings of the 31st International Workshop

on Statistical Modelling (France). 2016.

80. Martey E, Al-Hassan RM, Kuwornu JKM. Commercialization of smallholder

agriculture in Ghana: a tobit regression analysis. Afr J Agric Res. 2012;7(14):2131-41.

81. Mujasi PN, Asbu EZ, Puig-Junoy J. How efficient are referral hospitals in Uganda? A

data envelopment analysis and tobit regression approach. BMC Health Serv Res. 2016;16(1):230.

82. Carter RE, Lipsitz SR, Tilley BC. Quasi-likelihood estimation for relative risk regression

models. Biostatistics. 2005;6(1):39-44.

83. Heinzl H, Mittlböck M. Pseudo R-squared measures for Poisson regression models with

over- or underdispersion. Comput Stat Data An. 2003;44(1):253-71.

84. Faraway JJ. Extending the linear model with R: generalized linear, mixed effects and

nonparametric regression models. Boca Raton: Chapman & Hall/CRC; 2006.

85. Wooldridge JM. Introductory econometrics: a modern approach. 3rd ed. Mason, OH:

South-Western, 2006.

86. Berg JS, Khoury MJ, Evans JP. Deploying whole genome sequencing in clinical practice

and public health: meeting the challenge one bin at a time. Genet Med. 2011;13(6):499-504.

87. Lee RS, Behr MA. The implications of whole-genome sequencing in the control of

tuberculosis. Ther Adv Infect Dis. 2016;3(2):47-62.

88. Peng RD, Dominici F, Zeger SL. Reproducible epidemiologic research. Am J Epidemiol.

2006;163(9):783-9.

89. Sorensen AA, Wojahn RD, Manske MC, Calfee RP. Using the strengthening the

reporting of observational studies in epidemiology (STROBE) statement to assess reporting of

observational trials in hand surgery. J Hand Surg Am. 2013;38(8):1584-9.

90. Agha RA, Fowler AJ, Limb C, Whitehurst K, Coe R, Sagoo H, et al. Impact of the

mandatory implementation of reporting guidelines on reporting quality in a surgical journal: A

before and after study. Int J Surg. 2016;30:169-72.

91. da Costa BR, Cevallos M, Altman DG, Rutjes AWS, Egger M. Uses and misuses of the

STROBE statement: bibliographic study. BMJ Open. 2011;1(1).

92. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gotzsche PC, Ioannidis JPA, et al. The

PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate

health care interventions: explanation and elaboration. Ann Intern Med. 2009;151(4):W65-W94.

93. Kuroki LM, Allsworth JE, Peipert JF. Methodology and analytic techniques used in

clinical research: associations with journal impact factor. Obstet Gynecol. 2009;114(4):877-84.

94. Falagas ME, Kouranos VD, Michalopoulos A, Rodopoulou SP, Batsiou MA,

Karageorgopoulos DE. Comparison of the distribution of citations received by articles published

in high, moderate, and low impact factor journals in clinical medicine. Intern Med J.

2010;40(8):587-91.

95. Cabibbe AM, Trovato A, De Filippo MR, Ghodousi A, Rindi L, Garzelli C, et al.

Countrywide implementation of whole genome sequencing: an opportunity to improve

tuberculosis management, surveillance and contact tracing in low incidence countries. Eur

Respir J. 2018;51(6):1800387.

96. Genestet C, Tatai C, Berland JL, Claude JB, Westeel E, Hodille E, et al. Prospective

whole-genome sequencing in tuberculosis outbreak investigation, France, 2017-2018. Emerg

Infect Dis. 2019;25(3):589-92.

97. Walker TM, Merker M, Knoblauch AM, Helbling P, Schoch OD, van der Werf MJ, et al.

A cluster of multidrug-resistant Mycobacterium tuberculosis among patients arriving in Europe

from the Horn of Africa: a molecular epidemiological study. Lancet Infect Dis. 2018;18(4):431-

40.

98. Parsons NR, Hiskens R, Price CL, Achten J, Costa ML. A systematic survey of the

quality of research reporting in general orthopaedic journals. J Bone Joint Surg Br.

2011;93(9):1154-9.

99. Hendriksma M, Joosten MHMA, Peters JPM, Grolman W, Stegeman I. Evaluation of the

quality of reporting of observational studies in otorhinolaryngology - based on the STROBE

statement. PLoS ONE. 2017;12(1):e0169316.

100. Sharp MK, Bertizzolo L, Rius R, Wager E, Gómez G, Hren D. Using the STROBE

statement: survey findings emphasized the role of journals in enforcing reporting guidelines. J

Clin Epidemiol. 2019;116:26-35.

101. Sharp MK, Tokalić R, Gómez G, Wager E, Altman DG, Hren D. A cross-sectional

bibliometric study showed suboptimal journal endorsement rates of STROBE and its extensions.

J Clin Epidemiol. 2019;107:42-50.

102. Sharp MK, Utrobicic A, Gomez G, Cobo E, Wager E, Hren D. The STROBE extensions:

protocol for a qualitative assessment of content and a survey of endorsement. BMJ Open.

2017;7(10).

103. Van Belkum A, Tassios PT, Dijkshoorn L, Haeggman S, Cookson B, Fry NK, et al.

Guidelines for the validation and application of typing methods for use in bacterial

epidemiology. Clin Microbiol Infect. 2007;13(s3):1-46.

104. Wyllie DH, Davidson JA, Grace Smith E, Rathod P, Crook DW, Peto TEA, et al. A

quantitative evaluation of MIRU-VNTR typing against whole-genome sequencing for

identifying Mycobacterium tuberculosis transmission: a prospective observational cohort study.

EBioMedicine. 2018;34:122-30.

105. Martin MA, Lee RS, Cowley LA, Gardy JL, Hanage WP. Within-host Mycobacterium

tuberculosis diversity and its utility for inferences of transmission. Microb Genom. 2018;4(10).

106. Lee KP, Schotland M, Bacchetti P, Bero LA. Association of journal quality indicators

with methodological quality of clinical research articles. JAMA. 2002;287(21):2805-8.

107. Bornmann L, Williams R. Can the journal impact factor be used as a criterion for the

selection of junior researchers? a large-scale empirical study based on ResearcherID data. J

Informetr. 2017;11(3):788-99.

108. Retzer V, Jurasinski G. Towards objectivity in research evaluation using bibliometric

indicators – a protocol for incorporating complexity. Basic Appl Ecol. 2009;10(5):393-400.

109. Oswald A. An examination of the reliability of prestigious scholarly journals: evidence

and implications for decision-makers. Economica. 2007;74(293):21-31.

110. Waltman L, Costas R, Jan van Eck N. Some limitations of the h index: a commentary on

Ruscio and colleagues' analysis of bibliometric indices. Measurement. 2012;10(3):172-5.

111. Nekrutenko A, Taylor J. Next-generation sequencing data interpretation: enhancing

reproducibility and accessibility. Nat Rev Genet. 2012;13(9):667-72.

112. Reality check on reproducibility. Nature. 2016;533(7604).

113. Simoneau J, Dumontier S, Gosselin R, Scott MS. Current RNA-seq methodology

reporting limits reproducibility. Brief Bioinform. 2019.

114. Bryant JM, Schürch AC, van Deutekom H, Harris SR, de Beer JL, de Jager V, et al.

Inferring patient to patient transmission of Mycobacterium tuberculosis from whole genome

sequencing data. BMC Infect Dis. 2013;13(1):110.

115. Fleming PS, Buckley N, Seehra J, Polychronopoulou A, Pandis N. Reporting quality of

abstracts of randomized controlled trials published in leading orthodontic journals from 2006 to

2011. Am J Orthod Dentofacial Orthop. 2012;142(4):451-8.

116. Al-Ghafli H, Kohl TA, Merker M, Varghese B, Halees A, Niemann S, et al. Drug-

resistance profiling and transmission dynamics of multidrug-resistant Mycobacterium

tuberculosis in Saudi Arabia revealed by whole genome sequencing. Infect Drug Resist.

2018;11:2219-29.

117. Alaridah N, Hallback ET, Tangrot J, Winqvistz N, Sturegard E, Floren-Johanssons K, et

al. Transmission dynamics study of tuberculosis isolates with whole genome sequencing in

southern Sweden. Sci Rep. 2019;9.

118. Arandjelovic I, Merker M, Richter E, Kohl TA, Savic B, Soldatovic I, et al. Longitudinal

outbreak of multidrug-resistant tuberculosis in a hospital setting, Serbia. Emerg Infect Dis.

2019;25(3):555-8.

119. Arnold A, Witney AA, Vergnano S, Roche A, Cosgrove CA, Houston A, et al. XDR-TB

transmission in London: Case management and contact tracing investigation assisted by early

whole genome sequencing. J Infect. 2016;73(3):210-8.

120. Auld SC, Shah NS, Mathema B, Brown TS, Ismail N, Omar SV, et al. Extensively drug-

resistant tuberculosis in South Africa: genomic evidence supporting transmission in

communities. Eur Respir J. 2018;52(4).

121. Ayabina D, Ronning JO, Alfsnes K, Debech N, Brynildsrud OB, Arnesen T, et al.

Genome-based transmission modeling separates imported tuberculosis from recent transmission

within an immigrant population. Microb Genom. 2018;4(10):e000219.

122. Bainomugisa A, Lavu E, Hiashiri S, Majumdar S, Honjepari A, Moke R, et al. Multi-

clonal evolution of multi-drug-resistant/extensively drug-resistant Mycobacterium tuberculosis in

a high-prevalence setting of Papua New Guinea for over three decades. Microb Genom.

2018;4(2):000147.

123. Bouzouita I, Cabibbe AM, Trovato A, Daroui H, Ghariani A, Midouni B, et al. Whole-

genome sequencing of drug-resistant Mycobacterium tuberculosis strains, Tunisia, 2012-2016.

Emerg Infect Dis. 2019;25(3):547-50.

124. Bjorn-Mortensen K, Soborg B, Koch A, Ladefoged K, Merker M, Lillebaek T, et al.

Tracing Mycobacterium tuberculosis transmission by whole genome sequencing in a high

incidence setting: a retrospective population-based study in East Greenland. Sci Rep.

2016;6:33180.

125. Black PA, de Vos M, Louw GE, van der Merwe RG, Dippenaar A, Streicher EM, et al.

Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in

Mycobacterium tuberculosis isolates. BMC Genomics. 2015;16(1):857.

126. Brown TS, Narechania A, Walker JR, Planet PJ, Bifani PJ, Kolokotronis S-O, et al.

Genomic epidemiology of Lineage 4 Mycobacterium tuberculosis subpopulations in New York

City and New Jersey, 1999-2009. BMC Genomics. 2016;17(1):947.

127. Bryant JM, Harris SR, Parkhill J, Dawson R, Diacon AH, van Helden P, et al. Whole-

genome sequencing to establish relapse or re-infection with Mycobacterium tuberculosis: A

retrospective observational study. Lancet Respir Med. 2013;1(10):786-92.

128. Bui DP, Oren E, Roe DJ, Brown HE, Harris RB, Knight GM, et al. A case-control study

to identify community venues associated with genetically-clustered, multidrug-resistant

tuberculosis disease in Lima, Peru. Clin Infect Dis. 2018;68(9):1547-55.

129. Casali N, Nikolayevskyy V, Balabanova Y, Ignatyeva O, Kontsevaya I, Harris SR, et al.

Microevolution of extensively drug-resistant tuberculosis in Russia. Genome Res.

2012;22(4):735-45.

130. Casali N, Nikolayevskyy V, Balabanova Y, Harris SR, Ignatyeva O, Kontsevaya I, et al.

Evolution and transmission of drug-resistant tuberculosis in a Russian population. Nat Genet.

2014;46(3):279-86.

131. Casali N, Broda A, Harris SR, Parkhill J, Brown T, Drobniewski F. Whole genome

sequence analysis of a large isoniazid-resistant tuberculosis outbreak in London: a retrospective

observational study. PLoS Med. 2016;13(10):e1002137.

132. Chatterjee A, Nilgiriwala K, Saranath D, Rodrigues C, Mistry N. Whole genome

sequencing of clinical strains of Mycobacterium tuberculosis from Mumbai, India: A potential

tool for determining drug-resistance and strain lineage. Tuberculosis. 2017;107:63-72.

133. Clark TG, Mallard K, Coll F, Preston M, Assefa S, Harris D, et al. Elucidating emergence

and transmission of multidrug-resistant tuberculosis in treatment experienced patients by whole

genome sequencing. PLoS ONE. 2013;8(12):e83012.

134. Cohen KA, Abeel T, Manson McGuire A, Desjardins CA, Munsamy V, Shea TP, et al.

Evolution of extensively drug-resistant tuberculosis over four decades: whole genome

sequencing and dating analysis of Mycobacterium tuberculosis isolates from KwaZulu-Natal.

PLoS Med. 2015;12(9):e1001880.

135. Comas I, Hailu E, Kiros T, Bekele S, Mekonnen W, Gumi B, et al. Population genomics

of Mycobacterium tuberculosis in Ethiopia contradicts the virgin soil hypothesis for human

tuberculosis in sub-Saharan Africa. Curr Biol. 2015;25(24):3260-6.

136. Comas I, Coscolla M, Luo T, Borrell S, Holt KE, Kato-Maeda M, et al. Out-of-Africa

migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans. Nat

Genet. 2013;45(10):1176-82.

137. Coscolla M, Barry PM, Oeltmann JE, Koshinsky H, Shaw T, Cilnis M, et al. Genomic

epidemiology of multidrug-resistant Mycobacterium tuberculosis during transcontinental spread.

J Infect Dis. 2015;212(2):302-10.

138. Dheda K, Limberis JD, Pietersen E, Phelan J, Esmail A, Lesosky M, et al. Outcomes,

infectiousness, and transmission dynamics of patients with extensively drug-resistant

tuberculosis and home-discharged patients with programmatically incurable tuberculosis: a

prospective cohort study. Lancet Respir Med. 2017;5(4):269-81.

139. Dixit A, Freschi L, Vargas R, Calderon R, Sacchettini J, Drobniewski F, et al. Whole

genome sequencing identifies bacterial factors affecting transmission of multidrug-resistant

tuberculosis in a high-prevalence setting. Sci Rep. 2019;9.

140. Doroshenko A, Pepperell CS, Heffernan C, Egedahl ML, Mortimer TD, Smith TM, et al.

Epidemiological and genomic determinants of tuberculosis outbreaks in First Nations

communities in Canada. BMC Med. 2018;16.

141. Eldholm V, Monteserin J, Rieux A, Lopez B, Sobkowiak B, Ritacco V, et al. Four

decades of transmission of a multidrug-resistant Mycobacterium tuberculosis outbreak strain.

Nat Commun. 2015;6.

142. Fiebig L, Kohl TA, Popovici O, Muhlenfeld M, Indra A, Homorodean D, et al. A joint

cross-border investigation of a cluster of multidrug-resistant tuberculosis in Austria, Romania

and Germany in 2014 using classic, genotyping and whole genome sequencing methods: Lessons

learnt. Euro Surveill. 2017;22(2).

143. Gardy JL, Johnston JC, Sui SJ, Cook VJ, Shah L, Brodkin E, et al. Whole-genome

sequencing and social-network analysis of a tuberculosis outbreak. N Engl J Med.

2011;364(8):730-9.

144. Gautam SS, Aogain MM, Cooley LA, Haug G, Fyfe JA, Globan M, et al. Molecular

epidemiology of tuberculosis in Tasmania and genomic characterisation of its first known multi-

drug resistant case. PLoS ONE. 2018;13(2):e0192351.

145. Gautam SS, Mac Aogain M, Bower JE, Basu I, O'Toole RF. Differential carriage of

virulence-associated loci in the New Zealand Rangipo outbreak strain of Mycobacterium

tuberculosis. Infect Dis. 2017;49(9):680-8.

146. Glynn JR, Guerra-Assuncao JA, Houben RM, Sichali L, Mzembe T, Mwaungulu LK, et

al. Whole genome sequencing shows a low proportion of tuberculosis disease is attributable to

known close contacts in rural Malawi. PLoS ONE. 2015;10(7):e0132840.

147. Guerra-Assuncao JA, Crampin AC, Houben RM, Mzembe T, Mallard K, Coll F, et al.

Large-scale whole genome sequencing of M. tuberculosis provides insights into transmission in a

high prevalence area. eLife. 2015;4:03.

148. Guerra-Assunçao JA, Houben RMGJ, Crampin AC, Mzembe T, Mallard K, Coll F, et al.

Recurrence due to relapse or reinfection with Mycobacterium tuberculosis: a whole-genome

sequencing approach in a large, population-based cohort with a high HIV infection prevalence

and active follow-up. J Infect Dis. 2015;211(7):1154-63.

149. Guthrie JL, Delli Pizzi A, Roth D, Kong C, Jorgensen D, Rodrigues M, et al. Genotyping

and Whole-Genome Sequencing to Identify Tuberculosis Transmission to Pediatric Patients in

British Columbia, Canada, 2005-2014. J Infect Dis. 2018;218(7):1155-63.

150. Ho ZJM, Chee CBE, Ong RTH, Sng LH, Peh WLJ, Cook AR, et al. Investigation of a

cluster of multi-drug resistant tuberculosis in a high-rise apartment block in Singapore. Int J

Infect Dis. 2018;67:46-51.

151. Holden KL, Bradley CW, Curran ET, Pollard C, Smith G, Holden E, et al. Unmasking

leading to a healthcare worker Mycobacterium tuberculosis transmission. J Hosp Infect.

2018;100(4):E226-E32.

152. Holt KE, McAdam P, Thai PVK, Thuong NTT, Ha DTM, Lan NN, et al. Frequent

transmission of the Mycobacterium tuberculosis Beijing lineage and positive selection for the

EsxW Beijing variant in Vietnam. Nat Genet. 2018;50(6):849-56.

153. Huang H, Ding N, Yang T, Li C, Jia X, Wang G, et al. Cross-sectional whole-genome

sequencing and epidemiological study of multidrug-resistant Mycobacterium tuberculosis in

China. Clin Infect Dis. 2019;69(3):405-13.

154. Ioerger TR, Koo S, No E-G, Chen X, Larsen MH, Jacobs WR, Jr., et al. Genome analysis

of multi- and extensively-drug-resistant tuberculosis from KwaZulu-Natal, South Africa. PLoS

ONE. 2009;4(11):e7778.

155. Ioerger TR, Feng Y, Chen X, Dobos KM, Victor TC, Streicher EM, et al. The non-

clonality of drug resistance in Beijing-genotype isolates of Mycobacterium tuberculosis from the

Western Cape of South Africa. BMC Genomics. 2010;11:670.

156. Ismail NA, Omar SV, Joseph L, Govender N, Blows L, Ismail F, et al. Defining

bedaquiline susceptibility, resistance, cross-resistance and associated genetic determinants: a

retrospective cohort study. EBioMedicine. 2018;28:136-42.

157. Jajou R, de Neeling A, Rasmussen EM, Norman A, Mulder A, van Hunen R, et al. A

predominant variable-number tandem-repeat cluster of Mycobacterium tuberculosis isolates

among asylum seekers in the Netherlands and Denmark, deciphered by whole-genome

sequencing. J Clin Microbiol. 2018;56(2).

158. Jajou R, De Neeling A, Van Hunen R, De Vries G, Schimmel H, Mulder A, et al.

Epidemiological links between tuberculosis cases identified twice as efficiently by whole

genome sequencing than conventional molecular typing: A population-based study. PLoS ONE.

2018;13(4):e0195413.

159. Jiang Q, Lu L, Wu J, Yang C, Prakash R, Zuo T, et al. Assessment of tuberculosis contact

investigation in Shanghai, China: An 8-year cohort study. Tuberculosis. 2018;108:10-5.

160. Kato-Maeda M, Ho C, Passarelli B, Banaei N, Grinsdale J, Flores L, et al. Use of whole

genome sequencing to determine the microevolution of Mycobacterium tuberculosis during an

outbreak. PLoS ONE. 2013;8(3):e58235.

161. Koster K, Largen A, Foster JT, Drees KP, Qian LS, Desmond EP, et al. Whole genome

SNP analysis suggests unique virulence factor differences of the Beijing and Manila families of

Mycobacterium tuberculosis found in Hawaii. PLoS ONE. 2018;13(7).

162. Koster KJ, Largen A, Foster JT, Drees KP, Qian LS, Desmond E, et al. Genomic

sequencing is required for identification of tuberculosis transmission in Hawaii. BMC Infect Dis.

2018;18.

163. Kato-Miyazawa M, Miyoshi-Akiyama T, Kanno Y, Takasaki J, Kirikae T, Kobayashi N.

Genetic diversity of Mycobacterium tuberculosis isolates from foreign-born and Japan-born

residents in Tokyo. Clin Microbiol Infect. 2015;21(3):248.

164. Korhonen V, Smit PW, Haanpera M, Casali N, Ruutu P, Vasankari T, et al. Whole

genome analysis of Mycobacterium tuberculosis isolates from recurrent episodes of tuberculosis,

Finland, 1995-2013. Clin Microbiol Infect. 2016;22(6):549-54.

165. Lalor MK, Casali N, Walker TM, Anderson LF, Davidson JA, Ratna N, et al. The use of

whole-genome sequencing in cluster investigation of a multidrug-resistant tuberculosis outbreak.

Eur Respir J. 2018;51(6).

166. Lanzas F, Karakousis PC, Sacchettini JC, Ioerger TR. Multidrug-resistant tuberculosis in

Panama is driven by clonal expansion of a multidrug-resistant Mycobacterium tuberculosis strain

related to the KZN extensively drug-resistant M. tuberculosis strain from South Africa. J Clin

Microbiol. 2013;51(10):3277-85.

167. Lee RS, Radomski N, Proulx JF, Manry J, McIntosh F, Desjardins F, et al. Reemergence

and amplification of tuberculosis in the Canadian Arctic. J Infect Dis. 2015;211(12):1905-14.

168. Lee RS, Radomski N, Proulx J-F, Levade I, Shapiro BJ, McIntosh F, et al. Population

genomics of Mycobacterium tuberculosis in the Inuit. Proc Natl Acad Sci USA.

2015;112(44):13609-14.

169. Luo T, Comas I, Luo D, Lu B, Wu J, Wei L, et al. Southern East Asian origin and

coexpansion of Mycobacterium tuberculosis Beijing family with Han Chinese. Proc Natl Acad

Sci USA. 2015;112(26):8136-41.

170. Luo T, Yang C, Peng Y, Lu L, Sun G, Wu J, et al. Whole-genome sequencing to detect

recent transmission of Mycobacterium tuberculosis in settings with a high burden of

tuberculosis. Tuberculosis. 2014;94(4):434-40.

171. Ma MJ, Yang Y, Wang HB, Zhu YF, Fang LQ, An XP, et al. Transmissibility of

tuberculosis among school contacts: An outbreak investigation in a boarding middle school,

China. Infect Genet Evol. 2015;32:148-55.

172. Macedo R, Pinto M, Borges V, Nunes A, Oliveira O, Portugal I, et al. Evaluation of a

gene-by-gene approach for prospective whole-genome sequencing-based surveillance of

multidrug resistant Mycobacterium tuberculosis. Tuberculosis. 2019;115:81-8.

173. Madrazo-Moya CF, Cancino-Munoz I, Cuevas-Cordoba B, Gonzalez-Covarrubias V,

Barbosa-Amezcua M, Soberon X, et al. Whole genomic sequencing as a tool for diagnosis of

drug and multidrug-resistance tuberculosis in an endemic region in Mexico. PLoS ONE.

2019;14(6).

174. Mai TQ, Martinez E, Menon R, Van Anh NT, Hien NT, Marais B, et al. Mycobacterium

tuberculosis drug resistance and transmission among human immunodeficiency virus-infected

patients in Ho Chi Minh City, Vietnam. Am J Trop Med Hyg. 2018;99(6):1397-406.

175. Makhado NA, Matabane E, Faccin M, Pincon C, Jouet A, Boutachkourt F, et al.

Outbreak of multidrug-resistant tuberculosis in South Africa undetected by WHO-endorsed

commercial tests: an observational study. Lancet Infect Dis. 2018;18(12):1350-9.

176. Malm S, Linguissi LSG, Tekwu EM, Vouvoungui JC, Kohl TA, Beckert P, et al. New

Mycobacterium tuberculosis complex sublineage, Brazzaville, Congo. Emerg Infect Dis.

2017;23(3):423-9.

177. Manson AL, Abeel T, Galagan JE, Sundaramurthi JC, Salazar A, Gehrmann T, et al.

Mycobacterium tuberculosis whole genome sequences from Southern India suggest novel

resistance mechanisms and the need for region-specific diagnostics. Clin Infect Dis.

2017;64(11):1494-501.

178. Manson AL, Cohen KA, Abeel T, Desjardins CA, Armstrong DT, Barry CE, et al.

Genomic analysis of globally diverse Mycobacterium tuberculosis strains provides insights into

the emergence and spread of multidrug resistance. Nat Genet. 2017;49(3):395-402.

179. Mehaffy C, Guthrie JL, Alexander DC, Stuart R, Rea E, Jamieson FB. Marked

microevolution of a unique Mycobacterium tuberculosis strain in 17 years of ongoing

transmission in a high risk population. PLoS ONE. 2014;9(11):e112928.

180. Merker M, Blin C, Mona S, Duforet-Frebourg N, Lecher S, Willery E, et al. Evolutionary

history and global spread of the Mycobacterium tuberculosis Beijing lineage. Nat Genet.

2015;47(3):242-9.

181. Merker M, Barbier M, Cox H, Rasigade JP, Feuerriegel S, Kohl TA, et al. Compensatory

evolution drives multidrug-resistant tuberculosis in central Asia. eLife. 2018;7.

182. Merker M, Kohl TA, Roetzer A, Truebe L, Richter E, Rusch-Gerdes S, et al. Whole

genome sequencing reveals complex evolution patterns of multidrug-resistant Mycobacterium

tuberculosis Beijing strains in patients. PLoS ONE. 2013;8(12):e82551.

183. Mizukoshi F, Miyoshi-Akiyama T, Iwai H, Suzuki T, Kiritani R, Kirikae T, et al. Genetic

diversity of Mycobacterium tuberculosis isolates from Tochigi prefecture, a local region of

Japan. BMC Infect Dis. 2017;17(1):365.

184. Mokrousov I, Shitikov E, Skiba Y, Kolchenko S, Chernyaeva E, Vyazovaya A. Emerging

peak on the phylogeographic landscape of Mycobacterium tuberculosis in west Asia: definitely

smoke, likely fire. Mol Phylogenetics Evol. 2017;116:202-12.

185. Mortimer TD, Weber AM, Pepperell CS. Signatures of selection at drug resistance loci in

Mycobacterium tuberculosis. mSystems. 2018;3(1).

186. Nelson KN, Shah NS, Mathema B, Ismail N, Brust JCM, Brown TS, et al. Spatial

patterns of extensively drug-resistant tuberculosis transmission in KwaZulu-Natal, South Africa.

J Infect Dis. 2018;218(12):1964-73.

187. Norheim G, Seterelv S, Arnesen TM, Mengshoel AT, Tonjum T, Ronning JO, et al.

Tuberculosis Outbreak in an Educational Institution in Norway. J Clin Microbiol.

2017;55(5):1327-33.

188. Ocheretina O, Shen L, Escuyer VE, Mabou M-M, Royal-Mardi G, Collins SE, et al.

Whole genome sequencing investigation of a tuberculosis outbreak in Port-au-Prince, Haiti

caused by a strain with a "low-level" rpoB mutation L511P - insights into a mechanism of

resistance escalation. PLoS ONE. 2015;10(6):e0129207.

189. O'Neill MB, Shockey A, Zarley A, Aylward W, Eldholm V, Kitchen A, et al. Lineage

specific histories of Mycobacterium tuberculosis dispersal in Africa and Eurasia. Mol Ecol.

2019;28(13):3241-56.

190. Otchere ID, Coscolla M, Sanchez-Buso L, Asante-Poku A, Meehan C, Osei-Wusu S, et

al. Comparative genomics of Mycobacterium africanum Lineage 5 and Lineage 6 from Ghana

suggests different ecological niches. Sci Rep. 2018;8:11269.

191. Outhred AC, Holmes N, Sadsad R, Martinez E, Jelfs P, Hill-Cawthorne GA, et al.

Identifying likely transmission pathways within a 10-year community outbreak of tuberculosis

by high-depth whole genome sequencing. PLoS ONE. 2016;11(3):e0150550.

192. Packer S, Green C, Brooks-Pollock E, Chaintarli K, Harrison S, Beck CR. Social network

analysis and whole genome sequencing in a cohort study to investigate TB transmission in an

educational setting. BMC Infect Dis. 2019;19.

193. Panossian B, Salloum T, Araj GF, Khazen G, Tokajian S. First insights on the genetic

diversity of MDR Mycobacterium tuberculosis in Lebanon. BMC Infect Dis. 2018;18.

194. Parvaresh L, Crighton T, Martinez E, Bustamante A, Chen S, Sintchenko V. Recurrence

of tuberculosis in a low-incidence setting: a retrospective cross-sectional study augmented by

whole genome sequencing. BMC Infect Dis. 2018;18.

195. Perdigao J, Silva H, Machado D, Macedo R, Maltez F, Silva C, et al. Unraveling genomic

diversity and evolution in Lisbon, Portugal, a highly drug resistant setting. BMC Genomics.

2014;15(1):991.

196. Perez-Lago L, Comas I, Navarro Y, Gonzalez-Candelas F, Herranz M, Bouza E, et al.

Whole genome sequencing analysis of intrapatient microevolution in Mycobacterium

tuberculosis: potential impact on the inference of tuberculosis transmission. J Infect Dis.

2014;209(1):98-108.

197. Regmi SM, Chaiprasert A, Kulawonganunchai S, Tongsima S, Coker OO, Prammananan

T, et al. Whole genome sequence analysis of multidrug-resistant Mycobacterium tuberculosis

Beijing isolates from an outbreak in Thailand. Mol Genet Genomics. 2015;290(5):1933-41.

198. Roycroft E, O'Toole RF, Fitzgibbon MM, Montgomery L, O'Meara M, Downes P, et al.

Molecular epidemiology of multi- and extensively-drug-resistant Mycobacterium tuberculosis in

Ireland, 2001-2014. J Infect. 2018;76(1):55-67.

199. Ruesen C, Chaidir L, van Laarhoven A, Dian S, Ganiem AR, Nebenzahl-Guimaraes H, et

al. Large-scale genomic analysis shows association between homoplastic genetic variation in

Mycobacterium tuberculosis genes and meningeal or pulmonary tuberculosis. BMC Genomics.

2018;19(1):122.

200. Rutaihwa LK, Menardo F, Stucki D, Gygli SM, Ley SD, Malla B, et al. Multiple

introductions of Mycobacterium tuberculosis lineage 2–Beijing into Africa over centuries. Front

Ecol Evol. 2019;7:112.

201. Saelens JW, Lau-Bonilla D, Moller A, Medina N, Guzman B, Calderon M, et al. Whole

genome sequencing identifies circulating Beijing-lineage Mycobacterium tuberculosis strains in

Guatemala and an associated urban outbreak. Tuberculosis. 2015;95(6):810-6.

202. Satta G, Witney AA, Shorten RJ, Karlikowska M, Lipman M, McHugh TD. Genetic

variation in Mycobacterium tuberculosis isolates from a London outbreak associated with

isoniazid resistance. BMC Med. 2016;14:1-9.

203. Schurch AC, Kremer K, Daviena O, Kiers A, Boeree MJ, Siezen RJ, et al. High-

resolution typing by integration of genome sequencing data in a large tuberculosis cluster. J Clin

Microbiol. 2010;48(9):3403-6.

204. Senghore M, Otu J, Witney A, Gehre F, Doughty EL, Kay GL, et al. Whole-genome

sequencing illuminates the evolution and spread of multidrug-resistant tuberculosis in Southwest

Nigeria. PLoS ONE. 2017;12(9):e0184510.

205. Seraphin MN, Didelot X, Nolan DJ, May JR, Khan MSR, Murray ER, et al. Genomic

investigation of a Mycobacterium tuberculosis outbreak involving prison and community cases

in Florida, United States. Am J Trop Med Hyg. 2018;99(4):867-74.

206. Shah NS, Auld SC, Brust JCM, Mathema B, Ismail N, Moodley P, et al. Transmission of

extensively drug-resistant tuberculosis in South Africa. N Engl J Med. 2017;376(3):243-53.

207. Smit PW, Vasankari T, Aaltonen H, Haanpera M, Casali N, Marttila H, et al. Enhanced

tuberculosis outbreak investigation using whole genome sequencing and IGRA. Eur Respir J.

2015;45(1):276-9.

208. Sobkowiak B, Glynn JR, Houben R, Mallard K, Phelan JE, Guerra-Assuncao JA, et al.

Identifying mixed Mycobacterium tuberculosis infections from whole genome sequence data.

BMC Genomics. 2018;19(1):613.

209. Stucki D, Ballif M, Bodmer T, Coscolla M, Maurer AM, Droz S, et al. Tracking a

tuberculosis outbreak over 21 years: strain-specific single-nucleotide polymorphism typing

combined with targeted whole-genome sequencing. J Infect Dis. 2015;211(8):1306-16.

210. Stucki D, Ballif M, Egger M, Furrer H, Altpeter E, Battegay M, et al. Standard

genotyping overestimates transmission of Mycobacterium tuberculosis among immigrants in a

low-incidence country. J Clin Microbiol. 2016;54(7):1862-70.

211. Stucki D, Brites D, Jeljeli L, Coscolla M, Liu Q, Trauner A, et al. Mycobacterium

tuberculosis lineage 4 comprises globally distributed and geographically restricted sublineages.

Nat Genet. 2016;48(12):1535-43.

212. Tyler AD, Randell E, Baikie M, Antonation K, Janella D, Christianson S, et al.

Application of whole genome sequence analysis to the study of Mycobacterium tuberculosis in

Nunavut, Canada. PLoS ONE. 2017;12(10):e0185656.

213. Vaziri F, Kohl TA, Ghajavand H, Kamakoli MK, Merker M, Hadifar S, et al. Genetic

diversity of multi- and extensively drug-resistant Mycobacterium tuberculosis isolates in the

capital of Iran, revealed by whole-genome sequencing. J Clin Microbiol. 2019;57(1).

214. Walker TM, Ip CL, Harrell RH, Evans JT, Kapatai G, Dedicoat MJ, et al. Whole-genome

sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational

study. Lancet Infect Dis. 2013;13(2):137-46.

215. Walker TM, Lalor MK, Broda A, Ortega LS, Morgan M, Parker L, et al. Assessment of

Mycobacterium tuberculosis transmission in Oxfordshire, UK, 2007-12, with whole pathogen

genome sequences: an observational study. Lancet Respir Med. 2014;2(4):285-92.

216. Winglee K, Manson McGuire A, Maiga M, Abeel T, Shea T, Desjardins CA, et al. Whole

genome sequencing of Mycobacterium africanum strains from Mali provides insights into the

mechanisms of geographic restriction. PLoS Negl Trop Dis. 2016;10(1):e0004332.

217. Witney AA, Bateson AL, Jindani A, Phillips PP, Coleman D, Stoker NG, et al. Use of

whole-genome sequencing to distinguish relapse from reinfection in a completed tuberculosis

clinical trial. BMC Med. 2017;15(1):71.

218. Wollenberg KR, Desjardins CA, Zalutskaya A, Slodovnikova V, Oler AJ, Quinones M, et

al. Whole-genome sequencing of Mycobacterium tuberculosis provides insight into the evolution

and genetic composition of drug-resistant tuberculosis in Belarus. J Clin Microbiol.

2017;55(2):457-60.

219. Wyllie DH, Davidson JA, Smith EG, Rathod P, Crook DW, Peto TEA, et al. A

quantitative evaluation of MIRU-VNTR typing against whole-genome sequencing for identifying

Mycobacterium tuberculosis transmission: a prospective observational cohort study.

EBioMedicine. 2018;34:122-30.

220. Yang C, Luo T, Shen X, Wu J, Gan M, Xu P, et al. Transmission of multidrug-resistant

Mycobacterium tuberculosis in Shanghai, China: a retrospective observational study using

whole-genome sequencing and epidemiological investigation. Lancet Infect Dis. 2017;17(3):275-84.

221. Yang CG, Lu LP, Warren JL, Wu J, Jiang Q, Zuo TY, et al. Internal migration and

transmission dynamics of tuberculosis in Shanghai, China: an epidemiological, spatial, genomic

analysis. Lancet Infect Dis. 2018;18(7):788-95.

222. Yimer SA, Namouchi A, Zegeye ED, Holm-Hansen C, Norheim G, Abebe M, et al.

Deciphering the recent phylogenetic expansion of the originally deeply rooted Mycobacterium

tuberculosis lineage 7. BMC Evol Biol. 2016;16(1):146.

223. Adams AD, Benner RS, Riggs TW, Chescheir NC. Use of the STROBE checklist to

evaluate the reporting quality of observational research in obstetrics. Obstet Gynecol.

2018;132(2):507-12.

224. Bhopal R, Rankin J, McColl E, Thomas L, Kaner E, Stacy R, et al. The vexed question of

authorship: views of researchers in a British medical faculty. BMJ. 1997;314(7086):1009-12.

225. Riesenberg D, Lundberg GD. The order of authorship: who’s on first? JAMA.

1990;264:1857.

226. Jajou R, de Neeling A, van Hunen R, de Vries G, Schimmel H, Mulder A, et al.

Epidemiological links between tuberculosis cases identified twice as efficiently by whole

genome sequencing than conventional molecular typing: a population-based study. PLoS One.

2018;13(4):e0195413.

227. Wyllie D, Davidson J, Walker T, Rathod P, Peto T, Robinson E, et al. A quantitative

evaluation of MIRU-VNTR typing against whole-genome sequencing for identifying

Mycobacterium tuberculosis transmission: a prospective observational cohort study.

EBioMedicine. 2018.

228. Gurjav U, Outhred AC, Jelfs P, McCallum N, Wang Q, Hill-Cawthorne GA, et al. Whole

genome sequencing demonstrates limited transmission within identified Mycobacterium

tuberculosis clusters in New South Wales, Australia. PLoS One. 2016;11(10):e0163612.

229. Moher D, Jones A, Lepage L, for the CONSORT Group. Use of the CONSORT statement and quality of

reports of randomized trials: a comparative before-and-after evaluation. JAMA.

2001;285(15):1992-5.

230. Cobo E, Cortés J, Ribera JM, Cardellach F, Selva-O'Callaghan A, Kostov B, et al. Effect

of using reporting guidelines during peer review on quality of final manuscripts submitted to a

biomedical journal: masked randomised trial. BMJ. 2011;343:d6783.

231. Ramke J, Palagyi A, Jordan V, Petkovic J, Gilbert CE. Using the STROBE statement to

assess reporting in blindness prevalence surveys in low and middle income countries. PLoS One.

2017;12(5):e0176178.

232. Fuller T, Pearson M, Peters J, Anderson R. What affects authors’ and editors’ use of

reporting guidelines? findings from an online survey and qualitative interviews. PLoS One.

2015;10(4):e0121585.

233. Huynh N, Baumann A, Loeb M. Reporting quality of the 2014 Ebola outbreak in Africa:

a systematic analysis. PLoS One. 2019;14(6):e0218170.

234. Lo C, Mertz D, Loeb M. Assessing the reporting quality of influenza outbreaks in the

community. Influenza Other Respir Viruses. 2017;11(6):556-63.

Appendix

Appendix 1. Distribution of H-index in included papers

Few eligible papers had an H-index >80. Most eligible papers had an H-index between 10 and 40.

Appendix 2. Distribution of count of eligible criteria met in included papers

The number of eligible STROME-ID criteria fulfilled varied. Few papers completed >10 of all eligible STROME-ID criteria.

Appendix 3. Distribution of proportion of all criteria met in included papers

Most papers reported a proportion of approximately 0.50 of all STROME-ID criteria, with few

papers completing a proportion of <0.2 or >0.70.