Supplementary Information

Global phylogeography and evolutionary history of Shigella dysenteriae type 1

Elisabeth Njamkepo, Nizar Fawal, Alicia Tran-Dien, Jane Hawkey, Nancy Strockbine, Claire Jenkins, Kaisar A. Talukder, Raymond Bercion, Konstantin Kuleshov, Renáta Kolínská, Julie E. Russell, Lidia Kaftyreva, Marie Accou-Demartin, Andreas Karas, Olivier Vandenberg, Alison E. Mather, Carl J. Mason, Andrew J. Page, Thandavarayan Ramamurthy, Chantal Bizet, Andrzej Gamian, Isabelle Carle, Amy Gassama Sow, Christiane Bouchier, Astrid Louise Wester, Monique Lejay-Collin, Marie-Christine Fonkoua, Simon Le Hello, Martin J. Blaser, Cecilia Jernberg, Corinne Ruckly, Audrey Mérens, Anne-Laure Page, Martin Aslett, Peter Roggentin, Angelika Fruth, Erick Denamur, Malabi Venkatesan, Hervé Bercovier, Ladaporn Bodhidatta, Chien-Shun Chiou, Dominique Clermont, Bianca Colonna, Svetlana Egorova, Gururaja P. Pazhani, Analia V. Ezernitchi, Ghislaine Guigon, Simon R. Harris, Hidemasa Izumiya, Agnieszka Korzeniowska-Kowal, Anna Lutyńska, Malika Gouali, Francine Grimont, Céline Langendorf, Monika Marejková, Lorea A. M. Peterson, Guillermo Perez-Perez, Antoinette Ngandjio, Alexander Podkolzin, Erika Souche, Mariia Makarova, German A. Shipulin, Changyun Ye, Helena Žemličková, Mária Herpay, Patrick A.D. Grimont, Julian Parkhill, Philippe Sansonetti, Kathryn E. Holt, Sylvain Brisse, Nicholas R. Thomson, François-Xavier Weill.

NATURE MICROBIOLOGY | ARTICLE NUMBER: 16027 | DOI: 10.1038/NMICROBIOL.2016.27 | www.nature.com/naturemicrobiology


Supplementary Fig. 1 Source, phylogenetic clustering, and putative origins of the 14 World War I (WWI) isolates of the Murray collection. a, The phylogeny shown is the maximum likelihood phylogeny based on 14,677 chromosomal SNPs (reference genome Sd197). b, Source of the strains with hypotheses concerning their geographic origin.

Figure content (not reproduced). Panel a: maximum likelihood tree of all isolates, with tips labelled by working name, a lineage key (lineages I to IV), a key marking WWI isolates as present or absent, and a scale bar of 0.0076. Panel b: WWI isolates grouped by lineage (II, IIIa, IIIb, IIIc), with the number of isolates binned as 1, 2-5 or 6-10; sources (2nd Western General Hospital, Manchester; Royal Navy Hospital Mtarfa, Malta; R.A.M. College, Millbank; Lister Institute of Preventive Medicine, London); and putative origins (West Africa via French colonial soldiers?; Japan via Imperial Navy ships; Gallipoli, evacuated troops).


Supplementary Fig. 2 Year of isolation vs. root-to-tip distances extracted by Path-O-Gen from an ML phylogeny. Linear regression line, slope, R² correlation coefficient, and time to the most recent common ancestor (TMRCA) are indicated for the whole dataset (panel a) and separately for lineages II to IV (panels b to d). Isolate CDC 3036-94, which was probably acquired during laboratory contamination with an old collection strain, was excluded from the analysis. The maximum likelihood (ML) phylogeny used is based on 14,677 chromosomal SNPs (reference genome Sd197).
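The regression described in this caption can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of root-to-tip distance against year of isolation and takes the TMRCA as the year at which the fitted line reaches zero distance. The function name and the data points are hypothetical and are not values from the study or the Path-O-Gen implementation.

```python
from statistics import mean


def root_to_tip_regression(years, distances):
    """Least-squares fit of root-to-tip distance against isolation year.

    Returns (slope, intercept, r_squared, tmrca), where tmrca is the
    x-intercept of the fitted line (the year at which distance is zero).
    """
    x_bar, y_bar = mean(years), mean(distances)
    sxx = sum((x - x_bar) ** 2 for x in years)
    syy = sum((y - y_bar) ** 2 for y in distances)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, distances))
    slope = sxy / sxx                      # substitutions per site per year
    intercept = y_bar - slope * x_bar
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    tmrca = -intercept / slope             # year where the line crosses zero
    return slope, intercept, r_squared, tmrca


# Hypothetical (year, distance) pairs chosen only to exercise the function.
years = [1920, 1950, 1980, 2010]
distances = [0.0010, 0.0016, 0.0022, 0.0028]

slope, intercept, r2, tmrca = root_to_tip_regression(years, distances)
print(f"slope={slope:.2e} per site per year, R^2={r2:.3f}, TMRCA={tmrca:.0f}")
```

A positive slope with a high R² is the usual indication of temporal signal; a single misdated tip can shift the x-intercept noticeably, which is presumably why isolate CDC 3036-94 was excluded.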


Supplementary Fig. 3 Correlation between pulsed-field gel electrophoresis (PFGE) data and genomic sequences from isolates recovered during two outbreaks in the Central African Republic in 2003-2004. a, Time span, location, morbidity and mortality of the two outbreaks in the Central African Republic (CAR), according to Bercion et al. (ref. 25). b, For each isolate analysed by XbaI–PFGE, the position within the maximum likelihood phylogenetic tree (reference genome Sd197) is shown. The dendrogram was generated using BioNumerics version 4.0 (Applied Maths, Sint-Martens-Latem, Belgium) and shows the results of cluster analysis on the basis of XbaI–PFGE fingerprinting. Similarity analysis was performed using the Dice coefficient, and clustering analysis was performed using the unweighted pair-group method with arithmetic averages (UPGMA). c, Original PFGE gel showing the different XbaI profiles (X1 to X18). The isolates that were whole-genome sequenced are named. Salmonella enterica serotype Braenderup H9812 was used as a molecular size marker (M).

Figure content (not reproduced).

Panel a: map of the Central African Republic and neighbouring countries (Chad, Sudan, Democratic Republic of Congo, Congo, Cameroon) showing prefectures and towns, with a 100 km scale bar and two outbreak annotations:

Jul 2003-Feb 2004: 2013 cases, 41 deaths
Aug-Dec 2004: 445 cases, 34 deaths

Panel b: maximum likelihood tree (scale bar 0.0076) with tips labelled by working name, a lineage key and a PFGE XbaI profile key (X01, X02, X5, X6, X08 to X18), shown next to Dice/UPGMA dendrograms (tolerance 0.5%; relative similarity axis from 50 to 100%) for PFGE-XbaI (profiles X1 to X18) and PFGE-NotI (profiles N1 to N17).

Panel c: PFGE gel. Marker (M) band sizes in kb: 1,135; 669; 453; 336; 244; 170; 77; 33. Lanes X8 to X18 are labelled with isolates 93/531 (X8), 99/4320 (X9), 99/7947 (X10), 99/9837 (X11), 99/10354 (X12), 99/9324 (X13), 00/5166 (X14), 00/8097 (X15), 01/1502 (X16), 03/2815 (X17) and 97/7783 (X18). Lanes X1 to X7 include isolates CAR1, CAR4, CAR5, CAR7, CAR10 and CAR19.
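The Dice/UPGMA clustering named in the caption can be sketched as follows. This is an illustrative reimplementation under simplifying assumptions, not the BioNumerics procedure: bands are matched by exact size rather than within a position tolerance, and the function names and band profiles (`P1` to `P3`) are hypothetical.

```python
from itertools import combinations


def dice_similarity(bands_a, bands_b):
    """Dice coefficient: 2 * shared bands / (bands in A + bands in B)."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def upgma(profiles):
    """Average-linkage clustering on Dice similarity.

    Returns the merge history as (cluster_1, cluster_2, similarity) tuples,
    where the similarity between two clusters is the mean of all pairwise
    similarities between their members.
    """
    def average_similarity(c1, c2):
        pairs = [(p, q) for p in c1 for q in c2]
        total = sum(dice_similarity(profiles[p], profiles[q]) for p, q in pairs)
        return total / len(pairs)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the highest average similarity.
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: average_similarity(*pair))
        merges.append((c1, c2, average_similarity(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical band profiles (fragment sizes in kb), for illustration only.
profiles = {
    "P1": {669, 453, 336, 244},
    "P2": {669, 453, 336, 170},
    "P3": {1135, 244, 77, 33},
}
merges = upgma(profiles)
for left, right, similarity in merges:
    print(left, right, f"{100 * similarity:.1f}%")
```

The merge similarities correspond to the branch points on a dendrogram's relative similarity axis.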


Supplementary Fig. 4 Complete antibiotic resistance data. The antibiotic susceptibility testing (AST) data, the presence of the different antibiotic resistance genes (ARGs), and the ARG-bearing structures are shown for each isolate, according to its position in the phylogeny (maximum likelihood phylogeny based on 14,677 chromosomal SNPs after mapping against the reference genome Sd197). Abbreviations: AMP, ampicillin; STR, streptomycin; SUL, sulfonamides; TMP, trimethoprim; SXT, cotrimoxazole; CHL, chloramphenicol; TET, tetracycline; NAL, nalidixic acid; and CIP, ciprofloxacin. GenBank accession numbers are given after the ARG element name. The sequence of R387 was obtained from the Wellcome Trust Sanger Institute website. Additional ARG elements were identified in this study. When known, the plasmid incompatibility group (inc) is given before the plasmid name.

Keys:
Antibiotic susceptibility testing (AST): resistant, susceptible, unknown.
Antibiotic resistance genes (ARGs) and ARG elements: presence, absence.
SRL-PAI variant: type A, type B, type C, type D, type E, not typable (remnant), absence.


Supplementary Fig. 5

Structure of the different variants of the Shigella resistance locus (SRL) pathogenicity island (SRL-PAI). The data were extracted from the PacBio sequences of isolates 40-81 (SRL-A), CDC ZB4 (SRL-B), 17/89 (SRL-C), CAR10 (SRL-D), and 99-9324 (SRL-E). All the Shigella resistance locus pathogenicity islands (SRL-PAIs) are inserted into the serW tRNA gene. Boxes containing yellow patterns correspond to antibiotic resistance genes. Boxes containing red patterns correspond to insertion sequences. Red labeling indicates insertions/deletions related to SRL-A. The inset shows the structure of the putative ancestral shf locus (as observed in various virulence plasmids from Shigella spp., including pSD1_197) before the insertion of the SRL which is shown above the shf locus. The 8-bp inverted repeats at both ends of the SRL are shown.

Figure content (not reproduced). Gene maps of SRL-A, SRL-B, SRL-C, SRL-D and SRL-E drawn against a 10,000 bp scale bar, with int, aadA1, blaOXA-1, catA1, tetA(B) and fecEDCBARI labelled. Annotated insertions/deletions: Δ 2-6::ISSd1; 42::ISSd1; 44::ISSd1; 48::ISEc12; 57::IS1; ISEc23; Δ 7-9; 2.5 kb insertion; Δ 47 and 6 kb insertion; IS629::sfiA group II intron; IS629-IS911::sfiA group II intron. Inset labels: virK, msbB2, shf, capU, capUΔ and IS911, with the repeat sequence shown as ATTTAAGC / TAAATTCG.


Supplementary Fig. 6

Structure of the chromosomally encoded transposon found in the CDC 87-3330 isolate. The transposon is shown in the upper part of the figure. Its structure is based on the PacBio sequence of isolate CDC 87-3330. Antibiotic resistance genes are boxed in yellow and insertion sequences are boxed in red. The 8-bp direct repeats at the two ends of the transposon are shown. The chromosomal location of the transposon is shown in the bottom part of the figure, using coordinates based on the S. dysenteriae type 1 reference genome Sd197. Regions of similarity to the SRL-PAI are also indicated in purple.

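Figure content (not reproduced). Transposon labels: catA1, tetA(B), IS1, IS10, IS10Δ (two copies) and ISShdy2, with the repeat 5'-GGCAGAGTG-3' shown at each end. Chromosomal context: sequence 5'-CATACAGGCAGAGTGGCCGTG-3' at coordinate 4,037,510, with coordinates 4,029,937 and 4,021,412 also marked (coordinates based on Sd197, CP000034). The region of similarity to SRL-PAI (AF326777) covers the fec operon (labels IS2, I, R, A, Ap, B, C, D, E).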


Supplementary Fig. 7 Circular maximum likelihood phylogeny of the sequenced S. dysenteriae type 1 genomes rooted on non-S. dysenteriae genomes. The tree is based on 140,385 SNPs called after mapping against Sd197. The working names of the genomes are shown (see Supplementary Table 1 for the correspondence with isolate names). The scale corresponds to 12,634 SNPs.

Figure content (not reproduced). Circular tree with tips labelled by working name and a scale bar of 0.09. Outgroup genomes: E. coli O157:H7 Sakai (NC_002695), E. coli K-12 MG1655 (NC_000913), S. flexneri 2a 2457T (AE014073), S. boydii Sb227 (NC_007613) and S. sonnei Ss046 (NC_007384). Published S. dysenteriae type 1 genomes shown alongside the sequenced isolates: Sd197 (CP000034), Sd1617 (CP006736) and M131649 (Sanger).
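The relation between the 0.09 scale bar and the SNP count quoted in the caption can be checked with a short calculation. The sketch assumes the scale bar is expressed in substitutions per site over the 140,385 variable positions; the function name is hypothetical.

```python
def scale_bar_to_snps(scale_bar, n_sites):
    """Convert a scale bar in substitutions per site into a SNP count."""
    return scale_bar * n_sites


# Values taken from the Supplementary Fig. 7 caption and scale bar.
fig7_snps = scale_bar_to_snps(0.09, 140_385)
print(f"0.09 x 140,385 = {fig7_snps:.2f} SNPs")
```

Under that assumption the product is about 12,634.65, which agrees with the 12,634 SNPs stated in the caption once the fraction is dropped.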


Supplementary Fig. 8

Capacity of M115 to generate mutations conferring resistance to rifampicin.

The results are presented as mean values (± standard errors) for two independent experiments. The E. coli ECOR48 (CIP 106023) strain was used as a strong mutator positive control, the S. dysenteriae type 1 97-13397 isolate was used as a putative strong mutator isolate (deletion of the mutS gene), and the S. dysenteriae type 1 isolates M116 and Sd197 were used as putative normomutator isolates (integrity of the mutS, mutH, mutL and uvrD methyl-directed mismatch repair genes).

Figure content (not reproduced). Bar chart of the frequency of mutation to rifampicin resistance (axis ticks from 1.00E-09 to 1.00E-05) by strain: ECOR48, Sd197, M115, M116 and 97-13397.
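As an illustration of how a mean and standard error are obtained from two independent experiments, the sketch below uses the sample standard deviation divided by the square root of the number of experiments. The strain names and frequencies are hypothetical placeholders, not measurements from the study, and the authors' exact calculation is not specified in the caption.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical mutation frequencies from two experiments per strain.
frequencies = {
    "strain_A": [2.0e-8, 4.0e-8],
    "strain_B": [1.0e-6, 3.0e-6],
}
summary = {name: mean_and_sem(values) for name, values in frequencies.items()}
for name, (avg, sem) in summary.items():
    print(f"{name}: {avg:.2e} +/- {sem:.2e}")
```

With only two experiments the standard error is simply half the absolute difference between the two values, so the error bars should be read as a rough indication of reproducibility.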


Supplementary Fig. 9 Circular maximum likelihood phylogeny of the entire set of sequenced S. dysenteriae type 1 genomes. The tree is based on 14,677 SNPs called after mapping against Sd197 (marked with a blue triangle). The tree was rooted on M115, which is most closely related to the S. dysenteriae type 1 ancestral strain. The intercontinental transmission events (T1 to T8) and the bootstrap values are shown.

Figure content (not reproduced). Circular tree with a scale bar of 0.02. Each tip label joins the working name, country and year of isolation with underscores, for example M115_UK_1926, Sd197_China_1950 and E39084_85_Democratic_Republic_Congo_1985.

exico_1972 CDC_

1007_74_US

A_1974

06_1307_Central_African_Republic_2006

98_4962_Madagascar_1998

E2560_73_Pakistan_1973

84934_Israel_2001

RAJ_K_Democratic_Republic_Congo_1985

HNCM

B_20001_Hungary_1935

CDC_C8558_Zambia_1992

E16184_79_Sri_Lanka_1979

Sh20_42_

Czech_R

epublic_1

942

03_2815_Niger_2003

E60750_89_Pakistan_1989

CDC_F344X1_Malawi_1993

CDC_F1371_Burundi_1993

SH_Lisb

onne_F

rance_1

921

CDC_

62_5000_Guatemala_1962

16_86_Mali_1986

CDC_F1372_Burundi_1993

Sh31_4

4_Czec

h_Repu

blic_19

44

7_87_Mali_1987

IPSP_15_158_Russia_1956

KO_21_India_1981

97_13397_France_1997

E2866_73_India_1973CDC_69_3823_G

uatemala_1969

KH08_0455_Nepal_2008

E1012_74_Mexico_1974

AR_33493_Nepal_2002

E1892_72_UK_1972

90

6883

100

100

98

100

100

59

74100

100

100

99

44

98

100

100

42

100

39

95

100

100

100

100

100

30

100

100

86

11

100

98

100

100

100

100

38

65

96

100

99

100

76

94

100

60

100

97

100

87

100

8387

100

100

100

83

67

100

43

100

100

100

100

74

100

10050

100

96

100

100

100 100

100

100

64

100

77

67

100

82

80

63

98

62

100

79

100

10068

100

89

73

100

100

74

100

100

20

100

94

63

99

41

10028

100

94

84

100

100

96

86

100

94

100

26

100

99

87

100

80

100

100

76

81

97

100

91

100

68

47

99

58

74

95

100

93

100

100

100100

100

100

92

85

100

96

86

98

100

98

10092

66

70

100

100100

98

100

100

92

71

60

9260

37

55

100

100

99

59

100

26100

100

100

45

99

60

100

100

72

71

100

100

100

100

100

100

99

56

91100

100

100

100

61

65

58

100

98

94

94

60

73

100

96

61

99

100

58

88

100

76

100

100

100

100

100

95

100

100

100

72

100

26

100

100

99

100

100

94

100

100

100

100

100

100

100

96

61100

100

100

100

100

100

100

100

100

100

100

100

47

64

18

99

71

100

34

84

100

96

72

100

83

100

98

100

100

100

100

100

86

100

96

93100

100

100

100

35

88

78

100

100

71

100

94

58

100

10084

100

100

59

84

95

100

100

94

100

100

99

100

100

92

72

99

57

97

100

72

100

100

100

100

43

100

100

64

18

100

100

T1

T2

T3 T4

T5 T6

T7

T8

Supplementary Fig. 10 Circular maximum likelihood phylogeny of the entire set of sequenced S. dysenteriae type 1 genomes plus the newly published reference genome Sd1617. The tree is based on 15,752 SNPs called after mapping against Sd1617 (marked with a blue triangle). The tree was rooted on M115, which is most closely related to the S. dysenteriae type 1 ancestral strain. The intercontinental transmission events (T1 to T8) and the bootstrap values are shown.

[Figure: circular tree, scale bar 0.02; tip labels (isolate_country_year, including Sd1617_Reference_Guatemala_1969), branch bootstrap values, and transmission-event labels T1 to T8 omitted.]

Supplementary Fig. 11 Comparison of the maximum likelihood trees obtained after mapping against two different S. dysenteriae type 1 reference genomes. The tree on the left is based on 14,677 SNPs called after mapping against Sd197 (lineage IIIa). The tree on the right is based on 15,752 SNPs called after mapping against Sd1617 (lineage IIIb). Coloured boxes mark the lineages: I (yellow), II (blue), III (green), and IV (red).

[Figure: two trees side by side; scale bars 0.0056 (left) and 0.011 (right); lineage boxes labelled I, II, III, and IV; tip labels omitted.]

Supplementary Fig. 12 Distribution of SNPs with respect to the S. dysenteriae type 1 reference genome Sd197. SNP counts per 10,000 bp window are plotted on the y-axis. The blue line indicates the mean rate of 39 SNPs per 10,000 bp (or 1 SNP per 256 bp). The peak is due to the rpoS gene, which contains 40 SNPs.

[Plot: x-axis, chromosomal coordinates (S. dysenteriae Sd197), ticks at 1,000,000 to 4,000,000; y-axis, SNPs per 10,000 bp, ticks at 0 to 80; secondary axis ticks at 0.0 to 0.008; peak labelled rpoS.]
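
The windowed counting behind a plot of this kind can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `snp_counts_per_window` and the toy positions are hypothetical, and non-overlapping windows are assumed. The last line only checks that the two ways the caption states the mean rate agree.

```python
from collections import Counter


def snp_counts_per_window(positions, genome_length, window=10_000):
    """Count SNPs in consecutive, non-overlapping windows of `window` bp.

    `positions` are 1-based chromosomal coordinates.
    Returns one count per window, in chromosome order.
    """
    n_windows = -(-genome_length // window)  # ceiling division
    per_window = Counter((pos - 1) // window for pos in positions)
    return [per_window.get(i, 0) for i in range(n_windows)]


# Toy data (hypothetical positions, not taken from the study).
toy_positions = [5, 9_999, 10_000, 10_001, 25_000, 29_999]
counts = snp_counts_per_window(toy_positions, genome_length=30_000)
mean_per_window = sum(counts) / len(counts)

# 39 SNPs per 10,000 bp corresponds to roughly one SNP every 256 bp.
bp_per_snp = 10_000 / 39
```

A window whose count stands far above the mean, as the one containing rpoS does in the figure, is then easy to flag by comparing each entry of `counts` with `mean_per_window`.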

Supplementary Fig. 13 Distribution of nucleotides at a key SNP position for the BS504 genome. The SNP at position 3,219,278 (T/G) is present in all the African lineage IV isolates of transmission wave T8, but not in other isolates. In the study by Rohmer et al. (ref. 2), the nucleotide called for this SNP in BS504 was G, despite a frequency of T residues of 36.5% at this position. The read mapping visualised by Artemis (ref. 46) at this position is also shown for our isolates CDC ZB4 (the original name of BS504, which was isolated in Zambia in 1991) and CDC 55-986 (Mexico, 1955).

[Figure panels: read mapping at nt 3,219,278 for SRR765101_BS504, CDC ZB4, and CDC 55-986. Read counts shown in the panels: T, n=157 and G, n=0; T, n=0 and G, n=142; T, n=31 and G, n=54.]
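
The 36.5% figure quoted in the caption can be reproduced from per-allele read counts. The sketch below assumes that the panel with 31 T reads and 54 G reads belongs to the BS504 read set; that assignment is inferred from the arithmetic, not stated in the extracted text. The helper name `allele_fraction` is hypothetical.

```python
def allele_fraction(read_counts, allele):
    """Fraction of reads at one position that carry `allele`."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads cover this position")
    return read_counts[allele] / total


# Assumed to be the BS504 panel (see the note above).
mixed_panel = {"T": 31, "G": 54}
t_fraction = allele_fraction(mixed_panel, "T")
print(f"T reads: {t_fraction:.1%}")

# The other two panels are unmixed: every read carries the same base.
pure_t = allele_fraction({"T": 157, "G": 0}, "T")
pure_g = allele_fraction({"T": 0, "G": 142}, "G")
```

A position where a minority allele reaches about a third of the reads is far from the clean pattern seen in the other two panels, which is why the G call for BS504 is flagged in the caption.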

Supplementary Fig. 14

Phenotypic and molecular markers from existing typing and subtyping schemes. The data are correlated with the maximum likelihood phylogeny based on 14,677 chromosomal SNPs (reference genome Sd197). The four genetic lineages are indicated by colour: I (yellow), II (blue), III (green), and IV (red). Columns next to the tree show the markers analysed. From left to right: ortho-nitrophenyl-β-galactoside test (ONPG); presence of Shiga toxin genes (stxA and stxB); multilocus sequence type (MLST); CRISPR spacer content (CRISPR); and presence of plasmid pSS046_spC (see inset legend). The genetic data were obtained principally from whole genome sequences.

[Figure: maximum likelihood tree, scale bar 0.0076, with marker columns Lineage, ONPG, stxA, stxB, MLST, CRISPR, and pSS046_spC; lineages labelled I, II, III, and IV; tip labels omitted.]

Keys: ONPG: positive, negative, unknown. stxA, stxB, pSS046_spC: presence, absence, SNP. MLST: ST146, ST5160, ST5159, ST5157, ST260, ST5158, ST5161. CRISPR: 3, 3var1, 32.

Supplementary Fig. 15

Maximum likelihood phylogenetic tree for S. dysenteriae type 1 virulence plasmid pSD1_197. This unrooted tree was constructed from the 226 plasmid-containing isolates. The genetic lineage determined from the maximum likelihood phylogeny for the 14,677 chromosomal SNPs (reference genome Sd197) is indicated in the column on the right.

[Figure: unrooted tree, scale bar 0.0051; genetic lineage column with key II, III, IV; tip labels omitted.]

Supplementary Fig. 16

Pan-genome analysis. a, Changes to the total genes (dotted) and conserved genes (solid) in the pan-genome with the addition of genomes. b, Breakdown of the frequency of genes within isolates, where the categories are defined as: core, genes contained in nearly all isolates (≥99%); soft core, genes contained in 95%-99% of the isolates; shell, genes present in several isolates (15%-95%); cloud, the remaining genes, present in only a few isolates (<15%).

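
The frequency categories in panel b can be expressed as a small classifier. This is a sketch under an assumption: it uses the cut-offs conventionally applied by pan-genome tools such as Roary (core at 99% or more, soft core from 95% to below 99%, shell from 15% to below 95%, cloud below 15%). The function name `pan_genome_category` is hypothetical, and integer arithmetic is used so that boundary cases are not affected by floating-point rounding.

```python
def pan_genome_category(n_isolates_with_gene, n_isolates_total):
    """Assign a gene to a pan-genome category from its isolate frequency."""
    if not 0 <= n_isolates_with_gene <= n_isolates_total or n_isolates_total == 0:
        raise ValueError("counts must satisfy 0 <= with_gene <= total, total > 0")
    # Compare percentages without dividing: n / total >= p / 100.
    scaled = n_isolates_with_gene * 100
    if scaled >= 99 * n_isolates_total:
        return "core"
    if scaled >= 95 * n_isolates_total:
        return "soft core"
    if scaled >= 15 * n_isolates_total:
        return "shell"
    return "cloud"


# Toy example with 200 isolates (hypothetical gene counts).
examples = {n: pan_genome_category(n, 200) for n in (200, 198, 192, 100, 10)}
```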

Supplementary Tables

Supplementary Table 1 – Details of Shigella dysenteriae type 1 isolates and genomes used in this study.

The following are shown: year and country of isolation, epidemiological information, lineage, biotype, antibiotic resistance phenotypes, source and EBI-ENA accession numbers (sheet no. 1); spatiotemporal distribution (sheet no. 2); MLST, CRISPR type, presence/absence of stxA and stxB genes (sheet no. 3); gyrA SNPs and presence/absence of resistance-associated genes and structures (sheet no. 4); distribution of the lineages over time (sheet no. 5).

See separate Excel file

Supplementary Table 2 – Whole-genome sequences, SNPs and phylogenetic data used in this study.

The following are shown: mapping statistics (sheet no. 1); assembly statistics (sheet no. 2); SNPs used for phylogeny (sheet no. 3); pairwise SNP distance between the isolates (sheet no. 4); summary of Bayesian models used for analyses with BEAST (sheet no. 5); date estimates for the main lineages (sheet no. 6); date estimates for the intercontinental transmission events (sheet no. 7); Gene Ontology functions of the 5,630 annotated accessory genes (sheet no. 8).

See separate Excel file

Supplementary Table 3 – CRISPR spacer sequences analysed in this study.

Identifier   DNA sequence
A1-var1      CAAGTGATATCCATCATCGCATCCAGTGCGCC
6            CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC
24           TCGGTTCAGGCGTTGCAAACCTGGCTACCGGG
115          TCGGTTCAGGCGTTGCAAACCTAGCTACCGGG
21           GTAGTCCATCATTCCACCTATGTCTGAACTCC
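
Spacers 24 and 115 in the table are near-identical, and a position-by-position comparison makes the difference explicit. The sketch below is only an illustration of such a comparison: the sequences are copied from the table, while the helper name `mismatches` is hypothetical and not part of the study's methods.

```python
spacers = {
    "A1-var1": "CAAGTGATATCCATCATCGCATCCAGTGCGCC",
    "6": "CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC",
    "24": "TCGGTTCAGGCGTTGCAAACCTGGCTACCGGG",
    "115": "TCGGTTCAGGCGTTGCAAACCTAGCTACCGGG",
    "21": "GTAGTCCATCATTCCACCTATGTCTGAACTCC",
}


def mismatches(seq_a, seq_b):
    """List (1-based position, base in seq_a, base in seq_b) for each difference."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


spacer_lengths = {name: len(seq) for name, seq in spacers.items()}
diff_24_vs_115 = mismatches(spacers["24"], spacers["115"])
```

All five spacers are 32 nt long, and spacers 24 and 115 differ by a single substitution (G to A) at position 23.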

Supplementary Discussion

SNP variation in S. dysenteriae type 1

The maximum-likelihood (ML) phylogenetic analysis (RAxML) with non-S. dysenteriae

outgroups and the BEAST analysis (Fig. 2 and Supplementary Fig. 7) showed that M115

is most closely related to the ancestral Shigella dysenteriae type 1 (Sd1) strain. This

isolate, which differed from others by more than 1,200 SNPs, was the only representative

of lineage I. Otherwise, M115 displayed all the characters of Sd1 in terms of biotyping,

serotyping, MLST, CRISPR typing and the presence of the Shiga toxin genes. To

confirm that the divergence of M115 was not due to laboratory contamination or a

hypermutator phenotype, it was even sequenced a second time from a separate DNA

extract, after serial dilution, to ensure that the DNA came from a single colony. The two

genomic sequences obtained were identical. M115 displayed no modifications to genes

involved in the DNA repair system (mutS, mutH, mutL, and uvrD), and it did not have

the hypermutator phenotype (Supplementary Fig. 8). The various ML trees were

therefore rooted on M115.

The topologies of the two ML trees obtained after mapping short-read sequences against

the sequences of the Sd1 reference genomes Sd197 and Sd1617 were similar

(Supplementary Figs. 9-11). We therefore used the 14,677 chromosomal SNPs randomly

distributed over the non-repetitive non-recombinant core genome detected after mapping

against Sd197 (Supplementary Fig. 12).

All BAPS runs converged on three sequence clusters corresponding to lineages II to IV,

concordant with the results obtained for clustering by eye, except for M115. This unique

lineage I isolate was consistently added to lineage II by BAPS, despite differing from the

other isolates in this lineage by more than 1,200 SNPs and having been shown to be the

isolate most closely related to the ancestral Sd1 strain by BEAST analysis (see above).

The mean intra-clade pairwise SNP variation within lineages was 275 (range 1-517)

for the 58 lineage II isolates, 417 (0-677) for the 64 lineage III isolates,

and 192 (0-485) for the 208 lineage IV isolates. Within lineage IV, the isolates from the

outbreak in Central Africa in the 1980s (18 T5 isolates recovered from three countries

between 1981 and 1990) differed by a mean of 35 SNPs (1-76), whereas the isolates

from the emerging ortho-nitrophenyl-β-galactoside negative (ONPG-) African strain (63

T8 isolates recovered from 22 countries between 1991 and 2011) differed by a mean of 56

SNPs (0-133).
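The summary statistics quoted above (mean, minimum and maximum pairwise SNP difference within a group) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name, the input layout (one string of bases per isolate, all covering the same SNP sites) and the toy isolates are assumptions.

```python
from itertools import combinations
from statistics import mean


def pairwise_snp_summary(alleles):
    """Return (mean, min, max) pairwise SNP distance within one group.

    alleles: dict mapping isolate name -> string of bases called at the
    same ordered set of SNP sites (hypothetical input format).
    """
    dists = [
        sum(a != b for a, b in zip(alleles[x], alleles[y]))
        for x, y in combinations(sorted(alleles), 2)
    ]
    return mean(dists), min(dists), max(dists)


# Toy data, invented for illustration only.
toy = {"iso1": "ACGTAC", "iso2": "ACGTAT", "iso3": "TCGAAT"}
summary = pairwise_snp_summary(toy)
print(summary)
```

Applied per lineage or per transmission wave, the same function would give figures in the form reported here, for example "35 SNPs (1-76)".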

We assessed the influence of storage conditions on the rare isolates from a single

outbreak recovered by different groups with different culture preservation methods. The

CDC A5468 and 11-81 isolates, which originated from the Democratic Republic of the

Congo (DRC) and were obtained in 1981, were stored at -80°C or freeze-dried. They

differed by only 10 SNPs. The Iris Kat-4 isolate was isolated in DRC in 1985 and stored

at -80°C, whereas the E39084/85 isolate was collected in Rwanda in 1985 and stored at

room temperature as a stab culture on Dorset egg medium. They also differed by only 10

SNPs. Indirectly, the case of the CDC 3036-94 isolate (see below) also argues for the

stability of the SNP pattern. We cannot rule out some differential SNP evolution due to

the number of passages before preservation and the mode of storage, but the limited SNP

variation observed for 30-year-old isolates and the consistency of the phylogeographic

grouping suggest that this was not a major issue in this study.

Other information revealed by the genome sequencing data

Over the last 10 years, we have tried unsuccessfully to find the original cultures

established by Shiga in Japan in 1897 (ref. 5) and Kruse in Germany in 1900 (ref. 64). They have

probably been lost or destroyed. However, on the basis of our phylogeographic

data, it seems likely that Shiga’s isolate would have belonged to sublineage IIIa and that

Kruse’s isolate would have belonged to lineage II.

In 1986-1987, a multidrug-resistant Sd1 strain caused outbreaks in north-eastern

Thailand (and also in Laos, according to our isolates) after a lull of 20 years (ref. 14). Based on

the distance of this region from India and Bangladesh, and the plasmid and antibiotic

susceptibility patterns of the isolates, the authors concluded that this strain was unlikely

to have originated from the Indian subcontinent. However, our results suggest that the

sequences of the outbreak isolates were derived from those of bacteria isolated in India

from 1978 to 1982, and in Bangladesh in 1985. Furthermore, all these isolates had the

same antibiotic resistance structures (i.e., two chromosomally encoded transposons,

which are the composite transposon shown in Supplementary Fig. 6 and Tn7).

During the early 1990s, a nalidixic acid-resistant strain caused further outbreaks in

Thailand, this time close to the Burmese border (ref. 65). Our phylogenetic data indicated that

the outbreak strain was derived from a multidrug-resistant but nalidixic acid-susceptible

strain isolated in Bangladesh in 1987.

In the former USSR, Sd1, also known as the Grigoriev-Shiga bacillus, was highly prevalent from 1917

to 1922, after which its prevalence decreased steadily, reaching negligible levels in the

1950s (ref. 66). It re-emerged during the 1980s in the Central Asian Soviet Republics, including

the Uzbek Republic, and it was linked to the Afghanistan war and the flow of

populations in Central Asia (ref. 67). The Uzbek SSR served as a springboard for the

subsequent spread of infection, in the form of sporadic cases, to cities located in the

European part of the USSR (Riga, Moscow, Ulyanovsk, Kuibishev) (ref. 67). Our data reveal

that these two periods of activity were associated with two different lineages, European

lineage II for the first and South Asian lineage IV (with different subclades) for the

second.

The CDC 3036-94 isolate recovered from a child in Tennessee, USA, in 1994 was highly

unusual as it belonged to lineage II and was susceptible to all the antimicrobial agents

tested. The last such pan-susceptible lineage II isolates on record were recovered from

North Africa during the 1970s. Whole-genome sequencing revealed that this case was

probably associated with contamination from laboratory stocks, as the genome of CDC

3036-94 differed from CIP 57.28 by only five SNPs. CIP 57.28 was isolated in the UK in

1934 as the “Newcastle” strain or NCTC 4837, 60 years before the CDC 3036-94 sample

was isolated. Following its isolation in 1934, this “Newcastle”/NCTC 4837 strain was

deposited in various international collections, including CIP under accession no. 57.28

and ATCC under accession no. 13313. The presence in Tennessee of a biotechnology

company using ATCC 13313 to prepare rabbit polyvalent antisera provides further

support for the hypothesis of laboratory contamination, although no information is

available to connect the patient from whom CDC 3036-94 was isolated with the

biotechnology company. The possibility of tube switching has also been ruled out as we

obtained and sequenced the CDC 3036-94 isolate two years after we sequenced CIP

57.28/ATCC 13313. Furthermore, the genome sequence we obtained for CDC 3036-94

was identical to the publicly available BS506 genome (ref. 2) obtained independently by

another group from a different stock culture of CDC 3036-94.

Differences found compared to a previous study

In the main text, we show that, contrary to the claim of Rohmer et al. (ref. 2), phylogeny and

geography were not inconsistent. Instead, there were strong phylogeographic

patterns. Our study was based on a wide temporal and geographical sampling of more

than 300 isolates, resulting in 14,677 informative SNPs compared to 56 isolates and 989

SNPs for the other group. Their lineage A comprised a single isolate, BS506 (original

name CDC 3036-94), found to belong to our old European lineage II. However, as this

isolate was recovered in the USA in 1994, and the likely laboratory contamination with

an old collection strain (see above) was not identified, this lineage was attributed to the

USA and this confused the phylogeographic analysis. Their lineage B comprised only the

Sd197 reference genome and corresponded to our Eastern Asian sublineage IIIa. Their

lineage C corresponded to our American sublineage IIIb, with, however, two spurious

sequences from African isolates (DH02 and BS504), also confusing the phylogeographic

analysis (see below). Their lineage D corresponded to our lineage IV. Lineage I and

sublineages IIIc and IIId were not found in their study.

The hypothesis of long-term human carriers proposed by Rohmer et al. (ref. 2) was essentially

based on this apparent lack of consistency between phylogeny and geography, as observed for

Salmonella enterica serotype Typhi (ref. 68), the agent of typhoid fever, which may be carried

for several decades in the gallbladder of some convalescent patients. The clear pattern of

successive transmission waves following the importation of lineage III and IV strains

into Africa does not support this hypothesis of long-term carriers for Sd1.

Rohmer et al. (ref. 2) also claimed that the massive outbreak that hit Central America

(estimated 500,000 cases; refs 10, 11) during the late 1960s might have been caused by an

African strain that became established in the New World at the beginning of the 1960s.

This hypothesis was based on the grouping of two of their African isolates, DH02 and

BS504 (original name CDC ZB4), at the base of the C lineage, the tips of which

correspond to Central American isolates. As our CDC ZB4 isolate, together with our

258-11E and PAMA isolates collected during an outbreak in Cameroon in 1998 (ref. 69; likely

the same outbreak as for DH02) clustered within the ONPG- lineage IV T8 subclade,

which contains almost all the African isolates obtained since 1991, we analysed the

deposited BS504 and DH02 short reads for certain SNPs characteristic of either

sublineage IIIb (North and Central American isolates) or the ONPG-negative lineage IV

subclade. We observed a heterogeneous distribution of two nucleotides at these positions

(Supplementary Fig. 13) indicative of a mixture of two isolates from different genetic

backgrounds, explaining the spurious grouping.
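The kind of check described above, looking for two nucleotides at lineage-defining positions in the raw reads, can be sketched as below. This is a hypothetical illustration rather than the procedure used for Supplementary Fig. 13: the function names, the per-position read-count input, the 0.2 minor-allele threshold and the two-site minimum are all assumptions, and the counts are invented.

```python
def minor_allele_fraction(counts):
    """counts: dict base -> number of reads supporting that base at one site."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads cover this position")
    ranked = sorted(counts.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0
    return second / total


def looks_mixed(pileups, min_fraction=0.2, min_sites=2):
    """Flag a read set as a possible mixture of two genetic backgrounds.

    pileups: list of per-site read counts at lineage-defining SNP positions.
    Both thresholds are arbitrary illustrative values.
    """
    hits = sum(minor_allele_fraction(c) >= min_fraction for c in pileups)
    return hits >= min_sites


# Invented read counts: the first set shows two bases at two sites.
mixed_example = [{"C": 30, "T": 28}, {"G": 25, "A": 31}, {"A": 60}]
clean_example = [{"C": 58, "T": 1}, {"G": 56}]
print(looks_mixed(mixed_example), looks_mixed(clean_example))
```

A pure culture should show essentially one base per site, apart from sequencing error, whereas a mixture of two backgrounds shows both diagnostic bases at comparable depth.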

Our results demonstrate that sublineage IIIb, containing only North and Central

American isolates, had a common ancestor dating back to 1893 [95% credible interval

(CI), 1885-1901]. One of the isolates was even isolated in Mexico in 1955, several years

before the postulated establishment of the African strain in America. Our genomic data

are also consistent with reliable old published reports of the isolation of Sd1 at medical

institutions in New England during the early 1900s (refs 70, 71) or at a camp for Mexican workers

in Michigan in 1938 (ref. 72).

It was also claimed that the diagnostic tools might be jeopardized by genetic drift

affecting metabolic activities as well as surface antigens, some of which were targeted by

serotyping. We found, to the contrary, considerable phenotypic and genetic homogeneity

in our dataset (see next section), for all the typing and subtyping tools used by clinical

and public health microbiology laboratories. The only SNP (within lacZ) associated with

the loss of a typing character (ONPG test) was a useful marker of the strain that spread

across Africa during the 1990s and 2000s.

Correlation of S. dysenteriae type 1 phylogenetic lineages with existing typing and

subtyping schemes

Sd1 is known to contain the stxA and stxB genes encoding the Shiga toxin STX1, on a

defective lambdoid prophage (ref. 73). Despite the presence of hundreds of insertion

sequences (ISs) within the Sd1 genome (refs 17, 18), the stxA and stxB genes have remained

remarkably conserved over a period of almost a hundred years. Only two isolates have

lost both stxA and stxB, and three isolates have a SNP within stxA. These findings do not

reflect a sampling bias, as the identification of Sd1 is based on biochemical tests and

serotyping. Searches for stx genes or STX production are not carried out routinely for

Shigella spp. isolates in clinical microbiology or public health laboratories.

The genetically distinct lineages of Sd1 showed only low levels of uncorrelated diversity

on assessment with existing subtyping methods: biotyping (ref. 74), multilocus sequence typing

(MLST; ref. 36), CRISPR typing (refs 22, 75), plasmid profiling (ref. 76), and pulsed-field gel electrophoresis

(PFGE; refs 25, 77) (Supplementary Figs 3 and 14).

The ONPG test was the only conventional phenotypic test to give variable results. This

test assesses β-galactosidase activity, which is intense and rapid in Sd1 (generally taking

less than 3 hours). The loss of β-galactosidase activity was observed in some isolates

from across the tree but was a constant marker for the lineage IV African isolates of the

T8 intercontinental transmission wave (Fig. 2). This ONPG- character was first reported

in the DRC in 1994 (ref. 78), but we detected this marker in older African isolates (Zambia,

1991) and in the genomes of some Indian and Nepalese isolates collected earlier and genetically

ancestral to the T8 African genomes (Fig. 2). These South Asian genomes and the

derived African T8 genomes had in common a non-synonymous SNP (C to T at position

363,921 of the reference genome Sd197, leading to a glycine-to-serine substitution)

within the lacZ gene. The other sporadic ONPG- isolates in the other lineages do not

have this non-synonymous SNP. This SNP thus constitutes a good candidate marker for

the emerging African strain.

MLST, which has become the gold standard for bacterial population typing, revealed the

presence of two main STs, ST260 (n=54, lineage II isolates) and ST146 (n=270, other

lineages), differing by a single SNP in one of the seven 500 bp “housekeeping” genes

targeted by this method. In addition, seven genomes belonged to four new STs that were

single-locus variants of ST260 and ST146.

CRISPR types were also very stable across the lineages. We analysed the de novo

assemblies for the different CRISPR spacer sequences and found that all but six genomes

belonged to CRISPR type (CT) 3, with the following four spacers: A1-var1, 6, 24, and

21 (Supplementary Table 3). One genome belonged to CT32 (A1-var1, 6, 115, and 21;

spacer 115 being a single SNP variant of spacer 24) and five genomes belonged to

CT3var1, which differed from CT3 by a single SNP within one direct repeat (DR) within

the CRISPR sequence.
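The relationship between spacers 24 and 115 can be checked directly from the sequences listed in Supplementary Table 3. The short sketch below compares two equal-length spacers position by position; the helper name `mismatches` is an assumption introduced for illustration.

```python
# Spacer sequences copied from Supplementary Table 3.
SPACERS = {
    "A1-var1": "CAAGTGATATCCATCATCGCATCCAGTGCGCC",
    "6": "CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC",
    "24": "TCGGTTCAGGCGTTGCAAACCTGGCTACCGGG",
    "115": "TCGGTTCAGGCGTTGCAAACCTAGCTACCGGG",
    "21": "GTAGTCCATCATTCCACCTATGTCTGAACTCC",
}


def mismatches(a, b):
    """Return (1-based position, base in a, base in b) for each differing site."""
    if len(a) != len(b):
        raise ValueError("spacers must have equal length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = mismatches(SPACERS["24"], SPACERS["115"])
print(diffs)  # [(23, 'G', 'A')]
```

The two spacers differ at a single site (G in spacer 24, A in spacer 115, at position 23 of 32), which is the only difference between the CT3 and CT32 spacer sets.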

Plasmid profiling based on the number and size of plasmids within a single isolate has

been widely used for differentiating Sd1 isolates. The independent acquisition of

similarly sized multidrug-resistant plasmids (Supplementary Fig. 4) and the distribution

of plasmids not containing ARGs, such as pSS04-spC, across all lineages

(Supplementary Fig. 14), preclude the assessment of phylogenetic relationships between

isolates by this method. Furthermore, the shift from plasmids to genomic islands as the

support for antibiotic resistance in the last two to three decades has probably decreased

the plasmid content of isolates.

Over the last two decades, PFGE has become the method of choice for subtyping enteric

bacteria at strain level. In light of the genome sequences obtained, we re-evaluated two

outbreaks that occurred in the Central African Republic in 2003-2004 (ref. 25), and which we

had investigated with PFGE as a molecular epidemiology tool (Supplementary Fig. 3).

PFGE distinguished two groups of profiles, one (PFGE profiles X1 to X4) for the

“Ouham-Pende” outbreak and the other (PFGE profiles X5 to X7) for the “Nana-

Grebizi” outbreak. Both PFGE groups were tightly clustered, whereas the isolates used

for comparison displayed PFGE profiles (X8 to X18) very different from those seen for

the isolates of the two outbreaks. Genomic data showed that the strains that had caused

the “Nana-Grebizi” and “Ouham-Pende” outbreaks belonged to different lineages, IIIc

and IV, respectively, differing by ~700 SNPs. The SNP variation observed among the

isolates of each outbreak, which gave rise to slightly different PFGE profiles, was 5-33 and 10-20 SNPs

for the “Nana-Grebizi” and “Ouham-Pende” outbreaks, respectively. Most of the

genomes of the comparison isolates were actually close to the “Ouham-Pende” genomes

(differing from them by 37 to 61 SNPs), despite the lack of relationship suggested by

PFGE. This lack of correlation between PFGE and WGS data confirms that PFGE should

not be used for assessing phylogenetic relationships in an organism with a very plastic

genome containing hundreds of ISs, such as Sd1.

Coevolution between the VP and the chromosome

A large virulence plasmid (VP) was present in 226 isolates. The ML phylogeny based on

290 informative SNPs was similar to the chromosome phylogeny (Supplementary Fig.

15), indicating a coevolution of the chromosome and the VP since at least 1853 (95% CI

1831-1871), the date of the MRCA of all the Sd1 isolates other than M115. The M115

isolate did not contain the VP, according to our search criteria (see Methods section).

Structure of the cadBA operon

In Shigella spp. and enteroinvasive E. coli (EIEC), an inability to synthesise lysine

decarboxylase (LDC) and, thus, produce cadaverine has been identified as a convergent

pathoadaptive mutation that enhances virulence (refs 79-81). Comparative analysis has shown

that the ancestral LDC trait was lost through various rearrangements of the cadBA

operon encoding LDC and its transporter. We analysed this operon from the PacBio

sequences of nine Sd1 isolates from lineages I (M115), II (M116 and 17/89), IIIa

(Sd197), IIIb (CDC 69-3818), IIIc (CAR10), and IV (40-81, CDC ZB4 and 99-9324). A

similar structure was found in all these isolates. The cadBA operon was located between

ytfQ (SDY_4463 of Sd197; GenBank accession no. CP000034) and yjdL (SDY_4467).

The cadA gene (SDY_4466) displayed a five-nucleotide deletion leading to a frameshift

with a premature stop codon, and the cadB gene (SDY_4465) was interrupted by an IS1.

The cadC gene, a regulator of the cadBA operon, was absent. In addition, an IS1 element

was found inserted into the cadA gene of M115 and CDC 69-3818 and a second

frameshift was found at the end of the cadA gene in the lineage IV isolates 40-81, CDC

ZB4 and 99-9324.

Pan-genome and antibiotic resistance

The pan-genome analysis (Supplementary Fig. 16) identified a total of 11,830 genes for

the 330 Sd1 genomes studied. A core genome of 2,194 genes was identified, comprising

1,132,109 bases. Of the 7,345 accessory genes, 5,630 were annotated with 22,135 Gene

Ontology (GO) terms. The top GO terms corresponded to DNA/plasmid binding

(GO:0003677) and transposition, DNA-mediated functions (GO:0006313)

(Supplementary Table 2). Taking into account the various large multidrug-resistant

plasmids (70 to 160 genes per plasmid) we have sequenced by 454 or PacBio, together

with other plasmid fragment sequences obtained by Illumina short-read sequencing, we

can conclude that the number of genes in the accessory genome supporting antibiotic

resistance probably exceeds 1,000.
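The frequency bands used to summarise a pan-genome (core, soft core, shell, cloud) can be expressed as a small classifier. The sketch below is illustrative only: the function name and the treatment of the exact band boundaries are assumptions, with "shell" taken as 15%-95% and "cloud" as <15% of isolates, the usual convention in pan-genome tools.

```python
def pangenome_category(n_present, n_isolates):
    """Assign a gene to a frequency band from the number of isolates carrying it."""
    if not 0 < n_present <= n_isolates:
        raise ValueError("n_present must be between 1 and n_isolates")
    frac = n_present / n_isolates
    if frac >= 0.99:
        return "core"
    if frac >= 0.95:
        return "soft core"
    if frac >= 0.15:
        return "shell"
    return "cloud"


# Example with the 330 genomes analysed here; the per-gene counts are invented.
for n in (330, 320, 100, 3):
    print(n, pangenome_category(n, 330))
```

Summing the genes in each band across a presence/absence matrix gives the kind of breakdown shown in panel b of the pan-genome figure.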

The accessory genome linked to antibiotic resistance was maintained in the descendants,

whereas structures not involved in antibiotic resistance, such as prophages, were found in

only one or a few isolates and were not studied further.

Cotrimoxazole, a combination of sulfamethoxazole and trimethoprim, has been widely

used to treat Shigella infections since the late 1960s, when the first multidrug-resistant

strains appeared. The first cotrimoxazole-resistant Sd1 strains were isolated on the Indian

subcontinent in the early 1980s (ref. 82). The dhfrI (or dfrA1) gene was found in all the isolates,

either on a 20-MDa plasmid for the ampicillin-resistant isolates, or chromosomally

encoded for the ampicillin-susceptible isolates. Haider et al. (ref. 82) suggested that this dhfrI

gene might transpose between the 20-MDa plasmid and the chromosome.

Our first cotrimoxazole-resistant Sd1 isolate was obtained in India in 1978. It contains

the dfrA1 gene on a Tn7-like transposon integrated into the chromosome. Unlike that of the

classical Tn7 (ref. 83), the class 2 integron did not contain the aadA1 gene (encoding resistance to

streptomycin and spectinomycin). The intI2-dfrA1-sat2-orfX-ybfA-ybfB-ybgA-tnsE-tnsD-tnsC-tnsB-tnsA

genes were found to be present and the transposon was named

Tn7::In2-9, in accordance with the nomenclature of ref. 83. Tn7::In2-9 was also found in

114 other Sd1 isolates, all from lineage IV. A classical Tn7 was found in three Egyptian

isolates from 1999. A 30-kb IncX4 plasmid containing dfrA1 was also found in 14 Sd1

isolates (one from lineage II and 13 from lineage IV), none of which contained the

chromosomal Tn7 or Tn7::In2-9. However, the plasmid dfrA1 gene was in a class 2

integron (In2-9) with no trace of the Tn7 transposition module, and could not, therefore,

have transposed to the chromosome as suggested by Haider et al. (ref. 82). In Africa, our first

isolate resistant to cotrimoxazole was obtained in the DRC in 1983. Resistance to this

antibiotic had, however, been observed as early as 1981, less than two years after the start of the so-called

“Zairian” outbreak caused by a strain initially resistant to ampicillin, chloramphenicol,

tetracycline, streptomycin and sulfonamides (ref. 84). The initial multidrug resistance was due

to a 50-kb IncX1 plasmid encoding resistance to ampicillin, chloramphenicol, and

tetracycline (pA5468) and a 6-kb plasmid encoding resistance to streptomycin and

sulfonamides (pETEC6). Resistance to cotrimoxazole was conferred by a dfrA1 gene

encoded on a 110-kb IncI1 pST186 plasmid (pBU53M1).

In our collection, the two oldest nalidixic acid-resistant Sd1 isolates were obtained in the

DRC and Bangladesh, both in 1985. However, the first reported isolation of Sd1 isolates

with this pattern of resistance was in April 1982 in the DRC (ref. 84). This isolation occurred

less than one year after the introduction of nalidixic acid as first-line therapy during the

“Zairian” outbreak, in which isolates rapidly became resistant to cotrimoxazole.

The next step in the development of a multidrug resistance profile was the acquisition of

resistance to ciprofloxacin, a fluoroquinolone, mediated by a double mutation in gyrA

(S83L and a second mutation in codon 87) and a mutation in the topoisomerase IV parC

gene (S80I). In our dataset, resistance to ciprofloxacin was acquired only once, in a

group of 20 isolates from the Indian subcontinent collected between 1995 and 2010

(MIC ciprofloxacin 4-12 mg/L). This is consistent with published reports of an

emergence of ciprofloxacin-resistant Sd1 in West Bengal in 2002, after a hiatus of 14

years in which Sd1 was not isolated (refs 85, 86). A PFGE approach showed that the

ciprofloxacin-resistant Sd1 isolates were clonal, a finding subsequently confirmed by

whole-genome sequencing on a larger sample. However, we identified an internal branch

corresponding to seven isolates with a mutation of codon 87 (D87G) other than the

predominant mutation (D87N). Six of these seven isolates were collected in Bengal in

2002, during the Diamond Harbor and Siliguri outbreaks (ref. 86). Similarly, two different

mutations in codon 87 of gyrA were previously identified in genetically related

ciprofloxacin-resistant enteric bacterial pathogens of the S. enterica serotype Kentucky

ST198-X1, a bacterium subject to high levels of fluoroquinolone selection pressure in the

poultry industry (ref. 87).
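The genotype described above (gyrA S83L, a codon 87 substitution in gyrA, and parC S80I) can be encoded as a simple lookup. This is a sketch of the mutation combination reported here, not a validated resistance predictor: the function name and the input layout (gene name mapped to a collection of amino-acid substitutions) are assumptions.

```python
# Codon 87 substitutions in gyrA reported in this dataset.
CODON87_VARIANTS = {"D87N", "D87G"}


def has_reported_cip_genotype(substitutions):
    """True if an isolate carries the three-mutation combination described above.

    substitutions: dict gene -> iterable of substitutions, for example
    {"gyrA": ["S83L", "D87N"], "parC": ["S80I"]} (hypothetical format).
    """
    gyra = set(substitutions.get("gyrA", ()))
    parc = set(substitutions.get("parC", ()))
    return "S83L" in gyra and bool(gyra & CODON87_VARIANTS) and "S80I" in parc


print(has_reported_cip_genotype({"gyrA": ["S83L", "D87G"], "parC": ["S80I"]}))
print(has_reported_cip_genotype({"gyrA": ["S83L"]}))
```

Keeping the codon 87 variants in a separate set makes it easy to report which of the two substitutions (D87N or D87G) an isolate carries, and therefore which internal branch it falls on.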

Supplementary References

64 Kruse, W. Ueber die Ruhr als Volkskrankheit und ihren Erreger. Deutsche Med. Woch. 26, 637-639 (1900).

65 Hoge, C. W., Bodhidatta, L., Tungtaem, C. & Echeverria, P. Emergence of nalidixic acid-resistant Shigella dysenteriae type 1 in Thailand: an outbreak associated with consumption of a coconut milk dessert. Int. J. Epidemiol. 24, 1228-1232 (1995).

66 Krasheninnikov, O. A. [Features of the geographic distribution of Shigellae. I. Changes in the etiologic structure of dysentery in Russia and the USSR (1900-1950)]. Russian. Zh. Mikrobiol. Epidemiol. Immunobiol. 45, 21-31 (1968).

67 Solodovnikov, I. u. P. et al. [The epidemiological characteristics of the spread of Grigor'ev-Shiga dysentery in the territories of the former USSR in recent years]. Russian. Zh. Mikrobiol. Epidemiol. Immunobiol. 1, 31-36 (1994).

68 Holt, K. E. et al. High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi. Nat. Genet. 40, 987-993 (2008).

69 Cunin, P. et al. An epidemic of bloody diarrhea: Escherichia coli O157 emerging in Cameroon? Emerg. Infect. Dis. 5, 285-290 (1999).

70 Vedder, E. B. & Duval, C. W. The etiology of acute dysentery in the United States. J. Exp. Med. 6, 181-205 (1902).

71 Hiss, P. H. On fermentative and agglutinative characters of bacilli of the "Dysentery Group". J. Med. Res. 13, 1-51 (1904).

72 Block, N. B. & Ferguson, W. An outbreak of Shiga dysentery in Michigan, 1938. Am. J. Public Health Nations Health 30, 43-52 (1940).

73 Greco, K. M., McDonough, M. A. & Butterton, J. R. Variation in the Shiga toxin region of 20th-century epidemic and endemic Shigella dysenteriae 1 strains. J. Infect. Dis. 190, 330-334 (2004).

74 Le Minor, L. & Richard, C. Méthodes de laboratoire pour l'identification des entérobactéries. Paris, France: Institut Pasteur; 1993. pp. 72-78.

75 Touchon, M. & Rocha, E. P. The small, slow and specialized CRISPR and anti-CRISPR of Escherichia and Salmonella. PLoS One 5, e11126 (2010).

76 Haider, K., Kay, B. A., Talukder, K. A. & Huq, M. I. Plasmid analysis of Shigella dysenteriae type 1 isolates obtained from widely scattered geographical locations. J. Clin. Microbiol. 26, 2083-2086 (1988).

77 Talukder, K. A., Dutta, D. K. & Albert, M. J. Evaluation of pulsed-field gel electrophoresis for typing of Shigella dysenteriae type 1. J. Med. Microbiol. 48, 781-784 (1999).

78 Cavallo, J. D., Niel, L., Talarmin, A. & Dubrous, P. [Antibiotic sensitivity to epidemic strains of Vibrio cholerae and Shigella dysenteriae 1 isolated in Rwandan refugee camps in Zaire]. French. Med. Trop. 55, 351-353 (1995).

79 Maurelli, A. T., Fernández, R. E., Bloch, C. A., Rode, C. K. & Fasano, A. "Black holes" and bacterial pathogenicity: a large genomic deletion that enhances the virulence of Shigella spp. and enteroinvasive Escherichia coli. Proc. Natl Acad. Sci. USA 95, 3943-3948 (1998).

80 Casalino, M., Latella, M. C., Prosseda, G. & Colonna, B. CadC is the preferential target of a convergent evolution driving enteroinvasive Escherichia coli toward a lysine decarboxylase-defective phenotype. Infect. Immun. 71, 5472-5479 (2003).

81 Day, W. A. Jr, Fernández, R. E. & Maurelli, A. T. Pathoadaptive mutations that enhance virulence: genetic organization of the cadA regions of Shigella spp. Infect. Immun. 69, 7471-7480 (2001).

82 Haider, K. et al. Trimethoprim resistance gene in Shigella dysenteriae 1 isolates obtained from widely scattered locations of Asia. Epidemiol. Infect. 104, 219-228 (1990).

83 Ramírez, M. S., Piñeiro, S., Argentinian Integron Study Group & Centrón, D. Novel insights about class 2 integrons from experimental and genomic epidemiology. Antimicrob. Agents Chemother. 54, 699-706 (2010).

84 Rogerie, F. et al. Comparison of norfloxacin and nalidixic acid for treatment of dysentery caused by Shigella dysenteriae type 1 in adults. Antimicrob. Agents Chemother. 29, 883-886 (1986).

85 Dutta, S. et al. Shigella dysenteriae serotype 1, Kolkata, India. Emerg. Infect. Dis. 9, 1471-1474 (2003).

86 Pazhani, G. P. et al. Clonal multidrug-resistant Shigella dysenteriae type 1 strains associated with epidemic and sporadic dysenteries in eastern India. Antimicrob. Agents Chemother. 48, 681-684 (2004).

87 Le Hello, S. et al. International spread of an epidemic population of Salmonella enterica serotype Kentucky ST198 resistant to ciprofloxacin. J. Infect. Dis. 204, 675-684 (2011).