Supplementary Information
Global phylogeography and evolutionary history of Shigella dysenteriae type 1
Elisabeth Njamkepo, Nizar Fawal, Alicia Tran-Dien, Jane Hawkey, Nancy
Strockbine, Claire Jenkins, Kaisar A. Talukder, Raymond Bercion, Konstantin
Kuleshov, Renáta Kolínská, Julie E. Russell, Lidia Kaftyreva, Marie Accou-
Demartin, Andreas Karas, Olivier Vandenberg, Alison E. Mather, Carl J. Mason,
Andrew J. Page, Thandavarayan Ramamurthy, Chantal Bizet, Andrzej Gamian,
Isabelle Carle, Amy Gassama Sow, Christiane Bouchier, Astrid Louise Wester,
Monique Lejay-Collin, Marie-Christine Fonkoua, Simon Le Hello, Martin J. Blaser,
Cecilia Jernberg, Corinne Ruckly, Audrey Mérens, Anne-Laure Page, Martin Aslett,
Peter Roggentin, Angelika Fruth, Erick Denamur, Malabi Venkatesan, Hervé
Bercovier, Ladaporn Bodhidatta, Chien-Shun Chiou, Dominique Clermont, Bianca
Colonna, Svetlana Egorova, Gururaja P. Pazhani, Analia V. Ezernitchi, Ghislaine
Guigon, Simon R. Harris, Hidemasa Izumiya, Agnieszka Korzeniowska-Kowal,
Anna Lutyńska, Malika Gouali, Francine Grimont, Céline Langendorf, Monika
Marejková, Lorea A. M. Peterson, Guillermo Perez-Perez, Antoinette Ngandjio,
Alexander Podkolzin, Erika Souche, Mariia Makarova, German A. Shipulin,
Changyun Ye, Helena Žemličková, Mária Herpay, Patrick A.D. Grimont, Julian
Parkhill, Philippe Sansonetti, Kathryn E. Holt, Sylvain Brisse, Nicholas R. Thomson,
François-Xavier Weill.
SUPPLEMENTARY INFORMATION ARTICLE NUMBER: 16027 | DOI: 10.1038/NMICROBIOL.2016.27
NATURE MICROBIOLOGY | www.nature.com/naturemicrobiology
Supplementary Fig. 1 Source, phylogenetic clustering, and putative origins of the 14 World War I (WWI) isolates of the Murray collection. a, The phylogeny shown is the maximum likelihood phylogeny based on 14,677 chromosomal SNPs (reference genome Sd197). b, Source of the strains with hypotheses concerning their geographic origin.
[Figure, panel a: maximum likelihood tree with isolate tip labels; keys for lineage (I to IV) and WWI isolates (absent, present); scale bar 0.0076.]
[Figure, panel b: map legend with lineages II, IIIa, IIIb, and IIIc and number of isolates (1, 2-5, 6-10). Labels: 2nd Western General Hospital, Manchester; Royal Navy Hospital Mtarfa, Malta; R.A.M. College, Millbank; Lister Institute of Preventive Medicine, London; Gallipoli (evacuated troops); West Africa via French colonial soldiers ?; Japan via Imperial Navy ships.]
Supplementary Fig. 2 Year of isolation vs. root-to-tip distances extracted by Path-O-Gen from an ML phylogeny. Linear regression line, slope, R² correlation coefficient, and time to the most recent common ancestor (TMRCA) are indicated for the whole dataset (panel a) and separately for lineages II to IV (panels b to d). Isolate CDC 3036-94, which was probably acquired during laboratory contamination with an old collection strain, was excluded from the analysis. The maximum likelihood (ML) phylogeny used is based on 14,677 chromosomal SNPs (reference genome Sd197).
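The root-to-tip analysis described in this caption can be illustrated with a short sketch. It fits an ordinary least-squares line of root-to-tip distance on year of isolation, reads the slope as the substitution rate, and takes the x-intercept as the TMRCA estimate. This is an illustration of the general technique, not the Path-O-Gen implementation; the function name `root_to_tip_regression` and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean


def root_to_tip_regression(years, distances):
    """Least-squares fit of root-to-tip distance (y) on isolation year (x).

    Returns (slope, intercept, r_squared, tmrca). The TMRCA is the
    x-intercept of the fitted line, i.e. the year at which the predicted
    root-to-tip distance is zero. It is None when the slope is not positive.
    """
    if len(years) != len(distances) or len(years) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(years), mean(distances)
    sxx = sum((x - x_bar) ** 2 for x in years)
    syy = sum((y - y_bar) ** 2 for y in distances)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, distances))
    if sxx == 0 or syy == 0:
        raise ValueError("years and distances must both vary")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    tmrca = -intercept / slope if slope > 0 else None
    return slope, intercept, r_squared, tmrca


# Hypothetical toy data (NOT from the study): distances grow by 1e-5
# substitutions per site per year from an ancestor placed in 1850.
toy_years = [1916, 1945, 1970, 1995, 2011]
toy_distances = [6.6e-4, 9.5e-4, 1.2e-3, 1.45e-3, 1.61e-3]

slope, intercept, r_squared, tmrca = root_to_tip_regression(toy_years, toy_distances)
print(f"rate={slope:.2e} per site per year, R^2={r_squared:.3f}, TMRCA={tmrca:.0f}")
```

A low R² in such a plot indicates a weak temporal signal, in which case the x-intercept is an unreliable date estimate.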
Supplementary Fig. 3 Correlation between pulsed-field gel electrophoresis (PFGE) data and genomic sequences from isolates recovered during two outbreaks in the Central African Republic in 2003-2004. a, Time span, location, morbidity and mortality of the two outbreaks in the Central African Republic (CAR), according to Bercion et al. (ref. 25). b, For each isolate analysed by XbaI-PFGE, the position within the maximum likelihood phylogenetic tree (reference genome Sd197) is shown. The dendrogram was generated using BioNumerics version 4.0 (Applied Maths, Sint-Martens-Latem, Belgium) and shows the results of cluster analysis on the basis of XbaI-PFGE fingerprinting. Similarity analysis was performed using the Dice coefficient, and clustering analysis was performed using the unweighted pair-group method with arithmetic averages (UPGMA). c, Original PFGE gel showing the different XbaI profiles (X1 to X18). The isolates that were whole-genome sequenced are named. Salmonella enterica serotype Braenderup H9812 was used as a molecular size marker (M).
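The Dice coefficient and UPGMA clustering named in this caption can be sketched in a few lines. The example below treats each PFGE profile as a set of band sizes, scores similarity as twice the number of shared bands divided by the total number of bands, and then merges the closest clusters by average linkage. It is a simplified illustration and not the BioNumerics procedure: bands are matched exactly rather than within a position tolerance, and the function names and toy band sets are hypothetical.

```python
from itertools import combinations


def dice_similarity(bands_a, bands_b):
    """Dice coefficient: 2 * shared bands / (bands in A + bands in B)."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def upgma(profiles):
    """Average-linkage clustering on distances defined as 1 - Dice.

    `profiles` maps a profile name to its set of bands. Returns the merges
    in order as (cluster_1, cluster_2, distance), where each cluster is a
    tuple of profile names.
    """
    names = sorted(profiles)
    dist = {
        frozenset(pair): 1.0 - dice_similarity(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(names, 2)
    }

    def average_distance(c1, c2):
        # UPGMA distance: mean over all leaf pairs across the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_distance(*p))
        merges.append((c1, c2, average_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
    return merges


# Hypothetical band sets in kb (NOT read from the gel in panel c).
toy_profiles = {
    "lane_1": {1135, 669, 453, 336, 170},
    "lane_2": {1135, 669, 453, 336, 77},
    "lane_3": {1135, 244, 33},
}

toy_merges = upgma(toy_profiles)
for left, right, distance in toy_merges:
    print(left, right, f"similarity={100 * (1 - distance):.0f}%")
```

Recomputing the average from the original pairwise distances at each step keeps the sketch short; production implementations update a distance matrix instead.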
[Figure, panel a: map of the Central African Republic with prefecture and town names, neighbouring countries (Sudan, Democratic Republic of Congo, Chad, Cameroon, Congo), and a 100 km scale bar. Outbreak annotations: Jul 2003-Feb 2004, 2013 cases, 41 deaths; Aug-Dec 2004, 445 cases, 34 deaths.]
[Figure, panel b: maximum likelihood tree with isolate tip labels, keyed by lineage and by PFGE XbaI profile; scale bar 0.0076. Dendrograms for PFGE-XbaI (profiles X1 to X18) and PFGE-NotI (profiles N1 to N17), Dice (Tol 0.5%-0.5%); axis: relative similarity (%), 50 to 100.]
[Figure, panel c: PFGE gel. Molecular size marker (M) bands in kb: 1,135; 669; 453; 336; 244; 170; 77; 33. Lanes with profiles X1 to X7 include the named isolates CAR1, CAR4, CAR5, CAR7, CAR10, and CAR19. Other lanes: X8, 93/531; X9, 99/4320; X10, 99/7947; X11, 99/9837; X12, 99/10354; X13, 99/9324; X14, 00/5166; X15, 00/8097; X16, 01/1502; X17, 03/2815; X18, 97/7783.]
Supplementary Fig. 4 Complete antibiotic resistance data. The antibiotic susceptibility testing data (AST), the presence of the different antibiotic resistance genes (ARGs), and ARG-bearing structures are shown for each isolate, according to its position in the phylogeny (maximum likelihood phylogeny based on 14,677 chromosomal SNPs after mapping against the reference genome Sd197). Abbreviations: AMP, ampicillin; STR, streptomycin; SUL, sulfonamides; TMP, trimethoprim; SXT, cotrimoxazole; CHL, chloramphenicol; TET, tetracycline; NAL, nalidixic acid; and CIP, ciprofloxacin. GenBank accession numbers are given after the ARG element name. The sequence of R387 was found at the Wellcome Trust Sanger Institute website. Additional ARG elements were identified in this study. When known, plasmid incompatibility group (inc) is given before the plasmid name.
Keys. Antibiotic susceptibility testing (AST): resistant, susceptible, unknown. Antibiotic resistance genes (ARGs) and ARG elements: presence, absence. SRL-PAI variant: type A, type B, type C, type D, type E, not typable (remnant), absence.
Supplementary Fig. 5
Structure of the different variants of the Shigella resistance locus (SRL) pathogenicity island (SRL-PAI). The data were extracted from the PacBio sequences of isolates 40-81 (SRL-A), CDC ZB4 (SRL-B), 17/89 (SRL-C), CAR10 (SRL-D), and 99-9324 (SRL-E). All the SRL-PAIs are inserted into the serW tRNA gene. Boxes containing yellow patterns correspond to antibiotic resistance genes. Boxes containing red patterns correspond to insertion sequences. Red labeling indicates insertions/deletions relative to SRL-A. The inset shows the structure of the putative ancestral shf locus (as observed in various virulence plasmids from Shigella spp., including pSD1_197) before the insertion of the SRL, which is shown above the shf locus. The 8-bp inverted repeats at both ends of the SRL are shown.
[Figure: gene maps of SRL-A, SRL-B, SRL-C, SRL-D, and SRL-E; scale bar 10,000 bp. Labelled features: int, aadA1, blaOXA-1, catA1, tetA(B), fecEDCBARI. Insertion and deletion annotations: Δ 2-6::ISSd1; 42::ISSd1; 44::ISSd1; 48::ISEc12; 57::IS1; ISEc23; Δ 7-9; Δ 47 and 6 kb insertion; 2.5 kb insertion; IS629::sfiA group II intron; IS629-IS911::sfiA group II intron. Inset: shf locus with virK, msbB2, capU, capUΔ, shf, and IS911; end sequences ATTTAAGC and TAAATTCG.]
Supplementary Fig. 6
Structure of the chromosomally encoded transposon found in the CDC 87-3330 isolate. The transposon is shown in the upper part of the figure. Its structure is based on the PacBio sequence of isolate CDC 87-3330. Antibiotic resistance genes are boxed in yellow and insertion sequences are boxed in red. The 8-bp direct repeats at the two ends of the transposon are shown. The chromosomal location of the transposon is shown in the bottom part of the figure, using coordinates based on the S. dysenteriae type 1 reference genome Sd197. Regions of similarity to the SRL-PAI are also indicated in purple.
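[Figure, upper part: transposon carrying catA1 and tetA(B), with IS1 and IS10; flanking repeats 5'-GGCAGAGTG-3' and 5'-GGCAGAGTG-3'.]
[Figure, bottom part: chromosomal context, coordinates based on Sd197 (CP000034). Sequence 5'-CATACAGGCAGAGTGGCCGTG-3' at 4,037,510; further coordinates 4,029,937 and 4,021,412; fec operon (labelled I R A Ap B C D E), IS2, IS10Δ, ISShdy2. Regions of similarity to SRL-PAI (AF326777) are marked.]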
Supplementary Fig. 7 Circular maximum likelihood phylogeny of the sequenced S. dysenteriae type 1 genomes rooted on non-S. dysenteriae genomes. The tree is based on 140,385 SNPs called after mapping against Sd197. The working names of the genomes (see Supplementary Table 1 for the correspondence with isolate names) are shown. The scale corresponds to 12,634 SNPs.
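The SNP count quoted for the scale can be related to the 0.09 scale bar drawn on the tree with one multiplication. The sketch below assumes, as is usual for trees built from SNP alignments, that branch lengths are expressed in substitutions per variable site; this is an assumption and is not stated in the caption. The function name `scale_bar_in_snps` is hypothetical.

```python
def scale_bar_in_snps(bar_length, n_variable_sites):
    """Convert a scale bar in substitutions per variable site to a SNP count.

    Assumption: the tree was inferred from an alignment containing only the
    variable sites, so a branch of length L corresponds to about
    L * n_variable_sites SNPs.
    """
    if bar_length < 0 or n_variable_sites < 0:
        raise ValueError("inputs must be non-negative")
    return bar_length * n_variable_sites


# Scale bar of 0.09 on a tree built from 140,385 SNPs.
snps = scale_bar_in_snps(0.09, 140_385)
print(f"{snps:.2f} SNPs")
```

Under this assumption the product is roughly 12,634.65, in line with the 12,634 SNPs given in the caption.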
[Figure: circular tree with working genome names as tip labels; scale bar 0.09. Non-S. dysenteriae genomes: E. coli O157:H7 Sakai (NC_002695), E. coli K-12 MG1655 (NC_000913), S. flexneri 2a 2457T (AE014073), S. boydii Sb227 (NC_007613), and S. sonnei Ss046 (NC_007384). Labelled S. dysenteriae type 1 reference genomes: Sd197 (CP000034), Sd1617 (CP006736), and M131649 (Sanger).]
Supplementary Fig. 8
Capacity of M115 to generate mutations conferring resistance to rifampicin.
The results are presented as mean values (± standard errors) for two independent experiments. The E. coli ECOR48 (CIP 106023) strain was used as a strong mutator positive control, the S. dysenteriae type 1 97-13397 isolate was used as a putative strong mutator isolate (deletion of the mutS gene), and the S. dysenteriae type 1 isolates M116 and Sd197 were used as putative normomutator isolates (integrity of the mutS, mutH, mutL and uvrD methyl-directed mismatch repair genes).
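The summary statistic plotted in this figure can be sketched as follows. The example assumes that the mutation frequency in each experiment is the number of rifampicin-resistant colony-forming units divided by the total viable count, and then reports the mean and standard error across experiments. The caption does not give the formula, so this definition, the function names, and the counts are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mutation_frequency(resistant_cfu_per_ml, total_cfu_per_ml):
    """Assumed definition: rifampicin-resistant CFU divided by total CFU."""
    if total_cfu_per_ml <= 0:
        raise ValueError("total viable count must be positive")
    return resistant_cfu_per_ml / total_cfu_per_ml


def mean_and_standard_error(values):
    """Mean and standard error (sample SD / sqrt(n)) of replicate values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two independent experiments")
    return mean(values), stdev(values) / sqrt(n)


# Hypothetical counts for one strain in two independent experiments
# (NOT data from the study): (resistant CFU/ml, total CFU/ml).
toy_experiments = [(30, 2.0e9), (50, 2.5e9)]

toy_frequencies = [mutation_frequency(r, t) for r, t in toy_experiments]
toy_mean, toy_se = mean_and_standard_error(toy_frequencies)
print(f"mean={toy_mean:.2e}, standard error={toy_se:.2e}")
```

With only two experiments the standard error reduces to half the absolute difference between the two frequencies, so it should be read as a rough indication of spread.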
[Figure: bar chart. x-axis: Strain (ECOR48, Sd197, M115, M116, 97-13397). y-axis: Frequency of mutation to rifampicin resistance, logarithmic scale from 1.00E-09 to 1.00E-05.]
Supplementary Fig. 9 Circular maximum likelihood phylogeny of the entire set of sequenced S. dysenteriae type 1 genomes. The tree is based on 14,677 SNPs called after mapping against Sd197 (marked with a blue triangle). The tree was rooted on M115, which is most closely related to the S. dysenteriae type 1 ancestral strain. The intercontinental transmission events (T1 to T8) and the bootstrap values are shown.
[Figure: circular tree; tip labels give the working genome name, country, and year of isolation (for example M115_UK_1926 and Sd197_Reference_China_1950). Bootstrap values are shown on the branches and the intercontinental transmission events are labelled T1 to T8; scale bar 0.02.]
Supplementary Fig. 10 Circular maximum likelihood phylogeny of the entire set of sequenced S. dysenteriae type 1 genomes plus the newly published reference genome Sd1617. The tree is based on 15,752 SNPs called after mapping against Sd1617 (marked with a blue triangle). The tree was rooted on M115, which is most closely related to the S. dysenteriae type 1 ancestral strain. The intercontinental transmission events (T1 to T8) and the bootstrap values are shown.
[Figure: circular tree. Scale bar, 0.02. Tips are labelled isolate_country_year (e.g., M115_UK_1926; Sd1617_Reference_Guatemala_1969); bootstrap values are shown on the branches and transmission events T1 to T8 are marked.]
Supplementary Fig. 11 Comparison of the maximum likelihood trees obtained after mapping against two different S. dysenteriae type 1 reference genomes. The tree on the left is based on 14,677 SNPs called after mapping against Sd197 (lineage IIIa). The tree on the right is based on 15,752 SNPs called after mapping against Sd1617 (lineage IIIb). Coloured boxes mark lineages I, II, III and IV in yellow, blue, green and red, respectively.
[Figure: two trees with isolate tip labels. Scale bars, 0.0056 (left tree, Sd197) and 0.011 (right tree, Sd1617). Lineages I, II, III and IV are boxed.]
Supplementary Fig. 12 Distribution of SNPs with respect to the S. dysenteriae type 1 reference genome Sd197. SNP counts per 10,000 bp window are plotted on the y-axis. The blue line indicates the mean rate of 39 SNPs per 10,000 bp (or 1 SNP per 256 bp). The peak is due to the rpoS gene, which contains 40 SNPs.
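The two rates quoted in this caption are related by simple arithmetic: 39 SNPs per 10,000 bp corresponds to one SNP roughly every 10,000/39 ≈ 256 bp. The Python sketch below illustrates how SNPs can be counted in consecutive 10,000 bp windows. It is a minimal illustration only: the function name `snps_per_window` and the toy positions are assumptions made for this example and are not taken from the authors' pipeline.

```python
from collections import Counter


def snps_per_window(positions, window=10_000):
    """Count SNPs in consecutive, non-overlapping windows.

    `positions` are 1-based chromosomal coordinates; window 0 covers
    positions 1..window, window 1 covers window+1..2*window, and so on.
    """
    counts = Counter((pos - 1) // window for pos in positions)
    return dict(sorted(counts.items()))


# Toy SNP positions (hypothetical, for illustration only).
toy_positions = [12, 5_400, 9_999, 10_000, 10_001, 25_000]
window_counts = snps_per_window(toy_positions)

# Mean spacing implied by the mean rate given in the caption.
mean_rate = 39                      # SNPs per 10,000 bp (from the caption)
bp_per_snp = 10_000 / mean_rate     # about 256 bp per SNP

print(window_counts)
print(round(bp_per_snp))
```

Windows that contain no SNPs are simply absent from the returned dictionary; a plotting routine would need to fill them with zeros.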
[Figure: x-axis, chromosomal coordinates (S. dysenteriae Sd197), ticks from 1,000,000 to 4,000,000; y-axis, SNPs per 10,000 bp, 0 to 80 (secondary scale, 0.0 to 0.008); the peak is labelled rpoS.]
Supplementary Fig. 13 Distribution of nucleotides at a key SNP position for the BS504 genome. The SNP at position 3,219,278 (T/G) is present in all the African lineage IV isolates of transmission wave T8, but not in other isolates. In the study by Rohmer et al.2, the nucleotide called for this SNP in BS504 was G, despite a frequency of T residues of 36.5% at this position. The read mapping visualised by Artemis46 at this position is also shown for our isolates CDC ZB4 (the original name of BS504, which was isolated in Zambia in 1991) and CDC 55-986 (Mexico, 1955).
nt 3,219,278 SRR765101_BS504 CDC ZB4 CDC 55-986
T, n=157 G, n=0
T, n=0 G, n=142
T, n=31 G, n=54
Supplementary Fig. 14
Phenotypic and molecular markers from existing typing and subtyping schemes. The data are correlated with the maximum likelihood phylogeny based on 14,677 chromosomal SNPs (reference genome Sd197). The four genetic lineages (I, II, III, IV) are indicated in yellow, blue, green and red, respectively. Columns next to the tree show the markers analysed. From left to right: ortho-nitrophenyl-β-galactoside test (ONPG); presence of Shiga toxin genes (stxA and stxB); presence of plasmid pSS046_spC (pSS046_spC); multilocus sequence type (MLST); and CRISPR spacer content (CRISPR); see inset legend. The genetic data were obtained principally from whole genome sequences.
[Figure: tree with isolate tip labels; scale bar, 0.0076. Column headers: Lineage, ONPG, stxA, stxB, MLST, CRISPR, pSS046_spC.]
Keys: (Lineage) I, II, III, IV; (ONPG) Positive, Negative, Unknown; (stxA, stxB, pSS046_spC) Presence, Absence, SNP; (MLST) ST146, ST5160, ST5159, ST5157, ST260, ST5158, ST5161; (CRISPR) 3, 3var1, 32.
Supplementary Fig. 15
Maximum likelihood phylogenetic tree for S. dysenteriae type 1 virulence plasmid pSD1_197. This unrooted tree was constructed from the 226 plasmid-containing isolates. The genetic lineage determined from the maximum likelihood phylogeny for the 14,677 chromosomal SNPs (reference genome Sd197) is indicated in the column on the right.
[Figure: unrooted tree with isolate tip labels; scale bar, 0.0051. Side column: genetic lineage (II, III, IV).]
Supplementary Fig. 16
Pan-genome analysis. a, Changes to the total genes (dotted) and conserved genes (solid) in the pan-genome with the addition of genomes. b, Breakdown of the frequency of genes within isolates where the categories are defined as: core, genes contained in nearly all isolates (>=99%); soft core, genes contained in 95%-99% of the isolates; shell, genes present in several isolates (15%-95%); cloud, the remaining genes, present only in a few isolates (<15%).
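The frequency categories in panel b can be expressed as a simple threshold rule. The sketch below assumes the commonly used Roary-style convention, in which "shell" denotes genes of intermediate frequency (15% to <95% of isolates) and "cloud" denotes rare genes (<15%). The function name `pangenome_category` and the example counts are hypothetical and are not taken from the study.

```python
def pangenome_category(n_with_gene, n_isolates):
    """Assign a gene to a pan-genome frequency category.

    Thresholds follow the Roary-style convention (an assumption here):
    core >= 99%, soft core 95% to <99%, shell 15% to <95%, cloud < 15%.
    Integer arithmetic is used to avoid floating-point edge cases.
    """
    if not 0 <= n_with_gene <= n_isolates or n_isolates == 0:
        raise ValueError("counts must satisfy 0 <= n_with_gene <= n_isolates, n_isolates > 0")
    if n_with_gene * 100 >= 99 * n_isolates:
        return "core"
    if n_with_gene * 100 >= 95 * n_isolates:
        return "soft core"
    if n_with_gene * 100 >= 15 * n_isolates:
        return "shell"
    return "cloud"


# Hypothetical gene counts in a collection of 200 isolates.
examples = {n: pangenome_category(n, 200) for n in (200, 198, 195, 60, 5)}
print(examples)
```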
Supplementary Tables
Supplementary Table 1 – Details of Shigella dysenteriae type 1 isolates and genomes
used in this study.
The following are shown: year and country of isolation, epidemiological information;
lineage, biotype, antibiotic resistance phenotypes, source and EBI-ENA accession
numbers (sheet no. 1); spatiotemporal distribution (sheet no. 2); MLST, CRISPR type,
presence/absence of stxA and stxB genes (sheet no. 3); gyrA SNPs and presence/absence
of resistance-associated genes and structures (sheet no. 4); distribution of the lineages
over time (sheet no. 5).
See separate Excel file
Supplementary Table 2 – Whole-genome sequences, SNPs and phylogenetic data
used in this study.
The following are shown: mapping statistics (sheet no. 1); assembly statistics (sheet no.
2); SNPs used for phylogeny (sheet no. 3); pairwise SNP distance between the isolates
(sheet no. 4); summary of Bayesian models used for analyses with BEAST (sheet no. 5);
date estimates for the main lineages (sheet no. 6); date estimates for the intercontinental
transmission events (sheet no. 7); Gene Ontology functions of the 5,630 annotated
accessory genes (sheet no. 8).
See separate Excel file
Supplementary Table 3 – CRISPR spacer sequences analysed in this study.
Identifier DNA sequence
A1-var1 CAAGTGATATCCATCATCGCATCCAGTGCGCC
6 CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC
24 TCGGTTCAGGCGTTGCAAACCTGGCTACCGGG
115 TCGGTTCAGGCGTTGCAAACCTAGCTACCGGG
21 GTAGTCCATCATTCCACCTATGTCTGAACTCC
Supplementary Discussion
SNP variation in S. dysenteriae type 1
The maximum likelihood (ML) phylogenetic analysis (RAxML) with non-S. dysenteriae
outgroups and the BEAST analysis (Fig. 2 and Supplementary Fig. 7) showed that M115
is most closely related to the ancestral Shigella dysenteriae type 1 (Sd1) strain. This
isolate, which differed from others by more than 1,200 SNPs, was the only representative
of lineage I. Otherwise, M115 displayed all the characters of Sd1 in terms of biotyping,
serotyping, MLST, CRISPR typing and the presence of the Shiga toxin genes. To
confirm that the divergence of M115 was not due to laboratory contamination or a
hypermutator phenotype, it was even sequenced a second time from a separate DNA
extract, after serial dilution, to ensure that the DNA came from a single colony. The two
genomic sequences obtained were identical. M115 displayed no modifications to genes
involved in the DNA repair system (mutS, mutH, mutL, and uvrD), and it did not have
the hypermutator phenotype (Supplementary Fig. 8). The various ML trees were
therefore rooted on M115.
The topologies of the two ML trees obtained after mapping short-read sequences against
the sequences of the Sd1 reference genomes Sd197 and Sd1617 were similar
(Supplementary Figs. 9-11). We therefore used the 14,677 chromosomal SNPs randomly
distributed over the non-repetitive non-recombinant core genome detected after mapping
against Sd197 (Supplementary Fig. 12).
All BAPS runs converged on three sequence clusters corresponding to lineages II to IV,
concordant with the results obtained for clustering by eye, except for M115. This unique
lineage I isolate was consistently added to lineage II by BAPS, despite differing from the
other isolates in this lineage by more than 1,200 SNPs and having been shown to be the
isolate most closely related to the ancestral Sd1 strain by BEAST analysis (see above).
The mean intra-clade pairwise SNP variation within lineages was 275 (range 1-517)
for the 58 lineage II isolates, 417 (0-677) for the 64 lineage III isolates,
and 192 (0-485) for the 208 lineage IV isolates. Within lineage IV, the isolates from the
outbreak in Central Africa in the 1980s (18 T5 isolates isolated from three countries
between 1981 and 1990) differed by a mean of 35 SNPs (1-76), whereas the isolates
from the emerging ortho-nitrophenyl-β-galactoside negative (ONPG-) African strain (63
T8 isolates isolated from 22 countries between 1991 and 2011) differed by a mean of 56
SNPs (0-133).
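The intra-clade statistics reported above (mean, minimum and maximum pairwise SNP distance) can be derived from an alignment of SNP sites as sketched below. This is a minimal illustration under stated assumptions: the sequences are toy data, the names `snp_distance` and `pairwise_summary` are hypothetical, and ambiguous bases are not treated specially. It is not the authors' implementation.

```python
from itertools import combinations
from statistics import mean


def snp_distance(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def pairwise_summary(alignment):
    """Return (mean, minimum, maximum) pairwise SNP distance for a clade.

    `alignment` maps isolate names to aligned SNP-site strings.
    """
    distances = [
        snp_distance(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    ]
    return mean(distances), min(distances), max(distances)


# Toy alignment of SNP sites (hypothetical isolates and sequences).
toy_clade = {
    "iso1": "ACGTAC",
    "iso2": "ACGTAT",
    "iso3": "ACCTAT",
    "iso4": "ACGTAC",
}
summary = pairwise_summary(toy_clade)
print(summary)
```

A minimum of 0, as reported for lineages III and IV, simply means that at least two isolates in the clade were identical at all SNP sites considered.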
We assessed the influence of storage conditions by comparing the few isolates from a single
outbreak that had been recovered by different groups using different culture preservation methods. The
CDC A5468 and 11-81 isolates, which originated from the Democratic Republic of the
Congo (DRC) and were obtained in 1981, were stored at -80°C or freeze-dried. They
differed by only 10 SNPs. The Iris Kat-4 isolate was isolated in DRC in 1985 and stored
at -80°C, whereas the E39084/85 isolate was collected in Rwanda in 1985 and stored at
room temperature as a stab culture on Dorset egg medium. They also differed by only 10
SNPs. Indirectly, the case of the CDC 3036-94 isolate (see below) also argues for the
stability of the SNP pattern. We cannot rule out some differential SNP evolution due to
the number of passages before preservation and the mode of storage, but the limited SNP
variation observed for 30-year-old isolates and the consistency of the phylogeographic
grouping suggest that this was not a major issue in this study.
Other information revealed by the genome sequencing data
Over the last 10 years, we have tried unsuccessfully to find the original cultures
established by Shiga in Japan in 1897 (ref. 5) and Kruse in Germany in 1900 (ref. 64). It seems likely
that they have been lost or destroyed. However, on the basis of our phylogeographic
data, it seems likely that Shiga’s isolate would have belonged to sublineage IIIa and that
Kruse’s isolate would have belonged to lineage II.
In 1986-1987, a multidrug-resistant Sd1 strain caused outbreaks in north-eastern
Thailand (and also in Laos, according to our isolates) after a lull of 20 years14. Based on
the distance of this region from India and Bangladesh, and the plasmid and antibiotic
susceptibility patterns of the isolates, the authors concluded that this strain was unlikely
to have originated from the Indian subcontinent. However, our results suggest that the
sequences of the outbreak isolates were derived from those of bacteria isolated in India
from 1978 to 1982, and in Bangladesh in 1985. Furthermore, all these isolates had the
same antibiotic resistance structures (i.e., two chromosomally encoded transposons,
which are the composite transposon shown in Supplementary Fig. 6 and Tn7).
During the early 1990s, a nalidixic acid-resistant strain caused further outbreaks in
Thailand, this time close to the Burmese border65. Our phylogenetic data indicated that
the outbreak strain was derived from a multidrug-resistant but nalidixic acid-susceptible
strain isolated in Bangladesh in 1987.
In the former USSR, Sd1 (the Grigoriev-Shiga bacillus) was highly prevalent from 1917
to 1922, after which its prevalence decreased steadily, reaching negligible levels in the
1950s66. It re-emerged during the 1980s in the Central Asian Soviet Republics, including
the Uzbek Republic, and it was linked to the Afghanistan war and the flow of
populations in Central Asia67. The Uzbek SSR served as a springboard for the
subsequent spread of infection, in the form of sporadic cases, to cities located in the
European part of the USSR (Riga, Moscow, Ulyanovsk, Kuibishev)67. Our data reveal
that these two periods of activity were associated with two different lineages, European
lineage II for the first and South Asian lineage IV (with different subclades) for the
second.
The CDC 3036-94 isolate recovered from a child in Tennessee, USA, in 1994 was highly
unusual as it belonged to lineage II and was susceptible to all the antimicrobial agents
tested. The last such pan-susceptible lineage II isolates on record were recovered from
North Africa during the 1970s. Whole-genome sequencing revealed that this case was
probably associated with contamination from laboratory stocks, as the genome of CDC
3036-94 differed from CIP 57.28 by only five SNPs. CIP 57.28 was isolated in the UK in
1934 as the “Newcastle” strain or NCTC 4837, 60 years before the CDC 3036-94 sample
was isolated. Following its isolation in 1934, this “Newcastle”/NCTC 4837 strain was
deposited in various international collections, including CIP under accession no. 57.28
and ATCC under accession no. 13313. The presence in Tennessee of a biotechnology
company using ATCC 13313 to prepare rabbit polyvalent antisera provides further
support for the hypothesis of laboratory contamination, although no information is
available to connect the patient from whom CDC 3036-94 was isolated with the
biotechnology company. The possibility of tube switching has also been ruled out as we
obtained and sequenced the CDC 3036-94 isolate two years after we sequenced CIP
57.28/ATCC 13313. Furthermore, the genome sequence we obtained for CDC 3036-94
was identical to the publicly available BS506 genome2 obtained independently by
another group from a different stock culture of CDC 3036-94.
Differences found compared to a previous study
In the main text, we show that, contrary to the claim of Rohmer et al.2, there was no lack of
consistency between phylogeny and geography. Instead, there were strong phylogeographic
patterns. Our study was based on a wide temporal and geographical sampling of more
than 300 isolates, resulting in 14,677 informative SNPs, compared with 56 isolates and 989
SNPs for the other group. Their lineage A comprised a single isolate, BS506 (original
name CDC 3036-94), found to belong to our old European lineage II. However, as this
isolate was recovered in the USA in 1994, and the likely laboratory contamination with
an old collection strain (see above) was not identified, this lineage was attributed to the
USA and this confused the phylogeographic analysis. Their lineage B comprised only the
Sd197 reference genome and corresponded to our Eastern Asian sublineage IIIa. Their
lineage C corresponded to our American sublineage IIIb, with, however, two spurious
sequences from African isolates (DH02 and BS504), also confusing the phylogeographic
analysis (see below). Their lineage D corresponded to our lineage IV. Lineage I and
sublineages IIIc and IIId were not found in their study.
The hypothesis of long-term human carriers proposed by Rohmer et al.2 was essentially
based on this lack of consistency between phylogeny and geography, as observed for
Salmonella enterica serotype Typhi68, the agent of typhoid fever, which may be carried
for several decades in the gallbladder of some convalescent patients. The clear pattern of
successive transmission waves following the importation of lineage III and IV strains
into Africa does not support this hypothesis of long-term carriers for Sd1.
Rohmer et al.2 also claimed that the massive outbreak that hit Central America
(estimated 500,000 cases)10,11 during the late 1960s might have been caused by an
African strain that became established in the New World at the beginning of the 1960s.
This hypothesis was based on the grouping of two of their African isolates, DH02 and
BS504 (original name CDC ZB4), at the base of the C lineage, the tips of which
correspond to Central American isolates. As our CDC ZB4 isolate, together with our
258-11E and PAMA isolates collected during an outbreak in Cameroon in 199869 (likely
the same outbreak as for DH02), clustered within the ONPG-negative lineage IV T8 subclade,
which contains almost all the African isolates obtained since 1991, we analysed the
deposited BS504 and DH02 short reads for certain SNPs characteristic of either
sublineage IIIb (North and Central American isolates) or the ONPG-negative lineage IV
subclade. We observed a heterogeneous distribution of two nucleotides at these positions
(Supplementary Fig. 13) indicative of a mixture of two isolates from different genetic
backgrounds, explaining the spurious grouping.
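The read-level check described above can be illustrated with a minimal sketch. It assumes that per-position base counts have already been extracted from the deposited short reads. The positions, counts, 10% threshold and function names below are hypothetical values chosen for illustration, not those used in this study.

```python
from collections import Counter

def minor_allele_fraction(base_counts):
    """Return the fraction of reads supporting the second most common base."""
    total = sum(base_counts.values())
    if total == 0:
        return 0.0
    ranked = Counter(base_counts).most_common()
    second = ranked[1][1] if len(ranked) > 1 else 0
    return second / total

def looks_mixed(pileup, threshold=0.10, min_sites=2):
    """Flag a sample when several diagnostic sites carry two well-supported bases."""
    mixed_sites = [pos for pos, counts in pileup.items()
                   if minor_allele_fraction(counts) >= threshold]
    return len(mixed_sites) >= min_sites

# Hypothetical base counts at three lineage-diagnostic positions (illustrative only)
pure_sample = {1001: {"C": 60, "T": 1}, 2002: {"G": 55}, 3003: {"A": 58, "G": 0}}
mixed_sample = {1001: {"C": 35, "T": 25}, 2002: {"G": 30, "A": 28}, 3003: {"A": 40, "G": 22}}

print(looks_mixed(pure_sample))   # False
print(looks_mixed(mixed_sample))  # True
```

A single-isolate culture is expected to show one base at nearly all reads for each diagnostic position, whereas a mixture of two genetic backgrounds gives two well-supported bases at the positions that distinguish them.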
Our results demonstrate that sublineage IIIb, containing only North and Central
American isolates, had a common ancestor dating back to 1893 [95% credible interval
(CI), 1885-1901]. One of these isolates was recovered in Mexico in 1955, several years
before the postulated establishment of the African strain in America. Our genomic data
are also consistent with reliable old published reports of the isolation of Sd1 at medical
institutions in New England during the early 1900s70,71 or at a camp for Mexican workers
in Michigan in 193872.
It was also claimed that the diagnostic tools might be jeopardized by genetic drift
affecting metabolic activities as well as surface antigens, some of which were targeted by
serotyping. We found, to the contrary, considerable phenotypic and genetic homogeneity
in our dataset (see next section), for all the typing and subtyping tools used by clinical
and public health microbiology laboratories. The only SNP (within lacZ) associated with
the loss of a typing character (ONPG test) was a useful marker of the strain that spread
across Africa during the 1990s and 2000s.
Correlation of S. dysenteriae type 1 phylogenetic lineages with existing typing and
subtyping schemes
Sd1 is known to contain the stxA and stxB genes encoding the Shiga toxin STX1, on a
defective lambdoid prophage73. Despite the presence of hundreds of insertion
sequences (ISs) within the Sd1 genome17,18, the stxA and stxB genes have remained
remarkably conserved over a period of almost a hundred years. Only two isolates have
lost both stxA and stxB, and three isolates have a SNP within stxA. These findings do not
reflect a sampling bias, as the identification of Sd1 is based on biochemical tests and
serotyping. Searches for stx genes or STX production are not carried out routinely for
Shigella spp. isolates in clinical microbiology or public health laboratories.
The genetically distinct lineages of Sd1 showed only low levels of uncorrelated diversity
on assessment with existing subtyping methods: biotyping74, multilocus sequence typing
(MLST)36, CRISPR typing22,75, plasmid profiling76, and pulsed-field gel electrophoresis
(PFGE)25,77 (Supplementary Figs 3 and 14).
The ONPG test was the only conventional phenotypic test to give variable results. This
test assesses β-galactosidase activity, which is intense and rapid in Sd1 (generally taking
less than 3 hours). The loss of β-galactosidase activity was observed in some isolates
from across the tree but was a constant marker for the lineage IV African isolates of the
T8 intercontinental transmission wave (Fig. 2). This ONPG- character was first reported
in the DRC in 199478 but we detected this marker in older African isolates (Zambia,
1991) and in the genomes of some Indian and Nepalese isolates collected earlier and genetically
ancestral to the T8 African genomes (Fig. 2). These South Asian genomes and the
derived African T8 genomes had in common a non-synonymous SNP (C to T at position
363,921 of the reference genome Sd197, leading to a glycine-to-serine substitution)
within the lacZ gene. The other sporadic ONPG- isolates in the other lineages do not
have this non-synonymous SNP. This SNP thus constitutes a good candidate marker for
the emerging African strain.
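As a worked check of the glycine-to-serine substitution mentioned above, the following sketch enumerates every single-nucleotide change in the standard genetic code that converts a glycine codon into a serine codon. The actual codon affected in lacZ is not given in the text, so the sketch only lists the possibilities. The function name is ours.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def single_base_changes(from_aa, to_aa):
    """List (codon, mutant codon, codon position, change) for each
    single-nucleotide substitution turning from_aa into to_aa."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != from_aa:
            continue
        for i, ref in enumerate(codon):
            for alt in BASES:
                if alt == ref:
                    continue
                mutant = codon[:i] + alt + codon[i + 1:]
                if CODON_TABLE[mutant] == to_aa:
                    hits.append((codon, mutant, i + 1, f"{ref}>{alt}"))
    return hits

gly_to_ser = single_base_changes("G", "S")
print(gly_to_ser)
# [('GGT', 'AGT', 1, 'G>A'), ('GGC', 'AGC', 1, 'G>A')]
```

Both possibilities are G-to-A changes at the first codon position on the coding strand. A C-to-T change reported on the reference strand would correspond to such a change if lacZ is encoded on the complementary strand. This is an inference from the genetic code and is not stated in the text.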
MLST, which has become the gold standard for bacterial population typing, revealed the
presence of two main STs, ST260 (n=54, lineage II isolates) and ST146 (n=270, other
lineages), differing by a single SNP in one of the seven 500 bp “housekeeping” genes
targeted by this method. In addition, seven genomes belonged to four new STs that were
single-locus variants of ST260 and ST146.
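The notion of a single-locus variant can be made concrete with a short sketch. An ST is defined by the combination of allele numbers at the seven loci, and two STs are single-locus variants when their profiles differ at exactly one locus. The locus names and allele numbers below are hypothetical placeholders, not the actual profiles of ST260 or ST146.

```python
def differing_loci(profile_a, profile_b):
    """Return the loci at which two MLST allelic profiles differ."""
    return [locus for locus in profile_a if profile_a[locus] != profile_b[locus]]

def is_single_locus_variant(profile_a, profile_b):
    """True when the two profiles differ at exactly one of the loci."""
    return len(differing_loci(profile_a, profile_b)) == 1

# Hypothetical seven-locus profiles (placeholder locus names and allele numbers)
st_a = {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 7, "locus5": 3, "locus6": 5, "locus7": 6}
st_b = dict(st_a, locus4=8)             # differs from st_a at locus4 only
st_c = dict(st_a, locus1=9, locus6=10)  # differs from st_a at two loci

print(differing_loci(st_a, st_b))           # ['locus4']
print(is_single_locus_variant(st_a, st_b))  # True
print(is_single_locus_variant(st_a, st_c))  # False
```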
CRISPR types were also very stable across the lineages. We analysed the de novo
assemblies for the different CRISPR spacer sequences and found that all but six genomes
belonged to CRISPR type (CT) 3, with the following four spacers: A1-var1, 6, 24, and
21 (Supplementary Table 3). One genome belonged to CT32 (A1-var1, 6, 115, and 21;
spacer 115 being a single SNP variant of spacer 24) and five genomes belonged to
CT3var1, which differed from CT3 by a single SNP within one direct repeat (DR) within
the CRISPR sequence.
Plasmid profiling based on the number and size of plasmids within a single isolate has
been widely used for differentiating Sd1 isolates. The independent acquisition of
similarly sized multidrug-resistant plasmids (Supplementary Fig. 4) and the distribution
of plasmids not containing ARGs, such as pSS04-spC, across all lineages
(Supplementary Fig. 14), preclude the assessment of phylogenetic relationships between
isolates by this method. Furthermore, the shift from plasmids to genomic islands as the
support for antibiotic resistance in the last two to three decades has probably decreased
the plasmid content of isolates.
Over the last two decades, PFGE has become the method of choice for subtyping enteric
bacteria at strain level. In light of the genome sequences obtained, we re-evaluated two
outbreaks that occurred in the Central African Republic in 2003-200425, and which we
had investigated with PFGE as a molecular epidemiology tool (Supplementary Fig. 3).
PFGE distinguished two groups of profiles, one (PFGE profiles X1 to X4) for the
“Ouham-Pende” outbreak and the other (PFGE profiles X5 to X7) for the “Nana-
Grebizi” outbreak. Both PFGE groups were tightly clustered, whereas the isolates used
for comparison displayed PFGE profiles (X8 to X18) very different from those seen for
the isolates of the two outbreaks. Genomic data showed that the strains that had caused
the “Nana-Grebizi” and “Ouham-Pende” outbreaks belonged to different lineages, IIIc
and IV, respectively, differing by ~700 SNPs. The SNP variation observed among
isolates from the same outbreak, which gave rise to slightly different PFGE profiles, was 5-33
and 10-20 SNPs for the “Nana-Grebizi” and “Ouham-Pende” outbreaks, respectively. Most of the
genomes of the comparison isolates were actually close to the “Ouham-Pende” genomes
(differing from them by 37 to 61 SNPs), despite the lack of relationship suggested by
PFGE. This lack of correlation between PFGE and WGS data confirms that PFGE should
not be used for assessing phylogenetic relationships in an organism with a very plastic
genome containing hundreds of ISs, such as Sd1.
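The SNP ranges quoted above are pairwise distances between genomes. A minimal sketch of how such a range could be computed from an alignment of SNP positions is shown below. The isolate names and sequences are toy data, and skipping ambiguous bases is an assumption made for illustration rather than a description of the pipeline used in this study.

```python
from itertools import combinations

def snp_distance(seq_a, seq_b):
    """Count positions where both sequences carry an unambiguous, different base."""
    return sum(1 for a, b in zip(seq_a, seq_b)
               if a != b and a in "ACGT" and b in "ACGT")

def distance_range(alignment):
    """Return the (minimum, maximum) pairwise SNP distance within a group of isolates."""
    dists = [snp_distance(alignment[x], alignment[y])
             for x, y in combinations(sorted(alignment), 2)]
    return min(dists), max(dists)

# Toy alignment of variable positions for three hypothetical outbreak isolates
toy_outbreak = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "ACGAACGNAT",
}

print(distance_range(toy_outbreak))  # (1, 2)
```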
Coevolution between the VP and the chromosome
A large virulence plasmid (VP) was present in 226 isolates. The ML phylogeny based on
290 informative SNPs was similar to the chromosome phylogeny (Supplementary Fig.
15), indicating a coevolution of the chromosome and the VP since at least 1853 (95% CI
1831-1871), the date of the MRCA of all the Sd1 isolates other than M115. The M115
isolate did not contain the VP, according to our search criteria (see Methods section).
Structure of the cadBA operon
In Shigella spp. and enteroinvasive E. coli (EIEC), an inability to synthesise lysine
decarboxylase (LDC) and, thus, produce cadaverine has been identified as a convergent
pathoadaptive mutation that enhances virulence79-81. Comparative analysis has shown
that the ancestral LDC trait was lost through various rearrangements of the cadBA
operon encoding LDC and its transporter. We analysed this operon from the PacBio
sequences of nine Sd1 isolates from lineages I (M115), II (M116 and 17/89), IIIa
(Sd197), IIIb (CDC 69-3818), IIIc (CAR10), and IV (40-81, CDC ZB4 and 99-9324). A
similar structure was found in all these isolates. The cadBA operon was located between
ytfQ (SDY_4463 of Sd197; GenBank accession no. CP000034) and yjdL (SDY_4467).
The cadA gene (SDY_4466) displayed a five-nucleotide deletion leading to a frameshift
with a premature stop codon, and the cadB gene (SDY_4465) was interrupted by an IS1.
The cadC gene, a regulator of the cadBA operon, was absent. In addition, an IS1 element
was found inserted into the cadA gene of M115 and CDC 69-3818 and a second
frameshift was found at the end of the cadA gene in the lineage IV isolates 40-81, CDC
ZB4 and 99-9324.
Pan-genome and antibiotic resistance
The pan-genome analysis (Supplementary Fig. 16) identified a total of 11,830 genes for
the 330 Sd1 genomes studied. A core genome of 2,194 genes was identified, comprising
1,132,109 bases. Of the 7,345 accessory genes, 5,630 were annotated with 22,135 Gene
Ontology (GO) terms. The top GO terms corresponded to DNA/plasmid binding
(GO:0003677) and transposition, DNA-mediated functions (GO:0006313)
(Supplementary Table 2). Taking into account the various large multidrug-resistant
plasmids (70 to 160 genes per plasmid) we have sequenced by 454 or PacBio, together
with other plasmid fragment sequences obtained by Illumina short-read sequencing, we
can conclude that the number of genes in the accessory genome supporting antibiotic
resistance probably exceeds 1,000.
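The partition of a pan-genome into core and accessory genes can be sketched from a gene presence/absence table. The threshold used to call a gene "core" is not stated here, so the sketch below assumes, for illustration only, that a core gene must be present in every genome. The gene and genome names are toy placeholders.

```python
def split_pan_genome(presence, n_genomes, core_fraction=1.0):
    """Split genes into core and accessory sets.

    presence maps each gene to the set of genomes that carry it.
    core_fraction is an assumed cut-off (1.0 = present in all genomes).
    """
    core, accessory = set(), set()
    for gene, genomes in presence.items():
        if len(genomes) >= core_fraction * n_genomes:
            core.add(gene)
        else:
            accessory.add(gene)
    return core, accessory

# Toy presence/absence data for three hypothetical genomes
toy_presence = {
    "geneA": {"g1", "g2", "g3"},
    "geneB": {"g1", "g2", "g3"},
    "resistance_gene": {"g2"},
    "prophage_gene": {"g3"},
}

core, accessory = split_pan_genome(toy_presence, n_genomes=3)
print(sorted(core))       # ['geneA', 'geneB']
print(sorted(accessory))  # ['prophage_gene', 'resistance_gene']
```

Relaxing `core_fraction` (for example to 0.99) tolerates genes missed in a few draft assemblies, at the cost of a less strict definition of the core genome.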
The accessory genome linked to antibiotic resistance was maintained in the descendants,
whereas structures not involved in antibiotic resistance, such as prophages, were found in
only one or a few isolates and were not studied further.
Cotrimoxazole, a combination of sulfamethoxazole and trimethoprim, has been widely
used to treat Shigella infections since the late 1960s, when the first multidrug-resistant
strains appeared. The first cotrimoxazole-resistant Sd1 strains were isolated on the Indian
subcontinent in the early 1980s82. The dhfrI (or dfrA1) gene was found in all the isolates,
either on a 20-MDa plasmid for the ampicillin-resistant isolates, or chromosomally
encoded for the ampicillin-susceptible isolates. Haider et al.82 suggested that this dhfrI
gene might transpose between the 20-MDa plasmid and the chromosome.
Our first cotrimoxazole-resistant Sd1 isolate was obtained in India in 1978. It contains
the dfrA1 gene on a Tn7-like transposon integrated into the chromosome. The class 2
integron did not contain the aadA1 gene (encoding resistance to streptomycin and
spectinomycin) that is present in the classical Tn783. The
intI2-dfrA1-sat2-orfX-ybfA-ybfB-ybgA-tnsE-tnsD-tnsC-tnsB-tnsA genes were found to be present and the transposon was named
Tn7::In2-9, in accordance with the nomenclature of ref. 83. Tn7::In2-9 was also found in
114 other Sd1 isolates, all from lineage IV. A classical Tn7 was found in three Egyptian
isolates from 1999. A 30-kb IncX4 plasmid containing dfrA1 was also found in 14 Sd1
isolates (one from lineage II and 13 from lineage IV), none of which contained the
chromosomal Tn7 or Tn7::In2-9. However, the plasmid dfrA1 gene was in a class 2
integron (In2-9) with no trace of the Tn7 transposition module, and could not, therefore,
have transposed to the chromosome as suggested by Haider et al.82. In Africa, our first
cotrimoxazole-resistant isolate was obtained in the DRC in 1983. However, resistance to this
antibiotic had already been observed in 1981, less than two years after the start of the so-called
“Zairian” outbreak caused by a strain initially resistant to ampicillin, chloramphenicol,
tetracycline, streptomycin and sulfonamides84. The initial multidrug resistance was due
to a 50-kb IncX1 plasmid encoding resistance to ampicillin, chloramphenicol, and
tetracycline (pA5468) and a 6-kb plasmid encoding resistance to streptomycin and
sulfonamides (pETEC6). Resistance to cotrimoxazole was conferred by a dfrA1 gene
encoded on a 110-kb IncI1 pST186 plasmid (pBU53M1).
In our collection, the two oldest nalidixic acid-resistant Sd1 isolates were obtained in the
DRC and Bangladesh, both in 1985. However, the first reported isolation of Sd1 isolates
with this pattern of resistance was in April 1982 in the DRC84. This isolation occurred
less than one year after the introduction of nalidixic acid as first-line therapy during the
“Zairian” outbreak, in which isolates rapidly became resistant to cotrimoxazole.
The next step in the development of a multidrug resistance profile was the acquisition of
resistance to ciprofloxacin, a fluoroquinolone, mediated by a double mutation in gyrA
(S83L and a second mutation in codon 87) and a mutation in the topoisomerase IV parC
gene (S80I). In our dataset, resistance to ciprofloxacin was acquired only once, in a
group of 20 isolates from the Indian subcontinent collected between 1995 and 2010
(MIC ciprofloxacin 4-12 mg/L). This is consistent with published reports of an
emergence of ciprofloxacin-resistant Sd1 in West Bengal in 2002, after a hiatus of 14
years in which Sd1 was not isolated85,86. A PFGE approach showed that the
ciprofloxacin-resistant Sd1 isolates were clonal, a finding subsequently confirmed by
whole-genome sequencing on a larger sample. However, we identified an internal branch
corresponding to seven isolates with a mutation of codon 87 (D87G) other than the
predominant mutation (D87N). Six of these seven isolates were collected in Bengal in
2002, during the Diamond Harbor and Siliguri outbreaks86. Similarly, two different
mutations in codon 87 of gyrA were previously identified in genetically related
ciprofloxacin-resistant enteric bacterial pathogens of the S. enterica serotype Kentucky
ST198-X1, a bacterium subject to high levels of fluoroquinolone selection pressure in the
poultry industry87.
Supplementary References
64 Kruse, W. Ueber die Ruhr als Volkskrankheit und ihren Erreger. Deutsche Med. Woch. 26, 637-639 (1900).
65 Hoge, C. W., Bodhidatta, L., Tungtaem, C. & Echeverria, P. Emergence of nalidixic acid-resistant Shigella dysenteriae type 1 in Thailand: an outbreak associated with consumption of a coconut milk dessert. Int. J. Epidemiol. 24, 1228-1232 (1995).
66 Krasheninnikov, O. A. [Features of the geographic distribution of Shigellae. I. Changes in the etiologic structure of dysentery in Russia and the USSR (1900-1950)]. Russian. Zh. Mikrobiol. Epidemiol. Immunobiol. 45, 21-31 (1968).
67 Solodovnikov, Iu. P. et al. [The epidemiological characteristics of the spread of Grigor'ev-Shiga dysentery in the territories of the former USSR in recent years]. Russian. Zh. Mikrobiol. Epidemiol. Immunobiol. 1, 31-36 (1994).
68 Holt, K. E. et al. High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi. Nat. Genet. 40, 987-993 (2008).
69 Cunin, P. et al. An epidemic of bloody diarrhea: Escherichia coli O157 emerging in Cameroon? Emerg. Infect. Dis. 5, 285-290 (1999).
70 Vedder, E. B. & Duval, C. W. The etiology of acute dysentery in the United States. J. Exp. Med. 6, 181-205 (1902).
71 Hiss, P. H. On fermentative and agglutinative characters of bacilli of the "Dysentery Group". J. Med. Res. 13, 1-51 (1904).
72 Block, N. B. & Ferguson, W. An outbreak of Shiga dysentery in Michigan, 1938. Am. J. Public Health Nations Health 30, 43-52 (1940).
73 Greco, K. M., McDonough, M. A. & Butterton, J. R. Variation in the Shiga toxin region of 20th-century epidemic and endemic Shigella dysenteriae 1 strains. J. Infect. Dis. 190, 330-334 (2004).
74 Le Minor, L. & Richard, C. Méthodes de laboratoire pour l'identification des entérobactéries. Paris, France: Institut Pasteur; 1993. pp. 72-78.
75 Touchon, M. & Rocha, E. P. The small, slow and specialized CRISPR and anti-CRISPR of Escherichia and Salmonella. PLoS One 5, e11126 (2010).
76 Haider, K., Kay, B. A., Talukder, K. A. & Huq, M. I. Plasmid analysis of Shigella dysenteriae type 1 isolates obtained from widely scattered geographical locations. J. Clin. Microbiol. 26, 2083-2086 (1988).
77 Talukder, K. A., Dutta, D. K. & Albert, M. J. Evaluation of pulsed-field gel electrophoresis for typing of Shigella dysenteriae type 1. J. Med. Microbiol. 48, 781-784 (1999).
78 Cavallo, J. D., Niel, L., Talarmin, A. & Dubrous, P. [Antibiotic sensitivity to epidemic strains of Vibrio cholerae and Shigella dysenteriae 1 isolated in Rwandan refugee camps in Zaire]. French. Med. Trop. 55, 351-353 (1995).
79 Maurelli, A. T., Fernández, R. E., Bloch, C. A., Rode, C. K. & Fasano, A. "Black holes" and bacterial pathogenicity: a large genomic deletion that enhances the virulence of Shigella spp. and enteroinvasive Escherichia coli. Proc. Natl Acad. Sci. USA 95, 3943-3948 (1998).
80 Casalino, M., Latella, M. C., Prosseda, G. & Colonna, B. CadC is the preferential target of a convergent evolution driving enteroinvasive Escherichia coli toward a lysine decarboxylase-defective phenotype. Infect. Immun. 71, 5472-5479 (2003).
81 Day, W. A. Jr, Fernández, R. E. & Maurelli, A. T. Pathoadaptive mutations that enhance virulence: genetic organization of the cadA regions of Shigella spp. Infect. Immun. 69, 7471-7480 (2001).
82 Haider, K. et al. Trimethoprim resistance gene in Shigella dysenteriae 1 isolates obtained from widely scattered locations of Asia. Epidemiol. Infect. 104, 219-228 (1990).
83 Ramírez, M. S., Piñeiro, S., Argentinian Integron Study Group & Centrón, D. Novel insights about class 2 integrons from experimental and genomic epidemiology. Antimicrob. Agents Chemother. 54, 699-706 (2010).
84 Rogerie, F. et al. Comparison of norfloxacin and nalidixic acid for treatment of dysentery caused by Shigella dysenteriae type 1 in adults. Antimicrob. Agents Chemother. 29, 883-886 (1986).
85 Dutta, S. et al. Shigella dysenteriae serotype 1, Kolkata, India. Emerg. Infect. Dis. 9, 1471-1474 (2003).
86 Pazhani, G. P. et al. Clonal multidrug-resistant Shigella dysenteriae type 1 strains associated with epidemic and sporadic dysenteries in eastern India. Antimicrob. Agents Chemother. 48, 681-684 (2004).
87 Le Hello, S. et al. International spread of an epidemic population of Salmonella enterica serotype Kentucky ST198 resistant to ciprofloxacin. J. Infect. Dis. 204, 675-684 (2011).