Upload
hoangdang
View
236
Download
0
Embed Size (px)
Citation preview
A non-canonical mismatch repair pathway in prokaryotes
Article (Accepted Version)
http://sro.sussex.ac.uk
Castañeda-García, A, Prieto, A I, Rodríguez-Beltrán, J, Alonso, N, Cantillon, D, Costas, C, Pérez-Lago, L, Zegeye, E D, Herranz, M, Plociński, P, Tonjum, T, García de Viedma, D, Paget, M, Waddell, S J, Rojas, A M et al. (2017) A non-canonical mismatch repair pathway in prokaryotes. Nature Communications, 8. a14246. ISSN 2041-1723
This version is available from Sussex Research Online: http://sro.sussex.ac.uk/66997/
This document is made available in accordance with publisher policies and may differ from the published version or from the version of record. If you wish to cite this item you are advised to consult the publisher’s version. Please see the URL above for details on accessing the published version.
Copyright and reuse: Sussex Research Online is a digital repository of the research output of the University.
Copyright and all moral rights to the version of the paper presented here belong to the individual author(s) and/or other copyright owners. To the extent reasonable and practicable, the material made available in SRO has been checked for eligibility before being made available.
Copies of full text items generally can be reproduced, displayed or performed and given to third parties in any format or medium for personal research or study, educational, or not-for-profit purposes without prior permission or charge, provided that the authors, title and full bibliographic details are credited, a hyperlink and/or URL is given for the original metadata page and the content is not changed in any way.
1
A non-canonical mismatch repair pathway in prokaryotes
A. Castañeda-García1,2
, A. I. Prieto1, J. Rodríguez-Beltrán
1, N. Alonso
3, D. Cantillon
4, C.
Costas1, L. Pérez
5, E. D. Zegeye
6, M. Herranz
5, P. Plociński
2,#, T. Tonjum
6, D. García de
Viedma5, M. Paget
7, S.J. Waddell
4, A. M. Rojas
8*, A. J. Doherty
2* and J. Blázquez
1,3,9*
1 Instituto de Biomedicina de Sevilla (IBIS)-CSIC. Stress and bacterial evolution .Sevilla,
Spain.
2 Genome Damage and Stability Centre, School of Life Sciences, University of Sussex,
Brighton, BN1 9RQ, UK. 3 Centro Nacional de Biotecnología-CSIC. Madrid, Spain.
4 Brighton and Sussex Medical School, University of Sussex, Brighton, BN1 9PX, UK.
5 Servicio de Microbiología Clínica y Enfermedades Infecciosas, Hospital Gregorio Marañón,
Madrid, Spain. 6 Department of Microbiology, Oslo University Hospital, Rikshospitalet, Oslo, Norway and
Department of Microbiology, University of Oslo, Oslo, Norway. 7School of Life Sciences, University of Sussex, Brighton BN1 9QG, UK
8 Computational Biology and Bioinformatics, Instituto de Biomedicina de Sevilla (IBIS)-
CSIC. 9 Unit of Infectious Diseases, Microbiology, and Preventive Medicine. University Hospital
Virgen del Rocio, Sevilla, Spain. # Present address: Institute of Biochemistry and Biophysics, Polish Academy of Sciences,
Warsaw, Poland.
*Correspondence to: Jesús Blázquez ([email protected]), Aidan J. Doherty
([email protected]) and Ana M. Rojas for computational Biology analyses
2
Abstract
Mismatch Repair (MMR) is a near ubiquitous pathway, essential for the
maintenance of genome stability. Members of the MutS and MutL protein families perform
key steps in mismatch correction. Despite the major importance of this repair pathway, MutS-
MutL are absent in almost all Actinobacteria and many Archaea. However, these organisms
exhibit rates and spectra of spontaneous mutations similar to MMR-bearing species,
suggesting the existence of an alternative to the canonical MutS-MutL-based MMR. We
report that Mycobacterium smegmatis NucS/EndoMS, a putative endonuclease with no
structural homology to known MMR factors, is required for mutation avoidance and anti-
recombination, hallmarks of the canonical MMR. Furthermore, phenotypic analysis of
naturally occurring polymorphic NucS in a M. smegmatis surrogate model, suggests the
existence of M. tuberculosis mutator strains. The phylogenetic analysis of NucS indicates a
complex evolutionary process leading to a disperse distribution pattern in prokaryotes.
Together, these findings indicate that distinct pathways for MMR have evolved at least twice
in nature.
3
Cells ensure the maintenance of genome stability and a low mutation rate using a
plethora of DNA surveillance and correction processes. These include base selection,
proofreading, mismatch repair (MMR), base/nucleotide excision repair, recombination repair
and non-homologous end-joining 1. The recognized MMR pathway is highly conserved
among the three domains of life 2. In all cases, members of the MutS and MutL protein
families perform key steps in mismatch correction. This MutS-MutL-based MMR system
(canonical MMR) is a sophisticated DNA repair pathway that detects and removes incorrect
mismatched nucleobases. Mismatched base pairs in DNA arise mainly as a result of DNA
replication errors due to the incorporation of wrong nucleobases by DNA polymerases. To
correct them, the system detects incorrect although chemically normal bases that are
mismatched with the complementary strand, discriminating between the parental template and
the newly synthesized strand 3.
Loss of this activity has very important consequences, such as high rates of mutation
(hypermutability) and increased recombination between non perfectly identical
(homeologous) DNA sequences 4, both hallmarks of MMR inactivation. Nevertheless, despite
the quasi-ubiquitous nature of this pathway, there are some important exceptions. The
genomes of many Archaea, including Crenarchaeota and a few groups of Euryarchaeota, and
almost all members of the bacterial phylum Actinobacteria, including Mycobacterium, have
been shown to possess no identifiable MutS or MutL homologs 5-7
. However, these
prokaryotes exhibit rates and spectra of spontaneous mutations similar to canonical MMR-
bearing bacterial species 8-10
, suggesting the existence of unidentified mechanisms responsible
for mismatch repair.
To identify novel mutation avoidance genes in mycobacteria, we performed a genetic
screen and discovered that inactivation of the MSMEG_4923 gene in Mycobacterium
smegmatis, encoding a homologue of an archaeal endonuclease dubbed NucS (for nuclease
specific for ssDNA) 11
, produced a hypermutable phenotype. NucS from Pyrococcus abyssi
was initially identified as a member of a new family of novel structure-specific DNA
endonucleases in Archaea 11
. Notably, a recent report revealed that the protein NucS from the
archaeal species Thermococcus kodakarensis (renamed as EndoMS) is a mismatch-specific
endonuclease, acting specifically on dsDNA substrates containing mismatched bases 12
. The
possibility that a non-canonical mismatch repair could be triggered by a specific double-
strand break (DSB) endonuclease, able to act at the site of a mispair, followed by DSB repair
was hypothesized 30 years ago 13
. The archaeal NucS binds and cleaves both strands at
dsDNA mismatched substrates, with the mispaired bases in the central position, leaving 5-
4
nucleotide long 5´-cohesive ends 12
. Consequently, it has been suggested that these specific
double-strand breaks may promote the repair of mismatches, acting in a novel MMR process
12. Moreover, very recently, the structure of the T. kodakarensis NucS-mismatched dsDNA
complex has recently been determined, strongly supporting the idea that NucS acts in a
mismatch repair pathway 14
. Although these biochemical and structural studies have shed
some light on the role of NucS at the molecular level, the cellular and biological function of
NucS and its impact on genome stability remains unknown.
Hypermutable bacterial pathogens, very often associated with defects in MMR
components, are frequently isolated and pose a serious risk in many clinical infections 15-21
.
The major pathogen Mycobacterium tuberculosis appears to be genetically isolated, acquires
antibiotic resistance exclusively through chromosomal mutations 22 and presents variability in
mutation rates between strains 23
. These factors should contribute to the selection of
hypermutable M. tuberculosis strains under antibiotic pressure, as it occurs with other chronic
pathogens 18,23
. However, hypermutable strains have not yet been detected in this pathogen.
To investigate the possible existence of M. tuberculosis hypermutable strains affected in NucS
activity, we analysed the effect of naturally occurring polymorphisms in nucS from M.
tuberculosis clinical isolates on mutation rates, using M. smegmatis as a surrogate model.
In this study, we first establish that NucS is essential for maintaining DNA stability,
by preventing the acquisition of DNA mutations in Mycobacterium and Streptomyces.
Inactivation of nucS in M. smegmatis produces specific phenotypes that mimic those of
canonical MMR-null mutants (increased mutation rate, biased mutational spectrum and
increased homeologous recombination). Finally, we conduct here distribution and
phylogenetic analyses of NucS across the prokaryotic domains to understand its evolutionary
origins.
5
Results
Identification and characterization of M. smegmatis nucS
To identify novel mutation avoidance genes in mycobacteria, a M. smegmatis mc2 155
library of ~11,000 independent transposon insertion mutants was generated and screened for
spontaneous mutations that confer rifampicin resistance (Rif-R), used as a hypermutator
hallmark (Fig. 1A). One transposon insertion, which inactivated the MSMEG_4923 (nucS)
gene, conferred a strong hypermutable phenotype. The NucS/EndoMS translated protein
sequence is 27% identical to NucS from the hyperthermophilic archaeal Pyrococcus abyssi
and 87% to that of M. tuberculosis (Fig. 1B). P. abyssi NucS was defined as the first member
of a new family of structure-specific endonucleases with a mismatch-specific endonuclease
activity 11,12
containing an N-terminal DNA-binding domain and a C-terminal RecB-like
nuclease domain 11
. All the catalytic residues required for nuclease activity in P. abyssi are
conserved in the mycobacterial NucS (see Fig. 1B).
Recombinant M. smegmatis NucS was expressed and purified. The apparent
molecular mass of the purified native protein fits with the expected size (25 kDa)
(Supplementary Fig. 1A). The biochemical activity was analysed by DNA electrophoretic
mobility shift assays (EMSAs) and nuclease enzymatic assays. M. smegmatis NucS was
capable of binding to single-stranded DNA (ssDNA), as seen in the archaeal P. abyssi NucS
11, but not to double-stranded DNA (dsDNA) (Fig. 1C). Regarding cleavage of mismatched
substrates, no significant specific cleavage activity was observed on different substrates
containing a single nucleotide mismatch in the central region of the strand (Supplementary
Fig. 1B), in contrast with previous results obtained with the archaeal T. kodakarensis NucS 12
.
The binding to ssDNA indicates that the protein is correctly folded, as it can bind to its
substrate. This suggests that different requirements, such as the binding of additional partners,
protein modifications or different DNA substrates, may be required to activate the mismatch-
specific cleavage activity of the bacterial NucS.
NucS is essential to maintain low levels of spontaneous mutation
To verify that NucS has a key role in maintaining DNA fidelity, we constructed an in-
frame deletion of nucS (ΔnucS, nucS-null derivative) in M. smegmatis and measured the rate
at which drug resistance was acquired. Notably, the nucS-null strain displayed a hypermutable
phenotype, increasing the rate of spontaneously emerging rifampicin and streptomycin (Str-R)
resistances by a factor of 150-fold and 86-fold, respectively, above the wild-type strain
6
(3.1x10-7
vs 2.1x10-9
and 4.8x10-8
vs 5.6x10-10
, ΔnucS vs wild-type, respectively). These
results are equivalent to those observed for mutS- or mutL-deficient E. coli (102-10
3-fold
increases) 24
. Significantly, basal mutation rates were recovered when the nucS deletion was
chromosomally complemented with the wild-type gene MSMEG_4923 (nucSSm) (Fig. 2A and
Supplementary Table 1), confirming that inactivation of this gene was responsible for the high
mutation rates observed.
Notably, M. smegmatis mc2
155 and its ΔnucS derivative produced a different
mutational signature. While spontaneous Rif-R mutations detected in the wild-type strain
comprise different base substitutions, including transitions, but also transversions and even an
in-frame deletion, all mutations detected in the ΔnucS strain were specifically transitions
(A:T→G:C or G:C→A:T) (Fig. 2B and Supplementary Tables 2A-B). Transitions occur much
more frequently than transversions or indels in any cell, due to spontaneous or induced errors
in the DNA, and are the preferred substrates for MMR mechanism. The ΔnucS strain
accumulates a high number of uncorrected transitions, masking transversions and indels to
undetectable levels under our experimental conditions. This transition-biased mutational
spectrum is a hallmark signature of canonical MMR pathway 25,26
.
To demonstrate the generality of the mutation-avoidance activity in species encoding
NucS, the nucS gene was deleted in Streptomyces coelicolor A3(2), a different actinobacterial
species. The precise deletion of nucS in this species increased the rate of spontaneous
mutations conferring Rif-R and Str-R by a factor of 108-fold and 197-fold, respectively. As
with M. smegmatis, low mutation rates were recovered by complementation with the wild-
type S. coelicolor nucS gene (Fig. 2C and Supplementary Table 3). Together, these results
highlight the essential role of NucS in mutation avoidance.
NucS inhibits homeologous but not homologous recombination
These results prompted us to investigate whether NucS is involved in reduction of
recombination between non-identical (homeologous) DNA sequences, but not between 100%
identical, as described for the canonical MMR-null mutants in other bacterial species 4,27. The
rates of recombination between homologous and homeologous sequences were measured
using specific engineered tools (Fig. 3A and Supplementary Fig. 2A-B). Recombination rates
between 100% identical sequences were similar in the wild-type and its ΔnucS derivative
(2.42x10-6
and 2.86x10-6
, respectively). However, when the identity of DNA sequences
decreased, the recombination rate was comparatively higher in the nucS–null derivative: 95%
7
4.68x10-7
vs 1.41x10-6
(3-fold difference), 90% 1.71x10-8
vs 1.82x10-7
(10-fold) and 85%
6.47x10-9
vs 3.36x10-8
(5-fold) for wild-type versus ΔnucS, respectively (Fig. 3B). We
verified that recombinants detected were exclusively due to recombination events but not
spontaneous mutation (see Methods). The inhibitory effect of NucS on homeologous but not
homologous recombination again resembles an important tell-tale signature of a canonical
MMR pathway 28,29
.
NucS polymorphisms in M. tuberculosis clinical strains
Once the possibility of hypermutability was demonstrated in M. smegmatis, we
searched for the existence of M. tuberculosis hypermutable clinical isolates by nucS
inactivation. Analysis of the nucSTB sequences from ∼1,600 clinical M. tuberculosis strains
available in the databases revealed a total of 9 missense SNPs (Supplementary Table 4 and
Supplementary Fig. 3). The effects of these polymorphisms on NucS activity were
experimentally analysed by checking their impact on mutation rates, using a M. smegmatis
heterologous system, as previously reported for other mutagenesis studies 30
. Ten nucSTB
alleles (9 polymorphic plus the wild-type) were integrated into the chromosome of M.
smegmatis ΔnucS. Complementation with wild-type nucSTB restored low mutation rates to M.
smegmatis ΔnucS. However, five alleles increased mutation rate significantly (Fig. 4,
Supplementary Table 5), with allele S39R presenting the strongest observed mutator
phenotype (83-fold increase). Alleles A135S, R144S, T168A and K184E produced increases
in mutation rates close to one order of magnitude, whereas alleles S54I, A67S, V69A, and
D162H produced low increases (∼2) or no changes. These results suggest the existence of
hypermutable M. tuberculosis clinical strains affected by modulation of NucS activity.
NucS taxonomic distribution
Taken together, our results compellingly suggest a functional connection between
NucS and known MMR pathways. We analysed the species distribution of NucS taking also
into account the canonical MMR proteins, MutS and MutL (for MutS and MutL analyses see
Supplementary Methods). A total of 3,942 reference proteomes were scanned for
presence/absence of NucS (Supplementary Data 1 and Supplementary Fig. 4). Bacteria are
represented by 2,709 proteomes (68%) and Archaea by 132 (4%), the remaining 1,101 (28%)
belonging to Eukaryota and Virus. NucS is present in 370 organisms, from the domains
8
Archaea (60 species) and Bacteria (310 species) (Supplementary Data 1). It is totally absent
from eukaryotes and virus.
To highlight the distribution of NucS in all organisms, we built a phylogenetic profile
of NucS. Figure 5 shows this profile mapped onto the NCBI taxonomy tree, containing 2,186
bacterial and archaeal species (see also Supplementary Data 2). The NucS distribution pattern
indicates that this protein exhibits a disperse distribution (Fig. 5 and Supplementary Data 1).
In Bacteria, the phylum Actinobacteria is the one containing the majority of NucS with
300 species in the class Actinobacteria (only 2 exceptions) and 3 species in other classes of
the phylum (Supplementary Data 1), Conexibacter woesei (class Thermoleophilia),
Patulibacter medicamentivorans (class Thermoleophilia) and Ilumatobacter coccineus (class
Acidimicrobiia). All the analysed members of the class Coriobacteriia, from the
Actinobacteria phylum, lack NucS and MutS-MutL proteins, while the two species from the
class Rubrobacteria lack NucS, but present MutS-MutL. The pattern is even more disperse in
Archaea. From 132 species, 60 have NucS (21 also contain MutS-MutL). NucS, but not
MutS-MutL, is present in 17 out of 24 species of the phylum Crenarchaeota and in 18 out of
88 of the phylum Euryarchaeota. By contrast, MutS-MutL (but not NucS) is completely
absent in crenarchaeotal species while it is restricted to 29 euryarchaeotal species
(Supplementary Data 1).
Interestingly, only 28 organisms, 21 halobacterial species (domain Archaea) and 7
species of the phylum Deinococcus-Thermus (domain Bacteria), have both NucS and MutS-
MutL sequences. Notably, when we focused our analysis in the two main NucS-containing
groups (Actinobacteria and Archaea), some species still lack both NucS and MutS-MutL
(confirmed by protein translated searches -tblastn- in Actinobacteria and Archaea,
Supplementary Data 1).
A model for the origin and evolution of NucS
Our results support a complex evolutionary ancestry for NucS. Comprehensive protein
sequence analyses of NucS indicate that the protein contains two distinct regions, a DNA-
binding N-terminal domain (NucS-NT) and an endonuclease C-terminal domain (NucS-CT)
(Supplementary Figs. 3-4), in agreement with previous structural studies 11
. At the sequence
level, using profile hidden Markov models (HMMs), we also detected these regions in
alternative proteins outside the context of NucS, supporting the original independence of these
domains, which have been subsequently fused during evolution.
9
To understand the particular distribution observed in the phylogenetic profile, we
conducted sequence and phylogenetic independent analyses of full NucS and also the N-
terminal and C-terminal regions (Supplementary Fig. 5-7). The actinobacterial NucS
representatives are well separated from those of Archaea. On the other hand, the NucS from
Deinococcus-Thermus species group together with the archaeal proteins, instead of the
actinobacterial orthologues, suggesting that NucS has recently been transferred from Archaea
to these Deinococcus-Thermus species (Supplementary Figs. 5-7).
The sequence analyses provide different distributions for the two regions of NucS.
While the N-terminal region is limited exclusively to Archaea and some Bacteria
(Supplementary Fig. 7), the C-terminal region was found in many archaeal species, some
Bacteria, and in a few eukaryotes (Supplementary Fig. 6) and, importantly, also in different
domain architectures. These observations, in the context of our phylogenetic analyses, suggest
that both regions may have emerged in Archaea. Subsequently, the C-terminal region may
have been transferred to some bacterial species and to a few eukaryotes, and got fixed in
different protein contexts. The full protein and the individual domains likely got lost in certain
groups during the evolution of Archaea. However, the reconstruction of the precise evolution
of these individual domains requires deeper analyses in the context of related protein
domains.
Phylogenetic analysis of the full protein suggests that NucS emerged after the archaeal
divergence by a rearrangement of these domains and then it was transferred from Archaea to
certain Deinococcus-Thermus species in at least one event. For Actinobacteria, the most
parsimonious explanation suggests a horizontal transfer event of NucS from Archaea coupled
with a likely MutS-MutL loss event in the last common ancestor of Actinobacteria (as most of
the contemporary NucS-encoding species lack MutS and MutL). This is consistent with the
fact that we did not find any species from the entire Bacteria group, except those from
Actinobacteria and the Deinococcus-Thermus group, having NucS or NucS-like proteins. Our
model (Fig. 6) favours the simplest scenario to explain the complex pattern distribution of the
full NucS protein and its absence in all eukaryotes, the vast majority of bacteria, and some
archaeal organisms without invoking unlikely massive losses. This is in agreement with our
phylogenetic observations.
10
Discussion
This study provides genetic and biological evidence that establish the existence of a
DNA repair system that mimics the canonical MutS-MutL-based MMR pathway.
To date, the correction of mismatched nucleobases has been defined as a highly
conserved mechanism whose key steps are performed, in all cases, by members of the MutS
and MutL protein families. Despite the genomes of Crenarchaeota, a few groups of
Euryarchaeota and almost all members of the phylum Actinobacteria lacking identifiable
MutS or MutL homologues 5-7
, they exhibit rates and spectra of spontaneous mutations similar
to canonical MMR-bearing bacterial species 8-10
, suggesting the existence of still undetected
pathway responsible for this type of correction. Although recent biochemical and structural
reports suggested the existence of a novel mismatch-specific endonuclease, NucS, in the
archaeal species T. kodakarensis 12,14
that is able to recognize and cleave mismatched bases in
double-strand DNA in vitro, no genetic and/or biological evidence had been reported to date
on the activity of a novel mismatch repair pathway.
The screening of a large library of M. smegmatis mutants revealed that NucS is a key
mutation avoidance component in Actinobacteria. Through genetic and biological analysis,
we demonstrate that the nucS-null phenotypes in M. smegmatis are almost identical to those
produced by the MMR deficiency in other bacteria (very high mutation rates, transition-biased
mutational spectrum and increased homeologous recombination rates). The anti-mutator
nature of NucS is also demonstrated in Streptomyces coelicolor, a different species of the
class Actinobacteria. Therefore, NucS appears to be an important DNA repair factor which,
together with the high-fidelity DNA-polymerase DnaE1 and its PHP domain-proofreader 30
,
maintain low mutation rates (~10-10
mutations per base per generation 9,10
), ensuring genome
stability and DNA fidelity in mycobacteria and in other Actinobacteria.
Although single nucleotide polymorphisms (SNPs) in DNA repair/replication genes
have been suggested as a source of hypermutation in M. tuberculosis 31,32
, only small
increases in mutation rates have been observed due to these polymorphic genes (e.g.
polymorphisms in PHP exonuclease domain of the DNA-polymerase DnaE1 30
). Our results
suggest the existence of naturally occurring hypermutable M. tuberculosis variants with
diminished NucS activity. Whether or not this is relevant for the virulence and adaptation to
antibiotic treatments remains to be deciphered.
Although we observed that purified mycobacterial NucS binds to single-stranded but
not to double-stranded DNA, as described for archaeal NucS proteins 11,12
, no significant
specific cleavage activity was observed on mismatched substrates, suggesting that activation
11
by other partners and/or modifications (e.g. post-translational modifications) is required in M.
smegmatis. Also, the possibility of a functional difference between bacterial and archaeal
NucS cannot be ruled out. Therefore, additional studies are needed to assess whether
mycobacterial and archaeal NucS proteins have the same functional requirements.
Our computational studies support an archaeal origin for NucS, as previously
suggested 12
, built upon two distinct domains that suffered a complex evolutionary history of
transfers and/or losses, including at least two HGT events to Actinobacteria and Deinococcus-
Thermus. Indeed, the contribution of horizontal gene transfer in bacterial and archaeal
evolution is well-established 33,34
. Interestingly, NucS and MutS-MutL systems seem to be
present alternatively in different species, with only a few exceptions. In Actinobacteria, it is
possible that the acquisition of NucS may have facilitated the subsequent loss of MutS-MutL,
as these canonical proteins are so widely conserved across Bacteria.
On the other hand, there are two groups (Halobacteria and some Deinococcus-
Thermus) where NucS and MutS-MutL coexist. Notably, in a Halobacteria, Halobacterium
salinarum, inactivation of mutS or mutL produced no hypermutability 35
, suggesting that MutS
and MutL are redundant to an alternative system that controls spontaneous mutation. This
indicates that evolution of these particular species may have opted to keep both systems. The
possible interplay between both pathways remains to be elucidated. Surprisingly, NucS and
MutS-MutL are apparently absent in some species. Therefore, the possible existence of
additional alternative MMR repair pathways, yet to be identified, cannot be discarded.
In conclusion, we propose that MMR is a mechanism that can be either accomplished
by either the MutS-L or the NucS pathway. Understanding the mechanisms and pathways that
influence genome stability may unveil new strategies to predict and combat the development
of drug resistance. In addition, engineered Mycobacterium strains lacking this mutation-
avoidance pathway, may be valuable tools for evaluating anti-tuberculosis treatments,
including new drugs, drug combinations and non-antibiotic based regimes.
12
Methods
Bacterial strains, media and growth conditions.
M. smegmatis wild-type strain mc2 155 and its mutant derivatives were grown at 37ºC
in Middlebrook 7H9 broth or Middlebrook 7H10 agar (Difco) with 0.5% glycerol and 0.05%
Tween 80, and enriched with 10% albumin-dextrose-catalase (Difco). Escherichia coli strains
were cultured at 37ºC in LB medium. Streptomyces coelicolor A3(2) M145 and its derivatives
were grown at 30ºC on mannitol–soya (MS) agar.
All primers used in this work are listed in Supplementary Table 6.
Generation and screening of a M. smegmatis insertion library
Transposon ΦMycoMarT7 36
was used to obtain a M. smegmatis mutant library of
11,000 independent clones. For transduction, M. smegmatis mc2 155 cultures were washed,
mixed with the phage stock at a multiplicity of infection of 1:10 (37ºC, 3 h) and plated on
kanamycin (25 µg ml-1
). The insertion mutants were isolated and inoculated into 96-well
microtiter plates containing Middlebrook 7H9 medium. To identify hypermutators, each
mutant was inoculated onto 7H10 agar plates with rifampicin (20 µg ml-1
). One transposon
mutant, producing a high number of rifampicin resistant (Rif-R) colonies, was selected for
further characterization. The transposon insertion site in the chromosome was determined by
sequencing.
Generation of a ΔnucS knockout mutant in M. smegmatis
M. smegmatis ΔnucS (MSMEG_4923, nucSSm) in-frame deletion mutant was generated
by allelic replacement 37
. Briefly, p2NIL-ΔnucSSm, harbouring an in-frame deletion of the
target gene, was electroporated into M. smegmatis mc2 155. Single-crossover merodiploid
clones were isolated and after that, counter-selected to generate the unmarked deletion by a
second crossover event. Finally, putative M. smegmatis ΔnucS colonies were checked by PCR
and sequencing. For additional details see Supplementary Methods.
Complementation of the M. smegmatis ΔnucS mutant
For complementation, a wild type of the MSMEG_4923 gene (nucSSm) and the wild-
type full-length MT_1321 gene from CDC1551 control strain (nucSTB), including their own
promoter regions, were cloned into the vector pMV361 38
and introduced into M. smegmatis
ΔnucS by electroporation. pMV361 is an integrative vector that carries a site-specific
13
integration system derived from the mycobacteriophage L5. It integrates at a single site in the
M. smegmatis chromosome, the attB, which overlaps the 3´ end of the tRNA-glycine gene 39
.
Integration at the appropriate site was verified for all constructs by PCR. For additional details
see Supplementary Methods.
Generation of ΔnucS S. coelicolor knockout mutant and complementation
S. coelicolor ΔnucS (gene SCO5388, nucSSco) in-frame deletion mutant was
constructed by double crossover recombination using the plasmid pIJ6650 40
. pIJ-ΔnucSSco
(containing the in-frame deletion of the nucSSco gene) was introduced in S. coelicolor A3(2)
M145 to generate the unmarked deletion of the nucSSco gene. Finally, S. coelicolor ΔnucS was
complemented with a wild-type copy of nucSSco inserted in the attB site of the S. coelicolor
ΔnucS via the integrative vector pSET152 (Supplementary Methods).
Estimation of mutation rates
Fluctuation analyses were used to experimentally address mutation rates. For each
experiment, 20 independent cultures of M. smegmatis mc2 155 and its derivatives were grown
and diluted in order to inoculate 1,000-10,000 cells per ml into fresh medium. All the cultures
were incubated until they reached stationary phase (about 109 cells per ml). Appropriate
dilutions were plated on Middlebrook 7H10 medium with or without rifampicin (100 µg ml-1
)
or streptomycin (50 µg ml-1
). For S. coelicolor, 20 independent concentrated spore
suspensions (containing ∼109 spores per ml) from each analysed strain were generated on MS
agar, and after that, plated on MS agar with or without rifampicin (100 µg ml-1
) or
streptomycin (50 µg ml-1
). At least 3 different experiments were performed for each
fluctuation analysis in both cases.
The expected number of mutations per culture (m) and 95% confidence intervals were
calculated using the maximum likelihood estimator applying the newton.LD.plating and
confint.LD.plating functions that account for differences in plating efficiency implemented in
the package rSalvador (http://eeeeeric.com/rSalvador/) (Qi Zheng. Rsalvador: an assay. R
package version 1.3) for R (www.R-project.org/). Mutation rates (mutations per cell per
generation) were then calculated by dividing m by the total number of generations, assumed
to be roughly equal to the average final number of cells. Statistical comparisons were carried
out by the use of the LRT.LD.plating function of the same package, which accounts for the
differences in final number of cells and plating efficiency. Finally, in case of multiple
comparisons p-values were corrected by the Bonferroni method.
14
Characterization of the mutational spectrum.
The mutational spectra conferred by M. smegmatis mc2 155 and its ΔnucS derivative were
characterised by selecting for resistance to rifampicin and sequencing the rifampicin
resistance-determining region (RRDR) in the rpoB gene, from independent Rif-R isolates.
Recombination assays
The recombination assay vectors (pRhomyco series) were generated by cloning two
overlapping fragments of the hygromycin-resistance gene (hyg), from pRAM 41
, flanking a
functional kanamycin-resistance (kan) gene in the pMV361 plasmid (Supplementary Fig. 2).
Neither of the fragments of hyg conferred resistance to hygromycin on their own. Fragments
share a duplicated 517 bp overlapping region, with different degrees of sequence identity
(100%, 95%, 90% or 85%). M. smegmatis mc2 155 and its ΔnucS derivative were transformed
by electroporation with the integrative plasmids pRhomyco 100%, 95%, 90% and 85%
designed for the recombination assays and plated on Middlebrook 7H10 containing
kanamycin (25 µg/µl). Integration of pRhomyco vectors in the appropriate site (attB) of the
M. smegmatis mc2 155 (wild-type and ΔnucS) chromosome was verified by PCR.
To measure recombination rates, we analysed the restoration of a functional hyg gene
that confers Hyg-R by a recombination event between the two overlapping fragments of the
hyg gene. Sixteen independent overnight cultures from each strain were grown in
Middlebrook 7H9 broth plus kanamycin (to prevent the selection of Hyg-R bacteria coming
from the early recombination of pRhomyco). Approximately, 104 bacteria were inoculated in
plain 7H9 broth (without kanamycin) to allow the recombination events and cultured 24h at
37ºC with shaking. Cultures were plated on 7H10 plus 50 µg ml-1
hygromycin (for
recombinant cells count) and in addition, appropriate dilutions were plated on plain 7H10 (for
viable cell counts) and incubated for 3-5 days. Total number of Hyg-R colonies was
considered to calculate recombination rates. Recombination rates were calculated and
analysed as described in estimation of mutation rates.
To validate the recombination assay, we first verified that no Hyg-R colonies were
generated by spontaneous mutation, even in the ΔnucS derivative. The mutation rate was
≤1x10-10
for the hypermutable derivative, far below the lowest recombination value (6x10-9
for WT, 15% non-identical sequences). These results indicate that the Hyg-R colonies from
our recombination assay were not generated by spontaneous mutations. Moreover, we verified
that recombinant clones express hygromycin resistance and kanamycin susceptibility by
picking 20-30 Hyg-R colonies from each strain onto kanamycin plates. In all cases they were
15
Hyg-R and Kan-S. Finally, we randomly isolated 10 independent Hyg-R presumptive
recombinant colonies from each construct (80 colonies total) and verified by PCR that a
single fragment of the size of the reconstituted hyg gene was amplified in all cases.
NucS purification and biochemical activity.
M. smegmatis nucS was cloned as a SUMO fusion and expressed in E. coli BL21 cells
overnight at 16ºC. NucS protein was purified by affinity chromatography using Ni2+
–NTA
agarose column followed by an additional step of anion exchange purification in HiTrap Q
HP column. NucS-SUMO fusion was cleaved by Ulp protease to release a native protein.
Purified protein was collected and concentrated to perform the activity assays (Supplementary
Fig. 1A). In addition, His-tagged M. smegmatis nucS was also cloned in pET28b (for E. coli
expression) and pYUB28b (for M. smegmatis expression). His-tagged NucS was also purified
by affinity chromatography and anion exchange purification as described before.
For DNA EMSA assays, NucS purified protein (1 µM to 16 µM) was incubated with
fluorescein-labelled single-stranded (45-mer DNA, 50 nM) or double-stranded DNA (45-bp
DNA, 50 nM), in 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 10% glycerol and 10 µg ml-1
BSA
for 30 minutes at 37ºC. Protein-DNA complexes were resolved on 5% polyacrylamide gel in
0.25xTBE plus 1% glycerol and run under refrigerated conditions (5ºC-10ºC).
For nuclease activity, fluorescein-labelled 36-mer DNA was annealed to a fully
complementary opposite strand (control) or carrying a single nucleotide mismatch in the
central region of the strand (mismatch substrates). 30 nM dsDNA substrates were incubated
with 300 nM of recombinant, tag free NucS in 20 mM Tris-HCl pH 7.5, 6 mM ammonium
sulfate, 2 mM MgCl2 and 100 mM NaCl, 0.1 mg ml-1
BSA and 0.1% TritonX-100. The
reactions were carried out at 37°C for 30 minutes when 0.5 U Proteinase K was added to each
reaction to digest NucS nuclease. Digestion was continued for 30 minutes at 50°C. Stop
solution was added (95% formamide, 0.09% xylene cyanol), samples were boiled at 95°C for
10 minutes and resolved on 7M urea, 15% polyacrylamide gel in 1xTBE for 2 hours. Mg, Mn
and Zn ions, and combinations of these three metals, were tested as cofactors with no
cleavage observed.
Analysis of nucS SNPs in M. tuberculosis clinical strains
Sequences from the MT_1321 gene product were downloaded from the Ensembl
database (http://bacteria.ensembl.org/index.html), M. tuberculosis variome resource 42
and
16
Comas et al. data 43
. 1,600 proteins were aligned using MAFFT and the polymorphisms were
visualized with Jalview.
Construction of nucS alleles by site-directed mutagenesis.
To construct nucS alleles, wild type nucSTB gene was PCR amplified and cloned into
pUC19 to be used as template for mutagenic PCRs. Single specific mutations were introduced
into nucSTB by PCR with the suitable mutagenic pairs of primers. Following the PCR reaction,
a DpnI digestion was carried out for 4 hours at 37ºC. Mutagenized plasmids obtained after
transformation into E. coli DH5α were verified by DNA sequencing. All the nucS alleles were
re-cloned into the integrative vector pMV361 to generate a set of complementation plasmids
and introduced by transformation into M. smegmatis mc2 155 ΔnucS. These plasmids integrate
at a single site, attB, in the M. smegmatis chromosome.
In order to test the efficiency of each nucS allele to restore normal mutation rates, the
plasmids pMV361 carrying the mutated alleles were integrated in M. smegmatis ΔnucS
chromosome. Integration of each nucSTB allele in the appropriate site (attB) of the
chromosome was verified by PCR.
Computational analyses of the NucS protein.
A summary of all the computational approaches conducted is depicted in
Supplementary Fig. 4.
To establish how general the presence of NucS is in the domains on life, we conducted
sequence analyses to initially identify NucS proteins (Supplementary Fig. 4).
NUCS_MYCTU (M. tuberculosis NucS) and NUCS_PYRAB (P. abyssi NucS) sequences
were used to find homologues. Each full sequence was used as an independent query in
pHMMER searches (HMMER3; http://hmmer.org/) against the large database of the
concatenated reference proteomes (2016_02 release of 17th February 2016). We next
collected sequences excluding the original sequences used to conduct the queries. The
remaining sequences were filtered by e-value (removed >0.0001), bit score >50, and length
(only those >75% length were kept), and further aligned by MAFFT
[http://mafft.cbrc.jp/alignment/software/] 44
. The alignments were visualized with Belvu
[http://sonnhammer.sbc.su.se/Belvu.html] 45
to check for quality. We removed the redundancy
of the alignment discarding sequences with more than 63% of sequence identity. We next
generated Markov profiles trained with the non-redundant multiple sequence alignment (using
HHMER3). For each protein, we obtained about the same 400 sequences that would give also
17
partial hits to different species, which suggested a potential existence of different domains in
the protein. Further checking of the sequences retrieved the final 370 sequences. As NucS was
not identified in eukaryotes or viruses, they were discarded for further analyses.
We constructed a phylogenetic profile of NucS (Supplementary Data 1) and mapped it
into the NCBI taxonomic tree (Figure 5, the newick tree format is available in Supplementary
Data 2). The tree was annotated using ggtree (http://www.bioconductor.org/packages/ggtree).
Characterization of the two NucS regions
To identify potential domains in NucS, we focused on the archaeal P. abyssi NucS,
whose 3D structure was previously resolved (PDB 2VLD chain B) 11
. This structure is
composed by two distinct regions: the N-terminal DNA binding region (1-114 amino acids)
and the C-terminal catalytic region (126-233 amino acids) 11
. Given the absence of structural
data for the bacterial NucS protein, the M. tuberculosis NucS structure was modelled using
the archaeal P. abyssi NucS as a template using I-TASSER 16
and generated a reliable model
(Supplementary Fig. 3). Template and model were structurally aligned using the
Combinatorial Extension (CE) algorithm17
, built-in with PyMOL, to identify relevant
residues. We used the structure-based alignment between this template and M. tuberculosis
NucS (NucS_MYCTU) (Supplementary Fig. 3) to determine the different regions for further
sequence analyses (N-terminal and C-terminal regions). These analyses were conducted using
profile searches as above and confirmed by independent analyses of the PF01939 domain
trained with NucS (for details see Supplementary Methods and Supplementary Fig. 4).
We conducted sequence searches of NucS_MYCTU in the PFAM database, where
the PF01939 domain (DUF91, automatic) was identified. From the PF01939 domain (539
sequences in 464 species), only 368 entries in 363 species (archaeal and bacterial) gave a
match with the full protein (Supplementary data file 1). The results obtained from the PFAM
database searches for the PF01939 domain, are in agreement with the structure-based profiles
analyses (see above) confirming the existence of two distinct regions.
Taking into account the domain boundaries established above, we next aligned the
368 full NucS proteins (from PFAM PF01939) to profiles built from the structural alignment
corresponding to i) the whole protein ii) the N-terminal region (NT) and iii) the C-terminal
region (CT).
18
Sequence analysis of NucS domains
Using the different defined regions by structural bioinformatics, we first conducted
sequence searches against the large database. We made non-redundant alignments of the N-
terminal region (containing 63 sequences) and the C-terminal region (containing 39
sequences) and built profile hidden Markov models (HMMs). These profiles were searched
using pHMMER 46
against the large References Proteomes file.
Our analyses identified the C-terminal region in all the original 368 NucS sequences
(for details see Supplementary Methods and Supplementary Fig. 4). In addition, the C-
terminal was also found in proteins containing a variety of additional protein domains in other
prokaryotic species and in few additional eukaryotic sequences. We checked the alignments to
confirm that the catalytic residues were conserved, as described in the structure of P. abyssi
11. When a profile with the eukaryotic sequences (excluding any bacterial or archaeal
sequences) was generated, we retrieved again the C-terminal regions of NucS above threshold
at significant values, but we did not retrieve any other eukaryotic sequences at reliable values.
On the other hand, the N-terminal region was found mainly in bacterial and archaeal NucS,
but also in two archaeal proteins annotated as topoisomerases. A pHMMER search of the
regions of these sequences yielded the N-terminal of NucS.
Phylogenetic analyses
We conducted Maximum Likelihood (ML) phylogenetic analyses using RAxML 47
in
order to construct phylogenetic trees that could explain the phylogenetic profile
(Supplementary Fig. 4).
To run phylogenetic analyses, NucS sequences from the PF01939 domain were used,
but aligned to their corresponding structure-based regions. Sequences found in our searches
with independent regions were also included. Out of 539, 368 unique sequences aligned with
the full NucS structure-based profile (Supplementary Fig. 5, Supplementary Data 3), 422
sequences containing the C-terminal region (NucS-CT) (Supplementary Fig. 6), 370
sequences containing the N-terminal region (NucS-NT) (Supplementary Fig. 7) were run.
Sequences were aligned to the structure-based profiles, and the alignments were
subjected to ML search and 1,500 bootstrap replicates using RAXML 47
. All free model
parameters were estimated by RAxML where we used a GAMMA model of rate
heterogeneity and ML estimate of alpha-parameter. The most likely selected model was LG
48. For accuracy and focus on our alignment, we run the sequential version of RAxML
creating various sets of starting trees (10 by manual inferring different starting trees and 10 as
19
automatically inferred by the program). The best setting (the closest likelihood to zero) was
used for further calculations.
To run phylogenetic analyses, sequences from the PFAM PF01939 domain were used,
but aligned to their corresponding region. For details of the particular regions for each domain
see Supplementary Data 4. Sequences found in our searches with NT or CT regions outside
NucS were also included. Phylogenetic trees were visualized with iTOL (http://itol.embl.de)
49. The raw trees corresponding to both CT and NT regions (Supplementary Figs. 6 and 7) are
available as Supplementary Data 5 and 6 respectively.
Data Availability
The authors declare that all data supporting the findings of this study are available within the
paper and its supplementary information files.
20
References
1 Friedberg, E. et al. DNA Repair and Mutagenesis. Washington DC, USA: American
Society of Microbiology. 2nd Edition edn, (2006).
2 Eisen, J. A. & Hanawalt, P. C. A phylogenomic study of DNA repair genes, proteins,
and processes. Mutation research 435, 171‐213 (1999).
3 Iyer, R. R., Pluciennik, A., Burdett, V. & Modrich, P. L. DNA mismatch repair:
functions and mechanisms. Chem Rev 106, 302‐323, doi:10.1021/cr0404794
(2006).
4 Matic, I., Rayssiguier, C. & Radman, M. Interspecies gene exchange in bacteria: the
role of SOS and mismatch repair systems in evolution of species. Cell 80, 507‐515,
doi:0092‐8674(95)90501‐4 [pii] (1995).
5 Sachadyn, P. Conservation and diversity of MutS proteins. Mutation research 694,
20‐30, doi:10.1016/j.mrfmmm.2010.08.009 (2010).
6 Mizrahi, V. & Andersen, S. J. DNA repair in Mycobacterium tuberculosis. What
have we learnt from the genome sequence? Molecular microbiology 29, 1331‐
1339 (1998).
7 Banasik, M. & Sachadyn, P. Conserved motifs of MutL proteins. Mutation research
769, 69‐79, doi:10.1016/j.mrfmmm.2014.07.006 (2014).
8 Springer, B. et al. Lack of mismatch correction facilitates genome evolution in
mycobacteria. Molecular microbiology 53, 1601‐1609, doi:10.1111/j.1365‐
2958.2004.04231.x (2004).
9 Ford, C. B. et al. Use of whole genome sequencing to estimate the mutation rate of
Mycobacterium tuberculosis during latent infection. Nature genetics 43, 482‐486,
doi:10.1038/ng.811 (2011).
10 Kucukyildirim, S. et al. The Rate and Spectrum of Spontaneous Mutations in
Mycobacterium smegmatis, a Bacterium Naturally Devoid of the Post‐replicative
Mismatch Repair Pathway. G3 (Bethesda), doi:10.1534/g3.116.030130 (2016).
11 Ren, B. et al. Structure and function of a novel endonuclease acting on branched
DNA substrates. The EMBO journal 28, 2479‐2489, doi:emboj2009192
[pii]10.1038/emboj.2009.192 (2009).
12 Ishino, S. et al. Identification of a mismatch‐specific endonuclease in
hyperthermophilic Archaea. Nucleic acids research, doi:10.1093/nar/gkw153
(2016).
13 Wagner, R. et al. Involvement of Escherichia coli mismatch repair in DNA
replication and recombination. Cold Spring Harb Symp Quant Biol 49, 611‐615
(1984).
14 Nakae, S. et al. Structure of the EndoMS‐DNA Complex as Mismatch Restriction
Endonuclease. Structure 24, 1960‐1971, doi:10.1016/j.str.2016.09.005 (2016).
15 Gross, M. D. & Siegel, E. C. Incidence of mutator strains in Escherichia coli and
coliforms in nature. Mutation research 91, 107‐110 (1981).
16 LeClerc, J. E., Li, B., Payne, W. L. & Cebula, T. A. High mutation frequencies among
Escherichia coli and Salmonella pathogens. Science 274, 1208‐1211 (1996).
17 Matic, I. et al. Highly variable mutation rates in commensal and pathogenic
Escherichia coli. Science 277, 1833‐1834 (1997).
18 Oliver, A., Canton, R., Campo, P., Baquero, F. & Blazquez, J. High frequency of
hypermutable Pseudomonas aeruginosa in cystic fibrosis lung infection. Science
288, 1251‐1254, doi:8507 [pii] (2000).
21
19 Picard, B. et al. Mutator natural Escherichia coli isolates have an unusual
virulence phenotype. Infect Immun 69, 9‐14, doi:10.1128/IAI.69.1.9‐14.2001
(2001).
20 Giraud, A., Matic, I., Radman, M., Fons, M. & Taddei, F. Mutator bacteria as a risk
factor in treatment of infectious diseases. Antimicrobial agents and chemotherapy
46, 863‐865 (2002).
21 Macia, M. D. et al. Hypermutation is a key factor in development of multiple‐
antimicrobial resistance in Pseudomonas aeruginosa strains causing chronic lung
infections. Antimicrobial agents and chemotherapy 49, 3382‐3386,
doi:49/8/3382 [pii] 10.1128/AAC.49.8.3382‐3386.2005 (2005).
22 Muller, B., Borrell, S., Rose, G. & Gagneux, S. The heterogeneous evolution of
multidrug‐resistant Mycobacterium tuberculosis. Trends in genetics : TIG 29, 160‐
169, doi:10.1016/j.tig.2012.11.005 (2013).
23 Ford, C. B. et al. Mycobacterium tuberculosis mutation rate estimates from
different lineages predict substantial differences in the emergence of drug‐
resistant tuberculosis. Nature genetics 45, 784‐790, doi:10.1038/ng.2656 (2013).
24 Cox, E. C. Bacterial mutator genes and the control of spontaneous mutation. Annu
Rev Genet 10, 135‐156, doi:10.1146/annurev.ge.10.120176.001031 (1976).
25 Lee, H., Popodi, E., Tang, H. & Foster, P. L. Rate and molecular spectrum of
spontaneous mutations in the bacterium Escherichia coli as determined by
whole‐genome sequencing. Proceedings of the National Academy of Sciences of the
United States of America 109, E2774‐2783, doi:10.1073/pnas.1210309109
(2012).
26 Garibyan, L. et al. Use of the rpoB gene to determine the specificity of base
substitution mutations on the Escherichia coli chromosome. DNA Repair (Amst) 2,
593‐608 (2003).
27 Tham, K. C. et al. Mismatch repair inhibits homeologous recombination via
coordinated directional unwinding of trapped DNA structures. Mol Cell 51, 326‐
337, doi:10.1016/j.molcel.2013.07.008 (2013).
28 Feinstein, S. I. & Low, K. B. Hyper‐recombining recipient strains in bacterial
conjugation. Genetics 113, 13‐33 (1986).
29 Spies, M. & Fishel, R. Mismatch repair during homologous and homeologous
recombination. Cold Spring Harbor perspectives in biology 7, a022657,
doi:10.1101/cshperspect.a022657 (2015).
30 Rock, J. M. et al. DNA replication fidelity in Mycobacterium tuberculosis is
mediated by an ancestral prokaryotic proofreader. Nature genetics,
doi:10.1038/ng.3269 (2015).
31 Ebrahimi‐Rad, M. et al. Mutations in putative mutator genes of Mycobacterium
tuberculosis strains of the W‐Beijing family. Emerging infectious diseases 9, 838‐
845, doi:10.3201/eid0907.020589 (2003).
32 Dos Vultos, T., Blazquez, J., Rauzier, J., Matic, I. & Gicquel, B. Identification of Nudix
hydrolase family members with an antimutator role in Mycobacterium
tuberculosis and Mycobacterium smegmatis. Journal of bacteriology 188, 3159‐
3161, doi:188/8/3159 [pii] 10.1128/JB.188.8.3159‐3161.2006 (2006).
33 Dagan, T., Artzy‐Randrup, Y. & Martin, W. Modular networks and cumulative
impact of lateral transfer in prokaryote genome evolution. Proceedings of the
National Academy of Sciences of the United States of America 105, 10039‐10044,
doi:10.1073/pnas.0800679105 (2008).
22
34 Gophna, U., Charlebois, R. L. & Doolittle, W. F. Have archaeal genes contributed to
bacterial virulence? Trends in microbiology 12, 213‐219,
doi:10.1016/j.tim.2004.03.002 (2004).
35 Busch, C. R. & DiRuggiero, J. MutS and MutL are dispensable for maintenance of
the genomic mutation rate in the halophilic archaeon Halobacterium salinarum
NRC‐1. PloS one 5, e9045, doi:10.1371/journal.pone.0009045 (2010).
36 Sassetti, C. M., Boyd, D. H. & Rubin, E. J. Comprehensive identification of
conditionally essential genes in mycobacteria. Proceedings of the National
Academy of Sciences of the United States of America 98, 12712‐12717,
doi:10.1073/pnas.231275498 (2001).
37 Parish, T. & Stoker, N. G. Use of a flexible cassette method to generate a double
unmarked Mycobacterium tuberculosis tlyA plcABC mutant by gene replacement.
Microbiology 146 ( Pt 8), 1969‐1975, doi:10.1099/00221287‐146‐8‐1969
(2000).
38 Stover, C. K. et al. New use of BCG for recombinant vaccines. Nature 351, 456‐
460, doi:10.1038/351456a0 (1991).
39 Lee, M. H., Pascopella, L., Jacobs, W. R., Jr. & Hatfull, G. F. Site‐specific integration
of mycobacteriophage L5: integration‐proficient vectors for Mycobacterium
smegmatis, Mycobacterium tuberculosis, and bacille Calmette‐Guerin.
Proceedings of the National Academy of Sciences of the United States of America
88, 3111‐3115 (1991).
40 Paget, M. S. et al. Mutational analysis of RsrA, a zinc‐binding anti‐sigma factor
with a thiol‐disulphide redox switch. Molecular microbiology 39, 1036‐1047
(2001).
41 Hinds, J. et al. Enhanced gene replacement in mycobacteria. Microbiology 145 ( Pt
3), 519‐527, doi:10.1099/13500872‐145‐3‐519 (1999).
42 Joshi, K. R., Dhiman, H. & Scaria, V. tbvar: A comprehensive genome variation
resource for Mycobacterium tuberculosis. Database (Oxford) 2014, bat083,
doi:10.1093/database/bat083 (2014).
43 Comas, I. et al. Out‐of‐Africa migration and Neolithic coexpansion of
Mycobacterium tuberculosis with modern humans. Nature genetics 45, 1176‐
1182, doi:10.1038/ng.2744 (2013).
44 Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid
multiple sequence alignment based on fast Fourier transform. Nucleic acids
research 30, 3059‐3066 (2002).
45 Sonnhammer, E. L. & Hollich, V. Scoredist: a simple and robust protein sequence
distance estimator. BMC Bioinformatics 6, 108, doi:10.1186/1471‐2105‐6‐108
(2005).
46 Eddy, S. R. A new generation of homology search tools based on probabilistic
inference. Genome Inform 23, 205‐211 (2009).
47 Stamatakis, A. RAxML‐VI‐HPC: maximum likelihood‐based phylogenetic analyses
with thousands of taxa and mixed models. Bioinformatics 22, 2688‐2690,
doi:10.1093/bioinformatics/btl446 (2006).
48 Le, S. Q. & Gascuel, O. An improved general amino acid replacement matrix.
Molecular biology and evolution 25, 1307‐1320, doi:10.1093/molbev/msn067
(2008).
49 Letunic, I. & Bork, P. Interactive tree of life (iTOL) v3: an online tool for the
display and annotation of phylogenetic and other trees. Nucleic acids research 44,
W242‐245, doi:10.1093/nar/gkw290 (2016).
23
50 Hug, L. A. et al. A new view of the tree of life. Nat Microbiol 1, 16048,
doi:10.1038/nmicrobiol.2016.48 (2016).
24
End Notes
Acknowledgments.
J.B. was supported by Plan Nacional de I+D+i and Instituto de Salud Carlos III,
Subdirección General de Redes y Centros de Investigación Cooperativa, Ministerio de
Economía y Competitividad, Spanish Network for Research in Infectious Diseases (REIPI
RD12/0015) - co-financed by European Development Regional Fund "A way to achieve
Europe" ERDF and SAF2015-72793-EXP and BFU2016-78250-P from Spanish Ministry of
Science and Competitiveness (MINECO)-FEDER). A.J.D. was supported by a grant from the
Biotechnology and Biological Sciences Research Council (BB/J018643/1). A. C-G. and J. R-
B. were supported by contracts from REIPI RD15/0012. and A. C-G also by a Ramon Areces
fellowship for Life Sciences. A. P. was supported by a “Sara Borrell” contract, Instituto de
Salud Carlos III. T.T. was supported by the Research Council of Norway, GLOBVAC project
234506. We are grateful to I. Cases for his help customizing the R libraries for tree
visualization, Federico Abascal for critical input regarding models of evolution, I. Comas for
critical reading of the manuscript and J. Glynn and S. Hoffner for providing additional data of
M. tuberculosis strains.
Author contributions: J.B. designed the project, directed the experimental work, constructed
S. coelicolor strains and wrote the manuscript; A.C-G participated in the project design,
performed and designed experiments, made strains in M. smegmatis and S. coelicolor,
developed biochemical assays together with P.P and wrote the manuscript. A.I.P. made
strains, measured recombination rates, performed experiments with M. smegmatis and
bioinformatics search of NucS polymorphisms. A. M. R. performed all the computational
analyses (evolutionary, structural bioinformatics and sequence) and wrote the manuscript;
J.R-B. measured mutation rates and performed statistical analysis; N.A-R, D.C., C.C. L.P., E.
D. Z., M. Herranz, T.T., D.G-V, S.W, M. P. and A.J.D. contributed with strain construction
and/or measured mutation rates. A.J.D. performed data base exploration, helped with NucS
structure and participated in writing the manuscript.
Conflict of interest. The authors declare no conflict of interest.
25
Figure legends.
Figure 1. Identification and characterization of M. smegmatis NucS. A) Schematic
representation of the process for identifying the nucS transposon mutant. ~11,000 clones from
the M. smegmatis insertion mutant library were replicated onto plates with (Rif) or without
(No Rif) rifampicin (step 1). One single clone (circled) produced a high number of Rif-R
colonies. After isolation and purification (step 2), the frequency of spontaneous Rif-R mutants
(bottom plate) was checked and compared with that of the wild-type (upper plate),
demonstrating its hypermutable phenotype. B) Multiple sequence alignment of NucS
sequences. M. tuberculosis (NucS_Mtu), M. smegmatis (NucS_Msm) and P. abyssi
(NucS_Pab) sequences are from Uniprot (identifiers are P9WIY4, A0R1Z0 and Q9V2E8,
respectively). Solid lines over the alignment indicate protein domains as defined previously
for P. abyssi NucS 11
(black, DNA-binding; grey, nuclease). Identical amino acid residues are
shown in black. Catalytic residues required for nuclease activity in P. abyssi NucS 11
are
labelled with asterisks. Nuclease motifs of P. abyssi NucS are also indicated 11
. C) DNA
binding activity of NucS. In a gel-based electrophoretic mobility shift assay (EMSA), purified
NucS protein (1 µM to 16 µM) is capable of binding to 45-mer ssDNA (50 nM) (left side) but
not to 45-bp dsDNA (50 nM) (right side). The arrow indicates the position of the DNA-NucS
complex.
Figure 2. Mutational effects of nucS deletion.
A) Rates of spontaneous mutations conferring rifampicin, Rif-R (red), and streptomycin
resistance, Str-R (grey) of M. smegmatis mc2 155 (WT), its ΔnucS derivative and the ΔnucS
strain complemented with nucS from M. smegmatis mc2 155 (nucSSm). B) Mutational
spectrum of M. smegmatis mc2 155 (green) and its ∆nucS derivative (red). Bars represent the
frequency of the types of change found in rpoB. C) Rates of spontaneous mutations
conferring Rif-R (red) and Str-R (grey) of S. coelicolor A3(2) M145 (WT), its ΔnucS
derivative and the ΔnucS strain complemented with the wild-type nucS from S. coelicolor
(nucSSco).
Error bars represent 95% confidence intervals (n=20). Asterisks denote statistical significance
(Likelihood ratio test under Luria-Delbruck model, Bonferroni corrected, p-value <10-4
in all
cases). Mutation rate: mutations per cell per generation.
26
Figure 3. Effect of nucS deletion on recombination.
A) Chromosomal construct used to measure recombination between homologous or
homeologous DNA sequences. The hyg gene is reconstituted by a single recombination event
between two 517-bp overlapping fragments (striped), sharing different degree of sequence
identity (100%, 95%, 90% and 85%) and separated by a kanamycin resistant (Kan-R) gene.
Recombinant clones express hygromycin resistance and kanamycin susceptibility.
B) Rates of recombination between homologous and homeologous DNA sequences with
different degree of identity (%) in M. smegmatis mc2 155 (WT, green squares) and its ΔnucS
derivative (red diamonds). Error bars represent 95% confidence intervals (n=16). Asterisks
denote statistical significance (Likelihood ratio test under Luria-Delbruck model, Bonferroni
corrected, p-value <10-4
in all cases).
Figure 4. Effects of NucS polymorphisms on mutation rates in the M. smegmatis ΔnucS
surrogate model.
Rates of spontaneous mutations conferring Rif-R of the M. smegmatis ΔnucS complemented
with wild-type nucSTB (ΔnucS/nucSTB; red) or containing each of the 9 polymorphisms
indicated (blue). Relative increases in mutation rates with respect to the control strain
(ΔnucS/nucSTB; set to 1) are shown inside the column. Error bars represent 95% confidence
intervals (n=20). Asterisks denote statistical significance (Likelihood ratio test under Luria-
Delbruck model, Bonferroni corrected; ***p < 0.001; ** p < 0.005). Mutation rate: mutations
per cell per generation.
Figure 5. Phylogenetic profiling of NucS. The NCBI taxonomic tree from 2,186 species
from Bacteria (black outer label) and Archaea (blue outer label). Orange branches: NucS
only; green branches: NucS and MutS-MutL. Bacteria includes Actinobacteria, Firmicutes,
Proteobacteria, FCB (Fibrobacteres, Chlorobi and Bacteroidetes), Other Terrabacteria
(Armatimonadetes, Chloroflexi, Cyanobacteria, Deinococcus-Thermus, Tenericutes and
unclassified Terrabacteria) and other groups (Acidobacteria, Aquificae, Caldiserica,
Chrysiogenetes, Deferribacteres, Dictyoglomi, Elusimicrobia, Fusobacteria, Nitrospirae, PVC
group, Spirochaetes, Synergistetes, Thermodesulfobacteria, Thermotogae and unclassified
bacteria). Archaea includes Euryarchaeota, TACK (Thaumarchaeota, Aigarchaeota,
Crenarchaeota and Korarchaeota) and unclassified archaeal species (*). As NucS is absent in
27
eukaryotes and viruses, these lineages were removed for clarity purposes. The tree was
annotated using ggtree (http://www.bioconductor.org/packages/ggtree).
Figure 6. A model for NucS protein emergence and evolution. The unrooted Tree of Life
(available and based on reference 50
) was used to depict the proposed evolutionary history of
NucS according to our data. The groups relevant to our model are highlighted. Coloured
squares depict the NucS-NT (blue) and NucS-CT (red) terminal regions. This model proposes
that NucS has an archaeal origin and emerged as a combination of two independent protein
domains with complex evolutionary history. Numbers indicate the steps of the model: Both
N-terminal and C-terminal regions likely emerged in the archaeal lineage (1). The CT region
was transferred via HGT to very few Eukaryotes and to some Bacteria (main groups with any
species having the NucS-CT region are labelled with red circles), where the CT domain
combined with other regions outside the context of NucS. In the archaeal lineage, NT and CT
regions fused to produce the full NucS (2). NucS expanded in many archaeal groups but was
also lost in some others. The full NucS protein was transferred to Bacteria by at least two
independent HGT events, one to some Deinoccocus-Thermus species (3) and another to
Actinobacteria (4).
MSRVRLVIAQCTVDYIGRLTAHLPSARRLLLFKADGSVSVHADDRAYKPLNWMSPPCWLTENucS_Mtu 1--- RLVIAQCTVDYVGRLTAHLPSARRLLLFKADGSVSVHADDRAYKPLNWMSPPCWVTENucS_Msm 1HGGVVTIFARCKVHYEGRAKSELGEGDRIIIIKPDGSFLIH-QNKKREPVNWQPPGSKVTFNucS_Pab 25
ESG-GQAPVWVVENKAGEQLRITIEGIEHDSSHELGVDPGLVKDGVEAHLQALLAEHIQLLNucS_Mtu 62QDTETGVALWVVENKTGEQLRITVEDIEHDSHHELGVDPGLVKDGVEAHLQALLAEHVELLNucS_Msm 59KEN-S---IISIRRRPYERLEVEIIEPYSLVVFLAEDYEELALTGSEAEM-ANLIFENPRVNucS_Pab 85
G--EGYTLVRREYMTAIGPVDLLCSDERG-GSVAVEIK-RRGEIDGVEQLTRYLELLNRDSNucS_Mtu 122G--AGYTLVRREYPTPIGPVDLLCRDELG-RSVAVEIK-RRGEIDGVEQLTRYLELLNRDSNucS_Msm 120IEEGFKPIYRE-KPIRHGIVDV-MGVDKDGNIVVLELKRRKADLHAVSQLKRYVDSLKEEYNucS_Pab 141
VLAPVKGVFAAQQIKPQARILATDRGIRCLTLDYDTMRGMDSGEYRLFNucS_Mtu 179 226LLAPVAGVFAAQQIKPQARTLATDRGIRCVTLDYDQMRGMDSDEYRLFNucS_Msm 177 224GE-NVRGILVAPSLTEGAKKLLEKEGLEFRKLEPP-KKG---------NucS_Pab 200 236
Motif I
*
Motif II
**Motif III
**Motif QxxxY
**
M
A
B
CNucS NucS NucS
NucS-ssDNA
45-mer 50nM ssDNA 45-mer ssDNA 45bp-dsDNA
TACK group
FC
B g
roup
ProteobacteriaA
ctin
obact
eria
Firmicutes
Oth
er T
erra
bact
eria
Other gro
ups
Euryarchaeota
NucS+MutS/L
NucS
*
Cyanobacte
ria
Chloroflexi
Chlorobi
beta-ProteobacteriaF
irmicu
tes
Planctomycetes
Verrucomicrobia
Chlamydia
alpha-Proteobacteria
gamma-Proteobacteria
epsilon-Proteobacteria
Microgenomates
Parcubacteria
(1)
(4)
delta
-Pro
teobact
eria
Halobacteria
Deinococcus-Therm
us
Euryarchaeota
Crenarchaeota
Actinobacteria
Rubrobacteria
No NucS
Coriobacteriia
No NucS
(3)
(2)
EUKARYAPlacozoa,Porifera
NucS-CTNucS-NT
Horizontal gene transfer
Tree scale: 1
1
Supplementary Information
Supplementary Figures.
Supplementary Figure 1. Purified mycobacterial NucS does not cleave mismatch DNA
substrates in vitro. A) Native M. smegmatis NucS protein purification. Recombinant NucS protein
was expressed in E. coli, purified and concentrated, as described (see Methods). Native NucS
protein was analysed in a SDS-PAGE gel and stained with Coomassie Brillant Blue. Lane 1,
molecular marker; Lane 2: purified native M. smegmatis NucS; Lane 3: concentrated M. smegmatis
NucS protein (used for EMSA and nuclease assays). B) Double stranded DNA (30 nM) with or
without the indicated internal mismatches, was incubated with 300 nM NucS. Sequence consensus
is shown at the bottom and mismatch positions are indicated in red. No significant cleavage activity
was observed with the shown 36-mers or with longer 75-mer (not displayed here) mismatch
substrates.
C-A:C-A:
C-T:C-T:
C-C:C-C:
T-T:T-T:
T-G:T-G:
A-A:A-A:
A-G:A-G:
G-G:G-G:
NO:NO:
37
25
50
20
NucS
15
75
1 2 3
200
A B
10
kDa
150
100
dsDNA
2
A
B
Supplementary Figure 2. Tools for measuring recombination rates.
A) Alignment of the 517 bp overlapping sequences, with 100%, 95%, 90% and 85% identities,
shared by hyg 5´ and hyg 3´ regions, used to construct the different pRhomyco versions. Changes
3
introduced in the original sequence are highlighted. B) Cartoon of plasmid pRhomyco before and
after recombination. hyg 5’ and hyg 3’ are truncated alleles of the hyg gene carrying a 3´-terminal or
5´-terminal deletion, respectively. Both alleles overlap 517 bp (striped regions) and are separated by
a 1,200 bp region containing aph3 gene (Kan-R). Four versions of pRhomyco were generated
carrying hyg 5’ alleles 100%, 95%, 90% or 85% identical, in its overlapping region, to the hyg 3’.
Plasmids integrate in the chromosome of M. smegmatis mc2
155 and its ΔnucS derivative. Site-
specific recombination between the attP from the plasmid and the unique bacterial attB is promoted
by the plasmid-encoded integrase.
4
Supplementary Figure 3. Domain characterization of M. tuberculosis NucS and polymorphic
residues. A) The upper part is a schematic representation of the homodimeric structure of P. abyssi
NucS (2VDL) 1. The two distinct domains of NucS, N-terminal and C-terminal, are in blue and
N-terminal domainC-terminal domain
A
G157
D160
E174
Q187
Y191
K176
R42
R70
W75
HGGVVTIFARCKVHYEGRAKSELGEGDRIIIIKPDGSFLIHQN-KKREPVNWQPPGSKVT2VLD_Paby 25M---RLVIAQCTVDYVGRLTAHLPSARRLLLFKADGSVSVHADDRAYKPLNWMSPPCWVTNucS_Mycsme 1MSRVRLVIAQCTVDYIGRLTAHLPSARRLLLFKADGSVSVHADDRAYKPLNWMSPPCWLTNucS_Myctu 1
* * * * * * * ** * * ** * * *
* * * *
* * *
* * * ** * *
** * * *
*
* * **
* * * *
* * *
**
FK--E--NSmISIRRRPYERLEVEIIEPYSLVVFLAEDYEELaltgSEAEmANLIFENPR2VLD_Paby 84EQDTETGVALWVVENKTGEQLRITVEDIEHDSHHELGVDPGLVKDGVEAHLQALLAEHVENucS_Mycsme 58EESGGQ-APVWVVENKAGEQLRITIEGIEHDSSHELGVDPGLVKDGVEAHLQALLAEHIQNucS_Myctu 61
VIEEGFKPIYREKPIRHGIVDVMGVDKDGNIVVLELKRRKADLHAVSQLKRYVDSLKEEY2VLD_Paby 140LLGAGYTLVRREYPTPIGPVDLLCRDELGRSVAVEIKR-RGEIDGVEQLTRYLELLNRDSNucS_Mycsme 118LLGEGYTLVRREYMTAIGPVDLLCRDERGGSVAVEIKR-RGEIDGVEQLTRYLELLNRDSNucS_Myctu 120
-GENVRGILVAPSLTEGAKKLLEKEGLEFRKLEPP2VLD_Paby 200 233LLAPVAGVFAAQQIKPQARTLATDRGIRCVTLDYDNucS_Mycsme 177 211VLAPVKGVFAAQQIKPQARILATDRGIRCLTLDYDNucS_Myctu 179 213
*
180o
Q187
K176
Y191
E174
Q187
A67
V69
S54
S39
W75W
75R70
R70
R42R42
A135
D162
Q187
K184
Y191
E127 D160
G157
A67
R144
K184
A135
D162
Q187
T168
B
* R I
S A
S S H A
E
R144
T168
S39
S54
V69
W75W
75R70
R70
R42R42
*
E127
C-terminal domain
5
pink, respectively. The first residues (1-25) of the NucS P. abyssi N-terminal domain, missing in
the M. tuberculosis model, are depicted as green cartoon. The putative β-clamp binding motif in P.
abyssi NucS is shown in orange. The important catalytic and DNA binding sites are depicted over
the P. abyssi structure as vertical red bars with the residues numbered above. Back and front views
of the structural superimposition of the homodimeric resolved structure of NucS from P. abyssi 1
and the M. tuberculosis model (purple) is presented below the schema (colour code as described for
the upper scheme). The important catalytic and DNA binding sites are depicted as red sticks and the
residues where polymorphisms described in this work have been detected are shown as green sticks
over the structural imposition. B) Multiple structure-based alignment of NucS from P. abyssi
(2VDL), M. smegmatis mc2 155 and M. tuberculosis CDC155. Colour code is the same as in panel
A. The amino acid change of each naturally occurring M. tuberculosis polymorphism is depicted in
green. Important catalytic and DNA binding residues are in red. Grey low-case indicates regions
without structural information; magenta “m” indicates Seleno-Met modifications in the structure of
P. abyssi. Residues of the putative β-clamp binding motif in P. abyssi NucS are shown in orange.
Only regions that align among the three proteins are shown. Asterisks indicate identical residues in
the three aligned sequences.
6
Supplementary Figure 4. Computational procedures to analyse NucS distribution and
evolution. Each panel depicts a different protocol. Yellow fonts indicate figures and/or tables.
Yellow uppercase letters are pieces of evidence used in the model presented in Fig. 6 (main text).
Domain Analyses IInitial Sequence Searches
~400
Compare ranked positions hits by e-value and
bit-score
NucS_pyrab NucS_myctu
e-val < 0.0001, aln lenght > 75%,
Bit score > 50
- ~370 unique archaeal/bacterial proteins
- Eukaryotes and Virus discarded as lack NucS (A)
Align sequences
(excluding original
queries)
pHMMER
MAFFT
hmmbuild
hmmsearch
e-val < 0.0001, aln lenght > 75%,
Bit score > 50
~400
370 370
External info
Input file
Output file
Results
Program
MutS/L identification
hmmbuild
hmmsearch
MutS MutL Selected species
Bit score > 50, aln lenght > 75%,
MUTSMutL seqs MutS seqs
pMutS pMutL
2,841 prokaryotic
genomes
MAFFT
alnMutS alnMutL
Phylogenetic Profiling
Figure 5
Supplementary data file 1
Large DB including
3,942 Ref Proteomes
aln-nr 63%
Large DB including
3,942 Ref Proteomes
p-aln p-aln
aln-nr 63%
Structure-based definition using NucS_pyrab:
full, N-terminal and C-terminal.
Archaea
Bacteria
Archaea/Bacteria
Archaea-others
Archaea-others
Archaea/Bacteria
Prokayotes-othersEukaryotes
EukaryotesMAFFT- hmmbuild
Archaea-NucS
Bacteria-NucS
MAFFT-hmmbuild
Archaea-NucS
Bacteria-NucS
hmmbuild
hmmsearch
NucS is built on two distinct domains (B) Supp. Fig 3
Large DB including
3,942 Ref Proteomes
hmmsearchLarge DB including
3,942 Ref Proteomes
p-NT p-CT
aFull aNT aCT
pFull pNT pCT
Tree of Life(A)
(B) (C)
(D)
Proposed evolution of NucSFigure 6
MutS/L
Extract PFAM
PF01939 seqs
Domain Analyses II
hmmalign
3D-based
pairwise
RAXML: phylogeny
NucS- full sequence
Phylogenetic tree
NucS in some bacteria have been transferred from Archaea (C)
aFull aNT aCT
Supp. Fig 5 Supp. Fig 7 Supp. Fig 6
NucS N-Terminal
Phylogenetic tree NucS C-Terminal
Phylogenetic tree
NucS shows a disperse distribution pattern (D)
(*) Only complete genomes Actinobacteria phylum and Archaea
** Confident absence
If NO NucS is identified
if MutS/L is found
if MutS/L not found:
and “Unassembled WGS”
else, translated searches vs genome(*)
if MutS/L found
if MutS/L not found
If NucS IS identified
if MutS/L is found
if MutS/L not found:
and “Unassembled WGS”
else, translated searches vs genome(*)
if MutS/L found
if MutS/L not found
Undef MutS/L
MutS/LLikelyNot
NucS-MutS/L
NucS-MutS/L
NucS-only**
Procedure:
Undef MutS/L
and “Unassembled WGS” Undef NucS else, translated searches vs genome(*)
if NucS found
if NucS not found
NucS
LikelyNot
7
Supplementary Figure 5. Phylogenetic analyses of NucS. Unrooted ML tree of full NucS
sequences from the PFAM PF01939. Black branches are Actinobacteria; blue branches are
Archaea; red branches are species from the Deinococcus-Thermus group. Labels indicate species
name. Red circles represent >80% bootstrap in 1,500 replicates.
Meth
anocella
_palu
dic
ola
.st.D
SM
_17711
Str
epto
myces_sp._
Tu6071
Stre
pto
myces_sp._
WM
6386
09
76
1.M
SD.t
s.iy
bsl
aw
_m
utar
da
uq
ola
H
Therm
ococcus_gam
mato
lera
ns.s
t.D
SM
_15229
Actinomyces_sp._oral_taxon_848_str._F0332
Corynebacterium_epiderm
idicanis
Amycolicicoccus_subflavus.st.DSM_45089
Actinomyces_sp.oral_tax_180.str.F0310
Natro
nobacte
rium
_gre
gory
i_A
TC
C.4
3098
Micr
obacteriu
m_hyd
roca
rbonoxy
dans
Corynebacterium_jeikeium.st.K411
Aero
pyru
m_pern
ix_ATC
C.7
00893
Halo
bacte
rium
_sp._
DL1
Mycobacterium_abscessus_ATCC.19977
Halo
bacte
rium
_sp._
DL1
Actin
obacte
ria_bacte
rium
_O
V450
Strepto
myc
es_
sp._
CN
Q-5
09
Corynebacterium_argentoratense_DSM.44202
Conexibacter_w
oesei.st.DSM
_14684
Stre
pto
myces_m
angro
vis
oli
Noc
ardi
a_cy
riacige
orgi
ca.s
t.GUH-2
Saccharomonospora_glauca_K62
Mycobacterium_parascrofulaceum_ATCC.BAA-614
Str
epto
myc
es_
sp._
WM
6378
Gardnerella_vaginalis_ATCC.14019
Rhodococcus_jostii.st.RHA1
Pyr
obacu
lum
_aero
philu
m_AT
CC
.51768
Strepto
myc
es_
rubello
murinus
Actinomyces_massiliensis_F0489
Rhodococcus_aetherivorans
Mycobacterium_sinense.st.JDM601
Actinomyces_sp.oral_tax_178.str.F0338
Rhodococcus_sp._B7740
Rothia_mucilaginosa.st.DY-18
Geoderm
atophilus_obscurus_ATCC.25078
Therm
omonospora_curvata_A
TC
C.19995
Kutzneria_albida_DSM.43870
Pyr
oco
ccus_
furiosu
s_A
TC
C.4
3587
Nocardiopsis_sp._N
RR
L_B-16309
Frigoribacterium_sp._RIT-PI-h
Strep
tom
yces
_rim
osus
_sub
sp._
rimos
us_A
TC
C.1
0970
Blastococcus_saxobsidens.st.DD2
Pseudonocardia_dioxanivorans_ATCC.55486
Meth
anopyru
s_kandle
ri.s
t.A
V19
Arthrobacter_globiformis.NBRC.12137
Stre
pto
myces_sp._
MM
G1121
Trueperella_pyogenes
Mycobacterium_hassiacum_DSM.44199
Stre
pto
myce
s_ca
ttleya
_ATC
C.3
5852
Rhodococcus_equi_ATCC.33707
Corynebacterium_glucuronolyticum
_ATCC.51866
Str
epto
myc
es_
tsuku
bensi
s_N
RR
L18488
Stre
pto
myces_averm
itilis_A
TC
C.3
1267
Actinotignum_schaalii
Kocuria_sp._UCD-OTCP
Actin
obacte
ria_bacte
rium
_O
V320
Saccharothrix_sp._NRRL_B-16348
Mycobacterium_paratuberculosis_ATCC.BAA-968
Natrin
em
a_sp.s
t.J7-2
Halo
rubru
m_la
cuspro
fundi_
AT
CC
.49239
Noca
rdia
_nova_SH
22a
22
90
07.
CC
TA
_m
ura
nila
s_
mui
ret
ca
bol
aH
Micr
obacteriu
m_laeva
niform
ans_OR221
Brevibacterium_mcbrellneri_ATCC.49030
Arthrobacter_arilaitensis.st.DSM_16368
Corynebacterium_glutamicum_ATCC.13032
Corynebacterium_resistens.DSM.45100
Pyro
coccus_abyssi.st.G
E5
Cellulomonas_gilvus_ATCC.13127
80
05.
ts.
sis
ne
gn
ag
gnij
_.p
sb
us
_s
uci
po
cs
org
yh
_p
ert
S
Actinomyces_sp._oral_tax_175.str.F0384
Stre
ptom
yces
_gris
eus_
subs
p._g
riseu
s.st
.JCM
_462
6
Sulfo
lobus_
solfa
taricu
s_ATC
C.3
5092
Beutenbergia_cavernae_ATCC.BAA-8
Str
epto
myces_virid
ochro
mogenes
Bifidobacteriu
m_scardovii
Scardovia_wiggsiae_F0424
Marin
itherm
us_
hyd
roth
erm
alis.st.D
SM
.14884
Sanguibacter_keddieii_ATCC.51767
Mobiluncus_mulieris_ATCC.35239
Act
inobact
eria
_bact
eriu
m_IM
CC
26256
Actinomyces_graevenitzii_C83
Stre
pto
myces_sp.
Cellulomonas_fimi_ATCC.484
marine_actinobacterium_PHSC20C1
Kita
sato
spora
_se
tae_ATC
C.3
3774
Actinobacte
ria_bacte
rium
_O
K006
Bifidobacterium_gallicum_DSM.20093
Str
epto
myces_gla
ucescens
Dein
oco
ccus_
soli_
Cha_et_
al._
2014
Halo
terrig
ena_tu
rkm
enic
a_A
TC
C.5
1198
Strep
tom
yces
_sp.
_769
Halo
fera
x_volc
anii_
AT
CC
.29605
Stre
pto
myces_sp._
e14
Renibacterium_salmoninarum_ATCC.33209
Mic
rolu
natu
s_ph
osph
ovor
us_A
TC
C.7
0005
4
Saccharothrix_espanaensis_ATCC.51144
Dietzia_c
inna
mea
_P4
Dein
oco
ccus_
pro
teolyticu
s_ATC
C.3
5074
Micro
bact
eriu
m_o
xyda
ns
Corynebacterium_vitaeruminis_DSM.20294
Noc
ardi
a_fa
rcin
ica.
st.IF
M_1
0152
Nocardia_sp._NRRL_S-836
Mycobacterium_therm
oresistibile
_ATCC.19527
Corynebacterium_diphtheriae_ATCC.700971
Actinoplanes_sp._N902-109
Bifidobacteriu
m_longum.st.NCC_2705
Strepto
myc
es_
sp._
NR
RL_F-6
602
Mycobacterium_haemophilum
Gordonia_polyisopre
nivorans.st.D
SM.44266
Pyro
lobus_
fum
arii.D
SM
.11204
Str
epto
myces_sp._
PB
H53
Bifidobacterium_bifidum_BGN4
Dermacoccus_nishinomiyaensis
misce_C
renarchaeota_group-1_archaeon_SG
8-32-1
Str
epto
myces_davaw
ensis
_JC
M_4913
Bifidobacterium_angulatum_DSM.20098
Strepto
myc
es_
xiam
enensi
s
Micr
obacteriu
m_ke
tosir
educens
Mycobacterium_vanbaalenii.st.DSM.251
Nocardioides_simplex
Corynebacterium_kutscheri
Meth
anosphaera
_sta
dtm
anae_A
TC
C.4
3021
Str
epto
myces_coelic
olo
r_A
TC
C.B
AA
Ilumatobacter_coccineus_Y
M16-304
Str
epto
myces_sp._
AS
58
Mycobacterium_leprae.st.TN
Gordonia_sputi_
NBRC.100414
Mobiluncus_curtisii_ATCC.43063
Mob
ilico
ccus
_pel
agiu
s_NBRC.1
0492
5
Nakamurella_multipartita_ATCC.700099
Pro
pion
imic
robi
um_l
ymph
ophi
lum
_AC
S-0
93-V
-SC
H5
Meth
anobre
vib
acte
r._A
bM
4
Stre
pto
myce
s_sp
._M
g1
Clavibacter_michiganensis_subsp._sepedonicus_ATCC.33113
Halo
sta
gnic
ola
_la
rsenii_
XH
-48
Corynebacterium_atypicum
Nocardioides_luteus
Stre
pto
myce
s_vie
tnam
ensis
Stre
pto
myce
s_sp
._N
RR
L_F
-4428
archaeon_GW
2011_AR
10
38
09
2.C
CT
A_
su
eci
vs
_s
ec
ym
otp
ert
S
Frankia_symbiont_subsp._D
atisca_glomerata
Microm
onospora_sp._NRRL_B
-16802
Frankia_sp._EUN1f
Therm
obifida_fusca.st.YX
Rothia_dentocariosa_ATCC.17931
Str
epto
myces_variegatu
s
misce_C
renarchaeota_group-6_archaeon_AD
8-1
Stre
pto
myces_scabie
i.st.8
7.2
2
Janibacter_hoylei_PVAS-1
Micr
obacteriu
m_tri
choth
ecenolyt
icum
Corynebacterium_pseudotuberculosis.st.C231
Strep
tom
yces
_niv
eus_
NC
IMB_1
1891
Strep
tom
yces
_lyd
icus
_A02
Therm
opro
teus_
sp._
AZ2
Kytococcus_sedentarius_ATCC.14392
Strepto
myc
es_
sp._
NR
RL_F-6
602
Mycobacterium_obuense
Bifidobacteriu
m_longum_subsp.longum_F8
Xylanimonas_cellulosilytica.st.DSM_15894
Mycobacterium_tuberculosis_H37Rv
Cellulomonas_flavigena_ATCC.482
Strepto
myc
es_
sp._
NR
RL_S
-495
Arthrobacter_sp._PAMC25486
Microm
onospora_sp._HK10
Stre
pto
myces_caele
stis
Cald
ivirga_m
aquili
ngensi
s_ATC
C.7
00844
Str
epto
myces_griseoflavus_T
u4000
Mycobacterium_elephantis
Dein
oco
ccus_
gobie
nsis.st.D
SM
_21396
Corynebacterium_xerosis
Micr
obacteriu
m_ch
ocola
tum
Mycobacterium_smegmatis_ATCC.700084
Str
epto
myces_virid
ochro
mogenes.s
t.D
SM
_40736
Natro
nobacte
rium
_gre
gory
i_A
TC
C.4
3098
Verrucosispora_maris.st.A
B-18-032
Streptomyces_sp._AA4
Segniliparu
s_ro
tundus_
ATCC.BAA-9
72
Jonesia_denitrificans_ATCC.14870
Gord
onia_sih
wensis_NBRC.1
08236
69
78
1.M
SD.t
s.ila
gto
ej_
su
cc
ocil
akl
ala
H
Corynebacterium_falsenii_DSM.44353
Rhodococcus_spRD6.2
Bifidobacterium_dentium_ATCC.27534
Bifidobacterium_therm
ophilum_RBL67
Halo
pig
er_
xanaduensis
.st.D
SM
.8323
Amycolatopsis_orientalis_HCCB10007
Curtobacterium_flaccumfaciens_UCD-AKU
Stre
pto
spora
ngiu
m_ro
seum
_ATC
C.1
2428
T9
11
CS
UM
_.p
s_
se
cy
mot
per
tS
Meth
anobre
vib
acte
r_ru
min
antium
_A
TC
C.3
5063
Acidothermus_cellulolyticus_ATC
C.43068
33
70
4_
MS
D.ts.
su
nillo
c_
se
cy
mot
pert
S
Micro
bacter
ium
_sp.
_SA39
Mycobacterium_kansasii_ATCC.12478
Arthrobacter_chlorophenolicus_ATCC.700700
Meth
anoca
ldoco
ccus_
jannasc
hii_
AT
CC
.43067
47
68
1_
MS
D.t
s.si
sn
ep
alo
om
_s
an
om
on
ort
aN
Leucobacter_sp._Ag1
Bra
chyb
acte
rium
_sp.
_SW
0106
-09
Stre
pto
myces_sp._
NR
RL_S
-444
Saccharopolyspora_erythraea_ATCC.11635
Dein
oco
ccus_
swuensis
Stre
pto
myce
s_sp
._N
RR
L_F-6
491
Vulc
anis
aeta
_dis
trib
uta
.st.D
SM
.14429
Mycobacterium_xenopi_RIVM700367
Saccharomonospora_marina_XMU15
Bifidobacteriu
m_asteroides.st.P
RL2011T
herm
ococcus_kodakare
nsis
_A
TC
C.B
AA
Pro
pion
ibac
teriu
m_a
cnes
.st.K
PA17
1202
Strepto
myc
es_
sp._
PVA
_94-0
7
Str
ept_
cyaneogriseus_subsp.n
oncyanogenus
Micrococcus_luteus_ATCC.4698
Corynebacterium_mustelae
Janibacter_sp._HTCC2649
Rhodococcus_erythropolis.st.PR4
Stre
pto
myces_sp._
NR
RL_W
C-3
618
67
9.2
1_
SD
_s
nati
cs
o_.
ps
bu
s_
su
ne
go
mor
hc
oe
sor
_tp
ertS
Terrabacter_sp._28
Aci
dia
nus_
hosp
italis
.st.W
1
Natria
lba_m
agadii_
AT
CC
.43099
Bifidobacterium_adolescentis_ATCC.15703
Micro
bacter
ium
_folioru
m
Catenulispora_acidiphila.st.D
SM
_44928
Micro
bacteriu
m_aza
dirach
tae
Lechevalieria_aerocolonigenes
Amycolatopsis_mediterranei.st.S699
Corynebacterium_sp._ATCC.6931
Gordonia_rh
izosp
hera_NBRC.16068
Therm
osphaera
_aggre
gans.s
t.DS
M.11
486
Salinispora_tropica_ATC
C.B
AA-916
I4F2P8 acti 477641
Mycobacterium_marinum_ATCC.BAA-535
halo
philic
_arc
haeon_D
L31
Stre
pto
myce
s_ka
trae
Stre
pto
myces_sp._
WM
6372
Bifidobacteriu
m_breve_DSM.20213
94
03
4.C
CT
A_i
utro
msir
am
_al
ucr
aol
aH
Actinobacte
ria_bacte
rium
_O
K074
Meth
anocella
_arv
ory
zae.s
t.DS
M.2
066
Therm
ofilum_sp._1910b
Stre
pto
myce
s_him
asta
tinicu
s_ATC
C.5
3653
Tsuk
amur
ella_p
auro
met
abola_
ATCC.8
368
Therm
ofilum_pendens.st.H
rk_5
Corynebacterium_variabile.st.DSM_44702
Frankia_sp.st.EuI1c
Bifidobacterium_kashiwanohense_PV20-2
Dein
oco
ccus_
marico
pensis.st.D
SM
.21211
Meth
anoth
erm
obacte
r_th
erm
auto
trophic
us_A
TC
C.2
9096
Pro
pion
ibac
teriu
m_a
cidi
prop
ioni
ci_A
TC
C.4
875
Gord
onia_am
arae_NBRC.1
5530
Str
epto
myces_in
carn
atu
s
Mycobacterium_neoaurum_VKM_Ac-1815D
Actinoplanes_sp._ATCC.31044
Mycobacterium_sp.st.KMS)
Patulibacter_m
edicamentivorans
Nocard
ia_bra
silie
nsis_ATCC.7
00358
Stre
ptom
yces
_pra
tens
is_A
TCC.3
3331
Nocardioides_sp.st.BAA-49
Therm
ogla
diu
s_ce
llulo
lyticus.st.1
633
Halo
geom
etr
icum
_borinquense_
AT
CC
.700274
Corynebacterium_efficiens.st.DSM_44549
Ignis
phaera
_aggre
gans.
st.D
SM
.17230
Kin
eosp
haer
a_lim
osa_
NBRC.1
0034
0
Strep
tom
yces
_pris
tinae
spira
lis_A
TC
C.2
5486
Nocardioides_sp._CF8
Therm
obisp
ora
_bisp
ora
_ATC
C.1
9993
Sta
phylo
therm
us_
marin
us_
AT
CC
.43588
Kocuria_rhizophila_ATCC.9341
Actinomyces_urogenitalis_DSM.15434
Corynebacterium_glyciniphilum
_AJ_3170
Gordonia_sp
._KTR9
Natro
nococcus_occultu
s_S
P4
Frankia_sp.st.EAN1pec
Meth
anobacte
rium
_sp._
MB
1
Str
epto
myces_nodosus
Stre
pto
myces_sp._
MM
G1533
Actinomyces_sp.oral_tax_448.str.F0400
Stre
pto
myce
s_bin
gch
enggensis.st.B
CW
-1
Hyp
erth
erm
us_
butylicu
s.st.DS
M_5456
Actinosynnema_mirum_ATCC.29888
Pro
pion
ibac
teriu
m_p
ropi
onic
um.s
t.F02
30a
47
80
07.
CC
TA
_ie
ata
ho
ku
m_
mui
bor
cim
ola
H
Stre
pto
myces_plu
ripote
ns
Stre
ptom
yces
_sp.
st.S
irexA
A-E
Meth
anobre
vib
acte
r_sm
ithii_
AT
CC
.35061
Microm
onospora_lupini_str._Lupac_08
Natro
nom
onas_phara
onis
_A
TC
C.3
5678
Meth
anobacte
rium
_la
cus.s
t.A
L-2
1
Arcanobacterium_haemolyticum_ATCC.9345
Str
epto
myc
es_
venezu
ela
e_AT
CC
.10712
Natrin
em
a_pelliru
bru
m.s
t.DS
M_15624
Strep
tom
yces
_aur
atus
_AG
R00
01
Bra
chyb
acte
rium
_fae
cium
_ATC
C.4
3885
Micr
obacteriu
m_te
stace
um.s
t.StL
B037
L0J2N8 acti 212767
Str
epto
myces_sp.s
t.S
PB
074
Actinoplanes_missouriensis_ATC
C.14538
Actinomyces_turicensis_ACS-279-V-Col4
Mycobacterium_sp._VKM_Ac-1817D
Bifidobacteriu
m_animalis
_subsp._lactis.st.A
D011
Stre
pto
myces_sp._
WM
4235
Gardnerella_vaginalis_JCP8481B
Aus
twickia_
chel
onae
_NBRC.1
0520
0
Stre
pto
myces_zin
cire
sis
tens_K
42
Arthrobacter_sp.st.FB24
Therm
ococcus_lit
ora
lis_A
TC
C.5
1850
Stre
pto
myces_sp._
MU
SC
136T
Kribbella_flavida.st.DSM_17836
Micro
bact
eriu
m_s
p._A
g1
Aeromicrobium
_marinum
_DSM.15272
Leifsonia_xyli_subsp._xyli.st.CTCB07
Intrasporangium_calvum_ATCC.23552
Kineococcus_radiotolerans_ATCC.BAA-149
Sulfo
lobus_
toko
daii.
st.D
SM
_16993
Microbacte
rium_ginse
ngisoli
Frankia_alni.st.ACN14a
Dein
oco
ccus_
gobie
nsis.st.D
SM
_21396
Corynebacterium_matruchotii_ATCC.14266
Corynebacterium_urealyticum_ATCC.43042
Halo
bacte
rium
_salin
aru
m_A
TC
C.7
00922
Stre
pto
myce
s_vio
lace
usn
iger_
Tu_411
3
Gordonia_bro
nchialis
_ATCC.25592
Bifidobacterium_pseudolongum_PV8-2
Str
epto
myc
es_
alb
us_
AT
CC
.21838
Sulfo
lobus_
aci
doca
ldarius_
ATC
C.3
3909
Saccharomonospora_viridis_ATCC.15386
Stre
ptom
yces
_cla
vulig
erus
_ATC
C.2
7064
Frankia_sp.st.CcI3
Isoptericola_variabilis.st.225
54
0M
_s
uc
aitn
aru
ao
esir
g_
se
cy
mot
pert
S
Sac
char
othr
ix_s
p._S
T-88
8
Actinoplanes_friuliensis_D
SM
.7358
Mycobacterium_rhodesiae.st.NBB3
Str
epto
myces_ghanaensis
_A
TC
C.1
4672
Pro
pion
ibac
teriu
m_f
reud
enre
ichi
i_su
bsp.
_she
rman
ii_ATC
C.9
614
Nocardiopsis_alba_A
TC
C.B
AA-2165
Candidatus_C
aldiarchaeum_subterraneum
Turicella_otitidis_ATCC.51513
Rhodococcus_pyridinivorans_SB3094
Microm
onospora_aurantiaca_ATCC.27029
Scardovia_inopinata_F0304
Meth
anobacte
rium
_palu
dis
.st.D
SM
.25820
Mycobacterium_sp._EPa45
Actinomyces_neuii_BVS029A5
Nocardiopsis_dassonvillei_A
TC
C.23218
Stackebrandtia_nassauensis.st.DSM_44728
Therm
ococcus_baro
philu
s.s
t.D
SM
_11
836
Tree scale: 0.1
ARCHAEA
ACTINOBACTERIA
Deinococcus
Tree scale
8
Supplementary Figure 6. Phylogenetic analyses of the NucS C-terminal (CT) region. Unrooted
ML tree of CT sequences from the PFAM PF01939 domain. Grey font indicates that this domain is
in NucS and blue font that is found outside NucS. Black branches, Actinobacteria; red
Deinococcus-Thermus; grey, other Bacteria; light blue, Archaea; green, eukaryotic sequences. Bold
fonts indicate the main species used in this study. Labels indicate species name. The circles
represent >80% bootstrap in 1,500 replicates.
111-A
ctinobacte
ria b
acte
rium
OV
320
415-Microbacterium ketosireducens
31
20
2.M
SD.
ev
erb
muir
etc
ab
odifi
B-8
03
35-Methanocella arvoryzae.st.DSM
22066
348-
Mar
inith
erm
us h
ydro
ther
mal
is.s
t.DSM
148
84169-C
orynebacterium camporealensis
07
90
1.C
CT
A s
us
omi
r.p
sb
us
su
so
mir
se
cy
mot
per
tS-
39
249-Gordonia sihw
ensis NBR
C 108236
270-M
ycobacte
rium
neoauru
m V
KM
Ac-1
815D
194-Austwickia chelonae NBRC 105200
106-S
trepto
myces x
iam
enensis
4N
GB
mu
difib
muir
etc
ab
odifi
B-7
03
424-Clostridium sp.CAG:628
79-A
ctin
obacte
ria b
acte
rium
OV
450
266-Myco
bacteriu
m sm
egm
atis.st.ATC
C.700084
258-Mycobacterium
leprae.st.TN
292-B
ifidobacte
rium
aste
roid
es.s
t.PR
L2011
196-Curtobacterium flaccumfaciens UCD-AKU
384-Haloarcula marismortui.st.A
TCC.43049
175-Corynebacterium sp.ATCC.6931
324-A
ero
mic
robiu
m m
arinum
.DS
M.1
5272
7-miscellaneous Crenarchaeota group archaeon SMTZ-80
112-S
trepto
myces s
p.s
t.S
PB
074
9-Burkholderiales bacterium GJ-E10
226-Arthrobacter sp.PAMC25486
66-A
ctinosyn
nem
a m
irum
.st.AT
CC
.29888
131-S
trepto
myc
es
zinci
resi
stens
K42
30
75
1.C
CT
A.t
s.si
tn
ec
sel
od
a m
uir
etc
ab
odi
fiB-
50
3
159-Cory
nebacteriu
m e
pidermidica
nis
243-Rhodococcus jostii.st.RHA1
117-S
trepto
myces s
p.M
MG
1533
284-M
obilu
ncus c
urtis
ii.st.A
TC
C.4
3063
124-S
trepto
myc
es
plu
ripote
ns
379-Halalkalicoccus je
otgali.st.D
SM 18796
162-Corynebacteriu
m efficiens.st.D
SM 44549
181-Corynebacterium lipophiloflavum.DSM.44291
333-M
icro
monosp
ora
sp.N
RR
L B
-16802
387-Haloquadratum walsbyi.st.DSM 16790
153-Cory
nebacteriu
m v
ariabile
.st.D
SM 4
4702
360-
Met
hano
ther
mob
acte
r the
rmau
totro
phicus
.st.A
TCC.2
9096
161-Coryn
ebacteriu
m glutamicum.st
.ATCC.13032
346-
Seg
nilip
arus
rot
undu
s.st
.ATC
C.B
AA
-972
390-Natronobacterium gregoryi.st.ATCC.43098
278-S
acch
aro
monosp
ora
gla
uca
K62
225-Arthrobacter arilaitensis.st.DSM 16368
39-Vibrio campbellii.st.H
Y01
173-Corynebacterium aurimucosum.st.ATCC.700975
85-S
trepto
myces a
lbus.s
t.AT
CC
.21838
211-Beutenbergia cavernae.st.ATCC.BAA-8
70-N
oca
rdio
psis a
lba.st.A
TC
C.B
AA
-2165
65-N
oca
rdia
sp.N
RR
L S
-836
419-Clostridium sp.CAG:433
62-S
acch
aro
thrix sp
.NR
RL B
-16348
275-M
ycobacte
rium
absce
ssus.st.A
TC
C.1
9977
313-A
ctinom
yces s
p.o
ral ta
xon 8
48 s
tr.F
0332
52-Frankia sp.st.E
AN
1pec
416-Parcubacteria
290-A
ctin
om
yces s
p.o
ral ta
xon 1
75 s
tr.F0384
25
85
3.C
CT
A.ts.
ay
eltta
c s
ec
ym
otp
ertS-
79
160-Cory
nebacteriu
m sp
.KPL1989
420-Clostridium sp.CAG:433
99-S
trepto
myces v
iola
ceusnig
er
Tu 4
113
58-S
tacke
bra
ndtia
nassa
uensis.st.D
SM
44728
250-Gordonia bronchialis.st.ATC
C.25592
274-M
ycobacte
rium
therm
ore
sistibile
ATC
C.1
9527
216-Microbacterium sp.Ag1
388-Halogeometricum borinquense.st.ATCC.700274
329-S
alin
ispora
tro
pic
a.s
t.A
TC
C.B
AA
-916
68-T
herm
obisp
ora
bisp
ora
.st.AT
CC
.19993
46-Deinococcus m
aricopensis.st.DSM
21211
369-Therm
ofilum
pendens.
st.H
rk 5
102-S
trepto
myces r
ubello
murinus
215-Microbacterium sp.SA39406-Pyrococcus abyssi.st.GE5
34-Caldivirga maquilingensis.st.ATCC.700844
154-Cory
nebacteriu
m je
ikeiu
m.s
t.K411
47-Deinococcus soli.C
ha et al.2014
206-Xylanimonas cellulosilytica.st.DSM 15894
197-Leucobacter sp.Ag1
132-S
trepto
myc
es
grise
ofla
vus
Tu4000
233-Arthrobacter aurescens.st.TC1
319-K
utz
neria a
lbid
a.D
SM
.43870
247-Gordonia sp.KTR9
107-S
trepto
myces s
p.N
RR
L F
-6602
402-Thermococcus kodakarensis.st.ATCC.BAA-918
182-Corynebacterium ureicelerivorans
73-S
trepto
myces k
atra
e
257-Mycobacterium
parascrofulaceum ATC
C.B
AA-614
17-Microgenomates
27-Lyngbya sp.st.PCC 8106
64-L
ech
eva
lieria
aero
colo
nig
enes
204-Jonesia denitrificans.st.ATCC.14870
244-Rhodococcus erythropolis.st.PR4
322-D
ele
ted-I
4F
2P
8265-Mycobacterium
sp.VKM
Ac-1817D
341-P
atu
libact
er m
edic
am
entiv
ora
ns
51-Frankia sp.E
UN
1f
126-S
trepto
myc
es
inca
rnatu
s
172-Corynebacterium casei LMG S-19264
185-Streptosporangium roseum.st.ATCC.12428
287-A
ctin
om
yces s
p.o
ral ta
xon 1
80 s
tr.F0310
80-S
trepto
myces s
p.W
M6372
376-Natro
nobacterium gregoryi.s
t.ATCC.43098
396-Natrialba magadii.st.ATCC.43099
43-Deinococcus gobiensis.st.D
SM
21396
281-A
ctinom
yces g
raeve
nitzii C
83
423-Trichoplax adhaerens
36-Methanocella paludicola.st.DSM
17711
373-Candidatus M
agnetobacteriu
m bavaric
um
30-Thermoproteus sp.AZ2
22-Hirschia baltica.st.ATCC.4981429-Jeotgalibacillus sp.D5
256-Mycobacterium
paratuberculosis.st.ATCC.B
AA-968
212-Microbacterium testaceum.st.StLB037
293-B
ifidobacte
rium
pseudolo
ngum
PV
8-2
50-Frankia sym
biont subsp.Datisca glom
erata
285-M
obilu
ncus m
ulie
ris A
TC
C.3
5239
163-Corynebacteriu
m kutscheri
44-Deinococcus gobiensis.st.D
SM
21396
38-Paraglaciecola arctica BSs20135
114-A
ctinobacte
ria b
acte
rium
OK
006
109-S
trepto
myces v
ariegatu
s
418-Pelagibacter ubique.st.HTCC1062
378-Halobacteriu
m salinarum.st.A
TCC.700922
53-Frankia sp.st.E
uI1c
151-
Cor
yneb
acte
rium
falsen
ii.DSM
.443
53
32-Ignisphaera aggregans.st.DSM 17230
37-Paraglaciecola arctica BSs20135
49-Frankia alni.st.A
CN
14a
143-
Stre
ptom
yces
cae
lest
is
404-Thermococcus barophilus.st.DSM 11836
277-N
oca
rdia
nova
SH
22a
320-G
eoderm
ato
philu
s o
bscuru
s.s
t.A
TC
C.2
5078
395-Haloterrigena turkmenica.st.ATCC.51198
168-Corynebacteriu
m maris
.DSM.45190
125-S
trepto
myc
es
colli
nus.
st.D
SM
40733
391-Halopiger xanaduensis.st.DSM 18323
232-Arthrobacter sp.st.FB24
411-Metagenomes
296-S
card
ovia
inopin
ata
F0304
400-Crenarchaeota group-1 archaeon SG8-32-1
11-Synechococcus sp.ATCC.27264
214-Microbacterium hydrocarbonoxydans
230-Arthrobacter globiformis NBRC 12137
195-Kineococcus radiotolerans.st.ATCC.BAA-149
19-Chryseobacterium sp.ERMR1:04
86-S
trepto
myces s
p.P
VA
94-0
7
370-Lokia
rchaeum
sp.G
C14 75
128-S
trepto
myc
es
rose
och
rom
ogenus
subsp
.osc
itans
DS
12.9
76
189-Terrabacter sp.28
89
00
2.M
SD.
mut
alu
gn
a m
uir
etc
ab
odi
fiB-
60
3
399-Crenarchaeota group-6 archaeon AD8-1
299-B
ifidobacte
rium
gallic
um
.DS
M.2
0093
75-S
trepto
myces s
p.M
g1
377-Halobacteriu
m sp.DL1
72-N
ocard
iopsis
sp.N
RR
L B
-16309
136-
Strep
tom
yces
gla
uces
cens
205-Cellulomonas gilvus.st.ATCC.13127
84-S
trepto
myces v
enezuela
e.s
t.AT
CC
.10712
291-C
ate
nulis
pora
acid
iphila
.st.D
SM
44928
380-Natro
nomonas moolapensis.st.D
SM 18674
5-Methanococcoides burtonii.st.DSM 6242
15-Streptomyces pluripotens
202-Rothia dentocariosa.st.ATCC.17931
78-S
trepto
myces s
p.N
RR
L F
-4428
342-C
onexi
bact
er w
oese
i.st.D
SM
14684
340-P
ropio
nim
icro
biu
m ly
mphophilu
m A
CS
-093-V
-SC
H5
148-
Cor
yneb
acte
rium
dipht
heria
e.st.A
TCC.7
0097
1
321-B
lasto
coccus s
axobsid
ens.s
t.D
D2
122-S
trepto
myc
es
sp.P
BH
53
146-
Bre
viba
cter
ium
mcb
relln
eri A
TCC.4
9030
4-Methanococcoides methylutens MM1
386-Haloferax volcanii.st.ATCC.29605
190-Intrasporangium calvum.st.ATCC.23552
45-Deinococcus proteolyticus.st.ATC
C.35074
179-Corynebacterium marinum.DSM.44953
100-S
trepto
myces h
imasta
tinic
us A
TC
C.5
3653
245-Rhodococcus pyridinivorans SB3094
334-A
ctin
opla
nes
friu
liensi
s.D
SM
.7358
231-Renibacterium salmoninarum.st.ATCC.33209
219-Microbacterium ketosireducens
288-A
ctin
om
yces s
p.o
ral ta
xon 1
78 s
tr.F0338
349-
Aer
opyr
um p
erni
x.st
.ATC
C.7
0089
3
273-M
ycobacte
rium
hassia
cum
.DS
M.4
4199
261-Mycobacterium
marinum
.st.ATC
C.B
AA-535
405-Thermococcus litoralis.st.ATCC.51850
283-A
ctinom
yces n
euii B
VS
029A
5
108-S
trepto
myces v
ietn
am
ensis
223-Microbacterium chocolatum
135-
Strep
tom
yces
dav
awen
sis
JCM
491
3
359-
Met
hano
bacter
ium
sp.
MB1
133-
Strep
tom
yces
sp.
e14
91-S
trepto
myces c
lavulig
eru
s.s
t.AT
CC
.27064
372-Candidatu
s Magneto
bacteriu
m b
avaric
um
279-S
acch
aro
monosp
ora
marin
a X
MU
15
20
A s
uci
dyl
se
cy
mot
pert
S-6
9
63-S
acch
aro
thrix e
spanaensis.st.A
TC
C.5
1144
152-Cory
nebacteriu
m g
lycinip
hilum
AJ
3170
123-S
trepto
myc
es
hyg
rosc
opic
us
subsp
.jinggangensi
s.st
.5008
410-Metagenomes
298-B
ifidobacte
rium
therm
ophilu
m R
BL67
71-N
oca
rdio
psis d
asso
nville
i.st.AT
CC
.23218
414-Peptococcaceae bacterium SCADC1 2 3
16-Kitasatospora setae.st.ATCC.33774
171-Corynebacterium stria
tum ATCC.6940
167-Corynebacteriu
m genitaliu
m ATCC.33030
352-
The
rmog
ladi
us c
ellu
loly
ticus
.st.1
633
371-Lokia
rchaeum
sp.G
C14 75
201-Rothia mucilaginosa.st.DY-18
25-Methanobacterium lacus.st.AL-21
220-Microbacterium trichothecenolyticum
210-Sanguibacter keddieii.st.ATCC.51767
260-Myco
bacteriu
m tu
bercu
losis.st.A
TCC.25618
6-Methanosalsum zhilinae.st.DSM 4017
164-Corynebacteriu
m mustelae
69-T
herm
om
onosp
ora
curva
ta.st.A
TC
C.1
9995
139-
Strep
tom
yces
sp.
NR
RL
WC
-361
8
89-S
trepto
myces s
p.s
t.Sire
xA
A-E
121-S
trepto
myc
es
sp.M
US
C11
9T
365-Meth
anosphaera
sta
dtmanae.s
t.ATCC
.43021
203-Kocuria rhizophila.st.ATCC.9341
295-P
ara
scard
ovia
dentic
ole
ns.D
SM
.10105
417-Brachyspira sp.CAG:700
14-Streptomyces albus.st.ATCC.21838
199-Clavibacter michiganensis.st.ATCC.33113
343-Ilu
mato
bact
er co
ccin
eus
YM
16-3
04
409-Bacillus cereus.st.ATCC.14579
156-Cory
nebacteriu
m g
lucu
ronolyt
icum
ATCC.5
1866
227-Arthrobacter chlorophenolicus.st.ATCC.700700
316-A
mycola
topsis
orienta
lis H
CC
B10007
140-
Strep
tom
yces
viri
doch
rom
ogen
es
183-Corynebacterium testudinoris
103-S
accharo
thrix s
p.S
T-8
88
145-
Stre
ptom
yces
sp.
WM
6378
149-
Cor
yneb
acte
rium
resist
ens.
st.D
SM 4
5100
355-
Hyp
erth
erm
us b
utylicus
.st.D
SM
545
655-N
ocardioides sp.CF8
332-M
icro
monosp
ora
lupin
i str.L
upac
08
276-D
ietzia
cinnam
ea P
4
10-Desulfatitalea sp.BRH c12
253-Nocardia farcinica.st.IFM
10152
208-Cellulomonas flavigena.st.ATCC.482
310-B
ifid
obacte
rium
longum
.st.N
CC
2705
67-T
herm
obifid
a fu
sca.st.Y
X
325-A
ctinopla
nes m
issouriensis
.st.A
TC
C.1
4538
269-D
ele
ted- L
0J2
N8
56-Nocardioides sp.st.B
AA
-499
401-Candidatus Caldiarchaeum subterraneum
394-Natronococcus occultus SP4
353-
The
rmos
phae
ra a
ggre
gans
.st.D
SM
114
86
224-Kocuria sp.UCD-OTCP
158-Cory
nebacteriu
m a
rgento
rate
nse.D
SM.4
4202
315-S
trepto
myces s
p.A
A4
218-Microbacterium foliorum
96
7.p
s s
ec
ym
otp
ertS-
59
354-
arch
aeon
GW
2011
AR10
222-Microbacterium laevaniformans OR221
20-Citreicella sp.357
347-
Aci
doth
erm
us c
ellu
loly
ticus
.st.A
TC
C.4
3068
57-N
oca
rdio
ides lu
teus
2-Methanomethylovorans hollandica.st.DSM 15978
362-
Met
hano
brev
ibac
ter s
p.AbM
4
141-
Strep
tom
yces
sp.
AS58
59-N
aka
mure
lla m
ultip
artita
.st.ATC
C.7
00099
74-S
trepto
myces s
p.C
48-Frankia sp.st.C
cI3
213-Microbacterium azadirachtae
118-S
trepto
myces s
p.M
MG
1121
129-S
trepto
myc
es
nodosu
s
193-Mobilicoccus pelagius NBRC 104925
82-S
trepto
myces n
iveus N
CIM
B 1
1891
138-
Strep
tom
yces
sca
biei
.st.8
7.22
344-M
icro
lunatu
s phosp
hovo
rus.
st.A
TC
C.7
00054
358-
Acidi
anus
hos
pita
lis.s
t.W1
142-
Stre
ptom
yces
gha
naen
sis AT
CC.1
4672
367-Meth
anopyrus
kandle
ri.st
.AV19
10
00
RG
A s
utar
ua
se
cy
mot
per
tS-
49
12-Desulfobacterium autotrophicum.st.ATCC.43914
229-Brachybacterium sp.SW0106-09
87-S
trepto
myces s
p.N
RR
L F
-6602
236-Tsukamurella paurometabola.st.ATCC.8368
188-Janibacter sp.HTCC2649
302-G
ard
nere
lla v
agin
alis
JC
P8481B
209-Cellulomonas fimi.st.ATCC.484
115-S
trepto
myces s
p.W
M6386
113-S
trepto
myces s
p.T
u6071
178-Corynebacterium humireducens NBRC 106098
144-
Stre
ptom
yces
cya
neog
riseu
s su
bsp.
nonc
yano
genu
s
176-Corynebacterium doosanense CAU 212
177-Corynebacterium halotolerans YIM 70093
104-S
trepto
myces s
p.N
RR
L S
-495
286-A
ctin
om
yces tu
ricensis
AC
S-2
79-V
-Col4
267-Mycobacterium
obuense
241-Rhodococcus sp.B7740
33-Vulcanisaeta distributa.st.DSM 14429
364-Meth
anobacteriu
m p
aludis
.st.D
SM 2
5820
312-A
rcanobacte
rium
haem
oly
ticum
.st.A
TC
C.9
345
200-Leifsonia xyli subsp.xyli.st.CTCB07
8-Desulfurispirillum indicum.st.ATCC.BAA-1389
336-P
ropio
nib
act
erium
pro
pio
nic
um
.st.F0230a
363-
Met
hano
brev
ibac
ter s
mith
ii.st
.PS
166-Corynebacteriu
m pseudotuberculosis.st.C231
259-Mycobacterium
kansasii ATCC.12478
282-A
ctinom
yces sp
.ora
l taxo
n 4
48 str.F
0400
13-Streptomyces viridochromogenes
381-Halobacteriu
m sp.DL1
403-Thermococcus gammatolerans.st.DSM 15229
184-Corynebacterium uterequi
330-M
icro
monospora
sp.H
K10
110-S
trepto
myces v
irid
ochro
mogenes.s
t.D
SM
40736
76-S
trepto
myces s
p.W
M4235
60-S
acch
aro
polysp
ora
eryth
raea.st.A
TC
C.11
635
328-M
icro
monospora
aura
ntiaca.s
t.A
TC
C.2
7029
242-Rhodococcus aetherivorans
407-Pyrococcus furiosus.st.ATCC.43587
83-S
trepto
myces s
p.N
RR
L F
-6491
350-
Pyr
olob
us fu
mar
ii.st
.DSM
112
04
385-halophilic archaeon DL31
3-Methanolobus psychrophilus R15
170-Corynebacterium sp.KPL1855
77-S
trepto
myces s
p.N
RR
L S
-4441
27-S
trepto
myc
es
grise
oaura
ntia
cus
M045
297-S
card
ovia
wig
gsia
e F
0424
301-G
ard
nere
lla v
agin
alis
.st.A
TC
C.1
4019
413-Rhodospirillaceae bacterium BRH c57
368-Therm
ofilum
sp.1
910b
268-Mycobacterium
vanbaalenii.st.DS
M 7251
398-Halomicrobium mukohataei.st.ATCC.700874
422-Ricinus communis
186-Kytococcus sedentarius.st.ATCC.14392
314-A
ctinotignum
schaalii
240-Rhodococcus equi ATCC.33707
147-
Cor
yneb
acte
rium
mat
ruch
otii AT
CC.1
4266
40-Firmicutes bacterium
CAG
:114
8F
mu
gn
ol.p
sb
us
mu
gn
ol m
uiret
ca
bo
difiB-
90
3
389-Halorubrum lacusprofundi.st.ATCC.49239
105-S
trepto
myces s
p.C
NQ
-509
234-Micrococcus luteus.st.ATCC.4698
361-
Met
hano
brev
ibac
ter r
umin
antiu
m.s
t.ATC
C.3
5063
130-S
trepto
myc
es
coelic
olo
r.st
.ATC
C.B
AA
-471
335-A
ctin
opla
nes
sp.A
TC
C.3
1044
263-Mycobacterium
sp.EPa45
251-Gordonia rhizosphera N
BRC 16068
235-marine actinobacterium PHSC20C1
338-P
ropio
nib
act
erium
acn
es.
st.K
PA
171202
318-S
accharo
monospora
virid
is.s
t.A
TC
C.1
5386
150-
Cor
yneb
acte
rium
ure
alyt
icum
.st.A
TCC.4
3042
382-Halobacterium salinarum.st.A
TCC.700922
157-Cory
nebacteriu
m a
typicu
m
198-Frigoribacterium sp.RIT-PI-h
24-Burkholderia sp.YI23
116-S
trepto
myces a
verm
itili
s.s
t.A
TC
C.3
1267
272-M
ycobacte
rium
rhodesia
e.st.N
BB
3
119-S
trepto
myc
es
sp.M
US
C136T
246-Nocardia brasiliensis ATCC.700358
397-Halostagnicola larsenii XH-48
248-Gordonia am
arae NBRC 15530
303-B
ifidobacte
rium
scard
ovii
374-Candidatus M
agnetobacterium bavaric
um
134-
Strep
tom
yces
man
grov
isol
i
252-Nocardia cyriacigeorgica.st.G
UH-2
81-S
trepto
myces ts
ukubensis
NR
RL18488
331-V
err
uco
sisp
ora
maris.
st.A
B-1
8-0
32
392-Natrinema sp.st.J7-2
357-
Sul
folo
bus
solfa
taric
us.s
t.ATC
C.3
5092
18-Amphimedon queenslandica
383-Natronomonas pharaonis.st.A
TCC.35678
280-A
ctinom
yces m
assilie
nsis F
0489
300-B
ifidobacte
rium
dentiu
m.s
t.AT
CC
.27534
207-Isoptericola variabilis.st.225
327-A
ctinopla
nes s
p.N
902-1
09
191-Janibacter hoylei PVAS-1
239-Gordonia sputi.NBRC.100414
326-P
seudonocard
ia d
ioxaniv
ora
ns.s
t.A
TC
C.5
5486
255-Mycobacterium
xenopi RIV
M700367
264-Mycobacterium
sinense.st.JDM
601
421-Amphimedon queenslandica
21-Brevundimonas sp.AAP58
90-S
trepto
myces p
rate
nsis
.st.A
TC
C.3
3331
26-Lyngbya sp.st.PCC 8106
408-Methanocaldococcus jannaschii.st.ATCC.43067
238-Gordonia polyisoprenivorans.st.DSM 44266
393-Natrinema pellirubrum.st.DSM 15624
412-Thiothrix nivea.DSM.5205
187-Dermacoccus nishinomiyaensis
31-Pyrobaculum aerophilum
.st.ATCC.51768
311-T
ruepere
lla p
yogenes
155-Cory
nebacteriu
m k
roppenst
edtii.st
.DSM
44385
356-
Sul
folo
bus
toko
daii.st
.DSM
169
93
180-Corynebacterium imitans
317-A
mycola
topsis
mediterr
anei.st.S
699
323-K
ribbella
fla
vid
a.s
t.D
SM
17836
221-Microbacterium ginsengisoli
42-Deinococcus sw
uensis
254-Mycobacterium
elephantis
174-Corynebacterium xerosis
137-
Act
inob
acte
ria b
acte
rium
OK07
4
165-Corynebacteriu
m vitaeruminis.D
SM.20294
351-
Sta
phyl
othe
rmus
mar
inus
.st.A
TC
C.4
3588
120-S
trepto
myc
es
svic
eus
AT
CC
.29083
68
45
2.C
CT
A sil
ari
ps
ea
nit
sir
p s
ec
ym
otp
ert
S-2
9
101-K
itasato
spora
seta
e.s
t.A
TC
C.3
3774
366-Meth
anobacteriu
m la
cus.
st.A
L-21
11
0D
A.t
s.si
tc
al.
ps
bu
s sil
ami
na
mui
ret
ca
bo
difi
B-4
03
289-A
ctin
om
yces u
rogenita
lis.D
SM
.15434
271-M
ycobacte
rium
sp.st.K
MS
88-S
trepto
myces g
riseus s
ubsp.g
riseus.s
t.JC
M 4
626
1-Firmicutes bacterium CAG:582
337-P
ropio
nib
act
erium
aci
dip
ropio
nic
i.st.ATC
C.4
875
237-Rhodococcus sp.RD6.2
28-Bacillus decisifrondis
339-P
rop fre
udenre
ichii
subsp
.sherm
anii.
st.A
TC
C.9
614
192-Kineosphaera limosa NBRC 100340
41-Frankia sp.EUN1f
98-S
trepto
myces b
ingchenggensis
.st.B
CW
-1
61-A
myco
licicoccu
s subfla
vus.st.D
SM
45089
217-Microbacterium oxydans
54-Nocardioides sim
plex
294-B
ifidobacte
rium
kashiw
anohense P
V20-2
262-Mycobacterium
haemophilum
345-
Act
inob
acte
ria b
acte
rium
IMC
C26
256
375-Rhizophagus irr
egularis D
AOM 197198w
228-Brachybacterium faecium.st.ATCC.43885
23-Burkholderia vietnamiensis.st.G4
Tree scale
9
Supplementary Figure 7. Phylogenetic analyses of NucS N-terminal (NT) region. Unrooted ML
tree of NT sequences from the PFAM PF01939 domain. Black branches, Actinobacteria; red
Deinococcus-Thermus; light blue, Archaea. Labels indicate species name. The circles represent
>80% bootstrap in 1,500 replicates.
115-Bifidobacterium_breve.DSM.20213
34-Streptom
yces_albus.st.ATCC.21838
166-Natronomonas_pharaonis.st.ATCC.35678
97-Amycolicicoccus_subflavus.st.DSM_45089
289-Gordonia_amarae.NBRC.15530
254-Curtobacterium_flaccumfaciens_UCD-AKU
273-Mycobacterium_hassiacum.DSM.44199
108-Bifidobacterium_bifidum_BGN4
42-Streptomyces_sp.NRRL_S-495
84-Amycolatopsis_orientalis_HCCB10007
262-
325-Coryneba
cterium
_vitaerum
inis.DS
M.2029
4
6-Streptomyces_lydicus_A02
131-Halomicrobium_mukohataei.st.ATCC.700874
308-Corynebacterium_humireducens.NBRC.106098_=.DSM.45392
159-Methanobrevibacter_
smithii.st.PS
247-Arthrobacter_sp.PAMC25486
120-Actinomyces_sp.oral_taxon_848_str.F0332
285-Gordonia_sputi.NBRC.100414
126-halophilic_archaeon_DL31
161-Methanosphaera_stadtmanae.st.ATC
C.43021
179-Halogeometricum_borinquense.st.ATCC.700274
12-Streptomyces_sp.NRRL_WC-3618
337-Coryne
bacterium_je
ikeium.st.K4
11
143-Na
tronoba
cterium
_grego
ryi.st.AT
CC.43098
344-Propionibacterium_acidipr
opionici.st.ATCC.4875
151-Pyroco
ccus_abyss
i.st.GE5
93-Nocardia_nova_SH22a
216-Microbacterium
_ketosireducens
354-Thermobifida_fusca.st.YX
279-Mycobacterium_bovis
226-Cellulomonas_gilvus.st.A
TCC.13127
40-Streptomyces_caelestis
358-Frankia_sp.st.EuI1c
208-Actinomyces_sp.oral_taxon_178_str.F0338
170-Halomicrobium_mukohataei.st.ATCC.700874
43-Streptomyces_rubellomurinus
76-Streptomyces_zinciresistens_K42
1-Streptomyces_bingchenggensis.st.BCW-1
299-Corynebacterium_xerosis
275-Mycobacterium_marinum.st.ATCC.BAA-535
315-Corynebacterium_aurimucosum.st.ATCC.700975
51-Actinoplanes_misso
uriensis.st.A
TCC.14538
78-Pseudonocardia_dioxanivorans.st.ATCC.55486
293-Rhodococcus_sp.B7740
101-Brevibacterium_mcbrellneri.ATCC.49030
230-Iso
ptericola_variabilis.st.2
25
21-Actinobacteria_bacterium_OK006
74-Actinobacteria_bacterium_OV320
242-Brachybacterium_faecium.st.ATCC.43885
133-Halopiger_xanaduensis.st.DSM_18323
45-Verrucosisp
ora_maris.st.A
B-18-032
105-Gardnerella_vaginalis_JCP8481B
220-Microbacterium
_sp.SA39
110-Bifidobacterium_pseudolongum_PV8-2
272-Mycobacterium_neoaurum_VKM_Ac-1815D
227-Cellulomonas_flavigena.st.A
TCC.482
152-Methan
ocaldococc
us_jannasc
hii.st.ATCC
.43067
33-Streptom
yces_glaucescens
46-Micromonospora_sp.HK10
212-Microbacterium
_laevaniformans_O
R221
165-Methanobacterium_sp.MB1
66-Streptomyces_sp.PBH53
198-Methanocella_paludicola.st.DSM_17711
318-Corynebacterium_glutamicum.st.ATCC.13032
319-Corynebacterium_argentoratense.DSM.44202
180-Haloferax_volcanii.st.ATCC.29605
326-Coryneba
cterium
_kutscheri
79-Saccharopolyspora_erythraea.st.ATCC.11635
13-Streptomyces_avermitilis.st.ATCC.31267
69-Streptomyces_pratensis.st.ATCC.33331
209-Actinomyces_sp.oral_taxon_180_str.F0310
122-Halobacterium_salinarum.st.ATCC.700922
7-Streptomyces_auratus_AGR0001
50-Actinoplanes_sp.st.A
TCC.31044
304-Turicella_otitidis.ATCC.51513
39-Streptomyces_griseoflavus_Tu4000
113-Bifidobacterium_kashiwanohense_PV20-2
169-Haloarcula_marismortui.st.ATCC.43049
184-Caldivirga_maquilingensis.st.ATCC.700844
144-Ha
lobacter
ium_sa
linarum
.st.ATCC
.700922
47-Micromonospora_sp.NRRL_B-16802
11-Streptomyces_sp.AS58
250-Arthrobacter_chlorophenolicus.st.ATCC.700700
67-Streptomyces_hygroscopicus_subsp.jinggangensis.st.5008
53-Micromonospora_lupini_str.Lupac_08
27-Streptom
yces_sp.C
17-Streptomyces_tsukubensis_NRRL18488
96-Stackebrandtia_nassauensis.st.DSM_44728
200-Conexibacter_woesei.st.DSM_14684
215-Microbacterium
_trichothecenolyticum
291-Rhodococcus_aetherivorans
162-Methanothermobacter_thermautotrophicus.st.ATCC.2909
6
44-Saccharothrix_sp.ST-888
360-Frankia_symbiont_subsp.Datisca_glomerata
38-Streptom
yces_griseus_subsp.griseus.st.JCM_4626
119-Rothia_mucilaginosa.st.DY-18
132-Halovivax_ruber.st.DSM_18193
248-Arthrobacter_globiformis.NBRC.12137
61-Streptomyces_mangrovisoli
52-Actinoplanes_friuliensis.DSM.7358
62-Streptomyces_cyaneogriseus_subsp.noncyanogenus
98-Kutzneria_albida.DSM.43870
30-Streptom
yces_sp.Mg1
37-Streptom
yces_vietnamensis
48-Micromonospora_aurantiaca.st.A
TCC.27029
300-Corynebacterium_sp.ATCC.6931
332-Cory
nebacteri
um_varia
bile.st.D
SM_4470
2
71-Streptomyces_sp.MMG1121
362-Streptomyces_violaceusniger_Tu_4113
234-Austwickia
_chelonae.NBRC.105200
172-Natrinema_pellirubrum.st.DSM_15624
218-Microbacterium
_azadirachtae
178-halophilic_archaeon_DL31
194-Deinococcus_swuensis
14-Streptomyces_nodosus
245-Mobiluncus_curtisii.st.ATCC.43063
25-Streptomyces_variegatus
100-Kytococcus_sedentarius.st.ATCC.14392
160-Methanobrevibacter_sp.Ab
M4
357-Frankia_sp.st.CcI3
127-Haloarcula_marismortui.st.ATCC.43049
134-Halostagnicola_larsenii_XH-48
189-Staphylothermus_marinus.st.ATCC.43588
309-Corynebacterium_marinum.DSM.44953
139-Therm
ofilum_pe
ndens.st.H
rk_5
295-Rhodococcus_equi.ATCC.33707
5-Streptomyces_sp.769
298-Dietzia_cinnamea_P4
345-Propionimicrobium_lymphophilum_AC
S-093-V-SCH5
322-Corynebacterium_pseudotuberculosis.st.C231
258-Leifsonia_xyli_subsp.xyli.st.CTCB07
4-Streptomyces_rimosus_subsp.rimosus.ATCC.10970
111-Bifidobacterium_dentium.st.ATCC.27534
123-Halorubrum_lacusprofundi.st.ATCC.49239
190-Hyperthermus_butylicus.st.DSM_5456
241-Brachybacterium_sp.SW0106-09
361-Frankia_sp.st.EAN1pec
103-Nocardioides_luteus
36-Streptom
yces_sp.NRRL_F-6491
228-Jonesia_denitrifica
ns.st.A
TCC.14870
367-Streptomyces_sp.Tu6071
181-Haloquadratum_walsbyi.st.DSM_16790338-Coryneba
cterium_lipoph
iloflavum.DSM
.44291
233-Kineosphaera_limosa.NBRC.100340
114-Bifidobacterium_animalis_subsp.lactis.st.AD011
60-Streptomyces_sp.e14
321-Corynebacterium_diphtheriae.st.ATCC.700971
333-Cory
nebacteri
um_glyci
niphilum_
AJ_3170
70-Streptomyces_incarnatus
349-Nocardiopsis_sp.NRRL_B-16309
264-Mycobacterium_sp.st.KMS
288-Gordonia_rhizosphera.NBRC.16068
252-Arthrobacter_arilaitensis.st.DSM_16368
83-Amycolatopsis_mediterranei.st.S699
195-Deinococcus_gobiensis.st.DSM_21396
31-Streptom
yces_sp.WM6372
290-Gordonia_sihwensis.NBRC.108236
137-Haloquadratum_walsbyi.st.DSM_16790
278-Mycobacterium_tuberculosis.st.ATCC.25618
94-Nocardia_brasiliensis.ATCC.700358
281-Mycobacterium_haemophilum
327-Co
ryneba
cterium
_haloto
lerans_
YIM_70
093
282-Mycobacterium_paratuberculosis.st.ATCC.BAA-968
294-Rhodococcus_erythropolis.st.PR4
121-Actinomyces_neuii_BVS029A5
125-Haloferax_volcanii.st.ATCC.29605
77-Streptomyces_cattleya.st.ATCC.35852
246-Kocuria_sp.UCD-OTCP
155-Thermoco
ccus_barophilu
s.st.DSM_1183
6
206-Actinotignum_schaalii
22-Streptomyces_sp.W
M6378
365-Streptomyces_sp.NRRL_F-6602
55-Catenulispora_acidiphila.st.DSM_44928
259-Streptomyces_sp.AA4
263-Mycobacterium_elephantis
138-Thermofilum_sp.1910b
91-Nocardia_farcinica.st.IFM_10152
168-Halalkalicoccus_jeotgali.st.DSM_18796
146-Me
thanoce
lla_palu
dicola.st.DS
M_1771
1
204-Actinomyces_m
assiliensis_F0489
302-Corynebacterium_ureicelerivorans
323-Coryneba
cterium
_matruchotii.ATCC.14266
297-Rhodococcus_pyridinivorans_SB3094
202-Actinomyces_sp.oral_taxon_175_str.F0384
346-Propionibacterium_propionicum.st.F0230a
148-Acid
ianus_ho
spitalis.st
.W1
320-Corynebacterium_efficiens.st.DSM_44549
154-Thermoc
occus_gamm
atolerans.st.D
SM_15229
92-Nocardia_cyriacigeorgica.st.GUH-2
229-Cellulomonas_fimi.st.A
TCC.484
140-miscellaneou
s_Crenarchae
ota_group
-6_archae
on_AD8-1
54-Salinispora_tropica.st.ATCC.BAA-916
49-Actinoplanes_sp.N902-109
185-Thermoproteus_sp.AZ2
188-Thermosphaera_aggregans.st.DSM_11486
106-Bifidobacterium_thermophilum_RBL67
329-Co
rynebac
terium_s
p.KPL1
989
142-Halobacteriu
m_sp.DL1
28-Streptom
yces_katrae
102-Aeromicrobium_marinum.DSM.15272
219-Microbacterium
_sp.Ag1
336-Coryne
bacterium_
resistens.s
t.DSM_4510
0
210-Arcanobacterium
_haemolyticum
.st.ATCC.9345
277-Mycobacterium_parascrofulaceum.ATCC.BAA-614
213-Microbacterium
_testaceum.st.StLB037
312-Corynebacterium_casei_LMG_S-19264
117-Nocardioides_sp.st.BAA-499
348-Nocardiopsis_dassonvillei.st.ATCC.23218
328-Co
rynebac
terium_t
estudino
ris
271-Mycobacterium_smegmatis.st.ATCC.700084
239-Kineococcus_radiotolerans.st.ATCC.BAA-149
58-Streptomyces_niveus_NCIMB_11891
301-Corynebacterium_epidermidicanis
72-Streptomyces_sp.MUSC136T
368-Streptomyces_clavuligerus.st.ATCC.27064
355-Acidothermus_cellulolyticus.st.ATCC.43068
225-Xylanimonas_cellulosilytica
.st.DSM_15894
158-Methanobrevibac
ter_ruminantium.st.AT
CC.35063
317-Corynebacterium_uterequi
211-Trueperella_pyogenes
221-Microbacterium
_hydrocarbonoxydans
99-Nakamurella_multipartita.st.ATCC.700099
356-Frankia_alni.st.ACN14a
196-Deinococcus_proteolyticus.st.ATCC.35074
116-Nocardioides_sp.CF8
59-Streptomyces_pluripotens
2-Streptomyces_sp.CNQ-509
141-miscellaneou
s_Crenarchae
ota_group
-1_archae
on_SG8-32-1
81-Saccharomonospora_marina_XMU15
244-Kocuria_rhizophila.st.ATCC.9341
65-Streptomyces_roseochromogenus_subsp.oscitans_DS_12.976
217-Microbacterium
_ginsengisoli
269-Mycobacterium_vanbaalenii.st.DSM_7251
128-Natronomonas_moolapensis.st.DSM_18674
193-Marinithermus_hydrothermalis.st.DSM_14884
75-Streptomyces_viridochromogenes
347-Microlunatus_phosphovorus.st.ATCC.700054
145-Me
thanoce
lla_arvo
ryzae.st
.DSM_2
2066
359-Frankia_sp.EUN1f
10-Streptomyces_griseoaurantiacus_M045
9-Streptomyces_pristinaespiralis.ATCC.25486
223-Microbacterium_oxydans
237-Janibacter_hoylei_PVAS-1
306-Corynebacterium_glucuronolyticum.ATCC.51866
118-Rothia_dentocariosa.st.ATCC.17931
23-Streptomyces_sviceus.ATC
C.29083
176-Natronococcus_occultus_SP4
56-Kitasatospora_setae.st.ATCC.33774
311-Corynebacterium_doosanense_CAU_212_=.DSM.45436
296-Rhodococcus_jostii.st.RHA1
173-Natronobacterium_gregoryi.st.ATCC.43098
174-Natrinema_sp.st.J7-2
270-Mycobacterium_obuense
283-Mycobacterium_sp.EPa45
183-Vulcanisaeta_distributa.st.DSM_14429
32-Streptom
yces_sp.NRRL_S-444
268-Mycobacterium_thermoresistibile.ATCC.19527
149-Sulfo
lobus_ac
idocaldari
us.st.ATC
C.33909
314-Corynebacterium_camporealensis
8-Streptomyces_xiamenensis
340-
366-Streptomyces_sp.st.SPB074
164-Methanobacterium_paludis.st.DSM_25820
186-Pyrobaculum_aerophilum.st.ATCC.51768
257-Frigoribacterium_sp.RIT-PI-h
207-Actinomyces_turicensis_AC
S-279-V-Col4
87-Saccharothrix_espanaensis.st.ATCC.51144
205-Actinomyces_sp.oral_taxon_448_str.F0400
240-Beutenbergia_cavernae.st.ATCC.BAA-8
109-Bifidobacterium_gallicum.DSM.20093
286-Gordonia_bronchialis.st.ATCC.25592
182-Ignisphaera_aggregans.st.DSM_17230
197-Thermogladius_cellulolyticus.st.1633
334-Cory
nebacteriu
m_urealyt
icum.st.AT
CC.43042
260-Tsukamurella_paurometabola.st.ATCC.8368
112-Bifidobacterium_adolescentis.st.ATCC.15703
80-Saccharomonospora_glauca_K62
64-Streptomyces_viridochromogenes.st.DSM_40736
305-Corynebacterium_genitalium.ATCC.33030
15-Streptomyces_sp.MMG1533
86-Actinosynnema_mirum.st.ATCC.29888
324-Coryneba
cterium
_mustelae
351-Streptosporangium_roseum.st.ATCC.12428
287-Gordonia_sp.KTR9
167-Natronomonas_moolapensis.st.DSM_18674
82-Saccharomonospora_viridis.st.ATCC.15386
104-Nocardioides_simplex
187-Candidatus_Caldiarchaeum_subterraneum
316-Corynebacterium_striatum.ATCC.6940
352-Thermobispora_bispora.st.ATCC.19993
20-Streptomyces_sp.W
M4235
280-Mycobacterium_leprae.st.TN
343-Propionibacterium_f
reudenreichii_subsp.sher
manii.st.ATCC.9614
147-Sul
folobus_s
olfataricu
s.st.ATC
C.35092
214-Microbacterium
_chocolatum
156-Thermococc
us_litoralis.st.AT
CC.51850
235-Mobilico
ccus_pelagius.NBRC.104925
364-Streptomyces_sp.PVA_94-07
129-Natronomonas_pharaonis.st.ATCC.35678
171-Halostagnicola_larsenii_XH-48
339-Coryneba
cterium_kropp
enstedtii.st.DS
M_44385
267-Mycobacterium_xenopi_RIVM700367
255-Clavibacter_michiganensis_subsp.sepedonicus.st.ATCC.33113
342-Geodermatophilu
s_obscurus.st.ATCC.
25078
107-Bifidobacterium_angulatum.DSM.20098
153-Pyrococ
cus_furiosus
.st.ATCC.43
587
199-archaeon_GW2011_AR10
150-Sulfo
lobus_toko
daii.st.DSM
_16993
191-Deinococcus_maricopensis.st.DSM_21211
89-Lechevalieria_aerocolonigenes
157-Thermococcu
s_kodakarensis.st.
ATCC.BAA-918
276-Mycobacterium_kansasii.ATCC.12478
19-Actinobacteria_bacterium_OV450
3-Streptomyces_sp.NRRL_F-6602
249-Arthrobacter_sp.st.FB24
73-Streptomyces_sp.WM6386
238-Janibacter_sp.HTCC2649
253-Leucobacter_sp.Ag1
307-Corynebacterium_atypicum
135-Natrialba_magadii.st.ATCC.43099
57-Streptomyces_davawensis_JCM_4913
256-marine_actinobacterium_PHSC20C1
85-Nocardia_sp.NRRL_S-836
335-Coryne
bacterium_
falsenii.DSM
.44353
266-Mycobacterium_rhodesiae.st.NBB3
63-Streptomyces_collinus.st.DSM_40733
35-Actinobacteria_bacterium
_OK074
68-Streptomyces_sp.st.SirexAA-E
203-Actinomyces_urogenitalis.DSM
.15434
350-Nocardiopsis_alba.st.ATCC.BAA-2165
284-Gordonia_polyisoprenivorans.st.DSM_44266
303-Corynebacterium_imitans
331-Ilum
atobacte
r_coccin
eus_YM1
6-304
330-Act
inobacte
ria_bac
terium_I
MCC26
256
236-Dermacoccus_nishinomiyaensis
261-Mycobacterium_abscessus.st.ATCC.19977
16-Streptomyces_venezuelae.st.ATCC.10712
310-Corynebacterium_maris.DSM.45190
243-Micrococcus_luteus.st.ATCC.4698
29-Streptom
yces_sp.NRRL_F-4428
313-Corynebacterium_sp.KPL1855
224-Sanguibacter_keddieii.st.A
TCC.51767
201-Actinomyces_graevenitzii_C83
163-Methanobacterium_lacus.st.AL-21
353-Thermomonospora_curvata.st.ATCC.19995
124-Halogeometricum_borinquense.st.ATCC.700274
95-Kribbella_flavida.st.DSM_17836
251-Renibacterium_salmoninarum.st.ATCC.33209
175-Halopiger_xanaduensis.st.DSM_18323
231-Terrabacter_sp.28
88-Saccharothrix_sp.NRRL_B-16348
24-Streptomyces_sp.M
USC119T
363-Streptomyces_himastatinicus.ATCC.53653
292-Rhodococcus_sp.RD6.2
130-Halorhabdus_tiamatea_SARL4B
274-Mycobacterium_sp.VKM_Ac-1817D
90-Segniliparus_rotundus.st.ATCC.BAA-972
222-Microbacterium
_foliorum
341-Blastococcus
_saxobsidens.st.D
D2
26-Streptom
yces_ghanaensis.ATCC.14672
192-'Deinococcus_soli'_Cha_et_al._2014
41-Streptomyces_coelicolor.st.A
TCC.BAA-471
136-Natronobacterium_gregoryi.st.ATCC.43098
265-Mycobacterium_sinense.st.JDM601
177-Halorubrum_lacusprofundi.st.ATCC.49239
232-Intrasporangium_calvum.st.A
TCC.23552
18-Streptomyces_scabiei.st.87.22
Tree scale: 1
ACTINOBACTERIA A
RCHAEA
ARCHAEA
Deinococcus
Deinococcus
10
Supplementary Tables
Supplementary Table 1. Mutation rates of M. smegmatis and its ∆nucS derivatives. Rates of
spontaneous mutations conferring rifampicin (Rif-R) and streptomycin resistance (Str-R) of M.
smegmatis mc2
155 (WT), M. smegmatis ∆nucS and M. smegmatis ∆nucS complemented with the
wild-type nucS from M. smegmatis mc2
155 (nucSSm). Fold change indicates the increase in
mutation rate with respect to the strain M. smegmatis mc2
155 (set to 1). Mut Rate: mutation rate
(mutations per cell per generation).
Strain Genotype Mut Rate
Rif 95% CI Fold
Mut Rate
Str 95% CI Fold
mc2 155 WT 2.07x10‐9 1.47‐2.74x10‐9 1 5.58x10‐10 2.25‐11.1x10‐10 1
∆nucS ∆nucS 3.11x10‐7 2.81‐3.39x10‐7 150.2 4.82x10‐8 3.53‐6.17x10‐8 86.3
∆nucS/nucSSm Complemented 3.02x10‐9 2.2‐3.97x10‐9 1.46 1.43x10‐9 0.77‐2.33x10‐9 2.56
11
Supplementary Table 2. Mutational spectrum of M. smegmatis mc2 155 and its ∆nucS derivative.
A) Spontaneous mutations conferring rifampicin resistance found in 41 independent Rif-R mutants in
the WT and ∆nucS strains. The columns show the position of the mutations in the rpoB gene sequence,
the codon change (modified bases are in bold), the amino acid change caused by each mutation and the
number of independent Rif-R mutants isolated from mc2 155 (WT) or its ΔnucS derivative. B) Specificity
of base substitutions, summarized by class, produced in the WT and ∆nucS strains.
A)
Position
Codon change
Amino acid
change
WT
ΔnucS
A 1295 G
GAC→GGC
Asp 432 Gly
0
1
C 1324 T
CAC→TAC
His 442 Tyr
6
6
C 1324 G
CAC→GAC
His 442 Asp
2
0
A 1325 G
CAC→CGC
His 442 Arg
13
20
A 1325 C
CAC→CCC
His 442 Pro
4
0
G 1334 A
CGT →CAT
Arg 445 His
0
6
G 1334 C
CGT→CCT
Arg 445 Pro
1
0
G 1334 T
CGT→CTT
Arg 445 Leu
2
0
C 1340 T
TCG→TTG
Ser 447 Leu
9
6
C 1340 G
TCG→TGG
Ser 447 Trp
2
0
T 1346 C
CTG→CCG
Leu 449 Pro
0
2
G 1348 A
GGC→AGC
Gly 450 Ser
1
0
Δ CCAGCTGTC
(1275-1283)
CCAGCTGTC
Ser 425 Arg
+ ΔGlnLeuSer
(426-428)
1
0
TOTAL
41
41
12
B)
Mutation WT (%)
n=41
∆nucS (%)
n=41
G:C→A:T 16 (39.0) 18 (43.9)
A:T→G:C 13 (31.7) 23 (56.1)
G:C→T:A 2 (4.9) 0
A:T→T:A 0 0
G:C→C:G 5 (12.2) 0
A:T→C:G 4 (9.7) 0
Deletions 1 (2.4) 0
13
Supplementary Table 3. Mutation rates of S. coelicolor A3(2) M145 and its ∆nucS
derivatives. Rates of spontaneous mutations conferring rifampicin (Rif-R) and streptomycin
resistance (Str-R) of S. coelicolor A3(2) M145, its ∆nucS derivative and S. coelicolor ∆nucS
complemented with the wild-type nucS from S. coelicolor (nucSSco). 95% confidence intervals (CI)
are indicated. Mut Rate: mutation rate (mutations per cell per generation).
Strain Genotype Mut Rate
Rif-R 95% CI Fold
Mut Rate
Str-R 95% CI Fold
S. coelicolor
A3(2) M145 WT 5.11x10
-9 2.87-8.01x10
-9 1 4.30x10
-10 0.24-18.9x10
-10 1
∆nucS ∆nucS 5.54x10-7
4.20-6.97x10-7
108.4 8.49x10-8
5.78-11.4x10-8
197.4
∆nucS/nucSSco Complemented 2.01x10-9
0.83-3.88x10-9
0.5 4.30x10-10
0.24-18.9x10-10
1
14
Supplementary Table 4. Characteristics of the M. tuberculosis representative strains
containing NucS polymorphisms. Genome ID/common name, NucS polymorphism, resistance
profile, lineage and origin of the strains are shown. Resistance profile indicates not
detected/unknown antibiotic resistances (Susceptible) or MDR, multidrug-resistant strain
(expressing at least rifampicin and isoniazid resistance).
GenomeID/
name Polymorphism
Resistance
profile Lineage Origin
CDC1551 WT Susceptible 4 North America
TKK_02_0079 S39R MDR 4 South Africa
MTB_N1057 S54I Susceptible 4 South Asia
KT-0040 A67S Susceptible 2 S. Korea (Broad Inst)
ERR036236 V69A Susceptible 1 Unknown
BTB 04-388 A135S MDR 3 Sweden (Broad Inst)
BTB 07-246 R144S MDR 4 Sweden (Broad Inst)
TKK_03_0044 D162H Susceptible 4 South Africa
HN2738 T168A Unknown Unknown Unknown (Broad Inst)
MTB_X632 K184E MDR 4 Central America
15
Supplementary Table 5. Effect of M. tuberculosis NucS naturally occurring polymorphisms on
mutation rates. Rates of spontaneous mutations conferring rifampicin resistance (Mut rate,
mutations per cell per generation) of M. smegmatis ∆nucS complemented with the wild-type nucS
from M. tuberculosis (nucSTB) or the 9 polymorphic alleles. 95% confidence intervals (CI) are
shown. Fold change indicates the increase in mutation rate with respect to the strain M. smegmatis
∆nucS complemented with wild-type nucSTB (∆nucS/nucSTB), set to 1.
Amino acid
change Codon change Mut rate 95% CI
Fold
change
M. smegmatis
∆nucS/nucSTB
nucSTB from
CDC1551 4.17x10
-9 3.29-5.12x10
-9 1
S39R AGC→AGG 3.49x10-7
2.93-4.0x10-7
83.7
S54I AGT→ATT 7.94x10-9
5.49-11.07x10-9
1.9
A67S GCG→TCG 2.78x10-9
1.87-3.84x10-9
0.7
V69A GTG→GCG 8.77x10-9
6.50-11.2x10-9
2.1
A135S GCG→TCG 3.57x10-8
2.77-4.38x10-8
8.6
R144S CGC→AGC 3.16x10-8
2.40-4.0x10-8
7.6
D162H GAC→CAC 8.98x10-9
6.42-11.80x10-9
2.2
T168A ACC→GCC 3.96x10-8
3.09-4.81x10-8
9.5
K184E AAG→GAG 3.06x10-8
2.37-3.75x10-8
7.3
16
Supplementary Table 6. Sequence of oligonucleotides used in this work.
Oligonucleotide Sequence (5’-3’) Purpose
SecMycomar CCCGAAAAGTGCCACCTAAATTGTAAGCG Localization of Tn insertion site
DelnucSm5F CCCGCTGCAGCTGGCCGAGTTCGG nucSSm M. smegmatis deletion
DelnucSm5R CGGTAAGCTTGGCTATCACGAGGCGCACCC nucSSm M. smegmatis deletion
DelnucSm3F CGGAAAGCTTAGCGACGAGTACCGGCTCTT nucSSm M. smegmatis deletion
DelnucSm3R AATCGTCGACCGAACCCATCAACTTACCGA nucSSm M. smegmatis deletion
CompnucTBF ACTGGAATTCTCGAGTGGTGGCCTTCTCGGATGGCAT nucSTB for ΔnucSSm complementation
CompnucTBR ACTGAAGCTTTCAGAACAGCCGGTACTCGCCGCT nucSTB for ΔnucSSm complementation
CompnucSmF ACTGGAATTCCCGCGCCAGCGAATTGTCGGCGTTCAT nucSSm for ΔnucSSm complementation
CompnucSmR CGGCAAGCTTTCAGAAGAGCCGGTACTCGTCGCT nucSSm for ΔnucSSm complementation
RifRRDRrpoBF
GTGGCGGCGATCAAGGAGTTCTTC RRDR-rpoBSm amplification
RifRRDRrpoBR
GGCGACCGACACCATCTGGCGCGG RRDR-rpoBSm amplification
DelnucSco5F
GTGTAAGCTTCGGCACCGCGGTGAGTGTGC nucSSco S. coelicolor deletion
DelnucSco5R CGACGGATCCGCGGGCAATGACGAGACGCA nucSSco S. coelicolor deletion
DelnucSco3F GCTGGGATCCTTCTGAGGGCGCGACGCGTC nucSSco S. coelicolor deletion
DelnucSco3R CAGGGATATCTGCCCGCCCTGGTCGGCGAG nucSSco S. coelicolor deletion
CompnucScoF TACGGAATTCGGGTTCTCCTCTCGCACCCCCGACCAGCAGGGG nucSSco for ΔnucSSco complementation
CompnucScoR TACGGGATCCTCAGAACAGCCGCAGCTTGTCGTCCTCGATGCC nucSSco for ΔnucSSco complementation
Rechyg3F CAGTTTCATTTGATGCTCGATGAG Recombination. pRhomyco 100%
Rechyg3R GACTAACTAGTCAGGCGCCGGGGGCGGTGT Recombination. pRhomyco 100%
Rechyg5F GATGCAGCTGGGAGTGCGCTGTGACACAAGAATCC Recombination. pRhomyco 100%
Rechyg5R CGATAAGCTTCGAATTCTGCAGCTCG Recombi. pRhomyco 100%, 95%, 90% and
85%
Rechyg5intR GATGTGATCACCGGGTCGGGCTCG Recombi. pRhomyco 95%, 90% and 85%
Rec95-BclIF GATGTGATCAAGCTGTTCGGGGAGCACTG
Recombination. pRhomyco 95%
Rec95-NheIR
GATGGCTAGCGAAGTCGACGATCCCGGTGA Recombination. pRhomyco 95%
Rec90-85-BclIF GATGTGATCAAGCTCTTCGGGGAACACTG Recombination. pRhomyco 90% and 85%
Rec90-85-PciIR GATGACATGTGAAGTCGACGATCCCGGTGA Recombination. pRhomyco 90% and 85%
S39RF GGCCGACGGATCGGTCAGGGTACATGCTGACGACCG Site-directed mutagenesis nucSTB
S39RR CGGTCGTCAGCATGTACCCTGACCGATCCGTCGGCC Site-directed mutagenesis nucSTB
S54IF CCGTTGAACTGGATGATTCCGCCGTGCTGGTTG Site-directed mutagenesis nucSTB
17
S54IR CAACCAGCACGGCGGAATCATCCAGTTCAACGG Site-directed mutagenesis nucSTB
A67SF AGAGTCCGGCGGCCAGTCGCCAGTGTGGGTGGTCG Site-directed mutagenesis nucSTB
A67SR CGACCACCCACACTGGCGACTGGCCGCCGGACTCTTC Site-directed mutagenesis nucSTB
V69AF CCGGCGGCCAGGCGCCAGCGTGGGTGGTCGAGAAC Site-directed mutagenesis nucSTB
V69AR GTTCTCGACCACCCACGCTGGCGCCTGGCCGCCGG Site-directed mutagenesis nucSTB
A135SF GCCGCGAGTACATGACCTCGATCGGACCCGTCGAC Site-directed mutagenesis nucSTB
A135SR GTCGACGGGTCCGATCGAGGTCATGTACTCGCGGC Site-directed mutagenesis nucSTB
A162HF GCGGCGTGGCGAGATCCACGGCGTGGAGCAGCTGAC Site-directed mutagenesis nucSTB
A162HR GTCAGCTGCTCCACGCCGTGGATCTCGCCACGCCGC Site-directed mutagenesis nucSTB
T168AF GCGTGGAGCAGCTGGCCCGCTACCTCGAGTTGC Site-directed mutagenesis nucSTB
T168AR GCAACTCGAGGTAGCGGGCCAGCTGCTCCACGC Site-directed mutagenesis nucSTB
K184EF GTGCTCGCGCCGGTCGAGGGGGTGTTTGCCG Site-directed mutagenesis nucSTB
K184ER CGGCAAACACCCCCTCGACCGGCGCGAGCAC Site-directed mutagenesis nucSTB
pvv16-seq-fwd CGGTGAGTCGTAGGTCGGGACGG PCR amplification and sequencing
pvv16-seq-rev TGCCTGGCAGTCGATCGTACGCTAG PCR amplification and sequencing
18
Supplementary Data.
Supplementary Data 1. Taxonomic distribution of NucS in reference proteomes (Big excel
data file).
Supplementary Data 2. Taxonomic tree for phylogenetic profiling in newick format. Raw
NCBI profiling tree from archaeal and bacterial species extracted from NCBI in newick format,
corresponding to Figure 5.
Supplementary Data 3. Phylogenetic tree of full NucS in newick format. Raw Maximum
Likelihood and bootstrap phylogenetic tree corresponding to Supplementary Figure 5 (full NucS) in
newick format.
Supplementary Data 4. Detailed information for phylogenetic trees of NucS regions. Sequence
details of NucS-CT and NucS-NT labels on Supplementary Figures 6-7 trees (Excel file).
Supplementary Data 5. Phylogenetic tree of C-terminal NucS in newick format. Raw
Maximum Likelihood and bootstrap phylogenetic tree corresponding to Supplementary Figure 6
(NucS-CT region) in newick format.
Supplementary Data 6. Phylogenetic tree of N-terminal NucS in newick format. Raw
Maximum Likelihood and bootstrap phylogenetic tree corresponding to Supplementary figure 7
(NucS-NT region) in newick format.
19
Supplementary Methods.
Generation of a ΔnucS knockout mutant in M. smegmatis.
To generate a ΔnucS knockout mutant in M. smegmatis, a 1.0 kb PstI-HindIII upstream and HindIII-
SalI downstream fragments of nucS were amplified by PCR, using delnucSm 5F, 5R, 3F and 3R
primers, and cloned in-frame into the p2NIL vector. Then, a pGOAL19 PacI-cassette, carrying a β-
galactosidase lacZ gene, a hygromycin-resistance hyg gene and a sacB gene, that confers sucrose
sensitivity, was also inserted into the p2NIL vector. The resulting plasmid p2NIL-ΔnucSSm,
harbouring an in-frame deletion of the target gene (lacking ≈ 95% of the gene sequence), was
electroporated into M. smegmatis mc2 155. Cells were plated on Middlebrook 7H10 agar-X-gal
(100 µg ml-1
) plus kanamycin (25 µg ml-1
) and hygromycin (50 µg ml-1
) and incubated for 4-5 days
at 37ºC. Single-crossover merodiploid clones were grown in 7H9 broth without antibiotics to allow
a second crossover event. Cultures were diluted and counter-selected on Middlebrook 7H10-X-gal
(100 µg ml-1
) plates containing 2% sucrose and incubated 4-5 days at 37ºC. Double-crossover
clones were tested for kanamycin and hygromycin susceptibility to confirm the loss of the plasmid.
Finally, M. smegmatis mc2 155 ΔnucS colonies were tested by PCR and sequencing to verify the
unmarked nucS deletion.
Complementation of the M. smegmatis ΔnucS mutant.
For complementation with nucS from M. smegmatis mc2 155, the MSMEG_4923 gene (nucSSm),
including its own 46-bp promoter region, was amplified by PCR with primers compnucSmF and
compnucSmR primers, digested with EcoRI and HindIII and cloned into the integrative vector
pMV361, rendering the complementation vector pMV-nucSSm. Similarly, for complementation with
nucS from M. tuberculosis, the wild-type full-length MT_1321 gene from CDC1551 control strain
(nucSTB), with its own 61-pb upstream promoter region, was amplified by PCR, using compnucTBF
and compnucTBR primers, cloned into the pMV361 vector to generate the complementation vector,
pMV-nucSTB. Putative complemented mutants were obtained upon electroporation of the plasmids
into M. smegmatis mc2 155 ΔnucS and incubation of the plated samples on Middlebrook 7H10 agar
plus kanamycin (25 µg ml-1
) for 3-5 days at 37ºC. Finally, M. smegmatis mc2 155 ΔnucS
complemented mutants were analysed by PCR and sequencing to verify the proper insertion of the
genes.
20
Generation of ΔnucS S. coelicolor knockout mutant and complementation.
pIJ6650 vector was used to clone in-frame two 1.5 kb DNA fragments, one HindIII-BamHI nucS
upstream fragment plus one BamHI-EcoRV nucS downstream fragment, previously amplified by
PCR (primers delnucSco 5F, 5R, 3F and 3R). E. coli ET12567 (pUZ8002) was transformed with
pIJ-ΔnucSSco (containing the in-frame deletion of the nucSSco gene) and conjugated with S.
coelicolor A3(2) M145. Following the isolation of apramycin resistant single-crossover, putative
double-crossover mutants were isolated and the unmarked deletion of the nucSSco gene was verified
by PCR. S. coelicolor ΔnucS was complemented with a wild-type copy of nucSSco cloned in
pSET152. pSET152 is a non-replicative plasmid in Streptomyces that carries the attP site and the
integrase gene of the C31 phage and consequently can integrate into the attB site 2 in the
chromosome of Streptomyces. nucSSco gene with its own promotor was amplified by PCR with
primers compnucScoF and compnucScoR, cloned into pSET152 using EcoRI and BamHI sites to
give the pSET-nucSSco plasmid that was introduced into E. coli ET12567 (pUZ8002) by
transformation. Finally, pSET-nucSSco from E. coli ET12567 (pUZ8002, pSET-nucSSco) was
introduced into S. coelicolor ΔnucS by conjugation in order to integrate the construction into the
chromosome.
Construction of pRhomyco plasmids.
To create a template for measuring homologous and homeologous recombination in M. smegmatis,
we designed a recombination assay based on the hygromycin-resistance (Hyg-R) gene hyg, using
the integrative plasmid pMV361. For pRhomyco 100%, a fragment denominated hyg 3’ (from
nucleotide 195 of the coding sequence to the stop codon) of the hyg gene, was PCR amplified from
pRAM vector, using rechyg3F and rechyg3R primers and digested with EcoRV/SpeI to be cloned in
StuI/SpeI targets of pMV361. Additionally, a fragment denominated hyg 5´ containing 711 bp
(from the start codon of the hyg gene to nucleotide 711) was PCR amplified and cloned using
PvuII/HindIII with rechyg5F and rechyg5R primers. Both fragments share two overlapping hyg
fragments of 517 bp (nucleotide 195 to 711). The homeologous recombination vectors, named
pRhomyco 95%, 90% and 85%, were constructed by replacing the overlapping 517-bp fragment of
hyg 5´ for synthetic fragments that have 95%, 90% or 85% sequence similarity to the original hyg
fragment (Supplementary Fig. 2). First, a common fragment to the three of them (from nucleotide 1
to 193 of hyg) was PCR-amplified with rechyg5F and rechyg5intR primers and cloned using
PvuII/BclI. Then, each synthetic fragment was PCR amplified and cloned using BclI/NheI with
rec95-BclIF and rec95-NheIR primers, in the case of pRhomyco 95%. For pRhomyco 90% and
21
85%, the variable fragment was cloned using BclI/PcI-BspHI with rec90-85-BclIF and rec90-85-
PciIR primers.
Computational analyses.
A summary of all the computational approaches conducted is depicted in Supplementary Fig. 4. For
NucS, we used the structure-based alignment of bacterial and archaeal NucS (Supplementary Fig.
3). Then we followed the procedure depicted in Supplementary Fig. 4 to conduct domain analyses
using sequences.
Using the different defined regions by structural bioinformatics, we first conducted sequence
searches against the large database. We made non-redundant alignments of the N-terminal region
(containing 63 sequences) and the C-terminal region (containing 39 sequences) and built profile
hidden Markov models (HMMs). These profiles were searched using pHMMER 3 against the large
References Proteomes file where additional proteins containing CT and NT regions in alternative
proteins were found. While the NT region was found in alternative archaeal proteins (we selected
two after checking the alignment F7PKA0_9EURY and L0I7R5_HALRX (DUF91, PF01939), the
CT region was found not only in additional prokaryotic groups, but also in eukaryotic sequences
(I1F1Q3_AMPQE, I1F1Q5_AMPQE, B3SFT8_TRIAD, B9T981_RICCO, and
A0A015J6U4_9GLOM). With the exception of A0A015J6U4, and B9T981, the matching regions
are located within a DUF1016 domain (uncharacterized). We checked the alignments to confirm
that the catalytic residues, as described in the structure of P. abyssi 1, were conserved.
We next searched the PFAM database with NUCS_MYCTU and found a domain, PF01939, which
was trained in three NucS full sequences. From the PF01939 domain (539 sequences in 464
species), only 368 entries in 363 (archaeal and bacterial) species gave a match with the full protein
(Supplementary data file 1).
To identify MutS and MutL in bacterial and archaeal species, we generated HMMs from different
proteins, which would capture most of the bacterial and archaeal diversity (Supplementary Fig. 4).
The MutS profile was trained with the following sequences (MUTS_ECOLI, C1F256_ACIC5,
MUTS_AQUAE, MUTS_BACTN, MUTS_CHLPN, MUTS_CHLTE, MUTS_CHLAA,
WP_027388865.1, MUTS_PROM3, B5YE41_DICT6, MUTS_BACSU, MUTS_STAAM,
MUTS_CLOTE, MUTS_FUSNN, C1AEM6_GEMAT, A6DUF8_9BACT, MUTS_RHOBA,
MUTS_BRUME, MUTS_NEIMA, MUTS_SALTY, MUTS_VIBC3, MUTS_PSEAE,
MUTS_DESVH, MUTS_TREPA, A0A075WU36_9BACT, MUTS_THEMA, WP_009958798.1),
while the MutL profile was trained with the following sequences (MUTL_ACIC5,
MUTL_AQUAE, MUTL_BACTN, MUTL_CHLPN, MUTL_CHLTE, A9WJ86_CHLAA,
22
MUTL_DEIRA, MUTL_DICT6, D9S9H6_FIBSS, MUTL_BACSU, MUTL_STAAM,
MUTL_CLOTE, Q8RG56_FUSNN, MUTL_GEMAT, A6DHB3_9BACT, Q7UMZ3_RHOBA,
MUTL_BRUME, MUTL_NEIMA, A0A0C5TQ33_SALTM, MUTL_VIBCH, MUTL_PSEAE,
Q72ET5_DESVH, MUTL_TREPA, A0A075WSF3_9BACT, MUTL_THEMA, and
MUTL_ECOLI). The models were searched against each reference proteome where bit scores <50
were discarded to filter and analyze the results. Only full proteins were retrieved by forcing a length
> 75%, so partial hits would be excluded.
For actinobacterial and archaeal species not showing NucS, searches in all the particular genomes
(only for complete genomes) were done using translated searches against their genomes using
tblastn with NucS_MYCTU, MutS_ECOLI, and MUTL_ECOLI for actinobacterial, and
NucS_PYRAB, MutS_HALMA, and MUTL_HALSA for archaeal proteins.
23
References for Supplementary Information
1 Ren, B. et al. Structure and function of a novel endonuclease acting on branched DNA
substrates. The EMBO journal 28, 2479‐2489, doi:emboj2009192 [pii]
10.1038/emboj.2009.192 (2009).
2 Bierman, M. et al. Plasmid cloning vectors for the conjugal transfer of DNA from
Escherichia coli to Streptomyces spp. Gene 116, 43‐49 (1992).
3 Eddy, S. R. A new generation of homology search tools based on probabilistic inference.
Genome Inform 23, 205‐211 (2009).