57
A non-canonical mismatch repair pathway in prokaryotes Article (Accepted Version) http://sro.sussex.ac.uk Castañeda-García, A, Prieto, A I, Rodríguez-Beltrán, J, Alonso, N, Cantillon, D, Costas, C, Pérez- Lago, L, Zegeye, E D, Herranz, M, Plociński, P, Tonjum, T, García de Viedma, D, Paget, M, Waddell, S J, Rojas, A M et al. (2017) A non-canonical mismatch repair pathway in prokaryotes. Nature Communications, 8. a14246. ISSN 2041-1723 This version is available from Sussex Research Online: http://sro.sussex.ac.uk/66997/ This document is made available in accordance with publisher policies and may differ from the published version or from the version of record. If you wish to cite this item you are advised to consult the publisher’s version. Please see the URL above for details on accessing the published version. Copyright and reuse: Sussex Research Online is a digital repository of the research output of the University. Copyright and all moral rights to the version of the paper presented here belong to the individual author(s) and/or other copyright owners. To the extent reasonable and practicable, the material made available in SRO has been checked for eligibility before being made available. Copies of full text items generally can be reproduced, displayed or performed and given to third parties in any format or medium for personal research or study, educational, or not-for-profit purposes without prior permission or charge, provided that the authors, title and full bibliographic details are credited, a hyperlink and/or URL is given for the original metadata page and the content is not changed in any way.

A non-canonical mismatch repair pathway in prokaryotessro.sussex.ac.uk/66997/4/Garcia_NucS.pdf · A non-canonical mismatch repair pathway in prokaryotes ... A non-canonical mismatch

Embed Size (px)

Citation preview

A non-canonical mismatch repair pathway in prokaryotes

Article (Accepted Version)

http://sro.sussex.ac.uk

Castañeda-García, A, Prieto, A I, Rodríguez-Beltrán, J, Alonso, N, Cantillon, D, Costas, C, Pérez-Lago, L, Zegeye, E D, Herranz, M, Plociński, P, Tonjum, T, García de Viedma, D, Paget, M, Waddell, S J, Rojas, A M et al. (2017) A non-canonical mismatch repair pathway in prokaryotes. Nature Communications, 8. a14246. ISSN 2041-1723

This version is available from Sussex Research Online: http://sro.sussex.ac.uk/66997/

This document is made available in accordance with publisher policies and may differ from the published version or from the version of record. If you wish to cite this item you are advised to consult the publisher’s version. Please see the URL above for details on accessing the published version.

Copyright and reuse: Sussex Research Online is a digital repository of the research output of the University.

Copyright and all moral rights to the version of the paper presented here belong to the individual author(s) and/or other copyright owners. To the extent reasonable and practicable, the material made available in SRO has been checked for eligibility before being made available.

Copies of full text items generally can be reproduced, displayed or performed and given to third parties in any format or medium for personal research or study, educational, or not-for-profit purposes without prior permission or charge, provided that the authors, title and full bibliographic details are credited, a hyperlink and/or URL is given for the original metadata page and the content is not changed in any way.

  1 

A non-canonical mismatch repair pathway in prokaryotes

A. Castañeda-García1,2

, A. I. Prieto1, J. Rodríguez-Beltrán

1, N. Alonso

3, D. Cantillon

4, C.

Costas1, L. Pérez

5, E. D. Zegeye

6, M. Herranz

5, P. Plociński

2,#, T. Tonjum

6, D. García de

Viedma5, M. Paget

7, S.J. Waddell

4, A. M. Rojas

8*, A. J. Doherty

2* and J. Blázquez

1,3,9*

1 Instituto de Biomedicina de Sevilla (IBIS)-CSIC. Stress and bacterial evolution .Sevilla,

Spain.

2 Genome Damage and Stability Centre, School of Life Sciences, University of Sussex,

Brighton, BN1 9RQ, UK. 3 Centro Nacional de Biotecnología-CSIC. Madrid, Spain.

4 Brighton and Sussex Medical School, University of Sussex, Brighton, BN1 9PX, UK.

5 Servicio de Microbiología Clínica y Enfermedades Infecciosas, Hospital Gregorio Marañón,

Madrid, Spain. 6 Department of Microbiology, Oslo University Hospital, Rikshospitalet, Oslo, Norway and

Department of Microbiology, University of Oslo, Oslo, Norway. 7School of Life Sciences, University of Sussex, Brighton BN1 9QG, UK

8 Computational Biology and Bioinformatics, Instituto de Biomedicina de Sevilla (IBIS)-

CSIC. 9 Unit of Infectious Diseases, Microbiology, and Preventive Medicine. University Hospital

Virgen del Rocio, Sevilla, Spain. # Present address: Institute of Biochemistry and Biophysics, Polish Academy of Sciences,

Warsaw, Poland.

*Correspondence to: Jesús Blázquez ([email protected]), Aidan J. Doherty

([email protected]) and Ana M. Rojas for computational Biology analyses

([email protected]).

  2 

Abstract

Mismatch Repair (MMR) is a near ubiquitous pathway, essential for the

maintenance of genome stability. Members of the MutS and MutL protein families perform

key steps in mismatch correction. Despite the major importance of this repair pathway, MutS-

MutL are absent in almost all Actinobacteria and many Archaea. However, these organisms

exhibit rates and spectra of spontaneous mutations similar to MMR-bearing species,

suggesting the existence of an alternative to the canonical MutS-MutL-based MMR. We

report that Mycobacterium smegmatis NucS/EndoMS, a putative endonuclease with no

structural homology to known MMR factors, is required for mutation avoidance and anti-

recombination, hallmarks of the canonical MMR. Furthermore, phenotypic analysis of

naturally occurring polymorphic NucS in a M. smegmatis surrogate model, suggests the

existence of M. tuberculosis mutator strains. The phylogenetic analysis of NucS indicates a

complex evolutionary process leading to a disperse distribution pattern in prokaryotes.

Together, these findings indicate that distinct pathways for MMR have evolved at least twice

in nature.

  3 

Cells ensure the maintenance of genome stability and a low mutation rate using a

plethora of DNA surveillance and correction processes. These include base selection,

proofreading, mismatch repair (MMR), base/nucleotide excision repair, recombination repair

and non-homologous end-joining 1. The recognized MMR pathway is highly conserved

among the three domains of life 2. In all cases, members of the MutS and MutL protein

families perform key steps in mismatch correction. This MutS-MutL-based MMR system

(canonical MMR) is a sophisticated DNA repair pathway that detects and removes incorrect

mismatched nucleobases. Mismatched base pairs in DNA arise mainly as a result of DNA

replication errors due to the incorporation of wrong nucleobases by DNA polymerases. To

correct them, the system detects incorrect although chemically normal bases that are

mismatched with the complementary strand, discriminating between the parental template and

the newly synthesized strand 3.

Loss of this activity has very important consequences, such as high rates of mutation

(hypermutability) and increased recombination between non perfectly identical

(homeologous) DNA sequences 4, both hallmarks of MMR inactivation. Nevertheless, despite

the quasi-ubiquitous nature of this pathway, there are some important exceptions. The

genomes of many Archaea, including Crenarchaeota and a few groups of Euryarchaeota, and

almost all members of the bacterial phylum Actinobacteria, including Mycobacterium, have

been shown to possess no identifiable MutS or MutL homologs 5-7

. However, these

prokaryotes exhibit rates and spectra of spontaneous mutations similar to canonical MMR-

bearing bacterial species 8-10

, suggesting the existence of unidentified mechanisms responsible

for mismatch repair.

To identify novel mutation avoidance genes in mycobacteria, we performed a genetic

screen and discovered that inactivation of the MSMEG_4923 gene in Mycobacterium

smegmatis, encoding a homologue of an archaeal endonuclease dubbed NucS (for nuclease

specific for ssDNA) 11

, produced a hypermutable phenotype. NucS from Pyrococcus abyssi

was initially identified as a member of a new family of novel structure-specific DNA

endonucleases in Archaea 11

. Notably, a recent report revealed that the protein NucS from the

archaeal species Thermococcus kodakarensis (renamed as EndoMS) is a mismatch-specific

endonuclease, acting specifically on dsDNA substrates containing mismatched bases 12

. The

possibility that a non-canonical mismatch repair could be triggered by a specific double-

strand break (DSB) endonuclease, able to act at the site of a mispair, followed by DSB repair

was hypothesized 30 years ago 13

. The archaeal NucS binds and cleaves both strands at

dsDNA mismatched substrates, with the mispaired bases in the central position, leaving 5-

  4 

nucleotide long 5´-cohesive ends 12

. Consequently, it has been suggested that these specific

double-strand breaks may promote the repair of mismatches, acting in a novel MMR process

12. Moreover, very recently, the structure of the T. kodakarensis NucS-mismatched dsDNA

complex has recently been determined, strongly supporting the idea that NucS acts in a

mismatch repair pathway 14

. Although these biochemical and structural studies have shed

some light on the role of NucS at the molecular level, the cellular and biological function of

NucS and its impact on genome stability remains unknown.

Hypermutable bacterial pathogens, very often associated with defects in MMR

components, are frequently isolated and pose a serious risk in many clinical infections 15-21

.

The major pathogen Mycobacterium tuberculosis appears to be genetically isolated, acquires

antibiotic resistance exclusively through chromosomal mutations 22 and presents variability in

mutation rates between strains 23

. These factors should contribute to the selection of

hypermutable M. tuberculosis strains under antibiotic pressure, as it occurs with other chronic

pathogens 18,23

. However, hypermutable strains have not yet been detected in this pathogen.

To investigate the possible existence of M. tuberculosis hypermutable strains affected in NucS

activity, we analysed the effect of naturally occurring polymorphisms in nucS from M.

tuberculosis clinical isolates on mutation rates, using M. smegmatis as a surrogate model.

In this study, we first establish that NucS is essential for maintaining DNA stability,

by preventing the acquisition of DNA mutations in Mycobacterium and Streptomyces.

Inactivation of nucS in M. smegmatis produces specific phenotypes that mimic those of

canonical MMR-null mutants (increased mutation rate, biased mutational spectrum and

increased homeologous recombination). Finally, we conduct here distribution and

phylogenetic analyses of NucS across the prokaryotic domains to understand its evolutionary

origins.

  5 

Results

Identification and characterization of M. smegmatis nucS

To identify novel mutation avoidance genes in mycobacteria, a M. smegmatis mc2 155

library of ~11,000 independent transposon insertion mutants was generated and screened for

spontaneous mutations that confer rifampicin resistance (Rif-R), used as a hypermutator

hallmark (Fig. 1A). One transposon insertion, which inactivated the MSMEG_4923 (nucS)

gene, conferred a strong hypermutable phenotype. The NucS/EndoMS translated protein

sequence is 27% identical to NucS from the hyperthermophilic archaeal Pyrococcus abyssi

and 87% to that of M. tuberculosis (Fig. 1B). P. abyssi NucS was defined as the first member

of a new family of structure-specific endonucleases with a mismatch-specific endonuclease

activity 11,12

containing an N-terminal DNA-binding domain and a C-terminal RecB-like

nuclease domain 11

. All the catalytic residues required for nuclease activity in P. abyssi are

conserved in the mycobacterial NucS (see Fig. 1B).

Recombinant M. smegmatis NucS was expressed and purified. The apparent

molecular mass of the purified native protein fits with the expected size (25 kDa)

(Supplementary Fig. 1A). The biochemical activity was analysed by DNA electrophoretic

mobility shift assays (EMSAs) and nuclease enzymatic assays. M. smegmatis NucS was

capable of binding to single-stranded DNA (ssDNA), as seen in the archaeal P. abyssi NucS

11, but not to double-stranded DNA (dsDNA) (Fig. 1C). Regarding cleavage of mismatched

substrates, no significant specific cleavage activity was observed on different substrates

containing a single nucleotide mismatch in the central region of the strand (Supplementary

Fig. 1B), in contrast with previous results obtained with the archaeal T. kodakarensis NucS 12

.

The binding to ssDNA indicates that the protein is correctly folded, as it can bind to its

substrate. This suggests that different requirements, such as the binding of additional partners,

protein modifications or different DNA substrates, may be required to activate the mismatch-

specific cleavage activity of the bacterial NucS.

NucS is essential to maintain low levels of spontaneous mutation

To verify that NucS has a key role in maintaining DNA fidelity, we constructed an in-

frame deletion of nucS (ΔnucS, nucS-null derivative) in M. smegmatis and measured the rate

at which drug resistance was acquired. Notably, the nucS-null strain displayed a hypermutable

phenotype, increasing the rate of spontaneously emerging rifampicin and streptomycin (Str-R)

resistances by a factor of 150-fold and 86-fold, respectively, above the wild-type strain

  6 

(3.1x10-7

vs 2.1x10-9

and 4.8x10-8

vs 5.6x10-10

, ΔnucS vs wild-type, respectively). These

results are equivalent to those observed for mutS- or mutL-deficient E. coli (102-10

3-fold

increases) 24

. Significantly, basal mutation rates were recovered when the nucS deletion was

chromosomally complemented with the wild-type gene MSMEG_4923 (nucSSm) (Fig. 2A and

Supplementary Table 1), confirming that inactivation of this gene was responsible for the high

mutation rates observed.

Notably, M. smegmatis mc2

155 and its ΔnucS derivative produced a different

mutational signature. While spontaneous Rif-R mutations detected in the wild-type strain

comprise different base substitutions, including transitions, but also transversions and even an

in-frame deletion, all mutations detected in the ΔnucS strain were specifically transitions

(A:T→G:C or G:C→A:T) (Fig. 2B and Supplementary Tables 2A-B). Transitions occur much

more frequently than transversions or indels in any cell, due to spontaneous or induced errors

in the DNA, and are the preferred substrates for MMR mechanism. The ΔnucS strain

accumulates a high number of uncorrected transitions, masking transversions and indels to

undetectable levels under our experimental conditions. This transition-biased mutational

spectrum is a hallmark signature of canonical MMR pathway 25,26

.

To demonstrate the generality of the mutation-avoidance activity in species encoding

NucS, the nucS gene was deleted in Streptomyces coelicolor A3(2), a different actinobacterial

species. The precise deletion of nucS in this species increased the rate of spontaneous

mutations conferring Rif-R and Str-R by a factor of 108-fold and 197-fold, respectively. As

with M. smegmatis, low mutation rates were recovered by complementation with the wild-

type S. coelicolor nucS gene (Fig. 2C and Supplementary Table 3). Together, these results

highlight the essential role of NucS in mutation avoidance.

NucS inhibits homeologous but not homologous recombination

These results prompted us to investigate whether NucS is involved in reduction of

recombination between non-identical (homeologous) DNA sequences, but not between 100%

identical, as described for the canonical MMR-null mutants in other bacterial species 4,27. The

rates of recombination between homologous and homeologous sequences were measured

using specific engineered tools (Fig. 3A and Supplementary Fig. 2A-B). Recombination rates

between 100% identical sequences were similar in the wild-type and its ΔnucS derivative

(2.42x10-6

and 2.86x10-6

, respectively). However, when the identity of DNA sequences

decreased, the recombination rate was comparatively higher in the nucS–null derivative: 95%

  7 

4.68x10-7

vs 1.41x10-6

(3-fold difference), 90% 1.71x10-8

vs 1.82x10-7

(10-fold) and 85%

6.47x10-9

vs 3.36x10-8

(5-fold) for wild-type versus ΔnucS, respectively (Fig. 3B). We

verified that recombinants detected were exclusively due to recombination events but not

spontaneous mutation (see Methods). The inhibitory effect of NucS on homeologous but not

homologous recombination again resembles an important tell-tale signature of a canonical

MMR pathway 28,29

.

NucS polymorphisms in M. tuberculosis clinical strains

Once the possibility of hypermutability was demonstrated in M. smegmatis, we

searched for the existence of M. tuberculosis hypermutable clinical isolates by nucS

inactivation. Analysis of the nucSTB sequences from ∼1,600 clinical M. tuberculosis strains

available in the databases revealed a total of 9 missense SNPs (Supplementary Table 4 and

Supplementary Fig. 3). The effects of these polymorphisms on NucS activity were

experimentally analysed by checking their impact on mutation rates, using a M. smegmatis

heterologous system, as previously reported for other mutagenesis studies 30

. Ten nucSTB

alleles (9 polymorphic plus the wild-type) were integrated into the chromosome of M.

smegmatis ΔnucS. Complementation with wild-type nucSTB restored low mutation rates to M.

smegmatis ΔnucS. However, five alleles increased mutation rate significantly (Fig. 4,

Supplementary Table 5), with allele S39R presenting the strongest observed mutator

phenotype (83-fold increase). Alleles A135S, R144S, T168A and K184E produced increases

in mutation rates close to one order of magnitude, whereas alleles S54I, A67S, V69A, and

D162H produced low increases (∼2) or no changes. These results suggest the existence of

hypermutable M. tuberculosis clinical strains affected by modulation of NucS activity.

NucS taxonomic distribution

Taken together, our results compellingly suggest a functional connection between

NucS and known MMR pathways. We analysed the species distribution of NucS taking also

into account the canonical MMR proteins, MutS and MutL (for MutS and MutL analyses see

Supplementary Methods). A total of 3,942 reference proteomes were scanned for

presence/absence of NucS (Supplementary Data 1 and Supplementary Fig. 4). Bacteria are

represented by 2,709 proteomes (68%) and Archaea by 132 (4%), the remaining 1,101 (28%)

belonging to Eukaryota and Virus. NucS is present in 370 organisms, from the domains

  8 

Archaea (60 species) and Bacteria (310 species) (Supplementary Data 1). It is totally absent

from eukaryotes and virus.

To highlight the distribution of NucS in all organisms, we built a phylogenetic profile

of NucS. Figure 5 shows this profile mapped onto the NCBI taxonomy tree, containing 2,186

bacterial and archaeal species (see also Supplementary Data 2). The NucS distribution pattern

indicates that this protein exhibits a disperse distribution (Fig. 5 and Supplementary Data 1).

In Bacteria, the phylum Actinobacteria is the one containing the majority of NucS with

300 species in the class Actinobacteria (only 2 exceptions) and 3 species in other classes of

the phylum (Supplementary Data 1), Conexibacter woesei (class Thermoleophilia),

Patulibacter medicamentivorans (class Thermoleophilia) and Ilumatobacter coccineus (class

Acidimicrobiia). All the analysed members of the class Coriobacteriia, from the

Actinobacteria phylum, lack NucS and MutS-MutL proteins, while the two species from the

class Rubrobacteria lack NucS, but present MutS-MutL. The pattern is even more disperse in

Archaea. From 132 species, 60 have NucS (21 also contain MutS-MutL). NucS, but not

MutS-MutL, is present in 17 out of 24 species of the phylum Crenarchaeota and in 18 out of

88 of the phylum Euryarchaeota. By contrast, MutS-MutL (but not NucS) is completely

absent in crenarchaeotal species while it is restricted to 29 euryarchaeotal species

(Supplementary Data 1).

Interestingly, only 28 organisms, 21 halobacterial species (domain Archaea) and 7

species of the phylum Deinococcus-Thermus (domain Bacteria), have both NucS and MutS-

MutL sequences. Notably, when we focused our analysis in the two main NucS-containing

groups (Actinobacteria and Archaea), some species still lack both NucS and MutS-MutL

(confirmed by protein translated searches -tblastn- in Actinobacteria and Archaea,

Supplementary Data 1).

A model for the origin and evolution of NucS

Our results support a complex evolutionary ancestry for NucS. Comprehensive protein

sequence analyses of NucS indicate that the protein contains two distinct regions, a DNA-

binding N-terminal domain (NucS-NT) and an endonuclease C-terminal domain (NucS-CT)

(Supplementary Figs. 3-4), in agreement with previous structural studies 11

. At the sequence

level, using profile hidden Markov models (HMMs), we also detected these regions in

alternative proteins outside the context of NucS, supporting the original independence of these

domains, which have been subsequently fused during evolution.

  9 

To understand the particular distribution observed in the phylogenetic profile, we

conducted sequence and phylogenetic independent analyses of full NucS and also the N-

terminal and C-terminal regions (Supplementary Fig. 5-7). The actinobacterial NucS

representatives are well separated from those of Archaea. On the other hand, the NucS from

Deinococcus-Thermus species group together with the archaeal proteins, instead of the

actinobacterial orthologues, suggesting that NucS has recently been transferred from Archaea

to these Deinococcus-Thermus species (Supplementary Figs. 5-7).

The sequence analyses provide different distributions for the two regions of NucS.

While the N-terminal region is limited exclusively to Archaea and some Bacteria

(Supplementary Fig. 7), the C-terminal region was found in many archaeal species, some

Bacteria, and in a few eukaryotes (Supplementary Fig. 6) and, importantly, also in different

domain architectures. These observations, in the context of our phylogenetic analyses, suggest

that both regions may have emerged in Archaea. Subsequently, the C-terminal region may

have been transferred to some bacterial species and to a few eukaryotes, and got fixed in

different protein contexts. The full protein and the individual domains likely got lost in certain

groups during the evolution of Archaea. However, the reconstruction of the precise evolution

of these individual domains requires deeper analyses in the context of related protein

domains.

Phylogenetic analysis of the full protein suggests that NucS emerged after the archaeal

divergence by a rearrangement of these domains and then it was transferred from Archaea to

certain Deinococcus-Thermus species in at least one event. For Actinobacteria, the most

parsimonious explanation suggests a horizontal transfer event of NucS from Archaea coupled

with a likely MutS-MutL loss event in the last common ancestor of Actinobacteria (as most of

the contemporary NucS-encoding species lack MutS and MutL). This is consistent with the

fact that we did not find any species from the entire Bacteria group, except those from

Actinobacteria and the Deinococcus-Thermus group, having NucS or NucS-like proteins. Our

model (Fig. 6) favours the simplest scenario to explain the complex pattern distribution of the

full NucS protein and its absence in all eukaryotes, the vast majority of bacteria, and some

archaeal organisms without invoking unlikely massive losses. This is in agreement with our

phylogenetic observations.

  10 

Discussion

This study provides genetic and biological evidence that establish the existence of a

DNA repair system that mimics the canonical MutS-MutL-based MMR pathway.

To date, the correction of mismatched nucleobases has been defined as a highly

conserved mechanism whose key steps are performed, in all cases, by members of the MutS

and MutL protein families. Despite the genomes of Crenarchaeota, a few groups of

Euryarchaeota and almost all members of the phylum Actinobacteria lacking identifiable

MutS or MutL homologues 5-7

, they exhibit rates and spectra of spontaneous mutations similar

to canonical MMR-bearing bacterial species 8-10

, suggesting the existence of still undetected

pathway responsible for this type of correction. Although recent biochemical and structural

reports suggested the existence of a novel mismatch-specific endonuclease, NucS, in the

archaeal species T. kodakarensis 12,14

that is able to recognize and cleave mismatched bases in

double-strand DNA in vitro, no genetic and/or biological evidence had been reported to date

on the activity of a novel mismatch repair pathway.

The screening of a large library of M. smegmatis mutants revealed that NucS is a key

mutation avoidance component in Actinobacteria. Through genetic and biological analysis,

we demonstrate that the nucS-null phenotypes in M. smegmatis are almost identical to those

produced by the MMR deficiency in other bacteria (very high mutation rates, transition-biased

mutational spectrum and increased homeologous recombination rates). The anti-mutator

nature of NucS is also demonstrated in Streptomyces coelicolor, a different species of the

class Actinobacteria. Therefore, NucS appears to be an important DNA repair factor which,

together with the high-fidelity DNA-polymerase DnaE1 and its PHP domain-proofreader 30

,

maintain low mutation rates (~10-10

mutations per base per generation 9,10

), ensuring genome

stability and DNA fidelity in mycobacteria and in other Actinobacteria.

Although single nucleotide polymorphisms (SNPs) in DNA repair/replication genes

have been suggested as a source of hypermutation in M. tuberculosis 31,32

, only small

increases in mutation rates have been observed due to these polymorphic genes (e.g.

polymorphisms in PHP exonuclease domain of the DNA-polymerase DnaE1 30

). Our results

suggest the existence of naturally occurring hypermutable M. tuberculosis variants with

diminished NucS activity. Whether or not this is relevant for the virulence and adaptation to

antibiotic treatments remains to be deciphered.

Although we observed that purified mycobacterial NucS binds to single-stranded but

not to double-stranded DNA, as described for archaeal NucS proteins 11,12

, no significant

specific cleavage activity was observed on mismatched substrates, suggesting that activation

  11 

by other partners and/or modifications (e.g. post-translational modifications) is required in M.

smegmatis. Also, the possibility of a functional difference between bacterial and archaeal

NucS cannot be ruled out. Therefore, additional studies are needed to assess whether

mycobacterial and archaeal NucS proteins have the same functional requirements.

Our computational studies support an archaeal origin for NucS, as previously

suggested 12

, built upon two distinct domains that suffered a complex evolutionary history of

transfers and/or losses, including at least two HGT events to Actinobacteria and Deinococcus-

Thermus. Indeed, the contribution of horizontal gene transfer in bacterial and archaeal

evolution is well-established 33,34

. Interestingly, NucS and MutS-MutL systems seem to be

present alternatively in different species, with only a few exceptions. In Actinobacteria, it is

possible that the acquisition of NucS may have facilitated the subsequent loss of MutS-MutL,

as these canonical proteins are so widely conserved across Bacteria.

On the other hand, there are two groups (Halobacteria and some Deinococcus-

Thermus) where NucS and MutS-MutL coexist. Notably, in a Halobacteria, Halobacterium

salinarum, inactivation of mutS or mutL produced no hypermutability 35

, suggesting that MutS

and MutL are redundant to an alternative system that controls spontaneous mutation. This

indicates that evolution of these particular species may have opted to keep both systems. The

possible interplay between both pathways remains to be elucidated. Surprisingly, NucS and

MutS-MutL are apparently absent in some species. Therefore, the possible existence of

additional alternative MMR repair pathways, yet to be identified, cannot be discarded.

In conclusion, we propose that MMR is a mechanism that can be either accomplished

by either the MutS-L or the NucS pathway. Understanding the mechanisms and pathways that

influence genome stability may unveil new strategies to predict and combat the development

of drug resistance. In addition, engineered Mycobacterium strains lacking this mutation-

avoidance pathway, may be valuable tools for evaluating anti-tuberculosis treatments,

including new drugs, drug combinations and non-antibiotic based regimes.

  12 

Methods

Bacterial strains, media and growth conditions.

M. smegmatis wild-type strain mc2 155 and its mutant derivatives were grown at 37ºC

in Middlebrook 7H9 broth or Middlebrook 7H10 agar (Difco) with 0.5% glycerol and 0.05%

Tween 80, and enriched with 10% albumin-dextrose-catalase (Difco). Escherichia coli strains

were cultured at 37ºC in LB medium. Streptomyces coelicolor A3(2) M145 and its derivatives

were grown at 30ºC on mannitol–soya (MS) agar.

All primers used in this work are listed in Supplementary Table 6.

Generation and screening of a M. smegmatis insertion library

Transposon ΦMycoMarT7 36

was used to obtain a M. smegmatis mutant library of

11,000 independent clones. For transduction, M. smegmatis mc2 155 cultures were washed,

mixed with the phage stock at a multiplicity of infection of 1:10 (37ºC, 3 h) and plated on

kanamycin (25 µg ml-1

). The insertion mutants were isolated and inoculated into 96-well

microtiter plates containing Middlebrook 7H9 medium. To identify hypermutators, each

mutant was inoculated onto 7H10 agar plates with rifampicin (20 µg ml-1

). One transposon

mutant, producing a high number of rifampicin resistant (Rif-R) colonies, was selected for

further characterization. The transposon insertion site in the chromosome was determined by

sequencing.

Generation of a ΔnucS knockout mutant in M. smegmatis

M. smegmatis ΔnucS (MSMEG_4923, nucSSm) in-frame deletion mutant was generated

by allelic replacement 37

. Briefly, p2NIL-ΔnucSSm, harbouring an in-frame deletion of the

target gene, was electroporated into M. smegmatis mc2 155. Single-crossover merodiploid

clones were isolated and after that, counter-selected to generate the unmarked deletion by a

second crossover event. Finally, putative M. smegmatis ΔnucS colonies were checked by PCR

and sequencing. For additional details see Supplementary Methods.

Complementation of the M. smegmatis ΔnucS mutant

For complementation, a wild type of the MSMEG_4923 gene (nucSSm) and the wild-

type full-length MT_1321 gene from CDC1551 control strain (nucSTB), including their own

promoter regions, were cloned into the vector pMV361 38

and introduced into M. smegmatis

ΔnucS by electroporation. pMV361 is an integrative vector that carries a site-specific

  13 

integration system derived from the mycobacteriophage L5. It integrates at a single site in the

M. smegmatis chromosome, the attB, which overlaps the 3´ end of the tRNA-glycine gene 39

.

Integration at the appropriate site was verified for all constructs by PCR. For additional details

see Supplementary Methods.

Generation of ΔnucS S. coelicolor knockout mutant and complementation

S. coelicolor ΔnucS (gene SCO5388, nucSSco) in-frame deletion mutant was

constructed by double crossover recombination using the plasmid pIJ6650 40

. pIJ-ΔnucSSco

(containing the in-frame deletion of the nucSSco gene) was introduced in S. coelicolor A3(2)

M145 to generate the unmarked deletion of the nucSSco gene. Finally, S. coelicolor ΔnucS was

complemented with a wild-type copy of nucSSco inserted in the attB site of the S. coelicolor

ΔnucS via the integrative vector pSET152 (Supplementary Methods).

Estimation of mutation rates

Fluctuation analyses were used to experimentally address mutation rates. For each

experiment, 20 independent cultures of M. smegmatis mc2 155 and its derivatives were grown

and diluted in order to inoculate 1,000-10,000 cells per ml into fresh medium. All the cultures

were incubated until they reached stationary phase (about 109 cells per ml). Appropriate

dilutions were plated on Middlebrook 7H10 medium with or without rifampicin (100 µg ml-1

)

or streptomycin (50 µg ml-1

). For S. coelicolor, 20 independent concentrated spore

suspensions (containing ∼109 spores per ml) from each analysed strain were generated on MS

agar, and after that, plated on MS agar with or without rifampicin (100 µg ml-1

) or

streptomycin (50 µg ml-1

). At least 3 different experiments were performed for each

fluctuation analysis in both cases.

The expected number of mutations per culture (m) and 95% confidence intervals were

calculated using the maximum likelihood estimator applying the newton.LD.plating and

confint.LD.plating functions that account for differences in plating efficiency implemented in

the package rSalvador (http://eeeeeric.com/rSalvador/) (Qi Zheng. Rsalvador: an assay. R

package version 1.3) for R (www.R-project.org/). Mutation rates (mutations per cell per

generation) were then calculated by dividing m by the total number of generations, assumed

to be roughly equal to the average final number of cells. Statistical comparisons were carried

out by the use of the LRT.LD.plating function of the same package, which accounts for the

differences in final number of cells and plating efficiency. Finally, in case of multiple

comparisons p-values were corrected by the Bonferroni method.

  14 

Characterization of the mutational spectrum.

The mutational spectra conferred by M. smegmatis mc2 155 and its ΔnucS derivative were

characterised by selecting for resistance to rifampicin and sequencing the rifampicin

resistance-determining region (RRDR) in the rpoB gene, from independent Rif-R isolates.

Recombination assays

The recombination assay vectors (pRhomyco series) were generated by cloning two

overlapping fragments of the hygromycin-resistance gene (hyg), from pRAM 41

, flanking a

functional kanamycin-resistance (kan) gene in the pMV361 plasmid (Supplementary Fig. 2).

Neither of the fragments of hyg conferred resistance to hygromycin on their own. Fragments

share a duplicated 517 bp overlapping region, with different degrees of sequence identity

(100%, 95%, 90% or 85%). M. smegmatis mc2 155 and its ΔnucS derivative were transformed

by electroporation with the integrative plasmids pRhomyco 100%, 95%, 90% and 85%

designed for the recombination assays and plated on Middlebrook 7H10 containing

kanamycin (25 µg/µl). Integration of pRhomyco vectors in the appropriate site (attB) of the

M. smegmatis mc2 155 (wild-type and ΔnucS) chromosome was verified by PCR.

To measure recombination rates, we analysed the restoration of a functional hyg gene

that confers Hyg-R by a recombination event between the two overlapping fragments of the

hyg gene. Sixteen independent overnight cultures from each strain were grown in

Middlebrook 7H9 broth plus kanamycin (to prevent the selection of Hyg-R bacteria coming

from the early recombination of pRhomyco). Approximately, 104 bacteria were inoculated in

plain 7H9 broth (without kanamycin) to allow the recombination events and cultured 24h at

37ºC with shaking. Cultures were plated on 7H10 plus 50 µg ml-1

hygromycin (for

recombinant cells count) and in addition, appropriate dilutions were plated on plain 7H10 (for

viable cell counts) and incubated for 3-5 days. Total number of Hyg-R colonies was

considered to calculate recombination rates. Recombination rates were calculated and

analysed as described in estimation of mutation rates.

To validate the recombination assay, we first verified that no Hyg-R colonies were

generated by spontaneous mutation, even in the ΔnucS derivative. The mutation rate was

≤1x10-10

for the hypermutable derivative, far below the lowest recombination value (6x10-9

for WT, 15% non-identical sequences). These results indicate that the Hyg-R colonies from

our recombination assay were not generated by spontaneous mutations. Moreover, we verified

that recombinant clones express hygromycin resistance and kanamycin susceptibility by

picking 20-30 Hyg-R colonies from each strain onto kanamycin plates. In all cases they were

  15 

Hyg-R and Kan-S. Finally, we randomly isolated 10 independent Hyg-R presumptive

recombinant colonies from each construct (80 colonies total) and verified by PCR that a

single fragment of the size of the reconstituted hyg gene was amplified in all cases.

NucS purification and biochemical activity.

M. smegmatis nucS was cloned as a SUMO fusion and expressed in E. coli BL21 cells

overnight at 16ºC. NucS protein was purified by affinity chromatography using Ni2+

–NTA

agarose column followed by an additional step of anion exchange purification in HiTrap Q

HP column. NucS-SUMO fusion was cleaved by Ulp protease to release a native protein.

Purified protein was collected and concentrated to perform the activity assays (Supplementary

Fig. 1A). In addition, His-tagged M. smegmatis nucS was also cloned in pET28b (for E. coli

expression) and pYUB28b (for M. smegmatis expression). His-tagged NucS was also purified

by affinity chromatography and anion exchange purification as described before.

For DNA EMSA assays, NucS purified protein (1 µM to 16 µM) was incubated with

fluorescein-labelled single-stranded (45-mer DNA, 50 nM) or double-stranded DNA (45-bp

DNA, 50 nM), in 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 10% glycerol and 10 µg ml-1

BSA

for 30 minutes at 37ºC. Protein-DNA complexes were resolved on 5% polyacrylamide gel in

0.25xTBE plus 1% glycerol and run under refrigerated conditions (5ºC-10ºC).

For nuclease activity, fluorescein-labelled 36-mer DNA was annealed to a fully

complementary opposite strand (control) or carrying a single nucleotide mismatch in the

central region of the strand (mismatch substrates). 30 nM dsDNA substrates were incubated

with 300 nM of recombinant, tag free NucS in 20 mM Tris-HCl pH 7.5, 6 mM ammonium

sulfate, 2 mM MgCl2 and 100 mM NaCl, 0.1 mg ml-1

BSA and 0.1% TritonX-100. The

reactions were carried out at 37°C for 30 minutes when 0.5 U Proteinase K was added to each

reaction to digest NucS nuclease. Digestion was continued for 30 minutes at 50°C. Stop

solution was added (95% formamide, 0.09% xylene cyanol), samples were boiled at 95°C for

10 minutes and resolved on 7M urea, 15% polyacrylamide gel in 1xTBE for 2 hours. Mg, Mn

and Zn ions, and combinations of these three metals, were tested as cofactors with no

cleavage observed.

Analysis of nucS SNPs in M. tuberculosis clinical strains

Sequences from the MT_1321 gene product were downloaded from the Ensembl

database (http://bacteria.ensembl.org/index.html), M. tuberculosis variome resource 42

and

  16 

Comas et al. data 43

. 1,600 proteins were aligned using MAFFT and the polymorphisms were

visualized with Jalview.

Construction of nucS alleles by site-directed mutagenesis.

To construct nucS alleles, wild type nucSTB gene was PCR amplified and cloned into

pUC19 to be used as template for mutagenic PCRs. Single specific mutations were introduced

into nucSTB by PCR with the suitable mutagenic pairs of primers. Following the PCR reaction,

a DpnI digestion was carried out for 4 hours at 37ºC. Mutagenized plasmids obtained after

transformation into E. coli DH5α were verified by DNA sequencing. All the nucS alleles were

re-cloned into the integrative vector pMV361 to generate a set of complementation plasmids

and introduced by transformation into M. smegmatis mc2 155 ΔnucS. These plasmids integrate

at a single site, attB, in the M. smegmatis chromosome.

In order to test the efficiency of each nucS allele to restore normal mutation rates, the

plasmids pMV361 carrying the mutated alleles were integrated in M. smegmatis ΔnucS

chromosome. Integration of each nucSTB allele in the appropriate site (attB) of the

chromosome was verified by PCR.

Computational analyses of the NucS protein.

A summary of all the computational approaches conducted is depicted in

Supplementary Fig. 4.

To establish how general the presence of NucS is in the domains on life, we conducted

sequence analyses to initially identify NucS proteins (Supplementary Fig. 4).

NUCS_MYCTU (M. tuberculosis NucS) and NUCS_PYRAB (P. abyssi NucS) sequences

were used to find homologues. Each full sequence was used as an independent query in

pHMMER searches (HMMER3; http://hmmer.org/) against the large database of the

concatenated reference proteomes (2016_02 release of 17th February 2016). We next

collected sequences excluding the original sequences used to conduct the queries. The

remaining sequences were filtered by e-value (removed >0.0001), bit score >50, and length

(only those >75% length were kept), and further aligned by MAFFT

[http://mafft.cbrc.jp/alignment/software/] 44

. The alignments were visualized with Belvu

[http://sonnhammer.sbc.su.se/Belvu.html] 45

to check for quality. We removed the redundancy

of the alignment discarding sequences with more than 63% of sequence identity. We next

generated Markov profiles trained with the non-redundant multiple sequence alignment (using

HHMER3). For each protein, we obtained about the same 400 sequences that would give also

  17 

partial hits to different species, which suggested a potential existence of different domains in

the protein. Further checking of the sequences retrieved the final 370 sequences. As NucS was

not identified in eukaryotes or viruses, they were discarded for further analyses.

We constructed a phylogenetic profile of NucS (Supplementary Data 1) and mapped it

into the NCBI taxonomic tree (Figure 5, the newick tree format is available in Supplementary

Data 2). The tree was annotated using ggtree (http://www.bioconductor.org/packages/ggtree).

Characterization of the two NucS regions

To identify potential domains in NucS, we focused on the archaeal P. abyssi NucS,

whose 3D structure was previously resolved (PDB 2VLD chain B) 11

. This structure is

composed by two distinct regions: the N-terminal DNA binding region (1-114 amino acids)

and the C-terminal catalytic region (126-233 amino acids) 11

. Given the absence of structural

data for the bacterial NucS protein, the M. tuberculosis NucS structure was modelled using

the archaeal P. abyssi NucS as a template using I-TASSER 16

and generated a reliable model

(Supplementary Fig. 3). Template and model were structurally aligned using the

Combinatorial Extension (CE) algorithm17

, built-in with PyMOL, to identify relevant

residues. We used the structure-based alignment between this template and M. tuberculosis

NucS (NucS_MYCTU) (Supplementary Fig. 3) to determine the different regions for further

sequence analyses (N-terminal and C-terminal regions). These analyses were conducted using

profile searches as above and confirmed by independent analyses of the PF01939 domain

trained with NucS (for details see Supplementary Methods and Supplementary Fig. 4).

We conducted sequence searches of NucS_MYCTU in the PFAM database, where

the PF01939 domain (DUF91, automatic) was identified. From the PF01939 domain (539

sequences in 464 species), only 368 entries in 363 species (archaeal and bacterial) gave a

match with the full protein (Supplementary data file 1). The results obtained from the PFAM

database searches for the PF01939 domain, are in agreement with the structure-based profiles

analyses (see above) confirming the existence of two distinct regions.

Taking into account the domain boundaries established above, we next aligned the

368 full NucS proteins (from PFAM PF01939) to profiles built from the structural alignment

corresponding to i) the whole protein ii) the N-terminal region (NT) and iii) the C-terminal

region (CT).

  18 

Sequence analysis of NucS domains

Using the different defined regions by structural bioinformatics, we first conducted

sequence searches against the large database. We made non-redundant alignments of the N-

terminal region (containing 63 sequences) and the C-terminal region (containing 39

sequences) and built profile hidden Markov models (HMMs). These profiles were searched

using pHMMER 46

against the large References Proteomes file.

Our analyses identified the C-terminal region in all the original 368 NucS sequences

(for details see Supplementary Methods and Supplementary Fig. 4). In addition, the C-

terminal was also found in proteins containing a variety of additional protein domains in other

prokaryotic species and in few additional eukaryotic sequences. We checked the alignments to

confirm that the catalytic residues were conserved, as described in the structure of P. abyssi

11. When a profile with the eukaryotic sequences (excluding any bacterial or archaeal

sequences) was generated, we retrieved again the C-terminal regions of NucS above threshold

at significant values, but we did not retrieve any other eukaryotic sequences at reliable values.

On the other hand, the N-terminal region was found mainly in bacterial and archaeal NucS,

but also in two archaeal proteins annotated as topoisomerases. A pHMMER search of the

regions of these sequences yielded the N-terminal of NucS.

Phylogenetic analyses

We conducted Maximum Likelihood (ML) phylogenetic analyses using RAxML 47

in

order to construct phylogenetic trees that could explain the phylogenetic profile

(Supplementary Fig. 4).

To run phylogenetic analyses, NucS sequences from the PF01939 domain were used,

but aligned to their corresponding structure-based regions. Sequences found in our searches

with independent regions were also included. Out of 539, 368 unique sequences aligned with

the full NucS structure-based profile (Supplementary Fig. 5, Supplementary Data 3), 422

sequences containing the C-terminal region (NucS-CT) (Supplementary Fig. 6), 370

sequences containing the N-terminal region (NucS-NT) (Supplementary Fig. 7) were run.

Sequences were aligned to the structure-based profiles, and the alignments were

subjected to ML search and 1,500 bootstrap replicates using RAXML 47

. All free model

parameters were estimated by RAxML where we used a GAMMA model of rate

heterogeneity and ML estimate of alpha-parameter. The most likely selected model was LG

48. For accuracy and focus on our alignment, we run the sequential version of RAxML

creating various sets of starting trees (10 by manual inferring different starting trees and 10 as

  19 

automatically inferred by the program). The best setting (the closest likelihood to zero) was

used for further calculations.

To run phylogenetic analyses, sequences from the PFAM PF01939 domain were used,

but aligned to their corresponding region. For details of the particular regions for each domain

see Supplementary Data 4. Sequences found in our searches with NT or CT regions outside

NucS were also included. Phylogenetic trees were visualized with iTOL (http://itol.embl.de)

49. The raw trees corresponding to both CT and NT regions (Supplementary Figs. 6 and 7) are

available as Supplementary Data 5 and 6 respectively.

Data Availability

The authors declare that all data supporting the findings of this study are available within the

paper and its supplementary information files.

 

   

  20 

References 

1  Friedberg,  E.  et al. DNA Repair and Mutagenesis. Washington DC, USA: American 

Society of Microbiology. 2nd Edition edn,  (2006). 

2  Eisen, J. A. & Hanawalt, P. C. A phylogenomic study of DNA repair genes, proteins, 

and processes. Mutation research 435, 171‐213 (1999). 

3  Iyer,  R.  R.,  Pluciennik,  A.,  Burdett,  V.  &  Modrich,  P.  L.  DNA  mismatch  repair: 

functions  and  mechanisms.  Chem  Rev  106,  302‐323,  doi:10.1021/cr0404794 

(2006). 

4  Matic, I., Rayssiguier, C. & Radman, M. Interspecies gene exchange in bacteria: the 

role of SOS and mismatch repair systems in evolution of species. Cell 80, 507‐515, 

doi:0092‐8674(95)90501‐4 [pii] (1995). 

5  Sachadyn, P. Conservation and diversity of MutS proteins. Mutation research 694, 

20‐30, doi:10.1016/j.mrfmmm.2010.08.009 (2010). 

6  Mizrahi,  V.  &  Andersen,  S.  J.  DNA  repair  in  Mycobacterium  tuberculosis.  What 

have  we  learnt  from  the  genome  sequence? Molecular microbiology  29,  1331‐

1339 (1998). 

7  Banasik, M. & Sachadyn, P. Conserved motifs of MutL proteins. Mutation research 

769, 69‐79, doi:10.1016/j.mrfmmm.2014.07.006 (2014). 

8  Springer,  B.  et  al.  Lack  of  mismatch  correction  facilitates  genome  evolution  in 

mycobacteria.  Molecular  microbiology  53,  1601‐1609,  doi:10.1111/j.1365‐

2958.2004.04231.x (2004). 

9  Ford, C. B. et al. Use of whole genome sequencing to estimate the mutation rate of 

Mycobacterium tuberculosis during latent infection. Nature genetics 43, 482‐486, 

doi:10.1038/ng.811 (2011). 

10  Kucukyildirim,  S.  et  al.  The  Rate  and  Spectrum  of  Spontaneous  Mutations  in 

Mycobacterium smegmatis, a Bacterium Naturally Devoid of the Post‐replicative 

Mismatch Repair Pathway. G3 (Bethesda), doi:10.1534/g3.116.030130 (2016). 

11  Ren, B. et al. Structure and function of a novel endonuclease acting on branched 

DNA  substrates.  The  EMBO  journal  28,  2479‐2489,  doi:emboj2009192 

[pii]10.1038/emboj.2009.192 (2009). 

12  Ishino,  S.  et  al.  Identification  of  a  mismatch‐specific  endonuclease  in 

hyperthermophilic  Archaea.  Nucleic  acids  research,  doi:10.1093/nar/gkw153 

(2016). 

13  Wagner,  R.  et  al.  Involvement  of  Escherichia  coli  mismatch  repair  in  DNA 

replication  and  recombination.  Cold  Spring Harb  Symp Quant  Biol  49,  611‐615 

(1984). 

14  Nakae,  S. et al.  Structure  of  the EndoMS‐DNA Complex  as Mismatch Restriction 

Endonuclease. Structure 24, 1960‐1971, doi:10.1016/j.str.2016.09.005 (2016). 

15  Gross, M. D. &  Siegel,  E.  C.  Incidence  of mutator  strains  in  Escherichia  coli  and 

coliforms in nature. Mutation research 91, 107‐110 (1981). 

16  LeClerc, J. E., Li, B., Payne, W. L. & Cebula, T. A. High mutation frequencies among 

Escherichia coli and Salmonella pathogens. Science 274, 1208‐1211 (1996). 

17  Matic,  I.  et  al.  Highly  variable  mutation  rates  in  commensal  and  pathogenic 

Escherichia coli. Science 277, 1833‐1834 (1997). 

18  Oliver,  A.,  Canton,  R.,  Campo,  P.,  Baquero,  F.  &  Blazquez,  J.  High  frequency  of 

hypermutable Pseudomonas aeruginosa  in  cystic  fibrosis  lung  infection. Science 

288, 1251‐1254, doi:8507 [pii] (2000). 

  21 

19  Picard,  B.  et  al.  Mutator  natural  Escherichia  coli  isolates  have  an  unusual 

virulence  phenotype.  Infect  Immun  69,  9‐14,  doi:10.1128/IAI.69.1.9‐14.2001 

(2001). 

20  Giraud, A., Matic, I., Radman, M., Fons, M. & Taddei, F. Mutator bacteria as a risk 

factor in treatment of infectious diseases. Antimicrobial agents and chemotherapy 

46, 863‐865 (2002). 

21  Macia,  M.  D.  et  al.  Hypermutation  is  a  key  factor  in  development  of  multiple‐

antimicrobial resistance in Pseudomonas aeruginosa strains causing chronic lung 

infections.  Antimicrobial  agents  and  chemotherapy  49,  3382‐3386, 

doi:49/8/3382 [pii] 10.1128/AAC.49.8.3382‐3386.2005 (2005). 

22  Muller,  B.,  Borrell,  S.,  Rose,  G.  &  Gagneux,  S.  The  heterogeneous  evolution  of 

multidrug‐resistant Mycobacterium tuberculosis. Trends in genetics : TIG 29, 160‐

169, doi:10.1016/j.tig.2012.11.005 (2013). 

23  Ford,  C.  B.  et  al.  Mycobacterium  tuberculosis  mutation  rate  estimates  from 

different  lineages  predict  substantial  differences  in  the  emergence  of  drug‐

resistant tuberculosis. Nature genetics 45, 784‐790, doi:10.1038/ng.2656 (2013). 

24  Cox, E. C. Bacterial mutator genes and the control of spontaneous mutation. Annu 

Rev Genet 10, 135‐156, doi:10.1146/annurev.ge.10.120176.001031 (1976). 

25  Lee,  H.,  Popodi,  E.,  Tang,  H.  &  Foster,  P.  L.  Rate  and  molecular  spectrum  of 

spontaneous  mutations  in  the  bacterium  Escherichia  coli  as  determined  by 

whole‐genome sequencing. Proceedings of the National Academy of Sciences of the 

United  States  of  America  109,  E2774‐2783,  doi:10.1073/pnas.1210309109 

(2012). 

26  Garibyan,  L.  et  al.  Use  of  the  rpoB  gene  to  determine  the  specificity  of  base 

substitution mutations on the Escherichia coli chromosome. DNA Repair (Amst) 2, 

593‐608 (2003). 

27  Tham,  K.  C.  et  al.  Mismatch  repair  inhibits  homeologous  recombination  via 

coordinated directional unwinding of trapped DNA structures. Mol Cell 51, 326‐

337, doi:10.1016/j.molcel.2013.07.008 (2013). 

28  Feinstein,  S.  I.  &  Low,  K.  B.  Hyper‐recombining  recipient  strains  in  bacterial 

conjugation. Genetics 113, 13‐33 (1986). 

29  Spies,  M.  &  Fishel,  R.  Mismatch  repair  during  homologous  and  homeologous 

recombination.  Cold  Spring  Harbor  perspectives  in  biology  7,  a022657, 

doi:10.1101/cshperspect.a022657 (2015). 

30  Rock,  J.  M.  et  al.  DNA  replication  fidelity  in  Mycobacterium  tuberculosis  is 

mediated  by  an  ancestral  prokaryotic  proofreader.  Nature  genetics, 

doi:10.1038/ng.3269 (2015). 

31  Ebrahimi‐Rad, M.  et al. Mutations  in  putative mutator  genes  of Mycobacterium 

tuberculosis strains of the W‐Beijing family. Emerging infectious diseases 9, 838‐

845, doi:10.3201/eid0907.020589 (2003). 

32  Dos Vultos, T., Blazquez, J., Rauzier, J., Matic, I. & Gicquel, B. Identification of Nudix 

hydrolase  family  members  with  an  antimutator  role  in  Mycobacterium 

tuberculosis  and Mycobacterium  smegmatis.  Journal of bacteriology 188,  3159‐

3161, doi:188/8/3159 [pii] 10.1128/JB.188.8.3159‐3161.2006 (2006). 

33  Dagan,  T.,  Artzy‐Randrup,  Y.  &  Martin,  W.  Modular  networks  and  cumulative 

impact  of  lateral  transfer  in  prokaryote  genome  evolution.  Proceedings  of  the 

National Academy of Sciences of the United States of America 105, 10039‐10044, 

doi:10.1073/pnas.0800679105 (2008). 

  22 

34  Gophna, U., Charlebois, R. L. & Doolittle, W. F. Have archaeal genes contributed to 

bacterial  virulence?  Trends  in  microbiology  12,  213‐219, 

doi:10.1016/j.tim.2004.03.002 (2004). 

35  Busch, C. R. & DiRuggiero,  J. MutS and MutL are dispensable for maintenance of 

the genomic mutation  rate  in  the halophilic  archaeon Halobacterium salinarum 

NRC‐1. PloS one 5, e9045, doi:10.1371/journal.pone.0009045 (2010). 

36  Sassetti,  C.  M.,  Boyd,  D.  H.  &  Rubin,  E.  J.  Comprehensive  identification  of 

conditionally  essential  genes  in  mycobacteria.  Proceedings  of  the  National 

Academy  of  Sciences  of  the  United  States  of  America  98,  12712‐12717, 

doi:10.1073/pnas.231275498 (2001). 

37  Parish, T. & Stoker, N. G. Use of a  flexible cassette method to generate a double 

unmarked Mycobacterium tuberculosis tlyA plcABC mutant by gene replacement. 

Microbiology  146  (  Pt  8),  1969‐1975,  doi:10.1099/00221287‐146‐8‐1969 

(2000). 

38  Stover,  C.  K.  et al.  New use  of  BCG  for  recombinant  vaccines. Nature 351,  456‐

460, doi:10.1038/351456a0 (1991). 

39  Lee, M. H., Pascopella, L., Jacobs, W. R., Jr. & Hatfull, G. F. Site‐specific integration 

of  mycobacteriophage  L5:  integration‐proficient  vectors  for  Mycobacterium 

smegmatis,  Mycobacterium  tuberculosis,  and  bacille  Calmette‐Guerin. 

Proceedings  of  the National Academy of  Sciences  of  the United  States  of  America 

88, 3111‐3115 (1991). 

40  Paget,  M.  S.  et  al.  Mutational  analysis  of  RsrA,  a  zinc‐binding  anti‐sigma  factor 

with  a  thiol‐disulphide  redox  switch.  Molecular  microbiology  39,  1036‐1047 

(2001). 

41  Hinds, J. et al. Enhanced gene replacement in mycobacteria. Microbiology 145 ( Pt 

3), 519‐527, doi:10.1099/13500872‐145‐3‐519 (1999). 

42  Joshi,  K.  R.,  Dhiman,  H.  &  Scaria,  V.  tbvar:  A  comprehensive  genome  variation 

resource  for  Mycobacterium  tuberculosis.  Database  (Oxford)  2014,  bat083, 

doi:10.1093/database/bat083 (2014). 

43  Comas,  I.  et  al.  Out‐of‐Africa  migration  and  Neolithic  coexpansion  of 

Mycobacterium  tuberculosis  with  modern  humans.  Nature  genetics  45,  1176‐

1182, doi:10.1038/ng.2744 (2013). 

44  Katoh,  K.,  Misawa,  K.,  Kuma,  K.  & Miyata,  T. MAFFT:  a  novel method  for  rapid 

multiple  sequence  alignment  based  on  fast  Fourier  transform.  Nucleic  acids 

research 30, 3059‐3066 (2002). 

45  Sonnhammer, E. L. & Hollich, V. Scoredist: a simple and robust protein sequence 

distance  estimator.  BMC  Bioinformatics  6,  108,  doi:10.1186/1471‐2105‐6‐108 

(2005). 

46  Eddy,  S.  R.  A  new  generation  of  homology  search  tools  based  on  probabilistic 

inference. Genome Inform 23, 205‐211 (2009). 

47  Stamatakis, A. RAxML‐VI‐HPC: maximum likelihood‐based phylogenetic analyses 

with  thousands  of  taxa  and  mixed  models.  Bioinformatics  22,  2688‐2690, 

doi:10.1093/bioinformatics/btl446 (2006). 

48  Le,  S.  Q.  &  Gascuel,  O.  An  improved  general  amino  acid  replacement  matrix. 

Molecular  biology  and  evolution  25,  1307‐1320,  doi:10.1093/molbev/msn067 

(2008). 

49  Letunic,  I.  &  Bork,  P.  Interactive  tree  of  life  (iTOL)  v3:  an  online  tool  for  the 

display and annotation of phylogenetic and other trees. Nucleic acids research 44, 

W242‐245, doi:10.1093/nar/gkw290 (2016). 

  23 

50  Hug,  L.  A.  et  al.  A  new  view  of  the  tree  of  life.  Nat  Microbiol  1,  16048, 

doi:10.1038/nmicrobiol.2016.48 (2016). 

  24 

End Notes

Acknowledgments.

J.B. was supported by Plan Nacional de I+D+i and Instituto de Salud Carlos III,

Subdirección General de Redes y Centros de Investigación Cooperativa, Ministerio de

Economía y Competitividad, Spanish Network for Research in Infectious Diseases (REIPI

RD12/0015) - co-financed by European Development Regional Fund "A way to achieve

Europe" ERDF and SAF2015-72793-EXP and BFU2016-78250-P from Spanish Ministry of

Science and Competitiveness (MINECO)-FEDER). A.J.D. was supported by a grant from the

Biotechnology and Biological Sciences Research Council (BB/J018643/1). A. C-G. and J. R-

B. were supported by contracts from REIPI RD15/0012. and A. C-G also by a Ramon Areces

fellowship for Life Sciences. A. P. was supported by a “Sara Borrell” contract, Instituto de

Salud Carlos III. T.T. was supported by the Research Council of Norway, GLOBVAC project

234506. We are grateful to I. Cases for his help customizing the R libraries for tree

visualization, Federico Abascal for critical input regarding models of evolution, I. Comas for

critical reading of the manuscript and J. Glynn and S. Hoffner for providing additional data of

M. tuberculosis strains.

Author contributions: J.B. designed the project, directed the experimental work, constructed

S. coelicolor strains and wrote the manuscript; A.C-G participated in the project design,

performed and designed experiments, made strains in M. smegmatis and S. coelicolor,

developed biochemical assays together with P.P and wrote the manuscript. A.I.P. made

strains, measured recombination rates, performed experiments with M. smegmatis and

bioinformatics search of NucS polymorphisms. A. M. R. performed all the computational

analyses (evolutionary, structural bioinformatics and sequence) and wrote the manuscript;

J.R-B. measured mutation rates and performed statistical analysis; N.A-R, D.C., C.C. L.P., E.

D. Z., M. Herranz, T.T., D.G-V, S.W, M. P. and A.J.D. contributed with strain construction

and/or measured mutation rates. A.J.D. performed data base exploration, helped with NucS

structure and participated in writing the manuscript.

Conflict of interest. The authors declare no conflict of interest.

 

   

  25 

Figure legends.

Figure 1. Identification and characterization of M. smegmatis NucS. A) Schematic

representation of the process for identifying the nucS transposon mutant. ~11,000 clones from

the M. smegmatis insertion mutant library were replicated onto plates with (Rif) or without

(No Rif) rifampicin (step 1). One single clone (circled) produced a high number of Rif-R

colonies. After isolation and purification (step 2), the frequency of spontaneous Rif-R mutants

(bottom plate) was checked and compared with that of the wild-type (upper plate),

demonstrating its hypermutable phenotype. B) Multiple sequence alignment of NucS

sequences. M. tuberculosis (NucS_Mtu), M. smegmatis (NucS_Msm) and P. abyssi

(NucS_Pab) sequences are from Uniprot (identifiers are P9WIY4, A0R1Z0 and Q9V2E8,

respectively). Solid lines over the alignment indicate protein domains as defined previously

for P. abyssi NucS 11

(black, DNA-binding; grey, nuclease). Identical amino acid residues are

shown in black. Catalytic residues required for nuclease activity in P. abyssi NucS 11

are

labelled with asterisks. Nuclease motifs of P. abyssi NucS are also indicated 11

. C) DNA

binding activity of NucS. In a gel-based electrophoretic mobility shift assay (EMSA), purified

NucS protein (1 µM to 16 µM) is capable of binding to 45-mer ssDNA (50 nM) (left side) but

not to 45-bp dsDNA (50 nM) (right side). The arrow indicates the position of the DNA-NucS

complex.

Figure 2. Mutational effects of nucS deletion.

A) Rates of spontaneous mutations conferring rifampicin, Rif-R (red), and streptomycin

resistance, Str-R (grey) of M. smegmatis mc2 155 (WT), its ΔnucS derivative and the ΔnucS

strain complemented with nucS from M. smegmatis mc2 155 (nucSSm). B) Mutational

spectrum of M. smegmatis mc2 155 (green) and its ∆nucS derivative (red). Bars represent the

frequency of the types of change found in rpoB. C) Rates of spontaneous mutations

conferring Rif-R (red) and Str-R (grey) of S. coelicolor A3(2) M145 (WT), its ΔnucS

derivative and the ΔnucS strain complemented with the wild-type nucS from S. coelicolor

(nucSSco).

Error bars represent 95% confidence intervals (n=20). Asterisks denote statistical significance

(Likelihood ratio test under Luria-Delbruck model, Bonferroni corrected, p-value <10-4

in all

cases). Mutation rate: mutations per cell per generation.

  26 

Figure 3. Effect of nucS deletion on recombination.

A) Chromosomal construct used to measure recombination between homologous or

homeologous DNA sequences. The hyg gene is reconstituted by a single recombination event

between two 517-bp overlapping fragments (striped), sharing different degree of sequence

identity (100%, 95%, 90% and 85%) and separated by a kanamycin resistant (Kan-R) gene.

Recombinant clones express hygromycin resistance and kanamycin susceptibility.

B) Rates of recombination between homologous and homeologous DNA sequences with

different degree of identity (%) in M. smegmatis mc2 155 (WT, green squares) and its ΔnucS

derivative (red diamonds). Error bars represent 95% confidence intervals (n=16). Asterisks

denote statistical significance (Likelihood ratio test under Luria-Delbruck model, Bonferroni

corrected, p-value <10-4

in all cases).

Figure 4. Effects of NucS polymorphisms on mutation rates in the M. smegmatis ΔnucS

surrogate model.

Rates of spontaneous mutations conferring Rif-R of the M. smegmatis ΔnucS complemented

with wild-type nucSTB (ΔnucS/nucSTB; red) or containing each of the 9 polymorphisms

indicated (blue). Relative increases in mutation rates with respect to the control strain

(ΔnucS/nucSTB; set to 1) are shown inside the column. Error bars represent 95% confidence

intervals (n=20). Asterisks denote statistical significance (Likelihood ratio test under Luria-

Delbruck model, Bonferroni corrected; ***p < 0.001; ** p < 0.005). Mutation rate: mutations

per cell per generation.

Figure 5. Phylogenetic profiling of NucS. The NCBI taxonomic tree from 2,186 species

from Bacteria (black outer label) and Archaea (blue outer label). Orange branches: NucS

only; green branches: NucS and MutS-MutL. Bacteria includes Actinobacteria, Firmicutes,

Proteobacteria, FCB (Fibrobacteres, Chlorobi and Bacteroidetes), Other Terrabacteria

(Armatimonadetes, Chloroflexi, Cyanobacteria, Deinococcus-Thermus, Tenericutes and

unclassified Terrabacteria) and other groups (Acidobacteria, Aquificae, Caldiserica,

Chrysiogenetes, Deferribacteres, Dictyoglomi, Elusimicrobia, Fusobacteria, Nitrospirae, PVC

group, Spirochaetes, Synergistetes, Thermodesulfobacteria, Thermotogae and unclassified

bacteria). Archaea includes Euryarchaeota, TACK (Thaumarchaeota, Aigarchaeota,

Crenarchaeota and Korarchaeota) and unclassified archaeal species (*). As NucS is absent in

  27 

eukaryotes and viruses, these lineages were removed for clarity purposes. The tree was

annotated using ggtree (http://www.bioconductor.org/packages/ggtree).

Figure 6. A model for NucS protein emergence and evolution. The unrooted Tree of Life

(available and based on reference 50

) was used to depict the proposed evolutionary history of

NucS according to our data. The groups relevant to our model are highlighted. Coloured

squares depict the NucS-NT (blue) and NucS-CT (red) terminal regions. This model proposes

that NucS has an archaeal origin and emerged as a combination of two independent protein

domains with complex evolutionary history. Numbers indicate the steps of the model: Both

N-terminal and C-terminal regions likely emerged in the archaeal lineage (1). The CT region

was transferred via HGT to very few Eukaryotes and to some Bacteria (main groups with any

species having the NucS-CT region are labelled with red circles), where the CT domain

combined with other regions outside the context of NucS. In the archaeal lineage, NT and CT

regions fused to produce the full NucS (2). NucS expanded in many archaeal groups but was

also lost in some others. The full NucS protein was transferred to Bacteria by at least two

independent HGT events, one to some Deinoccocus-Thermus species (3) and another to

Actinobacteria (4).

MSRVRLVIAQCTVDYIGRLTAHLPSARRLLLFKADGSVSVHADDRAYKPLNWMSPPCWLTENucS_Mtu 1--- RLVIAQCTVDYVGRLTAHLPSARRLLLFKADGSVSVHADDRAYKPLNWMSPPCWVTENucS_Msm 1HGGVVTIFARCKVHYEGRAKSELGEGDRIIIIKPDGSFLIH-QNKKREPVNWQPPGSKVTFNucS_Pab 25

ESG-GQAPVWVVENKAGEQLRITIEGIEHDSSHELGVDPGLVKDGVEAHLQALLAEHIQLLNucS_Mtu 62QDTETGVALWVVENKTGEQLRITVEDIEHDSHHELGVDPGLVKDGVEAHLQALLAEHVELLNucS_Msm 59KEN-S---IISIRRRPYERLEVEIIEPYSLVVFLAEDYEELALTGSEAEM-ANLIFENPRVNucS_Pab 85

G--EGYTLVRREYMTAIGPVDLLCSDERG-GSVAVEIK-RRGEIDGVEQLTRYLELLNRDSNucS_Mtu 122G--AGYTLVRREYPTPIGPVDLLCRDELG-RSVAVEIK-RRGEIDGVEQLTRYLELLNRDSNucS_Msm 120IEEGFKPIYRE-KPIRHGIVDV-MGVDKDGNIVVLELKRRKADLHAVSQLKRYVDSLKEEYNucS_Pab 141

VLAPVKGVFAAQQIKPQARILATDRGIRCLTLDYDTMRGMDSGEYRLFNucS_Mtu 179 226LLAPVAGVFAAQQIKPQARTLATDRGIRCVTLDYDQMRGMDSDEYRLFNucS_Msm 177 224GE-NVRGILVAPSLTEGAKKLLEKEGLEFRKLEPP-KKG---------NucS_Pab 200 236

Motif I

*

Motif II

**Motif III

**Motif QxxxY

**

M

A

B

CNucS NucS NucS

NucS-ssDNA

45-mer 50nM ssDNA 45-mer ssDNA 45bp-dsDNA

TACK group

FC

B g

roup

ProteobacteriaA

ctin

obact

eria

Firmicutes

Oth

er T

erra

bact

eria

Other gro

ups

Euryarchaeota

NucS+MutS/L

NucS

*

Cyanobacte

ria

Chloroflexi

Chlorobi

beta-ProteobacteriaF

irmicu

tes

Planctomycetes

Verrucomicrobia

Chlamydia

alpha-Proteobacteria

gamma-Proteobacteria

epsilon-Proteobacteria

Microgenomates

Parcubacteria

(1)

(4)

delta

-Pro

teobact

eria

Halobacteria

Deinococcus-Therm

us

Euryarchaeota

Crenarchaeota

Actinobacteria

Rubrobacteria

No NucS

Coriobacteriia

No NucS

(3)

(2)

EUKARYAPlacozoa,Porifera

NucS-CTNucS-NT

Horizontal gene transfer

Tree scale: 1

  1 

Supplementary Information

Supplementary Figures.

Supplementary Figure 1. Purified mycobacterial NucS does not cleave mismatch DNA

substrates in vitro. A) Native M. smegmatis NucS protein purification. Recombinant NucS protein

was expressed in E. coli, purified and concentrated, as described (see Methods). Native NucS

protein was analysed in a SDS-PAGE gel and stained with Coomassie Brillant Blue. Lane 1,

molecular marker; Lane 2: purified native M. smegmatis NucS; Lane 3: concentrated M. smegmatis

NucS protein (used for EMSA and nuclease assays). B) Double stranded DNA (30 nM) with or

without the indicated internal mismatches, was incubated with 300 nM NucS. Sequence consensus

is shown at the bottom and mismatch positions are indicated in red. No significant cleavage activity

was observed with the shown 36-mers or with longer 75-mer (not displayed here) mismatch

substrates.

C-A:C-A:

C-T:C-T:

C-C:C-C:

T-T:T-T:

T-G:T-G:

A-A:A-A:

A-G:A-G:

G-G:G-G:

NO:NO:

37

25

50

20

NucS

15

75

1 2 3

200

A B

10

kDa

150

100

dsDNA

  2 

A

B

Supplementary Figure 2. Tools for measuring recombination rates.

A) Alignment of the 517 bp overlapping sequences, with 100%, 95%, 90% and 85% identities,

shared by hyg 5´ and hyg 3´ regions, used to construct the different pRhomyco versions. Changes

  3 

introduced in the original sequence are highlighted. B) Cartoon of plasmid pRhomyco before and

after recombination. hyg 5’ and hyg 3’ are truncated alleles of the hyg gene carrying a 3´-terminal or

5´-terminal deletion, respectively. Both alleles overlap 517 bp (striped regions) and are separated by

a 1,200 bp region containing aph3 gene (Kan-R). Four versions of pRhomyco were generated

carrying hyg 5’ alleles 100%, 95%, 90% or 85% identical, in its overlapping region, to the hyg 3’.

Plasmids integrate in the chromosome of M. smegmatis mc2

155 and its ΔnucS derivative. Site-

specific recombination between the attP from the plasmid and the unique bacterial attB is promoted

by the plasmid-encoded integrase.

  4 

Supplementary Figure 3. Domain characterization of M. tuberculosis NucS and polymorphic

residues. A) The upper part is a schematic representation of the homodimeric structure of P. abyssi

NucS (2VDL) 1. The two distinct domains of NucS, N-terminal and C-terminal, are in blue and

N-terminal domainC-terminal domain

A

G157

D160

E174

Q187

Y191

K176

R42

R70

W75

HGGVVTIFARCKVHYEGRAKSELGEGDRIIIIKPDGSFLIHQN-KKREPVNWQPPGSKVT2VLD_Paby 25M---RLVIAQCTVDYVGRLTAHLPSARRLLLFKADGSVSVHADDRAYKPLNWMSPPCWVTNucS_Mycsme 1MSRVRLVIAQCTVDYIGRLTAHLPSARRLLLFKADGSVSVHADDRAYKPLNWMSPPCWLTNucS_Myctu 1

* * * * * * * ** * * ** * * *

* * * *

* * *

* * * ** * *

** * * *

*

* * **

* * * *

* * *

**

FK--E--NSmISIRRRPYERLEVEIIEPYSLVVFLAEDYEELaltgSEAEmANLIFENPR2VLD_Paby 84EQDTETGVALWVVENKTGEQLRITVEDIEHDSHHELGVDPGLVKDGVEAHLQALLAEHVENucS_Mycsme 58EESGGQ-APVWVVENKAGEQLRITIEGIEHDSSHELGVDPGLVKDGVEAHLQALLAEHIQNucS_Myctu 61

VIEEGFKPIYREKPIRHGIVDVMGVDKDGNIVVLELKRRKADLHAVSQLKRYVDSLKEEY2VLD_Paby 140LLGAGYTLVRREYPTPIGPVDLLCRDELGRSVAVEIKR-RGEIDGVEQLTRYLELLNRDSNucS_Mycsme 118LLGEGYTLVRREYMTAIGPVDLLCRDERGGSVAVEIKR-RGEIDGVEQLTRYLELLNRDSNucS_Myctu 120

-GENVRGILVAPSLTEGAKKLLEKEGLEFRKLEPP2VLD_Paby 200 233LLAPVAGVFAAQQIKPQARTLATDRGIRCVTLDYDNucS_Mycsme 177 211VLAPVKGVFAAQQIKPQARILATDRGIRCLTLDYDNucS_Myctu 179 213

*

180o

Q187

K176

Y191

E174

Q187

A67

V69

S54

S39

W75W

75R70

R70

R42R42

A135

D162

Q187

K184

Y191

E127 D160

G157

A67

R144

K184

A135

D162

Q187

T168

B

* R I

S A

S S H A

E

R144

T168

S39

S54

V69

W75W

75R70

R70

R42R42

*

E127

C-terminal domain

  5 

pink, respectively. The first residues (1-25) of the NucS P. abyssi N-terminal domain, missing in

the M. tuberculosis model, are depicted as green cartoon. The putative β-clamp binding motif in P.

abyssi NucS is shown in orange. The important catalytic and DNA binding sites are depicted over

the P. abyssi structure as vertical red bars with the residues numbered above. Back and front views

of the structural superimposition of the homodimeric resolved structure of NucS from P. abyssi 1

and the M. tuberculosis model (purple) is presented below the schema (colour code as described for

the upper scheme). The important catalytic and DNA binding sites are depicted as red sticks and the

residues where polymorphisms described in this work have been detected are shown as green sticks

over the structural imposition. B) Multiple structure-based alignment of NucS from P. abyssi

(2VDL), M. smegmatis mc2 155 and M. tuberculosis CDC155. Colour code is the same as in panel

A. The amino acid change of each naturally occurring M. tuberculosis polymorphism is depicted in

green. Important catalytic and DNA binding residues are in red. Grey low-case indicates regions

without structural information; magenta “m” indicates Seleno-Met modifications in the structure of

P. abyssi. Residues of the putative β-clamp binding motif in P. abyssi NucS are shown in orange.

Only regions that align among the three proteins are shown. Asterisks indicate identical residues in

the three aligned sequences.

  6 

Supplementary Figure 4. Computational procedures to analyse NucS distribution and

evolution. Each panel depicts a different protocol. Yellow fonts indicate figures and/or tables.

Yellow uppercase letters are pieces of evidence used in the model presented in Fig. 6 (main text).

Domain Analyses IInitial Sequence Searches

~400

Compare ranked positions hits by e-value and

bit-score

NucS_pyrab NucS_myctu

e-val < 0.0001, aln lenght > 75%,

Bit score > 50

- ~370 unique archaeal/bacterial proteins

- Eukaryotes and Virus discarded as lack NucS (A)

Align sequences

(excluding original

queries)

pHMMER

MAFFT

hmmbuild

hmmsearch

e-val < 0.0001, aln lenght > 75%,

Bit score > 50

~400

370 370

External info

Input file

Output file

Results

Program

MutS/L identification

hmmbuild

hmmsearch

MutS MutL Selected species

Bit score > 50, aln lenght > 75%,

MUTSMutL seqs MutS seqs

pMutS pMutL

2,841 prokaryotic

genomes

MAFFT

alnMutS alnMutL

Phylogenetic Profiling

Figure 5

Supplementary data file 1

Large DB including

3,942 Ref Proteomes

aln-nr 63%

Large DB including

3,942 Ref Proteomes

p-aln p-aln

aln-nr 63%

Structure-based definition using NucS_pyrab:

full, N-terminal and C-terminal.

Archaea

Bacteria

Archaea/Bacteria

Archaea-others

Archaea-others

Archaea/Bacteria

Prokayotes-othersEukaryotes

EukaryotesMAFFT- hmmbuild

Archaea-NucS

Bacteria-NucS

MAFFT-hmmbuild

Archaea-NucS

Bacteria-NucS

hmmbuild

hmmsearch

NucS is built on two distinct domains (B) Supp. Fig 3

Large DB including

3,942 Ref Proteomes

hmmsearchLarge DB including

3,942 Ref Proteomes

p-NT p-CT

aFull aNT aCT

pFull pNT pCT

Tree of Life(A)

(B) (C)

(D)

Proposed evolution of NucSFigure 6

MutS/L

Extract PFAM

PF01939 seqs

Domain Analyses II

hmmalign

3D-based

pairwise

RAXML: phylogeny

NucS- full sequence

Phylogenetic tree

NucS in some bacteria have been transferred from Archaea (C)

aFull aNT aCT

Supp. Fig 5 Supp. Fig 7 Supp. Fig 6

NucS N-Terminal

Phylogenetic tree NucS C-Terminal

Phylogenetic tree

NucS shows a disperse distribution pattern (D)

(*) Only complete genomes Actinobacteria phylum and Archaea

** Confident absence

If NO NucS is identified

if MutS/L is found

if MutS/L not found:

and “Unassembled WGS”

else, translated searches vs genome(*)

if MutS/L found

if MutS/L not found

If NucS IS identified

if MutS/L is found

if MutS/L not found:

and “Unassembled WGS”

else, translated searches vs genome(*)

if MutS/L found

if MutS/L not found

Undef MutS/L

MutS/LLikelyNot

NucS-MutS/L

NucS-MutS/L

NucS-only**

Procedure:

Undef MutS/L

and “Unassembled WGS” Undef NucS else, translated searches vs genome(*)

if NucS found

if NucS not found

NucS

LikelyNot

  7 

Supplementary Figure 5. Phylogenetic analyses of NucS. Unrooted ML tree of full NucS

sequences from the PFAM PF01939. Black branches are Actinobacteria; blue branches are

Archaea; red branches are species from the Deinococcus-Thermus group. Labels indicate species

name. Red circles represent >80% bootstrap in 1,500 replicates.

Meth

anocella

_palu

dic

ola

.st.D

SM

_17711

Str

epto

myces_sp._

Tu6071

Stre

pto

myces_sp._

WM

6386

09

76

1.M

SD.t

s.iy

bsl

aw

_m

utar

da

uq

ola

H

Therm

ococcus_gam

mato

lera

ns.s

t.D

SM

_15229

Actinomyces_sp._oral_taxon_848_str._F0332

Corynebacterium_epiderm

idicanis

Amycolicicoccus_subflavus.st.DSM_45089

Actinomyces_sp.oral_tax_180.str.F0310

Natro

nobacte

rium

_gre

gory

i_A

TC

C.4

3098

Micr

obacteriu

m_hyd

roca

rbonoxy

dans

Corynebacterium_jeikeium.st.K411

Aero

pyru

m_pern

ix_ATC

C.7

00893

Halo

bacte

rium

_sp._

DL1

Mycobacterium_abscessus_ATCC.19977

Halo

bacte

rium

_sp._

DL1

Actin

obacte

ria_bacte

rium

_O

V450

Strepto

myc

es_

sp._

CN

Q-5

09

Corynebacterium_argentoratense_DSM.44202

Conexibacter_w

oesei.st.DSM

_14684

Stre

pto

myces_m

angro

vis

oli

Noc

ardi

a_cy

riacige

orgi

ca.s

t.GUH-2

Saccharomonospora_glauca_K62

Mycobacterium_parascrofulaceum_ATCC.BAA-614

Str

epto

myc

es_

sp._

WM

6378

Gardnerella_vaginalis_ATCC.14019

Rhodococcus_jostii.st.RHA1

Pyr

obacu

lum

_aero

philu

m_AT

CC

.51768

Strepto

myc

es_

rubello

murinus

Actinomyces_massiliensis_F0489

Rhodococcus_aetherivorans

Mycobacterium_sinense.st.JDM601

Actinomyces_sp.oral_tax_178.str.F0338

Rhodococcus_sp._B7740

Rothia_mucilaginosa.st.DY-18

Geoderm

atophilus_obscurus_ATCC.25078

Therm

omonospora_curvata_A

TC

C.19995

Kutzneria_albida_DSM.43870

Pyr

oco

ccus_

furiosu

s_A

TC

C.4

3587

Nocardiopsis_sp._N

RR

L_B-16309

Frigoribacterium_sp._RIT-PI-h

Strep

tom

yces

_rim

osus

_sub

sp._

rimos

us_A

TC

C.1

0970

Blastococcus_saxobsidens.st.DD2

Pseudonocardia_dioxanivorans_ATCC.55486

Meth

anopyru

s_kandle

ri.s

t.A

V19

Arthrobacter_globiformis.NBRC.12137

Stre

pto

myces_sp._

MM

G1121

Trueperella_pyogenes

Mycobacterium_hassiacum_DSM.44199

Stre

pto

myce

s_ca

ttleya

_ATC

C.3

5852

Rhodococcus_equi_ATCC.33707

Corynebacterium_glucuronolyticum

_ATCC.51866

Str

epto

myc

es_

tsuku

bensi

s_N

RR

L18488

Stre

pto

myces_averm

itilis_A

TC

C.3

1267

Actinotignum_schaalii

Kocuria_sp._UCD-OTCP

Actin

obacte

ria_bacte

rium

_O

V320

Saccharothrix_sp._NRRL_B-16348

Mycobacterium_paratuberculosis_ATCC.BAA-968

Natrin

em

a_sp.s

t.J7-2

Halo

rubru

m_la

cuspro

fundi_

AT

CC

.49239

Noca

rdia

_nova_SH

22a

22

90

07.

CC

TA

_m

ura

nila

s_

mui

ret

ca

bol

aH

Micr

obacteriu

m_laeva

niform

ans_OR221

Brevibacterium_mcbrellneri_ATCC.49030

Arthrobacter_arilaitensis.st.DSM_16368

Corynebacterium_glutamicum_ATCC.13032

Corynebacterium_resistens.DSM.45100

Pyro

coccus_abyssi.st.G

E5

Cellulomonas_gilvus_ATCC.13127

80

05.

ts.

sis

ne

gn

ag

gnij

_.p

sb

us

_s

uci

po

cs

org

yh

_p

ert

S

Actinomyces_sp._oral_tax_175.str.F0384

Stre

ptom

yces

_gris

eus_

subs

p._g

riseu

s.st

.JCM

_462

6

Sulfo

lobus_

solfa

taricu

s_ATC

C.3

5092

Beutenbergia_cavernae_ATCC.BAA-8

Str

epto

myces_virid

ochro

mogenes

Bifidobacteriu

m_scardovii

Scardovia_wiggsiae_F0424

Marin

itherm

us_

hyd

roth

erm

alis.st.D

SM

.14884

Sanguibacter_keddieii_ATCC.51767

Mobiluncus_mulieris_ATCC.35239

Act

inobact

eria

_bact

eriu

m_IM

CC

26256

Actinomyces_graevenitzii_C83

Stre

pto

myces_sp.

Cellulomonas_fimi_ATCC.484

marine_actinobacterium_PHSC20C1

Kita

sato

spora

_se

tae_ATC

C.3

3774

Actinobacte

ria_bacte

rium

_O

K006

Bifidobacterium_gallicum_DSM.20093

Str

epto

myces_gla

ucescens

Dein

oco

ccus_

soli_

Cha_et_

al._

2014

Halo

terrig

ena_tu

rkm

enic

a_A

TC

C.5

1198

Strep

tom

yces

_sp.

_769

Halo

fera

x_volc

anii_

AT

CC

.29605

Stre

pto

myces_sp._

e14

Renibacterium_salmoninarum_ATCC.33209

Mic

rolu

natu

s_ph

osph

ovor

us_A

TC

C.7

0005

4

Saccharothrix_espanaensis_ATCC.51144

Dietzia_c

inna

mea

_P4

Dein

oco

ccus_

pro

teolyticu

s_ATC

C.3

5074

Micro

bact

eriu

m_o

xyda

ns

Corynebacterium_vitaeruminis_DSM.20294

Noc

ardi

a_fa

rcin

ica.

st.IF

M_1

0152

Nocardia_sp._NRRL_S-836

Mycobacterium_therm

oresistibile

_ATCC.19527

Corynebacterium_diphtheriae_ATCC.700971

Actinoplanes_sp._N902-109

Bifidobacteriu

m_longum.st.NCC_2705

Strepto

myc

es_

sp._

NR

RL_F-6

602

Mycobacterium_haemophilum

Gordonia_polyisopre

nivorans.st.D

SM.44266

Pyro

lobus_

fum

arii.D

SM

.11204

Str

epto

myces_sp._

PB

H53

Bifidobacterium_bifidum_BGN4

Dermacoccus_nishinomiyaensis

misce_C

renarchaeota_group-1_archaeon_SG

8-32-1

Str

epto

myces_davaw

ensis

_JC

M_4913

Bifidobacterium_angulatum_DSM.20098

Strepto

myc

es_

xiam

enensi

s

Micr

obacteriu

m_ke

tosir

educens

Mycobacterium_vanbaalenii.st.DSM.251

Nocardioides_simplex

Corynebacterium_kutscheri

Meth

anosphaera

_sta

dtm

anae_A

TC

C.4

3021

Str

epto

myces_coelic

olo

r_A

TC

C.B

AA

Ilumatobacter_coccineus_Y

M16-304

Str

epto

myces_sp._

AS

58

Mycobacterium_leprae.st.TN

Gordonia_sputi_

NBRC.100414

Mobiluncus_curtisii_ATCC.43063

Mob

ilico

ccus

_pel

agiu

s_NBRC.1

0492

5

Nakamurella_multipartita_ATCC.700099

Pro

pion

imic

robi

um_l

ymph

ophi

lum

_AC

S-0

93-V

-SC

H5

Meth

anobre

vib

acte

r._A

bM

4

Stre

pto

myce

s_sp

._M

g1

Clavibacter_michiganensis_subsp._sepedonicus_ATCC.33113

Halo

sta

gnic

ola

_la

rsenii_

XH

-48

Corynebacterium_atypicum

Nocardioides_luteus

Stre

pto

myce

s_vie

tnam

ensis

Stre

pto

myce

s_sp

._N

RR

L_F

-4428

archaeon_GW

2011_AR

10

38

09

2.C

CT

A_

su

eci

vs

_s

ec

ym

otp

ert

S

Frankia_symbiont_subsp._D

atisca_glomerata

Microm

onospora_sp._NRRL_B

-16802

Frankia_sp._EUN1f

Therm

obifida_fusca.st.YX

Rothia_dentocariosa_ATCC.17931

Str

epto

myces_variegatu

s

misce_C

renarchaeota_group-6_archaeon_AD

8-1

Stre

pto

myces_scabie

i.st.8

7.2

2

Janibacter_hoylei_PVAS-1

Micr

obacteriu

m_tri

choth

ecenolyt

icum

Corynebacterium_pseudotuberculosis.st.C231

Strep

tom

yces

_niv

eus_

NC

IMB_1

1891

Strep

tom

yces

_lyd

icus

_A02

Therm

opro

teus_

sp._

AZ2

Kytococcus_sedentarius_ATCC.14392

Strepto

myc

es_

sp._

NR

RL_F-6

602

Mycobacterium_obuense

Bifidobacteriu

m_longum_subsp.longum_F8

Xylanimonas_cellulosilytica.st.DSM_15894

Mycobacterium_tuberculosis_H37Rv

Cellulomonas_flavigena_ATCC.482

Strepto

myc

es_

sp._

NR

RL_S

-495

Arthrobacter_sp._PAMC25486

Microm

onospora_sp._HK10

Stre

pto

myces_caele

stis

Cald

ivirga_m

aquili

ngensi

s_ATC

C.7

00844

Str

epto

myces_griseoflavus_T

u4000

Mycobacterium_elephantis

Dein

oco

ccus_

gobie

nsis.st.D

SM

_21396

Corynebacterium_xerosis

Micr

obacteriu

m_ch

ocola

tum

Mycobacterium_smegmatis_ATCC.700084

Str

epto

myces_virid

ochro

mogenes.s

t.D

SM

_40736

Natro

nobacte

rium

_gre

gory

i_A

TC

C.4

3098

Verrucosispora_maris.st.A

B-18-032

Streptomyces_sp._AA4

Segniliparu

s_ro

tundus_

ATCC.BAA-9

72

Jonesia_denitrificans_ATCC.14870

Gord

onia_sih

wensis_NBRC.1

08236

69

78

1.M

SD.t

s.ila

gto

ej_

su

cc

ocil

akl

ala

H

Corynebacterium_falsenii_DSM.44353

Rhodococcus_spRD6.2

Bifidobacterium_dentium_ATCC.27534

Bifidobacterium_therm

ophilum_RBL67

Halo

pig

er_

xanaduensis

.st.D

SM

.8323

Amycolatopsis_orientalis_HCCB10007

Curtobacterium_flaccumfaciens_UCD-AKU

Stre

pto

spora

ngiu

m_ro

seum

_ATC

C.1

2428

T9

11

CS

UM

_.p

s_

se

cy

mot

per

tS

Meth

anobre

vib

acte

r_ru

min

antium

_A

TC

C.3

5063

Acidothermus_cellulolyticus_ATC

C.43068

33

70

4_

MS

D.ts.

su

nillo

c_

se

cy

mot

pert

S

Micro

bacter

ium

_sp.

_SA39

Mycobacterium_kansasii_ATCC.12478

Arthrobacter_chlorophenolicus_ATCC.700700

Meth

anoca

ldoco

ccus_

jannasc

hii_

AT

CC

.43067

47

68

1_

MS

D.t

s.si

sn

ep

alo

om

_s

an

om

on

ort

aN

Leucobacter_sp._Ag1

Bra

chyb

acte

rium

_sp.

_SW

0106

-09

Stre

pto

myces_sp._

NR

RL_S

-444

Saccharopolyspora_erythraea_ATCC.11635

Dein

oco

ccus_

swuensis

Stre

pto

myce

s_sp

._N

RR

L_F-6

491

Vulc

anis

aeta

_dis

trib

uta

.st.D

SM

.14429

Mycobacterium_xenopi_RIVM700367

Saccharomonospora_marina_XMU15

Bifidobacteriu

m_asteroides.st.P

RL2011T

herm

ococcus_kodakare

nsis

_A

TC

C.B

AA

Pro

pion

ibac

teriu

m_a

cnes

.st.K

PA17

1202

Strepto

myc

es_

sp._

PVA

_94-0

7

Str

ept_

cyaneogriseus_subsp.n

oncyanogenus

Micrococcus_luteus_ATCC.4698

Corynebacterium_mustelae

Janibacter_sp._HTCC2649

Rhodococcus_erythropolis.st.PR4

Stre

pto

myces_sp._

NR

RL_W

C-3

618

67

9.2

1_

SD

_s

nati

cs

o_.

ps

bu

s_

su

ne

go

mor

hc

oe

sor

_tp

ertS

Terrabacter_sp._28

Aci

dia

nus_

hosp

italis

.st.W

1

Natria

lba_m

agadii_

AT

CC

.43099

Bifidobacterium_adolescentis_ATCC.15703

Micro

bacter

ium

_folioru

m

Catenulispora_acidiphila.st.D

SM

_44928

Micro

bacteriu

m_aza

dirach

tae

Lechevalieria_aerocolonigenes

Amycolatopsis_mediterranei.st.S699

Corynebacterium_sp._ATCC.6931

Gordonia_rh

izosp

hera_NBRC.16068

Therm

osphaera

_aggre

gans.s

t.DS

M.11

486

Salinispora_tropica_ATC

C.B

AA-916

I4F2P8 acti 477641

Mycobacterium_marinum_ATCC.BAA-535

halo

philic

_arc

haeon_D

L31

Stre

pto

myce

s_ka

trae

Stre

pto

myces_sp._

WM

6372

Bifidobacteriu

m_breve_DSM.20213

94

03

4.C

CT

A_i

utro

msir

am

_al

ucr

aol

aH

Actinobacte

ria_bacte

rium

_O

K074

Meth

anocella

_arv

ory

zae.s

t.DS

M.2

066

Therm

ofilum_sp._1910b

Stre

pto

myce

s_him

asta

tinicu

s_ATC

C.5

3653

Tsuk

amur

ella_p

auro

met

abola_

ATCC.8

368

Therm

ofilum_pendens.st.H

rk_5

Corynebacterium_variabile.st.DSM_44702

Frankia_sp.st.EuI1c

Bifidobacterium_kashiwanohense_PV20-2

Dein

oco

ccus_

marico

pensis.st.D

SM

.21211

Meth

anoth

erm

obacte

r_th

erm

auto

trophic

us_A

TC

C.2

9096

Pro

pion

ibac

teriu

m_a

cidi

prop

ioni

ci_A

TC

C.4

875

Gord

onia_am

arae_NBRC.1

5530

Str

epto

myces_in

carn

atu

s

Mycobacterium_neoaurum_VKM_Ac-1815D

Actinoplanes_sp._ATCC.31044

Mycobacterium_sp.st.KMS)

Patulibacter_m

edicamentivorans

Nocard

ia_bra

silie

nsis_ATCC.7

00358

Stre

ptom

yces

_pra

tens

is_A

TCC.3

3331

Nocardioides_sp.st.BAA-49

Therm

ogla

diu

s_ce

llulo

lyticus.st.1

633

Halo

geom

etr

icum

_borinquense_

AT

CC

.700274

Corynebacterium_efficiens.st.DSM_44549

Ignis

phaera

_aggre

gans.

st.D

SM

.17230

Kin

eosp

haer

a_lim

osa_

NBRC.1

0034

0

Strep

tom

yces

_pris

tinae

spira

lis_A

TC

C.2

5486

Nocardioides_sp._CF8

Therm

obisp

ora

_bisp

ora

_ATC

C.1

9993

Sta

phylo

therm

us_

marin

us_

AT

CC

.43588

Kocuria_rhizophila_ATCC.9341

Actinomyces_urogenitalis_DSM.15434

Corynebacterium_glyciniphilum

_AJ_3170

Gordonia_sp

._KTR9

Natro

nococcus_occultu

s_S

P4

Frankia_sp.st.EAN1pec

Meth

anobacte

rium

_sp._

MB

1

Str

epto

myces_nodosus

Stre

pto

myces_sp._

MM

G1533

Actinomyces_sp.oral_tax_448.str.F0400

Stre

pto

myce

s_bin

gch

enggensis.st.B

CW

-1

Hyp

erth

erm

us_

butylicu

s.st.DS

M_5456

Actinosynnema_mirum_ATCC.29888

Pro

pion

ibac

teriu

m_p

ropi

onic

um.s

t.F02

30a

47

80

07.

CC

TA

_ie

ata

ho

ku

m_

mui

bor

cim

ola

H

Stre

pto

myces_plu

ripote

ns

Stre

ptom

yces

_sp.

st.S

irexA

A-E

Meth

anobre

vib

acte

r_sm

ithii_

AT

CC

.35061

Microm

onospora_lupini_str._Lupac_08

Natro

nom

onas_phara

onis

_A

TC

C.3

5678

Meth

anobacte

rium

_la

cus.s

t.A

L-2

1

Arcanobacterium_haemolyticum_ATCC.9345

Str

epto

myc

es_

venezu

ela

e_AT

CC

.10712

Natrin

em

a_pelliru

bru

m.s

t.DS

M_15624

Strep

tom

yces

_aur

atus

_AG

R00

01

Bra

chyb

acte

rium

_fae

cium

_ATC

C.4

3885

Micr

obacteriu

m_te

stace

um.s

t.StL

B037

L0J2N8 acti 212767

Str

epto

myces_sp.s

t.S

PB

074

Actinoplanes_missouriensis_ATC

C.14538

Actinomyces_turicensis_ACS-279-V-Col4

Mycobacterium_sp._VKM_Ac-1817D

Bifidobacteriu

m_animalis

_subsp._lactis.st.A

D011

Stre

pto

myces_sp._

WM

4235

Gardnerella_vaginalis_JCP8481B

Aus

twickia_

chel

onae

_NBRC.1

0520

0

Stre

pto

myces_zin

cire

sis

tens_K

42

Arthrobacter_sp.st.FB24

Therm

ococcus_lit

ora

lis_A

TC

C.5

1850

Stre

pto

myces_sp._

MU

SC

136T

Kribbella_flavida.st.DSM_17836

Micro

bact

eriu

m_s

p._A

g1

Aeromicrobium

_marinum

_DSM.15272

Leifsonia_xyli_subsp._xyli.st.CTCB07

Intrasporangium_calvum_ATCC.23552

Kineococcus_radiotolerans_ATCC.BAA-149

Sulfo

lobus_

toko

daii.

st.D

SM

_16993

Microbacte

rium_ginse

ngisoli

Frankia_alni.st.ACN14a

Dein

oco

ccus_

gobie

nsis.st.D

SM

_21396

Corynebacterium_matruchotii_ATCC.14266

Corynebacterium_urealyticum_ATCC.43042

Halo

bacte

rium

_salin

aru

m_A

TC

C.7

00922

Stre

pto

myce

s_vio

lace

usn

iger_

Tu_411

3

Gordonia_bro

nchialis

_ATCC.25592

Bifidobacterium_pseudolongum_PV8-2

Str

epto

myc

es_

alb

us_

AT

CC

.21838

Sulfo

lobus_

aci

doca

ldarius_

ATC

C.3

3909

Saccharomonospora_viridis_ATCC.15386

Stre

ptom

yces

_cla

vulig

erus

_ATC

C.2

7064

Frankia_sp.st.CcI3

Isoptericola_variabilis.st.225

54

0M

_s

uc

aitn

aru

ao

esir

g_

se

cy

mot

pert

S

Sac

char

othr

ix_s

p._S

T-88

8

Actinoplanes_friuliensis_D

SM

.7358

Mycobacterium_rhodesiae.st.NBB3

Str

epto

myces_ghanaensis

_A

TC

C.1

4672

Pro

pion

ibac

teriu

m_f

reud

enre

ichi

i_su

bsp.

_she

rman

ii_ATC

C.9

614

Nocardiopsis_alba_A

TC

C.B

AA-2165

Candidatus_C

aldiarchaeum_subterraneum

Turicella_otitidis_ATCC.51513

Rhodococcus_pyridinivorans_SB3094

Microm

onospora_aurantiaca_ATCC.27029

Scardovia_inopinata_F0304

Meth

anobacte

rium

_palu

dis

.st.D

SM

.25820

Mycobacterium_sp._EPa45

Actinomyces_neuii_BVS029A5

Nocardiopsis_dassonvillei_A

TC

C.23218

Stackebrandtia_nassauensis.st.DSM_44728

Therm

ococcus_baro

philu

s.s

t.D

SM

_11

836

Tree scale: 0.1

ARCHAEA

ACTINOBACTERIA

Deinococcus

Tree scale

  8 

Supplementary Figure 6. Phylogenetic analyses of the NucS C-terminal (CT) region. Unrooted

ML tree of CT sequences from the PFAM PF01939 domain. Grey font indicates that this domain is

in NucS and blue font that is found outside NucS. Black branches, Actinobacteria; red

Deinococcus-Thermus; grey, other Bacteria; light blue, Archaea; green, eukaryotic sequences. Bold

fonts indicate the main species used in this study. Labels indicate species name. The circles

represent >80% bootstrap in 1,500 replicates.

111-A

ctinobacte

ria b

acte

rium

OV

320

415-Microbacterium ketosireducens

31

20

2.M

SD.

ev

erb

muir

etc

ab

odifi

B-8

03

35-Methanocella arvoryzae.st.DSM

22066

348-

Mar

inith

erm

us h

ydro

ther

mal

is.s

t.DSM

148

84169-C

orynebacterium camporealensis

07

90

1.C

CT

A s

us

omi

r.p

sb

us

su

so

mir

se

cy

mot

per

tS-

39

249-Gordonia sihw

ensis NBR

C 108236

270-M

ycobacte

rium

neoauru

m V

KM

Ac-1

815D

194-Austwickia chelonae NBRC 105200

106-S

trepto

myces x

iam

enensis

4N

GB

mu

difib

muir

etc

ab

odifi

B-7

03

424-Clostridium sp.CAG:628

79-A

ctin

obacte

ria b

acte

rium

OV

450

266-Myco

bacteriu

m sm

egm

atis.st.ATC

C.700084

258-Mycobacterium

leprae.st.TN

292-B

ifidobacte

rium

aste

roid

es.s

t.PR

L2011

196-Curtobacterium flaccumfaciens UCD-AKU

384-Haloarcula marismortui.st.A

TCC.43049

175-Corynebacterium sp.ATCC.6931

324-A

ero

mic

robiu

m m

arinum

.DS

M.1

5272

7-miscellaneous Crenarchaeota group archaeon SMTZ-80

112-S

trepto

myces s

p.s

t.S

PB

074

9-Burkholderiales bacterium GJ-E10

226-Arthrobacter sp.PAMC25486

66-A

ctinosyn

nem

a m

irum

.st.AT

CC

.29888

131-S

trepto

myc

es

zinci

resi

stens

K42

30

75

1.C

CT

A.t

s.si

tn

ec

sel

od

a m

uir

etc

ab

odi

fiB-

50

3

159-Cory

nebacteriu

m e

pidermidica

nis

243-Rhodococcus jostii.st.RHA1

117-S

trepto

myces s

p.M

MG

1533

284-M

obilu

ncus c

urtis

ii.st.A

TC

C.4

3063

124-S

trepto

myc

es

plu

ripote

ns

379-Halalkalicoccus je

otgali.st.D

SM 18796

162-Corynebacteriu

m efficiens.st.D

SM 44549

181-Corynebacterium lipophiloflavum.DSM.44291

333-M

icro

monosp

ora

sp.N

RR

L B

-16802

387-Haloquadratum walsbyi.st.DSM 16790

153-Cory

nebacteriu

m v

ariabile

.st.D

SM 4

4702

360-

Met

hano

ther

mob

acte

r the

rmau

totro

phicus

.st.A

TCC.2

9096

161-Coryn

ebacteriu

m glutamicum.st

.ATCC.13032

346-

Seg

nilip

arus

rot

undu

s.st

.ATC

C.B

AA

-972

390-Natronobacterium gregoryi.st.ATCC.43098

278-S

acch

aro

monosp

ora

gla

uca

K62

225-Arthrobacter arilaitensis.st.DSM 16368

39-Vibrio campbellii.st.H

Y01

173-Corynebacterium aurimucosum.st.ATCC.700975

85-S

trepto

myces a

lbus.s

t.AT

CC

.21838

211-Beutenbergia cavernae.st.ATCC.BAA-8

70-N

oca

rdio

psis a

lba.st.A

TC

C.B

AA

-2165

65-N

oca

rdia

sp.N

RR

L S

-836

419-Clostridium sp.CAG:433

62-S

acch

aro

thrix sp

.NR

RL B

-16348

275-M

ycobacte

rium

absce

ssus.st.A

TC

C.1

9977

313-A

ctinom

yces s

p.o

ral ta

xon 8

48 s

tr.F

0332

52-Frankia sp.st.E

AN

1pec

416-Parcubacteria

290-A

ctin

om

yces s

p.o

ral ta

xon 1

75 s

tr.F0384

25

85

3.C

CT

A.ts.

ay

eltta

c s

ec

ym

otp

ertS-

79

160-Cory

nebacteriu

m sp

.KPL1989

420-Clostridium sp.CAG:433

99-S

trepto

myces v

iola

ceusnig

er

Tu 4

113

58-S

tacke

bra

ndtia

nassa

uensis.st.D

SM

44728

250-Gordonia bronchialis.st.ATC

C.25592

274-M

ycobacte

rium

therm

ore

sistibile

ATC

C.1

9527

216-Microbacterium sp.Ag1

388-Halogeometricum borinquense.st.ATCC.700274

329-S

alin

ispora

tro

pic

a.s

t.A

TC

C.B

AA

-916

68-T

herm

obisp

ora

bisp

ora

.st.AT

CC

.19993

46-Deinococcus m

aricopensis.st.DSM

21211

369-Therm

ofilum

pendens.

st.H

rk 5

102-S

trepto

myces r

ubello

murinus

215-Microbacterium sp.SA39406-Pyrococcus abyssi.st.GE5

34-Caldivirga maquilingensis.st.ATCC.700844

154-Cory

nebacteriu

m je

ikeiu

m.s

t.K411

47-Deinococcus soli.C

ha et al.2014

206-Xylanimonas cellulosilytica.st.DSM 15894

197-Leucobacter sp.Ag1

132-S

trepto

myc

es

grise

ofla

vus

Tu4000

233-Arthrobacter aurescens.st.TC1

319-K

utz

neria a

lbid

a.D

SM

.43870

247-Gordonia sp.KTR9

107-S

trepto

myces s

p.N

RR

L F

-6602

402-Thermococcus kodakarensis.st.ATCC.BAA-918

182-Corynebacterium ureicelerivorans

73-S

trepto

myces k

atra

e

257-Mycobacterium

parascrofulaceum ATC

C.B

AA-614

17-Microgenomates

27-Lyngbya sp.st.PCC 8106

64-L

ech

eva

lieria

aero

colo

nig

enes

204-Jonesia denitrificans.st.ATCC.14870

244-Rhodococcus erythropolis.st.PR4

322-D

ele

ted-I

4F

2P

8265-Mycobacterium

sp.VKM

Ac-1817D

341-P

atu

libact

er m

edic

am

entiv

ora

ns

51-Frankia sp.E

UN

1f

126-S

trepto

myc

es

inca

rnatu

s

172-Corynebacterium casei LMG S-19264

185-Streptosporangium roseum.st.ATCC.12428

287-A

ctin

om

yces s

p.o

ral ta

xon 1

80 s

tr.F0310

80-S

trepto

myces s

p.W

M6372

376-Natro

nobacterium gregoryi.s

t.ATCC.43098

396-Natrialba magadii.st.ATCC.43099

43-Deinococcus gobiensis.st.D

SM

21396

281-A

ctinom

yces g

raeve

nitzii C

83

423-Trichoplax adhaerens

36-Methanocella paludicola.st.DSM

17711

373-Candidatus M

agnetobacteriu

m bavaric

um

30-Thermoproteus sp.AZ2

22-Hirschia baltica.st.ATCC.4981429-Jeotgalibacillus sp.D5

256-Mycobacterium

paratuberculosis.st.ATCC.B

AA-968

212-Microbacterium testaceum.st.StLB037

293-B

ifidobacte

rium

pseudolo

ngum

PV

8-2

50-Frankia sym

biont subsp.Datisca glom

erata

285-M

obilu

ncus m

ulie

ris A

TC

C.3

5239

163-Corynebacteriu

m kutscheri

44-Deinococcus gobiensis.st.D

SM

21396

38-Paraglaciecola arctica BSs20135

114-A

ctinobacte

ria b

acte

rium

OK

006

109-S

trepto

myces v

ariegatu

s

418-Pelagibacter ubique.st.HTCC1062

378-Halobacteriu

m salinarum.st.A

TCC.700922

53-Frankia sp.st.E

uI1c

151-

Cor

yneb

acte

rium

falsen

ii.DSM

.443

53

32-Ignisphaera aggregans.st.DSM 17230

37-Paraglaciecola arctica BSs20135

49-Frankia alni.st.A

CN

14a

143-

Stre

ptom

yces

cae

lest

is

404-Thermococcus barophilus.st.DSM 11836

277-N

oca

rdia

nova

SH

22a

320-G

eoderm

ato

philu

s o

bscuru

s.s

t.A

TC

C.2

5078

395-Haloterrigena turkmenica.st.ATCC.51198

168-Corynebacteriu

m maris

.DSM.45190

125-S

trepto

myc

es

colli

nus.

st.D

SM

40733

391-Halopiger xanaduensis.st.DSM 18323

232-Arthrobacter sp.st.FB24

411-Metagenomes

296-S

card

ovia

inopin

ata

F0304

400-Crenarchaeota group-1 archaeon SG8-32-1

11-Synechococcus sp.ATCC.27264

214-Microbacterium hydrocarbonoxydans

230-Arthrobacter globiformis NBRC 12137

195-Kineococcus radiotolerans.st.ATCC.BAA-149

19-Chryseobacterium sp.ERMR1:04

86-S

trepto

myces s

p.P

VA

94-0

7

370-Lokia

rchaeum

sp.G

C14 75

128-S

trepto

myc

es

rose

och

rom

ogenus

subsp

.osc

itans

DS

12.9

76

189-Terrabacter sp.28

89

00

2.M

SD.

mut

alu

gn

a m

uir

etc

ab

odi

fiB-

60

3

399-Crenarchaeota group-6 archaeon AD8-1

299-B

ifidobacte

rium

gallic

um

.DS

M.2

0093

75-S

trepto

myces s

p.M

g1

377-Halobacteriu

m sp.DL1

72-N

ocard

iopsis

sp.N

RR

L B

-16309

136-

Strep

tom

yces

gla

uces

cens

205-Cellulomonas gilvus.st.ATCC.13127

84-S

trepto

myces v

enezuela

e.s

t.AT

CC

.10712

291-C

ate

nulis

pora

acid

iphila

.st.D

SM

44928

380-Natro

nomonas moolapensis.st.D

SM 18674

5-Methanococcoides burtonii.st.DSM 6242

15-Streptomyces pluripotens

202-Rothia dentocariosa.st.ATCC.17931

78-S

trepto

myces s

p.N

RR

L F

-4428

342-C

onexi

bact

er w

oese

i.st.D

SM

14684

340-P

ropio

nim

icro

biu

m ly

mphophilu

m A

CS

-093-V

-SC

H5

148-

Cor

yneb

acte

rium

dipht

heria

e.st.A

TCC.7

0097

1

321-B

lasto

coccus s

axobsid

ens.s

t.D

D2

122-S

trepto

myc

es

sp.P

BH

53

146-

Bre

viba

cter

ium

mcb

relln

eri A

TCC.4

9030

4-Methanococcoides methylutens MM1

386-Haloferax volcanii.st.ATCC.29605

190-Intrasporangium calvum.st.ATCC.23552

45-Deinococcus proteolyticus.st.ATC

C.35074

179-Corynebacterium marinum.DSM.44953

100-S

trepto

myces h

imasta

tinic

us A

TC

C.5

3653

245-Rhodococcus pyridinivorans SB3094

334-A

ctin

opla

nes

friu

liensi

s.D

SM

.7358

231-Renibacterium salmoninarum.st.ATCC.33209

219-Microbacterium ketosireducens

288-A

ctin

om

yces s

p.o

ral ta

xon 1

78 s

tr.F0338

349-

Aer

opyr

um p

erni

x.st

.ATC

C.7

0089

3

273-M

ycobacte

rium

hassia

cum

.DS

M.4

4199

261-Mycobacterium

marinum

.st.ATC

C.B

AA-535

405-Thermococcus litoralis.st.ATCC.51850

283-A

ctinom

yces n

euii B

VS

029A

5

108-S

trepto

myces v

ietn

am

ensis

223-Microbacterium chocolatum

135-

Strep

tom

yces

dav

awen

sis

JCM

491

3

359-

Met

hano

bacter

ium

sp.

MB1

133-

Strep

tom

yces

sp.

e14

91-S

trepto

myces c

lavulig

eru

s.s

t.AT

CC

.27064

372-Candidatu

s Magneto

bacteriu

m b

avaric

um

279-S

acch

aro

monosp

ora

marin

a X

MU

15

20

A s

uci

dyl

se

cy

mot

pert

S-6

9

63-S

acch

aro

thrix e

spanaensis.st.A

TC

C.5

1144

152-Cory

nebacteriu

m g

lycinip

hilum

AJ

3170

123-S

trepto

myc

es

hyg

rosc

opic

us

subsp

.jinggangensi

s.st

.5008

410-Metagenomes

298-B

ifidobacte

rium

therm

ophilu

m R

BL67

71-N

oca

rdio

psis d

asso

nville

i.st.AT

CC

.23218

414-Peptococcaceae bacterium SCADC1 2 3

16-Kitasatospora setae.st.ATCC.33774

171-Corynebacterium stria

tum ATCC.6940

167-Corynebacteriu

m genitaliu

m ATCC.33030

352-

The

rmog

ladi

us c

ellu

loly

ticus

.st.1

633

371-Lokia

rchaeum

sp.G

C14 75

201-Rothia mucilaginosa.st.DY-18

25-Methanobacterium lacus.st.AL-21

220-Microbacterium trichothecenolyticum

210-Sanguibacter keddieii.st.ATCC.51767

260-Myco

bacteriu

m tu

bercu

losis.st.A

TCC.25618

6-Methanosalsum zhilinae.st.DSM 4017

164-Corynebacteriu

m mustelae

69-T

herm

om

onosp

ora

curva

ta.st.A

TC

C.1

9995

139-

Strep

tom

yces

sp.

NR

RL

WC

-361

8

89-S

trepto

myces s

p.s

t.Sire

xA

A-E

121-S

trepto

myc

es

sp.M

US

C11

9T

365-Meth

anosphaera

sta

dtmanae.s

t.ATCC

.43021

203-Kocuria rhizophila.st.ATCC.9341

295-P

ara

scard

ovia

dentic

ole

ns.D

SM

.10105

417-Brachyspira sp.CAG:700

14-Streptomyces albus.st.ATCC.21838

199-Clavibacter michiganensis.st.ATCC.33113

343-Ilu

mato

bact

er co

ccin

eus

YM

16-3

04

409-Bacillus cereus.st.ATCC.14579

156-Cory

nebacteriu

m g

lucu

ronolyt

icum

ATCC.5

1866

227-Arthrobacter chlorophenolicus.st.ATCC.700700

316-A

mycola

topsis

orienta

lis H

CC

B10007

140-

Strep

tom

yces

viri

doch

rom

ogen

es

183-Corynebacterium testudinoris

103-S

accharo

thrix s

p.S

T-8

88

145-

Stre

ptom

yces

sp.

WM

6378

149-

Cor

yneb

acte

rium

resist

ens.

st.D

SM 4

5100

355-

Hyp

erth

erm

us b

utylicus

.st.D

SM

545

655-N

ocardioides sp.CF8

332-M

icro

monosp

ora

lupin

i str.L

upac

08

276-D

ietzia

cinnam

ea P

4

10-Desulfatitalea sp.BRH c12

253-Nocardia farcinica.st.IFM

10152

208-Cellulomonas flavigena.st.ATCC.482

310-B

ifid

obacte

rium

longum

.st.N

CC

2705

67-T

herm

obifid

a fu

sca.st.Y

X

325-A

ctinopla

nes m

issouriensis

.st.A

TC

C.1

4538

269-D

ele

ted- L

0J2

N8

56-Nocardioides sp.st.B

AA

-499

401-Candidatus Caldiarchaeum subterraneum

394-Natronococcus occultus SP4

353-

The

rmos

phae

ra a

ggre

gans

.st.D

SM

114

86

224-Kocuria sp.UCD-OTCP

158-Cory

nebacteriu

m a

rgento

rate

nse.D

SM.4

4202

315-S

trepto

myces s

p.A

A4

218-Microbacterium foliorum

96

7.p

s s

ec

ym

otp

ertS-

59

354-

arch

aeon

GW

2011

AR10

222-Microbacterium laevaniformans OR221

20-Citreicella sp.357

347-

Aci

doth

erm

us c

ellu

loly

ticus

.st.A

TC

C.4

3068

57-N

oca

rdio

ides lu

teus

2-Methanomethylovorans hollandica.st.DSM 15978

362-

Met

hano

brev

ibac

ter s

p.AbM

4

141-

Strep

tom

yces

sp.

AS58

59-N

aka

mure

lla m

ultip

artita

.st.ATC

C.7

00099

74-S

trepto

myces s

p.C

48-Frankia sp.st.C

cI3

213-Microbacterium azadirachtae

118-S

trepto

myces s

p.M

MG

1121

129-S

trepto

myc

es

nodosu

s

193-Mobilicoccus pelagius NBRC 104925

82-S

trepto

myces n

iveus N

CIM

B 1

1891

138-

Strep

tom

yces

sca

biei

.st.8

7.22

344-M

icro

lunatu

s phosp

hovo

rus.

st.A

TC

C.7

00054

358-

Acidi

anus

hos

pita

lis.s

t.W1

142-

Stre

ptom

yces

gha

naen

sis AT

CC.1

4672

367-Meth

anopyrus

kandle

ri.st

.AV19

10

00

RG

A s

utar

ua

se

cy

mot

per

tS-

49

12-Desulfobacterium autotrophicum.st.ATCC.43914

229-Brachybacterium sp.SW0106-09

87-S

trepto

myces s

p.N

RR

L F

-6602

236-Tsukamurella paurometabola.st.ATCC.8368

188-Janibacter sp.HTCC2649

302-G

ard

nere

lla v

agin

alis

JC

P8481B

209-Cellulomonas fimi.st.ATCC.484

115-S

trepto

myces s

p.W

M6386

113-S

trepto

myces s

p.T

u6071

178-Corynebacterium humireducens NBRC 106098

144-

Stre

ptom

yces

cya

neog

riseu

s su

bsp.

nonc

yano

genu

s

176-Corynebacterium doosanense CAU 212

177-Corynebacterium halotolerans YIM 70093

104-S

trepto

myces s

p.N

RR

L S

-495

286-A

ctin

om

yces tu

ricensis

AC

S-2

79-V

-Col4

267-Mycobacterium

obuense

241-Rhodococcus sp.B7740

33-Vulcanisaeta distributa.st.DSM 14429

364-Meth

anobacteriu

m p

aludis

.st.D

SM 2

5820

312-A

rcanobacte

rium

haem

oly

ticum

.st.A

TC

C.9

345

200-Leifsonia xyli subsp.xyli.st.CTCB07

8-Desulfurispirillum indicum.st.ATCC.BAA-1389

336-P

ropio

nib

act

erium

pro

pio

nic

um

.st.F0230a

363-

Met

hano

brev

ibac

ter s

mith

ii.st

.PS

166-Corynebacteriu

m pseudotuberculosis.st.C231

259-Mycobacterium

kansasii ATCC.12478

282-A

ctinom

yces sp

.ora

l taxo

n 4

48 str.F

0400

13-Streptomyces viridochromogenes

381-Halobacteriu

m sp.DL1

403-Thermococcus gammatolerans.st.DSM 15229

184-Corynebacterium uterequi

330-M

icro

monospora

sp.H

K10

110-S

trepto

myces v

irid

ochro

mogenes.s

t.D

SM

40736

76-S

trepto

myces s

p.W

M4235

60-S

acch

aro

polysp

ora

eryth

raea.st.A

TC

C.11

635

328-M

icro

monospora

aura

ntiaca.s

t.A

TC

C.2

7029

242-Rhodococcus aetherivorans

407-Pyrococcus furiosus.st.ATCC.43587

83-S

trepto

myces s

p.N

RR

L F

-6491

350-

Pyr

olob

us fu

mar

ii.st

.DSM

112

04

385-halophilic archaeon DL31

3-Methanolobus psychrophilus R15

170-Corynebacterium sp.KPL1855

77-S

trepto

myces s

p.N

RR

L S

-4441

27-S

trepto

myc

es

grise

oaura

ntia

cus

M045

297-S

card

ovia

wig

gsia

e F

0424

301-G

ard

nere

lla v

agin

alis

.st.A

TC

C.1

4019

413-Rhodospirillaceae bacterium BRH c57

368-Therm

ofilum

sp.1

910b

268-Mycobacterium

vanbaalenii.st.DS

M 7251

398-Halomicrobium mukohataei.st.ATCC.700874

422-Ricinus communis

186-Kytococcus sedentarius.st.ATCC.14392

314-A

ctinotignum

schaalii

240-Rhodococcus equi ATCC.33707

147-

Cor

yneb

acte

rium

mat

ruch

otii AT

CC.1

4266

40-Firmicutes bacterium

CAG

:114

8F

mu

gn

ol.p

sb

us

mu

gn

ol m

uiret

ca

bo

difiB-

90

3

389-Halorubrum lacusprofundi.st.ATCC.49239

105-S

trepto

myces s

p.C

NQ

-509

234-Micrococcus luteus.st.ATCC.4698

361-

Met

hano

brev

ibac

ter r

umin

antiu

m.s

t.ATC

C.3

5063

130-S

trepto

myc

es

coelic

olo

r.st

.ATC

C.B

AA

-471

335-A

ctin

opla

nes

sp.A

TC

C.3

1044

263-Mycobacterium

sp.EPa45

251-Gordonia rhizosphera N

BRC 16068

235-marine actinobacterium PHSC20C1

338-P

ropio

nib

act

erium

acn

es.

st.K

PA

171202

318-S

accharo

monospora

virid

is.s

t.A

TC

C.1

5386

150-

Cor

yneb

acte

rium

ure

alyt

icum

.st.A

TCC.4

3042

382-Halobacterium salinarum.st.A

TCC.700922

157-Cory

nebacteriu

m a

typicu

m

198-Frigoribacterium sp.RIT-PI-h

24-Burkholderia sp.YI23

116-S

trepto

myces a

verm

itili

s.s

t.A

TC

C.3

1267

272-M

ycobacte

rium

rhodesia

e.st.N

BB

3

119-S

trepto

myc

es

sp.M

US

C136T

246-Nocardia brasiliensis ATCC.700358

397-Halostagnicola larsenii XH-48

248-Gordonia am

arae NBRC 15530

303-B

ifidobacte

rium

scard

ovii

374-Candidatus M

agnetobacterium bavaric

um

134-

Strep

tom

yces

man

grov

isol

i

252-Nocardia cyriacigeorgica.st.G

UH-2

81-S

trepto

myces ts

ukubensis

NR

RL18488

331-V

err

uco

sisp

ora

maris.

st.A

B-1

8-0

32

392-Natrinema sp.st.J7-2

357-

Sul

folo

bus

solfa

taric

us.s

t.ATC

C.3

5092

18-Amphimedon queenslandica

383-Natronomonas pharaonis.st.A

TCC.35678

280-A

ctinom

yces m

assilie

nsis F

0489

300-B

ifidobacte

rium

dentiu

m.s

t.AT

CC

.27534

207-Isoptericola variabilis.st.225

327-A

ctinopla

nes s

p.N

902-1

09

191-Janibacter hoylei PVAS-1

239-Gordonia sputi.NBRC.100414

326-P

seudonocard

ia d

ioxaniv

ora

ns.s

t.A

TC

C.5

5486

255-Mycobacterium

xenopi RIV

M700367

264-Mycobacterium

sinense.st.JDM

601

421-Amphimedon queenslandica

21-Brevundimonas sp.AAP58

90-S

trepto

myces p

rate

nsis

.st.A

TC

C.3

3331

26-Lyngbya sp.st.PCC 8106

408-Methanocaldococcus jannaschii.st.ATCC.43067

238-Gordonia polyisoprenivorans.st.DSM 44266

393-Natrinema pellirubrum.st.DSM 15624

412-Thiothrix nivea.DSM.5205

187-Dermacoccus nishinomiyaensis

31-Pyrobaculum aerophilum

.st.ATCC.51768

311-T

ruepere

lla p

yogenes

155-Cory

nebacteriu

m k

roppenst

edtii.st

.DSM

44385

356-

Sul

folo

bus

toko

daii.st

.DSM

169

93

180-Corynebacterium imitans

317-A

mycola

topsis

mediterr

anei.st.S

699

323-K

ribbella

fla

vid

a.s

t.D

SM

17836

221-Microbacterium ginsengisoli

42-Deinococcus sw

uensis

254-Mycobacterium

elephantis

174-Corynebacterium xerosis

137-

Act

inob

acte

ria b

acte

rium

OK07

4

165-Corynebacteriu

m vitaeruminis.D

SM.20294

351-

Sta

phyl

othe

rmus

mar

inus

.st.A

TC

C.4

3588

120-S

trepto

myc

es

svic

eus

AT

CC

.29083

68

45

2.C

CT

A sil

ari

ps

ea

nit

sir

p s

ec

ym

otp

ert

S-2

9

101-K

itasato

spora

seta

e.s

t.A

TC

C.3

3774

366-Meth

anobacteriu

m la

cus.

st.A

L-21

11

0D

A.t

s.si

tc

al.

ps

bu

s sil

ami

na

mui

ret

ca

bo

difi

B-4

03

289-A

ctin

om

yces u

rogenita

lis.D

SM

.15434

271-M

ycobacte

rium

sp.st.K

MS

88-S

trepto

myces g

riseus s

ubsp.g

riseus.s

t.JC

M 4

626

1-Firmicutes bacterium CAG:582

337-P

ropio

nib

act

erium

aci

dip

ropio

nic

i.st.ATC

C.4

875

237-Rhodococcus sp.RD6.2

28-Bacillus decisifrondis

339-P

rop fre

udenre

ichii

subsp

.sherm

anii.

st.A

TC

C.9

614

192-Kineosphaera limosa NBRC 100340

41-Frankia sp.EUN1f

98-S

trepto

myces b

ingchenggensis

.st.B

CW

-1

61-A

myco

licicoccu

s subfla

vus.st.D

SM

45089

217-Microbacterium oxydans

54-Nocardioides sim

plex

294-B

ifidobacte

rium

kashiw

anohense P

V20-2

262-Mycobacterium

haemophilum

345-

Act

inob

acte

ria b

acte

rium

IMC

C26

256

375-Rhizophagus irr

egularis D

AOM 197198w

228-Brachybacterium faecium.st.ATCC.43885

23-Burkholderia vietnamiensis.st.G4

Tree scale

  9 

Supplementary Figure 7. Phylogenetic analyses of NucS N-terminal (NT) region. Unrooted ML

tree of NT sequences from the PFAM PF01939 domain. Black branches, Actinobacteria; red

Deinococcus-Thermus; light blue, Archaea. Labels indicate species name. The circles represent

>80% bootstrap in 1,500 replicates.

115-Bifidobacterium_breve.DSM.20213

34-Streptom

yces_albus.st.ATCC.21838

166-Natronomonas_pharaonis.st.ATCC.35678

97-Amycolicicoccus_subflavus.st.DSM_45089

289-Gordonia_amarae.NBRC.15530

254-Curtobacterium_flaccumfaciens_UCD-AKU

273-Mycobacterium_hassiacum.DSM.44199

108-Bifidobacterium_bifidum_BGN4

42-Streptomyces_sp.NRRL_S-495

84-Amycolatopsis_orientalis_HCCB10007

262-

325-Coryneba

cterium

_vitaerum

inis.DS

M.2029

4

6-Streptomyces_lydicus_A02

131-Halomicrobium_mukohataei.st.ATCC.700874

308-Corynebacterium_humireducens.NBRC.106098_=.DSM.45392

159-Methanobrevibacter_

smithii.st.PS

247-Arthrobacter_sp.PAMC25486

120-Actinomyces_sp.oral_taxon_848_str.F0332

285-Gordonia_sputi.NBRC.100414

126-halophilic_archaeon_DL31

161-Methanosphaera_stadtmanae.st.ATC

C.43021

179-Halogeometricum_borinquense.st.ATCC.700274

12-Streptomyces_sp.NRRL_WC-3618

337-Coryne

bacterium_je

ikeium.st.K4

11

143-Na

tronoba

cterium

_grego

ryi.st.AT

CC.43098

344-Propionibacterium_acidipr

opionici.st.ATCC.4875

151-Pyroco

ccus_abyss

i.st.GE5

93-Nocardia_nova_SH22a

216-Microbacterium

_ketosireducens

354-Thermobifida_fusca.st.YX

279-Mycobacterium_bovis

226-Cellulomonas_gilvus.st.A

TCC.13127

40-Streptomyces_caelestis

358-Frankia_sp.st.EuI1c

208-Actinomyces_sp.oral_taxon_178_str.F0338

170-Halomicrobium_mukohataei.st.ATCC.700874

43-Streptomyces_rubellomurinus

76-Streptomyces_zinciresistens_K42

1-Streptomyces_bingchenggensis.st.BCW-1

299-Corynebacterium_xerosis

275-Mycobacterium_marinum.st.ATCC.BAA-535

315-Corynebacterium_aurimucosum.st.ATCC.700975

51-Actinoplanes_misso

uriensis.st.A

TCC.14538

78-Pseudonocardia_dioxanivorans.st.ATCC.55486

293-Rhodococcus_sp.B7740

101-Brevibacterium_mcbrellneri.ATCC.49030

230-Iso

ptericola_variabilis.st.2

25

21-Actinobacteria_bacterium_OK006

74-Actinobacteria_bacterium_OV320

242-Brachybacterium_faecium.st.ATCC.43885

133-Halopiger_xanaduensis.st.DSM_18323

45-Verrucosisp

ora_maris.st.A

B-18-032

105-Gardnerella_vaginalis_JCP8481B

220-Microbacterium

_sp.SA39

110-Bifidobacterium_pseudolongum_PV8-2

272-Mycobacterium_neoaurum_VKM_Ac-1815D

227-Cellulomonas_flavigena.st.A

TCC.482

152-Methan

ocaldococc

us_jannasc

hii.st.ATCC

.43067

33-Streptom

yces_glaucescens

46-Micromonospora_sp.HK10

212-Microbacterium

_laevaniformans_O

R221

165-Methanobacterium_sp.MB1

66-Streptomyces_sp.PBH53

198-Methanocella_paludicola.st.DSM_17711

318-Corynebacterium_glutamicum.st.ATCC.13032

319-Corynebacterium_argentoratense.DSM.44202

180-Haloferax_volcanii.st.ATCC.29605

326-Coryneba

cterium

_kutscheri

79-Saccharopolyspora_erythraea.st.ATCC.11635

13-Streptomyces_avermitilis.st.ATCC.31267

69-Streptomyces_pratensis.st.ATCC.33331

209-Actinomyces_sp.oral_taxon_180_str.F0310

122-Halobacterium_salinarum.st.ATCC.700922

7-Streptomyces_auratus_AGR0001

50-Actinoplanes_sp.st.A

TCC.31044

304-Turicella_otitidis.ATCC.51513

39-Streptomyces_griseoflavus_Tu4000

113-Bifidobacterium_kashiwanohense_PV20-2

169-Haloarcula_marismortui.st.ATCC.43049

184-Caldivirga_maquilingensis.st.ATCC.700844

144-Ha

lobacter

ium_sa

linarum

.st.ATCC

.700922

47-Micromonospora_sp.NRRL_B-16802

11-Streptomyces_sp.AS58

250-Arthrobacter_chlorophenolicus.st.ATCC.700700

67-Streptomyces_hygroscopicus_subsp.jinggangensis.st.5008

53-Micromonospora_lupini_str.Lupac_08

27-Streptom

yces_sp.C

17-Streptomyces_tsukubensis_NRRL18488

96-Stackebrandtia_nassauensis.st.DSM_44728

200-Conexibacter_woesei.st.DSM_14684

215-Microbacterium

_trichothecenolyticum

291-Rhodococcus_aetherivorans

162-Methanothermobacter_thermautotrophicus.st.ATCC.2909

6

44-Saccharothrix_sp.ST-888

360-Frankia_symbiont_subsp.Datisca_glomerata

38-Streptom

yces_griseus_subsp.griseus.st.JCM_4626

119-Rothia_mucilaginosa.st.DY-18

132-Halovivax_ruber.st.DSM_18193

248-Arthrobacter_globiformis.NBRC.12137

61-Streptomyces_mangrovisoli

52-Actinoplanes_friuliensis.DSM.7358

62-Streptomyces_cyaneogriseus_subsp.noncyanogenus

98-Kutzneria_albida.DSM.43870

30-Streptom

yces_sp.Mg1

37-Streptom

yces_vietnamensis

48-Micromonospora_aurantiaca.st.A

TCC.27029

300-Corynebacterium_sp.ATCC.6931

332-Cory

nebacteri

um_varia

bile.st.D

SM_4470

2

71-Streptomyces_sp.MMG1121

362-Streptomyces_violaceusniger_Tu_4113

234-Austwickia

_chelonae.NBRC.105200

172-Natrinema_pellirubrum.st.DSM_15624

218-Microbacterium

_azadirachtae

178-halophilic_archaeon_DL31

194-Deinococcus_swuensis

14-Streptomyces_nodosus

245-Mobiluncus_curtisii.st.ATCC.43063

25-Streptomyces_variegatus

100-Kytococcus_sedentarius.st.ATCC.14392

160-Methanobrevibacter_sp.Ab

M4

357-Frankia_sp.st.CcI3

127-Haloarcula_marismortui.st.ATCC.43049

134-Halostagnicola_larsenii_XH-48

189-Staphylothermus_marinus.st.ATCC.43588

309-Corynebacterium_marinum.DSM.44953

139-Therm

ofilum_pe

ndens.st.H

rk_5

295-Rhodococcus_equi.ATCC.33707

5-Streptomyces_sp.769

298-Dietzia_cinnamea_P4

345-Propionimicrobium_lymphophilum_AC

S-093-V-SCH5

322-Corynebacterium_pseudotuberculosis.st.C231

258-Leifsonia_xyli_subsp.xyli.st.CTCB07

4-Streptomyces_rimosus_subsp.rimosus.ATCC.10970

111-Bifidobacterium_dentium.st.ATCC.27534

123-Halorubrum_lacusprofundi.st.ATCC.49239

190-Hyperthermus_butylicus.st.DSM_5456

241-Brachybacterium_sp.SW0106-09

361-Frankia_sp.st.EAN1pec

103-Nocardioides_luteus

36-Streptom

yces_sp.NRRL_F-6491

228-Jonesia_denitrifica

ns.st.A

TCC.14870

367-Streptomyces_sp.Tu6071

181-Haloquadratum_walsbyi.st.DSM_16790338-Coryneba

cterium_lipoph

iloflavum.DSM

.44291

233-Kineosphaera_limosa.NBRC.100340

114-Bifidobacterium_animalis_subsp.lactis.st.AD011

60-Streptomyces_sp.e14

321-Corynebacterium_diphtheriae.st.ATCC.700971

333-Cory

nebacteri

um_glyci

niphilum_

AJ_3170

70-Streptomyces_incarnatus

349-Nocardiopsis_sp.NRRL_B-16309

264-Mycobacterium_sp.st.KMS

288-Gordonia_rhizosphera.NBRC.16068

252-Arthrobacter_arilaitensis.st.DSM_16368

83-Amycolatopsis_mediterranei.st.S699

195-Deinococcus_gobiensis.st.DSM_21396

31-Streptom

yces_sp.WM6372

290-Gordonia_sihwensis.NBRC.108236

137-Haloquadratum_walsbyi.st.DSM_16790

278-Mycobacterium_tuberculosis.st.ATCC.25618

94-Nocardia_brasiliensis.ATCC.700358

281-Mycobacterium_haemophilum

327-Co

ryneba

cterium

_haloto

lerans_

YIM_70

093

282-Mycobacterium_paratuberculosis.st.ATCC.BAA-968

294-Rhodococcus_erythropolis.st.PR4

121-Actinomyces_neuii_BVS029A5

125-Haloferax_volcanii.st.ATCC.29605

77-Streptomyces_cattleya.st.ATCC.35852

246-Kocuria_sp.UCD-OTCP

155-Thermoco

ccus_barophilu

s.st.DSM_1183

6

206-Actinotignum_schaalii

22-Streptomyces_sp.W

M6378

365-Streptomyces_sp.NRRL_F-6602

55-Catenulispora_acidiphila.st.DSM_44928

259-Streptomyces_sp.AA4

263-Mycobacterium_elephantis

138-Thermofilum_sp.1910b

91-Nocardia_farcinica.st.IFM_10152

168-Halalkalicoccus_jeotgali.st.DSM_18796

146-Me

thanoce

lla_palu

dicola.st.DS

M_1771

1

204-Actinomyces_m

assiliensis_F0489

302-Corynebacterium_ureicelerivorans

323-Coryneba

cterium

_matruchotii.ATCC.14266

297-Rhodococcus_pyridinivorans_SB3094

202-Actinomyces_sp.oral_taxon_175_str.F0384

346-Propionibacterium_propionicum.st.F0230a

148-Acid

ianus_ho

spitalis.st

.W1

320-Corynebacterium_efficiens.st.DSM_44549

154-Thermoc

occus_gamm

atolerans.st.D

SM_15229

92-Nocardia_cyriacigeorgica.st.GUH-2

229-Cellulomonas_fimi.st.A

TCC.484

140-miscellaneou

s_Crenarchae

ota_group

-6_archae

on_AD8-1

54-Salinispora_tropica.st.ATCC.BAA-916

49-Actinoplanes_sp.N902-109

185-Thermoproteus_sp.AZ2

188-Thermosphaera_aggregans.st.DSM_11486

106-Bifidobacterium_thermophilum_RBL67

329-Co

rynebac

terium_s

p.KPL1

989

142-Halobacteriu

m_sp.DL1

28-Streptom

yces_katrae

102-Aeromicrobium_marinum.DSM.15272

219-Microbacterium

_sp.Ag1

336-Coryne

bacterium_

resistens.s

t.DSM_4510

0

210-Arcanobacterium

_haemolyticum

.st.ATCC.9345

277-Mycobacterium_parascrofulaceum.ATCC.BAA-614

213-Microbacterium

_testaceum.st.StLB037

312-Corynebacterium_casei_LMG_S-19264

117-Nocardioides_sp.st.BAA-499

348-Nocardiopsis_dassonvillei.st.ATCC.23218

328-Co

rynebac

terium_t

estudino

ris

271-Mycobacterium_smegmatis.st.ATCC.700084

239-Kineococcus_radiotolerans.st.ATCC.BAA-149

58-Streptomyces_niveus_NCIMB_11891

301-Corynebacterium_epidermidicanis

72-Streptomyces_sp.MUSC136T

368-Streptomyces_clavuligerus.st.ATCC.27064

355-Acidothermus_cellulolyticus.st.ATCC.43068

225-Xylanimonas_cellulosilytica

.st.DSM_15894

158-Methanobrevibac

ter_ruminantium.st.AT

CC.35063

317-Corynebacterium_uterequi

211-Trueperella_pyogenes

221-Microbacterium

_hydrocarbonoxydans

99-Nakamurella_multipartita.st.ATCC.700099

356-Frankia_alni.st.ACN14a

196-Deinococcus_proteolyticus.st.ATCC.35074

116-Nocardioides_sp.CF8

59-Streptomyces_pluripotens

2-Streptomyces_sp.CNQ-509

141-miscellaneou

s_Crenarchae

ota_group

-1_archae

on_SG8-32-1

81-Saccharomonospora_marina_XMU15

244-Kocuria_rhizophila.st.ATCC.9341

65-Streptomyces_roseochromogenus_subsp.oscitans_DS_12.976

217-Microbacterium

_ginsengisoli

269-Mycobacterium_vanbaalenii.st.DSM_7251

128-Natronomonas_moolapensis.st.DSM_18674

193-Marinithermus_hydrothermalis.st.DSM_14884

75-Streptomyces_viridochromogenes

347-Microlunatus_phosphovorus.st.ATCC.700054

145-Me

thanoce

lla_arvo

ryzae.st

.DSM_2

2066

359-Frankia_sp.EUN1f

10-Streptomyces_griseoaurantiacus_M045

9-Streptomyces_pristinaespiralis.ATCC.25486

223-Microbacterium_oxydans

237-Janibacter_hoylei_PVAS-1

306-Corynebacterium_glucuronolyticum.ATCC.51866

118-Rothia_dentocariosa.st.ATCC.17931

23-Streptomyces_sviceus.ATC

C.29083

176-Natronococcus_occultus_SP4

56-Kitasatospora_setae.st.ATCC.33774

311-Corynebacterium_doosanense_CAU_212_=.DSM.45436

296-Rhodococcus_jostii.st.RHA1

173-Natronobacterium_gregoryi.st.ATCC.43098

174-Natrinema_sp.st.J7-2

270-Mycobacterium_obuense

283-Mycobacterium_sp.EPa45

183-Vulcanisaeta_distributa.st.DSM_14429

32-Streptom

yces_sp.NRRL_S-444

268-Mycobacterium_thermoresistibile.ATCC.19527

149-Sulfo

lobus_ac

idocaldari

us.st.ATC

C.33909

314-Corynebacterium_camporealensis

8-Streptomyces_xiamenensis

340-

366-Streptomyces_sp.st.SPB074

164-Methanobacterium_paludis.st.DSM_25820

186-Pyrobaculum_aerophilum.st.ATCC.51768

257-Frigoribacterium_sp.RIT-PI-h

207-Actinomyces_turicensis_AC

S-279-V-Col4

87-Saccharothrix_espanaensis.st.ATCC.51144

205-Actinomyces_sp.oral_taxon_448_str.F0400

240-Beutenbergia_cavernae.st.ATCC.BAA-8

109-Bifidobacterium_gallicum.DSM.20093

286-Gordonia_bronchialis.st.ATCC.25592

182-Ignisphaera_aggregans.st.DSM_17230

197-Thermogladius_cellulolyticus.st.1633

334-Cory

nebacteriu

m_urealyt

icum.st.AT

CC.43042

260-Tsukamurella_paurometabola.st.ATCC.8368

112-Bifidobacterium_adolescentis.st.ATCC.15703

80-Saccharomonospora_glauca_K62

64-Streptomyces_viridochromogenes.st.DSM_40736

305-Corynebacterium_genitalium.ATCC.33030

15-Streptomyces_sp.MMG1533

86-Actinosynnema_mirum.st.ATCC.29888

324-Coryneba

cterium

_mustelae

351-Streptosporangium_roseum.st.ATCC.12428

287-Gordonia_sp.KTR9

167-Natronomonas_moolapensis.st.DSM_18674

82-Saccharomonospora_viridis.st.ATCC.15386

104-Nocardioides_simplex

187-Candidatus_Caldiarchaeum_subterraneum

316-Corynebacterium_striatum.ATCC.6940

352-Thermobispora_bispora.st.ATCC.19993

20-Streptomyces_sp.W

M4235

280-Mycobacterium_leprae.st.TN

343-Propionibacterium_f

reudenreichii_subsp.sher

manii.st.ATCC.9614

147-Sul

folobus_s

olfataricu

s.st.ATC

C.35092

214-Microbacterium

_chocolatum

156-Thermococc

us_litoralis.st.AT

CC.51850

235-Mobilico

ccus_pelagius.NBRC.104925

364-Streptomyces_sp.PVA_94-07

129-Natronomonas_pharaonis.st.ATCC.35678

171-Halostagnicola_larsenii_XH-48

339-Coryneba

cterium_kropp

enstedtii.st.DS

M_44385

267-Mycobacterium_xenopi_RIVM700367

255-Clavibacter_michiganensis_subsp.sepedonicus.st.ATCC.33113

342-Geodermatophilu

s_obscurus.st.ATCC.

25078

107-Bifidobacterium_angulatum.DSM.20098

153-Pyrococ

cus_furiosus

.st.ATCC.43

587

199-archaeon_GW2011_AR10

150-Sulfo

lobus_toko

daii.st.DSM

_16993

191-Deinococcus_maricopensis.st.DSM_21211

89-Lechevalieria_aerocolonigenes

157-Thermococcu

s_kodakarensis.st.

ATCC.BAA-918

276-Mycobacterium_kansasii.ATCC.12478

19-Actinobacteria_bacterium_OV450

3-Streptomyces_sp.NRRL_F-6602

249-Arthrobacter_sp.st.FB24

73-Streptomyces_sp.WM6386

238-Janibacter_sp.HTCC2649

253-Leucobacter_sp.Ag1

307-Corynebacterium_atypicum

135-Natrialba_magadii.st.ATCC.43099

57-Streptomyces_davawensis_JCM_4913

256-marine_actinobacterium_PHSC20C1

85-Nocardia_sp.NRRL_S-836

335-Coryne

bacterium_

falsenii.DSM

.44353

266-Mycobacterium_rhodesiae.st.NBB3

63-Streptomyces_collinus.st.DSM_40733

35-Actinobacteria_bacterium

_OK074

68-Streptomyces_sp.st.SirexAA-E

203-Actinomyces_urogenitalis.DSM

.15434

350-Nocardiopsis_alba.st.ATCC.BAA-2165

284-Gordonia_polyisoprenivorans.st.DSM_44266

303-Corynebacterium_imitans

331-Ilum

atobacte

r_coccin

eus_YM1

6-304

330-Act

inobacte

ria_bac

terium_I

MCC26

256

236-Dermacoccus_nishinomiyaensis

261-Mycobacterium_abscessus.st.ATCC.19977

16-Streptomyces_venezuelae.st.ATCC.10712

310-Corynebacterium_maris.DSM.45190

243-Micrococcus_luteus.st.ATCC.4698

29-Streptom

yces_sp.NRRL_F-4428

313-Corynebacterium_sp.KPL1855

224-Sanguibacter_keddieii.st.A

TCC.51767

201-Actinomyces_graevenitzii_C83

163-Methanobacterium_lacus.st.AL-21

353-Thermomonospora_curvata.st.ATCC.19995

124-Halogeometricum_borinquense.st.ATCC.700274

95-Kribbella_flavida.st.DSM_17836

251-Renibacterium_salmoninarum.st.ATCC.33209

175-Halopiger_xanaduensis.st.DSM_18323

231-Terrabacter_sp.28

88-Saccharothrix_sp.NRRL_B-16348

24-Streptomyces_sp.M

USC119T

363-Streptomyces_himastatinicus.ATCC.53653

292-Rhodococcus_sp.RD6.2

130-Halorhabdus_tiamatea_SARL4B

274-Mycobacterium_sp.VKM_Ac-1817D

90-Segniliparus_rotundus.st.ATCC.BAA-972

222-Microbacterium

_foliorum

341-Blastococcus

_saxobsidens.st.D

D2

26-Streptom

yces_ghanaensis.ATCC.14672

192-'Deinococcus_soli'_Cha_et_al._2014

41-Streptomyces_coelicolor.st.A

TCC.BAA-471

136-Natronobacterium_gregoryi.st.ATCC.43098

265-Mycobacterium_sinense.st.JDM601

177-Halorubrum_lacusprofundi.st.ATCC.49239

232-Intrasporangium_calvum.st.A

TCC.23552

18-Streptomyces_scabiei.st.87.22

Tree scale: 1

ACTINOBACTERIA A

RCHAEA

ARCHAEA

Deinococcus

Deinococcus

  10 

Supplementary Tables

Supplementary Table 1. Mutation rates of M. smegmatis and its ∆nucS derivatives. Rates of

spontaneous mutations conferring rifampicin (Rif-R) and streptomycin resistance (Str-R) of M.

smegmatis mc2

155 (WT), M. smegmatis ∆nucS and M. smegmatis ∆nucS complemented with the

wild-type nucS from M. smegmatis mc2

155 (nucSSm). Fold change indicates the increase in

mutation rate with respect to the strain M. smegmatis mc2

155 (set to 1). Mut Rate: mutation rate

(mutations per cell per generation).

Strain  Genotype Mut Rate 

Rif 95% CI  Fold 

Mut Rate 

Str 95% CI  Fold 

mc2 155  WT  2.07x10‐9  1.47‐2.74x10‐9  1  5.58x10‐10  2.25‐11.1x10‐10  1 

∆nucS  ∆nucS  3.11x10‐7  2.81‐3.39x10‐7  150.2  4.82x10‐8  3.53‐6.17x10‐8  86.3 

∆nucS/nucSSm  Complemented  3.02x10‐9  2.2‐3.97x10‐9  1.46  1.43x10‐9  0.77‐2.33x10‐9  2.56 

  11 

Supplementary Table 2. Mutational spectrum of M. smegmatis mc2 155 and its ∆nucS derivative.

A) Spontaneous mutations conferring rifampicin resistance found in 41 independent Rif-R mutants in

the WT and ∆nucS strains. The columns show the position of the mutations in the rpoB gene sequence,

the codon change (modified bases are in bold), the amino acid change caused by each mutation and the

number of independent Rif-R mutants isolated from mc2 155 (WT) or its ΔnucS derivative. B) Specificity

of base substitutions, summarized by class, produced in the WT and ∆nucS strains.

A)

Position

Codon change

Amino acid

change

WT

ΔnucS

A 1295 G

GAC→GGC

Asp 432 Gly

0

1

C 1324 T

CAC→TAC

His 442 Tyr

6

6

C 1324 G

CAC→GAC

His 442 Asp

2

0

A 1325 G

CAC→CGC

His 442 Arg

13

20

A 1325 C

CAC→CCC

His 442 Pro

4

0

G 1334 A

CGT →CAT

Arg 445 His

0

6

G 1334 C

CGT→CCT

Arg 445 Pro

1

0

G 1334 T

CGT→CTT

Arg 445 Leu

2

0

C 1340 T

TCG→TTG

Ser 447 Leu

9

6

C 1340 G

TCG→TGG

Ser 447 Trp

2

0

T 1346 C

CTG→CCG

Leu 449 Pro

0

2

G 1348 A

GGC→AGC

Gly 450 Ser

1

0

Δ CCAGCTGTC

(1275-1283)

CCAGCTGTC

Ser 425 Arg

+ ΔGlnLeuSer

(426-428)

1

0

TOTAL

41

41

  12 

B)

 

 

 

 

Mutation WT (%)

n=41

∆nucS (%)

n=41

G:C→A:T 16 (39.0) 18 (43.9)

A:T→G:C 13 (31.7) 23 (56.1)

G:C→T:A 2 (4.9) 0

A:T→T:A 0 0

G:C→C:G 5 (12.2) 0

A:T→C:G 4 (9.7) 0

Deletions 1 (2.4) 0

  13 

Supplementary Table 3. Mutation rates of S. coelicolor A3(2) M145 and its ∆nucS

derivatives.  Rates of spontaneous mutations conferring rifampicin (Rif-R) and streptomycin

resistance (Str-R) of S. coelicolor A3(2) M145, its ∆nucS derivative  and  S. coelicolor ∆nucS

complemented with the wild-type nucS from S. coelicolor (nucSSco). 95% confidence intervals (CI)

are indicated. Mut Rate: mutation rate (mutations per cell per generation).

Strain Genotype Mut Rate

Rif-R 95% CI Fold

Mut Rate

Str-R 95% CI Fold

S. coelicolor

A3(2) M145 WT 5.11x10

-9 2.87-8.01x10

-9 1 4.30x10

-10 0.24-18.9x10

-10 1

∆nucS ∆nucS 5.54x10-7

4.20-6.97x10-7

108.4 8.49x10-8

5.78-11.4x10-8

197.4

∆nucS/nucSSco Complemented 2.01x10-9

0.83-3.88x10-9

0.5 4.30x10-10

0.24-18.9x10-10

1

  14 

Supplementary Table 4. Characteristics of the M. tuberculosis representative strains

containing NucS polymorphisms. Genome ID/common name, NucS polymorphism, resistance

profile, lineage and origin of the strains are shown. Resistance profile indicates not

detected/unknown antibiotic resistances (Susceptible) or MDR, multidrug-resistant strain

(expressing at least rifampicin and isoniazid resistance).

 

 

 

GenomeID/

name Polymorphism

Resistance

profile Lineage Origin

CDC1551 WT Susceptible 4 North America

TKK_02_0079 S39R MDR 4 South Africa

MTB_N1057 S54I Susceptible 4 South Asia

KT-0040 A67S Susceptible 2 S. Korea (Broad Inst)

ERR036236 V69A Susceptible 1 Unknown

BTB 04-388 A135S MDR 3 Sweden (Broad Inst)

BTB 07-246 R144S MDR 4 Sweden (Broad Inst)

TKK_03_0044 D162H Susceptible 4 South Africa

HN2738 T168A Unknown Unknown Unknown (Broad Inst)

MTB_X632 K184E MDR 4 Central America

  15 

Supplementary Table 5. Effect of M. tuberculosis NucS naturally occurring polymorphisms on

mutation rates. Rates of spontaneous mutations conferring rifampicin resistance (Mut rate,

mutations per cell per generation) of M. smegmatis ∆nucS complemented with the wild-type nucS

from M. tuberculosis (nucSTB) or the 9 polymorphic alleles. 95% confidence intervals (CI) are

shown. Fold change indicates the increase in mutation rate with respect to the strain M. smegmatis

∆nucS complemented with wild-type nucSTB (∆nucS/nucSTB), set to 1.

 

Amino acid

change Codon change Mut rate 95% CI

Fold

change

M. smegmatis

∆nucS/nucSTB

nucSTB from

CDC1551 4.17x10

-9 3.29-5.12x10

-9 1

S39R AGC→AGG 3.49x10-7

2.93-4.0x10-7

83.7

S54I AGT→ATT 7.94x10-9

5.49-11.07x10-9

1.9

A67S GCG→TCG 2.78x10-9

1.87-3.84x10-9

0.7

V69A GTG→GCG 8.77x10-9

6.50-11.2x10-9

2.1

A135S GCG→TCG 3.57x10-8

2.77-4.38x10-8

8.6

R144S CGC→AGC 3.16x10-8

2.40-4.0x10-8

7.6

D162H GAC→CAC 8.98x10-9

6.42-11.80x10-9

2.2

T168A ACC→GCC 3.96x10-8

3.09-4.81x10-8

9.5

K184E AAG→GAG 3.06x10-8

2.37-3.75x10-8

7.3

 

  16 

Supplementary Table 6. Sequence of oligonucleotides used in this work.

Oligonucleotide Sequence (5’-3’) Purpose

SecMycomar CCCGAAAAGTGCCACCTAAATTGTAAGCG Localization of Tn insertion site

DelnucSm5F CCCGCTGCAGCTGGCCGAGTTCGG nucSSm M. smegmatis deletion

DelnucSm5R CGGTAAGCTTGGCTATCACGAGGCGCACCC nucSSm M. smegmatis deletion

DelnucSm3F CGGAAAGCTTAGCGACGAGTACCGGCTCTT nucSSm M. smegmatis deletion

DelnucSm3R AATCGTCGACCGAACCCATCAACTTACCGA nucSSm M. smegmatis deletion

CompnucTBF ACTGGAATTCTCGAGTGGTGGCCTTCTCGGATGGCAT nucSTB for ΔnucSSm complementation

CompnucTBR ACTGAAGCTTTCAGAACAGCCGGTACTCGCCGCT nucSTB for ΔnucSSm complementation

CompnucSmF ACTGGAATTCCCGCGCCAGCGAATTGTCGGCGTTCAT nucSSm for ΔnucSSm complementation

CompnucSmR CGGCAAGCTTTCAGAAGAGCCGGTACTCGTCGCT nucSSm for ΔnucSSm complementation

RifRRDRrpoBF

GTGGCGGCGATCAAGGAGTTCTTC RRDR-rpoBSm amplification

RifRRDRrpoBR

GGCGACCGACACCATCTGGCGCGG RRDR-rpoBSm amplification

DelnucSco5F

GTGTAAGCTTCGGCACCGCGGTGAGTGTGC nucSSco S. coelicolor deletion

DelnucSco5R CGACGGATCCGCGGGCAATGACGAGACGCA nucSSco S. coelicolor deletion

DelnucSco3F GCTGGGATCCTTCTGAGGGCGCGACGCGTC nucSSco S. coelicolor deletion

DelnucSco3R CAGGGATATCTGCCCGCCCTGGTCGGCGAG nucSSco S. coelicolor deletion

CompnucScoF TACGGAATTCGGGTTCTCCTCTCGCACCCCCGACCAGCAGGGG nucSSco for ΔnucSSco complementation

CompnucScoR TACGGGATCCTCAGAACAGCCGCAGCTTGTCGTCCTCGATGCC nucSSco for ΔnucSSco complementation

Rechyg3F CAGTTTCATTTGATGCTCGATGAG Recombination. pRhomyco 100%

Rechyg3R GACTAACTAGTCAGGCGCCGGGGGCGGTGT Recombination. pRhomyco 100%

Rechyg5F GATGCAGCTGGGAGTGCGCTGTGACACAAGAATCC Recombination. pRhomyco 100%

Rechyg5R CGATAAGCTTCGAATTCTGCAGCTCG Recombi. pRhomyco 100%, 95%, 90% and

85%

Rechyg5intR GATGTGATCACCGGGTCGGGCTCG Recombi. pRhomyco 95%, 90% and 85%

Rec95-BclIF GATGTGATCAAGCTGTTCGGGGAGCACTG

Recombination. pRhomyco 95%

Rec95-NheIR

GATGGCTAGCGAAGTCGACGATCCCGGTGA Recombination. pRhomyco 95%

Rec90-85-BclIF GATGTGATCAAGCTCTTCGGGGAACACTG Recombination. pRhomyco 90% and 85%

Rec90-85-PciIR GATGACATGTGAAGTCGACGATCCCGGTGA Recombination. pRhomyco 90% and 85%

S39RF GGCCGACGGATCGGTCAGGGTACATGCTGACGACCG Site-directed mutagenesis nucSTB

S39RR CGGTCGTCAGCATGTACCCTGACCGATCCGTCGGCC Site-directed mutagenesis nucSTB

S54IF CCGTTGAACTGGATGATTCCGCCGTGCTGGTTG Site-directed mutagenesis nucSTB

  17 

 

 

 

S54IR CAACCAGCACGGCGGAATCATCCAGTTCAACGG Site-directed mutagenesis nucSTB

A67SF AGAGTCCGGCGGCCAGTCGCCAGTGTGGGTGGTCG Site-directed mutagenesis nucSTB

A67SR CGACCACCCACACTGGCGACTGGCCGCCGGACTCTTC Site-directed mutagenesis nucSTB

V69AF CCGGCGGCCAGGCGCCAGCGTGGGTGGTCGAGAAC Site-directed mutagenesis nucSTB

V69AR GTTCTCGACCACCCACGCTGGCGCCTGGCCGCCGG Site-directed mutagenesis nucSTB

A135SF GCCGCGAGTACATGACCTCGATCGGACCCGTCGAC Site-directed mutagenesis nucSTB

A135SR GTCGACGGGTCCGATCGAGGTCATGTACTCGCGGC Site-directed mutagenesis nucSTB

A162HF GCGGCGTGGCGAGATCCACGGCGTGGAGCAGCTGAC Site-directed mutagenesis nucSTB

A162HR GTCAGCTGCTCCACGCCGTGGATCTCGCCACGCCGC Site-directed mutagenesis nucSTB

T168AF GCGTGGAGCAGCTGGCCCGCTACCTCGAGTTGC Site-directed mutagenesis nucSTB

T168AR GCAACTCGAGGTAGCGGGCCAGCTGCTCCACGC Site-directed mutagenesis nucSTB

K184EF GTGCTCGCGCCGGTCGAGGGGGTGTTTGCCG Site-directed mutagenesis nucSTB

K184ER CGGCAAACACCCCCTCGACCGGCGCGAGCAC Site-directed mutagenesis nucSTB

pvv16-seq-fwd CGGTGAGTCGTAGGTCGGGACGG PCR amplification and sequencing

pvv16-seq-rev TGCCTGGCAGTCGATCGTACGCTAG PCR amplification and sequencing

  18 

Supplementary Data.

Supplementary Data 1. Taxonomic distribution of NucS in reference proteomes (Big excel

data file).

Supplementary Data 2. Taxonomic tree for phylogenetic profiling in newick format. Raw

NCBI profiling tree from archaeal and bacterial species extracted from NCBI in newick format,

corresponding to Figure 5.

Supplementary Data 3. Phylogenetic tree of full NucS in newick format. Raw Maximum

Likelihood and bootstrap phylogenetic tree corresponding to Supplementary Figure 5 (full NucS) in

newick format.

Supplementary Data 4. Detailed information for phylogenetic trees of NucS regions. Sequence

details of NucS-CT and NucS-NT labels on Supplementary Figures 6-7 trees (Excel file).

Supplementary Data 5. Phylogenetic tree of C-terminal NucS in newick format. Raw

Maximum Likelihood and bootstrap phylogenetic tree corresponding to Supplementary Figure 6

(NucS-CT region) in newick format.

Supplementary Data 6. Phylogenetic tree of N-terminal NucS in newick format. Raw

Maximum Likelihood and bootstrap phylogenetic tree corresponding to Supplementary figure 7

(NucS-NT region) in newick format.

 

 

 

 

 

 

 

 

 

 

  19 

 

Supplementary Methods.

Generation of a ΔnucS knockout mutant in M. smegmatis.

To generate a ΔnucS knockout mutant in M. smegmatis, a 1.0 kb PstI-HindIII upstream and HindIII-

SalI downstream fragments of nucS were amplified by PCR, using delnucSm 5F, 5R, 3F and 3R

primers, and cloned in-frame into the p2NIL vector. Then, a pGOAL19 PacI-cassette, carrying a β-

galactosidase lacZ gene, a hygromycin-resistance hyg gene and a sacB gene, that confers sucrose

sensitivity, was also inserted into the p2NIL vector. The resulting plasmid p2NIL-ΔnucSSm,

harbouring an in-frame deletion of the target gene (lacking ≈ 95% of the gene sequence), was

electroporated into M. smegmatis mc2 155. Cells were plated on Middlebrook 7H10 agar-X-gal

(100 µg ml-1

) plus kanamycin (25 µg ml-1

) and hygromycin (50 µg ml-1

) and incubated for 4-5 days

at 37ºC. Single-crossover merodiploid clones were grown in 7H9 broth without antibiotics to allow

a second crossover event. Cultures were diluted and counter-selected on Middlebrook 7H10-X-gal

(100 µg ml-1

) plates containing 2% sucrose and incubated 4-5 days at 37ºC. Double-crossover

clones were tested for kanamycin and hygromycin susceptibility to confirm the loss of the plasmid.

Finally, M. smegmatis mc2 155 ΔnucS colonies were tested by PCR and sequencing to verify the

unmarked nucS deletion.

Complementation of the M. smegmatis ΔnucS mutant.

For complementation with nucS from M. smegmatis mc2 155, the MSMEG_4923 gene (nucSSm),

including its own 46-bp promoter region, was amplified by PCR with primers compnucSmF and

compnucSmR primers, digested with EcoRI and HindIII and cloned into the integrative vector

pMV361, rendering the complementation vector pMV-nucSSm. Similarly, for complementation with

nucS from M. tuberculosis, the wild-type full-length MT_1321 gene from CDC1551 control strain

(nucSTB), with its own 61-pb upstream promoter region, was amplified by PCR, using compnucTBF

and compnucTBR primers, cloned into the pMV361 vector to generate the complementation vector,

pMV-nucSTB. Putative complemented mutants were obtained upon electroporation of the plasmids

into M. smegmatis mc2 155 ΔnucS and incubation of the plated samples on Middlebrook 7H10 agar

plus kanamycin (25 µg ml-1

) for 3-5 days at 37ºC. Finally, M. smegmatis mc2 155 ΔnucS

complemented mutants were analysed by PCR and sequencing to verify the proper insertion of the

genes.

  20 

Generation of ΔnucS S. coelicolor knockout mutant and complementation.

pIJ6650 vector was used to clone in-frame two 1.5 kb DNA fragments, one HindIII-BamHI nucS

upstream fragment plus one BamHI-EcoRV nucS downstream fragment, previously amplified by

PCR (primers delnucSco 5F, 5R, 3F and 3R). E. coli ET12567 (pUZ8002) was transformed with

pIJ-ΔnucSSco (containing the in-frame deletion of the nucSSco gene) and conjugated with S.

coelicolor A3(2) M145. Following the isolation of apramycin resistant single-crossover, putative

double-crossover mutants were isolated and the unmarked deletion of the nucSSco gene was verified

by PCR. S. coelicolor ΔnucS was complemented with a wild-type copy of nucSSco cloned in

pSET152. pSET152 is a non-replicative plasmid in Streptomyces that carries the attP site and the

integrase gene of the C31 phage and consequently can integrate into the attB site 2 in the

chromosome of Streptomyces. nucSSco gene with its own promotor was amplified by PCR with

primers compnucScoF and compnucScoR, cloned into pSET152 using EcoRI and BamHI sites to

give the pSET-nucSSco plasmid that was introduced into E. coli ET12567 (pUZ8002) by

transformation. Finally, pSET-nucSSco from E. coli ET12567 (pUZ8002, pSET-nucSSco) was

introduced into S. coelicolor ΔnucS by conjugation in order to integrate the construction into the

chromosome.

Construction of pRhomyco plasmids.

To create a template for measuring homologous and homeologous recombination in M. smegmatis,

we designed a recombination assay based on the hygromycin-resistance (Hyg-R) gene hyg, using

the integrative plasmid pMV361. For pRhomyco 100%, a fragment denominated hyg 3’ (from

nucleotide 195 of the coding sequence to the stop codon) of the hyg gene, was PCR amplified from

pRAM vector, using rechyg3F and rechyg3R primers and digested with EcoRV/SpeI to be cloned in

StuI/SpeI targets of pMV361. Additionally, a fragment denominated hyg 5´ containing 711 bp

(from the start codon of the hyg gene to nucleotide 711) was PCR amplified and cloned using

PvuII/HindIII with rechyg5F and rechyg5R primers. Both fragments share two overlapping hyg

fragments of 517 bp (nucleotide 195 to 711). The homeologous recombination vectors, named

pRhomyco 95%, 90% and 85%, were constructed by replacing the overlapping 517-bp fragment of

hyg 5´ for synthetic fragments that have 95%, 90% or 85% sequence similarity to the original hyg

fragment (Supplementary Fig. 2). First, a common fragment to the three of them (from nucleotide 1

to 193 of hyg) was PCR-amplified with rechyg5F and rechyg5intR primers and cloned using

PvuII/BclI. Then, each synthetic fragment was PCR amplified and cloned using BclI/NheI with

rec95-BclIF and rec95-NheIR primers, in the case of pRhomyco 95%. For pRhomyco 90% and

  21 

85%, the variable fragment was cloned using BclI/PcI-BspHI with rec90-85-BclIF and rec90-85-

PciIR primers.

Computational analyses.

A summary of all the computational approaches conducted is depicted in Supplementary Fig. 4. For

NucS, we used the structure-based alignment of bacterial and archaeal NucS (Supplementary Fig.

3). Then we followed the procedure depicted in Supplementary Fig. 4 to conduct domain analyses

using sequences.

Using the different defined regions by structural bioinformatics, we first conducted sequence

searches against the large database. We made non-redundant alignments of the N-terminal region

(containing 63 sequences) and the C-terminal region (containing 39 sequences) and built profile

hidden Markov models (HMMs). These profiles were searched using pHMMER 3 against the large

References Proteomes file where additional proteins containing CT and NT regions in alternative

proteins were found. While the NT region was found in alternative archaeal proteins (we selected

two after checking the alignment F7PKA0_9EURY and L0I7R5_HALRX (DUF91, PF01939), the

CT region was found not only in additional prokaryotic groups, but also in eukaryotic sequences

(I1F1Q3_AMPQE, I1F1Q5_AMPQE, B3SFT8_TRIAD, B9T981_RICCO, and

A0A015J6U4_9GLOM). With the exception of A0A015J6U4, and B9T981, the matching regions

are located within a DUF1016 domain (uncharacterized). We checked the alignments to confirm

that the catalytic residues, as described in the structure of P. abyssi 1, were conserved.

We next searched the PFAM database with NUCS_MYCTU and found a domain, PF01939, which

was trained in three NucS full sequences. From the PF01939 domain (539 sequences in 464

species), only 368 entries in 363 (archaeal and bacterial) species gave a match with the full protein

(Supplementary data file 1).

To identify MutS and MutL in bacterial and archaeal species, we generated HMMs from different

proteins, which would capture most of the bacterial and archaeal diversity (Supplementary Fig. 4).

The MutS profile was trained with the following sequences (MUTS_ECOLI, C1F256_ACIC5,

MUTS_AQUAE, MUTS_BACTN, MUTS_CHLPN, MUTS_CHLTE, MUTS_CHLAA,

WP_027388865.1, MUTS_PROM3, B5YE41_DICT6, MUTS_BACSU, MUTS_STAAM,

MUTS_CLOTE, MUTS_FUSNN, C1AEM6_GEMAT, A6DUF8_9BACT, MUTS_RHOBA,

MUTS_BRUME, MUTS_NEIMA, MUTS_SALTY, MUTS_VIBC3, MUTS_PSEAE,

MUTS_DESVH, MUTS_TREPA, A0A075WU36_9BACT, MUTS_THEMA, WP_009958798.1),

while the MutL profile was trained with the following sequences (MUTL_ACIC5,

MUTL_AQUAE, MUTL_BACTN, MUTL_CHLPN, MUTL_CHLTE, A9WJ86_CHLAA,

  22 

MUTL_DEIRA, MUTL_DICT6, D9S9H6_FIBSS, MUTL_BACSU, MUTL_STAAM,

MUTL_CLOTE, Q8RG56_FUSNN, MUTL_GEMAT, A6DHB3_9BACT, Q7UMZ3_RHOBA,

MUTL_BRUME, MUTL_NEIMA, A0A0C5TQ33_SALTM, MUTL_VIBCH, MUTL_PSEAE,

Q72ET5_DESVH, MUTL_TREPA, A0A075WSF3_9BACT, MUTL_THEMA, and

MUTL_ECOLI). The models were searched against each reference proteome where bit scores <50

were discarded to filter and analyze the results. Only full proteins were retrieved by forcing a length

> 75%, so partial hits would be excluded.

For actinobacterial and archaeal species not showing NucS, searches in all the particular genomes

(only for complete genomes) were done using translated searches against their genomes using

tblastn with NucS_MYCTU, MutS_ECOLI, and MUTL_ECOLI for actinobacterial, and

NucS_PYRAB, MutS_HALMA, and MUTL_HALSA for archaeal proteins.

 

 

  23 

References for Supplementary Information

1  Ren, B. et al. Structure and function of a novel endonuclease acting on branched DNA 

substrates.  The  EMBO  journal  28,  2479‐2489,  doi:emboj2009192  [pii] 

10.1038/emboj.2009.192 (2009). 

2  Bierman,  M.  et  al.  Plasmid  cloning  vectors  for  the  conjugal  transfer  of  DNA  from 

Escherichia coli to Streptomyces spp. Gene 116, 43‐49 (1992). 

3  Eddy, S. R. A new generation of homology search tools based on probabilistic inference. 

Genome Inform 23, 205‐211 (2009).