Adaptive Evolution in Microbial Genomes
Sujay Chattopadhyay
Department of Microbiology, University of Washington, Seattle WA
Mechanisms of Virulence Evolution
- Gene acquisition
- Gene loss
- Gene mutation (patho-adaptive mutation)
[Figure labels: non-pathogenic (commensal) habitat; pathogenic (virulence) habitat; low fit]
Mosaic vs. Common Genes
3 Escherichia coli genomes [Welch et al. PNAS, 2002]
- MG1655 (K-12): non-pathogenic
- EDL933: enterohemorrhagic
- CFT073: uropathogenic
[Venn diagram of genes shared among the three genomes; region values: 40.2%, 7.8%, 2.6%, 21.8%, 2.7%, 18.0%, 6.9%]
fimH – A Common Gene Encoding
Type 1 Fimbrial Adhesin of E. coli
[Figure: body sites (oropharynx, stomach, large intestine, urinary tract) marked as non-pathogenic (commensal) or pathogenic (virulence) habitats; FimH labeled]
- Type 1 fimbriae are critical for colonization
Patho-adaptive Mutations in E. coli FimH Adhesin
[Figure: fimH from a commensal E. coli compared with fimH from a uropathogenic E. coli; the g259a (G87R) mutation is marked]
Weissman et al. Infect Immun, 2007
FimH Protein Tree
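[Figure: FimH protein tree; scale bar: 1 amino acid change. Branch labels: R187H, G160R, G87S, V4G, G87R, G87C, A48V, T6N, V4G, V184I:V251A, T6P, S99N, V4E, T6Y, V184A, A48V, T95I, A127V, Q290K, A48V, A149M, R248H, G87C, G216F, A263V, A140V, A10I:V77A, V4F, G294A, N6T, T6N, A279C, G83A, A127T, R3Q:A127V, A48T, G87R, T95I:V184A, T95A, N91S]
Sokurenko et al. Mol Biol Evol, 2004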
Sokurenko et al. Mol Biol Evol, 2004
I. Detecting adaptive evolution via
mutations in coding genes
Two Types of Gene Mutation
- Synonymous (or silent) change
- Non-synonymous (or amino acid) change
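The distinction between the two mutation types can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code; the function names `codon_number` and `classify_change` are hypothetical, and the example codons (GGA, AGA) are chosen only to mirror a Gly-to-Arg change such as g259a (G87R), not taken from the actual fimH sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_number(nucleotide_position):
    """Map a 1-based nucleotide position in a coding sequence to its 1-based codon."""
    return (nucleotide_position - 1) // 3 + 1


def classify_change(codon_before, codon_after):
    """Return 'synonymous' if both codons encode the same amino acid."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "non-synonymous"


print(codon_number(259))              # nucleotide 259 falls in codon 87
print(classify_change("GCT", "GCC"))  # Ala -> Ala
print(classify_change("GGA", "AGA"))  # Gly -> Arg (illustrative codons)
```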
Comparing Rates of Gene Mutation (dN/dS)
- Rate of synonymous (or silent) change: dS
- Rate of non-synonymous (or amino acid) change: dN
dN/dS
<1 – Purifying Selection (against amino acid changes)
>1 – Positive Selection (for amino acid changes)
$$\frac{dN}{dS} = \frac{-\frac{3}{4}\ln\left(1 - \frac{4}{3}p_N\right)}{-\frac{3}{4}\ln\left(1 - \frac{4}{3}p_S\right)}$$
[where $p_N$ and $p_S$ denote the proportions of non-synonymous and synonymous changes]
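As a worked illustration of the ratio, the sketch below applies the correction d = -(3/4) ln(1 - (4/3) p) to each proportion and divides the results. The function names and the example proportions are hypothetical; how the proportions themselves are counted (numbers of synonymous and non-synonymous sites) is outside this sketch.

```python
import math


def jukes_cantor(p):
    """Corrected distance d = -(3/4) * ln(1 - (4/3) * p) for a proportion p."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4.0 * p / 3.0)


def dn_ds(p_n, p_s):
    """Ratio of corrected non-synonymous to corrected synonymous distance."""
    d_s = jukes_cantor(p_s)
    if d_s == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return jukes_cantor(p_n) / d_s


# Hypothetical proportions: few amino acid changes, many silent changes.
ratio = dn_ds(p_n=0.01, p_s=0.10)
print("purifying" if ratio < 1 else "positive or neutral")
```

A ratio below 1 from such inputs corresponds to purifying selection in the slide's terms, and a ratio above 1 to positive selection.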
dN/dS Shows Purifying Selection in fimH
[Figure: FimH protein tree with branch labels for amino acid changes, repeated from the FimH Protein Tree slide]
dN/dS = 0.08
Sokurenko et al. Mol Biol Evol, 2004
Need for a more sensitive approach than dN/dS
Detecting convergent evolution at the molecular level
Morphological Convergence
[Figure: tree of VERTEBRATES, with TETRAPODS and AMNIOTES marked; tips: Fish, Amphibians, Turtles, Snakes and Lizards, Crocodiles and Birds, Mammals]
Lineages develop fins or develop flippers to adapt to the aquatic habitat
Adaptive Mutations Tend to Repeat
[Figure: tree with synonymous (or silent) and non-synonymous (or amino acid) changes; the change Ala 27 Val appears on two branches]
Convergent Evolution via Mutations
[Figure: tree with the changes Ala 27 Thr, Ala 27 Val, and Ala 27 Val]
Hotspot Mutations: repeated, phylogenetically unlinked mutations at the same amino acid position
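The idea of a hotspot (the same amino acid position changed on phylogenetically unlinked branches) can be sketched as below. This is a simplified illustration, not the TimeZone implementation: it assumes the changes have already been mapped to tree branches, and the branch identifiers and the `find_hotspots` name are hypothetical.

```python
from collections import defaultdict


def find_hotspots(changes):
    """Return positions changed on at least two different branches.

    changes: iterable of (branch_id, position, new_residue) tuples, where each
    tuple is one amino acid change already assigned to a tree branch.
    """
    branches_by_position = defaultdict(set)
    for branch, position, _new_residue in changes:
        branches_by_position[position].add(branch)
    return sorted(
        position
        for position, branches in branches_by_position.items()
        if len(branches) >= 2
    )


# Hypothetical changes: position 27 is hit on three unlinked branches.
example = [
    ("branch_1", 27, "V"),
    ("branch_2", 27, "V"),
    ("branch_3", 27, "T"),
    ("branch_4", 48, "V"),
]
print(find_hotspots(example))
```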
Convergent Evolution of FimH
[Figure: FimH protein tree showing hotspot changes only; scale bar: 1 amino acid change. Branch labels: G87S, G87S, G87C, A48V, A48V, A127V, A48V, G87C, A127T, A127V, A48T, G87S]
Sokurenko et al. Mol Biol Evol, 2004
Chattopadhyay et al. J Mol Evol, 2007
Adaptive evolution of FimH occurs via accumulation of hotspot mutations
What about other genes in the genome?
Genome-level study: detect genes undergoing adaptive convergent evolution via HOTSPOT mutations
using TimeZone software [Chattopadhyay et al. Nature Protocols, 2013]
17 Salmonella enterica subsp. I Genomes
[Figure: phylogenetic tree of concatenated sequences of 7 housekeeping genes used in multilocus sequence typing (MLST); scale bar: 10 nucleotide changes]
Strains: LT2, D23580, 14028S, SL476, SL254, RKS4594, SPB7, ATCC 9150, AKU_12601, SC-B67, SL483, P125109, 287/91, CT_02021853, CVM19633, CT18, Ty2
Serovars: Paratyphi C, Typhimurium, Heidelberg, Newport, Choleraesuis, Paratyphi B, Agona, Paratyphi A, Enteritidis, Gallinarum, Dublin, Schwarzengrund, Typhi
21 Escherichia coli Genomes
[Figure: phylogenetic tree of concatenated sequences of 7 housekeeping genes used in multilocus sequence typing (MLST); scale bar: 10 nucleotide changes]
Strains: 042, UMN026, EDL933, Sd197, Sb227, Ss046, Sf5str8401, SE11, 55989, E24377A, IAI1, ATCC 8739, HS, MG1655, S88, UTI89, IAI39, 536, CFT073, ED1a, E2348/69, SECEC SMS-3-5
Pathotypes: Shigella, EAEC (Enteroaggregative), ExPEC (Extraintestinal), EHEC (Enterohemorrhagic), ETEC (Enterotoxigenic), EPEC (Enteropathogenic), Fecal, Environmental
Common Genes Analyzed
- E. coli: 5370 genes, 1488 common
- S. enterica ss. I: 4450 genes, 2797 common
Based on >95% sequence identity and >95% length coverage
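A minimal sketch of how the >95% identity and >95% length-coverage criteria could be applied is shown below. It assumes that a per-genome best hit (percent identity, percent length coverage) is already available for each gene; the data layout, the `is_common` name, and the requirement that a gene be found in every genome are assumptions for illustration, not the published procedure.

```python
def is_common(hits, n_genomes, min_identity=95.0, min_coverage=95.0):
    """Decide whether a gene counts as common to all genomes.

    hits: dict mapping genome name -> (percent_identity, percent_length_coverage)
          for the best match of the gene in that genome (assumed precomputed).
    """
    if len(hits) < n_genomes:
        return False  # gene is missing from at least one genome
    return all(
        identity > min_identity and coverage > min_coverage
        for identity, coverage in hits.values()
    )


# Hypothetical hits for one gene across three genomes.
gene_hits = {
    "genome_A": (99.2, 100.0),
    "genome_B": (97.5, 98.1),
    "genome_C": (96.0, 99.4),
}
print(is_common(gene_hits, n_genomes=3))
```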
High Frequency of Genes with Hotspot Changes
[Bar chart: % of genes with hotspots (axis 0 to 50) in S. enterica and E. coli]
Example changes at one position: Ala 27 Val, Ala 27 Val, Ala 27 Thr
- COINCIDENTAL (same position, different mutations)
- PARALLEL (same position, same mutations)
Predominance of Parallel Hotspot Changes
[Bar chart: % of genes with hotspots (axis 0 to 50), shown as all hotspots, parallel, and coincidental, in S. enterica and E. coli]
Legend: COINCIDENTAL (same position, different mutations); PARALLEL (same position, same mutations)
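The parallel versus coincidental distinction can be sketched as follows. The rule used here is an assumption made for illustration: a hotspot position is called parallel if any replacement residue recurs on two or more branches, and coincidental if every branch carries a different replacement. Input tuples are assumed to hold one change per branch per position; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def classify_hotspots(changes):
    """Label each hotspot position as 'parallel' or 'coincidental'.

    changes: iterable of (branch_id, position, new_residue), one change per
    branch per position. Positions changed on a single branch are skipped.
    """
    residues_by_position = defaultdict(list)
    for _branch, position, new_residue in changes:
        residues_by_position[position].append(new_residue)

    labels = {}
    for position, residues in residues_by_position.items():
        if len(residues) < 2:
            continue  # not a hotspot
        most_repeated = max(Counter(residues).values())
        labels[position] = "parallel" if most_repeated >= 2 else "coincidental"
    return labels


# Hypothetical changes modeled on the slide's Ala 27 Val / Ala 27 Thr example.
example_changes = [
    ("branch_1", 27, "V"),
    ("branch_2", 27, "V"),
    ("branch_3", 48, "V"),
    ("branch_4", 48, "T"),
    ("branch_5", 91, "S"),
]
print(classify_hotspots(example_changes))
```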
Parallel Hotspots Can Result from Recombination
These changes would appear as hotspots in the phylogenetic tree
Recombination Detection Analysis
Method | P-value calculation | Reference
Pairwise Homoplasy Index | permutation | Bruen et al., 2006
Maximum χ² (MAXCHI) | χ², permutation | Maynard Smith, 1992
Neighbor Similarity Score (NSS) | Contingency test for clustering in compatibility matrix | Jakobsen & Easteal, 1996
(in collaboration with Vladimir Minin, Dept. of Statistics, UW)
Frequency of Hotspots Remains High in Non-recombinant Genes
[Bar chart: % of genes with hotspots (axis 0 to 50) for all genes and for non-recombinant genes, in S. enterica and E. coli]
[Chattopadhyay et al. PNAS, 2009; J Bacteriol, 2012]
Parallel Hotspots Predominant Also in Non-recombinants
[Bar chart: % of genes with hotspots (axis 0 to 50), shown as all hotspots, parallel, and coincidental, for all genes and for non-recombinant genes, in S. enterica and E. coli]
[Chattopadhyay et al. PNAS, 2009; J Bacteriol, 2012]
Can neutral (random) mutations explain:
- the overall high frequency of hotspot mutations?
- the predominance of parallel hotspots?
[Figure labels: S. enterica, E. coli; non-recombinant genes]
Simulation of Mutations under Neutrality
using EvolveAGene3 software [Hall BG, Mol Biol Evol, 2008]
One allele from the real dataset is used as the reference
New alleles are generated under the constraints of:
1. Gene length identical to the real dataset
2. Number of alleles identical to the real dataset
3. Equal probability of mutations in generating new alleles
4. No indels or stop codons
5. Branch lengths of the simulated phylogenetic tree ranging within the limits of the average branch length of the real dataset tree
10 iterations yield 10 simulated datasets for each gene
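The constraints above can be illustrated with a much simplified neutral simulation. This sketch is not EvolveAGene3 and does not reproduce its algorithm: it assumes a star phylogeny (every simulated allele derives directly from the reference), a fixed number of substitutions per allele instead of branch-length limits, and uniform mutation probability. All names and the toy reference sequence are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame coding sequence (length divisible by 3)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def mutate(seq, n_mutations, rng):
    """Apply uniformly random substitutions; no indels, no new stop codons."""
    seq = list(seq)
    accepted = 0
    while accepted < n_mutations:
        pos = rng.randrange(len(seq))
        new_base = rng.choice([b for b in BASES if b != seq[pos]])
        start = pos - pos % 3
        codon = seq[start:start + 3]
        codon[pos % 3] = new_base
        if CODON_TABLE["".join(codon)] == "*":
            continue  # reject substitutions that create a stop codon
        seq[pos] = new_base
        accepted += 1
    return "".join(seq)


def simulate_alleles(reference, n_alleles, n_mutations, seed=0):
    """Generate alleles of the same length as the reference (star phylogeny)."""
    rng = random.Random(seed)
    return [mutate(reference, n_mutations, rng) for _ in range(n_alleles)]


def hotspot_positions(reference, alleles):
    """Amino acid positions that differ from the reference in >= 2 alleles."""
    ref_protein = translate(reference)
    hits = {}
    for allele in alleles:
        for pos, (a, b) in enumerate(zip(ref_protein, translate(allele)), start=1):
            if a != b:
                hits[pos] = hits.get(pos, 0) + 1
    return sorted(pos for pos, count in hits.items() if count >= 2)


reference = "ATGGCTGGAAAACTG"  # toy reference without a stop codon
simulated = simulate_alleles(reference, n_alleles=5, n_mutations=2, seed=1)
print(hotspot_positions(reference, simulated))
```

Repeating the simulation with different seeds and comparing the hotspot counts with those of the real alleles mirrors the real-versus-simulated comparison in spirit only.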
[Bar chart: % of genes with hotspots (axis 0 to 20), parallel vs. coincidental, for real and simulated datasets of S. enterica and E. coli]
In simulated sequences:
- Hotspot frequency is significantly lower
- Coincidental hotspots are more frequent than parallel ones
[Chattopadhyay et al. PNAS, 2009; J Bacteriol, 2012]
Hotspot mutations are the result of adaptive evolution
I. Detecting adaptive evolution via
mutations in coding genes
II. Assessing functional bias of proteins
with adaptive mutations
23% Categories Enriched for Hotspot Mutations in S. enterica ss. I
- 18 enriched categories
- 60 non-enriched categories
Enriched Functional Categories in S. enterica ss. I (18 enriched categories)
- Two-component system
- Amino acid biosynthesis
- DNA repair
- Purine metabolism
- Vitamin metabolic process
- RNA modification
- Ubiquinone & other terpenoid-quinone biosynthesis
- Homologous recombination
- Carbohydrate biosynthesis
- Transition metal ion transport
- Bacterial secretion system
- Pentose phosphate pathway
- Glycolysis/gluconeogenesis
- Propanoate metabolism
- Pyrimidine metabolism
- Citrate (TCA) cycle
- Nicotinate & nicotinamide metabolism
- Bacterial chemotaxis
27% Categories Enriched for Hotspot Mutations in E. coli
- 13 enriched categories
- 35 non-enriched categories
Enriched Functional Categories in E. coli (13 enriched categories)
- Two-component system
- Amino acid biosynthesis
- DNA repair
- Purine metabolism
- Vitamin metabolic process
- RNA modification
- Ubiquinone & other terpenoid-quinone biosynthesis
- Homologous recombination
- Pyruvate metabolism
- Lipid synthesis
- Cofactor biosynthesis
- Cellular respiration
- Carbohydrate biosynthesis
Common Enriched Functional Categories
[Venn diagram: S. enterica ss. I and E. coli; significant overlap (P<0.001)]
[Chattopadhyay et al. J Bacteriol, 2012]
- Two-component system
- Amino acid biosynthesis
- DNA repair
- Purine metabolism
- Vitamin metabolic process
- RNA modification
- Ubiquinone & other terpenoid-quinone biosynthesis
- Homologous recombination
- Carbohydrate biosynthesis
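The category counts and the overlap can be checked with a short script built from the category names listed on the slides. The percentages are computed as enriched / (enriched + non-enriched); the variable and function names are hypothetical, and the script does not attempt to reproduce the reported P-value, whose test is not stated here.

```python
SALMONELLA_ENRICHED = {
    "Two-component system", "Amino acid biosynthesis", "DNA repair",
    "Purine metabolism", "Vitamin metabolic process", "RNA modification",
    "Ubiquinone & other terpenoid-quinone biosynthesis",
    "Homologous recombination", "Carbohydrate biosynthesis",
    "Transition metal ion transport", "Bacterial secretion system",
    "Pentose phosphate pathway", "Glycolysis/gluconeogenesis",
    "Propanoate metabolism", "Pyrimidine metabolism", "Citrate (TCA) cycle",
    "Nicotinate & nicotinamide metabolism", "Bacterial chemotaxis",
}

ECOLI_ENRICHED = {
    "Two-component system", "Amino acid biosynthesis", "DNA repair",
    "Purine metabolism", "Vitamin metabolic process", "RNA modification",
    "Ubiquinone & other terpenoid-quinone biosynthesis",
    "Homologous recombination", "Pyruvate metabolism", "Lipid synthesis",
    "Cofactor biosynthesis", "Cellular respiration",
    "Carbohydrate biosynthesis",
}


def percent_enriched(enriched, non_enriched):
    """Share of enriched categories among all categories, rounded to a whole percent."""
    return round(100 * enriched / (enriched + non_enriched))


common = SALMONELLA_ENRICHED & ECOLI_ENRICHED
print(percent_enriched(18, 60), percent_enriched(13, 35), len(common))
```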
Gene-level Adaptive Convergence Between Species
Gene | Product | Functional Category
aer | Aerotaxis sensor regulator | Two-component system
creB | Response regulator | Two-component system
ruvA | Holliday junction helicase subunit A | DNA repair
trmA | tRNA (uracil-5-)-methyltransferase | RNA modification
thrC | Threonine synthase | Amino acid biosynthesis
mrcA | Transpeptidase of penicillin-binding protein 1a | Peptidoglycan synthesis
murB | UDP-N-acetylenolpyruvoylglucosamine reductase | Peptidoglycan synthesis
murF | D-alanine:D-alanine-adding enzyme | Peptidoglycan synthesis
holB | DNA polymerase III, subunit δ' | Purine metabolism
yfbB | Putative metabolic enzyme | Vitamin metabolism
[Chattopadhyay et al. J Bacteriol, 2012]
I. Detecting adaptive evolution via
mutations in coding genes
II. Predicting functional role of adaptive
mutations
III. Identifying pathotype-specific adaptive
evolution
[Figure: phylogenetic tree of the 21 Escherichia coli genomes labeled by pathotype, repeated]
Pathotypes: Shigella, ExPEC, Fecal
Rates of Accumulation of Genes with Hotspots
[Plot: no. of genes with hotspot mutations vs. evolutionary divergence of the strains, by E. coli pathotype]
[Bar chart: rate of accumulation of genes with hotspot mutations (axis 0 to 6000) for Extra-intestinal, Shigella, and Fecal]
Highest rate of accumulation in extra-intestinal pathogens (p<0.0001)
[Chattopadhyay et al. PNAS, 2009]
E. coli requires a large number of convergent mutations to adapt to extra-intestinal habitats
Systemically Invasive Serovars
[Figure: phylogenetic tree of the 17 S. enterica subsp. I genomes labeled by serovar, repeated, with the systemically invasive serovars marked]
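High Level of Hotspot-sharing Between Typhi and Paratyphi A
Pairwise hotspot-sharing counts between serovars (lower triangle):
Serovar | Choleraesuis | Dublin | Paratyphi B | Gallinarum | Typhi | Paratyphi A
Dublin | 3 | | | | |
Paratyphi B | 0 | 6 | | | |
Gallinarum | 5 | 1 | 0 | | |
Typhi | 2 | 12 | 14 | 3 | |
Paratyphi A | 8 | 8 | 15 | 5 | 37 |
Paratyphi C | 2 | 1 | 2 | 2 | 1 | 2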
[Chattopadhyay et al. J Bacteriol, 2012]
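The pairwise counts from the slide can be scanned programmatically to find the serovar pair with the most sharing. The sketch assumes the column order Choleraesuis, Dublin, Paratyphi B, Gallinarum, Typhi, Paratyphi A, which is how the vertical column headers of the slide are read here; the variable names are hypothetical.

```python
# Column order is an assumption based on the slide's vertical headers.
COLUMNS = ["Choleraesuis", "Dublin", "Paratyphi B", "Gallinarum", "Typhi", "Paratyphi A"]

# Lower-triangular rows as they appear on the slide.
ROWS = {
    "Dublin": [3],
    "Paratyphi B": [0, 6],
    "Gallinarum": [5, 1, 0],
    "Typhi": [2, 12, 14, 3],
    "Paratyphi A": [8, 8, 15, 5, 37],
    "Paratyphi C": [2, 1, 2, 2, 1, 2],
}

# Flatten into {(row_serovar, column_serovar): count}.
pairs = {
    (row, COLUMNS[i]): count
    for row, counts in ROWS.items()
    for i, count in enumerate(counts)
}

top_pair, top_count = max(pairs.items(), key=lambda item: item[1])
print(top_pair, top_count)
```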
Convergent Evolution
in Typhi and Paratyphi A
[Figure: phylogenetic tree of the 17 S. enterica subsp. I genomes labeled by serovar, repeated]
Genome Research 2007 Jan;17(1):61-8
Genomic Distribution of Genes Sharing Hotspot Mutations in Typhi and Paratyphi A
[Figure: genome map; labeled genes: ylaB, entD, yjjV, STM3549, yjjV]
[Chattopadhyay et al. J Bacteriol, 2012]
Convergent evolution in Typhi and Paratyphi A via mutations
Conclusions
- Convergent molecular evolution is a major mode of microbial adaptive evolution
- Molecular convergence targets specific functional categories and pathotypes
- TimeZone offers a suite of tools to detect genes targeted by adaptive mutations
10 Bartonella spp. Genomes
[Figure: phylogenetic tree of concatenated sequences of 4 housekeeping genes (rpoB, groEL, ribC and gltA) used in multilocus sequence typing (MLST); scale bar: 100 nucleotide changes]
Strain (host):
- B. bacilliformis KC583 and B. bacilliformis INS (contigs): Human
- B. clarridgeiae 73: Cat
- B. grahamii as4aup: Mouse
- B. tribocorum CIP 105476: Rat
- B. vinsonii (subsp. berkhoffii) Winnie: Dog
- B. quintana RM-11: Rhesus macaque
- B. quintana Toulouse: Human
- B. henselae Houston-1: Cat
- B. australis Aust/NH1: Kangaroo
Pan-genomic profile of Bartonella species
[Figure: pan-genome diagram with CORE: 1057 and MOSAIC: 1556; petals for B. australis Aust/NH1, B. quintana RM-11 and Toulouse, B. vinsonii (subsp. berkhoffii) Winnie, B. tribocorum CIP 105476, B. bacilliformis KC583, B. clarridgeiae 73, B. grahamii as4aup, and B. henselae Houston-1; petal values shown: 43, 112, 87, 38, 31, 22, 88, 19]
Two human pathogens at two extreme ends
of the distribution of species-specific genes
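How core and species-specific gene sets fall out of a presence/absence comparison can be sketched as below. The gene family identifiers and species labels are toy values, and the function name is hypothetical; the ortholog clustering that produces such sets is assumed to have been done beforehand.

```python
def core_and_unique(gene_sets):
    """Split gene families into the core set and per-species unique sets.

    gene_sets: dict mapping species name -> set of gene family identifiers.
    Core genes are present in every species; unique genes in exactly one.
    """
    core = set.intersection(*gene_sets.values())
    unique = {}
    for species, genes in gene_sets.items():
        others = set().union(
            *(g for name, g in gene_sets.items() if name != species)
        )
        unique[species] = genes - others
    return core, unique


# Toy presence/absence data with hypothetical family identifiers.
toy = {
    "species_1": {"fam1", "fam2", "fam3", "fam7"},
    "species_2": {"fam1", "fam2", "fam4"},
    "species_3": {"fam1", "fam2", "fam3", "fam5", "fam6"},
}
core, unique = core_and_unique(toy)
print(sorted(core), {name: sorted(genes) for name, genes in unique.items()})
```

Gene families that are neither core nor unique to one species would correspond to the mosaic fraction.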
[Bar chart: distribution of the no. of species-specific (i.e., unique) genes (in blue) and the no. of core genes with truncation mutations (in red) in Bartonella spp.; axis 0 to 120]
Two human pathogens at two extreme ends
of the distribution of genes with truncation mutations
Outer membrane proteins as the sole overrepresented functional category in Bartonella bacilliformis-specific genes
Gene | Product | Protein length (AA)
BARBAKC583_0311 | Brp family immunodominant surface antigen | 551
BARBAKC583_0314 | Brp family immunodominant surface antigen | 1235
BARBAKC583_0513 | putative adhesin/invasin | 800
BARBAKC583_0912 | adhesin/hemagglutinin | 1259
BARBAKC583_1109 | outer membrane autotransporter | 1193
BARBAKC583_1133 | outer membrane autotransporter | 1058
Polymorphic genes in B. bacilliformis
Gene | Product | AA length | Non-syn / syn
BARBAKC583_0061 | hypothetical protein | 202 | 1 / 0
rimM (BARBAKC583_0085) | 16S rRNA-processing protein | 191 | 1 / 0
rpoH2 (BARBAKC583_0126) | RNA polymerase factor sigma-32 | 303 | 1* / 0
BARBAKC583_0440 | uracil-DNA glycosylase | 275 | 1 / 0
BARBAKC583_0617 | lipid A biosynthesis lauroyl acyltransferase | 301 | 1 / 0
ribH (BARBAKC583_0633) | 6,7-dimethyl-8-ribityllumazine synthase | 152 | 0 / 1
pcs (BARBAKC583_0817) | phosphatidylcholine synthase | 253 | 1 / 0
BARBAKC583_0880 | integral membrane protein | 527 | 2 / 0
BARBAKC583_1011 | acetyltransferase, GNAT family protein | 185 | 1 / 0
flaA (BARBAKC583_1040) | flagellin A | 380 | 1 / 0
BARBAKC583_1110 | hypothetical protein | 636 | 1 / 0
BARBAKC583_1128 | hypothetical protein | 253 | 0 / 1
BARBAKC583_1133 | outer membrane autotransporter | 1058 | 1 / 0
BARBAKC583_1223 | cell wall hydrolase family protein | 284 | 1 / 0
BARBAKC583_1262 | putative L-asparaginase | 329 | 1 / 0
acnA (BARBAKC583_1282) | aconitate hydratase | 895 | 0 / 1
BARBAKC583_1342 | ABC transporter, permease/ATP-binding protein | 588 | 0 / 1
Cross-species convergent mutations
Gene | Product | AA length | Non-syn / syn
rimM (BARBAKC583_0085) | 16S rRNA-processing protein | 191 | 1 / 0
pcs (BARBAKC583_0817) | phosphatidylcholine synthase | 253 | 1 / 0
BARBAKC583_1011 | acetyltransferase, GNAT family protein | 185 | 1 / 0
[Figures: gene trees across the Bartonella strains (B. grahamii as4aup, B. tribocorum CIP 105476, B. vinsonii (subsp. berkhoffii) Winnie, B. quintana RM-11, B. quintana Toulouse, B. henselae Houston-1, B. clarridgeiae 73, B. australis Aust/NH1, B. bacilliformis KC583, B. bacilliformis INS); scale bar: 10 nucleotide changes]
Marked changes:
- rimM (16S rRNA-processing protein): K11R, K11R
- BARBAKC583_1011 (acetyltransferase, GNAT family protein): W85L, W85R
- pcs (phosphatidylcholine synthase): G116E, G116S
Conclusions
- Preliminary results indicate footprints of within-clonal adaptive evolution in B. bacilliformis
- Comparative genomics of strains within and between Bartonella species, representing strains from diverse clonal groups, clinical outcomes and regions of isolation, can offer important clues to the pathoadaptive significance of the genetic variations
Acknowledgments
Cal Poly State University, CA
Peter Chi
SUNY Stony Brook, NY
Daniel Dykhuizen
Univ of Minnesota, MN
James Johnson
Univ of Montana, MT
Michael Minnick
UPCH, Peru
Arturo Centurion
UW Microbiology
Evgeni Sokurenko
Sandip Paul
Veronika Tchesnokova
Dagmara Kisiela
Steve Moseley
UW Statistics
Vladimir Minin
Seattle Children’s Hospital
Scott Weissman