Upload
vutuong
View
222
Download
0
Embed Size (px)
Citation preview
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Exploration ofneighbourhoods forinductive reasoning
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Authors
in silicoEduardo RochaIvan MoszerClaudine Médigue
in vivo / in vitroAgnieszka SekowskaAnne Marie GillesOctavian Barzu
CollectiveStanislas Noria
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
a virtuous circle
ContextDataHypotheses => Today’s presentation
A Chinese view for ….
vision_en.html
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Anglo-American
NATO
Bottom Up
Data-driven
Greco-Latin
OTAN
Top Down
Hypothesis-driven
Chinese
« Bombardment of the Chinese Embassy in Belgrade »
Sideways
Context-driven
causeries/Western.html
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Data vs Hypotheses
What biologicalquestion are you
asking?
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Variation / Selection / Amplification
Evolution creates
Function recruits
Structure coding process
Sequence
Empedocle / Maupertuis / Malthus /Darwin
vision_en.html
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
What is Life?
Physics: matter, energy, time
Biology: Physics + information,coding, control...
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
What functions for Life?An extension of Cuvier’s
view….Physical stability ([cyto]skeleton)ReproductionRespirationLocomotionPerceptionTransport (import / export)Circulation (internal fluxes)Digestion and recyclingAssimilationAccommodation (regulation)Maintenance (repair)Etc…
causeries/causeries.html
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Three processes are needed for Life:
Information transfer (Living Turing Machines)
Driving force for a coupling between the genome structure andthe structure of the cell:
Metabolism (Internal organisation)Compartmentalization (General structure)
Because of these two processes, note that “concentration”usually does not have a meaning inside a cell
What is Life?
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Inductive strategy:exploring
“neighborhoods”Genes do not operate inisolationProteins are part ofcomplexes, as are partsin an engine
It is important tounderstand theirrelationships, as those inthe planks which make aboat
The Delphic Boat: Harvard University
Press, february 2003
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Induction: exploringneighborhoods
To make discoveries we explore the general« neighborhoods » of genes of interest: proximity in thechromosome, in evolution, in the literature, in biochemicalcomplexes, in metabolism etc.
Comparative genomics is essential, hence the use of« subtractive » genomics (comparison of pairs or largersets of similar genomes)
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
From sequence tofunction
Combining genome sequence data and insilico prediction (bioinformatics) we test ourhypotheses using large scale genomicstechniques (transcriptome and proteomeanalysis) as well as other types ofneighborhoods, such as common electriccharge or codon usage bias.
↓ Note that regulation evolves much fasterthan all other processes
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Genome organisationIs the gene order random in the chromosomes?
At first sight, despite different DNA managementprocesses not much is conserved, and horizontallytransferred genes are distributed throughoutgenomes
However, groups of genes, such as operons orpathogenicity islands tend to cluster in specificplaces, and they code for proteins with commonfunctions
A question: where are located repeats?
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Caveat:Repeats are meaningful
Remember also:
This clock has aminute minutehand
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
.
0
0
1000
2000
3000
4000
1000 2000 3000 4000
Escher i ch i a
c o l i
1000
0 200
2000
600 1000 1400 1800
Haemophi lus
in f l uenzae00
500
1000
1500
0 500 1000 1500
Methanococcus
j an n asch i i 100
200
300
400
0
0 100 200 300 400 500
Mycoplasma
gen i ta l i um
500
0200
400
600
800
0 200 400 600 800
Mycoplasma
pneumoniae
0500
1000
1500
0 500 1000 1500
Hel icobacter
py l o r i
0 1000 2000 3000 4000
0
B a c i l l u s
s u b t i l i s
1000
2000
3000
4000
0 500 1000 1500
0
Methanobacterium
thermoautotrophicum
500
1000
1500
NR = 397
NT = 283
NR = 170
NT = 54
NR = 204
NT = 111
NR = 139
NT = 82
NR = 260
NT = 187
NR = 552
NT = 250
NR = 183
NT = 75
NR = 280
NT = 137
DNA management:Repeats in genomes
E. Rocha, A. Viari & A. Danchin Analysis of long repeats in bacterial genomes reveals alternative evolutionary mechanisms in Bacillus subtilis andother competent prokaryotes. Mol. Biol. Evol. (1999) 16: 1219-1230
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Genome organisation
The genome organisation is so rigidthat the overall result of selectionpressure on DNA is visible in thegenome text, which differentiates theleading strand from the laggingstrand
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
E. Rocha
180
90
0
270
55% leading
Escherichia coli
Ori
Ter
90270
65% leading
Treponema pallidum
Ori
Ter
180
9027075% leading
Bacillus subtilis
Ori
Ter
90270
87% leading
Thermoanaerobactertengcongensis
Ori
Ter
CDS densityLeading CDS density
(updated from Kunst etal , Nature, 97)
Different “OperatingSystems”?
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
To lead or to lag...
Is it possible to see whether the position ofgenes in the chromosome is randomlydistributed on the leading and lagging strand?
Chosing arbitrarily an origin of replication and aproperty of the strand (base composition, codoncomposition, codon usage, amino acidcomposition of the coded protein…) one canuse discriminant analysis to see whether thehypothesis holds.
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Chosing arbitrarily anorigin of replicationand a property of thestrand (basecomposition, codoncomposition, codonusage, amino acidcomposition of thecoded protein…) onecan use discriminantanalysis to seewhether thehypothesis holds.
To lag or to lead...
E. Rocha, A. Danchin & A. Viari Universal replication biases in bacteria. Mol. Microbiol. (1999) 32: 11-16
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
To lag or to lead, that is thequestion
.
0,450,5
0,550,6
0,650,7
0,750,8
0,85
0 20 40 60 80 100
Bacillussubtilis
accu
racy
Borreliaburgdorferi
0,4
0,5
0,6
0,7
0,8
0,91
0 20 40 60 80 100 0,4
0,5
0,6
0,7
0,8
0,91
Chlamydiatrachomatis
0 20 40 60 80 100
0,45
0,5
0,55
0,6
0,65
0,7
0,75
0 20 40 60 80 100
Escherichiacoli
accu
racy
0,45
0,5
0,55
0,6
0,65
0,7
0,75
0 20 40 60 80 100
Heamophilusinfluenzae
0 20 40 60 80 100
HelicobacterPylori
0,4
0,45
0,5
0,55
0,6
0,65
0,7
0,40,45
0,50,55
0,60,65
0,70,75
0,8
0 20 40 60 80 100
Methanobacteriumthermoautotrophicum
position (%) position (%) position (%)
accu
racy
0,45
0,5
0,55
0,6
0,65
0,7
0,75
0 20 40 60 80 100
Mycobacteriumtuberculosis
0,4
0,5
0,6
0,7
0,8
0,9
1
0 20 40 60 80 100
Treponemapallidum
Bases
Amino acids
Codons
Dinucleotides
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Visible even in proteins…
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Replication transcriptionconflicts
Transcription may proceed opposite tothe movement of the replication forkmovementThis will abort transcription, leading totruncated mRNAIf translated truncated mRNA may leadto truncated proteins, this will becomenegative dominant if in complexes…
E.P.C. Rocha & A. Danchin Essentiality, not expressiveness, drives gene-strand bias in bacteria. Nature Genetics (2003) 34 : 377-378
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Essentiality in B. subtilis
highly
expressed
0%
25%
50%
75%
100%
non-highly
expressed
Essential genes
highly
expressed
non-highly
expressed
Non-essential genes
Lagging
Leading
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
When polymerases collide (II)
DNAPdeceleration
End oftranscription
Arrest of RNAP & DNAP
Transcriptionabortion
Co-oriented Head-onConsequences:1. Replication slow-down
2. Loss of transcripts
Consequences:1. Aborted transcripts
2. Truncated essentialproteins
E. Rocha
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Distribution of highly expressed genes
Highly expressed genescluster near the origin infast-growing bacteria
Origin
Terminus
Middle
Ori
Ter
10%
20%
30%
40%
50%
60%
70%
0%
C. c
resc
entu
s
M. t
uber
culo
sis
E. c
oli
B. su
btili
s
Fast growers | Slow growersFast growers | Slow growers
E. Rocha
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Gene vicinity: synteny
C. Médigue
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Multivariate Analyses
In contrast to standard genetics, genomics analyses large collectionsof genes and gene products.
Multivariate analyses try to extract information by simplifying thenumber of relevant descriptors in the objects of interest.
Principal Component Analysis uses the centered average and a simpledistance (identity); it is the reference method.
Correspondence Analysis belongs to the same family, but it uses the χ2 measure as a distance. This allows the user not only to work withhighly heterogeneous objects but also to work simultaneously on thespace of objects and on the space of descriptors.
Independent Component Analysis uses the non gaussian character ofthe values associated to descriptors
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Neighborhoo:distribution ofaminoacids inthe proteome
G. Pascal
Bias in amino acid distribution
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Universal biases in proteinamino acid composition
First axis: separates Integral Inner MembraneProteins (IIMP) from the rest; driven by oppositionbetween charged and large hydrophobic residues
Second axis: separates proteins according toan opposition driven by the G+C content of the firstcodon base
Third axis: separates proteins by their contentin aromatic amino acids; enriched in orphanproteins
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Temperature-dependentbiases in protein amino
acid composition
The amino acid composition of proteins dependsheavily on the phylogeny => need to compareorganisms related to each other
The general trend of amino acid compositionbias is to avoid some aminoacids at highertemperatures
Mesophilic bacteria belong to at least twodifferent classes (in a 5-clusters analysis)
Biases are always dominated by the IIMPclustering
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Codon usage biases
20 amino acids 61 codonsStudy of the genes in the codon space,using Correspondence Analysis (χ2
measure)At least three classes of genes,including one corresponding tohorizontal transfer
C. Médigue, T. Rouxel, P. Vigier, A. Hénaut & A. Danchin. Evidence for horizontal gene transfer in Escherichia coli speciation.J. Mol. Biol. (1991) 222 pp. 851-856
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Gene exchangeGenes expressed at a
high level
under exponential
growth conditions
Horizontally
exchanged genes
Core metabolism
of the cell
Class I: core metabolism
Class II: high expression inexponential growth
Class III: horizontal transfer
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Codon usage, organisation andevolution of the B. subtilis genome
(Moszer, 98)
Correspondence analysis
Classification
Highly expressedAtypical / HGTOthers
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
The cell organizers
It is too early to understand theselection pressures that organize thecell architecture. However, at least inbacteria, the role of gasses andchemical highly reactive radicals playprobably a major role. Most of thecorresponding genes are stillunknown….
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Sulfur undergoes oxido-reduction reactions from -2 to +6Incorporation of sulfur into metabolism usually requiresreduction to the gaseous form H2SH2S is highly reactive, in particular towards dioxygen=> These two gasses, despite their diffusion properties,must be kept separate as much as possibleSulfur scavenging is energy-costly=> Sulfur containing molecules have to be recycled
Selection pressure fororganisation: Oxido-
reduction
A. Sekowska, H-F. Kung & A. Danchin Sulfur metabolism in Escherichia coli and related bacteria, facts and fiction.J. Mol. Microbiol. Biotechnol. (2000) 2: 145-177
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Sulfur metabolism: anunexpected organiser of the
cell ’s architecture• Sulfur metabolism-related proteins are more acidic(average pI 6.5) than bulk proteins (richer in asp and glu),they are poor in serine residues
• They are significantly poor in sulfur-containing amino-acids
• Their genes are very poor in codons ATA, AGA and TCA
• There are no class III (horizontal transfer) genes in theclass (only 2 in 150 genes)
• => sulfur-metabolism genes are ancestral and may for acore structure for the E. coli genome
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Proximity in the chromosomeSulphur islands
E.P.C. Rocha, A. Sekowska & A. Danchin Sulfur islands in the Escherichia coli genome: markers of the cell's architecture?FEBS Lett. (2000) 476: 8-11
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
The error catastrophe
Similarity in sequence leads to functionalinference
Because of recruitment of pre-existing structures,there is often no obvious link between a structureand a function (the book-paperweight
Hence a propagation of annotation errors ykrS annotated as « translation factor » is a
component of sulfur metabolism! A Sekowska, V Dénervaud, H Ashida, K Michoud, D Haas, A Yokota, A Danchin Bacterial variations on the methioninesalvage pathway BMC Microbiol (2004) 4: 9
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
A new metabolicpathway
A. Sekowska
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Just so story: proximity inthe genome
cmk (mssA) rpsAEscherichia coli
cmk ypfDBacillus subtilis no rpsA !!!
cmk rpsA
cmk rpsA
Haemophilus influenzae
Sinorhizobium meliloti
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
The pyrimidine diphosphateparadox
OMP UMP UDPUDP UTP CTP
In order to make deoxyribonucleotides the cell uses
ribonucleosides diphosphates, not triphosphates
NDP dNDP dNTPNDR NDK
no CDP !!!
And here is the paradox:
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
How is the paradoxresolved?
OMP UMP UDPUDP UTP CTP
mRNA
DNA
CMP
CDPdCDP
RNases Cmk
PNPase
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Phylogenetic neighbors:the S1 box
• rpsA codes for ribosomal protein S1. It contains the S1 box (PROSITE PS50126). Many other proteins contain a similar box: polynucleotide phosphorylase, RNases E, G and R, RNAhelicases etc.• protein RegB of bacteriophage T4, associated to S1, cuts mRNA at GAGG motifs.• S1 is a subunit of bacteriophage Qβ replicase…
=> All this points to a function for S1 in RNA metabolism
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Codonusagebias
neighbors
Gene Comment
blacatdicBlppompA
long mRNA turnover
pyrF pyrimidine metabolism
hflBftsHmrsACFlpp
cell architecture
nusApcnBmetYpnprnarnbrncrne/amsrngrph
RNA maturation and turnover
trxA oxido-reduction, subunit of T7replicase, needed for synthesisof deoxyribonucleotides
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Protein complexes:the Degradosome
PNPasePolyA polymeraseRNAse ES1
Polyphosphate kinase
Enolase
mRNA degradation
CDP for de novo DNA synthesis
GDP recycling of GTP for carbohydrate secretion
GDP + PEP GTPNDK +PYK
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Just so story: the cmk rpsAoperon
cmk (mssA) rpsAEscherichia coli
Conclusion:The function of the cmk rpsA operon is to make CDPfor DNA synthesis
mssA was discovered as a suppressor ofsmbA (pyrH), itself a suppressorof MukB, amyosin-like protein involved in chromosomesegregation=> DNA synthesis is involved in the function.
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Selection pressure forcompartmentalization: adangerous intermediate
OMP UMP UDPUDP UTP CTP
dUDP
dUTPDNA dUMP + PPi
dTMP
dTDP
dTTP
DNA
Uridylate kinase (UMK)pyrH (smbA) No CDP: no DNA…
S. Noria & A. Danchin Just so genome stories : what does my neighbor tell me? International Congress Series 1246 Elsevier Science (2002) 3-13
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
In conclusion:
UMK must becompartmentalized
S. Landais, P. Gounon, C. Laurent-Winter, J.C. Mazié, A. Danchin, O. Barzu& H. Sakamoto Immunochemical analysisof UMP kinase from Escherichia coli. J.Bacteriol. (1999) 181: 833-840
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
A prediction: ribosomerecycling and UTP
pyr H frr Escherichia coli
pyr H frr Bacillus subtilis
pyr H frr Photorhabdus luminescens
This organisation is conserved in most Gram+ and Gram-bacteria. Why ?
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Ribosome recycling andUTP
frr codes for the ribosome recycling factor, that allows 70Sribosomes to split into 30S and 50S subunits. In polycistronicoperons, the 70S ribosome can go on from one gene to thenext one without recycling (this requires formylation of the firstmethionine). At the end of the message, the ribosomes mustrecycle. This happens in a context where transcripts makestem and loops, ending with a polyU sequence.
Conjecture: is UTP controlling the activity of Frr? Rememberthat one cannot speak of « concentrations » of molecules in acell. 1 micromolar would mean 600 molecules. There are20,000 ribosomes, therefore 1 mM means only 30 individualmolecules in the immediate vicinity of each ribosome...
© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centrehttp://www.pasteur.fr/recherche/unites/REG/ [email protected]
Transcriptiontermination
UUUUUUUUUU
At Rho-independentsites for termination oftranscription themessenger RNA endswith rows of U. Thismust lower the localavailability of UTP….
This suggests Frr as a drug target, with analogs of UTP as leads...