© Genetics of Bacterial Genomes, Institut Pasteur http://www.pasteur.fr/recherche/unites/REG/ [email protected]
The cell as a living computer
A preconceived ideology
Mathematics
Physics
Chemistry
Biology
Sociology
Molecular Biology
Structural Biology
Genetics and Genomics
Background
Physics: matter, energy, time
Biology: Physics + information, coding, control...
Arithmetic: strings of whole numbers, recursion, coding…
Computing: Arithmetic + program + machine...
Information transfer
As with building a machine, one needs a recipe book to build a cell
This requires turning the text of the recipe into something concrete: a transfer of « information »
In a cell, information transfer is managed by the genetic program
Three processes are needed for Life:
Information transfer (Living Computers?) => the goal of genomics is to decipher the blueprint of the “read-only” memory of the machine
Driving force for a coupling between the genome structure and the structure of the cell:
Metabolism (Internal organisation)
Compartmentalization (General structure)
What is Life?
Two processes are needed for computing:
A read/write machine
A program on a physical support (typically, a tape illustrates the sequential string of symbols that makes up the program), split (in practice) into two entities:
Program (providing the goal)
Data (providing the context)
The machine is distinct from the program
What is computing?
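The separation between machine, program and data can be sketched as a tiny tape machine. This is only an illustration and is not taken from the slides: the name `run_machine`, the rule-table format and the example program `INCREMENT` are assumptions chosen for brevity.

```python
# Minimal sketch of a read/write machine driven by a separate program.
# All names here (run_machine, INCREMENT) are hypothetical illustrations.

def run_machine(program, tape, state="start", max_steps=1000):
    """Run a rule table (the program) on a tape (the data).

    program maps (state, symbol) -> (new_symbol, move, new_state),
    where move is -1 (left) or +1 (right). The machine (this function)
    knows nothing about what the program computes.
    """
    cells = dict(enumerate(tape))  # sparse tape; blank cells read as "_"
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, "_")
        new_symbol, move, state = program[(state, symbol)]
        cells[head] = new_symbol
        head += move
    return "".join(cells[i] for i in sorted(cells)).strip("_")


# Program (the goal): append one "1" to a unary number.
INCREMENT = {
    ("start", "1"): ("1", +1, "start"),  # skip over the existing 1s
    ("start", "_"): ("1", +1, "halt"),   # write a 1 on the first blank
}

# Data (the context): the unary number 3.
print(run_machine(INCREMENT, "111"))
```

The same `run_machine` function would run any other rule table unchanged, which is the point of keeping the machine distinct from the program.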
The cell factory
A cell behaves like a computer that would program the construction of similar computers
It has a magnetic tape, or hard disk (the « genetic program »), and reading devices which allow it to read the program and put it into action
The « cloning » of the ewe Dolly was exactly that: transferring the program from one machine (a donor cell) to another (an egg without a nucleus)
From the recipe to the dish: from the genetic program to the cell
When you read the recipe, you perform actions to make the dish. Specialised machinery reads the DNA and converts it into active agents, the proteins (enzymes are proteins).
DNA
protein
Cells as computers
Genomics rests on an alphabetic metaphor, that of a text written with a four-letter alphabet, acting as a program
Conjecture: do cells behave as computers?
Genetic engineering
Viruses
Horizontal gene transfer
Cloning animal cells
all point to a separation between
Machine
Data + Program
If the machine has not only to behave as a computer but also to construct the machine itself, one must find an image of the machine somewhere in the machine (J. von Neumann)
Is there a map of the cell in the chromosome?
A. Danchin The Delphic Boat. What genomes tell us (2003) Harvard University Press
Genome organisation
Is the gene order random in the chromosomes?
At first sight, despite different DNA management processes, not much is conserved, and genes transferred from other organisms are distributed throughout genomes
However, groups of genes such as operons or pathogenicity islands tend to cluster in specific places, and they code for proteins with common functions
First question: how are repeats generated, and where are they located in the genome sequence?
Repeats in bacteria
Abscissa: position of the first occurrence of the repeat
Ordinate: position of the second occurrence of the repeat
Diagonal: repeats are located near each other
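The dot-plot convention above can be sketched in a few lines. This is a minimal illustration under stated assumptions: it only finds exact repeats of a fixed length `k`, the function names `repeat_pairs` and `near_diagonal` are hypothetical, and the toy sequence is made up. The slides do not say which repeat-detection method was actually used.

```python
from collections import defaultdict


def repeat_pairs(sequence, k):
    """Return (first, second) start positions for every exact repeated k-mer.

    Each pair is one point of the dot plot: abscissa = earlier occurrence,
    ordinate = later occurrence.
    """
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    pairs = []
    for occurrences in positions.values():
        for a in range(len(occurrences)):
            for b in range(a + 1, len(occurrences)):
                pairs.append((occurrences[a], occurrences[b]))
    return sorted(pairs)


def near_diagonal(pairs, max_distance):
    """Keep only the pairs whose two copies lie close to each other."""
    return [(x, y) for x, y in pairs if y - x <= max_distance]


toy_genome = "ACGTTACGTGGCCAAACGT"  # hypothetical toy sequence
pairs = repeat_pairs(toy_genome, 4)
print(pairs)
print(near_diagonal(pairs, 5))
```

Points returned by `near_diagonal` are the ones that would fall close to the diagonal of the plot.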
DNA management: Repeats in genomes
[Figure: eight dot plots of repeat positions, one per genome: Escherichia coli, Haemophilus influenzae, Methanococcus jannaschii, Mycoplasma genitalium, Mycoplasma pneumoniae, Helicobacter pylori, Bacillus subtilis, Methanobacterium thermoautotrophicum. Both axes give the position along the genome. Each panel is annotated with two counts, NR and NT; values as extracted (NR/NT): 397/283, 170/54, 204/111, 139/82, 260/187, 552/250, 183/75, 280/137.]
Repeats: DNA management differs according to organism
• No correlation with the length of the genome
• Non-random distribution of repeats in bacteria that can take up DNA from the environment
• When repeats are rare, they are located not far from each other (10-15 kb)
• DNA is managed very differently in different bacteria
• A side view: genomes of higher cells are much more repetitive than genomes of microbes, which look highly random at first sight
Caveat: Repeats are meaningful
Remember also:
This clock has a minute minute hand
… There is no junk DNA
Genome organisation
Is the gene order random?
At first sight, perhaps because of different DNA management processes, not much is conserved, and horizontally transferred genes are distributed throughout genomes
However, pathogenicity islands tend to cluster at specific places, and they code for proteins with common functions
Genome organisation
The genome organisation is so rigid that the overall result of selection pressure on DNA is visible in the genome text, which differentiates the leading strand from the lagging strand
To lead or to lag...
Is it possible to see whether genes are randomly distributed between the leading and lagging strands of the chromosome?
Choosing an arbitrary origin of replication and a property of the strand (base composition, codon usage bias, amino acid composition of the coded protein…), one can use statistics to see whether the hypothesis holds
To lag or to lead...
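The procedure described on this slide can be sketched as a scan over candidate origins. The slides do not specify the statistical method, so everything below is an assumption made for illustration: a crude mean-threshold classifier stands in for the real statistics, the terminus is assumed to lie diametrically opposite the origin, a gene on the "+" strand is taken to be on the leading strand in the half of the chromosome that follows the origin, and the names `is_leading`, `accuracy_for_origin` and `scan_origins` are hypothetical.

```python
def is_leading(position, strand, origin, genome_length):
    """Leading/lagging assignment for a gene, given an assumed origin.

    Assumption: the terminus is opposite the origin, so the half-circle
    after the origin is replicated in the "+" direction and the other
    half in the "-" direction.
    """
    offset = (position - origin) % genome_length
    on_first_half = offset < genome_length / 2
    return (strand == "+") == on_first_half


def accuracy_for_origin(genes, genome_length, origin):
    """Fraction of genes whose strand is predicted correctly by the property.

    genes is a list of (position, strand, value); a gene is predicted
    to be leading when its value exceeds the mean value.
    """
    values = [value for _, _, value in genes]
    threshold = sum(values) / len(values)
    correct = 0
    for position, strand, value in genes:
        predicted_leading = value > threshold
        if predicted_leading == is_leading(position, strand, origin, genome_length):
            correct += 1
    return correct / len(genes)


def scan_origins(genes, genome_length, step):
    """Accuracy for each candidate origin, as a {origin: accuracy} dict."""
    return {
        origin: accuracy_for_origin(genes, genome_length, origin)
        for origin in range(0, genome_length, step)
    }


# Hypothetical toy genome of length 100 whose true origin is at 0:
# the property is 1.0 for leading-strand genes and 0.0 otherwise.
toy_genes = [
    (10, "+", 1.0), (30, "+", 1.0), (40, "-", 0.0),
    (60, "-", 1.0), (80, "-", 1.0), (90, "+", 0.0),
]
profile = scan_origins(toy_genes, 100, 5)
best_origin = max(profile, key=profile.get)
print(best_origin, profile[best_origin])
```

If the property carries no strand information, the accuracy stays near chance for every candidate origin; a clear peak suggests both a strand bias and the approximate position of the origin.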
To lag or to lead, that is the question
[Figure: nine panels, one per genome: Bacillus subtilis, Borrelia burgdorferi, Chlamydia trachomatis, Escherichia coli, Haemophilus influenzae, Helicobacter pylori, Methanobacterium thermoautotrophicum, Mycobacterium tuberculosis, Treponema pallidum. x-axis: position (%), from 0 to 100; y-axis: accuracy. Four curves per panel: bases, amino acids, codons, dinucleotides.]
Conclusion 1
Proteins are made of 20 amino acid types, among which are valine and threonine, and one observes that valine-rich proteins are on the leading strand while threonine-rich proteins are on the lagging strand! Isologous proteins preferentially replace one residue with the other when their gene changes strand
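The valine/threonine observation rests on a simple per-protein composition measure, sketched below. The function names `residue_percent` and `mean_percent` and the short peptide strings are hypothetical toy data, not sequences from any genome discussed here.

```python
def residue_percent(protein, residue):
    """Percentage of one amino acid (one-letter code) in a protein sequence."""
    if not protein:
        return 0.0
    return 100.0 * protein.count(residue) / len(protein)


def mean_percent(proteins, residue):
    """Average residue percentage over a group of proteins."""
    return sum(residue_percent(p, residue) for p in proteins) / len(proteins)


# Hypothetical toy peptides grouped by the strand of their gene.
leading_proteins = ["MVVAVK", "MVLVGV"]
lagging_proteins = ["MTTATK", "MTLTGT"]

for name, group in [("leading", leading_proteins), ("lagging", lagging_proteins)]:
    print(name, "%V =", mean_percent(group, "V"), "%T =", mean_percent(group, "T"))
```

Plotting `%V` against `%T` for each protein, coloured by strand, would give the kind of scatter plot used to compare the two strands.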
To lag or to lead, that is the question
[Figure: two scatter plots of % V against % T (both axes from 0 to 15), one for Borrelia burgdorferi and one for Chlamydia trachomatis.]
Different “Operating Systems”?
[Figure: circular genome maps marked with Ori, Ter and angular positions (0, 90, 180, 270), showing CDS density and leading-strand CDS density. Labels on the maps: Escherichia coli, 55% leading; Treponema pallidum, 65% leading; Bacillus subtilis, 75% leading; Thermoanaerobacter tengcongensis, 87% leading.]
(updated from Kunst et al., Nature, 97)
Conclusion 2 and more questions
… The genome organisation is much more rigid than usually assumed. Some regions (such as the terminus) are rather unstable, but most of the genome structure is preserved through evolution. The distribution of genes on the leading and lagging strands is highly non-random. Is it associated with some particular function (such as the Operating System in a computer)? Where are essential genes located?
Essentiality, not expressivity, dominates the strand choice
Most essential genes are located on the leading strand
Many highly expressed genes are located on the lagging strand
Essentiality organises the genome’s architecture
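One way to check a claim of this kind is to tabulate the share of leading-strand genes for each combination of essentiality and expression level. The sketch below assumes a hypothetical input format, a list of `(essential, highly_expressed, leading)` boolean triples, and the toy records are invented for illustration only.

```python
from collections import Counter


def leading_fraction(genes):
    """Share of leading-strand genes per (essential, highly_expressed) group.

    genes is an iterable of (essential, highly_expressed, leading) booleans.
    """
    totals = Counter()
    leading = Counter()
    for essential, highly_expressed, is_leading in genes:
        key = (essential, highly_expressed)
        totals[key] += 1
        leading[key] += int(is_leading)
    return {key: leading[key] / totals[key] for key in totals}


# Hypothetical toy annotation, not real data.
toy_annotation = [
    (True, False, True),
    (True, False, True),
    (True, True, False),
    (False, True, False),
    (False, True, True),
]
for (essential, highly_expressed), share in leading_fraction(toy_annotation).items():
    print("essential" if essential else "non-essential",
          "highly expressed" if highly_expressed else "non-highly expressed",
          share)
```

If essentiality is the dominant factor, the leading-strand share should differ more between the essential and non-essential rows than between the two expression levels within each row.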
(adapted from Brewer, Cell, 88)
Essentiality in B. subtilis
[Figure: bar chart (0% to 100%) giving the share of genes on the leading and lagging strands for four groups: essential genes (highly expressed, non-highly expressed) and non-essential genes (highly expressed, non-highly expressed).]
Distribution of highly expressed genes
Highly expressed genes cluster near the origin in fast-growing bacteria
[Figure: bar chart (0% to 70%) of the distribution of highly expressed genes between the origin, middle and terminus regions of the chromosome (Ori to Ter) for C. crescentus, M. tuberculosis, E. coli and B. subtilis, grouped as fast growers and slow growers.]
Gene order conservation
[Figure: dot plots comparing gene order in four genome pairs: Chlamydia pneumoniae vs Chlamydia muridarum, Yersinia pestis vs Escherichia coli, Bacillus halodurans vs Bacillus subtilis, Buchnera Ap vs Buchnera Bp. Origin and terminus are marked. 16S distances shown on the panels, as extracted: .063, .051, .071 and .076 nt-1.]
Exploring “neighborhoods”
Genes do not operate in isolation
Proteins are part of complexes, as are parts in an engine
It is important to understand their relationships, like those between the planks that make up a boat
Exploring neighborhoods
To make discoveries we explore the general « neighborhoods » of genes of interest: proximity in the chromosome, in evolution, in the literature, in biochemical complexes, in metabolism etc.
Comparative genomics is essential, hence the launching of several parallel genome programs
Gene vicinity: synteny
Three processes are needed for Life:
Information transfer: groups of co-variant gene expression
Driving force for a coupling between the genome structure and the structure of the cell:
Metabolism (Internal organisation)
Compartmentalization (General structure)
What is Life?
Co-variation of gene expression
Collecting data using large-scale genomics techniques:
DNA chips and protein fingerprinting
Expression neighborhood: all genes on a chip
• Green: first condition
• Red: second condition
• Yellow: no difference
Two conditions are compared on the same chip
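The colour code of a two-condition chip can be sketched as a log-ratio test on the two channel intensities of each spot. The function name `spot_colour` and the twofold cutoff are assumptions made for illustration; real analyses also need background correction and normalisation, which are left out here.

```python
import math


def spot_colour(green, red, fold=2.0):
    """Classify one spot from its two channel intensities (both must be > 0).

    green = signal from the first condition, red = signal from the second.
    A spot is called red or green when one channel exceeds the other by at
    least `fold`; otherwise it is yellow (no difference).
    """
    log_ratio = math.log2(red / green)
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return "red"
    if log_ratio <= -cutoff:
        return "green"
    return "yellow"


# Hypothetical intensities for three spots: (green, red).
for green, red in [(100.0, 400.0), (400.0, 100.0), (100.0, 120.0)]:
    print(green, red, spot_colour(green, red))
```

Working with the logarithm of the ratio treats a twofold increase and a twofold decrease symmetrically, which a raw ratio would not.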
Protein fingerprinting
A technique named « two-dimensional gel electrophoresis » allows one to separate all the proteins of a cell, and to stain them. Once stained, proteins are sorted and identified by « mass spectrometry ». Different cells, or cells in different environments, have different patterns, exactly as individuals have different fingerprints.
Three processes are needed for Life:
Information transfer
Driving force for a coupling between the genome structure and the structure of the cell:
Metabolism (Internal organisation)
Compartmentalization (General structure)
What is Life?
Metabolic neighborhoods: Chemical pathways
Often, drug targets are found in metabolic pathways. The idea is to mimic a normal molecule in the cell and to replace it with a similar one that kills the activity of some protein: this is exactly what poisons do! Antibiotics are poisons of a special kind, which act against microbes: to discover a new antibiotic is to discover a self-consistent pathway
Metabolic neighborhood: unexpected selective constraints
DNA is made from NDP nucleotides, but CDP is absent:
[Figure: pyrimidine pathway diagram. Main chain: OMP, UMP, UDP, UTP, CTP. Other nodes: RNA, DNA, CMP, CDP, dCDP, CDP diglycerides, phospholipids. Two steps are marked “target”, and the diagram asks “Anticancer target?”.]
Three processes are needed for Life:
Information transfer
Driving force for a coupling between the genome structure and the structure of the cell:
Metabolism (Internal organisation)
Compartmentalization (General structure)
What is Life?
A dangerous intermediate
DNA is made from NDP nucleotides, but UDP must not get in:
[Figure: pyrimidine pathway diagram. Main chain: OMP, UMP, UDP, UTP, CTP. Deoxy branch nodes: dUDP, dUTP, dUMP + PPi, dTMP, dTDP, dTTP, DNA. Enzyme label: pyrH, uridylate kinase (UMK).]
Uridylate kinase
• Essential enzyme
• Different origin in cells without a nucleus (bacteria) and in cells with a nucleus
• Conjecture: UDP must be compartmentalized to prevent U from entering DNA
Cell compartmentalisation in vivo
The pyrH gene is fused with the reporter gfp gene and replaces its wild-type counterpart. One observes localisation of GFP under the membrane, and at foci….
Conclusion
The genome organisation is much more rigid than usually assumed.
Some regions (such as the terminus) are rather unstable, but most of the genome structure is preserved through evolution
Why a computer? The power of algorithms
The analogy between the cell and a computer goes beyond the separation between the program and the machine
The structure of the program itself is subject to architectural constraints, which must derive from some sort of selection pressure, building up an image of the cell in the program
Computing allows one to reinvestigate the idea of preformationism: the cell is not preformed, but its algorithm of construction is
Gestalt and Algorithms
« Start from the top, middle
Go down from right to left
Accelerate
Turn right
etc. »
From the cell to the animal?
The analogy between the cell and a computer goes beyond the separation between the program and the machine
Can this analogy be extended to organisms as whole entities?
Homeotic genes control the development of animals (and plants): the order of these genes along the chromosomes follows the body plan of the animal
Drosophiloculus, Homunculus?
Our working hypothesis was that the cell would behave as a computer. This implies that an architectural program exists in the chromosome. This may be wrong, but science sets out for India… and may find America!
From the genetic program to a research programme?
A. Danchin The Delphic Boat. What genomes tell us (2003) Harvard University Press
Anglo-American: NATO, Bottom Up, Data-driven
Greco-Latin: OTAN, Top Down, Hypothesis-driven
Chinese: « Bombardment of the Chinese Embassy in Belgrade », Sideways, Context-driven
causeries/Western.html