Bioinformatics and Evolutionary Genomics : Pathway evolution

Page 1: Bioinformatics and Evolutionary Genomics : Pathway evolution

Bioinformatics and Evolutionary Genomics: Pathway evolution

Page 2: Bioinformatics and Evolutionary Genomics : Pathway evolution

What is a pathway?

- An ordered set of proteins and substrates (boundaries)
- A graph
- A system (systems biology) (includes a notion of function, regulation)
- A set of proteins that “do something together” (includes complexes, regulatory and signalling pathways), a.k.a. a functional module
- A set of proteins that are co-regulated, or behave similarly in evolution

Page 3: Bioinformatics and Evolutionary Genomics : Pathway evolution

Tracing the evolution of NADH:ubiquinone oxidoreductase (Complex I of oxidative phosphorylation), from 14 subunits (Bacteria) to 46 subunits (Mammals), by comparative genome analysis

Bacteria: 14 subunits

Algae: 30

Fungi: 37

Mammals: 46

Plants: 30

Page 4: Bioinformatics and Evolutionary Genomics : Pathway evolution

Name    Alt. name   Bacteria
NU1M    Chain1      NuoH
NU2M    Chain2      NuoN
NU3M    Chain3      NuoA
NU4M    Chain4      NuoM
NULM    Chain4L     NuoK
NU5M    Chain5      NuoL
NU6M    Chain6      NuoJ
NUAM    75kD        NuoG
NUBM    51kD        NuoF
NUCM    49kD        NuoC
NUGM    30kD        NuoD
NUHM    24kD        NuoE
NUIM    TYKY        NuoI
NUKM    20kD        NuoB
ACPM    SDAP        COG0236
NUEM    39kD        COG0702
N5BM    B14.7
NESM    ESSM
NI8M    B8
NUYM    AQDQ        NOG07158
NIDM    PDSW
NUFM    B13
NUPM    19kD
NIPM    15kD

Species columns: Bt Mm Tr Dm Ag Ce Nc Ca Yl Sc Sp At Atc Cr
Species groups: Fungi, Mammals, Plants/Algae, Arthro.

Distribution of Complex I subunits among model species: in red, identified at the protein level (exp.); in yellow, at the gene level.

Page 5: Bioinformatics and Evolutionary Genomics : Pathway evolution

Name    Alt. name   Homolog / note
NUMM    13kD        COG4391
N7BM    B17.2       COG3761
NI2M    B22
NB6M    B16.6
NB8M    B18
NB4M    B14
NB2M    B12
CI30    CI30        ZP_00241795
CI84    CI84
NIMM    MWFE
NB5M    B15
NIAM    ASHI
NIGM    AGGG
NISM    SGDH
NUDM    42kD        COG1428
N4AM    B14.5a
NB7M    B17
NUOM    9/10kD
N4BM    B14.5b
NUML    MLRQ
NINM    MNLL
NIKM    KFYI
NI9M    B9
NUXM    20.9kD
NUZM    21.3a
NURM    17.8kD
Plant1  25/27kD     FBP-like
Plant2  30/32kD     FBP-like
Plant3  29kD        FBP-like
Plant4  6kD
Plant5  8kD
Plant6  17kD
Plant7  NDH11
Plant8  NDH16
Plant9  9kD
Plant10 16kD
Plant11 19kD

Species columns: Bt Mm Tr Dm Ag Ce Nc Ca Yl Sc Sp At Cr
Species groups: Fungi, Mammals, Plants/Algae, Insects

Distribution of Complex I subunits among model species: in red, identified at the protein level (exp.); in yellow, at the gene level; in white, at the DNA level.

Page 6: Bioinformatics and Evolutionary Genomics : Pathway evolution

Reconstructing Complex I evolution by mapping the variation onto a phylogenetic tree. After an initial “surge” in complexity (from 14 to 35 subunits in early eukaryotic evolution), new subunits have been gradually added and occasionally lost.

Complex I loss is not always “complete”: S. cerevisiae and S. pombe have retained 1 and 3 proteins, respectively.

Six of the eukaryotic Complex I proteins have been “recruited” from the alpha-proteobacteria.
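Mapping presence/absence data onto a tree can be sketched in a few lines. The example below assumes Dollo parsimony (a subunit is gained once, at the smallest clade containing every species that has it, and can only be lost afterwards). The slides do not state which reconstruction method was used, so this is an illustrative choice, and the tree, species names and presence set are toy values rather than the data behind the figure.

```python
def leaves(node):
    """Return the set of leaf names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def dollo_events(tree, present):
    """Place one gain and the implied losses for a single subunit.

    tree    -- nested tuples of species names (hypothetical toy format)
    present -- set of species in which the subunit was found
    """
    if not present:
        raise ValueError("subunit is absent from every species")

    def origin_of(node):
        # Descend while a single child still contains every carrier species.
        if not isinstance(node, str):
            for child in node:
                if present <= leaves(child):
                    return origin_of(child)
        return node

    origin = origin_of(tree)
    losses = []

    def walk(node):
        under = leaves(node)
        if not under & present:
            # Whole subtree lacks the subunit: one loss on this branch.
            losses.append(sorted(under))
        elif not isinstance(node, str):
            for child in node:
                walk(child)

    walk(origin)
    return sorted(leaves(origin)), losses


# Toy tree and toy presence pattern (assumptions for illustration only).
TOY_TREE = ((("human", "mouse"), "fly"), ("yeast", "neurospora"))
gain_clade, losses = dollo_events(TOY_TREE, {"human", "mouse", "neurospora"})
```

Here the gain is placed at the root of the toy tree and two independent losses are inferred, one on the fly branch and one on the yeast branch.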

Page 7: Bioinformatics and Evolutionary Genomics : Pathway evolution

Beyond Blastology and COGology: phylogenies for orthology prediction

The Complex I assembly protein CI30 has been duplicated in the Fungi. This can explain the presence of a CIA30 homolog in Complex I-less S. pombe.

Page 8: Bioinformatics and Evolutionary Genomics : Pathway evolution

In the eukaryotic evolution of Complex I, new subunits have been added “all over” the complex

Gabaldon et al., J. Mol. Biol. 2005

Page 9: Bioinformatics and Evolutionary Genomics : Pathway evolution

Eukaryotic evolution of Complex I, in which individual subunits have been added to a growing complex, contrasts with prokaryotic evolution, in which separate multi-protein complexes appear to have been assembled (T. Friedrich).

An explanation for this contrast is the “operon” genome organization of prokaryotes, which facilitates the duplication of sets of interacting proteins.

Page 10: Bioinformatics and Evolutionary Genomics : Pathway evolution

COG0021, COG0213, COG2820: ribose phosphate metabolism (not cohesive at all)

COG0707, COG0769, COG0770, COG0771, COG0773, COG0796, COG0812, COG1181: peptidoglycan biosynthesis pathway (highly cohesive, but far from perfect)

Is this variation in subunits the exception or the rule for functional modules?

Very few functional modules are perfect; limited cohesiveness; functional units vs evolutionary units
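One way to put a number on "cohesiveness" is sketched below. It assumes, purely for illustration, that cohesiveness is measured as the mean pairwise Jaccard similarity between the sets of genomes in which each member of a module occurs; the slides do not define the measure, and the module members and genome names are made up.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of genomes."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def module_cohesiveness(profiles):
    """Mean pairwise Jaccard similarity of the members' phylogenetic profiles.

    profiles -- dict mapping a gene/COG name to the set of genomes containing it
    """
    pairs = list(combinations(profiles.values(), 2))
    if not pairs:
        raise ValueError("need at least two members to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy modules (hypothetical names and genomes).
perfect = {"m1": {"g1", "g2", "g3"}, "m2": {"g1", "g2", "g3"}, "m3": {"g1", "g2", "g3"}}
scattered = {"m1": {"g1"}, "m2": {"g2"}, "m3": {"g3"}}

perfect_score = module_cohesiveness(perfect)      # members always co-occur
scattered_score = module_cohesiveness(scattered)  # members never co-occur
```

A module that behaves as a single evolutionary unit scores 1.0 under this measure, while a functional unit whose members are gained and lost independently drifts towards 0.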

Page 11: Bioinformatics and Evolutionary Genomics : Pathway evolution

Non-orthologous gene displacement / analogous proteins

Not specific to the “genome” age, but research into this topic has increased dramatically with the availability of complete genomes (people would encounter “missing links” and start hypothesizing about what could fill the gap).

First systematic analysis on M. genitalium (Koonin et al., Trends Genet. 1997)

Page 12: Bioinformatics and Evolutionary Genomics : Pathway evolution
Page 13: Bioinformatics and Evolutionary Genomics : Pathway evolution
Page 14: Bioinformatics and Evolutionary Genomics : Pathway evolution

The opposite of co-occurrence: anti-correlation / complementary patterns: predicting analogous enzymes

Genes with complementary phylogenetic profiles tend to have a similar biochemical function.
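A minimal sketch of how complementarity between two phylogenetic profiles could be scored, assuming a profile is simply the set of genomes in which a gene is present. The scoring rule (fraction of genomes holding exactly one of the two genes) is an assumption chosen for clarity, and the gene and genome names are hypothetical.

```python
def complementarity(profile_a, profile_b, genomes):
    """Fraction of genomes that contain exactly one of the two genes."""
    if not genomes:
        raise ValueError("need at least one genome")
    exactly_one = sum((g in profile_a) != (g in profile_b) for g in genomes)
    return exactly_one / len(genomes)


# Toy data: six genomes, three hypothetical genes.
GENOMES = ["g1", "g2", "g3", "g4", "g5", "g6"]
gene_a = {"g1", "g2", "g3"}
gene_b = {"g4", "g5", "g6"}   # present exactly where gene_a is absent
gene_c = {"g1", "g2", "g3"}   # co-occurs with gene_a

score_ab = complementarity(gene_a, gene_b, GENOMES)
score_ac = complementarity(gene_a, gene_c, GENOMES)
```

A score near 1 marks a candidate pair of analogous enzymes (each genome needs the function and carries one solution or the other), whereas a score near 0 marks co-occurring genes.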

Page 15: Bioinformatics and Evolutionary Genomics : Pathway evolution

Complementary patterns in thiamin biosynthesis predict analogous enzymes

Page 16: Bioinformatics and Evolutionary Genomics : Pathway evolution

Prediction of analogous enzymes is confirmed

Page 17: Bioinformatics and Evolutionary Genomics : Pathway evolution

(Recent) gene duplication

• Fate after duplication: neofunctionalization or subfunctionalization
• GO process / molecular function / cellular component
• Substrate vs catalytic site / mechanism

Page 18: Bioinformatics and Evolutionary Genomics : Pathway evolution

Subfunctionalization: example in terms of protein complexes (= GO cellular component)

Page 19: Bioinformatics and Evolutionary Genomics : Pathway evolution

Neofunctionalization: example in terms of protein complexes (= GO cellular component)

Page 20: Bioinformatics and Evolutionary Genomics : Pathway evolution

Sub vs neo in regulatory context

OLD VIEW

NEW VIEW

Moore and Purugganan 2005 b

Page 21: Bioinformatics and Evolutionary Genomics : Pathway evolution

An example of a metabolic pathway: Histidine Metabolism (including biosynthesis) in KEGG

Page 22: Bioinformatics and Evolutionary Genomics : Pathway evolution

Histidine Biosynthesis in EcoCyc

Page 23: Bioinformatics and Evolutionary Genomics : Pathway evolution

Pathway evolution:

How to evolve a complex thing when the intermediates don’t make sense? See the discussion regarding the evolution of the eye.

Pathway evolution occurs at two levels:

- Which substrate will be turned into which product
- Getting the proteins to catalyze the required reactions

Page 24: Bioinformatics and Evolutionary Genomics : Pathway evolution

Model of Horowitz (1945): “Retrograde evolution” (back propagation by gene duplication within the pathway)

1) Given a good “soup”, first evolve the enzyme for the last step of the pathway (the other intermediates are in the soup)

2) Secondly, as the substrate of the last step is the product of the preceding step, the enzymes need similar binding sites: duplicate the gene encoding the last step to evolve the last-minus-one step

3) Iterate step 2

Page 25: Bioinformatics and Evolutionary Genomics : Pathway evolution

Figure: Horowitz model of pathway evolution. Axis: time. Labels: gene duplication (two successive events), enzyme, substrate, end product.

Page 26: Bioinformatics and Evolutionary Genomics : Pathway evolution

We have data (but no time machine), so we can test whether homologous proteins tend to cluster in pathways.

Some pathways do display such clustering, e.g. tryptophan and histidine biosynthesis contain consecutive steps catalyzed by homologous proteins.

Page 27: Bioinformatics and Evolutionary Genomics : Pathway evolution

Teichmann et al, Trends Biotechn. 2001

Page 28: Bioinformatics and Evolutionary Genomics : Pathway evolution

Homologous proteins are overrepresented at short distances within pathways, supporting the Horowitz model.
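The kind of test behind this statement can be sketched as follows, under simplifying assumptions: the pathway is treated as a linear chain of steps, each enzyme has already been assigned to a homologous family, and "distance" is the number of steps separating two enzymes. Real pathways are graphs and the published analysis may differ; the enzyme and family names are toy values.

```python
from collections import Counter
from itertools import combinations


def homolog_fraction_by_distance(pathway):
    """For each step distance, return the fraction of enzyme pairs that are homologous.

    pathway -- list of (enzyme, family) tuples in reaction order
    """
    all_pairs = Counter()
    homolog_pairs = Counter()
    for (i, (_, fam_i)), (j, (_, fam_j)) in combinations(enumerate(pathway), 2):
        distance = j - i
        all_pairs[distance] += 1
        if fam_i == fam_j:
            homolog_pairs[distance] += 1
    return {d: homolog_pairs[d] / n for d, n in sorted(all_pairs.items())}


# Toy linear pathway (hypothetical enzymes and family labels).
TOY_PATHWAY = [
    ("E1", "famA"),
    ("E2", "famA"),
    ("E3", "famB"),
    ("E4", "famC"),
    ("E5", "famB"),
]
fractions = homolog_fraction_by_distance(TOY_PATHWAY)
```

Summed over many pathways and compared with a shuffled assignment of families to steps, an excess of homologous pairs at distance 1 or 2 would be the signal described on the slide.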

Page 29: Bioinformatics and Evolutionary Genomics : Pathway evolution

Alternative theory of pathway evolution:

Jensen, 1976: Enzyme recruitment in evolution of new function

- Primordial enzymes were multifunctional (“substrate ambiguity”)
- Ordered pathways evolved from these enzymes by gene duplication followed by specialization (recruitment)

Page 30: Bioinformatics and Evolutionary Genomics : Pathway evolution
Page 31: Bioinformatics and Evolutionary Genomics : Pathway evolution

How many proteins are really multifunctional?

Example: finding the fructose-1,6-bisphosphatase in the Archaea

Stec B, Yang H, Johnson KA, Chen L, Roberts MF. MJ0109 is an enzyme that is both an inositol monophosphatase and the 'missing' archaeal fructose-1,6-bisphosphatase. Nat Struct Biol. 2000 Nov;7(11):1046-50.

A number of multifunctional enzymes are being discovered, but the question remains whether multifunctional enzymes played a larger role in early evolution.

Page 32: Bioinformatics and Evolutionary Genomics : Pathway evolution

Structural assignments and sequence comparisons were used to show that 213 domain families constitute approximately 90% of the enzymes in the small-molecule metabolic pathways. Catalytic or cofactor-binding properties between family members are often conserved, while recognition of the main substrate with change in catalytic mechanism is only observed in a few cases of consecutive enzymes in a pathway. Recruitment of domains across pathways is very common, but there is little regularity in the pattern of domains in metabolic pathways. This is analogous to a mosaic in which a stone of a certain colour is selected to fill a position in the picture. (Teichmann et al., 2001)

Pathway evolution operates mainly by recruitment, not by Horowitz’ retrograde evolution.

(Notice that this is not so surprising, given what we learned on day 2: substrate specificities are relatively volatile aspects of enzyme evolution, whereas catalytic function is much better conserved. The “conservation of substrate binding, evolution of catalytic function” argument is not really what one encounters in present-day evolution.)

This does not necessarily support the Jensen theory of substrate ambiguity.

Page 33: Bioinformatics and Evolutionary Genomics : Pathway evolution

Pathway duplication

Multiple functionally interacting proteins are co-duplicated and together take their place in a new pathway.

Page 34: Bioinformatics and Evolutionary Genomics : Pathway evolution

Pathway duplication at the protein level: homologous (sometimes identical) proteins are used to catalyze a chain of similar reactions

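Propionate branch                                                               Acetate branch
propionate + ATP + CoA -> propionyl-CoA + AMP + PPi (propionyl-CoA synthase)    acetate + ATP + CoA -> acetyl-CoA + AMP + PPi (acetyl-CoA synthase)
propionyl-CoA + H2O + oxaloacetate -> 2-methylcitrate + CoA (2-methylcitrate synthase)    acetyl-CoA + H2O + oxaloacetate -> citrate + CoA (citrate synthase)
2-methylcitrate -> 2-methylisocitrate (aconitase + prpD)                        citrate -> isocitrate (aconitase)
2-methylisocitrate -> succinate + pyruvate (2-methylisocitrate lyase)           isocitrate -> succinate + glyoxylate (isocitrate lyase)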

Page 35: Bioinformatics and Evolutionary Genomics : Pathway evolution

Pathway duplication between (methyl)citric acid metabolism and Amino-Acid biosynthesis (Lysine, Leucine)

Lys20 homologous to LeuA, not GltA

HacAB homologous to LeuCD, Acn

PH1722 homologous to icd, LeuB

Page 36: Bioinformatics and Evolutionary Genomics : Pathway evolution

Methods: define paraCOGs

1) Start from all COGs & NOGs
2) Align (Muscle) to obtain MSAs
3) Create HMM profiles (HHmake) to obtain HMMs
4) All vs. all profile-profile searches (HHsearch) to obtain raw output
5) Assign homology
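The final "assign homology" step can be sketched as single-linkage clustering of the profile-profile hits. The input format (tuples of two group names and a hit probability), the 90% threshold and the group names are all assumptions made for illustration; the slides do not say how the raw search output was thresholded or clustered.

```python
def assign_families(hits, min_probability=90.0):
    """Group COGs/NOGs into homologous families by single linkage (union-find).

    hits            -- iterable of (group_a, group_b, probability) tuples
    min_probability -- hypothetical cut-off above which a hit counts as homology
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, probability in hits:
        root_a, root_b = find(a), find(b)  # registers both groups, even for weak hits
        if probability >= min_probability and root_a != root_b:
            parent[root_b] = root_a

    families = {}
    for group in list(parent):
        families.setdefault(find(group), set()).add(group)
    return sorted(sorted(members) for members in families.values())


# Toy hits (hypothetical group names and probabilities).
TOY_HITS = [
    ("COG_A", "COG_B", 99.1),
    ("COG_B", "COG_C", 95.0),
    ("COG_C", "COG_D", 20.0),
    ("COG_E", "COG_F", 97.3),
]
families = assign_families(TOY_HITS)
```

Single linkage is the simplest choice and tends to merge families through chains of marginal hits, so a stricter threshold or a different clustering rule may be preferable in practice.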

Page 37: Bioinformatics and Evolutionary Genomics : Pathway evolution

Methods: define functional modules

Functional module: primary building block of biomolecular systems, i.e. metabolic or signaling pathway or protein complex

1) Start from all COGs & NOGs
2) STRING dataset: functionally linked COG pairs, recalculated for genomic context links only (npf)
3) Clustering (CFinder) to obtain ‘rough’ functional modules
4) Iterative module subclustering (CFinder) to obtain specific functional modules

Page 38: Bioinformatics and Evolutionary Genomics : Pathway evolution
Page 39: Bioinformatics and Evolutionary Genomics : Pathway evolution

Tracing the evolution of the NQR/RNF reductases

Page 40: Bioinformatics and Evolutionary Genomics : Pathway evolution

Duplication of NqrDE/RnfAE occurred prior to module duplication

Page 41: Bioinformatics and Evolutionary Genomics : Pathway evolution

Reconstruction of the evolution of the NQR-RNF reductases

• Sub-functionalization on the protein complex level

Redox-driven Na+-pump

Reductase of proteins involved in nitrogen-fixation

Page 42: Bioinformatics and Evolutionary Genomics : Pathway evolution

Pathway duplication is prevalent in signalling and transport pathways. (The evolution of the MAP kinase pathways: coduplication of interacting proteins leads to new signaling cascades. Caffrey DR, O'Neill LA, Shields DC. J Mol Evol 1999 Nov;49(5):567-82)

Page 43: Bioinformatics and Evolutionary Genomics : Pathway evolution

Pathway duplication in signaling pathways is:

1) Easy because one does not have to change the substrate specificity

2) Hard because one does not want too much crosstalk…

Is it one duplication of the entire pathway or stepwise duplication?

Page 44: Bioinformatics and Evolutionary Genomics : Pathway evolution

Pathway evolution scenarios