
Day 3: From Homology to Orthology: Predicting protein function. - …biosb.nl/wp-content/uploads/2016/11/BioSB-orthology.pdf · Day 3: From Homology to Orthology: Predicting protein


Page 1

Day 3: From Homology to Orthology: Predicting protein function.

Fitch 1970: “Where the homology is the result of gene duplication so that both copies have descended side by side during the history of an organism, (for example alpha and beta hemoglobin) the genes should be called paralogous (para = in parallel). Where the homology is the result of speciation so that the history of the gene reflects the history of the species (for example, alpha hemoglobin in man and mouse) the genes should be called orthologous (ortho = exact)”

Page 2

Mechanism of Gene Duplication

Hurles M (2004) Gene Duplication: The Genomic Trade in Spare Parts. PLoS Biol 2(7): e206. doi:10.1371/journal.pbio.0020206

Page 3

Expansion of the globin family from a single ancestral member

Page 4

Olfactory receptor family in mouse (green) and human (red, ~1000); many are shared between the species and predate the speciation.

Page 5

ACAD9, an example of a gene duplication followed by neofunctionalization

Page 6

Neofunctionalisation while maintaining some original function: the assembly factor ACAD9, which originated via gene duplication from VLCAD at the origin of the Bilateria, has maintained its active site and retained catalytic activity. Altered helices likely involved in complex I assembly have been identified with Sequence Harmony.

Nouws et al., Cell Metab. 2010

Page 7

Subfunctionalisation + Neofunctionalisation in complex I (Zhu et al, Nature 2016)

Page 8

The acyl carrier protein (SDAP/ACP) that functions in fatty acid synthesis in plants has acquired a role in complex I in fungi/metazoa (neofunctionalization).

Sub-functionalization after gene duplication for the complex I binding acyl carrier protein in Y. lipolytica.

[Figure: tree of acyl carrier proteins with branches annotated by function: fatty acid synthesis; fatty acid synthesis(?) + complex I interacting with B14; complex I interacting with B22; fatty acid synthesis + complex I interacting with B22 AND B14; fatty acid synthesis (does not have complex I).]

Page 9

Comparing genomes for their genes, orthologs

[Figure: gene tree with labels Gene A, A, B, Species I, Species II, gene duplication and speciation; the orthologs are marked.]

“Which genes do two genomes share, and which don’t they share, and how does that relate to their phenotypical similarities and differences?”

Page 10

[Figure: genes of Genome I and Genome II connected by pairwise sequence identities (35%, 30%, 25%, 23%).]

Orthologs are expected to have relatively high levels of sequence identity to each other (compared to non-orthologous homologs) because they diverged relatively recently, and ... because they have similar functions (???)

Large-scale orthology determination is often done using bidirectional best hits: a so-called “graph-based approach”.
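
The bidirectional-best-hit criterion can be sketched in a few lines of Python. This is only an illustration: the gene names (`I_a`, `II_b`, ...) and the way the four percentages are assigned to gene pairs are assumptions, since the figure does not survive in this transcript, and real pipelines would take scores from a sequence search rather than a hand-written table.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring hit in the other genome."""
    best = {}
    for (query, target), identity in scores.items():
        if query not in best or identity > scores[(query, best[query])]:
            best[query] = target
    return best


def bidirectional_best_hits(scores_1_to_2, scores_2_to_1):
    """Return pairs (g1, g2) where g2 is g1's best hit and g1 is g2's best hit."""
    forward = best_hits(scores_1_to_2)
    backward = best_hits(scores_2_to_1)
    return sorted((g1, g2) for g1, g2 in forward.items() if backward.get(g2) == g1)


# Hypothetical percent identities between two genes of Genome I and two of Genome II.
identities = {
    ("I_a", "II_a"): 35, ("I_a", "II_b"): 25,
    ("I_b", "II_a"): 23, ("I_b", "II_b"): 30,
}
# Identity is treated as symmetric here, so the reverse search reuses the same values.
reverse = {(g2, g1): pid for (g1, g2), pid in identities.items()}

pairs = bidirectional_best_hits(identities, reverse)
print(pairs)  # [('I_a', 'II_a'), ('I_b', 'II_b')]
```

Ties are resolved in favour of the first hit seen, which is an arbitrary choice of this sketch.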

Page 11

[Figure: genes of Genome I, Genome II and Genome III connected by pairwise sequence identities (values shown: 35%, 35%, 25%, 23%, 40%, 30%, 22%, 35%, 20%, 25%).]

In graph-based approaches, multiple genomes can be used to check the consistency of bidirectional best hits.

Page 12

Gene duplications are creative: they create the possibility of developing new functions (in this case, one involved in carnitine synthesis). But they mess up orthology, i.e. orthology is non-transitive. When a gene duplicates in one lineage after a speciation, both copies are orthologous to the single gene in the other species but paralogous to each other.

Page 13

Inparalogs versus outparalogs:

Inparalogs are due to relatively recent, species-specific gene duplications, e.g. Q9V6P0 and Q9VY24.

Outparalogs are due to gene duplications that preceded speciations, e.g. Q9V6P0 vs. Q9VDM7.
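
The distinction can be made concrete on a small rooted gene tree whose internal nodes are already labelled as duplication ("D") or speciation ("S"). The sketch below is hypothetical throughout: the tree, the species and the gene names are invented for illustration and are not the proteins named above. It also assumes a two-species comparison without complete gene loss, so that a duplication whose subtree contains a single species can be read as species-specific.

```python
def leaves(node):
    """Yield the (species, gene) leaves below a node."""
    if len(node) == 2:               # leaf: (species, gene)
        yield node
    else:                            # internal node: (event, left, right)
        _, left, right = node
        yield from leaves(left)
        yield from leaves(right)


def classify(tree, gene_a, gene_b):
    """Classify two distinct genes by the event at their last common ancestor."""
    event, left, right = tree
    for child in (left, right):
        genes = {gene for _, gene in leaves(child)}
        if gene_a in genes and gene_b in genes:
            return classify(child, gene_a, gene_b)   # ancestor lies deeper
    if event == "S":
        return "orthologs"
    species = {sp for sp, _ in leaves(tree)}
    return "inparalogs" if len(species) == 1 else "outparalogs"


# Hypothetical tree: an old duplication (root), then a speciation in each copy,
# then a recent fly-specific duplication in one of the copies.
tree = ("D",
        ("S", ("D", ("fly", "flyX1"), ("fly", "flyX2")), ("human", "humX")),
        ("S", ("fly", "flyY"), ("human", "humY")))

print(classify(tree, "flyX1", "flyX2"))  # inparalogs
print(classify(tree, "flyX1", "flyY"))   # outparalogs
print(classify(tree, "flyX1", "humX"))   # orthologs
```

In this toy tree both `flyX1` and `flyX2` are orthologs of `humX` while being paralogs of each other, which is the non-transitivity mentioned earlier.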

Page 14

The solution to the non-transitivity of the concept of orthology sensu stricto is “group orthology”.

Conceptually: all proteins that are directly descended from one protein in the last common ancestor are considered orthologous to each other.

In graph-based approaches: combine all connected “best triangular hits” into Clusters of Orthologous Groups (COGs, Tatusov et al, 1997). WWW.NCBI.NLM.GOV
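
A minimal sketch of the triangle-merging idea, assuming the bidirectional best hits have already been computed and are given as a list of gene pairs. Gene names such as `A1` (gene 1 of genome A) are hypothetical, and merging triangles that share a side is this sketch's reading of "connected"; it is not a reimplementation of the COG procedure.

```python
from itertools import combinations


def find_triangles(bbh_edges):
    """Return all sets of three genes that are pairwise bidirectional best hits.

    BBH edges only connect genes of different genomes, so every triangle
    automatically spans three genomes.
    """
    neighbours = {}
    for a, b in bbh_edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    triangles = set()
    for a, b in bbh_edges:
        for c in neighbours[a] & neighbours[b]:
            triangles.add(frozenset((a, b, c)))
    return triangles


def merge_triangles(triangles):
    """Merge triangles that share a side (two genes) into orthologous groups."""
    triangles = sorted(triangles, key=sorted)
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) == 2:
            parent[find(i)] = find(j)

    groups = {}
    for i, triangle in enumerate(triangles):
        groups.setdefault(find(i), set()).update(triangle)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical BBH edges among genomes A, B, C and D.
edges = [("A1", "B1"), ("B1", "C1"), ("A1", "C1"),   # triangle A1-B1-C1
         ("B1", "D1"), ("C1", "D1"),                 # triangle B1-C1-D1
         ("A2", "B2")]                               # lone pair, no triangle
groups = merge_triangles(find_triangles(edges))
print(groups)  # [['A1', 'B1', 'C1', 'D1']]
```

Note that `A1` and `D1` end up in the same group although they are not best hits of each other, and that the lone pair `A2`/`B2` is left out because it is not part of any triangle.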

Page 15

Page 16

[Figure: gene tree with labels Gene A, A, B, Species I, Species II, gene duplication, speciation and gene loss; the surviving genes are marked as non-orthologs, although they are bidirectional best hits.]

Parallel non-orthologous gene loss can lead to misidentification of orthology relations when bidirectional best hits are used as the criterion.

Page 17

[Figure: phylogenetic tree including DnaK and HscA proteins, with branch support values (507 to 1000) and a scale bar of 0.2. Leaves as labelled: B. subtilis DnaK; E. coli HscA; Buchnera HscA; R. prowazekii HscA; H. sapiens 199642; S. cerevisiae YHR064C; H. sapiens 59062; H. sapiens 187116; S. cerevisiae BR169C; S. cerevisiae YPL106C; S. cerevisiae YKL073W; E. coli DnaK; Buchnera DnaK; R. prowazekii DnaK; H. sapiens 23627; S. cerevisiae ECM10; S. cerevisiae SSC1; S. cerevisiae SSQ1.]

Variations in the rate of evolution can lead to misidentification of orthology relations when the latter are based on bi/multi-directional best hits.

Page 18

Because of independent loss events and variable rates of evolution, orthology determination using bi/multi-directional best hits does not always resolve separate orthologous and/or functional groups in large gene families.

One solution to this is the creation of phylogenies...

Page 19

Page 20

Prediction of orthology using phylogenies (unrooted)

Page 21

Classic usage of phylogeny: inferring evolutionary history from a single, orthologous group of proteins, e.g. the origin of hydrogenosomes. Trees are not always perfect (even when published in Nature).

Dyall et al., Nature 2004

Hrdy et al., Nature 2004

Page 22

Unrooted tree topologies

[Figure: three equivalent drawings of the same unrooted tree with leaves A, B, C and D.]

Bracket notation: ((A,B),(C,D))
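
One way to check that differently drawn (or differently rooted) trees share the same unrooted topology is to compare their sets of leaf bipartitions. The sketch below assumes plain bracket notation without branch lengths, whitespace or a trailing semicolon; the parser and the helper names are written for this illustration only.

```python
def parse(newick):
    """Parse bracket notation such as '((A,B),(C,D))' into nested tuples."""
    pos = 0

    def node():
        nonlocal pos
        if newick[pos] == "(":
            pos += 1                          # skip '('
            children = [node()]
            while newick[pos] == ",":
                pos += 1                      # skip ','
                children.append(node())
            pos += 1                          # skip ')'
            return tuple(children)
        start = pos
        while pos < len(newick) and newick[pos] not in "(),":
            pos += 1
        return newick[start:pos]              # leaf name

    return node()


def leaf_set(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial leaf bipartitions, which ignore the root position."""
    all_leaves = leaf_set(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaf_set(node)
        rest = all_leaves - clade
        if len(clade) > 1 and len(rest) > 1:
            found.add(frozenset([clade, rest]))
        for child in node:
            visit(child)

    visit(tree)
    return found


same = splits(parse("((A,B),(C,D))")) == splits(parse("(A,(B,(C,D)))"))
different = splits(parse("((A,B),(C,D))")) != splits(parse("((A,C),(B,D))"))
print(same, different)  # True True
```

The first comparison involves two rootings of one unrooted tree (the split AB|CD), whereas `((A,C),(B,D))` is a genuinely different topology.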

Page 23

- Unrooted tree topologies only reflect relative evolutionary relations (in the primates, humans and chimpanzees are more closely related to each other than they are to the orangutan and the gibbon).

- Rooted trees reflect the relative order of descent (in the primates, first the gibbon branched off, then the orangutan, and then the chimpanzee and human lineages split).

[Figure: an unrooted tree with leaves Orangutan, Gibbon, Chimp and Human, and a rooted tree with leaves Chimp, Human, Orangutan, Gibbon and Baboon.]

Page 24

How we root a tree affects the orthology relationships.

[Figure: the same gene tree with leaves Spec I A, Spec II B, Spec II C and Spec I D, drawn with different root positions.]

Page 25

How we root a tree also affects the number of gene losses. If the duplication from A to A’ happened in the ancestor of Spec I, II and III, then where is gene A’ in Spec II and III? Has it been lost?

[Figure: rooted gene tree with leaves Spec I D, Spec II B, Spec II C and Spec I A, plus missing genes marked "Spec II ?" and "Spec I ?".]

Page 26

Effectively, rooting the tree based on “species overlap” between the partitions minimizes gene losses and duplications.

[Figure: the gene tree with leaves Spec I A, Spec II B, Spec II C and Spec I D, drawn with different root positions.]
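
The species-overlap idea can be sketched as follows: at each internal node of a rooted gene tree, call the node a duplication if the two partitions below it share a species, and a speciation otherwise; then prefer the rooting that implies the fewest duplications. The topology used here, the `<species>_<gene>` leaf naming and the three candidate rootings are assumptions for illustration (the slide's trees are not recoverable from this transcript), and the sketch counts duplications only, not losses.

```python
def species_of(leaf):
    """Leaves are named '<species>_<gene>'; this naming is an assumption of the sketch."""
    return leaf.split("_")[0]


def annotate(tree, events):
    """Return the species set below `tree`, appending one event per internal node."""
    if isinstance(tree, str):
        return {species_of(tree)}
    left, right = tree
    left_species = annotate(left, events)
    right_species = annotate(right, events)
    # Species overlap between the two partitions implies a duplication.
    events.append("duplication" if left_species & right_species else "speciation")
    return left_species | right_species


def count_duplications(tree):
    """Count the duplication nodes implied by species overlap."""
    events = []
    annotate(tree, events)
    return events.count("duplication")


# Three candidate rootings of one hypothetical unrooted tree (I_A,II_B | II_C,I_D).
rootings = {
    "internal branch": (("I_A", "II_B"), ("II_C", "I_D")),
    "branch to I_D": ("I_D", ("II_C", ("I_A", "II_B"))),
    "branch to I_A": ("I_A", ("II_B", ("II_C", "I_D"))),
}
counts = {name: count_duplications(tree) for name, tree in rootings.items()}
best = min(counts, key=counts.get)
print(counts)  # {'internal branch': 1, 'branch to I_D': 2, 'branch to I_A': 2}
print(best)    # internal branch
```

With the root on the internal branch, a single duplication followed by a speciation in each copy explains the tree; the other two rootings need a second duplication and therefore also imply lost genes.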

Page 27

Beyond inparalogs and outparalogs: a numbering system for paralogy. LOFT (Levels of Orthology from Trees)

Page 28

Prefab phylogenetic trees: TreeFam

Page 29

TreeFam contains the domain composition of the proteins, which sometimes varies between paralogous proteins.

Page 30

Substrate specificities are not necessarily monophyletic (convergent evolution).

Convergent evolution of Trichomonas vaginalis lactate dehydrogenase from malate dehydrogenase. Wu et al., PNAS 1999

Page 31

[Figure: structures of lactate (CH3-CH(OH)-COO-), malate (-OOC-CH2-CH(OH)-COO-), oxaloacetate (-OOC-CH2-CO-COO-) and pyruvate (CH3-CO-COO-). LDH interconverts lactate and pyruvate; MDH interconverts malate and oxaloacetate.]

Lactate/Malate Dehydrogenase: different small-molecule specificity

Page 32

Lactate/Malate Dehydrogenase

[Figure: structures of lactate (CH3-CH(OH)-COO-) and malate (-OOC-CH2-CH(OH)-COO-), with the labels "negative", "Arg 102" and "positive".]

Hannenhalli & Russell, JMB, 303, 61-76, 2000

Page 33

Another source of information that can be used for orthology prediction is gene-order conservation.

[Figure: a gene with two equally good hits (35% and 35%).]

(Be careful with duplicated sets of genes, though.)
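
A rough sketch of how gene order could break a tie between two equally similar candidates: count how many neighbours of the query gene have an already accepted ortholog next to each candidate. The genomes (as ordered gene lists), the gene names, the `known_orthologs` mapping and the window size are all hypothetical choices for this example.

```python
def neighbours(genome, gene, window=1):
    """Genes within `window` positions of `gene` in an ordered gene list."""
    i = genome.index(gene)
    return set(genome[max(0, i - window):i] + genome[i + 1:i + 1 + window])


def synteny_support(query, candidate, genome_1, genome_2, known_orthologs, window=1):
    """Count query neighbours whose known ortholog lies next to the candidate."""
    candidate_context = neighbours(genome_2, candidate, window)
    return sum(
        known_orthologs.get(gene) in candidate_context
        for gene in neighbours(genome_1, query, window)
    )


# Hypothetical gene orders; "x" and "y" are equally similar (35%) to the query "q".
genome_1 = ["a1", "q", "b1"]
genome_2 = ["a2", "x", "b2", "c2", "y", "d2"]
known_orthologs = {"a1": "a2", "b1": "b2"}

support = {
    candidate: synteny_support("q", candidate, genome_1, genome_2, known_orthologs)
    for candidate in ("x", "y")
}
print(support)  # {'x': 2, 'y': 0}
```

If the whole block around the query had been duplicated in the second genome, both candidates would receive the same support, which is the caveat about duplicated sets of genes.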