
Paralogs

Inbal Yanover

Reading Group in Computational Molecular Biology

Homologs

• Orthologs: homologous sequences are orthologous if they were separated by a speciation event.

• Paralogs: homologous sequences are paralogous if they were separated by a gene duplication event.

Genomic duplication

Can involve:
• Individual genes
• Genomic segments
• Whole genome duplication (WGD)

Gene duplication has a major role in evolution.

Whole genome duplication

• Large-scale adaptation

• Instability of the polyploid genome

• Back to stability through:
 – gene loss
 – mutation
 – genomic rearrangements

Fate of duplicated genes

Finding a specialized ‘niche’:
• Localization
• Temporal expression
• Expression level

Another classification:
• Subfunctionalization
• Neofunctionalization (lowest probability)
• Nonfunctionalization (70%)

First article

Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae

Kellis M, Birren BW, Lander ES.

Nature. Apr 2004.

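Main ideas

• The S. cerevisiae genome arose from an ancient whole-genome duplication that occurred after its lineage diverged from K. waltii (which serves as the non-duplicated reference).

• Analysis of the post-duplication divergence of paralogs.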

Expected signature for genome duplication:

• After duplication, usually one paralog is lost (through random local deletions).

• Both copies are retained only if they acquire distinct functions.

• Eventually, only a few paralogous genes remain in the same order and the same orientation.

• Those regions should be short, since chromosomal rearrangements disrupt gene order over time.

Model for WGD followed by massive gene loss

[Figure: common ancestor -> genome duplication -> massive gene loss]

Proving existence of an ancient WGD

• Look for a related species (Y) that branched off from the lineage of S. cerevisiae (S) before the duplication.

• Y and S should show a 1:2 mapping:
 – Nearly every region in Y corresponds to 2 regions in S (‘sister regions’).
 – Each sister region in S contains an ordered subsequence of the genes in Y.
 – Each sister region in S contains ~half of the Y genes.
 – Together, the two sister regions account for nearly all of Y’s genes.
 – Every region of S corresponds to one region in Y.
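The 1:2 signature can be checked mechanically for a single region of Y. The Python sketch below is a minimal illustration, not the procedure of Kellis et al.: it assumes each sister region is given as the list of Y gene IDs it matches (in S order), accepts either orientation, and uses hypothetical thresholds (`min_share`, `min_union`) to stand in for "~half" and "nearly all".

```python
def is_ordered_subsequence(sub, seq):
    """True if the genes in `sub` occur in `seq` in the same relative order."""
    it = iter(seq)
    # `gene in it` advances the iterator, so the relative order is enforced.
    return all(gene in it for gene in sub)


def is_colinear(sub, seq):
    """Accept a sister region in either orientation (it may have been inverted)."""
    return is_ordered_subsequence(sub, seq) or is_ordered_subsequence(sub[::-1], seq)


def wgd_signature(y_genes, sister_a, sister_b, min_share=0.3, min_union=0.9):
    """Check the expected 1:2 signature for one region of Y (toy sketch).

    y_genes: ordered gene IDs of one region in the non-duplicated species Y.
    sister_a, sister_b: Y gene IDs matched by the two S regions, in S order.
    min_share, min_union: hypothetical thresholds, not taken from the paper.
    """
    n = len(y_genes)
    share_a = len(set(sister_a)) / n
    share_b = len(set(sister_b)) / n
    union = len(set(sister_a) | set(sister_b)) / n
    return (
        is_colinear(sister_a, y_genes)
        and is_colinear(sister_b, y_genes)
        and min(share_a, share_b) >= min_share  # each region keeps a large part
        and union >= min_union                  # together they cover nearly all
    )
```

For example, with `y = [f"g{i}" for i in range(1, 11)]`, the interleaved regions `["g1", "g3", "g4", "g7", "g9"]` and `["g2", "g5", "g6", "g8", "g10"]` pass, while swapping `g5` and `g2` in the second region breaks colinearity and fails.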

Y = K. waltii

• Sequenced and assembled into 8 complete chromosomes (16 in S. cerevisiae).

• 5,230 likely protein-coding genes (5,714 genes in S. cerevisiae).

• 7% of its genes show no protein similarity to S. cerevisiae.

• Identifying orthologous regions:
 – Matching genes (based on protein similarity)
 – Regions with numerous matching genes in the same order

• Most local regions in K. waltii mapped to two regions in S. cerevisiae.

• Each of those regions matched a subset of the K. waltii genes.

Quantifying the observations

DCS (Doubly Conserved Synteny): maximal regions in K. waltii that map across their entire length to two distinct regions in S. cerevisiae.
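DCS blocks can be approximated by scanning K. waltii genes in chromosome order and extending a run while its matches stay within two S. cerevisiae regions. The sketch below is a deliberately simplified greedy version, an assumption rather than the authors' algorithm: the input format is hypothetical (each gene carries the set of S. cerevisiae region labels it matches), and a run ends as soon as a third region appears.

```python
def dcs_blocks(matches):
    """Greedy toy scan for doubly conserved synteny (DCS) blocks.

    matches: for each K. waltii gene in chromosome order, the set of
    S. cerevisiae region labels it matches (hypothetical input format;
    an empty set means the gene has no match).
    Returns (start, end, regions) for each run whose matches fall in
    exactly two distinct S. cerevisiae regions.
    """
    blocks = []
    start, regions = 0, set()
    for i, hits in enumerate(matches):
        if len(regions | hits) > 2:
            # A third region appears: close the current run if it was doubly conserved.
            if len(regions) == 2:
                blocks.append((start, i - 1, frozenset(regions)))
            start, regions = i, set(hits)
        else:
            regions |= hits
    if len(regions) == 2:
        blocks.append((start, len(matches) - 1, frozenset(regions)))
    return blocks
```

Runs that only ever match one S. cerevisiae region are not reported, mirroring the requirement that a DCS block map to two distinct regions.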


Gene and region correspondence

Results

• 253 DCS blocks, covering most of both genomes (75% of K. waltii genes and 81% of S. cerevisiae genes).

• DCS blocks tile 85% of each K. waltii chromosome, as expected from a WGD.

• Typical DCS block:
 – 27 genes
 – Separated from neighboring blocks by small segments (~3 genes) that match only one conserved region in S. cerevisiae

Duplicate mapping of centromeres

Note: no paralogs here!

Duplicated blocks in S. cerevisiae

• Using the DCS blocks, 253 sister regions were defined in S. cerevisiae.

• Many of these could not have been recognized without K. waltii as an intermediary.

Duplicated blocks in S. cerevisiae

Zooming in on one sister region

Conclusion

The WGD event occurred in the Saccharomyces lineage after its divergence from K. waltii.

Pattern of gene loss

• The number of chromosomes was doubled.

• Despite the WGD, the current S. cerevisiae genome:
 – is only 13% larger than the K. waltii genome
 – has only 10% more genes

• How did gene loss proceed?
 – Large segmental deletions vs. individual gene deletions?
 – Balanced between the two paralogs vs. acting primarily on one of them?

• Analysis of the DCS blocks shows:
 – average size of a lost segment: 2 genes
 – average balance: 43%-57%
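Both statistics can be computed per DCS block once each ancestral gene is labeled by which sister region retained it. This Python sketch assumes that labeling as input (a hypothetical format) and reads "balance" as the share of single-copy genes kept by the sister region that kept fewer; that reading of 43%-57% is our assumption, not the paper's stated definition.

```python
from itertools import groupby


def loss_pattern(retained):
    """Summarize gene loss within one DCS block (toy sketch).

    retained: for each ancestral (K. waltii) gene in block order, which
    sister region kept it: "A", "B", or "AB" (both copies kept).
    Returns (balance, mean_lost_segment):
      balance: fraction of single-copy genes kept by the region that kept fewer
               (assumed reading of the slide's "balance").
      mean_lost_segment: mean length of consecutive runs lost from one region.
    """
    kept_a = sum(1 for r in retained if r == "A")
    kept_b = sum(1 for r in retained if r == "B")
    singles = kept_a + kept_b
    balance = min(kept_a, kept_b) / singles if singles else 0.5
    # A gene kept only in A was lost from B (and vice versa), so each run of
    # identical single-copy labels is one contiguous lost segment.
    runs = [len(list(g)) for key, g in groupby(retained) if key in ("A", "B")]
    mean_run = sum(runs) / len(runs) if runs else 0.0
    return balance, mean_run
```

For `["A", "A", "B", "AB", "B", "A", "B", "B"]`, region A keeps 3 and region B keeps 4 single-copy genes, giving a balance of 3/7 (about 43%) and lost segments of lengths 2, 1, 1, 1 and 2.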

Two models of what happens after a duplication event

• One copy preserves the original function while the other is free to diverge (Ohno).

• Both copies diverge more rapidly and acquire new functions.

Evolutionary analysis

Study the evolution of the 457 gene pairs that arose by WGD:

• Use synteny to distinguish them from pairs that arose by local duplication events.

• Compute their divergence rates using the sequences of K. waltii, S. cerevisiae and S. bayanus (both amino acid and nucleotide).

Results

• 17% of gene pairs (76 of 457) showed accelerated protein evolution relative to K. waltii.

• In 95% of them, accelerated evolution was confined to only one paralog.

• Supports Ohno’s model: one paralog retains the ancestral function, while the other gains a derived function.

Ancestral <-> derived paralogs

• 115 gene pairs in which one paralog has evolved >50% faster than the other.

• Often, derived paralogs are specialized in:
 – Cellular localization (Acc1 - Hfa1)
 – Temporal expression (Skt5 - Shc1)
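The ">50% faster" criterion amounts to a simple rate comparison within each pair. This is a toy sketch: the per-paralog rates are hypothetical inputs (for example, amino-acid substitutions per site since divergence from the K. waltii ortholog), and the plain ratio test stands in for whatever statistical test the authors actually applied.

```python
def classify_pair(rate_1, rate_2, threshold=1.5):
    """Label the two paralogs of a WGD pair as ancestral or derived (toy sketch).

    rate_1, rate_2: hypothetical per-paralog protein divergence rates.
    The pair counts as asymmetric only if the faster paralog evolves more
    than `threshold` times as fast as the slower one (1.5 = ">50% faster").
    Returns a (label_1, label_2) tuple, or None if there is no clear asymmetry.
    """
    slow, fast = sorted((rate_1, rate_2))
    if fast == 0 or fast <= threshold * slow:
        return None
    # The faster-evolving copy is taken to be the derived paralog.
    return ("derived", "ancestral") if rate_1 > rate_2 else ("ancestral", "derived")
```

Applied across all 457 pairs, counting the non-`None` results would give the number of asymmetric pairs under this simplified criterion.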

Ancestral <-> derived paralogs, cont.

• Functional distinction confirmed by knockout experiments (in rich medium) of the genes in all 115 pairs:
 – Deletion of the ancestral paralog was lethal in 18% of cases.
 – Deletion of the derived paralog was never lethal.

• Possible explanations:
 – The derived paralog is not essential under these conditions.
 – The ancestral paralog compensates for the derived one (but not vice versa).

More results

• 60 of the 457 pairs (13%) showed decelerated protein evolution.

• These include highly constrained proteins:
 – ribosomal proteins (25)
 – histone proteins (2)
 – translation factors (4)

• In 90% of them, both paralogs were very similar (98% amino acid identity, versus 55% for all pairs).

However…

• ~70% of the gene pairs (321/457) showed neither accelerated nor decelerated protein evolution.

• Possible explanations:
 – The criteria are too strict.
 – Divergence in regulatory regions is not detected by this analysis.
 – Sometimes it is simply useful to have two copies.

Summary

• S. cerevisiae arose from an ancient WGD:
 – Massive loss of ~90% of duplicated genes through small deletions
 – At least one copy of each ancestral gene was preserved

• Divergence of paralogs:
 – Accelerated evolution (17%)
 – Derived genes tend to be specialized in function, expression level and localization
 – Derived genes tend to lose essential aspects of their ancestral function

Second article

Transcription control reprogramming in genetic backup circuits.

Kafri R, Bar-Even A, Pilpel Y.

Nat Genet. Mar 2005.

Introduction

• Severe mutations often do not result in an abnormal phenotype.

• This is partially ascribed to redundant paralogs that back each other up in case of mutation.

• Suggested mechanism: transcriptional reprogramming.

Definitions

• Organism: S. cerevisiae.

• Paralog pairs were defined by BLASTing their DNA sequences.

• Dispensable genes = non-essential genes.

Expression parameters

• For each pair of paralogs:
 – Calculate 40 correlation coefficients of mRNA expression.
 – Mean expression similarity = the mean of these coefficients.
 – Partial co-regulation (PCoR) = their standard deviation.
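Both parameters follow directly from the 40 coefficients. The Python sketch below assumes each coefficient is a Pearson correlation computed within one expression data set (one vector of measurements per gene per data set) and uses the population standard deviation; both choices are assumptions, since the slide does not specify them. A pair that is correlated in some data sets but not in others gets a high PCoR.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def expression_parameters(datasets_a, datasets_b):
    """Return (mean expression similarity, PCoR) for one paralog pair.

    datasets_a, datasets_b: one expression vector per data set for each gene
    (assumed layout: 40 data sets -> 40 coefficients).
    PCoR uses the population standard deviation (an assumption).
    """
    corrs = [pearson(a, b) for a, b in zip(datasets_a, datasets_b)]
    return statistics.fmean(corrs), statistics.pstdev(corrs)
```

A pair correlated perfectly in one data set and anti-correlated in another has mean similarity 0 and PCoR 1; a pair correlated identically in every data set has PCoR 0.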

Summary of observations

                    Co-expressed    Expressed differently
Remote paralogs          -                  +
Close paralogs           +                  -

(+ : backup enabled)

Close paralogs

• Backup increases with co-expression.

• Similar sequences:
 – Similar expression
 – Enable backup


Remote paralogs


• Backup is optimal in pairs that are not co-expressed.

• Co-expressed pairs (little backup) may reflect:
 – interaction
 – subfunctionalization

Suggestion for backup mechanism

• A, B: genes that are expressed differently.

• Upon mutation in A, the expression of gene B is reprogrammed.

• Result: B acquires the wild-type expression profile of A.

Experimental verification: reprogramming in Acs1/Acs2

[Figure: Acs1 and Acs2 expression in glucose, wild type vs. mutant]

What is the mechanism enabling this change?

• Suggestion: backup occurs among paralogs with partial co-regulation.

• This enables switching from different expression profiles to similar ones.

• Observation: PCoR predicts backup.

Partial motif content overlap is optimal for backup

$O = \dfrac{|m_1 \cap m_2|}{|m_1 \cup m_2|}$, where $m_1$ and $m_2$ are the motif sets of the two paralogs.

[Figure: proportion of dispensable genes (backup measure) vs. motif content overlap (O)]
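The overlap O is the Jaccard index of the two motif sets. A minimal Python version, assuming motifs are given as hashable labels (the labels in the example are hypothetical); returning 0.0 when neither gene has any motif is our convention, not the paper's.

```python
def motif_overlap(m1, m2):
    """Motif content overlap O = |m1 ∩ m2| / |m1 ∪ m2| (Jaccard index)."""
    m1, m2 = set(m1), set(m2)
    union = m1 | m2
    # Convention (assumption): two genes with no motifs at all have O = 0.
    return len(m1 & m2) / len(union) if union else 0.0
```

For hypothetical motif sets `{"a", "b", "c"}` and `{"b", "c", "d"}`, two of four distinct motifs are shared, so O = 0.5, a "partial" overlap in the sense of the slide.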

Suggestion

• Unique motifs -> different expression levels.

• Shared motifs -> ability to respond to the same conditions.

Hypothesis: PCoR underlies reprogramming and backup.

In high-PCoR paralogs, one gene is upregulated when the other is deleted

[Figure: fold change vs. partial co-regulation (predicted backup capacity), binned as <0.35, 0.35-0.45 and >0.45 (Hughes et al. Cell 2000)]

What controls reprogramming?

• Kinetic model:

[Figure: kinetic model diagram with nodes T, E1, E2, G1, G2, M1, M2]

G1, G2: paralog genes.

E1, E2: their products.

T: a TF that is generated by M1 and has a binding site in both genes.
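A toy simulation shows how a circuit of this shape could produce reprogramming. Everything beyond the slide's definitions is a hypothetical assumption: M1 is supplied at a constant rate and consumed by E1 and E2, T is produced in proportion to M1, T activates G1 and G2 equally, and all rate constants are arbitrary. Under these assumptions, deleting G1 lets M1 accumulate, which raises T and therefore G2 expression.

```python
def steady_e2(g1_present=True, h=0.01, steps=20_000):
    """Euler-integrate a toy G1/G2 backup circuit and return the final E2 level.

    Hypothetical model (not from the paper; all constants arbitrary):
      dM1/dt = supply - k_cat*(E1 + E2)*M1 - m1_decay*M1   (E1, E2 consume M1)
      dT/dt  = M1 - T                                     (T is generated by M1)
      dE1/dt = (basal + T) - E1, or -E1 if G1 is deleted  (T activates G1)
      dE2/dt = (basal + T) - E2                           (T activates G2)
    """
    supply, k_cat, m1_decay, basal = 1.0, 1.0, 0.1, 0.1
    m1 = t = e1 = e2 = 0.0
    for _ in range(steps):
        d_m1 = supply - k_cat * (e1 + e2) * m1 - m1_decay * m1
        d_t = m1 - t
        d_e1 = (basal + t if g1_present else 0.0) - e1
        d_e2 = (basal + t) - e2
        m1 += h * d_m1
        t += h * d_t
        e1 += h * d_e1
        e2 += h * d_e2
    return e2


wild_type = steady_e2(g1_present=True)
g1_deleted = steady_e2(g1_present=False)
print(f"E2 wild type: {wild_type:.3f}, G1 deleted: {g1_deleted:.3f}")
```

With these arbitrary parameters, E2 settles near 0.74 in the wild type and near 1.0 after deleting G1: the remaining paralog is upregulated, qualitatively matching the backup behavior described above.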

Conclusions

• In remote paralogs: genes that are expressed differently but share partial common regulation tend to back each other up.

• In close paralogs: backup increases with co-expression.

Third article

Gene regulatory network growth by duplication

Teichmann SA, Babu MM.

Nat Genet. May 2004.

Main questions

• What is the role of gene duplication in regulatory network evolution?

• Determine the extent to which duplicated genes inherit interactions from their ancestors.

• Describe possible mechanisms that lead to the formation of new interactions.

Basic unit of gene regulation

• Transcription factor
• DNA binding site
• Target gene (or transcription unit)

[Figure: transcription factor -> target gene]

Complex network:
• One gene is regulated by a few transcription factors.
• One transcription factor controls more than one gene.

Research subjects

Known regulatory networks of E. coli and yeast: >100 transcription factors regulate several hundred genes.

Gene regulatory network in yeast

477 proteins (109 TFs + 368 TGs), 901 interactions

Gene regulatory network in E. coli

795 proteins (121 TFs + 674 TGs), 1423 interactions

Duplication (reminder)

• After a duplication event, a duplicate may:
 – inherit a regulatory interaction
 – lose a regulatory interaction

• Also, a new interaction may arise.

Homology detection

• Based on structural protein homology.

• Detects more distant relationships than sequence comparison.

• >65% of the genes are the result of gene duplication.

• Same domain architecture -> common ancestor.
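The "same domain architecture -> common ancestor" rule can be sketched as grouping proteins by their ordered list of structural domains. The input format is an assumption (for example, structural superfamily labels along each sequence), and the domain labels in the example are hypothetical; this is an illustration of the rule, not the authors' pipeline.

```python
from collections import defaultdict


def group_by_architecture(domain_assignments):
    """Group proteins that share the same domain architecture (toy sketch).

    domain_assignments: protein ID -> ordered list of structural domain labels
    along the sequence (hypothetical input format).
    Proteins with an identical ordered architecture are treated as descending
    from a common ancestor; singleton groups are dropped.
    """
    groups = defaultdict(list)
    for protein, domains in domain_assignments.items():
        groups[tuple(domains)].append(protein)
    return {arch: sorted(ps) for arch, ps in groups.items() if len(ps) > 1}
```

Order matters here: a protein with the same domains in a different order is placed in a separate group, which is the stricter reading of "same architecture".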

Duplication of a transcription factor

[Figure: after duplication of the TF, interactions with the target gene are either inherited or lost and gained]

Duplication of transcription factor (TF)

• At first, the new TF regulates the same target gene.

• Divergence:
 – Regulates the same gene but responds to a different signal.
 – Recognizes a new binding site.

• More than 2/3 of the TFs in E. coli and yeast have at least one interaction in common with their duplicates (128 interactions in E. coli (10%); 188 interactions in yeast (22%)).
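Counting interactions shared between duplicated TFs amounts to finding targets regulated by two members of the same TF family. In this sketch the family assignment is a hypothetical input (for example, derived from structural homology), the gene names are placeholders, and counting individual (TF, target) edges is our assumption about how the percentages are computed.

```python
from collections import defaultdict
from itertools import combinations


def shared_tf_interactions(interactions, family):
    """Interactions shared by duplicated TFs (toy sketch).

    interactions: iterable of (tf, target) pairs.
    family: tf -> homology family ID (hypothetical input); TFs in the same
    family are treated as duplicates of each other.
    Returns the set of (tf, target) interactions whose target is also
    regulated by a paralogous TF.
    """
    regulators = defaultdict(set)
    for tf, target in interactions:
        regulators[target].add(tf)
    shared = set()
    for target, tfs in regulators.items():
        for a, b in combinations(sorted(tfs), 2):
            if family.get(a) is not None and family.get(a) == family.get(b):
                shared.update({(a, target), (b, target)})
    return shared
```

Dividing `len(shared)` by the total number of interactions gives a fraction comparable in spirit to the 10% (E. coli) and 22% (yeast) figures above.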

Example: Duplication of TF in yeast

• Both homologs are involved in drug response.

• They respond to different signals.

[Figure: Pdr1 and Pdr3 both regulate Flr1]

Duplication of a target gene and its upstream region

[Figure: after duplication of the target gene, its regulation by the transcription factor is either inherited or lost and gained]

Duplication of target gene (TG) and its upstream region

• First, both genes are regulated by the same TF.

• Divergence:
 – The coding sequence changes, but the gene stays under the same TF's control.
 – The upstream region changes as well, so the gene is recognized by a different TF.

• 272 interactions in E. coli (22%); 166 interactions in yeast (20%).

Example: Duplication of TG in E. coli

The BioA and BioBFCD operons are regulated by the TF BirA.

BioA and BioF are homologous enzymes in the biotin biosynthesis pathway.

[Figure: BirA regulates BioA and BioF]

Duplication of transcription factor (TF) and its target gene (TG) around the same time

[Figure: duplication of TF + TG, followed by gain of interactions]

Duplication of transcription factor (TF) and its target gene (TG) around the same time

• This can happen if both genes were adjacent on the chromosome.

• The new TF regulates only the new TG, while the old TF regulates the old TG.

• Divergence of the TF or the TG can result in additional interactions.

• 74 interactions in E. coli (6%); 31 interactions in yeast (4%).

Example: Duplication of both TF and its TG in E. coli

[Figure: AraC regulates AraBAD; RhaR regulates RhaBAD]

Some more numbers

• Duplication and inheritance (share of interactions):

           E. coli    Yeast
TF:        10%        22%
TG:        22%        20%
Both:       6%         4%

Are duplication patterns linked to the topology of the networks?

• Gene regulatory networks in E. coli and yeast: the number of TGs per TF obeys a power law.

• Do TFs with many TGs have many homologous genes among their targets? No.
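Counting targets per TF and checking for a power law can be sketched in a few lines. The interaction-list format is an assumption, and the least-squares fit on log-log axes is a crude estimate chosen for brevity; maximum-likelihood estimators are generally preferred for real degree data.

```python
import math
from collections import Counter


def targets_per_tf(interactions):
    """Number of distinct target genes regulated by each TF (its out-degree)."""
    return Counter(tf for tf, _ in set(interactions))


def degree_distribution(interactions):
    """P(k): fraction of TFs that regulate exactly k target genes."""
    degrees = targets_per_tf(interactions)
    counts = Counter(degrees.values())
    n_tfs = len(degrees)
    return {k: c / n_tfs for k, c in sorted(counts.items())}


def loglog_slope(dist):
    """Least-squares slope of log P(k) vs. log k: a crude power-law exponent."""
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A toy network with 16 TFs of one target, 4 TFs of two targets and 1 TF of four targets gives P(k) proportional to k^-2, and `loglog_slope` recovers an exponent of -2.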

Conclusions

• In both E. coli and yeast, ~90% of the interactions evolved by duplication:
 – Half of them: duplication + inheritance of an interaction
 – The other half: duplication + gain of new interactions

The End

Of the semester…
