Adaptive Evolution on Genes and Genomes

Hernán J. Dopazo
Evolutionary Genomics Unit, Bioinformatics & Genomics Department
Centro de Investigación Príncipe Felipe, Valencia, Spain
http://hdopazo.bioinfo.cipf.es/

Genomes & Systems, Universitat Pompeu Fabra, Barcelona, Spain
Sunday, 14 November 2010



Natural Selection

The mechanisms by which the relative frequencies of genotypes change according to their relative fitnesses in the population.

Positive Selection

Purifying Selection

Natural Selection

Natural selection is a process of pervasive importance in the biological world, which includes our own species, and on which that species is utterly dependent. Progress in evolutionary biology and its applications is perhaps most obviously relevant to medical and environmental issues, but there is no aspect of human life for which an understanding of evolution is not a vital necessity.

George C. Williams, Plan and Purpose in Nature, 1996

Positive Selection

The mechanism whereby newly produced mutants have higher fitnesses than the average in the population, and the frequencies of the mutants increase in the following generations.

The biological consequence of such a mechanism is adaptation.

Purifying Selection

The mechanism whereby newly produced mutants have lower fitnesses than the average in the population, and the frequencies of the mutants decrease in the following generations.

The biological consequence of such a mechanism is the maintenance of adaptation in the population.

Neutral Evolution

The great majority of evolutionary changes at the molecular level are caused by random drift of selectively neutral or nearly neutral mutations.

The relevant consequence of such a mechanism is the fixation of mutations at a constant rate within the population.

Measuring natural selection at molecular level

Selective pressure at the protein level can be measured as: $\omega = d_N/d_S$

where $d_N$ is the number of nonsynonymous substitutions per nonsynonymous site and $d_S$ is the number of synonymous substitutions per synonymous site between two protein-coding sequences.

If non-synonymous mutations are favoured by positive selection, non-synonymous mutations will be fixed at a faster rate than synonymous mutations, then $\omega > 1$.

If non-synonymous mutations are deleterious (purifying selection), synonymous mutations will be fixed at a faster rate, then $\omega < 1$.

If selection has no effect on fitness (neutral evolution), synonymous and non-synonymous mutations will be fixed at equal rates, then $\omega = 1$.
Sequence pairwise comparison: Counting methods

Suppose a gene has 300 codons and we observe 3 synonymous and 3 nonsynonymous differences between two sequences.

Is $\omega = 1$?

[Figure: alignment of the H and Ch sequences over codons 1, 2, ..., 299, 300, with three synonymous (S) and three nonsynonymous (N) differences marked.]

If mutations from any one nucleotide to any other occur at the same rate, we expect 25.5% of mutations to be synonymous and 74.5% to be nonsynonymous (a default consequence of the genetic code table).

We do not expect synonymous and nonsynonymous mutations in equal proportions even if there is no selection at the protein level.
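The 25.5% / 74.5% split can be checked by enumerating every single-nucleotide change in the standard genetic code. The sketch below is only an illustration: it assumes the standard code, equal rates for all nucleotide changes, and it excludes changes that create a stop codon from the denominator (that exclusion is an assumption made here to reproduce the quoted figures).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_single_mutations():
    """Classify every single-nucleotide change out of the 61 sense codons."""
    syn = nonsyn = stop = 0
    for codon, aa in CODE.items():
        if aa == "*":          # start only from sense codons
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == "*":
                    stop += 1   # nonsense change, left out of the proportions
                elif CODE[mutant] == aa:
                    syn += 1
                else:
                    nonsyn += 1
    return syn, nonsyn, stop


syn, nonsyn, stop = count_single_mutations()
frac_syn = syn / (syn + nonsyn)
print(syn, nonsyn, stop, round(frac_syn, 3))
```

Under these assumptions roughly a quarter of the possible changes are synonymous, which is why equal counts of S and N differences do not imply $\omega = 1$.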

The total numbers of synonymous (S) and nonsynonymous (N) sites are probably close to:

$$S = 3 \times 300 \times 0.255 = 229.5; \quad N = 3 \times 300 \times 0.745 = 670.5$$

Therefore,

$$d_S = \frac{3}{229.5} \approx 0.0131; \quad d_N = \frac{3}{670.5} \approx 0.0045; \quad \omega = \frac{d_N}{d_S} \approx 0.34$$
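The same arithmetic as a minimal sketch. The function name and its default synonymous-site fraction of 0.255 are illustrative choices, not part of any published tool, and no correction for multiple substitutions is applied.

```python
def pairwise_omega(n_codons, syn_diffs, nonsyn_diffs, frac_syn_sites=0.255):
    """Crude dN/dS from raw difference counts (no multiple-hit correction)."""
    total_sites = 3 * n_codons
    S = total_sites * frac_syn_sites          # synonymous sites
    N = total_sites * (1.0 - frac_syn_sites)  # nonsynonymous sites
    dS = syn_diffs / S
    dN = nonsyn_diffs / N
    return S, N, dS, dN, dN / dS


S, N, dS, dN, omega = pairwise_omega(300, 3, 3)
print(f"S={S:.1f} N={N:.1f} dS={dS:.4f} dN={dN:.4f} omega={omega:.3f}")
```

With equal numbers of differences, the ratio reduces to S/N, so it is well below 1 for this example even though the raw counts are identical.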

Counting methods (3 steps)

1. Count the total number of N and S nucleotide sites

Complicated by factors such as ts/tv bias and base/codon frequency bias!!!

H   CCA CCG AAA ACC … TCA GTA CGG AAT
Ch  GCA CCG AAA AGC … ACA GTA GGG AAC
    1   2   3   4   … 697 698 699 700

$$S = 3 \times 700 \times 0.366 = 768.6; \quad N = 3 \times 700 \times 0.634 = 1331.4$$

2. Count the synonymous (S) and nonsynonymous (N) differences

H   CCA CCG AAA ACC … TCA GTA CGG AAT
Ch  GCA CCG AAA AGC … ACA GTA GGG CAC
    1   2   3   4   … 697 698 699 700

[Figure: each differing codon in the alignment is labelled N or S.]

This is straightforward if the two compared codons differ at one codon position only. When they differ at 2 or 3 codon positions, there exist 2 or 6 pathways from one codon to the other. The multiple pathways may involve different numbers of synonymous and nonsynonymous changes and should ideally be weighted appropriately according to their likelihood of occurrence. Most counting methods use equal weighting.

3. Apply a correction for multiple substitutions at the same site
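Step 2 can be sketched by enumerating the pathways between two codons explicitly. The function below is a hypothetical helper, not the code of any published method: it assumes the standard genetic code, weights all pathways equally, and drops pathways that pass through a stop codon.

```python
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def pathway_counts(c1, c2):
    """Mean synonymous and nonsynonymous changes over all pathways c1 -> c2.

    Returns (mean_syn, mean_nonsyn, number_of_pathways_used).
    """
    diff_positions = [i for i in range(3) if c1[i] != c2[i]]
    tallies = []
    for order in permutations(diff_positions):   # 1, 2 or 6 orderings
        current, s, n, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":                 # pathway through a stop codon
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        if valid:
            tallies.append((s, n))
    k = len(tallies)
    if k == 0:
        return 0.0, 0.0, 0
    return sum(s for s, _ in tallies) / k, sum(n for _, n in tallies) / k, k


print(pathway_counts("CCA", "GCA"))  # codon 1 of the alignment: one position differs
print(pathway_counts("AAT", "CAC"))  # codon 700 of the alignment: two positions differ
```

For AAT versus CAC both orderings give one synonymous and one nonsynonymous step, so equal weighting is harmless there; for other codon pairs the pathways can disagree, which is where the weighting choice matters.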

Counting methods:

The method of Miyata-Yasunaga (1980, J Mol Evol, 16(1):23) and its simplified version (Nei-Gojobori, 1986, Mol Biol Evol, 3(5):418) are based on the nucleotide substitution model of Jukes and Cantor (1969) and ignore the ts/tv bias and base/codon frequency bias.

Since ts are more likely to be synonymous than tv at the 3rd position, ignoring the ts/tv rate bias underestimates the number of S sites and overestimates the number of N sites. This effect is well known, and different methods account for this ratio (Li, et al. 1985, Mol Biol Evol, 2(2):150; Li, 1993, J Mol Evol, 36(1):96; Pamilo and Bianchi 1993, Mol Biol Evol, 10(2):271; Ina, 1995, J Mol Evol, 40(2):190).

Biased base/codon frequencies can have devastating effects on the estimation of dN and dS. Qualitatively different conclusions were reached depending on whether codon usage bias is accommodated for nuclear genes from mammals and Drosophila.

A counting method incorporating both the ts/tv bias and the base/codon frequency bias was implemented by Yang and Nielsen, 2000, Mol Biol Evol, 17(1):32.

Many, if not all of them, are incorporated in the codeml (PAML) program.

Codon Model

Markov model of codon substitution (61 states or sense codons) (Goldman & Yang, 1994, Mol Biol Evol, 11, Pp 725; Muse & Gaut, 1994, Mol Biol Evol, 11, Pp 715)

The instantaneous substitution rate from codon i to codon j:

$$
q_{ij} =
\begin{cases}
0, & \text{if } i \text{ and } j \text{ differ at 2 or 3 codon positions,} \\
\mu\,\pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transversion,} \\
\mu\,\kappa\,\pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transition,} \\
\mu\,\omega\,\pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transversion,} \\
\mu\,\kappa\,\omega\,\pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transition.}
\end{cases}
$$

$\pi_j$: frequency of codon j
$\kappa$: ts/tv rate ratio
$\omega$: dN/dS rate ratio

[Figure: transitions (ts) occur within purines (Pu: A, G) or within pyrimidines (Py: C, T); transversions (tv) occur between a purine and a pyrimidine.]

Codon instantaneous rate matrix:

$$
Q = \{q_{ij}\} =
\begin{pmatrix}
q_{1,1} & q_{1,2} & \cdots & q_{1,60} & q_{1,61} \\
q_{2,1} & q_{2,2} & \cdots & q_{2,60} & q_{2,61} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
q_{60,1} & q_{60,2} & \cdots & q_{60,60} & q_{60,61} \\
q_{61,1} & q_{61,2} & \cdots & q_{61,60} & q_{61,61}
\end{pmatrix}
$$

The model accounts for ts/tv bias, unequal synonymous and nonsynonymous substitution rates, and biased base/codon frequencies.
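A minimal sketch of how such a rate matrix could be assembled in plain Python. The names `build_Q` and `rate` are hypothetical, equal codon frequencies (1/61) are assumed by default, and the diagonal is set to minus the row sum, the usual convention for rate matrices (not stated on the slide). This is not codeml's implementation.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]   # the 61 sense codons
PURINES = {"A", "G"}


def rate(i, j, kappa, omega, pi_j, mu=1.0):
    """Instantaneous rate q_ij for codons i != j under the model above."""
    diffs = [(a, b) for a, b in zip(i, j) if a != b]
    if len(diffs) != 1:                       # 0, 2 or 3 differing positions
        return 0.0
    a, b = diffs[0]
    q = mu * pi_j
    if (a in PURINES) == (b in PURINES):      # transition: Pu<->Pu or Py<->Py
        q *= kappa
    if CODE[i] != CODE[j]:                    # nonsynonymous change
        q *= omega
    return q


def build_Q(kappa, omega, pi=None):
    """61 x 61 rate matrix as a list of rows; pi maps codon -> frequency."""
    if pi is None:
        pi = {c: 1.0 / len(SENSE) for c in SENSE}   # assumption: equal frequencies
    Q = [[rate(i, j, kappa, omega, pi[j]) for j in SENSE] for i in SENSE]
    for r, row in enumerate(Q):
        row[r] = -sum(row)                    # rows of a rate matrix sum to zero
    return Q


idx = {c: n for n, c in enumerate(SENSE)}
Q = build_Q(kappa=2.0, omega=0.5)
print(Q[idx["TTG"]][idx["CTG"]], Q[idx["GTG"]][idx["CTG"]], Q[idx["TTT"]][idx["CTG"]])
```

The printed entries are a synonymous transition, a nonsynonymous transversion and a two-position change into CTG, so they should come out as κπ, ωπ and 0 respectively (with μ = 1).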

Codon Model

For example, consider the substitution rates to codon CTG (Leu). We have:

$q_{\mathrm{CTC},\mathrm{CTG}} = \mu\,\pi_{\mathrm{CTG}}$, since the CTC (Leu) → CTG (Leu) change is a synonymous transversion,

$q_{\mathrm{TTG},\mathrm{CTG}} = \mu\,\kappa\,\pi_{\mathrm{CTG}}$, since the TTG (Leu) → CTG (Leu) change is a synonymous transition,

$q_{\mathrm{GTG},\mathrm{CTG}} = \mu\,\omega\,\pi_{\mathrm{CTG}}$, since the GTG (Val) → CTG (Leu) change is a nonsynonymous transversion,

$q_{\mathrm{CCG},\mathrm{CTG}} = \mu\,\kappa\,\omega\,\pi_{\mathrm{CTG}}$, since the CCG (Pro) → CTG (Leu) change is a nonsynonymous transition,

$q_{\mathrm{TTT},\mathrm{CTG}} = 0$, since TTT and CTG differ at more than one position.

Codon Model

Molecular sequence data do not allow separate estimation of rate (µ) and time (t), and only their product (µt) can be identified.

We thus fix the rate µ such that the expected number of nucleotide substitutions per codon is one.

This scaling means that time (t) is measured by genetic distance: the expected number of (nucleotide) substitutions per codon.

The transition probability matrix over time t is:

$$P(t) = \{p_{ij}(t)\} = e^{Qt}$$

[Figure: substitution from codon CCC to codon CTC over time t, with probability $p_{\mathrm{CCC},\mathrm{CTC}}(t)$.]

Lastly, the model is time-reversible. This means:

$$\pi_i\, p_{ij}(t) = \pi_j\, p_{ji}(t)$$
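The scaling and the matrix exponential can be sketched in pure Python. Everything here is illustrative: equal codon frequencies are assumed, `build_scaled_Q` and `expm` are hypothetical names, and the exponential uses a basic scaling-and-squaring Taylor series rather than the eigendecomposition a production program would normally use.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]
PI = 1.0 / len(SENSE)            # assumption: equal codon frequencies


def build_scaled_Q(kappa, omega):
    """Rate matrix scaled so that -sum_i pi_i q_ii = 1 substitution per codon."""
    n = len(SENSE)
    Q = [[0.0] * n for _ in range(n)]
    for a, i in enumerate(SENSE):
        for b, j in enumerate(SENSE):
            diffs = [(x, y) for x, y in zip(i, j) if x != y]
            if len(diffs) != 1:
                continue
            x, y = diffs[0]
            q = PI
            if {x, y} in ({"A", "G"}, {"C", "T"}):   # transition
                q *= kappa
            if CODE[i] != CODE[j]:                   # nonsynonymous
                q *= omega
            Q[a][b] = q
        Q[a][a] = -sum(Q[a])
    total_rate = -sum(PI * Q[a][a] for a in range(n))
    return [[q / total_rate for q in row] for row in Q]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def expm(Q, t, squarings=6, terms=8):
    """P(t) = exp(Qt) by a truncated Taylor series plus repeated squaring."""
    n = len(Q)
    M = [[q * t / 2 ** squarings for q in row] for row in Q]
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in P]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, M)]
        P = [[p + x for p, x in zip(prow, trow)] for prow, trow in zip(P, term)]
    for _ in range(squarings):
        P = matmul(P, P)
    return P


Q = build_scaled_Q(kappa=2.0, omega=0.5)
P = expm(Q, t=0.5)
i, j = SENSE.index("CCC"), SENSE.index("CTC")
print(P[i][j], P[j][i])   # equal under equal frequencies (time reversibility)
```

With equal frequencies reversibility reduces to a symmetric P(t); with unequal frequencies the check would be `pi[i] * P[i][j] == pi[j] * P[j][i]` up to rounding.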

Pairwise Comparison: Maximum Likelihood (ML) estimation of $\omega$

     h=1  h=2  …  h=n
H    CCC  CCG  …  ACC
Ch   CTC  CCG  …  AGC

n codon sites; the data at site h are $X_h = (x_1, x_2)$.

The probability of site h (here $x_1$ = CCC and $x_2$ = CTC):

$$f(x_h) = \sum_{k=1}^{61} \pi_k\, p_{k,\mathrm{CCC}}(t_0)\, p_{k,\mathrm{CTC}}(t_1) = \pi_{\mathrm{CCC}}\, p_{\mathrm{CCC},\mathrm{CTC}}(t)$$

[Figure: two-sequence tree. The ancestral codon k leads to $x_1$ along a branch of length $t_0$ and to $x_2$ along a branch of length $t_1$; equivalently, $x_1$ and $x_2$ are joined by a single branch of length t.]

k = ancestral codon, unknown!!

$t = t_0 + t_1$

Pairwise Comparison: Maximum Likelihood (ML) estimation of $\omega$

$$f(x_h) = \pi_{x_1}\, p_{x_1 x_2}(t)$$

Parameters in the model: the sequence divergence ($t$), the transition/transversion rate ratio ($\kappa$), the nonsynonymous/synonymous rate ratio ($\omega$), and the codon frequencies ($\pi_j$), which are estimated from the observed data (base frequencies, F3x4, or codon frequencies, F61).

The log-likelihood function for these sequences is then given by:

$$\ell(t, \kappa, \omega) = \sum_{h=1}^{n} \log f(x_h)$$
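The step from the sum over the unknown ancestral codon k to a single transition probability is worth spelling out. It uses only the time-reversibility property stated for the codon model, $\pi_k\,p_{k x_1}(t_0) = \pi_{x_1}\,p_{x_1 k}(t_0)$, followed by the Chapman-Kolmogorov relation for Markov chains:

```latex
\begin{aligned}
f(x_h) &= \sum_{k=1}^{61} \pi_k \, p_{k x_1}(t_0) \, p_{k x_2}(t_1) \\
       &= \sum_{k=1}^{61} \pi_{x_1} \, p_{x_1 k}(t_0) \, p_{k x_2}(t_1)
          && \text{(reversibility)} \\
       &= \pi_{x_1} \sum_{k=1}^{61} p_{x_1 k}(t_0) \, p_{k x_2}(t_1) \\
       &= \pi_{x_1} \, p_{x_1 x_2}(t_0 + t_1)
          && \text{(Chapman-Kolmogorov)}
\end{aligned}
```

Only the sum $t = t_0 + t_1$ enters the result, so the position of the ancestor along the path between the two sequences cannot be estimated from a pairwise comparison.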

Maximum Likelihood (ML) estimation of $\omega$

• Since there is no analytic solution, a numerical optimization algorithm is used to maximize $\ell$:

$$\ell(t, \kappa, \omega) = \sum_{h=1}^{n} \log f(x_h)$$

Comparing pairwise methods

[Figure: comparison of pairwise estimates from the PAML programs codeml and yn00, with respect to codon bias and transition bias.]

Phylogenetic (ML) estimation of $\omega$

Pairwise methods have low power to detect positive selection: they average over all sites and over all the evolutionary distance separating the sequences.

Only 17 out of 3600 genes (Endo et al. 1996, Mol Biol Evol, 13, Pp. 685)

Power is improved if selective pressure is allowed to vary over sites or branches.

Increasing the complexity of the codon model in this way requires that the likelihood be calculated for multiple sequences on a phylogeny.

Likelihood calculation for multiple sequences on a phylogeny

Given an unrooted tree with N = 4 species, the data at codon site h are $X_h = (x_1, x_2, x_3, x_4)$.

[Figure: unrooted four-species tree. Tips $x_1$ and $x_2$ join ancestral node k through branches $t_1$ and $t_2$; tips $x_3$ and $x_4$ join ancestral node g through branches $t_3$ and $t_4$; the internal branch $t_0$ connects k and g. There are N − 2 ancestral nodes.]

The probability of observing the data at codon h is:

$$f(x_h) = \sum_{k} \sum_{g} \left[ \pi_k\, p_{k x_1}(t_1)\, p_{k x_2}(t_2)\, p_{k g}(t_0)\, p_{g x_3}(t_3)\, p_{g x_4}(t_4) \right]$$

The probability of the data at each site is a sum over the $61^{(N-2)}$ possible combinations of ancestral codons.

The log-likelihood function is the sum over all n codon sites, where $t_{(2N-3)}$ denotes the $2N-3$ branch lengths:

$$\ell(t_{(2N-3)}, \kappa, \omega) = \sum_{h=1}^{n} \log f(x_h)$$

We assumed a single value of $\omega$ for all the sequences in the tree.

Modelling variable selective pressure among lineages

Yang, Z. 1998. Mol. Biol. Evol. 15:568.

Adaptive evolution is most likely to occur in an episodic fashion.

[Figure: four-species tree with data $X_h = (x_1, x_2, x_3, x_4)$ and ancestral nodes k and g. Branches $t_1$, $t_2$ and $t_0$ carry ratio $\omega_0$; branches $t_3$ and $t_4$ carry ratio $\omega_1$.]

The transition probabilities for the 2 sets of branches are calculated from different rate matrices (Q) generated by using different $\omega$ ratios.

The most general model, called the "free-ratio model", specifies an independent $\omega$ for each branch of the tree.

The probability of observing the data at codon h:

$$f(x_h) = \sum_{k} \sum_{g} \pi_k\, p_{k x_1}(t_1; \omega_0)\, p_{k x_2}(t_2; \omega_0)\, p_{k g}(t_0; \omega_0)\, p_{g x_3}(t_3; \omega_1)\, p_{g x_4}(t_4; \omega_1)$$

The log-likelihood function is the sum over all n codon sites:

$$\ell(t_{(2N-3)}, \kappa, \omega_0, \omega_1) = \sum_{h=1}^{n} \log f(x_h)$$

2 $\omega$ values are estimated for different lineages in the tree, averaging over all sites!!


Modelling variable selective pressure among sites

Yang, Z. and R. Nielsen, 2002. Mol. Biol. Evol. 19: 908-917.

The most used approach is to use a statistical distribution to model the random variation in $\omega$ over sites (i.e. beta, gamma distribution, etc.).

Codeml has 13 alternative models available. Continuous distributions are approximated by discrete categories: K classes, with $\omega_i$ different ratios and $p_i$ proportions.

If K = 2, i = (0, 1), with proportions $p_0$ and $p_1$.

[Figure: distribution of $\omega$ over sites with two classes, labelled $\omega_0 = 0$ and $0 < \omega_1 < 1$, and the four-species tree with tips $x_1$ to $x_4$, ancestral nodes k and g, and branches $t_0$ to $t_4$.]

The probability of the data at site h, given that the site belongs to class i:

$$f(x_h \mid \omega_i) = \sum_{k} \sum_{g} \pi_k\, p_{k x_1}(t_1; \omega_i)\, p_{k x_2}(t_2; \omega_i)\, p_{k g}(t_0; \omega_i)\, p_{g x_3}(t_3; \omega_i)\, p_{g x_4}(t_4; \omega_i)$$

Since we do not know to which class each site h belongs, we sum over both classes:

$$f(x_h) = \sum_{i=0}^{1} p_i\, f(x_h \mid \omega_i)$$

$$\ell = \sum_{h=1}^{n} \log f(x_h)$$
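The mixture over site classes can be sketched as follows. The per-class site likelihoods $f(x_h \mid \omega_i)$ are taken as given inputs here; the numbers in the example are made-up placeholders, not results from any data set, and `site_mixture_loglik` is a hypothetical name.

```python
import math


def site_mixture_loglik(site_liks, proportions):
    """Log-likelihood of a site-class mixture.

    site_liks[h][i] is f(x_h | omega_i); proportions[i] is p_i.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("class proportions must sum to 1")
    total = 0.0
    for per_class in site_liks:
        # f(x_h) = sum_i p_i * f(x_h | omega_i)
        f_h = sum(p * f for p, f in zip(proportions, per_class))
        total += math.log(f_h)
    return total


# Hypothetical per-class likelihoods for three sites and K = 2 classes.
example_liks = [[0.02, 0.001], [0.0005, 0.004], [0.01, 0.01]]
example_props = [0.7, 0.3]
print(site_mixture_loglik(example_liks, example_props))
```

In an actual analysis the proportions and the ω values are free parameters, and this function would sit inside the numerical optimizer that maximizes the log-likelihood.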

Models of variable $\omega$ ratio among sites

Bayes Empirical Bayes (BEB)

Yang Z., W. Wong & R. Nielsen, 2005, Mol Biol Evol 22, Pp 1107

After ML estimation of the model's parameters, an empirical Bayes approach is used to infer the class to which each site most likely belongs.

The posterior probability that site h with data $x_h$ is from site class k is:

$$\mathrm{prob}(\omega_k \mid x_h) = \frac{p_k\, f(x_h \mid \omega_k)}{f(x_h)}$$

Bayes empirical Bayes results

[Figure: sites classified as evolving under neutral evolution, positive selection or purifying selection.]

Branch-sites models

Yang, Z. and R. Nielsen, 2002. Mol Biol Evol 19, Pp. 908

Useful to test for adaptive processes that occurred at a subset of sites for a limited period of time.
Useful to test for adaptation after gene duplication.

k = 3, site classes (0, 1, 2), with ratios given for foreground and background branches:

Positive selection: $\omega_2 > 1$ on the foreground branch (proportions $p_{2a}$, $p_{2b}$)
Neutrality: $\omega_1 = 1$ on foreground and background (proportion $p_1$)
Purifying selection: $0 < \omega_0 < 1$ on foreground and background (proportion $p_0$)

Test I: M1a (neutral) vs A1 (Positive selection) models

Likelihood ratio test … nested models comparison

$$2\Delta\ell = 2\,(\ell_{A1} - \ell_{M1a}) > \chi^2_{\alpha = 0.05,\ df = 2}$$

Accept the alternative hypothesis of positive selection!!!
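A likelihood ratio test with two degrees of freedom needs no statistics library, because the chi-square distribution with df = 2 has the closed-form tail probability exp(−x/2). The sketch below relies on that identity; the log-likelihood values are hypothetical, and `lrt_df2` is an illustrative name.

```python
import math


def lrt_df2(lnl_null, lnl_alt, alpha=0.05):
    """Likelihood ratio test for nested models differing by 2 parameters."""
    stat = 2.0 * (lnl_alt - lnl_null)
    critical = -2.0 * math.log(alpha)        # chi-square quantile for df = 2
    p_value = math.exp(-stat / 2.0) if stat > 0 else 1.0
    return stat, critical, p_value, stat > critical


# Hypothetical log-likelihoods for the null and alternative models.
stat, critical, p_value, reject_null = lrt_df2(lnl_null=-2345.6, lnl_alt=-2340.1)
print(f"2*delta_lnL={stat:.2f} critical={critical:.2f} p={p_value:.4f} reject={reject_null}")
```

For other degrees of freedom the closed form no longer applies and a chi-square routine from a statistics library would be needed.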

Thanks!

Acknowledgements

Chapter 5. Bielawski J. & Z. Yang, 2005

Micostrium Vulgaris
Phylum: Chordata
Subphylum: Vertebrata
Class: Mammalia

Topics

Human Evolution
Positive selection / relaxation - PLoS Comp. Biol., 2006
Brain-specific genes - submitted, Mol Biol Evol

Human Disease
Human cSNPs functional prediction - JMB, 2006
Pupas Web server - NAR, 2006

Ancient Positive Selection on Human-Chimp Genome

Comparative Genomics Data

Main Publications

Our work… why?

• Main Questions

Which are the full set of genes and functions that evolve outside of the molecular clock hypothesis?

Which are the full set of genes and functions that were positively selected during evolution of each species, and which show evidence of relaxation?

How do these sets of genes compare amongst themselves and in between derived and ancestral lineages at a functional level?

Comparing Human-Chimp PSGs studies

Clark, et al. Science 2003 (Celera)
Number of PSGs inferred: 7,645: 1,547-H, 1,534-C
Main GO-PSGs categories (Biological process): Olfaction, Sensory perception, G-PCR
H-C differential lineage analysis: YES
Differences PS-RX: NO
Multiple testing correction: NO
ML models instead of Ka/Ks ratio: YES
GO-PSGs differences between H-C: YES

Nielsen, et al. PLoS Biol 2005 (Celera)
Number of PSGs inferred: 8,079: 733 (H-C), 35 p<0.05
Main GO-PSGs categories (Biological process): Immune response, Sensory perception, Spermatogenesis, Apoptosis, Cell cycle
H-C differential lineage analysis: NO
Differences PS-RX: NO
Multiple testing correction: YES
ML models instead of Ka/Ks ratio: YES
GO-PSGs differences between H-C: ---

Bustamante, et al. Nature 2005 (Celera)
Number of PSGs inferred: 10,767: 304-H
Main GO-PSGs categories (Biological process): Apoptosis, Gametogenesis, Immune response, Sensory perception, mRNA transcription, Transcription factor
H-C differential lineage analysis: NO
Differences PS-RX: NO
Multiple testing correction: ---
ML models instead of Ka/Ks ratio: MK
GO-PSGs differences between H-C: ---

The Chimp Seq. Analysis Consortium, Nature 2005
Number of PSGs inferred: 13,454: 585 (H-C)
Main GO-PSGs categories (Biological process): Spermatogenesis, Perception of sound, Reproduction, Olfaction, Immune response
H-C differential lineage analysis: YES
Differences PS-RX: YES
Multiple testing correction: YES
ML models instead of Ka/Ks ratio: NO
GO-PSGs differences between H-C: YES-NO

Arbiza, et al. PLoS CB 2006 (Ensembl)
Number of PSGs inferred: 13,198: 108-H, 577-C
Main GO-PSGs categories (Biological process): Transcription factor, Cell. Prot. Metab., Olfaction, Immune response, G-PCR
H-C differential lineage analysis: YES
Differences PS-RX: YES
Multiple testing correction: YES
ML models instead of Ka/Ks ratio: YES
GO-PSGs differences between H-C: NO

Positive selection, relaxation and clock
Derived and ancestral lineages

Testing PS and RX (CodeML, PAML; Zhang, Nielsen, Yang, Mol Biol Evol 2005)

Branch-site models: site classes (0, 1, 2), with ratios given for foreground and background branches:

Positive selection: $\omega_2 > 1$ on the foreground branch (proportions $p_{2a}$, $p_{2b}$)
Neutrality: $\omega_1 = 1$ on foreground and background (proportion $p_1$)
Purifying selection: $0 < \omega_0 < 1$ on foreground and background (proportion $p_0$)

Test I: M1a (neutral) vs A (Positive selection) models

Test II: A1 (neutral) vs A (Positive selection) models

Testing Clock (RRTree, Robinson-Rechavi, Bioinformatics 2000)

Human and Chimp evolve at equal rates

Gene functions outside clock behaviour are not significantly different between H and Ch

PSGs deduced from Ka/Ks > 1 vs. ML branch-site model approximations

A minor proportion of genes with Ka/Ks > 1 match events of PS.

[Figure: genes classified as Ka/Ks > 1 with PS, Ka/Ks < 1 with PS, Ka/Ks > 1 without PS, and Ka/Ks < 1 without PS.]


Genes

Functional Analysis

Relative Ancestral and Derived change

Conclusions I

Non-neutral evolution is an infrequent process shaping the pattern of divergence between human and chimp genomes: 665 Human (5%) and 1,341 Chimp (10%) genes after correcting for multiple testing.

Use of mean normalized Ka rate approaches (Ka/Ks) for concentrating cases of positive selection should be discarded in favor of more sensitive methodologies.

Functional classes encompassed by the sets of genes evolving without clock and those under the influence of positive selection in both species were found to be largely the same and in similar proportions.

However, the sets of PSGs were different in each species.

Comparisons of relative trends among derived and ancestral lineages may provide further insight on H and Ch differences. Out of 59 GO categories:

41 showed a relative increase in Human greater than in Chimp
11 showed a relative increase in Chimp greater than that in Human

This suggests that Human may have grown further apart from the ancestral lineage in common GO categories than Chimp has.


Positive selection: Nervous System

Brain genes are among the most conserved genes analysed in PS studies

Main Questions

How different is the evolution of human brain-specific genes from:

• other human T-SGs?
• and in between lineages?

Are there any effects of using alternative statistical methods?

• Rate estimation?
• Mean estimation?

286 H-TSGs, 551 HKGs, 9 Tissues

SEWM estimation using a free-branch ML model

Re-Analysis of Dorus, et al. (2004) data

SEWM estimation using a free-branch ML model

Some of the H-TSGs under PS (Test II)


Conclusions II

• Estimates of average dN/dS ratios are sensitive to the methods used for combining estimates and to the methods used for estimating dN/dS.

• Brain-specific genes show no evidence of acceleration in the primate lineage compared to many other tissue-specific gene categories.

• There is no evidence for an elevation in the dN/dS ratio in brain-specific genes in humans compared to chimpanzees, nor between primate and murid taxa.

• The number of brain-specific genes showing evidence of positive selection is higher in the chimpanzee than in the human evolutionary lineage.

• While there undoubtedly has been much positive selection relating to brain function during the evolution of modern man, this selection has not been so pervasive that it has resulted in a detectably accelerated rate of molecular evolution.
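The first conclusion above, that an average dN/dS depends on how per-gene estimates are combined, can be illustrated with a minimal sketch. The per-gene values and the function names below are hypothetical, chosen only to show the effect; this is not the SEWM procedure used in the talk.

```python
from statistics import mean, median

# Hypothetical per-gene (dN, dS) estimates; illustrative values only.
genes = [(0.002, 0.020), (0.010, 0.050), (0.004, 0.001), (0.000, 0.030)]

def mean_of_ratios(pairs):
    """Average the per-gene dN/dS ratios (dominated by genes with tiny dS)."""
    return mean(dn / ds for dn, ds in pairs)

def ratio_of_sums(pairs):
    """Pool the estimates first: sum(dN) / sum(dS)."""
    return sum(dn for dn, _ in pairs) / sum(ds for _, ds in pairs)

def median_of_ratios(pairs):
    """Median of the per-gene dN/dS ratios (robust to a single outlier)."""
    return median(dn / ds for dn, ds in pairs)

if __name__ == "__main__":
    print("mean of ratios  :", mean_of_ratios(genes))
    print("ratio of sums   :", ratio_of_sums(genes))
    print("median of ratios:", median_of_ratios(genes))
```

In this toy data set a single gene with a very small dS pushes the mean of ratios above 1, while the pooled ratio and the median stay well below 1, so the choice of combining method alone could change the apparent conclusion.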


However… (TIG, November 2006)


Human Disease & Evolution


Medicine believes in genetics, less in bioinformatics, and hardly at all in evolutionary biology!

Medicine seldom takes into account evolutionary biology's conclusions

• "Parasites should evolve towards a benign coexistence with their host…"
• It strongly depends on the mode of inheritance (Paul Ewald, 1996)

Scientists working in biomedicine rarely recognize basic evolutionary biology concepts

• PAML matrix
• Positional homology


Evolutionary Thinking in Biomedicine

Notable exceptions

• 1996. Paul M. Ewald. Evolution of Infectious Disease. "Parasites vertically inherited should evolve toward a benign coexistence with their host".

• 1996. R. Nesse and G. Williams. Why We Get Sick: The New Science of Darwinian Medicine. "Why, in a body of such exquisite design, are there a thousand flaws and frailties that make us vulnerable to disease?... They suggest new ways of addressing illness"

• 1998. Stephen C. Stearns. Evolution in Health and Disease. "... our body was shaped by natural selection to maximize reproductive success in ancestral environments".

• 2002. Steve A. Frank. Immunology and Evolution of Infectious Disease. Bridges the gap between immunology and epidemiology.


Example: SNPs and Disease

SNPs can cause alterations of gene function by…

• Alterations in expression level
• Alternative splicing
• Alteration (or loss) of gene product function

• Changes in the stability of the protein
• Functionally important residues
• Phylogenetic conservation

Natural selection working at codon level


Main Question

Could an estimator of the selective pressures acting at codon level (ω) be used as a predictor of the phenotypic effect of SNPs?


Detecting Positive & Negative Selection

Site-specific models: average dN/dS over lineages but differentiate over sites

[Figure: dN/dS at sites 1, 2, 3]

ML models for positive selection: M1a vs M2a; M7 vs M8

Bayes Empirical Bayes (BEB) under M2a and M8 (which class (h) each site is most likely to belong to)

Sitewise likelihood-ratio method (SLR)
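The model comparisons named above (M1a vs M2a, M7 vs M8) are nested likelihood-ratio tests. A minimal sketch follows, assuming the commonly used chi-square reference distribution with 2 degrees of freedom for both pairs; the log-likelihood values and the function name are hypothetical and would in practice come from the ML program's output.

```python
import math

def lrt_df2(lnl_null, lnl_alt):
    """Likelihood-ratio test for nested models that differ by 2 free parameters.

    Returns (statistic, p_value). For a chi-square distribution with
    2 degrees of freedom the survival function is exactly exp(-x / 2).
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # guard against tiny negative values
    return stat, math.exp(-stat / 2.0)

if __name__ == "__main__":
    # Hypothetical log-likelihoods for one gene (illustrative only).
    lnl_m1a, lnl_m2a = -2500.0, -2495.0
    stat, p = lrt_df2(lnl_m1a, lnl_m2a)
    print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```

When the test rejects the null model, the BEB posterior probabilities under M2a or M8 are then used to identify which sites fall in the ω > 1 class.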


Bayes Empirical Bayes (BEB)

[Figure: site classes under BEB: neutral, positive, purifying]


Evolutionary models in action

A real case with the master protein of the cell

[Figure: p53 and DNA]


TP53 protein

p53 is probably the main protein regulating cell division and apoptosis

Many mutant forms are involved in different types of human cancer


Evolutionary Models in TP53

• M1a vs M2a -> no PS

• M7 vs M8 -> no PS

• logL(M2a) > logL(M8) -> ** best

• SLR -> other alternative method
  • (multiple testing 95%, 99%)
  • ---, ----
  • +++, ++++


TP53 Evolutionary Analysis

DB and TR domains have the lowest ω value distributions


DNA-binding domain evolutionary and biological analysis

SLR ω classes: ω < 0.1, 0.1 < ω < 0.2, 0.2 < ω < 0.3, ω > 0.3

[Figure, panels A and B: p53 DNA-binding domain, residues 96 to 289, annotated with secondary-structure elements (L1 to L3, S1 to S10, H1, H2), per-site SLR marks (! and +) and conservation marks (*)]

96–SVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQH-168

169–MTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSS-241

242–CMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENL-289


TR-domain evolutionary and biological analysis

SLR ω classes: ω < 0.1, 0.1 < ω < 0.2, 0.2 < ω < 0.3, ω > 0.3

[Figure, panels A and B: TR domain, residues 325 to 355, annotated with per-site SLR marks (! and +) and conservation marks (*)]

325–GEYFTLQIRGRERFEMFRELNEALELKDAQA-355


How effective is natural selection?

• Evolutionary biologists recognized that natural selection works in proportion to the number of deleterious mutations in the population


TP53 mutation freq. and selective constraints

According to the theory, this will follow an "L"-shaped curve


The Main Question again

The p53 results seem to show good signals. However, is it possible to obtain a specific predictor of the most frequent amino acid changes associated with human diseases?


Bioinformatics and evolutionary analysis

• Analyse databases containing codon mutation frequencies for all possible human disease proteins
  • Immune deficiency and cancer (COSMIC) databases (approx. 250 genes)

• Ensembl orthologous genes in different species
  • Mammals and vertebrates

• Evolutionary ML analysis (M1a, M2a, M7, M8, SLR)

• Statistical tests (KS)
  • Reject genes with <10 mutations


ω and frequency distribution of immune and cancer mutations

ω threshold = 0.1

Two-sample Kolmogorov-Smirnov test

H0: freq. (lower ω) = freq. (upper ω)

HA: freq. (lower ω) > freq. (upper ω)
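The one-sided two-sample test above can be sketched in plain Python. This is a minimal illustration, not the analysis code behind the slides: the site data are invented, the helper names are hypothetical, sites with ω exactly at the threshold are assumed to fall in the upper group, and the p-value uses the large-sample approximation, which is rough for samples this small.

```python
import math

def split_by_omega(sites, threshold=0.1):
    """Split (omega, mutation_frequency) pairs into lower- and upper-omega groups."""
    lower = [freq for omega, freq in sites if omega < threshold]
    upper = [freq for omega, freq in sites if omega >= threshold]
    return lower, upper

def ks_one_sided(lower, upper):
    """One-sided two-sample KS test of HA: values in `lower` tend to be greater.

    D = max over x of [F_upper(x) - F_lower(x)], where F is the empirical CDF.
    The p-value is the asymptotic approximation exp(-2 * D^2 * n*m / (n+m)).
    """
    n, m = len(lower), len(upper)
    d = 0.0
    for x in sorted(set(lower) | set(upper)):
        f_lower = sum(v <= x for v in lower) / n
        f_upper = sum(v <= x for v in upper) / m
        d = max(d, f_upper - f_lower)
    return d, math.exp(-2.0 * d * d * n * m / (n + m))

if __name__ == "__main__":
    # Hypothetical (omega, mutation count) per codon; illustrative values only.
    sites = [(0.02, 30), (0.05, 12), (0.08, 25), (0.03, 18),
             (0.30, 2), (0.50, 1), (0.15, 4), (0.80, 0)]
    lower, upper = split_by_omega(sites)
    d, p = ks_one_sided(lower, upper)
    print(f"D = {d:.2f}, approximate p = {p:.4f}")
```

A small p-value here would support HA, that is, codons under stronger purifying selection (lower ω) accumulate more disease-associated mutations.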


Conclusions III

• We have found an evolutionary parameter that allows us to differentiate amino acids where disease is more frequent

• This parameter is a measure of the action of natural selection working on vertebrate species over millions of years

• We hypothesize that non-synonymous changes at amino acids showing ω < 0.1 probably affect the normal function of proteins

• Recently we confirmed these results using more than 3,000 proteins

• Disease mutations and polymorphisms are differentiated using ω values


Selective constraints on all the cSNPs of the Human Genome

PAML-SLR

Evolutionary Models


Bioinformatic tool: SNPs probably associated with Mendelian diseases (NAR, web issue 2006)


Thanks… again!!

Micostrium Vulgaris
Phylum: Chordata
Subphylum: Vertebrata
Class: Mammalia