DIVERSIFYING SELECTION AND FUNCTIONAL CONSTRAINT


ESTIMATING THE dN/dS RATIO FOR GENE SEQUENCES IN THE PRESENCE OF RECOMBINATION

Danny Wilson, 12th October 2004

Menu

Codon-based models of molecular evolution

A new method for estimating omega with recombination

Does it work? Simulation studies and example data

Part one

Codon-based models of molecular evolution

Underlying rates of non-synonymous mutation are usually confounded with selection against inviable mutants.

Thus it is convenient to model functional constraint as mutational bias (or rather, to make no attempt to disentangle the two).

[Figure: ancestral type, neutral mutant and inviable mutant, shown through two stages: Mutation, then Selection.]

Sampling usually occurs at this point, i.e. post-selection.

Genetic code (codons in order, numbered 1 to 64 reading left to right, top to bottom)

UUU Phe   UUC Phe   UUA Leu   UUG Leu
UCU Ser   UCC Ser   UCA Ser   UCG Ser
UAU Tyr   UAC Tyr   UAA STOP  UAG STOP
UGU Cys   UGC Cys   UGA STOP  UGG Trp
CUU Leu   CUC Leu   CUA Leu   CUG Leu
CCU Pro   CCC Pro   CCA Pro   CCG Pro
CAU His   CAC His   CAA Gln   CAG Gln
CGU Arg   CGC Arg   CGA Arg   CGG Arg
AUU Ile   AUC Ile   AUA Ile   AUG Met
ACU Thr   ACC Thr   ACA Thr   ACG Thr
AAU Asn   AAC Asn   AAA Lys   AAG Lys
AGU Ser   AGC Ser   AGA Arg   AGG Arg
GUU Val   GUC Val   GUA Val   GUG Val
GCU Ala   GCC Ala   GCA Ala   GCG Ala
GAU Asp   GAC Asp   GAA Glu   GAG Glu
GGU Gly   GGC Gly   GGA Gly   GGG Gly

[Table: 64 x 64 matrix of codon-to-codon changes, with codons numbered 1 (UUU) to 64 (GGG) in the order above and each cell coded by the key below.]

Key

0  change involving a stop codon
1  no change
2  synonymous transversion
3  synonymous transition
4  non-synonymous transversion
5  non-synonymous transition

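Types of single nucleotide mutation: transitions vs. transversions

Purines: A, G

Pyrimidines: T, C

Transitions: A <-> G (purine to purine), T <-> C (pyrimidine to pyrimidine)

Transversions: purine <-> pyrimidine (A <-> T, A <-> C, G <-> T, G <-> C)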

For any base there are always 2 possible transversions and 1 possible transition.
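The counting rule above can be checked with a minimal Python sketch. The function names `mutation_type` and `count_changes` are illustrative and do not come from the slides.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def mutation_type(old, new):
    """Classify a single-base change as a transition or a transversion."""
    if old == new:
        raise ValueError("old and new base must differ")
    pair = {old, new}
    # A transition stays within one chemical class; a transversion crosses classes.
    within_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if within_class else "transversion"


def count_changes(base):
    """Count (transitions, transversions) among the three possible changes from base."""
    kinds = [mutation_type(base, other) for other in "ACGT" if other != base]
    return kinds.count("transition"), kinds.count("transversion")


for base in "ACGT":
    print(base, count_changes(base))
```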

Types of codon mutation: synonymous vs. non-synonymous

Synonymous: TTG (Leucine) -> TTA (Leucine)

Non-synonymous: TTG (Leucine) -> ATG (Methionine)

Leucine: pH 5.98; 6-fold degeneracy in the genetic code; (CH3)2-CH-CH2-CH(NH2)-COOH

Methionine: pH 5.74; single unique codon ATG; CH3-S-(CH2)2-CH(NH2)-COOH

Example: CTT (Leucine)

Changes at position 1 (to T, A or G)

TTT  Phe  Non-synonymous transition
ATT  Ile  Non-synonymous transversion
GTT  Val  Non-synonymous transversion

Changes at position 2 (to C, A or G)

CCT  Pro  Non-synonymous transition
CAT  His  Non-synonymous transversion
CGT  Arg  Non-synonymous transversion

Changes at position 3 (to C, A or G)

CTC  Leu  Synonymous transition
CTA  Leu  Synonymous transversion
CTG  Leu  Synonymous transversion
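The nine single-nucleotide neighbours of a codon can be enumerated and classified mechanically. The following is a minimal Python sketch using the standard genetic code and the numeric key given earlier (0 to 5). The names `CODE`, `classify` and `neighbours` are illustrative, and scoring every single-nucleotide change to or from a stop codon as 0 is an assumption about how the key is applied.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ("*" marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def classify(old, new):
    """Return the key code (0-5) for a codon change, or None if more than one base differs."""
    diffs = [(x, y) for x, y in zip(old, new) if x != y]
    if not diffs:
        return 1  # no change
    if len(diffs) > 1:
        return None  # not a single-nucleotide change
    if CODE[old] == "*" or CODE[new] == "*":
        return 0  # assumption: any change involving a stop codon
    transition = frozenset(diffs[0]) in TRANSITIONS
    synonymous = CODE[old] == CODE[new]
    if synonymous:
        return 3 if transition else 2
    return 5 if transition else 4


def neighbours(codon):
    """List (mutant codon, amino acid, key code) for all nine single-base changes."""
    out = []
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                out.append((mutant, CODE[mutant], classify(codon, mutant)))
    return out


for row in neighbours("CTT"):
    print(*row)
```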

Nielsen and Yang (1998) codon-based model of molecular evolution

Mutation rate, by type of change:

Synonymous transversion
Synonymous transition
Non-synonymous transversion
Non-synonymous transition
Other

Interpretation

Transition-transversion ratio
dN/dS: relative viability of non-synonymous mutations
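The rate expressions themselves are not preserved in this text. As a hedged illustration, the sketch below uses the parameterization usually attributed to Nielsen and Yang (1998): the rate from codon i to codon j is proportional to the frequency of j, multiplied by the transition-transversion ratio for transitions and by omega for non-synonymous changes, and is zero for changes at more than one position or involving a stop codon. The names `kappa`, `omega`, `pi_j` and `ny98_rate` are assumptions made for this example.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ny98_rate(codon_i, codon_j, kappa, omega, pi_j=1.0 / 61):
    """Instantaneous rate from codon_i to codon_j (unnormalized sketch).

    kappa: transition-transversion ratio; omega: dN/dS;
    pi_j: equilibrium frequency of the target codon (uniform over 61 sense codons by default).
    """
    diffs = [(x, y) for x, y in zip(codon_i, codon_j) if x != y]
    if len(diffs) != 1:
        return 0.0  # identical codons or more than one substitution ("Other")
    if CODE[codon_i] == "*" or CODE[codon_j] == "*":
        return 0.0  # stop codons are excluded from the state space
    rate = pi_j
    if frozenset(diffs[0]) in TRANSITIONS:
        rate *= kappa
    if CODE[codon_i] != CODE[codon_j]:
        rate *= omega
    return rate


# Rates out of CTT with kappa = 2 and omega = 0.5, taking pi_j = 1 for readability.
for target in ("CTA", "CTC", "ATT", "TTT", "TTA"):
    print(target, ny98_rate("CTT", target, kappa=2.0, omega=0.5, pi_j=1.0))
```

With omega below 1, non-synonymous changes are slowed relative to synonymous ones, which is the sense in which functional constraint is modelled as mutational bias.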

codeML

Pros

Viable method for detecting mode of selection on a codon sequence

Cons

Categorizes possible values for omega into a small number of discrete intervals
Results can be misleading in the presence of recombination

Part two

A new method for estimating omega with recombination

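Inference with recombination

[Equations: the likelihood Pr(X | ...) of the sequences X, written as an integral over genealogies G and approximated by an average over M sampled genealogies G_i, with a proposal distribution Q(G).]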

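Li and Stephens (2003): Approximation to the likelihood

\Pr(X) = \Pr(X_1)\,\Pr(X_2 \mid X_1) \cdots \Pr(X_n \mid X_1, \ldots, X_{n-1})

\approx \hat{\pi}(X_1)\,\hat{\pi}(X_2 \mid X_1) \cdots \hat{\pi}(X_n \mid X_1, \ldots, X_{n-1})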

Li and Stephens (2003): Approximation to the likelihood

TTTGATACTGTTGCCGAAGGTTTGGGCGAAATTCGCGATTTATTGCGCCGTTATCATCAT

TTTGATACCGTTGCCGAAGGTTTGGGTGAAATTCGCGATTTATTGCGCCGTTACCACCGC

TTTGATACCGTTGCCGAAGGTTTGGGTAAAATTCGCGATTTATTGCGCCGTTACCACCGC

TTTGATACCGTTGCCGAAGGTTTGGGCGAAATTCGTGATTTATTGCGCCGTTATCATCAT

\Pr(X_4 \mid X_1, \ldots, X_3)
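A conditional probability of this kind can be computed with a hidden Markov model in which the new sequence copies, site by site, from the sequences already observed. The sketch below is a simplified nucleotide-level illustration in the spirit of Li and Stephens (2003), not the codon-based method of this talk. The parameter names `rho` and `theta`, their default values, and the four-state mutation model are assumptions made for the example.

```python
import math


def conditional_prob(new_seq, panel, rho=0.1, theta=0.1):
    """Forward algorithm for a copying HMM: probability of new_seq given the panel."""
    k = len(panel)
    mut = theta / (k + theta)      # chance that a copied site is mutated (assumed form)
    stay = math.exp(-rho / k)      # chance of no recombination between adjacent sites

    def emit(j, s):
        # A mutated site picks one of the four bases uniformly (an assumption).
        if panel[j][s] == new_seq[s]:
            return 1.0 - mut + mut / 4.0
        return mut / 4.0

    # Start by copying from any panel sequence with equal probability.
    forward = [emit(j, 0) / k for j in range(k)]
    for s in range(1, len(new_seq)):
        total = sum(forward)
        # Either keep copying the same sequence or switch to a uniformly chosen one.
        forward = [
            emit(j, s) * (stay * forward[j] + (1.0 - stay) * total / k)
            for j in range(k)
        ]
    return sum(forward)


panel = [
    "TTTGATACTGTTGCCGAAGGTTTGGGCGAAATTCGCGATTTATTGCGCCGTTATCATCAT",
    "TTTGATACCGTTGCCGAAGGTTTGGGTGAAATTCGCGATTTATTGCGCCGTTACCACCGC",
    "TTTGATACCGTTGCCGAAGGTTTGGGTAAAATTCGCGATTTATTGCGCCGTTACCACCGC",
]
x4 = "TTTGATACCGTTGCCGAAGGTTTGGGCGAAATTCGTGATTTATTGCGCCGTTATCATCAT"
print(conditional_prob(x4, panel))
```

Multiplying such conditional probabilities over the sequences, added one at a time, gives the product approximation to the likelihood. The result depends on the order in which the sequences are added.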

Li and Stephens (2003): Approximation to the likelihood

TTTGATACTGTTGCCGAAGGTTTGGGCGAAATTCGCGATTTATTGCGCCGTTATCATCAT

TTTGATACCGTTGCCGAAGGTTTGGGTGAAATTCGCGATTTATTGCGCCGTTACCACCGC

TTTGATACCGTTGCCGAAGGTTTGGGTAAAATTCGCGATTTATTGCGCCGTTACCACCGC

\Pr(X_4 \mid X_1, \ldots, X_3)

[Equations: probabilities for sequence X_4 at adjacent sites i and i+1 given the k sequences already observed, including the probability of recombination (rec) between sites, which involves an exponential term.]

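My modification to Li and Stephens (2003)

[Equations, only partly recoverable: the probability of X_4 at site i given the observed sequences at site i, written as an integral over time t from 0 of a term Pr(X_{4,i} | ..., t) Pr(t) dt, with terms in exp, k and t.]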

Estimating variable omega

The problem

A constant omega model is prone to averaging over regions of a gene under positive and negative selection
Allowing every site its own omega leaves little information for inference

The solution

A change-point model where windows of adjacent sites share the same omega

Estimating variable omega

[Figure: blocks numbered 1 to 5]

MCMC moves:

Change omega for a single block
Extend a block 5’ or 3’
Split an existing block
Merge adjacent blocks
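The four moves can be sketched as proposals on a simple block representation: a sorted list of block start positions plus one omega per block. This is a minimal Python illustration under stated assumptions, not the author's implementation. The log-normal step size, copying omega on a split, keeping the left omega on a merge, and all function names are choices made for the example. Priors and the Metropolis-Hastings acceptance step are omitted.

```python
import math
import random


def site_omegas(starts, omegas, n_sites):
    """Expand the blocks into one omega value per site."""
    ends = starts[1:] + [n_sites]
    return [w for s, e, w in zip(starts, ends, omegas) for _ in range(e - s)]


def change_omega(starts, omegas, rng):
    """Rescale omega for one randomly chosen block."""
    b = rng.randrange(len(omegas))
    new = list(omegas)
    new[b] = omegas[b] * math.exp(rng.gauss(0.0, 0.5))
    return list(starts), new


def extend_block(starts, omegas, n_sites, rng):
    """Shift one internal boundary by a single site, 5' or 3'."""
    if len(starts) < 2:
        return list(starts), list(omegas)
    b = rng.randrange(1, len(starts))
    lowest = starts[b - 1] + 1  # the block on the left keeps at least one site
    highest = (starts[b + 1] if b + 1 < len(starts) else n_sites) - 1
    pos = starts[b] + rng.choice([-1, 1])
    if not lowest <= pos <= highest:
        return list(starts), list(omegas)  # invalid proposal: leave unchanged
    new = list(starts)
    new[b] = pos
    return new, list(omegas)


def split_block(starts, omegas, n_sites, rng):
    """Cut one block of two or more sites; both halves inherit its omega."""
    ends = starts[1:] + [n_sites]
    candidates = [b for b in range(len(starts)) if ends[b] - starts[b] >= 2]
    if not candidates:
        return list(starts), list(omegas)
    b = rng.choice(candidates)
    cut = rng.randrange(starts[b] + 1, ends[b])
    return (starts[:b + 1] + [cut] + starts[b + 1:],
            omegas[:b + 1] + [omegas[b]] + omegas[b + 1:])


def merge_blocks(starts, omegas, rng):
    """Remove one internal boundary; the merged block keeps the left omega."""
    if len(starts) < 2:
        return list(starts), list(omegas)
    b = rng.randrange(1, len(starts))
    return starts[:b] + starts[b + 1:], omegas[:b] + omegas[b + 1:]


rng = random.Random(1)
starts, omegas, n_sites = [0, 3], [0.1, 2.0], 5
print(site_omegas(starts, omegas, n_sites))
print(split_block(starts, omegas, n_sites, rng))
```

Storing boundaries rather than per-site values keeps each move local: a proposal touches at most one boundary or one omega.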

Part three

Does it work? Simulation studies and example data

Posterior distribution for known and unknown genealogy

[Figure: posterior distributions for known and unknown genealogy]

Neutral dataset

[Figure: true omega, posterior mean and posterior HPD interval]

Non-neutral dataset

[Figure: true omega, posterior mean and posterior HPD interval]

HIV envelope gene: Slow Non-Progressors vs Rapid Progressors

[Figure: two panels, Slow Non-Progressors and Rapid Progressors]

Neisseria meningitidis PorB3

[Figure]

95% HPD upper: 0.0386

95% HPD lower: 0.0187
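A highest posterior density (HPD) interval is the shortest interval containing a given share of the posterior mass. For a unimodal posterior it can be estimated from MCMC samples as the narrowest window covering that share of the sorted draws. The sketch below assumes such samples are available. The function name `hpd_interval` and the toy data are illustrative and are not results from the talk.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    m = max(1, math.ceil(mass * n))  # number of draws the interval must cover
    # Slide a window of m consecutive sorted draws and keep the narrowest one.
    best = min(range(n - m + 1), key=lambda i: xs[i + m - 1] - xs[i])
    return xs[best], xs[best + m - 1]


# Toy draws with one outlier: the HPD interval avoids the long tail.
print(hpd_interval([1, 2, 3, 4, 50], mass=0.5))
```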

Work in progress…

Variable recombination rate
Model indels
Falsifiability test
Test for sensitivity to rate heterogeneity

Acknowledgements

Gil McVean (Supervisor)
Martin Maiden (Supervisor)
Ziheng Yang
Rachel Urwin (meninge data)
Charlie Edwards (HIV data)
