DIVERSIFYING SELECTION AND FUNCTIONAL CONSTRAINT
ESTIMATING THE dN/dS RATIO FOR GENE SEQUENCES IN THE PRESENCE OF RECOMBINATION
Danny Wilson
12th October 2004
Menu
Codon-based models of molecular evolution
A new method for estimating omega with recombination
Does it work? Simulation studies and example data
Part one
Codon-based models of molecular evolution
Underlying rates of non-synonymous mutation are usually confounded with selection against inviable mutants.
Thus it is convenient to model functional constraint as mutational bias (or rather, to make no attempt to disentangle the two).
Figure: an ancestral type gives rise to a neutral mutant and an inviable mutant, passing first through mutation and then through selection. Sampling usually occurs after selection, i.e. post-selection.
UUU Phe   UUC Phe   UUA Leu   UUG Leu
UCU Ser   UCC Ser   UCA Ser   UCG Ser
UAU Tyr   UAC Tyr   UAA STOP  UAG STOP
UGU Cys   UGC Cys   UGA STOP  UGG Trp
CUU Leu   CUC Leu   CUA Leu   CUG Leu
CCU Pro   CCC Pro   CCA Pro   CCG Pro
CAU His   CAC His   CAA Gln   CAG Gln
CGU Arg   CGC Arg   CGA Arg   CGG Arg
AUU Ile   AUC Ile   AUA Ile   AUG Met
ACU Thr   ACC Thr   ACA Thr   ACG Thr
AAU Asn   AAC Asn   AAA Lys   AAG Lys
AGU Ser   AGC Ser   AGA Arg   AGG Arg
GUU Val   GUC Val   GUA Val   GUG Val
GCU Ala   GCC Ala   GCA Ala   GCG Ala
GAU Asp   GAC Asp   GAA Glu   GAG Glu
GGU Gly   GGC Gly   GGA Gly   GGG Gly
Table: 64 x 64 matrix classifying the change between every pair of codons. Rows and columns are numbered 1 to 64 in the codon order above (UUU = 1, ..., GGG = 64), and each entry is coded according to the key below.
Key
0 change involving a stop codon
1 no change
2 synonymous transversion
3 synonymous transition
4 non-synonymous transversion
5 non-synonymous transition
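Types of single nucleotide mutation
Transitions vs. transversions
Purines: A, G
Pyrimidines: T, C
Transitions: A to G or G to A (purine to purine), T to C or C to T (pyrimidine to pyrimidine)
Transversions: any change between a purine and a pyrimidine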
For any base there are always 2 possible transversions and 1 possible transition.
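The count above can be checked by enumeration. The following is a minimal sketch, not part of the original slides; the names `is_transition` and `count_changes` are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = "ACGT"


def is_transition(a: str, b: str) -> bool:
    """True if a -> b stays within the purines or within the pyrimidines."""
    if a == b:
        raise ValueError("a and b must be different bases")
    return {a, b} <= PURINES or {a, b} <= PYRIMIDINES


def count_changes(base: str) -> tuple:
    """Return (number of transitions, number of transversions) from `base`."""
    others = [b for b in BASES if b != base]
    n_transitions = sum(is_transition(base, b) for b in others)
    return n_transitions, len(others) - n_transitions


for base in BASES:
    print(base, count_changes(base))  # every base gives (1, 2)
```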
Types of codon mutation
Synonymous vs. non-synonymous
Synonymous: TTG (Leucine) to TTA (Leucine)
Non-synonymous: TTG (Leucine) to ATG (Methionine)
Leucine: (CH3)2-CH-CH2-CH(NH2)-COOH, isoelectric point pH 5.98, 6-fold degeneracy in the genetic code
Methionine: CH3-S-(CH2)2-CH(NH2)-COOH, isoelectric point pH 5.74, single unique codon ATG
Example: CTT (Leucine)
First position
TTT Phe Non-synonymous transition
ATT Ile Non-synonymous transversion
GTT Val Non-synonymous transversion
Second position
CCT Pro Non-synonymous transition
CAT His Non-synonymous transversion
CGT Arg Non-synonymous transversion
Third position
CTC Leu Synonymous transition
CTA Leu Synonymous transversion
CTG Leu Synonymous transversion
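The nine single-nucleotide neighbours of a codon can be enumerated and labelled mechanically. This is a minimal sketch using the standard genetic code written with T rather than U; the function names are illustrative and stop codons receive no special treatment here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def single_nucleotide_neighbours(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def classify(codon_from, codon_to):
    """Label a single-nucleotide change between two codons."""
    diffs = [(a, b) for a, b in zip(codon_from, codon_to) if a != b]
    if len(diffs) != 1:
        raise ValueError("codons must differ at exactly one position")
    same_aa = GENETIC_CODE[codon_from] == GENETIC_CODE[codon_to]
    effect = "synonymous" if same_aa else "non-synonymous"
    kind = "transition" if frozenset(diffs[0]) in TRANSITIONS else "transversion"
    return f"{effect} {kind}"


for neighbour in single_nucleotide_neighbours("CTT"):
    print(neighbour, GENETIC_CODE[neighbour], classify("CTT", neighbour))
```

For CTT, all three third-position changes are synonymous, while every first- and second-position change alters the amino acid.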
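Nielsen and Yang (1998) codon-based model of molecular evolution
Mutation rate from codon i to codon j (pi_j is the equilibrium frequency of codon j)
Synonymous transversion: pi_j
Synonymous transition: kappa pi_j
Non-synonymous transversion: omega pi_j
Non-synonymous transition: omega kappa pi_j
Other: 0
Interpretation
kappa: transition-transversion ratio
omega: dN/dS, the relative viability of non-synonymous mutations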
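A rate matrix of this kind can be assembled directly from the genetic code. The sketch below assumes the usual Nielsen and Yang (1998) parameterisation: rate pi_j for a synonymous transversion, multiplied by kappa for a transition and by omega for a non-synonymous change, and zero when codons differ at more than one position. It also assumes equal codon frequencies by default and excludes stop codons. The function names and the equal-frequency default are illustrative choices, not taken from the slides.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SENSE_CODONS = [c for c, aa in CODE.items() if aa != "*"]  # 61 codons
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ny98_rate(i, j, kappa, omega, pi_j):
    """Instantaneous rate from codon i to codon j (i != j)."""
    diffs = [(a, b) for a, b in zip(i, j) if a != b]
    if len(diffs) != 1:
        return 0.0  # identical codons or multi-nucleotide changes
    rate = pi_j
    if frozenset(diffs[0]) in TRANSITIONS:
        rate *= kappa  # transition rather than transversion
    if CODE[i] != CODE[j]:
        rate *= omega  # non-synonymous change
    return rate


def build_rate_matrix(kappa, omega, pi=None):
    """Return Q as a list of rows over SENSE_CODONS; each row sums to zero."""
    if pi is None:  # assumption: equal equilibrium frequencies
        pi = {c: 1.0 / len(SENSE_CODONS) for c in SENSE_CODONS}
    q = [[ny98_rate(i, j, kappa, omega, pi[j]) for j in SENSE_CODONS]
         for i in SENSE_CODONS]
    for r, row in enumerate(q):
        row[r] = -sum(row)  # diagonal balances the off-diagonal rates
    return q


Q = build_rate_matrix(kappa=2.0, omega=0.5)
ctt = SENSE_CODONS.index("CTT")
print(Q[ctt][SENSE_CODONS.index("CTC")])  # synonymous transition: kappa * pi_j
```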
codeML
Pros
Viable method for detecting mode of selection on a codon sequence
Cons
Categorizes possible values for omega into a small number of discrete intervals
Results can be misleading in the presence of recombination
Part two
A new method for estimating omega with recombination
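Inference with recombination
\[ \Pr(\Theta \mid X) \propto \Pr(X \mid \Theta)\,\Pr(\Theta) \]
\[ \Pr(X \mid \Theta) = \int \Pr(X \mid G, \Theta)\,\Pr(G)\,dG \]
\[ \Pr(X \mid \Theta) \approx \frac{1}{M} \sum_{i=1}^{M} \Pr(X \mid G_i, \Theta) \]
\[ \Pr(X \mid \Theta) \approx \frac{1}{M} \sum_{i=1}^{M} \frac{\Pr(X \mid G_i, \Theta)\,\Pr(G_i)}{Q(G_i)} \]
\[ \Pr(X \mid \Theta) = \; ? \]
(Theta denotes the model parameters, G a genealogy, and G_i the i-th of M sampled genealogies.)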
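Li and Stephens (2003)
Approximation to the likelihood
\[ \Pr(X \mid \Theta) = \Pr(X_1 \mid \Theta)\,\Pr(X_2 \mid X_1, \Theta) \cdots \Pr(X_n \mid X_1, \ldots, X_{n-1}, \Theta) \]
\[ \approx \hat{\pi}(X_1 \mid \Theta)\,\hat{\pi}(X_2 \mid X_1, \Theta) \cdots \hat{\pi}(X_n \mid X_1, \ldots, X_{n-1}, \Theta) \]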
Li and Stephens (2003)
Approximation to the likelihood
TTTGATACTGTTGCCGAAGGTTTGGGCGAAATTCGCGATTTATTGCGCCGTTATCATCAT
TTTGATACCGTTGCCGAAGGTTTGGGTGAAATTCGCGATTTATTGCGCCGTTACCACCGC
TTTGATACCGTTGCCGAAGGTTTGGGTAAAATTCGCGATTTATTGCGCCGTTACCACCGC
TTTGATACCGTTGCCGAAGGTTTGGGCGAAATTCGTGATTTATTGCGCCGTTATCATCAT
\[ \Pr(X_4 \mid X_1, \ldots, X_3, \Theta) \]
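The idea of treating the fourth sequence as an imperfect mosaic of the first three can be sketched with a small hidden Markov model. This is a deliberately simplified illustration, not the slides' method: `p_switch` (the per-site chance of re-choosing which sequence is copied) and `p_mismatch` (the per-site chance of a copying error) are hypothetical stand-ins for the recombination and mutation terms, it works on nucleotides rather than codons, and the first sequence is given a uniform placeholder probability.

```python
import math


def copying_conditional(target, panel, p_switch, p_mismatch):
    """Forward algorithm: probability of `target` as a mosaic of `panel`."""
    k = len(panel)

    def emission(copied, observed):
        # Copy faithfully, or emit one of the three other bases.
        return 1.0 - p_mismatch if copied == observed else p_mismatch / 3.0

    # At the first site each panel sequence is copied with probability 1/k.
    forward = [emission(h[0], target[0]) / k for h in panel]
    for s in range(1, len(target)):
        total = sum(forward)
        forward = [
            emission(h[s], target[s])
            * ((1.0 - p_switch) * f + p_switch * total / k)
            for h, f in zip(panel, forward)
        ]
    return sum(forward)


def pac_log_likelihood(sequences, p_switch, p_mismatch):
    """Sum of log conditionals, each sequence given those before it."""
    loglik = -len(sequences[0]) * math.log(4.0)  # placeholder for the first term
    for i in range(1, len(sequences)):
        loglik += math.log(
            copying_conditional(sequences[i], sequences[:i], p_switch, p_mismatch)
        )
    return loglik


seqs = [
    "TTTGATACTGTTGCCGAAGGTTTGGGCGAAATTCGCGATTTATTGCGCCGTTATCATCAT",
    "TTTGATACCGTTGCCGAAGGTTTGGGTGAAATTCGCGATTTATTGCGCCGTTACCACCGC",
    "TTTGATACCGTTGCCGAAGGTTTGGGTAAAATTCGCGATTTATTGCGCCGTTACCACCGC",
    "TTTGATACCGTTGCCGAAGGTTTGGGCGAAATTCGTGATTTATTGCGCCGTTATCATCAT",
]
print(copying_conditional(seqs[3], seqs[:3], p_switch=0.05, p_mismatch=0.01))
print(pac_log_likelihood(seqs, p_switch=0.05, p_mismatch=0.01))
```

A product of conditionals built this way depends on the order in which the sequences are taken, which is why such approximations are commonly averaged over several orderings.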
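Equations (Li and Stephens 2003): the probability of site i of X_4 given site i of the copied sequence X_k, and the probability of recombination Pr(rec), which has an exponential form.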
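My modification to Li and Stephens (2003)
\[ \Pr(X_{4,i} \mid X_{k,i}) = \int_0^\infty \Pr(X_{4,i} \mid X_{k,i}, t)\,\Pr(t)\,dt = \int_0^\infty p^{(2t)}_{X_{k,i},\,X_{4,i}}\; k \exp(-kt)\,dt \]
Figure: X_{4,i} and X_{k,i}, separated by time t.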
Estimating variable omega
The problem
A constant omega model is prone to averaging over positively and negatively selected sites in a gene
Allowing every site its own omega leaves little information for inference
The solution
A change-point model where windows of adjacent sites share the same omega
Estimating variable omega
Figure: sites grouped into blocks numbered 1 to 5
MCMC moves:
Change omega for a single block
Extend a block 5' or 3'
Split an existing block
Merge adjacent blocks
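The four moves can be sketched as proposal functions on a simple block representation. Everything here is a hypothetical illustration: the `BlockModel` class, the log-normal perturbation of omega, and the way omegas are assigned on split and merge are assumptions, and the likelihood, priors and acceptance step that a real sampler needs are omitted.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class BlockModel:
    n_sites: int
    starts: list  # first site of each block; starts[0] == 0, strictly increasing
    omegas: list  # one omega per block

    def block_end(self, b):
        """Index one past the last site of block b."""
        return self.starts[b + 1] if b + 1 < len(self.starts) else self.n_sites


def change_omega(m, rng):
    """Rescale the omega of one randomly chosen block."""
    b = rng.randrange(len(m.omegas))
    omegas = list(m.omegas)
    omegas[b] *= math.exp(rng.gauss(0.0, 0.5))
    return BlockModel(m.n_sites, list(m.starts), omegas)


def extend_block(m, rng):
    """Shift one internal boundary by a single site, 5' or 3'."""
    if len(m.starts) < 2:
        return m
    b = rng.randrange(1, len(m.starts))
    new_start = m.starts[b] + rng.choice((-1, 1))
    if not (m.starts[b - 1] < new_start < m.block_end(b)):
        return m  # the shift would empty a neighbouring block
    starts = list(m.starts)
    starts[b] = new_start
    return BlockModel(m.n_sites, starts, list(m.omegas))


def split_block(m, rng):
    """Cut one block in two, giving the halves reciprocally scaled omegas."""
    candidates = [b for b in range(len(m.starts))
                  if m.block_end(b) - m.starts[b] >= 2]
    if not candidates:
        return m
    b = rng.choice(candidates)
    cut = rng.randrange(m.starts[b] + 1, m.block_end(b))
    factor = math.exp(rng.gauss(0.0, 0.5))
    starts = m.starts[:b + 1] + [cut] + m.starts[b + 1:]
    omegas = (m.omegas[:b] + [m.omegas[b] * factor, m.omegas[b] / factor]
              + m.omegas[b + 1:])
    return BlockModel(m.n_sites, starts, omegas)


def merge_blocks(m, rng):
    """Join two adjacent blocks, using the geometric mean of their omegas."""
    if len(m.starts) < 2:
        return m
    b = rng.randrange(len(m.starts) - 1)
    merged = math.sqrt(m.omegas[b] * m.omegas[b + 1])
    starts = m.starts[:b + 1] + m.starts[b + 2:]
    omegas = m.omegas[:b] + [merged] + m.omegas[b + 2:]
    return BlockModel(m.n_sites, starts, omegas)


rng = random.Random(1)
model = BlockModel(n_sites=20, starts=[0], omegas=[1.0])
for _ in range(200):
    move = rng.choice((change_omega, extend_block, split_block, merge_blocks))
    model = move(model, rng)
print(model.starts, model.omegas)
```

Pairing split with merge in this way keeps the two moves mutual inverses, which is convenient when the number of blocks is itself being sampled.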
Part three
Does it work? Simulation studies and example data
Posterior distribution for known and unknown genealogy
Neutral dataset (figure legend): true omega, posterior mean, posterior HPD interval
Non-neutral dataset (figure legend): true omega, posterior mean, posterior HPD interval
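The posterior mean and HPD interval named in these legends are summaries of MCMC output. As a minimal sketch, assuming the sampler's draws of omega for one site or block are available as a list of numbers, they could be computed as follows; the function names are illustrative, and the HPD interval is taken as the shortest interval containing the requested fraction of samples, which is appropriate for a unimodal posterior.

```python
import math
import random


def posterior_mean(samples):
    """Average of the sampled values."""
    return sum(samples) / len(samples)


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    width = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of `width` consecutive order statistics and keep the narrowest.
    best = min(range(n - width + 1), key=lambda i: xs[i + width - 1] - xs[i])
    return xs[best], xs[best + width - 1]


# Illustrative draws only; a real analysis would use the sampler's output.
rng = random.Random(0)
draws = [rng.lognormvariate(-1.0, 0.4) for _ in range(5000)]
print(posterior_mean(draws), hpd_interval(draws, 0.95))
```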
HIV envelope gene
Slow Non-Progressors vs Rapid Progressors
Figure panels: Slow Non-Progressors, Rapid Progressors
Neisseria meningitidis PorB3
95% HPD upper: 0.0386
95% HPD lower: 0.0187
Work in progress…
Variable recombination rate
Model indels
Falsifiability test
Test for sensitivity to rate heterogeneity
Acknowledgements
Gil McVean (Supervisor)
Martin Maiden (Supervisor)
Ziheng Yang
Rachel Urwin (meninge data)
Charlie Edwards (HIV data)