Transition Bias and Substitution Models
Xuhua Xia
http://dambe.bio.uottawa.ca
Transitions and Transversions
Purines: A, G
Pyrimidines: C, T
Transition: the substitution of a purine for a purine (A ↔ G) or a pyrimidine for a pyrimidine (C ↔ T). Symbolized by s.
Transversion: the substitution of a purine for a pyrimidine or vice versa (A ↔ C, A ↔ T, G ↔ C, G ↔ T). Symbolized by v.
What is transition bias?
Transition bias refers to the degree to which the s/v ratio deviates from the expected 1/2 (each nucleotide can change by one transition but by two transversions). The observed s/v ratio is almost always much larger than 1/2.
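The two definitions translate directly into a small classifier. The Python sketch below is illustrative only: the names `classify` and `count_s_v` are hypothetical, and it assumes two aligned sequences of equal length, with gaps and ambiguity codes simply skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def classify(x: str, y: str) -> str:
    """Return 's' for a transition, 'v' for a transversion, '' otherwise."""
    # Treat RNA and DNA alike by mapping U to T.
    x = x.upper().replace("U", "T")
    y = y.upper().replace("U", "T")
    if x == y or x not in VALID or y not in VALID:
        return ""  # identical, gap, or ambiguous character
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "s" if same_class else "v"


def count_s_v(seq1: str, seq2: str) -> tuple[int, int]:
    """Count observed transitional and transversional differences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    kinds = [classify(a, b) for a, b in zip(seq1, seq2)]
    return kinds.count("s"), kinds.count("v")


s, v = count_s_v("ATACTCAGGTTAAGCT", "ACAATCCGGTTAAGCT")
print(s, v)  # 1 2
```

With real data the observed ratio s/v would then be compared against the expected 1/2.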
Transition Bias is Ubiquitous. Why?
• For both invertebrate and vertebrate genes: $s_{obs}/v_{obs} \gg 1/2$
• What causes transition bias?
– Mutation bias
– Selection bias
$$\frac{s_{obs}}{v_{obs}} = \frac{\mu_s P_s}{\mu_v P_v}$$
Mutation bias: $\mu_s/\mu_v$ (ratio of the transitional to the transversional mutation rate)
Selection bias in fixation probability: $P_s/P_v$ (protein-coding genes, RNA genes)
Mitochondrial Genetic Code
Codon  Amino acid  Codon  Amino acid  Codon  Amino acid  Codon  Amino acid
UUU    Phe         UCU    Ser         UAU    Tyr         UGU    Cys
UUC    Phe         UCC    Ser         UAC    Tyr         UGC    Cys
UUA    Leu         UCA    Ser         UAA    Stop        UGA    Trp
UUG    Leu         UCG    Ser         UAG    Stop        UGG    Trp
CUU    Leu         CCU    Pro         CAU    His         CGU    Arg
CUC    Leu         CCC    Pro         CAC    His         CGC    Arg
CUA    Leu         CCA    Pro         CAA    Gln         CGA    Arg
CUG    Leu         CCG    Pro         CAG    Gln         CGG    Arg
AUU    Ile         ACU    Thr         AAU    Asn         AGU    Ser
AUC    Ile         ACC    Thr         AAC    Asn         AGC    Ser
AUA    Met         ACA    Thr         AAA    Lys         AGA    Stop
AUG    Met         ACG    Thr         AAG    Lys         AGG    Stop
GUU    Val         GCU    Ala         GAU    Asp         GGU    Gly
GUC    Val         GCC    Ala         GAC    Asp         GGC    Gly
GUA    Val         GCA    Ala         GAA    Glu         GGA    Gly
GUG    Val         GCG    Ala         GAG    Glu         GGG    Gly
• Synonymous and nonsynonymous
• Degeneracy:
– Non-degenerate
– Two-fold degenerate
– Four-fold degenerate
• Transitions are synonymous and transversions are nonsynonymous at two-fold degenerate sites.
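The last bullet can be checked mechanically against the code table on this slide. The Python sketch below is a minimal illustration, not part of the original slides: the one-letter string is a transcription of the table above (it matches NCBI translation table 2), stop codons are treated as a 21st "amino acid" (`*`), and the function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Vertebrate mitochondrial code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
TRANSITION_PARTNER = {"A": "G", "G": "A", "C": "U", "U": "C"}


def third_site_fold(codon: str) -> int:
    """Number of third-position bases that keep the encoded amino acid."""
    return sum(CODE[codon[:2] + b] == CODE[codon] for b in BASES)


def two_fold_rule_holds() -> bool:
    """True if, at every two-fold degenerate third site, the transition is
    synonymous and both transversions are nonsynonymous."""
    for codon in CODE:
        if third_site_fold(codon) != 2:
            continue
        for b in BASES:
            if b == codon[2]:
                continue
            is_transition = TRANSITION_PARTNER[codon[2]] == b
            is_synonymous = CODE[codon[:2] + b] == CODE[codon]
            if is_transition != is_synonymous:
                return False
    return True


print(third_site_fold("GGA"), third_site_fold("AAU"))  # 4 2
print(two_fold_rule_holds())  # True
```

Under this code every third codon position is either two-fold or four-fold degenerate, which is why the rule holds without exceptions here.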
RNA secondary structure
Seq1: CACGA
      |||||
      GUGCU
Seq2: CAUGA
      |||||
      GUGCU

Seq1: CACGA
      |||||
      GUGCU
Seq2: CGCGA
      |||||
      GUGCU
A G/U pair, although not as strong as an A/U or C/G pair, generally does not disrupt RNA secondary structure (and occurs frequently in RNA secondary structure).
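The stem examples on this slide can be labelled position by position. The Python sketch below is a simplified illustration with hypothetical names; it only looks at which bases face each other and ignores pairing energies.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
WOBBLE = {("G", "U"), ("U", "G")}


def stem_pairs(top: str, bottom: str) -> list[str]:
    """Label each stacked position as 'WC', 'GU' or 'mismatch'.

    `top` is written 5'->3' and `bottom` 3'->5', as drawn on the slide.
    """
    labels = []
    for a, b in zip(top.upper(), bottom.upper()):
        if (a, b) in WATSON_CRICK:
            labels.append("WC")
        elif (a, b) in WOBBLE:
            labels.append("GU")
        else:
            labels.append("mismatch")
    return labels


print(stem_pairs("CACGA", "GUGCU"))  # Seq1: all Watson-Crick
print(stem_pairs("CAUGA", "GUGCU"))  # C->U transition: a G/U pair
print(stem_pairs("CGCGA", "GUGCU"))  # A->G transition: a G/U pair
print(stem_pairs("CAAGA", "GUGCU"))  # C->A transversion: a mismatch
```

The last call uses a made-up transversion (not from the slide) to show the contrast: the transitions keep the stem paired through a G/U pair, while the transversion leaves an A facing a G.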
Causes of transition bias
"I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be." Lord Kelvin, "Electrical Units of Measurement" (lecture, 1883-05-03), Popular Lectures and Addresses, vol. 1
$$\frac{s_{obs}}{v_{obs}} = \frac{\mu_s P_s}{\mu_v P_v}$$
($\mu$: mutation rate; $P$: fixation probability; subscripts s and v: transitions and transversions)
At Four-fold Degenerate Sites
At four-fold degenerate sites, all nucleotide substitutions are synonymous and subject to roughly the same selection pressure (similar fixation probabilities):
$$\frac{s_{obs}}{v_{obs}} = \frac{\mu_s P_s}{\mu_v P_v} \approx \frac{\mu_s}{\mu_v} \approx 2$$
Glycine codons: GGA, GGC, GGG, GGT (the third codon position is a four-fold degenerate site)
AA(S1) Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
Fold    4   2   2       2   2   4   4   4       2
S1     GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2     GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
s/v     s                           s   v
AA(S2)             Glu                     Gly Trp
At Nondegenerate Sites
Glycine codons: GGA, GGC, GGG, GGT (the first and second codon positions are nondegenerate sites)
At nondegenerate sites, all nucleotide substitutions are nonsynonymous and subject to roughly the same selection pressure (similar fixation probabilities):
$$\frac{s_{obs}}{v_{obs}} = \frac{\mu_s P_s}{\mu_v P_v} \approx \frac{\mu_s}{\mu_v} \approx 2$$
AA(S1) Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
S1     GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2     GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
s/v                 s                       v
AA(S2)             Glu                     Gly Trp
At Two-fold Degenerate Sites
At two-fold degenerate sites, all transitional substitutions are synonymous, and all transversional substitutions are nonsynonymous:
$$\frac{s_{obs}}{v_{obs}} = \frac{\mu_s P_s}{\mu_v P_v} \approx 2\,\frac{P_s}{P_v} \approx 80$$
Example codons: GAA Glu, GAG Glu, GAC Asp, GAT Asp (the third codon position is a two-fold degenerate site)
Taking $\mu_s/\mu_v \approx 2$ from the four-fold degenerate and nondegenerate sites, a transition is about 40 times as likely to become fixed as a transversion.
AA(S1) Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
Fold    4   2   2       2   2   4   4   4       2
S1     GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2     GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
s/v         s           s   s                   v
AA(S2)             Glu                     Gly Trp
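The factor of 40 follows from dividing the two observed ratios. The Python sketch below spells out that arithmetic; the function name is hypothetical, and it assumes that the s/v ratio at four-fold degenerate sites estimates the mutational ratio alone (because fixation probabilities are similar there).

```python
def fixation_ratio(sv_two_fold: float, sv_four_fold: float) -> float:
    """Estimate P_s / P_v at two-fold degenerate sites.

    sv_four_fold approximates mu_s / mu_v (similar fixation probabilities),
    and sv_two_fold approximates (mu_s / mu_v) * (P_s / P_v), so their
    quotient isolates the selection component P_s / P_v.
    """
    if sv_two_fold < 0 or sv_four_fold <= 0:
        raise ValueError("ratios must be positive")
    return sv_two_fold / sv_four_fold


# Values quoted on the slides: s/v of about 80 at two-fold sites
# and about 2 at four-fold sites.
print(fixation_ratio(80.0, 2.0))  # 40.0
```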
Methylation and deamination
[Figure: a methyltransferase transfers a methyl group (H3C-) from a donor to an acceptor.]
Methylation and DNA Repair in E. coli
• DNA alphabet: ACGT
• RNA alphabet: ACGU
• DNA duplication and Watson-Crick pairing rule: A-T, C-G
3’--CTAG----CTAGGTAT----C-----C--CTAG-----------5’
    ||||    ||||||||    ?     ?  ||||
5’--GATC----GATCCATA----U-----T--GATC-----... 3’
[Figure labels: H3C (methyl groups), mutS, mutL, mutH]
Methylation-Modification System
Brevibacterium albidum
TGGC*CA
AC*CGGT
----TGG|CCA----
----ACC|GGT----
[Figure labels: dsDNA phage, bacterial membrane, bacterial genome, transcription and translation, restriction enzyme, methylase]
CpG-Specific DNA Methylation
• Mammalian DNA methyltransferase 1 (DNMT1)
– NlsD, NLS-containing domain
– RFDD, replication foci-directing domain
– ZnD, Zn-binding domain
– PBD, polybromo domain
– CatD, the catalytic domain
Fatemi, M., A. Hermann, S. Pradhan and A. Jeltsch, 2001 J Mol Biol 309: 1189-99.
[Figure: DNMT1 domain map (NlsD, RFDD, ZnD, PBD, CatD) with positions 350, 609, 613, 746, 748, 1110, 1124, 1343 and 1620; substrates shown: CpG, mCpG.]
CpG-Specific DNA Methylation
5’ATGCGA-------CCGA--------ACGGC--TAA 3’
  ||||||       ||||        |||||
3’TACGCT-------GGCT--------TGCCG--ATT 5’
Left to right: fully methylated CpG (H3C on both strands), hemi-methylated CpG (H3C on one strand), unmethylated CpG.
Note: 5’CG3’ = CpG
Methylation and Gene Regulation
• Proteins with a methyl-CpG binding domain (MBD)
– MBD1, MBD2, and MBD3
– MeCP2
• Deacetylase: an enzyme that removes an acetyl group
• Histone deacetylases: deacetylate lysyl residues in histones (the half-life of an acetyl group is ~10 min). Acetylation removes a positive charge on the lysine ε-amino group and promotes nucleosome melting (and gene expression). Deacetylation tends to decrease or turn off gene expression.
[Figure: ---mCpG--- bound by an MBD protein, histone deacetylase, and condensed DNA with repressed transcription. Additional label: lysine demethylation.]
Wade, P. A., and A. P. Wolffe, 2001 Nat Struct Biol 8: 575-7.
Methylation and Mutation
Cytosine is converted to thymine: cytosine → (methylation) → methylcytosine → (spontaneous deamination) → thymine
[Figure: chemical structures of cytosine, methylcytosine and thymine.]
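At the sequence level, deamination of a methylated CpG cytosine shows up as a transition. The Python sketch below is a deliberately crude illustration with a hypothetical function name: it assumes every CpG cytosine is methylated and that the resulting T is not repaired, and it lists the single-site mutants that could arise.

```python
def cpg_deamination_mutants(seq: str) -> list[str]:
    """List the single-site mutants produced by deaminating methylcytosine
    at each CpG of `seq` (given 5'->3').

    Each CpG yields two mutants: C->T when the cytosine on this strand is
    deaminated, and G->A when the cytosine on the complementary strand is
    deaminated and the change is copied back to this strand.
    """
    seq = seq.upper()
    mutants = []
    start = seq.find("CG")
    while start != -1:
        mutants.append(seq[:start] + "T" + seq[start + 1:])      # CpG -> TpG
        mutants.append(seq[:start + 1] + "A" + seq[start + 2:])  # CpG -> CpA
        start = seq.find("CG", start + 1)
    return mutants


print(cpg_deamination_mutants("ATGCGA"))  # ['ATGTGA', 'ATGCAA']
```

Both outcomes, C to T and G to A, are transitions, so this pathway contributes to mutation bias rather than selection bias.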
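Vertebrate mitochondrion
[Figure: replication of the vertebrate mitochondrial genome. Labels: parental H strand, parental L strand, daughter H strand, daughter L strand, OH and OL (origins of H-strand and L-strand replication).]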
Spontaneous deamination
Base            Deamination product  Product pairs with
Adenine         Hypoxanthine         C
Guanine         Xanthine             C
Cytosine        Uracil               A
Methylcytosine  Thymine              A
Each reaction consumes H2O and releases NH3.
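The pairing partners in the table determine what mutation each deamination leaves behind after replication. The Python sketch below encodes the table and works that out; the dictionary and function names are hypothetical, and it assumes the damaged base is not repaired before it is copied.

```python
# base -> (deamination product, base that the product pairs with),
# as listed on the slide; "mC" stands for methylcytosine.
DEAMINATION = {
    "A": ("hypoxanthine", "C"),
    "G": ("xanthine", "C"),
    "C": ("uracil", "A"),
    "mC": ("thymine", "A"),
}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
PURINES = {"A", "G"}


def mutation_after_replication(base: str) -> tuple[str, str, str]:
    """Return (original base, base after replication, kind of change)."""
    _, partner = DEAMINATION[base]
    original = "C" if base == "mC" else base
    # The product templates `partner`; copying `partner` in the next round
    # places its complement at the original position.
    new = COMPLEMENT[partner]
    if new == original:
        kind = "none"
    elif (original in PURINES) == (new in PURINES):
        kind = "transition"
    else:
        kind = "transversion"
    return original, new, kind


for base in DEAMINATION:
    print(base, mutation_after_replication(base))
```

None of the four reactions yields a transversion: adenine, cytosine and methylcytosine give transitions, and xanthine still pairs with C, so guanine is unchanged.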
Transversion can erase transitions
Transitions can erase transitions, and transversions can erase transversions.
However, a transversion can erase many transitions occurring before it, and subsequent transitions cannot erase the transversion:
AACGCTTGACG
AACGCTTAACG (G → A, transition)
AACGCTTGACG (A → G, transition)
AACGCTTCACG (G → C, transversion)
AACGCTTTACG (C → T, transition)
Although a transition could also erase 2n transversions occurring before it, this is rare because transversions are generally much rarer than transitions:
AACGCTTGACG
AACGCTTTACG (G → T, transversion)
AACGCTTAACG (T → A, transversion)
AACGCTTGACG (A → G, transition)
Transitions tend to be missed in counting much more frequently than transversions.
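The two substitution histories above can be replayed to compare what actually happened at the site with what would be observed. The Python sketch below is illustrative; `classify` and `summarize_path` are hypothetical names, and a history is written as the string of successive bases at the changing site.

```python
PURINES = {"A", "G"}


def classify(x: str, y: str) -> str:
    """'s' for a transition, 'v' for a transversion, '' if unchanged."""
    if x == y:
        return ""
    return "s" if (x in PURINES) == (y in PURINES) else "v"


def summarize_path(states: str) -> tuple[tuple[int, int], str]:
    """Return ((actual transitions, actual transversions), observed change).

    `states` lists the successive bases at one site, e.g. "GAGCT" for
    G -> A -> G -> C -> T. The observed change compares only the first
    and the last state, which is all a sequence comparison can see.
    """
    steps = [classify(a, b) for a, b in zip(states, states[1:])]
    actual = (steps.count("s"), steps.count("v"))
    observed = classify(states[0], states[-1])
    return actual, observed


print(summarize_path("GAGCT"))  # ((3, 1), 'v')
print(summarize_path("GTAG"))   # ((1, 2), '')
```

In the first history three transitions and one transversion are observed as a single transversion; in the second, two transversions and one transition leave no observable difference at all.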
Summary
• Selection: Transitions are tolerated more than transversions by natural selection because
– they are more likely to be synonymous in protein-coding sequences than transversions
– they are less likely to disrupt RNA secondary structure than transversions.
• Mutation: Transitional mutations occur more frequently than transversional mutations because
– Misincorporation during DNA replication occurs more frequently between two purines or between two pyrimidines than between a purine and a pyrimidine
– A purine is more likely to mutate chemically to another purine than to a pyrimidine (e.g., through spontaneous deamination). The same holds for pyrimidines.
• Bias in counting: Transitions tend to be missed in counting much more frequently than transversions (which necessitates the substitution models)
Nucleotide Substitutions
[Figure: two lineages descending from ACACTCGGATTAGGCT, with substitutions labelled single, multiple, coincidental, parallel, convergent and back. Intermediate states shown: T, C, C and AGACTCGGATTAGGCT.]
Observed sequences:
ATACTCAGGTTAAGCT
ACAATCCGGTTAAGCT
Actual number of changes during the evolution of the two daughter sequences: 12.
Observed number of differences between the two daughter sequences: 3.
Correcting for multiple substitutions aims to estimate the true number of changes, i.e., 12.
From WHL
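As one concrete instance of such a correction, the Python sketch below applies the JC69 formula $d = -\frac{3}{4}\ln(1 - \frac{4}{3}p)$ to the two observed sequences. The function names are hypothetical. JC69 is used here only because it is the simplest model, and with 16 sites the numbers are purely illustrative; a schematic example this short cannot be expected to recover the 12 actual changes.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jc69_distance(p: float) -> float:
    """JC69-corrected number of substitutions per site."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ATACTCAGGTTAAGCT", "ACAATCCGGTTAAGCT")
d = jc69_distance(p)
print(p)                 # 0.1875 (3 differences over 16 sites)
print(d > p)             # True: the corrected distance exceeds the observed one
```

The corrected distance is always at least as large as the observed proportion, and the gap widens as sequences diverge and multiple hits accumulate.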
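Substitution models and phylogenetics
• A substitution model describes the evolutionary process so as to correct for multiple hits.
• A phylogenetic reconstruction method implicitly or explicitly assumes a substitution model.
• A phylogenetic method assuming a wrong substitution model will typically produce wrong trees.

Unrestricted model (rows: original nucleotide; columns: replacement nucleotide):
$$
\begin{array}{c|cccc}
 & A & G & C & T \\ \hline
A & \cdot & a_1 & a_2 & a_3 \\
G & a_7 & \cdot & a_4 & a_5 \\
C & a_8 & a_9 & \cdot & a_6 \\
T & a_{10} & a_{11} & a_{12} & \cdot
\end{array}
$$
Time-reversible model:
$$
\begin{array}{c|cccc}
 & A & G & C & T \\ \hline
A & \cdot & a_1\pi_G & a_2\pi_C & a_3\pi_T \\
G & a_1\pi_A & \cdot & a_4\pi_C & a_5\pi_T \\
C & a_2\pi_A & a_4\pi_G & \cdot & a_6\pi_T \\
T & a_3\pi_A & a_5\pi_G & a_6\pi_C & \cdot
\end{array}
$$
The diagonal of a transition probability matrix is subject to the constraint that each row sums up to 1.

JC69: $\pi_i = 0.25$; $a_i = c$
F81/TN84: $\pi_A, \pi_C, \pi_G, \pi_T$; $a_i = c$
K80: $\pi_i = 0.25$; $a_1 = a_6 = a_7 = a_{12} = \alpha$ (transitions); $a_2 = a_3 = a_4 = a_5 = a_8 = a_9 = a_{10} = a_{11} = \beta$ (transversions)
HKY85: $\pi_A, \pi_C, \pi_G, \pi_T$; $a_1 = a_6 = a_7 = a_{12} = \alpha$; $a_2 = a_3 = a_4 = a_5 = a_8 = a_9 = a_{10} = a_{11} = \beta$
TN93: $\pi_A, \pi_C, \pi_G, \pi_T$; $a_1 = a_7 = \alpha_1$; $a_6 = a_{12} = \alpha_2$; $a_2 = a_3 = a_4 = a_5 = a_8 = a_9 = a_{10} = a_{11} = \beta$
GTR: the time-reversible matrix above
Unrestricted: no equilibrium $\pi_i$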
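To show how a model that separates transitions from transversions corrects a distance, the Python sketch below implements the standard K80 (Kimura two-parameter) distance, $d = -\frac{1}{2}\ln(1 - 2P - Q) - \frac{1}{4}\ln(1 - 2Q)$, where P and Q are the observed proportions of transitional and transversional differences. The function name is hypothetical and the input proportions are made up for illustration.

```python
import math


def k80_distance(P: float, Q: float) -> float:
    """K80 distance from the proportions of transitional (P) and
    transversional (Q) differences between two aligned sequences."""
    w1 = 1 - 2 * P - Q
    w2 = 1 - 2 * Q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K80 correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# No transition bias (Q = 2P): K80 reduces to the JC69 value for p = P + Q.
print(k80_distance(1 / 16, 2 / 16))

# Strong transition bias with the same kind of input: the K80 estimate is
# larger than the JC69 estimate for p = P + Q = 0.15.
print(k80_distance(0.10, 0.05))
print(-0.75 * math.log(1 - 4 * 0.15 / 3))
```

The comparison illustrates why the choice of model matters: ignoring transition bias (JC69) underestimates the distance whenever transitions are over-represented among the observed differences.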
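The TN93 model as an example
$$
Q = \begin{array}{c|cccc}
 & T & C & A & G \\ \hline
T & \cdot & \alpha_2\pi_C & \beta\pi_A & \beta\pi_G \\
C & \alpha_2\pi_T & \cdot & \beta\pi_A & \beta\pi_G \\
A & \beta\pi_T & \beta\pi_C & \cdot & \alpha_1\pi_G \\
G & \beta\pi_T & \beta\pi_C & \alpha_1\pi_A & \cdot
\end{array}
$$
$\pi$ - frequency parameters
$\alpha_1, \alpha_2, \beta$ - rate ratio parameters
In addition to the illustrated assumptions, it also assumes that the frequency and rate ratio parameters do not change over time, i.e., the substitution process is stationary.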
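A rate matrix of this form can be assembled from the two categories of parameters. The Python sketch below is a minimal illustration with a hypothetical function name and made-up parameter values; it assumes $\alpha_1$ applies to A ↔ G, $\alpha_2$ to C ↔ T and $\beta$ to all transversions, and it sets each diagonal entry so that the row of Q sums to zero.

```python
ORDER = "TCAG"
PURINES = {"A", "G"}


def tn93_rate_matrix(pi: dict, alpha1: float, alpha2: float, beta: float) -> dict:
    """Build Q as a nested dict: q[i][j] is the rate of change from i to j."""
    if abs(sum(pi[b] for b in ORDER) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    q = {}
    for i in ORDER:
        row = {}
        for j in ORDER:
            if i == j:
                continue
            if i in PURINES and j in PURINES:
                rate = alpha1          # A <-> G transition
            elif i not in PURINES and j not in PURINES:
                rate = alpha2          # C <-> T transition
            else:
                rate = beta            # transversion
            row[j] = rate * pi[j]
        row[i] = -sum(row.values())    # each row of Q sums to zero
        q[i] = row
    return q


# Illustrative values only.
pi = {"T": 0.3, "C": 0.2, "A": 0.3, "G": 0.2}
Q = tn93_rate_matrix(pi, alpha1=4.0, alpha2=6.0, beta=1.0)
for i in ORDER:
    print(i, [round(Q[i][j], 3) for j in ORDER])
```

Because every off-diagonal rate is a symmetric factor times the frequency of the target nucleotide, the matrix satisfies $\pi_i q_{ij} = \pi_j q_{ji}$, which is the time-reversibility that the frequency parameters are meant to encode.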
Substitution Models
• There are three types of substitution models in molecular evolution
– Nucleotide-based
– Amino acid-based
– Codon-based
• Substitution models are characterized by two categories of parameters: the frequency parameters and the rate ratio parameters, and different models differ by their assumptions concerning these two categories of parameters.