Introduction
For bacterial genomes the main source of heterogeneity of the genetic text is the signal corresponding to the presence of coding information
Mutual information in three consecutive letters:
$$M = \sum_{ijk} f_{ijk} \log_2 \frac{f_{ijk}}{p_i p_j p_k}$$
$f_{ijk}$ - frequency of triplet $ijk$
$p_i$ - frequency of letter $i$
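The mutual information formula can be evaluated directly from counts. The sketch below is only an illustration: the function name is hypothetical, and it assumes that triplets are counted in a sliding window with step 1 and that letter frequencies are taken over the whole sequence, neither of which the slides specify.

```python
from collections import Counter
from math import log2


def triplet_mutual_information(seq):
    """M = sum over ijk of f_ijk * log2(f_ijk / (p_i * p_j * p_k))."""
    seq = seq.upper()
    # p_i: single-letter frequencies over the whole sequence (assumption).
    letters = Counter(seq)
    n_letters = sum(letters.values())
    p = {a: c / n_letters for a, c in letters.items()}
    # f_ijk: triplet frequencies, sliding window with step 1 (assumption).
    triplets = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    n_triplets = sum(triplets.values())
    m = 0.0
    for t, c in triplets.items():
        f = c / n_triplets
        m += f * log2(f / (p[t[0]] * p[t[1]] * p[t[2]]))
    return m
```

A sequence with no dependence between neighbouring letters gives a value near zero, while a strongly periodic sequence such as a repeated triplet gives a clearly positive value.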
Different types of codon bias
Translational (mainly fast-growing bacteria)
GC-rich (or AT-rich) codons are preferred
Codons with G and C (or A and T) in 3rd position are preferred
Influenced by GC-skew, (G-C)/(G+C), or AT-skew
Influenced by strand (leading or lagging)
Codon bias connected with genes from other organisms (horizontally transferred)
Questions
How is codon usage of different genes in different genomes organized?
How to describe codon bias quantitatively?
How to detect what is the main source of codon bias?
Qualitative study of codon usage
We can describe every gene by its frequencies of codons – vector with 64 components (59 are interesting for studying codon bias)
PCA (principal component analysis) and CA (correspondence analysis) are the most common techniques for exploratory study of codon usage
Close points – genes with similar codon usage
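As an illustration of the 64-component description, the following sketch turns a coding sequence into a codon frequency vector. The names `CODONS` and `codon_frequencies` are hypothetical, and the sequence is assumed to be in frame, starting at its first base. In the standard genetic code, the 59 informative components are what remains after dropping the three stop codons and the single-codon amino acids Met (ATG) and Trp (TGG).

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every gene maps to a comparable vector.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds):
    """Return the 64-component codon frequency vector of one coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop an incomplete trailing codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    # Codons containing ambiguous letters (e.g. N) are ignored.
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]
```

Stacking these vectors row by row for all genes of a genome gives the matrix that PCA or CA would take as input.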
Common pattern of fast-growing bacteria
[Figure: codon usage plot with four gene classes: I (most genes), II (highly expressed), III (unusual), IV (hydrophobic)]
Typical case of fast-growing bacterium: Bacillus subtilis
[Figure: gene classes I, II, III, IV]
Escherichia coli
[Figure: gene classes I, II, III, IV]
Lower-eukaryotic organism: Saccharomyces cerevisiae
[Figure: gene classes I, II, III, IV]
Higher-eukaryotic organism: Caenorhabditis elegans
[Figure: gene classes I, II, III, IV]
Slow-growing bacterium: Helicobacter pylori
[Figure: gene classes I (most genes) and IV (hydrophobic) only]
Some conclusions: sources of sequence heterogeneity
Hydrophobicity
Evolutionary pressure (translational bias)
Horizontal transfer
Different GC(AT)-content
Strand heterogeneity
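Quantitative measures of bias
Effective number of codons $N_c$
Relative Synonymous Codon Usage:
$$\mathrm{RSCU}_i = \frac{f_i}{\frac{1}{N_k}\sum_{j=1}^{N_k} f_j}$$
where $f_i$ is the frequency of codon $i$ and the sum runs over the $N_k$ synonymous codons of the amino acid encoded by codon $i$
Relative Codon Adaptiveness, $w_i \in [0, 1]$:
$$w_i = \frac{f_i}{\max\{f_j : j \text{ over all synonyms of } i\}}$$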
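A minimal sketch of both measures, assuming the standard genetic code and codon counts supplied as a plain dictionary. The names `CODE`, `synonym_families` and `rscu_and_w` are hypothetical. Stop codons are treated here as one more synonymous family, which is a simplification.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def synonym_families():
    """Group codons by the amino acid they encode."""
    families = {}
    for codon, aa in CODE.items():
        families.setdefault(aa, []).append(codon)
    return families


def rscu_and_w(counts):
    """counts: dict codon -> count (or frequency). Returns (rscu, w) dicts."""
    rscu, w = {}, {}
    for codons in synonym_families().values():
        values = [counts.get(c, 0) for c in codons]
        total, top = sum(values), max(values)
        if total == 0:
            continue  # amino acid not observed: both measures are undefined
        for c, v in zip(codons, values):
            rscu[c] = v * len(codons) / total  # f_i over the family mean
            w[c] = v / top                     # f_i over the family maximum
    return rscu, w
```

For example, with lysine counts AAA = 3 and AAG = 1, the RSCU values are 1.5 and 0.5, and the adaptiveness values are 1 and 1/3.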
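Codon Adaptation Index (CAI)
Codon bias with respect to some small set of genes (Reference Set)
$$w_i = \frac{f_i}{\max\{f_j : j \text{ over all synonyms of } i\}}$$
$$\mathrm{CAI}(gene) = \left(\prod_{l=1}^{L} w_{i_l}\right)^{1/L}$$
$f_i$ – frequency of codon $i$, calculated over reference set $S$
$L$ – number of all codons in a gene; $i_l$ is the codon at position $l$
$$\ln \mathrm{CAI}(gene) = \sum_{i=1}^{64} g_i \ln w_i = (\mathbf{g}_{gene}, \ln \mathbf{w})$$
$g_i$ – frequency of codon $i$ in a gene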
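The two forms of the CAI (geometric mean over codon positions, and weighted sum of logarithms over codon types) can be checked against each other in a few lines. This is a sketch with hypothetical function names. It assumes that codons with missing or zero $w_i$ are skipped, which is one of several possible conventions and is not stated in the slides.

```python
from math import exp, log


def cai(codons, w):
    """Geometric mean of w over the codons of one gene (list of codons)."""
    # Assumption: codons without a positive w value are skipped.
    logs = [log(w[c]) for c in codons if w.get(c, 0) > 0]
    return exp(sum(logs) / len(logs)) if logs else 0.0


def cai_from_frequencies(g, w):
    """Same index from codon frequencies g_i: exp(sum_i g_i * ln w_i)."""
    # Assumption: g is normalized over the codons that have a positive w.
    return exp(sum(gi * log(w[c]) for c, gi in g.items() if w.get(c, 0) > 0))
```

For a gene with one AAA and one AAG codon and $w$ values 1 and 0.25, both functions give approximately 0.5.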
Problems:
Functions of genes need to be known
Expert needs to know the type of codon bias already (else the results will be meaningless)
The genes in Reference Set may not have the highest CAIs
We use as a Reference Set the most biased genes with respect to the dominating codon bias.
This bias is not necessarily translational
The most biased set of genes $S_R$
Calculate CAI (with $w_i$ calculated over $S_R$) for every gene in the genome
Then every gene in $S_R$ has a higher CAI than any gene that is not in $S_R$:
$$\mathrm{CAI}(gene \in S_R) > \mathrm{CAI}(gene \notin S_R)$$
We can have several $S_R$ for one genome; each of them reflects the presence of some type of codon bias
Algorithm for detecting dominating codon bias
1. Calculate $w_i$ over 100% of genes, and CAIs for all genes
2. Select the 50% of genes with the highest CAIs, calculate $w_i$, recalculate CAIs
3. Select the 25% of genes with the highest CAIs, calculate $w_i$, recalculate CAIs
… Continue halving the selection. Once the selection would be 1% of genes or less, repeat with 1% until convergence.
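The iteration above can be sketched as follows. All names (`adaptiveness`, `cai`, `dominant_bias_reference_set`, `floor`, `max_iter`) are hypothetical, the standard genetic code is assumed, codons without a $w_i$ value are skipped, and convergence is taken to mean that the selected 1% set stops changing. The slides do not fix these details.

```python
from collections import Counter
from itertools import product
from math import exp, log

AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product("TCAG", repeat=3), AMINO)}
FAMILIES = {}
for _codon, _aa in CODE.items():
    FAMILIES.setdefault(_aa, []).append(_codon)


def adaptiveness(genes):
    """w_i over a set of genes (each a list of codons)."""
    counts = Counter(c for gene in genes for c in gene)
    w = {}
    for codons in FAMILIES.values():
        top = max(counts[c] for c in codons)
        for c in codons:
            if top and counts[c]:
                w[c] = counts[c] / top
    return w


def cai(gene, w):
    """Geometric mean of w over a gene; codons without w are skipped."""
    logs = [log(w[c]) for c in gene if c in w]
    return exp(sum(logs) / len(logs)) if logs else 0.0


def dominant_bias_reference_set(genes, floor=0.01, max_iter=100):
    """genes: dict name -> list of codons. Returns (reference set, w)."""
    ref = set(genes)   # step 1: start from 100% of the genes
    fraction = 1.0
    for _ in range(max_iter):
        w = adaptiveness([genes[n] for n in ref])
        scores = {n: cai(g, w) for n, g in genes.items()}
        fraction = max(fraction / 2, floor)   # 50%, 25%, ... down to floor
        size = max(1, round(fraction * len(genes)))
        new_ref = set(sorted(scores, key=scores.get, reverse=True)[:size])
        if fraction == floor and new_ref == ref:
            break      # the set at the floor fraction no longer changes
        ref = new_ref
    return ref, adaptiveness([genes[n] for n in ref])
```

With a toy input of four genes that differ only in their AAA/AAG ratio and `floor=0.25`, the procedure converges on the single gene with the most skewed lysine codon usage. On a real genome the default `floor=0.01` corresponds to the 1% in the description above.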
Example of non-dominating bias
Genes in Class III (possibly horizontally transferred genes) of Bacillus subtilis
We can detect and measure this bias by finding the most biased genes in class III with an analog of the proposed algorithm