Analyses of ORFans in microbial and viral genomes
Journal club presentation on Mar. 14
Albert Yu
ORFan
Definition: an ORF with no detectable sequence similarity to any other ORF in the databases considered
Nearly all genomes have ORFans (df %)
The more genomes are sequenced, the more ORFans are found
Most are annotated as hypothetical proteins of unknown function (no experimental evidence)
ORFans (continued)
More data suggest that ORFans are:
• real, functional proteins
• supported by determined 3D structures
• conserved in closely related species (Ka/Ks)
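A Ka/Ks ratio well below 1 indicates purifying selection, which is why conservation measured this way counts as evidence that ORFans encode functional proteins. The sketch below is a minimal illustration, assuming the numbers of nonsynonymous/synonymous sites and differences have already been counted (for example with the Nei-Gojobori method); it applies the Jukes-Cantor correction to each proportion. The function names and example counts are hypothetical, not taken from the study.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an uncorrected proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: the Jukes-Cantor correction is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_sites: float, syn_sites: float,
          nonsyn_diffs: float, syn_diffs: float) -> float:
    """Ka/Ks from pre-computed site and difference counts (hypothetical helper)."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # nonsynonymous distance
    ks = jukes_cantor(syn_diffs / syn_sites)        # synonymous distance
    if ks == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return ka / ks


if __name__ == "__main__":
    # Made-up counts for one ORFan and its ortholog in a close relative.
    ratio = ka_ks(240.0, 60.0, 6.0, 9.0)
    print(f"Ka/Ks = {ratio:.3f}")  # well below 1: consistent with purifying selection
```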
Origin of ORFans?
Hypothesis: viral genome → microbial genome
ORFans are viral genes laterally transferred into microbial genomes (especially from phages)
Question: the origin of ORFans
Hypothesis to test: ORFans have been acquired through lateral gene transfer from viruses
Approach: search the viral sequence database for homologs of these microbial ORFans
Genome-wide quantitative study
• BLASTP
• 277 microbial genomes
• 1456 viral genomes
• H(g): the number of genomes containing ORFan g or at least one homolog of it (so H = 1 when all of g's homologs lie in its own genome)
• U(g) (uniqueness): the genomic distance between the genomes that contain ORFan g
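To make the definition of H(g) concrete, here is a minimal sketch that derives it from BLASTP results. It assumes the hits have already been filtered by an E-value cutoff and reduced to (query ORF, subject ORF) pairs, and that every ORF id maps to a genome id; the input format and the function name `h_values` are hypothetical, not the study's pipeline. The ORF's own genome is always counted, which is what makes H = 1 for singletons.

```python
from collections import defaultdict


def h_values(hits, genome_of):
    """H(g): number of distinct genomes containing ORF g or a BLASTP homolog of g.

    hits: iterable of (query_orf, subject_orf) pairs that passed the E-value cutoff.
    genome_of: dict mapping every ORF id to the id of the genome it belongs to.
    """
    genomes = defaultdict(set)
    for orf, genome in genome_of.items():
        genomes[orf].add(genome)  # the ORF's own genome always counts
    for query, subject in hits:
        genomes[query].add(genome_of[subject])
    return {orf: len(gs) for orf, gs in genomes.items()}


if __name__ == "__main__":
    genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
    hits = [("a1", "a2"), ("a1", "b1"), ("c1", "c1")]  # toy BLASTP hits
    print(h_values(hits, genome_of))
```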
Classification of ORFans
• Singleton: no homolog anywhere
H = 1, BLASTP hits = 1 (the ORF itself)
• Paralogous: homologs only in the same genome
H = 1, BLASTP hits > 1
• Orthologous: homologs only within very closely related microbial genomes
H > 1, U ≤ 0.1 (cutoff chosen from observation)
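The three classes above reduce to a few comparisons on H, the BLASTP hit count and U. The sketch below encodes those rules, assuming the hit count includes the self-hit and that U is only meaningful when H > 1; the function name and the "non-ORFan" fallback label are illustrative choices, not taken from the study.

```python
from typing import Optional


def classify_orf(h: int, blastp_hits: int, u: Optional[float] = None,
                 u_cutoff: float = 0.1) -> str:
    """Classify an ORF with the singleton / paralogous / orthologous rules.

    h: H(g), genomes containing g or a homolog of g.
    blastp_hits: number of BLASTP hits for g, counting the self-hit.
    u: U(g), genomic distance among genomes containing g (None when h == 1).
    """
    if h == 1:
        # All homologs (if any) are in g's own genome.
        return "singleton" if blastp_hits == 1 else "paralogous"
    if u is not None and u <= u_cutoff:
        # Homologs exist only in very closely related genomes.
        return "orthologous"
    return "non-ORFan"


if __name__ == "__main__":
    print(classify_orf(1, 1), classify_orf(1, 3), classify_orf(4, 6, 0.05))
```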
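The U-value for all ORFs in prokaryotic genomes
[Figure: distribution of U-values for all prokaryotic ORFs, with singleton/paralogous (S or P) and orthologous (O) ORFans marked]
In total:
ORFs: 818906
ORFans: 110186
S (singleton): 64324 (7.8%)
P (paralogous): 10419 (1.3%)
O (orthologous): 35443 (4.3%)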
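As a quick arithmetic check on these totals, the three ORFan classes sum exactly to 110186, and each class can be expressed as a share of all 818906 ORFs. The sketch below recomputes the shares from the slide's counts; the slide's one-decimal figures agree with these values to within rounding. Variable and function names are illustrative.

```python
TOTAL_ORFS = 818906
ORFAN_COUNTS = {"singleton": 64324, "paralogous": 10419, "orthologous": 35443}


def percent_of_total(counts, total):
    """Express each count as a percentage of total."""
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    pct = percent_of_total(ORFAN_COUNTS, TOTAL_ORFS)
    for name, value in pct.items():
        print(f"{name}: {value:.2f}% of all ORFs")
    all_orfans = sum(ORFAN_COUNTS.values())
    print(f"all ORFans: {all_orfans} ({100.0 * all_orfans / TOTAL_ORFS:.2f}%)")
```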
• ORFans-VH% (OVH): % of ORFans having homologs in viruses (0% to 63.8%)
• Non-ORFans-VH% (NOVH): % of non-ORFans having homologs in viruses (4.1% to 18.2%)
• The strength of the hypothesis is judged by comparing these two VH% values: OVH clearly higher than NOVH would support a viral origin of ORFans
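The comparison of OVH against NOVH can be sketched as two percentages computed over disjoint sets of ORFs. The sketch assumes we already know which ORFs are ORFans and which ORFs have at least one hit in the viral database; the function names and toy ids are hypothetical.

```python
def vh_percent(orfs, has_viral_homolog):
    """Percentage of the given ORFs that have at least one homolog in the viral database."""
    orfs = list(orfs)
    if not orfs:
        raise ValueError("empty ORF set")
    hits = sum(1 for orf in orfs if orf in has_viral_homolog)
    return 100.0 * hits / len(orfs)


def ovh_novh(all_orfs, orfans, has_viral_homolog):
    """Return (OVH, NOVH) for one genome or clade."""
    non_orfans = [orf for orf in all_orfs if orf not in orfans]
    return (vh_percent(orfans, has_viral_homolog),
            vh_percent(non_orfans, has_viral_homolog))


if __name__ == "__main__":
    all_orfs = [f"g{i}" for i in range(1, 11)]
    orfans = {"g1", "g2"}
    viral_hits = {"g1", "g5", "g6"}
    ovh, novh = ovh_novh(all_orfs, orfans, viral_hits)
    print(f"OVH = {ovh:.1f}%, NOVH = {novh:.1f}%")
```

With real data, OVH falling below NOVH (as the conclusion reports for most genomes) would weaken the viral-transfer hypothesis rather than support it.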
Percentages of microbial ORFs with homologs in viruses
[Figure: OVH (red) and NOVH (blue) across 24 phylogenetic clades of Bacteria and Archaea; labeled clades include Firmicutes and Gamma-proteobacteria]
The average % of OVH and NOVH in various groups
148 / 66 / 63
10% vs 9%
8.5% vs 2.7%
6.6% vs 0.8%
Conclusion
• Most OVH << NOVH: current evidence supporting the hypothesis is weak
• Firmicutes and Gamma-proteobacteria have the most homologs in viruses, which points to a bias in the viral database
Viral database bias
1456 viruses
280 phages (hosts: 109 Gamma-proteobacteria; 102 Firmicutes; 69 others)
Is the viral sampling adequate?
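The host breakdown of the 280 phages shows the bias numerically: Gamma-proteobacteria and Firmicutes together account for roughly three quarters of the sequenced phages. The sketch below recomputes those shares from the slide's counts; the dictionary labels are illustrative.

```python
PHAGE_HOSTS = {"Gamma-proteobacteria": 109, "Firmicutes": 102, "others": 69}


def host_shares(hosts):
    """Percentage of sequenced phages per host group."""
    total = sum(hosts.values())
    return {group: 100.0 * n / total for group, n in hosts.items()}


if __name__ == "__main__":
    shares = host_shares(PHAGE_HOSTS)
    for group, share in shares.items():
        print(f"{group}: {share:.1f}%")
    combined = shares["Gamma-proteobacteria"] + shares["Firmicutes"]
    print(f"Gamma-proteobacteria + Firmicutes: {combined:.1f}%")
```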
• 277 microbial genomes
• 1456 viruses
All-virus-DB: 43566 ORFs
• 280 phages (20%)
Phage-DB: 18368 ORFs (42%)
ORFans:
• all-virus: 13078 (30%) (vs. all-virus-DB); 8200 (vs. all nr, env-nr)
• all-phage: 6765 (vs. all-virus-DB); 7047 (vs. phage-DB)
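The percentages on this slide are simple ratios of the listed counts. The sketch below recomputes them, and also the share of phage ORFs that are ORFans when searched against the phage database (7047 of 18368), which comes out at about 38.4%. The labels are illustrative.

```python
def pct(part, whole):
    """part as a percentage of whole."""
    return 100.0 * part / whole


CHECKS = {
    "phages among all viruses": (280, 1456),
    "phage ORFs among all viral ORFs": (18368, 43566),
    "viral ORFans (vs. all-virus-DB)": (13078, 43566),
    "phage ORFans (vs. phage-DB)": (7047, 18368),
}

if __name__ == "__main__":
    for label, (part, whole) in CHECKS.items():
        print(f"{label}: {pct(part, whole):.1f}%")
```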
Some characteristics of ORFans
• Bacterial ORFans are shorter than non-ORFans on average
• Bacterial ORFans have significantly lower GC3 content (GC content at the third codon position) than non-ORFans
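GC3 is the fraction of G or C among the third positions of codons. A minimal sketch for one in-frame coding sequence is shown below; it assumes the sequence starts at the first codon, ignores a trailing partial codon, skips ambiguous bases, and does not exclude the stop codon (real analyses often do). The function name is illustrative.

```python
def gc3(cds: str) -> float:
    """GC content (%) at the third position of each complete codon of an in-frame CDS."""
    seq = cds.upper()
    # seq[2::3] picks positions 3, 6, 9, ... i.e. third bases of complete codons only.
    thirds = [base for base in seq[2::3] if base in "ACGT"]
    if not thirds:
        raise ValueError("no complete codons with unambiguous third bases")
    return 100.0 * sum(base in "GC" for base in thirds) / len(thirds)


if __name__ == "__main__":
    # Codons ATG, GCC, AAA -> third bases G, C, A.
    print(f"GC3 = {gc3('ATGGCCAAA'):.1f}%")
```

Averaging this value over ORFans and over non-ORFans of a genome gives the two means being compared on the slide.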
The length and GC3 content of viral ORFans and non-ORFans
Length: ORFans < non-ORFans
GC3%: ORFans < non-ORFans
The number of ORFs per genome in 1456 viruses
Focusing on phages: a higher % of ORFans (38.4%)
The number of phage ORFans keeps growing as more phages are sequenced (a consistent trend)
Does it drop to 0? No: it keeps increasing
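The question of whether the number of ORFans saturates can be explored with an accumulation curve: add genomes one at a time and, after each addition, count the homolog families seen in exactly one genome so far. The sketch below is a simplified illustration, assuming each genome has already been reduced to a set of homolog-family ids; it counts families rather than individual ORFs, and the toy data are hypothetical.

```python
from collections import Counter


def orfan_accumulation(genomes):
    """For each prefix of the genome list, count families present in exactly one genome.

    genomes: list of sets; each set holds the homolog-family ids found in one genome.
    """
    seen = Counter()  # family id -> number of genomes containing it so far
    curve = []
    for families in genomes:
        seen.update(set(families))
        curve.append(sum(1 for n in seen.values() if n == 1))
    return curve


if __name__ == "__main__":
    toy = [{"a", "b"}, {"b", "c"}, {"d"}]
    print(orfan_accumulation(toy))
```

A curve that keeps rising as genomes are added, rather than flattening toward a plateau, matches the "keep increasing" observation for phages.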
• Each microbial species is host to at least 10 phage species, so phage diversity is at least 10 times higher than microbial diversity
• Only 280 phage genomes in the database (low phage sampling)
Fewer than 5 phages
Virus sampling bias between and within groups
The H-value percentages for all phage ORFs and prokaryotic ORFs
Prokaryotes: ORFans 9.1%; orthologous 11.3%
Phages: ORFans 38.4%; orthologous 32.4%
The H-value percentages of phage ORFs
• Phage non-ORFans (11212): 7150 (63.8%) have prokaryotic homologs; of these, 4397 (61.5%) are prophage homologs
• Phage ORFans (7047): 1317 (18.7%) have prokaryotic homologs; of these, 589 (44.7%) are prophage homologs
• All phage ORFs (18248): 8467 (46.4%) have prokaryotic homologs; of these, 4987 (58.9%) are prophage homologs
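The percentages in these three rows are nested: the first is the share of phage ORFs with prokaryotic homologs, and the second is the share of those homologs that lie in prophages. The sketch below recomputes both from the slide's counts, which reproduces every reported value after rounding; that agreement is the basis for reading the rows this way. The names are illustrative.

```python
ROWS = {
    "phage non-ORFans": (11212, 7150, 4397),
    "phage ORFans": (7047, 1317, 589),
    "all phage ORFs": (18248, 8467, 4987),
}


def nested_percentages(total, with_prok_homolog, prophage):
    """(% of total with prokaryotic homologs, % of those that are prophage homologs)."""
    return (100.0 * with_prok_homolog / total,
            100.0 * prophage / with_prok_homolog)


if __name__ == "__main__":
    for label, counts in ROWS.items():
        prok, phage = nested_percentages(*counts)
        print(f"{label}: {prok:.1f}% prokaryotic homologs, {phage:.1f}% of those prophage")
```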