Analyses of ORFans in microbial and viral genomes. Journal club presentation on Mar. 14. Albert Yu


Page 1: Analyses of ORFans in microbial and viral genomes

Analyses of ORFans in microbial and viral genomes

Journal club presentation on Mar. 14

Albert Yu

Page 2: Analyses of ORFans in microbial and viral genomes

ORFan

Definition: an ORF with no detectable sequence similarity to other ORFs in the database considered

Nearly all genomes have ORFans (in differing percentages)

The more genomes are sequenced, the more ORFans are found

Most are annotated as hypothetical proteins of unknown function (no experimental evidence)

Page 3: Analyses of ORFans in microbial and viral genomes

ORFan (continued)

More data…

real, functional proteins

3D structure

conserved in closely related species (Ka/Ks)

Origin of ORFans?

Viral genome → Microbial genome?

Laterally transferred viral genes (especially from phages)

Page 4: Analyses of ORFans in microbial and viral genomes

Viral genome → Microbial genome

Page 5: Analyses of ORFans in microbial and viral genomes

Question: the origin of ORFans

Test hypothesis: ORFans have been acquired through lateral gene transfer from viruses

Approach: search for homologs of these microbial ORFans in the virus sequence database

Page 6: Analyses of ORFans in microbial and viral genomes

Genome-wide quantitative study

• BLASTP

• 277 microbial genomes

• 1456 viral genomes

• H(g): the number of genomes having at least one homolog of ORFan g

• U(g): uniqueness: the genomic distance between the genomes with ORFan g
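As a rough illustration of H(g), the sketch below counts the distinct genomes that appear among the BLASTP hits of each ORF. The hit list, the genome names, and the `h_values` helper are all hypothetical; the slides do not say how hits were parsed or which E-value cutoff was applied. It assumes each ORF hits itself, so its own genome is always counted.

```python
from collections import defaultdict

# Hypothetical BLASTP hits as (query ORF, genome of the subject) pairs.
# Assumption: every ORF hits itself, so its own genome is always present.
hits = [
    ("orfA", "genome1"),                        # self-hit only
    ("orfB", "genome1"), ("orfB", "genome1"),   # self-hit plus a paralog
    ("orfC", "genome1"), ("orfC", "genome2"),   # homolog in another genome
]

def h_values(hits):
    """Return H(g): the number of distinct genomes with a hit, per query ORF."""
    genomes = defaultdict(set)
    for orf, genome in hits:
        genomes[orf].add(genome)
    return {orf: len(found) for orf, found in genomes.items()}

print(h_values(hits))
```

With this toy input, `orfA` and `orfB` each have hits in one genome only, while `orfC` has hits in two.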

Page 7: Analyses of ORFans in microbial and viral genomes

Classification of ORFans

• Singleton: no homolog anywhere

H=1, BLASTP=1

• Paralogous: homologs only in the same genome

H=1, BLASTP>1

• Orthologous: homologs only in very closely related microbial genomes

H>1, U <= 0.1 (threshold chosen by observation)
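The three rules above can be written as a small decision function. This is only a sketch: the function name, argument names, and the "non-ORFan" fallback label are assumptions, since the slide defines the three ORFan classes but does not name the remaining case.

```python
def classify_orf(h, blastp_hits, u=None, u_cutoff=0.1):
    """Classify an ORF using the thresholds listed on the slide.

    h           -- H value: genomes with a homolog (own genome included)
    blastp_hits -- number of BLASTP hits (self-hit included)
    u           -- U value (uniqueness); only consulted when h > 1
    u_cutoff    -- 0.1, the cutoff given on the slide
    """
    if h == 1 and blastp_hits == 1:
        return "singleton"
    if h == 1 and blastp_hits > 1:
        return "paralogous"
    if h > 1 and u is not None and u <= u_cutoff:
        return "orthologous"
    # Assumption: anything else is treated as a non-ORFan.
    return "non-ORFan"

print(classify_orf(1, 1))          # singleton
print(classify_orf(1, 3))          # paralogous
print(classify_orf(4, 6, u=0.05))  # orthologous
print(classify_orf(4, 6, u=0.5))   # non-ORFan
```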

Page 8: Analyses of ORFans in microbial and viral genomes

The U-value for all ORFs in prokaryote genomes

In total:

ORFs: 818906

ORFans: 110186

Singleton (S): 64324 (7.8%)

Paralogous (P): 10419 (1.3%)

Orthologous (O): 35443 (4.3%)

[Figure: distribution of U-values; plot annotations: 0.64, "S or P", "O"]

Page 9: Analyses of ORFans in microbial and viral genomes

• ORFans-VH%(OVH): % of ORFans having homologs in viruses (0% ~ 63.8%)

• Non-ORFans-VH%(NOVH): % of non-ORFans having homologs in viruses (4.1% ~ 18.2%)

• The strength of the hypothesis = the difference between these two VH% values
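A minimal sketch of how OVH and NOVH could be computed for one genome is shown below. The input format (one pair of booleans per ORF), the function name, and the toy data are hypothetical, and reading the "strength" as OVH minus NOVH is an interpretation of the slide rather than something it states outright.

```python
def vh_percentages(orfs):
    """Compute (OVH, NOVH) for one genome.

    orfs -- iterable of (is_orfan, has_viral_homolog) boolean pairs,
            one per ORF (hypothetical input format).
    """
    orfan = [viral for is_orfan, viral in orfs if is_orfan]
    non_orfan = [viral for is_orfan, viral in orfs if not is_orfan]
    ovh = 100.0 * sum(orfan) / len(orfan) if orfan else 0.0
    novh = 100.0 * sum(non_orfan) / len(non_orfan) if non_orfan else 0.0
    return ovh, novh

# Toy genome: 4 ORFans (1 with a viral homolog),
# 10 non-ORFans (2 with a viral homolog).
toy = ([(True, True)] + [(True, False)] * 3
       + [(False, True)] * 2 + [(False, False)] * 8)

ovh, novh = vh_percentages(toy)
print(ovh, novh, ovh - novh)
```

A clearly positive difference would favour viral origin of ORFans; a difference near zero or below it would not.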

Page 10: Analyses of ORFans in microbial and viral genomes

Percentages of microbial ORFs with homologs in viruses

Red: OVH

Blue: NOVH

24 phylogenetic clades

Bacteria

Archaea

Firmicutes

Gamma proteobacteria

Page 11: Analyses of ORFans in microbial and viral genomes

The average % of OVH and NOVH in various groups

Genomes per group: 148, 66, 63

Values per group: 10% vs 9%; 8.5% vs 2.7%; 6.6% vs 0.8%

Page 12: Analyses of ORFans in microbial and viral genomes

Conclusion

• Most OVH << NOVH: current evidence supporting the hypothesis is weak

• Firmicutes and Gamma-proteobacteria have the highest number of homologs in viruses (viral database is biased)

Viral database bias

1456 viruses

280 phages (109 Gamma-proteobacteria; 102 Firmicutes; 69 others)

Sampling?

Page 13: Analyses of ORFans in microbial and viral genomes

Viral genome → Microbial genome

Page 14: Analyses of ORFans in microbial and viral genomes

• 277 microbial genomes

• 1456 viruses

All-virus-DB: 43566 ORFs

• 280 phages (20%)

Phage-DB: 18368 ORFs (42%)

ORFans:

all-virus: 13078 (30%) (vs. all-virus-DB); 8200 (vs. all nr, env-nr)

all-phage: 6765 (vs. all-virus-DB); 7047 (vs. phage-DB)

Page 15: Analyses of ORFans in microbial and viral genomes

Some characteristics of ORFans

• Bacterial ORFans are shorter than non-ORFans on average

• Bacterial ORFans have significantly lower GC3 content than non-ORFans
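GC3 is the G+C fraction at third codon positions. The helper below is an illustrative sketch, not the presenter's code; the function name is hypothetical, and whether the original analysis counted the stop codon is not stated, so this version simply uses every complete codon in the sequence it is given.

```python
def gc3(cds):
    """Fraction of G or C at third codon positions of a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # ignore a trailing partial codon
    third = cds[2:usable:3]            # every third base, starting at index 2
    if not third:
        return 0.0
    return sum(base in "GC" for base in third) / len(third)

# Codons ATG GCC AAA TAA have third bases G, C, A, A.
print(gc3("ATGGCCAAATAA"))  # 0.5
```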

Page 16: Analyses of ORFans in microbial and viral genomes

The length of Viral ORFans and non-ORFans

Length: Non-ORFans > ORFans

Page 17: Analyses of ORFans in microbial and viral genomes

Length: ORFans < non-ORFans

GC3%: ORFans < non-ORFans

Page 18: Analyses of ORFans in microbial and viral genomes

The number of ORFs per genome in 1456 viruses

Focusing on phage: higher %

Page 19: Analyses of ORFans in microbial and viral genomes

Growth in the number of phage ORFans (consistent)

Drop to 0?

Keep increasing

38.4%

Page 20: Analyses of ORFans in microbial and viral genomes

• Each microbial species is a host for at least 10 phage species, so phage diversity is at least 10 times higher than microbial diversity

• Only 280 phage genomes in database (low phage sampling)

Page 21: Analyses of ORFans in microbial and viral genomes

Less than 5 phages

Virus sampling bias between and within groups

Page 22: Analyses of ORFans in microbial and viral genomes

The H-value percentages for all phage ORFs and prokaryotic ORFs

Prokaryotes: 9.1% ORFans; 11.3% ortho

Phages: 38.4% ORFans; 32.4% ortho

Page 23: Analyses of ORFans in microbial and viral genomes

the H-value percentages of phage ORFs

Page 24: Analyses of ORFans in microbial and viral genomes

• 4397 (61.5%) / 7150 (63.8%) / 11212 (prophage / prokaryotic homologs / phage non-ORFans)

• 589 (44.7%) / 1317 (18.7%) / 7047 (prophage / prokaryotic homologs / phage ORFans)

• 4987 (58.9%) / 8467 (46.4%) / 18248 (prophage / prokaryotic homologs / phage ORFs)
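The percentages on this slide use two different denominators, which is easy to miss. Recomputing them from the counts suggests that the prophage percentage is taken relative to the ORFs with prokaryotic homologs, and the prokaryotic-homolog percentage relative to the row total. The short check below reproduces the slide's figures under that reading; the dictionary layout and the `pct` helper are illustrative only.

```python
rows = {
    # label: (prophage homologs, prokaryotic homologs, total ORFs)
    "phage non-ORFans": (4397, 7150, 11212),
    "phage ORFans": (589, 1317, 7047),
    "phage ORFs": (4987, 8467, 18248),
}

def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

for label, (prophage, prokaryotic, total) in rows.items():
    # prophage % is relative to prokaryotic homologs;
    # prokaryotic % is relative to the row total.
    print(label, pct(prophage, prokaryotic), pct(prokaryotic, total))
```

Under this reading, 18.7% of phage ORFans have a prokaryotic homolog, compared with 63.8% of phage non-ORFans.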