Microbial Genome Profile Analysis - Genome Profile DataBase

An integrated system for complete microbial genome analysis

P.C. Lyu
Department of Life Sciences & Institute of Bioinformatics and Structural Biology, National Tsing Hua University

Source: alpha.life.nthu.edu.tw/SysBio/gpdb_1001.pdf




Why study microbial genomes?

• Until whole-genome analysis became viable, the life sciences relied on a reductionist principle: dissecting cells and systems into their fundamental components for further study.
• Studies of whole genomes, and of whole-genome sequences in particular, give us a complete genomic blueprint of an organism.
• We can now begin to examine how all of these parts operate cooperatively to influence the activities and behavior of an entire organism, moving toward a complete understanding of its biology.
• Microbes are an excellent starting point for studies of this type because their genomes are relatively simple compared with those of higher, multicellular organisms.
• Studies of microbial genomes may provide crucial starting points for understanding the genomics of higher organisms.


Why study microbial genomes?

• Analysis of whole microbial genomes also provides insight into microbial evolution and diversity beyond single-protein or single-gene phylogenies.
• In practical terms, whole-genome analysis is also a powerful tool for identifying new applications in biotechnology and new approaches to treating and controlling pathogenic organisms.


History of microbial genome sequencing

• 1977: the first complete genome sequenced, bacteriophage φX174 (5,386 bp)
• First genome sequenced from random DNA fragments: bacteriophage λ (48,502 bp)
• 1986: mitochondrial (187 kb) and chloroplast (121 kb) genomes of Marchantia polymorpha sequenced
• Early 1990s: cytomegalovirus (229 kb) and vaccinia (192 kb) genomes sequenced
• 1995: first complete genome sequence of a free-living organism, Haemophilus influenzae (1.83 Mb)
• Late 1990s: many additional microbial genomes sequenced, including Archaea (Methanococcus jannaschii, 1996) and eukaryotes (Saccharomyces cerevisiae, 1996)


Most completely sequenced genomes are microbial.

163 Complete Microbial Genomes


Does whole-genome information reflect environmental biodiversity?

Phylogenomics: phylogeny on a whole-genome scale.

Genome composition bias in extremophiles:
• Hyperthermophiles (80-110 ℃)
• Psychrophiles (<15 ℃)
• Acidophiles (pH < 2)
• Alkaliphiles (pH > 10)
• Halophiles (0.2 M salt)
• …


What is GPDB (Genome Profile DataBase)?

• Information derived from both the nucleotide and protein sequences on a genome-wide scale.
• Presents and compares features of fully sequenced organisms in a graphical, easy-to-read way.

http://gpdb.life.nthu.edu.tw


GPDB Current Status

• 145 organisms (17 Archaea, 128 Bacteria)
• 223 complete sequences (157 chromosomes, 66 plasmids)
• 429,177 proteins (ORFs) in total


Definition of Genome Profile

• Basic information: species name, taxonomy, number of chromosomes/plasmids, genome size, ORF count…
• Nucleotide composition: A/T/G/C composition, GC/AT content, n-nucleotide frequency (n = 2, 3), codon usage…
• Amino acid composition: amino acid group composition, n-peptide frequency distribution (n = 1, 2), proteome length, Mw and pI distributions…


GPDB flowchart

• Source: NCBI RefSeq (whole-genome sequence information), COG (orthologous protein information), GTOP (PSI-BLAST 3D structure information)
• Process: Genome Profile Pipeline on a Linux PC cluster, producing the genome profile data
• GPDB: MySQL database, web interface built with Apache + PHP + GD2 + JpGraph
• Functions: Browse, Compare, Virtual 2D gel


GPDB database schema


NCBI GenBank / RefSeq whole-genome data → grab whole-genome DNA/protein sequences, including annotation → genome profile data analysis → Browse, Compare, 2D gel


Browse – Helicobacter pylori 26695


Basic Information


AT/GC Content & Skew
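A minimal Python sketch of the AT/GC content and skew calculations, assuming the usual definitions: GC content = (G + C) / (A + C + G + T) and GC skew = (G - C) / (G + C), computed over sliding windows. The window and step sizes are illustrative defaults, not GPDB's actual settings, and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_skew(seq: str) -> float:
    """GC skew (G - C) / (G + C); 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed_gc_skew(seq: str, window: int = 10_000, step: int = 5_000):
    """Return (window_start, skew) pairs for windows sliding along seq."""
    last_start = max(len(seq) - window, 0)
    return [(start, gc_skew(seq[start:start + window]))
            for start in range(0, last_start + 1, step)]


def cumulative_gc_skew(seq: str, window: int = 10_000, step: int = 5_000):
    """Running sum of the windowed skew values."""
    total = 0.0
    points = []
    for start, skew in windowed_gc_skew(seq, window, step):
        total += skew
        points.append((start, total))
    return points


if __name__ == "__main__":
    chromosome = "ATGGGCGTACGGCTAGCC" * 1000  # toy sequence, not real genome data
    print(f"GC content: {gc_content(chromosome):.3f}")
    print(cumulative_gc_skew(chromosome, window=2_000, step=1_000)[:3])
```

The cumulative skew is commonly used to estimate where replication starts and ends on a bacterial chromosome: its minimum tends to lie near the origin and its maximum near the terminus.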


Di- and tri-nucleotide composition
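Di- and tri-nucleotide composition (the n-nucleotide frequencies with n = 2, 3 in the genome profile definition) can be sketched as overlapping word counts normalized to relative frequencies. This assumes a single strand is counted and that words containing ambiguous bases (e.g. N) are skipped; the slides do not state how GPDB handles either case, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def nucleotide_frequencies(seq: str, n: int) -> dict[str, float]:
    """Relative frequency of each of the 4**n words of length n (overlapping,
    single strand); words containing non-ACGT symbols are skipped."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - n + 1):
        word = seq[i:i + n]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    total = sum(counts.values())
    # Report every possible word, including those that never occur.
    return {
        word: (counts[word] / total if total else 0.0)
        for word in ("".join(letters) for letters in product("ACGT", repeat=n))
    }


dinucleotides = nucleotide_frequencies("ATGCGCGATTACGN", 2)   # 16 entries
trinucleotides = nucleotide_frequencies("ATGCGCGATTACGN", 3)  # 64 entries
```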


Codon Usage
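A sketch of genome-wide codon usage, assuming the protein-coding sequences (CDS) have already been extracted in frame from the RefSeq annotation. The helper names are hypothetical; stop codons are counted like any other codon, and frequencies are pooled over all genes rather than computed per gene.

```python
from collections import Counter


def codon_usage(cds_list: list[str]) -> dict[str, float]:
    """Relative frequency of each codon observed across in-frame coding sequences.
    Trailing partial codons and codons with non-ACGT symbols are ignored."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def per_thousand(usage: dict[str, float]) -> dict[str, float]:
    """Express frequencies per 1,000 codons, a common way to tabulate codon usage."""
    return {codon: 1000 * freq for codon, freq in usage.items()}
```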


Amino acid composition
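Amino acid composition and n-peptide frequencies (n = 1, 2) follow the same counting pattern over the proteome. The grouping in `AA_GROUPS` is a hypothetical physicochemical classification chosen for illustration; the slides do not list the groups GPDB uses, and the function names are likewise hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical grouping for illustration only; each standard residue is in exactly one group.
AA_GROUPS = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "polar": set("STNQCY"),
    "nonpolar": set("AVLIMFWPG"),
}


def peptide_frequencies(proteins: list[str], n: int) -> dict[str, float]:
    """Relative frequency of overlapping n-peptides (n = 1 gives amino acid
    composition); peptides containing non-standard residues are skipped."""
    counts = Counter()
    for protein in proteins:
        protein = protein.upper()
        for i in range(len(protein) - n + 1):
            peptide = protein[i:i + n]
            if set(peptide) <= set(AMINO_ACIDS):
                counts[peptide] += 1
    total = sum(counts.values())
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product(AMINO_ACIDS, repeat=n)
    }


def group_composition(proteins: list[str]) -> dict[str, float]:
    """Fraction of standard residues that fall into each group of AA_GROUPS."""
    residues = [aa for protein in proteins for aa in protein.upper() if aa in AMINO_ACIDS]
    if not residues:
        return {group: 0.0 for group in AA_GROUPS}
    return {group: sum(aa in members for aa in residues) / len(residues)
            for group, members in AA_GROUPS.items()}
```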


Proteome Distribution

TMHMM prediction




Virtual 2D Gel

Simulate a 2D gel from pI and Mw

• Search by pH and Mw ranges to simulate a real 2D gel.
• Search by spot range to suggest candidate proteins for a given spot, e.g.:
  pI = 5.7 ± 5%, Mw = 100K ± 5%
• Filter out transmembrane proteins with the TMHMM program.

• E. L. L. Sonnhammer, G. von Heijne, and A. Krogh. In J. Glasgow et al., eds., Proc. Sixth Int. Conf. on Intelligent Systems for Molecular Biology, 175-182. AAAI Press, 1998.
• A. Krogh, B. Larsson, G. von Heijne, and E. L. L. Sonnhammer. Journal of Molecular Biology, 305(3):567-580, January 2001.
• S. Moller, M. D. Croning, and R. Apweiler. Bioinformatics, 18(1):218, January 2002.


Virtual 2D Gel Flowchart

RefSeq sequences from NCBI → calculate pI and Mw → protein records in MySQL → PHP / GD library

• RefSeq genomes: ftp://ftp.ncbi.nih.gov/genomes/Bacteria/
• EMBOSS package, pepstats program, available at http://www.hgmp.mrc.ac.uk/Software/EMBOSS/
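GPDB obtains pI and Mw from EMBOSS pepstats. The stand-alone sketch below is a swapped-in illustration of the same kind of calculation, not pepstats itself: Mw is the sum of average residue masses plus one water, and pI is the pH at which the Henderson-Hasselbalch net charge crosses zero, found by bisection. The pKa table is an assumption, so values will differ somewhat from pepstats; the function names are hypothetical.

```python
# Average residue masses in daltons (free amino acid mass minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528

# Illustrative textbook-style pKa values (assumption, not the pepstats table).
PKA_BASIC = {"N_TERM": 9.0, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_ACIDIC = {"C_TERM": 2.3, "D": 3.9, "E": 4.2, "C": 8.3, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average Mw (Da) of an unmodified chain; raises KeyError on non-standard residues."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    seq = seq.upper()
    # Protonated (positive) fraction of each basic group.
    positive = 1 / (1 + 10 ** (ph - PKA_BASIC["N_TERM"]))
    positive += sum(seq.count(aa) / (1 + 10 ** (ph - PKA_BASIC[aa])) for aa in "KRH")
    # Deprotonated (negative) fraction of each acidic group.
    negative = 1 / (1 + 10 ** (PKA_ACIDIC["C_TERM"] - ph))
    negative += sum(seq.count(aa) / (1 + 10 ** (PKA_ACIDIC[aa] - ph)) for aa in "DECY")
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pI by bisection on [0, 14]: net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2
```

Applying `isoelectric_point` and `molecular_weight` to every protein of a proteome yields the (pI, Mw) pairs that a virtual 2D gel plots.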


pI and Mw: Theoretical vs. Experimental

Comparison between calculated and experimentally obtained pI and Mw values for 47 randomly selected proteins from Pseudomonas aeruginosa.

Nucleic Acids Research, 2003, Vol. 31, No. 13, 3862-3865


Search by Gel Range without TMHMM Filter

pI 3-10, Mw 10,000-15,000

Transmembrane Helix

Annotation


Search by Gel Range with TMHMM Filter


Search by Spot Range without TMHMM Filter

pI = 7 ± 10%, Mw = 100,000 ± 10%


Search by Spot Range with TMHMM Filter
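The gel-range and spot-range searches, with and without the TMHMM filter, reduce to range queries over stored (pI, Mw) values. A minimal sketch, assuming a hypothetical `ProteinRecord` that stores the TMHMM-predicted helix count and treating "filter out transmembrane proteins" as dropping any protein with at least one predicted helix; in GPDB these queries run against MySQL rather than in-memory lists.

```python
from dataclasses import dataclass


@dataclass
class ProteinRecord:
    """Hypothetical stored record: one protein with its predicted values."""
    accession: str
    pi: float
    mw: float          # daltons
    tm_helices: int    # number of TMHMM-predicted transmembrane helices


def search_gel_range(records, pi_range, mw_range, exclude_tm=False):
    """Proteins whose pI and Mw fall inside a rectangular gel region."""
    pi_low, pi_high = pi_range
    mw_low, mw_high = mw_range
    return [
        r for r in records
        if pi_low <= r.pi <= pi_high
        and mw_low <= r.mw <= mw_high
        and not (exclude_tm and r.tm_helices > 0)
    ]


def search_spot_range(records, pi, mw, pi_tol=0.05, mw_tol=0.05, exclude_tm=False):
    """Candidates for one gel spot, e.g. pI = 5.7 +/- 5%, Mw = 100K +/- 5%."""
    return search_gel_range(
        records,
        (pi * (1 - pi_tol), pi * (1 + pi_tol)),
        (mw * (1 - mw_tol), mw * (1 + mw_tol)),
        exclude_tm,
    )
```

For example, `search_spot_range(records, 7.0, 100_000, pi_tol=0.10, mw_tol=0.10)` corresponds to the pI = 7 ± 10%, Mw = 100,000 ± 10% spot search, and `search_gel_range(records, (3, 10), (10_000, 15_000), exclude_tm=True)` to the gel-range search with the TMHMM filter.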




Example: pI distribution

4 E. coli strains

2 H. pylori strains

Lactococcus lactis subsp. lactis bv. diacetylactis

Bordetella pertussis

Wigglesworthia glossinidia

Halobacterium sp. NRC-1

Pyrococcus abyssi
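Each pI distribution compared in this section is a histogram of predicted pI values normalized by proteome size, so organisms with different ORF counts can be compared directly. A sketch, assuming unit-wide bins whose centers match the axis labels 0.5, 1.5, ..., 13.5; the function name is hypothetical.

```python
def pi_distribution(pi_values, bin_width=1.0, max_pi=14.0):
    """Fraction of proteins per pI bin, as (bin_center, fraction) pairs.

    Bins are [0, 1), [1, 2), ... with centers 0.5, 1.5, ...; values at or
    above max_pi go into the last bin."""
    n_bins = int(round(max_pi / bin_width))
    counts = [0] * n_bins
    for pi in pi_values:
        index = min(max(int(pi // bin_width), 0), n_bins - 1)
        counts[index] += 1
    total = sum(counts)
    return [((i + 0.5) * bin_width, counts[i] / total if total else 0.0)
            for i in range(n_bins)]
```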


Similar species, similar pI distribution

[Figure: pI distribution (percentage of proteins per pI bin, pI 0.5-13.5) for 4 E. coli strains (ecoli0-ecoli3)]

[Figure: pI distribution (percentage of proteins per pI bin, pI 0.5-13.5) for 2 H. pylori strains (hpylo0, hpylo1)]


Different species, different pI distribution

[Figure: pI distribution (percentage of proteins per pI bin, pI 0.5-13.5) for ecoli0, halob0, hpylo0, and wglos0]


[Figure: pI distribution (percentage of proteins per pI bin, pI 0.5-13.5) for all 145 species]


How to compare?

Interactive online analysis helps us explore different combinations.

Easy to read.


Transform

[Bacteria] - Aquifex aeolicus VF5


Transform


Hierarchical Clustering


Clustering

There are many clustering methods. We use Euclidean distances for hierarchical clustering. It is simply an easy way to read the data, not the only solution!
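Hierarchical clustering on Euclidean distances can be sketched in a few lines. The slides specify only the distance, so the average-linkage (UPGMA-style) merge rule used here is an assumption, as is the function name; each input vector could be, for instance, an organism's pI distribution or amino acid composition.

```python
import math
from itertools import combinations


def average_linkage_clustering(profiles: dict[str, list[float]]):
    """Agglomerative clustering with Euclidean distance and average linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a frozenset of organism names."""
    def cluster_distance(a, b):
        # Average linkage: mean Euclidean distance over all cross-cluster pairs.
        return sum(math.dist(profiles[x], profiles[y]) for x in a for y in b) / (len(a) * len(b))

    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters, then repeat until one remains.
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((a, b, cluster_distance(a, b)))
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    return merges
```

With the toy profiles `{"ecoli0": [0.10, 0.20], "ecoli1": [0.10, 0.25], "halob0": [0.50, 0.05]}`, the two E. coli vectors merge first (distance 0.05) before halob0 joins them.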


On-line comparison


On-line Clustering


All 429,177 ORFs: pI distribution

[Figure: pI distribution of all 429,177 GPDB proteins (number of proteins per pI bin, pI 0.5-14)]


Extreme pI distributions

[Figure: pI distribution (percentage of proteins per pI bin, pI 0.5-13.5) for baphi0, baphi1, baphi2, cbloc0, mgeni0, mpulm0, uurea0, and wglos0]

Endosymbiont?

[Figure: pI distribution (percentage of proteins per pI bin, pI 0.5-13.5) for blong0, halob0, mkand0, mther0, synec0, and synec1]

Halophile, methanogen?


Conclusion

• We have constructed a database (GPDB) that provides many whole-genome-scale features.
• We wrote a Perl package, the “Genome Profile Pipeline”, that analyzes the data automatically.
• GPDB can help us compare and analyze different microbial systems on a whole-genome scale.
• Integrated information from this database may be useful for data mining, comparative genomics, and systems biology research.


Thanks for your attention
