
Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Page 1: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Putting together synteny and metabolic information to achieve relevant expert annotation of microbial genomes

Dr Claudine Médigue

Page 2: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

What is MaGe? Yet another bacterial annotation platform!

Its development started in Oct. 2002

Context: the Acinetobacter sp. ADP1 genome annotation (Summer 2004)

Shares functionalities with other existing annotation systems:

• An automatic annotation process: syntactic and functional annotations, functional annotation and classification inferences
• A relational database (MySQL) used to store the sequences and the analysis results
• A Web interface allowing multiple users to simultaneously annotate a genome
• Connectivity to other databases or systems

Developed by biologists involved in manual expert annotation:

• A graphical interface which focuses on gene context and synteny results with available bacterial proteomes

Page 3: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Introduction to the Prokaryotic Genome DataBase (PkGDB)

Relational DBMS (MySQL)

Purpose: storage of ‘clean’ and complete annotation data which are subsequently used in the comparative genomic analysis.

• Complete bacterial genomes (RefSeq NCBI and Genome Reviews EBI)
• New bacterial genomes (annotation projects)
• Annotation tool results:
  Intrinsic: genes, signals, repeats, …
  Extrinsic: BLAST, InterPro, COG, synteny, …

Integration in PkGDB:

• Management of frameshifts
• Correction of obvious errors
• Syntactic re-annotation
• Addition of missing gene annotations

NAR (WS), 2003

NAR (WS), 2005
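
To make the idea of storing genomic objects and their intrinsic and extrinsic analysis results in a relational database more concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module instead of MySQL so that it runs without a server. The table names, column names, and the toy rows are all hypothetical; they are not the PkGDB schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical tables and toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE genome (
            genome_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            origin    TEXT NOT NULL
        );
        CREATE TABLE genomic_object (
            object_id INTEGER PRIMARY KEY,
            genome_id INTEGER NOT NULL REFERENCES genome(genome_id),
            label     TEXT NOT NULL,
            kind      TEXT NOT NULL
        );
        CREATE TABLE analysis_result (
            result_id INTEGER PRIMARY KEY,
            object_id INTEGER NOT NULL REFERENCES genomic_object(object_id),
            method    TEXT NOT NULL,
            category  TEXT NOT NULL CHECK (category IN ('intrinsic', 'extrinsic'))
        );
        """
    )
    # Toy content only: which methods were run on this gene is an assumption.
    conn.execute(
        "INSERT INTO genome VALUES (1, 'Acinetobacter sp. ADP1', 'annotation project')"
    )
    conn.execute("INSERT INTO genomic_object VALUES (1, 1, 'ACIAD0574', 'CDS')")
    conn.executemany(
        "INSERT INTO analysis_result (object_id, method, category) VALUES (?, ?, ?)",
        [
            (1, "BLAST", "extrinsic"),
            (1, "InterPro", "extrinsic"),
            (1, "gene prediction", "intrinsic"),
        ],
    )
    return conn


def methods_for(conn, label, category):
    """List the analysis methods of one category stored for a genomic object."""
    rows = conn.execute(
        """
        SELECT r.method
        FROM analysis_result AS r
        JOIN genomic_object AS o ON o.object_id = r.object_id
        WHERE o.label = ? AND r.category = ?
        ORDER BY r.method
        """,
        (label, category),
    ).fetchall()
    return [method for (method,) in rows]
```

Keeping results in a separate table keyed by genomic object is one common way to let several tools annotate the same gene independently; whether PkGDB does this is not stated in the slides.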

Page 4: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Simplified structure of PkGDB

[Diagram: PkGDB schema. Labels recovered from the figure:]

• Genomic objects; automatic and manual functional assignations
• Published genomes (re-annotation project); newly sequenced genomes (annotation project)
• Gene prediction: AMIGene
• Annotation management: annotation history, sequence updates and annotation transfer, annotator management, project customization
• Functional predictions: protein similarities, domains and motifs, enzymatic functions, helixes and signal peptides
• Orthologs & paralogs, syntenies, specific regions
• Functional classification: MultiFun, GeneOntology, BioCyc
• External resources: UniProt, KEGG, COG, InterPro, Genome Reviews, NCBI RefSeq
• Reference annotation for model organisms: EcoGene, GenProtEC, SubtiList
• Multiple correspondences; local rearrangements (ins/del): Boyer et al., Bioinformatics (Nov 2005)

Page 5: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

How to read the synteny maps?

ACIAD0574 (hutH)

Two ‘homologs’ to ACIAD0574 on the P. aeruginosa genome.

This P. syringae gene (PSPTO0599/hutH-1) is a putative ‘ortholog’ to ACIAD0574 and is involved in a synteny group containing 17 genes (in green).

These two P. syringae genes (PSPTO5274/hutH-2 and PSPTO5276/hutH-3) are similar to ACIAD0574 (putative paralogs of PSPTO0599).
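
The distinction between a homolog that sits inside a synteny group and homologs that do not can be illustrated with a small sketch. It assumes each homolog pair is given as (gene rank in genome A, gene rank in genome B) and links two pairs when they are close in both genomes; groups are the connected components of that relation. The function name, the `max_gap` threshold, and the toy ranks are assumptions, and this gap heuristic is a simplification, not the method used by MaGe.

```python
def synteny_groups(pairs, max_gap=5):
    """Group homolog pairs (rank_a, rank_b) that are neighbours in both genomes.

    Two pairs are linked when their ranks differ by at most max_gap in
    genome A and in genome B. Connected components with at least two
    pairs are returned as synteny groups.
    """
    pairs = sorted(set(pairs))
    parent = list(range(len(pairs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            a1, b1 = pairs[i]
            a2, b2 = pairs[j]
            if abs(a1 - a2) <= max_gap and abs(b1 - b2) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i, pair in enumerate(pairs):
        groups.setdefault(find(i), []).append(pair)
    # A lone pair is a similarity hit without conserved gene context.
    return sorted((g for g in groups.values() if len(g) >= 2), key=lambda g: g[0])


# Toy ranks: gene 10 of genome A has two homologs in genome B (ranks 100 and 500).
toy_pairs = [(10, 100), (11, 101), (12, 103), (10, 500), (50, 900)]
print(synteny_groups(toy_pairs, max_gap=3))
```

In the toy data, only the homolog at rank 100 shares its neighbourhood with gene 10, which mirrors the putative ortholog versus putative paralog reading of the map.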

Page 6: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

A larger view of the previous Acinetobacter ADP1 region

[Figure: synteny map. Labeled genes: 0562 (hisS), 0574 (hutH), 0582-0583 (fabG-fabF)]

4 of 138 genomes in PkGDB

9 of 284 complete microbial proteomes (RefSeq section)

Page 7: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

How are genes organized in a synteny group?

Synteny with Ralstonia solanacearum Mega Plasmid

Synteny with Ralstonia solanacearum chromosome

Page 8: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Synteny maps are useful to annotate gene fusion/fission

Colored rectangles represent the part of the protein which aligns with the corresponding Acinetobacter protein.

Fusion of genes involved in DNA replication:

• dnaQ (DNA pol III, epsilon subunit + proofreading 3’-5’ exonuclease)
• rnhA (degradation of Okazaki fragments)

Corresponding genes in other genomes:

• YPO1082 (dnaQ), YPO1081 (rnhA)
• STM0264 (dnaQ), STM0263 (rnhA)
• NMB1514 (dnaQ), NMB1618 (rnhA)
• PA1816 (dnaQ), PA1815 (rnhA)
• PSPTO3711 (dnaQ), PSPTO3712 (rnhA)
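
The partial-alignment rectangles suggest a simple test for fusion candidates: two different genes of another genome align to distinct, barely overlapping parts of one query protein and together cover most of it. The sketch below assumes alignment hits are given as (subject gene, query start, query end) with 1-based inclusive coordinates. The function name, thresholds, gene labels, and coordinates are all hypothetical.

```python
from itertools import combinations


def fusion_candidates(query_len, hits, min_cover=0.8, max_overlap=10):
    """Return pairs of subject genes that jointly tile one query protein.

    hits: iterable of (subject_gene, q_start, q_end), 1-based inclusive
    coordinates on the query protein.
    """
    found = []
    ordered = sorted(hits, key=lambda h: h[1])
    for (g1, s1, e1), (g2, s2, e2) in combinations(ordered, 2):
        if g1 == g2:
            continue
        # Residues of the query covered by both alignments.
        overlap = max(0, min(e1, e2) - max(s1, s2) + 1)
        covered = (e1 - s1 + 1) + (e2 - s2 + 1) - overlap
        if overlap <= max_overlap and covered / query_len >= min_cover:
            found.append((g1, g2))
    return found


# Invented example: a 400-residue query matched by two separate genes.
print(fusion_candidates(400, [("geneX", 1, 230), ("geneY", 240, 400)]))
```

A single full-length hit yields no candidate, which corresponds to the ordinary one-to-one case on the map.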

Page 9: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Simplified structure of PkGDB

[Diagram: the PkGDB schema labels from Page 4 are repeated here.]

• PRIAM (http://bioinfo.genopole-toulouse.prd.fr/priam/): position-specific scoring matrices ('profiles') built with SwissProt proteins
• KEGG (www.genome.jp/kegg/): dynamic requests
• BioCyc (http://www.biocyc.org/): local installation
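
As a generic illustration of what a position-specific scoring matrix ('profile') is, the sketch below builds log-odds scores from a tiny ungapped alignment and slides the matrix along a sequence. It is not PRIAM's procedure: the uniform background, the pseudocount, the function names, and the toy sequences are assumptions made for the example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build log2-odds scores per column from equal-length aligned sequences."""
    n = len(aligned)
    background = 1.0 / len(AMINO_ACIDS)  # uniform background, an assumption
    pssm = []
    for col in range(len(aligned[0])):
        column = [seq[col] for seq in aligned]
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / (n + pseudocount * len(AMINO_ACIDS))
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def best_window(pssm, seq):
    """Return (score, offset) of the best-scoring ungapped window in seq."""
    width = len(pssm)
    best = None
    for offset in range(len(seq) - width + 1):
        window = seq[offset:offset + width]
        # Residues outside the alphabet contribute a neutral score of 0.
        score = sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(window))
        if best is None or score > best[0]:
            best = (score, offset)
    return best


profile = build_pssm(["HGD", "HGD", "HAD"])
print(best_window(profile, "MKHGDLL"))
```

Positions that are conserved in the alignment receive positive scores for the conserved residue and negative scores for everything else, so a sequence is matched by where it agrees with the family rather than by overall similarity to one member.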

Page 10: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Setting up a new annotation project: an example

Newly sequenced genomes:

• Bradyrhizobium sp. ORS278 (Genoscope) -> 1 chr (7.5 Mb)
• Bradyrhizobium sp. BTAi (DOE/JGI) -> 1 chr (8.5 Mb)

Genomes in public databanks:

• Mesorhizobium loti (00)
• Sinorhizobium meliloti (01)
• Bradyrhizobium japonicum (02)
• Rhodopseudomonas palustris (03)

Available related sequences:

• Rhizobium leguminosarum (Sanger Center)
• Rhodobacter sphaeroides (DOE/JGI)
• Rhodospirillum rubrum (DOE/JGI)

Processing:

• Complete pipeline of automatic annotations
• Re-annotation process (pseudogenes, missing genes)
• Automatic syntactic annotations (in some cases, functional annotations)

Searching for synteny groups with complete proteomes available in the RefSeq section (NCBI, 284 to date) and in PkGDB (curated genomes, 138 to date)

PkGDB projects: AcinetoScope, RhizoScope, YersiniaScope, ColiScope, CloacaScope, FrankiaScope

Pathway Tools (metabolic pathway reconstruction): BradyBTCyc, BradyORCyc, BrajapCyc, RhizoCyc

Ocelot object model

BioWarehouse relational model

Page 11: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Comparative Metabolic Capabilities: an example

Reaction content comparisons between the 3 Bradyrhizobium organisms (BioWarehouse SQL query on reactions having gene -> protein -> reaction correspondences)

[Venn diagram: Bradyrhizobium sp. ORS278, Bradyrhizobium sp. BTAi, Bradyrhizobium japonicum USDA 110. Values shown: 14, 43, 127, 873, 897, 830, 16, 76, 30, 724]

ORS278 gene | BTAi genes coding the same reaction | Pathway | Reaction
BRAOR5732 | BRABT1389, BRABT0754, BRABT0723, BRABT0755, BRABT0724 | protocatechuate degradation I | PROTOCATECHUATE-4,5-DIOXYGENASE-RXN
BRAOR5733 | BRABT1389, BRABT0754, BRABT0723, BRABT0755, BRABT0724 | protocatechuate degradation I | PROTOCATECHUATE-4,5-DIOXYGENASE-RXN
BRAOR5771 | BRABT1389, BRABT0754, BRABT0723, BRABT0755, BRABT0724 | protocatechuate degradation I | PROTOCATECHUATE-4,5-DIOXYGENASE-RXN
BRAOR5772 | BRABT1389, BRABT0754, BRABT0723, BRABT0755, BRABT0724 | protocatechuate degradation I | PROTOCATECHUATE-4,5-DIOXYGENASE-RXN
BRAOR5776 | BRABT0759 | protocatechuate degradation I | RXN-2463
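
The slide obtains these counts with a BioWarehouse SQL query. As a rough sketch of the same kind of comparison, the function below takes three sets of reaction identifiers and counts the seven regions of a three-set Venn diagram with plain Python set operations. The function name and the toy reaction identifiers R1 to R7 are hypothetical.

```python
def venn3_counts(a, b, c):
    """Count the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Toy reaction sets, not the Bradyrhizobium data.
org_a = {"R1", "R2", "R3", "R4"}
org_b = {"R2", "R3", "R5"}
org_c = {"R3", "R4", "R6", "R7"}
print(venn3_counts(org_a, org_b, org_c))
```

The total for one organism is the sum of its four regions, which gives a quick consistency check on any such diagram.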

Page 12: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Bradyrhizobium ORS278 region containing CDS 5771 & 5772

BRAOR5771 - 5772 - 5773

“Cloning and Characterization of the Genes Encoding Enzymes for the Protocatechuate Meta-degradation Pathways of Pseudomonas ochraceae NGJ1”, Maruyama et al. (2004) Biosci. Biotechnol. Biochem., 68, 1434-1441. PubMed ID: 15277747

Page 13: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

AUTOmatic vs EXPert annotation of the region

Gene | Annotation | Product | EC number | Gene name | Evidence
BRAOR5770 | AUTO | 4-carboxy-2-hydroxymuconate-6-semialdehyde dehydrogenase | 1.1.1.18 | ligC | BLAST R. palustris, PRIAM (medium)
BRAOR5770 | EXP | 4-carboxy-2-hydroxymuconate-6-semialdehyde dehydrogenase | 1.2.1.45 | ligC | BLAST P. testosteroni, Publication + Enzyme
BRAOR5771 | AUTO = EXP | Protocatechuate 4,5-dioxygenase, alpha subunit | 1.13.11.8 | ligB | BLAST R. palustris, PRIAM (high)
BRAOR5772 | AUTO = EXP | Protocatechuate 4,5-dioxygenase, beta subunit | 1.13.11.8 | ligA | BLAST R. palustris, PRIAM (high)
BRAOR5773 | AUTO | 2-pyrone-4,6-dicarboxylic acid hydrolase | none | ligI | BLAST R. palustris
BRAOR5773 | EXP | 2-pyrone-4,6-dicarboxylic acid hydrolase | 3.1.1.57 | ligI | BLAST R. palustris, Publication + Enzyme
BRAOR5774 | AUTO | Putative dehydrogenase | none | none | BLAST R. palustris
BRAOR5774 | EXP | Putative dehydrogenase with NAD binding protein | 1.1.1.- | none | BLAST R. palustris, InterProScan
BRAOR5775 | AUTO | Putative acyl transferase | none | fidZ | BLAST R. palustris
BRAOR5775 | EXP | 4-hydroxy-4-methyl-2-oxoglutarate aldolase | 4.1.3.17 | ligK | BLAST P. ochraceae, Publication + Enzyme
BRAOR5776 | AUTO | 4-oxalomesaconate hydratase | none | ligJ | BLAST R. palustris
BRAOR5776 | EXP | 4-oxalomesaconate hydratase | 4.2.1.83 | ligJ | BLAST R. palustris, Publication + Enzyme

Page 14: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Bradyrhizobium ORS278 region after expert annotation

[Figure: gene map of the region. Labels:]

• BRAOR5770: ligC (1.2.1.45)
• BRAOR5771-72: ligBA (1.13.11.8)
• BRAOR5773: ligI (3.1.1.57)
• BRAOR5775: ligK (4.1.3.17)
• BRAOR5776: ligJ (4.2.1.83)
• BRAOR5777, BRAOR5778

Page 15: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Connectivity to KEGG database

[KEGG pathway map. Legend:]

• Enzymes encoded by genes in the MaGe region
• Enzymes encoded by genes elsewhere in the Bradyrhizobium genome
• Additional enzymes in E. coli

Callout on the map: 4.2.1.83 ?

Page 16: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Connectivity to KEGG database

[KEGG pathway map. Legend:]

• Enzymes encoded by genes in the MaGe region
• Enzymes encoded by genes elsewhere in the Bradyrhizobium genome
• Additional enzymes in E. coli

Page 17: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Bradyrhizobium ORS278 region after expert annotation

[Figure: gene map of the region. Genes shown: 5770, 5771, 5772, 5773, 5775, 5776]

• BRAOR5770_ligC: 4-carboxy-2-hydroxymuconate-6-semialdehyde dehydrogenase (1.2.1.45)
• BRAOR5776_ligJ: 4-oxalomesaconate hydratase (4.2.1.83)

The reactions catalyzed by 1.2.1.45 and 4.2.1.83 exist in MetaCyc but they are not involved in a pathway.

• BRAOR5777: probable protocatechuate transporter
• BRAOR5778 (ligR): probable transcriptional regulator of protocatechuate degradation

Page 18: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Enzymatic activity predictions (PRIAM): some results

Comparison of PRIAM predictions [P] and Expert annotations [E]

Category | Acinetobacter ADP1 | Pseudoalteromonas haloplanktis | Frankia alni | Pseudomonas entomophila
Total genes | 3325 | 3514 | 6861 | 5182
Nb EC_[P] vs EC_[E] | 1012 / 947 | 927 / 993 | 1729 / 1498 | 1455 / 1232
EC_[P] = EC_[E] | 632 (62.5%) | 697 (75.2%) | 912 (52.8%) | 820 (56.3%)
EC_[P] (3 digit) = EC_[E] | 47 (4.6%) | 23 (2.5%) | 68 (3.9%) | 46 (3.2%)
EC_[P] <> EC_[E] | 131 (12.9%) | 102 (11.0%) | 401 (23.2%) | 285 (19.6%)
EC_[P] & (NO EC_[E]) | 202 (20.0%) | 105 (11.3%) | 348 (20.1%) | 304 (20.9%)
EC_[E] & (NO EC_[P]) | 111 (11.7%) | 152 (15.3%) | 111 (7.4%) | 90 (7.3%)

Limitations of PRIAM sequence-based enzyme prediction

• Availability of at least one UniProt/SwissProt sequence in the Enzyme entry!
• Existence of closely related enzymes with different substrate specificity
• Several wrong predictions in case of Medium/Low PRIAM confidence
• Relaxed substrate specificity exhibited by some enzymes
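
The comparison categories used on this slide can be written down as a small classifier. The sketch below is an assumption about how such a tally could be computed, not the script used for the slide: it takes one predicted and one expert EC number per gene (or None), and the category names are hypothetical. The example EC pairs are the ligC, ligB and ligI values from the AUTO vs EXP slide.

```python
from collections import Counter


def compare_ec(predicted, expert):
    """Classify one gene by comparing its predicted and expert EC numbers."""
    if predicted and expert:
        if predicted == expert:
            return "same"
        # First three fields identical, e.g. 1.1.1.- versus 1.1.1.18.
        if predicted.split(".")[:3] == expert.split(".")[:3]:
            return "same_3_digits"
        return "different"
    if predicted:
        return "predicted_only"
    if expert:
        return "expert_only"
    return "no_ec"


def tally(pairs):
    """Count categories over an iterable of (predicted, expert) pairs."""
    return Counter(compare_ec(p, e) for p, e in pairs)


def percent(count, total):
    """Percentage rounded to one decimal, as reported on the slide."""
    return round(100.0 * count / total, 1)


examples = [
    ("1.1.1.18", "1.2.1.45"),    # ligC: prediction differs from expert EC
    ("1.13.11.8", "1.13.11.8"),  # ligB: prediction equals expert EC
    (None, "3.1.1.57"),          # ligI: expert EC only
]
print(tally(examples))
print(percent(632, 1012))  # share of identical ECs among Acinetobacter predictions
```

Genes carrying several EC numbers would need a set-based comparison instead; the one-EC-per-gene simplification is made here only to keep the categories readable.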

Page 19: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

PGDBs built at Genoscope

«Expansion of the BioCyc collection of pathway/genome databases to 160 genomes» Karp et al., Nucleic Acids Research, 2005, 33: 6083-6089.

To date: about 60 Tier 3 PGDBs

• 16 PGDBs common to the SRI/EBI Tier 3* PGDBs (and 4 with Tier 2*):
  • The number of enzymes and pathways is slightly greater in our PGDBs (source of annotations + process of PathoLogic file format generation)
  • Important discrepancies with Sinorhizobium meliloti (44 predicted pathways in the SRI/EBI PGDB vs 259 in the Genoscope PGDB)
• 18 PGDBs: other published bacterial genomes
• 25 PGDBs for newly sequenced and annotated bacterial genomes

NO curation to date (Tier 3* databases), except for Acinetobacter ADP1 -> Metabolic Thesaurus project

Automatic updates of PathoLogic predictions: every week

Our PGDBs are currently available in the MaGe interface

HomePage: http://www.genoscope.cns.fr/agc/mage/

MaGe’s training courses include a quick overview of how to explore PathoLogic results to perform relevant expert annotation

*Tier 3: Computationally-Derived Databases Subject to No Curation
*Tier 2: Computationally-Derived Databases Subject to Moderate Curation

Page 20: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Some Questions / Perspectives

Better correspondences between BioCyc and MaGe

• Optional fields in the PathoLogic file format (PubMedID, Funcat, …)
• How to tackle the pseudogene information? Pathway X doesn’t exist because:
  • No enzyme has been found
  • Some enzymes correspond to pseudogenes

Curation of PGDB?

Remove false-positive pathways (Tier 3 -> Tier 2):

• Automatic reduction of false positive pathway predictions stored in the PGDBs: integration and evaluation of Pathway Hole Filler
• Finding a way to get a list of false positive pathways at the end of the manual process of annotation

Tier 2 -> Tier 1*, especially creation of new metabolic pathways:

• PGDBs freely available for «adoption» by biologists

!!! Not an easy task !!! (a strong knowledge of metabolism is required)

*Tier 1: Intensively Curated Databases

Page 21: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Metabolic Thesaurus project at Genoscope

[Diagram. Labels recovered from the figure:]

• Annotation: 3325 Acinetobacter ADP1 annotated genes
• Knock-out collection: 2240 ADP1 genes knocked out
• Metabolism prediction: network reconstruction, flux models, model
• Biological evidence: accurate phenotyping, systematic phenotyping, transcriptome analyses, biochemical studies, functional complementation
• Teams: Vincent Schächter’s bioinformatics team, Véronique de Berardinis’s team

Page 22: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Metabolic Pathway Reconstruction / Experimental Data

• Metabolic Thesaurus: Acinetobacter ADP1 KO collection
• ColiScope: sequencing of 2 commensal and 4 pathogenic E. coli strains

+

• Phenotypic analysis: growth assay on different nutrient sources
• Metabolome analysis: LC/MS and CE/MS

Data Integration and Comparative Analysis

• Evolution of metabolic capabilities => adaptation of microorganisms, emergence of commensalism / virulence
• Linking enzymatic activity to genes of unknown function

Page 23: Pathway Tools Meeting - December 1, 2005, Geneva (SIB)

Participating teams

AGC team: David Vallenet, Stéphane Cruveiller, Zoé Rouy, Aurélie Lajus, Claudine Médigue

Genoscope informatic system team: Laurent Sainte-Marthe, Claude Scarpelli, Sylvain Bonneval

… and with the help of: François Lefèvre (V. Schächter team)

Feedback from MaGe users helps in improving many functionalities of our system!