Page 1: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Computational Exploration of Metabolic Networks with Pathway Tools
Part 2: APIs & Examples

Randy Gobbel, Ph.D.
Bioinformatics Research Group
SRI International
[email protected]
http://BioCyc.org/

Page 2: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Computing with Pathway Tools: APIs

Generic functions with a consistent naming scheme
  Basic frame access functions
  Built-in functions for analysis and global statistics

Simultaneous access to multiple KBs
  Cross-species comparisons
  Specialized KBs: MetaCyc, SchemaBase

Page 3: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Computing with Pathway Tools: APIs

PerlCyc interface
  Library of Perl functions for querying PGDBs via socket connection
  Database access functions: Select_Organism, All_Pathways
  Functions for performing inference / hardwired queries: Genes_Of_Reaction, Genes_Of_Pathway, Transcription_Unit_Transcription_Factors, Enzyme_P
  JavaCyc interface also in progress
  http://aracyc.stanford.edu/~mueller/perlcyc/

Lisp API
  http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

Page 4: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

PerlCyc and JavaCyc

Interface to running Pathway Tools image through TCP

Names are translated to Perl and Java conventions

Object references are supported by means of unique frame names

Page 5: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Pathway Tools API Functions

get_class_all_instances(Class) Returns the instances of Class

Key Pathway Tools classes:
  Genetic-Elements
  Genes
  Proteins
    Polypeptides
    Protein-Complexes
  Pathways
  Reactions
  Compounds-And-Elements
  Enzymatic-Reactions
  Transcription-Units
  Promoters
  DNA-Binding-Sites
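
A minimal sketch of how a call such as get_class_all_instances could behave, written against a toy in-memory KB rather than a running Pathway Tools image. It assumes that the instances of a class include the instances of its subclasses, and that Polypeptides and Protein-Complexes are subclasses of Proteins; the frame names are invented for illustration.

```python
# Toy class hierarchy and instance table (hypothetical data, not a real PGDB).
SUBCLASSES = {
    "Proteins": ["Polypeptides", "Protein-Complexes"],
    "Polypeptides": [],
    "Protein-Complexes": [],
}
DIRECT_INSTANCES = {
    "Proteins": [],
    "Polypeptides": ["POLYPEPTIDE-A", "POLYPEPTIDE-B"],
    "Protein-Complexes": ["COMPLEX-AB"],
}


def get_class_all_instances(cls):
    """Return the instances of cls and, recursively, of all its subclasses."""
    found = list(DIRECT_INSTANCES.get(cls, []))
    for sub in SUBCLASSES.get(cls, []):
        found.extend(get_class_all_instances(sub))
    return found


all_proteins = get_class_all_instances("Proteins")
print(all_proteins)
```

Asking for Proteins here returns both polypeptides and the complex, which is the behavior the later example studies appear to rely on when they fetch all proteins in one call.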

Page 6: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Pathway Tools API Functions

Notation Frame.Slot means a specified slot of a specified frame

get_slot_value(Frame Slot) Returns first value of Frame.Slot

get_slot_values(Frame Slot) Returns all values of Frame.Slot

slot_has_value_p(Frame Slot) Returns true if Frame.Slot has at least one value

member_slot_value_p(Frame Slot Value) Returns true if Value is one of the values of Frame.Slot
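
The four slot functions above can be mimicked on a toy frame store, as a sketch of their described semantics only. The dictionary layout is an assumption for illustration; the sample frame reuses values from the Reactions.dat example in this deck, and the empty COMMENT slot is invented.

```python
# Toy frame store: frame name -> slot name -> list of values.
KB = {
    "R84-RXN": {
        "IN-PATHWAY": ["P122-PWY", "P107-PWY"],
        "LEFT": ["GLC-6-P", "NAD"],
        "COMMENT": [],  # hypothetical slot with no values
    },
}


def get_slot_values(frame, slot):
    """Return all values of Frame.Slot (empty list if there are none)."""
    return list(KB.get(frame, {}).get(slot, []))


def get_slot_value(frame, slot):
    """Return the first value of Frame.Slot, or None if the slot is empty."""
    values = get_slot_values(frame, slot)
    return values[0] if values else None


def slot_has_value_p(frame, slot):
    """Return True if Frame.Slot has at least one value."""
    return bool(get_slot_values(frame, slot))


def member_slot_value_p(frame, slot, value):
    """Return True if value is one of the values of Frame.Slot."""
    return value in get_slot_values(frame, slot)


print(get_slot_value("R84-RXN", "LEFT"))
print(member_slot_value_p("R84-RXN", "IN-PATHWAY", "P107-PWY"))
```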

Page 7: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Additional Pathway Tools Functions – Semantic Inference Layer

Built-in functions encode commonly used queries that compute indirect DB relationships
  genes_of_pathway, substrates_of_pathway
  all_transcription_factors, regulon_of_protein

See http://bioinformatics.ai.sri.com/ptools/ptools-fns.html for more information
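
To illustrate what "indirect DB relationship" means, here is a hedged sketch of how a query like genes_of_pathway might chain several slot lookups. The slot names ENZYMATIC-REACTION, ENZYME and GENE, the frame names, and the traversal itself are assumptions made for this toy KB; the real built-in function may follow a different route (for example through protein complexes).

```python
# Toy KB: pathway -> reactions -> enzymatic reactions -> enzymes -> genes.
KB = {
    "P107-PWY": {"REACTION-LIST": ["R84-RXN", "R10-RXN"]},
    "R84-RXN": {"ENZYMATIC-REACTION": ["ENZRXN-1", "ENZRXN-2"]},
    "R10-RXN": {"ENZYMATIC-REACTION": ["ENZRXN-3"]},
    "ENZRXN-1": {"ENZYME": ["PROTEIN-A"]},
    "ENZRXN-2": {"ENZYME": ["PROTEIN-B"]},
    "ENZRXN-3": {"ENZYME": ["PROTEIN-A"]},
    "PROTEIN-A": {"GENE": ["GENE-A"]},
    "PROTEIN-B": {"GENE": ["GENE-B"]},
}


def values(frame, slot):
    """Return all values of a slot, or an empty list."""
    return KB.get(frame, {}).get(slot, [])


def genes_of_pathway(pathway):
    """Collect, without duplicates, the genes reachable from a pathway."""
    genes = []
    for reaction in values(pathway, "REACTION-LIST"):
        for enzrxn in values(reaction, "ENZYMATIC-REACTION"):
            for enzyme in values(enzrxn, "ENZYME"):
                for gene in values(enzyme, "GENE"):
                    if gene not in genes:
                        genes.append(gene)
    return genes


print(genes_of_pathway("P107-PWY"))
```

Packaging the four-level traversal behind one name is the point of the inference layer: callers do not need to know which slots link a pathway to its genes.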

Page 8: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Computing with Pathway Tools: Flat Files

Two file formats: tab-delimited, attribute-value
One file for each format, each datatype
Specification: http://bioinformatics.ai.sri.com/ptools/flatfile-format.html
Examples:
  Pathways.col – Pathways and genes encoding enzymes
  Enzymes.col – Enzymes and reactions they catalyze
  Pathways.dat – Full data on each pathway
  Reactions.dat – Full data on each reaction

Page 9: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Example Flat File

UNIQUE-ID - P107-PWY
TYPES - Energy-Metabolism
COMMON-NAME - RuMP cycle and formaldehyde assimilation
REACTION-LIST - FORMATEDEHYDROG-RXN
REACTION-LIST - FORMALDEHYDE-DEHYDROGENASE-RXN
REACTION-LIST - 6PGLUCONDEHYDROG-RXN
REACTION-LIST - R84-RXN
REACTION-LIST - PGLUCISOM-RXN
REACTION-LIST - R12-RXN
REACTION-LIST - R10-RXN
SYNONYMS - ribulose-monophosphate cycle
SYNONYMS - formaldehyde oxidation
//
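
A record in this attribute-value layout can be read with a few lines of code. The sketch below assumes one "ATTRIBUTE - value" pair per line, a line containing only // as the record terminator, and lines starting with # as comments; it ignores any continuation-line rules the full specification may define. The function name parse_attribute_value is hypothetical.

```python
def parse_attribute_value(text):
    """Parse attribute-value records terminated by '//' lines.

    Returns a list of dicts mapping attribute name -> list of values,
    so repeated attributes such as REACTION-LIST keep all their values.
    """
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        if line == "//":
            if current:
                records.append(current)
            current = {}
            continue
        attribute, separator, value = line.partition(" - ")
        if separator:
            current.setdefault(attribute, []).append(value)
    if current:
        records.append(current)  # tolerate a missing final terminator
    return records


SAMPLE = """\
UNIQUE-ID - P107-PWY
TYPES - Energy-Metabolism
COMMON-NAME - RuMP cycle and formaldehyde assimilation
REACTION-LIST - R84-RXN
REACTION-LIST - PGLUCISOM-RXN
SYNONYMS - ribulose-monophosphate cycle
//
"""

pathways = parse_attribute_value(SAMPLE)
print(pathways[0]["UNIQUE-ID"], pathways[0]["REACTION-LIST"])
```

Splitting only on the first " - " keeps values that themselves contain hyphens, such as EC numbers ending in "-", intact.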

Page 10: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Example Flat File – Reactions.dat

UNIQUE-ID - R84-RXN
TYPES - EC-1.1.1
EC-NUMBER - 1.1.1.-
IN-PATHWAY - P122-PWY
IN-PATHWAY - P107-PWY
LEFT - GLC-6-P
LEFT - NAD
OFFICIAL-EC? - NO
RIGHT - 6-P-GLUCONATE
RIGHT - NADH
RIGHT - PROTON
//

Page 11: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Example Flat File – Compounds.dat

UNIQUE-ID - GLC-6-P
TYPES - Carbohydrate-Derivatives
COMMON-NAME - glucose-6-phosphate
CAS-REGISTRY-NUMBERS - 56-73-5
CHEMICAL-FORMULA - (C 6)
CHEMICAL-FORMULA - (H 13)
CHEMICAL-FORMULA - (O 9)
CHEMICAL-FORMULA - (P 1)
MOLECULAR-WEIGHT - 260.137
SYNONYMS - D-glucose-6-P
SYNONYMS - glucose-6-P
SYNONYMS - α-D-glucose-6-phosphate
SYNONYMS - α-D-glucose-6-P
SYNONYMS - D-glucose-6-phosphate
//
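
As a small consistency check on a compound record, the CHEMICAL-FORMULA entries can be turned into element counts and a molecular weight. This is a sketch that assumes each entry has the form "(Element Count)" and uses standard atomic weights rounded to three decimals; the helper names are hypothetical. The result lands close to, but need not exactly equal, the MOLECULAR-WEIGHT value in the record.

```python
import re

# Standard atomic weights, rounded to three decimals.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999, "P": 30.974}


def parse_formula(entries):
    """Turn entries such as '(C 6)' into a dict like {'C': 6}."""
    formula = {}
    for entry in entries:
        match = re.fullmatch(r"\(\s*([A-Za-z]+)\s+(\d+)\s*\)", entry.strip())
        if match is None:
            raise ValueError(f"unrecognized formula entry: {entry!r}")
        element, count = match.group(1), int(match.group(2))
        formula[element] = formula.get(element, 0) + count
    return formula


def molecular_weight(formula):
    """Sum atomic weights over the element counts."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())


glc6p = parse_formula(["(C 6)", "(H 13)", "(O 9)", "(P 1)"])
weight = molecular_weight(glc6p)
print(glc6p, round(weight, 3))
```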

Page 12: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Bioinformatics Results: Algorithms

Query and visualization environment for genome and pathway information

PathoLogic algorithm predicts the metabolic network of an organism from its genome

Algorithm for global characterization of a metabolic network

Algorithms under development for qualitative modeling of the cell

Page 13: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

The Pathway Tools KB as a "virtual cell"

Detailed representation of proteins, including subunits
Protein complexes and modifications
Links from genome, through proteins, to pathways and superpathways

Page 14: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Computing with the Metabolic Network

Comparative analysis of metabolic networks
Visualization of expression data
Correlation of metabolism and transport
Connectivity analysis of metabolic network
Forward propagation of metabolites
Verification of known growth media with metabolic network

Page 15: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Computational Exploration of PGDBs

Infer metabolic network from genome
  Bioinformatics 18:705 2002
Global properties of the metabolic network
  Genome Research 10:568 2000
Global properties of the genetic network
Comparison of whole metabolic networks
Consistency of a PGDB with respect to known growth-media requirements
Search for gaps in metabolic network
  Pacific Symp Biocomputing 2001:471

Page 16: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Example Studies

Relationship of protein subunits to gene positions
Global properties of the E. coli metabolic network
  Reactions catalyzed by more than one enzyme
  Enzymes that catalyze more than one reaction
  Reactions participating in more than one pathway
Automatic detection of intersection points in the metabolic network
Nutrient analyses
  Forward propagation: Given a set of nutrients, what compounds will be produced by the metabolic network?
  Backtracking: Given a forward propagation result, and a set of essential compounds that are not included in that result, what precursors must be supplied to produce those compounds?
Operon prediction

Page 17: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Protein subunits and linked genes

Question: are protein subunits coded by neighboring genes?

Proteins are linked to genes, gene positions are recorded in the KB

Procedure
  Fetch all protein complexes
  Subunits are stored in the ‘components’ slot
  Each component has a ‘gene’ slot
  Genes have ‘left-end-position’ and ‘right-end-position’ slots

Results
  Protein subunits of >90% of heteromeric enzymes are encoded by neighboring genes
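
The procedure above can be sketched on toy data. The slot names follow the slide; everything else is an assumption: the frame names and coordinates are invented, all genes are taken to lie on one replicon, strand is ignored, and "neighboring" is interpreted as occupying consecutive ranks when genes are sorted by left-end-position.

```python
# Toy KB (hypothetical frames and coordinates).
KB = {
    "COMPLEX-1": {"components": ["PROT-A", "PROT-B"]},
    "COMPLEX-2": {"components": ["PROT-C", "PROT-D"]},
    "PROT-A": {"gene": "geneA"},
    "PROT-B": {"gene": "geneB"},
    "PROT-C": {"gene": "geneC"},
    "PROT-D": {"gene": "geneD"},
    "geneA": {"left-end-position": 100, "right-end-position": 900},
    "geneB": {"left-end-position": 950, "right-end-position": 1500},
    "geneC": {"left-end-position": 2000, "right-end-position": 2600},
    "geneX": {"left-end-position": 3000, "right-end-position": 3500},
    "geneD": {"left-end-position": 5000, "right-end-position": 5600},
}
GENES = ["geneA", "geneB", "geneC", "geneX", "geneD"]


def gene_order(genes):
    """Map each gene to its rank along the chromosome."""
    ordered = sorted(genes, key=lambda g: KB[g]["left-end-position"])
    return {gene: rank for rank, gene in enumerate(ordered)}


def subunits_are_neighbors(complex_frame, order):
    """True if the distinct subunit genes occupy consecutive ranks."""
    ranks = sorted({order[KB[p]["gene"]] for p in KB[complex_frame]["components"]})
    return len(ranks) > 1 and ranks[-1] - ranks[0] == len(ranks) - 1


order = gene_order(GENES)
for name in ("COMPLEX-1", "COMPLEX-2"):
    print(name, subunits_are_neighbors(name, order))
```

In this toy KB the subunits of COMPLEX-1 are adjacent, while geneX sits between the two genes of COMPLEX-2.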

Page 18: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Global properties: How many reactions are catalyzed by more than one enzyme?

Procedure
  get_class_all_instances(‘Reactions’)
  We are interested only in reactions with at least one value in their ‘enzymatic-reaction’ slot
  result = reactions with more than one value for their ‘enzymatic-reaction’ slot

Results
  About 10% of reactions are catalyzed by more than one enzyme
  Two classes of multi-enzyme reactions
    Homologous enzymes
    “Easy” reactions
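
A minimal sketch of this procedure on a toy reaction table. The reaction and enzymatic-reaction names are invented, and the sketch assumes that each value in the ‘enzymatic-reaction’ slot corresponds to one distinct enzyme.

```python
# Toy reaction frames (hypothetical data).
REACTIONS = {
    "RXN-1": {"enzymatic-reaction": ["ENZRXN-1", "ENZRXN-2"]},
    "RXN-2": {"enzymatic-reaction": ["ENZRXN-3"]},
    "RXN-3": {"enzymatic-reaction": []},  # no enzyme recorded
    "RXN-4": {"enzymatic-reaction": ["ENZRXN-4"]},
}


def get_slot_values(frame, slot):
    """Return all values of a slot on a reaction frame."""
    return REACTIONS[frame].get(slot, [])


def multi_enzyme_reactions(reactions):
    """Return (catalyzed reactions, reactions with more than one enzyme)."""
    catalyzed = [r for r in reactions if get_slot_values(r, "enzymatic-reaction")]
    multi = [r for r in catalyzed
             if len(get_slot_values(r, "enzymatic-reaction")) > 1]
    return catalyzed, multi


catalyzed, multi = multi_enzyme_reactions(list(REACTIONS))
fraction = len(multi) / len(catalyzed)
print(multi, fraction)
```

Restricting the denominator to reactions with at least one enzymatic-reaction value keeps uncatalyzed or unassigned reactions from diluting the percentage.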

Page 19: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Global properties: Multifunctional enzymes (how many enzymes catalyze more than one reaction?)

Procedure
  get_class_all_instances(‘Proteins’)
  result = proteins with more than one value in the ‘catalyzes’ slot

Results
  100 out of 607 enzymes catalyze multiple reactions
  This is significantly more than predicted by genome sequencing projects

Page 20: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Global properties: Reactions in multiple pathways

Procedure
  get_class_all_instances(‘Reactions’)
  result = reactions with more than one value in the ‘in-pathway’ slot

Significance
  Reactions that appear in multiple pathways correspond to intersection points in the metabolic network
  Could be used to identify candidate reactions for drug targets
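
The same question can be answered from flat-file data by inverting pathway reaction lists, which is equivalent to reading each reaction's ‘in-pathway’ values. This is a sketch on toy data: R84-RXN belonging to both P107-PWY and P122-PWY comes from the Reactions.dat example, while the other members of P122-PWY are invented.

```python
from collections import defaultdict

# Pathway -> reactions (partly hypothetical data).
PATHWAY_REACTIONS = {
    "P107-PWY": ["R84-RXN", "PGLUCISOM-RXN", "R10-RXN"],
    "P122-PWY": ["R84-RXN", "RXN-X"],  # RXN-X is an invented placeholder
}


def reactions_in_multiple_pathways(pathway_reactions):
    """Return {reaction: sorted pathways} for reactions in more than one pathway."""
    in_pathway = defaultdict(set)
    for pathway, reactions in pathway_reactions.items():
        for reaction in reactions:
            in_pathway[reaction].add(pathway)
    return {reaction: sorted(pathways)
            for reaction, pathways in in_pathway.items()
            if len(pathways) > 1}


intersections = reactions_in_multiple_pathways(PATHWAY_REACTIONS)
print(intersections)
```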

Page 21: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Metabolic Overview Queries

Species comparison
  Highlight reactions that are
    Shared/not-shared with
    Any-one/All-of
    A specified set of species

Overlay expression data
  Absolute or relative expression levels
  Reaction colors reflect expression level
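
The species-comparison options can be read as set operations over per-species reaction sets. The sketch below is one possible interpretation of the four shared/not-shared and any-one/all-of combinations, not the Overview's actual implementation; the species and reaction names, and the function name compare, are hypothetical.

```python
# Reactions present in each species (hypothetical data).
REACTIONS_BY_SPECIES = {
    "species-1": {"RXN-A", "RXN-B", "RXN-C"},
    "species-2": {"RXN-B", "RXN-C", "RXN-D"},
    "species-3": {"RXN-C", "RXN-E"},
}


def compare(reference, others, shared=True, quantifier="any"):
    """Select reactions of the reference species by presence in the others.

    shared=True,  quantifier="any": present in at least one other species
    shared=True,  quantifier="all": present in every other species
    shared=False, quantifier="any": missing from at least one other species
    shared=False, quantifier="all": missing from every other species
    """
    combine = any if quantifier == "any" else all
    selected = set()
    for reaction in REACTIONS_BY_SPECIES[reference]:
        hits = [(reaction in REACTIONS_BY_SPECIES[s]) == shared for s in others]
        if combine(hits):
            selected.add(reaction)
    return selected


others = ["species-2", "species-3"]
print(sorted(compare("species-1", others, shared=True, quantifier="all")))
print(sorted(compare("species-1", others, shared=False, quantifier="all")))
```

The selected set is what a display would then highlight on the reference organism's overview diagram.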

Page 22: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples


Page 23: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples


Page 24: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples
Page 25: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples
Page 26: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

C. crescentus Cell Cycle Gene Expression

Page 27: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Global Consistency Checking of Biochemical Network

Given:
  A PGDB for an organism
  A set of initial metabolites

Infer:
  What set of products can be synthesized by the small-molecule metabolism of the organism

Can known growth medium yield known essential compounds?

Pacific Symposium on Biocomputing p471 2001

Page 28: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Algorithm: Forward Propagation

[Diagram labels: Nutrient set, Transport, Metabolite set, Reactants, “Fire” reactions, Products, PGDB reaction pool]
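
A minimal sketch of forward propagation as a fixed-point loop, under stated assumptions: a reaction "fires" once all of its reactants are in the metabolite set, firing adds its products to that set, every reaction is treated as one-directional, and transport steps would simply be further entries in the reaction pool. The reaction and compound names are invented.

```python
def forward_propagation(nutrients, reactions):
    """Return (metabolites, fired) reachable from the nutrient set.

    reactions maps a reaction name to (reactants, products).
    """
    metabolites = set(nutrients)
    fired = set()
    changed = True
    while changed:  # repeat until no new reaction can fire
        changed = False
        for name, (reactants, products) in reactions.items():
            if name not in fired and set(reactants) <= metabolites:
                fired.add(name)
                metabolites |= set(products)
                changed = True
    return metabolites, fired


# Toy reaction pool (hypothetical).
REACTIONS = {
    "RXN-1": (["A", "B"], ["C"]),
    "RXN-2": (["C"], ["D", "E"]),
    "RXN-3": (["E", "X"], ["F"]),  # cannot fire: X is never available
}

metabolites, fired = forward_propagation({"A", "B"}, REACTIONS)
essential = {"D", "F"}
missing = essential - metabolites
print(sorted(metabolites), sorted(fired), sorted(missing))
```

Comparing the final metabolite set against a list of essential compounds gives the kind of produced / not produced tally reported in the results that follow.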

Page 29: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Results

Phase I: Forward propagation
  21 initial compounds yielded only half of 38 essential compounds for E. coli

Phase II: Manually identify
  Bugs in EcoCyc (e.g., two objects for tryptophan)
  Missing initial protein substrates (e.g., ACP)
  Missing pathways in EcoCyc

Phase III: Forward propagation with 11 more initial metabolites
  Yielded all 38 essential compounds

Page 30: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Initial Metabolites (Total: 21 compounds)

Nutrients (8) (M61 Minimal growth medium)
  H+, Fe2+, Mg2+, K+, NH3, SO4(2-), PO4(2-), Glucose

Nutrients (10) (Growth conditions)
  Water, Oxygen, Trace elements (Mn2+, Co2+, Mo2+, Ca2+, Zn2+, Cd2+, Ni2+, Cu2+)

Bootstrap Compounds (3)
  ATP, NADP, CoA

Page 31: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Nutrient-Related Analysis: Validation of the EcoCyc Database

Results on EcoCyc:

Phase I:
  Essential compounds
    produced: 19
    not produced: 19
  Total compounds
    produced: (28%)
  Reactions
    fired: (31%)

Page 32: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Missing Essential Compounds Due To

Bugs in EcoCyc
Narrow conceptualization of the problem
  Protein substrates
Incomplete biochemical knowledge

Page 33: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Nutrient-Related Analysis: Validation of the EcoCyc Database

Results on EcoCyc:

Phase II (After adding 11 extra metabolites):
  Essential compounds
    produced: 38
    not produced: 0
  Total compounds
    produced: (49%)
    not produced: (51%)
  Reactions
    fired: (58%)
    not fired: (42%)

Page 34: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Operon Prediction

Based on the method of Moreno-Hagelsieb et al., Bioinformatics 18 Suppl. 1 (2002)
  Distance between genes
  Functional classification
  Correctly predicts 75% of transcription units, 65% of operons

Additional information available in PGDB
  Pathways
  Protein complexes
  Transporters
  Improved prediction performance: 80% of transcription units, 69% of operons

Detailed paper in preparation
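
To give a feel for the distance-based part of operon prediction, here is a deliberately simplified stand-in: adjacent genes on the same strand are grouped into one transcription unit when the gap between them is at most a fixed threshold. This is not the scoring used by Moreno-Hagelsieb et al. or by Pathway Tools, it ignores functional classification and the additional PGDB evidence listed above, and the 50-base threshold and gene coordinates are arbitrary illustrative values.

```python
def predict_transcription_units(genes, max_gap=50):
    """Group adjacent same-strand genes separated by at most max_gap bases.

    genes: list of (name, left, right, strand) tuples.
    Returns a list of gene-name lists, one per predicted unit.
    """
    units = []
    previous = None
    for gene in sorted(genes, key=lambda g: g[1]):
        name, left, right, strand = gene
        same_unit = (previous is not None
                     and strand == previous[3]
                     and left - previous[2] <= max_gap)
        if same_unit:
            units[-1].append(name)
        else:
            units.append([name])
        previous = gene
    return units


# Hypothetical genes: (name, left end, right end, strand).
GENES = [
    ("geneA", 100, 900, "+"),
    ("geneB", 920, 1500, "+"),
    ("geneC", 1520, 2100, "-"),
    ("geneD", 2700, 3300, "-"),
]

units = predict_transcription_units(GENES)
print(units)
```

Evidence such as shared pathways or membership in the same protein complex could be layered on top as extra reasons to merge neighboring genes.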

Page 35: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Visualization of Genetic Network

Operon display window
Transcription factor display window
Highlight regulon on Overview diagram
Paint expression data onto Overview diagram
  Database adapter mechanism: MAGE-ML intermediate form
  Adapter defined for SMD
  Animation
  User-specified mapping of color ranges
  Import of SAM files (next release)
  List of significantly +/- genes
Display full genetic network (later release)

Page 36: Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Acknowledgements

SRI: Peter Karp, Suzanne Paley, Pedro Romero, John Pick, Randy Gobbel, Cindy Krieger, Martha Arnaud

EcoCyc Project: Julio Collado-Vides, Ian Paulsen, Monica Riley, Milton Saier

MetaCyc Project: Sue Rhee, Lukas Mueller, Peifen Zhang, Chris Somerville

Stanford: Gary Schoolnik, Harley McAdams, Lucy Shapiro, Russ Altman, Iwei Yeh

Funding sources:
  NIH National Center for Research Resources
  NIH National Institute of General Medical Sciences
  NIH National Human Genome Research Institute
  Department of Energy Microbial Cell Project
  DARPA BioSpice, UPC

BioCyc.org