Integration of E. Coli Data (E. coli Pathway and Genomic Data from BioCyc) Jesse Walsh


Outline

• Description of BioCyc data
  – Format
  – Key Classes
• How I am retrieving and storing the data
  – SPDB schema
  – Key tables
• Recent Developments

BioCyc Data Format

• Frames are made of slots
  – Slots are made of facets
  – Slot values can have annotations

[Diagram: a Frame contains Slots; each Slot has Facets, and slot values carry Annotations. Example: frame "Reaction X" with slots Common Name, EC #, and Reactants; annotations Coefficient and Compartment; facets :VALUE-TYPE and :DOCUMENTATION.]
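The frame/slot/facet/annotation structure can be sketched as plain data classes. This is only an illustrative model, not the Pathway Tools representation: the class names `Frame`, `Slot`, and `SlotValue`, the example values, and the facet contents are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SlotValue:
    """One value stored in a slot, with optional annotations (label -> value)."""
    value: Any
    annotations: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Slot:
    """A named slot: a list of values plus facets that describe the slot itself."""
    name: str
    values: List[SlotValue] = field(default_factory=list)
    facets: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Frame:
    """A frame is a named collection of slots."""
    frame_id: str
    slots: Dict[str, Slot] = field(default_factory=dict)

    def add_slot(self, slot: Slot) -> None:
        self.slots[slot.name] = slot


# Hypothetical frame loosely modelled on the "Reaction X" diagram.
reaction_x = Frame("Reaction X")
reaction_x.add_slot(Slot("Common Name", [SlotValue("example reaction")]))
reaction_x.add_slot(
    Slot(
        "Reactants",
        # Annotations hang off an individual value, not off the slot.
        values=[SlotValue("ATP", {"Coefficient": 6, "Compartment": "cytosol"})],
        # Facets describe the slot as a whole (illustrative contents).
        facets={":VALUE-TYPE": "Compound", ":DOCUMENTATION": "Left-hand side"},
    )
)

first_reactant = reaction_x.slots["Reactants"].values[0]
print(first_reactant.value, first_reactant.annotations["Coefficient"])
```

The point of the sketch is the nesting: facets belong to a slot, while annotations belong to one value inside a slot.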

BioCyc Class Hierarchy… Complicated

Key Classes in BioCyc

• Genes
• Proteins
• Polypeptides (a subclass of Proteins)
• Protein-Complexes (a subclass of Proteins)
• Pathways
• Reactions
• Compounds-And-Elements
• Enzymatic-Reactions
• Transcription-Units
• Promoters

http://bioinformatics.ai.sri.com/ptools/classes.html

Why not just use BioCyc?

• Advantages:
  – Fast access to individual objects
  – Logic based assertions
• Disadvantages:
  – Hard to query
  – Difficult to understand the structures
  – Difficult to know all of what is in the database
  – Difficult to integrate other types of data
• Solution:
  – Create a relational database

SPDB Schema (Simple Pathway DataBase)

Pathway

• “Central” table
• Allows organization of major pathways
• Easy to retrieve a pathway, or all reactions that share a pathway with a specified reaction

Reaction

• Reaction types include:
  – Catalysis, Spontaneous, Transcription, Translation, Promoter, Transcription Factor
• Transcription, Translation, Promoter, and TF reactions are all inferred reactions
• Reactions are the “nodes” of networks in SPDB

Entity

• Entities include:
  – Compound, Protein (Complex/Monomer), Gene, Transcription Unit, Promoter
• An entity with multiple types is represented by the most specific type in its hierarchy
  – e.g. a protein that is also a complex is listed as “Complex”, not “Protein”
  – “Enzyme” status is stored as a participation type
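Picking the most specific type can be sketched as choosing the type that sits deepest in the class hierarchy. The `PARENT` map below is a hypothetical, heavily simplified hierarchy invented for illustration; it is not the BioCyc class tree or the SPDB implementation.

```python
from typing import Dict, Iterable, Optional

# Hypothetical child -> parent map (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Entity": None,
    "Compound": "Entity",
    "Gene": "Entity",
    "Protein": "Entity",
    "Complex": "Protein",
    "Monomer": "Protein",
}


def depth(type_name: str) -> int:
    """Number of steps from a type up to the root of the hierarchy."""
    steps = 0
    parent = PARENT[type_name]
    while parent is not None:
        steps += 1
        parent = PARENT[parent]
    return steps


def most_specific_type(types: Iterable[str]) -> str:
    """Return the deepest type; on a tie, the first one listed wins."""
    return max(types, key=depth)


print(most_specific_type(["Protein", "Complex"]))
```

Enzyme status is deliberately absent from the map, since the slide stores it as a participation type rather than an entity type.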

Participation in Reactions

• Entities participate in reactions
• Information includes Km data
• Unsure if condition data exists, and unsure how to access evidence data
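A minimal sketch of how the key tables might fit together, using an in-memory SQLite database in place of MySQL. All table names, column names, and rows are assumptions made for illustration (including the `pathway_reaction` link table); the actual SPDB schema is not shown in these slides.

```python
import sqlite3
from typing import List

# Hypothetical schema: pathway, reaction, entity, and participation tables.
SCHEMA = """
CREATE TABLE pathway  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE reaction (id INTEGER PRIMARY KEY, name TEXT NOT NULL, type TEXT NOT NULL);
CREATE TABLE entity   (id INTEGER PRIMARY KEY, name TEXT NOT NULL, type TEXT NOT NULL);
CREATE TABLE pathway_reaction (
    pathway_id  INTEGER NOT NULL REFERENCES pathway(id),
    reaction_id INTEGER NOT NULL REFERENCES reaction(id)
);
CREATE TABLE participation (
    reaction_id INTEGER NOT NULL REFERENCES reaction(id),
    entity_id   INTEGER NOT NULL REFERENCES entity(id),
    role        TEXT NOT NULL,   -- e.g. 'reactant', 'product', 'enzyme'
    km          REAL             -- NULL when no Km value is known
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the sketch schema and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO pathway (id, name) VALUES (?, ?)",
                     [(1, "Pathway A"), (2, "Pathway B")])
    conn.executemany("INSERT INTO reaction (id, name, type) VALUES (?, ?, ?)",
                     [(1, "R1", "Catalysis"), (2, "R2", "Spontaneous"),
                      (3, "R3", "Catalysis")])
    conn.executemany("INSERT INTO pathway_reaction VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 3)])
    conn.execute("INSERT INTO entity (id, name, type) VALUES (1, 'E1', 'Complex')")
    conn.execute("INSERT INTO participation VALUES (1, 1, 'enzyme', NULL)")
    return conn


def reactions_sharing_pathway(conn: sqlite3.Connection, reaction_id: int) -> List[str]:
    """Names of the other reactions that share a pathway with the given reaction."""
    rows = conn.execute(
        """
        SELECT DISTINCT r.name
        FROM pathway_reaction AS mine
        JOIN pathway_reaction AS other ON other.pathway_id = mine.pathway_id
        JOIN reaction AS r ON r.id = other.reaction_id
        WHERE mine.reaction_id = ? AND other.reaction_id != ?
        ORDER BY r.name
        """,
        (reaction_id, reaction_id),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(reactions_sharing_pathway(conn, 1))
```

Storing the enzyme role in `participation` rather than in `entity` mirrors the slide's note that enzyme status is a participation type.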

Data Links in BioCyc

[Diagram of linked elements: Pathway; Reaction; Reactants/Products; Enzymes/Cofactors; Genes; Transcriptional Unit; Promoter; Transcription Factor; Sigma Factor. Link labels: Translation Reaction; Transcription Reaction; Promoter Relation; Activation/Repression; Specificity Relation.]

Data Retrieval Strategy

[Diagram of linked elements, annotated with retrieval steps 1, 2, and 3: Pathway; Reaction; Reactants/Products; Enzymes/Cofactors; Genes; Transcriptional Unit; Promoter; Transcription Factor; Sigma Factor. Link labels: Translation Reaction; Transcription Reaction; Promoter Relation; Activation/Repression; Specificity Relation.]

Improvements to SPDB

• Explicitly organize pathway networks and reaction networks

• Allow recursive tracing of pathway elements

Old Organization of Reaction Data

[Diagram: one Pathway node and six Rxn (reaction) nodes.]

Better Way

[Diagram: five Rxn (reaction) nodes arranged as a network within one Pathway.]

Explicitly link reactions in the context of individual pathways
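One way to picture explicit, pathway-scoped links is an edge list in which every edge carries its pathway. The tuples and identifiers below are hypothetical and only illustrate the idea; they are not SPDB data or the SPDB table layout.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical edges: (pathway, upstream reaction, downstream reaction).
REACTION_LINKS: List[Tuple[str, str, str]] = [
    ("P1", "R1", "R2"),
    ("P1", "R2", "R3"),
    ("P2", "R2", "R4"),  # R2 also appears in P2, with a different successor
]

Graph = Dict[str, Dict[str, List[str]]]


def successors_by_pathway(links: List[Tuple[str, str, str]]) -> Graph:
    """Group edges as pathway -> reaction -> list of downstream reactions."""
    graph: DefaultDict[str, DefaultDict[str, List[str]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for pathway, src, dst in links:
        graph[pathway][src].append(dst)
    return graph


def next_reactions(graph: Graph, pathway: str, reaction: str) -> List[str]:
    """Downstream reactions of `reaction`, restricted to one pathway."""
    return list(graph.get(pathway, {}).get(reaction, []))


graph = successors_by_pathway(REACTION_LINKS)
print(next_reactions(graph, "P1", "R2"))
print(next_reactions(graph, "P2", "R2"))
```

Because each edge is keyed by pathway, the same reaction can have different neighbours in different pathways.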

Recursively Tracing the Data

[Diagram of linked elements, extended with "Genes of TFs": Pathway; Reaction; Reactants/Products; Enzymes/Cofactors; Genes; Transcriptional Unit; Promoter; Transcription Factor; Sigma Factor; Genes of TFs. Link labels: Translation Reaction; Transcription Reaction; Promoter Relation; Activation/Repression; Specificity Relation.]
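Recursive tracing can be sketched as a depth-first walk with a visited set, so that following a transcription factor back to its own gene does not loop forever. The `ELEMENT_LINKS` map and its identifiers are invented for illustration, and the link directions are an assumption rather than a description of the SPDB implementation.

```python
from typing import Dict, List, Optional, Set

# Hypothetical links between pathway elements.
ELEMENT_LINKS: Dict[str, List[str]] = {
    "pathway:P1": ["reaction:R1"],
    "reaction:R1": ["enzyme:E1"],
    "enzyme:E1": ["gene:g1"],
    "gene:g1": ["tu:TU1"],
    "tu:TU1": ["promoter:p1"],
    "promoter:p1": ["tf:TF1"],
    "tf:TF1": ["gene:g2"],    # the gene that encodes the transcription factor
    "gene:g2": ["tu:TU1"],    # closes a cycle back to an already visited unit
}


def trace(start: str, links: Dict[str, List[str]],
          seen: Optional[Set[str]] = None) -> List[str]:
    """Depth-first trace from `start`, visiting each element at most once."""
    if seen is None:
        seen = set()
    if start in seen:
        return []
    seen.add(start)
    order = [start]
    for nxt in links.get(start, []):
        order.extend(trace(nxt, links, seen))
    return order


for element in trace("pathway:P1", ELEMENT_LINKS):
    print(element)
```

The visited set is what makes the "Genes of TFs" step safe: regulatory loops are common, so an unguarded recursion would not terminate.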

Coefficient Data for Reactions

6 ATP + 3 L-serine + 3 2,3-dihydroxybenzoate → 6 diphosphate + 6 AMP + enterobactin + 9 H+
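A small sketch of how coefficients might be pulled out of such an equation. It assumes the reaction arrow falls between 2,3-dihydroxybenzoate and 6 diphosphate, writes it as an ASCII `->`, and treats a leading integer followed by a space as the coefficient; the function names are hypothetical.

```python
from typing import Dict, Tuple


def parse_side(side: str) -> Dict[str, int]:
    """Map each compound on one side of a reaction to its coefficient."""
    terms: Dict[str, int] = {}
    for term in side.split(" + "):
        term = term.strip()
        head, _, rest = term.partition(" ")
        if head.isdigit() and rest:
            # "3 2,3-dihydroxybenzoate": only the first token is the coefficient.
            terms[rest] = int(head)
        else:
            # No leading integer means an implicit coefficient of 1.
            terms[term] = 1
    return terms


def parse_reaction(equation: str, arrow: str = "->") -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split an equation at the arrow and parse both sides."""
    left, right = equation.split(arrow)
    return parse_side(left), parse_side(right)


equation = (
    "6 ATP + 3 L-serine + 3 2,3-dihydroxybenzoate -> "
    "6 diphosphate + 6 AMP + enterobactin + 9 H+"
)
reactants, products = parse_reaction(equation)
print(reactants)
print(products)
```

Splitting on " + " with surrounding spaces keeps the trailing plus sign of H+ attached to the compound name.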

To Do

• MIAME experimental conditions
• Explore other data in BioCyc

Flow of Data (The Big Picture)

• Data is imported from BioCyc (EcoCyc + MetaCyc)
• Changes can be made to BioCyc via CellDesigner and are then propagated to SPDB
• BioMart is one option to directly view data in SPDB

[Diagram: BioCyc PGDB → JavaCycConnection → BioCycImporter → SPDB. Other labels: Lisp Based DB; Object Oriented DB; MySQL; API based on JavaCyc; CellDesigner; BioMart; Researcher.]

Data in BioCyc

                         BioCyc 13.1                    BioCyc 13.0   SPDB
Pathways                 242 (excludes superpathways)   237           290
Reactions                1784                           1751          10714 (1751 not inferred, 4373 ‘orphaned’)
Enzymes                  1415                           1409          1409
Transporters             244                            243           --
Gene product summaries   3599                           --            --
Genes                    4496                           4477          4523
Transcription Units      3356                           3375          3337
Citations                18,469                         17,842        --