

BiGG: Biochemical, Genetic and Genomic Database

Jun Young Park 1, Jan Schellenberger 2, Tom M. Conrad 3, Bernhard Ø. Palsson 1,2

1 Department of Bioengineering, University of California San Diego [email protected] [email protected]
2 Bioinformatics Program, University of California San Diego [email protected]
3 Department of Chemistry and Biochemistry, University of California San Diego [email protected]

DATABASE CONTENTS

Introduction

EXPORTING

BROWSING

ABSTRACT

We describe BiGG, a database of Biochemically, Genetically and Genomically structured genome-scale metabolic network reconstructions. BiGG integrates several published genome-scale metabolic networks into one resource with standard nomenclature which allows components to be compared across different species. Furthermore, BiGG contains links to several publicly available databases where additional information can be found and integrated. In addition, BiGG contains a customized export tool that enables the generation of SBML files for further network analysis by external software packages. BiGG addresses a need in the systems biology community to have access to high quality curated metabolic reconstructions.

References

1. Becker SA, Feist AM, Mo ML, Hannum G, Palsson BO, Herrgard MJ: Quantitative prediction of cellular metabolism with constraint-based models: The COBRA Toolbox. Nat Protocols 2007, 2(3):727-738.
2. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 1999, 27(1):29-34.
3. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A et al: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003, 19(4):524-531.

CONCLUSION

• The scope of covered reactions is often greater than for other databases.
• BiGG uses both genetic and literature-based data to assess whether a reaction is present.
• BiGG assigns confidence levels to each reaction, which can be used when evaluating the resultant model.
• BiGG includes relationships between genes and proteins (GPR).
• Compartmentalization in BiGG gives a more accurate description of reactions involving membrane transporters.
• BiGG bridges the gap between a reconstruction and a model.
• The BiGG database provides the first collection of curated high-quality metabolic reconstructions suitable for study with COBRA methods.

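Gene-Protein-Reaction (GPR) associations

[Figure: GPR diagrams with two panels, "Single Gene Reaction" and "Multiple Gene Reaction". Each panel runs from DNA through mRNA and Protein to Reaction, with the steps labeled transcription, translation, complexing, and activity. Examples shown: sphingosine kinase 2 and platelet-activating factor acetylhydrolase.]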

The ‘on’ or ‘off’ state of each reaction in the network may be controlled by the genotype and expression level of associated genes. Some cases involve multiple genes and proteins whose relationship is described using Boolean logic. A single protein may be composed of subunits coded by two (or more) genes. GPRs may be used to evaluate the effects of gene knockouts and gene regulation on the metabolic reconstructions, ruling out reactions whose necessary genes are not available.
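The knockout reasoning described above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluator used by BiGG or the COBRA Toolbox: the function name, the gene identifiers, and the rule that a reaction without a GPR is never ruled out are all assumptions made for the example. It handles only "and", "or", and parentheses.

```python
import re

def reaction_is_active(gpr: str, present_genes: set[str]) -> bool:
    """Evaluate a Boolean GPR string against the set of available genes."""
    if not gpr.strip():
        return True  # assumption: reactions without a GPR are never ruled out
    tokens = re.findall(r"\(|\)|[^\s()]+", gpr)
    pos = 0

    def parse_or() -> bool:
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            rhs = parse_and()  # always parse the right side to consume tokens
            value = value or rhs
        return value

    def parse_and() -> bool:
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom() -> bool:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            pos += 1  # skip the closing ")"
            return value
        return token in present_genes

    return parse_or()

# Hypothetical rule: a two-subunit complex (geneA and geneB) or an isozyme (geneC).
gpr = "(geneA and geneB) or geneC"
wild_type = {"geneA", "geneB", "geneC"}
print(reaction_is_active(gpr, wild_type - {"geneC"}))           # True
print(reaction_is_active(gpr, wild_type - {"geneA", "geneC"}))  # False
```

Losing the isozyme alone leaves the reaction available through the complex; losing one subunit as well removes every route to it.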

[Figure: Venn diagrams for E. coli iAF1260, H. sapiens, and S. cerevisiae. Reactions panel counts: 240 (160), 134 (122), 106 (67), 200 (195), 3197 (2915), 1901 (1733), 691 (672), 766 (745). Metabolites panel counts: 311, 87, 74, 1037, 517, 137, 269, 124.]

Reactions may be searched for by name, EC number, or associated gene, or by using the model name as the only search parameter. Searches can also be restricted by compartment, pathway, or participating metabolite. Results may be limited to reactions with known gene associations or with high or low confidence, or by excluding transport reactions. In addition, reactions may be searched across reconstructions, allowing for model comparison. Lists of reactions matching a set of criteria may be exported as a tab-delimited flat file. The exported files can contain information for multiple models, simplifying model comparison.
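A tab-delimited export covering several models lends itself to a quick comparison script. The sketch below assumes a file with "model" and "abbreviation" columns. Those column names and the sample rows are hypothetical, and the real BiGG export layout may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: the real column names in a BiGG flat file may differ.
EXPORT = (
    "model\tabbreviation\tname\n"
    "modelA\tPGI\tglucose-6-phosphate isomerase\n"
    "modelA\tPFK\tphosphofructokinase\n"
    "modelB\tPGI\tglucose-6-phosphate isomerase\n"
)

def reactions_by_model(handle):
    """Group reaction abbreviations by the model they were exported from."""
    grouped = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        grouped[row["model"]].add(row["abbreviation"])
    return dict(grouped)

models = reactions_by_model(io.StringIO(EXPORT))
shared = set.intersection(*models.values())
print(sorted(shared))  # ['PGI']
```

Replacing `io.StringIO(EXPORT)` with an open file handle would apply the same grouping to a downloaded export.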

Metabolites may be searched for by name, KEGG ID, CAS ID, or charge. Searches can be limited by compartment, pathway, and organism. In addition to basic metabolite information such as formula and charge, the reactions in which the metabolite participates are listed and categorized by the metabolite’s role as a reactant or a product. This feature facilitates the tracing of a metabolite through a pathway in the absence of graphical pathway maps. Lists of metabolites matching a set of search criteria may be exported, and contain information such as metabolite name, abbreviation, formula, KEGG ID, and CAS ID.
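The reactant/product grouping described above can be sketched as follows. The sketch assumes each reaction is stored as a mapping from metabolite ID to stoichiometric coefficient (negative for reactants, positive for products). The reaction and metabolite names are made up for the example.

```python
# Toy reactions: each maps metabolite IDs to stoichiometric coefficients
# (negative = reactant, positive = product). All IDs are illustrative.
REACTIONS = {
    "R1": {"a[c]": -1, "b[c]": 1},
    "R2": {"b[c]": -1, "c[c]": 1},
    "R3": {"c[c]": -1, "b[c]": 1},
}

def roles_of(metabolite, reactions):
    """Split the reactions a metabolite takes part in by its role."""
    roles = {"reactant": [], "product": []}
    for rxn_id, stoich in reactions.items():
        coeff = stoich.get(metabolite, 0)
        if coeff < 0:
            roles["reactant"].append(rxn_id)
        elif coeff > 0:
            roles["product"].append(rxn_id)
    return roles

print(roles_of("b[c]", REACTIONS))
# {'reactant': ['R2'], 'product': ['R1', 'R3']}
```

Following the "product" list of one metabolite into the "reactant" list of the next is the manual tracing that the listing supports. For reversible reactions the roles depend on the direction in which the reaction is written.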

The left diagram shows the number of reactions shared by the three largest reconstructions. The numbers in parentheses represent non-exchange reactions.

The right diagram shows the number of metabolites shared by the three largest reconstructions.

Metabolic Maps

Each model includes several metabolic maps. All maps are drawn in SVG format and can be displayed in all major browsers. When maps are available that include a chosen reaction or metabolite, they are listed on the “details” page under the appropriate organism. Primary molecules are drawn larger than non-primary molecules. Molecules outside the cell (extraorganism) are colored yellow. Molecules in different compartments carry different suffixes in their names; for example, Cytosol is [c] and Nucleus is [n]. For reversible reactions, reactant-side molecules are marked with smaller arrowheads. The reaction or metabolite the user searched for is highlighted in red so that it is easier to locate on the map. The lines and circles that make up the maps are hyperlinked and display more information when clicked. This graphical representation gives the user another way of understanding chemical pathways.

The map to the right shows a part of Carbohydrate Metabolism in human.

The BiGG database can export reconstructions in SBML format, an XML format widely used for distributing systems biology models. The user has several options for customizing the export on the Web.

Compartmentalization

A compartment in a metabolic reconstruction has a distinct pool of metabolites and a set of reactions which may be unique to that compartment. By default, reactions and metabolites are compartmentalized in the models, meaning they exist in distinct compartments such as the Cytosol or the Golgi. The user can instead choose to export the model as “partially decompartmentalized” or “fully decompartmentalized.” If partially decompartmentalized, reactions and metabolites ordinarily assigned to subcompartments of the Cytosol (Mitochondria, Peroxisome, etc.) are instead assigned to the Cytosol, while the Extraorganism compartment is untouched. In a fully decompartmentalized model, there are no compartments and all reactions and metabolites exist in an unsegregated single-compartment system.

Optional Information

The user can choose which optional information to include in the SBML file. The notes field of the Reaction entries can include Boolean strings corresponding to the GPR statements. The GPR field is read and interpreted by the COBRA Toolbox. The SBML file may also include information on genes, proteins and citations. Because the SBML specification does not include fields for this kind of data, this information is stored in the ‘notes’ field of the reaction entries.

Simulation

The SBML file contains a few additional reactions that are necessary for simulation purposes. For example, the exported H. pylori iIT341 model contains the reactions DM_HMFURN, sink_ahcys(c), and sink_amob. To run meaningful simulations, it is important that the bounds of exchange fluxes be specified to model the environment. Including the flux bound vectors in the SBML file simplifies the simulation process. In addition, upper and lower flux bounds of all reactions may be refined before exporting, allowing the user to create SBML files with customized parameters.
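The partial and full decompartmentalization options offered at export can be illustrated on metabolite identifiers with compartment suffixes. This is a sketch under assumptions, not the exporter's code: the mode names, the function name, and the [e] code for the Extraorganism compartment are hypothetical, since the poster only gives [c] and [n] as examples.

```python
import re

SUFFIX = re.compile(r"^(?P<name>.+)\[(?P<comp>\w+)\]$")
CYTOSOL = "c"
EXTRAORGANISM = "e"  # assumed code; the text only gives [c] and [n]

def decompartmentalize(metabolite_id: str, mode: str) -> str:
    """Rewrite a metabolite ID such as 'atp[m]' for the chosen export mode."""
    match = SUFFIX.match(metabolite_id)
    if match is None or mode == "compartmentalized":
        return metabolite_id
    name, comp = match.group("name"), match.group("comp")
    if mode == "partial":
        # Intracellular compartments collapse into the cytosol;
        # the extraorganism compartment is left untouched.
        return metabolite_id if comp == EXTRAORGANISM else f"{name}[{CYTOSOL}]"
    if mode == "full":
        return name  # single unsegregated pool, no suffix
    raise ValueError(f"unknown mode: {mode}")

print(decompartmentalize("atp[m]", "partial"))    # atp[c]
print(decompartmentalize("glc-D[e]", "partial"))  # glc-D[e]
print(decompartmentalize("atp[m]", "full"))       # atp
```

The sketch renames metabolites only. After merging, a transport reaction between two merged compartments would have identical reactants and products and would presumably need to be dropped, which is not handled here.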

The last ten years have seen the emergence of many genome-scale metabolic reconstructions. These manually curated, component-by-component (bottom-up) reconstructions of genomic and bibliomic data have led to a biochemically, genetically and genomically structured (BiGG) knowledgebase.

Such reconstructions are of interest for their detailed curated content and for their utility in assessing metabolic capabilities. A metabolic reconstruction can be mathematically represented as an in silico model for computing allowable network states through the application of governing chemical and genetic constraints under the constraint-based reconstruction and analysis (COBRA) framework. Furthermore, gap analysis identifies possible missing reactions by finding so-called ‘dead end’ metabolites, which can be produced by the network but not consumed.
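The ‘dead end’ criterion stated above (produced by the network but never consumed) can be checked directly on the reaction list. The following is a minimal topological sketch on a toy network. The data layout and reaction names are assumptions, and it does not reproduce any flux-based gap analysis.

```python
def dead_end_metabolites(reactions):
    """Return metabolites that some reaction produces but none consumes."""
    produced, consumed = set(), set()
    for stoich, reversible in reactions.values():
        for met, coeff in stoich.items():
            # A reversible reaction can both make and use each participant.
            if coeff > 0 or reversible:
                produced.add(met)
            if coeff < 0 or reversible:
                consumed.add(met)
    return produced - consumed

# Toy network: (stoichiometry, reversible?) per reaction; names are made up.
TOY = {
    "R1": ({"a": -1, "b": 1}, False),
    "R2": ({"b": -1, "c": 1}, False),
    "EX_a": ({"a": 1}, True),  # exchange reaction supplying a
}
print(dead_end_metabolites(TOY))  # {'c'}
```

Here c is made by R2 but nothing uses it, which points to a possibly missing reaction downstream of c.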

BiGG includes seven different genome-scale reconstructions of six organisms: Homo sapiens Recon 1, Escherichia coli iJR904 and iAF1260, Saccharomyces cerevisiae iND750, Staphylococcus aureus iSB619, Methanosarcina barkeri iAF692, and Helicobacter pylori iIT341.

The Website

The BiGG browser and exporter.

COBRA Compatibility

SBML files exported from BiGG are compatible with the COBRA Toolbox, which supports many computational procedures. Using the COBRA Toolbox, an exported SBML file can be imported into Matlab as a network data structure.
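Outside Matlab, the Boolean GPR strings that the export stores in reaction notes can be read with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree on a hand-written fragment. The "GENE_ASSOCIATION:" prefix, the placement of the text in an XHTML paragraph, and the reaction ID are assumptions about the notes layout, not a documented BiGG format.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the exact notes layout of a BiGG export is assumed.
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy">
    <listOfReactions>
      <reaction id="R_PFK">
        <notes>
          <p xmlns="http://www.w3.org/1999/xhtml">GENE_ASSOCIATION: geneA or geneB</p>
        </notes>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

NS = {
    "sbml": "http://www.sbml.org/sbml/level2",
    "xhtml": "http://www.w3.org/1999/xhtml",
}
PREFIX = "GENE_ASSOCIATION:"  # assumed marker for the GPR line

def gpr_by_reaction(sbml_text):
    """Map each reaction id to the GPR string found in its notes, if any."""
    root = ET.fromstring(sbml_text)
    result = {}
    for rxn in root.iterfind(".//sbml:reaction", NS):
        for p in rxn.iterfind("sbml:notes/xhtml:p", NS):
            text = (p.text or "").strip()
            if text.startswith(PREFIX):
                result[rxn.get("id")] = text[len(PREFIX):].strip()
    return result

print(gpr_by_reaction(SBML))  # {'R_PFK': 'geneA or geneB'}
```

A real export may nest the paragraph differently or use another SBML level, in which case the namespace URIs and the path would need adjusting.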

BiGG is available at http://bigg.ucsd.edu/

[Figure: NMN Metabolism in S. cerevisiae]

All queries are performed by a Linux/Apache Server using Perl with the CGI and DBI modules.

Database Schema

Reconstructions are developed in and stored on a Genomatica (San Diego, CA)-supplied SimPheny™ server running an Oracle™ database. Access to this database is provided by a read-only client with several tables and views for accessing information on Reactions, Metabolites, Genes, Proteins and Citations.