  • 8/13/2019 List of Public Databases

    1/54

    Canadian Bioinformatics Workshops

    www.bioinformatics.ca

    2/54

    Module #: Title of Module

    3/54

    Module 4

    Databases for Chemical, Spectral and Biological Data

    David Wishart

    4/54

    Two Solitudes

    Bioinformatics

    Cheminformatics

    5/54

    Cheminformatics vs. Bioinformatics

    Cheminformatics:

    Established in the 1960s

    Designed for the needs of organic chemists

    User-pay, limited public access

    Funded by large companies (MDL, Beilstein, Sigma, CAS)

    Bioinformatics:

    Established in the 1990s

    Designed for the needs of molecular biologists

    Web-based, open access model

    Funded by large govt agencies (NCBI, EBI, NIH, GC)

    6/54

    What's a Database For?

    Information consolidation & linkage

    Information retrieval (query matching)

    Reference values, reference data, reference sequences, reference images

    Data for training/testing algorithms

    Similarity searching (image, spectra, structure, sequence, text)

    Prediction (structure, function, property, phylogeny, activity, relationship)
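Similarity searching over spectra can be sketched as follows. This is only a minimal illustration, assuming each spectrum is a list of (position, intensity) peaks and that similarity is scored as the cosine between binned intensity vectors. The function names, the bin width and the peak lists are hypothetical, and none of the databases in this module is claimed to use this exact scoring.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width bins keyed by bin index."""
    binned = defaultdict(float)
    for position, intensity in peaks:
        binned[int(position // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra (0.0 to 1.0)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_library(query, library, bin_width=1.0):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query, peaks, bin_width))
        for name, peaks in library.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up peak lists (position, intensity), for illustration only.
library = {
    "reference_A": [(50.2, 10.0), (77.1, 100.0), (105.0, 40.0)],
    "reference_B": [(43.0, 100.0), (58.1, 30.0)],
}
query = [(50.4, 12.0), (77.3, 95.0), (105.2, 35.0)]
ranking = rank_library(query, library)
```

Binning trades resolution for tolerance to small position errors; a narrower `bin_width` is stricter but more sensitive to calibration drift.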

    7/54

    Database Evolution

    Hobby database (flat file): limited coverage, limited depth

    Curated, non-redundant (relational, warehouse): greater coverage, greater depth

    Archived, open deposit, redundant (relational, distributed): extensive coverage, modest depth

    Axes: increasing cost + resources; size of user community

    8/54

    Database Evolution

    Hobby database (flat file)

    Curated, non-redundant (relational, warehouse)

    Archived, open deposit, redundant (relational, distributed)

    Axes: need for standardization; dependence on automation; querying capabilities

    9/54

    The Problem with Metabolomics

    ?

    Genomics: Gene IDs + Transcript Abundance

    Proteomics: Protein IDs + Concentrations

    Metabolomics: Metabolite IDs + Concentrations

    10/54

    Metabolomics Databases

    Most data for metabolomics is still in textbooks or print journals (100+ years of clinical chemistry, 75 years of classic biochemistry)

    Field lags behind genomics/proteomics by about 20 years

    Challenge is to appeal to different user communities (metabolomics researchers, analytical chemists, plant chemists, clinical chemists, physicians, drug researchers, NMR specialists, MS specialists, bioinformaticians, standards setters, etc.)

    11/54

    Databases for Metabolomics

    NMR spectral databases

    Primarily small molecule spectra, not all metabolites

    MS or MS/MS spectral databases

    Primarily small molecule spectra, not all metabolites

    Compound databases

    Mostly compound names, structures, IDs, phys props

    Pathway databases

    Mix of metabolite, drug, protein, signaling pathways

    Comprehensive metabolomic databases

    Combines most/all of the above, focus on metabolites

    12/54

    NMR Spectral DBs

    SDBS, NMRShiftDB, MMCD, BMRB

    13/54

    SDBS

    http://riodb01.ibase.aist.go.jp/sdbs/cgi-bin/direct_frame_top.cgi

    14/54

    SDBS

    Maintained in Japan by AIST (since 1970s)

    Includes 24,700 MS spectra, 15,400 1H NMR spectra, 13,600 13C NMR spectra, 52,500 FT-IR spectra on 34,000 compounds

    Extensive suite of spectral search tools

    Most compounds are not metabolites, but still very useful for many researchers

    15/54

    BioMagResBank

    http://www.bmrb.wisc.edu/metabolomics/

    16/54

    BioMagResBank

    868 reference metabolites

    5-6 NMR spectra (1H, 13C, 1D, 2D) per compound

    Search by name, synonyms, InChI, formula, SMILES

    Focus primarily on plant metabolites (Arabidopsis), although it now includes other mammalian metabolites

    No assignments available

    17/54

    NMRShiftDB

    46,606 1H/13C NMR spectra

    38,802 structures

    http://www.ebi.ac.uk/nmrshiftdb/

    18/54

    NMRShiftDB

    Database developed by Christoph Steinbeck (who also leads ChEBI)

    Not restricted to metabolites, includes many organic compounds

    Supports chemical shift prediction

    Can search by name, structure or chemical shifts (peaks and JCAMP file)

    Includes chemical shift assignments (but in organic solvents)
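A search by chemical shifts can be pictured as peak-list matching within a tolerance. The sketch below is an assumption about how such a search might work in principle, not a description of NMRShiftDB's actual algorithm: it uses simple greedy nearest-peak matching, and the function names, the 0.05 ppm tolerance and the shift lists are all hypothetical.

```python
def match_fraction(query_shifts, reference_shifts, tolerance=0.05):
    """Fraction of query shifts (ppm) paired with an unused reference shift
    that lies within `tolerance` ppm (greedy nearest-peak matching)."""
    if not query_shifts:
        return 0.0
    remaining = sorted(reference_shifts)
    matched = 0
    for shift in sorted(query_shifts):
        # Nearest reference shift that has not been paired yet.
        best = min(remaining, key=lambda r: abs(r - shift), default=None)
        if best is not None and abs(best - shift) <= tolerance:
            remaining.remove(best)
            matched += 1
    return matched / len(query_shifts)


def search_by_shifts(query_shifts, library, tolerance=0.05):
    """Rank library entries by the fraction of query shifts they explain."""
    hits = [
        (name, match_fraction(query_shifts, shifts, tolerance))
        for name, shifts in library.items()
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up shift lists in ppm, for illustration only.
library = {
    "entry_1": [1.20, 3.65, 7.26],
    "entry_2": [2.10, 2.45, 9.70],
}
results = search_by_shifts([1.22, 3.63], library)
```

Because assignments recorded in organic solvents can differ from shifts measured in aqueous samples, the tolerance would likely need to be loosened when querying with biofluid data.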

    19/54

    MMCD

    http://mmcd.nmrfam.wisc.edu/

    20,306 cmpds

    791 1H NMR

    791 13C NMR

    791 TOCSY

    791 13C HSQC

    300 1H NMR (Lit)

    907 13C NMR (Lit)

    525 HSQC (Lit)

    2021 MS (Lit)

    20/54

    MMCD

    Supports structure, name, NMR (shifts), MS (peaks) searches

    Data includes chemical formula, names and synonyms, structure, physical and chemical properties, NMR and MS data, NMR chemical shifts, species associations and extensive links to images, references, and other public databases

    21/54

    MS Spectral DBs

    NIST/AMDIS, Metlin, GolmDB, MassBank

    22/54

    MassBank

    http://www.massbank.jp/

    23/54

    24/54

    MassBank

    Very nicely maintained and easily searchable collection of mostly metabolite MS spectra

    Includes ESI-QTOF, ESI-QqQ, GC-EI-TOF, EI, ESI-FTICR, Ion-trap, etc.

    Covers 30,857 MS spectra from approximately 14,500 compounds

    Archives data from ~20 different sources (Japan, Germany, US, etc.)

    25/54

    Compound DBs

    ChEBI, PubChem, ChemSpider, Ligand Expo

    26/54

    ChEBI

    Pronounced KEBEE

    Chemical Entities of Biological Interest

    Contains 25,518 3-star compounds

    Most compounds are from KEGG, LipidMaps, DrugBank, Patents

    Most data is on names, ontology, synonyms, MW, formula and structure

    Searchable by name, formula, structure
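Formula search and the MW field can be illustrated with a small sketch. It assumes simple formulas without parentheses or charges, uses abridged standard atomic weights for six common elements, and compares formulas by element counts so that element order does not matter. The function names and the three-record table are illustrative only and do not reflect how ChEBI stores or indexes its data.

```python
import re

# Abridged standard atomic weights for a few common elements.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007,
    "O": 15.999, "P": 30.974, "S": 32.06,
}


def parse_formula(formula):
    """Turn a flat formula such as 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6}."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(digits) if digits else 1)
    return counts


def molecular_weight(formula):
    """Average molecular weight from the element counts."""
    total = 0.0
    for element, count in parse_formula(formula).items():
        if element not in ATOMIC_WEIGHTS:
            raise ValueError(f"no atomic weight for element {element!r}")
        total += ATOMIC_WEIGHTS[element] * count
    return total


def search_by_formula(formula, records):
    """Names of all records whose element counts equal the query's."""
    target = parse_formula(formula)
    return [name for name, f in records.items() if parse_formula(f) == target]


records = {"glucose": "C6H12O6", "fructose": "C6H12O6", "ethanol": "C2H6O"}
isomers = search_by_formula("C6H12O6", records)
glucose_mw = molecular_weight("C6H12O6")
```

The glucose/fructose pair shows why formula search alone cannot separate isomers, which is where structure search comes in.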

    27/54

    PubChem

    NIH database of 31 million compounds and 75 million substances, 1644 HT screens of compounds

    Compound must have

    28/54

    ChemSpider

    Contains 25 million compounds from 400 data sources

    Searchable by name, synonym, InChI, structure, registry #, SMILES, calculated properties (but not by formula or mass)

    Data includes names, synonyms, Wikipedia articles, descriptions, data sources, suppliers, patents, articles, properties, MeSH headings, pharmacology links, spectra (UV, IR, NMR, MS) sourced from other sites

    29/54

    Ligand Expo

    Contains the small molecules in the PDB

    Useful because it links chemicals/metabolites/drugs to their targets

    Also provides 3D structure coordinates

    Searchable via 3-letter chemical identifier code, molecular name, molecular formula, SMILES description, InChI, 3D structure

    30/54

    Other Compound DBs

    3DMet, KNApSAcK, ZINC, LipidMaps

    31/54

    Other DBs

    3DMet

    3D structure database of natural metabolites

    KNApSAcK

    Database of 50,000 plant metabolites linked to species information

    ZINC

    Database of 2.7 million commercially available chemicals (mostly drug-like compounds)

    LipidMaps

    Database with 30,000 lipids (fatty acyls, glycerolipids, glycerophospholipids, sphingolipids, sterols, prenols, saccharolipids, polyketides)

    32/54

    Pathway DBs

    KEGG, SMPDB, BioCyc/MetaCyc, Reactome

    33/54

    Pathway DBs

    Rich source of biological data that relates metabolites to genes, proteins, diseases, signaling events and processes

    Provide various tools to permit visualization and gene/metabolite mapping

    Often cover multiple species

    34/54

    KEGG: Kyoto Encyclopedia of Genes and Genomes

    http://www.genome.jp/kegg/

    35/54

    36/54

    The Small Molecule Pathway Database (SMPDB)

    http://www.smpdb.ca

    37/54

    SMPDB

    350 hand-drawn pathways relevant to human/mammalian metabolism

    155 drug pathways

    72 disease pathways

    12 signalling pathways

    70 standard metabolic pathways

    Searching and browsing capabilities

    Metabolite mapping capabilities

    Captures structure, organelle, cellular and organ information

    38/54

    Exploring Pathways with SMPDB

    39/54

    Mapping Metabolites with SMPDB

    40/54

    Mapping Metabolite Concentrations with SMPDB

    41/54

    42/54

    HMDB Features/Content

    7969 metabolites

    120 bacterial (gut microbe) metabolites

    Normal/abnormal concentrations

    700+ disease links

    1700 NMR spectra

    2600 MS spectra

    310 GC-MS spectra

    Sequence search tools

    Spectral search tools

    Extensive browsing tools

    Pathway search tools

    Structure searches

    Biofluid browsing

    Text search tools

    Full data downloads

    43/54

    The Human Metabolome Project

    $7.5 million Genome Canada Project launched in Jan. 2005, based at the University of Alberta

    Mandate to quantify and identify all metabolites in biofluids such as urine, CSF and blood as well as tissues using HT experiments and text analysis (~8000 compounds to date)

    Associate metabolite concentrations to ~500 diseases or conditions

    Make all data freely and electronically accessible (HMDB, DrugBank, FooDB, T3DB)

    Develop novel technologies and software to improve metabolome coverage and metabolomic throughput

    44/54

    The Human Metabolomes

    Concentration scale: M, mM, µM, nM, pM, fM

    Endogenous metabolites: 8000 (HMDB)

    Drugs: 1450 (DrugBank)

    Food additives/Phytochemicals: 30,000 (FooDB)

    Drug metabolites: 500 (DrugMet)

    Toxins/Env. Chemicals: 3100 (T3DB)

    45/54

    Meet the Metabolomes

    http://www.foodb.ca http://www.drugbank.ca

    http://www.hmdb.ca http://www.T3DB.org

    46/54

    Inside the HMDB

    47/54

    Inside the HMDB

    HMDB Databrowser

    102 data fields

    48/54

    HMDB Spectral Searching

    49/54

    HMDB: Pathway Tools

    - Enter metabolites

    - Link to metabolic pathways

    - Explore pathway images
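The "enter metabolites, link to pathways" step can be sketched as a lookup against an inverted index from metabolite name to pathway. This is a hypothetical illustration only: the pathway table uses placeholder names, the function names are invented here, and the real HMDB and SMPDB tools are not claimed to work this way.

```python
from collections import defaultdict


def build_index(pathways):
    """Invert {pathway: [metabolites]} into {metabolite (lowercase): {pathways}}."""
    index = defaultdict(set)
    for pathway, members in pathways.items():
        for metabolite in members:
            index[metabolite.lower()].add(pathway)
    return index


def map_metabolites(names, index):
    """Return pathways ranked by number of matched metabolites,
    plus the list of names that matched nothing."""
    hits = defaultdict(list)
    unmatched = []
    for name in names:
        cleaned = name.strip()
        found = index.get(cleaned.lower())
        if not found:
            unmatched.append(cleaned)
            continue
        for pathway in found:
            hits[pathway].append(cleaned)
    ranked = sorted(hits.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return ranked, unmatched


# Placeholder pathway membership table, not real pathway data.
pathways = {
    "Pathway A": ["Met1", "Met2", "Met3"],
    "Pathway B": ["Met3", "Met4"],
}
index = build_index(pathways)
ranked, unmatched = map_metabolites(["met1", "Met3 ", "Met9"], index)
```

Matching on lowercase names is the simplest option; a real tool would more likely resolve synonyms to stable identifiers before the lookup.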

    50/54

    The HMDB Biofluid Database

    Reference metabolite concentrations for >450 different diseases & conditions

    Abnormal and normal metabolite concentrations for >15 biofluids and >4500 different metabolites

    Designed for clinical chemists & physicians

    Largest & most complete resource of its kind
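One way a clinical chemist might use normal reference concentrations is to flag a measured value against a reference range. The sketch below assumes a record with a low and high bound in micromolar; the class, the field names and the numeric range are hypothetical placeholders, not values or a schema taken from the HMDB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceRange:
    """Normal concentration range for one metabolite in one biofluid."""
    metabolite: str
    biofluid: str
    low_um: float   # lower bound, micromolar
    high_um: float  # upper bound, micromolar


def classify(value_um, ref):
    """Label a measured concentration relative to the reference range."""
    if value_um < ref.low_um:
        return "below normal range"
    if value_um > ref.high_um:
        return "above normal range"
    return "within normal range"


# Placeholder range, invented for illustration only.
ref = ReferenceRange("metabolite_X", "urine", low_um=10.0, high_um=50.0)
labels = [classify(v, ref) for v in (5.0, 25.0, 80.0)]
```

In practice reference ranges depend on age, sex and normalization (for urine, often to creatinine), so a single pair of bounds is a simplification.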

    51/54

    Inside DrugBank

    http://www.drugbank.ca

    52/54

    Query Tools

    PharmaBrowse

    ChemQuery

    53/54

    Query Tools

    SeqSearch

    DataExtractor

    54/54

    Database Comparison

    Databases compared: HMDB, KEGG, PubChem, MMCD, ChEBI, SDBS, Reactome, Metlin, Cyc DBs

    Content types compared: MS spectra, NMR spectra, Pathways, Structures, Descriptions, Chem props, Physiol. data, Nomenclat., Links + Refs

