

DisGeNET: a discovery platform to support translational research and drug discovery

Janet Piñero, Núria Queralt-Rosinach, Àlex Bravo, Ferran Sanz and Laura I. Furlong

Integrative Biomedical Informatics Group, Research Programme on Biomedical Informatics; Hospital del Mar Medical Research Institute; Pompeu Fabra University

Acknowledgements

The authors thank the Open PHACTS partners, Michel Dumontier and the OpenLink staff for their input, collaboration and help.

Funding: We received support from ISCIII-FEDER (PI13/00082, CP10/00524); from the IMI-JU under grant agreements nº 115002 (eTOX), nº 115191 (Open PHACTS), nº 115372 (EMIF) and nº 115735 (iPiE), resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution; and from the EU H2020 Programme 2014-2020 under grant agreements no. 634143 (MedBioinformatics) and no. 676559 (Elixir-Excelerate). The Research Programme on Biomedical Informatics (GRIB) is a node of the Spanish National Institute of Bioinformatics (INB).

DisGeNET: Disease-Gene NETwork of relations for discovery

DisGeNET in the Open PHACTS Discovery Platform for drug discovery and development

[Diagram labels: DATA, DISCOVERY, KNOWLEDGE BASE, TOOLS FOR EXPLORATION AND ANALYSIS]

Motivation: A better understanding of the genetic component of human diseases and of disease mechanisms, in support of translational research and of drug discovery and development.

Challenge: One of the major current bottlenecks for knowledge discovery on the genetic component of diseases is that the information is fragmented. The vast amount of biomedical information about genotype-phenotype relations is distributed across several databases, represented and annotated using different data models, vocabularies and standards, and is domain- and technology-specific, which hampers its access, integration, analysis and interpretation.

Approach: The DisGeNET Discovery Platform [1] collects and integrates the available information on gene-disease associations (GDAs), covering the whole spectrum of human diseases and using standards for their annotation and representation.

DisGeNET in the LOD cloud for translational research

• DisGeNET + external multidomain sources.
• It is interlinked with other biomedical databases to answer scientific questions that require querying cross-domain resources.
• It aims to support the development of bioinformatics Semantic Web applications that extract key knowledge on the molecular mechanisms of diseases.

DisGeNET-RDF: a GDA Linked Open Data resource

Implementation: The platform is composed of a knowledge base and a set of tools for data analysis and interpretation.

[Platform overview diagram. Labels: EVIDENCE-BASED DISCOVERY; DATABASES & LITERATURE; LARGE-SCALE EXTRACTION AND INTEGRATION; STANDARDS; METADATA; INTEROPERABILITY; INTEGRATION; OPEN; DISCOVERABILITY; COMMUNITY USE; DIGITAL PUBLICATION, SHARING AND LINKING. User profiles: CLINICIAN, RESEARCHER, CURATOR, BIOINFORMATICIAN & DEVELOPER.]

http://www.disgenet.org/

Usage stats (Aug 2014-Aug 2015):
• 12,040 users, 22,696 sessions
• 14,494 downloads
• DisGeNET used in more than 20 publications, cited in more than 60 articles
• Other projects: PubAnnotation, OpenLifeData

Registered:
• biosharing
• OMICtools
• NeuroLex
• Datahub

Present in the Semantic Web:
• URI/RDF/nanopublications
• Machine-processable
• Semantic integration
• Links to the Linked Open Data (LOD) cloud
• Data analysis across domains

SEMANTIC WEB

What is the tissue expression pattern of the genes associated with obesity?

• Large-scale integration across domains
• 17,181 genes (PANTHER class)
• 14,610 diseases (MeSH class)

60% complex, 36% rare/Mendelian, and 4% infectious diseases

Terminology   DO   MSH   OMIM   NCI   ORDO   ICD9
Value         19   58    38     33    13     12

TRACK OF EVIDENCE

$S = W_{\mathrm{CURATED}} + W_{\mathrm{PREDICTED}} + W_{\mathrm{LITERATURE}}$

• Provenance (PubMed ID, source)

• DisGeNET score (evidence)
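The score above is a plain sum of three evidence weights, which can be sketched as follows. This is only an illustration under assumptions: the poster does not state how each weight is computed, so the class name `EvidenceWeights`, the helper `rank_by_score`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceWeights:
    """Weights for one gene-disease association (values supplied by the caller)."""

    curated: float
    predicted: float
    literature: float

    def score(self) -> float:
        # S = W_CURATED + W_PREDICTED + W_LITERATURE
        return self.curated + self.predicted + self.literature


def rank_by_score(associations):
    """Sort (label, EvidenceWeights) pairs from highest to lowest score."""
    return sorted(associations, key=lambda pair: pair[1].score(), reverse=True)


# Hypothetical weights, chosen only to exercise the code.
example = [
    ("gene B - disease X", EvidenceWeights(0.0, 0.25, 0.0)),
    ("gene A - disease X", EvidenceWeights(0.5, 0.0, 0.25)),
]
ranked = rank_by_score(example)
for label, weights in ranked:
    print(label, weights.score())
```

Keeping the three weights separate, rather than storing only their sum, preserves the track of evidence: a user can still see which kind of source supports an association.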

Web: http://www.disgenet.org/
RDF: http://rdf.disgenet.org/
SPARQL: http://rdf.disgenet.org/sparql/
Open PHACTS API: https://dev.openphacts.org
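As a rough sketch of programmatic access through the SPARQL endpoint listed above, the following builds a standard SPARQL protocol request and flattens a SPARQL JSON result. The query pattern is an assumption about how DisGeNET-RDF models associations (a GDA linked by `sio:SIO_000628` to a gene typed `ncit:C16612` and a disease typed `ncit:C7057`), as is the `ncit:` namespace; the poster does not spell these out, so check them against the published schema before relying on them.

```python
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "http://rdf.disgenet.org/sparql/"  # from the access list

# Assumed modelling (not stated on the poster): a GDA "refers to" a gene
# and a disease; class and property IRIs below are assumptions.
GDA_QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
SELECT DISTINCT ?gda ?gene ?disease
WHERE {
  ?gda sio:SIO_000628 ?gene, ?disease .
  ?gene rdf:type ncit:C16612 .
  ?disease rdf:type ncit:C7057 .
}
LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a SPARQL GET request."""
    return f"{endpoint}?{urllib.parse.urlencode({'query': query})}"


def parse_bindings(payload: str):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str):
    """Send the query and return its rows (needs network access)."""
    request = urllib.request.Request(
        build_request_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `run_query(SPARQL_ENDPOINT, GDA_QUERY)` would contact the live endpoint; the two helper functions can be exercised offline with a canned JSON document.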

ACCESS

Open Database License: http://opendatacommons.org/licenses/odbl/1.0/

Downloads:
• Tab-separated plain text
• SQLite
• RDF
• Trusty nanopublications

Web interface
SPARQL endpoint / Linked Data browser
Open PHACTS Discovery Platform
Nanopublication network
disGeNET2R R package

DIFFERENT USER PROFILES

AVAILABILITY

Metadata:
• data-item description
• dataset description

Programmatic access:
• Automatic analysis
• Higher speed
• Reduce error
• Share results
• Embed in workflows
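To illustrate how the tab-separated download could be embedded in an automated workflow, here is a minimal sketch that filters associations by score. The column names (`geneId`, `diseaseId`, `score`) and the sample rows are assumptions made for the example, not the actual layout of the DisGeNET files.

```python
import csv
import io


def read_gda_table(handle, min_score=0.0):
    """Yield rows of a tab-separated GDA table whose score is >= min_score."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if float(row["score"]) >= min_score:
            yield row


# Hypothetical two-row sample; identifiers and scores are placeholders.
sample = io.StringIO(
    "geneId\tdiseaseId\tscore\n"
    "1\tC0000001\t0.70\n"
    "2\tC0000002\t0.10\n"
)
strong = list(read_gda_table(sample, min_score=0.5))
for row in strong:
    print(row["geneId"], row["diseaseId"], row["score"])
```

With a real download, the `io.StringIO` sample would be replaced by an open file handle, and the column names adjusted to match the file header.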

REPRODUCIBILITY

Several formats and models

Transparency and Validation

SOURCES

Recent findings

429,111 Gene-Disease Associations

Sentence description

NORMALIZATION AND HARMONIZATION

• Gene: NCBI Gene ID
• Disease: UMLS CUIs
• DisGeNET association type ontology

INTEROPERABILITY

• Syntactic: common IDs and ontologies
• Semantic: 11 common ontologies in RDF [2] and nanopublications [3]

STANDARDIZATION

Digital objects

DisGeNET association type ontology

Semanticscience Integrated Ontology (SIO) [4]

• Normalized Identification Scheme http://rdf.disgenet.org/resource/gda/ + ID
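The identification scheme above (base URI plus ID) can be sketched as a pair of helpers. The base URI comes from the poster; the function names, the validation rules and the sample identifier `EXAMPLE123` are hypothetical.

```python
GDA_BASE = "http://rdf.disgenet.org/resource/gda/"  # base URI from the poster


def gda_uri(gda_id: str) -> str:
    """Build the URI of a gene-disease association from its identifier."""
    if not gda_id or "/" in gda_id or any(ch.isspace() for ch in gda_id):
        raise ValueError(f"not a usable GDA identifier: {gda_id!r}")
    return GDA_BASE + gda_id


def gda_id_from_uri(uri: str) -> str:
    """Recover the identifier from a GDA URI built with the scheme above."""
    if not uri.startswith(GDA_BASE):
        raise ValueError(f"not a GDA URI: {uri!r}")
    return uri[len(GDA_BASE):]


# 'EXAMPLE123' is a placeholder, not a real DisGeNET identifier.
uri = gda_uri("EXAMPLE123")
print(uri)
print(gda_id_from_uri(uri))
```

Because every association gets one predictable URI, it can be treated as an entity in its own right and linked to its evidence, score and provenance.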

http://lod-cloud.net/; Aug 2014

4,962,315 RDF links to RDF datasets in the LOD

https://datahub.io/dataset/disgenet (more statistics)

LOD cloud RDFIZATION

METADATA

RDF

INTERLINKING

• Dataset (Open PHACTS + )

• Linksets (Open PHACTS + )

• Use Open PHACTS guidelines
• Dereferenceable URIs (primary or )
• SIO

OWL

• NCBI Gene ID
• PANTHER Classification
• UMLS CUIs
• MeSH Classification

• Data providers
• Disease annotation in the Open PHACTS Discovery Platform [5]
• OMIM included
• > 20,000,000 triples

RDF SCHEMA METADATA INTERLINKING

• Linksets providers
• > 70,000 linksets

FUTURE

New data:
• Disease-phenotype associations (HPO)
• New use cases
• New API calls

Score:
• Add to API calls

EXPLORER KNIME

More @ http://www.disgenet.org/web/DisGeNET/menu/rdf#sparql-queries-2

MAPPINGS TO OTHER DISEASE TERMINOLOGIES

[Schema diagram labels: DRUG, TARGET, PATHWAY, DISEASE, PHENOTYPE, GENE, GDA, EVIDENCE, SNP, SCORE]

Gene-disease association as entity

• Data item

• Dataset

<disease> <void:inDataset> <dgn-void:disease-dataset>

http://www.myexperiment.org/groups/1125.html

API

References

1. Piñero, J., Queralt-Rosinach, N., Bravo, À., Deu-Pons, J., Bauer-Mehren, A., Baron, M., … Furlong, L. I. (2015). DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database, 2015, bav028.
2. Queralt-Rosinach, N., Piñero, J., Bravo, À., Sanz, F., and Furlong, L. I. (2015). DisGeNET-RDF: harnessing the innovative power of the Semantic Web to explore the genetic basis of diseases (submitted).
3. Queralt-Rosinach, N., Kuhn, T., Chichester, C., Dumontier, M., Sanz, F., and Furlong, L. I. (2015). Publishing DisGeNET as Nanopublications. Semantic Web Journal (to appear), 1-10.
4. Dumontier, M., Baker, C. J., Baran, J., Callahan, A., Chepelev, L., Cruz-Toledo, J., … Hoehndorf, R. (2014). The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. Journal of Biomedical Semantics, 5(1), 14.
5. Gray, A. J. G., Groth, P., Loizou, A., Askjaer, S., Brenninkmeijer, C., Burger, K., … Williams, A. J. (2014). Applying linked data approaches to pharmacology: Architectural decisions and implementation. Semantic Web. IOS Press. doi:10.3233/SW-2012-0088

• GDAs described by SIO

https://dev.openphacts.org

/disease/getTargets

http://rdf.disgenet.org/void-v3.0.0.ttl

Which compounds target proteins associated with Parkinson's disease or Alzheimer's disease?
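A question like this one would start by asking the Open PHACTS API for the targets of each disease. The sketch below only builds the request URLs for that first step. The path `/disease/getTargets` is taken from the poster; the base URL, the query parameter names (`uri`, `app_id`, `app_key`, `_format`) and the disease URIs are assumptions to be checked against the API documentation.

```python
from urllib.parse import urlencode

GET_TARGETS_PATH = "/disease/getTargets"  # path shown on the poster


def build_get_targets_url(api_base, disease_uri, app_id, app_key):
    """Build a request URL for the targets associated with one disease.

    Parameter names are assumed, not confirmed by the poster.
    """
    query = urlencode(
        {
            "uri": disease_uri,
            "app_id": app_id,
            "app_key": app_key,
            "_format": "json",
        }
    )
    return f"{api_base.rstrip('/')}{GET_TARGETS_PATH}?{query}"


# Placeholder base URL, credentials and disease URIs, for illustration only.
diseases = [
    "http://example.org/disease/parkinson",
    "http://example.org/disease/alzheimer",
]
urls = [
    build_get_targets_url("https://api.example.org", d, "my-id", "my-key")
    for d in diseases
]
for url in urls:
    print(url)
```

The targets returned for each disease would then be passed to a second call that retrieves the compounds acting on them; that step is not sketched here because the poster does not name the corresponding API call.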