DisGeNET-RDF: a GDA Linked Open Data resource
DisGeNET: a discovery platform to support translational research and drug discovery
Janet Piñero, Núria Queralt-Rosinach, Àlex Bravo, Ferran Sanz and Laura I. Furlong
Integrative Biomedical Informatics Group, Research Programme on Biomedical Informatics; Hospital del Mar Medical Research Institute; Pompeu Fabra University
Acknowledgements
The authors thank the Open PHACTS partners, Michel Dumontier and the OpenLink staff for their input, collaboration and help.
Funding: We received support from ISCIII-FEDER (PI13/00082, CP10/00524); from the IMI-JU under grant agreements no. 115002 (eTOX), no. 115191 (Open PHACTS), no. 115372 (EMIF) and no. 115735 (iPiE), resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution; and from the EU H2020 Programme 2014-2020 under grant agreements no. 634143 (MedBioinformatics) and no. 676559 (Elixir-Excelerate). The Research Programme on Biomedical Informatics (GRIB) is a node of the Spanish National Institute of Bioinformatics (INB).
DisGeNET: Disease-Gene NETwork of relations for discovery
DATA
DISCOVERY
KNOWLEDGE BASE
TOOLS FOR EXPLORATION AND ANALYSIS
Motivation: A better understanding of the genetic component of human diseases and of disease mechanisms, in support of translational research and drug discovery and development.
Challenge: One of the major current bottlenecks for knowledge discovery on the genetic component of diseases is that the information is fragmented. The vast amount of biomedical information about genotype-phenotype relations is distributed across several databases, represented and annotated using different data models, vocabularies and standards, and is domain- and technology-specific, which hampers its access, integration, analysis, and interpretation.
Approach: The DisGeNET Discovery Platform [1] collects and integrates the available information on gene-disease associations (GDAs), covering the whole spectrum of human diseases, and uses standards for their annotation and representation.
DisGeNET in the LOD cloud for translational research
• DisGeNET + external multidomain sources in LOD.
• It is interlinked to other biomedical databases to answer scientific questions that need the interrogation of cross-domain resources.
• It aims to support the development of bioinformatic Semantic Web applications to extract key knowledge on the molecular mechanisms of diseases.
Implementation: The platform is composed of a knowledge base and a set of tools for data analysis and interpretation.
EVIDENCE-BASED DISCOVERY
CLINICIAN INTEROPERABILITY
METADATA
DATABASES & LITERATURE STANDARDS
INTEGRATION
OPEN
http://www.disgenet.org/
RESEARCHER CURATOR
BIOINFORMATICIAN & DEVELOPER
DISCOVERABILITY COMMUNITY USE
LARGE-SCALE EXTRACTION AND INTEGRATION
DIGITAL PUBLICATION, SHARING AND LINKING
Usage stats (Aug 2014 to Aug 2015):
• 12,040 users, 22,696 sessions
• 14,494 downloads
• DisGeNET used in 20+ publications, cited in 60+ articles
• Other projects: PubAnnotation, OpenLifeData
Registered:
• biosharing
• OMICtools
• NeuroLex
• Datahub
Present in the Semantic Web:
• URI/RDF/nanopublications
• Machine-processable
• Semantic integration
• Links to the Linked Open Data (LOD) cloud
• Data analysis across domains
SEMANTIC WEB
What is the tissue expression pattern of the genes associated with Obesity?
• Large-scale integration across domains
• 17,181 Genes • PANTHER class
• 14,610 Diseases • MeSH class
60% complex, 36% rare/Mendelian, and 4% infectious diseases
DO MSH OMIM NCI ORDO ICD9
19 58 38 33 13 12
TRACK OF EVIDENCE
$S = W_{CURATED} + W_{PREDICTED} + W_{LITERATURE}$
• Provenance (PubMed ID, source)
• DisGeNET score (evidence)
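As a minimal sketch of how the score above combines evidence, the Python snippet below sums one weight per evidence type (curated, predicted, literature). The class name, the function name and the example weight values are hypothetical and chosen only for illustration; the poster does not state how each individual weight is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceWeights:
    """One weight per evidence type for a single gene-disease association.

    The field names mirror the three terms of the score formula; how each
    weight is derived from the underlying sources is NOT given in the poster.
    """
    curated: float
    predicted: float
    literature: float


def disgenet_score(weights: EvidenceWeights) -> float:
    """Return S = W_CURATED + W_PREDICTED + W_LITERATURE."""
    return weights.curated + weights.predicted + weights.literature


if __name__ == "__main__":
    # Hypothetical weights, for illustration only.
    example = EvidenceWeights(curated=0.5, predicted=0.25, literature=0.125)
    print(disgenet_score(example))
```

Keeping the three weights as separate fields, rather than storing only their sum, preserves the track of evidence: a user can still see which evidence type contributes to a given score.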
Web: http://www.disgenet.org/
RDF: http://rdf.disgenet.org/
SPARQL: http://rdf.disgenet.org/sparql/
Open PHACTS API: https://dev.openphacts.org
ACCESS
Open Database License: http://opendatacommons.org/licenses/odbl/1.0/
Downloads:
• Tab separated plain text
• SQLite
• RDF
• Trusty nanopublications
Web interface
SPARQL endpoint / Linked Data browser
Open PHACTS Discovery Platform
Nanopublication network
disGeNET2R R package
DIFFERENT USER PROFILES
AVAILABILITY
Metadata:
• data-item description
• dataset description
Programmatic access:
• Automatic analysis
• Higher speed
• Reduce error
• Share results
• Embed in workflows
REPRODUCIBILITY
Several formats and models
Transparency and Validation
SOURCES
Recent findings
429,111 Gene-Disease Associations
Sentence description
NORMALIZATION
HARMONIZATION
• NCBI Gene ID • UMLS CUIs.
DisGeNET association type ontology
INTEROPERABILITY
SYNTACTIC
COMMON IDs and ONTOLOGIES
SEMANTIC • 11 common ontologies in
• RDF [2]
• Nanopublications [3]
• GENE:
• DISEASE
STANDARDIZATION
Digital objects
DisGeNET association type ontology
Semanticscience Integrated Ontology (SIO) [4]
• Normalized Identification Scheme http://rdf.disgenet.org/resource/gda/ + ID
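The identification scheme above (a fixed base URI plus an identifier) can be sketched in Python as follows. This is only an illustration: the helper names and the sample identifier used in the usage lines are hypothetical, and the actual format of DisGeNET association identifiers is not given in the poster.

```python
from urllib.parse import quote, unquote

# Base URI for gene-disease associations, as stated on the poster.
GDA_BASE = "http://rdf.disgenet.org/resource/gda/"


def gda_uri(gda_id: str) -> str:
    """Build the URI of an association as base URI + ID."""
    if not gda_id or gda_id != gda_id.strip():
        raise ValueError("identifier must be non-empty, without surrounding whitespace")
    # Percent-encode anything that is not safe inside a single path segment.
    return GDA_BASE + quote(gda_id, safe="")


def gda_id_from_uri(uri: str) -> str:
    """Recover the identifier from a URI that follows the scheme."""
    if not uri.startswith(GDA_BASE) or len(uri) == len(GDA_BASE):
        raise ValueError("not a GDA URI: " + uri)
    return unquote(uri[len(GDA_BASE):])


if __name__ == "__main__":
    # "EXAMPLE-ID" is a placeholder, not a real DisGeNET identifier.
    uri = gda_uri("EXAMPLE-ID")
    print(uri)
    print(gda_id_from_uri(uri))
```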
http://lod-cloud.net/; Aug 2014
4,962,315 RDF links to RDF datasets in the LOD
https://datahub.io/dataset/disgenet (more statistics)
LOD cloud RDFIZATION
METADATA
RDF
INTERLINKING
• Dataset (Open PHACTS + )
• Linksets (Open PHACTS + )
• Use Open PHACTS guidelines
• Dereferenceable URIs (primary or )
• SIO
OWL
• NCBI Gene ID • PANTHER Classification
• UMLS CUIs • MeSH Classification
• Data providers • Disease annotation in the Open PHACTS Discovery Platform [5]
• OMIM included • > 20,000,000 triples
RDF SCHEMA METADATA INTERLINKING • Linksets providers • > 70,000 linksets
FUTURE
New data:
• Disease-phenotype associations (HPO)
• New use cases
• New API calls
Score:
• Add to API calls
EXPLORER KNIME
More @ http://www.disgenet.org/web/DisGeNET/menu/rdf#sparql-queries-2
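To illustrate the kind of programmatic access the SPARQL endpoint allows, the Python sketch below builds a query for the genes associated with a given disease and prepares, without sending, an HTTP request to http://rdf.disgenet.org/sparql/. Several details are assumptions not stated in the poster: that `sio:SIO_000628` ("refers to") links an association to its gene and disease, that genes are typed as `ncit:C16612`, and that diseases are identified by IRIs of the form used in `OBESITY_IRI`. The function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# SPARQL endpoint as listed on the poster.
ENDPOINT = "http://rdf.disgenet.org/sparql/"

# Assumed disease IRI (UMLS CUI for Obesity); the IRI pattern is an assumption.
OBESITY_IRI = "http://linkedlifedata.com/resource/umls/id/C0028754"

# Assumed vocabulary: sio:SIO_000628 links an association to the gene and
# disease it refers to; ncit:C16612 is assumed to be the class of genes.
QUERY_TEMPLATE = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>

SELECT DISTINCT ?gda ?gene
WHERE {{
  ?gda sio:SIO_000628 ?gene , <{disease}> .
  ?gene rdf:type ncit:C16612 .
}}
LIMIT {limit}
"""


def build_gda_query(disease_iri: str, limit: int = 10) -> str:
    """Return a SPARQL query listing associations and genes for one disease."""
    forbidden = set('<>"{}|^`\\')
    if not disease_iri or any(c in forbidden or c.isspace() for c in disease_iri):
        raise ValueError("disease_iri is not a valid IRI")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return QUERY_TEMPLATE.format(disease=disease_iri, limit=limit)


def build_request(query: str) -> Request:
    """Prepare (but do not send) a SPARQL protocol GET request."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_request(build_gda_query(OBESITY_IRI, limit=5))
    print(request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would execute the query, provided the endpoint is reachable. Answering the tissue-expression question would additionally require linking the returned genes to an expression dataset in the LOD cloud, which this sketch does not attempt.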
MAPPINGS TO OTHER DISEASE TERMINOLOGIES
DRUG
TARGET PATHWAY
DISEASE
DISEASE PHENOTYPE
DISEASE GENE
GDA
EVIDENCE SNP SCORE
Gene-disease association as entity
• Data item
• Dataset
<disease> <void:inDataset> <dgn-void:disease-dataset>
http://www.myexperiment.org/groups/1125.html
API
References
1. Piñero, J., Queralt-Rosinach, N., Bravo, À., Deu-Pons, J., Bauer-Mehren, A., Baron, M., … Furlong, L. I. (2015). DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database, 2015(0), bav028–bav028.
2. Queralt-Rosinach, N., Piñero, J., Bravo, À., Sanz, F., and Furlong, L. I. DisGeNET-RDF: harnessing the innovative power of the Semantic Web to explore the genetic basis of diseases, 2015 (submitted).
3. Queralt-Rosinach, N., Kuhn, T., Chichester, C., Dumontier, M., Sanz, F., and Furlong, L. I. Publishing DisGeNET as Nanopublications. Semantic Web Journal, (to appear), 1-10, 2015.
4. Dumontier, M., Baker, C. J., Baran, J., Callahan, A., Chepelev, L., Cruz-Toledo, J., … Hoehndorf, R. (2014). The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. Journal of Biomedical Semantics, 5(1), 2014.
5. Gray, A. J. G., Groth, P., Loizou, A., Askjaer, S., Brenninkmeijer, C., Burger, K., … Williams, A. J. (2014, January 1). Applying linked data approaches to pharmacology: Architectural decisions and implementation. Semantic Web. IOS Press. doi:10.3233/SW-2012-0088
• GDAs described by SIO
https://dev.openphacts.org
/disease/getTargets
http://rdf.disgenet.org/void-v3.0.0.ttl
Which compounds target proteins associated with Parkinson's disease or Alzheimer's disease?
DisGeNET in the Open PHACTS Discovery Platform for drug discovery and development