A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud Ali Hasnain et al. Insight Center for Data Analytics National University of Ireland, Galway


A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud

Ali Hasnain et al.

Insight Center for Data Analytics National University of Ireland, Galway


Agenda

• Motivation
• Linked Life Sciences Roadmap
• Cataloguing and Linking
• Extending Catalogue – Metadata & Provenance
• Query Engine
• Results


Motivation

• Biomedical data is heterogeneous and spread across multiple sources (SPARQL endpoints).

• Navigating these sources is a challenge.

• The sources contain trillions of triples and are represented with insufficient vocabulary reuse.

• Biologists sometimes want more information about the data, including its source, creator and publisher, as well as statistics on its size (metadata and provenance).


How to deal with heterogeneous data?

DrugBank

DailyMed

ChEBI, KEGG

Reactome

SIDER

BioPax

Medicare


We want to query the content, not the source

Proteins

Molecules

Genes

Diseases


A Linked Life Sciences Roadmap

Proteins

Molecules

Genes

Diseases

:Protein

:Molecule

:Gene

:Disease

UniProt

PDB

Pfam

PROSITE

ProDom

UniRef

UniParc

Dailymed

DrugBank

ChEMBL

PubChem

KEGG

Gene Ontology

GeneID

Affymetrix

HomoloGene

MGI

Diseasome

SIDER


2- Possible Solutions

• To assemble queries over multiple graphs at multiple endpoints, either:

• vocabularies and ontologies are reused (“a priori integration”), or
• translation maps between different terminologies are created (“a posteriori integration”)
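The translation-map option can be sketched as follows. This is only an illustration under stated assumptions: the endpoint URLs and class URIs are hypothetical placeholders (not entries from the actual catalogue), and `build_queries` is a name invented here. The idea is that a user asks for a shared concept such as "Protein", and the map rewrites that into one query per endpoint using each source's own terminology.

```python
# Hypothetical translation map: shared concept -> {endpoint URL: source-specific class URI}.
# Every URL and URI below is an illustrative placeholder, not real catalogue content.
TRANSLATION_MAP = {
    "Protein": {
        "http://example.org/sparql/uniprot": "http://example.org/uniprot/Protein",
        "http://example.org/sparql/pdb": "http://example.org/pdb/Structure",
    },
    "Molecule": {
        "http://example.org/sparql/drugbank": "http://example.org/drugbank/drugs",
    },
}


def build_queries(concept, limit=10):
    """Return one SPARQL query string per endpoint that covers `concept`."""
    queries = {}
    for endpoint, class_uri in TRANSLATION_MAP.get(concept, {}).items():
        # Each endpoint is queried with its own class URI for the shared concept.
        queries[endpoint] = (
            f"SELECT ?s WHERE {{ ?s a <{class_uri}> }} LIMIT {limit}"
        )
    return queries


if __name__ == "__main__":
    for endpoint, query in build_queries("Protein").items():
        print(endpoint, "->", query)
```

A concept with no entry in the map simply yields no queries, which makes missing coverage easy to detect.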


A priori vs. a posteriori Integration

Cataloguing and Linking


Describing Datasets: an Extract from the Catalogue


Extending Catalogue – Metadata & Provenance
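A minimal sketch of what an extended catalogue entry could look like, covering the fields named in the motivation (source, creator, publisher, size). It assumes the entry is expressed with Dublin Core terms and the VoID vocabulary; whether the actual catalogue uses these properties is an assumption, and `describe_dataset` plus all example values are hypothetical.

```python
# Namespaces of two widely used vocabularies; their use here is an assumption.
DCTERMS = "http://purl.org/dc/terms/"
VOID = "http://rdfs.org/ns/void#"


def describe_dataset(dataset_uri, endpoint, source, creator, publisher, triples):
    """Return (subject, predicate, object) tuples describing one dataset."""
    return [
        (dataset_uri, VOID + "sparqlEndpoint", endpoint),
        (dataset_uri, DCTERMS + "source", source),
        (dataset_uri, DCTERMS + "creator", creator),
        (dataset_uri, DCTERMS + "publisher", publisher),
        # Size statistic: number of triples in the dataset.
        (dataset_uri, VOID + "triples", str(triples)),
    ]


if __name__ == "__main__":
    # All values are placeholders for illustration only.
    entry = describe_dataset(
        dataset_uri="http://example.org/dataset/demo",
        endpoint="http://example.org/sparql/demo",
        source="http://example.org/original-dump",
        creator="Example Creator",
        publisher="Example Publisher",
        triples=1000,
    )
    for triple in entry:
        print(triple)
```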


Query Engine

http://srvgal86.deri.ie:8000/graph/Granatum


Visual & Graphical View


SPARQL Endpoints returning results per query
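One way such a per-query count could be derived is sketched below. The input layout (query name mapped to per-endpoint result rows) and the function name `endpoints_with_results` are assumptions made for illustration, and the sample data is invented, not taken from the evaluation.

```python
def endpoints_with_results(results_by_query):
    """Count, for each query, how many endpoints returned at least one row."""
    return {
        query: sum(1 for rows in per_endpoint.values() if rows)
        for query, per_endpoint in results_by_query.items()
    }


if __name__ == "__main__":
    # Invented sample data: query -> {endpoint: list of result rows}.
    sample = {
        "Q1": {"endpointA": ["row1", "row2"], "endpointB": [], "endpointC": ["row1"]},
        "Q2": {"endpointA": [], "endpointB": []},
    }
    print(endpoints_with_results(sample))
```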


Runtimes taken by different queries (Max, Min, Average, Median)
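The four statistics named above can be computed from a list of measured runtimes as in this sketch. The function name `summarise_runtimes`, the millisecond unit, and the sample values are assumptions for illustration; the presentation's actual measurements are not reproduced here.

```python
import statistics


def summarise_runtimes(runtimes_ms):
    """Return max, min, average and median of a non-empty list of runtimes."""
    if not runtimes_ms:
        raise ValueError("at least one runtime is required")
    return {
        "max": max(runtimes_ms),
        "min": min(runtimes_ms),
        "average": statistics.mean(runtimes_ms),
        "median": statistics.median(runtimes_ms),
    }


if __name__ == "__main__":
    # Invented runtimes (in milliseconds) for a single query.
    print(summarise_runtimes([120, 80, 200, 100]))
```

Reporting the median alongside the average is useful here because a single slow endpoint can skew the average considerably.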
