Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project

Alasdair J G Gray [email protected]
www.alasdairjggray.co.uk @gray_alasdair

Image: http://c745.r45.cf2.rackcdn.com/img/2009/lens_filter_coasters.jpg



Open PHACTS Use Case

“Let me compare MW, logP and PSA for launched inhibitors of human & mouse oxidoreductases”

• Chemical properties (ChemSpider)
• Launched drugs (DrugBank)
• Human => Mouse (HomoloGene)
• Protein families (Enzyme)
• Bioactivity data (ChEMBL)
• … other info (UniProt/Entrez etc.)


21/05/2014 Brighton Seminar 2


Figure: data from the literature, PubChem, GenBank, patents and other databases is downloaded, then integrated and analysed inside each company's firewalled databases; the process is repeated at each company.

Lowering industry firewalls: pre-competitive informatics in drug discovery Nature Reviews Drug Discovery (2009) 8, 701-708 doi:10.1038/nrd2944

A single, shared solution.

Funded under:
• IMI: 2011-14
• ENSO: 2014-16

Pre-competitive Informatics


Open PHACTS Discovery Platform

Figure: apps make method calls to a domain API and receive interactive responses from the Drug Discovery Platform, a production-quality integration platform.


API Hits

15.8 million total hits (April 2013 – March 2014)


An “App Store”?

http://www.openphactsfoundation.org/apps.html

Explorer, Explorer2, ChemBioNavigator, Target Dossier, Pharmatrek, Helium, MOE Collector, Cytophacts, Utopia, Garfield, SciBite, KNIME, Mol. Data Sheets, PipelinePilot, scinav.it, Taverna


Linked Data API

Figure: API methods grouped by domain: Drug, Disease, Pathway, Target.

https://dev.openphacts.org/


OPS Discovery Platform

Figure: architecture of the Open PHACTS Discovery Platform. Apps sit on top of the Linked Data API (RDF/XML, TTL, JSON) and domain-specific services. The core platform comprises a semantic workflow engine, the Identity Resolution Service (illustrated with “Adenosine receptor 2a”, P12374, EC2.43.4 and CS4532), the Identifier Management Service, chemistry registration (normalisation & Q/C), indexing, and a data cache (Virtuoso triple store). Content comes from public, commercial, public-ontology and user-annotation sources, each supplied as RDF, nanopublications or databases and described with VoID.


Platform Interaction


Provenance


Multiple Identities

Andy Law's Third Law: “The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”

http://bioinformatics.roslin.ac.uk/lawslaws/

P12047, X31045, P12047, GB:29384, RS_2353

Are these the same thing?


Gleevec® = Imatinib Mesylate

Figure: records for the drug in DrugBank, ChemSpider and PubChem, labelled “Imatinib” or “Imatinib Mesylate” (InChIKey YLMAHDNUQAMNNX-UHFFFAOYSA-N).


Multiple Links: Different Reasons

Link: skos:closeMatch; Reason: non-salt form

Link: skos:exactMatch; Reason: drug name


Dynamic Equality

Figure: a scale from Strict (analysing) to Relaxed (browsing). The strict view links records by skos:exactMatch (InChI).

The relaxed view adds skos:closeMatch (Drug Name) links alongside the skos:exactMatch (InChI) links.
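The strict and relaxed views can be sketched as filters over links that carry their justification. This is a minimal Python sketch, not the Open PHACTS implementation: the `Link` record, the `STRICT`/`RELAXED` lens definitions, `apply_lens` and the `ex:` identifiers are all hypothetical, and the (predicate, justification) pairs simply follow the labels in the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One mapping between two records, plus the reason it was asserted."""
    source: str
    predicate: str       # e.g. "skos:exactMatch" or "skos:closeMatch"
    justification: str   # e.g. "InChI" or "drug name"
    target: str


# Hypothetical lenses: each is the set of (predicate, justification)
# pairs that the lens accepts as evidence of equality.
STRICT = {("skos:exactMatch", "InChI")}
RELAXED = STRICT | {("skos:closeMatch", "drug name")}


def apply_lens(links, lens, source):
    """Return, sorted, the targets directly linked to `source` under `lens`."""
    return sorted(
        link.target
        for link in links
        if link.source == source and (link.predicate, link.justification) in lens
    )


# Hypothetical example links (identifiers invented for illustration).
links = [
    Link("ex:gleevec", "skos:exactMatch", "InChI", "ex:imatinib_mesylate"),
    Link("ex:gleevec", "skos:closeMatch", "drug name", "ex:imatinib"),
]

print(apply_lens(links, STRICT, "ex:gleevec"))   # InChI-based match only
print(apply_lens(links, RELAXED, "ex:gleevec"))  # also the name-based match
```

Keeping the justification on each link is what lets one link set serve both the analysing and the browsing view; switching lens changes only the filter, not the data.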


Initial Connectivity


Datasets 37

Linksets 104

Links 7,096,712

Justifications 7


Compound Information


Genes == Proteins?

BRCA1 / Breast cancer type 1 susceptibility protein

http://en.wikipedia.org/wiki/File:Protein_BRCA1_PDB_1jm7.png

http://en.wikipedia.org/wiki/File:BRCA1_en.png


Proceed with Caution!


Co-reference Computation

Rules ensure:
• Unrestricted transitivity within a conceptual type
• Restricted crossing of conceptual types

Based on justifications

Provenance captured

Figure: the conceptual types Target, Protein and Gene, linked by associations with multiplicities 0..* and 0..1.
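The co-reference rules can be approximated with a union-find over links: two records are merged only when they share a conceptual type, or when their types form an explicitly allowed crossing. This Python sketch assumes a single hypothetical rule (genes may be merged with proteins, echoing the conclusion that genes can equal proteins while compounds never equal proteins). `ALLOWED_CROSSINGS`, `coreference_sets` and the `ex:` IRIs are invented for illustration, and the accepted-link list is only a stand-in for full provenance.

```python
from collections import defaultdict

# Hypothetical rule set: pairs of conceptual types whose members may be merged.
# Links between records of the same type are always followed.
ALLOWED_CROSSINGS = {frozenset({"gene", "protein"})}


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def coreference_sets(types, links):
    """Group IRIs into co-reference sets under the type-crossing rules.

    types: dict mapping IRI -> conceptual type
    links: iterable of (iri_a, iri_b, justification)
    Returns (sorted co-reference sets, accepted links as simple provenance).
    """
    uf = UnionFind()
    accepted = []
    for a, b, why in links:
        ta, tb = types[a], types[b]
        if ta == tb or frozenset({ta, tb}) in ALLOWED_CROSSINGS:
            uf.union(a, b)
            accepted.append((a, b, why))
    groups = defaultdict(set)
    for iri in types:
        groups[uf.find(iri)].add(iri)
    return sorted(sorted(g) for g in groups.values()), accepted


# Hypothetical example data.
example_types = {
    "ex:gene1": "gene",
    "ex:prot1": "protein",
    "ex:prot2": "protein",
    "ex:cmpd1": "compound",
}
example_links = [
    ("ex:gene1", "ex:prot1", "encodes"),
    ("ex:prot1", "ex:prot2", "same sequence"),
    ("ex:prot2", "ex:cmpd1", "same name"),  # rejected: compound/protein crossing
]

sets, accepted = coreference_sets(example_types, example_links)
print(sets)
print(accepted)
```

Once records are merged, union-find applies transitivity to the whole set; a production rule engine may need stricter checks than this pairwise type test.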



Inferred Connectivity


Datasets 37

Linksets 883

Links 17,383,846

Justifications 7


BridgeDb


Lenses

Figure: compound records http://ops.rsc.org/OPS45975, http://ops.rsc.org/OPS45978, http://ops.rsc.org/OPS45991, http://ops.rsc.org/OPS45987 and http://ops.rsc.org/OPS45981, connected by typed links: has_isotopically_unspecified_parent [CHEMINF:000459], has OPS normalized counterpart [CHEMINF:000458], is_tautomer_of [chebi:is_tautomer_of] and has_stereoundefined_parent [CHEMINF:000456].
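A lens can be read as the set of link predicates allowed to make two records equivalent. This Python sketch walks typed links breadth-first from a starting record under a chosen lens. The predicate names come from the figure, but the `ex:` records, the edges between them, the lens names and the decision to follow links in both directions are assumptions made for illustration.

```python
from collections import deque

# Hypothetical typed links between compound records (edges invented;
# predicate names taken from the figure).
LINKS = [
    ("ex:salt_form", "has_isotopically_unspecified_parent", "ex:parent"),
    ("ex:parent", "has_stereoundefined_parent", "ex:stereo_parent"),
    ("ex:parent", "is_tautomer_of", "ex:tautomer"),
]

# Hypothetical lenses: the predicates each lens treats as equality.
STRICT_LENS = {"has_isotopically_unspecified_parent"}
TAUTOMER_LENS = STRICT_LENS | {"is_tautomer_of"}


def equivalents(start, links, lens):
    """Return, sorted, the records reachable from `start` via links in `lens`.

    Links are followed in both directions and transitively (breadth-first).
    """
    adjacency = {}
    for s, p, o in links:
        if p in lens:
            adjacency.setdefault(s, set()).add(o)
            adjacency.setdefault(o, set()).add(s)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


print(equivalents("ex:salt_form", LINKS, STRICT_LENS))
print(equivalents("ex:salt_form", LINKS, TAUTOMER_LENS))
```

Adding `is_tautomer_of` to the lens widens the equivalence set without changing the underlying links.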



Query Expansion

The Query Expander Service receives a query Q and a lens L1, where Q contains the pattern:

    cw:979b545d-f9a9 cheminf:logd ?logd .

It sends the pair (cw:979b545d-f9a9, L1) to the Identity Mapping Service (BridgeDB), which consults its profiles and mappings and returns the co-referent IRIs:

    [cw:979b545d-f9a9, cs:2157, chembl:1280, db:db00945]

The expander then rewrites the pattern to produce Q':

    GRAPH <http://rdf.chemspider.com> {
      ?iri cheminf:logd ?logd .
      FILTER (?iri = cw:979b545d-f9a9 || ?iri = cs:2157 || ?iri = chembl:1280 || ?iri = db:db00945)
    }

The same expansion can also be achieved through UNION.
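The rewrite performed by the Query Expander Service can be sketched as string generation in Python. This sketch assumes the co-referent IRIs have already been returned by the mapping service (the BridgeDb call itself is not shown), and the helper names `expand_with_filter` and `expand_with_union` are hypothetical. Prefixed names such as `cs:` still need PREFIX declarations in a complete query.

```python
def expand_with_filter(graph, predicate, obj_var, iris):
    """Rewrite `<iri> predicate ?var .` as one pattern plus a FILTER over co-referent IRIs."""
    tests = " || ".join(f"?iri = {iri}" for iri in iris)
    return (
        f"GRAPH <{graph}> {{\n"
        f"  ?iri {predicate} {obj_var} .\n"
        f"  FILTER ({tests})\n"
        f"}}"
    )


def expand_with_union(graph, predicate, obj_var, iris):
    """Rewrite the same pattern as a UNION of one triple pattern per IRI."""
    branches = " UNION ".join(f"{{ {iri} {predicate} {obj_var} . }}" for iri in iris)
    return f"GRAPH <{graph}> {{\n  {branches}\n}}"


# IRIs as returned for (cw:979b545d-f9a9, L1) in the example above.
co_referent = ["cw:979b545d-f9a9", "cs:2157", "chembl:1280", "db:db00945"]

print(expand_with_filter("http://rdf.chemspider.com", "cheminf:logd", "?logd", co_referent))
print(expand_with_union("http://rdf.chemspider.com", "cheminf:logd", "?logd", co_referent))
```

Both forms match the same ?logd values; which one runs faster depends on the triple store, and the conclusions report UNION outperforming FILTER on Virtuoso.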


Experiment

Is it feasible to use a stand-off mapping service?
• Baselines (no external call):
  – “Perfect” URIs
  – Linked data querying
• Expansion approaches (external service call):
  – FILTER by graph
  – UNION by graph

C. Y. A. Brenninkmeijer, C. A. Goble, A. J. G. Gray, P. T. Groth, A. Loizou, S. Pettifer: Including Co-referent URIs in a SPARQL Query. COLD 2013. http://ceur-ws.org/Vol-1034/BrenninkmeijerEtAl_COLD2013.pdf


“Perfect” URI Baseline

WHERE {
  GRAPH <chemspider> { cs:2157 cheminf:logp ?logp . }
  GRAPH <chembl> { chembl_mol:m1280 cheminf:mw ?mw . }
}

Linked Data Baseline

WHERE {
  GRAPH <chemspider> { cs:2157 cheminf:logp ?logp . }
  GRAPH <chembl> { ?chemblid cheminf:mw ?mw . }
  cs:2157 skos:exactMatch ?chemblid .
}
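To make the difference between the two baselines concrete, this Python sketch evaluates the linked data baseline against a tiny in-memory quad list: the ChEMBL IRI is not written into the query but found by following a stored skos:exactMatch link. The quads, the `mappings` graph name, the helper names and the `LOGP_VALUE`/`MW_VALUE` placeholders are hypothetical, not real data.

```python
# Hypothetical quads: (graph, subject, predicate, object).
# Object values are placeholders, not real measurements.
QUADS = [
    ("chemspider", "cs:2157", "cheminf:logp", "LOGP_VALUE"),
    ("chembl", "chembl_mol:m1280", "cheminf:mw", "MW_VALUE"),
    ("mappings", "cs:2157", "skos:exactMatch", "chembl_mol:m1280"),
]


def match(graph, subject, predicate):
    """Yield (subject, object) for quads matching the pattern; graph=None matches any graph."""
    for g, s, p, o in QUADS:
        if (graph is None or g == graph) and s == subject and p == predicate:
            yield s, o


def linked_data_baseline(cs_iri):
    """Join logP from ChemSpider with MW from ChEMBL via a stored skos:exactMatch link."""
    results = []
    for _, logp in match("chemspider", cs_iri, "cheminf:logp"):
        for _, chembl_id in match(None, cs_iri, "skos:exactMatch"):
            for _, mw in match("chembl", chembl_id, "cheminf:mw"):
                results.append({"logp": logp, "mw": mw, "chemblid": chembl_id})
    return results


print(linked_data_baseline("cs:2157"))
```

The "perfect" URI baseline skips the middle step because the ChEMBL IRI is already known; query expansion instead obtains the mapping from a separate service rather than from triples in the store.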

Queries

Drawn from the Open PHACTS API:
1. Simple compound information (1)
2. Compound information (1)
3. Compound pharmacology (M)
4. Simple target information (1)
5. Target information (1)
6. Target pharmacology (M)


Experiment Data

Data: 167,783,592 triples
Mappings: 2,114,584 triples
Lenses: 1


Average execution times


Q6: Target Pharmacology


Conclusions

• Computing co-reference is advantageous:
  – Requires fewer raw linksets
  – Gives larger coverage across datasets
• Rules ensure control:
  – Genes can equal proteins
  – Compounds never equal proteins
• Provenance captured throughout

• Query expansion is slower in general:
  – Due to the separate service call
  – Difference is below human perception
  – UNION is faster than FILTER on Virtuoso
• Stand-off mappings are feasible
• The infrastructure can support lenses


Questions

[email protected] @gray_alasdair

[email protected] @open_phacts