We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, and CHEMINF

ChEBI User Group Meeting: June 24, 2010

Michel Dumontier, Ph.D.
Associate Professor of Bioinformatics
Carleton University
Department of Biology
School of Computer Science
Institute of Biochemistry
Ottawa Institute of Systems Biology
Ottawa-Carleton Institute of Biomedical Engineering




Syntactic Web… It takes a lot of digging to get answers


Surface web: 167 terabytes

Deep web: 91,000 terabytes

545-to-one

We need to get to the deep web


and tap into the global web of structured knowledge



The Semantic Web is the new global web of knowledge

It is about standards for publishing, sharing and querying knowledge drawn from diverse sources

It makes it possible to answer sophisticated questions using background knowledge


Goals

• Provision chemical data on the Web

• Find cheminformatic services that will consume the data

• Answer questions about chemicals by reasoning over essential chemical knowledge


Is caffeine a drug-like molecule?


Lipinski Rule of Five

• Rule of thumb for druglikeness (orally active in humans): 4 rules with multiples of 5
– Molecular mass less than 500 daltons
– No more than 5 hydrogen bond donors
– No more than 10 hydrogen bond acceptors
– A partition coefficient (log P) value between -5 and 5

• We need a more formal (machine understandable) description
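As a first step toward a formal description, the four conditions can be written as a plain predicate. The sketch below is only an illustration and is not the OWL definition used in the talk: the class and function names are hypothetical, the donor and acceptor limits are read as "no more than", and the caffeine values are the commonly quoted molecular mass (194.19 Da), zero N-H/O-H donors, six nitrogen and oxygen atoms counted as acceptors, and the ALogP value that the logP service returns later in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical container for the four values the rule needs."""
    mass_da: float          # molecular mass in daltons
    h_bond_donors: int
    h_bond_acceptors: int
    log_p: float            # octanol-water partition coefficient


def is_lipinski_druglike(d: Descriptors) -> bool:
    """Return True only when all four conditions hold."""
    return (
        d.mass_da < 500
        and d.h_bond_donors <= 5
        and d.h_bond_acceptors <= 10
        and -5 <= d.log_p <= 5
    )


caffeine = Descriptors(mass_da=194.19, h_bond_donors=0,
                       h_bond_acceptors=6, log_p=-0.4311)
print(is_lipinski_druglike(caffeine))  # True
```

A predicate like this answers the question for one molecule, but it says nothing a reasoner can use to discover where the four values come from, which is the gap the ontology is meant to fill.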


Formal Ontology as a Strategy


The Web Ontology Language (OWL) Has Explicit Semantics

It can therefore be used to capture knowledge in a machine-understandable way


Lipinski Rule of Five

• Empirically derived ruleset for druglikeness: 4 rules with multiples of 5
– Molecular mass less than 500 daltons
– No more than 5 hydrogen bond donors
– No more than 10 hydrogen bond acceptors
– A partition coefficient (log P) value between -5 and 5

• A formal description using OWL:


To calculate these attributes, we need access to a computable representation of the molecular structure


ball & stick model for caffeine


The chemical graph specifies the type and connectivity of atoms in molecules. It describes a part of chemical structure

SMILES strings are common representations of the chemical graph


ball & stick model for caffeine

SMILES string for caffeine

Cn1cnc2n(C)c(=O)n(C)c(=O)c12
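To make the "type of atoms" part concrete, the heavy atoms in this string can be counted with a deliberately naive tokenizer. This is a sketch under stated assumptions, not a SMILES parser: it recognizes only unbracketed organic-subset symbols, ignores bonds and ring closures, and would miscount bracket atoms such as `[Na+]`. The function name is hypothetical.

```python
import re
from collections import Counter

# Organic-subset atom symbols only; two-letter halogens must be tried first.
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def count_heavy_atoms(smiles: str) -> Counter:
    """Count heavy atoms by element; aromatic (lowercase) symbols count as the same element."""
    return Counter(symbol.capitalize() for symbol in ATOM.findall(smiles))


caffeine = "Cn1cnc2n(C)c(=O)n(C)c(=O)c12"
counts = count_heavy_atoms(caffeine)
print(dict(counts))  # {'C': 8, 'N': 4, 'O': 2}
```

The counts agree with the heavy atoms of caffeine's molecular formula, C8H10N4O2; hydrogens are implicit in the SMILES string and are not counted here.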


Chemical descriptors

• Chemical descriptors are data (quantities or values) that provide information about substances, molecular entities, and their parts (rings, atoms, bonds, etc).

• Sometimes they enumerate material parts; sometimes they quantify or describe qualities, functions, or dispositions

• Often used to build Quantitative Structure-Activity Relationship (QSAR) models

• Example descriptors:
– Mass values
– Partition coefficients
– Heats of formation
– Aromaticity values
– Molecular formulas
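Two of the listed descriptors are related in a simple way: a mass value can be computed from a molecular formula. The following is a minimal sketch that assumes a flat formula without brackets, charges, or isotopes; the function names are hypothetical and the table holds rounded standard atomic weights for only the four elements needed for caffeine.

```python
import re

# Rounded standard atomic weights for the elements used in this example.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def parse_formula(formula: str) -> dict:
    """Parse a flat formula such as 'C8H10N4O2' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def molecular_mass(formula: str) -> float:
    """Sum atomic weights over the formula; raises KeyError for elements not in the table."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in parse_formula(formula).items())


print(parse_formula("C8H10N4O2"))
print(round(molecular_mass("C8H10N4O2"), 2))  # 194.19
```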


The Chemical Information Ontology (CHEMINF)

• 100 chemical descriptors

• 50 chemical qualities

• Relates descriptors to their specifications, to the software that generated them (along with its running parameters), and to the algorithms that the software implements

• Contributors: Nico Adams, Leonid Chepelev, Michel Dumontier, Janna Hastings, Egon Willighagen, Peter Murray-Rust, Christoph Steinbeck


http://semanticchemistry.googlecode.com


CHEMINF provides the vocabulary to define an input (SMILES-annotated molecule) and an output (molecule annotated with a descriptor)


Ultimately, the goal is to use an OWL reasoner to reason about the attributes to determine whether the compound is drug-like


Semantic Automated Discovery and Integration

http://sadiframework.org

Mark Wilkinson, UBC
Michel Dumontier, Carleton University
Christopher Baker, UNB

SADI is a framework to create Semantic Web services using OWL classes as service inputs and outputs


SADI

• OWL classes in SADI are local to individual services

– They should uniquely specify the service input and output (they carry exactly the right restrictions)

– One service’s world-view can conflict with another’s, but a client can use any or all

• Maximize interoperability by reusing types and relations


Create code stubs using the ontology

• Publish the ontology to a web-accessible location
http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl

• Make sure that the class names are resolvable (easy when using the hash notation)

http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#smiles-molecule

http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#logp-molecule

http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#hbdc-molecule

http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#hdba-molecule

http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#lipinksi-druglike-molecule

• Download/checkout the code
http://sadiframework.org

• Run the code generator – specify the URIs that correspond to input and output types
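The remark that hash notation makes class names easy to resolve can be illustrated with the standard library. In the sketch below, which uses two of the class URIs listed above, the fragment after `#` is split off to show that every class name points back to the one published ontology document. It illustrates only the URI convention and is not part of the SADI code generator.

```python
from urllib.parse import urldefrag

ONTOLOGY = "http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl"
class_uris = [
    ONTOLOGY + "#smiles-molecule",
    ONTOLOGY + "#logp-molecule",
]

for uri in class_uris:
    document, fragment = urldefrag(uri)
    # An HTTP client drops the fragment before requesting, so each class
    # name resolves to the ontology file itself.
    print(fragment, "->", document)
```

Publishing the single OWL file is therefore enough to make every class in it resolvable.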


Implement the functionality

• Java version
– Uses Jena to manipulate the RDF graph
– Uses Maven to build from the command line or Eclipse; invokes Jetty for service testing

• Chemistry
– We used the Chemistry Development Kit (CDK) to implement 4 services


Working with the service (GET)

• Responds to a GET by providing the service description in RDF
– conforms to Feta (BioMoby, myGrid)


curl http://cbrass.biordf.net/logpdc/logpc

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://www.mygrid.org.uk/mygrid-moby-service#">
  <rdf:Description rdf:about="">
    <j.0:hasServiceDescriptionText>no description</j.0:hasServiceDescriptionText>
    <j.0:hasServiceNameText rdf:datatype="http://www.w3.org/2001/XMLSchema#string">logpc</j.0:hasServiceNameText>
    <j.0:hasOperation rdf:resource="#operation"/>
    <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#serviceDescription"/>
  </rdf:Description>
  <rdf:Description rdf:about="#input">
    <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#smilesmolecule"/>
    <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#parameter"/>
  </rdf:Description>
  <rdf:Description rdf:about="#operation">
    <j.0:outputParameter rdf:resource="#output"/>
    <j.0:inputParameter rdf:resource="#input"/>
    <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#operation"/>
  </rdf:Description>
  <rdf:Description rdf:about="#output">
    <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#alogpsmilesmolecule"/>
    <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#parameter"/>
  </rdf:Description>
</rdf:RDF>
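A client mainly needs two facts from this description: the OWL class of the input and the OWL class of the output. The sketch below pulls them out of an abridged copy of the response using only the standard library. It treats the document as plain XML rather than RDF, so it assumes this particular serialization (one `rdf:Description` per node); a real client would more likely use an RDF library. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MYGRID = "http://www.mygrid.org.uk/mygrid-moby-service#"

# Abridged copy of the description returned by the GET above
# (only the input and output parameters are kept).
DESCRIPTION = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://www.mygrid.org.uk/mygrid-moby-service#">
  <rdf:Description rdf:about="#input">
    <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#smilesmolecule"/>
  </rdf:Description>
  <rdf:Description rdf:about="#output">
    <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#alogpsmilesmolecule"/>
  </rdf:Description>
</rdf:RDF>"""


def parameter_types(rdf_xml: str) -> dict:
    """Map each described node (e.g. '#input') to the OWL class named by its objectType."""
    root = ET.fromstring(rdf_xml)
    types = {}
    for node in root.findall(f"{{{RDF}}}Description"):
        object_type = node.find(f"{{{MYGRID}}}objectType")
        if object_type is not None:
            types[node.get(f"{{{RDF}}}about")] = object_type.get(f"{{{RDF}}}resource")
    return types


for name, owl_class in parameter_types(DESCRIPTION).items():
    print(name, owl_class)
```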


Working with the service (POST)

• Responds to a POST with service output (process an input file)


Input (caffeine.rdf):

<rdf:RDF
    xmlns="http://semanticscience.org/sadi/ontology/caffeine.rdf#"
    xmlns:so="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:sio="http://semanticscience.org/resource/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <so:smilesmolecule rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#m">
    <sio:SIO_000008 rdf:resource="http://semanticscience.org/sadi/ontology/caffeine.rdf#msmiles"/>
  </so:smilesmolecule>
  <sio:CHEMINF_000018 rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#msmiles">
    <sio:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Cn1cnc2n(C)c(=O)n(C)c(=O)c12</sio:SIO_000300>
  </sio:CHEMINF_000018>
</rdf:RDF>

curl --data @caffeine.rdf http://cbrass.biordf.net/logpdc/logpc

Output (excerpt):

<rdf:Description rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#mdalogp">
  <rdf:type rdf:resource="http://semanticscience.org/resource/CHEMINF_000251"/>
  <j.0:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#double">-0.4311000000000006</j.0:SIO_000300>
</rdf:Description>
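The same exchange can be sketched from a script. The code below prepares the POST without sending it and reads the numeric value out of a response of the shape shown above. Several details are assumptions rather than facts from the talk: the `Content-Type` header, the helper names, and the binding of the `j.0` prefix to the SIO namespace (the excerpt omits the declaration). The excerpt is wrapped in an `rdf:RDF` root so that it parses on its own.

```python
import urllib.request
import xml.etree.ElementTree as ET

SERVICE_URL = "http://cbrass.biordf.net/logpdc/logpc"
SIO_HAS_VALUE = "{http://semanticscience.org/resource/}SIO_000300"


def build_post(rdf_xml: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the POST; the Content-Type header is an assumption."""
    return urllib.request.Request(
        SERVICE_URL,
        data=rdf_xml,
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


def extract_value(response_xml: str) -> float:
    """Read the first SIO_000300 literal in a response as a number."""
    root = ET.fromstring(response_xml)
    literal = root.find(f".//{SIO_HAS_VALUE}")
    if literal is None or literal.text is None:
        raise ValueError("no SIO_000300 value in response")
    return float(literal.text)


# Output excerpt from above, wrapped so it is well-formed on its own.
# Binding j.0 to the SIO namespace is an assumption.
SAMPLE_RESPONSE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://semanticscience.org/resource/">
  <rdf:Description rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#mdalogp">
    <rdf:type rdf:resource="http://semanticscience.org/resource/CHEMINF_000251"/>
    <j.0:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#double">-0.4311000000000006</j.0:SIO_000300>
  </rdf:Description>
</rdf:RDF>"""

request = build_post(b"placeholder")  # in practice, the bytes of caffeine.rdf
print(request.get_method(), request.full_url)
print(extract_value(SAMPLE_RESPONSE))
```

Passing the prepared request to `urllib.request.urlopen` would send it; that step is left out because the service may no longer be reachable at this address.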


Publish and Register the service


http://sadiframework.org/registry


Now what?


Semantic Health and Research Environment

SHARE is an application that executes (SPARQL) queries as workflows over SADI services


“Reckoning”

Dynamic discovery of instances of OWL classes through synthesis and invocation of a Web Service workflow capable of generating data described by the OWL class restrictions, followed by reasoning to classify the data into that ontology


SPARQL is the new cool kid on the query block

SQL SPARQL


SHARE

• SPARQL engine
– triple patterns are matched against service descriptions

– knowledge base is dynamically populated

– queries can contain OWL classes, which are expanded to the required triple patterns

– query is optimized to minimize the number of service calls and the amount of data sent over the network
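The first and last points can be pictured with a toy planner. The sketch below is not SHARE's algorithm, only an illustration of the idea of looking services up by the predicate in each triple pattern and avoiding redundant calls. The registry contents, predicate names, and service URLs are all hypothetical.

```python
# Hypothetical registry: predicate -> service able to attach that predicate to its input.
REGISTRY = {
    "hasLogP": "http://example.org/services/logp",
    "hasHBondDonorCount": "http://example.org/services/hbdc",
}


def plan_service_calls(triple_patterns, known_predicates):
    """List the services to call for patterns whose predicate is not yet in the knowledge base."""
    plan = []
    for _subject, predicate, _obj in triple_patterns:
        if predicate in known_predicates:
            continue  # data already present, no call needed
        service = REGISTRY.get(predicate)
        if service is not None and service not in plan:
            plan.append(service)  # schedule each service at most once
    return plan


query = [
    ("?m", "hasSMILES", "?s"),
    ("?m", "hasLogP", "?logp"),
    ("?m", "hasHBondDonorCount", "?hbd"),
    ("?m", "hasLogP", "?again"),
]
print(plan_service_calls(query, known_predicates={"hasSMILES"}))
```

In this toy run the SMILES pattern needs no call because that data is already known, and the repeated logP pattern triggers only one call.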


ChEBI has data!


Bio2RDF provides ChEBI in RDF


Bio2RDF now serving over 40 billion triples of linked biological data


An increasing amount of machine understandable chemical data


Dataset     Source
ChEBI       Bio2RDF
PubChem     Bio2RDF
DrugBank    Bio2RDF
KEGG        Bio2RDF
PDB         Bio2RDF
PharmGKB    Bio2RDF
CTD         Bio2RDF
TCM         LODD
Medicare    LODD
SIDER       LODD
ChEMBL      LODD
DailyMed    LODD


Query for log p


Query: Is caffeine a drug-like molecule?


SADI

• Describe the input and output using OWL-DL classes

• Subject of input and output must be the same

• Web services indexed by predicates

• Biocatalogue will list SADI-compliant services

• Taverna plugin to work with SADI services

• Protégé 4.1 plugin to create SADI services

• Simplified migration path for existing web services (Java, Perl)


Benefits

• Data remains distributed – no warehouse!

• Data is not “exposed” as a SPARQL endpoint
– greater provider control over computational resources

• Yet data appears to be a SPARQL endpoint… no modification of SPARQL or reasoner required.


Join Us!

• SADI and CardioSHARE are Open Source

• Come join us – we’re having a lot of fun!!

http://sadiframework.org


Acknowledgements

This research is supported by The Heart + Stroke Foundation of BC and Yukon, Microsoft Research, The Canadian Institutes of Health Research, The Natural Sciences and Engineering Research Council of Canada and CANARIE.

Leonid Chepelev (implementing the services)

Luke McCarthy (technical support)

Mark Wilkinson (vision and leadership)

CHEMINF Group

Janna Hastings
Nico Adams
Egon Willighagen