Making semantics work in drug discovery
Indiana University School of Informatics and Computing
David Wild, Assistant Professor and Director, Cheminformatics & Chemogenomics Research Group (CCRG) Indiana University School of Informatics and Computing [email protected]
“Information is cheap. Understanding is expensive” (Karl Fast)
http://djwild.info
“Stack” of applying semantics in drug discovery & healthcare
New biomedical insights
Integrated knowledge discovery processes
Integrative tools and algorithms
Accessible networks of semantically integrated data
Only now going mainstream
Wild, D.J., Ding, Y., Sheth, A.P., Harland, L., Gifford, E.M., Lajiness, M.S. Systems Chemical Biology and the Semantic Web: what they mean for the future of drug discovery research, Drug Discovery Today, 2012, 17, 469-474.
Chem2Bio2RDF.org – semantically integrated data
Chen, B., Dong, X., Jiao, D., Wang, H., Zhu, Q., Ding, Y., Wild, D.J. Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data. BMC Bioinformatics 2010, 11, 255.
We can answer many questions with SPARQL…
What pathways will troglitazone affect?
PREFIX c2b2r: <http://chem2bio2rdf.org/chem2bio2rdf.owl#>
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?pathwayName ?datasource
from <http://chem2bio2rdf.org/owl#>
where {
  ?chemical rdfs:label "Troglitazone"^^xsd:string ;
            c2b2r:hasInteraction ?interaction .
  ?interaction c2b2r:hasTarget ?target ;
               c2b2r:biologicalInterest true .
  ?pathway c2b2r:isPathwayOf ?target ;
           bp:name ?pathwayName ;
           bp:xref [c2b2r:identifierType ?datasource] .
}
[Diagram: Drug, Gene, Pathway]
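The query above is essentially a two-step join: from a drug to its targets, then from each target to the pathways it belongs to. A minimal Python sketch of that join over an in-memory list of triples follows. The identifiers (drug:A, gene:G1, pathway:P1, ...) are hypothetical toy data, not Chem2Bio2RDF records, and the predicate names are simplified from the query (the intermediate interaction node is omitted).

```python
# Toy (subject, predicate, object) triples. Everything here is made-up
# illustration data, not content from Chem2Bio2RDF.
TRIPLES = [
    ("drug:A", "hasTarget", "gene:G1"),
    ("drug:A", "hasTarget", "gene:G2"),
    ("drug:B", "hasTarget", "gene:G3"),
    ("pathway:P1", "isPathwayOf", "gene:G1"),
    ("pathway:P2", "isPathwayOf", "gene:G1"),
    ("pathway:P2", "isPathwayOf", "gene:G2"),
    ("pathway:P3", "isPathwayOf", "gene:G3"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in TRIPLES."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in TRIPLES."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def pathways_for_drug(drug):
    """Follow drug -> target gene -> pathway and return distinct pathways."""
    found = set()
    for gene in objects(drug, "hasTarget"):
        found |= subjects("isPathwayOf", gene)
    return sorted(found)


if __name__ == "__main__":
    print(pathways_for_drug("drug:A"))
```

A SPARQL engine performs the same kind of pattern matching, but over the full integrated graph and with the join order chosen by its query planner.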
We can answer many questions with SPARQL…
What are possible multi-target MAPK Inhibitors?
PREFIX pubchem: <http://chem2bio2rdf.org/pubchem/resource/>
PREFIX kegg: <http://chem2bio2rdf.org/kegg/resource/>
PREFIX uniprot: <http://chem2bio2rdf.org/uniprot/resource/>
SELECT ?compound_cid (count(?compound_cid) as ?active_assays)
FROM <http://chem2bio2rdf.org/pubchem>
FROM <http://chem2bio2rdf.org/kegg>
FROM <http://chem2bio2rdf.org/uniprot>
WHERE {
  ?bioassay pubchem:CID ?compound_cid .
  ?bioassay pubchem:outcome ?activity . FILTER (?activity=2) .
  ?bioassay pubchem:Score ?score . FILTER (?score>50) .
  ?bioassay pubchem:gi ?gi .
  ?uniprot uniprot:gi ?gi .
  ?pathway kegg:protein ?uniprot .
  ?pathway kegg:Pathway_name ?pathway_name .
  FILTER regex(?pathway_name,"MAPK signaling pathway","i") .
}
GROUP BY ?compound_cid
HAVING (count(*)>1)
[Diagram: Compound, Bio-Assay, Gene, Pathway]
Variety of expert GUI tools for searching
“Stack” of applying semantics in drug discovery & healthcare
New biomedical insights
Integrated knowledge discovery processes
Integrative tools & algorithms
Accessible networks of semantically integrated data
Very little work done in this area
ChemBioSpace Association Search
He, B., Tang, J., Ding, Y., Wang, H., Sun, Y., Shin, J.H., Chen, B., Moorthy, G., Qiu, J., Desai, P., Wild, D.J., Mining relational paths in biomedical data. PloS One, 2011, 6(12), e27506.
Semantic Linked Association Prediction (SLAP)
Association score: 2385.9
Association significance: 9.06 x 10^-6 => missing link predicted
SLAP significant subgraph
Chen, B., Ding, Y., Wild, D.J. Assessing Drug Target Association using Semantic Linked Data. PLoS Computational Biology, 2012, 8(7), e1002574
Compound/Target SLAP Virtual Screen - Troglitazone
(C) 2012 DATA2DISCOVERY INC
SLAP Drug-Target Prediction Matrix
Bipartite repurposing graph created with Sci2
Assessing drug similarity from biological function
• Took 157 drugs with 10 known therapeutic indications, and created SLAP profiles against 1,683 human targets
• Pearson correlation between profiles > 0.9 was used to create associations between drugs
• Drugs with the same therapeutic indication unsurprisingly cluster together, and also subcluster by MOA
• Some drugs with similar profiles have different indications: potential for use in drug repurposing?
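The profile-correlation step can be sketched in a few lines of Python. This is a minimal illustration, not the SLAP implementation: each drug is represented by a vector of scores against a fixed list of targets, and two drugs are linked when the Pearson correlation of their vectors exceeds 0.9. The drug names and four-target profiles are hypothetical toy values (the slides describe 157 drugs and 1,683 targets).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: treat the correlation as zero
    return cov / sqrt(var_x * var_y)


def similarity_edges(profiles, threshold=0.9):
    """Return (drug_a, drug_b, r) for every pair with r above the threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r > threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical toy profiles: one score per target, same target order for all.
PROFILES = {
    "drug_a": [0.9, 0.1, 0.8, 0.2],
    "drug_b": [0.8, 0.2, 0.9, 0.1],
    "drug_c": [0.1, 0.9, 0.2, 0.7],
}

if __name__ == "__main__":
    for a, b, r in similarity_edges(PROFILES):
        print(a, b, round(r, 2))
```

The resulting edge list is what a network tool would then lay out and cluster; pairs that are linked but carry different therapeutic indications are the repurposing candidates.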
Large-scale repurposing networks
Repurposing example
H1 antihistamine
anticonvulsant antiarrhythmic
anticonvulsant antidepressive
alpha / beta blocker used for CHF
“Stack” of applying semantics in drug discovery & healthcare
New biomedical insights
Integrated knowledge discovery processes
Integrative tools & algorithms
Accessible networks of semantically integrated data
What is the added value?
Integrative virtual screening
• Ligand-based screening: QSAR, similarity, pharmacophore
• Structure-based screening: Molecular docking
• Semantic screening: Semantic association with targets and/or known ligands
• Look at top hits using each method, and fused hits using harmonic data fusion of ranked lists
• Currently being validated in PXR (Univ. Cincinnati) and Mtb (OSDD) projects
                Pharmacophore  Random Forest  ROCS   SLAP   Fusion
Pharmacophore   1              -0.12          -0.07  0.08   0.37
Random Forest                  1              0.04   -0.27  0.32
ROCS                                          1      -0.17  0.37
SLAP                                                 1      0.28
Fusion                                                      1
See Bioorg Med Chem Lett. 2012 May 1;22(9):3349-53
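The slides do not define "harmonic data fusion", so the sketch below rests on an assumption: each compound's fused score is the sum of its reciprocal ranks (1/rank) across the individual method rankings, and compounds are re-ranked by that score. The compound IDs, the three toy rankings, and the function name harmonic_fuse are hypothetical.

```python
def harmonic_fuse(ranked_lists):
    """Fuse rankings by summing reciprocal ranks (an assumed reading of
    'harmonic data fusion'; the slides do not give the formula).

    Each input list is ordered best-first. A compound missing from a list
    contributes nothing for that list.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, compound in enumerate(ranking, start=1):
            scores[compound] = scores.get(compound, 0.0) + 1.0 / rank
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical top-3 hit lists from three screening methods.
pharmacophore_hits = ["c1", "c2", "c3"]
docking_hits = ["c2", "c1", "c4"]
semantic_hits = ["c2", "c3", "c1"]

if __name__ == "__main__":
    print(harmonic_fuse([pharmacophore_hits, docking_hits, semantic_hits]))
```

Reciprocal-rank weighting rewards compounds that appear near the top of any one list, which suits methods whose rankings are only weakly correlated with each other, as in the table above.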
MOA: Identifying cardiac side effects of Rosiglitazone
Gene/Drug | Rosiglitazone | Troglitazone | Pioglitazone
SAA2 | Strong, “Discussed” (PharmGKB) | V. weak | V. weak
APOE | Strong, “Discussed” (PharmGKB + Matador) | V. weak | V. weak
ADIPOQ | Strong, Positive (PharmGKB) | V. weak | Strong, Positive (PharmGKB)
CYP2C8 | Strong, Changes metabolism (CTD) | V. weak | Strong, Changes metabolism (CTD)
MOA: Identifying differential LDL-lowering effect of Troglitazone
Parkinson Disease-Inflammation Network
Integrating phenotypic assays
Phenotypic Assay
Wnt pathway
Associated Assay
Resveratrol
Known assay data
Associated targets
Tool prototypes: djwild.info / d2discovery.com
Semantic Technologies in Drug Discovery
http://blog.project-sierra.de/archives/1639
• Most commercial organizations are still in the early adopter phase: they have a big data integration problem and are realizing that semantic technologies are a better solution to it than relational databases
• Some companies are in “the bowling alley” and are moving out of this phase. No-one is in “the tornado”
• Research (OpenPHACTS, etc) is well on the way to solving the data integration problem and is moving on to advanced searching, data mining and prediction
Google Knowledge Graph
http://blogs.gartner.com/darin-stewart/2012/05/17/googles-knowledge-graph-yeah-thats-the-semantic-web-sort-of/
Lessons & thoughts: adoption of semantics
• Semantic search is going mainstream
  - Google Knowledge Graph, Facebook Graph Search, Linked Open Data (LOD), OpenPHACTS
  - Now identified as “top technology trend” for 2013 (gartner.com/newsroom/id/2359715)
  - Everyone has a big data integration problem. Semantics now work well for this.
  - Many pharma companies have semantic pilot programs but no-one has gone “mainstream” with semantics. However, this is probably not far off.
  - Switching from relational to semantic models seems revolutionary but can be done in an evolutionary fashion (D2R, etc), although there are some capacity issues and limitations with this approach.
• We need support for horizontal research in semantic prediction and data mining
  - Based on huge heterogeneous graphs: application of existing and new graph algorithms
  - Very little work has been done so far on semantic prediction using heterogeneous, semantic graphs; most work is siloed in graph theory, data mining, communities
• We need support for vertical research in big data / networks / semantics for translational medicine and drug discovery
  - Semantic prediction, using all available data, shows strong promise for utility in areas such as drug-target prediction, off-target profiling, and drug repurposing.
  - Semantics might have rapid adoption in healthcare (EMRs, PHRs), and it should be important to be able to integrate from the molecular to patient level. Need to keep good alignment between disciplines
  - Academic-industry cross-silo collaboration essential: EU OpenPHACTS good example of success