Combining semantic triple stores across knowledge domains

Presented by Matthew Clark, 15 March 2016

Matthew Clark ([email protected]), Frederik van den Broek, Anton Yuryev, Maria Shkrob, Sherri Matis-Mitchell, Timothy Hoctor. R&D Solutions, Elsevier Inc. 251st National Meeting of the American Chemical Society, San Diego, CA, March 13-17, 2016; American Chemical Society: Washington, DC, 2016; CINF 118.

bulletin.acscinf.org/PDFs/251nm/2016_spring_CINF_118.pdf


What Constitutes Big Data?

• Very large data sets
  • On the order of ~10^7 documents published (patents, journals, books)
  • Each document has ~200 sentences, giving ~10^9 statements
  • Statements are about molecules, properties, reactions, indications, etc.
• Combinatorial connections between large data sets
  • “Connecting the dots” among these facts results in a very large number of possible connections
  • $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ combinations of k elements chosen from a pool of n
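To give a feel for the scale, the binomial coefficient can be evaluated directly. This is a minimal Python sketch that assumes the ~10^9 statement estimate as the pool size n; the values of k are chosen only for illustration.

```python
import math

# Approximate number of statements, taken from the estimate on this slide.
n_statements = 10**9

# Number of ways to choose k statements from the pool, for small k.
for k in (2, 3):
    print(k, math.comb(n_statements, k))

# Pairwise connections alone are already on the order of 10^17.
pairs = math.comb(n_statements, 2)
```

Even for k = 2 the count is far beyond what can be enumerated naively, which is why the connections have to be constrained by the semantics of the statements.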

Pathways
• Relationships mined from 12,000 titles, 25M documents
• <subject> <verb> <object> relationships
• Each subject, object, verb has a taxonomy
• Example: “protein” causes/induces disease

Compounds
• 16,000 journal titles plus patent offices
• Compounds, Reactions, Properties
• Over 6 million compounds with bioactivity

Bioassays
• Biological relationships mined from journals/patents (over 16 million)
• <compound> <verb> <object> <quantity>
• Example: Sunitinib binds-to Bcr-abl in <assay type> at 1 nM
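The two statement shapes can be pictured as small record types. The following Python sketch is only an illustration: the class and field names (`Triple`, `BioassayFact`, `assay_type`) are assumptions, not the schema of the underlying databases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    """A <subject> <verb> <object> statement mined from text."""
    subject: str
    verb: str
    obj: str


@dataclass(frozen=True)
class BioassayFact(Triple):
    """A <compound> <verb> <object> <quantity> statement with assay context."""
    value: float = 0.0                 # numeric quantity as reported
    unit: str = ""                     # unit of the quantity, e.g. "nM"
    assay_type: Optional[str] = None   # left unset when the source does not give it


# The two examples from the slide; the assay type is not specified there.
pathway_fact = Triple("protein", "causes/induces", "disease")
assay_fact = BioassayFact("Sunitinib", "binds-to", "Bcr-abl", value=1.0, unit="nM")
```

Keeping the bioassay record as an extension of the plain triple means both kinds of statement can be traversed with the same subject/verb/object logic.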

Biological Pathways extracted via semantic text mining

A upregulates B
B upregulates C
C increases Disease

A → B → C → disease

Normalizing vocabularies required: proteins, diseases, drugs, chemicals
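Chaining the three extracted statements into a path from A to the disease can be sketched as a graph search. This is a minimal Python illustration using the toy triples from the slide; the function name `find_path` is an assumption, and the actual text-mining pipeline is not described at this level of detail.

```python
from collections import deque

# Toy triples copied from the slide: (subject, verb, object).
triples = [
    ("A", "upregulates", "B"),
    ("B", "upregulates", "C"),
    ("C", "increases", "Disease"),
]


def find_path(triples, start, goal):
    """Breadth-first search over triples; returns the chain of triples or None."""
    outgoing = {}
    for s, v, o in triples:
        outgoing.setdefault(s, []).append((s, v, o))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for s, v, o in outgoing.get(node, []):
            if o not in seen:
                seen.add(o)
                queue.append((o, path + [(s, v, o)]))
    return None


chain = find_path(triples, "A", "Disease")
print(" -> ".join([chain[0][0]] + [o for _, _, o in chain]))
```

The search only works if "B" in one statement and "B" in another are the same identifier, which is why normalized vocabularies are required.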

Bioactivities through text analysis

IC50 6.3 nM, kinase binding assay, 10 mM concentration

Chemical Structures and Properties

InChI, Name

NCBI, UniProt

EMTREE, ReaxysTree, Structures

Example: Process for Finding New Indications for a Drug

Step 1: Find all targets for which the compound has high affinity
Step 2: Using the unique set of proteins from step 1, search for all diseases reported to be related to them
Step 3: Collate the diseases by targets and activity of the compound

Find all compound-protein/gene relationships with > 1 reference using text analysis

Targets inhibited

Targets related to disease
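The three steps can be sketched as a small function. Everything in this Python example is hypothetical: the compound and target identifiers, the numbers, and the function name `suggest_indications` are placeholders, and the affinity cutoff of 7 log units is an assumption borrowed from the ruxolitinib activity table.

```python
# Hypothetical toy inputs; identifiers and numbers are placeholders, not mined data.
# (compound, target, pX activity, number of supporting references)
compound_target = [
    ("drugX", "T1", 8.2, 5),
    ("drugX", "T2", 7.4, 2),
    ("drugX", "T3", 7.9, 1),   # single reference: dropped in step 1
    ("drugX", "T4", 5.1, 3),   # low affinity: dropped in step 1
]
# (target, disease) relationships from the pathway store
target_disease = [
    ("T1", "disease A"), ("T2", "disease A"),
    ("T1", "disease B"), ("T4", "disease C"),
]


def suggest_indications(drug, min_px=7.0, min_refs=2):
    # Step 1: targets with high affinity and more than one reference.
    active = {t: px for c, t, px, refs in compound_target
              if c == drug and px >= min_px and refs >= min_refs}
    # Step 2: diseases reported as related to the unique set of active targets.
    by_disease = {}
    for target, disease in target_disease:
        if target in active:
            by_disease.setdefault(disease, []).append(target)
    # Step 3: collate by disease and rank by number of targets hit.
    return sorted(by_disease.items(), key=lambda item: len(item[1]), reverse=True)


ranked = suggest_indications("drugX")
print(ranked)
```

With the toy data, "disease A" ranks first because two active targets sit in its pathway, while "disease C" drops out because its only target fails the affinity cutoff.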

Processing Protocol in Biovia PipelinePilot

Input: Drug name

Output: Ranked indications the drug may treat

Example drug: Ruxolitinib

• Janus kinase inhibitor selective for JAK1 and JAK2
• Approved for
  • Myelofibrosis: cancer of bone marrow
  • Polycythemia vera: too many red blood cells are made in bone marrow

* Jakafi is a registered trademark of Incyte Corporation. Incyte Corporation did not sponsor and was not involved in this data analysis.

Elsevier indexes each reported measurement; must compute the ‘best’ value for each compound

• For each target in the pathway, search for active compounds and compute activity for each
• In many cases there are several reported measurements for the same target/compound

Target in Disease Pathway: ABCC8
Reported Activities: 8.0, 7.9, 6.6, 6.0, 7.0, 6.0, 5.0
Mean by Compound: 7.9, 6.4, 5.0
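Collapsing several reported measurements into one value per compound can be sketched as a group-and-average step. In this Python example the activity values are the ones shown for ABCC8, but the compound labels and the assignment of values to compounds are assumptions, chosen so that the group means are consistent with the slide (the first group averages 7.95, shown on the slide as 7.9).

```python
from collections import defaultdict
from statistics import mean

# Reported activities for target ABCC8 from the slide. The compound labels and
# the grouping are assumptions chosen to be consistent with the slide's means.
measurements = [
    ("compound 1", 8.0), ("compound 1", 7.9),
    ("compound 2", 6.6), ("compound 2", 6.0), ("compound 2", 7.0), ("compound 2", 6.0),
    ("compound 3", 5.0),
]

# Group the reported values by compound.
by_compound = defaultdict(list)
for compound, px in measurements:
    by_compound[compound].append(px)

# One summary value per compound: the arithmetic mean of its reports.
mean_by_compound = {c: mean(values) for c, values in by_compound.items()}
for compound, value in mean_by_compound.items():
    print(f"{compound}: {value:.2f} from {len(by_compound[compound])} reports")
```

The mean is only one possible definition of the "best" value; a median or a maximum could be swapped in at the same place.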

Ruxolitinib Reported Activities

Target Name       Number of Reports   pX
JAK2              34                  8.5
JAK1              24                  8.5
JAK3              8                   8.0
TYK2              16                  7.7
JAK2 (V617F)      4                   7.6
LTK               1                   7.5
MAP3K2            1                   7.4
ROCK2             1                   7.3
ROCK1             1                   7.2
CaMK2             2                   7.2
DCAMKL1           1                   7.2
DAPK1             1                   7.1
LRRK2             1                   7.1
ACK1              1                   7.1
DAPK3             1                   7.1
LRRK2 (G2019S)    1                   7.1
DAPK2             1                   7.0
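The pX values are on a log scale. Assuming the usual convention pX = -log10(activity in mol/L), which the slides do not spell out, a concentration can be converted and compared against a cutoff of 7 log units as in this Python sketch. The helper name `to_px` and the unit table are illustrative; the rows are a subset of the reported ruxolitinib activities.

```python
import math


def to_px(value, unit):
    """Convert a concentration-type activity (e.g. IC50) to pX = -log10(molar)."""
    factors = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}
    return -math.log10(value * factors[unit])


# A few of the reported rows: (target, number of reports, pX).
rows = [("JAK2", 34, 8.5), ("JAK1", 24, 8.5), ("TYK2", 16, 7.7), ("DAPK2", 1, 7.0)]

# Keep targets at or above the 7 log unit cutoff.
active_targets = [target for target, _, px in rows if px >= 7.0]

print(round(to_px(6.3, "nM"), 1))   # an IC50 of 6.3 nM expressed as pX
print(round(to_px(100, "nM"), 1))   # 100 nM corresponds to the cutoff
print(active_targets)
```

On this scale a cutoff of 7 corresponds to 100 nM, so every target listed for ruxolitinib binds at 100 nM or better.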

Specific bioactivities: >= 7 log units


Proteins Regulating Neoplasms (with > 100 References)

Diseases Related to Ruxolitinib Active Targets

MedDRA Level   Disease                           Targets in Disease Pathway Inhibited by Ruxolitinib   Target Count
soc            Neoplasms                         JAK2;JAK1;TYK2;JAK3;LTK;ROCK2;ROCK1;DAPK1;LRRK2       9
hlt            Inflammation                      JAK2;JAK1;TYK2;JAK3;ROCK1;DAPK1;LRRK2;DAPK3           8
pt             Cancer                            JAK2;JAK1;JAK3;MAP3K2;ROCK1;DAPK1;LRRK2               7
pt             Cell Transformation, Neoplastic   JAK2;JAK1;ROCK1;DAPK1                                 4
hlt            Colitis                           TYK2;JAK3;LRRK2                                       3
pt             Hypertension                      JAK2;ROCK1;DAPK3                                      3
pt             Ischemia                          JAK2;ROCK1;DAPK1                                      3
pt             Insulin Resistance                JAK2;ROCK2;ROCK1                                      3
hlt            Diabetes Mellitus                 JAK2;TYK2;ROCK1                                       3
pt             Obesity                           JAK2;ROCK2;ROCK1                                      3
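The Target Count column follows directly from the semicolon-separated target lists. As a small consistency check, this Python sketch recomputes it for three rows copied from the table and sorts the diseases by count; the helper name `target_count` is illustrative.

```python
# Three rows copied from the table: (MedDRA level, disease, targets).
rows = [
    ("soc", "Neoplasms", "JAK2;JAK1;TYK2;JAK3;LTK;ROCK2;ROCK1;DAPK1;LRRK2"),
    ("hlt", "Colitis", "TYK2;JAK3;LRRK2"),
    ("pt", "Cancer", "JAK2;JAK1;JAK3;MAP3K2;ROCK1;DAPK1;LRRK2"),
]


def target_count(target_field):
    """Count distinct targets in a semicolon-separated field."""
    return len({t.strip() for t in target_field.split(";") if t.strip()})


# Rank diseases by the number of pathway targets the drug inhibits.
ranked = sorted(
    ((disease, target_count(targets)) for _, disease, targets in rows),
    key=lambda item: item[1],
    reverse=True,
)
print(ranked)
```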

Analysis: Suggested Indications are Consistent with Current Clinical Trials

Disease                           # Targets Inhibited   Example Rux Trials from ClinicalTrials.gov
Neoplasms                         9                     Neoplasms, Hematologic; Myeloproliferative Neoplasms
Inflammation                      8
Neoplasm Metastasis               5                     Metastatic Pancreatic Adenocarcinoma; Metastatic Cancer
Cell Transformation, Neoplastic   4
Colitis                           3
Hypertension                      3
Ischemia                          3
Insulin Resistance                3
Diabetes Mellitus                 3
Obesity                           3
Inflammatory Bowel Diseases       3
Autoimmune Diseases               3                     Ruxolitinib Prior to Transplant in Patients With Myelofibrosis
Atherosclerosis                   3
Graft vs Host Disease             3                     Ruxolitinib in Combination With Autotransplant
Prostate Cancer                   3                     Metastatic Prostate Cancer
Neoplasm Invasiveness             3                     Metastatic Pancreatic Adenocarcinoma; Metastatic Cancer
Cardiac Hypertrophy               3
Hyperinsulinism                   2

• There is a cluster of insulin/diabetes-related indications: a possible new area?

Examples of Target-Disease Relationships

Disease name: inflammation
Relationship: IL17A --+> Inflammation
Number of refs: 955
Selected sentences: Collectively, the data presented here indicate that integrin αvβ8 on DCs facilitates the development of Th17 cells, and consequently contributes to IL-17-mediated CNS inflammation, through activation of TGF-β. … IL-17 mRNA in sputum of asthmatic patients: linking T cell driven inflammation and granulocytic influx?

Disease name: autoimmune diseases
Relationship: IL17A ---> Autoimmune Diseases
Number of refs: 271
Selected sentences: It has been well recognized that IL-23/Th17/IL-17 axis is critically involved in driving chronic inflammatory autoimmune diseases. … The production of IL-17 by T helper17 cells was recently shown to be essential for development of CIA or other autoimmune diseases.

Disease name: arthritis
Relationship: IL17A --+> Arthritis
Number of refs: 196
Selected sentences: In a murine model, interleukin-17 plays a critical role in the pathogenesis of arthritis.

This Analysis Shows Connections of Ruxolitinib to Alopecia

A cancer drug that grows hair! Trials are under way.

“Alopecia areata is driven by cytotoxic T lymphocytes and is reversed by JAK inhibition”, Nature Medicine 20, 1043–1049 (2014), doi:10.1038/nm.3645

Global transcriptional profiling of mouse and human AA skin revealed gene expression signatures indicative of cytotoxic T cell infiltration, an interferon-γ (IFNG) response and upregulation of several γ-chain (γc) cytokines known to promote the activation and survival of IFN-γ–producing CD8+NKG2D+ effector T cells. Therapeutically, antibody-mediated blockade of IFN-γ, interleukin-2 (IL-2) or interleukin-15 receptor β (IL-15Rβ) prevented disease development, reducing the accumulation of CD8+NKG2D+ T cells in the skin and the dermal IFN response in a mouse model of AA.

Example: Treatments for Congenital Hyperinsulinism

• A rare genetic disease
• Permanently excessive level of insulin in the blood
• Develops within the first few days of life
• Symptoms include floppiness, shakiness, poor feeding, seizures, fits and convulsions.
• If not caught quickly, it can lead to brain injury or even death.
• In the most severe cases the only viable treatment is the removal of the pancreas, consigning the patient to a lifetime of diabetes.

Findacure is a UK charity that is building the rare disease community to raise awareness, drive research and develop treatments. Elsevier is partnering with Findacure scientists to help identify and evaluate treatments for this devastating disease.

From pathways to treatments: Biovia PipelinePilot implementation combines data sources

Automated analysis combines bioassay data with pathway data

Step 1: Find all targets that could be used to affect the disease state
Step 2: Query for each target to find the activities for each compound that are >6 log units
Step 3: Collate data by compound to summarize the targets/activities related to disease that the compound hits
  • Compute geometric mean of activities for ranking
  • Rank by number of targets and geometric mean of activities against targets
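The collation and ranking in step 3 can be sketched as follows. The compounds, targets and pX values in this Python example are hypothetical, and taking the geometric mean directly over pX values is an assumption; the slides do not say whether the mean is computed on the log scale or on the underlying concentrations.

```python
from statistics import geometric_mean

# Hypothetical compounds and pX values, for illustration only.
hits = {
    "compound A": {"T1": 8.0, "T2": 6.5, "T3": 7.0},
    "compound B": {"T1": 9.0, "T2": 5.0},          # 5.0 is below the cutoff
    "compound C": {"T2": 7.5, "T3": 7.5, "T4": 7.5},
}

CUTOFF = 6.0  # keep activities above 6 log units, as in step 2


def summarize(hits, cutoff=CUTOFF):
    rows = []
    for compound, activities in hits.items():
        kept = [px for px in activities.values() if px > cutoff]
        if kept:
            rows.append((compound, len(kept), geometric_mean(kept)))
    # Rank by number of targets, then by geometric mean of activities.
    return sorted(rows, key=lambda r: (r[1], r[2]), reverse=True)


ranking = summarize(hits)
for compound, n_targets, gmean in ranking:
    print(f"{compound}: {n_targets} targets, geometric mean {gmean:.2f}")
```

Sorting on the pair (number of targets, geometric mean) puts multi-target compounds first and uses potency only to break ties, which matches the ranking rule stated in step 3.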

From pathways to treatments:
Automated analysis combines bioassay data with pathway data

Step 1: Find all targets that could be used to affect the disease state
• 88 targets related to hyperinsulinism with ≥3 literature references
• Full PathwayStudio relationship information
• PathwayStudio also has all compounds suggested as treatments

Building and refining the disease model

• Summary of the literature findings: CHI mutations in the context of insulin secretion
• Generate hypotheses using:
  • 6.2M literature-extracted findings
  • Functional annotations (e.g. Gene Ontology)
  • >1800 pre-built pathways modeling disease and normal states

From pathways to treatments:
Automated analysis combines bioassay data with pathway data

Step 1: Find all targets that could be used to affect the disease state
Step 2: Query for each target to find compounds that have high affinity for them (>6 log units)

Targets based on text mining

Approved compounds

From pathways to treatments:
Automated analysis combines bioassay data with pathway data

Step 1: Find all targets that could be used to affect the disease state
Step 2: Query for each target to find compounds that have high affinity for them (>6 log units)
Step 3: Collate data by compound to summarize the targets/activities related to disease that the compound hits
  • Compute geometric mean of activities for ranking
  • Rank by number of targets and geometric mean of activities against targets

• All compounds that were observed to bind to targets in pathway
• Sorted by number of active targets. Too many targets may suggest lack of specificity.

Figure labels: Targets and activities for each compound; Mean of activities among these targets; Drug-likeness metrics for sorting/classification

Next Step: Analyze Molecules to Identify Common Active Scaffolds for Novel Designs

• Starts with the set of active compounds and attempts to find common active scaffolds among them
• This is one of 38 scaffold systems identified as potentially active to treat hyperinsulinism
• Analysis method used: Schuffenhauer, A.; Ertl, P.; Roggo, S.; Wetzel, S.; Koch, M. A.; Waldmann, H. "The Scaffold Tree, Visualization of the Scaffold Universe by Hierarchical Scaffold Classification". J. Chem. Inf. Model. 2007, 47, 47-58.

More levels of “simplification” of common scaffolds from active compounds: Level 1, Level 2, Level 3

Note: many of these can be recognized as kinase-inhibitor scaffolds

Who is collaborating? The collaboration analysis shows clinical centers specializing in CHI

• Filtered for institutions with > 4 publications that collaborated with another institution
• Size of circle proportional to total number of publications
• Line width proportional to the number of co-authored publications
• Lines labeled with DOIs
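The collaboration graph described by these bullets can be derived from a list of publications with their affiliations. This Python sketch uses made-up institutions and placeholder DOIs, and the function name `collaboration_graph` is an assumption; the threshold is lowered to more than 1 publication so that the toy data produces a graph.

```python
from collections import Counter
from itertools import combinations

# Hypothetical publications: DOI -> institutions on the author list.
publications = {
    "doi:10.0000/example.1": ["Inst A", "Inst B"],
    "doi:10.0000/example.2": ["Inst A", "Inst B", "Inst C"],
    "doi:10.0000/example.3": ["Inst A"],
    "doi:10.0000/example.4": ["Inst C"],
}


def collaboration_graph(publications, min_pubs):
    # Node size: total publications per institution.
    totals = Counter(inst for insts in publications.values() for inst in set(insts))
    keep = {inst for inst, n in totals.items() if n > min_pubs}
    # Edge width: number of co-authored publications; edge label: their DOIs.
    edges = {}
    for doi, insts in publications.items():
        for pair in combinations(sorted(set(insts) & keep), 2):
            edges.setdefault(pair, []).append(doi)
    # Drop institutions that never collaborated with another kept institution.
    connected = {inst for pair in edges for inst in pair}
    nodes = {inst: totals[inst] for inst in connected}
    return nodes, edges


nodes, edges = collaboration_graph(publications, min_pubs=1)
print(nodes)
print({pair: len(dois) for pair, dois in edges.items()})
```

The same function applies unchanged to an author-level graph if the dictionary values hold author identifiers instead of institutions.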

Who are the researchers in congenital hyperinsulinism?

• Filtered for authors with > 3 publications who collaborated with another person
• Size of circle proportional to total number of publications
• Line width proportional to the number of co-authored publications
• Lines labeled with DOIs
• Numbers for authors are Scopus IDs

Summary

• Results in testable ideas
  • Many compounds are already approved drugs and can be tested in in vivo experiments
• Concepts can be extended to find novel compounds
  • Use modeling tools to extract common frameworks
  • SAR to optimize activity for new indication
  • Compare with compounds suggested as treatments as found by text mining
• Shows power of combining pathway data with experimentally verified binding data
  • Not just theoretical pathways, but testable ideas.

Acknowledgments

• Maria Shkrob
• Frederik van den Broek
• Sherri Matis-Mitchell
• George Jiang
• Tim Hoctor
• For valuable discussions
  • Jim Rinker, WuXi AppTec
  • Huijun Wang, Merck & Co.