
Drug-Target Interaction Prediction Using Semantic Similarity and Edge Partitioning

Guillermo Palma¹, Maria-Esther Vidal¹, and Louiqa Raschid²

¹ Universidad Simón Bolívar, Venezuela; ² University of Maryland, USA

ISWC 2014, Riva del Garda, Italy. October 2014

Universidad Simón Bolívar


Increases median survival to nearly five years for HER2-positive metastatic breast cancer.

Docetaxel

Trastuzumab

Pertuzumab

http://www.drugdevelopment-technology.com/projects/pertuzumab/
http://www.nasdaq.com/article

News, September 28th 2014. CLEOPATRA PROJECT*

Effectiveness of the combination of two monoclonal drugs in a chemotherapy treatment.

Can computational tools be used to predict combinations of drugs?


Inhibits the ability of HER2 to interact with

Drugs of the same family: monoclonal drugs

Proteins of the same family: HER family

[Figure: similar drugs d1 and d2 interacting with similar targets t1 and t2]

[Figure: drugs, targets, and known drug-target interactions; drug semantic similarity measure (chemical space similarity); target semantic similarity measure (genomic space similarity); drug-target predictions]

Main Contributions

AUC for the GPCR dataset
ML Method   Y       YsemEP  Ycntrl
BLM         0.888   0.911   0.798
NBI         0.833   0.900   0.769
GIP         0.943   0.958   0.843
LapRLS      0.941   0.956   0.844
KBMF2K      0.939   0.960   0.845

Agenda

1. Semantics Based Edge Partitioning Problem (semEP)
2. Empirical Evaluation

1. SEMANTICS BASED EDGE PARTITIONING PROBLEM (SEMEP)

semEP: Semantics Based Edge Partitioning

Similarity Measures

Semantics Based Edge Partitioning Problem (semEP)

[Figure: bipartite graph with drugs d1 to d5, targets t1 to t5, and interactions (edges) i1 to i9]

semEP is the problem of partitioning the edges of the bipartite graph into the minimal set of clusters that maximizes the density of the clusters:
• Minimize the number of clusters.
• Maximize the density of the clusters.
• Use the semantics encoded in the ontologies to compute the similarity between entities.

Mapping to Vertex Coloring Graph (VCG)

[Figure: semEP bipartite graph with drugs d1, d2, d3, targets t1, t2, t3, and interactions i1 to i4, next to the corresponding VCG over nodes i1 to i4. Annotations: "d2 is not similar to d1"; "t2 is not similar to t3".]

Edges in the bipartite graph are mapped to nodes in VCG. There is an edge e between two nodes i1=(t1,d1) and i2=(t2,d2) in VCG iff:

sim(t1,t2) < θt, the threshold on the similarity of t1 and t2, OR sim(d1,d2) < θd, the threshold on the similarity of d1 and d2.

An edge in VCG therefore marks two interactions that must not be placed in the same cluster.
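The mapping rule can be sketched in Python. This is a minimal sketch, not the semEP code: the function names build_vcg and lookup, the adjacency-map representation, the example similarity values and the 0.5 thresholds are all hypothetical. Interactions are (target, drug) tuples, matching i=(t,d) above. Each similarity function is assumed to return 1.0 for identical entities.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Interaction = Tuple[str, str]  # (target, drug), as in i = (t, d) on the slide
Similarity = Callable[[str, str], float]


def build_vcg(interactions: List[Interaction], sim_t: Similarity, sim_d: Similarity,
              theta_t: float, theta_d: float) -> Dict[Interaction, Set[Interaction]]:
    """Hypothetical helper: one VCG node per bipartite edge, returned as an adjacency map."""
    vcg: Dict[Interaction, Set[Interaction]] = {i: set() for i in interactions}
    # Naive O(n^2) pass over all pairs of interactions.
    for (t1, d1), (t2, d2) in combinations(interactions, 2):
        # Conflict when the targets OR the drugs are below their thresholds.
        if sim_t(t1, t2) < theta_t or sim_d(d1, d2) < theta_d:
            vcg[(t1, d1)].add((t2, d2))
            vcg[(t2, d2)].add((t1, d1))
    return vcg


def lookup(table: Dict[FrozenSet[str], float]) -> Similarity:
    """Hypothetical helper: symmetric pair table -> similarity function (identity -> 1.0)."""
    def sim(a: str, b: str) -> float:
        return 1.0 if a == b else table[frozenset((a, b))]
    return sim


# Hypothetical similarity values, not taken from the slides.
drug_sim = lookup({frozenset(("d1", "d2")): 0.2, frozenset(("d1", "d3")): 0.8,
                   frozenset(("d2", "d3")): 0.7})
target_sim = lookup({frozenset(("t1", "t2")): 0.9, frozenset(("t1", "t3")): 0.6,
                     frozenset(("t2", "t3")): 0.1})
interactions = [("t1", "d1"), ("t2", "d2"), ("t3", "d3"), ("t1", "d3")]
vcg = build_vcg(interactions, target_sim, drug_sim, theta_t=0.5, theta_d=0.5)
for node, neighbors in vcg.items():
    print(node, sorted(neighbors))
```

With these made-up values, (t1,d1) and (t2,d2) conflict because d1 and d2 are not similar. Likewise, (t2,d2) and (t3,d3) conflict because t2 and t3 are not similar. This mirrors the two annotations in the figure.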

The Vertex Coloring Problem
• A coloring assigns colors to the vertices of a graph such that no two adjacent vertices share the same color.
• The Vertex Coloring Problem seeks to minimize the number of colors for a given graph.
• The Vertex Coloring Problem is NP-hard [Garey 79].

[Figure: VCG with nodes i1, i2, i3, i4]

cDensity

[Figure: bipartite graph with drugs d1, d2, d3 and targets t1, t2, t3]

sim(d1,d3) = sim(d2,d3) = sim(t1,t3) = sim(t2,t3) = 0.1
sim(d1,d2) = sim(t1,t2) = 0.4

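The slide gives similarity values, but the cDensity formula itself is not recoverable here. The sketch below therefore uses a hypothetical stand-in: the average of the mean pairwise drug similarity and the mean pairwise target similarity inside a cluster. This is NOT the paper's definition. It only illustrates why, with these values, leaving d3 and t3 out of the cluster of d1, d2, t1, t2 yields a denser cluster.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, FrozenSet, Iterable

# Similarity values from the cDensity slide.
SIM: Dict[FrozenSet[str], float] = {
    frozenset({"d1", "d3"}): 0.1,
    frozenset({"d2", "d3"}): 0.1,
    frozenset({"t1", "t3"}): 0.1,
    frozenset({"t2", "t3"}): 0.1,
    frozenset({"d1", "d2"}): 0.4,
    frozenset({"t1", "t2"}): 0.4,
}


def avg_pairwise_sim(entities: Iterable[str]) -> float:
    """Mean similarity over distinct pairs; a single entity counts as fully dense (1.0)."""
    pairs = list(combinations(sorted(set(entities)), 2))
    if not pairs:
        return 1.0
    return mean(SIM[frozenset(p)] for p in pairs)


def hypothetical_density(drugs: Iterable[str], targets: Iterable[str]) -> float:
    # ASSUMPTION: stand-in for cDensity, not the formula from the paper.
    return (avg_pairwise_sim(drugs) + avg_pairwise_sim(targets)) / 2


one_cluster = hypothetical_density({"d1", "d2", "d3"}, {"t1", "t2", "t3"})
split_cluster = hypothetical_density({"d1", "d2"}, {"t1", "t2"})
print(round(one_cluster, 3), round(split_cluster, 3))
```

Under this stand-in, the cluster over all three drugs and targets scores 0.2. The cluster restricted to d1, d2, t1, t2 scores 0.4.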

semEP implements DSATUR, a well-known heuristic for the Vertex Coloring Problem, and uses the resulting coloring to partition the edges of the bipartite graph: each color class of VCG becomes one cluster of interactions.
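A compact version of DSATUR is sketched below in Python. It is the textbook heuristic, not the semEP source code. The helper names (dsatur, is_proper_coloring, clusters_from_coloring) and the example graph are hypothetical.

The heuristic works in three steps:
• Pick the uncolored vertex with the highest saturation (the number of distinct colors among its neighbors).
• Break ties by degree.
• Give that vertex the smallest color not used by any neighbor.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def dsatur(graph: Graph) -> Dict[Hashable, int]:
    """Greedy DSATUR coloring of an undirected graph given as an adjacency map."""
    colors: Dict[Hashable, int] = {}

    def priority(v: Hashable) -> tuple:
        saturation = len({colors[u] for u in graph[v] if u in colors})
        return (saturation, len(graph[v]))  # saturation first, then degree

    # Sorting once makes tie-breaking deterministic (max keeps the first maximum).
    uncolored = sorted(graph, key=repr)
    while uncolored:
        v = max(uncolored, key=priority)
        used = {colors[u] for u in graph[v] if u in colors}
        color = 0
        while color in used:  # smallest color not taken by a neighbor
            color += 1
        colors[v] = color
        uncolored.remove(v)
    return colors


def is_proper_coloring(graph: Graph, colors: Dict[Hashable, int]) -> bool:
    """True if no two adjacent vertices share a color."""
    return all(colors[v] != colors[u] for v in graph for u in graph[v])


def clusters_from_coloring(colors: Dict[Hashable, int]) -> List[Set[Hashable]]:
    """Each color class becomes one cluster (one block of the edge partition)."""
    groups: Dict[int, Set[Hashable]] = defaultdict(set)
    for v, c in colors.items():
        groups[c].add(v)
    return [groups[c] for c in sorted(groups)]


# Hypothetical VCG: i1-i2 and i2-i3 conflict; i4 conflicts with nothing.
vcg = {"i1": {"i2"}, "i2": {"i1", "i3"}, "i3": {"i2"}, "i4": set()}
coloring = dsatur(vcg)
clusters = clusters_from_coloring(coloring)
print(clusters)
```

DSATUR gives no optimality guarantee in general. That fits the NP-hardness of the problem: it trades exactness for a fast greedy pass.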

2. EMPIRICAL EVALUATION

Evaluation on Drug-Target Interactions

• Drug Similarity: drug-drug chemical similarity score based on hashed fingerprints computed from SMILES.
• Target Similarity: target-target similarity score based on the normalized Smith-Waterman sequence similarity score.
• 900 Drugs, 1,000 Targets and 5,000 Interactions: Nuclear Receptors, G protein-coupled receptors (GPCRs), Ion Channels, and Enzymes.

K. Bleakley and Y. Yamanishi. Supervised prediction of drug-target interactions using bipartite local models. Bioinformatics, 25(18), 2009.

Data from DrugBank
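Both similarity scores can be sketched in a few lines, under three stated assumptions:
• The slide does not say which coefficient is applied to the hashed fingerprints, so the Tanimoto coefficient (a common choice) is assumed.
• Fingerprints are given as sets of on-bit indices, because computing them from SMILES requires a cheminformatics toolkit that is out of scope here.
• Smith-Waterman uses toy scores (match +2, mismatch -1, linear gap -1) instead of a substitution matrix such as BLOSUM.

The alignment score is normalized as SW(a,b)/sqrt(SW(a,a)·SW(b,b)), so a sequence scores 1.0 against itself.

```python
import math
from typing import Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """ASSUMPTION: Tanimoto coefficient on fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score with a linear gap penalty (toy scores, not BLOSUM)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def normalized_sw(a: str, b: str) -> float:
    """SW(a, b) / sqrt(SW(a, a) * SW(b, b)); expects non-empty sequences."""
    return smith_waterman(a, b) / math.sqrt(smith_waterman(a, a) * smith_waterman(b, b))


print(tanimoto({1, 2, 3}, {2, 3, 4}))
print(normalized_sw("ACGT", "TTTT"))
```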

semEP Predictions

Prediction probability: p1

Evaluation Protocol

A 10-fold cross-validation:
• Training data: randomly selected 90% of the positive and negative interactions.
• Test data: the remaining 10% of the interactions.
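The split can be sketched as follows. This is a minimal sketch assuming every drug-target pair, positive or negative, is one item. The function k_fold_splits and the toy pairs are hypothetical. The folds are not stratified by label; the slide does not say either way. Each fold serves once as the 10% test set, while the other nine folds (90%) form the training set.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int = 10, seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Hypothetical helper: shuffle once, cut into k folds, return (train, test) pairs."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[f::k] for f in range(k)]  # k nearly equal folds
    splits = []
    for f in range(k):
        test = folds[f]
        train = [x for g, fold in enumerate(folds) if g != f for x in fold]
        splits.append((train, test))
    return splits


# Toy (drug, target, label) pairs; label 1 marks a known interaction.
pairs = [(f"d{i}", f"t{j}", int((i + j) % 7 == 0)) for i in range(10) for j in range(10)]
splits = k_fold_splits(pairs, k=10)
print([len(test) for _, test in splits])
```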

State-of-the-art Machine Learning Methods
• BLM: Bipartite Local Models [Bleakley and Yamanishi]
• LapRLS: Laplacian Regularized Least Squares [Xia et al]
• GIP: Gaussian Interaction Profile [Van Laarhoven et al]
• KBMF2K: Kernelized Bayesian Matrix Factorization with Twin Kernels [Gönen]
• NBI: Network-Based Inference [Cheng et al]

Experiment I
Research Question: Can semEP predictions improve the performance of state-of-the-art prediction methods?

Evaluation Protocol:
• Y: the set of interactions of the benchmark (positive and negative predictions).
• YsemEP: the best semEP predictions (probability > 0.5) are added to the initial positive predictions of Y; no more than 30% of positive predictions are added.
• Ycntrl: the same number of random predictions are added to the predictions of Y.

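The construction of YsemEP and Ycntrl can be sketched as follows, under assumptions the slide leaves open:
• Labels are a dict from (drug, target) pairs to 1/0.
• The 30% cap is taken relative to the number of positives already in Y.
• Only pairs currently labelled negative in Y are eligible.
• Random control pairs are drawn from those negatives.

The function augment_labels and the toy data are hypothetical.

```python
import math
import random
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (drug, target)


def augment_labels(y: Dict[Pair, int], semep_prob: Dict[Pair, float],
                   max_fraction: float = 0.3, threshold: float = 0.5,
                   seed: int = 0) -> Tuple[Dict[Pair, int], Dict[Pair, int]]:
    """Hypothetical helper returning (YsemEP, Ycntrl) built from the benchmark labels Y."""
    cap = math.floor(max_fraction * sum(y.values()))  # ASSUMPTION: 30% of Y's positives
    # Best semEP predictions first; only pairs that are negatives in Y are eligible.
    candidates = sorted((p for p, prob in semep_prob.items()
                         if prob > threshold and y.get(p) == 0),
                        key=lambda p: semep_prob[p], reverse=True)
    chosen = candidates[:cap]
    y_semep = dict(y)
    for p in chosen:
        y_semep[p] = 1
    # Control: flip the same number of randomly chosen negatives of Y.
    negatives = sorted(p for p, label in y.items() if label == 0)
    y_cntrl = dict(y)
    for p in random.Random(seed).sample(negatives, len(chosen)):
        y_cntrl[p] = 1
    return y_semep, y_cntrl


# Toy benchmark: 7 positives, 5 negatives (hypothetical).
y = {("d1", "t1"): 1, ("d2", "t2"): 1, ("d3", "t3"): 1, ("d4", "t4"): 1,
     ("d5", "t5"): 1, ("d6", "t6"): 1, ("d7", "t7"): 1,
     ("d1", "t2"): 0, ("d1", "t3"): 0, ("d2", "t3"): 0, ("d4", "t5"): 0, ("d6", "t7"): 0}
semep_prob = {("d1", "t2"): 0.9, ("d2", "t3"): 0.7, ("d4", "t5"): 0.6,
              ("d1", "t3"): 0.4, ("d1", "t1"): 0.95}
y_semep, y_cntrl = augment_labels(y, semep_prob)
print(sum(y_semep.values()), sum(y_cntrl.values()))
```

With 7 positives the cap is 2. The two most probable eligible semEP predictions are therefore added, while the control receives two random negatives.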

Evaluation of semEP and State-of-the-art Machine Learning Methods

AUC for the GPCR dataset
ML Method   Y       YsemEP  Ycntrl
BLM         0.888   0.911   0.798
NBI         0.833   0.900   0.769
GIP         0.943   0.958   0.843
LapRLS      0.941   0.956   0.844
KBMF2K      0.939   0.960   0.845

AUPR for the GPCR dataset
ML Method   Y       YsemEP  Ycntrl
BLM         0.472   0.481   0.327
NBI         0.615   0.719   0.467
GIP         0.705   0.764   0.563
LapRLS      0.630   0.704   0.517
KBMF2K      0.673   0.760   0.544

semEP improves the performance of all the methods: adding semEP predictions (YsemEP) raises both AUC and AUPR for every method.

The performance of all the methods degrades for Ycntrl, where the same number of random predictions is added instead.
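To make the comparison concrete, the differences against the baseline Y can be computed directly from the two GPCR tables. The snippet only restates the published numbers. The dictionaries and the gains helper are illustrative names.

```python
from typing import Dict, Tuple

Row = Tuple[float, float, float]  # (Y, YsemEP, Ycntrl)

# Values copied from the AUC and AUPR tables for the GPCR dataset.
AUC: Dict[str, Row] = {
    "BLM": (0.888, 0.911, 0.798),
    "NBI": (0.833, 0.900, 0.769),
    "GIP": (0.943, 0.958, 0.843),
    "LapRLS": (0.941, 0.956, 0.844),
    "KBMF2K": (0.939, 0.960, 0.845),
}
AUPR: Dict[str, Row] = {
    "BLM": (0.472, 0.481, 0.327),
    "NBI": (0.615, 0.719, 0.467),
    "GIP": (0.705, 0.764, 0.563),
    "LapRLS": (0.630, 0.704, 0.517),
    "KBMF2K": (0.673, 0.760, 0.544),
}


def gains(table: Dict[str, Row]) -> Dict[str, Tuple[float, float]]:
    """Per method: (YsemEP - Y, Ycntrl - Y), rounded to 3 decimals."""
    return {m: (round(sem - y, 3), round(ctl - y, 3)) for m, (y, sem, ctl) in table.items()}


auc_gains = gains(AUC)
aupr_gains = gains(AUPR)
for m in AUC:
    print(f"{m:7s} dAUC semEP {auc_gains[m][0]:+.3f} ctrl {auc_gains[m][1]:+.3f} | "
          f"dAUPR semEP {aupr_gains[m][0]:+.3f} ctrl {aupr_gains[m][1]:+.3f}")
```

NBI gains the most from semEP predictions (+0.067 AUC, +0.104 AUPR), while every Ycntrl value is below its baseline.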

Overlap of Top10 positive predictions of semEP

The overlap is remarkably low: at most 4 of semEP's top 10 positive predictions coincide with those of any method. Results suggest that semEP predictions are accurate and diverse; all these techniques explore different spaces.

ML Method   Nuclear Receptor    GPCR                Ion Channel         Enzyme
            Equal  Different    Equal  Different    Equal  Different    Equal  Different
BLM         1      9            0      10           0      10           0      10
NBI         0      10           1      9            0      10           0      10
GIP         2      8            1      9            0      10           3      7
LapRLS      4      6            1      9            0      10           2      8
KBMF2K      4      6            0      10           0      10           0      10
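The Equal/Different counts can be reproduced with a set intersection of two top-10 lists. The sketch assumes each method outputs one score per candidate pair. top_k, overlap and the toy scores are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def top_k(scores: Dict[Hashable, float], k: int = 10) -> List[Hashable]:
    """Keys of the k highest-scoring predictions (ties broken by key for determinism)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], repr(kv[0])))
    return [p for p, _ in ranked[:k]]


def overlap(a: Sequence[Hashable], b: Sequence[Hashable]) -> Tuple[int, int]:
    """(Equal, Different) as in the table: how many entries of list a also appear in list b."""
    equal = len(set(a) & set(b))
    return equal, len(a) - equal


# Toy scores: semEP ranks p0..p11, the other method ranks p8..p19 (hypothetical).
semep_scores = {f"p{i}": 1.0 - i / 100 for i in range(12)}
method_scores = {f"p{i}": 1.0 - i / 100 for i in range(8, 20)}
print(overlap(top_k(semep_scores), top_k(method_scores)))
```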

Experiment II
Research Question: Can semEP novel predictions be validated?

Evaluation Protocol:
• The top 5 novel predicted interactions are validated in the STITCH drug-target interaction website.
• Novel predicted interactions are interactions that do not appear in the dataset.

STITCH: http://stitch.embl.de/
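Selecting the candidates to check can be sketched as a filter followed by a sort. This assumes predictions are scored (drug, target) pairs and that the dataset is available as a set of known pairs. top_novel and the identifiers in the example are hypothetical. The lookup in STITCH is done on the website and is not sketched.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (drug, target)


def top_novel(predictions: Dict[Pair, float], known: Set[Pair], k: int = 5) -> List[Pair]:
    """Hypothetical helper: the k highest-scoring predictions absent from the dataset."""
    novel = [(p, s) for p, s in predictions.items() if p not in known]
    novel.sort(key=lambda ps: (-ps[1], repr(ps[0])))  # highest score first
    return [p for p, _ in novel[:k]]


known = {("dA", "tX"), ("dB", "tY")}
predictions = {("dA", "tX"): 0.99, ("dA", "tY"): 0.9, ("dB", "tY"): 0.95,
               ("dC", "tX"): 0.8, ("dC", "tY"): 0.7, ("dB", "tX"): 0.6,
               ("dD", "tZ"): 0.55, ("dD", "tX"): 0.3}
candidates = top_novel(predictions, known)
print(candidates)
```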

STITCH http://stitch.embl.de/


Validation of Top 5 Drug-Target Interactions (Novel Predictions)

The top 5 novel predictions of each method were validated in STITCH http://stitch.embl.de/. Novel predicted interactions are interactions that do not appear in the dataset.

Number of the top 5 novel predictions validated in STITCH:
ML Method   Nuclear Receptor  GPCR  Ion Channel  Enzyme
semEP       4                 5     1            4
BLM         2                 1     0            0
NBI         1                 1     1            2
GIP         3                 3     1            1
LapRLS      5                 3     2            2
KBMF2K      3                 4     2            2

semEP novel predicted interactions can be validated across all target groups. At least one of its top 5 novel predictions is validated in every group, and all 5 are validated for GPCRs.

Analyzing Top Drug-Target Interactions (Novel predictions for GPCRs)

Top 2, 3, and 4:
• D02076, hsa:146
• D02076, hsa:147
• D00604, hsa:147
Probability: 0.8

Conclusions

AUC for the GPCR dataset
ML Method   Y       YsemEP  Ycntrl
BLM         0.888   0.911   0.798
NBI         0.833   0.900   0.769
GIP         0.943   0.958   0.843
LapRLS      0.941   0.956   0.844
KBMF2K      0.939   0.960   0.845

Future Directions
• Apply semEP to other domains, e.g., to predict drug-drug interactions, adverse drug events, or gene GO annotations.

http://informatics.mayo.edu/adepedia/index.php/Main_Page
http://omictools.com/metaadedb-s5660.html

[Figure: Drugs and Adverse Drug Events]

MANY THANKS! QUESTIONS

Code available at: https://code.google.com/p/semep/

Solutions to semEP

Partition with greater density, given that d1, d2, and d3 are similar, as well as t1, t2, and t3.
