Performance Metrics (Proposed Method 2)
GPU based virtual screening techniques for faster drug discovery, 12/15/2016
Running Time of Serial and Parallel SOM Implementations (Proposed Method 2)
When implemented on a GPU, the parallelized algorithm speeds up the process considerably.
Proposed Method 3: Substructure Similarity Based Virtual Screening Using GPU
Substructure similarity based Virtual Screening
[Figure: a known active molecule and a set of unknowns (image © NVIDIA 2013)]
• One known active molecule becomes the query molecule.
• Look for the compounds that are most similar to the query molecule.
• Actives in the unknown set can be identified by matching the structural similarity of these molecules against the known active molecule.
Similar Property Principle
Substructure-based virtual screening rests on the widely accepted Similar Property Principle (SPP).
The SPP states that “chemicals of similar structure frequently share similar biological activities and physio-chemical properties” [55].
The size of the maximum common subgraph (MCS) between two molecular graphs is a good measure of compound similarity [53][54].
The MCS size can therefore serve as a similarity metric for ligand-based virtual screening.
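To make the idea concrete, here is a minimal Python sketch of ranking candidate compounds by MCS size against a query. It assumes the MCS edge (bond) count for each query and candidate pair has already been computed, and it normalizes that count Tanimoto-style as |MCS| / (|E_q| + |E_c| − |MCS|). The normalization, the `Candidate` record and the compound names are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str       # hypothetical compound identifier
    n_bonds: int    # number of edges (bonds) in the candidate's molecular graph
    mcs_bonds: int  # edges in the MCS shared with the query (computed elsewhere)


def mcs_similarity(query_bonds: int, cand: Candidate) -> float:
    """Tanimoto-style score (assumed): shared edges divided by the union of edges."""
    union = query_bonds + cand.n_bonds - cand.mcs_bonds
    return cand.mcs_bonds / union if union else 0.0


def rank_candidates(query_bonds: int, candidates, top_k: int = 3):
    """Return (score, name) for the top_k candidates most similar to the query, best first."""
    scored = [(mcs_similarity(query_bonds, c), c.name) for c in candidates]
    scored.sort(reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    query_bonds = 6  # a benzene ring has 6 ring bonds (hydrogens ignored)
    library = [
        Candidate("cand_A", 7, 6),
        Candidate("cand_B", 10, 6),
        Candidate("cand_C", 5, 4),
    ]
    for score, name in rank_candidates(query_bonds, library):
        print(f"{name}: {score:.2f}")
```

With these made-up counts, cand_A scores 6/7 and ranks first, cand_B scores 0.60 and cand_C scores 4/7.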
The Maximum Common Subgraph (MCS) problem is to find the largest subgraph common to the graphs under observation, i.e. the largest graph that is isomorphic to a subgraph of each of them.
MCS Algorithms: Related Work
McGregor [51] introduced a maximal common subgraph algorithm that uses backtracking search.
The MCS of two graphs can also be found by transforming the MCS problem into the Maximal Clique Enumeration (MCE) problem [56].
The compatibility information between the two graphs is stored in a new graph called the “product graph”.
The MCE algorithm is run on the edge product graph: the maximum clique in the edge product graph corresponds to the maximum common subgraph.
Finding the Edge Product Graph for MCE [56]
Consider two graphs G1 = (V1, E1) and G2 = (V2, E2).
The edge product graph has the vertex set V = E1 × E2, i.e. one vertex for every pairing of an edge of G1 with an edge of G2.
Two vertices (e1, e2) and (f1, f2) are joined by an edge if e1 ≠ f1 and e2 ≠ f2, and either
• e1 and f1 are connected in G1 via a vertex with the same label as the vertex connecting e2 and f2 in G2, or
• e1, f1 are not adjacent in G1 and e2, f2 are not adjacent in G2.
The BK (Bron-Kerbosch) algorithm enumerates every maximal clique in a graph exactly once [50].
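The construction rule above can be sketched in Python as follows. This is a minimal illustration, assuming simple undirected graphs given as edge lists plus a vertex-label dictionary; unlike a full implementation it does not restrict the vertex set to edge pairs with matching atom or bond labels, and the function names are hypothetical.

```python
from itertools import combinations


def shared_vertex(e, f):
    """Return the vertex two distinct edges share, or None if they are not adjacent."""
    common = set(e) & set(f)
    return next(iter(common)) if common else None


def edge_product_graph(edges1, labels1, edges2, labels2):
    """Build the edge product graph of G1 and G2 as an adjacency dict.

    Vertices are all pairs (e1, e2) in E1 x E2. Two vertices (e1, e2), (f1, f2)
    are joined when e1 != f1, e2 != f2 and either
      * e1, f1 share a vertex, e2, f2 share a vertex, and those vertices have
        the same label, or
      * e1, f1 are not adjacent and e2, f2 are not adjacent.
    """
    vertices = [(e1, e2) for e1 in edges1 for e2 in edges2]
    adj = {v: set() for v in vertices}
    for (e1, e2), (f1, f2) in combinations(vertices, 2):
        if e1 == f1 or e2 == f2:
            continue
        s1, s2 = shared_vertex(e1, f1), shared_vertex(e2, f2)
        if s1 is not None and s2 is not None:
            joined = labels1[s1] == labels2[s2]  # both pairs adjacent: compare labels
        else:
            joined = s1 is None and s2 is None   # neither pair adjacent
        if joined:
            adj[(e1, e2)].add((f1, f2))
            adj[(f1, f2)].add((e1, e2))
    return adj
```

For two C-C-O paths, `edge_product_graph([(1, 2), (2, 3)], {1: "C", 2: "C", 3: "O"}, [(5, 6), (6, 7)], {5: "C", 6: "C", 7: "O"})` yields four vertices and two edges. One of those edges pairs a C-C bond with a C-O bond, which an implementation that also matches edge labels would exclude.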
Serial BK Algorithm [50]
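The algorithm figure for this slide is not reproduced here. As a stand-in, the following is a minimal Python sketch of the basic Bron-Kerbosch recursion (the version without pivoting). The slide may use a different variant, and the function names are hypothetical. R is the clique being grown, P holds the remaining candidates, and X holds the vertices already explored.

```python
def bron_kerbosch(R, P, X, adj, cliques):
    """Basic Bron-Kerbosch recursion (no pivoting).

    R: vertices of the clique being grown
    P: candidates that can still extend R
    X: vertices already processed, used to avoid reporting a clique twice
    """
    if not P and not X:
        cliques.append(set(R))  # R cannot be extended: it is a maximal clique
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, cliques)
        P = P - {v}  # v has been fully explored,
        X = X | {v}  # so move it from P to X


def maximal_cliques(adj):
    """Enumerate all maximal cliques of a graph given as {vertex: set(neighbours)}."""
    cliques = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    return cliques
```

On a triangle {1, 2, 3} with a pendant edge 3-4, `maximal_cliques` returns {1, 2, 3} and {3, 4}. The largest of these is the maximum clique, which, when the input is an edge product graph, gives the MCS.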
[Figure: Solution tree generated by BK algorithm; Input Graph G]
Example
[Figure: EP-Graph with vertices (1,5), (1,6), (1,7), (2,5), (2,6), (2,7), (3,5), (3,6), (3,7), (4,5), (4,6), (4,7)]
[Figure: One of the maximal cliques (a clique of size 3). (a) Backtracking to form the common subgraph from the first graph with edges 2, 4, 3. (b) Maximum common subgraph of the two graphs.]
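The backtracking step in panel (a) can be sketched as follows. The sketch assumes that the clique is given as a set of (G1 edge id, G2 edge id) pairs and that a table of G1 edge endpoints is available. The function name and the example data are hypothetical and are not taken from the figure.

```python
def clique_to_common_subgraph(clique, g1_edge_ends):
    """Read the common subgraph in G1 off a clique of the edge product graph.

    clique: set of (edge_id_in_G1, edge_id_in_G2) vertices of the edge product graph
    g1_edge_ends: dict mapping each G1 edge id to its endpoints (u, v)
    Returns the G1 edges in the subgraph, its vertices, and the edge correspondence.
    """
    mapping = {e1: e2 for e1, e2 in clique}  # G1 edge id -> matched G2 edge id
    g1_edges = [g1_edge_ends[e1] for e1 in sorted(mapping)]
    g1_vertices = sorted({v for edge in g1_edges for v in edge})
    return g1_edges, g1_vertices, mapping
```

For example, with the hypothetical clique {(2, 5), (3, 7), (4, 6)} and G1 edges {1: (1, 2), 2: (2, 3), 3: (3, 4), 4: (4, 5)}, the function returns the G1 edges (2, 3), (3, 4), (4, 5), which span vertices 2 to 5.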
Parallel BK Algorithm
Clique enumeration is done by constructing a solution tree.
For large graphs, the serial solution spends a large amount of time constructing this tree.
Parallel BK: Related work
Matthew C. Schmidt [57] proposed a parallel MCE algorithm for high performance computing architectures.
It decomposes the search tree into search subtrees.
It was implemented using OpenMP/POSIX threads for shared memory and MPI for distributed memory.
It was tested on a Cray XT4 machine.
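The following is a minimal sketch of the search-tree decomposition idea. It is not Schmidt's implementation. Fix a vertex order, and make each vertex v the root of an independent subtree: the candidate set P holds the neighbours of v that come later in the order, and the excluded set X holds the earlier ones. These subtrees reproduce the top-level loop of Bron-Kerbosch exactly, so they can be handed to separate workers. The thread pool and the function names are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def bk(R, P, X, adj, out):
    """Basic Bron-Kerbosch recursion; appends every maximal clique that extends R."""
    if not P and not X:
        out.append(R)
        return
    for v in list(P):
        bk(R | {v}, P & adj[v], X & adj[v], adj, out)
        P = P - {v}
        X = X | {v}


def subtree_task(v, rank, adj):
    """Enumerate the search subtree rooted at v, independently of all other roots."""
    P = {u for u in adj[v] if rank[u] > rank[v]}  # later neighbours may still join
    X = {u for u in adj[v] if rank[u] < rank[v]}  # earlier ones belong to their own subtree
    out = []
    bk({v}, P, X, adj, out)
    return out


def parallel_maximal_cliques(adj, workers=4):
    """Run one subtree per vertex on a worker pool and merge the results."""
    order = sorted(adj)  # any fixed vertex order gives a correct decomposition
    rank = {v: i for i, v in enumerate(order)}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda v: subtree_task(v, rank, adj), order)
    return [c for part in parts for c in part]
```

In CPython, the global interpreter lock keeps CPU-bound threads from running in parallel. This sketch therefore demonstrates the decomposition rather than a speedup; a process pool or native threads (as with OpenMP) would be needed for real parallelism.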
Proposed GPU based MCE
Branch and bound techniques are used in the design.
The proposed parallel method for MCE also decomposes the search tree into search subtrees.
The proposed solution branches in breadth-first (BFS) order.
Current Active Set: contains the list of nodes to be branched.
Each node in currentActiveSet becomes a BFS root.
Each thread takes a node from currentActiveSet and evaluates a subregion of the solution space.
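The following is a sequential Python simulation of the breadth-first, active-set idea for finding one maximum clique. The slides do not show the actual GPU kernels, node encoding or bound, so the following elements are assumptions made for illustration: the (R, P) node layout, the rule of extending only with later-ranked vertices, and the simple |R| + |P| size bound. The inner loop over `current_active_set` is the step that would map to one GPU thread per node.

```python
def bfs_max_clique(adj):
    """Breadth-first branch and bound for one maximum clique (sequential sketch).

    adj maps each vertex to the set of its neighbours. Search-tree nodes are
    (R, P): R is the clique built so far, P the vertices that can still extend it.
    """
    order = sorted(adj)
    rank = {v: i for i, v in enumerate(order)}
    best = frozenset()
    current_active_set = [(frozenset(), frozenset(order))]  # root node
    while current_active_set:
        next_active_set = []
        # In a GPU version each node of this loop would go to its own thread;
        # updating `best` would then need an atomic or a per-level reduction.
        for R, P in current_active_set:
            if len(R) > len(best):
                best = R
            if len(R) + len(P) <= len(best):  # bound: cannot beat the incumbent
                continue
            for v in P:  # branch once per candidate
                # keep only later-ranked common neighbours so each clique appears once
                new_P = frozenset(u for u in P & adj[v] if rank[u] > rank[v])
                next_active_set.append((R | {v}, new_P))
        current_active_set = next_active_set
    return set(best)
```

Each level of the loop corresponds to one depth of the search tree, so all nodes in `current_active_set` are independent of each other. Only the shared incumbent `best` couples them.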
Test Runs for Different EP Graphs
Experiments were conducted using 6 randomly generated graphs (items 1 to 6 in the table) and 7 DIMACS graphs (an established benchmark set).
The run time and the size of the maximum clique found for each graph are summarized in the table.
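The slides do not show how the benchmark graphs were loaded. DIMACS graphs are distributed in a simple ASCII format consisting of `c` comment lines, a `p edge <n> <m>` problem line, and `e <u> <v>` edge lines with 1-based vertex ids. A minimal reader that turns such a file into an adjacency dictionary might look like this. The function name is hypothetical.

```python
def parse_dimacs(text):
    """Parse a DIMACS-format undirected graph into {vertex: set(neighbours)}.

    Recognised lines: 'c ...' comments, 'p edge <n> <m>' problem line and
    'e <u> <v>' edges with 1-based vertex ids. Other lines are ignored.
    """
    adj = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] == "c":
            continue
        if parts[0] == "p":
            n = int(parts[2])
            adj = {v: set() for v in range(1, n + 1)}  # keeps isolated vertices
        elif parts[0] == "e":
            u, v = int(parts[1]), int(parts[2])
            if u != v:  # ignore self-loops
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
    return adj
```

The resulting dictionary is the adjacency format expected by a set-based clique enumerator.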
Proposed Parallel Virtual Screening Method
Proposed Parallel Virtual Screening Method (continued)
Parallelism in MCS Based VS
Results of the proposed MCS-based algorithm with benzene as the query compound
Comparison of the Proposed Machine Learning Based VS Techniques
Performance is compared on three aspects: efficiency metrics, run time, and ability to label molecules.
Observations:
• Since SOM takes multiple iterations to converge, the accuracy of SOM is found to be better than that of the RF method.
• SOM takes a long execution time, since multiple iterations are required to complete the screening.
• The SOM based method can reduce false positives, since it can also classify molecules as undefined.
Performance comparison on efficiency metrics of RF and SOM based methods
Comparison of running time of Parallel RF and Parallel SOM classification for virtual screening
List of compounds predicted as active by RFC and as undefined by SOM for the GDB17 test set, a data set of unknown chemical compounds
Tools Developed
Based on the proposed methods, the following tools were developed to make the drug discovery process more efficient:
• GPURFSCREEN
• SOMSCREEN
• GRAPHSCREEN
These tools are built using Python and C on the CUDA framework.
Source code, Readme files, etc. are available at http://ccc.nitc.ac.in/project/
Conclusion
Given the large volume of data involved in ligand-based drug design, the proposed parallel methods can reduce the running time.
The installation cost, power consumption and maintenance cost of a GPU-based system are lower than those of other multi-core systems.
GPU-based virtual screening is a viable alternative for quickly screening large quantities of ligand data at a lower cost.
As part of the thesis, three new tools for faster virtual screening were developed.
Future Work
The GPU-based RF could be parallelized further by also using the multiple cores available in CPUs.
Variants of random forest classifiers that implement balanced decision trees could be explored.
Other distance measures, such as the Manhattan distance, could be used as the discriminant function for winner neuron prediction in SOM-based VS.
GRAPHSCREEN is limited because it considers only MCS-related properties when screening compounds.
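The Manhattan-distance item in the list above can be illustrated with a small sketch. The winner neuron is the weight vector with the smallest distance to the input, so swapping the metric is a one-parameter change. The function names and vectors are made up for illustration and do not come from SOMSCREEN.

```python
import math


def euclidean(x, w):
    """Euclidean (L2) distance between an input vector and a neuron's weights."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, w)))


def manhattan(x, w):
    """Manhattan (L1) distance between an input vector and a neuron's weights."""
    return sum(abs(a - b) for a, b in zip(x, w))


def winner_neuron(x, weights, distance=euclidean):
    """Index of the best-matching (winner) neuron for input vector x."""
    return min(range(len(weights)), key=lambda i: distance(x, weights[i]))
```

For the input [0, 0] and neuron weights [[2, 2], [3, 0]], the Euclidean winner is neuron 0 (distance about 2.83 versus 3), while the Manhattan winner is neuron 1 (distance 3 versus 4). This shows that the choice of metric can change which neuron a molecule is mapped to.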