Abhik Seal, PhD Student (Chemical Informatics)
Indiana University Bloomington
http://chemin-abs.blogspot.com/
mypage.iu.edu/~abseal/
10/16/2012
Enhanced 3D Virtual Screening of PknB Inhibitors using Data fusion
What is PknB?
• Ser/Thr protein kinase (STPK), highly conserved in Gram-positive bacteria and apparently essential for mycobacterial viability.
• Essential for cell division and metabolism; it is expressed during exponential growth, and its overexpression causes defects in cell wall synthesis and cell division.
PknB ATP-binding pocket
Gatekeeper
Wehenkel, FEBS Letters 580 (2006) 3018–3022
Kinase inhibitor and pharmacophores
Targeting cancer with small molecule kinase inhibitors. Nature Reviews Cancer 2009
Through the “Gatekeeper Door”: Exploiting the Active Kinase Conformation. J. Med. Chem. 2010, 53, 2681–2694
Properties of Kinase Inhibitors
Through the “Gatekeeper Door”: Exploiting the Active Kinase Conformation J. Med. Chem. 2010, 53, 2681–2694
• A data fusion algorithm accepts two or more ranked lists and merges them into a single ranked list, with the aim of being more effective than any of the individual systems used for the fusion (Croft, 2000, Chapter 1; Meng et al., 2002).
• Another aim of data fusion is to group existing search services under one umbrella as the number of such services increases (Selberg & Etzioni, 1996).
• Fusion in automatic ranking of IR systems: Nuray & Can, "Automatic ranking of information retrieval systems using data fusion" (2006).
• Merging the retrieval results of multiple systems. See more on Wikipedia (http://en.wikipedia.org/wiki/Data_fusion).
Used by meta search engines (http://en.wikipedia.org/wiki/List_of_search_engines#Metasearch_engines), for example: www.dogpile.com, www.copernic.com, www.hotbot.com
Meta search
[Diagram: three search engines query an information resource and return result lists D1, D2 and D3.]
Workflow of meta-search
• Execute a database search for some particular target structure using different similarity measures
• Note the rank position, R(i), of each database structure in the ranking for the i-th similarity measure using similarity coefficients
• Combine the various positions using a fusion rule to give a new rank position for each database structure
• Use these fused positions to generate the final output ranking for the search.
http://www.his.se/PageFiles/6884/Peter%20Willet%20presentation.pdf
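The four workflow steps can be illustrated with a minimal sketch. The similarity scores, molecule names and function names below are hypothetical, and the "best position" rule (`min`) is only one possible fusion rule chosen for illustration; any function that maps a list of rank positions to a single value could be passed instead.

```python
from typing import Callable, Dict, List

def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank position R(i) of each structure for one similarity measure (1 = most similar)."""
    ordered = sorted(scores, key=lambda s: scores[s], reverse=True)
    return {structure: pos for pos, structure in enumerate(ordered, start=1)}

def fuse(score_tables: List[Dict[str, float]],
         rule: Callable[[List[int]], float]) -> List[str]:
    """Combine per-measure rank positions with a fusion rule and return the fused ranking."""
    per_measure = [rank_positions(table) for table in score_tables]
    structures = list(per_measure[0])
    fused = {s: rule([ranks[s] for ranks in per_measure]) for s in structures}
    # A lower fused value means a better position; ties are broken by name.
    return sorted(fused, key=lambda s: (fused[s], s))

# Hypothetical similarity scores of three database structures against one target,
# computed with three different similarity coefficients.
tanimoto = {"mol1": 0.91, "mol2": 0.40, "mol3": 0.75}
cosine = {"mol1": 0.60, "mol2": 0.55, "mol3": 0.95}
dice = {"mol1": 0.88, "mol2": 0.35, "mol3": 0.80}

fused_ranking = fuse([tanimoto, cosine, dice], rule=min)
print(fused_ranking)
```

Here mol1 and mol3 both reach position 1 in at least one measure, so they tie under this rule and the tie is resolved alphabetically.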
Types of fusion for 2D similarity search
a) Similarity fusion (SF): SF involves searching a single reference structure against a database using multiple different similarity measures, and the output is obtained by combining the rankings resulting from these different measures.
b) Group fusion (GF): GF involves searching multiple reference structures against a database using a single similarity measure, and the output is obtained by combining the rankings resulting from these different reference structures.
Holliday et al., Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision. Journal of Cheminformatics 2011, 3:29
Similarity fusion (SF)
(a) WOMBAT top-1% searches; (b) WOMBAT top-5% searches. (a) MDDR top-1% searches; (b) MDDR top-5% searches.
Holliday et al., Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision. Journal of Cheminformatics 2011, 3:29
Group fusion (GF)
(a) WOMBAT top-1% searches; (b) WOMBAT top-5% searches. (a) MDDR top-1% searches; (b) MDDR top-5% searches.
Reciprocal Rank method
• Merge compounds using only rank positions
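• Rank score of compound d_i (j: system index), where pos(d_ij) is the position of d_i in the ranking of system j:
$$ r(d_i) = \frac{1}{\sum_j \frac{1}{\mathrm{pos}(d_{ij})}} $$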
Reciprocal rank example
• 4 systems: A, B, C, D; documents: a, b, c, d, e, f, g
• Query results: A={a,b,c,d}, B={a,d,b,e}, C={c,a,f,e}, D={b,g,e,f}
• r(a)=1/(1+1+1/2)=0.4; r(b)=1/(1/2+1/3+1)=0.55
• Final ranking of compounds (a lower r means more relevant): (most relevant) a > b > c > e > d > f > g (least relevant)
Nuray, R.; Can, F. Automatic ranking of information retrieval systems using data fusion. Information Processing and Management 42 (2006) 595–614
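The example can be reproduced with a short sketch of the reciprocal rank score. It assumes that a document missing from a system's list simply contributes nothing to the sum for that system; the function and variable names are illustrative.

```python
from typing import Dict, List

def reciprocal_rank_scores(rankings: List[List[str]]) -> Dict[str, float]:
    """r(d) = 1 / sum_j 1/pos_j(d), summing only over the systems that returned d."""
    inverse_sums: Dict[str, float] = {}
    for ranking in rankings:
        for pos, doc in enumerate(ranking, start=1):
            inverse_sums[doc] = inverse_sums.get(doc, 0.0) + 1.0 / pos
    return {doc: 1.0 / total for doc, total in inverse_sums.items()}

systems = [
    ["a", "b", "c", "d"],  # system A
    ["a", "d", "b", "e"],  # system B
    ["c", "a", "f", "e"],  # system C
    ["b", "g", "e", "f"],  # system D
]

scores = reciprocal_rank_scores(systems)
fused = sorted(scores, key=scores.get)  # lower r = more relevant
for doc in fused:
    print(doc, round(scores[doc], 3))
```

With these lists the sketch gives r(a) = 0.4 and r(b) ≈ 0.545. Under the missing-document assumption above, e (r = 1.2) lands just ahead of d (r ≈ 1.333).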
Sum score
The normalized scores of each ranking are summed to get the fused score of a compound
Compound     Ranking 1  Ranking 2  Ranking 3  Sum score  Rank
Compound 1   1          0.9        0.7        2.6        1
Compound 2   0.8        0.5        1          2.3        2
Compound 3   0.7        1          0.5        2.2        3
Compound 4   0.2        0          0.1        0.3        4
Compound 5   0          0.3        0          0.3        5
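A minimal sketch of sum score fusion using the values from the table. The slide does not say how the scores were normalized, so min-max scaling to [0, 1] is an assumption here, as are the function names; the table values already span 0 to 1, so the scaling leaves them unchanged.

```python
from typing import Dict, List

def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale one ranking's scores to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}

def sum_score(rankings: List[Dict[str, float]]) -> Dict[str, float]:
    """Fused score of a compound = sum of its normalized scores over all rankings."""
    normalized = [min_max_normalize(r) for r in rankings]
    return {name: sum(n[name] for n in normalized) for name in normalized[0]}

names = ["Compound 1", "Compound 2", "Compound 3", "Compound 4", "Compound 5"]
rankings = [
    dict(zip(names, [1.0, 0.8, 0.7, 0.2, 0.0])),  # Ranking 1
    dict(zip(names, [0.9, 0.5, 1.0, 0.0, 0.3])),  # Ranking 2
    dict(zip(names, [0.7, 1.0, 0.5, 0.1, 0.0])),  # Ranking 3
]

fused = sum_score(rankings)
# Highest fused score first; rounding guards against float noise, ties broken by name.
order = sorted(fused, key=lambda c: (-round(fused[c], 9), c))
print(order)
```

Compounds 4 and 5 tie at 0.3, so their relative order depends entirely on the tie-breaking convention.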
Sum rank
• In sum rank, each ranking is first converted to rank positions, with the maximum score receiving the minimum rank. The rank positions of each compound are then summed, and the compounds are re-ranked by that sum (smallest sum first).
Compound     Ranking 1  Ranking 2  Ranking 3  Sum rank  Rank
Compound 1   1          10         4          15        5
Compound 2   2          5          6          13        3
Compound 3   7          4          3          14        4
Compound 4   2          3          3          8         2
Compound 5   3          2          1          6         1
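The sum rank table can be checked with a few lines of code. This is only a sketch with illustrative names; it takes the per-ranking rank positions as given and breaks ties between equal sums by compound name.

```python
from typing import Dict, List

def sum_rank(rank_table: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum each compound's rank positions and re-rank (smallest sum gets rank 1)."""
    totals = {name: sum(ranks) for name, ranks in rank_table.items()}
    ordered = sorted(totals, key=lambda name: (totals[name], name))
    return {name: pos for pos, name in enumerate(ordered, start=1)}

ranks = {
    "Compound 1": [1, 10, 4],
    "Compound 2": [2, 5, 6],
    "Compound 3": [7, 4, 3],
    "Compound 4": [2, 3, 3],
    "Compound 5": [3, 2, 1],
}

final = sum_rank(ranks)
print(final)
```

The rank sums are 15, 13, 14, 8 and 6, so Compound 5 comes first and Compound 1 last.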
3D Screening
Pharmacophore design
To generate the pharmacophoric features we used the energetic pharmacophore (e-pharmacophore) approach developed by Salam et al., with exclusion spheres included.
Pharmacophoric sites were generated automatically with Phase using the default set of six chemical features: hydrogen bond acceptor (A), hydrogen bond donor (D), hydrophobic (H), negative ionizable (N), positive ionizable (P), and aromatic ring (R).
E-Pharmacophores
[Figure panels: E-pharmacophore I, E-pharmacophore II, E-pharmacophore III]
Validation of Pharmacophores
• To assess the quality of the hit list returned for a query compound or a pharmacophore, we considered the yield of active compounds, the enrichment factor, the percentage of actives, and the Goodness of Hit list (GH) score.
• We also measured how well a pharmacophore or any other screening method ranks compounds "early" in a virtual screening process, using the Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC; Truchon et al.) and the RIE metric (Sheridan et al.).
• The validation set consisted of 35 active compounds randomly sampled from 62 actives, along with 1000 decoys (www.schrodinger.com/glide_decoy_set).
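As an illustration of the enrichment factor on a set of this size, the sketch below uses the common definition EF(x%) = (fraction of actives in the top x%) / (fraction of actives in the whole set). How the top x% is rounded to a whole number of compounds is an assumption (rounded up here), and the function name is illustrative.

```python
import math
from typing import Sequence

def enrichment_factor(is_active: Sequence[bool], fraction: float) -> float:
    """EF for a ranked list (best first): hit rate in the top fraction / overall hit rate."""
    n_total = len(is_active)
    n_actives = sum(is_active)
    n_top = max(1, math.ceil(fraction * n_total))  # assumed rounding convention
    hits_top = sum(is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)

# Idealized ranking of the validation set: all 35 actives ahead of the 1000 decoys.
perfect = [True] * 35 + [False] * 1000
ef_1pct = enrichment_factor(perfect, 0.01)
print(round(ef_1pct, 2))
```

For 35 actives among 1035 compounds the best achievable EF(1%) is 1035/35 ≈ 29.57, which is the ceiling several methods reach in the EF(1%) column of the results.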
Why BEDROC?
• Despite its sensitivity to early recognition, the Enrichment Factor has the drawback of being insensitive to the relative ranking of the compounds within the top X% and of ignoring the ranking of the rest of the data set.
• The ROC measure does not single out the compounds ranked early in a virtual screening process.
• The BEDROC metric uses an exponential decay function to reduce the influence of lower-ranked compounds on the final score. Its parameter α lets the user adjust the definition of the early recognition problem.
• BEDROC values were computed for the three VS methods at α=20. Setting α=20 means that 80% of the final BEDROC score is based on the first 8% of the ranked data set.
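A minimal sketch of RIE and BEDROC, written from the definitions as I understand them from Sheridan et al. and Truchon et al.: RIE is the average exponential weight of the actives divided by its expectation for a random ranking, and BEDROC rescales RIE between its minimum and maximum so that it lies in [0, 1]. Function names are illustrative, and the code takes the 1-based rank positions of the actives as input.

```python
import math
from typing import Sequence

def rie(active_ranks: Sequence[int], n_total: int, alpha: float) -> float:
    """Robust initial enhancement: observed exponential weight / random expectation."""
    n_actives = len(active_ranks)
    observed = sum(math.exp(-alpha * r / n_total) for r in active_ranks) / n_actives
    expected = (1.0 / n_total) * (1.0 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1.0)
    return observed / expected

def bedroc(active_ranks: Sequence[int], n_total: int, alpha: float = 20.0) -> float:
    """BEDROC as RIE rescaled to [0, 1] between the worst and best possible rankings."""
    ratio = len(active_ranks) / n_total
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))
    return (rie(active_ranks, n_total, alpha) - rie_min) / (rie_max - rie_min)

N = 1035  # 35 actives + 1000 decoys
best = bedroc(range(1, 36), N)          # all actives ranked first
worst = bedroc(range(N - 34, N + 1), N)  # all actives ranked last

# Share of the exponential weight carried by the top 8% of the list at alpha = 20.
early_weight = (1.0 - math.exp(-20.0 * 0.08)) / (1.0 - math.exp(-20.0))
print(round(best, 3), round(worst, 3), round(early_weight, 3))
```

The last line checks the statement about α=20: roughly 80% of the weight falls on the first 8% of the ranked list.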
Validation of virtual screening
a) E-pharmacophore: E-pharmacophore III was selected based on the performance measures, the number of retrieved compounds with a fitness above 2, and its high Goodness of Hit score, yield of actives and specificity.
b) ROCS: All compounds were scored and ranked according to the Tanimoto combo score; parameters were selected as described by Bostrom et al.
c) Glide XP: All compounds were scored with the Glide XP docking score and ranked in descending order of score.
[Figure: E-pharmacophore I, E-pharmacophore II, E-pharmacophore III; labelled sites D8 and R13]
Which pharmacophore is good?
Are sites D8 and R13 important?
Results
Performance measures
Method               EF(1%)  EF(2%)  EF(5%)  EF(10%)  BEDROC (α=20)  RIE
E-pharmacophore I    11.71   11      10.51   6.8      0.538          7.81
E-pharmacophore II   29.57   27.51   12.14   6.9      0.716          10.40
E-pharmacophore III  29.57   27.14   13.71   7.42     0.744          10.81
vROCS                29.57   26.71   13.14   7.42     0.749          10.89
GlideXP              26.71   21      11.42   6.28     0.629          9.14
Sum score            29.57   28.57   14.85   7.42     0.785          11.42
Sum rank             29.57   24.28   12      7.42     0.703          10.21
Reciprocal rank      29.57   29.57   17.14   8.85     0.875          12.73
AUC ROC results
Method               AUC(1%)  AUC(2%)  AUC(5%)  AUC(100%)
E-pharmacophore III  0.56     0.602    0.649    0.832
vROCS                0.58     0.62     0.62     0.89
GlideXP              0.39     0.44     0.51     0.84
Sum score            0.64     0.6780   0.717    0.90
Sum rank             0.47     0.49     0.565    0.91
Reciprocal rank      0.72     0.75     0.81     0.96
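For reference, the full-list ROC AUC of a ranked list can be computed as the fraction of active/decoy pairs in which the active is ranked ahead of the decoy. The sketch below covers only this AUC(100%) case; it does not reproduce the partial AUC columns, assumes a ranking without ties, and uses an illustrative function name.

```python
from typing import Sequence

def roc_auc(is_active: Sequence[bool]) -> float:
    """ROC AUC for a ranked list (best first) of active (True) and decoy (False) labels."""
    n_actives = sum(is_active)
    n_decoys = len(is_active) - n_actives
    decoys_seen = 0
    misordered_pairs = 0
    for flag in is_active:
        if flag:
            # Every decoy already seen is ranked above this active.
            misordered_pairs += decoys_seen
        else:
            decoys_seen += 1
    return 1.0 - misordered_pairs / (n_actives * n_decoys)

example_auc = roc_auc([True, False, True, False])
print(example_auc)
```

In the four-compound example one of the four active/decoy pairs is in the wrong order, giving an AUC of 0.75.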
Architecture
[Diagram components: System 1, System 2, System 3, System 4; Data Preprocessing; Fusion Algorithms; Rescoring and Ranking; Decision; Validation]
Virtual Screening of Asinex 400K compounds: Workflow
Chemical Structure Collection
• 400K compounds from Asinex, optimized using LigPrep
3D Virtual Screening
• Phase E-pharmacophore screening; select the top 5000 compounds for VS in vROCS and Glide SP
• Conformer generation and ROCS screening
• Glide SP docking
Post-processing and Ranking
• Data fusion using the Reciprocal Rank algorithm
Compound Selection
• Top 10% of the data-fused database selected for Glide XP docking
• 45 compounds selected after visual inspection and pharmacophore mapping
Machine Learning Models (in progress)
• Tools used:
a) PowerMV descriptors: 2D pharmacological fingerprints, Weighted Burden Number and 8 properties
b) MACCS (166 keys)
c) rcdk extended graph based
d) jCompoundMapper library: PHAP2PT3D, PHAP3PT3D, CATS3D, CATS2D
• So far, none of the descriptors has reproduced the 3D screening results well. However, the ML model is promising because it classifies actives and decoys well with a polynomial-kernel SVM.
PCA Analysis of predicted compounds
• 12 different physicochemical properties are calculated using CDK (http://rguha.net/code/java/cdkdesc.html): molecular refractivity, atom polarizabilities, bond polarizabilities, hydrogen bond donors, hydrogen bond acceptors, Petitjean number, topological polar surface area, number of rotatable bonds, lipophilicity (XLogP), molecular weight, topological shape and geometrical shape.
Hits retrieved after visual inspection and pharmacophore mapping
Docking of predicted compounds
Tools Used
• For docking and pharmacophore: Schrodinger’s Glide and Phase
• Shape-based screening: vROCS
• Performance calculation and visualization: R statistics, ggplot2, enrichVS package.
More work
• Design of PknG inhibitors
• Enhanced ranking systems for better prediction
• Automated protocol for developing enhanced virtual screening using open source tools.
Acknowledgements
• Indo-US Science and Technology Forum
• Prof. P. Yogeshwari and Prof. D. Sriram (BITS Hyderabad)
• Computer Aided Drug Design Lab, BITS Pilani Hyderabad
• Prof. David J. Wild
• OSDD Team