
[email protected] 1

Abhik Seal, PhD Student (Chemical Informatics)

Indiana University Bloomington, http://chemin-abs.blogspot.com/

mypage.iu.edu/~abseal/

10/16/2012

Enhanced 3D Virtual Screening of PknB Inhibitors using Data fusion


What is PknB?

• A Ser/Thr protein kinase (STPK), highly conserved in Gram-positive bacteria and apparently essential for mycobacterial viability.

• Essential for cell division and metabolism; it is expressed during exponential growth, and its overexpression causes defects in cell wall synthesis and cell division.


PknB ATP-binding pocket

Gatekeeper

Wehenkel, FEBS Letters 580 (2006) 3018–3022


Kinase inhibitors and pharmacophores

Targeting cancer with small molecule kinase inhibitors, Nature Reviews Cancer 2009

Through the “Gatekeeper Door”: Exploiting the Active Kinase Conformation, J. Med. Chem. 2010, 53, 2681–2694


Properties of Kinase Inhibitors

Through the “Gatekeeper Door”: Exploiting the Active Kinase Conformation J. Med. Chem. 2010, 53, 2681–2694


Some PknB inhibitors


Data Fusion


• A data fusion algorithm accepts two or more ranked lists and merges them into a single ranked list, with the aim of being more effective than any of the individual systems being fused (Croft, 2000, Chapter 1; Meng et al., 2002).

• Another aim of data fusion is to group existing search services under one umbrella as the number of search services grows (Selberg & Etzioni, 1996).

• Fusion in automatic ranking of IR systems: Automatic ranking of information retrieval systems using data fusion (Nuray & Can, 2006).

• Merging the retrieval results of multiple systems; see more on Wikipedia (http://en.wikipedia.org/wiki/Data_fusion).


Used by metasearch engines (http://en.wikipedia.org/wiki/List_of_search_engines#Metasearch_engines), e.g. www.dogpile.com, www.copernic.com, www.hotbot.com

Figure: meta search over an information resource using Engine 1, Engine 2 and Engine 3, which return result lists D1, D2 and D3.


Workflow of meta-search

• Execute a database search for a particular target structure using several different similarity measures

• Note the rank position, R(i), of each database structure in the ranking produced by the i-th similarity measure (similarity coefficient)

• Combine the various positions using a fusion rule to give a new rank position for each database structure

• Use these fused positions to generate the final output ranking for the search.

http://www.his.se/PageFiles/6884/Peter%20Willet%20presentation.pdf
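The four workflow steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method used in this study: fingerprints are modelled as hypothetical sets of "on" bit positions, the three coefficients (Tanimoto, Dice, cosine) are standard choices picked for illustration, and the fusion rule is passed in as a function (here the sum of rank positions, so a lower fused value ranks first).

```python
import math
from typing import Callable, Dict, List, Set

Fingerprint = Set[int]  # hypothetical representation: the set of "on" bit positions
Measure = Callable[[Fingerprint, Fingerprint], float]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def dice(a: Fingerprint, b: Fingerprint) -> float:
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a: Fingerprint, b: Fingerprint) -> float:
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 = most similar; ties are broken by compound id for reproducibility."""
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return {cid: pos for pos, cid in enumerate(ordered, start=1)}


def meta_search(target: Fingerprint,
                database: Dict[str, Fingerprint],
                measures: List[Measure],
                fuse: Callable[[List[int]], float]) -> List[str]:
    # Steps 1-2: one search per similarity measure, recording R(i) for every structure
    rankings = []
    for measure in measures:
        scores = {cid: measure(target, fp) for cid, fp in database.items()}
        rankings.append(rank_positions(scores))
    # Step 3: combine the rank positions of each structure with the fusion rule
    fused = {cid: fuse([ranking[cid] for ranking in rankings]) for cid in database}
    # Step 4: final output ranking (a lower fused rank value comes first)
    return sorted(fused, key=lambda cid: (fused[cid], cid))


db = {"c1": {1, 2, 3, 4}, "c2": {1, 2, 9}, "c3": {7, 8}}
print(meta_search({1, 2, 3}, db, [tanimoto, dice, cosine], fuse=sum))  # ['c1', 'c2', 'c3']
```

Any rule that maps a list of rank positions to a number (sum, min, median) can be passed as `fuse`, as long as a lower value means a better fused position.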



Types of fusion for 2D similarity search

a) Similarity fusion (SF):

SF involves searching a single reference structure against a database using multiple different similarity measures, and the output is obtained by combining the rankings resulting from these different measures.

b) Group fusion (GF):

GF involves searching multiple reference structures against a database using a single similarity measure, and the output is obtained by combining the rankings resulting from these different reference structures.

Holliday et al.: Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision. Journal of Cheminformatics 2011, 3:29
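To make the group fusion case concrete, here is a minimal sketch assuming fingerprints as sets of on-bits and the Tanimoto coefficient. The MAX rule (each database compound keeps its highest similarity to any reference) is one commonly used fusion rule, chosen here for illustration; the slide does not state which rule was used. Similarity fusion is the mirror image: one reference and several coefficients.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def group_fusion_max(references: List[Set[int]],
                     database: Dict[str, Set[int]]) -> List[str]:
    """Group fusion: several reference structures, one similarity measure.

    Each database compound keeps its highest Tanimoto similarity to any
    reference (MAX rule); compounds are returned most similar first.
    """
    fused = {cid: max(tanimoto(ref, fp) for ref in references)
             for cid, fp in database.items()}
    return sorted(fused, key=lambda cid: (-fused[cid], cid))


refs = [{1, 2, 3}, {7, 8, 9}]
db = {"x": {1, 2, 3}, "y": {7, 8}, "z": {4, 5}}
print(group_fusion_max(refs, db))  # ['x', 'y', 'z']
```

With only the first reference, `y` would score 0 and tie with `z`; it is the second reference that lifts it, which is the intuition behind combining several actives as queries.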


Similarity fusion (SF)

Figures: similarity fusion results for (a) WOMBAT top-1% and (b) WOMBAT top-5% searches, and for (a) MDDR top-1% and (b) MDDR top-5% searches.

Holliday et al.: Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision. Journal of Cheminformatics 2011, 3:29


Group fusion (GF)

Figures: group fusion results for (a) WOMBAT top-1% and (b) WOMBAT top-5% searches, and for (a) MDDR top-1% and (b) MDDR top-5% searches.



Reciprocal Rank method

• Merge compounds using only rank positions

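• Rank score of compound i, where j indexes the systems and pos(d_ij) is the position of compound i in the ranking of system j; a smaller r(d_i) means a better fused rank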

$$ r(d_i) = \frac{1}{\sum_{j} \frac{1}{\mathrm{pos}(d_{ij})}} $$



Reciprocal rank example

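• 4 systems: A, B, C, D; documents: a, b, c, d, e, f, g

• Query results: A = {a, b, c, d}, B = {a, d, b, e}, C = {c, a, f, e}, D = {b, g, e, f}

• r(a) = 1/(1 + 1 + 1/2) = 0.4; r(b) = 1/(1/2 + 1/3 + 1) ≈ 0.55

• r(e) = 1/(1/4 + 1/4 + 1/3) = 1.2 is smaller than r(d) = 1/(1/4 + 1/2) ≈ 1.33, so e ranks above d

• Final ranking of compounds: (most relevant) a > b > c > e > d > f > g (least relevant)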

Nuray, R.; Can, F. Automatic ranking of information retrieval systems using data fusion. Information Processing and Management 42 (2006) 595–614
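The reciprocal rank rule is short enough to reproduce directly. The sketch below implements r(d_i) = 1 / Σ_j 1/pos(d_ij) as written above and reruns the four-system example; the function names are illustrative, and treating a compound absent from a list as contributing nothing for that list is an assumption that matches the worked example.

```python
from typing import Dict, List


def reciprocal_rank_fusion(result_lists: List[List[str]]) -> Dict[str, float]:
    """r(d_i) = 1 / sum_j 1/pos(d_ij); lists that miss a compound add nothing for it."""
    recip_sum: Dict[str, float] = {}
    for results in result_lists:
        for pos, doc in enumerate(results, start=1):
            recip_sum[doc] = recip_sum.get(doc, 0.0) + 1.0 / pos
    return {doc: 1.0 / s for doc, s in recip_sum.items()}


def fused_ranking(result_lists: List[List[str]]) -> List[str]:
    r = reciprocal_rank_fusion(result_lists)
    return sorted(r, key=lambda doc: (r[doc], doc))  # smaller r = more relevant


A = ["a", "b", "c", "d"]
B = ["a", "d", "b", "e"]
C = ["c", "a", "f", "e"]
D = ["b", "g", "e", "f"]
scores = reciprocal_rank_fusion([A, B, C, D])
print(round(scores["a"], 2), round(scores["b"], 2))  # 0.4 0.55
print(fused_ranking([A, B, C, D]))  # ['a', 'b', 'c', 'e', 'd', 'f', 'g']
```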

Sum score

The normalized scores from each ranking are summed to give the fused score of each compound; the compound with the highest sum is ranked first.

Ranking 1 Ranking 2 Ranking 3 Sum score Rank

Compound 1 1 0.9 0.7 2.6 1

Compound 2 0.8 0.5 1 2.3 2

Compound 3 0.7 1 0.5 2.2 3

Compound 4 0.2 0 0.1 0.3 4

Compound 5 0 0.3 0 0.3 5
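A minimal sketch of the sum-score rule, reproducing the table above. It assumes the scores in each ranking are already normalized to [0, 1]; the slide does not say how they were normalized, so the `min_max` helper is a hypothetical example of one common choice. Compounds are shortened to C1 to C5.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Assumed normalization step: rescale one ranking's raw scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {c: (v - lo) / span if span else 0.0 for c, v in scores.items()}


def sum_score(rankings: List[Dict[str, float]]) -> List[Tuple[str, float, int]]:
    """Sum each compound's normalized scores; the highest sum gets rank 1.

    Sums are rounded to 6 decimals to absorb floating-point noise, and
    Python's stable sort keeps equal sums in input order.
    """
    compounds = list(rankings[0])
    totals = {c: round(sum(r[c] for r in rankings), 6) for c in compounds}
    ordered = sorted(compounds, key=lambda c: -totals[c])
    return [(c, totals[c], rank) for rank, c in enumerate(ordered, start=1)]


# The three normalized rankings from the table above
r1 = {"C1": 1.0, "C2": 0.8, "C3": 0.7, "C4": 0.2, "C5": 0.0}
r2 = {"C1": 0.9, "C2": 0.5, "C3": 1.0, "C4": 0.0, "C5": 0.3}
r3 = {"C1": 0.7, "C2": 1.0, "C3": 0.5, "C4": 0.1, "C5": 0.0}
for row in sum_score([r1, r2, r3]):
    print(row)  # ('C1', 2.6, 1) ... ('C5', 0.3, 5)
```

Compounds 4 and 5 tie at 0.3; the table lists compound 4 first, and the stable sort in this sketch breaks the tie the same way.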

Sum rank

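• In sum rank, each ranking first converts its scores into rank positions (the maximum score receives the minimum rank, i.e. rank 1). The ranks are then summed across rankings and the compounds re-ranked, with the smallest rank sum ranked first.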

Ranking 1 Ranking 2 Ranking 3 Sum rank Rank

Compound 1 1 10 4 15 5

Compound 2 2 5 6 13 3

Compound 3 7 4 3 14 4

Compound 4 2 3 3 8 2

Compound 5 3 2 1 6 1
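The sum-rank rule can be sketched the same way. The rank positions in the table appear to come from longer rankings (ranking 2 contains rank 10 for only five compounds), so the sketch takes them as given and only performs the summing and re-ranking; `ranks_from_scores` shows the score-to-rank step for completeness, with ties kept in input order as an assumption.

```python
from typing import Dict, List, Tuple


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """The maximum score receives rank 1 (equal scores keep input order)."""
    ordered = sorted(scores, key=lambda c: -scores[c])
    return {c: pos for pos, c in enumerate(ordered, start=1)}


def sum_rank(rank_tables: List[Dict[str, int]]) -> List[Tuple[str, int, int]]:
    """Sum each compound's rank positions and re-rank: the smallest sum comes first."""
    compounds = list(rank_tables[0])
    totals = {c: sum(t[c] for t in rank_tables) for c in compounds}
    ordered = sorted(compounds, key=lambda c: totals[c])
    return [(c, totals[c], pos) for pos, c in enumerate(ordered, start=1)]


# Rank positions taken directly from the table above
t1 = {"C1": 1, "C2": 2, "C3": 7, "C4": 2, "C5": 3}
t2 = {"C1": 10, "C2": 5, "C3": 4, "C4": 3, "C5": 2}
t3 = {"C1": 4, "C2": 6, "C3": 3, "C4": 3, "C5": 1}
print(sum_rank([t1, t2, t3]))
# [('C5', 6, 1), ('C4', 8, 2), ('C2', 13, 3), ('C3', 14, 4), ('C1', 15, 5)]
```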

3D Screening

Pharmacophore design

To generate the pharmacophoric features, we used the energy-based pharmacophore (e-pharmacophore) method developed by Salam et al., including exclusion spheres.

Pharmacophoric sites were automatically generated with Phase using the default set of six chemical features: hydrogen bond acceptor (A), hydrogen bond donor (D), hydrophobic (H), negative ionizable (N), positive ionizable (P), and aromatic ring (R).


E-Pharmacophores

Figure: E-pharmacophore I, E-pharmacophore II and E-pharmacophore III


Validation of Pharmacophores

• To assess the quality of the hit list for a query compound or a pharmacophore, we considered the yield of actives, the enrichment factor, the percentage of actives and the goodness-of-hit (GH) score.

• To assess how well a pharmacophore or any other screening method ranks compounds “early” in a virtual screening process, we also used the Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC; Truchon et al.) and the robust initial enhancement (RIE; Sheridan et al.) metrics.

• The validation set consisted of 35 actives, randomly sampled from 62 known actives, plus 1000 decoys (www.schrodinger.com/glide_decoy_set).
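The hit-list metrics named above have standard definitions; the sketch below writes them out in Python. The formula images on the following slide did not survive extraction, so these are the textbook forms, not necessarily the exact variants the author used. Ha is the number of actives in the hit list, Ht the hit-list size, A the number of actives in the database and D the database size; truncating the top x% to a whole number of compounds is an assumed convention.

```python
from typing import Dict, List


def enrichment_factor(ranked_is_active: List[bool], fraction: float) -> float:
    """EF(x%) = (actives in top x% / compounds in top x%) / (all actives / all compounds).

    ranked_is_active[k] is True if the compound at rank k+1 is active.
    The top x% is truncated to a whole number of compounds (at least one).
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    n_top = max(1, int(n_total * fraction))
    hits = sum(ranked_is_active[:n_top])
    return (hits / n_top) / (n_actives / n_total)


def hit_list_metrics(ha: int, ht: int, a: int, d: int) -> Dict[str, float]:
    """Hit-list metrics from Ha, Ht, A and D as defined above."""
    return {
        "yield_pct": 100.0 * ha / ht,    # yield of actives in the hit list
        "actives_pct": 100.0 * ha / a,   # percentage of all actives retrieved
        "ef": (ha / ht) / (a / d),       # enrichment factor of the hit list
        # Goodness of hit (Güner-Henry) score, between 0 and 1
        "gh": (ha * (3 * a + ht) / (4 * ht * a)) * (1 - (ht - ha) / (d - a)),
    }


# Validation set size from the slide: 35 actives + 1000 decoys
perfect = [True] * 35 + [False] * 1000
print(round(enrichment_factor(perfect, 0.01), 2))  # 29.57
```

With 35 actives among 1035 compounds, the largest EF(1%) any method can reach is 1035/35 ≈ 29.57, which matches the best EF(1%) values in the results table.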


Some formulas



Why BEDROC?

• Despite its sensitivity to early recognition, the enrichment factor has the drawback of being insensitive to the relative ranking of compounds within the top X% and of ignoring the ranking of the rest of the data set.

• The ROC AUC cannot tell whether actives are ranked early in a virtual screening process, because it weights all rank positions equally.

• The BEDROC metric uses an exponential decay function to reduce the influence of lower-ranked compounds on the final score. Its parameter α lets the user adjust the definition of the early recognition problem.

• BEDROC values were computed at α = 20 for the three VS methods; α = 20 means that 80% of the final BEDROC score is based on the first 8% of the ranked data set.
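To show what RIE and BEDROC actually measure, here is a minimal sketch following the closed-form expressions of Truchon and Bayly. It is an illustration written for this slide, not the enrichVS implementation used to produce the results table. Ranks are 1-based positions of the actives among `n_total` ranked compounds.

```python
import math
from typing import Sequence


def rie(active_ranks: Sequence[int], n_total: int, alpha: float = 20.0) -> float:
    """Robust initial enhancement.

    The observed mean of exp(-alpha * r / N) over the actives is divided by
    its expected value when the actives are ranked uniformly at random.
    """
    n = len(active_ranks)
    observed = sum(math.exp(-alpha * r / n_total) for r in active_ranks) / n
    expected = (1.0 / n_total) * (1 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1)
    return observed / expected


def bedroc(active_ranks: Sequence[int], n_total: int, alpha: float = 20.0) -> float:
    """BEDROC rescales RIE so that the best possible ranking scores 1 and the worst 0."""
    ra = len(active_ranks) / n_total  # ratio of actives in the data set
    factor = ra * math.sinh(alpha / 2) / (
        math.cosh(alpha / 2) - math.cosh(alpha / 2 - alpha * ra))
    constant = 1.0 / (1 - math.exp(alpha * (1 - ra)))
    return rie(active_ranks, n_total, alpha) * factor + constant


best = range(1, 36)          # all 35 actives ranked first among 1035 compounds
worst = range(1001, 1036)    # all 35 actives ranked last
print(bedroc(best, 1035), bedroc(worst, 1035))  # close to 1.0 and 0.0
```

Raising `alpha` concentrates the score on the first few percent of the list; α = 20 is the setting used on this slide.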


Validation of virtual screening

a) E-pharmacophore: E-pharmacophore III was selected based on the performance measures, namely the number of retrieved compounds with a fitness score above 2, and its high goodness-of-hit score, yield of actives and specificity.

b) ROCS: all compounds were scored and ranked by TanimotoCombo score, with parameters selected as described by Bostrom et al.

c) Glide XP: all compounds were scored with the Glide XP docking score and ranked from the best (most negative) to the worst docking score.

Figure: E-pharmacophores I, II and III, with sites D8 and R13 marked.

Which pharmacophore is best?

Are sites D8 and R13 important?

Results

Performance measures

Method EF(1%) EF(2%) EF(5%) EF(10%) BEDROC (α=20) RIE

E-pharmacophore I 11.71 11 10.51 6.8 0.538 7.81

E-pharmacophore II 29.57 27.51 12.14 6.9 0.716 10.40

E-pharmacophore III 29.57 27.14 13.71 7.42 0.744 10.81

vROCS 29.57 26.71 13.14 7.42 0.749 10.89

GlideXP 26.71 21 11.42 6.28 0.629 9.14

Sum score 29.57 28.57 14.85 7.42 0.785 11.42

Sum rank 29.57 24.28 12 7.42 0.703 10.21

Reciprocal rank 29.57 29.57 17.14 8.85 0.875 12.73

AUC ROC results

Methods AUC(1%) AUC(2%) AUC(5%) AUC(100%)

E-pharmacophore III 0.56 0.602 0.649 0.832

vROCS 0.58 0.62 0.62 0.89

GlideXP 0.39 0.44 0.51 0.84

Sum score 0.64 0.678 0.717 0.90

Sum rank 0.47 0.49 0.565 0.91

Reciprocal rank 0.72 0.75 0.81 0.96


Architecture

Figure: architecture of the fusion framework, with four input systems (System 1 to System 4), data preprocessing, rescoring and ranking, fusion algorithms, decision and validation.

Virtual screening workflow for 400K Asinex compounds

• 400K compounds from Asinex, prepared and optimized using LigPrep

• Virtual screening using the Phase e-pharmacophore; the top 5000 compounds were selected for screening with vROCS and Glide SP

• Conformer generation, followed by ROCS shape screening

• Glide SP docking

Data Fusion Using Reciprocal Rank algorithm

Figure: workflow stages (chemical structure collection, 3D virtual screening, post-processing and ranking, compound selection)

• Data fusion: the top 10% of the database was selected for Glide XP docking

• 45 compounds were selected after visual inspection and pharmacophore mapping
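The screening funnel can be summarized in a short Python sketch. Everything here is hypothetical: the score dictionaries stand in for Phase fitness, vROCS TanimotoCombo and Glide SP values that would be exported from those tools, and the cut-offs are parameters. The slide does not say whether "top 10% of the database" refers to the full library or to the fused shortlist; this sketch applies it to the fused shortlist, which is an assumption. The fusion step uses the reciprocal rank rule from the data fusion slides.

```python
from typing import Dict, List


def top_k(scores: Dict[str, float], k: int, higher_is_better: bool = True) -> List[str]:
    """Best k compounds by score; ties broken by compound id."""
    ordered = sorted(scores, key=lambda c: (-scores[c] if higher_is_better else scores[c], c))
    return ordered[:k]


def reciprocal_rank(rankings: List[List[str]]) -> List[str]:
    """Fuse rankings with r = 1 / sum(1/pos); smaller r ranks first."""
    total: Dict[str, float] = {}
    for ranking in rankings:
        for pos, c in enumerate(ranking, start=1):
            total[c] = total.get(c, 0.0) + 1.0 / pos
    return sorted(total, key=lambda c: (1.0 / total[c], c))


def screening_funnel(phase_fitness: Dict[str, float],
                     rocs_combo: Dict[str, float],
                     glide_sp: Dict[str, float],
                     n_pharm: int = 5000,
                     keep_fraction: float = 0.10) -> List[str]:
    # Stage 1: keep the top n_pharm compounds by pharmacophore fitness
    shortlist = top_k(phase_fitness, n_pharm)
    # Stage 2: rank the shortlist by shape (higher TanimotoCombo is better)
    # and by docking (a more negative Glide score is better)
    by_shape = top_k({c: rocs_combo[c] for c in shortlist}, len(shortlist))
    by_dock = top_k({c: glide_sp[c] for c in shortlist}, len(shortlist),
                    higher_is_better=False)
    # Stage 3: fuse both rankings and keep the top fraction for Glide XP docking
    fused = reciprocal_rank([by_shape, by_dock])
    n_keep = max(1, int(len(fused) * keep_fraction))
    return fused[:n_keep]


phase = {"m1": 3.0, "m2": 2.5, "m3": 1.0, "m4": 0.5}
rocs = {"m1": 1.2, "m2": 1.5, "m3": 0.9, "m4": 0.1}
glide = {"m1": -9.0, "m2": -7.0, "m3": -8.0, "m4": -3.0}
print(screening_funnel(phase, rocs, glide, n_pharm=3, keep_fraction=1.0))  # ['m1', 'm2', 'm3']
```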

Machine learning models (in progress)

• Tools used: a) PowerMV descriptors: 2D pharmacophore fingerprints, weighted Burden numbers and 8 properties; b) MACCS keys (166 keys); c) rcdk extended and graph-based fingerprints; d) jCompoundMapper library: PHAP2PT3D, PHAP3PT3D, CATS3D and CATS2D.

• So far, none of these descriptors reproduces the 3D screening results well. The ML models are nonetheless promising: an SVM with a polynomial kernel (PolyKernel) classifies actives and decoys well.

PCA analysis of predicted compounds

• 12 different physicochemical properties were calculated using CDK (http://rguha.net/code/java/cdkdesc.html), including molecular refractivity, atom polarizabilities, bond polarizabilities, hydrogen bond donors and acceptors, Petitjean number, topological polar surface area, number of rotatable bonds, lipophilicity (XLogP), molecular weight, topological shape and geometrical shape.
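The slide does not say how the PCA itself was run (the tools list mentions R, where `prcomp` would be the usual routine). As a rough illustration of the steps, this pure-Python sketch standardizes the property columns, finds the leading principal components by power iteration with deflation, and projects each compound onto them. It is a minimal sketch under those assumptions, suitable for a handful of properties such as the 12 above; component signs are arbitrary.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def standardize(rows: Matrix) -> Matrix:
    """Z-score each column (property) so properties on different scales weigh equally."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
           for j in range(p)]  # "or 1.0" guards against constant columns
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def covariance(z: Matrix) -> Matrix:
    n, p = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]


def leading_eigenpair(m: Matrix, iters: int = 500) -> Tuple[float, List[float]]:
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    p = len(m)
    rng = random.Random(0)
    v = [rng.random() + 0.1 for _ in range(p)]
    norm0 = math.sqrt(sum(x * x for x in v))
    v = [x / norm0 for x in v]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left to explain
            return 0.0, v
        v = [x / norm for x in w]
    value = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
    return value, v


def pca(rows: Matrix, k: int = 2) -> Tuple[Matrix, List[float]]:
    """Return compound scores on the first k PCs and the variance fraction of each PC."""
    z = standardize(rows)
    c = covariance(z)
    p = len(c)
    total = sum(c[i][i] for i in range(p))
    components, explained = [], []
    for _ in range(k):
        value, vec = leading_eigenpair(c)
        components.append(vec)
        explained.append(value / total)
        # Deflate: remove the component just found before searching for the next one
        c = [[c[i][j] - value * vec[i] * vec[j] for j in range(p)] for i in range(p)]
    scores = [[sum(r[j] * comp[j] for j in range(p)) for comp in components] for r in z]
    return scores, explained
```

For example, `pca(property_rows, k=2)` on a list of per-compound property vectors returns 2D coordinates suitable for a scatter plot of predicted compounds against known actives.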





Hits retrieved after visual inspection and pharmacophore mapping

Docking of predicted compounds

Tools Used

• Docking and pharmacophore modelling: Schrodinger's Glide and Phase

• Shape-based screening: vROCS

• Performance calculation and visualization: R statistics, ggplot2, enrichVS package.

More work

• Working on the design of PknG inhibitors

• Enhanced ranking systems for better prediction

• An automated protocol for enhanced virtual screening using open source tools

Acknowledgements

• Indo-US Science and Technology Forum

• Prof. P. Yogeshwari and Prof. D. Sriram (BITS Hyderabad)

• Computer Aided Drug Design Lab, BITS Pilani Hyderabad

• Prof. David J. Wild

• OSDD Team

