Glide
A New Paradigm for Rapid, Accurate Docking and Scoring
Current Status and Future Plans
Thomas A. Halgren
Glide ~ Potential Utility
Lead Discovery
• Dock sets of purchasable compounds to choose compounds to buy
• Dock compound collection to choose compounds to assay
• Especially, use in combination with HTS to:
  - Reduce number of compounds that need to be assayed
  - Identify additional hits that HTS might miss
• Dock sets of CombiChem compounds to generate focused CombiChem libraries
Lead Optimization
• Dock active compounds to confirm binding mode or to suggest an alternative binding mode
• Evaluate new ligand designs before synthesis
Criteria for High-Throughput Docking
• Screen databases rapidly on a timescale compatible with drug discovery needs
• Determine correct ligand binding mode
• Predict binding affinity with high accuracy
  - Must be able to rank known ligands well and screen out ligands that won’t bind
• Provide a user-friendly setup and a highly automated docking protocol
• Facilitate analysis of docking results
Glide offers an attractive combination of these features.
Glide Hierarchical Docking Strategy
Glide’s docking algorithm approximates a complete systematic search over ligand positions, orientations, and conformations in the receptor site.
Increasingly demanding tests are applied as the search space is reduced.
Glide "Funnel"
Ligand conformations
1. Site-point search
2a. Diameter test
2b. Subset test
2c. Greedy score
2d. Refinement
3. Grid minimization
4. Final scoring (GlideScore)
Top hits
Conformation Generation ~ Definitions
[Figure: example ligand structure with two rotamer groups labeled]
The four internal rotatable bonds are part of the core region.
Glide generates and docks many core conformations, but treats the rotamer groups sequentially, rather than combinatorially. This speeds up the calculation.
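To see why sequential treatment is cheaper, a minimal Python sketch can count the rotamer-group states evaluated per core conformation under each scheme. The group sizes are hypothetical, and the count ignores any coupling between groups that a real docking code would need to handle.

```python
from math import prod

def combinatorial_count(states_per_group):
    """Every combination of rotamer-group states is enumerated."""
    return prod(states_per_group)

def sequential_count(states_per_group):
    """Each rotamer group is scanned on its own, one after another."""
    return sum(states_per_group)

# Hypothetical ligand: three rotamer groups with six states each.
groups = [6, 6, 6]
n_comb = combinatorial_count(groups)  # 6 * 6 * 6
n_seq = sequential_count(groups)      # 6 + 6 + 6
```

The gap widens quickly as rotamer groups are added, since one count grows as a product and the other as a sum.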
Ligand Placement ~ Definitions
[Figure: ligand with its diameter and center marked]
• Ligand Diameter: line between the two most widely separated atoms
• Ligand Center: placed at the center of the ligand diameter; used in site-point search
• Atoms close to the ligand diameter are used in the diameter test
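These two definitions translate directly into code. The sketch below is a brute-force illustration in Python (all pairs of atoms are compared); the function names and the toy coordinates are hypothetical and are not taken from Glide.

```python
from itertools import combinations
from math import dist

def ligand_diameter(coords):
    """Return (i, j, length) for the two most widely separated atoms."""
    i, j = max(combinations(range(len(coords)), 2),
               key=lambda pair: dist(coords[pair[0]], coords[pair[1]]))
    return i, j, dist(coords[i], coords[j])

def ligand_center(coords):
    """Midpoint of the ligand diameter."""
    i, j, _ = ligand_diameter(coords)
    return tuple((a + b) / 2 for a, b in zip(coords[i], coords[j]))

# Toy ligand with four atoms (coordinates in Å, made up for illustration).
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0), (6.0, 0.0, 0.0)]
i, j, length = ligand_diameter(atoms)
center = ligand_center(atoms)
```

For this toy ligand the diameter runs between the first and last atoms, and the center sits halfway along it.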
Stage 1 ~ Fast Site-Point Search
• Generate a 2-Å grid of site points in the active site
• Pre-compute histograms of distances between site point and receptor surface in grid setup
• Compare site point-receptor surface histograms with the ligand center-ligand surface histogram
• Reject mismatched site points
[Figure: two distance histograms, Site Point-Receptor Surface and Ligand Center-Ligand Surface]
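The histogram comparison in the site-point search can be sketched as follows. The slides do not say how two histograms are compared, so the bin width, the summed absolute difference, and the rejection threshold used here are all assumptions chosen only to make the idea concrete.

```python
def distance_histogram(distances, bin_width=1.0, n_bins=8):
    """Normalized histogram of distances (Å); the last bin collects overflow."""
    counts = [0] * n_bins
    for d in distances:
        counts[min(int(d / bin_width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def histogram_mismatch(h1, h2):
    """Assumed mismatch measure: summed absolute difference between bins."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def keep_site_points(site_hists, ligand_hist, max_mismatch=0.5):
    """Indices of site points whose histogram resembles the ligand's."""
    return [k for k, h in enumerate(site_hists)
            if histogram_mismatch(h, ligand_hist) <= max_mismatch]

# Made-up distances: ligand center to ligand surface, and two site points.
ligand_hist = distance_histogram([1.2, 1.8, 2.5, 2.7])
hist_a = distance_histogram([1.1, 1.9, 2.2, 2.9])   # similar shape
hist_b = distance_histogram([5.5, 6.1, 6.8, 7.5])   # much roomier site
kept = keep_site_points([hist_a, hist_b], ligand_hist)
```

Because the receptor-side histograms depend only on the receptor, they can be computed once during grid setup and reused for every ligand.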
Stage 2 ~ Rough Scoring
• Diameter test: check steric clashes of atoms near ligand diameter for ~300 pre-specified orientations of the ligand diameter
• Subset test: rotate about ligand diameter in 15° increments; score atoms capable of making H-bonds, ligand-metal interactions
• Greedy scoring: score all atom positions ±1 Å in x,y,z directions; use best score
• Refinement: move whole ligand ±1 Å in x,y,z directions and re-score; reduce ~5000 poses to ~400 for energy minimization
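The rotation step of the subset test can be illustrated with Rodrigues' rotation formula. This is a geometric sketch only: it generates the 24 orientations spaced 15° apart about the ligand diameter and does no scoring. The function names and the toy coordinates are hypothetical.

```python
from math import cos, sin, radians, sqrt

def rotate_about_axis(point, origin, axis, angle_deg):
    """Rotate point about the line through origin along axis (Rodrigues' formula)."""
    norm = sqrt(sum(a * a for a in axis))
    ux, uy, uz = (a / norm for a in axis)
    px, py, pz = (p - o for p, o in zip(point, origin))
    c, s = cos(radians(angle_deg)), sin(radians(angle_deg))
    dot = ux * px + uy * py + uz * pz
    cross = (uy * pz - uz * py, uz * px - ux * pz, ux * py - uy * px)
    rotated = (
        px * c + cross[0] * s + ux * dot * (1 - c),
        py * c + cross[1] * s + uy * dot * (1 - c),
        pz * c + cross[2] * s + uz * dot * (1 - c),
    )
    return tuple(r + o for r, o in zip(rotated, origin))

def orientations_about_diameter(coords, end_a, end_b, step_deg=15):
    """All rotations of coords about the diameter end_a -> end_b, step_deg apart."""
    axis = tuple(b - a for a, b in zip(end_a, end_b))
    return [[rotate_about_axis(p, end_a, axis, k * step_deg) for p in coords]
            for k in range(360 // step_deg)]

# Toy ligand whose diameter lies along z, with one off-axis atom.
ligand = [(0.0, 0.0, 0.0), (0.0, 1.0, 1.0), (0.0, 0.0, 4.0)]
poses = orientations_about_diameter(ligand, ligand[0], ligand[2])
```

Atoms lying on the diameter stay fixed under these rotations, so only the off-axis atoms need to be re-scored at each step.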
Stage 3 ~ Energy Minimization
• Use pre-computed OPLS-AA vdW and electrostatic grids
• Anneal from soft to hard potential
  - Smoothing reduces large initial energy/gradient terms from close contacts, permits freer movement
• Also optimize torsional angles when doing flexible docking
• Use Monte Carlo moves to explore nearby torsional minima for a small number of low-energy poses
Stage 4 ~ Final Scoring
• Choose best pose(s) based on Emodel, a combination of:
  - Coulomb-vdW energy
  - GlideScore, an enhanced version of ChemScore
  - Internal strain energy for the potential directing the conformational-search algorithm
• Final scoring based on GlideScore:
  - employs all ChemScore terms
  - includes a contribution from the CvdW energy
  - adds terms that penalize non-physical interactions
Docking Accuracy ~ Test Set
Test set includes 285 structures from the PDB
• RMS < 1.0 Å for half of test set
• RMS < 1.5 Å for two-thirds of test set
• Larger RMSDs:
  - ligands with rotated phobic groups
  - very large ligands (>20 rotatable bonds)
  - ligands highly exposed to solvent, few/no H-bonds to receptor
Glide Speed and Accuracy ~ Summary
Number of Rot. Bonds | Number of Cases | Ave. RMS (Å), Top Ranked Pose | Ave. CPU Time (min)*
0-3 | 51 | 0.99 | 0.2
4-6 | 92 | 1.50 | 0.6
7-10 | 48 | 1.79 | 1.7
0-8 | 164 | 1.36 | 0.5
0-10 | 191 | 1.44 | 0.8
0-20 | 266 | 1.71 | 2.4
*1.3 GHz Linux Pentium III
Flexible docking of MMFFs-opt’d co-crystallized PDB ligands
Comparison to Gold for Gold Test Set
RMSD (Å) comparison of structures from the PDB for the Gold test set
Method | ≤ 20 RB* (86 cases): Avg. RMSD | Max. RMSD | All Ligands (93 cases): Avg. RMSD | Max. RMSD
Glide | 1.48 | 5.8 | 1.57 | 6.8
Gold | 2.89 | 14.0 | 3.03 | 14.0
*RB = Rotatable Bonds
http://www.ccdc.cam.ac.uk/prods/gold/rms_tab.html
Comparison to FlexX for FlexX Test Set
RMSD (Å) comparison of structures from the PDB for the FlexX test set
Method | ≤ 20 RB* (177 cases): Avg. RMSD | Max. RMSD | All Ligands (191 cases): Avg. RMSD | Max. RMSD
Glide | 1.46 | 8.3 | 1.66 | 13.5
FlexX | 3.43 | 13.4 | 3.67 | 15.5
*RB = Rotatable Bonds
http://cartan.gmd.de/flexx/html/flexx-eval.html
Scoring Accuracy ~ Database Screens Used
• Ten receptors featuring a wide variety of types of binding sites used to evaluate database enrichment
  - Thymidine Kinase, Estrogen Receptor, CDK-2 Kinase, Sugar-Binding protein, HIV Protease, Thrombin, Thermolysin, P38 MAP Kinase, Cox-2, HIV-RT
• Roughly 1100 “decoy” ligands from CMC or PDB
  - 877 CMC ligands; 229 PDB ligands
  - Up to 20 rotatable bonds; MMFF94s optimized
• Known binders from literature, from PDB test set, or provided by pharmaceutical colleagues
GlideScore 1.8 ~ Evolved from ChemScore
$$\Delta G_{\mathrm{bind}} = C_0 + C_{\mathrm{lipo}} \sum f(r_{lr}) + C_{\mathrm{hbond}} \sum g(\Delta r)\, h(\Delta \alpha) + C_{\mathrm{metal}} \sum f(r_{lm}) + C_{\mathrm{rotb}} H_{\mathrm{rotb}} + C_{\mathrm{clash}} V_{\mathrm{clash}}$$
• ChemScore (Eldridge, JCAMD 1997, 11, 425-445) does:
  - reward favorable lipophilic, hydrogen bonding, and metal ligation contacts
  - penalize freezing out of rotatable bonds
• ChemScore doesn’t:
  - penalize steric clashes (though a later version does)
  - penalize lipophilic or hydrophilic mismatches
But GlideScore does
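In ChemScore-style functions, contact terms such as f(r) are commonly written as piecewise-linear ramps: full credit up to an ideal distance, falling linearly to zero at a cutoff. The sketch below shows that shape in Python. The ideal and cutoff distances are placeholder values for illustration, not the published ChemScore or GlideScore parameters.

```python
def block(x, x_ideal, x_max):
    """Piecewise-linear ramp: 1 up to x_ideal, falling linearly to 0 at x_max."""
    if x <= x_ideal:
        return 1.0
    if x >= x_max:
        return 0.0
    return 1.0 - (x - x_ideal) / (x_max - x_ideal)

def lipo_sum(distances, r_ideal=4.0, r_max=7.0):
    """Sum of f(r) over lipophilic ligand-receptor atom pairs.

    r_ideal and r_max are hypothetical values, in Å.
    """
    return sum(block(r, r_ideal, r_max) for r in distances)

# Three made-up lipophilic contacts: close, intermediate, and out of range.
contacts = [3.0, 5.5, 8.0]
total = lipo_sum(contacts)
```

Multiplying such a sum by its coefficient (C_lipo here) gives that term's contribution to the overall score.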
Enrichment Factors ~ Definitions
• Enrichment Factor (EF) usually defined as:
$$\mathrm{EF} = \frac{N_{\mathrm{total}}}{N_{\mathrm{sampled}}} \times \frac{\mathrm{Hits}_{\mathrm{sampled}}}{\mathrm{Hits}_{\mathrm{total}}}$$
• Better definition (EF’):
$$\mathrm{EF}' = \frac{50\%}{\mathrm{Avg\%rank}_{\mathrm{sampled}}} \times \frac{\mathrm{Hits}_{\mathrm{sampled}}}{\mathrm{Hits}_{\mathrm{total}}}$$
• EF’ counts all sampled hits equally, not just last hit found
• Report enrichment factors EF or EF’ based on finding 70% or 80% of known binders
• Might be better to report EF or EF’ based on sampling 2%, 1% or less of a larger ranked database
  - Relevant measure is how many compounds can be assayed
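The two definitions can be compared on a small example. The sketch below assumes that N_sampled is the rank of the last active needed to reach the requested recovery, and that Avg%rank is the mean rank of the sampled actives expressed as a percentage of the database size; the ranks themselves are invented.

```python
from math import ceil

def enrichment_factors(active_ranks, n_total, recovery=0.7):
    """Return (EF, EF') for recovering the given fraction of actives.

    active_ranks: 1-based ranks of the known actives in the ranked database.
    """
    ranks = sorted(active_ranks)
    hits_total = len(ranks)
    # Small tolerance so that, e.g., 0.7 * 10 is not rounded up to 8.
    hits_sampled = ceil(recovery * hits_total - 1e-9)
    sampled = ranks[:hits_sampled]
    n_sampled = sampled[-1]                      # rank of the last hit found
    hit_fraction = hits_sampled / hits_total
    ef = (n_total / n_sampled) * hit_fraction
    avg_pct_rank = 100.0 * sum(sampled) / (hits_sampled * n_total)
    ef_prime = (50.0 / avg_pct_rank) * hit_fraction
    return ef, ef_prime

# Ten actives in a 1000-compound database; six rank at the top, one at 50.
ef, ef_prime = enrichment_factors([1, 2, 3, 4, 5, 6, 50, 400, 700, 900], 1000)
```

Here EF is set entirely by the seventh active at rank 50, while EF’ also credits the six actives ranked at the very top, which is why it comes out higher.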
Database Screening ~ GlideComp 1.8 Scoring
• GlideComp 1.8 is a combination of GlideScore 1.8 and the Coulomb-van der Waals energy:
GlideComp = 0.6*GlideScore + 0.08*E_CvdW
• The CvdW energy is the OPLS-AA nonbonded interaction energy as computed on the grid
  - ionic charges are reduced by ~50% to place charge-charge, charge-dipole, and dipole-dipole interactions on a common energy scale
  - exception: anionic-ligand/metal-cation interactions
• Glide 1.8 (and 2.0) also allow the user to require that a ligand achieve a specified hydrogen-bonding (hb) or metal-ligation (ml) score to be reported and counted
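A short Python sketch of ranking with GlideComp and an optional hydrogen-bonding filter follows. The formula is the one given above; the ligand records, field names, and scores are made up, and the sketch assumes that more negative scores are better, so a filter of -1.8 keeps ligands whose hb score is -1.8 or lower.

```python
def glide_comp(glide_score, e_cvdw):
    """GlideComp = 0.6*GlideScore + 0.08*E_CvdW, as stated on the slide."""
    return 0.6 * glide_score + 0.08 * e_cvdw

def rank_ligands(ligands, hb_filter=None):
    """Sort ligands by GlideComp (best first), optionally applying an hb filter."""
    kept = [lig for lig in ligands
            if hb_filter is None or lig["hb"] <= hb_filter]
    return sorted(kept, key=lambda lig: glide_comp(lig["glide_score"], lig["e_cvdw"]))

# Hypothetical docking results (energies in kcal/mol).
ligands = [
    {"name": "lig_a", "glide_score": -8.0, "e_cvdw": -40.0, "hb": -2.5},
    {"name": "lig_b", "glide_score": -9.0, "e_cvdw": -20.0, "hb": -0.5},
    {"name": "lig_c", "glide_score": -6.0, "e_cvdw": -50.0, "hb": -2.0},
]
unfiltered = [lig["name"] for lig in rank_ligands(ligands)]
filtered = [lig["name"] for lig in rank_ligands(ligands, hb_filter=-1.8)]
```

In this toy set lig_b has the best GlideScore but ranks last by GlideComp, and the hb filter removes it altogether.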
Enrichment Factors EF’(70%) for Glide 1.8
Database Screen | GlideScore w/o hb/ml | GlideComp w/o hb/ml | GlideComp w hb/ml
Thymidine Kinase | 3 | 8 | 16
CDK-2 Kinase | 6 | 7 | 7
P38 MAP Kinase | 4 | 7 | 8
Estrogen Recep. | 98 | 91 | 94
Thrombin | 22 | 20 | 27
HIV Protease | 46 | 47 | 54
Sugar-Bind. Prot. | 82 | 105 | 105
Thermolysin | 4* | 6* | 6*
* Emodel gives EF’ = 22 (w/o hb/ml filters) and EF’ = 23 (w hb/ml)
GlideComp usually does better than GlideScore; hb or ml threshold filters sometimes help
Glide 1.8 Problems
Main problems for Glide 1.8 scoring:
• Enrichment factors for CDK-2 kinase and P38 MAP kinase are too low (~7)
  - Some methods don’t do this well!
• A different scoring function (Emodel) needs to be used for thermolysin to give decent enrichment
• Having to choose and apply specific h-bond (hb) and metal-ligation (ml) filters is awkward for the user
  - Without an h-bond filter, the enrichment factor is also too low for thymidine kinase
Improving Glide Scoring ~ Approach
• Try descriptors based on FlexX, PLP (ScreenScore; Roche Basel) and ChemScore (done)
• Add additional Schrödinger descriptors
• Add many more database screens: HIV-RT, cox-2, neuraminidase, gyrase B, gelatinase A, squalene synthase, aldose reductase, acetylcholinesterase, thymidylate synthase ... (in progress)
• Fit scoring models by optimizing parameters to maximize enrichment factors (done)
• Test robustness by excluding data and refitting (initial tests suggest overfitting is not a serious problem)
Database Enrichment for Glide 2.0 vs. Glide 1.8
Thymidine Kinase, w/o and w/ -1.8 kcal hb filter
Without hb filter:
EF | GS 1.8 | GC 1.8 | GS 2.0
10% | 3.0 | 6.0 | 7.0
5% | 2.0 | 8.0 | 12.0
2% | 5.0 | 15.0 | 20.0
With hb filter:
EF | GS 1.8 | GC 1.8 | GS 2.0
10% | 9.0 | 9.0 | 8.0
5% | 6.0 | 14.0 | 16.0
2% | 5.0 | 15.0 | 25.0
[Figure: two bar charts (0-100 scale) for GS 1.8, GC 1.8, and GS 2.0 at 10%, 5%, and 2% of the ranked database]
50% of the actives are found in the first 2% of the ranked database for GlideScore 2.0; 90% are found in the first 10%
GlideScore 2.0 is clearly better
Database Enrichment for Glide 2.0 vs. Glide 1.8
cdk-2 Kinase ~ 1dm2 and 1aq1 receptor sites
1dm2 site:
EF | GS 1.8 | GC 1.8 | GS 2.0
10% | 5.0 | 6.0 | 8.0
5% | 8.0 | 10.0 | 14.0
2% | 15.0 | 10.0 | 25.0
1aq1 site:
EF | GS 1.8 | GC 1.8 | GS 2.0
10% | 7.0 | 8.0 | 9.0
5% | 8.0 | 8.0 | 12.0
2% | 20.0 | 10.0 | 15.0
[Figure: two bar charts (0-100 scale) for GS 1.8, GC 1.8, and GS 2.0]
GlideScore 2.0 is clearly better for the 1dm2 site; is better at 5% and 10% of database for the 1aq1 site
Database Enrichment for Glide 2.0 vs. Glide 1.8
p38 MAP kinase (1a9u) and cox-2 (1cx2)
p38 MAP kinase (1a9u):
EF | GS 1.8 | GC 1.8 | GS 2.0
10% | 5.0 | 7.1 | 7.1
5% | 8.6 | 8.6 | 12.9
2% | 3.5 | 3.5 | 10.7
cox-2 (1cx2):
EF | GS 1.8 | GC 1.8 | GS 2.0
10% | 4.5 | 4.5 | 5.8
5% | 6.0 | 6.7 | 10.9
2% | 10.6 | 13.6 | 19.7
[Figure: two bar charts (0-100 scale) for GS 1.8, GC 1.8, and GS 2.0]
GlideScore 2.0 is better for both p38 and cox-2
Database Enrichment for Glide 2.0 vs. Glide 1.8
Estrogen receptor (3ert and 1err)
3ert site:
EF | GS 1.8 | GC 1.8 | GS 2.0
10% | 9.0 | 9.0 | 9.0
5% | 18.0 | 16.0 | 18.0
2% | 45.0 | 40.0 | 40.0
1err site:
EF | GS 1.8 | GC 1.8 | GS 2.0
10% | 9.0 | 9.0 | 10.0
5% | 18.0 | 18.0 | 20.0
2% | 45.0 | 40.0 | 50.0
[Figure: two bar charts (0-100 scale) for GS 1.8, GC 1.8, and GS 2.0]
All three scoring methods perform very well for both receptor sites
Database Enrichment for Glide 2.0 vs. Glide 1.8
Thrombin (1dwc) and HIV protease (1hpx)
Thrombin (1dwc):
EF | GS 1.8 | GC 1.8 | GS 2.0
10% | 10.0 | 10.0 | 10.0
5% | 16.2 | 20.0 | 13.8
2% | 28.1 | 28.1 | 18.8
HIV protease (1hpx):
EF | GS 1.8 | GC 1.8 | GS 2.0
10% | 10.0 | 10.0 | 10.0
5% | 16.0 | 16.0 | 17.3
2% | 30.0 | 33.3 | 36.6
[Figure: two bar charts (0-100 scale) for GS 1.8, GC 1.8, and GS 2.0]
GlideScore 2.0 is a bit worse for thrombin, but a bit better for HIV protease
Database Enrichment for Glide 2.0 vs. Glide 1.8
Sugar-Binding Protein (1abe)/Thermolysin (1tmn)
Sugar-Binding Protein (1abe):
EF | GS 1.8 | GC 1.8 | GS 2.0
10% | 10.0 | 10.0 | 10.0
5% | 20.0 | 20.0 | 20.0
2% | 50.0 | 33.3 | 50.0
Thermolysin (1tmn):
EF | GS 1.8 | GC 1.8 | Emodel | GS 2.0
10% | 4.0 | 6.0 | 10.0 | 10.0
5% | 4.0 | 8.0 | 16.0 | 18.0
2% | 10.0 | 10.0 | 25.0 | 35.0
[Figure: two bar charts (0-100 scale), one for GS 1.8, GC 1.8, and GS 2.0 and one for GS 1.8, GC 1.8, Emodel, and GS 2.0]
GlideScore 2.0 results are excellent, even for thermolysin, where GlideScore 1.8 does poorly
Database Enrichment for Glide 2.0 vs. Glide 1.8
HIV reverse transcriptase (1vrt and 1rt1 sites)
1vrt site:
EF | GS 1.8 | GC 1.8 | GS 2.0
10% | 2.7 | 3.9 | 5.5
5% | 3.6 | 6.7 | 3.6
2% | 3.0 | 6.0 | 3.0
1rt1 site:
EF | GS 1.8 | GC 1.8 | GS 2.0
10% | 2.4 | 4.2 | 4.5
5% | 3.0 | 5.4 | 2.4
2% | 0.0 | 9.1 | 6.1
[Figure: two bar charts (0-100 scale) for GS 1.8, GC 1.8, and GS 2.0]
HIV-RT is a tough case, but GlideScore 2.0 is slightly better than GlideScore 1.8
Glide 2.0 Enrichment Factors ~ EF’(70%)
[Figure: bar chart (0-100 scale) of EF’(70%) for Th Kinase (1kim), Est. Recep. (3ert), CDK-2 (1dm2), Sugar-Bind. Prot., HIV Protease, Thrombin, Thermolysin, p38 MAP Kinase, Cox-2, Cox-2 (site 1 ligs), HIV RT (1vrt), HIV RT (1rt1), CDK-2 (1aq1), Th Kinase* (1kim), Est. Recep. (1err)]
* -1.8 H-bond filter
Improving Glide Scoring ~ Current Results
Database Screen | GlideComp 1.8 | GlideScore 2.0
Thymidine Kinase | 8 | 12
Estrogen Recep. | 91 | 83
CDK2 Kinase | 7 | 23
P38 MAP Kinase | 7 | 12
Sugar-Bind. Prot. | 105 | 82
HIV Protease | 47 | 57
Thrombin | 20 | 17
Thermolysin | 6 | 46
Cox-2 | 4 | 6
Cox-2 (site 1 ligs) | 15 | 41
HIV-RT | 3 | 5
EF’ values shown, computed for 70% recovery of active binders
None of the enrichment factors shown here use either hydrogen-bonding or metal-ligation filters
Improving Glide Scoring ~ Cox-2 Case
Cox-2 screen had 33 active ligands from literature plus 1106 CMC or PDB database ligands
• Only 19 actives could dock with negative Coulomb-vdW energies anywhere in the primary binding pocket
• However, 13 of these “site 1” ligands show up in the first 20 positions in the ranked database!
• This gives an EF’(70) enrichment factor of 41 based on finding 13 of 19 site 1 ligands
• This is a better result than was obtained with Glide 1.8, and shows that Glide 2.0 is very effective at identifying known Cox-2 actives; it just can’t find all of them.
  - Probably shows limitations of docking to a rigid receptor
Improving Glide Scoring ~ Summary
• GlideScore 2.0:
  - improves EF’s for most less well-treated screens
  - preserves high EF’s for well-treated screens
  - has less need for system-specific hydrogen-bonding or metal-ligation filters
  - does not require different scoring function for metalloproteins
  - therefore is easier to use
• The new scoring function does not use any ScreenScore or PLP terms, but does use the C-vdW interaction energy
• Key new terms are lipo-nonlipo terms (as in Fresno) and terms that reflect hydrophobic and hydrophilic complementarity, evaluated using Merck-style “Active Site Mapping” grids
Comparison to Other Methods ~ EF(80%)
[Figure: bar chart (0-80 scale) of EF(80%) for Thymidine Kinase and Estrogen Receptor, comparing Glide-GlideScore, Dock-DockE, Dock-PMF, FlexX-FlexX, FlexX-PMF, FlexX-DockE, Gold-Gold, and Gold-DockE]
Bissantz et al., J. Med. Chem. 2000, 43, 4759
Only Glide/GlideScore 2.0 gives a good enrichment factor for both thymidine kinase and the estrogen receptor.
Glide 2.5 ~ Improving Database Enrichment via Extra-Precision Docking and Scoring
• Iteratively re-dock one or a set of top-ranked poses generated by Glide in its normal mode of operation
  - Generate set of perturbed core conformations for each pose
  - Constrain new docking to local region of receptor cleft used by each such pose
  - Can be ~ten times more expensive, but can apply to just top 10-15% of docked ligands to keep costs in bounds
• Extra precision scoring uses new technology that attempts to take better account of solvation/desolvation phenomena and to more accurately discriminate between good and bad interactions
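The cost argument can be made concrete with a little arithmetic. The Python sketch below assumes a simple model in which every ligand is docked once in normal mode and the selected top fraction is re-docked at ten times the normal cost. The function names and the score list are hypothetical.

```python
def select_for_xp(scored_ligands, top_fraction=0.10):
    """Names of the best-scoring ligands (lowest score first) to re-dock."""
    ranked = sorted(scored_ligands, key=lambda item: item[1])
    n_keep = max(1, round(top_fraction * len(ranked)))
    return [name for name, _ in ranked[:n_keep]]

def relative_cost(top_fraction, xp_cost_ratio=10.0):
    """Total cost relative to a normal-mode-only run (assumed additive model)."""
    return 1.0 + xp_cost_ratio * top_fraction

# Twenty made-up ligands with scores -1.0, -2.0, ..., -20.0.
scores = [(f"lig_{k:02d}", -float(k)) for k in range(1, 21)]
chosen = select_for_xp(scores, top_fraction=0.10)
cost_low = relative_cost(0.10)
cost_high = relative_cost(0.15)
```

Under this model, re-docking the top 10-15% raises the total cost to roughly 2 to 2.5 times a normal run, rather than the roughly 10 times that re-docking every ligand would cost.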
Further Improving Database Enrichment; Tentative Results for Glide 2.5 and 2.5XP
Database Screen | Glide 1.8 | Glide 2.0 | Glide 2.5* | Glide 2.5XP
Thymidine Kinase | 8 | 12 | 20 | 29
Estrogen Recep. | 91 | 83 | 74 | 96
CDK2 Kinase | 7 | 23 | 38 | 38
p38 MAP Kinase | 7 | 12 | 12 | 44
Cox-2 (site 1 ligs) | 15 | 41 | 39 | 39
Thrombin | 20 | 17 | 22 | 40
Thermolysin | 6 | 46 | 37 | 43
HIV Protease | 47 | 57 | 55 | 46
HIV-RT | 3 | 5 | 6 | 9
Enrichment factors EF’ for recovering 70% of active binders
* Not in beta release of Glide 2.5, but may be in final release
GlideScore 2.5XP (extra precision mode) and even normal GlideScore 2.5 improve several cases significantly
Glide 2.5XP Enrichment Factors ~ EF’(70%)
[Figure: bar chart (0-100 scale) of EF’(70%) for Est. Recep. (3ert), CDK-2 (1dm2), Sugar-Bind. Prot., HIV Protease, Thrombin, Thermolysin, p38 MAP Kinase, Cox-2 (site 1 ligs), HIV RT (1vrt), HIV RT (1rt1), CDK-2 (1aq1), Th Kinase (1kim), Est. Recep. (1err)]
Comparison to 2.0 EF’s shows marked improvement
Constraints - Under Development for Glide 2.5
• Initial implementation will allow user to require:
  - hydrogen bonds to designated receptor atoms
  - metal ligations to designated metal ions (e.g., Zn2+)
• Any ligand atom of proper type can satisfy constraint
• Types initially recognized will be:
  - Hbond donor
  - neutral Hbond acceptor
  - anionic Hbond acceptor
  (neutral and anionic acceptors are distinguished to allow discrimination for metalloproteins)
• A later implementation may:
  - allow hydrophobic constraints to be defined
  - allow specific functional groups to be specified
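The rule that any ligand atom of the proper type can satisfy a constraint might be checked along the following lines. This is only an illustrative Python sketch: the data layout, the type names, and the 3.5 Å distance cutoff are assumptions, not Glide's actual implementation.

```python
from math import dist

def satisfies(constraint, ligand_atoms, cutoff=3.5):
    """True if any ligand atom of an allowed type lies within cutoff (Å)."""
    return any(
        atom["type"] in constraint["allowed_types"]
        and dist(atom["xyz"], constraint["xyz"]) <= cutoff
        for atom in ligand_atoms
    )

def pose_passes(constraints, ligand_atoms):
    """A pose is reported only if every required constraint is satisfied."""
    return all(satisfies(c, ligand_atoms) for c in constraints)

# Hypothetical metal-ligation constraint on a zinc ion placed at the origin.
zinc = {"xyz": (0.0, 0.0, 0.0), "allowed_types": {"anionic_hbond_acceptor"}}
good_pose = [
    {"type": "hbond_donor", "xyz": (1.0, 1.0, 0.0)},
    {"type": "anionic_hbond_acceptor", "xyz": (2.0, 0.0, 0.0)},
]
wrong_type_pose = [{"type": "hbond_donor", "xyz": (1.0, 1.0, 0.0)}]
too_far_pose = [{"type": "anionic_hbond_acceptor", "xyz": (9.0, 0.0, 0.0)}]
results = [pose_passes([zinc], pose)
           for pose in (good_pose, wrong_type_pose, too_far_pose)]
```

Listing the allowed types per constraint is one way to let a metal site accept only anionic acceptors while an ordinary hydrogen-bond site accepts either kind.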
Current and Planned Improvements for Glide
• Improve the scoring function (Glide 2.0)
• Reduce memory requirements (Glide 2.0)
• Improve docking accuracy (Glide 2.5, Fall 2002)
• Implement constraints (Glide 2.5, Fall 2002)
• Further improve scoring accuracy (Glide 2.5, Fall 2002)
• Improve efficiency
• Allow receptor flexibility
  - Treat discrete sidechain positions/ionization states
  - Treat protein as an ensemble of configurations
Assessing Docking Hits ~ Active Site Mapping
• Maestro facility: provides visual feedback to user by displaying molecular surfaces and volumes
• Helps to qualitatively assess Glide docking hits:
  - Hydrophobic volumes: enclose regions in active-site space appropriate for hydrophobic portions of ligand
  - Hydrophilic volumes: enclose hydrophilic regions
  - Visual comparison quickly highlights:
    - mismatches in complementarity
    - “targets of opportunity”, e.g., hydrophobic regions with room for a larger hydrophobic group
• Molecular and extra-radius surfaces can also be visualized