Structure-Based Virtual Screening: New Methods, Old Problems and “Ancient” Solutions
Structure-Based Virtual Screening (SVS) is a proven technique for lead discovery
  Still significant room for improvement
  Efforts generally focused on the creation of novel scoring functions
In this presentation
  Present a novel technique for scoring function development
  Highlight problems encountered
  Illustrate the potential of pharmacophore constraints to mitigate some of these issues
  Analyze implications for current and future SVS technology
Scoring Functions: Development Within an SVS Algorithm Framework
We are interested in the top-ranking molecules from SVS
  Do not care about the nature of the score itself
Alternative strategy: design the function around optimization of active molecule rank
  Accomplished using docking data from selected SVS algorithms
No binding data required
  Complexed ligand should top the rank list
Allows metrics that describe reasons for lack of binding
  High-scoring docked inactives
Effect of docking algorithm limitations can be better understood
  Optimizing within the framework to be used for SVS calculations
Data Set Selection: Filtering
Initial set of ~300 complexes extracted from the work of Böhm, Keske and Dixon
  Gschwend et al. J. Mol. Recognit. 1996, 9, 175-186.
  Böhm. J. Comput.-Aided Mol. Design 1994, 8, 243-256.
To ensure diversity and quality, filters applied:
  Discard complexes with: more than 50 heavy atoms / resolution > 2.5 Å / covalently bound / incompletely modeled
Data weighted towards specific targets - many close analogues
  If a general scoring function is required, these need to be filtered
Initial efforts focused on removing all repeat targets
  Too drastic - multiple complexes of the same target kept as long as the ligand represented a unique chemotype (no analogues)
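
The filtering rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Complex` record, its field names and the chemotype labels are hypothetical, the heavy-atom limit is assumed to apply to the ligand, and the example records are made up rather than taken from the real data set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complex:
    pdb_id: str
    target: str
    chemotype: str        # hypothetical scaffold label for the ligand
    heavy_atoms: int      # assumed to count ligand heavy atoms
    resolution: float     # crystallographic resolution in Å
    covalent: bool        # ligand covalently bound to the protein
    complete: bool        # complex fully modeled


def passes_quality_filters(c, max_heavy_atoms=50, max_resolution=2.5):
    """Apply the four discard rules listed on the slide."""
    return (
        c.heavy_atoms <= max_heavy_atoms
        and c.resolution <= max_resolution
        and not c.covalent
        and c.complete
    )


def select_complexes(complexes):
    """Keep one complex per (target, chemotype) pair among those passing the filters."""
    seen = set()
    kept = []
    for c in complexes:
        key = (c.target, c.chemotype)
        if passes_quality_filters(c) and key not in seen:
            seen.add(key)
            kept.append(c)
    return kept


# Made-up records for illustration only; not taken from the real data set.
candidates = [
    Complex("aaa1", "protease", "amidine", 30, 1.9, False, True),
    Complex("aaa2", "protease", "amidine", 32, 2.0, False, True),    # close analogue
    Complex("aaa3", "protease", "sulfonamide", 28, 2.1, False, True),
    Complex("bbb1", "kinase", "purine", 64, 1.8, False, True),       # too large
    Complex("ccc1", "esterase", "ketone", 25, 2.8, False, True),     # low resolution
    Complex("ddd1", "lyase", "epoxide", 20, 2.0, True, True),        # covalent
]
selected = [c.pdb_id for c in select_complexes(candidates)]
```

Keying on the (target, chemotype) pair rather than on the target alone mirrors the relaxed rule on the slide: repeat targets survive as long as the ligand is not an analogue of one already kept.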
Data Set Selection: Unexpected Oddities
Odd interactions brought about by extreme crystallization conditions
  1rnt - acidic crystallization conditions (pH 5.0) produce unusual protonation state
Multiple points of crystal contact with symmetry-related molecules
  4gr1 has more interactions with symmetry-related protein than with the deposited structure
  An extreme case, but problem significant enough for inclusion in Relibase:
  http://www.ccdc.cam.ac.uk/news/14_12_01.html
Scoring Function Development: Data Set Selection - the Final Tally
Once all filters applied, > 75% of complexes removed
  Highlights significant problems in generating a clean data set
An under-appreciated problem in scoring function development
  The need to analyze and exploit all available PDB data, including the most recently deposited structures
  Requires much manual intervention
  Poster 110 by Sadowski et al.
  Poster 251 by Fenu et al.
Final selections - 20 training set and 10 test set complexes
Scoring Function Development: Basic Strategy
DOCK (4.0) ligand data set into each active site (“active” ligand + molecular noise)
Feed all docked orientations into metric generator
Use a GA and stored metric data to simultaneously optimize rank of “active” orientations in each target site
Scoring Function Development: GA Implementation
GA optimizes average rank of the “active” orientations within a data set of docked molecules and targets.
Score = a*metric1 + b*metric2 + ...
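
A minimal sketch of this idea, assuming a linear score over precomputed metrics and a very simple GA. The slides do not specify the GA operators, so truncation selection, uniform crossover and Gaussian mutation are assumptions here, as are the zero-based ranks and the convention that higher scores rank first. All function names are hypothetical.

```python
import random


def active_rank(weights, orientations, active_index):
    """Zero-based rank of the active orientation (higher score ranks first)."""
    scores = [sum(w * m for w, m in zip(weights, metrics)) for metrics in orientations]
    active = scores[active_index]
    return sum(1 for s in scores if s > active)


def average_rank(weights, targets):
    """Mean active rank over all targets; each target is (orientations, active_index)."""
    return sum(active_rank(weights, o, a) for o, a in targets) / len(targets)


def optimize_weights(targets, n_metrics, pop_size=30, generations=50, seed=0):
    """Evolve coefficient vectors that minimize the average active rank."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_metrics)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=lambda w: average_rank(w, targets))
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(mum, dad)]  # uniform crossover
            i = rng.randrange(n_metrics)
            child[i] += rng.gauss(0, 0.1)                          # point mutation
            children.append(child)
        population = parents + children
    return min(population, key=lambda w: average_rank(w, targets))
```

Because the fitness is a rank rather than a binding energy, no affinity data enter the optimization; only the stored metric values for each docked orientation are needed.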
Scoring Function Development: Tests Run
Three primary experiments undertaken:
  Optimize rank using crystallographic ligand orientation (CLO) study
  Replace CLO with orientation produced on reDOCKing ligand binding conformer into target site - closest docked orientation (CDO) study
  Compare results with standard DOCK scoring functions (contact / force field)
“Typical” CDO orientation compared to CLO binding mode for 7est. Heavy atom RMS = 1.56 Å
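
Selecting the CDO for a complex amounts to a heavy-atom RMS comparison against the CLO. The sketch below assumes that docked poses share the crystal structure's coordinate frame (so no superposition is done), that atoms are listed in the same order in every pose, and that ligand symmetry is ignored. The function names and the 2.0 Å default are illustrative.

```python
import math


def heavy_atom_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def closest_docked_orientation(crystal, docked_poses, max_rmsd=2.0):
    """Return (pose, rmsd) for the docked pose nearest the crystal pose.

    Returns None when no pose lies within max_rmsd, in which case the
    complex would be dropped from the analysis.
    """
    best = min(docked_poses, key=lambda pose: heavy_atom_rmsd(crystal, pose))
    rmsd = heavy_atom_rmsd(crystal, best)
    return (best, rmsd) if rmsd <= max_rmsd else None
```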
Scoring Function Development: CLO Test
High ranking in both training and test sets (4/24000 - 37/22000)
Clash descriptor scores highly
  CLASH weights against ligand-protein bumps
  Rare for CLO
  More common in DOCK orientations
  Effectively acts as an indicator variable
Training set
Complex       Rank
1phd          33
3cpa          5
9aat          4
1ak3          3
2pk4          3
2tmn          3
1rnt          2
7est          2
1abe          0
1apv          0
1pph          0
1rbp          0
1snc          0
2tsc          0
3gap          0
4dfr          0
4phv          0
4sga          0
6tim          0
7cat          0
Average rank  4
Test set
Complex       Rank
1dr1          286
1xli          39
1phg          30
2ifb          1
3fx2          1
4mdh          0
5p21          0
5tmn          0
8cpa          0
Average rank  37
Optimized coefficients (normalized % contribution to CLO scores)
Clash           Hyd Surf        Hbond           Electro
-0.816 (-3.7)   0.006 (44.0)    0.574 (42.6)    0.070 (17.1)
Scoring Function Development: CDO Test
4 training set and 1 test set compounds unable to dock within 2.0 Å RMS of CLO
  Removed from analysis
Test results look less impressive
  Due to docking inaccuracies
  H bond network breakdown
Clash term importance drops significantly now, as CDO, unlike CLO, often contains bumps
Training set
Complex       Rank
1phd          260
7est          114
1rnt          111
2tmn          83
3cpa          66
9aat          37
2tsc          27
1abe          24
6tim          22
2pk4          16
1ak3          13
4dfr          13
3gap          9
1pph          2
1rbp          1
1snc          1
Average rank  51
Test set
Complex       Rank
1xli          11785
1dr1          9503
2ifb          848
1phg          99
4mdh          29
5tmn          4
3fx2          2
3tpi          2
8cpa          1
Average rank  2476
Optimized coefficients (normalized % contribution to CDO scores)
Clash           Hyd Surf        Hbond           Electro
-0.20 (-8.9)    0.01 (51.7)     0.97 (37.6)     -0.10 (19.6)
Scoring Function Development: Average Test Set Rank Comparisons
CDO orientations in CLO scoring function: average rank = 19337
CLO orientations in CDO scoring function: average rank = 75
  CDO performance more robust
  Due to a reduction in sensitivity to steric clashes
CDO orientations and DOCK contact score: average rank = 2690
CDO orientations and DOCK force field score: average rank = 16518
  All-atom model and R12 repulsion oversensitive to clashes
  Contact score's user-controlled steric clash penalty permits sensitivity control
Comparison of CDO and contact score shows a slight improvement: average ranks = 2476 / 2690
  H bond/electrostatics adding some additional resolution
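
The difference in clash sensitivity can be illustrated with two per-atom-pair terms. This is a sketch only: the 12-6 Lennard-Jones form stands in for a generic force field repulsion, the contact term is a simplified contact-counting scheme with a fixed clash penalty, and every parameter value (well depth, distances, penalty) is an arbitrary assumption rather than a DOCK default.

```python
def lennard_jones(r, well_depth=0.1, r_min=3.8):
    """12-6 Lennard-Jones energy for one atom pair at distance r (Å).

    The r**-12 repulsive part grows very steeply once r drops below r_min.
    """
    ratio = r_min / r
    return well_depth * (ratio ** 12 - 2 * ratio ** 6)


def contact_term(r, contact_cutoff=4.5, clash_distance=3.0, clash_penalty=10.0):
    """Contact-style term for one heavy-atom pair (lower is better).

    A favorable unit is awarded for a contact, and a fixed, user-chosen
    penalty is charged for a clash regardless of how deep the overlap is.
    """
    if r < clash_distance:
        return clash_penalty
    if r <= contact_cutoff:
        return -1.0
    return 0.0
```

With these illustrative values, a pair at 2.0 Å costs a capped 10 units in the contact term but more than 200 in the Lennard-Jones term, which is the kind of over-sensitivity that a docked pose of 1-2 Å accuracy tends to trigger. Lowering `clash_penalty` is the sensitivity control mentioned above.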
Scoring Function Development: Conclusions
Results highlight potential pitfalls in scoring function design
  More robust data sets required (c**p in - c**p out)
  Xtal data performance not necessarily representative of real-world SVS
CLO scoring function
  High-resolution descriptors are not always compatible with binding modes of 1.0-2.0 Å accuracy often seen at current sampling levels
  H bond network breakdown even with near-hit binding modes
Need to consider alternative scoring metrics
  Lower-resolution descriptors / non-binding event measures
Scoring and sampling are not separable problems
  To take scoring functions to the next level, need to focus on SVS technology with more exhaustive sampling paradigms
  Additional CPU essential: distributed (grid-based) computing
Exploiting an Old Trick: SVS and Pharmacophore Constraints
Another major scoring function failing
  Inability to differentiate H bond / salt bridge strengths
  H bonds often measured by presence or absence
  Salt bridges, despite their importance, are often ignored
SVS searches are generally undertaken with a binding hypothesis in mind
  Exploitation of known target structural biology
  Scoring functions often struggle to incorporate such information
Pharmacophore constraints provide a sampling-based alternative paradigm to mitigate these issues
Pharmacophoric Constraints: DOCK Chemical Matching and Critical Regions
http://www.cmpharm.ucsf.edu/kuntz/dock4/html/Manual.47.html#pgfId=20180
DOCK permits creation of user-defined pharmacophore elements
When combined with critical regions, DOCK can simultaneously undertake thousands of binding-site-constrained pharmacophore searches
Sample acid site point definitions:
  # acyl sulphonamide
  definition O.2 ( C.2 ( N.am ( H ) ( S ( 2 O.2 ) ) ) )
  # deprotonated carboxyl
  definition O.co2 ( C )
Sample kinase site definition:
  Region 1 + 2: acceptor / donor
  Region 3: hydrophobic
In-house DOCK pharmacophore types:
  heavy atom
  donor
  acceptor
  hydrophobe
  aromatic
  aromatic_hydrophobic
  acid
  base
  donor_and_acceptor
  special (e.g. metal chelator)
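
The effect of critical regions can be sketched as a check on a ligand pose: every region must be occupied by at least one ligand atom of a compatible pharmacophore type. This is not DOCK's matching algorithm, which applies the constraints while orientations are being generated; it is only a post hoc illustration. `SitePoint`, `LigandAtom`, the 1.0 Å tolerance and the function name are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SitePoint:
    xyz: tuple          # position in the binding site (Å)
    region: int         # critical region this point belongs to
    types: frozenset    # pharmacophore types accepted at this point


@dataclass(frozen=True)
class LigandAtom:
    xyz: tuple          # atom position in the docked pose (Å)
    types: frozenset    # pharmacophore types assigned to the atom


def satisfies_critical_regions(atoms, site_points, tolerance=1.0):
    """True if every critical region has a type-compatible atom within tolerance."""
    required = {p.region for p in site_points}
    matched = set()
    for p in site_points:
        for a in atoms:
            # An atom matches when it shares a type with the site point
            # and sits close enough to it.
            if (a.types & p.types) and math.dist(a.xyz, p.xyz) <= tolerance:
                matched.add(p.region)
                break
    return matched == required
```

For a kinase-like site, regions 1 and 2 would carry donor and acceptor points and region 3 a hydrophobe point; a pose that misses any one region is rejected before scoring, which is how the constraint shifts work from the scoring function to the sampling stage.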
Pharmacophoric Constraints: Comparison Test Sets
5 targets analyzed
  ~10000 noise molecules plus active compound data set docked into each active site
Enrichment analysis based on chemotype rather than headline hit rate to prevent active analogue bias
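Target: Serine protease 1
  Active chemotype definitions: P1 substituent / P1-P4 linker substituent
  Defined critical regions (associated pharmacophore type(s)): S1 sub site (base or hydrophobe); S4 sub site (hydrophobe)
Target: Serine protease 2
  Active chemotype definitions: P1 substituent / P1-P4 linker substituent
  Defined critical regions (associated pharmacophore type(s)): S1 sub site (base); S4 sub site (hydrophobe)
Target: Fatty acid binding protein 1
  Active chemotype definitions: Core linking acid moiety to remaining substituents
  Defined critical regions (associated pharmacophore type(s)): Acid binding sub site (acid); rear hydrophobic pocket (hydrophobe)
Target: Fatty acid binding protein 2
  Active chemotype definitions: Core linking acid moiety to remaining substituents
  Defined critical regions (associated pharmacophore type(s)): Acid binding sub site (acid)
Target: Kinase
  Active chemotype definitions: Moiety mimicking adenine / main core of molecules
  Defined critical regions (associated pharmacophore type(s)): Adenine hydrogen bonding regions (donor/acceptor); rear hydrophobic pocket (hydrophobe)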
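
Chemotype-based enrichment counts distinct active chemotypes in the top of the ranked list, so ten analogues of one scaffold count once. A minimal sketch, assuming a ranked list of compound ids and a hypothetical mapping from active ids to chemotype labels (noise molecules are absent from the mapping):

```python
def chemotypes_found(ranked_ids, chemotype_of, cutoffs=(100, 200, 300, 400, 500)):
    """Count distinct active chemotypes among the top-N ranked compounds.

    ranked_ids   -- compound ids ordered best score first
    chemotype_of -- maps each active compound id to its chemotype label
    cutoffs      -- rank cutoffs at which to report the count
    """
    counts = {}
    for cutoff in cutoffs:
        top = ranked_ids[:cutoff]
        counts[cutoff] = len({chemotype_of[c] for c in top if c in chemotype_of})
    return counts
```

A headline hit rate computed on the same list would reward a search for retrieving many close analogues; the set of labels used here does not.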
Averaged Chemotype Enrichments
Constrained contact search enrichment stands out
Force field performance limited by aforementioned over-sensitivity to steric clashes
[Figure: chemotypes found (0-7) against compound rank (100-500) for generic force field, constrained force field, generic contact and constrained contact searches]
Searches Across Different SVS Paradigms: Kinase Pocket
Performance improves as scoring function simplified
  Prometheus in particular led astray by spurious H bonds
Flexible site / inactivated form
  Challenging target
Constrained contact score performs best
  Unable to implement constraints in Prometheus and GOLD
[Figure: Kinase (14 chemotypes total); chemotypes hit (0-8) against compound rank (100-500)]
Pharmacophoric Constraints: Conclusions
Pharmacophores offer numerous attractive features in SVS
  Improved hit rates
  Binding orientations constrained by user hypotheses to biologically relevant regions of space
    Known structural biology
  For algorithms such as DOCK, large increases in search speed (typically 1-2 orders of magnitude)
Simple scoring functions still have a role to play in SVS
  More tolerance to errors in binding mode and limitations in active site resolution
Acknowledgements
Thank you to
GA scoring function design
  Ryan Smith
  Dan Gschwend
  Andrew Leach
  Rod Hubbard
Pharmacophore searching
  Tim Perkins
  Dan Cheney
  Doree Sitkoff
  John Tokarski
  Yi Li
Jonathan Mason and all my other BMS colleagues past and present
TEC / GA source available to all interested parties: [email protected] (bms.com address)
Searches Across Different SVS Paradigms: Generic vs Constrained(*) Searches
FAB protein 2: well-defined rigid pocket
  Good SVS target
All methods perform well
  High percentage of chemotypes found
In all cases constrained search outperforms its generic equivalent