Surflex: Fully Automatic Flexible Molecular Docking Using a Molecular Similarity-Based Search Engine

Ajay N. Jain

UCSF Cancer Research Institute and Comprehensive Cancer Center, University of California

Presentation by Susan Tang

CS 379a

January 23, 2006

Protein-Ligand Docking Overview

Goal
- To predict how well a given set of ligands will bind to a protein structure
- To predict the structure of bound protein-ligand complexes

Components
- Search method: explores the different ways that a ligand can interact/fit with the protein
- Scoring function: assigns a quantitative value to each ligand/protein fit

Protein-Ligand Docking Overview

Criteria

1) Docking accuracy
Ability to find a conformation + alignment (pose) of a ligand in the protein that is close to the experimentally observed one

2) Scoring accuracy
Ability to rank a correct pose of a molecule higher than an incorrect one

3) Screening utility
Ability to pick out true ligands from a set that also contains non-binding molecules (potential false positives)

4) Speed
How fast the algorithm can screen a library of ligands

Surflex: A new docking methodology

• Combines Hammerhead’s empirical scoring function with a molecular similarity method to generate putative poses of ligand fragments

• Like Hammerhead, Surflex has one mode that uses an incremental construction search approach. Surflex also has a second mode, a whole molecule approach, that is faster and more accurate

• Surflex is designed primarily as a screening tool for small molecule libraries

Surflex: Computational Design

• Protomol Generation
First create an idealized active-site ligand from the protein structure of interest

Input: (a) protein structure, (b) list of residues that identify the protein active site

Output: A protomol, the target to which potential ligands or ligand fragments are aligned based on molecular similarity

Procedure: Molecular fragments are placed into the protein binding site in multiple positions → each placement is optimized for interaction with the protein → high-scoring, nonredundant fragments are selected → the selected fragments form the protomol
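The "select high-scoring nonredundant fragments" step can be pictured with a small sketch. This is only an illustration under stated assumptions, not Surflex's actual procedure: the fragment labels, coordinates, scores, the `min_dist` redundancy cutoff, and the function name `select_nonredundant` are all hypothetical.

```python
import math

def select_nonredundant(placements, top_n=3, min_dist=1.5):
    """Greedily keep the best-scoring placements that are not too close
    (closer than min_dist) to a placement that was already kept.
    Hypothetical helper; the distance-based redundancy test is an assumption."""
    kept = []
    for p in sorted(placements, key=lambda p: p["score"], reverse=True):
        if all(math.dist(p["xyz"], k["xyz"]) >= min_dist for k in kept):
            kept.append(p)
        if len(kept) == top_n:
            break
    return kept

# Made-up optimized fragment placements in a binding site.
placements = [
    {"frag": "frag_A", "xyz": (0.0, 0.0, 0.0), "score": 2.1},
    {"frag": "frag_A", "xyz": (0.5, 0.0, 0.0), "score": 1.9},  # near-duplicate of the first
    {"frag": "frag_B", "xyz": (3.0, 0.0, 0.0), "score": 1.5},
    {"frag": "frag_C", "xyz": (0.0, 4.0, 0.0), "score": 0.4},
]

protomol = select_nonredundant(placements, top_n=3)
print([p["score"] for p in protomol])
```

The near-duplicate placement is skipped because a better-scoring fragment already occupies almost the same position.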


• Protomol for streptavidin compared with the native pose of biotin (green)

• The bond being pointed to is broken by Surflex to make fragments of biotin for docking.

Surflex: Computational Design

• Docking
Ligands are docked into the protein to optimize the scoring function

Input: (a) protein structure, (b) protomol, (c) ligand(s)

Output: The optimized poses of docked ligands along with corresponding scores

Procedure:
1) Divide the input ligand into 1-10 molecular fragments
2) Search the conformations of each fragment
3) Align each conformation of each fragment to the protomol to get poses with maximum molecular similarity to the protomol
4) Score the aligned fragments and keep those with the highest score and minimal protein interpenetration
5) Construct the full ligand molecule from the aligned fragments using either an incremental construction approach or a whole molecule approach
6) Refine the conformation and alignment of the highest scoring poses

Incremental Construction vs. Whole Molecule Algorithm

Incremental Construction
- Makes the strong assumption that maximizing the similarity of tiny fragments to the protomol will generate good poses

Whole Molecule Algorithm
- Bypasses the strong independence assumption made in incremental construction
- “Dead” pieces are carried along with the “live” piece during the conformation search
- When creating putative poses by alignment to the protomol, the “dead” pieces, in their arbitrary initial conformations, are carried into the molecular similarity computation
- Poses with the worst protein interpenetration are eliminated
- The remaining poses are scored on the basis of individual fragments
- A recursive search yields whole molecules that consist of fragments selected from different docked poses
- These whole molecules score well in total, over all fragments



• Illustrates the process of docking biotin to streptavidin (blue)

• Gray indicates the “live” fragment

• Magenta indicates the “dead” fragment

• Green lines show the result of merging the two well-docked fragments at the atoms indicated by yellow circles

• The merged pose closely follows the parent fragments’ original configurations

Surflex: Evaluation

1) Evaluation of reliability and accuracy of dockings
- Comparison with experimental results on 81 protein/ligand pairs
- The pairs were selected to represent structural diversity

2) Evaluation of Surflex’s utility as a screening tool
- Performed on 2 protein targets (thymidine kinase and estrogen receptor)
- Competing docking methods (GOLD, DOCK, FlexX) were tested side by side using the same data set for comparison purposes

3) Evaluation of Surflex’s docking speed
- Investigates the relationship between docking time and number of rotatable bonds

Surflex: Evaluation Data Set Construction

Filtering Criteria:

(1) 15 or fewer rotatable bonds
- Most small molecules have <= 15 rotatable bonds

(2) No covalent attachments between ligand and protein
- Surflex’s scoring function was developed strictly on noncovalent complexes

(3) Ligands with no obvious errors in structure
- It is undesirable to modify an existing protein-ligand complex prior to testing

134 protein-ligand complexes* → filter → 81 protein-ligand complexes

* Data set used for the GOLD docking program
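The three filtering criteria amount to a simple predicate over the candidate complexes. The sketch below is only illustrative: the `Complex` record, its field names, and the example entries are hypothetical and are not taken from the GOLD data set.

```python
from dataclasses import dataclass

@dataclass
class Complex:
    """Hypothetical record describing one protein-ligand complex."""
    name: str
    rotatable_bonds: int
    covalent: bool          # ligand covalently attached to the protein?
    structure_errors: bool  # obvious errors in the ligand structure?

def passes_filters(c, max_rotatable_bonds=15):
    """Apply the three criteria: flexibility limit, noncovalent only, clean structure."""
    return (
        c.rotatable_bonds <= max_rotatable_bonds
        and not c.covalent
        and not c.structure_errors
    )

# Made-up examples, one failing each criterion.
candidates = [
    Complex("cplx_1", rotatable_bonds=5, covalent=False, structure_errors=False),
    Complex("cplx_2", rotatable_bonds=22, covalent=False, structure_errors=False),
    Complex("cplx_3", rotatable_bonds=8, covalent=True, structure_errors=False),
    Complex("cplx_4", rotatable_bonds=3, covalent=False, structure_errors=True),
]

kept = [c.name for c in candidates if passes_filters(c)]
print(kept)
```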

Surflex: Evaluation Results

1) Evaluation of reliability and accuracy of dockings
Describes how thorough the search procedure is and to what extent the scoring function can recognize good dockings

• Surflex returned a pose within 2.5 angstroms RMSD (94% of cases)

• Surflex returned a BEST scoring pose that was within 2.5 angstroms RMSD (86% of cases)

• With a single docking from a random initial pose, the chance of finding a correct or nearly correct pose averages ~70%
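As a reminder of what the 2.5 angstrom criterion measures, here is a minimal sketch of a heavy-atom RMSD and the resulting success rate. It assumes both poses list the same atoms in the same order and applies no symmetry correction; the coordinates and RMSD values are made up, and the function names are hypothetical.

```python
import math

def rmsd(pose, reference):
    """Root-mean-square deviation between two poses with matching atom order."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose, reference))
    return math.sqrt(squared / len(pose))

def success_rate(rmsd_values, threshold=2.5):
    """Fraction of dockings whose RMSD is within the threshold (in angstroms)."""
    return sum(r <= threshold for r in rmsd_values) / len(rmsd_values)

# Two-atom toy example: one atom matches, the other is displaced by 2 angstroms.
docked = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(rmsd(docked, crystal), 3))

# Made-up RMSD values for four complexes.
print(success_rate([0.5, 1.4, 2.5, 3.0]))
```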



2) Evaluation of Surflex’s utility as a screening tool
Tests the ability of the program to detect true positives against a background of random molecules (sensitivity vs. specificity)

• Surflex had a True Positive rate of > 80% at a False Positive rate of < 1%

• Surflex had the best performance (lowest FP rate for a given TP rate) out of the different individual and combined methods assayed
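The TP-rate-at-a-given-FP-rate figure can be computed from docking scores as sketched below. This is a generic illustration, not the evaluation code used in the paper: it assumes higher scores mean better predicted binding, and the scores and the function name `tp_rate_at_fp_rate` are made up.

```python
def tp_rate_at_fp_rate(active_scores, decoy_scores, max_fp_rate):
    """True positive rate at the strictest score cutoff that lets through
    at most max_fp_rate of the decoys (assumes higher score = better)."""
    allowed_fp = int(max_fp_rate * len(decoy_scores))
    ranked_decoys = sorted(decoy_scores, reverse=True)
    if allowed_fp >= len(ranked_decoys):
        return 1.0  # every decoy is allowed through, so every active passes too
    # Molecules must score strictly above this decoy to be called positive.
    cutoff = ranked_decoys[allowed_fp]
    hits = sum(score > cutoff for score in active_scores)
    return hits / len(active_scores)

# Made-up scores: 4 known ligands against 10 random background molecules.
actives = [9.0, 8.5, 7.0, 3.0]
decoys = [8.0, 6.0, 5.0, 4.5, 4.0, 3.5, 2.5, 2.0, 1.5, 1.0]

print(tp_rate_at_fp_rate(actives, decoys, max_fp_rate=0.1))
print(tp_rate_at_fp_rate(actives, decoys, max_fp_rate=0.0))
```

Sweeping `max_fp_rate` from 0 to 1 traces out the sensitivity vs. specificity curve used to compare methods.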


3) Evaluation of Surflex’s docking speed
Docking speed becomes very important in screening large compound libraries.

• Surflex demonstrated a docking time that was approximately linear in the number of rotatable bonds

• Rigid molecules took a few seconds and each additional rotatable bond took an additional ~10 seconds

• Surflex yielded a mean running time of 44 seconds for the 81 protein-ligand complexes in the test set used earlier

• Docking speed ranges from 50-100 seconds per molecule for FlexX, DOCK, and GOLD (Surflex speed is comparable to these times)

• Quantitative comparison across methods is difficult due to differences in hardware and methodology
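The roughly linear timing behaviour can be turned into a back-of-the-envelope estimate for a screening run. The slope of ~10 seconds per rotatable bond comes from the slide; the 3-second base time is an assumed stand-in for "a few seconds", and the function names and example library are hypothetical.

```python
def estimated_docking_seconds(rotatable_bonds, base_seconds=3.0, seconds_per_bond=10.0):
    """Linear timing model: base time for a rigid molecule plus a fixed cost
    per rotatable bond. base_seconds=3.0 is an assumption, not a reported value."""
    return base_seconds + seconds_per_bond * rotatable_bonds

def estimated_library_hours(bond_counts):
    """Total estimated docking time, in hours, for a list of rotatable-bond counts."""
    return sum(estimated_docking_seconds(n) for n in bond_counts) / 3600.0

print(estimated_docking_seconds(0))  # rigid molecule
print(estimated_docking_seconds(4))  # four rotatable bonds

# Made-up library: 1000 molecules with 4 rotatable bonds each.
print(round(estimated_library_hours([4] * 1000), 2))
```

Actual times depend on hardware and settings, so such an estimate is only useful for rough planning.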


Conclusions

• Surflex marks a step forward in flexible molecular docking programs

• Compared to the best docking methods available, Surflex is:
- as fast
- as accurate in terms of docked ligand RMSD
- much more accurate in terms of scoring

• Assaying the top scoring 1% of compounds in the screening library should yield a large proportion of true positives

• Potential areas of improvement
- Scoring and penetration terms should be combined into a single score
- Scoring function should include training on non-binding ligands (negative examples)
- Effect of nonbonded self-interactions within ligands should be accounted for explicitly
- Allow a degree of protein flexibility (side chain movement)
