8
DRUG DEVELOPMENT RESEARCH 72 : 130–137 (2011) Research Overview Fragment and Protein Simulation Methods in Fragment Based Drug Design Anthony E. Klon, Zenon Konteatis, Siavash N. Meshkat, Jinming Zou, and Charles H. Reynolds Drug Development Research, Discovery Technologies, Ansaris, Blue Bell, Pennsylvania Strategy, Management and Health Policy Enabling Technology, Genomics, Proteomics Preclinical Research Preclinical Development Toxicology, Formulation Drug Delivery, Pharmacokinetics Clinical Development Phases I-III Regulatory, Quality, Manufacturing Postmarketing Phase IV ABSTRACT Fragment-based drug design (FBDD) has become an important and successful approach to drug discovery. In this review, we discuss two classes of simulation technologies that we routinely employ as part our of computational FBDD efforts. The first class centers on simulation methods in torsion space to develop high-quality protein models suitable for FBDD. These algorithms allow for fast molecular dynamics and modal Monte Carlo simulations. The torsion space dynamics techniques have been applied to develop models for the bound conformations of a variety of proteins including the HIV-1 protease, p38 MAP kinase, and the 5 0 -AMP-activated protein kinase. The second class of simulations is comprised of the Grand Canonical Monte Carlo and systematic sampling methods, which are used to explore the interactions of individual fragments with the protein target. Previously published validation studies for the binding of molecules to T4 lysozyme and the p38 MAP kinase are discussed. We review previous work to computationally assemble whole molecules from fragment binding data, a potential bottleneck in the FBDD approach. One effect of the fragment simulations is that an approximate value for the free energy of binding of a given molecule with the protein may be computed from the fragment simulations, with an estimated standard error approaching 1 kcal/mol, which is comparable to the performance of a variety of other methods reported in the literature. Drug Dev Res 72:130–137, 2011. r 2010 Wiley-Liss, Inc. Key words: fragment-based drug design; molecular dynamics; protein modeling; drug discovery INTRODUCTION The cost of bringing new small molecule phar- maceuticals to market has frequently been cited at approximately $800 million U.S. dollars and takes up to 15 years from the start of a drug discovery program from the data available as of 2000 [DiMasi et al., 2003]. In 2006, the United States Government Accountability Office (GAO) released a report examining the costs of new drug development from 1993 through 2006 that showed that R&D investment has increased from 2000 to 2004, while the number of new drug applications (NDAs) and NDAs for new molecular entities (NMEs) has declined [Aronovitz et al., 2006]. In inflation- adjusted dollars, there has been a 147% increase in R&D costs between 1993 and 2004 with no commen- surate increase in the number of NDAs and NMEs. Consistent with the timescales reported by DiMasi et al. [2003], the GAO reported that the time to successfully bring a new drug to market in 2004, on average, was over 13 years. This was due to an average of 6.5 years spent in drug discovery and pre-clinical development and 7 years spent in clinical trials. DDR Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/ddr.20409 Correspondence to: Anthony E. Klon, Ansaris, Four Valley Square, 512 East Township Line Road, Blue Bell, PA 19422. E-mail: [email protected] c 2010 Wiley-Liss, Inc.

Fragment and protein simulation methods in fragment based drug design

Embed Size (px)

Citation preview

DRUG DEVELOPMENT RESEARCH 72 : 130–137 (2011)

Research Overview

Fragment and Protein Simulation Methodsin Fragment Based Drug Design

Anthony E. Klon,� Zenon Konteatis, Siavash N. Meshkat, Jinming Zou,and Charles H. Reynolds

Drug Development Research, Discovery Technologies, Ansaris, Blue Bell, Pennsylvania

Strategy, Management and Health Policy

Enabling

Technology,

Genomics,

Proteomics

Preclinical

Research

Preclinical Development

Toxicology, Formulation

Drug Delivery,

Pharmacokinetics

Clinical Development

Phases I-III

Regulatory, Quality,

Manufacturing

Postmarketing

Phase IV

ABSTRACT Fragment-based drug design (FBDD) has become an important and successful approach todrug discovery. In this review, we discuss two classes of simulation technologies that we routinely employas part our of computational FBDD efforts. The first class centers on simulation methods in torsion space todevelop high-quality protein models suitable for FBDD. These algorithms allow for fast moleculardynamics and modal Monte Carlo simulations. The torsion space dynamics techniques have been appliedto develop models for the bound conformations of a variety of proteins including the HIV-1 protease, p38MAP kinase, and the 50-AMP-activated protein kinase. The second class of simulations is comprised ofthe Grand Canonical Monte Carlo and systematic sampling methods, which are used to explore theinteractions of individual fragments with the protein target. Previously published validation studies for thebinding of molecules to T4 lysozyme and the p38 MAP kinase are discussed. We review previous work tocomputationally assemble whole molecules from fragment binding data, a potential bottleneck in theFBDD approach. One effect of the fragment simulations is that an approximate value for the free energy ofbinding of a given molecule with the protein may be computed from the fragment simulations, with anestimated standard error approaching 1 kcal/mol, which is comparable to the performance of a variety ofother methods reported in the literature. Drug Dev Res 72:130–137, 2011. r 2010 Wiley-Liss, Inc.

Key words: fragment-based drug design; molecular dynamics; protein modeling; drug discovery

INTRODUCTION

The cost of bringing new small molecule phar-maceuticals to market has frequently been cited atapproximately $800 million U.S. dollars and takes up to15 years from the start of a drug discovery programfrom the data available as of 2000 [DiMasi et al., 2003].In 2006, the United States Government AccountabilityOffice (GAO) released a report examining the costs ofnew drug development from 1993 through 2006 thatshowed that R&D investment has increased from 2000to 2004, while the number of new drug applications(NDAs) and NDAs for new molecular entities (NMEs)has declined [Aronovitz et al., 2006]. In inflation-adjusted dollars, there has been a 147% increase in

R&D costs between 1993 and 2004 with no commen-surate increase in the number of NDAs and NMEs.Consistent with the timescales reported by DiMasiet al. [2003], the GAO reported that the time tosuccessfully bring a new drug to market in 2004, onaverage, was over 13 years. This was due to an averageof 6.5 years spent in drug discovery and pre-clinicaldevelopment and 7 years spent in clinical trials.

DDR

Published online in Wiley Online Library (wileyonlinelibrary.com).DOI: 10.1002/ddr.20409

�Correspondence to: Anthony E. Klon, Ansaris, Four ValleySquare, 512 East Township Line Road, Blue Bell, PA 19422.E-mail: [email protected]

�c 2010 Wiley-Liss, Inc.

Typically, only one out of 10,000 compounds testedstarting from early phase research resulted in anapproved drug. The rate of clinical failures has alsoincreased from 82% in the period between 1996 and1999 to 91% for the period between 2000 and 2003.The decline in productivity was further demonstratedby reports from the FDA that NDAs were primarilysubmitted for existing drugs rather than NMEs (102NDAs vs. 30 NMEs in 2004).

Computational fragment-based drug design offersthe opportunity to synthesize and test far fewercompounds than is necessary using traditional ap-proaches. In a typical high-throughput screening (HTS)campaign, or virtual screening (VS) study, somewhere onthe order of 100,000–1,000,000 or more may be tested invitro or in silico. During the course of a drug discoveryprogram leading to a clinical candidate, it is common tomake between 5,000–10,000 compounds. We typicallysimulate �2,000 unique fragments in a single program.The number of potential compounds that could con-ceivably be synthesized using these fragments is Nf,where N is the number of fragments being simulated andf corresponds to the number of fragments in thedesigned compounds. For a relatively modest three-fragment build, for example, the number of compoundsthat can theoretically be considered using 2,000simulated fragments is 8� 109. Fortunately, by allowingthe protein structure to dictate which fragments have thepotential to interact favorably, far fewer need actually beexplicitly considered by the designer. This approachprovides a means for exploring greater chemical diversitythan traditional HTS or VS approaches. Furthermore,the choice of fragments and, ultimately, compounds isnot restricted to structures available in an existingcorporate library as would be the case in an HTScampaign. In our experience, this approach requiressynthesis of �500–1,000 compounds to identify a drugcandidate, corresponding to an order of magnitudeincrease in efficiency in the synthetic effort.

PROTEIN MODELING

The successful prosecution of a computationalfragment-based drug design effort relies on theexistence of a model protein structure of sufficientquality that the fragment simulations may be carriedout and yield meaningful results. The interplay of twocriteria, the origin of the protein models, and theaccuracy of these models, is necessary for theidentification of a suitable protein model to support acomputer-aided FBDD effort.

Development of Protein Models

Broadly speaking, the protein models used in thefragment simulations may come from either experimental

sources or may be developed through protein modelingtechniques. In principle, either x-ray crystallography orNMR spectroscopy can be used to determine thestructure of the protein in question, but historicallystructures determined through x-ray crystallographyhave been used as the gold standard for computer-aided design efforts at Ansaris. In instances where thestructure of a protein is not known, it is necessary toutilize homology modeling techniques [Diller and Li,2003] to arrive at an initial protein structure, which isfurther optimized using a combination of Cartesian-space and torsion-space molecular mechanics algo-rithms [Carnevali et al., 2003; Katrich et al., 2003;Rosenthal and Meshkat, 2004] to arrive at a modelof sufficient quality to support fragment simulationand design.

In addition to the generation of starting modelsfor a protein of unknown structure, homology modelingtechniques have also been used successfully toreconstruct missing sections of a protein crystalstructure. This is commonly the case with mobiledomains such as the activation loop (A-loop) in proteinkinases. We have also used these tools to modelproteins in different conformations. A classic examplefor protein kinases is the DFG-motif. The DFG-motifexists in two general conformations, the ‘‘DFG-in’’ and‘‘DFG-out.’’ Type I, ATP-competitive, kinase inhibitorstypically bind to the DFG-in conformation, whileType II inhibitors, spanning the ATP site and anallosteric pocket, typically bind to the DFG-outconformation [Zhang et al., 2009]. Depending on thetype of kinase inhibitor being designed, a proteinmodel with the DFG-motif in either the DFG-in orDFG-out conformation is necessary. In cases where thecrystal structure exists for one conformation of theDFG-motif, DFG-in or DFG-out, a homologousprotein may be used to model the alternativeconformation [Kufareva and Abagyan, 2008].

Both of these cases existed with our recent workon the design of inhibitors to the 50-AMP-activatedprotein kinase (AMPK). The existing crystal structuresof AMPK available from the Protein Data Bank showedthe DFG-motif in the DFG-out conformation, and theA-loop was entirely absent from the publishedstructure. The crystal structure of the microtubule-affinity-regulating kinase-1 (MARK-1) was used tomodel the missing A-loop and the DFG-motif in theDFG-in conformation. Subsequent optimization withImagiro resulted in a model suitable for Type I kinaseinhibitor design [Machrouhi et al., 2010]. This approachwas also used to create a protein model for the p38MAP kinase from a crystal structure (PDB ID no.1KV2 [Pargellis et al., 2002]) missing the A-loop [Clarket al., 2009b]. In this case, the A-loop was taken from

131SIMULATION METHODS IN FBDD

Drug Dev. Res.

another p38 structure (PDB ID no. 1A9U [Wang et al.,1998]), and the resulting model was relaxed by allowingthe A-loop to undergo 1 ns of torsion-space MD.

Refinement of the Protein Model Using CompoundsWith Known Activity

The initial model generated can be furtheroptimized if there are ligands known to bind to thetarget protein. In these cases, the known ligand isdecomposed into its constituent fragments (Fig. 1).Each fragment is then simulated against the targetprotein, as discussed below, and the known compoundis assembled from the fragment distributions. Addi-tional rounds of optimization are carried out inCartesian space to correct the bond length and anglegeometries for the ligand, followed by torsion-spacerefinement in which the entire protein-ligand complexis allowed to relax. Once a stable structure is obtained,the ligand may be removed and the resulting proteinmodel may be used for future design efforts. If theentire ligand cannot be constructed in the protein, acycle of Transplant-Insert-Constrain-Relax-Assemble(TICRA) [Meshkat et al., 2010] moves is carried outin which as many fragments as possible are built intothe protein model, then the resulting conformation isrelaxed in Cartesian and torsion space as describedabove. These operations are carried out iteratively untilthe entire ligand can be built into the active site.

The caveat of this approach is that the resultingprotein-ligand complex is necessarily biased towardscompounds with similar chemotypes. In cases whereour computational FBDD approach is being used tosupport lead optimization this is not necessarily adrawback, but may actually be desirable if the goal is todesign to a protein model more suitable for thechemical series of interest. In this way, a protein model

may be optimized concurrently with the program as thechemical series is refined to an optimized lead. Thisapproach also provides the ability to generate multipleprotein models if several chemical series are beingoptimized.

Torsion Space Molecular Dynamics Simulations ofProteins

The primary molecular modeling software pack-age used at Ansaris is Imagiro, originally developed byProtein Mechanics, and it is capable of Cartesian-spaceand torsion-space molecular mechanics algorithms,including energy minimization (EM), molecular dy-namics (MD), Monte Carlo (MC), and Modal MonteCarlo [Carnevali et al., 2003] (MMC) calculations. Thealgorithms implemented in Imagiro provide severaladvantages to conventional molecular mechanics soft-ware packages that significantly increase computationalspeed without sacrificing accuracy by using an implicitsolvent model [Catto, 2003], an internal coordinateJacobian [Rosenthal and Meshkat, 2004], and theprojection of a Cartesian force field onto an internalcoordinate model with fixed bond geometry originallydeveloped by Molsoft [Carnevali et al., 2003; Katrichet al., 2003].

Torsion space molecular mechanics calculationsprovide several advantages compared to conventionalCartesian space algorithms. By eliminating bond andangular terms from the energy calculations, largertimesteps may be taken [Chen et al., 2005]. Fewerterms in the energy function reduces the complexity ofthe calculations, resulting in less computational timenecessary per step. Furthermore, less drift in the totalenergy of the system is observed, with the consequencethat long-timescale simulations may be undertakenwith greater accuracy. We are currently capable of

N

N

N

ON

NN

NH

NH

O

1

2

3

Cl

Fig. 1. Demonstration of fragment decomposition used for (1) Compound C, an inhibitor of AMPK [Bain et al., 2007], (2) the lead compound forthe p38 MAP kinase inhibitor BIRB 796 [Regan et al., 2002], (3) an example of a small two-fragment compound used in crystallographic andcalorimetric studies of binding to T4 lysozyme L99A [Morton et al., 1995; Morton and Matthews, 1995].

132 KLON ET AL.

Drug Dev. Res.

achieving �50 ps/day of simulation time for a system of�9,000 atoms on a single processor (2.66 GHz IntelCore 2 CPU, 1 GB memory), and parallelization scalesroughly linearly up to eight processors.

HIV-1 Protease Flap Opening and ClosingMechanisms Using Torsion Space Molecular

Dynamics

Toth and Borics used the torsion-space MDalgorithms in Imagiro to investigate the flap closingand opening mechanisms in the HIV-1 protease (HIV-1PR) [Toth and Borics, 2006a,b]. These simulationsdemonstrated the utility of torsion-space dynamics toevolve a relevant binding model for small moleculeinhibitors from a crystal structure of a different proteinconformation. The simulations also demonstrated theability to generate multiple plausible protein conforma-tions to explain ligand binding. This approach can beapplied to proteins of different classes to generatemultiple conformations suitable for design from asingle starting structure or multiple structures.

Starting with a crystal structure of the HIV-1 PRwith the flaps in the semi-open form, Toth and Boricscarried out six torsion-space MD simulations withtimescales ranging from 1 to 5 ns [Toth and Borics,2006b]. The simulations revealed that the semi-openconformation of the HIV-1 PR was stabilized by weakpolar interactions between the flaps on the twomonomers, which blocked ligand access to the activesite. Two distinct transition states between the semi-closed and open conformations of the flaps wereobserved. The flap-open conformations were stabilizedby intramolecular hydrophobic contacts between theflap and residues elsewhere on the same monomer.Hydrophobic contacts formed first in two simulations,and were followed by an asymmetric disruption of theintermolecular polar contacts between the flaps in thesemi-closed conformation. In two other simulations,the intermolecular polar flap contacts were brokencooperatively in a symmetric fashion, followed by theformation of the intramolecular hydrophobic contacts.In both transition states, the authors observed the flapslost their b-hairpin secondary structure and becameloops. The authors classified the open conformationsinto three distinct families based on key intramolecularhydrophobic contacts formed between the flaps andthe remainder of the protein. In all three families, theopen conformation allowed direct access to the activesite for ligand binding. The authors also observed a‘‘curled’’ conformation in which the tips of the flapsfolded down into the binding pocket, forming highlysymmetrical intra- and intermolecular hydrophobiccontacts. In the curled conformation, the approach tothe binding site was blocked for putative ligands.

Toth and Borics carried out torsion-space MDsimulations at the nanosecond timescale for two modelsof HIV-1 PR in complex with its natural substrate tostudy the substrate-induced closing of the flaps overthe active site [Toth and Borics, 2006a]. CA-p2 is a ten-residue peptide sequence from the gag/pol polyproteincontaining one site that is recognized and cleaved bythe protease. A model with the HIV-1 PR flaps in theopen conformation in complex with CA-p2 wasgenerated by taking the protein structure from theresults of the HIV-1 PR semi-open simulations andsuperimposing the CA-p2 peptide from a co-crystalstructure of the HIV-1 PR/CA-p2 complex. The authorsused the co-crystal structure complex [Prabu-Jeyabalanet al., 2000] of HIV-1 PR/CA-p2 with the protease inthe closed form as a control. The simulation of theopen form of the protein was carried out over a periodof 10 ns, during which one flap was observed totransition from the open to the closed form. Themovements of the peptide sequence were comparablein both simulations and after 10 ns the conformation ofthe closed flap matched that of the co-crystal structure.The mechanism of flap closure was observed to takeplace in an asymmetrical manner, with flap A collapsingto form interactions with the substrate peptidesequence over a period of �200 ps. During the next�2,500 ps, flap B collapsed to form a stable inter-mediate state characterized by the same hydrophobicclusters that were observed in the previous study offlap opening. This was followed by the formationof contacts between flap B and the substrate, andover the final 5,000 ps of simulation time, theflap equilibrated in the crystallographically observedconformation.

Fast Protein Folding Using Modal Monte Carlo

The utility of the Modal Monte Carlo (MMC)algorithm available in Imagiro was demonstrated by aprotein-folding experiment conducted using the Trp-cage as a test case [Carnevali et al., 2003]. Trp-cage is a20-residue polypeptide that is capable of spontaneouslyfolding into a stable tertiary structure [Neidigh et al.,2002]. The study used the fully extended conformationof the peptide as a starting point and a variety ofannealing schedules were evaluated. The most success-ful MMC was reported to have run at a constanttemperature of 500 K with the lowest energy con-formation obtained after 19� 106 attempted MC stepsand normal modes were calculated every 5,000 steps.The authors reported this took �4.5 days of computertime, as compared to over one year of computer timefor conventional MD. The reason for the significantimprovement in computation time for predicting thefolded structure using MMC was attributed to the fact

133SIMULATION METHODS IN FBDD

Drug Dev. Res.

that MD is prone to heavily sample high-frequencymodes, while sparsely sampling low energy modes. Thelowest energy model arrived at by MMC was comparedto the averaged NMR ensemble of 38 structuresreported by Neidigh et al. [2002] and found to have aCa RMSD of 1.370.2 A and a heavy atom RMSD of1.770.2 A.

FRAGMENT-BINDING SIMULATIONS

Between 2,000 and 2,500 individual fragmentsare simulated against a protein target during the courseof a typical project. These fragments may be broadlydivided by their character among hydrophobic, aro-matic, hydrogen bond donor, hydrogen bond acceptor,or mixed chemistry (Fig. 2). During the simulations,each fragment is treated as a rigid body. Thisnecessitates that energetically favorable alternativeconformations for flexible fragments be generatedand simulated separately. Imagiro is used to generatealternative conformations in advance, and charges areassigned using either the AM1-BCC method [Jakalian

et al., 2002] or the CHELPG method in Gaussian[Frisch et al., 1998]. A solvation correction termis applied using the generalized Born surface area(GB/SA) method [Qui et al., 1997].

Grand Canonical Simulations of Fragment Binding

The fragment simulations are carried out bysolvating the target protein in a box containing onefragment of interest. A unitless parameter, B, is used toevaluate the strength of the interaction between thefragment poses and protein. B is related to thechemical potential by:

B ¼ ðm� midealÞ=kT1ln Nh ið Þ ð1Þ

where (m�mideal) is the excess chemical potential, k isthe Boltzmann constant, T is the absolute temperature,and (/NS) is the number of fragments in the system[Clark et al., 2006, 2009a; Guarnieri and Mezei, 1996].The simulated annealing of the chemical potential iscarried out by beginning with a high excess chemicalpotential (high B-value). Once the system has equili-brated, the B-value is lowered in successive stages. At

indane

Aromatic

benzene naphthalene

S

thiophene propyne fluorene

Hydrocarbon

CH

methane

(Z)-but-2-ene cyclopentane

neopentane

norbornane

propane

Hydrogen Bond Donor

OH

ethanol

H N NH

NH

guanidine

HN

indole

NH

pyrrole

NH

methyl amine

OH

phenol

Hydrogen Bond Acceptor

N

pyridine

O

tetrahydrofuran

H S

thioether

O

acetaldehyde

SO

dimethylsulfoxide

N

acetonitrile

Mixed Chemistry

N NH

2-aminopyridine

OOHN

succinimide

NHN

imidazole

SO

ONH

methylsulfonamide

O

NH

NH

1,3-dimethylurea

O

NH

acetamide

Fig. 2. Sample fragments simulated against protein targets using the Grand Canonical Monte Carlo or systematic sampling algorithms.Fragments may be classified by their chemical character, which governs their interactions with the protein target and may be divided intoaromatic, hydrophobic, hydrogen bond donating, hydrogen bond accepting, or combinations of two or more.

134 KLON ET AL.

Drug Dev. Res.

each stage, fragments are either inserted or deletedusing the Metropolis Monte Carlo criteria. As theB-value is reduced, the bulk of the fragments solvatingthe protein leaves the box, leaving those that binddirectly to the target. With successive steps, thesefragments begin to leave the box until the most tightlybound fragments remain. The B-value thus provides anassessment of the relative binding affinity of thefragment at various pockets on the protein. Thisexercise is repeated separately for each of thefragments chosen to be simulated against the targetprotein.

Free Energy Calculation of Fragment Binding

T4-lysozyme and thermolysin were used as testcases to illustrate the utility of the grand canonicalsimulations to fragment-protein binding [Clark et al.,2006]. Both of these proteins have experimentallydetermined crystallographic and thermodynamic bind-ing data for small fragments such as those routinelysimulated against protein targets at Ansaris forcomparison. The binding site for T4 lysozyme is small,well-characterized, and highly hydrophobic in nature,while the thermolysin-binding site is much moresolvent exposed. The fragments used in the experi-mental studies tend to be rigid, thereby reducing thecomplexity of the sampling problem and enabling amore accurate analysis of the calculated B-values.

For the T4-lysozyme example, 14 fragments weresimulated against the crystal structure of lysozyme withN-butyl benzene (PDB ID no. 186L) [Morton andMatthews, 1995]. The fragment simulations were notrestricted to the binding pocket, but allowed to samplethe entire surface of the protein. For 11 of the 14fragments, the lowest scoring pose was found in thebinding site. For these 11 fragments, the r2 betweenthe calculated B-value and the experimentally mea-sured DGbinding was 0.57, and the standard error (SE)was 0.4 kcal/mol. The root mean square deviation(RMSD) was calculated for the 11 fragments whoselowest scoring poses were found in the binding sitebetween the crystallographically observed pose and thefull ensemble of predicted poses. These calculatedvalues ranged from 0.69 A for benzene to 2.96 A forethyl benzene. The higher RMSD reported for ethylbenzene was due to the fact that the lowest energyensemble allowed rotation around the torsion bondconnecting the ethyl group to the benzene ring.

A subset of the fragments was simulated againstthe L99A mutant of T4 lysozyme to compare theimpact of non-interacting versus interacting fragments[Clark et al., 2009a]. In the simulations wherefragments are permitted to interact, the repulsivenon-bonding (van der Waals) term is used, but

electrostatic interactions are disabled. The authorsdemonstrated how free energies may be calculateddirectly from the results of the converged Monte Carlosimulations by comparing the final fragment concen-trations with the concentration in the reference state ofthe ideal system. The results of these simulationsshowed that while both simulation methods reachedthe same results, the simulations of interacting frag-ments converged to the same calculated free energyvalues more quickly. The authors reported that thestandard deviation between the calculated and experi-mentally measured free energy values was 1.5 kcal/mol.

The results of fragment-binding simulationsagainst thermolysin [Clark et al., 2006] were comparedto multiple-solvent-crystal-structure (MSCS) experi-ments with acetone and isopropanol [English et al.,1999, 2001]. In the MSCS studies, the organic solventswere co-crystallized at varying concentrations withthermolysin. The crystallographically observed poseswere compared with the closest calculated poseidentified during the grand canonical simulations. Forboth acetone and isopropanol, the relative rankings ofsolvent concentration versus the calculated B-values forthese poses agreed well overall.

Systematic Sampling

Systematic sampling was used to obtain improvedresults for ligands binding to the T4 lysozyme L99Aand P38 MAP kinase test systems as compared to theGrand Canonical Monte Carlo approach [Clark et al.,2009b]. In order to calculate the free energy of bindingfor a ligand, the molecule is broken into its constituentfragments (Fig. 1). In the systematic sampling protocol,the six rotational and translational degrees of freedomare explored by treating each fragment as a rigid bodyand applying the necessary geometric transformationsthroughout the volume around the protein beingstudied. The molecular mechanics free energy iscalculated between the fragment and the protein foreach pose. In order to calculate the free energies ofbinding for a whole molecule, the constituent frag-ments are sampled independently of one another.Individual poses for the fragments are then linkedtogether if the bond distances and angles are within0.5 A and 151 of their ideal geometric values. In thefinal stage, the free energy of binding for a molecule iscalculated from the difference between its bound andunbound states. The energy of binding is calculatedfrom:

Ef ¼ Efragment-proteinf 1Esolvation

f 1Efragment strainf ð2Þ

where Efragment-proteinf is the fragment interaction energy

with the protein, Esolvationf the GB/SA solvation energy,

135SIMULATION METHODS IN FBDD

Drug Dev. Res.

and Efragmentf strain is the internal strain energy for the

fragment. Once all fragments for a single molecule arelinked together, the total energy is calculated from:

E ¼X

f

Ef 1X

f ;g

Einterfragment strainf ;g ð3Þ

whereP

f Ef is the total energies for all fragments in

the molecule, andP

f,g Einterfragment strainf ;g is the total of

the internal strain energies between individual frag-ments in the molecule.

The binding energies for a set of 16 compounds forwhich calorimetric data was available for the T4 lysozymeL99A protein were calculated from systematic samplingdata. The overall error of prediction was found to be1.22 kcal/mol. Fifteen ligands, which consisted of fourfragments each that bind to the p38 AMP kinase, wereconstructed from the systematic sampling data and thecalculated standard error of prediction was reported tobe 0.98 kcal/mol. The results of the studies on these testsystems suggests that the systematic sampling approachgenerates slightly better calculated binding affinities thanthe Grand Canonical Monte Carlo approach, and thatthe accuracy of the method is comparable to othercomputational tools reported in the literature for freeenergy calculations [Chang and Gilson, 2004; Lu andWong, 2005; Weis et al., 2006].

CONCLUSIONS

Our technology platform is based on twocomputational capabilities that greatly facilitate FBDDefforts. Protein simulations are carried out in torsionspace, allowing the rapid and accurate refinement ofprotein models using EM, MC, MD, and MMCalgorithms. Fragment simulations are carried out usingthe Grand Canonical Monte Carlo and systematicsampling approaches. Binding energies are calculateddirectly from the resulting fragment simulations thatare comparable in performance to existing methodsused to calculate the free energy of binding of ligandsto proteins. However, the ability to implicitly samplechemical space rapidly is even more important thancalculating accurate binding energies for the frag-ments. This approach vastly increases the number ofhypothetical compounds that may be considered whilesimultaneously reducing by several orders of magni-tude the actual number of compounds the drugdesigner actually has to evaluate. Rather than exhaus-tively enumerating a library of compounds as intraditional methods, the computationally driven frag-ment-based approach inverts the problem by consider-ing only those fragments that are predicted to have areasonable chance of interacting with the protein.

DISCLAIMER

The authors are employees of Ansaris.

REFERENCES

Aronovitz LG, Redican-Bigott G, Hormozi S, Klazkin J, Lichtenfeld D,Ulrich S. 2006. New drug development: science, business,regulatory, and intellectual property issues cited as hamperingdrug development efforts. Washington, DC: United StatesGovernment Accountability Office; Report number GAO-07-49.

Bain J, Plater L, Elliot M, Shpiro N, Hastie CJ, McLauchlan H,Klevernic I, Arthur JS, Alessi DR, Cohen P. 2007. The selectivityof protein kinase inhibitors: a further update. Biochem J 408:297–315.

Carnevali P, Toth G, Toubassi G, Meshkat SN. 2003. Fast proteinstructure prediction using monte carlo simulations with modalmoves. J Am Chem Soc 125:14244–14245.

Catto E. 2003. Method for providing thermal excitation to moleculardynamics models. US Patent Application 0187626 A1.

Chang CE, Gilson MK. 2004. Free energy, entropy, and induced fitin host-guest recognition: calculations with the second-generationmining minima algorithm. J Am Chem Soc 126:13156–13164.

Chen J, Im W, Brooks 3rd CL. 2005. Application of torsion anglemolecular dynamics for efficient sampling of protein conforma-tions. J Comput Chem 26:1565–1578.

Clark M, Guarnieri F, Shkurko I, Wiseman J. 2006. Grand canonicalMonte Carlo simulation of ligand-protein binding. J Chem InfModel 46:231–242.

Clark AM, Meshkat S, Wiseman J. 2009a. Grand Canonical Free-Energy Calculation of Protein-Ligand Binding. J Chem Inf Model49:934–943.

Clark M, Meshkat S, Talbot G, Carnevali P, Wiseman J. 2009b.Fragment-Based Computation Of Binding Free Energies BySystematic Sampling. J Chem Inf Model 49:1901–1913.

Diller DJ, Li R. 2003. Kinases, homology models, and highthroughput docking. J Med Chem 46:4638–4647.

DiMasi JA, Hansen RW, Grabowski HG. 2003. The price ofinnovation: new estimates of drug development costs. J HealthEcon 22:151–185.

English AC, Done SH, Caves LS, Groom CR, Hubbard RE. 1999.Locating interaction sites on proteins: the crystal structure ofthermolysin soaked in 2% to 100% isopropanol. Proteins 37:628–640.

English AC, Groom CR, Hubbard RE. 2001. Experimental andcomputational mapping of the binding surface of a crystallineprotein. Protein Eng 14:47–59.

Frisch MJ, Trucks GW, Schlegel HB, Scuseria GE, Robb MA,Cheesman JR, Zakrzewski VG, Montogomery Jr JA, Stratmann RE,Burant JC, et al. 1998. Gaussian 98 (Revision A.9). Pittsburgh, PA:Gaussian, Inc.

Guarnieri F, Mezei M. 1996. Simulated annealing of chemicalpotential: a general procedure for locating bound waters.application to the study of the differential hydration propensitiesof the major and minor grooves of DNA. J Am Chem Soc 118:8493–8494.

Jakalian A, Jack DB, Bayly CI. 2002. Fast, efficient generation ofhigh-quality atomic charges. AM1-BCC model: II. Parameteriza-tion and validation. J Comput Chem 23:1623–1641.

136 KLON ET AL.

Drug Dev. Res.

Katrich V, Totrov M, Abagyan R. 2003. ICFF: a new method toincorporate implicit flexibility into an internal coordinate forcefield. J Comput Chem 24:254–265.

Kufareva I, Abagyan R. 2008. Type-II kinase inhibitor docking,screening, and profiling using modified structures of active kinasestates. J Med Chem 51:7921–7932.

Lu B, Wong CF. 2005. Direct estimation of entropy loss due toreduced translational and rotational motions upon molecularbinding. Biopolymers 79:277–285.

Machrouhi F, Ouhamou N, Laderoute K, Calaoagan J, Bukhtiyarova M,Ehrlich PJ, Klon AE. 2010. The rational design of a novel potentanalogue of the 50-AMP-activated protein kinase inhibitorcompound C with improved selectivity and cellular activity.Bioorg Med Chem Lett 20:6394–6399.

Meshkat S, Zou J, Klon AE, Wiseman JS, Konteatis Z. 2010.Transplant-insert-constrain-relax-assemble (TICRA): protein-ligandcomplex structure modeling and application to kinases. J Chem InfModel (in press).

Morton A, Matthews BW. 1995. Specificity of ligand binding in aburied nonpolar cavity of T4 lysozyme: linkage of dynamics andstructural plasticity. Biochemistry 34:8576–8588.

Morton A, Baase WA, Matthews BW. 1995. Energetic origins ofspecificity of ligand binding in an interior nonpolar cavity of T4lysozyme. Biochemistry 34:8564–8575.

Neidigh JW, Fesinmeyer RM, Andersen NH. 2002. Designing a20-residue protein. Nat Struct Biol 9:425–430.

Pargellis C, Tong L, Churchill L, Cirillo PF, Gilmore T, Graham AG,Grob PM, Hickey ER, Moss N, Pav S, et al. 2002. Inhibition ofp38 MAP kinase by utilizing a novel allosteric binding site. NatStruct Biol 9:268–272.

Prabu-Jeyabalan M, Nalivaika E, Schiffer CA. 2000. How doesa symmetric dimer recognize an asymmetric substrate?A substrate complex of HIV-1 protease. J Mol Biol 301:1207–1220.

Qui D, Shenkin PS, Hollinger FP, Still WC. 1997. The GB/SAcontinuum model for solvation. A fast analytical method for thecalculation of approximate born radii. J Phys Chem A 101:3005–3014.

Regan J, Breitfelder S, Cirillo P, Gilmore T, Graham AG, Hickey E,Klaus B, Madwed J, Moriak M, Moss N, et al. 2002. Pyrazoleurea-based inhibitors of p38 MAP kinase: from lead compound toclinical candidate. J Med Chem 45:2994–3008.

Rosenthal D, Meshkat S. 2004. Efficient methods for multibodysimulations.

Toth G, Borics A. 2006a. Closing of the flaps of HIV-1 proteaseinduced by substrate binding: a model of a flap closingmechanism in retroviral aspartic proteases. Biochemistry 45:6606–6614.

Toth G, Borics A. 2006b. Flap opening mechanism of HIV-1protease. J Mol Graph Model 24:465–474.

Wang Z, Canagarajah BJ, Boehm JC, Kassisa S, Cobb MH,Young PR, Abdel-Meguid S, Adams JL, Goldsmith EJ. 1998.Structural basis of inhibitor selectivity in MAP kinases. Structure6:1117–1128.

Weis A, Katebzadeh K, Soderhjelm P, Nilsson I, Ryde U. 2006.Ligand affinities predicted with the MM/PBSA method: depen-dence on the simulation method and the force field. J Med Chem49:6596–6606.

Zhang J, Yang P, Gray NS. 2009. Targeting cancer with smallmolecule kinase inhibitors. Nat Rev Cancer 9:28–39.

137SIMULATION METHODS IN FBDD

Drug Dev. Res.