A Comparison of Heuristic Search Algorithms for Molecular Docking


  • 8/6/2019 A Comparison of Heuristics Search Algorithms for Molecular Docking


    Journal of Computer-Aided Molecular Design, 11 (1997) 209–228
    © 1997 Kluwer Academic Publishers. Printed in the Netherlands. JCAMD 395

    A comparison of heuristic search algorithms for molecular docking

    David R. Westhead*, David E. Clark** and Christopher W. Murray***

    Proteus Molecular Design Ltd., Proteus House, Lyme Green Business Park, Macclesfield, Cheshire SK11 0JL, U.K.

    Keywords: Ligand–protein docking; Molecular recognition; Evolutionary algorithms; Simulated annealing; Tabu search

    Summary
    This paper describes the implementation and comparison of four heuristic search algorithms (genetic algorithm, evolutionary programming, simulated annealing and tabu search) and a random search procedure for flexible molecular docking. To our knowledge, this is the first application of the tabu search algorithm in this area. The algorithms are compared using a recently described fast molecular recognition potential function and a diverse set of five protein–ligand systems. Statistical analysis of the results indicates that overall the genetic algorithm performs best in terms of the median energy of the solutions located. However, tabu search shows a better performance in terms of locating solutions close to the crystallographic ligand conformation. These results suggest that a hybrid search algorithm may give superior results to any of the algorithms alone.

    *Present address: EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
    **Present address: Dagenham Research Centre, Rhône-Poulenc Rorer Ltd., Rainham Road South, Dagenham, Essex RM10 7XS, U.K.
    ***To whom correspondence should be addressed.

    Abbreviations: GA, genetic algorithm; EP, evolutionary programming; SA, simulated annealing; TS, tabu search; RS, random search; COM, centre of mass; rms, root-mean-square; ICM, internal coordinate modelling; CPU, central processing unit; DHFR, dihydrofolate reductase; MTX, methotrexate; DANA, 2,3-dehydro-2-deoxy-N-acetylneuraminic acid; NAPAP, Nα-(2-naphthyl-sulphonyl-glycyl)-DL-p-amidinophenylalanyl-piperidine; FIFO, first-in, first-out.

    Introduction
    The safe and effective action of a pharmaceutical agent within the human body depends upon the selective recognition of the drug molecule by the appropriate target receptor. This molecular recognition is governed by the interplay of a number of factors such as steric, electrostatic and hydrophobic interactions. The sum of the free energy of these interactions is termed the binding affinity of the molecule for the receptor and is governed, in part, by the geometry of the ligand–receptor complex. The early stages of drug discovery, usually termed lead generation, could be significantly expedited if there existed a method whereby the geometry of the ligand–receptor complex and the binding affinity of a given molecule for a receptor of known structure could be reliably estimated without resorting to the experimental techniques of synthesis, co-crystallisation and assay. In computer-aided molecular design (CAMD), the search for methods for the ab initio prediction by computer of the binding geometry and binding affinity of two molecules is termed the 'docking problem'.

    Because of its potential application in CAMD, the docking problem has received much attention over the years, and the progress made has been reviewed in a number of recent articles [1–6]. The earliest docking programs considered only the translational and orientational degrees of freedom of the ligand with respect to the receptor, e.g. Ref. 7. More recently, however, with advances in the power of the available computer hardware and increasingly sophisticated software algorithms, it has become possible to take into account routinely the internal conformational flexibility of the ligand ('flexible' docking) [8–23]. Limited conformational flexibility of the receptor is also being permitted in some approaches […]. Clearly, as the number of degrees of freedom being explored in the docking problem increases, the size of the search space rapidly becomes enormous; Gehlhaar et al. [4] estimate that, in one of their examples, the search space comprised at least 10^[?] solutions. Faced with such a situation, it is obvious that fast and effective search algorithms are of crucial importance if the docking problem is to be tackled successfully.

    The purpose of this paper is to compare the performance of four heuristic search algorithms when applied to molecular docking. These four are simulated annealing (SA), a genetic algorithm (GA), evolutionary programming (EP) and tabu search (TS). While SA, GA and EP have each been applied to the docking problem in the existing literature, to our knowledge there has been no previous attempt to use the TS algorithm. This work thus serves to introduce this algorithm to the field and to compare it with algorithms already in use. The algorithms are compared, over a number of test cases, according to their ability to locate optima of an objective energy function designed for use in the docking problem.

    In developing a successful docking method, considerations regarding the energy function are at least as important as those pertaining to the search algorithm. The minimum value(s) of this function should correspond to the preferred binding mode(s) of the ligand, and the ultimate goal should be a correlation between the values of the function and the binding affinity of the ligand. For this work, since our aim is the comparison of search algorithms, we have studied a single energy function. This was chosen to be the function recently described by Gehlhaar et al. [14,15], because it is very fast to evaluate computationally and because it has been demonstrated to be successful in a number of docking applications. The choice to compare algorithms using a single energy function simplifies our study considerably and, since potential functions used in docking tend to share similar characteristics, there are good reasons to believe that conclusions drawn about algorithmic performance using this energy function will probably apply when the algorithms are used with different functions. A study comparing various docking energy functions in terms of their ability to predict binding mode and affinity would be of great interest and is planned for the future.

    Comparison of heuristic algorithms is difficult for two main reasons. First, each category of algorithm covers many different possibilities for implementation, each of which will perform differently for a given optimisation problem. For instance, within the GA category it is possible to implement a one-point crossover, a two-point crossover or something more complicated, and there are many other possible sources of variation. Second, the performance of each algorithm depends on a set of adjustable operational parameters, and the quality of the results depends on the extent to which these are optimal for a given test case.

    Our approach to the first of these problems has simply been to seek to implement all the algorithms without bias and with no preconceived idea as to which algorithm should be determined the 'best'. We have also sought to implement each of the algorithms in a fairly 'standard' manner. For instance, to continue the GA example, our algorithm employs just a simple one-point crossover and random mutations. It is possible, indeed likely, that a more sophisticated GA would perform better than our simple one. However, the more sophisticated the algorithms, the more difficult the comparison, since sophistication brings with it more adjustable parameters to optimise and a difficult choice of which of the many possibilities for increased sophistication should be chosen. It is our belief that the best algorithm for the docking problem is probably a hybrid of various types of algorithm. It is hoped that, by comparing fairly simple implementations of each algorithm, our study will point to desirable algorithmic characteristics for use in hybrid algorithms.

    The second problem, that of optimising operational parameters (the so-called 'meta-optimisation' problem), is also not straightforward. It is almost impossible to guarantee that any given set of parameters chosen is truly 'optimal', particularly if the parameters are coupled in some way. In this work, a set of parameters for each algorithm was sought which performed well on all the test cases, and extensive 'tuning' experiments were carried out to this end. Our experience has shown that it is important to tune parameters over a number of test cases, because it is possible to over-optimise the performance on one test case at the expense of the others if only one example is used. Clearly, it is a very desirable characteristic of a docking algorithm that parameters be transferable between test cases, and algorithms for which this is not true will be penalised in this type of study.

    Our comparison of algorithms is carried out over five test cases, as specified in Table 1. This number was judged to be sufficient to allow general conclusions about algorithmic performance to be drawn, while retaining the possibility of a detailed discussion of each test case. The test cases were chosen to be relevant problems in CAMD, to be of varying difficulty as docking problems, and to reflect different aspects of molecular recognition. Some of the problems emphasise the formation of hydrogen bonds in molecular recognition, others emphasise steric fit to the active site, and others contain a more equal mixture of the two.

    TABLE 1
    TEST CASES USED IN THE PRESENT STUDY

    Enzyme                          Ligand         No. of rotatable bonds   PDB code   Reference
    Dihydrofolate reductase         Methotrexate   7                        3DFR       57
    Influenza virus neuraminidase   DANA           4                        1NSD       58
    HIV-1 protease                  XK263          8                        1HVR       …
    Thrombin                        NAPAP          6                        1ETS       …
    Thrombin                        Argatroban     7                        1ETR       …

    The structures of the ligands and the definition of their rotatable bonds are given in […].

    In addition to the comparison of algorithms, a method is described whereby the docking problem can be set up in a very general way, including rigid-body and rotatable-bond degrees of freedom in the ligand, rotatable bonds in the active site of the receptor, and the variable presence or absence of crystallographic water molecules. In this paper, however, only the ligand degrees of freedom are used, the others being left for a later publication. It is also illustrated how, within this method, each of the search algorithms can be implemented with minimal effort and with much important code in common. The resulting software will be referred to as PRO_LEADS (ligand evaluation by automatic docking studies) and forms part of our Prometheus system for molecular design and simulation.

    Methods
    Problem representation
    Degrees of freedom
    For solution of the docking problem, as stated in the Introduction, to be feasible using currently available methods and computational resources, it is necessary to make certain simplifications. First, neither receptor nor ligand can be considered to be fully flexible. The receptor must be considered rigid, except perhaps for some limited flexibility in the active site. Since the ligand is typically much smaller than the receptor, more flexibility can be considered, although most methods only allow ligand flexibility through rotations about rotatable bonds. Second, an active site for the receptor must be defined in order to restrict the region of space in which solutions are sought. Only with these simplifications is the solution space of the problem sufficiently small that a good heuristic algorithm could be expected to find optimal solutions with a reasonable success rate.

    The docking methods implemented in PRO_LEADS allow the following degrees of freedom:
    (1) Ligand translation – the ligand is free to move within a user-specified box defining the active site; if the centroid of the ligand moves outside the box, a user-specified penalty is added to the energy value.
    (2) Ligand orientation – the ligand has full orientational freedom.
    (3) Ligand flexibility – the ligand is considered flexible through a list of rotatable bonds. The rotatable bonds can be specified by the user or assigned automatically.

    (4) Receptor active site residues – these can be considered flexible through their rotatable bonds. The user can choose the degree of flexibility: the available options are all rotatable bonds in the side chain or just the terminal rotatable bond.
    (5) Crystallographic water molecules – if water molecules are present in the X-ray structure of the receptor, these can be defined to be 'variable'. In this case the docking algorithms search for solutions in which the water molecules may be either present or absent.

    In this article we explain how our approach deals with all the above degrees of freedom. For the results presented, only the first three degrees of freedom on the above list were considered variable, investigation of the others being left until a later publication.

    Docking variables
    The ICM tree [24] provides a complete internal coordinate description of an assembly of molecules: their internal conformations and their relative positions and orientations. This makes it an ideal basis for the choice of the variables for use in docking, and a very similar scheme has therefore been implemented in this work. In PRO_LEADS, the docking variables representing the relative position of ligand and receptor and their internal conformations are a subset of the variables from the internal coordinate tree. This is illustrated in Fig. 1. The receptor is considered fixed in space, and the relative positions of the receptor and ligand are governed by the rigid-body variables attached to the ligand. With the notation of Fig. 1 these are {B1, V1, T1, V2, T2, T3}, and they can be interpreted as a bond length and valence and torsion angles, measured with respect to the fixed triplet of virtual atoms at the root of the tree.

    Variables representing flexibility of the ligand and receptor are torsion angles taken from the internal coordinate tree. The docking variables, as manipulated by the docking algorithms, are stored as a string of real numbers. The first six are the rigid-body variables, and these are followed by the variables for ligand and receptor flexibility. At the end of the variable string are stored variables controlling the presence or absence of variable crystallographic waters, using a method similar to that suggested independently by Read et al. [25].
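The ordering of the variable string described above (six rigid-body variables, then ligand torsions, then receptor torsions, then water variables) can be captured by a small layout object that maps each block onto a slice. The class and method names are hypothetical, and so is the rule used to read a water as present (a real value thresholded at 0.5); the text only states that these variables control presence or absence, following a method similar to that of Read et al. [25].

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Rigid-body variables, in the order they open the docking variable string.
RIGID_NAMES = ("B1", "V1", "T1", "V2", "T2", "T3")


@dataclass(frozen=True)
class DockingLayout:
    """Hypothetical description of where each kind of variable sits in the string."""
    n_ligand_torsions: int
    n_receptor_torsions: int = 0
    n_waters: int = 0

    @property
    def length(self) -> int:
        return (len(RIGID_NAMES) + self.n_ligand_torsions
                + self.n_receptor_torsions + self.n_waters)

    def _start(self, block: str) -> int:
        offsets = {
            "ligand": len(RIGID_NAMES),
            "receptor": len(RIGID_NAMES) + self.n_ligand_torsions,
            "waters": len(RIGID_NAMES) + self.n_ligand_torsions + self.n_receptor_torsions,
        }
        return offsets[block]

    def rigid_body(self, x: Sequence[float]) -> Dict[str, float]:
        return dict(zip(RIGID_NAMES, x[:len(RIGID_NAMES)]))

    def ligand_torsions(self, x: Sequence[float]) -> Sequence[float]:
        s = self._start("ligand")
        return x[s:s + self.n_ligand_torsions]

    def receptor_torsions(self, x: Sequence[float]) -> Sequence[float]:
        s = self._start("receptor")
        return x[s:s + self.n_receptor_torsions]

    def waters_present(self, x: Sequence[float], threshold: float = 0.5) -> List[bool]:
        # Assumed decoding rule: a water is present when its variable >= threshold.
        s = self._start("waters")
        return [v >= threshold for v in x[s:s + self.n_waters]]
```

For instance, methotrexate (7 rotatable bonds, Table 1) docked into a rigid receptor with no variable waters gives `DockingLayout(7).length == 13`, i.e. six rigid-body variables followed by seven torsions.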

    Most of the energy functions for use in docking require Cartesian coordinates for the ligand and receptor molecules; it is therefore necessary to carry out interconversions between the docking variable string and Cartesian coordinates. This is accomplished by a single software module of PRO_LEADS, which is called the coding module. The interface of this module to the outside world comprises three routines: Initialcode, Code and Decode. Initialcode is called once for each ligand/receptor system; it sets up an initial variable string consistent with the Cartesian coordinates of the molecules and the user options related to flexibility. It also sets up private module data used specifically by Code and Decode, which convert ligand and receptor Cartesian coordinates to docking variables and vice versa and are designed to be fast. Within this design, search algorithms can be implemented quickly […].

    [Fig. 1. Derivation of the docking variables from the internal coordinate tree. […] each atom is placed by a bond length, a valence angle and a torsion angle […]; variables denoting the presence or absence of crystallographic waters are appended […]. Docking variable string: B1 V1 T1 V2 T2 T3 T4 T6 …]

    The energy function chosen for this work is that due to Gehlhaar et al. [14,15]. […] The intermolecular term is a pairwise sum over ligand and protein heavy atoms, each term being given by the non-bonded interaction function shown in Fig. 2, whose form depends on the type of interaction. The interaction type (hydrogen bond or steric) assigned to each pair of protein and ligand atom types is listed in Table 2. […] A grid is used for the intermolecular energy; its extent is chosen by adding a proportion of the radius of the ligand to the bounding box, so that the grid encompasses the molecule. In general, it has been found that adding half the radius creates a grid of sufficient size to encompass the movement of the molecule during docking while remaining within reasonable memory limits. In all experiments reported in this paper, a grid resolution of 0.2 Å was used.

    [Fig. 2. The non-bonded interaction function of the docking energy function of Gehlhaar et al. [14].]

    An additional option available with this energy function is to scale down the repulsive components of the interaction energy by a user-specified factor. Many docking algorithms make use of this option to facilitate searching (by allowing the ligand to penetrate the receptor) in the early stages of a docking run, but it was not used in any of the examples quoted in this paper. This was because, for two of the algorithms (TS and RS), it would have been difficult to implement the scaling in a meaningful way. However, it is likely that, for certain cases, such scaling may prove beneficial to the docking process. Recently, Verkhivker et al. [26] have suggested that better searching characteristics are obtained when the F value of the energy function is set equal to 4.0; however, all the results in this paper were generated using the function in its original formulation.

    The internal energy of the ligand is the sum of torsional and internal clash terms. The latter term is a penalty of 10 000 when the distance between two non-bonded atoms becomes less than 2.35 Å. The former term has the form

    $$E = A\,[1 - \cos(n\theta - \theta_0)] \qquad (1)$$

    where θ is the torsion angle; for sp3–sp3 bonds A = 3.0, n = 3 and θ0 = π, and for sp3–sp2 bonds A = 1.5, n = 6 and θ0 = 0. Other types of bond cannot be considered rotatable.

    The final term in the energy equation is a penalty for leaving the box defining the active site. Two options are available: the first attaches the penalty to solutions with the centroid of the ligand outside the box, and the second attaches the penalty to each ligand atom which falls outside the box. In this work, the former option has been preferred because it constrains position while allowing full orientational freedom.
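The ligand internal energy terms and the box penalty described above translate directly into code. The sketch below implements Eq. 1 with the two parameter sets quoted, the 10 000 clash penalty below 2.35 Å (applied here once per offending non-bonded pair, which is an assumption), and the centroid-based box penalty. It also includes a generic piecewise-linear pair term with parameters A–F; this follows the usual piecewise-linear form associated with Gehlhaar et al., but since Fig. 2 is not reproduced here, treat that function's shape as an assumption and any parameter values you pass it as your own. All names are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float, float]

# (A, n, theta0) for Eq. 1, as quoted in the text.
TORSION_PARAMS = {
    "sp3-sp3": (3.0, 3, math.pi),
    "sp3-sp2": (1.5, 6, 0.0),
}
CLASH_DISTANCE = 2.35     # angstrom
CLASH_PENALTY = 10000.0


def torsion_energy(theta: float, bond_type: str) -> float:
    """Eq. 1: E = A * (1 - cos(n * theta - theta0)), theta in radians."""
    a, n, theta0 = TORSION_PARAMS[bond_type]
    return a * (1.0 - math.cos(n * theta - theta0))


def clash_energy(coords: Sequence[Point],
                 nonbonded_pairs: Iterable[Tuple[int, int]]) -> float:
    """Add CLASH_PENALTY for every listed pair closer than CLASH_DISTANCE."""
    total = 0.0
    for i, j in nonbonded_pairs:
        if math.dist(coords[i], coords[j]) < CLASH_DISTANCE:
            total += CLASH_PENALTY
    return total


def box_penalty(coords: Sequence[Point], box_min: Point, box_max: Point,
                penalty: float) -> float:
    """User-specified penalty if the ligand centroid lies outside the box."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    inside = all(lo <= c <= hi for c, lo, hi in zip(centroid, box_min, box_max))
    return 0.0 if inside else penalty


def piecewise_linear_pair(r: float, A: float, B: float, C: float,
                          D: float, E: float, F: float) -> float:
    """Assumed form: linear repulsion up to F at r = 0, a flat well of depth E
    between B and C, linear ramps on either side, and zero beyond D."""
    if r < A:
        return F * (A - r) / A
    if r < B:
        return E * (r - A) / (B - A)
    if r < C:
        return E
    if r < D:
        return E * (D - r) / (D - C)
    return 0.0
```

As a check on Eq. 1, `torsion_energy(0.0, "sp3-sp3")` gives 6.0 (the eclipsed maximum, 2A), while `torsion_energy(math.pi, "sp3-sp3")` gives 0 to rounding, consistent with θ0 = π placing a minimum at the staggered trans geometry.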

    Search algorithms
    The different search algorithms used are briefly described below, and Figs. 3–6 give the schemes used for SA, EP, TS and GA, respectively. In the case of tabu search, more description is given, since this is a novel approach to the docking problem. For each search algorithm, the parameters have been chosen so that the total number of energy evaluations per docking run was approximately 200 000.

    Initial conformations
    For the purposes of the test cases used in this paper, the starting position and orientation of the ligand were randomised within the box defining the active site. All torsion angle docking variables were also randomised. Some algorithms require only one such starting position (SA and TS); others (the GA and EP) require a population of them. The software also provides the option of a user-specified starting conformation.

    Simulated annealing
    The simulated annealing algorithm [27,28] follows the scheme illustrated in Fig. 3. Within our implementation, the perturbations required to generate new solutions are random numbers drawn from either the Gaussian or the Cauchy distribution, at the choice of the user. The use of the Cauchy distribution was motivated by the fast simulated annealing algorithm of Szu and Hartley [29]. Both options were tried for a number of test cases and, since no particular advantage was found for the Cauchy distribution, the Gaussian distribution was used for the examples cited in this paper. Perturbations to angular variables were forced to lie in an appropriate domain ((−π, π] for torsion angles and [0, π] for valence angles) by translating any out-of-domain values through multiples of the domain size. The size of the perturbations generated depends on the width of the generation distribution (the standard deviation for the Gaussian, the semi-interquartile range for the Cauchy). Within the algorithm this width is set as a proportion of the size of the allowed domain for each variable (e.g. a proportion of 2π for torsion angles in the ligand), and this proportion is the same for all variables. For the variables B1, T1 and V1, the width was set with respect to a maximum allowed translation equal to the length of the longest side of the bounding box, rather than the size of the allowed domain, using scaling for the angular variables as described in the Docking variables section. The initial width of the generation distribution is a user input (a value of 0.05 was used for the examples cited in this paper). An option exists to scale this width down linearly with temperature, resulting in smaller perturbations being used at lower temperatures; however, this was found to provide no significant advantage and so was not used.

    The algorithm is driven by a user-specified cooling schedule comprising a set of monotonically decreasing temperatures and a number of trials at each temperature. The effectiveness of the algorithm in optimising the system energy is strongly dependent on the cooling schedule. The examples cited in this paper all used the same cooling schedule: $T_1$ = 4[?]966 and $T_i = 0.88\,T_{i-1}$ for $i = 2, \ldots, 20$, with 10 000 trials at each temperature (20 × 10 000 = 200 000 energy evaluations). The value of the Boltzmann constant used was 0.001986 kcal/(mol K). The temperatures could only be considered 'real' if the unit of the energy function were kcal/mol, which is not the case for the energy function used in this paper.

    TABLE 2
    PAIRWISE INTERACTIONS IN THE DOCKING POTENTIAL FUNCTION

                   Donor   Acceptor   Both   Non-polar
    Donor          S       HB         HB     S
    Acceptor       HB      S          HB     S
    Both           HB      HB         HB     S
    Non-polar      S       S          S      S

    HB: hydrogen bond interaction; S: steric interaction. Protein atom types are listed horizontally and ligand atom types vertically.

    1. Generate starting point, either the same as the input solution or by randomising the docking variables
    2. Begin loop over the number of temperatures
       (a) Set current solution to the best from the previous temperature
       (b) Scale down width of generation distribution according to temperature, if option is chosen
       (c) Begin loop over the number of trials
           (i) Generate new solution by perturbing current solution
           (ii) Evaluate energy of new solution
           (iii) Decide whether or not to accept the new solution using the Metropolis criterion
           (iv) If accepted, set current solution to new solution
           (v) Update best solution at this temperature if necessary
       (d) Update best solution overall if necessary
    3. Output best solution found
    Fig. 3. Simulated annealing algorithm.

    Evolutionary programming
    The EP algorithm follows the framework given in Fig. 4.

    1. Create an initial population of solutions
    2. Evaluate the fitness of each population member using the energy function
    3. Create offspring from all parents, without selection, using the mutation operator
    4. Evaluate the fitness of the offspring
    5. Loop over all offspring for tournament selection
       (a) Randomly choose N other offspring as opponents for the tournament
       (b) Score a win each time the chosen offspring is more fit than its opponent
       (c) Rank this offspring by the number of wins in its tournament
    6. Select top-ranking solutions as the new population
    7. If the user-defined number of generations is exceeded, stop; else go to 3
    Fig. 4. Evolutionary programming algorithm.

    Each individual in the population is represented by a pair of real-valued vectors. One vector stores the docking variables described in the earlier section and the other holds parameters guiding self-adaptive mutation (vide infra). In EP, offspring are created from parents by mutation. Traditionally, a mutation operator based upon Gaussian random numbers has been used, but recent work by Yao and Liu [31] suggests that more rapid convergence can be obtained by using Cauchy random numbers instead. Furthermore, good results have been found using self-adaptive mutation parameters [32], which allow the mutation to mould itself to the search as it proceeds [30]. In PRO_LEADS, self-adaptive mutation is always used, and Cauchy random numbers have been investigated as an alternative to the traditional Gaussian operator.

    Following Saravanan et al. [32], self-adaptive mutation of a parent (x, σ) to an offspring (x′, σ′) can be formulated thus:

    $$\sigma_i' = \sigma_i \exp\left(\tau' N(0,1) + \tau N_i(0,1)\right) \qquad (2)$$

    and

    $$x_i' = x_i + \sigma_i' N_i(0,1) \qquad (3)$$

    where x is the vector of docking variables and σ is the associated vector of mutation parameters. N(0,1) is a normally distributed random number with a mean of 0 and a standard deviation of 1; N_i(0,1) is drawn afresh for each component i. The parameters τ and τ′ are commonly set to $(\sqrt{2\sqrt{n}})^{-1}$ and $(\sqrt{2n})^{-1}$, respectively, where n is the length of the vector x. For fast evolutionary programming, Yao and Liu [31] suggest that Eq. 3 be replaced by

    $$x_i' = x_i + \sigma_i' \delta_i \qquad (4)$$


    where is a Cauchy andomnumbervariable. ollowingthcsc uthor\ thc Cirussi4ll crlurbutionofthc nr tutioDparameters asbeenmaintained. t may b that therearcbette! schemes or use with Cauchy mutation; this isbeing nvestigated[31].For all the EP exprimentsescribedn this paper,populationsizeof 2000 ndividuals was used and theevolutionary search took placc over 50 generations Ineachgetreration,every parent gaverise to two childrenand thc number of competitors in tbe selectioo ourna-ments wasset to five. lnitial testsshowed hat usincCauchy atherthan Gaussiannndom numbers icrmuta-tron gavesuperior results and so all EP runs used theformer distribution, The initial value of the mutationparamctero wasstat 0.075 or all the docking variablesexceptfor two of the rigid body variables, or which itwasscaledas describodn the Docking variablessction.TbbuseqchThe modem form of tabu (or taboo) soarch s due toGlover [33,34]and *as originally applied to problems nthe feld of operationsrescarch.Morc recently,howeve4tabu searchhas begun o athact atlqrtion a$ an offectivheuji_stiJ arch procedurc or combinatorial optimisalionproblems.iDmolecular design,suchas the evaluation ofthe chemicaldistancebetwcen wo molccules[35].Otherworkers in the molecular design field have employedrelated aoncpts 36-38], but, to ouI krror',/ledge,hispapr reports the 6rst application of tabu search o thedockingproblem.As its name suggestq abu search s concemedwithirnposing cskictionsto eoabtexarcn processo Degoti-ate otherwisedimcult rggiqDs33].These estrictions akethe form primarily of a dr4 /rr, whiqhstores a 4urnberofpreviouslyyisited solutioos or regionsof space By pre-vetilg the search rom rcvisiting these regions (exrptunder spcialconditioDs,vidc infra), the explqration ofthe scanch pace ao be encouraged.Our implcmeutationof tabu search or moleculardocking is presentedn Fig.Tabuscarchmaintainsonly oncur_ntsolutionduringthe courseof a searchand the initial solution ic chosn(vide supm) at the start of the 
run. From this currentsolutiotr, a use_r-defircd-umber of .movA-rT'g;;;ratcdbya mutatiolJike proceduren whichGaussianor Cauchyrandom variablesare adddto cach of the docking vari'-ablesn thecuflentsolution,Eachof thsemo]es s thenscorcdusiog he energy unction 4nd they are thcn rankd

    in ordcr, with the bst moveat the head of the iist. Themove!3ryl4aminEd i&rank order. Moves are considered'tab!' if they gleratesolutionswhich are not sumcientlvdifferent rom those oiutionsn the abu ist.The thresh-old measureused in this work to determine the tabusratus or otherwise of potential movcs is a rool-ElqaqsquqryGLns) measuredover heavyatoms) of 0.?5 A or

    215less etweenhe wo solutionsbeingcompared. he high-cst raDkingmovc(titbu ()r D()t) s itlwaysactcptcd f ilsenergy.s.lower han. he lowestenorgy o far. Otherwisethe algorithm chooses hc best non-tabumove, If neitherof these riteriacan be met, the algoritbm erminatssI[ 4-few-curreltsolutioncan be found, t is added o .,the tabu list. Eqfly_n the search, olutionsqr.q !4ply-addedo theendof the ist until it is full. Thercafter,hecurfentsolutionmust rcplacean existing olutionstoredin the tabu list. In PRO_LEADS, the tabtr ist is.Danagedin_4_:fust,in,.fint ul (FIFO).m4nnerwith_the urenrsolution replaEingthe iabu solution havjrg. the lougesrrcsidencen tbe list, We have also experimentodwith anenclgy-basgdupda.ling criterion io which the curentsolution replaceshesOlution of lowcstenergy n the rabulist, but tcsts havc showo that it offeF rrQBarticular ad-vantageover the traditional FIFO updatingprocedure"o4!J&.nqw-c,!,rrcnt .galutionhasbn dentificd andstored, a new seJ.qfmoves is generated rom it and thesearg progdure_gpntinues with Uc next itemtion. A

    turther mechanismwhich helpssearchexplorationhasalso been mpleinnted: [, after a numberof itcrations of-the aboveprocedurg it is observed hat the bst solutionhas not changed,then the t4bu sealch is randogrly rc-started aJa new position in the soarchspac.While this

    1, Create initia-l_s-qlulio'| s spqcifiedor a! random.Make his he cunentsolutionEvatu?tqculrentsolutlon. f thecurrent olutionis the begt so far, record itUpdateTabu ist(a) lf tabu ligt s not tutl, addcunentsolution o tist(b) Else,replaceotdestmgmber9l tist with curentsolutionGonerate ndevaluateN possiblemoves romth_djnntsolutionRankN posslblemores in ascending rderofenergyExsmlng:hemoys jn rank order(a) lf-qloig h s.lgwgrenergy ian best so ter,accept t andgo to 7(b) ll move s not Tabu,accpt t and_gea7(c) lf no acceptablemovesare ocated,e]mln-ats algorithmtf the itg|alio! lttnit has been reached,xit_withthebest solution ound. f Qe bestsolutionso hrhas nol changed or a givennumberof itemtions.restarthewholeprocedurego o 1),Otherwise,goto2

    7.

    Fig.5. Tabu earch lgorithm.

    I

  • 8/6/2019 A Comparison of Heuristics Search Algorithms for Molecular Docking

    8/18

    ' ff 1q,ffi:,:H",Ti"":l::ff:ilffiflT:(b) lf population-asconvergedr maximum um_Der f genetic perationsas beenexceeded,(c) S,electwo parentsolutions y routettewheelprocedure(d) lf^(Crossover),producehir'ochildrenoy oneponl crossover{i) Chooseandom ositionn docking ari-ables(ii) Divide arents t thisooint(iii)Obtin hildren y taking ombrnrngirstp|ece f one parentwithsecondpieceofolnerDarenl

multiple restart procedure is not part of the classical tabu search, it has been shown to help the search escape from local minima in our studies. The tabu search continues for a user-defined number of iterations. At the end of this time, it terminates and returns the best solution found during the search. In all the tabu search experiments described in this paper, the search was allowed to proceed for 2000 iterations. In each iteration, 100 moves were generated using Cauchy mutation with a fixed σ value of 0.075. The length of the tabu list was 2[?], and the random restart was initiated if the best solution had not changed after 100 iterations.

Genetic algorithm

A detailed account of genetic algorithms can be found in [...]. The general framework for our genetic algorithm is outlined in Fig. 6. The algorithm is implemented in 'steady-state' form, i.e. the same population of solutions is continually updated, and there is no concept of a generation. Many genetic algorithms are implemented with a binary encoding, i.e. the variables are encoded in a bit string. We decided instead to allow the genetic operators to act directly upon the string of real docking variables described in the Docking variables section. The GA makes use of two operators, crossover and mutation. Crossover acts upon two parent solutions and produces two new solutions, called children. The mutation operator acts upon a single solution. The probability of each operator occurring is controlled by the user. For the work in this paper, the crossover probability is 0.5 and the mutation probability is 0.5. However, in PRO_LEADS, mutation will always occur if there has been no crossover, in order to maintain the genetic diversity of the population.

Selection of parents at each step follows the roulette wheel method [39]. Each population member is assigned a raw fitness value, given by the difference in energy between that solution and the solution of maximum energy within the population. The raw fitness values are then scaled linearly. The scaling preserves the average fitness and makes the maximum fitness MaxScaleParam times greater than the average. If this scheme results in negative scaled fitness values, the latter criterion is dropped and the lowest fitness is set to zero. When the fitness values have been calculated, each solution is assigned a section of a roulette wheel of size proportional to its fitness, and this wheel is spun to select parents. The point of scaling the fitness values is to vary the selection pressure used by the algorithm. With a large value of MaxScaleParam, the selection pressure is very strong and the fittest individuals have a very high probability of selection. This tends to lead to low genetic diversity in the population and subsequent trapping in a local minimum. Moderate values of MaxScaleParam are therefore usually used; 1.2 is used for the applications in this paper.

Random search

The random search (RS) procedure simply generates a random conformation and orientation of the ligand. The only constraint is that the ligand's centre of mass must lie within the bounding box specified by the user. In each of the RS docking runs, this was repeated 200 000 times. The algorithm then terminated, returning the lowest energy solution found during the search.

Fig. 6. [...] (the conditions Crossover, Mutate and Accept [...])
1. Generate an initial population of solutions
2. [...]
   (d) Otherwise, copy parents to children
   (e) Loop over children and if (Mutate) apply random mutation
       (i) Choose a docking variable at random
       (ii) Add a random number from a Gaussian distribution to the docking variable. The width of the distribution is 0.1 of the domain size of the variable
   (f) Replace the least fit population member if the child's energy is lower. Go to 2(a)
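The steady-state loop outlined in Fig. 6 can be sketched as follows. This is a minimal sketch, not the PRO_LEADS code. The parameter values follow the text: crossover and mutation probabilities of 0.5, MaxScaleParam 1.2, and a mutation width of 0.1 of the domain. Several elements are hypothetical because the source does not specify them: the one-point crossover operator, the population size, the uniform initial population, reading the Gaussian "width" as its standard deviation, and all function names.

```python
import random


def scaled_fitness(energies, max_scale=1.2):
    """Linearly scaled fitness values for roulette-wheel selection.

    Raw fitness is the energy gap to the worst (highest-energy) member, so a
    lower energy gives a larger raw fitness. The scaling f = a * raw + b keeps
    the mean fitness and makes the best member max_scale times the mean; if
    that would produce a negative value, the maximum criterion is dropped and
    the lowest fitness is pinned to zero instead (mean still preserved).
    """
    e_worst = max(energies)
    raw = [e_worst - e for e in energies]
    mean = sum(raw) / len(raw)
    f_max, f_min = max(raw), min(raw)
    if f_max == f_min:  # identical energies: give every member the same sector
        return [1.0] * len(raw)
    a = (max_scale - 1.0) * mean / (f_max - mean)
    b = mean * (1.0 - a)
    if a * f_min + b < 0.0:  # fall back: keep the mean, lowest fitness -> 0
        a = mean / (mean - f_min)
        b = -a * f_min
    return [a * f + b for f in raw]


def roulette_select(fitness, rng):
    """Index of one member, chosen with probability proportional to its fitness."""
    return rng.choices(range(len(fitness)), weights=fitness, k=1)[0]


def gaussian_mutate(child, domains, rng, width_fraction=0.1):
    """Add Gaussian noise to one randomly chosen docking variable.

    The standard deviation is width_fraction times that variable's domain
    size (treating the 'width' as a standard deviation is an assumption).
    """
    mutant = list(child)
    i = rng.randrange(len(mutant))
    lo, hi = domains[i]
    mutant[i] += rng.gauss(0.0, width_fraction * (hi - lo))
    return mutant


def one_point_crossover(p1, p2, rng):
    """Hypothetical crossover operator; needs at least two variables."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def steady_state_ga(energy, domains, pop_size=50, n_evals=2000,
                    p_cross=0.5, p_mut=0.5, max_scale=1.2, seed=0):
    """Steady-state GA: one population, updated in place, no generations."""
    rng = random.Random(seed)
    # 1. Initial population, sampled uniformly within each domain (assumed).
    pop = [[rng.uniform(lo, hi) for lo, hi in domains] for _ in range(pop_size)]
    energies = [energy(x) for x in pop]
    evals = pop_size
    while evals < n_evals:  # may overshoot the budget by one evaluation
        fitness = scaled_fitness(energies, max_scale)
        p1 = pop[roulette_select(fitness, rng)]
        p2 = pop[roulette_select(fitness, rng)]
        crossed = rng.random() < p_cross
        children = (one_point_crossover(p1, p2, rng) if crossed
                    else (list(p1), list(p2)))  # otherwise copy parents
        for child in children:
            # Mutation is forced when there was no crossover, to keep diversity.
            if not crossed or rng.random() < p_mut:
                child = gaussian_mutate(child, domains, rng)
            e = energy(child)
            evals += 1
            worst = max(range(pop_size), key=energies.__getitem__)
            if e < energies[worst]:  # replace least fit member if child is better
                pop[worst], energies[worst] = child, e
    best = min(range(pop_size), key=energies.__getitem__)
    return pop[best], energies[best]
```

For example, `steady_state_ga(lambda v: sum(c * c for c in v), [(-5.0, 5.0)] * 3)` minimises a toy quadratic "energy". In docking, `energy` would instead be the scoring function evaluated on the ligand's position, orientation and torsions. Setting `n_evals=200_000` would mirror the evaluation budget used in the comparison.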

    Local minimisation

A local minimiser which uses the Powell algorithm [40] has been implemented in PRO_LEADS. This is a non-derivative algorithm designed to move the solution to the nearest local minimum. It can optionally be used as a final-stage minimisation of the lowest energy conformation found by any of the search algorithms.

Basis of comparison between algorithms

In this work, the primary concern is the relative performance of the various algorithms. All the heuristic algorithms contain a stochastic element, and so produce different results depending on the starting value of a random number seed. It is therefore necessary to assess performance statistically over a sufficiently large number of independent trials. To ensure a fair comparison, each algorithm was limited to a maximum of 200 000 (±1%) function evaluations per docking. This number was chosen to be large enough for most algorithms to achieve a reasonable success rate. It also kept the CPU time requirement short enough to permit many independent trials to be made.

The first and most straightforward criteria we use in comparing the algorithms are the characteristics of the energy distribution of the results generated from these trials. These characteristics are the average energy of a solution and the width of the distribution around this average value. Consider a simple case in which the energy surface has a single minimum that is much deeper than any other minimum. Here an ideal algorithm would be expected to produce an average energy close to the value of this minimum, with a narrow distribution of results around it, because most trials locate this deep minimum.

The energy surface is rarely as simple as the ideal case outlined above. Frequently there are a number of deep minima of very similar energy ('competing minima'). In such a case, the characteristics of the energy distribution are still useful quantities. However, they do not always reveal all the differences between the algorithms that are present in the results, and can sometimes even be misleading. When there are competing minima, it is useful to classify the solutions produced by the algorithms according to the minimum to which they correspond. One can then study how the solutions of each algorithm are distributed amongst the various minima. A useful quantity to aid this analysis is the rms distance of the docked ligand conformation from the conformation it adopts in the crystal structure, i.e. the distance from the 'correct' answer. A scatter plot of rms against energy for all the solutions produced usually reveals a number of clusters of solutions. Each cluster corresponds to a given binding mode of the ligand and identifies a distinct minimum in the energy function. An examination of such scatter plots often reveals interesting algorithmic characteristics that would not be apparent from the energy distribution alone, as will become clear in the Results section. In many cases, using the energy function chosen for this study, we find that the deepest minimum located by any of the algorithms corresponds to the crystal structure. With this in mind, we also compare the algorithms according to their 'success rate'. Following Gehlhaar et al. [4,15], this is the proportion of trials that find a solution within 1.5 Å rms (heavy atoms only) of the crystallographic ligand conformation.
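The success-rate criterion can be made concrete with a short sketch. It makes several illustrative assumptions, none of which come from PRO_LEADS. Docked and crystallographic heavy atoms are supplied as matched coordinate lists in the receptor frame, so no superposition is applied. Symmetry-equivalent atoms are not handled. "Within 1.5 Å" is read as rms ≤ 1.5 Å. The function names are also illustrative.

```python
import math


def heavy_atom_rms(docked, crystal):
    """RMS distance (Å) between matched heavy-atom coordinates.

    Assumes both lists contain the same heavy atoms in the same order and
    share the receptor's frame, so no superposition is performed.
    """
    if not docked or len(docked) != len(crystal):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(docked, crystal))
    return math.sqrt(total / len(docked))


def success_rate(rms_values, cutoff=1.5):
    """Percentage of docking trials whose final pose lies within cutoff Å rms."""
    hits = sum(1 for r in rms_values if r <= cutoff)
    return 100.0 * hits / len(rms_values)
```

For instance, `success_rate([0.5, 1.5, 2.0, 4.8])` returns `50.0`. For a docking run, `rms_values` would hold one value per independent trial, e.g. 500 per algorithm.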

A preliminary test was carried out to decide on appropriate statistical methods for assessing the results. It revealed that, in general, the distribution of results deviates significantly from the normal distribution, as might have been predicted from the expected form of the energy surface. With this in mind, the median and semi-interquartile range were preferred as descriptive statistics to the more common mean and standard deviation. When comparing the heuristic algorithms, the main quantity considered was the median of the distribution of best energies obtained over 500 independent trials. This number of trials was chosen because it provides a very good estimate of the median energy and a good estimate of the more variable semi-interquartile range. The minimum energy solution found over the 500 trials is also reported for each of the heuristic algorithms. Because the RS procedure is intended only as a control, statistics for this algorithm were gathered over 100 independent trials. In view of the deviations from the normal distribution, a non-parametric method was chosen to assess the statistical significance of these comparisons. The method of Gardner and Altman [41] was used to compute a 95% confidence interval for the difference between the two medians. A significant difference between two algorithms was taken to exist if zero fell outside this interval. As with all significance tests, this simply tells us whether an observed difference may be considered 'real', i.e. unlikely to have occurred by chance. Small differences can, of course, be statistically significant and yet be of no practical significance.

In general, the local minimiser is used to refine the best solution at the end of each docking run. However, for one of the test cases (1HVR), we decided to examine the relative performances of the algorithms both with and without this final refinement. This enables an assessment of the benefit that the final stage of local minimisation brings to each algorithm.

Test cases

The test cases chosen for this paper are given in Table 1. All are topical test cases in CAMD: inhibitors of each associated enzyme are on the market, or in clinical trials, as therapeutic agents for important diseases. Dihydrofolate reductase-methotrexate (DHFR-MTX) was chosen because in recent years it has become a standard test for docking algorithms, so it serves as a useful benchmark for our results. Influenza virus neuraminidase-DANA was chosen as a test case in which electrostatic and hydrogen bond effects are thought to dominate recognition. HIV-1 protease-XK263 was chosen because here lipophilic interactions are particularly important for good binding (steric fit). The two thrombin examples were chosen because the inhibitors make both strong lipophilic and strong electrostatic interactions, in different but similarly sized binding pockets. The thrombin examples therefore provide a good test of the ability of the docking potential function to differentiate between the two types of interaction.

Fig. 7. Ligands used in the docking studies in this paper, with rotatable bonds indicated: (a) methotrexate; (b) DANA; (c) XK263; (d) NAPAP; (e) argatroban.

All of the test cases were prepared in a similar fashion. The crystal structures were extracted from the Brookhaven Databank [42], and hydrogen atoms were added using the InsightII/Discover software [43]. The ligand structures were minimised prior to docking using the CVFF force field [43]. No minimisation of the receptor was performed, since the potential function used in this work does not require accurate hydrogen atom positions. In all cases, the bounding box defining the active site permitted the ligand's COM to move up to 2.0 Å along each axis from its crystallographic location. The total volume of the box constraining the COM was thus (4.0 Å)³ = 64 Å³.

The treatment of crystallographic water molecules in docking and other molecular design studies is not a simple problem, and has recently been the subject of a detailed computational study [44-46]. One approach is to remove crystallographic water molecules before attempting to dock the ligand; see for example Refs. 13, 6 and 18. These groups reported that they were still able to obtain correct docked conformations in their test cases. Other workers, however, found that removing the crystallographic waters made it necessary to include a continuum solvation model before good results could be obtained [7,44]. In our test cases, removing the crystallographic waters significantly impaired the ability of the docking algorithm to locate the crystallographic binding mode for 1NSD, 1ETR and 3DFR. For instance, with a 'dry' active site for 3DFR, the best success rate observed was about 50%; on inclusion of all the water molecules, this rose to over 80%. Clearly, water molecules in the active site help to stabilise the crystallographic conformation by imposing additional steric and electrostatic constraints.
In the light of these findings, and in view of the nature of this paper, which is primarily concerned with investigating algorithmic performance, we have retained all the crystallographic waters present in our test cases.

Results

Figure 7 shows the structures of the ligands used in the study, together with the bonds that are considered rotatable. Table 3 gives the results for the different search algorithms on the test cases. The most obvious result is that, of all the algorithms, RS performs very poorly. The high median energy and low success rate produced by RS reflect the fact that RS is ineffectual in a search space of this size. They also indicate the advantage of the more 'intelligent' algorithms. The results for the other algorithms on the individual test cases are considered below.

TABLE 3
DOCKING RESULTS FOR THE TEST CASES GIVEN IN TABLE 1

PDB code | Algorithm | Minimum energy | Median energy | Semi-interquartile range | Success rate (%)

Rows, in order: 3DFR, 1NSD, 1HVR, 1HVR', 1ETS and 1ETR, each with SA, EP, TS, GA and RS.

Minimum energy: -163.15-164.61-164,69-167.64-t37.4-103.88-105.60-104.71-105.4342.75-177.31-t75.24-t16.52-154.86-t59.58-168.55-15t.66-63.50 -ll8.0l-139.76-t39.4-tM.o0-112.68-138.46-140.52-138.68-140.85-101.45

Median energy: -t5t.62-152. l3-t50.13-151.96-82.15-93.04-98.3546.78-98.75-71.18 -158.40-155.02-156.56-1s6.44-'15.41-111.40-143.99-1J5.74-152.932t.92-t t 5.76-l17.07-120.13-1t8.39-71.53-88.86-87.52-88.60-52.38

Semi-interquartile range: 5.948.086.73'1.5422.633.t22.372.561.846.955.598.82't.669,4324.775.059.311.1910,0626.095.415.214.72l7. l l t6.978.9814.018.909.23

Success rate (%): 90 ,16939 4064885166554585926l4850510 l98II2302l39t33

Energies are in arbitrary units, and statistics are derived from 500 independent docking attempts for each algorithm, except RS (100 attempts). For comparison, the energies of the minimised crystal conformations are -141.?6 (3DFR), -100.04 (1NSD), -149.55 (1HVR), -132.48 (1ETS) and -98.89 (1ETR).
' These results were obtained without using local minimisation of the best solution.

Dihydrofolate reductase-methotrexate

The results in Table 3 show that the GA gives the best performance for 3DFR in terms of median energy. The differences in median energy between EP, SA and TS are not statistically significant. Success rate is not well correlated with median energy. The most successful algorithm in terms of energy (the GA) is actually the joint worst of the four 'intelligent' algorithms in terms of success rate. This may indicate that the GA is more prone than SA or TS to becoming trapped in local energy minima that represent conformations more than 1.5 Å rms from the crystal conformation; SA and TS both perform very well on this test case. It is worth noting that this test case was used in the parameterisation of the energy function [4]. The function might therefore be expected to have a single deep minimum near the crystal structure and to yield good success rates. The scatter plot of energy versus rms from the crystal structure for the GA (Fig. 8) adds some weight to this hypothesis. It shows that the majority of solutions found by the GA with energy below -150 are indeed close to the crystal structure. Nonetheless, the plot clearly shows some low-energy solutions with rms values of more than 2.5 Å, representing suboptimal minima on the energy surface.

Neuraminidase-DANA

The results for 1NSD show that the GA again performs best in terms of median energy, although the difference between EP and the GA is not statistically significant. The success rates are somewhat lower than those observed with DHFR-MTX, except for TS, which continues to perform very well. This points to a more complicated energy surface for this case, possibly with more than one minimum of similar depth to the one corresponding to the crystal structure. This suspicion is confirmed by the scatter plot in Fig. 9, which clearly shows that there is no simple linear relationship between energy and rms from the crystal structure. The figure shows that TS finds two dominant clusters of low-energy solutions. One lies close to the crystal structure, with energies in the range -87 to -105 units. The other has rms values in the range 4.5-5.0 Å, with energies ranging from -88 to -99 units. The lower success rates can be attributed in part to this latter minimum, in which each algorithm becomes entrapped in some proportion of its docking attempts. The results in this case suggest that TS is less susceptible to this entrapment and may therefore be carrying out a more effective global search than the other algorithms.

As an aside, it is interesting to consider the two observed binding modes in more detail. In the crystallographic conformation, the carboxylate group of DANA forms a salt bridge with Arg[?] and also hydrogen bonds with Arg[?] and Arg[?]: clearly a very strong and specific interaction. The carbonyl of DANA's acetylamino group forms a hydrogen bond with Arg[?], and the methyl moiety of the acetylamino group makes hydrophobic contact with Trp[?] and a further residue [?]. The alternative binding mode discovered by the docking algorithms is almost inverted with respect to the crystallographic conformation. The salt bridge is not formed; instead, the carbonyl of DANA's acetylamino group forms a hydrogen bond with Arg[?]. Most of the remaining interactions in this mode are hydrophobic. We believe that this latter binding mode is in fact an artefact of the potential function. The function seems to be biased in favour of steric fit and appears not to reward specific interactions, such as the carboxylate-arginine salt bridge, sufficiently. As a result, the energy separation of the two minima is very small (about six units). This gives some of the algorithms a greater tendency to be trapped in the higher energy minimum than they would have if the potential function were altered to reward salt bridges more highly and so widen the separation.

HIV-1 protease-XK263

A point to note concerning this system is that the complex as deposited in the PDB contains no water molecules. In complexes with peptidomimetic HIV-1 protease inhibitors, an active-site water molecule mediates contact between inhibitor and enzyme. Here that water is displaced by the carbonyl group of the urea, which interacts directly with the active-site residues Ile50 and Ile50'. SA performs best according to the median energy criterion. The difference between SA and the second-placed algorithm (the GA) is statistically significant, although the differences in median energy between the GA, TS and EP are not. In this test case, in contrast to the two previous examples, success rate and median energy are better correlated: SA performs best and EP worst on both counts. The scatter plot of the results produced by SA (Fig. 10) shows this correlation well. It is also noticeable from this figure that a number of low-energy solutions lie in the range 1.5-2.0 Å rms. This explains why the success rates for this test case are somewhat lower than those for DHFR-MTX. The greater spread of good solutions around the crystallographic conformation reflects the less directional nature of binding of parts of this large molecule in an active site dominated by lipophilic contacts. For example, the exact orientation of the naphthyl and phenyl rings has a greater effect on the calculated rms than on the value of the energy function.

Fig. 8. Scatter plot of heavy atom rms versus energy (arbitrary units) for 500 docks of methotrexate into DHFR using the GA.

Fig. 9. Scatter plot of heavy atom rms versus energy (arbitrary units) for 500 docks of DANA into influenza virus neuraminidase using TS.

Table 3 also gives the performance of the algorithms for 1HVR when the local minimiser is not employed. First, the results from each algorithm are improved in median energy by local minimisation. The size of the improvement varies as RS (97.53) > SA (21.00) > TS (20.82) > EP (11.03) > GA (3.51). This variation can be interpreted as an indication of the effectiveness of the local searching performed by each algorithm. Clearly, the population-based evolutionary algorithms are more effective local searchers than our implementations of SA or TS. These improvements are both significant and computationally inexpensive. Local minimisation typically requires only a few hundred energy evaluations, compared with the 200 000 permitted for the main algorithms. This makes the local minimiser a useful adjunct to the heuristic search algorithms. Second, in terms of success rate, the performance of the four algorithms is little affected by the absence of the local minimisation procedure. This again indicates that the crystallographic minimum is broad, with a wide range of energies possible within the 1.5 Å rms cutoff.
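The comparisons in these results rely on the statistics set out under Basis of comparison. Examples include SA's significantly lower median energy than the GA for 1HVR, and the effect of local minimisation on median energy. Those statistics are medians, semi-interquartile ranges, and a non-parametric 95% confidence interval for the difference in medians. The sketch below uses the common large-sample construction based on all pairwise differences (the Hodges-Lehmann/Mann-Whitney interval). We assume this is the Gardner and Altman method [41], and the rule for rounding K is likewise an assumption. Strictly, the interval is for the shift between the two distributions.

```python
import math
import statistics


def median_and_siqr(values):
    """Median and semi-interquartile range, (Q3 - Q1) / 2."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), (q3 - q1) / 2.0


def median_difference_ci(x, y, z=1.96):
    """Approximate 95% CI for the shift between samples x and y (x minus y).

    Sort all n*m pairwise differences and take the K-th smallest and K-th
    largest, with K = n*m/2 - z*sqrt(n*m*(n+m+1)/12) rounded to the nearest
    integer (large-sample approximation).
    """
    n, m = len(x), len(y)
    diffs = sorted(a - b for a in x for b in y)
    k = math.floor(n * m / 2 - z * math.sqrt(n * m * (n + m + 1) / 12) + 0.5)
    k = max(k, 1)  # very small samples: fall back to the full range
    return diffs[k - 1], diffs[n * m - k]


def significantly_different(x, y):
    """True if zero lies outside the confidence interval for the difference."""
    lo, hi = median_difference_ci(x, y)
    return not (lo <= 0.0 <= hi)
```

With 500 trials per algorithm, `median_difference_ci` sorts 250 000 differences, which is cheap. An exact small-sample version would take K from Mann-Whitney tables instead of the normal approximation.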

Thrombin-NAPAP

TS and the GA produce the lowest median energies for 1ETS, and the difference between them is not statistically significant. In terms of successful docking, this example produces the lowest rates in the whole test set. However, these figures are somewhat misleading, since the lowest energy solutions are not considered successful by our criterion, as will now be explained. The scatter plot of energy versus rms for the TS results is shown in Fig. 11. It indicates that the algorithms produce at least three major clusters of solutions. The first cluster is close to the crystallographic minimum. That minimum can be characterised by three contacts of NAPAP with thrombin: the piperidine moiety with the lipophilic 'P' pocket, the naphthyl moiety with the lipophilic 'D' pocket, and the benzamidine moiety with an aspartate residue at the bottom of the S1 subsite. A second cluster of solutions occurs at about 3.5 Å rms from the crystal structure. It contains some solutions of slightly lower energy than the first cluster. From the standpoint of minimising the global energy of our scoring function, this is probably the 'correct' solution. This explains why tabu search does relatively poorly at docking NAPAP 'successfully'. Tabu search locates the second cluster 16% of the time, more often than any of the other algorithms; the corresponding rates for the GA, EP and SA are 6%, 7% and 9%, respectively. The second cluster is characterised by the 'incorrect' positioning of the naphthyl moiety, which points into solvent and makes little contribution to the score. This positioning is favourable since the 'correct' positioning [...]

Thrombin-argatroban

In this test case there is some correlation between [...]

Fig. 10. Scatter plot of heavy atom rms versus energy (arbitrary units) for 500 docks of XK263 into HIV-1 protease using SA.


Fig. 11. Scatter plot of heavy atom rms versus energy (arbitrary units) for 500 docks of NAPAP into thrombin using TS.
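The NAPAP discussion rests on two per-cluster quantities read off plots such as Fig. 11. The first is how often an algorithm lands in a cluster; for example, TS reaches the second cluster, at about 3.5 Å rms, 16% of the time. The second is how low the energies within each cluster go. A minimal sketch follows. It assumes the solutions are available as (energy, rms) pairs and that clusters can be delimited by rms windows. The window bounds in the usage note are hypothetical; in practice they would be read from the scatter plot.

```python
import math


def summarise_clusters(solutions, windows):
    """Per-cluster statistics for a set of docking solutions.

    solutions: list of (energy, rms) pairs, one per docking trial.
    windows:   {name: (rms_lo, rms_hi)} inclusive rms ranges defining clusters.
    Returns {name: (percentage of solutions in the window, lowest energy there)};
    the lowest energy is math.inf for an empty window.
    """
    n = len(solutions)
    summary = {}
    for name, (lo, hi) in windows.items():
        members = [energy for energy, rms in solutions if lo <= rms <= hi]
        lowest = min(members) if members else math.inf
        summary[name] = (100.0 * len(members) / n, lowest)
    return summary
```

For example, windows such as `{"crystal-like": (0.0, 1.5), "second cluster": (3.0, 4.0)}` would report how often each algorithm reaches each binding mode. They would also report whether the second mode contains lower-energy solutions than the crystal-like one.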

Argatroban is a moderately difficult test case for the algorithms. Although there is a good low-energy minimum, it is quite difficult to locate with the number of function evaluations chosen for this study. For this reason, we decided to rerun the jobs with all the algorithms, increasing the number of rotatable bonds by one each time. Because of the large number of experiments involved, only 100 docking attempts were used to derive the statistics. The order in which the rotatable bonds were activated is indicated in Fig. 13a. This order was chosen so as to approximately guarantee the best performance of the GA for a given number of rotatable bonds. The resulting median energies and success rates are shown in Figs. 13b and c. Introducing the first five rotatable bonds has little effect on the performance of the algorithms. The exception is the random search, which is seriously compromised as additional rotatable bonds are activated. The number of rotatable bonds is therefore a reasonable indicator of the size of the search space for this test case, but a poor indicator of its difficulty. The difficulty is primarily controlled by the presence and character of competing low-energy minima on the potential energy surface. The addition of bonds 6 and 7 evidently introduces and consolidates at least one more competing minimum. In general, the success rates follow the pattern TS > SA > EP > GA. The relative performance of the algorithms by the median energy criterion varies considerably as the test case changes from an easy one to a difficult one, i.e. as competing low-energy minima are introduced. The GA does best in terms of median energy when the test case is easy, but less well when the sixth and seventh rotatable bonds are introduced. We ascribe this to good local searching capabilities but somewhat poorer global searching. EP shows similar but less pronounced characteristics. TS is a poorer local searcher, but it produces the best median energies for seven rotatable bonds. This is presumably because it samples the global energy minimum more readily than the other algorithms. It could be argued that SA shows a similar, though less clear-cut, effect. The success rates and median energies at seven rotatable bonds are not exactly the same as the 500-run results given in Table 3. This underlines the need to carry out large numbers of runs for comparisons of this type.

Discussion

The clearest conclusion that can be drawn from our results is that RS is always outperformed by the other, more 'intelligent' algorithms. This was, of course, to be expected given the size of the search spaces involved; nonetheless, RS does provide a good 'control' for our results. Drawing conclusions about the other four algorithms is more difficult. An immediate observation is that all of them benefited, albeit to differing degrees, from hybridisation with the Powell local optimisation algorithm. Turning to the comparison of median energies, the GA was in (joint) first place in three of the five test cases, and it was never worse than joint second. On the basis of this criterion, the GA may therefore perhaps be judged the 'best' algorithm. Suppose points are awarded according to ranking by median energy, from 4 for first place to 1 for fourth place. Based on overall performance across the five test cases, the algorithms then rank as follows: [...]
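The points scheme just described can be sketched as follows. The tie rule is an assumption: the text speaks of 'joint' places without saying how points are split, so here algorithms with exactly equal scores share the average of the points for the places they occupy. Under this rule every test case hands out 4 + 3 + 2 + 1 = 10 points, so totals are conserved. The success-rate summary later in this section uses four test cases and gives TS 14 and GA = EP 8. If the authors' scheme also conserves points, the illegible SA total there would be 40 - 14 - 8 - 8 = 10.

```python
def award_points(scores_by_case, lower_is_better=True):
    """Total ranking points per algorithm over several test cases.

    scores_by_case: {case: {algorithm: score}}, e.g. median energies
    (lower is better) or success rates (pass lower_is_better=False).
    In each case the best algorithm gets n points, the next n - 1, and so on
    down to 1. Algorithms with exactly equal scores share the average of the
    points for the places they jointly occupy (assumed tie rule).
    """
    totals = {}
    for scores in scores_by_case.values():
        ranked = sorted(scores.items(), key=lambda item: item[1],
                        reverse=not lower_is_better)
        n = len(ranked)
        i = 0
        while i < n:
            j = i
            while j + 1 < n and ranked[j + 1][1] == ranked[i][1]:
                j += 1  # extend the block of tied algorithms
            share = sum(n - k for k in range(i, j + 1)) / (j - i + 1)
            for name, _ in ranked[i:j + 1]:
                totals[name] = totals.get(name, 0.0) + share
            i = j + 1
    return totals
```

Treating only exactly equal scores as ties is a simplification. Places that differ but not significantly, which the text also calls "joint", would need the significance test to be folded into the grouping step.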

[...] the NAPAP example are not included because, as discussed earlier, the global minimum on the energy surface is not [...] using our criterion. Interestingly, the best result attained by the GA using this criterion is joint second place, behind SA (1HVR-XK263). [...] this result reflects a predisposition on the part of our GA towards entrapment in local minima [...] from the crystal conformation. Conversely, TS produces the best success rate in the remaining three test cases. This suggests that it is able to escape from such minima and locate the 'correct' minimum more often. This capability is probably a result of two features: the random restart in our TS algorithm and the use of a high [...] threshold, which drives the search into new areas of space.

Fig. 12. Scatter plots of heavy atom rms versus energy (arbitrary units) for 500 docks of argatroban into thrombin using (a) the GA and (b) TS.

Fig. 13. (a) Argatroban, showing the order in which the rotatable bonds are activated. (b) Effect on median energy of incrementally increasing the number of rotatable bonds of argatroban. (c) Effect on success rate of incrementally increasing the number of rotatable bonds of argatroban.

To summarise, we apply the same points scoring scheme as for the median energy, excluding the thrombin-NAPAP example. The ranking of the algorithms according to success rate is then TS (14) > SA ([?]) > GA = EP (8).

In passing, it is also interesting to consider the relative efficiencies of the docking algorithms in terms of CPU time per docking attempt, measured on one node of a Convex Exemplar (HP 735 chip). Because the energy evaluation is grid-based, the CPU time per energy evaluation varies linearly with the number of heavy atoms in the ligand. Thus, for TS the fastest docks are with DANA (22[?] s) and the slowest are with XK263 (535 s). Using this slowest test case as a basis for comparison, the ranking from fastest to slowest is SA (530 s) < TS (535 s) < EP (584 s)