B-spline method for energy minimization in grid-based molecular mechanics calculations


DANIEL OBERLIN, JR., HAROLD A. SCHERAGA
Baker Laboratory of Chemistry, Cornell University, Ithaca, New York 14853-1301

Received 19 June 1997; accepted 14 August 1997

ABSTRACT: A method is described for molecular mechanics calculations based on a cubic B-spline approximation of the potential energy. This method is useful when parts of the system are allowed to remain fixed in position, so that a potential energy grid can be precalculated and used to approximate the interaction energy between parts of a molecule or between molecules. We adapted and modified the conventional B-spline method to provide an approximation of the Empirical Conformational Energy Program for Peptides (ECEPP) potential energy function. The advantage of the B-spline method over simpler approximations is that the resulting B-spline function is C² continuous, which allows minimization of the potential energy by any local minimization algorithm. The standard B-spline method provides a good approximation of the electrostatic energy, but to reproduce the Lennard-Jones and hydrogen-bonding functional forms accurately, it was necessary to modify the standard B-spline method. This modification of the B-spline method can also be used to improve the accuracy of trilinear interpolation for simulations that do not require continuous derivatives. As an example, we apply the B-spline method to rigid-body docking energy calculations using the ECEPP potential energy function. Energies are calculated for the complex of Phe-Pro-Arg with thrombin. For this system, we compare the performance of the B-spline method to that of the standard pairwise summation in terms of speed, accuracy, and overhead costs for a variety of grid spacings. In our rigid-body docking calculations, the B-spline method provided an accurate approximation of the total energy of the system, and it resulted in a 180-fold reduction in the time required for a single energy and gradient calculation for this system. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 71-85, 1998

Correspondence to: H. A. Scheraga; e-mail: [email protected]
Grant sponsor: NIH; grant number GM-14312
Grant sponsor: NSF; grant number MCB95-13167

Journal of Computational Chemistry, Vol. 19, No. 1, 71-85 (1998) © 1998 John Wiley & Sons, Inc. CCC 0192-8651/98/010071-15

OBERLIN AND SCHERAGA

Keywords: molecular docking; protein folding; grid approximation; trilinear interpolation; potential energy

Introduction

In chemical simulations such as protein folding and molecular docking, it is important to be able to calculate and minimize molecular potential energy functions efficiently. Speed becomes critical in the treatment of larger systems because the number of pairwise interactions grows quadratically with the number of atoms in the system. To reduce the number of interactions that must be computed, some methods¹⁻⁴ assume that the host molecule (or a portion of it) remains rigid in molecular docking, so that each ligand atom interacts with a static potential field defined by the fixed atoms of the host molecule. It may also be possible to apply these methods to protein folding when parts of a polypeptide chain remain fixed. Hereafter, for illustrative purposes, the discussion will be presented in terms of molecular docking. The potential energy field of the host molecule is computed for points on a 3-dimensional grid before the simulation begins. During the simulation, the double summation of pairwise interactions between ligand and host atoms is replaced by a single sum over ligand atoms, each of which interacts with the field of the host molecule. The potential energy of a single ligand atom can be approximated quickly by using a weighted average of the potential energy at nearby grid points.

In previous works²⁻⁴ the grid approximation was based on trilinear interpolation,⁵ which uses the eight grid points at the vertices of a cubic grid cell surrounding a ligand atom to approximate the potential energy at the location of that atom. One disadvantage of this method is that it does not provide a continuous derivative across the boundaries of the grid cells, which makes the resulting function unsuitable for local minimization algorithms such as Newton-type methods. To use a grid method with these algorithms, a functional form is needed that provides the degree of continuity required by the particular algorithm.

In this work we use a B-spline⁶ method to provide a smooth approximation to the potential energy field of the host molecule. A B-spline is a piecewise polynomial function that is used to parameterize smooth curves or surfaces from a set of discrete data (such as a grid of values of potential energy). B-splines can be constructed from polynomials of any degree, and higher degree B-splines produce smoother curves with a higher order of continuity. In the method presented here, we use a cubic B-spline, which provides a continuous second derivative. The details of the approximation are presented in a later section of this article.

B-splines have been used previously in molecular mechanics calculations for the approximation of Ewald sums.⁷ In the present work we adapted and modified the B-spline method to provide a direct approximation to the Empirical Conformational Energy Program for Peptides (ECEPP) potential energy function, rather than using it to approximate an Ewald sum. ECEPP includes the Lennard-Jones (12-6), hydrogen-bond (12-10), and electrostatic (1/r) functional forms that are common to many empirical potential energy functions. The resulting B-spline function is C² continuous, which ensures that it is suitable for minimization by any Newton-type algorithm. The standard B-spline method provides a good approximation of the electrostatic energy, but to reproduce the Lennard-Jones and hydrogen-bonding functional forms accurately, it was necessary to modify the standard B-spline method for reasons that will be discussed in a later section. The same modification can also be used to improve the accuracy of trilinear interpolation for simulations that do not require continuous derivatives.

As an example, we develop and apply the B-spline method to rigid-body docking using the ECEPP potential energy function. Energies are calculated for the complex of Phe-Pro-Arg with thrombin. For this system we compare the performance of the B-spline method to that of the standard pairwise summation in terms of speed, accuracy, and overhead costs for a variety of grid spacings.

Energy Function

The components of the ECEPP/3 potential energy function⁸ consist of electrostatic, Lennard-Jones, hydrogen-bonding, and intrinsic torsional terms. In this rigid-body docking example, we consider only the intermolecular interactions between the ligand and host molecules. The resulting pairwise summation can be written as

$$F = \sum_{i=1}^{N_a} \sum_{j=1}^{N_b} \left[ \frac{q_i q_j}{|\mathbf{r}_i - \mathbf{r}_j|} + \frac{A_{hk}}{|\mathbf{r}_i - \mathbf{r}_j|^{12}} + \frac{B_{hk}}{|\mathbf{r}_i - \mathbf{r}_j|^{10}} + \frac{C_{hk}}{|\mathbf{r}_i - \mathbf{r}_j|^{6}} \right], \qquad (1)$$

where h is the type of atom i of the host molecule and k is the type of atom j of the ligand molecule. The parameters A, B, and C are determined by the interaction type: either B is zero (for the Lennard-Jones terms) or C is zero (for the hydrogen-bonding terms). N_a and N_b are the numbers of host and ligand atoms, respectively. Because the positions of the host atoms are fixed, we can carry out the sum over these atoms in advance and define the potential energy of a single ligand atom with respect to all of the fixed atoms of the host:

$$F_{el}(\mathbf{r}_j) = \sum_{i=1}^{N_a} \frac{q_i}{|\mathbf{r}_i - \mathbf{r}_j|}, \qquad F_k(\mathbf{r}_j) = \sum_{i=1}^{N_a} \left[ \frac{A_{hk}}{|\mathbf{r}_i - \mathbf{r}_j|^{12}} + \frac{B_{hk}}{|\mathbf{r}_i - \mathbf{r}_j|^{10}} + \frac{C_{hk}}{|\mathbf{r}_i - \mathbf{r}_j|^{6}} \right], \qquad (2)$$

where F_el represents the electrostatic potential energy and F_k represents the Lennard-Jones and hydrogen-bond potential energy for a ligand atom of type k. With these functions defined, we can rewrite the total interaction energy between the host and the ligand as a single sum over all of the ligand atoms:

$$F = \sum_{j=1}^{N_b} \left[ q_j F_{el}(\mathbf{r}_j) + F_k(\mathbf{r}_j) \right]. \qquad (3)$$

When the B-spline method is applied to simulations of flexible molecules, the standard ECEPP torsional terms are calculated in addition to the terms approximated by the B-spline.
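As a check on the algebra, the Python sketch below evaluates eq. (1) directly and then through eqs. (2) and (3) for a hypothetical two-atom host and two-atom ligand. The coordinates, charges, atom-type labels, and A, B, C values are invented for illustration (they are not ECEPP parameters), and, like eq. (1), the code omits any Coulomb constant or dielectric factor.

```python
import math

# Hypothetical toy system: ((x, y, z) in Å, charge, atom type). NOT ECEPP values.
host = [((0.0, 0.0, 0.0), 0.30, "h1"), ((1.5, 0.0, 0.0), -0.30, "h2")]
ligand = [((0.0, 3.0, 0.5), -0.20, "k1"), ((1.0, 3.5, -0.5), 0.20, "k2")]

# (A, B, C) per (host type, ligand type); either B or C is zero, as in the text.
params = {
    ("h1", "k1"): (5000.0, 0.0, -40.0),
    ("h1", "k2"): (8000.0, -900.0, 0.0),
    ("h2", "k1"): (6000.0, 0.0, -50.0),
    ("h2", "k2"): (7000.0, 0.0, -45.0),
}

def pair_term(r, A, B, C):
    """Lennard-Jones / hydrogen-bond part of one pairwise term in eq. (1)."""
    return A / r**12 + B / r**10 + C / r**6

def energy_pairwise():
    """Eq. (1): double sum over host atoms i and ligand atoms j."""
    total = 0.0
    for ri, qi, h in host:
        for rj, qj, k in ligand:
            r = math.dist(ri, rj)
            total += qi * qj / r + pair_term(r, *params[(h, k)])
    return total

def F_el(rj):
    """Eq. (2): electrostatic field of the fixed host atoms at point rj."""
    return sum(qi / math.dist(ri, rj) for ri, qi, _ in host)

def F_k(rj, k):
    """Eq. (2): noncovalent energy of a type-k ligand atom placed at rj."""
    return sum(pair_term(math.dist(ri, rj), *params[(h, k)]) for ri, _, h in host)

def energy_field():
    """Eq. (3): single sum over ligand atoms interacting with the host field."""
    return sum(qj * F_el(rj) + F_k(rj, k) for rj, qj, k in ligand)
```

The two totals agree to rounding error. In the grid method, `F_el` and `F_k` are the functions that get tabulated before the simulation, so only the single loop in `energy_field` remains at run time.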

In the method presented here, we define a box that contains the region of the host molecule where ligand docking will occur. The potential energy functions F are computed at points defined by a 3-dimensional grid bounded by this box. Each grid of potential energy values is computed once before the simulation begins. During the simulation, the values and derivatives of F may be approximated at any point inside the box by combining the values of the potential energy at nearby grid points using the B-spline method. For notation, we designate a grid G of potential energy values as

$$G_{lmn} = F(\mathbf{r}_{lmn}), \qquad (4)$$

where l, m, and n are the discrete indices over the space of the grid. For the ECEPP/3 potential energy, a single grid is computed for the electrostatic potential energy, and N_type noncovalent grids (one for each type of atom in the ligand) are computed for the Lennard-Jones and hydrogen-bonding potential energies. The electrostatic grid represents the electrostatic potential energy of the host molecule, and each noncovalent grid represents the Lennard-Jones and hydrogen-bonding potential energies of a particular atom type with respect to all atoms of the host molecule. Each value in a noncovalent grid therefore includes contributions from all atoms of the host, but the grid is intended for use with a single type of ligand atom. In computing the noncovalent grids for the different atom types, it is necessary to compute the inverse powers of r (the distance between a grid point and a fixed atom in the host) for every pairing of a grid point and a host atom. When computing this set of potential energy grids, it is especially important to implement an efficient algorithm that does not waste time by recalculating the same inverse powers of r for each grid. In Appendix A, we present an algorithm that resulted in a ninefold speedup in the time required to generate the 13 grids needed for the ECEPP function.
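The bookkeeping point above (compute each grid-point/host-atom distance once and reuse its inverse powers for every grid) can be sketched as follows. Appendix A is not reproduced in this section, so this is not the authors' algorithm; it is a minimal pure-Python illustration that assumes a cubic grid with `n` points per axis, spacing `d`, and corner `origin`, and hypothetical coefficient tables `A`, `B`, `C` keyed by (host type, ligand type).

```python
import itertools
import math

def build_grids(host, ligand_types, A, B, C, origin, d, n):
    """Tabulate G_lmn = F(r_lmn) (eq. 4) on an n x n x n grid with spacing d.

    host: list of ((x, y, z), charge, host_type) for the fixed atoms.
    A, B, C: dicts mapping (host_type, ligand_type) -> coefficient (eq. 2).
    Returns (G_el, G_nc): the electrostatic grid and a dict holding one
    noncovalent grid per ligand atom type; grids are dicts keyed by (l, m, n).
    Assumes no grid point coincides exactly with a host atom (r = 0).
    """
    G_el = {}
    G_nc = {k: {} for k in ligand_types}
    for idx in itertools.product(range(n), repeat=3):
        p = tuple(o + i * d for o, i in zip(origin, idx))
        el = 0.0
        nc = dict.fromkeys(ligand_types, 0.0)
        for pos, q, h in host:
            r2 = sum((a - b) ** 2 for a, b in zip(p, pos))
            # Inverse powers are computed once per (grid point, host atom) pair ...
            inv_r = 1.0 / math.sqrt(r2)
            inv_r6 = 1.0 / (r2 * r2 * r2)
            inv_r10 = inv_r6 / (r2 * r2)
            inv_r12 = inv_r10 / r2
            el += q * inv_r
            # ... and reused for the grid of every ligand atom type.
            for k in ligand_types:
                nc[k] += (A[(h, k)] * inv_r12 + B[(h, k)] * inv_r10
                          + C[(h, k)] * inv_r6)
        G_el[idx] = el
        for k in ligand_types:
            G_nc[k][idx] = nc[k]
    return G_el, G_nc
```

The sketch only shows that the expensive part (the distance and its inverse powers) is shared across all grids; a production code would also vectorize the loops and guard against grid points that fall on top of host atoms.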

B-Spline Approximation

B-splines are widely used in computer graphics, where it is necessary to use functions that parameterize smooth curves and surfaces from an input of discrete points. In this work, we use B-splines to provide a smooth approximation of the potential energy field of the host molecule in rigid-body docking energy calculations. Before the simulation begins, the potential energy F is computed for points that lie on a 3-dimensional grid G encompassing the volume of interest of the host molecule. During the simulation, the spline is used to approximate the potential energy at the locations of the ligand atoms by combining the potential energy values that were computed previously at nearby grid points.



To introduce the cubic B-spline grid approximation described in this article, we first consider a linear B-spline, which is equivalent to the method of trilinear interpolation.⁵ In this method, a cubic cell is defined by locating the eight grid points surrounding the point P in space where the potential energy of interaction of the host with a ligand atom is to be evaluated (see Fig. 1). The potential energy values that were computed previously at these grid points are weighted and summed to approximate the potential energy of the host at the point P inside the cube. We will refer to the vertex of the cube with the smallest indices l, m, and n as the "origin vertex" of the cell, and will designate the origin vertex as

$$\mathbf{o} = \mathbf{r}_{lmn}. \qquad (5)$$

FIGURE 1. The 2 × 2 × 2 cubic grid cell for which the potential energies of the host at the eight vertices are used for trilinear interpolation of the potential energy at a point P within the cell where a ligand atom is located.

The approximate potential energy at a given point P inside the cube is obtained by forming a linear combination of the potential energy values at the eight surrounding grid points, each weighted by a product of three 1-dimensional basis functions:

$$F(\mathbf{r}) \approx \sum_{i=l}^{l+1} \sum_{j=m}^{m+1} \sum_{k=n}^{n+1} G_{ijk}\, B^1_{i-l}(r_x - o_x)\, B^1_{j-m}(r_y - o_y)\, B^1_{k-n}(r_z - o_z), \qquad (6)$$

where r is the location of point P where the potential energy is being evaluated, G_ijk are the values of the potential energies F that were computed previously, and B^1 refers to the set of two linear B-spline basis functions that are defined by⁶

$$B^1_0(x) = 1 - \frac{x}{d}, \qquad B^1_1(x) = \frac{x}{d}. \qquad (7)$$

In eq. (7), d is the spacing between two adjacent grid points. The basis functions give a higher weight to grid points that are closer to the point P where the potential energy is being approximated. The basis functions also have the property known as partition of unity:

$$\sum_{i=0}^{1} \sum_{j=0}^{1} \sum_{k=0}^{1} B^1_i(x)\, B^1_j(y)\, B^1_k(z) = 1. \qquad (8)$$

In the special case of linear B-splines, the spline interpolates the potential energy values at the grid points. That is, when the spline is evaluated at the exact location of a grid point, that grid point is given a weight of one and all other grid points are given a weight of zero. This results in perfect agreement between the spline and the potential energy function at the grid points. This method of interpolation produces a function that is piecewise linear along each coordinate axis, built from the basis functions of eq. (7). As a result, the function has discontinuous first derivatives at the grid cell boundaries, which makes it unsuitable for most local minimization algorithms. A remedy to this problem is to use B-splines of higher degree. Higher degree B-splines produce piecewise polynomial surfaces with a higher degree of continuity, but more potential energy values must be combined to produce the approximation. Cubic B-splines have continuous first and second derivatives, which makes them well suited to local minimization.
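Eqs. (5)-(7) can be turned into a short routine. The sketch below is an illustrative helper, not code from this work; it assumes the grid is stored as a Python dict keyed by (l, m, n), with grid point (l, m, n) located at origin + d·(l, m, n), and it does no bounds checking.

```python
import math

def linear_basis(x, d):
    """Eq. (7): the two linear B-spline basis functions, for 0 <= x <= d."""
    return (1.0 - x / d, x / d)

def trilinear(G, origin, d, r):
    """Eq. (6): approximate F at point r from the 8 vertices of its grid cell.

    G maps (l, m, n) -> tabulated value; point (l, m, n) lies at origin + d*(l, m, n).
    """
    # Indices of the origin vertex o, the cell corner with the smallest l, m, n (eq. 5).
    idx = [math.floor((r[a] - origin[a]) / d) for a in range(3)]
    # Offsets r - o along each axis, each between 0 and d.
    off = [r[a] - (origin[a] + idx[a] * d) for a in range(3)]
    bx, by, bz = (linear_basis(off[a], d) for a in range(3))
    total = 0.0
    for i in range(2):
        for j in range(2):
            for k in range(2):
                total += (G[(idx[0] + i, idx[1] + j, idx[2] + k)]
                          * bx[i] * by[j] * bz[k])
    return total
```

Because each basis function is linear along its axis, the routine reproduces exactly any function that is linear in each coordinate separately (for example x·y·z), and it returns the stored value when P coincides with a grid point; its first derivative, however, jumps at the cell faces.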

For a 3-dimensional B-spline of degree p, the potential energy values at (p + 1)³ grid points must be used to construct the approximation. In the case of a linear B-spline, where p = 1, this produces the eight grid points that are the vertices of the cell containing the location where the potential energy is approximated (as in Fig. 1). For a cubic spline, where p = 3, the 64 points in a 4 × 4 × 4 lattice are used to produce the approximation of the potential at any point P in the central cell of this lattice (see Fig. 2).

FIGURE 2. The 4 × 4 × 4 lattice of points used for the cubic B-spline approximation. The potential energy at any point P in the central cell of the lattice is evaluated in terms of the values at all the points of the 4 × 4 × 4 lattice.

The spline approximation is obtained by a linear combination of the potential energy values G_ijk at the grid points comprising the 4 × 4 × 4 lattice, each weighted by a product of three cubic basis functions. The cubic B-spline approximation becomes

$$F(\mathbf{r}) \approx \sum_{i=l}^{l+3} \sum_{j=m}^{m+3} \sum_{k=n}^{n+3} G_{ijk}\, B^3_{i-l}(r_x)\, B^3_{j-m}(r_y)\, B^3_{k-n}(r_z), \qquad (9)$$

where B^3 refers to the set of four cubic B-spline basis functions that are distinguished by the four possible values of the subscript of B. The cubic B-spline basis functions (as well as the basis functions for a B-spline of any degree) are defined by a recursive formula presented in Appendix B. A similar recursive formula exists for the derivatives of the basis functions, so that the B-spline summation may be differentiated to provide the function gradient.
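Appendix B is not included in this section, so the following Python sketch assumes the textbook Cox-de Boor recursion on uniform, unit-spaced knots, together with the matching derivative recursion; we expect this to be equivalent to the formula in Appendix B but have not checked it against that appendix. Within the central cell, the four weights B^3_0 to B^3_3 along each axis depend only on the fractional offset t of P in that cell, and eq. (9) and its gradient follow as tensor products. The grid layout (a dict keyed by (l, m, n), with point (l, m, n) at origin + d·(l, m, n)) and the function names are illustrative assumptions.

```python
import math

def bspline(i, p, u):
    """Cox-de Boor recursion for the degree-p B-spline N_{i,p}(u) on the
    uniform integer knot vector (support [i, i + p + 1))."""
    if p == 0:
        return 1.0 if i <= u < i + 1 else 0.0
    return ((u - i) * bspline(i, p - 1, u)
            + (i + p + 1 - u) * bspline(i + 1, p - 1, u)) / p

def bspline_deriv(i, p, u):
    """Derivative recursion on unit-spaced knots: N'_{i,p} = N_{i,p-1} - N_{i+1,p-1}."""
    return bspline(i, p - 1, u) - bspline(i + 1, p - 1, u)

def cubic_weights(t):
    """Weights B^3_0..B^3_3 (and their t-derivatives) for the four lattice
    points around a point at fractional offset t in [0, 1) of the central cell."""
    u = 1.0 + t  # the central cell spans [1, 2) in local knot coordinates
    w = [bspline(j - 2, 3, u) for j in range(4)]
    dw = [bspline_deriv(j - 2, 3, u) for j in range(4)]
    return w, dw

def tricubic(G, origin, d, r):
    """Eq. (9): value and gradient of the cubic B-spline at point r.
    G maps (l, m, n) -> grid value; grid point at origin + d*(l, m, n)."""
    base, W, DW = [], [], []
    for a in range(3):
        x = (r[a] - origin[a]) / d         # position in grid units
        cell = math.floor(x)               # origin vertex of the central cell
        w, dw = cubic_weights(x - cell)
        base.append(cell - 1)              # lowest index l, m, n of the 4x4x4 lattice
        W.append(w)
        DW.append([v / d for v in dw])     # chain rule: dt/dr = 1/d
    value, grad = 0.0, [0.0, 0.0, 0.0]
    for i in range(4):
        for j in range(4):
            for k in range(4):
                g = G[(base[0] + i, base[1] + j, base[2] + k)]
                value += g * W[0][i] * W[1][j] * W[2][k]
                grad[0] += g * DW[0][i] * W[1][j] * W[2][k]
                grad[1] += g * W[0][i] * DW[1][j] * W[2][k]
                grad[2] += g * W[0][i] * W[1][j] * DW[2][k]
    return value, grad
```

At t = 0 the weights are 1/6, 2/3, 1/6, and 0; they sum to one for every t, and their derivatives sum to zero. Because a uniform cubic B-spline reproduces linear functions, a grid filled with a linear function returns that function and its constant gradient exactly, which is a convenient test of the index bookkeeping. The recursion is recomputed on every call for clarity; a production code would use the closed-form cubic polynomials instead.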

It is important to note that B-splines of degree greater than 1 approximate the potential energy values at the grid points without necessarily passing through them. That is, the approximation does not, in general, match the potential energy values at the grid points exactly. In practice, the B-spline functional form provides excellent accuracy in the approximation of the electrostatic potential function 1/r, but to obtain satisfactory accuracy for the Lennard-Jones and hydrogen-bonding potential energy functions it is necessary to modify the method.
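A leading-order Taylor estimate makes the departure from interpolation concrete. This is our own illustration, assuming a uniform grid spacing d and exact grid values G = f at the grid points. At the origin vertex of the central cell (t = 0) the 1-D cubic weights are 1/6, 2/3, 1/6, and 0, so at a grid point x_i:

$$F(x_i) \approx \tfrac{1}{6}G_{i-1} + \tfrac{2}{3}G_i + \tfrac{1}{6}G_{i+1} = G_i + \tfrac{1}{6}\left(G_{i-1} - 2G_i + G_{i+1}\right) = f(x_i) + \frac{d^2}{6} f''(x_i) + O(d^4).$$

In three dimensions, the product of three such 1-D averages gives, at a grid point and for a single pairwise term f(r) = r^{-n},

$$F(\mathbf{r}_{ijk}) \approx f(\mathbf{r}_{ijk}) + \frac{d^2}{6}\nabla^2 f(\mathbf{r}_{ijk}) + O(d^4), \qquad \nabla^2 r^{-n} = n(n-1)\, r^{-n-2}.$$

The leading error term therefore vanishes for n = 1 (1/r is harmonic away from r = 0) and is positive for n = 6, 10, and 12, so for these terms the spline overestimates the function at the grid points.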

The surprising accuracy for 1/r is probably due to the fact that it is a harmonic function of the variables r_x, r_y, and r_z (away from r = 0). An interesting property of harmonic functions f(x) is that they satisfy the following equation⁹:

$$f(\mathbf{x}) = \frac{1}{A} \oint_{|\mathbf{y} - \mathbf{x}| = r} f(\mathbf{y})\, dS, \qquad (10)$$

where A is a constant equal to the surface area of the sphere of integration (and the ball bounded by the sphere must lie within the region where f is harmonic). In other words, a harmonic function has the special property that the function value at a given point is equal to the average of the function over the surface of a sphere centered at that point. The function evaluation in eq. (10) is similar to the B-spline procedure in the sense that a function value is determined by an average over the function values at nearby points. In the case of the B-spline procedure, a finite number of nearby grid points are averaged together. If we consider the B-spline procedure to be an approximation of eq. (10), then we would expect the B-spline to give a better approximation for harmonic functions, because eq. (10) holds for these functions. As will be shown in the Results section, this appears to be the case.
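Eq. (10) can be checked numerically. The sketch below is our illustration; the Fibonacci-lattice quadrature and the test geometry are arbitrary choices, not part of the method. It averages a function over a sphere whose enclosed ball does not contain the singular point at the origin, where 1/r is harmonic, and compares the result with the value at the center, both for 1/r and for the non-harmonic r^-6.

```python
import math

def sphere_points(n):
    """Quasi-uniform points on the unit sphere (Fibonacci lattice)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n      # equal-area bands in z
        rho = math.sqrt(1.0 - z * z)
        phi = golden * i
        pts.append((rho * math.cos(phi), rho * math.sin(phi), z))
    return pts

def sphere_average(f, center, radius, n=4000):
    """Approximate (1/A) times the surface integral of f over |y - center| = radius."""
    total = 0.0
    for u in sphere_points(n):
        y = tuple(c + radius * ui for c, ui in zip(center, u))
        total += f(y)
    return total / n

inv_r = lambda y: 1.0 / math.hypot(*y)          # harmonic away from the origin
inv_r6 = lambda y: 1.0 / math.hypot(*y) ** 6    # not harmonic
```

For a sphere of radius 1 centered at (0, 0, 3), the 1/r average matches the central value 1/3 to within quadrature error, whereas the r^-6 average is about 1.78 times 3^-6. The averaging overestimates the non-harmonic term, which is the same direction of error that the cubic B-spline shows for the Lennard-Jones and hydrogen-bonding terms.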

The Lennard-Jones and hydrogen-bonding potential functions are not harmonic. Therefore, as shown in the next section, they must be modified before the B-spline approximation can be applied to them accurately.

Empirical Error Correction

In the previous section, it was shown that the B-spline approximation of the pairwise sum is formed by a linear combination of potential energy values computed at points that lie on a 3-dimensional grid [eq. (9)]. The conventional pairwise sum [eq. (1)] is replaced by a single sum over the ligand atoms [eq. (3)], each of which interacts with the field of the entire host molecule. To approximate the potential energy of the host at the location of a ligand atom, the B-spline procedure is used to combine the potential energies at neighboring grid points [eq. (9)]. Each of these grid potential energy values is computed prior to the simulation by a summation of the distance-dependent interactions between an atom placed at the grid point and the set of fixed atoms belonging to the host molecule [eqs. (2) and (4)].

In discussing the sources of error in the approximation, we will refer to the error in the approximation of a single interaction term. This is the error that results from the spline approximation of an interaction term (r^-1, r^-6, r^-10, or r^-12) between a ligand atom and a single fixed atom in the host molecule. In this example of a single interaction term, each of the grid points would represent the interaction potential energy of a single host atom rather than the sum of the potential energies of all of the host atoms. Although we are interested in the error in the sum of potential energies, the error in the sum is directly related to the error in a single term. Because the B-spline is a linear transformation of the grid values, the error in the approximation of the sum is equal to the sum of the errors in the approximations of the individual terms; thus, the error in the approximation can be analyzed in terms of the errors in the individual terms.

Another important consideration is that the B-spline of one of these pairwise interaction terms will not exhibit exact radial symmetry with respect to the position of the fixed host atom. This is because the approximation is composed of contributions from potential energy values that have been sampled at points lying on a cubic grid. Thus, the value of the spline for a given distance r will also depend on the locations of the host and ligand atoms within the grid. For this reason, when we examine the accuracy of the B-spline of a single interaction for a given interatomic distance r, we must consider the mean value of the spline over all possible pairs of points corresponding to a host and ligand atom separated by a distance r, which we denote as ⟨f⟩_r^spline.

For the inverse powers of r other than r^-1 (i.e., r^-6, r^-10, and r^-12), for which the standard B-spline method results in a large error, we observed that this error is due to a "shifting effect" that results from the B-spline transformation. The mean value of the spline approximation at a distance r equals the exact function evaluated at a smaller distance; that is, the graph of the spline is shifted to the right with respect to that of the exact function:

$$\langle f \rangle_r^{\mathrm{spline}} \approx f\big(r - s(r)\big). \qquad (11)$$

Here, the function s(r) represents the amount by which the spline function has been shifted. The function s varies for different grid spacings as well as for different functions f. For example, for the Lennard-Jones function, each of the two terms (attractive and repulsive) is shifted by a different function s(r). The cumulative effect of these errors in the pairwise terms results in a large error in the total energy.
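The shifting effect can be reproduced in a 1-D toy model (our own illustration, not the 3-D procedure used in this work). A host atom sits at x = 0, the term f(x) = x^-6 is sampled on a 1-D grid of spacing d whose offset from the atom (`phase`) is averaged over, and the cubic B-spline is evaluated at the ligand distance r; this average plays the role of ⟨f⟩ in eq. (11). Inverting eq. (11) gives the implied shift s, and resampling the shifted function of eq. (12), here with s held constant rather than fitted to eq. (13), removes most of the error.

```python
import math

def cubic_weights(t):
    """Uniform cubic B-spline weights for lattice points cell-1 .. cell+2."""
    return ((1 - t) ** 3 / 6,
            (3 * t**3 - 6 * t**2 + 4) / 6,
            (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
            t**3 / 6)

def spline_1d(f, d, phase, x):
    """Cubic B-spline built from samples f((j + phase) * d), evaluated at x."""
    g = x / d - phase                  # position in grid units
    cell = math.floor(g)
    w = cubic_weights(g - cell)
    return sum(wi * f((cell - 1 + i + phase) * d) for i, wi in enumerate(w))

def mean_spline(f, d, r, samples=2000):
    """1-D analogue of <f>_r^spline: average over where the grid sits
    relative to the host atom at x = 0, with the ligand at distance r."""
    return sum(spline_1d(f, d, (s + 0.5) / samples, r)
               for s in range(samples)) / samples

f = lambda x: x ** -6
d, r = 0.5, 3.0
m = mean_spline(f, d, r)
shift = r - m ** (-1.0 / 6.0)          # s from eq. (11): m = f(r - s)
f_c = lambda x: f(x + shift)           # eq. (12), with s held constant near this r
m_c = mean_spline(f_c, d, r)
```

In this toy model the uncorrected average exceeds f(r), the implied shift is positive (the graph moves to the right), and the corrected average lies much closer to f(r). The specific numbers depend on the 1-D geometry and are not comparable with the 3-D parameters of Table I.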

The systematic error described in eq. (11) can be corrected empirically by replacing the function f with a modified function f_c. In this approach, we compensate for the right shift of the mean value of the B-spline by evaluating the original function f at a larger distance, which shifts its graph to the left:

$$f_c(r) = f\big(r + s(r)\big). \qquad (12)$$

The spline is then calculated using the values of the function f_c rather than f, which results in a substantial reduction in the error. For the terms required for the Lennard-Jones and hydrogen-bonding functions, we found that the following functional form is useful in describing the shifting function s(r) over an acceptable range of r:

$$s(r) \approx a + \frac{b}{c + r}, \qquad (13)$$

where the parameters a, b, and c were determined by a fitting procedure that minimizes the error in ⟨f⟩_r^spline. In this fitting procedure, the following estimate of the error in the spline approximation is minimized for the set of interatomic distances r_i:

$$E(a, b, c) = \sum_i \left( \frac{\langle f_c \rangle_{r_i}^{\mathrm{spline}} - f(r_i)}{f(r_i)} \right)^2, \qquad (14)$$

where f_c is given by eq. (12) and f(r_i) is a term of the ECEPP pairwise interaction functions: either r^-6, r^-10, or r^-12.

We found that a good set of parameters could be determined by choosing a representative set of values of r_i (viz., 1.75, 3.0, and 6.0 Å) where the function is most sensitive to the magnitudes of the distances. The average value of the spline function ⟨f_c⟩_{r_i} was computed over a sample of 1000 randomly generated pairs of points that were contained within the grid and were separated by the distance r_i. A single local minimization was performed starting with each parameter equal to 1, and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton algorithm¹⁰ was used to perform the local minimization. Because the function s(r) depends on the particular function f and on the grid spacing, it was necessary to repeat this procedure at a variety of grid spacings for each of the three terms r^-6, r^-10, and r^-12.
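The sampling step described above can be sketched as follows. The text does not state how the random pairs were generated, so this sketch assumes a uniformly random first point in the grid box, a uniformly random direction (z uniform on [-1, 1] and azimuth uniform, which is uniform on the sphere), and rejection of pairs whose second point leaves the box. The box bounds and function names are hypothetical.

```python
import math
import random

def random_unit_vector(rng):
    """Uniformly distributed direction: z uniform on [-1, 1], azimuth uniform."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    rho = math.sqrt(max(0.0, 1.0 - z * z))
    return (rho * math.cos(phi), rho * math.sin(phi), z)

def sample_pairs(r, box_lo, box_hi, count, seed=0):
    """Return `count` pairs (p, q) of points inside the axis-aligned box
    [box_lo, box_hi] with |p - q| = r. Pairs whose second point falls
    outside the box are rejected and redrawn."""
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < count:
        p = tuple(rng.uniform(lo, hi) for lo, hi in zip(box_lo, box_hi))
        u = random_unit_vector(rng)
        q = tuple(pi + r * ui for pi, ui in zip(p, u))
        if all(lo <= qi <= hi for qi, lo, hi in zip(q, box_lo, box_hi)):
            pairs.append((p, q))
    return pairs
```

Each pair (p, q) would then contribute one sample to ⟨f_c⟩_{r_i}: the spline of the single-atom term f_c for a host atom at p, evaluated at q. Averaging over 1000 such pairs for each r_i supplies the inputs to eq. (14), which can then be minimized with a quasi-Newton routine such as BFGS. Rejection slightly under-samples first points near the box faces, which matters little for a 50 Å box and distances of a few ångströms.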



In summary, to compensate for the right-shifting effect of the B-spline, the functions r^-6, r^-10, and r^-12 are replaced by left-shifted functions that have been parameterized so that the error of the B-spline approximation of these functions is minimized. The summations of these shifted functions are computed for the grid points in the same manner as for the original functions [eqs. (2) and (4)], and the spline approximation is used to combine the values of the shifted functions. Because the B-spline is a linear transformation, the correction of the individual pairwise terms results in the proper correction of the total energy function.

Results

To evaluate the properties of the modified B-spline method, we present results from energy calculations for a rigid-body docking system consisting of the protein thrombin as the host molecule and the tripeptide Phe-Pro-Arg as the ligand molecule. In this system the host molecule is stationary and the ligand molecule is allowed full translational and rotational freedom. Both molecules are rigid and are not allowed to change conformation. The six variables of the system are the three translational and three rotational degrees of freedom of the ligand. The interaction energy is formulated as the summation of the interaction energies of the ligand atoms with respect to the field of the host molecule, as in eq. (3), and each value of F is approximated using the cubic B-spline formulation in eq. (9). The potential energy values G_ijk on the grid are calculated using the ECEPP potential energy parameters and the correction procedure described in the previous section. The structures of the host and ligand were obtained from the crystal structure 1ppb.pdb of the thrombin inhibitor Phe-Pro-Arg-chloromethylketone (PPACK) bound to thrombin.¹¹ The chloromethylketone group of PPACK was removed in order to simulate the formation of a noncovalent complex. The resulting structure of the tripeptide Phe-Pro-Arg was then regularized to the ECEPP geometry, and the energy of the binary complex was minimized to produce the structures used in these calculations.

Before the modified B-spline method could be applied, it was necessary to obtain the parameters a, b, and c of the shift function s(r) [eq. (13)] for the terms r^-6, r^-10, and r^-12 of the pairwise interaction potentials. The parameters were found by minimizing the function defined in eq. (14). These parameters were determined for a variety of grid spacings, and Table I lists the resulting parameters. For testing purposes, three grids of 50 × 50 × 50 Å dimensions were calculated for a selection of the grid spacings in Table I, namely 0.25, 0.5, and 1.0 Å. The results of these tests are shown below.

TABLE I. Cubic B-Spline Shift Function Parameters of Eq. (13) Determined for Different Grid Spacings.

Grid spacing (Å)   Function   a                  b                  c
1.0                r^-6       -6.3323 × 10^-2    1.6120             2.8975
                   r^-10      -3.6372 × 10^-1    7.4107             6.8799
                   r^-12      -9.2671 × 10^-1    2.2550 × 10^1      1.3136 × 10^1
0.8                r^-6       -1.8287 × 10^-2    7.4426 × 10^-1     1.4382
                   r^-10      -6.0632 × 10^-2    1.7326             2.3502
                   r^-12      -9.0247 × 10^-2    2.3743             2.8494
0.66               r^-6       -8.0604 × 10^-3    4.6981 × 10^-1     9.5659 × 10^-1
                   r^-10      -2.5977 × 10^-2    9.9251 × 10^-1     1.5294
                   r^-12      -3.7353 × 10^-2    1.2948             1.8137
0.5                r^-6       -3.4953 × 10^-3    2.4732 × 10^-1     5.5514 × 10^-1
                   r^-10      -9.6628 × 10^-3    4.8727 × 10^-1     8.9241 × 10^-1
                   r^-12      -1.3664 × 10^-2    6.2053 × 10^-1     1.0611
0.33               r^-6       -6.8142 × 10^-4    1.0000 × 10^-1     2.4346 × 10^-1
                   r^-10      -1.9936 × 10^-3    1.8873 × 10^-1     4.0201 × 10^-1
                   r^-12      -2.9069 × 10^-3    2.3620 × 10^-1     4.8497 × 10^-1
0.25               r^-6       -2.1119 × 10^-4    5.4371 × 10^-2     1.3572 × 10^-1
                   r^-10      -6.2407 × 10^-4    1.0058 × 10^-1     2.2551 × 10^-1
                   r^-12      -9.1829 × 10^-4    1.2470 × 10^-1     2.7334 × 10^-1
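Applying the correction amounts to replacing each inverse power by its shifted counterpart before the grid sums are formed. The Python sketch below encodes eqs. (12) and (13) with the 0.5 Å parameters of Table I; the dictionary layout and function names are illustrative, and other grid spacings (or the trilinear parameters of Table III) would be handled by swapping in the corresponding rows.

```python
# Cubic B-spline shift parameters (a, b, c) for the 0.5 Å grid, from Table I,
# keyed by the exponent n of the term r**-n.
SHIFT_05 = {
    6: (-3.4953e-3, 2.4732e-1, 5.5514e-1),
    10: (-9.6628e-3, 4.8727e-1, 8.9241e-1),
    12: (-1.3664e-2, 6.2053e-1, 1.0611),
}

def shift(r, a, b, c):
    """Eq. (13): s(r) = a + b / (c + r)."""
    return a + b / (c + r)

def corrected_term(n, r, params=SHIFT_05):
    """Eq. (12): f_c(r) = f(r + s(r)) for f(r) = r**-n."""
    a, b, c = params[n]
    return (r + shift(r, a, b, c)) ** -n

# In a grid builder, each r**-n would be replaced by corrected_term(n, r)
# before multiplying by the A, B, or C coefficient and summing over host atoms.
```

For all three terms and all three fitting distances (1.75, 3.0, and 6.0 Å), s(r) is positive with these parameters, so f_c(r) < f(r): both the tabulated repulsive and attractive terms are weakened slightly, which offsets the overestimate produced by the spline.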

The energy of the rigid-body docking system (approximated by the B-spline method) was then minimized starting from 500 random configurations in order to confirm that the B-spline function is minimizable and to provide an ensemble of low-energy structures with which to evaluate the accuracy of the approximation. The minimizations were performed using the BFGS algorithm, and the 0.5 Å grid was used for the energy minimizations. Later, the 0.25 and 1.0 Å grids were used to evaluate the accuracies of these grid spacings for the minima generated by this procedure.

Because the starting configurations were generated randomly, a significant number of the resulting minima had severe atomic overlaps that resulted in extremely high energy values. In these configurations, the B-spline method tended to significantly underestimate the true value of the energy. Figure 3 is a plot of the log of the B-spline approximation versus the log of the exact energy values for the 291 local minima that had positive energies, out of the total of 500 local minima. The wide range in magnitudes of the energy values is due to the steepness of the repulsive r^-12 term for the small values of r resulting from the overlapping atoms. Because the B-spline approximation is produced from a weighted sum of the potentials at the 64 grid points surrounding the ligand atom, the maximum value of the spline inside a grid cell is limited by the highest of the 64 values in the sum.
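The bound follows from two standard properties of B-spline basis functions: they are nonnegative, and the cubic basis, like the linear basis of eq. (8), forms a partition of unity. Writing w_ijk for the product of the three basis functions in eq. (9):

$$F(\mathbf{r}) \approx \sum_{i,j,k} w_{ijk}\, G_{ijk}, \qquad w_{ijk} \ge 0, \quad \sum_{i,j,k} w_{ijk} = 1 \;\;\Longrightarrow\;\; \min_{i,j,k} G_{ijk} \le F(\mathbf{r}) \le \max_{i,j,k} G_{ijk}.$$

A ligand atom that is closer to a host atom than any of the surrounding grid points therefore sees no more than the largest tabulated repulsion among the 64 lattice points, which is why the B-spline underestimates the energy of severely overlapping configurations.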

Of the 500 local minima that were generated, we selected the 209 minima with negative energy values to represent structures without significant atomic overlaps. For these low-energy configurations, Figures 4-6 are plots of the B-spline energies of these minima for the 1.0, 0.5, and 0.25 Å grids, respectively, versus the exact energies. Table II lists the root mean square (RMS) errors for these calculations. The table and figures show that the error of the approximation is reduced for smaller grid spacings, as would be expected.

Figure 7 is a plot of the B-spline total energies for an uncorrected 0.5 Å grid versus the exact energies for the low-energy configurations. The RMS errors for these calculations are listed in Table II along with the errors for the corrected grids. This figure illustrates the significant error associated with the shifting effect of the standard B-spline method. The B-spline energies tend to have large variations in magnitude and provide a very poor approximation to the exact energy. Figure 8 is a plot of the electrostatic energy of the uncorrected B-spline approximation versus the exact electrostatic energy. Table II also lists the RMS error for the electrostatic energy. These data show the remarkable accuracy of the B-spline in the approximation of the function r^-1 and illustrate that the error in the total energy is due entirely to the shifting effect of the B-spline on the pairwise energy terms r^-6, r^-10, and r^-12.

FIGURE 3. Logarithmic plot of the corrected B-spline energy values versus the exact energy values for the 291 local minima that had positive energy values. A 0.5 Å grid was used for these B-spline calculations. Each + symbol represents the B-spline energy and exact energy for a single local minimum. The line y = x illustrates the deviation of the B-spline approximation from the exact values.

FIGURE 4. Plot of the corrected B-spline energy values versus the exact energy values for the 209 local minima that had negative energy values. The plot is for a grid with a spacing of 1.0 Å.

FIGURE 5. Same as Figure 4, but for a grid spacing of 0.5 Å.

FIGURE 6. Same as Figure 4, but for a grid spacing of 0.25 Å.

TABLE II. RMS Errors of Low-Energy Minima for Various Grid Calculations.

Grid spacing (Å)   Type of calculation       RMS error of energies (kcal/mol)
1.0                Corrected B-spline        3.1
0.5                Corrected B-spline        0.13
0.25               Corrected B-spline        0.012
0.5                Uncorrected B-spline      21.8
0.5                Electrostatic B-spline    0.00043
0.5                Uncorrected trilinear     5.53
0.5                Corrected trilinear       0.39

The shifting effect is not unique to cubic B-splines, and it is also observed in trilinear inter-polation. In studies in which continuity of the de-

Žrivatives is not necessary such as Monte Carlo.calculations , the correction method can also be

used to improve the accuracy of the trilinear inter-polation method. As a demonstration, we calcu-lated the energies and errors of the 209 low-energyconfigurations using the trilinear approximation

˚w Ž . Ž .xeqs. 6 , 7 and the same uncorrected 0.5 A gridthat was used for the B-spline calculations in Fig-ure 7. We then applied the fitting technique to findthe parameters a, b, and c for the shifting func-tions that would minimize the error for a trilinear

˚interpolation using a 0.5 A grid. The resultingparameters are listed in Table III, and the RMSerror for the uncorrected and corrected energiesare listed in Table II. These data show that theerror of the uncorrected trilinear approximation issignificantly less than that of the uncorrected B-spline approximation. This is due to the fact thatthe trilinear approximation interpolates the func-tion values and is thus more tightly constrained tothese values than the cubic B-spline. Higher orderB-splines involve averaging over more functionvalues, and they tend to produce smoother curvesat the expense of larger deviations from the defin-ing function values. Table II also shows that the

VOL. 19, NO. 180


FIGURE 7. Plot of the uncorrected B-spline energy values versus the exact energy values for the 209 local minima that had negative energy values. A 0.5 Å grid was used for these B-spline calculations.

FIGURE 8. Plot of the electrostatic B-spline energy values versus the exact energy values for the 209 local minima that had negative energy values. A 0.5 Å grid was used for these B-spline calculations.



TABLE III. Trilinear Interpolation Shift Function Parameters of Eq. (13).

Grid Spacing (Å)   Function     a                         b                        c
0.5                $r^{-6}$     $-1.9439 \times 10^{-4}$   $1.0906 \times 10^{-1}$   $2.0767 \times 10^{-1}$
                   $r^{-10}$    $-1.7222 \times 10^{-3}$   $2.1087 \times 10^{-1}$   $4.3926 \times 10^{-1}$
                   $r^{-12}$    $-2.9061 \times 10^{-3}$   $2.6693 \times 10^{-1}$   $5.6196 \times 10^{-1}$

error of the corrected B-spline approximation is less than the error of the corrected trilinear approximation (0.13 versus 0.39 kcal/mol on the 0.5 Å grid), illustrating that the function can be represented better by a cubic polynomial than by a linear function.
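The shifting effect can be seen in one dimension. The sketch below is our own illustration, not the authors' code. It samples $r^{-12}$ on an assumed 0.5 Å grid and, assuming (as the discussion of averaging above suggests) that the grid values serve directly as control values, evaluates a uniform cubic B-spline next to plain linear interpolation. At a grid point such a spline returns $(G_{i-1} + 4G_i + G_{i+1})/6$, a weighted average of neighboring samples, whereas linear interpolation returns $G_i$ exactly. The sample range and evaluation point are arbitrary choices.

```python
# Illustrative sketch (hypothetical setup): shifting effect of an uncorrected
# uniform cubic B-spline versus linear interpolation for the steep term r**-12.

def f(r: float) -> float:
    """Pairwise term r^-12 with unit coefficient."""
    return r ** -12

h = 0.5                                    # grid spacing in Å (assumed)
knots = [2.0 + h * i for i in range(10)]   # sample points 2.0 ... 6.5 Å
G = [f(r) for r in knots]                  # samples used directly as control values

def cubic_bspline(x: float) -> float:
    """Uniform cubic B-spline whose control value G[i] sits at knots[i]."""
    i = int((x - knots[0]) // h)           # interval [knots[i], knots[i+1])
    t = (x - knots[i]) / h                 # local coordinate in [0, 1)
    w = [(1 - t) ** 3 / 6,
         (3 * t**3 - 6 * t**2 + 4) / 6,
         (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
         t**3 / 6]
    return sum(wk * G[i - 1 + k] for k, wk in enumerate(w))

def linear(x: float) -> float:
    """Piecewise-linear interpolation of the same samples."""
    i = int((x - knots[0]) // h)
    t = (x - knots[i]) / h
    return (1 - t) * G[i] + t * G[i + 1]

x = knots[3]   # evaluate exactly at a grid point (3.5 Å)
print(f"exact     {f(x):.4e}")
print(f"linear    {linear(x):.4e}")
print(f"B-spline  {cubic_bspline(x):.4e}")
```

For a convex term such as $r^{-12}$ the averaged value lies above the exact one, which is one way to picture why a fitted shift is needed; the size of the deviation in this toy case says nothing about the three-dimensional results in Table II.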

In the rigid-body docking calculations presented here, the B-spline method resulted in a 180-fold reduction in the time required for a single energy and gradient calculation for this system, compared with the standard pairwise calculation. The times required to calculate the 13 grids (corresponding to the 13 unique types of atoms in ECEPP) needed to perform ECEPP calculations are listed in Table IV. These calculations were performed using the second algorithm described in Appendix A. With 13 atom types, this algorithm was 9 times faster than the simpler conventional method listed first in Appendix A. These timings were obtained on a single processor of an SGI Power Onyx system. The storage required for grids of various sizes is also listed in Table IV.

Conclusions

The modified B-spline method provides a fast and accurate approximation of standard molecular mechanics force field functions for applications in which part of the system can be considered fixed. Because the approximation has a continuous second derivative, the resulting function can be minimized by gradient-based local minimization algorithms. Thus, simulations that depend on such minimization algorithms can benefit from the B-spline method. If the simulation does not require a continuous derivative (as in the case of Monte Carlo calculations), the B-spline correction procedure can also be used to provide an accurate trilinear approximation.

TABLE IV. Grid Storage Requirements and Timings for Grid Generation.

Grid Spacing (Å)   Generation Time (min)   Storage Required (MB)
1.0                16                      3.5
0.5                137                     28.0
0.25               1105                    224.0

In this example, we demonstrated the use of the method for rigid-body docking calculations. An important point about the timing of the method is that, for rigid-body docking, the time for the energy calculation scales linearly with the number of atoms in the ligand. It is also possible to apply the method to situations in which the ligand is completely flexible or in which only part of the host molecule is fixed. In other work,¹² we obtained a 45-fold speed improvement in simulations of the same system with a completely flexible ligand (compared with the 180-fold improvement for rigid-body docking).

The spline correction procedure is simple to implement and requires no additional complexity in the actual energy evaluation procedure. It is interesting to note that, without the correction procedure, the B-spline and trilinear approximations (for terms other than $r^{-1}$) are in error to such a degree that they are essentially useless.

Although the accuracy of the method is very good for low-energy configurations, there is a large underestimation of the high energies of configurations with severe atomic overlaps. This error for high-energy configurations does not limit the usefulness of the method because these high-energy values are not accurately defined by the force field and are generally not of quantitative interest.

Finally, consideration must be given to the overhead and storage costs associated with the generation of these potential energy grids. In this work, the grid spacings that we tested spanned a wide range of accuracy and storage requirements.

A 1.0 Å grid is sparse enough that it can cover a very large system, but it has an associated error that renders it unusable for quantitative work. At the other extreme, a 0.25 Å grid provides very good accuracy but requires an extremely large amount of storage and computation time for a system like the example in this article. It has been our experience that a grid spacing of 0.66 Å provides a good compromise between accuracy and required storage for systems of the size considered in this work. The resulting data have been of manageable size, and the approximation has provided satisfactory accuracy.
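Table IV shows the storage growing by a factor of 8 each time the spacing is halved, as expected if storage is proportional to the number of grid points, i.e., to $h^{-3}$ for a fixed box. Under that assumption (ours; the storage for the 0.66 Å grid is not reported in the text), the sketch below extrapolates from the 1.0 Å entry. The function name is hypothetical.

```python
# Rough extrapolation (assumption): storage scales as (1/spacing)**3 for a fixed box,
# which reproduces the Table IV sequence 3.5 -> 28.0 -> 224.0 MB.

REF_SPACING = 1.0   # Å, Table IV
REF_STORAGE = 3.5   # MB, Table IV

def estimated_storage_mb(spacing: float) -> float:
    """Estimated grid storage (MB) for the same box at another spacing (Å)."""
    return REF_STORAGE * (REF_SPACING / spacing) ** 3

for h in (1.0, 0.66, 0.5, 0.25):
    print(f"{h:5.2f} Å  ~{estimated_storage_mb(h):7.1f} MB")
```

For 0.66 Å this gives roughly 12 MB, an estimate only; the real figure depends on the box dimensions and storage layout actually used.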

Appendix A: Efficient Generation of Potential Energy Grids

When generating the $N_{\text{type}}$ interaction grids for the Lennard-Jones and hydrogen-bonding energies, it is of interest to calculate one grid $G^k$ for each atom type $k$ of the ligand such that

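$$
G^k_{lmn} = \sum_{i=1}^{N_a} \left( \frac{A_{hk}}{|\mathbf{r}_{lmn} - \mathbf{r}_i|^{12}} + \frac{B_{hk}}{|\mathbf{r}_{lmn} - \mathbf{r}_i|^{10}} + \frac{C_{hk}}{|\mathbf{r}_{lmn} - \mathbf{r}_i|^{6}} \right), \tag{A.1}
$$

where the sum runs over the $N_a$ host atoms, $\mathbf{r}_i$ is the position of host atom $i$, $h$ is its atom type, and $\mathbf{r}_{lmn}$ is the position of grid point $(l, m, n)$.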

The simplest method, which involves generating the grids sequentially, is very inefficient because there is an outer loop over atom types, and the grid for atom type $k$ is generated completely before the grid for atom type $k + 1$ is generated. Inside the loop over atom types, there is a loop over all the atoms in the host molecule. For each atom in the host molecule there is a loop over the grid points, and a contribution for that atom is added to the potential energy at each point in the grid. Briefly, this algorithm is

Outer loop over atom type k
    Fill the grid G^k with zeros
    Loop over atom i of the host to be placed on the grid
        Loop over grid space l, m, n
            $G^k_{lmn} \leftarrow G^k_{lmn} + \dfrac{A_{hk}}{|\mathbf{r}_{lmn} - \mathbf{r}_i|^{12}} + \dfrac{B_{hk}}{|\mathbf{r}_{lmn} - \mathbf{r}_i|^{10}} + \dfrac{C_{hk}}{|\mathbf{r}_{lmn} - \mathbf{r}_i|^{6}}$
        End loop over grid space
    End loop over atom i
End loop over atom type k
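A direct Python transcription of this sequential algorithm might look as follows. It is a sketch, not the authors' code: the function name, the flattened list of grid points, and the parameter tables indexed `[h][k]` are our assumptions, and the numbers in the example are hypothetical, not ECEPP parameters.

```python
import math

def naive_grids(host_xyz, host_types, grid_pts, A, B, C, n_types):
    """Sequential generation of one grid G^k per ligand atom type k, eq. (A.1).

    host_xyz   : host-atom coordinates, list of (x, y, z)
    host_types : host-atom type index h for each host atom
    grid_pts   : grid-point coordinates, the (l, m, n) grid flattened to a list
    A, B, C    : pair-parameter tables indexed [h][k] (hypothetical values here)
    Assumes no grid point coincides with a host atom (r > 0).
    """
    grids = []
    for k in range(n_types):                          # outer loop over atom type k
        Gk = [0.0] * len(grid_pts)                    # fill G^k with zeros
        for xyz_i, h in zip(host_xyz, host_types):    # loop over host atoms i
            for p, xyz in enumerate(grid_pts):        # loop over grid points
                r = math.dist(xyz, xyz_i)
                # r**-12, r**-10 and r**-6 are recomputed for every k: this is the
                # waste that the second algorithm of Appendix A removes
                Gk[p] += A[h][k] / r**12 + B[h][k] / r**10 + C[h][k] / r**6
        grids.append(Gk)
    return grids

# Tiny hypothetical example: two host atoms of types 0 and 1, four grid points.
host_xyz = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
host_types = [0, 1]
grid_pts = [(0.5 * i, 2.0, 0.0) for i in range(4)]
A = [[1.0, 2.0], [2.0, 3.0]]
B = [[0.0, 0.5], [0.5, 0.0]]
C = [[-1.0, -1.5], [-1.5, -2.0]]
grids = naive_grids(host_xyz, host_types, grid_pts, A, B, C, n_types=2)
print(grids[0][0], grids[1][0])
```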

This method is very wasteful because it calculates the functions $r^{-6}$, $r^{-10}$, and $r^{-12}$ a total of $N_{\text{type}}$ times for every pairing of a host atom to a grid point. A more efficient method can be used that calculates each of these functions only once for each pairing. During the generation of the grids, the method requires additional temporary storage for three potential energy grids denoted $T^{12}$, $T^{10}$, and $T^{6}$. This temporary storage is required only during the generation of the grids. The new algorithm can be written as

Create a list L of host atoms by type: L^h[j] is the jth host atom of type h
Fill the grids G with zeros
Outer loop over host atom type h
    Fill the temporary grids T^12, T^10, and T^6 with zeros
    Loop over atoms of type h, L^h[j]
        Loop over grid space l, m, n
            $T^{12}_{lmn} \leftarrow T^{12}_{lmn} + |\mathbf{r}_{lmn} - \mathbf{r}_{L^h[j]}|^{-12}$
            $T^{10}_{lmn} \leftarrow T^{10}_{lmn} + |\mathbf{r}_{lmn} - \mathbf{r}_{L^h[j]}|^{-10}$
            $T^{6}_{lmn} \leftarrow T^{6}_{lmn} + |\mathbf{r}_{lmn} - \mathbf{r}_{L^h[j]}|^{-6}$
        End loop over grid space
    End loop over atoms
    Loop over ligand atom types k
        Look up ECEPP parameters A, B, and C for atom types h and k
        Loop over grid space l, m, n
            $G^k_{lmn} \leftarrow G^k_{lmn} + A_{hk} T^{12}_{lmn} + B_{hk} T^{10}_{lmn} + C_{hk} T^{6}_{lmn}$
        End loop over grid space
    End loop over atom types
End outer loop over atom types

The second loop over atom types, which updates the final grid $G^k$, is quite fast compared with the loop over atoms because there are many more atoms than atom types. In ECEPP calculations there are 13 unique atom types, and we observed a ninefold speedup in the time to produce the grids.
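The second algorithm can be sketched the same way. As in the pseudocode, $r^{-12}$, $r^{-10}$, and $r^{-6}$ are accumulated into temporary grids once per host atom type, and the pair parameters are applied only in the cheap loop over ligand atom types. The names, data layout (parameter tables indexed `[h][k]`, host type first), and example values are hypothetical. Working from $r^2$ to avoid a square root is a choice of this sketch, not something stated in the text.

```python
from collections import defaultdict

def efficient_grids(host_xyz, host_types, grid_pts, A, B, C, n_types):
    """Grid generation with temporary grids T12, T10, T6 (second algorithm, Appendix A).

    Each of r**-12, r**-10, r**-6 is evaluated once per (host atom, grid point) pair;
    A[h][k], B[h][k], C[h][k] are applied afterwards, once per (host type h,
    ligand type k) pair. Assumes no grid point coincides with a host atom.
    """
    npts = len(grid_pts)
    L = defaultdict(list)                       # L[h]: coordinates of host atoms of type h
    for xyz, h in zip(host_xyz, host_types):
        L[h].append(xyz)
    G = [[0.0] * npts for _ in range(n_types)]  # G[k]: grid for ligand atom type k
    for h in range(n_types):                    # outer loop over host atom type h
        if not L[h]:
            continue                            # no host atoms of this type
        T12, T10, T6 = [0.0] * npts, [0.0] * npts, [0.0] * npts
        for xyz_j in L[h]:                      # loop over host atoms of type h
            for p, xyz in enumerate(grid_pts):  # loop over grid points l, m, n
                r2 = sum((a - b) ** 2 for a, b in zip(xyz, xyz_j))
                inv6 = 1.0 / r2**3
                T6[p] += inv6                   # r**-6
                T10[p] += inv6 / r2**2          # r**-10
                T12[p] += inv6 * inv6           # r**-12
        for k in range(n_types):                # loop over ligand atom types k
            a, b, c = A[h][k], B[h][k], C[h][k]
            Gk = G[k]
            for p in range(npts):
                Gk[p] += a * T12[p] + b * T10[p] + c * T6[p]
    return G

# Hypothetical example: three host atoms of two types, four grid points.
host_xyz = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.0, 1.0)]
host_types = [0, 1, 0]
grid_pts = [(0.5 * i, 2.0, 0.5) for i in range(4)]
A = [[1.0, 2.0], [2.0, 3.0]]
B = [[0.0, 0.5], [0.5, 0.0]]
C = [[-1.0, -1.5], [-1.5, -2.0]]
G = efficient_grids(host_xyz, host_types, grid_pts, A, B, C, n_types=2)
print(G)
```

The distance loop now runs once rather than $N_{\text{type}}$ times per host-atom/grid-point pairing; the extra work is the short loop over atom types and the three temporary grids.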

Appendix B: Definition of B-Spline Basis Functions

A B-spline is a piecewise polynomial function that produces a smooth parameterization of a surface or function based on values that have been sampled at a set of points defined by a grid. The grid itself represents the parameter space, and the spacing of the grid points along a particular coordinate specifies the values of the corresponding parameter for which the function or surface will be sampled. In the general case, the spacing does not need to be regular; however, for simplicity we chose to use a regular grid spacing in this application. Each grid point is associated with a corresponding value that is either a function value or a vector, depending on whether the spline is being used to parameterize a function or a surface. For the application presented in this article, we parameterized a 3-dimensional function (the potential energy of the host molecule), and the grid has 3 dimensions corresponding to the x, y, and z coordinates of the space in which the potential energy has been sampled. In this section, we introduce the B-spline methodology as applied to 1-dimensional functions and discuss its extension to multidimensional functions.

For the B-spline $S(x)$ of a 1-dimensional function $f(y)$, there is a single parameter $x$. By analogy with eq. (4) of the text, we will refer to the sampled function values as $G_i = f(y_i)$. The values $G_i$ are associated with the values of the parameter that we denote as $x_i$. For a parameterized function, the parameter $x$ can be equivalent to the independent variable $y$, so that $x_i = y_i$. The values $x_i$ are referred to as “knots,” and we define a set of numbers known as a “knot vector” that lists these values in increasing order.⁶ The spline is a piecewise polynomial function, and the knots are the boundaries that join the different polynomial functions of which the spline is composed. The knot vector serves the additional purpose of specifying the continuity of the spline at each of the knots: the continuity at a knot may be decreased by repeating the knot within the knot vector. As an example, a knot vector for a cubic B-spline ($p = 3$) with regularly sampled points corresponding to the integers from 1 to 5 would be defined as

$$
X = \{1, 1, 1, 1, 2, 3, 4, 5, 5, 5, 5\}. \tag{B.1}
$$

For a spline of degree $p$, the degree of continuity at unrepeated knots will be $p - 1$; thus, the interior knots (2, 3, and 4) will have a continuous second derivative. For each repetition of a knot in the knot vector, the degree of continuity at that knot is decreased by one. For both end points (1 and 5), the knots were repeated 3 times (each appears 4 times in X), which makes the function itself discontinuous at these points. This is necessary because the spline ends at these points. The B-spline methodology was developed so that the continuity at the interior points could also be lowered, allowing a wide variety of curves and surfaces to be modeled. In this application, the purpose of the spline is to provide a smooth and continuous parameterization of the potential energy function; hence, we do not lower the continuity at any of the interior points.
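The continuity rule can be checked mechanically. The sketch below (function names are ours) builds the clamped knot vector of eq. (B.1), in which each end knot appears $p + 1$ times, and reports the order of continuity at each distinct knot as $p$ minus the knot's multiplicity, so that a simple knot gives $p - 1$ and $-1$ means the function itself is discontinuous.

```python
from collections import Counter

def clamped_knot_vector(points, p=3):
    """Knot vector with each end point repeated so that it appears p + 1 times."""
    return [points[0]] * p + list(points) + [points[-1]] * p

def continuity(knot_vector, p=3):
    """Order of continuity at each distinct knot: p - multiplicity
    (p - 1 for a simple knot; -1 means the function is discontinuous)."""
    counts = Counter(knot_vector)
    return {u: p - m for u, m in sorted(counts.items())}

X = clamped_knot_vector([1, 2, 3, 4, 5])
print(X)               # [1, 1, 1, 1, 2, 3, 4, 5, 5, 5, 5]
print(continuity(X))   # {1: -1, 2: 2, 3: 2, 4: 2, 5: -1}
```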

The polynomial form of the spline is given by the B-spline basis functions. These basis functions are defined in terms of the degree of the spline and the knot vector. From a given knot vector X, the B-spline basis functions may be defined with the following recursive formula:⁶

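$$
\begin{aligned}
B_i^0(x) &= \begin{cases} 1 & \text{if } x_i \le x < x_{i+1}, \\ 0 & \text{otherwise}, \end{cases} \\
B_i^p(x) &= \frac{x - x_i}{x_{i+p} - x_i}\, B_i^{p-1}(x) + \frac{x_{i+p+1} - x}{x_{i+p+1} - x_{i+1}}\, B_{i+1}^{p-1}(x),
\end{aligned}
\tag{B.2}
$$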

where the $x_i$ are the elements of the knot vector X (a quotient whose denominator is zero, as occurs at repeated knots, is taken to be zero). The B-spline basis functions are used to smoothly combine the function values at nearby knots. A 1-dimensional B-spline of degree $p$ can be written as

$$
S(x) = \sum_{i=0}^{n} B_i^p(x)\, G_i, \tag{B.3}
$$

where $p$ is the degree of the spline, $n + 1$ is the number of basis functions (the number of knots in the knot vector minus $p + 1$), and $G_i$ is the function value $f(y_i)$. It can be seen from eq. (B.2) that many of the basis functions will be zero. For the 0th degree basis functions (step functions), there will be only one ($p + 1$) nonzero basis function between any two knot values, indicating that only one function value contributes to the spline inside a given interval. For a cubic spline there will be four nonzero basis functions between any two knot values. This property illustrates that B-splines are “local” in the sense that distant function values do not affect the spline inside a given range. It also shows that, for higher degree splines, more function values contribute to the spline within a given interval and the locality is broadened.
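Equations (B.2) and (B.3) translate directly into a short recursive routine. The sketch below is our own, written for clarity rather than speed; it drops zero-denominator terms, evaluates the basis functions of the cubic knot vector (B.1) at $x = 2.5$, and confirms the locality property just described: only $p + 1 = 4$ basis functions are nonzero there, and they sum to one.

```python
from typing import Sequence

def basis(i: int, p: int, x: float, X: Sequence[float]) -> float:
    """B_i^p(x) from the Cox-de Boor recursion, eq. (B.2).

    Terms whose denominator vanishes (repeated knots) are taken as zero; the
    degree-0 functions use the half-open interval x_i <= x < x_{i+1}.
    """
    if p == 0:
        return 1.0 if X[i] <= x < X[i + 1] else 0.0
    value = 0.0
    if X[i + p] != X[i]:
        value += (x - X[i]) / (X[i + p] - X[i]) * basis(i, p - 1, x, X)
    if X[i + p + 1] != X[i + 1]:
        value += (X[i + p + 1] - x) / (X[i + p + 1] - X[i + 1]) * basis(i + 1, p - 1, x, X)
    return value

def spline(x: float, G: Sequence[float], p: int, X: Sequence[float]) -> float:
    """Eq. (B.3): S(x) = sum_i B_i^p(x) G_i, with len(X) - p - 1 coefficients."""
    n_basis = len(X) - p - 1
    if len(G) != n_basis:
        raise ValueError(f"expected {n_basis} coefficients, got {len(G)}")
    return sum(basis(i, p, x, X) * G[i] for i in range(n_basis))

p = 3
X = [1, 1, 1, 1, 2, 3, 4, 5, 5, 5, 5]   # knot vector of eq. (B.1): 7 basis functions
x = 2.5                                  # a point inside the knot span [2, 3)
values = [basis(i, p, x, X) for i in range(len(X) - p - 1)]
nonzero = [i for i, b in enumerate(values) if b != 0.0]
print("nonzero basis functions:", nonzero)            # p + 1 = 4 of the 7
print("sum of basis functions:", sum(values))         # partition of unity
print("S(2.5) with constant G = 2:", spline(x, [2.0] * 7, p, X))
```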

Equation (B.3) can be differentiated by using a recursive formula similar to eq. (B.2), which defines the derivatives of the B-spline basis functions:⁶

$$
\frac{d}{dx} B_i^p(x) = \frac{p}{x_{i+p} - x_i}\, B_i^{p-1}(x) - \frac{p}{x_{i+p+1} - x_{i+1}}\, B_{i+1}^{p-1}(x). \tag{B.4}
$$
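Equation (B.4) can be checked against a central finite difference. The sketch repeats the basis routine so that it runs on its own; `dbasis` is our name for the derivative, and the evaluation point and step size are arbitrary. Differentiating eq. (B.3) term by term gives $S'(x) = \sum_i G_i\, dB_i^p(x)/dx$, so the same routine yields the spline gradient.

```python
from typing import Sequence

def basis(i: int, p: int, x: float, X: Sequence[float]) -> float:
    """Cox-de Boor recursion, eq. (B.2); zero-denominator terms are dropped."""
    if p == 0:
        return 1.0 if X[i] <= x < X[i + 1] else 0.0
    v = 0.0
    if X[i + p] != X[i]:
        v += (x - X[i]) / (X[i + p] - X[i]) * basis(i, p - 1, x, X)
    if X[i + p + 1] != X[i + 1]:
        v += (X[i + p + 1] - x) / (X[i + p + 1] - X[i + 1]) * basis(i + 1, p - 1, x, X)
    return v

def dbasis(i: int, p: int, x: float, X: Sequence[float]) -> float:
    """Derivative of B_i^p from eq. (B.4), p >= 1; zero-denominator terms are dropped."""
    d = 0.0
    if X[i + p] != X[i]:
        d += p / (X[i + p] - X[i]) * basis(i, p - 1, x, X)
    if X[i + p + 1] != X[i + 1]:
        d -= p / (X[i + p + 1] - X[i + 1]) * basis(i + 1, p - 1, x, X)
    return d

X = [1, 1, 1, 1, 2, 3, 4, 5, 5, 5, 5]   # eq. (B.1)
p, x, eps = 3, 2.5, 1e-6
for i in range(len(X) - p - 1):
    # central finite difference of B_i^p as an independent check
    fd = (basis(i, p, x + eps, X) - basis(i, p, x - eps, X)) / (2 * eps)
    print(f"i={i}  eq. (B.4): {dbasis(i, p, x, X): .6f}   finite difference: {fd: .6f}")
```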

To parameterize higher dimensional functions, a knot vector is required for each of the independent variables. For a 3-dimensional potential energy function, there will be three knot vectors X, Y, and Z. We denote the grid of sampled function values as $G_{ijk}$. Equation (B.3) generalizes to

$$
S(x, y, z) = \sum_{i=0}^{n_x} \sum_{j=0}^{n_y} \sum_{k=0}^{n_z} B_i^p(x)\, B_j^p(y)\, B_k^p(z)\, G_{ijk}. \tag{B.5}
$$

Equations (6) and (9) of the text follow from eq. (B.5) and from the fact that the basis functions are zero outside the range of the summation indices.
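For a uniform grid, only $4 \times 4 \times 4 = 64$ terms of eq. (B.5) are nonzero at any point, and each cubic basis function reduces to a fixed polynomial in the local coordinate. Because eqs. (6) and (9) of the text are not reproduced here, the sketch below is our own local form of eq. (B.5): it assumes node $(i, j, k)$ sits at `origin + h*(i, j, k)` and uses the grid values directly as control values (no correction). It returns the value and the analytic gradient, the quantities a gradient-based minimizer needs. All names and the test field are hypothetical.

```python
import math

def weights(t):
    """Uniform cubic B-spline weights of the four grid values around a point,
    for local coordinate t in [0, 1)."""
    return [(1 - t) ** 3 / 6,
            (3 * t**3 - 6 * t**2 + 4) / 6,
            (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
            t**3 / 6]

def dweights(t):
    """Derivatives of weights(t) with respect to t."""
    return [-((1 - t) ** 2) / 2,
            (9 * t**2 - 12 * t) / 6,
            (-9 * t**2 + 6 * t + 3) / 6,
            t**2 / 2]

def evaluate(G, origin, h, point):
    """Value and gradient of the tensor-product cubic B-spline, eq. (B.5).

    G[i][j][k] is the control value at origin + h * (i, j, k). Only the 4 x 4 x 4
    block with indices cell - 1 ... cell + 2 along each axis contributes; there is
    no bounds checking, so that block must exist in G.
    """
    first, w, dw = [], [], []
    for axis in range(3):
        s = (point[axis] - origin[axis]) / h
        cell = math.floor(s)
        t = s - cell
        first.append(cell - 1)                          # first contributing index
        w.append(weights(t))
        dw.append([d / h for d in dweights(t)])         # chain rule: dt/dx = 1/h
    value, grad = 0.0, [0.0, 0.0, 0.0]
    for a in range(4):
        for b in range(4):
            for c in range(4):
                g = G[first[0] + a][first[1] + b][first[2] + c]
                value += w[0][a] * w[1][b] * w[2][c] * g
                grad[0] += dw[0][a] * w[1][b] * w[2][c] * g
                grad[1] += w[0][a] * dw[1][b] * w[2][c] * g
                grad[2] += w[0][a] * w[1][b] * dw[2][c] * g
    return value, grad

def field(x, y, z):
    """Hypothetical linear test field."""
    return 1.0 + 2.0 * x - y + 0.5 * z

h, origin, n = 0.5, (0.0, 0.0, 0.0), 8
G = [[[field(i * h, j * h, k * h) for k in range(n)] for j in range(n)] for i in range(n)]
value, grad = evaluate(G, origin, h, (1.3, 1.7, 2.1))
print(value, grad)
```

Uniform cubic B-splines reproduce linear functions exactly, which makes a linear field a convenient correctness check; for a steep term such as $r^{-12}$ the same routine would exhibit the shifting effect discussed in the text.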

Acknowledgments

We thank J. Kostrowicki for helpful discussions and J. Y. Trosset for providing the coordinates for the thrombin/Phe-Pro-Arg system and for testing the method in his calculations. This work was supported by the National Institutes of Health (GM-14312) and the National Science Foundation (MCB95-13167). The computations were carried out on the SGI Power Challenge computers at the Cornell National Supercomputer Facility, a resource of the Center for Theory and Simulation in Science and Engineering at Cornell University, which is funded by the National Science Foundation, New York State, the IBM Corporation, and members of its Corporate Research Institute, with additional Research Resource funds from the National Institutes of Health.

References

1. P. J. Goodford, J. Med. Chem., 28, 849 (1985).
2. D. S. Goodsell and A. J. Olson, Proteins, 8, 195 (1990).
3. E. C. Meng, B. K. Shoichet, and I. D. Kuntz, J. Comput. Chem., 13, 505 (1992).
4. B. A. Luty, Z. R. Wasserman, P. F. W. Stouten, C. N. Hodge, M. Zacharias, and J. A. McCammon, J. Comput. Chem., 16, 454 (1995).
5. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Second edition, Cambridge University Press, Cambridge, U.K., 1992, p. 123.
6. L. A. Piegl and W. Tiller, The NURBS (Non-Uniform Rational B-Splines) Book, Springer-Verlag, Berlin, 1995, p. 47.
7. U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee, and L. G. Pedersen, J. Chem. Phys., 103, 8577 (1995).
8. G. Némethy, K. D. Gibson, K. A. Palmer, C. N. Yoon, G. Paterlini, A. Zagari, S. Rumsey, and H. A. Scheraga, J. Phys. Chem., 96, 6472 (1992).
9. G. Hellwig, Partial Differential Equations: An Introduction, Blaisdell, New York, 1964, p. 36.
10. M. Minoux, Mathematical Programming: Theory and Algorithms, Wiley, New York, 1986, p. 102.
11. W. Bode, I. Mayr, U. Baumann, R. Huber, S. R. Stone, and J. Hofsteenge, EMBO J., 8, 3467 (1989).
12. J.-Y. Trosset, B. Maigret, and H. A. Scheraga, J. Comput. Chem., to appear.

