Computers Chem. Vol. 15, No. 3, pp. 251-260, 1991. 0097-8485/91 $3.00 + 0.00. Printed in Great Britain. All rights reserved. Copyright © 1991 Pergamon Press plc

NEW APPROACHES TO POTENTIAL ENERGY MINIMIZATION AND MOLECULAR DYNAMICS ALGORITHMS

TAMAR SCHLICK
Courant Institute of Mathematical Sciences and Chemistry Department, New York University,

251 Mercer Street, New York, NY 10012, U.S.A.

(Received 23 April 1990; received for publication 7 February 1991)

Abstract - We describe two new algorithms for macromolecular simulations: a truncated Newton method for potential energy minimization and an implicit integration scheme for molecular dynamics (MD). The truncated Newton algorithm is specifically adapted for large-scale potential energy functions. It uses analytic second derivatives and exploits the separability structure of the Hessian into bonded and nonbonded terms. The method is rapidly convergent (with a quadratic convergence rate) and allows variations for avoiding analytic computation of the nonbonded Hessian terms. The MD algorithm combines the implicit Euler scheme for integration with the Langevin dynamics formulation. The implicit scheme permits a wide range of time steps without loss of numerical stability. In turn, it requires that a nonlinear system be solved at every step. We accomplish this task by formulating a related minimization problem (not to be confused with minimization of the potential energy) that can be solved rapidly with the truncated Newton method. Additionally, the MD scheme permits the introduction of a "cutoff" frequency (ω_c) which, in particular, can be used to mimic the quantum-mechanical discrimination among activity of the various vibrational modes.

1. INTRODUCTION

Potential energy minimization and molecular dynamics (MD) simulations encompass two computational approaches for generating biomolecular structures by semi-empirical methods. While both methods are powerful in their own right, there are well-known computational obstacles associated with large-scale applications. These include: (1) the difficulty of nonlinear optimization, in general, often embodied in practice by slow and poor convergence for difficult problems; (2) the multiple-minima problem, in particular, for which no deterministic algorithms are yet satisfactory; and (3) the restriction of the integration step in explicit MD schemes to very small values due to the existence of a wide spread of time scales in biomolecular motions (the spread spans over 10 orders of magnitude).

In Section 2 we describe our truncated Newton minimization package, TNPACK, developed for large-scale problems and potential energy functions in particular (Schlick & Overton, 1987; Schlick & Fogelson, 1991). The method relies on second-derivative information, but in a versatile and efficient way so as to be feasible for large problems. We ultimately hope to use this local minimizer as part of a deterministic global optimization algorithm.

In Section 3, we describe a new MD scheme based on the implicit-Euler integration scheme (IE) and the Langevin formulation (Peskin & Schlick, 1989; Schlick & Peskin, 1989; Schlick et al., 1991b). The implicit scheme permits a wide range of time steps without sacrifice of numerical stability. The stochastic terms (frictional and random) are added to establish a regime where the intrinsic damping of the implicit scheme works in tandem with the damping from friction to effectively damp out high-frequency modes.

We summarize our results in Section 4 and discuss future questions.

Our emphasis throughout this paper is on the formulation and implementation of the algorithms themselves; although comparisons with other methods and applications are mentioned, full details are described in separate works.

2. “TNPACK”, THE TRUNCATED NEWTON MINIMIZATION PACKAGE

2.1. Algorithm

Newton methods for minimization are attractive because of their rapid, quadratic convergence rate (Luenberger, 1984). For large-scale applications, such as computation of protein and nucleic acid structures, they are generally not used because of their prohibitive storage and computational requirements. However, the inherent smoothness of semi-empirical energy functions makes Newton methods potentially powerful: direct second-derivative information could be exploited to produce reliable and efficient performance that would make the additional effort involved worthwhile.

The philosophy of truncated Newton methods is to concentrate computational effort near important


regions of space (i.e. near critical points). Only near these important regions will the algorithm perform the standard Newton method. At other regions, computational effort is less but still ensures that steady progress is made toward a solution (Dembo & Steihaug, 1983).

Recall that Newton methods are based on a local quadratic approximation to the objective function. If we denote E as the energy function, x_k as our collective coordinate vector at step k of the method, and g_k and H_k as the gradient vector and Hessian matrix, respectively, at x_k, then a step from x_k along a direction p can be written as:

E(x_k + p) ≈ E(x_k) + p^T g_k + ½ p^T H_k p.  (1)

In the standard Newton method, we produce a sequence of iterates:

x_{k+1} = x_k + λ_k p_k,  k = 0, 1, 2, ...,  (2)

until the gradient is sufficiently small. In TNPACK, we employ the stopping criterion:

‖g_k‖ < ε max{1, ‖x_k‖},  (3)

where ε is a user-supplied small number, along with other tests for convergence of the x_k and E(x_k) sequences. We currently use a scaled Euclidean norm, ‖·‖/√n, in these tests. In the recursive formula (2), p_k is a search vector that leads to a minimum, or approximate minimum, of the right-hand side of (1); the scalar λ_k > 0 is determined to ensure sufficient decrease along p_k (Dennis & Schnabel, 1983; Schlick & Fogelson, 1991).

When the local quadratic approximation of (1) is accurate (i.e. the function is locally convex), p_k can be obtained as the solution of the linear system known as the Newton equation:

H_k p = −g_k.  (4)

However, when the function is not adequately represented by a quadratic model, H_k may be indefinite; consequently, the solution of (4) may be ill-conditioned or, worse, may not exist. We can remedy this by allowing a positive-definite approximation to H_k, namely H̄_k, to replace H_k in (4):

H̄_k p = −g_k.  (5)

We refer to equation (5) as the "modified" Newton equation. Taken together, the incorporation of H̄_k into the method, along with the line search (which damps the Newton step), produces a quadratically-convergent method that is guaranteed to converge to a local minimum of the objective function (Luenberger, 1984; Dennis & Schnabel, 1983).

In practice, one effective way of producing p_k as a solution of (5) is by the modified Cholesky factorization, MCF (Gill & Murray, 1974; Gill et al., 1983). The Gill et al. factorization is one such MCF strategy (Gill et al., 1983), while that of Schnabel & Eskow offers a more recent and different approach (Schnabel & Eskow, 1988). Both procedures have been analyzed in the context of optimization (Schlick, 1990).

In effect, the MCF solves a symmetric n × n linear system Mz = c in which M need not be positive-definite by adding implicitly to M a diagonal matrix D. The matrix D is chosen so that M̄ = M + D is sufficiently positive-definite. M̄ is then factored into its standard Cholesky factors, and the solution z is obtained by backward and forward substitution (Gill et al., 1983). The strength of the MCF stems from its three basic properties: (1) indefiniteness of M is detected during the factorization itself, and the method is numerically stable; (2) positive-definite matrices are not perturbed (i.e. D becomes the zero matrix and the standard Cholesky factorization results); and (3) computational requirements are approximately O(n²) operations more than the standard Cholesky factorization (for dense matrices). Alternative approaches to MCF, such as exploiting a positive-definite submatrix of H, have also been offered (Nash, 1984).
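To make the idea concrete, the sketch below shows the simplest flavor of a modified Cholesky factorization in Python (assuming dense NumPy arrays): attempt the factorization of M and, if indefiniteness is detected, retry with an increasing diagonal shift D = τI. This "add a multiple of the identity" variant is only illustrative; it is not the Gill et al. (1983) or Schnabel & Eskow (1988) algorithm used in TNPACK, and the function names and shift schedule are assumptions made for this example.

```python
import numpy as np
from scipy.linalg import solve_triangular

def modified_cholesky(M, beta=1e-3):
    """Illustrative modified Cholesky: factor M + tau*I with tau >= 0 chosen so
    that the shifted matrix is positive-definite (simple diagonal-shift variant,
    not the Gill et al. or Schnabel-Eskow strategies cited in the text)."""
    n = M.shape[0]
    min_diag = np.min(np.diag(M))
    tau = 0.0 if min_diag > 0.0 else beta - min_diag    # initial shift
    while True:
        try:
            L = np.linalg.cholesky(M + tau * np.eye(n))
            return L, tau                               # M + tau*I = L @ L.T
        except np.linalg.LinAlgError:                   # indefiniteness detected
            tau = max(2.0 * tau, beta)

def solve_mcf(L, c):
    """Solve (L L^T) z = c by forward and backward substitution."""
    y = solve_triangular(L, c, lower=True)
    return solve_triangular(L.T, y, lower=False)
```

For a positive-definite M the shift remains zero and the standard Cholesky factorization results, matching property (2) above.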

Now, the truncated Newton approach introduces yet another modification to the method of computing p: the linear system (5) need not be solved exactly at every step. A control parameter η_k is introduced to yield the truncated-Newton criterion:

‖H̄_k p_k + g_k‖ ≤ η_k ‖g_k‖.  (6)

It can be shown (Dembo & Steihaug, 1983) that when η_k is chosen as:

η_k = min{c/k, ‖g_k‖},  c ≤ 1,  (7)

quadratic convergence can be maintained, i.e. ‖x_{k+1} − x*‖ ≤ β ‖x_k − x*‖², if x_k is in the neighborhood of a solution x*. Recently, alternatives to (6) have been proposed in which both vector quantities are scaled (Deuflhard, 1990).

In sum, our truncated Newton algorithm iterates on the recursive formula (2) until (3) is satisfied; the search direction p_k is computed to satisfy (6) and (7), and a line search (Dennis & Schnabel, 1983; Schlick & Fogelson, 1991) is used to determine the step length, λ_k. Details of the implementation are discussed in the next subsection.
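The overall flow just summarized can be sketched as follows. This is a minimal dense-matrix Python illustration, not the TNPACK interface: a diagonal preconditioner and a simple Armijo backtracking line search stand in for TNPACK's bonded-Hessian preconditioner and cubic-interpolation line search, and all tolerances are example values.

```python
import numpy as np

def pcg(H, b, M, eta, max_iter=200):
    """Preconditioned conjugate gradients for H p = b, truncated when the
    residual satisfies ||H p - b|| <= eta * ||b||, i.e. criterion (6)."""
    p = np.zeros_like(b)
    r = b.copy()                        # residual b - H p (p = 0 initially)
    z = np.linalg.solve(M, r)           # apply the preconditioner
    d = z.copy()
    for _ in range(max_iter):
        if np.linalg.norm(r) <= eta * np.linalg.norm(b):
            break
        Hd = H @ d
        curv = d @ Hd
        if curv <= 0.0:                 # negative curvature: stop the inner loop
            break
        alpha = (r @ z) / curv
        p = p + alpha * d
        r_new = r - alpha * Hd
        z_new = np.linalg.solve(M, r_new)
        d = z_new + ((r_new @ z_new) / (r @ z)) * d
        r, z = r_new, z_new
    return p if p.any() else b          # fall back to the steepest-descent direction

def truncated_newton(E, grad, hess, x0, eps=1e-6, c=0.5, max_outer=100):
    """Minimal truncated Newton sketch following (2), (3), (6) and (7)."""
    x = x0.astype(float)
    for k in range(1, max_outer + 1):
        g = grad(x)
        if np.linalg.norm(g) < eps * max(1.0, np.linalg.norm(x)):   # test (3)
            break
        H = hess(x)
        eta = min(c / k, np.linalg.norm(g))                         # forcing term (7)
        M = np.diag(np.maximum(np.diag(H), 1e-8))                   # diagonal preconditioner
        p = pcg(H, -g, M, eta)                                      # inner loop
        lam, slope, Ex = 1.0, g @ p, E(x)                           # backtracking line search
        while lam > 1e-10 and E(x + lam * p) > Ex + 1e-4 * lam * slope:
            lam *= 0.5
        x = x + lam * p
    return x
```

Here E, grad and hess are user-supplied callables returning the energy, gradient vector and (dense) Hessian; in TNPACK itself the Hessian is handled sparsely and, optionally, only through the finite-difference product of equation (8) below.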

2.2. Implementation

The truncated-Newton framework described above leads to a nested iteration structure: an outer loop for x_k and an inner loop for p_k. To truncate the inner loop, an iterative procedure for solving linear systems must be used.

The linear preconditioned conjugate gradient method (PCG) is attractive for large-scale problems because of its modest computational requirements and theoretical convergence in at most m ≤ n iterations, where m is the number of distinct eigenvalues of H (Golub & Van Loan, 1989). Each iteration of PCG requires: (1) O(n) additions and multiplications;


(2) a Hessian/vector product (Hd); and (3) solution of a linear system Mz = r, where M is a sparse approximation to H. The use of preconditioning (i.e. the introduction of M) is important for accelerating convergence (Axelsson, 1985). Preconditioning aims to produce an effective matrix (M⁻¹H) with a more clustered eigenvalue structure and/or lower condition number than H. However, it introduces computation (3) above, namely the solution of a different linear system (than the modified Newton) at each iteration. Thus, it is essential for efficiency of the method that M be factored very rapidly in relation to the original Hessian matrix (H). Additionally, it is essential that the effective preconditioner be positive-definite for standard implementation of PCG.

For potential energy functions, our implementation of the truncated Newton method is the following:

1. Choice of M. We use the Hessian components from the local chemical interactions: bond length, bond angle and dihedral angle terms. If we write H as the sum H = H_B + H_NB, where H_B contains the bonded terms and H_NB contains the nonbonded terms, then M = H_B. Since H_B can be evaluated more rapidly than H_NB [there are O(n²) pairwise interactions but at most O(n) bonded interactions], M provides an excellent candidate as a preconditioner (approximation) to H. The components of M are automatically constructed as H is evaluated and, moreover, they lead to a sparse, diagonally-clustered structure (Schlick et al., 1991b). In fact, the sparsity increases significantly with the problem size which, of course, is advantageous in our context.

2. Implementation of Hd. One can evaluate this Hessian/vector product in the standard way, but to save storage and computation of the nonbonded Hessian components we employ the following finite-difference design (an illustrative sketch of this product is given after item 3 below):

Hd ≈ [g(x + hd) − g(x)]/h.  (8)

In this expression, h is a suitably-chosen small number, on the order of the cube root of machine precision (Schlick & Fogelson, 1991). This implementation requires that an additional gradient be evaluated at every PCG step, but it entirely eliminates the need for computing analytically and storing the expensive, pairwise Hessian elements of H_NB.

3. Implementation of Mz = r. Together with the choice of M, the manner of solving Mz = r is crucial to optimal performance of the algorithm. In TNPACK, we incorporate a sparse MCF for this purpose. The MCF, as discussed earlier, is used to

Fig. 1. TNPACK progress for deoxycytidine minimization, n = 87. For each iteration, the values of E, ‖g‖ and ‖r‖ = ‖H̄p + g‖ are shown.


guarantee that the effective preconditioner to the Newton equations (M̄ = M + D) is positive-definite. The sparse factorization ensures that only the nonzero elements of M are processed.

We have taken the Yale Sparse Matrix Package (YSMP), a FORTRAN package for solving large sparse linear systems (Eisenstat et al., 1981, 1982), and modified it to perform the Gill et al. MCF (Gill et al., 1983; Schlick & Fogelson, 1991). YSMP is attractive in our context because of its modular solution process. First, it provides an ordering routine that finds an optimal variable reordering so as to minimize fill-in (i.e. the introduction of nonzeros in the factors of M in place of original zeros). Second, it offers a compact storage format in which only the nonzero elements are stored in a one-dimensional array (integer arrays are used to keep track of the corresponding matrix indices). Third, the solution process for Mz = r is obtained in three steps: (1) compute a symbolic factorization of M; (2) compute a numerical factorization of M; and (3) perform backward and forward substitution to calculate z. In the truncated Newton context, the reordering and the symbolic factorization of M are done only once, assuming no covalent bonds are broken or formed. The numerical factorization of M is performed once for each new Newton iteration k, while the backward

and forward substitutions are performed every PCG iteration.
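As promised in item 2 above, here is a short Python sketch of the finite-difference Hessian/vector product (8). It is illustrative only: the text specifies just that h is on the order of the cube root of machine precision, so the norm-based scaling of h used here is an assumption of this example.

```python
import numpy as np

def hessian_vector_product(grad, x, d):
    """Approximate H(x) d via the forward difference of equation (8),
    using one extra gradient evaluation per product."""
    eps = np.finfo(float).eps     # machine precision
    # h on the order of eps**(1/3); the norm-based scaling is illustrative only.
    h = eps ** (1.0 / 3.0) * max(1.0, np.linalg.norm(x)) / max(np.linalg.norm(d), 1e-30)
    return (grad(x + h * d) - grad(x)) / h
```

For a quadratic energy E(x) = ½ x^T A x, with grad(x) = Ax, the product returns A d exactly up to rounding error.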

2.3. Computational complexity and performance

The performance time and efficiency in nonlinear optimization problems in general are highly dependent on the problem size, functional form, starting point and choice of parameters. For TNPACK, in brief, computation time depends on the number of E and g evaluations, Hd multiplications and Mz = r solutions. The complexity of some of these operations depends on the user's implementation.

Every outer loop in TNPACK requires one E and g evaluation. Additional evaluations may be needed for the line search. [The line search is performed by cubic interpolation, and its termination depends on how rapidly the function-reduction criteria are satisfied; see Schlick & Fogelson (1991).] Every inner loop requires: (1) one Hd multiplication [which means that an additional gradient evaluation is required when (8) is used]; (2) O(n) additions and multiplications; and (3) one sparse MCF solution of Mz = r. As mentioned above, reordering of M is O(n³) but is done only once, at the beginning of minimization. This effort, however, can significantly reduce the cost of all subsequent steps. The numerical factorization of M is performed only once every outer iteration, and the

Fig. 2. TNPACK progress for minimization of a water cluster, n = 1125. See Fig. 1 caption.


numerical solution of the linear system is done once every PCG iteration. We have shown (Schlick & Fogelson, 1991) that in practice, for large sparse systems where the number of nonzeros in each row of the factors of M can be bounded independently of n, the cost of the solution process is O(n). This is a significant reduction from the standard dense factorization, which is O(n³).

Our experience with TNPACK for potential energy functions has found the algorithm to be reliable and efficient. Comparisons with other methods have been made for small to medium-sized problems (Schlick & Overton, 1987; Schlick & Fogelson, 1991; van de Graaf, 1990; Nash & Nocedal, 1989) and large-scale problems (Zou et al., 1991). For large-scale problems, a class of algorithms that may be competitive with truncated-Newton methods is limited-memory quasi-Newton methods (Liu & Nocedal, 1989). Preliminary results already suggest that truncated Newton methods perform better on approximately quadratic problems and on problems with ill-conditioned Hessians. We believe that with a good choice of preconditioner, an efficient truncated Newton approach is ideal for problems where the function and gradient are continuously differentiable and costly to evaluate. For these cases, the economy and reliability of truncated Newton methods are advantageous.

Illustrations of performance are provided in Figs 1 and 2. Figure 1 shows progress for a deoxycytidine model of 87 variables, while Fig. 2 demonstrates progress for a water-cluster model of 1125 variables (125 molecules). The water problem is particularly difficult because the initial configuration is highly random, so the associated energy and gradient norm are very high. Nevertheless, we can see in both examples that convergence occurs in far fewer than n iterations and that convergence is rapid at the last few steps. We can also observe that the residual norm (‖H̄p + g‖) is high at the beginning of minimization but significantly reduced as the algorithm proceeds. This is a consequence of the truncation criteria (6) and (7). Moreover, this behavior demonstrates the strength of the truncated Newton approach: overall progress is not sacrificed by the approximation to the Newton search direction at each step [see also Schlick & Overton (1987) for a comparison with full Newton algorithms].

3. THE LANGEVIN/IMPLICIT-EULER SCHEME

3.1. Objectives

In developing the scheme, we were interested in addressing the following issues: (1) using larger time steps than those permitted by explicit schemes; and (2) damping the high-frequency vibrational modes. The two goals are related, since the use of larger time steps is justified only if the high-frequency modes are effectively "damped out" by the method. Furthermore, quantum mechanics gives a rationale for searching for a numerical scheme with these properties: modes with frequencies ω ≫ k_B T/ħ (k_B = Boltzmann's constant, T = absolute temperature and ħ = Planck's constant divided by 2π) are essentially frozen at their ground state.

3.2. Algorithm

To address the first issue, larger time steps, we turn to a well-known numerical scheme for "stiff" problems, the implicit Euler method (Dahlquist & Björck, 1974). The method is well known for its high stability.

The implicit Euler scheme (IE) discretizes the differential equation dy/dt = f[y(t)], where y is a vector, by the formula (y^{n+1} − y^n)/Δt = f(y^{n+1}). In this notation, y^n represents an approximate solution to y at time nΔt. Since the difference formula looks "back" with respect to y^{n+1}, the scheme is also known as backward Euler. In contrast, the explicit, or forward, Euler scheme uses y^n in the right-hand side to produce (y^{n+1} − y^n)/Δt = f(y^n). The explicit formulation is much easier to solve, but it exhibits violent instability when Δt is not sufficiently small [see Fig. 3 and Dahlquist & Björck (1974)]. Higher-order explicit schemes are also vulnerable to the choice of step size for stiff problems.
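The contrast can be reproduced with a few lines of Python on the scalar test equation y′ = −ay used in Fig. 3; the values a = 1 and aΔt = 2.2 are taken from the figure caption and are otherwise arbitrary.

```python
def euler_comparison(a=1.0, dt=2.2, y0=1.0, n_steps=10):
    """Integrate y' = -a*y with forward (explicit) and backward (implicit) Euler.
    For a*dt = 2.2: explicit iterates are (-1.2)**n (growing in magnitude),
    implicit iterates are (3.2)**(-n) (decaying, as the exact solution does)."""
    y_ex, y_im = y0, y0
    for n in range(1, n_steps + 1):
        y_ex = (1.0 - a * dt) * y_ex        # explicit (forward) Euler
        y_im = y_im / (1.0 + a * dt)        # implicit (backward) Euler
        print(f"n={n:2d}   explicit={y_ex:12.4e}   implicit={y_im:12.4e}")

euler_comparison()
```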

We apply IE to the Langevin dynamics formulation (McCammon & Harvey, 1987; van Gunsteren et al., 1981):

M d²x(t)/dt² = −g_E[x(t)] − γ M v(t) + r(t),  (9)

where M is the diagonal mass matrix (not to be confused with the preconditioner used earlier), x and v are the coordinate and velocity vectors, respectively, g_E is the gradient of the potential energy, γ is a "collision frequency" and r is a random force. The random force is a stationary, Gaussian process with mean zero and covariance matrix given by

Fig. 3. Solutions to the differential equation y′ = −ay, a > 0 (exact solution y = y⁰ e^{−at}), with the implicit (IM) and explicit (EX) Euler methods. The IM discretization produces y^n = (1 + aΔt)^{−n} y⁰, while EX gives y^n = (1 − aΔt)^n y⁰. The IM scheme is always stable since aΔt > 0, but EX requires that aΔt < 2. For example, for y⁰ = 1 and aΔt = 2.2, we obtain y^n = (3.2)^{−n} for IM and y^n = (−1.2)^n for EX. Dramatic instability can be noted in the latter.


⟨r(t) r(t′)^T⟩ = 2γ k_B T M δ(t − t′). Upon discretization by IE, we obtain the following pair of difference equations:

M (v^{n+1} − v^n)/Δt = −g_E(x^{n+1}) − γ M v^{n+1} + r^{n+1},  (10a)

(x^{n+1} − x^n)/Δt = v^{n+1},  (10b)

where

⟨r^n⟩ = 0,  (11a)

⟨r^n (r^{n′})^T⟩ = 2γ k_B T M (δ_{nn′}/Δt).  (11b)

Note that both x and v are evaluated at step n + 1 in the right-hand side of (10a).

3.3. Implementation

To solve for x^{n+1}, we eliminate v^{n+1} from (10a) to obtain:

(1 + γΔt) M (x^{n+1} − x̄₀) + (Δt)² g_E(x^{n+1}) = 0,  (12a)

where

x̄₀ = x^n + [Δt/(1 + γΔt)] (v^n + Δt M⁻¹ r^{n+1}).  (12b)

Note that we have collected in the vector x̄₀ quantities known from the previous time step. (r is random and is chosen independently at every step.) Now, if we think of (12a) as a condition that the gradient of some function Φ(x) is zero (i.e. g_Φ = 0), we can solve for x^{n+1} by minimizing the objective "dynamics" function:

Φ(x) = ½ (1 + γΔt) (x − x̄₀)^T M (x − x̄₀) + (Δt)² E(x).  (13)

Thus, each step of our MD scheme consists of obtaining x^{n+1} by minimizing Φ(x) and then computing v^{n+1} from (10b).
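A single step of the scheme, as just described, might be sketched in Python as follows. This is an illustration only: masses are stored as a diagonal vector m, the random force follows (11b), the reference point follows (12b), and a general-purpose quasi-Newton routine (scipy.optimize.minimize) stands in for the TNPACK minimization of Φ; the function names and the choice of minimizer are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import minimize

def langevin_ie_step(x, v, E, grad_E, m, gamma, dt, kB_T, rng):
    """One Langevin/implicit-Euler step (illustrative sketch of equations 10-13).
    x, v : current coordinates and velocities (1-D arrays)
    m    : diagonal masses (array of the same shape as x)."""
    # Random force with per-component variance 2*gamma*kB*T*m/dt, equation (11b).
    r = rng.normal(0.0, np.sqrt(2.0 * gamma * kB_T * m / dt), size=x.shape)

    # Reference point of equation (12b).
    x_bar = x + (dt / (1.0 + gamma * dt)) * (v + dt * r / m)

    def phi(y):          # dynamics function, equation (13)
        kinetic = 0.5 * (1.0 + gamma * dt) * np.sum(m * (y - x_bar) ** 2)
        return kinetic + dt ** 2 * E(y)

    def grad_phi(y):     # its gradient, the left-hand side of (12a)
        return (1.0 + gamma * dt) * m * (y - x_bar) + dt ** 2 * grad_E(y)

    res = minimize(phi, x0=x_bar, jac=grad_phi, method="L-BFGS-B")
    x_new = res.x
    v_new = (x_new - x) / dt      # velocity update, equation (10b)
    return x_new, v_new
```

In the actual method the minimization is performed with TNPACK, exploiting the fact that the Hessian of Φ is the potential-energy Hessian shifted by a positive multiple of the mass matrix (see points 3 and 4 below and Section 3.5).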

The following points are worth noting regarding implementation of the scheme:

1. Φ(x) contains a "kinetic" and a potential energy term. This composition explains intuitively the reason the method is stable for a wide range of Δt: when Δt is large, the potential term dominates, and we essentially follow low-energy forms in conformation space; when Δt is small, the kinetic term dominates, and we perform continuous dynamics. Thus, the method can be viewed as an interpolator between energy minimization and molecular dynamics. This suggests that the method may be used to reveal certain properties that the static approach (minimization) cannot and for which the standard, continuous dynamic approach is not sufficient.

2. Since we minimize Φ(x) to obtain x^{n+1}, any solution (i.e. local minimum) will suffice. In other words, we need not worry about the multiple-minima problem in this context. In fact, it is a desirable outcome to find a configuration in a different region of space because this is, after all, an important goal in dynamics. Indeed, we have already encountered such cases in the first time steps, where nucleic-acid sugars changed from C3′-endo to C2′-endo puckering forms. (Note that this reveals information on accessible conformation space of the system but not on transition pathways.)

3. Minimization of Φ(x) is well-suited for TNPACK, our minimization package described in the previous section. Since the kinetic term is a simple quadratic, its evaluation and differentiation are rapid (see also subsection 3.5). Furthermore, the second partial derivatives of the kinetic term form a positive multiple of the diagonal mass matrix. This not only shifts the eigenvalue spectrum corresponding to the potential term in the positive direction but preserves the sparsity structure of the problem. Thus, all the components we developed for efficient minimization of E in TNPACK can be fully exploited in this new context. In particular, the preconditioner for Φ can be simply formulated as the preconditioner for E added to a multiple of the mass matrix. Performance of TNPACK is thus especially rapid when a quadratic model provides a good approximation for Φ.

4. Although we minimize Φ(x) at every dynamics step to solve for x^{n+1}, we have an excellent starting candidate: x̄₀, x^n + (Δt)v^n, or x^n. In our experience, x̄₀ works very well, and convergence occurs in a small number of iterations, roughly 2-14, where the exact number depends on the size of Δt.

3.4. Choice of parameter

To address the second issue, damping high-frequency modes, we focus on the Langevin parameter γ. In the standard formulation, γ is a physically-determined collision frequency. It determines the strength of coupling between the system and its heat bath, it determines the rate at which energy is dissipated through friction, and it enters into the properties of the random force through (11b).

The IE scheme introduces yet another type of damping, the intrinsic damping of the scheme. By solving analytically the equations of motion for harmonic oscillators, we can show that the rate constant for amplitude decay of a mode with frequency ω is γ/2 from friction (for an underdamped oscillator) and (ω²Δt)/2 from IE (see Fig. 4). Thus, the two rates are very different; frictional damping is independent of ω as well as Δt, while the scheme's damping is both ω- and Δt-dependent. In particular, for fixed Δt, the higher the frequency, the larger the damping transmitted by the IE scheme, but friction treats both high and low frequencies equally.

This analysis led to the following idea (Peskin & Schlick, 1989): choose a "cutoff frequency" ω_c and set γ to:

γ = ω_c² Δt.  (14)

This choice will make the two decay rates equal at ω = ω_c (see Fig. 4). Consequently, it might lead to a


Fig. 4. Rate constants for amplitude decay for harmonic oscillators. The rates are γ/2 from friction (for an underdamped oscillator) and (ω²Δt)/2 from the intrinsic, implicit-Euler damping. The two rates can be set equal at ω = ω_c if γ = ω_c² Δt. See text for details.

damping of modes ω ≫ ω_c, where the IE damping dominates, and "activation" of modes ω ≪ ω_c, where frictional damping dominates.

Theoretical analysis of this choice of γ was performed for the case of coupled harmonic oscillators (Peskin & Schlick, 1989). We have shown that by setting ω_c to the natural quantum-mechanical value:

ω_c = k_B T/ħ,  (15)

we can obtain an energy partition among the various modes that closely resembles the quantum-mechanical distribution (see Fig. 5). This result

Fig. 5. Energy distributions for coupled harmonic oscillators from quantum mechanics and from the Langevin/implicit-Euler scheme. The energies are plotted as a function of β = ω/ω_c, where ω_c = k_B T/ħ. The distribution curves are E_QM = k_B T β/(e^β − 1) and E_L/IE = k_B T/(1 + β²), respectively (Peskin & Schlick, 1989). Note that the classical energy distribution is E_CM = k_B T for all β. In contrast, both the QM and L/IE curves exhibit the energy discrimination among the various vibrational frequencies, where only the low-frequency modes have near k_B T energy. Critical points for the nonlinear curves occur near β = 1.

was obtained in the limit ω_c Δt → 0 with ω_c fixed at (15). (Note that γ → 0 as ω_c Δt = γ/ω_c → 0, and this produces different behavior from Langevin dynamics at fixed γ.) We found that low-frequency modes have their full share of k_B T energy per mode (as predicted by the classical equipartition theorem), while high-frequency modes contribute much less.
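As a small numerical illustration of (14) and (15), the snippet below evaluates ω_c and the resulting γ; the temperature and time step are example values chosen for this sketch, not taken from the paper.

```python
# Evaluate the cutoff frequency (15) and the resulting gamma (14)
# for illustrative values T = 300 K and dt = 10 fs.
kB   = 1.380649e-23        # Boltzmann constant, J/K
hbar = 1.054571817e-34     # reduced Planck constant, J*s

T, dt = 300.0, 10.0e-15    # example temperature (K) and time step (s)

omega_c = kB * T / hbar            # equation (15): ~3.9e13 rad/s
gamma   = omega_c ** 2 * dt        # equation (14): ~1.5e13 1/s

print(f"omega_c = {omega_c:.3e} rad/s, gamma = {gamma:.3e} 1/s")
```

With these values, a mode at ω = ω_c has equal decay rates from friction and from the IE scheme, as in Fig. 4.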

Fig. 6. MD-computed energy distributions (A: kinetic energy; B: potential energy) for a coupled harmonic oscillator system by the Langevin/implicit-Euler scheme, plotted against frequency. Results are displayed against the theoretical distribution shown more clearly in Fig. 5 (see caption above). Results were obtained from the same run as described in Peskin & Schlick (1989) but with double the number of iterations.


A critical point for the distribution curve occurs near ω = ω_c. This frequency discrimination is thus very different from classical statistical-mechanics predictions (see Fig. 5).

3.5. Computational complexity and performance

Each iteration of dynamics requires that the random force r be generated from a Gaussian distribution, that the gradient of the potential energy be calculated, and that several vector operations be performed [equations (12b), (10b)]. We calculate each component of r independently from a Gaussian distribution by the algorithm of Odeh & Evans (1974), using the pseudo-random number generator suggested by Park & Miller (1988). Our formulation requires, in addition, that Φ and its derivatives be evaluated for the purpose of minimization. Since we use TNPACK for minimizing Φ, we require routines to evaluate Φ and its first and second derivatives. The derivatives of Φ can be easily computed once the derivatives of E are available:

g_Φ(x) = (1 + γΔt) M (x − x̄₀) + (Δt)² g_E(x),  (16a)

H_Φ(x) = (1 + γΔt) M + (Δt)² H_E(x).  (16b)

Thus, our scheme increases the complexity of analogous explicit formulations by adding the requirement for minimization of Φ. Computational cost for minimization with TNPACK was discussed in the previous section. As we mentioned, all issues of preconditioner and sparsity are preserved for Φ and, moreover, minimizing Φ is easier and more rapid than minimizing E: excellent starting points are available, and H_Φ is generally positive-definite. Our experience thus far indicates that 2-3 iterations of Φ-minimization are required for Δt = 5 × 10⁻¹⁵ s (10⁻¹⁵ s is a typical step size in explicit schemes), 5-8 iterations for Δt = 10 × 10⁻¹⁵ s, and 8-14 iterations for Δt = 20 × 10⁻¹⁵ s (Schlick et al., 1991a).

Numerical investigations have been performed with our Langevin/implicit-Euler scheme for a coupled harmonic oscillator system [see Fig. 6 and Schlick & Peskin (1989)], a diatomic molecule governed by a Morse potential [see Fig. 7 and Schlick & Peskin (1989)] and a rigid rotator (Peskin, 1990). All results exhibited good agreement with the quantum-mechanical discrimination among activity of the various modes. In particular, for the diatomic molecule we have successfully mimicked full excitation of vibrational motion only at high temperatures and, consequently, the observed dependence of the heat capacity of gases on temperature (Fig. 7).

A computational study for larger systems was initiated for deoxycytidine (Schlick et al., 1991b),

Fig. 7. MD-computed energies for the diatomic molecule HBr with a Morse potential, by the Langevin/implicit-Euler scheme (Schlick & Peskin, 1989). MD energies were calculated by averaging the kinetic and potential components over entire trajectories; each trajectory was carried out at a different temperature with a corresponding ω_c = k_B T/ħ and Δt = 0.1/ω_c. Quantum-mechanical energies were calculated from known, approximate solutions to the Schrödinger equation. The solid lines are references from classical theory: (7/2)RT, the expected total energy per mole of diatomic molecules at thermal equilibrium, and (5/2)RT, the expected energy associated with translation and rotation only (i.e. no vibration).


where a transition between two local-energy minima was captured. We have also performed liquid water simulations with the scheme and compared structural and energetic results to those obtained by an analogous explicit formulation and Monte Carlo (Schlick et al., 1991a). In that work, the feasibility of 10 fs time steps has been demonstrated. Additionally, we are examining how the scheme performs with larger time steps and different cutoff frequencies on butane dynamics, and how Δt and ω_c affect the trans/gauche transition rates. We hope to implement the method to study macromolecular dynamics in the future.

4. SUMMARY

In this paper, we have described new approaches for performing large-scale potential energy minimization and molecular dynamics simulations. We have shown that Newton methods may be feasible and powerful for large-scale functions when adapted appropriately to the problem at hand, and that implicit molecular dynamics formulations offer a viable approach toward increasing the integration step size.

The main features of our truncated Newton minimization algorithm are the following: (1) it is tailored for large-scale problems; (2) it concentrates computational effort near important regions of conformation space; (3) it exploits the Hessian separability into bonded and nonbonded energy terms for constructing the preconditioner; (4) it avoids direct computation of the nonbonded second derivatives; and (5) it converges quadratically and reliably to a local minimum.

Our Langevin/implicit-Euler scheme for molecular dynamics has these features: (1) it allows a range of integration step sizes without loss of numerical stability; (2) in return, it requires that an optimization problem be solved at every step; (3) nonetheless, the optimization problem can be solved rapidly with the truncated Newton method; and (4) it allows an assignment of a cutoff frequency ω_c to effectively damp out high-frequency modes ω ≫ ω_c.

While these algorithms have already demonstrated an increase in scope and efficiency in biomolecular simulations, many questions remain to be addressed:

1. How will the truncated Newton method perform for very large problems, in particular in comparison with limited-memory quasi-Newton methods? Nash & Nocedal (1989) suggest that truncated Newton methods may be more efficient for problems with ill-conditioned Hessians. For general problems, their efficiency depends on economy in the inner loop, through a good preconditioning strategy (Zou et al., 1991).

2. How can we determine systematically appropriate preconditioners for a given problem? For example, for very large problems, some of the nonbonded terms may be important to consider.

3. How will our MD scheme perform with larger time steps on proteins and nucleic acids? That is, how competitive will the scheme be with analogous explicit formulations?

4. How will the choice of Δt and ω_c affect the structural and energetic results? In particular, what will the behavior be when ω_c is set to focus on a specific low frequency of interest, and how correct will the results be?

Such investigations are currently underway (Schlick & Olson, 1991; Nyberg & Schlick, 1991).

Acknowledgments - The work on nonlinear optimization is done in collaboration with Michael Overton, and the work on molecular dynamics is done in collaboration with Charles Peskin. I am indebted to Suse Broyde for introducing me to these important mathematical problems in macromolecular simulations and for many subsequent discussions and suggestions. I thank Jerry Percus for his continuous interest and contributions and Sam Figueroa for his programming assistance. Finally, I thank Delos DeTar and David Edelson, the wonderful hosts of the conference, for inviting me and for organizing a very exciting and enjoyable program.

This work was made possible through generous support from the National Science Foundation, the American Association of University Women Educational Foundation, the New York State Science & Technology Foundation, the Searle Scholar Program, the Whitehead Presidential Fellowship, the San Diego Supercomputer Center, and the Academic Computing Facility at New York University.

REFERENCES

Axelsson O. (1985).
Dahlquist G. & Björck Å. (1974) Numerical Methods. Prentice-Hall, Englewood Cliffs, New Jersey.
Dembo R. S. & Steihaug T. (1983) Math. Prog. 26, 190.
Dennis J. E. & Schnabel R. B. (1983) Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, New Jersey.
Deuflhard P. (1990) Proc. Copper Mountain Conf. on Iterative Methods, Copper Mountain, CO.
Eisenstat S. C., Schultz M. H. & Sherman A. H. (1981) SIAM J. Sci. Stat. Comput. 2, 225.
Eisenstat S. C., Gursky M. C., Schultz M. H. & Sherman A. H. (1982) Int. J. Numer. Meth. Eng. 18, 1145.
Gill P. E. & Murray W. (1974) Math. Program. 28, 311.
Gill P. E., Murray W. & Wright M. H. (1983) Practical Optimization. Academic Press, New York.
Golub G. H. & Van Loan C. F. (1989) Matrix Computations, Second Edn. Johns Hopkins Univ. Press, Baltimore.
van de Graaf B. (1990) Presentation at Workshop on Molecular Mechanics and Molecular Dynamics, Florida State University.
van Gunsteren W. F., Berendsen H. J. C. & Rullmann J. A. C. (1981) Mol. Phys. 44, 69.
Liu D. C. & Nocedal J. (1989) Math. Prog. 45, 503.
Luenberger D. G. (1984) Linear and Nonlinear Programming, Second Edn. Addison-Wesley, Reading, Mass.
McCammon J. A. & Harvey S. C. (1987) Dynamics of Proteins and Nucleic Acids. Cambridge University Press, U.K.
Nash S. G. (1984) Report 84-01, Operations Research Group, Johns Hopkins University, Baltimore, Md.


Nash S. G. & Nocedal J. (1989) Technical Report NAM-02, Northwestern University, Department of Electrical Engineering and Computer Science, Evanston, Ill.
Nyberg A. & Schlick T. (1991) J. Chem. Phys., in press.
Odeh R. E. & Evans J. O. (1974) Appl. Stat. 23, 96.
Park S. K. & Miller K. W. (1988) Comm. ACM 31, 1192.
Peskin C. S. (1990) Comm. Pure Appl. Math. 43, 599.
Peskin C. S. & Schlick T. (1989) Comm. Pure Appl. Math. 42, 1001.
Schlick T. (1990) Computer Science Technical Report 525, Courant Institute, New York University.
Schlick T. & Fogelson A. (1991) ACM Trans. Math. Softw., in press.
Schlick T. & Olson W. K. (1991) Molecular dynamics of supercoiled DNA, in preparation.
Schlick T. & Overton M. L. (1987) J. Comput. Chem. 8, 1025.
Schlick T. & Peskin C. S. (1989) Comm. Pure Appl. Math. 42, 1141.
Schlick T., Figueroa S. & Mezei M. (1991a) J. Chem. Phys. 94, 2118.
Schlick T., Hingerty B. E., Peskin C. S., Overton M. L. & Broyde S. (1991b) In Theoretical Biochemistry and Molecular Biophysics: A Comprehensive Survey (Edited by Beveridge D. L. & Lavery R.). Adenine Press, New York.
Schnabel R. B. & Eskow E. (1988) Computer Science Dept. Tech. Report CU-CS-415-88, University of Colorado, Boulder, CO.
Zou X., Navon I. M., Berger M., Phua P. K. & Schlick T. (1991) Numerical Experience with Limited-Memory Quasi-Newton and Truncated Newton Methods for Large-Scale Minimization. Preprint.