
An Analysis of the Structural and Energetic Properties of Deoxyribose by Potential Energy Methods

Tamar Schlick, Charles Peskin, Suse Broyde* and Michael Overton
Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012

Received 12 December 1986; accepted 16 April 1987

We discuss the three fundamental issues of a computational approach in structure prediction by potential energy minimization, and analyze them for the nucleic acid component deoxyribose. Predicting the conformation of deoxyribose is important not only because of the molecule's central conformational role in the nucleotide backbone, but also because energetic and geometric discrepancies from experimental data have exposed some underlying uncertainties in potential energy calculations. The three fundamental issues examined here are: i) choice of coordinate system to represent the molecular conformation; ii) construction of the potential energy function; and iii) choice of the minimization technique. For our study, we use the following combination. First, the molecular conformation is represented in Cartesian coordinate space with the full set of degrees of freedom. This provides an opportunity for comparison with the pseudorotation approximation. Second, the potential energy function is constructed so that all the interactions other than the nonbonded terms are represented by polynomials of the coordinate variables. Third, two powerful Newton methods that are globally and quadratically convergent are implemented: Gill and Murray's Modified Newton method and a Truncated Newton method, specifically developed for potential energy minimization. These strategies have produced the two experimentally-observed structures of deoxyribose with geometric data (bond angles and dihedral angles) in very good agreement with experiment. More generally, the application of these modeling and minimization techniques to potential energy investigations is promising. The use of Cartesian variables and polynomial representation of bond length, bond angle and torsional potentials promotes efficient second-derivative computation and, hence, application of Newton methods. The Truncated Newton method, in particular, is ideally suited for potential energy minimization not only because the storage and computational requirements of Newton methods are made manageable, but also because it contains an important algorithmic adaptive feature: the minimization search is diverted from regions where the function is nonconvex and is directed quickly toward physically interesting regions.

I. INTRODUCTION

Theoretical approaches for predicting three-dimensional structures of nucleic acids and proteins are now recognized as powerful tools for revealing details of molecular conformation, motion, and associated biological functions. As a recent article in Science states, “It is clear that theoretical chemistry has entered a new stage ... with the goal of being no less than full partner with experiment.” Indeed, theoretical calculations using semi-empirical potential energy functions can complement information obtained from experimental techniques, such as x-ray crystallography and spectroscopic methods, and can make reliable predictions [refs. 1-4]. The fundamental idea behind potential energy calculations is to construct a function that represents the free energy associated with a specific molecular conformation; the conformation corresponding to the minimum free energy can then be approximated by potential energy minimization. The challenge and power of this approach lie in its ability to incorporate all experimental data known from crystal structures of small molecules with the goal of predicting structures of large biological systems. Understanding the correlation between conformation and biological function is our ultimate goal.

*Biology Department, New York University, 100 Washington Square East, New York, NY 10003

Two basic ingredients of potential energy studies make these investigations difficult: construction of the energy function and energy minimization. With increasing accuracy of experimental data and better computing resources, these difficulties may be overcome. Nonetheless, understanding the basic computational problems is important for improved theoretical treatments.

Journal of Computational Chemistry, Vol. 8, No. 8, 1199-1224 (1987) © 1987 by John Wiley & Sons, Inc. CCC 0192-8651/87/0801199-26$04.00

In this article, we examine for a deoxyribose model the three fundamental issues of a computational approach: choice of coordinate system to represent molecular conformation, construction of the potential energy function, and choice of minimization technique. We analyze the preferred sugar conformations of C2'-endo and C3'-endo (see Figs. 1, 2) that are generated from minimization with geometries as observed for many sugar fragments of crystallographically-determined nucleosides and nucleotides.

Deoxyribose is particularly suitable for this analysis since it is relatively small yet already structurally complex. Because the furanose ring is linked to both the phosphate and base groups of a nucleotide, its conformation strongly influences the overall conformation of a nucleic acid. Mathematically, the problem is challenging because the ring atom coordinates are difficult to generate due to constraints imposed by ring closure [refs. 10-16]. Many early attempts to reproduce the observed puckering preferences of nucleic acid sugars with associated endocyclic bond angle values by potential energy methods were imperfect. These discrepancies were attributed to the transfer of energy parameters from general chemical sequences in small molecules to analogous chemical groups in the furanose ring. Olson has reconciled these differences by directly incorporating experimental values of the bond angles that accompany the puckering and by improving the energy function with a gauche potential. While more recent minimization or dynamics studies [refs. 18-20] have been more successful at reproducing qualitative aspects of pseudorotation (regions of energy minima and maxima, values of energy barriers), there are still some discrepancies from experiment in the exact location of the minima and maxima, and, when reported, in the bond angle deviations that occur with puckering. Accurate generation of structures and energies for the furanose ring by full Cartesian energy minimization is important in order to compute nucleic acid structures by minimization in conformational space with all degrees of freedom, rather than in dihedral angle space [see, for example, refs. 21-23].

Figure 1. A schematic representation of the 2'-deoxyribose model investigated in this study. The base of a nucleoside is replaced by an amino group, and the CH2OH group attached to the C4' atom of a nucleoside is replaced by a methyl group. The sugar atoms are designated by primes as in the conventional nucleic acid notation to distinguish sugar atoms from base atoms. θ0-θ4 are the 5 endocyclic bond angles at atoms O1' through C4', in clockwise order, and τ0-τ4 are the 5 endocyclic dihedral angles defining the rotations about bonds O1'-C1', C1'-C2', C2'-C3', C3'-C4' and C4'-O1', respectively. A dihedral angle is defined as 0° if all 4 atoms involved are coplanar and cis. The sign of the dihedral angle is determined by the procedure described in Section III(A).

Our method for analysis combines the following three computational components. First, the potential energy is represented in Cartesian coordinate space with the full set of degrees of freedom associated with the molecular conformation. When properly parameterized, this approach provides a realistic view of the relaxed molecular conformation and, moreover, allows comparison with the pseudorotation approximation. Second, the potential energy function is constructed so that all the interactions other than the nonbonded terms are represented by polynomials of the coordinate variables. This formulation allows direct differentiation with respect to the Cartesian variables and facilitates manipulation and differentiation of the energy terms. Third, two powerful Newton methods that are globally and quadratically convergent [ref. 24] are implemented: Gill and Murray's Modified Newton method and a Truncated Newton method specifically developed for potential energy minimization. Full details of the minimization algorithms are provided in the accompanying paper.

In Section II we discuss the fundamental issues of computational approaches. Section III discusses the methods: geometric construction, energy formulation and parameterization, and organization for minimization. In Section IV, we examine in detail the influence of different variations in potential energy parameters and modeling strategies on the structures obtained and on the C3'-endo/C2'-endo energy difference. In particular, we study the contribution of each energy potential from Energy vs. P (the pseudorotation parameter) profiles. These curves also provide an opportunity to investigate the approximation made in the pseudorotation description. In Section V we summarize our overall conclusions.

Figure 2. The two common sugar puckering modes of C3'-endo and C2'-endo as generated by minimization. Two different views are given for each conformation. In classifying puckered conformations for five-membered rings, two basic forms are discussed: an envelope (E), in which four atoms are in a plane and the fifth atom is out of it, or a twist (T), where two adjacent atoms are displaced on opposite sides from the plane defined by the other three ring atoms. Sugar puckering modes are defined according to which atom or atoms are displaced from these three or four atom planes: atoms displaced on the same side as exocyclic carbon C5' are called endo, and those on the opposite side of C5' are termed exo.

II. COMPUTATIONAL APPROACHES

In any computational approach to the determination of molecular structure by energy minimization, three basic decisions must be made:

1. The degrees of freedom used to describe the molecular structure,

2. The analytic form of the energy function, and

3. The choice of a minimization algorithm.

The structural outcome, and hence the biological implication, of any calculation is highly dependent on the combination of choices taken. Given the same starting point, different minimization algorithms may predict different local minima, and different structures may result from different potential energy models. We will describe these three issues in turn.

A. Degrees of Freedom

The conformation of a molecule is described by a list of numbers that specifies the relative positions of the atoms in space. By definition, the conformation is unchanged when the molecule as a whole is subjected to rigid-body motion (translation or rotation). If the molecule contains N atoms, 3N - 6 numbers are required to specify the conformation (for N ≥ 3). These numbers may be chosen in different ways. For example, one might use the 3N Cartesian coordinates of the atoms and force 6 of these numbers to be zero by putting a particular atom at the origin, a second atom along the x axis, and a third atom in the x,y plane. Alternatively, the conformation may be specified as some combination of bond lengths, bond angles, and dihedral angles [ref. 27].
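The frame-fixing construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `to_reference_frame` and `conformational_dof` are hypothetical, and the first three atoms are assumed not to be collinear.

```python
import math

def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return [c / norm for c in a]

def conformational_dof(n_atoms):
    """Number of conformational degrees of freedom, 3N - 6, for N >= 3."""
    if n_atoms < 3:
        raise ValueError("3N - 6 applies for N >= 3")
    return 3 * n_atoms - 6

def to_reference_frame(coords):
    """Rigidly move a molecule so that atom 0 sits at the origin,
    atom 1 lies on the x axis, and atom 2 lies in the x,y plane.
    Assumes atoms 0, 1, 2 are not collinear."""
    origin = coords[0]
    ex = _unit(_sub(coords[1], origin))
    v = _sub(coords[2], origin)
    along = _dot(v, ex)
    # Gram-Schmidt: remove the x component of v to obtain the y axis.
    ey = _unit([v[k] - along * ex[k] for k in range(3)])
    ez = _cross(ex, ey)
    moved = []
    for p in coords:
        d = _sub(p, origin)
        moved.append([_dot(d, ex), _dot(d, ey), _dot(d, ez)])
    return moved
```

After the transformation, six of the 3N Cartesian numbers are zero by construction (three for atom 0, two for atom 1, one for atom 2), while all interatomic distances are unchanged.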

The internal energy of a molecule is some function of its conformation (the energy is unchanged by translation or by rotation of the molecule as a whole). Thus, the task of energy minimization must be carried out in conformation space, which has 3N - 6 dimensions (degrees of freedom). Since this number of dimensions can be very large for biological macromolecules, it is tempting to reduce the number of degrees of freedom by making use of certain a priori information concerning the molecule in question. One might, for example, assume that the bond lengths and bond angles were rigid and known in advance. Then the conformation of the molecule (and hence the energy) would be a function of the dihedral angles alone. Another possibility would be to assume that the molecule could be deformed only along some particular path in conformation space. The use of the pseudorotation model is an example of the latter approach.

The concept of pseudorotation [refs. 5, 11, 12] restricts the energetic pathway that the five-membered sugar ring follows to a wave-like motion from a chosen mean plane defined by the five ring atoms. The conformation of these skeletal atoms is described by only two instead of the full 9 degrees of freedom (3N - 6, N = 5) associated with the molecular conformation. These coordinates can be calculated in the Cremer and Pople formalism by expressing the z-coordinates of the atoms as periodic displacements from a chosen mean plane in terms of phase amplitude and phase shift {q, ψ}:

$$z_j = (2/5)^{1/2}\, q \cos\left(\psi + \frac{4\pi j}{5}\right), \qquad j = 0, 1, 2, 3, 4. \tag{1}$$

Alternatively, the coordinates can be constructed in the Altona and Sundaralingam description [ref. 5] from the ring's dihedral angles, expressed as periodic variations in terms of amplitude and shift {τ_m, P} (see Fig. 3):

$$\tau_j = \tau_m \cos\left(P + \frac{4\pi}{5}(j - 2)\right), \qquad j = 0, 1, 2, 3, 4. \tag{2}$$
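As an illustration of the two-parameter description, the following sketch generates the five endocyclic dihedral angles from an amplitude and a phase using the cosine form of the Altona and Sundaralingam description, and recovers the two parameters from a set of angles. The inversion formulas are derived here algebraically from that cosine form; the function names and the sample amplitude of 38° are assumptions for illustration only.

```python
import math

def ring_torsions(tau_m, phase_deg):
    """Endocyclic dihedral angles tau_0..tau_4 (degrees) from the
    pseudorotation amplitude tau_m and phase angle P."""
    p = math.radians(phase_deg)
    return [tau_m * math.cos(p + 4.0 * math.pi * (j - 2) / 5.0)
            for j in range(5)]

def pseudorotation_parameters(tau):
    """Recover (tau_m, P in degrees) from five dihedral angles that
    follow the cosine form exactly."""
    s = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))
    # (tau_4 + tau_1) - (tau_3 + tau_0) = 2 tau_m sin(P) s, and
    # 2 tau_2 s = 2 tau_m cos(P) s, so atan2 returns P in the right quadrant.
    numerator = (tau[4] + tau[1]) - (tau[3] + tau[0])
    denominator = 2.0 * tau[2] * s
    phase = math.degrees(math.atan2(numerator, denominator)) % 360.0
    # The squared cosines at five equally spaced phases sum to 5/2.
    tau_m = math.sqrt(0.4 * sum(t * t for t in tau))
    return tau_m, phase

# Hypothetical amplitude; P = 18 and 162 degrees are the ideal
# C3'-endo and C2'-endo phase angles.
c3_endo = ring_torsions(38.0, 18.0)
c2_endo = ring_torsions(38.0, 162.0)
```

Because the five phases are equally spaced, the generated dihedral angles always sum to zero. Real rings only approximately satisfy this, which is one way to see that the two-parameter description is an approximation.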

Although application of this concept to nucleic acids is conceptually simple and can simplify formulation of ring geometry, it is necessarily approximate. First, it may produce anomalies in the overall nucleic acid structure. Second, the pseudorotation formulation introduces mathematical difficulties in energy representation (as a function of the 2 pseudorotation parameters) and energy differentiation (with respect to the conformational variables).

Thus, while it is generally an advantage to work with fewer degrees of freedom, there are several problems associated with the use of constraints:

1. The constraints may be unrealistic. In a real molecule, bond lengths and particularly bond angles can fluctuate in response to molecular forces that vary with conformation. Such small changes may produce large deformations of the molecule as a whole.

2. The constraints may be inconsistent. This is particularly true in the case of five-atom rings, for which the usual constraints on bond lengths and bond angles are inconsistent with ring closure [ref. 10].

3. When constraints are used to reduce the number of independent variables, the energy may be a very complicated function of the variables that remain. In particular, the non-bonded interactions (Coulomb and Van der Waals) are most easily expressed as a function of the Cartesian coordinates of the atoms. If, for example, the independent variables are taken as a collection of dihedral angles, then the Cartesian coordinates must be computed from the dihedral angles at every stage of the minimization process. Worse, if the minimization method uses derivatives, the corresponding derivatives of the Cartesian coordinates with respect to the dihedral angles must be evaluated repeatedly. The chain-rule expressions for the derivatives (especially second derivatives) are tedious to derive, easy to get wrong, and expensive to evaluate. Symbolic computation (e.g. MACSYMA) avoids the tedium and the potential errors, but not the expense of repeatedly evaluating the resulting expressions.

Figure 3. The pseudorotation cycle of the furanose ring. Ten symmetrical twists (T) are defined for phase angle values of P = 0°, 36°, 72°, ..., 324°, and ten envelopes (E) are defined for P = 18°, 54°, 90°, ..., 342°. Twelve illustrative puckering forms are drawn: all 10 envelopes and the 2 symmetrical twists of C3'-endo-C2'-exo at P = 0° and C2'-endo-C3'-exo at P = 180°.

For all of the above reasons, it is advantageous to avoid constraints and work with the Cartesian coordinates of the atoms as the independent variables. By including energy terms in the form of soft constraints (stiffness constants that are not infinite) to keep the bond lengths and bond angles close to their observed values, the pitfalls outlined above can be avoided.

B. The Potential Energy Function

In principle, a potential energy function could be obtained by a quantum-mechanical description of the ground-state energy of the molecule. Since such calculations are not yet feasible for molecules as large as proteins and nucleic acids, we resort to a molecular mechanics treatment. We consider the molecule as a system of N masses (atoms) which is deformed by molecular forces, thereby producing an energy change. The basic form of this energy consists of a sum of non-bonded interactions, torsional potentials, and strain energy:
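A generic form consistent with the term-by-term description that follows can be sketched as below. The coefficient symbols A_ij, B_ij, S_ij and K_θ, and the grouping into three sums, are assumptions made for illustration; they should not be read as the authors' exact notation or parameterization.

```latex
\begin{aligned}
E &= E_{\mathrm{NONBONDED}} + E_{\mathrm{TORSION}} + E_{\mathrm{STRAIN}}, \\
E_{\mathrm{NONBONDED}} &= \sum_{(i,j) \in S_{NB}}
  \left[ -\frac{A_{ij}}{r_{ij}^{6}} + \frac{B_{ij}}{r_{ij}^{12}}
         + \frac{Q_i Q_j}{D(r_{ij})\, r_{ij}} \right], \\
E_{\mathrm{TORSION}} &= \sum_{\tau} \frac{V_n}{2} \left( 1 + \cos n\tau \right), \\
E_{\mathrm{STRAIN}} &= \sum_{(i,j) \in S_{B}} S_{ij} \left( r_{ij} - \bar{r}_{ij} \right)^{2}
  + \sum_{\theta} K_{\theta} \left( \theta - \bar{\theta} \right)^{2}.
\end{aligned}
```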

E_NONBONDED represents the pairwise interactions and consists of two contributions: Van der Waals interactions, repulsive at short atomic separations and attractive at large distances; and electrostatic interactions between charged groups obeying Coulomb's law. The Lennard-Jones 6-12 empirical potential was formed by combining the leading term (-A/r^6) of the London attraction potential, known from quantum mechanical theory, with a steep repulsion term that is computationally convenient. Interactions are generally considered for atom pairs in S_NB, the set of all (i,j) pairs for which i < j and atoms i and j are separated by 3 bonds or more. This avoids double consideration (in terms (3a) and (3c)) of bonded atoms, and atoms involved in a bond angle. In some calculations, to avoid double consideration of atoms involved in a dihedral angle as well, S_NB consists of the atom pairs separated by 4 bonds or more. When atoms separated by exactly 3 bonds are instead treated as nonbonded, their overlap with the torsional potential must be recognized; this can be done by adjusting the torsional parameters or scaling the 1-4 interactions.
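The construction of the nonbonded pair set from the bonding topology can be sketched with a breadth-first search over the bond graph. The helper names and the toy topologies below are hypothetical; atoms are labelled by integer indices, and pairs in disconnected fragments are treated as nonbonded.

```python
from collections import deque

def bond_separations(n_atoms, bonds):
    """Smallest number of bonds between every connected pair (i, j), i < j."""
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    separation = {}
    for start in range(n_atoms):
        depth = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in depth:
                    depth[v] = depth[u] + 1
                    queue.append(v)
        for target, d in depth.items():
            if start < target:
                separation[(start, target)] = d
    return separation

def nonbonded_pairs(n_atoms, bonds, min_separation=3):
    """All pairs (i, j), i < j, separated by at least `min_separation` bonds.
    Use min_separation=4 to exclude the 1-4 (dihedral) pairs as well."""
    separation = bond_separations(n_atoms, bonds)
    pairs = []
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            d = separation.get((i, j))  # None if i and j are not connected
            if d is None or d >= min_separation:
                pairs.append((i, j))
    return pairs

# Toy topologies: a five-membered ring, and the same ring with one substituent.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
ring_with_substituent = ring + [(0, 5)]
```

In a bare five-membered ring no two atoms are more than 2 bonds apart, so the ring atoms contribute no nonbonded pairs among themselves; nonbonded terms enter only through substituents.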

The electrostatic potential energy between the partial atomic charges Q_i is given by Coulomb's law modified by a dielectric function D(r_ij) to account for a weaker interaction in a polarizable medium than in a vacuum. For convenience, summation generally extends over atoms defined as non-bonded for the Lennard-Jones potential. This involves an approximation, because although we attempt to avoid the contribution of atom pairs separated by 1 or 2 chemical bonds in more than one energy term, this convention assumes neutral interactions between these excluded atom pairs.

The torsional potential E_TORSION accounts for interaction between atoms involved in internal rotation, a rotation about a bond connecting two chemical groups in a molecule and described by a dihedral angle τ (see Fig. 4). In ethane, for example, the C-C bond provides a rotation axis for the 2 methyl groups. Although the origin of the barrier to internal rotation has not been entirely resolved, the principal interactions that give rise to rotational barriers are currently thought to be repulsive interactions, caused by overlapping of bond orbitals of the two rotating groups [ref. 31]. In ethane, the torsional energy is highest when the two methyl groups are nearest, as in the eclipsed state, and lowest when the two groups are optimally separated, as in the staggered state. The empirical form of the torsional potential is given by E_TOR(τ) = (V_n/2)(1 + cos nτ), where the integer n denotes the periodicity of the rotational barrier and V_n is the associated barrier height. Twofold and threefold potentials are most commonly used.
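The empirical torsional form is easy to evaluate directly. The sketch below sums one or more cosine terms; the barrier heights are hypothetical illustrative values, not parameters from this study.

```python
import math

def torsion_energy(tau_deg, terms):
    """Sum of (V_n / 2) * (1 + cos(n * tau)) over (V_n, n) pairs.
    tau_deg is the dihedral angle in degrees."""
    tau = math.radians(tau_deg)
    return sum(0.5 * v_n * (1.0 + math.cos(n * tau)) for v_n, n in terms)

# Hypothetical threefold term with an ethane-like barrier (kcal/mol).
THREEFOLD = [(2.9, 3)]
# Hypothetical mix of a threefold and a twofold term.
THREEFOLD_PLUS_TWOFOLD = [(2.0, 3), (1.0, 2)]

eclipsed = torsion_energy(0.0, THREEFOLD)     # cis arrangement, tau = 0
staggered = torsion_energy(60.0, THREEFOLD)   # tau = 60 degrees
```

With n = 3 the energy reaches the full barrier height at the eclipsed angles (0°, 120°, 240°) and vanishes at the staggered angles (60°, 180°, 300°), in line with the ethane picture above.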

E_STRAIN represents the bond stretching and angle bending energy as bond lengths and bond angles (θ_i) deviate from equilibrium values (denoted in the energy formulas with bars). S_B denotes the set of all (i,j) pairs for which i < j and the atoms are bonded. Since chemical bonds display a very narrow range of fluctuation in length (generally on the order of 0.1 Å), a harmonic or quadratic approximation is considered sufficient. Bond angle deviations are usually small (<3°) unless electron lone pairs form bond-like orbitals (e.g. water, θ(H-O-H) = 105°) or ring molecules impose closure constraints (e.g., cycloalkanes, deoxyribose). For these larger deviations, suitable potentials are still being investigated, but the most commonly used form of the bending potential is also harmonic.
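A harmonic stretching term and its Cartesian gradient can be written down directly, which illustrates why Cartesian variables make derivatives convenient. This is a plain harmonic term in the bond length, offered as a sketch rather than the polynomial formulation used in this study; the function name and all numerical values are assumptions.

```python
import math

def bond_energy_and_gradient(xi, xj, r_eq, stiffness):
    """Harmonic stretch E = S * (r - r_eq)^2 for atoms at xi and xj,
    with analytic gradients with respect to both positions."""
    d = [xi[k] - xj[k] for k in range(3)]
    r = math.sqrt(sum(c * c for c in d))
    energy = stiffness * (r - r_eq) ** 2
    # dE/dxi = 2 S (r - r_eq) * (xi - xj) / r ; dE/dxj is its negative.
    scale = 2.0 * stiffness * (r - r_eq) / r
    grad_i = [scale * c for c in d]
    grad_j = [-g for g in grad_i]
    return energy, grad_i, grad_j

# Hypothetical bond: equilibrium length 1.5 (angstroms), stiffness 300.
energy, grad_i, grad_j = bond_energy_and_gradient(
    [0.0, 0.0, 0.0], [1.2, 0.5, -0.3], 1.5, 300.0)
```

The two gradients are equal and opposite, so the term exerts no net force on the molecule, consistent with the energy being unchanged by rigid translation.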

Additional terms or variations to the basic energy form in eq. (3) may be used to account for hydrogen bonding, solvent effects or helical parameters where appropriate [refs. 18, 32, 33, for example].

Figure 4. Definition of a dihedral angle. The dihedral angle τ_ijkl is defined as the angle between the normal to the plane defined by atoms i-j-k and the normal to the plane defined by atoms j-k-l.
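The dihedral angle of Figure 4 can be computed from four Cartesian positions as the signed angle between the two plane normals. The sign convention used in this sketch (positive for a clockwise rotation of the far bond when looking from atom j toward atom k) is an assumption; the procedure of Section III(A) is not reproduced here.

```python
import math

def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral_deg(p_i, p_j, p_k, p_l):
    """Signed dihedral angle (degrees) for atoms i-j-k-l: 0 for a planar
    cis arrangement, +/-180 for planar trans."""
    b1 = _sub(p_j, p_i)
    b2 = _sub(p_k, p_j)
    b3 = _sub(p_l, p_k)
    n1 = _cross(b1, b2)   # normal to the plane of atoms i, j, k
    n2 = _cross(b2, b3)   # normal to the plane of atoms j, k, l
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))

# Four points with the central bond along z; the last atom is rotated
# by 90 degrees about that bond relative to the first.
angle = dihedral_deg([1.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0], [0.0, 1.0, 1.0])
```

Using `atan2` rather than an arccosine of the normalized dot product keeps the sign of the angle and avoids loss of precision near 0° and 180°.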

The parameterization process for potential energy functions is a difficult task. Several important decisions must be made regarding choices for the functional form and numerical values for the parameters. Even if one is given a specific energy form and a set of structural and energetic data to reproduce, the combinations of parameters that can be used are endless. Unrealistic choices for one group of parameters can be compensated for by adjustment of another. Ironically, the more specific the force-field becomes, that is, the more individualized the treatment of geometric sequences (bonds, bond angles, and dihedral angles), the greater becomes the possibility of divergence from physical reality.

In theory, the energy terms should have clear physical significance with parameters calibrated by empirical fitting of crystal data and rotational barriers of analogous small molecules. However, an approximation is inherent in the extension of data from small to large systems. Moreover, interaction with solvent and counterions, reflected in the experimental data, must be interpreted and incorporated in the energy model. In summary, much freedom and manipulation are possible in constructing semi-empirical energy surfaces. Only if constructed and parameterized correctly will the energy model generate reliable structural predictions.

In the case of nucleic acid sugars, the importance of parameter choices in the energy function has already been realized. Different potential energy models have produced results that are qualitatively different: pseudorotation is free or pseudorotation is hindered [refs. 18-20]. This is particularly possible when inappropriate equilibrium values are chosen for the endocyclic bond angles. Since the unusual puckering geometry and ring closure constraints produce significant deviations from tetrahedral bond angle arrangements, it is not clear what equilibrium values should be used in the harmonic bending terms, which are more appropriate for small fluctuations. Indeed, Olson's studies were successful at producing energy minima at the ideal pseudorotation phase angles of 18° and 162°, partly because they involved no minimization; experimentally-determined valence angles were directly incorporated into the coordinate generation of the ring atoms to give the energy as a function of a fixed puckered form.

In addition to the sensitivity of the results to the parameter choices, important issues in the modeling procedure have emerged in deoxyribose investigations. For example, the arbitrary selections possible in modeling dihedral angles have been noted by Harvey and Prabhakaran [ref. 19]. Consider an ethane molecule; nine rotations are associated with the C-C bond. Which dihedral angles should we associate with a particular torsional term? And how should we assure proper positioning of the remaining atoms? Since the torsional potentials have the greatest contribution to the total energy of deoxyribose, Harvey and Prabhakaran suggest that a very careful search for appropriate torsion parameters and modeled dihedral angle sets is required to obtain energy minima at the angles corresponding to the ideal C2'-endo and C3'-endo structures. Selection of the torsion parameters is particularly important for deoxyribose, since a combination of both threefold and twofold (gauche) torsional energies is required to produce relative energies of the preferred C2'-endo and C3'-endo structures in accordance with the observed ratio of these forms for sugars. We will discuss the selection of modeled dihedral angles and associated constants in Section III. Although choices in the torsion parameters and modeled dihedral angles are important for the furanose ring, the discrepancies noted in ref. 19 regarding location of the minima are undoubtedly attributable in part to the united atom approach. Explicit inclusion of the hydrogen atoms is necessary in order to account correctly for the eclipsing strain of the two substituents of C2' and C3' in the East and the Van der Waals repulsions between the equatorial base (in parallel orientation to the furanose ring plane) and the exocyclic substituents of C5' in the West.

In our study, all heavy atom rotations about one bond (rotations not involving hydrogen atoms) are considered systematically, and all stretching and bending constants are used as group parameters for generally-defined families of atomic sequences. This procedure allows the molecule to attain equilibrium bond lengths, bond angles, and dihedral angles as a result of energy minimization rather than targeting strategies. It can also help to prevent generation of unrealistic geometries.

C. Minimization

For potential energy minimization the choice of method is important because of two inherent difficulties: existence of many local minima, all possibly of biological importance, and extensive computational requirements. The state of the art today is such that for many small problems (about 30 variables or fewer) suitable algorithms exist for finding all local minima. For larger problems, however, unless good initial approximations are provided, there is no guarantee of solving them completely in a finite number of trials. Many trials are generally required to find various local minima, and finding the global minimum cannot be assured.

The choice of a minimization algorithm must be made by considering the following four features of the problem:

1. Form of the objective function. Is it linear or nonlinear? Smooth or discontinuous? A sum of squares? With or without constraints? With or without bounds?

2. Size. For a large number of variables storage considerations are important. Effective techniques for small problems are usually unsuitable for large problems.

3. Availability of analytic derivatives.

4. Computer resources. The better the storage and speed capabilities, the more flexible the choice of an algorithm.

Potential energy functions are generally large and nonlinear, and the minimization problem can be formulated as unconstrained. Obtaining analytic first and second derivatives may be difficult, but is definitely feasible. Since storage and cost considerations are directly related to the complexity of function and derivative calculations, the choice of a minimization algorithm for potential energy functions should be based on the availability of derivatives. We classify minimization algorithms into three categories:

(i) Non-derivative methods,
(ii) First-derivative (or gradient) methods, and
(iii) Second-derivative methods.

Non-derivative methods (e.g. Powell's) are generally easy to implement, but they tend to be inefficient and excessively slow. The computational cost, dominated by the number of function evaluations, can be large for functions of many variables and thus far outweigh the benefit of avoiding derivative computation.

Gradient methods (e.g. nonlinear Conjugate Gradient) are the most commonly used in potential energy minimization [refs. 18, 2, 5-38, for example] because their storage and computational requirements are generally manageable and their convergence properties satisfactory, although often unpredictable.

Newton-based or second-derivative methods are particularly attractive for nonlinear optimization problems because of their locally rapid convergence properties. With appropriate global modifications, convergence to a local minimum can be guaranteed even from distant initial coordinates. However, the use of Newton methods has been restricted to small problems or only near a solution following a gradient method [refs. 32,35,38, for example] as a result of the demanding computational and storage requirements associated with calculation and manipulation of the second-derivative matrix. Fortunately, advances in computing resources and algorithms are making Newton methods feasible and powerful for large-scale problems. Truncated Newton methods, in particular, can maintain the quadratic convergence of Newton methods while considerably reducing the costly manipulation and demanding storage of the analytic Hessian matrix. For more minimization details, we refer the reader to refs. 39-44 and 68.


III. METHODS

A. Geometry

We consider a model for the 2'-deoxyribose ring (Fig. 1) with the full set of degrees of freedom associated with the 5 ring atoms. Since we are interested in the structural behavior of this constituent when linked to a base and phosphate groups, we consider the NH2, OH, and CH3 attachments as atom groups. Hydrogens linked to ring atoms are treated as individual atoms and not as part of a united CH or CH2 group. Individual treatment of these hydrogens is necessary in order to represent accurately the nonbonded energy associated with the molecule. Without individual representation of the C2' hydrogens, for example, no barrier will occur for the nonbonded energy component in the East region of the pseudorotation path, where one C2' hydrogen is eclipsed with the C3' hydroxyl group. Thus, our model consists of 13 atomic positions in Cartesian coordinate space (39 energy variables).

We express bond lengths and cosines of bond angles and dihedral angles as polynomials of the Cartesian variables using a "hierarchy" of expressions as follows:

Let $\mathbf{x}_i = (x_{i,1}, x_{i,2}, x_{i,3})$, $i = 1, \ldots, N$ (N is the number of atoms), denote the position vector of atom i, and $\mathbf{r}_{ij} = \mathbf{x}_i - \mathbf{x}_j$ denote the distance vector from atom j to i. For any vector $\mathbf{a}$ we denote the magnitude by $\|\mathbf{a}\|$ and the associated equilibrium magnitude by $\bar{a}$. Similarly, if an equilibrium value is given for a bond angle $\theta$, we denote it by $\bar{\theta}$. For convenience later in writing the potential energy function, we denote the vector magnitude $\|\mathbf{r}_{ij}\|$ also as $r_{ij}$, in nonbold type.

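We then express the cosine of a bond angle $\theta_{ijk}$ formed by atoms i-j-k as an inner product with equilibrium bond lengths:

$$\cos\theta_{ijk} = \frac{\mathbf{r}_{ij} \cdot \mathbf{r}_{kj}}{\bar{r}_{ij}\,\bar{r}_{kj}} \qquad (4)$$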

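Let $\tau_{ijkl}$ be the dihedral angle defining the rotation about bond j-k (see Fig. 4). To simplify the notation we define the following quantities:

$$\mathbf{a} = \mathbf{r}_{ji} = \mathbf{x}_j - \mathbf{x}_i, \qquad \mathbf{b} = \mathbf{r}_{kj} = \mathbf{x}_k - \mathbf{x}_j, \qquad \mathbf{c} = \mathbf{r}_{lk} = \mathbf{x}_l - \mathbf{x}_k$$

$\theta_{ab}$ - angle between $\mathbf{a}$ and $\mathbf{b}$
$\theta_{bc}$ - angle between $\mathbf{b}$ and $\mathbf{c}$
$\mathbf{n}_{ab}$ - unit normal to plane spanned by vectors $\mathbf{a}$ and $\mathbf{b}$
$\mathbf{n}_{bc}$ - unit normal to plane spanned by vectors $\mathbf{b}$ and $\mathbf{c}$.

Then by definition, $\tau_{ijkl}$ is the angle between the normals $\mathbf{n}_{ab}$ and $\mathbf{n}_{bc}$:

$$\cos\tau_{ijkl} = \mathbf{n}_{ab} \cdot \mathbf{n}_{bc}$$

or

$$\cos\tau_{ijkl} = \frac{\mathbf{a} \times \mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|\,\sin\theta_{ab}} \cdot \frac{\mathbf{b} \times \mathbf{c}}{\|\mathbf{b}\|\,\|\mathbf{c}\|\,\sin\theta_{bc}}$$

The sign of $\tau_{ijkl}$ is determined by the sign of the triple product $(\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c}$ (see Fig. 4). To obtain a polynomial in the coordinate variables, we replace bond lengths $\|\mathbf{a}\|$, $\|\mathbf{b}\|$ and $\|\mathbf{c}\|$ and bond angles $\theta_{ab}$ and $\theta_{bc}$ by their equilibrium values and then simplify this expression to a difference of inner products:

$$\cos\tau_{ijkl} = \frac{(\mathbf{a}\cdot\mathbf{b})(\mathbf{b}\cdot\mathbf{c}) - (\mathbf{a}\cdot\mathbf{c})(\mathbf{b}\cdot\mathbf{b})}{\bar{a}\,\bar{b}^{2}\,\bar{c}\,\sin\bar{\theta}_{ab}\,\sin\bar{\theta}_{bc}} \qquad (5)$$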

This geometric representation is direct and simple for energy evaluation and differentiation and is motivated by the physical "hierarchy" of flexibility - bond lengths are nearly constant, bond angles are somewhat variable, and dihedral angles are the most flexible in their ranges.
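The polynomial expressions for the bond angle cosine and the dihedral angle cosine can be sketched in a few lines. The block below is only an illustration, not the authors' program: the helper names and the sample coordinates are hypothetical, the vector convention a = x_j - x_i, b = x_k - x_j, c = x_l - x_k is assumed, and the denominator of the dihedral expression is assumed to contain only equilibrium values. It relies on the vector identity (a x b).(b x c) = (a.b)(b.c) - (a.c)(b.b).

```python
# Sketch only: helper names, coordinates and equilibrium values are assumptions.

def sub(u, v):
    return [u[0] - v[0], u[1] - v[1], u[2] - v[2]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def cos_bond_angle(xi, xj, xk, r_ij_eq, r_kj_eq):
    """Inner product of the two bond vectors at atom j, divided by the
    equilibrium bond lengths (a quadratic polynomial in the coordinates)."""
    return dot(sub(xi, xj), sub(xk, xj)) / (r_ij_eq * r_kj_eq)

def dihedral_numerator(xi, xj, xk, xl):
    """(a.b)(b.c) - (a.c)(b.b), a difference of inner products."""
    a, b, c = sub(xj, xi), sub(xk, xj), sub(xl, xk)
    return dot(a, b) * dot(b, c) - dot(a, c) * dot(b, b)

def cos_dihedral(xi, xj, xk, xl, a_eq, b_eq, c_eq, sin_ab_eq, sin_bc_eq):
    """Numerator divided by equilibrium lengths and equilibrium angle sines."""
    num = dihedral_numerator(xi, xj, xk, xl)
    return num / (a_eq * b_eq ** 2 * c_eq * sin_ab_eq * sin_bc_eq)

# Planar test geometry with unit bonds and right angles (hypothetical).
xi, xj, xk = [0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
cis = cos_dihedral(xi, xj, xk, [1.0, 1.0, 0.0], 1.0, 1.0, 1.0, 1.0, 1.0)
trans = cos_dihedral(xi, xj, xk, [1.0, -1.0, 0.0], 1.0, 1.0, 1.0, 1.0, 1.0)
```

With this convention the cis arrangement gives a cosine of +1 and the trans arrangement gives -1. Away from equilibrium bond lengths and angles the value is only an approximation to the true cosine, which is the price paid for a polynomial form.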

B. Potential Energy Function

The conformational energy of the deoxyribose model is constructed as a sum of contributions from a standard 6-12 Lennard-Jones function ($E_{LJ}$), a coulombic potential ($E_{COUL}$), bond length and bond angle strain terms ($E_{BOND}$ and $E_{BANG}$), and torsional potentials with rotational barrier periodicity of two and

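three ($E_{TOR2}$ and $E_{TOR3}$):

$$E = E_{LJ} + E_{COUL} + E_{BOND} + E_{BANG} + E_{TOR2} + E_{TOR3} \qquad (6)$$

$$E_{LJ} = \sum_{i<j} \left( -\frac{A_{ij}}{r_{ij}^{6}} + \frac{B_{ij}}{r_{ij}^{12}} \right) \qquad (6a)$$

$$E_{COUL} = \sum_{i<j} \frac{Q_i Q_j}{D(r_{ij})\, r_{ij}}, \qquad D(r_{ij}) = \varepsilon\, e^{\rho r_{ij}} \quad (\rho = 0.1,\ \varepsilon = 4.0) \qquad (6b)$$

$$E_{BOND} = \sum_{i,j \in S_B} S1_{ij}\, (r_{ij}^{2} - \bar{r}_{ij}^{2})^{2} \qquad (6c)$$

$$E_{BANG} = \sum_{i} S2_{i}\, (\cos\theta_i - \cos\bar{\theta}_i)^{2} \qquad (6d)$$

$$E_{TOR2} = \sum_{i} \frac{V2_{i}}{2}\, (1 + \cos 2\tau_i) \qquad (6e)$$

$$E_{TOR3} = \sum_{i} \frac{V3_{i}}{2}\, (1 + \cos 3\tau_i) \qquad (6f)$$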

All parameters are chosen to produce the energy in kcal/mol. Distance is measured in Å.

Since only 3N - 6 degrees of freedom are associated with a molecule of N atoms, we chose to impose by means of an energy term the following soft constraints: 1) the first atom is at the origin, 2) the second atom is on the x axis, and 3) the third atom is in the x,y plane. This procedure has two advantages: it leaves the problem as an unconstrained minimization type, and it avoids the problem of an indefinite Hessian due to translation and rotation invariance. The constraint constant should be sufficiently large to assure that near a solution the constraint energy is identically zero. Thus, to target the first three atoms (O1', C1' and C2') to the x,y plane as described above we add:

$$E_{FIX} = S4\,(x_{1,1}^{2} + x_{1,2}^{2} + x_{1,3}^{2} + x_{2,2}^{2} + x_{2,3}^{2} + x_{3,3}^{2}). \qquad (6g)$$

For efficient implementation of this term, we translate and rotate every initial structure before minimization so that $E_{FIX} = 0$ initially.

Several modifications have been introduced in our potential energy function (compare eqs. (6c),(6d) to (3c)). The bond strain term involves a difference of squares of the bond lengths rather than a difference of bond lengths; the bond angle strain term involves a difference of cosines of angles rather than a difference of angles. Two advantages follow from these modifications. First, the expressions are mathematically simpler: in view of our bond angle cosine expression (4), both $E_{BOND}$ and $E_{BANG}$ are fourth-degree polynomials in the coordinate variables. Thus, evaluation and differentiation of these terms is faster. Second, the bond angle term is now an infinite Taylor series in powers of $(\theta - \bar{\theta})$. For large angle deviations, this expression may happen to be more suitable than a harmonic potential. Furthermore, it is periodic. Indeed, this choice of bond angle potential may have contributed to the accurate bond angle values that were generated for deoxyribose.

It should be emphasized, however, that these stretching and bending functions can be matched to the more common harmonic strain functions (eq. (3c)) by equating Taylor coefficients up to and including terms of second order in a Taylor series about equilibrium. That is, the energy values and curvatures can be made to match exactly. The relation to the coefficients of the harmonic potentials (denoted temporarily by S1' and S2') is then given by:

$$S1' = 4\,S1\,\bar{r}^{2} \qquad (7a)$$

$$S2' = S2\,\sin^{2}\bar{\theta} \qquad (7b)$$

Thus, for small displacements from equilibrium bond lengths and bond angles, where the mechanical behavior of the molecule is well-approximated by such quadratic terms, these functions coincide with the harmonic potentials. For larger vibrations and bending effects, the form of potentials to be used is unknown, and thus our potentials present an alternative.

For a unified treatment of all local energy terms as polynomials of the Cartesian variables, we express the torsional potentials (eqs. (6e),(6f)) using the dihedral angle cosine expression in eq. (5) in combination with the trigonometric identities

$$\cos 2\tau = 2\cos^{2}\tau - 1, \qquad (8a)$$

$$\cos 3\tau = 4\cos^{3}\tau - 3\cos\tau. \qquad (8b)$$

We now describe in detail the choices for the constants associated with each energy component.

Lennard-Jones Interactions

Two different methods can be considered for computing the Lennard-Jones parameters $A_{ij}$ and $B_{ij}$ of eq. (6a). One is based on a simple scaling of the Lennard-Jones curve for each pairwise interaction [refs. 45,46] and the other involves the Slater-Kirkwood equation [refs. 47,48; see also refs. 49-52 for explanation of increased van der Waals radii; and ref. 53 for parameters]. For this calculation, the attractive Lennard-Jones coefficient $A_{ij}$ was obtained from the Slater-Kirkwood equation, and the repulsive coefficient $B_{ij}$ from $A_{ij}$ and the additional requirement that the original well depth of the Lennard-Jones energy is maintained. Parameters were taken from Olson [ref. 17].


We include in the nonbonded energy component all interactions between individual atoms separated by three bonds or more. In cases where at least one member of the interacting pair is an atom group, interactions separated by two chemical bonds are additionally considered. Nonbonded interactions among ring atoms are not considered since a separation of three chemical bonds in one direction involves a 2-bond separation in the other direction.
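The selection rule above can be expressed with a shortest-path count over the bond graph. The following is a minimal sketch under stated assumptions: the atom labels and the connectivity list are a hypothetical rendering of the 13-position model, and the function names are invented for illustration. Because the shortest path between any two atoms of a five-membered ring is at most two bonds, ring-ring pairs are excluded automatically.

```python
from collections import deque
from itertools import combinations

# Hypothetical connectivity for the 13-position model (labels are assumptions).
BONDS = [("O1'", "C1'"), ("C1'", "C2'"), ("C2'", "C3'"), ("C3'", "C4'"),
         ("C4'", "O1'"), ("C1'", "N"), ("C3'", "O3'"), ("C4'", "C5'"),
         ("C1'", "H1'"), ("C2'", "H2'a"), ("C2'", "H2'b"),
         ("C3'", "H3'"), ("C4'", "H4'")]
GROUPS = {"N", "O3'", "C5'"}   # NH2, OH and CH3 treated as atom groups

def bond_separations(bonds):
    """Breadth-first search from every atom: number of bonds to each other atom."""
    adj = {}
    for u, v in bonds:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    sep = {}
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        sep[start] = dist
    return sep

def nonbonded_pairs(bonds, groups):
    """Pairs three or more bonds apart, plus two-bond pairs involving a group."""
    sep = bond_separations(bonds)
    pairs = []
    for u, v in combinations(sorted(sep), 2):
        d = sep[u][v]
        if d >= 3 or (d == 2 and (u in groups or v in groups)):
            pairs.append((u, v))
    return pairs

pairs = nonbonded_pairs(BONDS, GROUPS)
```

The pair list is built once and can then be reused for every energy and derivative evaluation.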

Electrostatic Potential

The electrostatic interaction between a pair of atoms having charges $Q_i$ and $Q_j$ and separated by a distance $r_{ij}$ is represented in this calculation by the standard coulomb potential of eq. (6b) modified by a distance dependent dielectric function $D(r_{ij}) = \varepsilon e^{0.1 r_{ij}}$, as suggested by Srinivasan and

A dielectric function is introduced to account for charge shielding by solvent molecules. For a neutral molecule, an effective dielectric constant of $\varepsilon = 4.0$ has generally been used for small distance separations; as interatomic distances increase, this value must be modified to model penetrating aqueous solvent molecules. It is still difficult, however, to make a reliable estimate of the dielectric expression. Improved treatments of solvent effects and charged species are currently being investigated [refs. 56,57, for instance].
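The effect of the distance dependent dielectric can be seen in a short sketch. The function names are hypothetical, and the conversion factor of 332 kcal Å/(mol e²) is an assumption of this example; the text only states that all parameters are chosen to give energies in kcal/mol.

```python
import math

def dielectric(r, eps=4.0, rho=0.1):
    """D(r) = eps * exp(rho * r), with r in angstroms."""
    return eps * math.exp(rho * r)

def coulomb_energy(q_i, q_j, r, eps=4.0, rho=0.1, k=332.0):
    """Pairwise coulomb energy screened by D(r).
    k is an assumed unit conversion factor, not a value taken from the text."""
    return k * q_i * q_j / (dielectric(r, eps, rho) * r)

# Screening grows with distance: ratio to a constant dielectric of 4.0.
for r in (1.5, 3.0, 6.0):
    constant = 332.0 * 0.3 * (-0.3) / (4.0 * r)
    print(r, coulomb_energy(0.3, -0.3, r) / constant)
```

The printed ratio equals exp(-0.1 r), so interactions at 6 Å are damped to roughly 55% of the value obtained with a constant dielectric of 4.0.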

Selection of partial charges has been a difficult issue in molecular mechanics calculations since quantum mechanical approaches yield significantly different values [ref. 58 and references cited therein]. However, a resolution is on the horizon with experimental determination of these quantities [refs. 59,60]. For our calculation, the partial charges $Q_i$ for individual atoms and atom groups are taken from Olson [ref. 17].

Bending and Stretching Parameters

Equilibrium bond lengths used are 1.52 Å for C-C bonds, 1.42 Å for C-O bonds, 1.47 Å for C-N bonds, and 1.0 Å for C-H, N-H, and O-H bonds. All equilibrium bond angles are taken to be tetrahedral (109.47°).

In the choice of force-field constants, we were guided by the general relation S1 · L⁴ > S2 > V2, V3, where L is a characteristic length (L = 1.0 Å), in accordance with the degree of flexibility associated with the geometric quantities - bond lengths, bond angles, and dihedral angles. In particular, the stretching and bending stiffness constants were chosen so that the contribution of these terms near the local minima is very small and does not affect the relative energy (see Fig. 7). For a bond length constant S1, we use a uniform value of 100.0 kcal/mol Å⁴ for all bond types. For bond length values of 1.0 Å and 1.52 Å, for example, the corresponding harmonic stretching coefficients are 400.0 kcal/mol Å² and 924.16 kcal/mol Å², respectively (see eq. (7a)). Note that the choice of S1 is not critical, since any value that can keep the bond length fluctuations small is appropriate.

For the bond angle constant S2 we use: 60.0 kcal/mol for all C-O-C, C-C-C and O-C-C bond angles; 15.0 kcal/mol for N-C-O, N-C-C and all angles involving one or more hydrogens. The bending constant of 60.0 kcal/mol corresponds to 53.33 kcal/mol rad² in the harmonic bending functions, and similarly 15.0 to 13.33 (see eq. (7b)). These values were chosen to be in the range of other parameter values that have been used [refs. 17,34]. By treating all bond angles of a given type uniformly regardless of their location in the molecule, we can test whether the correct geometry associated with the puckered structures will result from energy minimization rather than targeting strategies.
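The harmonic equivalents quoted above follow from matching second-order Taylor coefficients about equilibrium, which gives S1' = 4 S1 r̄² for stretching and S2' = S2 sin² θ̄ for bending (with the angle deviation measured in radians). A short check, with hypothetical function names, reproduces the quoted numbers:

```python
import math

def harmonic_stretch_constant(s1, r_eq):
    """S1' = 4 * S1 * r_eq**2, from expanding S1 * (r**2 - r_eq**2)**2 about r_eq."""
    return 4.0 * s1 * r_eq ** 2

def harmonic_bend_constant(s2, theta_eq_deg):
    """S2' = S2 * sin(theta_eq)**2, from expanding S2 * (cos(theta) - cos(theta_eq))**2."""
    return s2 * math.sin(math.radians(theta_eq_deg)) ** 2

print(harmonic_stretch_constant(100.0, 1.0))              # 400.0
print(round(harmonic_stretch_constant(100.0, 1.52), 2))   # 924.16
print(round(harmonic_bend_constant(60.0, 109.47), 2))     # 53.33
print(round(harmonic_bend_constant(15.0, 109.47), 2))     # 13.33
```

For a tetrahedral equilibrium angle, sin² θ̄ is 8/9, so every bending constant is reduced by about 11% in the harmonic form.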

Torsional Parameters

Torsional barriers and periodicities are estimated from experimental data obtained principally by spectroscopic methods (NMR, IR, microwave, Raman) for low molecular weight compounds. According to a theory developed by Pauling [ref. 61], potential barriers to internal rotation arise from exchange interactions of electrons in adjacent bonds and are thus similar for molecules with the same orbital character. This theory has allowed tabulations of barrier heights as class averages [refs. 47,62-64]. Since barriers for rotations about various single bonds in nucleic acids and proteins are not yet available experimentally, they must be estimated from analogous chemical sequences in low molecular weight compounds.

Figure 7. Results for Energy vs. P curves. The total potential energy E and its components are illustrated as a function of the phase angle of pseudorotation P for the model of deoxyribose studied. Results are given from 6 variations of the potential energy function corresponding to sets 2, 3, 4, 5, 6 and 8 of Table II. The energy components of each plot are labeled as follows: (A) Nonbonded energy. (B) Bond length strain. (C) Bond angle strain. (D) Twofold dihedral term. (E) Threefold dihedral term. (F) Total energy.

Twofold and threefold potentials are used in the present calculation. A threefold torsional potential exhibits three maxima at 0°, 120°, and 240° and three minima at 60°, 180°, and 300°. In ethane, for example, all maxima are energetically equivalent and correspond to the eclipsed, or cis form, and all minima correspond to the energetically equivalent and preferred staggered, or trans form. In other molecular sequences, the local minima may not be of equal energies. The gauche forms at 60° and 300° may be higher in energy than the trans form at 180°, as in n-butane, or lower in energy as in 1,2-difluoroethane and certain C-C-C-O linkages [ref. 65]. A combination of a twofold and a threefold potential can be used to reproduce both the cis/trans and trans/gauche energy differences. If we denote the experimental energy barrier by ΔV and the empirical potential energy function as given in eq. (6) by E, then the torsional parameters V2 and V3 for a given rotation τ are computed from the following relations:

$$\Delta V_{cis/trans} = E_{\tau=0^\circ} - E_{\tau=180^\circ} = V3 + [E_{LJ} + E_{COUL} + E_{BOND} + E_{BANG}]_{\tau=0^\circ} - [E_{LJ} + E_{COUL} + E_{BOND} + E_{BANG}]_{\tau=180^\circ} \qquad (9a)$$

The formulas above may appear simple but in order to evaluate the energy as a function of τ, Cartesian coordinates of the molecule must be generated in terms of τ. A simplification can be made by assuming fixed bond lengths and bond angles and by calculating only nonbonded energy differences. In theory, then, every different parameterization of the nonbonded coefficients requires an estimation of the torsional potentials V2 and V3 to produce a consistent set.
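The torsional contribution to these energy differences can be checked with a small sketch that evaluates the twofold and threefold terms directly from cos τ, using the identities of eqs. (8a) and (8b). The functional forms (V2/2)(1 + cos 2τ) and (V3/2)(1 + cos 3τ) are assumed here because they place the threefold maxima at 0°, 120° and 240°; the function names are hypothetical.

```python
import math

def cos2(c):
    """cos(2*tau) from c = cos(tau), eq. (8a)."""
    return 2.0 * c * c - 1.0

def cos3(c):
    """cos(3*tau) from c = cos(tau), eq. (8b)."""
    return 4.0 * c ** 3 - 3.0 * c

def torsion_energy(cos_tau, v2=0.0, v3=0.0):
    """Assumed forms: (V2/2)(1 + cos 2tau) + (V3/2)(1 + cos 3tau)."""
    return 0.5 * v2 * (1.0 + cos2(cos_tau)) + 0.5 * v3 * (1.0 + cos3(cos_tau))

def energy_at(tau_deg, v2=0.0, v3=0.0):
    return torsion_energy(math.cos(math.radians(tau_deg)), v2, v3)

# Torsional part of the cis/trans difference with V2 = 0.2 and V3 = 2.8 kcal/mol.
cis_trans = energy_at(0.0, 0.2, 2.8) - energy_at(180.0, 0.2, 2.8)
```

Under these assumed forms the twofold term takes the same value at 0° and 180°, so it cancels in the cis/trans difference and only V3 remains, in agreement with eq. (9a).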

For the present investigation of deoxyribose, we follow Olson in assigning the values used for V3: 2.8 kcal/mol for rotations about C-C bonds and 1.8 kcal/mol for rotations about C-O bonds. These values were obtained to reproduce the experimentally-observed barriers for n-butane (3.5 kcal/mol) and methyl ethyl ether (2.53 kcal/mol), respectively, together with the Lennard-Jones potential. The values used for V2, namely 0.2 kcal/mol for all O-C-C-C rotations and 1.0 kcal/mol for all O-C-C-O rotations, were introduced to reproduce the trans/gauche energy differences for similar molecular sequences [ref. 17].

Additivity of torsional energies. Torsional potentials discussed so far are determined to reproduce rotational barriers for only one rotation for a given 4-atom sequence in a given molecule. As was first mentioned by Harvey and Prabhakaran [ref. 19], previous treatments of dihedral rotations were somewhat arbitrary since only a subset of all rotations was chosen to be modeled in the energy function. When a dihedral angle about a B-C bond can be defined by more than one A-B-C-D sequence, several issues must be resolved:

1. Treatment of threefold potentials. How should the V3 parameter be selected and adjusted?

2. Treatment of twofold potentials. These potentials, as used in this study, are different from the threefold terms because their force constants V2 are associated with a 4-atom sequence and not a 2-atom bond. Thus, unlike the threefold potentials, the chemical type of atoms A and D in the A-B-C-D sequence affects the choice of torsion parameter. When more than one such sequence is associated with the same bond, should the V2 parameter be adjusted?

3. Combined twofold and threefold potentials. Is further adjustment required when the same dihedral angle is included in both terms?

Since torsional potentials are semi-empirical in nature, the answers to these questions are not completely clear. By considering eqs. (9a) and (9b), the answer to question 3 may appear simple: no conflict arises from the inclusion of one particular rotation in both the twofold and threefold potentials, since the two parameters V2 and V3 are determined by two separate equations. However, computing the cis/trans and trans/gauche energy differences for different molecules is not ideal. For example, in this study parameter values for rotations about the C-C bond were chosen to reproduce the cis/trans barrier for n-butane and the trans/gauche barrier for various C-C-C-O and O-C-C-O sequences. Nonetheless, this procedure may be justifiable since we are trying to incorporate two separate effects observed in small molecules in the study of larger and more complicated systems.

Question 1 was already raised in the molecular dynamics study of ribose by Harvey and Prabhakaran [ref. 19]. In their study, a compromise modeling strategy between the minimum set of five endocyclic dihedral angles and the maximum set of all 16 heavy atom rotations was implemented. For our investigation of deoxyribose, we examine the modeling of all 12 dihedral rotations associated with heavy atoms or atom groups. To reproduce a particular cis/trans energy difference about a particular bond, V3 is adjusted by a simple division by the number of modeled rotations. Our results, described in the next section, indicate that this simple modification of the torsional parameters can allow a systematic treatment of all rotations in a given molecule. We list all the rotations and their associated torsional parameters in Table I.
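The apportionment of V3 among the rotations that share a central bond can be sketched as follows. The rotation list follows the twelve heavy-atom sequences of Table I; the function names and the rule of reading the bond type from the first letter of each atom label are assumptions made for this illustration.

```python
from collections import defaultdict

# The twelve heavy-atom rotations A-B-C-D considered in the model.
ROTATIONS = [
    ("C4'", "O1'", "C1'", "C2'"), ("C4'", "O1'", "C1'", "N"),
    ("O1'", "C1'", "C2'", "C3'"), ("N", "C1'", "C2'", "C3'"),
    ("C1'", "C2'", "C3'", "C4'"), ("C1'", "C2'", "C3'", "O3'"),
    ("C2'", "C3'", "C4'", "O1'"), ("C2'", "C3'", "C4'", "C5'"),
    ("O3'", "C3'", "C4'", "O1'"), ("O3'", "C3'", "C4'", "C5'"),
    ("C3'", "C4'", "O1'", "C1'"), ("C5'", "C4'", "O1'", "C1'"),
]

def bond_v3(b, c):
    """Unscaled V3 (kcal/mol): 1.8 for a C-O central bond, 2.8 for C-C."""
    return 1.8 if "O" in (b[0], c[0]) else 2.8

def scaled_v3(rotations):
    """Divide the bond value of V3 by the number of rotations modeled about that bond."""
    per_bond = defaultdict(int)
    for rot in rotations:
        per_bond[frozenset(rot[1:3])] += 1
    return {rot: bond_v3(rot[1], rot[2]) / per_bond[frozenset(rot[1:3])]
            for rot in rotations}

for rot, v3 in scaled_v3(ROTATIONS).items():
    print("-".join(rot), v3)
```

The sum of the scaled constants about each bond equals the original bond value, so the cis/trans difference about that bond is preserved when all rotations move together.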

Question 2 appears more difficult. For illustration, consider all 4 rotations about the C3'-C4' bond as listed in Table I. Four threefold rotations are associated with this endocyclic bond (with V3 of 2.8 kcal/mol equally distributed among them), as are 3 twofold rotations: 2 of which are C-C-C-O sequences and one of which is O-C-C-O. Since the 4-atom sequences involve different atoms, it seems that no V2 adjustment is necessary. However, this conclusion appears inconsistent when we consider the above apportionment devised for obtaining the V3 parameters for the threefold potentials. We reserve this question for future thought and experimentation. In our present study, the gauche contributions are quite small, and therefore each 4-atom sequence can be individually considered at full weight.

C. Organization for Minimization

The form of the potential energy function was constructed to allow direct differentiation with respect to the coordinate variables and to ensure that the form of the derivatives is as simple as possible. Organization of the data structures that store the geometric information is also crucial to the efficiency of the minimization strategy. The bond lengths, bond angles, dihedral angles and associated parameters are stored in arrays that facilitate accessing the data for the function and derivative evaluations.

Table I. Dihedral angles and associated parameters investigated in this study. In column (b) of the V3 parameter, all heavy atom rotations in the model are included, with V3 appropriately scaled from values in column (a), where only endocyclic rotations about C-C and C-O bonds are included. All angles are given in degrees and all energy parameters in kcal/mol.

Dihedral Angle                 Threefold Torsion       Twofold Torsion
                               Constants (V3)          Constants (V2)
                               a         b
(τ0)  C4'-O1'-C1'-C2'          1.8       0.9           -
      C4'-O1'-C1'-N            -         0.9           -
(τ1)  O1'-C1'-C2'-C3'          2.8       1.4           0.2
      N-C1'-C2'-C3'            -         1.4           -
(τ2)  C1'-C2'-C3'-C4'          2.8       1.4           -
      C1'-C2'-C3'-O3'          -         1.4           0.2
(τ3)  C2'-C3'-C4'-O1'          2.8       0.7           0.2
      C2'-C3'-C4'-C5'          -         0.7           -
      O3'-C3'-C4'-O1'          -         0.7           1.0
      O3'-C3'-C4'-C5'          -         0.7           0.2
(τ4)  C3'-C4'-O1'-C1'          1.8       0.9           -
      C5'-C4'-O1'-C1'          -         0.9           -

Three stages comprise the construction of such data tables. First, the individual atoms are numbered and are assigned identification labels for atom types; then the individual bonds are numbered. Second, two connectivity arrays are entered to specify bonded atoms and neighbors of each atom, and four parameter arrays are entered to specify stretching, bending and torsional constants associated with general chemical sequences. Third, a subroutine is called to construct lists of all the bonds, bond angles and dihedral angles in the molecule and then to associate the proper energy parameters to each sequence. In this way, changing energy parameters is an easy task, and calculation of the energy terms and derivatives requires a simple reading "down the list" where all the data have been preprocessed.
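The third stage, building the angle and dihedral lists from the connectivity and attaching parameters by chemical sequence, can be sketched as below. This is an illustration under assumptions, not the authors' subroutine: only the eight heavy atoms and atom groups are included, the function names are hypothetical, and the atom type is read from the first letter of each label.

```python
# Heavy-atom skeleton only (hypothetical labels); hydrogens omitted for brevity.
BONDS = [("O1'", "C1'"), ("C1'", "C2'"), ("C2'", "C3'"), ("C3'", "C4'"),
         ("C4'", "O1'"), ("C1'", "N"), ("C3'", "O3'"), ("C4'", "C5'")]

def build_lists(bonds):
    """Derive neighbor lists, bond angles i-j-k and dihedrals i-j-k-l from the bonds."""
    neighbors = {}
    for u, v in bonds:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    angles = []
    for j, nbrs in neighbors.items():
        for a in range(len(nbrs)):
            for b in range(a + 1, len(nbrs)):
                angles.append((nbrs[a], j, nbrs[b]))
    dihedrals = []
    for j, k in bonds:
        for i in neighbors[j]:
            for l in neighbors[k]:
                if i != k and l != j and i != l:
                    dihedrals.append((i, j, k, l))
    return neighbors, angles, dihedrals

def lookup(table, sequence):
    """Constant stored for a type sequence, read in either direction; None if absent."""
    key = tuple(name[0] for name in sequence)
    return table.get(key, table.get(key[::-1]))

# Twofold constants by 4-atom type sequence (kcal/mol), as quoted in the text.
V2_TABLE = {("O", "C", "C", "C"): 0.2, ("O", "C", "C", "O"): 1.0}

neighbors, angles, dihedrals = build_lists(BONDS)
v2_terms = [(d, lookup(V2_TABLE, d)) for d in dihedrals
            if lookup(V2_TABLE, d) is not None]
```

Changing a force constant then means editing one table entry, after which the preprocessed lists are simply read "down the list" during each energy evaluation.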

The required derivatives are assembled in a modular fashion. Function subprograms are used to compute the derivatives of various geometric quantities with respect to the coordinate variables $x_{k,l}$, for $k = 1, \ldots, N$ and $l = 1, 2, 3$. For example, the derivative of the interatomic distance vector $r_{ij}$ with respect to $x_{k,l}$, $\partial r_{ij}/\partial x_{k,l}$, is given by the function der(i, j, k, l), and $\partial^2 r_{ij}/\partial x_{k,l}\,\partial x_{m,n}$ by dder(i, j, k, l, m, n). The expression for a vector inner product involving 4 atoms, $(\mathbf{x}_i - \mathbf{x}_j) \cdot (\mathbf{x}_{i'} - \mathbf{x}_{j'})$, is given in function prod(i, j, i', j') and the derivatives of this quantity, $\partial\,\mathrm{prod}(i, j, i', j')/\partial x_{k,l}$ and $\partial^2\,\mathrm{prod}(i, j, i', j')/\partial x_{k,l}\,\partial x_{m,n}$, in the function subprograms dep(i, j, i', j', k, l) and ddep(i, j, i', j', k, l, m, n), respectively. Similarly, a function cosine(i, j, i', j') evaluates the dihedral angle cosine approximation given in eq. (5), and its associated derivatives are given in two subprograms. Full details of the program are given elsewhere [ref. 66].
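To make the modular construction concrete, the inner-product function and its derivatives can be sketched as follows. The names prod, dep and ddep are taken from the text, but the Python signatures (0-based indices, coordinates passed as a list of 3-component lists) are assumptions of this sketch and not the original subprograms.

```python
def delta(a, b):
    """Kronecker delta."""
    return 1.0 if a == b else 0.0

def prod(x, i, j, ip, jp):
    """Inner product (x_i - x_j) . (x_ip - x_jp)."""
    return sum((x[i][m] - x[j][m]) * (x[ip][m] - x[jp][m]) for m in range(3))

def dep(x, i, j, ip, jp, k, l):
    """First derivative of prod with respect to coordinate l of atom k."""
    return ((delta(i, k) - delta(j, k)) * (x[ip][l] - x[jp][l])
            + (x[i][l] - x[j][l]) * (delta(ip, k) - delta(jp, k)))

def ddep(i, j, ip, jp, k, l, m, n):
    """Second derivative with respect to x[k][l] and x[m][n].
    Constant in the coordinates because prod is quadratic."""
    if l != n:
        return 0.0
    return ((delta(i, k) - delta(j, k)) * (delta(ip, m) - delta(jp, m))
            + (delta(i, m) - delta(j, m)) * (delta(ip, k) - delta(jp, k)))

# Hypothetical coordinates for four atoms.
x = [[0.0, 0.1, 0.2], [1.0, 0.3, -0.2], [1.5, 1.2, 0.4], [2.4, 1.0, 1.1]]
squared_length = prod(x, 0, 1, 0, 1)           # |x_0 - x_1|^2
gradient_entry = dep(x, 0, 1, 0, 1, 0, 0)      # equals 2 * (x_0 - x_1)[0]
```

Since bond lengths squared, bond angle cosines and the dihedral cosine numerator are all built from such inner products, the product and chain rules applied to these three functions give the full gradient and Hessian of the local energy terms.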

This construction allows products and quotients in the energy expression to be differentiated simply and efficiently. With analytic first and second derivatives at hand, Newton methods can be implemented. The two particular Newton algorithms that we have tested are: Gill and Murray's modified Newton method, and a truncated Newton method adapted for potential energy minimization. The former solves the Newton equation directly by the modified Cholesky factorization and is thus suitable for functions of about 100 variables or less. The latter solves the Newton equation iteratively by a truncated preconditioned Conjugate Gradient method and is thus attractive for large-scale problems. Full details of the algorithms are provided in the accompanying paper [ref. 68].
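The general idea of a truncated Newton iteration can be illustrated with a generic sketch. It is not the algorithm of the accompanying paper: there is no preconditioner, the line search is a plain backtracking rule, and the two-variable double-well test function is hypothetical. The inner Conjugate Gradient loop is stopped early either when the residual is small enough or when a direction of non-positive curvature is met, which is how the search leaves nonconvex regions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def truncated_cg(hess_vec, g, tol, max_iter):
    """Approximately solve H p = -g by linear Conjugate Gradient.
    Stops on a small residual (truncation) or on non-positive curvature."""
    p = [0.0] * len(g)
    r = [-gi for gi in g]              # residual -g - H p at p = 0
    d = r[:]
    rr = dot(r, r)
    for _ in range(max_iter):
        Hd = hess_vec(d)
        curvature = dot(d, Hd)
        if curvature <= 1e-12 * dot(d, d):
            # Nonconvex direction: keep progress so far, else steepest descent.
            return p if any(p) else d
        alpha = rr / curvature
        p = [pi + alpha * di for pi, di in zip(p, d)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hd)]
        rr_new = dot(r, r)
        if rr_new ** 0.5 <= tol:
            break
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return p

def truncated_newton(f, grad, hess_vec_at, x0, gtol=1e-8, max_outer=200):
    x = list(x0)
    for _ in range(max_outer):
        g = grad(x)
        gnorm = dot(g, g) ** 0.5
        if gnorm <= gtol:
            break
        eta = min(0.5, gnorm ** 0.5)   # looser inner solve far from a minimum
        p = truncated_cg(lambda v: hess_vec_at(x, v), g, eta * gnorm, 10 * len(x))
        fx, slope, t = f(x), dot(g, p), 1.0
        # Backtracking line search with a sufficient-decrease test.
        while t > 1e-12 and f([xi + t * pi for xi, pi in zip(x, p)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x = [xi + t * pi for xi, pi in zip(x, p)]
    return x

# Hypothetical double-well test function with minima at (1, 1) and (-1, -1).
def f(x):
    return (x[0] ** 2 - 1.0) ** 2 + 2.0 * (x[1] - x[0]) ** 2

def grad(x):
    return [4.0 * x[0] ** 3 - 4.0 * x[1], 4.0 * (x[1] - x[0])]

def hess_vec_at(x, v):
    return [12.0 * x[0] ** 2 * v[0] - 4.0 * v[1], -4.0 * v[0] + 4.0 * v[1]]

x_min = truncated_newton(f, grad, hess_vec_at, [0.2, -1.5])
```

Only Hessian-vector products are needed, so the full second-derivative matrix never has to be factored; that is the property that makes the approach attractive when the number of variables is large.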

All computations described in the next section were performed on a VAX 8600 at Courant Institute, New York University.

IV. RESULTS AND DISCUSSIONS

The modeling and minimization techniques implemented in this study were designed for unconstrained potential energy minimization in Cartesian coordinate space. Minimized structures, however, are only as good as the chosen parameter set. Furthermore, the conformation space will be surveyed properly only with powerful minimization strategies. Thus, to evaluate the reliability of the results, we must ensure that only two local minima are obtained for the deoxyribose model and that their geometries and energies are consistent with observed data. For nucleic acid sugars, even these simple criteria have provided a continuing challenge. Going a step beyond consistency, we were interested in examining the effects of changes in stiffness constants and modeling techniques on the structures obtained and on the energy differences among them. In particular, the effect of the gauche potential and different choices for modeled dihedral angle sets in the threefold torsion potential are important for systematic parameter assignments in DNA models.

A convenient way to study all these issues is to formulate several parameter variations and examine results by two techniques: 1) energy minimization, and 2) unminimized Energy vs. P (the pseudorotation parameter) curves. In previous pseudorotation investigations, E vs. P profiles were usually generated using artificial constraint terms in combination with minimization [refs. 18,20,34]. These constraints restricted the backbone dihedral angle passing through the ring, all 5 endocyclic dihedral angles, or one endocyclic dihedral angle, to assume fixed values associated with the pseudorotation path. Clearly, results are constraint-dependent; the stronger the constraint, the higher the energy barriers expected. Since full Cartesian minimization is our major computational tool, results from E vs. P curves that are unminimized are sufficient for examining influences of different modeling strategies. Thus, the E vs. P profiles are not intended to determine barrier heights as these will be exaggerated without complete energy relaxation.

The minimization results also provide an opportunity to examine the deviation from "ideal" pseudorotation behavior. We will analyze the approximation made in the pseudorotation eq. (2) for the five endocyclic dihedral angles by two different comparisons. First, to compare P and $\tau_{max}$ corresponding to our minimized structures with the values determined for analyzed crystal structures, we compute P and $\tau_{max}$ from an exact Fourier series (described in the next section). Second, to examine the analytic fitting provided by eq. (2) for the individual dihedral angles, we compute the sum of squares of the angle deviations between the values generated by minimization and the values predicted by eq. (2) with the computed Fourier series parameters P and $\tau_{max}$. We use the Fourier series representation because it provides an equal treatment of all dihedral angles. A simpler approximation for the pseudorotation parameters can be obtained from the formula derived by Altona, Geise and Romers from the pseudorotation eq. (2):

After computing P from this formula, $\tau_{max}$ can be obtained from eq. (2) for j = 2. The disadvantage of this procedure is that it is somewhat arbitrary: a different dihedral angle labeling scheme will produce different pseudorotation phase angle and phase shift.
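Both routes to the pseudorotation parameters can be sketched in a few lines. The sketch assumes that eq. (2) has the commonly used form τ_j = τ_max cos(P + 144°(j - 2)) for j = 0, ..., 4, and that the ratio formula is the widely quoted tan P = [(τ4 + τ1) - (τ3 + τ0)] / [2 τ2 (sin 36° + sin 72°)]; neither expression is reproduced in the text above, and the function names are hypothetical. The second function is a least-squares projection of all five angles onto the first Fourier mode, offered as one possible way to treat the angles equally rather than as the paper's own series.

```python
import math

def pseudorotation_torsions(P_deg, tau_max):
    """Ideal torsions tau_0..tau_4 from the assumed form of eq. (2)."""
    return [tau_max * math.cos(math.radians(P_deg + 144.0 * (j - 2)))
            for j in range(5)]

def phase_from_ratio(tau):
    """P from the tangent ratio, then tau_max from eq. (2) with j = 2.
    Ill-conditioned when cos(P) is close to zero."""
    num = (tau[4] + tau[1]) - (tau[3] + tau[0])
    den = 2.0 * tau[2] * (math.sin(math.radians(36.0)) + math.sin(math.radians(72.0)))
    P = math.degrees(math.atan2(num, den)) % 360.0
    return P, tau[2] / math.cos(math.radians(P))

def phase_from_projection(tau):
    """P and tau_max from a projection that weights all five angles equally."""
    phi = [math.radians(144.0 * (j - 2)) for j in range(5)]
    A = 0.4 * sum(t * math.cos(p) for t, p in zip(tau, phi))    # tau_max * cos P
    B = -0.4 * sum(t * math.sin(p) for t, p in zip(tau, phi))   # tau_max * sin P
    return math.degrees(math.atan2(B, A)) % 360.0, math.hypot(A, B)

ideal = pseudorotation_torsions(162.0, 38.0)
print(phase_from_ratio(ideal))
print(phase_from_projection(ideal))
```

For torsions that follow eq. (2) exactly, the two functions agree. For minimized structures they generally differ slightly, and the ratio formula depends on which dihedral is labeled τ2.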

The present section is organized as follows. First, we will describe the different energy sets examined, the details of generating coordinates as a function of P, and the expression of the Fourier series and derived pseudorotation parameters. Second, we will describe the structural, energetic and geometric details for the deoxyribose conformations generated by minimization, and then, for these structures, we will analyze the dihedral angles in the context of pseudorotation. Third, we will briefly analyze the obtained E vs. P profiles. Fourth, we will conclude with a comparison of the two approaches - energy minimization and E vs. P curves.

A. The Potential Energy Functions Investigated and Methods of Analysis

Eight different variations of the potential energy function described in Section III are examined (see Tables I and II). The Lennard-Jones, electrostatic and torsional parameters remain constant throughout the calculations. Different choices and combinations of bond length stiffness constants, bond angle stiffness constants, and strategies for modeling internal rotations are considered to examine the sensitivity of the results to these parameter and modeling changes.

The first potential energy function studied is composed of nonbonded interactions, bond length and bond angle terms, and a threefold torsional potential for the five endocyclic dihedral angles only. For the choice of bond angle bending constants, we wanted to ensure that the correct puckering geometry emerges. Therefore, it was not clear whether the tet- rahedral angle formed by a 3-atom sequence involving at least one hydrogen should be made to stay very near to tetrahedral or be made extremely flexible to accommodate the endocyclic bond angle arrangements. Thus, in set 1 we tried a very large bending con- stant for these angles - 200.0 kcal/mol, and in set 2, a much smaller value of 15.0 kcal/ mol. Since results were qualitatively similar,

Table II. A description of the energy function variations considered in this study. The coefficients for the Lennard-Jones, electrostatic and torsional parameters remain constant throughout the calculations, and variations in bond length stiffness constants, bond angle stiffness constants and strategies for modeling rotations comprise the various sets. The bond length force constant S1 is given in kcal/mol Å⁴ and the bending and torsional parameters in kcal/mol. The angle bending force constant S2 has a value of 60 kcal/mol for all C-O-C, C-C-O and C-C-C sequences throughout the sets. Only the bending constants for N-C-O and N-C-C angles, and all sequences involving hydrogens (denoted by H**), were varied. The references in the torsional columns correspond to the appropriate columns of Table I.

SET  Parameters Modified  S1 (Bond Length)   S2 (Bond Angle)                   V2 (Torsion)  V3 (Torsion)
1    -                    100 for all bonds  15 for NCO and NCC, 200 for H**   -             V3, a
2    S2                   100 for all bonds  15 for NCO, NCC and H**           -             V3, a
3    V2                   100 for all bonds  15 for NCO, NCC and H**           V2            V3, a
4    V3                   100 for all bonds  15 for NCO, NCC and H**           -             V3, b
5    V2, V3               100 for all bonds  15 for NCO, NCC and H**           V2            V3, b
6    S1, V3               25 for all bonds   15 for NCO, NCC and H**           -             V3, b
7    S1, S2, V3           25 for all bonds   60 for all angles                 -             V3, b
8    S1, S2, V2, V3       50 for all bonds   60 for all angles                 V2            V3, b


we adopted the latter, more intuitive, choice in the remaining energy parameter sets.

We then added a twofold or gauche potential to the energy function (set 3), a weighted threefold torsional term to include all heavy atom rotations about C-C and C-O bonds (set 4), and then examined their combined effect (set 5). In sets 6-8, we tried various combinations of bond length constants, bond angle constants, and torsional terms in order to examine the overall relationship between the structures obtained and the potential energy form.

Minimization runs were started from points corresponding to P values at 18° intervals along the pseudorotation cycle and at randomly perturbed points from those. Coordinates as a function of P were generated using Pearlman and Kim's procedure11 for the skeletal ring atoms. Exocyclic substituents were attached to occupy positions close to tetrahedral in a local coordinate system (see Fig. 5). Note that this procedure was only used to obtain the initial guess to start the minimization. Once started, the minimization procedure makes no use of the pseudorotation cycle. Results for minimization runs are reported in Table III.

For the E vs. P profiles, the coordinate generation procedure mentioned above was used at 1° intervals, and the variations of the total energy and individual energy components were plotted as a function of P. The curves for six representative energy sets are shown in Fig. 7.

For the minimization-generated structures, analysis of the pseudorotation approximation is accomplished by considering the exact representation of the five endocyclic dihedral angles $\tau_j$, $j = 0, \ldots, 4$, given by:

$$\tau_j = \sum_{k=0}^{4} a_k\, e^{i 2\pi jk/5}, \qquad j = 0, \ldots, 4. \tag{10a}$$

Figure 5. A schematic representation of positioning the exocyclic substituents of ring atoms. Atom $l$, attached to ring atom $j$, is positioned in a local coordinate system $(e_1, e_2, e_3)$ through the 3 ring atoms $i$-$j$-$k$, as follows. Let $a = x_i - x_j$, $b = x_k - x_j$ and $c$ be the distance between $x_j$ and $x_l$. Then we have:

$$e_3 = \frac{a \times b}{\|a \times b\|}, \qquad e_2 = -\frac{a + b}{\|a + b\|}, \qquad e_1 = e_2 \times e_3.$$

We define $\phi_1$ as the angle between $p_l$ and the $e_2$ axis, where $p_l$ is the projection vector of $x_l$ onto the $e_1, e_2$ plane; $\phi_2$ is defined as the angle between $x_l$ and $p_l$. Once $\phi_1$ and $\phi_2$ are specified, the coordinates of $x_l$ are given by:

$$x_l = x_j + (c \cos\phi_2 \sin\phi_1)\, e_1 + (c \cos\phi_2 \cos\phi_1)\, e_2 + (c \sin\phi_2)\, e_3.$$

To position the exocyclic atoms symmetrically from the ring plane so that a tetrahedral arrangement is approximately occupied, we let $\phi_1 = 0$ and $\phi_2 = \pm 54.7356°$. For C5', N, H3' and 1H2', $\phi_2 = +54.7356°$, and for H4', H1', O3' and 2H2', $\phi_2 = -54.7356°$ (see Fig. 1).
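The placement rule in the Figure 5 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `place_substituent`, the small vector helpers, the three ring-atom positions and the 1.09 Å bond length are hypothetical choices made for the example, not values or code from the paper.

```python
import math

def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]

def add(u, v):
    return [ui + vi for ui, vi in zip(u, v)]

def scale(s, u):
    return [s * ui for ui in u]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def unit(u):
    return scale(1.0 / math.sqrt(dot(u, u)), u)

def place_substituent(x_i, x_j, x_k, c, phi1_deg, phi2_deg):
    """Position atom l, bonded to ring atom j, using the local frame of Fig. 5."""
    a = sub(x_i, x_j)
    b = sub(x_k, x_j)
    e3 = unit(cross(a, b))             # normal to the i-j-k plane
    e2 = unit(scale(-1.0, add(a, b)))  # points away from the ring interior
    e1 = cross(e2, e3)
    phi1 = math.radians(phi1_deg)
    phi2 = math.radians(phi2_deg)
    c1 = c * math.cos(phi2) * math.sin(phi1)
    c2 = c * math.cos(phi2) * math.cos(phi1)
    c3 = c * math.sin(phi2)
    return [x_j[n] + c1 * e1[n] + c2 * e2[n] + c3 * e3[n] for n in range(3)]

# Hypothetical ring fragment i-j-k and an assumed bond length of 1.09 Angstrom.
x_i, x_j, x_k = (1.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, -1.0, 0.0)
plus = place_substituent(x_i, x_j, x_k, 1.09, 0.0, +54.7356)
minus = place_substituent(x_i, x_j, x_k, 1.09, 0.0, -54.7356)
```

With $\phi_1 = 0$ the two substituents lie symmetrically on either side of the $i$-$j$-$k$ plane, and the angle between them is twice 54.7356°, that is, the tetrahedral angle of about 109.47°.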


Table III. A summary of geometric and energetic results from the minimization study. For all 8 energy sets described in Table II, various quantities associated with the generated North (N) and South (S) minima are given: the estimated phase angle of pseudorotation P and the puckering amplitude τmax (obtained from the procedure outlined in Section IV(A)), the endocyclic bond angles, endocyclic dihedral angles, 2 dihedral angles associated with rotations about the C3'-C4' bond, and all energy components. For comparison, we report bond angles obtained experimentally for the C3'-endo and C2'-endo conformations7,17, and endocyclic dihedral angles predicted by the pseudorotation eq. (2) with τmax = 38°. All angles are given in degrees and all energies in kcal/mol.

SET                            Exp.     Exp.        1        1        2        2        3        3        4        4
Minimum                    C3'-endo C2'-endo        N        S        N        S        N        S        N        S
P                                18      162       18      168       17      162        9      162       24      158
θ0                              110      110      110      111      110      110      110      110      109      109
θ1                              107      106      105      104      106      105      106      105      107      105
θ2                              102      101      104      103      102      102      102      102      102      102
θ3                              102      103      101      102      101      102      101      102      101      103
θ4                              105      106      105      107      105      107      106      107      104      107
τ0                                0      -22        0      -20       -4      -23        7      -23       -4      -25
τ1                              -22       36      -24       35      -28       37      -29       37      -22       37
τ2                               36      -36       37      -37       39      -36       39      -36       37      -35
τ3                              -36       22      -37       26      -38       23      -36       23      -40       21
τ4                               22        0       24       -4       22       -1       19        0       28        2
τmax                             38       38       39       38       41       39       40       39       41       38
O1'-C4'-C3'-O3'                   -        -      157       95      155       96      153       96      159       96
C5'-C4'-C3'-O3'                   -        -       83      146       85      143       88      144       82      146
E_NB                              -        -    -0.34     0.31    -1.11    -0.87    -1.08    -0.86    -1.27    -0.74
E_BOND                            -        -     0.09     0.11     0.04     0.04     0.04     0.04     0.06     0.07
E_BANG                            -        -     4.20     4.03     4.17     3.77     4.11     3.81     4.16     3.52
E_2FOLD                           -        -        0        0        0        0     1.30     0.46        0        0
E_3FOLD                           -        -     8.13     8.27     7.77     8.32     7.82     8.27     6.86     7.21
E_TOTAL                           -        -    12.08    12.72    10.87    11.26    12.19    11.72     9.81    10.06
Pseudorotation deviations         -        -     1.86     0.93    36.06     1.59     1.86     0.74     2.34     1.11

Table III (continued)

SET                            Exp.     Exp.        5        5        6        6        7        7        8        8
Minimum                    C3'-endo C2'-endo        N        S        N        S        N        S        N        S
P                                18      162       20      159       25      162       20      158       18      160
θ0                              110      110      109      109      109      109      109      109      110      109
θ1                              107      106      107      105      107      105      106      105      106      105
θ2                              102      101      103      102      102      102      103      101      103      101
θ3                              102      103      101      103      101      103      101      103      101      102
θ4                              105      106      105      107      104      107      105      107      105      107
τ0                                0      -22       -1      -25       -4      -26       -1      -26       -1      -25
τ1                              -22       36      -23       37      -21       38      -23       39      -25       38
τ2                               36      -36       37      -35       37      -36       37      -36       37      -36
τ3                              -36       22      -39       22      -41       21      -39       22      -37       23
τ4                               22        0       25        2       28        3       25        2       23        1
τmax                             38       38       40       39       41       35       40       40       39       40
O1'-C4'-C3'-O3'                   -        -      157       96      159       96      158       96      157       95
C5'-C4'-C3'-O3'                   -        -       83      146       81      146       81      146       83      146
E_NB                              -        -    -1.25    -0.73    -1.41    -0.92    -0.30     0.18    -1.16    -0.34
E_BOND                            -        -     0.05     0.07     0.22     0.27     0.23     0.29     0.10     0.15
E_BANG                            -        -     4.07     3.54     4.10     3.44     4.10     3.73     4.08     3.68
E_2FOLD                           -        -     1.37     0.48        0        0        0        0     1.33     0.44
E_3FOLD                           -        -     6.96     7.18     6.73     7.06     6.94     7.00     7.20     6.98
E_TOTAL                           -        -    11.20    10.54     9.64     9.85    10.97    11.20    11.55    10.91
Pseudorotation deviations         -        -     1.20     2.58     1.21    59.99     1.20     2.29     4.70     2.67


The five Fourier coefficients are given by the inverse Fourier series

$$a_k = \frac{1}{5}\sum_{j=0}^{4} \tau_j\, e^{-i 2\pi jk/5}, \qquad k = 0, \ldots, 4. \tag{10b}$$

Representing each complex Fourier coefficient by $a_k = A_k e^{i\theta_k}$, we obtain by periodicity $a_{5-k} = \bar{a}_k$ (the bar denotes the complex conjugate), and thus the series can be simplified to the cosine expansion

$$\tau_j = A_0 + 2A_1 \cos\left(\theta_1 + \frac{2\pi j}{5}\right) + 2A_2 \cos\left(\theta_2 + \frac{4\pi j}{5}\right), \tag{11}$$

where

$$A_0 = \frac{1}{5}\sum_{j=0}^{4} \tau_j, \tag{11a}$$

$$A_m \cos\theta_m = \frac{1}{5}\sum_{j=0}^{4} \tau_j \cos\frac{2\pi jm}{5}, \qquad m = 1, 2, \tag{11b}$$

$$A_m \sin\theta_m = -\frac{1}{5}\sum_{j=0}^{4} \tau_j \sin\frac{2\pi jm}{5}, \qquad m = 1, 2. \tag{11c}$$

By comparing eq. (11) with eq. (2), it is clear that the pseudorotation approximation coincides with the analytic Fourier series representation when $A_0 = A_1 = 0$. P and $\tau_{\max}$ correspond to the two Fourier-derived expressions:

$$P = \theta_2 + \frac{8\pi}{5} \qquad \text{and} \qquad \tau_{\max} = 2A_2. \tag{12}$$
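As a hedged illustration of eqs. (10b) and (12), the following Python sketch estimates P and the puckering amplitude from five endocyclic dihedral angles. Eq. (2) is not reproduced in this section, so the sketch assumes the form $\tau_j = \tau_{\max}\cos(P + 4\pi(j-2)/5)$, which is consistent with the reference dihedral angles listed in Table III; the function names are hypothetical.

```python
import cmath
import math

def fourier_coefficients(tau):
    """a_k = (1/5) * sum_j tau_j * exp(-i*2*pi*j*k/5), k = 0..4, as in eq. (10b)."""
    return [sum(t * cmath.exp(-2j * math.pi * j * k / 5) for j, t in enumerate(tau)) / 5
            for k in range(5)]

def pseudorotation_parameters(tau):
    """Return (P in degrees, tau_max) from the k = 2 coefficient, following eq. (12)."""
    a2 = fourier_coefficients(tau)[2]
    theta2 = cmath.phase(a2)                      # theta_2 in radians
    phase = math.degrees(theta2 + 8.0 * math.pi / 5.0) % 360.0
    return phase, 2.0 * abs(a2)

# Ideal C3'-endo ring generated from the assumed form of eq. (2): P = 18, tau_max = 38.
ideal = [38.0 * math.cos(math.radians(18.0 + 144.0 * (j - 2))) for j in range(5)]
p_ideal, amp_ideal = pseudorotation_parameters(ideal)

# Rounded dihedral angles of the set 3 North minimum, copied from Table III.
p_set3, amp_set3 = pseudorotation_parameters([7.0, -29.0, 39.0, -36.0, 19.0])
```

For the ideal ring the sketch recovers the input phase and amplitude, and for the rounded set 3 values it gives a phase close to 9° and an amplitude close to 40°, in line with the corresponding Table III entries.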

B. Results from Minimization

For all energy sets and for all initial molecular conformations, only two local minima are obtained. As will be discussed below, the minima correspond closely to the ideal C2'-endo and C3'-endo sugar puckers. The North pucker is obtained from starting structures with P values increasing from 270° through 360° to 72°, and the South pucker is obtained from a P range of 90° to 252° (see Fig. 6). Random starting points also lead to one of the same two structures. These results, as we see most clearly from Fig. 6, support the idea that the pseudorotation surface is a low-energy path.

Quadratic convergence is realized near a local minimum for both the modified Newton method and the truncated Newton method.

Figure 6. Illustration of the minima obtained from starting points along the pseudorotation cycle. Starting points at 18° intervals for P generate the nearest local minimum (C3'-endo or C2'-endo) on the pseudorotation path.

For both Newton methods, about 17 Newton iterations and 21 function and gradient evaluations were required to reach a gradient norm of less than 1.0 × 10⁻⁷. Thus, for an optimization problem of this size, the performance of Newton methods based on a direct solution of the Newton equation is very similar in computational effort and convergence properties to that of Newton methods based on an iterative linear solver. The truncated Newton method is expected to be superior for functions with a larger number of variables, when the search in the large conformational space will be more adaptive to the progress made and more directed toward the important conformational regions. Moreover, the storage requirements of the truncated Newton method can be made less than those of the modified Newton method for any problem size. This can be accomplished by using a preconditioned Conjugate Gradient variant that requires analytic evaluation and storage of only the "local" Hessian elements entering from the bond lengths, bond angles and torsional potentials. In other words, the long-range, nonbonded terms in the Hessian matrix need not be computed at all (details are provided in the accompanying paper68). Application of the truncated Newton method to large potential energy functions is promising, because our experience with the truncated Newton code on various test problems suggests that function form, rather than problem size, governs the algorithm's performance. We are currently working on an efficient implementation of the method for larger molecules.
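To make the idea of a truncated Newton iteration concrete, here is a minimal generic sketch in Python. It is not the authors' code and does not reproduce the preconditioned variant described in the accompanying paper: it assumes an unpreconditioned Conjugate Gradient inner loop, finite-difference Hessian-vector products, a simple forcing term and an Armijo backtracking line search, and it is exercised on a hypothetical two-variable double-well function rather than a molecular potential.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(alpha, x, y):
    """Return alpha*x + y for plain Python lists."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def truncated_newton(f, grad, x, tol=1e-6, max_outer=100, fd_step=1e-6):
    n = len(x)
    negative_curvature_exits = 0
    for iteration in range(max_outer):
        g = grad(x)
        gnorm = math.sqrt(dot(g, g))
        if gnorm < tol:
            return x, iteration, negative_curvature_exits

        def hess_vec(d):
            # Forward-difference approximation of H(x) d from two gradients.
            h = fd_step / math.sqrt(dot(d, d))
            shifted_gradient = grad(axpy(h, d, x))
            return [(a - b) / h for a, b in zip(shifted_gradient, g)]

        # Inner loop: Conjugate Gradient on H p = -g, stopped early (truncated).
        p = [0.0] * n
        r = [-gi for gi in g]
        d = r[:]
        rr = dot(r, r)
        eta = min(0.5, math.sqrt(gnorm)) * gnorm   # residual tolerance
        for _ in range(n):
            Hd = hess_vec(d)
            curvature = dot(d, Hd)
            if curvature <= 1e-10 * dot(d, d):
                # Nonconvex direction: stop and keep the descent step found so far.
                negative_curvature_exits += 1
                if dot(p, p) == 0.0:
                    p = r[:]                       # fall back to steepest descent
                break
            alpha = rr / curvature
            p = axpy(alpha, d, p)
            r = axpy(-alpha, Hd, r)
            rr_new = dot(r, r)
            if math.sqrt(rr_new) <= eta:
                break
            d = axpy(rr_new / rr, d, r)
            rr = rr_new

        # Backtracking (Armijo) line search along the descent direction p.
        fx, slope, step = f(x), dot(g, p), 1.0
        while f(axpy(step, p, x)) > fx + 1e-4 * step * slope and step > 1e-12:
            step *= 0.5
        x = axpy(step, p, x)
    return x, max_outer, negative_curvature_exits

# Hypothetical test function with two minima, at (1, 1) and (-1, -1).
def double_well(x):
    return (x[0] ** 2 - 1.0) ** 2 + (x[1] - x[0]) ** 2

def double_well_grad(x):
    return [4.0 * x[0] * (x[0] ** 2 - 1.0) - 2.0 * (x[1] - x[0]),
            2.0 * (x[1] - x[0])]

x_min, n_iter, n_exits = truncated_newton(double_well, double_well_grad, [0.1, 0.1])
```

The starting point is chosen inside the nonconvex region between the two wells, so the inner loop meets a direction of negative curvature and leaves it early; this is the kind of adaptive behavior that the text attributes to the truncated Newton approach.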

Structures. We summarize in Table III the generated endocyclic bond angles, selected dihedral angles, and the values of all energy components for the two structures obtained from all minimization trials. For comparison, we list corresponding bond angles for the C2'-endo and C3'-endo conformations obtained experimentally7,17 and endocyclic dihedral angles predicted theoretically by the pseudorotation eq. (2) with $\tau_{\max}$ = 38°. Additionally, the pseudorotation parameter P and puckering amplitude $\tau_{\max}$ are computed for each structure from the coefficients of a Fourier series by eq. (12). These values are used to compute the 5 dihedral angles from eq. (2), which are then compared to the dihedral angles generated by minimization by summing the squares of the individual angle deviations. This sum is an indication of the approximation involved in pseudorotation.
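The deviation measure described above can be sketched as follows. The sketch assumes that the fitted angles are the $k = 2$ term of eq. (11), which is what eq. (12) implies, and the function names are hypothetical. It also evaluates the closed form $5A_0^2 + 10A_1^2$, which follows from eq. (11) and the orthogonality of the cosine terms; that identity is our own check and is not stated in the paper.

```python
import cmath
import math

def fourier_coefficients(tau):
    """a_k = (1/5) * sum_j tau_j * exp(-i*2*pi*j*k/5), k = 0..4, as in eq. (10b)."""
    return [sum(t * cmath.exp(-2j * math.pi * j * k / 5) for j, t in enumerate(tau)) / 5
            for k in range(5)]

def pseudorotation_deviation(tau):
    """Sum of squared differences between tau_j and the pseudorotation fit."""
    a = fourier_coefficients(tau)
    amp2, theta2 = abs(a[2]), cmath.phase(a[2])
    fitted = [2.0 * amp2 * math.cos(theta2 + 4.0 * math.pi * j / 5) for j in range(5)]
    direct = sum((t - fit) ** 2 for t, fit in zip(tau, fitted))
    # Same quantity expressed through the neglected coefficients A_0 and A_1.
    closed_form = 5.0 * abs(a[0]) ** 2 + 10.0 * abs(a[1]) ** 2
    return direct, closed_form

# Rounded dihedral angles of the set 1 North minimum, copied from Table III.
direct, closed_form = pseudorotation_deviation([0.0, -24.0, 37.0, -37.0, 24.0])
```

Even with the dihedral angles rounded to whole degrees, the result for the set 1 North minimum lands close to the tabulated deviation of 1.86, and the two ways of computing it agree to rounding error.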

The estimated pseudorotation angles for the minimization-generated structures range from 9° to 25°, with an average (over all energy parameter sets) of 19° for the North pucker (C3'-endo). The estimated P values range from 158° to 168° and average 161° for the South pucker (C2'-endo). These values are very close to the ideal phase angles of 18° and 162°. The puckering amplitude $\tau_{\max}$ has an average of 40.1° for the North puckers, and 38.5° for the South puckers. This is clearly a small difference, but such a trend is observed consistently for all different potential energy sets and, moreover, has been noted previously in theoretical investigations19,34 and experimental analyses6. Thus, conformational interchanges by pseudorotation with variable amplitudes of puckering may occur.

Energies. The small but significant energy differences between the North and South conformations illustrate the dilemmas that arose with regard to reproducing the South preference for deoxyribose sugars. Clearly, the only energy functions that predict a lower energy state at C2'-endo are those that include the gauche potential (see Tables III and IV).

The nonbonded and threefold torsional energy components are lower for the North pucker than for the South pucker, while the bond length strain terms are small and nearly equal. In combination with the bond angle component, the total energy is lower for C3'-endo. Thus, only the gauche potential can sufficiently penalize the energy at the North minimum to make the South, or C2'-endo, a global minimum.

The gauche term is dominated by the O1'-C4'-C3'-O3' rotation. The endocyclic O-C-C-C sequences contribute about the same energy to both local minima, but the O1'-C4'-C3'-O3' sequence exhibits a major conformational difference. This dihedral angle is gauche (≈96°) at the South pucker and trans (≈157°) at the North. Since the barrier height is higher for O-C-C-O than for O-C-C-C rotations, the gauche contribution from this sequence alone lowers the South pucker energy by about 0.8 kcal/mol. The C1'-C2'-C3'-O3' sequence is in the gauche range in the South and the trans range in the North, but its effect is balanced by the reverse trend for the C5'-C4'-C3'-O3' sequence. The observed correlation between the O1'-C4'-C3'-O3' (≈ $\tau_3$ - 120°) and C5'-C4'-C3'-O3' (≈ $\tau_3$ + 120°) rotation sequences and the ring pucker mode has been used to characterize the state of the sugar pucker in the nucleotide backbone.34

Table IV. A comparison between the minimization and E vs. P studies. For all 8 energy sets examined, the values of P and E are given for the N and S minima as well as the energy difference, E_N - E_S, for each set. All angles are given in degrees and all energies in kcal/mol.

                              Location of Minima (P)                  E_N - E_S
                              E vs. P         Minimization
SET  Parameters Modified      N       S       N       S       E vs. P   Minimization
1    -                       15     177      18     168        -0.21          -0.64
2    S2                      14     176      17     162        -0.15          -0.39
3    V2                      13     176       9     162        +0.71          +0.47
4    V3                      15     174      24     158        +0.01          -0.25
5    V2, V3                  14     174      20     159        +0.88          +0.66
6    S1, V3                  14     166      25     162        +0.34          -0.21
7    S1, S2, V3              14     166      20     158        +0.32          -0.23
8    S1, S2, V2, V3          14     169      18     160        +1.06          +0.64

Other variations in energy potentials have little effect on the relative energy differences between the North and South minima. Using a threefold torsional potential for all heavy atom rotation sequences, as described in Table I, slightly lowers the energy difference between the North and South conformers. From a careful examination of the values of all modeled dihedral angles, it becomes apparent that the contribution from the endocyclic angles dominates over that of other angles in the group over which the V3 parameter is distributed. Consequently, the overall effect of including all heavy atom rotations of the molecule in a weighted threefold torsional potential is to lower the torsional energy component. As we will see from the Energy versus P curves, the threefold torsional potential is lowered consistently for the entire pseudorotation cycle, and thus does not change the relative energies between different sugar puckering modes.

When changing stretching and bending stiffness constants, we also observe very small changes in the relative energy difference between the two minima. Reducing the bond stretching constant S1 from 100.0 to 25.0 kcal/mol Å⁴ (compare sets 4 and 6) changes the energy difference by only 0.04 kcal/mol. Using a constant bending parameter of 60.0 kcal/mol for all bond angles of the molecule (compare sets 6 and 7) changes the energy difference by only 0.02 kcal/mol. Similarly, changing both S1 and S2 significantly (compare sets 5 and 8) changes the energy difference by only 0.02 kcal/mol. These observations suggest that the model is qualitatively good: the bending and stretching constants are appropriately chosen in relation to each other and to other parameters of the potential energy function, so that the nonbonded interactions are allowed to govern the ring geometry even with significant parameter changes in other energy components.

Thus, parameter sets 5 or 8, both of which include a gauche potential and a torsional potential for all heavy atom rotations in the molecule, are successful in reproducing the proper C3'-endo/C2'-endo relative energy difference.
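The energy comparisons quoted in this subsection can be checked with a few lines of arithmetic. The sketch below copies the E_TOTAL values of the North and South minima from Table III and forms E_N - E_S for each set; the variable names are ours, and nothing beyond the tabulated numbers is assumed.

```python
# E_TOTAL (kcal/mol) of the North and South minima for sets 1-8, from Table III.
total_energy = {
    1: (12.08, 12.72), 2: (10.87, 11.26), 3: (12.19, 11.72), 4: (9.81, 10.06),
    5: (11.20, 10.54), 6: (9.64, 9.85), 7: (10.97, 11.20), 8: (11.55, 10.91),
}

# E_N - E_S for every set; a positive value means the South minimum is lower.
difference = {s: round(north - south, 2) for s, (north, south) in total_energy.items()}

south_favored = [s for s, d in sorted(difference.items()) if d > 0]

# Parameter changes discussed in the text, as changes in E_N - E_S.
change_s1 = round(abs(difference[6] - difference[4]), 2)          # sets 4 vs. 6
change_s2 = round(abs(difference[7] - difference[6]), 2)          # sets 6 vs. 7
change_s1_and_s2 = round(abs(difference[8] - difference[5]), 2)   # sets 5 vs. 8
```

The differences reproduce the minimization column of Table IV, and only the sets that include the gauche term (3, 5 and 8) place the South minimum below the North one.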

Puckered Geometry. The bond angle values obtained from energy minimization are in very good agreement with available experimental data for all energy sets tested: the endocyclic bond angles are mostly within 1° of observed values. It should be noted, however, that geometric data available for deoxyribose come only from crystal structures of nucleosides and nucleotides. In isolated form, the 2'-deoxyribose molecule is unstable: the furanose ring tends to adopt a pyranose, or 6-membered ring, form.67

The five-membered ring puckers from a planar conformation in order to minimize the nonbonded energy. The nonbonded energy is greatest in the planar ring, when all endocyclic substituents are eclipsed. Any puckered form can reduce this strain, but the best arrangement is obtained when C2' or C3' puckers out of the plane of the other 4 atoms on the same side as C5'. No constituents remain eclipsed at that position. Since rotational barriers about C-O bonds are lower than those about C-C bonds, the torsional energy is lower when endocyclic dihedral angles about C-O bonds, rather than about C-C bonds, are 0° (atoms are coplanar). Thus, the two conformations associated with $\tau_0$ = 0° and $\tau_4$ = 0° with all other rotations ($\tau_1$, $\tau_2$ and $\tau_3$) maximally eclipsed are exactly the C3'-endo and C2'-endo sugar puckering forms.

As the ring deviates from planarity, endocyclic bond angles must necessarily deviate from planar ring angles of 108° and from tetrahedral angles of 109.47°, which are inconsistent with ring closure.10 The C-O-C bond angle remains very close to tetrahedral since the ring oxygen is farthest from the puckered C2' or C3' atoms. Furthermore, the oxygen atom has no exocyclic substituents. Bond angles at the neighboring C1' and C4' atoms are reduced from the tetrahedral value by about 2° to 5°, while the bond angles at C2' and C3' are reduced the most, by 6° to 8°. The bond angle at the puckered atom (C2' or C3') is always the lowest among all endocyclic angles.

All these trends are evident in the energy-minimized structures. It is interesting that the observed trend in the endocyclic bond angle values

C-O-C > C-C-O > C-C-C

is accurately reproduced as a result of energy minimization and not as a consequence of different bending force constants for these angles: all endocyclic bond angles were assigned a stiffness constant of 60.0 kcal/mol.

With regard to bond lengths, the large bond stretching constant S1 enforces values very close to the observed equilibrium bond lengths. Nevertheless, in most computed geometries the O1'-C4' bond is slightly shorter than the O1'-C1' bond (by about 0.004 Å), a difference clearly smaller than, but indicative of, the trend observed in nucleoside crystal structures.17 Bond C2'-C3' is also slightly shorter than other C-C bonds, and computed C-H bond lengths are usually somewhat stretched from 1.0 Å.

From the sum of squares of deviations between the generated endocyclic dihedral angles and those obtained from eq. (2) (see last row of Table III), we see a range from 0.7 to 60.0. For all but 2 of the 16 structures analyzed, the sum is not large. The sum, however, is significant for the North minimum of set 2 and the South minimum of set 6: 36.0 and 60.0, respectively. In these cases, $A_0$ and $A_1$, the magnitudes of both neglected Fourier coefficients $a_0$ and $a_1$, are large. Thus, a good fit of the pseudorotation approximation relies on $A_0$ and $A_1$ having negligible values. From our analysis, these magnitudes are considerably smaller than that of $A_2$, but are not negligible. Thus, although in general the pseudorotation description produces a good approximation for P and $\tau_{\max}$, it does not always produce a good approximation to the individual dihedral angles.

Mathematically, an explanation for the small magnitudes of the neglected lower order Fourier coefficients can be provided by considering the original Cremer and Pople formulation.14 Their construction guarantees that atomic displacements of the 5 ring atoms from a chosen reference plane are given exactly by a truncated Fourier series as in eq. (1), since this plane is chosen to guarantee that $A_0 = A_1 = 0$.

In view of equation set (11), Cremer and Pople have oriented the plane z = 0 so that the ring coordinates $x_j$ are translated to $x_j'$ ($j = 0, \ldots, 4$), whose z-coordinates ($x'_{j,3}$) satisfy the conditions:

$$\sum_{j=0}^{4} x'_{j,3} = 0, \qquad \sum_{j=0}^{4} x'_{j,3} \cos\frac{2\pi j}{5} = 0, \qquad \sum_{j=0}^{4} x'_{j,3} \sin\frac{2\pi j}{5} = 0. \tag{13}$$

Physically, these conditions imply that the origin of the mean plane is the center of mass, and, in the case of small puckering amplitude for a regular pentagon, that angular momentum is conserved.14

The new coordinates $x_j'$ are defined relative to a new origin so that eq. (11a) is satisfied:

$$x_j' = x_j - \frac{1}{5}\sum_{k=0}^{4} x_k, \qquad j = 0, \ldots, 4.$$

Then, the coordinate z-axis is defined in the direction of the unit normal $n$:

$$x'_{j,3} = x_j' \cdot n, \qquad j = 0, \ldots, 4,$$

where

$$n = \frac{t_1 \times t_2}{\|t_1 \times t_2\|}, \qquad t_1 = \sum_{j=0}^{4} x_j' \sin\frac{2\pi j}{5}, \qquad t_2 = \sum_{j=0}^{4} x_j' \cos\frac{2\pi j}{5}.$$

By construction, $t_1$ and $t_2$ are perpendicular to $n$. Thus, the vanishing inner products of $t_1$ and $t_2$ with $n$ guarantee that the second and third conditions of eq. (13) hold also. This derivation can explain why the truncated Fourier series in eq. (2) provides a reasonable approximation: Dunitz has shown that eq. (2) follows from eq. (1) for a regular pentagon with infinitesimal deviations from a planar conformation.10
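The mean plane construction can be illustrated with a short Python sketch. The function name and the ring coordinates are hypothetical (an irregular, puckered five-membered ring invented for the example); the sketch simply follows the definitions of $x_j'$, $t_1$, $t_2$ and $n$ given above and then evaluates the three sums of eq. (13).

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def mean_plane_displacements(ring):
    """Return (z, n): displacements from the mean plane and its unit normal."""
    count = len(ring)
    centroid = [sum(atom[c] for atom in ring) / count for c in range(3)]
    shifted = [[atom[c] - centroid[c] for c in range(3)] for atom in ring]
    t1 = [sum(x[c] * math.sin(2.0 * math.pi * j / count) for j, x in enumerate(shifted))
          for c in range(3)]
    t2 = [sum(x[c] * math.cos(2.0 * math.pi * j / count) for j, x in enumerate(shifted))
          for c in range(3)]
    normal = cross(t1, t2)
    length = math.sqrt(dot(normal, normal))
    normal = [v / length for v in normal]
    z = [dot(x, normal) for x in shifted]
    return z, normal

# Hypothetical irregular five-membered ring (arbitrary units).
ring = [(1.20, 0.00, 0.10), (0.35, 1.15, -0.30), (-0.95, 0.70, 0.40),
        (-1.00, -0.72, -0.20), (0.38, -1.10, 0.05)]
z, normal = mean_plane_displacements(ring)

# The three sums of eq. (13); each should vanish to rounding error.
conditions = [
    sum(z),
    sum(zj * math.cos(2.0 * math.pi * j / 5) for j, zj in enumerate(z)),
    sum(zj * math.sin(2.0 * math.pi * j / 5) for j, zj in enumerate(z)),
]
```

Because the conditions hold by construction for any nondegenerate ring, they carry no information about the regularity of the pentagon, which is consistent with the caveats about irregular rings and finite puckering amplitudes.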

Differences between the Cremer and Pople description and the Altona and Sundaralingam description will occur for the furanose ring, since Dunitz' derivation is not valid for finite puckering amplitudes and irregular pentagons. Furthermore, the rather ingenious mean plane construction of Cremer and Pople has a subtle flaw: this plane does not remain constant as the ring follows the pseudorotation cycle. Consequently, even for a regular pentagon, any deviations from an initially chosen plane necessarily lead to approximations in eq. (1), and in turn, to approximations in eq. (2).

C. Results from Energy vs. P Curves

Structures. The two local minima obtained in the Energy vs. P curves for all parameter sets occur at an average value of P = 14° for the North and P = 172° for the South, not as close to the ideal values of 18° and 162° as those predicted by energy minimization. All East maxima, corresponding to the O1'-endo pucker, occur near P = 90°, and all West maxima, corresponding to the O1'-exo pucker, occur near P = 276°. The nonbonded energies are high at these conformations as a result of the eclipsed C2' and C3' substituents in the East and the equatorial orientation of the base with the exocyclic C5' substituents in the West. Since conformations were rigid for a given value of P, barrier heights are exaggerated. Despite this procedure, the average barrier for interconversion between the C2'-endo and C3'-endo conformations through the East region of the pseudorotation path (3.2 kcal/mol) falls in the 2-5 kcal/mol range predicted previously and confirmed by theoretical studies. This barrier is low enough to suggest rapid interconversion between the two preferred sugar conformations but high enough to imply that pseudorotation is hindered. The barrier observed at the West is significantly higher and suggests that conformational interchanges through the unfavorable O1'-exo conformation are prohibited.

Energies. An examination of the energy component curves can explain the observed trends in the relative energy changes for the 8 different model sets studied (see Fig. 7). Clearly, the overall shape of the total energy curve is dominated by that of the nonbonded component. However, if only the nonbonded interactions are considered, the energetically equal C2'-endo and C3'-endo conformations are separated by a very low East barrier, which consequently suggests that pseudorotation is free. Only the bond strain and threefold torsional potential can effectively raise the East and West barrier heights.

Both the bond length and bond angle energies exhibit sinusoidal dependencies, as expected from the empirical functions used to describe the variations of these geometric quantities with P.11 As noted previously, the threefold torsional potential constitutes the largest contribution in magnitude to the total energy along the pseudorotation pathway. The sinusoidal torsional potential exhibits 2 local minima at 0° and 180°, corresponding to the symmetrical twist forms of C2'-endo-C3'-exo and C3'-endo-C2'-exo (see Fig. 3).

The gauche potential contributes 0.95 kcal/mol at P = 0°, increases slowly until 27°, then falls to 0.1 kcal/mol at 164°, and rises back to 0.95 kcal/mol to complete one pseudorotation cycle. Thus, the energy difference of the gauche contribution between the North and South conformers is essentially the same as that observed by minimization, about 0.85 kcal/mol.

The effects on the energy differences between the two local minima can now be understood. From Table IV it is evident that the energy difference between the C2'-endo and C3'-endo structures is affected by several changes in the energy function form and associated parameters. Introduction of a gauche potential, the use of a weighted threefold torsional potential, or a lowering of the bond stretching constant can effectively lower the energy of the C2'-endo conformation in relation to C3'-endo.

Since the bond energy is sinusoidal and non-negative with a minimum of 0.0 kcal/mol at P = 17°, a lowering of the stretching parameter S1 hardly affects the C3'-endo energy. The function's sinusoidal shape is damped, thereby lowering the energy for C2'-endo. This explains the energy shift for sets 6, 7 and 8 of the E vs. P study.

By modeling all dihedral angles and appropriately adjusting the torsional parameters V3, the torsional curve is essentially translated downward by about 1.2 kcal/mol. Fluctuations in this shift are simply due to differences in the overall dihedral angle contribution for various P values. Since the North minimum is lowered by 1.08 kcal/mol and the South minimum by 1.27 kcal/mol, the net effect of this torsional potential modification is to slightly lower the energy at the South. The observed translation of the E_3FOLD curve indicates that equal treatment of all heavy atom rotations can be achieved correctly by this weighting procedure.

Finally, we can explain the observation that changing the bending constant S2 has no effect on the energy difference. By increasing S2 from 15.0 to 60.0 kcal/mol for N-C-C, N-C-O, and all bond angles involving hydrogens, the bond angle energy is increased uniformly by about 0.4 kcal/mol. Thus, the energy difference between the two preferred conformations is left unchanged.

D. Comparison of the Two Approaches

The energy differences between the C3'-endo and C2'-endo conformations are not identical for the energy minimization and E vs. P studies, but they exhibit the same relative energy difference trends (see Table IV). For both studies, the gauche potential can be used to shift the global minimum from the C3'-endo to the C2'-endo structure by 0.85 kcal/mol. A weighted threefold torsional potential for all heavy atom dihedral angles can also shift the global minimum to the South, by about 0.2 kcal/mol. However, since the energy differences are larger for the E vs. P curves than for the minimization, the energy shift induced by the weighted threefold term alone makes the two minima energetically equal in the E vs. P curves.

In contrast to the minimization study, changing the bond stiffness constant S1 affects the energy difference between the two local minima in the E vs. P curves. As we see in sets 4 and 6 in Table IV, lowering S1 from 100.0 to 25.0 kcal/mol Å⁴ hardly changes the energy difference for the minimization-generated structures, but it effectively raises the energy difference by about 0.3 kcal/mol for the E vs. P results. Similarly, the negligible change in relative energy observed between minimization sets 5 and 8 is contrasted by an increase of about 0.2 kcal/mol in favor of the South minimum in the E vs. P results.

Thus, these observations demonstrate that in general the Energy vs. P approach is more susceptible to fine-tuning and manipulation for reproducing the desired quantitative energetic and structural data. Any geometric differences between the structures along the pseudorotation cycle can be exploited by appropriate choices of energy components and parameters. This would also be the case with minimized E vs. P curves, since geometry is generated by constraints. In energy minimization with the full set of degrees of freedom, the molecule can effectively perform "internal adjustments" to accommodate the changes in energy parameters.

In summary, results from minimization and E vs. P profiles indicate that the use of a gauche potential and a weighted threefold torsional potential are appropriate for reproducing an energetically more favorable state for deoxyribose at the South region of the pseudorotation cycle and for modeling systematically all heavy atom dihedral rotations. Thus, the stiffness constants and modeling strategy for internal rotations considered in sets 5 or 8 produce deoxyribose conformations that are consistent with structural, energetic and geometric data observed in crystal structures.

V. CONCLUSIONS

Previous discrepancies with experiment for the furanose ring conformation have prompted our present investigation. These discrepancies have revealed some underlying uncertainties of potential energy calculations. Thus, we have chosen to examine the fundamental issues of a computational approach (degrees of freedom, potential energy parameterization, and minimization) for the deoxyribose model. The detailed examination of structural, energetic and geometric results for various parameter assignments by minimization and E vs. P curves has allowed us to evaluate the pseudorotation approximation and the influence of various stiffness constants and modeling strategies.

For any computational approach, a reduction of the full set of 3N-6 degrees of freedom associated with a molecule of N atoms involves a trade-off between accuracy and reliability of potential energy studies on one hand and computational simplicity on the other. For nucleic acid sugars, a reduction accomplished by the pseudorotation approximation has apparently provided a good qualitative description of various puckering modes. However, this simplified analytic formulation has undeniable drawbacks.

First, fluctuations in puckering amplitudes have been theoretically and experimentally observed. Second, the analytic development of pseudorotation is based on statistical analyses of sugar fragments in nucleic acids; various averaging techniques are used to fit structural data (bond lengths, bond angles and dihedral angles) of many compounds to analytic expressions with the same coefficients. Third, extension of the pseudorotation approximation to large molecular systems is difficult and compounds the approximations involved. The inclusion of P and possibly $\tau_{\max}$ as independent variables of the energy function requires derivatives to be computed with respect to P (and $\tau_{\max}$ if it is variable) in each minimization step. If the conformational energy is described in Cartesian coordinate space, this implies a complicated use of the chain rule. Alternatively, if an internal coordinate system of dihedral angles is chosen, sets of P and $\tau_{\max}$ must be used as conformational variables in some coordinate-generation procedure for all the atoms in the molecule. For all these reasons, the sacrifice of 3 degrees of freedom (for the 5 dihedral angles) may not be justified. If analysis of structural data for various sugar fragments is desired, the exact Fourier series in eq. (11) can provide a simple but more reliable scheme.

With regard to the potential energy construction and energy minimization, the overall structural, energetic, and geometric agreement of our minimized structures with experimental data suggests that the potential energy function is well constructed and that the Newton methods are powerful minimization techniques. By a modeling strategy that involves a simple modification to the usual harmonic bond length and bond angle potentials, we construct expressions that are simpler and hence faster to evaluate. To eliminate the arbitrariness in modeling dihedral angles, we have implemented equal treatment of all heavy rotations in the molecule and have demonstrated, using the E vs. P curves, that such a treatment is indeed valid. In combination with polynomial representations of the bond angle cosine and dihedral angle cosine expressions and with group treatment of bond stretching, angle bending, and dihedral angle parameters, we avoid generation of unrealistic geometry and make applications of efficient second-derivative minimization strategies feasible. We are currently examining these modeling and minimization issues for larger DNA segments.
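The idea of modifying the harmonic bond length and bond angle terms so that they are cheap to evaluate in Cartesian coordinates can be sketched as follows. The sketch assumes one possible modification, a quartic bond term S(r² - r0²)² and a cosine-based angle term K(cos θ - cos θ0)²; these forms, the function names, and all numerical values are illustrative assumptions and are not the parameters of this paper.

```python
import math


def bond_energy(ri, rj, r0, s):
    """Assumed quartic bond term s * (r^2 - r0^2)^2; a polynomial in the coordinates."""
    r2 = sum((a - b) ** 2 for a, b in zip(ri, rj))
    return s * (r2 - r0 * r0) ** 2


def angle_energy(ri, rj, rk, theta0_deg, k_theta):
    """Assumed cosine angle term k_theta * (cos(theta) - cos(theta0))^2, vertex at atom j."""
    u = [a - b for a, b in zip(ri, rj)]
    v = [a - b for a, b in zip(rk, rj)]
    dot = sum(a * b for a, b in zip(u, v))
    # cos(theta) comes from a dot product; no inverse trigonometric call is needed.
    cos_theta = dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return k_theta * (cos_theta - math.cos(math.radians(theta0_deg))) ** 2


# Hypothetical three-atom fragment (coordinates in angstroms, arbitrary constants).
atoms = [(1.5, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
e_bond = bond_energy(atoms[0], atoms[1], r0=1.5, s=100.0)
e_angle = angle_energy(atoms[0], atoms[1], atoms[2], theta0_deg=109.5, k_theta=50.0)
```

Because the bond term avoids the square root and the angle term avoids an arccosine, first and second derivatives with respect to the Cartesian coordinates stay algebraically simple, which is what makes second-derivative (Newton-type) minimization practical.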

This work represents a collaborative effort between Courant Institute of Mathematical Sciences and the Biology Department at New York University.

T. Schlick was supported by a research fellowship from the Courant Institute and the Biology Department at New York University, and by a Dean's Dissertation Fellowship awarded by New York University.

C. Peskin is a MacArthur fellow.

M. Overton was supported by NSF grant DCR-85-02014.

S. Broyde was supported by PHS Grant 1 R01 CA 28038-06, National Cancer Institute, DHHS, and DOE Contract DE-AC02-81ER60015.

Computation was supported by DOE under contract DE-AC02-76ER03077.

References

1. H. F. Schaefer, Science, 231, 1100-1107 (1986).

2. S. Stellman, B. Hingerty, S. Broyde, E. Subramanian, T. Sato, and R. Langridge, Biopolymers, 12, 2731-2750 (1973).

3. J. Kozelka, G. Petsko, S. Lippard, and G. Quigley, J. Amer. Chem. Soc., 107, 4079-4081 (1985).

4. S. Sherman, D. Gibson, A. Wang, and S. Lippard, Science, 230, 412-417 (1985).

5. C. Altona and M. Sundaralingam, J. Amer. Chem. Soc., 94, 8205-8212 (1972).

6. P. Murray-Rust and S. Motherwell, Acta Cryst., B34, 2534-2546 (1978).

7. E. Westhof and M. Sundaralingam, J. Amer. Chem. Soc., 102, 1493-1500 (1980).

8. H. P. M. de Leeuw, C. A. G. Haasnoot, and C. Altona, Israel J. Chem., 20, 109-126 (1980).

9. W. K. Olson and J. L. Sussman, J. Amer. Chem. Soc., 104, 270-278 (1982).

10. J. B. Dunitz, Tetrahedron, 28, 5459-5467 (1972).

11. J. E. Kilpatrick, K. S. Pitzer, and R. Spitzer, J. Amer. Chem. Soc., 69, 2483-2488 (1947).

12. K. S. Pitzer and W. E. Donath, J. Amer. Chem. Soc., 81, 3213-3218 (1959).

13. J. B. Hendrickson, J. Amer. Chem. Soc., 83, 4537-4547 (1961).

14. D. Cremer and J. A. Pople, J. Amer. Chem. Soc., 97, 1354-1358 (1975).

15. T. Sato, Nuc. Acids Res., 11, 4933-4938 (1983).

16. E. A. Merritt and M. Sundaralingam, J. Biomol. Struc. Dyn., 3, 559-578 (1985).

17. W. K. Olson, J. Amer. Chem. Soc., 104, 278-286 (1982).

18. S. J. Weiner, P. A. Kollman, D. T. Nguyen, and D. A. Case, J. Comp. Chem., 7, 230-252 (1986).

19. S. C. Harvey and M. Prabhakaran, J. Amer. Chem. Soc., 108, 6128-6135 (1986).

20. L. Nilsson and M. Karplus, J. Comp. Chem., 7, 591-616 (1986).

21. D. A. Pearlman and S.-H. Kim, J. Biomol. Struc. Dyn., 3, 85-98 (1985).

22. S. Arnott and D. W. L. Hukins, Biochem. Biophys. Res. Comm., 47, 1504-1510 (1972).

23. A. Jack, J. E. Ladner, and A. Klug, J. Mol. Biol., 108, 619-649 (1976).

24. The sequence $\{x_0, x_1, x_2, \ldots\}$ is said to converge to $x^*$ if $\lim_{k \to \infty} \|x_k - x^*\| = 0$ ($\|\cdot\|$ denotes the standard Euclidean distance norm). A method is said to be globally convergent when convergence to a local minimum is guaranteed from an arbitrary starting point $x_0$. A method is quadratically convergent when $\lim_{k \to \infty} \|x_{k+1} - x^*\| / \|x_k - x^*\|^2 = \beta$ for a finite $\beta \geq 0$.

25. P. E. Gill and W. Murray, Math. Prog., 7, 311-350 (1974).

26. NAG Fortran Library (Mark 11), Routine E04LBF, Numerical Algorithms Group Inc., Illinois (1986).

27. The terms dihedral angle and torsion angle are often used interchangeably.

28. L. D. Hall, P. R. Steiner, and C. Pederson, Can. J. Chem., 48, 1155-1165 (1970).

29. C. Altona, H. J. Geise, and C. Romers, Tetrahedron, 24, 13-32 (1968).

30. UNIX MACSYMA, Release 309.1, Massachusetts Institute of Technology, Symbolics, Inc. (1984).

31. R. M. Pitzer, Acc. Chem. Res., 16, 207-210 (1983).

32. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus, J. Comp. Chem., 4, 187-217 (1983).

33. H. Sklenar, R. Lavery, and B. Pullman, J. Biomol. Struc. Dyn., 3, 967-987 (1986).

34. M. Levitt and A. Warshel, J. Amer. Chem. Soc., 100, 2607-2612 (1978).

35. S. Lifson and A. Warshel, J. Chem. Phys., 49, 5116-5129 (1968).

36. M. Levitt, J. Mol. Biol., 168, 595-620 (1983).

37. C.-S. Tung, S. C. Harvey, and J. A. McCammon, Biopolymers, 23, 2173-2193 (1984).

38. B. Lesyng and W. Saenger, Carbo. Res., 133, 187-197 (1984).

39. P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization, Academic Press, New York (1981).

40. J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, New Jersey (1983).

41. D. G. Luenberger, Linear and Nonlinear Programming, second edition, Addison-Wesley, Reading, MA (1984).

42. G. Dahlquist and A. Bjorck, Numerical Methods, Prentice-Hall, Englewood Cliffs, New Jersey (1974).

43. J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag, New York (1980).

44. E. Isaacson and H. B. Keller, Analysis of Numerical Methods, John Wiley & Sons, New York (1966).

45. A. V. Lakshminarayanan and V. Sasisekharan, Biopolymers, 8, 475-488 (1969).

46. A. T. Hagler, E. Huler, and S. Lifson, J. Amer. Chem. Soc., 96, 5319-5327 (1974).

47. F. A. Momany, R. F. McGuire, A. W. Burgess, and H. A. Scheraga, J. Phys. Chem., 79, 2361-2381 (1975).

48. G. Nemethy, M. S. Pottle, and H. A. Scheraga, J. Phys. Chem., 87, 1883-1887 (1983).

49. L. Pauling, The Nature of the Chemical Bond, Cornell University Press, Ithaca, New York, third edition, p. 130 (1960).

50. D. A. Brant, W. G. Miller, and P. J. Flory, J. Mol. Biol., 23, 47-65 (1967).

51. N. L. Allinger, Adv. Phys. Org. Chem., 13, 1-85 (1976).

52. W. K. Olson and P. J. Flory, Biopolymers, 11, 25-56 (1972).

53. J. Ketelaar, Chemical Constitution, Elsevier Publishing Company, New York (1958).

54. A. Srinivasan and W. K. Olson, Fed. Amer. Soc. Exp. Bio., 39, 2199 (1980).

55. E. R. Taylor and W. K. Olson, Biopolymers, 22, 2667-2702 (1983).

56. B. E. Hingerty, R. H. Ritchie, T. L. Ferrell, and J. E. Turner, Biopolymers, 24, 427-439 (1985).

57. D. Beveridge, in Computer Simulation of Chemical and Biomolecular Systems, Annals of the New York Academy of Sciences, 482, 1-23 (1985).

58. U. C. Singh and P. A. Kollman, J. Comp. Chem., 5, 129-145 (1984).

59. D. A. Pearlman and S.-H. Kim, Biopolymers, 24, 327-357 (1985).

60. M. Eisenstein and Z. Shakked, Fourth Conversation in Biomol. Stereodyn., Albany, New York, June (1985).

61. L. Pauling, Proc. Natl. Acad. Sci., 44, 211-216 (1958).

62. R. A. Scott and H. A. Scheraga, J. Chem. Phys., 42, 2209-2215 (1965).

63. R. A. Scott and H. A. Scheraga, J. Chem. Phys., 44, 3054-3069 (1966).

64. K. Umemoto and K. Ouchi, Proc. Indian Acad. Sci., 94, 1-119 (1985).

65. D. M. Hayes, P. A. Kollman, and S. Rothenberg, J. Amer. Chem. Soc., 99, 2150-2154 (1977).

66. T. Schlick, Ph.D. Thesis, Courant Institute, Dept. of Mathematics, New York University, October 1987.

67. S. Furberg, Acta Chem. Scand., 14, 1357-1363 (1960).

68. T. Schlick and M. Overton, J. Comp. Chem., 8, 1025-1039.