



Computer Physics Communications 128 (2000) 446–464. www.elsevier.nl/locate/cpc

Parallel Fourier Path-Integral Monte Carlo calculations of absolute free energies and chemical equilibria ✩

Jay Srinivasan, Yuri L. Volobuev, Steven L. Mielke, Donald G. Truhlar *

Department of Chemistry, Chemical Physics Program, and Supercomputer Institute, University of Minnesota, Minneapolis, MN 55455-0431, USA

Abstract

We present a parallel implementation of the Fourier Path Integral Monte Carlo method for calculating the absolute free energies of many-body systems. The implementation adopts the message-passing paradigm for parallelization, with the use of the Message Passing Interface (MPI) libraries. A portable computer program, written using Fortran 90, has been developed and tested on a variety of platforms such as the SGI Origin, the IBM SP, and the Cray T3D and T3E. We have used the program to demonstrate the efficacy of importance sampling in configuration space. We have also used the program to calculate the partition function, and hence the absolute free energies, of triatomic molecules and four-body systems. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Adaptively optimized stratified sampling; Feynman path integral; Free energy; Importance sampling; Message passing; Molecular vibrations and rotations; Partition function

1. Introduction

In statistical mechanics the internal partition function of a molecule is usually defined as

Q_{\mathrm{int}}(T) = \sum_i d_i \exp(-E_i/k_B T), \qquad (1)

where the E_i are the eigenvalues of the system, k_B is Boltzmann's constant, T is the absolute temperature, and d_i is the degeneracy of eigenvalue i. The partition function directly yields the absolute free energy, and its derivatives yield other thermodynamic observables. For harmonic systems and diatomic molecules, Eq. (1) provides a convenient way to calculate the partition function. For larger systems the direct diagonalization of the molecular Hamiltonian to compute all the required eigenvalues is a very demanding computational task, but such calculations have recently been reported for simple triatomic systems [1], and progress is being made toward similar calculations on four-body systems as well [2]. However, for high-temperature calculations on small systems and calculations at any temperature on medium-sized and large systems, it is worthwhile to explore different approaches.
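As a minimal illustration of Eq. (1), the following Python sketch evaluates the Boltzmann-weighted sum for a hypothetical set of energy levels and degeneracies (the level data are placeholders, not results from the paper); Boltzmann's constant is taken in hartree/K to match the atomic units used later in the paper.

    import numpy as np

    K_B_HARTREE = 3.166811563e-6  # Boltzmann's constant in hartree/K

    def partition_function(energies, degeneracies, T):
        """Evaluate Eq. (1): Q_int(T) = sum_i d_i exp(-E_i / k_B T)."""
        beta = 1.0 / (K_B_HARTREE * T)
        e = np.asarray(energies, dtype=float)       # eigenvalues E_i (hartree)
        d = np.asarray(degeneracies, dtype=float)   # degeneracies d_i
        return float(np.sum(d * np.exp(-beta * e)))

    # Example with made-up levels: a ground state and a doubly degenerate excited state.
    print(partition_function([0.0, 0.01], [1, 2], 1000.0))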

✩ This paper is published as part of a thematic issue on Parallel Computing in Chemical Physics.
* Corresponding author. E-mail: [email protected].

0010-4655/00/$ – see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0010-4655(00)00052-7


One very appealing scheme for calculating Q_int or the free energy is to directly calculate the trace of the density operator, as this circumvents the need to perform expensive diagonalizations. One algorithm based on this approach is the Fourier Path Integral (FPI) approach introduced by Feynman [3]. Monte Carlo implementations of this approach, which we will denote as FPIMC, have been presented for a variety of problems [4–17]. In particular, our research group has presented [13–15] algorithms for using the FPIMC method to calculate the vibrational-rotational partition function of diatomic and triatomic systems such as HCl and H2O. Using these partition functions we calculated well converged, absolute free energies of triatomic species over the temperature range from 600 to 4000 K [13]. A vector computer implementation of the method has been used for triatomics [15], and some later improvements have been discussed by Topper [17]. We have now made more improvements in the method, and the algorithm has been implemented in a multiprocessor parallel computer code that can calculate the partition function for many-body systems. In the following sections, we first present the essentials of the theory of the FPIMC method, then the implementation of the method and the parallelization strategy and, finally, some results relating to the performance of the code. We illustrate the improved computer code by the calculation of the partition function for three-body systems at lower temperatures (300–500 K) and also by converged calculations on four-body systems.

Although there has been considerable progress in calculating relative free energies, potentials of mean force, anddistribution functions in recent years [18–22], the subject of absolute free energies, which is treated here, has beenmuch less widely studied.

2. Theoretical background

The FPIMC method utilizes the Monte Carlo algorithm for the evaluation of a several-thousand-dimensional integral during the course of the calculation of the partition function for a few-body system. The method has been introduced earlier [13], so in the present paper we limit ourselves only to the essentials of the method, which we present in notation appropriate for an electronic ground-state, gas-phase molecule or cluster with a general number of atoms; this recapitulation of the standard theory is limited to the background necessary to understand the new improvements. We refer the reader to previous articles [13,17] for theoretical background.

In the FPI method [3–17] we consider closed paths whose coordinates are given by

\bar{z}_j(s) = z_j + \sum_{l=1}^{\infty} a_{j,l} \sin\!\left(\frac{l\pi s}{\beta\hbar}\right), \qquad (2)

where s is a time parameter, z_j is a rectilinear coordinate (element j of the N-dimensional vector z), β is 1/k_B T, ℏ is Planck's constant divided by 2π, and a_{j,l} are Fourier coefficients describing the closed paths (i.e. those that start and end at the same point, z, in configuration space) in a Fourier expansion. The system is described in a center-of-mass coordinate system. Then, the internal partition function for a system with N internal degrees of freedom and symmetry number σ is given by [6,13]

Q_{\mathrm{int}}(T) = \frac{J(T)}{\sigma} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left( \prod_{j=1}^{N} dz_j \right) \left( \prod_{j=1}^{N} \prod_{l=1}^{\infty} da_{j,l} \right) \exp\!\left[ -\sum_{j=1}^{N} \sum_{l=1}^{\infty} \frac{a_{j,l}^2}{2\sigma_{j,l}^2} - S(z,a) \right], \qquad (3)

where

J(T) = \prod_{j=1}^{N} \left( \sqrt{\frac{\mu_j}{2\pi\beta\hbar^2}} \; \prod_{l=1}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_{j,l}} \right), \qquad (4)

and

\sigma_{j,l}^2 = \frac{2\beta\hbar^2}{\mu_j \pi^2 l^2}. \qquad (5)


The variables σ_{j,l} are temperature-dependent parameters (not to be confused with the symmetry number σ) that are functions of the reduced masses μ_j associated with the N internal degrees of freedom given by z_j. The quantity S(z,a) is the action integral corresponding to the path \bar{z} that starts and ends at the same point in the configuration space after the time interval of βℏ:

S(z,a) = \frac{1}{\hbar} \int_0^{\beta\hbar} ds\, V\!\left[\bar{z}(s)\right], \qquad (6)

where V is the vibrational potential energy. We choose our zero of energy so that V is positive semidefinite and is zero at the classical equilibrium structure. The vector z is N-dimensional (where N = 3N_A − 3, and N_A is the number of atoms), and the vector a is infinite-dimensional.

Eq. (3) would be an exact representation for the vibrational-rotational partition function except for two minor complications. The first complication is that we have neglected nuclear spin statistics. For low temperatures and light-mass systems such as H2, these effects are important, but for typical problems these effects are insignificant. The second complication results from contributions to the partition function from dissociated species. These contributions are formally divergent as they scale linearly with the volume of the system. Without an energy cutoff, the FPIMC approach includes contributions from energies above the dissociation limit. For weakly bound systems, effects arising from dissociative contributions can be a significant concern, but for the systems and temperatures we consider in the current work these effects are quantitatively small. Further discussion of the dissociation issue is provided elsewhere [23,24].

Eq. (3) is an infinite-dimensional integral, ranging over an infinite domain. If we convert the integral in Eq. (3) to an average and employ Monte Carlo methods, we can reduce the evaluation of the integral to the problem of sampling a region of space and to summing the contribution from each sample. To do this [13,14], we first restrict the N-dimensional configuration space to a finite domain D of volume V_D. Further, the number of Fourier coefficients in each Fourier expansion is restricted to a finite value K, yielding an approximate value Q^[K]_int(T) for the internal partition function, where the [K] superscripts indicate that we are using a finite number of Fourier coefficients. Then, by multiplying and dividing Eq. (3) by the K-coefficient approximation to the contribution Q^[K]_fp(T) to a free particle partition function from the same N-dimensional space, we obtain a form which is an average, and which can be written as [13]

Q^{[K]}_{\mathrm{int}}(T) = \frac{Q^{[K]}_{\mathrm{fp}}(T)}{\sigma} \; \frac{\int_D dz \int_{-\infty}^{\infty} da \, \exp\!\left[-\sum_{j=1}^{N}\sum_{l=1}^{K} a_{j,l}^2/2\sigma_{j,l}^2\right] \exp\!\left[-S(z,a)\right]}{V_D \int_{-\infty}^{\infty} da \, \exp\!\left[-\sum_{j=1}^{N}\sum_{l=1}^{K} a_{j,l}^2/2\sigma_{j,l}^2\right]}. \qquad (7)

In practice, Q^[K]_fp(T) is easily evaluated analytically, and one method of evaluating it is given in the Appendix of [13]:

Q^{[K]}_{\mathrm{fp}}(T) = V_D \prod_{j=1}^{N} \left( \frac{\mu_j}{2\pi\beta\hbar^2} \right)^{1/2}. \qquad (8)

3. Implementation

3.1. Monte Carlo formulation

We use a rectilinear coordinate system, which is a coordinate system in which the kinetic energy may be written as

T = \frac{1}{2} \sum_{j=1}^{N} \frac{p_j^2}{\mu_j}, \qquad (9)


Fig. 1. The coordinate system for a four-body system showing the Jacobi coordinates S. Eq. (10) in the text gives the relation between the Jacobi coordinates and the mass-scaled Jacobi coordinates used in the calculation of the partition function.

where p_j is the momentum conjugate to z_j, and μ_j is the corresponding reduced mass. For diatomic systems N is 3, for triatomics it is 6, and for four-body systems it is 9. We fix the origin at the center of mass of the system and use mass-scaled generalized Jacobi coordinates x_j given by

x_j = \sqrt{\mu_j/\mu}\, S_j, \quad j = 1, \ldots, N, \qquad (10)

where S_j is a generalized Jacobi coordinate, and μ the scaling mass. The μ_j are reduced masses given by m_j M_j/(m_j + M_j), where m_j and M_j are the masses of the fragments connected by the Jacobi vector having component S_j. For diatomics, the Jacobi coordinates are the components of the internuclear vector. Fig. 1 of Ref. [15] shows the Jacobi coordinates for a triatomic system, and Fig. 1 of the present paper shows the Jacobi coordinates that we use here for a 4-body system.

In subsequent sections it will be useful to consider hyperspheres in the N-dimensional space spanned by the mass-scaled S_j. The radius of such a hypersphere is the hyperradius of our system defined by

\rho = \sqrt{\sum_{j=1}^{N} x_j^2}. \qquad (11)

We now consider how to evaluate Eq. (7) using Monte Carlo methods. First we define a density function g(a) by

g(a) = \frac{\exp\!\left[-\sum_{j=1}^{N}\sum_{l=1}^{K} a_{j,l}^2/2\sigma_{j,l}^2\right]}{\int_{-\infty}^{\infty} da \, \exp\!\left[-\sum_{j=1}^{N}\sum_{l=1}^{K} a_{j,l}^2/2\sigma_{j,l}^2\right]}. \qquad (12)

We see that g(a), which is part of the integrand of Eq. (7), represents a Gaussian probability density function in Fourier coefficient (a_{j,l}) space. Thus, an efficient method of sampling the Fourier coefficient degrees of freedom is to use importance sampling [25,26] where the importance function is a Gaussian distribution in Fourier space. A good method of generating random samples with a Gaussian distribution is the Box–Muller algorithm [27,28].
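The following Python sketch shows one way this sampling step can be realized: the widths σ_{j,l} are tabulated from Eq. (5), and the coefficients a_{j,l} are drawn as zero-mean Gaussians via an explicit Box–Muller transform. It is an illustrative sketch (the array shapes and function names are our own), not the authors' Fortran 90 code.

    import numpy as np

    def fourier_sigmas(mu, beta, K, hbar=1.0):
        """Widths sigma_{j,l} from Eq. (5), returned as an (N, K) array."""
        l = np.arange(1, K + 1)                  # Fourier index l = 1, ..., K
        mu = np.asarray(mu, dtype=float)[:, None]
        return np.sqrt(2.0 * beta * hbar**2 / (mu * np.pi**2 * l**2))

    def sample_coefficients(sigma, rng):
        """Draw a_{j,l} ~ N(0, sigma_{j,l}^2) with the Box-Muller transform."""
        u1 = 1.0 - rng.random(sigma.shape)       # uniform in (0, 1], avoids log(0)
        u2 = rng.random(sigma.shape)
        gauss = np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)
        return sigma * gauss

For example, sample_coefficients(fourier_sigmas(mu, beta, K), np.random.default_rng()) produces one set of Fourier coefficients defining a single closed path.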


A comparably efficient algorithm for sampling the nuclear coordinates is stratified sampling [25]. We choose to use a variation of the adaptively optimized stratified sampling (AOSS) method which has been presented earlier [13–15]. (We present the new variations to the AOSS method below.) The one-dimensional integral in Eq. (6) is evaluated using a standard [28] Gauss–Legendre integration algorithm. We may write

\int_0^{\beta\hbar} ds\, V\!\left[\bar{x}(s)\right] = \sum_{j=1}^{N_Q} w_j V\!\left[\bar{x}(s_j)\right], \qquad (13)

where the w_j are the weights at the N_Q Gauss–Legendre nodes s_j. Thus all terms on the right-hand side of Eq. (7) are known or may be evaluated by quadrature, and we use it to calculate the internal partition function, which may then be used to compute the absolute internal free energy G_int [15], which is given by

G_{\mathrm{int}} = -RT \ln Q_{\mathrm{int}}, \qquad (14)

where R is the universal gas constant.

In the next section (Section 3.2) we will consider the implementation of the algorithm for a single sample point and discuss a number of significant enhancements we have made to the previous [13,15] algorithm. Section 3.3 will detail the stratification scheme that we employ, and Section 3.4 will discuss importance sampling of the configuration space in concert with stratification.
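As a small illustration of the quadrature in Eq. (13), the sketch below generates Gauss–Legendre nodes s_j and weights w_j mapped from [−1, 1] to [0, βℏ]; the per-sample algorithm of Section 3.2 then only needs the potential at the path geometries \bar{x}(s_j). This is a sketch under our own conventions (with ℏ = 1 in Hartree atomic units, as used by the program, the 1/ℏ prefactor of Eq. (6) is unity), not the authors' implementation.

    import numpy as np

    def gauss_legendre_path_nodes(NQ, beta, hbar=1.0):
        """Nodes s_j and weights w_j for Eq. (13) on the interval [0, beta*hbar]."""
        x, w = np.polynomial.legendre.leggauss(NQ)   # standard nodes/weights on [-1, 1]
        s = 0.5 * beta * hbar * (x + 1.0)            # imaginary-time points s_j
        return s, 0.5 * beta * hbar * w              # rescaled weights w_j

The action contribution of a path is then the weighted sum of the potential over these nodes, which is the quantity screened in Section 3.2.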

3.2. The algorithm for a given sample point

A substantial amount of computation is involved in calculating a single sample point, and we have made a number of significant modifications to the previous algorithm to achieve more efficient results. In this section we will consider in detail the work involved for a given sample. The algorithm proceeds through a number of phases:

(1) A candidate configuration space sample is generated that lies on the surface of a unit hypersphere, and the sample is scaled (as described below) so that it is uniformly distributed within the domain D of configuration space that we are interested in. If we wish to use importance sampling in configuration space (as explained in Section 3.4), the candidate sample is either accepted or rejected via the von Neumann scheme [29] so that it is distributed according to the desired importance function.

(2) A “relative path”, which we define as a path that is relative to a certain configuration space point, i.e. a fixed-shape, movable closed path, is formed via Box–Muller sampling of the Fourier space.

(3) An “absolute path” is formed by adding the configuration space sample to each of the quadrature points of the relative path.

(4) The potential function is evaluated at each of the quadrature points, and the action integral is formed.

(5) We calculate the contribution to the partition function from the sample point, and we repeat the procedure if more points are required.

Phase 1 requires an efficient method of sampling the unit hypersphere. In the algorithm used earlier, this was done via rejection techniques, but these become extremely inefficient as the number of dimensions of the hypersphere increases [13]. We now instead use the PC algorithm of Guralnik et al. [30] to generate samples on the surface of a unit hypersphere. This method involves no rejection sampling, and it scales linearly with the dimension of the hypersphere. We wish, however, to obtain samples in the required hyperannulus that comprises the domain of interest in configuration space, defined by its inner (ρ_min) and outer (ρ_max) hyperradii. To distribute each sample from the surface of the unit hypersphere uniformly in the hyperannular domain, we scale by the factor

\rho_{\mathrm{scale}} = \rho_{\max} \left( \left(\frac{\rho_{\min}}{\rho_{\max}}\right)^{N} + \left(1 - \left(\frac{\rho_{\min}}{\rho_{\max}}\right)^{N}\right) \xi \right)^{1/N}, \qquad (15)

where ξ is a random number that is uniformly distributed in (0,1).
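The sketch below illustrates this two-step construction: the direction is a uniformly distributed point on the unit hypersphere (obtained here by normalizing an N-dimensional Gaussian vector, a standard rejection-free construction used in place of the algorithm of Ref. [30], which we do not reproduce), and the radius is drawn according to Eq. (15) so that the point is uniform in the hyperannulus.

    import numpy as np

    def sample_hyperannulus(N, rho_min, rho_max, rng):
        """Uniform sample in the hyperannulus rho_min <= rho <= rho_max (N dimensions)."""
        u = rng.standard_normal(N)
        u /= np.linalg.norm(u)                   # uniform direction on the unit hypersphere
        xi = rng.random()                        # uniform random number
        ratio = (rho_min / rho_max) ** N
        rho = rho_max * (ratio + (1.0 - ratio) * xi) ** (1.0 / N)   # Eq. (15)
        return rho * u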


Phase 2 involves the calculation of an (N × K)-dimensional matrix of random numbers with a Gaussian distribution, followed by a matrix multiplication of this matrix with a constant matrix of dimension K × N_Q, where N_Q is the number of Gauss–Legendre quadrature points. For large values of K, the action integral requires O(K) quadrature points to be accurately integrated. Phase 2 thus requires O(NK^2) operations and for large K can be the most expensive part of the algorithm. We ameliorate this computational expense by re-using a given relative path with many different configuration space samples; typically we re-use a relative path 100 to 1000 times. This can provide substantial savings in time; for example, we found in test runs for triatomics that re-using a relative path 200 times saved a factor of 20–50 in the time for the calculation. The re-use of paths also introduces an unbiased correlation between consecutive sample points, but we will show below that our convergence rate is not appreciably retarded by this correlation. The formulas we have presented for the variance are strictly correct for uncorrelated samples; however, we will show in Section 5 by numerical tests that these formulas still give useful estimates, even when paths are re-used, presumably due to the short term nature of the correlation introduced by re-using the Fourier paths.
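A schematic of Phase 2 in Python: the (N × K) Gaussian coefficient matrix multiplies a constant (K × N_Q) matrix of sines evaluated at the quadrature nodes, giving the relative-path displacements at all N_Q points; because this product is the expensive step, the result would typically be cached and re-used for many configuration-space samples. Names and array layouts here are our own.

    import numpy as np

    def sine_matrix(K, s_nodes, beta, hbar=1.0):
        """Constant (K, NQ) matrix sin(l*pi*s_q/(beta*hbar)) shared by all relative paths."""
        l = np.arange(1, K + 1)[:, None]
        return np.sin(l * np.pi * np.asarray(s_nodes)[None, :] / (beta * hbar))

    def relative_path(sigma, sine, rng):
        """Phase 2: displacements of a fixed-shape closed path at the quadrature nodes.

        sigma is the (N, K) array of widths from Eq. (5); the (N, K) @ (K, NQ)
        product costs O(N*K*NQ) operations, so the returned (N, NQ) array is
        re-used with many configuration-space samples.
        """
        a = sigma * rng.standard_normal(sigma.shape)   # Gaussian Fourier coefficients a_{j,l}
        return a @ sine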

Phase 4 is typically the most expensive phase in the algorithm (provided we re-use the relative paths of Phase 2 a significant number of times). We have introduced a number of screening options that eliminate the need to perform some or all of the potential evaluations for some paths that contribute negligibly to the total partition function.

The action integral S, as evaluated from Eq. (13), is a sum of terms at quadrature points on the path, which is then exponentiated as seen in Eq. (3). As a result, the contribution from a sample is negligible when any one of the points on the path is located at a very high energy, and it can also be negligible if several points on the path are located at moderately high energies. We have implemented the screening options by aborting the evaluation of paths once we determine that the contribution is negligible. Phases 3 and 4 are intermingled so that only part of the absolute path needs to be generated if a sample point can be aborted. We have defined four screening criteria:

(i) If the geometry of any quadrature point is a high-energy configuration, the sample evaluation is terminated. For example, in water, we assume zero contribution from any path having an O–H bond distance or a H–O–H bond angle that is outside some pre-specified bounds.

(ii) If any one energy evaluation on the path is above a pre-specified maximum, the entire path is terminated.

(iii) Since the terms in the sum on the right-hand side of Eq. (13) are all non-negative, we note that the running sum of the product of the quadrature weight and the potential at each point is a lower bound on the action integral. Thus if this running sum is above some threshold, we terminate the evaluation of that path.

(iv) We also estimate the value of S based on the current value of the running sum described in (iii) above and the sum of the weights already used. Thus

S_{\mathrm{est}} = \frac{\sum_i w_i V\!\left[\bar{x}(s_i)\right]}{\sum_i w_i}. \qquad (16)

In implementing this criterion, we first accumulate the sum over a (pre-specified) minimum number P_min of quadrature points before testing to see if the value exceeds another tolerance, at which point the path is terminated. (More accurate estimates could be obtained by using weights optimized for the subset of quadrature points already evaluated, but this would incur an additional computational cost, so we have not chosen to pursue such refinements.)

To decrease the number of points at which we have to evaluate the potential before deciding whether to terminate the path, it is desirable to sample disparate points on the path at the very beginning, rather than to sample the path in a contiguous fashion. To achieve this we re-order the quadrature points so that we sample widely spaced points on each path during the early stages of integration. The screening parameters for criteria 3 and 4 are input as energies and scaled by the path length so that they represent a maximum tolerance on the mean energy along the path.

The results obtained using the screening parameters must be checked for convergence with respect to the parameters used. This is done using a small number of samples by ensuring that we generate the exact same sequence of samples for different screening parameter values.
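A compact sketch of how criteria (ii)–(iv) can be combined into an early-terminating accumulation of the sum in Eq. (13) is given below; the geometric criterion (i) is assumed to have been applied before calling the routine, the quadrature points are assumed to have been re-ordered as described above, and the threshold arguments are illustrative stand-ins for the Table 1 values.

    def screened_action(V, points, weights, E_point_max, E_mean_max, E_est_max, P_min):
        """Accumulate sum_j w_j V(x_j) (Eq. (13)), aborting negligible paths.

        `points` are path geometries in the re-ordered quadrature sequence,
        `weights` the matching weights, and V a caller-supplied potential routine.
        Returns None for a path whose contribution is judged negligible.
        """
        total_w = sum(weights)
        running = 0.0
        wsum = 0.0
        for n, (w, x) in enumerate(zip(weights, points), start=1):
            e = V(x)
            if e > E_point_max:                             # criterion (ii): one very high-energy point
                return None
            running += w * e
            wsum += w
            if running > E_mean_max * total_w:              # criterion (iii): running sum is a lower bound
                return None
            if n >= P_min and running / wsum > E_est_max:   # criterion (iv): Eq. (16) estimate
                return None
        return running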


3.3. Stratification strategy

In sampling the configuration space coordinates we define the domain of interest using the hyperradius in Eq. (11). In the earlier [13] implementation of the AOSS method, the strata used in the stratified sampling step were defined by three concentric hyperspherical boundaries enclosing the domain of interest. The boundaries of these strata were determined adaptively during the course of the run. First, the boundary of the domain itself was optimized by sampling successively smaller and smaller hyperspheres with an initial batch of probe samples. The optimized hypersphere was then divided into a number of bins, and the three strata consisted of various combinations of the bins, specified by the outer radius of the outermost bin in each stratum. Using a certain number of test samples each possible combination was sampled, and the best possible stratification was calculated by minimizing the variance. Once the best possible stratification was determined, optimizing the number of further samples allocated to each stratum was done by “sweeping” through the entire calculation a certain number of times and allotting samples to the strata based on the estimate of the integral at the end of each “sweep”. The results of the “probe” stage were thus not wasted, but used to determine the first allocation of samples to the optimized strata.

In the present implementation the domain is allowed to be a hyperannulus rather than a hypersphere. This allows us to eliminate regions of space that correspond to small, unphysical values of the hyperradius for the system and thus to improve the efficiency of sampling the hypersphere. The two hyperradii, ρ_min and ρ_max, defining the boundaries of the hyperannulus (the inner edge and the outer edge) are specified at the beginning of the calculation and are not optimized further. This space is subdivided into a set of concentric strata having equal hypervolumes. Thus, in the present implementation the first stratum need not include the origin of the hypersphere. While earlier, the space was divided into concentric bins of equal hypervolumes and then a limited number of “coarser” strata were defined by choosing an optimum combination of bins, in the present implementation no further coarse stratification is done. Typically, we use 20–100 strata. When using large numbers of strata, the inner and outer boundaries of the hyperannulus need not be too carefully optimized to achieve efficient results. We choose conservative values for these parameters and inspect the final results to ensure that the innermost and outermost strata contribute negligibly to the final value of the partition function.

In the first stage, if we do not use an importance function, the sampling over the configuration space is performed in a uniform fashion. (Details for the initial sampling phase for the case with importance sampling are given in Section 3.4.) Once a sample is generated, the stratum to which it belongs is determined, and if importance sampling is used (as explained in Section 3.4), and the sample is not rejected, the integrand is evaluated. The value of the integrand in each stratum is accumulated, as is the value of the square of the integrand, and the process is repeated until a certain number of test samples is generated. The number of such test samples is typically a few percent of the total number of samples desired. In the second step, an optimum number of samples to be allocated to each stratum is chosen using the information provided by the initial sampling step. We perform the second step in several “sweeps”, optimizing the number of sample points used in each stratum for each sweep by the following procedure (a schematic sketch of the allocation step is given after the list):

(i) The quantity y_i given by

y_i = \frac{v_i}{\sum_i v_i}, \qquad (17)

where v_i is the product of the volume of stratum i and the square root of the current estimate of the variance of the samples of the integrand on the stratum i, is calculated after each sweep. For the first sweep, the estimate from the first stage is used.

(ii) Next, for each stratum i, we calculate the quantity

n_i = y_i M, \qquad (18)

where M is the total number of samples evaluated thus far plus the total number of samples we want to evaluate during the current sweep (m_k).


(iii) Then, the number of samples already accumulated in step 1 for that stratum (say n^1_i) is subtracted from n_i to give the desired number of samples, n^D_i, to be allotted to the stratum i for that sweep. If the subtraction yields a negative number, n^D_i for that stratum is set to zero.

(iv) n^D_i is then summed over all strata, and if this sum, S_k, exceeds the total number of samples to be distributed during this sweep (which is virtually guaranteed since, as a practical matter, we always have some bins that were heavily oversampled in the probe stage), we scale the number of samples in each stratum by a factor of m_k/S_k. Thus, the number of samples in each stratum is rescaled so that no more than m_k samples are evaluated during the sweep.

(v) For the next sweep, this process is repeated using the updated values of the various quantities described above. We continue this procedure until the required number of samples is collected. Typically we perform 5 to 20 sweeps.

(vi) Finally, the error analysis to obtain the Monte Carlo error bars for the estimate of the partition function is performed as described in an earlier paper [13].
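The sketch below condenses steps (i)–(iv) of the sweep allocation into one routine; the per-stratum arrays (hypervolumes, current variance estimates, samples already taken) and the sweep budget m_k are assumed inputs, and the integer rounding at the end is our own choice rather than something specified in the paper.

    import numpy as np

    def allocate_sweep(volumes, variances, n_done, m_k):
        """One AOSS sweep allocation per steps (i)-(iv) of Section 3.3."""
        v = np.asarray(volumes) * np.sqrt(np.asarray(variances))
        y = v / v.sum()                                       # Eq. (17)
        n = y * (np.sum(n_done) + m_k)                        # Eq. (18): M = samples so far + this sweep
        n_desired = np.maximum(n - np.asarray(n_done), 0.0)   # step (iii)
        S_k = n_desired.sum()
        if S_k > m_k:                                         # step (iv): rescale to at most m_k samples
            n_desired *= m_k / S_k
        return np.rint(n_desired).astype(int)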

In using the AOSS scheme one must be careful to adequately sample the strata in the initial probe phase. If the variance of the integrand samples for a particular stratum is speciously low during the initial sampling phase, the stratum may never be sampled further, and inaccurate results will be obtained. This type of problem is most likely to occur if the dominant contribution to the integrand comes from only a small subset of samples. This situation is realized in FPIMC for low temperatures, where a few rare low-energy paths provide the dominant contribution to the partition function. Fortunately the contribution of each stratum to the overall result varies quite regularly as a function of the hyperradius, and we can easily detect spurious results by examination of these trends. It is also important to note that using too many strata and/or too few samples per stratum can lead to erroneous results, as each stratum will then most likely be incorrectly sampled during the first stage.

3.4. Importance sampling in configuration space

AOSS is one example of a class of techniques that accelerate the convergence of a Monte Carlo estimate to the “true” answer by reducing the variance of the estimate. Another method of variance reduction is importance sampling. In our previous algorithm [13–15] we used importance sampling as well, but only in the Fourier coefficient space. It is also possible to use importance sampling in the configuration space. We note that importance sampling is an independent method of variance reduction. It does not depend on the use of AOSS; we can use both methods in conjunction or each by itself.

Consider some importance function represented by f(x). For now, we may assume that it is completely generic, except for the requirement that it be a function of the internal coordinates of the system. Converting Eq. (7) from the z coordinate system to the x coordinate system, and multiplying and dividing the numerator by f(x), we get

Q^{[K]}_{\mathrm{int}}(T) = \frac{Q^{[K]}_{\mathrm{fp}}(T)}{\sigma} \; \frac{\int_D dx \int_{-\infty}^{\infty} da \, \exp\!\left[-\sum_{j=1}^{N}\sum_{l=1}^{K} a_{j,l}^2/2\sigma_{j,l}^2\right] \exp\!\left[-S(x,a)\right] \, f(x)/f(x)}{V_D \int_{-\infty}^{\infty} da \, \exp\!\left[-\sum_{j=1}^{N}\sum_{l=1}^{K} a_{j,l}^2/2\sigma_{j,l}^2\right]}, \qquad (19)

where V_D is the volume of the domain D, but in the x coordinate system. As before, to obtain a Monte Carlo estimate of this we must cast it in the form of an average. To do this we multiply and divide Eq. (19) by

\langle f \rangle = \frac{\int_D dx\, f(x)}{\int_D dx} = \frac{\int_D dx\, f(x)}{V_D}. \qquad (20)

This gives us the following formulation of Q^[K]_int(T) as an average:

Q^{[K]}_{\mathrm{int}}(T) = \frac{Q^{[K]}_{\mathrm{fp}}(T)\,\langle f \rangle}{\sigma} \int_D dx \int_{-\infty}^{\infty} da \, \frac{\exp\!\left[-S(x,a)\right]}{f(x)} \; \frac{f(x)\,\exp\!\left[-\sum_{j=1}^{N}\sum_{l=1}^{K} a_{j,l}^2/2\sigma_{j,l}^2\right]}{\int_D f(x)\,dx \int_{-\infty}^{\infty} da \, \exp\!\left[-\sum_{j=1}^{N}\sum_{l=1}^{K} a_{j,l}^2/2\sigma_{j,l}^2\right]}. \qquad (21)


The Monte Carlo estimate of Eq. (21) is obtained by generating P configurations and evaluating

Q^{[K]}_{\mathrm{int}}(T) = \frac{Q^{[K]}_{\mathrm{fp}}(T)\,\langle f \rangle}{\sigma} \; \frac{1}{P} \sum_{i=1}^{P} \frac{\exp\!\left[-S(x^{(i)},a^{(i)})\right]}{f(x^{(i)})}, \qquad (22)

where the components of a are sampled using the Box–Muller algorithm as described previously [13], but the components of x (the coordinate space) now have a distribution corresponding to f(x). The variance of this estimate is [27]

w = \frac{\left(Q^{[K]}_{\mathrm{fp}}(T)\right)^2 \langle f \rangle^2}{\sigma^2 P} \left( \sum_{i=1}^{P} \left( \frac{\exp\!\left[-S(x^{(i)},a^{(i)})\right]}{f(x^{(i)})} \right)^2 - \frac{1}{P} \left( \sum_{i=1}^{P} \frac{\exp\!\left[-S(x^{(i)},a^{(i)})\right]}{f(x^{(i)})} \right)^2 \right). \qquad (23)
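Given the per-sample actions S(x^(i), a^(i)) and importance-function values f(x^(i)), Eqs. (22) and (23) are simple array reductions; the following sketch (our own variable names, with sym denoting the symmetry number σ) shows that evaluation.

    import numpy as np

    def importance_sampled_estimate(S_vals, f_vals, Q_fp, f_mean, sym):
        """Return (Q_int^[K] estimate, variance expression) per Eqs. (22) and (23).

        S_vals and f_vals hold S(x^(i), a^(i)) and f(x^(i)) for the P samples;
        Q_fp, f_mean and sym are Q_fp^[K](T), <f> and the symmetry number sigma.
        """
        r = np.exp(-np.asarray(S_vals)) / np.asarray(f_vals)
        P = r.size
        prefac = Q_fp * f_mean / sym
        Q = prefac * r.mean()                                    # Eq. (22)
        w = prefac**2 / P * (np.sum(r**2) - np.sum(r) ** 2 / P)  # Eq. (23)
        return Q, w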

We may choose to use importance sampling in the coordinate space alone, or along with the AOSS scheme described above. If we use importance sampling alone, the partition function is evaluated using Eq. (22). Since Q^[K]_fp(T) depends on the volume of the domain of integration, as seen from Eq. (8), we evaluate Q^[K]_fp(T) in each stratum, and so V_D in Eq. (8) is replaced by the volume of each stratum, which is given by

V_{D,k} = \frac{\pi^{N/2}}{\Gamma(N/2+1)} \left( \rho_{k+1}^{N} - \rho_{k}^{N} \right), \qquad (24)

where Γ is the gamma function. Thus,

Q^{[K]}_{\mathrm{fp}}(T) = \sum_{k=1}^{n_{\mathrm{strata}}} Q^{[K]}_{\mathrm{fp},k}(T), \qquad (25)

where we have n_strata strata, and where

Q^{[K]}_{\mathrm{fp},k}(T) = V_{D,k} \left( \frac{\mu}{2\pi\beta\hbar^2} \right)^{N/2}. \qquad (26)
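A small numerical sketch of Eqs. (24) and (26) follows; it assumes, consistent with the mass-scaled coordinates of Eq. (10), that a single scaling mass μ appears in the per-stratum free-particle factor.

    import math

    def stratum_volume(N, rho_inner, rho_outer):
        """Hypervolume of a stratum bounded by rho_inner and rho_outer, Eq. (24)."""
        return math.pi ** (N / 2) / math.gamma(N / 2 + 1) * (rho_outer**N - rho_inner**N)

    def Q_fp_stratum(N, rho_inner, rho_outer, mu, beta, hbar=1.0):
        """Free-particle contribution of one stratum, Eq. (26)."""
        return stratum_volume(N, rho_inner, rho_outer) * (mu / (2.0 * math.pi * beta * hbar**2)) ** (N / 2)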

If we combine AOSS and importance sampling, the expression for the partition function is

Q^{[K]}_{\mathrm{int}}(T) = \sum_{k=1}^{n_{\mathrm{strata}}} \frac{Q^{[K]}_{\mathrm{fp},k}(T)\,\langle f \rangle_k}{\sigma P} \sum_{i=1}^{P} \frac{\exp\!\left[-S(x^{(i)},a^{(i)})\right]}{f(x^{(i)})}. \qquad (27)

The variance of this estimate in stratum k is given by

w_k = \frac{\left(Q^{[K]}_{\mathrm{fp},k}(T)\right)^2 \left(\langle f \rangle_k\right)^2}{\sigma^2 P} \left( \sum_{i=1}^{P} \left( \frac{\exp\!\left[-S(x^{(i)},a^{(i)})\right]}{f(x^{(i)})} \right)^2 - \frac{1}{P} \left( \sum_{i=1}^{P} \frac{\exp\!\left[-S(x^{(i)},a^{(i)})\right]}{f(x^{(i)})} \right)^2 \right). \qquad (28)

In order to ensure that the coordinate space is sampled according to the importance function f, we use the von Neumann rejection technique [29]. This is implemented as follows (a short sketch is given after the list):

(i) The maximum value, f_max,k, of the importance function is determined for each of the strata k. It may be possible to analytically calculate these values, or we may need to evaluate them numerically.

(ii) For a given candidate sample point x^(i), which is uniformly distributed within the desired stratum, we calculate f(x^(i)).

(iii) We calculate the quantity f_ran = f_max,k ξ, where ξ is a random number uniformly distributed between 0 and 1.

(iv) If f_ran is less than f(x^(i)) the sample point is accepted; otherwise it is rejected. When a sample is rejected, we do not recount the previous sample.
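A direct transcription of steps (i)–(iv) into Python is shown below; the routine that draws a point uniformly within the current stratum, the importance function f, and the (safely overestimated) per-stratum maximum f_max,k are all assumed to be supplied by the caller.

    def sample_by_rejection(f, f_max_k, draw_uniform_in_stratum, rng, max_tries=1_000_000):
        """von Neumann rejection: return a point distributed according to f within a stratum."""
        for _ in range(max_tries):
            x = draw_uniform_in_stratum()        # step (ii): uniform candidate in the stratum
            f_x = f(x)
            f_ran = f_max_k * rng.random()       # step (iii)
            if f_ran < f_x:                      # step (iv): accept
                return x
        raise RuntimeError("acceptance too low; check the estimate of f_max_k")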


With this method, we obtain coordinates distributed according to the importance function f. We note that we can safely overestimate the values of f_max,k and still retain the desired distribution, but the sampling will be more costly; underestimating the values of f_max,k will result in skewed (i.e. incorrect) distributions.

During the initial sampling phase the candidate samples are distributed uniformly on the entire domain. We then determine the stratum into which each sample falls and use the rejection method described above. We note that in this scheme, the final distribution of sample points depends strongly on the stratification employed, and each of the strata is sampled a significant number of times. (Note, for example, that in a calculation with one stratum, the large hyperradial region is sampled much less frequently than if two or more strata are used.)

The importance function that we have investigated for H2O is a direct product of Gaussians in each of the atom–atom distances. This is given by

f(r) = \exp\!\left( -\frac{1}{2}\left(\frac{r_{\mathrm{OH}} - r^{o}_{\mathrm{OH}}}{\Delta_{\mathrm{OH}}}\right)^{2} - \frac{1}{2}\left(\frac{r_{\mathrm{OH'}} - r^{o}_{\mathrm{OH'}}}{\Delta_{\mathrm{OH'}}}\right)^{2} - \frac{1}{2}\left(\frac{r_{\mathrm{HH'}} - r^{o}_{\mathrm{HH'}}}{\Delta_{\mathrm{HH'}}}\right)^{2} \right), \qquad (29)

where r_OH, r_OH′, and r_HH′ are distances in OH, OH′ and HH′, respectively, and r^o_OH, r^o_OH′, r^o_HH′, Δ_OH, Δ_OH′, and Δ_HH′ are adjustable parameters. We get approximate values for the parameters by fitting plots of the contribution to the partition function as a function of the atom–atom distances. These calculations may be performed at a higher temperature, where we can get accurate, converged results easily, or they may be lower-K runs at the temperature of interest where we take advantage of the fact that N_Q may be smaller when K is smaller. Another method for obtaining the parameters is to optimize Δ_OH, Δ_OH′, and Δ_HH′ by minimizing the variance of a Monte Carlo calculation performed at the same temperature but using a small number of samples and a lower K (with concomitantly fewer quadrature points). Approximate values for r^o_OH, r^o_OH′, and r^o_HH′ are the equilibrium atom–atom distances in H2O. The values we use for the importance function parameters in H2O are given in the results section.
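For concreteness, the sketch below evaluates Eq. (29) using the 2400 K parameter set quoted in Section 5 (distances in bohr); the function and dictionary names are ours.

    import math

    # Importance-function parameters for H2O from the 2400 K fit quoted in Section 5 (bohr).
    R_EQ = {"OH": 1.857, "OH'": 1.857, "HH'": 2.914}
    DELTA = {"OH": 0.149, "OH'": 0.149, "HH'": 0.282}

    def importance_f(r_OH, r_OHp, r_HHp):
        """Direct product of Gaussians in the three atom-atom distances, Eq. (29)."""
        dists = {"OH": r_OH, "OH'": r_OHp, "HH'": r_HHp}
        s = sum(((dists[p] - R_EQ[p]) / DELTA[p]) ** 2 for p in dists)
        return math.exp(-0.5 * s)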

We evaluate ⟨f⟩_k on each of the strata via simple Monte Carlo integration. We keep track of the maximum value of f that was observed during this integration for each of the strata. Since we can safely overestimate the maximum value of f, the values we actually use as the f_max,k values in the von Neumann sampling scheme are given by min(1, f_scale f_max,k), where f_scale is a scaling parameter; in our calculations we have found that it is sufficient to take f_scale to be 1.05. The rescaling provides a safety margin for the estimate of the maximum value, so that no sample in any stratum should have an importance function value higher than this maximum. In our program we check to confirm that we never have a value of the importance function that exceeds f_scale f_max,k; if a sample is found that exceeds this value we terminate the run. An example of the effectiveness of this type of importance function will be given in the results section. We note that we do not currently include the variance of ⟨f⟩_k in our final error estimates, and so we ensure that the error in ⟨f⟩_k is small compared to the error in the rest of the calculation.

4. Parallelization strategy

We can now present our parallel implementation of the FPIMC method. The calculation consists of two main loops, one loop over the number of samples in the initial stage, and another loop over the number of strata in the second, optimized sampling stage. In the second stage, for each stratum we loop over the number of samples allocated to that stratum as well. In parallelizing the algorithm, we note that in the first stage, the generation and use of any one sample to calculate that sample's contribution to the integral in Eq. (7) is completely unrelated to the use of any other sample. In the second stage, at the beginning of each sweep we calculate how many samples are to be obtained from each stratum during that sweep, for which we need the estimate of the contributions to the total partition function from the previous sweeps and the initial sampling stage. Thus, in our parallelization strategy we utilize a different technique for each of the two stages.


For our parallel program, we use the message-passing paradigm of parallel computing. In this approach [31], the parallel program, when executing, consists of a number of independent processes, each communicating with the others through the explicit passing of messages. The calculation runs on p processors, one of which is called the master, and the others are called slave processors. Processes on these nodes are called master processes and slave processes, respectively. While the implementation of the paradigm is machine specific, a standard is available, and thus we use different implementations of the Message Passing Interface (MPI) standard [32] for the different platforms on which we run our programs.

In the parallel implementation of the first stage, the master process starts up, initializes the required number of slave processes, and reads in the input data. For the first stage, the master then splits up the number of first stage samples (N_1, an input variable) equally over the p processors (including the master itself) that are being used. Each processor deals with the same total number of strata. For the first stage, each processor is told to generate N_1/p samples by the master processor, and it handles them as outlined above. At the end of this uniform sampling step, each processor communicates its results to the master processor. The master (which has been handling its own samples as well) sums the results over all the processors to get the overall initial estimate of the contribution to the partition function from each stratum as well as the total partition function. This concludes the first stage of the calculation.

For the second stage, the master calculates the number of samples (say N^s_{k,2}) that stratum k should generate in sweep s of the second stage, as explained in Section 3.3, and then informs each slave to generate N^s_{k,2}/p samples. The calculation then proceeds independently on each processor until the required number of samples has been generated, at which time the processors communicate the updates to the accumulation of the integrand in the various strata back to the master, which sums them and uses Eq. (18) and the scheme in Section 3.3 again to determine N^{s+1}_{k,2}. The process continues for a specified number of sweeps and a specified number of second stage samples (which is just the total number of samples minus the number of samples in the first stage).
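The first-stage pattern (split N_1 samples over p processes, accumulate locally, reduce to the master) is sketched below in Python with mpi4py purely for illustration; the actual program is written in Fortran 90 with MPI and additionally manages per-stratum accumulators, SPRNG random-number streams, screening, and checkpointing, none of which is reproduced here.

    import random
    from mpi4py import MPI

    def evaluate_sample(rng):
        """Hypothetical stand-in for phases 1-5 of Section 3.2 applied to one sample."""
        return rng.random()

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()               # rank 0 plays the role of the master
    p = comm.Get_size()

    N1 = 10**6                           # illustrative number of first-stage samples
    rng = random.Random(12345 + rank)    # one independent stream per process (the code uses SPRNG)
    local = sum(evaluate_sample(rng) for _ in range(N1 // p))

    # Each process, master included, sends its accumulated contribution to the
    # master, which forms the overall first-stage estimate (per-stratum totals
    # are collapsed to a single scalar here for brevity).
    total = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        print("first-stage mean integrand:", total / (p * (N1 // p)))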

At the end of the entire process the estimate of the contribution to the integral from each stratum (corresponding to a slice of the domain of integration) for each processor resides with the master processor. The master then sums results over all processors to get the contribution to the final estimate from each stratum and then sums over the strata to get the final estimate of the integral over the entire domain of interest. The master then calculates the statistics and error estimates. The master thus performs the two functions of overseeing the entire calculation and performing a portion of the sampling calculations as well.

Since the size of the task assigned to each processor is approximately the same, the algorithm is quite well load-balanced. Even in the cases where we use the screening options we expect that the number of action integral evaluations aborted will be uniformly distributed across the processors. We can determine the extent of load-imbalance by computing the average CPU time over all the processors used for a job and the standard deviation from that average. Our calculations show that we achieve good load-balance, since the standard deviation from the average CPU time is about 1% of the average.

An important aspect of any Monte Carlo calculation is the generation of uncorrelated samples using a high-quality, long-period random number generator (RNG). The RNG we use currently is based on the SPRNG [33,34] library and is a multiplicative lagged Fibonacci generator with a period of approximately 2^81 random numbers [33].

Finally, we have also implemented an option for restarting the calculation after the first stage or during the second stage while maintaining the correct statistics and also preserving the important characteristic of reproducibility. The code does this by periodically maintaining a “checkpoint” file. This allows restart in case of any unforeseen stoppages or time constraints in job queues.

The code is written using Fortran 90 [35] and is portable to any machine that has the LAPACK linear algebra library [36] installed; we have tested versions of it on the Cray T3D and T3E, the Cray C90, the SGI Origin 2000, and the IBM SP. The code can be used for diatomics, triatomics, and four-body systems by simply specifying the number of dimensions of the system as well as details about the atoms comprising the system. A global potential energy surface routine that accepts geometry input in either the unscaled Jacobi coordinates described earlier or Cartesian coordinates and that outputs the potential energy in atomic units is required as well. All calculations are performed using Hartree atomic units (a.u.).

5. Sample results

We have utilized our parallel implementation of the FPIMC method to calculate the converged partition function for triatomics and four-body systems, in particular, H2O and C2H2. For H2O we used the Kauppi–Halonen potential [37]. For C2H2 we used the following form:

V = \tfrac{1}{2} k \left[ (r_a - r_a^{o})^{2} + (r_b - r_b^{o})^{2} \right] + \tfrac{1}{2} k' (r_c - r_c^{o})^{2} + \tfrac{1}{2} k'' \left[ (\theta_a - \pi)^{2} + (\theta_b - \pi)^{2} \right], \qquad (30)

where k, k′, and k′′ are force constants, r_a and r_b are the C–H bond distances and r_c is the C–C bond distance, r^o_a, r^o_b, and r^o_c their respective equilibrium values, and θ_a and θ_b are the H–C–C bond angles. The values of the force constants and equilibrium bond distances are obtained from Herzberg [38]. We used masses of 1837.15 a.u. for H, 29157.0 a.u. for O, and 21874.6 a.u. for C. In both cases, the zero of energy is taken as the energy at the equilibrium geometry. All calculations used a scaling mass of 1 a.u.
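Eq. (30) translates directly into a short potential routine; in the sketch below the force constants and equilibrium values are left as arguments (the paper takes their numerical values from Herzberg [38], which are not reproduced here), and all quantities are in atomic units.

    import math

    def V_C2H2(ra, rb, rc, theta_a, theta_b, k, kp, kpp, ra0, rb0, rc0):
        """Valence force-field potential for C2H2, Eq. (30)."""
        return (0.5 * k * ((ra - ra0) ** 2 + (rb - rb0) ** 2)       # C-H stretches
                + 0.5 * kp * (rc - rc0) ** 2                        # C-C stretch
                + 0.5 * kpp * ((theta_a - math.pi) ** 2             # H-C-C bends
                               + (theta_b - math.pi) ** 2))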

In all the calculations presented below we have used a finite number of Fourier coefficients. This truncation of the sum over an infinite number of Fourier coefficients to a sum over a finite number of coefficients introduces an error, but in the calculations that we present here we have used a sufficiently large value of K such that the systematic error due to truncation is smaller than the statistical error. An analysis of the convergence with respect to K has been presented in earlier work [13–15].

Table 1. Screening parameters

System | Screening parameter | Value
H2O | O–H bond distance (minimum) | 1.3 bohr
H2O | O–H bond distance (maximum) | 3.2 bohr
H2O | H–O–H bond angle (minimum) | 60 deg
H2O | H–O–H bond angle (maximum) | 150 deg
H2O | Maximum energy for a given point | 4.0 eV
H2O | Mean energy of path | 2.0 eV
H2O | Estimated mean energy of path, minimum number of points for the estimate | 3.0 eV, 10
C2H2 | C–H bond distance (minimum) | 1.0 bohr
C2H2 | C–H bond distance (maximum) | 3.5 bohr
C2H2 | C–C bond distance (minimum) | 1.0 bohr
C2H2 | C–C bond distance (maximum) | 4.0 bohr
C2H2 | H–C–C bond angle (maximum) | Not used
C2H2 | H–C–C bond angle (minimum) | 100 deg
C2H2 | Maximum energy for a given point | 8.0 eV
C2H2 | Mean energy of path | 3.0 eV
C2H2 | Estimated mean energy of path, minimum number of points for the estimate | 4.0 eV, 10


Fig. 2. Comparison of the 1σ errors for the calculation of the partition function for water at 2400 K obtained using the standard formula for the variance of the calculation for uncorrelated samples, averaged over 500 different calculations which re-used Fourier paths (the higher, light line) and the 1σ error obtained as a deviation from the mean of the same 500 calculations (the lower, dark line). The spread of the calculated partition functions is also illustrated by a histogram plot. A total of 1 million samples were used in each run, and relative Fourier paths were re-used 200 times.


We calculated the partition function both with and without the screening criteria discussed above. The screening criteria we used for H2O and C2H2 are listed in Table 1.

As discussed earlier, the re-use of the Fourier paths introduces a correlation in the samples. To show that this does not appreciably affect the convergence of our results, we calculated the partition function for H2O at 2400 K in 500 separate runs using 1 million samples each time. Fig. 2 shows the spread in the value of the partition function as a histogram chart. Superimposed on the chart are the 1σ error bars calculated by two different methods. In Method A, we used the formulas presented earlier to calculate the variance for each of the 500 calculations and then averaged this error bar over all the calculations. In Method B, we calculated the error bar by obtaining the standard deviation from the mean of the partition function for the 500 calculations. As the figure shows, the errors calculated by the two different methods are essentially identical, and we conclude that we are justified in asserting that the re-use of Fourier coefficients does not appreciably degrade the convergence rate.

Tables 2–4 show the results of calculating the partition function for H2O and C2H2 at various temperatures. The tables also show the internal partition function Q_int as calculated using the FPIMC method and the internal free energy as defined in Eq. (14). We will concentrate here on showing the effect of the improvements we have made to the earlier algorithm [13] and showing that the method scales linearly with the number of processors used. We also present an analysis of the effect of importance sampling on the calculation of the partition function for H2O and comparisons with the effect of stratified sampling.


Table 2. Partition function, internal free energy (kcal/mol), ratio of CPU timings averaged over all processors without screening to CPU timings with screening, and percentage of action integrals evaluated when screening was performed, as a function of the number of samples for H2O at 300 K (a,b)

Probe stage samples | AOSS stage samples | Total samples | Partition function without screening | Partition function with screening | Internal free energy (c) | Timing ratio | % of action integrals evaluated (d)
2.5×10^7 | 2.5×10^7 | 5×10^7 | 1.0(1)×10^-8 | 1.0(1)×10^-8 | −10.98 (+0.05/−0.06) | 4.7 | 5.6
1.0×10^8 | 1.0×10^8 | 2×10^8 | 1.03(6)×10^-8 | 1.03(6)×10^-8 | −10.96 (+0.03/−0.04) | 4.8 | 5.7
2.5×10^7 | 4.75×10^8 | 5×10^8 | 1.06(5)×10^-8 | 1.06(5)×10^-8 | −10.94 (+0.03/−0.04) | 5.1 | 5.5
5.0×10^7 | 4.5×10^8 | 5×10^8 | 1.01(3)×10^-8 | 1.01(3)×10^-8 | −10.97 (+0.01/−0.02) | 5.0 | 5.7
1.0×10^8 | 4.0×10^8 | 5×10^8 | 1.00(3)×10^-8 | 1.00(3)×10^-8 | −10.98 (+0.02/−0.01) | 5.0 | 5.4
2.5×10^8 | 2.5×10^8 | 5×10^8 | 1.01(4)×10^-8 | 1.01(4)×10^-8 | −10.98 (+0.03/−0.02) | 4.8 | 5.6
5.0×10^8 | – | 5×10^8 | 0.99(4)×10^-8 | 0.99(4)×10^-8 | −10.98 (+0.02/−0.03) | 4.8 | 5.7

(a) The number of Fourier coefficients used was 1000, and relative Fourier paths were re-used 200 times. The action integral was calculated by 10 repetitions of 50-point Gauss–Legendre quadrature, and stratified sampling was performed using 30 strata. The lower hyperradius of the hyperannulus was 80 mass-scaled bohr and the upper hyperradius was 130 mass-scaled bohr. All calculations were distributed across 8 processors. The screening parameters used are given in Table 1. No importance sampling was used.
(b) All tables give the 2σ uncertainty in the last digit of the partition functions in parentheses, and this is also converted to unsymmetrical 2σ error bars on the internal free energy.
(c) Calculated using the partition function obtained using screening parameters.
(d) When screening is used.

Table 3. Partition function, internal free energy (kcal/mol), and CPU timings averaged over all processors relative to the timing for 500 million samples, as a function of the number of samples for C2H2 at 2400 K (a)

Number of samples | Partition function | Free energy | Timing | % of action integrals evaluated (b) | % of potential evaluations made (b,c)
5×10^8 | 7.1(1)×10^3 | 42.29 (+0.07/−0.06) | 1.0 | 0.07 | 2.04
1×10^9 | 7.13(9)×10^3 | 42.31 (+0.06/−0.06) | 2.0 | 0.07 | 2.04
2×10^9 | 7.07(6)×10^3 | 42.27 (+0.04/−0.04) | 4.0 | 0.07 | 2.04

(a) The number of Fourier coefficients used was 100. We used 5 repetitions of 10-point Gauss–Legendre quadrature to calculate the action integral, and stratified sampling was performed using 30 strata. The lower hyperradius of the hyperannulus was 220 mass-scaled bohr and the upper hyperradius was 345 mass-scaled bohr. All calculations were distributed across 32 processors. The screening parameters used are detailed in Table 1.
(b) When screening is used.
(c) This is the analog of the value 9.1% discussed in the text for the second last case of Table 2. It is the percentage of the number of potential evaluations that would be required if integral screening were not used; it includes evaluations for completed action integrals as well as evaluations done on paths that are aborted by the screening criteria.


Table 2 shows the results of the calculation of the partition function for H2O at 300 K and the effect of screening unimportant samples. The screening parameters we used are presented in Table 1. The last column of Table 2 presents the fraction of action integral evaluations that are used to calculate the partition function; this is an approximate measure of how many samples of those generated contribute substantially to the partition function.


Table 4. Partition function, internal free energy (kcal/mol), and the speedup relative to the two-processor calculation, as a function of the number of processors used in the calculation of the partition function for H2O at 400 K (a)

Number of processors | Total number of samples | Partition function | Free energy | Speed-up (b)
2 | 1×10^8 | 4.07(9)×10^-6 | −9.87 (+0.02/−0.01) | 1.00
4 | 1×10^8 | 4.07(9)×10^-6 | −9.87 (+0.02/−0.01) | 2.03
8 | 1×10^8 | 4.08(9)×10^-6 | −9.86 (+0.01/−0.02) | 4.01
8 | 5×10^8 | 4.04(4)×10^-6 | −9.86 (+0.01/−0.01) | –
12 | 1×10^8 | 4.07(8)×10^-6 | −9.87 (+0.02/−0.01) | 5.90
16 | 1×10^8 | 4.11(9)×10^-6 | −9.86 (+0.02/−0.02) | 8.10
32 | 1×10^8 | 4.12(9)×10^-6 | −9.86 (+0.02/−0.01) | 15.67
48 | 1×10^8 | 4.05(9)×10^-6 | −9.87 (+0.02/−0.02) | 22.44

(a) The parameters used are the same as those of footnote a of Table 2. The screening parameters used are given in Table 1.
(b) The speedup is defined relative to a two-processor calculation via Eq. (31). The speedup was not calculated for the run using 500 million samples.

The screening parameters we used resulted in about 94–95% of the action integral evaluations being aborted as they were deemed to have a negligible contribution to the partition function. We aborted action integral evaluations based on all of the four criteria presented in Section 3.2. As an example of how the screening works in detail, we consider the calculation from Table 2 involving 500 million samples where half the samples were used in the probe stage and half in the AOSS stage: In this calculation, 94.4% of the action integrals were discarded; of these, geometric considerations eliminated 80.3%, any single point on any path being above a pre-specified maximum eliminated another 13.4%, the average energy of the path being above a certain maximum value eliminated 0.5%, and the estimate of the action integral over the path eliminated another 0.1%. As a result, we evaluated an additional 3.5% of the total number of potential evaluations that would be required without screening while working on paths that were subsequently terminated. Thus, in total, we carried out 9.1% of the total number of potential evaluations that would be required without using screening. Thus, at this relatively low temperature and in the absence of importance sampling, only about 6% of the samples provide meaningful contributions to the partition function and only about 9% of the number of potential evaluations need to be actually carried out. For larger systems and lower temperatures these fractions are even smaller. The relative contribution to the total screening from each of the four screening criteria varies from system to system. For systems where the geometry does not provide a clear-cut method of distinguishing samples with small or negligible contributions, the remaining three energetic criteria may become more important as a method of screening samples.

The last five rows of Table 2 present the value of the partition function and its 2σ error as the number of samples used in the probe stage and the AOSS stage are varied while keeping the total number of samples fixed. This sequence of calculations shows that there is an optimum value of the number of samples in each stage for a given total number of samples that results in the lowest 2σ error bar. Using too few samples in the probe stage can result in erroneous results due to undersampling of important strata, while using too many in the probe stage negates the effect of AOSS, and we get a 2σ error bar that is the same as that obtained using uniform sampling of the strata. Table 2 shows that allocating 15 ± 5% of the samples to the uniform stage may be close to optimum, at least for the present problem.

Table 3 shows the results for the calculation of the partition function for C2H2 at 2400 K. The result for the partition function obtained using the FPIMC method may be compared to the result obtained using the harmonic oscillator rigid-rotor (HORR) approximation. Using the HORR approximation and the frequencies of the normal modes given in Ref. [38], we obtain a value for Q_int of 8.3×10^3, which is within 15% of the value for the partition function that we obtain using the FPIMC method. The last two columns of Table 3 present the fraction of action integral evaluations that are used to calculate the partition function and the percentage of additional potential evaluations made on paths that were subsequently terminated. We see that for the screening parameters that we use, and even at this high temperature, the main contribution to the partition function comes from only about 0.07% of the sample space, the reason being the high dimensionality of the configuration space.


Fig. 3. Plot of speed-up versus the number of processors used; all speed-ups are relative to the two-processor timing on the IBM SP.


Table 4 shows how the timing on the master processor scales with the number of processors allocated for the entire calculation, for a fixed number of samples. Results are presented only for H2O, at 400 K, although we have found similar results at other temperatures and for C2H2 as well. The table also shows the speed-up for n processors, which we define, relative to the timing using two processors, as

S_n = \frac{\text{CPU time using 2 processors}}{\text{CPU time using } n \text{ processors}}. \qquad (31)

We see that the speed-up is almost linear, and this is also clearly seen in Fig. 3. The efficiency when we use 32 processors is 15.67/(32/2) = 97.9%, and the efficiency for 48 processors is 22.44/(48/2) = 93.5%.

Importance sampling complements the stratified sampling scheme that we use. Table 5 shows the effect ofimportance sampling and the effect of stratified sampling for a calculation of the partition function of H2O at 500 K.The table presents the results using two sets of importance function parameters: those obtained by calculating theconverged partition function at 2400 K and fitting a plot of the contribution to the partition function as a functionof the interatomic distances to a Gaussian function and those obtained by fitting a plot of the contribution to thepartition function as a function of the interatomic distances to a Gaussian function at 500 K. The importancefunction parameters we used are:

r°_OH = r°_OH′ = 1.857 a₀,  r°_HH′ = 2.914 a₀,  Δ_OH = Δ_OH′ = 0.149 a₀,  and  Δ_HH′ = 0.282 a₀,

which were obtained from a calculation at 2400 K, and

r°_OH = r°_OH′ = 1.841 a₀,  r°_HH′ = 2.899 a₀,  Δ_OH = Δ_OH′ = 0.132 a₀,  and  Δ_HH′ = 0.214 a₀,

which were obtained from a calculation at 500 K.
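To make the role of these parameters concrete, the following Fortran 90 sketch evaluates a Gaussian importance function of the form discussed in the text, taken here (as an assumption) to be a product over the three interatomic distances of exp{−[(r − r°)/Δ]²/2}, using the 2400 K parameters listed above; the function name and argument conventions are illustrative, not the interface used in the program.

! Gaussian importance weight in the three interatomic distances of H2O,
! w(r) = prod_j exp( -[(r_j - r0_j)/delta_j]**2 / 2 ),
! evaluated with the 2400 K parameters quoted in the text (in bohr).
double precision function importance_weight(r)
  implicit none
  double precision, intent(in) :: r(3)   ! r(1) = r_OH, r(2) = r_OH', r(3) = r_HH'
  double precision, parameter  :: r0(3)    = (/ 1.857d0, 1.857d0, 2.914d0 /)
  double precision, parameter  :: delta(3) = (/ 0.149d0, 0.149d0, 0.282d0 /)

  importance_weight = exp( -0.5d0*sum( ((r - r0)/delta)**2 ) )
end function importance_weight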


Table 5
Effects of stratification and/or importance sampling for a calculation (a) of the partition function of H2O at 500 K

             % Error                                                    Variance reduction factor (b)
Number of    No imp.        Imp. sampling      Imp. sampling            No imp.        Imp. sampling      Imp. sampling
strata       sampling       2400 K params (c)  500 K params (c)         sampling       2400 K params (c)  500 K params (c)
 1           1.82           0.43               –                        1.00           18.10              –
 2           1.56           0.43               –                        1.36           17.92              –
 3           1.61           0.42               –                        1.28           18.59              –
 4           1.44           0.42               –                        1.59           18.54              –
 5           1.38           0.42               0.44                     1.73           19.15              17.01
10           1.31           0.41               0.46                     1.93           19.48              15.68
30           1.28           0.41               0.43                     2.03           19.83              17.68
50           1.27           0.41               0.42                     2.03           19.86              18.36

(a) The number of samples was 50 million with 15% in the probe stage. The number of Fourier coefficients used was 1000. Gauss–Legendre quadrature was performed with 10 repetitions of 50-point quadrature. The rest of the parameters were as in footnote a of Table 2. The screening parameters used are given in Table 1.
(b) The variance reduction factor is relative to the result for 1 stratum and no importance sampling. Recall that the error is proportional to the square root of the variance.
(c) The importance sampling parameters used are given in the text. The average value of the importance function was calculated by standard Monte Carlo using 100 million samples, such that the percentage error in each stratum was always below 0.2%.

The second column of the table shows the 2σ error bars as a percentage of the final result for different numbers of strata without importance sampling. The third column shows the error when we use the importance function parameters obtained at 2400 K, and the fourth column gives the error when the importance function parameters obtained at 500 K are used. The final three columns give the variance reduction factor, defined for each entry as the square of the percentage error for 1 stratum with no importance sampling divided by the square of the corresponding percentage error in the second, third, or fourth column. It does not take into account the additional time used in the importance sampling runs. One can see that we get the highest variance reduction from high stratification combined with importance sampling. Stratification alone gives only about a factor of 2.0 in variance reduction, whereas importance sampling alone gives a factor of 18. If we use importance sampling, the addition of stratification gives a further variance reduction of a factor of 1.1. Clearly importance sampling is much more effective than stratification; the increased benefit from stratification when we use importance sampling is modest, but not negligible. The importance sampling runs take more time than the ones without importance sampling: without screening criteria this increase is only about 10–15%, but with screening in place, and with the parameters we have used, the time taken typically increases by about a factor of 4–6 because the importance function causes a greater fraction of the samples to give a non-negligible contribution.
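As a worked example of this definition (numbers taken from Table 5; the small differences from the tabulated factors come from the rounding of the displayed errors):

\mathrm{VRF} = \left(\frac{\epsilon_{1,\,\mathrm{no\ IS}}}{\epsilon}\right)^{2}, \qquad \left(\frac{1.82}{1.56}\right)^{2} \approx 1.36 \ \text{(2 strata, no importance sampling)}, \qquad \left(\frac{1.82}{0.43}\right)^{2} \approx 17.9 \ \text{(1 stratum, 2400 K parameters)}.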

We note that the combination of AOSS with importance sampling has some advantages that are not immediately obvious from the numbers in the tables. For example, stratification on top of importance sampling may make the calculation less sensitive to the detailed form of the tails of the importance sampling function. Also, if we used only one stratum with importance sampling, it would require a larger number of samples to safely optimize the upper limit of the hyperradius.

The use of importance function parameters obtained at 500 K in Table 5 actually gives an academic view of the usefulness of importance sampling, because the importance function was determined at the same temperature at which it is used, and this involves some overhead. To avoid this, one can perform the calculations on a given system in order of decreasing temperature and always use the previous (higher) temperature to obtain the importance sampling function for the current temperature.


Table 6
Partition function (a) at 300 K with 2σ error bars for H2O with and without importance sampling in combination with AOSS

Importance function        Number of     AOSS alone          AOSS with
parameters obtained at     samples                           importance sampling
2400 K                     5 × 10⁷       0.93(7) × 10⁻⁸      1.02(3) × 10⁻⁸
2400 K                     1 × 10⁸       0.95(6) × 10⁻⁸      1.03(2) × 10⁻⁸
500 K                      5 × 10⁷       0.93(7) × 10⁻⁸      1.01(3) × 10⁻⁸
500 K                      1 × 10⁸       0.95(6) × 10⁻⁸      1.05(3) × 10⁻⁸

(a) Parameters are the same as those in footnote a of Table 2. Importance function parameters are the same as those used in Table 5.

To see how useful this might be, Tables 5 and 6 also show results from runs at 500 K in which we used the importance function parameters that were obtained at 2400 K, and results at 300 K in which the importance sampling functions were determined at 2400 or 500 K. By comparing the 2σ error bars for the calculation using importance sampling to those without importance sampling, we see that these importance function parameters also result in a significant reduction in the variance. First of all, notice that there appears to be a systematic error in the "AOSS alone" results due to insufficient sampling, either altogether or in the probe phase; the importance sampling improves this without increasing the size of the sample. Second, importance sampling reduces the error bars; on average, the 2σ error bars decrease by a factor of 3.1, which means that to obtain a comparable error without importance sampling we would have to use about 10 times as many samples. Thus it is not necessary that the importance sampling function be re-optimized for each new temperature. In fact, the importance sampling functions determined at 2400 K actually seem to work slightly better than those determined at lower temperatures. This may at first be surprising, but we should remember that the exact distribution is not precisely a Gaussian function, especially in the wings of the distribution, where (r − r°)^{2n} exp{−[(r − r°)/Δ]²/2} might be a more accurate representation than exp{−[(r − r°)/Δ]²/2}. It is probably safer to err on the side of having the importance sampling function be too broad rather than too narrow, and that is probably why the 2400 K function works better than the 500 K one. The critical bottom line, though, is that both work well, so the method is not overly sensitive to the quality of the importance sampling function.
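A minimal sketch of this cascading strategy is given below: the calculation works down a list of temperatures, and the importance-function parameters fitted at each temperature are reused at the next, lower one. The routines default_parameters, run_fpimc, and fit_importance_parameters are placeholders standing in for the corresponding steps of an actual calculation, not routines in the distributed code.

! Temperature cascade: reuse the importance-function parameters fitted at the
! previous (higher) temperature, so that no separate optimization is needed at
! the temperature of interest.  All routine names and values are placeholders.
program temperature_cascade
  implicit none
  integer, parameter :: ntemp = 4
  double precision   :: temps(ntemp) = (/ 2400.0d0, 1000.0d0, 500.0d0, 300.0d0 /)
  double precision   :: r0(3), delta(3), q, q_err
  integer :: i

  call default_parameters(r0, delta)                      ! broad starting guess at the highest T
  do i = 1, ntemp
     call run_fpimc(temps(i), r0, delta, q, q_err)        ! FPIMC run at temps(i)
     call fit_importance_parameters(temps(i), r0, delta)  ! refit; reused at the next, lower T
  end do

contains

  subroutine default_parameters(r0, delta)
    double precision, intent(out) :: r0(3), delta(3)
    r0 = 1.8d0;  delta = 0.3d0       ! deliberately broad placeholder values
  end subroutine default_parameters

  subroutine run_fpimc(t, r0, delta, q, q_err)
    double precision, intent(in)  :: t, r0(3), delta(3)
    double precision, intent(out) :: q, q_err
    q = 0.0d0;  q_err = 0.0d0        ! placeholder for the actual FPIMC calculation
    print *, 'FPIMC run at T =', t, 'K with parameters from the previous temperature'
  end subroutine run_fpimc

  subroutine fit_importance_parameters(t, r0, delta)
    double precision, intent(in)    :: t
    double precision, intent(inout) :: r0(3), delta(3)
    print *, 'Refitting importance-function parameters at T =', t, 'K'
  end subroutine fit_importance_parameters

end program temperature_cascade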

6. Concluding remarks

A parallel implementation of an improved algorithm for the Fourier path-integral Monte Carlo method has been presented. One improvement is screening out unimportant contributions to the action integral using geometric and energetic considerations; the use of the screening technique allows for considerable speed-up of the calculations. Importance sampling in the coordinate space has also been implemented. Importance sampling in the coordinate space is shown to be effective in variance reduction – in fact even more effective than stratified sampling in the hyperradius. However, importance sampling and stratification complement each other in variance reduction, and so we use them both.

Acknowledgements

We thank Yao-Yuan Chuang for helpful discussions. This work was supported in part by the National Science Foundation under grant no. CHE97-25965.


References

[1] L. Neale, J. Tennyson, Astrophys. J. 454 (1995) L169; H. Partridge, D.W. Schwenke, J. Chem. Phys. 106 (1997) 4618; G.J. Harris, S. Viti, H.Y. Mussa, J. Tennyson, J. Chem. Phys. 109 (1998) 7197.
[2] S. Carter, N. Pinnavaia, N.C. Handy, Chem. Phys. Lett. 240 (1995) 400; J. Koput, S. Carter, N.C. Handy, J. Phys. Chem. A 102 (1998) 6325.
[3] R.P. Feynman, A.R. Hibbs, Quantum Mechanics and Path Integrals (McGraw-Hill, New York, 1965) p. 71.
[4] W.H. Miller, J. Chem. Phys. 63 (1975) 1166.
[5] J.D. Doll, D.L. Freeman, J. Chem. Phys. 80 (1984) 2239.
[6] D.L. Freeman, J.D. Doll, J. Chem. Phys. 80 (1984) 5709.
[7] J.D. Doll, R.D. Coalson, D.L. Freeman, Phys. Rev. Lett. 55 (1985) 1.
[8] R.D. Coalson, J. Chem. Phys. 85 (1986) 926.
[9] R.D. Coalson, D.L. Freeman, J.D. Doll, J. Chem. Phys. 85 (1986) 4567.
[10] D.L. Freeman, J.D. Doll, Adv. Chem. Phys. 70 B (1988) 139.
[11] T.L. Beck, J.D. Doll, D.L. Freeman, J. Chem. Phys. 90 (1989) 5651.
[12] J.D. Doll, D.L. Freeman, T.L. Beck, Adv. Chem. Phys. 78 (1990) 61.
[13] R.Q. Topper, D.G. Truhlar, J. Chem. Phys. 97 (1992) 3647.
[14] R.Q. Topper, G.J. Tawa, D.G. Truhlar, J. Chem. Phys. 97 (1992) 3668.
[15] R.Q. Topper, Q. Zhang, Y.-P. Liu, D.G. Truhlar, J. Chem. Phys. 98 (1993) 4991.
[16] D.L. Freeman, J.D. Doll, Annu. Rev. Phys. Chem. 47 (1996) 43.
[17] R.Q. Topper, in: Monte Carlo Methods in Chemical Physics, Advances in Chemical Physics, Vol. 105, D.M. Ferguson, J.I. Siepmann, D.G. Truhlar (Eds.) (John Wiley and Sons, New York, 1999) p. 117.
[18] P.M. King, in: Computer Simulation of Biomolecular Systems, Vol. 2, W.F. van Gunsteren, P.K. Weiner, A.J. Wilkinson (Eds.) (ESCOM, Leiden, Netherlands, 1993) p. 267; W.F. van Gunsteren, T.C. Beutler, F. Fraternali, P.M. King, A.E. Mark, P.E. Smith, ibid., p. 315; T.P. Straatsma, M. Zacharias, J.A. McCammon, ibid., p. 349.
[19] Computer Simulation in Materials Science, M. Meyer, V. Pontikis (Eds.) (Kluwer Academic Publishers, Dordrecht, The Netherlands, 1991).
[20] M.E. Tuckerman, P.J. Ungar, T. von Rosenvinge, M.L. Klein, J. Phys. Chem. 100 (1996) 12878.
[21] H.-P. Cheng, R.N. Barnett, U. Landman, Chem. Phys. Lett. 237 (1995) 161.
[22] D. Marx, M. Parrinello, Nature 375 (1995) 216.
[23] B.C. Garrett, D.G. Truhlar, J. Phys. Chem. 83 (1979) 1052.
[24] G.K. Schenter, J. Chem. Phys. 108 (1998) 6222.
[25] J.M. Hammersley, D.C. Handscomb, Monte Carlo Methods (Methuen, London, 1965).
[26] M.H. Kalos, P.A. Whitlock, Monte Carlo Methods, Vol. I: Basics (John Wiley and Sons, New York, 1986).
[27] G.E.P. Box, M.E. Muller, Ann. Math. Stat. 29 (1958) 610.
[28] W.H. Press, W.T. Vetterling, S.A. Teukolsky, B.P. Flannery, Numerical Recipes in Fortran: The Art of Scientific Computing, 2nd edn. (Cambridge University Press, Cambridge, 1992).
[29] I.M. Sobol', A Primer for the Monte Carlo Method (CRC Press, Boca Raton, FL, 1994).
[30] G. Guralnik, C. Zemach, T. Warnock, Inform. Process. Lett. 21 (1985) 17.
[31] V. Kumar, A. Grama, A. Gupta, G. Karypis, Introduction to Parallel Computing (Benjamin/Cummings, Menlo Park, CA, 1994).
[32] Message Passing Interface Forum, MPI: A Message-Passing Interface Standard (version 1.1), Technical report (1995). http://www.mpi-forum.org.
[33] A. Srinivasan, D.M. Ceperley, M. Mascagni, in: Monte Carlo Methods in Chemical Physics, Advances in Chemical Physics, Vol. 105, D.M. Ferguson, J.I. Siepmann, D.G. Truhlar (Eds.) (John Wiley and Sons, New York, 1999) p. 13.
[34] SPRNG library, http://www.ncsa.uiuc.edu/Apps/SPRNG.
[35] ISO/IEC, Information Technology – Programming Languages – Fortran, ISO/IEC 1539:1991 (E) (ISO/IEC Copyright Office, Geneva, Switzerland, 1991).
[36] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, D. Sorensen, LAPACK Users' Guide (SIAM, Philadelphia, 1992).
[37] E. Kauppi, L. Halonen, J. Phys. Chem. 94 (1990) 5779.
[38] G. Herzberg, Molecular Spectra and Molecular Structure II. Infrared and Raman Spectra of Polyatomic Molecules (D. Van Nostrand, Princeton, 1945) p. 180.