Quantum monte carlo

Optimum and efficient sampling for variational quantum Monte CarloJ. R. Traila! and Ryo MaezonoSchool of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa 923-1292,Japan

!Received 8 June 2010; accepted 23 August 2010; published online 4 November 2010"

Quantum mechanics for many-body systems may be reduced to the evaluation of integrals in 3Ndimensions using Monte Carlo, providing the Quantum Monte Carlo ab initio methods. Here welimit ourselves to expectation values for trial wave functions, that is to variational quantum MonteCarlo. Almost all previous implementations employ samples distributed as the physical probabilitydensity of the trial wave function, and assume the central limit theorem to be valid. In this paper weprovide an analysis of random error in estimation and optimization that leads naturally to newsampling strategies with improved computational and statistical properties. A rigorous lower limit tothe random error is derived, and an efficient sampling strategy presented that significantly increasescomputational efficiency. In addition the infinite variance heavy tailed random errors of optimumparameters in conventional methods are replaced with a Normal random error, strengthening thetheoretical basis of optimization. The method is applied to a number of first row systems andcompared with previously published results. © 2010 American Institute of Physics.#doi:10.1063/1.3488651$

I. INTRODUCTION

The accurate computational solution of the many-bodySchrödinger equation for fermionic systems is an importantand only partially solved problem in modern chemistry.Monte Carlo !MC" integration provides one of the most pow-erful methods available to address this problem, and hasbeen implemented in many different ways.1 All methods takea reformulation of the problem in terms of multidimensionalintegrals, such as a variational principle or propagators, andemploy MC methods for the evaluation of the integrals in-volved. The strength of MC integration is that it providessuperior scaling of accuracy with the number of integrandevaluations. For example, for r evaluations of a smooth in-tegrand the MC error scales as %1 /r1/2, independent of thedimensionality of the system !D", whereas for order-p poly-nomial interpolation on an evenly sampled grid the errorscales as %!1 /r"p/D.2

In this paper we limit ourselves to the variational quan-tum Monte Carlo !VMC" method. Expectation values fortrial wave functions are estimated directly, with a trial wavefunction constructed to explicitly include correlation and forwhich the dimensionality of integrals cannot be reduced ana-lytically. Typically a trial wave function is expressed in termsof a large number of unknown parameters, and a variationalprinciple invoked to provide the “best” possible trial wavefunction. The optimized quantity is estimated using MC.Higher accuracy quantum Monte Carlo !QMC" methods areavailable, but VMC itself provides a high level of accuracy,is often the starting point of other QMC methods, and otherQMC methods include MC integration in a similar manner.

For MC integration a key choice is the distribution ofrandom samples used to construct the statistical estimate.

Such a choice is not unique and different choices result indifferent distributions of the random error inherent in theestimate of the integral. For a MC estimate to be useful theunderlying distribution from which it is drawn must beknown both in form and scale via the limit theorems of prob-ability theory. This is not always possible. Within VMC thecharacterization of random errors is further complicated bythe physical quantities being quotients of estimated integrals.

In light of this the main body of this paper is concernedwith characterizing the statistical properties of estimates thatarise from different choices of sampling strategy, and fallsinto three main sections. We start by considering estimates ofexpectation values only, with no optimization of parameters,and for general sampling. A large class of distributions ofrandom error is shown to arise, the most desirable being theNormal distribution that occurs where the central limit theo-rem !CLT" is valid in its bivariate form. Conditions are pro-vided for when this occurs, and estimates are provided forthe mean and variance of this Normal distribution. Requiringa Normal random error still allows a wide range of samplingdistribution, and alternatives to the usual method used in theVMC literature !referred to here as “standard sampling”" arediscussed. An optimum sampling distribution is derived thatprovides a lower bound to the random error possible for agiven sample size, and a benchmark for assessing the effec-tiveness of other sampling strategies. An efficient samplingdistribution is constructed with the aim of significantly re-ducing the computational resources required to achieve agiven random error. For both optimum and efficient sam-plings, errors are Normal.

Next we address the issue of trial wave function optimi-zation. A number of methods exist for achieving an iterativeimprovement of an initial wave function within VMC, whichmay be interpreted as seeking the minimum !or zero" of amultivariate function that is random in the sense that it isa"Electronic mail: [email protected].

THE JOURNAL OF CHEMICAL PHYSICS 133, 174120 !2010"

0021-9606/2010/133"17!/174120/16/$30.00 © 2010 American Institute of Physics133, 174120-1

Downloaded 27 Sep 2012 to 131.111.62.210. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/about/rights_and_permissions

http://dx.doi.org/10.1063/1.3488651

http://dx.doi.org/10.1063/1.3488651

drawn from an ensemble of possible random functions. Themethods used to find the optimum of a sample function areof secondary importance, and we focus on the distribution ofthe position of the minimum of the random function, a ran-dom variable itself. This viewpoint naturally divides the op-timization process into taking a sample random function andnumerically optimizing to some level of accuracy !a “cycle”of optimization", then taking a new sample random functionand optimizing again !repeated cycles of optimization". Wediscuss the distributions of random error in the sample opti-mum trial wave function, how much information limit theo-rems provide, and the performance of different samplingstrategies.

Finally, the efficient sampling strategy is used for theoptimization and estimation of the electronic energies of firstrow isolated atoms and molecules, with the systems chosenfor comparison with previously published ab initio resultsand accurate approximations for the exact energies. Our aimis to demonstrate the quality of results that can be obtainedusing new sampling strategies in VMC, complicated trialwave functions, and modest computational resources.

Sans-serif is used for random variables such as X, with asample value denoted as X. The expectation of a randomvariable is denoted by E#X$, and a random estimate of somequantity y !usually an integral or a quotient of two integrals"is denoted by y, and is a sample value of some randomvariable Y. The integration variable of a probability densityfunction !PDF" is usually left implicit, but where it is explicitwe use x, y, or u for scalar, and R for vector arguments.Atomic units are used throughout, unless otherwise indi-cated.

II. GENERALIZED SAMPLING

We begin with the expectation value of the total energy,Etot, of a N electron system !a generalization to differentoperators and particles is straightforward". For a givenHamiltonian and trial wave function, H and !, this takes theform

Etot =& !2EL#!$dR/& !2dR , !1"

where the local energy is defined as EL#!$=!!1H! and R isthe 3N dimensional vector composed of the positions of allelectrons.

A general sampling approach to VMC consists of defin-ing a random position vector R characterized by some dis-tribution and drawing r independent samples of this randomvariable. Writing the distribution as

Pg = "!2/w !2"

is convenient in what follows, with w as a general !positive"weight function defined in the 3N dimensional space of theintegrals. The normalization prefactor, ", is not required togenerate samples using the Metropolis algorithm,1 and is in-cluded in order to clarify that expressions for estimates areinvariate with respect to it. Note that it is unrelated to thephysical normalization of the trial wave function unlessw=1.

Estimating both integrals using MC provides an estimateof the total energy as

Etot ='wEL

'w, !3"

but for this estimate to be useful, its distribution must beknown. Since this quantity is the quotient of two samplemeans, the CLT is not directly applicable.

The sample weights in this equation may be renormal-ized such that their sum is equal to the number of samples byintroducing a new weight, w!=rw /'w, so reducing the num-ber of random variables by a factor of 2 and reducing Eq. !3"to the sample mean

Etot =1r ' w!EL. !4"

Unfortunately a price is paid for this apparent simplificationof the problem. In order for the CLT to be valid !at least in itssimplest form" it is necessary for the samples to be indepen-dent and identically distributed. For the reformulation givenabove neither of these conditions is satisfied since eachsample value of w! is composed of all the original samples,(w). Although a sample mean and sample standard error existthe CLT does not relate these to the mean and variance of aNormal distribution, or ensure that the distribution is Nor-mal.

A more successful approach is to consider the estimateexplicitly as a quotient of two sums, hence maintaining theindependence and identical distribution of the random vari-ables sampled in each sum, and including the parametricrelationship between each sample wEL and w. We begin byobtaining the co-distribution of the numerator and denomi-nator of Eq. !3", so include all correlations between them.The bivariate CLT informs us that in the large sample sizelimit !r!1'wEL ,r!1'w" is a sample drawn from a bivariateNormal distribution3 with PDF,

Pr!x2,x1" =1

2#

r1/2

*C*1/2e!1/2q2, !5"

characterized by a mean, !$2 ,$1",

!$2,$1" = !E#wEL$,E#w$" , !6"

a covariance matrix, C, with elements

c22 = var#wEL$, c12 = c21 = cov#wEL,w$, c11 = var#w$ ,

!7"

and

q2 = r+x1 ! $1

x2 ! $2,T

C!1+x1 ! $1

x2 ! $2, . !8"

Although the elements of each sample !wEL ,w" are causallyrelated to each other, the only aspect of this that survives inthe large sample size limit is the partial linear correlationcharacterized by c12.

From this bivariate Normal distribution the distributionof the quotient of sums can be expressed using the standardformulas4

174120-2 J. R. Trail and R. Maezono J. Chem. Phys. 133, 174120 "2010!


Pr!u" = ! &!%

0

x1Pr!x2 = ux1,x1"dx1

+ &0

+%

x1Pr!x2 = ux1,x1"dx1, !9"

which, in the large sample size limit, reduces to

Pr!u" =r1/2

-2#. !c11$2 ! c12$1"u + !c22$1 ! c12$2"

!c11u2 ! 2c12u + c22"3/2 .

&exp/!r

2!$2 ! $1u"2

!c11u2 ! 2c12u + c22"

0 !10"

for general $1. Since the weights are necessarily positive,$1'0 and the large sample size limit further reduces to

Pr!u" =1

-2#

1(E

exp/!!u ! Etot"2

2(E2 0 , !11"

a Normal distribution with mean

Etot =1!2ELdR

1!2dR!12"

and variance

(E2 =

1r

1

$12/c22 ! 2+$2

$1,c12 + +$2

$1,2

c110=

1r

1!2/wdR1w!2!EL ! Etot"2dR#1!2dR$2 . !13"

Replacing the means by their unbiased estimates pro-vides the total energy estimate

Etot ='wEL

'w!14"

as a sample drawn from a Normal distribution with a meanequal to the exact total energy expectation value. The vari-ance is defined by Eq. !13", and introducing standard unbi-ased estimates for the means and covariance matrix elementsgives the standard error estimate as

(E2! =

r

r ! 1'w2!EL ! Etot"2

!'w"2 . !15"

The usual interpretation of the accuracy of the estimates interms of confidence limits and standard errors then followsdirectly, provided we use these two formulas.

We finish with some comments on the properties andlimitations of these estimates. The estimates cannot be de-rived using the univariate CLT, and the standard error is nota sample variance divided by a sample size. For Pg to be avalid PDF it is necessary for !2 /w to be positive and nor-malizable. For the bivariate CLT to be valid the integral defi-nitions of the mean vector and covariance matrix must exist,which cannot be shown numerically—the situation is analo-gous to the assumptions of smoothness that are required toprovide error limits for numerical integration on a grid. Per-haps most importantly, for an exact trial wave function (E

2!=0 for any number of samples and for any choice of sam-pling distribution, hence generalized sampling possesses the

“zero-variance” property in that there is no statistical error inthe total energy estimate of an eigenfunction of the Hamil-tonian.

The above equations inform us that estimated expecta-tion values are Normal provided that the covariance exists.In the rest of this paper we repeatedly seek estimates forwhich this is the case; hence here we summarize why this isdesirable and what replaces the Normal distribution when thecovariance is not defined.

The generalized CLT !Ref. 5" states that in the large rlimit, the average of independent and identically distributedrandom variables drawn from a distribution P possesses ei-ther a Normal or Stable distribution. !Throughout this paperwe refer to a Normal distribution as distinct from a Stabledistribution. This is for brevity, and Stable distributions areoften defined to include the Normal distribution as a specialcase." Which of these arises depends on the properties of P.For tails of P decaying faster than *x*!3 the strongly localizedNormal distribution occurs which is characterized by a meanand variance that can be estimated. For tails of P decayingslower than *x*!3 a weakly localized Stable distributions oc-curs which is characterized by four parameters, which cannotall be estimated. For the Normal distribution the PDF isGaussian and the width of the distribution scales as r!1/2,whereas for Stable distributions the PDF asymptotically ap-proaches a power law and its width scales slower than r!1/2,is constant, or increases with r depending on the tails of P.

The Stable case is undesirable for two reasons. Estimatesare not available for the parameters of the distribution; hencecharacterizing the error via confidence intervals is notstraightforward. Furthermore, even if parameters could beestimated, the presence of power law tails causes the size ofthe confidence interval to increase with the level of confi-dence considerably faster than the CLT case. An example ofthis is shown in Fig. 1. The probability that a sample fallsoutside a centered region is shown as a function of the size ofthe region, for a Normal distribution and for a Stable distri-bution. The Normal distribution is chosen to have a mean

S(32 , 0, 1, 0; 0)

N(0, 1)

2x

P(|X

|>x)

10410010!2

100

10!5

FIG. 1. Probability that a sample falls outside a centralized interval of size2x for N!0,1", a Normal distribution, and for an example Stable distribution,S! 1

2 ,0 ,1 ,0 ;0" !Ref. 5" that is symmetric with x!5/2 power law tails and mean0 !the mean exists for this stable distribution". This form of stable distribu-tion is typical of the distributions that occur for an invalid CLT in standardsampling QMC.

174120-3 Optimum and efficient samplings for VMC J. Chem. Phys. 133, 174120 "2010!


and variance of zero and unity. The Stable distribution shownis symmetric, has *x*!5/2 tails, has a mean that exists and isequal to zero, and has a “scale” parameter !the analog of thevariance for the Normal case" chosen to be unity. There is norelationship between the variance and scale of the two dis-tributions, and they are chosen to be equal in this figure forconvenience only. Such a probability asymptotically ap-proaches e!!1/4"x2

/x for the normal case, with outliers remain-ing persistent for the Stable distribution due to the *x*!3/2

asymptote. Where an estimate is composed of a quotient ofsample averages and covariances are not defined, a similarsituation arises; a bivariate generalized CLT is required, abivariate Stable distribution results,5 power law tails arise,and estimates of error bars are unavailable.

The notion of using a sampling distribution such thattotal energy estimates require weighted averages is inher-ently part of the reweighted sampling of optimization, diffu-sion Monte Carlo !DMC", and other QMC methods, but isusually analyzed by incorrectly applying the univariate CLTto numerically renormalized samples. Explicit generalizedsampling has been used in VMC by a few authors,6–9 andColdwell first presented a formula for the variance estimatethat differs from Eq. !15" only in that it does not include theprefactor that corrects for bias in the estimated covariancematrix. Similarly, Assaraf et al.10 employed the same expres-sion in the context of DMC, again without the correctionprefactor.

The new results provided here are proofs that Eq. !14" isan estimate that is normally distributed provided that thecovariance matrix exists, and that if this is so, the mean ofthe distribution is the total energy and Eq. !15" estimates thevariance with the appropriate bias correction. Note that re-quiring the covariance to exist refers to the integral defini-tion, not to sample estimates since the later are always finite.Next we consider particular choices of sampling distribution,Pg.

A. Standard sampling

Choosing w=1 over all space corresponds to the usualimplementation of VMC, sampling the coordinate space as

Ps = "!2, !16"

which gives the estimate

Etot =1r ' EL, !17"

with standard error

(E2! =

1r!r ! 1" ' !EL ! Etot"2. !18"

Perhaps the greatest advantage of this choice is the simplealgebraic form of the estimates, and that they may be derivedusing the univariate CLT only. However, the previous con-sideration of generalized sampling suggests that lower ran-dom errors may be possible for other sampling distributions,and that similar random errors may occur for other samplingdistributions.

The standard sampling choice also has some further de-ficiencies, as discussed in a previous paper,11 due to the nec-essary existence of a 3N!1 dimensional nodal hypersurfacewhere ! is zero. Such zeroes are usually accompanied bysingularities in the sampled local energy on these surfaces!this is not necessarily the case, indeed is not the case for anexact wave function, but is usually unavoidable", and theanalogous “local” quantities averaged to estimate expecta-tion values for other operators. Such singularities are purelyan artifact of zeroes in Ps, and cause the distribution of esti-mates for many physical quantities to be a Stable distributionwith power law tails. For standard sampling the total energyestimate is somewhat unusual in that CLT is valid.11,12 Formost other quantities, such as the derivatives used for forcesand in wave function optimization, this is not so.

B. Optimum sampling

For a given trial wave function and Hamiltonian, therandom error depends on sample size and the sample distri-bution characterized by the weight function w. The randomerror for a given system and number of samples must begreater than or equal to zero, hence a weight function mustexist that supplies the smallest possible random error in anestimate. So, the general standard error defined by Eq. !13"should possess minimum with respect to variations in theweight. Seeking )(E

2#w$ /)w=0 gives

w2 =1

!EL ! Etot"2

1!2w!EL ! Etot"2dR1!2/wdR

, !19"

whose solution !excluding w=0" gives the optimum sam-pling distribution,

Popt = "!2*EL ! Etot* , !20"

which is consistent with the optimal sampling distributionfor estimating the energy differences between two systemsprovided by Ceperley et al.13

Equation !20" provides a lower bound to the achievableaccuracy in that (E*(opt for all sampling strategies, with

(opt =1

r1/2& !2*EL ! Etot*dR , !21"

a scaled mean-absolute deviation of the local energy. This isrelated to the more familiar standard error of standard sam-pling,

(s =1

r1/2/& !2!EL ! Etot"2dR01/2

, !22"

by (opt+(s !from Jensen’s inequality". Estimates are readilyavailable for both of these integrals.

This lower limit is analytically correct, but does not pro-vide a satisfactory distribution for the implementation of op-timum sampling. For the weight defined in Eq. !20" the co-variance matrix is undefined unless EL#Etot for any R, aspecial condition that we cannot reasonably expect to be gen-erally satisfied. Consequently the bivariate CLT will not bevalid for most systems.14

Even worse, evaluating the PDF requires the exact Etot,which is generally not available as it is the quantity whose



value is sought by the MC calculation. A natural approxima-tion is to replace the exact Etot with some previous nonexactestimate. Unfortunately this worsens the statistics consider-ably, resulting in an “estimate” that does not statistically con-verge to a constant with increasing sample size.

An improved definition of optimum sampling is re-quired. We take a self-consistent approach in order to allow

Etot to be unknown and postpone the large sample size limitas late as possible in the derivation in order to ensure that thecovariance matrix exists. Fiellers’s theorem15 provides a con-venient reformulation of the distribution of a quotient of bi-variate Normal random variables in terms of the confidenceintervals, #ll , lu". These intervals are defined by the conciseexpression

ll,u =!r$1$2 ! q2c12" , -!r$1$2 ! q2c12"2 ! !r$1

2 ! q2c11"!r$22 ! q2c22"

r$12 ! q2c11

!23"

for confidence erf!q /-2", for example, q=1 provides the68.3% confidence interval. As long as $1'0, such a descrip-tion is entirely equivalent to the Normal distribution speci-fied by Eqs. !12" and !13" since in the large r limit it definesthe interval Etot,q(E for confidence erf!q /-2".

Minimizing the size of the confidence interval 2(F= lu! ll using )(F

2 /)w=0 provides

0 = 2(F2+1 !

1r

1

$12c11, )

)w!c11" +

1r

1

$12

)

)w!c12

2 ! c11c22"

+)

)w!c22 ! 2Etotc12 + Etot

2 c11" . !24"

Solving the large r limit of this equation reproduces Eq. !20",but instead we approximate this equation by introducing an apriori total energy estimate and error that are constant withrespect to r, that is replace !Etot ,(F" with a !E0 ,-". The larger limit then gives

0 = 2-2 )

)w!c11" +

)

)w!c22 ! 2E0c12 + E0

2c11" , !25"

and solving for w gives

Popt = "!2#!EL ! E0"2 + 2-2$1/2. !26"

This distribution is optimum in the sense that it is the solu-tion of the best available approximation to the equation thatdefines optimum sampling.

In principle, the estimates could be calculated iterativelyby supplying !Etot ,(F" from one calculation as !E0 ,-" for thenext, and proceeding to convergence. Since !E0 ,-" would notconverge to the exact estimate, the covariance matrix re-mains defined and the bivariate CLT is valid. In practicalapplication we go no further than the first iteration, usingrelatively inaccurate total energies such as those provided byan optimization process.

A poor value of !E0 ,-" does not bias the total energyestimate, which converges to the true value as long as -#0.Similarly, accurate values for E0 are not necessary as long asthis is reflected in the accompanying error, -, in that for large- this sampling strategy becomes equivalent to standard sam-

pling. However, it should be borne in mind that the randomerror can be greater than that for standard sampling if - un-derestimates the accuracy of E0.

As an aside, we note that it is possible to construct asampling distribution that provides a better value of the totalenergy than this optimum sampling strategy, but such sam-pling strategies are not MC estimates. For example, a Pg forwhich all the probability is located on the hypersurface EL=Etot is such a distribution, but neither the mean nor thecovariance matrix is defined and the “estimate” is entirelymade up of bias. Hence the algorithm simply returns thevalue of Etot supplied at the start. Although the optimumstrategy described above does requires some estimate of thetotal energy !and indeed its error" this does not skew thedistribution or introduce a systematic bias. Sampling as Eq.!26" is referred to as optimum sampling in what follows, as itprovides the best available approximation to the lowest pos-sible random error for a given sample size.

C. Efficient sampling

Optimum sampling provides a useful lower limit to thestatistical error achievable for a given system and, in prac-tice, it is possible to get very close to this limit using rela-tively inaccurate estimates of !E0 ,-". However, optimumsampling will not necessarily be efficient as no account hasbeen taken from the computational cost of generating each ofthe samples, and we seek a further sampling strategy forwhich the statistical error available for a given computationalcost is low.

For VMC, the algorithm usually used to draw samplesfrom a distribution is the generation of a Markov chain viathe Metropolis algorithm. This is well documented inliterature1 and can provide an ordered list of sample vectorsdrawn from a list of correlated random vectors, each distrib-uted as Pg, and for which the degree of correlation betweenpairs in the list falls exponentially as they become moreseparate. Disregarding enough of these samples results in anew list for which correlation is negligible, providing !effec-tively" independent and identically distributed random vec-tors. This is the set (R)r used to construct MC estimates.

Consequently, the computational cost of generating aVMC estimate is composed of evaluating Pg at mr accept/



reject steps and evaluating EL at r of these steps. The totalcomputational cost is T=mTPg

+TEL, where TPg

is the com-putational cost of evaluating Pg, and TEL

is the computationalcost of evaluating the local energy, suggesting that the avail-able accuracy may be improved by reducing TPg

and increas-ing the total number of samples, r.16 Our goal is to seek an“efficient” sampling strategy for which the PDF is computa-tionally cheap to evaluate, the bivariate CLT is valid, and thestandard error for a given number of samples is low. As apragmatic measure of success we use the statistical accuracyachievable for a fixed computational budget.

A simple, but not generally successful approach is toapproximate standard sampling by choosing a single Slaterdeterminant .SD2! #such as a Hartree–Fock !HF" Slaterdeterminant$ and to sample using

PSD = ".SD2 . !27"

For the associated estimates to be useful the bivariate CLTmust be valid, and the Appendix describes the analysis usedto ascertain whether this is so, essentially a more generalform of that described in Ref. 11. To summarize, both!wEL ,w" possess a singularity at the nodal surface of .SDthat cause the covariance of !wEL ,w" to be undefined. Con-sequently, each sample mean is itself a sample drawn from aStable distribution with a mean, with x!5/2 power law tails,and with no variance. Although an estimate constructed us-ing Eq. !14" is computationally cheap and does converge tothe total energy expectation value in the large sample sizelimit, no information is available about the distribution oferrors. An important exception occurs if the nodal surfaces of! and .SD coincide, for which the bivariate CLT is valid, thetotal energy estimate is Normal, and the standard error canbe estimated.

In light of this we seek a sample distribution that doesnot possess a nodal surface, yet shares many of the propertiesof the trial wave function and is arithmetically simple incomparison with both standard and optimum samplings. Thisis not enough to uniquely define a sampling distribution, andwe propose the form

Peff = "#*.1!R"*2 + *.2!R"*2$ , !28"

where .1 and .2 are Slater determinants !or configurationstate functions" with different nodal surfaces. A source ofsuch determinants might be the dominant components of amultideterminant wave function, or ground and excited statessupplied by HF.

The distribution of Eq. !28" has the desirable qualitiesthat it is computationally cheap to evaluate relative to manyof the trial wave functions used in QMC, it is not zero on any3N!1 dimensional nodal surface, it is zero on the 3N!3dimensional coalescence hyperplanes, and it reproduces ex-ponential tails for electrons in bound systems. A physicalinterpretation of this expression would be somewhat tenuousand not particularly useful, and the distribution should beconsidered solely as a compromise between analytic proper-ties and ease of calculation. The analysis of the Appendixinforms us that no singularities are present in either of!wEL ,w", power law tails do not arise, and both the meanvector and covariance matrix exist. Consequently, this distri-

bution provides a total energy estimate that is normally dis-tributed and whose variance can be estimated. Sampling us-ing Eq. !28" is referred to as efficient sampling in whatfollows.

To compare the performance of efficient, standard, opti-mum, and Slater determinant !SD" sampling total energy es-timates, these were calculated for an isolated all-electron car-bon atom in the 3P nonrelativistic ground state. A version ofthe CASINO !Ref. 17" package was used, with modificationsintroduced to enable generalized sampling. Of the variety oftrial wave functions available for use in QMC we employ theform18,19

! = eJ!R" 'n=1

NCSF

/n.n!X" , !29"

that is made up of a multideterminant expansion, Jastrowprefactor, and backflow transformation, hence allows varia-tion of the nodal surface. The Jastrow factor, J!R", includeselectron-electron, electron-ion, and electron-electron-ionterms !see Ref. 20". Backflow is included via the nonlineartransformation R!X!R" made up of electron-electron,electron-ion, and electron-electron-ion terms !see Ref. 21".Configuration state functions !CSFs" for the multidetermi-nant expansion, .n, are obtained from the multiconfigura-tional Hartree–Fock !MCHF" atomic-structure packageATSP2K,22 and are made up of numerical orbitals and a re-duced active space that includes CSFs in order of increasingweight. To summarize, the Jastrow factor is characterized by167 parameters, backflow is characterized by 161 param-eters, and the 550 determinant expansions of 52 CSFs arecharacterized by 51 parameters. This trial wave function is ofessentially the same form as Ref. 23, but with the number ofparameters systematically increased. Samples were takenfrom the Metropolis random walk every m=64 steps, a valuechosen as the lowest power of 2 for which correlation wasstatistically undetectable.

Values for wave function parameters were obtained us-ing optimization as described in the next section. We ask thereader to bear with this somewhat inverted order since avalid sampling strategy for estimates is a necessary prereq-uisite for optimization even though in application optimiza-tion must be performed first.

For a fixed sample size there is a modest variation in thestandard error #Eq. !15"$. When compared with standardsampling, efficient sampling increases the error by 29%, op-timum sampling reduces the error by 37%, and SD samplingprovides no standard error. However, computational costdoes vary significantly. Figure 2!a" shows the evolution oftotal energy estimates with computational time. The standarderror for a given computational cost is lowest for efficientsampling, and greatest for optimum sampling, with no con-fidence interval available for SD sampling !only the estimateis shown". The SD sampling estimate does appear to ap-proach a constant value, but this is likely to be illusory sinceestimates for different sample sizes are strongly correlated.

Figure 2!b" shows the evolution of the estimated errorwith computational time for all four sampling strategies, witherrors obtained from Eq. !15". Efficient, standard, and opti-



mum samplings show a consistent 0Tcpu!1/2 evolution with a

large variation in the prefactor between different samplingstrategies. For SD sampling the result of an incorrect appli-cation of Eq. !15" shows no consistent power law behaviorwith sporadic discontinuous jumps introduced by outliers,and although confidence intervals do exist for the underlyingdistribution, they are not related to the quantity shown in thefigure. Sample sizes and estimates obtained by allocating afixed computational resource to each sampling strategy aregiven in Table I. Compared to standard sampling, efficientsampling increases the number of samples by &34, reducingthe error by &1/5, whereas optimum sampling decreases thenumber of samples by &1/7, increasing the error by &2.

The performance improvement of efficient samplingwhen compared to standard sampling varies between sys-tems, and some limits are available. For r samples distributedas Peff, and assuming that the cost of evaluating the PDF isnegligible when compared with evaluating the trial wavefunction, the speedup with respect to standard sampling isthe ratio

Ts

Teff2 !m ! 1"

T!2

TEL

+ 1. !30"

Since T!2 /TEL11 the speedup is limited to less than m-fold,

with few electron systems closest to this limit. The oppositelimit occurs as the number of electrons in the system is in-creased, where T!2 /TEL

!0 or no speedup occurs. For all

cases speedup increases with m, with none occurring for m=1. How close we are to each of these limits dependsstrongly on the relative computational cost of wave functionevaluation and local energy evaluation !which includes wavefunction evaluation", so it must be considered on a case bycase basis. For example, for the Li, C, and Ne2 calculationsperformed later in this paper standard sampling requires&22, &20, and &5 more computational resource to achievethe same accuracy as efficient sampling.

III. OPTIMIZATION

In the previous section the trial wave function, !, wastaken as given and the random error in total energy estimatesconsidered. In this section ! is taken to be variable, and weaddress the role of randomness in the process of searchingfor optimum parameters using MC estimates. This can beviewed as extending random MC from estimates of singlevalued quantities !such as the total energy" to estimates ofsmooth functions !such as total energy as a function of wavefunction parameters", and seeking the minimum or zero ofsuch a random function.

It has long been appreciated that using independentVMC estimates for different parameter values results in adiscontinuous surface, hence the powerful methods availablefor continuous surfaces are not directly applicable. This isusually avoided by using “correlated sampling,”1 where a setof sample positions, (R)r, is generated from a VMC calcula-tion using initial parameters, and a MC estimate derived orproposed that provides a continuous sample function drawnfrom an underlying ensemble of functions and whose mini-mum may be found using standard numerical methods.Cycles of optimization are repeated with fresh sample func-tions, some convergence criteria are applied, and the final setof parameters is supplied to a final VMC estimate of expec-tation values.

This introduces random error into the final estimate intwo distinct ways. The final VMC estimate for a given set ofparameters has a random error, as described in the previous

PSD

Popt

Ps

Peff

(a) Tcpu (h)

! Eto

t±

"! E(a

.u.)

100500

0.01

0PSD

Ps

Popt

Peff

Tcpu (h)(b)

"! E(a

.u.)

100101

10!2

10!3

10!4

FIG. 2. Evolution of total energy estimates with computational time. !a" shows the evolution of confidence intervals for efficient, standard, and optimumsamplings. No confidence interval is available for SD sampling, and only the estimated value is shown. An arbitrary constant offset has been added for eachsampling strategy to aid clarity. !b" shows the evolution of standard errors for efficient, standard, and optimum samplings. For SD sampling the quantity shownarises from evaluating the standard error estimate when it is incorrect to do so, and is unrelated to confidence intervals.

TABLE I. Total energy estimates for fixed computational time and opti-mum, standard, efficient, and SD samplings. No standard error is availablefor SD sampling and the number of significant figures is unrelated to ran-dom error.

Type r Etot

Popt 430 096 237.8435!2"Ps 3 193 192 237.8436!1"Peff 106 829 024 237.843 61!2"PSD 136 971 968 237.843 607



section. However, the parameters used are also samplesdrawn from a random distribution !they are the positions ofthe zero of a random function" and introduce a random errorthat is additive since optimization and estimation are inde-pendent of each other.

For the specific example of total energy, the estimatesare sample values of

Eopt = Etot + eVMC + eopt, !31"

where Etot is the exact optimum total energy allowed by thevariational freedom of the trial wave function, eVMC is therandom error in a VMC total energy estimate performed withthe optimized parameters, and eopt is the random error due tothe optimization being performed on a random function. Ifno MC estimation was required in the optimization processthen eopt=0, but eVMC would remain nonzero. Similarly, ifthe final total energy evaluation was exact eVMC=0, the MCoptimization would still give a nonzero eopt.

The random variable eVMC was discussed in the previoussection, and is well understood in that it is possible to ensurethat it is normal and to estimate a standard error for it. How-ever, eopt cannot be dealt with in the same way since it in-volves the random variation in the shape of the sample func-tion. It would be useful obtain some information about thedistribution of eopt, how it varies with the number of samplesand parameters, and what limit theorems are applicable to it.In what follows we present a preliminary analysis that pro-vides some of this information.

It is worthwhile to note that optimization to numericalconvergence within a cycle takes a large number of functionevaluations, and most of this computational effort is wastedsince the optimized function itself has a random error. Thiscan be easily avoided by relaxing convergence criteria, forexample, limiting the number of iterations allowed withineach cycle can significantly improve computational effi-ciency. In what follows we denote the number of samplesused for estimation and optimization as rVMC and ropt, withrVMC*ropt due to the sample electronic positions used inoptimization being supplied by a VMC estimate calculation.

A. Optimization with a sample function

A variety of functions are estimated for use in VMCoptimization. Here we provide a brief description of theirform, their distribution, and how they generalize to non-standard sampling.

The family of methods loosely referred to as “varianceminimization” seeks a minimum of a function, g!a", of thegeneral form

g!a" =& f · !EL!a" ! E"2dR/& f · dR ,

!32"

E =& f · EL!a"dR/& f · dR ,

characterized by f , a non-negative function of coordinate, R,and by the trial wave function characterized by parameters a.

For general sampling, the MC estimate used to define theoptimization surface takes the form24

g!a" ='w!a"!EL!a" ! E"2

'w!a",

!33"

E ='w!a"EL!a"

'w!a",

where the weight and distribution are related by Pg="f /wwith Pg independent of a. The estimated function g!a" has alower bound of zero, and obeys a zero-variance principle inthat both the estimate and its gradient are exactly zero for anexact trial wave function. These two properties allow stan-dard numerical methods to be stable and successful. How-ever, the estimate of the random surface is not necessarilynormally distributed.

Standard sampling implementations of such methodscorrespond to choosing Pg="!2!a0", where a0 are the initialparameters of a cycle of optimization. For this case theanalysis of the Appendix informs us that for most choices off the bivariate CLT fails, with the distribution of g!a" pos-sessing power law tails, no variance, and in some cases nomean. The closely related “mean-absolute deviation” optimi-zation functions can be obtained by replacing the squareddeviation in Eq. !33" with an absolute deviation,8 and infinitevariance distributions arise for the same reasons.

For either case a Normal distribution of errors occursonly for “artificial weighting,” where f is modified such thatit approaches zero at the nodal surface sufficiently fast toremove singularities in the sampled quantity. However, thisapproach does not seem entirely satisfactory since the failureof the CLT is due to standard sampling, not to properties ofthe integral whose value is estimated. Furthermore, more ad-vanced QMC methods specifically require trial wave func-tions with accurate nodal surfaces; hence artificially sup-pressing the contribution of this region to the optimizedquantity is not desirable.

Minimization of the total energy expectation value for atrial wave function is perhaps the most desirable choice as itinvolves a physical variational principle. Consequently wefocus on total energy optimization by generalizing the “linearoptimization method”25,26 to generalized sampling, and pro-vide a statistical analysis of the method.

The total energy estimate for samples distributed as Pg="!2!a" /w!a" !with Pg independent of a" is given by

Etot!a" ='w!a"EL!a"

'w!a", !34"

a sample surface drawn from a distribution of random sur-faces. Optimization on this surface is usually unsuccessfulsince it is not bounded from below, and that although theenergy estimate itself is “zero-variance,” its gradient is not.

Hermiticity of the Hamiltonian operator naturally pro-vides a variety of estimates for the total energy gradient,including a zero-variance estimate. The cost of choosing thezero-variance gradient estimate is that its gradient !the Hes-sian" is not a symmetric matrix; hence the estimated gradientis not itself a derivative of a sample surface.

For notational simplicity we introduce a new parameter



vector, b= !b0 ,a1 , . . ."T, that includes a normalization factorvia !!b"=b0!!a" to give the generalized sampling zero-variance gradient estimate as

Ei!1"#!!b"$ = 2.'w!!1!i#EL#!$ ! Etot#!$$

'w.

b, !35"

where gradients in parameter space are denoted as Ei!1"

= !$E!b""i and !i= !$!!b""i, and parameters have been madeimplicit.

Linear optimization proceeds by seeking the zero of Eq.!35" using successive first order expansions of the trial wavefunction. Expansion about parameters bn provides this as

!n+1!bn + !bn+1" = !bn+1 · $!n!bn" . !36"

Replacing ! with !n+1 in the condition Ei!1"#!!b"$=0 gives

H#bn$!bn+1 = -nS#bn$!bn+1, !37"

an eigenvalue equation with matrix elements given by theMC estimates,

Hij = *' w!0!2!i! jEL#! j$*bn, Sij = *' w!0

!2!i! j*bn.

!38"

Solving for the lowest eigenvalue provides updated param-eter values,

bn+1 =1

b0n + 3b0

n+1 !bn + !bn+1" . !39"

If this process is iterated to self-consistency, then

Ei!1"#!!bn+1"$ = 0,

!bn+1 = !1,0, . . ."T, !40"

-n+1 = .'wEL#!$'w

.bn

,

where the parameters converge to those for which the esti-mated gradient is zero.

To ensure that such a method is robust we use the semi-orthogonalization, rescaling, and level-shift modifications asdeveloped by Toulouse and Umrigar,26 which easily transferto generalized sampling, and normalize the diagonal of Sij tounity to provide an energy scale for level-shift parameters.Note that these modifications do not alter the self-consistentsolution, but are necessary to stabilize the method.

The convergence behavior of this approach requiressome attention. For the special case where the variationalfreedom of the trial wave function includes the exact eigen-state, then it is clear that self-consistency occurs for a zerogradient estimate and a minimum energy estimate at thesame parameter values. However, the two will not generallycoincide since the gradient estimate is not usually the gradi-ent of the total energy estimate. Furthermore, the gradientestimate is not the derivative of a sample surface so there isno reason to expect a point to exist where the gradient iszero. So, self-consistency need not occur. Limiting the opti-mization to a few !usually one" iteration within each cycle

and monitoring convergence across cycles provide an effec-tive control for this indefiniteness. In what follows we do nottake this into account explicitly, and assume that its effectcan be entirely accounted for by allowing for correlation be-tween cycles and a modification of eopt that becomes negli-gible close to statistical convergence.

The distribution of errors of the estimated gradient func-tion !whose zero is sought" and of the estimated matrix ele-ments can be assessed as described in the Appendix. Forstandard sampling, Stable distributions arise for estimateswith a mean, with x!5/2 power law tails, and with no vari-ance. For the gradient estimate this occurs even for a station-ary nodal surface. The total energy estimate performs better,in that it is Normal for a stationary nodal surface and for theinitial parameters of a cycle, but not otherwise. Similarly,most of the matrix elements are samples drawn from a Stabledistribution with x!5/2 tails.

Employing efficient or optimum sampling of Sec. I re-moves the nodal surface in the sampling distribution, henceestimates of the total energy, total energy gradient, and ma-trix elements are all Normal.27

B. Random errors in optimization

Given the distribution of the gradient estimate we maycharacterize the optimization random error, eopt. In principle,the route toward this is straightforward. For both varianceand linear optimization of the total energy we seek a zerovalue of a smooth yet random gradient, that is solutions ofthe random equation

g!1"!aopt" = 0. !41"

Samples of this random vector function are defined by g!1"

=$g!a" #see Eq. !33"$ for variance minimization, and by!g!1""i= Ei

!1" #see Eq. !35"$ for linear optimization. Solving therandom equation defines the “random optimum” parameters,aopt, and solving the associated estimate of the gradient equa-tion provides sample values of the random optimum param-eters. Introducing these random parameters into the total en-ergy then defines the random optimization error as

eopt = Etot#aopt$ ! Etot#a0$ , !42"

where a0 is the exact minimum of the total energy with re-spect to parameter variation. It is immediately apparent thateopt*0 and E#eopt$*0. It is also clear that the random errorin optimum parameters is related to the random error in thederivative of the minimized function, not to the random errorin the function itself.

Deriving the distribution of eopt is not simple, with theprimary difficulty due to inversion of the random vectorfunction, and here we offer a perturbative solution. We beginby expanding the random vector function about the trueminimum, a0,

g!1"!a" = g!1"!a0" + g!2"!a0"!a ! a0" + ¯ , !43"

where each coefficient in the expansion is a random variable.Truncating to first order and solving for g!1"=0 gives therandom deviation from the exact optimum as



3a = aopt ! a0 = ! #g!2"$!1 · g!1"*a0. !44"

To simplify this expression further we rotate and rescale pa-rameters such that E#g!2"!a0"$=I, and expand the matrix/vector product to first order in the deviation of each randomvariable from its mean value to give

3a = ! g!1" + #g!2" ! I$E#g!1"$*a0, !45"

a linear combination of random variables.Expanding the exact total energy to second order about

the true minimum and introducing these random optimumparameters gives

eopt = 123aT · I · 3a , !46"

a sum of the squares of Np correlated random variables. Pro-vided that both g!1" and g!2" possess a mean and a variance,then it follows that eopt possesses a mean and a variancegiven by

E#eopt$ =12'

iE#3ai

2$ ,

!47"

var#eopt$ =14'

ij!E#3ai

23a j2$ ! E#3ai

2$E#3a j2$" ,

where 3ai= !3a"i. These equations allow us to deduce someproperties of the distribution of eopt for different samplingdistributions.

For both efficient and optimum sampling, g!1"!a" is Nor-mal, hence g!1"!a0", g!2"!a0", and 3a are multivariate Nor-mal. Writing the covariance matrix elements of this distribu-tion as Cij /ropt !they arise from the bivariate CLT asdiscussed in the previous section" and limiting ourselves tototal energy minimization !where the mean of the randomdeviation is zero" then gives the co-moments28

E#3ai$ = 0,

E#3ai2$ = Cij/ropt, !48"

E#3ai23a j

2$ = !CiiCjj + 2Cij2 "/ropt

2 ,

which may be introduced into Eq. !47" to give the first twomoments of eopt as

E#eopt$ = !2ropt"!1'i

Cii,

!49"var#eopt$ = !2ropt

2 "!1'ij

Cij2 .

Provided that correlation between each 3ai is weak enoughthe distribution of eopt will be a generalized 42 distributionand approach Normal in the large Np limit, but we do notassume this to be the case.

This analysis informs us that provided the mean andvariance exist, the optimization error can be systematicallydecreased by employing more samples. However, increasingNp causes two conflicting effects. The number of covariancesin each sum increases, contributing a linear increase in theoptimization error. In competition to this all covariances are

reduced by the extra variational freedom in the trial wavefunction. The second of these effects will not necessarilydominate over the first, hence added variational freedom willnot necessarily improve VMC results.

For standard sampling normality is absent from the firststage of the above analysis since g!1"!a" is not normally dis-tributed and possesses no variance. So, the random optimumparameters are not normally distributed, Eq. !47" is unde-fined, and the distribution of eopt possesses neither a meannor a variance. Although we cannot conclude that it pos-sesses a Stable distribution, we can conclude that the distri-bution from which it is drawn possesses power law tails thatdecay as x!2 or slower, and that it approaches a )!x" functionwith increasing ropt.

A number of points should be made to clarify this analy-sis. Exact quantities used above are not available for an ac-tual calculation, but this does not influence the conclusionsince we only require these quantities to exist. Similarly, thedefinition of parameters such that the exact Hessian is theidentity matrix only simplifies notation. A more serious issueis that the above analysis is not a proof due to the truncationof series expansions to first and second orders. However, thisperturbative approach is often successful for Normal randomvariables, hence the analysis can be considered as stronglysuggestive.

To summarize, the random error due to optimization isgreater than zero, may increase with increasing number ofparameters, and is unrelated to the random error in the totalenergy estimate for a single set of parameters. For efficientor optimum sampling, the mean and standard deviation ofthis optimization error both scale as ropt

!1 . For standard sam-pling no such result arises and the optimization error isdrawn from an unknown distribution with no mean or vari-ance.

Since optimization with efficient !or optimum" samplingprovides more control over random errors this suggests thatit will perform better than standard sampling. The analysisalso suggest that !for efficient or optimum sampling" the sta-tistical quality of parameters can be improved by performingcycles of optimization until convergence is achieved, andaveraging the parameters provided by a n further cycles, soreducing the mean and standard deviation of eopt by a factorof 1 /n. In a sense this is equivalent to optimizing with nrsamples and not averaging, but has the advantage of allow-ing faster convergence that can be monitored.

We consider optimization of the isolated all-electron car-bon atom described in Sec. I using efficient sampling. Figure3 shows the evolution of estimates during optimization usingefficient sampling. Twenty cycles of optimization are shown,with ropt varying between 384 and 80 000 !there are 377 freeparameters", and with total energy estimates evaluated fromrVMC=2 200 000 samples. As ropt is increased, the conver-gence rapidly becomes stable. Estimates arising from ropt

=14 800 and ropt=80 000 are statistically indistinguishable,and for both the final nine estimates possess 97.3% confi-dence intervals that overlap with the confidence intervals ofthe lowest energy estimate. Essentially the eopt becomes un-



detectable in the presence of eVMC for rVMC%150ropt, anequal allocation of computational resource to VMC estima-tion and optimization.

We expect eopt to be worse for standard sampling, in thatpower law tails and outliers occur. Exploratory calculationsshow this to be so for small sample sizes, for no robuststabilization, or for including several iterations in each cycle.However, for modest sample sizes, including robust stabili-zation, and employing only one iteration per cycle, the opti-mization error remains undetectable at the lower statisticalresolution of standard sampling. It seems that the primaryadvantages offered by efficient sampling are the improvedaccuracy for estimates and the theoretical justification foraveraging converged sets of parameters.

Figure 4 shows the results of standard and efficient op-timizations for an isolated O atom, with the same computa-tional resource used for each. The final VMC total energyestimate was constructed using 50% of the resource, 25%was used to generate samples and total energy estimates forcycles, and 25% was used for the optimization itself. Thiscorresponds to samples proportioned between final estimate,

monitor estimate, and optimization as 220:5:1 for standardsampling and 8946:186:1 for efficient sampling.

Efficient sampling performs significantly better thanstandard sampling for all the total energy estimates, prima-rily by reducing the random error by &1/7. For efficientsampling the final estimate is consistently below the lessaccurate estimates arising in optimization, but such an im-provement is not clearly discernible for standard sampling.We take this combination of efficient sampling, equipartitionof computational cost, and averaging of statistically indistin-guishable sets of parameters and apply it to generate theresults of the next section.

IV. RESULTS AND DISCUSSION

The approach was applied to a group of first row sys-tems, the isolated atoms, homonuclear diatomic molecules,diatomic hydrides, and the diatomic molecules LiF, CN, CO,and NO. This collection of systems was chosen to allowcomparison of total energy estimates with recently publishedab initio total energies and accurate approximate total ener-gies in literature.

For any VMC calculation the achievable statistical andsystematic error is unavoidably a compromise between finitecomputational resources and the incompleteness of availablewave function parametrization. To make our compromise wechose to allocate the same computational resource to eachsystem, divided between optimization and estimation as inthe previous section.29 Trial wave functions of the form de-scribed in Sec. II, Eq. !29", were constructed with expansionorders chosen to include as much variational freedom as pos-sible while obtaining a usefully low statistical error.

For both Jastrow and backflow the number of parameterswas set so as to employ polynomials two-orders less than theonset of numerical error in polynomial evaluation. Multide-terminant expansions were constructed as sums of uncoupledspatial and spin symmetry eigenstates, that is as CSFs, withoptimized coefficients. For isolated atoms the CSF expansionwas obtained using numerical orbitals from ATSP2K !Ref. 22"as for C in Sec. I. For the diatomic molecules no MCHF

cycle

! Eto

t(a

.u.)

20100

-37.6

-37.7

-37.8

FIG. 3. Evolution of total energy estimates with optimization for all-electron C and using efficient sampling. Four optimization processes areshown using 384, 1480, 14 800, and 80 000 samples in order of decreasingEtot. All total energy estimates are generated using 2 200 000 samples.

cycle(a)

! Est

d(a

.u.)

20151050

-75.05

-75.055

-75.06

-75.065

cycle(b)

! Eef

f(a

.u.)

20151050

-75.05

-75.055

-75.06

-75.065

FIG. 4. Evolution of total energy estimates with optimization for O, for both standard and efficient sampling. !a" shows the evolution of estimates usingstandard sampling and !b" shows the evolution of estimates using efficient sampling, with the same total computational cost of each. The gray regions are theconfidence interval for the final total energy estimate constructed using an average of converged parameters and equal computational cost.



package appears to be available, hence ground state andsome low energy excited state numerical orbitals were gen-erated using the Hartree–Fock package 2DHF,30 and CSFsconstructed from these using single and double excitations ofvalence electrons. A somewhat arbitrary upper limit to thenumber of CSFs was used, aiming for 500 and 50 determi-nants for isolated atoms and molecules, but with a largeamount of variation in the later due to the finite number ofexcited bound states.

These trial wave functions offer a large degree of varia-tional freedom, with between 246 and 455 optimizable pa-rameters present !for Ne2 and BeH, respectively". The mostobvious deficiencies are that orbital relaxation is not in-cluded, and that for the diatomic molecules the chosen mul-tideterminant expansion is not self-consistent. Table II sum-marizes the properties of each trial wave function.Optimization and estimation of these trial wave function pro-ceeded as in Sec. II, with 20 cycles of optimization followedby averaging of converged parameters for use in the finaltotal energy estimate calculation. Overall optimizationshowed similar stability to that shown previously for C andO, providing between one and ten converged sets of param-eters for averaging.

Table III shows the resulting total energy estimates. Forcomparison of these results with other methods we focus onthe variation in the fraction of total correlation energy recov-ered for each system, taking Restricted Hartree-Fock !RHF"energies from numerical calculations using ATSP2K and 2DHF,

and approximate exact total energies from the sources speci-fied in the table.

The lowest variational total energy estimates for firstrow isolated atom in literature appear to be those of Brown etal.23 They employed standard sampling, linear optimization,and trial wave function of the same form as Eq. !29" but withless variational freedom, mostly due to a smaller multideter-minant expansion. Our VMC results consistently fall be-tween the VMC and DMC estimates of this paper, are con-sistently closer to DMC than VMC estimates of this paper,and provide a small but significant recovery of between 0.1%and 1.5% more correlation energy at VMC level.

Toulouse and Umrigar31 also provided isolated atom en-ergies using both VMC and DMC, based on linear optimiza-tion of trial wave function using standard sampling. Theirtrial wave functions were composed of RHF or Multi-Configurational Self-Consistent Field !MCSCF" orbitals rep-resented using a Slater basis set, with orbital relaxation, aJastrow factor, and one or two CSFs. Our VMC energiesconsistently improve on the estimates in this paper, providingbetween 0.3% and 12.5% more correlation energy than theVMC estimates, and between 0.0% and 5.6% more correla-tion than DMC estimates. The improvement is entirely con-sistent with the contribution to correlation provided by back-flow and multideterminant expansions discussed in Ref. 23.

Toulouse and Umrigar31 also provided VMC and DMCtotal energies for the first row homonuclear diatomic mol-ecules, with trial wave functions constructed in the same

TABLE II. Summary of parameters for the isolated atoms and diatomic molecules. Experimental bond lengthsused are from Ref. 31 for the homogeneous diatomic molecules and from Ref. 32 for the rest.

Ne

D!a.u." Term Ndet NCSF Nparam

Li 3 ¯ 2S 502 86 326Be 4 ¯ 1S 137 29 339B 5 ¯ 2P 521 43 368C 6 ¯ 3P 550 52 377N 7 ¯ 4S 458 25 350O 8 ¯ 3P 472 41 366F 9 ¯ 2P 553 29 354Ne 10 ¯ 1S 499 14 324Li2 6 5.051 15g

+ 120 34 275Be2 8 4.65 15g

+ 39 11 252B2 10 3.005 35g

! 61 16 257C2 12 2.3481 15g

+ 69 17 258N2 14 2.075 15g

+ 35 10 251O2 16 2.283 35g

! 12 8 249F2 18 2.668 15g

+ 7 6 247Ne2 20 5.84 15g

+ 6 5 246LiH 4 3.0139 15+ 31 15 452BeH 5 2.5372 25+ 30 18 455CH 7 2.1163 26 25 13 450NH 8 1.9581 35! 21 9 446OH 9 1.8324 26 25 13 450FH 10 1.7328 15+ 61 13 450LiF 12 2.9553 15+ 62 14 451CN 13 2.2144 25+ 35 8 445CO 14 2.1321 15+ 37 9 446NO 16 2.1746 26 53 14 451



manner as the isolated atoms but allowing larger multideter-minant expansions for some of the molecules. The differencebetween results is not as consistent as for the isolated atoms.Our VMC results recover between 6.7% less and 5.4% morecorrelation energy, with a consistent improvement as electronnumber is increased and the crossover occurring between B2and C2. The same trend occurs when comparing with DMCresults, with our VMC energies higher than the DMC resultsof Ref. 31 for all molecules except Ne2. This is consistentwith the isolated atom behavior and the observation that ourmultideterminant expansion is inferior in that it is not ob-tained from a self-consistent calculation. This reduction incorrelation energy is apparent for the small systems only,with the contribution to correlation energy provided by back-flow becoming dominant as electron number increases.

Curtiss et al.36 provided total energies for a large set ofmolecules arrived at using the Gaussian-4 !G4" theory, astate-of-the-art perturbation method. These provide a usefulcontext for the VMC results presented here, provided webear in mind that the comparison is less clear cut since G4errors may be positive or negative due to it being a nonva-riational method. Of the 26 VMC results in this paper G4total energies are available for 23. Of these, our VMC totalenergies are below !above" those of G4 for 17 !6" of thesystems, with the VMC recovering between 25.8% more and4% less correlation energy than the G4 results. The isolated

atom energies are consistently lower for VMC, but beyondthis very little consistent behavior is evident beyond the gen-eral observation that VMC energies are lower more oftenthan not, and where the VMC energy is greater than G4, thedifference is small.

Finally, we compare the correlation recovered by VMCbetween the systems considered. Perhaps the most strikingfeature is the superior performance for isolated atoms whencompared with the diatomic molecules. This can partly beascribed to the superior determinants used for the isolatedatoms when compared with diatomic molecules !the latterare made up of fewer determinant and are not self-consistentat the multideterminant level", but it seems unlikely that thisis enough when we consider the wide variation in the amountof correlation recovered. A rough observation is that correla-tion is most effectively described for those systems wherethe electronic behavior around a nucleus is most similar to anisolated atom or ion, for example, for the ionic hydrides suchas LiH, or the very weakly bonded molecules such as Ne2.This suggests the possibility that “missing correlation” maywell be due to Jastrow and backflow being limited to radialfunctions, and that it may be useful to replace these withfunctions that allow more geometric freedom. This must in-evitably involve introducing more parameters, but the com-bination of efficient sampling, linear optimization, and pa-rameter averaging employed here appears to be able tohandle large parameter sets reliably and efficiently.

V. CONCLUSIONS

Fermionic variational quantum Monte Carlo is almostalways implemented by drawing sample values of particlecoordinates from the distribution Ps="!2, where ! is thetrial wave function whose expectation values are sought—the standard sampling of this paper. We have shown that thisis an ad hoc choice, that the only special property of thischoice is the simplicity of the resulting expressions for esti-mates, and that it has some undesirable properties.

An analysis of VMC using a general sampling distribu-tion is described. We provide the conditions that must besatisfied for the resulting VMC estimates to be normally dis-tributed, conditions that are easily violated by sample distri-butions that unnecessary possess a nodal surface. For thecase where these conditions are satisfied, general estimatesare derived for the parameters of the underlying Normal dis-tribution, so providing estimates, confidence limits, and errorbars. When the conditions are not satisfied, the underlyingdistributions are shown to be of a known but undesirableform that are heavy tailed and that are not characterized by amean and variance. Obtaining estimates of the parameters ofsuch distributions is considerably more difficult than the nor-mal case, and even if such estimates were available the useof error bars is less informative than the Normal case.

This analysis is extended to VMC optimization in a lessrigorous manner to provide similar conditions that must besatisfied in order that the random fluctuation in optimizedparameters is Normal, and that the error in a total energy dueto the randomness of these parameters possesses a mean andvariance. Applying these conditions to the special case of

TABLE III. Total electronic energies for isolated atoms and diatomic mol-ecules. The table presents VMC estimates obtained as described in thispaper !EVMC". Approximate exact total energies for isolated atoms are fromRef. 33 for Li and Ref. 34 otherwise. Approximate exact total energies arefrom Ref. 31 for Be2, B2, C2, and Ne2, and from Ref. 35 otherwise.

EVMC

!a.u."Eexpt

!a.u."Ecorr

!%"

Li 27.478 052!2" 27.478 06032 99.981!5"Be 214.667 243!3" 214.667 36 99.876!4"B 224.653 29!1" 224.653 91 99.503!8"C 237.843 61!2" 237.8450 99.11!1"N 254.586 41!4" 254.5892 98.52!2"O 275.060 58!5" 275.0673 97.39!2"F 299.726 23!8" 299.7339 97.64!3"Ne 2128.9299!1" 2128.9376 98.02!3"Li2 214.983 98!4" 214.9951 91.00!3"Be2 229.318 85!7" 229.3380 90.60!4"B2 249.3837!1" 249.415 90.34!4"C2 275.8893!2" 275.9265 92.83!4"N2 2109.4981!3" 2109.5421 91.99!6"O2 2150.2843!4" 2150.3267 93.57!6"F2 2199.4819!4" 2199.5303 93.60!5"Ne2 2257.8517!5" 2257.8753 96.99!6"LiH 28.069 08!2" 28.0704 98.41!2"BeH 215.244 07!2" 215.2468 97.08!3"CH 238.468 42!5" 238.4788 94.78!3"NH 255.208 63!8" 255.2227 94.24!3"OH 275.7222!1" 275.7371 95.29!4"FH 2100.4446!2" 2100.4592 96.24!4"LiF 2107.4177!2" 2107.4344 96.22!4"CN 292.6755!2" 292.7250 90.10!5"CO 2113.2845!3" 2113.3261 92.23!6"NO 2129.8457!3" 2129.9047 90.27!5"



standard sampling informs us that although VMC total en-ergy estimates for fixed parameters are normally distributed,almost all of the quantities that appear in optimization arenot.

Two new sampling strategies are presented, both ofwhich ensure a Normal random error wherever possible. Op-timum sampling is obtained by deriving the sample distribu-tion for which the Normal random error is as small as pos-sible for a given sample size. This naturally provides a lowerlimit to the random error possible for a given system, and anaccurate approximation to this distribution is derived.

Efficient sampling is designed by attempting to avoid alarge part of the computational cost of providing samplesusing the Metropolis algorithm. The resulting sample distri-bution function is computationally cheap to evaluate, has thecorrect properties for estimates and optimization functions tobe normally distributed, and does not perform significantlyworse than standard or optimum sampling for a givensamples size.

Of the two, efficient sampling provides the most usefulimprovement over standard sampling by increasing the num-ber of samples possible for a fixed computational resource byan order of magnitude and significantly decreasing the avail-able random error. In addition, the analysis of optimum andefficient samplings suggests improved statistical propertiesfor optimization when compared to standard sampling. How-ever, no convincing numerical evidence that this is the casefor real systems was obtained, mainly due to the greatercomputational cost of standard sampling, the small randomerror in optimization compared to estimation, and that anyrigorous test would require a statistically significant numberof optimization calculations to be compared. Perhaps themost useful results provided by this analysis is a theoreticaljustification for averaging sets of optimized parameters inorder to reduce the optimization random error.

Variational quantum Monte Carlo using efficient sam-pling and optimization was applied to evaluate the electronictotal energy for 26 isolated atoms and molecules, using trialwave function with a great deal of variational freedom and afixed and modest computational budget divided equallyamong calculations. At VMC level more than 97% of corre-lation energy was recovered for first row isolated atoms, andmore than 90% for a small set of diatomic molecules, resultsthat significantly improve on many previously publishedVMC results and approaching DMC for some systems.

As systems increase in size the efficient sampling strat-egy is likely to become less advantageous, but this has notoccurred for the systems considered so far. When this doesoccur it should be possible to employ further sampling dis-tributions that perform better for these systems, such as op-timum sampling, or the distribution used by Attaccalite andSorella9 for a dense extended system.

ACKNOWLEDGMENTS

The authors thank R. J. Needs of the University of Cam-bridge for many useful discussions. Financial support wasprovided by the Special Coordination Funds for PromotingScience and Technology, “Promotion of environmental im-

provement to enhance Young Researchers’ independence,and make use of their abilities” #“Development of PersonnelSystem for Young Researchers in Nanotechnology and Ma-terials Science” !Japan Advanced Institute of Science andTechnology"$, and by a Grant-in-Aid for Scientific Researchin Priority Areas “Development of New Quantum Simulatorsand Quantum Design !No. 17064016"” !Japanese Ministry ofEducation,Culture, Sports, Science, and Technology;KAKENHI-MEXT".

APPENDIX: POWER LAW TAILS FROMSINGULARITIES AT THE NODAL SURFACE

Throughout this paper it is repeatedly stated that in orderto characterize the form of the distribution of an estimate, allthat is required is a knowledge of the behavior of a sampleand its distribution in the region of the nodal surface of thetrial wave function. This appendix describes how this isdone, and is a summary and generalization of the methodapplied to the standard case in a previous publication.11

A MC estimate is constructed by drawing r samplesfrom a distribution in 3N dimensional coordinate space,P!R", evaluating a functional at each of these samples,xL!Ri", and then taking a sample mean, that is the integral

X =& P!R"xL!R"d3NR !A1"

is estimated as a sample value of the random variable Xgiven by

X =1r ' xL!R" . !A2"

It is necessary for the central limit theorem to be valid inorder to know the distribution of X and to characterize therandom error in terms of confidence intervals for a particularvalue of the estimate. A valid CLT requires the first andsecond moments of x=xL!R" to exist, where x is a randomvariable whose distribution is critically dependent on thepresence of singularities in the function xL!R".

The distribution of x can be obtained from that of Rusing the standard formula

P!x" = &x=xL!R"

P!R"*$RxL*

d3N!1R , !A3"

where the integral is taken over all surfaces of constant xL=x. Here we consider the case where xL and P!R" possesssingularities and/or zeroes on a 3N!1 dimensional hypersur-face, the particular case that repeatedly occurs within QMCdue to the existence of the nodal surface in fermionic wavefunctions.

Introducing a general curvilinear coordinate system interms of surfaces of constant xL provides a new coordinatesystem !s ,T", with s as a scalar set to zero on the 3N!1dimensional hypersurface of interest and T as an implicit3N!1 vector contained in the hyperplane tangential to theconstant xL surface at R. Expanding the distribution aboutthe hypersurface of interest !usually the nodal surface ofsome trial wave function" in this new coordinate systemgives



P!s,T" = sm#a0!T" + a1!T"s + ¯$ !A4"

and analogously xL becomes

xL!s,T" =1sn #b0 + b1s + ¯$ , !A5"

with !m ,n" characterizing the zeros and singularities of Pand xL.

In terms of !s ,T" the distribution of x is given by

P!x" =& P!s,T". ds

dxL.d3N!1T , !A6"

so in the limit s!0,

P!x" 0sm

1/*sn+1*!A7"

and in the limit x! ,%,

P!x" " + 1*x*,!m+n+1"/n

. !A8"

The existence of positive and/or negative tails, and their pos-sible equality, is entirely decided by the continuity of P!R" atthe hypersurface of interest, and the symmetry of the singu-larity in xL. An example of this analysis arises for the totalenergy estimate via standard sampling, for which !m ,n"= !2,1", hence P possesses asymptotic tails that decay as x!4,and the CLT is valid for the estimate since the second mo-ment exists.

For the generalized sampling a normally distributed es-timate requires a valid bivariate CLT, and this is decided bythe existence of the covariance matrix. The random estimateis

Z =YX

='yi

'xi, !A9"

with each (yi) and (xi) independent and identically distrib-uted, with variables in the two sets independent for i# j,co-distributed as P!y ,x" for i= j. The validity of the bivariateCLT requires the covariance matrix of P!y ,x" to exist, andsince this matrix is positive definite, we only require all di-agonal matrix elements to exist—the variance of both yi andxi. Consequently, we only need to consider the distribution ofeach random variable independently of the other, given byEq. !A3" for xi and the equivalent expression for yi, and theunivariate analysis given above can be applied. Note that yiand xi are parametrically related to each other !via the un-derlying random spatial variable" so P!y ,x" is nonzero onlyon a line in two dimensional space, but this does not preventthe existence of the covariance matrix or the continuity ofP!Y ,X" as long as more than one sample appears in eachsum.

An example arises for optimization with standard sam-pling, for which !m ,n"= !2,2" for both of !yi ,xi" as soon asthe nodal surface moves away from its starting position. So,

P!y ,x" possesses positive asymptotic tails that decay as y!5/2

and x!5/2 for !yi ,xi", neither of the variances are defined, thecovariance matrix is not defined, and the estimate Z is notnormal in the large r limit. Generally, for P!y ,x" decayingslower than either *x*!3 or *y*!3 in the asymptotic limit abivariate stable distribution results, and although a generali-zation of Feller’s theorem could be constructed, it cannotprovide a normally distributed estimate for the quotient, orestimates of confidence limits.

1 W. M. C. Foulkes, L. Mitas, R. J. Needs, and G. Rajagopal, Rev. Mod.Phys. 73, 33 !2001".

2 J. F. Traub and A. G. Wershulz, Complexity and Information !CambridgeUniversity Press, Cambridge, 1998".

3 D. W. Stroock, Probability Theory: An Analytic View !Cambridge Uni-versity Press, Cambridge, 1993".

4 J. H. Curtiss, Ann. Math. Stat. 12, 409 !1941".5 G. Samorodnitsky and M. S. Taqqu, “Stable Non-Gaussian Random Pro-cesses !Chapman & Hall, Florida, 1994"; see also http://academic2.american.edu/jpnolan/stable/stable.html.

6 M. Dewing and D. M. Ceperley, Recent Advances in Quantum MonteCarlo Methods II !World Scientific, Singapore, 2002".

7 R. L. Coldwell, Int. J. Quantum Chem. S11, 215 !1977".8 S. A. Alexander, R. L. Coldwell, H. J. Monkhorst, and J. D. Morgan III,J. Chem. Phys. 95, 6622 !1991".

9 C. Attaccalite and S. Sorella, Phys. Rev. Lett. 100, 114501 !2008".10 R. Assaraf, M. Caffarel, and A. Khelif, J. Phys. A: Math. Theor. 40, 1181

!2007".11 J. R. Trail, Phys. Rev. E 77, 016703 !2008".12 J. R. Trail, Phys. Rev. E 77, 016704 !2008".13 D. Ceperley, M. Dewing, and C. Pierleoni, Bridging Time Scales: Mo-

lecular Simulations for the Next Decade !Springer-Verlag, Berlin Heidel-berg, 2002".

14 Finite (opt for an infinite density matrix may occur by a cancellation ofnonintegrable singularities in the sum of integrals in Eq. !13", #c22!2!$2 /$1"c12+ !$2 /$1"2c22$.

15 U. von Luxburg and V. H. Franz, Confidence Sets for Ratios: A PurelyGeometric Approach To Fieller’s Theorem !133", Max Planck Institutefor Biological Cybernetics, Tübingen, Germany, 2004; E. Fieller, Bi-ometrika 24, 428 !1932".

16 For standard sampling we replace Pg with !2, and may replace m with!m!1".

17 R. J. Needs, M. D. Towler, N. D. Drummond, and P. López Ríos, J.Phys.: Condens. Matter 22, 023201 !2010".

18 A. Badinski, J. R. Trail, and R. J. Needs, J. Chem. Phys. 129, 224101!2008".

19 A. Badinski, P. D. Haynes, J. R. Trail, and R. J. Needs, J. Phys.: Condens.Matter 22, 074202 !2010".

20 N. D. Drummond, M. D. Towler, and R. J. Needs, Phys. Rev. B 70,235119 !2004".

21 P. López Ríos, A. Ma, N. D. Drummond, M. D. Towler, and R. J. Needs,Phys. Rev. E 74, 066701 !2006".

22 C. Froese Fischer, G. Tachiev, G. Gaigalas, and M. R. Godefroid, Com-put. Phys. Commun. 176, 559 !2007".

23 M. D. Brown, J. R. Trail, P. L. Rios, and R. J. Needs, J. Chem. Phys.126, 224110 !2007".

24 These are biased estimates. The bias is small and equal to (E2! /r defined

by Eq. !15", provided the bivariate CLT is valid. We do not use unbiasedestimates to define the minimized quantity since they do not change thediscussion in this section, but do complicate the equations for generalsampling.

25 M. P. Nightingale and V. Melik-Alaverdian, Phys. Rev. Lett. 87, 043401!2001".

26 J. Toulouse and C. J. Umrigar, J. Chem. Phys. 126, 084102 !2007".27 This is not entirely obvious since Eq. !35" includes a total energy esti-

mate, hence the averaged variables are neither independent or identicallydistributed. However, the bivariate CLT remains valid due to the contri-bution of correlation between samples at different times decaying relativeto the contribution of samples at the equal times, as sample size in-creases. A derivation is analogous to showing that variance estimatesbecome Normal.

28 M. Lax, W. Cai, and M. Xu, Random Processes in Physics and Finance



http://dx.doi.org/10.1103/RevModPhys.73.33

http://dx.doi.org/10.1103/RevModPhys.73.33

http://dx.doi.org/10.1214/aoms/1177731679

http://academic2.american.edu/jpnolan/stable/stable.html.

http://academic2.american.edu/jpnolan/stable/stable.html.

http://dx.doi.org/10.1063/1.461532

http://dx.doi.org/10.1103/PhysRevLett.100.114501

http://dx.doi.org/10.1088/1751-8113/40/6/001

http://dx.doi.org/10.1103/PhysRevE.77.016703


http://dx.doi.org/10.1088/0953-8984/22/2/023201

http://dx.doi.org/10.1088/0953-8984/22/2/023201

http://dx.doi.org/10.1063/1.3013817

http://dx.doi.org/10.1088/0953-8984/22/7/074202

http://dx.doi.org/10.1088/0953-8984/22/7/074202

http://dx.doi.org/10.1103/PhysRevB.70.235119


http://dx.doi.org/10.1016/j.cpc.2007.01.006

http://dx.doi.org/10.1016/j.cpc.2007.01.006

http://dx.doi.org/10.1063/1.2743972

http://dx.doi.org/10.1103/PhysRevLett.87.043401

http://dx.doi.org/10.1063/1.2437215

!Oxford University Press, New York, 2006".29 The computational budget for each system was 24 h of computational

time for monitored optimization !12 h for optimization and 12 h forsample generation" followed by 24 h for the final accurate estimate, withcalculations performed on a modest desktop with 2 quad-core processors.

30 L. Laaksonen, P. Pyykko, and D. Sundholm, Comput. Phys. Rep. 4, 313!1986".

31 J. Toulouse and C. J. Umrigar, J. Chem. Phys. 128, 174101 !2008".

32 F. Ruette, M. Sanchez, R. Anez, A. Bermudez, and A. Sierraalta, J. Mol.Struct.: THEOCHEM 729, 19 !2005".

33 M. Puchalski and K. Pachucki, Phys. Rev. A 73, 022503 !2006".34 S. J. Chakravorty, S. R. Gwaltney, E. R. Davidson, F. A. Parpia, and C.

F. Fischer, Phys. Rev. A 47, 3649 !1993".35 D. P. O’Neil and P. M. W. Gill, Mol. Phys. 103, 763 !2005".36 L. A. Curtiss, P. C. Redfern, and K. Raghavachari, J. Chem. Phys. 126,

084108 !2007".



Documents

Quantum monte carlo