
Block-Adaptive Quantum Mechanics: an adaptive divide-and-conquer approach to interactive quantum chemistry

Maël Bosson, Sergei Grudinin, Stephane Redon
NANO-D - INRIA Grenoble - Rhone-Alpes

655, avenue de l’Europe, 38335 Saint-Ismier Cedex, France

February 28, 2014

Abstract

We present a novel Block-Adaptive Quantum Mechanics (BAQM) approach to interactive quantum chemistry. Although quantum chemistry models are known to be computationally demanding, we achieve interactive rates by focusing computational resources on the most active parts of the system. BAQM is based on a divide-and-conquer technique, and constrains some nucleus positions and some electronic degrees of freedom on the fly to simplify the simulation. As a result, each time step may be performed significantly faster, which in turn may accelerate attraction to the neighboring local minima. By applying our approach to the non-self-consistent ASED-MO (Atom Superposition and Electron Delocalization Molecular Orbital) theory, we demonstrate interactive rates and efficient virtual prototyping for systems containing more than a thousand atoms on a standard desktop computer.

Keywords: Interactive Quantum Chemistry, Reduced Basis, Adaptive, Divide-And-Conquer, ASED-MO.


Block-Adaptive Quantum Mechanics (BAQM) is a new approach to interactive quantum chemistry. BAQM is based on a divide-and-conquer technique, and constrains some nucleus positions and some electronic degrees of freedom on the fly to simplify the simulation. By applying our approach to the non-self-consistent ASED-MO theory, we demonstrate interactive rates and efficient virtual prototyping for systems containing more than a thousand atoms on a standard desktop computer.


1 Introduction

The fundamental Schrödinger equation for nuclei and electrons is a fascinating problem that has been attracting a lot of attention in the computational chemistry community. In theory, solving this equation makes it possible to accurately describe the behavior of particles at the atomic scale. Thus, it seems natural that software applications for computer-aided design (CAD) of nanosystems should simulate quantum physics. In particular, CAD applications should interactively provide the user with physically-based feedback when editing the structure of a nanosystem.

Because of the high computational cost of underlying numerical methods, though, interactively solving the Schrödinger equation is a challenging problem. Fortunately, many efficient computational methods have been deduced from approximate theories15. In general, these methods solve the one-electron Schrödinger equation after it has been projected to a finite basis set. For instance, employing a basis set composed of atomic orbitals (denoted by φµ)30 leads to the following generalized eigenvalue problem:

HC = SCD, (1)

where Hµν = 〈φµ|H|φν〉 and Sµν = 〈φµ|φν〉. (2)

The diagonal matrix D contains the sorted eigenvalues and the matrix C contains the corresponding eigenvectors (ei denotes the ith lowest eigenvalue and Ci the corresponding eigenvector). The potential energy of the system is the sum:

E = ∑_{i=1}^{N/2} 2ei, (3)

where N is the number of electrons in the system. The gradient of the potential energy is:

∇xE = ∑_{µ,ν} Pµν ∇xHµν − ∑_{µ,ν} Wµν ∇xSµν, (4)

where P is the density matrix and W is the energy-weighted density matrix:

P = ∑_{i=1}^{N/2} 2 Ci Ciᵀ, W = ∑_{i=1}^{N/2} 2 ei Ci Ciᵀ. (5)
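To make these formulas concrete, the following sketch shows how the energy (3) and the matrices P and W (5) can be assembled once the generalized eigenvalue problem (1) has been solved. It is a minimal illustration written in C++ with the Eigen library rather than the Intel MKL routines used in our implementation; the function and type names are ours.

```cpp
// Minimal sketch (C++/Eigen, not the paper's MKL-based implementation):
// solve H C = S C D, then assemble E (eq. 3), P and W (eq. 5).
#include <Eigen/Dense>

struct ElectronicState {
    double energy;      // E = sum over occupied orbitals of 2 e_i
    Eigen::MatrixXd P;  // density matrix
    Eigen::MatrixXd W;  // energy-weighted density matrix
};

ElectronicState solveFullBasis(const Eigen::MatrixXd& H,
                               const Eigen::MatrixXd& S,
                               int nElectrons /* N */) {
    // Generalized symmetric eigenproblem; eigenvalues are sorted increasingly.
    Eigen::GeneralizedSelfAdjointEigenSolver<Eigen::MatrixXd> solver(H, S);
    const Eigen::VectorXd& e = solver.eigenvalues();
    const Eigen::MatrixXd& C = solver.eigenvectors();

    ElectronicState st;
    st.energy = 0.0;
    st.P = Eigen::MatrixXd::Zero(H.rows(), H.cols());
    st.W = Eigen::MatrixXd::Zero(H.rows(), H.cols());
    for (int i = 0; i < nElectrons / 2; ++i) {   // doubly occupied orbitals
        st.energy += 2.0 * e(i);                                  // eq. (3)
        st.P += 2.0 * C.col(i) * C.col(i).transpose();            // eq. (5)
        st.W += 2.0 * e(i) * C.col(i) * C.col(i).transpose();     // eq. (5)
    }
    return st;
}
```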

One approach to efficiently evaluate the Hamiltonian matrix H is to use a semi-empirical model such as the ASED-MO (Atom Superposition and Electron Delocalization Molecular Orbital) theory3. Within this theory, we have recently presented an interactive quantum chemistry approach7 based on the Divide-And-Conquer (D&C) method16. In particular, we have demonstrated that interactively solving the one-electron Schrödinger equation is possible on current desktop computers for systems composed of a few hundred atoms. By subdividing the system into many overlapping subsystems, this approach has a linear time complexity in the number of atoms, as well as a good parallel scaling32, which should thus allow for continued improvements with current hardware trends in personal computers.

Despite this, it will still be difficult to achieve interactive rates in two situations:

• Large number of subsystems: since the number of subsystems increases linearly with the number of atoms, some systems will simply be too large to allow for interactive rates.

• Large subsystems: to reach high accuracy, the D&C approach needs to employ sufficiently large overlapping subsystems7. In this case, solving even a single subsystem's eigendecomposition problem may be too costly to achieve interactive rates. Furthermore, it may be difficult to expect important speed-ups in the near future because diagonalization algorithms typically have poor parallel scaling5,8 and the serial speed of processing cores is reaching a physical limit41. One approach to speed up electronic structure calculations consists in incrementally updating eigenvectors, as in the Residual Minimization – Direct Inversion in the Iterative Subspace (RM-DIIS) approach33. Unfortunately, this may be as slow as the direct approach when too many eigenvectors have to be updated. Another approach could be to directly freeze the density matrix while letting atomic nuclei move17. However, when a non-orthogonal basis set is used, this may produce non-orthogonal molecular orbitals which might attract the system to configurations with actually higher potential energy.

To address both issues, we propose a novel Block-Adaptive Quantum Mechanics (BAQM) approach, based on the Divide-And-Conquer method and two new components.

First, in order to decouple the computational complexity from the system's size, we propose to adaptively simulate the nucleus degrees of freedom. In general, the nearsightedness principle23 makes it possible to perform a fast incremental update of the electronic structure when only some atoms have moved17,26,39,40. In the Divide-And-Conquer approach16, the system is divided into nearly independent overlapping subsystems. In the context of a non-self-consistent theory, when all atoms of a subsystem are frozen in space, both the Hamiltonian and its eigendecomposition are constant. To take advantage of this fact, we extend the approach we previously introduced for adaptive Cartesian mechanics6. Precisely, we freeze and unfreeze groups of atoms, according to the applied atomic forces and the system's decomposition into overlapping subsystems. We call this first component Block-Adaptive Cartesian Mechanics.

Second, to be able to deal with large subsystems for which diagonalization is the bottleneck, we propose to use an adaptively updated reduced basis which takes advantage of temporal coherence between successive eigendecomposition problems. For some methods, evaluating the Hamiltonian and overlap matrices may be computationally demanding. However, these computations are intrinsically parallel and can benefit from modern hardware architectures such as Graphics Processing Units (GPUs)43. Similarly, the computation of the density matrix has a cubic complexity in the number of basis functions, but dense matrix multiplications are memory-friendly42,46 and can be efficiently handled on modern hierarchical-memory multicore architectures13,45. As a result, we have focused our efforts on the computation of molecular orbitals. A natural way to accelerate the resolution of many similar differential equations is to use a reduced basis approach31. This methodology has been applied in specific contexts for electronic structure calculation12,29. In this paper, we propose to use an adaptive reduced basis which is automatically updated during the simulation. We call this second component Adaptive Reduced-Basis Quantum Mechanics.

We demonstrate that the BAQM approach may significantly speed up energy minimization, as well as enable interactive quantum chemistry for large molecular systems. Figure 1 illustrates interactive virtual prototyping of a polyfluorene chain molecule.

Figure 1: Block-Adaptive Quantum Mechanics (BAQM) in SAMSON (Software for Adaptive Modeling and Simulation Of Nanosystems)1. In this example the system is divided into four subsystems. The energy is minimized continuously as the user edits the molecular system. At each time step, both the geometry and the electronic structure are incrementally and adaptively updated. Because the user pulls one atom (red arrow) in the left part of the system, the electronic structure is updated with the full basis for the leftmost subsystem (all atoms are red). In the neighboring subsystem, the electronic structure is updated according to a reduced-basis approximation (some carbons are black and some hydrogens are white). In the right part of the molecule, the user force does not have a sufficiently large impact, and atom positions are frozen (all atoms are blue).


2 Overview

In general, adaptive approaches automatically focus computational resources on the most relevant parts of a problem. We use such an approach to maintain interactive rates while modeling chemical structures based on quantum chemistry principles. In this section, we provide an overview of our approach, and introduce its two main components: block-adaptive Cartesian mechanics, and adaptive reduced-basis quantum mechanics. For completeness, we first briefly recall the ASED-MO theory and the Divide-And-Conquer (D&C) technique. We refer the reader to our previous publication7 for more details about our ASED-MO D&C method.

2.1 The ASED-MO theory

In this paper, we used the ASED-MO theory3 to test and validate our BAQM approach. In this theory, the electronic density function is split into two terms: a perfectly-following term (the electron density when atoms do not interact), and a non-perfectly-following term (corresponding to bond formation). The latter term is computed based on the Extended Hückel Molecular Orbital theory (EHMO)20, a simple semi-empirical quantum chemistry method which approximates the Hamiltonian matrix terms as:

Hµν = K (Iµ + Iν)/2 Sµν, (6)

where Iµ is the ionization energy of the atomic orbital φµ, and K is the Wolfsberg-Helmholtz constant.
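As an illustration, a literal transcription of eq. (6) is sketched below (the full ASED-MO model contains further terms not shown here). The overlap matrix S, the per-orbital ionization energies I and the constant K are assumed to come from the ASED-MO parameterization; the function name is ours.

```cpp
// Minimal sketch of the extended Hückel matrix elements of eq. (6).
#include <Eigen/Dense>

Eigen::MatrixXd buildEHMOHamiltonian(const Eigen::MatrixXd& S,   // overlap matrix
                                     const Eigen::VectorXd& I,   // ionization energies
                                     double K) {                 // Wolfsberg-Helmholtz constant
    const int n = static_cast<int>(S.rows());
    Eigen::MatrixXd H(n, n);
    for (int mu = 0; mu < n; ++mu)
        for (int nu = 0; nu < n; ++nu)
            H(mu, nu) = K * 0.5 * (I(mu) + I(nu)) * S(mu, nu);   // eq. (6)
    return H;
}
```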

2.2 The Divide-And-Conquer (D&C) technique

The D&C approach is attractive because of its efficiency (nearly perfect parallel scaling32,35), its simplicity for non-orthogonal basis sets, and its accuracy16,22,25,44,47,48. It consists of three main steps:

• Dividing the system: The original system S is first divided into M non-overlapping subsystems S1, . . . , SM. Then, for each subsystem Si, an extended subsystem S∗i is defined as the one containing all atoms from Si and those closer to these atoms than a certain distance cutoff.

• Computing each subsystem's electronic structure independently: A basis set is associated with each extended subsystem S∗i (1 ≤ i ≤ M). The projection of the one-electron Schrödinger equation in this basis leads to the generalized eigenvalue problems:

HiCi = SiCiDi, 1 ≤ i ≤ M. (7)

Each local generalized eigenvalue problem (7) provides a set of molecular orbitals, which are then globally ranked according to their corresponding energies. We then populate these molecular orbitals until there are exactly N electrons in the system, as detailed in our previous work7.

• Summing up the various contributions: The occupied molecular orbitals determine the local density matrices Pi and energy-weighted density matrices Wi, from which the density matrix P and the energy-weighted density matrix W are obtained via a superposition scheme7. Once P and W have been obtained, the potential energy is expressed as

E = Tr(HP ) (8)

and the gradient of the potential energy is approximated as:

∇xE = ∑_{µ,ν} Pµν ∇xHµν − ∑_{µ,ν} Wµν ∇xSµν. (9)
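The dividing step above can be sketched as follows. The data layout (Atom, Subsystem) and the brute-force neighbor search are ours and are only meant to illustrate how each extended subsystem S∗i gathers the atoms of Si plus every atom within the distance cutoff; an actual implementation would use a spatial search structure.

```cpp
// Minimal sketch of the "dividing the system" step of the D&C technique.
#include <cmath>
#include <vector>

struct Atom { double x, y, z; };

struct Subsystem {
    std::vector<int> coreAtoms;      // atoms of S_i
    std::vector<int> extendedAtoms;  // atoms of S*_i (core atoms + buffer atoms)
};

static double distance(const Atom& a, const Atom& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

void buildExtendedSubsystems(const std::vector<Atom>& atoms,
                             std::vector<Subsystem>& subsystems,
                             double cutoff) {
    for (Subsystem& s : subsystems) {
        s.extendedAtoms = s.coreAtoms;
        for (int j = 0; j < static_cast<int>(atoms.size()); ++j) {
            bool inCore = false, inBuffer = false;
            for (int i : s.coreAtoms) {
                if (i == j) { inCore = true; break; }
                if (distance(atoms[i], atoms[j]) < cutoff) inBuffer = true;
            }
            if (!inCore && inBuffer) s.extendedAtoms.push_back(j);  // buffer atom
        }
    }
}
```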


2.3 Block-adaptive Cartesian mechanics

One possible adaptive approach to control the computational cost of each time step consists in reducing the number of nucleus degrees of freedom, which reduces the cost of updating the potential energy6,34. In our ASED-MO D&C method7, the matrices Hi and Si involved in the eigenproblem (7) corresponding to subsystem Si are constant when all atoms in the extended subsystem S∗i are frozen in space.

Consequently, we decide to adaptively freeze and unfreeze nuclei positions extended subsystem by extended subsystem. We describe this approach in section "Block-adaptive Cartesian mechanics".

2.4 Adaptive reduced-basis quantum mechanics

Block-adaptive Cartesian mechanics allows us to reduce the number of eigendecomposition problems that have to be solved at each time step. However, solving even just one of them may be too costly to achieve interactive rates. In order to accelerate the computation of molecular orbitals, we reduce the dimension of the basis in which the one-electron Schrödinger equation is projected.

For any given subsystem, successive eigendecomposition problems are very similar, because atoms do not move significantly at each time step. Perturbation theory suggests that the subspace spanned by a cluster of eigenvectors might be rather insensitive to small perturbations37. To take advantage of this temporal coherence, we thus propose to use a reduced basis composed of low-energy eigenvectors computed at a previous time step. To determine when the reduced basis should be updated, we use a simple distance measure between two generalized eigenvalue problems (7). We describe this approach in section "Adaptive reduced-basis quantum mechanics".

2.5 Block-adaptive quantum mechanics

Our BAQM approach combines the two components above to incrementally update the chemical structure of the molecular system at each time step.

At the first time step (e.g. when the molecular system is loaded into memory), we compute the complete electronic structure of the system, as well as all forces applied on all atoms. This first step is used to initialize the BAQM process. Then, at each time step, we perform the following adaptive chemical structure update:

• Block-adaptive Cartesian mechanics: (a) adaptively freeze some extended subsystems; (b) for each active (unfrozen) atom, move along the force applied to it.

• Adaptive reduced-basis quantum mechanics: (a) for subsystems with mobile atoms, adaptively choose either a reduced-basis or a full-basis update of the molecular orbitals; (b) for subsystems with mobile atoms, or for which some molecular orbital occupation numbers change, update the density matrices.

Section “The Block-Adaptive Quantum Mechanics algorithm” describes the BAQM algorithm in full detail.

3 Block-adaptive Cartesian mechanics

We now describe the block-adaptive Cartesian mechanics component, which consists in automatically freezing some positional degrees of freedom. As has been shown before, freezing atomic positions may accelerate geometry optimization of local defects26,39. Generally, these methods use a pre-defined active site, and only atoms in the active region are allowed to be mobile. In the context of interactive structural modeling, however, one cannot assume a pre-defined active site, since the user has the possibility to stress the system at any location. Therefore, to efficiently attract the system into low-energy regions, we need to efficiently choose the set of mobile atoms at each time step.


We have recently introduced a novel adaptive algorithm which allows for interactive modeling with a reactive force field6. The key idea was to decide whether to activate or freeze an atom depending on the norm of the force applied to it. Precisely, an atom was frozen if this norm was smaller than a certain threshold value. Similarly, in quantum chemistry models, by switching some positional degrees of freedom off, we may avoid updating some terms in the eigendecomposition problem (eq. (2)). However, as soon as a single term changes in the problem (eq. (7)), we have to solve a new eigendecomposition problem. It is thus very computationally attractive to freeze all atoms of an extended subsystem to avoid a new diagonalization.

To extend the previous approach of comparing the atomic force norms with a certain threshold, we define an extended subsystem force norm fS∗i for the extended subsystem S∗i. Precisely, fS∗i is the maximum atomic force norm in S∗i. We also define a threshold value ffreeze, which is compared to these force norms fS∗1, . . . , fS∗M. When a force norm fS∗i is lower than ffreeze, all atoms in the extended subsystem S∗i are frozen in space, and the corresponding eigendecomposition is not updated. On the contrary, if there exists at least one atom with a force norm larger than ffreeze, we do not choose to freeze all atoms of S∗i. Note that even in this case, though, S∗i may still contain some frozen atoms, if it overlaps with some other, frozen subsystems.

The threshold value ffreeze can either be predefined by the user, or automatically computed at each time step based on the system's state. This value helps us control the computational cost of a time step, since one may directly control the number of performed diagonalizations. For fast energy minimization, we propose

ffreeze = (1/2) max_{i=1..M} fS∗i. (10)

This scheme is illustrated in Figure 2.

Figure 2: Block-adaptive Cartesian mechanics. In this example the system S is divided into two overlapping extended subsystems S∗1 and S∗2, which have two atoms in common. The value indicated in each atom is the atomic force norm. The value indicated in each subsystem is the subsystem force norm. The threshold value is automatically computed as half the maximum of the subsystem force norms. In step 0, ffreeze = 15 and, therefore, S∗2 is frozen. Consequently, only the two leftmost atoms are mobile. In step 1, ffreeze = 5 and, therefore, S∗1 is frozen. Consequently, only the two rightmost atoms are mobile.
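A minimal sketch of this freezing rule, combining the per-subsystem force norm and the threshold of eq. (10), is given below. The data layout is hypothetical: extendedAtoms[i] lists the atom indices of S∗i and forceNorm[j] is the force norm of atom j.

```cpp
// Minimal sketch of the block-adaptive freezing decision (eq. 10).
#include <algorithm>
#include <vector>

std::vector<bool> freezeSubsystems(const std::vector<std::vector<int>>& extendedAtoms,
                                   const std::vector<double>& forceNorm) {
    const int M = static_cast<int>(extendedAtoms.size());
    std::vector<double> f(M, 0.0);
    for (int i = 0; i < M; ++i)
        for (int j : extendedAtoms[i])
            f[i] = std::max(f[i], forceNorm[j]);   // f_{S*_i}: max atomic force norm

    const double fFreeze = 0.5 * *std::max_element(f.begin(), f.end());  // eq. (10)

    std::vector<bool> frozen(M);
    for (int i = 0; i < M; ++i)
        frozen[i] = (f[i] < fFreeze);  // frozen subsystems keep their eigendecomposition
    return frozen;
}
```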


4 Adaptive reduced-basis quantum mechanics

In this section, we present the adaptive reduced-basis quantum mechanics component. Recall that, within any extended subsystem S∗i, we want to solve a simplified problem by projecting problem (7), for the current time step, in a reduced basis composed of low-energy eigenvectors that have been computed at a previous time step, to benefit from temporal coherence between successive eigenproblems. This is motivated by perturbation bounds on invariant subspaces37,38.

For clarity, we consider a system S with only one subsystem, and we first recall how the electronic structure problem (1) may be projected to a reduced basis.

4.1 Electronic structure calculations in a reduced basis

We consider two pairs of symmetric matrices:

• (Href, Sref) the reference matrix pair for which an eigendecomposition is available. We denote by V ref the low-energy eigenvectors used as a reduced basis,

• (H,S) the matrix pair related to the new system state.

The matrix formulation of the new electronic structure problem in the reduced basis V ref is:

HvCv = SvCvDv, (11)

where Hv and Sv can be computed by matrix multiplication:

Hv = (V ref)ᵀ H V ref, Sv = (V ref)ᵀ S V ref. (12)

The diagonal matrix Dv contains the sorted eigenvalues (evi denotes the ith lowest eigenvalue).

To compute forces, one could deal with the gradient of the reduced Hamiltonian Hv and overlap matrix Sv, since the resulting eigenvectors are expressed in the reduced basis. However, these terms are complex and can lead to a quartic complexity for the force expression. To compute forces in practice, we first express the eigenvectors in the full basis:

Cn = V ref Cv. (13)

The resulting Cn coefficients are the solution of the problem of finding the set of molecular orbitals minimizing the energy in the subspace generated by V ref. Then, the force formulation that expresses the variation of the energy calculated in the reduced basis (denoted Ev) with respect to the atomic positions is:

∇xEv = ∑_{µ,ν} Pvµν ∇xHµν − ∑_{µ,ν} Wvµν ∇xSµν, (14)

where P v is the density matrix

Pvµν = ∑_{i=1}^{N/2} 2 Cnµi Cnνi, (15)

W v is the energy-weighted density matrix

Wvµν = ∑_{i=1}^{N/2} 2 evi Cnµi Cnνi, (16)

and Cn is the matrix of the orthogonal molecular orbitals. One can remark that we do not speed up the force evaluation with the reduced-basis approach. The proof of equation (14) is presented in section "Appendix".
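The reduced-basis update itself reduces to two small matrix products, a small generalized eigenproblem and a change of basis. The sketch below illustrates eqs. (11)-(13) with Eigen (our implementation uses the Intel MKL); Vref stores one retained low-energy eigenvector per column.

```cpp
// Minimal sketch of one reduced-basis solve (eqs. 11-13).
#include <Eigen/Dense>

Eigen::MatrixXd reducedBasisStep(const Eigen::MatrixXd& H,
                                 const Eigen::MatrixXd& S,
                                 const Eigen::MatrixXd& Vref,
                                 Eigen::VectorXd& eigenvaluesOut) {
    const Eigen::MatrixXd Hv = Vref.transpose() * H * Vref;  // eq. (12)
    const Eigen::MatrixXd Sv = Vref.transpose() * S * Vref;

    // Hv Cv = Sv Cv Dv (eq. 11); eigenvalues come out sorted in increasing order.
    Eigen::GeneralizedSelfAdjointEigenSolver<Eigen::MatrixXd> solver(Hv, Sv);
    eigenvaluesOut = solver.eigenvalues();                    // e^v_i

    return Vref * solver.eigenvectors();                      // Cn = Vref Cv (eq. 13)
}
```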


4.2 A temporal coherence measure

Let us use a simple distance ε between the two matrix pairs:

ε = √( ||H − Href||F² + ||S − Sref||F² ), (17)

where ||.||F is the Frobenius norm. Then, the error in potential energy |E − Ev| induced by the use of the reduced basis V is asymptotically negligible compared to ε:

|E − Ev| = O(ε²). (18)

The proof of this equation is presented in section "Appendix".

Consequently, we propose to use the distance ε as an indicator of the pertinence of using the low-energy eigenvectors of (Href, Sref) to solve the new problem (H, S), i.e. it helps us decide on the fly when to update the reduced basis by performing a full-basis step (when ε becomes larger than a threshold value εM).
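In code, the corresponding test is straightforward; the sketch below (using Eigen, with hypothetical argument names) also includes the reduced-step counter discussed in section "The Block-Adaptive Quantum Mechanics algorithm".

```cpp
// Minimal sketch of the temporal coherence test (eq. 17): reuse the reduced
// basis only while the distance to the reference pair stays below epsilonM.
#include <Eigen/Dense>
#include <cmath>

bool canReuseReducedBasis(const Eigen::MatrixXd& H, const Eigen::MatrixXd& Href,
                          const Eigen::MatrixXd& S, const Eigen::MatrixXd& Sref,
                          double epsilonM, int k, int kmax) {
    // .squaredNorm() is the squared Frobenius norm of a matrix in Eigen.
    const double eps = std::sqrt((H - Href).squaredNorm() +
                                 (S - Sref).squaredNorm());   // eq. (17)
    return eps > 0.0 && eps < epsilonM && k < kmax;
}
```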

4.3 Energy minimization with a reduced-basis approach

We recall that the main goal of our approach is to enable interactive geometry optimization, even for large systems. In general, one looks at this problem as the following energy minimization problem:

min_X E(X), with X = {(xi)i=1..n} ∈ R^{3n},

where E is the potential energy, which depends on the nuclei positions X. To understand why we can accelerate geometry optimization, we have to look at the problem as a minimization over both nucleus and electron degrees of freedom. Let Z denote the vector space of the basis functions; the problem then reads:

min_{X,Ψ} E(X,Ψ) = ∑_{i=1}^{N/2} 〈ψi|H(X)|ψi〉,
with X = {(xi)i=1..n} ∈ R^{3n}, Ψ = {(ψi)i=1..N/2} ∈ Z^{N/2},
subject to 〈ψi|S(X)|ψj〉 = δij, i, j = 1..N/2.

We do not necessarily have to compute the Ψ which minimizes E for each atomic position X (as is done when a complete diagonalization is performed). Our approach is to look for Ψ in a reduced basis, i.e., a Ψ which does not minimize E for a given X. To guarantee convergence to local energy minima, we frequently update this reduced basis. Precisely, we perform at most kmax reduced-basis steps between two full-basis steps. Thus, unlike approaches which reduce the accuracy to accelerate the simulation (by, e.g., choosing a simpler model or decreasing the cutoff distance defining the extended subsystem size), our approach does not alter the final geometry of the molecule. In section "Results", we demonstrate that the reduced basis can be used to accelerate interactive geometry optimization.
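The resulting minimization schedule is sketched below; the three callbacks stand for the operations described above and are not functions of our actual code base.

```cpp
// Minimal sketch of steepest descent with at most kmax reduced-basis steps
// between two full-basis steps.
#include <functional>

void minimize(int nSteps, int kmax,
              const std::function<void()>& fullBasisStep,      // diagonalize, refresh Vref
              const std::function<void()>& reducedBasisStep,   // solve in the reduced basis
              const std::function<void()>& moveAtoms) {        // constant-step steepest descent
    int k = kmax;  // force a full-basis step at the first iteration
    for (int step = 0; step < nSteps; ++step) {
        if (k >= kmax) { fullBasisStep(); k = 0; }
        else           { reducedBasisStep(); ++k; }
        moveAtoms();
    }
}
```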


5 The Block-Adaptive Quantum Mechanics algorithm

In practice, we combine the two adaptive components described above in an algorithm which is now explicitly described; a schematic code sketch of one general step is given after Note 2 below.

5.1 Algorithm initialization

At the very first time step, a complete computation is performed: ∀i ∈ 1..M, the generalized eigenvalue problem HiCi = SiCiDi is formulated and solved. We then populate the low-energy molecular orbitals until there are exactly N electrons in the system. We refer the reader to our previous work7 for more details about the ASED-MO theory and our implementation of the D&C scheme.

5.2 Algorithm general step

We recall that ffreeze is a force threshold, εM is an eigendecomposition perturbation threshold, and kmax is the maximum number of reduced-basis steps between full-basis steps. fS∗i is the maximum atomic force norm in S∗i. (Hi, Si, Ci, Vi) are, respectively, the Hamiltonian, overlap, eigenvector and low-energy eigenvector matrices of the extended subsystem S∗i. (Hrefi, Srefi, Crefi, Vrefi) are the reference matrices (for which the eigenproblem has been solved completely) of the extended subsystem S∗i. We also introduce ki, a counter of the number of successive reduced-basis steps in S∗i.

• Block-adaptive Cartesian mechanics

– Update the threshold value ffreeze (e.g. ffreeze := (1/2) max_{i=1..M} fS∗i).

– ∀i ∈ 1..M, if fS∗i < ffreeze, freeze the atoms of the extended subsystem S∗i.

– For each mobile atom, move along the force applied to it.

• Incremental matrix computation:

– ∀i ∈ 1..M, if S∗i has some mobile atoms, compute the new Hamiltonian Hi and overlap matrix Si, as well as their differences δHi and δSi with respect to the reference matrices Hrefi and Srefi from which we have deduced the reduced basis.

– ∀i ∈ 1..M, if S∗i has some mobile atoms, compute εi := √(||δHi||F² + ||δSi||F²).

– Update the threshold value εM (e.g. εM := (1/2) max_{i=1..M} εi).

• Adaptive reduced-basis quantum mechanics: ∀i ∈ 1..M ,

– If all atoms in S∗i are frozen in space, keep the previous eigenvectors: Ci := Crefi .

– Else, if ((0 < εi < εM ) and (ki < kmax)), perform a reduced-basis step:

∗ project: compute Hvi := (Vrefi)ᵀ Hi Vrefi and Svi := (Vrefi)ᵀ Si Vrefi,

∗ solve: compute a new set of molecular orbitals from the eigenproblem (Hvi, Svi),

∗ count: ki := ki + 1.

– Else, perform a full-basis step:

∗ solve: compute a new set of molecular orbitals from eigenproblem (Hi, Si),

∗ update the reference eigenproblem: (Hrefi, Srefi, Crefi, Vrefi) := (Hi, Si, Ci, Vi),

∗ reset the counter: ki := 0.

• Finalize energy and forces computation

– incremental molecular orbital occupation:

∗ for each extended subsystem S∗i with new molecular orbitals, reset the density matrices (Pi := 0 and Wi := 0) and update the current total number of electrons in the system accordingly,


∗ globally sort unoccupied molecular orbitals by energy (remark that, in frozen subsystems, low-energy molecular orbitals are still occupied at this stage),

∗ populate low-energy molecular orbitals until there are exactly N electrons in the system (see note 1).

– For each extended subsystem where the density matrix has been modified, incrementally update the atomic force of all atoms in the extended subsystem (see Note 2).

– For each extended subsystem S∗i with an atomic force change, update the maximum force norm fS∗i of the subsystem.

Note 1 (incremental molecular orbital occupation): During the incremental molecular orbital occupation stage, even the density matrices of frozen extended subsystems might change. Indeed, because the Fermi energy changes between two steps, it might happen that new molecular orbitals have to be occupied, or that previously occupied molecular orbitals are no longer populated with electrons. In this case, the density matrices can be efficiently updated via one or more rank-one matrix updates. This is a rather rare event, however, when the time step size is small and there is high temporal coherence between successive steps, since the system's energy is a continuous function of the atom positions.

Note 2 (incremental force update): Let us denote by f_i^j the force acting on atom i due to the contribution of the extended subsystem S∗j to the density matrices P and W. Assuming f_i^j = 0 when atom i does not belong to the extended subsystem S∗j, the total bonded force fi acting on atom i can be written:

fi = ∑_{j=1..M} f_i^j. (19)

In our implementation, we incrementally update the forces, i.e. we only recompute the changing partial forces f_i^j. Precisely, for each extended subsystem S∗j whose density matrices change, we first save each partial force before recomputing it: s_i^j := f_i^j. Then, the atomic force on atom i can be incrementally updated: fi := fi + f_i^j − s_i^j.
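The sketch below summarizes one BAQM general step at the granularity of the per-subsystem decisions described above (freezing, perturbation measure, reduced-basis versus full-basis update). It is written with Eigen and hypothetical types; moving the atoms, recomputing Hi and Si, the molecular orbital occupation and the incremental force update of Notes 1 and 2 are only indicated by comments.

```cpp
// Schematic sketch of one BAQM general step (section 5.2).
#include <Eigen/Dense>
#include <algorithm>
#include <cmath>
#include <vector>

struct SubsystemState {
    Eigen::MatrixXd H, S, Href, Sref;  // current and reference matrices
    Eigen::MatrixXd Vref;              // reduced basis (low-energy eigenvectors)
    double fMax = 0.0;                 // f_{S*_i}: maximum atomic force norm
    int k = 0;                         // successive reduced-basis steps
    bool frozen = false;
};

void baqmStep(std::vector<SubsystemState>& sub, int kmax) {
    const int M = static_cast<int>(sub.size());

    // Block-adaptive Cartesian mechanics: freeze subsystems below the threshold.
    double fFreeze = 0.0;
    for (const SubsystemState& s : sub) fFreeze = std::max(fFreeze, s.fMax);
    fFreeze *= 0.5;
    for (SubsystemState& s : sub) s.frozen = (s.fMax < fFreeze);
    // (mobile atoms are then displaced along their forces, and Hi, Si recomputed)

    // Incremental matrix computation and perturbation threshold.
    std::vector<double> eps(M, 0.0);
    double epsM = 0.0;
    for (int i = 0; i < M; ++i) {
        if (sub[i].frozen) continue;
        eps[i] = std::sqrt((sub[i].H - sub[i].Href).squaredNorm() +
                           (sub[i].S - sub[i].Sref).squaredNorm());
        epsM = std::max(epsM, eps[i]);
    }
    epsM *= 0.5;

    // Adaptive reduced-basis quantum mechanics.
    for (int i = 0; i < M; ++i) {
        SubsystemState& s = sub[i];
        if (s.frozen) continue;  // keep the previous eigenvectors
        if (eps[i] > 0.0 && eps[i] < epsM && s.k < kmax) {
            // Reduced-basis step: project, solve the small eigenproblem, map back.
            const Eigen::MatrixXd Hv = s.Vref.transpose() * s.H * s.Vref;
            const Eigen::MatrixXd Sv = s.Vref.transpose() * s.S * s.Vref;
            Eigen::GeneralizedSelfAdjointEigenSolver<Eigen::MatrixXd> es(Hv, Sv);
            ++s.k;
        } else {
            // Full-basis step: diagonalize (Hi, Si) and refresh the reference data.
            Eigen::GeneralizedSelfAdjointEigenSolver<Eigen::MatrixXd> es(s.H, s.S);
            s.Href = s.H;
            s.Sref = s.S;
            // (Vref would be refreshed with the new low-energy eigenvectors here.)
            s.k = 0;
        }
        // (Molecular orbital occupation, density matrices and forces are then
        //  updated incrementally, as described in Notes 1 and 2.)
    }
}
```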

5.3 Choice of the threshold values

At least two options are possible for the choice of the thresholds ffreeze and εM.

The simplest choice is to predefine these values. In this case, the system will not relax completely (since some atoms with non-zero applied forces will not move). However, this approach is very powerful when the user is prototyping a new system and does not need the full accuracy of the quantum chemistry model. In this mode, an adaptive minimization step is performed only when the modeler detects that large forces are applied or that an important perturbation in the eigendecomposition problem has appeared. Two videos in the Supporting Material illustrate adaptive quantum chemistry modeling.

The second option is to automatically choose the thresholds based on the system's state. Let K denote a user-defined constant. We may choose ffreeze = (max_{i=1..M} fS∗i)/K and εM = (max_{i=1..M} εi)/K. Consequently, the computational resources will be focused on the most mobile atoms and on the most perturbed eigendecomposition problems. For interactive quantum chemistry modeling, one may also compute the threshold values to allow only N1 subsystems with mobile atoms and N2 subsystems with diagonalization, such that the time cost of each step is well controlled.
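One possible reading of this budgeted choice is sketched below: the threshold is set to the N1-th largest subsystem force norm, so that (up to ties) only the N1 most stressed extended subsystems keep mobile atoms; εM can be chosen analogously from the per-subsystem perturbations εi. This heuristic is our illustration, not the implementation used in the paper.

```cpp
// Minimal sketch of a budgeted threshold: keep roughly N1 subsystems mobile.
#include <algorithm>
#include <functional>
#include <limits>
#include <vector>

double thresholdForBudget(std::vector<double> subsystemForceNorms, int N1) {
    if (N1 <= 0) return std::numeric_limits<double>::max();  // freeze everything
    if (N1 >= static_cast<int>(subsystemForceNorms.size())) return 0.0;  // all mobile
    std::sort(subsystemForceNorms.begin(), subsystemForceNorms.end(),
              std::greater<double>());  // descending order
    // A subsystem is frozen when its force norm is strictly below the threshold,
    // so about the N1 largest force norms remain mobile (ties may add a few more).
    return subsystemForceNorms[N1 - 1];
}
```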

Two options are also possible for kmax. In practice, for interactive quantum chemistry, we choose a value kmax = 100. However, in section "Results", we show that kmax can be optimized for faster energy minimization.


6 Results

We now present results of our block-adaptive quantum mechanics algorithm for the ASED-MO level of theory. In this paper, we focus on the performance of the BAQM approach, and we refer the reader to our previous work7 on our ASED-MO D&C approach for more details about its accuracy and efficiency.

We have used C++ as the main programming language. We have also used the highly optimized multithreaded Intel Math Kernel Library21 to solve the generalized eigenvalue problems and to perform all the linear algebra operations. The tests have been performed on two different computers. Computer 1 is a desktop computer with two 2.67 GHz quad-core processors and 4 GB of RAM, running a 32-bit Linux Fedora operating system. Computer 2 is a desktop computer with two 2.33 GHz quad-core processors and 4 GB of RAM, running a 32-bit Linux Fedora operating system.

6.1 Reduced-basis molecular orbital computations

In this section, we compare full-basis and reduced-basis molecular orbital computations. Computer 1 was used in this test. We recall that, for fast steps, we have to perform the linear algebra operations Hv = BᵀHB and Sv = BᵀSB, solve the eigendecomposition HvCv = SvCvDv, and perform Cn = BCv. We also compare this scheme with a simpler S-orthogonalization. Indeed, an S-orthogonalization of the molecular orbitals can be used when the system S contains only one subsystem and the reduced basis dimension coincides with the number of occupied molecular orbitals. In this case, any S-orthogonal basis of the occupied subspace results in the same energy, and is a better choice if one is not interested in the molecular orbitals (eigenvectors and eigenvalues) themselves.

Figure 3 presents timings averaged over 100 evaluations for different matrix sizes (the method does not depend on the matrix elements). All curves demonstrate a cubic behavior. With the implementation presented in the appendix, the simple orthogonalization of 50% of the previous eigenvectors is about one order of magnitude faster than the full-basis approach. Therefore, adaptive reduced-basis quantum mechanics allows for interactive rates with larger subsystems.
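The S-orthogonalization variant mentioned above can be realized in several ways; the sketch below is one Cholesky-based possibility (written with Eigen, and not necessarily the implementation of our appendix): the previous occupied orbitals C are re-orthogonalized against the new overlap matrix S, which is enough because any S-orthogonal basis of the occupied subspace yields the same energy.

```cpp
// Minimal sketch of an S-orthogonalization of the occupied orbitals.
#include <Eigen/Dense>

Eigen::MatrixXd sOrthogonalize(const Eigen::MatrixXd& C,    // occupied orbitals (columns)
                               const Eigen::MatrixXd& S) {  // new overlap matrix
    const Eigen::MatrixXd M = C.transpose() * S * C;        // small Gram matrix
    const Eigen::LLT<Eigen::MatrixXd> llt(M);               // M = L L^T (Cholesky)
    // Return C L^{-T}, which satisfies (C L^{-T})^T S (C L^{-T}) = I.
    return llt.matrixL().solve(C.transpose()).transpose();
}
```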

Figure 3: Timings of reduced-basis molecular orbital computations with different basis dimensions. The eigendecomposition problem is projected in a basis containing respectively 100%, 80%, 70%, 60% or 50% of the low-energy eigenvectors of a previously solved problem.


6.2 Energy minimization with the adaptive reduced-basis approach

We recall that, in our approach, the main goal is to provide interactive and efficient geometry optimization. During an interactive modeling session, on-the-fly geometry optimization assists the user by continuously attracting the system into lower energy states. The previous section demonstrates that using a reduced basis leads to faster steps, which allows for interactive rates with larger subsystems. We now demonstrate the relevance of the adaptive reduced-basis approach for accelerating energy minimization. In interactive geometry optimization, sophisticated methods such as quasi-Newton or conjugate gradient may not be appropriate since each minimization step may require several force and potential energy evaluations, making it more difficult to achieve interactive rates with large systems. Our approach is simply to use a steepest descent method with a constant time step size to have a smooth attraction of the system into a local minimum.

For the four structures (fullerene, polyfluorene, nanotube and graphene) presented in Figure 4, we optimized the geometry and obtained the global minimum of the potential energy. Then, we repeated the optimization with the adaptive reduced-basis approach and stopped the energy minimization when the potential energy E was close enough to the global minimum energy E0 (|(E − E0)/E0| < 10⁻³). In these tests, we did not use εM to decide when to switch, but simply alternated between kmax reduced-basis steps and one full-basis step. The reduced bases were always composed of 50% of the low-energy eigenvectors of the previously solved eigendecomposition problem so that, since the dimension of the reduced basis coincided with the number of molecular orbitals to be computed, we simply performed an orthogonalization of the molecular orbitals. These tests were performed using computer 1.

Figure 5 presents the different speed-ups as a function of kmax, the number of reduced-basis steps between each full-basis step. One can see that larger systems appear to benefit more from our approach, as molecular orbital computation largely dominates the cost of a simulation step. Geometry optimizations that include bond formation (fullerene and nanotube tests) benefit less from our approach. Remark that, when kmax is large, our approach slows down energy minimization (the speed-up is smaller than 1).

6.3 Energy minimization with the Block-Adaptive Quantum Mechanics (BAQM) approach

Here, we demonstrate how the BAQM algorithm (section "The Block-Adaptive Quantum Mechanics algorithm") may be used to accelerate the geometry optimization of a locally deformed graphane sheet of 1556 atoms.

In this structure, each carbon atom is bound to one hydrogen atom, explaining the potential role of graphane as a hydrogen storage medium36. Stable graphane structures were first theoretically predicted and then experimentally realized. The lowest potential energy structure is achieved when hydrogen atoms are attached to the graphane sheet in an alternating pattern (up and down). An important problem is to understand the role of H–frustration27.

In this minimization test, one carbon–hydrogen bond was flipped in such a way that two hydrogen atoms became frustrated, and the geometry of the graphane sheet had to be relaxed. We used 64 subsystems and a cut-off of 6 Å to define the extended subsystems. The tests were performed on computer 2.

Figure 6 shows the Root-Mean-Square Deviation (RMSD) to the optimized structure as a function of wall-clock time while energy minimization was performed, and reports the resulting speed-ups. Energy minimization was stopped when a 0.01 Å RMSD was reached. The BAQM approach allows for a speed-up of more than 20 when the threshold value ffreeze is automatically computed by ffreeze = (max_{i=1..M} fS∗i)/2 and the reduced basis is updated every 6 steps (i.e. for kmax = 5, without using εM). The reduced basis sets were composed of 50% of the low-energy eigenvectors of the previously solved eigendecomposition problem (orthogonalization was not used because we needed to access each eigenvalue individually in the D&C scheme). The adaptive reduced-basis approach itself allowed us to speed up minimization by a factor of 1.4. The block-adaptive Cartesian mechanics allowed for an important speed-up of 16, since only some atoms had to be moved to relax the structure. We note that the total speed-up allowed by the BAQM approach was approximately the product of these two speed-ups, which shows that the two components developed in this paper combine well.



Figure 4: The four structures considered in the energy minimization benchmarks. (a) is a Buckminsterfullerene (C60), (b) is a graphene sheet (C216), (c) is a polyfluorene molecule (C90H72) and (d) is a carbon nanotube (C200). Remark that geometry optimization of molecules (a) and (d) requires bond formation. The white surface is an isosurface of the electron density.


[Figure 5: four panels (Fullerene, Polyfluorene (C90H72), Graphene (C216), Nanotube (C200)), each plotting the speed-up (0 to 4) against the number of fast steps kmax.]

Figure 5: redDifferent speed-ups for convergence to global minima are presented. The number in the abscissa representskmax, the number of reduced basis steps between each full basis step (reduced basis update).


Figure 6: Performance of the block-adaptive divide-and-conquer approach for energy minimization. The figure plots the Root-Mean-Square Deviation (RMSD) to the minimized structure of a graphane sheet as a function of wall-clock time during energy minimization. Geometry optimization is stopped when the RMSD is smaller than 0.01 Å. In this case, our block-adaptive D&C approach allows for an important speed-up. The speed-ups of the different adaptive approaches are indicated in brackets in the legend.


6.4 Interactive quantum chemistry demonstration

For many systems, our approach is sufficiently efficient to enable interactive quantum chemistry simulations on a multicore desktop computer. In the Supporting Material, we present two videos demonstrating interactive quantum chemistry modeling in SAMSON1, the software being developed in our group. We used computer 1. In these examples, the user interactively edits the systems by pulling on atoms. The imposed atomic displacement is proportional to the distance between the selected atom and the position of the mouse pointer (a minimal sketch of this displacement rule is given after the list below).

• Interactive quantum chemistry with a large subsystem: the user loads a carbon nanotube of 120 atoms treated with the ASED-MO theory, with 480 basis elements and only one subsystem. For this system, the classical approach allows only 5 energy and force computations per second. Thanks to the reduced-basis approach, 20 energy and force evaluations per second can be achieved and the user may interactively prototype the system. In the reduced-basis approach, bonds may break and re-form; however, the reduced basis, which has been deduced from the electronic structure of the initial geometry of the system, prevents the system from creating new bonds. To overcome this limitation, the user activates the Adaptive Reduced-Basis Quantum Mechanics approach. Then, in this example, the user is able to intuitively edit the system and explore different chemical structures. Figure 7 illustrates this interactive session.

• Interactive quantum chemistry with a large system: the user loads a graphane sheet of 1556 atoms treated with the ASED-MO divide-and-conquer approach. 64 subsystems are used, and a cut-off of 4 Å for the buffer zone is chosen to achieve interactivity with our block-adaptive approach. The user is able to study the impact on the geometry of the structure when the state of a carbon-hydrogen chemical bond is changed. The bond can be broken and reformed on the other side of the graphane sheet. The block-adaptive quantum mechanics approach allows the user to interactively prototype the structure, with each time step being approximately one order of magnitude faster. Figure 8 illustrates this interactive session. The BAQM approach also allows us to efficiently access different optimized configurations, as illustrated in Figure 6.
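
The displacement rule mentioned before the list can be written as a one-line sketch; the function name and the gain parameter below are placeholders of ours, not documented SAMSON parameters.

import numpy as np

def imposed_displacement(atom_position, pointer_position, gain=0.1):
    """Sketch: the displacement imposed on the selected atom is proportional
    to the vector from the atom to the mouse pointer (already mapped to
    world coordinates); gain is an assumed proportionality constant."""
    return gain * (np.asarray(pointer_position) - np.asarray(atom_position))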


Figure 7: Interactive modeling session. The user selects a group of atoms (atoms in blue) and splits the carbon nanotube into two parts (a). Then, the user pulls on a carbon atom to break a bond (b) and designs a new chemical structure (c). The user force is displayed as a red arrow.

Figure 8: Interactive modeling session. The user loads a graphane sheet composed of 1556 atoms (a). The user selects a hydrogen atom and applies a force (b). The user succeeds in breaking the bond (c). The user force is displayed as a red arrow.


7 Conclusion

In this paper, we demonstrate that interactive quantum chemistry simulation is feasible for rather large systems in the framework of the ASED-MO theory and the Divide-and-Conquer (D&C) technique. The proposed Block-Adaptive Quantum Mechanics (BAQM) approach allows for interactive rates with larger systems and larger subsystems than in the original scheme. We reduced the percentage of time spent in the diagonalization routine. As a result, optimization and multithreading of the remaining computations can significantly improve the speed of a simulation, so that interactive quantum chemistry should be feasible for systems of up to a few thousand atoms in the near future thanks to technological progress.

To achieve these results, we developed two adaptive approaches in which nuclei positions as well as electronic degrees of freedom can be constrained on the fly to control the simulation cost. First, we presented a block-adaptive Cartesian mechanics approach, in which nuclei may be frozen in space in groups, which allows us to deal with large systems. Second, we proposed to use a reduced basis set composed of some of the low-energy eigenvectors of a previous time step to accelerate the molecular orbital computations in large subsystems. The reduced basis is adaptively updated. We demonstrated that these approaches may accelerate geometry optimization. Indeed, each step is solved significantly faster by constraining some nuclei and electrons, and, by focusing computational resources on the most mobile atoms, we obtain a faster potential energy descent.

The presented method is general, and should be applicable to many quantum chemistry models. We would like to determine whether it might be useful to accelerate geometry optimization with self-consistent models, and/or large basis sets such as real-space grids9, plane waves24 or wavelets18.

The idea of adaptively constraining the degrees of freedom in Cartesian coordinates6 can also be used to accelerate phase-space sampling in the framework of the Adaptively Restrained Particle Simulations (ARPS) algorithm4. We would like to investigate whether the adaptive reduced-basis approach may be efficiently combined with ARPS to efficiently compute statistical properties with quantum chemistry models.

8 Appendix

8.1 Forces expression in a reduced basis approach

Let us prove the force formulation presented in section "Electronic structure calculations in a reduced basis".

Theorem: The gradient with respect to an atomic position x of the potential energy computed with a reduced basis is:
\[
\nabla_x E^v = \sum_{\mu\nu} P^v_{\mu\nu} \nabla_x H_{\mu\nu} - \sum_{\mu\nu} W^v_{\mu\nu} \nabla_x S_{\mu\nu} . \qquad (20)
\]

Proof:
\[
\nabla_x E^v = \nabla_x \sum_i 2 e^v_i = \nabla_x \sum_i 2\, (C^v_i)^T H^v C^v_i , \qquad (21)
\]
then,
\[
\nabla_x E^v = \sum_i 2\, (\nabla_x C^v_i)^T H^v C^v_i + \sum_i 2\, (C^v_i)^T (\nabla_x H^v) C^v_i + \sum_i 2\, (C^v_i)^T H^v (\nabla_x C^v_i) . \qquad (22)
\]

As C^v_i are eigenvectors, we have:
\[
\nabla_x E^v = \sum_i 2 e^v_i \left[ (\nabla_x C^v_i)^T S^v C^v_i + (S^v C^v_i)^T (\nabla_x C^v_i) \right] + \sum_i 2\, (C^v_i)^T (\nabla_x H^v) C^v_i , \qquad (23)
\]
however,
\[
(\nabla_x C^v_i)^T S^v C^v_i + (C^v_i)^T S^v (\nabla_x C^v_i) = \nabla_x \left[ (C^v_i)^T S^v C^v_i \right] - (C^v_i)^T (\nabla_x S^v) C^v_i , \qquad (24)
\]
and
\[
\nabla_x \left[ (C^v_i)^T S^v C^v_i \right] = \nabla_x 1 = 0 . \qquad (25)
\]


As a result,
\[
\nabla_x E^v = - \sum_i 2 e^v_i \left[ (B C^v_i)^T \nabla_x S \, (B C^v_i) \right] + \sum_i 2\, (B C^v_i)^T (\nabla_x H) \, (B C^v_i) . \qquad (26)
\]

We can develop the terms:
\[
(B C^v_i)^T \nabla_x S \, (B C^v_i) = (C^n_i)^T \nabla_x S \, C^n_i = \sum_{\mu\nu} C^n_{\mu i} C^n_{\nu i} \nabla_x S_{\mu\nu} , \qquad (27)
\]
\[
(B C^v_i)^T \nabla_x H \, (B C^v_i) = (C^n_i)^T \nabla_x H \, C^n_i = \sum_{\mu\nu} C^n_{\mu i} C^n_{\nu i} \nabla_x H_{\mu\nu} . \qquad (28)
\]

Finally,
\[
\nabla_x E^v = \sum_{\mu\nu} P^v_{\mu\nu} \nabla_x H_{\mu\nu} - \sum_{\mu\nu} W^v_{\mu\nu} \nabla_x S_{\mu\nu} . \qquad (29)
\]
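
Equation (20)/(29) translates directly into a few lines of NumPy. The sketch below uses our own names and assumes the closed-shell definitions implied by equations (26)-(29), namely P^v = 2 Σ_i (BC^v_i)(BC^v_i)^T and W^v = 2 Σ_i e^v_i (BC^v_i)(BC^v_i)^T, together with dense derivative matrices dH and dS for a single atomic coordinate.

import numpy as np

def reduced_basis_energy_gradient(C_occ, e_occ, dH, dS):
    """Sketch of equation (20)/(29): derivative of the reduced-basis energy
    with respect to one atomic coordinate x.
    C_occ : (n_basis, n_occ) occupied molecular orbitals in the full basis (B C^v)
    e_occ : (n_occ,) corresponding eigenvalues e^v_i
    dH, dS: (n_basis, n_basis) derivatives of H and S with respect to x."""
    P = 2.0 * C_occ @ C_occ.T                 # density matrix P^v
    W = 2.0 * (C_occ * e_occ) @ C_occ.T       # energy-weighted density matrix W^v
    return np.sum(P * dH) - np.sum(W * dS)    # sum_{mu,nu} P dH - sum_{mu,nu} W dS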

8.2 An order-one correction

Let us consider two pairs of symmetric matrices:

• (H^ref, S^ref), for which an eigendecomposition is available; we denote by V the low-energy eigenvectors and by W the remaining eigenvectors. We assume that the rank of V is larger than N/2.

• (H,S) related to the new system state.

Let us define a simple distance ε between the two matrix pairs as:
\[
\varepsilon = \sqrt{\, \| H - H^{\mathrm{ref}} \|_F^2 + \| S - S^{\mathrm{ref}} \|_F^2 \,} . \qquad (30)
\]
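
As a minimal illustration (our own function name), this distance is simply:

import numpy as np

def matrix_pair_distance(H, S, H_ref, S_ref):
    """Distance epsilon of equation (30) between the matrix pairs
    (H, S) and (H_ref, S_ref), built from Frobenius norms."""
    return np.sqrt(np.linalg.norm(H - H_ref, 'fro') ** 2 +
                   np.linalg.norm(S - S_ref, 'fro') ** 2)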

Let us show that projecting the new problem onto the previous low-energy eigenvectors results in an order-one correction of the system's potential energy.

For clarity, we first introduce four lemmas.

• Lemma 1: Let H^r and S^r denote the new Hamiltonian and overlap matrices expressed in the old eigenvector basis Z = (V, W):
\[
H^r = Z^T H Z , \qquad S^r = Z^T S Z . \qquad (31)
\]
Precisely:
\[
H^r = \begin{pmatrix} H^v & H^{v\text{-}w} \\ (H^{v\text{-}w})^T & H^w \end{pmatrix} , \qquad
S^r = \begin{pmatrix} S^v & S^{v\text{-}w} \\ (S^{v\text{-}w})^T & S^w \end{pmatrix} , \qquad (32)
\]
where
\[
H^v = V^T H V , \qquad S^v = V^T S V , \qquad (33)
\]
\[
H^w = W^T H W , \qquad S^w = W^T S W , \qquad (34)
\]
\[
H^{v\text{-}w} = V^T H W , \qquad S^{v\text{-}w} = V^T S W . \qquad (35)
\]
The eigenvalues of (H^r, S^r) are also eigenvalues of (H, S).

• Lemma 2: Let H^p and S^p denote the matrices of the approximate Hamiltonian obtained by neglecting H^{v-w} (and S^{v-w}):
\[
H^p = \begin{pmatrix} H^v & 0 \\ 0 & H^w \end{pmatrix} , \qquad
S^p = \begin{pmatrix} S^v & 0 \\ 0 & S^w \end{pmatrix} . \qquad (36)
\]
When the largest eigenvalue of the matrix pair (H^v, S^v) is lower than the lowest eigenvalue of the matrix pair (H^w, S^w), the sum of the N/2 lowest eigenvalues of (H^p, S^p) is the sum of the N/2 lowest eigenvalues of (H^v, S^v).

• Lemma 3: Let H^a and S^a denote the matrices of the approximate Hamiltonian (obtained by neglecting H^{v-w}) expressed in the full basis:
\[
H^a = Z^{-T} H^p Z^{-1} , \qquad S^a = Z^{-T} S^p Z^{-1} . \qquad (37)
\]
The eigenvalues of (H^a, S^a) are those of (H^p, S^p), and we can state that:
\[
\sqrt{\, \| H^a - H^{\mathrm{ref}} \|_F^2 + \| S^a - S^{\mathrm{ref}} \|_F^2 \,} = O(\varepsilon) . \qquad (38)
\]

Proof: As
\[
\| H^{\mathrm{ref}} - H \|_F = O(\varepsilon) , \qquad \| S^{\mathrm{ref}} - S \|_F = O(\varepsilon) , \qquad (39)
\]
and Z is constant, we have
\[
\| Z^T (H^{\mathrm{ref}} - H) Z \|_F = O(\varepsilon) , \qquad \| Z^T (S^{\mathrm{ref}} - S) Z \|_F = O(\varepsilon) , \qquad (40)
\]
which can be rewritten as
\[
\| D^{\mathrm{ref}} - H^r \|_F = O(\varepsilon) , \qquad \| I - S^r \|_F = O(\varepsilon) , \qquad (41)
\]
where D^ref denotes the diagonal matrix of the ordered eigenvalues of the matrix pair (H^ref, S^ref).

Since the (block-)diagonal elements of both H^r - H^p and S^r - S^p are zero, we have:
\[
\| H^p - H^r \|_F \le \| D^{\mathrm{ref}} - H^r \|_F , \qquad \| S^p - S^r \|_F \le \| I - S^r \|_F . \qquad (42)
\]
Consequently, from equations (41) and (42), we have
\[
\| H^p - H^r \|_F = O(\varepsilon) , \qquad \| S^p - S^r \|_F = O(\varepsilon) . \qquad (43)
\]
Since, again, Z is constant,
\[
\| Z^{-T} H^p Z^{-1} - Z^{-T} H^r Z^{-1} \|_F = O(\varepsilon) , \qquad \| Z^{-T} S^p Z^{-1} - Z^{-T} S^r Z^{-1} \|_F = O(\varepsilon) , \qquad (44)
\]
which can be rewritten as
\[
\| H^a - H \|_F = O(\varepsilon) , \qquad \| S^a - S \|_F = O(\varepsilon) . \qquad (45)
\]
From equations (39) and (45), we can conclude that
\[
\| H^a - H^{\mathrm{ref}} \|_F = O(\varepsilon) , \qquad \| S^a - S^{\mathrm{ref}} \|_F = O(\varepsilon) . \qquad (46)
\]

• Lemma 4: Let e_i denote the i-th lowest eigenvalue of (H, S) and e^p_i the i-th lowest eigenvalue of (H^p, S^p). We have
\[
| e_i - e^p_i | = O(\varepsilon^2) . \qquad (47)
\]
Proof: One can apply eigenvalue perturbation theory with the matrix pairs (H, S) and (H^ref, S^ref). In the limit of small perturbations, the order of the eigenvalues is not changed and we can state:
\[
e_i = e^{\mathrm{old}}_i + z_i^T (H - e^{\mathrm{old}}_i S) z_i + O(\varepsilon^2) , \qquad (48)
\]
where z_i is the i-th eigenvector of (H^ref, S^ref). By definition of (H^r, S^r), we can rewrite equation (48) as
\[
e_i = e^{\mathrm{old}}_i + H^r_{ii} - e^{\mathrm{old}}_i S^r_{ii} + O(\varepsilon^2) . \qquad (49)
\]
One can apply eigenvalue perturbation theory with the matrix pairs (H^a, S^a) and (H^ref, S^ref). In view of lemma 3, this may be written as
\[
e^p_i = e^{\mathrm{old}}_i + z_i^T \left( Z^{-T} H^p Z^{-1} - e^{\mathrm{old}}_i Z^{-T} S^p Z^{-1} \right) z_i + O(\varepsilon^2) , \qquad (50)
\]
which is simply
\[
e^p_i = e^{\mathrm{old}}_i + H^p_{ii} - e^{\mathrm{old}}_i S^p_{ii} + O(\varepsilon^2) . \qquad (51)
\]
Since H^p_{ii} = H^r_{ii} and S^p_{ii} = S^r_{ii}, we can state that
\[
| e_i - e^p_i | = O(\varepsilon^2) . \qquad (52)
\]

Theorem: The potential energy error |E - E^v| induced by the use of the reduced basis V is asymptotically negligible compared to ε:
\[
| E - E^v | = O(\varepsilon^2) . \qquad (53)
\]

Proof: In view of lemma 1, the potential energy E is the sum of the lowest eigenvalues of (H^r, S^r):
\[
E = \sum_{i=1..N/2} 2 e_i . \qquad (54)
\]
In the limit of small ε, the largest eigenvalue of the matrix pair (H^v, S^v) is lower than the lowest eigenvalue of the matrix pair (H^w, S^w). Thus, in view of lemma 2, the reduced-basis potential energy E^v is the sum of the lowest eigenvalues of (H^p, S^p):
\[
E^v = \sum_{i=1..N/2} 2 e^v_i = \sum_{i=1..N/2} 2 e^p_i . \qquad (55)
\]
In the limit of small ε, lemma 4 implies that we can write the potential energy error |E - E^v| induced by the use of a reduced basis as:
\[
| E - E^v | = \left| \sum_{i=1..N/2} 2 (e_i - e^p_i) \right| = \sum_{i=1..N/2} O(\varepsilon^2) = O(\varepsilon^2) , \qquad (56)
\]
which shows that projecting the new problem (H, S) onto the previous low-energy eigenvectors results in an order-one correction of the system's potential energy.
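
The scaling stated in equation (53) can be checked numerically on a small random generalized eigenproblem. The sketch below is ours and only illustrates the statement; it is not part of the BAQM implementation. It projects the perturbed pair (H, S) onto the low-energy eigenvectors V of (H^ref, S^ref) and compares the reduced-basis energy with the exact one for decreasing ε.

import numpy as np
from scipy.linalg import eigh

def sym(M):
    """Return the symmetric part of a matrix."""
    return 0.5 * (M + M.T)

rng = np.random.default_rng(0)
n, n_occ = 40, 20                          # basis size N and number of occupied orbitals N/2

# Reference pair: H_ref symmetric, S_ref symmetric positive definite
H_ref = sym(rng.standard_normal((n, n)))
A = rng.standard_normal((n, n))
S_ref = A @ A.T + n * np.eye(n)

_, Z = eigh(H_ref, S_ref)                  # S_ref-orthonormal eigenvectors, ascending energies
V = Z[:, :n_occ]                           # low-energy eigenvectors used as the reduced basis

dH, dS = sym(rng.standard_normal((n, n))), sym(rng.standard_normal((n, n)))

for eps in (1e-1, 1e-2, 1e-3):
    H, S = H_ref + eps * dH, S_ref + eps * dS
    E = 2.0 * np.sum(eigh(H, S, eigvals_only=True)[:n_occ])     # exact energy
    Ev = 2.0 * np.sum(eigh(V.T @ H @ V, V.T @ S @ V,
                           eigvals_only=True))                  # reduced-basis energy
    print(f"eps = {eps:.0e}   |E - Ev| = {abs(E - Ev):.2e}")    # error decreases roughly as eps**2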

8.3 A multithreaded orthogonalization solver

In general, orthogonalization has a cubic scaling. However, it can be efficiently performed on multicore architectures2,10,11,28. Unfortunately, we are considering S-orthogonalization, and we had to implement an orthogonalization algorithm for this specific case. We chose the modified Gram-Schmidt algorithm19 and designed it in such a way that, for our system sizes, the multithreaded variant of the code (using OpenMP14) provides a good speed-up.

Let N denote the number of occupied molecular orbitals, S the overlap matrix, and B the reduced basis.


Algorithm 1 S-orthogonalization solver
Require: A matrix S and a matrix B with the vectors to S-orthogonalize.
Ensure: B contains S-orthogonal vectors.

1: SB ← S ∗ B
2: for i = 1 → N do
3:   norm ← sqrt(B(:, i)^T ∗ SB(:, i))
4:   B(:, i) ← (1/norm) ∗ B(:, i)   {The vector B(:, i) is now S-normalized}
5:   SB(:, i) ← (1/norm) ∗ SB(:, i)
6:   {The following loop is executed using multithreading}
7:   for j = i + 1 → N do
8:     x ← B(:, i)^T ∗ SB(:, j)
9:     B(:, j) ← B(:, j) − x ∗ B(:, i)   {B(:, i) and B(:, j) are now S-orthogonal}
10:    SB(:, j) ← SB(:, j) − x ∗ SB(:, i)
11:  end for
12: end for
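
For reference, a serial NumPy transcription of Algorithm 1 is given below (our own sketch; in the actual implementation, the inner loop over j is the one that is multithreaded with OpenMP).

import numpy as np

def s_orthogonalize(B, S):
    """Modified Gram-Schmidt with the S-inner product <u, v>_S = u^T S v.
    B has one vector per column and is modified in place; on return,
    B^T S B is (close to) the identity."""
    N = B.shape[1]
    SB = S @ B                                    # line 1: SB <- S * B
    for i in range(N):
        norm = np.sqrt(B[:, i] @ SB[:, i])        # line 3
        B[:, i] /= norm                           # line 4: B(:, i) is now S-normalized
        SB[:, i] /= norm                          # line 5
        for j in range(i + 1, N):                 # lines 7-11: the multithreaded loop
            x = B[:, i] @ SB[:, j]                # line 8
            B[:, j] -= x * B[:, i]                # line 9: B(:, j) is S-orthogonal to B(:, i)
            SB[:, j] -= x * SB[:, i]              # line 10: keep SB consistent with S * B
    return B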

References

1. SAMSON (Software for Adaptive Modeling and Simulation Of Nanosystems). NANO-D. http://nano-d.inrialpes.fr/.

2. E. Agullo, J. Dongarra, R. Nath, and S. Tomov. A fully empirical autotuned dense QR factorization for multicore architectures. Technical Report 242, LAPACK Working Note, 2011.

3. A. B. Anderson. Electron density distribution functions and the ASED-MO theory. International Journal of Quantum Chemistry, 49(5):581–589, 1994.

4. S. Artemova and S. Redon. ARPS: Adaptively Restrained Particle Simulations. To be published, 2012.

5. P. Bientinesi, I.S. Dhillon, and R.A. Van De Geijn. A parallel eigensolver for dense symmetric matrices based on multiple relatively robust representations. SIAM Journal on Scientific Computing, 27(1):43–66, 2006.

6. M. Bosson, S. Grudinin, X. Bouju, and S. Redon. Interactive physically-based structural modeling of hydrocarbon systems. Journal of Computational Physics, 2011.

7. M. Bosson, C. Richard, A. Plet, S. Grudinin, and S. Redon. Interactive quantum chemistry: A divide-and-conquer ASED-MO method. Journal of Computational Chemistry, 2012.

8. Elena Breitmoser and Andrew G. Sunderland. A performance study of the PLAPACK and ScaLAPACK eigensolvers on HPCx for the standard problem. Technical Report from the HPCx Consortium. http://www.hpcx.ac.uk/research/hpc/hpcxtr0406.pdf, 2004.

9. E.L. Briggs, D.J. Sullivan, and J. Bernholc. Real-space multigrid-based approach to large-scale electronic structure calculations. Physical Review B, 54(20):14362, 1996.

10. A. Buttari, J. Langou, J. Kurzak, and J. Dongarra. Parallel tiled QR factorization for multicore architectures. Concurrency and Computation: Practice and Experience, 20(13):1573–1590, 2008.

11. A. Buttari, J. Langou, J. Kurzak, and J. Dongarra. A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Computing, 35(1):38–53, 2009.

12. E. Cances, C. Le Bris, N.C. Nguyen, Y. Maday, A.T. Patera, and G.S.H. Pau. Feasibility and competitiveness of a reduced basis approach for rapid electronic structure calculations in quantum chemistry. In Proceedings of the Workshop for High-dimensional Partial Differential Equations in Science and Engineering (Montreal), 2007.

13. E. Chan, E.S. Quintana-Orti, G. Quintana-Orti, and R. Van De Geijn. SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures. In Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, pages 116–125. ACM, 2007.

14. R. Chandra. Parallel programming in OpenMP. Morgan Kaufmann, 2001.

15. C.J. Cramer. Essentials of computational chemistry: theories and models. John Wiley & Sons Inc, 2004.

16. S.L. Dixon and K.M. Merz Jr. Semiempirical molecular orbital calculations with linear system size scaling. Journal of Chemical Physics, 104(17):6643–6649, 1996.

17. M.D. Ermolaeva, A. van der Vaart, and K.M. Merz Jr. Implementation and testing of a frozen density matrix divide-and-conquer algorithm. The Journal of Physical Chemistry A, 103(12):1868–1875, 1999.

18. L. Genovese, A. Neelov, S. Goedecker, T. Deutsch, S.A. Ghasemi, A. Willand, D. Caliste, O. Zilberberg, M. Rayson, A. Bergman, et al. Daubechies wavelets as a basis set for density functional pseudopotential calculations. The Journal of Chemical Physics, 129:014109, 2008.

19. G.H. Golub and C.F. Van Loan. Matrix computations, volume 3. Johns Hopkins Univ. Press, 1996.

20. R. Hoffmann. An extended Hückel theory. I. Hydrocarbons. Journal of Chemical Physics, 39(6):1397–1412, 1963.

21. Intel. Math Kernel Library. www.intel.com/software/products/mkl.

22. M. Kobayashi and H. Nakai. Divide-and-conquer approaches to quantum chemistry: Theory and implementation. Linear-Scaling Techniques in Computational Chemistry and Physics, pages 97–127, 2011.

23. W. Kohn. Density functional and density matrix method scaling linearly with the number of atoms. Phys. Rev. Lett.,76(17):3168–3171, Apr 1996.

24. G. Kresse and J. Furthmüller. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Computational Materials Science, 6(1):15–50, 1996.

25. T.S. Lee, J.P. Lewis, and W. Yang. Linear-scaling quantum mechanical calculations of biological molecules: The divide-and-conquer approach. Computational Materials Science, 12(3):259–277, 1998.

26. Tai-Sung Lee and Weitao Yang. Frozen density matrix approach for electronic structure calculations. International Journal of Quantum Chemistry, 69(3):397–404, 1998.

27. S.B. Legoas, P.A.S. Autreto, M.Z.S. Flores, and D.S. Galvao. Graphene to graphane: the role of H frustration in lattice contraction. Arxiv preprint arXiv:0903.0278, 2009.

28. L. Liu, Z. Li, and A.H. Sameh. Analyzing memory access intensity in parallel programs on multicore. In Proceedings of the 22nd annual international conference on Supercomputing, pages 359–367. ACM, 2008.

29. Y. Maday and U. Razafison. A reduced basis method applied to the Restricted Hartree–Fock equations. Comptes Rendus Mathematique, 346(3):243–248, 2008.

30. R.S. Mulliken. Spectroscopy, molecular orbitals, and chemical bonding. Nobel Lecture, December 12, 1966.

31. A.K. Noor and J.M. Peters. Reduced basis technique for nonlinear analysis of structures. In Structures, Structural Dynamics, and Materials Conference, 20th, St. Louis, MO, pages 116–126, 1979.

32. W. Pan, T.S. Lee, and W.T. Yang. Parallel implementation of divide-and-conquer semiempirical quantum chemistry calculations. Journal of Computational Chemistry, 19(9):1101–1109, 1998.

33. M.J. Rayson and P.R. Briddon. Rapid iterative method for electronic-structure eigenproblems using localised basis functions. Computer Physics Communications, 178(2):128–134, January 2008.

34. R. Rossi, M. Isorce, S. Morin, J. Flocard, K. Arumugam, S. Crouzy, M. Vivaudou, and S. Redon. Adaptive torsion-angle quasi-statics: a general simulation method with applications to protein structure analysis and design. Bioinformatics, 23(13):i408–i417, 2007.

35. F. Shimojo, R.K. Kalia, A. Nakano, and P. Vashishta. Divide-and-conquer density functional theory on hierarchical real-space grids: Parallel implementation and applications. Physical Review B, 77(8):085103, 2008.

36. J.O. Sofo, A.S. Chaudhari, and G.D. Barber. Graphane: A two-dimensional hydrocarbon. Physical Review B, 75(15):153401, 2007.

37. G.W. Stewart. Perturbation bounds for the definite generalized eigenvalue problem. Linear Algebra and its Applications, 23:69–85, 1979.

38. G.W. Stewart and J. Sun. Matrix Perturbation Theory, volume 175. Academic Press, San Diego, CA, 1990.

39. J.J.P. Stewart. Application of localized molecular orbitals to the solution of semiempirical self-consistent field equations. International Journal of Quantum Chemistry, 58(2):133–146, 1996.

40. P.R. Surján, D. Kohalmi, Z. Rolik, and Á. Szabados. Frozen localized molecular orbitals in electron correlation calculations: exploiting the Hartree-Fock density matrix. Chemical Physics Letters, 450(4-6):400–403, 2008.

41. H. Sutter. The free lunch is over: A fundamental turn toward concurrency in software. Dr. Dobb's Journal, 30(3):202–210, 2005.

42. M. Thottethodi, S. Chatterjee, and A.R. Lebeck. Tuning Strassen's matrix multiplication for memory efficiency. In Proceedings of the 1998 ACM/IEEE conference on Supercomputing (CDROM), pages 1–14. IEEE Computer Society, 1998.

43. I.S. Ufimtsev and T.J. Martínez. Quantum chemistry on graphical processing units. 1. Strategies for two-electron integral evaluation. Journal of Chemical Theory and Computation, 4(2):222–231, 2008.

44. Arjan Van Der Vaart, Dimas Suárez, and Kenneth M. Merz. Critical assessment of the performance of the semiempirical divide and conquer method for single point calculations and geometry optimizations of large chemical systems. Journal of Chemical Physics, 113(23):10512–10523, 2000.

45. V. Volkov and J.W. Demmel. Benchmarking GPUs to tune dense linear algebra. In High Performance Computing, Networking, Storage and Analysis, 2008. SC 2008. International Conference for, pages 1–11. IEEE, 2008.

46. R.C. Whaley, A. Petitet, and J. Dongarra. Automated empirical optimizations of software and the ATLAS project. Parallel Computing, 27(1-2):3–35, 2001.

47. W. Yang and T.S. Lee. A density-matrix divide-and-conquer approach for electronic structure calculations of large molecules. Journal of Chemical Physics, 103(13):5674–5678, 1995.

48. R. Zalesny, M.G. Papadopoulos, P.G. Mezey, and J. Leszczynski. Linear-scaling techniques in computational chemistry and physics: Methods and applications. Challenges and Advances in Computational Chemistry and Physics (13), 2011.
