Quantum Annealing Initialization of the Quantum

Quantum Annealing Initialization of the Quantum Approxi-mate Optimization AlgorithmStefan H. Sack and Maksym Serbyn

IST Austria, Am Campus 1, 3400 Klosterneuburg, Austria

June 28, 2021

The quantum approximate optimization algo-rithm (QAOA) is a prospective near-term quan-tum algorithm due to its modest circuit depthand promising benchmarks. However, an ex-ternal parameter optimization required in theQAOA could become a performance bottleneck.This motivates studies of the optimization land-scape and search for heuristic ways of parame-ter initialization. In this work we visualize theoptimization landscape of the QAOA applied tothe MaxCut problem on random graphs, demon-strating that random initialization of the QAOAis prone to converging to local minima with sub-optimal performance. We introduce the initial-ization of QAOA parameters based on the Trot-terized quantum annealing (TQA) protocol, pa-rameterized by the Trotter time step. We findthat the TQA initialization allows to circumventthe issue of false minima for a broad range oftime steps, yielding the same performance asthe best result out of an exponentially scalingnumber of random initializations. Moreover, wedemonstrate that the optimal value of the timestep coincides with the point of proliferation ofTrotter errors in quantum annealing. Our re-sults suggest practical ways of initializing QAOAprotocols on near-term quantum devices and re-veal new connections between QAOA and quan-tum annealing.

1 IntroductionRecent technological advances have led to a largenumber of implementations [1–4] of so-called NoisyIntermediate-Scale Quantum (NISQ) devices [5]. Thesemachines, which allow to manipulate a small number ofimperfect qubits with limited coherence time, inspiredthe search for practical quantum algorithms. The quan-tum approximate optimization algorithm (QAOA) [6]

Stefan H. Sack: [email protected] Serbyn : [email protected]

has emerged as a promising candidate for such NISQdevices [7–9].

The QAOA is a variational hybrid quantum algo-rithm where the classical computer operates a NISQ de-vice. The computer is responsible for the optimizationof the cost function over a set of variational parameters.The cost function is calculated using a NISQ devicethat prepares a quantum state corresponding to cho-sen parameters and performs quantum measurements.In QAOA of depth p the wave function is prepared bya unitary circuit parametrized by 2p parameters, seeFig. 1(a). Each of the p layers consist of two unitaries:the first is generated by a classical Hamiltonian HC thatencodes the cost function of a combinatorial optimiza-tion problem, and the second is generated by the mixingquantum Hamiltonian, HB .

While the p = 1 limit of QAOA allows for analyticconsiderations and derivation of performance guaran-tees [6], subsequent work suggested that higher depthp may be required in order to achieve a quantum ad-vantage [8, 10]. However, increasing p leads to a pro-gressively more complex optimization landscape, thatis characterized by a large number of local suboptimalminima [7, 9, 11, 12], see Fig. 1(c). The convergence ofclassical optimization algorithms into such sub-optimalsolutions was demonstrated to be a potential bottleneckof QAOA performance as finding a nearly optimal min-imum usually requires exponential in p number of ini-tializations of the classical optimization algorithm [6, 7].Note, that the problem of sub-optimal local minimais different from that of barren plateaus [13, 14], i.e.large regions in parameter space with vanishing gradi-ents, since barren plateaus are associated with circuitdepths p polynomial in system size N [15], beyond whatis typically considered in the QAOA.

The complexity of the energy landscape of large-pQAOA has motivated the search for heuristic ways ofimproving the convergence to a (nearly) optimal mini-mum values of the variational parameters. Recent workhas demonstrated a concentration of the QAOA land-scape for typical problem instances [16], which impliesthe existence of a typical landscape and hints at thefact that the same variational parameter choice may

Accepted in Quantum 2021-06-15, click title to verify. Published under CC-BY 4.0. 1

arX

iv:2

101.

0574

2v3

[qu

ant-

ph]

29

Jun

2021

https://quantum-journal.org/?s=Quantum%20Annealing%20Initialization%20of%20the%20Quantum%20Approximate%20Optimization%20Algorithm&reason=title-click

https://quantum-journal.org/?s=Quantum%20Annealing%20Initialization%20of%20the%20Quantum%20Approximate%20Optimization%20Algorithm&reason=title-click

https://orcid.org/0000-0001-5400-8508

https://orcid.org/0000-0002-2399-5827

mailto:[email protected]

mailto:[email protected]

work between different problem instances or sizes. Aparticular example of such a heuristic was proposed inRef. [7] which constructs a good initialization for theQAOA at level p+ 1 using the solution at level p, thusrequiring a polynomial in p number of optimizationruns. Other approaches, such as reusing parametersfrom similar graphs [12], using an initial state that en-codes the solution of a relaxed problem [17], or utilizingmachine learning techniques to predict QAOA parame-ters [18, 19] were also proposed.

In this work we propose a different approach to theQAOA initialization, based on the relation betweenQAOA and the quantum annealing algorithm. Quan-tum annealing uses adiabatic time evolution to find thelowest energy state of HC , but often requires unfeasibleevolution time T [20]. We explore the observation thatTrotterization of unitary evolution in quantum anneal-ing provides a particular choice of parameters for theQAOA [6]. This leads us to introduce a one-parameterfamily of Trotterized quantum annealing (TQA) ini-tializations for QAOA, controlled by the time step or,equivalently, total time used in adiabatic evolution.

The central result of our work is the demonstrationthat TQA initialization for QAOA gives comparableperformance to the search over an exponentially scal-ing number of random initializations. To this end, weestablish that TQA initialization leads to convergenceof the QAOA to a nearly optimal minimum for a cer-tain range of time steps, see Fig. 1(c) for visualization.Furthermore, we identify the optimal time step of theTQA initialization and suggest a purely experimentalway of fixing this parameter.

Our work reveals a connection between intermediate-p QAOA and short-time quantum annealing. Previ-ous studies [6–8] established a correspondence betweenquantum annealing with long annealing times and theQAOA protocol with large p (potentially increasingexponentially with the problem size). More recentwork proposed quantum annealing inspired initializa-tion strategies for the so-called ‘bang-bang’ modifica-tion of the QAOA [21] that however also correspondsto large circuit depths. Our work is different from thiscontext, since we establish that the best performance isachieved for a very coarse discretization of quantum an-nealing, resulting in a realistic circuit depth. We showthe existence of an optimal step for TQA discretizationthat does not depend on problem size and QAOA depth.This suggests an intimate relation between QAOA andTQA, since the optimal value of the time step is in closecorrespondence to the point where proliferation of Trot-ter error occurs in TQA [22].

The remainder of the paper is organized as follows:in Section 2 we introduce the QAOA, visualize its opti-mization landscape and show that most random initial-

(a)

optimized

i

γ*i , β*i

1 5+ ⟩ + ⟩ + ⟩

e−iγ1HC

e−iβ1HB

e−iγpHC

e−iβpHB

Measure ⟨HC⟩(b) (c)

TQA-initialization

Random initializationsinitialization

i

γi, βi

1 5

Figure 1: (a) The circuit that prepares a quantum state in theQAOA is parametrized by a set of 2p angles γi, βi. For theMaxCut problem, that is considered in the main text, the uni-taries can be expressed using single and two qubit gates that arereadily available on current NISQ devices. (b) The optimizationof 〈HC〉 is launched from a certain guess of parameters, statepreparation and measurements are iterated until the algorithmconverges to a set of optimized angles γ∗

i , β∗i . (c) The cartoon

of the cost function 〈HC〉 landscape as a function of varia-tional parameters shows that random initializations are proneto converge to sub-optimal local minima. In contrast, the fam-ily of TQA initializations proposed in this work converges tothe (nearly) optimal minimum.

izations concentrate around sub-optimal local minima.Next, in Section 3 we discuss TQA and the correspond-ing initialization and show that it avoids converging atsub-optimal local optima. Finally, in Section 4 we sum-marize the results, discuss its implications and potentialfuture work.

2 Optimization landscape of QAOA2.1 QAOA for MaxCut problemsAs we discussed in the Introduction, the QAOA is of-ten applied to hard combinatorial optimization prob-lems. In what follows we concentrate on the problemof finding a maximal cut (MaxCut) in a given graphwhich has become one of standard tasks used to bench-mark the QAOA [7, 9]. Finding the maximum cut is anNP -hard combinatorial optimization problem, thoughefficient classical algorithms exist that yield good ap-proximate solutions. Notably, the Goemans-Williamsonalgorithm yields a cut that is at least 88% of the size ofthe maximum cut in polynomial time [23].

Given a graph G = (V,E) with vertices V =1, 2, ..., N and edges E = {〈i, j〉}, the maximal cut isdefined as the partition that splits the vertices into twogroups, maximizing the number of edges that connectvertices from different groups. Mathematically, suchpartition amounts to finding the global minimum of acost function, C(~z) =

∑〈i,j〉εE zizj , where the binary

variables zi correspond to the vertices of the graph, and


their value zi = ±1 encodes which partition the givenvertex i belongs to. The cost function C(~z) can bemapped into a classical spin Hamiltonian by promotingthe binary variable zi to the quantum spin-1/2 operatorσzi . The resulting Hamiltonian,

HC =∑〈i,j〉εE

σzi σzj , (1)

operates on N spins that reside on the vertices V of thecorresponding graph and interact with each other whenconnected by an edge.

The QAOA uses a NISQ device to prepare the follow-ing quantum state [see Fig. 1(a)]:

|~γ, ~β〉 =p∏i=1

e−iβiHBe−iγiHC |0〉B , (2)

where HC is classical Hamiltonian introduced above,and HB = −

∑Ni σ

xi the mixing Hamiltonian, as pro-

posed by Farhi et al. [6]. Both operators operate on theHilbert space corresponding to N spins or, equivalently,qubits, and the initial state |0〉B = |+〉⊗N correspondsto all qubits pointing along x-direction, thus yieldingthe ground state of HB . The variational parameters areobtained by minimizing the expectation value 〈HC〉~γ,~βas:

( ~γ∗, ~β∗) = arg min(~γ,~β)〈~γ, ~β|HC |~γ, ~β〉, (3)

which is typically carried out with numerical optimiza-tion routines. To benchmark the QAOA it is useful todefine the approximation ratio,

r~γ,~β = 〈~γ,~β|HC |~γ, ~β〉Cmin

, (4)

which quantifies how close the expectation value of theclassical Hamiltonian over the QAOA wave function isto the ground state energy of HC , denoted as Cmin. ForQAOA at depth p = 1 the algorithm is guaranteed tofind a cut that is at least 69% the size of the optimalcut [6], while for p > 1 analytic results are limited [24].

The performance of the QAOA is typically investi-gated over an ensemble of graphs rather than an individ-ual realization. Below we focus on the ensemble of ran-dom 3-regular graphs, where each vertex is connected tothree other vertices chosen at random. However, in theAppendix we also consider weighted 3-regular graphsand Erdos-Renyi graph ensembles in order to illustratethe general applicability of our results.

2.2 Visualizing optimization landscapeThe performance of the classical optimization in Eq. (3)strongly depends on the properties of the optimization

landscape. While this landscape can be readily visual-ized for p = 1, the dependence of approximation ratior~γ,~β on 2p angles parametrizing QAOA was suggestedto become progressively more complex for larger valuesof p. In order to visualize the properties of this high-dimensional landscape, we focus below on points where1− r~γ,~β achieves (local) minima.

We quantify properties of minima using two differ-ent characteristics. First, we measure the difference be-tween the approximation ratio of the given minimumcharacterized by angles ~γ, ~β and the global minimumcharacterized by angles ~γ∗, ~β∗, ∆r~γ,~β = r~γ∗,~β∗ − r~γ,~β .This definition implies that the smallest possible valueof ∆r~γ,~β is 0, and larger values of ∆r~γ,~β corresponds to

local minima with poor performance (i.e. much largervalue of cost function) compared to the global mini-mum. The second characteristic measures the distancebetween minima in parameter space,

d~γ,~β =p∑i=1

(|βi − β∗i |π2 + |γi − γ∗i |π

), (5)

where | . . . |α denotes the absolute value modulo α whichtakes into account symmetries, see Appendix A.

We calculate values of ∆r~γ,~β and d~γ,~β numerically.For a given graph realization we use 2p different ran-dom initializations of variational parameters ~γ, ~β andoptimize them using the iterative BFGS algorithm [25–28]. The algorithm is accessed via the scipy.optimizePython module with default parameters [29]. Conver-gence is achieved when the norm of the gradient is lessthan 10−5, maximum number of iterations is set to 400p,where p is the QAOA depth. In our simulations the rou-tine typically converged before using up the maximumnumber of allowed iterations. We use the converged an-gles with the lowest value of 1− r~γ,~β as an estimate forthe global minimum γ∗i , β

∗i .

Figure 2 visualizes the structure of local minima viathe joint probability distribution of ∆r~γ,~β and d~γ,~β for

50 different graphs using Kernel Density Estimation [30,31]. We observe that for QAOA with p = 5 the mosttypical local minima reached from random initializationare far away from the best minimum (corresponding to∆r~γ∗,~β∗ = 0 and d~γ∗,~β∗ = 0) both in terms of quality ofapproximation ratio and parameter values. While thisfigure illustrates a particular choice of system size andQAOA depth, a similar trend is observed for differentN , p, and other graph ensembles, see Appendix A.

The tendency of random initialization to converge tosuboptimal solutions highlights the importance of bet-ter initialization methods. In the next section we inves-tigate a family of initializations inspired by quantumannealing and demonstrate that it achieves a good ap-proximation ratio with a suitable choice of parameters.


0.0 2.5 5.0 7.5 10.0 12.5 15.0d~�,~�

0.00

0.05

0.10

0.15

0.20

0.25

0.30�

r ~�,~ �

avg. TQA init. QAOA

avg. random init. QAOA

0.0

0.2

0.4

0.6

0.8

1.0

Joi

nt

pro

bab

ilty

dis

trib

ution

Figure 2: Joint probability distribution of distance to the globalminimum in parameter space d~γ,~β and in terms of approxima-tion ratio ∆r~γ,~β reveals that the most probable outcome of ran-dom initialization is a convergence to sub-optimal local minima(yellow region). The orange dot corresponds to average val-ues of d~γ,~β ,∆r~γ,~β for random initialization. In contrast, TQAinitialization leads to a local minimum with a better approx-imation ratio that occasionally outperforms the best randominitialization (red square, shifted from slightly negative valuesto ∆r~γ,~β = 0 for improved visibility). The data is averaged over50 random unweighted 3-regular graphs with N = 12 verticesand QAOA at level p = 5.

3 Trotterized quantum annealing as ini-tialization

3.1 Optimal time for TQA

Quantum annealing [32, 33] was among the first algo-rithms proposed for quantum computing [34, 35], andwas demonstrated to be universal for T →∞ and equiv-alent to digital quantum computing [36]. The generalidea of quantum annealing is to prepare the groundstate |0〉C of a classical Hamiltonian HC starting fromthe ground state |0〉B of the mixing Hamiltonian HB

using adiabatic time evolution under the HamiltonianH(t) = (1 − t/T )HB + (t/T )HC . Practical executionof quantum annealing on NISQ devices requires dis-cretization to represent such unitary evolution via a se-quence of gates, resulting in the TQA algorithm. Thefirst order Suzuki-Trotter decomposition allows to ap-proximate the time evolution with H(t) over time in-terval ∆t as e−i∆tH(t) ≈ e−iβHBe−iγHC + O(∆t2) withβ = (1− t/T )∆t and γ = (t/T )∆t.

Applying such decomposition to the quantum anneal-ing protocol that is uniformly discretized on a grid ofevolution times ti = i∆t with i = 1, ..., p and time step∆t = T/p, we obtain the unitary circuit equivalent to

2 4 6 8 10p

1

2

3

4

5

6

7

T∗

4 6 8 10 12N

0.7

0.8

δt

0 5 10T

0.0

0.5

r ~γ,~ β

p = 10

p = 5

T ∗TQA = δt p

Numerics

Figure 3: Optimal time of TQA evolution T ∗ increases linearlywith number of discretization steps p. Top inset illustratesthat optimal performance of TQA at time T ∗ is followed bythe rapid decrease in approximation ratio at longer times T ∗.Data is shown for N = 12. Bottom inset shows finite sizescaling of the time step δt, determined by the slope of the T ∗

vs p dependence, that assumes approximately constant valuewith the graph size. All averaging is performed over 50 randominstances of unweighted 3-regular graphs.

the depth-p QAOA ansatz (2) with angles being

γi = i

p∆t, βi =

(1− i

p

)∆t. (6)

In what follows we refer to such choice of angles as TQAinitialization, controlled by the time step ∆t at a fixeddepth p.

The mapping between TQA and QAOA along withthe universality of quantum annealing for T → ∞ waspreviously used as an argument for the existence of goodQAOA protocols at depths p→∞ [6]. Typically the re-quired evolution time of quantum annealing is inverselyproportional to the square of the minimal energy gapT ∝ ∆−2 encountered in the Hamiltonian H(t) overthe time evolution. Numerous studies established thatthe time required for a good performance often blowsup exponentially due to the encounter of exponentiallysmall gaps in N [20].

In contrast to previous studies, we investigate TQAperformance in a different setting that is motivated byits subsequent usage as a QAOA initialization. TheQAOA is characterized by a fixed circuit depth, p.Therefore, we fix p and study the performance of TQAas a function of total time T or, equivalently, time step∆t, related as T = p∆t. Generally the performance ofquantum annealing tends to increase with the total an-nealing time. However in case of fixed p, longer anneal-ing time corresponds to a coarser discretization, whichleads to larger Trotter errors that scale proportionallyto O(∆t2) at small values of ∆t. It is the interplay be-tween increased efficiency and Trotter errors that leads


to the existence of an optimal annealing time in thepresent setting. This is illustrated in Fig. 3 (top inset),where the approximation ratio for the TQA protocolincreases with T for small times, reaching a maximumat time T ∗ followed by a sharp downturn. The sharpdecrease of QA performance after T ∗ was reported byHeyl et al. [22], who attributed it to a phase transitioncaused by a proliferation of Trotter errors.

Main panel of Fig. 3 reveals a linear scaling of theoptimal time T ∗ with the number of time steps p. Thisis equivalent to the existence of an optimal time step δt,that determines T ∗ as

T ∗TQA = δt p. (7)

The bottom inset in Fig. 3 shows that the time step δtdefined as a slope of a linear fit of T ∗ with p convergeswith the problem size N . This gives a strong evidencethat δt is a well-defined quantity in the thermodynamiclimit N → ∞. For the family of the 3-regular graphsconsidered here we observe that the optimal time steptends to value δt ≈ 0.75. The existence of an optimaltime step that is of order one holds for three other graphensembles, considered in Appendix B, although the nu-merical value of this time step depends on the specificgraph ensemble.

We use the TQA initialization in Eq. (6) with timestep ∆t = 0.75 for the QAOA and observe in Fig. 2 thatit allows to avoid the local minima and helps the QAOAto converge to a minimum that is very close to the globalminimum in terms of approximation ratio. This resultmotivates the systematic analysis of the performance ofthe TQA initialization.

3.2 TQA initialization of QAOAWe continue with a detailed study of the TQA initial-ization defined in Eq. (6) as a function of time T atfixed p. The green line in Fig. 4(a) reveals that theapproximation ratio remains constant for a range oftimes, denoted as [T ∗min, T

∗max]. This figure shows re-

sults for p = 5 QAOA applied to graphs with N = 12vertices, but a similar trend holds for other values ofdepth, problem sizes, and graph ensembles. The con-stant approximation ratio in a range of T is naturallyexplained by the convergence of parameter optimizationroutine to the same minimum for T ∈ [T ∗min, T

∗max], see

cartoon in Fig. 1(c). In order to discriminate betweendifferent times in the above range, we study the dis-tance between initialization parameters and optimizedvalues of ~γ, ~β. The red line Fig. 4(a) shows that thisdistance has a well-pronounced minimum at a time de-noted as T ∗d that is contained within the same interval[T ∗min, T

∗max]. The TQA initialization with time T ∗d is

0 1 2 3 4 5T

0.00

0.05

0.10

0.15

0.20

1−

2 4 6 8 10

p

0

2

4

6

8

10

T∗

0.0

0.5

1.0

1.5

2.0

2.5

3.0

d �γ,� β

(a)

(b)

r �γ,� β

T ∗max

T ∗min

T ∗d

T ∗max

T ∗min

T ∗d

T ∗TQA = δtp

Figure 4: (a) Approximation ratio of the p = 5 QAOA as afunction of TQA initialization time T reveals that a range ofinitialization times [T ∗

min, T∗max] (green triangle and star) yield

the performance within 1% of the minimal 1 − r~γ,~β . On theother hand, the study of the distance between the TQA ini-tialization and the converged value of the angles reveals theexistence of a time T ∗

d where the QAOA performs the smallestparameter updates. (b) All three times T ∗

min, T ∗max, and T ∗

d de-fined in panel (a) increase linearly with QAOA circuit depth p.Moreover, T ∗

d is very close to the time where the TQA protocolitself achieves optimal performance, T ∗

TQA, see Fig. 2. Data wasobtained for N = 12 and averaged over 50 random graphs.

closest to the local minimum achieved from it in a senseof distance defined in Eq. (5).

All three times T ∗min, T ∗max, and T ∗d were defined aboveusing the QAOA with fixed depth p. Figure 4(b) revealsthat all three times scale approximately linearly with p.This allows to define a range of time steps for the TQAinitialization that yield the same performance of opti-mized QAOA, ∆t ∈ [0.16, 0.92] for the present graphensemble. Moreover, the time T ∗d nearly coincides withthe optimal TQA time T ∗TQA = δt p obtained in theprevious section, implying that ∆t = δt = 0.75 is theoptimal value of time step. This result also holds forthe MaxCut problem on other graph families, see Ap-pendix.

The similarity between the optimal time of the TQAprotocol to the time where the angular distance d~γ,~βbetween the initial and final protocol is minimized, sug-gests that the performance of the QAOA is boundedby the same phase transition that occurs in TQA [22].However, the QAOA is able to provide a significant im-provement over TQA by doing additional optimizationsof variational parameters. Recent work [7] suggestedthat such performance improvement may be due to uti-


2 4 6 8 10

p

10−2

10−1

100

2p random init.

4 6 8 10 12N

0.0

0.1

1−

r γ,β

1−

r γ,β

= 0.75δtTQA init. with

Figure 5: A single optimization run of the QAOA with TQAinitialization with time T = δt p yields equivalent performanceto the best out of 2p random initializations. System size is N =12. Inset reveals that the comparable performance persists overthe entire range of considered system sizes, circuit depth isp = 10. Averaging was performed over 50 random graphs.

lization of “diabatic pumps” that allow to return thepopulation from excited states back to the ground state.This could potentially explain the systematic deviationof the QA protocol from TQA initialization as seen inFig. 8 in Appendix C.

Finally, we compare the performance of QAOA thatused 2p random initializations to the QAOA launchedfrom TQA initialization with optimal time step δt. Sur-prisingly, Fig. 5 shows that TQA initialization yields thesame performance as the best result for random initial-ization even for QAOA protocols with depth compara-ble to the problem size, N . Moreover, the inset of Fig. 5illustrates that the excellent performance of TQA ini-tialization holds true for a broad range of system sizesN , while Appendix D presents equally encouraging re-sults for other graph ensembles. Note that the QAOAperformance for fixed p decreases with system size N ,which was attributed to the fact that the QAOA withfixed p cannot “probe” the whole graph. In order forthe QAOA to achieve constant performance for increas-ing problem size N , the depth of QAOA should increaseat least as logN [7].

4 Summary and discussionOur central result is the establishment of a family ofTQA initializations for the QAOA parametrized by atime step ∆t. We find that TQA initialization allowsthe QAOA to find a solution close to the global optimafor a broad range of parameter ∆t. In this range ourinitialization scheme achieves results similar to the bestoutcome of 2p random initializations, with a single opti-mization run. Moreover we establish a heuristic way to

identify the optimal ∆t for the TQA initialization fromthe performance of the TQA protocol.

Our results open the door to more time-efficient prac-tical implementations of the QAOA on NISQ devices.To this end, we propose a two-step practical NISQ al-gorithm that capitalizes on the success of TQA initial-ization and uses the heuristic results to establish anoptimal value of the time step. The first two steps ofAlgorithm 1 implement the TQA protocol on a NISQdevice, thus obtaining an estimate for the optimal timein the TQA initialization. This can be readily carriedout on today’s NISQ devices [37]. The second part ofthe algorithm consists of running the QAOA optimiza-tion loop using values of variational parameters accord-ing to Eq. (6).

Algorithm 1 QAOA with TQA initialization1: Implement QAOA ansatz with circuit depth p.2: Estimate time step δt using TQA:

find optimal time T ∗ ← arg minT⟨HC

⟩p

and set δt← T∗

p , see Fig. 3.3: Use TQA initialization γi ← i

pδt and βi ← (1− ip )δt.

4: Run the QAOA parameter optimization, see Fig. 1.

Numerical simulations presented above suggest goodperformance of the above algorithm in the idealized casewhen presence of noise, gate errors, and other imperfec-tions are neglected. Moreover, the fact that TQA ini-tialization converges to a good minimum for the rangeof times (equivalently, time steps) T ∈ [T ∗min, T

∗max], see

Fig. 4, suggests that this algorithm has a high toler-ance towards imperfections in determining the value ofδt. Determining the performance of this algorithm ona real NISQ device or incorporating some of the im-perfections into the numerical simulation remains aninteresting open problem.

In our studies we restricted our attention to the Max-Cut problem and demonstrated success of our approachfor three different random graph ensembles. We expectthat these results also hold for other graph ensembles,provided that the concentration of the QAOA landscapeis true [16]. It is also interesting to check if our find-ings hold true beyond the MaxCut problem. Further-more, it will be interesting to study the finite size scal-ing for problem sizes N > 12 considered here usingmatrix product states [38] or neural-network quantumstates [39, 40].

In addition to practical NISQ algorithms, our findingsuggest a previously unknown connection between theQAOA at relatively small circuit depth and quantumannealing. The fact that quantum annealing inspiredinitializations belong to a basin of attraction of a high-quality minimum in the QAOA landscape, see Fig. 1(c),


invites a more comprehensive study of the QAOA land-scape from this perspective. How many good qualityminima typically exist in such landscape? How differ-ent are they from each other and what are their basinsof attraction? Can one use other information mea-sures such as entanglement or Fisher information [41]to characterize the QAOA landscape? Finding answersto such questions may lead to other prospective familiesof QAOA initializations.

While TQA provides a good initialization, the sub-sequent QAOA optimization is able to significantlyimprove the performance. Understanding the under-lying mechanisms of such performance improvementis an outstanding challenge. In particular, there re-mains an intriguing possibility that the QAOA opti-mization routine implements some of the techniques,developed to improve the annealing fidelity, such as di-abatic pumps [7], shortcuts to adiabaticity [42], andcounterdiabatic driving [43, 44]. The fact that the op-timal time step coincides with the point of proliferationof Trotter errors [22], thus effectively taking maximalpossible value suggests interesting parallels to the Pon-tryagin’s minimum principle considered in context ofvariational quantum algorithms [45].

To conclude, we hope that TQA initialization of theQAOA established in this work will help to achieve prac-tical quantum advantage by executing the QAOA onavailable devices and inspire future research that couldlead to better understanding of what happens under thehood of QAOA optimization.

Data and code availabilityData is available upon reasonable request, a brief tuto-rial for the TQA initialization can be found in Ref. [46]

AcknowledgmentsWe would like to thank D. Abanin and R. Medina forfruitful discussions and A. Smith and I. Kim for valuablefeedback on the manuscript. We acknowledge supportby the European Research Council (ERC) under theEuropean Union’s Horizon 2020 research and innovationprogram (Grant Agreement No. 850899).

A Optimization landscape for differentgraph ensemblesWe start by reviewing all graph ensembles used in themain text and Appendices. In particular, we focus onsymmetries that allow to reduce the space of QAOAparameters.

3-regular unweighted graphs represent the graph en-semble considered in the main text. Each vertex is con-nected exactly to three other vertices chosen at random.In order to sample graphs from this ensemble we usethe networkx Python package [47]. For 3-regular un-weighted graphs the space of variational parameters canbe restricted using the fact that the classical Hamilto-nian has integer eigenvalues (thus γi are defined moduloπ) and that shifting any of angles βi by π/2 is equiv-alent to a spin flip of HC that has no effect [7]. Thisallows to restrict βi ∈ [−π4 ,

π4 ) and γi ∈ [−π2 ,

π2 ), and is

reflected in the definition of distance in Eq. (5) in themain text.

3-regular weighted graphs are characterized by pres-ence of random weights wij assigned to each edge〈i, j〉. These weights are chosen to be wij ∈ [0, 1).Presence of random weights does not allow to restrictthe domain of γi angles as before, though restrictionβi ∈ [−π4 ,

π4 ) still works. Therefore the analogue of

Eq. (5) for this and other weighted ensembles reads

d(w)~γ,~β

=∑pi=1(|βi − β∗i |π2 + |γi − γ∗i |).

Erdos-Renyi graphs represent a random graph ensem-ble where two edges are connected on random with afixed probability, chosen to be q = 0.5. In contrast toabove examples, the fixed value of q implies that edgeconnectivity increases with number of vertices as qN .Erdos-Renyi graphs exhibit the same symmetries as 3-regular unweighted graphs.

The presence of an unbounded region of parametersγi in the weighted graph ensemble represents an ad-ditional challenge in visualizing the QAOA optimiza-tion landscape and choice of initialization parameter.In order to explore the importance of large values of|γi|, we consider the sequence of enlarged intervalsγi ∈ [−k π2 , k

π2 ) with k = 1, 2. Figure 6 shows the joint

probability distributions similar to Fig. 2. We see thatfor 3-regular weighted graphs the enlarged initializationinterval k = 2 leads to a concentration of local optimafurther away from the global solution compared to thek = 1 interval. When we repeat the same analysis forErdos-Renyi graphs, we observe that ∆r~γ,~β is unaffectedby the enlarged k = 2 interval. This numerically con-firms the symmetry considerations from above and al-lows us to restrict ~γ to the k = 1 interval in all furtheranalysis. For unweighted graphs such restriction relieson symmetry, and for weighted graphs this is motivatedby the fact that an extended region of γi worsens theperformance of random initialization in the QAOA.

B Optimal time for TQABelow we discuss the dependence of the optimal timestep δt of the TQA algorithm on the graph ensemble.


0.0

0.2

0.4

0.6�

r ~�,~ �

avg. TQA init. QAOA

avg. random init. QAOA

0 10 20d~�,~�

0.0

0.2

0.4

0.6

�r ~�

,~ �

0 10 20d~�,~�

0.0

0.2

0.4

0.6

0.8

1.0

Joi

nt

pro

bab

ilty

dis

trib

ution

Figure 6: Comparing the joint probability distribution of thedistance to the global minimum in parameter space d~γ,~β andin terms of approximation ratio ∆r~γ,~β for weighted 3-regular(top) and Erdos-Renyi graphs with edge probability 0.5 (bot-tom) reveals that the distribution is dependent on the initial-ization interval for weighted 3-regular graphs. We initialize theparameters for k = 1 (left) and k = 2 (right) and observethat for weighted 3-regular graphs the enlarged interval leadsto an increased spread of the local optimas in ∆r~γ,~β (yellowregion). The spread in ∆r~γ,~β for Erdos-Renyi graphs remainslargely unaffected, as expected from the symmetry considera-tions. Similarly to Fig. 2, red squares correspond to the QAOAminimum achieved from TQA initialization (shifted from smallnegative values of ∆r~γ,~β to zero for improved visibility), or-ange dots correspond to the average performance of randominitialization. Data is for 50 random graphs with N = 10 andp = 5.

An analytical upper bound on the number of Trottersteps p needed to approximate the time evolution withprecision ε in terms of operator trace distance was ob-tained in Ref. [48]. Translating this bound into the scal-ing of δt we obtain δt ∝ 1/(||HC ||FN), where ||HC ||F isthe Frobenius norm of the classical Hamiltonian. Thisnorm exponentially diverges with N , suggesting verysmall values of δt at large system sizes. This is notsurprising, since the bound of Ref. [48] operates on thedistance between two many-body unitary operators. Incontrast, the performance of the TQA algorithm is stud-ied using the approximation ratio that quantifies howclose the expectation value of the local observable HC ,is to the ground state energy.

The effect of Trotterization on local observables wasconsidered in Ref. [22]. This work conjectured the ex-istence of a finite value of the time step of order one,at which the discretization of time evolution fails toapproximate the local observables. This value of thetime step may be related to the convergence radius ofthe Baker-Campbell-Hausdorff series expansion, which

0.75

1.00

1.25

δt

4 5 6 7 8 9 10N

2

3

4

s N

weighted

unweighted

iyneR–sodrE

Figure 7: (Top) Optimal time time step of TQA evolution δtis largely independent of system size and scales qualitativelysimilar to Eq. (8) shown in the bottom panel.

is governed by the norm of the classical Hamiltonianand its commutator with HB . Phenomenologically, theFrobenius norm divided by the square root of Hilbertspace dimension and problem size N ,

sN = N2N/2

||HC ||F, (8)

is expected to be N -independent in the thermodynamiclimit.

Figure 7 compares the dependence of δt on the sys-tem size with the phenomenological scaling sN definedin Eq. (8). We observe that the expression sN qualita-tively matches the numerical scaling that we observe forδt between different graph ensembles. In particular, thevalue of the time step is largest for weighted 3-regulargraphs that are expected to have the smallest norm ofthe classical Hamiltonian. However, sN fails to cap-ture δt quantitatively, highlighting the need to developa better analytical understanding of the point that gov-erns the phase transition from localization to quantumchaos for local observables according to Ref. [22].

C Patterns in optimized parametersThe QAOA is inspired by TQA and is thus universalfor p→∞. However, for finite p the converged QAOAparameters also display stark similarity to a QA pro-tocol which was noticed in some earlier works [7, 8].In Fig. 8 we compare the TQA initialization and fi-nal QAOA parameters. The QAOA parameters showonly slight alterations at the beginning of the protocoland remain close to their original values throughout therest of the protocol. This holds true for the three graphtypes that we considered in our analysis. In addition,the small variation between optimal parameters for dif-ferent graph instances is in line with the concentration


0.0

0.5

1.0

1.5

−0.5

0.0

0.5

1.0

1.5

2.0

2 4 6 8 10i

0.0

0.5

1.0

1.5

(a)

(b)

(c)

γ∗ i,β

∗ iγ∗ i,β

∗ iγ∗ i,β

∗ i

Figure 8: Converged parameters ~γ∗ (red) and ~β∗ (orange) showonly slight alterations from the TQA initialization indicated bythe green and blue lines respectively. The QAOA optimizationmodifies parameters at small i, while they remain TQA-like inthe rest of the protocol. The results were averaged over 50random unweighted 3-regular graphs (a), weighted 3-regulargraphs (b) and Erdos-Renyi graphs (c), all data is for p = 10and N = 10.

of the QAOA landscape demonstrated analytically atlow p in Ref. [16].

D Random vs TQA initialization forother graph ensemblesIn addition to the unweighted 3-regular graphs, dis-cussed in the main text, we also test TQA initializationon weighted 3-regular graphs and Erdos-Renyi graphs.We find that TQA initialization yields the same perfor-mance as the best of random initializations for weighted3-regular graphs, see Fig. 9. For Erdos-Renyi, TQAinitialization even outperforms the best of 2p randominitializations.

References[1] F. Arute et al., Quantum Approximate Optimiza-

tion of Non-Planar Graph Problems on a Pla-nar Superconducting Processor, arXiv e-prints ,arXiv:2004.04197 (2020), arXiv:2004.04197 [quant-ph] .

10−2

10−1

100

1−

r �γ,� β

2p random init. QAOA

2 4 6 8 10p

10−2

10−1

100

1−

r �γ,� β

2p random init. QAOA

5.0 7.5 10.0N

0.0

0.1

1−

r �γ,� β

5.0 7.5 10.0N

0.0

0.2

1−

r �γ,� β



Figure 9: TQA initialization leads to the same QAOA perfor-mance as the best of 2p random initializations for both weighted3-regular graphs (top) and Erdos-Renyi graphs (bottom). Weaverage the results over 50 graph realizations, the main plotwas obtained for system size N = 10, inset is for circuit depthp = 10.

[2] F. Arute et al., Hartree-Fock on a superconductingqubit quantum computer, Science 369, 1084 (2020),arXiv:2004.04174 [quant-ph] .

[3] F. Arute et al., Observation of separated dy-namics of charge and spin in the Fermi-Hubbardmodel, arXiv e-prints , arXiv:2010.07965 (2020),arXiv:2010.07965 [quant-ph] .

[4] K. Wright et al., Benchmarking an 11-qubit quan-tum computer, Nature Communications 10, 5464(2019), arXiv:1903.08181 [quant-ph] .

[5] J. Preskill, Quantum Computing in the NISQera and beyond, arXiv e-prints , arXiv:1801.00862(2018), arXiv:1801.00862 [quant-ph] .

[6] E. Farhi, J. Goldstone, and S. Gutmann, A Quan-tum Approximate Optimization Algorithm, arXive-prints , arXiv:1411.4028 (2014), arXiv:1411.4028[quant-ph] .

[7] L. Zhou, S.-T. Wang, S. Choi, H. Pichler, andM. D. Lukin, Quantum approximate optimizationalgorithm: Performance, mechanism, and imple-mentation on near-term devices, Phys. Rev. X 10,021067 (2020).

[8] G. E. Crooks, Performance of the Quan-tum Approximate Optimization Algorithm onthe Maximum Cut Problem, arXiv e-prints ,arXiv:1811.08419 (2018), arXiv:1811.08419 [quant-ph] .

[9] M. Willsch, D. Willsch, F. Jin, H. De Raedt, andK. Michielsen, Benchmarking the quantum approx-imate optimization algorithm, Quantum Informa-tion Processing 19, 197 (2020), arXiv:1907.02359[quant-ph] .


https://arxiv.org/abs/2004.04197


https://doi.org/10.1126/science.abb9811



https://doi.org/10.1038/s41467-019-13534-2

https://doi.org/10.1038/s41467-019-13534-2





https://doi.org/10.1103/PhysRevX.10.021067




https://doi.org/10.1007/s11128-020-02692-8

https://doi.org/10.1007/s11128-020-02692-8



[10] S. Bravyi, A. Kliesch, R. Koenig, and E. Tang, Ob-stacles to State Preparation and Variational Op-timization from Symmetry Protection, arXiv e-prints , arXiv:1910.08980 (2019), arXiv:1910.08980[quant-ph] .

[11] G. G. Guerreschi and A. Y. Matsuura, QAOA forMax-Cut requires hundreds of qubits for quantumspeed-up, Scientific Reports 9, 6903 (2019).

[12] R. Shaydulin, I. Safro, and J. Larson, Multistartmethods for quantum approximate optimization, in2019 IEEE High Performance Extreme ComputingConference (HPEC) (2019) pp. 1–8.

[13] J. R. McClean, S. Boixo, V. N. Smelyan-skiy, R. Babbush, and H. Neven, Barrenplateaus in quantum neural network training land-scapes, Nature Communications 9, 4812 (2018),arXiv:1803.11173 [quant-ph] .

[14] Z. Holmes, K. Sharma, M. Cerezo, and P. J.Coles, Connecting ansatz expressibility to gradientmagnitudes and barren plateaus, arXiv e-prints ,arXiv:2101.02138 (2021), arXiv:2101.02138 [quant-ph] .

[15] M. Cerezo, A. Sone, T. Volkoff, L. Cincio,and P. J. Coles, Cost function dependent bar-ren plateaus in shallow parametrized quantum cir-cuits, Nature Communications 12, 1791 (2021),arXiv:2001.00550 [quant-ph] .

[16] F. G. S. L. Brandao, M. Broughton, E. Farhi,S. Gutmann, and H. Neven, For Fixed ControlParameters the Quantum Approximate Optimiza-tion Algorithm’s Objective Function Value Con-centrates for Typical Instances, arXiv e-prints ,arXiv:1812.04170 (2018), arXiv:1812.04170 [quant-ph] .

[17] D. J. Egger, J. Marecek, and S. Woerner, Warm-starting quantum optimization, arXiv e-prints ,arXiv:2009.10095 (2020), arXiv:2009.10095 [quant-ph] .

[18] M. Alam, A. Ash-Saki, and S. Ghosh, Acceler-ating Quantum Approximate Optimization Algo-rithm using Machine Learning, arXiv e-prints ,arXiv:2002.01089 (2020), arXiv:2002.01089 [cs.ET].

[19] S. Khairy, R. Shaydulin, L. Cincio, Y. Alexeev,and P. Balaprakash, Learning to Optimize Vari-ational Quantum Circuits to Solve Combinato-rial Problems, arXiv e-prints , arXiv:1911.11071(2019), arXiv:1911.11071 [cs.LG] .

[20] T. Albash and D. A. Lidar, Adiabatic quantumcomputation, Rev. Mod. Phys. 90, 015002 (2018).

[21] D. Liang, L. Li, and S. Leichenauer, Investigatingquantum approximate optimization algorithms un-der bang-bang protocols, Phys. Rev. Research 2,033402 (2020).

[22] M. Heyl, P. Hauke, and P. Zoller, Quantum lo-calization bounds Trotter errors in digital quan-tum simulation, Science Advances 5, 10.1126/sci-adv.aau8342 (2019).

[23] M. X. Goemans and D. P. Williamson, Improvedapproximation algorithms for maximum cut andsatisfiability problems using semidefinite program-ming, J. ACM 42, 1115 (1995).

[24] J. Wurtz and P. J. Love, Bounds on MAXCUTQAOA performance for p > 1, arXiv e-prints ,arXiv:2010.11209 (2020), arXiv:2010.11209 [quant-ph] .

[25] C. G. BROYDEN, The Convergence of a Class ofDouble-rank Minimization Algorithms 1. GeneralConsiderations, IMA Journal of Applied Mathe-matics 6, 76 (1970).

[26] R. Fletcher, A new approach to variable metric al-gorithms, The Computer Journal 13, 317 (1970).

[27] D. Goldfarb, A family of variable-metric meth-ods derived by variational means, Mathematics ofComputation 24, 23 (1970).

[28] D. F. Shanno, Conditioning of quasi-Newton meth-ods for function minimization, Mathematics ofComputation 24, 647 (1970).

[29] P. Virtanen, R. Gommers, T. E. Oliphant,M. Haberland, T. Reddy, D. Cournapeau,E. Burovski, P. Peterson, W. Weckesser, J. Bright,S. J. van der Walt, M. Brett, J. Wilson, K. J.Millman, N. Mayorov, A. R. J. Nelson, E. Jones,R. Kern, E. Larson, C. J. Carey, I. Polat, Y. Feng,E. W. Moore, J. VanderPlas, D. Laxalde, J. Perk-told, R. Cimrman, I. Henriksen, E. A. Quintero,C. R. Harris, A. M. Archibald, A. H. Ribeiro,F. Pedregosa, P. van Mulbregt, and SciPy 1.0 Con-tributors, SciPy 1.0: Fundamental Algorithms forScientific Computing in Python, Nature Methods17, 261 (2020).

[30] M. Rosenblatt, Remarks on some nonparametricestimates of a density function, Ann. Math. Statist.27, 832 (1956).

[31] E. Parzen, On estimation of a probability densityfunction and mode, Ann. Math. Statist. 33, 1065(1962).

[32] T. Kadowaki and H. Nishimori, Quantum anneal-ing in the transverse Ising model, Phys. Rev. E 58,5355 (1998).

[33] J. Brooke, D. Bitko, T. F. Rosenbaum, and G. Aep-pli, Quantum Annealing of a Disordered Magnet,Science 284, 779 (1999).

[34] E. Farhi, J. Goldstone, S. Gutmann, J. Lapan,A. Lundgren, and D. Preda, A Quantum Adia-batic Evolution Algorithm Applied to Random In-stances of an NP-Complete Problem, Science 292,472 (2001), arXiv:quant-ph/0104129 [quant-ph] .




https://doi.org/10.1038/s41598-019-43176-9

https://doi.org/10.1109/HPEC.2019.8916288

https://doi.org/10.1109/HPEC.2019.8916288

https://doi.org/10.1038/s41467-018-07090-4




https://doi.org/10.1038/s41467-021-21728-w








https://doi.org/10.1103/RevModPhys.90.015002

https://doi.org/10.1103/PhysRevResearch.2.033402

https://doi.org/10.1103/PhysRevResearch.2.033402

https://doi.org/10.1126/sciadv.aau8342

https://doi.org/10.1126/sciadv.aau8342

https://doi.org/10.1145/227683.227684



https://doi.org/10.1093/imamat/6.1.76

https://doi.org/10.1093/imamat/6.1.76

https://doi.org/10.1093/comjnl/13.3.317

http://www.jstor.org/stable/2004873




https://doi.org/10.1038/s41592-019-0686-2

https://doi.org/10.1038/s41592-019-0686-2

https://doi.org/10.1214/aoms/1177728190




https://doi.org/10.1103/PhysRevE.58.5355

https://doi.org/10.1103/PhysRevE.58.5355

https://doi.org/10.1126/science.284.5415.779

https://doi.org/10.1126/science.1057726

https://doi.org/10.1126/science.1057726

https://arxiv.org/abs/quant-ph/0104129

[35] E. Farhi, J. Goldstone, S. Gutmann, and M. Sipser,Quantum Computation by Adiabatic Evolu-tion, arXiv e-prints , quant-ph/0001106 (2000),arXiv:quant-ph/0001106 [quant-ph] .

[36] D. Aharonov, W. van Dam, J. Kempe, Z. Landau,S. Lloyd, and O. Regev, Adiabatic Quantum Com-putation Is Equivalent to Standard Quantum Com-putation, SIAM Review 50, 755 (2008).

[37] A. Smith, M. S. Kim, F. Pollmann, and J. Knolle,Simulating quantum many-body dynamics on acurrent digital quantum computer, npj Quan-tum Information 5, 106 (2019), arXiv:1906.06343[quant-ph] .

[38] U. Schollwock, The density-matrix renormalizationgroup in the age of matrix product states, Annalsof Physics 326, 96 (2011), arXiv:1008.3477 [cond-mat.str-el] .

[39] G. Carleo and M. Troyer, Solving the quantummany-body problem with artificial neural net-works, Science 355, 602 (2017), arXiv:1606.02318[cond-mat.dis-nn] .

[40] M. Medvidovic and G. Carleo, Classical vari-ational simulation of the Quantum Approxi-mate Optimization Algorithm, arXiv e-prints ,arXiv:2009.01760 (2020), arXiv:2009.01760 [quant-ph] .

[41] A. Abbas, D. Sutter, C. Zoufal, A. Lucchi, A. Fi-galli, and S. Woerner, The power of quantum neu-ral networks, arXiv e-prints , arXiv:2011.00027(2020), arXiv:2011.00027 [quant-ph] .

[42] D. Guery-Odelin, A. Ruschhaupt, A. Kiely, E. Tor-rontegui, S. Martınez-Garaot, and J. G. Muga,Shortcuts to adiabaticity: Concepts, methods, andapplications, Rev. Mod. Phys. 91, 045001 (2019).

[43] D. Sels and A. Polkovnikov, Minimizing irreversiblelosses in quantum systems by local counterdiabaticdriving, Proceedings of the National Academy ofSciences 114, E3909 (2017).

[44] P. W. Claeys, M. Pandey, D. Sels, andA. Polkovnikov, Floquet-engineering counterdia-batic protocols in quantum many-body systems,Phys. Rev. Lett. 123, 090602 (2019).

[45] Z.-C. Yang, A. Rahmani, A. Shabani, H. Neven,and C. Chamon, Optimizing variational quantumalgorithms using Pontryagin’s minimum principle,Phys. Rev. X 7, 021027 (2017).

[46] S. H. Sack, Trotterized quantum annealing ini-tialization of the QAOA, https://github.com/shsack/TQA-init.-for-QAOA (2021).

[47] A. Hagberg, P. Swart, and D. S Chult, Explor-ing network structure, dynamics, and function us-ing NetworkX , Tech. Rep. (Los Alamos NationalLab.(LANL), Los Alamos, NM (United States),2008).

[48] D. W. Berry, G. Ahokas, R. Cleve, andB. C. Sanders, Efficient Quantum Algorithmsfor Simulating Sparse Hamiltonians, Communica-tions in Mathematical Physics 270, 359 (2007),arXiv:quant-ph/0508139 [quant-ph] .



https://doi.org/10.1137/080734479

https://doi.org/10.1038/s41534-019-0217-0

https://doi.org/10.1038/s41534-019-0217-0



https://doi.org/10.1016/j.aop.2010.09.012

https://doi.org/10.1016/j.aop.2010.09.012



https://doi.org/10.1126/science.aag2302






https://doi.org/10.1103/RevModPhys.91.045001

https://doi.org/10.1073/pnas.1619826114

https://doi.org/10.1073/pnas.1619826114

https://doi.org/10.1103/PhysRevLett.123.090602


https://github.com/shsack/TQA-init.-for-QAOA

https://github.com/shsack/TQA-init.-for-QAOA

https://networkx.org/



https://doi.org/10.1007/s00220-006-0150-x

https://doi.org/10.1007/s00220-006-0150-x


Documents

Quantum Annealing Initialization of the Quantum