Quantum physical synthesis: Improving physical design by netlist modifications

ARTICLE IN PRESS

Microelectronics Journal 41 (2010) 219–230

Contents lists available at ScienceDirect

Microelectronics Journal


journal homepage: www.elsevier.com/locate/mejo

Quantum physical synthesis: Improving physical design by netlist modifications

Naser Mohammadzadeh *, Mehdi Sedighi, Morteza Saheb Zamani

Department of Computer Engineering and Information Technology, Amirkabir University of Technology, Tehran, Iran

Article info

Article history:
Received 19 August 2009
Received in revised form 10 February 2010
Accepted 22 February 2010
Available online 11 March 2010

Keywords:
Quantum computing
Physical design
Physical synthesis
Gate exchanging

0026-2692/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.mejo.2010.02.005

* Corresponding author. Tel.: +98 21 64545112; fax: +98 21 66495521.
E-mail addresses: [email protected] (N. Mohammadzadeh), [email protected] (M. Sedighi), [email protected] (M. Saheb Zamani).

Abstract

The quantum circuit design flow consists of two main tasks: synthesis and physical design. In current flows, the two procedures are performed one after the other with no information shared between them, which can limit the optimization of quantum circuit metrics: synthesis converts the design description into a technology-dependent netlist, and physical design then takes the fixed netlist, produces the layout, and schedules the netlist on the layout. To address this limitation, in this paper we introduce the concept of physical synthesis to quantum circuits, improving the objectives by manipulating the layout or netlist locally using layout information. We propose a physical synthesis technique for quantum circuits that uses a gate-exchanging heuristic to improve circuit latency. Moreover, a physical design flow enhanced by the technique is proposed. Our experimental results show that the proposed physical design flow, empowered by the gate-exchanging technique, decreases the latency of quantum circuits by about 24% on average for the attempted benchmarks.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

As the transistor size continues to shrink to atomic scales, Moore’s law confronts a small-scale limit: wires cannot be made thinner than atoms [1]. Moreover, as the quantum regime is approached, quantum effects become increasingly significant. For example, in a system where one bit is encoded as the presence or absence of an electron, the electron must be confined to a very small region, so its location is known very precisely and, by the Heisenberg uncertainty principle, its momentum cannot be determined with high accuracy [1]. Since there is no reasonable bound on the electron’s momentum, a large potential is needed to keep it in place, and significant energy must be expended during logic switching [1]. Researchers from NCSU, SRC and Intel [2] used a quantitative analysis of these phenomena to derive fundamental limits on the scalability of any computing device that moves electrons.

Although these quantum effects are great barriers to the progress of classical CMOS, they provide a radically different form of computation [3]. Theoretically, quantum computers, i.e. computers that exploit quantum effects, could outperform their classical counterparts when solving certain problems. Factorization [4], unsorted database search [5], and the simulation of quantum mechanical systems [6] are some classically hard problems that benefit from quantum algorithms. For example, a successful large-scale implementation of Shor’s integer factorization [4] could have a deep effect on the RSA cryptosystem used in electronic commerce. Quantum computing may also be used for public-key cryptography [7]. MagiQ Technologies [8] and IdQuantique [9] have built such cryptography systems based on single-photon communication.

A quantum algorithm requires a quantum circuit for successful implementation. Viewed broadly, the quantum circuit design flow can be divided into two processes: synthesis and physical design (Fig. 1). The synthesis process takes a description and generates a technology-dependent netlist. The physical design process creates a specific layout of the circuit based on the target technology. Even though taking the layout information into consideration during synthesis, or integrating the two processes into one monolithic process, could potentially lead to a better final layout, the synthesis and physical design processes are normally done separately to avoid increasing the complexity of the process to an unmanageable level [17].

The CMOS design flow was similar to the above-mentioned process until physical synthesis, the integration of logic synthesis with physical design information, was born in the mid to late 1990s [10]. Physical synthesis deals with the local manipulation of the netlist or layout, using layout information, to improve the objectives or meet the design constraints. Such an approach can also be useful in quantum circuit design, but the physical synthesis techniques proposed for classical CMOS design are not directly applicable to quantum circuit design because of the fundamental differences between CMOS and quantum technologies. Therefore, new physical synthesis techniques should be proposed for quantum circuits.

Focusing on this issue, this paper brings the physical synthesis idea to the quantum design flow and proposes a technique for physical synthesis in quantum circuits. The proposed technique takes a technology-dependent netlist and a layout and manipulates the netlist locally, using layout information, to reach an improved netlist with lower latency. Moreover, a novel physical design flow embedding the proposed physical synthesis technique is suggested.

The rest of this paper is organized as follows: an overview of the prior work is presented in Section 2, followed by an introduction to the ion trap technology in Section 3. In Section 4, the physical synthesis concept and the details of the gate exchanging technique are discussed. Section 5 contains the details of our physical design flow. In Section 6, the proposed technique and flow are illustrated by an example. Section 7 shows the experimental results, and Section 8 concludes the paper.

2. Related work

Despite the significant work on quantum algorithms and the underlying physics, few studies have explored the quantum circuit design flow. Svore et al. [11,12] proposed a design flow that starts with a quantum program and generates its corresponding physical operations. Their work outlined various file formats and provided initial implementations of some of the necessary tools. Their design flow, which has four phases, converts a high-level program specified in the mathematical abstractions of quantum mechanics and linear algebra into a low-level set of machine instructions scheduled on a fixed H-tree-based layout [12].

Fig. 1. Current quantum circuit design flow: High-Level Description → Synthesis → Technology-Dependent Netlist → Physical Design → Scheduled Layout.

Fig. 2. (a) Physical layout demonstrated for a T-junction (three-way intersection). (b) Abstraction of the circuit in (a), built using the StraightChannel and ThreeWayIntersection macroblocks shown in Fig. 3. (c) MEMS mirrors placed above the ion traps plane guide the laser beams to gate locations [17].

Similarly, Balensiefer et al. [13,14] proposed a design flow. Their flow takes a quantum description in QCL [15] and synthesizes it to a technology-dependent netlist. In the physical design phase, the flow schedules the generated netlist on a fixed layout using the list-scheduling algorithm [16].

Whitney et al. [17] also suggested a quantum design flow that takes a description and generates its layout in ion trap technology. They proposed new heuristics for layout generation and scheduling. Their physical design stage includes laying out and scheduling a fixed netlist.

Additionally, hand-optimized layouts have been proposed in the literature. Metodi et al. [18] proposed a uniform Quantum Logic Array architecture, and later extended and improved it [19]. The focus of that work was on architectural research, and the details of physical layout or scheduling were not explored. The same group later [20] developed a tool to automatically generate a physical operations schedule, given a quantum circuit and a fixed grid-based layout structure.

All the works proposing a quantum design flow perform the synthesis and physical design processes separately. The algorithms proposed for physical design take a fixed netlist and lay it out. While the approach proposed in this paper builds upon some of these ideas, it concentrates on bringing physical synthesis into the quantum design flow and on proposing a physical design flow that incorporates the technique.

3. Technology abstraction

Ion trap technology [21,22] was chosen as the underlying technology to study the proposed flow. Trapped ions have shown good potential for scalability [23]. In this technology, a physical qubit is an ion, and a gate is a location where a trapped ion may be operated upon by a modulated laser. Pulse sequences applied to discrete electrodes on the edges of ion traps cause the ion to be trapped or ballistically moved between traps. Fig. 2a shows a layout that was experimentally demonstrated for a three-way intersection [24].

In this paper, the library of macroblocks defined in [17] is used for two reasons. First, by using the macroblocks, some of the low-level details can be removed and the analyses can be insulated from the variations in the technology implementations of ion traps. Details such as ion species, electrode sizing and geometry, and exact voltage levels necessary for trapping and moving ions are all summarized within the macroblocks. Second, a carefully timed application of pulse sequences to electrodes in non-adjacent traps is required for ballistic movement along a channel. Using basic blocks consisting of a few ion traps has the benefit that building an interface between the basic blocks requires communication only between the two blocks involved.

Fig. 3 shows the library defined in [17]. As this figure shows, each macroblock consists of a 3×3 structure of trap regions and electrodes, with some ports to allow qubit movement between the macroblocks. The black squares are gate locations; in ion trap technology, gates may not be performed at intersections or turns. Different orientations of each of these macroblocks can be used in a layout. Fig. 2 shows a possible mapping of a demonstrated layout (Fig. 2a) to macroblock abstractions (Fig. 2b). As Fig. 2c shows, the laser pulses are guided to the gate locations by an array of MEMS mirrors located above the ion trap plane in order to apply quantum gates [25].

Some key characteristics of ion trap technology can be summarized as follows:

• Rectangular channels lined with electrodes make “wires” in ion traps. Atomic ions can be suspended above the channel regions and moved ballistically [26]. The synchronized application of voltages on the channel electrodes causes qubits to move ballistically. Therefore, movement control circuitry is required for each wire to handle any qubit communication.
• Any operation available in the ion trap technology can be performed at each gate location. This makes it possible to reuse gate locations within a quantum circuit.
• Fabrication and control of ion traps in the third dimension is difficult. Thus, scalable ion trap systems are two-dimensional [24]. This imposes a restriction on ion crossings, i.e. all ion crossings must be intersections.
• Multiple ions may share any routing channel as long as control circuits prevent multi-ion occupancy.
• Aside from Manhattan distance, the geometry of the wire channel is also important in the calculation of the movement latency of ions. Experiments have shown that a right-angle turn takes substantially longer than a straight channel over the same distance [26].

Fig. 4. A circuit with controlled gates (gates g0–g4 acting on qubits Q0–Q2).

4. Physical synthesis of quantum circuits

Integrating the synthesis and physical design processes into one monolithic process is impractical because of the unmanageable complexity of the problem [17]. On the other hand, doing them separately, without any information sharing between the two processes, can limit the optimization. An intermediate solution is conceivable that changes the layout and/or netlist locally, using the layout information, to improve the metrics or meet the design constraints. Prior physical design approaches do not use this idea. They take a fixed netlist and generate a layout without manipulating the input netlist. Fixing the netlist after the synthesis process limits the achievable improvement, whereas some local netlist and/or layout manipulations that consider the physical layout information could improve circuit metrics. This idea is known as physical synthesis in classical CMOS design [10]. Gate sizing, buffer insertion, and wire sizing are among the techniques proposed for physical synthesis in classical CMOS design [10]. These techniques are not applicable to quantum designs, but the general idea can be used to improve quantum circuits. To this end, this section introduces a physical synthesis technique for quantum circuits that exchanges gates after layout generation to improve the latency of quantum circuit execution. It is important to note that the initial and the modified netlists both have the same functionality and the same synthesis cost, defined as the number of gates or the circuit depth [27]. Therefore, existing synthesis algorithms cannot prefer one to the other in terms of latency. In other words, our technique uses post-layout information to exchange gates properly.

Fig. 3. Library of basic macroblocks used in this paper. Ports (P0–P3) and electrodes of each macroblock make it possible for the ion to be moved and trapped. Some macroblocks contain a trap region where gates may be performed (black square) [17].

4.1. Gate exchanging technique

Definition 1. An important set of quantum gates is controlled gates. A controlled gate has a control set and a target. The control set includes the qubits that determine whether function U is to be applied to the target qubit. Fig. 4 shows some controlled gates. For example, the control set of gate g0 includes only qubit Q1. Qubit Q0 is the target qubit of gate g0. If Q1 is ‘1’, function U is applied to Q0.

Definition 2. We call two controlled gates “of the same type” if their corresponding function is the same. For instance, in Fig. 4, g1 and g2 are of the same type.

Definition 3. We call two gates in a circuit consecutive if their levels in the dataflow graph of the circuit are consecutive. For example, in Fig. 4, gates g1 and g2 are consecutive and their levels are two and three, respectively.

Lemma 1. Assume that gates A and B are two consecutive controlled gates, such as controlled-NOT, controlled-Z, etc., and that gate A has control set $C_A$ and target $T_A$ and gate B has control set $C_B$ and target $T_B$. These two gates can be exchanged if, and only if, one of the following conditions is satisfied:

1. A and B are of the same type, $T_A \notin C_B$ and $T_B \notin C_A$.
2. A and B are of different types, $T_A \notin C_B$, $T_B \notin C_A$ and $T_B \neq T_A$.

For example, in Fig. 4, since gates g1 and g2 meet the first condition of Lemma 1, they can be exchanged. Gates g3 and g4 satisfy the second condition of Lemma 1. Therefore, they can also be exchanged. The general idea of the gate exchanging heuristic is to determine the proper order of two exchangeable gates based on layout information. The proper order of two exchangeable gates can decrease the latency of a quantum circuit.
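The two conditions of Lemma 1 can be checked mechanically. The following is a minimal Python sketch under stated assumptions: the `ControlledGate` record, its field names and the function `can_exchange` are hypothetical and introduced only for illustration, and the caller is assumed to have already verified that the two gates are consecutive in the sense of Definition 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlledGate:
    """Hypothetical record for a controlled gate (not from the paper)."""
    name: str
    kind: str            # applied function, e.g. "X" or "Z" (Definition 2)
    controls: frozenset  # control set of qubit names
    target: str          # target qubit name


def can_exchange(a: ControlledGate, b: ControlledGate) -> bool:
    """Check the conditions of Lemma 1 for two consecutive gates a and b."""
    # Shared requirement: neither target lies in the other gate's control set.
    targets_clear = a.target not in b.controls and b.target not in a.controls
    if a.kind == b.kind:
        return targets_clear                        # condition 1
    return targets_clear and a.target != b.target   # condition 2


# Illustrative gates only; the wiring is assumed, not read from Fig. 4.
cz_a = ControlledGate("a", "Z", frozenset({"q0"}), "q1")
cz_b = ControlledGate("b", "Z", frozenset({"q2"}), "q1")
print(can_exchange(cz_a, cz_b))
```

Keeping the check a pure function of the two gates makes it cheap to call for every consecutive pair while the netlist is parsed.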

5. Our physical design flow

In order to embed the proposed physical synthesis technique in the design flow, a novel physical design flow is proposed in this section. Our physical design flow is shown in Fig. 5. The flow starts with an initial layout generation step, followed by an optimization loop that applies our physical synthesis technique. After an initial layout is generated, the first step of the optimization loop parses the netlist to find an exchangeable pair of gates satisfying a condition of Lemma 1. If such a pair is found, the gates are tentatively exchanged and the new netlist is evaluated by updating the scheduling of the circuit to reflect the effect of the exchange on the latency of the circuit execution. Since the optimization loop may be iterated many times and the update-scheduling procedure lies on the critical path of the flow, the updating algorithm should be as quick as possible. We follow a greedy approach in applying our optimization technique. In other words, if an exchange increases the latency, it is rejected and the optimization loop continues with another exchangeable pair; otherwise, the exchange is accepted and the netlist is modified. The optimization loop continues until no pair remains to be attempted.
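The greedy optimization loop described above can be sketched as follows. This is an illustrative outline, not the authors' implementation: the netlist is modeled as a list of uniquely named gates, and `exchangeable_pairs` and `latency` are hypothetical callbacks standing in for the Lemma 1 check and the update-scheduling step, respectively.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_gate_exchange(
    netlist: Sequence[str],
    exchangeable_pairs: Callable[[List[str]], List[Tuple[int, int]]],
    latency: Callable[[List[str]], float],
) -> List[str]:
    """Try each exchangeable pair once; keep an exchange unless it raises latency."""
    current = list(netlist)
    best = latency(current)
    attempted = set()  # unordered gate pairs that were already tried
    while True:
        pending = [
            (i, j)
            for i, j in exchangeable_pairs(current)
            if frozenset((current[i], current[j])) not in attempted
        ]
        if not pending:  # no unattempted exchange is left
            return current
        i, j = pending[0]
        attempted.add(frozenset((current[i], current[j])))
        trial = list(current)
        trial[i], trial[j] = trial[j], trial[i]  # tentative exchange
        trial_latency = latency(trial)
        if trial_latency <= best:  # reject only if the latency increases
            current, best = trial, trial_latency
```

Because every pair is attempted at most once, the loop terminates after a number of iterations bounded by the number of gate pairs, regardless of how the latency callback behaves.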

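Fig. 5. The proposed physical design flow. The technology-dependent netlist enters scheduled layout generation (Section 5.1), which comprises placement and routing (Section 5.1.1) and scheduling (Section 5.1.2). The gate-exchanging physical synthesis loop then asks whether any more gate exchanging is possible. If yes, a tentative gate exchange is made and the scheduling is updated (Section 5.2); if the latency is improved, the exchange is accepted and the netlist is changed, otherwise the exchange is rejected. If no, classical control extraction follows.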

Once the optimization is complete and the layout and netlist are finalized, the classical control should be extracted. The classical control system is responsible for executing the quantum circuit on the layout. This includes determining where and when gate operations occur as well as managing and tracking every qubit in the system. In the rest of this section, the main building blocks of the proposed flow are discussed in detail.

5.1. Scheduled layout generation

The first part of the proposed flow in Fig. 5, “scheduled layout”, takes the netlist and generates an initial layout through an iterative process. This process has two subprocesses, which are performed one after the other in a loop to generate a better scheduled layout. The first subsection below describes the placement and routing heuristic, and the second explains the instruction scheduling approach used in this paper.

5.1.1. Placement and routing [17]

In this paper, the dataflow-based layout generation algorithm proposed in [17] is used to place and route a circuit. This algorithm claims to offer the best latency by taking a technology-dependent netlist and generating a layout composed of the macroblocks described in Section 3. The algorithm starts by creating the dataflow graph of the circuit. In the dataflow graph, each node represents an instruction and each arc represents a qubit dependency (see Fig. 6). In the next step, gate locations are placed in chronological order in the dataflow graph. Because this style of placement may waste space due to uneven column sizes, a folding operation is performed. The folding operation joins a short column with the previous column in order to fill out the rectangular bounding box of the layout as much as possible and decrease area. Then, the columns are sorted so that gate locations that need to be connected lie roughly horizontal to one another. After placing the gate locations, channels are routed to reflect dataflow edges. Since the initial layout has too many gate locations, the dataflow graph is collapsed using scheduler feedback. The algorithm identifies the latencies of critical edges by using scheduler feedback and merges the two nodes connected by the edge with the longest latency on the critical path. All instructions within a merged group are executed at a single gate location. This new group graph is then placed, routed and scheduled again to find the next pair of node groups to merge. This merging, placing and routing procedure continues until a point is reached where congestion at some heavily merged node group actually hurts the latency with each further merge.

Fig. 6. (a) Circuit netlist [H: Hadamard operation, CNOT: controlled bit-flip operation], (b) circuit representation, (c) data-flow graph and (d) layout generated by the dataflow-based algorithm [17]. In the data-flow graph, the priority of the root node R is $P_R = 3D_{TQG} + D_{SQG} + (7M + 6T) + Z_P$.

5.1.2. Instruction scheduling

The runtime execution order of the instructions is determined by the instruction issue logic. The instruction issue logic involves both preprocessing and online scheduling. First, the instruction sequence is preprocessed to assign priorities that will help during scheduling. The priority of an instruction is based on the length of its critical path to the end of the dataflow graph. Since the gate locations are known in advance, the movement latencies can be incorporated in the prioritization of the instruction sequence. In other words, movement latencies can be considered as well as gate delays in the assignment of priorities to instructions during preprocessing. This gives a better approximation of each qubit’s critical path. The scheduling used in this paper is similar to the method used in [20], but it uses the critical path with gate and movement latencies to set the priority of a gate rather than the size of the subtree dependent on that gate. The instruction sequence is traversed from beginning to end, and instructions are scheduled as soon as their dependencies allow.

The scheduler implements a greedy scheduling technique. It maintains a list of instructions that have all their dependencies fulfilled and are therefore ready to be executed. Among the ready instructions, the one with the highest priority is run first and is therefore more likely to gain access to the resources it needs. These contested resources include both gates and channels/intersections. Once all the possible instructions are scheduled, time advances until one or more resources are freed and more instructions can be scheduled. This scheduling process continues until the full instruction sequence is executed.
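The priority assignment and the ready-list selection can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the dataflow graph is given as a hypothetical `successors` mapping, a node's priority is taken as its own gate delay plus the longest movement-plus-priority tail through its successors (one plausible reading of "critical path to the end of the dataflow graph"), and resource contention for gates and channels is deliberately omitted.

```python
import heapq
from functools import lru_cache


def critical_path_priorities(successors, gate_delay, move_latency):
    """Longest gate-plus-movement path from each node to the end of the graph."""
    @lru_cache(maxsize=None)
    def priority(node):
        tails = [move_latency[(node, s)] + priority(s) for s in successors[node]]
        return gate_delay[node] + max(tails, default=0)

    return {node: priority(node) for node in successors}


def issue_order(successors, priorities):
    """Repeatedly issue the ready instruction with the highest priority."""
    unmet = {node: 0 for node in successors}  # count of unfinished predecessors
    for node in successors:
        for s in successors[node]:
            unmet[s] += 1
    ready = [(-priorities[n], n) for n, count in unmet.items() if count == 0]
    heapq.heapify(ready)  # min-heap on negated priority
    order = []
    while ready:
        _, node = heapq.heappop(ready)
        order.append(node)
        for s in successors[node]:
            unmet[s] -= 1
            if unmet[s] == 0:  # all dependencies fulfilled
                heapq.heappush(ready, (-priorities[s], s))
    return order
```

A full scheduler would additionally track when gate locations and channels become free and advance time accordingly; the sketch only shows how priorities drive the choice among ready instructions.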

It is worth noting that the proposed flow uses scheduling information to decide whether to accept or reject exchanges. In other words, our technique is not tied to a particular scheduling method and retains its advantage under different scheduling schemes: even with a scheduling method that yields the best latency for a given netlist, our technique can still improve the latency. However, since exhaustive scheduling is impractical for large circuits, we use a greedy heuristic to schedule operations.

5.2. Update scheduling

The scheduling information is used to accept or reject an exchanging operation. Therefore, scheduling must be performed in each iteration of the optimization loop. However, performing a complete scheduling in each iteration can dramatically increase the total runtime for large netlists. Since a gate exchange often modifies only a small part of the netlist tree, there may be no need to perform the scheduling completely in each iteration. For this reason, in the proposed flow the scheduling is incrementally updated in each iteration of the optimization loop. This decreases the runtime of each iteration and therefore reduces the overall runtime.

The scheduler selects the operations based on their dependencies and priorities. The gate exchanging process may change the priorities of the gates. Therefore, the update-scheduling operation must modify the priorities of the exchanged nodes and propagate the effects of these changes to the nodes located above them in the dataflow graph. The priorities of the exchanged nodes are calculated from the priorities of the nodes located below them in the dataflow graph. Then, this modification is propagated to the nodes above the exchanged gates and their priorities are updated. The propagation continues up to the root of the dataflow graph (i.e., a dummy node with the first-level gates as its children).
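The upward propagation of priorities can be sketched with a simple worklist. This is an illustrative outline under assumptions: the graph is given by hypothetical `predecessors` and `successors` mappings, a node's priority is its gate delay plus the longest movement-plus-priority tail through its successors, and `changed` lists the exchanged nodes whose local inputs were modified.

```python
from collections import deque


def update_priorities(changed, predecessors, successors,
                      gate_delay, move_latency, priorities):
    """Recompute the changed nodes and push any difference toward the root."""
    worklist = deque(changed)
    while worklist:
        node = worklist.popleft()
        tails = [move_latency[(node, s)] + priorities[s] for s in successors[node]]
        new_value = gate_delay[node] + max(tails, default=0)
        if priorities.get(node) != new_value:
            priorities[node] = new_value
            # Only nodes above this one can be affected by the change.
            worklist.extend(predecessors[node])
    return priorities
```

Nodes whose recomputed priority is unchanged stop the propagation early, which is what keeps the update cheaper than a full rescheduling when an exchange touches only a small part of the graph.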

6. An example

In this section, an example is given to illustrate the physical synthesis flow. Fig. 6a shows a QASM [28] instruction sequence operating on qubits q0, q1, q2 and q3, with each instruction labeled with a letter. Fig. 6b shows the equivalent quantum circuit. The dataflow graph of the circuit is shown in Fig. 6c. Fig. 6d shows the layout generated for the netlist by the dataflow-based algorithm described in Section 5.1.1. To make the analysis independent of the variations in the delays of ion movements and gates, the latency is calculated parametrically in terms of the latencies of physical operations, i.e. one-qubit gate delay ($D_{SQG}$), two-qubit gate delay ($D_{TQG}$), straight movement latency ($M$), zero preparation time ($Z_P$), and turn operation latency ($T$). The label of each edge shows the minimum delay between two nodes. The label of each node represents its delay from the end of the tree and is used as the node’s priority. The label zero on the edge between E and H and between C and F implies that they will execute in the same gate location. The node R is a dummy node used as the root of the circuit dataflow graph with nodes A, B, C, D as its children.

Initially, the latency of this circuit is calculated in four phases as shown in Fig. 7. In the first phase, gates A, B, C, and D can be executed simultaneously. In the second phase, gates E and F are executed at the same time. In the third phase, only gate G can be executed. Finally, in the fourth phase, gate H is executed. The total latency ($L_{Total}$) can be calculated by adding the latencies of the phases together.

Fig. 7. The latency calculation details for the netlist shown in Fig. 6. Phase I (A, B, C, D): $L_{PI} = D_{SQG}$. Phase II (E, F): $L_{PII} = 3M + 2T + D_{TQG}$. Phase III (G): $L_{PIII} = 3M + 2T + D_{TQG}$. Phase IV (H): $L_{PIV} = 2M + 2T + D_{TQG}$. $L_{Total} = Z_P + 8M + 6T + 3D_{TQG} + D_{SQG}$.

Then the gate pairs that can be exchanged are determined. Since gates H and G are of the same type and the target qubit of each is not in the control set of the other, they can be exchanged based on Lemma 1. If gates H and G are exchanged, the circuit changes to the one shown in Fig. 8a. After exchanging, the update scheduling stage updates the nodes’ priorities and edges. If the priority of the root node increases after exchanging, the operation is denied; otherwise it is accepted. The priorities change as in Fig. 9. The priority of the root has decreased and therefore the exchange is accepted and the netlist is updated. Since there is no unattempted exchange, the optimization process is complete. The latency calculations for phases I and II for the updated netlist are the same as those for the initial netlist but they are different for phases III and IV. The details of the delay calculations for these phases are shown in Fig. 8. As the calculated latencies show, the applied gate exchanging physical synthesis technique has improved the latency of the circuit by 2T.
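As a numeric check of the parametric result, the phase latencies of Figs. 7 and 8 can be evaluated with the physical latencies of Table 2. The sketch below assumes the mapping $D_{SQG} = t_{1q}$, $D_{TQG} = t_{2q}$, $M = t_{move}$, $T = t_{turn}$ and $Z_P = t_{prep}$, and it takes the phase III latency after the exchange to be $3M + D_{TQG}$, the value consistent with the stated total of Fig. 8; variable names are illustrative.

```python
# Physical-operation latencies in microseconds (values from Table 2;
# the symbol mapping is an assumption of this sketch).
M, T = 1, 10            # straight move, turn
D_SQG, D_TQG = 1, 10    # one-qubit and two-qubit gate delays
Z_P = 51                # zero preparation


def total_latency(phase_latencies):
    """Total latency: zero preparation followed by the phases in sequence."""
    return Z_P + sum(phase_latencies)


# Phases I-IV before the exchange (G in phase III, H in phase IV).
before = [D_SQG, 3*M + 2*T + D_TQG, 3*M + 2*T + D_TQG, 2*M + 2*T + D_TQG]
# Phases I-IV after exchanging H and G (H in phase III, G in phase IV).
after = [D_SQG, 3*M + 2*T + D_TQG, 3*M + D_TQG, 2*M + 2*T + D_TQG]

saving = total_latency(before) - total_latency(after)
print(total_latency(before), total_latency(after), saving)  # 150 130 20
```

With these values the saving of 20 μs equals 2T, matching the parametric improvement stated in the example.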

7. Experimental results

Relatively high error rates of operations in a quantum computer necessitate heavy encoding of qubits [1]. As such, this paper focuses on encoding circuits (useful for both data and ancillae) and error correction circuits to experiment with the proposed approach. A number of error correction and encoding circuits are used to evaluate the effectiveness of the proposed flow.

Fig. 8. Phases III and IV for the netlist shown in (a). Phase III (H): $L_{PIII} = 3M + D_{TQG}$. Phase IV (G): $L_{PIV} = 2M + 2T + D_{TQG}$. $L_{Total} = Z_P + 8M + 4T + 3D_{TQG} + D_{SQG}$.

Fig. 9. The circuit DFG after exchanging H and G. The priority of the root node R is $P_R = 3D_{TQG} + D_{SQG} + (5M + 4T) + Z_P$.

Table 1
List of the benchmarks, with quantum gate count and number of qubits processed in the circuit.

| # | Circuit name [30] | Qubit count | Gate count |
|---|---|---|---|
| 1 | [[7,1,3]] L1 encode | 7 | 18 |
| 2 | [[9,1,3]] Bacon-Shor encode [32] | 9 | 16 |
| 3 | [[10,3,3]] L1 encode | 10 | 44 |
| 4 | [[11,1,5]] L1 encode | 11 | 47 |
| 5 | [[13,1,3]] Surface encode [33] | 13 | 32 |
| 6 | [[13,1,5]] L1 encode | 13 | 64 |
| 7 | [[16,3,5]] L1 encode | 16 | 89 |
| 8 | [[18,1,7]] L1 encode | 18 | 102 |
| 9 | [[21,1,7]] L1 encode | 21 | 140 |
| 10 | [[24,3,7]] L1 encode | 24 | 205 |
| 11 | [[25,1,9]] L1 encode | 25 | 168 |
| 12 | [[27,1,9]] L1 encode | 27 | 244 |
| 13 | [[30,20,4]] L1 encode | 30 | 411 |
| 14 | [[31,11,6]] L1 encode | 31 | 339 |
| 15 | [[33,1,9]] L1 encode | 33 | 316 |
| 16 | [[35,1,10]] L1 encode | 35 | 389 |
| 17 | [[36,7,6]] L1 encode | 36 | 395 |
| 18 | [[40,3,10]] L1 encode | 40 | 483 |

Latency is an important metric for the error encoding circuits [29]. A high-latency circuit could introduce non-trivial errors due to increased qubit idle time. Error correction circuits are even more latency-dependent, since they are on the critical path for the processing of data qubit blocks [29].

The proposed flow was evaluated on the benchmarks shown in Table 1. The error probabilities and physical latencies shown in Table 2 are used for the gates and for the two types of move operations in ion trap technology [31]. The benchmarks include controlled-Z, controlled-X, controlled-Y, and Hadamard gates that are realizable in ion trap technology.

Table 3 shows the latency of the benchmark circuits resulting from the prior physical design flow [17] and from our physical design flow enhanced by the gate-exchanging physical synthesis technique. The column “# of tentative exchanges” contains the number of exchanges that can be attempted, and the column “# of accepted exchanges” contains the number of exchanges that have been applied. The latencies of the circuits obtained by the prior physical design flow and by ours are shown in the columns “prior physical design flow” and “our physical design flow”, respectively.

The column “Improvement” shows the latency improvement resulting from the physical synthesis approach proposed in this paper. As can be seen, a considerable improvement of 23.64% (on average) has been achieved in the latency of the benchmarks. The results of Table 3 are summarized in Fig. 10, which shows the latency reduction of our physical design flow using physical synthesis compared to the prior physical design flow. The behavior of latency reduction for the benchmarks is depicted in Fig. 11. The horizontal and vertical axes show the number of accepted exchanges and the resulting latency, respectively. Each point on the line of each benchmark shows the latency after the nth accepted exchange. Since we follow a greedy approach, the accepted exchanges always decrease the latency. In other words, we accept and apply only those exchanges that improve the latency. The optimization loop continues until there is no unattempted exchange.

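Table 2
The error probabilities and latency values for various physical operations in ion trap technology [31].

| Physical operation | Latency symbol | Latency (μs) | Error probability |
|---|---|---|---|
| One-qubit gate | $t_{1q}$ | 1 | $10^{-6}$ |
| Two-qubit gate | $t_{2q}$ | 10 | $10^{-6}$ |
| Measurement | $t_{meas}$ | 50 | $10^{-6}$ |
| Zero prepare | $t_{prep}$ | 51 | $10^{-6}$ |
| Straight move | $t_{move}$ | 1 | $10^{-8}$ |
| Turn | $t_{turn}$ | 10 | $10^{-8}$ |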

Table 3. The latency of the benchmark circuits achieved by the prior physical design flow and ours.

| # | Circuit name [30] | # Tentative exchanges | # Accepted exchanges |
|---|---|---|---|
| 1 | [[7,1,3]] L1 encode | 3 | 1 |
| 2 | [[9,1,3]] Bacon-Shor encode | 2 | 1 |
| 3 | [[10,3,3]] L1 encode | 16 | 6 |
| 4 | [[11,1,5]] L1 encode | 9 | 3 |
| 5 | [[13,1,3]] Surface encode | 9 | 2 |
| 6 | [[13,1,5]] L1 encode | 10 | 6 |
| 7 | [[16,3,5]] L1 encode | 28 | 5 |
| 8 | [[18,1,7]] L1 encode | 18 | 5 |
| 9 | [[21,1,7]] L1 encode | 32 | 20 |
| 10 | [[24,3,7]] L1 encode | 38 | 11 |
| 11 | [[25,1,9]] L1 encode | 27 | 21 |
| 12 | [[27,1,9]] L1 encode | 98 | 41 |
| 13 | [[30,20,4]] L1 encode | 263 | 73 |
| 14 | [[31,11,6]] L1 encode | 350 | 66 |
| 15 | [[33,1,9]] L1 encode | 150 | 48 |
| 16 | [[35,1,10]] L1 encode | 96 | 21 |
| 17 | [[36,7,6]] L1 encode | 118 | 46 |
| 18 | [[40,3,10]] L1 encode | 500 | 73 |
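The "Improvement" column of Table 3 can be cross-checked from the two latency columns. The formula used below, (prior - ours) / ours, is an inference from the reported numbers and is not stated explicitly in the text; the names `LATENCIES_US` and `improvement_percent` are illustrative only.

```python
# (prior flow latency, our flow latency) in microseconds, rows 1-18 of Table 3.
LATENCIES_US = [
    (331, 312), (207, 177), (960, 800), (842, 728), (504, 456), (1281, 1085),
    (1757, 1571), (1612, 1417), (3068, 2245), (4587, 4058), (4491, 3527),
    (5687, 4079), (8626, 6552), (7362, 5120), (9026, 6862), (7347, 6540),
    (9805, 7397), (11405, 7900),
]


def improvement_percent(prior, ours):
    """Latency reduction expressed relative to the new (improved) latency."""
    return 100.0 * (prior - ours) / ours


# Round each row to one decimal, as in the table, then average the rows.
improvements = [round(improvement_percent(p, o), 1) for p, o in LATENCIES_US]
average_improvement = round(sum(improvements) / len(improvements), 2)
```

Under this assumption the per-row values and their mean agree with the figures reported in Table 3, including the 23.64% average.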

7.1. Heuristic algorithm analysis

As stated before, we follow a greedy approach to accept or reject an exchange; that is, gate exchanges that increase the latency are rejected. To examine the impact of other heuristics on the result, we used the simulated annealing (SA) heuristic [34] to accept or reject exchanges. Table 4 shows the results. The column "Our approach based on SA" under "Latency" shows the latency obtained by our physical design flow when simulated annealing is substituted for our greedy approach. The columns "Our approach based on greedy" and "Our approach based on SA" under "Run time" show the runtimes of our physical design flow using the greedy approach and the simulated annealing approach, respectively. The column "SA/greedy ratio" under "Latency" contains the ratio of the latency obtained by simulated annealing to that achieved by our greedy approach. The last column contains the ratio of the runtime of the flow based on simulated annealing to that based on our greedy approach. It can be observed from the table that simulated annealing provides slightly better results than the greedy approach in most cases (an average latency ratio of 0.97). However, on average, the runtime of simulated annealing is almost 13 times longer. This observation suggests that, while different heuristics may provide slightly different results, it is the execution time that varies the most among them. In other words, the execution time appears to be the determining factor in choosing among the heuristic approaches. Based on this, we have chosen the greedy approach for the remainder of this paper. Fig. 12 depicts the behavior of the latency obtained by the two approaches.
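The difference between the two acceptance rules can be sketched as follows. The text does not give the annealing parameters, so the Metropolis-style criterion below is the textbook form of simulated annealing [34], and the temperature handling is an assumption made for illustration rather than the configuration used in the experiments.

```python
import math
import random


def greedy_accept(delta):
    """Greedy rule: accept only exchanges that reduce the latency."""
    return delta < 0


def sa_accept(delta, temperature, rng):
    """Textbook Metropolis rule (assumed here, not taken from the paper).

    delta is new_latency - old_latency. Improvements are always accepted;
    a worsening exchange is accepted with probability exp(-delta / T).
    """
    if delta <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta / temperature)


rng = random.Random(0)
improving = sa_accept(-5.0, 1.0, rng)        # always accepted
worsening_cold = sa_accept(10.0, 1e-9, rng)  # practically never accepted
worsening_hot = sa_accept(1.0, 1e9, rng)     # practically always accepted
```

Accepting some worsening exchanges lets simulated annealing escape local minima, at the cost of evaluating many more exchanges, which is in line with the runtime gap reported in Table 4.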

7.2. Error analysis

To evaluate the proposed technique in terms of reliability, we use the critical error path calculation proposed in [31]. The critical error path is the sequence of qubit interactions that introduces the highest error into the circuit, in a way similar to the critical latency path through a circuit.

Fig. 13 illustrates the process of estimating the critical error path. It uses a simple but effective model of a complicated error

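Table 3 (continued). Latency and improvement for the benchmark circuits; row numbers refer to the circuits listed in Table 3.

| # | Latency (µs): prior physical design flow [17] | Latency (µs): our physical design flow | Improvement (%) |
|---|---|---|---|
| 1 | 331 | 312 | 6.1 |
| 2 | 207 | 177 | 16.9 |
| 3 | 960 | 800 | 20 |
| 4 | 842 | 728 | 15.7 |
| 5 | 504 | 456 | 10.5 |
| 6 | 1281 | 1085 | 18.1 |
| 7 | 1757 | 1571 | 11.8 |
| 8 | 1612 | 1417 | 13.8 |
| 9 | 3068 | 2245 | 36.7 |
| 10 | 4587 | 4058 | 13 |
| 11 | 4491 | 3527 | 27.3 |
| 12 | 5687 | 4079 | 39.4 |
| 13 | 8626 | 6552 | 31.7 |
| 14 | 7362 | 5120 | 43.8 |
| 15 | 9026 | 6862 | 31.5 |
| 16 | 7347 | 6540 | 12.3 |
| 17 | 9805 | 7397 | 32.6 |
| 18 | 11405 | 7900 | 44.4 |
| Average | | | 23.64 |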

Fig. 10. The latency reduction achieved by our physical synthesis approach.

Fig. 11. The behavior of the latency vs. gate exchanging number. (Horizontal axis: number of exchanges; vertical axis: latency (µs); one curve per benchmark circuit.)


propagation process to estimate a parameter referred to as error distance [31]. We use the method proposed in [31], but we also consider other physical operations in addition to gates when calculating the critical error path. The model assumes that (1) each gate, measurement, and zero-prepare operation introduces one unit of error, (2) each straight or turn movement introduces 0.01 unit of error, and (3) all qubits interacting within a gate acquire the maximum error value among those qubits. The error probabilities shown in Table 2 have been used to derive the error units of the physical operations. The symbols S and T denote the numbers of straight and turn macroblocks, respectively, that a qubit must traverse to reach the next gate location.

Table 5 shows the maximum error distance for the benchmarks before and after applying our approach. The maximum error distances of the circuits obtained by the prior physical design flow and by ours are shown in the columns "prior physical design flow" and "our physical design flow", respectively. As can be seen, an average improvement of 22.8% has been achieved in the maximum error distance of the benchmarks.
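The three rules of the error model can be turned into a short sketch that reproduces the numbers annotated in Fig. 13 (1.12, 2.36, and 2.47). Which qubit performs each movement is an assumption made here for illustration, and the helper names `move` and `gate` are hypothetical.

```python
GATE_ERROR = 1.0   # rule (1): gate, measurement, or zero prepare
MOVE_ERROR = 0.01  # rule (2): one straight or turn macroblock


def move(errors, qubit, turns, straights):
    """Accumulate movement error for a qubit crossing T turns and S straights."""
    errors[qubit] += (turns + straights) * MOVE_ERROR


def gate(errors, *qubits):
    """Rule (3): all interacting qubits take the maximum error, plus one unit."""
    worst = max(errors[q] for q in qubits) + GATE_ERROR
    for q in qubits:
        errors[q] = worst


errors = {"Q0": 0.0, "Q1": 0.0, "Q2": 0.0}

move(errors, "Q0", turns=2, straights=10)  # assumed: Q0 travels to the first gate
gate(errors, "Q0", "Q1")
after_first_gate = errors["Q1"]            # 0.12 + 1

move(errors, "Q1", turns=4, straights=20)  # assumed: Q1 travels to the second gate
gate(errors, "Q1", "Q2")
after_second_gate = errors["Q2"]           # 1.12 + 0.24 + 1

move(errors, "Q2", turns=1, straights=10)  # final movement after the last gate
max_error_distance = max(errors.values())
```

The maximum over all qubits at the end of the circuit is the quantity reported per benchmark in Table 5.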

7.3. Time complexity analysis

The time complexity of our physical synthesis technique can be calculated as follows. Since we follow a greedy approach in our flow, our algorithm examines every pair of nodes connected by an edge for an exchange. In addition, each permanent exchange modifies at most four edges. Therefore, the permanent


Table 4. The latency of the benchmark circuits achieved by using the simulated annealing heuristic instead of our greedy approach in accepting or rejecting exchanges.^a

| # | Circuit name | # Accepted exchanges | Latency (µs): greedy | Latency (µs): SA | Latency: SA/greedy ratio | Run time (ms)^b: greedy | Run time (ms)^b: SA | Run time: SA/greedy ratio |
|---|---|---|---|---|---|---|---|---|
| 1 | [[7,1,3]] L1 encode | 1 | 312 | 312 | 1 | 4583 | 4800 | 1.05 |
| 2 | [[9,1,3]] Bacon-Shor | 1 | 177 | 177 | 1 | 4780 | 4880 | 1.02 |
| 3 | [[10,3,3]] L1 encode | 10 | 800 | 760 | 0.95 | 5373 | 51974 | 9.67 |
| 4 | [[11,1,5]] L1 encode | 6 | 728 | 708 | 0.97 | 6424 | 56094 | 8.73 |
| 5 | [[13,1,3]] Surface | 5 | 456 | 435 | 0.95 | 7928 | 44321 | 5.59 |
| 6 | [[13,1,5]] L1 encode | 7 | 1085 | 1093 | 1.01 | 7459 | 71409 | 9.57 |
| 7 | [[16,3,5]] L1 encode | 15 | 1571 | 1460 | 0.93 | 10171 | 99605 | 9.79 |
| 8 | [[18,1,7]] L1 encode | 12 | 1417 | 1432 | 1.01 | 8962 | 119153 | 13.26 |
| 9 | [[21,1,7]] L1 encode | 25 | 2245 | 2134 | 0.95 | 11817 | 191076 | 16.17 |
| 10 | [[24,3,7]] L1 encode | 22 | 4058 | 3603 | 0.89 | 13404 | 300889 | 22.45 |
| 11 | [[25,1,9]] L1 encode | 23 | 3527 | 3586 | 1.02 | 10795 | 256987 | 23.81 |
| 12 | [[27,1,9]] L1 encode | 64 | 4079 | 3908 | 0.96 | 16542 | 303465 | 18.35 |
| 13 | [[30,20,4]] L1 encode | 150 | 6552 | 6383 | 0.97 | 25410 | 352156 | 13.86 |
| 14 | [[31,11,6]] L1 encode | 108 | 5120 | 4997 | 0.98 | 18428 | 334475 | 18.15 |
| 15 | [[33,1,9]] L1 encode | 141 | 6862 | 6608 | 0.96 | 18196 | 315567 | 17.34 |
| 16 | [[35,1,10]] L1 encode | 57 | 6540 | 6120 | 0.94 | 23909 | 348678 | 14.58 |
| 17 | [[36,7,6]] L1 encode | 74 | 7397 | 7106 | 0.96 | 24121 | 349876 | 14.5 |
| 18 | [[40,3,10]] L1 encode | 134 | 7900 | 7698 | 0.97 | 33620 | 377653 | 11.23 |
| Average | | | | | 0.97 | | | 12.73 |

a. All results of this section are obtained on a 3 GHz Pentium IV with 1 gigabyte of memory.
b. As calculated by the "Rational Quantify" suite [35].

Fig. 12. The behavior of the latency obtained by using two different approaches in accepting or rejecting exchanges. (Vertical axis: latency (µs); series: initial latency, after gate exchanging with the greedy approach, after gate exchanging with the SA approach.)


exchanges add a factor equal to "4 × number of permanent exchanges" to the total number of edges that should be checked. In the worst case, this is of the order of the number of edges.

The other factor in the time complexity is the runtime of the update-scheduling process in each iteration. As stated in Section 5, the scheduling result should be updated for each tentative exchange. The main part of the runtime of the update-scheduling process is spent updating the nodes' priorities. The number of steps needed to update the node priorities for each tentative exchange is equal to the node's depth, because when a node is exchanged with another, only the priorities of the nodes located between it and the root in the dataflow graph need to be updated. The upper bounds of the node's depth and of the number of tentative exchanges are the circuit depth and the number of edges, respectively. Therefore, the overall time complexity of the overhead of our approach can be calculated as

$$O(D \times E)$$

where D is the upper bound of the circuit depth and E is the upper bound of the number of edges in the dataflow graph.
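The bound can be spelled out step by step. The symbols $T$ (the number of tentative exchanges) and $d_i$ (the depth of the node involved in the $i$-th tentative exchange) are introduced here only for illustration and do not appear in the original analysis; the block assumes the `amsmath` package.

```latex
\begin{align*}
\text{overhead} &= \sum_{i=1}^{T} O(d_i)
    && \text{one priority update per tentative exchange} \\
  &\le T \cdot O(D)
    && \text{since } d_i \le D \text{ for every node} \\
  &= O(D \times E)
    && \text{since } T = O(E)
\end{align*}
```

The last step uses the earlier observation that the edges added by permanent exchanges are, in the worst case, of the order of the number of edges, so the number of tentative exchanges stays within $O(E)$.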


Table 5. The maximum error distance of the benchmark circuits achieved by the prior physical design flow and ours.

| # | Circuit name | Max error distance: prior physical design flow [17] | Max error distance: our physical design flow | Improvement (%) |
|---|---|---|---|---|
| 1 | [[7,1,3]] L1 encode | 9.22 | 8.11 | 13.7 |
| 2 | [[9,1,3]] Bacon-Shor | 7.11 | 6.9 | 3.1 |
| 3 | [[10,3,3]] L1 encode | 29.61 | 23.72 | 24.8 |
| 4 | [[11,1,5]] L1 encode | 26.18 | 22.49 | 16.4 |
| 5 | [[13,1,3]] Surface | 15.03 | 12.75 | 17.9 |
| 6 | [[13,1,5]] L1 encode | 36.51 | 31.01 | 17.7 |
| 7 | [[16,3,5]] L1 encode | 52.34 | 46.43 | 12.7 |
| 8 | [[18,1,7]] L1 encode | 46.03 | 40.96 | 12.3 |
| 9 | [[21,1,7]] L1 encode | 77.69 | 58.06 | 33.8 |
| 10 | [[24,3,7]] L1 encode | 114.21 | 100.67 | 13.4 |
| 11 | [[25,1,9]] L1 encode | 112.08 | 87.86 | 27.6 |
| 12 | [[27,1,9]] L1 encode | 146.21 | 104.49 | 40 |
| 13 | [[30,20,4]] L1 encode | 269.59 | 200.79 | 34.3 |
| 14 | [[31,11,6]] L1 encode | 196.2 | 145.3 | 35 |
| 15 | [[33,1,9]] L1 encode | 212.48 | 168.09 | 26.4 |
| 16 | [[35,1,10]] L1 encode | 176.79 | 145.44 | 21.6 |
| 17 | [[36,7,6]] L1 encode | 235.21 | 180.15 | 30.6 |
| 18 | [[40,3,10]] L1 encode | 251.54 | 195.46 | 28.7 |
| Average | | | | 22.8 |

Fig. 13. Simple model of counting errors. (The figure shows three qubits Q0, Q1, and Q2 with the following annotations. First gate, reached with #T = 2 and #S = 10: 2×0.01 (turn) + 10×0.01 (straight) + 1 (gate) = 1.12, so the error distance is 1.12. Second gate, reached with #T = 4 and #S = 20: 1.12 + 4×0.01 (turn) + 20×0.01 (straight) + 1 (gate) = 2.36, so the error distance is 2.36. Final movement with #T = 1 and #S = 10: error distance = 1×0.01 + 10×0.01 + 2.36 = 2.47, which is the maximum error distance. All other movements have #T = 0 and #S = 0.)


8. Conclusion

In this paper, a new quantum physical design flow was proposed based on the concept of physical synthesis, which is inspired by a similar concept in CMOS design. A physical synthesis technique was also proposed that modifies the circuit netlist, taking layout information into account, to improve the latency of quantum circuit execution. In the proposed technique, the gates that can be exchanged without changing the functionality of the circuit are identified, and layout information is used to evaluate their exchange in terms of circuit latency. The proposed flow was applied to a set of error-encoding quantum circuits. Experimental results showed that the proposed physical synthesis flow could improve the latency of quantum circuits by up to 43.8% for the attempted benchmarks.

We are working on new physical synthesis techniques and on improving the proposed design flow by adding new blocks in addition to gate exchanging. Moreover, since previous approaches to determining the error tolerance of a quantum circuit are very computationally intensive and may not be appropriate for circuits with more than a few dozen gates [36], we are looking into ways to incorporate fault tolerance directly as a metric.

Acknowledgements

We would like to thank Prof. D. Wineland for his invaluable deliberation about Ion Trap technology.

References

[1] M.A. Nielsen, I.L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2000.

[2] V.V. Zhirnov, R.K. Cavin, J.A. Hutchby, G.I. Bourianoff, Limits to binary logic switch scaling - a gedanken model, Proceedings of the IEEE 91 (11) (2003) 1934.

[3] R.P. Feynman, Quantum mechanical computers, Foundations of Physics 16 (1986) 507.

[4] P. Shor, Polynomial time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM Journal on Computing 26 (5) (1997) 1484–1509.

[5] L. Grover, A fast quantum mechanical algorithm for database search, in: Proceedings of ACM Symposium on Theory of Computing, 1996, pp. 212–219.

[6] C. Zalka, Simulating quantum systems on a quantum computer, in: Proceedings of Mathematical, Physical and Engineering Sciences, 1998, pp. 313–322.

[7] C.H. Bennett, G. Brassard, Quantum cryptography: public-key distribution and coin tossing, in: Proceedings of IEEE International Conference on Computers, Systems, and Signal Processing, Bangalore, India, IEEE Press, 1984, pp. 175–179.

[8] http://www.magiqtech.com/MagiQ/Home.html.

[9] http://www.idquantique.com/.

[10] C.J. Alpert, C. Chu, Physical synthesis comes of age, in: Proceedings of International Conference on Computer-Aided Design (ICCAD), 2007, pp. 246–249.

[11] K. Svore, A. Aho, A. Cross, I. Chuang, I. Markov, A layered software architecture for quantum computing design tools, Computer 39 (1) (2006) 74–83.

[12] K. Svore, A. Cross, A. Aho, I. Chuang, I. Markov, Toward a software architecture for quantum computing design tools, in: Proceedings of the 2nd International Workshop on Quantum Programming Languages (QPL), 2004, pp. 145–162.

[13] S. Balensiefer, L. Kreger-Stickles, M. Oskin, QUALE: quantum architecture layout evaluator, Proceedings of SPIE - The International Society for Optical Engineering 5815 (2005) 103–114.

[14] S. Balensiefer, L. Kreger-Stickles, M. Oskin, An evaluation framework and instruction set architecture for ion-trap based quantum micro-architectures, in: Proceedings of International Symposium on Computer Architecture (ISCA), 2005, pp. 186–196.

[15] B. Omer, Quantum programming in QCL, Master thesis, Technical University of Vienna, 2000.

[16] T. Yang, A. Gerasoulis, List scheduling with and without communication delays, Journal of Parallel Computing 19 (12) (1993) 1321–1344.

[17] M. Whitney, N. Isailovic, Y. Patel, J. Kubiatowicz, Automated generation of layout and control for quantum circuits, in: Proceedings of Computing Frontiers, 2007, pp. 83–94.

[18] T. Metodi, D. Thaker, A. Cross, F. Chong, I. Chuang, A quantum logic array microarchitecture: scalable quantum data movement and computation, in: Proceedings of the 38th International Symposium on Microarchitecture (MICRO), 2005, pp. 305–318.

[19] D. Thaker, T. Metodi, A. Cross, I. Chuang, F. Chong, Quantum memory hierarchies: efficient designs to match available parallelism in quantum computing, in: Proceedings of the 33rd International Symposium on Computer Architecture (ISCA), 2006, pp. 378–390.

[20] T. Metodi, D. Thaker, A. Cross, F. Chong, I. Chuang, Scheduling physical operations in a quantum information processor, Proceedings of SPIE Defense and Security Symposium 6244 (2006) 62440T.

[21] J. Cirac, P. Zoller, Quantum computation with cold trapped ions, Physical Review Letters 74 (1995) 4091–4094.

[22] J. Cirac, P. Zoller, A scalable quantum computer with ions in an array of microtraps, Nature 404 (2000) 578–581.

[23] D. Kielpinski, C. Monroe, D. Wineland, Architecture for a large-scale ion-trap quantum computer, Nature 417 (2002) 709–711.

[24] D. Hucul, M. Yeo, S. Olmschenk, C. Monroe, W. Hensinger, J. Rabchuk, On the transport of atomic ions in linear and multidimensional ion trap arrays, Journal of Quantum Information and Computation 8 (6) (2008) 501–578.

[25] J. Kim, S. Pau, Z. Ma, H. McLellan, J. Gages, A. Kornblit, R. Slusher, System design for large-scale ion trap quantum information processor, Journal of Quantum Information and Computation 5 (7) (2005) 515–537.

[26] J. Chiaverini, R. Blakestad, J. Britton, J. Jost, C. Langer, D. Leibfried, R. Ozeri, D. Wineland, Surface-electrode architecture for ion-trap quantum information processing, Journal of Quantum Information and Computation 5 (5) (2005) 419–439.

[27] M. Saeedi, N. Mohammadzadeh, M. Sedighi, M. Saheb Zamani, Towards a thorough set of metrics for quantum circuit synthesis, International Journal of Physics, January 2008, pp. 9–22.

[28] A. Cross, Synthesis and evaluation of fault-tolerant quantum computer architectures, Ph.D. Thesis, Massachusetts Institute of Technology, 2005.

[29] T. Metodi, F. Chong, Quantum Computing for Computer Architects, Morgan & Claypool Publishers, 2006.

[30] M. Grassl, Circuits for quantum error-correcting codes, available online: http://iaks-www.ira.uka.de/home/grassl/QECC/index.html, accessed on 2009-07-20.

[31] M. Whitney, N. Isailovic, Y. Patel, J. Kubiatowicz, A fault tolerant, area efficient architecture for Shor's factoring algorithm, ISCA'09, 2009.

[32] D. Bacon, Operator quantum error correcting subsystems for self-correcting quantum memories, quant-ph/0506023, 2005.

[33] P. Aliferis, A. Cross, Subsystem fault-tolerance with the Bacon-Shor code, Physical Review Letters 98 (2007) 220502, http://www.arxiv.org/abs/quant-ph/0610063.

[34] S. Kirkpatrick, C.D. Gelatt Jr., M.P. Vecchi, Optimization by simulated annealing, Science 220 (1983) 671–680.

[35] IBM Rational software, www.ibm.com/software/rational.

[36] P. Aliferis, D. Gottesman, J. Preskill, Quantum accuracy threshold for concatenated distance-3 codes, arXiv preprint quant-ph/0504218, 2005.
