
738 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 27, NO. 4, APRIL 2008

Power Grid Analysis and Optimization Using Algebraic Multigrid

Cheng Zhuo, Student Member, IEEE, Jiang Hu, Senior Member, IEEE, Min Zhao, and Kangsheng Chen

Abstract—This paper presents a class of power grid analysis and optimization techniques, all of which are based on the algebraic-multigrid (AMG) method. First, a new AMG-based reduction scheme is proposed to improve the efficiency of reducing the problem size for power grid analysis and optimization. Next, with the proposed reduction technique, a fast transient-analysis method is developed and extended to an accurate solver with an error control mechanism. After that, the scope of this method is further broadened to handle the analysis of a modified grid. Finally, a fast decap-allocation (DA) scheme based on AMG is suggested. Experimental results show that these techniques not only achieve a significant speedup over reported industrial methods but also enhance the quality of solutions. By using the proposed techniques, transient analysis with 200 time steps on a 1.6-M-node power grid can be completed in less than 5 min; dc analysis on the same circuit can reach an accuracy of $1 \times 10^{-6}$ in about 141 s. Our DA can process a circuit with up to one million nodes in about 11 min.

Index Terms—Capacitance, multigrid, optimization, power grid, simulation.

I. INTRODUCTION

THE TECHNOLOGY advance toward the nanometer regime has brought on-chip power integrity issues into the spotlight [1]. In modern very large scale integration (VLSI) design, a robust on-chip power supply network is an indispensable part of ensuring system performance [2], [3]. A poorly designed power grid may easily lead to extra logic delays, signal integrity problems, and even functional failures. To overcome the increasing IR drop, electromigration, and simultaneous switching noise, a robust power-grid design is becoming more and more important [4].

For a high-performance chip, power-grid design is an iterative procedure [3]. From the early stage to the postlayout stage, it requires multiple iterations of planning, resource allocation, and refinement. To ensure the robustness of the design, it is necessary to have the following: 1) fast and accurate power grid analysis and 2) an efficient power-grid optimization method [5]. The main challenge of power grid analysis or optimization is its huge size, typically millions of nodes. Due to the tremendous number of variables, using a general-purpose simulator, such as SPICE, is no longer feasible in practice. Therefore, new techniques with high computational efficiency, in terms of both execution time and memory, are in high demand.

Manuscript received October 31, 2006; revised March 30, 2007 and July 26, 2007. This work was supported by the Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China, under Grant 20060335065. This paper was recommended by Associate Editor A. Raghunathan.

C. Zhuo was with the Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China. He is now with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: [email protected]).

J. Hu is with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843 USA (e-mail: jianghu@ece.tamu.edu).

M. Zhao is with Magma Design Automation, Inc., Austin, TX 78759 USA (e-mail: [email protected]).

K. Chen is with the Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China (e-mail: chenks@zju.edu.cn).

Digital Object Identifier 10.1109/TCAD.2008.917587

The challenge of huge problem size in power grid analysis and optimization has resulted in many research works from both academia and industry [2], [3], [6]–[24]. Among them, some techniques can achieve a relatively low time/space complexity, such as the hierarchical macromodels [9], the random-walk-based method [10], and the 2-D/3-D transmission-line methods [12], [13]. Due to the intrinsic similarity between the power grid and the discretized structure of smooth partial differential equations, Kozhaya et al. [14] apply a geometric-based multigrid-like technique to power grid analysis. In order to further handle irregular power-grid structures, algebraic-multigrid (AMG)-based techniques are also developed in [15]–[17]. Each of the aforementioned methods aims at reducing problem complexity to gain speedup with little or no accuracy loss.

In power-grid design, decoupling capacitance (decap) is a very effective technique for suppressing transient noise. On-chip decap allocation (DA) is also a very difficult problem not only due to its huge size but also because of the nonlinear nature of its constraints. In [18], a charge-based model is developed to roughly estimate the decap size for each individual module. Other works [19]–[23] use the adjoint sensitivity technique to guide the solution search in nonlinear optimization. Even though the number of transient simulations is greatly decreased by the merged adjoint method [21], [22] or greedy search [22], [23], the sheer size of the problem still implies a huge computation cost and, therefore, needs to be reduced directly. In [23], the problem-size reduction is achieved by divide-and-conquer, assuming that the boundary voltages of each partition do not change during the decap optimization. The work of [24] uses a geometric-multigrid (GMG) technique [14], [25] to reduce the problem size, the effectiveness of which is mainly restricted to regular power grids [15], [16].

In this paper, we propose an AMG-based reduction for fast power grid analysis and decap optimization. Multigrid can reduce the system size by pruning out a large number of variables. In VLSI design, its efficiency depends on the smoothness of the current distribution, the memory and runtime of keeping track of circuit geometry, and the complex procedure of deciding the interpolation operator. This paper addresses these problems in order to obtain a practical and efficient implementation of the AMG-based method for power grid analysis and optimization. The AMG-based method is very general and, therefore, can be applied to any power grid, regular or irregular. It is also very flexible and can be easily combined with other techniques such as the conjugate-gradient (CG) method, the charge-based technique [18], or the partitioning method [23] for further speedup. Our approach is composed of the following steps.

1) Use a dynamic AMG-based reduction strategy to reduce the grid size.

2) Obtain the corresponding restriction and interpolation operators.

3) Simulate or optimize on the coarsest grid.

4) Map the solution back to the finest grid.

It can be seen that in our approach, the first two steps are the same for both the power grid analysis and the decap optimization. By noting the inherent connection between the analysis and the optimization, we can reuse the information contained in the analysis to gain a further speedup in optimization.

The rest of this paper is organized as follows. Section II gives an overview of the multigrid method. Next, in Section III, we provide a brief review of power-grid modeling. Following that, an improved AMG-based reduction scheme is discussed in Section IV. Additionally, several AMG-based methods for power grid analysis are demonstrated in Section V. In Section VI, we present the algorithm that uses the AMG for DA. Section VII reports the performance of our algorithms with a set of benchmarks. Finally, we present concluding remarks in Section VIII. This paper is an extended description of the work in [17] and [26]. As in other literature, we use h to indicate the fine grid and H the coarse grid.

II. OVERVIEW OF MULTIGRID METHOD

Multigrid is a method to accelerate the convergence of solving differential equations numerically [25], [27]. The main component of solving differential equations is to solve linear systems by using iterative methods such as Gauss–Seidel. The iterative methods can smooth out high-frequency errors rapidly but are usually slow in removing low-frequency errors. The basic idea of multigrid is to use a projection of the fine-grid problem on a coarser grid to remove the hard-to-damp low-frequency errors, which is called coarse-grid correction [25]. The high-frequency errors are removed with those iterative techniques, which is called smoothing [25]. The coarse-grid correction and smoothing complement each other, i.e., the errors that are not damped by the smoothing will be damped by the coarse-grid correction and vice versa.

The basic multigrid operations include the following:

1) reduction, which maps the problem to the coarse level with the restriction operator;

2) interpolation, which maps the problem back to the fine level with the interpolation operator;

3) smoothing, which uses iterative methods to smooth out high-frequency errors;

4) coarse-grid correction, which eliminates low-frequency errors on coarse grids.

A classical multigrid solver repeatedly applies the aforementioned operations, reduces the error in every step, and finally converges to the solution.

The multigrid method can be categorized into GMG and AMG. GMG is relatively straightforward in that the reduction and interpolation operations are based on predefined grid hierarchies of the problem, preferably regular structures. Fixing the coarse-grid correction puts a more complex requirement on the choice of smoothers to maintain the fast convergence [25], [27]. In contrast, AMG does not require a predefined grid. Instead, it fixes its smoother to some simple methods and carries out its reduction and interpolation only on the information contained in the underlying matrix. Therefore, AMG is more flexible in handling general structures that may be irregular [27].

Recently, multigrid has been widely used in power grid analysis [14]–[16] and optimization [24]. The multigrid method in [14] uses an AMG-like interpolation procedure. However, regarding the selection of the coarser grids, this method is still geometrically based. Hence, it requires keeping track of the geometry change, which may degrade the efficiency due to the irregularity. The work in [15] emphasizes the system reduction part of AMG so that fast computation is achieved. However, it neglects the smoothing steps, which results in nontrivial accuracy degradation. The AMG-based power grid analysis of [16] follows the complete AMG procedure with a smoothing operation at every level. This approach can achieve better solution accuracy, but the runtime improvement is smaller. The work of [24] is based on GMG and, therefore, is restricted to regular power grids. In practice, a power grid is often irregular [15], [16] due to the usage of IP cores or system-on-chip designs. Furthermore, it is hard to estimate the violation accurately using only linear programming (LP) as in [24].

III. BRIEF REVIEW OF POWER-GRID MODELING

Before presenting our AMG-based approach, we give a brief review of power-grid modeling and simulation. There are two supply grids in VLSI design: the power and ground grids. The two grids influence each other, and therefore, a simultaneous simulation is preferred. However, if we take advantage of the fact that the power and ground grids are often symmetric, the combined power/ground grids can be reduced back to a single power grid [6]. A power grid is usually a metal mesh where each edge can be modeled as a resistor. Each node of the mesh has a ground capacitance consisting of parasitic capacitance and decap. Active devices, which are modeled as time-varying current sources, are connected to the mesh nodes. Some nodes are also connected to power pads that can be treated as ideal voltage sources [28]. Hence, with a modified nodal analysis, the linear system can be represented with the following system of differential equations:

G · x(t) + C · x′(t) = u(t) (1)

where G is the conductance matrix, C is the admittance matrix resulting from capacitive, or inductive, elements if the metal mesh is linked to pads with RL elements, x(t) is the vector of unknowns, including the node voltages, voltage sources, and corresponding branch currents, and u(t) is the vector of independent time-varying current sources [9], [14], [29].

By using the backward Euler method, this system can be discretized to a linear algebraic system

(G + C/h) · x(t) = u(t) + C/h · x(t − h). (2)

With a fixed time step, we may rewrite (2) as $A_h \cdot x(t) = b(t)$, where $A_h = G + C/h$ is the system matrix, and $b(t) = u(t) + C/h \cdot x(t - h)$. If x(t) only consists of node voltages, then it represents an RC network whose matrix $A_h$ is symmetric and positive definite (SPD) [14]. If inductive elements are taken into consideration, the use of the K matrix [30] still keeps the system matrix SPD, whereas (2) becomes two coupled iterative equations [8].
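The backward Euler stepping in (2) can be illustrated with a minimal sketch on a hypothetical two-node RC circuit. Everything concrete here is an assumption made for illustration: the element values, the function names, and the modeling of the pad as a Norton equivalent (a conductance to ground plus an injected current) so that x(t) holds only node voltages and the system matrix stays SPD. The sketch factors G + C/h once with a Cholesky factorization and reuses the factors at every time step.

```python
import math

def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be SPD)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def chol_solve(low, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def backward_euler(g, c_diag, u_of_t, x0, h, steps):
    """Integrate G x + C x' = u with a fixed step h; C is diagonal here."""
    n = len(x0)
    a_h = [[g[i][j] + (c_diag[i] / h if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    low = cholesky(a_h)              # factor A_h = G + C/h only once
    x, t, history = list(x0), 0.0, [list(x0)]
    for _ in range(steps):
        t += h
        u = u_of_t(t)
        b = [u[i] + c_diag[i] / h * x[i] for i in range(n)]   # b(t) of (2)
        x = chol_solve(low, b)       # substitutions only, no refactorization
        history.append(x)
    return history

# Hypothetical circuit: pad -> node 0 -> node 1, constant load drawn at node 1.
VDD, G_PAD, G_WIRE, I_LOAD = 1.0, 10.0, 5.0, 0.1
G = [[G_PAD + G_WIRE, -G_WIRE], [-G_WIRE, G_WIRE]]
C_DIAG = [1.0, 1.0]
trace = backward_euler(G, C_DIAG, lambda t: [G_PAD * VDD, -I_LOAD],
                       [VDD, VDD], h=0.1, steps=200)
```

With a constant load, the trace settles at the dc solution of G x = u, which for these assumed values is 0.99 V at node 0 and 0.97 V at node 1.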

IV. IMPROVED AMG-BASED POWER GRID REDUCTION

Reduction is a critical step in AMG methods. In this step, we can reduce the power grid to an easy-to-handle size and obtain the corresponding interpolation operator. Since the system matrix $A_h$ is SPD, the restriction operator is simply the transpose of the interpolation operator [27]. A good choice of the interpolation operator will greatly improve the convergence of the AMG [27]. This section focuses on an improved AMG-based reduction scheme. The improved reduction procedure to obtain the interpolation operator is discussed in Section IV-A. Then, a dynamic-threshold control mechanism for speeding up the reduction is presented in Section IV-B.

A. Interpolation Operator

In this section, we introduce an interpolation operator that can provide a desired compromise between accuracy and runtime for power grid analysis. We define the following notation for describing the reduction (coarsening) from a certain granularity level to a coarser grained level:

C = {the set of nodes kept in coarse grid}

F = {the set of nodes removed from fine grid in coarsening}

$N_i$ = {node i's neighboring node set}

$S_i$ = {node i's strongly connected node set}.

In the process of reduction or coarsening, some nodes (or variables) are removed, whereas the others are retained in the coarsened grid. At the beginning, C = F = ∅. The nodes are visited in a prefixed order. Usually, critical nodes, such as power pads, critical loads, and corner nodes, are kept in the coarse grid, i.e., added into set C [15].

Whether or not a noncritical node is kept depends on the connection strength between nodes [15]. As in [15], the connection strength between nodes j and i is defined as

$$\text{Strength} = \frac{1}{2}\left|\frac{a_{ij}}{a_{ii}} + \frac{a_{ij}}{a_{jj}}\right| \qquad (3)$$

where $a_{ij}$ is the element at the ith row and the jth column of the system matrix. For a node i, its neighboring nodes with a connection strength greater than a certain threshold φ (usually in the range 0.1–0.3) [15], [27] form the set $S_i$.
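As a small illustration of (3), the following sketch computes the connection strength on a hypothetical 3 × 3 SPD matrix and collects the strongly connected set for a threshold of 0.2. The matrix entries and the function names are assumptions chosen only for this example.

```python
def connection_strength(a, i, j):
    """Connection strength between nodes i and j as defined in (3)."""
    return abs(a[i][j] / a[i][i] + a[i][j] / a[j][j]) / 2.0

def strong_set(a, i, phi=0.2):
    """Neighbors of node i whose connection strength exceeds the threshold phi."""
    return [j for j in range(len(a))
            if j != i and a[i][j] != 0.0 and connection_strength(a, i, j) > phi]

# Hypothetical system matrix (symmetric, diagonally dominant, hence SPD).
A = [[4.0, -3.0, -0.5],
     [-3.0, 5.0, -1.0],
     [-0.5, -1.0, 2.0]]

s0 = strong_set(A, 0)   # node 1 is strong (0.675), node 2 is weak (0.1875)
```

Here node 0 is strongly connected only to node 1, because the strength toward node 2 falls just below the threshold.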

If a node is removed during coarsening, its smoothing error can be interpolated from its strongly connected nodes [27]. Using too few neighboring nodes for interpolation may degrade accuracy, whereas using too many nodes may incur large runtime and memory overhead. In [15], only one neighboring node, which has the strongest connection, is utilized for interpolation. This method depends on the assumption that the neighboring node with the strongest connection plays the dominant role in interpolation and suppresses the impacts of all the other nodes in $S_i$. However, this assumption is not true for every node, particularly when the visited node has some neighboring nodes with similar connection strength. Hence, it is unfair to arbitrarily choose one of the neighboring nodes and discard the others, as their contributions to the interpolated voltage are almost the same. The classical AMG method uses all the nodes in $S_i$ [27] for interpolation. If any node j in $S_i$ has already been removed, multipass interpolation is performed, in which the nodes in $S_j$ are used to replace the impact of node j [14], [27]. As the reduction procedure continues, it may take several passes to find the interpolation weights for a specific node. Thus, it becomes complex and time consuming to decide the interpolation operator. Besides that, using too many nodes for interpolation will increase the number of nonzero elements (NNZs) of the coarsened system matrix and degrade the efficiency of matrix factorization.

Fig. 1. Small example of the RC model.

In order to achieve a better compromise between accuracy and computation cost, we propose to perform interpolation based on a redefined $S_i$ as

$$S_i = \left\{\, j \;\middle|\; \text{Strength} \ge \max\left[\phi,\ \max_{N_i}\{\text{Strength}\} - \varepsilon\right] \right\} \qquad (4)$$

where φ is the threshold for reduction, and ε is an empirically chosen constant with a small value of about 0.001–0.005. The connection strength of the selected nodes is close to $\max_{N_i}\{\text{Strength}\}$ and has the most significant influence on the removed node. Since ε is small, the interpolation weight can be simply set as $1/|S_i|$, where $|S_i|$ is the number of nodes in the set $S_i$. This scheme simplifies the interpolation-weight computation and works particularly well when many nodes have similar "strength". Then, $F = F \cup \{i\}$, and $C = C \cup N_i$. For a removed node, all of its neighboring nodes will be kept in the coarse grid.
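The selection rule in (4) and the uniform weight $1/|S_i|$ can be sketched as follows. The strength values are hypothetical and merely mimic a node with two dominant neighbors of nearly equal strength; the function names and the dictionary representation are likewise assumptions made for the example.

```python
def interpolation_set(strengths, phi=0.2, eps=0.005):
    """Redefined S_i of (4): neighbors within eps of the strongest connection,
    provided they also reach the reduction threshold phi.

    strengths maps each neighbor of node i to its connection strength."""
    if not strengths:
        return []
    cutoff = max(phi, max(strengths.values()) - eps)
    return sorted(j for j, s in strengths.items() if s >= cutoff)

def interpolation_weights(strengths, phi=0.2, eps=0.005):
    """Uniform interpolation weights 1/|S_i| over the redefined set."""
    s_i = interpolation_set(strengths, phi, eps)
    return {j: 1.0 / len(s_i) for j in s_i}

# Hypothetical strengths of four neighbors toward node i.
strengths_i = {"j1": 0.400, "j2": 0.250, "j3": 0.398, "j4": 0.250}
weights_i = interpolation_weights(strengths_i)   # j1 and j3 each get 0.5
```

If no neighbor reaches the threshold φ, the set is empty and the sketch returns no weights, which corresponds to a node that is not removed.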

We show a small example to illustrate the details of the interpolation operation in Fig. 1. Suppose that all of the four neighboring nodes j1, j2, j3, and j4 are strongly connected to node i, and the connection strength of j1 and j3 to node i is much larger than that of the other two nodes. Such a case is often met when j1, j3, and i are on one metal layer, whereas the other two nodes are on another layer. We can obtain the interpolation operator in Fig. 2, in which nodes j1 and j3 are selected for interpolation. Unlike our scheme, the method in [15] randomly chooses the first-visited one in {j1, j3} and discards the other one with a similar strength. The classical AMG uses all the nodes for interpolation, decides whether there exists a multipass, and then computes the interpolation weights.


Fig. 2. Interpolation operator.

Based on the aforementioned interpolation operation, we can obtain the overall interpolation operator $P^h_H = P^h_{H1} P^h_{H2} \cdots P^h_{Hn}$ as well as the coarsened system matrix $(P^h_H)^T A_h P^h_H$. This procedure is performed iteratively until the matrix is small enough for a direct solve.
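The coarsened system matrix $(P^h_H)^T A_h P^h_H$ can be illustrated on a hypothetical three-node chain in which the middle node is removed and interpolated with weight 1/2 from each of its two neighbors. The matrices and helper names below are assumptions for illustration only, and dense lists are used for brevity where a real implementation would use sparse storage.

```python
def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def coarsen(a_h, p):
    """Galerkin coarse matrix P^T A_h P; the restriction is the transpose of P."""
    return matmul(transpose(p), matmul(a_h, p))

# Hypothetical fine-grid matrix of a 3-node chain (SPD).
A_FINE = [[2.0, -1.0, 0.0],
          [-1.0, 2.0, -1.0],
          [0.0, -1.0, 2.0]]

# Nodes 0 and 2 are kept; node 1 is interpolated with weights 1/2 and 1/2.
P = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]

A_COARSE = coarsen(A_FINE, P)   # [[1.5, -0.5], [-0.5, 1.5]]
```

The coarse matrix remains symmetric, which is what allows the same factorization-based direct solve to be used on the coarsest level.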

It can be observed that for our interpolation operator $P^h_{Hi}$ at any grid level i, the sum of any row is one because there are $n_k$ NNZs at the kth row with the value $1/n_k$. Therefore, we have the following lemma for the overall interpolation operator $P^h_H$.

Lemma 1: The sum of any row in the interpolation operator $P^h_H$ (an $n \times m$ matrix and $n > m$) is one.

Proof: The sum of any row in the interpolation operator $P^h_{H(i)}$ at level i is one, and the overall interpolation operator is the product of the interpolation operators at all levels, i.e., $P^h_H = P^h_{H1} P^h_{H2} \cdots P^h_{Hn}$.

Let us consider the product $P^h_{H(i)} P^h_{H(i+1)}$, where $P^h_{H(i)}$ is an $n_i \times m_i$ matrix, $P^h_{H(i+1)}$ is an $n_{i+1} \times m_{i+1}$ matrix, and $m_i = n_{i+1}$. Hence, the kth row of the product is $P^h_{H(i)}(k,:)\,P^h_{H(i+1)}$, where $P^h_{H(i)}(k,:)$ is the kth row of $P^h_{H(i)}$. The sum of this row can be rewritten as

$$\sum_{l=1}^{m_{i+1}} \sum_{j=1}^{m_i} P^h_{H(i)}(k,j)\, P^h_{H(i+1)}(j,l) = \sum_{j=1}^{m_i} P^h_{H(i)}(k,j) \sum_{l=1}^{m_{i+1}} P^h_{H(i+1)}(j,l). \qquad (5)$$

Since $\sum_{l=1}^{m_{i+1}} P^h_{H(i+1)}(j,l) = 1$ and $\sum_{j=1}^{m_i} P^h_{H(i)}(k,j) = 1$, the sum of (5) is one. By repeating the aforementioned procedure, it can be seen that the sum of any row in $P^h_H$ is one. Moreover, since the restriction operator $R^H_h$ is the transpose of $P^h_H$, the sum of any column in $R^H_h$ is also one. Q.E.D.
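Lemma 1 can be checked numerically on a small case. The two per-level operators below are hypothetical (a five-node chain coarsened to three nodes, then to two), each with unit row sums; the sketch multiplies them and inspects the row sums of the product.

```python
def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def row_sums(m):
    return [sum(row) for row in m]

# Hypothetical level-1 operator (5 fine nodes -> 3 coarse nodes).
P_LEVEL1 = [[1.0, 0.0, 0.0],
            [0.5, 0.5, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.5, 0.5],
            [0.0, 0.0, 1.0]]

# Hypothetical level-2 operator (3 nodes -> 2 nodes).
P_LEVEL2 = [[1.0, 0.0],
            [0.5, 0.5],
            [0.0, 1.0]]

P_OVERALL = matmul(P_LEVEL1, P_LEVEL2)   # 5 x 2 overall interpolation operator
SUMS = row_sums(P_OVERALL)               # every entry equals one
```

For instance, the second row of the product is [0.75, 0.25], which still sums to one although its individual weights are no longer uniform.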

B. Dynamic Reduction Threshold

The reduction rate of coarsening heavily depends on the threshold φ. The reduction rate can be quantified as the reduction ratio between two consecutive levels

$$\text{ratio} = \frac{\text{number of nodes at previous grid level}}{\text{number of nodes at current grid level}}. \qquad (6)$$

If the threshold φ is too low, the reduction ratio may be very high during the first few iterations, and the accuracy is degraded. In later iterations, the node degree increases rapidly due to the aggressive reduction. Consequently, the matrix quickly becomes very dense, and the reduction dramatically slows down. Therefore, too low a threshold may hurt both accuracy and convergence rate. If the threshold is too high, it is quite likely that the sets $S_i$ for some nodes are empty and that very few nodes are removed. Previous AMG-based power-grid-analysis works [15], [16] use a constant threshold throughout all levels of coarsening. When the system matrix changes during coarsening, the constant threshold may sometimes be too low or too high.

In order to solve this problem, we propose a dynamic-threshold mechanism such that a stable reduction rate is retained in all levels of coarsening. At the beginning, the threshold is set to 0.2, which is an empirically good value employed in the previous work [15], [27]. After the first level of coarsening, the threshold φ is determined by an empirical function

$$\phi_{iter} = f(\text{ratio}_{iter-1}) = \begin{cases} 0.1, & \phi_{iter-1} < 0.1 \\ 0.3, & \phi_{iter-1} > 0.3 \\ \left[u(\text{ratio}_{iter-1} - 1.5) - 0.5\right] \times k_1 \times \left(e^{k_2 |\text{ratio}_{iter-1} - 1.5|} - 1\right) + 0.2, & \text{otherwise} \end{cases} \qquad (7)$$

where u(x) is a unit step function, and $k_1$ and $k_2$ are empirically chosen as 0.001 and 12, respectively. By doing so, the reduction ratio can be stabilized throughout all levels of coarsening. Such an empirical function is motivated by the aim that the threshold becomes smaller when the reduction ratio is too low and becomes larger when the ratio is too high. We consider 1.5 as a good reduction ratio for each level. In order to keep the ratio stable for different levels, the two-sided exponential function is suggested as the basic structure of the equation. With numerical experiments, we fitted the equation and determined each parameter approximately. Thus, such a strategy can be applied to a series of different cases.
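The threshold update in (7) can be sketched as a short function. One point is an interpretation rather than something the text states: the first two cases of (7) are written as conditions on the previous threshold, and the sketch instead treats them as clamping the newly computed value to the range [0.1, 0.3], which matches the threshold range quoted earlier. The function names and the convention u(0) = 1 are also assumptions; the latter has no effect because the exponential factor vanishes at a ratio of 1.5.

```python
import math

def unit_step(x):
    """Unit step u(x); the value at x == 0 is taken as 1 (an assumption)."""
    return 1.0 if x >= 0 else 0.0

def next_threshold(prev_ratio, k1=0.001, k2=12.0, target=1.5):
    """Threshold for the next coarsening level from the previous reduction ratio.

    Evaluates the two-sided exponential branch of (7) and then clamps the
    result to [0.1, 0.3] (clamping is this sketch's reading of the first
    two cases of (7))."""
    deviation = prev_ratio - target
    raw = (unit_step(deviation) - 0.5) * k1 * (math.exp(k2 * abs(deviation)) - 1.0) + 0.2
    return min(0.3, max(0.1, raw))

phi_on_target = next_threshold(1.5)   # ratio at the target keeps phi at 0.2
phi_too_fast = next_threshold(2.0)    # ratio too high: phi raised, capped at 0.3
phi_too_slow = next_threshold(1.0)    # ratio too low: phi lowered, floored at 0.1
```

Near the target ratio the correction is tiny (a ratio of 1.3 lowers the threshold only to about 0.195), while large deviations push the threshold quickly to one of its bounds.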

V. AMG-BASED METHODS FOR POWER GRID ANALYSIS

This section begins with a fast AMG-based approximation method for power grid transient analysis in Section V-A. After that, by combining with the error control mechanism, the proposed AMG is extended to an accurate solver in Section V-B. Section V-C proposes an interpolation-operator refinement mechanism and applies the AMG method to analyze the grid with small modification.

A. Improved AMG-Based Approximation Solver

With the discussion in Section IV, we may easily get the interpolation/restriction operator and map the problem from the fine grid to the coarse grid. However, if the complementary smoothing procedure is not performed, as in [15], great accuracy degradation may occur. On the other hand, directly applying a general-purpose AMG solver to the power grid analysis is not computationally efficient because the pre- and postsmoothing at each coarse-grid correction level are very time consuming. Instead, we propose a customized AMG method for power grid analysis, which is summarized in Fig. 3. In this new approach, a weighted-Jacobi-based presmoothing is performed only once at the beginning. This presmoothing can significantly improve solution accuracy, while a single iteration of presmoothing has a limited impact on runtime. After reduction, the linear system at the coarsest grid can be solved directly. Successive solutions on each time point would involve only inexpensive forward and backward substitution procedures.


Fig. 3. Improved AMG-based approximation solver for power grid transient analysis.

B. AMG-Based Solver With Error Control Mechanism

For the multigrid method, convergence is affected by the smoothness of the current distribution. If a circuit has a severely uneven current distribution, only one presmoothing is not adequate for eliminating high-frequency errors, particularly in dc analysis. In [17], a W-cycle-like postsmoothing scheme is performed to control the error. With multiple iterations, this scheme can control the error at a tolerable level. However, the aforementioned scheme is not completely consistent with the proposed coarse-grid correction operator. By noting that the proposed AMG is designed to reduce the error, we may consider it as an approximation of the inverse of $A_h$ and, hence, use it as a preconditioner for a Krylov subspace solver, like the CG method. The AMG preconditioned CG method is described in Fig. 4. By employing CG as the error control mechanism, we may reduce the error that is damped poorly by the proposed multigrid and obtain a more robust method with good convergence.
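The idea of using a coarse-grid correction as a CG preconditioner can be sketched as below. This is not the algorithm of Fig. 4, whose details are not reproduced here: the sketch swaps in a simple additive two-level preconditioner (a damped Jacobi term plus a Galerkin coarse-grid correction), chosen because it is symmetric positive definite by construction. The chain matrix, the interpolation operator, the damping factor, and all function names are assumptions for illustration.

```python
def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def solve_dense(a, b):
    """Gaussian elimination with partial pivoting for the small coarse system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def make_two_level_preconditioner(a, p, omega=0.8):
    """z = omega * D^-1 r + P (P^T A P)^-1 P^T r (additive two-level sketch)."""
    n, m = len(p), len(p[0])
    pt = [[p[i][j] for i in range(n)] for j in range(m)]   # restriction = P^T
    ap = [matvec(a, pt[j]) for j in range(m)]              # columns of A P
    a_coarse = [[dot(pt[i], ap[j]) for j in range(m)] for i in range(m)]
    def apply(r):
        correction = matvec(p, solve_dense(a_coarse, matvec(pt, r)))
        return [omega * r[i] / a[i][i] + correction[i] for i in range(n)]
    return apply

def pcg(a, b, precond, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient; returns (solution, iterations used)."""
    x, r = [0.0] * len(b), b[:]
    z = precond(r)
    d, rz = z[:], dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * dot(b, b) ** 0.5:
            return x, it
        ad = matvec(a, d)
        alpha = rz / dot(d, ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, ad)]
        z = precond(r)
        rz_new = dot(r, z)
        d = [zi + (rz_new / rz) * di for zi, di in zip(z, d)]
        rz = rz_new
    return x, max_iter

# Hypothetical 5-node chain; nodes 0, 2, 4 are coarse, nodes 1 and 3 interpolated.
A5 = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(5)]
      for i in range(5)]
P5 = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 1.0, 0.0],
      [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
B5 = [1.0, 0.0, 0.0, 0.0, 1.0]
X5, ITERS = pcg(A5, B5, make_two_level_preconditioner(A5, P5))
RESIDUAL = max(abs(v - w) for v, w in zip(matvec(A5, X5), B5))
```

The outer CG loop supplies the error control: whatever error component the two-level correction damps poorly is removed by the Krylov iteration, and the stopping test on the residual norm sets the final accuracy.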

C. Incremental Update of the Interpolation Operator for the Analysis of the Modified Grid

If some changes are made to a chip design, its power grid often needs to be modified accordingly. In this scenario, the power grid analysis can be performed incrementally based on the results of the previous design instead of being carried out from scratch. We extend our AMG-based method for the fast analysis of the modified power grid.

By the AMG reduction, the nodes on the coarsened grid can be considered as the representative nodes of the fine grid. To analyze the modified grid, we just need to add the nodes in the modified regions into the original coarse grid. After that, the updated coarse grid can represent the modified fine grid and minimize the impact of the faraway pads in the wirebond technology. To include those modified regions, we define the "local window," which is the smallest window covering the modified region. If there are several modified regions, several corresponding windows are built.

Fig. 4. AMG-based solver with error control mechanism.

TABLE I. NOTATIONS USED IN THE INTERPOLATION-OPERATOR INCREMENTAL UPDATE

In the AMG-based method, constructing the interpolation operator is the most time-consuming step as it includes multiple matrix multiplications. In a single dc analysis by the AMG-preconditioned CG method, the interpolation-operator construction accounts for about 40% of the entire computation cost. Therefore, in the analysis, we focus on how to incrementally update the interpolation operator for the modified power grid by partially reusing the interpolation operator of the original grid. The notations used in the methodology are shown in Table I.

Step 1) Decide the node sets $S_{del}$ and $S_{add}$ for removed and inserted wire segments.

Step 2) Decide the local window for each modified region. $S_{local}$ contains the nodes in all these windows. For a node $i$ in $S_{del} \cap \{\text{nodes of } PG^H_0\}$, its local window also includes the nodes whose voltages are interpolated from node $i$.

Step 3) Find the node set $S_{rmv}$.

Step 4) Obtain the new interpolation node set $S_{interp}$ for the modified grid $PG^h_m$.

Step 5) Obtain the node set $S_{new}$ for $PG^h_m$.

Step 6) Clear the existing interpolation weights of the nodes in $S_{rmv}$ to zero, and then set the interpolation weight of each of these nodes to one.

Step 7) Obtain the interpolation operator on $PG^h_m$. Remove $|S_{del}|$ rows corresponding to $S_{del}$ from the interpolation operator $P^h_H$ on $PG^h_0$. Insert $|S_{add}|$ rows corresponding to $S_{add}$ in $P^h_H$. Insert $|S_{rmv}|$ columns corresponding to $S_{rmv}$ in $P^h_H$. For a node $i$ in $S_{rmv}$, the interpolation weight is at $(x_i, y_i)$ of the new interpolation operator, where $x_i$ is node $i$'s index in $S_{new}$, and $y_i$ is node $i$'s index in $S_{interp}$.

For the aforementioned methodology, the adjustment of wire/decaps corresponds to the column insertion in the interpolation operator, whereas wire-segment removal/insertion just corresponds to row removal/insertion. We use the neighboring nodes and some strongly connected nodes to alleviate the impact of the modified regions on the interpolation operator. The methodology does not perform any matrix operation except row/column insertion/removal. Thus, it is faster than constructing a completely new interpolation operator.

With the node voltage $V_0(t)$ for $PG^h_0$ as the initial guess for each time step, we can apply the updated interpolation operator to the same algorithm in Fig. 3. This approach can handle multiple modified regions on very large power grids and makes the modified power grid analysis more efficient.
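The row and column bookkeeping of Steps 6 and 7 can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the paper: the operator is stored as a dense array, inserted rows and columns are appended at the end rather than placed by index in $S_{new}$ and $S_{interp}$, and the nodes in $S_{rmv}$ are interpreted as nodes promoted to the coarse grid with a unit weight. The function name `update_interpolation` and its arguments are hypothetical.

```python
import numpy as np

def update_interpolation(P, del_rows, n_add_rows, rmv_rows):
    """Reuse an old interpolation operator P by row/column insertion and removal.

    del_rows   : row indices (old numbering) of nodes in S_del
    n_add_rows : number of nodes in S_add, appended as new rows
    rmv_rows   : row indices (new numbering) of nodes in S_rmv
    """
    P_new = np.delete(P, del_rows, axis=0)                           # remove |S_del| rows
    P_new = np.vstack([P_new, np.zeros((n_add_rows, P.shape[1]))])   # insert |S_add| rows
    n_fine, n_coarse = P_new.shape
    P_new = np.hstack([P_new, np.zeros((n_fine, len(rmv_rows)))])    # insert |S_rmv| columns
    for k, row in enumerate(rmv_rows):
        P_new[row, :] = 0.0             # Step 6: clear the old weights
        P_new[row, n_coarse + k] = 1.0  # Step 6: unit weight in the node's own column
    return P_new
```

No matrix product is formed, which matches the observation above that the update consists only of row and column insertion or removal. In this sketch, an inserted node that is not also listed in `rmv_rows` keeps an all-zero row, so the caller is assumed to include such nodes in `rmv_rows`.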

VI. FAST DA USING AMG

In this section, we introduce a fast DA method based on AMG. An overview of the algorithm is given in Section VI-A, including the problem formulation and related power-grid-reduction issues. In Section VI-B, an error-compensation mechanism is proposed to reduce the errors resulting from the algebraically nonsmooth structure. Several customized speedup techniques for sequential quadratic programming (SQP) are discussed in Section VI-C. Finally, a charge-based back-mapping flow is demonstrated in Section VI-D.

A. Overview of DA Using AMG

The size of the decap at each node is a decision variable in the DA problem, in which we attempt to minimize the total area of decaps while the voltage at each node is no less than a certain threshold at any time point. The lower and upper bounds for the allowed decap size are represented as $l_b$ and $u_b$, respectively. The DA problem is formulated as the following nonlinear programming (NLP) problem, where $C^h$ is the decap vector, and $C^h_i$ is the $i$th element of $C^h$.

DA:

$$\text{Minimize} \sum_{i \in PG^h} C^h_i \tag{8}$$

$$\text{Subject to: } \mathrm{ceq}(C^h) = \sum_{i \in PG^h} s_i = 0 \tag{9}$$

$$l_b < C^h < u_b \tag{10}$$

where $s_i = \int_0^T |\min(V_i(t) - V_{th}, 0)|\,dt = \int_{t_1}^{t_2} (V_{th} - V_i(t))\,dt$, and $[t_1, t_2]$ is the time interval in which the violation occurs, as shown in Fig. 5. This voltage-drop noise metric is adopted from [19].
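For a sampled waveform, the noise metric $s_i$ can be evaluated numerically. The following minimal sketch assumes that the node voltage is available at discrete time points and applies the trapezoidal rule to the clipped voltage drop; the function name `violation_area` is illustrative, and the result only approximates the continuous integral near the endpoints $t_1$ and $t_2$.

```python
import numpy as np

def violation_area(t, v, v_th):
    """Approximate s_i = integral of |min(V_i(t) - V_th, 0)| dt by the trapezoidal rule."""
    t = np.asarray(t, dtype=float)
    drop = np.maximum(v_th - np.asarray(v, dtype=float), 0.0)  # nonzero only during violation
    return float(np.sum(0.5 * (drop[1:] + drop[:-1]) * np.diff(t)))
```

Because the integrand is clipped at zero, the interval $[t_1, t_2]$ does not have to be located explicitly, and a node without violation contributes exactly zero to the constraint in (9).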

Due to the huge size of the power grid, solving the NLP DA directly is extremely difficult. Therefore, we propose to reduce the problem size using the AMG-based technique described in Section IV. The reduced problem on the coarse grid is solved directly, and then, the solution is mapped back to the original fine grid.

Fig. 5. Illustration of voltage drop.

Here, we use the conductance matrix $G$ on the original fine grid as the system matrix $A^h$ to obtain the interpolation operator. Following the techniques in Section IV, we can obtain the overall interpolation operator $P^h_H$ for the underlying matrix $A^h$. Moreover, the overall restriction operator is $R^H_h = (P^h_H)^T$. Therefore, the system $A^h$ can be reduced to a coarser grid $A^H = R^H_h A^h P^h_H$. Since $A^h$ is just the conductance matrix $G$, $A^H$ can be considered as the conductance matrix corresponding to a more complex but coarsened grid.

Once the interpolation operators are obtained, the current-source vector on the coarse grid can be obtained by

$$I^H(t) = R^H_h I^h(t) \tag{11}$$

where $I^h(t)$ is the current-source vector on the fine grid. Correspondingly, the bounds for decap sizes are also updated to $l^H_b < C^H < u^H_b$, where $l^H_b = R^H_h l_b$, and $u^H_b = R^H_h u_b$ [24].

In our AMG-based method, the voltage-source nodes are retained. The parasitic capacitance is used as the lower bound of decap optimization instead of being stamped in the capacitance matrix for the convenience of computation. If we directly include the capacitive elements in the system matrix to compute the interpolation operator, the capacitance matrix on the coarse grid is no longer a diagonal matrix. The fill-ins at the off-diagonal positions just denote the cross capacitance between nodes. Such a capacitance matrix may increase the number of variables in the NLP and make it unsolvable. It is also impractical to map the optimized cross capacitance back to the grounded decaps on the fine grid. Therefore, the capacitance on the coarse grid is obtained by

$$C^H = R^H_h C^h. \tag{12}$$
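The reduction of the DA problem data can be summarized in a short sketch. It assumes dense arrays and an interpolation operator `P` that is already available; the function name `reduce_da_problem` and the dictionary keys are illustrative and not part of the original implementation.

```python
import numpy as np

def reduce_da_problem(G, P, I_h, C_h, l_b, u_b):
    """Map fine-grid DA data to the coarse grid with R = P^T."""
    R = P.T
    return {
        "A_H": R @ G @ P,   # coarse conductance matrix, A^H = R A^h P
        "I_H": R @ I_h,     # coarse current sources, as in (11)
        "C_H": R @ C_h,     # coarse decap vector, as in (12); stays a vector
        "l_H": R @ l_b,     # coarse lower bounds
        "u_H": R @ u_b,     # coarse upper bounds
    }
```

Restricting the decap vector directly, rather than forming $R^H_h\,\mathrm{diag}(C^h)\,P^h_H$, is what keeps one grounded decap variable per coarse node. If every row of `P` sums to one, the sum of the entries of `C_H` equals that of `C_h`, so the total capacitance is preserved by the restriction.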

A flow of the proposed DA algorithm is shown in Fig. 6. One transient analysis is performed on the original power grid, based on the interpolation operator $P^h_H$ and the restriction operator $R^H_h$ previously obtained, to precondition the system matrix such that we can obtain further speedup. Other techniques will be discussed in the following sections.

B. Error Compensation

Obviously, the power grid reduction causes some information loss, which is an inevitable price paid in exchange for the improvement of computation speed. In other words, the credibility of the optimization solution depends on the discrepancy between the transient response on the coarse grid and that on the fine grid. Based on that, we propose a compensation technique to reduce the error due to this discrepancy.

Fig. 6. Flow of DA using AMG.

Suppose that the NLP on the coarse grid is

DA coarse:

$$\text{Minimize} \sum_{i \in PG^H} C^H_i \tag{13}$$

$$\text{Subject to: } \mathrm{ceq}^H(C^H) = \sum_{i \in PG^H} s^H_i = 0 \tag{14}$$

$$l^H_b < C^H < u^H_b \tag{15}$$

where $s^H_i = \int_{t_1}^{t_2} (V_{th} - V^H_i(t))\,dt$. When DA coarse is carried out, the corresponding transient analysis of the power grid is also performed on the coarse grid. Performing transient analysis on the coarse grid and directly mapping the result back to the original fine grid are equivalent to performing multigrid without the smoothing step. Omitting the smoothing often results in significant errors [15], particularly when the current is not evenly distributed.

Fig. 7 shows an example of the transient-simulation results from $V(t)$ and $P^h_H V^H(t)$. It can be seen that the voltage $P^h_H V^H(t)$ interpolated from the transient result on the coarse grid underestimates the voltage drop.

We propose a compensation technique: raise the threshold voltage for the interpolated voltage by a constant $\delta$ such that its violation area is roughly equal to $\mathrm{ceq}(C^h)$. For example, in Fig. 8, we raise the threshold voltage for the interpolated voltage from $V_{th}$ to $V_{th1}$ to make $S_1 \approx S_2$. Thus, we have

$$\mathrm{ceq}(C^h) = \sum_{i \in PG^h} \int_{t_1}^{t_2} (V_{th} - V_i(t))\,dt \approx \sum_{i \in PG^h} \int_{t_5}^{t_6} \left(V_{th} + \delta - \left(P^h_H V^H(t)\right)_i\right) dt. \tag{16}$$

In order to avoid voltage-drop violation or let (16) be zero, we just need to make $P^h_H V^H(t) \ge V_{th} + \delta$. With Lemma 1, this is equivalent to requiring $V^H(t) \ge V_{th} + \delta$.

Fig. 7. Voltage-drop noise metrics of $V(t)$ and $P^h_H V^H(t)$.

Fig. 8. Raise of the threshold voltage.

Hence, the NLP on the coarse grid with error compensation is

DA compensated:

$$\text{Minimize} \sum_{i \in PG^H} C^H_i \tag{17}$$

$$\text{Subject to: } \mathrm{ceq}^H(C^H) = \sum_{i \in PG^H} s^H_i = 0 \tag{18}$$

$$l^H_b < C^H < u^H_b \tag{19}$$

where $s^H_i = \int_{t_5}^{t_6} (V_{th} + \delta - V^H_i(t))\,dt$, and $\delta$ satisfies (16). The value of $\delta$ is different for each specific power grid with a certain decap size. Moreover, the decap sizes are changed during the optimization. Therefore, it is not obvious how to choose the value of $\delta$ before the optimization.

With some numeric experiments, we notice that the maximal absolute value and the average absolute value of $\delta(t) = P^h_H V^H(t) - V(t)$ decrease when the total decap area increases, as shown in Fig. 9. When the decap area increases, more high-frequency components of the voltage change are filtered out. Since the analysis on the coarse grid is relatively good at handling low-frequency errors, the magnitude of the error $\delta(t)$ consequently becomes smaller.


Fig. 9. Average and maximal absolute values of δ(t) versus total decap area.

Fig. 10. Flow for compensation constant computation.

As a conservative approach, we obtain the value of $\delta$ based on using the minimum size for each decap. Our purpose is to find a practically feasible technique, which can lead to an improved, not necessarily optimal, solution. This scheme ensures that the error can always be fully compensated even in the worst case. Although it sometimes leads to some overcompensation, experimental results show that the degree of overcompensation is very limited in general. Through transient simulations, we can estimate the time-dependent error function $\delta(t) = V_r(t) - V(t)$, where $V_r(t)$ denotes the interpolated node voltage $P^h_H V^H(t)$. Then, a binary search is performed in the range between the average absolute value $\delta_{lb}$ and the maximal absolute value $\delta_{ub}$ of $\delta(t)$ so as to find a value of $\delta$ satisfying (16). The flow to compute $\delta$ is shown in Fig. 10.
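The binary search for the compensation constant can be sketched as follows. This is not a transcription of Fig. 10: it assumes sampled waveforms stored as arrays of shape (nodes, time points), a trapezoidal approximation of the violation areas in (16), and a bracket `[delta_lb, delta_ub]` supplied by the caller. The names `total_violation` and `find_delta` are illustrative.

```python
import numpy as np

def total_violation(t, V, v_th):
    """Sum over nodes of the trapezoidal integral of max(v_th - V_i(t), 0)."""
    drop = np.maximum(v_th - V, 0.0)                      # shape: (nodes, time points)
    return float(np.sum(0.5 * (drop[:, 1:] + drop[:, :-1]) * np.diff(t)))

def find_delta(t, V_fine, V_interp, v_th, delta_lb, delta_ub, n_iter=40):
    """Bisection for delta so that both sides of (16) agree (illustrative)."""
    target = total_violation(t, V_fine, v_th)             # left-hand side of (16)
    lo, hi = delta_lb, delta_ub
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if total_violation(t, V_interp, v_th + mid) < target:
            lo = mid        # compensation still too small
        else:
            hi = mid        # violation area already reaches the target
    return hi
```

The search relies on the fact that the violation area of the interpolated voltage does not decrease when the threshold is raised. If the area at `delta_ub` is still below the target, the sketch simply returns `delta_ub`.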

C. SQP Speedup

The compensated NLP problem “DA compensated” on the coarse grid is solved using an SQP package [31]. The sensitivities of the voltage constraints with respect to decap sizes are calculated by the adjoint network method [19], [21]. SQP solves the NLP problem by converting it to a series of locally approximated quadratic programming problems. Here, we suggest three simple yet effective speedup techniques to the SQP: 1) violation-aware decap preallocation; 2) variable removal for nodes at power pads; and 3) search-step scaling.

SQP starts from an initial solution and then successively moves the solution toward the optimal point. Therefore, a good initial solution can greatly improve the convergence. We propose to find a good initial solution by preallocating more decaps to nodes with relatively large current withdrawn during the violation time intervals.

Step 1) Run transient simulation on the coarse grid with a minimum-sized decap at each node.

Step 2) Obtain the accumulated violation charge during the voltage violation period for each node $i$: $Q^H_i = \int_{t_1}^{t_2} I^H_i(t)\,dt$, where $[t_1, t_2]$ is the time interval of voltage-drop violation.

Step 3) Add decap equivalent to the amount of the charges during the violation period: $C^H_0 = l^H_b + Q^H/(V_{DD} - V_{th})$.

Step 4) If there is no noise with the decap $C^H_0$, go to step 5); otherwise, exit the preallocation procedure.

Step 5) Set $C^H_{0i} = K_i \times (C^H_{0i} - l^H_{bi}) + l^H_{bi}$ for each decap, where $K_i = 1 - 1/\max\left(1, (V_{DD} - \min(V_i(t)))/(V_{DD} - V_{th})\right)$ [18], and then, go to step 4).

One can see that the aforementioned procedure allocates more decaps to nodes with relatively large violation charge. Since the coarse grid is small, this preallocation procedure can be carried out rapidly.
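Steps 2) and 3) of the preallocation can be illustrated with a minimal sketch. It covers only the initial charge-based estimate, not the scaling loop of Steps 4) and 5), and it assumes a single violation interval `[t1, t2]` shared by all coarse nodes, sampled current waveforms of shape (nodes, time points), and trapezoidal integration. The function name `preallocate_decap` is hypothetical.

```python
import numpy as np

def preallocate_decap(t, I_H, l_H, v_dd, v_th, t1, t2):
    """Initial decap estimate C_0 = l_b + Q / (V_DD - V_th) on the coarse grid."""
    mask = (t >= t1) & (t <= t2)                  # samples inside the violation interval
    tw, Iw = t[mask], I_H[:, mask]
    # Step 2: accumulated violation charge per node (trapezoidal rule)
    Q_H = np.sum(0.5 * (Iw[:, 1:] + Iw[:, :-1]) * np.diff(tw), axis=1)
    # Step 3: decap that can supply this charge within the allowed voltage drop
    return np.asarray(l_H, dtype=float) + Q_H / (v_dd - v_th)
```

For instance, a node drawing a constant 2 mA over a 1-ns violation interval accumulates 2 pC, which corresponds to 10 pF of added decap for $V_{DD} - V_{th} = 0.2$ V.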

During the power grid reduction of AMG, the nodes connected to power pads are always kept [15], [17]. Thus, a significant portion of nodes in the coarse grid is connected to the power pads. Since these nodes are directly connected to the voltage sources in the circuit model, decaps at these nodes have no effect on the voltage drop there. Hence, we can remove many variables corresponding to these nodes in the NLP. Evidently, such a removal may reduce the runtime of SQP.

Since the problem “DA compensated” is on the coarse grid, the allowed range $(u^H_b - l^H_b)$ for each decap size is very large. This implies a large solution search space and slow runtime. In SQP [31], the searching step in each iteration is limited in order to maintain a decent accuracy for the local quadratic approximation. We find that we can scale the searching step of [31] by a small factor $(1 + \beta)$ with $\beta = 0.1$ without significantly influencing the solution quality. $\beta$ is an empirically chosen value. Our motivation is to enlarge the search step but still maintain some accuracy. Too large a value will greatly overestimate the required decaps, whereas too small a value will have little effect on the speedup. As a result, the computation speed is further improved.

D. Charge-Based Back-Mapping

After the solution for decap $C^H$ on the coarse grid is obtained, we need to map it back to decap $C^h$ on the fine grid. The back-mapping is equivalent to spreading decaps on the coarse grid to nodes in the fine grid. Even though we know that $R^H_h C^h = C^H$, there are usually many solutions on the fine grid satisfying $R^H_h C^h = C^H$. The work of [24] finds a unique back-mapping by solving an LP of minimizing a weighted sum of total decap area subject to $R^H_h C^h = C^H$. However, such a back-mapping neglects the voltage-drop constraint on the fine grid. Moreover, it tends to be very slow when the problem size is huge. For example, if there are one million nodes in the power grid, such a back-mapping requires solving an LP with roughly one million variables.


Fig. 11. Algorithm for charge-based back-mapping.

We propose a charge-based back-mapping. When we spread decaps on the coarse grid to nodes in the fine grid, we allocate greater portions to nodes with relatively large violation charge. As in Section VI-C, the violation charge for the nodes in the fine grid is the accumulated charge flowing into a node during the time interval of voltage-drop violation: $Q^h_i = \int_{t_1}^{t_2} I^h_i(t)\,dt$, where $[t_1, t_2]$ is the time interval of voltage violation.

Our proposed charge-based back-mapping is shown in Fig. 11. Because the computations here are mostly vector operations and the violation time intervals are already available from the decap preallocation described in Section VI-C, our back-mapping is much faster and more scalable than the LP-based back-mapping in [24].

Since $R^H_h I^h(t) = I^H(t)$, the violation charge on the coarse grid satisfies $Q^H_i = \int_{t_1}^{t_2} I^H_i(t)\,dt = \int_{t_1}^{t_2} R^H_h(i,:)\,I^h(t)\,dt$, where $R^H_h(i,:)$ is the $i$th row of $R^H_h$. The violation time interval on the fine grid is approximated by that on the coarse grid. Then, we have

$$Q^H_i = \sum_{j \in PG^h} \int_{t_1}^{t_2} R^H_h(i,j)\, I^h_j(t)\,dt = \sum_{j \in PG^h} \int_{t_1}^{t_2} P^h_H(j,i)\, I^h_j(t)\,dt = \sum_{j \in PG^h} P^h_H(j,i)\, Q^h_j \tag{20}$$

where $P^h_H(j,i)$ is the element at the $j$th row and $i$th column of $P^h_H$. This indicates that the total decap area remains the same after the back-mapping in the proposed algorithm.
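One way to realize a charge-weighted spreading that is consistent with (20) is sketched below. It is an assumption of this illustration, not a transcription of Fig. 11: coarse decap $C^H_i$ is split among fine nodes in proportion to $P^h_H(j,i)\,Q^h_j$, whose sum over $j$ is $Q^H_i$ by (20). The function name `charge_based_back_map` and the fallback for coarse nodes with zero violation charge are hypothetical, and the sketch ignores the decap bounds on the fine grid.

```python
import numpy as np

def charge_based_back_map(C_H, P, Q_h):
    """Spread coarse decaps to fine nodes in proportion to P(j, i) * Q^h_j."""
    W = P * Q_h[:, None]                 # W[j, i] = P(j, i) * Q^h_j
    Q_H = W.sum(axis=0)                  # coarse violation charge, as in (20)
    no_charge = Q_H == 0.0
    W[:, no_charge] = P[:, no_charge]    # fallback: split by interpolation weight alone
    shares = W / W.sum(axis=0)           # each column sums to one
    return shares @ C_H                  # fine-grid decap vector
```

Because every column of `shares` sums to one, the total decap area on the fine grid equals that on the coarse grid, and only matrix-vector operations are involved. The sketch assumes nonnegative charges and that every coarse node interpolates to at least one fine node.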

VII. EXPERIMENTAL RESULTS

The proposed AMG-based methods for power grid analysis and decap optimization are implemented in the C language with the numeric libraries TAUCS [32] and RFSQP [31]. The experiments are carried out on a PC with a Pentium IV 2.6-GHz CPU, 1-GB memory, and the Windows operating system. The experiments are performed on a set of testcases of different sizes. All of them are in mesh structure with some local irregularities. The supply voltage is 1.8 V. The efficiency of the proposed AMG-based reduction is demonstrated in Section VII-A. After that, the results of power grid analysis and decap optimization are exhibited in Sections VII-B and VII-C, respectively. For simplicity of presentation, we summarize the notations for our AMG-based methods in Table II.

TABLE II. NOTATIONS FOR THE AMG-BASED METHODS

A. AMG-Based Power Grid Reduction

In order to test the effect of the dynamic-reduction-threshold scheme introduced in Section IV, we run experiments to compare the methods with dynamic and constant reduction thresholds (= 0.2). Six cases of different sizes are employed, as shown in Table III. In columns five to ten of Table III, the reductions on the number of nodes and the NNZs after the same levels of AMG coarsening (column 4) are compared. The results in columns five and six of Table III show that using the dynamic reduction threshold results in greater node reduction, particularly for large cases. For the largest case (case C6), the reduction from using the dynamic threshold is almost ten times greater than that of using the constant threshold. Similarly, the data in columns seven and eight of Table III indicate that the dynamic threshold may yield much greater reduction on the number of NNZs for large cases as well. The computation accuracy, runtime, and memory are compared in columns 11–16 of Table III. The data of dc analysis using the accurate AMG-based solver in Section V-B show that using the dynamic threshold results in about the same accuracy as the constant threshold with remarkably less runtime and memory for large cases. Due to the memory limitation, the method using a constant threshold cannot complete for the larger cases.¹

B. AMG-Based Methods for Power Grid Analysis

The same testcases C1–C6 in Table III are employed for power grid analysis. In this section, the transient analysis is performed using the proposed AMG-based approximation solver in Section V-A (abbreviation: AMG_A). After that, we also compare the accurate AMG-based solver proposed in Section V-B (abbreviation: AMG_CG) with both iterative and direct solvers. Finally, we demonstrate the results of the techniques proposed in Section V-C, in which the interpolation operator is incrementally updated for the analysis of the modified grid (abbreviation: AMG_M).

1) Fast Transient Analysis Using AMG-Based Approximation Solver: We performed transient analysis using the AMG-based approximation solver described in Section V-A (AMG_A) and compared the results with a similar solver, i.e., SAN [15]. The solution from an incomplete Cholesky CG (ICCG) solver is employed as a baseline for evaluating the accuracy on testcases C1–C6. We ran both methods with 1000 time steps (step size = 5 ps) for the first five cases and 200 time steps for the last case. Table IV compares the number of nodes and the NNZ after reduction for the two methods. Due to the high density of the interpolation operator, our AMG_A method always leads to a coarser grid and a higher NNZ density than the SAN method [15]. Table IV also compares the runtime for each time step and the accuracy of the two methods. It is clear that our AMG_A method can obtain a much higher accuracy at about the same computation speed as the SAN [15] method without presmoothing. Moreover, our AMG_A method is also much faster and more accurate than SAN [15] with the same iterations of presmoothing.

We recorded the maximal error at each node during the transient analysis for case C4. Fig. 12 shows the histogram of the errors for our AMG_A method and the SAN method [15] without presmoothing. It is easy to see that the errors from our AMG_A method are very close to zero and are about one order of magnitude lower than the errors from SAN [15]. The error distribution of our AMG_A method has a mean value of $-7.38 \times 10^{-4}$ and a standard deviation of $2.97 \times 10^{-4}$, whereas the error distribution of SAN without presmoothing [15] has a mean of 0.0031 and a standard deviation of 0.0048. The waveform of a node in case C4 is simulated by both our AMG_A method and the SAN method without presmoothing [15]. The simulated waveforms are shown in Fig. 13. One can see that the waveform obtained from our AMG_A method matches the exact waveform much better than that from the SAN method [15].

2) AMG-Based Solver With Error Control Mechanism: In Table V, we compared the AMG-preconditioned CG method (AMG_CG) in Section V-B against several iterative solvers, including a recently reported hybrid solver [34], [35], a CG solver, and an ICCG solver. The error tolerance of all iterative solvers is set as in [34]: $\|b - Ax\|_2 < 10^{-6} \times \|b\|_2$. The dropping threshold for incomplete Cholesky factorization in ICCG is 0.1. The overall CPU time is compared in columns two to eight of Table V.

¹The constant threshold 0.2 is chosen as a reasonable tradeoff between accuracy and speed [15], [27]. A lower threshold may complete the testcases at the cost of larger accuracy loss.

TABLE III. COMPARISON BETWEEN THE DYNAMIC AND CONSTANT REDUCTION THRESHOLDS

TABLE IV. COMPARISON BETWEEN SAN [15] AND OUR AMG_A METHOD ON TRANSIENT ANALYSIS

Fig. 12. Error histograms of SAN [15] and our AMG_A method.

Fig. 13. Comparison of SAN [15] with our AMG_A method on voltage waveforms.

TABLE V. COMPARISON AMONG OUR AMG_CG AND ITERATIVE SOLVERS CG, ICCG, AND HYBRID SOLVER [34] ON DC ANALYSIS

TABLE VI. COMPARISON AMONG OUR AMG_CG AND DIRECT SOLVERS TAUCS [32] AND CHOLMOD [33] ON DC ANALYSIS

One can see that our AMG_CG method runs significantly faster than CG, ICCG, and even the hybrid solver [34]. For the largest case with 1.6-M nodes, our AMG_CG method is about 2.7× faster than the hybrid solver [34]. Moreover, we can expect greater speedup in transient analysis for larger cases. Columns 9–11 compare the preconditioning runtime. For all cases, our method takes the least time to finish the preconditioning. The last four columns compare the accuracies of AMG_CG, CG, ICCG, and the hybrid solver [34]. Our AMG_CG method has a better residual than the other three methods. Compared with the hybrid solver [34], the norm of residual by AMG_CG is about 100× smaller.

In Table VI, we compared our AMG_CG with TAUCS [32] and a state-of-the-art direct solver CHOLMOD [33]. Generally, AMG_CG requires a direct solve on the coarsest grid at its preconditioning stage. However, the advantage of AMG_CG is solver independent. To demonstrate the independence, we integrate the CHOLMOD [33] solver into the original AMG_CG, which is named AMG_CG_C. The original AMG_CG solver using TAUCS [32] is named AMG_CG_T. It can be observed from columns two to three of Table VI that our AMG_CG_T runs faster than TAUCS [32], which cannot complete for cases C5 and C6 due to the memory problem. Although AMG_CG_C is slower than CHOLMOD [33] for the first four small cases, the gap becomes smaller when the size of the power grid increases. It is a little faster than CHOLMOD [33] for the two largest cases. We may expect to see more speed advantages of AMG_CG when the size of the network further increases. Thus, we can see that the advantage of AMG_CG is mainly embodied in solving large power/ground networks. The last four columns of Table VI compare the accuracies. Our AMG_CG solvers can always achieve the required accuracy.

3) Incremental Update of the Interpolation Operator for the Analysis of the Modified Grid: We conducted the experiments for the incremental update of the interpolation operator proposed in Section V-C. The AMG-based solver (AMG_M) analyzes the modified grids C1–C4 of Table III, with wirebond technology instead of flip-chip. The last two cases C4_D and C4_A are used to demonstrate the efficiency of the proposed methodology whenever some wire segments are removed or inserted. The analysis was performed for 30 time steps (step size = 50 ps) for all cases.

Column 2 (Pervasiveness) in Table VII lists the pervasiveness of the modification, which is the ratio of the number of the nodes in the modified regions to that in the original grid. The pervasiveness ranges from 1.1% to 9.6% for cases C1–C4. In case C4_D, 100 randomly selected nodes and corresponding wire segments are removed from grid C4. In case C4_A, 50 wire segments are randomly inserted. Columns three to five in Table VII list the CPU times for critical stages in the whole transient analysis. It can be seen that the time for interpolation-operator update (Stage I) is almost negligible. Columns six to seven demonstrate the CPU times of our AMG_A method in Section V-A for a full grid analysis and our AMG_M method. One can see that the runtime of the AMG_M method is usually about half that of the full analysis. We recorded the maximal absolute error at each node during the transient analysis and named that as the vector e. The quality of the solution by the AMG_M method is shown in columns eight to ten, including the maximal value, the average value, and the standard deviation of e. The last column lists the maximal ratio of the error to the maximal voltage drop. Generally, the ratio is less than 4%. The data show that using the proposed AMG_M method can reach a good accuracy with less runtime compared with the full grid analysis.

TABLE VII. RESULTS OF THE INTERPOLATION-OPERATOR UPDATE AND THE COMPARISON BETWEEN AMG_M AND AMG_A ON TRANSIENT ANALYSIS

TABLE VIII. TESTCASE INFORMATION

C. AMG-Based Methods for DA

The experiments of the proposed DA method in Section VI are performed on five small testcases and five large testcases. The information of the testcases, including the number of nodes, the threshold of voltage drop, the maximal violation time intervals, and the percentage of violation nodes, is shown in Table VIII. The threshold of voltage drop is set artificially such that about 20%–35% nodes have violations for each testcase.

1) Comparison With Previous DA Works: In order to demonstrate the efficiency of our AMG-based method (abbreviation: AMG_O), we implemented and compared it with the following previous works.

1) CG_O: solving the DA problem using the standard CG method as in [21]. It generally provides very good results but is relatively slow.

2) iCG_O: the improved CG method proposed in [23]. By replacing the standard line search in CG_O with greedy search, it is significantly faster than CG_O. However, it tends to overbudget the decap area.

3) Theta: the charge-based method introduced in [18]. It runs very fast but may result in large overestimation on the decap area.

The results of comparisons are shown in Table IX. The data show that our AMG_O method runs much faster than CG_O, which cannot even complete for the two largest cases. Our AMG_O method is also faster than iCG_O on large cases. It is slower than Theta, but the gap becomes smaller when the size of the power grid is increased. In fact, its runtime is less than that of Theta on the largest case. This is because the inaccurate decap estimation of the Theta method incurred multiple iterations of transient analysis and corrections. One can also observe that the speedup of our AMG_O method versus CG_O and iCG_O increases for large cases. This evidence indicates that our AMG_O method has a better scalability than the other methods. This is an appealing feature when we face increasingly large circuit designs.

The solution quality is evaluated in terms of total decap area and voltage slack, which is the threshold of voltage drop minus the worst voltage drop after optimization. It can be seen that our AMG_O method almost always provides the minimum decap area except case D6. In contrast, Theta results in unnecessarily large decap area that is 30%–77% more than our AMG_O method. With the similar or smaller decap, the voltage slack after being optimized by our proposed AMG_O is usually larger than that of CG_O. It indicates that decaps are more reasonably allocated by our AMG_O method.

2) Effectiveness of AMG and Proposed Speedup for SQP: The comparisons in this section are made to investigate the effectiveness of using the AMG and the SQP speedup techniques proposed in Section VI-C. We compare the full version of our AMG_O method with the following variants in Table X:

1) SQP: solving the DA through the SQP package [31] without using the AMG and the speedup techniques introduced in Section VI-C;

2) Our SQP: solving the DA through the SQP package [31] with the speedup techniques introduced in Section VI-C but without using the AMG;

3) SQP_P: solving the DA through the SQP package [31] with the violation-aware decap preallocation technique introduced in Section VI-C but without using the AMG and other techniques.

Since the SQP package [31] is not able to handle large cases directly, the comparisons are made for only the small cases. The SQP [31] yields the minimum decap area but has minor voltage violations. This is because the original SQP [31] terminates whenever the nonlinear constraint is smaller than a predefined tolerance.

The comparison of the AMG_O results with that of our SQP shows an excellent speed/quality compromise of the proposed AMG techniques. The comparison of the original SQP result with that of our SQP and SQP_P implies that the speedup techniques of Section VI-C are very effective in improving the computation speed with a very minor overestimation. Overall, the overestimation on the decap area from our AMG_O method is no more than 6%. With the increase of the power grid size, we expect to see even larger speedup with our AMG_O method.

3) Other Details of Our AMG-Based Method: Table XI shows other details of our AMG_O method, including the number of iterations for the grid reduction, the number of nodes after reduction, the compensation constant described in Section VI-B, the number of iterations for the SQP, and the amount of total decap allocated. It can be seen that the coarse grids usually contain only hundreds of nodes.

4) Charge-Based Versus LP-Based Back-Mappings: In Table XII, we compare our charge-based back-mapping and the LP-based back-mapping [24] on the four large cases. Since both mappings give the same total decap area, we compare only the CPU time and the voltage slack here. It is evident that our charge-based back-mapping provides a better voltage slack at faster speed.

TABLE IX. COMPARISON AMONG CG_O, iCG_O, THETA, AND OUR AMG_O METHOD ON DA

TABLE X. COMPARISON AMONG SQP, OUR SQP, SQP_P, AND OUR AMG_O METHOD ON DA

TABLE XI. DETAILED RESULTS OF OUR AMG_O METHOD

TABLE XII. COMPARISON BETWEEN THE CHARGE-BASED AND THE LP-BASED BACK-MAPPINGS

VIII. CONCLUSION

In this paper, we have presented an improved AMG-based power-grid-reduction scheme with a dynamic-threshold technique. Based on that, several AMG-based methods were developed for power grid analysis. Moreover, the reduction scheme was combined with some efficient techniques, like error compensation and charge-based back-mapping, for fast DA. Compared with several recently reported industrial methods, our methods can reach better accuracy-runtime tradeoffs and lead to higher quality solutions.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their helpful comments during the entire review process.

REFERENCES

[1] Q. K. Zhu, Power Distribution Network Design for VLSI. Hoboken, NJ: Wiley, 2004.

[2] H. H. Chen and D. D. Ling, “Power supply noise analysis methodology for deep-submicron VLSI chip design,” in Proc. DAC, 1997, pp. 638–642.

[3] A. Dharchoudhury, R. Panda, D. Blaauw, and R. Vaidyanathan, “Design and analysis of power distribution networks in powerPC microprocessors,” in Proc. DAC, 1998, pp. 738–743.


[4] S. Bobba, T. Thorp, K. Aingaran, and D. Liu, “IC power distributionchallenges,” in Proc. ICCAD, 2001, pp. 643–650.

[5] S. S. Sapatnekar and H. Su, “Analysis and optimization of power grids,”IEEE Des. Test Comput., vol. 20, no. 3, pp. 7–15, May/Jun. 2003.

[6] R. Panda, D. Blaauw, R. Chaudhry, V. Zolotov, B. Young, and R. Ramaraju, “Model and analysis for combined package and on-chip power grid simulation,” in Proc. ISLPED, 2000, pp. 179–184.

[7] H. Su, K. Gala, and S. S. Sapatnekar, “Fast analysis and optimization of power/ground networks,” in Proc. ICCAD, 2000, pp. 477–480.

[8] T. H. Chen and C. C.-P. Chen, “Efficient large-scale power grid analysis based on preconditioned Krylov-subspace iterative methods,” in Proc. DAC, 2001, pp. 559–562.

[9] M. Zhao, R. Panda, S. S. Sapatnekar, and D. Blaauw, “Hierarchical analysis of power distribution networks,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 2, pp. 159–168, Feb. 2002.

[10] H. Qian, S. R. Nassif, and S. S. Sapatnekar, “Power grid analysis using random walks,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 8, pp. 1204–1224, Aug. 2005.

[11] W. Guo, S. X.-D. Tan, Z. Luo, and X. Hong, “Partial random walk for large linear network analysis,” in Proc. ISCAS, 2004, pp. 173–176.

[12] Y. Lee and C. C.-P. Chen, “Power grid transient simulation in linear time based on transmission-line-modeling alternating-direction-implicit method,” in Proc. ICCAD, 2001, pp. 75–80.

[13] Y. Lee and C. C.-P. Chen, “The power grid transient simulation in linear time based on 3D alternating-direction-implicit method,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 22, no. 11, pp. 1545–1550, Nov. 2003.

[14] J. N. Kozhaya, S. R. Nassif, and F. N. Najm, “A multigrid-like technique for power grid analysis,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 10, pp. 1148–1160, Oct. 2002.

[15] H. Su, E. Acar, and S. R. Nassif, “Power grid reduction based on algebraic multigrid principles,” in Proc. DAC, 2003, pp. 109–112.

[16] Z. Zhu, B. Yao, and C. K. Cheng, “Power network analysis using an adaptive algebraic multigrid,” in Proc. DAC, 2003, pp. 105–108.

[17] C. Zhuo, J. Hu, and K. Chen, “An improved AMG-based method for fast power grid analysis,” in Proc. ISQED, 2006, pp. 290–295.

[18] C. K. S. Zhao, C.-K. Koh, and K. Roy, “Decoupling capacitance allocation and its application to power-supply noise-aware floorplanning,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 1, pp. 81–92, Jan. 2002.

[19] H. Su, S. S. Sapatnekar, and S. R. Nassif, “Optimal decoupling capacitor sizing and placement for standard cell layout designs,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 22, no. 4, pp. 428–436, Apr. 2003.

[20] A. Kahng, B. Liu, and S. X.-D. Tan, “Efficient decoupling capacitor planning via convex programming methods,” in Proc. ISPD, 2006, pp. 102–107.

[21] J. Fu, Z. Luo, X. Hong, Y. Cai, S. X.-D. Tan, and Z. Pan, “A fast decoupling capacitor budgeting algorithm for robust on-chip power delivery,” in Proc. ASP-DAC, 2004, pp. 505–510.

[22] Z. Qi, H. Li, S. X.-D. Tan, L. Wu, Y. Cai, and X. Hong, “Fast decap allocation algorithm for robust on-chip power delivery,” in Proc. ISQED, 2005, pp. 542–547.

[23] H. Li, Z. Qi, S. X.-D. Tan, L. Wu, Y. Cai, and X. Hong, “Partitioning-based approach to fast on-chip decap budgeting and minimization,” in Proc. DAC, 2005, pp. 170–175.

[24] K. Wang and M. Marek-Sadowska, “On-chip power supply network optimization using multigrid-based technique,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 3, pp. 407–417, Mar. 2005.

[25] W. Hackbusch, Multigrid Methods. Beijing, China: Science Press, 1988.

[26] C. Zhuo, J. Hu, M. Zhao, and K. Chen, “Fast decap allocation based on algebraic multigrid,” in Proc. ICCAD, 2006, pp. 107–111.

[27] K. Stuben, “Algebraic multigrid (AMG): An introduction with applications,” in Multigrid Methods. New York: Academic, 2000. Guest appendix.

[28] H. H. Chen and J. S. Neely, “Interconnect and circuit modeling techniques for full-chip power noise analysis,” IEEE Trans. Compon., Packag., Manuf. B, vol. 21, no. 3, pp. 209–215, Aug. 1998.

[29] S. Pant and E. Chiprout, “Power grid physics and implications for CAD,” in Proc. DAC, 2006, pp. 199–204.

[30] A. Devgan, H. Ji, and W. Dai, “How to efficiently capture on-chip inductance effects: Introducing a new circuit element K,” in Proc. ICCAD, 2000, pp. 150–155.

[31] RFSQP. [Online]. Available: http://aemdesign.com/downloadrfsqp.htm

[32] TAUCS. [Online]. Available: http://www.tau.ac.il/~stoledo/taucs/

[33] CHOLMOD. [Online]. Available: http://www.cise.ufl.edu/research/sparse/cholmod/

[34] H. Qian and S. S. Sapatnekar, “A hybrid linear equation solver and its application in quadratic placement,” in Proc. ICCAD, 2005, pp. 905–909.

[35] Hybrid Solver. [Online]. Available: http://www.ece.umn.edu/users/qianhf/hybridsolver/

Cheng Zhuo (S’06) received the B.S. and M.S. degrees in information science and electronic engineering from Zhejiang University, Hangzhou, China, in 2005 and 2007, respectively. He is currently working toward the Ph.D. degree in the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor.

His research interests include power-grid design, robust circuit optimization, and timing analysis.

Jiang Hu (M’01–SM’07) received the B.S. degree in optical engineering from Zhejiang University, Hangzhou, China, in 1990, the M.S. degree in physics from the University of Minnesota, Duluth, in 1997, and the Ph.D. degree in electrical engineering from the University of Minnesota, Minneapolis, in 2001.

He was with IBM Electronics Design Automation from January 2001 to June 2002. He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, Texas A&M University, College Station. His research interest is computer-aided design (CAD) for very large scale integration circuits, particularly on interconnect optimization, clock network synthesis, variation tolerance technology, and design for manufacturability.

Dr. Hu has served as a technical program committee member for the Design Automation Conference, the International Conference on CAD, the International Symposium on Physical Design, the International Symposium on Quality Electronic Design, the International Conference on Computer Design, the Design Automation and Test in Europe, and the International Symposium on Circuits and Systems. He is currently an Associate Editor of the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS. He was the recipient of the Best Paper Award at the ACM/IEEE Design Automation Conference in 2001 and the IBM First Plateau Invention Award in 2003.

Min Zhao received the B.S. degree in computers and applications and the M.S. degree in electrical transmission and automation from Dalian Maritime University, Dalian, China, in 1993 and 1996, respectively, and the Ph.D. degree in electrical engineering from the University of Minnesota, Minneapolis, in 1999.

From 1999 to 2007, she was with the Advanced Tool Group, Freescale Semiconductor, Inc., Austin, TX. She is currently with Magma Design Automation, Inc., Austin, TX. Her research interests include logic synthesis and technology mapping, power grid analysis and optimization, on-chip inductance, and circuit simulation.

Kangsheng Chen received the B.S. degree from Zhejiang University, Hangzhou, China, in 1962.

Since 1985, he has been a Full Professor with the Department of Information Science and Electronic Engineering, Zhejiang University. He has published more than 200 papers. His current research interests include on-chip signal integrity analysis and optimization, radio-frequency circuit design, and wireless communication system modeling.