IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 25, NO. 9, SEPTEMBER 2006 1763

IC Thermal Simulation and Modeling via Efficient Multigrid-Based Approaches

Peng Li, Member, IEEE, Lawrence T. Pileggi, Fellow, IEEE, Mehdi Asheghi, Member, IEEE, and Rajit Chandra, Senior Member, IEEE

Abstract: The ever-increasing power consumption and packaging density of integrated systems create on-chip temperatures and gradients that can have a substantial impact on performance and reliability. While it is conceptually understood that a thermal equivalent circuit can be constructed to characterize the temperature gradients across the chip, direct and iterative solutions of the corresponding three-dimensional (3-D) equations are often intractable for a full-chip analysis. Integrated circuit (IC)-specific multigrid (MG) techniques for fast chip-level thermal steady-state and transient simulation are proposed. This approach avoids an explicit construction of the matrix problem that is intractable for most full-chip problems. Specific MG treatments are proposed to cope with the strong anisotropy of the full-chip thermal problem that is created by the vast difference in material thermal properties and chip geometries. Importantly, this paper demonstrates that only with careful thermal modeling assumptions and appropriate choices for grid hierarchy, MG operators, and smoothing steps across grid points can a full-chip thermal problem be accurately and efficiently analyzed. This paper further speeds up the large thermal transient simulations by incorporating reduced-order thermal models that can be efficiently extracted under the same MG framework. The experiments carried out in this work have shown that the proposed methodology provides sufficient efficiency in both runtime and memory usage.

Index Terms: Integrated circuit thermal factor, integrated circuits, simulation, temperature control.

I. INTRODUCTION

Driven by the aggressive scaling of modern integrated circuit (IC) technologies, functionality and speed improvements of IC designs are achieved by increasing both the packing density and clock frequency. As a result, IC power density has been rapidly increasing with Moore's law, and is now projected as a potential showstopper for future performance improvements [1]. Not only are we challenged to distribute an increased power density, but we must also consider the equally daunting problem of removing the corresponding heat that is dissipated.

Manuscript received January 25, 2005; revised May 22, 2005. This paper was recommended by Associate Editor F. N. Najm.

P. Li is with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843 USA (e-mail: [email protected]).

L. T. Pileggi is with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: [email protected]).

M. Asheghi is with the Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: masheghi@andrew.cmu.edu).

R. Chandra is with Gradient Design Automation Inc., Santa Clara, CA 95054 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TCAD.2005.858276

Elevated chip temperature can create many design challenges and reliability issues. First, high temperature and hot spots degrade the reliability of interconnects and transistors [2], [3]. Furthermore, on-chip thermal gradients can cause functional or timing failures through electrothermal coupling [2], [4]. Additionally, active power consumption and leakage are strong functions of the on-chip temperature profile, thereby making the prediction and minimization of power consumption inseparable from temperature analysis and control [5]. Moreover, the deployment of low-k dielectric materials to reduce capacitance, and hence power dissipation, worsens thermal transport due to the decrease in material thermal conductivities.

Recently, IC thermal effects have received a great deal of attention from the circuit design community, which has spawned several works. In [2], [4], and [6], the authors simulate the full-chip temperature profile by discretizing the partial differential equation (PDE) of heat conduction using finite difference and finite element methods. The resulting discretized problem is then solved by adopting an equivalent circuit approach and a corresponding direct solution method. In [7], the full-chip thermal transients are solved in a similar manner using an alternating direction implicit (ADI) method for efficiency. The self-heating of multilevel IC interconnects and the corresponding impact on reliability and performance have been investigated in [3] and [8]–[10]. Of particular interest recently is the estimation of leakage power, which is exponentially dependent on the device temperatures. In [11], leakage analysis was conducted for large industrial designs while considering power supply and temperature variations. The awareness of thermal effects has also spawned new research in design optimization. In [12] and [13], the thermal gradient was included as an optimization objective in cell-level placement. At the microarchitecture level, runtime dynamic thermal management has been proposed as a means to regulate microprocessor operating temperature [14].

Clearly, an efficient full-chip thermal analysis methodology is becoming increasingly important for the design and optimization of modern very large scale integrated (VLSI) systems. This includes the packaging design, since the cost of the IC package and associated cooling [15] can dominate the IC product cost. However, considering the problem size as well as the complex on-chip three-dimensional (3-D) multilayer structures, 3-D full-chip thermal analysis is a daunting problem. The aforementioned discretized heat PDEs are often solved using direct methods, or via SPICE simulation by treating the heat equation as an equivalent resistance–capacitance (RC) circuit. Although useful for small thermal problems, such an approach does not scale well with the complexity of the full-chip analysis.

0278-0070/$20.00 © 2006 IEEE

The multigrid (MG) method, as a multilevel iterative scheme, has become increasingly popular in the computational fluid dynamics (CFD) community for solving a class of PDE problems due to its superior efficiency [16], [17]. However, many CFD-based solvers were designed for solving generic PDEs and cannot be applied directly to 3-D full-chip thermal analysis. General finite element analysis tools (such as ANSYS) are impractical for full-chip problems in terms of setup and runtime. Additionally, black-box algebraic multigrid (AMG) solvers are designed for wide applicability to general problems. When applied to specific chip thermal analysis, they usually do not provide an optimal solution. For instance, AMG solvers require a relatively expensive setup phase before the actual matrix solve takes place, which makes them especially inefficient for thermal transient analysis and coupled electrothermal simulation. In these cases, the system matrix must be updated frequently when variable time steps are used and/or when the temperature dependency of thermal conductivities is considered.

In this paper, we focus on developing efficient numerical methods for solving large IC thermal problems. Although electrothermal coupling is outside the scope of this paper, our thermal analysis methodologies can be readily combined with temperature-dependent electrical modeling to facilitate a coupled simulation, e.g., considering temperature-dependent leakage power consumption. Our proposed full-chip thermal analysis methodologies are based on a geometric multigrid (GMD) method [18]. In our approach, large linear problems produced from the finite difference approximations of the heat PDEs are efficiently solved using our GMD method. We characterize our methodology as geometric, since an efficient MG solution is constructed with a knowledge of 3-D chip structures and their thermal properties. Our simulator was designed to handle large thermal problems with efficiency in both runtime and memory usage. As will be shown in the paper, a successful adaptation of the generic MG concept to chip thermal analysis requires IC-specific treatments. In this paper, we show that the use of a proper grid hierarchy, interpolation and restriction operators, and a robust smoothing step is crucial to achieving the required efficiency and robustness. In particular, to handle large problems with low memory usage, we adopt a simple h → 2h type grid coarsening operator to avoid the explicit formation of the problem matrices throughout the MG hierarchy. The application of the MG method to full-chip thermal analysis is complicated by the significant problem anisotropy created by vastly different material thermal properties and the asymmetry in chip dimensions. In our approach, the robustness of the MG solution is achieved by designing proper interpolation and restriction operators and robust smoothers.

In addition to the techniques for thermal steady-state simulation presented in our prior work [18], in this paper, we further improve the efficiency of our MG solver by adopting a restriction operator that is completely consistent with the employed coarse-grid operator. Furthermore, we extend our MG methodology to thermal transient analysis. In cases where the chip power consumption is modeled at the granularity of chip-level circuit blocks, we show that low-order single-input multioutput (SIMO)-based reduced-order thermal models can be adopted to significantly speed up the transient analysis. Moreover, these reduced-order models are extracted efficiently under the same MG framework. Experimental results have demonstrated the efficiency of our MG thermal simulator for large-scale full-chip thermal problems.

The rest of the paper is organized as follows. We first provide some background on the heat conduction PDE and MG methods in Section II. Related thermal modeling issues are then described in Section III. In Section IV, the proposed MG methodology for full-chip thermal steady-state analysis is described. These techniques are extended for thermal transient analysis in Section V, where reduced-order thermal modeling is also presented. Experimental results are shown in Section VI, followed by concluding remarks in Section VII.

II. BACKGROUND

A. Physical Models

The heat diffusion in an IC is governed by the following PDE [19]:

$$\rho c_p \frac{\partial T(r,t)}{\partial t} = \nabla \cdot \bigl(k(r,t)\,\nabla T(r,t)\bigr) + g(r,t) \tag{1}$$

subject to the boundary condition

$$k(r,T)\,\frac{\partial T(r,t)}{\partial n_i} + h_i\,T(r,t) = f_i(r,t) \tag{2}$$

where T is the temperature (°C), r denotes the location in the 3-D space, ρ is the material density (kg/m³), $c_p$ is the specific heat [J/(kg·°C)], k is the thermal conductivity of the material [W/(m·°C)], g is the power density of the heat sources (W/m³), and $n_i$, $h_i$, and $f_i$ are the outward direction normal to the boundary surface i, the heat transfer coefficient [W/(m²·°C)], and an arbitrary function at the surface i, respectively. For numerical solution, (1) can be discretized in both time and space by applying the backward Euler formula and the finite difference discretization to the left and right sides of the equation, respectively. For an interior discretization point in a homogeneous material, the discretized equation is written as

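$$
\begin{aligned}
\rho c_p \Delta x \Delta y \Delta z\,\frac{T^{n+1}_{i,j,k} - T^{n}_{i,j,k}}{\Delta t}
={}& -2(G_x + G_y + G_z)\,T^{n+1}_{i,j,k} + G_x T^{n+1}_{i-1,j,k} + G_x T^{n+1}_{i+1,j,k} \\
& + G_y T^{n+1}_{i,j-1,k} + G_y T^{n+1}_{i,j+1,k} + G_z T^{n+1}_{i,j,k-1} + G_z T^{n+1}_{i,j,k+1} \\
& + \Delta x \Delta y \Delta z\; g_{i,j,k}
\end{aligned}
\tag{3}
$$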

where {i, j, k} indicates the location of the point in space, and ∆t, ∆x, ∆y, and ∆z are the discretization steps in time and along the x-, y-, and z-directions in space, respectively. $G_x$, $G_y$, and $G_z$ are defined as

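$$G_x = \frac{k\,\Delta y\,\Delta z}{\Delta x}, \qquad G_y = \frac{k\,\Delta x\,\Delta z}{\Delta y}, \qquad G_z = \frac{k\,\Delta x\,\Delta y}{\Delta z}. \tag{4}$$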
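As a minimal sketch of how (3) and (4) can be evaluated without assembling a matrix, the following NumPy code computes the three conductances and applies the right-hand side of (3) at all interior points of a uniform grid. It assumes a single homogeneous material and a uniform step in each direction; the function names and the array layout `T[i, j, k]` are illustrative choices, not taken from the paper.

```python
import numpy as np

def conductances(k, dx, dy, dz):
    """Return (Gx, Gy, Gz) as defined in (4) for a homogeneous cell."""
    return k * dy * dz / dx, k * dx * dz / dy, k * dx * dy / dz

def stencil_rhs(T, g, k, dx, dy, dz):
    """Right-hand side of (3) at every interior point of the 3-D array T[i, j, k].

    Assumes one homogeneous material (scalar k) and uniform steps;
    boundary and interface points need the modified stencils noted in the text.
    """
    Gx, Gy, Gz = conductances(k, dx, dy, dz)
    center = T[1:-1, 1:-1, 1:-1]
    return (-2.0 * (Gx + Gy + Gz) * center
            + Gx * (T[:-2, 1:-1, 1:-1] + T[2:, 1:-1, 1:-1])
            + Gy * (T[1:-1, :-2, 1:-1] + T[1:-1, 2:, 1:-1])
            + Gz * (T[1:-1, 1:-1, :-2] + T[1:-1, 1:-1, 2:])
            + dx * dy * dz * g[1:-1, 1:-1, 1:-1])
```

For a uniform temperature field with no heat sources, `stencil_rhs` returns zeros, which is a quick sanity check that the stencil weights sum to zero.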


Fig. 1. Generic MG cycles.

Proper modifications must be made in (3) and (4) when discretizing (1) at a material interface or boundary [2]. To compute the steady-state temperature, one drops the term on the left side of (3), leading to a linear matrix problem. Thermal transient simulation requires the consideration of every term in (3), which in fact improves the properties of the underlying linear matrix problem: the enhanced diagonal dominance in this case is more favorable to the application of iterative solution methods.
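The diagonal-dominance remark can be made explicit by collecting the unknowns of (3) on the left. The shorthand $C = \rho c_p \Delta x \Delta y \Delta z$ is introduced here only for this illustration and does not appear in the original text.

```latex
\left(\frac{C}{\Delta t} + 2(G_x + G_y + G_z)\right) T^{n+1}_{i,j,k}
 - G_x\left(T^{n+1}_{i-1,j,k} + T^{n+1}_{i+1,j,k}\right)
 - G_y\left(T^{n+1}_{i,j-1,k} + T^{n+1}_{i,j+1,k}\right)
 - G_z\left(T^{n+1}_{i,j,k-1} + T^{n+1}_{i,j,k+1}\right)
 = \frac{C}{\Delta t}\,T^{n}_{i,j,k} + \Delta x \Delta y \Delta z\, g_{i,j,k}
```

In the steady-state case the diagonal entry $2(G_x + G_y + G_z)$ exactly equals the sum of the off-diagonal magnitudes at an interior point. In the transient case it exceeds that sum by $C/\Delta t > 0$, so a smaller time step gives a more strongly diagonally dominant system.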

B. MG Cycles

To solve for the temperature distribution, the matrix problem described by (3) needs to be solved. For large problem sizes, direct solution methods are often intractable. Single-level iterative methods can also become impractical due to their slow rate of convergence. Since the matrix corresponding to (3) can be formulated as symmetric positive definite (SPD), MG iterative methods are applicable and become a good choice.

When properly designed, MG methods can be highly efficient for many PDE problems. They are optimal in the sense that the time complexity is linear in the number of unknowns (h-independent convergence) with a small constant factor [16], [17]. In contrast, classical iterative methods such as Gauss–Seidel (GS) are efficient at removing (spatially) high-frequency solution errors but fundamentally inefficient at removing low-frequency ones. MG overcomes this limitation by constructing a multilevel scheme. After the original PDE is discretized at a fine grid, a few relaxation or smoothing steps (such as pointwise GS) are applied to quickly remove the high-frequency errors in the solution. Once the convergence deteriorates due to the existence of low-frequency error components, an approximate problem is constructed at a coarser grid. The residue equation at the fine grid is then mapped to the coarser grid, and the fast removal of low-frequency errors is achieved at the coarser grid, namely through coarse-grid correction. Upon the completion of coarse-grid correction, the correction of the solution is mapped back from the coarse grid to the fine grid, and a few more postsmoothing steps are applied at the fine grid using the corrected solution as the initial guess. The rationale behind this multilevel scheme is that low-frequency errors appear more oscillatory at the coarser grid; therefore, they can be damped effectively by the smoothing steps at the coarser level.
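The frequency-dependent behavior of GS described above can be observed on a toy problem. The sketch below is an illustration only: it uses a 1-D Laplace error equation with zero boundary values rather than the 3-D chip problem, and the grid size, mode numbers, and sweep count are arbitrary choices. It measures how much ten GS sweeps shrink a smooth sine error mode versus an oscillatory one.

```python
import numpy as np

def gauss_seidel_sweep(e):
    """One lexicographic GS sweep for the 1-D error equation (zero boundaries)."""
    n = len(e)
    for i in range(n):
        left = e[i - 1] if i > 0 else 0.0
        right = e[i + 1] if i < n - 1 else 0.0
        e[i] = 0.5 * (left + right)   # uses the already-updated left neighbor
    return e

def damping(mode, n=63, sweeps=10):
    """Ratio of error norms after and before `sweeps` GS sweeps for one sine mode."""
    x = np.arange(1, n + 1) / (n + 1)
    e = np.sin(mode * np.pi * x)
    e0 = np.linalg.norm(e)
    for _ in range(sweeps):
        gauss_seidel_sweep(e)
    return np.linalg.norm(e) / e0

low = damping(1)     # smooth (low-frequency) error mode
high = damping(48)   # oscillatory (high-frequency) error mode
```

On this grid the oscillatory mode is reduced far more than the smooth one, which is the imbalance that coarse-grid correction is meant to remove.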

The classical MG cycle is sketched in Fig. 1. As shown in the figure, three components need to be defined for an MG solver, namely: 1) the coarser grid operator ($A^{k+1}$ with respect to $A^{k}$); 2) the interpolation operator ($P^{k}_{k+1}$); and 3) the restriction operator ($R^{k+1}_{k}$). The coarser grid operator defines an approximate (coarsened) linear problem at each level. Interpolation and restriction operators facilitate the mapping between fine grids and coarse grids. The standard MG V-cycles correspond to m = 1 and W-cycles to m = 2. More details on MG methods and definitions can be found in [16], [17], [20], and [21].
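The recursive structure of such a cycle can be sketched on a 1-D model problem. The code below is a hedged illustration rather than the solver developed in this paper: it solves −u″ = f on a uniform grid with GS smoothing, full-weighting restriction, linear interpolation, and a coarse-grid operator obtained by rediscretizing with step 2h. The parameter `m` selects the cycle type (1 for a V-cycle, 2 for a W-cycle); all function names are assumptions made for the example.

```python
import numpy as np

def residual(u, f, h):
    """r = f - A u for the 1-D operator A = (-1, 2, -1) / h^2 (interior points only)."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def smooth(u, f, h, sweeps):
    """Pointwise Gauss-Seidel sweeps, updating u in place."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

def restrict(r):
    """Full-weighting restriction from a grid with N+1 points to N/2+1 points."""
    rc = np.zeros((len(r) - 1) // 2 + 1)
    rc[1:-1] = 0.25 * r[1:-3:2] + 0.5 * r[2:-2:2] + 0.25 * r[3:-1:2]
    return rc

def prolong(ec):
    """Linear interpolation from the coarse grid to the next finer grid."""
    e = np.zeros(2 * (len(ec) - 1) + 1)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return e

def mg_cycle(u, f, h, m=1, nu=2):
    """One MG cycle; m = 1 gives a V-cycle, m = 2 a W-cycle. Updates u in place."""
    if len(u) == 3:                       # coarsest grid: one unknown, solve exactly
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    smooth(u, f, h, nu)                   # presmoothing
    rc = restrict(residual(u, f, h))      # map the residue to the coarser grid
    ec = np.zeros_like(rc)
    for _ in range(m):                    # recursive coarse-grid correction
        mg_cycle(ec, rc, 2.0 * h, m, nu)
    u += prolong(ec)                      # interpolate and apply the correction
    smooth(u, f, h, nu)                   # postsmoothing
    return u
```

Calling `mg_cycle` repeatedly on a grid with N = 64 intervals (N must be a power of two here) and f = π² sin(πx) drives the residue down by several orders of magnitude within a handful of cycles.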

C. GMD Versus AMG

There are two varieties of MG methods, namely: 1) GMD and 2) AMG. GMD employs a fixed grid hierarchy and ensures an effective interplay between smoothing and coarse-grid correction by choosing proper smoothers. AMG attempts to use fixed simple smoothers while enforcing convergence by adopting more complex coarsening schemes. In many ways, AMG can be viewed as a black-box tool where a coarse-grid operator and an interpolation operator are derived from the underlying matrix without a notion of grid hierarchy. When a good interplay between smoothing and coarse-grid correction is achieved, GMD can be highly efficient for specific problems. In contrast, AMG operates on the matrix directly. Thus, the problem matrix must be assembled explicitly. Before the matrix solve begins, AMG also needs to perform a setup phase to construct all of the coarse levels and assemble coarse-grid operators. Therefore, AMG is usually not considered "optimal" for problems for which efficient GMD approaches are available. In thermal transient analysis and electrothermal simulation, the problem matrix needs to be updated frequently between iterations. To consider the temperature dependency of the problem, an AMG solver would need to repeatedly perform the setup phase, thereby making it highly inefficient. The advantages of AMG solvers lie in their ease of use and wide applicability.

Fig. 2. Thermal modeling of a packaged chip. (a) Chip in a C4/CBGA package with heat sink. (b) 3-D thermal modeling of the die.

III. THERMAL MODELING STRATEGY

A. Modeling of the Chip

A diagram of a chip in a C4/CBGA-like package is shown in Fig. 2(a). We take a standard approach to model the heat diffusion in a packaged chip [4], [7], [22]. In this paper, heat is assumed to be generated only by active devices, although we recognize that the self-heating of metal lines can be estimated based on metal current densities using an approach such as that in [3]. Two major heat conduction paths are modeled, one through the heat sink to the ambient and the other through the package to the board. Therefore, the heat generated on the die can dissipate into the surrounding environment through the top material layer of the die and the bottom surface of the silicon substrate. These two heat conduction paths are modeled as convective. For simplicity, a reflective (insulated) boundary condition is assumed for the chip sides due to the small sidewall area. Thermal properties of various material layers are modeled by the corresponding thermal conductivities. To maintain high accuracy for the on-chip temperature distribution, we apply 3-D modeling to the die while modeling external components using one-dimensional (1-D) models, as illustrated in Fig. 2(b).

B. Modeling of Inhomogeneous Material Layers

One difficulty in chip thermal modeling is that the metal, dielectric, and other layers are not uniform due to the complex layout patterns. A feasible full-chip thermal analysis requires proper thermal modeling of these inhomogeneous material layers. A simple analysis offers useful insight into the scale of the thermal problem for a large design. Commercial microprocessors typically have a chip size in the range of a few centimeters on each side. For modern complementary metal–oxide–semiconductor (CMOS) technologies, die thickness is usually between 200 and 500 µm. In contrast, detailed layout features such as metal line and via patterns are at a submicrometer scale. If we assume a chip dimension of 2 cm × 2 cm × 500 µm, discretizing the PDE for heat conduction with a grid size of 10 µm in all directions already leads to an analysis problem with 200 million unknowns (2000 × 2000 × 50 grid cells). Obviously, the explosion of the problem size makes it even more impractical to consider submicrometer-scale layout patterns using a finer discretization.
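The unknown count quoted above follows from simple arithmetic, reproduced in the short sketch below. The helper name is hypothetical, and the memory figure assumes one 8-byte double-precision value per unknown, which is an assumption of this example rather than a statement from the paper.

```python
def unknown_count(size_x, size_y, size_z, step):
    """Number of grid cells for a box discretized with a uniform step.

    All lengths must use the same unit (micrometers in the example below).
    """
    return round(size_x / step) * round(size_y / step) * round(size_z / step)

# 2 cm x 2 cm x 500 um die with a 10 um step, all lengths in micrometers
n_unknowns = unknown_count(20_000, 20_000, 500, 10)

# Storage for a single solution vector, assuming 8-byte doubles
vector_gigabytes = n_unknowns * 8 / 1e9
```

Halving the step to 5 µm multiplies the count by eight, which illustrates why resolving submicrometer features directly is out of reach.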

Clearly, it is imperative to employ compact thermal modeling for metal, interlayer dielectric (ILD), and other inhomogeneous layers to facilitate an efficient yet accurate 3-D full-chip analysis. Hence, thermal models must be built for composite material layers in order to capture the average thermal behavior of a small region (at least at the scale of a few tens of micrometers on each side) and its interaction with neighboring regions. Ideally, modeling can be accomplished via a precharacterized homogenization process, where an equivalent thermal conductivity for the region is computed based on layout features such as via location and density [3].

Fig. 3. Grid hierarchy. Coarsening takes place laterally in thin material layers and in all directions in the thick substrate.

The focus of this paper is to develop efficient simulation techniques for large IC thermal problems. To demonstrate the proposed simulation engine, we adopt a relatively simple modeling approach for the localized regions and leave more detailed modeling as an open research problem. We consider the nonuniformity of a composite material layer by computing localized averaged material thermal conductivities. A layer is laterally divided into uniform bins of a certain size. A simple equivalent thermal conductivity is computed for each bin by taking a weighted average of the thermal conductivities of the various materials in the bin. Once such a binning process is completed, a step size somewhat smaller than the bin dimension is chosen to discretize the PDE in 3-D.
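The binning step can be sketched as follows. The text does not specify the weights, so this example assumes that each material is weighted by its volume fraction within the bin; the function name and the conductivity values (400 W/(m·°C) for a metal and 1.4 W/(m·°C) for a dielectric) are illustrative only.

```python
import numpy as np

def bin_conductivity(fractions, conductivities):
    """Equivalent conductivity per bin as a fraction-weighted average.

    fractions:      array (ny, nx, n_materials); each bin's fractions sum to 1.
    conductivities: array (n_materials,), in W/(m.degC).
    Returns an (ny, nx) array of equivalent conductivities.
    The volume-fraction weighting is an assumption of this sketch.
    """
    fractions = np.asarray(fractions, dtype=float)
    return fractions @ np.asarray(conductivities, dtype=float)

# One layer split into 1 x 2 bins containing metal and dielectric
fractions = [[[0.3, 0.7], [0.0, 1.0]]]
k_bins = bin_conductivity(fractions, [400.0, 1.4])
```

A bin that is 30% metal comes out roughly two orders of magnitude more conductive than a pure-dielectric neighbor, which hints at the jumps in conductivity that the interpolation operators must later handle.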

IV. THREE-DIMENSIONAL FULL-CHIP THERMAL STEADY-STATE ANALYSIS USING MG

In this section, the proposed GMD-based thermal steady-state analysis technique is presented. In the following subsections, we show how the three components of our MG full-chip thermal solver are constructed in order to achieve efficiency in both runtime and memory usage for IC thermal problems.

A. Grid Hierarchy

For modern deep-submicron (DSM) technologies, typically all material layers except for the silicon substrate have a submicrometer thickness. Therefore, during a 3-D coarsening MG process, the information about thin material layers and their interfaces tends to be lost at coarse grids. To prevent this from happening, we do not apply coarsening in the z-direction except within the silicon substrate. We keep at least two discretization points in the z-direction for thin layers throughout the grid hierarchy. One additional advantage of this approach is that it also simplifies the construction of the coarse-grid operator, which is discussed as follows. Given the potentially large problem size of a 3-D full-chip thermal analysis, we combine the simple h → 2h direct discretization scheme with thermal conductivity weighting to define a coarse-grid operator that reduces the memory usage. As depicted in Fig. 3, at each coarse grid, the discretization step size is doubled in both the x- and y-directions for all layers, but is doubled in the z-direction only for the substrate. Moving down the grid hierarchy, an equivalent thermal conductivity is successively computed for each control volume as a weighted average of the thermal conductivities of the enclosed smaller volumes on the next finer grid. The coarser grid operator is obtained by a direct discretization of the PDE using the new grid size and the updated thermal conductivities. Notice that one of the advantages of using direct discretization to derive the coarser grid operator is that we only need to store the thermal conductivities for different regions of the chip, and do not need to construct a matrix representation of the corresponding discretized linear system. A sparse matrix representation would consume much more storage.
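One coarsening step of the conductivity data can be sketched as below. This is a simplified illustration under stated assumptions: the conductivities are stored per control volume in an array indexed `(z, y, x)`, the substrate occupies the first `substrate_layers` z-layers, all merged fine volumes have equal size (so the weighted average reduces to a plain mean), and the array dimensions are even where they are halved. None of these layout choices come from the paper.

```python
import numpy as np

def coarsen_conductivity(k, substrate_layers):
    """One h -> 2h step: lateral coarsening everywhere, z-coarsening only in the substrate.

    k: array (nz, ny, nx) of per-volume conductivities, ny and nx even.
    substrate_layers: number of leading z-layers in the substrate (even).
    Returns (coarse_k, coarse_substrate_layers).
    """
    nz, ny, nx = k.shape
    # Lateral coarsening: average each 2 x 2 block in the x-y plane
    lateral = k.reshape(nz, ny // 2, 2, nx // 2, 2).mean(axis=(2, 4))
    substrate = lateral[:substrate_layers]
    thin = lateral[substrate_layers:]      # thin layers keep their z-resolution
    # Vertical coarsening: merge pairs of z-layers inside the substrate only
    substrate = substrate.reshape(substrate_layers // 2, 2, ny // 2, nx // 2).mean(axis=1)
    return np.concatenate([substrate, thin], axis=0), substrate_layers // 2
```

Only these conductivity arrays need to be kept per level; the seven-point stencil coefficients can be recomputed from them on the fly, which is the storage saving the text refers to.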

Fig. 4. Trilinear interpolation in a uniform medium.

It should be noted that this strategy allows us to avoid forming coarse-grid operators explicitly in matrix form. In contrast, Galerkin-type coarse-grid operators are constructed by multiplying the fine-grid operator by the interpolation and restriction operators: $A^{k+1} = R^{k+1}_{k} A^{k} P^{k}_{k+1}$ (using the notation of Fig. 1). In this scheme, averaging of thermal conductivities is done in a more abstract fashion. The overhead of using Galerkin operators includes extra matrix multiplications and storage of coarse-grid operators. Additionally, Galerkin coarse-grid operators can be denser than the fine-grid operator, leading to more memory usage. The advantage of Galerkin coarse-grid operators is that they are usually considered robust. However, as will be shown in the following sections, when combined with our grid hierarchy and robust line smoother, our simple coarse-grid operator can also perform very robustly while allowing us to avoid explicit construction of the thermal problem matrix.
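The density increase of Galerkin operators can be seen on a small 2-D model problem. The sketch below is an illustration, not part of the proposed method: it builds a five-point Laplacian on a 7 × 7 interior grid with dense NumPy matrices, forms the Galerkin product with bilinear interpolation and full-weighting restriction, and compares the number of nonzeros in the center row against direct rediscretization on the coarse grid. Grid sizes and helper names are arbitrary choices.

```python
import numpy as np

def poisson_1d(n, h):
    """Dense 1-D operator (-1, 2, -1) / h^2 on n interior points."""
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def interp_1d(nc):
    """Linear interpolation from nc coarse to 2*nc + 1 fine interior points."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
    return P

def nnz(row):
    """Count entries of a row that are nonzero up to rounding."""
    return int(np.count_nonzero(np.abs(row) > 1e-12))

nc, h = 3, 1.0 / 8
nf = 2 * nc + 1
A1, P1 = poisson_1d(nf, h), interp_1d(nc)
R1 = 0.5 * P1.T                                   # 1-D full weighting

# 2-D operators built as tensor products of the 1-D ones
A_fine = np.kron(np.eye(nf), A1) + np.kron(A1, np.eye(nf))   # five-point stencil
P, R = np.kron(P1, P1), np.kron(R1, R1)

A_galerkin = R @ A_fine @ P                       # Galerkin coarse-grid operator
A2 = poisson_1d(nc, 2 * h)
A_direct = np.kron(np.eye(nc), A2) + np.kron(A2, np.eye(nc))  # rediscretized with step 2h

center_fine, center_coarse = (nf * nf) // 2, (nc * nc) // 2
stencil_sizes = (nnz(A_fine[center_fine]),
                 nnz(A_galerkin[center_coarse]),
                 nnz(A_direct[center_coarse]))
```

In this 2-D example the Galerkin operator has a nine-point stencil, while direct rediscretization keeps the five-point stencil; in 1-D the two constructions coincide for this model problem.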

B. Interpolation and Restriction

Since a coarse grid has fewer grid points than the next finer grid, interpolation needs to be employed to compute the solution correction at the fine-grid points that are not present on the coarse grid. Conversely, restriction needs to be applied in order to map the residue from a fine grid to the next coarser grid. Proper interpolation and restriction operators are vital to the performance of the MG solver. Badly designed operators can lead to slow MG convergence; in some cases, they may even cause the iterations to diverge.

1) Operator-Dependent Interpolation: In a homogeneous material, 3-D trilinear operators can be used very effectively to transfer the solution error from the coarse grid to the fine grid. The interpolation is simply done by linearly averaging the values of neighboring coarse-grid points. The trilinear interpolation is depicted in Fig. 4, where the errors of the eight corner points of the cube are available from the solution of the coarse-grid problem. To compute the error at the center of the cube, which is not present at the coarse grid, an average of the eight points is used

$$
\begin{aligned}
e_{i,j,k} = \frac{1}{8}\bigl(& e_{i-1,j-1,k-1} + e_{i+1,j-1,k-1} + e_{i-1,j+1,k-1} + e_{i+1,j+1,k-1} \\
& + e_{i-1,j-1,k+1} + e_{i+1,j-1,k+1} + e_{i-1,j+1,k+1} + e_{i+1,j+1,k+1}\bigr).
\end{aligned}
\tag{5}
$$
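A compact way to realize trilinear interpolation on a whole grid is to apply 1-D linear interpolation along each axis in turn. The sketch below assumes a uniform medium and standard coarsening in all three directions (every other fine point is a coarse point); the function names are illustrative. At the center of each coarse cell it reproduces the eight-point average of (5).

```python
import numpy as np

def prolong_axis(coarse, axis):
    """Linear interpolation along one axis: n coarse points -> 2n - 1 fine points."""
    c = np.moveaxis(coarse, axis, 0)
    fine = np.empty((2 * c.shape[0] - 1,) + c.shape[1:])
    fine[::2] = c                            # coincident points are copied
    fine[1::2] = 0.5 * (c[:-1] + c[1:])      # midpoints are averaged
    return np.moveaxis(fine, 0, axis)

def trilinear_prolong(coarse):
    """Trilinear interpolation of a 3-D coarse-grid array to the next finer grid."""
    fine = coarse
    for axis in range(3):
        fine = prolong_axis(fine, axis)
    return fine
```

Points on cell edges and faces receive the average of two and four coarse neighbors, respectively, as a by-product of the same tensor-product construction.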

However, at material interfaces where the thermal conductivity changes abruptly, trilinear interpolation becomes inappropriate. This is particularly true for IC thermal analysis, since on-chip materials can have vastly different thermal properties. For example, at the interface between an ILD layer and a metal layer, the thermal conductivity k(r, T) can jump by two orders of magnitude. The underlying assumption of a trilinear interpolator is the continuity of ∇T(r, t) [see (1)]. At material interfaces with jumping k(r, T), ∇T(r, t) is no longer continuous, but k(r, T) · ∇T(r, t) is. Applying a trilinear interpolator to diffusion problems with jumping coefficients can lead to convergence problems. In this case, a more reliable interpolator is one that is based on the continuity of k(r, T) · ∇T(r, t) [20]. This is the so-called operator-dependent interpolator, since the interpolation is built based on the discretization of the PDE, not the geometric distances.
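The idea of interpolating by flux continuity is easiest to see in 1-D. The sketch below is a simplified illustration and not the 3-D interpolator of this paper: it assumes a fine-grid point sitting on the interface, midway between two coarse-grid points, with conductivity `k_left` on one side and `k_right` on the other. Requiring k · ∇T to match across the point, k_left (e − e_left) = k_right (e_right − e), gives a conductivity-weighted average.

```python
def flux_weighted_interp(e_left, e_right, k_left, k_right):
    """1-D operator-dependent interpolation at a material interface.

    Enforces k_left * (e - e_left) == k_right * (e_right - e), assuming equal
    spacing on both sides of the interface point.
    """
    return (k_left * e_left + k_right * e_right) / (k_left + k_right)

# Interface between a poor conductor (k = 1) and a good conductor (k = 100)
e_interface = flux_weighted_interp(0.0, 1.0, 1.0, 100.0)
e_linear = 0.5 * (0.0 + 1.0)   # what plain linear interpolation would give
```

With a hundredfold jump in conductivity, the interpolated value lies almost at the value on the high-conductivity side (about 0.99) rather than at the midpoint 0.5, and the formula reduces to linear interpolation when the two conductivities are equal.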

Except for 1-D PDE problems, it becomes impossible to enforce the continuity of $k(r, T) \cdot \nabla T(r, t)$ at every interpolation point. Usually, certain averaging and approximation are used when defining an operator-dependent interpolator. For the 3-D full-chip thermal problem, we use an operator-dependent interpolator at the material boundaries by extending the two-dimensional (2-D) approach in [20]. To illustrate, consider a discontinuity between two material layers with distinct thermal conductivities $k_1$ and $k_2$, as shown in Fig. 5(a), where the material interface is assumed to be aligned to grid lines. Points marked by a dark circle are on both the coarse and fine grids. Fine-grid points marked by “x” are not on the interface; therefore, they can be interpolated using the trilinear interpolation. During the interpolation operation, we always process the points that are located in a homogeneous material first, followed by the points located at a material interface. Now, consider the surface shared by the two coarse-grid control volumes in Fig. 5(a). For each such region at the material surface, we first interpolate the points on the boundary of the surface region. Let us consider the boundary point 1 in Fig. 5(a). The seven-point finite difference discretization at point 1 is illustrated in Fig. 5(b), where $(i, j, k)$ is used to identify a point's 3-D location. Let us assume that the standard seven-point discretization of the heat PDE at point 1 has the form

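$$
\begin{aligned}
&(a_{i-1,j,k} + a_{i+1,j,k} + a_{i,j-1,k} + a_{i,j+1,k} + a_{i,j,k-1} + a_{i,j,k+1})\, T_{i,j,k} \\
&\quad - a_{i-1,j,k} T_{i-1,j,k} - a_{i+1,j,k} T_{i+1,j,k} - a_{i,j-1,k} T_{i,j-1,k} - a_{i,j+1,k} T_{i,j+1,k} \\
&\quad - a_{i,j,k-1} T_{i,j,k-1} - a_{i,j,k+1} T_{i,j,k+1} = f_{i,j,k}
\end{aligned} \tag{6}
$$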

where $f_{i,j,k}$ is the right-hand side of the discretization due to the power sources in the control volume.

Notice that the $a_{i,j,k}$'s in (6) are not necessarily identical, owing to the nonuniformity in the discretization step size and the thermal conductivities. To handle the material discontinuity at point 1 properly, instead of using the simple trilinear interpolation, we would like to more generally enforce

$$
L_n e_n = 0 \tag{7}
$$

where $L_n e_n$ denotes the application of the finite difference operator to the (interpolated) solution error on the fine grid. In other words, (7) is obtained by replacing the $T_{i,j,k}$'s by $e_{i,j,k}$'s on the left-hand side of (6) and setting the right-hand side to zero.

Fig. 5. Operator-dependent interpolator. (a) Interpolation at a material interface. (b) Seven-point discretization at point 1 of (a).

If the solution errors at all six neighboring points were known a priori, (7) could be used to determine the interpolated error at point 1. However, points $(i, j-1, k)$ and $(i, j+1, k)$ are the center points of two different material interface regions. Therefore, the errors at these two points are not known yet, since they have not been processed. To solve this problem, an averaging along the y-direction is taken. By approximating $e_{i,j-1,k}$ and $e_{i,j+1,k}$ by $e_{i,j,k}$ in (7), an approximation to (7) is made using

$$
\begin{aligned}
&(A - a_{i,j-1,k} - a_{i,j+1,k})\, e_{i,j,k} - a_{i-1,j,k} e_{i-1,j,k} - a_{i+1,j,k} e_{i+1,j,k} \\
&\quad - a_{i,j,k-1} e_{i,j,k-1} - a_{i,j,k+1} e_{i,j,k+1} = 0
\end{aligned} \tag{8}
$$

where $A = a_{i-1,j,k} + a_{i+1,j,k} + a_{i,j-1,k} + a_{i,j+1,k} + a_{i,j,k-1} + a_{i,j,k+1}$. Thus, the operator-dependent interpolation at point 1 is defined by (9).

Notice that all of the error values on the right-hand side of (9) are either directly available from the coarse grid or obtained from trilinear interpolation. In the same fashion, (9) is applied to perform interpolation at points 2, 3, and 4. The last point to consider is point 5. For this point, the operator-dependent interpolation can be done by directly applying (7), since the errors at its six neighboring points have already been determined. Using the same notation as in (9), the interpolation for point 5 is written as

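$$
\begin{aligned}
e_{i,j,k} = \frac{1}{A}\bigl(&a_{i-1,j,k} e_{i-1,j,k} + a_{i+1,j,k} e_{i+1,j,k} + a_{i,j-1,k} e_{i,j-1,k} + a_{i,j+1,k} e_{i,j+1,k} \\
&+ a_{i,j,k-1} e_{i,j,k-1} + a_{i,j,k+1} e_{i,j,k+1}\bigr).
\end{aligned} \tag{10}
$$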
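To make the difference from trilinear averaging concrete, the two interpolation rules (9) and (10) can be sketched in Python as below. This is an illustrative sketch only: the dictionary keys (`"xm"` for $a_{i-1,j,k}$, `"xp"` for $a_{i+1,j,k}$, and similarly `"ym"`, `"yp"`, `"zm"`, `"zp"`) and all numerical values are hypothetical and chosen to mimic a point with one highly conductive neighbor.

```python
def interp_edge_point(a, e):
    """Eq. (9): the two y-neighbors are not yet known, so they are lumped into the center."""
    A = sum(a.values())
    numerator = (a["xm"] * e["xm"] + a["xp"] * e["xp"]
                 + a["zm"] * e["zm"] + a["zp"] * e["zp"])
    return numerator / (A - a["ym"] - a["yp"])

def interp_center_point(a, e):
    """Eq. (10): all six neighbor errors are known."""
    A = sum(a.values())
    return sum(a[d] * e[d] for d in a) / A

# Hypothetical coefficients: the neighbor below ("zm") is coupled 100 times
# more strongly than the others, as it might be across an ILD/metal interface.
a = {"xm": 1.0, "xp": 1.0, "ym": 1.0, "yp": 1.0, "zm": 100.0, "zp": 1.0}
e_known = {"xm": 0.0, "xp": 0.0, "zm": 1.0, "zp": 0.0}

e_edge = interp_edge_point(a, e_known)                  # 100/103, pulled toward "zm"
plain_average = sum(e_known.values()) / len(e_known)    # 0.25, ignores the coupling
```

With these assumed values, the operator-dependent result follows the strongly coupled neighbor, whereas an unweighted average of the same four known values does not.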

2) Restriction: In MG, restrictions are required to map the residue of the fine-grid problem solution to the coarse grid. In order to perform the restrictions properly under nonuniform discretization steps and varying thermal conductivities, an operator-dependent restriction operator is usually adopted. In [18], the restricted residue at a coarse-grid point is a weighted average of the residues of the corresponding seven discretization points on the fine grid. The weight of each point in the restriction is taken as the absolute value of its coefficient in the finite difference. However, this restriction operator is not completely consistent with the adopted coarser grid operator, which is based on the direct discretization of the heat PDE using larger step sizes. To improve the performance of the MG solver, in this paper, we avoid this inconsistency by adopting a restriction operator as follows.

When deciding the residue value for any given control volume at the next coarser grid, we first determine the overlap between this volume and every (smaller) control volume at the fine grid. Then, we obtain the residue of the coarser grid volume by adding to it portions of the fine-grid residues according to the amount of overlap. In other words, the residue of a coarser grid volume consists of the residues corresponding to the portions of fine-grid volumes that are confined in the coarser grid volume. We demonstrate this approach in 2-D for an interior control volume in Fig. 6(a), where the dashed lines are grid lines of the fine grid, and the dotted-dashed lines are those of the next coarser grid. The overlap of control volumes along the z-direction, which is not shown in the figure, should be considered similarly. To compute the residue for the coarser grid volume at the center of the plot (dashed region), we first find the overlap between the dashed region and the 27 smaller volumes (shaded regions) at the fine grid (only nine of them are shown). We set the residue of the coarser grid volume as a weighted sum of the residues of the overlapped fine-grid volumes, where each weight is the ratio between the volume of the overlapped region and the total volume of the corresponding fine-grid control volume. For instance, for the uniform gridding of Fig. 6(a), the fine-grid volume at location $(i-1, j-1, k)$ gets a weight of 0.25.

$$
e_{i,j,k} = \frac{a_{i-1,j,k} e_{i-1,j,k} + a_{i+1,j,k} e_{i+1,j,k} + a_{i,j,k-1} e_{i,j,k-1} + a_{i,j,k+1} e_{i,j,k+1}}{A - a_{i,j-1,k} - a_{i,j+1,k}} \tag{9}
$$

Fig. 6. Restriction operations at interior and boundary points. (a) Interior point. (b) Boundary point.

Fig. 7. Robust vertical line smoother. (a) Anisotropy in heat conduction. (b) Simultaneous relaxation of a vertical stack.

The complete expression for the restricted residue for this interior coarser grid control volume in 3-D is given as

$$
\begin{aligned}
r_{\mathrm{coarse}} ={}& r_{i,j,k} + 0.5\,(r_{i-1,j,k} + r_{i+1,j,k} + r_{i,j-1,k} + r_{i,j+1,k}) \\
&+ 0.25\,(r_{i-1,j-1,k} + r_{i+1,j-1,k} + r_{i-1,j+1,k} + r_{i+1,j+1,k}) \\
&+ 0.5\,r_{i,j,k-1} + 0.25\,(r_{i-1,j,k-1} + r_{i+1,j,k-1} + r_{i,j-1,k-1} + r_{i,j+1,k-1}) \\
&+ 0.125\,(r_{i-1,j-1,k-1} + r_{i+1,j-1,k-1} + r_{i-1,j+1,k-1} + r_{i+1,j+1,k-1}) \\
&+ 0.5\,r_{i,j,k+1} + 0.25\,(r_{i-1,j,k+1} + r_{i+1,j,k+1} + r_{i,j-1,k+1} + r_{i,j+1,k+1}) \\
&+ 0.125\,(r_{i-1,j-1,k+1} + r_{i+1,j-1,k+1} + r_{i-1,j+1,k+1} + r_{i+1,j+1,k+1}).
\end{aligned} \tag{11}
$$

For nonuniform gridding, or for control volumes at a material interface or chip boundary, the above expression should be adjusted accordingly. For instance, for the coarser grid volume sitting at the chip periphery [dashed region in Fig. 6(b)], the restricted residue is given by

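$$
\begin{aligned}
r_{\mathrm{coarse}} ={}& r_{i,j,k} + 0.5\,(r_{i+1,j,k} + r_{i,j+1,k}) + 0.25\,r_{i+1,j+1,k} \\
&+ 0.5\,r_{i,j,k-1} + 0.25\,(r_{i+1,j,k-1} + r_{i,j+1,k-1}) + 0.125\,r_{i+1,j+1,k-1} \\
&+ 0.5\,r_{i,j,k+1} + 0.25\,(r_{i+1,j,k+1} + r_{i,j+1,k+1}) + 0.125\,r_{i+1,j+1,k+1}.
\end{aligned} \tag{12}
$$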

Our experiments have shown that the adoption of this consistent restriction operator can bring a runtime speedup of a few times compared to our earlier approach in [18].
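For uniform gridding, the weights in (11) and (12) factor into per-axis overlap fractions, which the following Python sketch exploits. It is an illustration under that uniform-grid assumption, not the implementation used in this work; the names `W11`, `W12`, `restrict_interior`, and `restrict_corner` are hypothetical.

```python
import numpy as np

# Per-axis overlap fractions between a coarse volume and the fine volumes it touches.
w_interior = np.array([0.5, 1.0, 0.5])   # neighbors at -1, 0, +1 along an interior axis
w_boundary = np.array([1.0, 0.5])        # neighbors at 0, +1 when the volume sits on the edge

W11 = np.einsum("i,j,k->ijk", w_interior, w_interior, w_interior)  # weights of (11)
W12 = np.einsum("i,j,k->ijk", w_boundary, w_boundary, w_interior)  # weights of (12)

def restrict_interior(r_fine, i, j, k):
    """Restricted residue (11) for the coarse volume centered at fine point (i, j, k)."""
    return float(np.sum(W11 * r_fine[i - 1:i + 2, j - 1:j + 2, k - 1:k + 2]))

def restrict_corner(r_fine, i, j, k):
    """Restricted residue (12); the volume only extends toward i+1 and j+1."""
    return float(np.sum(W12 * r_fine[i:i + 2, j:j + 2, k - 1:k + 2]))

# A unit residue everywhere: the restricted value equals the sum of the weights.
r_fine = np.ones((5, 5, 5))
r_int = restrict_interior(r_fine, 2, 2, 2)
r_cor = restrict_corner(r_fine, 0, 0, 2)
```

The interior weights sum to 8, the volume ratio between a coarse and a fine control volume under 2:1 coarsening in each direction, while the weights of (12) sum to 4.5 because the boundary volume is truncated in x and y.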

C. Robust Vertical Line Smoother

Another complication of 3-D thermal analysis comes from the strong problem anisotropy created by the discontinuity in thermal conductivity and the asymmetry of the chip dimensions. Problem anisotropy has a profound impact on the convergence properties of an MG approach. In this case, the standard GS smoother is no longer effective and can easily lead to divergence of the MG iteration. In addition to adopting the operator-dependent interpolator and restriction operator, more robust smoothers have to be developed to enhance the robustness of MG.

To this end, the nature of the problem anisotropy in the 3-D chip thermal analysis must first be understood. First, grid points in a highly (thermally) conductive material layer tend to be more strongly coupled to one another than to other points. For instance, grid points in a metal layer are more strongly coupled than those in an ILD layer due to the low thermal resistance of metal. The second source of anisotropy is geometric and results from the finite difference discretization of the PDE. Notice that the lateral dimension of a chip is usually much larger than the chip thickness. As a result, starting from some coarse grid, the lateral dimension of a control volume will be much larger than its thickness. This geometrically induced asymmetry is more pronounced in thin material layers, where the lateral dimension of a control volume can be comparable to the die dimension while its thickness is only at the submicrometer scale. As depicted in Fig. 7(a), since the sidewall surface area of a control volume is much smaller than that of its top/bottom surface, the heat diffusion between two laterally neighboring control volumes is significantly smaller than that between two vertically stacked ones.

Anisotropy of the problem can introduce significant convergence problems if not handled appropriately. Our experiments show that use of the simple pointwise GS smoother can easily lead to divergence in MG. This is because pointwise relaxations become ineffective even at removing high-frequency solution errors for anisotropic problems. Note that the coarse grid works correctly only under the premise that the high-frequency errors at the fine grid have been effectively removed by the presmoothing steps. Thus, the existence of large high-frequency errors destroys the proper interplay between the fine grid and coarse grid on which the MG method relies.

For highly anisotropic problems, more complex smoothers are needed in order to effectively smooth the error [17]. This is usually accomplished by relaxing strongly coupled points simultaneously [17]. In our MG approach, instead of using a standard GS pointwise smoother, we adopt a vertical line smoother to simultaneously relax all the control volumes in a vertical stack, as shown in Fig. 7(b). There is a straightforward physical intuition behind this vertical line smoother. Namely, the majority of the heat accumulated in any control volume will always find low-resistance paths along which to diffuse. As a result, most of the heat will diffuse vertically to the volume's neighbors below and above rather than laterally to its adjacent volumes. In other words, control volumes in a vertical stack are strongly coupled. Due to the special 3-D form of multiple on-chip material layers and our choice of grid hierarchy, anisotropy created by geometry is more dominant than that due to variation in thermal conductivity. The adoption of the vertical line smoother is very effective in coping with both sources of anisotropy, thereby significantly enhancing the robustness of the MG iteration. To simultaneously relax a vertical stack, a tridiagonal system needs to be solved. The size of the tridiagonal problem is equal to the number of volumes in the vertical stack, which is usually less than 100 for the IC problems of interest. Therefore, this small tridiagonal problem can be solved very efficiently using the Thomas algorithm, leading to almost negligible runtime overhead.
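The tridiagonal solve at the heart of the line smoother can be sketched as follows. This is a generic textbook form of the Thomas algorithm in Python, not the code used in this work; the stack size, the conductance values `gz` and `g_lat`, and the right-hand side are hypothetical. In an actual line relaxation, the right-hand side of each stack would presumably collect the power term plus the lateral-neighbor contributions evaluated at the current iterate.

```python
import numpy as np

def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n).

    Row m reads lower[m]*x[m-1] + diag[m]*x[m] + upper[m]*x[m+1] = rhs[m];
    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = np.zeros(n)
    d = np.zeros(n)
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for m in range(1, n):                       # forward elimination
        denom = diag[m] - lower[m] * c[m - 1]
        c[m] = upper[m] / denom
        d[m] = (rhs[m] - lower[m] * d[m - 1]) / denom
    x = np.zeros(n)
    x[-1] = d[-1]
    for m in range(n - 2, -1, -1):              # back substitution
        x[m] = d[m] - c[m] * x[m + 1]
    return x

# Hypothetical vertical stack: strong vertical coupling gz, weak lateral coupling g_lat.
n = 6
gz, g_lat = 10.0, 0.05
lower = np.full(n, -gz)
upper = np.full(n, -gz)
diag = np.full(n, 2.0 * gz + 4.0 * g_lat)
rhs = np.linspace(1.0, 2.0, n)
T_stack = thomas_solve(lower, diag, upper, rhs)
```

The cost is linear in the stack height, which is why stacks of fewer than 100 volumes add little overhead to each smoothing sweep.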

V. THERMAL TRANSIENT SIMULATION

In addition to the steady-state on-chip temperature distribution, the thermal transients can also be of interest for various applications. For instance, in dynamic thermal/leakage management, the dynamic variation of on-chip temperature is used to adjust the operation of the chip such that the leakage power and the peak chip temperature can be properly controlled. In this section, we extend the MG techniques for thermal steady-state analysis presented in the previous section to thermal transient analysis. We also present SIMO-based reduced-order thermal modeling.

A. MG-Based Thermal Transient Analysis

To perform thermal transient analysis, a numerical integration method such as the backward Euler method is required. Reorganizing (3), we have

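$$
\begin{aligned}
&\left(2(G_x + G_y + G_z) + \frac{\rho c_p \Delta x \Delta y \Delta z}{\Delta t}\right) T^{n+1}_{i,j,k} \\
&\quad - G_x T^{n+1}_{i-1,j,k} - G_x T^{n+1}_{i+1,j,k} - G_y T^{n+1}_{i,j-1,k} - G_y T^{n+1}_{i,j+1,k} \\
&\quad - G_z T^{n+1}_{i,j,k-1} - G_z T^{n+1}_{i,j,k+1} \\
&\qquad = \Delta x \Delta y \Delta z\, g_{i,j,k} + \frac{\rho c_p \Delta x \Delta y \Delta z}{\Delta t}\, T^{n}_{i,j,k}.
\end{aligned} \tag{13}
$$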

Notice that the term $\rho c_p \Delta x \Delta y \Delta z$ in (3) represents a capacitance. To solve the thermal transient analysis problem, one can model the above thermal system as an equivalent RC circuit. Then, a SPICE-like simulation technique can be applied to the equivalent RC circuit to provide the thermal transient response [2], [4], [6]. However, just as for the thermal steady-state analysis, a direct solution method does not scale well with the problem size. To avoid direct matrix factorization, an iterative ADI method is proposed in [7] to solve the thermal transient analysis more efficiently. It should be noticed that the above system of equations closely resembles that of the thermal steady-state analysis; therefore, the MG approach described above can be used to efficiently solve the transient analysis. In the following, we describe a few straightforward modifications needed for applying MG. In the next subsection, we offer a further improvement by using reduced-order modeling.

As indicated by the $\rho c_p \Delta x \Delta y \Delta z\, T^{n+1}_{i,j,k} / \Delta t$ term in the above equation, the system matrix has been made more diagonally dominant compared to the thermal steady-state case. Therefore, the convergence property of iterative solutions is improved. When defining the coarse problems at different grid levels, we need to include the $\rho c_p \Delta x \Delta y \Delta z\, T^{n+1}_{i,j,k} / \Delta t$ term in the direct discretization, wherein the thermal capacitive effects are considered for larger and larger control volumes. The operator-dependent interpolator is modified accordingly, i.e., the same factor is included in (7), (9), and (10). However, the restriction operator described in Section IV-B2 can be used without any change. Since the transient response at the previous time step provides a good initial guess for the response at the current time step, fewer iterations are needed to achieve good convergence at each time point, and solving a new time point usually requires less central processing unit (CPU) time than a complete dc solution.
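The effect of the capacitive term on diagonal dominance can be checked on a toy problem. The Python sketch below writes (13) in matrix form for a hypothetical four-node 1-D chain; `np.linalg.solve` merely stands in for the MG solve, and all names and values (`backward_euler_step`, `dominance_margin`, `g`, `cap`, `dt`, `q`) are assumptions made for illustration.

```python
import numpy as np

def backward_euler_step(G, cap, dt, q, T_old):
    """One step of (13) in matrix form: (G + C/dt) T_new = q + (C/dt) T_old."""
    A = G + np.diag(cap / dt)
    return np.linalg.solve(A, q + (cap / dt) * T_old)

def dominance_margin(M):
    """Per-row |diagonal| minus the sum of |off-diagonal| entries."""
    d = np.abs(np.diag(M))
    return d - (np.abs(M).sum(axis=1) - d)

# Hypothetical 1-D chain of four control volumes with unit conductances.
n, g, dt = 4, 1.0, 0.5
G = 2.0 * g * np.eye(n) - g * np.eye(n, k=1) - g * np.eye(n, k=-1)
cap = np.ones(n)                      # rho * c_p * dx * dy * dz per volume
q = np.array([0.0, 1.0, 0.0, 0.0])    # dx * dy * dz * g_{i,j,k}

margin_dc = dominance_margin(G)                       # steady-state matrix
margin_tr = dominance_margin(G + np.diag(cap / dt))   # transient matrix

# Each step starts from the previous temperatures, i.e., a warm start.
T = np.zeros(n)
for _ in range(200):
    T = backward_euler_step(G, cap, dt, q, T)
T_steady = np.linalg.solve(G, q)
```

Every row gains exactly `cap / dt` in dominance margin, so a smaller time step makes each implicit solve easier, and under constant power the time-stepped solution settles to the dc solution.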

B. Reduced-Order Thermal Modeling

Although MG offers an efficient methodology for thermal transient analysis, solving a large thermal problem over many time points may still require a large amount of CPU time. In this section, we consider using reduced-order thermal models to speed up the simulation. For cases where the chip power consumption is modeled at the granularity of chip-level circuit blocks, such as that provided by certain high-level power estimation techniques, we may evenly distribute the estimated power within each block's bounding box and treat it as a single input excitation to the equivalent thermal RC model. This implies that, in this situation, the number of independent inputs to the thermal model is limited to the number of chip-level circuit blocks, which is usually rather small. Thus, the whole chip can be thermally modeled as a set of SIMO models, one corresponding to each circuit block. Each SIMO model relates the chip temperature distribution to the corresponding power source, and the overall dynamic temperature response of the whole chip can be obtained by superimposing the responses of all the SIMO models.

This suggests that the thermal transient analysis can be sped up if each of these SIMO models is replaced by a compact lower order model. To achieve this goal, any existing model order reduction technique for IC interconnects (such as that of [23]) can be applied. One benefit of using the passive model order reduction technique of [23] for SIMO-based reduction is that, even though passivity is not guaranteed, the reduced SIMO models are always stable. For our full-chip thermal simulation, in fact, model passivity is not an issue, since the full model captures all the thermal effects. In other words, the reduced SIMO models are interconnected neither with each other nor with other thermal systems. It should be noticed that we require the power consumption of the chip to be modeled at the circuit block level in order to apply the SIMO-based reduced modeling. The limitation of this approach is that it is not applicable to a much finer power modeling granularity.

Consider a standard state-space SIMO model

$$
\frac{d}{dt} C x(t) + G x(t) = b\, u(t) \tag{14}
$$

where $x(t)$ is the vector of states or temperature response, $G$ and $C$ are the system conductance and capacitance matrices, respectively, $u(t)$ is the input to the system representing a power source, and $b$ is a vector linking the input to the system. To use the Krylov subspace technique of [23], one needs to compute an orthonormal basis $V$ of the Krylov subspace defined as $\mathrm{colspan}\{r, A^{-1}r, A^{-2}r, \ldots\}$, where $A = G^{-1}C$ and $r = G^{-1}b$. The reduced SIMO model is defined using the reduced system matrices with smaller dimensions: $\tilde{G} = V^T G V$, $\tilde{C} = V^T C V$, $\tilde{b} = V^T b$. The major computation involved in model construction is due to the multiple linear system solutions defined by matrix $G$. Notice that, under the MG framework, these linear system solutions can be obtained efficiently by solving equivalent dc thermal problems with proper right-hand sides. This means that the reduced thermal models can be extracted rather efficiently by exploiting our MG-accelerated dc thermal solver. Additionally, due to the RC nature of the IC thermal behavior as well as the relatively large thermal time constants, we have observed that low-order (e.g., third-order) models are usually sufficient for good accuracy. Our experiments show that reduced-order models are able to provide significant runtime speedup.
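A minimal Python sketch of such a projection-based reduction is given below. It rests on an assumption about the intended subspace: since the text states that the dominant cost is repeated solves with $G$, the basis is built from $\mathrm{span}\{G^{-1}b, (G^{-1}C)G^{-1}b, \ldots\}$, the usual choice for expansion about $s = 0$. The helper names, the small tridiagonal test system, and the use of `np.linalg.solve` in place of an MG dc solve are all hypothetical.

```python
import numpy as np

def krylov_basis(G, C, b, order):
    """Orthonormal basis built from repeated solves with G (Arnoldi-style)."""
    V = np.zeros((len(b), order))
    v = np.linalg.solve(G, b)                 # r = G^{-1} b, one dc solve
    for m in range(order):
        for p in range(m):                    # modified Gram-Schmidt
            v = v - (V[:, p] @ v) * V[:, p]
        V[:, m] = v / np.linalg.norm(v)
        v = np.linalg.solve(G, C @ V[:, m])   # next direction, one more dc solve
    return V

def reduce_simo(G, C, b, order):
    """Congruence projection: G~ = V^T G V, C~ = V^T C V, b~ = V^T b."""
    V = krylov_basis(G, C, b, order)
    return V.T @ G @ V, V.T @ C @ V, V.T @ b, V

# Hypothetical 6-state RC-like system driven by a single source at node 0.
n = 6
G = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
C = np.diag(np.linspace(1.0, 2.0, n))
b = np.zeros(n)
b[0] = 1.0

Gr, Cr, br, V = reduce_simo(G, C, b, order=3)
T_dc_full = np.linalg.solve(G, b)
T_dc_reduced = V @ np.linalg.solve(Gr, br)    # map reduced states back to all nodes
```

Because $G^{-1}b$ lies in the span of $V$, the reduced model reproduces the full dc response at every node, and each basis vector costs one solve with $G$.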

VI. EXPERIMENTAL RESULTS

In this section, we demonstrate the full-chip thermal simulation results of two simplified abstractions of microprocessor designs. The statistics of these two microprocessors are shown in Table I. The primary (through the heat sink to ambient) and secondary (through the package to board) heat conduction paths are modeled using equivalent heat transfer coefficients listed in Table I. Notice that, for simplicity, we have adopted a relatively simple 1-D model for the thermal properties of the exterior components of a packaged chip. We anticipate that a more detailed thermal package model can lead to improved accuracy of the overall analysis.

TABLE I. STATISTICS OF TWO MICROPROCESSOR DESIGNS

A. Thermal Steady-State Analysis

The on-chip steady-state power distributions for these two designs were obtained by evenly distributing the power consumption of each circuit block within its bounding box. The corresponding steady-state on-chip temperature profiles were simulated by discretizing the heat PDE using a fine grid size and solving the resulting matrix problems using the proposed MG solver (implemented in C++). The on-chip power densities and the corresponding steady-state temperature profiles on the surface of the silicon substrate are plotted in Figs. 8 and 9 for processors A and B, respectively. The temperature variation along a vertical cross section of the substrate for processor B is shown in Fig. 10.

We compare five solution methods on the thermal steady-state simulation of these two designs: 1) a direct commercial solver for symmetric matrix problems; 2) the incomplete LU (ILU) preconditioned conjugate gradient (CG) method (LASPACK [24]); 3) red–black point GS; 4) GS with the vertical line smoother described in the previous section; and 5) MG with the vertical line smoother. In order to compare the five solvers on different problem sizes, the heat PDEs of the two designs were discretized using different step sizes. Owing to the low memory requirements of red–black GS, GS with the line smoother, and MG, these three solvers were executed on a 2.5-GHz Pentium IV personal computer (PC) with 512 MB of memory running the Linux operating system. The other two solvers were executed on an IBM RS6000 workstation with 2 GB of memory. The runtimes and memory usages of these five methods are reported in Tables II and III for processors A and B, respectively. Here, a relative accuracy tolerance of 1e-6 is used for the convergence check. In the tables, “–” indicates that the runtime of a solver was too long or that the memory usage exploded on the respective machine.

As can be seen in Tables II and III, the direct solver can only be applied to problems with fewer than a few hundred thousand unknowns; beyond this, the memory usage explodes. The ILU preconditioned CG method is relatively efficient in runtime and memory usage for small problem sizes. However, for large problems, its runtime can become excessive while it also requires a huge amount of memory. The remaining three iterative methods all have very low memory requirements. However, red–black GS did not converge for any of these test problems. As can be seen in the tables, the convergence of GS iterations was improved by incorporating the vertical line smoother, but convergence is still slow for large problems, as expected. It is evident from the tables that the MG method enhanced by the vertical line smoother and operator-dependent interpolation and restriction is both runtime and memory efficient for all these problems. For instance, the largest problem of 11.8M unknowns can be solved using the proposed MG method in 125 CPU seconds with only 231 MB of memory.

Fig. 8. Thermal steady-state simulation of processor A. (a) On-chip power distribution. (b) Temperature distribution at the substrate surface.

Fig. 9. Thermal steady-state simulation of processor B. (a) On-chip power distribution. (b) Temperature distribution at the substrate surface.

In addition to the results listed in Tables II and III, our experiments also show that, for more inhomogeneous problems, a smaller discretization step size is usually required to achieve the same level of accuracy. The same is observed when the input power has a larger degree of spatial variation. To obtain a quick estimate of the chip temperature, however, a relatively large step size can be chosen to speed up the analysis. For the MG method, the interpolation and restriction operators play an important role in convergence. This is because the quality of both operators affects the interplay of the fine and coarse grids. As an example, the restriction operator adopted in this paper is more consistent with the coarser grid operator employed than the operator-dependent restrictor in [18]. In our experiments, we have observed that the use of the new restrictor brings more than a 3× runtime speedup.

Fig. 10. Cross section temperature profile of the substrate along x = 7700 µm and y = 8800 µm of processor B.

TABLE II. COMPARISONS OF RUNTIME AND MEMORY USAGE FOR PROCESSOR A

B. Thermal Transient Analysis

To demonstrate the MG-based transient analysis, we consider a thermal simulation problem for processor B. A total of 15 chip-level circuit blocks are included in the simulation. Of these 15 blocks, one block is always on, dissipating constant power; 10 other blocks are turned on from t = 0 ms to t = 0.1 ms and shut off afterward; the remaining four blocks are turned on at t = 0.4 ms and remain active thereafter. The transient simulation problem is set up based on one million discretization points in 3-D space. Using a time step of 1 µs, the thermal analysis is performed for 1000 steps. Running on the same PC, the simulation takes 924 s to complete. The temperature distributions of the substrate surface at t = 0.1 ms and t = 0.7 ms are shown in Fig. 11(a) and (b), respectively.

TABLE III. COMPARISONS OF RUNTIME AND MEMORY USAGE FOR PROCESSOR B

The same circuit example is simulated based on reduced-order SIMO models. Fifteen reduced SIMO models, one for each circuit block, are computed using the aforementioned reduction procedure. The final temperature response is obtained by superimposing the responses of these 15 reduced models. Two reduction orders are considered for the reduced-order modeling, i.e., second order and third order. Constructing 15 reduced second-order models takes 64 s to complete, while the subsequent 1000 transient time steps take 34 s based on the reduced-order models, in comparison with the 924 s of the full system. For the case of third-order model reduction, the model extraction and transient simulation times are 97 and 36 s, respectively. We expect that for transient simulations over many more time points, the reduced-order models will bring an even more significant runtime speedup.
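The runtimes quoted above imply overall speedups of roughly 9.4× (second order) and 6.9× (third order) once extraction time is included. The short Python calculation below reproduces these figures and, under the added assumption that both the full and the reduced simulations cost a fixed amount per time step, estimates the number of time steps at which model extraction pays for itself. The function names are hypothetical.

```python
# Reported runtimes in seconds for the 1000-step transient run of processor B.
full_time = 924.0
steps = 1000
reduced = {2: (64.0, 34.0), 3: (97.0, 36.0)}   # order: (extraction, transient)

def speedup(order):
    """Overall speedup including the one-time model extraction."""
    extract, transient = reduced[order]
    return full_time / (extract + transient)

def break_even_steps(order):
    """Steps after which extraction is amortized, assuming cost linear in steps."""
    extract, transient = reduced[order]
    per_step_saving = (full_time - transient) / steps
    return extract / per_step_saving

speedup_2, speedup_3 = speedup(2), speedup(3)
steps_2, steps_3 = break_even_steps(2), break_even_steps(3)
```

Under this linear-cost assumption, extraction is amortized after roughly 72 steps for the second-order models and roughly 109 steps for the third-order models, which is consistent with the expectation that longer simulations benefit more.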

To illustrate the accuracy of the reduced-order models, two simulation cases are considered. In case A, all the circuit blocks are turned on at t = 0 ms and remain active during the course of the simulation, while case B is the scenario simulated above. The temperature responses of one point at the substrate surface computed from the full model (with one million states) and the reduced-order models are plotted in Fig. 12 for these two cases. As can be seen, the reduced-order modeling is very accurate in capturing the temperature variation over time. For these particular examples, third-order modeling is sufficient to provide excellent accuracy.

VII. CONCLUSION

Efficient thermal analysis is becoming a critical requirement for ensuring the performance and reliability of large VLSI designs. 3-D full-chip thermal simulation can lead to a problem scale that cannot be handled by traditional simulation techniques. In this paper, we have presented a highly efficient MG-based 3-D full-chip thermal simulator. Our experiments have shown that this simulator can perform large-scale 3-D full-chip steady-state and transient thermal analyses with significantly improved runtime and memory efficiency. Furthermore, we have shown that the thermal transient analysis can be sped up via model order reduction, for which the reduced models are efficiently extracted under the same MG framework.

Fig. 11. Thermal transient simulation of processor B. (a) Temperature profile at t = 0.1 ms. (b) Temperature profile at t = 0.7 ms.

Fig. 12. Thermal transient simulation using reduced-order models. (a) Case A. (b) Case B.

REFERENCES

[1] International Technology Roadmap for Semiconductors, 2003 Edition. [Online]. Available: http://public.itrs.net/

[2] Y. Cheng, P. Raha, C. Teng, E. Rosenbaum, and S. Kang, “ILLIADS-T: An electrothermal timing simulator for temperature-sensitive reliability diagnosis of CMOS VLSI chips,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 17, no. 8, pp. 668–680, Aug. 1998.

[3] T. Chiang, K. Banerjee, and K. Saraswat, “Compact modeling and spice-based simulation for electrothermal analysis of multilevel ULSI interconnects,” in Proc. IEEE/ACM Int. Conf. Computer-Aided Design (ICCAD), San Jose, CA, Nov. 2001, pp. 165–172.

[4] Y. Cheng, C. Tsai, C. Teng, and S. Kang, Electrothermal Analysis of VLSI Systems. Boston, MA: Kluwer, 2000.

[5] L. He, W. Liao, and M. Stan, “System level leakage reduction considering the interdependence of temperature and leakage,” in Proc. IEEE/ACM Design Automation Conf., San Diego, CA, Jun. 2004, pp. 12–17.

[6] Z. Yu, D. Yergeau, R. Dutton, S. Nakagawa, and J. Deeney, “Fast placement-dependent full chip thermal simulation,” in Proc. Int. Symp. VLSI Technology, Systems, and Applications, Hsinchu, Taiwan, R.O.C., Apr. 2001, pp. 249–252.

[7] T. Wang and C. Chen, “3-D thermal-ADI: A linear-time chip level transient thermal simulator,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 21, no. 12, pp. 1434–1445, Dec. 2002.

[8] S. Rzepka, K. Banerjee, E. Meusel, and C. Hu, “Characterization of self-heating in advanced VLSI interconnect line based on thermal finite element simulation,” IEEE Trans. Compon., Packag., Manufact. Technol. A, vol. 21, no. 3, pp. 406–411, Sep. 1998.

[9] K. Banerjee, A. Mehrotra, A. Sangiovanni-Vincentelli, and C. Hu, “On thermal effects in deep sub-micron VLSI interconnects,” in Proc. IEEE/ACM Design Automation Conf., New Orleans, LA, Jun. 1999, pp. 885–891.

[10] Z. Lu, W. Huang, J. Lach, M. Stan, and K. Skadron, “Interconnect lifetime prediction under dynamic stress for reliability-aware design,” in Proc. IEEE/ACM Int. Conf. Computer-Aided Design (ICCAD), San Jose, CA, Nov. 2004, pp. 327–334.

[11] H. Su, F. Liu, A. Devgan, E. Acar, and S. Nassif, “Full chip leakage estimation considering power supply and temperature variations,” in Proc. Int. Symp. Low Power Electronics and Design (ISLPED), Seoul, Korea, Aug. 2003, pp. 78–83.

[12] C. Tsai and S. Kang, “Cell-level placement for improving substrate thermal distribution,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 19, no. 2, pp. 253–266, Feb. 2000.

[13] B. Goplen and S. Sapatnekar, “Efficient thermal placement of standard cells in 3D ICs using a force directed approach,” in Proc. IEEE/ACM Int. Conf. Computer-Aided Design (ICCAD), San Jose, CA, Nov. 2003, pp. 86–89.

[14] K. Skadron, M. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan, and D. Tarjan, “Temperature-aware microarchitecture,” in Proc. 30th Int. Symp. Computer Architecture, San Diego, CA, Jun. 2003, pp. 2–13.

[15] S. Gunther, F. Binns, D. Carmean, and J. Hall, “Managing the impact of increasing microprocessor power consumption,” Intel Technol. J., vol. 5, no. 1, pp. 1–9, Feb. 2001.

[16] W. Briggs, A Multigrid Tutorial. Philadelphia, PA: SIAM, 1987.

[17] K. Stuben and U. Trottenberg, Multigrid Methods: Fundamental Algorithms, Model Problem Analysis and Applications, vol. 960, Lecture Notes in Mathematics. Berlin, Germany: Springer-Verlag, 1982.

[18] P. Li, L. Pileggi, M. Asheghi, and R. Chandra, “Efficient full-chip thermal modeling and analysis,” in Proc. IEEE/ACM Int. Conf. Computer-Aided Design (ICCAD), San Jose, CA, Nov. 2004, pp. 319–326.

[19] M. Ozisik, Boundary Value Problems of Heat Conduction. London, U.K.: Oxford Univ. Press, 1968.

[20] J. D. R. Alcouffe, A. Brandt, and J. Painter, “The multigrid method for the diffusion equation with strongly discontinuous coefficients,” SIAM J. Sci. Statist. Comput., vol. 2, no. 4, pp. 430–454, Dec. 1981.

[21] J. Ruge and K. Stuben, Algebraic Multigrid (AMG), Multigrid Methods, Frontiers in Applied Mathematics. Philadelphia, PA: SIAM, 1987.

[22] G. Kromann, “Thermal management of a c4/ceramic-ball-grid array: The Motorola PowerPC 603 and PowerPC 604 RISC microprocessors,” in Proc. 12th Semiconductor Thermal Measurement and Management Symp., Austin, TX, Mar. 1996, pp. 36–42.

[23] A. Odabasioglu, M. Celik, and L. Pileggi, “PRIMA: Passive reduced-order interconnect macromodeling algorithm,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 17, no. 8, pp. 645–654, Aug. 1998.

[24] T. Skalicky, Laspack Reference Manual. Dresden, Germany: Dresden University of Technology, 1995. [Online]. Available: http://www.tu-dresden.de/mwism/skalicky/laspack/laspack.htm

Peng Li (S’02–M’04) received the B.Eng. degree in information engineering and the M.Eng. degree in systems engineering from Xi’an Jiaotong University, Xi’an, China, in 1994 and 1997, respectively, and the Ph.D. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, in 2003.

From December 2003 to July 2004, he was a Post-Doctoral Research Associate in the Department of Electrical and Computer Engineering, Carnegie Mellon University. Since August 2004, he has been an Assistant Professor in the Department of Electrical Engineering, Texas A&M University, College Station. His research interests are in the various aspects of very large scale integrated (VLSI) computer-aided design with an emphasis on simulation and modeling aspects.

Dr. Li received the Inventor Recognition Awards from the Semiconductor Research Corporation (SRC) in 2001 and 2004 and the Best Paper Award from the Design Automation Conference in 2003.

Lawrence T. Pileggi (S’85–M’89–SM’94–F’01) received the Ph.D. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, in 1989.

He is the Tanoto Professor of Electrical and Computer Engineering and the Director of the Center for Silicon System Implementation at Carnegie Mellon University. From 1984 to 1986, he worked for Westinghouse Research and Development. From 1989 to 1995, he was a Faculty Member at the University of Texas at Austin. In January of 1996, he joined the faculty at Carnegie Mellon University. His research interests include various aspects of digital and analog design and electronic design automation (EDA). He has consulted for several EDA and semiconductor companies. He is a coauthor of Electronic Circuit and System Simulation Methods (McGraw-Hill, 1995) and IC Interconnect Analysis (Kluwer, 2002). He has published over 200 refereed conference and journal papers and holds 13 U.S. patents.

Dr. Pileggi received the Best CAD Transactions Paper Awards in 1991 and 1999, the Best Paper Award from the Design Automation Conference in 2003, the Best Paper Award from the International Conference on Computer-Aided Design in 2004, the Presidential Young Investigator Award from the National Science Foundation in 1991, the Semiconductor Research Corporation Technical Excellence Award in 1991 and 1999, and the Invention Award from the SRC and, subsequently, a U.S. Patent for the RICE simulation tool, in 1993. In 1994 he received the University of Texas Parent’s Association Centennial Teaching Fellowship for excellence in undergraduate instruction. In 1995 and 2005, he received Faculty Partnership Awards from IBM. He was also awarded with Westinghouse Research and Development’s highest engineering achievement award in 1986. He served as the Technical Program Chairman of the 2001 ICCAD and the Conference Chairman of the 2002 ICCAD.

Mehdi Asheghi (S’96–M’01) received the Ph.D. degree in mechanical engineering from Stanford University, Stanford, CA, in 2000.

He subsequently joined the Mechanical Engineering Department at Carnegie Mellon University in 2000. Study of microscale/nanoscale thermal phenomena in microelectronic devices, multilayer structures, data storage devices, and microelectromechanical systems (MEMS) has been the focus of his research during the past ten years.

Dr. Asheghi is a member of the American Society of Mechanical Engineers (ASME).

Rajit Chandra (M’90–SM’04) received the B.Tech. and M.Tech. degrees in radio physics & electronics from the University of Calcutta, Calcutta, India, and the Ph.D. degree in electrical engineering from London South Bank University, London, U.K.

He is the Founder, President, and CEO of Gradient Design Automation, Santa Clara, CA. Gradient delivers electronic design automation (EDA) solutions for temperature-induced problems in modern-day semiconductor products. Prior to his current role, he co-founded Moscape Inc., which specialized in software solutions for signal integrity in chip designs. Moscape was acquired by Magma in August 2000. He worked as the Vice President of Technology at Magma for two years following the acquisition before leaving to pursue his interest in the challenges of nanometer-scale semiconductor designs. He worked on design automation projects while teaching at the Imperial College, London, U.K. His interest drew him to work closely with designers and brought him to the United States where he worked closely with design teams at Intel, AT&T Bell Laboratories, Sun Microsystems, and Cadence Design Systems, where he dealt with timing tools and developed the SDF format to enable timing driven design flows. He is interested in the challenges of cost-effective and reliable design methodologies and pursues his interest through innovations in design automation technology.

Dr. Chandra served as the Chair of the Circuits and Systems Chapter of the Santa Clara Valley Section from 2003 to 2004.