
VLSI DESIGN 2000, Vol. 11, No. 3, pp. 175-218
Reprints available directly from the publisher
Photocopying permitted by license only

(C) 2000 OPA (Overseas Publishers Association) N.V.
Published by license under the Gordon and Breach Science Publishers imprint.
Printed in Malaysia.

Tutorial on VLSI Partitioning

SAO-JIE CHEN a,† and CHUNG-KUAN CHENG b,*

a Dept. of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10764;
b Dept. of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0114

(Received March 1999; in final form 10 February 2000)

The tutorial introduces partitioning with applications to VLSI circuit designs. The problem formulations include two-way, multiway, and multi-level partitioning, partitioning with replication, and performance driven partitioning. We depict the models of multiple pin nets for the partitioning processes. To derive the optimum solutions, we describe the branch and bound method and the dynamic programming method for a special case of circuits. We also explain several heuristics including the group migration algorithms, network flow approaches, programming methods, Lagrange multiplier methods, and clustering methods. We conclude the tutorial with research directions.

Keywords: Partitioning; Clustering; Network flow; Hierarchical partitioning; Replication; Performance driven partitioning

1. INTRODUCTION

Automatic partitioning [5, 61, 72, 78, 95] is becoming an important topic with the advent of deep submicron technologies. An efficient and effective partitioning [12, 17, 19, 48, 69, 70, 77, 81, 94, 105] tool can drastically reduce the complexity of the design process and handle engineering change orders in a manageable scope. Moreover, the quality of the partitioning differentiates the final product in terms of production cost and system performance.

The size of VLSI designs has increased to systems of hundreds of millions of transistors. The complexity of the circuit has become so high that it is very difficult to design and simulate the whole system without decomposing it into sets of smaller subsystems. This divide and conquer strategy relies on partitioning to organize the whole system into a hierarchical tree structure.

Partitioning is also needed to handle engineering change orders. For huge systems, design iterations require very fast turnaround time. A hierarchical partitioning methodology can localize the modifications and reduce the complexity.

*Corresponding author. Tel.: (858)534-6184, Fax: (858)534-7029, e-mail: [email protected]
†Tel.: (8862)2363-5251, Ext. 417, e-mail: [email protected]

Furthermore, a good partitioning tool can decrease the production cost and improve the system performance. With the advance of fabrication technologies, the cost of a transistor drops while the cost of input/output pads remains fairly constant. Consequently, the size of the interface between partitions, e.g., between chips, determines a significant portion of the manufacturing expenses, and the quality of the partitioning has a strong effect on production cost. In addition, in submicron designs, interconnection delays tend to dominate gate delays [8]; therefore system performance is greatly influenced by the partitions.

Partitioning has been applied to solve various aspects of VLSI design problems [5, 36]:

Physical packaging. Partitioning decomposes the system in order to satisfy the physical packaging constraints. The partitioning conforms to a physical hierarchy ranging from cabinets, cases, boards, and chips to modular blocks.

Divide and conquer strategy. Partitioning is used to tackle the design complexity with a divide and conquer strategy [21]. This strategy is adopted to decompose the project between team members, to construct a logic hierarchy for logic synthesis, to transform the netlist into a physical hierarchy for floorplanning, to allocate cells into regions for placement and RLC extraction, and to manipulate hierarchies between logic and layout for simulation.

System emulation and rapid prototyping. One approach for system emulation and prototyping is to construct the hardware with field programmable gate arrays. Usually, the capacity of these field programmable gate arrays is smaller than current VLSI designs. Thus, these prototyping machines are composed of a hierarchical structure of field programmable gate arrays. A partitioning tool is needed to map the netlist into the hardware [110].

Hardware and software codesign. For hardware and software codesign, partitioning is used to decompose the designs into hardware and software.

Management of design reuse. For huge designs, especially system-on-a-chip, we have to manage design reuse. Partitioning can identify clusters of the netlist and construct functional modules out of the clusters.

While partitioning is a tool required to manage huge systems in many fields, such as efficient storage of large databases on disks and data mining, in this tutorial we focus our efforts on partitioning with applications to VLSI circuit designs. In the next section, we describe the notations for the tutorial. In section three, the formulations of the partitioning problems are stated. Section four covers the models for multiple pin nets. Section five depicts the partitioning algorithms. The tutorial is concluded with research directions.

2. PRELIMINARIES

In this section, we establish notations used and formulate the partitioning problems addressed in our approaches. A circuit is represented by a hypergraph, $H(V, E)$, where the vertex set $V = \{v_i \mid i = 1, 2, \ldots, n\}$ denotes the set of modules and the hyperedge set $E = \{e_j \mid j = 1, 2, \ldots, m\}$ denotes the set of nets. Each net $e_j$ is a subset of $V$ with cardinality $|e_j| \ge 2$. The modules in $e_j$ are called the pins of $e_j$.

The hypergraph representation for a circuit with 9 modules and 6 signal nets is shown in Figure 1, where nets $e_1$, $e_3$ and $e_5$ are two-pin nets, net $e_6$ is a three-pin net, and nets $e_2$ and $e_4$ are four-pin nets.

When the circuit has only two pin nets, we can simplify the representation to a graph $G(V, E)$. A net connecting modules $v_i$ and $v_j$ is represented by $e_{ij}$ with a connectivity $c_{ij}$. We set $c_{ij} = 0$ if there is no net connecting modules $v_i$ and $v_j$. We shall show later that for certain formulations we replace multiple pin nets with models of two pin nets. The replacement is performed when the partitioning algorithm is devised for graph models.

FIGURE 1 Hypergraph example.

(i) Module Size and Net Connectivity  Each module $v_i$ has a size $s_i$ in $R^+$, the positive real numbers. We define $S(V_j) = \sum_{v_i \in V_j} s_i$ to be the size of a partition $V_j$. Each net $e_i$ has a connectivity $c_i$ in $R^+$. By default, $c_i = 1$. For a bus of multiple signal lines, we can represent the bus with a net $e_i$ of connectivity $c_i$ equal to the number of lines. We can also assign higher weights to some important nets; this helps keep the modules of these nets in the same partition.

In this tutorial, we will assume that circuits are represented as hypergraphs except when stated otherwise; hence, the terms circuit, netlist, and hypergraph are used interchangeably throughout the tutorial.

(ii) Partitions and Cuts  The set of hyperedges connecting any two-way partition $(V_1, V_2)$ of two disjoint vertex sets $V_1$ and $V_2$ is denoted by a cut $E(V_1, V_2) = \{e_i \in E \mid 0 < |e_i \cap V_1| \text{ and } 0 < |e_i \cap V_2|\}$, i.e., $e_i \in E(V_1, V_2)$ if there exist some pins of $e_i$ in $V_1$ and some different pins of $e_i$ in $V_2$. We define $C(V_1, V_2) = \sum_{e_i \in E(V_1, V_2)} c_i$ to be the cut count of the partition $(V_1, V_2)$.

For a multiway partition $(V_1, V_2, \ldots, V_k)$ where $k > 2$, a cut $E(V_1, V_2, \ldots, V_k) = \{e_j \in E \mid \exists\, i \text{ s.t. } 0 < |e_j \cap V_i| < |e_j|\}$. For each subset $V_i$, we denote its external cut set $E(V_i) = \{e_j \in E \mid 0 < |e_j \cap V_i| < |e_j|\}$. We denote its adjacent net set to be the nets with some pin contained in $V_i$, i.e., $I(V_i) = \{e_j \mid |e_j \cap V_i| > 0\}$.
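The definitions of the cut set, the external cut set and the adjacent net set can be sketched in a few lines. The netlist below is hypothetical: it only mirrors the pin counts of the nine module example (three two-pin nets, one three-pin net, two four-pin nets), since the actual pins of Figure 1 are not stated in the text. The function names are likewise assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical netlist: net name -> set of pins (module indices).
nets: Dict[str, FrozenSet[int]] = {
    "e1": frozenset({1, 2}),
    "e2": frozenset({2, 3, 4, 5}),
    "e3": frozenset({5, 6}),
    "e4": frozenset({4, 6, 7, 8}),
    "e5": frozenset({8, 9}),
    "e6": frozenset({3, 7, 9}),
}
conn: Dict[str, float] = {name: 1.0 for name in nets}  # default connectivity c_i = 1


def external_cut_set(part: Set[int]) -> Set[str]:
    """E(V_i): nets with some, but not all, of their pins inside `part`."""
    return {n for n, pins in nets.items() if 0 < len(pins & part) < len(pins)}


def adjacent_net_set(part: Set[int]) -> Set[str]:
    """I(V_i): nets with at least one pin inside `part`."""
    return {n for n, pins in nets.items() if pins & part}


def cut_count(v1: Set[int], v2: Set[int]) -> float:
    """C(V1, V2): total connectivity of nets with pins on both sides."""
    return sum(conn[n] for n, pins in nets.items() if pins & v1 and pins & v2)


v1 = {1, 2, 3, 4}
v2 = {5, 6, 7, 8, 9}
print(cut_count(v1, v2), sorted(external_cut_set(v1)), sorted(adjacent_net_set(v1)))
```

For a two-way partition that covers all modules, the external cut set of either side coincides with the cut set $E(V_1, V_2)$, which the example confirms.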

(iii) Replication Cuts and Directed Cuts  For replication cuts and performance driven partitioning, the direction of the nets makes a difference in the process. We characterize the pins of each net into two types: source and sink. A directed net $e_i$ is denoted by $(a_i, b_i)$ where $a_i \subset V$ are the source pins of the net and $b_i \subset V$ are the sink pins of the net. We assume that $|a_i \cup b_i| \ge 2$, $|a_i| \ge 1$ and $|b_i| \ge 1$. Usually, each net has one source pin and multiple sink pins. However, some nets may have multiple sources which share the same interconnect line. Furthermore, one pin can be both a source pin and a sink pin of the same net. Therefore, $a_i$ and $b_i$ may have a nonempty intersection.

For two disjoint vertex sets $X$ and $Y$, we shall use $E(X \to Y)$ to denote the directed cut set from $X$ to $Y$. Net set $E(X \to Y)$ contains all the nets $e_i = (a_i, b_i)$ such that $X$ intersects the source pin set $a_i$ and $Y$ intersects the sink pin set $b_i$, i.e., $E(X \to Y) = \{e_i \mid e_i = (a_i, b_i),\ a_i \cap X \ne \emptyset,\ b_i \cap Y \ne \emptyset\}$. We use the function $C(X \to Y)$ to denote the total cut count of the nets in $E(X \to Y)$, i.e.,

$$C(X \to Y) = \sum_{e_i \in E(X \to Y)} c_i.$$
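As a small illustration of the directed cut, the sketch below stores each net as a pair of source and sink pin sets and evaluates $E(X \to Y)$ and $C(X \to Y)$ directly from the definition. The nets, module names and connectivities are hypothetical; net `n4` is included to show a net with two sources where one pin is both a source and a sink.

```python
from typing import Dict, Set, Tuple

# Directed nets e_i = (a_i, b_i): (source pins, sink pins). Hypothetical data.
directed_nets: Dict[str, Tuple[Set[str], Set[str]]] = {
    "n1": ({"u"}, {"v", "w"}),
    "n2": ({"v"}, {"x"}),
    "n3": ({"x"}, {"u"}),
    "n4": ({"w", "x"}, {"w", "y"}),  # two sources; w is both a source and a sink
}
connectivity: Dict[str, int] = {"n1": 1, "n2": 2, "n3": 1, "n4": 1}


def directed_cut_set(x: Set[str], y: Set[str]) -> Set[str]:
    """E(X -> Y): nets with a source pin in X and a sink pin in Y."""
    return {n for n, (a, b) in directed_nets.items() if a & x and b & y}


def directed_cut_count(x: Set[str], y: Set[str]) -> int:
    """C(X -> Y): total connectivity of the nets in E(X -> Y)."""
    return sum(connectivity[n] for n in directed_cut_set(x, y))


X = {"u", "v"}
Y = {"w", "x", "y"}
print(directed_cut_set(X, Y), directed_cut_count(X, Y))
print(directed_cut_set(Y, X), directed_cut_count(Y, X))
```

The two directions generally differ: here the cut from `X` to `Y` carries connectivity 3, while the cut from `Y` to `X` carries connectivity 1.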

(iv) Performance Driven Partitioning  In performance driven partitioning [106], modules are distinguished into two types: combinational elements and globally clocked registers. In illustrations, we shall use circles to represent the combinational elements and rectangles to represent the registers (Fig. 13). Each module $v_i$ has an associated delay $d_i$.

A path $p$ of length $k$ from a module $v_i$ to a module $v_j$ is a sequence $(v_{p_0}, v_{p_1}, \ldots, v_{p_k})$ of modules such that $v_i = v_{p_0}$, $v_j = v_{p_k}$ and for each $l \in \{1, 2, \ldots, k\}$, modules $v_{p_{l-1}}$ and $v_{p_l}$ are a source pin and a sink pin of a net in $E$, respectively.

(v) Clustering  Given a hypergraph $H(V, E)$, highly connected modules in $V$ can be grouped together to form single supermodules called clusters. After this process, a clustering $\Gamma = \{V_1, V_2, \ldots, V_k\}$ of the original hypergraph $H$ is obtained and a contracted (i.e., coarser) hypergraph $H_\Gamma(V_\Gamma, E_\Gamma)$ is induced, where $V_\Gamma = \{v_1^\Gamma, \ldots, v_k^\Gamma\}$. For every $e_j \in E$, the contracted net $e_j^\Gamma \in E_\Gamma$ if $|e_j^\Gamma| \ge 2$, where $e_j^\Gamma = \{v_i^\Gamma \mid e_j \cap V_i \ne \emptyset\}$; that is, $e_j^\Gamma$ spans the set of clusters containing modules of $e_j$. A contracted hypergraph, of course, can be used to induce another coarser contracted hypergraph based on the same clustering process. On the other hand, a contracted hypergraph $H_\Gamma(V_\Gamma, E_\Gamma)$ can be unclustered to return to a finer hypergraph $H(V, E)$.
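The contraction step can be sketched as follows. The helper `contract` and the sample netlist are hypothetical; clusters are identified by their index in the list, and a contracted net is kept only when it spans at least two clusters. The sketch keeps parallel contracted nets as separate entries, which is an assumption, since the text does not say whether they are merged.

```python
from typing import FrozenSet, List, Set


def contract(nets: List[FrozenSet[int]], clusters: List[Set[int]]) -> List[FrozenSet[int]]:
    """Map each net to the set of cluster indices it touches and drop
    nets that fall entirely inside one cluster."""
    cluster_of = {v: i for i, members in enumerate(clusters) for v in members}
    contracted = []
    for pins in nets:
        spanned = frozenset(cluster_of[v] for v in pins)
        if len(spanned) >= 2:
            contracted.append(spanned)
    return contracted


nets = [frozenset({1, 2}), frozenset({2, 3, 4}), frozenset({4, 5}), frozenset({5, 6, 1})]
clusters = [{1, 2}, {3, 4}, {5, 6}]
coarse = contract(nets, clusters)
print(coarse)
```

The net `{1, 2}` lies inside the first cluster and disappears, while the three-pin nets shrink to two-pin nets between clusters.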

3. PROBLEM FORMULATIONS

In this section, we describe different formulations of the partitioning problems addressed in this tutorial. We will cover two-way partitioning, multiway partitioning, multiple level partitioning, partitioning with replication, and performance driven partitioning.

3.1. Two-way Partitioning or Bipartitioning

We consider several possible variations on the size constraints and cost functions in the formulation. Additionally, in certain formulations, we fix two modules $v_s$ and $v_t$ to be on the opposite sides of the cut as two seeds.

3.1.1. Min-cut Separating Two Modules $v_s$ and $v_t$

Given a hypergraph, we fix two modules denoted as $v_s$ and $v_t$ at two sides. A min-cut is a partition $(V_1, V_2)$, $v_s \in V_1$ and $v_t \in V_2$, such that the cut count $C(V_1, V_2)$ is minimized, i.e.,

$$\min_{v_s \in V_1,\ v_t \in V_2} C(V_1, V_2) \tag{1}$$

where $V_1$ and $V_2$ are disjoint and the union of the two sets is equal to $V$.

This partitioning is strongly related to a linear placement problem. In a linear placement, we have $|V|$ equally spaced slots on a straight line (Fig. 2). Modules $v_s$ and $v_t$ are fixed at the two extreme ends, i.e., $v_s$ on the first slot (left end) and $v_t$ on the last slot (right end). The goal is to assign all modules to distinct slots to minimize the total wire length. Let us use $x_i$ to denote the coordinate of module $v_i$ after it is assigned to the slot. The length of a net $e_i$ can be expressed as the difference of the maximum coordinate and the minimum coordinate of the modules in the net, i.e., $\max_{v_j \in e_i} x_j - \min_{v_k \in e_i} x_k$. The total wire length can be expressed as follows.

$$\sum_{e_i \in E} \left( \max_{v_j \in e_i} x_j - \min_{v_k \in e_i} x_k \right) \tag{2}$$
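The total wire length of a linear placement is easy to evaluate for a given slot assignment. The sketch below does so for a hypothetical five module netlist with `s` and `t` pinned to the two ends, and enumerates all orderings of the free modules by brute force; this is only practical for very small examples and is not an algorithm proposed in the tutorial.

```python
from itertools import permutations

# Hypothetical netlist; "s" and "t" play the roles of v_s and v_t.
nets = [{"s", "a"}, {"a", "b", "c"}, {"b", "t"}, {"c", "t"}]


def total_wire_length(order):
    """Sum over nets of (max slot coordinate - min slot coordinate)."""
    slot_of = {v: x for x, v in enumerate(order)}
    return sum(max(slot_of[v] for v in pins) - min(slot_of[v] for v in pins)
               for pins in nets)


free = ["a", "b", "c"]
# Keep s on the first slot and t on the last slot, permute the rest.
lengths = {p: total_wire_length(("s",) + p + ("t",)) for p in permutations(free)}
best_order = min(lengths, key=lengths.get)
print(best_order, lengths[best_order])
```

In this instance the orderings `a, b, c` and `a, c, b` both reach the minimum length of 6, while `b, c, a` is the worst with length 10.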

The relation between partitioning and placement can be derived under the assumption that all nets are two pin nets [50].

THEOREM 3.1  Given a graph $G(V, E)$ with modules $v_s$ and $v_t$ in $V$, let $(V_1, V_2)$ be a min-cut partition separating modules $v_s$ and $v_t$. Let $v_s$ and $v_t$ be the two modules located at the two extreme ends of a linear placement. Then, there exists an optimal linear placement solution such that all modules in $V_2$ are on the slots right of all modules in $V_1$ (Fig. 2).

FIGURE 2  Suppose partition $(V_1, V_2)$ is a min-cut separating modules $v_s$ and $v_t$. There exists an optimal linear placement in which the modules in $V_2$ are on the right side of the modules in $V_1$.

Thus, we can use the min-cut to partition a linear placement into two smaller problems and still maintain optimality. Conceptually, we can conceive that modules in $V_1$ or $V_2$ have stronger internal connection within the set than its mutual connection to the other set. Thus, if the spans of modules in $V_1$ and in $V_2$ are mixed in a linear placement, we can slide all modules in $V_1$ to the left and all modules in $V_2$ to the right to reduce the total wire length. In fact, this is the procedure to prove the theorem.

The min-cut with no size constraints can be found in polynomial time using classical maximum flow techniques [1]. However, it may happen that the optimal solution separates only $v_s$ or $v_t$ from the rest of the modules, i.e., $V_1 = \{v_s\}$ or $V_2 = \{v_t\}$. This result is very likely to occur because most VLSI basic modules have very small degrees of connecting nets (e.g., the degree of a 3-input NAND gate is 4).

3.1.2. Minimum Cost Ratio Cut

The cost ratio cut formulation supplies a partition different from the min-cut that separates two fixed modules. Thus, if the min-cut cannot provide any nontrivial solution, we may adopt the cost ratio cut to perform another trial.

In cost ratio cut, we fix two modules $v_s$ and $v_t$ at two different sides. Our objective is to find a vertex set $A$ to minimize a cost ratio function:

$$\frac{C(A, V - A - \{v_s\}) - C(A, \{v_s\})}{S(A)} \tag{3}$$

where vertex set $A$ contains neither $v_s$ nor $v_t$. Vertex set $A$ is non-empty, i.e., $S(A) > 0$.

Cost ratio cut is also strongly related to a linear placement. Assuming that all nets are two pin nets, we can derive the following theorem [22]:

THEOREM 3.2  Given a graph $G(V, E)$ with modules $v_s$ and $v_t$ in $V$, let $(V_1, V_2)$ be an optimal cost ratio cut partition. There exists an optimal linear placement solution such that all modules in $A$ are on the slots left of all modules in $V - A - \{v_s\}$.

Conceptually, we can conceive that $C(A, V - A - \{v_s\})$ is the force to pull $A$ to the right and $C(A, \{v_s\})$ is the force to push $A$ to the left. The denominator $S(A)$ is the inertia of the set $A$. A set $A$ with the minimum cost ratio moves with the fastest acceleration toward the left end of the slots.

Example  In Figure 3, the circuit contains six modules. The optimum cost ratio cut solution has $A = \{v_1, v_2, v_3\}$. The cost ratio value is

$$\frac{C(A, V - A - \{v_s\}) - C(A, \{v_s\})}{S(A)} = \frac{4 - 3}{3} = \frac{1}{3} \tag{4}$$

The cost ratio value of any other choice of set $A$ is larger than expression (4).

FIGURE 3 A six module circuit to illustrate the cost ratio cut.

The cost ratio cut solution can be found in polynomial time for a special case of series-parallel graphs [22]. We are unaware of algorithms for general cases. Note that the solution may have $V - A - \{v_s\}$ equal to the set $\{v_t\}$. In such a case, the partitioning result is not useful for decomposing the circuit.
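To make the cost ratio function of this subsection concrete, the following sketch evaluates it for every candidate set $A$ on a small hypothetical two-pin netlist. The graph is not the circuit of Figure 3, whose connections are not given in the text, and exhaustive enumeration is used only because the example is tiny.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical two-pin netlist: (module, module) -> connectivity.
edges = {("s", 1): 1, (1, 2): 2, (2, "t"): 1, ("s", 3): 1, (3, "t"): 2}
free = [1, 2, 3]            # modules other than the seeds v_s ("s") and v_t ("t")
size = {1: 1, 2: 1, 3: 1}   # unit module sizes


def conn_between(a, b):
    """Total connectivity of nets with one pin in a and the other in b."""
    return sum(c for (u, v), c in edges.items()
               if (u in a and v in b) or (u in b and v in a))


def cost_ratio(a):
    rest = ({"t"} | set(free)) - a           # V - A - {v_s}
    pull_right = conn_between(a, rest)       # C(A, V - A - {v_s})
    pull_left = conn_between(a, {"s"})       # C(A, {v_s})
    return Fraction(pull_right - pull_left, sum(size[v] for v in a))


candidates = [set(c) for r in range(1, len(free) + 1) for c in combinations(free, r)]
best = min(candidates, key=cost_ratio)
print(best, cost_ratio(best))
```

On this netlist the minimum is attained by `A = {1, 2}`, where the pull toward `t` and the pull toward `s` balance and the ratio is 0; taking all three free modules gives 1/3.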

3.1.3. Min-cut with Size Constraints

For min-cut with size constraints, we have lower and upper bounds on the partition size, $S_l$ and $S_u$, where $0 < S_l \le S_u < S(V)$ and $S_l + S_u = S(V)$. The bipartitioning problem is to divide vertex set $V$ into two nonempty partitions $V_1$, $V_2$, where $V_1 \cap V_2 = \emptyset$ and $V_1 \cup V_2 = V$, with the objective of minimizing cut count $C(V_1, V_2)$ and subject to the following size constraints:

$$S_l \le S(V_h) \le S_u \quad \text{for } h = 1, 2 \tag{5}$$

The min-cut problem with size constraints is NP complete [43]. However, because of the importance of the problem in many applications, many heuristic algorithms have been developed.

Random Partitioning  We use a random partition estimation of min-cut with size constraints to demonstrate that the quality variation of partitioning results can be significant. Let us simplify the case by assigning the modules with uniform size, i.e., $s_i = 1$ for all $v_i$ in $V$, and the nets with uniform connectivity, i.e., $c_i = 1$ for all $e_i$ in $E$.

Let us assume that the modules are partitioned into two sets $V_1$, $V_2$ with equal sizes: $S(V_1) = S(V_2)$. The partition is performed with an independent random process [10] so that each module has a 50% chance to go to either side. For a net $e_i$ of two pins, we can derive that net $e_i$ belongs to the cut set $E(V_1, V_2)$ with a 0.5 probability (Fig. 4). Similarly, we can derive that for a net $e_i$ of $k$ pins ($k > 2$), the probability that net $e_i$ belongs to cut set $E(V_1, V_2)$ is $(2^k - 2)/2^k$. This probability is larger than 0.5 and approaches one as $k$ increases. In other words, the expected cut count $C(V_1, V_2)$ is equal to or larger than half the number of nets. For example, a circuit of one million modules usually has a number of nets of the same order, i.e., $|E| = O(|V|)$, say $|E| = 1{,}000{,}000$. The expected cut count would be $C(V_1, V_2) \ge 500{,}000$. This number is much worse than the results we can achieve. In practice, the cut counts on circuits of a million modules are usually no more than several thousand [34, 36]. In other words, the probability that a net belongs to a cut set is small, below one percent for a circuit of one million gates.
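The probability $(2^k - 2)/2^k$ can be checked by enumerating all $2^k$ equally likely side assignments of the pins of a $k$-pin net: the net is uncut only in the two assignments that put every pin on the same side. The function name below is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product


def cut_probability(k: int) -> Fraction:
    """Probability that a k-pin net is cut when each pin independently
    lands on side 0 or side 1 with probability 1/2."""
    assignments = list(product((0, 1), repeat=k))
    cut = sum(1 for sides in assignments if len(set(sides)) == 2)
    return Fraction(cut, len(assignments))


probs = {k: cut_probability(k) for k in range(2, 7)}
for k, p in probs.items():
    print(k, p, float(p))
```

The enumeration gives 1/2 for a two-pin net and 3/4 for a three-pin net, and the value climbs toward one as the pin count grows.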

Suppose the two bounds of partitioned sizes are not equal, $S_l \ne S_u$. Using the proposed random graph model, the expected cut count $C(V_1, V_2)$ is proportional to the product of two sizes, i.e., $S(V_1) \times S(V_2)$. Consequently, the expected cut count is smallest if the size of one partition approaches the upper bound, $S(V_i) = S_u$, and the size of the other partition approaches the lower bound, $S(V_j) = S_l$. In practice, we do observe this behavior. One partition is fully loaded to its maximum capacity, while another partition is underutilized with a large capacity left unused. This phenomenon is not desirable for certain applications.

FIGURE 4 Four possible configurations of net $e_i = \{a, b\}$ in a random placement.

3.1.4. Ratio Cut

Ratio cut formulation integrates the cut count and a partition size balance criterion into a single objective function [87, 109]. Given a partition $(V_1, V_2)$ where $V_1$ and $V_2$ are disjoint and $V_1 \cup V_2 = V$, the objective function is defined as

$$\frac{C(V_1, V_2)}{S(V_1) \times S(V_2)} \tag{6}$$

The numerator of the objective function minimizes the cut count while the denominator avoids uneven partition sizes. Like many other partitioning problems, finding the ratio cut in a general network belongs to the class of NP-complete problems [87].

Example  Figure 5 shows a seven module example. The modules are of unit size and the nets are of unit connectivity. Partition $(V_1, V_2)$ has a cost $C(V_1, V_2)/(S(V_1) \times S(V_2)) = 2/(4 \times 3) = 1/6$. Any other partition corresponds to a much larger cost.
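The ratio cut objective can be explored by brute force on a small graph. The seven module graph below is hypothetical: it is built from a tightly connected group of four and a group of three joined by two nets, so that it reproduces the value $2/(4 \times 3)$ of the example, but it is not claimed to be the circuit of Figure 5.

```python
from fractions import Fraction
from itertools import combinations

modules = [1, 2, 3, 4, 5, 6, 7]
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),  # dense group of four
         (5, 6), (5, 7), (6, 7),                          # dense group of three
         (3, 5), (4, 6)]                                  # two nets joining the groups


def ratio(v1):
    """C(V1, V2) / (S(V1) x S(V2)) with unit sizes and unit connectivity."""
    v2 = set(modules) - v1
    cut = sum(1 for u, v in edges if (u in v1) != (v in v1))
    return Fraction(cut, len(v1) * len(v2))


# Enumerate each bipartition once by requiring module 1 to be in V1.
candidates = [{1} | set(c) for r in range(0, 6) for c in combinations(modules[1:], r)]
best = min(candidates, key=ratio)
print(best, ratio(best))
```

The search returns the split into the group of four and the group of three with ratio 1/6; isolating a single module, which a plain min-cut might prefer, costs at least 1/3 here.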

FIGURE 5 An example of seven modules, where partition $(V_1, V_2)$ is a minimum ratio cut.

The Clustering Property of the Ratio Cut  The clustering property of the ratio cut can be illustrated by a random graph model. Let us assume that the circuit is a uniformly distributed random graph, with uniform module sizes, i.e., $s_i = 1$. We construct the nets connecting each pair of modules with identical independent probability $f$. Consider a cut which partitions the circuit into two subsets $V_1$ and $V_2$ with comparable sizes $\alpha \times |V|$ and $(1 - \alpha) \times |V|$ respectively, where $0 < \alpha < 1$. The expected cut count equals the probability $f$ multiplied by the number of possible nets between $V_1$ and $V_2$.

$$\mathrm{Expec}(C(V_1, V_2)) = f \times |V_1| \times |V_2| = \alpha (1 - \alpha) |V|^2 \times f. \tag{7}$$

On the other hand, if another cut separates only one module $v_s$ from the rest of the modules, the expected cut count is

$$\mathrm{Expec}(C(\{v_s\}, V - \{v_s\})) = (|V| - 1) \times f \tag{8}$$

As $|V|$ approaches infinity, the value of Eq. (7) becomes much larger than Eq. (8).

This derivation provides another explanation why the min-cut separating two fixed modules tends to generate very uneven sized subsets. The very uneven sized subsets naturally give the lowest cut value. Therefore, the ratio value $C(V_1, V_2)/(S(V_1) \times S(V_2))$ is proposed to alleviate the hidden size effect. As a consequence, the expected value of this ratio is a constant with respect to different cuts:

$$\mathrm{Expec}\left(\frac{C(V_1, V_2)}{S(V_1) \times S(V_2)}\right) = f \tag{9}$$

Thus, if the nets of the graph are uniformly distributed, all cuts have the same expected ratio value. In other words, the choice of the cuts and the partition sizes does not make a difference in such a uniformly distributed random graph. In a general circuit different cuts generate different ratios. Cuts that go through weakly connected groups correspond to smaller ratio values. The minimum of all cuts according to their corresponding ratios defines the sparsest cut since this cut deviates the most from the expectation on a uniformly distributed graph.


3.2. Multi-way Partitioning

For multi-way partitioning, we discuss a k-way partitioning with fixed size constraints and a cluster ratio cut. These two problems are the extensions of the min-cut with fixed size constraints and the ratio cut from two-way to multi-way partitioning, respectively.

3.2.1. K-way Partitioning

For multi-way partitioning, we separate vertex set $V$ into $k$ disjoint subsets where $k > 2$, i.e., $(V_1, V_2, \ldots, V_k)$. There is an upper bound $S_u$ and a lower bound $S_l$ on the size of each subset $V_i$, i.e., $S_l \le S(V_i) \le S_u$.

There are different ways to formulate the cut cost because of the different criteria used to count the cost of multiple pin nets. In the following we list a few possible objective functions.

(i) Minimize the cut count,

$$C(V_1, V_2, \ldots, V_k) = \sum_{e_i \in E(V_1, V_2, \ldots, V_k)} c_i \tag{10}$$

(ii) Minimize the sum of cut counts of all vertex sets. Let us denote the cut count of vertex set $V_i$ to be $C(V_i) = \sum_{e_j \in E(V_i)} c_j$. The sum of cut counts of all subsets can be expressed as

$$\sum_{i=1}^{k} C(V_i) = \sum_{i=1}^{k} \sum_{e_j \in E(V_i)} c_j \tag{11}$$

Thus, the cost of a net connecting three subsets is more expensive than the same net connecting two subsets.

(iii) Minimize the maximum cut count of all subsets, i.e.,

$$\max_{1 \le i \le k} C(V_i) \tag{12}$$
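The three objective functions differ only in how a cut net is charged. A minimal sketch on a hypothetical six module netlist with a three-way partition shows the difference; the net names and the partition are assumptions chosen so that two nets span three subsets and one net spans two.

```python
# Hypothetical netlist and three-way partition with unit connectivity.
nets = {"e1": {1, 2}, "e2": {2, 3, 5}, "e3": {3, 4},
        "e4": {1, 4, 6}, "e5": {5, 6}, "e6": {2, 3}}
c = {name: 1 for name in nets}
parts = [{1, 2}, {3, 4}, {5, 6}]


def external(part):
    """E(V_i): nets with some, but not all, pins inside `part`."""
    return {n for n, pins in nets.items() if 0 < len(pins & part) < len(pins)}


cut_nets = set().union(*(external(p) for p in parts))
obj_cut_count = sum(c[n] for n in cut_nets)                 # objective (i)
per_part = [sum(c[n] for n in external(p)) for p in parts]  # C(V_i) for each subset
obj_sum = sum(per_part)                                     # objective (ii)
obj_max = max(per_part)                                     # objective (iii)
print(obj_cut_count, per_part, obj_sum, obj_max)
```

Three nets are cut, so objective (i) is 3. Objective (ii) is 8 because each net spanning three subsets is counted three times and the net spanning two subsets twice, and objective (iii) is 3.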

3.2.2. Cluster Ratio Cut

Cluster ratio cut is an extension of ratio cut from two-way partition to multiway partition. There is no bound on the size of each subset. Furthermore, the number of partitions, $k$, is not fixed, and instead is part of the objective function.

$$R_C = \min_{k > 1} \frac{C(V_1, V_2, \ldots, V_k)}{\sum_{1 \le i \le k-1} \sum_{j > i} S(V_i) \times S(V_j)} \tag{13}$$

Note that we can rewrite the denominator to reduce complexity of the derivation.

$$R_C = \min_{k > 1} \frac{C(V_1, V_2, \ldots, V_k)}{(1/2) \sum_{1 \le i \le k} S(V_i) \times [S(V) - S(V_i)]} \tag{14}$$

If the number of partitions is one, the denominator becomes zero. Thus, $k$ is restricted to be larger than one.

Example  Figure 6 shows a fifteen module circuit. The modules are of unit size and the nets are of unit connectivity. The square dot in the figure represents a hypernet. The partition shown by the dashed line is a minimum cluster ratio cut. The cost of the cut is

$$\frac{C(V_1, V_2, V_3, V_4)}{(1/2) \sum_{1 \le i \le 4} S(V_i) \times [S(V) - S(V_i)]} = \frac{4}{(1/2)[4(15-4) + 3(15-3) + 4(15-4) + 4(15-4)]} = \frac{1}{21} \tag{15}$$
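The arithmetic of the fifteen module example, and the equivalence of the pairwise-product denominator with its rewritten half-sum form, can be verified with a few lines. Only the subset sizes (4, 3, 4, 4) and the cut count of 4 are taken from the example; the variable names are assumptions.

```python
from fractions import Fraction
from itertools import combinations

sizes = [4, 3, 4, 4]   # S(V_1), ..., S(V_4) in the fifteen module example
cut = 4                # C(V_1, ..., V_4) as used in the example
total = sum(sizes)     # S(V) = 15

# Sum of S(V_i) x S(V_j) over all pairs i < j.
pairwise = sum(a * b for a, b in combinations(sizes, 2))
# Rewritten form: (1/2) * sum of S(V_i) x [S(V) - S(V_i)].
half_form = Fraction(sum(s * (total - s) for s in sizes), 2)

cluster_ratio = Fraction(cut, pairwise)
print(pairwise, half_form, cluster_ratio)
```

Both denominators evaluate to 84, so the cluster ratio of the example is 4/84 = 1/21.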

FIGURE 6 A fifteen module example to demonstrate cluster ratio cut.

The physical intuition of cluster ratio can be explained using a random graph model [10]. Let $G$ be a uniformly distributed random graph. We construct the nets connecting each pair of modules with identical independent probability $f$. Since the nets are uniformly distributed, the probability of finding a subgraph which is significantly denser than the rest of the graph is very small, meaning that there is no distinct cluster structure in $G$.

Consider a cut $E(V_1, V_2, \ldots, V_k)$; the expected value of $C(V_1, V_2, \ldots, V_k)$ equals

$$\mathrm{Expec}(C(V_1, V_2, \ldots, V_k)) = f \times \sum_{j=1}^{k-1} \sum_{i=j+1}^{k} |V_i| \times |V_j| \tag{16}$$

and the expected value of cluster ratio equals

$$\mathrm{Expec}(R_C) = \mathrm{Expec}\left(\frac{C(V_1, V_2, \ldots, V_k)}{\sum_{j=1}^{k-1} \sum_{i=j+1}^{k} |V_i| \times |V_j|}\right) = f \tag{17}$$

Since $f$ is a constant, all cuts have the same expected cluster ratio value. Therefore, if we use cluster ratio as the metric, all cuts would be equally favored, which is consistent with the fact that $G$ has no distinct clusters. However, in a general circuit, different cuts generate different ratio values. Cuts that go through weakly connected groups correspond to smaller ratio values. The minimum of all cuts according to their cluster ratio values defines the cluster structure of the circuit since this cut deviates the most from the cuts of a uniformly distributed graph.

3.3. Multi-level Partitioning

In multi-level partitioning [4, 23, 47, 58, 67, 68, 109, 110], the final result is represented by a tree structure. All the modules are assigned to the leaves of the tree. The tree is directed from the root toward the leaves. The level of a node is defined to be the maximum number of nodes to traverse to reach the leaves. Thus, the leaves are ranked level zero. Each node is one level above the maximum level of its children. When the level of the root is only one, the problem degenerates to two-way or multiway partitioning.

Each net $e_i$ spans a set of leaves. Given a set of leaves, there is a unique lowest common ancestor. The level of the lowest common ancestor is defined to be the level $l(e_i)$ of the net.

The cost of a net $e_i$ is defined to be the product of its connectivity $c_i$ and the weight $w(l(e_i))$ of level $l(e_i)$ for net $e_i$ to communicate, i.e., $c_i \times w(l(e_i))$. The cost of the multi-level partition is the sum of the cost of all nets, i.e., $\sum_{e_i \in E} c_i\, w(l(e_i))$.

3.3.1. J-level K-way Partitioning

When the root of the partitioning tree is at level $j$ and the number of branches of each node is no more than $k$, we call it a $j$-level $k$-way partition. We can set different communication weights for each level. Usually, the function is monotone, i.e., $w(l)$ is larger when level $l$ increases. The vertex set $V_i$ of each leaf has its size bounded by $S_l \le S(V_i) \le S_u$.

For electronic packaging, the tree is bounded by the number of external connections. We say that a leaf is covered by a node if there is a directed path from the node to the leaf in the tree representation. For each node $n_i$, we define $T_i$ to be the union of the modules in the leaves covered by node $n_i$. Let $E(T_i)$ be the external nets of $T_i$, i.e., $E(T_i) = \{e_j \mid 0 < |e_j \cap T_i| < |e_j|\}$. The cut count of each node should not exceed the capacity of the external connection of the packaging, i.e.,

$$C(T_i) = \sum_{e_j \in E(T_i)} c_j \le \mathrm{Cap}(l(n_i)) \tag{18}$$

where $\mathrm{Cap}(l(n_i))$ is the capacity of the external connection of level $l(n_i)$.

Example  Figure 7 shows an example of a 3-level 5-way partitioning structure. The leaves are at level 0 and the root is at level 3. Each node has at most five children. Net $e_i = \{v_1, v_2, v_3\}$ is covered by node $n_a$ at level $l(n_a) = 2$.

3.3.2. Generic Binary Tree

A generic binary tree structure [110] is proposed to simplify the multi-level partitioning. There is only one constant, $S_u$, to set in the binary tree. Thus, it is much easier to make a fair comparison between different algorithms.

In a generic binary tree, each internal node has exactly two children. The weight of each level is defined to be $w(l) = 2^l$. Thus, we have the objective function

$$\min \sum_{e_i \in E} c_i\, 2^{l(e_i)}$$

subject to the constraint on the capacity of the leaves, i.e., $S(V_i) \le S_u$, where $V_i$ is the vertex set of leaf $i$. The level of the root is adjusted according to the minimization of the objective function.

Example  Figure 8 illustrates a generic binary tree for partitioning. In this figure, the root is at level three. Each node has at most two children.

FIGURE 7 An example of a 3-level 5-way partitioning tree structure.

FIGURE 8 An example of a generic binary tree.

3.4. Replication Cut

In the replication cut problem, a subset of the circuit may be replicated to reduce the cut count of a partition [54, 64, 82]. In this section, we use a two-way partition to illustrate the problem. We fix two modules $v_s$ and $v_t$ at two sides of the cut. We use three vertex sets to represent the partition, $V_1$, $V_2$, and $R$, where $V_1$, $V_2$, and $R$ are disjoint and $V_1 \cup V_2 \cup R = V$, $v_s \in V_1$, $v_t \in V_2$. Subsets $V_1$ and $V_2$ are separated by the cut and subset $R$ is to be replicated at both sides (Fig. 9).

Each copy of $R$ needs to collect a complete set of input signals in order to compute the function properly. Thus, the nets from $V_1$ to $R$ and from $V_2$ to $R$ are duplicated. However, the output signals of $R$ can be obtained from either copy of $R$. For example, nets from the right side $R$ to $V_1$ in Figure 9(b) are not duplicated because $V_1$ gets inputs from the left side $R$. For the same reason, we do not replicate the nets from the left side $R$ to $V_2$.

FIGURE 9 Replication cut problem: (a) the three sets of nodes $V_1$, $R$ and $V_2$; (b) the duplicated circuit with $R$ being replicated.

Given two disjoint sets $V_1$ and $V_2$, let a replication cut $R(V_1, V_2)$ denote the cut set of a partitioning with $R = V - V_1 - V_2$ being duplicated. From Figure 9(b), we can see that $R(V_1, V_2)$ is the union of four directed cuts, that is,

$$R(V_1, V_2) = E(V_1 \to V_2) \cup E(V_2 \to V_1) \cup E(V_1 \to R) \cup E(V_2 \to R).$$

Let $S_l$ and $S_u$ denote the size limits on the two partitioned subsets. We state the Replication Cut Problem as follows:

Given a directed circuit $G$, we want to find a replication cut $R(V_1, V_2)$ with an objective

$$\min C_R(V_1, V_2) = \sum_{e_i \in R(V_1, V_2)} c_i \qquad (19)$$

subject to the size constraints

$$S_l \le S(V_1 \cup R) \le S_u \quad \text{and} \quad S_l \le S(V_2 \cup R) \le S_u,$$

and the feasible condition

$$V_1 \cap V_2 = \emptyset, \quad R = V - V_1 - V_2.$$

Interpretation of the Replication Cut Suppose we rewrite the replication cut in the format:

$$R(V_1, V_2) = E(V_1 \to R) \cup E(V_1 \to V_2) \cup E(V_2 \to V_1) \cup E(V_2 \to R) = E(V_1 \to \bar{V}_1) \cup E(V_2 \to \bar{V}_2),$$

where $\bar{V}_1$ and $\bar{V}_2$ denote the complementary sets of $V_1$ and $V_2$, i.e., $\bar{V}_1 = V - V_1$ and $\bar{V}_2 = V - V_2$. The cut set becomes the union of $E(V_1 \to \bar{V}_1)$ and $E(V_2 \to \bar{V}_2)$. We can interpret the cut set of the replication cut $R(V_1, V_2)$ as two directed cuts on the original circuit $G$ as shown in Figure 10.
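The two-directed-cut interpretation can be checked on a toy circuit. The following is a minimal sketch, assuming every net is a two-pin directed edge `(u, v, c)` from driver `u` to receiver `v` with cost `c`; this simplification and the function names are illustrative and are not part of the original formulation.

```python
def directed_cut(edges, src_set):
    """Indices of directed edges (u, v, c) that leave src_set."""
    return {k for k, (u, v, _) in enumerate(edges)
            if u in src_set and v not in src_set}


def replication_cut_count(edges, V1, V2):
    """Cut count of R(V1, V2): edges leaving V1 united with edges leaving V2."""
    crossing = directed_cut(edges, V1) | directed_cut(edges, V2)
    return sum(edges[k][2] for k in crossing)


def plain_cut_count(edges, left):
    """Ordinary two-way cut count with no replication."""
    return sum(c for u, v, c in edges if (u in left) != (v in left))


# Hypothetical circuit: module r both reads from and writes to module a,
# and drives two modules on the other side.
edges = [("s", "a", 1), ("a", "r", 1), ("r", "a", 1),
         ("r", "b", 1), ("r", "t", 1), ("b", "t", 1)]
V1, V2 = {"s", "a"}, {"b", "t"}          # R = {"r"} is replicated

with_replication = replication_cut_count(edges, V1, V2)
r_on_left = plain_cut_count(edges, V1 | {"r"})
r_on_right = plain_cut_count(edges, V1)
```

In this example only the net from `a` to `r` is counted when `r` is replicated, whereas placing `r` on either single side cuts two nets.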

3.5. Performance Driven Partitioning

The goal of performance driven partitioning is to generate a partition that satisfies some timing constraints. Due to the physical geometric distance and interface technology limitations, inter-partition delay contributes the dominant portion of signal propagation delay. Consequently, instead of minimizing the number of crossing nets as the only objective during partitioning, we should take into account the interpartition delay to satisfy the timing constraints.

Clock period is a major measure of circuit performance. It is determined by the longest signal propagation delay between registers. Each crossing net is associated with an interpartition delay $\delta$ determined by VLSI technologies. Given a path $p$ from one register to another register with no intervening registers, let $d_p$ be the sum of combinational block delays and $\hat{d}_p$ be the sum of interpartition delays along path $p$. The longest delay $d_p + \hat{d}_p$ among all paths $p$ should not exceed the clock period $T$, i.e.,

$$\max_p \,(d_p + \hat{d}_p) \le T. \qquad (20)$$

Now we state the performance-driven partitioning problem as follows:

Given hypergraph $H(V, E)$, clock period $T$, two size bounds $S_l$ and $S_u$, and interpartition delay $\delta$, find a partition $(V_1, V_2)$ with the minimum cut count, subject to $S_l \le S(V_1) \le S_u$, $S_l \le S(V_2) \le S_u$, and $\max_p (d_p + \hat{d}_p) \le T$.

Example In Figure 11, path $p$ starts at register $v_i$ and ends at register $v_j$. The path crosses the partition $(V_1, V_2)$ three times. Thus, the interpartition delay is $\hat{d}_p = 3\delta$.

Replication can improve the performance of the partitioned results [83]. In Figure 12(a), vertex set $R$ is located on the side of $V_2$. Path $p$ crosses the partition $(V_1, R \cup V_2)$ three times. By replicating vertex set $R$ (Fig. 12(b)), path $p$ needs to cross the partition only once.
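The timing constraint on a single register-to-register path can be sketched as follows. This is an illustration only: the path is assumed to be a list of module names, `side` maps each module to 1 or 2, and `block_delay` gives each module's combinational delay (registers are given delay 0 here); all of these names are hypothetical.

```python
def interpartition_delay(path, side, delta):
    """Number of partition crossings along the path, times delta."""
    crossings = sum(1 for u, v in zip(path, path[1:]) if side[u] != side[v])
    return crossings * delta


def meets_clock(path, side, block_delay, delta, T):
    """Check that block delay plus interpartition delay does not exceed T."""
    d_p = sum(block_delay[v] for v in path)
    return d_p + interpartition_delay(path, side, delta) <= T


# A path that crosses the cut three times, as in the example above.
path = ["reg_in", "a", "b", "c", "reg_out"]
side = {"reg_in": 1, "a": 2, "b": 1, "c": 2, "reg_out": 2}
block_delay = {"reg_in": 0, "a": 2, "b": 3, "c": 1, "reg_out": 0}

crossing_delay = interpartition_delay(path, side, delta=4)
ok_at_20 = meets_clock(path, side, block_delay, delta=4, T=20)
ok_at_15 = meets_clock(path, side, block_delay, delta=4, T=15)
```

With $\delta = 4$ the three crossings contribute a delay of 12, so the total of 18 fits a clock period of 20 but not 15.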

3.5.1. Retiming

Retiming shifts the locations of the registers to improve the system performance [76]. It is an effective approach to reduce the clock period. Moreover, the process also reduces the primary input to primary output latency, which is another important measure of circuit performance.

FIGURE 10 An interpretation of the replication cut, $R(V_1, V_2) = E(V_1 \to \bar{V}_1) \cup E(V_2 \to \bar{V}_2)$.

FIGURE 11 An illustration of performance driven partitioning.

FIGURE 12 Illustration of replication and its effect on partitioning. The figure shows path $p$ (a) before and (b) after vertex set $R$ is replicated.

As in [85], we assume that the combinational blocks are fine-grained. A module is called fine-grained if it can be split into several smaller modules. Alternatively, if a module cannot be split, it is called coarse-grained. The interpartition delay $\delta$ on crossing nets is inherently coarse-grained and cannot be split.

Given a path $p$, we use $r_p$ to denote the number of registers on the path. Let $W(i, j)$ denote the minimum $r_p$ among all possible paths $p$ from $v_i$ to $v_j$, i.e.,

$$W(i, j) = \min\{ r_p \mid p \in P_{ij} \},$$

where $P_{ij}$ is the set of all paths from module $v_i$ to $v_j$. We define a path $p$ from $v_i$ to $v_j$ as a W-critical path if $r_p$ equals $W(i, j)$; a W-critical path $p$ is also called an IO-W-critical path if modules $v_i$ and $v_j$ are the primary input and output, respectively.

(i) Iteration Bound While retiming can reduce the clock period of a circuit, there is a lower bound imposed by the feedback loops in the hypergraph [92]. Given a loop $l$, let $d_l$, $\hat{d}_l$ and $r_l$ be the sum of combinational block delays, the sum of interpartition delays, and the number of registers in loop $l$, respectively. The delay-to-register ratio of a loop is equal to $(d_l + \hat{d}_l)/r_l$. The iteration bound is defined as the maximum delay-to-register ratio, i.e.,

$$J(V_1, V_2) = \max\left\{ \frac{d_l + \hat{d}_l}{r_l} \;\middle|\; l \in L \right\}, \qquad (21)$$

where $L$ is the set of all loops. Note that the iteration bound of a given circuit yields a lower bound on the clock period achievable by retiming.

(ii) Latency Bound Let $p$ denote the IO-W-critical path with maximum path delay among all IO-W-critical paths from $v_i$ to $v_j$. Since the number of registers in path $p$ is equal to $W(i, j)$, the IO latency (i.e., $(W(i, j) - 1) \times T$) between $v_i$ and $v_j$ is not less than $d_p + \hat{d}_p$, where $T$ denotes the clock period, and $d_p$ and $\hat{d}_p$ are the sum of combinational block delays and the sum of interpartition delays on path $p$, respectively. Thus, we define the latency bound $M$ as follows [85, 86]:

$$M(V_1, V_2) = \max\{ d_p + \hat{d}_p \mid p \in P_{IOW} \}, \qquad (22)$$

where $P_{IOW}$ is the set of all IO-W-critical paths. The latency bound also imposes a lower bound on the system latency achievable by retiming. An all-pair shortest-path algorithm can be used to calculate the latency bound.

We have two reasons to use the iteration and latency bounds. (i) It is faster to calculate these bounds. (ii) The iteration and latency bounds are the lower bounds of the clock period and the system latency achievable by retiming, respectively. A partition with lower iteration and latency bounds can achieve a better clock period and system latency by using retiming. Therefore, we want to generate a partition with small iteration and latency bounds.

Statement of the Problem Now we state the performance-driven partitioning problem as follows: Given hypergraph $H(V, E)$, two numbers $\bar{J}$ and $\bar{M}$, size bounds $S_l$ and $S_u$, and interpartition delay $\delta$, find a partition $(V_1, V_2)$ with the minimum cut count, subject to $S_l \le S(V_1) \le S_u$, $S_l \le S(V_2) \le S_u$, $J(V_1, V_2) \le \bar{J}$, and $M(V_1, V_2) \le \bar{M}$.

Example Figure 13 illustrates the effect of replication on the iteration bound. Let us assume that the interpartition delay is $\delta = 4$. Before replication, the iteration bound is dominated by loop $l_1$. The bound is equal to

$$\frac{d_{l_1} + \hat{d}_{l_1}}{r_{l_1}} = \frac{8 + 2 \times 4}{4} = 4. \qquad (23)$$

After replication [85], the bound contributed by loop $l_1$ is equal to

$$\frac{d_{l_1} + \hat{d}_{l_1}}{r_{l_1}} = \frac{8}{4} = 2. \qquad (24)$$

The iteration bound is now dominated by the union of loops $l_1$ and $l_2$,

$$\frac{d_{l_1 + l_2} + \hat{d}_{l_1 + l_2}}{r_{l_1 + l_2}} = \frac{18 + 2 \times 4}{8} = 3.25, \qquad (25)$$

which is smaller than the iteration bound before replication.
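The arithmetic of this example can be reproduced with a short sketch. Each loop is assumed to be summarized as a triple (block delay, number of partition crossings, number of registers); the register count of 8 for the combined loop is inferred from the stated ratio of 3.25 and is an assumption, as is the function name.

```python
from fractions import Fraction


def iteration_bound(loops, delta):
    """Maximum delay-to-register ratio over loops given as (d_l, crossings, r_l)."""
    return max(Fraction(d + n * delta, r) for d, n, r in loops)


delta = 4
# Before replication: loop l1 has block delay 8, two crossings, four registers.
before = iteration_bound([(8, 2, 4)], delta)
# After replication: l1 no longer crosses the cut; the union of l1 and l2
# (block delay 18, two crossings, eight registers assumed) dominates.
after = iteration_bound([(8, 0, 4), (18, 2, 8)], delta)
```

Exact fractions are used so that the comparison between the two bounds is not affected by rounding.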

3.6. Clustering

Clustering [6] is similar to multiway partitioning in that the process groups modules into $k$ subsets. However, for clustering the number of subsets is usually much greater than for a typical multiway partitioning problem, e.g., $k \ge 10$.

Often, a clustering process is used as part of a divide and conquer approach. Thus, it is important to choose an objective function that fits the target application. If the goal is to reduce problem complexity, we set the objective function to be

$$\min \sum_{i=1}^{k} \frac{C(V_i)}{C_I(V_i)}, \qquad (26)$$

where the $V_i$'s are disjoint vertex sets and their union is equal to $V$. Function $C(V_i)$ is the external cut count of cluster $V_i$ and $C_I(V_i)$ is the count of nets connecting vertex set $V_i$, i.e., $C_I(V_i) = \sum_{e_j \in E(V_i)} c_j$.

For performance driven clustering, the objective function is to minimize the number of cuts between registers.

FIGURE 13 Illustration of replication and its effect on iteration bound.

4. MULTIPLE PIN NET MODELS

The handling of multiple pin nets strongly depends on the partitioning approach [102]. A proper model is needed to reflect the correct cut count and improve the efficiency. In this section, we first introduce a shift model which is used for iterations of shifting a module or swapping a pair of modules. We then describe a clique model which is used to replace a multiple pin net. The star and loop models are also built from two pin nets but are less complex than the clique model. Finally, a flow model is introduced for network flow approaches.

4.1. Shift Model

The shift model [101] for multiple pin nets is useful when we perturb the partition by shifting one module to a different vertex set or by swapping two modules between different vertex sets. Let us simplify the description by assuming only one module is shifted to a different vertex set. A swap of a pair of modules can be treated as two steps of module shifting.

For each shift, we want to update the cut count. We also want to update the potential change in cost for each module if it were to be shifted, so that we can rank the modules for the next move. Such cost revision can be expensive if the circuit has large nets which contain huge numbers of pins, e.g., hundreds of thousands of pins.

The shift model reduces the complexity of the cost revision by utilizing the property that, for huge nets, most shifts of their pins do not change the cost of the other pins in the net.

Let us simplify the description by considering two-way partitioning. The model can be extended to multiway partitioning according to the choice of objective functions. Let module $v_j$ be shifted from vertex set $V_1$ to $V_2$. The configuration of nets $e_i \in E(\{v_j\})$ connecting module $v_j$ is revised. For each net $e_i$, we denote $k_i$ to be the number of pins of $e_i$ in $V_1$ and $|e_i| - k_i$ the number of pins of $e_i$ in $V_2$ (Fig. 14). With respect to net $e_i$, we update the pin numbers $k_i$ and $|e_i| - k_i$ after module $v_j$ is shifted. We also update the cost of the modules in each net $e_i$ as follows.

1. If the revised $k_i \ge 2$, the potential cost of pins due to net $e_i$ is zero. For the case that $|e_i| - k_i = 1$, we increase the cut count by $c_i$ and set the potential cost of pins in $e_i$. Otherwise, the move has no effect on the cut count and potential cost.

2. If the revised pin count $k_i = 1$, the shift of the last pin of $e_i$ in $V_1$ will decrease the cut count by $c_i$. We then update the potential cost of this last pin.

3. If $k_i = 0$, the cut count reduces by $c_i$. However, the shift of any pin $v \in e_i$ from $V_2$ to $V_1$ will increase the cut count. Thus, in this case, we reflect the cost of potential shift on the pins of $e_i$, which takes $O(|e_i|)$ operations.
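The pin-count bookkeeping behind these cases can be sketched as below. The sketch covers only the cut-count update, not the potential-cost update of the other pins; the dictionaries `k1` (pins of each net in $V_1$), `size` ($|e_i|$), `cost` ($c_i$) and `nets_of` (nets touching a module) are hypothetical names chosen for illustration.

```python
def shift_module(v, nets_of, k1, size, cost, side, cut):
    """Shift module v to the other side and return the updated cut count.

    A net e is cut exactly when 0 < k1[e] < size[e], so each affected net
    needs only a constant-time check before and after the pin count changes.
    """
    for e in nets_of[v]:
        was_cut = 0 < k1[e] < size[e]
        k1[e] += -1 if side[v] == 1 else 1
        now_cut = 0 < k1[e] < size[e]
        cut += cost[e] * (int(now_cut) - int(was_cut))
    side[v] = 2 if side[v] == 1 else 1
    return cut


# Two nets: e0 = {a, b, c} and e1 = {a, d}; only d starts in V2.
nets_of = {"a": ["e0", "e1"], "b": ["e0"], "c": ["e0"], "d": ["e1"]}
size = {"e0": 3, "e1": 2}
cost = {"e0": 1, "e1": 1}
side = {"a": 1, "b": 1, "c": 1, "d": 2}
k1 = {"e0": 3, "e1": 1}
cut = 1                                   # only e1 is cut initially

cut = shift_module("d", nets_of, k1, size, cost, side, cut)
cut_after_d = cut                         # e1 is now entirely in V1
cut = shift_module("a", nets_of, k1, size, cost, side, cut)
cut_after_a = cut                         # both nets now span the cut
```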

FIGURE 14 Multiple pin net model of shifting process.

4.2. Clique of Two Pin Nets

Some researchers use cliques of two pin nets to model multiple pin nets. Given a multiple pin net $e_i$, we construct a clique of $(1/2)|e_i|(|e_i| - 1)$ two pin nets to connect all pairs of pins in the net. The clique model maintains the symmetric relation of the modules of the same net in the sense that the order of the pins in the net has no effect on the cost.

The weight of the two pin nets in the clique model is adjusted by some factor. One approach is to use $2/|e_i|$ to scale down the connectivity. The total weight of all the nets in the clique is $(2/|e_i|) \times (1/2)|e_i|(|e_i| - 1)c_i = (|e_i| - 1)c_i$. Note that it takes $|e_i| - 1$ two pin nets to form a spanning tree of $|e_i|$ modules.

Other factors have been proposed, such as $1/(|e_i| - 1)$, which is based on a different probability model. However, no factor can exactly reflect the cost of a multiple pin net.

Complexity of the Clique Model The complexity of the clique model is high. There are $O(|e_i|^2)$ two pin nets in a clique model. Suppose the processing of each two pin net takes constant time. It takes $O(|e_i|^2)$ operations to process a multiple pin net $e_i$. Therefore, in practice, if the pin number is larger than a threshold, the net is ignored in the process.
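A small calculation illustrates why no single factor reproduces the true cost. The sketch below assumes the $2/|e_i|$ weighting described above and computes the clique-model cost of a net whose pins are split between the two sides; the function name is illustrative.

```python
from fractions import Fraction


def clique_cut_cost(pins_left, pins_right, c=1):
    """Clique-model cost of one net split pins_left / pins_right.

    Each of the pins_left * pins_right crossing pairs is a two pin net
    of weight (2 / |e|) * c.
    """
    n = pins_left + pins_right
    return Fraction(2, n) * c * pins_left * pins_right


# A four pin net of cost 1: the true cut cost is 1 for any split that
# separates the pins, but the clique model charges different amounts.
uneven = clique_cut_cost(1, 3)
even = clique_cut_cost(2, 2)
uncut = clique_cut_cost(4, 0)
```

The even split is charged more than the uneven one, although both cut the net exactly once.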

4.3. Star of Two Pin Nets

A star model introduces less complexity than a clique model. Given a net $e_i$, we create a dummy module. The dummy module connects to every pin in $e_i$ with a two pin net. This module maintains the symmetry of the net. However, we need only $|e_i|$ two pin nets.

For the clique and star models, the cost of the partition depends on the number of pins on the two sides of the partition. The cost is higher when the pins are distributed more evenly on the two sides of the cut. Thus, these models discourage even partitioning of the pins in the nets.

4.4. Loop Model of Two Pin Nets

A loop model reflects the exact cut count [22]; however, it is sensitive to the order of the pins. We can derive a heuristic ordering of the pins using a linear placement. Modules are sequenced according to their x coordinates in the placement. We find the partition by collecting the modules according to the sequence.

Following the order of the modules in the x coordinates, we link the modules of a multiple pin net with two pin nets into a loop. We link the pins in a sequence (Fig. 15) alternating on every other module. The loop is formed by the two connections at the two ends.

A factor of $1/2$ is assigned to the two pin nets so that the cut count separating modules according to the sequence is one. The model remains correct even if any two consecutive modules in the sequence swap their order.

FIGURE 15 A loop model of multiple pin net where modules are placed on an x axis.

4.5. Flow Model

For the network flow approach, we consider each net $e_i$ as a pipe. A set of saturated pipes forms a bottleneck of the flow. The union of the saturated pipes becomes the cut of the circuit. In such a model, we set the capacity of the pipe equal to the corresponding connectivity $c_i$ [52].

Let $x_{iu}$ be the amount of flow from pin $v_i$ to net $e_u$, and $x_{uj}$ be the amount of flow from net $e_u$ to pin $v_j$ (Fig. 16). The total flow injected into the net should be smaller than or equal to its capacity and the incoming flow is equal to the outgoing flow, i.e.,

$$\sum_{v_i \in e_u} x_{iu} \le c_u, \qquad (27)$$

$$\sum_{v_i \in e_u} x_{iu} - \sum_{v_j \in e_u} x_{uj} = 0. \qquad (28)$$

FIGURE 16 A flow model with respect to net $e_u$.

5. APPROACHES

In this section we introduce several approaches to partitioning. We first discuss two methods for optimal solutions: a branch and bound method and a dynamic programming algorithm. The branch and bound method is effective in searching exhaustively for the optimal solution for small circuits. The dynamic programming method presented runs in polynomial time and finds an optimal partition for a special class of circuits.

We then explain a few heuristic algorithms: group migration, network flow, nonlinear programming, Lagrangian, and clustering methods. The group migration approach is a popular method in practice due to its flexibility and effectiveness. The network flow method gives us a different view of the partitioning problem by transforming the minimization of the cut count into the maximization of the flow via a duality in linear programming. This approach derives excellent results with respect to certain objective functions. The nonlinear programming method provides a global view of the whole problem. The Lagrangian method is a useful approach for performance driven problems. Finally, we depict a clustering method for the partitioning.

In most cases, we illustrate the method in question using two-way partitioning as the target problem. However, many methods can be extended to other problems or different objective functions. For example, we can apply group migration to multiway [98, 99] or multiple level partitioning problems [67, 68] with modification to the cost of the moves. Furthermore, some methods may be combined to solve a problem. For example, we can use clustering to reduce the size of an input circuit and then use group migration to find a partition of the reduced circuit with much greater efficiency [24, 59]. In fact, this strategy derives the best results in terms of CPU time and cut count in a recent benchmark [2].

5.1. Branch and Bound Method

The branch and bound method is an exhaustive search technique that may be effectively applied to the min-cut problem with size constraints for small cases. In the branch and bound process, the modules are first ordered in a sequence. For each module, we try placing it on either side of the cut.

The process can be represented by a complete binary tree with $|V|$ levels. The root of the tree is the first module in the sequence. The nodes in the $k$th level of the tree correspond to the $k$th module in the sequence. The two branches at each node represent the two trials where the $k$th module is placed on each of the two different sides. A path in the tree from the root to a leaf corresponds to one assignment for the partition.

We use a depth first search approach to traverse the binary tree. We prune the search space according to the size constraint and a partial cut count. In the binary tree, a node at level $k$ along with the path from the root to the node represents a partition assignment of the first $k$ modules. Let $V_1$ and $V_2$ be the two vertex sets of the partition of the first $k$ modules. If $S(V_i) > S_u$ for $i = 1$ or 2, the size constraint is violated, and there is no need to proceed. Thus, we prune the branches below.

We also use a partial cut count to prune the binary tree. The cut of the partial partition is expressed as $E(V_1, V_2) = \{ e_i \mid |e_i \cap V_1| > 0 \text{ and } |e_i \cap V_2| > 0 \}$. The partial cut count is described as $C(V_1, V_2) = \sum_{e_i \in E(V_1, V_2)} c_i$. If the partial cut count $C(V_1, V_2)$ is larger than the cut count of a known solution, the partition results below this node are going to be worse than the existing solution. We prune the branches of such a node.
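The search just described can be sketched in a few lines. This is an illustrative implementation under simplifying assumptions: modules have unit size, only the upper bound $S_u$ is enforced, nets are given as `(set_of_pins, cost)` pairs, and the partial cut count is recomputed from scratch at every node rather than maintained incrementally. It also prunes when the partial cut count merely equals the best known cut, since such a branch cannot improve on it.

```python
def branch_and_bound(modules, nets, s_u):
    """Exhaustive two-way min-cut search with size and cut-count pruning."""
    best = {"cut": float("inf"), "side": None}
    side = {}

    def partial_cut():
        # A net counts once it has assigned pins on both sides.
        total = 0
        for pins, c in nets:
            if len({side[v] for v in pins if v in side}) == 2:
                total += c
        return total

    def dfs(k, n1, n2):
        if n1 > s_u or n2 > s_u:          # size constraint violated
            return
        cut = partial_cut()
        if cut >= best["cut"]:            # cannot beat the known solution
            return
        if k == len(modules):             # leaf: complete assignment
            best["cut"], best["side"] = cut, dict(side)
            return
        v = modules[k]
        for s in (1, 2):                  # try both sides for module k
            side[v] = s
            dfs(k + 1, n1 + (s == 1), n2 + (s == 2))
            del side[v]

    dfs(0, 0, 0)
    return best["cut"], best["side"]


nets = [({"a", "b"}, 1), ({"c", "d"}, 1), ({"b", "c"}, 1)]
best_cut, best_side = branch_and_bound(["a", "b", "c", "d"], nets, s_u=2)
```

For this chain of four modules the only balanced partition of cut count 1 separates {a, b} from {c, d}.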

Complexity of the Method Suppose the circuit has unit size $s_i = 1$ on each module and the constraint requires an even size $S_l = S_u = |V|/2$ (assuming that $|V|$ is even). Applying Stirling's approximation [63], we have the number of possible partitions:

$$\frac{|V|!}{\left((|V|/2)!\right)^2} \approx \sqrt{\frac{2}{\pi |V|}}\; 2^{|V|}. \qquad (29)$$

Although the number of combinations is huge, we have found that the application to small circuits is practical. We improve the efficiency of the pruning by ordering the modules according to their degrees, i.e., the number of nets connecting to the modules, in descending order. With a careful implementation, we can find optimal solutions when the number of modules is small, e.g., $|V| \le 60$.

FIGURE 17 Construction of serial and parallel graphs: (a) serial process; (b) parallel process.

5.2. Dynamic Programming for a Serial and Parallel Graph

For the special case where the circuit can be represented by a serial and parallel graph of unit module size, we can find a minimum two-way partition $(V_1, V_2)$ with size constraints in polynomial time. In this section, we first describe the serial and parallel graph. We then depict a dynamic programming algorithm that solves the partitioning problem on this class of graphs. We assume that all modules are of unit size, i.e., $s_i = 1$.

A serial and parallel graph can be constructed from smaller serial and parallel graphs by a serial or parallel process. Each serial and parallel graph has a source module $v_s$ and a sink module $v_t$. A graph $G(V, E)$ with two modules, $V = \{v_s, v_t\}$, and one edge, $E = \{e\}$, $e = \{v_s, v_t\}$, is a basic serial and parallel graph. A serial and parallel graph is constructed from the basic graph by a series of serial and parallel processes.

Serial Process Given two serial and parallel graphs, $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$, we construct a serial and parallel graph $G(V, E)$ by merging the sink module $v_{t1}$ of $G_1$ and the source module $v_{s2}$ of $G_2$ (Fig. 17(a)). The source module $v_{s1}$ of graph $G_1$ becomes the source module of graph $G$, i.e., $v_s = v_{s1}$. The sink module $v_{t2}$ of graph $G_2$ becomes the sink module of graph $G$, i.e., $v_t = v_{t2}$.

Parallel Process Given two serial and parallel graphs, $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$, we construct a serial and parallel graph $G(V, E)$ by merging the source module $v_{s1}$ of $G_1$ and the source module $v_{s2}$ of $G_2$ and by merging the sink module $v_{t1}$ of $G_1$ and the sink module $v_{t2}$ of $G_2$ (Fig. 17(b)). The merged source module and merged sink module become the source module $v_s$ and the sink module $v_t$ of graph $G$, respectively.

Dynamic Programming The dynamic programming algorithm performs a bottom up process according to the construction of the serial and parallel graph. It starts from the basic serial and parallel graph. For each graph $G(V, E)$, we derive two tables.

$a(i, j)$: the minimum cut count with $i$ modules on the left hand side and $j$ modules on the right hand side under the condition that source module $v_s$ is on the left hand side and sink module $v_t$ is on the right hand side.

$b(i, j)$: the minimum cut count with $i$ modules on the left hand side and $j$ modules on the right hand side under the condition that both source module $v_s$ and sink module $v_t$ are on the left hand side.

Let graph $G(V, E)$ be constructed from $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$ by one of the serial and parallel processes. Let $a_1$, $b_1$ be the tables of graph $G_1$ and $a_2$, $b_2$ be the tables of graph $G_2$. We construct the tables $a$, $b$ of graph $G(V, E)$ as follows.

Table Formulas for Parallel Process

$$a(i, j) = \min_{k + m = |V_2|} a_1(i + 1 - k,\, j + 1 - m) + a_2(k, m), \quad \forall\, i + j = |V|, \qquad (30)$$

$$b(i, j) = \min_{k + m = |V_2|} b_1(i + 2 - k,\, j - m) + b_2(k, m), \quad \forall\, i + j = |V|. \qquad (31)$$

For table $a(i, j)$, we try all combinations of tables $a_1$ and $a_2$ with the constraint that the number of modules on the left hand side is $i$ and the number of modules on the right hand side is $j$. Note that the extra addition of 1 in the index is used to compensate for the merging of the two source modules or the two sink modules. For table $b(i, j)$, we try all combinations of tables $b_1$ and $b_2$ with the same size constraint.

Table Formula for Serial Process

$$a(i, j) = \min\Big( \min_{k + m = |V_2|} a_1(i - k,\, j + 1 - m) + b_2(m, k),\; \min_{k + m = |V_2|} b_1(i + 1 - k,\, j - m) + a_2(k, m) \Big), \quad \forall\, i + j = |V|, \qquad (32)$$

$$b(i, j) = \min\Big( \min_{k + m = |V_2|} a_1(i - k,\, j + 1 - m) + a_2(m, k),\; \min_{k + m = |V_2|} b_1(i + 1 - k,\, j - m) + b_2(k, m) \Big), \quad \forall\, i + j = |V|. \qquad (33)$$

For table $a(i, j)$, we try all combinations of tables $a_1$ and $b_2$ and all combinations of tables $b_1$ and $a_2$. For the combinations of tables $a_1$ and $b_2$, the merged module (by merging $v_{t1}$ and $v_{s2}$) is on the right hand side. For the combinations of tables $b_1$ and $a_2$, the merged module is on the left hand side. For table $b(i, j)$, we try all combinations of tables $a_1$ and $a_2$ and all combinations of tables $b_1$ and $b_2$. For the combinations of tables $a_1$ and $a_2$, the merged module is on the right hand side. In terms of $G_2$, its source module $v_{s2}$ is on the right hand side and its sink module $v_{t2}$ is on the left hand side. Thus, the indices of table $a_2$ are reversed, i.e., $a_2(m, k)$ instead of $a_2(k, m)$. For the combinations of tables $b_1$ and $b_2$, the merged module is on the left hand side.

5.3. Group Migration Algorithms

The group migration algorithm was first proposed by Kernighan and Lin [60] in 1970. Since then, many variations [15, 26, 27, 33, 39, 45, 49, 84, 97-99, 108, 111, 116] have been reported to improve the efficiency and effectiveness of the method. Today, it is still a popular method in practice.

The probability of finding the optimum solution in a single trial drops exponentially as the size of the circuit increases [60]. Using the original version, Kernighan and Lin showed that the probability of obtaining an optimal solution is a function of the problem size, $p(|V|) = 2^{-n/30}$, where $n = |V|$. In other words, if the circuit size is large, the heuristic Kernighan-Lin algorithm is unlikely to escape local minima, and so the optimum solution will not be found. Subsequent progress by researchers on the method has pushed this limit further.

In this section, we concentrate on two-way min-cut with size constraints. The method is flexible and can be extended to other partitioning problems with modifications of the moves and the cost function.

The algorithm performs a series of passes. At the beginning of a pass, each module is labeled unlocked. Once a module is shifted, it becomes locked in this pass. The group migration algorithm iteratively interchanges a pair of unlocked modules or shifts a single module to a different side with the largest reduction (gain) of the cost function. This continues until all modules are locked. The lowest cost along the whole sequence of moves is recorded. The group migration takes the subsequence that produces the lowest cut count and undoes the moves after the point of the lowest cost. This partitioning result is then used as the initial solution for the next pass. The algorithm terminates when a pass fails to find a result with a cost lower than the cost of the previous pass.

5.3.1. Group Migration Algorithm

Input: Hypergraph $H(V, E)$ and an initial partition. Cost function and size constraints.

1. One pass of moves.
   1.1 Choose and perform the best move.
   1.2 Lock the moved modules.
   1.3 Update the gain of unlocked modules.
   1.4 Repeat Steps 1.1-1.3 until all modules are locked or no move is feasible.
   1.5 Find and execute the best subsequence of the moves. Undo the rest of the sequence.
2. Use the previous result as an initial partition.
3. Repeat the pass (Steps 1 and 2) until there is no more improvement.

FIGURE 18 Cost of a sequence of moves and subsequence selection.

Figure 18 illustrates the cost of a sequence of moves. This algorithm escapes from local optima by a whole sequence of moves even when a single move may produce a negative gain.

In the following, we discuss variations of several parts in the process: basic moves (Step 1.1), data structure, gains (Steps 1.1 and 1.3). At the end of this subsection, we introduce a net based move and a simulated annealing approach.

5.3.2. Basic Moves

Basic moves cover the shifting of a single module and the swapping of a pair of modules. A swap can be conceived as two consecutive shifts, however, with consideration of the mutual effect between the two shifts.

(i) Module Shifting For each unlocked module, we check its gain: the cost function reduction by shifting the module to a different side assuming that the rest of the modules are fixed. To select the best module to shift, we order the modules on each side according to their shift gains. If the size constraints are violated after the shift, the move is not feasible. We search for the best feasible module to move [40].

(ii) Pairwise Swapping We exchange two modules in two vertex sets of the partition. Note that the gain of the swap is not equal to the sum of the gains of two shifts. The mutual effect between the two modules needs to be included when we derive the gain. Thus, the best pair may not be the two modules on the top of the two sides. The search of all pairs takes $O(|V_1||V_2|)$ operations. In practice, we order modules according to their shift gain. The search of the best pair is limited to the top $k$ modules on each side, e.g., $k = 3$. Thus, the complexity is actually $O(k^2)$.

Pairwise swapping is a natural choice when the size constraint is tight. When no single shift is feasible, we can use swapping to balance the size of the partition.
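The pass structure of Section 5.3.1, restricted to single-module shifts, can be sketched as follows. This is a deliberately naive illustration and not an efficient implementation: it recomputes the cut count for every candidate move instead of maintaining gains, assumes unit module sizes with only an upper bound `s_u`, and represents nets as `(set_of_pins, cost)` pairs with string module names. All names are hypothetical.

```python
def cut_count(nets, side):
    return sum(c for pins, c in nets if len({side[v] for v in pins}) == 2)


def one_pass(nets, side, s_u):
    """Steps 1.1-1.5: move until all modules are locked, keep the best prefix."""
    side = dict(side)
    locked = set()
    best_cost, best_side = cut_count(nets, side), dict(side)
    while True:
        count = {1: 0, 2: 0}
        for s in side.values():
            count[s] += 1
        candidates = []
        for v in side:
            target = 3 - side[v]
            if v in locked or count[target] + 1 > s_u:
                continue                      # locked or infeasible move
            trial = dict(side)
            trial[v] = target
            candidates.append((cut_count(nets, trial), v))
        if not candidates:
            break
        cost, v = min(candidates)             # best move, even if the gain is negative
        side[v] = 3 - side[v]
        locked.add(v)
        if cost < best_cost:                  # remember the best subsequence so far
            best_cost, best_side = cost, dict(side)
    return best_cost, best_side


def group_migration(nets, side, s_u):
    """Steps 2-3: repeat passes until a pass brings no improvement."""
    cost = cut_count(nets, side)
    while True:
        new_cost, new_side = one_pass(nets, side, s_u)
        if new_cost >= cost:
            return cost, side
        cost, side = new_cost, new_side


nets = [({"a", "b"}, 1), ({"c", "d"}, 1), ({"b", "c"}, 1)]
start = {"a": 1, "b": 2, "c": 1, "d": 2}      # all three nets are cut
final_cost, final_side = group_migration(nets, start, s_u=3)
```

Returning the best prefix from `one_pass` has the same effect as executing the whole sequence and then undoing the moves after the lowest-cost point.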


5.3.3. Data Structure

The choice of data structure strongly depends on the cost functions, gains, and the characteristics of VLSI circuitry. A sorting structure such as a heap or an AVL tree is a natural choice to sort for the top modules. However, when the gain takes only a very limited number of distinct values, an array structure can simplify the coding and reduce the complexity.

(i) Heap or AVL Tree We can use a heap or AVL tree to sort the modules according to their shift gain. Each side of the partition keeps a heap. The top of the heap is the module of the maximum gain. The sorting of the modules takes $O(|V| \log |V|)$ operations.

(ii) Array (Bucket) of Linked Lists Figure 19 illustrates a bucket list data structure. The gain is transformed to the index of the bucket [40]. Modules of the same gain are stored in the same bucket by a linked list. A bucket is an effective data structure when the objective function is the cut count. The gain of cut count is limited by the maximum degree of the modules, i.e., $deg_{max} = \max_{v_i \in V} \sum_{e \in E(\{v_i\})} c_e$. Thus, the dimension of the bucket is set to be $2\, deg_{max}$.

For VLSI applications, the degree of modules is much smaller than the number of modules. Thus, the dimension of the bucket is small. It is very efficient to search and revise the module order in the bucket structure. In fact, it is proven that, using the bucket structure and cut count as the objective function, each pass takes time linear in the total number of pins [40].
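A bucket structure of this kind can be sketched as below. The sketch departs from the description in two labeled ways: each bucket is a Python set rather than a linked list, and the array has $2\,deg_{max} + 1$ entries so that gains from $-deg_{max}$ to $+deg_{max}$ inclusive map to valid indices. The class and method names are hypothetical.

```python
class GainBuckets:
    """Array of buckets indexed by gain, for one side of the partition."""

    def __init__(self, deg_max):
        self.offset = deg_max                       # gain g lives at index g + deg_max
        self.buckets = [set() for _ in range(2 * deg_max + 1)]
        self.gain = {}                              # module -> current gain
        self.max_index = -1                         # upper bound on highest non-empty bucket

    def insert(self, module, gain):
        idx = gain + self.offset
        self.buckets[idx].add(module)
        self.gain[module] = gain
        self.max_index = max(self.max_index, idx)

    def remove(self, module):
        self.buckets[self.gain.pop(module) + self.offset].discard(module)

    def update(self, module, new_gain):
        """Move a module to the bucket of its revised gain."""
        self.remove(module)
        self.insert(module, new_gain)

    def pop_max(self):
        """Remove and return a module of maximum gain, or None if empty."""
        while self.max_index >= 0 and not self.buckets[self.max_index]:
            self.max_index -= 1
        if self.max_index < 0:
            return None
        module = self.buckets[self.max_index].pop()
        del self.gain[module]
        return module


buckets = GainBuckets(deg_max=2)
buckets.insert("a", 1)
buckets.insert("b", -2)
buckets.insert("c", 2)
buckets.update("c", 0)          # gain of c drops after a neighbouring move
order = [buckets.pop_max() for _ in range(4)]
```

Insertion, removal and gain updates take constant time, and the downward scan in `pop_max` is bounded by the small bucket dimension.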

5.3.4. Gains

In this subsection, we use cut count as the objective function. The extension to other cost functions is possible. However, we may lose efficiency.

(i) Shift Gain We use the shift model for multiple pin nets. Given a module $v_i$, we check the set $E(\{v_i\})$ of nets connecting to this module. The contribution of each net $e \in E(\{v_i\})$ by shifting module $v_i$ is the gain $g_e(v_i)$ of the net with respect to module $v_i$. The gain $g(v_i)$ of module $v_i$ is the total gain of all its adjacent nets, i.e., $g(v_i) = \sum_{e \in E(\{v_i\})} g_e(v_i)$.

(ii) Swap Gain The swap gain is the sum of the gains of two modules $v_i$ and $v_j$, deducting the effect on common nets, i.e., $g(v_i) + g(v_j) - \sum_{e \in E(\{v_i\}) \cap E(\{v_j\})} (g_e(v_i) + g_e(v_j))$.
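Both gains can be written down directly for the cut count objective. The sketch below assumes nets are `(set_of_pins, cost)` pairs and that $g_e(v)$ is $+c_e$ when $v$ is the only pin of a cut net on its side, $-c_e$ when the whole net lies on $v$'s side, and 0 otherwise; the swap gain follows the expression given above. Function names are illustrative.

```python
def net_gain(pins, cost, v, side):
    """g_e(v): change in cut count contributed by net e if v alone is shifted."""
    same = sum(1 for u in pins if side[u] == side[v])
    other = len(pins) - same
    if other == 0:
        return -cost if len(pins) > 1 else 0   # an uncut net becomes cut
    if same == 1:
        return cost                            # v was the last pin on its side
    return 0


def shift_gain(nets, v, side):
    return sum(net_gain(p, c, v, side) for p, c in nets if v in p)


def swap_gain(nets, vi, vj, side):
    common = sum(net_gain(p, c, vi, side) + net_gain(p, c, vj, side)
                 for p, c in nets if vi in p and vj in p)
    return shift_gain(nets, vi, side) + shift_gain(nets, vj, side) - common


nets = [({"a", "b"}, 1), ({"a", "c"}, 1)]
side = {"a": 1, "b": 2, "c": 2}               # both nets are cut
g_a = shift_gain(nets, "a", side)
g_b = shift_gain(nets, "b", side)
g_swap = swap_gain(nets, "a", "b", side)
```

Swapping `a` and `b` leaves their common net cut and uncuts the net to `c`, so the swap gain is 1 even though the two shift gains sum to 3.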

(iii) Weights of Multipin Nets The sequence of the moves depends much on the gain calculation. For a circuit of 1,000,000 modules, suppose the degree of most modules is less than 100 and each net is of unit weight. We have roughly 1,000,000 modules / 200 gain levels = 5,000 modules per gain level. To differentiate these 5,000 modules, we have to adjust the weight of multiple pin nets.

FIGURE 19 Bucket list.

(iii) (a) Levels with Priority The first level gain is identical to the shift gain of cut count. The second level gain is equal to the number of nets that have one more pin on the same side. Thus, the $k$th level gain is equal to the number of nets that have $k$ more pins on the same side [65]. The pins on the other side will increase by one after the module is shifted. Thus, the negative gain of level $k$ is contributed by the nets with $k - 1$ pins on the other side.

Let us assume that module $v_i$ is in vertex set $V_1$ to simplify the notation. For each net $e_j \in E(\{v_i\})$, we denote by $k_j = |e_j \cap V_1|$ the number of pins in $V_1$. Let us define $E(+, i, k)$ to be the set of nets $e_j \in E(\{v_i\})$ with $k_j = k + 1$ pins in $V_1$ (the extra one is used to count module $v_i$ itself) and nonzero pins in $V_2$, i.e., $|e_j| > k_j$, and $E(-, i, k)$ to be the set of nets $e_j \in E(\{v_i\})$ with no other pins in $V_1$ and $k - 1$ pins in $V_2$, i.e., $|e_j| = k$ and $k_j = 1$. Then, the $k$th level gain of module $v_i$, $g_i(k)$, is the weight difference of the two sets, $E(+, i, k)$ and $E(-, i, k)$:

$$g_i(k) = \sum_{e \in E(+, i, k)} c_e - \sum_{e \in E(-, i, k)} c_e, \qquad (34)$$

$$E(+, i, k) = \{ e_j \mid e_j \in E(\{v_i\}),\ k_j = k + 1,\ |e_j| > k_j \}, \qquad (35)$$

$$E(-, i, k) = \{ e_j \mid e_j \in E(\{v_i\}),\ |e_j| = k,\ k_j = 1 \}. \qquad (36)$$

We compare the modules with a priority on the lower level gain. In other words, we compare the first level first. If the modules are equal at the first level gain, we then compare the second level and so on. In practice, we limit the number of levels by a threshold, e.g., $k \le 3$.

(iii) (b) Probabilistic Gain In the probabilistic gain model [37], each module $v_i$ is assigned a weight $p(v_i)$. The weight $p(v_i)$ is a function of the gain $g(v_i)$ of module $v_i$ to reflect the belief level (potential) that the shift of module $v_i$ will be executed at the end of the pass. Thus, if module $v_i$ is unlocked,

$$p(v_i) = f(g(v_i)). \qquad (37)$$

Otherwise, $p(v_i) = 0$. Figure 20 illustrates function $f$, which increases monotonically. The slope between $g_o$ and $g_{up}$ amplifies the difference of gains. The function is clamped at the two ends $p_{max}$ and $p_{min}$ ($0 \le p_{min} < p_{max} \le 1$), which represent the maximum potential that the module will shift or stay.

For each net $e \in E(\{v_i\})$, its contribution $g_e(v_i)$ to the gain of module $v_i$ is the tendency that the whole net will shift with module $v_i$ to the other side. To simplify the notation, let us assume that module $v_i$ is in $V_1$. Thus, we have the following expression:

$$g_e(v_i) = c_e \times \Big( \prod_{j \ne i,\; v_j \in e \cap V_1} p(v_j) - \prod_{v_j \in e \cap V_2} p(v_j) \Big), \qquad (38)$$

where $\prod_{v_j \in s} p(v_j) = 1$ if $s$ is an empty set. The first term $\prod_{j \ne i,\, v_j \in e \cap V_1} p(v_j)$ in the parentheses is the potential that all the pins will shift with module $v_i$ to $V_2$. Hence, $c_e \times \prod_{j \ne i,\, v_j \in e \cap V_1} p(v_j)$ is the expected gain if module $v_i$ is shifted. The second term $\prod_{v_j \in e \cap V_2} p(v_j)$ is the potential that the pins in $V_2$ will shift to $V_1$. Thus, $c_e \times \prod_{v_j \in e \cap V_2} p(v_j)$ is the expected loss if module $v_i$ is shifted.

The gain of a module $v_i$ is the total gain of the adjacent nets with respect to this module, i.e.,

$$g(v_i) = \sum_{e \in E(\{v_i\})} g_e(v_i). \qquad (39)$$

FIGURE 20 Function of probabilistic gain.


Net gain $g_e(v_i)$ and module potential $p(v_i)$ are mutually dependent. We derive the values via iterations. Initially, we use the plain shift gain (by cut count) to derive the potential $p(v_i) = f(g(v_i))$. From these initial potentials, we derive the probabilistic net gain. The net gain is then used to derive the module gain. In practice, we stop after a limited number of cycles, e.g., two iterations [37]. Note that there is no guarantee that the iteration will converge.

After each move, the associated module potentials and probabilistic net gains are updated and the plain cut count is recorded. The exact cut count is used when we select the subsequence of moves to execute.

Experiments on benchmarks released by ACM/SIGDA show that the probabilistic gain model produces excellent partitioning results; it outperforms the other gain models by wide margins.
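The iteration between potentials and gains can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of [37]: the linear ramp inside `potential`, the threshold values, and all function names are assumptions made for the example, and every module is treated as unlocked.

```python
from math import prod

def potential(g, g_lo=-1.0, g_up=1.0, p_min=0.2, p_max=0.8):
    """Clamped, increasing f(g) of Eq. (37); the linear ramp is an assumption."""
    if g <= g_lo:
        return p_min
    if g >= g_up:
        return p_max
    return p_min + (p_max - p_min) * (g - g_lo) / (g_up - g_lo)

def plain_gain(nets, side, v):
    """Shift gain by cut count, used only to seed the iteration."""
    gain = 0.0
    for net, weight in nets:
        if v in net:
            same = sum(side[u] == side[v] for u in net)
            other = len(net) - same
            if same == 1 and other > 0:      # moving v uncuts the net
                gain += weight
            elif other == 0 and same > 1:    # moving v cuts the net
                gain -= weight
    return gain

def net_gain(net, weight, v, side, p):
    """Eq. (38); math.prod of an empty sequence is 1, matching the text."""
    same = [u for u in net if u != v and side[u] == side[v]]
    other = [u for u in net if side[u] != side[v]]
    return weight * (prod(p[u] for u in same) - prod(p[u] for u in other))

def probabilistic_gains(nets, side, cycles=2):
    """Alternate Eq. (37) and Eqs. (38)-(39) for a fixed number of cycles."""
    gain = {v: plain_gain(nets, side, v) for v in side}
    for _ in range(cycles):
        p = {v: potential(gain[v]) for v in side}
        gain = {v: sum(net_gain(net, w, v, side, p)
                       for net, w in nets if v in net) for v in side}
    return gain
```

Here `nets` is a list of `(pins, weight)` pairs and `side` maps each module to its current vertex set. A locked module would simply be given potential 0, which this sketch leaves out.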

5.3.5. Net-based Move

The net based process [32, 115] is similar to the module based approach except that all operations are based on the concept of the critical and complementary critical sets. The main differences are: (1) Instead of a single module, each move now shifts one critical or complementary critical set, depending on the type of objective function. For convenience, we say a move is initiated by a net $e_u$ if this move is composed of shifting the critical or complementary critical set associated with $e_u$. (2) The locking mechanism is operated on a net, that is, if the critical or complementary critical set of a net has been moved, then all the moves initiated by this net will be prohibited thereafter.

Given a net $e_u$ and a vertex set $V_b$, let us define the critical set of net $e_u$ with respect to set $V_b$ as

$$S_{ub} = e_u \cap V_b, \tag{40}$$

and the complementary critical set of $e_u$ with respect to set $V_b$ as

$$\bar{S}_{ub} = e_u - V_b. \tag{41}$$

For a move associated with a net $e_u$, we can either place the critical set $S_{ub}$ into a partition other than $V_b$, or the complementary critical set $\bar{S}_{ub}$ into the partition $V_b$. The gain of each move is then computed by evaluating the change of the cost due to the move of the critical or complementary critical set.

Usage of Basic Module Moves Although the net-based move model provides a different process to improve the current partition, it is more expensive than the module-based move model because more modules are involved in each move.

We can mimic the net based move by adding weights to the connectivity of desired nets [38]. The basic move is still based on the modules. However, after module $v_i$ is moved, we add more weight on the nets connecting to $v_i$, i.e., $E(\{v_i\})$. These extra weights encourage the adjacent modules to go along with module $v_i$ and thus achieve the effect of a net based move. Empirical study finds improvement on the partitioning results.
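As a small illustration of the two set definitions above, the following Python sketch computes the critical and complementary critical sets of a net with plain set operations. The function names and the example net are hypothetical.

```python
def critical_set(net, block):
    """Pins of the net that lie inside the vertex set (net intersected with block)."""
    return set(net) & set(block)

def complementary_critical_set(net, block):
    """Pins of the net that lie outside the vertex set (net minus block)."""
    return set(net) - set(block)

# Hypothetical net with four pins and a vertex set holding two of them.
e_u = {"a", "b", "c", "d"}
V_b = {"a", "b", "x"}
inside = critical_set(e_u, V_b)                  # candidates to leave V_b
outside = complementary_critical_set(e_u, V_b)   # candidates to join V_b
```

Moving `inside` out of `V_b`, or `outside` into it, places every pin of the net on one side, which is why a single net-based move can remove the net from the cut.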

5.3.6. Simulated Annealing Approach

For simulated annealing [14, 20, 56, 62, 81], we can adopt the basic moves such as module shifting and pairwise swapping. There is no need for a lock mechanism. To allow a larger search space, we incorporate the size constraints into the objective function, e.g.,

$$C(V_1, V_2) + \alpha (S(V_1) - S(V_2))^2, \tag{42}$$

where $\alpha$ is a coefficient. We can adjust it according to the annealing temperature. As the temperature drops, we gradually increase $\alpha$ to enforce the size balance.
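A minimal Python sketch of this penalized objective, together with a bare annealing loop built on module shifting, is given below. The cooling schedule, the choice $\alpha = 1/T$, and all names are assumptions made for illustration; the text only requires that $\alpha$ grow as the temperature drops.

```python
import math
import random

def cut_count(nets, side):
    """Total connectivity of two pin nets whose pins lie on different sides."""
    return sum(c for (u, v), c in nets.items() if side[u] != side[v])

def annealing_cost(nets, side, size, alpha):
    """Expression (42): cut count plus a quadratic size-imbalance penalty."""
    s1 = sum(size[v] for v in side if side[v] == 1)
    s2 = sum(size[v] for v in side if side[v] == 2)
    return cut_count(nets, side) + alpha * (s1 - s2) ** 2

def anneal(nets, side, size, t_start=2.0, t_end=0.05, cooling=0.9,
           moves=50, seed=0):
    rng = random.Random(seed)
    side = dict(side)
    modules = sorted(side)
    t = t_start
    while t > t_end:
        alpha = 1.0 / t                      # assumed schedule: grows as t drops
        for _ in range(moves):
            v = rng.choice(modules)
            old = annealing_cost(nets, side, size, alpha)
            side[v] = 3 - side[v]            # shift v to the other side
            delta = annealing_cost(nets, side, size, alpha) - old
            # Metropolis rule: always keep improvements, sometimes keep losses.
            if delta > 0 and rng.random() >= math.exp(-delta / t):
                side[v] = 3 - side[v]        # reject: undo the shift
        t *= cooling
    return side
```

Recomputing the full cost for every move keeps the sketch short; a practical implementation would update the cut count and the two sizes incrementally.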

5.4. Flow Approaches

In this section, we assume that the circuit can be represented by a graph $G(V, E)$ with unit module size, i.e., $s_i = 1$, and that all nets are two pin nets. The flow approach can be extended to multiple pin nets using a flow model.


We first go through maximum flow minimum cut [1, 73] to introduce the duality [30] and the concept of shadow price. The derivation is then extended to a weighted cluster ratio cut and a replication cut. Finally, we introduce heuristic algorithms that accelerate the flow calculation. The flow approach can derive excellent results. Furthermore, exploiting its duality formulation, we can derive a tight bound on the optimal solutions.

5.4.1. Maximum Flow Minimum Cut

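In the maximum flow minimum cut formulation, flow is injected into module $v_s$ and drained from module $v_t$. The flow is conserved at all other modules. The capacity of net $e_{ij}$ is equal to its connectivity $c_{ij}$. We set $c_{ij} = 0$ if there is no net connecting modules $v_i$ and $v_j$. The notation $x_{ij}$ denotes the amount of flow from module $v_i$ to module $v_j$, and $x_{ji}$ denotes the amount of flow from module $v_j$ to module $v_i$, on net $e_{ij}$. The objective is to maximize the flow injection $f$ into $v_s$.

$$\text{Obj:} \quad \max f \tag{43}$$

subject to the constraints,

$$x_{ij} + x_{ji} \le c_{ij}, \quad \forall\, 1 \le i, j \le |V| \tag{44}$$

$$\sum_{j=1}^{|V|} x_{sj} - \sum_{j=1}^{|V|} x_{js} - f = 0 \tag{45}$$

$$\sum_{j=1}^{|V|} x_{tj} - \sum_{j=1}^{|V|} x_{jt} + f = 0 \tag{46}$$

$$\sum_{j=1}^{|V|} x_{ij} - \sum_{j=1}^{|V|} x_{ji} = 0, \quad \forall\, 1 \le i \le |V|,\ v_i \ne v_s, v_t \tag{47}$$

$$x_{ij} \ge 0, \quad \forall\, 1 \le i, j \le |V|. \tag{48}$$

To derive the duality, we use shadow prices: a bidirectional distance $d_{ij}$ for each net $e_{ij}$ (Eq. (44)), and a potential $\lambda_i$ for each module $v_i$ (Eqs. (45)-(47)). The dual problem can be expressed as follows [30].

$$\text{Obj:} \quad \min \sum_{e_{ij} \in E} c_{ij} d_{ij} \tag{49}$$

subject to

$$d_{ij} \ge |\lambda_i - \lambda_j|, \quad \forall\, 1 \le i, j \le |V|, \tag{50}$$

$$\lambda_t - \lambda_s \ge 1. \tag{51}$$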

Figure 21 illustrates the formulation. As we increase the flow, certain nets are going to saturate, i.e., the two sides of inequality expression (44) become equal. Once the saturated nets become a bottleneck of the flow, the set of nets forms a cut $E(V_1, V_2)$ with $v_s \in V_1$ and $v_t \in V_2$. In duality, the potential of modules in $V_2$ increases to one, and the potential of modules in $V_1$ remains zero, i.e., $\lambda_i = 1$, $\forall v_i \in V_2$ and $\lambda_i = 0$, $\forall v_i \in V_1$.

FIGURE 21 Illustration of maximum flow minimum cut formulation.


The distance of nets in the cut is one, while the distance of nets outside the cut is zero, i.e., $d_{ij} = 1$, $\forall e_{ij} \in E(V_1, V_2)$ and $d_{ij} = 0$, $\forall e_{ij} \notin E(V_1, V_2)$.
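The primal and dual views can be checked on a small example. The Python sketch below uses a textbook augmenting-path method (Edmonds-Karp), which is one of several ways to solve the primal problem and is not prescribed by the text. Each two pin net is modeled as a pair of opposite arcs of equal capacity; the circuit and all names are hypothetical.

```python
from collections import defaultdict, deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp on arcs {(u, v): c}; returns (flow value, source side)."""
    residual = defaultdict(lambda: defaultdict(int))
    for (u, v), c in capacity.items():
        residual[u][v] += c
        residual[v][u] += 0                  # make sure the reverse arc exists
    total = 0
    while True:
        parent = {source: None}              # breadth-first search
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:               # no augmenting path is left
            return total, set(parent)
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(residual[u][w] for u, w in path)
        for u, w in path:
            residual[u][w] -= delta
            residual[w][u] += delta
        total += delta

# Hypothetical circuit: two pin nets with their connectivities c_ij.
nets = {("s", "a"): 3, ("s", "b"): 3, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 2}
arcs = {}
for (u, v), c in nets.items():
    arcs[(u, v)] = c
    arcs[(v, u)] = c
flow, V1 = max_flow(arcs, "s", "t")

# Dual solution: potential 0 on V1 and 1 on V2, distance |lambda_i - lambda_j|.
lam = {v: 0 if v in V1 else 1 for net in nets for v in net}
dual = sum(c * abs(lam[u] - lam[v]) for (u, v), c in nets.items())
```

For this circuit the saturated nets are the two nets entering `t`, so the potentials and distances reproduce the pattern described above and `dual` equals `flow`.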

5.4.2. The Weighted Cluster Ratio Metric and a Uniform Multi-commodity Flow Problem

In a uniform multi-commodity flow problem [74, 75], the demand of flow between each pair of modules is equal to an identical value $f$. As we keep increasing $f$, some of the nets become saturated. These saturated nets form a bottleneck of communication and thus prescribe a potential clustering of the communication system [71].

We simplify the notation by assuming a graph model $G(V, E)$. From each module $v_p$, we inject flow $f/2$ to each of the remaining modules. Summing up the flow in two directions, the flow between each pair of modules is $f$. We define the flow originating from module $v_p$ as commodity $p$. Let $x_{ij}^{(p)}$ be the flow for commodity $p$ on net $e_{ij}$. The objective is to maximize $f$:

$$\text{Obj:} \quad \max f \tag{52}$$

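subject to the flow demand from module $v_p$ to the other modules

$$\sum_{j=1}^{|V|} x_{ij}^{(p)} - \sum_{j=1}^{|V|} x_{ji}^{(p)} = \begin{cases} -f/2 & \text{if } i \ne p, \\ (|V|-1) f/2 & \text{if } i = p, \end{cases} \quad \forall\, 1 \le i, p \le |V|, \tag{53}$$

and the net capacity constraint,

$$\sum_{p=1}^{|V|} x_{ij}^{(p)} + \sum_{p=1}^{|V|} x_{ji}^{(p)} \le c_{ij}. \tag{54}$$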

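We transform the above linear programming problem to its dual expression by assigning dual variables $\lambda_i^{(p)}$ to module $v_i$ with respect to commodity $p$ (Eq. (53)), and distance $d_{ij}$ to net $e_{ij}$ (Eq. (54)). Then we have:

$$\text{Obj:} \quad \min \sum_{e_{ij} \in E} c_{ij} d_{ij} \tag{55}$$

subject to

$$d_{ij} \ge |\lambda_i^{(p)} - \lambda_j^{(p)}|, \quad \forall\, 1 \le i, j, p \le |V| \tag{56}$$

$$\frac{1}{2} \sum_{p=1}^{|V|} \sum_{i=1,\, i \ne p}^{|V|} \left( \lambda_i^{(p)} - \lambda_p^{(p)} \right) \ge 1 \tag{57}$$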

The Properties of Shadow Prices The shadow price $d$ can be viewed as bidirectional, i.e., $d_{ij} = d_{ji}$. It represents the distance of net $e_{ij}$, which corresponds to the cost to transmit flow through $e_{ij}$. Variable $\lambda_i^{(p)}$ is the potential of module $v_i$ with respect to commodity $p$.

From constraints (56), (57), we can derive two properties for distance function $d_{ij}$ and potential $\lambda_i^{(p)}$.

Property I: Triangular Inequality The distance metric $d$ satisfies the triangular inequality:

$$d_{ij} + d_{jk} \ge d_{ik}, \quad \forall v_i, v_j, v_k \in V \tag{58}$$

Property II: Potential Function The term $\lambda_i^{(p)} - \lambda_p^{(p)}$ in expression (57) is equal to the shortest distance between modules $v_i$ and $v_p$ based on net distances $d_{ij}$. In fact, from the triangular inequality, we obtain $\lambda_i^{(p)} - \lambda_p^{(p)} = d_{ip}$.

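We normalize the objective function (55) with the left hand side terms of inequality (57). The objective function can be expressed as:

$$\text{Obj:} \quad \min \frac{\sum_{e_{ij} \in E} c_{ij} d_{ij}}{\frac{1}{2} \sum_{p=1}^{|V|} \sum_{i=1,\, i \ne p}^{|V|} \left( \lambda_i^{(p)} - \lambda_p^{(p)} \right)} = \min \frac{\sum_{e_{ij} \in E} c_{ij} d_{ij}}{\frac{1}{2} \sum_{p=1}^{|V|} \sum_{i=1,\, i \ne p}^{|V|} d_{ip}} \tag{59}$$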

In the solution of linear programming problem (52)-(56), the nets with positive $d_{ij}$ values partition $V$ into vertex sets $V_1, V_2, \ldots, V_k$. More specifically, nets connecting modules in different sets $V_i$, $V_j$, $i \ne j$, have the same distance $d_{ij}$ values (we use $d_{ij}$ to denote the distance between vertex sets $V_i$ and $V_j$ when this does not cause confusion), while nets connecting only modules in the same subgraph have zero distance, $d_{ij} = 0$ (Fig. 22). We can rewrite the denominator of the objective function and state the problem as follows.

Statement of Weighted Cluster Ratio Cut [103] Find the distance $d_{ij}$ and the number of partitions $k$ with an objective function of weighted cluster ratio:

$$\min_{d_{ij},\, k} W_c(V_1, V_2, \ldots, V_k) = \min_{d_{ij},\, k} \frac{\sum_{i<j} d_{ij}\, C(V_i, V_j)}{\sum_{i<j} d_{ij}\, S(V_i)\, S(V_j)} \tag{60}$$

where distance $d_{ij}$ is subject to the property of triangular inequality.

According to the mechanism of the duality, the objective functions of the primal and dual formulations are equal when the solution is optimal [25].
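To make the weighted cluster ratio concrete, the following Python sketch evaluates it for a given clustering, assuming the ratio is the distance-weighted cut between clusters divided by the distance-weighted product of cluster sizes, as in the statement above. Module sizes default to one; the data layout and names are assumptions made for the example.

```python
from itertools import combinations

def weighted_cluster_ratio(clusters, nets, dist, size=None):
    """Weighted cut between clusters over weighted products of cluster sizes.

    clusters: list of sets of modules
    nets:     {(u, v): connectivity} for two pin nets
    dist:     {(i, j): distance between clusters i < j}
    size:     optional {module: size}; missing modules count as size 1
    """
    size = size or {}
    where = {v: k for k, cluster in enumerate(clusters) for v in cluster}
    cut = {}
    for (u, v), c in nets.items():
        a, b = sorted((where[u], where[v]))
        if a != b:                            # the net crosses two clusters
            cut[(a, b)] = cut.get((a, b), 0.0) + c
    s = [sum(size.get(v, 1) for v in cluster) for cluster in clusters]
    pairs = list(combinations(range(len(clusters)), 2))
    numerator = sum(dist[p] * cut.get(p, 0.0) for p in pairs)
    denominator = sum(dist[(i, j)] * s[i] * s[j] for i, j in pairs)
    return numerator / denominator
```

With two clusters and a single distance value, the expression reduces to $C(V_1, V_2) / (S(V_1) S(V_2))$, the ratio cut of the two-way partition.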

THEOREM 5.1 For feasible solutions, we have the inequality $f \le W_c(V_1, V_2, \ldots, V_k)$. The equality holds when the solution is optimal, i.e., the maximum uniform multicommodity flow equals the minimum weighted cluster ratio of any cut,

$$\max_{x_{ij}} f = \min_{d_{ij},\, k} W_c(V_1, V_2, \ldots, V_k).$$

Expression (60), weighted cluster ratio [103], is similar to cluster ratio with a weighted metric $d_{ij}$. In general, the solution for the minimum weighted cluster ratio does not directly correspond to the partition of optimum cluster ratio. However, if distance $d_{ij}$ is a constant value between all pairs of vertex sets $V_i$ and $V_j$, then the weighted cluster ratio provides the solution for cluster ratio.

When the nets with positive distance $d_{ij}$ form a two-way partition, we can show that the partition defines the ratio cut. When the nets with positive distances form a $k$-way partition with $k \le 4$, we also find that there exists a two-way partition that again defines the ratio cut [28].

THEOREM 5.2 Let net set $D = \{ e_{ij} \mid d_{ij} > 0 \}$ define a cut that separates the circuit into $k$ disconnected subsets. If $k \le 4$, then there exists a ratio cut that is a subset of $D$.

5.4.3. A Replication Cut for Two-way Partitioning

We adopt the linear programming formulation of the network flow problem [1, 30], where each module is assigned a potential and a cut is represented by the difference of module potentials as shown in Figure 23. With respect to the directed cut $E(V_1 \to \bar{V}_1)$, we use $w_{ij}$ to denote the potential difference across the cut from module $v_i \in V_1$ to module $v_j \in \bar{V}_1$. The potential of each module $v_i$ is denoted by $p_i$. For modules $v_i$ in $V_1$, $p_i = 1$, and for modules $v_i$ in $\bar{V}_1$, $p_i = 0$. Thus all nets $e_{ij} \in E(V_1 \to \bar{V}_1)$ have $w_{ij} = 1$. The remaining nets have $w_{ij} = 0$.

With respect to the directed cut $E(V_2 \to \bar{V}_2)$, we use $u_{ji}$ with a reversed subscript $ji$ to denote the potential difference across the cut from module $v_i \in V_2$ to module $v_j \in \bar{V}_2$ (Fig. 23). The potential of each module $v_i$ is denoted by $q_i$. For modules $v_i$ in $\bar{V}_2$, $q_i = 1$, and for modules $v_i$ in $V_2$, $q_i = 0$. The potential difference $u_{ji}$ has a reverse direction with net $e_{ij}$ because we set the potential on the $\bar{V}_2$ side high and the potential on the $V_2$ side low. All nets $e_{ij} \in E(V_2 \to \bar{V}_2)$ have $u_{ji} = 1$. The remaining nets have $u_{ji} = 0$.

FIGURE 22 Distance between clusters.

FIGURE 23 $p$ potential and $q$ potential of each module.

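Primal Linear Programming Formulation The problem is to minimize the total weight of crossing nets:

$$\text{Obj:} \quad \min \sum_{e_{ij} \in E} c_{ij} w_{ij} + \sum_{e_{ij} \in E} c_{ji} u_{ij} \tag{61}$$

subject to

$$w_{ij} - p_i + p_j \ge 0 \tag{62}$$

$$u_{ij} - q_i + q_j \ge 0 \tag{63}$$

$$q_i - p_i \ge 0, \quad \forall v_i \in V,\ v_i \ne v_s, v_t \tag{64}$$

$$p_s = 1 \tag{65}$$

$$q_s = 1 \tag{66}$$

$$p_t = 0 \tag{67}$$

$$q_t = 0 \tag{68}$$

$$w_{ij}, u_{ij} \ge 0, \quad \forall\, 1 \le i, j \le |V| \tag{69}$$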

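To minimize objective function (61), the equality of constraint (62) holds, i.e., $w_{ij} = p_i - p_j$ if $p_i \ge p_j$; otherwise, $w_{ij} = 0$. Similarly, constraint (63) requires $u_{ij} = q_i - q_j$ if $q_i \ge q_j$; otherwise, $u_{ij} = 0$. Expression (64) demands that potential $q_i$ be not less than potential $p_i$ for any module $v_i \in V$. Since high potential $p_i$ corresponds to set $V_1$, and high potential $q_i$ corresponds to set $\bar{V}_2$, inequality (64) enforces that $V_1$ be a subset of $\bar{V}_2$. Consequently, the requirement that $V_1 \cap V_2 = \emptyset$ is satisfied.

Constraints (65)-(68) set the potentials of modules $v_s$ and $v_t$. Constraint (69) requires potential differences $w_{ij}$ and $u_{ij}$ to be nonnegative. Figure 23 shows one ideal potential configuration of the solution.

Dual Linear Programming Formulation If we assign dual variables (Lagrangian multipliers) $x_{ij}$ to inequality (62) with respect to each net, $x'_{ij}$ to inequality (63), $\lambda_i$ to inequality (64) with respect to module $v_i$, and $a_s$, $b_s$, $a_t$, $b_t$ to equalities (65)-(68), respectively, then we have the dual formulation.

$$\text{Obj:} \quad \max\ a_s + b_s \tag{70}$$

subject to

$$x_{ij} \le c_{ij}, \quad \forall\, 1 \le i, j \le |V| \tag{71}$$

$$x'_{ij} \le c_{ji}, \quad \forall\, 1 \le i, j \le |V| \tag{72}$$

$$-\sum_{j=1}^{|V|} x_{ij} + \sum_{j=1}^{|V|} x_{ji} - \lambda_i = 0, \quad \forall v_i \in V,\ v_i \ne v_s, v_t \tag{73}$$

$$-\sum_{j=1}^{|V|} x'_{ij} + \sum_{j=1}^{|V|} x'_{ji} + \lambda_i = 0, \quad \forall v_i \in V,\ v_i \ne v_s, v_t \tag{74}$$

$$-\sum_{j=1}^{|V|} x_{sj} + \sum_{j=1}^{|V|} x_{js} + a_s = 0 \tag{75}$$

$$-\sum_{j=1}^{|V|} x_{tj} + \sum_{j=1}^{|V|} x_{jt} + a_t = 0 \tag{76}$$

$$-\sum_{j=1}^{|V|} x'_{sj} + \sum_{j=1}^{|V|} x'_{js} + b_s = 0 \tag{77}$$

$$-\sum_{j=1}^{|V|} x'_{tj} + \sum_{j=1}^{|V|} x'_{jt} + b_t = 0 \tag{78}$$

$$x_{ij},\ x'_{ij},\ \lambda_i \ge 0 \tag{79}$$

$$a_s,\ a_t,\ b_s,\ b_t \ \text{unrestricted} \tag{80}$$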

where inequalities (71), (72) are derived with respect to each $w_{ij}$ and $u_{ij}$, respectively. Similarly, Eqs. (73)-(78) are derived with respect to each $p_i$, $q_i$, $p_s$, $p_t$, $q_s$ and $q_t$. The equality of Eqs. (73)-(78) holds because $p_i$, $q_i$, $p_s$, $p_t$, $q_s$ and $q_t$ are not restricted on sign in the primal formulation. Variables $\lambda_i$, $x_{ij}$, and $x'_{ij}$ are nonnegative in Eq. (79) because their corresponding expressions (62)-(64) are inequality constraints.

We can view $G(V, E)$ as a network flow problem and interpret $c_{ij}$ as the flow capacity and $x_{ij}$ as the flow of net $e_{ij}$. Constraint (71) requires that the flow $x_{ij}$ be not larger than the flow capacity $c_{ij}$ on each net $e_{ij}$. In constraint (72), the nets are in a reversed direction and flow $x'_{ij}$ is not larger than the capacity $c_{ji}$ of net $e_{ji}$ in $E$. Corresponding to $G(V, E)$, we use $G'(V', E')$ to denote the reversed graph.

Constraint (73) has the total flow $x_{ij}$ injected from module $v_i$ into $G$ be equal to $-\lambda_i$. On the other hand, constraint (74) has the total flow $x'_{ij}$ injected from module $v'_i$ into $G'$ be equal to $\lambda_i$. Combining Eqs. (73) and (74), we have

$$-\sum_j x_{ij} + \sum_j x_{ji} = \lambda_i = \sum_j x'_{ij} - \sum_j x'_{ji}. \tag{81}$$

This means that the amount of flow $\lambda_i$ which emanates from module $v_i$ in $G$ enters its corresponding module $v'_i$ in $G'$.

Constraints (75)-(78) indicate that $a_s$ and $b_s$ are the flow injections to module $v_s$ in $G$ and its reversed circuit $G'$; $a_t$ and $b_t$ are the flow ejections from module $v_t$ in $G$ and its reversed circuit $G'$, respectively. Combining circuits $G$ and $G'$ together, we have the maximum total flow, $a_s + b_s$, be the optimum solution of the minimum replication cut problem.

5.4.4. The Optimum Partition

In this subsection, we describe the construction of the replication graph and illustrate it with an example. We then apply the maximum flow algorithm on the constructed replication graph to derive an optimum replication cut. The optimality of the derived replication cut is proved by using a network flow approach.

Construction of Replication Graph Given a circuit $G(V, E)$ and modules $v_s$ and $v_t$, we construct another circuit $G'(V', E')$ where $|V'| = |V|$ with each module $v'_i$ in $V'$ corresponding to a module $v_i$ in $V$, and $|E'| = |E|$ with each directed net $e'_{ij}$ in $E'$ in the reverse direction of net $e_{ij}$ in $E$. We create super modules $v_s^*$ and $v_t^*$ and nets $(v_s^*, v_s)$, $(v_s^*, v'_s)$, $(v_t, v_t^*)$, and $(v'_t, v_t^*)$ with infinite capacity as shown in Figure 24. From every module $v_i$ in $V$ except $v_s$ and $v_t$, we add a directed net of infinite capacity to the corresponding module $v'_i$ in $V'$. We refer to the combined circuit as $G^*$.

FIGURE 24 The replication graph $G^*$.

Polynomial-time Algorithm The optimum replication cut problem with respect to module pair $v_s$ and $v_t$ and without size constraints can be solved by a maximum-flow minimum-cut solution of the circuit $G^*$ with $v_s^*$ as the source and $v_t^*$ as the sink of the flow (Fig. 24). Suppose the maximum-flow minimum-cut finds partition $(X, \bar{X})$ of $V$ with $v_s \in X$ and $v_t \in \bar{X}$, and partition $(X', \bar{X}')$ of $V'$ with $v'_s \in X'$ and $v'_t \in \bar{X}'$. Then a replication cut $(V_1, V_2)$ of the original circuit with $V_1 = X$, $V_2 = \{ v_i \mid v'_i \in \bar{X}' \}$ and $R = V - V_1 - V_2$ is an optimum solution. Note that $V_2$ is derived from the cut in vertex set $V'$. To simplify the notation, we shall use $(X, \bar{X}')$ to denote the derived replication cut of $G$.

Example Given a circuit in Figure 25, its replication graph $G^*$ is constructed as shown in Figure 26. The maximum-flow minimum-cut of $G^*$ derives $(X, \bar{X}) = (\{v_s, v_a\}, \{v_b, v_c, v_t\})$ and $(X', \bar{X}') = (\{v'_s, v'_a, v'_b, v'_c\}, \{v'_t\})$ with a flow amount of 5 (Fig. 26). Thus the sets $V_1 = \{v_s, v_a\}$ and $V_2 = \{v_t\}$ define an optimum replication cut $R(V_1, V_2)$ with $R = \{v_b, v_c\}$ and a cut cost equal to 5 (Fig. 27).

The network flow approach leads to the optimality of the solution as stated in the following theorem.

THEOREM 5.3 The replication cut R(X,f() derived

from the transformed circuit G* generates theminimum replication cut count CI(X,f(I) (expression(19)).
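The construction and the resulting cut can be sketched in Python as follows. The sketch builds the replication graph from a directed circuit, runs a standard augmenting-path maximum flow (Edmonds-Karp, chosen here only for brevity), and reads $V_1$, $V_2$ and the replicated set $R$ from the source side of the minimum cut. The three module circuit is hypothetical and is not the circuit of Figure 25; a finite number larger than the total capacity stands in for infinite capacity.

```python
from collections import defaultdict, deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp on arcs {(u, v): c}; returns (flow value, source side)."""
    residual = defaultdict(lambda: defaultdict(int))
    for (u, v), c in capacity.items():
        residual[u][v] += c
        residual[v][u] += 0
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, set(parent)
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(residual[u][w] for u, w in path)
        for u, w in path:
            residual[u][w] -= delta
            residual[w][u] += delta
        total += delta

def replication_cut(arcs, vs, vt):
    """arcs: {(i, j): c_ij} directed nets. Returns (cost, V1, V2, R)."""
    modules = {v for arc in arcs for v in arc}
    inf = 1 + sum(arcs.values())             # stands in for infinite capacity
    cap = {}
    for (i, j), c in arcs.items():
        cap[(("G", i), ("G", j))] = c        # original net
        cap[(("G'", j), ("G'", i))] = c      # reversed copy
    cap[("S*", ("G", vs))] = inf             # super source feeds both copies of vs
    cap[("S*", ("G'", vs))] = inf
    cap[(("G", vt), "T*")] = inf             # both copies of vt drain to the sink
    cap[(("G'", vt), "T*")] = inf
    for v in modules - {vs, vt}:
        cap[(("G", v), ("G'", v))] = inf     # link each module to its copy
    cost, reach = max_flow(cap, "S*", "T*")
    v1 = {v for v in modules if ("G", v) in reach}
    v2 = {v for v in modules if ("G'", v) not in reach}
    return cost, v1, v2, modules - v1 - v2

# Hypothetical circuit: module a is driven weakly by s and t, and drives both strongly.
arcs = {("s", "a"): 1, ("t", "a"): 1, ("a", "s"): 5, ("a", "t"): 5}
cost, V1, V2, R = replication_cut(arcs, "s", "t")
```

In this example, placing `a` entirely on either side cuts nets of total weight 6, whereas replicating `a` leaves only its two input nets in the cut.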

FIGURE 25 A five module circuit to demonstrate the replication cut.

FIGURE 26 The constructed replication graph of the circuit shown in Figure 25.

FIGURE 27 The duplicated circuit of the circuit shown in Figure 25.

5.4.5. Heuristic Flow Algorithms

We introduce heuristic approaches that accelerate the flow calculation and take advantage of the optimality properties of the flow methods. We first introduce an approach that utilizes the maximum flow minimum cut method for the min cut with size constraints. We then explain a shortest path method for multiple commodity flow calculation.

(i) Usage of Maximum Flow Minimum Cut We adopt a heuristic approach [113] to get around the unbalanced partition of the maximum flow minimum cut method. First, we find two seeds as the source and the sink modules, $v_s$, $v_t$. We then use the maximum flow minimum cut method to find partition $(V_1, V_2)$ with $v_s \in V_1$ and $v_t \in V_2$. Suppose the size $S(V_1)$ of $V_1$ is larger than the size $S(V_2)$ of $V_2$; we find from $V_1$ a module $v_i$ to merge with $V_2$ and shrink set $V_2$ into a new sink module. Otherwise, we find from $V_2$ a module $v_i$ to merge with $V_1$ and shrink set $V_1$ into a new source module. We repeat the maximum flow minimum cut process on the graph with the new source or sink module until the size of the partition fits the size constraint.

Two Way Partitioning using Maximum Flow Minimum Cut

1. Find two seeds as $v_s$ and $v_t$.
2. Call Maximum Flow Minimum Cut to find partition $(V_1, V_2)$.
3. If $S(V_1) > S(V_2)$, find a seed $v_i \in V_1$, merge $\{v_i\} \cup V_2$ into a new sink module $v_t$.
4. Else find a seed $v_i \in V_2$, merge $\{v_i\} \cup V_1$ into a new source module $v_s$.
5. Repeat Steps 1-4, until $S_l \le S(V_1) \le S_u$ and $S_l \le S(V_2) \le S_u$.

We can apply the parametric flow approach to the repeated maximum flow minimum cut problems (Step 2). The total complexity is then equivalent to that of a single maximum flow minimum cut.

The seeds are chosen according to their connectivity to the vertex set on the other side. The result is sensitive to the choice of the seeds. We can make multiple trials and choose the best result. Other methods such as the programming approach can serve as a guideline on the choice of the seeds [79, 80]. The method has been shown to derive excellent results with reasonable running time.


(ii) Approximation of Multiple Commodity Flow Based on the multicommodity flow formulation [103], we try to solve a multiple way partitioning by deriving an approximate multiple commodity flow with a stochastic process [13, 55, 114, 117].

Given a circuit $H(V, E)$, the flow increment $\Delta$, and the distance coefficient $\alpha$, the algorithm starts with procedure Saturate-Network to saturate the circuit with flows. A stochastic flow injection algorithm is adopted to reduce the computational complexity. Then, Select-Cut is activated to select a set of nets by the flow values to constitute a cut. The conversion from weighted cluster ratio cut to cluster ratio cut is performed by the Select-Cut routine, which selects a subset of the cut derived from Saturate-Network with a greedy approach.

Multiple Commodity Flow Approximation ($H$, $\Delta$, $\alpha$)

1. Iterate the following procedures until the clustering results are satisfactory.
   1.1. Saturate-Network ($H$, $\Delta$, $\alpha$).
   1.2. Select-Cut ($H$).
2. Output clustering result.

Procedure Saturate-Network ($H$, $\Delta$, $\alpha$)

1. Set the distance of each net $e$ to be one.
2. While ($H$ is connected) do Steps 2.1 to 2.3.
   2.1. Randomly pick two distinct modules $v_s$ and $v_t$.
   2.2. Find the shortest path between $v_s$ and $v_t$.
   2.3. For each net $e$ on the shortest path, let $f(e)$ and $d_e$ be the flow and distance of net $e$.
      2.3.1. If $e$ is not saturated, increase $f(e)$ by $\Delta$ and set $d_e = \exp((\alpha \times f(e))/c_e)$.
      2.3.2. If $e$ is saturated, set $d_e$ to be $\infty$.
3. Output $E$ with flow information.

The initial distance of each net is one since there is no flow being injected (see the distance formulation in Step 2.3.1). Step 2.1 uses a random process with even distribution over all modules to pick two distinct modules, and Steps 2.2-2.3 inject $\Delta$ amount of flow along the shortest path between the modules. In Steps 2.3.1-2.3.2, the distances of the nets whose flow has been increased are recomputed using an exponential function $d_e = \exp((\alpha \times f(e))/c_e)$ to penalize the congested nets, where $d_e$ and $f(e)$ are the distance and flow of net $e$, respectively. Steps 2.1-2.3 are iteratively executed until a pair of modules is chosen for which all possible paths between them are saturated by flows. These saturated nets identify a partition of the circuit.

Figure 28 shows a sample circuit saturated by flows after executing Saturate-Network with $\Delta = 0.01$ and $\alpha = 10$. The flow values are shown by the numbers right beside each net. The dashed lines indicate the cut lines along the set of saturated nets that form the three clusters. These saturated nets define an approximate weighted cluster ratio cut, which is a potential set of nets for the selection of a cluster ratio cut.
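The Saturate-Network procedure can be sketched in Python with a heap-based shortest path search. The sketch makes two simplifying assumptions that are not in the text: a net is marked saturated (infinite distance) as soon as an increment fills its capacity, and the loop stops when the randomly chosen pair has no path of finite distance. The function name and data layout are hypothetical.

```python
import heapq
import math
import random

def saturate_network(nets, delta=0.01, alpha=10.0, seed=0):
    """nets: {(u, v): capacity} for two pin nets. Returns (flow, saturated nets)."""
    rng = random.Random(seed)
    flow = {e: 0.0 for e in nets}
    dist = {e: 1.0 for e in nets}                     # Step 1
    modules = sorted({v for e in nets for v in e})
    adj = {v: [] for v in modules}
    for e in nets:
        u, v = e
        adj[u].append((v, e))
        adj[v].append((u, e))

    def shortest_path(src, dst):
        """Dijkstra over nets of finite distance; returns nets on the path or None."""
        best, back, heap = {src: 0.0}, {}, [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == dst:
                break
            if d > best[u]:
                continue
            for v, e in adj[u]:
                if math.isinf(dist[e]):
                    continue                          # saturated nets are unusable
                nd = d + dist[e]
                if nd < best.get(v, math.inf):
                    best[v], back[v] = nd, (u, e)
                    heapq.heappush(heap, (nd, v))
        if dst not in back:
            return None
        path, node = [], dst
        while node != src:
            node, e = back[node]
            path.append(e)
        return path

    while True:                                       # Step 2
        s, t = rng.sample(modules, 2)                 # Step 2.1
        path = shortest_path(s, t)                    # Step 2.2
        if path is None:
            break
        for e in path:                                # Step 2.3
            flow[e] += delta
            if flow[e] >= nets[e] - 1e-9:
                dist[e] = math.inf                    # Step 2.3.2
            else:
                dist[e] = math.exp(alpha * flow[e] / nets[e])   # Step 2.3.1
    return flow, {e for e in nets if math.isinf(dist[e])}
```

On a circuit made of two tightly connected groups joined by one weak net, the weak net saturates long before any other net, so the saturated set returned by the sketch is exactly the cut between the two groups.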

5.5. Programming Approaches

For programming approaches [7, 18, 35, 41, 44, 46], we adopt two way minimum cut with size constraints as the target problem. We assume that the nets are two pin nets and thus the circuit can be described as a graph $G(V, E)$. We also assume the modules are of unit size, i.e., $s_i = 1$.

The two way partition $(V_1, V_2)$ is represented by a linear placement with only two slots at coordinates $-1$ and $1$. For an even sized partition, half of the modules are assigned to each slot. Let $x_i$ denote the coordinate of module $v_i$: $x_i = -1$ if $v_i \in V_1$, and $x_i = 1$ if $v_i \in V_2$. The cut count can be expressed as follows.

$$C(V_1, V_2) = \frac{1}{4} X^T B X, \tag{82}$$

where $X$ is a vector of $x_i$, and $X^T$ is the transpose of vector $X$. Matrix $B$ has its entry $b_{ij} = -c_{ij}$ if $i \ne j$, else $b_{ii} = \sum_{1 \le j \le |V|} c_{ij}$. Suppose we relax the slot constraint by enforcing only the rules of the gravity center and the norm. The constraint of vector $X$ can be expressed as:

$$\mathbf{1}^T X = 0, \tag{83}$$

$$X^T X = |V|. \tag{84}$$

FIGURE 28 The flow and partition generated by Saturate-Network.

Matrix $B$ is symmetric and diagonally semidominant. Thus, it is positive semidefinite, i.e., all eigenvalues are nonnegative, and its eigenvectors are orthogonal. Let us order its eigenvalues from small to large, i.e., $\lambda_0 \le \lambda_1 \le \cdots \le \lambda_{|V|-1}$. The smallest eigenvalue is $\lambda_0 = 0$ with its eigenvector $X_0 = \mathbf{1}$. The second eigenvalue $\lambda_1$ is nonnegative with its eigenvector orthogonal to the first eigenvector, i.e., $X_0^T X_1 = \mathbf{1}^T X_1 = 0$. Therefore, the second eigenvector $X_1$ is an optimal solution to objective function (82) with constraints (83) [46]. Since $X_1^T X_1 = |V|$ (Eq. (84)), the solution is

$$\frac{1}{4} X_1^T B X_1 = \frac{1}{4} \lambda_1 X_1^T X_1 = \frac{1}{4} \lambda_1 \times |V|, \tag{85}$$

which is a lower bound of the min-cut problem.
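Two small worked cases illustrate the bound, assuming the cut count is written as $\frac{1}{4} X^T B X$ with coordinates $\pm 1$. For two modules joined by one net of connectivity $c$,

$$B = \begin{pmatrix} c & -c \\ -c & c \end{pmatrix}, \qquad \lambda_0 = 0, \quad \lambda_1 = 2c, \quad X_1 = (-1, 1)^T,$$

so the bound is $\frac{1}{4} \lambda_1 |V| = \frac{1}{4} \times 2c \times 2 = c$, which equals the only possible cut. For four modules connected in a ring with unit connectivity, the eigenvalues of $B$ are $0, 2, 2, 4$, so

$$\frac{1}{4} \lambda_1 |V| = \frac{1}{4} \times 2 \times 4 = 2,$$

which again matches the optimum: any even split of the ring cuts at least two nets. In general the bound is not tight, because the relaxed vector $X_1$ need not have entries of $\pm 1$.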

To push for a higher lower bound, we can adjust the diagonal terms of matrix $B$ by adding constants $d_i$. Let

$$\hat{C}(V_1, V_2) = C(V_1, V_2) + \frac{1}{4} \sum_{1 \le i \le |V|} d_i x_i^2 - \frac{1}{4} \sum_{1 \le i \le |V|} d_i = \frac{1}{4} X^T \hat{B} X - \frac{1}{4} \sum_{1 \le i \le |V|} d_i, \tag{86}$$

where matrix $\hat{B}$ has its entry $\hat{b}_{ij} = b_{ij}$ if $i \ne j$, else $\hat{b}_{ii} = b_{ii} + d_i$. Since either $x_i = -1$ or $x_i = 1$, the last two terms cancel each other. The modification thus does not alter the optimal partition solution.

The new nonlinear programming problem is to find the assignment of $d_i$ to maximize the objective function [11]:

$$\frac{1}{4} \hat{\lambda}_1 |V| - \frac{1}{4} \sum_{1 \le i \le |V|} d_i, \tag{87}$$

where $\hat{\lambda}_1$ is the second smallest eigenvalue of matrix $\hat{B}$. The solution is a lower bound on the cut count of the partition. It is no smaller than the bound given by $\lambda_1$, in the sense that $\lambda_1$ (all $d_i = 0$) can serve as an initial feasible solution to maximize expression (87).

Remarks The programming approach finds a global view of the problem [9, 79, 80, 118]. However, the formulation is very restricted. The extension to multiple pin nets and the incorporation of fixed modules destroy the nice structure on which the optimality of the eigenvalue and eigenvector solutions is based. Therefore, it is difficult to utilize the approach recursively.

For the general case, we can view the problem as nonlinear programming with a Boolean quadratic objective function. Nonlinear programming techniques are adopted to derive the results [16, 107].

5.6. A Lagrange Multiplier Approach for Performance Driven Partitioning

The Lagrange multiplier is a useful tool for performance optimization. In this section, we demonstrate the usage of Lagrange multipliers for performance driven partitioning. The problem is to optimize the performance of a two-way partition $(V_1, V_2)$ with retiming [86].

We first introduce a vector of binary variables to represent a partition. The performance-driven partitioning problem is thus represented by a Boolean quadratic programming formulation with nonlinear constraints. We then absorb the nonlinear constraints into the objective function as a Lagrangian. We use primal and dual subproblems to decompose the Lagrangian and derive the partitions. The Lagrange multiplier is adjusted in each iteration via a subgradient method to monitor the timing criticality and improve the performance.

5.6.1. Programming Formulation with Lagrange Multiplier

We assume that the circuit can be represented by a graph $G(V, E)$ with two pin nets and unit module size. The two-way partition is described by a vector $x = (x_{1,1}, \ldots, x_{1,n}, x_{2,1}, \ldots, x_{2,n})$, where $x_{b,i}$ is 1 if module $v_i$ is assigned to vertex set $V_b$; otherwise $x_{b,i}$ is 0. If modules $v_i$ and $v_j$ are in different vertex sets, the value of the term $x_{1,i} x_{2,j} + x_{2,i} x_{1,j}$ is equal to 1. This contributes one interpartition delay $\delta$ to the delay of the net $e_{ij}$. Let $g_l(x)$ denote the delay to register ratio of loop $l$. Delay ratio $g_l(x)$ can be written as the following formula:

$$g_l(x) = \frac{d_l + \sum_{e_{ij} \in l} \delta \times (x_{1,i} x_{2,j} + x_{2,i} x_{1,j})}{r_l} \tag{88}$$

Given a path $p$, the total delay $h_p(x)$ of $p$ is as follows:

$$h_p(x) = d_p + \sum_{e_{ij} \in p} \delta \times (x_{1,i} x_{2,j} + x_{2,i} x_{1,j}) \tag{89}$$

To formulate the problem, we use an objective function of cut count:

$$\min \sum_{e_{ij} \in E} c_{ij} (x_{1,i} x_{2,j} + x_{2,i} x_{1,j}), \tag{90}$$

subject to the following constraints:

C1 (Size Constraints)

$$\sum_{i=1}^{|V|} x_{b,i} s_i \le S_b, \quad \forall b \in \{1, 2\}. \tag{91}$$

C2 (Variable Assignment Constraints)

$$\sum_{b=1}^{2} x_{b,i} = 1, \quad \forall v_i \in V. \tag{92}$$

C3 (Iteration Bound Constraints)

$$g_l(x) \le J, \quad \forall \text{ loop } l. \tag{93}$$

C4 (Latency Bound Constraints)

hp(x) _< V/O-critical path p. (94)

Actually, we do not need to consider all loops in C3. Because all loops are composed of simple loops, we have the following lemma:

LEMMA Given a number $J$, if $g_l(x)$ is less than or equal to $J$ for any simple loop $l$, then $g_l(x)$ is less than or equal to $J$ for all loops $l$.

Let $\pi_c$ and $\pi_p$ represent the number of simple loops and the number of I/O-critical paths, respectively. Let $\lambda$ denote the vector $(\lambda_{g_1}, \ldots, \lambda_{g_{\pi_c}}, \lambda_{h_1}, \ldots, \lambda_{h_{\pi_p}})$. Using Lagrangian Relaxation [104], we absorb the constraints (93) and (94) into the objective function (90). The Lagrangian-relaxed problem is as follows.

$$\max_{\lambda \ge 0} \min_{x} L(x, \lambda) \tag{95}$$

t(x,/) Z ij(Xl,iX2,j --t- x2,iXl,j)

+ Z Ag, (gl(x) 1)V simple loop

V/O-critical path p

(96)

(i) The Dual Problem Given vector $x$, we can represent (96) as a function of variable $\lambda$, i.e., $L_x(\lambda)$. Thus, the dual problem can be written as:

$$\max_{\lambda \ge 0} L_x(\lambda) \tag{97}$$

(ii) The Primal Problem Let $F_{ij}$ and $Q_{ij}$ denote the sets of the simple loops and I/O-critical paths passing the net $e_{ij}$. The cost $a_{ij}$ of net $e_{ij}$ is composed of connectivity $c_{ij}$ and the penalty of the timing constraints.

$$a_{ij} = c_{ij} + \sum_{l \in F_{ij}} \frac{\delta}{r_l} \lambda_{g_l} + \sum_{p \in Q_{ij}} \delta \lambda_{h_p} \tag{98}$$

Given vector $\lambda$, we can represent (96) as a function of vector $x$, i.e., $L_\lambda(x)$. Thus, the primal problem can be rewritten as:

$$\min L_\lambda(x) = \min \sum_{e_{ij} \in E} a_{ij} (x_{1,i} x_{2,j} + x_{2,i} x_{1,j}) + \beta \tag{99}$$

subject to C1 and C2, where $\beta$ represents the constant contributed by $\lambda$.

5.6.2. Subgradient Method using Cycle Mean Method

We solve the partitioning problem through primal and dual iterations on the Lagrangian. A Quadratic Boolean Programming method, QBP [16], is used to solve the primal problem and generate a solution $x$ (Step 2).

For the dual problem based on $x$, we select the set of loops and paths that violate the timing constraints as active loops and paths. The nets contained in the active loops or paths are termed active nets.

Active Loops and Paths Given a solution $x$, a loop $l$ is called active if $g_l(x)$ is not less than $J$. A path $p$ is called active if $h_p(x)$ is not less than the latency bound.

Active Nets Given a net $e$, we define $e$ to be an active net if net $e$ is covered by an active loop or an active path.

We call a minimum cycle mean algorithm [57] and an all-pairs shortest-paths algorithm to mark all the nets on active loops and paths, respectively (Step 3). For every net $e_{ij}$ on active paths, we record $q_{ij}$: the maximum path delay among all paths passing through $e_{ij}$. For every net $e_{ij}$ on active loops, we record $p_{ij}$: the maximum delay-to-register ratio among all loops passing through $e_{ij}$. We then calculate the subgradient on the marked nets and update the constants $a_{ij}$ for the next primal dual iteration (Steps 4-5). We increase the costs of active nets using the subgradient approach [104]. The iteration proceeds until the bounds of all loops and paths are within the given limits.

Algorithm using Lagrange Multiplier Input: Con-stants ),,c- 1.3 and an initial partition

Initialize k +- 1" a(.tj Cij.

2. Run QBP [16] to find a partition (Vk), v(k)2an object to minimize cut count C(Vk),-,with

Vk) -"eGE(VIk),Vk)) a!,;


3. Calculate the iteration and latency bounds ofthe partition (Vk),Vk)), respectively. Stopif timing constraints are satisfied. Otherwise,revise P0 and qij for all nets eij.

4. Compute

lc(vl v]--ej E (P j )2 -ll- -e eE qO" 21’/) 2

5. Revise shadow price a0. for all nets eij E E:a(+/_ a!./.0 J (k+) a!.k) +if net e!j is in active loop, then tij tj

(+1) a!.) +if net e0. is in active path, then aij zj

6. While $k \le$ MaxNumIter, set $k \leftarrow k+1$ and go to Step 2.

5.7. Clustering Heuristics

We first discuss the usage of clustering heuristics. We then discuss top down clustering and bottom up clustering approaches. Finally, we discuss some variations of clustering metrics.

5.7.1. Usage of Clustering Heuristics

The usage of clustering heuristics plays an important role in determining the quality of the final results. In the following, we discuss the issue under different topics. We use a two-way partitioning with size constraints as the target problem.

1. Top Down Clustering versus Bottom Up Clustering: The top down clustering approach provides a global view of the solution, and its operations are consistent with the target problem. However, it is more time consuming because the clustering operates on the whole circuit [29]. Bottom up clustering is efficient. However, because the process operates locally, the target solution is sensitive to the clustering heuristics [59].

2. The Level of the Clustering: Suppose we represent the clustering results with a hierarchical tree structure. Let the root correspond to the whole circuit, the leaves correspond to the smallest clusters, and the internal nodes correspond to the intermediate clusters. Hence, the size of the clusters grows with the level of the nodes. Top down clustering creates clusters corresponding to nodes in high levels, while bottom up clustering creates clusters corresponding to nodes in low levels.

For example, in [60], Kernighan and Lin proposed a top down clustering approach, which divides the whole circuit into only four clusters. In [59], Karypis et al. used a bottom up clustering which starts with clusters of two modules or a net. If we continue to apply bottom up clustering to intermediate clusters, the quality of the clusters degrades as the clusters grow.

3. Iteration of Clustering and Unclustering: We go through iterations of clustering and unclustering to improve the quality of the results. At each level of the hierarchical tree, we derive an intermediate target solution, e.g., a two-way partition. In unclustering, we go down a level of the tree hierarchy to find an expanded circuit with more modules. In clustering, we go up a level of the tree hierarchy to a circuit with a smaller number of modules. The previous partitioning result becomes the initial solution of the new partitioning problem. Note that the hierarchical tree is constructed dynamically. For each clustering, the modules can be grouped based on the current partitioning configuration.

4. The Clustering Operations and the Target Solution: The clustering operation has to be consistent with the target solution. For example, suppose the target is finding a two-way min-cut with size constraints. Then, it is natural to cluster modules based on net connectivity because the probability that a net is in an optimal cut set is small (see the subsection on min-cut with size constraints in problem formulations). Moreover, it is important that the clustering follows the current partitioning results, i.e., only modules in the same partition are clustered.

5.7.2. Top Down Clustering Approach for Partitioning

We use an application to two-way cut with size constraints to illustrate the top down clustering approach [24, 29]. The partitioning of huge designs is complicated and the results can be erratic. Our strategy (Fig. 29) is to reduce the circuit complexity by constructing a contracted hypergraph. The clusters for the contracted hypergraph are found via a recursive top down partitioning method. The number of modules is much reduced after we contract the clusters. Hence, a group migration approach can derive excellent two-way cut results on the contracted hypergraph efficiently. Furthermore, since the clusters are grouped via a top down partitioning, conceptually a minimum cut on the hypergraph can take advantage of the previous results and generate better solutions.

In this section, we describe a top down clustering algorithm. A ratio cut is adopted to perform the top down clustering process; other partitioning approaches can be used in place of the ratio cut. A group migration method is used to find a minimum cut of the contracted hypergraph with size constraints. Finally, we apply a last run of the group migration algorithm to the original circuit to fine tune the result.

Input: a hypergraph $H(V, E)$, an integer $k$ for the number of expected clusters, an integer num_of_reps for repetition, and $S_l$, $S_u$ for the size constraints of the two resultant subsets.

1. Initialize $\Gamma = \{V\}$ and $V^* = V$.

2. Apply ratio cut [109] to obtain a partition $(A, A')$ of $V^* = A \cup A'$.

3. Set $\Gamma = (\Gamma - \{V^*\}) \cup \{A, A'\}$. Set $V^*$ to be a vertex set in $\Gamma$ such that $S(V^*) = \max_{V_i \in \Gamma} S(V_i)$.

4. While $S(V^*) > S(V)/k$, repeat Steps 2, 3.

5. Construct a contracted hypergraph $H_\Gamma(V_\Gamma, E_\Gamma)$.

6. Apply a group migration algorithm num_of_reps times to $H_\Gamma$ with the size constraints $S_l$, $S_u$.

7. Use the best result from Step 6 as an initial partition of the circuit $H$. Apply a group migration algorithm once to $H$ with the size constraints $S_l$, $S_u$.
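Steps 1 to 4 above, which repeatedly split the largest cluster until no cluster is larger than the total size divided by k, can be sketched as follows. This is a minimal sketch, not the authors' implementation: the bipartitioning routine is passed in as a callable standing in for the ratio cut, the loop condition is checked before the first split, and the names `top_down_clusters` and `halve` are hypothetical.

```python
def top_down_clusters(modules, size, k, bipartition):
    """Split the largest cluster until none exceeds S(V) / k.

    modules: list of module ids (the vertex set V).
    size: dict mapping each module id to its size.
    k: expected number of clusters.
    bipartition: callable taking a list of modules and returning two
        non-empty lists (A, A'); a stand-in for the ratio cut of Step 2.
    """
    def total(cluster):
        return sum(size[m] for m in cluster)

    limit = total(modules) / k
    clusters = [list(modules)]                 # Step 1: start with {V}
    while True:
        largest = max(clusters, key=total)     # V*: the cluster of maximum size
        if total(largest) <= limit or len(largest) < 2:
            break                              # Step 4: condition no longer holds
        a, a_prime = bipartition(largest)      # Step 2: split V*
        clusters.remove(largest)               # Step 3: replace V* by A and A'
        clusters.extend([a, a_prime])
    return clusters


def halve(cluster):
    """Toy bipartition that splits a list in the middle (not a ratio cut)."""
    mid = len(cluster) // 2
    return cluster[:mid], cluster[mid:]
```

For eight unit-size modules and k = 4, `top_down_clusters(list(range(8)), {m: 1 for m in range(8)}, 4, halve)` yields four clusters of two modules each. The guard on single-module clusters is an addition of this sketch to guarantee termination when one module alone exceeds the limit.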

FIGURE 29 Strategy of top down clustering.


The choice of cluster number k. It was shown [24] that the cut count as a function of the cluster number k follows a U-shaped curve. When k is small, the quality is not as good because the clusters are too coarse. When k is large, there are too many clusters and we lose the benefit of the clustering.

When the circuit is large, we may need to adopt multiple levels of clustering to improve both performance and efficiency [58, 66].

5.7.3. Bottom Up Clustering Approaches

In this section, we discuss bottom up clustering [90] with two applications: linear placement and performance driven designs. We then show two strategies to perform the clustering: maximum matching and maximum pairing. We will demonstrate via examples the advantage of maximum pairing over maximum matching.

(i) Linear Placement For linear placement, we reduce the complexity of the problem by a bottom up clustering approach [53, 96, 100]. The clustering is based on the result of a tentative placement. We adopt a heuristic approach to generate tentative placements throughout the iterations. In each iteration, we cluster modules only when they are in consecutive order in the placement. We then construct a contracted hypergraph. In the next iteration, the heuristic approach generates the placement of the contracted hypergraph. In each iteration, we either grow the size of the clusters or construct new clusters adaptively.

Inspired by the property of the minimum cut separating two modules (Theorem 3.1), we use a density as a measure to find the clusters. The density $d(i)$ at slot $i$ of a linear placement is the total connectivity of the nets connecting modules on different sides of the slot. The following algorithm describes the clustering using a given placement. Each cluster size is between $L$ and $U$.

Input: placement $P$, two parameters $L$ and $U$.

1. Initialize the cluster boundary at slot $p = 1$.

2. Scan placement $P$ from slot $p$ toward the right end. Find slot $i$ such that $p + L \le i \le p + U$ and density $d(i)$ is minimum among $d(p + L), \ldots, d(p + U)$.

3. Cluster modules between slots $p$ and $i$. Set $p = i + 1$.

4. Repeat Steps 2, 3 until the scan reaches the right end.
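The scan above can be sketched in a few lines. This is a hedged illustration under several assumptions that are not in the source: positions are 0-indexed, the density at slot i is taken as the total weight of nets with pins on both sides of the boundary just after position i, nets are given as (weight, pins) tuples, and the function names are hypothetical. The window follows Step 2 literally, so a cluster here holds between L + 1 and U + 1 modules.

```python
def slot_density(placement, nets):
    """d[i] = total weight of nets crossing the boundary after position i."""
    pos = {m: idx for idx, m in enumerate(placement)}
    density = [0.0] * len(placement)
    for weight, pins in nets:
        left = min(pos[m] for m in pins)
        right = max(pos[m] for m in pins)
        for i in range(left, right):       # the net spans boundaries left..right-1
            density[i] += weight
    return density


def cluster_by_density(placement, nets, low, high):
    """Cut the placement at the minimum-density slot inside each window."""
    density = slot_density(placement, nets)
    n = len(placement)
    clusters, p = [], 0
    while p < n:
        first, last = p + low, min(p + high, n - 1)
        if first >= n - 1:
            i = n - 1                      # remaining modules form the last cluster
        else:
            i = min(range(first, last + 1), key=lambda s: density[s])
        clusters.append(placement[p:i + 1])
        p = i + 1
    return clusters
```

With the placement a, b, c, d, e, f, nets of weight 3 inside {a, b, c} and {d, e, f}, and a single net of weight 1 between c and d, the scan with L = 1 and U = 3 cuts at the weakly connected boundary and returns the clusters [a, b, c] and [d, e, f].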

Remark The proposed clustering process and the criteria are consistent with the target linear placement application. The whole process depends on an efficient and effective linear placement.

(ii) Performance Driven Clustering For performance driven clustering [31, 112], nets which contribute to the longest delay are termed critical nets. Pins of the critical net are merged to form clusters.

For the special case that the circuit is a directed tree, we can find an optimal solution in polynomial time. Let us assume the tree has its leaves at the input and its root at the output. We use a dynamic programming approach to trace from the leaves toward the root. Each module is not traced until all its input modules are processed. For each module, we treat it as the root of a subtree and find the optimal clustering of the subtree. Since all the modules in the subtree except its root have been processed, we can derive an optimal solution of the root in polynomial time.

(iii) Maximum Matching Maximum matching pairs all modules into $|V|/2$ groups simultaneously. Given a measurement of pairing modules, we can find a matching that maximizes the total pairing measurement in polynomial time.

We can call maximum matching recursively to create clusters of equal sizes. However, this strategy may force unrelated pairs to merge, which sacrifices the quality of the final clustering results.
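The forced pairing of unrelated modules can be seen on a tiny instance. The sketch below finds a maximum-weight perfect matching by exhaustive search; it is exponential and meant only for illustration, not the polynomial time matching algorithm referred to above. The function name `max_matching` and the `weight` callable are hypothetical, and an even number of modules is assumed.

```python
def max_matching(modules, weight):
    """Return (total, pairs) of a maximum-weight perfect matching.

    Exhaustive search over all pairings; use only on small examples.
    weight(u, v) is the pairing measurement between modules u and v.
    """
    if not modules:
        return 0, []
    first, rest = modules[0], modules[1:]
    best_total, best_pairs = float("-inf"), []
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        sub_total, sub_pairs = max_matching(remaining, weight)
        total = weight(first, partner) + sub_total
        if total > best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs
```

For four modules a, b, c, d with connectivity 3 between a and b, 2 between b and c, and no other nets, the best matching pairs (a, b) and therefore also (c, d), although c and d share no net.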

Example Figure 30 illustrates the clustering behavior of maximum matching. The circuit contains twelve modules of equal size. The first level maximum matching pairs modules (a, b), (d, e), (g, h), (j, k), (c, l), and (f, i). Modules in the first four pairs are strongly connected with their partners. However, the last two are not. Modules c and l have no common nets but are merged because their preferred partners are taken by others.

FIGURE 30 Clustering of a twelve module circuit.

Furthermore, as we proceed to the next level maximum matching, the merge of pairs (c, l) and (f, i) will force the grouping of modules into cluster {a, b, c, j, k, l} and cluster {d, e, f, g, h, i}. If we measure the quality of the results with the cluster cost (expression (26)), the cost of the two clusters is 4/12 + 4/12 = 2/3. For this case, we can find a better solution with clusters {a, b, c, d, e, f} and {g, h, i, j, k, l}, of which the cluster cost is equal to zero.

Figure 31 shows another example of twelve modules with connectivities attached to the nets. The connectivity is 1 if not specified. Figure 31(a) shows an optimum cut with cut count 6.6. If a maximum matching [61] criterion is adopted in the bottom up clustering approach, then modules with a net of weight 1.1 between them will be merged. A minimum cut on the merged modules yields a cut count of 18 (Fig. 31(b)). In general, a 2n module circuit having a symmetric configuration as in Figure 31 will have a cut count of $n^2/2$ if the maximum matching criterion is applied to perform the clustering, while the optimum solution will have a cut weight of $1.1 \times n$. From this extreme case, we can claim the following theorem:

THEOREM 5.4 The cut count generated by the maximum matching approach is not bounded by any constant factor of the cut count of a minimum cut.

Proof As shown in the above example, the factor of error bound is $(n^2/2)/(1.1 \times n) = n/2.2$, which is not a constant. Q.E.D.
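As a quick numerical check of the ratio used in the proof, the following lines evaluate it for the twelve module example (n = 6) and for a larger instance of the same symmetric family. The function name `matching_error_factor` is hypothetical and introduced only for this illustration.

```python
def matching_error_factor(n):
    """Ratio of the matching-based cut count n^2 / 2 to the optimum 1.1 * n."""
    return (n * n / 2) / (1.1 * n)


# The twelve module example has n = 6: cut count 18 against an optimum of 6.6.
factor_small = matching_error_factor(6)
# A circuit ten times larger has a ratio ten times larger.
factor_large = matching_error_factor(60)
```

The ratio grows linearly with n, which is why no constant bound can hold.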

FIGURE 31 A twelve module example to demonstrate maximum matching: (a) cut weight 6.6; (b) cut weight 18.

(iv) Maximum Pairing Maximum pairing is similar to maximum matching, except that it does not enforce the matching of all modules. Only the top q percent of the modules are paired. Thus, we can avoid the enforced pairing of unrelated modules.

However, this strategy may cause certain modules to keep on growing and produce very uneven cluster results. Thus, we need to choose a proper cost function that discourages unlimited growth of the cluster size, e.g., cost function (26).


5.7.4. Variations of Clustering Metric

In order to identify good clusters, we need to look beyond the direct adjacency between modules. It is useful if we can also extract the relation between the neighbors' neighbors, or even several levels of neighbors' neighbors. The probabilistic gain model of the group migration approach is one good example of such an approach [37, 42].

In this section, we discuss a few different clustering metrics. For the case of kth connectivity, we count the number of k-hop paths between two modules. Alternatively, we use an analogy of a resistive network to check the conductance between the modules. Furthermore, we look beyond the hypergraph and use other information such as the module functions, pin locations, and control signals.

(i) kth Connectivity The number of k-hop paths between two modules provides a different aspect of information on the adjacency. Suppose the circuit has only two-pin nets. We can derive the kth connectivity with sparse matrix multiplication. Let $C$ be the connectivity matrix with connectivity $c_{ij}$ as its element at row $i$ column $j$ and at row $j$ column $i$, and with diagonal entries $c_{ii} = 0$. Note that we set $c_{ij} = 0$ if there is no net connecting modules $v_i$ and $v_j$.

Let $c_{ij}^{(2)}$ be the element of the square of matrix $C$ ($C^2$), and $c_{ij}^{(k)}$ be the element of the kth power of matrix $C$ ($C^k$). Then $c_{ij}^{(k)}$ represents the number of distinct k-hop paths connecting modules $v_i$ and $v_j$.
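A small sketch of the matrix power computation follows. It uses dense nested lists rather than a sparse representation, purely to keep the example self-contained, and the function names are hypothetical. Note that the entries of the kth power count k-hop walks, so a path that revisits a module is included, and with non-unit connectivities each walk is weighted by the product of the connectivities along it.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def kth_connectivity(c, k):
    """Return the kth power of the connectivity matrix c, for k >= 1."""
    result = c
    for _ in range(k - 1):
        result = mat_mul(result, c)
    return result


# Four modules connected in a ring 0-1-2-3-0 with unit connectivity.
ring = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]
two_hop = kth_connectivity(ring, 2)
```

In the ring, modules 0 and 2 are not adjacent, yet `two_hop[0][2]` is 2 because they are linked through module 1 and through module 3. The second-order metric therefore exposes a relation that direct adjacency misses.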

(ii) Conductivity We use a resistive network analogy [21, 93] to derive the relation between modules. Suppose the circuit has only two-pin nets. We replace each net $e_{ij}$ with a resistor of conductance $c_{ij}$. Hence, we can view the whole system as a resistive network and derive the conductance between modules. The system conductance between two modules $v_i$ and $v_j$ reveals the adjacency relation between the two modules.

The network conductance can be derived using circuit analysis. We can also approximate the conductance with a random walk approach. In a random walk model, we start walking from a module $v_i$. At each module $v_k$, the probability to walk via net $e_{kl}$ to module $v_l$ is proportional to the connectivity, i.e., $c_{kl} / \sum_m c_{km}$. We can derive the relation between the random walk and the conductivity [89]:

$$h_{ij} + h_{ji} = \frac{2 \sum_{e_{kl} \in E} c_{kl}}{\sigma_{ij}} \qquad (100)$$

where $h_{ij}$ denotes the expected number of hops to walk from module $v_i$ to module $v_j$, and $\sigma_{ij}$ denotes the conductance between $v_i$ and $v_j$.
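The relation between random walks and conductance can be checked numerically on a small network. The sketch below assumes the standard commute-time identity from random walk theory: the expected hops from one module to the other plus the expected hops back equal twice the total connectivity divided by the conductance between the two modules. All function names are hypothetical, and the dense Gaussian elimination is only suitable for tiny examples.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def hitting_time(c, i, j):
    """Expected number of hops of a random walk from module i to module j."""
    others = [k for k in range(len(c)) if k != j]
    deg = [sum(row) for row in c]
    # h_k - sum_l P_kl h_l = 1 for every k != j, with h_j = 0.
    a = [[(1.0 if k == l else 0.0) - c[k][l] / deg[k] for l in others]
         for k in others]
    return solve(a, [1.0] * len(others))[others.index(i)]


def conductance(c, i, j):
    """Effective conductance between modules i and j of the resistive network."""
    others = [k for k in range(len(c)) if k != j]      # ground module j
    deg = [sum(row) for row in c]
    lap = [[(deg[k] if k == l else 0.0) - c[k][l] for l in others]
           for k in others]
    rhs = [1.0 if k == i else 0.0 for k in others]     # unit current into i
    voltage = solve(lap, rhs)[others.index(i)]
    return 1.0 / voltage


# Three modules in a chain 0-1-2 with unit connectivity.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
commute = hitting_time(chain, 0, 2) + hitting_time(chain, 2, 0)
total_connectivity = sum(sum(row) for row in chain) / 2
predicted = 2 * total_connectivity / conductance(chain, 0, 2)
```

For the chain, both `commute` and `predicted` evaluate to 8: two unit resistors in series give a conductance of 0.5, and the walk needs 4 expected hops in each direction.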

(iii) Similarity of Signatures We can use certain features beyond connectivity for the clustering metric [88, 91]. For example, the index of data bits, sequence of the pins, function of logic, and relation with common control signals can serve as signatures of function blocks in data path designs. All these features form the first level adjacency. We can extend the relation to multiple levels. For example, two modules connected to a set of modules with strong similarity become similar themselves.

Example As shown in Figure 32, modules A and B are similar in signature because they are of the same OR function, connected to consecutive bit numbers at the same pin location, and controlled by the same control signal at the same pin location.

Modules C and D become similar because module C obtains its signal from A, module D obtains its signal from B, and modules A and B are similar.

FIGURE 32 Signature identifies data structure.

6. RESEARCH DIRECTIONS

Partitioning remains an important research problem. Many applications such as floorplanning, engineering change orders, and performance driven emulation demand effective and efficient partitioning solutions.

Recent efforts released benchmarks with reasonable complexity [3]. However, more design cases are still needed to represent the class of huge circuitry with details of functions and timing.

In this section, we touch on a few interesting research problems regarding the correlation between the partition of logic and physical designs, the manipulation of the hierarchical tree structure, and performance driven partitioning.

6.1. Correlation of Hierarchical Partitioning Structure Between Logic Synthesis and Physical Layout

It is desirable to correlate the logic hierarchy with the physical design hierarchy. The main reason is the control of timing for huge designs. Currently, the design turnaround takes 2-8 months for ASIC and much longer for custom designs. Throughout the design process, designs keep on changing. We do not want to lose control of timing as the design changes. A tight correlation of logic and physical hierarchies makes timing predictable. Without this kind of mechanism, the timing characteristics of a floorplan may become erratic after iterations of design changes.

6.2. Manipulation of Hierarchical Partitioning Structure

One main issue in mapping a huge hierarchical circuit is the utilization of the hierarchy to reduce the mapping complexity. We can drastically improve the efficiency of the mapping process if we properly exploit the structure of the design hierarchy. The generic binary tree is a good formulation to start with.

The handling of a hierarchy tree gives rise to many fundamental research problems. For example, finding k shortest-paths or exploring the maximum-flow minimum-cut of the whole circuit [51] embedded in a hierarchical tree can be useful for interconnect analysis and optimization. Such research can also benefit many different fields which have to handle huge hierarchical systems.

6.3. Performance Driven Partitioning

For performance driven partitioning, we need a fast evaluation on the hierarchical tree structure. The analysis needs to be incremental and to incorporate signal integrity.

The network flow method is a potential approach for partitioning with timing constraints. More efforts are needed to improve the speed and derive the desired results.

Acknowledgements

The authors thank the editor for the encouragement in preparing this manuscript. The authors would also like to thank Ted Carson, Lung-Tien Liu, and John Lillis for helpful discussions.

References

[1] Ahuja, R. K., Magnanti, T. L. and Orlin, J. B., Network Flows, Prentice Hall, 1993.

[2] Alpert, C. J., "The ISPD98 circuit benchmark suite", Int. Symp. on Physical Design, pp. 80-85, April, 1998.

[3] Alpert, C. J., Caldwell, A. E., Kahng, A. B. and Markov, I. L., "Partitioning with Terminals: a "New" Problem and New Benchmarks", Int. Symp. on Physical Design, pp. 151-157, April, 1999.


[4] Alpert, C. J., Huang, J. H. and Kahng, A. B., "Multi-level circuit partitioning", In: Proc. ACM/IEEE Design Automation Conf., June, 1997, pp. 530-533.

[5] Alpert, C. J. and Kahng, A. B., "Recent directions in netlist partitioning: a survey", Integration: The VLSI J., 19(1), 1-81, August, 1995.

[6] Alpert, C. J. and Kahng, A. B., "A general framework for vertex orderings with applications to circuit clustering", IEEE Trans. VLSI Syst., 4(2), 240-246, June, 1996.

[7] Alpert, C. J. and Yao, S. Z., "Spectral partitioning: the more eigenvectors, the better", In: Proc. ACM/IEEE Design Automation Conf., June, 1995, pp. 195-200.

[8] Bakoglu, H. B., Circuits, Interconnections, and Packaging for VLSI, MA: Addison-Wesley, 1990.

[9] Blanks, J. (1989). "Partitioning by Probability Condensation", ACM/IEEE 26th Design Automation Conf., pp. 758-761.

[10] Bollobas, B. (1985). Random Graphs, Academic Press Inc., pp. 31-53.

[11] Boppana, R. B. (1987). "Eigenvalues and Graph Bisection: An Average Case Analysis", Annual Symp. on Foundations in Computer Science, pp. 280-285.

[12] Breuer, M. A., Design Automation of Digital Systems, Prentice-Hall, NY, 1972.

[13] Bui, T., Chaudhuri, S., Jones, C., Leighton, T. and Sipser, M. (1987). "Graph bisection algorithms with good average case behavior", Combinatorica, 7(2), 171-191.

[14] Bui, T., Heigham, C., Jones, C. and Leighton, T., "Improving the performance of the Kernighan-Lin and simulated annealing graph bisection algorithms", In: Proc. ACM/IEEE Design Automation Conf., June, 1989, pp. 775-778.

[15] Buntine, W. L., Su, L., Newton, A. R. and Mayer, A., "Adaptive methods for netlist partitioning", In: Proc. IEEE Int. Conf. Computer-Aided Design, November, 1997, pp. 356-363.

[16] Burkard, R. E. and Bonniger, T. (1983). "A Heuristic for Quadratic Boolean Programs with Applications to Quadratic Assignment Problems", European Journal of Operational Research, 13, 372-386.

[17] Camposano, R. and Brayton, R. K. (1987). "Partitioning Before Logic Synthesis", Int. Conf. on Computer-Aided Design, pp. 324-326.

[18] Chan, P. K., Schlag, D. F. and Zien, J. Y., "Spectral k-way ratio-cut partitioning and clustering", IEEE Trans. Computer-Aided Design, 13(9), 1088-1096, September, 1994.

[19] Charney, H. R. and Plato, D. L., "Efficient Partitioning of Components", IEEE Design Automation Workshop, July, 1968, pp. 16.0-16.21.

[20] Chatterjee, A. C. and Hartley, R., "A new Simultaneous Circuit Partitioning and Chip Placement Approach based on Simulated Annealing", In: Proc. ACM/IEEE Design Automation Conf., June, 1990, pp. 36-39.

[21] Cheng, C. K. and Kuh, E. S., "Module Placement Based on Resistive Network Optimization", IEEE Trans. on Computer-Aided Design, CAD-3, 218-225, July, 1984.

[22] Cheng, C. K., "Linear Placement Algorithms and Applications to VLSI Design", Networks, 17, 439-464, Winter, 1987.

[23] Cheng, C. K. and Hu, T. C., "Ancestor Tree for Arbitrary Multi-Terminal Cut Functions", Proc. Integer Programming/Combinatorial Optimization Conf., Univ. of Waterloo, May, 1990, pp. 115-127.

[24] Cheng, C. K. and Wei, Y. C. (1991). "An Improved Two-Way Partitioning Algorithm with Stable Performance", IEEE Trans. on Computer Aided Design, 10(12), 1502-1511.

[25] Cheng, C. K. (1992). "The Optimal Partitioning of Networks", Networks, 22, 297-315.

[26] Cherng, J. S. and Chen, S. J., "A Stable Partitioning Algorithm for VLSI Circuits", In: Proc. IEEE Custom Integrated Circuits Conf., May, 1996, pp. 9.1.1-9.1.4.

[27] Cherng, J. S., Chen, S. J. and Ho, J. M., "Efficient Bipartitioning Algorithm for Size-Constrained Circuits", IEE Proceedings-Computers and Digital Techniques, 145(1), 37-45, January, 1998.

[28] Cheng, C. K. and Hu, T. C. (1992). "Maximum Concurrent Flow and Minimum Ratio Cut", Algorithmica, 8, 233-249.

[29] Chou, N. C., Liu, L. T., Cheng, C. K., Dai, W. J. and Lindelof, R., "Local Ratio Cut and Set Covering Partitioning for Huge Logic Emulation Systems", IEEE Trans. Computer-Aided Design, pp. 1085-1092, September, 1995.

[30] Chvatal, V. (1983). Linear Programming, W. H. Freeman and Company.

[31] Cong, J. and Ding, Y., "FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs", IEEE Trans. Computer-Aided Design, January, 1994, 13, 1-12.

[32] Cong, J., Labio, W. and Shivakumar, N., "Multi-way VLSI circuit partitioning based on dual net representation", In: Proc. IEEE Int. Conf. Computer-Aided Design, November, 1994, pp. 56-62.

[33] Cong, J., Li, H. P., Lim, S. K., Shibuya, T. and Xu, D., "Large scale circuit partitioning with loose/stable net removal and signal flow based clustering", In: Proc. IEEE Int. Conf. Computer-Aided Design, November, 1997, pp. 441-446.

[34] Donath, W. E. and Hoffman, A. J. (1973). "Lower Bounds for the Partitioning of Graphs", IBM J. Res. Dev., pp. 420-425.

[35] Donath, W. E. and Hoffman, A. J. (1972). "Algorithms for partitioning of graphs and computer logic based on eigenvectors of connection matrices", IBM Technical Disclosure Bulletin 15, pp. 938-944.

[36] Donath, W. E. (1988). "Logic partitioning", In: Physical Design Automation of VLSI Systems, Preas, B. and Lorenzetti, M. (Eds.) Menlo Park, CA: Benjamin/Cummings, pp. 65-86.

[37] Dutt, S. and Deng, W., "A Probability-based Approach to VLSI Circuit Partitioning", In: Proc. ACM/IEEE Design Automation Conf., June, 1996, pp. 100-105.

[38] Dutt, S. and Deng, W., "VLSI Circuit Partitioning by Cluster-Removal Using Iterative Improvement Techniques", In: Proc. IEEE Int. Conf. Computer-Aided Design, November, 1996, pp. 194-200.

[39] Enos, M., Hauck, S. and Sarrafzadeh, M., "Evaluation and optimization of Replication Algorithms for logic Bipartitioning", IEEE Trans. on Computer-Aided Design, September, 1999, 18, 1237-48.

[40] Fiduccia, C. M. and Mattheyses, R. M., "A Linear-Time Heuristic for Improving Network Partitions", In: Proc. ACM/IEEE Design Automation Conf., June, 1982, pp. 175-181.

[41] Frankle, J. and Karp, R. M. (1986). "Circuit Placement and Cost Bounds by Eigenvector Decomposition", Proc. Int. Conf. on Computer-Aided Design, pp. 414-417.


[42] Garbers, J., Promel, H. J. and Steger, A. (1990). "Finding clusters in VLSI circuits", In: Proc. IEEE Int. Conf. Computer-Aided Design, pp. 520-523.

[43] Garey, M. R. and Johnson, D. S., Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman, San Francisco, CA, 1979.

[44] Hagen, L. and Kahng, A. B., "New spectral methods for ratio cut partitioning and clustering", IEEE Trans. Computer-Aided Design, 11(9), 1074-1085, September, 1992.

[45] Hagen, L. and Kahng, A. B., "Combining problem reduction and adaptive multistart: a new technique for superior iterative partitioning", IEEE Trans. Computer-Aided Design, 16(7), 709-717, July, 1997.

[46] Hall, K. M., "An r-dimensional Quadratic Placement Algorithm", Management Science, 17(3), 219-229, November, 1970.

[47] Hamada, T., Cheng, C. K. and Chau, P., "An Efficient Multi-Level Placement Technique Using Hierarchical Partitioning", IEEE Trans. Circuits and Systems, 39, 432-439, June, 1992.

[48] Hennessy, J. (1983). "Partitioning Programmable Logic Arrays Summary", Int. Conf. on Computer-Aided Design, pp. 180-181.

[49] Hoffmann, A. G., "The Dynamic Locking Heuristic - A New Graph Partitioning Algorithm", In: Proc. IEEE Int. Symp. Circuits and Systems, May, 1994, pp. 173-176.

[50] Adolphson, D. and Hu, T. C., "Optimal Linear Ordering", SIAM J. Appl. Math., 25(3), 403-423, November, 1973.

[51] Hu, T. C., "Decomposition Algorithm", pp. 17-22, In: Combinatorial Algorithms, Addison Wesley, 1982.

[52] Hu, T. C. and Moerder, K., "Multiterminal flows in a hypergraph", In: VLSI Circuit Layout: Theory and Design, Hu, T. C. and Kuh, E. (Eds.) NY: IEEE Press, 1985, pp. 87-93.

[53] Hur, S. W. and Lillis, J. (1999). "Relaxation and Clustering in a Local Search Framework: Application to Linear Placement", Design Automation Conference, pp. 360-366.

[54] Hwang, J. and Gamal, A. E., "Optimal Replication for Min-Cut Partitioning", Proc. IEEE/ACM Intl. Conf. Computer-Aided Design, November, 1992, pp. 432-435.

[55] Iman, S., Pedram, M., Fabian, C. and Cong, J., "Finding uni-directional cuts based on physical partitioning and logic restructuring", In: Proc. ACM/SIGDA Physical Design Workshop, May, 1993, pp. 187-198.

[56] Johnson, D. S., Aragon, C. R., McGeoch, L. A. and Schevon, C. (1989). "Optimization by Simulated Annealing: an Experimental Evaluation, Part I, Graph Partitioning", Operations Research, 37(5), 865-892.

[57] Karp, R. M. (1978). "A Characterization of The Minimum Cycle Mean in A Digraph", Discrete Mathematics, 23, 309-311.

[58] Karypis, G., Aggarwal, R., Kumar, V. and Shekhar, S., "Multilevel Hypergraph Partitioning: Application in VLSI Domain", In: Proc. ACM/IEEE Design Automation Conf., June, 1997, pp. 526-529.

[59] Karypis, G., Aggarwal, R., Kumar, V. and Shekhar, S. (1998). "Multilevel Hypergraph Partitioning: Application in VLSI Domain", Manuscript of CS Dept., Univ. of Minnesota, pp. 1-25 (http://www.users.cs.umn.edu/karypis/metis/publications/).

[60] Kernighan, B. W. and Lin, S., "An Efficient Heuristic Procedure for Partitioning Graphs", Bell Syst. Tech. J., 49(2), 291-307, February, 1970.

[61] Khellaf, M., "On The Partitioning of Graphs and Hypergraphs", Ph.D. Dissertation, Indus. Engineering and Operations Research, Univ. of California, Berkeley, 1987.

[62] Kirkpatrick, S., Gelatt, C. and Vecchi, M., "Optimization by Simulated Annealing", Science, 220(4598), 671-680, May, 1983.

[63] Knuth, D. E., The Art of Computer Programming, Addison Wesley, 1997.

[64] Kring, C. and Newton, A. R. (1991). "A Cell-Replicating Approach to Mincut Based Circuit Partitioning", Proc. IEEE Int. Conf. on Computer-Aided Design, pp. 2-5.

[65] Krishnamurthy, B., "An Improved Min-Cut Algorithm for Partitioning VLSI Networks", IEEE Trans. Computers, C-33(5), 438-446, May, 1984.

[66] Krupnova, H., Abbara, A. and Saucier, G. (1997). "A Hierarchy-Driven FPGA Partitioning Method", Design Automation Conf., pp. 522-525.

[67] Kuo, M. T. and Cheng, C. K., "A New Network Flow Approach for Hierarchical Tree Partitioning", In: Proc. ACM/IEEE Design Automation Conf., June, 1997, pp. 512-517.

[68] Kuo, M. T., Liu, L. T. and Cheng, C. K., "Network Partitioning into Tree Hierarchies", In: Proc. ACM/IEEE Design Automation Conf., June, 1996, pp. 477-482.

[69] Kuo, M. T., Liu, L. T. and Cheng, C. K., "Finite State Machine Decomposition for I/O Minimization", In: Proc. IEEE Int. Symp. on Circuits and Systems, May, 1995, pp. 1061-1064.

[70] Kuo, M. T., Wang, Y., Cheng, C. K. and Fujita, M., "BDD-Based Logic Partitioning for Sequential Circuits", In: Proc. ASP/DAC, Chiba, Japan, January, 1997, pp. 607-612.

[71] Lomonosov, M. V. (1985). "Combinatorial Approaches to Multiflow Problems", Discrete Applied Mathematics, 11(1), 1-94.

[72] Landman, B. S. and Russo, R. L., "On a Pin Versus Block Relationship for Partitioning of Logic Graphs", IEEE Trans. on Computers, C-20, 1469-1479, December, 1971.

[73] Lawler, E. L., Combinatorial Optimization: Networks and Matroids, Holt, Rinehart and Winston, New York, 1976.

[74] Leighton, T. and Rao, S. (1988). "An Approximate Max-Flow Min-cut Theorem for Uniform Multicommodity Flow Problems with Applications to Approximation Algorithms", IEEE Symp. on Foundations of Computer Science, pp. 422-431.

[75] Leighton, T., Makedon, F., Plotkin, S., Stein, C., Tardos, E. and Tragoudas, S., "Fast Approximation Algorithms for Multicommodity Flow Problems", Tech. report no. STAN-CS-91-1375, Dept. of Computer Science, Stanford University.

[76] Leiserson, C. E. and Saxe, J. B. (1991). "Retiming Synchronous Circuitry", Algorithmica, 6(1), 5-35.

[77] Lengauer, T. and Muller, R. (1988). "Linear Arrangement Problems on Recursively Partitioned Graphs", Zeitschrift fur Operations Research, 32, 213-230.

[78] Lengauer, T., Combinatorial Algorithms for Integrated Circuit Layout, Wiley, 1990.


[79] Li, J., Lillis, J. and Cheng, C. K., "Linear decomposition algorithm for VLSI design applications", In: Proc. IEEE Int. Conf. Computer-Aided Design, November, 1995, pp. 223-228.

[80] Li, J., Lillis, J., Liu, L. T. and Cheng, C. K., "New Spectral Linear Placement and Clustering Approach", In: Proc. ACM/IEEE Design Automation Conf., June, 1996, pp. 88-93.

[81] Liou, H. Y., Lin, T. T., Liu, L. T. and Cheng, C. K., "Circuit Partitioning for Pipelined Pseudo-Exhaustive Testing Using Simulated Annealing", In: Proc. IEEE Custom Integrated Circuits Conf., May, 1994, pp. 417-420.

[82] Liu, L. T., Kuo, M. T., Cheng, C. K. and Hu, T. C., "A Replication Cut for Two-Way Partitioning", IEEE Trans. Computer-Aided Design, May, 1995, pp. 623-630.

[83] Liu, L. T., Kuo, M. T., Cheng, C. K. and Hu, T. C., "Performance-Driven Partitioning Using a Replication Graph Approach", In: Proc. ACM/IEEE Design Automation Conf., June, 1995, pp. 206-210.

[84] Liu, L. T., Kuo, M. T., Huang, S. C. and Cheng, C. K., "A gradient method on the initial partition of Fiduccia-Mattheyses algorithm", In: Proc. IEEE Int. Conf. Computer-Aided Design, November, 1993, pp. 229-234.

[85] Liu, L. T., Shih, M., Chou, N. C., Cheng, C. K. and Ku, W., "Performance-Driven Partitioning Using Retiming and Replication", In: Proc. IEEE Int. Conf. Computer-Aided Design, November, 1993, pp. 296-299.

[86] Liu, L. T., Shih, M. and Cheng, C. K., "Data Flow Partitioning for Clock Period and Latency Minimization", In: Proc. ACM/IEEE Design Automation Conf., June, 1994, pp. 658-663.

[87] Matula, D. W. and Shahrokhi, F., "The Maximum Concurrent Flow Problem and Sparsest Cuts", Tech. Report, Southern Methodist Univ., 1986.

[88] McFarland, M. C., "Computer-aided partitioning of behavioral hardware descriptions", In: Proc. ACM/IEEE Design Automation Conf., June, 1983, pp. 472-478.

[89] Motwani, R. and Raghavan, P. (1995). RandomizedAlgorithms, Cambridge University Press.

[90] Ng, T. K., Oldfield, J. and Pitchumani, V., "Improve-ments of a mincut partition algorithms", In: Proc. IEEEInt. Conj Computer-Aided Design, November, 1987,pp. 470-473.

[91] Nijssen, R. X. T., Jess, J. A. G. and Eindhoven, T. U.,"Two-Dimensional Datapath Regularity Extraction",Physical Design Workshop, April, 1996, pp. 111 117.

[92] Parhi, K. K. and Messerschmitt, D. G. (1991). "StaticRate-Optimal Scheduling of Iterative Data-Flow Pro-grams via Optimum Unfolding", IEEE Trans. onComputers, 40(2), 178-195.

[93] Riess, B. M., Doll, K. and Johannes, F. M., "Partition-ing very large circuits using analytical placementtechniques", In: Proc. ACM/IEEE Design AutomationConf., June, 1994, pp. 646-651.

[94] Roy, K. and Sechen, C., "A Timing Driven N-Way Chipand Multi-Chin Partitioner", Proc. IEEE/ACM Int.Conf on Computer-Aided Design, pp. 240-247, Novem-ber, 1993.

[95] Russo, R. L., Oden, P. H. and Wolff, P. K. Sr., "Aheuristic procedure for the partitioning and mapping ofcomputer logic graphs", IEEE Trans. on Computers,C-20, 1455-!462, December, 1971.

[96] Saab, Y., "A fast and robust network bisection algorithm", IEEE Trans. Computers, 44(7), 903-913, July, 1995.

[97] Saab, Y. and Rao, V. (1989). "An Evolution-Based Approach to Partitioning ASIC Systems", ACM/IEEE 26th Design Automation Conf., pp. 767-770.

[98] Sanchis, L. A., "Multiple-Way Network Partitioning", IEEE Trans. Computers, 38(1), 62-81, January, 1989.

[99] Sanchis, L. A., "Multiple-Way Network Partitioning with Different Cost Functions", IEEE Trans. on Computers, pp. 1500-1504, December, 1993.

[100] Schuler, D. M. and Ulrich, E. G. (1972). "Clustering and Linear Placement", Proc. 9th Design Automation Workshop, pp. 50-56.

[101] Schweikert, D. G. and Kernighan, B. W. (1972). "A Proper Model for the Partitioning of Electrical Circuits", Proc. 9th Design Automation Workshop, pp. 57-62.

[102] Sechen, C. and Chen, D. (1988). "An Improved Objective Function for Mincut Circuit Partitioning", Proc. Int. Conf. on Computer-Aided Design, pp. 502-505.

[103] Shahrokhi, F. and Matula, D. W., "The Maximum Concurrent Flow Problem", Journal of the ACM, 37(2), 318-334, April, 1990.

[104] Shapiro, J. F., Mathematical Programming: Structures and Algorithms, Wiley, New York (1979).

[105] Sherwani, N. A., Algorithms for VLSI Physical Design Automation, 3rd edn., Kluwer Academic (1999).

[106] Shih, M., Kuh, E. S. and Tsay, R.-S. (1992). "Performance-Driven System Partitioning on Multi-Chip Modules", Proc. 29th ACM/IEEE Design Automation Conf., pp. 53-56.

[107] Shih, M. and Kuh, E. S. (1993). "Quadratic Boolean Programming for Performance-Driven System Partitioning", Proc. 30th ACM/IEEE Design Automation Conf., pp. 761-765.

[108] Shin, H. and Kim, C., "A Simple Yet Effective Technique for Partitioning", IEEE Trans. on Very Large Scale Integration Systems, pp. 380-386, September, 1993.

[109] Wei, Y. C. and Cheng, C. K. (1991). "Ratio Cut Partitioning for Hierarchical Designs", IEEE Trans. on Computer-Aided Design, 10(7), 911-921.

[110] Wei, Y. C., Cheng, C. K. and Wurman, Z., "Multiple Level Partitioning: An Application to the Very Large Scale Hardware Simulators", IEEE Journal of Solid State Circuits, 26, 706-716, May, 1991.

[111] Woo, N. S. and Kim, J. (1993). "An Efficient Method of Partitioning Circuits for Multiple-FPGA Implementation", Proc. ACM/IEEE Design Automation Conf., pp. 202-207.

[112] Yang, H. and Wong, D. F. (1994). "Edge-Map: Optimal Performance Driven Technology Mapping for Iterative LUT Based FPGA Designs", Int. Conf. on Computer-Aided Design, pp. 150-155.

[113] Yang, H. and Wong, D. F., "Efficient Network Flow based Min-Cut Balanced Partitioning", In: Proc. IEEE Int. Conf. Computer-Aided Design, November, 1994, pp. 50-55.

[114] Yeh, C. W., "On the Acceleration of Flow-Oriented Circuit Clustering", IEEE Trans. Computer-Aided Design, 14(10), 1305-1308, October, 1995.

[115] Yeh, C. W., Cheng, C. K. and Lin, T. T. Y., "A general purpose, multiple-way partitioning algorithm", IEEE Trans. Computer-Aided Design, 13(12), 1480-1488, December, 1994.


[116] Yeh, C. W., Cheng, C. K. and Lin, T. T. Y., "Optimization by iterative improvement: an experimental evaluation on two-way partitioning", IEEE Trans. Computer-Aided Design, 14(2), 145-153, February, 1995.

[117] Yeh, C. W., Cheng, C. K. and Lin, T. T. Y., "Circuit clustering using a stochastic flow injection method", IEEE Trans. Computer-Aided Design, 14(2), 154-162, February, 1995.

[118] Zien, J. Y., Chan, P. K. and Schlag, M., "Hybrid spectral/iterative partitioning", In: Proc. IEEE Int. Conf. Computer-Aided Design, November, 1997, pp. 436-440.

Authors’ Biographies

Sao-Jie Chen has been a member of the faculty in the Department of Electrical Engineering, National Taiwan University since 1982, where he is currently a full professor. During the fall of 1999, he held a visiting appointment at the Department of Computer Science and Engineering, University of California, San Diego. His current research interests include VLSI circuit design, VLSI physical design automation, and object-oriented software engineering. Dr. Chen is a member of the Association for Computing Machinery, the IEEE, and the IEEE Computer Society.

Chung-Kuan Cheng received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley in 1984. From 1984 to 1986 he was a senior CAD engineer at Advanced Micro Devices Inc. In 1986, he joined the University of California, San Diego, where he is a Professor in the Computer Science and Engineering Department and an Adjunct Professor in the Electrical and Computer Engineering Department. He served as a chief scientist at Mentor Graphics in 1999. He has been an associate editor of IEEE Trans. on Computer-Aided Design since 1994. He is a recipient of the best paper award, IEEE Trans. on Computer-Aided Design, 1997, and the NCR excellence in teaching award, School of Engineering, UCSD, 1991. His research interests include network optimization and design automation on microelectronic circuits.
