clock routing


    622 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 20, NO. 5, MAY 2001

    Global Routing by New Approximation Algorithms for Multicommodity Flow

    Christoph Albrecht

    Abstract: We show how the new approximation algorithms by Garg and Könemann with extensions due to Fleischer for the multicommodity flow problem can be modified to solve the linear programming relaxation of the global routing problem. Implementation issues to improve the performance, such as a discussion of different functions for the dual variables and how to use the Newton method as an additional optimization step, are given. It is shown that not only the maximum relative congestion is minimized, but the congestion of the edges is distributed equally such that the solution is optimal in a well-defined sense: the vector of the relative congestion of the edges sorted in nonincreasing order is minimal by lexicographic order. This is an important step toward improving signal integrity by extra spacing between wires. Finally, we show how the weighted netlength can be minimized. Our computational results with recent IBM processor chips show that this approach can be used in practice even for large chips and that it is superior on difficult instances where ripup and reroute algorithms fail.

    Index Terms: Global routing, physical design, VLSI.

    I. INTRODUCTION

    Routing is usually split up into two subtasks: global routing, which gives the approximate area for the Steiner tree for a net, and detailed routing, which assigns the actual tracks and vias to the nets. In this paper, we consider the problem of global routing only.

    Many global routers are based on an initial route of all nets followed by a ripup and reroute procedure that tries to reduce the congestion of the edges by rerouting segments of nets on overloaded edges. For difficult instances, these algorithms may run forever and not come up with a solution, even though a solution exists.

    Ting and Tien [1] present such an iterative improvement algorithm for global routing. Their work was later improved by Shragowith and Keel [2], who present an algorithm that terminates in some cases with a proof that a solution does not exist. Meixner and Lauther [3] handle several nets that have one terminal in common at the same time, using a single commodity flow formulation.

    Our algorithm is based on linear programming relaxation and randomized rounding. This approach is not new. It was proposed by Raghavan and Thompson [4], [5]. However, they do not state explicitly how to solve the linear programming relaxation. Solving the linear program directly is possible for very small instances only and a simple formulation as a multicommodity flow problem works only for nets with two terminals.

    Manuscript received July 24, 2000; revised November 13, 2000. This paper was recommended by Associate Editor D. Hill.

    The author is with the Research Institute for Discrete Mathematics, University of Bonn, 53113 Bonn, Germany (e-mail: [email protected]).

    Publisher Item Identifier S 0278-0070(01)03085-8.

    Early work on linear programming approaches has already been done by Aoshima and Kuh [6], Hu and Shing [7], who use the column generation technique, and Huang et al. [8]. Carden IV et al. [9], [24] use the approximation algorithm for multicommodity flow by Shahrokhi and Matula [10]. Their basic approach is similar to ours. However, we use a new and different approximation algorithm with various extensions with which we can solve much larger instances than in [9] and [24]. Finally, we show how the weighted netlength can be minimized, which is important for the approach to be used in practice.

    Other global routers use a hierarchical decomposition approach, for example, Brouwer and Banerjee [11] and Dai and Kuh [12]. Cong and Preas [13] give a spanning forest formulation.

    Our contribution is to show that the new combinatorial approximation algorithm by Garg and Könemann [14] for the maximum concurrent flow problem can be modified to solve the linear programming relaxation of the global routing problem and to show that an idea introduced by Fleischer [15], [25] to improve the theoretical worst case running time of the maximum multicommodity flow problem can be used to speed up the approximation algorithm in practice while the same upper bound on the worst case running time can be proved. We use a function for the dual variables that is different from the one used by Garg and Könemann and give a comparison of the results for both functions in practice. Further performance improvements are achieved by using the Newton method as an additional optimization step. We show that the algorithm computes a solution for which the congestion of all edges is minimized and distributed equally: the vector of the relative congestion of the edges sorted in nonincreasing order is minimal by lexicographic order. Finally, it is possible to minimize the total weighted netlength with respect to a given maximum relative congestion. Our computational results show for the first time that these approximation algorithms give good results in practice even for very large chips.

    The approximation algorithm also provides a solution for the dual linear program. By linear programming duality, the value of any feasible solution for the dual linear program is a lower bound on the optimum maximum relative congestion and, therefore, gives a factor by which the capacities of the edges have to be multiplied at least in order to make the chip routable.

    It is possible to add additional constraints for some nets. Nets that are timing-critical may be routed with minimum length only, perhaps even with respect to a given delay optimal topology, or a bound for the length of the net may be given.

    0278-0070/01$10.00 © 2001 IEEE


    ALBRECHT: GLOBAL ROUTING BY NEW APPROXIMATION ALGORITHMS FOR MULTICOMMODITY FLOW 629

    edge , we have such that the relative congestion of the edges for is better by lexicographic order than for .
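The optimality criterion used here, comparing the vectors of relative congestion sorted in nonincreasing order by lexicographic order, can be illustrated with a minimal sketch. The function names and the sample congestion values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def sorted_congestion(congestion: Sequence[float]) -> List[float]:
    """Return the relative congestions sorted in nonincreasing order."""
    return sorted(congestion, reverse=True)


def lex_better(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if solution a is strictly better than solution b, i.e. its sorted
    congestion vector is lexicographically smaller (same edge set assumed)."""
    if len(a) != len(b):
        raise ValueError("both solutions must cover the same edges")
    # Python compares lists element by element, which is lexicographic order.
    return sorted_congestion(a) < sorted_congestion(b)


# Hypothetical relative congestions of four edges under two routings.
solution_a = [0.9, 0.5, 0.7, 0.9]
solution_b = [0.9, 0.9, 0.8, 0.2]
print(lex_better(solution_a, solution_b))
```

Both sample solutions have the same maximum relative congestion, so minimizing the maximum alone cannot distinguish them; the sorted vectors differ first in the third entry, which is what the lexicographic comparison picks up.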

    VII. MINIMIZING THE TOTAL NETLENGTH

    Minimizing the maximum relative congestion is not the only objective in global routing. Here, we show how to modify the algorithm described in Section III such that the total weighted netlength is considered and minimized subject to the condition that the maximum relative congestion of the edges is at most 1. We follow the approach by Garg and Könemann [14] for the minimum cost multicommodity flow problem.

    In addition to the capacity of each edge, a length is given for each edge, that is, for an edge in the x or y direction, the distance between the midpoints of the tiles corresponding to its two endpoints. For each net, a weight is given. Given a target for the total weighted netlength, the constraint

    (7)

    is added to the linear program (1) of the fractional global routing problem. This constraint is very similar to the capacity constraints for the edges, the first constraint in (1), and the algorithm can be modified to treat this new constraint in the same way as the capacity constraints. To minimize the total weighted netlength, we want the target to be as small as possible such that the maximum relative congestion is at most one. This is achieved by binary search over the target. In practice, the netlength in the final solution is only slightly higher than the minimum netlength obtained when each Steiner tree is as short as possible, ignoring capacities. This minimum netlength therefore gives a good estimate for the target.
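The binary search over the target netlength can be sketched as follows. This is a minimal illustration, not the paper's implementation: the callable `max_congestion_for` is a hypothetical stand-in for one run of the approximation algorithm, and the sketch assumes that the resulting maximum relative congestion does not increase when the target grows.

```python
from typing import Callable


def smallest_feasible_target(
    max_congestion_for: Callable[[float], float],
    lower: float,
    upper: float,
    tolerance: float = 1e-3,
) -> float:
    """Binary search for the smallest target netlength whose solution has
    maximum relative congestion at most one.

    Assumptions (not from the paper): max_congestion_for is nonincreasing
    in the target, and upper is a positive, feasible target.
    """
    if max_congestion_for(upper) > 1.0:
        raise ValueError("upper bound is not feasible")
    while upper - lower > tolerance * upper:
        middle = 0.5 * (lower + upper)
        if max_congestion_for(middle) <= 1.0:
            upper = middle  # feasible: try a smaller target
        else:
            lower = middle  # infeasible: the target must grow
    return upper


# Toy stand-in for the solver: feasible exactly when the target is >= 1200.
def toy_solver(target: float) -> float:
    return 1200.0 / target


# Start at the minimum netlength ignoring capacities (here assumed 1000).
best_target = smallest_feasible_target(toy_solver, lower=1000.0, upper=2000.0)
print(round(best_target, 1))
```

Starting the lower end of the search at the minimum netlength reflects the observation above that this value is already a good estimate, so few solver runs should be needed.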

    For the dual of the linear program, an additional dual variable for constraint (7) is needed. The dual linear program is given by

    subject to

    for

    for
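As a hedged sketch only, the shape of such a dual can be derived by standard linear programming duality. The notation is assumed for illustration and may differ from the paper's: nets $i = 1, \dots, k$ with candidate Steiner trees $\mathcal{T}_i$, fractional variables $x_{i,T}$, capacities $c(e)$, edge lengths $l(e)$, net weights $g(i)$, target netlength $L$, unit wire widths, and constraint (7) written with right-hand side $\lambda L$ by analogy with the capacity constraints.

```latex
\begin{align*}
\text{(primal)}\quad \min\ & \lambda \\
\text{s.t.}\ & \sum_{i=1}^{k} \sum_{T \in \mathcal{T}_i :\, e \in T} x_{i,T} \le \lambda\, c(e)
  && \text{for } e \in E, \\
& \sum_{i=1}^{k} \sum_{T \in \mathcal{T}_i} g(i) \Big( \sum_{e \in T} l(e) \Big) x_{i,T} \le \lambda\, L, \\
& \sum_{T \in \mathcal{T}_i} x_{i,T} = 1
  && \text{for } i = 1, \dots, k, \\
& x_{i,T} \ge 0
  && \text{for } i = 1, \dots, k,\ T \in \mathcal{T}_i. \\[1ex]
\text{(dual)}\quad \max\ & \sum_{i=1}^{k} z_i \\
\text{s.t.}\ & \sum_{e \in E} c(e)\, y_e + L\, \varphi = 1, \\
& \sum_{e \in T} \big( y_e + \varphi\, g(i)\, l(e) \big) \ge z_i
  && \text{for } i = 1, \dots, k,\ T \in \mathcal{T}_i, \\
& y_e \ge 0
  && \text{for } e \in E, \qquad \varphi \ge 0.
\end{align*}
```

In this sketch, $y_e$ is the dual variable of the capacity constraint of edge $e$, $\varphi$ is the additional dual variable for the netlength constraint, and $z_i$ is bounded by the length of a shortest Steiner tree for net $i$ under the combined edge lengths $y_e + \varphi\, g(i)\, l(e)$.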

    Fig. 7 shows the modified algorithm. During the algorithm, minimal Steiner trees are computed with respect to a length defined for each net and edge (line 10). This length is a linear combination of a term measuring how much the edge is used and the weighted length of the edge.
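To make the combined edge length concrete, here is a minimal sketch for a two-terminal net, where a minimal Steiner tree is simply a shortest path. The exact coefficients are an assumption: the sketch uses `usage + netlength_dual * net_weight * length`, and all names (`combined_length`, `netlength_dual`, the toy graph) are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: node -> list of (neighbor, usage term, geometric length).
Graph = Dict[str, List[Tuple[str, float, float]]]


def combined_length(usage: float, length: float,
                    net_weight: float, netlength_dual: float) -> float:
    """Assumed form of the edge length: usage plus weighted geometric length."""
    return usage + netlength_dual * net_weight * length


def shortest_path_cost(graph: Graph, source: str, target: str,
                       net_weight: float, netlength_dual: float) -> float:
    """Dijkstra search from source to target under the combined edge lengths."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, usage, length in graph.get(node, []):
            new_cost = cost + combined_length(usage, length,
                                              net_weight, netlength_dual)
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return float("inf")


# Toy instance: the direct edge a-b is heavily used, the detour via c is not.
toy_graph: Graph = {
    "a": [("b", 5.0, 1.0), ("c", 0.5, 1.0)],
    "c": [("a", 0.5, 1.0), ("b", 0.5, 1.0)],
    "b": [("a", 5.0, 1.0), ("c", 0.5, 1.0)],
}
cost_congestion_only = shortest_path_cost(toy_graph, "a", "b", 1.0, 0.0)
cost_with_netlength = shortest_path_cost(toy_graph, "a", "b", 1.0, 10.0)
print(cost_congestion_only, cost_with_netlength)
```

With the netlength term switched off, the search takes the detour around the heavily used edge; with a large netlength term, the direct edge becomes cheaper despite its usage, which is the trade-off the linear combination encodes.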

    Under an assumption on each Steiner tree found in line 10 (which usually holds for the global routing problem), the proof in Section III can be extended to show that the algorithm in Fig. 7 finds an approximation for the fractional global routing problem with constraint (7) within a bounded number of phases. If the assumption does not hold for some Steiner tree, the variable in line 13 is increased only by a smaller amount, and another Steiner tree has to be found for the same net until the total increment is one (for details, see [14]).

    Fig. 7. Approximation algorithm for fractional global routing regarding the total weighted netlength.

    To apply the Newton method as in Section V, we define a corresponding quantity for constraint (7) and the potential. Higher values of the target relax constraint (7) and, as a result, the congestion of the edges is decreased further.

    VIII. COMPUTATIONAL RESULTS

    We have implemented the algorithm in C. All runs are on an IBM workstation RS/6000 model 270. The global routing algorithm has been used together with a local router in the routing package XRouter (see [20]-[22]) for several IBM processor chips.

    For computing the Steiner trees, we use a subroutine that computes the optimum Steiner tree for nets with two or three terminals. For nets with more terminals, it is a heuristic: it computes the Steiner points of the minimal Steiner tree with respect to length (see [20]), then performs a Dijkstra search using the Steiner points as additional targets. However, these Steiner points do not necessarily have to belong to the Steiner tree, as antennas are removed in the end.

    The condition in line 4 of the algorithm in Figs. 2 and 7 is not checked. Instead, the algorithm stops when a desired maximum relative congestion is achieved.

    Table I shows some chips with their grid size, number of global routing nets (nets with pins in different tiles), average number of pins per net, and average capacity for the edges in x-, y-, and z-direction. PRIZMA is a chip used for telecommunication, 630 fpp is a followup of the Power3 microprocessor, and MBA00 is part of the S/390 processor chip set. All other chips are different ASICs.

    TABLE I. CHARACTERISTICS OF CHIPS FOR THE GLOBAL ROUTING PROBLEM

    TABLE II. MINIMIZING THE MAXIMUM RELATIVE CONGESTION

    All chips are placed flat, except for 630 fpp and m_3l, which are hierarchically designed and placed in blocks. The difference for the global routing problem is that these two chips have more nets which are longer. The chip m_3l has seven layers of metal for routing, chips 630 fpp, PRIZMA, and MBA00 have six routing layers, and WIND_PCI has five layers of metal. The chip quorina has only three routing layers. The chip WIND_ICP is in technology SA-12E, the chip PRIZMA is SA-27 (for details see [23]).

    In the current implementation, all routing layers in x direction are condensed into a single layer for global routing and the same is done for all routing layers in y direction. The graph for the global routing problem is the same as in Fig. 1.

    TABLE III. MINIMIZING CONGESTION AND TOTAL WEIGHTED NETLENGTH

    Table II shows the running times of the approximation algorithm in Fig. 2 with the Newton method (Section V) for these chips and how the maximum relative congestion and the lower bound improve during the algorithm. The running time to compute the lower bound (compute a minimal Steiner tree for each net according to Theorem 1 at the end of a phase) is not included in the time given in the last column. The lower bound is based on the assumption that the subroutine computes the optimum Steiner tree, which is not the case for nets with more than three terminals.

    Table III shows the results of the algorithm in Fig. 7, which also considers the netlength. In the fourth column, the weighted netlength of the fractional solution is shown. For the target netlength, we always chose the minimum netlength (each Steiner tree routed as short as possible, ignoring capacities), which is found in the first column.

    The computational results also show that the running time needed for a phase is smaller if the netlength is also considered. The reason is the following. The dual variables increase more for highly congested edges and these edges build up barriers that have to be broken by the Dijkstra search for those nets that need to cross these barriers and, hence, more nodes have to be labeled. If the netlength is also considered, the length of an edge is the sum of the usage and the weighted length, and the Dijkstra search can go straighter to the target if the terminals of the net lie in uncongested regions. For the same reason, the running time for a phase increases during the algorithm if the netlength is not considered.


    Fig. 8. Congestion of edges for the chip MBA00. (a) Initial solution for the ripup and reroute algorithm. (b) Final solution of the ripup and reroute algorithm. (c) Fractional solution after ten phases of the approximation algorithm in Fig. 7, which also considers the netlength.

    We compared the algorithm with the following ripup and reroute algorithm: In the first phase, each net is routed with minimum length. The length of the overloaded edges is increased by a very small value, such that the total number of overloaded edges used is minimized as a second objective for the case that there are several routes of minimum length.

    In a second phase, the congestion of overloaded edges is reduced by rerouting all segments crossing an edge with maximum overload and finally exchanging the segment whose length increases the least. When segments are rerouted, edges with the maximum overload are either forbidden or have a high cost that grows exponentially with their congestion.

    For chip PRIZMA, this ripup and reroute algorithm failed and increased the netlength by more than 3% starting from the minimum netlength (each net routed separately), whereas the approximation algorithm found a solution with the total netlength being only 1.7% greater than the minimum netlength.

    On easy instances, for example, chip 630 fpp, routing each net with minimum length gave almost a feasible solution. In this case, the running time of the approximation algorithm was much higher and there was almost no improvement in the quality of the solution.

    Fig. 8 shows a comparison of the congestion of the edges for the ripup and reroute algorithm and for the approximation algorithm: (a) shows the congestion of the edges for the initial solution (each net routed as short as possible but overloaded edges are already avoided); (b) shows the final solution of the ripup and reroute algorithm; and (c) shows the fractional solution after ten phases of the approximation algorithm in Fig. 7, which also considers the netlength. The number of highly congested edges (for example, those with a relative congestion between 0.85 and 1.0) is considerably smaller in (c).

    After solving the fractional global routing problem, randomized rounding is applied. For each net, one Steiner tree is chosen at random, independently of the other nets, and the probability that a given Steiner tree is chosen is its value in the fractional solution. For details, see [4] and [5]. Randomized rounding is not at all critical during the whole process. We have not even implemented a derandomized version; we simply perform 100 randomized rounding steps and choose the solution for which the maximum relative congestion is minimum. The maximum relative congestion increased only on very few edges and by using the ripup and reroute algorithm, the congestion could be reduced to almost the value of the solution for the linear program in a few seconds.
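The rounding step described above can be sketched as follows. This is an illustration under assumptions, not the paper's code: trees are represented as sets of edge names, every net is assumed to use one unit of capacity per edge, and the names `randomized_rounding`, `fractional`, and `capacity` are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Tuple

Tree = FrozenSet[str]                      # a Steiner tree as a set of edges
Fractional = List[List[Tuple[Tree, float]]]  # per net: (tree, fractional value)


def max_relative_congestion(choice: List[Tree],
                            capacity: Dict[str, float]) -> float:
    """Maximum over all edges of (number of chosen trees using it) / capacity."""
    load = {edge: 0 for edge in capacity}
    for tree in choice:
        for edge in tree:
            load[edge] += 1  # unit wire width assumed
    return max(load[edge] / capacity[edge] for edge in capacity)


def randomized_rounding(fractional: Fractional, capacity: Dict[str, float],
                        rounds: int = 100, seed: int = 0):
    """Repeat the rounding `rounds` times and keep the best integral solution."""
    rng = random.Random(seed)
    best_choice, best_value = None, float("inf")
    for _ in range(rounds):
        choice = []
        for candidates in fractional:
            trees = [tree for tree, _ in candidates]
            weights = [value for _, value in candidates]
            # Pick one tree per net with probability equal to its value.
            choice.append(rng.choices(trees, weights=weights, k=1)[0])
        value = max_relative_congestion(choice, capacity)
        if value < best_value:
            best_choice, best_value = choice, value
    return best_choice, best_value


# Toy instance: two nets, each split evenly between two parallel edges.
capacity = {"e1": 1.0, "e2": 1.0}
fractional = [
    [(frozenset({"e1"}), 0.5), (frozenset({"e2"}), 0.5)],
    [(frozenset({"e1"}), 0.5), (frozenset({"e2"}), 0.5)],
]
chosen, congestion = randomized_rounding(fractional, capacity)
print(congestion)
```

Keeping the best of many independent roundings is the simple alternative to derandomization mentioned in the text; a single rounding of the toy instance overloads an edge about half of the time, while the best of 100 is very unlikely to.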

    For the chip MBA00, the distribution of the congestion of the edges after randomized rounding looked exactly the same as the distribution before randomized rounding. There was no visible difference in Fig. 8(c).

    We would also like to mention another advantage that we observed in our comparison with the ripup and reroute algorithm. For each net, we computed the factor of how much longer the net is in the final solution compared to the minimum length possible. For the ripup and reroute algorithm, there were several nets with a high factor; for the approximation algorithm (the version which considers the netlength), the maximum factor and also the number of nets with a high factor were much smaller. When a ripup and reroute algorithm decreases the congestion of an edge, it has a very limited view and a limited number of choices, so it may have to increase the length of the Steiner tree for a net substantially.
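The per-net factor used in this comparison is straightforward to compute. The following minimal sketch uses hypothetical net names, lengths, and a hypothetical threshold for what counts as a high factor; none of these values come from the paper.

```python
from typing import Dict


def detour_factors(final_length: Dict[str, float],
                   minimum_length: Dict[str, float]) -> Dict[str, float]:
    """Factor by which each net is longer than its minimum possible length."""
    return {net: final_length[net] / minimum_length[net]
            for net in final_length}


def count_high_factors(factors: Dict[str, float], threshold: float) -> int:
    """Number of nets whose factor exceeds the given threshold."""
    return sum(1 for factor in factors.values() if factor > threshold)


# Hypothetical lengths for three nets.
final_length = {"n1": 12.0, "n2": 30.0, "n3": 10.0}
minimum_length = {"n1": 10.0, "n2": 10.0, "n3": 10.0}

factors = detour_factors(final_length, minimum_length)
print(max(factors.values()), count_high_factors(factors, threshold=2.0))
```

Reporting both the maximum factor and the number of nets above a threshold mirrors the two quantities compared in the paragraph above.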

    IX. CONCLUSION

    We have presented an approximation algorithm for solving the linear programming relaxation of the global routing problem. If a solution for the fractional global routing problem with maximum relative congestion at most one does not exist, the global routing problem is not solvable. In this case, the algorithm provides a proof by the dual solution. This is one of the advantages over many other algorithms for global routing.

    Our computational results show that the algorithm is superior on difficult instances where ripup and reroute algorithms fail.


    ACKNOWLEDGMENT

    The author would like to thank B. Korte for his continuous support, L. Fleischer for helpful discussions, and A. Rohe, K. Khler, and J. Werber for joint development of XRouter. The author would also like to thank the IBM teams in Böblingen, Fishkill, and Austin for feedback, test cases, and joint development of XRouter.

    REFERENCES

    [1] B. S. Ting and B. N. Tien, "Routing techniques for gate array," IEEE Trans. Computer-Aided Design, vol. CAD-2, pp. 301-312, Oct. 1983.

    [2] E. Shragowith and S. Keel, "A global router based on a multicommodity flow model," Integr. VLSI J., vol. 5, no. 2, pp. 3-16, 1987.

    [3] G. Meixner and U. Lauther, "A new global router based on a flow model and linear assignment," in Proc. Int. Conf. Computer-Aided Design, Nov. 1990, pp. 44-47.

    [4] P. Raghavan and C. D. Thompson, "Randomized rounding: A technique for provably good algorithms and algorithmic proofs," Combinatorica, vol. 7, no. 4, pp. 365-374, 1987.

    [5] P. Raghavan and C. D. Thompson, "Multiterminal global routing: A deterministic approximation," Algorithmica, vol. 6, pp. 73-82, 1991.

    [6] K. Aoshima and E. S. Kuh, "Multi-channel optimization in gate-array LSI layout," in Proc. IEEE Int. Symp. Circuits and Systems, 1983, pp. 1005-1008.

    [7] T. C. Hu and M. T. Shing, "A decomposition algorithm for circuit routing," in VLSI Circuit Layout: Theory and Design, T. C. Hu and E. S. Kuh, Eds. New York: IEEE Press, 1985, pp. 144-152.

    [8] J. Huang, X.-L. Hong, C.-K. Cheng, and E. S. Kuh, "An efficient timing-driven global routing algorithm," in Proc. 30th ACM/IEEE Design Automation Conf., June 1993, pp. 596-600.

    [9] R. C. Carden IV, J. Li, and C.-K. Cheng, "A global router with a theoretical bound on the optimum solution," IEEE Trans. Computer-Aided Design, vol. 15, pp. 208-216, Feb. 1996.

    [10] F. Shahrokhi and D. W. Matula, "The maximum concurrent flow problem," J. Assoc. Comput. Mach., vol. 37, no. 2, pp. 318-334, Apr. 1990.

    [11] R. J. Brouwer and P. Banerjee, "PHIGURE: A parallel hierarchical global router," in Proc. 27th ACM/IEEE Design Automation Conf., June 1991, pp. 650-653.

    [12] W.-M. Dai and E. S. Kuh, "Simultaneous floor planning and global routing for hierarchical building-block layout," IEEE Trans. Computer-Aided Design, vol. 6, pp. 828-837, May 1997.

    [13] J. Cong and B. Preas, "A new algorithm for standard cell global routing," Integr. VLSI J., vol. 14, no. 1, pp. 49-65, 1992.

    [14] N. Garg and J. Könemann, "Faster and simpler algorithms for multicommodity flow and other fractional packing problems," in Proc. 39th Annu. Symp. Foundations of Computer Science, Nov. 1998, pp. 300-309.

    [15] L. K. Fleischer, "Approximating fractional multicommodity flow independent of the number of commodities," in Proc. 40th Annu. Symp. Foundations of Computer Science, Oct. 1999, pp. 24-31.

    [16] B. Korte and J. Vygen, Combinatorial Optimization: Theory and Algorithms. Berlin, Germany: Springer-Verlag, 2000.

    [17] A. V. Goldberg, J. D. Oldham, S. Plotkin, and C. Stein, "An implementation of a combinatorial approximation algorithm for minimum-cost multicommodity flow," in Proc. 6th Int. Integer Programming and Combinatorial Optimization Conf., June 1998, pp. 338-352.

    [18] C. Albrecht, "Provably good global routing by a new approximation algorithm for multicommodity flow," in Proc. Int. Symp. Physical Design, Apr. 2000, pp. 19-25.

    [19] S. Plotkin, D. Shmoys, and É. Tardos, "Fast approximation algorithms for fractional packing and covering problems," Math. Oper. Res., vol. 20, no. 2, pp. 257-301, May 1995.

    [20] A. Hetzel, "Verdrahtungsprobleme im VLSI-Design: Spezielle Teilprobleme und ein sequentielles Lösungsverfahren," Ph.D. dissertation (in German), Univ. Bonn, Bonn, Germany, 1995.

    [21] A. Hetzel, "A sequential detailed router for huge grid graphs," in Proc. Design, Automation, Test Eur., Feb. 1998, pp. 332-338.

    [22] A. Rohe and M. Zachariasen, "Rectilinear group steiner trees and application in VLSI-design," Res. Inst. Discrete Mathematics, Univ. Bonn, Bonn, Germany, Tech. Rep. 00906, 2000.

    [23] IBM Technol. [Online]. Available: http://www.chips.ibm.com/products/asics/products/stdcell.html

    [24] R. C. Carden IV and C. K. Cheng, "A global router using an efficient approximate multicommunity multiterminal flow approximation," in Proc. 28th ACM/IEEE Design Automation Conf., June 1991, pp. 316-321.

    [25] L. K. Fleischer, "Approximating fractional multicommodity flow independent of the number of commodities," SIAM J. Discrete Math., vol. 13, no. 4, pp. 505-520, 2000.

    Christoph Albrecht received the Diploma and Ph.D. degree in mathematics from the University of Bonn, Bonn, Germany, in 1996 and 2001, respectively.

    Since 1994, he has been with the Research Institute for Discrete Mathematics, the University of Bonn. His research interests include combinatorial optimization and the application of algorithms in VLSI design.