
Practical Approximation Algorithms for Separable Packing LPs

F.F. Dragan (Kent State)

A.B. Kahng (UCSD)

I. Mandoiu (UCLA/UCSD)

S. Muddu (Sanera Systems)

A. Zelikovsky (Georgia State)


Outline

• VLSI design motivation

– Global routing via buffer-blocks

– Separable packing ILP formulations

• PTAS for separable packing LPs

• Analysis

• Experimental results


VLSI Global Routing

[Figure: buffered VLSI global routing, with buffers placed in buffer blocks]

Problem Formulation

Global Routing via Buffer-Blocks (GRBB) Problem

Given:

• BB locations and capacities

• List of multi-pin nets

– Upper bound on #buffers for each source-sink path

• L/U bounds on the wirelength between consecutive buffers/pins

Find:

• Buffered routing of a maximum number of nets subject to the given constraints


Integer Program Formulation

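Routing graph: $G = (V, E)$, where $V = \text{terminals} \cup \text{Buffer Blocks}$ and $E = \{(u,v) : L \le \mathrm{dist}(u,v) \le U\}$

$$
\mathrm{cap}(v) = \begin{cases} 1 & \text{if } v \text{ is a terminal} \\ \text{BB capacity} & \text{otherwise} \end{cases}
\qquad
\pi(T, v) = \begin{cases} 1 & \text{if } v \in T \\ 0 & \text{otherwise} \end{cases}
$$

$$
\begin{aligned}
\max\ & \sum_{T} f(T) \\
\text{s.t.}\ & \sum_{T} \pi(T, v)\, f(T) \le \mathrm{cap}(v) \quad \forall v \in V \\
& f(T) \in \{0, 1\} \quad \forall T
\end{aligned}
$$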


Enforcing Parity Constraints

• Inverting buffers change the polarity of the signal

• Each sink has a given polarity requirement

– Parity constraints for the #buffers on each routed source-sink path

– A path may use two buffers in the same buffer block

Integer program changes

• Split each BB vertex r of G into two copies, r' and r''

• Impose capacity constraint on the sets of vertices {r', r''}:

$$\sum_{T} \left[ \pi(T, r') + \pi(T, r'') \right] f(T) \le \mathrm{cap}(r)$$


Combining with compaction

Set capacity constraints: cap(BB1) + cap(BB2) ≤ const.


GRBB with Buffer Library

• Discrete buffer library: different buffer sizes/driving strengths

– Need to allocate BB capacity between different buffer types

Integer program changes

• Replace each BB vertex r of G by a set X(r) of vertices (one for each buffer type)

• Modify edge set of G to take into account non-uniform driving strengths

• Impose capacity constraint on the sets of vertices X(r):

$$\sum_{T} \sum_{r' \in X(r)} \pi(T, r')\, \mathrm{size}(r')\, f(T) \le \mathrm{cap}(r)$$


“Relax+Round” Approach to GRBB

1. Solve the fractional relaxation

– Exact linear programming algorithms are impractical for large instances

– KEY IDEA: use an approximation algorithm

• allows fine-tuning the tradeoff between runtime and solution quality

2. Round to integer solution

– Provably good rounding [RT87]

– Practical runtime (random-walk based)


Separable Packing LP

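Routing graph: $G = (V, E)$

Size function: $\mathrm{size}: V \to \mathbb{R}_{+}$ s.t. $\mathrm{size}(v) = 1$ for every terminal $v$

Capacity function: $\mathrm{cap}: 2^{V} \to \mathbb{Z}_{+}$ s.t. $\mathrm{cap}(\{v\}) = 1$ for every terminal $v$

$$
\begin{aligned}
\max\ & \sum_{T} f(T) \\
\text{s.t.}\ & \sum_{T} \pi(T, X)\, f(T) \le \mathrm{cap}(X) \quad \forall X \subseteq V \\
& f(T) \ge 0 \quad \forall T \\
\text{where}\ & \pi(T, X) = \sum_{v \in X \cap T} \mathrm{size}(v)
\end{aligned}
$$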


Previous Work

• MCF and packing/covering LP approximation: [FGK73,SM90, PST91,G92,GK94,KPST94,LMPSTT95,R95,Y95,GK98,F00,…]

• Exponential length function to model flow congestion [SM90]

• Shortest-path augmentation + final scaling [Y95]

• Modified routing increment [GK98]

• Fewer shortest-path augmentations [F00]

• We extend speed-up idea of [F00] to separable packing LPs


Separable Packing LP Algorithm

w(X) ← δ, f ← 0, α = δ
For i = 1 to N do
    For k = 1, …, #nets do
        Find min weight feasible Steiner tree T for net k
        While weight(T) < min{ 1, (1+ε)α } do
            f(T) = f(T) + 1
            For every X do
                w(X) ← ( 1 + ε π(T,X)/cap(X) ) * w(X)
            End For
            Find min weight feasible Steiner tree T for net k
        End While
    End For
    α = (1+ε)α
End For
Output f/N
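The loop structure above can be tried out on a toy instance. The following is a minimal sketch, not the authors' implementation: it assumes each net comes with an explicit, small list of candidate trees (so the "min weight feasible Steiner tree" step is a plain minimum over that list), takes weight(T) to be the sum of π(T,X)·w(X) over all constraint sets X, and uses N = ⌈log_{1+ε}(1/δ)⌉. The function name, the data layout, and the toy instance are all hypothetical.

```python
import math

def separable_packing_lp(nets, sets, cap, size, eps, delta):
    """Toy version of the multiplicative-weights loop (illustrative only).

    nets: list of nets, each a list of candidate trees (frozensets of vertices)
    sets: list of constraint sets X (frozensets); cap[j] is cap(sets[j])
    size: dict vertex -> size(v); missing vertices default to size 1
    """
    def pi(T, X):
        # pi(T, X) = sum of size(v) over v in X that lie on T
        return sum(size.get(v, 1.0) for v in T & X)

    w = [delta] * len(sets)          # w(X) <- delta
    f = {}                           # f <- 0
    alpha = delta
    # Assumed iteration count: lower bound delta*(1+eps)^N must reach 1.
    N = math.ceil(math.log(1.0 / delta) / math.log(1.0 + eps))

    def weight(T):
        return sum(pi(T, X) * w[j] for j, X in enumerate(sets))

    for _ in range(N):
        for trees in nets:
            T = min(trees, key=weight)
            while weight(T) < min(1.0, (1.0 + eps) * alpha):
                f[T] = f.get(T, 0.0) + 1.0
                for j, X in enumerate(sets):
                    w[j] *= 1.0 + eps * pi(T, X) / cap[j]
                T = min(trees, key=weight)
        alpha *= 1.0 + eps
    return {T: val / N for T, val in f.items()}, N

# Hypothetical instance: two nets, two buffer blocks r1 and r2, unit capacities.
A = frozenset({"s0", "r1", "t0"})
B = frozenset({"s1", "r1", "t1"})
C = frozenset({"s1", "r2", "t1"})
nets = [[A], [B, C]]
sets = [frozenset({v}) for v in ("s0", "t0", "s1", "t1", "r1", "r2")]
cap = [1, 1, 1, 1, 1, 1]
flow, N = separable_packing_lp(nets, sets, cap, {}, eps=0.1, delta=0.01)
```

With these choices the scaled flow can exceed a capacity by at most a factor (N+1)/N, which is why the value of δ matters in the analysis; the toy δ here is not the tuned value from the theorem.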


Runtime

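Dual LP:

$$
\begin{aligned}
\min\ & \sum_{X} w(X)\, \mathrm{cap}(X) \\
\text{s.t.}\ & \sum_{X} \pi(T, X)\, w(X) \ge 1 \quad \forall T \\
& w(X) \ge 0 \quad \forall X
\end{aligned}
$$

• Choose #iterations N such that all feasible trees have weight ≥ 1 after N iterations (i.e., α ≥ 1)

• Tree weight lower bound is δ initially, and is multiplied by (1+ε) in each iteration

$$N = \log_{1+\varepsilon} \frac{1}{\delta}$$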
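The iteration count follows from a one-line calculation. Writing δ for the initial lower bound on tree weight and ε for the per-iteration growth parameter (symbol names assumed here), the lower bound after N iterations is δ(1+ε)^N, and requiring it to reach 1 gives:

```latex
\[
\delta\,(1+\varepsilon)^{N} \ge 1
\quad\Longleftrightarrow\quad
N \ge \log_{1+\varepsilon}\frac{1}{\delta}
  = \frac{\ln(1/\delta)}{\ln(1+\varepsilon)}
\]
```

So N grows like (1/ε)·ln(1/δ) for small ε, since ln(1+ε) ≈ ε.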


Approximation Guarantee

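Theorem: For every ε < .15, the algorithm finds a factor 1/(1+4ε) approximation by choosing

$$\delta = (1+\varepsilon)\,\bigl((1+\varepsilon)L\bigr)^{-1/\varepsilon}$$

where L is the maximum number of vertices in a feasible Steiner tree. For this value of δ, the running time is

$$O\!\left(\frac{1}{\varepsilon^{2}}\,(\#\text{nets})\,\log L \cdot T_{\text{tree}}\right)$$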
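To get a feel for the magnitudes involved, the sketch below evaluates the theorem's choice δ = (1+ε)((1+ε)L)^(-1/ε) and the resulting iteration count, assuming N = log_{1+ε}(1/δ). The function names and the sample values of ε and L are illustrative assumptions, not taken from the slides.

```python
import math

def choose_delta(eps, L):
    """delta = (1+eps) * ((1+eps)*L) ** (-1/eps)."""
    return (1.0 + eps) * ((1.0 + eps) * L) ** (-1.0 / eps)

def num_iterations(eps, delta):
    """N = log_{1+eps}(1/delta), before rounding up to an integer."""
    return math.log(1.0 / delta) / math.log(1.0 + eps)

eps, L = 0.1, 20          # hypothetical parameter values
delta = choose_delta(eps, L)
N = num_iterations(eps, delta)
print(f"delta = {delta:.3e}, iterations = {math.ceil(N)}")
```

Substituting δ into the formula for N gives N + 1 = (1/ε)(1 + log_{1+ε} L), which is where the (1/ε²)·log L factor in the running time comes from.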


Implementation choices

                | 2-pin                       | 3,4-pin                                      | Multi-pin
Decomposition   | Star, minimum spanning tree | Matching, 3-restricted Steiner tree          | Not needed
Min-weight DRST | Shortest path (exact)       | Try all Steiner pts + shortest paths (exact) | Very hard! Heuristics
Rounding        | Random-walk                 | Backward random-walks                        | Backward random-walks


Provably Good Rounding

1. Store fractional flows f(T) for every feasible Steiner tree T

2. Scale down each f(T) by 1-ε for small ε

3. Each net k routed with prob. f(k) = Σ{ f(T) | T feasible for k }

– Number of routed nets ≥ (1-ε)OPT

4. To route net k, choose tree T with probability = f(T) / f(k)

– With high probability, no BB capacity is exceeded

Problem: Impractical to store all non-zero flow trees
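The four rounding steps can be sketched as follows for the (impractical) case where all trees with non-zero flow are stored explicitly. This is an illustrative sketch only: the dictionary layout, the function name, and the assumption that each net's total flow f(k) is at most 1 are not from the slides.

```python
import random

def round_with_stored_trees(flows, eps, rng):
    """flows: dict net -> dict tree_id -> fractional flow f(T).

    Returns dict net -> chosen tree_id for the nets that get routed.
    Assumes f(k) = sum of f(T) over the net's trees is at most 1.
    """
    routed = {}
    for net, tree_flows in flows.items():
        # Step 2: scale every f(T) down by (1 - eps).
        scaled = {T: (1.0 - eps) * f for T, f in tree_flows.items()}
        f_k = sum(scaled.values())
        # Step 3: route net k with probability f(k).
        if f_k > 0 and rng.random() < f_k:
            trees = list(scaled)
            # Step 4: pick tree T with probability f(T) / f(k).
            weights = [scaled[T] for T in trees]
            routed[net] = rng.choices(trees, weights=weights, k=1)[0]
    return routed

# Hypothetical fractional solution for two nets.
example_flows = {"n1": {"T1": 0.5, "T2": 0.5}, "n2": {"T3": 0.0}}
print(round_with_stored_trees(example_flows, eps=0.1, rng=random.Random(0)))
```

The capacity check itself is not performed here; the slide's claim is that violations are unlikely after the scaling step.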


Random-Walk 2-TMCF Rounding

1. Store fractional flows f(T) for every valid routing tree T

2. Scale down each f(T) by 1-ε for small ε

3. Each net k routed with prob. f(k) = Σ{ f(T) | T routing for k }

– Number of routed nets ≥ (1-ε)OPT

4. To route net k, choose tree T with probability = f(T) / f(k)

– With high probability, no BB capacity is exceeded

• Use random walk from source to sink

• Practical: random walk requires storing only flows on edges
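A random walk over per-edge flows can stand in for explicit tree storage in the 2-pin case. The sketch below is a hedged illustration: it assumes the net's edge flows are acyclic and satisfy flow conservation (so every walk reaches the sink), and the function names and the example flow are hypothetical.

```python
import random

def random_walk_path(edge_flow, source, sink, rng):
    """Walk from source to sink, leaving each node along an outgoing
    edge chosen with probability proportional to its flow."""
    out = {}
    for (u, v), fl in edge_flow.items():
        if fl > 0:
            out.setdefault(u, []).append((v, fl))
    path = [source]
    while path[-1] != sink:
        nbrs, flows = zip(*out[path[-1]])
        path.append(rng.choices(nbrs, weights=flows, k=1)[0])
    return path

def round_net(edge_flow, source, sink, rng):
    """Route the net with probability f(k), the total flow leaving the source."""
    f_k = sum(fl for (u, _), fl in edge_flow.items() if u == source)
    if rng.random() >= f_k:
        return None
    return random_walk_path(edge_flow, source, sink, rng)

# Hypothetical edge flows for one 2-pin net: 0.5 via a, 0.25 via b.
net_flow = {("s", "a"): 0.5, ("s", "b"): 0.25, ("a", "t"): 0.5, ("b", "t"): 0.25}
print(round_net(net_flow, "s", "t", random.Random(0)))
```

Under the stated assumptions, a given source-sink path is chosen with probability proportional to the flow a path decomposition would assign to it, which is what step 4 asks for.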


Random-Walk MTMCF Rounding

[Figure: source S and sinks T1, T2, T3]


The MTMCF Rounding Heuristic

1. Round each net k with probability f(k), using backward random walks

– No scaling-down, approximate MTMCF < OPT

2. Resolve capacity violations by greedily deleting routed paths

– Few violations

3. Greedily route remaining nets using unused BB capacity

– Further routing still possible
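Step 2 of the heuristic can be sketched as below. The slides do not say which routed paths are deleted first, so the rule used here (drop the net that occupies the most slots in currently overloaded buffer blocks) is purely an assumption for illustration, as are the function name and data layout.

```python
from collections import Counter

def resolve_violations(routes, cap):
    """routes: dict net -> list of buffer blocks used (one entry per buffer).
    cap: dict buffer block -> capacity.
    Greedily deletes routed nets until no buffer block is over capacity."""
    routes = dict(routes)
    while True:
        usage = Counter()
        for bbs in routes.values():
            usage.update(bbs)
        over = {b for b, n in usage.items() if n > cap[b]}
        if not over:
            return routes
        # Assumed rule: remove the net with the most buffers in overloaded BBs.
        worst = max(routes, key=lambda k: sum(1 for b in routes[k] if b in over))
        del routes[worst]

# Hypothetical rounded solution that overloads block A.
rounded = {"n1": ["A"], "n2": ["A", "B"], "n3": ["B"]}
kept = resolve_violations(rounded, {"A": 1, "B": 2})
print(sorted(kept))
```

Nets removed here become candidates for step 3, which tries to route them again through whatever capacity is left.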


Implemented Heuristics

• Greedy buffered routing:

1. For each net, route sinks sequentially along shortest paths to source or node already connected to source

2. After routing a net, remove fully used BBs

• Generalized MCF approximation + randomized rounding

– G2TMCF

– G3TMCF (3-pin decomposition)

– G4TMCF (4-pin decomposition)

– GMTMCF (no decomposition, approximate DRST)
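The greedy baseline can be sketched in a few lines. This is a simplified illustration under several assumptions that are not in the slides: the routing graph is given as an adjacency dict whose edges already respect the L/U wirelength bounds, "shortest" means fewest hops (plain BFS), each net consumes one buffer per buffer block it passes through, and a net that cannot connect all its sinks is dropped with its buffers returned.

```python
from collections import deque

def bfs_to_connected(adj, remaining, start, targets):
    """Fewest-hop path from start to any node in targets, passing only
    through buffer blocks that still have free capacity."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u in targets:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path
        for v in adj.get(u, ()):
            if v not in parent and (v in targets or remaining.get(v, 0) > 0):
                parent[v] = u
                queue.append(v)
    return None

def greedy_route(adj, cap, nets):
    """nets: list of (source, [sinks]). Returns (routed sources, leftover capacity)."""
    remaining = dict(cap)
    routed = []
    for source, sinks in nets:
        connected, used, ok = {source}, [], True
        for sink in sinks:
            path = bfs_to_connected(adj, remaining, sink, connected)
            if path is None:
                ok = False
                break
            for v in path:
                if v in remaining and v not in connected:
                    remaining[v] -= 1      # a full BB drops out of later searches
                    used.append(v)
            connected.update(path)
        if ok:
            routed.append(source)
        else:
            for v in used:                 # roll back the failed net
                remaining[v] += 1
    return routed, remaining

# Hypothetical instance: two 2-pin nets competing for one buffer block.
adj = {"s1": ["b1"], "t1": ["b1"], "s2": ["b1"], "t2": ["b1"],
       "b1": ["s1", "t1", "s2", "t2"]}
nets = [("s1", ["t1"]), ("s2", ["t2"])]
print(greedy_route(adj, {"b1": 1}, nets))
```

Because capacity is committed net by net with no look-ahead, the order of the nets decides which ones get routed, which is the weakness the MCF-based heuristics aim to avoid.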


Experimental Setup

• Test instances extracted from next-generation SGI microprocessor

• Up to 5,000 nets, ~6,000 sinks

• U = 4,000 µm, L = 500-2,000 µm

• 50 buffer blocks

• 200-400 buffers / BB


% Routed Nets vs. Runtime

[Figure: % routed nets (93 to 99) vs. CPU seconds (log scale, 0.1 to 100,000) for MT-Greed, G2TMCF, G3TMCF, G4TMCF, and GMTMCF]


Conclusions and Ongoing Work

• Provably good algorithms and practical heuristics based on separable packing LP approximation

– Higher completion rates than previous algorithms

• Extensions:

– Combine global buffering with BB planning

– Buffer “site” methodology → tile graph

– Routing congestion (channel capacity constraints)

– Simultaneous pin assignment


% Sinks Connected

#sinks/#nets | Greed | G2TMCF ε=.64 | G2TMCF ε=.04 | G3TMCF ε=.64 | G3TMCF ε=.04 | G4TMCF ε=.64 | G4TMCF ε=.04 | GMTMCF ε=.64 | GMTMCF ε=.04
2958/2396    | 92.2  | 93.8 | 95.5 | 96.2 | 97.8 | 96.6 | 98.3 | 96.7 | 97.4
3077/2438    | 92.3  | 93.9 | 96.5 | 96.4 | 98.5 | 96.9 | 98.8 | 97.6 | 99.3
3099/2784    | 92.1  | 93.6 | 95.5 | 96.4 | 98.0 | 96.6 | 98.1 | 97.3 | 98.7
6038/4764    | 93.5  | 94.8 | 96.8 | 95.7 | 97.6 | 96.5 | 98.4 | 96.3 | 97.7
6296/4925    | 93.6  | 96.2 | 97.6 | 97.0 | 98.6 | 97.7 | 99.1 | 97.7 | 98.4
6321/4938    | 93.3  | 96.2 | 97.5 | 96.8 | 98.4 | 97.7 | 98.9 | 97.7 | 98.2


Runtime (sec.)

#sinks/#nets | Greed | G2TMCF ε=.64 | G2TMCF ε=.04 | G3TMCF ε=.64 | G3TMCF ε=.04 | G4TMCF ε=.64 | G4TMCF ε=.04 | GMTMCF ε=.64 | GMTMCF ε=.04
2958/2396    | .30   | 1.63 | 357 | 9.16  | 2,090 | 98.91  | 29,190 | 2.33 | 947
3077/2438    | .33   | 2.35 | 350 | 11.10 | 2,356 | 128.38 | 37,970 | 2.87 | 846
3099/2784    | .33   | 1.80 | 392 | 12.56 | 2,364 | 132.81 | 38,341 | 2.86 | 877
6038/4764    | .53   | 2.84 | 600 | 16.57 | 3,166 | 182.55 | 60,450 | 4.98 | 1,866
6296/4925    | .55   | 4.35 | 690 | 19.5  | 3,721 | 265.78 | 77,671 | 5.38 | 1,828
6321/4938    | .54   | 3.37 | 730 | 18.99 | 3,813 | 255.37 | 79,123 | 5.43 | 1,833


Resource Usage

                  | Greed | G2TMCF ε=.64 | G2TMCF ε=.04 | G3TMCF ε=.64 | G3TMCF ε=.04 | G4TMCF ε=.64 | G4TMCF ε=.04 | GMTMCF ε=.64 | GMTMCF ε=.04
# Conn. Sinks     | 5,645 | 5,725 | 5,842  | 5,779 | 5,896  | 5,827 | 5,942  | 5,813 | 5,897
% Conn. Sinks     | 93.5  | 94.8  | 96.8   | 95.7  | 97.6   | 96.5  | 98.4   | 96.3  | 97.7
WL (meters)       | 42.22 | 45.18 | 47.80  | 44.48 | 47.66  | 44.18 | 47.49  | 45.33 | 47.51
WL/sink (microns) | 7,479 | 7,891 | 8,182  | 7,697 | 8,083  | 7,582 | 7,992  | 7,798 | 8,057
#Buff             | 9,037 | 9,860 | 10,676 | 9,591 | 10,610 | 9,497 | 10,507 | 9,860 | 10,647
#Buff/sink        | 1.60  | 1.72  | 1.83   | 1.66  | 1.80   | 1.63  | 1.77   | 1.70  | 1.81

#nets = 4,764, #sinks = 6,038, 400 buffers/BB


Resource Usage for 100% Completion

                  | Greed        | 4TMCF, ε=.04 | 4TMCF, ε=.04 | 4TMCF, ε=.04 | 4TMCF, ε=.04
#buffers/BB       | 1,000 or INF | 500          | 600          | 1,000        | INF
WL (meters)       | 47.89        | 49.46        | 49.58        | 49.98        | 51.40
WL/sink (microns) | 7,931        | 8,191        | 8,212        | 8,278        | 8,513
#Buff             | 10,330       | 11,079       | 11,115       | 11,373       | 11,803
#Buff/sink        | 1.71         | 1.83         | 1.84         | 1.88         | 1.95

#nets = 4,764, #sinks = 6,038

MTMCF wastes routing resources!