A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion


Shiyan Hu*, Zhuo Li**, Charles Alpert**

*Dept of Electrical and Computer Engineering, Michigan Technological University

**IBM Austin Research Lab, Austin, TX


Outline

Interconnect Delay Dominates

[Figure: delay (psec, 0 to 300) versus technology generation (µm: 0.8, 0.5, 0.35, 0.25, 0.18, 0.15); curves: transistor/gate delay and interconnect delay.]

Timing Driven Buffer Insertion

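Buffers Reduce RC Wire Delay

[Figure: a wire of length x between a driver (resistance R) and a load (capacitance C), unbuffered and with a buffer at x/2; each half is modeled with resistance rx/2 and two capacitances cx/4. Plot of ∆t versus x.]

$\Delta t = t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$

With buffers, delay grows linearly with interconnect length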

25% Gates are Buffers

[Figure: % buffered nets versus technology node (series M3 and M6), and % cells that are buffers versus technology node (clocked, unclocked, total). Source: Saxena, et al. [TCAD 2004]]

Problem Formulation

Input:
1. Steiner tree
2. n candidate buffer locations

Output: minimal cost (area/power) solution meeting the timing constraint T

Solution Characterization

To model its effect on downstream, a candidate solution is associated with
• v: a node
• C: downstream capacitance
• Q: required arrival time
• W: cumulative buffer cost

Dynamic Programming (DP)

• Start from sinks
• Candidate solutions are generated
• Candidate solutions are propagated toward the source
• Three operations
– Add Wire
– Insert Buffer
– Merge
• Solution Pruning

Generating Candidates

[Figure: candidate solutions (1), (2) and (3).]

Pruning Candidates

[Figure: candidates (3), (a), (b) and (4).]

Both (a) and (b) look the same to the source. Remove the one with the worse slack and cost.

Merging Branches

[Figure: left candidates and right candidates meeting at a branch node.]

$O(n_1 n_2)$ solutions after each branch merge. Worst-case $O((n/m)^m)$ solutions.

DP Properties

$(Q_1, C_1, W_1)$ is inferior to (dominated by) $(Q_2, C_2, W_2)$ if $C_1 \ge C_2$, $W_1 \ge W_2$ and $Q_1 \le Q_2$.

• Non-dominated solutions are maintained: for the same Q and W, pick min C
• # of solutions depends on # of distinct W and Q, but not their values
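The dominance rule can be illustrated with a minimal Python sketch. The `Candidate` type, the function names and the quadratic-time filter are assumptions made for illustration; they are not the data structures used in the talk.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    q: float  # required arrival time (larger is better)
    c: float  # downstream capacitance (smaller is better)
    w: float  # cumulative buffer cost (smaller is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` in Q, C and W."""
    return a.q >= b.q and a.c <= b.c and a.w <= b.w


def prune(cands: List[Candidate]) -> List[Candidate]:
    """Naive O(k^2) filter that keeps only non-dominated candidates."""
    unique = list(dict.fromkeys(cands))  # drop exact duplicates, keep order
    return [b for b in unique
            if not any(dominates(a, b) for a in unique if a != b)]


# Hypothetical candidates at one node.
pruned = prune([Candidate(10, 2, 3), Candidate(9, 3, 4), Candidate(12, 5, 3)])
```

Here the second candidate is worse than the first in all three fields and is dropped, while the third has a better Q but a larger C, so both survive.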

Previous Works

[Timeline figure, 1990 to 2009: van Ginneken's algorithm, Lillis' algorithm, Shi and Li's algorithm, Chen and Zhou's algorithm, NP-hardness proof.]

Bridging The Gap

We are bridging the gap!

A Fully Polynomial Time Approximation Scheme (FPTAS)
• Provably good
• Within (1+ɛ) of the optimal cost for any ɛ>0
• Runs in time polynomial in n (nodes), b (buffer types) and 1/ɛ
• Best solution for an NP-hard problem in theory
• Highly practical

The Rough Picture

W*: the cost of the optimal solution

[Flowchart: make a guess on W*, then check it. If the guess is good (close to W*), return the solution; if it is not good, make another guess.]

• Key 1: Efficient checking
• Key 2: Smart guess

Key 1: Efficient Checking

Benefit of guess
• Only maintain the solutions with cost no greater than the guessed cost
• Accelerate DP

Oracle(x): the checker, able to decide whether x > W* or not
– Without knowing W*
– Answer efficiently

The Oracle

[Flowchart: set up upper and lower bounds on the cost W*; guess x within the bounds; query Oracle(x); update the bounds and repeat.]

Construction of Oracle(x)

• Scale and round each buffer cost: $w' = \lfloor w / (x\epsilon/n) \rfloor$
• Only interested in whether there is a solution with cost up to x satisfying the timing constraint
• Dynamic programming: perform DP on the scaled problem with cost upper bound n/ɛ. Runtime is polynomial in n/ɛ

Scaling and Rounding

[Figure: buffer cost axis marked at 0, xɛ/n, 2xɛ/n, 3xɛ/n, 4xɛ/n.]

Scaled buffer costs are integers due to rounding and are bounded by n/ɛ.

Rounding error at each buffer ≤ xɛ/n, total rounding error ≤ xɛ.
• Larger x: larger error, fewer distinct costs and faster
• Smaller x: smaller error, more distinct costs and slower
• Rounding is the reason for the acceleration
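A small Python sketch of the scaling step, assuming costs are rounded down to integer multiples of the unit xɛ/n (the rounding direction is an assumption, and the function name and numbers are hypothetical):

```python
import math


def scale_and_round(costs, x, eps, n):
    """Map each buffer cost w to the integer floor(w / (x*eps/n))."""
    unit = x * eps / n  # width of one rounding step
    return [math.floor(w / unit) for w in costs]


costs = [3.0, 7.5, 12.0]    # hypothetical buffer costs
x, eps, n = 20.0, 0.5, 4    # guess x, accuracy eps, n candidate locations
unit = x * eps / n          # 2.5
scaled = scale_and_round(costs, x, eps, n)

# Error introduced at each buffer when the cost is rounded back.
errors = [w - s * unit for w, s in zip(costs, scaled)]
```

Each entry of `errors` stays below `unit`, so a solution using at most n buffers accumulates an error of at most xɛ.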

DP Results

DP result with all w integers ≤ n/ɛ:

• Yes, there is a solution satisfying the timing constraint. With costs rounded back, the solution has cost at most $\frac{n}{\epsilon}\cdot\frac{x\epsilon}{n} + x\epsilon = (1+\epsilon)x > W^*$.
• No, no such solution. With costs rounded back, the solution has cost at least $\frac{n}{\epsilon}\cdot\frac{x\epsilon}{n} = x \le W^*$.

Rounding on Q

• # solutions bounded by # distinct W and Q
• # W = $O(n/\epsilon_1)$
– Rounding before DP
• # Q
– Round up Q to the nearest value in $\{0, \epsilon_2 T/m, 2\epsilon_2 T/m, 3\epsilon_2 T/m, \ldots, T\}$ in branch merge (m is # sinks)
– Rounding during DP
– # Q = $O(m/\epsilon_2)$
• # non-dominated solutions is $O(mn/(\epsilon_1\epsilon_2))$

[Figure: Q axis marked at 0, $\epsilon_2 T/m$, $2\epsilon_2 T/m$, $3\epsilon_2 T/m$, $4\epsilon_2 T/m$.]
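The rounding of Q can be sketched in Python as follows. The function name is hypothetical and the sketch assumes 0 ≤ q ≤ T.

```python
import math


def round_q_up(q, T, eps2, m):
    """Round q up to the nearest multiple of eps2*T/m, capped at T."""
    step = eps2 * T / m
    return min(T, math.ceil(q / step) * step)


T, eps2, m = 100.0, 0.5, 5   # hypothetical timing bound, accuracy, sink count
step = eps2 * T / m          # 10.0
rounded = [round_q_up(q, T, eps2, m) for q in (23.0, 40.0, 99.0)]
grid_size = int(m / eps2) + 1  # distinct Q values: 0, step, ..., T
```

With these numbers the grid has 11 values, which matches the O(m/ɛ2) count of distinct Q on the slide.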

Q-W Rounding Before Branch Merge

[Figure: Q-W grid; W axis marked 0, 1, 2, 3, 4, ..., $n/\epsilon_1$ and Q axis marked $\epsilon_2 T/m$, $2\epsilon_2 T/m$, $3\epsilon_2 T/m$, $4\epsilon_2 T/m$, ..., T.]

Solution Propagation: Add Wire

[Figure: a wire of length x from v1, with solution (v1, c1, w1, q1), to v2, with solution (v2, c2, w2, q2).]

• $c_2 = c_1 + cx$
• $q_2 = q_1 - (rcx^2/2 + rxc_1)$
• r: wire resistance per unit length
• c: wire capacitance per unit length

Solution Propagation: Insert Buffer

[Figure: a buffer b inserted at v1 turns (v1, c1, w1, q1) into (v1, c1b, w1b, q1b).]

• $q_{1b} = q_1 - d(b)$
• $c_{1b} = C(b)$
• $w_{1b} = w_1 + w(b)$
• d(b): buffer delay
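The Add Wire and Insert Buffer updates can be sketched together in Python. The slides do not define d(b); the sketch assumes the common linear model d(b) = K(b) + R(b)·c1, and the type names and numbers are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    c: float  # downstream capacitance
    w: float  # cumulative buffer cost
    q: float  # required arrival time


class Buffer(NamedTuple):
    r_out: float  # driving resistance R(b) (assumed delay model)
    c_in: float   # input capacitance C(b)
    k: float      # intrinsic delay K(b) (assumed delay model)
    cost: float   # buffer cost w(b)


def add_wire(s: Candidate, x: float, r: float, c: float) -> Candidate:
    """c2 = c1 + c*x and q2 = q1 - (r*c*x^2/2 + r*x*c1); cost is unchanged."""
    return Candidate(s.c + c * x, s.w, s.q - (r * c * x * x / 2 + r * x * s.c))


def insert_buffer(s: Candidate, b: Buffer) -> Candidate:
    """q1b = q1 - d(b), c1b = C(b), w1b = w1 + w(b)."""
    d = b.k + b.r_out * s.c  # assumed form of the buffer delay d(b)
    return Candidate(b.c_in, s.w + b.cost, s.q - d)


sink = Candidate(c=2.0, w=0.0, q=100.0)
after_wire = add_wire(sink, x=4.0, r=1.0, c=0.5)
after_buffer = insert_buffer(after_wire, Buffer(r_out=2.0, c_in=1.0, k=3.0, cost=5.0))
```

The buffer shields the upstream wire from the accumulated capacitance, at the price of its own delay and cost.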

Buffer Insertion Runtime

• $O(mn/(\epsilon_1\epsilon_2))$ solutions after a branch merge
• $O(n/\epsilon_1)$ W-bins
• A buffer insertion introduces $O(bn/\epsilon_1)$ non-dominated solutions with b buffer types
• At most $O(mn/(\epsilon_1\epsilon_2) + bn^2/\epsilon_1)$ non-dominated solutions in a single branch
• $O(bmn/(\epsilon_1\epsilon_2) + b^2n^2/\epsilon_1)$ time for each node. No cross W-bin pruning.

Solution Propagation: Merge

[Figure: left solution (v, cl, wl, ql) and right solution (v, cr, wr, qr) merged at node v.]

• Round q in both branches
• $c_{merge} = c_l + c_r$
• $w_{merge} = w_l + w_r$
• $q_{merge} = \min(q_l, q_r)$
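A minimal Python sketch of the merge step, pairing every left candidate with every right candidate. It omits the Q rounding and the pruning that would precede and follow it, and the tuple layout (c, w, q) is an assumption for illustration.

```python
from itertools import product


def merge(left, right):
    """Combine every left/right pair of (c, w, q) candidates at a branch node."""
    return [(cl + cr, wl + wr, min(ql, qr))
            for (cl, wl, ql), (cr, wr, qr) in product(left, right)]


left = [(1.0, 2, 50.0), (2.0, 1, 60.0)]   # hypothetical left-branch candidates
right = [(3.0, 1, 55.0)]                  # hypothetical right-branch candidates
merged = merge(left, right)
```

The result has len(left) * len(right) entries, which is why the bins on W and Q are needed to keep the merged list small.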

Branch Merge Runtime - 1

Target Q = 0

Branch Merge Runtime - 2

Target Q = $\epsilon_2 T/m$

Branch Merge Runtime - 3

Target Q = $2\epsilon_2 T/m$

Branch Merge Runtime -4

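• For merged W = a, try all $W_l + W_r = a$, where each a takes $O(am/\epsilon_2)$ time
• For a = 0, 1, ..., $n/\epsilon_1$, total runtime is $O(mn^2/(\epsilon_1^2\epsilon_2))$
• Including time for putting solutions into bins, it is $O(mn/(\epsilon_1\epsilon_2) + bn^2/\epsilon_1 + mn^2/(\epsilon_1^2\epsilon_2))$
• $O(mn/(\epsilon_1\epsilon_2))$ solutions after a branch merge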

Timing-Cost Approximate DP

Lemma: a buffering solution with cost at most $(1+\epsilon_1)W^*$ and with timing at most $(1+\epsilon_2)T$ can be computed in time

[Equation: O(·) runtime bound, a sum of terms in m, n, b, $\epsilon_1$ and $\epsilon_2$.]

Key 2: Geometric Sequence Based Guess

• U (L): upper (lower) bound on W*
• Naive binary search style approach
• Runtime (# iterations) depends on the initial bounds U and L

[Flowchart: set U and L on W*; set x = (U+L)/2; query Oracle(x). If W* < (1+ɛ)x, set U = (1+ɛ)x; if W* ≥ x, set L = x; repeat.]

Adapt $\epsilon_1$

• Rounding factor $x\epsilon_1/n$ for W
• Larger $\epsilon_1$: faster with rough estimation
• Smaller $\epsilon_1$: slower with accurate estimation
• Adapt $\epsilon_1$ according to U and L

U/L Related Scale and Round

[Figure: buffer cost axis from 0, with rounding step xɛ/n related to U/L.]

Conceptually

Begin with large $\epsilon_1$ and progressively reduce it (towards ɛ) according to U/L as x approaches W*

Fix $\epsilon_2$ = ɛ in rounding Q for limiting timing violation

• Set $\epsilon_1$ as a geometric sequence of …, 8, 4, 2, 1, 1/2, …, ɛ
• One run of DP takes about $O(n/\epsilon_1)$ time. Total runtime is bounded by the last run as O(… + n/8 + n/4 + n/2 + … + n/ɛ) = O(n/ɛ), independent of # iterations
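The claim that the last run dominates can be checked with a short worked sum. As an idealization (an assumption, not a statement from the slides), take the values of $\epsilon_1$ to be exactly $\epsilon, 2\epsilon, 4\epsilon, \ldots, 2^K\epsilon$, with one DP run of cost $n/\epsilon_1$ for each value:

```latex
\[
\sum_{k=0}^{K} \frac{n}{2^{k}\epsilon}
  \;=\; \frac{n}{\epsilon} \sum_{k=0}^{K} 2^{-k}
  \;<\; \frac{2n}{\epsilon}
  \;=\; O\!\left(\frac{n}{\epsilon}\right)
\]
```

The bound holds for every K, which is why the total does not depend on the number of iterations.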

Oracle Query Till U/L<2

Mathematically

At iteration i, with lower and upper bounds $W^*_{l,i}$ and $W^*_{u,i}$ on $W^*$:

$\epsilon'_i = \sqrt{W^*_{u,i}/W^*_{l,i}} - 1, \qquad x = \sqrt{W^*_{l,i}\,W^*_{u,i}/(1+\epsilon'_i)}$

$W^*_{u,i+1}/W^*_{l,i+1} \le \left(W^*_{u,i}/W^*_{l,i}\right)^{3/4}$

[Equation: sum of the per-query DP runtimes over all iterations, bounded by a constant multiple of the runtime of the last query.]

The Algorithmic Flow

1. Set U and L of W*
2. Adapt $\epsilon_1 = \sqrt{U/L} - 1$
3. Set $x = \sqrt{UL/(1+\epsilon_1)}$
4. Query Oracle(x)
5. Update U or L
6. If U/L < 2, compute the final solution; otherwise go back to step 2
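The flow can be exercised with a small Python sketch. It reads the adaptation rule as ɛ1 = sqrt(U/L) − 1, which is an assumption chosen so that each query shrinks U/L to (U/L)^(3/4). The real oracle is the scaled DP; here it is replaced by a hypothetical mock that knows W*.

```python
import math


def narrow_bounds(oracle, lower, upper):
    """Query the oracle until upper/lower < 2; return the bounds and query count."""
    queries = 0
    while upper / lower >= 2:
        eps1 = math.sqrt(upper / lower) - 1
        x = math.sqrt(lower * upper / (1 + eps1))
        if oracle(x, eps1):
            upper = (1 + eps1) * x   # "yes": W* is below (1 + eps1) * x
        else:
            lower = x                # "no": W* is at least x
        queries += 1
    return lower, upper, queries


W_STAR = 37.0  # hidden optimum, used only by the mock oracle below


def mock_oracle(x, eps1):
    """Stand-in for the DP check: an exact comparison against the known W*."""
    return W_STAR <= x


low, high, queries = narrow_bounds(mock_oracle, 1.0, 1024.0)
```

Because the ratio shrinks by a fixed power at every step, the number of queries depends only on the initial U/L and not on where W* lies between the bounds.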

When U/L<2

1. Scale and round each cost by Lɛ/n
2. Set the cost budget W = 2n/ɛ
3. Run DP (a single DP runtime)
4. Pick the min cost solution satisfying timing at the driver

At least one feasible solution exists; otherwise there would be no solution with cost 2n/ɛ · Lɛ/n = 2L ≥ U.

Main Theorem

Theorem: a (1+ɛ) approximation to the timing constrained minimum cost buffering problem can be computed in $O(m^2n^2b/\epsilon^3 + n^3b^2/\epsilon)$ time for 0 < ɛ < 1 and in $O(m^2n^2b/\epsilon + mn^2b + n^3b)$ time for ɛ ≥ 1.

Experiments

Experimental Setup
– 1000 industrial nets
– 48 buffer types including non-inverting buffers and inverting buffers

Compared to Dynamic Programming

Cost Ratio Compared to DP

[Figure: buffer cost ratio (axis 0 to 0.14) versus approximation ratio ɛ; series: FPTAS.]

Speedup Compared to DP


[Figure: speedup (axis 0 to 6) versus ɛ = 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5; series: FPTAS.]

Timing Violations (% nets)

[Figure: timing violations (% nets, axis 0% to 7%) versus ɛ = 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5; series: FPTAS.]

Cost Ratio w/ Timing Recovery


[Figure: buffer cost ratio (axis 0 to 0.25) versus ɛ = 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5; series: FPTAS and FPTAS w/ Recovery.]

Speedup w/ Timing Recovery


[Figure: speedup (axis 0 to 6) versus ɛ = 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5; series: FPTAS and FPTAS w/ Recovery.]

Observations

Without timing recovery
– FPTAS always achieves the theoretical guarantee
– Larger ɛ leads to more speedup
– On average about 5x faster than dynamic programming
– Can run 4.6x faster with 0.57% solution degradation
– <5% nets with timing violations

With timing recovery
– FPTAS well approximates the optimal solutions
– Can still have >4x speedup

[Figure: "Our Bridge" spanning the gap between the NP-hardness complexity result and the exponential time algorithm.]

Conclusion

• Propose a (1+ɛ) approximation for timing constrained minimum cost buffering for any ɛ > 0
– Runs in $O(m^2n^2b/\epsilon^3 + n^3b^2/\epsilon)$ time
– Timing-cost approximate dynamic programming
– Double-ɛ geometric sequence based oracle search
– 5x speedup in experiments
– Few percent additional buffers as guaranteed theoretically
• The first provably good approximation algorithm on this problem

Summary on Buffer Insertion and Layer Assignment

[Figure: delay (psec, 0 to 300) versus technology generation (µm: 0.8, 0.5, 0.35, 0.25, 0.18, 0.15); curves: transistor/gate delay and interconnect delay. Source: Gordon Moore, Chairman Emeritus, Intel Corp.]

This is why Moore’s law does not hold anymore.

Interconnect Delay Scaling

• Scaling factor s = 0.7 per generation
• Elmore delay of a wire of length l: $\tau_{int} = (rl)(cl)/2 = rcl^2/2$ (first order)
• Local interconnects: $\tau_{int}$: $(r/s^2)(c)(ls)^2/2 = rcl^2/2$
– Local interconnect delay roughly unchanged
• Global interconnects: $\tau_{int}$: $(r/s^2)(c)(l)^2/2 = (rcl^2)/(2s^2)$
– Global interconnect delay doubles: unsustainable
• Interconnect delay increasingly more dominant
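The "roughly unchanged" and "doubles" statements follow from plugging s = 0.7 into the two expressions. A tiny Python check, with variable names chosen for illustration:

```python
s = 0.7  # scaling factor per generation, as on the slide

# Resistance per unit length grows by 1/s^2; capacitance per unit length is kept.
local_factor = (1 / s**2) * s**2   # wire length also shrinks by s
global_factor = 1 / s**2           # wire length stays the same
```

The global factor is about 2.04, so the delay of a fixed-length global wire roughly doubles each generation.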

Interconnect Optimization

Analogy

• Advancing technology = period of city expansion
• More transistors = larger city
• Buffers = gas stations
• Interconnects = streets
– Lower layer = local street
– Higher layer = highways
• Signal delay (timing) = time to cross the city
• Highway is fast but its power has not been well explored
– Traditional wire sizing = make lane wider
– Layer assignment = highway overpasses


Detailed Analysis

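The delay of a wire of length L is $T = rcL^2/2$.

Assume N identical buffers with equal inter-buffer length l (L = Nl). To minimize delay:

$T = N\left[R_d(C_g + cl) + rl\left(C_g + cl/2\right)\right] = L\left[\frac{rcl}{2} + (rC_g + R_d c) + \frac{1}{l}R_d C_g\right]$

$\frac{dT}{dl} = 0 \;\Rightarrow\; L\left[\frac{rc}{2} - \frac{R_d C_g}{l_{opt}^2}\right] = 0 \;\Rightarrow\; l_{opt} = \sqrt{\frac{2R_dC_g}{rc}}$

• r, c: resistance, capacitance per unit length
• $R_d$: on-resistance of inverter
• $C_g$: gate input capacitance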

Quadratic Delay -> Linear Delay

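Substituting $l_{opt}$ back into the interconnect delay expression:

$T_{opt} = L\left[\frac{rc\,l_{opt}}{2} + (rC_g + R_d c) + \frac{R_d C_g}{l_{opt}}\right] = L\left[\frac{rc}{2}\sqrt{\frac{2R_dC_g}{rc}} + (rC_g + R_d c) + R_dC_g\sqrt{\frac{rc}{2R_dC_g}}\right]$

$T_{opt} = L\left[\sqrt{2R_dC_g\,rc} + rC_g + R_d c\right]$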

Delay grows linearly with L instead of quadratically
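A numeric check of the result in Python, using the standard repeater delay model T = L[rcl/2 + rC_g + R_d c + R_d C_g/l]. All parameter values are hypothetical and unitless, picked so that the arithmetic is exact.

```python
import math


def optimal_spacing(r, c, Rd, Cg):
    """l_opt = sqrt(2*Rd*Cg / (r*c))."""
    return math.sqrt(2 * Rd * Cg / (r * c))


def buffered_delay(L, l, r, c, Rd, Cg):
    """T = L * (r*c*l/2 + r*Cg + Rd*c + Rd*Cg/l) for inter-buffer length l."""
    return L * (r * c * l / 2 + r * Cg + Rd * c + Rd * Cg / l)


r, c, Rd, Cg, L = 1.0, 1.0, 2.0, 1.0, 100.0   # hypothetical values
l_opt = optimal_spacing(r, c, Rd, Cg)
T_opt = buffered_delay(L, l_opt, r, c, Rd, Cg)
T_closed = L * (math.sqrt(2 * Rd * Cg * r * c) + r * Cg + Rd * c)
T_unbuffered = r * c * L**2 / 2
```

With these numbers the buffered delay is 500 against 5000 for the unbuffered wire, and moving l away from `l_opt` in either direction increases the delay.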

25% Gates are Buffers

[Figure: % cells that are buffers (axis 0 to 80; clocked, unclocked, total) versus technology node (90nm, 65nm, 45nm, 32nm). Source: Saxena, et al. [TCAD 2004]]


Solution Propagation: Merge

[Figure: left solution (v, cl, wl, ql) and right solution (v, cr, wr, qr) merged at node v.]

• $c_{merge} = c_l + c_r$
• $w_{merge} = w_l + w_r$
• $q_{merge} = \min(q_l, q_r)$

Solution Pruning

• Needs solution pruning for acceleration
• Two candidate solutions
– $(v, c_1, q_1, w_1)$
– $(v, c_2, q_2, w_2)$
• Solution 1 is inferior to Solution 2 if
– $c_1 \ge c_2$: larger load
– and $q_1 \le q_2$: tighter timing
– and $w_1 \ge w_2$: larger cost

Car Race - Speed

Car speed <=> RAT (required arrival time)

Car Race - Load

Load <=> load capacitance

Faster & Smaller Load

• Faster & smaller load (larger RAT, smaller capacitance): good
• Slower & larger load (smaller RAT, larger capacitance): inferior

Faster & Larger Load: Result 1

Who will be the winner? Cannot tell at this moment, so keep both of them.

Faster & Larger Load: Result 2

Pruning

$(Q_1, C_1, W_1)$ is inferior to (dominated by) $(Q_2, C_2, W_2)$ if $C_1 \ge C_2$, $W_1 \ge W_2$ and $Q_1 \le Q_2$.

• Non-dominated solutions are maintained: for the same Q and W, pick min C
• # of solutions depends on # of distinct W and Q, but not their values




Extension For Layer Assignment

Theorem: a (1+ɛ) approximation to the timing constrained minimum cost layer assignment problem can be computed in $O(mn^2/\epsilon)$ time for any ɛ > 0.

Oracle Lemma: given a tree with n wire segments and m layers, the optimal layer assignment subject to cost budget W = n/ɛ can be computed in $O(mnW) = O(mn^2/\epsilon)$ time.

Conclusion

• A (1+ɛ) approximation for timing constrained minimum cost buffering for any ɛ > 0 (DAC’09)
– Runs in $O(m^2n^2b/\epsilon^3 + n^3b^2/\epsilon)$ time
– Timing-cost approximate dynamic programming
– Double-ɛ geometric sequence based oracle search
– 5x speedup in experiments
– Few percent additional buffers as guaranteed theoretically
• The first provably good approximation algorithm on this problem
• A similar algorithm for layer assignment problem (ICCAD’08)

Thanks
