A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion
Shiyan Hu*, Zhuo Li**, Charles Alpert**
*Dept of Electrical and Computer Engineering, Michigan Technological University
**IBM Austin Research Lab, Austin, TX
Outline
Interconnect Delay Dominates
[Figure: delay (psec) versus technology generation (μm; 0.8, 0.5, 0.35, 0.25, 0.18, 0.15), comparing transistor/gate delay with interconnect delay.]
Timing Driven Buffer Insertion
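Buffers Reduce RC Wire Delay
[Figure: a wire of length x between a driver with resistance R and a load C, shown unbuffered and with a buffer inserted at x/2; each half-wire has resistance rx/2 and capacitance cx/4 at each end.]
∆t = t_buf - t_unbuf = RC + tb - rcx²/4
With buffers, delay grows linearly with interconnect length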
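A minimal sketch of the ∆t expression above, assuming (as the figure suggests) that the inserted buffer has the same output resistance R and input capacitance C as the driver and load. The function names and the break-even helper are illustrative and not part of the original slides.

```python
def buffer_delay_change(R, C, r, c, x, t_b):
    """Delta t = t_buf - t_unbuf = R*C + t_b - r*c*x^2/4 for a wire of length x.

    R, C : buffer output resistance and input capacitance (assumed equal to
           the driver resistance and the load capacitance)
    r, c : wire resistance and capacitance per unit length
    t_b  : intrinsic buffer delay
    """
    return R * C + t_b - r * c * x ** 2 / 4.0


def break_even_length(R, C, r, c, t_b):
    """Wire length at which Delta t = 0; longer wires gain from the buffer."""
    return (4.0 * (R * C + t_b) / (r * c)) ** 0.5
```

A negative return value from `buffer_delay_change` means that the buffered wire is faster than the unbuffered one.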
25% Gates are Buffers
[Figure: % buffered nets (metal layers M3 and M6) versus technology node, and % cells that are buffers (clocked, unclocked, total) versus technology node (90nm, 65nm, 45nm, 32nm). Source: Saxena, et al. [TCAD 2004]]
Problem Formulation
Given
1. Steiner Tree
2. n candidate buffer locations
Find the minimal cost (area/power) solution that meets the timing constraint T
Solution Characterization
To model the effect on downstream, a candidate solution is associated with
• v: a node
• C: downstream capacitance
• Q: required arrival time
• W: cumulative buffer cost
Dynamic Programming (DP)
Start from sinks
Candidate solutions are generated
Candidate solutions are propagated toward the source
Three operations
– Add Wire
– Insert Buffer
– Merge
Solution Pruning
Generating Candidates
[Figure: candidate solutions (1), (2) and (3).]
Pruning Candidates
[Figure: candidate solutions (3) and (4), labeled (a) and (b).]
Both (a) and (b) look the same to the source. Remove the one with the worse slack and cost.
Merging Branches
[Figure: left candidates and right candidates meeting at a branch point.]
O(n1n2) solutions after each branch merge. Worst-case O((n/m)^m) solutions.
DP Properties
Given two solutions (Q1, C1, W1) and (Q2, C2, W2), the first is inferior/dominated if C1 ≥ C2, W1 ≥ W2 and Q1 ≤ Q2
Non-dominated solutions are maintained: for the same Q and W, pick min C
# solutions depends on # of distinct W and Q, but not their values
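The dominance rule above can be sketched as a small filter. This is an illustrative quadratic-time version, not the binned pruning used in the talk; the `Candidate` type and the function names are assumptions introduced here.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    q: float  # required arrival time Q (larger is better)
    c: float  # downstream capacitance C (smaller is better)
    w: int    # cumulative buffer cost W (smaller is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if b is inferior to a: C_b >= C_a, W_b >= W_a and Q_b <= Q_a."""
    return a.c <= b.c and a.w <= b.w and a.q >= b.q


def prune(cands: List[Candidate]) -> List[Candidate]:
    """Keep only non-dominated candidates (simple O(k^2) scan)."""
    kept: List[Candidate] = []
    for cand in cands:
        if any(dominates(k, cand) for k in kept):
            continue  # cand is inferior to something already kept
        kept = [k for k in kept if not dominates(cand, k)]
        kept.append(cand)
    return kept
```

A candidate with better Q but worse C survives next to a candidate with worse Q but better C, because neither dominates the other.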
Previous Works
Timeline: 1990, 1991, …, 1996, …, 2003, 2004, …, 2008, 2009
• van Ginneken's algorithm
• Lillis' algorithm
• Shi and Li's algorithm
• Chen and Zhou's algorithm
• NP-hardness proof
Bridging The Gap
We are bridging the gap!
A Fully Polynomial Time Approximation Scheme (FPTAS)
• Provably good
• Within (1+ɛ) optimal cost for any ɛ>0
• Runs in time polynomial in n (nodes), b (buffer types) and 1/ɛ
• Best solution for an NP-hard problem in theory
• Highly practical
The Rough Picture
W*: the cost of optimal solution
1. Make guess on W*
2. Check it
3. Good (close to W*): return the solution. Not good: make another guess
Key 1: Efficient checking
Key 2: Smart guess
Key 1: Efficient Checking
Benefit of guess
• Only maintain the solutions with cost no greater than the guessed cost
• Accelerate DP
Oracle (x): the checker, able to decide whether x>W* or not
– Without knowing W*
– Answer efficiently
The Oracle
1. Set up upper and lower bounds of cost W*
2. Guess x within the bounds
3. Query Oracle (x)
4. Update the bounds and repeat
Construction of Oracle(x)
Only interested in whether there is a solution with cost up to x satisfying timing constraint
Scale and round each buffer cost: w ← ⌊w / (xɛ/n)⌋
Dynamic Programming: perform DP on the scaled problem with cost upper bound n/ɛ. Runtime polynomial in n/ɛ
Scaling and Rounding
[Figure: buffer cost axis starting at 0 with rounding grid points xɛ/n, 2xɛ/n, 3xɛ/n, 4xɛ/n.]
Buffer costs are integers due to rounding and are bounded by n/ɛ.
Rounding error at each buffer ≤ xɛ/n, total rounding error ≤ xɛ.
• Larger x: larger error, fewer distinct costs and faster
• Smaller x: smaller error, more distinct costs and slower
• Rounding is the reason for the acceleration
DP Results
DP result w/ all w are integers ≤ n/ɛ
• Yes, there is a solution satisfying timing constraint: with cost rounding back, the solution has cost at most n/ɛ • xɛ/n + xɛ = (1+ɛ)x > W*
• No, no such solution: with cost rounding back, the solution has cost at least n/ɛ • xɛ/n = x ≤ W*
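The scale-and-round step can be sketched as follows. Rounding down is an assumption made here because it matches the error bounds stated above (each cost loses less than xɛ/n); the function names are illustrative.

```python
import math
from typing import List


def scale_costs(costs: List[float], x: float, eps: float, n: int) -> List[int]:
    """Scale each buffer cost by the unit x*eps/n and round down to an integer."""
    unit = x * eps / n
    return [math.floor(w / unit) for w in costs]


def cost_budget(eps: float, n: int) -> int:
    """Largest scaled cost the oracle DP needs to track: n/eps."""
    return math.floor(n / eps)


def rounding_error(costs: List[float], x: float, eps: float, n: int) -> float:
    """Total cost lost by rounding; below n * (x*eps/n) = x*eps for n buffers."""
    unit = x * eps / n
    return sum(w - s * unit for w, s in zip(costs, scale_costs(costs, x, eps, n)))
```

With the scaled integer costs, the DP only has to distinguish the cost values 0, 1, …, n/ɛ, which is where the speedup comes from.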
Rounding on Q
# solutions bounded by # distinct W and Q
# W = O(n/ɛ1)
– Rounding before DP
# Q
– Round up Q to nearest value in {0, ɛ2T/m, 2ɛ2T/m, 3ɛ2T/m, …, T} in branch merge (m is # sinks)
– Rounding during DP
– # Q = O(m/ɛ2)
# non-dominated solutions is O(mn/(ɛ1ɛ2))
Q-W Rounding Before Branch Merge
[Figure: grid of solutions with W on one axis (0, 1, 2, 3, 4, … up to n/ɛ1) and Q on the other (0, ɛ2T/m, 2ɛ2T/m, 3ɛ2T/m, 4ɛ2T/m, … up to T).]
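A minimal sketch of the Q rounding used before a branch merge. Clamping to the range [0, T] is an assumption made here so that the result always lies on the grid; `round_q` is an illustrative name.

```python
import math


def round_q(q: float, T: float, eps2: float, m: int) -> float:
    """Round q up to the nearest value in {0, eps2*T/m, 2*eps2*T/m, ..., T}."""
    step = eps2 * T / m
    k = max(0, math.ceil(q / step))  # index of the grid point at or above q
    return min(k * step, T)
```

Because every Q lands on one of O(m/ɛ2) grid points, the number of distinct Q values per W-bin is bounded regardless of the actual timing values.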
Solution Propagation: Add Wire
(v1, c1, w1, q1) → (v2, c2, w2, q2) over a wire of length x
c2 = c1 + cx
q2 = q1 - (rcx²/2 + rxc1)
r: wire resistance per unit length
c: wire capacitance per unit length
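The add-wire update can be written directly from the two formulas. The sketch assumes the buffer cost is unchanged by a wire (w2 = w1), which the slide does not state explicitly; `add_wire` is an illustrative name.

```python
def add_wire(c1, q1, w1, r, c, x):
    """Propagate a candidate (c1, q1, w1) upstream across a wire of length x.

    r, c are wire resistance and capacitance per unit length.
    Returns (c2, q2, w2).
    """
    c2 = c1 + c * x                                 # wire adds capacitance c*x
    q2 = q1 - (r * c * x ** 2 / 2.0 + r * x * c1)   # subtract Elmore wire delay
    w2 = w1                                         # assumption: a wire adds no buffer cost
    return c2, q2, w2
```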
Solution Propagation: Insert Buffer
(v1, c1, w1, q1) → (v1, c1b, w1b, q1b)
q1b = q1 - d(b)
c1b = C(b)
w1b = w1 + w(b)
d(b): buffer delay
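A sketch of the insert-buffer update. The slide leaves d(b) abstract; the linear model d(b) = K(b) + R(b)·c1 used below is a common choice and is an assumption here, as are the `Buffer` fields and the function name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    cin: float   # C(b): input capacitance
    k: float     # intrinsic delay (assumed part of d(b))
    rout: float  # output resistance (assumed part of d(b))
    cost: int    # w(b): buffer cost


def insert_buffer(c1, q1, w1, b: Buffer):
    """Insert buffer b at the current node; returns (c1b, q1b, w1b)."""
    d = b.k + b.rout * c1          # assumed linear buffer delay model
    return b.cin, q1 - d, w1 + b.cost
```

The downstream capacitance is replaced by C(b), which is why a buffer can improve timing further upstream even though it costs delay locally.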
Buffer Insertion Runtime
O(n/ɛ1) W-bins
A buffer insertion introduces O(nb/ɛ1) non-dominated solutions with b buffer types
O(mn/(ɛ1ɛ2)) solutions after a branch merge
At most O(mn/(ɛ1ɛ2) + bn²/ɛ1) non-dominated solutions in a single branch
O(bmn/(ɛ1ɛ2) + b²n²/ɛ1) time for each node. No cross W-bin pruning.
Solution Propagation: Merge
(v, cl, wl, ql) and (v, cr, wr, qr)
Round q in both branches
cmerge = cl + cr
wmerge = wl + wr
qmerge = min(ql, qr)
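The merge rule can be sketched as below. Candidates are plain (c, q, w) tuples and q is assumed to be already rounded in both branches; the exhaustive pairing shown is the naive version, not the per-bin merge analyzed in the runtime slides.

```python
def merge(left, right):
    """Merge one left and one right candidate, each a (c, q, w) tuple."""
    cl, ql, wl = left
    cr, qr, wr = right
    return cl + cr, min(ql, qr), wl + wr


def merge_branches(lefts, rights):
    """Naive merge: pair every left candidate with every right candidate."""
    return [merge(l, r) for l in lefts for r in rights]
```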
Branch Merge Runtime - 1
Target Q = 0
Branch Merge Runtime - 2
Target Q = ɛ2T/m
Branch Merge Runtime - 3
Target Q = 2ɛ2T/m
Branch Merge Runtime - 4
For merged W = a, try all Wl + Wr = a, where each a takes O(am/ɛ2) time
For a = 0, 1, …, n/ɛ1, total runtime is O(n²m/(ɛ1²ɛ2))
Including time for putting solutions into bins, it is O(n²m/(ɛ1²ɛ2) + mn/(ɛ1ɛ2) + bn²/ɛ1)
O(mn/(ɛ1ɛ2)) solutions after a branch merge
Timing-Cost Approximate DP
Lemma: a buffering solution with cost at most (1+ɛ1)W* and with timing at most (1+ɛ2)T can be computed in time
O(n²m²/(ɛ1²ɛ2) + m²n/(ɛ1ɛ2) + mbn²/ɛ1 + bmn²/(ɛ1ɛ2) + b²n³/ɛ1)
Key 2: Geometric Sequence Based Guess
U (L): upper (lower) bound on W*
Naive binary search style approach
1. Set U and L on W*
2. x = (U+L)/2
3. Query Oracle (x)
4. If W* < (1+ɛ)x, set U = (1+ɛ)x; if W* ≥ x, set L = x
Runtime (# iterations) depends on the initial bounds U and L
Adapt ɛ1
Rounding factor xɛ1/n for W
Larger ɛ1: faster with rough estimation
Smaller ɛ1: slower with accurate estimation
Adapt ɛ1 according to U and L
U/L Related Scale and Round
[Figure: buffer cost axis from 0 with rounding grid spacing xɛ/n related to U/L.]
Conceptually
Begin with large ɛ1 and progressively reduce it (towards ɛ) according to U/L as x approaches W*
Fix ɛ2 = ɛ in rounding Q for limiting timing violation
• Set ɛ1 as a geometric sequence of …, 8, 4, 2, 1, 1/2, …, ɛ
• One run of DP takes about O(n/ɛ1) time. Total runtime is bounded by the last run as O(… + n/8 + n/4 + n/2 + … + n/ɛ) = O(n/ɛ), independent of # iterations
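The geometric-sum argument can be checked numerically. The sketch below treats one DP run as costing exactly n/ɛ1 work units, which is a simplification of the O(n/ɛ1) statement; the starting value 8 and the function names are illustrative.

```python
def geometric_schedule(eps, start=8.0):
    """eps1 values start, start/2, start/4, ... ending with eps itself."""
    seq = []
    e = start
    while e > eps:
        seq.append(e)
        e /= 2.0
    seq.append(eps)
    return seq


def total_work(n, eps, start=8.0):
    """Sum of n/eps1 over the whole schedule (one unit of work per n/eps1)."""
    return sum(n / e for e in geometric_schedule(eps, start))
```

Every term before the last is at most half of its successor, so the whole sum stays below 3n/ɛ however many runs precede the final one.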
Oracle Query Till U/L<2
At query i, with bounds $W^*_{l,i} \le W^* \le W^*_{u,i}$:
$\epsilon'_i = \sqrt{W^*_{u,i} / W^*_{l,i}} - 1, \qquad x_i = \sqrt{W^*_{l,i} W^*_{u,i} / (1 + \epsilon'_i)}$
Mathematically
$W^*_{u,i+1} / W^*_{l,i+1} = \left(W^*_{u,i} / W^*_{l,i}\right)^{3/4}$
[Equations: the DP runtimes of the oracle queries i = 1, …, t are summed; because the ratio shrinks as above, the sum is bounded by a constant factor (0.59 appears in the slide) times the cost of a single run.]
The Algorithmic Flow
1. Set U and L of W*
2. Adapt ɛ1 = [U/L]^(1/2) - 1
3. Set x = [UL/(1+ɛ1)]^(1/2)
4. Query Oracle (x)
5. Update U or L
6. Repeat from step 2 until U/L<2
7. Compute final solution
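The flow above can be sketched as a loop around an abstract oracle. The sketch assumes ɛ1 = (U/L)^(1/2) - 1, the choice under which each query shrinks U/L to (U/L)^(3/4). The `oracle` callable is hypothetical: it should return True when W* < (1+ɛ1)x is certified and False when W* ≥ x is certified. The stand-in oracle at the bottom cheats by knowing W* and is for demonstration only.

```python
import math


def search_bounds(oracle, lower, upper):
    """Tighten [lower, upper] around W* until upper/lower < 2.

    oracle(x, eps1) -> True  certifies W* < (1 + eps1) * x
                    -> False certifies W* >= x
    Returns (lower, upper, number_of_queries).
    """
    queries = 0
    while upper / lower >= 2.0:
        eps1 = math.sqrt(upper / lower) - 1.0
        x = math.sqrt(upper * lower / (1.0 + eps1))
        if oracle(x, eps1):
            upper = (1.0 + eps1) * x
        else:
            lower = x
        queries += 1
    return lower, upper, queries


# Demonstration with a stand-in oracle that knows a hidden optimum.
W_STAR = 37.0
lo, hi, used = search_bounds(lambda x, eps1: W_STAR < x, 1.0, 1024.0)
```

Since the ratio is raised to the power 3/4 on every query, the number of queries grows only with log log of the initial U/L.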
When U/L<2
1. Scale and round each cost by Lɛ/n
2. Run DP with W = 2n/ɛ
3. Pick min cost solution satisfying timing at driver
At least one feasible solution, otherwise no solution with cost 2n/ɛ • Lɛ/n = 2L ≥ U
A single DP runtime
Main Theorem
Theorem: a (1+ɛ) approximation to the timing constrained minimum cost buffering problem can be computed in O(m²n²b/ɛ³ + n³b²/ɛ) time for 0<ɛ<1 and in O(m²n²b/ɛ + mn²b + n³b) time for ɛ ≥ 1
Experiments
Experimental Setup
– 1000 industrial nets
– 48 buffer types including non-inverting buffers and inverting buffers
Compared to Dynamic Programming
Cost Ratio Compared to DP
[Figure: buffer cost ratio of FPTAS relative to DP versus approximation ratio ɛ.]
Speedup Compared to DP
[Figure: speedup of FPTAS over DP versus approximation ratio ɛ (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5).]
Timing Violations (% nets)
[Figure: % nets with timing violations for FPTAS versus approximation ratio ɛ (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5).]
Cost Ratio w/ Timing Recovery
[Figure: buffer cost ratio versus approximation ratio ɛ for FPTAS and FPTAS w/ Recovery.]
Speedup w/ Timing Recovery
[Figure: speedup versus approximation ratio ɛ for FPTAS and FPTAS w/ Recovery.]
Observations
Without timing recovery
– FPTAS always achieves the theoretical guarantee
– Larger ɛ leads to more speedup
– On average about 5x faster than dynamic programming
– Can run 4.6x faster with 0.57% solution degradation
– <5% nets with timing violations
With timing recovery
– FPTAS well approximates the optimal solutions
– Can still have >4x speedup
[Figure: "Our Bridge" spanning the gap between NP-hardness complexity and exponential time algorithm.]
Conclusion
Propose a (1+ɛ) approximation for timing constrained minimum cost buffering for any ɛ > 0
– Runs in O(m²n²b/ɛ³ + n³b²/ɛ) time
– Timing-cost approximate dynamic programming
– Double-ɛ geometric sequence based oracle search
– 5x speedup in experiments
– Few percent additional buffers as guaranteed theoretically
The first provably good approximation algorithm on this problem
Summary on Buffer Insertion and Layer Assignment
[Figure: delay (psec) versus technology generation (μm; 0.8, 0.5, 0.35, 0.25, 0.18, 0.15), comparing transistor/gate delay with interconnect delay. Source: Gordon Moore, Chairman Emeritus, Intel Corp.]
This is why Moore’s law does not hold anymore.
Interconnect Delay Scaling
Scaling factor s=0.7 per generation
Elmore delay of a wire of length l:
τint = (rl)(cl)/2 = rcl²/2 (first order)
Local interconnects: τint: (r/s²)(c)(ls)²/2 = rcl²/2
– Local interconnect delay roughly unchanged
Global interconnects: τint: (r/s²)(c)(l)²/2 = (rcl²)/(2s²)
– Global interconnect delay doubles: unsustainable
Interconnect delay increasingly more dominant
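A quick numeric check of the scaling argument, under the slide's first-order model (resistance per unit length grows by 1/s², capacitance per unit length stays the same). Function names are illustrative.

```python
def wire_delay(r, c, l):
    """First-order Elmore delay of a wire: r*c*l^2/2."""
    return r * c * l ** 2 / 2.0


def scaled_delays(r, c, l, s=0.7):
    """Delay after one generation for a local wire (length l*s) and a global wire (length l)."""
    local = wire_delay(r / s ** 2, c, l * s)
    global_ = wire_delay(r / s ** 2, c, l)
    return local, global_
```

With s = 0.7 the global-wire delay grows by 1/s², about 2.04, per generation, which is the doubling noted above.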
Interconnect Optimization
Analogy
Advancing technology = period of city expansion
More transistors = larger city
Buffers = gas stations
Interconnects = streets
– Lower layer = local street
– Higher layer = highways
Signal delay (timing) = time to cross the city
Highway is fast but its power has not been well explored
– Traditional wire sizing = make lane wider
– Layer assignment = highway overpasses
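Detailed Analysis
The delay of a wire of length L is $T = rcL^2/2$
Assume N identical buffers with equal inter-buffer length l (L = Nl). To minimize delay
$T = N\left[R_d(C_g + cl) + rl\left(C_g + cl/2\right)\right] = L\left[\frac{rcl}{2} + (rC_g + R_d c) + \frac{1}{l} R_d C_g\right]$
$\frac{dT}{dl} = 0 \;\Rightarrow\; L\left[\frac{rc}{2} - \frac{R_d C_g}{l_{opt}^2}\right] = 0 \;\Rightarrow\; l_{opt} = \sqrt{\frac{2 R_d C_g}{rc}}$
[Figure: wire of length L with buffers at spacing l.]
r, c – Resistance, cap. per unit length
Rd – On resistance of inverter
Cg – Gate input capacitance
Quadratic Delay -> Linear Delay
Substituting lopt back into the interconnect delay expression:
$T_{opt} = L\left[\frac{rc\,l_{opt}}{2} + (rC_g + R_d c) + \frac{1}{l_{opt}} R_d C_g\right] = L\left[\frac{rc}{2}\sqrt{\frac{2 R_d C_g}{rc}} + (rC_g + R_d c) + R_d C_g\sqrt{\frac{rc}{2 R_d C_g}}\right]$
$T_{opt} = L\left(\sqrt{2 R_d C_g rc} + rC_g + R_d c\right)$
Delay grows linearly with L instead of quadratically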
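The closed forms above can be checked numerically. This is a small sketch with illustrative function names; it evaluates the buffered delay at an arbitrary spacing and at the optimal spacing.

```python
import math


def optimal_spacing(r, c, Rd, Cg):
    """l_opt = sqrt(2*Rd*Cg / (r*c))."""
    return math.sqrt(2.0 * Rd * Cg / (r * c))


def buffered_delay(L, l, r, c, Rd, Cg):
    """T = L * [r*c*l/2 + (r*Cg + Rd*c) + Rd*Cg/l] for inter-buffer length l."""
    return L * (r * c * l / 2.0 + r * Cg + Rd * c + Rd * Cg / l)


def optimal_delay(L, r, c, Rd, Cg):
    """T_opt = L * (sqrt(2*Rd*Cg*r*c) + r*Cg + Rd*c), linear in L."""
    return L * (math.sqrt(2.0 * Rd * Cg * r * c) + r * Cg + Rd * c)
```

Evaluating `buffered_delay` at `optimal_spacing(...)` reproduces `optimal_delay`, and doubling L doubles the optimal delay, in contrast to the fourfold increase of the unbuffered rcL²/2.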
Solution Pruning
Needs solution pruning for acceleration
Two candidate solutions
– (v, c1, q1, w1)
– (v, c2, q2, w2)
Solution 1 is inferior to Solution 2 if
– c1 ≥ c2: larger load
– and q1 ≤ q2: tighter timing
– and w1 ≥ w2: larger cost
Car Race - Speed
Car Speed <=> RAT
Car Race - Load
Load <=> Load Capacitance
Faster & Smaller Load
Faster & smaller load (larger RAT, smaller capacitance): Good
Slower & larger load (smaller RAT, larger capacitance): Inferior
Faster & Larger Load: Result 1
Who will be the winner? Cannot tell at this moment, so keep both of them.
Faster & Larger Load: Result 2
Extension For Layer Assignment
Theorem: a (1+ɛ) approximation to the timing constrained minimum cost layer assignment problem can be computed in O(mn²/ɛ) time for any ɛ>0.
Oracle Lemma: given a tree with n wire segments and m layers, the optimal layer assignment subject to cost budget W=n/ɛ can be computed in O(mnW)=O(mn²/ɛ) time.
Conclusion
A (1+ɛ) approximation for timing constrained minimum cost buffering for any ɛ > 0 (DAC’09)
– Runs in O(m²n²b/ɛ³ + n³b²/ɛ) time
– Timing-cost approximate dynamic programming
– Double-ɛ geometric sequence based oracle search
– 5x speedup in experiments
– Few percent additional buffers as guaranteed theoretically
The first provably good approximation algorithm on this problem
A similar algorithm for layer assignment problem (ICCAD’08)
Thanks