FastPlace: Efficient Analytical Placement using Cell Shifting, Iterative Local Refinement and a Hybrid Net Model

Natarajan Viswanathan
Chris Chong-Nuen Chu
Iowa State University

International Symposium on Physical Design
April 19, 2004

FastPlace – Key Features
Efficient Analytical Placement using
1. Cell Shifting
2. Iterative Local Refinement
3. Hybrid Net Model
- Standard cell placement
- Wirelength minimization
- Flat placement

Are Existing Algorithms Adequate?
Solution Quality
- There may be significant room for improvement
  - For existing wirelength-driven placement algorithms: Cong et al. [ASPDAC 03] [ISPD 03]
  - For existing timing-driven placement algorithms: Cong et al. [ICCAD 03]
Efficiency
- Important to have fast placement algorithms
  - Circuit sizes are huge in modern design
  - Placement must be run in early design stages

Why Analytical?
- Inherently minimize the wirelength
- Intrinsically efficient
  - Elegant convex quadratic programming formulation
  - Very efficient techniques to solve convex QP
- Typically employ a flat placement methodology
  - All cells are placed simultaneously
  - Maintain relative positions of cells throughout the placement process

Analytical Placement Formulation
Let
- $(x_i, y_i)$ = coordinates of the center of cell $i$
- $w_{ij}$ = weight of the net between cell $i$ and cell $j$
- $\mathbf{x}, \mathbf{y}$ = solution vectors

Cost of the net between cell $i$ and cell $j$:
$$\frac{1}{2}\, w_{ij} \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right]$$

Total cost:
$$\frac{1}{2}\,\mathbf{x}^T Q\, \mathbf{x} + \mathbf{d}_x^T \mathbf{x} + \frac{1}{2}\,\mathbf{y}^T Q\, \mathbf{y} + \mathbf{d}_y^T \mathbf{y} + \text{const}$$
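
To make the quadratic formulation concrete, here is a minimal Python sketch for the x direction only. It assumes every net is already a 2-pin net, either between two movable cells or between a cell and a fixed pin. The helper names `build_system` and `conjugate_gradient` are illustrative and not from the slides, and the solver is a plain, unpreconditioned conjugate gradient on a dense matrix, used only because the total cost is minimized where Qx + d_x = 0.

```python
def build_system(num_cells, cell_nets, pin_nets):
    """Assemble Q and d_x for the x-direction cost 1/2 x^T Q x + d_x^T x + const.

    cell_nets: (i, j, w) nets between movable cells i and j with weight w.
    pin_nets:  (i, pos, w) nets between movable cell i and a fixed pin at pos.
    """
    Q = [[0.0] * num_cells for _ in range(num_cells)]
    d = [0.0] * num_cells
    for i, j, w in cell_nets:
        # 1/2 w (x_i - x_j)^2 adds w to both diagonals and -w off-diagonal.
        Q[i][i] += w
        Q[j][j] += w
        Q[i][j] -= w
        Q[j][i] -= w
    for i, pos, w in pin_nets:
        # 1/2 w (x_i - pos)^2 adds w to the diagonal and -w*pos to the linear term.
        Q[i][i] += w
        d[i] -= w * pos
    return Q, d


def conjugate_gradient(Q, b, tol=1e-20, max_iter=1000):
    """Solve Q x = b for a symmetric positive definite Q (no preconditioner)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual b - Q x for the starting point x = 0
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs < tol:
            break
        Qp = [sum(Q[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Qp[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Qp[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x


# Chain: fixed pin at 0 -- cell 0 -- cell 1 -- fixed pin at 3, all weights 1.
Q, d = build_system(2, cell_nets=[(0, 1, 1.0)], pin_nets=[(0, 0.0, 1.0), (1, 3.0, 1.0)])
solution = conjugate_gradient(Q, [-v for v in d])   # minimizer satisfies Q x = -d
```

For this chain the two cells settle at the evenly spaced positions 1 and 2 between the pins. The y direction would reuse the same Q with its own linear term.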

Analytical Placement Framework:
    repeat
        Solve the convex quadratic program
        Spread the cells
    until the cells are evenly distributed

FastPlace Approach
Framework:
    repeat
        Solve the convex quadratic program
        Reduce wirelength by iterative heuristic
        Spread the cells
    until the cells are evenly distributed
Special features of FastPlace:
- Cell Shifting
  - Easy-to-compute technique
  - Enable fast convergence
- Hybrid Net Model
  - Speed up solving of convex QP
- Iterative Local Refinement
  - Minimize wirelength based on linear objective

Outline
FastPlace: Efficient Analytical Placement using
1. Cell Shifting
2. Iterative Local Refinement
3. Hybrid Net Model

Spreading by Cell Shifting
- Quadratic placement should produce good relative positions of cells
- Simple shifting of cells should be able to produce a good placement
- Major difficulties:
  1. How to shift cells in a 2-D region?
  2. How to make sure wirelength will still be good?
- Our Approach:
  1. Perform 1-D shifting in x and y directions independently
  2. Interleave a small amount of shifting with quadratic placement

Cell Shifting
[Figure: Uniform Bin Structure and Non-uniform Bin Structure]
1. Shifting of bin boundary
2. Shifting of cells linearly within each bin
- Apply to all rows and all columns independently

Cell Shifting – Animation
[Figure: Bin i and Bin i+1 holding cells j, k and l; labels OB_{i-1}, OB_i, OB_{i+1}, U_i, U_{i+1} and NB_i]
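
The two shifting steps can be sketched in Python for a single row. The slides only show the labels, so the following are assumptions: OB and NB are read as old and new bin boundaries and U as bin utilization, the new boundary is a utilization-weighted average of the two neighbouring old boundaries with a small constant `delta`, and the outer boundaries stay fixed. Any damping of the cell movement is left out.

```python
def shift_boundaries(old_bounds, util, delta=1.5):
    """Return new bin boundaries for one row.

    old_bounds has len(util) + 1 entries; bin b lies between
    old_bounds[b] and old_bounds[b + 1] and has utilization util[b].
    The weighting formula and delta are assumptions, not taken from the slides.
    """
    new_bounds = list(old_bounds)
    for b in range(1, len(old_bounds) - 1):
        u_left, u_right = util[b - 1], util[b]
        # A fuller left bin pulls the shared boundary to the right, and vice versa.
        new_bounds[b] = (
            old_bounds[b - 1] * (u_right + delta)
            + old_bounds[b + 1] * (u_left + delta)
        ) / (u_left + u_right + 2 * delta)
    return new_bounds


def shift_cell(x, old_bounds, new_bounds):
    """Map a cell linearly from its old bin onto the same bin with new boundaries."""
    for b in range(len(old_bounds) - 1):
        lo, hi = old_bounds[b], old_bounds[b + 1]
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)      # relative position inside the old bin
            return new_bounds[b] + t * (new_bounds[b + 1] - new_bounds[b])
    raise ValueError("cell lies outside the row")


old = [0.0, 10.0, 20.0]                    # two bins of equal width
new = shift_boundaries(old, util=[3.0, 1.0])
shifted = shift_cell(5.0, old, new)        # a cell in the middle of the crowded bin
```

Here the crowded left bin widens, so the cell in its middle moves right while keeping its relative position inside the bin. Columns would be handled the same way using y coordinates.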

Pseudo pin and Pseudo net
[Figure labels: Original Position, Target Position, Additional Force, Pseudo pin, Pseudo net]
- Need to add forces to prevent cells from collapsing back
- Done by adding pseudo pins and pseudo nets
- Only diagonal and linear terms of the quadratic system need to be updated
- Takes a single pass of O(n) time to regenerate matrix Q (which is common for both x and y problems)
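
A small Python sketch shows why only the diagonal and linear terms change. A pseudo net of weight w from cell i to a fixed pseudo pin at position p adds the term 1/2 w (x_i - p)^2 to the cost, which touches Q[i][i] and d[i] only. The rule used here for choosing w (balance the net force at the target position) and the pseudo-pin locations are assumptions for illustration, as are the function names.

```python
def gradient_at(Q, d, x, i):
    """i-th component of Q x + d, i.e. the net force on cell i at positions x."""
    return sum(Q[i][j] * x[j] for j in range(len(x))) + d[i]


def add_pseudo_net(Q, d, i, pin_pos, weight):
    """Connect cell i to a fixed pseudo pin: one diagonal and one linear update."""
    Q[i][i] += weight
    d[i] -= weight * pin_pos


# Two-cell chain between fixed pins at 0 and 3; unconstrained optimum is (1, 2).
Q = [[2.0, -1.0], [-1.0, 2.0]]
d = [0.0, -3.0]

targets = [0.5, 2.5]        # positions after spreading (illustrative values)
pseudo_pins = [-1.0, 4.0]   # assumed pseudo-pin locations beyond each target

# Pick each weight so that the extra force cancels the pull back toward (1, 2).
weights = [
    -gradient_at(Q, d, targets, i) / (targets[i] - pseudo_pins[i])
    for i in range(2)
]
for i in range(2):
    add_pseudo_net(Q, d, i, pseudo_pins[i], weights[i])

residual = [gradient_at(Q, d, targets, i) for i in range(2)]
```

After the update the gradient at the target positions vanishes, so the next quadratic solve keeps the cells where spreading put them instead of letting them collapse back.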

Iterative Local Refinement
- Iteratively go through all the cells one by one
- For each cell, consider moving it in four directions by a certain distance
- Compute a score for each direction based on
  - Half-perimeter wirelength (HPWL) reduction
  - Cell density at the source and destination regions
- Move in the direction with highest positive score (do not move if no positive score)
- Distance moved (H or V) is decreasing over iterations
- Detailed placement is handled by the same heuristic
[Figure labels: H (horizontal move distance), V (vertical move distance)]
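
One pass of this heuristic might look like the following Python sketch. It is deliberately simplified: cells are points, the score is the HPWL reduction only, and the cell-density term named above is omitted. The names `hpwl` and `refine_pass` and the data layout are hypothetical.

```python
def hpwl(nets, pos):
    """Half-perimeter wirelength: sum of bounding-box half perimeters of all nets."""
    total = 0.0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def refine_pass(nets, pos, movable, h_step, v_step):
    """Try moving each movable cell by h_step or v_step in the four directions.

    The score is HPWL reduction only; the density part of the score is left out.
    """
    moves = [(h_step, 0), (-h_step, 0), (0, v_step), (0, -v_step)]
    for c in movable:
        own_nets = [net for net in nets if c in net]   # only these nets can change
        base = hpwl(own_nets, pos)
        x, y = pos[c]
        best_gain, best_xy = 0.0, None
        for dx, dy in moves:
            pos[c] = (x + dx, y + dy)
            gain = base - hpwl(own_nets, pos)
            if gain > best_gain:                       # keep strictly positive scores only
                best_gain, best_xy = gain, pos[c]
        pos[c] = best_xy if best_xy is not None else (x, y)


pos = {"a": (0, 0), "b": (4, 0), "c": (2, 3)}
nets = [["a", "c"], ["b", "c"]]
before = hpwl(nets, pos)
refine_pass(nets, pos, movable=["c"], h_step=1, v_step=1)
after = hpwl(nets, pos)
```

In this example only the downward move has a positive score, so cell c moves one step toward its two neighbours. Repeated passes would call `refine_pass` with smaller `h_step` and `v_step`, matching the decreasing move distance described above.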

Effect of Net Model on Runtime
- Need to replace each multi-pin net by 2-pin nets
- Then the placement problem (even with pseudo nets) can be formulated as a convex QP:
$$\text{Total cost} = \frac{1}{2}\,\mathbf{x}^T Q\, \mathbf{x} + \mathbf{d}_x^T \mathbf{x} + \frac{1}{2}\,\mathbf{y}^T Q\, \mathbf{y} + \mathbf{d}_y^T \mathbf{y} + \text{const}$$
- Can be solved by any convex QP algorithm
  - Use Incomplete Cholesky Conjugate Gradient (ICCG)
  - Runtime is proportional to # of non-zero entries in Q
  - Each non-zero entry in Q corresponds to one 2-pin net
- Traditionally, placers model each multi-pin net by a clique
  - High-degree nets will generate a lot of 2-pin nets
  - Slow down convex QP algorithms significantly

Clique, Star and Hybrid Net Models
[Figure: Clique Model, Star Model (with Star Node), Hybrid Model]

# pins   Net Model
2        Clique
3        Clique
4        Star
5        Star
6        Star
…        …

- Star model is introduced by Mo et al. [ICCAD-00] for macro placement
  - Introduce a star node even for 2-pin nets
  - Not clear how the placement result will be affected
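
The effect of the net model on the number of 2-pin nets can be checked with a few lines of Python. A clique on k pins needs k(k-1)/2 two-pin nets, while a star needs k (one per pin, each to the star node). The hybrid rule follows the table above: clique for 2 and 3 pins, star from 4 pins up. The function names and the sample net degrees are made up for illustration, and the counts ignore the fact that parallel 2-pin nets between the same pair of cells merge into one entry of Q.

```python
def clique_two_pin_nets(k):
    """Every pair of the k pins is connected directly."""
    return k * (k - 1) // 2


def star_two_pin_nets(k):
    """Each of the k pins is connected to one extra star node."""
    return k


def hybrid_two_pin_nets(k):
    """Clique for 2- and 3-pin nets, star for nets with 4 or more pins."""
    return clique_two_pin_nets(k) if k <= 3 else star_two_pin_nets(k)


net_degrees = [2, 3, 4, 10]   # illustrative pin counts, not benchmark data
clique_total = sum(clique_two_pin_nets(k) for k in net_degrees)
hybrid_total = sum(hybrid_two_pin_nets(k) for k in net_degrees)
```

Most of the difference comes from the single 10-pin net, which is why high-degree nets dominate the size of Q under the clique model. The star model does add one variable per star node, a cost this count does not show.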

Equivalence of Clique and Star Models
Lemma: By setting the net weights appropriately, clique and star net models are equivalent.
Proof: When star node is at equilibrium position, total forces on each cell are the same for clique and star net models.
[Figure: for a k-pin net, Clique Model with weight = γW on each 2-pin net; Star Model (with Star Node) with weight = γkW on each 2-pin net]
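
The proof idea can be spelled out in one dimension as a short derivation. This is a sketch using the weights shown on the slide (γW per clique edge, γkW per star edge); the symbols $x_s$ for the star node position and $\bar{x}$ for the mean pin position are introduced here for illustration.

```latex
% Star node at equilibrium: it minimizes the star cost, so it sits at the centroid.
\frac{\partial}{\partial x_s} \sum_{j=1}^{k} \frac{1}{2}\,\gamma k W\,(x_j - x_s)^2 = 0
\quad\Longrightarrow\quad
x_s = \frac{1}{k}\sum_{j=1}^{k} x_j = \bar{x}

% Force on cell i in the star model:
F_i^{\text{star}} = \gamma k W\,(x_i - x_s) = \gamma k W\,(x_i - \bar{x})

% Force on cell i in the clique model (the j = i term is zero):
F_i^{\text{clique}} = \sum_{j=1}^{k} \gamma W\,(x_i - x_j)
= \gamma W \Bigl( k\,x_i - \sum_{j=1}^{k} x_j \Bigr)
= \gamma k W\,(x_i - \bar{x})
```

Both models therefore exert the same force on every cell, and the same argument applies to the y coordinates.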

Experimental Setup
- ISPD-02 mixed-mode benchmark suite by IBM
  - Macro blocks replaced by standard cells with width set to 4 x average cell width
  - 10% whitespace
- FastPlace implemented in C
- Compared with:
  - MetaPl-Capo 8.8 in default mode
  - Dragon 2.2.3 in fixed die mode
- All placers run on a 750MHz Sun Sparc-2 machine

Placement Benchmark Statistics

Circuit   #Nodes   #Terminals   #Nets    #Pins    #Rows
ibm01      12506      246        14111    50566     96
ibm02      19342      259        19584    81199    109
ibm03      22853      283        27401    93573    121
ibm04      27220      287        31970   105859    136
ibm05      28146     1201        28446   126308    139
ibm06      32332      166        34826   128182    126
ibm07      45639      287        48117   175639    166
ibm08      51023      286        50513   204890    170
ibm09      53110      285        60902   222088    183
ibm10      68685      744        75196   297567    234
ibm11      70152      406        81454   280786    208
ibm12      70439      637        77240   317760    242
ibm13      83709      490        99666   357075    224
ibm14     147088      517       152772   546816    305
ibm15     161187      383       186608   715823    303
ibm16     182980      504       190048   778823    347
ibm17     184752      743       189581   860036    379
ibm18     210341      272       201920   819697    361

Clique Net Model vs Hybrid Net Model

          # Non-zero Entries                              Speed-Up
Circuit   Clique Model   Hybrid Model   Clique / Hybrid   (Hybrid / Clique)
ibm01        109183          41164           2.65              1.5
ibm02        343409          70014           4.90              2.4
ibm03        206069          74680           2.76              1.4
ibm04        220423          84556           2.61              1.2
ibm05        349676         108282           3.23              1.3
ibm06        321308         106835           3.01              1.6
ibm07        373328         147009           2.54              1.3
ibm08        732550         173541           4.22              2.0
ibm09        478777         185102           2.59              1.4
ibm10        707969         251101           2.82              1.6
ibm11        508442         230865           2.20              1.2
ibm12        748371         270849           2.76              1.6
ibm13        744500         295048           2.52              1.5
ibm14       1125147         456474           2.46              1.3
ibm15       1751474         607289           2.88              1.4
ibm16       1923995         668491           2.88              1.3
ibm17       2235716         753507           2.97              1.4
ibm18       2221860         711702           3.12              1.4
Average                                      2.95              1.5

Half Perimeter Wirelength
Average Wirelength Ratio
- FastPlace / Capo: 1.010
- FastPlace / Dragon: 1.016
[Figure: bar chart of wirelength (x 10e6, axis from 0 to 80) for ibm01 through ibm18, comparing Capo 8.8, Dragon 2.2.3 and FastPlace]

Runtime Comparison

          Runtime                                   Speed-Up
Circuit   Capo 8.8    Dragon 2.2.3   FastPlace     (Capo / FP)   (Dragon / FP)
ibm01      3 m 59 s     29 m 06 s       13 s         x 18.4        x 134.3
ibm02      7 m 15 s     31 m 13 s       33 s         x 13.2        x 56.8
ibm03      8 m 23 s     31 m 49 s       33 s         x 15.2        x 57.8
ibm04     10 m 46 s      1 h 5 m        39 s         x 16.6        x 100.0
ibm05     10 m 44 s      1 h 48 m       51 s         x 12.6        x 127.1
ibm06     12 m 08 s      1 h 21 m       45 s         x 16.2        x 108.0
ibm07     18 m 32 s      1 h 47 m     1 m 19 s       x 14.1        x 81.3
ibm08     19 m 53 s      4 h 30 m     1 m 33 s       x 12.8        x 174.2
ibm09     22 m 50 s      3 h 43 m     1 m 42 s       x 13.4        x 131.2
ibm10     29 m 04 s      3 h 19 m     2 m 25 s       x 12.0        x 82.3
ibm11     31 m 11 s      2 h 22 m     2 m 13 s       x 14.1        x 64.1
ibm12     30 m 41 s      3 h 48 m     2 m 23 s       x 12.9        x 95.7
ibm13     39 m 27 s      3 h 04 m     2 m 54 s       x 13.6        x 63.4
ibm14      1 h 12 m      7 h 37 m     5 m 34 s       x 12.9        x 82.1
ibm15      1 h 30 m     10 h 34 m     8 m 45 s       x 10.3        x 72.4
ibm16      1 h 31 m     12 h 06 m    10 m 52 s       x 8.4         x 66.8
ibm17      1 h 43 m     26 h 54 m    11 m 30 s       x 9.0         x 140.3
ibm18      1 h 44 m     23 h 39 m    12 m 21 s       x 8.4         x 114.9
Average                                              x 13.0        x 97.4

FastPlace - Breakdown of Runtime
All runtime in seconds

          Global Optimization   Cell Shifting     Iterative Local Refinement   Detail Placement
Circuit   Runtime     %         Runtime    %      Runtime     %                Runtime    %        Total
ibm01       3.75    28.6          1.44   11.0       6.37    48.6                 1.55   11.8       13.11
ibm06      13.27    29.2          3.91    8.6      25.04    55.1                 3.21    7.1       45.43
ibm11      56.80    42.9         14.67   11.1      49.20    37.1                11.86    8.9      132.53
ibm16     257.41    39.4         53.93    8.3     292.74    44.9                47.99    7.4      652.07
ibm18     285.57    38.5         57.09    7.7     345.28    46.6                52.98    7.2      740.92

Complexity Analysis
Runtime ≈ O(n^1.412), where n = # of pins
Runtime ≈ O(n^1.37), where n = # of pins
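
An empirical exponent of this kind is usually obtained from a straight-line fit on a log-log scale. The Python sketch below applies that idea to the five circuits for which these slides list both a pin count (benchmark statistics table) and a total FastPlace runtime (runtime breakdown table). The fitting procedure is an assumption about how such a figure could be derived, and with only five points the result need not match the exponent quoted on the slide.

```python
import math

# (#Pins, total FastPlace runtime in seconds) for ibm01, ibm06, ibm11, ibm16, ibm18,
# taken from the benchmark statistics and runtime breakdown tables in these slides.
pins = [50566, 128182, 280786, 778823, 819697]
runtime_s = [13.11, 45.43, 132.53, 652.07, 740.92]


def fit_exponent(ns, ts):
    """Least-squares slope of log(t) against log(n), i.e. p in t ~ c * n**p."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in ts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


exponent = fit_exponent(pins, runtime_s)
print(f"fitted exponent on five circuits: {exponent:.2f}")
```

A slope between 1 and 2 on this scale corresponds to runtime growing faster than linearly but well below quadratically in the number of pins.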

Summary
- FastPlace: Efficient Flat Placement Algorithm
  - 13.0x faster than Capo
  - 97.4x faster than Dragon
  - Comparable WL to Capo and Dragon
- Based on three techniques:
  1. Cell Shifting
     - Fast convergence
     - Simple computation
  2. Iterative Local Refinement
     - Reduce wirelength based on HPWL measure
  3. Hybrid Net Model
     - 1.5x speedup compared to Clique
     - Applicable to any analytical placement tools

Thank You!!