Constraint-Driven Large Scale Circuit Placement Algorithms
Advisor: Prof. Jason Cong
Student: Min Xie
September, 2006
UCLA VLSICAD LAB

Outline
Chapter 1. Introduction
Chapter 2. Optimality and scalability study of existing placement algorithms
Chapter 3. Routability-driven multilevel global placement and white space allocation
Chapter 4. A robust legalization scheme for mixed-size placement
Chapter 5. Applications of mixed-size placement legalization
Chapter 6. “Global” localized preprocessing for detailed placement
Chapter 7. Heterogeneous placement for FPGAs
Chapter 8. Conclusions and future work

Publication List
• Cong J., Xie M., and Zhang Y., “An Enhanced Multilevel Routing System,” Proceedings of ICCAD, pp. 51-58, 2002.
• Chang C., Cong J., and Xie M., “Optimality and Scalability of Existing Placement Algorithms,” Proceedings of ASPDAC, pp. 621-627, 2003.
• Cong J., Romesis M., and Xie M., “Optimality, Scalability and Stability Study of Existing Partitioning and Placement Algorithms,” Proceedings of ISPD, pp. 88-94, 2003.
• Cong J., Romesis M., and Xie M., “Optimality and Stability Study of Timing-driven Placement Algorithms,” Proceedings of ICCAD, pp. 472-478, 2003.
• Cong J., Kong T., Shinnerl J., Xie M., and Yuan X., “Large-Scale Circuit Placement: Gap and Promise,” Proceedings of ICCAD, pp. 883-890, 2003.
• Chang C., Cong J., Romesis M., and Xie M., “Optimality and Scalability of Existing Placement Algorithms,” IEEE TCAD, vol. 23, no. 4, pp. 537-549, 2004.

Publication List (continued)
• Li C., Xie M., Koh C.K., Cong J., and Madden P., “Routability-driven Placement and White Space Allocation,” Proceedings of ICCAD, pp. 883-890, 2004.
• Cong J., Fang J., Xie M., and Zhang Y., “MARS - A Multilevel Full-Chip Gridless Routing System,” IEEE TCAD, vol. 24, no. 3, pp. 382-394, March 2005.
• Cong J., Kong T., Shinnerl J., Xie M., and Yuan X., “Large Scale Circuit Placement,” ACM TODAES, vol. 10, no. 2, pp. 389-430, April 2005.
• Li C., Xie M., Koh C.K., Cong J., and Madden P., “Routability-driven Placement and White Space Allocation,” IEEE TCAD, to appear.
• Chan T., Cong J., Romesis M., Shinnerl J., Sze K., and Xie M., “mPL6: A Robust Multilevel Mixed-size Placement Engine,” Proceedings of ISPD, pp. 227-229, April 2005.
• Cong J. and Xie M., “A Robust Detailed Placement Algorithm for Mixed-size IC Designs,” Proceedings of ASPDAC, pp. 188-194, 2006.
• Cong J., Chan T., Shinnerl J., Sze K., and Xie M., “mPL6: Enhanced Multilevel Mixed-size Placement,” Proceedings of ISPD, pp. 212-214, April 2006.

A Brief History of mPL
(Figure: relative wirelength versus year, 2000 to 2004, with the versions grouped under "uniform cell size" and "non-uniform cell size")
• mPL 1.0 [ICCAD00]
  • Recursive ESC clustering
  • NLP at coarsest level
  • Goto discrete relaxation
  • Slot Assignment legalization
  • Domino detailed placement
• mPL 1.1
  • FC-Clustering
  • Added partitioning to legalization
• mPL 2.0
  • RDFL relaxation
  • Primal-dual netlist pruning
• mPL 3.0 [ICCAD 03]
  • QRS relaxation
  • AMG interpolation
  • Multiple V-cycles
  • Cell-area fragmentation
• mPL 4.0
  • Improved DP
  • Better coarsening
  • Backtracking V-cycle
• mPL5, mPL6
  • Multilevel Force-Directed

Multiscale Optimization Framework
(Figure: V-cycle starting from the given problem. Coarsening (clustering) on the way down, as the problem size decreases; interpolation and relaxation (optimization) on the way up.)
• Explores different scales of the solution space at different levels
• Supports VERY FAST and SCALABLE methods
• Supports inclusion of complicated objectives and constraints
• Successful across MANY DIVERSE applications

mPL6 -- Generalized Force Directed Refinement
• Logsum wirelength
• Average bin density
• Equality constraint: average bin density = utilization ratio
(Figure: cells v1 to v7 overlapping a grid of bins)
$a_{13}(v_7)$ = fractional area of cell $v_7$ in bin $B_{13}$
$D_{ij} = \left(\sum_{k=1}^{7} a_{ij}(v_k)\right) / (\text{bin area})$

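The bin density definition on this slide can be illustrated with a small sketch. It assumes axis-aligned rectangular cells and a uniform bin grid; the names `overlap_area` and `bin_densities` are hypothetical and are not taken from mPL6.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)

def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

def bin_densities(cells: List[Rect], chip: Rect, nx: int, ny: int) -> List[List[float]]:
    """D[i][j] = (sum over cells k of a_ij(v_k)) / (bin area), on an ny-by-nx grid."""
    bin_w = (chip[2] - chip[0]) / nx
    bin_h = (chip[3] - chip[1]) / ny
    bin_area = bin_w * bin_h
    density = [[0.0] * nx for _ in range(ny)]
    for i in range(ny):
        for j in range(nx):
            bin_rect = (chip[0] + j * bin_w, chip[1] + i * bin_h,
                        chip[0] + (j + 1) * bin_w, chip[1] + (i + 1) * bin_h)
            covered = sum(overlap_area(c, bin_rect) for c in cells)
            density[i][j] = covered / bin_area
    return density
```

With a single unit-square cell centered on a 2x2 chip, every bin receives a quarter of the cell, so all four densities equal the utilization ratio of 0.25.
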
mPL6 -- Iterative Flow
(Figure: V-cycles across Level 3, Level 2, and Level 1, with steps labeled C, I, and C+I)
• Bestchoice clustering [Alpert et al, ISPD05]
• AMG declustering [Chen et al, DAC03, Chan et al, ICCAD03]
• Multiple V-cycles with distance-based reclustering [Chan et al, ICCAD03]

Outline
Chapter 1. Introduction
Chapter 2. Optimality and scalability study of existing placement algorithms
Chapter 3. Routability-driven multilevel global placement and white space allocation
  • Motivation and previous work
  • Routability-driven multilevel placement
  • Experiment results
  • Conclusions and future work
Chapter 4. A robust legalization scheme for mixed-size placement
Chapter 5. Applications of mixed-size placement legalization
Chapter 6. “Global” localized preprocessing for detailed placement
Chapter 7. Heterogeneous placement for FPGAs
Chapter 8. Conclusions and future work

Motivation
• mPL does not consider routing congestion
  • Aggressive HPWL minimization != routability
• Routability-driven placement
  • Routability modeling
  • Routability optimization

Previous Work -- Routability Modeling
• Topology-free methods
  • Dragon [Yang et al., TCAD03]
  • Sparse [Hu et al., ICCAD02]
  • BonnPlace [Brenner & Rohe, ISPD02]
• Topology-based methods
  • [Mayrhofer & Lauther, ICCAD90]
  • mPG [Chang et al., ISPD02]

Previous Work -- Routability Optimization
• Cell weighting
  • Cell inflation based on congestion
  • Constructive and iterative methods
    • Dragon [Yang et al, TCAD03]
    • BonnPlace [Brenner & Rohe, ISPD02]
• Net weighting
  • Translate into bin weights and optimize weighted wirelength
  • Iterative methods
    • Sparse [Hu & Sadowska, ICCAD02]
    • mPG [Chang et al, ISPD02]

Routability-Driven Multilevel Placement
• Global placement
  • Congestion estimation by a fast LZ router
  • Congestion-driven cell re-placement based on weighted wirelength
• Hierarchical top-down white space allocation
  • Geometric-based slicing tree
  • Congestion estimation on tree
  • Cutline adjustment

mPL-R Congestion Estimation with LZ Router
• Use LZ-Router [Chang et al., ISPD02] for fast congestion analysis on each level
• Binary search on V-stem (or H-stem)
  • Initialize left region and right region to cover bounding box
  • Repeat
    • Query wire usage on both regions
    • Select region with less congestion
(Figure: HVH and VHV route shapes; the bounding box split into a left region and a right region, one marked less congested and the other more congested)

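The binary search on the V-stem can be sketched as follows. This is a minimal illustration under stated assumptions, not the LZ-Router implementation: wire usage is a plain 2D grid indexed as `usage[row][col]`, region queries use a 2D prefix sum, and the two halves are compared by average usage per column. The names `prefix_sums`, `region_usage`, and `pick_v_stem` are hypothetical.

```python
def prefix_sums(usage):
    """2D prefix sums so that any rectangular usage query is O(1)."""
    rows, cols = len(usage), len(usage[0])
    ps = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ps[r + 1][c + 1] = usage[r][c] + ps[r][c + 1] + ps[r + 1][c] - ps[r][c]
    return ps

def region_usage(ps, r0, r1, c0, c1):
    """Total usage in rows r0..r1 and columns c0..c1 (inclusive)."""
    return ps[r1 + 1][c1 + 1] - ps[r0][c1 + 1] - ps[r1 + 1][c0] + ps[r0][c0]

def pick_v_stem(usage, pin_a, pin_b):
    """Choose the column of the vertical stem of an HVH route between two pins.

    Pins are (col, row) bin coordinates. The search halves the bounding box
    and keeps the half with the lower average usage, as on the slide.
    """
    (ca, ra), (cb, rb) = pin_a, pin_b
    r0, r1 = min(ra, rb), max(ra, rb)
    lo, hi = min(ca, cb), max(ca, cb)
    ps = prefix_sums(usage)
    while lo < hi:
        mid = (lo + hi) // 2
        left = region_usage(ps, r0, r1, lo, mid) / (mid - lo + 1)
        right = region_usage(ps, r0, r1, mid + 1, hi) / (hi - mid)
        if left <= right:
            hi = mid          # left region is less congested
        else:
            lo = mid + 1      # right region is less congested
    return lo
```

The search is greedy: it needs only a logarithmic number of queries per net, but it can miss the least-used column when that column sits inside the more congested half.
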
mPL-R Congestion-Driven Re-Placement
• Pick cells whose incident nets cross congested regions to move
• Start from the optimal location for HPWL
• Search adjacent bins within a certain window
• Choose the bin based on weighted WL
(Figure: bins with weights 0.5, 1.2, and 2.0; two candidate locations with weighted wirelength WLc = 15.5 and WLc = 9.2)

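The window search can be sketched as below. The slide does not define the weighted wirelength, so this sketch assumes one possible model: the half-perimeter of each net's bounding box scaled by the average bin weight inside that box. The names `weighted_hpwl` and `best_bin` are hypothetical.

```python
def weighted_hpwl(pins, weights):
    """Half-perimeter of the pin bounding box times the mean bin weight inside it.

    pins: list of (col, row) bin coordinates; weights[row][col] is a
    congestion-derived bin weight (an assumed cost model, see the lead-in).
    """
    cols = [c for c, _ in pins]
    rows = [r for _, r in pins]
    c0, c1, r0, r1 = min(cols), max(cols), min(rows), max(rows)
    box = [weights[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
    return (sum(box) / len(box)) * ((c1 - c0) + (r1 - r0))

def best_bin(start, window, nets, weights):
    """Try every bin within `window` of `start` and keep the cheapest one.

    start: (col, row) of the HPWL-optimal bin for the cell.
    nets: for each incident net, the bins of its other (fixed) pins.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    c_s, r_s = start
    best, best_cost = start, float("inf")
    for r in range(max(0, r_s - window), min(n_rows, r_s + window + 1)):
        for c in range(max(0, c_s - window), min(n_cols, c_s + window + 1)):
            cost = sum(weighted_hpwl(net + [(c, r)], weights) for net in nets)
            if cost < best_cost:
                best, best_cost = (c, r), cost
    return best, best_cost
```

In this model a cell can move away from its HPWL-optimal bin when the nets it would otherwise stretch across pass through heavily weighted bins.
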
White Space Allocation -- Slicing Tree Construction
• Recursively bipartition chip region from top to bottom.
• Group cells into children nodes according to location relative to cutline.
• Estimate congestion on leaf nodes. Congestion on other nodes can be computed from bottom to top.
• Node attributes: cut direction, cut location, node area, congestion
(Figure: chip region partitioned into leaf regions A to H, and the corresponding slicing tree from the root down to leaves A B C D E F G H)

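A minimal sketch of the slicing tree construction follows. It assumes midpoint cuts across the longer side of each region and a caller-supplied leaf congestion estimate; internal-node congestion is taken as the sum of the children, which matches the example numbers on the following slides (28 + 60 = 88). `TreeNode`, `build_tree`, and `leaf_congestion` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Cell = Tuple[float, float, float]            # (x, y, area) of a placed cell
Region = Tuple[float, float, float, float]   # (x_lo, y_lo, x_hi, y_hi)

@dataclass
class TreeNode:
    region: Region
    cell_area: float
    congestion: float = 0.0
    cut_dir: Optional[str] = None    # "V" or "H"; None for leaves
    cut_loc: Optional[float] = None
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None

def build_tree(region: Region, cells: List[Cell], depth: int,
               leaf_congestion: Callable[[Region], float]) -> TreeNode:
    """Top-down geometric bipartition, then bottom-up congestion aggregation."""
    x0, y0, x1, y1 = region
    node = TreeNode(region, sum(c[2] for c in cells))
    if depth == 0:
        node.congestion = leaf_congestion(region)
        return node
    if x1 - x0 >= y1 - y0:
        cut = (x0 + x1) / 2
        node.cut_dir, node.cut_loc = "V", cut
        low = [c for c in cells if c[0] < cut]
        high = [c for c in cells if c[0] >= cut]
        regions = ((x0, y0, cut, y1), (cut, y0, x1, y1))
    else:
        cut = (y0 + y1) / 2
        node.cut_dir, node.cut_loc = "H", cut
        low = [c for c in cells if c[1] < cut]
        high = [c for c in cells if c[1] >= cut]
        regions = ((x0, y0, x1, cut), (x0, cut, x1, y1))
    node.left = build_tree(regions[0], low, depth - 1, leaf_congestion)
    node.right = build_tree(regions[1], high, depth - 1, leaf_congestion)
    node.congestion = node.left.congestion + node.right.congestion
    return node
```

Calling `build_tree` with `depth=3` yields eight leaves, the same shape as the A to H example in the figure.
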
White Space Allocation -- Cutline Adjustment
• Adjust cut location from top to bottom such that white spaces for children nodes are proportional to their congestion.
(Figure: chip region with leaves A to H and its slicing tree, annotated with cell area/congestion: root 240/88; children 116/28 and 124/60)
• Assuming chip area of root = 300
• Total WS area = 300 - 240 = 60
• WS area for left child = 60*28/(28+60) = 19.1
• WS area for right child = 60 - 19.1 = 40.9
• Chip area for left child = 116+19.1 = 135.1
• Chip area for right child = 124+40.9 = 164.9

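The worked example above can be reproduced with a short top-down recursion. This is a sketch under the slide's stated rule (white space split in proportion to child congestion); the `Node` class and `allocate` function are hypothetical names, and the even split used when both children report zero congestion is an added assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    cell_area: float
    congestion: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    chip_area: float = 0.0   # filled in by allocate()

def allocate(node: Node, chip_area: float) -> None:
    """Assign chip area top-down; white space follows child congestion."""
    node.chip_area = chip_area
    if node.left is None or node.right is None:
        return
    white = chip_area - node.cell_area
    total = node.left.congestion + node.right.congestion
    # Assumption: split evenly when neither child is congested.
    share = node.left.congestion / total if total > 0 else 0.5
    ws_left = white * share
    allocate(node.left, node.left.cell_area + ws_left)
    allocate(node.right, node.right.cell_area + (white - ws_left))

root = Node(240, 88, Node(116, 28), Node(124, 60))
allocate(root, 300)
print(round(root.left.chip_area, 1), round(root.right.chip_area, 1))  # 135.1 164.9
```

The new cut location then follows from the two child areas, since the cutline divides the parent region in the ratio 135.1 : 164.9.
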
White Space Allocation -- Cutline Adjustment
• Adjust cut location from top to bottom such that white spaces for children nodes are proportional to their congestion.
(Figure: the same slicing tree one level deeper, annotated with cell area/congestion: root 240/88; second level 116/28 and 124/60; third level 54/9, 62/19, 58/34, and 66/26)

Experiment Setup
• 16 IBM version 2 examples
  • 5% to 15% white space
• Three state-of-the-art routability-driven placers
  • Dragon-fd 3.01 [Yang et al, TCAD03]
    • Simulated annealing with bin swapping
    • Two-step white space allocation
  • Capo 10.0 [Roy et al, ISPD06]
    • Fast Steiner tree approximation
    • Congestion-based cutline shifting
  • Fengshui 5.1 [Agnihotri et al, ISPD05]
    • Recursive bisection
    • Similar white space allocation method incorporated
• Magma router for evaluation

Routability-Driven Placement Tools Comparison
(Figure: two bar charts comparing Dragon 3.01, Capo 10.0, Fengshui 5.1, and mPL-R+WSA. Left: routed WL, axis from 0.900 to 1.150. Right: #violation, axis from -200 to 1600.)
• mPL-R+WSA is the only flow for which every example routes successfully
• mPL-R+WSA produces the shortest wirelength

Routability Optimization Techniques Comparison
• mPL
  • Latest pure WL-driven version
  • No consideration of routing congestion
• mPL-R
• mPL-I
  • Cell inflation + dummy density assignment
  • Highest quality in ISPD06 contest [Nam ISPD06]
  • Density target set as utilization
• mPL+WSA
• mPL-R+WSA

Routability Optimization Techniques Comparison
(Figure: two bar charts comparing mPL, mPL-R, mPL-I, mPL+WSA, and mPL-R+WSA. Left: routed WL, axis from 0.96 to 1.04. Right: #success, axis from 0 to 18.)
• mPL-I with its heuristic penalty term does not perform very well
• Both mPL-R and WSA improve routability significantly
• The combined flow gives the highest completion rate

Outline
Chapter 1. Introduction
Chapter 2. Optimality and scalability study of existing placement algorithms
Chapter 3. Routability-driven multilevel global placement and white space allocation
Chapter 4. A robust legalization scheme for mixed-size placement
Chapter 5. Applications of mixed-size placement legalization
  • Enhancement for macro legalization algorithm
  • Additional experiment results
Chapter 6. “Global” localized preprocessing for detailed placement
Chapter 7. Heterogeneous placement for FPGAs
Chapter 8. Conclusions and future work

Enhancement for Macro Legalization
• Constraint graph reduction
  • Original constraint graph
    • One edge for each pair of macros
    • O(n^2) edges in total
  • Reduced constraint graph
    • Edge inserted only when the constraint is not already implied transitively by existing edges
    • Significant reduction of memory consumption
(Figure: macros A, B, and C, with one candidate edge marked "?")

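The reduction can be illustrated with a transitive-reduction sketch. It assumes the macros are numbered so that every constraint edge (u, v) has u < v (for example, sorted left to right for the horizontal graph); the function name `reduce_constraint_graph` is hypothetical and this is not the thesis implementation.

```python
def reduce_constraint_graph(n, edges):
    """Keep only edges (u, v) that are not implied by a longer path u -> ... -> v.

    n: number of macros, numbered in topological order (assumed u < v on every edge).
    edges: iterable of (u, v) ordering constraints, one per constrained pair.
    """
    succ = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)
    reach = [set() for _ in range(n)]   # all descendants of each node
    kept = []
    # Visit nodes from last to first so every successor's reach set is final.
    for u in range(n - 1, -1, -1):
        for v in sorted(succ[u]):
            if v not in reach[u]:       # not implied by edges kept so far
                kept.append((u, v))
                reach[u].add(v)
                reach[u] |= reach[v]
    return sorted(kept)

# Four macros in a row: 6 pairwise constraints collapse to a chain of 3.
all_pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
print(reduce_constraint_graph(4, all_pairs))  # [(0, 1), (1, 2), (2, 3)]
```

This sketch only shows which edges survive; it stores full reachability sets for clarity, so it does not itself demonstrate the memory saving reported on the next slide.
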
Experiment Result with ICCAD04-MS

circuit  #edge (old)  #edge (new)  WL (old)  WL (new)
ibm01    30184        6600         2.16E+06  2.15E+06
ibm02    36631        7459         4.78E+06  4.64E+06
ibm03    41947        8414         6.47E+06  6.68E+06
ibm04    43400        7819         7.15E+06  7.23E+06
ibm05    -            -            9.32E+06  9.32E+06
ibm06    15799        4141         5.67E+06  5.66E+06
ibm07    42249        8100         9.80E+06  9.64E+06
ibm08    45214        8077         1.13E+07  1.12E+07
ibm09    31940        6330         1.21E+07  1.21E+07
ibm10    308569       29383        2.94E+07  2.90E+07
ibm11    69439        10256        1.76E+07  1.76E+07
ibm12    211641       23926        3.14E+07  3.13E+07
ibm13    89730        12769        2.27E+07  2.24E+07
ibm14    188299       20803        3.58E+07  3.57E+07
ibm15    77081        11549        4.60E+07  4.61E+07
ibm16    104719       14981        5.38E+07  5.35E+07
ibm17    288492       30100        6.34E+07  6.35E+07
ibm18    40517        7346         4.29E+07  4.29E+07
Avg      1            0.16         1.00      1.00

• 84% reduction of constraint edges
• No degradation of solution quality

Enhancement for Macro Legalization
(Equation: mathematical program for macro legalization. Recoverable symbols: a minimization over per-macro displacement terms dx_i, dy_i with weights, plus per-edge terms fx_ij, fy_ij, for i = 1, ..., n; constraints stated over the edges e_ij of the horizontal and vertical constraint graphs G_h and G_v, involving macro widths w, heights h, and the quantities H_ij and V_ij.)
• Used in ISPD 2006 placement contest

ISPD05 Examples

circuit   #cell    #macro  #pad  #net     #row  utilization
adaptec1  210967   63      480   221142   890   76%
adaptec2  254616   159     407   266009   1170  79%
adaptec3  451746   723     0     466758   1944  75%
adaptec4  496141   1329    0     515951   1944  63%
bigblue1  277636   32      528   284479   890   55%
bigblue2  557962   23084   0     577235   1566  62%
bigblue3  1096908  3778    0     1123170  2316  86%
bigblue4  2177449  8170    0     2229886  2694  66%

• Bigger problem size
• Suitable to test scalability

Scalability Comparison on ISPD05 -- Global Placements by APlace

circuit   APlace FWL  APlace RT(s)  XDP FWL   XDP RT(s)
Adaptec1  7.83E+07    1846          7.91E+07  417
Adaptec2  9.57E+07    3616          9.66E+07  406
Adaptec3  2.19E+08    10142         2.22E+08  708
Adaptec4  2.09E+08    13115         2.13E+08  955
Bigblue1  1.00E+08    2767          1.05E+08  506
Bigblue2  1.54E+08    13848         1.56E+08  1846
Bigblue3  4.12E+08    27186         3.84E+08  3133
Bigblue4  8.71E+08    103002        8.84E+08  5732
Avg.      1.00        1.00          1.01      0.12

• XDP produces 1% longer WL, but is 10X faster

Scalability Comparison on ISPD05 -- Global Placements by mPL

circuit   APlace FWL  APlace RT(s)  XDP FWL   XDP RT(s)
Adaptec1  7.84E+07    1665          7.85E+07  430
Adaptec2  9.28E+07    2906          9.28E+07  518
Adaptec3  2.15E+08    8050          2.17E+08  944
Adaptec4  1.94E+08    9784          1.97E+08  836
Bigblue1  9.86E+07    2031          9.74E+07  483
Bigblue2  1.53E+08    11713         1.54E+08  2200
Bigblue3  3.50E+08    31414         3.49E+08  2371
Bigblue4  8.50E+08    60249         8.37E+08  5363
Avg.      1.00        1.00          1.00      0.15

• XDP can be 10X faster with comparable quality

Impact of Gradual Macro Legalization -- ISPD05

circuit   fixed WL  fixed runtime(s)  movable WL  movable runtime(s)
adaptec1  7.79E+07  2895              7.22E+07    3279
adaptec2  9.20E+07  3002              8.81E+07    3500
adaptec3  2.14E+08  9410              1.63E+08    12492
adaptec4  1.94E+08  8844              1.60E+08    12674
bigblue1  9.68E+07  3637              9.91E+07    3704
bigblue2  1.52E+08  10326             1.12E+08    15475
bigblue3  3.44E+08  13565             3.14E+08    19948
bigblue4  8.29E+08  30664             7.60E+08    35766
Avg.      1.00      1.00              0.88        1.28

• 12% WL reduction is possible with movable macros, at 28% longer runtime on average

Outline
Chapter 1. Introduction
Chapter 2. Optimality and scalability study of existing placement algorithms
Chapter 3. Routability-driven multilevel global placement and white space allocation
Chapter 4. A robust legalization scheme for mixed-size placement
Chapter 5. Applications of mixed-size placement legalization
Chapter 6. “Global” localized preprocessing for detailed placement
Chapter 7. Heterogeneous placement for FPGAs
  • Motivation and previous work
  • Multilevel heterogeneous placement -- mPL-H
  • Experiment results
  • Conclusions and future work
Chapter 8. Conclusions and future work

Motivation
• Popularity of FPGAs
  • Ease of use
  • Low cost for small to medium production
• Modern FPGA placement imposes heterogeneous constraints
  • Memory blocks of different capacities, DSP blocks
  • Each block should only be placed on sites of the same type

Example FPGA Chip
(Figure taken from Altera Stratix Handbook)

Previous Work -- Academia
• Simulated annealing
  • VPR [Betz & Rose, FPL97, Marquardt et al, FPGA00]
  • PATH [Kong, ICCAD02]
  • SPCD [Chen & Cong, FPL04, FPGA05]
• Partitioning
  • PPFF [Maidee et al, DAC03]
• Graph embedding
  • CAPRI [Gopalakrishnan et al, DAC06]
• Multilevel
  • Ultrafast-VPR [Sankar & Rose, FPGA99]
  • mPG-ms [Cong & Yuan, ASPDAC03]
• None of them handles heterogeneous constraints

Previous Work -- Industry
• Quartus II by Altera Corporation
  • Stratix, Stratix II, etc.
• ISE by Xilinx Corporation
  • Virtex II, Virtex II Pro, etc.
• Both handle heterogeneous constraints
  • Only for proprietary chip architectures
  • Algorithms and techniques not publicly documented

Multilevel Heterogeneous Placement -- mPL-H
• Based on multilevel generalized force directed placement
• Multi-layered placement to handle heterogeneous placement
• Filler cells to enhance quality and stability
• Gradual carry chain legalization

Limitations of mPL for Heterogeneous Placement
• Does not consider heterogeneous constraints
  • Any block can be placed anywhere
• Requires density to be uniform everywhere
  • Penalizes wirelength for low utilization

mPL-H -- Global Placement (I)
• Multiple layers, one layer for each resource
  • DSP layer
  • M-RAM layer
  • LAB layer
  • M4K layer
  • M512 layer
• Forbidden regions blocked by obstacles
• Uniform wirelength computation
(Figure: stacked DSP, M-RAM, and LAB layers)

mPL-H -- Global Placement (II)
• Filler cell
  • Occupy the residual capacity
  • Transform inequality into equality
• Density computed independently on each layer
• Granularity may not be fine enough
$D_{ij} \le C_{ij} \quad\Longrightarrow\quad D_{ij} + d_{ij} = C_{ij}$

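A small sketch of how filler cells might be sized for one layer is given below. It assumes filler cells of one uniform area and works on whole-layer totals rather than per-bin densities; `filler_plan` is a hypothetical name and this is not the mPL-H code.

```python
def filler_plan(layer_capacity: int, cell_areas: list, filler_area: int):
    """Return (number of filler cells, capacity left uncovered) for one layer.

    The filler cells take up the residual capacity so that the density
    constraint on this layer can be stated as an equality.
    """
    residual = layer_capacity - sum(cell_areas)
    if residual < 0:
        raise ValueError("cells exceed the layer capacity")
    count = residual // filler_area
    leftover = residual - count * filler_area
    return count, leftover

print(filler_plan(100, [30, 25], 10))  # (4, 5)
```

The nonzero leftover in the example shows one way the equality can fail to hold exactly: the residual capacity need not be a multiple of the filler cell size.
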
mPL-H -- Legalization (I)
• DSP and memory blocks
  • Domains do not overlap
    • Legalized independently
  • Uniform size for the same type
    • Linear assignment, O(n^3)
    • Cost as distance
(Figure: bipartite assignment between cells and sites)

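The linear assignment step can be sketched with a textbook O(n^3) Hungarian algorithm. The cost matrix holds the distance from each block to each candidate site of its type; the Manhattan distance in the usage example and the name `linear_assignment` are assumptions, and the sketch requires at least as many sites as blocks.

```python
def linear_assignment(cost):
    """Minimum-cost assignment of rows (blocks) to columns (sites).

    cost[i][j] is the cost of putting block i on site j; needs
    len(cost) <= len(cost[0]). Returns the site index chosen for each block.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (m + 1)          # column potentials
    p = [0] * (m + 1)            # p[j] = row matched to column j (1-based, 0 = free)
    way = [0] * (m + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:           # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment

blocks = [(0, 0), (5, 5)]
sites = [(5, 4), (1, 0), (9, 9)]
cost = [[abs(bx - sx) + abs(by - sy) for sx, sy in sites] for bx, by in blocks]
print(linear_assignment(cost))  # [1, 0]
```

Because the resource domains do not overlap, one such assignment can be solved per resource type, each on its own, smaller cost matrix.
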
mPL-H -- Legalization (II)
• Carry chains
  • Vary in length
  • Legalized in descending order of length
  • Partition each column into segments of the same size
  • Assign chains of the same length using linear assignment

mPL-H -- Legalization (III)
• Column-wise rearrangement of carry chains
  • $P(n,m)$ is the minimum perturbation of assigning $(v_1, \ldots, v_n)$ to sites $(s_1, s_2, \ldots, s_m)$
  • $P(1,j) = d(1,j)$, where $d(1,j)$ is the perturbation of assigning $v_1$ to site $s_j$
  • $P(i,j) = \min\{P(i-1,\, j-h_i),\ P(i,\, j-1)\}$
  • Can be solved more efficiently for some special cases
    • Quadratic distance
    • No site constraint

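A dynamic-programming sketch in the spirit of this recurrence follows. It makes several assumptions that the slide does not state: chain i occupies h_i consecutive sites, the chains keep their order within the column, the perturbation d(i, j) is the absolute shift of the chain's first site, and that cost is added explicitly when chain i is placed to end at site j. The name `min_perturbation` is hypothetical.

```python
def min_perturbation(heights, orig_start, m):
    """Minimum total shift to pack ordered chains into m sites without overlap.

    heights[i]: number of consecutive sites chain i occupies (h_i).
    orig_start[i]: 0-based site where chain i starts before legalization.
    P[i][j]: best cost of placing the first i chains within the first j sites.
    """
    n = len(heights)
    INF = float("inf")
    P = [[INF] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        P[0][j] = 0.0                     # no chains placed: no perturbation
    for i in range(1, n + 1):
        h = heights[i - 1]
        for j in range(1, m + 1):
            best = P[i][j - 1]            # option 1: leave site j unused
            if j >= h and P[i - 1][j - h] < INF:
                start = j - h             # option 2: chain i occupies sites start..j-1
                shift = abs(start - orig_start[i - 1])
                best = min(best, P[i - 1][j - h] + shift)
            P[i][j] = best
    return P[n][m]

# Two length-2 chains that overlap at site 1 in a 5-site column.
print(min_perturbation([2, 2], [0, 1], 5))  # 1.0
```

The table has n*m entries with constant work each, so this sketch runs in O(nm) time per column.
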
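Experiment Setting
(Figure: experiment flow. A Verilog netlist is processed by Quartus_map into a clustered .vqm netlist, which is placed either by Quartus_fitter or by mPL-H; each produces a .qsf placement that is passed to Quartus_router. Other labels in the figure: chip type, and an architecture description in XML.)
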
QUIP Suite

circuit                            LUT    LAB   I/O  Mem bits  DSP
fip_risc8                          1791   219   113  384       0
mux64_16bit                        1188   150   87   0         0
mux8_128bit                        1155   141   140  0         0
oc_cordic_p2r                      1016   111   82   0         0
oc_cordic_r2p                      1424   157   74   0         0
oc_aes_core                        1474   181   388  32768     0
oc_aes_core_inv                    1614   183   389  34176     0
oc_aquarius                        5530   646   35   131072    8
oc_cfft_1024x12                    1601   191   68   24576     0
oc_des_des3area                    1072   120   304  0         0
oc_des_des3perf                    13791  1569  298  2744      0
oc_des_perf_opt                    4566   550   185  0         0
oc_fpu                             6693   793   110  0         8
oc_mem_ctrl                        3222   387   267  0         0
oc_mips                            3294   387   201  1152      0
oc_oc8051                          2742   331   164  4608      0
oc_video_compression_systems_dct   36410  4440  25   18688     0
oc_video_compression_systems_jpeg  32178  3924  47   16640     0
oc_wb_dma                          3087   396   444  0         0
os_blowfish                        1445   168   585  67168     0

Wirelength Comparison

                                         Quartus 5.0      mPL-H
circuit                            #LAB  PWL      RWL     PWL     ratio  RWL     ratio
fip_risc8                          219   6322     15828   5872    0.93   15304   0.97
mux64_16bit                        150   4560     9464    4582    1.00   9728    1.03
mux8_128bit                        141   3608     7328    3541    0.98   6556    0.89
oc_cordic_p2r                      111   2981.5   6852    2786    0.93   6264    0.91
oc_cordic_r2p                      157   4239     8848    3889    0.92   8260    0.93
oc_aes_core                        181   16362    27100   15537   0.95   25944   0.96
oc_aes_core_inv                    183   18639    30288   17962   0.96   30680   1.01
oc_aquarius                        646   32683.5  78616   31703   0.97   78280   1.00
oc_cfft_1024x12                    191   5914.5   12256   5757    0.97   11988   0.98
oc_des_des3area                    120   8273     15588   8139    0.98   15472   0.99
oc_des_des3perf                    1569  62982    116312  59611   0.95   118028  1.01
oc_des_perf_opt                    550   17443    31548   17096   0.98   32476   1.03
oc_fpu                             793   26570.1  64324   24583   0.93   63628   0.99
oc_mem_ctrl                        387   16501.5  38428   16808   1.02   39132   1.02
oc_mips                            387   18550    43916   18639   1.00   43260   0.99
oc_oc8051                          331   13640    31172   13332   0.98   30520   0.98
oc_video_compression_systems_dct   4440  165835   423292  172902  1.04   436708  1.03
oc_video_compression_systems_jpeg  3924  139076   370064  142234  1.02   368528  1.00
oc_wb_dma                          396   25614    57128   24999   0.98   57704   1.01
os_blowfish                        168   27023    43832   22569   0.84   39804   0.91
oc_ethernet                        265   12013    24812   11935   0.99   25128   1.01
Avg.                                                              0.97           0.98

• mPL-H is 3% better in HPWL and 2% better in routed WL than Quartus II v5.0

Runtime Comparison
(Figure: runtime in seconds, axis from 0 to 1800, versus #LAB from 111 to 4440, for Quartus II 5.0 and mPL-H)
• mPL-H can be 2X faster than Quartus II v5.0 when the circuit becomes sufficiently large

Optimality Study of mPL-H
• PEKO-H construction
  • Populate all sites with the corresponding resource type
  • Generate each net with optimal wirelength
  • Extract the netlist in the end

Experiment Results with PEKO-H

circuit    LAB   M4K  M512  DSP  M-RAM  OWL    PWL    runtime(s)
PEKO-H01   974   58   90    2    1      3151   3877   109
PEKO-H02   849   43   75    4    1      1480   2302   206
PEKO-H03   926   50   81    5    1      2042   3122   301
PEKO-H04   1618  66   180   8    2      3895   6711   1227
PEKO-H05   1690  72   186   10   2      4807   7085   363
PEKO-H06   988   58   90    6    1      12099  14699  219
PEKO-H07   957   53   85    6    1      2698   3454   101
PEKO-H08   943   46   87    5    1      2779   3551   116
PEKO-H09   1754  80   188   10   2      36593  42456  578
PEKO-H10   988   58   90    6    1      11288  13126  163
PEKO-H11   988   58   90    6    1      12410  14850  204
PEKO-H12   988   58   90    6    1      7560   8745   184
PEKO-H13   988   58   90    6    1      7894   8992   141
PEKO-H14   986   56   90    6    1      6521   7836   162
PEKO-H15   5550  288  564   18   6      49120  85280  1788
PEKO-H16   5549  288  564   18   6      42123  66923  1656
PEKO-H17   1730  78   188   10   2      7329   8598   380
PEKO-H18   2241  109  210   5    2      4698   6788   1202
PEKO-H19   986   56   90    6    1      5356   6029   154
Avg.                                    1.00   1.34

• mPL-H produces HPWL 34% longer than the optimum on average

Displacement of PEKO-H13
(Figure: cell displacement from the optimal placement)

Displacement of PEKO-H16
(Figure: cell displacement from the optimal placement)
• Swirls are difficult for local refinement to recover

Conclusions
• First analytical work for heterogeneous placement
• Compared to leading-edge Quartus II v5.0 for Stratix
  • 3% shorter HPWL, 2% shorter routed WL
  • Can be 2X faster when the example becomes sufficiently large
• Optimality study with PEKO-H
  • Displacement observed from the optima
  • 34% longer HPWL than the optima

Future Work
• Accurate timing analysis
  • Only point-to-point delay table released
    • OK for overlap-free intermediate results
    • Not accurate enough for analytical placer
  • Guide timing-driven placement
• Routing congestion
  • Proprietary routing resource information not publicly available

The End
Thank You!