Multi-objective Placement Optimization for High-performance Nanoscale Integrated Circuits
Igor L. Markov
August 20, 2012
A Traditional VLSI Design Flow
System Specification
Functional Design (HDL)
Logic Synthesis and Technology Mapping
Physical Design
Fabrication
Packaging and Testing
Chip
module foo(a, b, c, o1, o2);
  input a, b, c;
  output o1, o2;
  reg o2;
  assign o1 = a & b;
  always @(a, b, c)
    o2 = a | c;
RTL…
[Figure: Physical Design steps: Partitioning, Chip Floorplanning, Placement, Clock Network Synthesis, Global Routing, Detailed Routing; a loop connects Placement, Global Routing, a "Timing Closure?" check, and Timing Opt. Transforms]
Global Placement: Motivation
■ Interconnect lagging in performance while transistors continue scaling
− Circuit delay, power dissipation and area dominated by interconnect
− Routing quality highly controlled by placement
■ Placement remains one of the most influential optimizations
− Attracted attention from both industry and academia
• ISPD wirelength-driven placement contests [2005][2006]
• ISPD and DAC routability-driven placement contests [2011][2012]
− A consistent WL improvement > 2% in placement is considered significant
[Figure labels: Unloaded, Coupling, IR drop, RC delay]
Prior Work in Interconnect-driven Analytical Placement
■ Ideal Placer
− Fast runtime without sacrificing solution quality
− Simplicity and easy integration with other optimizations
[Figure: Speed vs. Solution Quality]
Quadratic and force-directed: mFAR, Kraftwerk2, FastPlace3, RQL
Non-convex optimization: mPL6, APlace2, NTUPlace3
Ideal placer
The SimPL Family of Placement Algorithms
A common mathematical foundation: SimPL (ICCAD'10, TCAD'12), ComPLx (DAC'12)
Implemented by four groups
Extensions: SimPLR (ICCAD'11), Ripple (ICCAD'11), MAPLE (ISPD'12), Lopper (ISPD'11), PADÉ (DAC'12), SAPT (ISPD'12), NCTU @ ASPDAC'12
Five types of extensions:
− Routability
− Clock-tree codesign and power optimization
− Multilevel optimization
− Datapath awareness
− Thermal awareness
ComPLx: A Competitive Primal-Dual Lagrange Optimization [DAC 2012]
■ Analysis and comparisons of placement algorithms have been mostly empirical, with little formal justification
− Generalizes SimPL to handle arbitrary interconnect models
− Illustrates how to add new constraints
− Extends to macro and timing-driven placement
■ A projected subgradient primal-dual Lagrange optimization for global placement
− Decomposes the original non-convex problem into “more convex” sub-problems
− Lends mathematical substantiation to placement algorithms derived from SimPL
Revised Formulation of Placement
■ Given a netlist N and net weights $w_{i,j}$
■ Objective: weighted Half-Perimeter Wirelength (wHPWL)
■ Sample intermediate objective: quadratic approximation of HPWL
■ Constraints in placement
− Resource-type constraints: legality, target utilization, routability, etc.
− Other constraints: region, alignment, power density, thermal
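The two objectives named above can be illustrated with a short Python sketch. The data layout (a net as a `(weight, pin_list)` pair) and the clique model used for the quadratic objective are assumptions made for this example; the slides do not specify which net model the quadratic approximation uses.

```python
def hpwl(nets, x, y):
    """Weighted HPWL: sum over nets of weight * (bounding-box width + height)."""
    total = 0.0
    for weight, pins in nets:
        xs = [x[p] for p in pins]
        ys = [y[p] for p in pins]
        total += weight * ((max(xs) - min(xs)) + (max(ys) - min(ys)))
    return total


def quadratic_wirelength(nets, x, y):
    """Quadratic wirelength under an assumed clique model:
    weight * squared Euclidean distance, summed over all pin pairs of a net."""
    total = 0.0
    for weight, pins in nets:
        for a in range(len(pins)):
            for b in range(a + 1, len(pins)):
                i, j = pins[a], pins[b]
                total += weight * ((x[i] - x[j]) ** 2 + (y[i] - y[j]) ** 2)
    return total


# Three cells; one 2-pin net of weight 1 and one 3-pin net of weight 2.
x = [0.0, 3.0, 1.0]
y = [0.0, 4.0, 1.0]
nets = [(1.0, [0, 1]), (2.0, [0, 1, 2])]
print(hpwl(nets, x, y), quadratic_wirelength(nets, x, y))
```

HPWL is piecewise linear and not differentiable everywhere, which is why a smooth intermediate objective such as the quadratic one is convenient for the optimization step.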
ComPLx: Overall Flow
Placement Instance → Initial HPWL Optimization → Global Placement Iterations → Legalization → Detailed Placement
Global Placement Iterations:
− Feasibility Projection (Upper-bound): can consider legality, routability, region, resource-type constraints
− Lagrangian Multiplier Update
− Interconnect Optimization (Lower-bound): unconstrained optimization; can be modified to consider timing/power-criticality
− Converge? no: repeat the iteration; yes: proceed to Legalization
Review: Lagrangian Relaxation
■Given: optimization problem with constraints
a) Convert constraints to penalties
b) Add penalties to original objective
– New variable for each penalty: Lagrange multiplier λ
c) Solving an unconstrained problem solves the original problem
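Steps a) through c) can be traced on a one-variable toy problem: minimize x² subject to x ≥ 1. The sketch below is only an illustration of the general technique; the problem, the step size, and the function name are assumptions and have nothing placement-specific in them.

```python
def lagrangian_relaxation_demo(steps=200, step_size=0.5):
    """Minimize f(x) = x**2 subject to x >= 1.

    a) The constraint becomes the penalty (1 - x), positive when violated.
    b) The Lagrangian is L(x, lam) = x**2 + lam * (1 - x).
    c) For fixed lam, the unconstrained minimizer solves dL/dx = 2x - lam = 0.
    """
    lam = 0.0
    x = 0.0
    for _ in range(steps):
        x = lam / 2.0                                   # unconstrained minimization in x
        lam = max(0.0, lam + step_size * (1.0 - x))     # raise lam while the constraint is violated
    return x, lam


x_opt, lam_opt = lagrangian_relaxation_demo()
print(x_opt, lam_opt)
```

The multiplier settles at the value for which the unconstrained minimizer is exactly feasible (here x = 1 with λ = 2).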
Review: Projected Subgradient Methods
■ Solve constrained optimization problem: minimize $f(x)$, where $f : \mathbb{R}^n \to \mathbb{R}$, subject to $x \in C$, where $C \subseteq \mathbb{R}^n$
■ Projected subgradient method iterates $x^{(k+1)} = P(x^{(k)} - \alpha_k g^{(k)})$, where $P$ is a projection onto $C$ (feasible solutions), $g^{(k)}$ is a subgradient of $f$ at $x^{(k)}$, and $\alpha_k$ is the step size
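A minimal sketch of the iteration, assuming NumPy, a diminishing step size 1/(k+1), an L1 objective, and a box-shaped feasible set; all of these are choices made for the example rather than anything stated on the slide.

```python
import numpy as np


def projected_subgradient(subgradient, project, x0, iterations=50):
    """Iterate x <- P(x - a_k * g_k) with diminishing steps a_k = 1 / (k + 1)."""
    x = project(np.asarray(x0, dtype=float))
    for k in range(iterations):
        step = 1.0 / (k + 1)
        x = project(x - step * subgradient(x))
    return x


# Toy instance: minimize ||x - target||_1 over the box C = [0, 2] x [0, 2].
target = np.array([3.0, -1.0])
subgradient = lambda x: np.sign(x - target)   # a subgradient of the L1 objective
project = lambda x: np.clip(x, 0.0, 2.0)      # Euclidean projection onto the box
x_final = projected_subgradient(subgradient, project, x0=[1.0, 1.0])
print(x_final)
```

On harder instances the objective need not decrease at every step, so implementations usually keep the best iterate seen so far; the toy instance here does not need that.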
Converting Constraints to Penalties
■ Challenge: Working with supply-demand inequalities directly is difficult because they are specified algorithmically, not as closed-form expressions in (x, y)
■ Our solution: Work with subgradients, pointing to a closest C-feasible solution, found by a feasibility projection
We approximate the penalty term by the L1-distance from (x, y) to a closest C-feasible solution (when Φ represents HPWL, λ remains dimensionless)
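Written out, the approximation described above takes roughly the following form. The symbols $\pi$ for the penalty, $P_C$ for the feasibility projection, and $(x^\circ, y^\circ)$ for the projected locations are notation assumed here for illustration; the formulas on the original slide were not preserved.

```latex
\[
\mathcal{L}(x, y, \lambda) = \Phi(x, y) + \lambda\,\pi(x, y),
\qquad
\pi(x, y) \approx \bigl\lVert (x, y) - (x^\circ, y^\circ) \bigr\rVert_1
= \sum_i \bigl( |x_i - x_i^\circ| + |y_i - y_i^\circ| \bigr),
\qquad
(x^\circ, y^\circ) = P_C(x, y)
\]
```

Both HPWL and the L1-distance are measured in units of length, which is why λ carries no units when Φ is HPWL.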
Feasibility Projection in ComPLx
■ Purpose: Approximating the penalty term allows one to replace the nonconvex Lagrangian by a convex one
■ We define the feasibility projection to find a closest C-feasible approximation (pseudo-legalization)
a. The result of the feasibility projection is C-feasible
b. The feasibility projection is Lipschitz continuous
c. The cost of the projected placement must generally decrease, providing upper bounds on final placement cost
ComPLx: Feasibility Projection on adaptec1
Cells are spread over the region’s expanse, avoiding obstacles while minimizing displacement for a given solution
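To make the idea of spreading with small displacement concrete, here is a deliberately simple one-dimensional sketch. It is not the projection used in ComPLx: it handles a single row, ignores obstacles, and uses a greedy two-sweep heuristic that preserves cell order without guaranteeing minimum displacement. The function name and arguments are hypothetical.

```python
def spread_1d(xs, widths, row_start, row_end):
    """Order-preserving 1-D spreading inside [row_start, row_end].

    xs are left edges; assumes the total cell width fits in the row.
    Returns new left edges with no overlaps.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = list(xs)
    # Left-to-right sweep: push each cell right just enough to clear its predecessor.
    cursor = row_start
    for i in order:
        out[i] = max(out[i], cursor)
        cursor = out[i] + widths[i]
    # Right-to-left sweep: pull cells back so that none extends past row_end.
    cursor = row_end
    for i in reversed(order):
        out[i] = min(out[i], cursor - widths[i])
        cursor = out[i]
    return out


spread = spread_1d([1.0, 1.2, 1.1, 9.5], [1.0, 1.0, 1.0, 1.0], 0.0, 10.0)
print(spread)
```

The three overlapping cells near x = 1 are separated while keeping their relative order, and the cell that stuck out past the row boundary is pulled back inside.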
ComPLx: Primal-dual Lagrangian Relaxation
■ Alternates minimization over the primal variables (x, y) with maximization over the dual variable λ
■ The minimum of Φ subject to C-feasibility can be found by sequential unconstrained optimization of the Lagrangian with increasing λ
− Subsequent iterations increase the sensitivity of the Lagrangian to the penalty
− The minimization of the Lagrangian affects the penalty more, and the penalty decreases while Φ increases over iterations
ComPLx: Progression of Key Quantities
Unconstrained Optimization in ComPLx
■ The minimization of the Lagrangian
− After finding C-feasible anchor locations
− The simplified Lagrangian can be minimized with respect to (x, y), for fixed λ and anchor locations, by solving a system of linear equations when Φ is quadratic
■ The ComPLx framework re-solves the feasibility projection and the Lagrangian minimization until convergence
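The linear system mentioned above can be sketched in one dimension with NumPy. This is an illustration under stated assumptions, not the authors' implementation: nets are two-pin, the L1 penalty toward each anchor is linearized as a quadratic pseudonet with weight λ / |x_prev − anchor| (a common linearization in quadratic placement), and the function name and argument layout are hypothetical.

```python
import numpy as np


def solve_anchored_quadratic(n, nets, pads, anchors, x_prev, lam, eps=1e-3):
    """Minimize sum w*(x_i - x_j)^2 + sum w*(x_i - pad)^2 + sum c_i*(x_i - a_i)^2,
    where c_i = lam / max(|x_prev_i - a_i|, eps) linearizes the L1 penalty (assumption)."""
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i, j, w in nets:            # connections between movable cells
        A[i, i] += w
        A[j, j] += w
        A[i, j] -= w
        A[j, i] -= w
    for i, pad_x, w in pads:        # connections to fixed pads
        A[i, i] += w
        b[i] += w * pad_x
    for i in range(n):              # pseudonets pulling cells toward their anchors
        c = lam / max(abs(x_prev[i] - anchors[i]), eps)
        A[i, i] += c
        b[i] += c * anchors[i]
    return np.linalg.solve(A, b)    # setting the gradient to zero gives A x = b


nets = [(0, 1, 1.0)]
pads = [(0, 0.0, 1.0), (1, 10.0, 1.0)]
anchors = [2.0, 8.0]
x_free = solve_anchored_quadratic(2, nets, pads, anchors, x_prev=[5.0, 5.0], lam=0.0)
x_pulled = solve_anchored_quadratic(2, nets, pads, anchors, x_prev=x_free, lam=1e6)
print(x_free, x_pulled)
```

With λ = 0 the cells sit at the pure wirelength optimum; with a large λ they move onto the anchors. Intermediate values of λ trade one off against the other, and adding the pseudonets changes only the diagonal of the matrix and the right-hand side.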
Convergence of ComPLx
■ Controlling Lagrange multipliers [formulas not preserved]
■ Convergence criteria
− L1-distance to feasibility projection stops decreasing
− Duality gap becomes small enough
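The two convergence criteria can be combined in a small check such as the one below. The relative-gap definition, the tolerance, and the `patience` window are assumptions chosen for the example; the slides do not give thresholds.

```python
def converged(l1_history, lower_bound, upper_bound, gap_tol=0.01, patience=3):
    """Return True when either stopping criterion holds.

    l1_history: L1-distance to the feasibility projection, one entry per iteration.
    lower_bound / upper_bound: current interconnect cost bounds.
    """
    # Criterion 1: the relative duality gap is small enough.
    gap = (upper_bound - lower_bound) / upper_bound if upper_bound > 0 else 0.0
    if gap <= gap_tol:
        return True
    # Criterion 2: the L1-distance has not improved during the last `patience` iterations.
    if len(l1_history) > patience:
        recent_best = min(l1_history[-patience:])
        earlier_best = min(l1_history[:-patience])
        return recent_best >= earlier_best
    return False
```

For example, `converged([10, 8, 6, 5], 50, 100)` is false because the gap is 50% and the distance is still shrinking, while `converged([10, 5, 5, 6, 7], 50, 100)` is true because the distance has stalled.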
ComPLx: Closing Gap between Two Bounds
■ Feasibility projection provides upper-bounds
■ Legal solution is formed between two bounds
Upper-bounds found by feasibility projection
Lower-bounds found by minimization of Lagrangian
ComPLx Iterations on adaptec1 (1)
[Figure panels: Iteration=0 (Init WL Opt.), Iteration=1 (Upper Bound), Iteration=2 (Lower Bound), Iteration=3 (Upper Bound); fixed macros labeled]
ComPLx Iterations on adaptec1 (2)
[Figure panels: Iteration=10 (Lower Bound), Iteration=11 (Upper Bound), Iteration=20 (Lower Bound), Iteration=21 (Upper Bound); fixed macros labeled]
ComPLx Iterations on adaptec1 (3)
[Figure panels: Iteration=30 (Lower Bound), Iteration=31 (Upper Bound), Iteration=40 (Lower Bound), Iteration=41 (Upper Bound); fixed macros labeled]
Motivation for Macro Placement
■ The traditional “sea-of-gates” IC design style is being replaced by “sea-of-hard-macros” design style
− Reuse predesigned IP modules / macros
− Reduce the design cost, deal with increasing complexity
− Previously performed in floorplanning at designers’ discretion
■ The boundary between placement and floorplanning is increasingly blurred
[Figure courtesy of EE Times]
ComPLx: Macro Placement by Macro Shredding (1)
■ Observation
− Feasibility projection largely preserves the relative placement
− Arrays of cells are transformed into similar shapes
■ Revised “macro shredding”: a one-stage approach for simultaneous standard-cell and macro placement
− Macro cells are divided into equal-sized cells (shreds) only for the feasibility projection
− $P_C$ on macros is calculated by averaging the $P_C$ locations of shreds
− Linear systems remain unchanged (limiting complexity increases)
ComPLx: Macro Placement by Macro Shredding (2)
[Figure: $P_C$ is applied to shreds; $P_C$ on macros is found by averaging the $P_C$ locations of shreds; minimization of Lagrangian]
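The shredding and averaging steps can be sketched as follows. The function names, the square shred size, and the use of shred centers are assumptions made for this illustration; in the real flow the shreds would be moved by the feasibility projection between the two calls.

```python
def shred_macro(x, y, width, height, shred_size):
    """Split a macro with lower-left corner (x, y) into a grid of equal-sized shreds.
    Returns the shred centers."""
    nx = max(1, round(width / shred_size))
    ny = max(1, round(height / shred_size))
    sw, sh = width / nx, height / ny
    return [(x + (i + 0.5) * sw, y + (j + 0.5) * sh)
            for i in range(nx) for j in range(ny)]


def macro_from_shreds(shred_centers, width, height):
    """Place the macro so that its center is the average of its projected shred centers.
    Returns the macro's lower-left corner."""
    cx = sum(p[0] for p in shred_centers) / len(shred_centers)
    cy = sum(p[1] for p in shred_centers) / len(shred_centers)
    return cx - width / 2.0, cy - height / 2.0


shreds = shred_macro(0.0, 0.0, 4.0, 2.0, shred_size=1.0)
# Stand-in for the feasibility projection: shift every shred by (3, 1).
moved = [(px + 3.0, py + 1.0) for px, py in shreds]
print(len(shreds), macro_from_shreds(moved, 4.0, 2.0))
```

Because the shreds exist only during the projection, the macro remains a single variable in the linear systems, which matches the note above that the linear systems remain unchanged.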
ComPLx: Experiments on ISPD 2005 benchmarks
■ 10% faster than FastPlace, 2.8X and 7.2X faster than NTUPlace3 and mPL6, >2.3X faster than RQL
Best published: best published HPWL excluding MAPLE (as of Aug 2012). ComPLx: ComPLx + FastPlace_DP (~50:50), single thread.

Benchmark   Size (# of modules)   Best published HPWL   ComPLx HPWL   ComPLx runtime (min.)
ADAPTEC1    211K                  77.82 (RQL)           77.73         3.09
ADAPTEC2    255K                  88.51 (RQL)           88.84         4.31
ADAPTEC3    452K                  207.67 (SimPL)        203.55        10.75
ADAPTEC4    496K                  186.80 (SimPL)        183.16        9.57
BIGBLUE1    278K                  94.98 (RQL)           94.41         7.00
BIGBLUE2    558K                  145.47 (SimPL)        145.39        8.53
BIGBLUE3    1.10M                 323.09 (RQL)          330.74        24.80
BIGBLUE4    2.18M                 797.66 (RQL)          788.30        41.89
GEOMEAN                           1.00                  1.00
ComPLx: Experiments on ISPD 2006 benchmarks
■ Scaled HPWL = HPWL * (1 + density_overflow_penalty)
■ Demonstrates fast convergence and strong spreading quality
All values are scaled HPWL.

Benchmark (target utilization)   NTUP3     mPL6      RQL       ComPLx
ADAPTEC5 (0.5)                   451.22    431.27    443.28    415.13
NEWBLUE1 (0.8)                   62.65     68.08     64.43     64.75
NEWBLUE2 (0.9)                   205.45    201.85    199.60    193.39
NEWBLUE3 (0.8)                   277.87    284.11    269.33    273.42
NEWBLUE4 (0.5)                   306.56    300.58    308.75    292.82
NEWBLUE5 (0.5)                   509.71    537.14    537.49    507.85
NEWBLUE6 (0.8)                   520.31    522.54    515.69    501.97
NEWBLUE7 (0.8)                   1109.6    1084.4    1057.8    1041.4
GEOMEAN                          1.04      1.04      1.03      1.00
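The scaled-HPWL formula and the GEOMEAN row can be reproduced with a few lines of Python. Two assumptions are made here: the density overflow penalty is expressed as a fraction (0.05 for 5%), and the GEOMEAN row is the geometric mean of per-benchmark ratios normalized to ComPLx. The NTUP3 and ComPLx columns are copied from the table above.

```python
import math


def scaled_hpwl(hpwl, density_overflow_penalty):
    """Scaled HPWL = HPWL * (1 + density_overflow_penalty); penalty given as a fraction."""
    return hpwl * (1.0 + density_overflow_penalty)


def geomean_ratio(values, reference):
    """Geometric mean of the per-benchmark ratios values[i] / reference[i]."""
    logs = [math.log(v / r) for v, r in zip(values, reference)]
    return math.exp(sum(logs) / len(logs))


ntup3 = [451.22, 62.65, 205.45, 277.87, 306.56, 509.71, 520.31, 1109.6]
complx = [415.13, 64.75, 193.39, 273.42, 292.82, 507.85, 501.97, 1041.4]
ratio = geomean_ratio(ntup3, complx)
print(scaled_hpwl(100.0, 0.05), ratio)
```

A geometric mean of ratios keeps the largest benchmarks from dominating the summary, which an arithmetic mean of raw wirelengths would not.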
Timing-driven Placement
■ Extensions for timing- and power-driven placement traditionally rely on net weights
− Weigh the nets with high activity factors / timing criticality
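As a hypothetical illustration of net weighting, the sketch below raises a net's weight with its timing criticality and switching activity. The linear formula, the gain parameters, and the normalization of both inputs to [0, 1] are assumptions for this example; the slides do not give a weighting formula.

```python
def net_weight(base, criticality, activity, timing_gain=2.0, power_gain=1.0):
    """Hypothetical net weight: the base weight grows with timing criticality
    and activity factor, both assumed to be normalized to [0, 1]."""
    return base * (1.0 + timing_gain * criticality + power_gain * activity)


# A non-critical, quiet net keeps its base weight; a fully critical net is tripled.
print(net_weight(1.0, 0.0, 0.0), net_weight(1.0, 1.0, 0.0))
```

Heavier nets contribute more to the interconnect objective, so the optimizer shortens them first, without any change to the rest of the flow.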
Extending ComPLx to Routability-driven Placement
ComPLx SimPLR
ComPLx: Baseline Wirelength-driven Placement
Placement Instance → Initial HPWL Optimization → Global Placement Iterations → Legalization → Detailed Placement
Global Placement Iterations:
− Feasibility Projection (Upper-bound): considers legality, target utilization
− Lagrangian Multiplier Update
− Interconnect Optimization (Lower-bound): unconstrained optimization
− Converge? no: repeat the iteration; yes: proceed to Legalization
ComPLx: Routability-driven Placement
Placement Instance → Initial HPWL Optimization → Global Placement Iterations → Legalization → Congestion-aware Detailed Placement
Global Placement Iterations:
− Feasibility Projection (Upper-bound): considers legality, target utilization, routability
− Congestion Estimation on Upper-bound
− Lagrangian Multiplier Update
− Interconnect Optimization (Lower-bound)
− Converge? no: repeat the iteration; yes: proceed to Legalization
Enables early routability prediction; the placer can respond early and often
[Figures: SimPLR Illustration (1) through (6)]
Congestion-aware Detailed Placement: Illustration
[Figure panels: After Global Placement, Congestion-unaware DP, Congestion-aware DP]
Congestion Map Improvement due to SimPLR
[Figure panels: Best in ISPD 2011 Contest, SimPLR]
SAPT Illustration with Alignment Constraints
[Figure panels: Manual Placement, Automated Placement]
Skewed Net Weighting + Anchor Alignment
Conclusions
A common mathematical foundation: SimPL (ICCAD'10, TCAD'12), ComPLx (DAC'12)
Extensions: SimPLR (ICCAD'11), Ripple (ICCAD'11), MAPLE (ISPD'12), Lopper (ISPD'11), PADÉ (DAC'12), SAPT (ISPD'12), NCTU @ ASPDAC'12
− Routability
− Clock-tree codesign and power optimization
− Multilevel optimization
− Datapath awareness
− Thermal awareness
As of Aug 2012, MAPLE is used at IBM as the default option for all ASIC and CPU designs
Thank you!
Relevant Publications
M.-C. Kim, D.-J. Lee and I. L. Markov, “SimPL: An Effective Placement Algorithm,” ICCAD 2010, pp. 649-656.
M.-C. Kim*, J. Hu*, D.-J. Lee, and I. L. Markov, “A SimPLR method for Routability-driven Placement,” ICCAD 2011, pp. 67-73.
M.-C. Kim, D.-J. Lee and I. L. Markov, “SimPL: An Effective Placement Algorithm,” IEEE TCAD 31(1): pp. 50-60, 2012.
M.-C. Kim, N. Viswanathan, C. J. Alpert, I. L. Markov and S. Ramji, “MAPLE: Multilevel Adaptive PLacEment for Mixed-Size Designs,” ISPD 2012, pp. 193-200.
M.-C. Kim and I. L. Markov, “ComPLx: A Competitive Primal-dual Lagrange Optimization for Global Placement,” Design Automation Conference (DAC) 2012.
S. Ward, M.-C. Kim, N. Viswanathan, Z. Li, C. J. Alpert, E. Swartzlander, D. Z. Pan, “Keep it Straight: Teaching Placement how to Better Handle Designs with Datapaths,” ISPD 2012, pp. 79-86.
SimPLR Empirical Results vs. SimPL and ISPD`11 Contest
■Overflow is reported by running a full-fledged global router
■ Versus HPWL-driven placement:
− Average of 3.81x better overflow (7 of 8 best) at the cost of 4% routed wirelength
■ Versus other routability-driven placers in the ISPD`11 Contest:
− Average of 2.04x better overflow (8 of 8 best) with 1% better routed wirelength
SimPLR Empirical Results: Ca-DP
■ Versus HPWL-driven detailed placement:
− Average of 18% better overflow (7 of 8 best) at the cost of 1% in routed wirelength
SAPT Experimental Results: Hybrids
■ Hybrid designs integrate datapaths into a larger netlist
− A mixture of datapaths and random logic standard cells
− We used industrial hybrid designs at IBM
■ We report a 5.8% improvement in total StWL compared to SimPL