Multi-objective Placement Optimization for High-performance Nanoscale Integrated Circuits

Igor L. Markov

August 20, 2012

A Traditional VLSI Design Flow

Flow: System Specification → Functional Design (HDL) → Logic Synthesis and Technology Mapping → Physical Design → Fabrication → Packaging and Testing → Chip

RTL example (excerpt):

module foo(a, b, c, o1, o2);
  input a, b, c;
  output o1, o2;
  reg o2;
  assign o1 = a & b;
  always @(a, b, c)
    o2 = a | c;
…

Physical Design steps (figure): Partitioning, Chip Floorplanning, Placement, Global Routing, Clock Network Synthesis, Detailed Routing

Timing loop (figure): Placement, Global Routing, Timing Closure?, Timing Opt. Transforms

Global Placement: Motivation

■ Interconnect is lagging in performance while transistors continue scaling

− Circuit delay, power dissipation and area are dominated by interconnect

− Routing quality depends strongly on placement

■ Placement remains one of the most influential optimizations

− Attracted attention from both industry and academia
• ISPD wirelength-driven placement contests [2005][2006]
• ISPD and DAC routability-driven placement contests [2011][2012]

− A consistent WL improvement > 2% in placement is considered significant

(Figure labels: Unloaded, Coupling, IR drop, RC delay)

Prior Work in Interconnect-driven Analytical Placement

■ Ideal Placer

− Fast runtime without sacrificing solution quality

− Simplicity and easy integration with other optimizations

(Figure: speed vs. solution quality)

Quadratic and force-directed: mFAR, Kraftwerk2, FastPlace3, RQL

Non-convex optimization: mPL6, APlace2, NTUPlace3

Ideal placer

The SimPL Family of Placement Algorithms

SimPL (ICCAD`10, TCAD`12) and ComPLx (DAC`12): a common mathematical foundation

Implemented by four groups

Extensions: SimPLR (ICCAD`11), Ripple (ICCAD`11), MAPLE (ISPD`12), Lopper (ISPD`11), PADÉ (DAC`12), SAPT (ISPD`12), NCTU (ASPDAC`12)

Five types of extensions: routability; clock-tree codesign and power optimization; multilevel optimization; datapath awareness; thermal awareness

ComPLx: A Competitive Primal-Dual Lagrange Optimization [DAC 2012]

■ Analysis and comparisons of placement algorithms have been mostly empirical, with little formal justification

■ A projected subgradient primal-dual Lagrange optimization for global placement

− Decomposes the original non-convex problem into “more convex” sub-problems

− Lends mathematical substantiation for placement algorithms derived from SimPL

− Generalizes SimPL to handle arbitrary interconnect models

− Illustrates how to add new constraints

− Extends to macro and timing-driven placement

Revised Formulation of Placement

■ Given a netlist N and net weights $w_{i,j}$

■ Objective: weighted Half-Perimeter Wirelength (wHPWL)

■ Sample intermediate objective: quadratic approximation of HPWL

■ Constraints in placement

− Resource-type constraints: legality, target utilization, routability, etc.

− Other constraints: region, alignment, power density, thermal
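
To make the two objectives concrete, here is a minimal Python sketch, assuming per-net weights and a clique model for the quadratic term. Both are illustrative assumptions: the slide's own formulas did not survive extraction, and SimPL-family placers may use a different net model. The helper names `hpwl`, `weighted_hpwl` and `quadratic_wl` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def hpwl(net: List[str], pos: Dict[str, Point]) -> float:
    """Half-perimeter of the bounding box enclosing all pins of one net."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def weighted_hpwl(nets: List[List[str]], weights: List[float],
                  pos: Dict[str, Point]) -> float:
    """Weighted sum of per-net HPWL (per-net weights are an assumption)."""
    return sum(w * hpwl(net, pos) for net, w in zip(nets, weights))

def quadratic_wl(nets: List[List[str]], weights: List[float],
                 pos: Dict[str, Point]) -> float:
    """Clique-model quadratic wirelength: squared distance over all pin pairs."""
    total = 0.0
    for net, w in zip(nets, weights):
        for i in range(len(net)):
            for j in range(i + 1, len(net)):
                xi, yi = pos[net[i]]
                xj, yj = pos[net[j]]
                total += w * ((xi - xj) ** 2 + (yi - yj) ** 2)
    return total

# Tiny example: three cells, one 2-pin net and one 3-pin net.
pos = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 1.0)}
nets = [["a", "b"], ["a", "b", "c"]]
weights = [1.0, 2.0]
print(weighted_hpwl(nets, weights, pos))  # 21.0
print(quadratic_wl(nets, weights, pos))   # 105.0
```

The quadratic form is smooth and can be minimized by solving linear systems, which is why it serves as an intermediate objective, while HPWL remains the figure of merit.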

ComPLx: Overall Flow

Placement Instance → Initial HPWL Optimization → Global Placement Iterations → Legalization → Detailed Placement

Global Placement Iterations (repeated until convergence):

− Feasibility Projection (upper bound). Can consider: legality, routability, region, resource-type constraints

− Lagrangian Multiplier Update

− Interconnect Optimization (lower bound). Unconstrained optimization; can be modified to consider timing/power criticality

Review: Lagrangian Relaxation

■ Given: optimization problem with constraints

a) Convert constraints to penalties

b) Add penalties to the original objective

− New variable for each penalty: Lagrange multiplier λ

c) Solving the resulting unconstrained problem, for a suitable λ, solves the original problem
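
As a worked illustration of steps a) to c), consider a one-variable toy problem (chosen here for illustration; it is not from the slides). The constraint $x \ge 1$ becomes the penalty $\lambda(1 - x)$, and maximizing the resulting dual function over $\lambda$ recovers the constrained optimum.

```latex
\begin{align*}
&\text{minimize } f(x) = x^2 \quad \text{subject to } 1 - x \le 0 \\
&L(x, \lambda) = x^2 + \lambda (1 - x), \qquad \lambda \ge 0 \\
&\frac{\partial L}{\partial x} = 2x - \lambda = 0 \;\Rightarrow\; x = \lambda / 2 \\
&q(\lambda) = \min_x L(x, \lambda) = \lambda - \frac{\lambda^2}{4} \\
&\frac{dq}{d\lambda} = 1 - \frac{\lambda}{2} = 0 \;\Rightarrow\; \lambda^* = 2,\quad x^* = 1,\quad q(\lambda^*) = f(x^*) = 1
\end{align*}
```

For this convex example the dual optimum equals the primal optimum; for non-convex problems such as placement, the dual value is in general only a lower bound.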

Review: Projected Subgradient Methods

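■ Solve the constrained optimization problem: minimize $f(x)$, where $f : \mathbb{R}^n \to \mathbb{R}$, subject to $x \in C$, where $C \subseteq \mathbb{R}^n$

■ The projected subgradient method iterates $x^{(k+1)} = P\left(x^{(k)} - a_k g^{(k)}\right)$, where $P$ is a projection onto $C$ (feasible solutions), $a_k$ is the step size, and $g^{(k)}$ is a subgradient of $f$ at $x^{(k)}$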
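
The iteration above can be sketched in a few lines of Python. This is a generic illustration on a toy problem, not placement code: the objective $|x_0 - 3| + |x_1 + 1|$, the box $[0, 2]^2$ and the step sizes $a_k = 1/(k+1)$ are all assumptions chosen so the result is easy to check by hand.

```python
from typing import Callable, List

Vector = List[float]

def projected_subgradient(x0: Vector,
                          subgradient: Callable[[Vector], Vector],
                          project: Callable[[Vector], Vector],
                          step_size: Callable[[int], float],
                          iterations: int) -> Vector:
    """Iterate x <- P(x - a_k * g), where g is a subgradient of f at x."""
    x = list(x0)
    for k in range(iterations):
        g = subgradient(x)
        a_k = step_size(k)
        x = project([xi - a_k * gi for xi, gi in zip(x, g)])
    return x

def sign(v: float) -> float:
    return float((v > 0) - (v < 0))

def subgrad_f(x: Vector) -> Vector:
    """A subgradient of f(x) = |x0 - 3| + |x1 + 1|."""
    return [sign(x[0] - 3.0), sign(x[1] + 1.0)]

def clip_to_box(x: Vector) -> Vector:
    """Euclidean projection onto the box C = [0, 2] x [0, 2]."""
    return [min(max(v, 0.0), 2.0) for v in x]

x_final = projected_subgradient([0.0, 0.0], subgrad_f, clip_to_box,
                                lambda k: 1.0 / (k + 1), 20)
print(x_final)  # [2.0, 0.0]
```

The unconstrained minimizer (3, -1) lies outside the box, so the iterates settle on the closest feasible corner (2, 0), where f = 2.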

Converting Constraints to Penalties

■ Challenge: Working with supply-demand inequalities directly is difficult because they are specified algorithmically, not as closed-form expressions in (x, y)

■ Our solution: Work with subgradients, pointing to a closest C-feasible solution, found by a feasibility projection

We approximate the penalty term by the L1-distance from (x, y) to a closest C-feasible solution

(when Φ represents HPWL, λ remains dimensionless)
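
One way to write this approximation out, as a sketch consistent with the description above rather than a transcription of the lost slide formula: let $P_C$ denote the feasibility projection and $(x^\circ, y^\circ)$ the projected (anchor) locations. The $\circ$ notation is an assumption introduced here.

```latex
\begin{align*}
(x^\circ, y^\circ) &= P_C(x, y) \\
L(x, y, \lambda) &= \Phi(x, y) + \lambda \,\bigl\| (x, y) - (x^\circ, y^\circ) \bigr\|_1 \\
&= \Phi(x, y) + \lambda \sum_i \bigl( |x_i - x_i^\circ| + |y_i - y_i^\circ| \bigr)
\end{align*}
```

Both Φ (when it is HPWL) and the L1-distance are measured in units of length, which is why λ carries no units.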

Feasibility Projection in ComPLx

■ Purpose: Approximating the penalty term allows one to replace the nonconvex Lagrangian by a convex one

■ We define the feasibility projection $P_C$ to find a closest C-feasible approximation (pseudo-legalization)

a. The result of the feasibility projection is C-feasible

b. The projection is Lipschitz continuous

c. The interconnect cost of the projected placement must generally decrease over iterations, providing upper bounds on final placement cost

ComPLx: Feasibility Projection on adaptec1

Cells are spread over the region’s expanse, avoiding obstacles while minimizing displacement for a given solution
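
To give a feel for what spreading with small displacement means, here is a deliberately simplified one-dimensional toy in Python. It is not the ComPLx feasibility projection: it assumes equal-width cells in a single row with no obstacles, preserves cell order, and uses a greedy two-pass heuristic that does not guarantee minimum displacement. The function name `spread_1d` is hypothetical.

```python
from typing import List

def spread_1d(xs: List[float], width: float, lo: float, hi: float) -> List[float]:
    """Order-preserving overlap removal for equal-width cells in the row [lo, hi].

    xs holds left edges; the result holds new left edges with no overlaps.
    """
    if len(xs) * width > hi - lo:
        raise ValueError("cells do not fit in the row")
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    # Forward pass: push each cell right just enough to clear its left neighbor.
    cursor = lo
    for i in order:
        out[i] = max(xs[i], cursor)
        cursor = out[i] + width
    # Backward pass: push cells left so that the row ends by hi.
    limit = hi
    for i in reversed(order):
        out[i] = min(out[i], limit - width)
        limit = out[i]
    return out

before = [4.0, 4.2, 4.1, 9.8]
after = spread_1d(before, width=1.0, lo=0.0, hi=10.0)
displacement = sum(abs(a - b) for a, b in zip(after, before))
print(after)  # [4.0, 6.0, 5.0, 9.0]
```

The L1 displacement between `before` and `after` plays the role of the penalty term: it shrinks as the unconstrained solution gets closer to a feasible one.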

ComPLx: Primal-dual Lagrangian Relaxation

■ Alternates minimization over the primal variables (x, y) with maximization over the dual variable λ

■ The minimum of Φ subject to the constraints can be found by sequential unconstrained optimization of the Lagrangian

− Subsequent iterations increase the sensitivity of the Lagrangian to the penalty

− The minimization of the Lagrangian affects the penalty more, and the penalty decreases while Φ increases over iterations

ComPLx: Progression of Key Quantities

Unconstrained Optimization in ComPLx

■ Minimization of the Lagrangian

− Performed after finding C-feasible anchor locations

− With the anchors and λ fixed, the simplified Lagrangian can be minimized by solving for (x, y)

− This is a system of linear equations when Φ is quadratic

■ The ComPLx framework re-solves the feasibility projection and the Lagrangian minimization until convergence
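
The following Python sketch shows how a quadratic objective plus anchor terms reduces to one linear system, in one dimension and with two-pin nets only. Several details are assumptions made for illustration: the L1 anchor penalty is linearized as $\lambda (x_i - x_i^\circ)^2 / (|x_i^{prev} - x_i^\circ| + \epsilon)$, the system is solved by dense Gaussian elimination (a real placer would use a sparse iterative solver), and the names `solve_dense` and `minimize_with_anchors` are hypothetical.

```python
from typing import List, Tuple

def solve_dense(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting, for tiny dense systems."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def minimize_with_anchors(n_cells: int,
                          cell_nets: List[Tuple[int, int, float]],
                          pad_nets: List[Tuple[int, float, float]],
                          anchors: List[float],
                          prev_x: List[float],
                          lam: float,
                          eps: float = 1e-3) -> List[float]:
    """Minimize quadratic wirelength plus linearized anchor penalties (1-D)."""
    A = [[0.0] * n_cells for _ in range(n_cells)]
    b = [0.0] * n_cells
    for i, j, w in cell_nets:          # movable-to-movable connections
        A[i][i] += w
        A[j][j] += w
        A[i][j] -= w
        A[j][i] -= w
    for i, pad_x, w in pad_nets:       # movable-to-fixed connections
        A[i][i] += w
        b[i] += w * pad_x
    for i in range(n_cells):           # anchor pseudonets, weight grows with lam
        wa = lam / (abs(prev_x[i] - anchors[i]) + eps)
        A[i][i] += wa
        b[i] += wa * anchors[i]
    return solve_dense(A, b)

# Two cells chained between fixed pads at x = 0 and x = 10.
cell_nets = [(0, 1, 1.0)]
pad_nets = [(0, 0.0, 1.0), (1, 10.0, 1.0)]
anchors = [2.0, 8.0]
free = minimize_with_anchors(2, cell_nets, pad_nets, anchors, [0.0, 0.0], lam=0.0)
pulled = minimize_with_anchors(2, cell_nets, pad_nets, anchors, free, lam=1e6)
```

With `lam=0.0` the cells sit at the pure wirelength optimum (10/3 and 20/3); with a very large `lam` they are pulled almost exactly onto the anchors, which mirrors how growing multipliers shift the balance from interconnect cost toward feasibility.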

Convergence of ComPLx

■ Controlling Lagrange multipliers

■ Convergence criteria

− L1-distance to feasibility projection stops decreasing

− Duality gap becomes small enough

ComPLx: Closing Gap between Two Bounds

■ Feasibility projection provides upper bounds

■ Legal solution is formed between two bounds

Upper-bounds found by feasibility projection

Lower-bounds found by minimization of Lagrangian

ComPLx Iterations on adaptec1 (1)

(Figure panels: Iteration=0 (Init WL Opt.); Iteration=1 (Upper Bound); Iteration=2 (Lower Bound); Iteration=3 (Upper Bound). Fixed macros are marked.)

ComPLx Iterations on adaptec1 (2)

(Figure panels: Iteration=10 (Lower Bound); Iteration=11 (Upper Bound); Iteration=20 (Lower Bound); Iteration=21 (Upper Bound). Fixed macros are marked.)

ComPLx Iterations on adaptec1 (3)

(Figure panels: Iteration=30 (Lower Bound); Iteration=31 (Upper Bound); Iteration=40 (Lower Bound); Iteration=41 (Upper Bound). Fixed macros are marked.)

Motivation for Macro Placement

■ The traditional “sea-of-gates” IC design style is being replaced by a “sea-of-hard-macros” design style

− Reuse predesigned IP modules / macros

− Reduce the design cost, deal with increasing complexity

− Previously performed in floorplanning at designers’ discretion

■ The boundary between placement and floorplanning is increasingly blurred

Courtesy of EE Times

ComPLx: Macro Placement by Macro Shredding (1)

■ Observation

− Feasibility projection largely preserves the relative placement

− An array of cells is transformed into a similar shape

■ Revised “macro shredding”: a one-stage approach for simultaneous standard-cell and macro placement

− Macro cells are divided into equal-sized cells (shreds) only for the feasibility projection

− $P_C$ on macros is calculated by averaging $P_C$ locations of shreds

− Linear systems remain unchanged (limiting complexity increases)
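
The shredding and averaging steps can be sketched as follows in Python. This is a minimal illustration under stated assumptions: macros are axis-aligned rectangles given by their lower-left corner, shreds form a regular grid, and the projection itself is replaced by a stand-in rigid shift (the real feasibility projection moves shreds individually). The names `shred` and `macro_center_from_shreds` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def shred(lower_left: Point, macro_w: float, macro_h: float,
          shred_size: float) -> List[Point]:
    """Return centers of a regular grid of equal-sized shreds covering the macro."""
    x0, y0 = lower_left
    nx = max(1, round(macro_w / shred_size))
    ny = max(1, round(macro_h / shred_size))
    sw, sh = macro_w / nx, macro_h / ny
    return [(x0 + (i + 0.5) * sw, y0 + (j + 0.5) * sh)
            for j in range(ny) for i in range(nx)]

def macro_center_from_shreds(projected: List[Point]) -> Point:
    """Macro anchor = average of the projected shred locations."""
    n = len(projected)
    return (sum(x for x, _ in projected) / n,
            sum(y for _, y in projected) / n)

# A 4x2 macro split into eight unit shreds.
shreds = shred((0.0, 0.0), 4.0, 2.0, 1.0)
center_before = macro_center_from_shreds(shreds)

# Stand-in for the feasibility projection: every shred moves by (+3, -1).
projected = [(x + 3.0, y - 1.0) for x, y in shreds]
center_after = macro_center_from_shreds(projected)
print(center_before, center_after)  # (2.0, 1.0) (5.0, 0.0)
```

Because shreds exist only inside the projection step, the number of variables in the linear systems stays equal to the number of original cells and macros.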

ComPLx: Macro Placement by Macro Shredding (2)

(Figure: $P_C$ is applied to shreds; $P_C$ on macros is found by averaging $P_C$ locations of shreds; minimization of the Lagrangian follows)

ComPLx: Experiments on ISPD 2005 benchmarks

■ 10% faster than FastPlace, 2.8X and 7.2X faster than NTUPlace3 and mPL6, >2.3X faster than RQL

Columns: benchmark; size (# of modules); best published HPWL excluding MAPLE (as of Aug 2012); HPWL and runtime (min.) of ComPLx + FastPlace_DP (~50:50), single thread

Benchmark   Size    Best published HPWL   ComPLx HPWL   Runtime (min.)
ADAPTEC1    211K    77.82 (RQL)           77.73         3.09
ADAPTEC2    255K    88.51 (RQL)           88.84         4.31
ADAPTEC3    452K    207.67 (SimPL)        203.55        10.75
ADAPTEC4    496K    186.80 (SimPL)        183.16        9.57
BIGBLUE1    278K    94.98 (RQL)           94.41         7.00
BIGBLUE2    558K    145.47 (SimPL)        145.39        8.53
BIGBLUE3    1.10M   323.09 (RQL)          330.74        24.80
BIGBLUE4    2.18M   797.66 (RQL)          788.30        41.89
GEOMEAN             1.00                  1.00

ComPLx: Experiments on ISPD 2006 benchmarks

■ Scaled HPWL = HPWL * (1 + density_overflow_penalty)

■ Demonstrates fast convergence and strong spreading quality

Scaled HPWL by placer:

Benchmark (target_utilization)   NTUP3     mPL6      RQL       ComPLx
ADAPTEC5 (0.5)                   451.22    431.27    443.28    415.13
NEWBLUE1 (0.8)                   62.65     68.08     64.43     64.75
NEWBLUE2 (0.9)                   205.45    201.85    199.60    193.39
NEWBLUE3 (0.8)                   277.87    284.11    269.33    273.42
NEWBLUE4 (0.5)                   306.56    300.58    308.75    292.82
NEWBLUE5 (0.5)                   509.71    537.14    537.49    507.85
NEWBLUE6 (0.8)                   520.31    522.54    515.69    501.97
NEWBLUE7 (0.8)                   1109.6    1084.4    1057.8    1041.4
GEOMEAN                          1.04      1.04      1.03      1.00
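
As a check on how the summary row relates to the data, the Python sketch below applies the scaled-HPWL formula and recomputes the geometric means, normalized to ComPLx, from the table values. It assumes that the GEOMEAN row is the geometric mean of per-benchmark ratios to ComPLx; the slides do not state this explicitly.

```python
import math
from typing import Dict, List

def scaled_hpwl(hpwl: float, density_overflow_penalty: float) -> float:
    """Scaled HPWL = HPWL * (1 + density_overflow_penalty)."""
    return hpwl * (1.0 + density_overflow_penalty)

def geomean(values: List[float]) -> float:
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Scaled HPWL per placer, ADAPTEC5 and NEWBLUE1..NEWBLUE7, copied from the table.
scaled: Dict[str, List[float]] = {
    "NTUP3":  [451.22, 62.65, 205.45, 277.87, 306.56, 509.71, 520.31, 1109.6],
    "mPL6":   [431.27, 68.08, 201.85, 284.11, 300.58, 537.14, 522.54, 1084.4],
    "RQL":    [443.28, 64.43, 199.60, 269.33, 308.75, 537.49, 515.69, 1057.8],
    "ComPLx": [415.13, 64.75, 193.39, 273.42, 292.82, 507.85, 501.97, 1041.4],
}

# Geometric mean of per-benchmark ratios relative to ComPLx.
ratios = {
    name: geomean([v / c for v, c in zip(vals, scaled["ComPLx"])])
    for name, vals in scaled.items()
}
for name, r in ratios.items():
    print(f"{name}: {r:.3f}")
```

Under that assumption the recomputed ratios land within 0.01 of the GEOMEAN row.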

Timing-driven Placement

■ Extensions for timing- and power-driven placement traditionally rely on net weights

− Weigh the nets with high activity factors / timing criticality
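
A net-weighting scheme of this kind could look like the Python sketch below. The linear formula, the coefficients `alpha` and `beta`, and the slack-based criticality measure are all hypothetical choices made for illustration; the slides do not specify a weighting function.

```python
def criticality_from_slack(slack: float, clock_period: float) -> float:
    """Hypothetical criticality in [0, 1]: 1 at zero or negative slack, 0 at a full period of slack."""
    return min(1.0, max(0.0, 1.0 - slack / clock_period))

def net_weight(base: float, criticality: float, activity: float,
               alpha: float = 2.0, beta: float = 1.0) -> float:
    """Hypothetical weight that grows linearly with timing criticality and activity factor."""
    if not (0.0 <= criticality <= 1.0 and 0.0 <= activity <= 1.0):
        raise ValueError("criticality and activity must lie in [0, 1]")
    return base * (1.0 + alpha * criticality + beta * activity)

# A net with zero slack and a 0.5 activity factor gets 3.5x the base weight.
w_critical = net_weight(1.0, criticality_from_slack(0.0, 10.0), 0.5)
# A net with a full period of slack and no switching keeps the base weight.
w_relaxed = net_weight(1.0, criticality_from_slack(10.0, 10.0), 0.0)
print(w_critical, w_relaxed)  # 3.5 1.0
```

Heavier weights make the interconnect optimization shorten critical or high-activity nets at the expense of others.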

Extending ComPLx to Routability-driven Placement

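ComPLx → SimPLR

ComPLx: Baseline Wirelength-driven Placement

Placement Instance → Initial HPWL Optimization → Global Placement Iterations → Legalization → Detailed Placement

Global Placement Iterations (repeated until convergence): Feasibility Projection (upper bound), Lagrangian Multiplier Update, Interconnect Optimization (lower bound; unconstrained optimization)

Constraints considered: legality, target utilization

ComPLx: Routability-driven Placement

Placement Instance → Initial HPWL Optimization → Global Placement Iterations → Legalization → Congestion-aware Detailed Placement

Global Placement Iterations (repeated until convergence): Feasibility Projection (upper bound), Lagrangian Multiplier Update, Interconnect Optimization (lower bound), plus Congestion Estimation on the upper bound

Constraints considered: legality, target utilization, routability

Enables early routability prediction; the placer can respond early and often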

SimPLR Illustration (1)






Congestion-aware Detailed Placement: Illustration

(Figure panels: After Global Placement; Congestion-unaware DP; Congestion-aware DP)

Congestion Map Improvement due to SimPLR

(Figure panels: Best in ISPD 2011 Contest; SimPLR)

SAPT Illustration with Alignment Constraints

(Figure panels: Manual Placement; Automated Placement)

Skewed Net Weighting + Anchor Alignment

Conclusions

SimPL (ICCAD`10, TCAD`12) and ComPLx (DAC`12): a common mathematical foundation

Extensions: SimPLR (ICCAD`11), Ripple (ICCAD`11), MAPLE (ISPD`12), Lopper (ISPD`11), PADÉ (DAC`12), SAPT (ISPD`12), NCTU (ASPDAC`12)

Extension types: routability; clock-tree codesign and power optimization; multilevel optimization; datapath awareness; thermal awareness

As of Aug 2012, MAPLE is used at IBM as the default option for all ASIC and CPU designs

Thank you!

Relevant Publications

M.-C. Kim, D.-J. Lee and I. L. Markov, “SimPL: An Effective Placement Algorithm,” ICCAD 2010, pp. 649-656.

M.-C. Kim*, J. Hu*, D.-J. Lee and I. L. Markov, “A SimPLR method for Routability-driven Placement,” ICCAD 2011, pp. 67-73.

M.-C. Kim, D.-J. Lee and I. L. Markov, “SimPL: An Effective Placement Algorithm,” IEEE TCAD 31(1), pp. 50-60, 2012.

M.-C. Kim, N. Viswanathan, C. J. Alpert, I. L. Markov and S. Ramji, “MAPLE: Multilevel Adaptive PLacEment for Mixed-Size Designs,” ISPD 2012, pp. 193-200.

M.-C. Kim and I. L. Markov, “ComPLx: A Competitive Primal-dual Lagrange Optimization for Global Placement,” Design Automation Conference (DAC) 2012.

S. Ward, M.-C. Kim, N. Viswanathan, Z. Li, C. J. Alpert, E. Swartzlander and D. Z. Pan, “Keep it Straight: Teaching Placement how to Better Handle Designs with Datapaths,” ISPD 2012, pp. 79-86.

SimPLR Empirical Results vs. SimPL and ISPD`11 Contest

■ Overflow is reported by running a full-fledged global router

■ Versus HPWL-driven placement:

− Average of 3.81x better overflow (7 of 8 best) at the cost of 4% routed wirelength

■ Versus other routability-driven placers in the ISPD`11 Contest:

− Average of 2.04x better overflow (8 of 8 best) with 1% better routed wirelength

SimPLR Empirical Results: Ca-DP

■ Versus HPWL-driven detailed placement:

− Average of 18% better overflow (7 of 8 best) at the cost of 1% in routed wirelength

SAPT Experimental Results: Hybrids

■ Hybrid designs integrate datapaths into a larger netlist

− A mixture of datapaths and random-logic standard cells

− We used industrial hybrid designs at IBM

■ We report a 5.8% improvement in total StWL compared to SimPL
