A Network-Flow Approach to Timing-Driven Incremental Placement for ASICs

Shantanu Dutt, Huan Ren, Fenghua Yuan and Vishal Suthar
Dept. of Electrical and Computer Engineering
University of Illinois-Chicago

Outline
• Motivation & prior work
• General methodology of FlowPlace
• Net delay model
• TD analytical global placement
• TD network-flow-based detailed placer
• Benchmarks
• Experimental results
• Conclusions

Motivation
Placement in high-performance designs
• Has a large effect on performance metrics, e.g., timing and power
• Fast timing closure is a major but often hard-to-realize goal
• Need to meet several metrics at the same time
Incremental timing-driven placement
• Initial placement → improve timing incrementally on crit. paths
• More accurate timing information can be acquired from the initial placement
• Minimize the effect on the other metrics of the initial placement; convergence is a byproduct
• Also important for ECO applications

Prior Work
Existing timing-driven placement
• Path-based: minimize the critical paths directly
  - Pros: timing is essentially path-based
  - Cons: excessive number of paths
• Net-based: transform timing into net weights or net budgets
  - Pros: low complexity, flexible
  - Cons: often ignores path information; has a convergence problem
The net-based approach is the most common method
• Kahng et al. (ISPD'02)
  - Minimize the max weighted net delay using LP, with net weight based on the max path delay violation through the net
  - All paths meet constraints simultaneously
  - Can fit into a standard WL-driven top-down design flow
• Yang et al. (ICCAD'02)
  - New slack allocation approach, which assigns more slack to nets with larger estimated WL and fanout
  - Minimizes total net delay violation using simulated annealing
  - Achieves more efficient slack usage in the final placement

Prior Work (cont.)
Incremental TD placement
• Wonjoon et al. (ICCAD'03)
  - Path-based constraints for every violated path p_j (has a maximum path limit):
$$d_{p_j} = \sum_{n_i \in p_j} d_{n_i} \le d_{limit}$$
  - Simple bisection method to remove overlap; no control of delay change
• Luo et al. (DAC'06)
  - Consider both cells in the critical path and cells that are logically adjacent to the critical path, to control timing perturbation
  - Delay model with delay/slew propagation
• Both algorithms use LP for re-placement, which doesn't address the quadratic part of the delay accurately
Nw-flow-based detailed placement
• Brenner et al. (ISPD'04), Doll et al. (ICCAD'94)
  - Try to send flow from congested areas, or from cells that haven't been placed, to vacant areas with minimum cost
  - Allow temporary small illegality (e.g., overlap or out of boundary) caused by movement according to the flow
  - WL-driven, and the deterioration from the global placement results is small

Our Goals & Methodology
Goals
• Accurate pre-route delay est.
• Targeted global & detailed TD re-placement of critical & near-critical paths
• Minimal effect on the rest of the circuit
• Fast
Methodology
1. Initial placed circuit
2. STA & determine critical node set (moveC)
3. TD analytical global placement (TAN) on moveC
4. TD nw-flow based detailed placement (TIF) on moveC
5. New placement w/ improved performance

WL and Pre-Route Delay Model
WL calculation: we use a star graph model to calculate WL, with the star centered at the centroid C (x_c, y_c)
$$L(n_j) = \sum_{u_i \in n_j} \left( |x_i - x_c| + |y_i - y_c| \right)$$
Pre-route delay model
• Driver node driving load capacitance:
$$D_1(n_j) = R_d \left( c\,L(n_j) + (k-1)\,C_g \right)$$
• Self interconnect delay:
$$D_2(u_i, n_j) = \frac{rc}{2}\, l_{d,i}^2 + r\, l_{d,i}\, C_g$$
• Self interconnect seeing other interconnect & load capacitance:
$$D_3(u_i, n_j) = r \cdot \frac{l_{d,i}}{2} \cdot \frac{1}{2} \left( c\,L(n_j) + (k-2)\,C_g \right)$$
[Figure: star graph model and delay model for a net with driver u_d (x_d, y_d), sinks u_i (x_i, y_i), u_p (x_p, y_p), u_q (x_q, y_q) and centroid C (x_c, y_c); labels l_{d,i}, l_{d,i}/2 and the split of C_total into two fractions]
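The wirelength and the three delay terms can be evaluated directly. The following is a minimal sketch, assuming the Manhattan (absolute-value) form of L(n_j), a star center at the mean of the pin coordinates, and k taken as the number of pins on the net. The function names and all numeric parameter values are illustrative and do not come from the slides.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def star_wirelength(pins: List[Point]) -> float:
    """L(n_j): sum of Manhattan distances from every pin to the centroid."""
    xc = sum(x for x, _ in pins) / len(pins)
    yc = sum(y for _, y in pins) / len(pins)
    return sum(abs(x - xc) + abs(y - yc) for x, y in pins)

def d1(R_d: float, c: float, L: float, k: int, C_g: float) -> float:
    """Driver resistance driving the wire and load capacitance."""
    return R_d * (c * L + (k - 1) * C_g)

def d2(r: float, c: float, l_di: float, C_g: float) -> float:
    """Self interconnect delay of the driver-to-sink connection."""
    return 0.5 * r * c * l_di ** 2 + r * l_di * C_g

def d3(r: float, c: float, l_di: float, L: float, k: int, C_g: float) -> float:
    """Driver-to-sink interconnect seeing the other interconnect and loads."""
    return r * (l_di / 2) * 0.5 * (c * L + (k - 2) * C_g)

# Illustrative three-pin net (hypothetical coordinates and unit values).
pins = [(0.0, 0.0), (4.0, 0.0), (2.0, 6.0)]
L = star_wirelength(pins)
delay_terms = (
    d1(R_d=2.0, c=0.5, L=L, k=len(pins), C_g=1.0),
    d2(r=1.0, c=0.5, l_di=4.0, C_g=1.0),
    d3(r=1.0, c=0.5, l_di=4.0, L=L, k=len(pins), C_g=1.0),
)
```

Keeping l_{d,i} as an explicit input avoids committing to how the driver-to-sink length is routed through the star center.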

Fidelity of our model
The future model, still under development, models nets with multiple star structures.
Best results for …

Circuit            Mac64   Matrix   Vp2   Mac32   Error (%)
Routed delay       34      38       43    67      0
Our curr. model    40      51       62    82      29.5
Multi-star model   35      49       52    73      15.5
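One way to read the error column is as the mean relative deviation from the routed delay over the four circuits. The check below is a sketch under that assumption, and it also assumes the error column reads 29.5% and 15.5%. The helper name is hypothetical.

```python
routed = {"Mac64": 34, "Matrix": 38, "Vp2": 43, "Mac32": 67}
current_model = {"Mac64": 40, "Matrix": 51, "Vp2": 62, "Mac32": 82}
multi_star = {"Mac64": 35, "Matrix": 49, "Vp2": 52, "Mac32": 73}

def mean_relative_error_pct(model: dict, reference: dict) -> float:
    """Average of |model - reference| / reference over all circuits, in percent."""
    errors = [abs(model[name] - reference[name]) / reference[name] for name in reference]
    return 100.0 * sum(errors) / len(errors)

err_current = mean_relative_error_pct(current_model, routed)
err_multi = mean_relative_error_pct(multi_star, routed)
print(f"current model: {err_current:.1f}%  multi-star model: {err_multi:.1f}%")
```

Because the table entries are themselves rounded, the recomputed averages agree with the slide's error column only to within a few tenths of a percent.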

TD Analytical Global Placement (TAN)
• A TD extension of a combination of Gordian and Gordian-L
• Essentially a quadratic programming approach
• Uses an iterative approach to model the linear terms of delay in the objective function
Critical delay cost of a net
• Need to focus only on the sinks on the critical paths
• Formulation:
$$D_c(n_j) = D_1(n_j) + \sum_{u_i \in critical(n_j)} \left( D_2(u_i) + D_3(u_i) \right)$$
• A net with more critical paths through it is more important to optimize: one opt. step can achieve the min. on all those paths
[Figure: net w/ 2 critical paths through it]

TD Analytical Global Placement (contd)
Allocated slack of a net
• A weight measure for determining the TD WL reduction of a net
• Two factors need to be considered: the minimum path slack through the net and the # of nets in that path
• Therefore we uniformly allocate path slack to each net; the allocated slack of a net is
$$S_a(n_j) = \frac{S(P_{max}(n_j))}{\#\ \text{of nets in path } P_{max}(n_j)}$$
Observation: nets with the same weight in TAN tend to have the same length after optimization
• Before optimization, two paths have the same delay (net delays 4, 4, 4 vs. 6, 6)
• Net slack = path slack: after optimization one path is longer than the other (net delays 3, 3, 3 vs. 3, 3)
• With uniformly allocated slack: after optimization both paths have approx. the same delay (net delays 2, 2, 2 vs. 3, 3)
• Thus we can get equi-delay paths
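The uniform slack allocation can be sketched in a few lines. This is an illustration only: it assumes P_max(n_j) is the most critical (minimum-slack) path through the net, and the function name and the input layout (one `(path_slack, nets_in_path)` pair per path through the net) are hypothetical.

```python
from typing import List, Tuple

def allocated_slack(paths_through_net: List[Tuple[float, int]]) -> float:
    """S_a(n_j): slack of the most critical path through the net,
    divided uniformly among the nets on that path."""
    path_slack, nets_in_path = min(paths_through_net, key=lambda p: p[0])
    return path_slack / nets_in_path

# A net lying on two paths: slack 6 over 3 nets, and slack 4 over 2 nets.
s_a = allocated_slack([(6.0, 3), (4.0, 2)])

# Same path slack, different path lengths: the longer path gets less slack per net,
# so its nets are weighted more heavily and end up shorter.
per_net_long = allocated_slack([(6.0, 3)])
per_net_short = allocated_slack([(6.0, 2)])
```

With equal path slack, the three-net path receives 2 per net and the two-net path 3 per net, which mirrors the 2, 2, 2 vs. 3, 3 picture on the slide.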

TD Analytical Global Placement (contd)
The delay cost of a net
$$C_D(n_j) = \frac{D_c(n_j)}{S_a(n_j)} = \frac{D_{c,quad}(n_j) + D_{c,lin}(n_j)}{S_a(n_j)}$$
The objective function (final objective function, to solve min-max via min-sum)
$$F = \sum_{n_j \in moveN} \frac{D_{c,quad}(n_j) + D_{c,lin}(n_j)}{S_a(n_j)}$$
The delay cost part is divided into a quadratic part and a linear part
• Quadratic terms: can be solved by a normal quadratic programming technique
• Linear part: the linear terms are approximated by quadratic terms as follows
$$|x_i - x_c| \approx \frac{(x_i - x_c)^2}{|x_i - x_c|}$$
  - In the formulation, the coordinates in the denominator take their current values
  - We do several iterations until the results converge
  - The linear terms in y are dealt with in the same way
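The linear-term approximation is an iteratively reweighted quadratic scheme: each |x - a| is replaced by (x - a)^2 divided by its current value, the resulting quadratic is minimized in closed form, and the weights are refreshed. The one-dimensional toy below is a sketch of that idea only, not the TAN solver; the function name, the `eps` guard against division by zero, and the sample points are assumptions.

```python
from typing import List

def minimize_sum_abs(points: List[float], iters: int = 50, eps: float = 1e-9) -> float:
    """Minimize sum_k |x - a_k| by repeatedly solving the weighted quadratic
    sum_k (x - a_k)^2 / |x_cur - a_k|, with the denominator at its current value."""
    x = sum(points) / len(points)  # plain quadratic (least-squares) starting point
    for _ in range(iters):
        weights = [1.0 / max(abs(x - a), eps) for a in points]
        # Closed-form minimizer of the weighted quadratic: a weighted mean.
        x = sum(w * a for w, a in zip(weights, points)) / sum(weights)
    return x

x_star = minimize_sum_abs([0.0, 1.0, 10.0])
print(x_star)
```

The least-squares start is pulled toward the outlier at 10, while the reweighted iterations settle at the minimizer of the true linear objective, which for this example is the median of the points.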

TD NW-Flow Based Detailed Placement (TIF)
General purpose
1. Solves the overlap problem from the global placer
2. Minimizes the deterioration of the delay improvement obtained from the global placer
3. Legalizes the placement, satisfying WS constraints
General nw-flow graph
[Figure: general nw-flow graph with source S, cells A1 and A2, Row1 (C11, C12, C13, C14, W1), Row2 (C21, C22, W21, C24, W2), Row3 (C31, C32, C33, W3) and sink T; panels "Flow to legalize A1 position" and "Cell placement after cells are moved in the flow direction"]
• Arc cost = TD cost: linear & step funct.
• Arc capacity
  - hor.: how much a cell can move (accuracy issues)
  - vert.: width of head cell
  - S → moved cell: width(cell)
  - row → T: WS of row
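To make the flow formulation concrete, the sketch below solves a tiny min-cost flow instance with a generic successive-shortest-path routine. It is not the solver used in TIF, and the five-node graph, capacities and costs are hypothetical: a cell A1 of width 3 can push flow horizontally into its own row's white space W2 (cheap, limited) or vertically toward W1 in another row (costlier).

```python
from typing import List, Tuple

INF = float("inf")

def min_cost_flow(n: int, arcs: List[Tuple[int, int, int, float]],
                  s: int, t: int, demand: int):
    """Successive shortest paths (Bellman-Ford on the residual graph).
    arcs: (tail, head, capacity, unit cost). Returns (flow sent, cost, flow per arc)."""
    edges = []                      # edge 2i is arc i, edge 2i+1 is its reverse
    adj = [[] for _ in range(n)]
    for u, v, cap, cost in arcs:
        adj[u].append(len(edges)); edges.append([v, cap, cost])
        adj[v].append(len(edges)); edges.append([u, 0, -cost])
    sent, total = 0, 0.0
    while sent < demand:
        dist, prev = [INF] * n, [-1] * n
        dist[s] = 0.0
        for _ in range(n - 1):
            for u in range(n):
                if dist[u] == INF:
                    continue
                for ei in adj[u]:
                    v, cap, cost = edges[ei]
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v] = dist[u] + cost, ei
        if dist[t] == INF:
            break                   # no augmenting path left
        push, v = demand - sent, t
        while v != s:               # bottleneck capacity along the path
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = t
        while v != s:               # augment along the path
            edges[prev[v]][1] -= push
            edges[prev[v] ^ 1][1] += push
            v = edges[prev[v] ^ 1][0]
        sent += push
        total += push * dist[t]
    return sent, total, [edges[2 * i + 1][1] for i in range(len(arcs))]

S, A1, W2, W1, T = range(5)
arcs = [
    (S, A1, 3, 0.0),   # S -> moved cell: capacity = width(cell)
    (A1, W2, 2, 1.0),  # horizontal arc: how far the cell can move
    (A1, W1, 3, 4.0),  # vertical arc: capacity = cell width
    (W2, T, 2, 0.0),   # row -> T: white space of the row
    (W1, T, 5, 0.0),
]
sent, cost, flows = min_cost_flow(5, arcs, S, T, demand=3)
```

In this instance the cheapest solution splits the cell's width between the horizontal and the vertical arc, which is exactly the kind of non-discrete vertical flow that a legal placement cannot realize.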

Arc Cost in TIF
Sensitivity-based cost
• We define the delay of a net to be the delay from its driver cell to its most critical sink cell
• Consider the net delay change when a cell is moved
[Figure: delay model with driver u_d and sinks u_i, u_p, u_q; cell u_i moves to u'_i, changing l_{d,i} to l'_{d,i} and C_total to C'_total]
$$\Delta D_1(n_j) = R_d\, c\, \Delta l_{d,i}$$
$$\Delta D_{3b}(n_j) = r \cdot \frac{l_{d,i}}{2} \cdot \frac{1}{2} \left( c\, \Delta l_{d,i} \right)$$
If u_i is the critical sink or driver:
$$\Delta D_2(n_j) = r c\, l_{d,i}\, \Delta l_{d,i} + r\, \Delta l_{d,i}\, C_g$$
$$\Delta D_{3a}(n_j) = r \cdot \frac{\Delta l_{d,i}}{2} \cdot \frac{1}{2} \left( c\, L(n_j) + (k-2)\, C_g \right)$$
Otherwise:
$$\Delta D_2(n_j) = \Delta D_{3a}(n_j) = 0$$
Arc cost formulation
• For a cell, we find the most critical nets (belonging to the paths with smallest slack) connected to it; the unit flow cost of the arcs from the cell is
$$cost(e) = \frac{1}{cap(e)} \sum_{n_j \in critical\_nets} \frac{\Delta D(n_j)}{S(n_j)}$$
From experiments … gives best results
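A sensitivity-based arc cost of this kind can be sketched as follows. The sketch assumes that the total delay change of a net is the sum of the individual delta terms, that slack-normalized changes are summed over the cell's critical nets, and that the sum is divided by the arc capacity to obtain a unit-flow cost. All names and numeric values are illustrative.

```python
from typing import List, Tuple

def delta_net_delay(R_d: float, r: float, c: float, C_g: float, L: float, k: int,
                    l_di: float, dl: float, moves_critical_pin: bool) -> float:
    """Delay change of a net when a cell move changes the driver-to-sink length by dl."""
    d_d1 = R_d * c * dl                          # driver sees more wire capacitance
    d_d3b = r * (l_di / 2) * 0.5 * (c * dl)      # downstream capacitance change
    if moves_critical_pin:                       # moved cell is the critical sink or driver
        d_d2 = r * c * l_di * dl + r * dl * C_g
        d_d3a = r * (dl / 2) * 0.5 * (c * L + (k - 2) * C_g)
    else:
        d_d2 = d_d3a = 0.0
    return d_d1 + d_d2 + d_d3a + d_d3b

def arc_cost(critical_nets: List[Tuple[float, float]], capacity: float) -> float:
    """critical_nets: (delay change, slack) per critical net of the cell."""
    return sum(d_delay / slack for d_delay, slack in critical_nets) / capacity

common = dict(R_d=2.0, r=1.0, c=0.5, C_g=1.0, L=12.0, k=3, l_di=4.0, dl=1.0)
dd_critical = delta_net_delay(**common, moves_critical_pin=True)
dd_other = delta_net_delay(**common, moves_critical_pin=False)
unit_cost = arc_cost([(dd_critical, 2.5), (dd_other, 3.0)], capacity=5.0)
```

Nets with little slack dominate the cost, so arcs that would lengthen them become expensive for the flow solver.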

Tackling Illegalities in TIF
The incremental detailed placement problem is a discrete optimization problem (DOP). Thus certain illegalities are introduced by solving it with a continuous optimization method. There are two major problems.
I. Discrete flow requirement in vertical arcs
• A vertical arc represents vertical cell movement by a discrete amount (dist. to nearest row), so the flow on it should be either full capacity (cell width) or 0
• The nw-flow solution may not meet this requirement
• Resulting placement problems:
  - The full cost of movement is not incurred in the nw-flow
  - The cell moved up has a larger area than the nw flow modeled
[Figure: initial placement and resulting placement of cells w, u, x and v under a non-discrete flow; arcs labeled (7, c3), (7, c2) and (5, c1), flows f1=2 and f2=3; the resulting placement contains an overlap]

Tackling Illegalities in TIF (contd)
Our flow-discretizing soln for vertical arcs: the 3-step process
• Step 1: Initially, vertical arc cap = 1, cost = full cost (the full cost is incurred)
• Step 2: After the first 1 unit of flow is passed, cap = original cap - 1, cost = 0 (encourages flow to keep going through the arc)
• Step 3: After all flow is passed, the cost and capacity of the adjacent horizontal arc are updated to 0 (horiz. arc cost updated)
[Figure: Steps 1 to 3 and the final placement for cells w, u, x and v: the vertical arc is first (1, full-cost) with f1=1, then (4, 0) with f2=4; the horizontal arcs (7, c3) and (7, c2) are relabeled (inf, 0); in the final placement disp(u) = disp(w) = disp(v) = 5]
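The first two steps amount to a small state machine on each vertical arc: the first unit of flow pays the whole movement cost, and the remaining capacity is then offered for free so that the flow solver tends to complete the move. The class below is a minimal sketch of that bookkeeping with hypothetical names; updating the adjacent horizontal arc (Step 3) is left to the caller.

```python
class VerticalArc:
    """Capacity/cost schedule of a vertical arc for a cell of a given width."""

    def __init__(self, cell_width: int, full_cost: float):
        self.cell_width = cell_width
        self.full_cost = full_cost
        self.flow = 0

    def offer(self):
        """(capacity, unit cost) currently presented to the flow solver."""
        if self.flow == 0:
            return 1, self.full_cost             # Step 1: one unit at full cost
        return self.cell_width - self.flow, 0    # Step 2: the rest is free

    def push(self, units: int):
        """Push up to `units` of flow; returns (units accepted, cost incurred)."""
        cap, unit_cost = self.offer()
        units = min(units, cap)
        self.flow += units
        return units, units * unit_cost

    def fully_used(self) -> bool:
        """True once the whole cell width has passed (trigger for Step 3)."""
        return self.flow == self.cell_width

arc = VerticalArc(cell_width=5, full_cost=10.0)
first = arc.push(3)    # only one unit is accepted, at the full movement cost
second = arc.push(4)   # the remaining width follows at zero cost
```

The total cost charged equals the full movement cost no matter how the solver splits its pushes, which addresses the "full cost is not incurred" problem of the continuous model.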

Tackling Illegalities in TIF (contd)
II. Split flows
• This occurs when there are flows on both the upward and the downward arcs
• The two split flows go through tree structures to the sink
Two heuristics to solve the problem
1. Max flow: we choose the branch tree with the larger flow
2. Min cost: we choose the branch tree with the smaller flow cost, looking at the first k levels
Our experiments show that the max-flow heuristic does better
[Figure: cell A1 between rows (C21, C22) and (C31, C32), with split flows f1=2 on arc (5, c1) and f2=3 on arc (5, c2); the flows f1 and f2 continue through Tree1 and Tree2 (via C12, C23, C33, ...) to the sink]
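The two heuristics reduce to a choice between branches. The sketch below assumes that each branch tree is summarized by its flow and by a list of per-level flow costs; that record layout, the function name and the cost values are hypothetical.

```python
from typing import Dict, List

def pick_branch(branches: List[Dict], heuristic: str = "max_flow", k: int = 2) -> int:
    """Return the index of the branch tree that keeps the split flow."""
    indices = range(len(branches))
    if heuristic == "max_flow":
        # Keep the branch that already carries the larger flow.
        return max(indices, key=lambda i: branches[i]["flow"])
    if heuristic == "min_cost":
        # Keep the branch whose first k levels are cheapest.
        return min(indices, key=lambda i: sum(branches[i]["level_costs"][:k]))
    raise ValueError(f"unknown heuristic: {heuristic}")

branches = [
    {"name": "Tree1", "flow": 2, "level_costs": [1.0, 1.0, 9.0]},
    {"name": "Tree2", "flow": 3, "level_costs": [2.0, 2.0, 0.5]},
]
by_flow = pick_branch(branches, "max_flow")
by_cost = pick_branch(branches, "min_cost", k=2)
```

With f1=2 and f2=3, as in the slide's example, the max-flow heuristic keeps Tree2, while a limited-lookahead cost comparison may prefer the other branch.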

Satisfying White Space Constraints
Due to the discrete nature of the detailed placement problem, the white space constraint (max row width does not exceed a pre-specified limit) can't be ensured by the nw-flow process
Two methods are used to deal with this problem
• Dynamic row size constraint monitoring
• Push-violation arcs in the next iteration
[Figure: non-discrete flow example with cells w, u, x and v; arcs (7, c3), (7, c2) and (5, c1), flows f1=2 and f2=3; the resulting rows have WS=3 and WS=-2, i.e., a WS violation]

Satisfying WS Constraints (contd)
Dynamic WS constraint monitoring
• Monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw flow
  - Initial flow on a vertical arc: the total cell width is moved to the target row
  - Fully reversed flow: the total cell width is moved back to the orig. row
• To facilitate cell movement, we allow temporary white space violation in a row for each direction of the flow
• Once a viol. in a direction occurs, no further violations are allowed unless it goes to 0
• Monitored by top and bottom viol. guards Gt and Gb
If violation remains in the row
• Push-violation arc in the next iteration
• Thrashing is prevented by disallowing reverse movement
[Figure: rows with cell widths W=3, W=7, W=9 and W=4 before and after min-cost flow; guards Gb = 0 and Gt = 0; a full row and a violated row with their net violation]

Global Network Flow
• The global flow network gives a global view of how flows will generally go
• With the global flow, we can eliminate detailed-flow arcs that are not likely to have flow on them
• This can greatly reduce the cycles in the detailed nw-flow, thus reducing time without obvious deterioration of the improvement
[Figure: global flow graph over Row i-1, Row i (violated row) and Row i+1, with cells A1 and A2 and the sink; arc labels (w(A2), 0), (viol_i, 0), (w(W_{i+1}), C_{i+1}) and (w(R), C_{i+1,i})]
• C_{i+1} is the probabilistic average of all left-to-right detailed horizontal arc costs in the row
• C_{i+1,i} is the weighted average of the detailed vertical arc costs between two rows
• 65% runtime reduction at the cost of 1-2% timing deterioration
TIF's high-level flow
1. Global nw flow
2. Detailed nw flow (on induced network)
3. Physical flow interpretation
4. All new cells placed & all viol. fixed? No: iterate again. Yes: end.

Benchmarks
• There are three sets of benchmarks: IBM, Faraday and TD-Dragon
• The IBM and Faraday suites are originally not timing benchmarks; we generate synthetic timing characteristics for them
• The IBM circuits don't identify FFs. We determine FFs in cycles and break all cycles with the minimum # of FFs. The average percentage of FFs is 13%
• Both suites lack information on the resistance and capacitance of cells and interconnects. We choose typical values of 0.18 micron technology for these parameters

Benchmark Characteristics

                       IBM            Faraday       TD-Dragon
# of cells             12506-210341   11734-32622   3093-25616
# of nets              13636-201640   11815-33186   3200-26017
critical path length   21-220         16-25         20-60

Efficacy of TD Arc Costs
Delay improvement (%)

Circuit      After TAN   TD cost   0 cost   Unit cost
ibm03        33.1        30.3      25.9     27.3
ibm09        12.1        8.4       3.6      4.9
ibm08        30.7        28.4      20.2     24.1
ibm15        21.8        18.8      12.7     14.0
ibm18        36.9        34.1      29.0     31.1
Dsp1         25.1        21.3      16.7     17.1
Risc1        24.6        22.2      19.7     19.9
TDMatrix     7.9         4.0       0.5      2.2
TDMac32      12.1        7.7       3.6      5.9
TDmac64      13.9        10.0      6.9      7.8
Geom. avg    19.5        15.23     8.8      11.7
Arith. avg   21.8        18.5      13.9     15.4

• Global place (TAN) → detailed place (TD cost) deterioration: 4.3%
• Global place (TAN) → detailed place (unit cost) deterioration: 7.8%
• Global place (TAN) → detailed place (0 cost) deterioration: 10.7%
• 45% reduction in the deterioration of global place results by going from unit cost → TD cost
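The summary figures can be re-derived from the table rows. The following is a sketch assuming that every table entry carries one decimal place (for example 33.1 and 0.5) and that the deterioration figures are differences of geometric means; the variable names are illustrative.

```python
import math

# Columns: After TAN, TD cost, 0 cost, unit cost (delay improvement, %).
table = {
    "ibm03": (33.1, 30.3, 25.9, 27.3),
    "ibm09": (12.1, 8.4, 3.6, 4.9),
    "ibm08": (30.7, 28.4, 20.2, 24.1),
    "ibm15": (21.8, 18.8, 12.7, 14.0),
    "ibm18": (36.9, 34.1, 29.0, 31.1),
    "Dsp1": (25.1, 21.3, 16.7, 17.1),
    "Risc1": (24.6, 22.2, 19.7, 19.9),
    "TDMatrix": (7.9, 4.0, 0.5, 2.2),
    "TDMac32": (12.1, 7.7, 3.6, 5.9),
    "TDmac64": (13.9, 10.0, 6.9, 7.8),
}

columns = list(zip(*table.values()))
arith = [sum(col) / len(col) for col in columns]
geom = [math.exp(sum(math.log(v) for v in col) / len(col)) for col in columns]

after_tan, td_cost, zero_cost, unit_cost = geom
det_td = after_tan - td_cost        # deterioration with TD arc costs
det_unit = after_tan - unit_cost    # deterioration with unit arc costs
det_zero = after_tan - zero_cost    # deterioration with zero arc costs
reduction = (det_unit - det_td) / det_unit

print([round(a, 1) for a in arith], [round(g, 2) for g in geom])
print(round(det_td, 1), round(det_unit, 1), round(det_zero, 1), round(100 * reduction))
```

Under these assumptions the recomputed averages and deteriorations agree with the slide's summary to within rounding.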

Final Results
[Figure: delay improvement for IBM benchmarks; initial placement WL-driven (Dragon)]
[Figure: delay improvement for Faraday benchmarks; initial placement WL-driven (Dragon)]
[Figure: delay improv. for TD-Dragon benchmarks placed by Dragon (cell delay)]
[Figure: delay improv. for TD-Dragon benchmarks placed by Dragon (no cell delay)]
[Data labels in the extracted text for the figures above: 197, 242, 206, 243, 45, 37]
[Figure: delay improvement on TD-Dragon placement for different WS constraints; circuits matrix, vp2, mac32, mac64 and avg; series TAN and WS = 3, 5, 10]
• [Wonjoon & Bazargan, ICCAD'03] achieves an avg. of 2.8% improv. with 5% WS
• For 5% WS our improvement is 4.2% (50% relative improvement)

Final Results (contd)
[Figures not recoverable; data labels in the extracted text: 82120, 196241, 62, 102, 3845, 40]

Empirical Asymptotic Time Complexity
[Figure: runtime vs. # of cells (x axis 0 to 250000); linear curve best fits data: y = 0.0057x + 266.51]
[Figure: runtime vs. # of movable cells (x axis 0 to 2500); linear curve best fits data: y = 0.6857x + 51.48]
• Runtime is 18% of Dragon and 12% of TD-Dragon
• Obtains a soln for a 210K-cell cct (ibm18) w/ 34% improv. in 24 mins
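The fitted line can be checked against the quoted ibm18 runtime. The snippet below assumes that the fit reads y = 0.0057x + 266.51 with y in seconds and x the total number of cells, and it uses the largest IBM cell count from the benchmark table; the function name is hypothetical.

```python
def predicted_runtime_seconds(num_cells: int,
                              slope: float = 0.0057,
                              intercept: float = 266.51) -> float:
    """Evaluate the linear runtime fit for a circuit with num_cells cells."""
    return slope * num_cells + intercept

ibm18_minutes = predicted_runtime_seconds(210341) / 60.0
print(f"{ibm18_minutes:.1f} min")
```

Under those assumptions the prediction for the 210K-cell circuit falls between 24 and 25 minutes, in line with the quoted 24 mins.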

Conclusions
Proposed a TD incremental placement flow: FlowPlace
• Global and detailed incremental placer
• New accurate pre-route net delay models
• Can opt. both the quadratic and the linear delay terms in the global placer
• TD nw flow to solve detailed TD placement
  - Sensitivity-based TD arc costs
  - Constraint satisfaction (e.g., WS)
  - Discretization of illegal continuous solns
  - Global nw flow graph
Promising results
• Delay improv. up to 34% for a 210K-cell WL-opt layout in 24 mins
• Delay improv. up to 10% for a 26K-cell TD-opt layout in just above 5 mins
• The average delay improvement is 1834
• The WL deterioration is an average of 8%
• The average run time is only 12-18% of the original placement runtime
TD-IBM benchmarks and placed outputs avail. at the FlowPlace page: wwweceuicedu~duttbenchmarks-etcFlowPlaceflowhtml
Concepts can be extended to timing and power optimization with constraints, and to physical re-synthesis

Satisfying white space constraints
Dynamic WS constraint monitoring
• We monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw flow
  - Initial flow on a vertical arc: the total cell width is moved to the target row
  - Fully reversed flow: the total cell width is moved back to the orig. row
• To facilitate cell movement, we allow temporary white space violation under constraints
  - Viol_max = max cell width
  - Violations from above and below are calculated separately
• Because of the flow allowed in step two, due to the separate violation limits for flow from above and below, we can finally legalize the placement
[Figure: step-by-step example with cells u, v and x (widths W=5 and W=7) and the sink, showing the row white space (WS) and the guards vio_top and vio_bot at each step, ending with WS=0 and vio_top = vio_bot = 0]

Page 2: A NetworkFlow Approach to TimingDriven Incremental Placement for ASICs

Outline

Motivation amp prior work General methodology of FlowPlace Net delay model TD analytical global placement TD network flow based detailed placer Benchmarks Experimental results Conclusions

Motivation Placement in high performance designs

Has large effect on performance metrics eg timing power Fast timing closure is a major but often hard-to-realize goal Need to meet several metrics at the same time

Incremental timing-driven placement Initial placement improve timing incrementally on crit paths More accurate timing information can be acquired from the initi

al placement Minimize the affect to other metrics in initial placementmdashconver

gence is a byproduct Also important for ECO applications

Prior Work Existing timing driven placement

Path-based minimize the critical paths directly Pros timing is essentially path-basedCons excessive number of paths

Net-based transform timing into net-weights or net-budgetsPros low complexity flexibleCons often ignores path information has a convergence problem

Net-based approach is the most common method Kahng et al (ISPDrsquo02)

Minimize the max weighted net delay using LP with net weight based on the max path delay violation through the net

All paths meet constraints simultaneously Can fit into a standard WL-driven top-down design flow

Yang et al (ICCADrsquo02)New slack allocation approach which assigns more slack to nets

with larger estimated WL and fanoutMinimizing total net delay violation using simulated annealingAchieves a more efficient slack usage in final placement

Prior Work (cont)

Wonjoon et al (ICCADrsquo03) Path based constraints for every violated path pj (has maximum path limit)

Simple bisection method to remove overlap no control of delay change Luo et al (DACrsquo06)

Consider both cells in the critical path and cells that are logically adjacent to the critical path to control timing perturbation

Delay model with delayslew propagation Both algorithms use LP for replacement which doesnrsquot address the quadratic p

art of the delay accurately

Incremental TD placement

Brenner et al (ISPDrsquo04) Doll et al (ICCADrsquo94) Try to send flow from congested area or cells that havenrsquot been placed to vacant area with minimum cost Allow temporary small illegality (eg overlap or out of boundary) caused by movement according to the flow WL driven and the deterioration is small from global placement results

Nw flow based detailed placement

i j

i j

n p limitn p

d d d

Our Goals amp Methodology Initial placed circuit

STA amp Determine critical node set (moveC)

TD nw-flow based detailedplacement (TIF) On moveC

TD analytical global placement (TAN)on moveC

New placement w improvedperformance

bull Goalsbull Accurate pre-route delay est

bull Targeted global amp detailed TD re-placement of critical amp near-critical paths

bull Minimal effect on the rest of the circuit

bull Fast

WL and Pre-Route Delay Model

( ) ( ) ( )i j

j i c i cu n

L n x x y y

3 ( ) ( 2)((1 2)( ( ) ( 2) )i j d i j gD u n r l cL n k C

ud (xd yd)up (xp yp)

uq (xq yq)

ui (xi yi) centroidC (xc yc)

Star graph model

ud (xd yd)

ui (xi yi)

up (xp yp)

uq (xq yq)

ld i2

ld i

Delay model

WL calculationWe use a star graph model to calculate WL

Pre-route delay model1( ) ( ( ) ( 1) )j d j gD n R cL n k C

22 ( )

2i j d i d i g

rcD u n l rl C

Driver node driving load capacitance

Self interconnect delay

Self interconnect seeing other interconnect amp load capacitance

of Ctotal

(1-of Ctotal

Fidelity of our model The future model is still under development which modeling nets with multiple star structures

Best results for

Circuit Mac64 Matrix Vp2 Mac32 error

Routed delay 34 38 43 67 0

Our curr model 40 51 62 82 295

Multi-star model 35 49 52 73 155

TD Analytical Global Placement (TAN)

A TD extension of a combination of Gordian and Gordian-L Essentially a quadratic programming approach Use an iterative approach to model the linear terms of delay in objective

function

Critical delay cost of a net Need to focus only on the sinks on the critical paths Formulation

A net with more critical paths through it is more important to optimizemdashcan achieve min on all those paths w one opt step

1 2 3( )

( ) ( ) ( ) ( )i j

c j j i iu critical n

D n D n D u D u

Net w 2 critical pathsthrough it

TD Analytical Global Placement (contd)

Allocated slack of a net A weight measure for determining TD WL reduction of a net Two factor needs to be considered minimum path slack through the

net and of nets in that path

Therefore we uniformly allocate path slack to each net the allocated slack of a net is

( ) ( ( )) ( of nets in path ( ))a j max j max jS n S P n P n

4 4 4

6 6

Before optimization two paths have the same delay

3 3 3

3 3

After optimization one is longer than the other

Net slack= Path slack Observaton Nets with the same weight in TAN tend to have the same length after optimization

2 2 2

3 3

After optimization both paths have approx the same delay

Thus we can get

Equi-delaypaths

Net delay

TD Analytical Global Placement (contd)

( ) ( ) ( ) ( ) ( ( )) ( )jD j c j a j c quad n c lin j a jC n D n S n D D n S n

F ( ( ) ( )) ( )j

c quad j c lin j a jn moveN

D n D n S n

Final objective function to solve min-max via min-sum The delay cost of a net

The objective function

The delay cost part is divided into quadratic and linear part

Quadratic terms Can be solved by normal quadratic programming technique

Linear part The linear terms here is approximated by a quadratic terms as following In the formulation the coordinates in the denominator is the current value We do several iterations until the results convergentThe linear terms of y is dealt in the same way

2

( )( )

( )i c

i ci c

x xx x

x x

TD NW-Flow Based Detailed Placement (TIF)General Purpose

1 Solves the overlap problem form global placer2 Minimizes the deterioration of delay improvement obtd from global placer3 Legalizes the placement satisfying WS constraints

General nw-flow graph

Source

A2

C11 C12 C13 C14

C21 C22 W21 C24

C31 C32 C33

A1

W2

W1

W3

Sink

Row1

Row2

Row3

Flow to legalize A1 position

C12 C13 C14C11C21

C22 C24 W2A1

Cell placement after cells are moved in the flow direction

bull Arc cost = TD cost linear amp step functbull Arc capacity

bull hor how much a cell can move (accuracy issues)bull vert width of head cellbull S moved cell width(cell)bull row T WS of row

ST

Arc Cost in TIF Sensitivity based cost

We define delay of a net to be the delay from its driver cell to its most critical sink cell Consider the net delay change when a cell is moved

Arc cost formulation For a cell we find the most critical nets (belong to path with smallest

slack) connected to it the unit flow cost of the arcs from the cell is

Delay model

ud

ui

up

uq

of Crsquototal

(1-of Crsquototal

ursquoi

ldi

lrsquodi

1

3

( )

( ) ( 2)(1 2)( )

j d d i

b j d i d i

D n R c l

D n r l c l

If ui is the critical sink or driver

2

3

( )

( ) ( 2)(1 2)( ( ) ( 2) )

j d i d i d i g

a j d i j g

D n rcl l r l C

D n r l c L n k C

Otherwise

2 3( ) ( ) 0j a jD n D n

_

1( ) ( ( ))

( ) ( )j

jn critical nets j

cost e D nS n cap e

From experiments gives best results

Tackling Illegalities in TIF The incremental detailed placement problem is a DOP Thus certain illegalities

are introduced in it by using a continuous optimization method There are two

major problems I Discrete flow requirement in vertical arcs

The vertical arc represents vertical cell movement by a discrete amount (dist to nearest row)

flow on it should be either full capacity (cell width) or 0 Nw-flow solution may not meet this requirement Resulting placement problems

Initial placement Resulting Placement The full cost of movement is not incurred in nw-flow Cell moved up has larger area than the nw flow modeled

v

w u x

u

Non-discrete flow

w u x

v

(7 c3) (7 c2)

(5 c1)

w(v)=7

f1=2

f2=3disp(w)=5

disp(u)=5

disp(v)=2

disp(w)=3v

overlap

w(v)=5

w x

u v

Tackling Illegalities in TIF (contd)

Our flow discretizing soln for vertical arc The 3 step process Step1 Initially vertical arc cap=1 cost=full cost Step2 After the first 1 unit flow is passed cap=original cap-1 cost=0 Step3 After all flow is passed The cost and capacity of the adjacent

horizontal arc are updated to 0

Step1

Full cost is incurred

Step2Final placement

w u x

v

(7 c3) (7 c2)

(1 full-cost)f1=1

w(v)=7

w(v)=5w u x

v

(7 c3) (7 c2)

f1=1

w(v)=7

w(v)=5

(40)

f2=4

w u x

v

(inf0)

(inf0)

f1=1

w(v)=7

w(v)=5

(40)

f2=4 u v

w x

disp(u)=5

disp(w)=5

disp(v)=5

Encourage flow to keep going through arc

Step3

Horiz arc costupdated

Tackling illegalities in TIF (contd)II Split flows

This occurs when there are flows on both upward and downward arcs

Two heuristics to solve the problem The two split flow will go through the tree

structure to the sink There are two heuristic

1 Max flow We choose the branch tree with larger flow2 Min cost We choose the branch tree with smaller

flow cost looking at the first k levels

C21 C22

C31 C32

A1

(5c1)

(5c2)f1=2

f2=3

Our experiment shows Max flow heuristic does better

C21 C22

C31 C32

helliphellip

hellip

f1

f2

C12

C23

C33

Tree1

Tree2

Satisfying White Space Constraints

Due to the discrete nature of the detailed placement problem the white space constraint max row width does not exceed a pre-specified limit canrsquot be ensured by the nw-flow process

Two methods are used to deal with this problem Dynamic row size constraint monitoring Push-violation arcs in the next iteration

Non-discrete flow

w u x

v

(7 c3) (7 c2)

(5 c1)

w(v)=7

f1=2

f2=3disp(w)=5

disp(u)=5

w(v)=5

w

u vWS=3 WS=-2

WS violation

Satisfying WS Constraints (contd) Dynamic WS constraint monitoring

Monitor total cell width in each row after every ndashve cycle-based iter improv of nw flow

Initial flow on vertical arc the total cell width is moved to target row Fully reverse flow the total cell width is moved back to orig row To facilitate cell movement we allow temporary white space violati

on in a row for each direction of the flow Once a viol in a direction occurs no further are allowed unless it g

oes to 0 Monitored by top and bottom viol guards Gb and Gt

If violation remains in the row then Push violation arc in the next iteration Thrashing prevented by dis

allowing reverse movement

W=3

W=7

W=9

W=4Min-costflow

Min-costflow

Gb = 0

Gt = 0

Full row

4

-5

Net viol = 0 4 -1

Violated row

S

Otherwise

Global Network Flow Global flow network gives a global view of generally how

flows will go With the global flow we can eliminate detailed-flow arcs that are

not likely to have flow on it This can greatly reduce the cycles in the detailed nw-flow thus

reducing time without obvious improvement deterioration

Row i-1

Row i

Row i+1

A2

A1

Sink

(w(A2)0) (violi 0)

violated row

(w(Wi+1) Ci+1) (w(R

) C

i+1

i))

Ci+1 is probabilistic average of all left-to-right detailed horizontal arc costs in the row

Ci+1I is the weighted average of the detailed vertical arc costs between two rows

65 runtime reduction at the cost of 1-2 timing deterioration

Global nw flow

Detailed nw flow

Physical flow interpretation

All new cells placed amp all viol fixed

No

YesEnd

TIFrsquos High-level Flow

(on inducednetwork)

Benchmarks There are three set of benchmarks Ibm Faraday and TD-Drago

n The Ibm and Faraday are originally not timing benchmarks we

generate synthetic timing characteristics for them The Ibm circuits donrsquot identify FFs We determine FFs in cycles a

nd break all cycles with minimum of FFs The average percentage of FFs is 13

Both suites donrsquot have information of resistance and capacities of cells and interconnects We choose the typical value of 18 microns technique for these parameters

Benchmark Characteristics

Ibm Faraday TD-Dragon

of cells 12506-210341 11734-32622 3093-25616

of nets 13636-201640 11815-33186 3200-26017

critical path length 21-220 16-25 20-60

Efficacy of TD Arc Costs After TAN TD cost 0 cost unit cost

ibm03 331 303 259 273

ibm09 121 84 36 49

Ibm08 307 284 202 241

ibm15 218 188 127 140

ibm18 369 341 290 311

Dsp1 251 213 167 171

Risc1 246 222 197 199

TDMatrix 79 40 05 22

TDMac32 121 77 36 59

TDmac64 139 100 69 78

GeomArith Avg

195 218 1523 185 88 139 117 154

bull Global place (TAN) Detailed place (TD cost) deterioration 43 Detailed place (unit cost) deterioration 78 Detailed place (0 cost) deterioration 107bull 45 deterioration reduction of global place results by going from unit-cost TD-cost

Final Results

Delay improvement for ibm benchmarksmdashinitial placement WL-driven (Dragon)

Delay improvement for Faraday benchmarksmdashinitial placement WL-driven (Dragon)

197

242

206

243

45

37

Delay improv for TD-Dragon benchmarks placed by Dragon (cell delay)

Delay improv for TD-Dragon benchmarks placed by Dragon (no cell delay)

Delay improvement on TD-Dragon placement for different WS constraints

0

2

4

6

8

10

12

matri x vp2 mac32 mac64 avg TAN

3510TAN

bull [Wonjoon amp Bazargan ICCADrsquo03] achieves an avg of 28 improv with 5 WSbull For 5 WS our improvement is 42 (50 relative improvement)

Final Results (contd)

82120

196241

62

102

3845

40

Empirical Asymptotic Time Complexity

y = 0 0057x + 266 51

0200400600800

1000120014001600

0 50000 100000 150000 200000 250000

cel l

runt

ime

Linear curve best fits data

y = 0 6857x + 51 48

0200400600800

10001200140016001800

0 500 1000 1500 2000 2500movabl e cel l s

runt

ime

Linear curve best fits data

bull Runtime is 18 of Dragon and 12 of TD-Dragonbull Obtains a soln for a 210K cct ibm18 w 34 improv in 24 mins

Conclusions Proposed a TD incremental placement flow FlowPlace

Global and detailed incremental placer New accurate pre-route net delay models Can opt both quadratic and the linear delay terms in global placer TD nw flow to solve detailed TD placement

sensitivity-based TD arc costs constraint satisfaction (eg WS) discretization of illegal continuous solns global nw flow graph

Promising results Delay improv up to 34--for a 210K-cell WL-opt layout in 24 mins Delay improv up to 10--for a 26K-cell TD-opt layout in just above 5 mi

ns The average delay improvement is1834 The WL deterioration is an average of 8 The average run time is only 12-18 of original placement runtime

TD-IBM benchmarks and placed outputs avail at the FlowPlace page wwweceuicedu~duttbenchmarks-etcFlowPlaceflowhtml

Concepts can be extended to timing and power optimization with constraints and physical re-synthesis

Satisfying white space constraints: Dynamic WS constraint monitoring

• We monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw flow
  • Initial flow on a vertical arc: the total cell width is moved to the target row
  • Fully reversed flow: the total cell width is moved back to the orig row
• To facilitate cell movement, we allow temporary white space violation under constraints
  • Viol_max = max cell width
  • Violations from above and from below are calculated separately

[Figure: step-by-step example tracking row widths (W), white space (WS) and the violation counters vio_top and vio_bot as cells u, v and x move between rows; the final step shows WS=0 with vio_top=0 and vio_bot=0]

Because flow is allowed in step two, thanks to the separate violation limits for flow from above and from below, the placement can finally be legalized.


Motivation

• Placement in high-performance designs
  • Has a large effect on performance metrics, e.g., timing and power
  • Fast timing closure is a major but often hard-to-realize goal
  • Need to meet several metrics at the same time
• Incremental timing-driven placement
  • Start from an initial placement; improve timing incrementally on critical paths
  • More accurate timing information can be acquired from the initial placement
  • Minimizes the effect on the other metrics of the initial placement, so convergence is a byproduct
  • Also important for ECO applications

Prior Work

• Existing timing-driven placement
  • Path-based: minimize the critical paths directly
    • Pros: timing is essentially path-based
    • Cons: excessive number of paths
  • Net-based: transform timing into net weights or net budgets
    • Pros: low complexity, flexible
    • Cons: often ignores path information; has a convergence problem
• The net-based approach is the most common method
  • Kahng et al. (ISPD'02)
    • Minimize the max weighted net delay using LP, with net weights based on the max path delay violation through the net
    • All paths meet constraints simultaneously
    • Can fit into a standard WL-driven top-down design flow
  • Yang et al. (ICCAD'02)
    • New slack allocation approach, which assigns more slack to nets with larger estimated WL and fanout
    • Minimizes total net delay violation using simulated annealing
    • Achieves more efficient slack usage in the final placement

Prior Work (cont)

• Incremental TD placement
  • Wonjoon et al. (ICCAD'03)
    • Path-based constraints for every violated path p_j (has a maximum path limit)
      [Equation: constraint bounding the net delays d of the nets n_i on each violated path p_j by a delay limit]
    • Simple bisection method to remove overlap; no control of delay change
  • Luo et al. (DAC'06)
    • Considers both cells in the critical path and cells that are logically adjacent to the critical path, to control timing perturbation
    • Delay model with delay/slew propagation
  • Both algorithms use LP for re-placement, which doesn't address the quadratic part of the delay accurately
• Nw-flow based detailed placement
  • Brenner et al. (ISPD'04), Doll et al. (ICCAD'94)
    • Try to send flow from congested areas, or from cells that haven't been placed, to vacant areas with minimum cost
    • Allow temporary small illegality (e.g., overlap or out of boundary) caused by movement according to the flow
    • WL-driven, and the deterioration from global placement results is small

Our Goals & Methodology

Methodology:
1. Initial placed circuit
2. STA & determine critical node set (moveC)
3. TD analytical global placement (TAN) on moveC
4. TD nw-flow based detailed placement (TIF) on moveC
5. New placement w/ improved performance

• Goals
  • Accurate pre-route delay estimation
  • Targeted global & detailed TD re-placement of critical & near-critical paths
  • Minimal effect on the rest of the circuit
  • Fast

WL and Pre-Route Delay Model

WL calculation: we use a star graph model to calculate WL.

$$L(n_j) = \sum_{u_i \in n_j} \left( |x_i - x_c| + |y_i - y_c| \right)$$

[Figure: star graph model with driver u_d (x_d, y_d), sinks u_i (x_i, y_i), u_p (x_p, y_p), u_q (x_q, y_q) and centroid C (x_c, y_c)]

Pre-route delay model:

• Driver node driving the load capacitance:
$$D_1(n_j) = R_d \left( c\,L(n_j) + (k-1)\,C_g \right)$$
• Self-interconnect delay:
$$D_2(u_i, n_j) = \frac{rc}{2}\, l_{d,i}^2 + r\, l_{d,i}\, C_g$$
• Self interconnect seeing other interconnect & load capacitance:
$$D_3(u_i, n_j) = \frac{r\, l_{d,i}}{2} \cdot \frac{1}{2} \left( c\,L(n_j) + (k-2)\,C_g \right)$$

[Figure: delay model for sink u_i at distance l_{d,i} from driver u_d, with the midpoint l_{d,i}/2 marked and C_total split into two fractions]
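A minimal sketch of the star-model wire length and the three pre-route delay terms follows. It assumes the usual Elmore-style reading of the symbols, none of which is spelled out on the slide: R_d is the driver resistance, r and c are the per-unit-length wire resistance and capacitance, C_g is the gate input capacitance, k is the number of pins on the net, l_di is the driver-to-sink wire length, and the centroid is the mean of the pin coordinates. The function names and sample numbers are hypothetical.

```python
def star_wirelength(pins):
    """Star-model WL: sum of Manhattan distances from each pin to the centroid."""
    xc = sum(x for x, _ in pins) / len(pins)   # assumed: centroid = mean of pins
    yc = sum(y for _, y in pins) / len(pins)
    return sum(abs(x - xc) + abs(y - yc) for x, y in pins)


def d1(R_d, c, L, k, C_g):
    """Driver resistance charging the whole wire plus the k-1 sink loads."""
    return R_d * (c * L + (k - 1) * C_g)


def d2(r, c, l_di, C_g):
    """Driver-to-sink wire charging its own capacitance and the sink load."""
    return 0.5 * r * c * l_di ** 2 + r * l_di * C_g


def d3(r, c, l_di, L, k, C_g):
    """Half of the driver-to-sink wire resistance seeing half of the other load."""
    return (r * l_di / 2.0) * 0.5 * (c * L + (k - 2) * C_g)


# Hypothetical 4-pin net; the first pin is the driver.
pins = [(0.0, 0.0), (4.0, 0.0), (4.0, 6.0), (0.0, 6.0)]
L = star_wirelength(pins)                      # centroid is (2, 3), so L = 4 * 5
delay_driver = d1(R_d=2.0, c=0.5, L=L, k=4, C_g=1.0)
delay_self = d2(r=0.1, c=0.5, l_di=10.0, C_g=1.0)
delay_other = d3(r=0.1, c=0.5, l_di=10.0, L=L, k=4, C_g=1.0)
```

Keeping l_di as an explicit argument avoids committing to how the driver-to-sink length is measured, which the slides do not state.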

Fidelity of our model: a future model, still under development, models nets with multiple star structures.

| Circuit | Mac64 | Matrix | Vp2 | Mac32 | % error |
|---|---|---|---|---|---|
| Routed delay | 34 | 38 | 43 | 67 | 0 |
| Our curr model | 40 | 51 | 62 | 82 | 29.5 |
| Multi-star model | 35 | 49 | 52 | 73 | 15.5 |

TD Analytical Global Placement (TAN)

• A TD extension of a combination of Gordian and Gordian-L
  • Essentially a quadratic programming approach
  • Uses an iterative approach to model the linear terms of delay in the objective function
• Critical delay cost of a net
  • Need to focus only on the sinks on the critical paths
  • Formulation:
$$D_c(n_j) = D_1(n_j) + \sum_{u_i \in \mathrm{critical}(n_j)} \left( D_2(u_i) + D_3(u_i) \right)$$
  • A net with more critical paths through it is more important to optimize: one optimization step achieves minimization on all those paths

[Figure: net with 2 critical paths through it]

TD Analytical Global Placement (contd)

• Allocated slack of a net
  • A weight measure for determining the TD WL reduction of a net
  • Two factors need to be considered: the minimum path slack through the net, and the # of nets in that path
  • Therefore, we uniformly allocate path slack to each net; the allocated slack of a net is
$$S_a(n_j) = \frac{S(P_{max}(n_j))}{\#\text{ of nets in path } P_{max}(n_j)}$$
• Observation: nets with the same weight in TAN tend to have the same length after optimization
• Example (net delays along two paths):
  • Before optimization, the two paths have the same delay: net delays 4, 4, 4 and 6, 6
  • With net slack = path slack: net delays 3, 3, 3 and 3, 3, so after optimization one path is longer than the other
  • With allocated slack: net delays 2, 2, 2 and 3, 3, so after optimization both paths have approx the same delay
• Thus we can get equi-delay paths
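The uniform slack allocation can be sketched as follows. The sketch assumes that P_max is the minimum-slack path through the net and, following the slide's example, that optimized net delays end up roughly proportional to the allocated slack; the function names and the slack value of 6 are hypothetical.

```python
def allocated_slack(path_slack, nets_in_path):
    """Uniformly split a path's slack over the nets on that path."""
    return path_slack / nets_in_path


def net_allocated_slack(paths_through_net):
    """paths_through_net: list of (path_slack, nets_in_path) for one net.

    Assumption: P_max is the path with the smallest slack through the net.
    """
    slack, count = min(paths_through_net, key=lambda p: p[0])
    return allocated_slack(slack, count)


# Two paths with the same slack (6), one with 3 nets and one with 2 nets.
per_net_a = allocated_slack(6.0, 3)   # each net on the 3-net path
per_net_b = allocated_slack(6.0, 2)   # each net on the 2-net path
path_a_total = 3 * per_net_a
path_b_total = 2 * per_net_b
```

With these numbers the per-net values are 2 and 3, matching the 2, 2, 2 and 3, 3 pattern of the example, and both path totals are equal.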

TD Analytical Global Placement (contd)

• Final objective function: solves min-max via min-sum
  • The delay cost of a net:
$$C_D(n_j) = \frac{D_c(n_j)}{S_a(n_j)} = \frac{D_{c,quad}(n_j) + D_{c,lin}(n_j)}{S_a(n_j)}$$
  • The objective function:
$$F = \sum_{n_j \in moveN} \frac{D_{c,quad}(n_j) + D_{c,lin}(n_j)}{S_a(n_j)}$$
• The delay cost is divided into a quadratic part and a linear part
  • Quadratic terms: can be solved by the normal quadratic programming technique
  • Linear part: each linear term is approximated by a quadratic term as follows, where the coordinates in the denominator take their current values. We run several iterations until the results converge. The linear terms in y are handled the same way.
$$|x_i - x_c| \approx \frac{(x_i - x_c)^2}{|x_i - x_c|_{\mathrm{current}}}$$
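The iterative treatment of linear terms can be illustrated on a 1-D toy problem: minimizing the sum of |x - a_k| by repeatedly solving the quadratic surrogate in which each term is (x - a_k)^2 divided by the current |x - a_k|. This is a generic sketch of the reweighting idea, not the TAN solver; the function name, the eps guard against division by zero, and the data are assumptions.

```python
def minimize_abs_sum(anchors, iterations=50, eps=1e-9):
    """Minimize sum(|x - a|) by iteratively reweighted quadratic solves."""
    x = sum(anchors) / len(anchors)            # pure quadratic solution (the mean)
    for _ in range(iterations):
        # Denominators use the current x, as in the approximation above.
        weights = [1.0 / max(abs(x - a), eps) for a in anchors]
        # Closed-form minimizer of sum(w * (x - a)^2) is the weighted mean.
        x = sum(w * a for w, a in zip(weights, anchors)) / sum(weights)
    return x


x_star = minimize_abs_sum([0.0, 1.0, 10.0])
print(x_star)
```

For these three anchors the iteration moves from the mean (about 3.67) toward the median, 1.0, which is the minimizer of the linear objective.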

TD NW-Flow Based Detailed Placement (TIF)

General purpose:
1. Solves the overlap problem from the global placer
2. Minimizes the deterioration of the delay improvement obtained from the global placer
3. Legalizes the placement, satisfying WS constraints

[Figure: general nw-flow graph. The source S feeds the new cells A1 and A2; cell nodes C11 to C14 (Row1), C21, C22, C24 with white space W21 (Row2) and C31 to C33 (Row3) lead to the row white-space nodes W1, W2, W3 and on to the sink T. A highlighted flow legalizes the position of A1.]

[Figure: cell placement after cells are moved in the flow direction]

• Arc cost = TD cost: linear & step functions
• Arc capacity
  • Horizontal: how much a cell can move (accuracy issues)
  • Vertical: width of the head cell
  • S to moved cell: width(cell)
  • Row to T: WS of the row
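To make the arc roles concrete, here is a small min-cost-flow sketch on a hypothetical two-row instance. It uses a generic successive-shortest-path solver, which is not necessarily the solver used in TIF, and plain integer costs in place of the TD arc costs. The node names, widths, white-space amounts and costs are all invented for illustration; only the capacity rules (S to moved cell = cell width, vertical = width of head cell, row to T = WS of row) come from the slide.

```python
from collections import defaultdict


class MinCostFlow:
    """Successive-shortest-path min-cost flow (Bellman-Ford), for small graphs."""

    def __init__(self):
        self.adj = defaultdict(list)   # node -> indices into self.arcs
        self.arcs = []                 # each arc: [head, residual capacity, unit cost]

    def add_arc(self, tail, head, cap, cost):
        self.adj[tail].append(len(self.arcs))
        self.arcs.append([head, cap, cost])
        self.adj[head].append(len(self.arcs))
        self.arcs.append([tail, 0, -cost])     # paired residual (reverse) arc

    def solve(self, source, sink, demand):
        flow = total_cost = 0
        while flow < demand:
            dist, pred = {source: 0}, {}
            for _ in range(len(self.adj)):     # Bellman-Ford passes
                updated = False
                for u in list(self.adj):
                    if u not in dist:
                        continue
                    for idx in self.adj[u]:
                        head, cap, cost = self.arcs[idx]
                        if cap > 0 and dist[u] + cost < dist.get(head, float("inf")):
                            dist[head], pred[head] = dist[u] + cost, idx
                            updated = True
                if not updated:
                    break
            if sink not in dist:
                break                          # no augmenting path is left
            push, v = demand - flow, sink
            while v != source:                 # bottleneck capacity on the path
                push = min(push, self.arcs[pred[v]][1])
                v = self.arcs[pred[v] ^ 1][0]
            v = sink
            while v != source:                 # augment along the path
                self.arcs[pred[v]][1] -= push
                self.arcs[pred[v] ^ 1][1] += push
                v = self.arcs[pred[v] ^ 1][0]
            flow += push
            total_cost += push * dist[sink]
        return flow, total_cost


net = MinCostFlow()
net.add_arc("S", "A1", 3, 0)      # S to moved cell: width(A1) = 3
net.add_arc("A1", "C22", 3, 1)    # horizontal: A1 pushes C22 sideways
net.add_arc("C22", "W2", 3, 1)    # horizontal: toward the white space of Row2
net.add_arc("W2", "T", 1, 0)      # row to T: WS of Row2 = 1
net.add_arc("C22", "C12", 2, 4)   # vertical: width of head cell C22 = 2
net.add_arc("C12", "W1", 5, 1)    # horizontal within Row1
net.add_arc("W1", "T", 4, 0)      # row to T: WS of Row1 = 4
flow, cost = net.solve("S", "T", demand=3)
print(flow, cost)
```

In this instance one unit of A1's width is absorbed by Row2's white space and the remaining two units go over the vertical arc, which happens to saturate it; in general a min-cost-flow solution gives no such guarantee.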

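Arc Cost in TIF

• Sensitivity-based cost
  • We define the delay of a net to be the delay from its driver cell to its most critical sink cell
  • Consider the net delay change when a cell is moved

[Figure: delay model after sink u_i moves to u'_i; the driver-to-sink length changes from l_{d,i} to l'_{d,i} and the total capacitance from C_total to C'_total]

Delay changes of net n_j when l_{d,i} changes by \Delta l_{d,i}:

$$\Delta D_1(n_j) = R_d\, c\, \Delta l_{d,i}$$
$$\Delta D_3^b(n_j) = \frac{r\, l_{d,i}}{2} \cdot \frac{1}{2}\, c\, \Delta l_{d,i}$$

If u_i is the critical sink or driver:

$$\Delta D_2(n_j) = r c\, l_{d,i}\, \Delta l_{d,i} + r\, \Delta l_{d,i}\, C_g$$
$$\Delta D_3^a(n_j) = \frac{r\, \Delta l_{d,i}}{2} \cdot \frac{1}{2} \left( c\, L(n_j) + (k-2)\, C_g \right)$$

Otherwise:

$$\Delta D_2(n_j) = \Delta D_3^a(n_j) = 0$$

• Arc cost formulation
  • For a cell, we find the most critical nets (those belonging to the path with the smallest slack) connected to it; the unit flow cost of the arcs from the cell is
$$cost(e) = \frac{1}{cap(e)} \sum_{n_j \in critical\_nets} \frac{\Delta D(n_j)}{S(n_j)}$$
  • From experiments, [value missing in the source] gives best results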
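The sensitivity terms can be checked numerically against the delay model. The sketch below assumes the forms D_2 = (rc/2) l^2 + r l C_g and D_3 = (r l / 2)(1/2)(c L + (k-2) C_g), and assumes that moving the sink by dl changes both the driver-to-sink length l and the net length L by dl. The function names and the parameter values are hypothetical.

```python
def d2(r, c, l, C_g):
    return 0.5 * r * c * l * l + r * l * C_g


def d3(r, c, l, L, k, C_g):
    return (r * l / 2.0) * 0.5 * (c * L + (k - 2) * C_g)


def delta_d2(r, c, l, dl, C_g):
    """First-order change of D_2 when l changes by dl."""
    return r * c * l * dl + r * dl * C_g


def delta_d3(r, c, l, dl, L, k, C_g):
    """First-order change of D_3: part a (l changes) plus part b (L changes)."""
    part_a = (r * dl / 2.0) * 0.5 * (c * L + (k - 2) * C_g)
    part_b = (r * l / 2.0) * 0.5 * (c * dl)
    return part_a + part_b


r, c, C_g, k = 0.1, 0.2, 1.0, 4      # hypothetical electrical parameters
l, L, dl = 10.0, 30.0, 0.01          # hypothetical lengths and displacement

exact_d2 = d2(r, c, l + dl, C_g) - d2(r, c, l, C_g)
exact_d3 = d3(r, c, l + dl, L + dl, k, C_g) - d3(r, c, l, L, k, C_g)
err_d2 = exact_d2 - delta_d2(r, c, l, dl, C_g)           # equals (r*c/2) * dl**2
err_d3 = exact_d3 - delta_d3(r, c, l, dl, L, k, C_g)     # equals (r*c/4) * dl**2
```

The residuals are second order in dl, so the linear sensitivities are accurate for small cell movements and degrade as the displacement grows.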

Tackling Illegalities in TIF

The incremental detailed placement problem is a DOP (discrete optimization problem); thus certain illegalities are introduced by solving it with a continuous optimization method. There are two major problems.

I. Discrete flow requirement in vertical arcs
• A vertical arc represents vertical cell movement by a discrete amount (the distance to the nearest row)
• The flow on it should be either the full capacity (cell width) or 0
• The nw-flow solution may not meet this requirement
• Resulting placement problems:
  • The full cost of movement is not incurred in the nw flow
  • The cell moved up has a larger area than the nw flow modeled

[Figure: initial placement, non-discrete flow and resulting placement for cells w, u, x and v. Arcs are labeled (capacity, cost): (7, c3), (7, c2), (5, c1); flows f1=2 and f2=3; the resulting placement contains an overlap]

Tackling Illegalities in TIF (contd)

Our flow-discretizing solution for a vertical arc is a 3-step process:
• Step 1: initially, the vertical arc has cap=1 and cost=full cost, so the full cost is incurred
• Step 2: after the first unit of flow has passed, cap=original cap-1 and cost=0, which encourages flow to keep going through the arc
• Step 3: after all flow has passed, the cost and capacity of the adjacent horizontal arc are updated to 0

[Figure: Steps 1 to 3 and final placement for cells w, u, x and v. Step 1: vertical arc (1, full-cost), f1=1. Step 2: vertical arc (4, 0), f2=4. Step 3: horizontal arc cost updated, arcs shown as (inf, 0). Final placement: disp(u)=5, disp(w)=5, disp(v)=5]
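Steps 1 and 2 can be sketched as a small state function for one vertical arc. This is only an illustration of the (capacity, cost) schedule described above, not the TIF implementation; Step 3 (the update of the adjacent horizontal arcs) is left out, and the function names and the sample cost of 12 are hypothetical.

```python
def vertical_arc_state(cell_width, full_cost, flow_sent):
    """Return (capacity, unit cost) offered by a vertical arc after flow_sent units."""
    if flow_sent == 0:
        return 1, full_cost              # Step 1: one unit, charged the full move cost
    return cell_width - flow_sent, 0     # Step 2: the remaining width is free


def cost_of_sending(cell_width, full_cost, units):
    """Total cost charged for pushing `units` of flow through the arc."""
    if units > cell_width:
        raise ValueError("flow cannot exceed the cell width")
    sent = total = 0
    while sent < units:
        cap, unit_cost = vertical_arc_state(cell_width, full_cost, sent)
        step = min(cap, units - sent)
        total += step * unit_cost
        sent += step
    return total


partial = cost_of_sending(cell_width=5, full_cost=12, units=2)
full = cost_of_sending(cell_width=5, full_cost=12, units=5)
```

Because a partial flow is charged as much as a full one, the optimizer has no incentive to stop short of the full cell width once any flow uses the arc.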

Tackling Illegalities in TIF (contd)

II. Split flows
• This occurs when there are flows on both the upward and the downward arcs
• The two split flows each go through a tree structure to the sink
• Two heuristics to solve the problem:
  1. Max flow: we choose the branch tree with the larger flow
  2. Min cost: we choose the branch tree with the smaller flow cost, looking at the first k levels
• Our experiments show that the max-flow heuristic does better

[Figure: cell A1 between the rows C21, C22 and C31, C32, with vertical arcs (5, c1) and (5, c2) carrying f1=2 and f2=3; the flows f1 and f2 continue through Tree1 and Tree2 (via C12, C23, C33) to the sink]
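Both heuristics reduce to a simple selection between the two branch trees, sketched below. The data layout (a dict of flows, and a dict of per-level costs for each tree) and the function names are assumptions made for illustration; the cost values are invented.

```python
def pick_branch_max_flow(flows):
    """flows: branch name -> flow on its vertical arc; keep the larger flow."""
    return max(flows, key=flows.get)


def pick_branch_min_cost(level_costs, k):
    """level_costs: branch name -> list of flow costs per tree level.

    Compare the branches on the summed cost of their first k levels.
    """
    return min(level_costs, key=lambda branch: sum(level_costs[branch][:k]))


flows = {"Tree1": 2, "Tree2": 3}                        # f1=2, f2=3 as in the figure
level_costs = {"Tree1": [1, 1, 9], "Tree2": [2, 2, 1]}  # hypothetical costs

by_flow = pick_branch_max_flow(flows)
by_cost = pick_branch_min_cost(level_costs, k=2)
```

The two rules can disagree, as they do here, which is why the slides compare them experimentally.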

Satisfying White Space Constraints

• Due to the discrete nature of the detailed placement problem, the white space constraint (the max row width does not exceed a pre-specified limit) can't be ensured by the nw-flow process
• Two methods are used to deal with this problem:
  • Dynamic row size constraint monitoring
  • Push-violation arcs in the next iteration

[Figure: non-discrete flow example with arcs (7, c3), (7, c2), (5, c1) and flows f1=2, f2=3; the resulting rows have WS=3 and WS=-2, i.e., a WS violation]

Satisfying WS Constraints (contd)

• Dynamic WS constraint monitoring
  • Monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw flow
    • Initial flow on a vertical arc: the total cell width is moved to the target row
    • Fully reversed flow: the total cell width is moved back to the orig row
  • To facilitate cell movement, we allow a temporary white space violation in a row for each direction of the flow
  • Once a violation in a direction occurs, no further violations are allowed unless it goes to 0
  • Monitored by the top and bottom violation guards, Gt and Gb
• If a violation remains in the row, then
  • Push-violation arc in the next iteration
  • Thrashing is prevented by disallowing reverse movement

[Figure: min-cost-flow example on a full row with guards Gb = 0 and Gt = 0 and cells of width W=3, 7, 9 and 4, and a violated row attached to the source S]

Global Network Flow

• The global flow network gives a global view of how flows will generally go
• With the global flow, we can eliminate detailed-flow arcs that are not likely to carry flow
• This can greatly reduce the cycles in the detailed nw flow, thus reducing time without obvious deterioration of the improvement
• 65% runtime reduction at the cost of 1-2% timing deterioration

[Figure: global nw-flow graph over Row i-1, Row i and Row i+1 with new cells A1, A2, a violated row and the sink; arcs are labeled (capacity, cost), e.g., (w(A2), 0), (viol_i, 0), (w(W_{i+1}), C_{i+1}) and (w(R_i), C_{i+1,i})]

• C_{i+1} is the probabilistic average of all left-to-right detailed horizontal arc costs in the row
• C_{i+1,i} is the weighted average of the detailed vertical arc costs between two rows

TIF's High-level Flow

[Flowchart: global nw flow, then detailed nw flow (on induced network), then physical flow interpretation; if all new cells are placed & all violations fixed, end (Yes), otherwise repeat (No)]

Benchmarks

• There are three sets of benchmarks: Ibm, Faraday and TD-Dragon
• The Ibm and Faraday suites are originally not timing benchmarks; we generate synthetic timing characteristics for them
  • The Ibm circuits don't identify FFs. We determine FFs in cycles and break all cycles with a minimum # of FFs. The average percentage of FFs is 13
  • Both suites lack resistance and capacitance information for cells and interconnects. We choose typical values of the 0.18 micron technology for these parameters

Benchmark Characteristics

| | Ibm | Faraday | TD-Dragon |
|---|---|---|---|
| # of cells | 12506-210341 | 11734-32622 | 3093-25616 |
| # of nets | 13636-201640 | 11815-33186 | 3200-26017 |
| Critical path length | 21-220 | 16-25 | 20-60 |

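Efficacy of TD Arc Costs

Delay improvement (%):

| Circuit | After TAN | TD cost | 0 cost | Unit cost |
|---|---|---|---|---|
| ibm03 | 33.1 | 30.3 | 25.9 | 27.3 |
| ibm09 | 12.1 | 8.4 | 3.6 | 4.9 |
| ibm08 | 30.7 | 28.4 | 20.2 | 24.1 |
| ibm15 | 21.8 | 18.8 | 12.7 | 14.0 |
| ibm18 | 36.9 | 34.1 | 29.0 | 31.1 |
| Dsp1 | 25.1 | 21.3 | 16.7 | 17.1 |
| Risc1 | 24.6 | 22.2 | 19.7 | 19.9 |
| TDMatrix | 7.9 | 4.0 | 0.5 | 2.2 |
| TDMac32 | 12.1 | 7.7 | 3.6 | 5.9 |
| TDMac64 | 13.9 | 10.0 | 6.9 | 7.8 |
| Geom. avg | 19.5 | 15.23 | 8.8 | 11.7 |
| Arith. avg | 21.8 | 18.5 | 13.9 | 15.4 |

• Global place (TAN) to detailed place (TD cost): deterioration of 4.3%
• Global place (TAN) to detailed place (unit cost): deterioration of 7.8%
• Global place (TAN) to detailed place (0 cost): deterioration of 10.7%
• 45% reduction in the deterioration of global place results by going from unit cost to TD cost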
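The averages and the 45% figure can be re-derived from the per-circuit rows. The sketch below assumes that every table entry is a delay improvement in percent with one decimal digit (33.1 for ibm03 after TAN, and so on) and that the deterioration figures are differences of the geometric averages; the variable and function names are hypothetical.

```python
import math

# circuit: (after TAN, TD cost, 0 cost, unit cost), delay improvement in percent
improvement = {
    "ibm03": (33.1, 30.3, 25.9, 27.3),
    "ibm09": (12.1, 8.4, 3.6, 4.9),
    "ibm08": (30.7, 28.4, 20.2, 24.1),
    "ibm15": (21.8, 18.8, 12.7, 14.0),
    "ibm18": (36.9, 34.1, 29.0, 31.1),
    "Dsp1": (25.1, 21.3, 16.7, 17.1),
    "Risc1": (24.6, 22.2, 19.7, 19.9),
    "TDMatrix": (7.9, 4.0, 0.5, 2.2),
    "TDMac32": (12.1, 7.7, 3.6, 5.9),
    "TDMac64": (13.9, 10.0, 6.9, 7.8),
}


def arith_mean(values):
    return sum(values) / len(values)


def geom_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


columns = list(zip(*improvement.values()))        # one tuple per table column
arith = [round(arith_mean(col), 1) for col in columns]
geom = [geom_mean(col) for col in columns]

tan, td_cost, zero_cost, unit_cost = geom
det_td = tan - td_cost                            # deterioration with TD cost
det_unit = tan - unit_cost                        # deterioration with unit cost
det_zero = tan - zero_cost                        # deterioration with 0 cost
reduction = (det_unit - det_td) / det_unit        # unit cost to TD cost
print(arith, [round(g, 2) for g in geom], round(100 * reduction))
```

The recomputed averages agree with the two average rows, and the geometric averages reproduce the 4.3, 7.8 and 10.7 deterioration figures and the roughly 45% reduction.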

Final Results

[Figure: delay improvement for ibm benchmarks; initial placement WL-driven (Dragon)]

[Figure: delay improvement for Faraday benchmarks; initial placement WL-driven (Dragon)]

[Figure: delay improvement for TD-Dragon benchmarks placed by Dragon (cell delay)]

[Figure: delay improvement for TD-Dragon benchmarks placed by Dragon (no cell delay)]





Prior Work (cont)

Incremental TD placement

• Wonjoon et al. (ICCAD'03)
  • Path-based constraints for every violated path p_j (with a maximum path limit): a constraint on the net delays d of the nets n_i in p_j against a delay limit.
  • Simple bisection method to remove overlap; no control of delay change.
• Luo et al. (DAC'06)
  • Consider both cells on the critical path and cells that are logically adjacent to the critical path, to control timing perturbation.
  • Delay model with delay/slew propagation.
• Both algorithms use LP for re-placement, which doesn't address the quadratic part of the delay accurately.

Nw-flow based detailed placement

• Brenner et al. (ISPD'04), Doll et al. (ICCAD'94)
  • Try to send flow from congested areas, or from cells that haven't been placed, to vacant areas with minimum cost.
  • Allow temporary small illegality (e.g. overlap or out of boundary) caused by movement according to the flow.
  • WL-driven, and the deterioration from global placement results is small.

Our Goals & Methodology

Flow
1. Initial placed circuit
2. STA & determine critical node set (moveC)
3. TD analytical global placement (TAN) on moveC
4. TD nw-flow based detailed placement (TIF) on moveC
5. New placement w/ improved performance

Goals
• Accurate pre-route delay est.
• Targeted global & detailed TD re-placement of critical & near-critical paths
• Minimal effect on the rest of the circuit
• Fast

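WL and Pre-Route Delay Model

WL calculation

We use a star graph model to calculate WL: every pin u_i (x_i, y_i) of net n_j, including the driver u_d (x_d, y_d), connects to the centroid C (x_c, y_c).

$$L(n_j) = \sum_{u_i \in n_j} \left( |x_i - x_c| + |y_i - y_c| \right)$$

Pre-route delay model

The symbols are not defined on the slide; R_d, r, c and C_g appear to denote driver resistance, unit wire resistance, unit wire capacitance and gate load capacitance, k the number of pins of the net, and l_{d,i} the driver-to-sink wire length.

• Driver node driving load capacitance:
$$D_1(n_j) = R_d \left( c\,L(n_j) + (k-1)\,C_g \right)$$
• Self interconnect delay:
$$D_2(u_i, n_j) = \frac{rc}{2}\, l_{d,i}^2 + r\, l_{d,i}\, C_g$$
• Self interconnect seeing other interconnect & load capacitance:
$$D_3(u_i, n_j) = \frac{r\, l_{d,i}}{2} \cdot \frac{1}{2} \left( c\,L(n_j) + (k-2)\,C_g \right)$$

[Figure: star graph model and delay model for pins u_d (x_d, y_d), u_i (x_i, y_i), u_p (x_p, y_p), u_q (x_q, y_q). The delay-model panel marks l_{d,i} and l_{d,i}/2 and splits C_total into two fractions; the fraction symbol is lost in the source.]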
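
The wirelength and the three delay terms of this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes that k is the number of pins on the net, that l_{d,i} is the Manhattan distance from driver to sink, and that the slashes lost in the source are plain divisions. The function names and the parameter values in the usage note are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def star_wirelength(pins: List[Point]) -> float:
    """L(n_j): sum of Manhattan distances from every pin to the pin centroid."""
    xc = sum(x for x, _ in pins) / len(pins)
    yc = sum(y for _, y in pins) / len(pins)
    return sum(abs(x - xc) + abs(y - yc) for x, y in pins)


def sink_delay_terms(pins: List[Point], driver: Point, sink: Point,
                     R_d: float, r: float, c: float, C_g: float):
    """Return (D1, D2, D3) for one sink of a net; `pins` holds driver and sinks."""
    k = len(pins)                       # assumption: k = number of pins
    L = star_wirelength(pins)
    # assumption: l_{d,i} is the Manhattan driver-to-sink distance
    l_di = abs(driver[0] - sink[0]) + abs(driver[1] - sink[1])
    d1 = R_d * (c * L + (k - 1) * C_g)              # driver drives the whole load
    d2 = 0.5 * r * c * l_di ** 2 + r * l_di * C_g   # self interconnect delay
    d3 = (r * l_di / 2) * 0.5 * (c * L + (k - 2) * C_g)  # wire seeing other load
    return d1, d2, d3
```

For example, with pins at (0, 0), (2, 0) and (1, 3), the centroid is (1, 1) and `star_wirelength` returns 6.0.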

Fidelity of our model

A future model, still under development, models nets with multiple star structures.

Circuit             Mac64   Matrix   Vp2   Mac32   Error (%)
Routed delay           34       38    43      67         0
Our curr. model        40       51    62      82      29.5
Multi-star model       35       49    52      73      15.5

(Delay values are shown as printed; any decimal point in them is lost in the source.)
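
As a sanity check on the error column, the sketch below computes the mean relative error of each model against the routed delay, using the delay values as printed. That "error" means the mean relative error over the four circuits is an assumption; under it the results come out close to the slide's two error figures.

```python
routed = {"Mac64": 34, "Matrix": 38, "Vp2": 43, "Mac32": 67}
current_model = {"Mac64": 40, "Matrix": 51, "Vp2": 62, "Mac32": 82}
multi_star_model = {"Mac64": 35, "Matrix": 49, "Vp2": 52, "Mac32": 73}


def mean_relative_error(model, reference):
    """Average of |model - reference| / reference over all circuits, in percent."""
    errors = [abs(model[c] - reference[c]) / reference[c] for c in reference]
    return 100.0 * sum(errors) / len(errors)


err_current = mean_relative_error(current_model, routed)
err_multi_star = mean_relative_error(multi_star_model, routed)
print(round(err_current, 1), round(err_multi_star, 1))
```

Because only ratios enter the computation, the result does not depend on where the lost decimal point in the delay values belongs.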

TD Analytical Global Placement (TAN)

• A TD extension of a combination of Gordian and Gordian-L
  • Essentially a quadratic programming approach
  • Uses an iterative approach to model the linear terms of delay in the objective function
• Critical delay cost of a net
  • Need to focus only on the sinks on the critical paths
  • Formulation:

$$D_c(n_j) = D_1(n_j) + \sum_{u_i\ \text{critical in}\ n_j} \left( D_2(u_i, n_j) + D_3(u_i, n_j) \right)$$

  • A net with more critical paths through it is more important to optimize: we can achieve minimization on all those paths with one opt. step.

[Figure: net w/ 2 critical paths through it]

TD Analytical Global Placement (contd)

Allocated slack of a net

• A weight measure for determining the TD WL reduction of a net.
• Two factors need to be considered: the minimum path slack through the net, and the # of nets in that path.
• Therefore we uniformly allocate path slack to each net; the allocated slack of a net is

$$S_a(n_j) = \frac{S(P_{max}(n_j))}{\#\ \text{of nets in path}\ P_{max}(n_j)}$$

  where P_max(n_j) is the most critical path through n_j (the one with minimum slack) and S(P) is the slack of path P.

Example (net delays along two paths)
• Before optimization the two paths have the same delay: 4 + 4 + 4 and 6 + 6.
• Net slack = path slack: after optimization the net delays are 3, 3, 3 and 3, 3, so one path is longer than the other. Observation: nets with the same weight in TAN tend to have the same length after optimization.
• With allocated slack: after optimization the net delays are 2, 2, 2 and 3, 3, so both paths have approx. the same delay. Thus we can get equi-delay paths.
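
The uniform slack allocation can be sketched in Python as below. This is an illustrative sketch only: the data layout (paths as lists of net names) and the function name are hypothetical, and "the path used for a net" is taken to be the minimum-slack path through it, as the slide's wording suggests.

```python
def allocated_slack(paths, path_slack):
    """Return {net: S_a(net)}.

    paths:      {path_name: [net names on that path]}
    path_slack: {path_name: slack of that path}
    For each net, the minimum-slack path through it is selected and its
    slack is divided uniformly by the number of nets on that path.
    """
    best = {}  # net -> (slack of selected path, allocated slack)
    for name, nets in paths.items():
        candidate = (path_slack[name], path_slack[name] / len(nets))
        for net in nets:
            if net not in best or candidate[0] < best[net][0]:
                best[net] = candidate
    return {net: value[1] for net, value in best.items()}


# Hypothetical example: net n2 lies on both paths; P2 is more critical.
example = allocated_slack(
    {"P1": ["n1", "n2"], "P2": ["n2", "n3", "n4"]},
    {"P1": 4.0, "P2": 3.0},
)
```

Here n1 receives 4.0 / 2 = 2.0, while n2, n3 and n4 each receive 3.0 / 3 = 1.0, so nets on the longer, tighter path get the smaller allocated slack.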

TD Analytical Global Placement (contd)

The delay cost of a net (the operator between the delay and the slack is lost in the source; division is assumed here, so that nets with less allocated slack weigh more):

$$C_D(n_j) = \frac{D_c(n_j)}{S_a(n_j)} = \frac{D_{c,quad}(n_j) + D_{c,lin}(n_j)}{S_a(n_j)}$$

The objective function (final objective function, solving min-max via min-sum):

$$F = \sum_{n_j \in moveN} \frac{D_{c,quad}(n_j) + D_{c,lin}(n_j)}{S_a(n_j)}$$

The delay cost is divided into a quadratic and a linear part.

• Quadratic terms: can be solved by normal quadratic programming techniques.
• Linear part: each linear term is approximated by a quadratic term as follows, where the coordinates in the denominator are the current values. We do several iterations until the results converge. The linear terms in y are dealt with in the same way.

$$|x_i - x_c| \approx \frac{(x_i - x_c)^2}{|x_i - x_c|_{\text{current}}}$$
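
The idea of replacing a linear term by a quadratic term divided by its current value, and iterating, can be illustrated with a one-dimensional toy problem. The sketch below is not the TAN formulation: it minimizes the sum of |x - a_k| for fixed anchor positions a_k (hypothetical data), and the small `eps` guard against division by zero is an added assumption.

```python
def minimize_abs_sum(anchors, iterations=30, eps=1e-9):
    """Minimize sum_k |x - a_k| by repeatedly solving a weighted quadratic.

    Each |x - a_k| is replaced by (x - a_k)^2 / |x_cur - a_k|, where x_cur is
    the value from the previous iteration. The quadratic has the closed-form
    minimizer x = sum(w_k * a_k) / sum(w_k) with w_k = 1 / |x_cur - a_k|.
    """
    x = sum(anchors) / len(anchors)  # start from the pure quadratic solution
    for _ in range(iterations):
        weights = [1.0 / max(abs(x - a), eps) for a in anchors]
        x = sum(w * a for w, a in zip(weights, anchors)) / sum(weights)
    return x


x_opt = minimize_abs_sum([0.0, 1.0, 10.0])
```

For the anchors 0, 1 and 10 the iteration starts at the mean and moves towards 1, the median, which is the true minimizer of the linear objective.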

TD NW-Flow Based Detailed Placement (TIF)

General purpose
1. Solves the overlap problem from the global placer
2. Minimizes the deterioration of the delay improvement obtained from the global placer
3. Legalizes the placement, satisfying WS constraints

General nw-flow graph

[Figure: source S; cells to be legalized A1, A2; Row1 with C11 C12 C13 C14; Row2 with C21 C22 W21 C24; Row3 with C31 C32 C33; white-space nodes W1, W2, W3; sink T. One panel shows the flow to legalize A1's position; another shows the cell placement after cells are moved in the flow direction.]

• Arc cost = TD cost: linear & step funct.
• Arc capacity
  • hor.: how much a cell can move (accuracy issues)
  • vert.: width of head cell
  • S → moved cell: width(cell)
  • row → T: WS of row
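
To make the flow formulation concrete, here is a small, self-contained min-cost flow solver (successive shortest paths with Bellman-Ford) applied to a heavily simplified graph. This is a generic textbook algorithm, not the solver used in TIF, and the graph is hypothetical: one cell A1 of width 3 may go to Row1 (white space 3, unit cost 2) or Row2 (white space 2, unit cost 1); the per-cell horizontal arcs are omitted.

```python
def min_cost_flow(num_nodes, arcs, source, sink, demand):
    """arcs: list of (tail, head, capacity, unit_cost).
    Returns (flow sent, total cost, flow on each input arc)."""
    graph = [[] for _ in range(num_nodes)]  # edge: [head, residual, cost, rev index]
    handles = []
    for tail, head, cap, cost in arcs:
        handles.append((tail, len(graph[tail]), cap))
        graph[tail].append([head, cap, cost, len(graph[head])])
        graph[head].append([tail, 0, -cost, len(graph[tail]) - 1])
    inf = float("inf")
    flow = total_cost = 0
    while flow < demand:
        dist = [inf] * num_nodes
        dist[source] = 0
        parent = [None] * num_nodes
        for _ in range(num_nodes - 1):          # Bellman-Ford relaxation rounds
            changed = False
            for u in range(num_nodes):
                if dist[u] == inf:
                    continue
                for idx, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], parent[v], changed = dist[u] + cost, (u, idx), True
            if not changed:
                break
        if dist[sink] == inf:
            break                                # no augmenting path left
        push, v = demand - flow, sink
        while v != source:                       # bottleneck along the path
            u, idx = parent[v]
            push, v = min(push, graph[u][idx][1]), u
        v = sink
        while v != source:                       # augment along the path
            u, idx = parent[v]
            graph[u][idx][1] -= push
            graph[v][graph[u][idx][3]][1] += push
            v = u
        flow += push
        total_cost += push * dist[sink]
    return flow, total_cost, [cap - graph[t][i][1] for t, i, cap in handles]


# Nodes: 0 = S, 1 = A1, 2 = Row1, 3 = Row2, 4 = T (hypothetical toy graph).
toy_arcs = [(0, 1, 3, 0), (1, 2, 3, 2), (1, 3, 3, 1), (2, 4, 3, 0), (3, 4, 2, 0)]
result = min_cost_flow(5, toy_arcs, source=0, sink=4, demand=3)
```

In this toy instance the cheapest solution sends 2 units of A1's width to Row2 and 1 unit to Row1. A cell cannot physically be split this way, which is exactly the kind of non-discrete result a continuous flow solver can produce.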

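Arc Cost in TIF

Sensitivity-based cost
• We define the delay of a net to be the delay from its driver cell to its most critical sink cell.
• Consider the net delay change when a cell is moved.

[Figure: delay model with pins u_d, u_i, u_p, u_q; u_i moves to u'_i, changing the driver-to-sink length from l_{d,i} to l'_{d,i} and C_total to C'_total.]

With Δl_{d,i} = l'_{d,i} − l_{d,i} (the Δ symbols are lost in the source and are restored here from the figure):

$$\Delta D_1(n_j) = R_d\, c\, \Delta l_{d,i}$$
$$\Delta D_{3b}(n_j) = \frac{r\, l_{d,i}}{2} \cdot \frac{1}{2} \left( c\, \Delta l_{d,i} \right)$$

If u_i is the critical sink or driver:

$$\Delta D_2(n_j) = r c\, l_{d,i}\, \Delta l_{d,i} + r\, \Delta l_{d,i}\, C_g$$
$$\Delta D_{3a}(n_j) = \frac{r\, \Delta l_{d,i}}{2} \cdot \frac{1}{2} \left( c\, L(n_j) + (k-2)\, C_g \right)$$

Otherwise:

$$\Delta D_2(n_j) = \Delta D_{3a}(n_j) = 0$$

Arc cost formulation
• For a cell, we find the most critical nets (those belonging to the path with the smallest slack) connected to it; the unit flow cost of the arcs from the cell is (layout reconstructed; an exponent or parameter symbol may be missing)

$$cost(e) = \sum_{n_j \in critical\_nets} \frac{1}{S(n_j)} \cdot \frac{\Delta D(n_j)}{cap(e)}$$

• From experiments, one parameter setting (its symbol and value are lost in the source) gives the best results.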
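
The sensitivity terms can be sketched as a single Python function. This rests on several assumptions that the garbled slide does not confirm: the terms are read as first-order delay changes for a length change `dl`, they are simply summed, and the two terms tied to the moved pin's own wire apply only when that pin is the critical sink or the driver. All names and values are hypothetical.

```python
def net_delay_change(dl, l_di, L, k, R_d, r, c, C_g, is_critical_pin):
    """Estimated change in net delay when a pin's wire length changes by dl.

    dl   : change in driver-to-sink length (assumed equal to the WL change)
    l_di : current driver-to-critical-sink length
    L, k : current net wirelength and (assumed) pin count
    """
    d1 = R_d * c * dl                         # driver sees more/less wire cap
    d3b = (r * l_di / 2) * 0.5 * (c * dl)     # critical wire sees changed load
    if is_critical_pin:
        d2 = r * c * l_di * dl + r * dl * C_g               # own RC delay
        d3a = (r * dl / 2) * 0.5 * (c * L + (k - 2) * C_g)  # own wire vs. load
    else:
        d2 = d3a = 0.0
    return d1 + d2 + d3a + d3b


critical_change = net_delay_change(1.0, 2.0, 6.0, 3, 1.0, 1.0, 1.0, 1.0, True)
other_change = net_delay_change(1.0, 2.0, 6.0, 3, 1.0, 1.0, 1.0, 1.0, False)
```

With these unit parameters, moving the critical pin costs 6.25 while moving a non-critical pin on the same net costs 1.5, which illustrates why arcs leaving critical cells end up more expensive.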

Tackling Illegalities in TIF

The incremental detailed placement problem is a DOP (discrete optimization problem). Thus certain illegalities are introduced in it by using a continuous optimization method. There are two major problems.

I. Discrete flow requirement in vertical arcs
• The vertical arc represents vertical cell movement by a discrete amount (dist. to nearest row).
• Flow on it should be either full capacity (cell width) or 0.
• The nw-flow solution may not meet this requirement.

Resulting placement problems
• The full cost of movement is not incurred in nw-flow.
• The cell moved up has a larger area than the nw flow modeled.

[Figure: initial placement, non-discrete flow, and resulting placement for cells w, u, x, v. Arc labels: (7, c3), (7, c2), (5, c1); other labels: w(v)=7, w(v)=5, f1=2, f2=3, disp(w)=5, disp(u)=5, disp(v)=2, disp(w)=3; the resulting placement shows an overlap.]

Tackling Illegalities in TIF (contd)

Our flow-discretizing soln. for a vertical arc: the 3-step process
• Step 1: Initially, vertical arc cap = 1, cost = full cost. (Full cost is incurred.)
• Step 2: After the first 1 unit of flow is passed, cap = original cap − 1, cost = 0. (Encourages flow to keep going through the arc.)
• Step 3: After all flow is passed, the cost and capacity of the adjacent horizontal arc are updated to 0. (Horiz. arc cost updated.)

[Figure: steps 1 to 3 and the final placement for cells w, u, x, v, with w(v)=7 and w(v)=5. Step 1: arcs (7, c3), (7, c2), (1, full-cost), f1=1. Step 2: arcs (7, c3), (7, c2), (4, 0), f1=1, f2=4. Step 3: arcs (inf, 0), (inf, 0), (4, 0), f1=1, f2=4. Final placement: disp(u)=5, disp(w)=5, disp(v)=5.]
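
Steps 1 and 2 of the discretization can be sketched as a small state machine for one vertical arc. This is an illustrative sketch under assumptions: the class and method names are hypothetical, flow is counted in integer width units, and step 3 (the update of the adjacent horizontal arcs) is left out.

```python
class VerticalArc:
    """Vertical arc whose cost is charged in full on the first unit of flow."""

    def __init__(self, cell_width, full_cost):
        self.cell_width = cell_width   # original arc capacity
        self.full_cost = full_cost     # cost of moving the whole cell
        self.flow = 0

    def offered(self):
        """(capacity, unit cost) currently exposed to the flow solver."""
        if self.flow == 0:
            return 1, self.full_cost               # step 1: cap = 1, full cost
        return self.cell_width - self.flow, 0      # step 2: rest is free

    def push(self, amount):
        """Send `amount` units and return the cost incurred."""
        capacity, unit_cost = self.offered()
        if amount > capacity:
            raise ValueError("amount exceeds offered capacity")
        self.flow += amount
        return amount * unit_cost

    def is_discrete(self):
        """True when the arc carries either nothing or the full cell width."""
        return self.flow in (0, self.cell_width)


arc = VerticalArc(cell_width=5, full_cost=10)
first_cost = arc.push(1)    # the full movement cost is paid here
second_cost = arc.push(4)   # the remaining width follows at zero cost
```

The numbers mirror the slide's figure labels (1, full-cost) followed by (4, 0) for a cell of width 5; the value 10 for the full cost is made up.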

Tackling Illegalities in TIF (contd)

II. Split flows
• This occurs when there are flows on both upward and downward arcs.
• The two split flows will go through the tree structure to the sink.
• Two heuristics to solve the problem:
  1. Max flow: we choose the branch tree with the larger flow.
  2. Min cost: we choose the branch tree with the smaller flow cost, looking at the first k levels.
• Our experiment shows the Max flow heuristic does better.

[Figure: A1 between row cells C21 C22 and C31 C32, with arcs (5, c1) and (5, c2) carrying f1=2 and f2=3; the flows f1 and f2 continue through Tree1 and Tree2 (nodes C12, C23, C33, ...).]

Satisfying White Space Constraints

• Due to the discrete nature of the detailed placement problem, the white space constraint (max row width does not exceed a pre-specified limit) can't be ensured by the nw-flow process.
• Two methods are used to deal with this problem:
  • Dynamic row size constraint monitoring
  • Push-violation arcs in the next iteration

[Figure: non-discrete flow for cells w, u, x, v with arcs (7, c3), (7, c2), (5, c1); labels: w(v)=7, w(v)=5, f1=2, f2=3, disp(w)=5, disp(u)=5, WS=3, WS=-2, "WS violation".]
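
A row-level white space check of the kind implied here can be sketched as follows. The row capacity and cell widths are hypothetical; the example follows one reading of the figure labels, in which a row with WS=3 receives a whole cell of width 5 and ends at WS=-2.

```python
def row_white_space(row_capacity, cell_widths):
    """White space of a row: capacity minus total cell width (negative = violation)."""
    return row_capacity - sum(cell_widths)


def violated_rows(rows):
    """rows: {row_name: (capacity, [cell widths])}. Returns {row_name: WS} for WS < 0."""
    spaces = {name: row_white_space(cap, widths) for name, (cap, widths) in rows.items()}
    return {name: ws for name, ws in spaces.items() if ws < 0}


before = row_white_space(20, [10, 7])        # WS = 3
after = row_white_space(20, [10, 7, 5])      # a width-5 cell arrives: WS = -2
bad = violated_rows({"row1": (20, [10, 7, 5]), "row2": (20, [8, 4])})
```

The flow solver may have pushed only part of the cell's width into the row, so the violation only appears once the whole cell is actually moved.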

Satisfying WS Constraints (contd)

Dynamic WS constraint monitoring
• Monitor total cell width in each row after every -ve cycle-based iterative improvement of the nw flow:
  • Initial flow on a vertical arc: the total cell width is moved to the target row.
  • Fully reversed flow: the total cell width is moved back to the orig. row.
• To facilitate cell movement, we allow temporary white space violation in a row for each direction of the flow.
• Once a viol. in a direction occurs, no further ones are allowed unless it goes to 0. Monitored by top and bottom viol. guards G_t and G_b.
• If a violation remains in the row, then push a violation arc in the next iteration. Thrashing is prevented by disallowing reverse movement.

[Figure: min-cost flow example with widths W=3, W=7, W=9, W=4, guards Gb = 0 and Gt = 0, a full row and a violated row; other labels as printed: 4, -5, "Net viol = 0", 4, -1, S, "Otherwise".]

Global Network Flow

• The global flow network gives a global view of generally how flows will go.
• With the global flow we can eliminate detailed-flow arcs that are not likely to have flow on them.
• This can greatly reduce the cycles in the detailed nw-flow, thus reducing time without obvious deterioration of the improvement.
• 65% runtime reduction at the cost of 1-2% timing deterioration.

[Figure: global nw-flow graph over Row i-1, Row i (violated row) and Row i+1, with cells A1, A2 and the sink. Arc labels: (w(A2), 0), (viol_i, 0), (w(W_{i+1}), C_{i+1}), (w(R_{i+1}), C_{i+1,i}).]

• C_{i+1} is the probabilistic average of all left-to-right detailed horizontal arc costs in the row.
• C_{i+1,i} is the weighted average of the detailed vertical arc costs between two rows.

TIF's High-level Flow
1. Global nw flow
2. Detailed nw flow (on induced network)
3. Physical flow interpretation
4. All new cells placed & all viol. fixed? No: iterate again. Yes: end.

Benchmarks

• There are three sets of benchmarks: Ibm, Faraday and TD-Dragon.
• The Ibm and Faraday suites are originally not timing benchmarks; we generate synthetic timing characteristics for them.
  • The Ibm circuits don't identify FFs. We determine FFs in cycles and break all cycles with the minimum # of FFs. The average percentage of FFs is 13%.
  • Both suites don't have information on the resistance and capacitance of cells and interconnects. We choose the typical values of 0.18 micron technology for these parameters.

Benchmark Characteristics

                       Ibm             Faraday        TD-Dragon
# of cells             12506-210341    11734-32622    3093-25616
# of nets              13636-201640    11815-33186    3200-26017
critical path length   21-220          16-25          20-60

Efficacy of TD Arc Costs

Delay improvement (%) after TAN and after detailed placement with each arc cost:

Circuit       After TAN   TD cost   0 cost   Unit cost
ibm03              33.1      30.3     25.9        27.3
ibm09              12.1       8.4      3.6         4.9
ibm08              30.7      28.4     20.2        24.1
ibm15              21.8      18.8     12.7        14.0
ibm18              36.9      34.1     29.0        31.1
Dsp1               25.1      21.3     16.7        17.1
Risc1              24.6      22.2     19.7        19.9
TDMatrix            7.9       4.0      0.5         2.2
TDMac32            12.1       7.7      3.6         5.9
TDMac64            13.9      10.0      6.9         7.8
Geom. avg          19.5     15.23      8.8        11.7
Arith. avg         21.8      18.5     13.9        15.4

• Deterioration of the global place (TAN) result: detailed place (TD cost) 4.3%; detailed place (unit cost) 7.8%; detailed place (0 cost) 10.7%.
• 45% reduction in the deterioration of global place results by going from unit-cost to TD-cost.
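
The averages and the 45% figure can be re-derived with a few lines of Python. The sketch assumes that each printed table value lost one decimal point (331 read as 33.1, 05 as 0.5, and so on); under that assumption the arithmetic means of the four columns match the slide's average row, and the deterioration figures 4.3 and 7.8 give the quoted reduction.

```python
from statistics import mean

# Columns: After TAN, TD cost, 0 cost, unit cost (decimal points restored).
table = {
    "ibm03": (33.1, 30.3, 25.9, 27.3),
    "ibm09": (12.1, 8.4, 3.6, 4.9),
    "ibm08": (30.7, 28.4, 20.2, 24.1),
    "ibm15": (21.8, 18.8, 12.7, 14.0),
    "ibm18": (36.9, 34.1, 29.0, 31.1),
    "Dsp1": (25.1, 21.3, 16.7, 17.1),
    "Risc1": (24.6, 22.2, 19.7, 19.9),
    "TDMatrix": (7.9, 4.0, 0.5, 2.2),
    "TDMac32": (12.1, 7.7, 3.6, 5.9),
    "TDMac64": (13.9, 10.0, 6.9, 7.8),
}
columns = ("After TAN", "TD cost", "0 cost", "unit cost")

# Arithmetic mean of every column, rounded as on the slide.
arith_avg = {
    name: round(mean(row[i] for row in table.values()), 1)
    for i, name in enumerate(columns)
}

# Relative reduction in deterioration when going from unit cost to TD cost.
reduction_pct = round(100 * (7.8 - 4.3) / 7.8)
```

The deterioration figures themselves (4.3, 7.8, 10.7) correspond to differences between the geometric averages, for example 19.5 − 11.7 = 7.8 for unit cost.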

Final Results

[Figure: delay improvement for ibm benchmarks; initial placement WL-driven (Dragon)]
[Figure: delay improvement for Faraday benchmarks; initial placement WL-driven (Dragon)]
[Figure: delay improv. for TD-Dragon benchmarks placed by Dragon (cell delay)]
[Figure: delay improv. for TD-Dragon benchmarks placed by Dragon (no cell delay)]
[Figure: delay improvement on TD-Dragon placement for different WS constraints; y axis 0 to 12; x axis: matrix, vp2, mac32, mac64, avg; legend text as printed: "TAN", "3510TAN"]

Data labels surviving from these charts, as printed: 197, 242, 206, 243, 45, 37.

• [Wonjoon & Bazargan ICCAD'03] achieves an avg. of 2.8% improv. with 5% WS.
• For 5% WS our improvement is 4.2% (50% relative improvement).

Final Results (contd)

[Figure residue; data labels as printed: 82120, 196241, 62, 102, 3845, 40.]

Empirical Asymptotic Time Complexity

[Figure: runtime (0 to 1600) vs. # cells (0 to 250000); linear curve best fits data: y = 0.0057x + 266.51]
[Figure: runtime (0 to 1800) vs. # movable cells (0 to 2500); linear curve best fits data: y = 0.6857x + 51.48]

• Runtime is 18% of Dragon and 12% of TD-Dragon.
• Obtains a soln. for a 210K-cell cct (ibm18) w/ 34% improv. in 24 mins.
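
The first fitted line can be cross-checked against the quoted 24 minutes. The sketch assumes the fit is y = 0.0057x + 266.51 with the runtime in seconds (the axis unit is not stated in the source) and uses the largest Ibm cell count, 210341, from the benchmark table.

```python
def fitted_runtime_seconds(num_cells):
    """Runtime predicted by the linear fit y = 0.0057 * x + 266.51 (seconds assumed)."""
    return 0.0057 * num_cells + 266.51


largest_ibm_minutes = fitted_runtime_seconds(210341) / 60.0
print(round(largest_ibm_minutes, 1))
```

The prediction lands between 24 and 25 minutes, which agrees with the reported run for the 210K-cell circuit and supports reading the runtime axis in seconds.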

Conclusions

• Proposed a TD incremental placement flow: FlowPlace
  • Global and detailed incremental placer
  • New accurate pre-route net delay models
  • Can optimize both the quadratic and the linear delay terms in the global placer
  • TD nw flow to solve detailed TD placement:
    • sensitivity-based TD arc costs
    • constraint satisfaction (e.g. WS)
    • discretization of illegal continuous solns.
    • global nw flow graph
• Promising results
  • Delay improv. up to 34% for a 210K-cell WL-opt layout in 24 mins
  • Delay improv. up to 10% for a 26K-cell TD-opt layout in just above 5 mins
  • The average delay improvement is 1834 (as printed; the separator is lost in the source)
  • The WL deterioration is an average of 8%
  • The average run time is only 12-18% of the original placement runtime
• TD-IBM benchmarks and placed outputs avail. at the FlowPlace page: wwweceuicedu~duttbenchmarks-etcFlowPlaceflowhtml
• Concepts can be extended to timing and power optimization with constraints and physical re-synthesis

Satisfying white space constraints

Dynamic WS constraint monitoring
• We monitor total cell width in each row after every -ve cycle-based iter. improv. of nw flow:
  • Initial flow on a vertical arc: the total cell width is moved to the target row.
  • Fully reversed flow: the total cell width is moved back to the orig. row.
• To facilitate cell movement, we allow temporary white space violation under constraints:
  • Viol_max = max cell width.
  • Violations from above and below are calculated separately.
• Because of the flow allowed in step two, due to the separate violation limits for flow from above and below, we can finally legalize the placement.

[Figure: snapshots of cells u, v, x moving between rows towards the sink, with widths W=5, W=7, W=5. Labels as printed: (vio_top=0, WS=2); (vio_top=3, vio_bot=0, WS=-3, WS=-2); (vio_top=3, vio_bot=2, WS=-5, WS=5); (vio_top=0, vio_bot=0, WS=0, WS=0).]

Page 7: A NetworkFlow Approach to TimingDriven Incremental Placement for ASICs

WL and Pre-Route Delay Model

( ) ( ) ( )i j

j i c i cu n

L n x x y y

3 ( ) ( 2)((1 2)( ( ) ( 2) )i j d i j gD u n r l cL n k C

ud (xd yd)up (xp yp)

uq (xq yq)

ui (xi yi) centroidC (xc yc)

Star graph model

ud (xd yd)

ui (xi yi)

up (xp yp)

uq (xq yq)

ld i2

ld i

Delay model

WL calculationWe use a star graph model to calculate WL

Pre-route delay model1( ) ( ( ) ( 1) )j d j gD n R cL n k C

22 ( )

2i j d i d i g

rcD u n l rl C

Driver node driving load capacitance

Self interconnect delay

Self interconnect seeing other interconnect amp load capacitance

of Ctotal

(1-of Ctotal

Fidelity of our model The future model is still under development which modeling nets with multiple star structures

Best results for

Circuit Mac64 Matrix Vp2 Mac32 error

Routed delay 34 38 43 67 0

Our curr model 40 51 62 82 295

Multi-star model 35 49 52 73 155

TD Analytical Global Placement (TAN)

A TD extension of a combination of Gordian and Gordian-L Essentially a quadratic programming approach Use an iterative approach to model the linear terms of delay in objective

function

Critical delay cost of a net Need to focus only on the sinks on the critical paths Formulation

A net with more critical paths through it is more important to optimizemdashcan achieve min on all those paths w one opt step

1 2 3( )

( ) ( ) ( ) ( )i j

c j j i iu critical n

D n D n D u D u

Net w 2 critical pathsthrough it

TD Analytical Global Placement (contd)

Allocated slack of a net A weight measure for determining TD WL reduction of a net Two factor needs to be considered minimum path slack through the

net and of nets in that path

Therefore we uniformly allocate path slack to each net the allocated slack of a net is

( ) ( ( )) ( of nets in path ( ))a j max j max jS n S P n P n

4 4 4

6 6

Before optimization two paths have the same delay

3 3 3

3 3

After optimization one is longer than the other

Net slack= Path slack Observaton Nets with the same weight in TAN tend to have the same length after optimization

2 2 2

3 3

After optimization both paths have approx the same delay

Thus we can get

Equi-delaypaths

Net delay

TD Analytical Global Placement (contd)

( ) ( ) ( ) ( ) ( ( )) ( )jD j c j a j c quad n c lin j a jC n D n S n D D n S n

F ( ( ) ( )) ( )j

c quad j c lin j a jn moveN

D n D n S n

Final objective function to solve min-max via min-sum The delay cost of a net

The objective function

The delay cost part is divided into quadratic and linear part

Quadratic terms Can be solved by normal quadratic programming technique

Linear part The linear terms here is approximated by a quadratic terms as following In the formulation the coordinates in the denominator is the current value We do several iterations until the results convergentThe linear terms of y is dealt in the same way

2

( )( )

( )i c

i ci c

x xx x

x x

TD NW-Flow Based Detailed Placement (TIF)General Purpose

1 Solves the overlap problem form global placer2 Minimizes the deterioration of delay improvement obtd from global placer3 Legalizes the placement satisfying WS constraints

General nw-flow graph

Source

A2

C11 C12 C13 C14

C21 C22 W21 C24

C31 C32 C33

A1

W2

W1

W3

Sink

Row1

Row2

Row3

Flow to legalize A1 position

C12 C13 C14C11C21

C22 C24 W2A1

Cell placement after cells are moved in the flow direction

bull Arc cost = TD cost linear amp step functbull Arc capacity

bull hor how much a cell can move (accuracy issues)bull vert width of head cellbull S moved cell width(cell)bull row T WS of row

ST

Arc Cost in TIF: Sensitivity-based cost

• We define the delay of a net to be the delay from its driver cell to its most critical sink cell.
• Consider the net delay change when a cell is moved.

Delay model

[Figure: delay model. Labels: u_d, u_i, u_p, u_q, u'_i, l_{d,i}, l'_{d,i}, and two fractions of C'_total]

[Equations garbled in extraction: expressions for $\Delta D_1(n_j)$ and $\Delta D_{3b}(n_j)$ in terms of $R_d$, $r$, $c$ and $l_{d,i}$]

If $u_i$ is the critical sink or driver:

[Equations garbled in extraction: expressions for $\Delta D_2(n_j)$ and $\Delta D_{3a}(n_j)$ in terms of $r$, $c$, $l_{d,i}$, $L(n_j)$, $k$ and $C_g$]

Otherwise:

$$\Delta D_2(n_j) = \Delta D_{3a}(n_j) = 0$$

Arc cost formulation: for a cell, we find the most critical nets (those belonging to the path with the smallest slack) connected to it. The unit flow cost of the arcs from the cell is

$$cost(e) = \sum_{n_j \in critical\_nets} \frac{1}{S_a(n_j)} \cdot \frac{\Delta D(n_j)}{cap(e)}$$

[A tunable parameter of this formula and its value were lost in extraction.] From experiments [value lost] gives best results.

Tackling Illegalities in TIF

The incremental detailed placement problem is a DOP (discrete optimization problem). Thus certain illegalities are introduced by solving it with a continuous optimization method. There are two major problems.

I. Discrete flow requirement in vertical arcs

• A vertical arc represents vertical cell movement by a discrete amount (distance to the nearest row): the flow on it should be either full capacity (cell width) or 0.
• The nw-flow solution may not meet this requirement.

Resulting placement problems:

• The full cost of movement is not incurred in the nw-flow.
• The cell moved up has a larger area than the nw-flow modeled.

[Figure: initial placement and resulting placement under non-discrete flow. Cells w, u, x and v; arc labels (7, c3), (7, c2), (5, c1); widths w(v)=7 and w(v)=5 as extracted; flows f1=2, f2=3; displacements disp(w)=5, disp(u)=5, disp(v)=2, disp(w)=3; the resulting placement contains an overlap]

Tackling Illegalities in TIF (contd)

Our flow-discretizing solution for a vertical arc: the 3-step process

• Step 1: Initially, vertical arc cap = 1, cost = full cost. (Full cost is incurred.)
• Step 2: After the first 1 unit of flow is passed, cap = original cap - 1, cost = 0. (Encourages flow to keep going through the arc.)
• Step 3: After all flow is passed, the cost and capacity of the adjacent horizontal arc are updated to 0. (Horizontal arc cost updated.)

[Figure: Step 1, Step 2, Step 3 and final placement. Cells w, u, x and v; horizontal arc labels (7, c3), (7, c2), later (inf, 0), (inf, 0); vertical arc (1, full-cost) with f1=1, then (4, 0) with f2=4; widths w(v)=7 and w(v)=5 as extracted; final displacements disp(u)=5, disp(w)=5, disp(v)=5]
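Steps 1 and 2 can be read as a small state machine on the vertical arc. The following sketch is one possible reading, not the authors' code: the class name `StagedVerticalArc` and its methods are hypothetical, flow is assumed to be integral, and step 3 (updating the adjacent horizontal arc) is left out.

```python
from dataclasses import dataclass


@dataclass
class StagedVerticalArc:
    width: int         # original capacity: width of the cell to be moved
    full_cost: float   # cost of moving the whole cell to the adjacent row
    flow: int = 0      # flow pushed through the arc so far

    def offer(self):
        """Return the (capacity, unit cost) currently exposed to the flow solver."""
        if self.flow == 0:
            return 1, self.full_cost          # step 1: one unit at full cost
        return self.width - self.flow, 0.0    # step 2: the rest is free

    def push(self, amount):
        """Push flow through the arc and return the cost charged for it."""
        cap, unit_cost = self.offer()
        if amount > cap:
            raise ValueError("flow exceeds the capacity currently offered")
        self.flow += amount
        return amount * unit_cost


# Values taken from the figure labels: a width-5 cell, first (1, full-cost), then (4, 0).
arc = StagedVerticalArc(width=5, full_cost=10.0)  # the cost value 10.0 is made up
first = arc.push(1)
after_first = arc.offer()
rest = arc.push(4)
```

With this staging, the first unit of flow pays the whole movement cost and the remaining units are free, so a solver that has committed one unit has no cost reason to stop short of the full cell width.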

Tackling Illegalities in TIF (contd)

II. Split flows

• This occurs when there are flows on both the upward and the downward arcs.
• The two split flows go through tree structures to the sink.

Two heuristics to solve the problem:

1. Max flow: we choose the branch tree with the larger flow.
2. Min cost: we choose the branch tree with the smaller flow cost, looking at the first k levels.

Our experiments show that the max-flow heuristic does better.

[Figure: cell A1 with arcs (5, c1), f1=2 and (5, c2), f2=3 toward cells C21 C22 and C31 C32; the two flows continue through Tree1 and Tree2 (cells C12, C23, C33, ...)]
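The two heuristics amount to a simple selection rule. A minimal sketch, assuming each branch tree has already been summarized by its flow and by its flow cost over the first k levels (the dictionary keys and the cost numbers are hypothetical):

```python
def choose_branch(branches, heuristic="max_flow"):
    """Pick the branch tree that keeps the whole cell, given per-branch summaries."""
    if heuristic == "max_flow":
        # Keep the branch that already carries more of the cell's flow.
        return max(branches, key=lambda b: b["flow"])["name"]
    if heuristic == "min_cost":
        # Keep the branch whose first k levels are cheaper to push flow through.
        return min(branches, key=lambda b: b["cost_first_k"])["name"]
    raise ValueError(f"unknown heuristic: {heuristic}")


# Flows f1=2 and f2=3 are from the figure; the cost values are made up.
branches = [
    {"name": "Tree1", "flow": 2, "cost_first_k": 4.0},
    {"name": "Tree2", "flow": 3, "cost_first_k": 7.5},
]
by_flow = choose_branch(branches, "max_flow")
by_cost = choose_branch(branches, "min_cost")
```

The example is built so that the two rules disagree, which is the situation in which the choice of heuristic matters.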

Satisfying White Space Constraints

Due to the discrete nature of the detailed placement problem, the white space constraint (max row width does not exceed a pre-specified limit) can't be ensured by the nw-flow process.

Two methods are used to deal with this problem:

• Dynamic row size constraint monitoring
• Push-violation arcs in the next iteration

[Figure: non-discrete flow example. Cells w, u, x and v; arc labels (7, c3), (7, c2), (5, c1); widths w(v)=7 and w(v)=5 as extracted; flows f1=2, f2=3; disp(w)=5, disp(u)=5; resulting rows with WS=3 and WS=-2 (WS violation)]

Satisfying WS Constraints (contd)

Dynamic WS constraint monitoring

• Monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw flow.
  • Initial flow on a vertical arc: the total cell width is moved to the target row.
  • Fully reversed flow: the total cell width is moved back to the original row.
• To facilitate cell movement, we allow a temporary white space violation in a row for each direction of the flow.
  • Once a violation in a direction occurs, no further violations are allowed unless it goes to 0.
  • Monitored by top and bottom violation guards Gb and Gt.
• If a violation remains in the row, then push a violation arc in the next iteration. Thrashing is prevented by disallowing reverse movement.

[Figure: rows with cell widths W=3, W=7, W=9, W=4 processed by min-cost flow; guards Gb = 0, Gt = 0; a full row; flows 4 and -5; net viol = 0, 4, -1; violated row; S; "Otherwise" branch]
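The guard rule ("once a violation in a direction occurs, no further ones are allowed unless it goes to 0") can be sketched as a per-row check. This is an interpretation under stated assumptions, not the authors' implementation: the guard is assumed to be a pair of counters keyed "top" and "bottom", and the function name is hypothetical.

```python
def may_push(guard, direction, added_violation):
    """Allow a move unless it adds a violation while one from the same direction is pending."""
    return added_violation == 0 or guard[direction] == 0


# One row, with separate guards for flow arriving from above and from below.
guard = {"top": 0, "bottom": 0}

first_from_top = may_push(guard, "top", 3)       # no pending violation from above
guard["top"] += 3                                # record the temporary violation
second_from_top = may_push(guard, "top", 2)      # blocked until the top guard is 0 again
first_from_bottom = may_push(guard, "bottom", 2) # the other direction is tracked separately
```

How the guards are decreased again when flow is reversed is not spelled out on the slide, so it is not modeled here.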

Global Network Flow

• The global flow network gives a global view of how flows will generally go.
• With the global flow, we can eliminate detailed-flow arcs that are not likely to have flow on them.
• This can greatly reduce the cycles in the detailed nw-flow, thus reducing time without obvious deterioration of the improvement.

[Figure: global flow network over Row i-1, Row i, Row i+1 with A1, A2 and Sink; arc labels (w(A2), 0), (viol_i, 0), (w(W_{i+1}), C_{i+1}) and a vertical arc labeled with w(R) and C_{i+1,i}; Row i is the violated row]

• C_{i+1} is the probabilistic average of all left-to-right detailed horizontal arc costs in the row.
• C_{i+1,i} is the weighted average of the detailed vertical arc costs between the two rows.
• 65% runtime reduction at the cost of 1-2% timing deterioration.

TIF's High-level Flow

[Flowchart: Global nw flow → Detailed nw flow (on induced network) → Physical flow interpretation → "All new cells placed & all viol fixed?" with branches No and Yes (End)]

Benchmarks

• There are three sets of benchmarks: IBM, Faraday and TD-Dragon.
• The IBM and Faraday suites are originally not timing benchmarks; we generate synthetic timing characteristics for them.
  • The IBM circuits don't identify FFs. We determine FFs in cycles and break all cycles with the minimum number of FFs. The average percentage of FFs is 13%.
  • Both suites lack information on the resistance and capacitance of cells and interconnects. We choose typical values for 0.18 micron technology for these parameters.

Benchmark Characteristics

                       Ibm            Faraday        TD-Dragon
# of cells             12506-210341   11734-32622    3093-25616
# of nets              13636-201640   11815-33186    3200-26017
critical path length   21-220         16-25          20-60

Efficacy of TD Arc Costs

Delay improvement (%) after global placement (TAN) and after detailed placement with each arc cost type:

Circuit      After TAN   TD cost   0 cost   unit cost
ibm03          33.1       30.3      25.9      27.3
ibm09          12.1        8.4       3.6       4.9
ibm08          30.7       28.4      20.2      24.1
ibm15          21.8       18.8      12.7      14.0
ibm18          36.9       34.1      29.0      31.1
Dsp1           25.1       21.3      16.7      17.1
Risc1          24.6       22.2      19.7      19.9
TDMatrix        7.9        4.0       0.5       2.2
TDMac32        12.1        7.7       3.6       5.9
TDmac64        13.9       10.0       6.9       7.8
Geom. avg      19.5       15.23      8.8      11.7
Arith. avg     21.8       18.5      13.9      15.4

• Global place (TAN) → detailed place (TD cost) deterioration: 4.3%; detailed place (unit cost) deterioration: 7.8%; detailed place (0 cost) deterioration: 10.7%
• 45% reduction in deterioration of the global placement results by going from unit cost → TD cost
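The summary figures can be rechecked from the per-circuit rows. The sketch below assumes the extracted digits are percentages with one decimal place (for example, "331" read as 33.1), an assumption that the reported averages bear out, and that the deterioration figures are differences of geometric means.

```python
from math import prod

# Delay improvement (%) per circuit: (after TAN, TD cost, 0 cost, unit cost).
improvement = {
    "ibm03": (33.1, 30.3, 25.9, 27.3),
    "ibm09": (12.1, 8.4, 3.6, 4.9),
    "ibm08": (30.7, 28.4, 20.2, 24.1),
    "ibm15": (21.8, 18.8, 12.7, 14.0),
    "ibm18": (36.9, 34.1, 29.0, 31.1),
    "Dsp1": (25.1, 21.3, 16.7, 17.1),
    "Risc1": (24.6, 22.2, 19.7, 19.9),
    "TDMatrix": (7.9, 4.0, 0.5, 2.2),
    "TDMac32": (12.1, 7.7, 3.6, 5.9),
    "TDmac64": (13.9, 10.0, 6.9, 7.8),
}


def column(k):
    return [row[k] for row in improvement.values()]


def arith_mean(values):
    return sum(values) / len(values)


def geom_mean(values):
    return prod(values) ** (1.0 / len(values))


arith = [round(arith_mean(column(k)), 1) for k in range(4)]
tan, td, zero, unit = (geom_mean(column(k)) for k in range(4))

# Deterioration relative to the global placement result (geometric means).
det_td, det_unit, det_zero = tan - td, tan - unit, tan - zero
# Share of the unit-cost deterioration that the TD cost avoids.
reduction = (det_unit - det_td) / det_unit
```

Under these assumptions the arithmetic means reproduce the "Arith. avg" row, the deteriorations come out near 4.3, 7.8 and 10.7, and the reduction is about 45%.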

Final Results

[Chart: delay improvement for ibm benchmarks; initial placement WL-driven (Dragon)]
[Chart: delay improvement for Faraday benchmarks; initial placement WL-driven (Dragon)]
[Chart: delay improvement for TD-Dragon benchmarks placed by Dragon (cell delay)]
[Chart: delay improvement for TD-Dragon benchmarks placed by Dragon (no cell delay)]
[Data labels as extracted, decimal points missing: 197, 242, 206, 243, 45, 37]

[Chart: delay improvement on TD-Dragon placement for different WS constraints. Circuits: matrix, vp2, mac32, mac64, avg; series: TAN, 3, 5, 10; y-axis 0 to 12]

• [Wonjoon & Bazargan, ICCAD'03] achieves an average of 2.8% improvement with 5% WS
• For 5% WS, our improvement is 4.2% (50% relative improvement)

Final Results (contd)

[Charts; data labels as extracted, decimal points missing: 82, 120, 196, 241, 62, 102, 38, 45, 40]

Empirical Asymptotic Time Complexity

[Plot: runtime vs. # of cells (x-axis 0 to 250000, y-axis 0 to 1600); fit y = 0.0057x + 266.51; a linear curve best fits the data]

[Plot: runtime vs. # of movable cells (x-axis 0 to 2500, y-axis 0 to 1800); fit y = 0.6857x + 51.48; a linear curve best fits the data]

• Runtime is 18% of Dragon and 12% of TD-Dragon
• Obtains a solution for a 210K-cell circuit (ibm18) with 34% improvement in 24 mins
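As a quick consistency check of the first fit, the sketch below evaluates it at the largest IBM cell count from the benchmark table (210341). It assumes the plot's runtime axis is in seconds, which the slide does not state; the function name is hypothetical.

```python
def fitted_runtime(cells, slope=0.0057, intercept=266.51):
    """Evaluate the linear fit y = 0.0057x + 266.51 from the runtime-vs-cells plot."""
    return slope * cells + intercept


runtime = fitted_runtime(210341)   # largest circuit in the IBM suite
runtime_minutes = runtime / 60.0   # assumes the y-axis unit is seconds
```

Under that assumption the fit gives roughly 24.4 minutes, in line with the "24 mins" quoted for ibm18.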

Conclusions

• Proposed a TD incremental placement flow, FlowPlace
  • Global and detailed incremental placer
  • New accurate pre-route net delay models
  • Can optimize both the quadratic and the linear delay terms in the global placer
  • TD nw flow to solve detailed TD placement: sensitivity-based TD arc costs, constraint satisfaction (e.g., WS), discretization of illegal continuous solutions, global nw flow graph
• Promising results
  • Delay improvement up to 34% for a 210K-cell WL-opt layout in 24 mins
  • Delay improvement up to 10% for a 26K-cell TD-opt layout in just above 5 mins
  • The average delay improvement is [digits as extracted: 1834; separator missing]
  • The WL deterioration is 8% on average
  • The average run time is only 12-18% of the original placement runtime
• TD-IBM benchmarks and placed outputs are available at the FlowPlace page: wwweceuicedu~duttbenchmarks-etcFlowPlaceflowhtml
• Concepts can be extended to timing and power optimization with constraints, and to physical re-synthesis

Satisfying white space constraints: Dynamic WS constraint monitoring

• We monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw flow.
  • Initial flow on a vertical arc: the total cell width is moved to the target row.
  • Fully reversed flow: the total cell width is moved back to the original row.
• To facilitate cell movement, we allow temporary white space violation under constraints.

[Figure: three stages of moving cells u, v, x (W=5, W=7, W=5) toward the Sink. Stage labels as extracted: vio_top=0, WS=2; vio_top=3, vio_bot=0, WS=-3, WS=-2; vio_top=3, vio_bot=2, WS=-5, WS=5; vio_top=0, vio_bot=0, WS=0, WS=0]

• Viol_max = max cell width. Violations from above and below are calculated separately.
• Because of the flow allowed in step two, due to the separate violation limits for flow from above and below, we can finally legalize the placement.


TD Analytical Global Placement (TAN)

• A TD extension of a combination of Gordian and Gordian-L
  • Essentially a quadratic programming approach
  • Uses an iterative approach to model the linear terms of delay in the objective function
• Critical delay cost of a net
  • Need to focus only on the sinks on the critical paths
  • Formulation:

$$D_c(n_j) = D_1(n_j) + \sum_{u_i \in critical(n_j)} \left( D_2(u_i) + D_3(u_i) \right)$$

  • A net with more critical paths through it is more important to optimize: one optimization step can reduce delay on all those paths

[Figure: a net with 2 critical paths through it]

TD Analytical Global Placement (contd)

Allocated slack of a net

• A weight measure for determining the TD WL reduction of a net
• Two factors need to be considered: the minimum path slack through the net, and the # of nets in that path
• Therefore we uniformly allocate path slack to each net; the allocated slack of a net is

$$S_a(n_j) = \frac{S(P_{max}(n_j))}{\#\text{ of nets in path } P_{max}(n_j)}$$

[Figure: two paths labeled with net delays 4, 4, 4 and 6, 6. Before optimization, the two paths have the same delay. Second panel: net delays 3, 3, 3 and 3, 3]
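The allocation rule is a one-line computation once the paths through a net are known. A minimal sketch, assuming each path is given as a (slack, list of nets) pair and that P_max(n_j) is the path with the smallest slack through the net; the net names and the slack value 6.0 are made up:

```python
def allocated_slack(paths, net):
    """Slack of the most critical path through `net`, divided evenly among that path's nets."""
    through = [p for p in paths if net in p[1]]
    slack, nets = min(through, key=lambda p: p[0])  # smallest slack = most critical path
    return slack / len(nets)


# Two paths with the same slack but different numbers of nets.
paths = [
    (6.0, ["a1", "a2", "a3"]),
    (6.0, ["b1", "b2"]),
]
slack_a = allocated_slack(paths, "a1")
slack_b = allocated_slack(paths, "b1")
```

Nets on the three-net path receive a smaller share (2.0) than nets on the two-net path (3.0), so they are weighted as more critical even though both paths have the same path slack.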

Page 9: A NetworkFlow Approach to TimingDriven Incremental Placement for ASICs

TD Analytical Global Placement (contd)

Allocated slack of a net A weight measure for determining TD WL reduction of a net Two factor needs to be considered minimum path slack through the

net and of nets in that path

Therefore we uniformly allocate path slack to each net the allocated slack of a net is

( ) ( ( )) ( of nets in path ( ))a j max j max jS n S P n P n

4 4 4

6 6

Before optimization two paths have the same delay

3 3 3

3 3

After optimization one is longer than the other

Net slack= Path slack Observaton Nets with the same weight in TAN tend to have the same length after optimization

2 2 2

3 3

After optimization both paths have approx the same delay

Thus we can get

Equi-delaypaths

Net delay

TD Analytical Global Placement (contd)

( ) ( ) ( ) ( ) ( ( )) ( )jD j c j a j c quad n c lin j a jC n D n S n D D n S n

F ( ( ) ( )) ( )j

c quad j c lin j a jn moveN

D n D n S n

Final objective function to solve min-max via min-sum The delay cost of a net

The objective function

The delay cost part is divided into quadratic and linear part

Quadratic terms Can be solved by normal quadratic programming technique

Linear part The linear terms here is approximated by a quadratic terms as following In the formulation the coordinates in the denominator is the current value We do several iterations until the results convergentThe linear terms of y is dealt in the same way

2

( )( )

( )i c

i ci c

x xx x

x x

TD NW-Flow Based Detailed Placement (TIF)General Purpose

1 Solves the overlap problem form global placer2 Minimizes the deterioration of delay improvement obtd from global placer3 Legalizes the placement satisfying WS constraints

General nw-flow graph

Source

A2

C11 C12 C13 C14

C21 C22 W21 C24

C31 C32 C33

A1

W2

W1

W3

Sink

Row1

Row2

Row3

Flow to legalize A1 position

C12 C13 C14C11C21

C22 C24 W2A1

Cell placement after cells are moved in the flow direction

bull Arc cost = TD cost linear amp step functbull Arc capacity

bull hor how much a cell can move (accuracy issues)bull vert width of head cellbull S moved cell width(cell)bull row T WS of row

ST

Arc Cost in TIF Sensitivity based cost

We define delay of a net to be the delay from its driver cell to its most critical sink cell Consider the net delay change when a cell is moved

Arc cost formulation For a cell we find the most critical nets (belong to path with smallest

slack) connected to it the unit flow cost of the arcs from the cell is

Delay model

ud

ui

up

uq

of Crsquototal

(1-of Crsquototal

ursquoi

ldi

lrsquodi

1

3

( )

( ) ( 2)(1 2)( )

j d d i

b j d i d i

D n R c l

D n r l c l

If ui is the critical sink or driver

2

3

( )

( ) ( 2)(1 2)( ( ) ( 2) )

j d i d i d i g

a j d i j g

D n rcl l r l C

D n r l c L n k C

Otherwise

2 3( ) ( ) 0j a jD n D n

_

1( ) ( ( ))

( ) ( )j

jn critical nets j

cost e D nS n cap e

From experiments gives best results

Tackling Illegalities in TIF The incremental detailed placement problem is a DOP Thus certain illegalities

are introduced in it by using a continuous optimization method There are two

major problems I Discrete flow requirement in vertical arcs

The vertical arc represents vertical cell movement by a discrete amount (dist to nearest row)

flow on it should be either full capacity (cell width) or 0 Nw-flow solution may not meet this requirement Resulting placement problems

Initial placement Resulting Placement The full cost of movement is not incurred in nw-flow Cell moved up has larger area than the nw flow modeled

v

w u x

u

Non-discrete flow

w u x

v

(7 c3) (7 c2)

(5 c1)

w(v)=7

f1=2

f2=3disp(w)=5

disp(u)=5

disp(v)=2

disp(w)=3v

overlap

w(v)=5

w x

u v

Tackling Illegalities in TIF (contd)

Our flow discretizing soln for vertical arc The 3 step process Step1 Initially vertical arc cap=1 cost=full cost Step2 After the first 1 unit flow is passed cap=original cap-1 cost=0 Step3 After all flow is passed The cost and capacity of the adjacent

horizontal arc are updated to 0

Step1

Full cost is incurred

Step2Final placement

w u x

v

(7 c3) (7 c2)

(1 full-cost)f1=1

w(v)=7

w(v)=5w u x

v

(7 c3) (7 c2)

f1=1

w(v)=7

w(v)=5

(40)

f2=4

w u x

v

(inf0)

(inf0)

f1=1

w(v)=7

w(v)=5

(40)

f2=4 u v

w x

disp(u)=5

disp(w)=5

disp(v)=5

Encourage flow to keep going through arc

Step3

Horiz arc costupdated

Tackling illegalities in TIF (contd)II Split flows

This occurs when there are flows on both upward and downward arcs

Two heuristics to solve the problem The two split flow will go through the tree

structure to the sink There are two heuristic

1 Max flow We choose the branch tree with larger flow2 Min cost We choose the branch tree with smaller

flow cost looking at the first k levels

C21 C22

C31 C32

A1

(5c1)

(5c2)f1=2

f2=3

Our experiment shows Max flow heuristic does better

C21 C22

C31 C32

helliphellip

hellip

f1

f2

C12

C23

C33

Tree1

Tree2

Satisfying White Space Constraints

Due to the discrete nature of the detailed placement problem the white space constraint max row width does not exceed a pre-specified limit canrsquot be ensured by the nw-flow process

Two methods are used to deal with this problem Dynamic row size constraint monitoring Push-violation arcs in the next iteration

Non-discrete flow

w u x

v

(7 c3) (7 c2)

(5 c1)

w(v)=7

f1=2

f2=3disp(w)=5

disp(u)=5

w(v)=5

w

u vWS=3 WS=-2

WS violation

Satisfying WS Constraints (contd) Dynamic WS constraint monitoring

Monitor total cell width in each row after every ndashve cycle-based iter improv of nw flow

Initial flow on vertical arc the total cell width is moved to target row Fully reverse flow the total cell width is moved back to orig row To facilitate cell movement we allow temporary white space violati

on in a row for each direction of the flow Once a viol in a direction occurs no further are allowed unless it g

oes to 0 Monitored by top and bottom viol guards Gb and Gt

If violation remains in the row then Push violation arc in the next iteration Thrashing prevented by dis

allowing reverse movement

W=3

W=7

W=9

W=4Min-costflow

Min-costflow

Gb = 0

Gt = 0

Full row

4

-5

Net viol = 0 4 -1

Violated row

S

Otherwise

Global Network Flow Global flow network gives a global view of generally how

flows will go With the global flow we can eliminate detailed-flow arcs that are

not likely to have flow on it This can greatly reduce the cycles in the detailed nw-flow thus

reducing time without obvious improvement deterioration

Row i-1

Row i

Row i+1

A2

A1

Sink

(w(A2)0) (violi 0)

violated row

(w(Wi+1) Ci+1) (w(R

) C

i+1

i))

Ci+1 is probabilistic average of all left-to-right detailed horizontal arc costs in the row

Ci+1I is the weighted average of the detailed vertical arc costs between two rows

65 runtime reduction at the cost of 1-2 timing deterioration

Global nw flow

Detailed nw flow

Physical flow interpretation

All new cells placed amp all viol fixed

No

YesEnd

TIFrsquos High-level Flow

(on inducednetwork)

Benchmarks There are three set of benchmarks Ibm Faraday and TD-Drago

n The Ibm and Faraday are originally not timing benchmarks we

generate synthetic timing characteristics for them The Ibm circuits donrsquot identify FFs We determine FFs in cycles a

nd break all cycles with minimum of FFs The average percentage of FFs is 13

Both suites donrsquot have information of resistance and capacities of cells and interconnects We choose the typical value of 18 microns technique for these parameters

Benchmark Characteristics

Ibm Faraday TD-Dragon

of cells 12506-210341 11734-32622 3093-25616

of nets 13636-201640 11815-33186 3200-26017

critical path length 21-220 16-25 20-60

Efficacy of TD Arc Costs After TAN TD cost 0 cost unit cost

ibm03 331 303 259 273

ibm09 121 84 36 49

Ibm08 307 284 202 241

ibm15 218 188 127 140

ibm18 369 341 290 311

Dsp1 251 213 167 171

Risc1 246 222 197 199

TDMatrix 79 40 05 22

TDMac32 121 77 36 59

TDmac64 139 100 69 78

GeomArith Avg

195 218 1523 185 88 139 117 154

bull Global place (TAN) Detailed place (TD cost) deterioration 43 Detailed place (unit cost) deterioration 78 Detailed place (0 cost) deterioration 107bull 45 deterioration reduction of global place results by going from unit-cost TD-cost

Final Results

Delay improvement for ibm benchmarksmdashinitial placement WL-driven (Dragon)

Delay improvement for Faraday benchmarksmdashinitial placement WL-driven (Dragon)

197

242

206

243

45

37

Delay improv for TD-Dragon benchmarks placed by Dragon (cell delay)

Delay improv for TD-Dragon benchmarks placed by Dragon (no cell delay)

Delay improvement on TD-Dragon placement for different WS constraints

0

2

4

6

8

10

12

matri x vp2 mac32 mac64 avg TAN

3510TAN

bull [Wonjoon amp Bazargan ICCADrsquo03] achieves an avg of 28 improv with 5 WSbull For 5 WS our improvement is 42 (50 relative improvement)

Final Results (contd)

82120

196241

62

102

3845

40

Empirical Asymptotic Time Complexity

y = 0 0057x + 266 51

0200400600800

1000120014001600

0 50000 100000 150000 200000 250000

cel l

runt

ime

Linear curve best fits data

y = 0 6857x + 51 48

0200400600800

10001200140016001800

0 500 1000 1500 2000 2500movabl e cel l s

runt

ime

Linear curve best fits data

bull Runtime is 18 of Dragon and 12 of TD-Dragonbull Obtains a soln for a 210K cct ibm18 w 34 improv in 24 mins

Conclusions Proposed a TD incremental placement flow FlowPlace

Global and detailed incremental placer New accurate pre-route net delay models Can opt both quadratic and the linear delay terms in global placer TD nw flow to solve detailed TD placement

sensitivity-based TD arc costs constraint satisfaction (eg WS) discretization of illegal continuous solns global nw flow graph

Promising results Delay improv up to 34--for a 210K-cell WL-opt layout in 24 mins Delay improv up to 10--for a 26K-cell TD-opt layout in just above 5 mi

ns The average delay improvement is1834 The WL deterioration is an average of 8 The average run time is only 12-18 of original placement runtime

TD-IBM benchmarks and placed outputs avail at the FlowPlace page wwweceuicedu~duttbenchmarks-etcFlowPlaceflowhtml

Concepts can be extended to timing and power optimization with constraints and physical re-synthesis

Satisfying white space constraints: dynamic WS constraint monitoring

• We monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw flow:
  • Initial flow on a vertical arc: the total cell width is moved to the target row.
  • Fully reversed flow: the total cell width is moved back to the original row.
• To facilitate cell movement, we allow temporary white space violations under constraints:
  • Viol_max = max cell width
  • Violations from above and from below are calculated separately.

[Figure: three-step example with cells u, v and x, cell widths W = 5 and W = 7, a sink, and per-row WS, vio_top and vio_bot values; the layout is not recoverable.]

• Because the separate violation limits for flow from above and from below allow the flow in step two, the placement can finally be legalized.


TD Analytical Global Placement (contd)

Final objective function (solves min-max via min-sum)

• The delay cost of a net $n_j$ is written $C_D(n_j)$. It is built from the net delay $D_c(n_j)$ and the term $S_a(n_j)$, with the delay divided into a quadratic and a linear part:

  $D_c(n_j) = D_{c,quad}(n_j) + D_{c,lin}(n_j)$

• The objective function $F$ sums, over the nets $n_j \in moveN$, an expression in $D_{c,quad}(n_j)$, $D_{c,lin}(n_j)$ and $S_a(n_j)$.
• Quadratic terms: can be solved by standard quadratic programming techniques.
• Linear terms: each linear term is approximated by a quadratic term as follows, where the coordinates in the denominator are held at their current values (written here as $x_i^{cur}$ and $x_c^{cur}$):

  $|x_i - x_c| \approx \frac{(x_i - x_c)^2}{|x_i^{cur} - x_c^{cur}|}$

  We iterate several times until the result converges. The linear terms in $y$ are handled in the same way.
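The iterative treatment of the linear terms can be illustrated on a one-dimensional toy problem. The sketch below is not the TAN placer: it only minimizes a sum of absolute distances to a few fixed points, replacing each |x - a| with (x - a)^2 divided by the current distance and re-solving the resulting quadratic until it settles. The function name, the anchor values and the `eps` guard against division by zero are all assumptions made for illustration.

```python
def minimize_abs_sum(anchors, iterations=50, eps=1e-12):
    """Minimize sum_k |x - a_k| by repeatedly solving a weighted quadratic."""
    x = sum(anchors) / len(anchors)   # start from the plain quadratic optimum (the mean)
    for _ in range(iterations):
        # |x - a| is replaced by (x - a)^2 / |x_cur - a|, i.e. a weight of 1 / |x_cur - a|
        weights = [1.0 / max(abs(x - a), eps) for a in anchors]
        # closed-form minimizer of sum_k w_k * (x - a_k)^2
        x = sum(w * a for w, a in zip(weights, anchors)) / sum(weights)
    return x


x_opt = minimize_abs_sum([0.0, 1.0, 10.0])
print(round(x_opt, 6))   # approaches the median, 1.0
```

The iterates move from the mean (the quadratic optimum) toward the median (the optimum of the linear objective), which is the behaviour the denominator-at-current-value approximation is meant to produce.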

TD NW-Flow Based Detailed Placement (TIF): General Purpose

1. Solves the overlap problem from the global placer
2. Minimizes the deterioration of the delay improvement obtained from the global placer
3. Legalizes the placement, satisfying WS constraints

General nw-flow graph

[Figure: flow graph with a source, cells A1 and A2, placed cells C11-C14 (Row1), C21, C22, C24 (Row2) and C31-C33 (Row3), white-space nodes W1, W2, W21 and W3, and a sink. One panel shows the flow that legalizes A1's position; a second panel shows the cell placement after cells are moved in the flow direction.]

• Arc cost = TD cost (linear and step functions)
• Arc capacity:
  • horizontal: how much a cell can move (accuracy issues)
  • vertical: width of head cell
  • S → moved cell: width(cell)
  • row → T: WS of row
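To make the capacity and cost roles concrete, here is a minimal min-cost-flow sketch on a toy network. It uses the textbook successive-shortest-path method with Bellman-Ford, which is a stand-in and not necessarily the solver used in TIF. The node names, capacities and costs are invented for illustration, and the function `min_cost_flow` is hypothetical.

```python
def min_cost_flow(num_nodes, arcs, source, sink, demand):
    """Route `demand` units from source to sink at minimum cost.

    arcs is a list of (tail, head, capacity, unit_cost) tuples with integer
    capacities. Returns (total_cost, flow_per_arc).
    """
    adjacency = [[] for _ in range(num_nodes)]
    head, residual, unit_cost = [], [], []
    for u, v, capacity, cost in arcs:
        # Forward arc at an even index, its reverse (residual) arc right after it.
        adjacency[u].append(len(head))
        head.append(v)
        residual.append(capacity)
        unit_cost.append(cost)
        adjacency[v].append(len(head))
        head.append(u)
        residual.append(0)
        unit_cost.append(-cost)

    sent, total_cost = 0, 0
    while sent < demand:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [float("inf")] * num_nodes
        via = [-1] * num_nodes
        dist[source] = 0
        for _ in range(num_nodes - 1):
            for u in range(num_nodes):
                for e in adjacency[u]:
                    if residual[e] > 0 and dist[u] + unit_cost[e] < dist[head[e]]:
                        dist[head[e]] = dist[u] + unit_cost[e]
                        via[head[e]] = e
        if via[sink] == -1:
            raise ValueError("demand exceeds what the network can carry")
        # Walk back from the sink to collect the path, then augment along it.
        path, v = [], sink
        while v != source:
            path.append(via[v])
            v = head[via[v] ^ 1]   # the tail of arc e is the head of its partner arc
        push = min([demand - sent] + [residual[e] for e in path])
        for e in path:
            residual[e] -= push
            residual[e ^ 1] += push
        sent += push
        total_cost += push * dist[sink]
    flows = [residual[2 * i + 1] for i in range(len(arcs))]
    return total_cost, flows


S, A1, ROW2, ROW3, T = range(5)
arcs = [
    (S, A1, 3, 0),      # S -> moved cell: capacity = cell width (3)
    (A1, ROW2, 3, 2),   # arc toward row 2, TD cost 2 per unit (made up)
    (A1, ROW3, 3, 5),   # arc toward row 3, TD cost 5 per unit (made up)
    (ROW2, T, 2, 0),    # row -> T: capacity = WS of row 2
    (ROW3, T, 4, 0),    # row -> T: capacity = WS of row 3
]
total_cost, flows = min_cost_flow(5, arcs, S, T, demand=3)
print(total_cost, flows)   # 9 [3, 2, 1, 2, 1]
```

In this toy case the cheapest solution splits the width of A1 across two rows, which is exactly the kind of non-discrete result that the later slides on illegalities deal with.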

Arc Cost in TIF: Sensitivity-Based Cost

• We define the delay of a net to be the delay from its driver cell to its most critical sink cell, and consider the change in net delay when a cell is moved.

Delay model

[Figure: a net with driver u_d and sinks u_i, u_p and u_q; cell u_i moves to u'_i, changing the driver-to-sink length from l_{d,i} to l'_{d,i}; the total capacitance C'_total is split into two fractions.]

[Equations: delay-change components $D^1(n_j)$, $D^2(n_j)$, $D^3_a(n_j)$ and $D^3_b(n_j)$, expressed in terms of $R_d$, $r$, $c$, $l_{d,i}$, $L(n_j)$, $k$ and $C_g$. The components $D^2(n_j)$ and $D^3_a(n_j)$ apply if u_i is the critical sink or the driver; otherwise $D^2(n_j) = D^3_a(n_j) = 0$. The full expressions are not recoverable.]

Arc cost formulation

• For a cell, we find the most critical nets connected to it (those belonging to the paths with the smallest slack). The unit flow cost cost(e) of an arc e from the cell is computed from the delay changes $D(n_j)$ of these critical nets, their $S(n_j)$ terms, and the arc capacity cap(e).
• From experiments, [value missing] gives the best results.

Tackling Illegalities in TIF

The incremental detailed placement problem is a discrete optimization problem (DOP), so solving it with a continuous optimization method introduces certain illegalities. There are two major problems.

I. Discrete flow requirement in vertical arcs

• A vertical arc represents vertical cell movement by a discrete amount (the distance to the nearest row), so the flow on it should be either full capacity (the cell width) or 0.
• The nw-flow solution may not meet this requirement.
• Resulting placement problems:
  • The full cost of the movement is not incurred in the nw-flow.
  • The cell moved up has a larger area than the nw-flow modeled.

[Figure: initial and resulting placements of cells u, v, w and x over two rows; arcs labeled (capacity, cost) = (7, c3), (7, c2) and (5, c1). A non-discrete flow with f1 = 2 and f2 = 3 gives cell displacements that leave an overlap in the resulting placement.]

Tackling Illegalities in TIF (contd)

Our flow-discretizing solution for vertical arcs is a 3-step process:

• Step 1: Initially, the vertical arc has cap = 1 and cost = full cost, so the full cost is incurred.
• Step 2: After the first unit of flow has passed, cap = original cap - 1 and cost = 0. This encourages flow to keep going through the arc.
• Step 3: After all flow has passed, the cost and capacity of the adjacent horizontal arc are updated to 0.

[Figure: the three steps on the example with cells u, v, w and x. Step 1: vertical arc (1, full-cost), f1 = 1. Step 2: vertical arc (4, 0), f2 = 4. Step 3: horizontal arcs relabeled (inf, 0), horizontal arc cost updated. Final placement: disp(u) = disp(w) = disp(v) = 5.]
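Steps 1 and 2 can be sketched as a small state holder for one vertical arc. This is an illustrative reading of the slide, not the authors' code: the class name `VerticalArc`, its methods and the numeric values are assumptions, and Step 3 (the update of the adjacent horizontal arc) is not modeled.

```python
class VerticalArc:
    """Vertical arc whose full movement cost is charged on the first unit of flow."""

    def __init__(self, cell_width, full_cost):
        self.cell_width = cell_width   # original arc capacity
        self.full_cost = full_cost     # cost of moving the whole cell to the other row
        self.flow = 0

    def offer(self):
        """Return the (capacity, unit_cost) currently exposed to the flow solver."""
        if self.flow == 0:
            return 1, self.full_cost              # Step 1: cap = 1, cost = full cost
        return self.cell_width - self.flow, 0     # Step 2: remaining capacity at cost 0

    def push(self, units):
        """Send `units` of flow and return the cost charged for them."""
        capacity, unit_cost = self.offer()
        if units > capacity:
            raise ValueError("push exceeds the currently offered capacity")
        self.flow += units
        return units * unit_cost


arc = VerticalArc(cell_width=5, full_cost=10.0)
first = arc.push(1)            # the first unit pays the full cost
after_first = arc.offer()      # remaining capacity 4 at zero cost
rest = arc.push(4)             # the rest of the cell width follows for free
print(first, after_first, rest)
```

Because the whole cost is paid up front, a solver has no cost incentive to stop at a partial flow, which is how the scheme pushes the solution toward "full width or nothing" on vertical arcs.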

Tackling Illegalities in TIF (contd)

II. Split flows

• This occurs when there are flows on both the upward and the downward arcs of a cell.
• The two split flows go through tree structures to the sink. Two heuristics choose between them:
  1. Max flow: choose the branch tree with the larger flow.
  2. Min cost: choose the branch tree with the smaller flow cost, looking at the first k levels.
• Our experiments show that the max-flow heuristic does better.

[Figure: cell A1 with arcs (5, c1) and (5, c2) toward cells C21, C22 and C31, C32, carrying split flows f1 = 2 and f2 = 3; the flows continue through Tree1 and Tree2 (via cells such as C12, C23 and C33) to the sink.]
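The two heuristics can be sketched as follows. The data layout is an assumption made for illustration: each branch tree is given as a list of levels, and each level as a list of (flow, unit_cost) arcs. The function names and the numbers are hypothetical.

```python
def pick_by_max_flow(first_arc_flow):
    """first_arc_flow: dict mapping branch name -> flow entering that branch."""
    return max(first_arc_flow, key=first_arc_flow.get)


def pick_by_min_cost(trees, k):
    """trees: dict mapping branch name -> list of levels of (flow, unit_cost) arcs."""
    def cost(levels):
        # Only the first k levels of the tree are examined.
        return sum(flow * unit for level in levels[:k] for flow, unit in level)
    return min(trees, key=lambda name: cost(trees[name]))


flows = {"up": 2, "down": 3}
trees = {
    "up": [[(2, 1.0)], [(2, 4.0)]],
    "down": [[(3, 2.0)], [(1, 1.0), (2, 1.0)]],
}
print(pick_by_max_flow(flows))        # down
print(pick_by_min_cost(trees, k=1))   # up
print(pick_by_min_cost(trees, k=2))   # down
```

The min-cost choice depends on how many levels are examined, whereas the max-flow choice needs only the flows on the two arcs leaving the cell.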

Satisfying White Space Constraints

• Due to the discrete nature of the detailed placement problem, the white space constraint (the maximum row width must not exceed a pre-specified limit) can't be ensured by the nw-flow process.
• Two methods are used to deal with this problem:
  • Dynamic row-size constraint monitoring
  • Push-violation arcs in the next iteration

[Figure: the non-discrete flow example (arcs (7, c3), (7, c2) and (5, c1); f1 = 2, f2 = 3), in which one row ends with WS = 3 and the other with WS = -2, a WS violation.]

Satisfying WS Constraints (contd)

Dynamic WS constraint monitoring

• Monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw flow:
  • Initial flow on a vertical arc: the total cell width is moved to the target row.
  • Fully reversed flow: the total cell width is moved back to the original row.
• To facilitate cell movement, we allow a temporary white space violation in a row for each direction of the flow.
• Once a violation in a direction occurs, no further violations are allowed unless it goes to 0. This is monitored by the top and bottom violation guards Gt and Gb.
• If a violation remains in the row, then a push-violation arc is used in the next iteration. Thrashing is prevented by disallowing reverse movement.

[Figure: a full row receiving cells of widths W = 3, 7, 9 and 4 through min-cost flows, with guards Gb = 0 and Gt = 0 and the net violation tracked per step; a violated row is connected to S.]
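One possible reading of the guard mechanism is sketched below. It is an interpretation, not the authors' implementation: the class `RowMonitor`, its method names, the rule that both guards reset once the row is no longer over-full, and the limit of 7 are assumptions. The idea that the limit equals the maximum cell width and that top and bottom violations are tracked separately comes from the slides.

```python
class RowMonitor:
    """Tracks a row's white space and a violation guard per flow direction."""

    def __init__(self, white_space, viol_max):
        self.white_space = white_space          # negative means the row is over-full
        self.viol_max = viol_max                # assumed limit, e.g. the max cell width
        self.guard = {"top": 0, "bottom": 0}    # open violation caused from each side

    def try_move_in(self, width, direction):
        """Try to move `width` of cell into the row from `direction`."""
        if self.guard[direction] > 0:
            return False                        # an earlier violation is still open
        overflow = max(0, width - max(self.white_space, 0))
        if overflow > self.viol_max:
            return False
        self.white_space -= width
        self.guard[direction] = overflow
        return True

    def move_out(self, width, direction):
        """Reverse flow: cell width leaves the row again."""
        self.white_space += width
        if self.white_space >= 0:
            self.guard = {"top": 0, "bottom": 0}   # no violation left in the row
        else:
            self.guard[direction] = max(0, self.guard[direction] - width)


row = RowMonitor(white_space=2, viol_max=7)
ok_top = row.try_move_in(5, "top")          # accepted: WS = -3, top guard = 3
blocked = row.try_move_in(1, "top")         # rejected: the top violation is still open
ok_bottom = row.try_move_in(2, "bottom")    # accepted: WS = -5, bottom guard = 2
row.move_out(5, "top")                      # WS back to 0, guards cleared
print(ok_top, blocked, ok_bottom, row.white_space, row.guard)
```

Keeping one guard per direction lets a row absorb one temporary overflow from above and one from below without letting violations pile up from the same side.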

Global Network Flow

• The global flow network gives a global view of how flows will generally go.
• With the global flow, we can eliminate detailed-flow arcs that are not likely to carry flow.
• This can greatly reduce the cycles in the detailed nw-flow, thus reducing run time without obvious deterioration of the improvement.

[Figure: global flow graph over Row i-1, Row i (the violated row) and Row i+1, with cells A1 and A2 and a sink; arcs labeled (capacity, cost), e.g. (w(A2), 0), (viol_i, 0) and (w(W_{i+1}), C_{i+1}).]

• $C_{i+1}$ is the probabilistic average of all left-to-right detailed horizontal arc costs in the row.
• $C_{i+1,i}$ is the weighted average of the detailed vertical arc costs between the two rows.
• 65% runtime reduction at the cost of 1-2% timing deterioration.

TIF's High-level Flow

[Flowchart: global nw flow → detailed nw flow (on induced network) → physical flow interpretation → "All new cells placed & all violations fixed?" If no, iterate again; if yes, end.]

Benchmarks

• There are three sets of benchmarks: Ibm, Faraday and TD-Dragon.
• The Ibm and Faraday suites are originally not timing benchmarks; we generate synthetic timing characteristics for them.
• The Ibm circuits don't identify FFs. We determine FFs in cycles and break all cycles with a minimum number of FFs. The average percentage of FFs is 13%.
• Both suites lack resistance and capacitance information for cells and interconnects. We choose typical values of 0.18-micron technology for these parameters.

Benchmark Characteristics

                        Ibm            Faraday        TD-Dragon
# of cells              12506-210341   11734-32622    3093-25616
# of nets               13636-201640   11815-33186    3200-26017
critical path length    21-220         16-25          20-60

Efficacy of TD Arc Costs

Delay improvement (%) after TAN and after detailed placement with each arc-cost choice:

Benchmark    After TAN   TD cost   0 cost   unit cost
ibm03        33.1        30.3      25.9     27.3
ibm09        12.1        8.4       3.6      4.9
ibm08        30.7        28.4      20.2     24.1
ibm15        21.8        18.8      12.7     14.0
ibm18        36.9        34.1      29.0     31.1
Dsp1         25.1        21.3      16.7     17.1
Risc1        24.6        22.2      19.7     19.9
TDMatrix     7.9         4.0       0.5      2.2
TDMac32      12.1        7.7       3.6      5.9
TDmac64      13.9        10.0      6.9      7.8
Geom. avg    19.5        15.23     8.8      11.7
Arith. avg   21.8        18.5      13.9     15.4

• Deterioration relative to global placement (TAN): detailed placement with TD cost 4.3%, with unit cost 7.8%, with 0 cost 10.7%.
• 45% reduction in the deterioration of global placement results by going from unit cost to TD cost.
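The deterioration figures can be re-derived from the geometric averages. This is a sketch under one assumption: the averages are read as percentages whose decimal points were lost (195 as 19.5, 1523 as 15.23, 88 as 8.8, 117 as 11.7). The variable names are illustrative.

```python
# Geometric-average delay improvement (%), with decimal points assumed as described above.
geom_avg = {"TAN": 19.5, "TD cost": 15.23, "unit cost": 11.7, "0 cost": 8.8}

# Deterioration = improvement after global placement minus improvement after detailed placement.
deterioration = {
    name: round(geom_avg["TAN"] - value, 2)
    for name, value in geom_avg.items()
    if name != "TAN"
}

# Relative reduction in deterioration when going from unit cost to TD cost.
reduction = 1 - deterioration["TD cost"] / deterioration["unit cost"]

print(deterioration)
print(f"{reduction:.0%}")   # 45%
```

The differences come out at about 4.3, 7.8 and 10.7 percentage points, and the relative reduction at about 45%, matching the figures quoted in the bullets.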

Final Results

[Chart: delay improvement for ibm benchmarks; initial placement is WL-driven (Dragon).]
[Chart: delay improvement for Faraday benchmarks; initial placement is WL-driven (Dragon).]
[Chart: delay improvement for TD-Dragon benchmarks placed by Dragon (with cell delay).]
[Chart: delay improvement for TD-Dragon benchmarks placed by Dragon (no cell delay).]

Page 13: A NetworkFlow Approach to TimingDriven Incremental Placement for ASICs

Tackling Illegalities in TIF

The incremental detailed placement problem is a discrete optimization problem (DOP), so solving it with a continuous optimization method introduces certain illegalities. There are two major problems.

I. Discrete flow requirement in vertical arcs
• A vertical arc represents vertical cell movement by a discrete amount (the distance to the nearest row).
• Flow on it should be either full capacity (the cell width) or 0.
• The nw-flow solution may not meet this requirement.

Resulting placement problems
• The full cost of movement is not incurred in the nw-flow.
• The cell moved up has a larger area than the nw-flow modeled.

[Figure: initial placement and resulting placement of cells w, u, x, v under non-discrete flow. Arc labels: (7, c3), (7, c2), (5, c1); flows f1=2, f2=3; width labels w(v)=7 and w(v)=5; displacements disp(w)=5, disp(u)=5, disp(v)=2, disp(w)=3; the resulting placement shows an overlap.]
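The discreteness requirement can be illustrated with a small check. This is only a sketch: the arc names and the `(flow, capacity)` representation are assumptions made for the example, not part of TIF.

```python
def is_discrete(flow, capacity, tol=1e-9):
    """True if the flow on a vertical arc is either 0 or the full capacity."""
    return abs(flow) <= tol or abs(flow - capacity) <= tol


def illegal_vertical_arcs(arcs):
    """Return the names of vertical arcs that carry a fractional flow.

    `arcs` maps a hypothetical arc name to a (flow, capacity) pair, where
    the capacity stands for the width of the cell that would move.
    """
    return [name for name, (flow, cap) in arcs.items()
            if not is_discrete(flow, cap)]


# Hypothetical arcs: "v_up" carries 2 of 5 units, like f1=2 on the (5, c1) arc.
arcs = {"v_up": (2, 5), "u_up": (5, 5), "x_up": (0, 7)}
print(illegal_vertical_arcs(arcs))  # only "v_up" is fractional
```

An arc flagged here corresponds to a cell that the continuous solution moved only partially, which is the case the slide describes as illegal.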

Tackling Illegalities in TIF (contd)

Our flow-discretizing solution for the vertical arc is a 3-step process:
• Step 1: Initially, the vertical arc has cap = 1 and cost = full cost, so the full cost is incurred.
• Step 2: After the first 1 unit of flow is passed, cap = original cap - 1 and cost = 0. This encourages flow to keep going through the arc.
• Step 3: After all flow is passed, the cost and capacity of the adjacent horizontal arc are updated to 0.

[Figure: the three steps and the final placement for cells w, u, x, v, with arcs labeled (capacity, cost). Step 1: horizontal arcs (7, c3), (7, c2); vertical arc (1, full-cost), f1=1. Step 2: vertical arc (4, 0), f2=4. Step 3: horizontal arcs relabeled (inf, 0). Width labels w(v)=7 and w(v)=5. Final placement: disp(u)=5, disp(w)=5, disp(v)=5.]
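A minimal sketch of why Steps 1 and 2 make the full movement cost visible to the flow solver. The function names are hypothetical, and Step 3 (updating the adjacent horizontal arc) is not modeled here.

```python
def linear_cost(flow, unit_cost):
    """Cost seen by an ordinary arc: proportional to the flow pushed."""
    return flow * unit_cost


def discretized_cost(flow, cell_width, full_cost):
    """Cost charged by the two-stage vertical arc described in the steps.

    Step 1: the first unit of flow uses an arc with cap=1 and cost=full cost.
    Step 2: the remaining cell_width - 1 units use an arc with cost 0.
    """
    if not 0 <= flow <= cell_width:
        raise ValueError("flow must lie in [0, cell_width]")
    first_unit = min(flow, 1)
    return first_unit * full_cost


# Hypothetical numbers: cell width 5, full movement cost 10.
print(linear_cost(2, 10 / 5))        # partial move is under-charged
print(discretized_cost(1, 5, 10))    # full cost already after one unit
print(discretized_cost(5, 5, 10))    # pushing the rest adds nothing
```

Once the first unit has paid the full cost, the remaining units are free, so a min-cost solver has no reason to stop at a fractional flow.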

Tackling Illegalities in TIF (contd)

II. Split flows
• This occurs when there are flows on both upward and downward arcs.
• The two split flows each go through a tree structure to the sink. There are two heuristics to solve the problem:
  1. Max flow: we choose the branch tree with the larger flow.
  2. Min cost: we choose the branch tree with the smaller flow cost, looking at the first k levels.
• Our experiments show that the max-flow heuristic does better.

[Figure: cell A1 with two vertical arcs labeled (5, c1), f1=2, and (5, c2), f2=3; the flows f1 and f2 continue through Tree1 and Tree2 (nodes C12, C21, C22, C23, C31, C32, C33, ...).]
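The two heuristics can be sketched as follows. The branch records and their per-level costs are made-up illustration data, and summing one cost per level is an assumption about how "flow cost over the first k levels" might be evaluated.

```python
def pick_max_flow(branches):
    """Max-flow heuristic: keep the branch tree carrying the larger flow."""
    return max(branches, key=lambda b: b["flow"])["name"]


def pick_min_cost(branches, k):
    """Min-cost heuristic: keep the branch whose first k levels cost least."""
    return min(branches, key=lambda b: sum(b["level_costs"][:k]))["name"]


# Hypothetical split at a cell: f1=2 goes into Tree1, f2=3 into Tree2.
branches = [
    {"name": "Tree1", "flow": 2, "level_costs": [4.0, 1.0, 6.0]},
    {"name": "Tree2", "flow": 3, "level_costs": [3.0, 5.0, 0.5]},
]

print(pick_max_flow(branches))       # Tree2 carries the larger flow
print(pick_min_cost(branches, k=2))  # Tree1 is cheaper over two levels
```

The two rules can disagree, as they do on this data, which is why the slide compares them experimentally.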

Satisfying White Space Constraints

Due to the discrete nature of the detailed placement problem, the white space (WS) constraint (the max row width does not exceed a pre-specified limit) can't be ensured by the nw-flow process.

Two methods are used to deal with this problem:
• Dynamic row size constraint monitoring
• Push-violation arcs in the next iteration

[Figure: non-discrete flow example with cells w, u, x, v; arc labels (7, c3), (7, c2), (5, c1); flows f1=2, f2=3; disp(w)=5, disp(u)=5; rows labeled WS=3 and WS=-2, the latter marked as a WS violation.]
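As a rough illustration of the constraint itself, the white space of a row can be computed as its width limit minus the total width of its cells. The row names, limits and cell widths below are hypothetical, chosen so that the results match the WS=3 and WS=-2 labels.

```python
def white_space(row_limit, cell_widths):
    """White space left in a row: width limit minus total cell width."""
    return row_limit - sum(cell_widths)


def violated_rows(rows):
    """Names of rows whose total cell width exceeds the limit (WS < 0)."""
    return [name for name, (limit, widths) in rows.items()
            if white_space(limit, widths) < 0]


# Hypothetical rows given as (width limit, cell widths).
rows = {
    "upper": (20, [7, 5, 5]),  # WS = 3
    "lower": (10, [7, 5]),     # WS = -2, a violation
}
print({name: white_space(*row) for name, row in rows.items()})
print(violated_rows(rows))
```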

Satisfying WS Constraints (contd)

Dynamic WS constraint monitoring
• Monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw-flow.
  - Initial flow on a vertical arc: the total cell width is moved to the target row.
  - Fully reversed flow: the total cell width is moved back to the original row.
• To facilitate cell movement, we allow a temporary white space violation in a row for each direction of the flow.
• Once a violation in a direction occurs, no further violations are allowed unless it goes to 0. This is monitored by the top and bottom violation guards, Gt and Gb.

If a violation remains in the row
• Push-violation arc in the next iteration.
• Thrashing is prevented by disallowing reverse movement.

[Figure: rows with cell widths W=3, W=7, W=9, W=4 linked by min-cost flow steps; guards Gb = 0 and Gt = 0 on a full row; flow labels 4 and -5; "Net viol = 0", "4", "-1"; a violated row connected to S; branch label "Otherwise".]

Global Network Flow

• The global flow network gives a global view of how flows will generally go.
• With the global flow, we can eliminate detailed-flow arcs that are not likely to carry flow.
• This can greatly reduce the number of cycles in the detailed nw-flow, reducing runtime without obvious deterioration in improvement.
• C_{i+1} is the probabilistic average of all left-to-right detailed horizontal arc costs in the row.
• C_{i+1,i} is the weighted average of the detailed vertical arc costs between two rows.
• 65% runtime reduction at the cost of 1-2% timing deterioration.

[Figure: global flow network over Row i-1, Row i and Row i+1, with nodes A1, A2, a violated row and a sink; arc labels include (w(A2), 0) and (viol_i, 0); the remaining arc labels pair a width w(...) with the costs C_{i+1} and C_{i+1,i}.]
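A hedged sketch of the two ideas on this slide: aggregating detailed arc costs into one global arc cost, and dropping detailed arcs between row pairs that receive no global flow. The slide does not define the weights, so using capacities as weights is an assumption, and the tuple layout of the arcs is hypothetical.

```python
def weighted_average_cost(arcs):
    """Weighted average of detailed arc costs.

    `arcs` is a list of (weight, cost) pairs; taking the arc capacity as the
    weight is an assumption made for this example.
    """
    total = sum(weight for weight, _ in arcs)
    if total == 0:
        raise ValueError("at least one arc must have non-zero weight")
    return sum(weight * cost for weight, cost in arcs) / total


def prune_detailed_arcs(detailed_arcs, global_flow):
    """Keep only detailed arcs whose row pair carries flow in the global network.

    Each detailed arc is a (src_row, dst_row, name) tuple; `global_flow` maps
    (src_row, dst_row) to the flow found on the corresponding global arc.
    """
    return [arc for arc in detailed_arcs
            if global_flow.get((arc[0], arc[1]), 0) > 0]


detailed = [(1, 2, "a"), (1, 0, "b"), (2, 3, "c")]
flow = {(1, 2): 5, (1, 0): 0}
print(weighted_average_cost([(5, 2.0), (7, 4.0)]))
print(prune_detailed_arcs(detailed, flow))  # only the arc from row 1 to row 2 survives
```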

TIF's High-level Flow

[Flowchart: Global nw flow -> Detailed nw flow (on induced network) -> Physical flow interpretation -> "All new cells placed & all viol fixed?" If yes, end; if no, iterate again.]
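The flowchart can be read as a simple driver loop. The sketch below takes the four stages as caller-supplied functions, since their internals are not given here. It assumes the "No" branch returns to the global nw-flow step, and all names are hypothetical.

```python
def run_tif(global_flow, detailed_flow, interpret, is_done, max_iters=100):
    """Iterate the three stages until every new cell is placed and legal.

    All four callables are placeholders supplied by the caller; looping back
    to the global flow step on "No" is an assumption about the flowchart.
    """
    for iteration in range(1, max_iters + 1):
        induced_network = global_flow()          # global nw flow
        flow = detailed_flow(induced_network)    # detailed nw flow on induced network
        interpret(flow)                          # physical flow interpretation
        if is_done():                            # all cells placed, all viol fixed?
            return iteration
    raise RuntimeError("no legal placement within max_iters iterations")


# Toy stand-ins: each pass places one of three unplaced cells.
state = {"unplaced": 3}
iters = run_tif(
    global_flow=lambda: "induced-network",
    detailed_flow=lambda network: 1,
    interpret=lambda placed: state.update(unplaced=state["unplaced"] - placed),
    is_done=lambda: state["unplaced"] == 0,
)
print(iters)
```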

Benchmarks

• There are three sets of benchmarks: Ibm, Faraday and TD-Dragon.
• The Ibm and Faraday suites were not originally timing benchmarks; we generate synthetic timing characteristics for them.
• The Ibm circuits don't identify FFs. We determine FFs in cycles and break all cycles with a minimum number of FFs. The average percentage of FFs is 13%.
• Neither suite has information on the resistance and capacitance of cells and interconnects. We choose typical values for 0.18-micron technology for these parameters.

Benchmark Characteristics

                        Ibm             Faraday        TD-Dragon
# of cells              12506-210341    11734-32622    3093-25616
# of nets               13636-201640    11815-33186    3200-26017
critical path length    21-220          16-25          20-60

Efficacy of TD Arc Costs

Benchmark    After TAN    TD cost    0 cost    unit cost
ibm03        33.1         30.3       25.9      27.3
ibm09        12.1         8.4        3.6       4.9
ibm08        30.7         28.4       20.2      24.1
ibm15        21.8         18.8       12.7      14.0
ibm18        36.9         34.1       29.0      31.1
Dsp1         25.1         21.3       16.7      17.1
Risc1        24.6         22.2       19.7      19.9
TDMatrix     7.9          4.0        0.5       2.2
TDMac32      12.1         7.7        3.6       5.9
TDmac64      13.9         10.0       6.9       7.8
Geom. avg    19.5         15.23      8.8       11.7
Arith. avg   21.8         18.5       13.9      15.4

• Global place (TAN) -> detailed place (TD cost): deterioration 4.3%
• Global place (TAN) -> detailed place (unit cost): deterioration 7.8%
• Global place (TAN) -> detailed place (0 cost): deterioration 10.7%
• 45% reduction in the deterioration of global place results by going from unit cost to TD cost
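The averages and the 45% figure can be re-derived from the table. The sketch assumes that the extracted values lost their decimal points (331 read as 33.1, 05 as 0.5). Under that reading the arithmetic means reproduce the averages row, and the three deteriorations match the differences between the geometric averages.

```python
from statistics import mean

# Values per benchmark, in slide column order. Decimal points are an
# assumption about the garbled extraction, not confirmed by the source.
COLUMNS = ["after TAN", "TD cost", "0 cost", "unit cost"]
TABLE = {
    "ibm03": [33.1, 30.3, 25.9, 27.3],
    "ibm09": [12.1, 8.4, 3.6, 4.9],
    "ibm08": [30.7, 28.4, 20.2, 24.1],
    "ibm15": [21.8, 18.8, 12.7, 14.0],
    "ibm18": [36.9, 34.1, 29.0, 31.1],
    "Dsp1": [25.1, 21.3, 16.7, 17.1],
    "Risc1": [24.6, 22.2, 19.7, 19.9],
    "TDMatrix": [7.9, 4.0, 0.5, 2.2],
    "TDMac32": [12.1, 7.7, 3.6, 5.9],
    "TDmac64": [13.9, 10.0, 6.9, 7.8],
}


def arithmetic_averages(table):
    """Column-wise arithmetic mean, rounded to one decimal place."""
    return [round(mean(row[i] for row in table.values()), 1)
            for i in range(len(COLUMNS))]


def deterioration_reduction(det_from, det_to):
    """Percent reduction in deterioration when going from det_from to det_to."""
    return 100.0 * (det_from - det_to) / det_from


print(dict(zip(COLUMNS, arithmetic_averages(TABLE))))
# Unit cost loses 7.8 points and TD cost 4.3 points relative to TAN.
print(round(deterioration_reduction(7.8, 4.3)))
```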

Final Results

[Charts: delay improvement for ibm benchmarks, initial placement WL-driven (Dragon); delay improvement for Faraday benchmarks, initial placement WL-driven (Dragon); delay improv for TD-Dragon benchmarks placed by Dragon (cell delay) and (no cell delay). Value labels on the charts: 19.7, 24.2, 20.6, 24.3, 4.5, 3.7.]

[Chart: delay improvement on TD-Dragon placement for different WS constraints; benchmarks matrix, vp2, mac32, mac64, avg; series 3, 5, 10 and TAN; vertical axis 0 to 12.]

• [Wonjoon & Bazargan, ICCAD'03] achieves an avg of 2.8% improv with 5% WS
• For 5% WS, our improvement is 4.2% (50% relative improvement)

Final Results (contd)

[Charts: only the value labels survive: 8.2, 12.0, 19.6, 24.1, 6.2, 10.2, 3.8, 4.5, 4.0.]

Empirical Asymptotic Time Complexity

[Plot: runtime vs. cells; x-axis 0 to 250000, y-axis 0 to 1600; fit y = 0.0057x + 266.51. A linear curve best fits the data.]

[Plot: runtime vs. movable cells; x-axis 0 to 2500, y-axis 0 to 1800; fit y = 0.6857x + 51.48. A linear curve best fits the data.]

• Runtime is 18% of Dragon and 12% of TD-Dragon
• Obtains a soln for a 210K-cell cct (ibm18) with 34% improv in 24 mins
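As a sanity check on the first fitted line, the sketch below evaluates it at the largest Ibm circuit size. It assumes the coefficients lost their decimal points in extraction (0.0057 and 266.51) and that runtime is in seconds; under those assumptions the prediction is consistent with the reported 24 minutes.

```python
def predicted_runtime(num_cells, slope=0.0057, intercept=266.51):
    """Runtime predicted by the linear fit y = slope * x + intercept.

    The decimal placement of the coefficients and the unit (seconds) are
    assumptions; the slide text does not state them explicitly.
    """
    return slope * num_cells + intercept


seconds = predicted_runtime(210341)  # largest Ibm cell count in the benchmark table
print(round(seconds, 2), "s, about", round(seconds / 60, 1), "min")
```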

Conclusions

• Proposed a TD incremental placement flow, FlowPlace
  - Global and detailed incremental placer
  - New accurate pre-route net delay models
  - Can optimize both the quadratic and the linear delay terms in the global placer
  - TD nw-flow to solve detailed TD placement: sensitivity-based TD arc costs, constraint satisfaction (e.g., WS), discretization of illegal continuous solns, global nw-flow graph
• Promising results
  - Delay improv up to 34% for a 210K-cell WL-opt layout in 24 mins
  - Delay improv up to 10% for a 26K-cell TD-opt layout in just above 5 mins
  - The average delay improvement is 18.34%
  - The WL deterioration is an average of 8%
  - The average run time is only 12-18% of original placement runtime
• TD-IBM benchmarks and placed outputs avail at the FlowPlace page wwweceuicedu~duttbenchmarks-etcFlowPlaceflowhtml
• Concepts can be extended to timing and power optimization with constraints and physical re-synthesis

Satisfying White Space Constraints

Dynamic WS constraint monitoring
• We monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw-flow.
  - Initial flow on a vertical arc: the total cell width is moved to the target row.
  - Fully reversed flow: the total cell width is moved back to the original row.
• To facilitate cell movement, we allow temporary white space violation under constraints.
• Viol_max = max cell width.
• Violations from above and below are calculated separately.
• Because of the flow allowed in step two, due to the separate violation limits for flow from above and below, we can finally legalize the placement.

[Figure: three-step example with cells u, v, x and a sink. Labels include W=5, W=7; WS=2, WS=-3, WS=-2, WS=-5, WS=5, WS=0; vio_top=0, vio_top=3, vio_bot=0, vio_bot=2; the final state shows vio_top=0, vio_bot=0 and WS=0.]
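One possible reading of the separate top and bottom violation limits is sketched below. The `Row` class, its method names and the exact acceptance rule are assumptions made for illustration. The numbers are chosen to reproduce the vio_top=3, vio_bot=2 and WS=-5 labels.

```python
class Row:
    """A placement row that tracks violations per flow direction.

    Illustrative model only: a move may overfill the row once per direction,
    by at most viol_max, and no further violation from that direction is
    accepted while its guard is non-zero.
    """

    def __init__(self, limit, used, viol_max):
        self.limit = limit          # max total cell width allowed in the row
        self.used = used            # total cell width currently in the row
        self.viol_max = viol_max    # Viol_max = max cell width
        self.vio = {"top": 0, "bottom": 0}

    @property
    def white_space(self):
        return self.limit - self.used

    def try_receive(self, width, direction):
        """Accept a cell of `width` arriving from `direction` if the guards allow it."""
        overflow = min(width, max(0, self.used + width - self.limit))
        if overflow > 0:
            if self.vio[direction] > 0 or overflow > self.viol_max:
                return False
            self.vio[direction] = overflow
        self.used += width
        return True


row = Row(limit=10, used=8, viol_max=7)      # WS = 2
first = row.try_receive(5, "top")            # accepted: vio_top = 3, WS = -3
second = row.try_receive(2, "top")           # rejected: top guard is non-zero
third = row.try_receive(2, "bottom")         # accepted: vio_bot = 2, WS = -5
print(first, second, third, row.vio, row.white_space)
```

Tracking the two directions separately lets a cell enter from below even while the row is still overfilled from above, which is the extra freedom the last bullet credits for reaching a legal placement.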

  • A NetworkshyFlow Approach to TimingshyDriven Incremental Placement for ASICs 1048576
  • Outline
  • Motivation
  • Prior Work
  • Prior Work (cont)
  • Our Goals amp Methodology
  • WL and Pre-Route Delay Model
  • TD Analytical Global Placement (TAN)
  • TD Analytical Global Placement (contd)
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Empirical Asymptotic Time Complexity
  • Conclusions
  • Slide 25
Page 17: A NetworkFlow Approach to TimingDriven Incremental Placement for ASICs

Satisfying WS Constraints (contd) Dynamic WS constraint monitoring

Monitor total cell width in each row after every ndashve cycle-based iter improv of nw flow

Initial flow on vertical arc the total cell width is moved to target row Fully reverse flow the total cell width is moved back to orig row To facilitate cell movement we allow temporary white space violati

on in a row for each direction of the flow Once a viol in a direction occurs no further are allowed unless it g

oes to 0 Monitored by top and bottom viol guards Gb and Gt

If violation remains in the row then Push violation arc in the next iteration Thrashing prevented by dis

allowing reverse movement

W=3

W=7

W=9

W=4Min-costflow

Min-costflow

Gb = 0

Gt = 0

Full row

4

-5

Net viol = 0 4 -1

Violated row

S

Otherwise

Page 18: A Network-Flow Approach to Timing-Driven Incremental Placement for ASICs

Global Network Flow
  • The global flow network gives a global view of how flows will generally go
  • With the global flow, detailed-flow arcs that are unlikely to carry flow can be eliminated
  • This greatly reduces the number of cycles in the detailed network flow, cutting runtime without an obvious loss in improvement
  • 65% runtime reduction at the cost of 1-2% timing deterioration

[Figure: global flow network over Row i-1, Row i and Row i+1 with a violated row, nodes A1, A2 and Sink, and arc labels such as (w(A2), 0), (viol_i, 0) and (w(Wi+1), Ci+1).]
  • Ci+1 is the probabilistic average of all left-to-right detailed horizontal arc costs in the row
  • Ci+1,i is the weighted average of the detailed vertical arc costs between the two rows

TIF's High-level Flow
[Flowchart: Global nw flow, then Detailed nw flow (on induced network), then Physical flow interpretation, then the check "All new cells placed & all violations fixed?" (No: repeat; Yes: End).]
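The pruning idea can be sketched as follows. This is a rough sketch under assumptions, not the authors' code: the arc dictionaries, the choice of arc capacity as the averaging weight, and the rule "keep a vertical arc only if its row pair carries positive global flow" are all hypothetical stand-ins for details the slide does not give.

```python
from collections import defaultdict


def global_vertical_costs(detailed_arcs):
    """Capacity-weighted average cost for every (src_row, dst_row) pair."""
    weighted_sum = defaultdict(float)
    total_cap = defaultdict(float)
    for arc in detailed_arcs:
        if arc["src_row"] == arc["dst_row"]:
            continue  # horizontal arcs are not part of this average
        key = (arc["src_row"], arc["dst_row"])
        weighted_sum[key] += arc["cost"] * arc["cap"]
        total_cap[key] += arc["cap"]
    return {k: weighted_sum[k] / total_cap[k] for k in weighted_sum if total_cap[k] > 0}


def prune_detailed_arcs(detailed_arcs, global_flow):
    """Keep horizontal arcs, plus vertical arcs whose row pair carries global flow."""
    kept = []
    for arc in detailed_arcs:
        pair = (arc["src_row"], arc["dst_row"])
        if pair[0] == pair[1] or global_flow.get(pair, 0) > 0:
            kept.append(arc)
    return kept


arcs = [
    {"src_row": 1, "dst_row": 2, "cost": 2.0, "cap": 1.0},
    {"src_row": 1, "dst_row": 2, "cost": 4.0, "cap": 3.0},
    {"src_row": 2, "dst_row": 3, "cost": 1.0, "cap": 2.0},
    {"src_row": 2, "dst_row": 2, "cost": 0.5, "cap": 5.0},
]
costs = global_vertical_costs(arcs)
kept = prune_detailed_arcs(arcs, global_flow={(1, 2): 4.0})
```

With fewer arcs in the detailed network there are fewer candidate cycles to examine, which is the source of the runtime saving the slide reports.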

Page 19: A Network-Flow Approach to Timing-Driven Incremental Placement for ASICs

Benchmarks
  • There are three sets of benchmarks: IBM, Faraday and TD-Dragon
  • The IBM and Faraday suites were not originally timing benchmarks; we generate synthetic timing characteristics for them
      • The IBM circuits don't identify FFs. We determine FFs in cycles and break all cycles with a minimum number of FFs. The average percentage of FFs is 13%
      • Neither suite has resistance and capacitance information for cells and interconnects. We choose typical values of a 0.18-micron technology for these parameters

Benchmark Characteristics

                        IBM             Faraday        TD-Dragon
# of cells              12506-210341    11734-32622    3093-25616
# of nets               13636-201640    11815-33186    3200-26017
critical path length    21-220          16-25          20-60

Page 20: A Network-Flow Approach to Timing-Driven Incremental Placement for ASICs

Efficacy of TD Arc Costs

Delay improvement (%) after global placement (TAN) and after detailed placement with each arc-cost type:

Benchmark     After TAN   TD cost   0 cost   unit cost
ibm03         33.1        30.3      25.9     27.3
ibm09         12.1         8.4       3.6      4.9
ibm08         30.7        28.4      20.2     24.1
ibm15         21.8        18.8      12.7     14.0
ibm18         36.9        34.1      29.0     31.1
Dsp1          25.1        21.3      16.7     17.1
Risc1         24.6        22.2      19.7     19.9
TDMatrix       7.9         4.0       0.5      2.2
TDMac32       12.1         7.7       3.6      5.9
TDmac64       13.9        10.0       6.9      7.8
Geom. avg     19.5        15.23      8.8     11.7
Arith. avg    21.8        18.5      13.9     15.4

  • Deterioration of the global placement (TAN) result after detailed placement: 4.3% with TD cost, 7.8% with unit cost, 10.7% with 0 cost
  • 45% reduction in the deterioration of the global placement results by going from unit cost to TD cost
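The averages and deterioration figures on this slide can be re-derived from the per-benchmark rows. The sketch below assumes the table entries are percentages with one decimal place (the decimal points are missing in the extracted text) and that "deterioration" means the drop in the geometric average relative to the After-TAN column; under those assumptions the results agree with the slide's summary numbers to within rounding.

```python
import math

columns = ("After TAN", "TD cost", "0 cost", "unit cost")

# Per-benchmark delay improvement in %, in the column order above.
table = {
    "ibm03":    (33.1, 30.3, 25.9, 27.3),
    "ibm09":    (12.1, 8.4, 3.6, 4.9),
    "ibm08":    (30.7, 28.4, 20.2, 24.1),
    "ibm15":    (21.8, 18.8, 12.7, 14.0),
    "ibm18":    (36.9, 34.1, 29.0, 31.1),
    "Dsp1":     (25.1, 21.3, 16.7, 17.1),
    "Risc1":    (24.6, 22.2, 19.7, 19.9),
    "TDMatrix": (7.9, 4.0, 0.5, 2.2),
    "TDMac32":  (12.1, 7.7, 3.6, 5.9),
    "TDmac64":  (13.9, 10.0, 6.9, 7.8),
}


def arith_mean(values):
    return sum(values) / len(values)


def geom_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


arith = {name: arith_mean([row[i] for row in table.values()]) for i, name in enumerate(columns)}
geom = {name: geom_mean([row[i] for row in table.values()]) for i, name in enumerate(columns)}

# Drop in the geometric average relative to the global-placement result.
deterioration = {name: geom["After TAN"] - geom[name] for name in columns[1:]}

# Fraction of the unit-cost deterioration avoided by using TD arc costs.
reduction = 1 - deterioration["TD cost"] / deterioration["unit cost"]
```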

Page 21: A Network-Flow Approach to Timing-Driven Incremental Placement for ASICs

Final Results

[Chart: Delay improvement for ibm benchmarks; initial placement WL-driven (Dragon)]

[Chart: Delay improvement for Faraday benchmarks; initial placement WL-driven (Dragon)]

Page 22: A Network-Flow Approach to Timing-Driven Incremental Placement for ASICs

Final Results (contd)

[Chart: Delay improvement for TD-Dragon benchmarks placed by Dragon (cell delay)]

[Chart: Delay improvement for TD-Dragon benchmarks placed by Dragon (no cell delay)]

[Chart: Delay improvement on TD-Dragon placement for different WS constraints; benchmarks matrix, vp2, mac32, mac64 and avg; legend entries 3, 5, 10 and TAN]

  • [Wonjoon & Bazargan, ICCAD'03] achieves an average improvement of 2.8% with 5% WS
  • For 5% WS, our improvement is 4.2% (a 50% relative improvement)

Page 23: A Network-Flow Approach to Timing-Driven Incremental Placement for ASICs

Empirical Asymptotic Time Complexity

[Plot: runtime vs. number of cells (0 to 250000); fitted line y = 0.0057x + 266.51; linear curve best fits data]

[Plot: runtime vs. number of movable cells (0 to 2500); fitted line y = 0.6857x + 51.48; linear curve best fits data]

  • Runtime is 18% of Dragon and 12% of TD-Dragon
  • Obtains a solution for a 210K-cell circuit (ibm18) with 34% improvement in 24 mins
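As a quick consistency check, the first fitted line can be evaluated at the size of the largest circuit. The sketch reads the fit as y = 0.0057x + 266.51 and assumes that runtime is measured in seconds (the axis unit is not recoverable from the slide) and that ibm18 is the 210341-cell circuit at the top of the IBM range in the benchmark table; the function name is illustrative.

```python
def fitted_runtime(num_cells, slope=0.0057, intercept=266.51):
    """Runtime predicted by the linear fit (assumed to be in seconds)."""
    return slope * num_cells + intercept


ibm18_cells = 210341  # assumed cell count of ibm18 (largest IBM circuit in the table)
seconds = fitted_runtime(ibm18_cells)
minutes = seconds / 60
```

Under these assumptions the fit predicts a little over 24 minutes, in line with the 24 mins quoted for ibm18.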

Page 24: A Network-Flow Approach to Timing-Driven Incremental Placement for ASICs

Conclusions
  • Proposed a TD incremental placement flow, FlowPlace
      • Global and detailed incremental placers
      • New, accurate pre-route net delay models
      • Can optimize both the quadratic and the linear delay terms in the global placer
      • TD network flow to solve detailed TD placement:
          • sensitivity-based TD arc costs
          • constraint satisfaction (e.g., WS)
          • discretization of illegal continuous solutions
          • global network-flow graph
  • Promising results
      • Delay improvement of up to 34% for a 210K-cell WL-optimized layout in 24 mins
      • Delay improvement of up to 10% for a 26K-cell TD-optimized layout in just above 5 mins
      • The average delay improvement is 18.34%
      • The WL deterioration averages 8%
      • The average runtime is only 12-18% of the original placement runtime
  • TD-IBM benchmarks and placed outputs are available at the FlowPlace page: wwweceuicedu~duttbenchmarks-etcFlowPlaceflowhtml
  • The concepts can be extended to timing and power optimization with constraints, and to physical re-synthesis

Page 25: A Network-Flow Approach to Timing-Driven Incremental Placement for ASICs

Satisfying white space constraints

Dynamic WS constraint monitoring
  • We monitor the total cell width in each row after every negative-cycle-based iterative improvement of the network flow
      • Initial flow on a vertical arc: the total cell width is moved to the target row
      • Fully reversed flow: the total cell width is moved back to the original row
  • To facilitate cell movement, we allow temporary white space violations under constraints
      • Viol_max = max cell width
      • Violations from above and below are calculated separately
  • Because flow is allowed in step two, thanks to the separate violation limits for flow from above and from below, we can finally legalize the placement

[Figure: three-step example on nodes u, v, x and Sink with cells of widths W=5 and W=7. The counters go from vio_top=3, vio_bot=0 to vio_top=3, vio_bot=2 and finally to vio_top=0, vio_bot=0, while the row white space goes from WS=2 through WS=-3 and WS=-5 back to WS=0.]
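The per-direction limit can be illustrated with a small acceptance check. This is a sketch under assumptions rather than the authors' procedure: the function `can_accept`, its return convention, and the reading of the figure (a cell of width 5 entering a row with WS=2 from the top, then a move of width 2 from the bottom) are hypothetical interpretations of the slide.

```python
def can_accept(width, ws, vio_dir, viol_max):
    """Check a cell of `width` entering a row from one direction.

    ws       -- current white space of the row (negative means overfull)
    vio_dir  -- violation already charged to this direction (vio_top or vio_bot)
    viol_max -- per-direction limit (the slide sets it to the max cell width)
    Returns (accepted, new_ws, new_vio_dir).
    """
    new_ws = ws - width
    added = max(0, -new_ws) - max(0, -ws)  # extra overflow this move creates
    if vio_dir + added > viol_max:
        return False, ws, vio_dir
    return True, new_ws, vio_dir + added


VIOL_MAX = 7  # assumed: widest cell in the example

step1 = can_accept(5, ws=2, vio_dir=0, viol_max=VIOL_MAX)   # from the top
step2 = can_accept(2, ws=-3, vio_dir=0, viol_max=VIOL_MAX)  # from the bottom
blocked = can_accept(6, ws=-3, vio_dir=3, viol_max=VIOL_MAX)  # top limit exceeded
```

Because each direction is charged only for the overflow it adds, the second move is accepted even though the row is already overfull from the first.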
