1 Modeling and Optimization of VLSI Interconnect 049031 Lecture 6: Interconnect power Avinoam...


Modeling and Optimization of VLSI Interconnect 049031

Lecture 6: Interconnect power

Avinoam Kolodny, Konstantin Moiseev

2

Outline

• Interconnect power modeling
  o Definition
  o Activity factor (AF) and signal probability (SP) and the relations between them
  o Cross-coupling power
  o Miller Coupling Factor for timing and power
  o Relation between MCF and AF
  o AF and SP generation
• Interconnect power breakdown
  o Interconnect length distribution
  o Local and global interconnects and their power
  o Clock power
  o Interconnect power as part of total power
• Interconnect power prediction
  o Interconnect length prediction: Rent’s rule, Donath’s model
  o Fanout prediction

3

1 Google search = ?

• Uses the same energy as an 11-watt light bulb running for 1 hour
• Emits 7 g of CO2

There are 0.4B Google searches daily

Adapted from Muhammad Abozaed, Intel

4

So why is power important?

• Mobile: battery life
• Reliability: power density
• User experience: skin temperature
• Servers: cooling costs, environmental heating

[Figure: processor power (Watts, log scale from 0.1 to 1000) vs. year (1970 to 2020) for the 8080, 8086, 386, Pentium® and Pentium® 4 processors, with the extrapolation labeled "1000's of Watts?"]

5

Electrical Energy

• Energy is defined as the ability to do work
• Electrical energy is energy stored in an electric field or transported by an electric current
• Electrical energy can be:
  o Dissipated as heat by an electric current flowing through a resistor
  o Stored in a capacitor
  o Transformed to magnetic field energy
• The work performed by a current $I$ on a section with voltage difference $V$ during time $T$ is:

$$E = \int_0^T I(t)\,V(t)\,dt$$

6

Power

• Power is the work performed per unit time, measured in Watts
• In VLSI, power is usually either consumed or dissipated:
  o Consumed from the source
  o Dissipated by resistors (converted to heat)
• The average power dissipated by a current $I$ on a section with voltage difference $V$ during time $T$ is:

$$P = \frac{1}{T}\int_0^T I(t)\,V(t)\,dt$$

7

Power dissipation sources

[Figure: power dissipation splits into dynamic power, static power, and short-circuit power]

8

Energy dissipation in RC circuit

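First stage: charging the capacitor

[Figure: series RC circuit with supply VDD, resistor R (voltage VR), capacitor C (voltage Vc), and loop current I]

• Capacitor current:

$$I_C = C\,\frac{dV_C}{dt}$$

• Energy stored in the capacitor:

$$E_C = \int_0^T V_C(t)\,I_C(t)\,dt = C\int_0^T V_C(t)\,\frac{dV_C}{dt}\,dt = C\int_0^{V_{DD}} V_C\,dV_C = \frac{C V_{DD}^2}{2}$$

• Energy drawn from the source:

$$E_S = V_{DD}\int_0^T I(t)\,dt = C V_{DD}\int_0^{V_{DD}} dV_C = C V_{DD}^2$$

• Energy dissipated by the resistor (converted to heat):

$$E_R = E_S - E_C = \frac{C V_{DD}^2}{2}$$

Assumption: $V_C(t = T) = V_{DD}$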
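The charging-stage energy balance can be checked numerically. The sketch below is an illustration only: the component values (R, C, VDD), the simulated duration, and the step count are assumptions, not values from the lecture. It integrates the RC charging transient with a simple forward-Euler loop so the three energies can be compared with the closed forms.

```python
def rc_charge_energies(r=1e3, c=1e-12, vdd=1.0, n_tau=20.0, steps=20_000):
    """Integrate a series RC charging transient.

    Returns (E_source, E_capacitor, E_resistor) in joules.
    All default values are assumed for illustration.
    """
    dt = n_tau * r * c / steps        # simulate n_tau time constants so Vc(T) ~ VDD
    vc = 0.0                          # capacitor starts fully discharged
    e_source = 0.0
    e_resistor = 0.0
    for _ in range(steps):
        i = (vdd - vc) / r            # loop current through R
        e_source += vdd * i * dt      # energy drawn from the supply
        e_resistor += i * i * r * dt  # energy converted to heat in R
        vc += i * dt / c              # I = C * dVc/dt
    e_capacitor = 0.5 * c * vc * vc   # energy left stored in C
    return e_source, e_capacitor, e_resistor


if __name__ == "__main__":
    e_s, e_c, e_r = rc_charge_energies()
    print(f"E_S = {e_s:.3e} J, E_C = {e_c:.3e} J, E_R = {e_r:.3e} J")
```

With these assumed values the source energy comes out close to C·VDD² and the capacitor and resistor energies each close to half of that, independent of R, up to the discretization error of the Euler step.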

9

Energy dissipation in RC circuit

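Second stage: discharging the capacitor

[Figure: series RC circuit with supply VDD, resistor R (voltage VR), capacitor C (voltage Vc), and loop current I]

• Capacitor current:

$$I_C = C\,\frac{dV_C}{dt}$$

• Energy freed by the capacitor:

$$E_C = \int_0^T V_C(t)\,I_C(t)\,dt = C\int_0^T V_C(t)\,\frac{dV_C}{dt}\,dt = C\int_{V_{DD}}^{0} V_C\,dV_C = -\frac{C V_{DD}^2}{2}$$

• Energy dissipated by the source:

$$E_S = V_{DD}\int_0^T I(t)\,dt = 0$$

• Energy dissipated by the resistor (converted to heat):

$$E_R = -E_C = \frac{C V_{DD}^2}{2}$$

Assumption: $V_C(t = T) = 0$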

10

Dynamic power dissipation in VLSI

• So, for two capacitor switches (charge and discharge), the energy dissipated is $C V_{DD}^2$
• For two switches of the signal during time $T$ (the clock period), the average power dissipation is

$$P = \frac{C V_{DD}^2}{T} = C V_{DD}^2 f$$

• If the signal switches $2\alpha$ times on average during time $T$, then the average power dissipation is

$$P = \alpha\,C V_{DD}^2 f$$

• $\alpha$ is called the activity factor
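As a quick numeric illustration of the dynamic power formula, the sketch below evaluates P = α·C·VDD²·f. The function name and the example values (100 fF, 1.0 V, 1 GHz) are assumptions chosen for illustration and do not come from the lecture.

```python
def dynamic_power(alpha, c_farads, vdd_volts, f_hz):
    """Average dynamic power P = alpha * C * VDD^2 * f, in watts."""
    return alpha * c_farads * vdd_volts ** 2 * f_hz


if __name__ == "__main__":
    # Assumed example net: 100 fF load, 1.0 V supply, 1 GHz clock.
    for alpha in (0.05, 0.5, 1.0):
        p = dynamic_power(alpha, 100e-15, 1.0, 1e9)
        print(f"alpha = {alpha}: P = {p * 1e6:.1f} uW")
```

Power scales linearly with the activity factor, so a net that toggles like a clock (α = 1) dissipates twice as much as a data net at its normal maximum (α = 0.5) for the same load.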

11

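Dynamic power contributors

Dynamic power dissipation:

$$P = \alpha\,C V_{DD}^2 f$$

The capacitance is contributed by three elements (area, fringe, and coupling), that is, self-capacitance and cross-coupling capacitance:

[Figure: cross-section of a wire in Layer 2 between Layer 1 and Layer 3, showing Cupper, Clower, and a Cside to each same-layer neighbor]

$$C = C_{lower} + C_{upper} + C_{side,1} + C_{side,2}$$

• $C_{lower} + C_{upper}$: area and fringe capacitance
• $C_{side,1} + C_{side,2}$: coupling capacitance

$$P = \alpha\left(C_{area+fringe} + C_{coupling}\right) V_{DD}^2 f$$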

12

Coupling capacitance calculation

• The coupling capacitance value depends on the neighbor wires
• For quiet neighbors (tied to VDD or ground):

$$C_{side} = \varepsilon\,\frac{L\,T}{S}$$

[Figure: two parallel wires of length L and thickness T, separated by spacing S]

• For switching neighbors, the capacitance depends on the switching direction:
  o Power calculation by the equivalent circuit method
  o Power calculation by application of Miller’s theorem

13

Equivalent circuit method

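Equivalent circuit for two coupled lines:

[Figure: sources V1 and V2 drive resistors R1 and R2, which are connected through the coupling capacitor Cc; in the switched case one source is at VDD]

• Simplest case: one wire is switched from 0 to VDD; the neighbor is quiet and tied to ground, R1 = R2
• Energy dissipated by each resistor (wire) in this case is

$$E = \frac{C_c V_{DD}^2}{4}$$

• Total energy dissipated is

$$E = \frac{C_c V_{DD}^2}{2}$$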

14

Equivalent circuit method

• For all cases of one quiet wire and one switched wire, the same results as in the previous slide are obtained
• Second case: both wires are switched simultaneously from 0 to VDD

[Figure: the two-wire equivalent circuit (R1, R2, Cc) before and after the switch]

• The current through the resistors is

$$I = \frac{V_{C_c}}{R_1 + R_2} = 0$$

  ($V_{C_c}$ is the voltage on the capacitor)
• No power dissipation in this case!

15

Equivalent circuit method

• Third case: both wires are switched simultaneously in opposite directions

[Figure: the two-wire equivalent circuit (R1, R2, Cc) before and after the switch]

• Current in the circuit:

$$I = \frac{V_{DD} - V_{C_c}}{R_1 + R_2}$$

  ($V_{C_c}$ is the capacitor voltage)
• Energy consumed by the second source is zero (the voltage of the source is zero)
• Energy consumed by the first source:

$$E = 2 V_{DD}^2 C_C$$

• There is no energy change of the capacitor, which means all the energy is dissipated by the resistors
• Each resistor dissipates $C_C V_{DD}^2$, for a total of $2 C_C V_{DD}^2$
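The three cases of the equivalent circuit method can be reproduced with one energy balance: the heat in R1 and R2 equals the energy delivered by the sources minus the change in energy stored in Cc. The sketch below is a minimal illustration under that assumption; the function name and argument convention are hypothetical, and the drivers are idealized as voltage steps.

```python
def coupling_energy(c_c, v1_before, v1_after, v2_before, v2_after):
    """Energy dissipated in R1 + R2 when the two drivers step to new levels.

    Energy balance: E_R = (energy delivered by the sources) - (change in
    energy stored in the coupling capacitor). Independent of R1 and R2.
    """
    vc_before = v1_before - v2_before        # capacitor voltage before the switch
    vc_after = v1_after - v2_after           # capacitor voltage after settling
    dq = c_c * (vc_after - vc_before)        # charge moved around the loop
    e_sources = (v1_after - v2_after) * dq   # source 1 delivers V1*dq, source 2 absorbs V2*dq
    d_e_cap = 0.5 * c_c * (vc_after ** 2 - vc_before ** 2)
    return e_sources - d_e_cap               # algebraically 0.5 * c_c * (vc_after - vc_before)**2


if __name__ == "__main__":
    c, vdd = 1.0, 1.0  # normalized units, so results are multiples of Cc*VDD^2
    print("one wire switches :", coupling_energy(c, 0, vdd, 0, 0))
    print("same direction    :", coupling_energy(c, 0, vdd, 0, vdd))
    print("opposite direction:", coupling_energy(c, 0, vdd, vdd, 0))
```

In normalized units the three calls give 0.5, 0, and 2, matching the totals derived for the three cases. The balance yields only the total; splitting it between the two resistors requires the resistor values.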

16

Miller’s theorem

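[Figure: an impedance Z connected between nodes Vx and Vy is replaced by Z1 from Vx to ground and Z2 from Vy to ground]

Z is an impedance

$$Z_1 = \frac{Z}{1 - A_V}, \qquad Z_2 = \frac{Z}{1 - \frac{1}{A_V}}, \qquad A_V = \frac{V_y}{V_x}$$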

17

Usage of Miller’s theorem for coupling capacitance and power calculations

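[Figure: for each switching case, wires Vx and Vy coupled by CC, and the Miller-equivalent circuit with capacitors to ground]

• $A_V = 0$ (Vx switches, Vy is quiet): $Z_1 = Z$, $Z_2 = 0$, so $P_{total} = \frac{C_C V_{DD}^2}{2}$
• $A_V = 1$ (both wires switch in the same direction): $Z_1 = Z_2 = \infty$ (the capacitor is disconnected), so $P_{total} = 0$
• $A_V = \infty$ (Vy switches, Vx is quiet): $Z_1 = 0$, $Z_2 = Z$, so $P_{total} = \frac{C_C V_{DD}^2}{2}$
• $A_V = -1$ (the wires switch in opposite directions): $Z_1 = Z_2 = \frac{Z}{2}$ (each wire sees $2C_C$ to ground), so $P_{total} = 2 C_C V_{DD}^2$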
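The Miller decomposition can be sketched in a few lines. For a capacitor, Z1 = Z/(1 - A_V) corresponds to a grounded capacitance of CC·(1 - A_V) on the switching wire. The function below is an illustrative sketch, with hypothetical names, that sums the energy ½·C_eq·ΔV² over both wires; a quiet wire is skipped because its driver holds it at a fixed level.

```python
def miller_total_energy(c_c, dvx, dvy):
    """Total dissipated energy from Miller-equivalent grounded capacitors.

    dvx, dvy: voltage swings of the two wires (0 for a quiet wire,
    opposite signs for opposite switching directions).
    """
    e_total = 0.0
    # (own swing, neighbor swing) for wire x, then for wire y
    for dv_own, dv_other in ((dvx, dvy), (dvy, dvx)):
        if dv_own == 0:
            continue                      # quiet wire: held by its driver, no switching energy
        gain = dv_other / dv_own          # A_V as seen from this wire
        c_eq = c_c * (1 - gain)           # Miller capacitance to ground
        e_total += 0.5 * c_eq * dv_own ** 2
    return e_total


if __name__ == "__main__":
    c, vdd = 1.0, 1.0  # normalized units
    print("A_V = 0  :", miller_total_energy(c, vdd, 0.0))
    print("A_V = 1  :", miller_total_energy(c, vdd, vdd))
    print("A_V = -1 :", miller_total_energy(c, vdd, -vdd))
```

The totals are 0.5, 0, and 2 in units of CC·VDD², the same as the equivalent circuit results. Only the sum is meaningful here; the per-wire terms are a bookkeeping device.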

18

Observations

• Miller’s theorem gives the same total power dissipation as the equivalent circuit method; however, its results for the power dissipation of each individual wire are inaccurate
• The total power dissipation calculated by both methods is as follows:
  o For a one-wire switch, the power dissipation is $P_{total} = \frac{C_C V_{DD}^2}{2}$
  o For a simultaneous switch in the same direction, there is no power dissipation
  o For a simultaneous switch in opposite directions, $P_{total} = 2 C_C V_{DD}^2$

19

Miller factor for power

• The Miller factor is used to account for the change in effective coupling capacitance due to switching
• The nominal coupling capacitance is multiplied by the Miller Coupling Factor (MCF) to obtain the effective capacitance:
  o For one-wire switching, MCF = 1:

$$P = \frac{\left(C_{area+fringe} + C_{coupling}\right) V_{DD}^2}{2}$$

  o For switching in the same direction, MCF = 0:

$$P = \frac{C_{area+fringe} V_{DD}^2}{2}$$

  o For switching in opposite directions, MCF = 4:

$$P = \frac{C_{area+fringe} V_{DD}^2}{2} + 2\,C_{coupling} V_{DD}^2$$
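The three MCF cases collapse into a single expression, (C_area+fringe + MCF·C_coupling)·VDD²/2. The sketch below illustrates this; the dictionary keys, function name, and capacitance values are assumptions made for the example.

```python
# Miller Coupling Factor for each switching case of the neighbor wire.
MCF = {"one_wire": 1, "same_direction": 0, "opposite_direction": 4}


def switching_dissipation(c_area_fringe, c_coupling, vdd, case):
    """(C_area+fringe + MCF * C_coupling) * VDD^2 / 2 for one switching event."""
    return (c_area_fringe + MCF[case] * c_coupling) * vdd ** 2 / 2


if __name__ == "__main__":
    # Assumed example: 2 units of area+fringe and 1 unit of coupling capacitance, VDD = 1.
    for case in MCF:
        print(case, switching_dissipation(2.0, 1.0, 1.0, case))
```

For MCF = 4 the coupling term becomes 4·C_coupling·VDD²/2 = 2·C_coupling·VDD², which is the opposite-direction result.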

20

Recall: MCF for delay

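[Figure: an impedance Z between nodes Vx and Vy is replaced by Zx at Vx and Zy at Vy]

$$Z_x = \frac{Z}{1 - k}, \qquad Z_y = \frac{kZ}{k - 1}, \qquad k = \frac{V_y}{V_x}$$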

21

Activity factor

• Activity Factor (AF), a.k.a. toggle rate, is the average fraction of cycles in which a signal changes from 0 to 1 or from 1 to 0, as compared to the clock signal:

$$AF = \frac{\#\,\text{signal toggles in } N \text{ cycles}}{2N}$$

• The clock toggles twice a cycle, so its AF = 1
• A combinational logic data signal normally has a maximum AF = 0.5
• A domino signal can have AF = 1
• Is it possible for a signal to have AF > 1? Yes, because of glitches

[Figure: a flip-flop (clk, data, out) and a domino gate (d1, d2, clk, out) with example waveforms]
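The AF definition can be applied directly to a sampled waveform. The sketch below is illustrative only: it assumes the waveform is a list of 0/1 samples covering a known number of clock cycles (here two samples per cycle, so that the clock itself can be represented), and the function name is hypothetical.

```python
def activity_factor(waveform, n_cycles):
    """AF = (number of toggles in n_cycles) / (2 * n_cycles)."""
    toggles = sum(1 for a, b in zip(waveform, waveform[1:]) if a != b)
    return toggles / (2 * n_cycles)


if __name__ == "__main__":
    # Four clock cycles, sampled twice per cycle (assumed sampling scheme).
    clock = [0, 1, 0, 1, 0, 1, 0, 1, 0]   # toggles twice per cycle
    data = [0, 0, 1, 1, 0, 0, 1, 1, 0]    # toggles once per cycle
    print("clock AF:", activity_factor(clock, 4))
    print("data AF :", activity_factor(data, 4))
```

A waveform with glitches would need more than two samples per cycle to be captured, and can push the result above the values seen for clean signals.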

22

Signal probability

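Signal probability (SP) is the average fraction of cycles in which a signal has the logic value “1”

[Figure: example waveforms between levels 0 and 1: CLK with SP = 0.5, a signal held at 1 with SP = 1, and a signal that is almost always 1 with SP ≈ 1]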
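Signal probability can be estimated the same way from per-cycle samples. This is a minimal sketch assuming one 0/1 sample per cycle; the function name and sample data are hypothetical.

```python
def signal_probability(samples):
    """Fraction of sampled cycles in which the signal is at logic 1."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(1 for s in samples if s) / len(samples)


if __name__ == "__main__":
    print("clock-like:", signal_probability([1, 0, 1, 0, 1, 0, 1, 0]))
    print("held high :", signal_probability([1, 1, 1, 1]))
    print("mostly 1  :", signal_probability([1] * 99 + [0]))
```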

23

Relation between MCF and AF

• Assume two neighboring uncorrelated signals make $N_1$ and $N_2$ transitions during $N$ clock cycles
• It can be shown that the number of simultaneous transitions of the signals is negligible (no more than 4)
• Therefore, the energy dissipated by the cross capacitance between the signals is

$$E = \frac{1}{2}\left(N_1 + N_2\right) C_x V_{dd}^2$$

• The power dissipated during $N$ cycles is:

$$P = \frac{E}{N\,t_{cycle}} = \frac{\left(N_1 + N_2\right) C_x V_{dd}^2}{2 N\,t_{cycle}} = \left(\frac{N_1}{2N} + \frac{N_2}{2N}\right) C_x V_{dd}^2 f = \left(\alpha_1 + \alpha_2\right) C_x V_{dd}^2 f$$

• For the same reason, it is usually assumed that MCF = 1 for uncorrelated signals
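The cross-capacitance power for uncorrelated neighbors can be evaluated from transition counts. The sketch below assumes, as the slide does, that simultaneous transitions are negligible so every transition sees MCF = 1; the function name and example numbers are illustrative assumptions.

```python
def cross_coupling_power(n1, n2, n_cycles, c_x, vdd, f_hz):
    """Power dissipated on the cross capacitance between two uncorrelated signals.

    Each lone transition dissipates C_x * Vdd^2 / 2 (MCF = 1).
    """
    energy = (n1 + n2) * c_x * vdd ** 2 / 2   # total energy over n_cycles
    t_cycle = 1.0 / f_hz
    return energy / (n_cycles * t_cycle)


if __name__ == "__main__":
    # Assumed example: 200 and 100 transitions in 1000 cycles, 1 fF, 1 V, 1 GHz.
    p = cross_coupling_power(200, 100, 1000, 1e-15, 1.0, 1e9)
    alpha1, alpha2 = 200 / (2 * 1000), 100 / (2 * 1000)
    print("from counts      :", p)
    print("from (a1 + a2)CVf:", (alpha1 + alpha2) * 1e-15 * 1.0 ** 2 * 1e9)
```

Both lines print the same value, since N1/(2N) and N2/(2N) are the activity factors of the two signals.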

24

Activity Factors Generation

1. Power test vectors generation (worst case for high power, unit stressing)
2. RTL full-chip simulation (results in blocks' primary inputs: activity, probability)
3. Monte-Carlo based block inputs generation (based on the RTL statistics)
4. Transistor level simulation, per block (unit delay, tuning for glitches)

Result: per-node activity factor

Source: “Intel® Pentium® M Processor Power Estimation, Budgeting, Optimization, and Validation”, ITJ 2003

25

Interconnect power breakdown

case study

26

Case study

• Low-power, state-of-the-art μ-processor
• Dynamic switching power analysis
• Interconnect attributes:
  o Length
  o Capacitance
  o Fan out (FO)
  o Hierarchy data
  o Net type
  o Activity factors (AF)
  o Miscellaneous

27

Power Estimation accuracy

[Figure: simulated activity density compared with IREM measurement]

Source: “Intel® Pentium® M Processor Power Estimation, Budgeting, Optimization, and Validation”, ITJ 2003

28

Interconnect Length Distribution

[Figure: number of nets (log scale, 0.001 to 10000) vs. net length [um] (log scale, 1 to 100000) for Pentium® 0.5 um, Pentium® MMX 0.35 um, Pentium® Pro 0.5 um, Pentium® II 0.35 um, Pentium® II 0.25 um, Pentium® III 0.18 um, and Low Power Processor 0.13 um]

Source: Shekhar Y. Borkar, CRL - Intel

29

Interconnect Length Distribution

[Figure: Nets vs. Net Length. Number of nets (log scale, 0.001 to 1000) vs. length [um] (log scale, 1 to 100000), with Local, Global, and Total curves]

• Log-log scale
• Exponential decrease with length
• Global clock not included

30

Total Dynamic Power

[Figure: Total Power vs. Net Length. Normalized dynamic power (0 to 100) vs. length [um] (log scale, 1 to 100000), with Local, Global, Total, and Interconnect curves]

• Global clock not included
• Local nets = 66%
• Global nets = 34%
• Peak 1: Nets: 390k, Cap: 10 [nF], FO: 2, AF: 0.0485
• Peak 2: Nets: 75k, Cap: 20 [nF], FO: 20, AF: 0.055

31

Local and Global Interconnect

Local and global interconnect (IC) are different:

• Number by length breakdown
• IC breakdown: capacitance and power
• Fan out
• Metal usage

AF is similar.

[Figure: Local Power breakdown vs. Net Length. Share of power [uW] (0% to 100%) from IC, Diff, and Gate vs. length [um], from 4.16 to 83850]

[Figure: Global Power breakdown vs. Net Length. Share of power [uW] (0% to 100%) from IC, Diff, and Gate vs. length [um], from 4.16 to 83850]

32

Power Breakdown by Net Types (global clock included)

Net type         Interconnect power      Total power
                 (interconnect only)     (gate, diffusion, and interconnect)
local clock      20%                     29%
global clock     19%                     13%
local signals    27%                     37%
global signals   34%                     21%

33

Interconnect Length Prediction

• Technology projections: ITRS
• Interconnect length predictions:
  o ITRS model: 1/3 of the routing space
  o Davis model:
    - Rent’s rule based
    - Predicts the number of nets as a function of the number of gates and complexity factors
• Models calibrated based on the case study

34

Future of Interconnect Power

[Figure: dynamic power breakdown (gate, diffusion, interconnect; 0% to 100%) vs. technology generation [μm]: 0.15, 0.13, 0.1, 0.09, 0.08, 0.07, 0.065, 0.045, 0.032, 0.022, using optimistic interconnect scaling]

Source: ITRS 2001 Edition, adapted data

Interconnect power grows to 65%-80% within 5 years!

35

Interconnect Power Prediction

• Interconnect length projection: the number of nets vs. unit length, modified Davis model

[Figure: number of nets (normalized, log scale) vs. length [um] (log scale, 1 to 100000), measured vs. model, with the upper local bound and the lower global bound separating the Local, Intermediate, and Global ranges]

• The dynamic power average breakdown

[Figure: dynamic power breakdown (gate, diffusion, interconnect; 0% to 100%) for Local, Intermediate, and Global nets]

36

Interconnect Power Model

Multiplying the number of interconnects by the power breakdowns gives the projected dynamic power vs. net length:

[Figure: normalized power (0 to 6) vs. length [μm] (log scale, 1 to 100000), measured power vs. projection]

The power model matches the processor power distribution!

37

Experiment - Power-Aware Router

Routing experiment optimizing a processor’s blocks:

• Local nodes (clock and signals) consume 66% of dynamic power
• 10% of nets consume 90% of power
• Minimum spanning trees can save over 20% of interconnect power
• Routing with spacing can save up to 40% of interconnect power

[Figure: a small block’s local clock network]

38

Power-Aware Router Flow

1. Power grid routing
2. Clock tree routing with spacing (clock tree: high FO, long lines, very active)
3. Routing of the top n% power-consuming signal nets
4. Global and detailed routing of the un-routed nets (timing and congestion driven)
5. All nets routed?
   o No: power-aware rip-up and re-route (rip-up: not high power nets; avoiding congestion)
   o Yes: finish (followed by downsizing)

39

Results - Power Saving

[Figure: dynamic power saving (0% to 60%) for Block A through Block E and the average, split into driver downsizing saving and router power saving]

Average saving results: 14.3% for ASIC blocks (1)

(1) Estimated based on clock interconnect power

40

Backup

41

Rent’s Rule

• Empirical rule: terminals versus number of gates.
• Published by: B. S. Landman and R. L. Russo, "On a pin versus block relationship for partitions of logic graphs", IEEE Trans. on Comput., vol. C-20, pp. 1469-1479, 1971.

Taken from Krishna Saraswat in SLIP 2000

42

Rent’s parameters

[Figure: a block of N gates with T terminals]

Rent’s rule: $T = k N^r$

• T = # of I/O terminals (pins)
• N = # of gates
• k = avg. I/Os per gate
• r = Rent’s exponent; it can be 0 < r < 1, but commonly 0.5 (simple) < r < 0.75 (complex)

43

Rent’s Rule Example

Let’s assume Rent’s parameters: r = 0.79 and k = 2.

• For a single gate, N = 1: $T = k N^r = 2 \cdot 1^{0.79} = 2$
• For a block of four gates, N = 4: $T = k N^r = 2 \cdot 4^{0.79} \approx 6$

Fan out is implied by Rent.
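The worked example can be reproduced with a one-line function. This is a minimal sketch; the function name is hypothetical and the defaults are simply the example's parameters (k = 2, r = 0.79).

```python
def rent_terminals(n_gates, k=2.0, r=0.79):
    """Rent's rule: T = k * N**r terminals for a block of N gates."""
    return k * n_gates ** r


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(f"N = {n:3d}: T = {rent_terminals(n):.2f}")
```

Because r < 1, quadrupling the block size multiplies the terminal count by 4^r, which is less than 4: some of the sub-blocks' terminals become internal connections.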

44

Is Rent’s rule a coincidence ?

• Random circuits do not obey Rent.
• Rent’s parameters are correlated with place and route algorithms.
  P. Verplaetse, J. Dambre, D. Stroobandt, J. Van Campenhout, "On Partitioning vs. Placement Rent Properties", in Proc. of Intl. Workshop on System-Level Interconnect Prediction, March 2001.
• Self similarity within circuits obeys Rent.
• Assumption: the complexity of the interconnection topology is equal at all levels.
• Conclusion: Rent’s rule is a result of the design and synthesis.

45

Donath’s Hierarchical Placement Model

1. Partition the circuit into 4 equal-sized modules, with a minimal cut.
2. Partition the Manhattan grid into 4 equal-sized modules, with a minimal cut.
3. Map the modules to the grid (arbitrary mapping).
4. Repeat recursively until each block is assigned to one cell.

Result: Rent’s parameters

W. E. Donath, "Placement and Average Interconnection Lengths of Computer Logic", IEEE Trans. on Circuits & Syst., vol. CAS-26, pp. 272-277, 1979.

46

Donath’s length estimation model

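For the i-th level:

• There are $4^i$ blocks
• Each block has $k\left(\frac{N}{4^i}\right)^r$ terminals
• Assuming two-terminal nets, each block has $\frac{k}{2}\left(\frac{N}{4^i}\right)^r$ nets
• The nets of level i-1 must be subtracted

Nets for level i:

$$n_i = 4^i\,\frac{k}{2}\left(\frac{N}{4^i}\right)^r - 4^{i-1}\,\frac{k}{2}\left(\frac{N}{4^{i-1}}\right)^r = 4^i\left(1 - 4^{r-1}\right)\frac{k}{2}\left(\frac{N}{4^i}\right)^r$$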
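The per-level net count can be checked by computing it two ways: by explicit subtraction of the previous level, and from the factored form 4^i·(1 - 4^(r-1))·(k/2)·(N/4^i)^r. The sketch below is illustrative; the function names and the example parameters are assumptions.

```python
def nets_at_level_by_subtraction(i, n_gates, k, r):
    """Nets at level i minus the nets already counted at level i-1."""
    this_level = 4 ** i * (k / 2) * (n_gates / 4 ** i) ** r
    prev_level = 4 ** (i - 1) * (k / 2) * (n_gates / 4 ** (i - 1)) ** r
    return this_level - prev_level


def nets_at_level(i, n_gates, k, r):
    """Factored form: n_i = 4^i * (1 - 4^(r-1)) * (k/2) * (N / 4^i)^r."""
    return 4 ** i * (1 - 4 ** (r - 1)) * (k / 2) * (n_gates / 4 ** i) ** r


if __name__ == "__main__":
    n, k, r = 4 ** 6, 2.0, 0.7   # assumed example: 4096 gates
    for level in range(1, 7):
        print(level,
              round(nets_at_level_by_subtraction(level, n, k, r), 3),
              round(nets_at_level(level, n, k, r), 3))
```

The two columns agree at every level, which confirms the factoring step.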

47

Average interconnection length

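Taken from a SLIP 2001 tutorial by Dirk Stroobandt

The wires can be of two types, A (between adjacent blocks) and D (between diagonal blocks). For blocks of $\lambda \times \lambda$ cells:

$$L_A = \frac{1}{\lambda^4}\sum_{i_A=1}^{\lambda}\sum_{j_A=1}^{\lambda}\sum_{i_B=1}^{\lambda}\sum_{j_B=1}^{\lambda}\left(|i_A - i_B| + j_B - j_A + \lambda\right) = \frac{4\lambda}{3} - \frac{1}{3\lambda}$$

$$L_D = \frac{1}{\lambda^4}\sum_{i_A=1}^{\lambda}\sum_{j_A=1}^{\lambda}\sum_{i_B=1}^{\lambda}\sum_{j_B=1}^{\lambda}\left(i_A + j_A - i_B - j_B + 2\lambda\right) = 2\lambda$$

The average over the four adjacent and two diagonal block pairs:

$$r_i = \frac{4 L_A + 2 L_D}{6} = \frac{14}{9}\lambda - \frac{2}{9}\frac{1}{\lambda}$$

Overall, $R = \frac{\sum_{i=1}^{I} n_i r_i}{\sum_{i=1}^{I} n_i}$ equals

$$R = \frac{2}{9}\left(7\,\frac{N^{r-0.5} - 1}{4^{r-0.5} - 1} - \frac{1 - N^{r-1.5}}{1 - 4^{r-1.5}}\right)\frac{1 - 4^{r-1}}{1 - N^{r-1}}$$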
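Donath's closed-form average length is easy to evaluate for a given circuit size and Rent exponent. The sketch below implements the standard closed form R = (2/9)·(7·(N^(r-0.5) - 1)/(4^(r-0.5) - 1) - (1 - N^(r-1.5))/(1 - 4^(r-1.5)))·(1 - 4^(r-1))/(1 - N^(r-1)); the function name is hypothetical, and the expression is undefined at r = 0.5 (and at r = 1), where a limit would be needed.

```python
def donath_average_length(n_gates, r):
    """Donath's closed-form average interconnection length, in cell pitches.

    Not valid at r = 0.5 or r = 1, where terms of the expression become 0/0.
    """
    a = (n_gates ** (r - 0.5) - 1) / (4 ** (r - 0.5) - 1)
    b = (1 - n_gates ** (r - 1.5)) / (1 - 4 ** (r - 1.5))
    c = (1 - 4 ** (r - 1)) / (1 - n_gates ** (r - 1))
    return (2 / 9) * (7 * a - b) * c


if __name__ == "__main__":
    for n in (4, 64, 1024, 4 ** 8):
        print(f"N = {n:6d}: R(r=0.7) = {donath_average_length(n, 0.7):.3f}, "
              f"R(r=0.3) = {donath_average_length(n, 0.3):.3f}")
```

For N = 4 the result is 4/3 regardless of r, which is the average of four adjacent pairs at distance 1 and two diagonal pairs at distance 2 in a 2 x 2 grid.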

48

Results Donath

Scaling of the average length L as a function of the number of logic blocks N:

$$L \sim \begin{cases} N^{r-0.5} & (r > 0.5) \\ \log(N) & (r = 0.5) \\ f(r) & (r < 0.5) \end{cases}$$

[Figure: L (0 to 30) vs. N (log scale, 1 to 10^7) for r = 0.7, r = 0.5, and r = 0.3]

Similar to measurements on placed designs.

Taken from a SLIP 1999 tutorial by Dirk Stroobandt

49

Donath’s Model - overview

• Provides the average net length based on the circuit’s size and Rent parameters.
• Can provide a rough net length distribution.
• Obvious limitations:
  o Uniform distribution.
  o Partitioning algorithm.
  o Two-terminal nets only.
  o Assumes perfect similarity.
