
Jan M. Rabaey Low Power Design Essentials ©2008 Chapter 4 Optimizing Power @ Design Time Circuits Dejan Marković Borivoje Nikolić


Jan M. Rabaey

Low Power Design Essentials ©2008 Chapter 4

Optimizing Power @ Design Time

Circuits

Dejan Marković

Borivoje Nikolić


Chapter Outline

Optimization framework for energy-delay trade-off
Dynamic power optimization
– Multiple supply voltages
– Transistor sizing
– Technology mapping
Static power optimization
– Multiple thresholds
– Transistor stacking


Energy/Power Optimization Strategy

For given function and activity, an optimal operation point can be derived in the energy-performance space

Time of optimization depends upon activity profile
Different optimizations apply to active and static power

[Table: rows Active and Static; columns Fixed Activity (design time), Variable Activity (run time), No Activity - Standby (sleep)]


Energy-Delay Optimization and Trade-off

Maximize throughput for given energy, or
minimize energy for given throughput

[Figure: trade-off space in the Energy/op versus Delay plane, bounded by Emin, Emax, Dmin, and Dmax, with the unoptimized design marked]

Other important metrics: Area, Reliability, Reusability


The Design Abstraction Stack

A very rich set of design parameters to consider!
It helps to consider options in relation to their abstraction layer

System/Application: choice of algorithm
Software: amount of concurrency
(Micro-)Architecture: parallel versus pipelined, general purpose versus application specific
Logic/RT: logic family, standard cell versus custom
Circuit: sizing, supply, thresholds
Device: bulk versus SOI

[Figure marker: "This Chapter" brackets the layers covered in this chapter]


Optimization Can/Must Span Multiple Levels

Architecture
Micro-Architecture
Circuit (Logic & FFs)

Design optimization combines top-down and bottom-up: “meet-in-the-middle”


Energy-Delay Optimization

Globally optimal energy-delay curve for a given function

[Figure: Energy/op versus Delay curves for topology A and topology B]


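Some Optimization Observations

Energy-Delay Sensitivities

$$S_A = \left.\frac{\partial E/\partial A}{\partial D/\partial A}\right|_{A=A_0}$$

[Figure: Energy versus Delay, with curves f(A0,B) and f(A,B0) through the point (A0,B0) at delay D0, and slopes SA and SB at that point]

[Ref: V. Stojanovic, ESSCIRC’02]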


Finding the Optimal Energy-Delay Curve

$$\Delta E = S_A \cdot (-\Delta D) + S_B \cdot \Delta D$$

On the optimal curve, all sensitivities must be equal

[Figure: Energy versus Delay, with curves f(A0,B), f(A,B0), and f(A1,B), the point (A0,B0) at delay D0, and a delay step ∆D]

Pareto-optimal: the best that can be achieved without disadvantaging at least one metric.
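The balancing argument can be tried numerically. The following is a minimal Python sketch, assuming two tuning variables A and B whose sensitivities are negative (relaxing delay saves energy) and the relation ∆E = SA·(−∆D) + SB·∆D; the function name and the numeric sensitivities are hypothetical and do not come from the slides.

```python
def energy_change(s_a: float, s_b: float, delta_d: float) -> float:
    """Energy change when variable A buys back delta_d of delay and
    variable B spends it, so the total delay stays the same.

    s_a, s_b: energy-delay sensitivities dE/dD at the current design point
              (negative values: more delay means less energy).
    """
    return s_a * (-delta_d) + s_b * delta_d


# Hypothetical sensitivities in energy units per delay unit.
print(energy_change(-1.0, -4.0, 0.1))  # negative: the swap saves energy
print(energy_change(-2.0, -2.0, 0.5))  # zero: equal sensitivities, nothing left to gain
```

As long as the two sensitivities differ, some swap of delay between A and B lowers energy at constant delay, which is why they have to be equal on the optimal curve.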


Reducing Active Energy @ Design Time

$$P_{active} \sim \alpha \cdot C_L \cdot V_{swing} \cdot V_{DD} \cdot f \qquad E_{active} \sim \alpha \cdot C_L \cdot V_{swing} \cdot V_{DD}$$

Reducing voltages
– Lowering the supply voltage (VDD) at the expense of clock speed
– Lowering the logic swing (Vswing)
Reducing transistor sizes (CL)
– Slows down logic
Reducing activity (α)
– Reducing switching activity through transformations
– Reducing glitching by balancing logic


Observation

Downsizing and/or lowering the supply on the critical path lowers the operating frequency
Downsizing non-critical paths reduces energy for free, but
– Narrows down the path delay distribution
– Increases impact of variations, impacts robustness

[Figure: two histograms of # of paths versus tp (path), each with the target delay marked]


Circuit Optimization Framework

minimize Energy (VDD, VTH, W)
subject to Delay (VDD, VTH, W) ≤ Dcon

Constraints
VDDmin < VDD < VDDmax
VTHmin < VTH < VTHmax
Wmin < W

Reference case
– Dmin sizing @ VDDmax, VTHref

[Figure: Energy/op versus Delay curves for topology A and topology B]

[Ref: V. Stojanovic, ESSCIRC’02]


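Optimization Framework: Generic Network

Gate in stage i loaded by fanout (stage i+1)

[Figure: stage i (supply VDD,i; input capacitance Ci; parasitic capacitance γCi) driving wire capacitance Cw and stage i+1 (supply VDD,i+1; input capacitance Ci+1)]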


Alpha-power based Delay Model

$$t_p = \frac{K_d\,V_{DD}}{(V_{DD}-V_{on})^{\alpha_d}} \cdot \frac{\gamma C_i + C_w + C_{i+1}}{C_i} = \tau_{nom}\left(1 + \frac{C'_{i+1}}{\gamma C_i}\right), \qquad C'_{i+1} = C_w + C_{i+1}$$

Fit parameters: Von, αd, Kd, γ

VDDref = 1.2 V, technology 90 nm

[Figure: FO4 delay (normalized) versus VDD / VDDref, simulation versus model; Von = 0.37 V, αd = 1.53]

[Figure: delay tp (ps) versus fanout (Ci+1/Ci), simulation versus model; τnom = 6 ps, γ = 1.35 (90 nm technology)]
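The fitted values quoted on this slide are enough to evaluate the model. The sketch below is a Python illustration under stated assumptions: it reads the delay model as tp = τnom·(1 + fanout/γ) at VDDref and as proportional to VDD/(VDD − Von)^αd when the supply changes, and it assumes switching energy scales with VDD². The function names and the bisection helper are hypothetical, not part of the original material.

```python
V_ON = 0.37      # V, fitted value from the slide
ALPHA_D = 1.53   # fitted exponent
V_REF = 1.2      # V, VDDref
TAU_NOM = 6.0    # ps, fitted nominal delay
GAMMA = 1.35     # fitted parasitic-to-input capacitance ratio


def delay_vs_fanout(fanout: float) -> float:
    """Stage delay in ps at VDDref: tau_nom * (1 + fanout / gamma)."""
    return TAU_NOM * (1.0 + fanout / GAMMA)


def normalized_delay(vdd: float) -> float:
    """Delay at supply vdd relative to the delay at VDDref."""
    def shape(v: float) -> float:
        return v / (v - V_ON) ** ALPHA_D
    return shape(vdd) / shape(V_REF)


def vdd_for_delay(target: float, tol: float = 1e-9) -> float:
    """Lowest supply in (Von, VDDref] whose normalized delay meets target (>= 1)."""
    lo, hi = V_ON + 1e-3, V_REF   # delay falls monotonically as vdd rises here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normalized_delay(mid) > target:
            lo = mid              # too slow, raise the supply
        else:
            hi = mid              # fast enough, try a lower supply
    return hi


vdd = vdd_for_delay(1.10)               # allow a 10% delay increase
print(delay_vs_fanout(4.0))             # fanout-of-4 stage delay in ps
print(vdd, (vdd / V_REF) ** 2)          # supply and relative switching energy
```

The last line gives a feel for how much supply, and hence switching energy, a 10% delay relaxation buys in this simplified single-supply setting.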


Combined with Logical Effort Formulation

For Complex Gates

$$t_p = \tau_{nom}\,(p_i + f_i\,g_i)$$

Parasitic delay pi – depends upon gate topology
Electrical effort fi ≈ Si+1/Si
Logical effort gi – depends upon gate topology
Effective fanout hi = figi

[Ref: I. Sutherland, Morgan-Kaufmann’99]


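Dynamic Energy

[Figure: stage i (supply VDD,i; input capacitance Ci; parasitic capacitance γCi) driving wire capacitance Cw and stage i+1 (supply VDD,i+1; input capacitance Ci+1)]

$$E_{dyn} = (\gamma C_i + C_w + C_{i+1})\,V_{DD,i}^2 = C_i\,(\gamma + f'_i)\,V_{DD,i}^2$$

$$C_i = K_e S_i, \qquad f'_i = \frac{C_w + C_{i+1}}{C_i} = \frac{S'_{i+1}}{S_i}$$

$$e_i = K_e S_i\,(V_{DD,i-1}^2 + \gamma V_{DD,i}^2)$$

e_i = energy consumed by logic gate i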


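Optimizing Return on Investment (ROI)

Depends on Sensitivity (∂E/∂D)

Gate Sizing

$$\frac{\partial E/\partial S_i}{\partial D/\partial S_i} = -\frac{e_i}{\tau_{nom}\,(h_i - h_{i-1})}$$

infinite for equal h (Dmin)

Supply Voltage

$$\frac{\partial E/\partial V_{DD}}{\partial D/\partial V_{DD}} = -\frac{E}{D}\cdot\frac{2\,(1 - V_{on}/V_{DD})}{\alpha_d - 1 + V_{on}/V_{DD}}$$

max at VDD(max) (Dmin)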
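The two sensitivity expressions are easy to evaluate. The sketch below is a Python illustration, assuming the forms −e_i/(τnom·(h_i − h_{i−1})) for sizing and −(E/D)·2(1 − x)/(αd − 1 + x) with x = Von/VDD for the supply; the function names are hypothetical, and the default Von and αd are the fitted 90 nm values quoted earlier in the deck.

```python
def sizing_sensitivity(e_i: float, tau_nom: float, h_i: float, h_prev: float) -> float:
    """(dE/dS_i) / (dD/dS_i) = -e_i / (tau_nom * (h_i - h_prev))."""
    if h_i == h_prev:
        # Equal effective fanouts (minimum-delay sizing): magnitude is unbounded.
        return float("-inf")
    return -e_i / (tau_nom * (h_i - h_prev))


def supply_sensitivity(energy: float, delay: float, vdd: float,
                       v_on: float = 0.37, alpha_d: float = 1.53) -> float:
    """(dE/dVDD) / (dD/dVDD) = -(E/D) * 2(1 - x) / (alpha_d - 1 + x), x = Von/VDD."""
    x = v_on / vdd
    return -(energy / delay) * 2.0 * (1.0 - x) / (alpha_d - 1.0 + x)


# E/D is set to 1 to isolate the voltage-dependent factor.
for vdd in (0.8, 1.0, 1.2):
    print(vdd, supply_sensitivity(1.0, 1.0, vdd))
```

With E/D held fixed, the magnitude of the supply sensitivity grows with VDD, consistent with the note that it peaks at the maximum supply.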


Example: Inverter Chain

Properties of inverter chain
– Single path topology
– Energy increases geometrically from input to output

[Figure: chain of N inverters with sizes S1 = 1, S2, S3, …, SN driving load CL]

Goal
– Find optimal sizing S = [S1, S2, …, SN], supply voltage, and buffering strategy to achieve the best energy-delay tradeoff


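Inverter Chain: Gate Sizing

$$S_i^2 = \frac{S_{i-1}\,S_{i+1}}{1 + \mu\,S_{i-1}}, \qquad \mu = \frac{2 K_e V_{DD}^2}{\tau_{nom} F_S}, \qquad F_S \propto \frac{e_i}{h_i - h_{i-1}}$$

Variable taper achieves minimum energy
Reduce number of stages at large dinc

[Figure: effective fanout h versus stage (1 to 7) for the nominal optimum (nom opt) and delay increments dinc = 0%, 1%, 10%, 30%, 50%]

[Ref: Ma, JSSC’94]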


Inverter Chain: VDD Optimization

VDD reduces energy of the final load first
Variable taper achieved by voltage scaling

[Figure: per-stage VDD / VDDnom versus stage (1 to 7) for the nominal optimum (nom opt) and delay increments dinc = 0%, 1%, 10%, 30%, 50%]


Inverter Chain: Optimization Results

Parameter with the largest sensitivity has the largest potential for energy reduction
Two discrete supplies mimic per-stage VDD

[Figure: energy reduction (%) versus dinc (%) and normalized sensitivity versus dinc (%), with curves labeled cVDD, S, gVDD, and 2VDD]


Example: Kogge-Stone Tree Adder

Tree adder
– Long wires
– Re-convergent paths
– Multiple active outputs

[Figure: tree adder with inputs (A0, B0) … (A15, B15) and Cin, and outputs S0 … S15]

[Ref: P. Kogge, Trans. Comp’73]


Tree Adder: Sizing vs. Dual-VDD Optimization

Reference design: all paths are critical
Internal energy: S more effective than VDD
– S: E(-54%), 2Vdd: E(-27%) at dinc = 10%

[Figure: three panels showing the reference (D = Dmin), sizing (E -54% at dinc = 10%), and 2Vdd (E -27% at dinc = 10%)]


Tree Adder: Multi-dimensional Search

Can get pretty close to optimum with only 2 variables
Getting the maximum speed (minimum delay) is very expensive

[Figure: Energy / Eref versus Delay / Dmin for the reference design and for optimization over (S, VDD), (VDD, VTH), (S, VTH), and (S, VDD, VTH)]


Multiple Supply Voltages

Block-level supply assignment
– Higher throughput/lower latency functions are implemented in higher VDD
– Slower functions are implemented with lower VDD
– This leads to so-called “voltage islands” with separate supply grids
– Level conversion performed at block boundaries
Multiple supplies inside a block
– Non-critical paths moved to lower supply voltage
– Level conversion within the block
– Physical design challenging


Using Three VDD’s

V1 = 1.5V, VTH = 0.3V

[Figure: power reduction ratio as a function of V2 (V) and V3 (V), shown as a contour plot and as a surface plot, with the optimum marked by +]

[Ref: T. Kuroda, ICCAD’02]

© IEEE 2002


Optimum Number of VDD’s

The more VDD’s the less power, but the effect saturates
Power reduction effect decreases with scaling of VDD
Optimum V2/V1 is around 0.7

[Figure: three panels for the supply sets { V1, V2 }, { V1, V2, V3 }, and { V1, V2, V3, V4 }; each plots the VDD ratios (V2/V1, V3/V1, V4/V1) and the power ratio (P2/P1, P3/P1, P4/P1) versus V1 (V)]

[Ref: M. Hamada, CICC’01]

© IEEE 2001


Lessons: Multiple Supply Voltages

Two supply voltages per block are optimal
Optimal ratio between the supply voltages is 0.7
Level conversion is performed on the voltage boundary, using a level-converting flip-flop (LCFF)
An option is to use an asynchronous level converter
– More sensitive to coupling and supply noise
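A back-of-the-envelope estimate shows what the 0.7 ratio can buy. The sketch below is a Python illustration under strong assumptions: switching energy scales as C·VDD², a given fraction of the switched capacitance can be moved to the lower supply, and level-converter overhead is ignored. The function name and the fractions tried are hypothetical.

```python
def dual_supply_energy_ratio(low_fraction: float, v_ratio: float = 0.7) -> float:
    """Switching energy of a dual-supply block relative to a single-supply one.

    low_fraction: share of switched capacitance moved to the lower supply
                  (hypothetical input; depends on the path-delay distribution).
    v_ratio:      V2/V1, with 0.7 being the ratio quoted on the slide.
    Assumes E ~ C * VDD^2 and ignores level-converter overhead.
    """
    return (1.0 - low_fraction) + low_fraction * v_ratio ** 2


for k in (0.0, 0.5, 1.0):
    print(k, dual_supply_energy_ratio(k))
```

Even in the limiting case where everything runs at the lower supply, the ratio bottoms out at 0.7² of the original energy, which hints at why additional supplies give diminishing returns.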


Distributing Multiple Supply Voltages

[Figure: two layout styles, Conventional and Shared N-well, each placing a VDDH circuit (i1, o1) next to a VDDL circuit (i2, o2) with VDDH, VDDL, and VSS rails]


Conventional

[Figure: VDDH circuit and VDDL circuit with VDDH, VDDL, and VSS rails and N-well isolation; (a) dedicated row, with separate VDDH rows and VDDL rows; (b) dedicated region, with a VDDH region and a VDDL region]


Shared N-Well

[Figure: VDDH circuit and VDDL circuit sharing an N-well, with VDDH, VDDL, and VSS rails; (a) floor plan image showing VDDL and VDDH circuits]

[Shimazaki et al, ISSCC’03]


Example: Multiple Supplies in a Block

“Clustered voltage scaling”

Lower VDD portion is shared

[Figure: Conventional Design versus CVS Structure; flip-flop (FF) bounded logic with the critical path marked in both, and level-shifting F/Fs in the CVS structure]

[Ref: M. Takahashi, ISSCC’98]

© IEEE 1998


Level Converting Flip-Flops (LCFFs)

Pulsed Half-Latch versus Master-Slave LCFFs
– Smaller # of MOSFETs / clock loading
– Faster level conversion using half-latch structure
– Shorter D-Q path from pulsed circuit

[Figure: schematics of the Master-Slave and Pulsed Half-Latch LCFFs, each with its level conversion stage marked]

[Ref: F. Ishihara, ISLPED’03]

© IEEE 2003


Dynamic Realization of Pulsed LCFF

Pulsed precharge LCFF (PPR)
– Fast level conversion by precharge mechanism
– Suppressed charge/discharge toggle by conditional capture
– Short D-Q path

[Figure: schematic of the Pulsed Precharge Latch, with its level conversion stage marked]

[Ref: F. Ishihara, ISLPED’03]

© IEEE 2003


Case Study: ALU for 64-bit μProcessor

[Figure: ALU block diagram with VDDH and VDDL circuits distinguished; blocks: clock gen., 9:1 MUX (two), 5:1 MUX, 2:1 MUX, gp gen., carry gen., partial sum, sum sel., logical unit, INV1, INV2; signals: clk, ain, bin, ain0, carry, s0/s1, sum, and sumb (long loop-back bus, 0.5 pF)]

[Ref: Y. Shimazaki, ISSCC’03]

© IEEE 2003


Low-Swing Bus and Level Converter

INV2 is placed near 9:1 MUX to increase noise immunity
Level conversion is done by a domino 9:1 MUX

[Figure: schematic of the sum/sumb path through INV1 and INV2 into the domino level converter (9:1 MUX) with keeper and pc, annotated with VDDH and VDDL supplies; other inputs: ain0, sel (VDDH)]

[Ref: Y. Shimazaki, ISSCC’03]

© IEEE 2003


Measured Results: Energy and Delay

[Figure: Energy [pJ] versus TCYCLE [ns] at room temperature, single-supply versus shared well (VDDH = 1.8 V), with the 1.16 GHz point marked]

VDDL = 1.4 V: Energy -25.3%, Delay +2.8%
VDDL = 1.2 V: Energy -33.3%, Delay +8.3%

[Ref: Y. Shimazaki, ISSCC’03]

© IEEE 2003


Practical Transistor Sizing

Continuous sizing of transistors only an option in custom design

In ASIC design flows, options set by available library

Discrete sizing options made possible in standard-cell design methodology by providing multiple options for the same cell
– Leads to larger libraries (> 800 cells)
– Easily integrated into technology mapping


Technology Mapping

Larger gates reduce capacitance, but are slower

[Figure: gate network with inputs a, b, c, d and output f, annotated with slack = 1]


Technology Mapping

Example: 4-input AND
(a) Implemented using 4-input NAND + INV
(b) Implemented using 2-input NAND + 2-input NOR

Gate type | Area (cell unit) | Input cap. (fF) | Average delay (ps), Library 1: High-Speed | Average delay (ps), Library 2: Low-Power
INV | 3 | 1.8 | 7.0 + 3.8 CL | 12.0 + 6.0 CL
NAND2 | 4 | 2.0 | 10.3 + 5.3 CL | 16.3 + 8.8 CL
NAND4 | 5 | 2.0 | 13.6 + 5.8 CL | 22.7 + 10.2 CL
NOR2 | 3 | 2.2 | 10.7 + 5.4 CL | 16.7 + 8.9 CL

(delay formula: CL in fF)
(numbers calibrated for 90 nm)


Technology Mapping – Example

4-input AND | (a) NAND4 + INV | (b) NAND2 + NOR2
Area | 8 | 11
HS: Delay (ps) | 31.0 + 3.8 CL | 32.7 + 5.4 CL
LP: Delay (ps) | 53.1 + 6.0 CL | 52.4 + 8.9 CL
Sw Energy (fJ) | 0.1 + 0.06 CL | 0.83 + 0.06 CL

Area
– 4-input more compact than 2-input (2 gates vs. 3 gates)
Timing
– both implementations are 2-stage realizations
– 2nd stage INV (a) is better driver than NOR2 (b)
– For more complex blocks, simpler gates will show better performance
Energy
– Internal switching increases energy in the 2-input case
– Low-power library has worse delay, but lower leakage (see later)
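The area and delay entries of this comparison follow from the library data on the preceding slide. The sketch below is a Python illustration, assuming that each gate's delay is d0 + k·CL with CL in fF, that the first stage is loaded only by the input capacitance of the second-stage gate, and that area and input capacitance are the same in both libraries; the dictionary layout and function names are hypothetical.

```python
# Library data from the slide: (area, input cap in fF, (d0, k)), delay = d0 + k * CL in ps.
HIGH_SPEED = {
    "INV":   (3, 1.8, (7.0, 3.8)),
    "NAND2": (4, 2.0, (10.3, 5.3)),
    "NAND4": (5, 2.0, (13.6, 5.8)),
    "NOR2":  (3, 2.2, (10.7, 5.4)),
}
LOW_POWER = {
    "INV":   (3, 1.8, (12.0, 6.0)),
    "NAND2": (4, 2.0, (16.3, 8.8)),
    "NAND4": (5, 2.0, (22.7, 10.2)),
    "NOR2":  (3, 2.2, (16.7, 8.9)),
}


def two_stage_delay(lib, first, second):
    """Return (d0, k) such that the path delay is d0 + k * CL in ps."""
    _, _, (d1, k1) = lib[first]
    _, cin2, (d2, k2) = lib[second]
    # The first stage drives the input capacitance of the second-stage gate.
    return d1 + k1 * cin2 + d2, k2


def area(lib, gates):
    """Total area in cell units for a list of gate names."""
    return sum(lib[g][0] for g in gates)


for name, lib in (("HS", HIGH_SPEED), ("LP", LOW_POWER)):
    a = two_stage_delay(lib, "NAND4", "INV")     # implementation (a)
    b = two_stage_delay(lib, "NAND2", "NOR2")    # implementation (b)
    print(name, round(a[0], 1), a[1], round(b[0], 1), b[1])
print(area(HIGH_SPEED, ["NAND4", "INV"]), area(HIGH_SPEED, ["NAND2", "NAND2", "NOR2"]))
```

Rounded to one decimal, the fixed delay terms come out as 31.0 and 32.7 ps for the high-speed library and 53.1 and 52.4 ps for the low-power library, with areas of 8 and 11, matching the table.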


Gate-Level Tradeoffs for Power

Technology mapping
– Gate selection
– Sizing
– Pin assignment
Logical Optimizations
– Factoring
– Restructuring
– Buffer insertion/deletion
– Don’t care optimization


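Logic Restructuring

Logic restructuring to minimize spurious transitions
Buffer insertion for path balancing

[Figure: annotated gate networks illustrating restructuring and buffer insertion; the numeric annotations are not recoverable from the extracted text]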


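Algebraic Transformations

Idea: Modify network to reduce capacitance
Caveat: This may increase activity!

pa = 0.1; pb = 0.5; pc = 0.5

[Figure: two implementations of f from inputs a, b, c, with internal signal probabilities p1 = 0.05, p2 = 0.05, p3 = 0.075 in one and p4 = 0.75, p5 = 0.075 in the other]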
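The signal probabilities on this slide can be reproduced by enumeration. The sketch below is a Python illustration, assuming independent inputs, assuming the two networks are f = ab + ac (nodes p1, p2, p3) and f = a(b + c) (nodes p4, p5), which is a reading consistent with the quoted values, and using p·(1 − p) as the 0→1 transition probability of a node. The function names are hypothetical, and node capacitances are not weighted.

```python
from itertools import product

P_ONE = {"a": 0.1, "b": 0.5, "c": 0.5}   # signal probabilities from the slide


def probability(node) -> float:
    """P(node = 1) for independent inputs, by enumerating all input patterns."""
    total = 0.0
    for bits in product((0, 1), repeat=3):
        env = dict(zip("abc", bits))
        weight = 1.0
        for name, bit in env.items():
            weight *= P_ONE[name] if bit else 1.0 - P_ONE[name]
        if node(**env):
            total += weight
    return total


def activity(p: float) -> float:
    """0->1 transition probability p * (1 - p), assuming temporal independence."""
    return p * (1.0 - p)


# f = ab + ac: nodes p1 = ab, p2 = ac, p3 = f
sum_of_products = [lambda a, b, c: a and b,
                   lambda a, b, c: a and c,
                   lambda a, b, c: (a and b) or (a and c)]
# f = a(b + c): nodes p4 = b + c, p5 = f
factored = [lambda a, b, c: b or c,
            lambda a, b, c: a and (b or c)]

for label, nodes in (("ab + ac", sum_of_products), ("a(b + c)", factored)):
    probs = [probability(n) for n in nodes]
    print(label, [round(p, 3) for p in probs], sum(activity(p) for p in probs))
```

The factored form has fewer nodes, but its internal node b + c has probability 0.75 and therefore a much higher transition probability than any node in the other form, which is the caveat stated on the slide.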


Lessons from Circuit Optimization

Joint optimization over multiple design parameters possible using sensitivity-based optimization framework
– Equal marginal costs ⇔ Energy-efficient design
Peak performance is VERY power inefficient
– About 70% energy reduction for 20% delay penalty
– Additional variables for higher energy-efficiency
Two supply voltages in general sufficient; 3 or more supply voltages only offer small advantage
Choice between sizing and supply voltage parameters depends upon circuit topology
But … leakage not considered so far


Considering Leakage @ Design Time

Considering leakage as well as dynamic power is essential in sub-100 nm technologies
Leakage is not essentially a bad thing
– Increased leakage leads to improved performance, allowing for lower supply voltages
– Again a trade-off issue …


Leakage – Not Necessarily a Bad Thing

$$\left(\frac{E_{Lk}}{E_{Sw}}\right)_{opt} = \frac{2}{\ln\left(\dfrac{L_d}{\alpha_{avg}}\right) - K}$$

Optimal designs have high leakage (ELk/ESw ≈ 0.5)
Must adapt to process and activity variations

Topology | Inv | Add | Dec
(ELk/ESw)opt | 0.8 | 0.5 | 0.2

[Figure: normalized energy (Enorm) versus Estatic/Edynamic on a log axis from 10^-2 to 10^1, for Version 1 and Version 2, with annotations Vthref - 180 mV, 0.81 VDDmax and Vthref - 140 mV, 0.52 VDDmax]

[Ref: D. Markovic, JSSC’04]

© IEEE 2004


Refining the Optimization Model

Switching energy

$$E_{dyn} = \alpha_{0\to1}\,K_e\,S\,(\gamma + f)\,V_{DD}^2$$

Leakage energy

$$E_{stat} = S\,I_0(\Psi)\,e^{-\frac{V_{TH} - \lambda_d V_{DD}}{kT/q}}\,V_{DD}\,T_{cycle}$$

with: I0(Ψ): normalized leakage current with inputs in state Ψ
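The exponential threshold dependence in the leakage term is worth seeing in numbers. The sketch below is a Python illustration, assuming the forms E_dyn = α·Ke·S·(γ + f)·VDD² and E_stat = S·I0·exp(−(VTH − λd·VDD)/(kT/q))·VDD·Tcycle; the thermal voltage, the parameter values, and the function names are hypothetical and were picked only to exercise the formulas.

```python
import math

KT_Q = 0.0259   # V, thermal voltage near room temperature (assumed)


def switching_energy(alpha, k_e, size, gamma, fanout, vdd):
    """E_dyn = alpha_0->1 * K_e * S * (gamma + f) * VDD^2."""
    return alpha * k_e * size * (gamma + fanout) * vdd ** 2


def leakage_energy(size, i0, v_th, lambda_d, vdd, t_cycle, kt_q=KT_Q):
    """E_stat = S * I0 * exp(-(VTH - lambda_d * VDD) / (kT/q)) * VDD * T_cycle."""
    return size * i0 * math.exp(-(v_th - lambda_d * vdd) / kt_q) * vdd * t_cycle


# Hypothetical operating point, not taken from the slides.
base = dict(size=1.0, i0=1e-6, lambda_d=0.1, vdd=1.0, t_cycle=1e-9)
hi_vth = leakage_energy(v_th=0.40, **base)
lo_vth = leakage_energy(v_th=0.30, **base)
print(lo_vth / hi_vth)   # leakage penalty of a 100 mV lower threshold in this model
print(switching_energy(0.1, 1.0, 1.0, 1.35, 4.0, 1.0))
```

In this form the leakage ratio between two thresholds depends only on their difference and on kT/q, so a fixed threshold shift multiplies leakage by a fixed factor regardless of the other parameters.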


Reducing Leakage @ Design Time

Using longer transistors
– Limited benefit
– Increase in active current
Using higher thresholds
– Channel doping
– Stacked devices
– Body biasing
Reducing the voltage!!


Longer Channels

10% longer gates reduce leakage by 50%
Increases switching power by 18% with W/L = const.
Doubling L reduces leakage by 5x
Impacts performance
– Attractive when don’t have to increase W (e.g. memory)

[Figure: normalized switching energy and normalized leakage power versus transistor length (100 to 200 nm), 90 nm CMOS]


Using Multiple Thresholds

There is no need for level conversion
Dual thresholds can be added to standard design flows
– High-VTh and Low-VTh libraries are a standard in sub-0.18 μm processes
– For example: can synthesize using only high-VTh cells and then swap in low-VTh cells in place to improve timing.
– Second VTh insertion can be combined with resizing
Only two thresholds are needed per block
– Using more than two yields small improvements


Three VTH’s

VDD = 1.5V, VTH.1 = 0.3V

Impact of third threshold very limited

[Figure: leakage reduction ratio as a function of VTH.2 (V) and VTH.3 (V), shown as a contour plot and as a surface plot, with the optimum marked by +]

[Ref: T. Kuroda, ICCAD’02]

© IEEE 2002


Using Multiple Thresholds

Cell-by-cell VTH assignment (not at block level)
Achieves all-low-VTH performance with substantial reduction in leakage

[Figure: flip-flop (FF) bounded logic with low-VTH and high-VTH cells distinguished]

[Ref: S. Date, SLPE’94]


Dual-VT Domino

Low-threshold transistors used only in critical paths

[Figure: domino stages Dn, Dn+1, … clocked by Clkn and Clkn+1, with device P1 and inverters Inv1, Inv2, Inv3; shaded transistors are low threshold]


Multiple Thresholds and Design Methodology

Easily introduced in standard cell design methodology by extending cell libraries with cells with different thresholds
– Selection of cells during technology mapping
– No impact on dynamic power
– No interface issues (as was the case with multiple VDD’s)
Impact: Can reduce leakage power substantially


Dual-VTH Design for High-Performance Design

Metric | High-VTH Only | Low-VTH Only | Dual VTH
Total Slack | -53 psec | 0 psec | 0 psec
Dynamic Power | 3.2 mW | 3.3 mW | 3.2 mW
Static Power | 914 nW | 3873 nW | 1519 nW

All designs synthesized automatically using Synopsys Flows

[Courtesy: Synopsys, Toshiba, 2004]


Example: High- vs. Low-Threshold Libraries

[Figure: leakage power (nW, 0 to 8000) for selected combinational tests (i10, des, C7552, seq, pair, AVER) in 130 nm CMOS, comparing LVth, LVth+HVth, HVth, and HVth+LVth]

[Courtesy: Synopsys 2004]


Complex Gates Increase Ion/Ioff Ratio

Ion and Ioff of single NMOS versus stack of 10 NMOS transistors
Transistors in stack are sized up to give similar drive

[Figure: Ioff (nA) versus VDD (V) and Ion (μA) versus VDD (V), each for stack and no stack (90 nm technology)]


Complex Gates Increase Ion/Ioff Ratio

Stacking transistors suppresses submicron effects
– Reduced velocity saturation
– Reduced DIBL effect
– Allows for operation at lower thresholds

[Figure: Ion/Ioff ratio (×10^5) versus VDD (V) for stack and no stack, annotated "Factor 10!" (90 nm technology)]


Complex Gates Increase Ion/Ioff Ratio

Example: 4-input NAND
Fan-in (2) versus Fan-in (4)

With transistors sized for similar performance:
Leakage of Fan-in(2) = Leakage of Fan-in(4) x 3
(Averaged over all possible input patterns)

[Figure: leakage current (nA) versus input pattern (1 to 16) for Fan-in (2) and Fan-in (4)]


Example: 32 bit Kogge-Stone Adder

Reducing the threshold by 150 mV increases leakage of single NMOS transistor by factor 60

[Figure: % of input vectors versus standby leakage current (μA), annotated "factor 18"]

[Ref: S.Narendra, ISLPED’01]

© Springer 2001


Summary

Circuit optimization can lead to substantial energy reduction at limited performance loss
Energy-delay plots are the perfect mechanism for analyzing energy-delay trade-offs.
Well-defined optimization problem over W, VDD and VTH parameters
Increasingly better support by today’s CAD flows
Observe: leakage is not necessarily bad – if appropriately managed.


References

Books:
A. Bellaouar, M.I. Elmasry, Low-Power Digital VLSI Design Circuits and Systems, Kluwer Academic Publishers, 1st Ed, 1995.
D. Chinnery, K. Keutzer, Closing the Gap Between ASIC and Custom, Springer, 2002.
D. Chinnery, K. Keutzer, Closing the Power Gap Between ASIC and Custom, Springer, 2007.
J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed, Prentice Hall 2003.
I. Sutherland, B. Sproull, D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan-Kaufmann, 1st Ed, 1999.

Articles:
R.W. Brodersen, M.A. Horowitz, D. Markovic, B. Nikolic, V. Stojanovic, “Methods for True Power Minimization,” Int. Conf. on Computer-Aided Design (ICCAD), pp. 35-42, Nov. 2002.
S. Date, N. Shibata, S. Mutoh, and J. Yamada, “1V 30MHz Memory-Macrocell-Circuit Technology with a 0.5um Multi-Threshold CMOS,” Proceedings of the 1994 Symposium on Low Power Electronics, San Diego, CA, pp. 90-91, Oct. 1994.
M. Hamada, Y. Ootaguro, T. Kuroda, “Utilizing Surplus Timing for Power Reduction,” IEEE Custom Integrated Circuits Conf., (CICC), pp. 89-92, Sept. 2001.
F. Ishihara, F. Sheikh, B. Nikolic, “Level conversion for dual-supply systems,” Int. Conf. Low Power Electronics and Design, (ISLPED), pp. 164-167, Aug. 2003.
P.M. Kogge and H.S. Stone, “A Parallel Algorithm for the Efficient Solution of General Class of Recurrence Equations,” IEEE Trans. Comput., vol. C-22, no. 8, pp. 786-793, Aug 1973.
T. Kuroda, “Optimization and control of VDD and VTH for low-power, high-speed CMOS design,” Proceedings ICCAD 2002, pp. , San Jose, Nov. 2002.


Articles (cont.):
H.C. Lin and L.W. Linholm, “An Optimized Output Stage for MOS Integrated Circuits,” IEEE J. Solid-State Circuits, vol. SC-10, no. 2, pp. 106-109, Apr. 1975.
S. Ma and P. Franzon, “Energy Control and Accurate Delay Estimation in the Design of CMOS Buffers,” IEEE J. Solid-State Circuits, vol. 29, no. 9, pp. 1150-1153, Sept. 1994.
D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Methods for True Energy-Performance Optimization,” IEEE Journal of Solid-State Circuits, vol. 39, no. 8, pp. 1282-1293, Aug. 2004.
MathWorks, http://www.mathworks.com
S. Narendra, S. Borkar, V. De, D. Antoniadis, A. Chandrakasan, “Scaling of stack effect and its applications for leakage reduction,” Int. Conf. Low Power Electronics and Design, (ISLPED), pp. 195-200, Aug. 2001.
T. Sakurai and R. Newton, “Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas,” IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 584-594, Apr. 1990.
Y. Shimazaki, R. Zlatanovici, B. Nikolic, “A shared-well dual-supply-voltage 64-bit ALU,” Int. Conf. Solid-State Circuits, (ISSCC), pp. 104-105, Feb. 2003.
V. Stojanovic, D. Markovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization,” European Solid-State Circuits Conf., (ESSCIRC), pp. 211-214, Sept. 2002.
M. Takahashi et al., “A 60mW MPEG video codec using clustered voltage scaling with variable supply-voltage scheme,” IEEE Int. Solid-State Circuits Conf., (ISSCC), pp. 36-37, Feb. 1998.