Design of Power Efficient VLSI Arithmetic: Speed and Power Trade-offs

Vojin G. Oklobdzija, Ram Krishnamurthy
Intel AMR / ACSEL Laboratory
Intel Corp / University of California Davis
www.ece.ucdavis.edu/acsel

Tutorial Presentation
16th International Symposium on Computer Arithmetic
Santiago de Compostela, SPAIN
June 18, 2003

Issues to be addressed

• How do we compare different topologies for their efficiency?
• How do we estimate the speed and efficiency of our algorithm?
• What criteria should we use when developing a new algorithm?
• How does power enter into this equation?

Additional Issues

• Determine which topology is best for a given power or delay budget
• Determine which topology can stretch the furthest in terms of speed or power


Metric

Previously used estimates

• Counting the number of gates (logic levels): not accurate

[Figure: carry-lookahead adder with carries C_in, C_4, C_8, C_12, C_16, C_20, C_24, C_28 and C_out, in four levels:
  – individual adders generating g_i, p_i and sum S_i
  – carry-lookahead blocks of 4 bits generating G_i, P_i and C_in for the adders
  – carry-lookahead super-blocks of 4-bit blocks generating G*_i, P*_i and C_in for the 4-bit blocks
  – group producing final carry C_out and C_16]

Critical path delay = 1Δ (for g_i, p_i) + 2×2Δ (for G, P) + 3×2Δ (for C_in) + 1 XOR-Δ (for Sum) = approx. 12Δ of delay

Critical path in Motorola's 64-bit CLA

Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63

[Figure: 64-bit carry-lookahead adder built from PG blocks and a carry block: bit-level signals G0/P0 to G63/P63, group signals such as G3:0/P3:0, G15:0/P15:0, G47:0/P47:0 and G63:0/P63:0, and carries C0 to C64]


Motorola's 64-bit CLA

Modified PG Block

Intermediate propagate signals Pi:0 are generated to speed-up C3

Fan-In and Fan-Out Dependency (Oklobdzija, Barnes: IBM 1985)

Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Delay Complexity

Design Objective

• Design takes time:
  – finding results afterward is not of much value
• There is a disconnect between the measures used in computer arithmetic when developing an algorithm and what is obtained after implementation
  – we want estimates that are as close as possible to the measured results
• A simple tool that can evaluate different design trade-offs for a given technology is needed
• The power trade-off is the most important
  – speed and power are tradable

Logical Effort Theory

• “Back of the Envelope” complexity: good for estimating speed
• Gate delay = linear function of load
  – Slope: logical effort (gate driving characteristics)
  – Intercept: parasitic delay (gate internal load)
• “Logical Effort” accuracy is not sufficient
  – We needed to extend and refine the method
  – However, that becomes more than “Back of the Envelope”
• Logical Effort does not account for possible power-delay trade-offs

Logical Effort Theory

• Excel: a platform of choice (ARITH-16)
  – Simple enough
  – Can provide computation quickly
  – Easy to enter a given design
• Technology characterization is needed:
  – This needs to be done only once: available for every design afterwards
  – Domino gate = 2 stages, dynamic and static
    • Different driving characteristics of these stages
• Multi-output gate (carry-look-ahead, Ling/conditional sum)
• Energy model needs to be included

Energy Motivation

• AGUs: performance and peak-current limiters
• High-activity thermal hotspot
• Goal: high-performance, energy-efficient design

[Figure: processor thermal map, Temp (°C), with the execution core, cache and AGU labeled and a 120°C marking]

*courtesy of Intel Corp.

Kogge-Stone Adder

• Critical path = PG + 5 + XOR = 7 gate stages
• Generate, Propagate fanout of 2, 3
• Maximum interconnect spans 16b
• Energy inefficient

[Figure: 32-bit Kogge-Stone adder, bits 31..0: PG row, carry-merge gates, XOR row]

Sparse-tree Adder Architecture

• Generate every 4th carry in parallel
• Side-path: 4-bit conditional sum generator
• 73% fewer carry-merge gates: energy-efficient

[Figure: 32-bit sparse-tree adder, bits 31..0, producing carries C3, C7, C11, C15, C19, C23 and C27]

Kogge-Stone adder (8-stage)

Stage  Logical Effort (G)  Branch Effort (B)  Int. Pitch (C)  Effective Branch Effort (B + I·C)  Parasitic Comp.
PG     0.6                 2                  1               2.1                                1.3
CM0    1.48                2                  2               2.2                                2.5
CM1    0.59                2                  4               2.4                                1.6
CM2    1.48                2                  8               2.8                                2.5
CM3    0.59                2                  16              3.6                                1.6
CM4    1.48                1                  0               1.0                                2.5
XOR    1.69                1                  0               1.0                                3.0
Inv    1                   1                  0               1.0                                1.0

Path Branch Effort = ΠB_i    108.92
Path Logical Effort = ΠG_i   1.14
Path Effort                  124.63
Path Delay (ps)              93.97

Design Parameters
Adder Pitch (um)                    10
Interconnect Cap (fF/um)            0.157
Gate Cap (fF/um)                    1.15
Avg. inp. Cap / gate (um)           14
% int. to gate cap / pitch (I)      10%
Inv. L.E.                           2.24
Parasitic delay                     3.8

D = 8·(GBH)^{1/8}·2.2 + 3.8·P
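The table arithmetic can be retraced with a short script. This is only a sketch under stated assumptions: the per-stage numbers are the rounded values from the table, the interconnect ratio I is assumed to be (interconnect cap × adder pitch) / (gate cap × average input width), and the path electrical effort H is assumed to be 1. The names `STAGES`, `I_RATIO` and `kogge_stone_path` are illustrative and do not come from the slides.

```python
# Per-stage values copied from the Kogge-Stone table:
# (stage, logical effort g, branch effort b, wire span in bit pitches c, parasitic p)
STAGES = [
    ("PG", 0.6, 2, 1, 1.3),
    ("CM0", 1.48, 2, 2, 2.5),
    ("CM1", 0.59, 2, 4, 1.6),
    ("CM2", 1.48, 2, 8, 2.5),
    ("CM3", 0.59, 2, 16, 1.6),
    ("CM4", 1.48, 1, 0, 2.5),
    ("XOR", 1.69, 1, 0, 3.0),
    ("Inv", 1.0, 1, 0, 1.0),
]

# Assumed combination of the design parameters: wire cap over one 10 um pitch
# divided by the cap of an average 14 um gate input (about 0.0975, i.e. "10%").
I_RATIO = (0.157 * 10) / (1.15 * 14)


def kogge_stone_path(stages, i_ratio=I_RATIO, h_path=1.0,
                     slope_ps=2.2, pinv_ps=3.8):
    """Return (G, B, F, D_ps) for the path, using D = N*F^(1/N)*slope + pinv*P."""
    g_path = 1.0
    b_path = 1.0
    p_total = 0.0
    for _name, g, b, c, p in stages:
        g_path *= g                # path logical effort: product of g_i
        b_path *= b + i_ratio * c  # effective branch effort B + I*C per stage
        p_total += p               # parasitics add along the path
    f_path = g_path * b_path * h_path  # h_path = 1 is an assumption
    n = len(stages)
    delay_ps = n * f_path ** (1.0 / n) * slope_ps + pinv_ps * p_total
    return g_path, b_path, f_path, delay_ps


G, B, F, D = kogge_stone_path(STAGES)
print(f"G = {G:.2f}, B = {B:.2f}, F = {F:.2f}, D = {D:.1f} ps")
```

With these inputs the script gives the same G, B and F as the table to two decimals. The delay comes out roughly 1 ps below the tabulated 93.97 ps, plausibly because the tabulated parasitics are rounded and the parameter list gives 2.24 rather than 2.2 for the inverter.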

MXA2 – Architecture & Result

• Multiplexer-based
• Generate carries using radix-2 (P,G)
• 4-bit conditional sum selected by carries
• 4-b cell width = 17 µm
• 9-stage critical path
  – Per-stage effort = 3.7
  – Total effort delay = 33.3
  – Total parasitic = 22.5
  – Total delay = 55.8

[Figure: 64-bit MXA2 adder built from sixteen PG4 blocks and sixteen S4 conditional-sum blocks (bits 0..3 through 60..63), with the PG group and the multiplexer-based sum circuits (Sum0 to Sum3, Cin)]

HC2 – Architecture

• Generate even carries using radix-2 (P,G)
• Generate odd carries from even carries
• CMOS adder for sum
• 1-b cell width 4 µm
• 10-stage critical path

[Figure: 64-bit HC2 adder, bits 63..0: (p,g) stage (XOR2, NAND2), carry-merge levels L1 to L6 (CM1 to CM6, built from NAND2/AOI and NOR2/OAI gates), odd-carry stage (CMo, AOI/OAI) and XOR2 sum stage, with separate paths for even and odd bits]

HC2 – Circuits & Results

[Figure: static circuits for the g/p, G, P and Sum cells, and clocked (CK) dynamic circuits for the Gi, Pi and G cells]

         Per-Stage Effort  Total Effort Delay  Total Parasitic  Total Delay
Static   2.8τ              28.0τ               34.5τ            62.5τ

KS2 – Architecture & Results

• Generate carries using radix-2 (P,G)
• CMOS adder for sum
• Similar circuits as HC2
• 1-b cell width 4 µm
• 9-stage critical path

         Per-Stage Effort  Total Effort Delay  Total Parasitic  Total Delay
Static   3.0τ              27.0τ               30.6τ            57.6τ
Dynamic  2.11τ             19.0τ               23.6τ            42.6τ

[Figure: 64-bit KS2 adder, bits 63..0: carry-merge levels L1 to L6, inverter (Inv) and Sum stages]

KS4 – Architecture

• Generate carries using redundant radix-4 (P,G)
• Dynamic circuit
• 1-b cell width 4 µm
• 6-stage critical path

[Figure: 64-bit KS4 adder, bits 63..0: G4/P4 stage, G16/P16 stage, carry (Co) and Sum stages]

KS4 – Circuits & Result

[Figure: clocked (CK) dynamic circuits for the G4, P4, G16, P16 and Sum cells]

         Per-Stage Effort  Total Effort Delay  Total Parasitic  Total Delay
Dynamic  2.3τ              13.8τ               16.3τ            30.1τ

CLA4 – Architecture

• Generate carries using radix-4 (P,G,C)
• 1-b cell width 4 µm
• 15-stage critical path

[Figure: 64-bit CLA4 adder (bits b0 to b63, Cin = C0): (P,G,C) network of PGC blocks producing C4, C8, ..., C60, with the G-path and P-path marked]

CLA4 – Circuits & Result

[Figure: clocked (CK) dynamic circuits generating G, P and K from A, B and producing Sum; 4-bit block taking G0..G3, P0..P3 and C0 and producing P1:0..P3:0, G1:0..G3:0 and C1..C3]

         Per-Stage Effort  Total Effort Delay  Total Parasitic  Total Delay
Dynamic  1.4τ              21.0τ               33.3τ            54.3τ

LNG4 – Architecture

• Generate carries using Ling pseudo-carries
• Conditional sums selected by local & long carries
• 1-b cell width 5.1 µm; 9-stage critical path

LNG4 – Circuits & Result

[Figure: clocked (CK) dynamic circuits for the G3, G4 and P4 cells, the carry logic (LC, LCH, LCL) and the conditional-sum cells (SumH, SumL, selected by C0H/C0L and C1H/C1L)]

         Per-Stage Effort  Total Effort Delay  Total Parasitic  Total Delay
Dynamic  2.4τ              21.6τ               22.3τ            43.9τ

Results from Simulation

• Fairly consistent with logical effort analysis
• Per-stage delay
  – 1.4 FO4 (static)
  – 0.8 FO4 (dynamic)

Type     Adder  # Stages  LE (FO4)  SPICE (FO4)  Diff (FO4)
Static   KS2    9         11.8      10.9         -0.88
Static   MX2    9         11.4      12.8         1.41
Static   HC2    10        12.8      13.3         0.46
Dynamic  KS4    6         6.2       7.4          1.27
Dynamic  KS2    9         8.7       9.2          0.44
Dynamic  LNG4   9         9.0       9.5          0.51
Dynamic  HC2    10        9.8       9.9          0.08
Dynamic  CLA4   16        11.4      14.2         2.74

[Figure: bar chart of HSPICE delay and difference (FO4, 0 to 16) for the adders in the table]
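One way to put a number on "fairly consistent" is to recompute the differences from the LE and SPICE columns. The sketch below uses only the table values; the summary statistic (mean absolute difference) and the names `RESULTS`, `differences` and `mean_abs_difference` are illustrative choices, not something the slides report.

```python
# (type, adder, stages, LE estimate in FO4, SPICE result in FO4), from the table
RESULTS = [
    ("Static", "KS2", 9, 11.8, 10.9),
    ("Static", "MX2", 9, 11.4, 12.8),
    ("Static", "HC2", 10, 12.8, 13.3),
    ("Dynamic", "KS4", 6, 6.2, 7.4),
    ("Dynamic", "KS2", 9, 8.7, 9.2),
    ("Dynamic", "LNG4", 9, 9.0, 9.5),
    ("Dynamic", "HC2", 10, 9.8, 9.9),
    ("Dynamic", "CLA4", 16, 11.4, 14.2),
]


def differences(results):
    """SPICE minus logical-effort estimate, per (type, adder), in FO4."""
    return {(t, a): spice - le for t, a, _n, le, spice in results}


def mean_abs_difference(results):
    """Average magnitude of the estimate error, in FO4."""
    diffs = differences(results).values()
    return sum(abs(d) for d in diffs) / len(results)


diffs = differences(RESULTS)
worst = max(diffs, key=lambda key: abs(diffs[key]))
print("mean |diff| (FO4):", round(mean_abs_difference(RESULTS), 2))
print("largest |diff|:", worst)
```

Because the LE and SPICE columns are rounded to one decimal, the recomputed differences agree with the Diff column only to within about 0.1 FO4.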

Delay of Representative 64-b Adders

[Figure: bar chart of total delay (FO4, 0 to 12), static and dynamic, for MXA2, HC2, KS2, QTA2, KS4 and LNG4]

What happens when power is considered?

[Figure: energy vs. delay curves for Adder A and Adder B, with points A and B and Region 1 / Region 2 marked]

Energy-Delay Space

[Figure: energy vs. delay curves for different adders, with Emin, Dmin, the speed barrier and the power limit marked]

Logical Effort

Delay in a Logic Gate

Delay of a logic gate has two components:

  d = f + p    (f: effort delay, or stage effort; p: parasitic delay)
  f = g h      (g: logical effort; h: electrical effort = Cout/Cin)

• Logical effort describes the relative ability of a gate topology to deliver current (defined to be 1 for an inverter)
• Electrical effort is the ratio of output to input capacitance; it is also called “fanout”

*from Mathew Sanu / D. Harris
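As a minimal sketch of the single-gate model, the two relations d = f + p and f = g·h can be written directly as functions. The function names are illustrative; the example uses a normalized inverter (g = 1, p = 1) driving four times its own input capacitance.

```python
def electrical_effort(c_out, c_in):
    """Electrical effort h = Cout / Cin (also called fanout)."""
    return c_out / c_in


def gate_delay(g, h, p):
    """Gate delay d = g*h + p, in the same units as g and p."""
    return g * h + p


# Normalized inverter (g = 1, p = 1) loaded with 4x its input capacitance.
h = electrical_effort(c_out=4.0, c_in=1.0)
d_fo4 = gate_delay(g=1.0, h=h, p=1.0)
print("fanout-of-4 inverter delay (normalized):", d_fo4)
```

The same function works with absolute units, for example a slope in ps per unit fanout and a parasitic delay in ps.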

Logical Effort Parameters: Inverter

• d = gh + p
• Delay increases linearly with fanout
• More complex gates have greater g and p

[Figure: inverter delay (0 to 16) vs. fanout h = Cout/Cin (0 to 6); the line d = gh + p has slope g = 2.2 (logical effort) and intercept p = 3.8 ps (parasitic delay)]

*from Mathew Sanu / D. Harris

Normalized Logical Effort: Inverter

• Define delay of unloaded inverter = 1
• Define logical effort ‘g’ of inverter = 1
• Delay of complex gates can be defined w.r.t. d = 1

For the inverter: g = 1, p = 1, so d = gh + p = h + 1

[Figure: normalized delay d (1 to 6) vs. fanout h = Cout/Cin (1 to 5) for the inverter, showing the parasitic delay and effort delay components]

*from Mathew Sanu / D. Harris

Computing Logical Effort

DEF: Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current

• Measured from delay vs. fanout plots of simulated gates
• Or estimated, counting capacitance in units of transistor W

*from Mathew Sanu / D. Harris

L.E for Adder Gates

[Figure: delay (ps, 0 to 35) vs. fanout (0 to 6) for Inverter, Static CM, Dyn PG, Dyn CM and Mux]

• Logical effort parameters obtained from simulation for std cells
• Define logical effort ‘g’ of inverter = 1
• Delay of complex gates can be defined w.r.t. d = 1

*from Mathew Sanu / D. Harris

Normalized L.E

• Logical effort & parasitic delay normalized to that of inverter

Gate type    Logical Eff. (g)  Parasitics (Pinv)
Inverter     1                 1
Dyn. Nand    0.6               1.34
Dyn. CM      0.6               1.62
Dyn. CM-4N   1                 3.71
Static CM    1.48              2.53
Mux          1.68              2.93
XOR          1.69              2.97

*from Mathew Sanu

Delay of a string of gates

• Delay of a path: D = Σ d_i = Σ g_i h_i + Σ p_i
• g_i & p_i are constants
• To minimize path delay, optimal values of h_i are to be determined
• D is minimized when each stage bears the same effort, i.e. g_i h_i = g_{i+1} h_{i+1}

*from Mathew Sanu / D. Harris

Minimizing path delay

• Logical effort of a string of gates: G = Π g_i
• Path electrical effort: H = C_out(path) / C_in(path) = Π h_i
• Branching effort: b = (C_on-path + C_off-path) / C_on-path
• Path branching effort: B = Π b_i
• Path effort: F = GBH

Delay is minimized when each stage bears the same effort:

  f = g_i h_i = F^{1/N}

The minimum delay of an N-stage path is: D = N F^{1/N} + P

*from Mathew Sanu / D. Harris
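The path relations above can be sketched as a small calculator. `min_path_delay` follows F = GBH, f = F^{1/N} and D = N·F^{1/N} + P directly. `input_caps` adds the usual logical-effort sizing step of working backward from the load (C_in = g·b·C_out / f), which is a standard textbook step and not something shown on the slide. All names are illustrative.

```python
from math import prod


def min_path_delay(g, b, h_path, p):
    """Return (stage effort f, minimum delay D) for an N-stage path.

    g, b, p are per-stage logical efforts, branching efforts and parasitics;
    h_path is the path electrical effort Cout(path) / Cin(path).
    """
    n = len(g)
    f_path = prod(g) * prod(b) * h_path  # F = G * B * H
    f_stage = f_path ** (1.0 / n)        # equal effort per stage
    return f_stage, n * f_stage + sum(p)


def input_caps(g, b, c_load, f_stage):
    """Size gates from the output backward: C_in = g * b * C_out / f."""
    caps = []
    c_out = c_load
    for g_i, b_i in zip(reversed(g), reversed(b)):
        c_in = g_i * b_i * c_out / f_stage
        caps.append(c_in)
        c_out = c_in  # this gate is the load of the previous stage
    return list(reversed(caps))


# Example: three normalized inverters driving 64x the path input capacitance.
g, b, p = [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]
f, d_min = min_path_delay(g, b, h_path=64.0, p=p)
sizes = input_caps(g, b, c_load=64.0, f_stage=f)
print("stage effort:", round(f, 3), "min delay:", round(d_min, 3))
print("input capacitances:", [round(c, 3) for c in sizes])
```

In this example each stage bears an effort of 4, so the gates grow by a factor of 4 from input to output and the path delay is 3·4 + 3 = 15 normalized units.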

Inclusion of Wire Delay into Logical Effort

Wiring Load

• Wiring in hand analysis
  – Only lumped capacitance included
• Wiring in HSPICE
  – Short wire: 1-segment π-model RC network
  – Long wire: 4-segment π-model RC network
  – Using worst-case wire capacitance
• Wire length
  – Estimated from most critical 1-bit pitch

Modeling interconnect cap.

• Include interconnect cap in branching factor

Without interconnect (a PG gate driving two CM0 gates, one on-path and one off-path):

  b = (C_on-path + C_off-path) / C_on-path = 2

With interconnect capacitance C_int spanning one adder bit pitch:

  b = (C_on-path + C_off-path + C_int) / C_on-path = 2 + C_int / C_on-path = 2 + I

I: % int. cap to gate cap in 1 adder bit pitch

[Figure: PG gate driving two CM0 gates one adder bit pitch apart, drawn without and with the interconnect capacitance C_int]
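A small sketch of how I and the effective branching factor might be computed. The way the parameters combine is an assumption: I is taken as (interconnect cap per µm × adder pitch) / (gate cap per µm × average gate input width), using the design parameters listed for the Kogge-Stone example (10 µm pitch, 0.157 fF/µm, 1.15 fF/µm, 14 µm). A wire spanning several bit pitches is assumed to add I once per pitch. The function names are illustrative.

```python
def interconnect_ratio(pitch_um, wire_cap_ff_per_um,
                       gate_cap_ff_per_um, avg_gate_width_um):
    """I = interconnect cap over one bit pitch / input cap of an average gate."""
    c_int = wire_cap_ff_per_um * pitch_um
    c_gate = gate_cap_ff_per_um * avg_gate_width_um
    return c_int / c_gate


def effective_branching(b, i_ratio, pitches):
    """Branching factor including a wire that spans `pitches` bit pitches."""
    return b + i_ratio * pitches


I = interconnect_ratio(10, 0.157, 1.15, 14)
spans = (1, 2, 4, 8, 16)
b_eff = [effective_branching(2, I, c) for c in spans]
print("I =", round(I, 4))
print("effective branching:", [round(x, 1) for x in b_eff])
```

With these parameters I comes out just under 10%, and a fanout-of-2 node with wire spans of 1, 2, 4, 8 and 16 pitches gets effective branching factors that round to 2.1, 2.2, 2.4, 2.8 and 3.6.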

Branching

[Figure: input C_IN drives two branches: gates g0, g1 (stage efforts f0, f1) to load C_OUT1, and gates g2, g3 (stage efforts f2, f3) to load C_OUT2]

Logical Effort assumes the “branching” factor of this circuit to be 2. This is incorrect and can create inaccuracies.

Correction on Branching

[Figure: input C_IN drives two branches: gates g0, g1 (stage efforts f0, f1) to load C_OUT1, and gates g2, g3 (stage efforts f2, f3) to load C_OUT2]

  f0 = f1,  f2 = f3
  Td1 = (f0 + f1 + parasitics)
  Td2 = (f2 + f3 + parasitics)

Minimum delay occurs when Td1 = Td2

“Real” Branching Calculation

  F1 = g0 g1 C_out1 / C_in
  F2 = g2 g3 C_out2 / C_in

  B1 = (F1 + F2) / F1 = (g0 g1 C_out1 + g2 g3 C_out2) / (g0 g1 C_out1)
  B2 = (F1 + F2) / F2 = (g0 g1 C_out1 + g2 g3 C_out2) / (g2 g3 C_out2)

Branching only equals 2 when: g0 g1 C_out1 = g2 g3 C_out2

This explains why we had to resort to Excel!
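The branching correction can be illustrated with a few lines of code. This sketch assumes the reading of the slide in which each branch is weighted by the product of its logical efforts and its output load (g0·g1·C_out1 and g2·g3·C_out2), and the branching seen by a branch is the total weight divided by that branch's own weight. The function name and the example loads are hypothetical.

```python
from math import prod


def real_branching(g_branch1, c_out1, g_branch2, c_out2):
    """Return (B1, B2) for two branches sharing one input node.

    Each branch is weighted by (product of its logical efforts) * (its load).
    """
    w1 = prod(g_branch1) * c_out1
    w2 = prod(g_branch2) * c_out2
    total = w1 + w2
    return total / w1, total / w2


# Balanced branches: the textbook value of 2 is recovered.
print(real_branching((1.0, 1.0), 10.0, (1.0, 1.0), 10.0))

# Hypothetical unbalanced loads: the lightly loaded branch sees a much
# larger branching factor than 2, the heavily loaded one a smaller factor.
b1, b2 = real_branching((1.0, 1.0), 10.0, (1.0, 1.0), 30.0)
print(b1, round(b2, 3))
```

A quick consistency check on any result is that 1/B1 + 1/B2 = 1, since the two branches together account for all of the capacitance at the shared node.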


Technology Characterization

Characterization Setup

• Logical Effort requirements:
  – Equalize input and output transitions.
• Logical Effort is characterized by varying the h (Cout/Cin) of a gate. By using a variable load of inverters, each gate can be characterized over the same range of loads.
• The Logical Effort of each gate is characterized for each input.
• Energy is characterized for each output transition of the gate caused by each input transition, e.g. for an inverter, energy is measured for tLH and tHL.

LE Characterization Setup for Static Gates

[Figure: input In drives a chain of four gates with a variable load; measured quantities: tLH, tHL, average delay and energy]

LE Characterization Setup for Dynamic Gates

[Figure: input In drives a chain of two gates with a variable load; measured quantities: tHL and energy]

LE Table (Static CMOS)

• Technology: P/N Ratio = 2, τINV = 3.67, pINV = 4.29
• Measured on worst-case single-input switching

Fan-out   INV    NAND2  NAND3  NOR2   TGXORi  TGXORs  TGMUXi  TGMUXs  AOI    OAI
2         11.6   16.3   22.2   20.5   34.9    22.3    8.0     26.0    23.2   21.3
3         15.3   20.0   26.6   25.4   42.6    28.2    9.9     33.0    28.5   26.7
4         19.0   24.0   31.2   30.6   50.2    34.2    12.0    39.0    34.1   32.1
6         26.4   32.4   40.6   41.1   64.4    45.7    16.0    53.0    45.3   43.6
8         33.6   40.6   50.0   51.9   79.8    56.5    20.2    68.0    56.7   55.3

g (ps)    3.67   4.08   4.65   5.25   7.43    5.71    2.04    6.97    5.60   5.68
p (ps)    4.29   7.90   12.74  9.77   20.19   11.12   3.85    11.76   11.82  9.69

g (norm)  1.00   1.11   1.27   1.43   2.03    1.56    0.55    1.90    1.52   1.55
p (norm)  1.00   1.84   2.97   2.28   4.71    2.59    0.90    2.74    2.76   2.26
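The g and p rows can be recovered from the delay rows by fitting a straight line d = g·h + p to each column. The sketch below uses an ordinary least-squares fit on the INV and NAND2 columns; whether the authors used least squares is an assumption, and the names `FANOUT`, `DELAY_PS` and `fit_line` are illustrative.

```python
FANOUT = [2, 3, 4, 6, 8]

# Delay columns copied from the static CMOS table.
DELAY_PS = {
    "INV": [11.6, 15.3, 19.0, 26.4, 33.6],
    "NAND2": [16.3, 20.0, 24.0, 32.4, 40.6],
}


def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


g_inv, p_inv = fit_line(FANOUT, DELAY_PS["INV"])
g_nand2, p_nand2 = fit_line(FANOUT, DELAY_PS["NAND2"])

print("INV:   g = %.2f, p = %.2f" % (g_inv, p_inv))
print("NAND2: g = %.2f, p = %.2f" % (g_nand2, p_nand2))
# Normalizing to the inverter gives the (norm) rows of the table.
print("NAND2 normalized: g = %.2f, p = %.2f"
      % (g_nand2 / g_inv, p_nand2 / p_inv))
```

For these two columns the fitted slope and intercept match the tabulated g and p values to two decimals, and dividing by the inverter values reproduces the normalized rows.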

Static CMOS Gates: Delay Graphs

[Figure: two plots of delay (0 to 90) vs. fanout (0 to 9). First plot: INV, NAND2, NAND3, NOR2, AOI, OAI. Second plot: INV, TGXORi, TGXORs, TGMUXi, TGMUXs]

Static Gates: Pull-up Delay Graph

[Figure: delay (0 to 70) vs. fanout (0 to 9) for INV, NAND2, NAND3, NOR2, AOI and OAI]

LE Table (Dynamic CMOS)

• Technology:
• Minimum-sized keeper included
• Measured on all-input switching of worst path

Fan-out   DN2    DN3    DN4    Dk1ND2  Dk1NR2  DAOI_A  DOAI_O
2         9.9    12.7   16.0   13.7    10.6    10.1    8.8
3         12.6   14.7   19.1   16.7    13.2    12.1    11.3
4         16.0   18.3   23.2   20.7    16.7    14.7    14.0
6         21.7   24.7   30.2   27.9    23.2    20.0    19.2
8         27.3   31.2   37.8   36.1    29.5    24.8    24.0

g (ps)    2.92   3.15   3.65   3.75    3.19    2.49    2.55
p (ps)    4.04   5.82   8.46   5.76    3.95    4.86    3.75

g (norm)  0.80   0.86   1.00   1.02    0.87    0.68    0.69
p (norm)  0.94   1.36   1.97   1.34    0.92    1.13    0.87

Dynamic CMOS: Delay Graphs

[Figure: two plots of delay (0 to 40) vs. fanout (0 to 10). First plot: N2, N3, N4, k1ND2, k1NR2, AOI_A, OAI_O. Second plot: G4, P4, C4, STBSum]


Dynamic CMOS: Delay Graphs

[Figure: delay graph for LG3, LP4, G4, P4, LC, Lsum; horizontal axis 0 to 10, vertical axis 0 to 50]

[Figure: delay graph for KSG4, KSP4, KSG16, KSP16, KSSum; horizontal axis 0 to 10, vertical axis 0 to 50]


Energy Calculation


Energy Calculation

8X Minimal Size Dyn-NAND

16X Minimal Size Dyn-NAND


Energy Calculation

Offset (parasitic + wiring energy) vs. Size (in multiples of the gate size)

[Figure: Offset (axis 0 to 60) vs. Gate Size (x) (axis 0 to 45) for inv, dgck, oai_o, daoi, tgxor, aoi_o, na2s, tgmuxs, each with a linear fit]

Linear fits shown on the plot:
y = 0.8931x + 4.6411
y = 1.1413x + 10.22
y = 1.6382x + 11.988
y = 0.5538x + 12.338
y = 3.89x + 14.5
y = 1.9595x + 9.621
y = 1.2559x + 6.762
y = 1.0592x + 1.71


Energy Calculation

[Figure: Inverter. Energy [fJ] (axis 0 to 140) vs. Load [u] (12, 18, 24, 36, 48) and Size (2.5, 5, 7.5, 10)]


Energy Calculation

INV

      Output Capacitance (u), by Multiplier Factor   Energy [fJ], by Multiplier Factor
M     1      5      10     15     20                 1         5         10        15        20
0     1.12   5.6    11.2   16.8   22.4               2.51E+00  1.26E+01  2.51E+01  3.77E+01  5.02E+01
1     2.24   11.2   22.4   33.6   44.8               3.70E+00  1.85E+01  3.70E+01  5.54E+01  7.39E+01
2     3.36   16.8   33.6   50.4   67.2               4.85E+00  2.42E+01  4.85E+01  7.27E+01  9.70E+01
3     4.48   22.4   44.8   67.2   89.6               6.16E+00  3.08E+01  6.16E+01  9.24E+01  1.23E+02
4     5.6    28     56     84     112                7.45E+00  3.73E+01  7.45E+01  1.12E+02  1.49E+02
5     6.72   33.6   67.2   100.8  134.4              8.74E+00  4.37E+01  8.74E+01  1.31E+02  1.75E+02
6     7.84   39.2   78.4   117.6  156.8              1.02E+01  5.08E+01  1.02E+02  1.52E+02  2.03E+02
7     8.96   44.8   89.6   134.4  179.2              1.15E+01  5.75E+01  1.15E+02  1.72E+02  2.30E+02
8     10.08  50.4   100.8  151.2  201.6              1.27E+01  6.36E+01  1.27E+02  1.91E+02  2.54E+02
9     11.2   56     112    168    224                1.42E+01  7.08E+01  1.42E+02  2.13E+02  2.83E+02
10    12.32  61.6   123.2  184.8  246.4              1.55E+01  7.76E+01  1.55E+02  2.33E+02  3.10E+02
11    13.44  67.2   134.4  201.6  268.8              1.69E+01  8.44E+01  1.69E+02  2.53E+02  3.37E+02
12    14.56  72.8   145.6  218.4  291.2              1.81E+01  9.05E+01  1.81E+02  2.71E+02  3.62E+02
13    15.68  78.4   156.8  235.2  313.6              1.97E+01  9.85E+01  1.97E+02  2.96E+02  3.94E+02
14    16.8   84     168    252    336                2.09E+01  1.04E+02  2.09E+02  3.13E+02  4.18E+02
15    17.92  89.6   179.2  268.8  358.4              2.26E+01  1.13E+02  2.26E+02  3.39E+02  4.52E+02
16    19.04  95.2   190.4  285.6  380.8              2.39E+01  1.20E+02  2.39E+02  3.59E+02  4.79E+02
17    20.16  100.8  201.6  302.4  403.2              2.53E+01  1.27E+02  2.53E+02  3.80E+02  5.06E+02
18    21.28  106.4  212.8  319.2  425.6              2.67E+01  1.34E+02  2.67E+02  4.01E+02  5.34E+02
19    22.4   112    224    336    448                2.81E+01  1.40E+02  2.81E+02  4.21E+02  5.61E+02

NAND-2

Energy Factors, Output Capacitance Factor: 1.211300121  7.39E-01
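Two regularities are visible in the INV table: the output capacitance equals 1.12u times (M + 1) times the multiplier factor, and each energy entry is the unit-size energy for the same M times the multiplier factor, to within the rounding of the tabulated values. The sketch below reproduces both from the factor-1 columns only. It is a reading of the numbers, not a model stated on the slide, and the names `UNIT_CAP_U`, `UNIT_ENERGY_FJ`, `output_cap_u`, and `energy_fj` are illustrative.

```python
# Factor-1 columns of the INV table (M = 0..19).
UNIT_CAP_U = 1.12  # output capacitance per unit step of M at factor 1, in u
UNIT_ENERGY_FJ = [2.51, 3.70, 4.85, 6.16, 7.45, 8.74, 10.2, 11.5, 12.7, 14.2,
                  15.5, 16.9, 18.1, 19.7, 20.9, 22.6, 23.9, 25.3, 26.7, 28.1]


def output_cap_u(m, factor):
    """Output capacitance in u for table row M and a given multiplier factor."""
    return UNIT_CAP_U * (m + 1) * factor


def energy_fj(m, factor):
    """Energy in fJ, assuming it scales linearly with the multiplier factor."""
    return UNIT_ENERGY_FJ[m] * factor


for m, factor in [(0, 20), (4, 5), (9, 15), (19, 20)]:
    print(f"M={m:2d} factor={factor:2d}: "
          f"C_out = {output_cap_u(m, factor):.1f} u, "
          f"E = {energy_fj(m, factor):.1f} fJ")
```

Because the unit-size energies are themselves rounded to three significant figures, a scaled value can differ from the tabulated one in the last digit (for example M = 6 at factor 5).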


Examples


64-Bit Adders

• Han-Carlson (prefix-2, HC2): Static and Dynamic
• Han-Carlson (prefix-2, HC2-2): Dynamic-Static
• Kogge-Stone (prefix-2, KS2): Static and Dynamic
• Kogge-Stone (prefix-2, KS2-2): Dynamic-Static
• Quaternary-Tree (prefix-2, QT2): Static and Dynamic

Included wire delay, t_delay = 0.7 R_wire C_wire

Included wire energy, E_w = C_wire V^2

Len (um)    10    20    30    40    60    80    120   160   240   320   480
Delay (ps)  0.01  0.04  0.09  0.17  0.38  0.67  1.50  2.67  6.01  10.7  24.1
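Since both R_wire and C_wire grow linearly with length, t_delay = 0.7 R_wire C_wire grows with the square of the length, which matches the table. The sketch below calibrates a single constant k (standing in for 0.7 times the per-micron resistance times the per-micron capacitance) from the longest entry and regenerates the other delays. The per-unit-length values are not given on the slide, so k is an inferred quantity, and `wire_delay_ps` and `wire_energy` are illustrative names.

```python
# Wire length / delay pairs from the table.
LEN_UM = [10, 20, 30, 40, 60, 80, 120, 160, 240, 320, 480]
DELAY_PS = [0.01, 0.04, 0.09, 0.17, 0.38, 0.67, 1.50, 2.67, 6.01, 10.7, 24.1]

# Assumption: k = 0.7 * r * c in ps/um^2, calibrated from the 480 um entry.
K_PS_PER_UM2 = DELAY_PS[-1] / LEN_UM[-1] ** 2


def wire_delay_ps(length_um):
    """Distributed RC wire delay, quadratic in length."""
    return K_PS_PER_UM2 * length_um ** 2


def wire_energy(c_wire, vdd):
    """Wire energy E_w = C_wire * V^2 (units follow the inputs)."""
    return c_wire * vdd ** 2


for length, tabulated in zip(LEN_UM, DELAY_PS):
    print(f"{length:4d} um: model {wire_delay_ps(length):6.2f} ps, "
          f"table {tabulated:6.2f} ps")
```

With this calibration every modelled delay stays within a few hundredths of a picosecond of the tabulated value.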


Test Setup

[Figure: adder with inputs A0 through A63 and outputs S0 through S63, with Cwire loads (1mm wire)]

H = (Cin + Cwire)/Cin
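To see how H behaves over the input sizes used in the adder comparison, the ratio can be tabulated directly. The sketch below takes Cwire as 160u of gate capacitance, the value quoted for a 1mm wire on the Energy vs. Delay slide, and sweeps Cin from a minimum input up to 50 times that minimum. The minimum input capacitance `C_IN_MIN_U = 1.0` is a hypothetical placeholder, not a value from the slides, and `effort_h` is an illustrative name.

```python
C_WIRE_U = 160.0   # 1mm wire expressed as gate capacitance, in u
C_IN_MIN_U = 1.0   # hypothetical minimum input capacitance, in u


def effort_h(c_in_u, c_wire_u=C_WIRE_U):
    """H = (Cin + Cwire) / Cin as defined on the test-setup slide."""
    if c_in_u <= 0:
        raise ValueError("input capacitance must be positive")
    return (c_in_u + c_wire_u) / c_in_u


for multiple in (1, 2, 5, 10, 25, 50):
    c_in = multiple * C_IN_MIN_U
    print(f"Cin = {c_in:5.1f} u -> H = {effort_h(c_in):7.2f}")
```

H falls toward 1 as Cin grows, so larger input stages see proportionally less of the wire load.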


Energy-Delay Estimates


Adders: Energy

Energy vs. Delay
Cout = 1mm wire (160u gate cap)
For Cin = ~minimum input to 50*minimum input

[Figure: Energy [pJ] (axis 0 to 900) vs. Delay [ps] (axis 0 to 300). Series: HC Dynamic (2-2), KS Dynamic (2-0), HC Dynamic (2-0), KS Dynamic (2-2), KS Static Prefix 2, HC Static Prefix 2, Quaternary Dynamic (2-2), Quaternary Static. Plot labels: Dynamic: KS, HC; Static; Dynamic-Static; QT; KS; HC]


Dynamic-Static Implementation of Carry-Merge Stage

[Figure: carry-merge stage circuits, Regular Domino Implementation and Compound-Domino Implementation. Labeled nodes: VDD, Clk, Delayed Clk, Gi, Gi-1, Gi-2, Gi-3, Pi, Pi-1, Pi-2, Static Gate. Annotation: inverters to be eliminated]


Energy-Delay comparison of 64-bit KS, HC and QT adders

[Figure: Normalized Energy (axis 0 to 3) vs. Normalized Delay (axis 0.9 to 2.1). Series: QT Static, HC Static, KS Static, QT compound-domino, HC compound-domino, KS compound-domino]


Adders: Critical Path Energy

Critical Path Energy vs. Delay (no internal wire energy)
Cout = 1mm wire (160u gate cap)
For Cin = ~minimum input to 50*minimum input

[Figure: Energy [fJ] (axis 0 to 12000) vs. Delay [ps] (axis 0 to 300). Series: HC Dynamic (2-2), KS Dynamic (2-0), HC Dynamic (2-0), KS Dynamic (2-2), KS Static Prefix 2, HC Static Prefix 2, Quaternary (2-2), Quaternary Static (2-2). Plot labels: QT dynamic-static, HC dynamic-static, QT static, KS dynamic-static, HC-dynamic, KS dynamic, HC-static, KS-static]


Intel 32-bit Adder 0.13u 1.2V [VLSI-2002]

Comparison with Intel Measured Data

[Figure: Energy [fJ] (axis 0 to 50) vs. Delay [ps] (axis 0 to 200). Series: Kogge-Stone (2-0), Quaternary (2-2), Intel Kogge-Stone (2-0), Intel Quaternary (2-2). Plot labels: QT, KS, KS estimated, QT estimated]


Energy-Delay comparison of 32-bit QT and KS adders: estimated vs. simulation in 0.10 µm technology

[Figure: Energy [pJ] (axis 0 to 60) vs. Delay [ps] (axis 90 to 160). Series: KS [9], QT [9], KS Estimate, QT Estimate. Plot annotations: 55%, 35%]


Est. Results: All Adders w/o Wires

[Figure: Estimated Energy (J) (axis 0E+00 to 1E-10) vs. Delay (FO4) (axis 7 to 15). Series: sKS, sHC, sQT9, dKS, dHC, dQT9, dQT7, dCLA, dIBM, dLNG]


Est. Results: All Adders w/ Wires

[Figure: Estimated Energy (J) (axis 0.0E+00 to 2.0E-10) vs. Delay (FO4) (axis 8 to 18). Series: sKS_LE, sHC_LE, sQT9_LE, dKS_LE, dHC_LE, dQT9_LE, dQT7_LE, dIBM_LE, dLNG_LE]


Conclusion

• Using realistic measures for comparing various designs leads to better design choices

• Power is as important as speed

• Making comparisons in the Energy-Delay space is necessary:
  – power can always be traded for speed and vice versa

• Wire effects are significant

• Leakage currents ?
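One way to make a comparison in the Energy-Delay space concrete is to keep only the designs that no other design beats on both axes. The sketch below does this for a list of (name, delay, energy) points. The four design points are hypothetical placeholders chosen for illustration and are not taken from any plot in this tutorial; `pareto_front` is an illustrative name.

```python
def pareto_front(points):
    """Return the designs not dominated in both delay and energy.

    points: iterable of (name, delay, energy); lower is better on both axes.
    The result is ordered by increasing delay.
    """
    front = []
    best_energy = float("inf")
    # Walk from fastest to slowest; a design survives only if it uses
    # less energy than every faster (or equally fast) design seen so far.
    for name, delay, energy in sorted(points, key=lambda p: (p[1], p[2])):
        if energy < best_energy:
            front.append((name, delay, energy))
            best_energy = energy
    return front


# Hypothetical normalized (delay, energy) values, for illustration only.
designs = [
    ("A", 1.0, 3.0),
    ("B", 1.5, 2.0),
    ("C", 1.6, 2.5),
    ("D", 2.0, 1.0),
]

for name, delay, energy in pareto_front(designs):
    print(f"{name}: delay {delay}, energy {energy}")
```

In this made-up set, design C is dropped because B is both faster and lower in energy; the remaining designs each represent a different trade of power for speed.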