Lab 3 this week No lab next week Midterm on We Febr 18 2 …bwrcs.eecs.berkeley.edu/Classes/icdesign/ee141_s09/Lectures/Lectu… · No lab next week Midterm on We Febr 18 2-3:30pm

EE141

EECS141 Lecture #7


Lab 3 this week

No lab next week

Midterm on Wed, Feb 18, 2-3:30pm in 203 McLaughlin

Review Session: Tue, Feb 17, 6-7:30pm in 247 Cory



Last lecture

Optimizing complex logic

Today’s lecture

Applying what we learned to memory decoders

Reading (Ch 6.2, 12.1,12.3)


Measure everything in units of $t_{inv}$ (divide by $t_{inv}$):

$p$: intrinsic delay ($k\gamma$), a gate parameter, not a function of the sizing $W$

$LE$: logical effort ($k$), a gate parameter, not a function of the sizing $W$

$f$: electrical effort (effective fanout)

Normalize everything to an inverter:

$LE_{inv} = 1$, $p_{inv} = \gamma$

$t_{p,gate} = t_{inv}\,(p + LE \cdot f)$



OUT = D + A • (B + C)

[Figure: transistor-level schematic of the gate with inputs A, B, C, and D]


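Effective fanout: $EF_i = LE_i \cdot f_i$

Path electrical fanout: $F = C_{out}/C_{in}$

Path logical effort: $\prod LE = LE_1 \cdot LE_2 \cdots LE_N$

Branching effort: $B = b_1 \cdot b_2 \cdots b_N$

Path effort: $PE = \left(\prod LE\right) \cdot F$ (times $B$ when the path branches)

Path delay: $D = \sum d_i = \sum p_i + \sum EF_i$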



Minimum path delay is reached when each stage bears the same effort:

Effective fanouts: $LE_1 f_1 = LE_2 f_2 = \dots = LE_N f_N$


For a given load and a given input capacitance of the first gate:

Find the optimal number of stages and the optimal sizing

The ‘best effective fanout’ is still around 4 (3.6 with $\gamma = 1$)

Remember: we can always add inverters to the end of the chain




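Electrical fanout: $F = 5$

Path logical effort: $\prod LE = 1 \cdot (5/3) \cdot (5/3) \cdot 1 = 25/9$

Path effort: $PE = \left(\prod LE\right) \cdot F = 125/9$

Effective fanout per stage: $EF = (125/9)^{1/4} = 1.93$

Stage 1: $LE = 1$, $f = a$

Stage 2: $LE = 5/3$, $f = b/a$

Stage 3: $LE = 5/3$, $f = c/b$

Stage 4: $LE = 1$, $f = 5/c$

Work backward from the load:

$5/c = 1.93$, so $c = 2.59$

$(5/3) \cdot c/b = 1.93$, so $b = 2.23$

$(5/3) \cdot b/a = 1.93$, so $a = 1.93$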
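The backward sizing in this example can be checked numerically. The sketch below is only an illustration: the function name `size_chain` is made up here, and it assumes an unbranched chain whose first gate has input capacitance 1.

```python
def size_chain(logical_efforts, c_load):
    """Size an unbranched chain for equal stage effort.

    Capacitances are in units of the first gate's input capacitance
    (assumed to be 1). Returns (effective fanout per stage, list of
    input capacitances, one per stage).
    """
    n = len(logical_efforts)
    path_le = 1.0
    for le in logical_efforts:
        path_le *= le
    path_effort = path_le * c_load        # PE = prod(LE) * F, with C_in = 1
    ef = path_effort ** (1.0 / n)         # equal effort on every stage

    # Work backward from the load: C_in(i) = C_out(i) * LE_i / EF
    sizes = []
    c_out = c_load
    for le in reversed(logical_efforts):
        c_in = c_out * le / ef
        sizes.append(c_in)
        c_out = c_in
    sizes.reverse()
    return ef, sizes


ef, sizes = size_chain([1, 5 / 3, 5 / 3, 1], 5)
print(round(ef, 2), [round(s, 2) for s in sizes])
```

The returned list holds the input capacitances of stages 1 to 4, so its last three entries correspond to a, b, and c. The value for b comes out as 2.236, which the slide lists as 2.23.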



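Three ways to build an 8-input AND:

NAND8 + inverter: stage LE = 10/3, 1; path LE = 10/3; P = 8 + 1

NAND4 + NOR2: stage LE = 2, 5/3; path LE = 10/3; P = 4 + 2

NAND2 + NOR2 + NAND2 + inverter: stage LE = 4/3, 5/3, 4/3, 1; path LE = 80/27; P = 2 + 2 + 2 + 1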
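To see how these three options compare, one can evaluate the minimum path delay $N \cdot PE^{1/N} + P$ for a chosen path fanout. This is a rough sketch, not part of the slides: the gate names are inferred from the logical-effort values, P is taken in units of the inverter parasitic (assuming $\gamma = 1$), and the two fanout values are arbitrary illustrations.

```python
# Stage logical efforts and total parasitic delay P for each option.
# Assumption: P is expressed in units of t_inv with gamma = 1.
AND8_OPTIONS = {
    "NAND8 + INV": ([10 / 3, 1], 8 + 1),
    "NAND4 + NOR2": ([2, 5 / 3], 4 + 2),
    "NAND2 + NOR2 + NAND2 + INV": ([4 / 3, 5 / 3, 4 / 3, 1], 2 + 2 + 2 + 1),
}


def min_path_delay(stage_les, parasitic, path_fanout):
    """Minimum delay in units of t_inv: N * PE^(1/N) + P, PE = prod(LE) * F."""
    path_le = 1.0
    for le in stage_les:
        path_le *= le
    n = len(stage_les)
    return n * (path_le * path_fanout) ** (1.0 / n) + parasitic


def best_option(path_fanout):
    """Name of the option with the smallest minimum delay for this fanout."""
    return min(
        AND8_OPTIONS,
        key=lambda name: min_path_delay(*AND8_OPTIONS[name], path_fanout),
    )


for fanout in (1, 2 ** 13):
    print(fanout, best_option(fanout))
```

Under these assumptions the two-stage NAND4 + NOR2 wins for a fanout of 1, while the four-stage option wins once the path fanout is large.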


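Branching Example 1

[Figure: a gate with input capacitance 5 drives two branches with input capacitance 15 each; each branch drives a load of 90]

$LE = 1$

$FO = 90/5 = 18$

$PE = LE \cdot FO = 18$ (wrong!)

$SE_1 = (15+15)/5 = 6$

$SE_2 = 90/15 = 6$

$PE = SE_1 \cdot SE_2 = 36$, not 18!

Introduce new kind of effort to account for branching:

• Branching Effort: $b = \frac{C_{on\text{-}path} + C_{off\text{-}path}}{C_{on\text{-}path}}$

• Path Branching Effort: $B = \prod b_i$

Now we can compute the path effort:

• Path Effort: $PE = \left(\prod LE\right) \cdot FO \cdot B$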
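The numbers of this example can be replayed in a few lines. A minimal sketch, with variable names chosen here for illustration, showing that including the branching effort makes the path effort agree with the product of the stage efforts:

```python
c_in = 5         # input capacitance of the driving gate
c_branch = 15    # input capacitance of each branch
n_branches = 2   # two identical branches
c_load = 90      # load at the end of each branch

path_le = 1
fo = c_load / c_in                                        # 90/5 = 18

# Branching effort: (C_on-path + C_off-path) / C_on-path
c_off_path = (n_branches - 1) * c_branch
b = (c_branch + c_off_path) / c_branch                    # 2

pe = path_le * fo * b                                     # 36, not 18

# Cross-check against the product of the two stage efforts
se1 = n_branches * c_branch / c_in                        # (15+15)/5 = 6
se2 = c_load / c_branch                                   # 90/15 = 6
print(pe, se1 * se2)
```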


Branching Example 2

Select gate sizes y and z to minimize delay from A to B

Logical Effort: $LE = (4/3)^3$

Electrical Effort: $FO = C_{out}/C_{in} = 9$

Branching Effort: $B = 2 \cdot 3 = 6$

Path Effort: $PE = LE \cdot FO \cdot B = 128$

Best Stage Effort: $SE = PE^{1/3} \approx 5$

Delay: $D = 3 \cdot 5 + 3 \cdot 2 = 21$

Work backward for sizes:

$z = \frac{9C \cdot (4/3)}{5} = 2.4C$

$y = \frac{3z \cdot (4/3)}{5} = 1.9C$
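The same arithmetic without rounding the stage effort to 5 looks as follows. This is a sketch under assumptions: the three stages are taken to be 2-input NAND gates (inferred from $LE = (4/3)^3$ and the parasitic term $3 \cdot 2$), and capacitances are in units of C.

```python
n_stages = 3
path_le = (4 / 3) ** 3      # three stages with LE = 4/3 each
fo = 9                      # C_out / C_in
branching = 2 * 3           # first stage branches 2 ways, second 3 ways
pe = path_le * fo * branching

se = pe ** (1 / n_stages)   # best stage effort, about 5
p_stage = 2                 # parasitic delay per stage (assumed NAND2)
delay = n_stages * se + n_stages * p_stage

# Work backward for sizes, in units of C
z = 9 * (4 / 3) / se        # last stage drives the 9C load
y = 3 * z * (4 / 3) / se    # middle stage drives three gates of size z

print(round(pe), round(se, 2), round(delay, 1), round(z, 2), round(y, 2))
```

Because the exact stage effort is slightly above 5, the computed sizes and delay differ a little from the rounded slide values (2.4C, 1.9C, 21).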



Compute the path effort: $PE = \left(\prod LE\right) \cdot B \cdot F$

Find the best number of stages: $N \approx \log_4 PE$

Compute the effective fanout per stage: $EF = PE^{1/N}$

Sketch the path with this number of stages

Working from either end, find the sizes: $C_{in} = C_{out} \cdot LE / EF$

Reference: Sutherland, Sproull, Harris, “Logical Effort,” Morgan Kaufmann, 1999.
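The stage-count and effective-fanout steps of this procedure could be sketched as below. The function name `plan_stages` and the path effort of 1000 are hypothetical, chosen only for illustration; rounding $\log_4 PE$ to the nearest integer is one simple way to pick N.

```python
import math


def plan_stages(path_effort):
    """Pick N ~ log4(PE), rounded to at least one stage, and EF = PE^(1/N)."""
    n = max(1, round(math.log(path_effort) / math.log(4)))
    ef = path_effort ** (1.0 / n)
    return n, ef


n, ef = plan_stages(1000.0)   # hypothetical path effort
print(n, round(ef, 2))
```

With the stage count fixed, the effective fanout per stage lands close to 4, as expected.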




Intel 45nm Core 2


Read-Write Memory

Random Access: SRAM, DRAM

Non-Random Access: FIFO, LIFO, Shift Register, CAM

Non-Volatile Read-Write Memory: EPROM, E²PROM, FLASH

Read-Only Memory: Mask-Programmed, Programmable (PROM)


STATIC (SRAM)

Data stored as long as supply is applied

Larger (6 transistors/cell)

Fast

Differential (usually)

DYNAMIC (DRAM)

Periodic refresh required

Smaller (1-3 transistors/cell)

Slower

Single Ended


Conceptual: linear array

Each box holds some data

But this does not lead to a nice layout shape

Too long and skinny

Create a 2-D array

Decode Row and Column address to get data




[Figure: N-word memory, M bits per word, with one select signal (S0 ... SN-1) per word of storage cells and an M-bit input-output bus]

Intuitive architecture for N x M memory

Too many select signals: N words == N select signals

[Figure: the same array with a decoder generating the select signals from address inputs A0 ... AK-1]

Decoder reduces the number of select signals: $K = \log_2 N$
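The relation between the number of words and the number of address bits can be illustrated with a small behavioral model. This is only a sketch: `address_bits` and `decode` are illustrative names, and the decoder is modeled as a one-hot list rather than as gates.

```python
def address_bits(n_words):
    """Number of address bits K needed to select one of n_words words."""
    return max(n_words - 1, 0).bit_length()


def decode(address, k):
    """Behavioral K-to-2^K decoder: exactly one select line is 1."""
    return [1 if i == address else 0 for i in range(2 ** k)]


k = address_bits(256)
print(k, decode(2, 2))
```

For 256 words, 8 address inputs replace 256 externally driven select signals.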



Collection of $2^M$ complex logic gates

Organized in regular and dense fashion

(N)AND Decoder

NOR Decoder


Look at decoder for 256x256 memory block (8KBytes)



Goal: Build fastest possible decoder with static CMOS logic

What we know

Basically need 256 AND gates, each one of them drives one word line

Each AND gate has N=8 inputs


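Each word line has 256 cells connected to it

Total output load is $256 \cdot C_{cell} + C_{wire}$

Assume that the decoder input capacitance is $C_{address} = 4 \cdot C_{cell}$

Each address drives $2^8/2$ AND gates: A0 drives 1/2 of the gates, A0_b the other 1/2 of the gates

Neglecting $C_{wire}$, the fan-out on each one of the 16 address wires is:

$B \cdot F = \frac{2^8}{2} \cdot \frac{256 \cdot C_{cell}}{4 \cdot C_{cell}} = 128 \cdot 64 = 2^{13}$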
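The fan-out seen by each address wire follows directly from the numbers on this slide. A short sketch, neglecting the wire capacitance and using variable names chosen here for illustration:

```python
import math

n_address_bits = 8
cells_per_word_line = 256
c_cell = 1.0                      # unit capacitance
c_address = 4 * c_cell            # assumed decoder input capacitance

# Each true or complement address wire drives half of the AND gates
branching = 2 ** n_address_bits // 2                       # 128
fanout = cells_per_word_line * c_cell / c_address          # 64

bf = branching * fanout                                    # 8192 = 2**13
stages = math.log2(bf) / 2                                 # log4(BF) = 6.5
print(branching, fanout, bf, stages)
```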



$F \cdot B$ of at least $2^{13}$ means that we will want to use more than $\log_4(2^{13}) = 6.5$ stages to implement the AND8

Need many stages anyways

So what is the best way to implement the AND gate?

Will see next that it’s the one with the most stages and least complicated gates


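Options for the 8-input AND:

NAND8 + inverter: stage LE = 10/3, 1; path LE = 10/3; P = 8 + 1

NAND4 + NOR2: stage LE = 2, 5/3; path LE = 10/3; P = 4 + 2

NAND2 + NOR2 + NAND2 + inverter: stage LE = 4/3, 5/3, 4/3, 1; path LE = 80/27; P = 2 + 2 + 2 + 1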



Using 2-input NAND gates, the 8-input gate takes 6 stages

Total LE is $(4/3)^3 \approx 2.4$

So PE is $2.4 \cdot 2^{13}$, giving an optimal N of ~7.1
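The optimal stage count quoted here can be reproduced as follows. A minimal sketch that uses the unrounded logical effort $(4/3)^3$ instead of 2.4:

```python
import math

path_le = (4 / 3) ** 3          # three NAND2 stages on the path, about 2.37
bf = 2 ** 13                    # branching times electrical fanout
pe = path_le * bf               # path effort

n_opt = math.log2(pe) / 2       # log4(PE)
print(round(path_le, 2), round(n_opt, 1))
```

The result stays near 7.1, so the 6-stage NAND2 tree is already close to the optimal depth.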


256 8-input AND gates

Each built out of a tree of NAND gates and inverters

Issue:

Every address line has to drive 128 gates (and wire) right away

Can’t build gates small enough: forces us to add buffers just to drive address inputs




Use a single gate for each of the shared terms

E.g., from A0, A0_b, A1, and A1_b, generate four signals: A0·A1, A0_b·A1, A0·A1_b, A0_b·A1_b

In other words, we are decoding smaller groups of address bits first

And using the “predecoded” outputs to do the rest of the decoding
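The predecoding idea can be modeled at the logic level. The sketch below is an illustration with made-up function names, using a 4-bit address and two 2-bit predecoders rather than the 8-bit decoder of the lecture; each word line then needs only a 2-input AND of predecoded signals.

```python
def predecode2(a0, a1):
    """Generate the four shared terms of a 2-bit address group.

    The list index equals a0 + 2*a1, so exactly one output is 1.
    """
    a0_b, a1_b = 1 - a0, 1 - a1
    return [a0_b & a1_b, a0 & a1_b, a0_b & a1, a0 & a1]


def word_lines(address):
    """16 word lines from a 4-bit address via two 2-bit predecoders."""
    bits = [(address >> i) & 1 for i in range(4)]
    low = predecode2(bits[0], bits[1])    # decodes address bits 0-1
    high = predecode2(bits[2], bits[3])   # decodes address bits 2-3
    # Final stage: one 2-input AND per word line
    return [low[w & 3] & high[w >> 2] for w in range(16)]


print(word_lines(6))
```

Each predecoded signal is generated once and shared by several word lines, instead of every word line rebuilding the same product term from raw address bits.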



Two options for predecoding:


Larger predecode usually better:

More stages before the long wires: decreases their effect on the circuit

Fewer long wires switch: lower power

Easier to fit 2-input gate into cell pitch



Given decoder structure, input capacitance, final load

Can size the entire chain using LE for minimum delay

Is this the “best” we can do in terms of power too?

Not necessarily: we probably want to reduce sizes (especially on the final decoder inputs)

Is there anything else we can do to improve energy even further?
