EE141 / EECS141 Lecture #7

Lab 3 this week No lab next week Midterm on We Febr 18 2 …bwrcs.eecs.berkeley.edu/Classes/icdesign/ee141_s09/Lectures/Lectu… · No lab next week Midterm on We Febr 18 2-3:30pm



EE141 EECS141 1 Lecture #7

EE141 EECS141 2 Lecture #7

Lab 3 this week

No lab next week

Midterm on Wed, Feb 18, 2-3:30pm in 203 McLaughlin

Review Session: Tue, Feb 17, 6-7:30pm in 247 Cory


EE141 EECS141 3 Lecture #7

Last lecture

Optimizing complex logic

Today’s lecture

Applying what we learned on memory decoders

Reading (Ch 6.2, 12.1, 12.3)

EE141 EECS141 4 Lecture #7

Measure everything in units of $t_{inv}$ (divide by $t_{inv}$):

p: intrinsic delay ($k\gamma$), gate parameter $\neq f(W)$

LE: logical effort ($k$), gate parameter $\neq f(W)$

f: electrical effort (effective fanout)

Normalize everything to an inverter:

$LE_{inv} = 1$, $p_{inv} = \gamma$

$t_{p,gate} = t_{inv}\,(p + LE \cdot f)$
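The normalized delay model above can be evaluated directly. The following is a minimal sketch, not part of the slides: the function name is illustrative, $\gamma = 1$ is an assumed value, and the NAND2 numbers (LE = 4/3, p = 2) are the ones used later in this lecture.

```python
def gate_delay(p, le, f, t_inv=1.0):
    """Delay of one gate in the logical-effort model: t_inv * (p + LE * f)."""
    return t_inv * (p + le * f)


GAMMA = 1.0  # assumed value of gamma, so p_inv = 1

# Inverter and 2-input NAND, each with an electrical effort f = 4.
inverter_fo4 = gate_delay(p=GAMMA, le=1.0, f=4.0)
nand2_fo4 = gate_delay(p=2 * GAMMA, le=4 / 3, f=4.0)
print(inverter_fo4, nand2_fo4)
```

Because delays are expressed in units of $t_{inv}$, the default `t_inv=1.0` gives normalized results; passing a value in picoseconds would give an absolute estimate.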


EE141 EECS141 5 Lecture #7

OUT = D + A • (B + C)

[Figure: transistor-level schematic of the gate, with inputs A, B, C, and D]

EE141 EECS141 6 Lecture #7

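Effective fanout: $EF_i = LE_i f_i$

Path electrical fanout: $F = C_{out}/C_{in}$

Path logical effort: $\prod LE_i = LE_1 LE_2 \cdots LE_N$

Branching effort: $B = b_1 b_2 \cdots b_N$

Path effort: $PE = \left(\prod LE_i\right) \cdot B \cdot F$

Path delay: $D = \sum d_i = \sum p_i + \sum EF_i$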


EE141 EECS141 7 Lecture #7

Minimum path delay is achieved when each stage bears the same effort:

Effective fanouts: $LE_1 f_1 = LE_2 f_2 = \dots = LE_N f_N$
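If every stage bears the same effort, each effective fanout equals the N-th root of the path effort, and the path delay follows from the definitions above. A minimal sketch (the function name and the three-inverter example are illustrative assumptions, with $p_{inv} = 1$):

```python
import math


def min_path_delay(stage_les, stage_ps, fanout, branching=1.0):
    """Return (stage effort, path delay) when every stage bears equal effort.

    Delays are in units of t_inv. The number of stages is len(stage_les).
    """
    n = len(stage_les)
    path_effort = math.prod(stage_les) * branching * fanout
    stage_effort = path_effort ** (1.0 / n)
    return stage_effort, sum(stage_ps) + n * stage_effort


# Three inverters (LE = 1, p = 1 assumed) driving a load 64x the input capacitance.
ef, delay = min_path_delay([1, 1, 1], [1, 1, 1], fanout=64)
print(ef, delay)
```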

EE141 EECS141 8 Lecture #7

For a given load and a given input capacitance of the first gate:

Find optimal number of stages and optimal sizing

Remember: we can always add inverters to the end of the chain

The ‘best effective fanout’ is still around 4 (3.6 with $\gamma = 1$)


EE141 EECS141 9 Lecture #7

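Example: find the sizes a, b, c of a four-stage chain driving a load of 5

Stage 1 (size 1): LE = 1, f = a

Stage 2 (size a): LE = 5/3, f = b/a

Stage 3 (size b): LE = 5/3, f = c/b

Stage 4 (size c): LE = 1, f = 5/c

To find: electrical fanout F, $\prod LE_i$, PE, EF/stage, a, b, c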

EE141 EECS141 10 Lecture #7

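Electrical fanout, $F = 5$

$\prod LE_i = 1 \cdot (5/3) \cdot (5/3) \cdot 1 = 25/9$

$PE = \left(\prod LE_i\right) \cdot F = 125/9$

EF/stage $= (125/9)^{1/4} = 1.93$

Stage 1 (size 1): LE = 1, f = a

Stage 2 (size a): LE = 5/3, f = b/a

Stage 3 (size b): LE = 5/3, f = c/b

Stage 4 (size c): LE = 1, f = 5/c

Work backward from the load:

$5/c = 1.93$, so $c = 2.59$

$(5/3) \cdot c/b = 1.93$, so $b = 2.23$

$(5/3) \cdot b/a = 1.93$, so $a = 1.93$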
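The numbers in this example can be checked with a few lines of arithmetic. This is a sketch under the example's own values (LE = 1, 5/3, 5/3, 1 and a load of 5); the variable names are illustrative.

```python
import math

les = [1, 5 / 3, 5 / 3, 1]   # logical effort of each stage
load = 5.0                    # load, in units of the first gate's input capacitance

path_effort = math.prod(les) * load          # (25/9) * 5 = 125/9
ef = path_effort ** (1 / len(les))           # equal effective fanout per stage

# Work backward from the load, one stage at a time.
c = load / ef                 # from 5/c = EF
b = (5 / 3) * c / ef          # from (5/3) * c/b = EF
a = (5 / 3) * b / ef          # from (5/3) * b/a = EF
first = 1 * a / ef            # from a/1 = EF; should come back to 1

print(round(ef, 2), round(a, 2), round(b, 2), round(c, 2))
```

The last line of the back-substitution is a consistency check: the first stage's computed size returns to 1, the size assumed at the input.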


EE141 EECS141 11 Lecture #7

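Three ways to build an 8-input AND (per-stage LE, path logical effort, parasitic delay P):

(a) NAND8 + INV: LE = 10/3, 1; $\prod LE_i = 10/3$; P = 8 + 1

(b) NAND4 + NOR2: LE = 2, 5/3; $\prod LE_i = 10/3$; P = 4 + 2

(c) NAND2 + NOR2 + NAND2 + INV: LE = 4/3, 5/3, 4/3, 1; $\prod LE_i = 80/27$; P = 2 + 2 + 2 + 1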

EE141 EECS141 12 Lecture #7

Branching effort: $b = \dfrac{C_{on\text{-}path} + C_{off\text{-}path}}{C_{on\text{-}path}}$


EE141 EECS141 13 Lecture #7

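Branching Example 1

[Figure: a gate with input capacitance 5 drives two gates with input capacitance 15 each; each of those drives a load of 90]

$\prod LE_i = 1$

$FO = 90/5 = 18$

$PE = \prod LE_i \cdot FO = 18$ (wrong!)

$SE_1 = (15 + 15)/5 = 6$

$SE_2 = 90/15 = 6$

$PE = SE_1 \cdot SE_2 = 36$, not 18!

Introduce new kind of effort to account for branching:

• Branching Effort: $b = \dfrac{C_{on\text{-}path} + C_{off\text{-}path}}{C_{on\text{-}path}}$

• Path Branching Effort: $B = \prod b_i$

Now we can compute the path effort:

• Path Effort: $PE = \prod LE_i \cdot FO \cdot B$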
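The discrepancy in Branching Example 1 can be reproduced numerically. A small sketch using the capacitances from the slide (5, 15, 15, 90); the variable names are illustrative:

```python
c_in = 5.0
c_branch = [15.0, 15.0]   # on-path gate plus one identical off-path gate
c_load = 90.0             # load on each branch

le_path = 1.0                             # product of the stage logical efforts
fo = c_load / c_in                        # path electrical fanout
b = sum(c_branch) / c_branch[0]           # (C_on-path + C_off-path) / C_on-path

se1 = sum(c_branch) / c_in                # stage effort of the first stage
se2 = c_load / c_branch[0]                # stage effort of the second stage

pe_naive = le_path * fo                   # ignores the off-path branch
pe = le_path * fo * b                     # includes the branching effort

print(pe_naive, pe, se1 * se2)
```

Only the version that includes the branching effort agrees with the product of the two stage efforts.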

EE141 EECS141 14 Lecture #7

Branching Example 2

Select gate sizes y and z to minimize delay from A to B

Logical Effort: $\prod LE_i = (4/3)^3$

Electrical Effort: $FO = C_{out}/C_{in} = 9$

Branching Effort: $B = 2 \cdot 3 = 6$

Path Effort: $PE = \prod LE_i \cdot FO \cdot B = 128$

Best Stage Effort: $SE = PE^{1/3} \approx 5$

Delay: $D = 3 \cdot 5 + 3 \cdot 2 = 21$

Work backward for sizes:

$z = 9C \cdot (4/3) / 5 = 2.4C$

$y = 3z \cdot (4/3) / 5 = 1.9C$
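The same calculation for Branching Example 2, as a sketch. It assumes what the slide's numbers imply: three 2-input NAND stages (LE = 4/3, p = 2 each), an input capacitance of C, a load of 9C, and branching factors of 2 and 3. Capacitances are in units of C.

```python
le = 4 / 3            # logical effort of each 2-input NAND
p = 2                 # parasitic delay of each 2-input NAND
n_stages = 3

fo = 9.0              # C_out / C_in
branching = 2 * 3     # first stage drives 2 gates, second stage drives 3
pe = le ** n_stages * fo * branching
se = pe ** (1 / n_stages)          # best stage effort

# Work backward from the 9C load.
z = 9.0 * le / se                  # input capacitance of the last stage
y = 3 * z * le / se                # second stage sees three copies of z
c_first = 2 * y * le / se          # first stage sees two copies of y

delay = n_stages * se + n_stages * p
print(round(pe), round(se, 2), round(z, 2), round(y, 2), round(delay, 1))
```

The slide rounds the stage effort to 5, which gives z = 2.4C and y = 1.9C; using the unrounded cube root gives slightly smaller sizes, and `c_first` comes back to the assumed input capacitance of 1 (that is, C).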


EE141 EECS141 15 Lecture #7

Compute the path effort: $PE = \left(\prod LE_i\right) \cdot B \cdot F$

Find the best number of stages: $N \approx \log_4 PE$

Compute the effective fanout/stage: $EF = PE^{1/N}$

Sketch the path with this number of stages

Working from either end, find sizes: $C_{in} = C_{out} \cdot LE / EF$

Reference: Sutherland, Sproull, Harris, “Logical Effort,” Morgan Kaufmann, 1999.
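The procedure above can be sketched as one function. This is an illustrative implementation, not code from the course: the function name and argument layout are assumptions, the number of stages is taken as already chosen (the length of `les`), and `branches[i]` is the number of identical copies of the next stage that stage `i` drives.

```python
import math


def size_path(les, branches, c_in, c_out):
    """Size a path with a fixed number of stages for minimum delay.

    Returns (log4 of the path effort, effective fanout per stage,
    list of input capacitances from the first stage to the last).
    """
    pe = math.prod(les) * math.prod(branches) * (c_out / c_in)
    n_hat = math.log(pe, 4)              # suggested number of stages
    ef = pe ** (1 / len(les))            # effective fanout per stage

    caps = []
    load = c_out
    # Work from the output back: C_in = (total load) * LE / EF.
    for le, b in zip(reversed(les), reversed(branches)):
        load = load * b * le / ef
        caps.append(load)
    caps.reverse()
    return n_hat, ef, caps


# Three 2-input NANDs with branching 2 and 3, input C = 1, load 9C.
n_hat, ef, caps = size_path([4 / 3] * 3, [2, 3, 1], c_in=1.0, c_out=9.0)
print(n_hat, ef, caps)
```

If `n_hat` is far from `len(les)`, the path has too few or too many stages for the effort it carries; the slides' remedy is to add inverters or restructure the logic before sizing.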

EE141 EECS141 16 Lecture #7



EE141 EECS141 17 Lecture #7

Intel 45nm Core 2

EE141 EECS141 18 Lecture #7

Semiconductor memory classification:

Read-Write Memory
    Random Access: SRAM, DRAM
    Non-Random Access: FIFO, LIFO, Shift Register, CAM

Non-Volatile Read-Write Memory: EPROM, E²PROM, FLASH

Read-Only Memory: Mask-Programmed, Programmable (PROM)


EE141 EECS141 19 Lecture #7


STATIC (SRAM)
    Data stored as long as supply is applied
    Larger (6 transistors/cell)
    Fast
    Differential (usually)

DYNAMIC (DRAM)
    Periodic refresh required
    Smaller (1-3 transistors/cell)
    Slower
    Single ended

EE141 EECS141 20 Lecture #7

Conceptual: linear array

Each box holds some data

But this does not lead to a nice layout shape: too long and skinny

Create a 2-D array

Decode row and column address to get data


EE141 EECS141 21 Lecture #7

EE141 EECS141 22 Lecture #7

Intuitive architecture for N x M memory: N words (Word 0 ... Word N-1) of M bits each, with one select signal per word ($S_0 \dots S_{N-1}$)

Too many select signals: N words == N select signals

Decoder reduces the number of select signals: K address bits ($A_0 \dots A_{K-1}$), with $K = \log_2 N$

[Figure: two arrays of N words by M bits of storage cells with M-bit input-output; one driven directly by select signals $S_0 \dots S_{N-1}$, the other through a decoder driven by $A_0 \dots A_{K-1}$]
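A behavioral view of what the decoder does: K address bits in, N = 2^K one-hot select signals out. This is a minimal functional sketch (the function name is illustrative, and it models behavior only, not the gate-level structure discussed next).

```python
def decode(address, k):
    """Return N = 2**k select signals; exactly one is 1 for a valid address."""
    n = 1 << k
    if not 0 <= address < n:
        raise ValueError("address does not fit in k bits")
    return [1 if word == address else 0 for word in range(n)]


# K = 3 address bits select one of N = 8 words.
selects = decode(5, k=3)
print(selects)
```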


EE141 EECS141 23 Lecture #7

Collection of $2^M$ complex logic gates, organized in regular and dense fashion

(N)AND Decoder

NOR Decoder

EE141 EECS141 24 Lecture #7

Look at decoder for 256x256 memory block (8 KBytes)


EE141 EECS141 25 Lecture #7

Goal: Build fastest possible decoder with static CMOS logic

What we know:

Basically need 256 AND gates, each one of them drives one word line

N = 8

EE141 EECS141 26 Lecture #7

Each word line has 256 cells connected to it

Total output load is $256 \cdot C_{cell} + C_{wire}$

Assume that decoder input capacitance is $C_{address} = 4 \cdot C_{cell}$

Each address drives $2^8/2$ AND gates: A0 drives ½ of the gates, A0_b the other ½ of the gates

Neglecting $C_{wire}$, the fan-out on each one of the 16 address wires is: $B \cdot F = \dfrac{2^8}{2} \cdot \dfrac{256\,C_{cell}}{4\,C_{cell}} = 2^{13}$
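The fan-out seen by one address wire follows from the numbers on this slide: each wire branches to half of the 256 AND gates, and each gate ultimately drives 256 cells from an input capacitance of 4 cells. A short check, with $C_{wire}$ neglected and capacitances in units of $C_{cell}$ (variable names are illustrative):

```python
n_addr_bits = 8
cells_per_word_line = 256
c_cell = 1.0
c_address = 4 * c_cell                      # assumed decoder input capacitance

branching = 2 ** n_addr_bits // 2           # each address wire drives 128 gates
electrical = cells_per_word_line * c_cell / c_address   # 256 / 4 = 64

bf = branching * electrical
print(branching, electrical, bf)
```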


EE141 EECS141 27 Lecture #7

$F \cdot B$ of at least $2^{13}$ means that we will want to use more than $\log_4(2^{13}) = 6.5$ stages to implement the AND8

Need many stages anyway

So what is the best way to implement the AND gate?

Will see next that it’s the one with the most stages and least complicated gates

EE141 EECS141 28 Lecture #7

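Three ways to build an 8-input AND (per-stage LE, path logical effort, parasitic delay P):

(a) NAND8 + INV: LE = 10/3, 1; $\prod LE_i = 10/3$; P = 8 + 1

(b) NAND4 + NOR2: LE = 2, 5/3; $\prod LE_i = 10/3$; P = 4 + 2

(c) NAND2 + NOR2 + NAND2 + INV: LE = 4/3, 5/3, 4/3, 1; $\prod LE_i = 80/27$; P = 2 + 2 + 2 + 1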
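To see why the option with more stages and simpler gates wins for this decoder, the three structures can be compared at the fan-out found earlier. This is a rough sketch under stated assumptions: $F \cdot B = 2^{13}$, equal stage effort within each option, no extra inverters appended, and the LE and P values listed above.

```python
import math
from fractions import Fraction as Fr

BF = 2 ** 13   # fan-out times branching for one address wire

# name -> (per-stage logical efforts, per-stage parasitic delays)
options = {
    "NAND8 + INV": ([Fr(10, 3), Fr(1)], [8, 1]),
    "NAND4 + NOR2": ([Fr(2), Fr(5, 3)], [4, 2]),
    "NAND2 + NOR2 + NAND2 + INV": ([Fr(4, 3), Fr(5, 3), Fr(4, 3), Fr(1)], [2, 2, 2, 1]),
}

results = {}
for name, (les, ps) in options.items():
    le_path = math.prod(les)                 # exact path logical effort
    n = len(les)
    stage_effort = float(le_path * BF) ** (1 / n)
    delay = n * stage_effort + sum(ps)       # in units of t_inv
    results[name] = (le_path, sum(ps), delay)
    print(f"{name}: LE = {le_path}, P = {sum(ps)}, delay = {delay:.1f}")
```

With only two stages, each stage must bear an effort of well over 100, so the two-stage options are far slower than the four-stage one even though their logical efforts are similar.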


EE141 EECS141 29 Lecture #7

Using 2-input NAND gates, the 8-input gate takes 6 stages

Total LE is $(4/3)^3 \approx 2.4$

So PE is $2.4 \cdot 2^{13}$: optimal N of ~7.1
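The stage-count estimates can be reproduced with $N \approx \log_4 PE$. A short check, using $F \cdot B = 2^{13}$ and three levels of 2-input NANDs (variable names are illustrative):

```python
import math

bf = 2 ** 13
n_min = math.log(bf, 4)          # stages needed for the fan-out alone

le_tree = (4 / 3) ** 3           # three NAND2 levels; the inverters contribute LE = 1
pe = le_tree * bf
n_opt = math.log(pe, 4)          # stages suggested once logical effort is included

print(round(n_min, 2), round(le_tree, 2), round(n_opt, 2))
```

Since the NAND2 tree already has 6 stages, the estimate of roughly 7 means the tree is close to the right depth and needs at most one more stage.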

EE141 EECS141 30 Lecture #7

256 8-input AND gates

Each built out of tree of NAND gates and inverters

Issue:

Every address line has to drive 128 gates (and wire) right away

Can’t build gates small enough: forces us to add buffers just to drive address inputs


EE141 EECS141 31 Lecture #7

EE141 EECS141 32 Lecture #7

Use a single gate for each of the shared terms

E.g., from A0, A0_b, A1, and A1_b, generate four signals: A0·A1, A0_b·A1, A0·A1_b, A0_b·A1_b

In other words, we are decoding smaller groups of address bits first

And using the “predecoded” outputs to do the rest of the decoding
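Predecoding can be modeled behaviorally. The sketch below assumes, for illustration only, that the 8 address bits are split into four 2-bit groups; the function names are hypothetical and the model ignores sizing and wiring.

```python
def predecode(bits):
    """One-hot outputs for a small group of address bits (bits[0] is the LSB).

    Output i is 1 when the group encodes the value i, e.g. for two bits the
    four outputs are A0_b.A1_b, A0.A1_b, A0_b.A1, A0.A1.
    """
    value = sum(bit << i for i, bit in enumerate(bits))
    return [int(i == value) for i in range(1 << len(bits))]


def decode8(address):
    """256 word lines from four 2-bit predecoders plus one 4-input AND per word line."""
    bits = [(address >> i) & 1 for i in range(8)]
    groups = [predecode(bits[i:i + 2]) for i in range(0, 8, 2)]
    word_lines = []
    for word in range(256):
        # Each word line ANDs one predecoded signal from every group.
        picks = [groups[g][(word >> (2 * g)) & 3] for g in range(4)]
        word_lines.append(int(all(picks)))
    return word_lines


word_lines = decode8(37)
print(word_lines.index(1), sum(word_lines))
```

In this grouping each final gate has 4 inputs instead of 8, and only 16 predecoded wires (4 per group) run along the array, of which exactly one per group is high for any address.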


EE141 EECS141 33 Lecture #7

EE141 EECS141 34 Lecture #7


EE141 EECS141 35 Lecture #7

Two options for predecoding:

EE141 EECS141 36 Lecture #7

Larger predecode usually better:

More stages before the long wires: decreases their effect on the circuit

Fewer long wires switch: lower power

Easier to fit 2-input gate into cell pitch


EE141 EECS141 37 Lecture #7

Given decoder structure, input capacitance, final load

Can size the entire chain using LE for minimum delay

Is this the “best” we can do in terms of power too?

Not necessarily: we probably want to reduce sizes (especially on the final decoder inputs)

Is there anything else we can do to improve energy even further?