EE141 EECS141 1 Lecture #7
Lab 3 this week
No lab next week
Midterm on Wed, Feb 18, 2-3:30pm in 203 McLaughlin
Review Session: Tue, Feb 17, 6-7:30pm in 247 Cory
Last lecture
Optimizing complex logic
Today’s lecture
Applying what we learned to memory decoders
Reading (Ch 6.2, 12.1, 12.3)
Measure everything in units of $t_{inv}$ (divide by $t_{inv}$):
p - intrinsic delay ($k\gamma$) - gate parameter, $\neq f(W)$
LE - logical effort ($k$) - gate parameter, $\neq f(W)$
f - electrical effort (effective fanout)
Normalize everything to an inverter:
$LE_{inv} = 1$, $p_{inv} = \gamma$
$t_{p,gate} = t_{inv}(p + LE \cdot f)$
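As a quick numeric check of the gate delay formula, here is a minimal sketch. The function name and the example gates are illustrative; the NAND2 values (p = 2, LE = 4/3) and the choice of p = 1 for the inverter (gamma = 1) are assumptions taken from the usual textbook tables.

```python
def gate_delay(p, le, f, t_inv=1.0):
    """Delay of a single gate: t_inv * (p + LE * f)."""
    return t_inv * (p + le * f)

# Inverter with an electrical effort of 4, assuming p_inv = gamma = 1.
fo4 = gate_delay(p=1.0, le=1.0, f=4.0)

# 2-input NAND with the same electrical effort (assumed p = 2, LE = 4/3).
nand2 = gate_delay(p=2.0, le=4.0 / 3.0, f=4.0)

print(fo4, nand2)
```

Both results are in units of t_inv, so they can be compared directly across gate types.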
OUT = D + A • (B + C)
[Figure: schematic of the gate with inputs A, B, C and D]
Effective fanout: $EF_i = LE_i f_i$
Path electrical fanout: $F = C_{out}/C_{in}$
Path logical effort: $\Pi LE = LE_1 LE_2 \dots LE_N$
Branching effort: $B = b_1 b_2 \dots b_N$
Path effort: $PE = \Pi LE \cdot B \cdot F$
Path delay: $D = \sum d_i = \sum p_i + \sum EF_i$
When each stage bears the same effort, the path delay is minimum:
Effective fanouts: $LE_1 f_1 = LE_2 f_2 = \dots = LE_N f_N$
For a given load and a given input capacitance of the first gate,
find the optimal number of stages and the optimal sizing
Remember: we can always add inverters to the end of the chain
The ‘best effective fanout’ is still around 4 (3.6 with $\gamma = 1$)
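The 3.6 figure can be reproduced numerically. The sketch below assumes the standard textbook condition for the optimum stage effort of an inverter chain, f = exp(1 + gamma/f); that equation is not on the slide, and the function name and iteration count are illustrative.

```python
import math

def best_stage_effort(gamma, iterations=100):
    """Solve f = exp(1 + gamma / f) by fixed-point iteration."""
    f = math.e  # exact solution when gamma = 0
    for _ in range(iterations):
        f = math.exp(1.0 + gamma / f)
    return f

print(round(best_stage_effort(0.0), 2))  # no self-loading
print(round(best_stage_effort(1.0), 2))  # gamma = 1
```

The delay curve is flat near the optimum, which is why a round number such as 4 is commonly used in practice.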
[Figure: four gates in series driving a load of 5. The first gate has size 1, the next three have sizes a, b and c. Per stage: LE = 1, f = a; LE = 5/3, f = b/a; LE = 5/3, f = c/b; LE = 1, f = 5/c]
Electrical fanout, $F = 5$
$\Pi LE = 1 \cdot (5/3) \cdot (5/3) \cdot 1 = 25/9$
$PE = (\Pi LE) \cdot F = 125/9$
$EF/\text{stage} = (125/9)^{1/4} = 1.93$
Work backward from the load:
$5/c = 1.93$, so $c = 2.59$
$(5/3) \cdot c/b = 1.93$, so $b = 2.23$
$(5/3) \cdot b/a = 1.93$, so $a = 1.93$
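The sizing example above (load of 5, logical efforts 1, 5/3, 5/3, 1) can be checked with a short script. This is a sketch under the assumption that a, b and c are input capacitances measured in units of the first gate's input capacitance; the function name is illustrative, and the script works forward from the input rather than backward from the load.

```python
def size_chain(logical_efforts, c_in, c_load):
    """Equalize the stage effort along an unbranched chain.

    Returns the effort per stage and the input capacitance of every gate.
    """
    n = len(logical_efforts)
    path_le = 1.0
    for le in logical_efforts:
        path_le *= le
    path_effort = path_le * (c_load / c_in)
    ef = path_effort ** (1.0 / n)

    sizes = [c_in]
    for le in logical_efforts[:-1]:
        # LE_i * C_next / C_i = EF, so C_next = EF * C_i / LE_i
        sizes.append(ef * sizes[-1] / le)
    return ef, sizes

ef, sizes = size_chain([1.0, 5.0 / 3.0, 5.0 / 3.0, 1.0], c_in=1.0, c_load=5.0)
print(round(ef, 2), [round(s, 2) for s in sizes])
```

The last gate's effort, LE times load over its size, comes out equal to the same stage effort, which confirms the sizes are consistent.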
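[Figure: three implementations of an 8-input AND]
NAND8 + inverter: LE = 10/3, 1; $\Pi LE = 10/3$; $P = 8 + 1$
NAND4 + NOR2: LE = 2, 5/3; $\Pi LE = 10/3$; $P = 4 + 2$
NAND2 + NOR2 + NAND2 + inverter: LE = 4/3, 5/3, 4/3, 1; $\Pi LE = 80/27$; $P = 2 + 2 + 2 + 1$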
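Which of the three structures is fastest depends on the load. The sketch below compares them using D = N * (PE)^(1/N) + P with equal effort per stage. The gate names are labels inferred from the LE and P values on the slide, and the two fanout values are arbitrary examples.

```python
def min_path_delay(logical_efforts, parasitics, fanout):
    """Minimum delay of a path when every stage bears the same effort."""
    n = len(logical_efforts)
    path_le = 1.0
    for le in logical_efforts:
        path_le *= le
    stage_effort = (path_le * fanout) ** (1.0 / n)
    return n * stage_effort + sum(parasitics)

# (logical efforts, parasitic delays) per stage, as listed on the slide.
AND8_OPTIONS = {
    "NAND8 + INV": ([10 / 3, 1], [8, 1]),
    "NAND4 + NOR2": ([2, 5 / 3], [4, 2]),
    "NAND2 + NOR2 + NAND2 + INV": ([4 / 3, 5 / 3, 4 / 3, 1], [2, 2, 2, 1]),
}

def fastest_option(fanout):
    """Name of the structure with the smallest minimum delay."""
    return min(
        AND8_OPTIONS,
        key=lambda name: min_path_delay(*AND8_OPTIONS[name], fanout),
    )

for fanout in (1, 8192):
    print(fanout, fastest_option(fanout))
```

With a small load the two-stage NAND4 + NOR2 wins on parasitic delay; with a large load the four-stage version wins because the effort is spread over more stages.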
Branching Example 1
[Figure: a gate with input capacitance 5 drives two branches; each branch is a gate with input capacitance 15 driving a load of 90]
$\Pi LE = 1$
$FO = 90/5 = 18$
$PE = \Pi LE \cdot FO = 18$ (wrong!)
$SE_1 = (15+15)/5 = 6$
$SE_2 = 90/15 = 6$
$PE = SE_1 \cdot SE_2 = 36$, not 18!
Introduce new kind of effort to account for branching:
• Branching Effort: $b = (C_{on-path} + C_{off-path}) / C_{on-path}$
• Path Branching Effort: $B = \prod b_i$
Now we can compute the path effort:
• Path Effort: $PE = \Pi LE \cdot FO \cdot B$
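The numbers in Branching Example 1 (capacitances 5, 15, 15, 90, 90) can be verified directly. This is a minimal sketch with an illustrative helper name; it takes LE = 1 for every gate on the path, as on the slide.

```python
def branching_effort(c_on_path, c_off_path):
    """b = (C_on-path + C_off-path) / C_on-path."""
    return (c_on_path + c_off_path) / c_on_path

path_le = 1.0                   # LE = 1 for every gate on the path
fo = 90 / 5                     # path electrical effort
b = branching_effort(15, 15)    # single branch point, so B = b
pe = path_le * fo * b           # path effort including branching

se1 = (15 + 15) / 5             # effort seen by the first stage
se2 = 90 / 15                   # effort seen by the second stage
print(pe, se1 * se2)
```

The path effort with branching matches the product of the two stage efforts, which the formula without B does not.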
Branching Example 2
Select gate sizes y and z to minimize delay from A to B
Logical Effort: $\Pi LE = (4/3)^3$
Electrical Effort: $FO = C_{out}/C_{in} = 9$
Branching Effort: $B = 2 \cdot 3 = 6$
Path Effort: $PE = \Pi LE \cdot FO \cdot B = 128$
Best Stage Effort: $SE = PE^{1/3} \approx 5$
Delay: $D = 3 \cdot 5 + 3 \cdot 2 = 21$
Work backward for sizes:
$5z = 9C \cdot (4/3)$, so $z = 2.4C$
$5y = 3z \cdot (4/3)$, so $y = 1.9C$
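Branching Example 2 can be recomputed without rounding the stage effort to 5. This sketch assumes what the slide's numbers imply: three 2-input NAND stages (LE = 4/3, p = 2), a load of 9C, an input capacitance of C, and a branch of 3 at the output of gate y. Capacitances are in units of C.

```python
LE_NAND2 = 4.0 / 3.0
P_NAND2 = 2.0

path_le = LE_NAND2 ** 3
fo = 9.0                      # C_out / C_in
branching = 2 * 3
pe = path_le * fo * branching

se = pe ** (1.0 / 3.0)        # best stage effort for three stages
delay = 3 * se + 3 * P_NAND2

# Work backward from the load: C_in = C_out * LE / SE.
z = 9.0 * LE_NAND2 / se
y = 3 * z * LE_NAND2 / se     # y drives three copies of z

print(round(pe, 1), round(se, 2), round(delay, 1), round(z, 2), round(y, 2))
```

The unrounded values differ slightly from the slide's 2.4C and 1.9C because the slide rounds the stage effort to 5.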
Compute the path effort: $PE = (\Pi LE) \cdot B \cdot F$
Find the best number of stages: $N \approx \log_4 PE$
Compute the effective fanout/stage: $EF = PE^{1/N}$
Sketch the path with this number of stages
Working from either end, find sizes: $C_{in} = C_{out} \cdot LE / EF$
Reference: Sutherland, Sproull, Harris, “Logical Effort,” Morgan Kaufmann, 1999.
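The steps of the method can be sketched as two small helpers. The function names are illustrative, and rounding N to the nearest integer is a simplification: in a real design N is also constrained by the logic function and the required output polarity.

```python
import math

def plan_path(path_le, branching, fanout):
    """Return (path effort, number of stages, effort per stage)."""
    pe = path_le * branching * fanout
    # Aim for a stage effort near 4, so N is about log4(PE), at least 1.
    n = max(1, round(math.log(pe, 4)))
    ef = pe ** (1.0 / n)
    return pe, n, ef

def input_capacitance(c_out, le, ef):
    """Size one gate from its load: C_in = C_out * LE / EF."""
    return c_out * le / ef

# Example: an unbranched inverter chain driving 1000x its input capacitance.
pe, n, ef = plan_path(path_le=1.0, branching=1.0, fanout=1000.0)
print(pe, n, round(ef, 2))
```

Calling `input_capacitance` repeatedly, starting from the load and moving toward the input, gives the size of every gate on the path.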
Intel 45nm Core 2
Read-Write Memory
  Random Access: SRAM, DRAM
  Non-Random Access: FIFO, LIFO, Shift Register, CAM
Non-Volatile Read-Write Memory: EPROM, E²PROM, FLASH
Read-Only Memory: Mask-Programmed, Programmable (PROM)
STATIC (SRAM)
Data stored as long as supply is applied
Larger (6 transistors/cell)
Fast
Differential (usually)
DYNAMIC (DRAM)
Periodic refresh required
Smaller (1-3 transistors/cell)
Slower
Single Ended
Conceptual: linear array
Each box holds some data
But this does not lead to a nice layout shape
Too long and skinny
Create a 2-D array
Decode Row and Column address to get data
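Splitting one address into a row part and a column part can be sketched as follows. Which bits select the row and which the column is an assumption made here for illustration (low bits select the column); the slide does not specify it.

```python
def split_address(address, column_bits):
    """Split a flat address into (row, column) for a 2-D array."""
    row = address >> column_bits
    column = address & ((1 << column_bits) - 1)
    return row, column

# A 16-bit address into a 256 x 256 array: 8 row bits, 8 column bits.
row, column = split_address(0x1234, column_bits=8)
print(hex(row), hex(column))
```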
[Figure: two N-word memories built from storage cells, each word M bits wide (Word 0 to Word N-1), with an M-bit input-output bus. Left: one select signal per word, $S_0$ to $S_{N-1}$. Right: a decoder driven by address bits $A_0$ to $A_{K-1}$ generates the select signals.]
Intuitive architecture for N x M memory
Too many select signals: N words == N select signals
Decoder reduces the number of select signals: $K = \log_2 N$
Collection of $2^M$ complex logic gates
Organized in regular and dense fashion
(N)AND Decoder
NOR Decoder
Look at decoder for 256x256 memory block (8 KBytes)
Goal: Build fastest possible decoder with static CMOS logic
What we know:
Basically need 256 AND gates, each one of them drives one word line
N = 8
Each word line has 256 cells connected to it
Total output load is $256 \cdot C_{cell} + C_{wire}$
Assume that decoder input capacitance is $C_{address} = 4 \cdot C_{cell}$
Each address drives $2^8/2$ AND gates: A0 drives half of the gates, A0_b the other half
Neglecting $C_{wire}$, the fan-out on each one of the 16 address wires is:
$FB = \frac{256 \cdot C_{cell}}{4 \cdot C_{cell}} \cdot \frac{2^8}{2} = 2^6 \cdot 2^7 = 2^{13}$
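The fan-out seen by each address wire follows from the numbers on this slide. A short sketch, with capacitance expressed in units of one cell and the wire capacitance neglected as stated; the constant names are illustrative.

```python
import math

ADDRESS_BITS = 8
CELLS_PER_WORD_LINE = 256
C_CELL = 1.0                  # capacitance unit: one memory cell
C_ADDRESS = 4 * C_CELL        # decoder input capacitance from the slide

# Each address wire (true or complement) drives half of the 2^8 AND gates.
branching = 2 ** ADDRESS_BITS // 2
# Electrical effort from the address input to the word line load.
fanout = CELLS_PER_WORD_LINE * C_CELL / C_ADDRESS

fb = fanout * branching
stages_lower_bound = math.log(fb, 4)
print(branching, fanout, fb, stages_lower_bound)
```

The logical effort of the AND gates is not included yet, so the stage count from this product is only a lower bound.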
FB of at least $2^{13}$ means that we will want to use more than $\log_4(2^{13}) = 6.5$ stages to implement the AND8
Need many stages anyway
So what is the best way to implement the AND gate?
Will see next that it’s the one with the most stages and least complicated gates
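[Figure: three implementations of an 8-input AND]
NAND8 + inverter: LE = 10/3, 1; $\Pi LE = 10/3$; $P = 8 + 1$
NAND4 + NOR2: LE = 2, 5/3; $\Pi LE = 10/3$; $P = 4 + 2$
NAND2 + NOR2 + NAND2 + inverter: LE = 4/3, 5/3, 4/3, 1; $\Pi LE = 80/27$; $P = 2 + 2 + 2 + 1$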
Using 2-input NAND gates, an 8-input gate takes 6 stages
Total LE is $(4/3)^3 \approx 2.4$
So PE is $2.4 \cdot 2^{13}$: optimal N of ~7.1
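The stage count for the NAND2-based AND8 can be checked numerically. This sketch assumes three NAND2 levels alternating with three inverters (LE = 1), which is what six stages and a total LE of (4/3)^3 imply; the variable names are illustrative.

```python
import math

LE_NAND2 = 4.0 / 3.0
FB = 2 ** 13                        # fan-out times branching of the decoder

path_le = LE_NAND2 ** 3             # the three inverters contribute LE = 1
pe = path_le * FB

n_opt = math.log(pe, 4)             # stage count for a stage effort of 4
ef_six_stages = pe ** (1.0 / 6)     # stage effort if only 6 stages are used

print(round(path_le, 2), round(n_opt, 1), round(ef_six_stages, 2))
```

With only six stages the effort per stage ends up above 4, consistent with the optimum being slightly more than six stages.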
256 8-input AND gates
Each built out of a tree of NAND gates and inverters
Issue:
Every address line has to drive 128 gates (and wire) right away
Can’t build gates small enough - forces us to add buffers just to drive address inputs
Use a single gate for each of the shared terms
E.g., from A0, A0_b, A1, and A1_b, generate four signals: A0·A1, A0_b·A1, A0·A1_b, A0_b·A1_b
In other words, we are decoding smaller groups of address bits first
And using the “predecoded” outputs to do the rest of the decoding
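A behavioral sketch of predecoding for an 8-bit address: groups of two bits are decoded first, and each word line is the AND of one predecoded signal from every group. The two-bit group size follows the A0/A1 example; the function names and the final 4-input AND per word line are illustrative assumptions, not the circuit on the slides.

```python
def predecode(bits):
    """One-hot decode a small group of address bits (LSB first)."""
    outputs = []
    for code in range(2 ** len(bits)):
        term = 1
        for i, bit in enumerate(bits):
            wanted = (code >> i) & 1
            # Use the true input when the code bit is 1, else the complement.
            term &= bit if wanted else 1 - bit
        outputs.append(term)
    return outputs

def decode8(address):
    """Return the 256 word-line values for an 8-bit address."""
    bits = [(address >> i) & 1 for i in range(8)]
    groups = [predecode(bits[i:i + 2]) for i in range(0, 8, 2)]
    word_lines = []
    for word in range(256):
        select = 1
        for g, group in enumerate(groups):
            # Pick the predecoded signal matching this word's two bits.
            select &= group[(word >> (2 * g)) & 3]
        word_lines.append(select)
    return word_lines

lines = decode8(0xA5)
print(lines.index(1), sum(lines))
```

Each address bit now drives only the gates in its own predecoder rather than half of the 256 final gates.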
Two options for predecoding:
Larger predecode usually better:
More stages before the long wires: decreases their effect on the circuit
Fewer long wires switch: lower power
Easier to fit 2-input gate into cell pitch
Given decoder structure, input capacitance, final load
Can size the entire chain using LE for minimum delay
Is this the “best” we can do in terms of power too?
Not necessarily: probably want to reduce sizes (especially on final decoder inputs)
Is there anything else we can do to improve energy even further?