Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
VLSI Design : Chapter 5-1 1
Schedule
07. 04/09/21 (QZ1) Packaging (Flow and Reliability)
08. 04/16/21 Midterm Examination
09. 04/23/21 Exam review , Chapter 6 (Adder)
10. 04/30/21 停課
11. 05/07/21 Chapter 6 (Multiplier)
VLSI Design : Chapter 5-1 2
Schedule
12. 05/14/21 Chapter 7 (Placement, Routing, Special nets)
13. 05/21/21 Chapter 7 (Placement)
14. 05/28/21 Chapter 7 (Misc. before TO)
15. 06/04/21 (QZ2) Chapter 8 (Architecture)
16. 06/11/21 Chapter 8 (Architecture)
17. 06/18/21 Final Examination
18. 06/25/21 Exam review, What’s Next?
VLSI Design: Chapter 6-1 3
Design Flow
Design flow is not fixed. It depends on the design application, technologies, company resources….and the most important, design managers…
Design flow will be shown in Chapter 9 again, we explain it tonight and will revisit it weeks later.
VLSI Design: Chapter 6-1 4
A Simplified Design Flow
System Design
Verify and Debug
Synthesis
Gate Level
Virtual Prototype
Place and Route
RC Extraction
Transistor CKT
GDSII
RTL Design
System
RTL
Netlist
Timing
Physical
SPICE
VLSI Design: Chapter 6-1 5
Design Flow -1 (early 90’s)
Chip integration
RTL Database
Chip Simulation *
Bottleneck Analysis
Constraints
Sub-block RTL
coding and simulation *
IP Timing Model
Timing Libraries
Architectural
Functions, I/Fs Definition,
Process, Foundry, IP selection
Gate-Level
Pre Simulation *
Documents
Standard Flow
Notes
Synthesis
Gate Level Database
VLSI Design: Chapter 6-1 6
Design Flow -2 (early 90’s)
Data format
Standard Flow
Notes
Placement
& Routing
Buffer insertion,
Clock Tree Synthesis
Spare gates
EDIF, HDL
Gate Level Netlist, SDC
IPs, Floorplan
Standard cell, IOs,
PLL, MEM, …
SDF, Set load, SPF
RC Extraction
(Back Annotate)
Synthesis
Post Layout
Simulation *
Test Pattern
Generation
Tapeout
DRC/ERC/LVS *
GDSII
VLSI Design: Chapter 6-1 7
Chapter 6: System Design
Memory
Program
Decode
ALU (exe)
Store
Instruction
Address
Data
Address
Data
+ - * / shifting compare
move permutation …….
VLSI Design: Chapter 6-1 8
Chapter 6: ALU
ALU design
1. Shifter
2. Adder / Substrater.
3. Compare
4. Multiplier
5. Divider
VLSI Design: Chapter 6-1 9
ALUs
ALU (Arithmetic Logic Unit) computes a
variety of logical and arithmetic functions
based on opcode (operation code).
ALU built around adder, since carry chain
determines delay.
VLSI Design: Chapter 6-1 10
Combinational shifters
Useful for arithmetic operations, bit field
extraction, etc.
Shift register can shift only one bit per clock
cycle. A multiple-bits-shifter requires
additional connectivity.
VLSI Design: Chapter 6-1 11
Barrel shifter
Perform n-bit shifts in a single cycle.
Need efficient layout.
Require transmission gates and long wires.
VLSI Design: Chapter 6-1 12
Barrel shifter structure
Accepts 2n data inputs and n control signals,
producing n data outputs.
VLSI Design: Chapter 6-1 13
Barrel shifter operation
Selects arbitrary contiguous n bits out of 2n
input bus.
Examples:
right shift: data into top, 0 into bottom;
left shift: 0 into top, data into bottom;
rotate: data into top and bottom.
VLSI Design: Chapter 6-1 14
Barrel shifter layout
Two-dimensional array of 2n vertical X n
horizontal cells.
Input data travels diagonally upward. Output
wires travel horizontally.
Control signals run vertically. Exactly one
control signal is set to 1, turning on all
transmission gates in that column.
VLSI Design: Chapter 6-1 15
Barrel shifter cell
Large number of cells, but each one is small.
Delay is large: long wires and transmission gates.
VLSI Design: Chapter 6-1 16
Shuffler vs. Shifter
Sifter will not change the bit order,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f
1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f, 0 (Warp around)
1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f, x (Shift out)
Byte swap (word swap)
8, 9, a, b, c, d, e, f , 0, 1, 2, 3, 4, 5, 6, 7
Shuffler will change the order
0, 8, 1, 9, 2, a, 3, b, 4, c, 5, d, 6, e, 7, f
VLSI Design: Chapter 6-1 17
Byte/Word Swap
B0, B1, B2, B3
B2, B3, B0, B1
B0, B1, B2, B3
B0, B2, B1, B3Other applications: Encryption
VLSI Design: Chapter 6-1 18
2. Adders
Adder delay is dominated by carry.
Carry chain analysis must consider transistor,
wiring delay.
Modern VLSI favors adder designs which
have compact carry chains.
VLSI Design: Chapter 6-1 19
Most common adders
Serial adder
Ripple adder
Carry Look Ahead adder (CLA)
Carry Skip adder
Carry Select adder
VLSI Design: Chapter 6-1 20
Full adder
Computes one-bit sum, carry:
si = ai XOR bi XOR ci
ci+1 = aibi + aici + bici
Carry Ripple adder: n-bit adder built from
full adders.
Delay of ripple-carry adder goes through all
carry bits.
ai bi
ci
si
ci+1Full adder
VLSI Design: Chapter 6-1 21
Full Adder Truth Table
a b Ci Sum Co
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1
VLSI Design: Chapter 6-1 23
Serial adder structure
LSB control signal clears the carry shift
register:
Carry Out
Sum
Carry
In
VLSI Design: Chapter 6-1 24
Ripple adder
May be used in signal-processing arithmetic
where fast computation is important but
latency is unimportant.
Data format (LSB first):
ai bi
ci
si
ci+1Full adder
ai+1
si+1
bi+1
ci+2Full adder
ai+2
si+2
bi+2
ci+3Full adder
VLSI Design: Chapter 6-1 25
Carry-look-ahead adder
First compute carry propagate, generate:
Pi = ai + bi
Gi = ai bi
Pi XOR Gi = ai XOR bi
Compute sum and carry from P and G:
si = ci XOR Pi XOR Gi
ci+1 = Gi + Pici = aibi + (ai+ bi) ci
VLSI Design: Chapter 6-1 26
Carry-look-ahead expansion
Can recursively expand carry formula:
ci+1 = Gi + Pi(Gi-1 + Pi-1ci-1)
ci+1 = Gi + PiGi-1 + PiPi-1 (Gi-2 + Pi-1ci-2)
Expanded formula does not depend on
intermediate carries, Gi-1and ci-1.
Allows carry for each bit to be computed
independently.
VLSI Design: Chapter 6-1 28
Analysis
Deepest carry expansion requires gates with large fanin/fanout --> slow.
Carry-look ahead unit requires complex wiring between adders and look ahead unit—values must be routed back from look ahead unit to adder.
Layout is even more complex with multiple levels of look ahead.
VLSI Design: Chapter 6-1 29
Carry-skip adder
Looks for cases in which carry out of a set of
bits is identical to carry in.
Typically organized into m-bit stages.
VLSI Design: Chapter 6-1 30
Carry-select adder
Computes two results in parallel, each for
different carry input assumptions.
Uses actual carry in to select correct result.
Reduces delay
VLSI Design: Chapter 6-1 32
Adders Conclusions
Serial adder
Ripple adder
Carry Look Ahead adder
Carry Skip adder
Carry Select adder
The speed is faster at the bottom of above list, but
the area/cost is higher
VLSI Design: Chapter 6-1 33
Substrate
Negative number: inverted then add one
Using sign-bit to denote the negative numbers
Using adder to perform the substrate
VLSI Design: Chapter 6-1 34
Invert and then plus 1
0001 --> 1110 --> 1111 (-1)
0010 --> 1101 --> 1110 (-2)
0011 --> 1100 --> 1101 (-3)
0100 --> 1011 --> 1100 (-4)
0101 --> 1010 --> 1011 (-5)
0110 --> 1001 --> 1010 (-6)
0111 --> 1000 --> 1001 (-7)
VLSI Design: Chapter 6-1 35
Invert and then plus 1!!
1000 --> 0111 --> 1000 (8) !!!
1001 --> 0110 --> 0111 (7) !!!
Four bits could only stand for -8 ~ +7
VLSI Design: Chapter 6-1 36
Check
0111 + 1111 (-1) = 0110 (6)
0111 + 1110 (-2) = 0101 (5)
0111 + 1101 (-3) = 0100 (4)
0111 + 1100 (-4) = 0011 (3)
0111 + 1011 (-5) = 0010 (2)
0111 + 1010 (-6) = 0001 (1)
0111 + 1001 (-7) = 0000 (0)
VLSI Design: Chapter 6-1 37
Modify an Adder to ADD/SUB
ai bi
ci
si
ci+1Full adder
ai+1
si+1
bi+1
ci+2Full adder
ai+2
si+2
bi+2
ci+3Full adder
VLSI Design: Chapter 6-1 38
Compare
Using substrate to get the comparison results:
smaller / equal or smaller/ equal to / .../greater
VLSI Design: Chapter 6-1 39
Multipliers
0 1 1 0 (6) multiplicand
x 1 0 0 1 (9) multiplier
0 1 1 0
+ 0 0 0 0
+ 0 0 0 0
+ 0 1 1 0
0 1 1 0 1 1 0 (2+4+16+32=54)
partial product
VLSI Design: Chapter 6-1 41
Serial-parallel multiplier
Used in serial-arithmetic operations.
Multiplicand can be held in place by register.
Multiplier is shifted into array.
VLSI Design: Chapter 6-1 43
Array multiplier
Array multiplier is an efficient layout of a
combinational multiplier.
Array multipliers may be pipelined to speed
up clock period at the expense of latency.
VLSI Design: Chapter 6-1 44
Unsigned array multiplier
+
x0y0x1y0x2y0
xny0
0
x0y1+ x1y1
0
+ x0y2+ x1y2
+ 0+
P(2n-1) P(2n-2) P0
VLSI Design: Chapter 6-1 45
Array Multiplier
0 1 1 0 (6) multiplicand
x 1 0 0 1 (9) multiplier
0 1 1 0
+ 0 0 0 0
+ 0 0 0 0
+ 0 1 1 0
0 1 1 0 1 1 0 (2+4+16+32=54)
skew array
for rectangular
layout
VLSI Design: Chapter 6-1 46University of PatrasICECS 2010, Athens, Greece 46
Architectures of multiply adders
VLSI Design: Chapter 6-1 47
Booth multiplier
Encoding scheme to reduce number of stages
in multiplication.
Performs two bits of multiplication at once—
requires half the stages.
Each stage is slightly more complex than
simple multiplier, but adder / subtracter is
almost as small/fast as adder.
VLSI Design: Chapter 6-1 48
Basic Idea of Booth
0 1 1 0 (6) multiplicand
x 1 0 0 1 (9)multiplier
0 1 1 0
+ 0 1 1 0
0 1 1 0 1 1 0
(2+4+16+32=54)
Reduce the depth of the adding process!!
VLSI Design: Chapter 6-1 49
Booth encoding
Two’s-complement form of multiplier:
y = -2nyn + 2n-1yn-1 + 2n-2yn-2 + ...
Rewrite using 2a = 2a+1 - 2a:
y = -2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ...
Consider first two terms: by looking at three bits of y, we can determine whether to add x, 2x to partial product.
VLSI Design: Chapter 6-1 50
Booth actions
yi yi-1 yi-2 increment
0 0 0 0
0 0 1 x (+1)
0 1 0 x
0 1 1 2x (1+1)
1 0 0 -2x (2-4, 4 will be added in the next stage)
1 0 1 -x (2-4+1)
1 1 0 -x (3-4)
1 1 1 0 (3-4+1)
VLSI Design: Chapter 6-1 51
Booth actions(cont.)
yi yi-1 yi-2 increment
0 0 0 0
0 1 0 x
1 0 0 -2x (2-4)
1 1 0 -x (3-4)
0 0 1 x (+1)
0 1 1 2x (1+1)
1 0 1 -x (2+1-4)
1 1 1 0 (3+1-4)
Reorder the previous page
VLSI Design: Chapter 6-1 52
Booth example
x = 011001 (2510), y = 101110 (-1810).
y1y0y-1 = 100, (-2x)
P1 = P0 - (10 × 011001) = 11111001110 (-5010)
y3y2y1= 111, (0)
P2 = P1+ 0 = 11111001110.
y5y4y3= 101, (-x)
P3 = P2 - 0110010000 = 11000111110. (-45010)
y7y6y5= 111, (0) where y7y6 are sign extension
shift
VLSI Design: Chapter 6-1 53
Booth example
1810 = 00100102 1101101 1101110 (-1810).
0011001 (2510, x), 0110010 (5010, 2x), 1100111 (-2510, -x), 1001110 (-5010, -2x)
0011001
x 111011100
11111111001110 -2x
000000000000 0
1111100111 -x
00000000 0
11111000111110 ( -45010)
VLSI Design: Chapter 6-1 54
Example (2)
0 1 1 0 (6) multiplicand
x 1 0 0 1 (9) multiplier
0 0 0 0 0 0 0 0 1 1 0 (x)
1 1 1 1 1 0 1 0 0 (-2x, 01100, 10011, 10100 )
0 0 0 0 1 1 0 (x)
0 0 0 1 1 0 1 1 0 (2+4+16+32=54)
Sign extension!!
Question: How many digits do you need?
VLSI Design: Chapter 6-1 55
Example (2 cont.)
Where 00110 (6) -> 6 x (-2) = -12
1~111001 (invert + sign bits)
1~111010 (plus one)
1~110100 (shift to left, times 2)
01100 (12)
10011 inv
10100 plus 1
VLSI Design : Chapter 5-1 57
Schedule
11. 05/07/21 Chapter 6 (Multiplier)
12. 05/14/21 Chapter 7 (Placement, Routing, Special nets)
13. 05/21/21 Chapter 7 (Placement)
14. 05/28/21 Chapter 7 (Misc. before TO)
15. 06/04/21 (QZ2) Chapter 8 (Architecture)
16. 06/11/21 Chapter 8 (Architecture)
17. 06/18/21 Final Examination
18. 06/25/21 Exam review, What’s Next?
VLSI Design: Chapter 6-1 58
Wallace tree
Reduces depth of adder chain.
Built from full adders (carry save adders):
three inputs a, b, c
produces two outputs y, z such that
y + z = a + b + c
Carry-save equations:
yi = parity(ai,bi,ci)
zi = majority(ai,bi,ci)
VLSI Design: Chapter 6-1 59
Wallace tree operation
Final adder completes the summation.
Wiring is more complex.
Can build a Booth-encoded Wallace tree
multiplier.
VLSI Design: Chapter 6-1 60
Example
7 6 5 4 3 2 1 0
* *
* *
*
*
0
1 * S
2 C
3 * S
4 * S C
5 C * S
6 6 * S C * S
7 7 C C C
VLSI Design: Chapter 6-1 61
Example (2)
6 5 4 3 2 1 0
* *
*
*
*
0
1 * S
2 C
3 * S
4 * S C * S
5 C C C * S
6 6 6 6 C
VLSI Design: Chapter 6-1 62
Multipliers conclusions
Serial-parallel multiplier
Array multiplier
Booth ~ Wallace Tree
Speed is getting faster, but, again, area/cost is
higher
VLSI Design: Chapter 6-1 63
Huge multiplier
a X + b multiplicand
x c X + d multiplier
ad X + bd
+ ac X2 + bc X
ac X2 + (ad + bc) X + bd
Reduce the length of the multiplication by
four shorten multiplications!!
VLSI Design: Chapter 6-1 64
Extra Home Works
Please using inverter, NAND and inverter
gates design a full adder.
Using full adder, MUX, AOI, and the gates
assigned above, building a 4x4 multiplier
with any method you learned.
VLSI Design: Chapter 6-1 65
A Fixed Number Divider
Divider could be implement by adder (substrater)
and shifter
substrate than shift; substrate than shift;….
Divider could be built by multiplier and shifter too
X / Y = X * (1024/Y) / 1024
Example: 300 / 14 = 21.4 (where 1024/14 = 73.14)
300 * (73) /1024 = 21.38
VLSI Design: Chapter 6-1 66
SIMD
Single Instruction, Multiple Data
(B0, B1, B2, B3 ) + (B0, B1, B2, B3 )
(B0+B0), (B1+B1), (B2+B2), (B3+B3)
Or
(W0, W1) x (W0, W1)
(W0 x W0), (W1 x W1)
VLSI Design: Chapter 6-1 67
SIMD
Streaming SIMD Extensions (SSE) is a SIMD
instruction set extension to the x86 architecture, designed
by Intel and introduced in 1999 in their Pentium
III series.
SIMD instructions can greatly increase performance
when exactly the same operations are to be performed on
multiple data objects. Typical applications are digital
signal processing and graphics processing.
VLSI Design: Chapter 6-1 68
SIMD
Single instruction, multiple data (SIMD). It describes
computers with multiple processing elements that
perform the same operation on multiple data points
simultaneously. There are simultaneous (parallel)
computations, but only a single process (instruction) at a
given moment. SIMD is particularly applicable to
common tasks like adjusting the contrast in a digital
image or adjusting the volume of digital audio. Most
modern CPU designs include SIMD instructions in order
to improve the performance of multimedia use.
VLSI Design: Chapter 6-1 69
GPU / VPU
A graphics processing unit (GPU) is a specialized electronic circuit designed to
rapidly manipulate and alter memory to accelerate the creation of images in a frame
buffer intended for output to a display device. GPUs are used in embedded
systems, mobile phones, personal computers, workstations, and game consoles.
Modern GPUs are very efficient at manipulating computer graphics and image
processing, and their highly parallel structure makes them more efficient than general-
purpose CPUs for algorithms where the processing of large blocks of data is done in
parallel. In a personal computer, a GPU can be present on a video card, or it can be
embedded on the motherboard or—in certain CPUs—on the CPU die.
The term GPU was popularized by Nvidia in 1999, who marketed the GeForce 256 as
"the world's first GPU", or Graphics Processing Unit. It was presented as a "single-
chip processor with integrated transform, lighting, triangle setup/clipping, and
rendering engines". Rival ATI Technologies coined the term "visual processing unit"
or VPU with the release of the Radeon 9700 in 2002.
-- Wiki
VLSI Design: Chapter 6-1 70
AI processor / GPGPU / TPU
An AI accelerator is (as of 2016) an emerging class of microprocessor or
computer system designed to accelerate artificial neural networks, machine
vision and other machine learning algorithms for robotics, internet of things
and other data-intensive or sensor-driven tasks. – Wiki
GPGPU, General-Purpose computing on Graphics Processing Units
TPU, A tensor processing unit (TPU) is an application-specific integrated
circuit (ASIC) developed by Google specifically for machine learning.
VLSI Design: Chapter 6-1 72
Data paths
A data path is a logical and a physical
structure:
bitwise logical organization;
bitwise physical design.
Datapath often has ALU, registers, some other
function units.
Data is passed via busses.
VLSI Design: Chapter 6-1 73
Typical data path structure
Slice includes one bit of function units,
connected by busses:
VLSI Design: Chapter 6-1 74
Bit-slice structure
Many arithmetic and logical functions can be
defined recursively on bits of word.
A bit-slice is one-bit (or n-bit) segment of an
operation of minimum size to ensure
regularity.
Regular logical structure allows regular
physical structure.
VLSI Design: Chapter 6-1 75
Abutting and pitch-matching
Cells in bit-slice may be abutted together—
requires matching positions on terminals.
Pitch-matching is designing cells to ensure
that pins are at proper positions for abutting.
VLSI Design: Chapter 6-1 76
Well Error
NW to
PDIFF
Space error
METAL
width
error
Pdiff and ndiff error
VLSI Design: Chapter 6-1 77
Wiring plans
A wiring plan shows layer assignments and
directions for major signals.
Put most important signals on lowest-
impedance, accessible layers.
cell1 cell2 cell3
VDD
VSS
VLSI Design: Chapter 6-1 78
Example
Pick a process ( 1P3M, 2P5M…).
Decide cell height (MUX821 or DFF with rstn).
Calculate how many routing channels can be
use in the X direction.
Make a rough floorplan for datapath and
calculate the wires go through any Y-
direction cross-section.
VLSI Design: Chapter 6-1 79
Programmable logic array (PLA)
Used to implement special logic functions.
A PLA decodes only some addresses (input
values); a ROM decodes all addresses.
PLA not as common in CMOS as in nMOS,
but it has been used for logic functions.
VLSI Design: Chapter 6-1 80
PLA continue
Any pure logic function can be build with
only AND, OR, INV functions and
expressed as a truth table.
We implement the truth table into a
Programmable Logic Array (PLA).