Digital Integrated Circuits Chpt. 5 Lec. 01- 08/29/2006
CSE477 VLSI Digital Circuits
Fall 2002
Lecture 21: Multiplier Design
Mary Jane Irwin ( www.cse.psu.edu/~mji )
www.cse.psu.edu/~cg477
[Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.]
Review: Basic Building Blocks
Datapath
– Execution units
» Adder, multiplier, divider, shifter, etc.
– Register file and pipeline registers
– Multiplexers, decoders
Control
– Finite state machines (PLA, ROM, random logic)
Interconnect
– Switches, arbiters, buses
Memory
– Caches (SRAMs), TLBs, DRAMs, buffers
Review: Binary Adder Landscape
[Figure: taxonomy of synchronous word parallel adders. Branches: ripple carry adders (RCA) and carry prop min adders; the latter split into signed-digit adders, fast carry prop adders, and residue adders; the fast carry prop adders include Manchester carry chain, parallel prefix, conditional sum, carry select, and carry skip. Complexity labels as they appear in the figure: T = O(N), A = O(N); T = O(1), A = O(N); T = O(log N), A = O(N log N); T = O(N), A = O(N); T = O(N), A = O(N).]
Multiply Operation
Multiplication as repeated additions
[Figure: an N-bit multiplicand and an N-bit multiplier form a partial product array of N rows, which sums to a 2N-bit double precision product.]
The partial product array can be formed in parallel
Shift & Add Multiplication
Right shift and add
– Partial product array rows are accumulated from top to bottom on an N-bit adder
– After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add
– Time for N bits: T_serial_mult = O(N × T_adder) = O(N^2) for an RCA
Making it faster
– Use a faster adder
– Use higher radix (e.g., base 4) multiplication
» Use multiplier recoding to simplify multiple formation
– Form partial product array in parallel and add it in parallel
Making it smaller (i.e., slower)
– Use an array multiplier
» Very regular structure with only short wires to nearest neighbor cells. Thus, very simple and efficient layout in VLSI
» Can be easily and efficiently pipelined
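The right shift and add scheme can be sketched in software as below. This is only an illustrative model, not the lecture's circuit: the function name `shift_add_multiply` and the split of the accumulator into `high` and `low` halves are assumptions made for the example.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Multiply two unsigned n-bit integers by right shift and add."""
    mask = (1 << n) - 1
    assert 0 <= multiplicand <= mask and 0 <= multiplier <= mask
    high = 0           # upper half of the accumulated partial product
    low = multiplier   # lower half; multiplier bits are consumed from its LSB
    for _ in range(n):
        if low & 1:
            high += multiplicand   # one n-bit addition per multiplier bit
        # Right shift the combined (high, low) register by one bit.
        low = (low >> 1) | ((high & 1) << (n - 1))
        high >>= 1
    return (high << n) | low


result = shift_add_multiply(0b0110, 0b1001, 4)   # 6 x 9
```

The loop performs N additions, so with a ripple carry adder of O(N) delay the total time is O(N^2), as stated above.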
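Tree Multiplier Structure
[Figure: multiple forming circuits generate multiples of D (‘icand); muxes controlled by Q (‘ier) select them to form the partial product array, which feeds a reduction tree followed by a fast carry propagate adder (CPA) that produces P (product). Delay: mux + reduction tree (log N) + CPA (log N).]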
(4,2) Counter
Built out of two (3,2) counters (just FA’s!)
– all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)
– the internal output is carried to the next higher weight position (marked in the figure)
[Figure: two stacked (3,2) counters.]
Note: Two carry outs - one “internal” and one “external”
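A bit-level sketch of this construction follows. The names `full_adder`, `counter_4_2`, and `check_counter` are hypothetical, and the wiring shown (first full adder produces the internal carry out, second one takes the internal carry in) is one common arrangement assumed here for illustration.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """(3,2) counter: returns (sum, carry) with a + b + c == sum + 2*carry."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def counter_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """(4,2) counter built from two full adders.

    Returns (sum, carry, cout). sum has the input weight; the external
    carry and the internal cout both have twice that weight.
    """
    s1, cout = full_adder(x1, x2, x3)   # internal carry out, independent of cin
    s, carry = full_adder(s1, x4, cin)  # internal carry in enters the second FA
    return s, carry, cout


def check_counter() -> bool:
    """Exhaustively check that the five inputs are counted correctly."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = counter_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True
```

Because `cout` does not depend on `cin`, tiling these cells side by side does not create a ripple path through the internal carries.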
Tiling (4,2) Counters
Reduces columns four high to columns only two high
– Tiles with neighboring (4,2) counters
– Internal carry in at same “level” (i.e., bit position weight) as the internal carry out
[Figure: three tiled (4,2) counters, each drawn as two (3,2) counters.]
4x4 Partial Product Array Reduction
Fast 4x4 multiplication using (4,2) counters
[Figure: the multiplicand and multiplier form the partial product array; (4,2) counters reduce it to the reduced pp array (to CPA), and the CPA produces the double precision product.]
8x8 Partial Product Array Reduction
[Figure: ‘icand and ‘ier form the partial product array, which is reduced to the reduced partial product array.]
What is the minimum number of (4,2) counters needed to reduce it to 2 rows?
Answer: 24
Alternate 8x8 Partial Product Array Reduction
[Figure: ‘icand and ‘ier form the partial product array, which is reduced to the reduced partial product array.]
More (4,2) counters, so what is the advantage?
Array Reduction Layout Approach
[Figure: multiple generators take the multiplicand and the multiple selection signals (‘ier); their outputs feed a stack of (4,2) counter slices, followed by the CPA.]
Next Lecture and Reminders
Next lecture
– Shifters, decoders, and multiplexers
» Reading assignment – Rabaey, et al, 11.5-11.6
Reminders
– Project final reports due December 5th
– HW5 (last one!) due November 19th
– Final grading negotiations/correction (except for the final exam) must be concluded by December 10th
– Final exam scheduled
» Monday, December 16th from 10:10 to noon in 118 and 121 Thomas
Topics
Adders and ALUs (§6.4, §6.5)
Multipliers (§6.6)
– Array multiplier
– Baugh-Wooley multiplier
– Booth encoding
– Wallace tree multiplier
Subsystem design principles (§6.2)
Elementary School Algorithm
        0 1 1 0   multiplicand
      × 1 0 0 1   multiplier
        0 1 1 0
    + 0 0 0 0
      0 0 1 1 0
  + 0 0 0 0
    0 0 0 1 1 0
+ 0 1 1 0
  0 1 1 0 1 1 0
The rows 0110, 0000, 0000, 0110 are the partial products.
Combinational Multiplier
Each bit of the multiplier controls whether the corresponding addition occurs
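As a rough software analogue, each multiplier bit gates one shifted copy of the multiplicand, and the gated rows are then summed. The helper name `partial_products` is an assumption for this sketch; the operands are the ones from the elementary school example.

```python
def partial_products(multiplicand: int, multiplier: int, n: int) -> list[int]:
    """Return the n partial product rows, each already shifted to its weight."""
    rows = []
    for i in range(n):
        bit = (multiplier >> i) & 1
        # Multiplier bit i gates the multiplicand; the row has weight 2**i.
        rows.append((multiplicand if bit else 0) << i)
    return rows


rows = partial_products(0b0110, 0b1001, 4)
total = sum(rows)   # 0b0110110
```

In hardware all rows can be formed at once with AND gates, which is what distinguishes this from the serial shift and add loop.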
Array Multiplier
Regular layout
– An n × m cell layout
– Easy to pipeline
– Used frequently in FPGAs and ASICs
Critical path
– Less than (n+m-1) bit adder delay
Handles unsigned multiplication ONLY
A 4 × 4 Unsigned Array Multiplier
                    X3   X2   X1   X0
                 ×  Y3   Y2   Y1   Y0
                    X3Y0 X2Y0 X1Y0 X0Y0
               X3Y1 X2Y1 X1Y1 X0Y1
          X3Y2 X2Y2 X1Y2 X0Y2
     X3Y3 X2Y3 X1Y3 X0Y3
P7   P6   P5   P4   P3   P2   P1   P0
Skew the array for a rectangular layout
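Unsigned Array Multiplier
[Figure: 4 × 4 unsigned array multiplier built from adder cells (inputs a, b, Cin; outputs Cout, Sum). The partial product bits x0y0 through x3y3 enter the array, unused adder inputs are tied to 0, and the product bits P0 through P7 are the outputs.]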
Signed Multiplication
Signed number representation
$$X = -x_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i$$
Signed n×n multiplication
– (1110)_2 × (0011)_2 = (1010)_2, i.e. (-2) × 3 = (-6)
– No difference from unsigned multiplication if the result has the same bit-width as the input
But what if we want the result to be 2n bits?
– Use sign-bit extension
– Needs a 2n × 2n array multiplier
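The two cases can be checked with a small sketch using the slide's operands. The helpers `to_signed` and `sign_extend` are hypothetical names introduced only for this example.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Return the to_bits-wide bit pattern of a from_bits two's-complement value."""
    return to_signed(value, from_bits) & ((1 << to_bits) - 1)


n = 4
x, y = 0b1110, 0b0011   # -2 and 3

# n-bit result: the unsigned product truncated to n bits is already correct.
low = (x * y) & ((1 << n) - 1)

# 2n-bit result: sign-extend both operands to 2n bits, multiply as unsigned,
# and keep the low 2n bits.
wide = (sign_extend(x, n, 2 * n) * sign_extend(y, n, 2 * n)) & ((1 << 2 * n) - 1)

# Without sign extension the 2n-bit unsigned product is not the signed result.
unextended = (x * y) & ((1 << 2 * n) - 1)
```

Here `low` is 1010 (-6) and `wide` is 11111010 (-6), while `unextended` is 00101010 (+42), which is why the wider result needs the extended operands.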
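Baugh-Wooley Multiplier: Principle
$$XY = x_{n-1} y_{n-1} 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} x_i y_j 2^{i+j} - 2^{n-1} \sum_{i=0}^{n-2} (x_{n-1} y_i + y_{n-1} x_i) 2^i$$
With the complements
$$\bar{x}_i = 1 - x_i, \qquad \bar{y}_i = 1 - y_i$$
the subtracted terms can be replaced by additions:
$$XY = x_{n-1} y_{n-1} 2^{2n-2} - (x_{n-1} + y_{n-1}) 2^{2n-2} + (x_{n-1} + y_{n-1}) 2^{n-1} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} x_i y_j 2^{i+j} + 2^{n-1} \sum_{i=0}^{n-2} (x_{n-1} \bar{y}_i + y_{n-1} \bar{x}_i) 2^i$$
$$XY = -2^{2n-1} + x_{n-1} y_{n-1} 2^{2n-2} + (\bar{x}_{n-1} + \bar{y}_{n-1}) 2^{2n-2} + (x_{n-1} + y_{n-1}) 2^{n-1} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} x_i y_j 2^{i+j} + 2^{n-1} \sum_{i=0}^{n-2} (x_{n-1} \bar{y}_i + y_{n-1} \bar{x}_i) 2^i$$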
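The final form can be checked numerically. The sketch below evaluates the last expression (a leading constant of -2^(2n-1), the sign-bit terms, the unsigned core, and the complemented cross terms) and compares it with ordinary signed multiplication; `bits_of` and `baugh_wooley` are hypothetical helper names, and the code is a verification aid rather than a model of the array itself.

```python
def bits_of(value: int, n: int) -> list[int]:
    """Bits of the n-bit two's-complement pattern of value, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def baugh_wooley(x: int, y: int, n: int) -> int:
    """Evaluate the final Baugh-Wooley form for signed n-bit x and y."""
    xb, yb = bits_of(x, n), bits_of(y, n)
    xs, ys = xb[n - 1], yb[n - 1]                      # sign bits
    total = -(1 << (2 * n - 1))                        # leading constant
    total += (xs * ys) << (2 * n - 2)
    total += ((1 - xs) + (1 - ys)) << (2 * n - 2)      # complemented sign bits
    total += (xs + ys) << (n - 1)
    for i in range(n - 1):
        for j in range(n - 1):
            total += (xb[i] * yb[j]) << (i + j)        # unsigned core
        # Complemented cross terms replace the subtractions.
        total += (xs * (1 - yb[i]) + ys * (1 - xb[i])) << (i + n - 1)
    return total


example = baugh_wooley(-2, 3, 4)
```

Only the leading constant is negative, and modulo 2^(2n) it is equivalent to adding a 1 in bit position 2n-1, so every term can be summed by an array of adders.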
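Baugh-Wooley Multiplier: Structure
[Figure: 4 × 4 Baugh-Wooley array multiplier built from adder cells (inputs a, b, Cin; outputs Cout, Sum). It has the same partial product bits x0y0 through x3y3 as the unsigned array, plus additional x3 and y3 inputs and a constant 1; unused adder inputs are tied to 0, and the product bits P0 through P7 are the outputs.]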
Booth Multiplier
Utilizes the Booth encoding scheme
Booth encoding scheme
– Handles signed multiplication
– Reduces the number of partial products by half
– Small area and fast
– Encoding scheme cannot be applied hierarchically
» Often used as the first stage of partial product reduction
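Booth Encoding: Principle
Two’s-complement form of multiplier y
$$Y = -2^{n-1} y_{n-1} + 2^{n-2} y_{n-2} + 2^{n-3} y_{n-3} + \dots$$
$$Y = 2^{n-1} (y_{n-2} - y_{n-1}) + 2^{n-2} (y_{n-3} - y_{n-2}) + 2^{n-3} (y_{n-4} - y_{n-3}) + \dots$$
Consider the first two terms
$$XY = 2^{n-2} X (y_{n-3} + y_{n-2} - 2 y_{n-1}) + 2^{n-4} X (y_{n-5} + y_{n-4} - 2 y_{n-3}) + \dots$$
– By looking at three bits of y, we can determine whether to add 0, ±x, or ±2x to the partial product.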
Booth Actions
y_i   y_{i-1}   y_{i-2}   increment
0     0         0         0
0     0         1         X
0     1         0         X
0     1         1         2X
1     0         0         -2X
1     0         1         -X
1     1         0         -X
1     1         1         0
Booth Example
Don’t forget the sign extension of the encoded values when adding them together
– Only have to extend 2 bits though
x = 011001 (25_10), y = 101110 (-18_10).
y1 y0 y-1 = 100, P1 = P0 - (10 × 011001) = 11111001110.
y3 y2 y1 = 111, P2 = P1 + 0 = 11111001110.
y5 y4 y3 = 101, P3 = P2 - 0110010000 = 11000111110.
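The example can be reproduced with a short radix-4 Booth sketch. The function names `booth_digits` and `booth_multiply` are assumptions for illustration, and the sketch assumes an even bit-width with y[-1] = 0; each digit is -2·y[2k+1] + y[2k] + y[2k-1], matching the Booth actions table.

```python
def booth_digits(y: int, n: int) -> list[int]:
    """Radix-4 Booth digits of an n-bit two's-complement y, LSB group first."""
    assert n % 2 == 0, "this sketch assumes an even bit-width"

    def bit(i: int) -> int:
        return (y >> i) & 1 if i >= 0 else 0   # y[-1] is defined as 0

    return [-2 * bit(2 * k + 1) + bit(2 * k) + bit(2 * k - 1)
            for k in range(n // 2)]


def booth_multiply(x: int, y: int, n: int) -> int:
    """Sum the Booth partial products digit * x * 4**k."""
    return sum(d * x * 4 ** k for k, d in enumerate(booth_digits(y, n)))


digits = booth_digits(-18, 6)            # y = 101110
result = booth_multiply(25, -18, 6)      # x = 011001
pattern = result & 0x7FF                 # low 11 bits, as on the slide
```

The digits come out as -2, 0, -1, i.e. the actions -2X, 0, -X used in the three steps above, and only three partial products are summed instead of six.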
Wallace Tree
Reduces the number of partial products
Built from carry-save adders:
– Three inputs: a, b, c
– Two outputs: y, z such that y + z = a + b + c
Carry-save equations:
– y_i = a_i ⊕ b_i ⊕ c_i
– z_{i+1} = a_i b_i + b_i c_i + c_i a_i
– What’s the difference from a carry-ripple adder?
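A word-level sketch of the carry-save equations, applied to all bit positions at once with bitwise operators. The function name `carry_save_add` and the sample operands are assumptions for the example.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Reduce three operands to two without propagating carries.

    Bitwise on whole words: y_i = a_i ^ b_i ^ c_i and
    z_{i+1} = a_i b_i + b_i c_i + c_i a_i (hence the left shift).
    """
    y = a ^ b ^ c
    z = ((a & b) | (b & c) | (c & a)) << 1
    return y, z


y, z = carry_save_add(0b101, 0b011, 0b110)   # 5 + 3 + 6
```

Every bit position is computed independently, so the delay is one full adder regardless of word length; in a carry-ripple adder the carry of each position feeds the next one instead.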
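Wallace Tree Structure
[Figure: two rows of three full adders (FA), each with inputs a2 b2 c2, a1 b1 c1, a0 b0 c0. Connected as a carry-ripple adder, the outputs are s2 s1 s0. Connected as a carry-save adder, the outputs are z3, y2 z2, y1 z1, y0.]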
Wallace Tree Operation
n additions are reduced to (2n/3) additions after each level
– Sum of inputs = Sum of outputs
– Can apply the reduction hierarchically
– More efficient design uses 4-2 adders to reduce n additions to (n/2) additions after each level
Need final adder to add the last two numbers
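The number of reduction levels can be estimated with a small counting sketch. The function names are hypothetical, and the grouping policy is an assumption: rows are grouped greedily, leftover rows pass through to the next level, and in the 4-2 case a leftover group of three is handled by a 3-2 adder.

```python
def levels_3_2(n: int) -> int:
    """Levels of 3-2 (carry-save) adders needed to reduce n rows to 2."""
    levels = 0
    while n > 2:
        n -= n // 3          # every full group of 3 rows becomes 2 rows
        levels += 1
    return levels


def levels_4_2(n: int) -> int:
    """Levels of 4-2 adders needed to reduce n rows to 2."""
    levels = 0
    while n > 2:
        rest = n % 4
        # Groups of 4 become 2 rows; a leftover group of 3 uses a 3-2 adder.
        n = 2 * (n // 4) + (2 if rest == 3 else rest)
        levels += 1
    return levels


eight_rows = (levels_3_2(8), levels_4_2(8))
```

For 8 rows this gives 4 levels with 3-2 adders and 2 levels with 4-2 adders; for 17 rows the 4-2 scheme gives 4 levels.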
A Booth-Wallace Tree Multiplier
[Figure: a row of Booth encoders (B) feeds Wallace tree level 1 (four 4-2 adder arrays, followed by FF), level 2 (two 4-2 adder arrays, followed by FF), level 3 (one 4-2 adder array, followed by FF), and level 4 (a 3-2 adder array), ending in a 64-bit final adder (not part of pipeline).]
Most commonly used high-performance multiplier
Topics
Adders and ALUs (§6.4, §6.5)
Multipliers (§6.6)
Subsystem design principles (§6.2)
Pipelining
Pipelining can be used to reduce clock period at the expense of latency:
[Figure: combinational logic 1 followed by combinational logic 2.]
Cycle Time and Latency
[Figure: two plots against # stages, one of cycle time and one of latency.]
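A simple first-order model illustrates the trade-off. It assumes the combinational delay divides evenly across stages and that each stage adds a fixed register overhead; the function name `pipeline_timing` and the numbers below are illustrative assumptions, not values from the lecture.

```python
def pipeline_timing(t_logic: float, t_register: float, stages: int) -> tuple[float, float]:
    """Return (cycle_time, latency) for an evenly split pipeline.

    Assumes t_logic divides evenly across stages and each stage adds
    a fixed register overhead t_register.
    """
    cycle_time = t_logic / stages + t_register
    latency = stages * cycle_time
    return cycle_time, latency


# Illustrative numbers only: 10 ns of logic, 0.5 ns register overhead.
for stages in (1, 2, 4, 8):
    cycle, latency = pipeline_timing(10.0, 0.5, stages)
    print(f"{stages} stages: cycle {cycle:.2f} ns, latency {latency:.2f} ns")
```

Under these assumptions the cycle time falls toward the register overhead as stages are added, while the latency grows by one register overhead per added stage.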
Data Paths
A data path is a logical and physical structure:
– bit-wise logical organization
– bit-wise physical structure
Data paths generally use busses to pass data between function units.
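Bit Slice Organization
[Figure: registers, shifter, and ALU arranged as bit slices from bit n-1 to bit 0, connected by a bus, with control signals running across the slices.]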
Data Path Cell Design
Connections may be made by:
– abutment, requiring stretching cells;
– river routing, requiring a routing channel between function units.
Project
Due 10/26
– Schematic
– Verilog/Spectre simulation results
– 10/27 presentation (10-15 PowerPoint slides)
Important (efficiency-related)
– How to add an array of instances