Jackson Adders
Prof. David Money Harris
9 July 2010
Overview
Definitions
Tree Adders
Ling Adders
Jackson Adders
18-bit Jackson Tree
Evaluation Methodology
Preliminary Results
Addition
Carry Propagate Adder
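– Inputs: $A_{N:0}$, $B_{N:1}$
• $A_0 = C_{in}$
– Outputs: $S_{N:1}$
• Discard $C_{out}$
[Figure: carry-propagate adder block adding $A_{N:1}$ and $B_{N:1}$ with carry-in $C_{in}$, producing $S_{N:1}$ and $C_{out}$]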
Propagate, Generate, Kill, Oh My!
Bitwise Signals
– Generate: $G_{i:i} = G_i \equiv A_i B_i$
– Propagate: $P_{i:i} = P_i \equiv A_i + B_i$ (also called $\overline{K_i}$)
– XOR: $X_i \equiv A_i \oplus B_i$
Group Recursion to form prefixes
– Propagate: $P_{i:j} = P_{i:k} P_{k-1:j}$
– Generate: $G_{i:j} = G_{i:k} + P_{i:k} G_{k-1:j}$
– Group generates if upper part generates, or upper part propagates and the lower part generates
Bitwise Sum
– $S_i = X_i \oplus G_{i-1:0}$
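The definitions above are enough to build a working adder. The sketch below is a minimal illustration, not code from the talk: the name `prefix_add`, the midpoint split, and the choice $P_0 = 0$ are assumptions made here. It forms each carry $G_{i-1:0}$ with the valency-2 group recursion and then applies $S_i = X_i \oplus G_{i-1:0}$.

```python
def prefix_add(a, b, cin, n):
    """Add two n-bit integers via S_i = X_i xor G_{i-1:0}; operand bits are numbered 1..n."""
    # Bit 0 stands in for the carry-in (A_0 = Cin), so G_0 = Cin.
    # P_0 = 0 is an assumption; its value never reaches any G_{i:0}.
    g = [cin] + [(a >> t) & (b >> t) & 1 for t in range(n)]
    p = [0] + [((a >> t) | (b >> t)) & 1 for t in range(n)]
    x = [0] + [((a >> t) ^ (b >> t)) & 1 for t in range(n)]

    def group(i, j):
        """Return (G_{i:j}, P_{i:j}) using the valency-2 recursion with a midpoint split."""
        if i == j:
            return g[i], p[i]
        k = (i + j + 1) // 2                # upper part i:k, lower part k-1:j
        g_hi, p_hi = group(i, k)
        g_lo, p_lo = group(k - 1, j)
        return g_hi | (p_hi & g_lo), p_hi & p_lo

    total = 0
    for i in range(1, n + 1):
        carry_in, _ = group(i - 1, 0)       # G_{i-1:0} is the carry into bit i
        total |= (x[i] ^ carry_in) << (i - 1)
    return total                            # Cout is discarded, as on the Addition slide
```

For example, `prefix_add(11, 6, 1, 4)` returns 2, the low four bits of 11 + 6 + 1 = 18. The recursion recomputes shared prefixes for every bit; the tree adders that follow differ precisely in how they share that work.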
Higher Valency Groups
Valency-2
– Propagate: $P_{i:j} = P_{i:k} P_{k-1:j}$
– Generate: $G_{i:j} = G_{i:k} + P_{i:k} G_{k-1:j}$
Valency-3
– Propagate: $P_{i:j} = P_{i:k} P_{k-1:l} P_{l-1:j}$
– Generate: $G_{i:j} = G_{i:k} + P_{i:k} (G_{k-1:l} + P_{k-1:l} G_{l-1:j})$
Valency-4
– Propagate: $P_{i:j} = P_{i:k} P_{k-1:l} P_{l-1:m} P_{m-1:j}$
– Generate: $G_{i:j} = G_{i:k} + P_{i:k} (G_{k-1:l} + P_{k-1:l} (G_{l-1:m} + P_{l-1:m} G_{m-1:j}))$
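A quick way to see that the higher-valency cells are just collapsed valency-2 cells is to compare them over every input combination. The sketch below is illustrative; the function names are hypothetical and each sub-group is represented as a `(G, P)` pair.

```python
from itertools import product

def combine2(hi, lo):
    """Valency-2 cell: (G, P) of the upper and lower parts -> (G, P) of the group."""
    (g_hi, p_hi), (g_lo, p_lo) = hi, lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo

def combine3(a, b, c):
    """Valency-3 cell written as a single expression."""
    (ga, pa), (gb, pb), (gc, pc) = a, b, c
    return ga | (pa & (gb | (pb & gc))), pa & pb & pc

def combine4(a, b, c, d):
    """Valency-4 cell written as a single expression."""
    (ga, pa), (gb, pb), (gc, pc), (gd, pd) = a, b, c, d
    return ga | (pa & (gb | (pb & (gc | (pc & gd))))), pa & pb & pc & pd

def higher_valency_matches():
    """Compare the one-shot cells with nested valency-2 cells for all inputs."""
    pairs = list(product((0, 1), repeat=2))          # every (G, P) combination
    for a, b, c, d in product(pairs, repeat=4):
        if combine3(a, b, c) != combine2(a, combine2(b, c)):
            return False
        if combine4(a, b, c, d) != combine2(combine2(a, b), combine2(c, d)):
            return False
    return True
```

The check passes however the valency-2 cells are nested, because the group operator is associative; the single wide cell trades logic depth for a more complex gate.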
Tree Adders
How should the recursion be organized?
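[Figure: 4-bit adder drawn as three stages. 1: bitwise PG logic forms $G_i$, $P_i$ from $A_i$, $B_i$ (and $G_0$, $P_0$ from $C_{in}$). 2: group PG logic forms $G_{0:0}$, $G_{1:0}$, $G_{2:0}$, $G_{3:0}$, which serve as carries $C_0$ to $C_3$, plus $C_{out}$. 3: sum logic forms $S_1$ to $S_4$.]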
Black and Gray Cells
Black cell
– Group G and P
Gray cell
– Group G only
Inverting vs. noninverting
Higher valency
[Figure: (a) black cell, gray cell, and buffer symbols with their gate-level schematics, shown for valency-2 (inputs i:k, k–1:j) and valency-4 (inputs i:k, k–1:l, l–1:m, m–1:j); (b) inverting implementations for odd rows and even rows]
Tree Adders
[Figure: 16-bit prefix networks for (a) Brent-Kung, (b) Sklansky, (c) Han-Carlson, (d) Kogge-Stone, (e) Knowles [2,1,1,1], and (f) Ladner-Fischer. Each network takes bitwise inputs 15 down to 0 and produces the prefixes 15:0, 14:0, ..., 1:0, 0:0.]
Higher Valency Trees
Sparse Trees
Sklansky sparseness 4
– Only compute prefixes for every 4th column
– Precompute 4-bit results for each possible carry in
– Select result based on carry (group generate)
[Figure: 32-bit Sklansky tree with sparseness 4 over bits 1 to 32, producing the prefixes 4:1, 8:1, 12:1, 16:1, 20:1, 24:1, and 28:1]
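The sparseness-4 idea can be sketched in a few lines. This is an illustration under stated assumptions, not the circuit in the figure: the name `sparse4_add` is hypothetical, the two candidate sums per group are formed with ordinary integer addition, and the group carries are rippled from group to group for brevity where the slide uses a Sklansky tree.

```python
def sparse4_add(a, b, cin, n=16):
    """n-bit add (n a multiple of 4) using one carry per 4-bit group plus carry select."""
    carry = cin                          # carry into the current 4-bit group
    result = 0
    for base in range(0, n, 4):
        a4 = (a >> base) & 0xF
        b4 = (b >> base) & 0xF
        sum0 = a4 + b4                   # precomputed result for carry-in 0
        sum1 = a4 + b4 + 1               # precomputed result for carry-in 1
        chosen = sum1 if carry else sum0 # select on the group carry
        result |= (chosen & 0xF) << base
        # Group generate and propagate: the only prefixes the sparse tree must supply.
        g4 = sum0 >> 4                   # the group produces a carry on its own
        p4 = 1 if (a4 | b4) == 0xF else 0  # every bit in the group propagates
        carry = g4 | (p4 & carry)        # rippled here; a tree would compute these in parallel
    return result
```

Only one carry per four columns crosses the prefix network, which is what cuts the wiring and cell count relative to a full tree.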
Carry Selection
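[Figure: carry selection. (a) Two adders compute $S_{i:j}$ from $A_{i:j}$ and $B_{i:j}$ for carry-in 0 and carry-in 1, and a mux driven by $G_{j-1:0}$ selects between them; (b), (c) gate-level sum logic for bits 1 to 4 with $C_{in} = G_{j-1}$]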
Ling Adders
Factor some complexity out of first term
Insert it back into sum selection
Remove 1 transistor from critical path
Exploits fact that $G_i P_i = (A_i B_i)(A_i + B_i) = G_i$
Ling Equations
Define pseudogenerate: $H_{i:j} \equiv G_i + G_{i-1:j}$
– Simpler than $G_{i:j} = G_i + P_i G_{i-1:j}$
– Recreate $G_{i:j} = P_i H_{i:j} = P_i (G_i + G_{i-1:j}) = G_i + P_i G_{i-1:j}$
Define pseudopropagate: $I_{i:j} \equiv P_{i-1:j-1}$
– Shifted version of group propagate
Valency-2 recursion is same as PG
– $H_{i:j} = H_{i:k} + I_{i:k} H_{k-1:j}$
– $I_{i:j} = I_{i:k} I_{k-1:j}$
Sum: $S_i = X_i \oplus G_{i-1:0} = X_i \oplus (P_{i-1} H_{i-1:0})$
– Selection mux: $S_i = H_{i-1:0}\ ?\ (X_i \oplus P_{i-1}) : X_i$
Sum selection mux chooses $S_i$ based on late-arriving $H_{i-1:0}$
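A small behavioral model makes the Ling trick concrete. The sketch below is hypothetical: `ling_add` is a name invented here, $H_{i:0}$ is built by rippling rather than by a prefix tree, and bit 0 is given $G_0 = P_0 = C_{in}$ so that $G_i P_i = G_i$ also holds for the carry-in position.

```python
def ling_add(a, b, cin, n):
    """Add two n-bit integers using Ling pseudogenerate H and a sum-selection mux."""
    # Bit 0 stands in for the carry-in. G_0 = P_0 = Cin is an assumption made here.
    g = [cin] + [(a >> t) & (b >> t) & 1 for t in range(n)]
    p = [cin] + [((a >> t) | (b >> t)) & 1 for t in range(n)]
    x = [0] + [((a >> t) ^ (b >> t)) & 1 for t in range(n)]

    total = 0
    g_below = 0                          # G_{i-1:0}; the empty group generates nothing
    for i in range(n):                   # i is the bit just below sum bit i+1
        h = g[i] | g_below               # H_{i:0} = G_i + G_{i-1:0}
        g_group = p[i] & h               # recreate G_{i:0} = P_i H_{i:0}
        # Sum-selection mux: S_{i+1} = H_{i:0} ? (X_{i+1} xor P_i) : X_{i+1}
        s_bit = (x[i + 1] ^ p[i]) if h else x[i + 1]
        total |= s_bit << i
        g_below = g_group
    return total
```

The propagate term that was factored out of $H$ reappears only on the data input of the mux, which is the sense in which the complexity is moved off the carry path.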
Ling Circuits
Simplifies first stage
Compute $H_{i+1:i}$ in one swell foop
– Too hard: $G_{2:1} = G_2 + P_2 G_1 = A_2 B_2 + (A_2 + B_2) A_1 B_1$
– Easy: $H_{2:1} = G_2 + G_1 = A_2 B_2 + A_1 B_1$
[Figure: transistor-level gates computing (a) $G_{2:1}$ and (b) $H_{2:1}$ from $A_1$, $B_1$, $A_2$, $B_2$]
Jackson Adders
Generalized Ling technique
– Simplify logic in the prefix tree as well
– Use sum selection to reinsert missing terms
– Balance logic so both data and select to sum mux are comparable in criticality
Developed by Jackson and Talwar in 2004
– Used in Arithmetica synthesis tool
– Parameterized by architecture, valency, sparseness
– Reportedly produced superior energy-delay tradeoffs
– [Burgess09] indicates benefits over standard designs
– No comprehensible complete published designs
Jackson Logic
Define new terms
– D: a group generates or propagates a carry
$D_{i:j} = G_{i:j} + P_{i:j} = G_{i:j+1} + P_{i:j}$
• Special case: $D_{i:i} = p_i$
– B: a group generates a carry in at least one bit
$B_{i:j} = \sum_{k=j}^{i} g_k$
Rewrite group generate:
– Group generates if upper part generates or propagates and either at least one bit of upper part generates or the low part generates
$G_{i:j} = D_{i:k} (B_{i:k} + G_{k-1:j})$
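The D and B identities can be checked exhaustively for small operands. The sketch below is an illustrative check, not part of the talk; the helper names are hypothetical, empty groups are taken to give $G = 0$, $B = 0$, $P = 1$, and "+" is read as logical OR.

```python
from itertools import product

def group_signals(a, b, n):
    """Return group functions G, P, B, D and the bitwise p list for operands a and b."""
    g = [(a >> t) & (b >> t) & 1 for t in range(n)]
    p = [((a >> t) | (b >> t)) & 1 for t in range(n)]

    def G(i, j):  # group generate; an empty group (i < j) generates nothing
        out = 0
        for t in range(j, i + 1):
            out = g[t] | (p[t] & out)
        return out

    def P(i, j):  # every bit propagates; an empty group gives 1
        return int(all(p[t] for t in range(j, i + 1)))

    def B(i, j):  # at least one bit generates; an empty group gives 0
        return int(any(g[t] for t in range(j, i + 1)))

    def D(i, j):  # the group generates or propagates
        return G(i, j) | P(i, j)

    return G, P, B, D, p

def jackson_logic_holds(n=4):
    """Check the D and G identities for every pair of n-bit operands and every split k."""
    for a, b in product(range(1 << n), repeat=2):
        G, P, B, D, p = group_signals(a, b, n)
        for i in range(n):
            if D(i, i) != p[i]:                                  # D_{i:i} = p_i
                return False
            for j in range(i + 1):
                if D(i, j) != (G(i, j + 1) | P(i, j)):           # D = G_{i:j+1} + P_{i:j}
                    return False
                for k in range(j + 1, i + 1):
                    if G(i, j) != (D(i, k) & (B(i, k) | G(k - 1, j))):
                        return False
    return True
```

The identities depend on $g_i$ implying $p_i$, which is why the check builds both from real operand bits instead of treating them as independent inputs.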
Reduced Generate
Again, $G_{i:j} = D_{i:k} (B_{i:k} + G_{k-1:j})$
Rename bracketed term reduced generate R
$R^p_{i:j} = B_{i:i-p+1} + G_{i-p:j}$
– $R^p$ has the top $p$ propagate signals stripped out
– $R^0_{i:j} = G_{i:j}$
– $R^1_{i:j} = H_{i:j}$
– Jackson considers $p \ge 2$
Group generate can be rewritten in terms of R
$G_{i:j} = D_{i:i-p+1} R^p_{i:j}$
– Computing R prefixes can be easier than G
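The relation between $G$ and the reduced generate can be confirmed the same way for every strip count $p$ from 0 up to the group width. The code below is a sketch with hypothetical names; in it, `s` plays the role of the superscript $p$.

```python
from itertools import product

def group_signals(a, b, n):
    """Return group functions G, B, D for operands a and b (bits 0..n-1)."""
    g = [(a >> t) & (b >> t) & 1 for t in range(n)]
    p = [((a >> t) | (b >> t)) & 1 for t in range(n)]

    def G(i, j):  # group generate; an empty group gives 0
        out = 0
        for t in range(j, i + 1):
            out = g[t] | (p[t] & out)
        return out

    def B(i, j):  # at least one bit generates; an empty group gives 0
        return int(any(g[t] for t in range(j, i + 1)))

    def D(i, j):  # generates or propagates; an empty group gives 1
        return G(i, j) | int(all(p[t] for t in range(j, i + 1)))

    return G, B, D

def reduced_generate_holds(n=4):
    """Check G_{i:j} = D_{i:i-s+1} R^s_{i:j} for all operands, groups, and strip counts s."""
    for a, b in product(range(1 << n), repeat=2):
        G, B, D = group_signals(a, b, n)
        for i in range(n):
            for j in range(i + 1):
                for s in range(i - j + 2):               # s = 0 .. width of the group
                    r = B(i, i - s + 1) | G(i - s, j)    # R^s_{i:j}
                    if G(i, j) != (D(i, i - s + 1) & r):
                        return False
    return True
```

With `s = 0` the check reduces to $R^0 = G$, and with `s = 1` it is the Ling identity $G_{i:j} = P_i H_{i:j}$.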
Hyperpropagate
Another term will be useful for recursion: hyperpropagate
Define $Q^p_{i:j} = P_{i:i-p+1} D_{i-p:j}$
– Special case for 2-bit groups: $Q^1_{i:i-1} = Q^2_{i:i-1} = P_{i:i-1}$
Jackson Recursions
Valency-2 is no simpler
$R^p_{i:j} = R^p_{i:k} + Q^{i-p-k+1}_{i-p:k-q} R^q_{k-1:j}$
$Q^p_{i:j} = Q^p_{i:k} (R^{i-p-k+1}_{i-p:k-q} + Q^q_{k-1:j})$
– Compare with $G_{total} = G_{top} + P_{top} G_{bot}$ and $P_{total} = P_{top} P_{bot}$
Valency-3 simplifies R at expense of Q
$R^{i-p+1}_{i:j} = R^{i-k+1}_{i:k} + R^{k-p}_{k-1:l} + Q^{p-l}_{p-1:m} R^{l-m}_{l-1:j}$
$Q^{i-p+1}_{i:j} = Q^{i-k+1}_{i:k} Q^{k-p}_{k-1:l} (R^{p-l}_{p-1:m} + Q^{l-m}_{l-1:j})$
– Compare with $G_{total} = G_{top} + P_{top} G_{mid} + P_{top} P_{mid} G_{bot}$ and $P_{total} = P_{top} P_{mid} P_{bot}$
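Because the recursions carry several indices, a brute-force check is a useful sanity test. The sketch below is one reading of the slide, not a published reference: it takes $R^s_{i:j} = B_{i:i-s+1} + G_{i-s:j}$ and $Q^s_{i:j} = P_{i:i-s+1} D_{i-s:j}$ as definitions, reads "+" as OR and juxtaposition as AND, and treats the $p$ of the valency-3 recursion as a bit index (named `t` in the code). All function names are hypothetical.

```python
from itertools import product

def rq_functions(a, b, n):
    """Return R(s, i, j) and Q(s, i, j) built from the closed-form definitions."""
    g = [(a >> t) & (b >> t) & 1 for t in range(n)]
    p = [((a >> t) | (b >> t)) & 1 for t in range(n)]

    def G(i, j):                     # empty group gives 0
        out = 0
        for t in range(j, i + 1):
            out = g[t] | (p[t] & out)
        return out

    def P(i, j):                     # empty group gives 1
        return int(all(p[t] for t in range(j, i + 1)))

    def B(i, j):                     # empty group gives 0
        return int(any(g[t] for t in range(j, i + 1)))

    def R(s, i, j):
        return B(i, i - s + 1) | G(i - s, j)

    def Q(s, i, j):
        return P(i, i - s + 1) & (G(i - s, j) | P(i - s, j))

    return R, Q

def valency2_holds(R, Q, n):
    for i in range(n):
        for j in range(i):
            for k in range(j + 1, i + 1):          # upper part i:k, lower part k-1:j
                for s in range(i - k + 2):         # written p on the slide
                    for q in range(k - j + 1):
                        mid = i - s - k + 1        # superscript of the middle term
                        r = R(s, i, k) | (Q(mid, i - s, k - q) & R(q, k - 1, j))
                        h = Q(s, i, k) & (R(mid, i - s, k - q) | Q(q, k - 1, j))
                        if R(s, i, j) != r or Q(s, i, j) != h:
                            return False
    return True

def valency3_holds(R, Q, n):
    for i in range(n):
        for j in range(i):
            for k in range(j + 2, i + 1):          # top part i:k
                for l in range(j + 1, k):          # middle part k-1:l, bottom part l-1:j
                    for t in range(l, k + 1):      # bit index written p on the slide
                        for m in range(j, l + 1):
                            r = (R(i - k + 1, i, k) | R(k - t, k - 1, l)
                                 | (Q(t - l, t - 1, m) & R(l - m, l - 1, j)))
                            h = (Q(i - k + 1, i, k) & Q(k - t, k - 1, l)
                                 & (R(t - l, t - 1, m) | Q(l - m, l - 1, j)))
                            if R(i - t + 1, i, j) != r or Q(i - t + 1, i, j) != h:
                                return False
    return True

def jackson_recursions_hold(n=4):
    """Run both checks for every pair of n-bit operands."""
    for a, b in product(range(1 << n), repeat=2):
        R, Q = rq_functions(a, b, n)
        if not (valency2_holds(R, Q, n) and valency3_holds(R, Q, n)):
            return False
    return True
```

In the valency-3 case the R cell is an OR of two signals and one AND term, whereas the Q cell needs an AND of two signals with an OR-AND term, which matches the remark that R is simplified at the expense of Q.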
Valency-3 Circuits
Compound gate implementation
Simpler gate implementation
[Figure: valency-3 cells computing $R_{total}$ from $R_{top}$, $R_{mid}$, $R_{bot}$, $Q_{flex}$ and $Q_{total}$ from $Q_{top}$, $Q_{mid}$, $Q_{bot}$, $R_{flex}$, drawn once with compound gates and once with simpler gates]
Logical Effort of Valency-3
                PG      RQ Compound   RQ Simpler
G_generate      4       2.67          2.22
G_propagate     1.67    3.33          2.77
P_generate      5       4.33          4
P_propagate     4       4.66          4
Sum Selection
Select sum based on $R^p_{i-1:0}$
$s_i = x_i \oplus G_{i-1:0}$
$\quad = x_i \oplus (D_{i-1:i-p} R^p_{i-1:0})$
$\quad = \overline{R^p_{i-1:0}}\, x_i + R^p_{i-1:0} (x_i \oplus D_{i-1:i-p})$
– Requires p-bit D signal for sum-selection data input
• This is the complexity that is factored out of R
D recursion
$D_{i:j} = D_{i:i-p+1} (R^p_{i:k} + Q^{i-k-p+1}_{i-p:j})$
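The sum-selection mux and the D recursion can both be exercised with a short model. This is a sketch under assumptions: there is no carry-in, bits are numbered from 0, `strip` caps the superscript $p$ at the number of bits below the sum bit, and every name is hypothetical.

```python
from itertools import product

def group_functions(a, b, n):
    """Return D(i, j), R(s, i, j), Q(s, i, j) from the closed-form definitions."""
    g = [(a >> t) & (b >> t) & 1 for t in range(n)]
    p = [((a >> t) | (b >> t)) & 1 for t in range(n)]

    def G(i, j):                     # empty group gives 0
        out = 0
        for t in range(j, i + 1):
            out = g[t] | (p[t] & out)
        return out

    def P(i, j):                     # empty group gives 1
        return int(all(p[t] for t in range(j, i + 1)))

    def B(i, j):                     # empty group gives 0
        return int(any(g[t] for t in range(j, i + 1)))

    def D(i, j):
        return G(i, j) | P(i, j)

    def R(s, i, j):
        return B(i, i - s + 1) | G(i - s, j)

    def Q(s, i, j):
        return P(i, i - s + 1) & D(i - s, j)

    return D, R, Q

def jackson_sum(a, b, n, strip):
    """n-bit sum with each s_i selected by R^p_{i-1:0}, where p = min(strip, i)."""
    D, R, Q = group_functions(a, b, n)
    total = 0
    for i in range(n):
        x_i = ((a >> i) ^ (b >> i)) & 1
        s = min(strip, i)                    # cannot strip more bits than lie below bit i
        select = R(s, i - 1, 0)              # late-arriving select R^p_{i-1:0}
        data = x_i ^ D(i - 1, i - s)         # data input x_i xor D_{i-1:i-p}
        total |= (data if select else x_i) << i
    return total

def d_recursion_holds(n=4):
    """Check D_{i:j} = D_{i:i-s+1} (R^s_{i:k} + Q^{i-k-s+1}_{i-s:j}) for all valid indices."""
    for a, b in product(range(1 << n), repeat=2):
        D, R, Q = group_functions(a, b, n)
        for i in range(n):
            for j in range(i + 1):
                for s in range(i - j + 2):
                    for k in range(j, i - s + 2):
                        rhs = D(i, i - s + 1) & (R(s, i, k) | Q(i - k - s + 1, i - s, j))
                        if D(i, j) != rhs:
                            return False
    return True
```

Raising `strip` shortens what the R tree must compute but widens the D group on the data input, which is the balance between select and data criticality that the slides describe.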
Prior Work
[Jackson04]
+ Introduced R and Q
+ Showed how to compute a single sum output
- Does not show how to build an entire adder
- Does not include recursions for D, valency-2 R/Q
[Burgess09]
+ Comments on critical path
+ Comparisons suggest benefits of Jackson adder
- Hard to decipher diagram of 24-bit adder
Example
18-bit Jackson Adder
– Sklansky tree with sparseness 2
– Valency-2 initial stage (like Ling)
– Valency-3 2nd and 3rd stages
– Only 4 levels of noninverting logic
Initial Stage
Reduced Generate
$R^1_{2i+1:2i} = g_{2i+1} + g_{2i} = a_{2i+1} b_{2i+1} + a_{2i} b_{2i}$
$R^1_{1:0} = a_0 + a_1 b_1$
Hyperpropagate
$Q^1_{2i+2:2i+1} = p_{2i+2} p_{2i+1} = (a_{2i+2} + b_{2i+2})(a_{2i+1} + b_{2i+1})$
Also will need $g_i$ for even bits, $p_i$ for odd bits, $x_i$ for all bits
– For sum selection logic
Second Stage
Compute 3 and 6-bit group signals
– Note potential for sharing common terms
$R^3_{17:12} = R^1_{17:16} + R^1_{15:14} + Q^1_{14:13} R^1_{13:12}$
$R^1_{15:12} = R^1_{15:14} + Q^1_{14:13} R^1_{13:12}$
$R^3_{11:6} = R^1_{11:10} + R^1_{9:8} + Q^1_{8:7} R^1_{7:6}$
$R^1_{9:6} = R^1_{9:8} + Q^1_{8:7} R^1_{7:6}$
$R^3_{5:0} = R^1_{5:4} + R^1_{3:2} + Q^1_{2:1} R^1_{1:0}$
$R^1_{3:0} = R^1_{3:2} + Q^1_{2:1} R^1_{1:0}$
$Q^3_{14:9} = Q^2_{14:13} Q^1_{12:11} (R^1_{11:10} + Q^1_{10:9})$
$Q^1_{12:9} = Q^1_{12:11} (R^1_{11:10} + Q^1_{10:9})$
$Q^3_{8:3} = Q^2_{8:7} Q^1_{6:5} (R^1_{5:4} + Q^1_{4:3})$
$Q^1_{6:3} = Q^1_{6:5} (R^1_{5:4} + Q^1_{4:3})$
Third Stage
Reduced generate signals for all groups
$R^9_{17:0} = R^3_{17:12} + R^3_{11:6} + Q^3_{8:3} R^3_{5:0}$
$R^7_{15:0} = R^1_{15:12} + R^3_{11:6} + Q^3_{8:3} R^3_{5:0}$
$R^5_{13:0} = R^1_{13:12} + R^3_{11:6} + Q^3_{8:3} R^3_{5:0}$
$R^3_{11:0} = R^3_{11:6} + Q^3_{8:3} R^3_{5:0}$
$R^1_{9:0} = R^1_{9:6} + Q^3_{8:3} R^3_{5:0}$
$R^1_{7:0} = R^1_{7:6} + Q^1_{6:3} R^3_{5:0}$
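One way to sanity-check the third stage is to confirm that each composed signal still recreates the true group generate once multiplied by its D term, since that product is what sum selection relies on. The sketch below is an assumption-laden reading of the slide: it takes "+" as OR and juxtaposition as AND, evaluates the second-stage inputs from the closed-form definitions of $R$ and $Q$, uses seeded random 18-bit operands, and checks the composed signals through $G = D \cdot R$ rather than against the closed form. All names are hypothetical.

```python
import random

def group_functions(a, b, n):
    """Return G(i, j), D(i, j), R(s, i, j), Q(s, i, j) for operands a and b."""
    g = [(a >> t) & (b >> t) & 1 for t in range(n)]
    p = [((a >> t) | (b >> t)) & 1 for t in range(n)]

    def G(i, j):                     # empty group gives 0
        out = 0
        for t in range(j, i + 1):
            out = g[t] | (p[t] & out)
        return out

    def P(i, j):                     # empty group gives 1
        return int(all(p[t] for t in range(j, i + 1)))

    def B(i, j):                     # empty group gives 0
        return int(any(g[t] for t in range(j, i + 1)))

    def D(i, j):
        return G(i, j) | P(i, j)

    def R(s, i, j):
        return B(i, i - s + 1) | G(i - s, j)

    def Q(s, i, j):
        return P(i, i - s + 1) & D(i - s, j)

    return G, D, R, Q

def third_stage_holds(trials=300, seed=0):
    """Check that D times each third-stage output equals the group generate it stands for."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.getrandbits(18), rng.getrandbits(18)
        G, D, R, Q = group_functions(a, b, 18)
        tail = Q(3, 8, 3) & R(3, 5, 0)               # shared term Q^3_{8:3} R^3_{5:0}
        outputs = {
            (17, 9): R(3, 17, 12) | R(3, 11, 6) | tail,       # R^9_{17:0}
            (15, 9): R(1, 15, 12) | R(3, 11, 6) | tail,       # R^7_{15:0}
            (13, 9): R(1, 13, 12) | R(3, 11, 6) | tail,       # R^5_{13:0}
            (11, 9): R(3, 11, 6) | tail,                      # R^3_{11:0}
            (9, 9): R(1, 9, 6) | tail,                        # R^1_{9:0}
            (7, 7): R(1, 7, 6) | (Q(1, 6, 3) & R(3, 5, 0)),   # R^1_{7:0}
        }
        for (top, low), r in outputs.items():
            if G(top, 0) != (D(top, low) & r):       # G_{top:0} = D_{top:low} R
                return False
    return True
```

The D groups used here, such as $D_{17:9}$ and $D_{15:9}$, are the medium-length groups that the sum-selection data inputs need.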
D Logic
Medium-length groups of D are required for sum selection
$D_{17:9} = D_{17:15} (R^3_{17:12} + Q^3_{14:9}) = D_{17:17} (R^1_{17:16} + Q^1_{16:15}) (R^3_{17:12} + Q^3_{14:9})$
$D_{15:9} = D_{15:15} (R^1_{15:12} + Q^3_{14:9})$
$D_{13:9} = D_{13:13} (R^1_{13:12} + Q^1_{12:9})$
$D_{11:9} = D_{11:11} (R^1_{11:10} + Q^1_{10:9})$
$D_{9:9} = p_9$
$D_{7:7} = p_7$
$D_{5:3} = G_{5:4} + P_{5:3} = p_5 (R^1_{5:4} + Q^1_{4:3}) = D_{5:5} (R^1_{5:4} + Q^1_{4:3})$
$D_{3:3} = p_3$
$D_{1:1} = p_1$
Note that $D_{17:9}$ depends on $R^3_{17:12}$
– Hence, arrives at same time as $R^9_{17:0}$
Sum Selection
Sparseness of 2 requires 1-bit ripple from even to odd
$s_{2i} = x_{2i} \oplus G_{2i-1:0} = x_{2i} \oplus (D_{2i-1:2i-p} R^p_{2i-1:0})$
$\quad = R^p_{2i-1:0}\ ?\ (x_{2i} \oplus D_{2i-1:2i-p}) : x_{2i}$
$s_{2i+1} = x_{2i+1} \oplus (g_{2i} + x_{2i} G_{2i-1:0}) = x_{2i+1} \oplus (g_{2i} + x_{2i} D_{2i-1:2i-p} R^p_{2i-1:0})$
$\quad = R^p_{2i-1:0}\ ?\ (x_{2i+1} \oplus (g_{2i} + x_{2i} D_{2i-1:2i-p})) : (x_{2i+1} \oplus g_{2i})$
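The even/odd pairing can be modelled directly: one select signal per pair of sum bits, with the odd bit absorbing a 1-bit ripple through $g_{2i}$ and $x_{2i}$. The sketch below is illustrative only; it assumes no carry-in, bits numbered from 0, an even word length, and hypothetical names throughout.

```python
def sparse2_sum(a, b, n, strip):
    """n-bit sum (n even) where bits 2i and 2i+1 share the select R^p_{2i-1:0}."""
    g = [(a >> t) & (b >> t) & 1 for t in range(n)]
    p = [((a >> t) | (b >> t)) & 1 for t in range(n)]
    x = [((a >> t) ^ (b >> t)) & 1 for t in range(n)]

    def G(i, j):                     # group generate; empty group gives 0
        out = 0
        for t in range(j, i + 1):
            out = g[t] | (p[t] & out)
        return out

    def B(i, j):                     # at least one bit generates; empty group gives 0
        return int(any(g[t] for t in range(j, i + 1)))

    def D(i, j):                     # generates or propagates; empty group gives 1
        return G(i, j) | int(all(p[t] for t in range(j, i + 1)))

    total = 0
    for i in range(n // 2):
        even, odd = 2 * i, 2 * i + 1
        s = min(strip, even)                         # strip count p, capped by bits below
        select = B(even - 1, even - s) | G(even - 1 - s, 0)   # R^p_{2i-1:0}
        d = D(even - 1, even - s)                    # D_{2i-1:2i-p}
        if select:
            s_even = x[even] ^ d
            s_odd = x[odd] ^ (g[even] | (x[even] & d))
        else:
            s_even = x[even]
            s_odd = x[odd] ^ g[even]                 # only the local generate can carry
        total |= (s_even << even) | (s_odd << odd)
    return total
```

Both mux data inputs for the odd bit are available early, so the late select still sees a single mux delay even though the odd bit includes the ripple.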
Prefix Network
[Figure: prefix network of the 18-bit Jackson adder. Inputs a0 and a1, b1 through a18, b18 feed initial-stage blocks 0 to 8, which produce $R^1_{2i+1:2i}$, $Q^1_{2i+2:2i+1}$, $x_{2i+1}$, $x_{2i+2}$, $g_{2i+2}$, and $p_{2i+1}$. Second-stage blocks 0 to 2 produce $R^1_{6j+3:6j}$, $R^3_{6j+5:6j}$, $Q^1_{6j+6:6j+3}$, and $Q^3_{6j+8:6j+3}$. The third stage produces $R^1_{7:0}$, $R^1_{9:0}$, $R^3_{11:0}$, $R^5_{13:0}$, $R^7_{15:0}$, and $R^9_{17:0}$. Sum-selection blocks produce $s_1$ through $s_{18}$, forming $s_{2k}$ and $s_{2k+1}$ from $R^p_{2k-1:0}$, $D_{2k-1:2k-p}$, $x_{2k}$, $x_{2k+1}$, and $g_{2k}$. Noncritical logic is buffered.]
Notes: Black cells compute R and Q. Gray cells compute only R. D network not shown.
Observations
Only 4 levels of noninverting logic
$D_{17:9}$ is critical
– Too much factored out of $R^9_{17:0}$
– Could eliminate need by doing a 2-bit ripple into $s_{18}$
$s_{18} = x_{18} \oplus G_{17:0}$
$\quad = x_{18} \oplus (g_{17} + p_{17} g_{16} + p_{17} p_{16} G_{15:0})$
$\quad = x_{18} \oplus (g_{17} + p_{17} g_{16} + p_{17} p_{16} D_{15:9} R^7_{15:0})$
$\quad = D_{15:9} R^7_{15:0}\ ?\ (x_{18} \oplus (g_{17} + x_{17} (g_{16} + p_{16}))) : (x_{18} \oplus (g_{17} + x_{17} g_{16}))$
Comparison Methodology
Goal: energy-delay curves for Jackson adders compared to conventional adders
How can we objectively compare against the best conventional design?
– Technology mapping challenges
– Sizing
• Gatesizer limitations
• SCOT is better, but we only have 130 nm models
– Inadequate design effort on conventional cases
Plan: synthesize with Design Compiler
– Compare against assign y = a + b;
Preliminary Results
130 nm Artisan library for IBM CMOS8sf
– 1.2 V
– FO4 Delay: 55 ps
Fastest designs are 570 ps (10 FO4)
Jackson takes more energy except at very long delay
s18 optimization helps at fastest delays
[Figure: Energy-Delay Tradeoffs. Energy (fJ, axis 0 to 1800) versus Delay (ns, axis 0 to 2) for three series: Jackson, Behavioral, JacksonOptS18]
Optimization Ideas
Compare against Design Compiler architectures
– Starts with NAND/NOR to compute ~gi, ~pi
• Computes xi = pi * ~gi to avoid costly XORs
– Appears to use valency-2 Sklansky tree with inverting gates
– Final XOR
Logical effort analysis of critical path
– Look for areas to reduce effort
Architecture
– Valency: consider direct bitwise PG, followed by valency-3 Jackson tree
– Sparseness (sparseness 3 in tree above?, sparseness 1)
– Sklansky vs. Kogge-Stone
Verilog coding
– Does sharing of terms explicitly help or hurt?
– Code tuning experiments
Sun Feedback
Issues raised at Sun review on 9 July 2010
– Should we use SCOT to evaluate the effects of continuous sizing?
– Follow SCOT up with SPICE
– Start without wire loads, add later
– Wire load modeling in Design Compiler
Short-Term Action Items
Adder modeling (write eqns, code in Verilog, compare to DC)
– 32-bit Sklansky valency-2 baseline similar to DC
• NAND/NOR to form Pbar, Gbar
• P * Gbar to form X
• Inverting stages of group logic
• Final XOR
• Does it exactly match DC results?
– 27-bit Jackson (1-bit, followed by 3 radix-3 stages)
– 54-bit Jackson (2-bit Ling PG, followed by 3 radix-3 stages)
– Explore optimization of 18-bit design
Logical effort analysis of critical path through 18-bit Jackson
Tool to automatically generate energy-delay curves with DC
Tool flow for DC 2010 with placement and expected wire cap
Subversion repository setup
Selection of cell library
Cell Library
IBM 45 nm partially-depleted SOI 12S ARM Library
– sc12_base_v31_rvt_soi12s0_ss_nominal_max_0p90v_125c_mxs.lib
– A12TR library with regular Vt (RVT) transistors
– 12 track cell height (1.68 µm)
– Typical operating point: 1.0 V, 25 C
– We use worst-case slow-slow, 0.9 V, 125 C library
• Use Maxsol (mxs) version for worst-case history effect
– 1X inverter INV_X1B_A12TR:
• Width = 0.38 µm
• Cin = 1.6 fF
• Intrinsic delay: 16.6 ps rise / 14.1 fall / 15.3 average
• Kload: 1.46 ps/fF rise / 1.17 fall / 1.3 average
• FO4 delay = 15.3 ps + 1.3 ps/fF * 1.6 fF * 4 ≈ 24 ps
– But .lib for 21 ps slew rate, 7.9 fF load suggests tpdf = 17 ps, tpdr = 23 ps, tpd = 20 ps, tf = 13 ps, tr = 23 ps
• Switching energy: 0.00078 µW/MHz ≈ 0.8 fJ
– equals 0.5 Cin VDD^2
• Leakage power: 0.1 µW (very high!)
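The FO4 and switching-energy figures on this slide follow from simple arithmetic, reproduced below as a sketch. Two assumptions are made here and are not stated on the slide: Kload is treated as picoseconds per femtofarad, which is what the slide's own arithmetic implies, and the energy is evaluated at the 1.0 V typical supply. The function names are hypothetical.

```python
def fo4_delay_ps(intrinsic_ps, kload_ps_per_ff, cin_ff, fanout=4):
    """Delay of an inverter driving `fanout` copies of itself: intrinsic + Kload * load."""
    return intrinsic_ps + kload_ps_per_ff * cin_ff * fanout

def switching_energy_fj(cin_ff, vdd_v):
    """0.5 * C * VDD^2 with C in fF and VDD in volts, which gives fJ."""
    return 0.5 * cin_ff * vdd_v ** 2

# Average intrinsic delay 15.3 ps, average Kload 1.3 (assumed ps/fF), Cin 1.6 fF.
fo4 = fo4_delay_ps(15.3, 1.3, 1.6)          # about 23.6 ps, rounded to 24 ps on the slide
energy = switching_energy_fj(1.6, 1.0)      # about 0.8 fJ at an assumed VDD of 1.0 V
```

At the 0.9 V corner of the characterized library the same formula gives about 0.65 fJ, so the quoted 0.8 fJ agrees with the 1.0 V reading.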
Summary
Jackson adders appear to offer potential benefits
– Logical effort
– Arithmetica results
– Burgess results
Preliminary synthesis results don’t yet demonstrate the advantages
HMC 2010-11 Clay-Wolkin Research goals
– Understand Jackson design space
– Logical effort analysis of critical path
– Develop Jackson adders superior to conventional Design Compiler results
References
[Burgess09] N. Burgess, “Implementation of recursive Ling adders in CMOS VLSI,” Proc. Asilomar Conf. Signals, Systems and Computers, 2009, pp. 1777-1781.
[Jackson04] R. Jackson and S. Talwar, “High speed binary addition,” Proc. Asilomar Conf. Signals, Systems and Computers, 2004, pp. 1350-1353.
[Jackson08] R. Jackson, “Data detection algorithms for perpendicular magnetic recording in the presence of strong media noise,” Ph.D. thesis, Department of Mathematics, University of Warwick, 2008.
[Ling81] H. Ling, "High-speed binary adder," IBM J. Research and Development, vol. 25, no. 3, May 1981, pp. 156-166.
[Patil07] D. Patil, O. Azizi, M. Horowitz, R. Ho, and R. Ananthraman, "Robust energy-efficient adder topologies," Proc. Computer Arithmetic Symp., Jun. 2007, pp. 16-28.
[Weste10] N. Weste and D. Money Harris, CMOS VLSI Design, 4th Ed., Boston: Addison-Wesley, 2010.
[Zlatanovici09] R. Zlatanovici, S. Kao, and B. Nikolic, “Energy-delay optimization of 64-bit carry-lookahead adders with a 240 ps 90 nm CMOS design example,” IEEE J. Solid-State Circuits, vol. 44, no. 2, Feb. 2009, pp. 569-583.