Asynchronous Datapath Design
• Adders
• Comparators
• Multipliers
• Registers
• Completion Detection
• Bus
• Pipeline
• …
Read Reading 3: Delay-Insensitive Adders
Asynchronous Adder Design
• Motivation
• Background: Sync and Async adders
• Delay-insensitive carry-lookahead adders
• Complexity Analysis
• Conclusions
Motivation
• Integer addition is one of the most important operations in digital computer systems
• Statistics show that in a prototypical RISC machine (DLX), 72% of the instructions perform additions (or subtractions) in the datapath.
• In ARM processors it even reaches 80%.
• The performance of processors is significantly influenced by the speed of their adders.
Background
• Adders: synchronous or asynchronous
  synchronous adders: worst-case performance
  asynchronous adders: average-case performance
• For example:
  Ripple-Carry Adders (synchronous): O(n)
  Carry-Completion Sensing Adders (asynchronous): O(log n)
Background: Binary Addition
• Worst case: the carry ripples through every bit position
       00000001
  +    11111111
  -------------
  S    00000000
  C    11111111
  -------------
      100000000
• Asynchronous adders can take advantage of average-case behavior
• Best case: no carry is generated
       00000000
  +    00000000
  -------------
  S    00000000
  C    00000000
  -------------
      000000000
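The worst and best cases above differ only in how far a carry has to ripple. The following is a minimal Python sketch, not taken from the slides, that measures the longest carry chain of an addition and averages it over random operands. The function names `longest_carry_chain` and `average_chain`, the trial count, and the seed are illustrative assumptions.

```python
import random


def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Longest run of consecutive bit positions that produce a carry-out
    belonging to one rippling carry, for an n-bit addition a + b."""
    longest = 0
    current = 0
    carry = 0  # carry into position i (no external carry-in assumed)
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        generate = ai & bi
        propagate = ai ^ bi
        if generate:
            current = 1        # a new carry starts at this position
        elif propagate and carry:
            current += 1       # the incoming carry ripples one stage further
        else:
            current = 0        # the carry is killed, or there was none
        carry = generate | (propagate & carry)
        longest = max(longest, current)
    return longest


def average_chain(n: int, trials: int = 10000, seed: int = 0) -> float:
    """Average longest carry chain over uniformly random n-bit operand pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
    return total / trials


if __name__ == "__main__":
    print(longest_carry_chain(0b00000001, 0b11111111, 8))  # worst case from the slide
    print(longest_carry_chain(0b00000000, 0b00000000, 8))  # best case from the slide
    print(average_chain(32))
```

For uniformly random operands the average printed by the last line is much smaller than n, which is the effect an adder with completion detection can exploit.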
Background
• Ripple-Carry Adders:
• One-stage full adder:
• Logic complexity: O(n)
• Time complexity: O(n)
Background
• One-stage CSCD Adder:
• Carry-Sensing Completion Detection Adders:
  Logic complexity: O(n)
  Time complexity: O(log n)
Background
• One-stage DIRCA:
• DIRCA Adders:
  Logic complexity: O(n)
  Time complexity: O(log n)
• One of the most robust adders
Carry-Lookahead Adders
• A ripple-carry adder (RCA) requires n stage-propagation delays.
• For high-speed processors, this scheme is undesirable.
• One way to improve adder performance is to compute the carries in parallel.
• That is why Carry-Lookahead Adders (CLA) were introduced.
• CLAs:
  Logic complexity: O(n)
  Time complexity: O(log n)
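To illustrate how carries can be computed in parallel in a logarithmic number of levels, here is a hedged Python sketch of a lookahead carry computation. It uses a Kogge-Stone-style parallel prefix over (generate, propagate) pairs, which is a swapped-in organization chosen for brevity: it reaches O(log n) levels but uses more operators than the O(n) tree structure the slides refer to. The names `lookahead_carries` and `cla_add` are illustrative assumptions.

```python
def lookahead_carries(a: int, b: int, n: int, c_in: int = 0):
    """Return (carries, levels): carries[i] is the carry out of bit i,
    levels is the number of prefix levels used (about log2 n)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # bit generates a carry
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # bit propagates a carry
    G, P = g[:], p[:]
    levels = 0
    d = 1
    while d < n:
        G_new, P_new = G[:], P[:]
        for i in range(d, n):
            # Combine the group ending at i with the group ending at i - d.
            G_new[i] = G[i] | (P[i] & G[i - d])
            P_new[i] = P[i] & P[i - d]
        G, P = G_new, P_new
        d *= 2
        levels += 1
    carries = [G[i] | (P[i] & c_in) for i in range(n)]
    return carries, levels


def cla_add(a: int, b: int, n: int):
    """Add two n-bit numbers using the lookahead carries; returns (sum, levels)."""
    carries, levels = lookahead_carries(a, b, n)
    total = 0
    carry_in = 0
    for i in range(n):
        p_i = ((a >> i) ^ (b >> i)) & 1
        total |= (p_i ^ carry_in) << i
        carry_in = carries[i]
    total |= carry_in << n  # final carry-out becomes bit n of the result
    return total, levels


if __name__ == "__main__":
    print(cla_add(0b00000001, 0b11111111, 8))
```

On the 8-bit worst case from the earlier slide, the carries are resolved in 3 prefix levels instead of 8 ripple stages.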
DI Carry-Lookahead Adders
• Delay-Insensitive Carry-Lookahead Adders (DICLA) may be implemented using delay-insensitive codes:
1. dual-rail signaling: inputs, sums, and carry bits
2. one-hot code: internal signals
Dual-rail code:
  a. A1=0, A0=0: no data
  b. A1=0, A0=1: valid 0
  c. A1=1, A0=0: valid 1
  d. A1=1, A0=1: illegal
One-hot code:
  a. No data: 000
  b. 001
  c. 010
  d. 100
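The dual-rail code listed above can be modelled in a few lines. This is a minimal Python sketch under the assumption that each bit is a pair (A1, A0) as on the slide; the helper names `encode_dual_rail`, `decode_dual_rail`, and `word_complete` are hypothetical and not part of the original material.

```python
NO_DATA = (0, 0)  # both rails low: no data present yet


def encode_dual_rail(bit: int):
    """Encode a binary value as (A1, A0): valid 0 is (0, 1), valid 1 is (1, 0)."""
    return (1, 0) if bit else (0, 1)


def decode_dual_rail(pair):
    """Decode (A1, A0); returns None for 'no data', raises on the illegal code."""
    a1, a0 = pair
    if (a1, a0) == (0, 0):
        return None
    if (a1, a0) == (1, 1):
        raise ValueError("illegal dual-rail code: both rails high")
    return a1


def word_complete(word) -> bool:
    """Completion detection for a word: every bit has exactly one rail high."""
    return all((a1 ^ a0) == 1 for a1, a0 in word)


if __name__ == "__main__":
    word = [encode_dual_rail(b) for b in (1, 0, 1, 1)]
    print(word, word_complete(word))
    word[2] = NO_DATA                      # one bit has not arrived yet
    print(word_complete(word))
```

Because validity is carried by the code itself, a receiver can detect that a result is ready without relying on a clock or a worst-case delay assumption.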
QDI Carry-Lookahead Adders
• DI C module:
  1. Internal signals: one-hot code (k, g, p)
  2. Input and sum bits: dual-rail signals
CLA A module
QDI Carry-Lookahead Adders
• DI D module:
  1. Internal signals: one-hot code (K, G, P)
  2. Carry bits: dual-rail signals
CLA B module
DI Carry-Lookahead Adders
• G3,2 and K3,2 can be used to speed up the carry computation too.
(Figure labels: k3, g3; K3,2, G3,2)
Speeding Up DICLA
• Idea: send the carry-generate and carry-kill signals to every stage that needs this information, so that carries can be computed immediately.
• D module with speed-up circuitry
Speeding Up DICLA
• General form:
• D module with speed-up circuitry
  for carry-kill
  for carry-generate: $g_{j-1} + g_{j-2}p_{j-1} + \dots + g_0 p_1 p_2 \cdots p_{j-1}$
• This is in fact the full carry-lookahead scheme.
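The carry-generate expression above can be checked against a plain ripple recurrence. In this hedged Python sketch the expression is read as the carry into stage j, with no external carry-in; that reading and the names `carry_full_lookahead` and `carry_ripple` are assumptions, since the slide does not preserve the left-hand side.

```python
def carry_full_lookahead(g, p, j: int) -> int:
    """Sum of products g[j-1] + g[j-2]p[j-1] + ... + g[0]p[1]p[2]...p[j-1]."""
    carry = 0
    for i in range(j):
        term = g[i]                 # a carry generated at stage i ...
        for m in range(i + 1, j):
            term &= p[m]            # ... must be propagated by stages i+1 .. j-1
        carry |= term
    return carry


def carry_ripple(g, p, j: int) -> int:
    """Reference: the same carry computed stage by stage."""
    c = 0
    for i in range(j):
        c = g[i] | (p[i] & c)
    return c


if __name__ == "__main__":
    n = 4
    for a in range(1 << n):
        for b in range(1 << n):
            g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
            p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
            for j in range(1, n + 1):
                assert carry_full_lookahead(g, p, j) == carry_ripple(g, p, j)
    print("flat lookahead expression matches the ripple recurrence for all 4-bit inputs")
```

The widest product term for stage j has j literals, so gate fan-in grows with the word length when the expression is implemented directly.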
Speeding Up DICLA
• Problems of the full carry-lookahead scheme:
  practical limitations on fan-in and fan-out, irregular structure, and many long wires
  logic complexity increases more than linearly
• Solution: use the properties of a tree-like structure
• New speed-up circuitry:
Optimization:
• Simplified D module
• Simplified D’ module
• Better logic complexity
• Delay-Insensitive again
Complexity Analysis
• DICLASP
• Logic Complexity: Θ(n)
• Time Complexity: Θ(log log n)
• Best area-time efficiency: Θ(n log log n)
SPICE Simulation:
The SPICE simulation consists of two parts:
• Random number inputs: 10,000 randomly generated input pairs
• Statistical data: running examples on a 32-bit ARM emulator
SPICE Simulation:
• SPICE simulation results: random number inputs
• Speedup:
  DIRCA vs RCA: 6.39
  DICLASP vs CLA: 2.64
SPICE Simulation:
• Breakdown of addition/subtraction operations, obtained by running three benchmark programs (Dhrystone f1, Dhrystone f2, and Espresso dc2) on a 32-bit ARM simulator
SPICE Simulation:
• SPICE simulation results: dynamic traces
• Average computation time:
  DIRCA: 9.61 ns
  DICLASP: 5.25 ns
• Speedup:
  DIRCA vs RCA: 4.1
  DICLASP vs CLA: 2.2