19/04/23 Group Talk 1
Wagging Logic: Moore's Law will eventually fix it
Charlie Brej
APT Group
University of Manchester
Introduction
  Quasi-Delay-Insensitive (QDI) approach
  Prove the high performance potential
  What is performance?
    Latency
    Throughput
  Why is async better?
    Average case performance
    Variability and data dependence
    Bit level pipelining
Forward Safe Guarding
  Ensure all wire pairs are cycled up and down
  [Figure labels: QDI, C]
Behaviour
  Viewpoint of a single output
  Many inputs
Behaviour
  All or nothing
  Synchronises inputs together
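The "all or nothing" behaviour reads like that of a Muller C-element, the component the later slides refer to. A minimal Python sketch under that assumption follows; the class name `CElement` and its `update` method are hypothetical and not taken from the talk.

```python
class CElement:
    """Behavioural model of a C-element (an assumed reading of the slide).

    The output rises only when every input is 1, falls only when every
    input is 0, and otherwise holds its previous value.
    """

    def __init__(self, initial=0):
        self.output = initial

    def update(self, inputs):
        inputs = list(inputs)
        if all(v == 1 for v in inputs):
            self.output = 1      # all inputs set: output sets
        elif all(v == 0 for v in inputs):
            self.output = 0      # all inputs reset: output resets
        # mixed inputs: hold the previous output
        return self.output


c = CElement()
steps = [(1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1), (0, 0, 0)]
trace = [c.update(bits) for bits in steps]
print(trace)  # [0, 0, 1, 1, 0]
```

The output waits for the slowest input in both directions, which is how the inputs end up synchronised together.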
Why is it so slow?
  Delays:
    Gate: 1, C-element: 2
  Stage data propagation: X
  Cycle time (times 2 for set and reset):
    Forward guarding: 2X
      C-element for each gate
    Acknowledge propagation: 2X
      C-element for each fork (fork depth ~ gate depth)
  About eight times slower than worst case!
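The factor of eight can be checked with a few lines of arithmetic. This is a sketch of one reading of the slide: a stage of gate depth d has X = d gate delays, forward guarding and acknowledge propagation each cost one C-element (delay 2) per level, and the whole thing happens twice, once for set and once for reset. The function name and the example depth are illustrative only.

```python
GATE_DELAY = 1       # from the slide
C_ELEMENT_DELAY = 2  # from the slide


def four_phase_cycle_time(gate_depth):
    """Return (X, cycle_time) for a stage with the given gate depth.

    Assumes fork depth equals gate depth, as the slide approximates.
    """
    x = gate_depth * GATE_DELAY
    forward_guarding = gate_depth * C_ELEMENT_DELAY   # 2X
    acknowledge = gate_depth * C_ELEMENT_DELAY        # 2X
    cycle = 2 * (forward_guarding + acknowledge)      # set and reset
    return x, cycle


x, cycle = four_phase_cycle_time(gate_depth=10)
print(x, cycle, cycle / x)  # 10 80 8.0
```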
Why is four-phase so slow?
  Low latency
  Low throughput
  Only 1/8th of the system doing useful work
    Rest is resetting/completing
  [Figure: stages labelled "Workie" and "Sleepy"]
Solutions
  Ultra/Hyper/Super Pipelining
    Need 8 times finer pipelining
      Impossible
    Each latch adds to the latency
  Faster completion detection
    Balanced treeing C-elements
    Arranging to suit arrival order
  Backward guarding
  Not even close to 8x improvement
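To see why balancing the completion tree helps but cannot deliver an 8x gain on its own, the sketch below compares a chain of C-elements with a balanced tree. It assumes 2-input C-elements with the delay of 2 used elsewhere in the talk; the function names are hypothetical.

```python
C_ELEMENT_DELAY = 2  # delay figure used in the talk


def chain_delay(n_signals):
    """Worst-case delay when n signals are merged by a linear chain."""
    return max(n_signals - 1, 0) * C_ELEMENT_DELAY


def balanced_tree_delay(n_signals):
    """Delay when n signals are merged by a balanced binary tree.

    (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    """
    depth = (n_signals - 1).bit_length() if n_signals > 0 else 0
    return depth * C_ELEMENT_DELAY


for n in (2, 16, 64):
    print(n, chain_delay(n), balanced_tree_delay(n))
# 2 2 2
# 16 30 8
# 64 126 12
```

The tree shortens completion detection only. The forward guarding cost per gate is untouched.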
Inspiration: Wagging Latches
  Alternate latch read/write
  Capacity of two latches
  Depth of one latch
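The three bullets can be illustrated with a small behavioural model. This is a sketch only: the class `WaggingLatches` and its methods are hypothetical names, and the model ignores handshaking and timing.

```python
class WaggingLatches:
    """Latches written and read in strict rotation (assumed behaviour).

    Each value sits in exactly one latch, so the depth seen by a token
    is one, while the pair can hold two tokens at once.
    """

    def __init__(self, ways=2):
        self.values = [None] * ways
        self.full = [False] * ways
        self.write_idx = 0  # latch that takes the next write
        self.read_idx = 0   # latch that supplies the next read

    def can_write(self):
        return not self.full[self.write_idx]

    def can_read(self):
        return self.full[self.read_idx]

    def write(self, value):
        if not self.can_write():
            raise RuntimeError("next latch in rotation is still full")
        self.values[self.write_idx] = value
        self.full[self.write_idx] = True
        self.write_idx = (self.write_idx + 1) % len(self.values)

    def read(self):
        if not self.can_read():
            raise RuntimeError("next latch in rotation is still empty")
        value = self.values[self.read_idx]
        self.full[self.read_idx] = False
        self.read_idx = (self.read_idx + 1) % len(self.values)
        return value


latches = WaggingLatches()
latches.write("a")
latches.write("b")                 # capacity of two: both writes accepted
blocked = not latches.can_write()  # a third write would have to wait
out = [latches.read(), latches.read()]
print(blocked, out)  # True ['a', 'b']
```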
Wagging Logic
  Apply same method to the logic
  Alternate logic allowing one to set while the other resets (precharges)
  [Figure: logic copies alternating between Set and Reset]
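An idealised model shows why alternating copies raises throughput. It is a sketch under loud assumptions that are not in the talk: tokens are dealt to the copies round-robin, each copy is busy for a full set-and-reset cycle of 8 time units, a new token can be issued at most once per time unit, and wagging itself costs nothing. The measured curves in the talk will differ from this.

```python
def count_results(ways, cycle_time=8, issue_interval=1, window=80):
    """Count tokens started within `window` time units (idealised)."""
    free_at = [0] * ways   # time at which each copy has finished resetting
    next_issue = 0         # earliest time the next token can be issued
    count = 0
    while True:
        copy = count % ways                       # round-robin choice
        start = max(next_issue, free_at[copy])
        if start >= window:
            return count
        free_at[copy] = start + cycle_time        # busy through set and reset
        next_issue = start + issue_interval
        count += 1


results = {ways: count_results(ways) for ways in (1, 2, 4, 8, 16)}
print(results)  # {1: 10, 2: 20, 4: 40, 8: 80, 16: 80}
```

In this model throughput grows linearly with the number of copies until the issue rate becomes the limit, after which extra copies add nothing.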
Wagging Logic
  Between wagging stages
    No need to wag
    No need to synchronize
  Wag only when communicating with non-wagging logic
Non FIFO Example
Duplicate the Logic
Connect to Complementary
A Harder Example
Duplicate the Logic
Connect to Complementary
Triplicate the Logic
Connect to the next on the list
Other example
Proof of the pudding
  Simple gate level simulation
    My own simulator
    Delays: C-element=2, Gate=1
  Example circuits
    Fibonacci sequence generators
      Vertically pipelined 64bit ripple carry adder
      Non-pipelined 8bit ripple carry adder
    16 input XOR
  Backward and Forward guarded
  Relative measurements of Speed, Power, Area
    10,000 gate delays simulation
64bit Fibonacci Performance
  [Chart: Results per 10,000 GDs (gate delays), axis 0 to 1400, against Wagging Level 1 to 19; series: Backward, Forward]
  Synchronous Worst Case: 74
8bit Fibonacci Performance
  [Chart: Results per 10,000 GDs (gate delays), axis 0 to 1400, against Wagging Level 1 to 19; series: Forward, Backward]
  Synchronous Worst Case: 500
XOR Performance
  [Chart: Results per 10,000 GDs (gate delays), axis 0 to 1200, against Wagging Level 1 to 19; series: Forward, Backward]
  Synchronous Worst/Best Case: 1250 (8 gate delays)
  Inc. Flip-Flop: 1000 (10 gate delays)
  Inc. Timing margins
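The synchronous reference figures follow from dividing the 10,000 gate delay simulation window by the cycle time. The sketch below reproduces the two XOR numbers and, going the other way, derives the cycle times implied by the synchronous worst-case figures quoted for the two Fibonacci circuits (74 and 500). The function names are illustrative only.

```python
WINDOW = 10_000  # simulation length in gate delays, as stated in the talk


def results_per_window(cycle_gate_delays):
    """Results produced in the window for a given cycle time."""
    return WINDOW / cycle_gate_delays


def cycle_from_results(results):
    """Cycle time in gate delays implied by a results count."""
    return WINDOW / results


print(results_per_window(8))               # 1250.0  XOR, worst/best case
print(results_per_window(10))              # 1000.0  XOR, including flip-flop
print(round(cycle_from_results(74), 1))    # 135.1   64bit Fibonacci
print(cycle_from_results(500))             # 20.0    8bit Fibonacci
```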
Power Consumption
  [Chart: Transitions/Instruction, axis 9500 to 12500, against Wagging Level 1 to 19; series: Backward, Forward]
  Synchronous: 610
Area
  [Chart: Components, axis 0 to 45000; data labels: 1220, 7522, 13880, 20238, 26596, 32954, 39312]
Future work
  Larger and more complex designs
    Small CPU
    Layout
    Silicon?
  Improve completion time
    Current optimal wagging ~ 5
    Target ~ 3
  Fully automated flow
    Verilog Input & Output
    Partitioning
Conclusions
  Matching and surpassing synchronous performance every time
  DI logic for performance
  Very Expensive
    20 times more power
    5 times bigger (times wagging)
  Fastest logic on the planet!
    Discounting increase in wire delays
    Assuming other things will be able to keep up