Upload
others
View
11
Download
0
Embed Size (px)
Citation preview
Compositional Dataflow Circuits
Stephen A. Edwards Richard Townsend Martha A. Kim
Columbia University
MEMOCODE, Vienna, Austria, October 1, 2017
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
− −
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
− −
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
− −
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
− −
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
− −
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
−
−
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
−
−
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
Townsend et al.
CC ’2017
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
− −
Patience Through Handshaking
Want patient blocks to handle delays from
Memory systemsData-dependent
computations
Full buffersShared resourcesBusy computational
units
up
stream
do
wn
stream
data
valid
ready
valid ready Meaning
1 1 Token transferred1 0 Token valid; held0 − No token to transfer
Latency-insensitive Design (Carloni et al.)Elastic Circuits (Cortadella et al.)FIFOs with backpressure
Patience Through Handshaking
Want patient blocks to handle delays from
Memory systemsData-dependent
computations
Full buffersShared resourcesBusy computational
units
up
stream
do
wn
stream
data
valid
ready
valid ready Meaning
1 1 Token transferred1 0 Token valid; held0 − No token to transfer
Latency-insensitive Design (Carloni et al.)Elastic Circuits (Cortadella et al.)FIFOs with backpressure
Combinational Function Block
Strict/Unit Rate:All input tokens required to produce an output
in0
in1 out
f
Datapath
Combinational function ignores flow control
Combinational Function Block
Strict/Unit Rate:All input tokens required to produce an output
in0
in1 out
f
Valid network
Output valid if both inputs are valid
Combinational Function Block
Strict/Unit Rate:All input tokens required to produce an output
in0
in1 out
f
Ready network
Input tokens consumed if output token is consumed(output is valid and ready)
Multiplexer Block
in0 in1 in2
out
select
in0
in1
in2
select out
decoder
Demultiplexer Block
out0 out1 out2
in
select
select
in
out2
out1
out0
decoder
Buffering a Linear Pipeline (Point 1/4)
Combinational block
Data buffer:Pipeline registerwith valid, enable
01
Control Buffer:Register diverts token whendownstream suddenly stops
01
01 0
10
Long Combinational Path (Data + Valid)
Long Combinational Path (Ready)
Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)
Buffering a Linear Pipeline (Point 1/4)
Combinational blockData buffer:Pipeline registerwith valid, enable
01
Control Buffer:Register diverts token whendownstream suddenly stops
01
01 0
10
Long Combinational Path (Data + Valid)
Long Combinational Path (Ready)
Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)
Buffering a Linear Pipeline (Point 1/4)
Combinational block
Data buffer:Pipeline registerwith valid, enable
01
Control Buffer:Register diverts token whendownstream suddenly stops
01
01 0
10
Long Combinational Path (Data + Valid)
Long Combinational Path (Ready)
Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)
Buffering a Linear Pipeline (Point 1/4)
Combinational blockData buffer:Pipeline registerwith valid, enable
01
Control Buffer:Register diverts token whendownstream suddenly stops
01
01 0
10
Long Combinational Path (Data + Valid)
Long Combinational Path (Ready)
Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)
Buffering a Linear Pipeline (Point 1/4)
Combinational blockData buffer:Pipeline registerwith valid, enable
01
Control Buffer:Register diverts token whendownstream suddenly stops
01
01 0
10
Long Combinational Path (Data + Valid)
Long Combinational Path (Ready)
Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)
The Problem with Fork
Combinational Block:inputs ready whenboth valid &output ready
Fork:outputs valid onlywhen all are ready
Oops: Combinational CycleThis is not compositional
The Problem with Fork
Combinational Block:inputs ready whenboth valid &output ready
Fork:outputs valid onlywhen all are ready
Oops: Combinational CycleThis is not compositional
The Problem with Fork
Combinational Block:inputs ready whenboth valid &output ready
Fork:outputs valid onlywhen all are ready
Oops: Combinational CycleThis is not compositional
The Problem with Fork
Combinational Block:inputs ready whenboth valid &output ready
Fork:outputs valid onlywhen all are ready
Oops: Combinational CycleThis is not compositional
The Problem with Fork
Combinational Block:inputs ready whenboth valid &output ready
Fork:outputs valid onlywhen all are ready
Oops: Combinational CycleThis is not compositional
The Solution to Combinational Loops (Point 2/4)
valid
ready
XXXXX
Allowed: Combinationalpaths from valid to ready
Prohibited: Combinationalpaths from ready to valid
The Solution to Combinational Loops (Point 2/4)
valid
ready
XXXXX
Allowed: Combinationalpaths from valid to ready
Prohibited: Combinationalpaths from ready to valid
The Solution to Combinational Loops (Point 2/4)
valid
ready
XXXXX
Allowed: Combinationalpaths from valid to ready
Prohibited: Combinationalpaths from ready to valid
The Solution to Combinational Loops (Point 2/4)
valid
readyXXXXX
Allowed: Combinationalpaths from valid to ready
Prohibited: Combinationalpaths from ready to valid
The Solution to Fork: A Little State (Point 3/4)
in
out2
out1
out0
Valid out ignores readyof other outputs
Flip-flop set after tokensent suppresses duplicates
Input consumed once onetoken sent on every output
The Solution to Fork: A Little State (Point 3/4)
in
out2
out1
out0
Valid out ignores readyof other outputs
Flip-flop set after tokensent suppresses duplicates
Input consumed once onetoken sent on every output
The Solution to Fork: A Little State (Point 3/4)
in
out2
out1
out0
Valid out ignores readyof other outputs
Flip-flop set after tokensent suppresses duplicates
Input consumed once onetoken sent on every output
Nondeterministic Merge (Point 4/4)
f f fShare with
merge/demux
merge
f
demux
select
Two-Way Nondeterministic Merge Block w/ Select
in0
in1
out
sel
01
Arb
iter
“Two-way fork with multiplexed outputselected by an arbiter”
Experiments: Random Buffer Placement
0
2
4
6
2 4 6 8 10
(7 buffers)Com
plet
ion
Tim
e (µ
s)
Number of buffer pairs
GCD(100,2)
0
750
1500
2250
2 4 6 8 10
(80 buffers)
21-way Conveyor
0
1
2
3
2 4 6 8 10
(96 buffers)
BSN
Best Buffering for GCD (Manually Obtained)
Each loop has one of each buffer
Data Buffer
Control Buffer
Summary
Compositional Dataflow Networks as an IR
Patient dataflow blocks with valid/ready handshaking
1. Break downstream, upstream paths w/ two buffer types
2. Avoid comb. cycles: prohibit ready-to-valid paths
3. Add one state bit per output so forks may “race ahead”
4. Tame nondeterministic merge with a select output
Random buffer placement experiments show it works