38
Compositional Dataflow Circuits Stephen A. Edwards Richard Townsend Martha A. Kim Columbia University MEMOCODE, Vienna, Austria, October 1, 2017

Compositional Dataflow Circuits

  • Upload
    others

  • View
    11

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Compositional Dataflow Circuits

Compositional Dataflow Circuits

Stephen A. Edwards Richard Townsend Martha A. Kim

Columbia University

MEMOCODE, Vienna, Austria, October 1, 2017

Page 2: Compositional Dataflow Circuits

gcd(a,b) =if a = b

aelse if a < b

gcd(a,b −a)else

gcd(a −b,b)

1 0 1 0

a b

=

mux

1 0 1 0

1 0 1 0

fork

gcd(a,b)discard

<

1

initialtoken

1 0 1 0

demux

− −

Page 3: Compositional Dataflow Circuits

gcd(a,b) =if a = b

aelse if a < b

gcd(a,b −a)else

gcd(a −b,b)

1 0 1 0

a b

=

mux

1 0 1 0

1 0 1 0

fork

gcd(a,b)discard

<

1

initialtoken

1 0 1 0

demux

− −

Page 4: Compositional Dataflow Circuits

gcd(a,b) =if a = b

aelse if a < b

gcd(a,b −a)else

gcd(a −b,b)

1 0 1 0

a b

=

mux

1 0 1 0

1 0 1 0

fork

gcd(a,b)discard

<

1

initialtoken

1 0 1 0

demux

− −

Page 5: Compositional Dataflow Circuits

gcd(a,b) =if a = b

aelse if a < b

gcd(a,b −a)else

gcd(a −b,b)

1 0 1 0

a b

=

mux

1 0 1 0

1 0 1 0

fork

gcd(a,b)discard

<

1

initialtoken

1 0 1 0

demux

− −

Page 6: Compositional Dataflow Circuits

gcd(a,b) =if a = b

aelse if a < b

gcd(a,b −a)else

gcd(a −b,b)

1 0 1 0

a b

=

mux

1 0 1 0

1 0 1 0

fork

gcd(a,b)discard

<

1

initialtoken

1 0 1 0

demux

− −

Page 7: Compositional Dataflow Circuits

gcd(a,b) =if a = b

aelse if a < b

gcd(a,b −a)else

gcd(a −b,b)

1 0 1 0

a b

=

mux

1 0 1 0

1 0 1 0

fork

gcd(a,b)discard

<

1

initialtoken

1 0 1 0

demux

Page 8: Compositional Dataflow Circuits

gcd(a,b) =if a = b

aelse if a < b

gcd(a,b −a)else

gcd(a −b,b)

1 0 1 0

a b

=

mux

1 0 1 0

1 0 1 0

fork

gcd(a,b)discard

<

1

initialtoken

1 0 1 0

demux

Page 9: Compositional Dataflow Circuits

gcd(a,b) =if a = b

aelse if a < b

gcd(a,b −a)else

gcd(a −b,b)

Townsend et al.

CC ’2017

1 0 1 0

a b

=

mux

1 0 1 0

1 0 1 0

fork

gcd(a,b)discard

<

1

initialtoken

1 0 1 0

demux

− −

Page 10: Compositional Dataflow Circuits

Patience Through Handshaking

Want patient blocks to handle delays from

Memory systemsData-dependent

computations

Full buffersShared resourcesBusy computational

units

up

stream

do

wn

stream

data

valid

ready

valid ready Meaning

1 1 Token transferred1 0 Token valid; held0 − No token to transfer

Latency-insensitive Design (Carloni et al.)Elastic Circuits (Cortadella et al.)FIFOs with backpressure

Page 11: Compositional Dataflow Circuits

Patience Through Handshaking

Want patient blocks to handle delays from

Memory systemsData-dependent

computations

Full buffersShared resourcesBusy computational

units

up

stream

do

wn

stream

data

valid

ready

valid ready Meaning

1 1 Token transferred1 0 Token valid; held0 − No token to transfer

Latency-insensitive Design (Carloni et al.)Elastic Circuits (Cortadella et al.)FIFOs with backpressure

Page 12: Compositional Dataflow Circuits

Combinational Function Block

Strict/Unit Rate:All input tokens required to produce an output

in0

in1 out

f

Datapath

Combinational function ignores flow control

Page 13: Compositional Dataflow Circuits

Combinational Function Block

Strict/Unit Rate:All input tokens required to produce an output

in0

in1 out

f

Valid network

Output valid if both inputs are valid

Page 14: Compositional Dataflow Circuits

Combinational Function Block

Strict/Unit Rate:All input tokens required to produce an output

in0

in1 out

f

Ready network

Input tokens consumed if output token is consumed(output is valid and ready)

Page 15: Compositional Dataflow Circuits

Multiplexer Block

in0 in1 in2

out

select

in0

in1

in2

select out

decoder

Page 16: Compositional Dataflow Circuits

Demultiplexer Block

out0 out1 out2

in

select

select

in

out2

out1

out0

decoder

Page 17: Compositional Dataflow Circuits

Buffering a Linear Pipeline (Point 1/4)

Combinational block

Data buffer:Pipeline registerwith valid, enable

01

Control Buffer:Register diverts token whendownstream suddenly stops

01

01 0

10

Long Combinational Path (Data + Valid)

Long Combinational Path (Ready)

Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)

Page 18: Compositional Dataflow Circuits

Buffering a Linear Pipeline (Point 1/4)

Combinational blockData buffer:Pipeline registerwith valid, enable

01

Control Buffer:Register diverts token whendownstream suddenly stops

01

01 0

10

Long Combinational Path (Data + Valid)

Long Combinational Path (Ready)

Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)

Page 19: Compositional Dataflow Circuits

Buffering a Linear Pipeline (Point 1/4)

Combinational block

Data buffer:Pipeline registerwith valid, enable

01

Control Buffer:Register diverts token whendownstream suddenly stops

01

01 0

10

Long Combinational Path (Data + Valid)

Long Combinational Path (Ready)

Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)

Page 20: Compositional Dataflow Circuits

Buffering a Linear Pipeline (Point 1/4)

Combinational blockData buffer:Pipeline registerwith valid, enable

01

Control Buffer:Register diverts token whendownstream suddenly stops

01

01 0

10

Long Combinational Path (Data + Valid)

Long Combinational Path (Ready)

Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)

Page 21: Compositional Dataflow Circuits

Buffering a Linear Pipeline (Point 1/4)

Combinational blockData buffer:Pipeline registerwith valid, enable

01

Control Buffer:Register diverts token whendownstream suddenly stops

01

01 0

10

Long Combinational Path (Data + Valid)

Long Combinational Path (Ready)

Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)

Page 22: Compositional Dataflow Circuits

The Problem with Fork

Combinational Block:inputs ready whenboth valid &output ready

Fork:outputs valid onlywhen all are ready

Oops: Combinational CycleThis is not compositional

Page 23: Compositional Dataflow Circuits

The Problem with Fork

Combinational Block:inputs ready whenboth valid &output ready

Fork:outputs valid onlywhen all are ready

Oops: Combinational CycleThis is not compositional

Page 24: Compositional Dataflow Circuits

The Problem with Fork

Combinational Block:inputs ready whenboth valid &output ready

Fork:outputs valid onlywhen all are ready

Oops: Combinational CycleThis is not compositional

Page 25: Compositional Dataflow Circuits

The Problem with Fork

Combinational Block:inputs ready whenboth valid &output ready

Fork:outputs valid onlywhen all are ready

Oops: Combinational CycleThis is not compositional

Page 26: Compositional Dataflow Circuits

The Problem with Fork

Combinational Block:inputs ready whenboth valid &output ready

Fork:outputs valid onlywhen all are ready

Oops: Combinational CycleThis is not compositional

Page 27: Compositional Dataflow Circuits

The Solution to Combinational Loops (Point 2/4)

valid

ready

XXXXX

Allowed: Combinationalpaths from valid to ready

Prohibited: Combinationalpaths from ready to valid

Page 28: Compositional Dataflow Circuits

The Solution to Combinational Loops (Point 2/4)

valid

ready

XXXXX

Allowed: Combinationalpaths from valid to ready

Prohibited: Combinationalpaths from ready to valid

Page 29: Compositional Dataflow Circuits

The Solution to Combinational Loops (Point 2/4)

valid

ready

XXXXX

Allowed: Combinationalpaths from valid to ready

Prohibited: Combinationalpaths from ready to valid

Page 30: Compositional Dataflow Circuits

The Solution to Combinational Loops (Point 2/4)

valid

readyXXXXX

Allowed: Combinationalpaths from valid to ready

Prohibited: Combinationalpaths from ready to valid

Page 31: Compositional Dataflow Circuits

The Solution to Fork: A Little State (Point 3/4)

in

out2

out1

out0

Valid out ignores readyof other outputs

Flip-flop set after tokensent suppresses duplicates

Input consumed once onetoken sent on every output

Page 32: Compositional Dataflow Circuits

The Solution to Fork: A Little State (Point 3/4)

in

out2

out1

out0

Valid out ignores readyof other outputs

Flip-flop set after tokensent suppresses duplicates

Input consumed once onetoken sent on every output

Page 33: Compositional Dataflow Circuits

The Solution to Fork: A Little State (Point 3/4)

in

out2

out1

out0

Valid out ignores readyof other outputs

Flip-flop set after tokensent suppresses duplicates

Input consumed once onetoken sent on every output

Page 34: Compositional Dataflow Circuits

Nondeterministic Merge (Point 4/4)

f f fShare with

merge/demux

merge

f

demux

select

Page 35: Compositional Dataflow Circuits

Two-Way Nondeterministic Merge Block w/ Select

in0

in1

out

sel

01

Arb

iter

“Two-way fork with multiplexed outputselected by an arbiter”

Page 36: Compositional Dataflow Circuits

Experiments: Random Buffer Placement

0

2

4

6

2 4 6 8 10

(7 buffers)Com

plet

ion

Tim

e (µ

s)

Number of buffer pairs

GCD(100,2)

0

750

1500

2250

2 4 6 8 10

(80 buffers)

21-way Conveyor

0

1

2

3

2 4 6 8 10

(96 buffers)

BSN

Page 37: Compositional Dataflow Circuits

Best Buffering for GCD (Manually Obtained)

Each loop has one of each buffer

Data Buffer

Control Buffer

Page 38: Compositional Dataflow Circuits

Summary

Compositional Dataflow Networks as an IR

Patient dataflow blocks with valid/ready handshaking

1. Break downstream, upstream paths w/ two buffer types

2. Avoid comb. cycles: prohibit ready-to-valid paths

3. Add one state bit per output so forks may “race ahead”

4. Tame nondeterministic merge with a select output

Random buffer placement experiments show it works