Architecture and Routing for NoC-based FPGA
Israel Cidon*
*joint work with Roman Gindin and Idit Keidar
Israel Cidon - Technion
One NoC does not fit all!
  [Figure: NoC design space. Labels: Flexibility (single application, general purpose computer); Traffic uncertainty; Chip design, Configuration, Run time; platforms shown: SOC, FPGA, CMP.]
  Source: I. Cidon and K. Goossens, in “Networks on Chips”, G. De Micheli and L. Benini, Morgan Kaufmann, 2006
Field Programmable Gate Array - 101
  Flexible: soft logic
    Configurable logic blocks (CLBs) and routing channels
    Programmed look-up tables (LUTs)
    Configurable switching boxes
  Area, power and speed efficient: hard logic
    Wire and clock infrastructure
    Special-purpose modules, e.g., CPU, SerDes
Challenges for Future FPGA
  Scalability of design methodology
  Dominance of wire delays
    Already more than 50% of delay
  Power
  Complex communication patterns
  Prototyping for NoC-based SoCs
NoC Based FPGA Architecture
  [Figure: grid floorplan whose blocks are labeled CR, CNI, R, and FR; the FR blocks are SERDES, CPU, DSP, PCI, DRAM, ETH I/F, and D/A A/D.]
  Legend:
    Functional unit
    Routers
    NoC for inter-routing
    Configurable region – User logic
    Configurable network interface
Hard or soft NoC?
  Why hard
    Interconnect is a performance bottleneck
    Interconnect power
    Part of FPGA infrastructure
  Why soft
    Application is not known when the network is built
    Provides maximum flexibility
    Prevents resource lockup
Suggested FPGA NoC Architecture
  NoC Element                                     Implementation
  Wires, repeaters, etc.                          Hard
  Routers, including VCs, buffers, QoS support    Hard
  Network interfaces                              Soft: Configurable Network Interface (CNI)
  Routing algorithm and headers                   Soft: determined in CNI
  Routing tables                                  Soft
FPGA Routing – Optimization Problem
  Set of applications
    Different architectures
    Different traffic patterns
  Implemented on the same chip
  Common efficient NoC
The NoC design problem
  The cost
    Hard grid links
      For uniform grids: the capacity of the most congested link
    NoC logic
      Hard logic for routers
      Soft logic for routing tables, headers, CNIs
  Design envelope
    Collection of designs supported by a given programmable chip
  The variables
    Number of “hard-coded” wires per link
    Possible configurable routing schemes
Routing Schemes
  XY
    Very simple logic
    Deadlock free
    Unbalanced: high cost in uniform capacity grids
  [Figure: a single route carrying flow f from v1 to v2]
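The imbalance of XY routing can be illustrated with a minimal sketch. It assumes a 2D grid addressed by (x, y) tuples, directed links written as pairs of adjacent nodes, and a hypothetical traffic pattern in which every node sends one unit of traffic to a central hotspot. The names `xy_route`, `link_loads`, and `flows` are illustrative and do not come from the slides.

```python
from collections import defaultdict

def xy_route(src, dst):
    """Directed links of the XY route: travel along X first, then along Y."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def link_loads(flows):
    """Sum the rate carried by every directed link; flows maps (src, dst) -> rate."""
    loads = defaultdict(float)
    for (src, dst), rate in flows.items():
        for link in xy_route(src, dst):
            loads[link] += rate
    return loads

# Hypothetical pattern: all other nodes of a 3x3 grid send rate 1 to (1, 1).
hotspot = (1, 1)
flows = {((x, y), hotspot): 1.0
         for x in range(3) for y in range(3) if (x, y) != hotspot}
loads = link_loads(flows)
max_capacity = max(loads.values())  # cost of a uniform grid: its most loaded link
```

In this example the two vertical links into the hotspot each carry three flows while the two horizontal links carry one each, so a uniform-capacity grid has to be provisioned for the vertical links.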
Toggle XY (TXY)
  Split packets evenly between XY and YX routes
  Deadlock avoided with 2 VCs
  Near-optimal for symmetric traffic (permutations) [Seo et al. 05; Towles & Dally 02]
  Simple
  Better balanced
  Split routes
  Does not take into account the traffic pattern
  [Figure: flow f from v1 to v2 split over the XY and YX routes, each link carrying f/2]
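The even split can be sketched as follows, under the same assumptions as before: (x, y) grid coordinates, directed links as node pairs, and a hypothetical hotspot pattern. The helper names `route` and `txy_loads` are illustrative, not taken from the slides.

```python
from collections import defaultdict

def route(src, dst, x_first):
    """Directed links of a dimension-ordered route: XY if x_first, else YX."""
    pos = list(src)
    links = []
    for axis in ((0, 1) if x_first else (1, 0)):
        while pos[axis] != dst[axis]:
            prev = tuple(pos)
            pos[axis] += 1 if dst[axis] > pos[axis] else -1
            links.append((prev, tuple(pos)))
    return links

def txy_loads(flows):
    """Put half of every flow on its XY route and half on its YX route."""
    loads = defaultdict(float)
    for (src, dst), rate in flows.items():
        for x_first in (True, False):
            for link in route(src, dst, x_first):
                loads[link] += rate / 2
    return loads

# A single flow f = 1 from v1 = (0, 0) to v2 = (2, 1): every link carries f/2.
single = txy_loads({((0, 0), (2, 1)): 1.0})

# Hypothetical pattern: all other nodes of a 3x3 grid send rate 1 to (1, 1).
hotspot = (1, 1)
flows = {((x, y), hotspot): 1.0
         for x in range(3) for y in range(3) if (x, y) != hotspot}
txy_max = max(txy_loads(flows).values())
```

For this hotspot pattern the four links into the hotspot each carry two units, compared with three units on the busiest link when the same traffic uses XY routes only. Flows whose source and destination share a row or column have identical XY and YX routes, so they are not actually split.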
Weighted Schemes
  TXY does not always produce the best results
  [Figure: "Capacity vs. XY weight on Toggle XY Routing - for (2,1) and (1,1) hotspots"; x-axis: XY fraction (0 to 1); y-axis: Max capacity (10 to 25); marked points: YX only, XY only, TXY, Optimum]
  [Figure: "Max. Capacity for graph with two hotspots at (1,1) and (1,2) on 5x5 grid"; values shown: 11.94, 15; panels: Grid with optimal weight, Grid with equal weight]
WTXY
  Given a traffic pattern, choose the XY/YX ratio that gives the lowest maximum capacity
  Compute the ratio at programming time
  Load it into the Cxy field in the router
  Router chooses the XY route with probability Cxy, otherwise YX
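One way to picke the ratio at programming time is a plain sweep over candidate values, sketched below. The sketch assumes a single Cxy value shared by all routers, expected (fractional) link loads, and a hypothetical 3x3 grid with a hotspot in the middle of one edge; the sweep itself and the names `max_capacity`, `best_cxy`, and `choose_x_first` are illustrative choices, not the authors' procedure.

```python
import random
from collections import defaultdict

def route(src, dst, x_first):
    """Directed links of a dimension-ordered route: XY if x_first, else YX."""
    pos = list(src)
    links = []
    for axis in ((0, 1) if x_first else (1, 0)):
        while pos[axis] != dst[axis]:
            prev = tuple(pos)
            pos[axis] += 1 if dst[axis] > pos[axis] else -1
            links.append((prev, tuple(pos)))
    return links

def max_capacity(flows, cxy):
    """Most loaded link when a fraction cxy of every flow goes XY, the rest YX."""
    loads = defaultdict(float)
    for (src, dst), rate in flows.items():
        for x_first, share in ((True, cxy), (False, 1.0 - cxy)):
            for link in route(src, dst, x_first):
                loads[link] += rate * share
    return max(loads.values())

def best_cxy(flows, steps=100):
    """Programming-time step: sweep Cxy over [0, 1] and keep the minimizer."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda c: max_capacity(flows, c))

def choose_x_first(cxy, rng=random):
    """Run-time step: take the XY route with probability cxy, otherwise YX."""
    return rng.random() < cxy

# Hypothetical pattern: all other nodes of a 3x3 grid send rate 1 to (1, 0).
hotspot = (1, 0)
flows = {((x, y), hotspot): 1.0
         for x in range(3) for y in range(3) if (x, y) != hotspot}
cxy = best_cxy(flows)
```

For this edge hotspot the sweep settles near Cxy = 1/6, well away from the even split, which is the kind of traffic-dependent bias the weighted scheme is meant to capture.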
TXY, WTXY Limitation
  Traffic split: packets of the same flow take different paths
  Delays may cause out-of-order arrivals
  Re-ordering buffers are costly
Ordered Routing Algorithms
  One route per source-destination (S-D) pair
  No traffic splitting
  [Figure: two panels, Unordered Routing and Ordered Routing]
Source Toggle XY (STXY)
  The route is a function of the bitwise XOR of the source and destination IDs
  Very simple algorithm
  Maximum capacity is similar to TXY
XY YX XY YX XY
YX YX XY YX
XY YX XY YX XY
YX XY YX XY YX
XY YX XY YX XY
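The slides do not spell out the exact XOR function, so the sketch below assumes one plausible rule: take the low bit of the XOR of all four coordinates and route XY when it is 0, YX otherwise. The function names are hypothetical. Under this assumption, a source at (1, 1) on a 5x5 grid yields a checkerboard of per-destination routes like the one in the grid above.

```python
def stxy_x_first(src, dst):
    """Assumed rule: XY when the low bit of sx ^ sy ^ dx ^ dy is 0, else YX."""
    (sx, sy), (dx, dy) = src, dst
    return ((sx ^ sy ^ dx ^ dy) & 1) == 0

def route_map(src, size):
    """Route label for every destination of a size x size grid ('--' marks src)."""
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            if (x, y) == src:
                row.append("--")
            else:
                row.append("XY" if stxy_x_first(src, (x, y)) else "YX")
        rows.append(" ".join(row))
    return rows

for line in route_map((1, 1), 5):
    print(line)
```

Because the choice depends only on the source and destination IDs, every packet of a given S-D pair follows the same route, while different pairs are still spread over XY and YX.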
Weighted Ordered Toggle (WOT)
  Route per S-D pair is chosen at programming time
  Each source stores a routing bit for each destination
  Objective: minimize max link capacity
  Optimal route assignment is difficult
WOT Min-max Route Assignment
  Initial assignment: STXY
  Make changes that reduce the capacity:
    Find the most loaded link
    Among the S-D pairs sharing this link, change the one that minimizes the max capacity (if possible)
  Sub-optimal
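The greedy steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the initial STXY bit uses an assumed parity rule (low bit of the XOR of all coordinates), the loop stops as soon as no single flip strictly lowers the maximum load, and the names `route`, `link_loads`, `stxy_bit`, and `wot_assign` are hypothetical.

```python
from collections import defaultdict

def route(src, dst, x_first):
    """Directed links of a dimension-ordered route: XY if x_first, else YX."""
    pos = list(src)
    links = []
    for axis in ((0, 1) if x_first else (1, 0)):
        while pos[axis] != dst[axis]:
            prev = tuple(pos)
            pos[axis] += 1 if dst[axis] > pos[axis] else -1
            links.append((prev, tuple(pos)))
    return links

def link_loads(flows, bits):
    """Load per directed link when each S-D pair follows its routing bit."""
    loads = defaultdict(float)
    for (src, dst), rate in flows.items():
        for link in route(src, dst, bits[(src, dst)]):
            loads[link] += rate
    return loads

def stxy_bit(src, dst):
    """Assumed STXY rule: XY (True) when the low bit of the XOR is 0."""
    return ((src[0] ^ src[1] ^ dst[0] ^ dst[1]) & 1) == 0

def wot_assign(flows):
    """Greedy min-max: flip one routing bit at a time on the most loaded link."""
    bits = {pair: stxy_bit(*pair) for pair in flows}
    while True:
        loads = link_loads(flows, bits)
        worst_link, worst = max(loads.items(), key=lambda kv: kv[1])
        best_pair, best_cap = None, worst
        for pair in flows:
            if worst_link not in route(pair[0], pair[1], bits[pair]):
                continue  # only S-D pairs sharing the most loaded link
            trial = dict(bits)
            trial[pair] = not bits[pair]
            cap = max(link_loads(flows, trial).values())
            if cap < best_cap:
                best_pair, best_cap = pair, cap
        if best_pair is None:
            return bits, worst  # no single flip helps: stop (possibly sub-optimal)
        bits[best_pair] = not bits[best_pair]

# Hypothetical pattern: all other nodes of a 3x3 grid send rate 1 to (1, 0).
hotspot = (1, 0)
flows = {((x, y), hotspot): 1.0
         for x in range(3) for y in range(3) if (x, y) != hotspot}
initial = max(link_loads(flows, {p: stxy_bit(*p) for p in flows}).values())
bits, final = wot_assign(flows)
```

In this example the assumed STXY start gives a maximum load of 4 and one flip brings it down to 3. Each accepted flip strictly lowers the maximum, so the loop terminates, but it can stall in a local minimum when several links tie for the maximum.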
Benchmarks
  Previous work considers uniform permutations
  Chips have one or more hotspots
    CPU, on-chip memory, off-chip memory interface
  We use several hot-spot traffic models
  Also use a real-world example
Single Hotspot
  [Figure: Number of Links vs. Capacity for XY, TXY, STXY, WTXY, WOT]
  [Figure: Capacity vs. Location of the hot spot (CORNER, CENTER, INTERNAL, HOR. EDGE, VER. EDGE) for XY, TXY, STXY, WTXY, WOT]
Two Hotspots
  [Figure: Capacity vs. Minimum Distance between the hotspots (1 to 5) for XY, TXY, STXY, WTXY, WOT]
  Maximum Capacity Design Envelope for various distances between the hotspots for WOT
Three Hotspots
  Maximum capacity vs. Minimum distance between the hotspots
  [Figure: Capacity vs. Minimum Distance between the hotspots (1 to 4) for XY, TXY, STXY, WTXY, WOT]
Mixed Traffic Model
  Three parameters per node
    A probability to be a hotspot
    A probability to send data to a hotspot
    A probability to send data to a non-hotspot
  Average improvement for WOT vs. TXY is 12% and vs. XY is 25%
  [Figure: "Performance for Phs = 0.10, Psend,hs = 0.8000, Psend,no,hs = 0.0500, Nsim = 45"; Max. capacity vs. Grid Size (5 to 10) for XY, TXY, WTXY, STXY, WOT]
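A random traffic pattern built from the three per-node probabilities of the mixed traffic model might be generated as sketched below. The interpretation is an assumption: each node independently becomes a hotspot, each ordered S-D pair independently carries a flow with the probability that matches its destination type, and every flow has unit rate. The function name `mixed_traffic` and its parameter names are hypothetical.

```python
import random

def mixed_traffic(size, p_hs, p_send_hs, p_send_no_hs, rng):
    """Draw hotspots and unit-rate flows on a size x size grid (assumed model)."""
    nodes = [(x, y) for x in range(size) for y in range(size)]
    # Each node is a hotspot with probability p_hs.
    hotspots = {node for node in nodes if rng.random() < p_hs}
    flows = {}
    for src in nodes:
        for dst in nodes:
            if src == dst:
                continue
            # The send probability depends on whether the destination is a hotspot.
            p_send = p_send_hs if dst in hotspots else p_send_no_hs
            if rng.random() < p_send:
                flows[(src, dst)] = 1.0
    return hotspots, flows

# Parameter values as in the plot title: Phs = 0.10, Psend,hs = 0.8, Psend,no,hs = 0.05.
rng = random.Random(0)  # fixed seed so the draw is repeatable
hotspots, flows = mixed_traffic(5, 0.10, 0.80, 0.05, rng)
```

Repeating the draw with different seeds and averaging the resulting maximum capacity per routing scheme would correspond to the Nsim runs in the plot title, assuming Nsim counts simulations.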
Real-World Example
  Based on Bertozzi - video encoder
  Mapping and placement are done manually
Real-World Example
  Maximum Capacity
    WOT - 1053
    STXY - 1377
    XY - 1539
  [Figure: Number of Links vs. Capacity (81 to 1539) for XY, YX, STXY, WOT]