Architecture and Routing for NoC-based FPGA

Israel Cidon*

*joint work with Roman Gindin and Idit Keidar

Israel Cidon - Technion

One NoC does not fit all!

[Figure: SoC, FPGA and CMP placed along axes of flexibility and traffic uncertainty. A single-application SoC has its traffic known at chip design, an FPGA at configuration time, and a general-purpose computer (CMP) only at run time.]

I. Cidon and K. Goossens, in "Networks on Chips", G. De Micheli and L. Benini, Morgan Kaufmann, 2006.

Field Programmable Gate Array - 101

Flexible soft logic:
- Configurable logic blocks (CLBs) and routing channels
- Programmed look-up tables (LUTs)
- Configurable switching boxes

Area-, power- and speed-efficient hard logic:
- Wire and clock infrastructure
- Special-purpose modules, e.g., CPU, SerDes

Challenges for Future FPGA
- Scalability of design methodology
- Dominance of wire delays: already more than 50% of delay
- Power
- Complex communication patterns
- Prototyping for NoC-based SoCs

NoC-Based FPGA Architecture

[Figure: a grid of routers (R) connected by the NoC. Each router attaches either to a configurable region of user logic (CR) through a configurable network interface (CNI), or to a functional unit (FR) such as a CPU, DSP, DRAM, PCI, SerDes, Ethernet interface (ETH I/F), or D/A and A/D converter. Legend: functional unit; routers; NoC for inter-routing; configurable region (user logic); configurable network interface.]

Hard or soft NoC?

Why hard:
- Interconnect is a performance bottleneck
- Interconnect power
- Part of the FPGA infrastructure

Why soft:
- The application is not known when the network is built
- Provides maximum flexibility
- Prevents resource lockup

Suggested FPGA NoC Architecture (NoC element: implementation)
- Wires, repeaters, etc.: hard
- Routers, including VCs, buffers, QoS support: hard
- Network interfaces: soft, Configurable Network Interface (CNI)
- Routing algorithm and headers: soft, determined in the CNI
- Routing tables: soft

FPGA Routing – Optimization Problem

A set of applications, with different architectures and different traffic patterns, is implemented on the same chip. The goal is one common, efficient NoC that serves all of them.

The NoC design problem

The cost:
- Hard grid links: for uniform grids, the capacity of the most congested link
- NoC logic: hard logic for routers; soft logic for routing tables, headers and CNIs

Design envelope: the collection of designs supported by a given programmable chip

The variables:
- Number of “hard-coded” wires per link
- Possible configurable routing schemes
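
A minimal Python sketch of this cost measure, assuming each design in the envelope is summarized as a map from link to the load it carries. The names `max_link_load`, `envelope_link_capacity`, `mesh_num_directed_links` and `uniform_grid_wire_cost` are hypothetical, and the wire-cost model (one common capacity on every directed link of an n x n mesh) is an illustrative assumption rather than the talk's exact cost function.

```python
from typing import Dict, Hashable, List

LinkLoads = Dict[Hashable, float]  # link identifier -> load in one design


def max_link_load(loads: LinkLoads) -> float:
    """Load on the most congested link of a single design."""
    return max(loads.values(), default=0.0)


def envelope_link_capacity(envelope: List[LinkLoads]) -> float:
    """Capacity every link of a uniform grid needs so that all designs in
    the envelope fit: the worst most-congested link over the designs."""
    return max((max_link_load(d) for d in envelope), default=0.0)


def mesh_num_directed_links(n: int) -> int:
    """An n x n mesh has 2*n*(n-1) bidirectional links, i.e. 4*n*(n-1) directed ones."""
    return 4 * n * (n - 1)


def uniform_grid_wire_cost(envelope: List[LinkLoads], n: int) -> float:
    """Hypothetical cost model: the same capacity on every directed link."""
    return envelope_link_capacity(envelope) * mesh_num_directed_links(n)


if __name__ == "__main__":
    designs = [{"a": 3.0, "b": 5.0}, {"a": 7.0, "c": 2.0}]
    print(envelope_link_capacity(designs))     # 7.0
    print(uniform_grid_wire_cost(designs, 5))  # 560.0
```

The envelope is sized by its worst design, so a routing scheme that lowers the most congested link of each design directly lowers the hard wiring the chip must provide.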

Routing Schemes: XY
- Very simple logic
- Deadlock free
- Unbalanced: high cost in uniform-capacity grids

[Figure: a flow f between nodes v1 and v2 follows a single XY route.]
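
To make the imbalance concrete, here is a small sketch of dimension-ordered XY routing and the link loads it produces, assuming nodes are (x, y) coordinates and traffic is a map from (source, destination) to rate. The helper names are illustrative. With one hotspot in the corner of a 3x3 grid, the last vertical link into the hotspot carries 6 of the 8 unit flows.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[int, int]                     # (x, y) grid coordinates
Link = Tuple[Node, Node]                   # directed link between neighbours
Traffic = Dict[Tuple[Node, Node], float]   # (source, destination) -> rate


def xy_path(src: Node, dst: Node) -> List[Link]:
    """Dimension-ordered XY route: move along X first, then along Y."""
    (x, y), (dx, dy) = src, dst
    links: List[Link] = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def xy_link_loads(traffic: Traffic) -> Dict[Link, float]:
    """Add each flow's rate to every link on its single XY route."""
    loads: Dict[Link, float] = defaultdict(float)
    for (src, dst), rate in traffic.items():
        for link in xy_path(src, dst):
            loads[link] += rate
    return dict(loads)


if __name__ == "__main__":
    # Every node of a 3x3 grid sends one unit to a corner hotspot at (0, 0).
    hotspot = (0, 0)
    traffic = {((x, y), hotspot): 1.0
               for x in range(3) for y in range(3) if (x, y) != hotspot}
    loads = xy_link_loads(traffic)
    print(max(loads.values()))  # 6.0: link (0,1)->(0,0) carries 6 of 8 flows
```

The other link into the hotspot, (1,0)->(0,0), carries only 2 units, which is exactly the imbalance that drives up the cost on a uniform-capacity grid.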

Toggle XY (TXY)
- Split packets evenly between the XY and YX routes
- Deadlock is avoided with 2 VCs
- Near-optimal for symmetric traffic (permutations) [Seo et al. 05; Towles & Dally 02]
- Simple, better balanced
- Split routes; does not take the traffic pattern into account

[Figure: the flow f between v1 and v2 is split into f/2 on the XY route and f/2 on the YX route.]
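
A sketch of the even split, under the same assumptions as a plain XY model (grid coordinates, traffic as a rate map; helper names are hypothetical). Each flow puts `xy_fraction` of its rate on the XY route and the rest on the YX route, so TXY is `xy_fraction = 0.5`. On a 3x3 grid with a corner hotspot, the split lowers the most congested link from 6 to 4 units.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]
Traffic = Dict[Tuple[Node, Node], float]


def path(src: Node, dst: Node, x_first: bool) -> List[Link]:
    """XY route when x_first is True, YX route otherwise."""
    cur, links = src, []
    for axis in ((0, 1) if x_first else (1, 0)):
        while cur[axis] != dst[axis]:
            step = 1 if dst[axis] > cur[axis] else -1
            nxt = (cur[0] + step, cur[1]) if axis == 0 else (cur[0], cur[1] + step)
            links.append((cur, nxt))
            cur = nxt
    return links


def split_loads(traffic: Traffic, xy_fraction: float = 0.5) -> Dict[Link, float]:
    """Send xy_fraction of each flow on its XY route and the rest on YX.
    The default 0.5 corresponds to Toggle XY (TXY)."""
    loads: Dict[Link, float] = defaultdict(float)
    for (src, dst), rate in traffic.items():
        for link in path(src, dst, x_first=True):
            loads[link] += rate * xy_fraction
        for link in path(src, dst, x_first=False):
            loads[link] += rate * (1.0 - xy_fraction)
    return dict(loads)


if __name__ == "__main__":
    hotspot = (0, 0)
    traffic = {((x, y), hotspot): 1.0
               for x in range(3) for y in range(3) if (x, y) != hotspot}
    print(max(split_loads(traffic, 1.0).values()))  # 6.0 (XY only)
    print(max(split_loads(traffic).values()))       # 4.0 (TXY)
```

The split is oblivious to the pattern: the same 0.5 is used whatever the traffic, which is what the weighted schemes relax.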

Weighted Schemes

TXY does not always produce the best results.

[Figure, left: "Capacity vs. XY weight on Toggle XY Routing, for (2,1) and (1,1) hotspots": maximum link capacity vs. the XY fraction, from YX only (0) to XY only (1), with the TXY point and the optimum marked. Right: "Max. capacity for graph with two hotspots at (1,1) and (1,2) on 5x5 grid": grid with optimal weight (11.94) vs. grid with equal weight (15).]

WTXY
- Given a traffic pattern, choose the XY/YX ratio that yields the lowest maximum link capacity
- Compute the ratio at programming time and load it into the Cxy field in the router
- The router chooses the XY route with probability Cxy, otherwise YX
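
The slides do not say how the ratio is computed, so the sketch below assumes a simple sweep: evaluate the expected link loads for `c_xy` in {0, 1/steps, ..., 1} and keep the value with the lowest maximum. The per-packet router decision is modeled with Python's `random.Random`; `best_xy_fraction`, `choose_route` and the `steps` parameter are hypothetical names. For the symmetric corner hotspot the sweep lands on 0.5, i.e. plain TXY; for asymmetric patterns such as the two-hotspot example the optimum moves away from 0.5.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]
Traffic = Dict[Tuple[Node, Node], float]


def path(src: Node, dst: Node, x_first: bool) -> List[Link]:
    """XY route when x_first is True, YX route otherwise."""
    cur, links = src, []
    for axis in ((0, 1) if x_first else (1, 0)):
        while cur[axis] != dst[axis]:
            step = 1 if dst[axis] > cur[axis] else -1
            nxt = (cur[0] + step, cur[1]) if axis == 0 else (cur[0], cur[1] + step)
            links.append((cur, nxt))
            cur = nxt
    return links


def split_loads(traffic: Traffic, c_xy: float) -> Dict[Link, float]:
    """Expected link loads when each packet takes XY with probability c_xy."""
    loads: Dict[Link, float] = defaultdict(float)
    for (src, dst), rate in traffic.items():
        for link in path(src, dst, x_first=True):
            loads[link] += rate * c_xy
        for link in path(src, dst, x_first=False):
            loads[link] += rate * (1.0 - c_xy)
    return dict(loads)


def best_xy_fraction(traffic: Traffic, steps: int = 20) -> Tuple[float, float]:
    """Programming time (assumed sweep): try c_xy = 0, 1/steps, ..., 1 and
    keep the value with the lowest maximum link load (first wins on ties)."""
    best_c, best_max = 0.0, float("inf")
    for i in range(steps + 1):
        c = i / steps
        worst = max(split_loads(traffic, c).values(), default=0.0)
        if worst < best_max:
            best_c, best_max = c, worst
    return best_c, best_max


def choose_route(c_xy: float, rng: random.Random) -> str:
    """Run time, per packet: XY with probability c_xy, otherwise YX."""
    return "XY" if rng.random() < c_xy else "YX"


if __name__ == "__main__":
    hotspot = (0, 0)
    traffic = {((x, y), hotspot): 1.0
               for x in range(3) for y in range(3) if (x, y) != hotspot}
    c_xy, worst = best_xy_fraction(traffic)
    print(c_xy, worst)  # 0.5 4.0 for this symmetric corner hotspot
    rng = random.Random(1)
    print([choose_route(c_xy, rng) for _ in range(5)])
```

Only one number per router needs to be loaded; the cost is that packets of one flow still take two different paths.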

TXY, WTXY Limitation: Traffic Split
- Packets of the same flow take different paths
- Delays may cause out-of-order arrivals
- Re-ordering buffers are costly

Ordered Routing Algorithms
- One route per source-destination (S-D) pair: no traffic splitting

[Figure: unordered routing vs. ordered routing.]

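Source Toggle XY (STXY)
- The route is a function of the bitwise XOR of the source and destination IDs
- Very simple algorithm
- Maximum capacity is similar to TXY

[Figure: route chosen by each source toward one destination (D) on a 5x5 grid:
XY YX XY YX XY
YX D  YX XY YX
XY YX XY YX XY
YX XY YX XY YX
XY YX XY YX XY]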
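
The slide does not specify the ID numbering or which bit of the XOR selects the route. The sketch below assumes row-major IDs (`id = y * width + x`) and uses the least significant bit of `src_id XOR dst_id` (0 for XY, 1 for YX); `node_id`, `stxy_route` and `route_map` are hypothetical names. Under these assumptions, a destination at (1, 1) on a 5x5 grid reproduces the checkerboard of routes shown on the slide.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y)


def node_id(node: Node, width: int) -> int:
    """Assumed row-major numbering: id = y * width + x."""
    x, y = node
    return y * width + x


def stxy_route(src: Node, dst: Node, width: int) -> str:
    """Assumed STXY rule: the LSB of (src_id XOR dst_id) picks the route."""
    bit = (node_id(src, width) ^ node_id(dst, width)) & 1
    return "XY" if bit == 0 else "YX"


def route_map(dst: Node, width: int, height: int) -> List[List[str]]:
    """Route each source uses toward dst; 'D' marks the destination itself."""
    return [["D" if (x, y) == dst else stxy_route((x, y), dst, width)
             for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    for row in route_map((1, 1), width=5, height=5):
        print(" ".join(f"{cell:2}" for cell in row))
```

Because the route is a fixed function of the pair, every packet of a flow follows the same path, so ordering is preserved without any routing tables.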

Weighted Ordered Toggle (WOT)
- The route per S-D pair is chosen at programming time
- Each source stores a routing bit for each destination
- Objective: minimize the maximum link capacity
- Optimal route assignment is difficult

WOT Min-Max Route Assignment
- Initial assignment: STXY
- Make changes that reduce the capacity:
  - Find the most loaded link
  - Among the S-D pairs sharing this link, change the route of the one that minimizes the maximum capacity (if possible)
- Sub-optimal
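
A sketch of this greedy min-max loop in Python, assuming one routing bit per S-D pair (0 for XY, 1 for YX) and the STXY rule assumed earlier (row-major IDs, LSB of the XOR) for the initial assignment. Each iteration tries flipping every pair whose route crosses the most loaded link and applies the single flip with the lowest resulting maximum, stopping when no flip helps. This is a local search and, like the slide says, sub-optimal; `wot_assign`, `stxy_bits` and `max_iters` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[int, int]
Pair = Tuple[Node, Node]          # (source, destination)
Link = Tuple[Node, Node]
Traffic = Dict[Pair, float]
Bits = Dict[Pair, int]            # per S-D routing bit: 0 = XY, 1 = YX


def path(src: Node, dst: Node, x_first: bool) -> List[Link]:
    """XY route when x_first is True, YX route otherwise."""
    cur, links = src, []
    for axis in ((0, 1) if x_first else (1, 0)):
        while cur[axis] != dst[axis]:
            step = 1 if dst[axis] > cur[axis] else -1
            nxt = (cur[0] + step, cur[1]) if axis == 0 else (cur[0], cur[1] + step)
            links.append((cur, nxt))
            cur = nxt
    return links


def link_loads(traffic: Traffic, bits: Bits) -> Dict[Link, float]:
    """Each flow uses exactly one route, selected by its routing bit."""
    loads: Dict[Link, float] = defaultdict(float)
    for pair, rate in traffic.items():
        for link in path(*pair, x_first=(bits[pair] == 0)):
            loads[link] += rate
    return loads


def max_load(traffic: Traffic, bits: Bits) -> float:
    return max(link_loads(traffic, bits).values(), default=0.0)


def stxy_bits(traffic: Traffic, width: int) -> Bits:
    """Initial assignment (assumed STXY rule): LSB of XOR of row-major IDs."""
    def nid(v: Node) -> int:
        return v[1] * width + v[0]
    return {(s, d): (nid(s) ^ nid(d)) & 1 for (s, d) in traffic}


def wot_assign(traffic: Traffic, width: int, max_iters: int = 1000) -> Bits:
    """Greedy min-max: repeatedly flip one S-D pair on the most loaded link."""
    bits = stxy_bits(traffic, width)
    for _ in range(max_iters):
        loads = link_loads(traffic, bits)
        if not loads:
            break
        hot_link, current = max(loads.items(), key=lambda kv: kv[1])
        best_pair, best_max = None, current
        for pair in traffic:
            if hot_link not in path(*pair, x_first=(bits[pair] == 0)):
                continue
            bits[pair] ^= 1                      # try the other route
            candidate = max_load(traffic, bits)
            bits[pair] ^= 1                      # undo the trial flip
            if candidate < best_max:
                best_pair, best_max = pair, candidate
        if best_pair is None:                    # no single flip helps
            break
        bits[best_pair] ^= 1
    return bits


if __name__ == "__main__":
    traffic = {((0, 0), (1, 1)): 1.0, ((0, 1), (1, 1)): 2.0, ((1, 0), (1, 1)): 1.0}
    print(max_load(traffic, stxy_bits(traffic, width=2)))  # 3.0
    print(max_load(traffic, wot_assign(traffic, width=2)))  # 2.0
```

In the small 2x2 example, STXY sends the (0,0) flow over the already busy link (0,1)->(1,1); one flip moves it to its XY route and lowers the maximum from 3 to 2. The resulting bits are what each source would store per destination.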

Iteration Demonstration

[Figure: sources S1, S2, S3 and destinations D1, D2, D3 on the grid.]

Benchmarks
- Previous work considers uniform permutations
- Chips have one or more hotspots: CPU, on-chip memory, off-chip memory interface
- We use several hotspot traffic models
- We also use a real-world example

Single Hotspot

[Figure, left: number of links vs. link capacity for XY, TXY, STXY, WTXY and WOT. Right: maximum capacity vs. location of the hotspot (corner, center, internal, horizontal edge, vertical edge) for XY, TXY, STXY, WTXY and WOT.]

Two Hotspots

[Figure, left: maximum capacity vs. minimum distance between the hotspots (1 to 5) for XY, TXY, STXY, WTXY and WOT. Right: maximum-capacity design envelope for various distances between the hotspots for WOT.]

Three Hotspots

[Figure: maximum capacity vs. minimum distance between the hotspots (1 to 4) for XY, TXY, STXY, WTXY and WOT.]

Mixed Traffic Model

Three parameters per node:
- the probability of being a hotspot
- the probability of sending data to a hotspot
- the probability of sending data to a non-hotspot

Average improvement in maximum capacity for WOT: 12% vs. TXY and 25% vs. XY.

[Figure: maximum capacity vs. grid size (5 to 10) for XY, TXY, WTXY, STXY and WOT; performance for P_hs = 0.10, P_send,hs = 0.80, P_send,non-hs = 0.05, N_sim = 45.]
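
A sketch of how such a mixed workload could be generated. The slide names the three probabilities but not the sampling procedure, so this assumes that each node independently becomes a hotspot with `p_hs`, and that every ordered pair independently carries one unit-rate flow with `p_send_hs` if the destination is a hotspot, else `p_send_non_hs`. The function name, unit rates and seeding are illustrative assumptions; the parameter values in the usage line are the ones in the figure caption.

```python
import random
from typing import Dict, Set, Tuple

Node = Tuple[int, int]
Traffic = Dict[Tuple[Node, Node], float]


def mixed_traffic(n: int, p_hs: float, p_send_hs: float, p_send_non_hs: float,
                  seed: int = 0) -> Tuple[Set[Node], Traffic]:
    """Random hotspot set plus a flow map on an n x n grid (assumed model)."""
    rng = random.Random(seed)
    nodes = [(x, y) for y in range(n) for x in range(n)]
    hotspots = {v for v in nodes if rng.random() < p_hs}
    traffic: Traffic = {}
    for src in nodes:
        for dst in nodes:
            if src == dst:
                continue
            p = p_send_hs if dst in hotspots else p_send_non_hs
            if rng.random() < p:
                traffic[(src, dst)] = 1.0   # assumed unit rate per flow
    return hotspots, traffic


if __name__ == "__main__":
    hotspots, traffic = mixed_traffic(8, p_hs=0.10, p_send_hs=0.80,
                                      p_send_non_hs=0.05, seed=1)
    print(len(hotspots), len(traffic))
```

Averaging the maximum link load over several seeds per grid size is one way to obtain curves like those in the figure; the resulting flow map can be fed to any of the routing sketches above.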

Real-World Example
- Based on Bertozzi: video encoder
- Mapping and placement are done manually

Real-World Example

Maximum capacity: WOT 1053, STXY 1377, XY 1539

[Figure: number of links vs. link capacity (81 to 1539) for XY, YX, STXY and WOT.]
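
The maximum capacities above can be turned into relative reductions with a quick calculation. The percentages are derived here from the three reported numbers, not quoted from the talk.

```python
# Maximum link capacity reported for the video-encoder example.
max_capacity = {"XY": 1539, "STXY": 1377, "WOT": 1053}


def reduction(baseline: float, improved: float) -> float:
    """Relative reduction of the maximum capacity versus a baseline."""
    return (baseline - improved) / baseline


for name in ("XY", "STXY"):
    print(f"WOT vs. {name}: {reduction(max_capacity[name], max_capacity['WOT']):.1%}")
# WOT vs. XY: 31.6%
# WOT vs. STXY: 23.5%
```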

Summary
- A new NoC-based architecture for FPGA
- A design methodology for this architecture
- WOT routing algorithm: balanced, in-order, low cost