24
Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture Ivan Miro-Panades 1,2,3 , Fabien Clermidy 3 , Pascal Vivet 3 , Alain Greiner 1 1 The University of Pierre et Marie Curie, Paris, France 2 STMicroelectronics, Crolles, France 3 CEA-Leti, MINATEC, Grenoble, France Ivan MIRO PANADES – NOCS 2008 1

Physical Implementation of the DSPIN Network-on-Chip in

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Physical Implementation of the DSPIN Network-on-Chip in

Physical Implementation of the DSPIN Network-on-Chip in the

FAUST Architecture

Ivan Miro-Panades1,2,3, Fabien Clermidy3, Pascal Vivet3, Alain Greiner1

1 The University of Pierre et Marie Curie, Paris, France2 STMicroelectronics, Crolles, France3 CEA-Leti, MINATEC, Grenoble, France

Ivan MIRO PANADES – NOCS 20081

Page 2: Physical Implementation of the DSPIN Network-on-Chip in

Outline

MotivationFAUST architectureMigration of DSPIN into FAUSTDSPIN implementationNetworks comparison

Ivan MIRO PANADES – NOCS 20082

Page 3: Physical Implementation of the DSPIN Network-on-Chip in

Motivation

Physically implement the DSPIN NoC into the FAUST application platform

- DSPIN is a NoC developed between Lip6 and ST

- FAUST is a stream-oriented application platform for 4G telecom applications, based on ANOC, developed by CEA-Leti.

Compare the performances between ANOC and DSPIN on a real application and traffic

Ivan MIRO PANADES – NOCS 20083

Page 4: Physical Implementation of the DSPIN Network-on-Chip in

FAUST architecture

RAM IF58 Pads

ETHERNET IF17 Pads

Async/Sync IF

Async node

NOC2 IF

83 Pads

RX units

TX units

AHB units

OFDMMOD.

ALAM.MOD.

CDMAMOD. MAPP. BIT

INTER.TURBOCODER

RAM ARM946 RAMEXT.

RAMCTRL

AHB

ROTOR EQUAL. CHAN.EST.

CONV.DEC.

ETHERNET

FRAMESYNC.

ODFMDEM.

CDMADEM.

DE-MAPP.

DE-INTER.

EXP

SPort

APort

NOC1 IF

84 PadsSPort

APort

RAC

NoCPerf.

EXP

CONV.CODER

Clk & Reset CTRLJTAG Clk, Rst

DART

23 computation units

Asynchronous NoC(ANOC)

20 ANOC routers

GALS conception

24 independent Clks

Ethernet port

Internal/External RAM

CPU ARM946ES

Cache 4KB-I 4KB-D

Hardware OFDM modulation/demodulation

Ivan MIRO PANADES – NOCS 20084

Page 5: Physical Implementation of the DSPIN Network-on-Chip in

ANOC architectureAsynchronous NoC

- Asynchronous send/accept handshake protocol

- QDI 4-phase/4-rail asynchronous logic

- QoS with two Virtual Channels (Best Effort, Guaranteed Service)

NoC Routers- 5 ports router- Source routing- Wormhole packet switching- 32 bits payload

FIFO based GALS interfacesMapped onto libraries

- CMOS ST 130nm- TIMA TAL130 library

CrossbarCrossbar

CrossbarCrossbar

NICGALS

interfac

eIP

Asynchronous handshake protocol

Ivan MIRO PANADES – NOCS 20085

Page 6: Physical Implementation of the DSPIN Network-on-Chip in

FAUST floor-plan with ANOC 4.5 M Gates276 chip pinsChip area 80 mm²ST CMOS 130nm166 MHz (worst-case)

NoC implementation :

DART

RAC

ARM946

RAM1RAM2 Ext.

RAM Ctrl.

OFDM mod.

Rotor

Frame sync.

OFDM demod.

CDMA Dem.

Channel Est. Ethernet

Deinter.Demapp.

Conv. Dec.Equal.

Mapp.CDMA Mod.N

P1

NP2

Ala.Bit.

Inter.Turbo Dec.

Conv. Codec.

Exp.

Exp.

CLK

Res

et

- Uses ANOC router hard-macro- Perform buffering and routing

of ANOC links- No spaghetti routing at top

level !

ANOC router (0.211 mm²)

WestPort

EastPort

NorthPort Unit Port

South Port

Ivan MIRO PANADES – NOCS 20086

Page 7: Physical Implementation of the DSPIN Network-on-Chip in

Outline

MotivationFAUST architectureMigration of DSPIN into FAUSTDSPIN implementationNetworks comparison

Ivan MIRO PANADES – NOCS 20087

Page 8: Physical Implementation of the DSPIN Network-on-Chip in

Ivan MIRO PANADES – NOCS 20088

Cluster (Y,X)Cluster (Y,X)

Cluster (YCluster (Y--1,X)1,X)

(Y,X+1)(Y,X+1)(Y,X(Y,X--1)1)

(Y(Y--1,X1,X--1)1) (Y(Y--1,X+1)1,X+1)

Cluster (Y+1,X)Cluster (Y+1,X)(Y+1,X(Y+1,X--1)1) (Y+1,X+1)(Y+1,X+1)

Local Local portport

in

in

in

in

in

in

in

in

in

DSPIN architecture

CK0CK0CK1CK1

CK4CK4CK5CK5

CK3CK3

CK6CK6 CK7CK7 CK8CK8

CKCK22

Packet baseDistributed router architectureSuited to GALS approachMesochronous links between routersSynthesizable with standard cellsNeither asynchronous nor custom cellsMetastability resolved by “bi-synchronous FIFO” More details in:

"A Low Cost Network-on-Chip with Guaranteed Service Well Suited to the GALS Approach“ NanoNet’06

Page 9: Physical Implementation of the DSPIN Network-on-Chip in

NoC architectures comparisonANOC DSPIN

Router arity 5 port router 5 port router

Topology Irregular 2D mesh Regular 2D mesh

Routing technique Source routing (9 hops) Address-based (8 bits) X-First algorithm

Switching technique Wormhole Wormhole

Flit size (payload) 34 bits (32 bits) Generic: 34 bits (32 bits)

Flow control bits on the flit Begin of packet (BOP)End of packet (EOP)

Begin of packet (BOP)End of packet (EOP)

Virtual channels Best effortGuaranteed service

Best effortGuaranteed service

Programming model Message passing Shared memory (2 routers per cluster)Message passing (1 router per cluster)

Clocking scheme Fully asynchronous (QDI) with GALS interfaces

Multi-synchronous with mesochronousinterfaces

Flow control protocol Send/accept asynchronous handshake FIFO protocol (Write and WriteOk)

Clock tree None One per router

Physical implementation Hard macro Soft macro distributed on five modules

Long wires Inter-router wires Intra-cluster wires

Ivan MIRO PANADES – NOCS 20089

Page 10: Physical Implementation of the DSPIN Network-on-Chip in

Packet format

Y X

4 bits

H0H1H2...H8

2 bits 4 bits2 bits

34 bits18 bits

34 bits (generic)

First flit

Following flits

2 bits

8 bits

DSPIN packetANOC packet

Similar packet format and control bitsANOC uses Source-routing (18 bits) allowing 9 hopsDSPIN uses Address-based (8 bits)Packet conversion module required:

- Design of Protocol_conversion module

Ivan MIRO PANADES – NOCS 200810

Page 11: Physical Implementation of the DSPIN Network-on-Chip in

FAUST integration

GALS interface

SynchronousSEND/ACCEPT

AsynchronousSEND/ACCEPT

IPNIC

ANOC router

AsynchronousSEND/ACCEPT

CLK_IP

ANOC IP template

Protocol_conversion

SynchronousREAD/WRITE

SynchronousSEND/ACCEPT

IPNIC

LUT

DSPINrouter

MesochronousREAD/WRITE

CLK_NoC

CLK_IP

DSPIN IP template

Protocol_conversion module:Translates the routing algorithm using a LUTAdapts the flow control signals:

- ANOC: Send/Accept - DSPIN: FIFO protocol

Implements two bi-synchronous FIFOs

ANOC GALS Interface:Implements 4 FIFOssynchronous↔asynchronous

FAUST IPs and NICs are not modified

Ivan MIRO PANADES – NOCS 200811

Page 12: Physical Implementation of the DSPIN Network-on-Chip in

Outline

MotivationFAUST architectureMigration of DSPIN into FAUSTDSPIN implementationNetworks comparison

Ivan MIRO PANADES – NOCS 200812

Page 13: Physical Implementation of the DSPIN Network-on-Chip in

DSPIN implementation

Ivan MIRO PANADES – NOCS 200813

SynthesisHierarchical synthesis

Low Power CMOS ST 130nm technology

Standard cells

Without asynchronous nor custom cellsFloorplanning

Place

Optimize placement

Timing constraints file:

- Muti-cycle path (mesochronous interfaces)

- False path (asynchronous interfaces)

Clock-tree

RouteGALS compatible

Clock gating

Implemented in 4 stepsOptimize

Page 14: Physical Implementation of the DSPIN Network-on-Chip in

DSPIN clock-treeMesochronous links

GALS compatibleBi-synchronous FIFO [NOCS 2007]

Max skew 50% clock periodTiming constraints:set_multi_cycle_path

Clk

Clk’Clk Clk’

Ivan MIRO PANADES – NOCS 200814

Router (0,0) Router (0,1) Router (0,2)

180° phase shift and 30% skew between routers5% skew

within the router

2nd, 3th Step5% skew

( bottom tree)

1st Step

4th Step30% skew(top tree)

Clk_NoC

Clock-tree implementation1. Add buffers/inverters2. Built bottom clock tree

(5% skew)3. Characterize bottom

clock-tree4. Build top clock-tree

(30% skew)

Page 15: Physical Implementation of the DSPIN Network-on-Chip in

FAUST floor-plan with DSPINDistributed router implementationSoft macro approachHigher floor-plan flexibilityNoC adapts to the SoCLong wires are routed in a tree mannerDifferent router configurations are possible

WE

S

L N L

ES

W W

N L

S

E

N

WLS

E

N

WL S E

N

LW S E

N

W E

E

N

W

LS E

N

WL S

E

N

WL S E

L

N

W S

E

N

W

L

E

S

N

WL

S E WL

N

S E

N

EW S

L

N

WS

L ES W E

L

N

W S L E

N

W S

L E

N

S

WL

E

N

N

LRAC OFDM mod.

CDMA Mod.NP1

Ala. Bit. Inter.

Turbo Dec.

Conv. Codec.

CLK

ARM946

RAM1

RAM2Ext. RAM Ctrl.

Rotor

Channel Est. EthernetConv. Dec.

Equal.

Frame sync. OFDM demod.

CDMA Dem. Deinter.Demapp.

DART

N

W

LS E

DSPIN routerPlacement density: 60-70 %

(reserved area)

Ivan MIRO PANADES – NOCS 200815

Page 16: Physical Implementation of the DSPIN Network-on-Chip in

FAUST floor-planM

app.

NP

2

NP

1

NP

2

Exp

.

Exp.

CLK

Res

et

FAUST with ANOC FAUST with DSPIN

Ivan MIRO PANADES – NOCS 200816

Page 17: Physical Implementation of the DSPIN Network-on-Chip in

Outline

MotivationFAUST architectureMigration of DSPIN into FAUSTDSPIN implementationNetworks comparison

Ivan MIRO PANADES – NOCS 200817

Page 18: Physical Implementation of the DSPIN Network-on-Chip in

NoC Area

ANOC DSPINRouter 0.211 mm² 0.161 mm²Interface GALS 0.070 mm² 0.024 mm²Clock tree 0.000 mm² 0.0016 mm²Total 0.281 mm² 0.187 mm²CMOS 130nm

ANOC is implemented as a hard macroDSPIN is implemented as a soft macroDSPIN is 33% smaller than ANOC

Ivan MIRO PANADES – NOCS 200818

Page 19: Physical Implementation of the DSPIN Network-on-Chip in

NoC Throughput

ANOC DSPIN

Throughput on worst-caseconditions ~ 160Mflit/s ≤ 289Mflit/s

Throughput on nominalconditions ~ 220Mflit/s ≤ 408Mflit/s

DSPIN throughput is deterministic with respect to the clock frequency (one flit per clock cycle)Long wire latency penalty on throughput:

- DSPIN: critical path crosses one time the long wires- ANOC: critical path crosses 4 times the long wires, 4-phase protocol

• ANOC link pipelining is feasible

In a commercial circuit, DSPIN will be clocked not far away fromworst-case (289 MHz) to improve the fabrication yield

Ivan MIRO PANADES – NOCS 200819

Page 20: Physical Implementation of the DSPIN Network-on-Chip in

Packet latency (1)

Compute the packet latency through many routersDSPIN has deterministicpacket latency with respect to clock frequencyDSPIN has lower First and Last packet latenciesANOC intermediate latencyis lower than DSPIN. DSPIN resynchronize the data on each router

ANOC DSPIN

F = 150 MHz

Intermediate router latency 6.80 ns 16.66 ns

First + Last latency 60.00 ns 56.66 ns

ANOC DSPIN

F = 250 MHz

6.80 ns 10.00 ns

47.00 ns 34.00 ns

Ivan MIRO PANADES – NOCS 200820

Page 21: Physical Implementation of the DSPIN Network-on-Chip in

Packet latency (2)

ANOC DSPIN ANOC DSPIN

F = 150 MHz F = 250 MHz

Latency for 5 hops path 80.00 ns 106.66 ns 68.00 ns 64.00 ns

Latency for 9 hops path 106.66 ns 173.30 ns 96.00 ns 104.00 ns

The packet latency are similar for clock frequencies >250 MHzIntermediate packet latency is important but the application should be mapped in order to optimize the data locality (try to communicate with neighbor IPs rather than with faraway IPs)

Ivan MIRO PANADES – NOCS 200821

Page 22: Physical Implementation of the DSPIN Network-on-Chip in

Power consumption

DSPIN ANOC

F = 150 MHz F = 250 MHz

Router 2.07 mW 2.89 mW 4.85 mW

GALS interface 1.62 mW 0.56 mW 0.81 mW

Clock tree 0.00 mW 2.44 mW 4.73 mW

Total 3.69 mW 5.89 mW 10.39 mW

Power extraction after P&RFunctional packet traffic (OFDM demodulation)Power consumption majorly dominated by FIFO data registersThe DSPIN clock-gating reduced the power consumption by 67%DSPIN clock-tree consumes as much power as the router itself

- Needs to improve DSPIN clock-gating- GALS clock-tree consumes only 2.5% of total clock-tree power

Ivan MIRO PANADES – NOCS 200822

Page 23: Physical Implementation of the DSPIN Network-on-Chip in

ConclusionPhysical implementation of the DSPIN Network-on-Chip on FAUST platform

- Comparison between ANOC and DSPIN at architecture level - Adaptation of DSPIN architecture to manage stream-oriented

communications- Implementation up to layout of DSPIN network on FAUST platform=> DSPIN mesochronous links fully implemented with standard tools

Comparison between DSPIN and ANOC NoCs (STMicroelectronics 130nm)

- Area of DSPIN is 33% smaller than ANOC one- Maximum sustained throughput of DSPIN is 31% higher than ANOC- ANOC has lower packet latency - DSPIN power consumption 1.5 to 3 times higher than ANOC

ANOC is a good candidate for low latency and low power applications, while DSPIN is more suited to low area and high performance applications

Ivan MIRO PANADES – NOCS 200823

Page 24: Physical Implementation of the DSPIN Network-on-Chip in

See you at my PhD defense on May 20th

Ivan MIRO PANADES – NOCS 200824