NOC: Networks on Chip
MPSoC: Multiprocessor System on Chip

EE8205: Embedded Computer Systems
http://www.ee.ryerson.ca/~courses/EE8205/

Dr. Gul N. Khan
http://www.ee.ryerson.ca/~gnkhan

Electrical and Computer Engineering
Ryerson University

Overview
• Introduction to SoC and MPSoC
• Networks on a Chip
• Bus-based and Point-to-point NoC Systems
• Regular and Application Specific NoC Topologies
• Routing and Switching Techniques
• NOC Topology Generation and Analysis

Introductory articles on MPSoC and NoC are available at the course webpage.

System on a Chip
Systems-on-Chip (SoC)
• Advances in chip design and integration.
• Incorporate multiple components on a single chip.
• MPSoC has addressed ever-increasing performance requirements.
[Figure: Samsung S3C6400 Platform]


Samsung S3C6410 Platform

S3C6410 System on Chip
• A 16/32-bit RISC low-power, high-performance microprocessor.
• Applications include mobile phones, Portable Navigation Devices and other general applications.
• Provides optimized H/W performance for 2.5G and 3G communication services.
• Includes many powerful hardware accelerators for motion video processing, display control and scaling.
• An integrated Multi Format Codec (MFC) supports encoding and decoding of MPEG4/H.263 and H.264.
• Many hardware peripherals such as camera interface, TFT 24-bit LCD controller, power management, etc.


ARM 11 (v6) based SOC


S3C6410 based Mobile Processor

Navigation System

iPhone

based on ARM1176JZS3C6410

System on Chip Design Flow
• Specify:
  * What does the customer really want?
• Architect:
  * Find the most cost- and performance-effective architecture to implement it.
  * What existing components can we adapt and re-use?
• Evaluate:
  * What is the performance impact of a cheaper architecture?
• Implement:
  * What can we generate automatically from libraries and customization?

Use separate computation, communication and performance

System-on-Chip and NoC
System-on-Chip to Network-on-Chip
[Figure: SoC blocks: Analog Component (ADC/DAC), VGA core, DSP, CPU, MPEG core]

SoC Structure
NoC-based System on a Chip
[Figure: a tile of the chip. Labels: computational block (core, Instr $, Data $, Network Interface); Switch Fabric, Control Logic, ports p0 to p4; communication link (control, data, spare, parity); Proc, Cache L2, bus]

System on Chip Design Flow
[Figure: design flow with numbered steps 1 to 4. Labels: System Behavior, System Architecture, Behavior Simulation, Mapping, Performance Simulation, Communication Refinement, Flow To Implementation]

System on Chip Design Flow
• Annotation of architectural timing and energy onto behavior
• Performance Simulation: behavior annotated with architectural effects
• Analyze / Visualize Results

SoC Application: Wireless LAN Physical Layer
Protocol Stack: Application, Network, MAC, OFDM Physical Layer/Digital BB (OFDM TX, OFDM RX)
HiperLan/2, PicoRadio
• Multi-media Wireless Networks: High Rate: 10 Mb/sec; Low Power: 10-100 mW
• Ad Hoc Networks: Low Rate: b/sec - kb/sec; Low Power: 100 μW
Dynamic Reconfiguration

Wireless LAN SoC
[Figure: candidate architectures. Labels: ASIC, FPGA, Micro-controller, bus, crossbar bus, AD, DA, PA, Analog front end (f0, f1, f2), Digital Modem, Protocol, User Interface, clock manager, sleep mode mngmt, Bloc Turbo Codec]

Main Points
• Which micro-controller to use?
• Do we need more FPGAs?
• DSP or ASIC?
• Which MAC?
• Where will the MAC run?
• Which other applications can I add?
• Is the chip reusable?
• Is there too much memory?

Wireless LAN Physical Layer Design Flow
[Figure: design flow. Stages: Application Specification, Algorithm Exploration, Functional Simulation and Refinement, Functional Partitioning, Functional IP Reuse, Mapping, Architecture Exploration: Performance Simulation, Architecture Refinement, Implementation. Languages and tools named: English (UML, SystemC…), SystemC or C (Matlab/Simulink, …), Coware (…). Blocks: OFDM Physical Layer (OFDM TX, OFDM RX), Higher Layers]

Physical Layer SoC Architecture
[Figure: block diagram. Labels: FPGA (FFT, FIR, UART, BUFFER), FPGA config. mem., Int. bridge, Micro -, Clock gen., SPS2 (instruction/data RAM), XBAR, XBAR Interface, Processor bus, Processor bus Interface, Jtag Interface, DPR2/SPS2 Bridge, I/D caches, Datapath. Pins: TEST(0..2), Ck, reset, CK2, CK1, MCK, VDD, VSS, Reset]

Multiple Processor/Core System-on-Chip
Inter-node communication between CPUs/cores can be performed by message passing or shared memory. The number of processors on the same chip die increases at each node (CMP and MPSoC).
• Memory sharing (SHARED BUS) will require:
  * Large multiplexers
  * Cache coherence techniques
  * Not scalable
• Message passing (NOC):
  * Scalable
  * Requires data transfer transactions
  * Has the overhead of extra communication

NOC: Network-on-Chip
• Shared bus is not a long-term solution
  * It has poor scalability
• On-chip micro-networks suit the demand of scalability and performance
[Figure: System Bus]

NOC and Off-Chip Networks

NOC
• Sensitive to cost: area and power
• Wires are relatively cheap
• Latency is critical
• Traffic is known a priori
• Design-time specialization
• Custom NoCs are possible

Off-Chip Networks
• Cost is in the links
• Latency is tolerable
• Traffic/applications unknown
• Changes at runtime
• Adherence to networking standards


On-Chip Communication Structures

On-Chip Bus Interconnection
• For highly connected multi-core systems: communication bottleneck
• For multi-master buses: arbitration will become a complex problem
• Power grows for each communication event, because each additional attached unit increases the capacitive load.
• A crossbar switch can overcome some of these problems and limitations of buses
• Crossbar is not scalable

SOC Communication Structures
Dedicated Point-to-Point
• Advantages
  * Optimal in terms of bandwidth, availability, latency and power usage
  * Simple to design and verify as well as easier to model
• Disadvantages
  * Number of links grows rapidly with the number of cores (quadratically, n(n-1)/2, if every pair of cores is connected directly)
  * Hardware area
  * Routing problems
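To make the link-count concern concrete, here is a minimal sketch that counts bidirectional links for a fully connected point-to-point design and, for comparison, a 2D mesh NoC. The function names and the square-mesh restriction are assumptions made for illustration; they do not come from the lecture.

```python
import math

def point_to_point_links(n: int) -> int:
    """Bidirectional links needed to connect every pair of n cores directly."""
    return n * (n - 1) // 2

def mesh_links(n: int) -> int:
    """Bidirectional links in a k x k mesh (assumes n is a perfect square)."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("this sketch only handles square meshes")
    # k rows with (k - 1) horizontal links each, plus the same vertically.
    return 2 * k * (k - 1)

for n in (4, 16, 64):
    print(n, point_to_point_links(n), mesh_links(n))
```

Under these assumptions, the dedicated design needs on the order of n² links while the mesh needs on the order of n.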

SOC Communication Structures
Network on Chip
• Advantages
  * Structured architecture: lower complexity and cost of SOC design
  * Reuse of components, architectures, design methods and tools
  * Efficient and high-performance interconnect
  * Scalability of communication architecture
• Disadvantages
  * Internal network contention can cause latency
  * Bus-oriented IPs need smart wrapping hardware
  * Software needs clear synchronization in multiprocessor systems

Networks-on-Chip
• Interconnect for SoCs, CMPs, MPSoCs and FPGAs
  * Multi-hop, packet-based communication
  * Efficient resource sharing
• Scalable communication infrastructure provides scalable performance/efficiency in
  * Power
  * Hardware area
  * Design productivity


NoC?
A chip-wide network: Processing Elements (PEs) are interconnected via a packet-based network in the NoC architecture.
[Figure: 16 routers, each attached to a processing element (PE 1 to PE 16); a message (MSG) travels as a packetized message and is delivered as a decoded message]

Network-on-Chip vs. Bus Interconnection

Network-on-Chip
• Total bandwidth grows
• Link speed unaffected
• Concurrent spatial reuse
• Pipelining is built-in
• Distributed arbitration
• Separate abstraction layers
However
• No performance guarantee
• Extra delay in routers
• Area and power overhead?
• Modules need NI
• Unfamiliar methodology

Bus
• Bus interconnection is fairly simple and familiar
However
• Bandwidth is limited, shared
• Speed goes down as N grows
• No concurrency
• Pipelining is tough
• Central arbitration
• No layers of abstraction (communication and computation are coupled)

On-Chip Buses
• Ad hoc Buses: Traditional Data/Address Buses
• ARM AMBA Bus: Advanced Microcontroller Bus Architecture
• IBM CoreConnect Bus: CoreConnect Bus Architecture

AMBA On-Chip Bus
AMBA evolved from ARM’s internal bus development:
• ASB/AHB: Advanced System Bus / Advanced High-performance Bus, with support for pipelining, burst transfer and multiple bus masters
• APB: Advanced Peripheral Bus with all slave devices
• Bridge: A slave on ASB that connects it to APB

AMBA based Single Chip GPS Controller
■ Suitable for handheld and personal navigation systems
■ ARM7TDMI 16/32 bit RISC CPU based host
■ Complete embedded memory system: Flash 256 KB, RAM 64 KB
■ 12 channel GPS correlation DSP
■ 4 channels A/D
■ 4 serial communication interfaces
■ One serial peripheral interface (SPI)
■ Real-time clock module
■ 16-bit watchdog timer

IBM CoreConnect On-Chip Bus
CoreConnect is an SOC bus proposed by IBM having:
• PLB: Processor Local Bus, PLB Arbiter, PLB to OPB Bridge
• OPB: On-Chip Peripheral Bus, OPB Arbiter
• DCR: Device Control Register Bus and a Bridge

CoreConnect Advanced Features
IBM CoreConnect Bus with 32-, 64-, and 128-bit versions to support a variety of applications
• PLB: Fully synchronous, supports up to 8 masters
  - Separate read/write data buses
  - Burst transfers, variable and fixed-length, pipelining
  - DMA transfers, and no on-chip tri-states required
  - Overlapped arbitration, programmable priority fairness
• OPB: Fully synchronous, 32-bit address and data buses
  - Supports 1-cycle data transfers between master and slaves
  - Arbitration for up to 4 OPB master peripherals
  - Bridge function can be master on PLB or OPB
• DCR: Provides fully synchronous movement of GPR data between CPU and slave logic

CoreConnect Bus based SoC

Comparing AMBA and CoreConnect SoC Buses

NoC: Buses to Networks
Original Bus Features
• One transaction at a time
• Central arbiter
• Limited bandwidth
• Synchronous
• Low cost
[Figure: Shared Bus to Segmented Bus]

Advanced Bus
Segmented Bus
• More general/versatile bus architecture
• Pipelining capability
• Burst transfer
• Split transactions
• Overlapped arbitration
• Transaction preemption, resumption & reordering
[Figure: Shared Bus to Segmented Bus]

Buses to Networks
• Architectural paradigm shift: Replace wire spaghetti by network
• Usage paradigm shift: Pack everything in packets
• Organizational paradigm shift
  * Confiscate communications from logic designers
  * Create a new discipline, a new infrastructure responsibility

NoC Related Main Problems
Global interconnect design problems:
• Delay
• Power
• Noise
• Scalability
• Reliability
System integration: productivity problem
Chip Multi Processors: for power-efficient computing

NoC and Global Connections Delay
Long wiring delay is dominated by resistance
• Add repeaters
• Repeaters will become latches (with clock frequency scaling)
• Latches can become NoC routers
[Figure: NoC routers placed along a long wire]
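The benefit of adding repeaters can be sketched with a simplified distributed-RC wire model, in which the delay of an unrepeated wire grows with the square of its length, while splitting the wire into repeated segments makes it grow roughly linearly. The 0.38 coefficient is the usual 50% delay approximation for a distributed RC line; the model ignores repeater drive resistance and input capacitance, and every parameter value below is hypothetical.

```python
def unrepeated_delay(r_per_mm: float, c_per_mm: float, length_mm: float) -> float:
    """Approximate 50% delay (seconds) of a distributed RC wire."""
    return 0.38 * r_per_mm * c_per_mm * length_mm ** 2

def repeated_delay(r_per_mm: float, c_per_mm: float, length_mm: float,
                   segments: int, t_repeater: float) -> float:
    """Delay when the wire is cut into equal segments, each driven by a repeater."""
    seg_len = length_mm / segments
    per_segment = 0.38 * r_per_mm * c_per_mm * seg_len ** 2 + t_repeater
    return segments * per_segment

# Hypothetical values: 200 ohm/mm, 0.2 pF/mm, 10 mm wire, 20 ps repeaters.
R, C, L = 200.0, 0.2e-12, 10.0
print("no repeaters:", unrepeated_delay(R, C, L))
print("5 repeaters :", repeated_delay(R, C, L, segments=5, t_repeater=20e-12))
```

Once each segment fits in a clock period, the repeaters can be clocked, which is the step from repeaters to latches described above.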


NoC Wiring Design
NoC links:
• Regular
• Point-to-point: no fan-out tree (problem)
• Can use transmission-line layout
• Well-defined current return path
• Can be optimized for noise / speed / power
  * Low swing, current mode, …

NoC Scalability
Compare the wire-area for same performance
• NoC: $O(n)$
• Bus: $O(n^3\sqrt{n})$
• Segmented Bus: $O(n^2\sqrt{n})$
• Pt-to-Pt: $O(n^2\sqrt{n})$
[Figure: layouts of n modules with spacing d for each interconnect]

NoC: a Platform
• System modules may use different clocks/voltages.
• NoC can take care of synchronization.
• NoC design may be asynchronous.
• No waste of power when links/routers are idle.
• It eliminates ad-hoc global wire engineering.
• It separates computation from communication.
• It supports modularity & reuse of cores.
NoC platform for System Integration, Testing and Debugging

CMP and NoC
Uniprocessors cannot provide power-efficient performance growth
• Interconnect dominates dynamic power
• Global wire delay doesn’t scale
• Instruction-level parallelism is limited
Power efficiency requires many parallel local computations
• Chip Multi Processors (CMP)
• Thread-Level Parallelism (TLP)
Network is another choice for CMP
[Figure: uni-processor dynamic power split into Interconnect, Gate and Diff. (Magen et al., SLIP 2004)]
[Figure: uni-processor performance vs. die area (or power), “Pollack’s rule” (F. Pollack. Micro 32, 1999)]
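Pollack's rule is usually stated as: single-core performance grows roughly with the square root of the die area (or complexity) spent on the core. A minimal sketch of what that implies for CMPs follows. It assumes a perfectly parallel workload and ignores interconnect cost; the function names and the unit-free area values are illustrative only.

```python
import math

def single_core_perf(area: float) -> float:
    """Relative performance of one core occupying `area` (Pollack's rule)."""
    return math.sqrt(area)

def multi_core_throughput(area: float, cores: int) -> float:
    """Aggregate throughput when `area` is split evenly across `cores`.

    Assumes the workload parallelizes perfectly across all cores.
    """
    return cores * single_core_perf(area / cores)

total_area = 16.0
for cores in (1, 4, 16):
    print(cores, multi_core_throughput(total_area, cores))
```

Under these idealized assumptions, aggregate throughput scales with the square root of the core count for a fixed area budget, which is the argument for many small cores running parallel threads.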


Network-on-Chip Topologies

Application Specific Irregular Topologies

Irregular NoC Topologies

Based on the concept of using only what is necessary.

Application-specific topologies.

Eliminate unneeded resources and bandwidth from the system.

Leads to reduced power and area use.

Requires additional design work.

NOC Topology
[Figure: 4 x 4 mesh of nodes 1 to 16, and its physical implementation with the same node arrangement]

NOC Torus Topology
[Figure: 4 x 4 torus of nodes 1 to 16, and its physical implementation]

Torus:
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16

Physical implementation:
1 2 4 3
13 14 16 15
5 6 8 7
9 10 12 11
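The difference between a mesh and a torus can be illustrated by comparing hop counts. The sketch below measures the minimal hop distance between nodes of a k x k mesh and of a k x k torus (where wrap-around links shorten long trips). Coordinates are zero-based (x, y) tuples, and the helper names are assumptions for illustration.

```python
from itertools import product

def mesh_hops(a, b):
    """Minimal hops between nodes a and b in a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def torus_hops(a, b, k):
    """Minimal hops in a k x k torus: each dimension may wrap around."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, k - dx) + min(dy, k - dy)

def average_hops(k, hops):
    """Average of hops(a, b) over all ordered pairs of distinct nodes."""
    nodes = list(product(range(k), repeat=2))
    pairs = [(a, b) for a in nodes for b in nodes if a != b]
    return sum(hops(a, b) for a, b in pairs) / len(pairs)

k = 4
print("mesh  average hops:", average_hops(k, mesh_hops))
print("torus average hops:", average_hops(k, lambda a, b: torus_hops(a, b, k)))
```

For the 4 x 4 case the corner-to-corner distance drops from 6 hops in the mesh to 2 in the torus, at the price of the longer wrap-around wires that a folded physical layout tries to even out.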

NOC Abstraction Layers: Network Modeling
• Software Layers: O/S, application
• Network and Transport Layers
  * Network topology: e.g. crossbar, ring, mesh, torus, fat tree, …
  * Switching: circuit / packet switching: VCT, wormhole
  * Addressing: logical/physical, source/destination, flow, transaction
  * Routing: static/dynamic, distributed/source, deadlock avoidance
  * Quality of Service: e.g. guaranteed-throughput, best-effort
  * Congestion control, end-to-end flow control
• Data Link Layer
  * Flow control (handshake)
  * Handling of contention
  * Correction of transmission errors
• Physical Layer: wires, drivers, receivers, repeaters, signaling, circuits, …


Definitions and Terminology

Switch: The component of the network that is in charge of flit routing.

Flit Latency: The time needed for a FLIT to reach its target PE from its source PE.

Packet Latency: The time needed for a PACKET to reach its target PE from its source PE.

Packet Spread: The time from the reception of the first flit of a packet to the reception of the last one.

Message Abstraction
[Figure: a Message is split into Packets; a Packet has a Header and a Payload and is carried as flits: a head flit (Type, Dest., VC), body flits (Type, Body, VC) and a tail flit (Type, Tail, VC)]

Packet: An element of information that a processing element (PE) sends to another PE. A packet may consist of a variable number of flits.

Flit: The elementary unit of information exchanged in the communication network in a clock cycle.
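The packet and flit definitions above can be sketched as a small data structure. The field names (`kind`, `vc`, `dest`, `data`), the 4-byte flit payload and both helper functions are assumptions chosen for illustration; the lecture does not fix a flit format beyond the Type, Dest./Body/Tail and VC fields.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Flit:
    kind: str                 # "HEAD", "BODY" or "TAIL" (the Type field)
    vc: int                   # virtual channel identifier
    dest: Optional[int] = None  # destination PE, carried by the head flit only
    data: bytes = b""         # payload slice, carried by body and tail flits

def packetize(payload: bytes, dest: int, vc: int = 0, flit_bytes: int = 4) -> List[Flit]:
    """Split a non-empty payload into one head flit, body flits and a tail flit."""
    if not payload:
        raise ValueError("this sketch requires a non-empty payload")
    chunks = [payload[i:i + flit_bytes] for i in range(0, len(payload), flit_bytes)]
    flits = [Flit("HEAD", vc, dest=dest)]
    for i, chunk in enumerate(chunks):
        kind = "TAIL" if i == len(chunks) - 1 else "BODY"
        flits.append(Flit(kind, vc, data=chunk))
    return flits

def depacketize(flits: List[Flit]) -> bytes:
    """Reassemble the payload at the destination PE."""
    return b"".join(f.data for f in flits if f.kind != "HEAD")

packet = packetize(b"hello world", dest=5)
print([f.kind for f in packet], depacketize(packet))
```

Because the number of chunks depends on the payload length, the packet naturally consists of a variable number of flits.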

Switching Techniques
• Circuit Switching
• Packet Switching: Routing Protocols
  * Store and Forward: Router cost is packet based. Packet size also affects latency and buffering requirements. Stalling happens at two nodes and the link between them.
  * Wormhole: Router cost is based on the header. The header can affect latency, and buffering at the router is based on the header size. Stalling can happen at all the nodes and links spanned by the packet.
  * Virtual Cut-through: Router cost depends on header and packet size. Stalling happens at the local node level.
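The latency difference between these techniques can be shown with a contention-free model. The sketch below assumes one clock cycle per flit per hop, no blocking, and no router pipeline overhead; the function names are illustrative.

```python
def store_and_forward_latency(hops: int, packet_flits: int) -> int:
    """Cycles when every router must receive the whole packet before forwarding."""
    return hops * packet_flits

def wormhole_latency(hops: int, packet_flits: int) -> int:
    """Cycles when flits are pipelined behind the header.

    The header needs `hops` cycles to arrive; the remaining flits follow
    one per cycle. Virtual cut-through behaves the same when nothing blocks.
    """
    return hops + packet_flits - 1

for hops, flits in ((2, 5), (4, 5), (6, 16)):
    print(hops, flits, store_and_forward_latency(hops, flits), wormhole_latency(hops, flits))
```

In this model store-and-forward latency multiplies hop count by packet size, while wormhole and virtual cut-through add them, which is why packet size matters so much more for store and forward.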

Relevant Parameters: Routing
• Minimum latency is of paramount importance in NOC/SOC (inter-process communication).
• Ideally: 1 clock latency per switch/router (flit enters at time t and exits at t+1)
• Maximum switch clock frequency (technology + routing logic limits)
• Deadlock free
• No flits are ever lost; once a flit is injected into the NOC, it must reach its destination, possibly after a long time.

Fixed Shortest Path Routing
• Suitable for regular topologies, e.g. Mesh, Torus, Tree, etc.
• X-Y routing (first x, then y direction)
• Simple router
• No deadlock scenario
• No retransmission
• No reordering of messages
• Power-efficient
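X-Y routing on a mesh can be sketched in a few lines: a router compares its own coordinates with the destination and moves along x until the columns match, then along y. The port names and the coordinate convention (x grows to the east, y grows to the north) are assumptions for illustration.

```python
def xy_next_port(cur, dst):
    """Output port chosen by the router at `cur` for a flit headed to `dst`."""
    (x, y), (dx, dy) = cur, dst
    if x < dx:
        return "EAST"
    if x > dx:
        return "WEST"
    if y < dy:
        return "NORTH"
    if y > dy:
        return "SOUTH"
    return "LOCAL"  # arrived: deliver to the attached PE

def xy_route(src, dst):
    """Full list of nodes visited from src to dst under X-Y routing."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    path, cur = [src], src
    while True:
        port = xy_next_port(cur, dst)
        if port == "LOCAL":
            return path
        cur = (cur[0] + step[port][0], cur[1] + step[port][1])
        path.append(cur)

print(xy_route((0, 0), (2, 1)))
```

Because every packet between the same pair of nodes takes the same path, messages are not reordered, and because a flit never turns from y back to x, the usual cyclic dependencies behind mesh deadlock cannot form.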

Wormhole Routing
• In wormhole routing a header flit “digs” the path and holds it.
• Successive flits are routed along the same path or direction.
• To handle blocking in a loss-less NoC we need:
  * Buffers
  * A back-pressure mechanism, since FIFOs are not of infinite size
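One common way to implement back pressure is credit-based flow control, sketched below. The lecture does not say which mechanism is intended, so this is an assumed example: the sender holds one credit per free slot in the downstream FIFO and stalls when it runs out. The class and method names are hypothetical.

```python
from collections import deque

class CreditLink:
    """A link whose receiver has a bounded FIFO guarded by credits."""

    def __init__(self, depth: int):
        self.fifo = deque()
        self.credits = depth  # free downstream slots known to the sender

    def try_send(self, flit) -> bool:
        """Send one flit if a credit is available; otherwise signal a stall."""
        if self.credits == 0:
            return False  # back pressure: the sender must hold the flit
        self.credits -= 1
        self.fifo.append(flit)
        return True

    def drain(self):
        """Receiver forwards one flit downstream and returns a credit."""
        if not self.fifo:
            return None
        self.credits += 1
        return self.fifo.popleft()

link = CreditLink(depth=2)
for flit in ("HF", "F2", "F3"):
    print(flit, "sent" if link.try_send(flit) else "stalled")
```

No flit is ever dropped: when the FIFO is full the stall propagates backwards along the path held by the header flit.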

Wormhole
[Figure: sequence of animation frames showing the flits of one packet (HF, F2, F3, F4, TF) advancing from Src to Dest]

Page 67: NOC: Networks on Chip MPSoC:Multiprocessor System on Chipcourses/ee8205/lectures/SOC-NOC.pdf · NOC and SOC Design 16 Multiple Processor/Core System-on-Chip Inter-node communication

NOC and SOC Design 67

Hot Potato: Deadlock Free Routing

Deflection Routing
• Every flit can be routed in a different direction (no packet notion at the switch level).
• If the optimal direction is blocked, the flit is "deflected" to another direction.
• Switch latency of 1 clock cycle whatever the level of congestion.
• Minimum buffer requirements.
• Packet reordering
• Adaptive routing
• No buffering
• No back pressure
• Works with Torus/Mesh

Wormhole Routing
• No packet reordering
• Static routing
• Buffering (≥ 2 flits/port)
• Back pressure
• XY routing needs mesh
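The contrast between static XY routing and deflection can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the port names, the `busy` set, and the use of the XY choice as the "optimal direction" are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

Coord = Tuple[int, int]

# Assumed port names and the (dx, dy) step each one takes on a 2D mesh.
STEPS: Dict[str, Coord] = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def xy_route(cur: Coord, dst: Coord) -> str:
    """Static XY routing: correct the X offset first, then the Y offset."""
    if cur[0] != dst[0]:
        return "E" if dst[0] > cur[0] else "W"
    if cur[1] != dst[1]:
        return "N" if dst[1] > cur[1] else "S"
    return "LOCAL"  # already at the destination; eject to the core


def deflect_route(cur: Coord, dst: Coord, busy: Set[str], size: int) -> str:
    """Hot-potato choice: preferred port if free, otherwise any free port."""
    preferred = xy_route(cur, dst)
    if preferred == "LOCAL" or preferred not in busy:
        return preferred
    for port, (dx, dy) in STEPS.items():
        x, y = cur[0] + dx, cur[1] + dy
        # Only ports that stay inside the size x size mesh are candidates.
        if port not in busy and 0 <= x < size and 0 <= y < size:
            return port  # deflected: the flit keeps moving instead of waiting
    raise RuntimeError("no free output port")
```

With the East port busy, `deflect_route((1, 1), (3, 1), {"E"}, 4)` sends the flit out of another port rather than buffering it, which is why flits of one packet can arrive out of order.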

Hot-Potato

[Figure, slides 68 to 77: animation frames of hot-potato routing between Src and Dest. The flits HF, F2, F3, and TF move independently through the network. In the final frame they appear at Dest in the order F2, HF, TF, F3.]


NOC and SOC Design 78

Network-on-Chip


NOC and SOC Design 79

Core to Network Connection


NOC and SOC Design 80

NOC Switch/Router

Generic Router/Switch


NOC and SOC Design 81

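Another Generic Router with Virtual Channels

[Figure: block diagram of a generic virtual-channel router. Labels recovered from the diagram: Input 0 (From West), Input 1 (From North), Input N (From PE); VCID; Demux; virtual-channel buffers VC0 to VC(V-1) on each input; Mux; Routing Logic; VC Allocator (VA); Switch Allocator (SA); Scheduling; Full Crossbar (5x5); signals Flit_in, Credit_out, Credit_in, Output VC Resv_State.]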
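The Credit_in and Credit_out signals in the router diagram are consistent with credit-based flow control, although the slide does not spell out the protocol. The sketch below is therefore an assumption: a hypothetical `CreditCounter` that an upstream router might keep for one downstream virtual-channel buffer.

```python
class CreditCounter:
    """Upstream-side view of one downstream virtual-channel buffer (assumed model)."""

    def __init__(self, buffer_depth: int) -> None:
        self.credits = buffer_depth  # free flit slots believed to exist downstream

    def can_send(self) -> bool:
        return self.credits > 0

    def send_flit(self) -> None:
        if not self.can_send():
            raise RuntimeError("no credit: downstream buffer may be full")
        self.credits -= 1  # one slot is now occupied downstream

    def receive_credit(self) -> None:
        self.credits += 1  # downstream forwarded a flit and freed a slot
```

The sender stalls when `can_send()` is false, which is the back pressure that wormhole routing relies on.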


NOC and SOC Design 82

A Typical Router Pipeline

FLIT IN → ROUTING & BUFFERS → VC ALLOCATION → ARBITRATION → SWITCH TRAVERSAL → FLIT OUT
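As a rough illustration of what the pipeline depth costs, the sketch below estimates the zero-load latency of a packet. It assumes one clock cycle per pipeline stage, link traversal folded into the last stage, and no contention; none of these figures come from the slide, and the function name is hypothetical.

```python
def zero_load_latency(routers: int, flits: int, stages: int = 4) -> int:
    """Cycles for a whole packet to cross `routers` routers with no contention."""
    head_latency = routers * stages  # header flit pays every stage at every router
    serialization = flits - 1        # remaining flits follow one cycle apart
    return head_latency + serialization
```

Under these assumptions a 5-flit packet crossing 3 four-stage routers needs 3 × 4 + 4 cycles.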


NOC and SOC Design 83

VC: Virtual-Channels


NOC and SOC Design 84

CAD Problems for NOC

• Application Mapping (map tasks to cores)
• Floorplanning/Placement (within the network)
• Routing (of messages)
• Buffer Sizing (size of FIFO queues in the routers)
• Timing Closure (link bandwidth capacity allocation)
• Simulation (network simulation for traffic, delay, power modeling)
• Testing

… Combined with problems of designing the NOC itself (topology synthesis, switching, virtual channels, arbitration, flow control, …)


NOC and SOC Design 85

Topology Generation and Analysis

Aim:
• Generate a viable network topology.
• Analyze the generated topology.

Targeted Network:
• Best-effort, wormhole switched.
• Lookup table based source routing.
• No virtual channel support.
• Round Robin switch output arbitration.
• One NI per component master or slave interface.
• All transactions converted to packets of the same length (flit count).
• Burst beats converted to separate packets.
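Two of the targeted network's features, lookup table based source routing and round robin output arbitration, can be sketched as follows. The route table contents, destination names, and class names are hypothetical and serve only to illustrate the mechanisms.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table held in a source NI: destination -> output port at each hop.
ROUTE_TABLE: Dict[str, List[int]] = {"mem": [2, 0, 1], "dsp": [3, 1]}


def build_header(dest: str) -> List[int]:
    """Source routing: the whole path is looked up once and carried in the header."""
    return list(ROUTE_TABLE[dest])


def next_port(header: List[int]) -> Tuple[int, List[int]]:
    """Each switch consumes the first entry and forwards the rest."""
    return header[0], header[1:]


class RoundRobinArbiter:
    """Grants one requesting input per cycle, rotating priority after each grant."""

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        self.last = num_inputs - 1  # so input 0 has priority first

    def grant(self, requests: List[bool]) -> Optional[int]:
        for offset in range(1, self.num_inputs + 1):
            candidate = (self.last + offset) % self.num_inputs
            if requests[candidate]:
                self.last = candidate
                return candidate
        return None  # nobody requested this output
```

Because the last granted input moves to the lowest priority, two inputs that request the same output continuously are served alternately.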


NOC and SOC Design 86

System Input and Output

Input:
• Core Graph
• Network Parameters

Output:
• Topology Graph
• Route tables
• Recommended Operating Clock Frequency


NOC and SOC Design 87

Topology Generation

Aims:
• Provide physical links.
• Minimize latency on select paths.
• Use a minimum of resources.

Two algorithms are used:
• ALG1: Point-to-Point Oriented Topologies.
• ALG2: Partitioned Crossbar Topologies.

Heuristic approach.


NOC and SOC Design 88

Point-to-Point Oriented Topologies


NOC and SOC Design 89

Partitioned Crossbar Topologies

• Initial topology: Fully-Connected Crossbar (single switch).
• Ideal latency situation.
• May violate maximum port requirement.
• Partitioning process.
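The slide does not describe the partitioning heuristic itself, so the following is only a naive stand-in that shows the port constraint at work. It assumes the single crossbar is cut into a chain of switches, each reserving two ports for its neighbours; a real heuristic would presumably use the traffic in the core graph to decide which cores share a switch. All names are hypothetical.

```python
from typing import List, Sequence


def partition_crossbar(cores: Sequence[str], max_ports: int) -> List[List[str]]:
    """Split one big switch into a chain of switches that respect max_ports."""
    if len(cores) <= max_ports:
        return [list(cores)]  # the single crossbar already fits
    if max_ports < 3:
        raise ValueError("a chained switch needs 2 link ports plus 1 core port")
    per_switch = max_ports - 2  # reserve two ports for neighbouring switches
    return [list(cores[i:i + per_switch]) for i in range(0, len(cores), per_switch)]


def ports_used(groups: List[List[str]]) -> List[int]:
    """Core ports plus inter-switch link ports for each switch in the chain."""
    last = len(groups) - 1
    return [len(g) + (i > 0) + (i < last) for i, g in enumerate(groups)]
```

For ten cores and a five-port limit this yields four switches, none of which exceeds five ports, at the price of extra hops compared with the single-switch case.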


NOC and SOC Design 90

Topology Analysis

Aim:
• Estimate achievable performance.
• Account for interference in the system.

Use of Petri Nets.

Partitioned analysis:
• Analyze components in isolation.
• Sum contention effects across paths.

Two Stages:
• Frequency selection.
• Path verification.


NOC and SOC Design 91

Verification Process

Verify all path latencies:
• Write packet latency.
• Read packet latency.

Adjust delays based on contention.

Contention Areas:
• Switch output.
• Destination NI.
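One way to picture the verification step is to add a per-point contention delay to each path's contention-free latency and compare the result with the path's requirement. The sketch below assumes delays are expressed in clock cycles and requirements in seconds; the dictionary layout and names are hypothetical, and the slides obtain the contention figures from Petri net analysis rather than from fixed numbers.

```python
from typing import Dict, List


def path_latency(base_cycles: int, contention_points: List[str],
                 contention_delay: Dict[str, int]) -> int:
    """Contention-free latency plus the delay estimated at each contention point."""
    return base_cycles + sum(contention_delay[p] for p in contention_points)


def verify_paths(paths: Dict[str, dict], contention_delay: Dict[str, int],
                 clock_hz: float) -> Dict[str, bool]:
    """True for every path whose adjusted latency meets its deadline in seconds."""
    result = {}
    for name, p in paths.items():
        cycles = path_latency(p["base_cycles"], p["points"], contention_delay)
        result[name] = cycles / clock_hz <= p["max_latency_s"]
    return result
```

Each path lists the switch outputs and the destination NI it passes through, so contention effects are summed along the path as the analysis slide describes.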


NOC and SOC Design 92

Contention Estimation


NOC and SOC Design 93

Frequency Selection

Cyclical relation between contention and frequency.

Frequency is fixed before contention is analyzed.

To find minimum valid frequency:
• Interval halving process.
• Large number of frequency points.
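Interval halving can be sketched as a bisection over the clock frequency. The example assumes that validity is monotonic (every frequency above the minimum valid one also passes) and takes the path verification as an opaque `is_valid` callback; the function name, bounds, and tolerance are hypothetical.

```python
from typing import Callable


def min_valid_frequency(is_valid: Callable[[float], bool], lo_hz: float,
                        hi_hz: float, tol_hz: float = 1e5) -> float:
    """Interval halving for the lowest frequency at which all paths verify."""
    if not is_valid(hi_hz):
        raise ValueError("no valid frequency in the search interval")
    while hi_hz - lo_hz > tol_hz:
        mid = (lo_hz + hi_hz) / 2
        if is_valid(mid):
            hi_hz = mid  # mid works, so the minimum is at or below it
        else:
            lo_hz = mid  # mid fails, so the minimum is above it
    return hi_hz
```

Each iteration fixes one candidate frequency, runs the contention analysis at that frequency, and halves the interval, so a fine tolerance is reached after a few dozen evaluations.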


NOC and SOC Design 94

Simulation Environment

SystemC based.

Collection of models:
• Generators and Sinks.
• Master and Slave NIs.
• Various Switches.

AMBA AXI protocol implemented.


NOC and SOC Design 95

Results

Applications and generated topologies.

Comparative results.

Resource Usage.

Accuracy tests.


NOC and SOC Design 96

MPEG4 Decoder

Clock Frequency: 3.43 GHz

[Figure: panels A) and B)]


NOC and SOC Design 97

MWD Application

Clock Frequency: 573.4 MHz

[Figure: panels A) and B)]


NOC and SOC Design 98

AV Benchmark


NOC and SOC Design 99

AV Topologies

[Figure: panels A) and B)]

Clock Frequency: 2.31 GHz


NOC and SOC Design 100

Comparative Results I


NOC and SOC Design 101

Comparative Results II


NOC and SOC Design 102

Resource Usage

Topology: Mesh, Fat Tree, Custom 1, Custom 2
MPEG4 Decoder: 46, 44, 22, 14
MWD Application: 59, 47, 13, 17
AV Benchmark: 87, 67, 25


NOC and SOC Design 103

Accuracy Test Results I


NOC and SOC Design 104

Accuracy Test Results II