Introduction

Page 1: Introduction

Introduction

Linus Svensson, D5, [email protected]

Åke Östmark, D5, [email protected]

Page 2: Introduction

Why We Are Here

  • The architecture of a Network Processor Unit (NPU)
  • Master’s thesis - a joint operation between Luleå University of Technology and SwitchCore AB

Page 3: Introduction

Today's Topics

  • Background
      • Ethernet and internetworks
      • Switches and routers
  • NPU (Network Processor Unit)
      • Why an NPU?
      • Pros and cons of NPUs
  • The architecture of our NPU
      • Design difficulties and design choices
      • The architecture, strengths and weaknesses
  • The big picture
      • From idea to silicon

Page 4: Introduction

Ethernet

  • Most widespread network technology used in LANs (Local Area Networks)
      • 10 Mb/s (Ethernet)
      • 100 Mb/s (Fast Ethernet)
      • 1000 Mb/s (Gigabit Ethernet)
  • Packet-switched network
      • Host-to-host delivery on the same network
      • Switches forward packets from one section to another using the datagram paradigm

Page 5: Introduction

Ethernet

  • Datagram paradigm
      • Each packet contains enough information for a switch to forward it correctly
      • I.e. the packet contains the complete destination address
  • Ethernet packets = frames
      • In Ethernet the packets are referred to as frames

Page 6: Introduction

Ethernet Frame Format

  • Preamble
      • 64 bits used for synchronisation
  • Header
      • 48-bit globally unique destination address
      • 48-bit globally unique source address
      • 16-bit type field used for classification

Field:  Preamble | Dest addr | Source addr | Type | Body    | CRC
Bytes:  8        | 6         | 6           | 2    | 46-1500 | 4
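The header fields listed above map onto a fixed 14-byte structure. The following is a minimal Python sketch of splitting a frame into those fields, assuming the frame is handed over with the preamble already stripped. The names `parse_ethernet_header` and `format_mac` and the example addresses are illustrative only and are not part of the thesis design.

```python
import struct

ETH_HEADER_LEN = 14  # 6 (dest addr) + 6 (source addr) + 2 (type) bytes


def format_mac(raw: bytes) -> str:
    """Render a 6-byte address as colon-separated hex."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet_header(frame: bytes):
    """Split a frame (preamble already stripped) into header fields and the rest."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    dest, src, eth_type = struct.unpack("!6s6sH", frame[:ETH_HEADER_LEN])
    return format_mac(dest), format_mac(src), eth_type, frame[ETH_HEADER_LEN:]


# Made-up example: broadcast destination, invented source, type 0x0800 (IPv4)
example = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"payload"
dest, src, eth_type, rest = parse_ethernet_header(example)
print(dest, src, hex(eth_type), rest)
```

The type field is what a switch or router would use to classify the frame, for example to decide whether the body holds an IP packet.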

Page 7: Introduction

Ethernet Frame Format

  • Body
      • 46-1500 bytes of data
  • CRC
      • 32-bit CRC (Cyclic Redundancy Check) for error detection

Field:  Preamble | Dest addr | Source addr | Type | Body    | CRC
Bytes:  8        | 6         | 6           | 2    | 46-1500 | 4
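The error-detection role of the CRC field can be tried out in software. This is a hedged sketch, assuming the field holds the standard CRC-32 (the one `zlib.crc32` computes) over everything from the destination address through the body, appended least-significant byte first. The helper names `append_fcs` and `fcs_ok` are illustrative.

```python
import struct
import zlib


def append_fcs(frame_without_crc: bytes) -> bytes:
    """Append a 4-byte CRC-32 to a frame (dest addr through body)."""
    crc = zlib.crc32(frame_without_crc) & 0xFFFFFFFF
    return frame_without_crc + struct.pack("<I", crc)


def fcs_ok(frame_with_crc: bytes) -> bool:
    """Recompute the CRC over the frame and compare it with the trailing 4 bytes."""
    data, received = frame_with_crc[:-4], frame_with_crc[-4:]
    return struct.pack("<I", zlib.crc32(data) & 0xFFFFFFFF) == received


# 14 header bytes + 46 body bytes (all zero here) + 4 CRC bytes = 64-byte frame
frame = append_fcs(bytes(60))
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip a single bit
print(fcs_ok(frame), fcs_ok(corrupted))
```

A receiver that finds a mismatch simply drops the frame; the CRC detects errors but does not correct them.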

Page 8: Introduction

Internetworks

  • Internetwork
      • Several physical networks combined into one logical internetwork
      • Also called an internet (with lowercase “i”)
      • The most famous is the world-spanning Internet (with capital “I”)
  • Host-to-host delivery between different networks

Page 9: Introduction

Internet Protocol (IP)

  • Most widespread protocol used in internetworks
  • Routers forward packets from one network to another using the datagram paradigm

Page 10: Introduction

IP Packet Format

  • 12 bytes of status fields, e.g. version, length etc.
  • 32-bit globally unique source address
  • 32-bit globally unique destination address
  • Optional fields of variable length
  • Body

Field:  Ver, len etc | Source addr | Dest addr | Opt + Body
Bytes:  12           | 4           | 4         | 0-65515
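The packet layout above (12 bytes of status fields, two 4-byte addresses, then options and body) can be decoded with a fixed-format unpack. The sketch below assumes the standard IPv4 field breakdown for the first 12 bytes; the function name and the example addresses are made up for illustration.

```python
import ipaddress
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header plus the option length."""
    if len(packet) < 20:
        raise ValueError("packet shorter than the fixed IPv4 header")
    # 12 bytes of version/length/etc., then 4-byte source and 4-byte destination
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (ver_ihl & 0x0F) * 4          # header length is counted in 32-bit words
    return {
        "version": ver_ihl >> 4,
        "header_len": header_len,
        "options_len": header_len - 20,
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }


# Made-up header: version 4, no options, TTL 64, protocol 17
header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 17, 0,
                     bytes([10, 0, 0, 1]), bytes([192, 168, 1, 7]))
print(parse_ipv4_header(header))
```

The upper bound of 65515 bytes in the table follows from the 16-bit total length field (65535) minus the 20 fixed header bytes.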

Page 11: Introduction

IP Over Ethernet

  • IP packets are encapsulated in Ethernet frames

Ethernet frame:                  Preamble | Dest addr | Source addr | Type | Body | CRC
IP packet (in the frame body):   Ver, len etc | Source addr | Dest addr | Opt | Body

Page 12: Introduction

Host-To-Host Communication

[Figure: hosts (H), switches (S) and routers (R) spanning Network 1, Network 2 and Network 3]

Page 13: Introduction

Devices

  • SwitchCore CXE-2010
      • A 16-port Gigabit Ethernet switch-on-a-chip
      • Full 4K VLAN support
      • Includes support for IEEE 802.1p
  • Cisco 1710
      • Security Access Router
      • Secure Internet, intranet, and extranet access with VPN and firewall
      • Advanced QoS features

Page 14: Introduction

Features

  • What if we want:
      • Load balancing
          • Distributing client requests across multiple servers
      • Multi-Protocol Label Switching (MPLS)
          • Next hop based on the label
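Both features come down to small table lookups in the forwarding path. This is a rough sketch, assuming a hash-based choice of server and a flat label table. The server addresses, labels and port names are invented for illustration and do not come from the thesis.

```python
import zlib

SERVERS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]   # hypothetical server pool

# Hypothetical label table: incoming label -> (outgoing label, output port)
LABEL_TABLE = {
    100: (200, "port2"),
    101: (300, "port5"),
}


def pick_server(client_ip: str, client_port: int) -> str:
    """Load balancing: hash the client flow so one flow always reaches the same server."""
    key = f"{client_ip}:{client_port}".encode()
    return SERVERS[zlib.crc32(key) % len(SERVERS)]


def switch_label(in_label: int):
    """MPLS-style forwarding: the next hop depends only on the incoming label."""
    return LABEL_TABLE[in_label]


print(pick_server("192.0.2.7", 40000))
print(switch_label(100))
```

On a fixed-function switch or router such behaviour has to be designed into the chip, which is the gap a programmable device is meant to fill.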

Page 15: Introduction

Features

  • What if we don’t want:
      • QoS
      • Security features
  • The Network Processor Unit (NPU)
      • A programmable CPU chip that is optimized for networking and communications functions
      • Quick adaptation to new standards/features

Page 16: Introduction

Conditions For the Work

  • 1 GE (1000 Mbit/s) port
  • 8 FE (100 Mbit/s) ports
  • Scalable
      • Add more ports
      • Remove ports
  • Feasible to make an ASIC prototype

Page 17: Introduction

NPU components:

  • Processor core
  • Embedded software
  • Network interface
  • Packet buffers
  • Queues
  • Tables
  • Switch fabric

Page 18: Introduction

Design Choices

  • Processor core
      • RISC based
      • Network specific
  • Network interface
      • FE
          • MII (Media Independent Interface)
          • RMII (Reduced MII)
      • GE
          • GMII (Gigabit MII)
          • RGMII (Reduced GMII)

Page 19: Introduction

Design Choices

  • Queues
      • Hold packets that are ready for transmission
  • Tables
      • Data structures for IP & MAC addresses
  • Switch fabric
      • The internal interconnect architecture
      • How to transport a packet from in-port to out-port?

Page 20: Introduction

Design Choices

  • Packet buffers
      • Internal and/or external
      • How many times do we need to access a (buffer) memory?
          • Write when receiving from the network
          • Read the packet for processing
          • Write the modified packet for transmission
          • Read the packet when transmitting
      • For N ports the memory needs to run at 4N times the port speed
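The 4N rule is plain arithmetic: four buffer accesses per packet, multiplied by the combined speed of the ports that share the memory. The sketch below works it through for the port counts used in this project; the function name is illustrative.

```python
def buffer_bandwidth_mbps(port_speeds_mbps, accesses_per_packet=4):
    """Bandwidth a single shared packet buffer must sustain, in Mb/s."""
    return accesses_per_packet * sum(port_speeds_mbps)


# N identical ports: 4 * N * port speed, here for 8 FE ports
print(buffer_bandwidth_mbps([100] * 8))
# Mixed ports: 8 FE ports plus 1 GE port sharing one buffer
print(buffer_bandwidth_mbps([100] * 8 + [1000]))
```

Splitting the ports over several smaller buffers lowers the `sum` that each individual memory has to serve.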

Page 21: Introduction

Design Choices

  • 8 FE ports
  • 1 GE port
  • Inter-arrival time:
      • $1.5 \times 10^{6} + 8 \times 1.5 \times 10^{5} = 2.7 \times 10^{6}$ packets/s
      • -> New packet every 370 ns
  • Cycle budget example:
      • 100 MHz -> 37 cycles to process every packet
      • 200 MHz -> 74 cycles to process every packet
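The packet rate and cycle budget can be rechecked with a few lines of Python. The sketch assumes that a minimum-size frame occupies 84 bytes on the wire (64-byte frame, 8-byte preamble, 12-byte inter-frame gap); the slide rounds the resulting rates to 1.5 million packets/s per GE port and 150 thousand per FE port, so the unrounded figures printed here differ slightly from 370 ns.

```python
# Assumption: a minimum-size frame occupies 84 bytes on the wire
# (64-byte frame + 8-byte preamble + 12-byte inter-frame gap).
MIN_WIRE_BITS = 84 * 8


def packets_per_second(port_speeds_bps):
    """Worst-case packet rate when every port receives minimum-size frames."""
    return sum(speed / MIN_WIRE_BITS for speed in port_speeds_bps)


def cycles_per_packet(clock_hz, port_speeds_bps):
    """Clock cycles available per packet for a single processor."""
    return clock_hz / packets_per_second(port_speeds_bps)


ports = [1e9] + [100e6] * 8            # 1 GE port and 8 FE ports
rate = packets_per_second(ports)
print(f"{rate:.3g} packets/s, one every {1e9 / rate:.0f} ns")
for clock in (100e6, 200e6):
    print(f"{clock / 1e6:.0f} MHz -> {int(cycles_per_packet(clock, ports))} cycles")
```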

Page 22: Introduction

Design Choices

  • Model of operation
      • Route processing
      • Packet forwarding
          • ~200 cycles
      • Special services
  • Target technology
      • ~150 MHz

Page 23: Introduction

Design Decisions

  • 2 FE ports
      • 125 MHz
      • 1 Integer Unit
  • 1 GE port
      • 125 MHz
      • 5 Integer Units
  • -> Cycle budget of 420 for each packet
  • Interactive voice can tolerate somewhere between 100 and 200 milliseconds of end-to-end delay without people noticing it.
      • 420 cycles -> 0.00336 ms
  • Parallel Processor Architecture
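One way to arrive at the 420-cycle figure is sketched below. It assumes minimum-size frames of 84 bytes on the wire (64-byte frame, 8-byte preamble, 12-byte inter-frame gap) and packets spread evenly over the integer units; both are assumptions of this sketch, and the function names are illustrative.

```python
MIN_WIRE_BITS = 84 * 8   # assumption: 64-byte frame + 8-byte preamble + 12-byte gap


def cycle_budget(clock_hz: int, port_speed_bps: int, ports: int, integer_units: int) -> int:
    """Cycles each integer unit may spend per minimum-size packet."""
    return integer_units * clock_hz * MIN_WIRE_BITS // (ports * port_speed_bps)


def cycles_to_ms(cycles: int, clock_hz: int) -> float:
    """Time taken by a number of clock cycles, in milliseconds."""
    return cycles / clock_hz * 1000


print(cycle_budget(125_000_000, 100_000_000, ports=2, integer_units=1))    # 2 FE ports
print(cycle_budget(125_000_000, 1_000_000_000, ports=1, integer_units=5))  # 1 GE port
print(cycles_to_ms(420, 125_000_000))
```

Under these assumptions both configurations give the same per-packet budget, and the processing time stays several orders of magnitude below the 100 to 200 ms that interactive voice tolerates.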

Page 24: Introduction

Design Decisions

  • Tables
      • MAC address lookup, fixed length:
          • CAM (Content Addressable Memory)
              • Pros: Fast
              • Cons: Expensive
              • Like a cache
      • IP address lookup, longest match:
          • Possibly large table
          • External SRAM
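The two lookup styles differ in kind: a MAC lookup is an exact match on a fixed-length key, while an IP lookup has to find the longest matching prefix. The software sketch below models the CAM with a dictionary and uses a simple linear scan for the longest match; the table entries and port names are invented, and a hardware design would use a more efficient structure than a scan.

```python
import ipaddress

# Exact match on a fixed-length key: what a CAM does in hardware, modelled with a dict.
mac_table = {"00:11:22:33:44:55": "port3"}   # hypothetical entry

# Longest-prefix match: hypothetical routes, the most specific prefix wins.
routes = {
    ipaddress.ip_network("0.0.0.0/0"): "port0",
    ipaddress.ip_network("10.0.0.0/8"): "port1",
    ipaddress.ip_network("10.1.0.0/16"): "port2",
}


def lookup_mac(mac: str):
    """Return the output port for a MAC address, or None if it is unknown."""
    return mac_table.get(mac)


def lookup_ip(addr: str):
    """Linear-scan longest-prefix match over the route table."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in routes if ip in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda net: net.prefixlen)]


print(lookup_mac("00:11:22:33:44:55"))
print(lookup_ip("10.1.2.3"), lookup_ip("10.2.0.1"), lookup_ip("8.8.8.8"))
```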

Page 25: Introduction

  • Internal packet buffers:
      • Pros: Fast, lower pin count
      • Cons: Limited size of memory
  • 2 FE ports / 1 buffer:
      • Pros: Reduces contention, reduces the 4N problem
      • Cons: Less effective use of memory

[Figure: input MACs and packet buffers; labels: MAC, Shared memory, Packet buffer, Input]

Page 26: Introduction

  • Virtual output queues:
      • Pros: No Head Of Line (HOL) blocking; possible to select any packet from buffer memory
      • Cons: Expensive in hardware

[Figure: input MACs, packet buffers, virtual output queues 1-4 and output MACs]
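Head-of-line blocking and the way virtual output queues avoid it can be shown with a toy model. This is a sketch under simplifying assumptions (one input, two outputs, one of them busy); the function names and the traffic are made up, and the sketch says nothing about how the queues are built in hardware.

```python
from collections import deque


def fifo_send(queue, busy_outputs):
    """Single input FIFO: only the head packet may leave, so a busy output blocks all."""
    if queue and queue[0][0] not in busy_outputs:
        return queue.popleft()
    return None


def voq_send(voqs, busy_outputs):
    """Virtual output queues: serve the first non-empty queue whose output is free."""
    for output, queue in voqs.items():
        if queue and output not in busy_outputs:
            return queue.popleft()
    return None


packets = [(1, "a"), (2, "b"), (1, "c")]      # (output port, payload), made-up traffic

fifo = deque(packets)
voqs = {1: deque(), 2: deque()}
for pkt in packets:
    voqs[pkt[0]].append(pkt)

busy = {1}                                     # output 1 cannot accept a packet right now
print(fifo_send(fifo, busy))                   # head is for output 1, so nothing is sent
print(voq_send(voqs, busy))                    # the packet for output 2 can still leave
```

The cost named on the slide shows up even here: one queue per output has to be maintained instead of a single FIFO.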

Page 27: Introduction

NPU Architecture

[Figure: block diagram of the NPU with Receiving Units (RU), Processing Units (PU), Switching Fabric (SF) and Transmitting Units (TU), plus CAM and SRAM as Shared Resources; two 1.8 Gbps annotations]

Page 28: Introduction

PU with 1xIU

[Figure: block diagram of a processing unit with one integer unit. Blocks: MIPS IU, Arb, MemCtrl (Instr), MemCtrl (Data), Frame Engine, Transmitter, 8kB SRAM, two 1kB SRAMs, CAM I/O, Shared SRAM I/O. Bus width labels: 128 (from RU), 128, 32, 24, 32 (to SF)]

  • 420 cycles / min size packet
  • 1 transmit / 20 cycles (FE) or 1 transmit / 4 cycles (GE)
  • 3 accesses / 40 cycles (not counting accesses from IU)

Page 29: Introduction

PU with 5xIU

[Figure: block diagram of a processing unit with five integer units. Blocks: MIPS IU, several Arb blocks, MemCtrl (Instr), MemCtrl (Data), Frame Engine, Transmitter, 32kB SRAM, two 1kB SRAMs, CAM I/O, Shared SRAM I/O. Bus width labels: 512 (from RU), 512, 32, 24, 32 (to SF)]

  • 420 cycles / min size packet
  • 1 access / 32 cycles (not counting accesses from IUs)
  • 1 transmit / 5 cycles

Page 30: Introduction

Performance

[Figure: plot of Cycles (y-axis, 0-250) against Frames (x-axis, 50-200) with three series: IP in shared SRAM, IP in internal SRAM, MAC in shared CAM]

Page 31: Introduction

Strengths in the Architecture

  • More bandwidth
      • More RU and TU
      • New types of RU and TU
  • More processing power
      • More PU per RU/TU
      • More IU per PU
      • New types of PU
      • New types of IU

Page 32: Introduction

Strengths in the Architecture

  • New functionality
      • New types of shared resources
          • Semaphores
          • Multipurpose CPU
  • New software
      • All IUs can run different software

Page 33: Introduction

Weaknesses in the Architecture

  • Not everything scales well
      • Shared resources
      • No. of IUs in a PU

Page 34: Introduction

ASIC design flow

From Idea to Silicon

  • Design Specification
  • Design Entry: VHDL/Verilog
  • Logic Synthesis: transfer to target technology (TSMC 0.18)
  • Floor-planning: arrange blocks on chip
  • Placement: decide location of cells in a block
  • Routing: make connections between cells and blocks
  • Circuit extraction
  • Postlayout simulation
  • Finished

Page 35: Introduction

    -- Combinational ALU: the operation is selected by In_Ctrl_Ex.OP
    ALU : process(alu_RegA, alu_RegB, In_Ctrl_Ex)
    begin
      case In_Ctrl_Ex.OP is
        when ALU_ADD => alu_Result <= alu_RegA + alu_RegB;
        when ALU_SUB => alu_Result <= alu_RegA - alu_RegB;
        when ALU_AND => alu_Result <= alu_RegA and alu_RegB;
        when ALU_OR  => alu_Result <= alu_RegA or alu_RegB;
        when ALU_XOR => alu_Result <= alu_RegA xor alu_RegB;
        when ALU_NOR => alu_Result <= alu_RegA nor alu_RegB;
        -- Any other opcode: drive don't-care so synthesis is free to optimise
        when others  => alu_Result <= (others => '-');
      end case;
    end process;

Layout: 2.6 x 2.6 mm