David M. Zar [email protected] http://www.arl.wustl.edu/projects/techX Block Design Review: PlanetLab Line Card Header Format



Revision History

10/31/06 (DMZ):
» Initial Draft
11/04/06 (DMZ):
» Updates for performance issues


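Line Card Centric Overview

[Figure: block diagram of the line card. Ingress blocks: Phy Int Rx, Key Extract, Lookup, Hdr Format, Port Splitter, QM/Schd, Switch Tx. Egress blocks: Switch Rx, Key Extract, Lookup, Hdr Format, Port Splitter, QM/Schd, Phy Int Tx. Both directions attach to the SWITCH.]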

Port Splitter (Ingress and Egress):
» Accepts packets on a NN ring
» Based on the physical destination port number:
    0-4 go to QM1 on a scratch ring
    5-9 go to QM2 on a scratch ring
» Measured delay is about 120 cycles, including memory latency
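The Port Splitter rule above can be sketched in a few lines of Python. This is only an illustration of the port-to-QM mapping; the function name and the string labels for the two queue managers are assumptions, and the real block is microengine code operating on NN and scratch rings.

```python
# Illustrative sketch of the Port Splitter routing rule.
# select_queue_manager and the "QM1"/"QM2" labels are hypothetical names.

def select_queue_manager(dest_port: int) -> str:
    """Return the queue manager that serves a physical destination port."""
    if not 0 <= dest_port <= 9:
        raise ValueError(f"physical port out of range: {dest_port}")
    # Ports 0-4 go to QM1, ports 5-9 go to QM2.
    return "QM1" if dest_port <= 4 else "QM2"


if __name__ == "__main__":
    for port in range(10):
        print(port, select_queue_manager(port))
```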


Ingress Header Format


Ingress Header Format

Microengine Usage
» One microengine
» Eight identical threads
» NN ring input from Lookup
» NN ring output to Port Splitter

Main functions:
» Using data from Lookup, modify packet header in DRAM for proper routing to PE:
    Destination MAC address
        First five bytes are same as source MAC address
    Source MAC address
        Address of this LC
    VLAN tag
» Adjust pre-queue stats counters
» Format input data for QM
    QID
    Port Number
    Ethernet Frame Length


LC Ingress Functional Blocks

[Figure: ingress pipeline (Phy Int Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Switch Tx) annotated with the data passed into and out of Hdr Format and with the packet formats.]

Input to Hdr Format (from Lookup):
» Buf Handle (32b)
» IP Pkt Length (16b)
» QID (20b)
» VLAN (16b)
» Stats Index (16b)
» DAddr (8b)
» Port (4b)
» Reserved (8b)
» Eth Hdr Len (8b)

Output from Hdr Format (to QM):
» Stats Index (16b)
» Buffer Handle (32b)
» Frame Length (16b)
» QID (20b), Rsv (4b)
» Port (4b)
» Rsv (4b)

Possible input packet formats: Ethernet frame with Type=IP, or VLAN-tagged Ethernet frame with Type=802.1Q.
Output packet format: VLAN-tagged Ethernet frame.

Packet fields shown:
» Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B); the Type=802.1Q and VLAN fields are absent in the untagged input format
» IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
» UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
» UDP Payload (MN Packet)
» Ethernet Trailer: PAD (nB), CRC (4B)


MAC Address and VLAN Tag (Ingress)

The source MAC address is fixed and set at boot time (_WU_get_mac_address).
The destination MAC address differs from the source MAC address only in the last byte, and this byte is obtained from the Lookup data.
The VLAN tag is obtained from the Lookup data.
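As a rough illustration of the header rewrite described above, the following Python sketch assembles the 18-byte VLAN Ethernet header from the line card MAC address, the DAddr byte and the VLAN tag. The function name and argument names are assumptions made for this example; the actual work is done by the microcode macro _WU_write_vlan_header, whose internals are not shown in this deck. The sample values are taken from the output packet in the Ingress Validation example.

```python
import struct

ETHERTYPE_8021Q = 0x8100  # EtherType that marks a VLAN-tagged frame
ETHERTYPE_IPV4 = 0x0800   # EtherType for the encapsulated IP packet


def build_vlan_header(lc_mac: bytes, daddr: int, vlan: int) -> bytes:
    """Build DstAddr(6B) + SrcAddr(6B) + Type=802.1Q(2B) + VLAN(2B) + Type=IP(2B).

    lc_mac: MAC address of this line card, used as the source address
    daddr:  8-bit value from the Lookup data, used as the last destination byte
    vlan:   16-bit VLAN field from the Lookup data
    """
    if len(lc_mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    # Destination MAC: first five bytes match the source, last byte from Lookup.
    dst_mac = lc_mac[:5] + bytes([daddr & 0xFF])
    tags = struct.pack("!HHH", ETHERTYPE_8021Q, vlan & 0xFFFF, ETHERTYPE_IPV4)
    return dst_mac + lc_mac + tags


if __name__ == "__main__":
    header = build_vlan_header(bytes.fromhex("010203040a0b"), 0x02, 0x0002)
    print(header.hex())
```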


Stats/Counters (Ingress/Egress)

The Stats Index is obtained from the Lookup data.
The pre-queue packet and byte counters are updated (_WU_update_counters):
» Packet counter is incremented (atomic SRAM)
» Byte count is incremented by the number of bytes in the entire Ethernet frame (_WU_get_enet_frame_length).
    Frame_length = IP_pkt_len + 18
    18 is the VLAN Ethernet header length
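The counter arithmetic can be sketched as follows. This is a plain-Python model, not the microcode: the dictionary standing in for the SRAM counter structure and both function names are assumptions for illustration. The sample IP packet length of 0x38 (56 bytes) comes from the packet in the Ingress Validation example, giving a 74-byte frame.

```python
# 6B DstAddr + 6B SrcAddr + 2B Type=802.1Q + 2B VLAN + 2B Type=IP
VLAN_ETH_HEADER_LEN = 18


def enet_frame_length(ip_pkt_len: int) -> int:
    """Frame_length = IP_pkt_len + 18, as stated on the slide."""
    return ip_pkt_len + VLAN_ETH_HEADER_LEN


def update_counters(counters: dict, stats_index: int, ip_pkt_len: int) -> None:
    """Increment the pre-queue packet and byte counters for one packet.

    `counters` is a hypothetical stand-in for the SRAM counter structure,
    keyed by Stats Index.
    """
    entry = counters.setdefault(stats_index, {"pkts": 0, "bytes": 0})
    entry["pkts"] += 1
    entry["bytes"] += enet_frame_length(ip_pkt_len)


if __name__ == "__main__":
    counters = {}
    update_counters(counters, stats_index=7, ip_pkt_len=0x38)
    print(counters)
```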


QM Data Formatting (Ingress and Egress)

QID is extracted from Lookup data.
Port number is extracted from Lookup data.
Total Ethernet frame length is passed to QM.
Stats index is passed on for post-queue counters.

Data passed to QM (96 bits in total):
» Stats Index (16b)
» Buffer Handle (32b)
» Frame Length (16b)
» QID (20b), Rsv (4b)
» Port (4b)
» Rsv (4b)
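The fields listed above add up to 96 bits, so they fit in three 32-bit words. The sketch below packs them that way. Note that the bit positions used here are an assumption made only for illustration: the field widths come from the slide, but the exact placement within each word is not recoverable from the flattened diagram, and the function name is hypothetical.

```python
def pack_qm_words(buf_handle: int, frame_len: int, qid: int,
                  port: int, stats_index: int) -> tuple:
    """Pack the QM input fields into three 32-bit words.

    ASSUMED layout (not confirmed by the source):
      word 0: Buffer Handle (32b)
      word 1: Rsv(4b) | Port(4b) | Rsv(4b) | QID(20b)
      word 2: Stats Index (16b) | Frame Length (16b)
    """
    if not 0 <= buf_handle < 1 << 32:
        raise ValueError("buffer handle must fit in 32 bits")
    if not 0 <= frame_len < 1 << 16:
        raise ValueError("frame length must fit in 16 bits")
    if not 0 <= qid < 1 << 20:
        raise ValueError("QID must fit in 20 bits")
    if not 0 <= port < 1 << 4:
        raise ValueError("port must fit in 4 bits")
    if not 0 <= stats_index < 1 << 16:
        raise ValueError("stats index must fit in 16 bits")
    word0 = buf_handle
    word1 = (port << 24) | qid          # reserved bits are left as zero
    word2 = (stats_index << 16) | frame_len
    return (word0, word1, word2)


if __name__ == "__main__":
    words = pack_qm_words(0x12345678, 74, 0xABCDE, 3, 0x0102)
    print([f"{w:08x}" for w in words])
```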


Ingress HF Block Diagram

[Figure: Ingress Header Format processing flow. Blocks shown: init; dl_source() (wait for prev ctx, NN Dequeue, signal next ctx); _WU_get_enet_frame_length; _WU_write_vlan_header; _WU_update_counters; _WU_update_buffer_descriptor; dl_sink() (wait for prev ctx, NN Enqueue, signal next ctx).]

Memory accesses and cycle counts annotated on the diagram:
» DRAM: 4|5 4B writes, 26 cycles
» SRAM: 1 read, 1 write, 10 cycles
» SRAM: 3 writes, 12 cycles
» Remaining annotations: 10, 5, 2, 1, 17 and 16 cycles

Total cycles: 33 + 66 = 99
Budget: 1400 MHz / (10 Gb/s / (8 × 90 B)) = 100.8 => 100 cycles
Measured latency: 745 cycles
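The budget arithmetic can be checked with a short script. It assumes that the "90" in the budget formula is the packet size in bytes and that the nine cycle annotations on the diagram are the terms of the 99-cycle total; the function and variable names are illustrative only.

```python
def cycle_budget(clock_hz: float, line_rate_bps: float, packet_bytes: int) -> float:
    """Cycles available per packet at a given clock and line rate."""
    packets_per_second = line_rate_bps / (8 * packet_bytes)
    return clock_hz / packets_per_second


# Cycle annotations read off the Ingress HF block diagram.
INGRESS_BLOCK_CYCLES = [26, 10, 12, 10, 5, 2, 1, 17, 16]

if __name__ == "__main__":
    budget = cycle_budget(1400e6, 10e9, 90)
    used = sum(INGRESS_BLOCK_CYCLES)
    print(f"budget: {budget:.1f} cycles per packet")
    print(f"used:   {used} cycles, headroom {int(budget) - used}")
```

With these inputs the per-packet instruction count sits one cycle under the rounded-down budget, which matches the "99 cycles out of 100" remark under Future Work.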


Ingress Validation

Send in non-tunneled packets and check output packets to see they are our internal, tunneled, packets.
» Worked during development but not tested in integrated system at this point.
Send in tunneled packets and check output packets to see they are our internal, tunneled, packets.
» Example:
    Input:
    01020304 05060708 090a0b0c 81000aaa 08004500 00380000 0000ff11 3a61c0a8 0001c0a8 00020001 00010024 ffbd4500 001c0000 0000ff11 3a7dc0a8 0001c0a8 00020001 00020008 7e87 [6d7e d5be]
    (the bracketed bytes are the CRC that’s stripped by RX)
    Output:
    01020304 0a020102 03040a0b 81000002 08004500 00380000 0000ff11 3a61c0a8 0001c0a8 00020001 00010024 ffbd4500 001c0000 0000ff11 3a7dc0a8 0001c0a8 00020001 00020008 7e87


Egress Header Format


Egress Header Format

Microengine Usage
» One microengine
» Eight identical threads
» NN ring input from Lookup
» NN ring output to Port Splitter

Main functions:
» Using data from Lookup, modify packet header in DRAM for proper routing to Switch:
    Destination MAC address
        First five bytes are same as source MAC address
        Destination MAC address is looked up based on IP address from lookup
    Source MAC address
        Address of this LC
    VLAN tag
» Adjust pre-queue stats counters
» Format input data for QM
    QID
    Port Number
    Ethernet Frame Length


LC Egress Functional Blocks

[Figure: egress pipeline (Switch Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Phy Int Tx, attached to the SWITCH) annotated with the data passed into and out of Hdr Format and with the input and output packet formats.]

Input to Hdr Format (from Lookup):
» Buf Handle (32b)
» IP Pkt Length (16b)
» Reserved (8b)
» Eth Hdr Len (8b)
» VLAN (12b)
» QID (20b)
» Rsvd (4b)
» Port (4b)
» Rsvd (4b)
» Stats Index (16b)
» Rsvd (4b)
» IP DAddr (32b)

Output from Hdr Format (to QM):
» Ethernet Frame Length (16b)
» Buffer Handle (32b)
» Stats Index (16b)
» QID (20b), Rsv (4b)
» Port (4b)
» Rsv (4b)

Input packet format and output packet format (both VLAN-tagged Ethernet frames):
» Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
» IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
» UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
» UDP Payload (MN Packet)
» Ethernet Trailer: PAD (nB), CRC (4B)


MAC Address and VLAN Tag (Egress)

The source MAC address is fixed and set at boot time (_WU_get_mac_address).
The destination MAC address differs from the source MAC address only in the last nibble, and this nibble is obtained from the Lookup data.
» _WU_ip_lookup will take 32 bits from the destination IP address and use the local CAM to obtain the least significant 4 bits of the MAC address.
» The CAM state bits are used for this, which is why only 4 bits of data are returned.
The VLAN tag is obtained from the Lookup data.
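The egress destination MAC derivation can be modelled as below. The CAM is represented by a plain dictionary from 32-bit IP destination address to a 4-bit value, which is an assumption made for illustration; the function name is hypothetical, and the behaviour on a CAM miss (here, a KeyError) is not specified in the deck.

```python
def egress_dest_mac(lc_mac: bytes, ip_daddr: int, cam: dict) -> bytes:
    """Derive the destination MAC from the line card MAC and a CAM lookup.

    lc_mac:   6-byte MAC address of this line card (the source address)
    ip_daddr: 32-bit destination IP address from the Lookup data
    cam:      hypothetical model of the local CAM, mapping IP address to
              the 4 state bits returned on a hit
    """
    if len(lc_mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    nibble = cam[ip_daddr] & 0xF  # raises KeyError on a miss (assumed behaviour)
    # Only the least significant 4 bits of the address are replaced.
    last_byte = (lc_mac[5] & 0xF0) | nibble
    return lc_mac[:5] + bytes([last_byte])


if __name__ == "__main__":
    cam = {0xC0A80002: 0x2}  # 192.168.0.2 maps to nibble 0x2 (example values)
    print(egress_dest_mac(bytes.fromhex("010203040a0b"), 0xC0A80002, cam).hex())
```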


Egress HF Block Diagram

[Figure: Egress Header Format processing flow. Blocks shown: init; dl_source() (wait for prev ctx, NN Dequeue, signal next ctx); _WU_get_enet_frame_length; _WU_ip_lookup; _WU_write_vlan_header; _WU_update_counters; _WU_update_buffer_descriptor; dl_sink() (wait for prev ctx, NN Enqueue, signal next ctx).]

Memory accesses and cycle counts annotated on the diagram:
» DRAM: 1 4B read, 4 4B writes, 32 cycles
» SRAM: 1 add, 1 incr, 6 cycles
» SRAM: 3 writes, 10 cycles
» Remaining annotations: 10, 2, 2, 1, 1 and 1 cycles

Total cycles: 65
Measured latency*: ~660 cycles


Egress Validation

Send in our internal, tunneled packets and check output packets to see they are our valid IP, tunneled, packets.
» For the PlanetLab demo, there are no non-tunneled output packets
Check packet and byte counters for valid updates.
Check CAM for proper initialization (data watch).


HF Initialization (Ingress/Egress)

All memory locations defined in dl_system.h:
» Base address for HF: LC[I/E]_HF_SRAM_INIT_BASE
    MAC_ADDR_HI32
    MAC_ADDR_LO16
» Pre-Queue Counters: LC[I/E]_LU_COUNTERS_SRAM_INIT_BASE
    LC[I/E]_LU_PRE_Q_PKT_CNT_OFFSET – offset into counters structure for packet counter
    LC[I/E]_LU_PRE_Q_BYTE_CNT_OFFSET – offset into counters structure for byte counter
Thread 0 waits for signal from rx.
For Egress, the CAM is filled (_WU_hfe_initialize_ip_lookup) with data from LCE_HF_SRAM_INIT_BASE + 8. Each entry is 64 bits: cam_entry (32b), RSVD (28b), MAC_DEST (4b).
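To make the 64-bit CAM initialization entry concrete, here is a small decoder written in Python. It assumes the fields are laid out most significant first in the order listed (cam_entry in the first 32-bit word, MAC_DEST in the low 4 bits of the second word) and that the words are big-endian; neither detail is stated in the deck, and the function name is hypothetical.

```python
import struct


def parse_cam_init_entries(raw: bytes) -> list:
    """Decode a block of 64-bit CAM init entries.

    ASSUMED layout per entry: cam_entry (32b) | RSVD (28b) | MAC_DEST (4b),
    stored as two big-endian 32-bit words.
    Returns a list of (cam_entry, mac_dest) tuples.
    """
    if len(raw) % 8 != 0:
        raise ValueError("entry block must be a multiple of 8 bytes")
    entries = []
    for cam_entry, low_word in struct.iter_unpack("!II", raw):
        entries.append((cam_entry, low_word & 0xF))
    return entries


if __name__ == "__main__":
    block = bytes.fromhex("c0a8000200000002" "c0a8000100000001")
    for cam_entry, mac_dest in parse_cam_init_entries(block):
        print(f"{cam_entry:08x} -> {mac_dest:x}")
```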


File Locations (Ingress and Egress)

Main code
» Applications/LC_Ingress/src/hdr_format/PL/hdr_format.uc
» Applications/LC_Egress/src/hdr_format/PL/hdr_format.uc
Library
» library/DataPlane/hdr_format_util.uc


Required Includes (Ingress and Egress)

Files
» build/PL/dispatch_loop/dl_system.h
    memory locations
» IXA_SDK_4.0/src/library/microblocks_library/
    dl_meta – for metadata macros
» IXA_SDK_4.0/src/library/dataplane_library/
    dram – for DRAM read/write macros
    sram – for SRAM read/write/add/incr macros
    xbuf – for transfer buffer macros


Performance Issues


Ingress Performance Anomalies

These stalls are in various SRAM and DRAM accesses – the command FIFO is FULL!


Ingress Anomalies (Explanation)


These bus arbiters are shared across all memory interfaces.
The SRAM Controllers have a command FIFO.


Ingress/Egress SRAM Issues

It seems that using atomic ADD/INCR instructions is expensive at the SRAM controller.
If I remove them and instead read the SRAM, do the add myself, and write the SRAM, this is quicker and consumes less of the SRAM controller's time and, thus, the command queue never backs up.
In this new design, there are more instructions executed, but there may be a few I could eliminate with some optimizing of code.
No stalling in the WU microblocks (well, QM does, and RX and TX still do, but these look normal).
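The read, add, write alternative can be sketched against a toy memory model. Everything here is an assumption for illustration: the SramModel class, the idea that the packet and byte counters sit in two adjacent 32-bit words so that one read and one write cover both, and the command counter. The sketch shows only the data flow, it does not model controller timing, and it ignores ordering between threads.

```python
class SramModel:
    """Toy word-addressed memory that counts the commands issued to it."""

    def __init__(self, words: int) -> None:
        self.mem = [0] * words
        self.commands = 0

    def read(self, addr: int, count: int) -> list:
        self.commands += 1
        return self.mem[addr:addr + count]

    def write(self, addr: int, values: list) -> None:
        self.commands += 1
        self.mem[addr:addr + len(values)] = values


def update_counters_rmw(sram: SramModel, base: int, frame_len: int) -> None:
    """Update packet and byte counters with one read and one write.

    ASSUMES the packet counter is at `base` and the byte counter at
    `base + 1`, both 32 bits wide.
    """
    pkts, nbytes = sram.read(base, 2)
    sram.write(base, [(pkts + 1) & 0xFFFFFFFF,
                      (nbytes + frame_len) & 0xFFFFFFFF])


if __name__ == "__main__":
    sram = SramModel(words=4)
    update_counters_rmw(sram, base=0, frame_len=74)
    print(sram.mem[:2], "commands issued:", sram.commands)
```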


Ingress/Egress Performance

~99 CPU cycles
~745 cycles latency
Expected performance
» Should have no trouble going at 10 Gb/s but does…
Simulated performance (as of 11/06/2006)
» ~10 Gb/s
» With all other microengines in place (i.e. real simulation)


Future Work


Ingress/Egress Future Work

Determine source of I/O stalls.
Update Stubs projects for validation of Ingress/Egress blocks (done for Ingress).
Extend both blocks for all possible packet formats
» Ingress – inputs
» Egress – outputs
Possible instruction optimization to give a little headroom (99 cycles out of 100).
Currently, design will not work for standard IPv4 packets; PlanetLab VLAN packets are OK.