David M. Zar
http://www.arl.wustl.edu/projects/techX
Block Design Review:
PlanetLab Line Card Header Format
2 - David M. Zar - 05/14/23
Revision History
10/31/06 (DMZ):
»Initial Draft
11/04/06 (DMZ):
»Updates for performance issues
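Line Card Centric Overview
[Figure: line card block diagram. Two pipelines attach to the SWITCH. One contains Phy Int Rx, Key Extract, Lookup, Hdr Format, Port Splitter, QM/Schd, and Switch Tx. The other contains Switch Rx, Key Extract, Lookup, Hdr Format, Port Splitter, QM/Schd, and Phy Int Tx.]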
Port Splitter (Ingress and Egress):
»Accepts packets on a NN ring
»Based on the physical destination port number:
  0-4 go to QM1 on a scratch ring
  5-9 go to QM2 on a scratch ring
»Measured delay is about 120 cycles, including memory latency
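The port-to-queue-manager rule above can be sketched in a few lines of Python. This is only an illustration of the mapping, not the microengine code. The function name `select_queue_manager` is hypothetical, and rejecting ports outside 0-9 is an assumption, since the slides do not say what happens to them.

```python
def select_queue_manager(port: int) -> str:
    """Return the queue manager that serves a physical destination port.

    Ports 0-4 map to QM1 and ports 5-9 map to QM2, as stated on the slide.
    Raising on any other port is an assumption made for this sketch.
    """
    if 0 <= port <= 4:
        return "QM1"
    if 5 <= port <= 9:
        return "QM2"
    raise ValueError(f"port {port} is outside the 0-9 range")


if __name__ == "__main__":
    for port in range(10):
        print(port, select_queue_manager(port))
```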
Ingress Header Format
Ingress Header Format
Microengine Usage
»One microengine
»Eight identical threads
»NN ring input from Lookup
»NN ring output to Port Splitter
Main functions:
»Using data from Lookup, modify packet header in DRAM for proper routing to PE:
  Destination MAC address
    First five bytes are same as source MAC address
  Source MAC address
    Address of this LC
  VLAN tag
»Adjust pre-queue stats counters
»Format input data for QM
  QID
  Port Number
  Ethernet Frame Length
LC Ingress Functional Blocks
[Figure: ingress pipeline (Phy Int Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Switch Tx) with the Hdr Format input data, output data, and packet formats.]
Hdr Format input data:
»Buf Handle (32b), IP Pkt Length (16b), QID (20b), VLAN (16b), Stats Index (16b), DAddr (8b), Port (4b), Reserved (8b), Eth Hdr Len (8b)
Hdr Format output data:
»Stats Index (16b), Buffer Handle (32b), Frame Length (16b), QID (20b), Rsv (4b), Port (4b), Rsv (4b)
Possible Input Packet Formats:
»Untagged frame: Ethernet Header is DstAddr (6B), SrcAddr (6B), Type=IP (2B)
»VLAN-tagged frame: Ethernet Header is DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
Output Packet Format:
»VLAN-tagged frame: Ethernet Header is DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
Fields following the Ethernet Header in all formats:
»IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
»UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
»UDP Payload (MN Packet)
»Ethernet Trailer: PAD (nB), CRC (4B)
MAC Address and VLAN Tag (Ingress)
The source MAC address is fixed and set at boot time (_WU_get_mac_address)
The destination MAC address will only differ in the last byte and this byte is obtained from the Lookup data.
The VLAN tag is obtained from the Lookup data.
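As a rough illustration of the rule above, the destination MAC address can be formed by keeping the first five bytes of the source MAC address and appending the byte supplied by Lookup. The function name `ingress_dest_mac` is hypothetical and the range checks are assumptions. The sample values are taken from the ingress validation example in this review.

```python
def ingress_dest_mac(source_mac: bytes, lookup_daddr: int) -> bytes:
    """Build the destination MAC from the LC's source MAC and the Lookup byte.

    Only the last byte differs from the source MAC, as the slide states.
    """
    if len(source_mac) != 6:
        raise ValueError("a MAC address must be exactly 6 bytes")
    if not 0 <= lookup_daddr <= 0xFF:
        raise ValueError("the Lookup DAddr field is 8 bits wide")
    return source_mac[:5] + bytes([lookup_daddr])


if __name__ == "__main__":
    src = bytes.fromhex("010203040a0b")  # source MAC seen in the validation output
    dst = ingress_dest_mac(src, 0x02)    # last byte as it appears in that output
    print(dst.hex(":"))
```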
Stats/Counters (Ingress/Egress)
The Stats Index is obtained from the Lookup data
The pre-queue packet and byte counters are updated (_WU_update_counters)
»Packet counter is incremented (atomic SRAM)
»Byte count is incremented by the number of bytes in the entire Ethernet frame (_WU_get_enet_frame_length).
  Frame_length = IP_pkt_len + 18
  18 is the VLAN Ethernet header length
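The counter update can be sketched as follows. This models the arithmetic only: the class `PreQueueCounters` and its dictionaries are hypothetical stand-ins for the SRAM counter structure, and nothing here reflects the atomic SRAM operations used on the microengine. The sample IP packet length of 56 bytes (0x0038) comes from the ingress validation example.

```python
# 6B DstAddr + 6B SrcAddr + 2B Type=802.1Q + 2B VLAN + 2B Type=IP
VLAN_ETHERNET_HEADER_LEN = 18


def enet_frame_length(ip_pkt_len: int) -> int:
    """Frame_length = IP_pkt_len + 18, as given on the slide."""
    return ip_pkt_len + VLAN_ETHERNET_HEADER_LEN


class PreQueueCounters:
    """Hypothetical in-memory model of the pre-queue packet and byte counters."""

    def __init__(self) -> None:
        self.packets = {}  # stats index -> packet count
        self.bytes = {}    # stats index -> byte count

    def update(self, stats_index: int, ip_pkt_len: int) -> int:
        """Count one packet and its full Ethernet frame length; return that length."""
        frame_len = enet_frame_length(ip_pkt_len)
        self.packets[stats_index] = self.packets.get(stats_index, 0) + 1
        self.bytes[stats_index] = self.bytes.get(stats_index, 0) + frame_len
        return frame_len


if __name__ == "__main__":
    counters = PreQueueCounters()
    print(counters.update(stats_index=7, ip_pkt_len=56))
    print(counters.packets[7], counters.bytes[7])
```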
QM Data Formatting (Ingress and Egress)
QID is extracted from Lookup data
Port number is extracted from Lookup data
Total Ethernet frame length is passed to QM
Stats index is passed on for post-queue counters
Data passed to QM:
»Stats Index (16b), Buffer Handle (32b), Frame Length (16b), QID (20b), Rsv (4b), Port (4b), Rsv (4b)
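The QM fields add up to 96 bits, which suggests three 32-bit words. The sketch below packs them that way. The word order and bit positions are assumptions (Buffer Handle in word 0, Frame Length in the upper half and Stats Index in the lower half of word 1, then QID, Rsv, Port, Rsv from the most significant bit of word 2), because the slides give only the field widths. The function name `pack_qm_words` is hypothetical.

```python
def pack_qm_words(buffer_handle: int, frame_length: int, stats_index: int,
                  qid: int, port: int) -> tuple:
    """Pack the QM fields into three 32-bit words (assumed layout, see above)."""
    fields = (
        ("buffer_handle", buffer_handle, 32),
        ("frame_length", frame_length, 16),
        ("stats_index", stats_index, 16),
        ("qid", qid, 20),
        ("port", port, 4),
    )
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")

    word0 = buffer_handle
    word1 = (frame_length << 16) | stats_index
    # QID in bits 31..12, reserved bits 11..8, port in bits 7..4, reserved bits 3..0
    word2 = (qid << 12) | (port << 4)
    return word0, word1, word2


if __name__ == "__main__":
    words = pack_qm_words(0x12345678, 74, 5, 0xABCDE, 3)
    print([f"{w:08x}" for w in words])
```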
Ingress HF Block Diagram
[Figure: per-thread processing flow of the Ingress Header Format microengine, annotated with memory operations and cycle counts.]
Blocks shown:
»init, signal
»dl_source(): Wait for prev ctx, NN Dequeue, Signal next ctx
»_WU_get_enet_frame_length
»_WU_write_vlan_header
»_WU_update_counters
»_WU_update_buffer_descriptor
»dl_sink(): Wait for prev ctx, NN Enqueue, Signal next ctx
Annotations:
»DRAM: 4|5 4B writes, Cycles: 26
»SRAM: 1 read 1 write, Cycles: 10
»SRAM: 3 writes, Cycles: 12
»Other blocks, Cycles: 10, 5, 2, 1, 17, 16
Total cycles: 33+66=99
Budget: 1400 MHz / (10 Gb/s / (8 * 90)) = 100.8 => 100 cycles
Measured Latency: 745 cycles
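The cycle budget can be checked with a short calculation: the per-packet budget is the microengine clock rate divided by the packet rate at line speed. The sketch below assumes the 90 in the budget expression is a packet size in bytes, which is the reading that makes the units work out. The function name `cycle_budget` is hypothetical.

```python
import math


def cycle_budget(clock_hz: float, line_rate_bps: float, packet_bytes: int) -> float:
    """Cycles available per packet = clock rate / packets per second.

    Packets per second = line rate / (8 bits per byte * bytes per packet).
    """
    return clock_hz * 8 * packet_bytes / line_rate_bps


if __name__ == "__main__":
    budget = cycle_budget(clock_hz=1400e6, line_rate_bps=10e9, packet_bytes=90)
    measured_cycles = 99  # total instruction cycles reported on the slide
    print(f"budget: {budget:.1f} cycles, usable: {math.floor(budget)}")
    print(f"headroom: {math.floor(budget) - measured_cycles} cycle(s)")
```

With the figures on the slide, the budget leaves a single cycle of headroom over the 99 cycles counted for the block.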
Ingress Validation
Send in non-tunneled packets and check output packets to see they are our internal, tunneled, packets.
»Worked during development but not tested in integrated system at this point.
Send in tunneled packets and check output packets to see they are our internal, tunneled, packets.
»Example input (the bracketed CRC is stripped by RX):
  01020304 05060708 090a0b0c 81000aaa 08004500 00380000 0000ff11 3a61c0a8 0001c0a8 00020001 00010024 ffbd4500 001c0000 0000ff11 3a7dc0a8 0001c0a8 00020001 00020008 7e87 [6d7e d5be]
»Resulting output:
  01020304 0a020102 03040a0b 81000002 08004500 00380000 0000ff11 3a61c0a8 0001c0a8 00020001 00010024 ffbd4500 001c0000 0000ff11 3a7dc0a8 0001c0a8 00020001 00020008 7e87
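A check like the one described can be scripted. The sketch below parses the example output frame from this slide and verifies the properties the Header Format block is supposed to establish: a destination MAC that shares its first five bytes with the source MAC, an 802.1Q tag, and a frame length equal to the IP packet length plus 18. The helper name `parse_vlan_header` is hypothetical, and the parser assumes a VLAN-tagged frame.

```python
OUTPUT_HEX = (
    "01020304 0a020102 03040a0b 81000002 08004500 00380000 0000ff11 "
    "3a61c0a8 0001c0a8 00020001 00010024 ffbd4500 001c0000 0000ff11 "
    "3a7dc0a8 0001c0a8 00020001 00020008 7e87"
)


def parse_vlan_header(frame: bytes) -> dict:
    """Split the 18-byte VLAN Ethernet header and read the IP total length."""
    return {
        "dst_mac": frame[0:6],
        "src_mac": frame[6:12],
        "tpid": int.from_bytes(frame[12:14], "big"),       # 0x8100 for 802.1Q
        "vlan": int.from_bytes(frame[14:16], "big") & 0x0FFF,
        "ethertype": int.from_bytes(frame[16:18], "big"),  # 0x0800 for IP
        "ip_total_len": int.from_bytes(frame[20:22], "big"),
    }


if __name__ == "__main__":
    frame = bytes.fromhex(OUTPUT_HEX.replace(" ", ""))
    hdr = parse_vlan_header(frame)
    assert hdr["dst_mac"][:5] == hdr["src_mac"][:5]
    assert hdr["tpid"] == 0x8100 and hdr["ethertype"] == 0x0800
    assert len(frame) == hdr["ip_total_len"] + 18
    print(hdr["dst_mac"].hex(":"), hdr["src_mac"].hex(":"), hdr["vlan"])
```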
Egress Header Format
Egress Header Format
Microengine Usage
»One microengine
»Eight identical threads
»NN ring input from Lookup
»NN ring output to Port Splitter
Main functions:
»Using data from Lookup, modify packet header in DRAM for proper routing to Switch:
  Destination MAC address
    First five bytes are same as source MAC address
    Destination MAC address is looked up based on IP address from lookup
  Source MAC address
    Address of this LC
  VLAN tag
»Adjust pre-queue stats counters
»Format input data for QM
  QID
  Port Number
  Ethernet Frame Length
LC Egress Functional Blocks
[Figure: egress pipeline (Switch Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Phy Int Tx, attached to the SWITCH) with the Hdr Format input data, output data, and packet formats.]
Hdr Format input data:
»Buf Handle (32b), IP Pkt Length (16b), Reserved (8b), Eth Hdr Len (8b), VLAN (12b), QID (20b), Rsvd (4b), Port (4b), Rsvd (4b), Stats Index (16b), Rsvd (4b), IP DAddr (32b)
Hdr Format output data:
»Ethernet Frame Length (16b), Buffer Handle (32b), Stats Index (16b), QID (20b), Rsv (4b), Port (4b), Rsv (4b)
Input Packet Format and Output Packet Format (both VLAN tagged):
»Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
»IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
»UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
»UDP Payload (MN Packet)
»Ethernet Trailer: PAD (nB), CRC (4B)
MAC Address and VLAN Tag (Egress)
The source MAC address is fixed and set at boot time (_WU_get_mac_address)
The destination MAC address will only differ in the last nibble and this nibble is obtained from the Lookup data.
»_WU_ip_lookup will take 32 bits from the destination IP address and use the local CAM to obtain the least significant 4 bits of the MAC address.
»The CAM state bits are used for this so that’s why there are only 4 bits of data returned
The VLAN tag is obtained from the Lookup data.
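The egress lookup can be modeled with a dictionary standing in for the CAM: the 32-bit destination IP address is the key and the 4-bit result replaces the low nibble of the source MAC address's last byte. This is a sketch under assumptions: the function name `egress_dest_mac` is hypothetical, and raising `KeyError` on a CAM miss is a choice made here because the slides do not describe miss handling.

```python
def egress_dest_mac(source_mac: bytes, dest_ip: int, cam: dict) -> bytes:
    """Derive the destination MAC from the source MAC and a 4-bit CAM result.

    `cam` maps a 32-bit destination IP address to the 4-bit MAC nibble.
    A miss raises KeyError (assumed behavior, not taken from the slides).
    """
    if len(source_mac) != 6:
        raise ValueError("a MAC address must be exactly 6 bytes")
    nibble = cam[dest_ip] & 0x0F
    last_byte = (source_mac[5] & 0xF0) | nibble
    return source_mac[:5] + bytes([last_byte])


if __name__ == "__main__":
    cam = {0xC0A80002: 0x2}              # 192.168.0.2 -> nibble 2 (illustrative entry)
    src = bytes.fromhex("010203040a0b")  # illustrative source MAC
    print(egress_dest_mac(src, 0xC0A80002, cam).hex(":"))
```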
Egress HF Block Diagram
[Figure: per-thread processing flow of the Egress Header Format microengine, annotated with memory operations and cycle counts.]
Blocks shown:
»init, signal
»dl_source(): Wait for prev ctx, NN Dequeue, Signal next ctx
»_WU_get_enet_frame_length
»_WU_write_vlan_header
»_WU_update_counters
»_WU_update_buffer_descriptor
»_WU_ip_lookup
»dl_sink(): Wait for prev ctx, NN Enqueue, Signal next ctx
Annotations:
»DRAM: 1 4B read 4 4B writes, Cycles: 32
»SRAM: 1 add 1 incr, Cycles: 6
»SRAM: 3 writes, Cycles: 10
»Other blocks, Cycles: 10, 2, 2, 1, 1, 1
Total cycles: 65
Measured Latency*: ~660
Egress Validation
Send in our internal, tunneled packets and check output packets to see they are our valid IP, tunneled, packets.
»For the PlanetLab demo, there are no non-tunneled output packets
Check packet and byte counters for valid updates
Check CAM for proper initialization (data watch)
HF Initialization (Ingress/Egress)
All memory locations defined in dl_system.h:
»Base address for HF: LC[I/E]_HF_SRAM_INIT_BASE
  MAC_ADDR_HI32
  MAC_ADDR_LO16
»Pre-Queue Counters: LC[I/E]_LU_COUNTERS_SRAM_INIT_BASE
  LC[I/E]_LU_PRE_Q_PKT_CNT_OFFSET – offset into counters structure for packet counter
  LC[I/E]_LU_PRE_Q_BYTE_CNT_OFFSET – offset into counters structure for byte counter
Thread 0 waits for signal from rx
For Egress, the CAM is filled (_WU_hfe_initialize_ip_lookup) with data from LCE_HF_SRAM_INIT_BASE + 8:
»each entry is 64 bits: cam_entry (32b), RSVD (28b), MAC_DEST (4b)
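The 64-bit initialization entry can be decoded as sketched below. The bit placement is an assumption: the fields are taken in the listed order from the most significant bit, so cam_entry occupies the upper 32 bits and MAC_DEST the lowest 4 bits. The names `decode_cam_init_entry` and `build_cam` are hypothetical and only model the data layout, not _WU_hfe_initialize_ip_lookup itself.

```python
def decode_cam_init_entry(entry: int) -> tuple:
    """Split a 64-bit init entry into (cam_entry, mac_dest).

    Assumed layout: cam_entry in bits 63..32, RSVD in bits 31..4,
    MAC_DEST in bits 3..0.
    """
    if not 0 <= entry < (1 << 64):
        raise ValueError("entry must fit in 64 bits")
    cam_entry = (entry >> 32) & 0xFFFFFFFF
    mac_dest = entry & 0xF
    return cam_entry, mac_dest


def build_cam(entries) -> dict:
    """Build a dictionary model of the CAM: cam_entry -> 4-bit MAC_DEST."""
    return dict(decode_cam_init_entry(e) for e in entries)


if __name__ == "__main__":
    cam = build_cam([0xC0A80002_00000002, 0xC0A80003_00000003])
    for key, nibble in cam.items():
        print(f"{key:08x} -> {nibble:x}")
```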
File Locations (Ingress and Egress)
Main code
»Applications/LC_Ingress/src/hdr_format/PL/hdr_format.uc
»Applications/LC_Egress/src/hdr_format/PL/hdr_format.uc
Library
»library/DataPlane/hdr_format_util.uc
Required Includes (Ingress and Egress)
Files
»build/PL/dispatch_loop/dl_system.h
  memory locations
»IXA_SDK_4.0/src/library/microblocks_library/
  dl_meta – for metadata macros
»IXA_SDK_4.0/src/library/dataplane_library/
  dram – for DRAM read/write macros
  sram – for SRAM read/write/add/incr macros
  xbuf – for transfer buffer macros
Performance Issues
Ingress Performance Anomalies
These stalls are in various SRAM and DRAM accesses – the command FIFO is FULL!
Ingress Anomalies (Explanation)
These bus arbiters are shared across all memory interfaces
The SRAM Controllers have a command FIFO
Ingress/Egress SRAM Issues
It seems that using atomic ADD/INCR instructions is expensive at the SRAM controller.
If I remove them and instead read the SRAM, do the add myself, and write the SRAM, this is quicker and consumes less of the SRAM controller's time and, thus, the command queue never backs up.
In this new design, there are more instructions executed, but there may be a few I could eliminate with some optimizing of code.
No stalling in the WU microblocks (well, QM does, and RX and TX still do, but these look normal).
Ingress/Egress Performance
~99 CPU cycles
~745 cycles latency
Expected performance
»Should have no trouble going at 10 Gb/s but does…
Simulated performance (as of 11/06/2006)
»~10 Gb/s
»With all other microengines in place (i.e. real simulation)
Future Work
Ingress/Egress Future Work
Determine source of I/O stalls
Update Stubs projects for validation of Ingress/Egress blocks (done for Ingress)
Extend both blocks for all possible packet formats
»Ingress – inputs
»Egress – outputs
Possible instruction optimization to give a little headroom (99 cycles out of 100).
Currently, design will not work for standard IPv4 packets; PlanetLab VLAN packets are OK.