Brandon Heller [email protected] http://www.arl.wustl.edu/projects/techX Block Design Review: Substrate Decap and IPv4 Parse


Page 1: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Brandon Heller

http://www.arl.wustl.edu/projects/techX

Block Design Review:

Substrate Decap and IPv4 Parse

Page 2: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

2 - Brandon Heller - 05/03/23

Revision History
9/26/06 (BDH):
»Released
9/28/06 (BDH):
»SD now at 5Gbps+

Page 3: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Contents
[Pipeline diagram: Rx, Substr Decap, Parse, Lookup, Header Format, QM, Tx; slide taken from PlanetLab_Design.ppt]
For SD and Parse:
»overview
»block diagram
»memory usage
»code locations
»test procedures
Performance analysis
»Unexpected interactions
»Future work

Page 4: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Substrate Decap

Page 5: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Substrate Decap
[Pipeline diagram: Rx, Substr Decap, Parse, Lookup, Header Format, QM, Tx; slide taken from PlanetLab_Design.ppt]
Main functions:
»validate & consume Ethernet header
»look up code_option and slice_data_ptr based on VLAN tag
»validate & consume substrate UDP/IP headers
»pass relevant fields to IPv4 parse
Single code path
NN (next-neighbor) communication
Uses 8 threads
Name change from Demux

Page 6: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

IPv4 MR Functional Blocks
[Pipeline diagram: Rx, Substr Decap, Parse, Lookup, Header Format, QM, Tx; slide taken from PlanetLab_Design.ppt]
Ring data into Substr Decap:
»Buf Handle (32b)
»Port (8b)
»Reserved (8b)
»Eth. Frame Len (16b)
Substrate packet format:
»Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
»IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
»UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
»UDP Payload (MN Packet)
»Ethernet Trailer: PAD (nB), CRC (4B)
Ring data out of Substr Decap:
»Buf Handle (32b)
»MN Frm Offset (16b), MN Frm Length (16b)
»Slice ID (VLAN) (16b), Rx UDP DPort (16b)
»Rx IP SAddr (32b)
»Rx UDP SPort (16b), Reserved (12b), Code (4b)
»Slice Data Ptr (32b)

Page 7: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Ethernet Validation
No alignment necessary
Counters kept in non-VLAN-specific region
Tests for
» invalid Ethernet packet length
» non-VLAN tag protocol ID
» non-locally-addressed packet
» unrecognized VLAN
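The first three Ethernet tests can be sketched in plain C as below. This is only an illustration, not the microengine code: the function name, the result enum, and the `min_len` parameter are hypothetical, and the sketch assumes the frame starts at the destination MAC with the 802.1Q tag protocol ID (0x8100) at byte offset 12. The unrecognized-VLAN test depends on the VLAN table and is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_TPID_8021Q 0x8100u  /* IEEE 802.1Q tag protocol ID */

/* Hypothetical result codes, one per drop counter named on the slide. */
typedef enum { ETH_OK, ETH_DROP_LEN, ETH_DROP_TPID, ETH_DROP_DST } eth_result_t;

/* Validate the Ethernet header of a VLAN-tagged frame.
 * min_len is an assumed minimum frame length supplied by the caller. */
static eth_result_t eth_validate(const uint8_t *frame, size_t frame_len,
                                 size_t min_len, const uint8_t local_mac[6])
{
    assert(frame != NULL && local_mac != NULL);

    if (frame_len < min_len || frame_len < 18)   /* 18B = Ethernet VLAN header */
        return ETH_DROP_LEN;

    uint16_t tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != ETH_TPID_8021Q)                  /* not a VLAN-tagged frame */
        return ETH_DROP_TPID;

    if (memcmp(frame, local_mac, 6) != 0)        /* destination MAC is not ours */
        return ETH_DROP_DST;

    return ETH_OK;
}
```

Each non-OK result would map to one counter increment in the non-VLAN-specific counter region.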

Page 8: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

VLAN Table
VLAN     code_opt   slice_data_ptr
0        0          0
1        0          0
…        …          …
0xaaa    1          (points to slice data: SD data, P data, HF data)
…        …          …
0xfff    0          0
code_option = 0 implies invalid slice
»“on switch” for a slice in the data plane
SD data is currently only counters
64B slice data
SRAM space for all 4096 VLANs
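A minimal sketch of the table lookup, assuming a two-word entry and a table indexed directly by the 12-bit VLAN ID. The struct layout and function names are hypothetical; the real substrate_decap_vlan_table_entry_t may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_ENTRIES     4096u  /* one entry per 12-bit VLAN ID */
#define SLICE_DATA_BYTES 64u    /* per-slice data block size from the slide */

/* Hypothetical entry layout, for illustration only. */
typedef struct {
    uint32_t code_option;     /* 0 means the slice is invalid (switched off) */
    uint32_t slice_data_ptr;  /* address of the slice's data block */
} vlan_entry_t;

/* Return the entry for a VLAN tag, or NULL if the slice is not enabled. */
static const vlan_entry_t *vlan_lookup(const vlan_entry_t *table, uint16_t tci)
{
    assert(table != NULL);
    uint16_t vid = tci & 0x0FFFu;   /* low 12 bits of the tag are the VLAN ID */
    const vlan_entry_t *e = &table[vid];
    return (e->code_option != 0) ? e : NULL;
}

/* Slice data space needed if every VLAN gets a 64B block. */
static uint32_t slice_data_space(void)
{
    return VLAN_ENTRIES * SLICE_DATA_BYTES;
}
```

With these sizes the slice data region comes to 4096 * 64B = 256KB, and a NULL return corresponds to the "unrecognized VLAN" drop.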

Page 9: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Substrate UDP/IP Validation
Header checks per RFC1812:
» IP ver other than 4
» invalid header length
» length too small
» IP len doesn't match Enet-deduced IP len
» UDP len doesn't match IP-deduced UDP len
NOTE: need to check Ethernet length, to ensure that padded 64B packets are using the correct length
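The length checks, including the padded-frame case from the note, might look like the sketch below. It rests on assumptions not stated in the slides: `l2_payload_len` is the number of bytes between the 18B Ethernet VLAN header and the FCS, a minimum 64B frame therefore carries 42 payload bytes that may include padding, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SD_MIN_VLAN_PAYLOAD 42u  /* 64B min frame - 18B VLAN header - 4B FCS */

typedef enum {
    SD_OK, SD_DROP_IP_VER, SD_DROP_HDR_LEN, SD_DROP_LEN_SMALL,
    SD_DROP_LEN_MISMATCH, SD_DROP_UDP_LEN
} sd_result_t;

/* ver_ihl is the first byte of the substrate IP header. */
static sd_result_t sd_check_lengths(uint8_t ver_ihl, uint16_t ip_total_len,
                                    uint16_t udp_len, uint16_t l2_payload_len)
{
    /* Precondition: Ethernet validation already rejected short frames. */
    assert(l2_payload_len >= SD_MIN_VLAN_PAYLOAD);

    unsigned version = ver_ihl >> 4;
    unsigned hdr_len = (ver_ihl & 0x0Fu) * 4u;

    if (version != 4)           return SD_DROP_IP_VER;
    if (hdr_len < 20)           return SD_DROP_HDR_LEN;
    if (ip_total_len < hdr_len) return SD_DROP_LEN_SMALL;

    if (l2_payload_len == SD_MIN_VLAN_PAYLOAD) {
        /* Minimum-size frame: padding may follow the IP datagram. */
        if (ip_total_len > l2_payload_len) return SD_DROP_LEN_MISMATCH;
    } else if (ip_total_len != l2_payload_len) {
        return SD_DROP_LEN_MISMATCH;
    }

    /* UDP length must cover its own 8B header and match the IP payload. */
    if (udp_len < 8 || udp_len != ip_total_len - hdr_len)
        return SD_DROP_UDP_LEN;

    return SD_OK;
}
```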

Page 10: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

SD Block Diagram
dl_source()
»init signal
»Wait for prev ctx
»NN Dequeue
»Signal next ctx
substrate_decap()
»Read Eth/IP Hdrs (mem access: DRAM: 5 8B reads)
»Validate Ethernet
»Read VLAN table (mem access: SRAM: 2 4B reads)
»Validate IP
»Read UDP hdr (mem access: DRAM: 2 8B reads)
»Validate UDP
»Prepare ring data
dl_sink()
»Wait for prev ctx
»NN Enqueue
»Signal next ctx
add one 4B SRAM increment per counter (none currently for common case)

Page 11: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

File locations (in …/IPv4_MR/)
Code
» src/substrate_decap/PL/substrate_decap.[c,h]
» src/dispatch_loop/PL/substrate_decap_dl.[c,h]
» src/dispatch_loop/PL/dl_source.[c,h]
    dl_source() and dl_sink() functions add ordered thread synchronization if the following are defined: DL_ORDERED, FIRST_ORDERED_ME, LAST_ORDERED_ME
» src/IXP2XXX_book/Chapter09/ordered_signal.[c,h]
    functions for ordered thread synchronization
» src/dispatch_loop/PL/nn_rings.[c,h]
    functions for enqueuing and dequeuing NN ring data
Data formats
» src/PL/ipv4_common.h
    IP and UDP structure definitions
» src/PL/substrate_common.h
    Ethernet VLAN structure definitions
» src/dispatch_loop/PL/ring_formats.h
    ring data struct defs
» build/PL/dispatch_loop/dl_system.h
    memory locations

Page 12: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Required Includes
Files
»IXA_SDK_4.0\microengineC\src\intrinsic.c
»IXA_SDK_4.0\microengineC\src\rtl.c
Directories
»IXA_SDK_4.0\src\library\microblocks_library\microc\
»IXA_SDK_4.0\MicroengineC\include\..\..\..\..\
»IXA_SDK_4.0\src\library\dataplane_library\microc\
These are required to gain access to the buffer libraries and intrinsic functions!

Page 13: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

SD Initialization
All memory locations defined in dl_system.h, incl:
» locations for MAC address
    IPV4_SD_MAC_ADDR_HI32
    IPV4_SD_MAC_ADDR_LO16
»non-VLAN-specific counters
    IPV4_SD_COUNTERS_BASE
    IPV4_SD_COUNTERS_SIZE
»VLAN table
    IPV4_SD_VLAN_CODE_OPT_TABLE_x (BASE, SIZE, ENTRY_SIZE)
»VLAN-specific memory
    SLICE_DATA_TABLE_x (BASE, SIZE, ENTRY_SIZE, ENTRY_TOTAL)
    IPV4_SD_SLICE_DATA_ENTRY_OFFSET
At least one slice must be initialized to send packets
»Call init_slice() from system_init.ind
»Currently 0xaaa initialized by default
»All counters zeroed
SD caches MAC address in registers
Thread 0 waits for signal from rx

Page 14: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Substrate Decap Validation
All validation tests done with 1 thread and substrate_decap_tests.tcs
» Ethernet validation/counter tests
    invalid Ethernet packet length
    non-VLAN tag protocol ID
    non-locally-addressed packet
    unrecognized VLAN
» UDP/IP validation/counter tests
    IP ver other than 4
    invalid header length
    length too small
    IP len doesn't match Enet-deduced IP len
    UDP len doesn't match IP-deduced UDP len
» Watched counters for proper number of increments
Fully valid packet: vlan_ip_udp_ip_udp/tcp (speed_test_all_valid.tcs)
» Verified all fields of output ring data were as expected
» Single-thread plus 8-thread
Hardware testing
» Uses Fred’s sp++ utility with a logged trace of the above packets
» observed exact same behavior as in simulation

Page 15: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

SD Other
Bugs
»substrate IP proto not checked, should correspond to UDP
Untested
»buffer drops
Data Structures
»substrate_decap_vlan_table_entry_t
»substrate_decap_stats_t
»substrate_decap_vlan_stats_t
»vlan_ip_header
    ipv4_header_struct
    vlan_header_struct
»udp_header
Performance
»coming later

Page 16: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

IPv4 Parse

Page 17: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

IPv4 Parse
[Pipeline diagram: Rx, Substr Decap, Parse, Lookup, Header Format, QM, Tx; slide taken from PlanetLab_Design.ppt]
Main functions
»Read/align IP header
»Validate and consume IP header (per RFC1812 5.2.2)
»Update IP header
    Dec TTL
    Recalc IP checksum
    Write updated checksum to DRAM
»Read/align L4 (UDP/TCP/other) header
»Mark exceptions for Header Format
»Extract fields for Lookup

Page 18: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

IPv4 MR Functional Blocks
[Pipeline diagram: Rx, DeMux, Parse, Lookup, Header Format, QM, Tx]
IPv4 Exception Bits
»Bit 0: TTL = 0 or 1
»Bit 1: Options
Ring data into Parse:
»Buf Handle (32b)
»MN Frm Offset (16b), MN Frm Length (16b)
»Slice ID (VLAN) (16b), Rx UDP DPort (16b)
»Rx IP SAddr (32b)
»Rx UDP SPort (16b), Reserved (12b), Code (4b)
»Slice Data Ptr (32b)
Ring data out of Parse:
»Buf Handle (32b)
»IP Pkt Length (16b), IP Pkt Offset (16b)
»Lookup Key[143-112] Slice ID/Rx UDP DPort (32b)
»Lookup Key[111-80] DA (32b)
»Lookup Key[79-48] SA (32b)
»Lookup Key[47-16] Ports (32b)
»Lookup Key[15-0] Proto/TCP_Flags (16b), Exception Bits (12b), LFlags (4b)
»Slice Data Ptr (32b)
»Reserved (28b), Code (4b)
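The two IPv4 exception bits named on this slide can be derived from the first header byte and the TTL. The sketch below is illustrative only; the macro and function names are hypothetical, and it assumes the version field has already been validated.

```c
#include <assert.h>
#include <stdint.h>

#define EXC_TTL     (1u << 0)  /* Bit 0: TTL = 0 or 1 */
#define EXC_OPTIONS (1u << 1)  /* Bit 1: IP options present */

/* Compute the exception bits for an already-validated IPv4 header.
 * ver_ihl is the first header byte (version in the high nibble). */
static uint16_t ipv4_exception_bits(uint8_t ver_ihl, uint8_t ttl)
{
    assert((ver_ihl >> 4) == 4);      /* caller validated the version */

    uint16_t bits = 0;
    if (ttl <= 1)
        bits |= EXC_TTL;              /* packet cannot be forwarded normally */
    if ((ver_ihl & 0x0Fu) > 5)
        bits |= EXC_OPTIONS;          /* header longer than 5 words has options */
    return bits;
}
```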

Page 19: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

IPv4 Internal Header Formats
Header fields:
»Zeros (4b), Type (6b), Len (6b)
»Rx UDP DPort (2B)
»Type Dependent Data (8B)
»Tx UDP DPort (2B), Tx UDP SPort (2B), Tx IP DAddr (4B)
Columns: Source / Category / Type bit field / Reason / Internal Hdr / RMPE Action
»Ingress LC / Normal Fwd / - / - / None / Classify and fwd
»GPE / No Classify (w/ FwdKey**) / [0] / Original pkt, reinjected to data path / Rx UDP DPort + FwdKey / Perform substrate lookup to resolve LCAddr, port and QID
»GPE / Classify (w/o FwdKey) / [1] / ICMP or local traffic / Rx UDP DPort / Classify and fwd
4 bits at start discriminate between IPv4 and internal headers
for more details see planetlab_IPv4_MR_parse_hdr_format.ppt in bdh4\techx\IPv4_MR_shared
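A sketch of how the leading 4 bits could discriminate the two cases: an IPv4 header starts with version nibble 4, while an internal header starts with four zero bits. The bit positions assumed here (Zeros in bits 15-12, Type in bits 11-6, Len in bits 5-0 of the first 16 bits) and all names are assumptions for illustration, not taken from the referenced .ppt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of the first 16 bits of an internal header. */
typedef struct {
    uint8_t type;  /* 6-bit type bit field */
    uint8_t len;   /* 6-bit length */
} internal_prefix_t;

/* Returns true and fills *out if the first halfword begins an internal
 * header (top nibble zero); returns false otherwise, e.g. for an IPv4
 * header whose top nibble is the version number 4. */
static bool parse_internal_prefix(uint16_t first_halfword, internal_prefix_t *out)
{
    assert(out != NULL);

    if ((first_halfword >> 12) != 0)
        return false;                         /* not an internal header */

    out->type = (uint8_t)((first_halfword >> 6) & 0x3Fu);
    out->len  = (uint8_t)(first_halfword & 0x3Fu);
    return true;
}
```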

Page 20: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Parse Validation
IPv4_parse_tests.tcs
» Invalid internal header
    invalid len for internal header type
    internal header type unknown
» Invalid IPv4 (RFC 1812 checks)
    IP ver other than 4
    invalid header length
    length too small
    SD IP len doesn't match packet IP len
    invalid header checksum
» IPv4 Exceptions
    options flag set in packet
    TTL equals zero
    TTL equals one
IPv4_parse_valid.tcs
» Fully valid, no-exceptions packets
    from GPE, classify
    from GPE, non-classify
    ingress, TCP
    ingress, UDP

Page 21: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Parse Block Diagram
dl_source()
»init signal
»Wait for prev ctx
»NN Dequeue
»Signal next ctx
ipv4_parse()
»Read Int Hdr
»Handle Internal
»Read IP
»Validate IP
»Checksum
»Read L4
»Handle L4
»Prepare ring data
dl_sink()
»Wait for prev ctx
»NN Enqueue
»Signal next ctx
mem access
»DRAM: 2 8B reads
»DRAM: 4 8B reads
»DRAM: 4 8B reads
»(DRAM: 2 8B reads)
add one 4B SRAM increment per counter (none currently for common case)

Page 22: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

File locations (in …/IPv4_MR/)
Code
» src/ipv4/PL/ipv4_parse.[c,h]
» src/dispatch_loop/PL/parse_dl.[c,h]
» src/parse/PL/parse.[c,h]
» src/dispatch_loop/PL/dl_source.[c,h]
    dl_source() and dl_sink() functions add ordered thread synchronization if the following are defined: DL_ORDERED, FIRST_ORDERED_ME, LAST_ORDERED_ME
» src/IXP2XXX_book/Chapter09/ordered_signal.[c,h]
    functions for ordered thread synchronization
» src/dispatch_loop/PL/nn_rings.[c,h]
    functions for enqueuing and dequeuing NN ring data
Data formats
» src/PL/ipv4_common.h
    IP and UDP structure definitions
» src/dispatch_loop/PL/ring_formats.h
    ring data struct defs
» build/PL/dispatch_loop/dl_system.h
    memory locations

Page 23: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Parse Initialization
All memory locations defined in dl_system.h, incl:
»VLAN-specific memory
    SLICE_DATA_TABLE_x (BASE, SIZE, ENTRY_SIZE, ENTRY_TOTAL)
    IPV4_PARSE_SLICE_DATA_ENTRY_OFFSET
At least one slice must be initialized to send packets
»Call init_slice() from system_init.ind
»Currently 0xaaa initialized by default
»All counters zeroed

Page 24: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Other
Bugs
»none?
Untested
»buffer drops
Unimplemented
»checksum for IP options not handled yet
Data Structures
»parse_vlan_stats_t
»ipv4_header_struct
»udp_header_struct
»tcp_header_struct
Performance
»coming next

Page 25: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Performance

Page 26: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Packet Sizes
Ethernet VLAN Header: 18B
Substrate Header
»IPv4 Header: 20B
»UDP Header: 8B
Metanet Frame
»GPE to MPE: n
»IPv4 Header: 20B
»UDP Header: 8B
»Payload: n
Ethernet Pad: 0
Ethernet FCS: 4B
Total: 78B + internal + payload
Ethernet IFS: 12B
Total Physical: 90B + internal + payload

Page 27: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Cycle Budget (min eth packets)
To hit 5Gb rate:
» 76B per min Ethernet packet (64B min Eth + 12B IFS)
» 1.4GHz clock rate
» 5 Gb/sec * 1B/8b * packet/76B = 8.22 Mp/sec
» 1.4Gcycle/sec * 1 sec/8.22 Mp = 170.3 cycles per packet
» compute budget: 170 cycles
» latency budget: (threads*170)
    4 threads: 680 cycles
    8 threads: 1360 cycles

Page 28: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Cycle Budget (IPv4 MN packets)
To hit 5Gb rate:
» 90B per min IPv4 MN packet (78B min IPv4 MN + 12B IFS)
» 1.4GHz clock rate
» 5 Gb/sec * 1B/8b * packet/90B = 6.94 Mp/sec
» 1.4Gcycle/sec * 1 sec/6.94 Mp = 201.7 cycles per packet
» compute budget: 201 cycles
» latency budget: (threads*201)
    4 threads: 804 cycles
    8 threads: 1608 cycles
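The two cycle budgets follow from the same arithmetic, sketched here in C. The function names are made up for illustration; the inputs (1.4 GHz clock, 5 Gb/s target, 76B or 90B on the wire including the 12B IFS) come from the slides.

```c
#include <assert.h>

/* Cycles available per packet at a given line rate.
 * wire_bytes is the per-packet size on the wire, including the 12B IFS. */
static double cycles_per_packet(double clock_hz, double line_rate_bps,
                                unsigned wire_bytes)
{
    assert(wire_bytes > 0 && line_rate_bps > 0.0);
    double packets_per_sec = line_rate_bps / 8.0 / wire_bytes;
    return clock_hz / packets_per_sec;
}

/* With N threads taking packets in turn, each packet may spend up to
 * N compute budgets in flight before the pipeline falls behind. */
static unsigned latency_budget(unsigned compute_budget, unsigned threads)
{
    return compute_budget * threads;
}
```

Truncating the result of `cycles_per_packet(1.4e9, 5e9, 76)` gives the 170-cycle compute budget, and `cycles_per_packet(1.4e9, 5e9, 90)` gives 201.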

Page 29: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Performance Anomalies
Substrate Decap
Spot the issue!
»more DRAM contention
»unhidden DRAM latency
these issues have since been fixed!

Page 30: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Substrate Decap Performance
Optimized common case (ingress, no options)
»Combined initial header checks
»No options assumed single DRAM read
153 cycles typical
~650 cycles latency
337 control store instructions
Expected performance
»(201/153)*5Gb = ~6.5Gb expected performance
Simulated performance (as of 9/26/2006)
»>5 Gb, but something else slows down 6Gb input

Page 31: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

SD Optimizations
possible optimizations
» caching VLAN-to-CodeOption table in Local Memory
» optimize nn_dequeue_incr() via assembly coding
» move VLAN counter computation off fast path?
» use transfer regs directly
    saves 9 cycles
» remove volatile statements

Page 32: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Parse Performance
single-threaded
»~380 cycles for computation
»1708 cycles latency
»556 control store insts
Expected performance
»(201/380)*5Gb = <3Gb expected performance
Going to optimize a bit before adding all 8 threads
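The expected-performance figures for SD and Parse both scale the 5 Gb target by the ratio of the 201-cycle compute budget to the measured compute cycles. A small sketch of that estimate (the function name is hypothetical, and the estimate only holds while the block is compute-bound):

```c
#include <assert.h>

/* Estimated throughput when the block is compute-bound:
 * target rate scaled by (cycle budget / measured cycles per packet). */
static double expected_gbps(double budget_cycles, double measured_cycles,
                            double target_gbps)
{
    assert(measured_cycles > 0.0);
    return budget_cycles / measured_cycles * target_gbps;
}
```

For SD, `expected_gbps(201, 153, 5)` lands between 6.5 and 6.6; for single-threaded Parse, `expected_gbps(201, 380, 5)` is below 3.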

Page 33: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Parse Optimizations
possible optimizations
» incremental IPv4 checksum update per RFC1624
» checksum computation in assembler
» optimized 5LW alignment for IP read
» combined initial error-check to optimize common case
    reduces branch delays
    slows down exception path
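To illustrate the first item, here is a sketch of an incremental checksum update for a TTL decrement using RFC 1624's equation HC' = ~(~HC + ~m + m'), where m is the old 16-bit word (TTL and protocol) and m' the new one. This is plain C for illustration, not the microengine implementation; the function names are hypothetical, and `ipv4_checksum` is included only so the result can be compared against a full recomputation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'). */
static uint16_t csum_update16(uint16_t hc, uint16_t old_w, uint16_t new_w)
{
    uint32_t sum = (uint16_t)~hc;
    sum += (uint16_t)~old_w;
    sum += new_w;
    return (uint16_t)~csum_fold(sum);
}

/* Full header checksum; the checksum field (bytes 10-11) must be zero. */
static uint16_t ipv4_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    return (uint16_t)~csum_fold(sum);
}

/* Decrement TTL (byte 8) and patch the checksum in place. */
static void ipv4_dec_ttl(uint8_t *hdr)
{
    assert(hdr != NULL && hdr[8] > 0);
    uint16_t old_w = (uint16_t)((hdr[8] << 8) | hdr[9]);
    hdr[8]--;
    uint16_t new_w = (uint16_t)((hdr[8] << 8) | hdr[9]);
    uint16_t hc = (uint16_t)((hdr[10] << 8) | hdr[11]);
    hc = csum_update16(hc, old_w, new_w);
    hdr[10] = (uint8_t)(hc >> 8);
    hdr[11] = (uint8_t)(hc & 0xFFu);
}
```

The update touches one 16-bit word instead of summing the whole header, which is where the cycle savings would come from.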

Page 34: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Implementation Status
Parse needs
» error testing
» IP options with checksum
» multithreading
» drop tests

Page 37: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse


Extra Slides

Page 38: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Parse Memory Usage
Memory reads/writes
» 2 8B DRAM reads: unaligned internal header
» 2 8B DRAM reads: unaligned internal header + FwdKey
» 4 8B DRAM reads: unaligned IPv4 header
» [0,6] DRAM reads: unaligned IPv4 header options
» 4 8B DRAM reads: unaligned L4 header
» 1 SRAM increment: per counter
» 1 DRAM write: updated TTL and checksum
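The read counts above depend on where a header falls relative to 8-byte DRAM boundaries. A small sketch of that relationship, assuming each read fetches one aligned 8B unit (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Number of aligned 8-byte DRAM units touched by `len` bytes
 * starting at byte address `addr`. */
static unsigned dram_reads_8b(uint32_t addr, unsigned len)
{
    assert(len > 0);
    uint32_t first = addr / 8u;               /* unit holding the first byte */
    uint32_t last  = (addr + len - 1u) / 8u;  /* unit holding the last byte */
    return (unsigned)(last - first + 1u);
}
```

Under this assumption a 20B IPv4 header needs 3 reads when aligned and 4 when it starts 6 bytes into a unit, and 40B of options starting 5 bytes into a unit need 6.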

Page 39: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

Ethernet Validation
First, read packet from memory, guaranteed aligned
Not specific to any VLAN - in separate mem area
For efficiency, can keep counters in LM and update to RAM when a signal is triggered

typedef struct _substrate_decap_stats_t {
    unsigned int rx;        // received
    unsigned int pass;      // passed to next stage
    unsigned int dropLen;   // invalid Ethernet packet length
    unsigned int dropTPID;  // non-VLAN tag protocol ID
    unsigned int dropDst;   // non-locally-addressed packet
    unsigned int dropVLAN;  // unrecognized VLAN
} substrate_decap_stats_t;

Page 40: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

UDP/IP Validation

typedef struct _substrate_decap_slice_stats_t {
    unsigned int dropIPVer;        // IP ver other than 4
    unsigned int dropHdrLen;       // invalid header length
    unsigned int dropLenSmall;     // length too small
    unsigned int dropLenMismatch;  // IP len doesn't match Enet IP len
    unsigned int dropUDPLen;       // UDP len doesn't match IP UDP len
    unsigned int pass;             // passed to next stage
} substrate_decap_slice_stats_t;

Page 41: Brandon Heller  Block Design Review: Substrate Decap and IPv4 Parse

RFC 1812 5.2.2 IP Header Validation

(1) The packet length reported by the Link Layer must be large enough to hold the minimum length legal IP datagram (20 bytes).

(2) The IP checksum must be correct.

(3) The IP version number must be 4. If the version number is not 4 then the packet may be another version of IP, such as IPng or ST-II.

(4) The IP header length field must be large enough to hold the minimum length legal IP datagram (20 bytes = 5 words).

(5) The IP total length field must be large enough to hold the IP datagram header, whose length is specified in the IP header length field.

from http://www.faqs.org/rfcs/rfc1812.html
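The five checks can be sketched as one C function. This is a hedged illustration, not the Parse block's code: the names are hypothetical, the checks run in a practical order (the header length must be known before the checksum can be summed), and the extra `hdr_len > link_len` bound is a safety guard added here rather than part of the RFC list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    IP_OK = 0, IP_ERR_LINK_LEN, IP_ERR_CHECKSUM,
    IP_ERR_VERSION, IP_ERR_HDR_LEN, IP_ERR_TOTAL_LEN
} ip_check_t;

/* pkt points at the IP header; link_len is the length the link layer reports. */
static ip_check_t rfc1812_validate(const uint8_t *pkt, size_t link_len)
{
    assert(pkt != NULL);

    if (link_len < 20)                       /* check (1) */
        return IP_ERR_LINK_LEN;
    if ((pkt[0] >> 4) != 4)                  /* check (3) */
        return IP_ERR_VERSION;

    size_t hdr_len = (size_t)(pkt[0] & 0x0Fu) * 4u;
    if (hdr_len < 20 || hdr_len > link_len)  /* check (4), plus bounds guard */
        return IP_ERR_HDR_LEN;

    size_t total_len = ((size_t)pkt[2] << 8) | pkt[3];
    if (total_len < hdr_len)                 /* check (5) */
        return IP_ERR_TOTAL_LEN;

    /* check (2): the one's-complement sum over the header, checksum field
     * included, must fold to 0xFFFF. */
    uint32_t sum = 0;
    for (size_t i = 0; i < hdr_len; i += 2)
        sum += (uint32_t)((pkt[i] << 8) | pkt[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (sum == 0xFFFFu) ? IP_OK : IP_ERR_CHECKSUM;
}
```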