SAMSUNG OPEN SOURCE CONFERENCE 2019 (SOSCON)

Faster Packet Processing in Linux: XDP

Kosslab | Software Engineer | 이호연 | 2019.10.17


이호연 @Daniel T. Lee

- Kosslab Software Engineer

- Open Source Developer (Linux Kernel – BPF, XDP, uftrace, etc.)

Today's agenda

• Packet Processing in Linux

• How fast are we talking?

• What is XDP?

• How to use XDP?

• More about XDP

Packet Processing in Linux

From basic path to processing hooks


Packet Path in Kernel

[Diagram: the kernel packet path across three layers (L2, L3, L4~). Blocks: DEVICE DRIVER (Ingress and Egress), PROTO HANDLER, ROUTING, INPUT, FORWARDING, OUTPUT, NEIGH, UPPER LAYER. The following slides highlight three routes through it: the Packet Receiving Path, the Packet Forwarding Path, and the Packet Sending Path.]

Packet Receiving Path

[Diagram: build sequence tracing the receiving path through the blocks above. The final step shows the kernel routing table:]

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         _gateway        0.0.0.0         UG    101    0        0 enp6s0
10.1.0.0        0.0.0.0         255.255.255.0   U     111    0        0 eth0
10.1.1.0        0.0.0.0         255.255.255.0   U     110    0        0 eth1
192.168.0.0     0.0.0.0         255.255.255.0   U     101    0        0 enp6s0
_gateway        0.0.0.0         255.255.255.255 UH    101    0        0 enp6s0
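The routing step can be pictured as a longest-prefix match over the table above. The sketch below is illustrative only: `route_entry`, `route_lookup` and the `IPV4` macro are hypothetical names, not kernel APIs. It scans a small array, whereas the kernel keeps its routes in a trie. The `_gateway` host route is left out because its numeric address is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its four octets. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical route entry: destination network, netmask, output interface. */
struct route_entry {
    uint32_t dest;
    uint32_t mask;
    const char *iface;
};

/* Numeric routes copied from the routing table above. */
static const struct route_entry routes[] = {
    { IPV4(0, 0, 0, 0),     IPV4(0, 0, 0, 0),       "enp6s0" },
    { IPV4(10, 1, 0, 0),    IPV4(255, 255, 255, 0), "eth0"   },
    { IPV4(10, 1, 1, 0),    IPV4(255, 255, 255, 0), "eth1"   },
    { IPV4(192, 168, 0, 0), IPV4(255, 255, 255, 0), "enp6s0" },
};

/* Return the interface of the matching route with the longest mask,
 * or NULL if no route matches. */
const char *route_lookup(uint32_t addr)
{
    const struct route_entry *best = NULL;
    size_t i;

    for (i = 0; i < sizeof(routes) / sizeof(routes[0]); i++) {
        const struct route_entry *r = &routes[i];

        if ((addr & r->mask) != r->dest)
            continue;
        /* For contiguous netmasks, a numerically larger mask
         * means a longer prefix. */
        if (best == NULL || r->mask > best->mask)
            best = r;
    }
    return best ? best->iface : NULL;
}
```

With this table, `route_lookup(IPV4(10, 1, 1, 2))` selects the `eth1` route, and an address outside every /24 falls back to the default route on `enp6s0`.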

Packet Forwarding Path

[Diagram: build sequence tracing the forwarding path through the same blocks.]

Packet Sending Path

[Diagram: build sequence tracing the sending path through the same blocks.]

Packet processing?


Packet Processing

• Filtering (Drop, Pass)

• Forwarding (Routing)

• NAT (Masquerade, Source, Destination)

• Packet Tunneling (Encapsulation)

• Packet Mangling (ToS, TTL, mark)


Hooks to process packet?

[Diagram: hook points stacked from top to bottom: Userspace (L7), IP (Netfilter, L3), Traffic Control, DD (device driver, XDP, L2), NIC.]

Userspace

Linux socket: packet processing by receiving messages

Packet-Processing Path

[Diagram: the kernel packet path annotated with processing hooks. Blocks along the path between Rx and Tx: XDP, TC Ingress, PREROUTING, ROUTING, INPUT, FORWARDING, OUTPUT, POSTROUTING, TC Egress, UPPER LAYER, spanning L2, L3 and L4~.]

Userspace Packet Path

[Diagram: the same path with the route up to userspace highlighted.]

Userspace

Any kind of packet processing is possible!

    /* UDP server on port 1234: count every datagram, then discard it. */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in in;
    int res, pkts = 0, bytes = 0;
    char buf[MTU_SIZE];

    memset(&in, 0, sizeof(in));
    in.sin_family = AF_INET;
    in.sin_addr.s_addr = INADDR_ANY;
    in.sin_port = htons(1234);

    if (bind(fd, (struct sockaddr *)&in, sizeof(in)) < 0)
        exit(EXIT_FAILURE);

    while (1) {
        res = read(fd, buf, MTU_SIZE);
        if (res <= 0)
            return 0;

        pkts += 1;
        bytes += res;
    }

Netfilter

Linux firewall framework: a set of hooks inside the network stack

Netfilter Packet Path

[Diagram: the packet-processing path with the Netfilter hooks (PREROUTING, INPUT, FORWARDING, OUTPUT, POSTROUTING) highlighted and a module attached to each hook.]

Kernel modules registered to these hooks intercept network traffic.

ex) iptables…

Netfilter

With iptables, nftables, etc., the following packet processing is available:

- Packet filtering

- NAT (network address translation)

- Packet mangling

- Stateless/stateful firewalling

- Etc.

Netfilter Example 1 – Packet Filtering

Netfilter (iptables) packet drop

DROP UDP 10.1.0.2:1234

$ iptables -A INPUT -d 10.1.0.2 -p udp --dport 1234 -j DROP


[Diagram: packets addressed to 10.1.0.2:1234 enter on the Rx side and are dropped (DROP) by the rule at the INPUT hook.]

Example rule table:

#   Source IP   Source Port   Dest. IP   Dest. Port   Action
1   10.0.0.5    --            --         --           ALLOW
2   --          --            --         22           DROP
3   --          --            10.1.0.2   1234         DROP
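A rule table like the one above is evaluated top to bottom, and the first matching rule decides the verdict. The following is a minimal sketch of that first-match logic, not how iptables is implemented: `struct rule`, `filter_packet` and the `IPV4` macro are hypothetical names, a zero field stands for the "--" wildcard, and the fall-through verdict assumes the chain policy is ACCEPT.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its four octets. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

enum verdict { VERDICT_ALLOW, VERDICT_DROP };

/* Hypothetical rule: a zero field is a wildcard ("--" in the table). */
struct rule {
    uint32_t src_ip;
    uint16_t src_port;
    uint32_t dst_ip;
    uint16_t dst_port;
    enum verdict action;
};

/* The three rows of the example rule table. */
static const struct rule rules[] = {
    { IPV4(10, 0, 0, 5), 0, 0,                 0,    VERDICT_ALLOW },
    { 0,                 0, 0,                 22,   VERDICT_DROP  },
    { 0,                 0, IPV4(10, 1, 0, 2), 1234, VERDICT_DROP  },
};

/* Walk the rules in order; the first rule whose non-wildcard
 * fields all match decides the verdict. */
enum verdict filter_packet(uint32_t src_ip, uint16_t src_port,
                           uint32_t dst_ip, uint16_t dst_port)
{
    size_t i;

    for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        const struct rule *r = &rules[i];

        if (r->src_ip != 0 && r->src_ip != src_ip)
            continue;
        if (r->src_port != 0 && r->src_port != src_port)
            continue;
        if (r->dst_ip != 0 && r->dst_ip != dst_ip)
            continue;
        if (r->dst_port != 0 && r->dst_port != dst_port)
            continue;
        return r->action;
    }
    /* Assumption: default chain policy is ACCEPT. */
    return VERDICT_ALLOW;
}
```

Because rule order matters, a packet from 10.0.0.5 to 10.1.0.2:1234 is allowed by rule 1 before rule 3 is ever consulted.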


Netfilter Example 2 – NAT

Netfilter (iptables) Destination NAT

NAT UDP 10.1.0.2:1234 -> 10.1.1.2:1234

$ iptables -t nat -A PREROUTING \
    -d 10.1.0.2 -p udp --dport 1234 \
    -j DNAT --to-destination 10.1.1.2:1234

[Diagram: packets addressed to 10.1.0.2:1234 arrive on the Rx side and reach the PREROUTING hook, where the DNAT rule is installed.]

$ iptables -t nat -L PREROUTING
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
DNAT       udp  --  anywhere             10.1.0.2             udp dpt:1234 to:10.1.1.2:1234

Example DNAT table:

#   Source IP   Source Port   Dest. IP     Dest. Port   To
1   10.0.0.5    --            --           --           10.0.0.2:1234
2   --          --            10.1.0.255   80           10.1.0.1:443
3   --          --            10.1.0.2     1234         10.1.1.2:1234
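The effect of such a DNAT table on a single packet can be sketched as a lookup followed by a rewrite of the destination. This is a simplified illustration with hypothetical names (`struct packet`, `struct dnat_rule`, `dnat_apply`, `IPV4`). Real DNAT in Netfilter also tracks connections and updates checksums, which the sketch leaves out. The source-port column is dropped because every row leaves it as a wildcard.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its four octets. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical packet view: only the fields the table looks at. */
struct packet {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t dst_port;
};

/* Hypothetical DNAT rule: zero match fields are wildcards ("--"). */
struct dnat_rule {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t dst_port;
    uint32_t to_ip;
    uint16_t to_port;
};

/* The three rows of the example DNAT table. */
static const struct dnat_rule dnat_rules[] = {
    { IPV4(10, 0, 0, 5), 0,                   0,    IPV4(10, 0, 0, 2), 1234 },
    { 0,                 IPV4(10, 1, 0, 255), 80,   IPV4(10, 1, 0, 1), 443  },
    { 0,                 IPV4(10, 1, 0, 2),   1234, IPV4(10, 1, 1, 2), 1234 },
};

/* Rewrite the destination according to the first matching rule.
 * Returns 1 if the packet was rewritten, 0 if no rule matched. */
int dnat_apply(struct packet *pkt)
{
    size_t i;

    for (i = 0; i < sizeof(dnat_rules) / sizeof(dnat_rules[0]); i++) {
        const struct dnat_rule *r = &dnat_rules[i];

        if (r->src_ip != 0 && r->src_ip != pkt->src_ip)
            continue;
        if (r->dst_ip != 0 && r->dst_ip != pkt->dst_ip)
            continue;
        if (r->dst_port != 0 && r->dst_port != pkt->dst_port)
            continue;

        pkt->dst_ip = r->to_ip;
        pkt->dst_port = r->to_port;
        return 1;
    }
    return 0;
}
```

After the rewrite the routing decision is made on the new destination, which is why the packet can leave through a different interface than the original address suggests.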

[Diagram: DNAT at the PREROUTING hook rewrites the destination of the packet from 10.1.0.2:1234 to 10.1.1.2:1234. ROUTING then looks up the new destination in the routing table:]

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         _gateway        0.0.0.0         UG    101    0        0 enp6s0
10.1.0.0        0.0.0.0         255.255.255.0   U     111    0        0 eth0
10.1.1.0        0.0.0.0         255.255.255.0   U     110    0        0 eth1
192.168.0.0     0.0.0.0         255.255.255.0   U     101    0        0 enp6s0
_gateway        0.0.0.0         255.255.255.255 UH    101    0        0 enp6s0

[Diagram: the rewritten packet, now addressed to 10.1.1.2:1234, continues from ROUTING along the forwarding side of the path toward Tx.]

TC

Traffic Control (packet scheduler): controls network traffic with queuing policies

TC Packet Path

[Diagram: the packet-processing path with a Qdisc highlighted at TC Ingress and another at TC Egress.]

The packet scheduler (Qdisc) determines how packets are received and transmitted.

ex) priority, delay, dropping

By attaching a filter to a Qdisc, the user can take an action on matched packets.

TC

With filters attached to a Qdisc, the following packet processing is available:

- Packet filtering

- NAT (network address translation)

- Packet mangling (skbedit)

- Packet forwarding (mirroring)

- QoS

- Etc.

TC Example 1 – Packet Filtering

TC Ingress packet drop

DROP UDP 10.1.0.2:1234

$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 parent ffff: protocol ip u32 \
    match ip dst 10.1.0.2 match ip dport 1234 0xffff \
    action drop
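In the filter above, `match ip dport 1234 0xffff` gives a value and a mask: only the bits set in the mask are compared, and all `match` selectors must hold for the action to run. A minimal sketch of that comparison, using hypothetical helper names (`masked_match`, `filter_matches`) rather than anything from the tc source:

```c
#include <stdint.h>

/* One masked comparison: only the bits set in mask take part. */
int masked_match(uint32_t field, uint32_t value, uint32_t mask)
{
    return (field & mask) == (value & mask);
}

/* The two selectors of the filter above, combined with logical AND.
 * dst_ip is a host-order IPv4 address. */
int filter_matches(uint32_t dst_ip, uint16_t dst_port)
{
    /* 10.1.0.2 in host byte order */
    const uint32_t want_ip = (10u << 24) | (1u << 16) | (0u << 8) | 2u;

    /* match ip dst 10.1.0.2: a single host, so every bit is compared */
    if (!masked_match(dst_ip, want_ip, 0xffffffffu))
        return 0;
    /* match ip dport 1234 0xffff: all 16 port bits are compared */
    return masked_match(dst_port, 1234, 0xffff);
}
```

A narrower mask widens the match. For example, a mask of `0xff00` would compare only the upper byte of the port.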


[Diagram: build sequence on the packet-processing path. The tc qdisc add command creates a Qdisc at TC Ingress on eth0, and the tc filter add command attaches the u32 filter to it. Packets addressed to 10.1.0.2:1234 are then dropped (DROP) at the Qdisc.]

TC Example 2 – NAT

TC Ingress Destination NAT

NAT UDP 10.1.0.2:1234 -> 10.1.1.2:1234

$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 parent ffff: protocol ip u32 \
    match ip dst 10.1.0.2 match ip dport 1234 0xffff \
    action nat ingress 10.1.0.2/32 10.1.1.2/32 pipe \
    action mirred egress redirect dev eth1


[Diagram: build sequence on the packet-processing path. The tc qdisc add command creates a Qdisc at TC Ingress on eth0, and the tc filter add command attaches the filter to it. For packets addressed to 10.1.0.2:1234, action nat rewrites the destination (DNAT) to 10.1.1.2, and action mirred egress redirect dev eth1 redirects the packet to the egress Qdisc of eth1 (Redirect).]

XDP

eBPF based fast data-path

XDP Packet Path

[Diagram: the packet-processing path with the XDP hook highlighted.]

How FAST are we talking?


Packet Drop

From zero to 14 Mpps


Test environment

[Diagram: Host A (10.1.0.1) sends packets from its NIC to the NIC of Host B (10.1.0.2:1234). On Host B the stack is drawn as NIC, DD, TC, IP, User.]

Connected with 10 GbE Ethernet

Test environment

Pktgen: sends UDP packets (linux/samples/pktgen)

$ ./pktgen.sh -i eth0 -m $DST_MAC -d 10.1.0.2 -p 1234
Result device: eth0
Params: count 100000  min_pkt_size: 60  max_pkt_size: 60
...
     dst_min: 10.1.0.2
     dst_mac: $DST_MAC
     udp_dst: 1234
Current:
     pkts-sofar: 100000  errors: 0
     started: 7802657us  stopped: 7862969us idle: 55us
     cur_saddr: 10.1.0.1  cur_daddr: 10.1.0.2
     cur_udp_dst: 1234  cur_udp_src: 109
     cur_queue_map: 0
     flows: 0
Result: OK: 60312(c60257+d55) usec, 100000 (60byte,0frags)
  1658039pps 795Mb/sec (795858720bps) errors: 0

Packet Drop / Single Core?

[Diagram: the same test environment. Host A (10.1.0.1) sends UDP packets over 10 GbE, and Host B drops the UDP packets addressed to 10.1.0.2:1234.]

Theoretical speed of 10 GbE?

Field                      Size
Preamble                   8 Bytes
MAC.Destination            6 Bytes
MAC.Source                 6 Bytes
Type                       2 Bytes
PAYLOAD (IP/IPv6/ARP…)     46-1500 Bytes
CRC                        4 Bytes
InterFrame Gap             12 Bytes

Minimum size on the wire: Preamble (8) + Ethernet Frame (64) + InterFrame Gap (12) = 84 Bytes

10 Gbit/s: (10 * 10^9 bit/s) / (84 * 8 bit) = 14,880,952 pps (14 Mpps)
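The same arithmetic can be written as a small helper, which makes it easy to repeat for other link speeds or frame sizes. This is only a sketch. The function name `line_rate_pps` and the constant names are made up for illustration, and the result is truncated to whole packets.

```c
#include <stdint.h>

enum {
    PREAMBLE_BYTES       = 8,   /* preamble */
    INTERFRAME_GAP_BYTES = 12,  /* inter-frame gap */
    MIN_FRAME_BYTES      = 64   /* destination MAC through CRC */
};

/* Packets per second that fit on a link of link_bps bits per second
 * when every frame is frame_bytes long (destination MAC through CRC).
 * Preamble and inter-frame gap are added per packet; the integer
 * division truncates. */
uint64_t line_rate_pps(uint64_t link_bps, uint64_t frame_bytes)
{
    uint64_t wire_bits =
        (PREAMBLE_BYTES + frame_bytes + INTERFRAME_GAP_BYTES) * 8;

    return link_bps / wire_bits;
}
```

Calling `line_rate_pps(10000000000ULL, MIN_FRAME_BYTES)` reproduces the 14,880,952 pps figure from the slide.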


Userspace Packet Drop

[Diagram: the packet-processing path with the DROP point in userspace, above the UPPER LAYER.]

UDP socket server: packets are dropped by reading them from the socket.

user-drop.c

    char buf[MTU_SIZE];

    while (1) {
        int res = read(fd, buf, MTU_SIZE);
        if (res <= 0)
            return 0;

        pkts += 1;
        bytes += res;
    }

[Diagram: test environment. Host A (10.1.0.1) sends over 10 GbE to Host B (10.1.0.2:1234), where the DROP happens at the User level.]

Userspace Packet Drop

$ gcc -o user-drop user-drop.c

$ sudo ./user-drop
packets=778148 bytes=14006664
packets=782171 bytes=14079078
packets=784792 bytes=14126256
packets=786466 bytes=14156388
packets=784163 bytes=14114934
packets=782500 bytes=14085000
packets=783085 bytes=14095530
packets=783172 bytes=14097096

Average Packet Drop

783,063pps/core

≈ 530Mbit/s
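A packet rate can be turned into a bit rate by multiplying by the bits each packet occupies on the wire. The sketch below assumes the 84-byte minimum footprint (8-byte preamble, 64-byte frame, 12-byte inter-frame gap). The slide does not state how its rounded figure was obtained, so the result is close to it rather than identical. `wire_bps` is a hypothetical helper name.

```c
#include <stdint.h>

/* Bits per second on the wire for a measured packet rate.
 * Each packet is charged preamble (8 bytes) + frame_bytes +
 * inter-frame gap (12 bytes). */
uint64_t wire_bps(uint64_t pps, uint64_t frame_bytes)
{
    return pps * (8 + frame_bytes + 12) * 8;
}
```

For the measured average, `wire_bps(783063, 64)` gives 526,218,336 bit/s, in line with the rounded value above.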


Userspace Packet Drop

Call trace on the receive path:

ixgbe_poll() {
  ixgbe_clean_rx_irq() {
    napi_gro_receive() {
      netif_receive_skb_internal() {
        skb_defer_rx_timestamp();
        __netif_receive_skb() {
          __netif_receive_skb_one_core() {
            __netif_receive_skb_core() {
              packet_rcv() {
                skb_push();
                consume_skb();
              }
            }
            ip_rcv() {
              ip_rcv_core.isra.25();
              ip_rcv_finish() {
                ip_rcv_finish_core.isra.23() {
                  udp_v4_early_demux() {
                    ip_check_mc_rcu();
                    ip_mc_sf_allow();
                    ipv4_dst_check();
                    ip_mc_validate_source() {
                      fib_validate_source() {
                        __fib_validate_source();
                      }
                    }
                  }

(continued)

ip_local_deliver() {
  nf_hook_slow() {
    iptable_filter_hook [iptable_filter]() {
      ipt_do_table [ip_tables]() {
        __local_bh_enable_ip();
      }
    }
  }
  ip_local_deliver_finish() {
    ip_protocol_deliver_rcu() {
      raw_local_deliver();
      udp_rcv() {
        __udp4_lib_rcv() {
          udp_unicast_rcv_skb.isra.64() {
            udp_queue_rcv_skb() {
              udp_queue_rcv_one_skb() {
                __udp_enqueue_schedule_skb();
                . . .

userspace DROP (≈ 530Mbit/s)

Can it be faster?


YES! By using Netfilter


Netfilter Packet Drop

[Diagram: the packet-processing path with the DROP point at the INPUT hook.]

Netfilter (iptables): packet drop at the INPUT chain

$ iptables -A INPUT -d 10.1.0.2 -p udp --dport 1234 -j DROP

$ iptables -L INPUT
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
DROP       udp  --  anywhere             10.1.0.2             udp dpt:1234

[Diagram: test environment. Host A (10.1.0.1) sends over 10 GbE to Host B (10.1.0.2:1234), where the DROP happens at the IP level.]

Netfilter Packet Drop

$ iptables -vxnL INPUT
Chain INPUT (policy ACCEPT 79 packets, 4318 bytes)
    pkts      bytes target     prot opt in     out     source               destination
 2927763  134677098 DROP       udp  --  *      *       0.0.0.0/0            10.1.0.2             udp dpt:1234
 4235054  194812484 DROP       udp  --  *      *       0.0.0.0/0            10.1.0.2             udp dpt:1234
 5541468  254907528 DROP       udp  --  *      *       0.0.0.0/0            10.1.0.2             udp dpt:1234
 6397986  294307356 DROP       udp  --  *      *       0.0.0.0/0            10.1.0.2             udp dpt:1234
 7706050  354478300 DROP       udp  --  *      *       0.0.0.0/0            10.1.0.2             udp dpt:1234
 9013710  414630660 DROP       udp  --  *      *       0.0.0.0/0            10.1.0.2             udp dpt:1234
10320569  474746174 DROP       udp  --  *      *       0.0.0.0/0            10.1.0.2             udp dpt:1234
11629331  534949226 DROP       udp  --  *      *       0.0.0.0/0            10.1.0.2             udp dpt:1234
12937107  595106922 DROP       udp  --  *      *       0.0.0.0/0            10.1.0.2             udp dpt:1234
14243987  655223402 DROP       udp  --  *      *       0.0.0.0/0            10.1.0.2             udp dpt:1234
15553372  715455112 DROP       udp  --  *      *       0.0.0.0/0            10.1.0.2             udp dpt:1234
16861793  775642478 DROP       udp  --  *      *       0.0.0.0/0            10.1.0.2             udp dpt:1234

Average Packet Drop

1,266,730pps/core

≈ 860Mbit/s

79
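The average on this slide can be reproduced from the pkts column. The sketch below assumes the counter was sampled once per second (the slide does not state the interval); the function name avg_pps and the array name are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Average packets per second between the first and the last sample.
 * interval_sec is an assumption: the slide does not say how often
 * the counter was read. */
uint64_t avg_pps(const uint64_t *pkts, size_t n, uint64_t interval_sec)
{
    if (n < 2 || interval_sec == 0)
        return 0;
    return (pkts[n - 1] - pkts[0]) / ((n - 1) * interval_sec);
}

/* pkts column copied from the iptables output on this slide */
const uint64_t netfilter_pkts[] = {
    2927763,  4235054,  5541468,  6397986,  7706050,  9013710,
    10320569, 11629331, 12937107, 14243987, 15553372, 16861793,
};
```

With a one-second interval, avg_pps(netfilter_pkts, 12, 1) gives 1266730, the figure shown on the slide. Dividing bytes by pkts in a row gives 46 bytes per packet, which would be consistent with a 20-byte IP header, an 8-byte UDP header and 18 bytes of payload.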

Netfilter Packet Drop

ixgbe_poll() {
  ixgbe_clean_rx_irq() {
    napi_gro_receive() {
      netif_receive_skb_internal() {
        skb_defer_rx_timestamp();
        __netif_receive_skb() {
          __netif_receive_skb_one_core() {
            __netif_receive_skb_core() {
              packet_rcv() {
                skb_push();
                consume_skb();
              }
            }
            ip_rcv() {
              ip_rcv_core.isra.25();
              ip_rcv_finish() {
                ip_rcv_finish_core.isra.23() {
                  udp_v4_early_demux() {
                    ip_check_mc_rcu();
                    ip_mc_sf_allow();
                    ipv4_dst_check();
                    ip_mc_validate_source() {
                      fib_validate_source() {
                        __fib_validate_source();
                      }
                    }
                  }
                  ip_local_deliver() {
                    nf_hook_slow() {
                      iptable_filter_hook [iptable_filter]() {
                        ipt_do_table [ip_tables]() {
                          udp_mt();
                          __local_bh_enable_ip();
                        }
                      }
                      kfree_skb();    <- DROP

Netfilter Packet Drop

Userspace vs. Netfilter

Userspace path (≈ 530 Mbit/s): the packet passes the filter hook, is delivered up to the UDP socket, and is dropped in userspace.

ip_local_deliver() {
  nf_hook_slow() {
    iptable_filter_hook [iptable_filter]() {
      ipt_do_table [ip_tables]() {
        __local_bh_enable_ip();
      }
    }
  }
  ip_local_deliver_finish() {
    ip_protocol_deliver_rcu() {
      raw_local_deliver();
      udp_rcv() {
        __udp4_lib_rcv() {
          udp_unicast_rcv_skb.isra.64() {
            udp_queue_rcv_skb() {
              udp_queue_rcv_one_skb() {
                __udp_enqueue_schedule_skb();
                . . .
userspace    <- DROP

Netfilter path (≈ 860 Mbit/s): same trace as on the previous slide, ending with kfree_skb() (DROP) inside nf_hook_slow().

Is it fast enough?

No! With TC, it can be faster

TC Ingress Packet Drop

[Diagram: kernel packet path with the hooks XDP, TC Ingress, PREROUTING, ROUTING, INPUT, FORWARDING, OUTPUT, POSTROUTING and TC Egress across layers L2, L3 and L4~ (Rx and Tx sides). A Qdisc sits at TC Ingress and at TC Egress; DROP is marked at the TC Ingress stage.]

TC Ingress Packet Drop

$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 parent ffff: protocol ip u32 \
      match ip dst 10.1.0.2 match ip dport 1234 0xffff \
      action drop

$ tc filter show ingress dev eth0
...
  match 0a010002/ffffffff at 16
  match 000004d2/0000ffff at 20
    action order 1: gact action drop
     random type none pass val 0
     index 1 ref 1 bind 1

[Diagram: Host A (10.1.0.1) sends packets over 10 GbE to Host B (10.1.0.2:1234). On Host B the packet climbs NIC -> DD -> TC and is dropped (DROP) at the TC stage, before IP and User.]

TC Ingress: packet drop with filter action
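The two "match ... at N" lines printed by tc are just the address and port packed into 32-bit words at byte offsets inside the IPv4 packet. A small sketch of that mapping follows; struct u32_key and the helper names are hypothetical and are not part of tc or iproute2, and the dport offset assumes an IP header without options (20 bytes).

```c
#include <stdint.h>

/* Hypothetical mirror of one "match VALUE/MASK at OFFSET" line. */
struct u32_key {
    uint32_t val;
    uint32_t mask;
    int off;      /* byte offset from the start of the IP header */
};

/* Pack a dotted-quad address a.b.c.d into one 32-bit word. */
uint32_t ipv4_word(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Destination address: the full word at offset 16 of the IP header. */
struct u32_key match_ip_dst(uint32_t addr)
{
    struct u32_key k = { addr, 0xffffffffu, 16 };
    return k;
}

/* Destination port: low 16 bits of the word at offset 20, which holds
 * source port and destination port when the IP header is 20 bytes. */
struct u32_key match_ip_dport(uint16_t port, uint16_t mask)
{
    struct u32_key k = { port, mask, 20 };
    return k;
}
```

ipv4_word(10, 1, 0, 2) is 0x0a010002 and port 1234 is 0x04d2, which matches the values in the tc filter show output above.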

TC Ingress Packet Drop

$ tc -s filter show ingress dev eth0
…
  match 0a010002/ffffffff at 16
  match 000004d2/0000ffff at 20
    action order 1: gact action drop
     random type none pass val 0
     index 1 ref 1 bind 1 installed 62 sec used 62 sec
    Action statistics:
    Sent 154709776 bytes 3363256 pkt (dropped 3363260, ...)
    Sent 345955052 bytes 7520762 pkt (dropped 7520767, ...)
    Sent 537288326 bytes 11680181 pkt (dropped 11680186, ...)
    Sent 728875750 bytes 15845125 pkt (dropped 15845130, ...)
    Sent 920311328 bytes 20006768 pkt (dropped 20006772, ...)
    Sent 1112155754 bytes 24177299 pkt (dropped 24177304, ...)
    Sent 1276514214 bytes 27750309 pkt (dropped 27750309, ...)
    Sent 1453930004 bytes 31607174 pkt (dropped 31607179, ...)
    Sent 1645855758 bytes 35779473 pkt (dropped 35779477, ...)
    Sent 1838232266 bytes 39961571 pkt (dropped 39961577, ...)
    Sent 2029302696 bytes 44115276 pkt (dropped 44115282, ...)
    Sent 2221122512 bytes 48285272 pkt (dropped 48285278, ...)
    Sent 2412335864 bytes 52442084 pkt (dropped 52442089, ...)
    Sent 2603632476 bytes 56600706 pkt (dropped 56600711, ...)

Average Packet Drop: 4,083,820 pps/core ≈ 2.75 Gbit/s

TC Ingress Packet Drop

ixgbe_poll() {
  ixgbe_clean_rx_irq() {
    napi_gro_receive() {
      netif_receive_skb_internal() {
        skb_defer_rx_timestamp();
        __netif_receive_skb() {
          __netif_receive_skb_one_core() {
            __netif_receive_skb_core() {
              tcf_classify() {
                u32_classify [cls_u32]() {
                  tcf_action_exec() {
                    tcf_gact_act [act_gact]();
                  }
                }
              }
              kfree_skb();    <- DROP

TC Ingress Packet Drop

Netfilter vs. TC (call traces as on the earlier slides)

Netfilter   ≈ 860 Mbit/s    DROP at kfree_skb() inside nf_hook_slow(), reached only after ip_rcv() and ip_local_deliver()
TC          ≈ 2.75 Gbit/s   DROP at kfree_skb() in __netif_receive_skb_core(), right after tcf_classify()

Even faster?

Sure thing! It's faster with XDP

XDP Packet Drop

[Diagram: kernel packet path with the hooks XDP, TC Ingress, PREROUTING, ROUTING, INPUT, FORWARDING, OUTPUT, POSTROUTING and TC Egress across layers L2, L3 and L4~ (Rx and Tx sides). DROP is marked at the XDP stage.]

XDP Packet Drop

DROP UDP 10.1.0.2:1234

SEC("xdp1")
int xdp_prog1(struct xdp_md *xdp)
{
    void *data_end = (void *)(long)xdp->data_end;
    void *data = (void *)(long)xdp->data;
    struct ethhdr *eth = data;
    struct iphdr *iph;
    struct udphdr *uh;
    ...
    if (eth->h_proto == htons(ETH_P_IP)) {
        iph = data + sizeof(*eth);
        ...
        if (iph->daddr == _htonl(0xa010002)) {
            if (iph->protocol == IPPROTO_UDP) {
                uh = data + sizeof(*eth) + sizeof(*iph);
                ...
                if (uh->dest == htons(1234))
                    return XDP_DROP;
            }
        }
    }
    return XDP_PASS;
}

XDP Packet Drop

$ bpftool prog load ./xdp-drop.o /sys/fs/bpf/drop
$ bpftool prog show
...
24: xdp  name xdp_prog1  tag 6f8c2e06dfa2abcb  gpl
        loaded_at 2019-10-11T17:17:33+0900  uid 0
        xlated 544B  jited 344B  memlock 4096B  map_ids 17

$ bpftool net attach xdp id 24 dev eth0
$ bpftool net
xdp:
eth0(8) driver id 24

[Diagram: Host A (10.1.0.1) sends packets over 10 GbE to Host B (10.1.0.2:1234). On Host B the packet is dropped (DROP) at the DD (device driver) stage, before TC, IP and User.]

XDP: packet drop with XDP_DROP

XDP Packet Drop

$ ethtool -S eth0 | grep rx_xdp_drop
rx_xdp_drop: 6161010
rx_xdp_drop: 16114329
rx_xdp_drop: 26079532
rx_xdp_drop: 36025920
rx_xdp_drop: 45958488
rx_xdp_drop: 55869376
rx_xdp_drop: 65835136
rx_xdp_drop: 75748898
rx_xdp_drop: 85670845
rx_xdp_drop: 95635591
rx_xdp_drop: 105569230
rx_xdp_drop: 115515714
rx_xdp_drop: 125480832
rx_xdp_drop: 135432211
rx_xdp_drop: 1455190421

Average Packet Drop: 9,941,337 pps/core ≈ 6.69 Gbit/s

XDP Packet Drop

ixgbe_poll() {
  ixgbe_clean_rx_irq() {
    ixgbe_get_rx_buffer();
    ixgbe_run_xdp() {
      bpf_prog_run_xdp();
    }    <- DROP

XDP Packet Drop

TC vs. XDP (call traces as on the earlier slides)

TC    ≈ 2.75 Gbit/s   DROP at kfree_skb() in __netif_receive_skb_core(), after napi_gro_receive() and tcf_classify()
XDP   ≈ 6.69 Gbit/s   DROP right after bpf_prog_run_xdp() in ixgbe_run_xdp(), still inside ixgbe_clean_rx_irq()

Packet Drop Results

[Bar chart, y-axis in Mpps from 0 to 12]

userspace       783,063 pps
netfilter     1,266,730 pps
tc            4,083,820 pps
xdp           9,941,337 pps   (≈ 6.69 Gbit/s)

XDP? Super FAST!

What is XDP?

The definition of XDP

What is XDP?

eBPF based fast data-path

What is BPF?

BPF in the kernel is similar to V8 in the Chrome browser

What is BPF?

In-kernel virtual machine

What is BPF?

A virtual machine? What does BPF look like?

What is BPF?

You may have already used it!

tcpdump!

What is BPF?

$ tcpdump -i eth0 'udp and dst 10.1.0.2'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
00:19:42.244270 IP 192.168.10.1.221 > 10.1.0.2.1234: UDP, length 18
00:19:42.244270 IP 192.168.20.1.813 > 10.1.0.2.1234: UDP, length 18
00:19:42.244270 IP 192.168.30.1.856 > 10.1.0.2.1234: UDP, length 18
00:19:42.244270 IP 192.168.40.1.959 > 10.1.0.2.1234: UDP, length 18
00:19:42.244271 IP 192.168.10.1.160 > 10.1.0.2.1234: UDP, length 18
00:19:42.244271 IP 192.168.20.1.102 > 10.1.0.2.1234: UDP, length 18
00:19:42.244276 IP 192.168.30.1.114 > 10.1.0.2.1234: UDP, length 18
7 packets captured
10 packets received by filter
3 packets dropped by kernel

What is BPF?

Packet filter by BPF!

tcpdump with -d option:

$ tcpdump -i eth0 -d 'udp and dst 10.1.0.2'

What is BPF?

tcpdump with -d option:

$ tcpdump -i eth0 -d 'udp and dst 10.1.0.2'
(000) ldh      [12]
(001) jeq      #0x800           jt 2    jf 7
(002) ldb      [23]
(003) jeq      #0x11            jt 4    jf 7
(004) ld       [30]
(005) jeq      #0xa010002       jt 6    jf 7
(006) ret      #262144
(007) ret      #0

What is BPF?

$ tcpdump -i eth0 -d 'udp and dst 10.1.0.2'
(000) ldh      [12]                         <- EtherType: bytes 12-13 of the Ethernet frame
(001) jeq      #0x800      jt 2    jf 7     <- 0x800 = IPv4
(002) ldb      [23]                         <- IP Protocol: 14 (Ethernet header) + 9
(003) jeq      #0x11       jt 4    jf 7     <- 0x11 = UDP
(004) ld       [30]                         <- IP Destination Address: 14 + 16
(005) jeq      #0xa010002  jt 6    jf 7     <- 0xa010002 = 10.1.0.2
(006) ret      #262144                      <- match: keep up to 262144 bytes
(007) ret      #0                           <- no match: discard

Ethernet Frame (64 bytes minimum)

MAC Destination | MAC Source | Type    | Payload (IP/IPv6/ARP…) | CRC
6 bytes         | 6 bytes    | 2 bytes | 46-1500 bytes          | 4 bytes

IP header (inside the payload)

Ver. | IHL | TOS | Total Len.
Identification | Flags | Frag. Offset
TTL | Protocol | Header Checksum
Source Address
Destination Address
Options (optional)
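To make the eight instructions concrete, here is a plain userspace C sketch that applies the same three comparisons to a raw Ethernet frame. The function and helper names are made up for illustration; this is not how tcpdump or the kernel run the filter, only the same logic written out by hand.

```c
#include <stddef.h>
#include <stdint.h>

/* Big-endian loads, like the ldh and ld instructions on packet bytes. */
uint32_t load_half(const uint8_t *p)
{
    return ((uint32_t)p[0] << 8) | (uint32_t)p[1];
}

uint32_t load_word(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 262144 (bytes to keep) on a match and 0 otherwise,
 * mirroring the two ret instructions. */
uint32_t filter_udp_dst(const uint8_t *pkt, size_t len)
{
    if (len < 34)                              /* offsets up to [30..33] must exist */
        return 0;
    if (load_half(pkt + 12) != 0x800)          /* (000)-(001): EtherType is IPv4 */
        return 0;
    if (pkt[23] != 0x11)                       /* (002)-(003): protocol is UDP */
        return 0;
    if (load_word(pkt + 30) != 0x0a010002)     /* (004)-(005): dst is 10.1.0.2 */
        return 0;
    return 262144;                             /* (006) */
}
```

Every failing branch ends at instruction (007), so a single early length check gives the same result as checking before each load.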

What is BPF?

In-kernel virtual machine: "Linux kernel code execution engine"

What is BPF?

"Run code in the kernel"

What is BPF?

"Run code in the kernel"

$ readelf -h bpf-prog.o | grep Machine
  Machine:   Linux BPF     (not "Advanced Micro Devices X86-64")

writing C program -> clang / llc -> BPF instruction

but, restricted: "Run BPF code in the kernel"
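What readelf prints as "Machine" is the e_machine field of the ELF header. A minimal sketch of reading it by hand follows, assuming a little-endian object file; the constants carry a trailing underscore so they do not clash with elf.h (where EM_BPF is 247 and EM_X86_64 is 62), and elf_machine is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

#define EM_X86_64_ 62    /* "Advanced Micro Devices X86-64" */
#define EM_BPF_    247   /* "Linux BPF" */

/* Returns e_machine from the first bytes of an ELF file,
 * or -1 if the buffer is too short, not ELF, or not little-endian. */
int elf_machine(const uint8_t *hdr, size_t len)
{
    if (len < 20)
        return -1;
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return -1;
    if (hdr[5] != 1)             /* EI_DATA: 1 means little-endian */
        return -1;
    /* e_ident is 16 bytes, e_type 2 bytes, then e_machine at offset 18 */
    return (int)hdr[18] | ((int)hdr[19] << 8);
}
```

For an object built for the BPF target this would return EM_BPF_; for a normal x86-64 object it would return EM_X86_64_.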

What is XDP?

eBPF based fast data-path

What is XDP?

[Diagram: receive stack from bottom to top, spanning L2, L3 and L7: NIC -> DD (XDP) -> Traffic Control -> IP (Netfilter) -> Userspace]

Earliest hook in the RX path of the kernel

XDP Call Path

Intel device driver RX function call stack:

__do_softirq() {
  net_rx_action() {
    ixgbe_poll() {
      ixgbe_clean_rx_irq() {
        ixgbe_get_rx_buffer();
        ixgbe_run_xdp() {
          bpf_prog_run_xdp();
        }
        ixgbe_build_skb();
        ixgbe_rx_skb() {
          napi_gro_receive() {
            netif_receive_skb_internal();

• Right after interrupt processing
• Before any memory allocation (expensive operation)
• Decide the fate of the packet (in the user-written XDP program)

XDP Actions

XDP_DROP     - Very fast drop by recycling
XDP_ABORTED  - Also drop, but with tracepoint
XDP_PASS     - Toss packet to network stack
XDP_TX       - Send packet back to same interface
XDP_REDIRECT - Transmit out other NICs
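As a quick reference, the five verdicts can be written down as a standalone enum. The values below mirror enum xdp_action in the kernel UAPI header linux/bpf.h, which this userspace sketch deliberately does not include; xdp_action_fate is a hypothetical helper that just restates the list above.

```c
/* Standalone copy of the XDP verdict values (same order as linux/bpf.h). */
enum xdp_action {
    XDP_ABORTED = 0,
    XDP_DROP,
    XDP_PASS,
    XDP_TX,
    XDP_REDIRECT,
};

/* Hypothetical helper: one-line summary of what happens to the packet. */
const char *xdp_action_fate(int act)
{
    switch (act) {
    case XDP_DROP:
        return "drop, buffer recycled";
    case XDP_PASS:
        return "pass to network stack";
    case XDP_TX:
        return "send back out the same interface";
    case XDP_REDIRECT:
        return "transmit out another NIC";
    case XDP_ABORTED:
    default:
        /* Assumption of this sketch: an unknown verdict is handled
         * like XDP_ABORTED, i.e. dropped with a tracepoint. */
        return "drop, with tracepoint";
    }
}
```

An XDP program returns one of these values for every packet it sees, as xdp_prog1 did with XDP_DROP and XDP_PASS.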

XDP Actions

[Diagram 1: kernel packet path with the hooks XDP, TC Ingress, PREROUTING, ROUTING, INPUT, FORWARDING, OUTPUT, POSTROUTING and TC Egress across layers L2, L3 and L4~ (Rx and Tx sides), with arrows for XDP_PASS, XDP_DROP, XDP_REDIRECT and XDP_TX leaving the XDP hook.]

[Diagram 2: XDP sits between NIC RX and NIC TX. XDP_PASS leads up to User, XDP_DROP discards the packet, XDP_TX leads back to the NIC, XDP_REDIRECT leads to another NIC.]

xdp_buff

NO Memory Allocation? NO sk_buff?

struct xdp_buff {
    void *data;
    void *data_end;
    void *data_meta;
    void *data_hard_start;
    unsigned long handle;
    struct xdp_rxq_info *rxq;
};

[Diagram: buffer layout from left to right]

xdp_buff->data_hard_start
    HEADROOM
xdp_buff->data_meta
xdp_buff->data
    Packet Data: MAC | IP | UDP
xdp_buff->data_end
    TAIL/TAILROOM
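The layout boils down to pointer arithmetic on one buffer. The sketch below is a userspace stand-in with only three of the pointers; struct buff_model and its helpers are hypothetical names, and adjust_head is only loosely modelled on what moving the data pointer means (the kernel helper bpf_xdp_adjust_head() performs more checks than this).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the data pointers of xdp_buff. */
struct buff_model {
    uint8_t *data_hard_start;   /* start of the buffer, headroom begins here */
    uint8_t *data;              /* first byte of the packet */
    uint8_t *data_end;          /* one past the last byte of the packet */
};

size_t headroom(const struct buff_model *b)
{
    return (size_t)(b->data - b->data_hard_start);
}

size_t pkt_len(const struct buff_model *b)
{
    return (size_t)(b->data_end - b->data);
}

/* Move the start of the packet: a negative delta uses headroom to
 * prepend bytes, a positive delta strips bytes from the front.
 * Returns 0 on success, -1 if the move would leave the buffer
 * or leave an empty packet. */
int adjust_head(struct buff_model *b, ptrdiff_t delta)
{
    ptrdiff_t off = (b->data - b->data_hard_start) + delta;
    ptrdiff_t end = b->data_end - b->data_hard_start;

    if (off < 0 || off >= end)
        return -1;
    b->data = b->data_hard_start + off;
    return 0;
}
```

Because the headroom is already part of the receive buffer, growing the packet at the front is only a pointer move; no new memory is allocated.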

How to use XDP?

Packet Forward

With understanding XDP code

DEMO

Test environment

[Diagram: Host A (10.1.0.1) -- 10 GbE --> Host B (10.1.0.2:1234; NIC -> DD -> TC -> IP -> User) -- 10 GbE --> Host C (10.1.1.2:1234). Packets sent by Host A are handled on Host B and forwarded to Host C.]

1. Destination NAT: 10.1.0.2 -> 10.1.1.2
2. Packet Forward

XDP Packet Forward

SEC("xdp_fwd")
int xdp_fwd_prog(struct xdp_md *xdp)
{
    void *data = (void *)(long)xdp->data;
    struct bpf_fib_lookup fib;
    struct ethhdr *eth = data;
    struct iphdr *iph;
    ...
    if (eth->h_proto == htons(ETH_P_IP)) {
        iph = data + sizeof(*eth);

        /* 1. Destination NAT: 10.1.0.2 -> 10.1.1.2 */
        if (iph->daddr == _htonl(0xa010002))
            iph->daddr = _htonl(0xa010102);
        ...
        /* 2. Packet Forward */
        rc = bpf_fib_lookup(xdp, &fib, sizeof(fib), 0);
        if (rc == BPF_FIB_LKUP_RET_SUCCESS) {
            memcpy(eth->h_dest, fib.dmac, ETH_ALEN);
            memcpy(eth->h_source, fib.smac, ETH_ALEN);
            return bpf_redirect(fib.ifindex, 0);
        }
    }
    return XDP_PASS;
}

XDP Packet Forward

SEC("xdp_fwd")
int xdp_fwd_prog(struct xdp_md *xdp)
{
    void *data_end = (void *)(long)xdp->data_end;
    void *data = (void *)(long)xdp->data;
    struct ethhdr *eth = data;
    struct iphdr *iph;
    struct udphdr *uh;
    …
    if (eth + 1 > data_end)
        return XDP_DROP;

    if (eth->h_proto == htons(ETH_P_IP)) {
        iph = data + sizeof(*eth);
        if (iph + 1 > data_end)
            return XDP_DROP;

        if (iph->daddr == _htonl(0xa010002))
            if (iph->protocol == IPPROTO_UDP) {
                uh = data + sizeof(*eth) + sizeof(*iph);
                if (uh + 1 > data_end)
                    return XDP_DROP;

                if (uh->dest == htons(1234))
                    iph->daddr = _htonl(0xa010102);
            }
        …
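The pattern in this listing (check that a header fits before data_end, then read it, then move on to the next header) can be tried out in plain userspace C. The sketch below is not the XDP program itself: it works on a byte buffer with fixed offsets instead of struct ethhdr / struct iphdr, assumes an IP header without options as sizeof(*iph) does, and uses made-up names (dnat_udp_1234, enum verdict). It stops after the address rewrite and does not touch the IP checksum.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_LEN 14
#define IP_LEN  20   /* assumes no IP options */
#define UDP_LEN 8

enum verdict { VERDICT_DROP, VERDICT_PASS, VERDICT_REWRITTEN };

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void wr32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Rewrite destination 10.1.0.2 -> 10.1.1.2 for UDP port 1234. */
enum verdict dnat_udp_1234(uint8_t *data, uint8_t *data_end)
{
    size_t len = (size_t)(data_end - data);
    uint8_t *iph, *uh;

    if (len < ETH_LEN)                        /* validate Ethernet header */
        return VERDICT_DROP;
    if (rd16(data + 12) != 0x0800)            /* not IPv4 */
        return VERDICT_PASS;

    if (len < ETH_LEN + IP_LEN)               /* validate IP header */
        return VERDICT_DROP;
    iph = data + ETH_LEN;
    if (rd32(iph + 16) != 0x0a010002)         /* daddr != 10.1.0.2 */
        return VERDICT_PASS;
    if (iph[9] != 17)                         /* protocol != UDP */
        return VERDICT_PASS;

    if (len < ETH_LEN + IP_LEN + UDP_LEN)     /* validate UDP header */
        return VERDICT_DROP;
    uh = iph + IP_LEN;
    if (rd16(uh + 2) != 1234)                 /* dest port != 1234 */
        return VERDICT_PASS;

    wr32(iph + 16, 0x0a010102);               /* daddr = 10.1.1.2 */
    return VERDICT_REWRITTEN;
}
```

In a real XDP program these length checks are not optional: the in-kernel verifier refuses to load a program that reads packet bytes without first comparing against data_end.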

Page 142: Faster Packet Processing in Linux: XDP 2_1100_3.pdfFaster Packet Processing in Linux: XDP ... 192.168.0.0 0.0.0.0 255.255.255.0 U 101 0 0 enp6s0 _gateway 0.0.0.0 255.255.255.255 UH

SEC("xdp_fwd")int xdp_fwd_prog(struct xdp_md *xdp) {

void *data_end = (void *)(long)xdp->data_end;void *data = (void *)(long)xdp->data;struct ethhdr *eth = data;struct iphdr *iph;struct udphdr *uh;

…if (eth + 1 > data_end)

return XDP_DROP;

if (eth->h_proto == htons(ETH_P_IP)) {iph = data + sizeof(*eth);if (iph + 1 > data_end)

return XDP_DROP;

if (iph->daddr == _htonl(0xa010002))if (iph->protocol == IPPROTO_UDP) {

uh = data + sizeof(*eth) + sizeof(*iph);if (uh + 1 > data_end)

return XDP_DROP;

if (uh->dest == htons(1234))iph->daddr = _htonl(0xa010102);

}…

XDP Packet Forward

142

-> ELF Section where XDP program will be located

Page 143: Faster Packet Processing in Linux: XDP 2_1100_3.pdfFaster Packet Processing in Linux: XDP ... 192.168.0.0 0.0.0.0 255.255.255.0 U 101 0 0 enp6s0 _gateway 0.0.0.0 255.255.255.255 UH

SEC("xdp_fwd")int xdp_fwd_prog(struct xdp_md *xdp) {

void *data_end = (void *)(long)xdp->data_end;void *data = (void *)(long)xdp->data;struct ethhdr *eth = data;struct iphdr *iph;struct udphdr *uh;

…if (eth + 1 > data_end)

return XDP_DROP;

if (eth->h_proto == htons(ETH_P_IP)) {iph = data + sizeof(*eth);if (iph + 1 > data_end)

return XDP_DROP;

if (iph->daddr == _htonl(0xa010002))if (iph->protocol == IPPROTO_UDP) {

uh = data + sizeof(*eth) + sizeof(*iph);if (uh + 1 > data_end)

return XDP_DROP;

if (uh->dest == htons(1234))iph->daddr = _htonl(0xa010102);

}…

XDP Packet Forward

143

-> Name of the XDP program

Page 144: Faster Packet Processing in Linux: XDP 2_1100_3.pdfFaster Packet Processing in Linux: XDP ... 192.168.0.0 0.0.0.0 255.255.255.0 U 101 0 0 enp6s0 _gateway 0.0.0.0 255.255.255.255 UH

SEC("xdp_fwd")int xdp_fwd_prog(struct xdp_md *xdp) {

void *data_end = (void *)(long)xdp->data_end;void *data = (void *)(long)xdp->data;struct ethhdr *eth = data;struct iphdr *iph;struct udphdr *uh;

…if (eth + 1 > data_end)

return XDP_DROP;

if (eth->h_proto == htons(ETH_P_IP)) {iph = data + sizeof(*eth);if (iph + 1 > data_end)

return XDP_DROP;

if (iph->daddr == _htonl(0xa010002))if (iph->protocol == IPPROTO_UDP) {

uh = data + sizeof(*eth) + sizeof(*iph);if (uh + 1 > data_end)

return XDP_DROP;

if (uh->dest == htons(1234))iph->daddr = _htonl(0xa010102);

}…

XDP Packet Forward

144

HEADROOM TAIL/TAILROOM

xdp_buff

xdp_buff->data

xdp_buff->data_end

Page 145: Faster Packet Processing in Linux: XDP 2_1100_3.pdfFaster Packet Processing in Linux: XDP ... 192.168.0.0 0.0.0.0 255.255.255.0 U 101 0 0 enp6s0 _gateway 0.0.0.0 255.255.255.255 UH

SEC("xdp_fwd")int xdp_fwd_prog(struct xdp_md *xdp) {

void *data_end = (void *)(long)xdp->data_end;void *data = (void *)(long)xdp->data;struct ethhdr *eth = data;struct iphdr *iph;struct udphdr *uh;

…if (eth + 1 > data_end)

return XDP_DROP;

if (eth->h_proto == htons(ETH_P_IP)) {iph = data + sizeof(*eth);if (iph + 1 > data_end)

return XDP_DROP;

if (iph->daddr == _htonl(0xa010002))if (iph->protocol == IPPROTO_UDP) {

uh = data + sizeof(*eth) + sizeof(*iph);if (uh + 1 > data_end)

return XDP_DROP;

if (uh->dest == htons(1234))iph->daddr = _htonl(0xa010102);

}…

XDP Packet Forward

145

MAC

HEADROOM TAIL/TAILROOM

xdp_buff

-> Cast to Ethernet header

xdp_buff->data

xdp_buff->data_end

Page 146: Faster Packet Processing in Linux: XDP 2_1100_3.pdfFaster Packet Processing in Linux: XDP ... 192.168.0.0 0.0.0.0 255.255.255.0 U 101 0 0 enp6s0 _gateway 0.0.0.0 255.255.255.255 UH

SEC("xdp_fwd")int xdp_fwd_prog(struct xdp_md *xdp) {

void *data_end = (void *)(long)xdp->data_end;void *data = (void *)(long)xdp->data;struct ethhdr *eth = data;struct iphdr *iph;struct udphdr *uh;

…if (eth + 1 > data_end)

return XDP_DROP;

if (eth->h_proto == htons(ETH_P_IP)) {iph = data + sizeof(*eth);if (iph + 1 > data_end)

return XDP_DROP;

if (iph->daddr == _htonl(0xa010002))if (iph->protocol == IPPROTO_UDP) {

uh = data + sizeof(*eth) + sizeof(*iph);if (uh + 1 > data_end)

return XDP_DROP;

if (uh->dest == htons(1234))iph->daddr = _htonl(0xa010102);

}…

XDP Packet Forward

146

MAC

HEADROOM TAIL/TAILROOM

xdp_buff

-> Validate the Ethernet header

xdp_buff->data

xdp_buff->data_end

Page 147

XDP Packet Forward

-> If eth->h_proto is ETH_P_IP, cast to the IP header

    if (eth->h_proto == htons(ETH_P_IP)) {
        iph = data + sizeof(*eth);

[Figure: xdp_buff layout: HEADROOM | MAC | IP | TAIL/TAILROOM, with xdp_buff->data and xdp_buff->data_end marking the packet bounds]

Page 148

XDP Packet Forward

-> Validate the IP header

        if (iph + 1 > data_end)
            return XDP_DROP;

[Figure: xdp_buff layout: HEADROOM | MAC | IP | TAIL/TAILROOM, with xdp_buff->data and xdp_buff->data_end marking the packet bounds]

Page 149

XDP Packet Forward

-> Check destination address 10.1.0.2

        if (iph->daddr == _htonl(0xa010002))

[Figure: xdp_buff layout: HEADROOM | MAC | IP | TAIL/TAILROOM, with xdp_buff->data and xdp_buff->data_end marking the packet bounds]

Page 150

XDP Packet Forward

-> If iph->protocol is UDP, cast to the UDP header

            if (iph->protocol == IPPROTO_UDP) {
                uh = data + sizeof(*eth) + sizeof(*iph);

[Figure: xdp_buff layout: HEADROOM | MAC | IP | UDP | TAIL/TAILROOM, with xdp_buff->data and xdp_buff->data_end marking the packet bounds]

Page 151

XDP Packet Forward

-> Validate the UDP header

                if (uh + 1 > data_end)
                    return XDP_DROP;

[Figure: xdp_buff layout: HEADROOM | MAC | IP | UDP | TAIL/TAILROOM, with xdp_buff->data and xdp_buff->data_end marking the packet bounds]

Page 152

XDP Packet Forward

-> If the dst port is 1234, change the dst address to 10.1.1.2

                if (uh->dest == htons(1234))
                    iph->daddr = _htonl(0xa010102);

[Figure: xdp_buff layout: HEADROOM | MAC | IP | UDP | TAIL/TAILROOM, with xdp_buff->data and xdp_buff->data_end marking the packet bounds]
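The header walk on the preceding slides can also be tried outside the kernel. The following is a minimal userspace sketch, not the talk's XDP program: it applies the same "check the length, then read the field" order to a plain byte buffer. The names `dnat_udp_1234`, `enum verdict`, and the `load_be16` / `load_be32` / `store_be32` helpers are hypothetical, and the sketch assumes a 20-byte IPv4 header without options, as `sizeof(struct iphdr)` does in the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch (not XDP actions). */
enum verdict { VERDICT_DROP, VERDICT_UNCHANGED, VERDICT_REWRITTEN };

#define ETH_LEN  14u  /* Ethernet header */
#define IPV4_LEN 20u  /* IPv4 header, assuming no options */
#define UDP_LEN   8u  /* UDP header */

static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

enum verdict dnat_udp_1234(uint8_t *data, uint8_t *data_end)
{
    size_t len = (size_t)(data_end - data);

    if (len < ETH_LEN)                          /* validate the Ethernet header */
        return VERDICT_DROP;
    if (load_be16(data + 12) != 0x0800)         /* h_proto is not ETH_P_IP */
        return VERDICT_UNCHANGED;

    if (len < ETH_LEN + IPV4_LEN)               /* validate the IP header */
        return VERDICT_DROP;
    uint8_t *ip = data + ETH_LEN;
    if (load_be32(ip + 16) != 0x0a010002u)      /* daddr is not 10.1.0.2 */
        return VERDICT_UNCHANGED;
    if (ip[9] != 17)                            /* protocol is not UDP */
        return VERDICT_UNCHANGED;

    if (len < ETH_LEN + IPV4_LEN + UDP_LEN)     /* validate the UDP header */
        return VERDICT_DROP;
    uint8_t *udp = ip + IPV4_LEN;
    if (load_be16(udp + 2) != 1234)             /* dst port is not 1234 */
        return VERDICT_UNCHANGED;

    store_be32(ip + 16, 0x0a010102u);           /* change daddr to 10.1.1.2 */
    return VERDICT_REWRITTEN;
}
```

Every field read is preceded by a length check covering that header. In the real XDP program those checks are what let the in-kernel verifier accept the packet accesses.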

Page 153

XDP Packet Forward

    struct bpf_fib_lookup fib;
    int rc;
    ...
        __builtin_memset(&fib, 0, sizeof(fib));
        fib.family = AF_INET;
        fib.tos = iph->tos;
        fib.l4_protocol = iph->protocol;
        fib.tot_len = ntohs(iph->tot_len);
        fib.ipv4_src = iph->saddr;
        fib.ipv4_dst = iph->daddr;
        fib.ifindex = xdp->ingress_ifindex;
    } else {
        return XDP_PASS;
    }

    rc = bpf_fib_lookup(xdp, &fib, sizeof(fib), 0);

    if (rc == BPF_FIB_LKUP_RET_SUCCESS) {
        ip_decrease_ttl(iph); /* from include/net/ip.h */
        memcpy(eth->h_dest, fib.dmac, ETH_ALEN);
        memcpy(eth->h_source, fib.smac, ETH_ALEN);
        return bpf_redirect(fib.ifindex, 0);
    }

    return XDP_PASS;
}

-> Prior to the routing table lookup, prepare the bpf_fib_lookup struct

Page 154

XDP Packet Forward

-> Fill the struct for the query (e.g. src / dst address)

        fib.family = AF_INET;
        fib.tos = iph->tos;
        fib.l4_protocol = iph->protocol;
        fib.tot_len = ntohs(iph->tot_len);
        fib.ipv4_src = iph->saddr;
        fib.ipv4_dst = iph->daddr;

Page 155

XDP Packet Forward

-> Record which interface the packet came from

        fib.ifindex = xdp->ingress_ifindex;

Page 156

XDP Packet Forward

-> Query routing table for redirect interface

    rc = bpf_fib_lookup(xdp, &fib, sizeof(fib), 0);

Page 157

XDP Packet Forward

-> Query routing table for redirect interface

    rc = bpf_fib_lookup(xdp, &fib, sizeof(fib), 0);

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         _gateway        0.0.0.0         UG    101    0        0 enp6s0
10.1.0.0        0.0.0.0         255.255.255.0   U     111    0        0 eth0
10.1.1.0        0.0.0.0         255.255.255.0   U     110    0        0 eth1
192.168.0.0     0.0.0.0         255.255.255.0   U     101    0        0 enp6s0
_gateway        0.0.0.0         255.255.255.255 UH    101    0        0 enp6s0

Page 158

XDP Packet Forward

-> On success, decrease the TTL

    if (rc == BPF_FIB_LKUP_RET_SUCCESS) {
        ip_decrease_ttl(iph); /* from include/net/ip.h */

Page 159

XDP Packet Forward

-> Rewrite the source and destination MAC addresses

        memcpy(eth->h_dest, fib.dmac, ETH_ALEN);
        memcpy(eth->h_source, fib.smac, ETH_ALEN);

Page 160

XDP Packet Forward

-> Packet forward with bpf_redirect! (returns with XDP_REDIRECT)

        return bpf_redirect(fib.ifindex, 0);
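The TTL step deserves a closer look, because the TTL is covered by the IPv4 header checksum. As a rough userspace sketch of the idea behind `ip_decrease_ttl()` (not the kernel source itself), the checksum can be patched incrementally instead of being recomputed. The function names `ipv4_header_sum` and `decrease_ttl` are hypothetical, and the header is handled as raw bytes in network order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement checksum over an IPv4 header given as raw bytes.
 * With the checksum field zeroed this yields the value to store;
 * over a header with a valid checksum it yields 0.
 */
uint16_t ipv4_header_sum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Decrement the TTL (byte 8) and patch the checksum (bytes 10..11)
 * incrementally: lowering the TTL/protocol word by 0x0100 raises the
 * checksum by 0x0100, with an end-around carry.
 */
uint8_t decrease_ttl(uint8_t *hdr)
{
    uint32_t check = ((uint32_t)hdr[10] << 8) | hdr[11];

    check += 0x0100u;
    check += (check >= 0xffffu);     /* fold the carry back in */
    hdr[10] = (uint8_t)(check >> 8);
    hdr[11] = (uint8_t)check;
    return --hdr[8];
}
```

The same kind of adjustment is needed wherever a checksummed header field is rewritten, such as the destination address in the DNAT step; the slides elide that part of the program.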

Page 161

XDP Packet Forward

Compile with BPF target:

$ clang -O2 -Wall -target bpf -c xdp-fwd.c -o xdp-fwd.o

Load program with bpftool:

$ bpftool prog load ./xdp-fwd.o /sys/fs/bpf/fwd
$ bpftool prog show
...
44: xdp  name xdp_fwd_prog  tag 1aa0135c2e55b38d  gpl
        loaded_at 2019-10-11T20:03:01+0900  uid 0
        xlated 760B  jited 442B  memlock 4096B

Attach to interface:

$ bpftool net attach xdp id 44 dev eth0
$ bpftool net
xdp:
eth0(8) driver id 44

Page 162

Benchmark

Page 163

Test environment

[Figure: Host A (10.1.0.1) sends packets over 10 GbE to Host B (10.1.0.2:1234), which passes them over 10 GbE to Host C (10.1.1.2:1234). Host B's stack is drawn as User, IP, TC, DD, NIC.]

1. Destination NAT: 10.1.0.2 -> 10.1.1.2
2. Packet Forward

Page 164

Hooks to process packet?

L7  Userspace
L3  IP (Netfilter)
L2  Traffic Control
    DD (XDP)
    NIC

Page 165

Userspace Packet Forward

[Figure: kernel packet path from Rx to Tx across L2, L3 and L4~ (XDP, TC Ingress, PREROUTING, ROUTING, INPUT, FORWARDING, OUTPUT, POSTROUTING, TC Egress, UPPER LAYER), with the forward path marked]

Page 166

Userspace Packet Forward

UDP Load Balancer: packet forward with DNAT

/etc/nginx/nginx.conf

stream {
    server {
        listen 1234 udp;
        proxy_pass udp_target;
    }

    upstream udp_target {
        server 10.1.1.2:1234;
    }
}

[Figure: Host A (10.1.0.1) -> Host B (10.1.0.2:1234) -> Host C (10.1.1.2:1234); Host B's stack is drawn as User, IP, TC, DD, NIC, with a Forward marker]

Page 167

Userspace Packet Forward

$ ethtool -S eth0 | grep rx_xdp_drop
rx_xdp_drop: 370841
rx_xdp_drop: 754718
rx_xdp_drop: 1118515
rx_xdp_drop: 1483198
rx_xdp_drop: 1858322
rx_xdp_drop: 2237080
rx_xdp_drop: 2609916
rx_xdp_drop: 2985062
rx_xdp_drop: 3342212
rx_xdp_drop: 3725703
rx_xdp_drop: 4105888

Average Packet Forward: 373,263 pps/core (≈ 260 Mbit/s)
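A packet rate can be derived from successive readings of a cumulative counter like the one above. The sketch below is one way to do it, under the assumption that the readings are taken at a fixed interval; the slides do not state the interval or how the quoted average was computed, and the name `average_rate` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Average rate of a monotonically increasing counter.
 * samples:      successive counter readings
 * n:            number of readings
 * interval_sec: assumed time between two readings, in seconds
 */
double average_rate(const uint64_t *samples, size_t n, double interval_sec)
{
    if (n < 2 || interval_sec <= 0.0)
        return 0.0;
    /* Total increase divided by total elapsed time. */
    return (double)(samples[n - 1] - samples[0]) /
           ((double)(n - 1) * interval_sec);
}
```

Using the total increase over the whole window, rather than averaging per-interval differences one by one, gives the same result with less arithmetic.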

Page 168

How about Netfilter?

Page 169

Netfilter Packet Forward

[Figure: kernel packet path from Rx to Tx across L2, L3 and L4~ (XDP, TC Ingress, PREROUTING, ROUTING, INPUT, FORWARDING, OUTPUT, POSTROUTING, TC Egress, UPPER LAYER), with the forward path marked]

Page 170

Netfilter Packet Forward

Netfilter (iptables): packet forward with DNAT

$ iptables -t nat -A PREROUTING -d 10.1.0.2 -p udp --dport 1234 \
    -j DNAT --to-destination 10.1.1.2:1234

$ iptables -t nat -L PREROUTING
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
DNAT       udp  --  anywhere             10.1.0.2             udp dpt:1234 to:10.1.1.2:1234

[Figure: Host A (10.1.0.1) -> Host B (10.1.0.2:1234) -> Host C (10.1.1.2:1234); Host B's stack is drawn as User, IP, TC, DD, NIC, with a Forward marker]

Page 171

Netfilter Packet Forward

$ ethtool -S eth0 | grep rx_xdp_drop
rx_xdp_drop: 686991
rx_xdp_drop: 1377146
rx_xdp_drop: 2064571
rx_xdp_drop: 2755620
rx_xdp_drop: 3418211
rx_xdp_drop: 4050284
rx_xdp_drop: 4682626
rx_xdp_drop: 5366274
rx_xdp_drop: 6054439
rx_xdp_drop: 6743962
rx_xdp_drop: 7434054
rx_xdp_drop: 8003538
rx_xdp_drop: 8693454

Average Packet Forward: 668,728 pps/core (≈ 960 Mbit/s)

Page 172

What about TC?

Page 173

TC Ingress Packet Forward

[Figure: kernel packet path from Rx to Tx across L2, L3 and L4~ (XDP, Qdisc / TC Ingress, PREROUTING, ROUTING, INPUT, FORWARDING, OUTPUT, POSTROUTING, Qdisc / TC Egress, UPPER LAYER), with the forward path marked]

Page 174

TC Ingress Packet Forward

TC Ingress: packet forward with DNAT (redirect)

$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 parent ffff: protocol ip u32 \
    match ip dst 10.1.0.2 match ip dport 1234 0xffff \
    action nat ingress 10.1.0.2/32 10.1.1.2/32 pipe \
    action mirred egress redirect dev eth1

$ tc filter show ingress dev eth0
...
  match 0a010002/ffffffff at 16
  match 000004d2/0000ffff at 20
        action order 1: nat ingress 10.1.0.2/32 10.1.1.2 pipe
        index 1 ref 1 bind 1

        action order 2: mirred (Egress Redirect to eth1) stolen
        index 1 ref 1 bind 1

[Figure: Host A (10.1.0.1) -> Host B (10.1.0.2:1234) -> Host C (10.1.1.2:1234); Host B's stack is drawn as User, IP, TC, DD, NIC, with a Forward marker]

Page 175

TC Ingress Packet Forward

$ ethtool -S eth0 | grep rx_xdp_drop
rx_xdp_drop: 1407789
rx_xdp_drop: 2848601
rx_xdp_drop: 4274539
rx_xdp_drop: 5695547
rx_xdp_drop: 7132990
rx_xdp_drop: 8533688
rx_xdp_drop: 9943697
rx_xdp_drop: 11367744
rx_xdp_drop: 12808213
rx_xdp_drop: 14234151
rx_xdp_drop: 15685314

Average Packet Forward: 1,425,938 pps/core (≈ 1 Gbit/s)

Page 176

And… XDP?

Page 177

XDP Packet Forward

[Figure: kernel packet path from Rx to Tx across L2, L3 and L4~ (XDP, ROUTING, TC Ingress, PREROUTING, INPUT, FORWARDING, OUTPUT, POSTROUTING, TC Egress, UPPER LAYER), with the forward path marked]

Page 178

XDP Packet Forward

SEC("xdp_fwd")
int xdp_fwd_prog(struct xdp_md *xdp) {
    void *data = (void *)(long)xdp->data;
    struct bpf_fib_lookup fib;
    struct ethhdr *eth = data;
    struct iphdr *iph;
    ...
    if (eth->h_proto == htons(ETH_P_IP)) {
        iph = data + sizeof(*eth);

        /* 1. Destination NAT: 10.1.0.2 -> 10.1.1.2 */
        if (iph->daddr == _htonl(0xa010002))
            iph->daddr = _htonl(0xa010102);
    ...
    /* 2. Packet Forward */
    rc = bpf_fib_lookup(xdp, &fib, sizeof(fib), 0);

    if (rc == BPF_FIB_LKUP_RET_SUCCESS) {
        memcpy(eth->h_dest, fib.dmac, ETH_ALEN);
        memcpy(eth->h_source, fib.smac, ETH_ALEN);
        return bpf_redirect(fib.ifindex, 0);
    }

    return XDP_PASS;
}

Page 179

XDP Packet Forward

XDP: packet forward with DNAT (XDP_REDIRECT)

$ bpftool prog load ./xdp-fwd.o /sys/fs/bpf/fwd
$ bpftool prog show
...
44: xdp  name xdp_fwd_prog  tag 1aa0135c2e55b38d  gpl
        loaded_at 2019-10-11T20:03:01+0900  uid 0
        xlated 760B  jited 442B  memlock 4096B

$ bpftool net attach xdp id 44 dev eth0
$ bpftool net
xdp:
eth0(8) driver id 44

[Figure: Host A (10.1.0.1) -> Host B (10.1.0.2:1234) -> Host C (10.1.1.2:1234); Host B's stack is drawn as User, IP, TC, DD, NIC, with a Forward marker]

Page 180

XDP Packet Forward

$ ethtool -S eth0 | grep rx_xdp_drop
rx_xdp_drop: 3031344
rx_xdp_drop: 6057680
rx_xdp_drop: 9088916
rx_xdp_drop: 12118560
rx_xdp_drop: 15150319
rx_xdp_drop: 18180559
rx_xdp_drop: 21210383
rx_xdp_drop: 24240271
rx_xdp_drop: 27270543
rx_xdp_drop: 30300559
rx_xdp_drop: 33329483
rx_xdp_drop: 36356788

Average Packet Forward: 3,029,764 pps/core (≈ 2.04 Gbit/s)

Page 181

Packet Forward Results

Hook        pps/core
userspace     373,263
netfilter     668,728
tc          1,425,938
xdp         3,029,764  (≈ 2.04 Gbit/s)

[Figure: bar chart of the values above, y-axis in Mpps]

Page 182

More about XDP?

XDP Offload and Further usages

Page 183

XDP Modes

Page 184

XDP Modes

Generic XDP
Native XDP
Offloaded XDP

Page 185

XDP Modes

XDP can happen here:

[Figure: a packet passes through the NIC (network hardware), the driver (network driver), and the Linux kernel network stack (TC, Netfilter, …). An XDP hook is drawn at each stage: Offloaded XDP in the NIC, Native XDP in the driver, Generic XDP in the network stack.]

Page 186

XDP Modes

[Figure: a packet passes through the NIC (network hardware), the driver (network driver), and the Linux kernel network stack (TC, Netfilter, …), with the Offloaded, Native and Generic XDP hooks; one mode is annotated "= Normally, XDP?"]

Page 187

XDP Modes

Generic XDP
- No driver support needed

Native XDP (most 10 GbE drivers)
- Intel (ixgbe, ixgbevf, i40e)
- Mellanox (mlx4, mlx5)
- Broadcom (bnxt)
- QLogic (qede)
- Netronome (nfp)
- Others (virtio, tun)

Offloaded XDP
- Netronome (nfp)

Page 188

XDP OFFLOAD

Page 189

What is XDP OFFLOAD?

BPF Program:

 0: r0 = 1
 1: r2 = *(u32 *)(r1 + 4)
 2: r1 = *(u32 *)(r1 + 0)
 3: r3 = r1
 4: r3 += 14
 5: if r3 > r2 goto +18 <LBB0_8>
 6: r3 = *(u8 *)(r1 + 12)
 7: r4 = *(u8 *)(r1 + 13)
 8: r4 <<= 8
 9: r4 |= r3
10: if r4 != 8 goto +12 <LBB0_7>
11: r3 = r1
12: r3 += 34
13: if r3 > r2 goto +10 <LBB0_8>
14: r3 = *(u32 *)(r1 + 30)
15: if r3 != 33554698 goto +7 <LBB0_7>
16: r3 = *(u8 *)(r1 + 23)
17: if r3 != 17 goto +5 <LBB0_7>
18: r3 = r1
19: r3 += 42
...

[Figure label: XDP offload]
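The constants in this listing line up with the C source once byte order is taken into account. Offsets 14, 34 and 42 are the cumulative Ethernet, IP and UDP header sizes used in the bounds checks; bytes 12 and 13 hold h_proto, byte 23 the IP protocol (17 is UDP), and bytes 30 to 33 the destination address. The values 8 and 33554698 are what little-endian loads see for ETH_P_IP (0x0800) and 10.1.0.2 in network byte order. The small sketch below reproduces both values; `le16_load` and `le32_load` are hypothetical helper names.

```c
#include <stdint.h>

/* Value a little-endian 16-bit load sees for two bytes in wire order. */
uint16_t le16_load(const uint8_t b[2])
{
    return (uint16_t)(b[0] | ((unsigned)b[1] << 8));
}

/* Value a little-endian 32-bit load sees for four bytes in wire order. */
uint32_t le32_load(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

Feeding the wire bytes `08 00` to `le16_load` gives 8, and the bytes `10 1 0 2` to `le32_load` give 33554698, matching instructions 10 and 15.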

Page 190

What is XDP OFFLOAD?

- Runs even earlier
- No CPU usage

Page 191

Packet Drop Results

Hook        pps
userspace     783,063
netfilter   1,266,730
tc          4,083,820
xdp         9,941,337  (≈ 6.69 Gbit/s out of 10 Gbit/s)

[Figure: bar chart of the values above, y-axis in Mpps]

Page 192

XDP Offload Packet Drop

XDP offload: packet drop with XDP_DROP

$ bpftool prog load ./xdp-drop.o /sys/fs/bpf/drop
$ bpftool prog show
...
24: xdp  name xdp_prog1  tag 6f8c2e06dfa2abcb  gpl
        loaded_at 2019-10-11T17:17:33+0900  uid 0
        xlated 544B  jited 344B  memlock 4096B  map_ids 17

$ bpftool net attach xdpoffload id 18 dev eth0
$ bpftool net
xdp:
eth0(8) offload id 18

[Figure: Host A (10.1.0.1) sends packets over 10 GbE to Host B (10.1.0.2:1234); Host B's stack is drawn as User, IP, TC, DD, NIC, with a DROP marker]

Page 193

XDP Offload Packet Drop

$ ethtool -S eth0 | grep bpf_app1_pkts
bpf_app1_pkts: 9340887
bpf_app1_pkts: 14274730
bpf_app1_pkts: 29233693
bpf_app1_pkts: 44165498
bpf_app1_pkts: 59097632
bpf_app1_pkts: 74057835
bpf_app1_pkts: 88990848
bpf_app1_pkts: 103947963
bpf_app1_pkts: 118881530
bpf_app1_pkts: 133838008
bpf_app1_pkts: 148770851
bpf_app1_pkts: 163705144
bpf_app1_pkts: 178664224
bpf_app1_pkts: 193596262
bpf_app1_pkts: 208554907
bpf_app1_pkts: 223488168

Average Packet Drop: 14,883,153 pps/core (≈ 10.00 Gbit/s)
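The step from packets per second to Gbit/s can be checked with a little arithmetic. The slides do not state the frame size; the sketch below assumes minimum-size 64-byte Ethernet frames plus 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter and inter-frame gap), under which 14,883,153 pps works out to about 10.00 Gbit/s and 3,029,764 pps to about 2.04 Gbit/s. The name `wire_bits_per_second` is hypothetical.

```c
#include <stdint.h>

/* Per-frame overhead on the wire: preamble + SFD (8 bytes) and
 * inter-frame gap (12 bytes). */
#define ETH_WIRE_OVERHEAD 20u

/* Bits per second occupied on the wire by `pps` frames of
 * `frame_bytes` bytes each (frame size is an assumption here). */
uint64_t wire_bits_per_second(uint64_t pps, uint32_t frame_bytes)
{
    return pps * (uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD) * 8u;
}
```

With 64-byte frames each packet occupies 672 bits on the wire, so a 10 Gbit/s link tops out at roughly 14.88 Mpps, which is the rate the offloaded program reaches.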

Page 194

Packet Drop Results with XDP Offload

Hook           pps
userspace       783,063
netfilter     1,266,730
tc            4,083,820
xdp           9,941,337  (6.69 Gbit/s)
xdp-offload  14,883,153  (≈ 10.00 Gbit/s out of 10 Gbit/s)

[Figure: bar chart of the values above, y-axis in Mpps]

Page 195

XDP Offload Packet Drop

ixgbe_poll() {
    ixgbe_clean_rx_irq() {
        ixgbe_get_rx_buffer();
        ixgbe_run_xdp() {
            bpf_prog_run_xdp();
        }
    }
}

[Figure: XDP vs. XDP Offload. XDP: the call chain above leads to the DROP. XDP Offload: DROP with "NONE!" in place of a call chain.]

Page 196

XDP Offload Packet Drop

[Figure: XDP vs. XDP Offload]

Page 197

XDP Use Cases

• Load Balancing

• Packet Tunneling (Encapsulation)

• DDoS attack mitigation

• Network monitoring

• etc.

Page 198

Sample code and test results can be found at:

github.com/DanielTimLee/soscon19_XDP

Page 199

SAMSUNG OPEN SOURCE CONFERENCE 2019

THANK YOU