52
Feature Rich Flow Monitoring with P4 John Sonchack University of Pennsylvania

Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

1 ©2017 Open-NFP

Feature Rich Flow Monitoring with P4

John Sonchack University of Pennsylvania

Page 2: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

2 ©2017 Open-NFP

Outline

▪  Introduction: Flow Records ▪ Design and Implementation: P4 Accelerated Flow Record Generation ▪  Benchmarks and Optimizations

Page 3: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

3 ©2017 Open-NFP

Outline

▪  Introduction: Flow Records ▪ Design and Implementation: P4 Accelerated Flow Record Generation ▪  Benchmarks and Optimizations

Page 4: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

4 ©2017 Open-NFP

Introduction: Flow Records

▪  A flow record summarizes groups of packets in the same TCP / UDP stream

Flow key (IP 5-tuple)

Flow features (statistics and meta-data)

Page 5: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

5 ©2017 Open-NFP

Introduction: Flow Record Applications

▪  Flow records are input to analysis applications

▪  Examples: •  Security: Detect botnets, intrusions,

DoS attacks, port scans •  Traffic management: Traffic

classification, QoS routing, flow scheduling, load balancing

•  Debugging: Performance monitoring, loop and black hole detection

•  More: –  “A survey of network flow

applications” B. Li, et al. –  “An overview of ip flow-based

intrusion detection” A.Sperotto, et al. –  NetFlow/IPFIX applications

Flow records

Information

Reconfiguration

Page 6: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

6 ©2017 Open-NFP

Introduction: Flow Record Benefits

▪  Flow records reduce monitoring costs

▪  Important for high coverage monitoring •  visibility into many (or all) links •  costs add up

Packet Headers Flow Records Volume 2GB per second 1.38MB per second

Rate 328k per second 8.9k per second Tablesource:1hour10Gbit/scoreroutertrace(CAIDA 02/2015 Chicago dir. Ah5ps://www.caida.org/data/passive/trace_stats/)

Flow records

Information

Reconfiguration

Page 7: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

7 ©2017 Open-NFP

Introduction: Flow Record Richness

▪  An important quality of flow records is their information richness:

Accuracy: Account for every packet and flow on monitored links

Feature richness: provide the features that applications use

Page 8: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

8 ©2017 Open-NFP

Introduction: Flow Record Generation

▪  The problem: current approaches trade overhead for information richness

So#wareGenera+on SamplingSwitches DedicatedASICs

+Informa+onrich-Highoverhead

-Reducesaccuracy+Lowoverhead

-Limitedfeatureset+Lowoverhead

Flow recordsSampled flow records Fixed flow records

Page 9: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

9 ©2017 Open-NFP

Introduction: our Goal

▪  Flow record generation that is efficient, accurate, and feature rich

+Informa+onrich+Lowoverhead

Thiswork:so#waregenera+on+P4hardwareaccelera+onSo#waregenerators

+Informa+onrich-Highoverhead

Flow recordsFlow records

+

Page 10: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

10 ©2017 Open-NFP

Outline

▪  Introduction: Flow Records ▪ Design and Implementation: P4 Accelerated Flow Record

Generation ▪  Benchmarks and Optimizations

Page 11: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

11 ©2017 Open-NFP

Packets

Flow Records

Flow Records

Microflow Records

Packets

P4 Accelerated Flow Record Generation ▪  Main Idea: Use P4 hardware to preprocess packets into

micro flow records that summarize per-flow packet bursts

Totalwork:map8

packetsto3flowrecords

Totalwork:map4micro

flowrecordsto3flowrecords

•  Informa+onrich•  featuresarefully

customizable•  Efficient

•  P4hardwarereducesCPUworkload

So#wareflowrecordgenera+on

P4acceleratedflowrecordgenera+on

Page 12: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

12 ©2017 Open-NFP

First attempt: CPU Managed Tables

PacketsFlow Record Generator.P4

Insert

Update

Key Pkt Ct.

1

12

FlowGenerator_controller.c

RemoveInsert

CPU

Flow Records

P4 Hardware Forwarding Table

Page 13: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

13 ©2017 Open-NFP

PacketsFlow Record Generator.P4

Insert

Update

Key Pkt Ct.

1

12

FlowGenerator_controller.c

RemoveInsert

CPU

Flow Records

P4 Hardware Forwarding Table

First attempt: CPU Managed Tables

▪  Bottleneck: table update operations ▪  P4 designed for forwarding tables that are not highly dynamic

NotLineRate!

Flowarrivalrate:~50kflowspersec.(10Gbit/sInternetlink)

Controllerruleupdaterate:~200–5kpersecond(hwdep.)

Page 14: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

14 ©2017 Open-NFP

A Better Design: Minimal CPU Interaction 1.  Use a flat array, i.e., P4 register, indexed by hash to store records.

Packets

MicroFlowGenerator.P4

MicroFlowAggregator.c

P4 Hardware

CPUFlow Records

Pkt Ct.8

Hash

Key1.1.1.1—>…

52.2.2.2—>…

Page 15: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

15 ©2017 Open-NFP

A Better Design: Minimal CPU Interaction 1.  Use a flat array, i.e., P4 register, indexed by hash to store records.

Packets

MicroFlowGenerator.P4

MicroFlowAggregator.c

P4 Hardware

CPUFlow Records

Pkt Ct.8

Hash

Key1.1.1.1—>…

52.2.2.2—>…

Page 16: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

16 ©2017 Open-NFP

1.  Use a flat array, i.e., P4 register, indexed by hash to store records.

Packets

MicroFlowGenerator.P4

MicroFlowAggregator.c

P4 Hardware

CPUFlow Records

Pkt Ct.8

Hash

Key1.1.1.1—>…

52.2.2.2—>…

A Better Design: Minimal CPU Interaction

Page 17: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

17 ©2017 Open-NFP

1.  Use a flat array, i.e., P4 register, indexed by hash to store records.

Packets

MicroFlowGenerator.P4

MicroFlowAggregator.c

P4 Hardware

CPUFlow Records

Pkt Ct.9

Hash

Key1.1.1.1—>…

52.2.2.2—>…

A Better Design: Minimal CPU Interaction

Page 18: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

18 ©2017 Open-NFP

1.  Use a flat array, i.e., P4 register, indexed by hash to store records.

Packets

MicroFlowGenerator.P4

MicroFlowAggregator.c

P4 Hardware

CPUFlow Records

Pkt Ct.11

Hash

Key1.1.1.1—>…

72.2.2.2—>…

A Better Design: Minimal CPU Interaction

Page 19: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

19 ©2017 Open-NFP

1.  Use a flat array, i.e., P4 register, indexed by hash to store records.

Packets

MicroFlowGenerator.P4

MicroFlowAggregator.c

P4 Hardware

CPUFlow Records

Pkt Ct.11

Hash

Key1.1.1.1—>…

72.2.2.2—>…

Collision

A Better Design: Minimal CPU Interaction

Page 20: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

20 ©2017 Open-NFP

1.  Use a flat array, i.e., P4 register, indexed by hash to store records. 2.  Evict to CPU on collision.

Packets

MicroFlowGenerator.P4

MicroFlowAggregator.c

P4 Hardware

CPUFlow Records

Pkt Ct.11

Hash

Key1.1.1.1—>…

13.3.3.3—>…

Insert

Normal Hash Table

72.2.2.2—>…

Collision

A Better Design: Minimal CPU Interaction

Page 21: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

21 ©2017 Open-NFP

1.  Use a flat array, i.e., P4 register, indexed by hash to store records. 2.  Evict to CPU on collision.

Packets

MicroFlowGenerator.P4

MicroFlowAggregator.c

P4 Hardware

CPUFlow Records

Pkt Ct.11

Hash

Key1.1.1.1—>…

13.3.3.3—>…

Insert

Normal Hash Table

72.2.2.2—>…

Timeout

A Better Design: Minimal CPU Interaction

Page 22: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

22 ©2017 Open-NFP

A Simple Implementation

Page 23: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

23 ©2017 Open-NFP

Packets

Flow Records

Flow Records

Microflow Records

Packets

P4 Accelerated Flow Record Generation ▪  Main Idea: Use P4 hardware to preprocess packets into

micro flow records that summarize per-flow packet bursts

Totalwork:map8

packetsto3flowrecords

Totalwork:map4micro

flowrecordsto3flowrecords

•  Informa+onrich•  featuresarefully

customizable•  Efficient

•  P4HWreducescpuworkload

Commodityserverflowrecordgenera+on

P4acceleratedflowrecordgenera+on

Page 24: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

24 ©2017 Open-NFP

Outline

▪  Introduction: Flow Records ▪ Design and Implementation: P4 Accelerated Flow Record Generation ▪ Benchmarks and Optimizations

Page 25: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

25 ©2017 Open-NFP

Benchmarks and Optimizations 1.  How much does the P4 hardware reduce CPU workload? 2.  What is the maximum throughput of the P4 component? 3.  How can we optimize for the NFP-4000?

Page 26: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

26 ©2017 Open-NFP

Benchmarks and Optimizations 1.  How much does the P4 hardware reduce CPU workload? 2.  What is the maximum throughput of the P4 component? 3.  How can we optimize for the NFP-4000?

Page 27: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

27 ©2017 Open-NFP

Benchmarks and Optimizations 1.  How much does the P4 hardware reduce CPU workload?

Workload:1hour10Gbit/scoreroutertrace(CAIDA 02/2015 Chicago dir. Ah5ps://www.caida.org/data/passive/trace_stats/)

Eachmicroflowrepresents~10packets,onaverage

Page 28: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

28 ©2017 Open-NFP

Benchmarks and Optimizations 1.  How much does the P4 hardware reduce CPU workload?

Workload:1hour10Gbit/scoreroutertrace(CAIDA 02/2015 Chicago dir. Ah5ps://www.caida.org/data/passive/trace_stats/)

Eachmicroflowrepresents~10packets,onaverage

Sta$s$c Packetsaggrega+on Microflowaggrega+on

Avg.ratefor10Gbit/slink

~500k ~50k

CPUaggregatorthroughput(percore)

~600k ~600k

totalCPUmonitoringcapacity(percore)

~1x10Gbit/slink ~10x10Gbit/slinks

Packets

Flow Records

Flow Records

Microflow Records

Packets

Page 29: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

29 ©2017 Open-NFP

Benchmarks and Optimizations 1.  How much does the P4 hardware reduce CPU workload?

(~10x with 1 MB of P4 HW memory) 2.  What is the maximum throughput of the P4 component? 3.  How can we optimize it for the NFP-4000?

Page 30: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

30 ©2017 Open-NFP

Benchmarks and Optimizations 1.  How much does the P4 hardware reduce CPU workload?

(~10x with 1 MB of P4 HW memory) 2.  What is the maximum throughput of the P4 component?

Page 31: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

31 ©2017 Open-NFP

Benchmarks and Optimizations 1.  How much does the P4 hardware reduce CPU workload?

(~10x with 1 MB of P4 HW memory) 2.  What is the maximum throughput of the P4 component?

packetsize bitrate@10Mpkts/sec

64 ~5Gbit/s

128 ~10Gbit/s

256 ~20Gbit/s

512 ~40Gbit/s

1024 ~80Gbit/s

Averagepacketsizeson10Gbit/sinternetrouterlinks(caida)

Page 32: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

32 ©2017 Open-NFP

Benchmarks and Optimizations 1.  How much does the P4 hardware reduce CPU workload?

(~10x with 1 MB of P4 HW memory) 2.  What is the maximum throughput of the P4 component?

(~40-80 Gbit/s with average size packets)

packetsize bitrate@10Mpkts/sec

64 ~5Gbit/s

128 ~10Gbit/s

256 ~20Gbit/s

512 ~40Gbit/s

1024 ~80Gbit/s

Averagepacketsizeson10Gbit/sinternetrouterlinks(caida)

Page 33: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

33 ©2017 Open-NFP

Benchmarks and Optimizations 1.  How much does the P4 hardware reduce CPU workload?

(~10x with 1 MB of P4 HW memory) 2.  What is the maximum throughput of the P4 component?

(~40-80 Gbit/s with average size packets) 3.  How can we optimize for the NFP-4000?

Page 34: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

34 ©2017 Open-NFP

Benchmarks and Optimizations 1.  How much does the P4 hardware reduce CPU workload?

(~10x with 1 MB of P4 HW memory) 2.  What is the maximum throughput of the P4 component?

(~40-80 Gbit/s with average size packets) 3.  How can we optimize for the NFP-4000?

Op[miza[on1:finergrainedsemaphores

Page 35: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

35 ©2017 Open-NFP

Benchmarks and Optimizations 1.  How much does the P4 hardware reduce CPU workload?

(~10x with 1 MB of P4 HW memory) 2.  What is the maximum throughput of the P4 component?

(~40-80 Gbit/s with average size packets) 3.  How can we optimize for the NFP-4000?

Op[miza[on1:finergrainedsemaphores

Page 36: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

36 ©2017 Open-NFP

Core 1

Core 2

Pkt Ct.Key

53.3.3.3—>…

packet.key = 4.4.4.4 —> … packet.hash = 3

1: read key @ 3

packet.key = 3.3.3.3 —> … packet.hash = 3

2: read key @ 3

3: replace 3

4: update 3

Optimization 1: Fine Grained P4 Semaphores

Q: Are there race conditions?

Page 37: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

37 ©2017 Open-NFP

Core 1

Core 2

Pkt Ct.Key

53.3.3.3—>…

packet.key = 4.4.4.4 —> … packet.hash = 3

1: read key @ 3

packet.key = 3.3.3.3 —> … packet.hash = 3

2: read key @ 3

3: replace 3

4: update 3

Optimization 1: Fine Grained P4 Semaphores

Incorrect!

Q: Are there race conditions? A: Yes.

Page 38: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

38 ©2017 Open-NFP

Optimization 1: Fine Grained P4 Semaphores

Core 1

Core 2

Pkt Ct.Key

53.3.3.3—>…

packet.key = 4.4.4.4 —> … packet.hash = 3

1: read key @ 3

packet.key = 3.3.3.3 —> … packet.hash = 3

3: read key @ 3

2: replace 3

4: replace 3

WAIT

Q: Are there race conditions? A: Yes.

Page 39: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

39 ©2017 Open-NFP

Optimization 1: Fine Grained P4 Semaphores

Core 1

Core 2

Pkt Ct.Key

53.3.3.3—>…

packet.key = 4.4.4.4 —> … packet.hash = 3

1: read key @ 3

packet.key = 3.3.3.3 —> … packet.hash = 3

3: read key @ 3

2: replace 3

4: replace 3

WAIT

Page 40: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

40 ©2017 Open-NFP

Optimization 1: Fine Grained P4 Semaphores

Core 1

Core 2

Pkt Ct.Key

53.3.3.3—>…

1: read key @ 3

2: replace 3WAIT

Core 3

WAIT

Core N

WAIT

Page 41: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

41 ©2017 Open-NFP

Optimization 1: Fine Grained P4 Semaphores

lock/unlockbycallingaP4ac[on.

Lockasingleentry,nottheen[rearray.

Page 42: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

42 ©2017 Open-NFP

Benchmarks and Optimizations 1.  How much does the P4 hardware reduce CPU workload?

(~10x with 1 MB of P4 HW memory) 2.  What is the maximum throughput of the P4 component?

(~40-80 Gbit/s with average size packets) 3.  How can we optimize for the NFP-4000?

1.  Fine grained P4 semaphores

Op[miza[on2:combiningtables

Page 43: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

43 ©2017 Open-NFP

Optimization 2: Combining Tables

Version Cycles/packet

Improvement

Baseline 5084 -

Page 44: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

44 ©2017 Open-NFP

Optimization 2: Combining Tables

Version Cycles/packet

Improvement

Baseline 5084 -

computehashandgetkeymerged

4191 18%

Page 45: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

45 ©2017 Open-NFP

Optimization 2: Combining Tables

Version Cycles/packet

Improvement

Baseline 5084 -

computehashandgetkeymerged

4191 18%

Changingtablesisasexpensiveas~5ememops(register_reads/writes).

Page 46: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

46 ©2017 Open-NFP

Optimization 2: Combining Tables

Canwemergeallthetables?Challenge:

branches.

Page 47: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

47 ©2017 Open-NFP

Optimization 2: Combining Tables

InC,wecouldusecondi[onalassignments.

Canwemergeallthetables?Challenge:

branches.

Page 48: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

48 ©2017 Open-NFP

Optimization 2: Combining Tables

InP4,wecanuseacondiAonalmask.

Canwemergeallthetables?Challenge:

branches.

hdp://homolog.us/blogs/blog/2014/12/04/the-en[re-world-of-bit-twiddling-hacks/

Encodeasmodify_field+register_writeinP4.

Page 49: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

49 ©2017 Open-NFP

Optimization 2: Combining Tables

hdp://homolog.us/blogs/blog/2014/12/04/the-en[re-world-of-bit-twiddling-hacks/

Version Cycles/packet

Improvement

Baseline 5084 -

computehashandgetkeymerged

4191 18%

singletable 2915 42%

Endresult:~20-30%bederthroughput.

Page 50: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

50 ©2017 Open-NFP

Benchmarks and Optimizations 1.  How much does the P4 hardware reduce CPU workload?

(~10x with 1 MB of P4 HW memory) 2.  What is the maximum throughput of the P4 component?

(~40-80 Gbit/s with average size packets) 3.  How can we optimize for the NFP-4000?

1.  Fine grained P4 semaphores 2.  Combine tables, use conditional masks

Page 51: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

51 ©2017 Open-NFP

Summary

▪  Flow records are powerful for high coverage, low overhead network monitoring.

▪ Generating them efficiently, without sacrificing information richness, is a challenge.

▪ Our P4 accelerator can increase flow generator capacity by factor of 10 or more, without sacrificing information richness.

▪  Table merging, conditional masks, and P4 accessible semaphores are useful and portable optimization techniques.

▪ Code available (soon) at: https://github.com/jsonch/p4_code ▪  Thank you for attending!

Page 52: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key

52 ©2017 Open-NFP

THANK YOU!