VPP Host Stack Transport and Session Layers Florin Coras, Dave Barach

VPP Host Stack - FD.io · 2018-04-30 · FD.io Mini-Summit at KubeCon Europe 2018


VPP Host Stack: Transport and Session Layers

Florin Coras, Dave Barach


EFFICIENCY

PERFORMANCE

SOFTWARE DEFINED NETWORKING

CLOUD NETWORK SERVICES

LINUX FOUNDATION

VPP - A Universal Terabit Network Platform for Native Cloud Network Services

Superior Performance

Most Efficient on the Planet

Flexible and Extensible

Open Source

Cloud Native

Breaking the Barrier of Software Defined Network Services: 1 Terabit Services on a Single Intel® Xeon® Server!


Motivation: Container networking

FD.io Mini-Summit at KubeCon Europe 2018

[Figure: two processes (PID 1234 and PID 4321) communicating through the kernel stack. Labels: send(), recv(), glibc, kernel, FIFO, TCP, IP (routing), device.]

[Figure: the two kernel stacks (PID 1234 send(), PID 4321 recv(); FIFO, TCP, IP (routing), device) now attached to VPP through af_packet/tap. VPP labels: FIFO, device, dpdk, Ethernet, IP4/6, MPLS, ACL, SR, VXLAN, LISP, etc.]

Why not this?

[Figure: PID 1234 (send()) and PID 4321 (recv()) each attached to a FIFO inside VPP. VPP layers: Session, TCP, IP, DPDK.]

VPP Host Stack

[Figure: an App attached to VPP over the Binary API, sharing a shm segment with rx/tx fifos. VPP layers: Session, TCP, IP, DPDK.]

VPP Host Stack: Session Layer

• Maintains per-app state and conveys session events to/from apps
• Allocates and manages sessions/segments/fifos
• Isolates network resources via namespacing
• Session lookup tables (5-tuple) and local/global session rule tables (filters)
• Support for pluggable transport protocols
• Binary/native C API for external/builtin applications
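The 5-tuple lookup mentioned above can be pictured with a small sketch. It assumes a plain dictionary keyed on (protocol, local IP, local port, remote IP, remote port) with a fallback to a listener keyed on the local half. The names `SessionTable`, `add_listener`, `add_session` and `lookup` are made up for illustration and are not VPP's API.

```python
from typing import Dict, Optional, Tuple

# proto, local ip, local port, remote ip, remote port
FiveTuple = Tuple[str, str, int, str, int]


class SessionTable:
    """Toy session table: exact 5-tuple match first, then listener match."""

    def __init__(self) -> None:
        self.sessions: Dict[FiveTuple, int] = {}
        self.listeners: Dict[Tuple[str, str, int], int] = {}

    def add_listener(self, proto: str, ip: str, port: int, handle: int) -> None:
        self.listeners[(proto, ip, port)] = handle

    def add_session(self, key: FiveTuple, handle: int) -> None:
        self.sessions[key] = handle

    def lookup(self, key: FiveTuple) -> Optional[int]:
        # Established connections match on the full 5-tuple.
        if key in self.sessions:
            return self.sessions[key]
        # Otherwise fall back to a listener on the local address and port.
        proto, lcl_ip, lcl_port, _, _ = key
        return self.listeners.get((proto, lcl_ip, lcl_port))


table = SessionTable()
table.add_listener("tcp", "10.0.0.1", 80, handle=1)
table.add_session(("tcp", "10.0.0.1", 80, "10.0.0.2", 40000), handle=7)
```

A packet from an unknown peer to port 80 resolves to the listener handle, while packets of the established connection resolve to its own session handle.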

VPP Host Stack: SVM FIFOs

• Allocated within shared memory segments with or without file backing (ssvm/memfd)
• Fixed position and size
• Lock free enqueue/dequeue but atomic size increment
• Option to dequeue/peek data
• Support for out-of-order data enqueues
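As a rough illustration of a fixed-size fifo with separate enqueue, dequeue and peek operations, here is a minimal single-producer, single-consumer ring buffer. It is a sketch under assumptions, not the SVM fifo code: the class name `ByteFifo` and its fields are hypothetical, the shared `cursize` counter is a plain integer standing in for an atomically updated one, and out-of-order enqueues are left out.

```python
class ByteFifo:
    """Toy fixed-size byte fifo; one producer, one consumer."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)
        self.size = size
        self.head = 0      # next byte to read (touched only by the consumer)
        self.tail = 0      # next byte to write (touched only by the producer)
        self.cursize = 0   # shared byte count; stands in for an atomic counter

    def enqueue(self, data: bytes) -> int:
        """Copy in as much of `data` as fits; return the number of bytes taken."""
        n = min(len(data), self.size - self.cursize)
        for i in range(n):
            self.buf[(self.tail + i) % self.size] = data[i]
        self.tail = (self.tail + n) % self.size
        self.cursize += n
        return n

    def peek(self, offset: int, length: int) -> bytes:
        """Read without consuming, starting `offset` bytes past the head."""
        n = max(0, min(length, self.cursize - offset))
        start = self.head + offset
        return bytes(self.buf[(start + i) % self.size] for i in range(n))

    def dequeue(self, length: int) -> bytes:
        """Read and consume up to `length` bytes."""
        out = self.peek(0, length)
        self.head = (self.head + len(out)) % self.size
        self.cursize -= len(out)
        return out
```

Peek is what lets a reliable transport keep data in the tx fifo until it is acknowledged, while dequeue frees the space for the writer.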

VPP Host Stack: TCP

• Clean-slate implementation
• “Complete” state machine implementation
• Connection management and flow control (window management)
• Timers and retransmission, fast retransmit, SACK
• NewReno congestion control, SACK based fast recovery
• Checksum offloading
• Linux compatibility tested with IWL TCP protocol tester
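To make the NewReno bullet concrete, the sketch below shows the usual textbook window logic: slow start, congestion avoidance, and fast retransmit on the third duplicate ack. It is a simplification in the spirit of RFC 5681 and RFC 6582, not VPP's implementation. The class `NewRenoSketch`, the initial window, and the use of `cwnd / 2` rather than the bytes in flight for `ssthresh` are all assumptions made to keep the example short.

```python
class NewRenoSketch:
    """Simplified NewReno window logic (byte counting, no timers, no SACK)."""

    def __init__(self, mss: int = 1460) -> None:
        self.mss = mss
        self.cwnd = 2 * mss           # initial window, chosen arbitrarily here
        self.ssthresh = 64 * 1024
        self.dupacks = 0
        self.in_recovery = False

    def on_new_ack(self, bytes_acked: int) -> None:
        self.dupacks = 0
        if self.in_recovery:
            # Assume a full ack: leave fast recovery and deflate the window.
            self.in_recovery = False
            self.cwnd = self.ssthresh
        elif self.cwnd < self.ssthresh:
            # Slow start: grow by at most one MSS per ack.
            self.cwnd += min(bytes_acked, self.mss)
        else:
            # Congestion avoidance: roughly one MSS per round trip.
            self.cwnd += max(1, self.mss * self.mss // self.cwnd)

    def on_dup_ack(self) -> bool:
        """Return True when the third duplicate ack triggers fast retransmit."""
        self.dupacks += 1
        if self.dupacks == 3 and not self.in_recovery:
            self.ssthresh = max(self.cwnd // 2, 2 * self.mss)
            self.cwnd = self.ssthresh + 3 * self.mss
            self.in_recovery = True
            return True
        if self.in_recovery:
            self.cwnd += self.mss     # window inflation per extra dupack
        return False
```

With SACK based fast recovery, as listed above, the sender would additionally use the receiver's SACK blocks to pick which segments to retransmit; that part is not modelled here.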

VPP Host Stack: more transports

• SCTP
• UDP
• TLS

VPP Host Stack: Comms Library (VCL)

• Comms library (VCL) apps can link against
• LD_PRELOAD library for legacy apps
• epoll

Application Attachment

[Figure: an App talks to the VPP session layer over the Binary API: attach, then bind (server) or connect (client). App and VPP share a shm segment. VPP layers: Session, TCP, IP, DPDK.]

Session Establishment

[Figure: a Client and a Server, each attached over the Binary API to its own VPP instance (Session, TCP, IP, DPDK). Server: attach, bind; server-side session layer: listen.]

[Figure: Client: attach, connect; client-side session layer: open. Server: attach, bind; server-side session layer: listen.]

[Figure: the client-side and server-side TCP layers perform the handshake.]

[Figure: after the handshake, the session layers are told "new client" (server side) and "connect succeeded" (client side).]

[Figure: over the Binary API the Client gets a connect reply and the Server an accept notify; each side now has a shm segment with rx/tx fifos.]
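The establishment sequence in these slides (attach, bind/listen, connect/open, handshake, then accept notify and connect reply) can be walked through with a toy model. Everything here is illustrative: `ToyVpp` and `connect` are hypothetical names, the TCP handshake is reduced to a comment, and no fifos are actually allocated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ToyVpp:
    """Stands in for the session layer of one VPP instance."""
    attached: Set[str] = field(default_factory=set)
    listeners: Dict[int, str] = field(default_factory=dict)       # port -> app
    events: List[Tuple[str, str]] = field(default_factory=list)   # (app, message)

    def attach(self, app: str) -> None:
        self.attached.add(app)

    def bind(self, app: str, port: int) -> None:
        if app not in self.attached:
            raise ValueError("app must attach before bind")
        self.listeners[port] = app      # session layer asks the transport to listen

    def notify(self, app: str, message: str) -> None:
        self.events.append((app, message))


def connect(client_vpp: ToyVpp, client: str, server_vpp: ToyVpp, port: int) -> bool:
    """Client-side connect; fails if the client is not attached or nobody listens."""
    if client not in client_vpp.attached or port not in server_vpp.listeners:
        return False
    # The two transport layers would run the handshake at this point.
    server_vpp.notify(server_vpp.listeners[port], "accept notify")
    client_vpp.notify(client, "connect reply")
    return True
```

In the slides both notifications also carry the rx/tx fifos of the new session, which is what lets the apps exchange data through shared memory afterwards.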

Data Transfer

[Figure: Client and Server, each over its own VPP instance (Session, TCP, IP, DPDK), with rx/tx fifos on both sides. Labels: write, tx write evt, copy to buffer, copy to fifo, rx write evt, read. TCP provides congestion control and reliable transport.]


Some rough numbers on an E2699: ~12 Gbps/core (1.5k MTU), ~20 Gbps/core (9k MTU), ~185k CPS (connections per second)!
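To put the throughput figures in perspective, they can be converted to packets per second. This is back-of-the-envelope arithmetic, assuming "1.5k" and "9k" mean 1500 and 9000 bytes, that every packet is MTU sized, and ignoring Ethernet framing and header overhead; the helper name is made up.

```python
def packets_per_second(gbps: float, mtu_bytes: int) -> float:
    """Packets per second needed to carry `gbps` using MTU-sized packets."""
    return gbps * 1e9 / (mtu_bytes * 8)


pps_1500 = packets_per_second(12, 1500)   # about 1.0 million packets/s per core
pps_9000 = packets_per_second(20, 9000)   # about 278 thousand packets/s per core
ns_per_packet_1500 = 1e9 / pps_1500       # per-packet time budget in nanoseconds
```

Under these assumptions, a core at 1.5k MTU has on the order of a microsecond per packet for the whole session, TCP and IP path.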

Data Transfer: Dgram Transports

[Figure: Client and Server, each over its own VPP instance with UDP as transport (Session, UDP, IP, DPDK) and a shm segment with rx/tx fifos. Server: listen; Client: “connect”.]

Data enqueued in fifo with dgram hdr

Data and dgram header dequeued from fifo
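Because the fifo is a byte stream, each datagram has to carry a header so that message boundaries and the peer address survive. The sketch below shows the idea with an invented header layout (payload length, remote IPv4 address, remote port); the real header format is not given in the slides, so `HDR` and both function names are assumptions.

```python
import socket
import struct

# Hypothetical header layout: payload length, remote IPv4 address, remote port.
HDR = struct.Struct("!I4sH")


def enqueue_dgram(fifo: bytearray, payload: bytes, ip: str, port: int) -> None:
    """Append header + payload so datagram boundaries survive the byte fifo."""
    fifo += HDR.pack(len(payload), socket.inet_aton(ip), port)
    fifo += payload


def dequeue_dgram(fifo: bytearray):
    """Pop one datagram; return (payload, ip, port), or None if incomplete."""
    if len(fifo) < HDR.size:
        return None
    length, raw_ip, port = HDR.unpack_from(fifo, 0)
    end = HDR.size + length
    if len(fifo) < end:
        return None
    payload = bytes(fifo[HDR.size:end])
    del fifo[:end]
    return payload, socket.inet_ntoa(raw_ip), port
```

The reader always consumes header and payload together, so a zero-length datagram is still delivered as a distinct message.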

Redirected Connections (Cut-through)

[Figure: a Client and a Server attached to the same VPP instance (Session, TCP, IP, DPDK) over the Binary API. Server: bind.]

[Figure: Client and Server on the same VPP instance. Client: connect; VPP: redirect.]

VPP tracks these sessions, allocates ssvm segments and asks both peers to map them.

Throughput is memory bandwidth constrained: ~120 Gbps!

Multi-threading for stream connections

[Figure: App1 attached over the Binary API to a VPP with two worker threads (Core 0, Core 1). Each core runs its own Session, TCP and IP layers over DPDK, with its own rx/tx fifos.]

• Connections/sessions 'pinned' to a thread
• Per-thread data structures/state
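One way to picture pinning is a deterministic mapping from a connection's 5-tuple to a worker, with each worker owning its own session table so no locks are needed. The sketch below uses a CRC32 hash purely as an assumption; the slides do not say how a connection is assigned to a thread, and `Workers` is a hypothetical name.

```python
import zlib
from typing import Dict, List, Tuple

FiveTuple = Tuple[str, str, int, str, int]


class Workers:
    """Per-thread session tables; a session lives on exactly one worker."""

    def __init__(self, n_workers: int) -> None:
        self.tables: List[Dict[FiveTuple, str]] = [{} for _ in range(n_workers)]

    def worker_for(self, key: FiveTuple) -> int:
        # Deterministic hash, so every packet of a connection maps to one worker.
        return zlib.crc32(repr(key).encode()) % len(self.tables)

    def add_session(self, key: FiveTuple, state: str) -> int:
        idx = self.worker_for(key)
        self.tables[idx][key] = state   # only worker `idx` ever touches this dict
        return idx
```

Since a session's state is only read and written by its own worker, the per-thread tables need no synchronisation on the data path.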

Features: Namespaces

[Figure: an App requests access to a vpp ns + secret over the Binary API. VPP holds namespaces ns1, ns2 and ns3, each with its own Session, TCP and IP layers, on top of fib tables fib1 and fib2.]

Namespaces are configured independently and associate applications with network layer resources like interfaces and fib tables.
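A namespace can be thought of as a small record tying an id and a secret to network layer resources, checked when an application attaches. The sketch below assumes exactly that and nothing more; the field names, the integer secret and the `NamespaceRegistry` class are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Namespace:
    ns_id: str
    secret: int
    fib_index: int                     # fib table used by apps in this namespace
    interface: Optional[str] = None    # optional interface association


class NamespaceRegistry:
    def __init__(self) -> None:
        self._by_id: Dict[str, Namespace] = {}

    def add(self, ns: Namespace) -> None:
        self._by_id[ns.ns_id] = ns

    def attach(self, ns_id: str, secret: int) -> Optional[Namespace]:
        """Return the namespace if the id exists and the secret matches."""
        ns = self._by_id.get(ns_id)
        if ns is None or ns.secret != secret:
            return None
        return ns


reg = NamespaceRegistry()
reg.add(Namespace("ns1", secret=1234, fib_index=0))
reg.add(Namespace("ns2", secret=5678, fib_index=1, interface="tap0"))
```

An app that presents the right id and secret gets back the resources it is allowed to use; anything else is refused.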

Features: Session Tables

[Figure: App1 requests access to global and/or local scope over the Binary API. Namespaces ns1 and ns2 each have an NS Local Session Table above TCP; fib1 has a Global Session Table.]

• Both tables have a “rules table” that can be used for filtering
• Local tables are namespace specific and can be used for egress filtering
• Global tables are fib table specific and can be used for ingress filtering
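The filtering role of the rules tables can be sketched as a first-match list of prefix and port rules, with one table per namespace for outgoing connects and one per fib table for incoming ones. The rule fields, the default-allow policy and the table names below are assumptions for illustration, not the actual rule format.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    net: str       # peer prefix, e.g. "10.0.0.0/8"
    port: int      # peer port; 0 matches any port
    action: str    # "allow" or "deny"


class RulesTable:
    """First matching rule wins; no match means allow."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def check(self, ip: str, port: int) -> str:
        addr = ipaddress.ip_address(ip)
        for r in self.rules:
            if addr in ipaddress.ip_network(r.net) and r.port in (0, port):
                return r.action
        return "allow"


local_ns1 = RulesTable()      # namespace specific: egress filtering
global_fib1 = RulesTable()    # fib table specific: ingress filtering
local_ns1.add(Rule("192.168.0.0/16", 0, "deny"))
global_fib1.add(Rule("203.0.113.0/24", 0, "deny"))
```

Here apps in ns1 cannot connect out to 192.168.0.0/16, and connections arriving on fib1 from 203.0.113.0/24 are dropped before reaching any listener.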

TLS App

[Figure: an App with rx/tx fifos attached to the TLS App inside VPP. The TLS App keeps a TLS context, uses a TLS Engine (openssl, mbedtls), and sits between the App Session and TCP, with a second pair of rx/tx fifos towards TCP.]

• TLS App registers as transport at VPP init time
• TLS protocol implementation handled by plugin “engines”. We support openssl and mbedtls
• Client app registers key and certificate via api and requests tls as session transport
• CA certs read at TLS app init time. Defaults to reading /etc/ssl/certs/ca-certificates.crt
• Ping and Ray from Intel working on accelerating the openssl engine with QAT cards


Some rough OpenSSL numbers on an E2699: ~1 Gbps/core (no hw accel)

Ongoing work

• Overall integration with k8s
  • Istio/Envoy
• TCP
  • Rx policer/tx pacer
  • TSO
  • New congestion control algorithms
  • PMTU discovery
  • Optimization/hardening/testing

Next steps – Get involved

• Get the Code, Build the Code, Run the Code
  • Session layer: src/vnet/session
  • TCP: src/vnet/tcp
  • SVM: src/svm
  • VCL: src/vcl
• Read/Watch the Tutorials
• Read/Watch VPP Tutorials
• Join the Mailing Lists

Thank you!

?

Florin Coras
email: ([email protected])
irc: florinc