Deeper Dive in Docker Overlay Networks Laurent Bernaille @lbernail CTO D2SI


Page 1: Deeper dive in Docker Overlay Networks

Deeper Dive in Docker Overlay Networks

Laurent Bernaille (@lbernail)

CTO, D2SI

Page 2: Deeper dive in Docker Overlay Networks

Agenda

Reminder on the Docker Overlay

VXLAN Control Plane options

Using BGP as a dynamic Control Plane

What can we do with this?

Page 3: Deeper dive in Docker Overlay Networks

Reminder on the Docker overlay

Page 4: Deeper dive in Docker Overlay Networks

The Docker Overlay network

docker0:~$ docker network create --driver overlay --subnet 192.168.0.0/24 dockercon

d099dcc709daddbc0e143c24e7091bef6b13bdc3abb379473af4582bf1e112b1

docker1:~$ docker network ls

NETWORK ID     NAME        DRIVER    SCOPE
d099dcc709da   dockercon   overlay   global

docker0:~$ docker run -d --ip 192.168.0.100 --net dockercon --name C0 debian sleep infinity

docker1:~$ docker run -it --rm --net dockercon debian

root@950d67e96db7:/# ping 192.168.0.100

PING 192.168.0.100 (192.168.0.100): 56 data bytes

64 bytes from 192.168.0.100: seq=0 ttl=64 time=1.153 ms

Page 5: Deeper dive in Docker Overlay Networks

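Docker Overlay: Data plane

[Diagram: on docker0 (10.0.0.10), container C0 (eth0: 192.168.0.100) is linked by a veth to bridge br0, which also holds the vxlan interface; traffic leaves through the host's eth0. docker1 (10.0.0.11) has the same layout for C1 (eth0: 192.168.0.Y). A ping from C1 to C0 is encapsulated as follows:]

IP: src 10.0.0.11, dst 10.0.0.10

UDP: src X, dst 4789

VXLAN: VNI

Original L2: src 192.168.0.Y, dst 192.168.0.100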

Page 6: Deeper dive in Docker Overlay Networks

What is VXLAN?

• Tunneling technology over UDP (L2 in UDP)

• Developed for cloud SDN to create multi-tenancy

• Without the need for L2 connectivity

• Without the normal VLAN limit (4096 VLAN IDs)

• Easy to encrypt: IPsec

• Overhead: 50 bytes

• In Linux

    • Started with Open vSwitch

    • Native with kernel >= 3.7, and >= 3.16 for namespace support

Packet layout: Outer IP packet | UDP (dst: 4789) | VXLAN header | Original L2 frame

VXLAN: Virtual eXtensible LAN

VNI: VXLAN Network Identifier

VTEP: VXLAN Tunnel Endpoint
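The 50-byte overhead can be broken down header by header, which also explains the MTU of 1450 used on the veth interfaces later in the deck. The following is a minimal sketch, assuming an IPv4 underlay, untagged Ethernet and no IP options; the function names are illustrative and not part of the talk's scripts.

```shell
#!/bin/sh
# Header sizes in bytes (assumption: IPv4 underlay, no VLAN tag, no IP options).
OUTER_ETH=14   # outer Ethernet header
OUTER_IP=20    # outer IPv4 header
OUTER_UDP=8    # outer UDP header (dst port 4789)
VXLAN_HDR=8    # VXLAN header, which carries the 24-bit VNI
INNER_ETH=14   # Ethernet header of the original (encapsulated) L2 frame

# vxlan_overhead: bytes added on the wire around the original L2 frame.
vxlan_overhead() {
    echo $((OUTER_ETH + OUTER_IP + OUTER_UDP + VXLAN_HDR))
}

# inner_mtu UNDERLAY_MTU: largest MTU usable inside the overlay.
# The underlay MTU must hold the outer IP/UDP/VXLAN headers plus the
# inner Ethernet header and the inner payload.
inner_mtu() {
    echo $(( $1 - OUTER_IP - OUTER_UDP - VXLAN_HDR - INNER_ETH ))
}

# vni_count: number of identifiers available with a 24-bit VNI.
vni_count() {
    echo $((1 << 24))
}

vxlan_overhead   # prints 50
inner_mtu 1500   # prints 1450
vni_count        # prints 16777216
```

The two results agree only because both Ethernet headers are 14 bytes long: the wire overhead counts the outer one, while the MTU computation subtracts the inner one.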

Page 7: Deeper dive in Docker Overlay Networks

Let's build an overlay "manually"

[Diagram: two hosts on network 10.0.0.0/16, docker0 (10.0.0.10) and docker1 (10.0.1.10).]

Page 8: Deeper dive in Docker Overlay Networks

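Overlay namespaces

[Diagram: on each host (docker0 at 10.0.0.10, docker1 at 10.0.1.10), an overlay namespace contains bridge br42 with interface vxlan42 attached; the hosts communicate through eth0.]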

Page 9: Deeper dive in Docker Overlay Networks

Creating the overlay namespace (setup_vxlan script)

# create overlay NS
ip netns add overns

# create bridge in NS
ip netns exec overns ip link add dev br42 type bridge
ip netns exec overns ip addr add dev br42 192.168.0.1/24

# create VXLAN interface
ip link add dev vxlan42 type vxlan id 42 proxy dstport 4789

# move it to NS
ip link set vxlan42 netns overns

# add it to bridge
ip netns exec overns ip link set vxlan42 master br42

# bring all interfaces up
ip netns exec overns ip link set vxlan42 up
ip netns exec overns ip link set br42 up

Page 10: Deeper dive in Docker Overlay Networks

Attach containers

[Diagram: on docker0 (10.0.0.10), container C0 (eth0: 192.168.0.10) is connected by a veth pair to br42, next to vxlan42. On docker1 (10.0.1.10), container C1 (eth0: 192.168.0.20) is connected the same way.]

Page 11: Deeper dive in Docker Overlay Networks

Create containers and attach them (plumb script)

On docker0:

# Create container without net
docker run -d --net=none --name=demo debian sleep infinity

# Get NS for container
ctn_ns_path=$(docker inspect --format="{{ .NetworkSettings.SandboxKey}}" demo)
ctn_ns=${ctn_ns_path##*/}

# Create veth
ip link add dev veth1 mtu 1450 type veth peer name veth2 mtu 1450

# Send veth1 to overlay NS
ip link set dev veth1 netns overns

# Attach it to overlay bridge
ip netns exec overns ip link set veth1 master br42
ip netns exec overns ip link set veth1 up

# Send veth2 to container
ip link set dev veth2 netns $ctn_ns

# Rename & Configure
ip netns exec $ctn_ns ip link set dev veth2 name eth0 address 02:42:c0:a8:00:10
ip netns exec $ctn_ns ip addr add dev eth0 192.168.0.10/24
ip netns exec $ctn_ns ip link set dev eth0 up

On docker1: same with 192.168.0.20 / 02:42:c0:a8:00:20

Page 12: Deeper dive in Docker Overlay Networks

Does it ping?

docker0:~$ docker exec -it demo ping 192.168.0.20

PING 192.168.0.20 (192.168.0.20): 56 data bytes

92 bytes from 192.168.0.10: Destination Host Unreachable

docker0:~$ sudo ip netns exec overns ip neighbor show

docker0:~$ sudo ip netns exec overns ip neighbor add 192.168.0.20 lladdr 02:42:c0:a8:00:20 dev vxlan42

docker0:~$ sudo ip netns exec overns bridge fdb add 02:42:c0:a8:00:20 dev vxlan42 self dst 10.0.1.10 \
    vni 42 port 4789

docker1: Same with 192.168.0.10, 02:42:c0:a8:00:10 and 10.0.0.10
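With more than a couple of containers, these two static entries have to be repeated for every remote endpoint. As a rough sketch, the helper below only prints the commands for one remote container so they can be reviewed before being run as root. The function name `static_entries` is hypothetical, and the namespace name `overns`, VNI 42 and port 4789 are taken from the commands above.

```shell
#!/bin/sh
# static_entries REMOTE_IP REMOTE_MAC REMOTE_VTEP [VNI]
# Dry-run helper (hypothetical): prints, without executing, the ARP and FDB
# entries to add in the overlay namespace "overns" for one remote container.
static_entries() {
    remote_ip=$1
    remote_mac=$2
    remote_vtep=$3
    vni=${4:-42}
    dev="vxlan$vni"

    # L3 -> L2: the MAC address that answers for the remote container IP
    echo "ip netns exec overns ip neighbor add $remote_ip lladdr $remote_mac dev $dev"
    # L2 -> VTEP: the host behind which that MAC address lives
    echo "ip netns exec overns bridge fdb add $remote_mac dev $dev self dst $remote_vtep vni $vni port 4789"
}

# Entries to install on docker1 for the container running on docker0.
static_entries 192.168.0.10 02:42:c0:a8:00:10 10.0.0.10
```

Piping the output to `sudo sh` would apply the entries; keeping the helper in dry-run mode makes it safe to try on any machine.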

Page 13: Deeper dive in Docker Overlay Networks

Result

[Diagram: with static ARP and FDB entries in the overlay namespace of each host, the ping between C0 (eth0: 192.168.0.10 on docker0, 10.0.0.10) and C1 (eth0: 192.168.0.20 on docker1, 10.0.1.10) goes through veth, br42 and vxlan42 on both sides.]

Page 14: Deeper dive in Docker Overlay Networks

VXLAN Control Plane options

Page 15: Deeper dive in Docker Overlay Networks

VXLAN Control Plane options - 1: Multicast

Use a multicast group to send traffic for unknown L3/L2 addresses

PROS: simple and efficient

CONS: multicast connectivity is not always available (on public clouds for instance)

[Diagram: three vxlan endpoints joined to multicast group 239.x.x.x, which carries ARP queries ("Who has 192.168.0.2?") and L2 discovery ("Where is 02:42:c0:a8:00:02?").]

Page 16: Deeper dive in Docker Overlay Networks

VXLAN Control Plane options - 2: Point-to-point

Configure a remote IP address to which traffic for unknown addresses is sent

PROS: simple, no need for multicast, very good for two hosts

CONS: difficult to manage with more than 2 hosts

[Diagram: two vxlan endpoints, each configured with the other as remote IP; everything is sent to the remote IP.]

Page 17: Deeper dive in Docker Overlay Networks

VXLAN Control Plane options - 3: User-land

Do nothing, provide ARP / FDB information from outside

PROS: very flexible

CONS: requires a daemon and a centralized database of addresses

[Diagram: three vxlan endpoints, each with a local daemon that modifies ARP/FDB entries and handles ARP ("Do you know 192.168.0.2?") and L2 ("Where is 02:42:c0:a8:00:02?") queries.]

Page 18: Deeper dive in Docker Overlay Networks

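Docker Overlay control plane (3: User-land)

[Diagram: the Docker overlay data plane (C0 with eth0 192.168.0.100 on docker0 at 10.0.0.10, C1 with eth0 192.168.0.Y on docker1 at 10.0.1.10, each behind veth, br0 and vxlan), with dockerd on each host populating the ARP and FDB entries. The daemons share endpoint data through consul/swarm and Serf / Gossip. The diagram also shows a NAT element. The ping is encapsulated as follows:]

IP: src 10.0.0.11, dst 10.0.0.10

UDP: src X, dst 4789

VXLAN: VNI

Original L2: src 192.168.0.Y, dst 192.168.0.100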

Page 20: Deeper dive in Docker Overlay Networks

Using BGP as a dynamic control plane

Page 21: Deeper dive in Docker Overlay Networks

VXLAN Control Plane options - 4: BGP EVPN

Rely on the BGP EVPN address family to distribute L2 and L3 data

PROS: BGP is a standard way to distribute addresses, supported by SDN vendors

CONS: limited Linux implementations, requires some BGP knowledge

[Diagram: three vxlan endpoints, each with a local bgpd; endpoint data is distributed with BGP.]

Page 22: Deeper dive in Docker Overlay Networks

BGP in one slide

● Routing protocol between network entities ("Autonomous Systems", AS)

    Google ASN: 15169 / Amazon ASN: 16509

    (both actually have more than one)

● BGP is an EGP: Exterior Gateway Protocol

    IGP: Interior Gateway Protocol (OSPF, EIGRP, IS-IS)

    IGP: next hop is the IP of a router

    BGP: next hop is an Autonomous System

● BGP is what makes the Internet work

● BGP scales very well

    500 000+ prefixes for a full Internet table

Page 23: Deeper dive in Docker Overlay Networks

A quick BGP example

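[Diagram: five Autonomous Systems, AS 1 to AS 5, linked with eBGP between ASes and iBGP inside an AS. AS 1 originates 20.0.0.0/16, and the AS path grows as the announcement propagates:]

20.0.0.0/16: AS1

20.0.0.0/16: AS2-AS1

20.0.0.0/16: AS4-AS1

20.0.0.0/16: AS5-AS4-AS1

An AS that receives several announcements for the same prefix has to choose one: shortest path?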

AS: Autonomous System

eBGP: external (different AS)

iBGP: internal (same AS)
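The "shortest path?" question can be illustrated with a small sketch that picks, among several announcements for the same prefix, the one that crosses the fewest ASes. This is a deliberate simplification: real BGP best-path selection applies other criteria (such as local preference) before AS path length. The function names and the pairing of candidate paths are assumptions made for illustration.

```shell
#!/bin/sh
# as_path_length PATH: number of ASes in a dash-separated path such as "AS4-AS1".
as_path_length() {
    old_ifs=$IFS
    IFS=-
    set -- $1          # split the path on "-"
    IFS=$old_ifs
    echo $#
}

# shortest_as_path PATH...: print the candidate with the fewest ASes.
# On a tie, the first candidate wins.
shortest_as_path() {
    best=""
    best_len=0
    for path in "$@"; do
        len=$(as_path_length "$path")
        if [ -z "$best" ] || [ "$len" -lt "$best_len" ]; then
            best=$path
            best_len=$len
        fi
    done
    echo "$best"
}

# Hypothetical case: an AS hears 20.0.0.0/16 through two neighbors.
shortest_as_path AS5-AS4-AS1 AS2-AS1   # prints AS2-AS1
```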

Page 24: Deeper dive in Docker Overlay Networks

iBGP: distribute BGP information within an Autonomous System

iBGP requires a full mesh between all peers

n peers => n * (n-1) / 2 connections

50 peers => 1225 connections (49 on each host)

Route reflectors (RR) simulate the mesh

More scalable and simpler

Possible to have more than one RR
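The session counts can be checked with a couple of lines of shell arithmetic. This is only a sketch: the function names are illustrative, and the route-reflector count assumes every host peers with every RR and ignores any sessions between the RRs themselves.

```shell
#!/bin/sh
# full_mesh_sessions N: iBGP sessions needed for a full mesh of N peers.
full_mesh_sessions() {
    echo $(( $1 * ($1 - 1) / 2 ))
}

# rr_client_sessions CLIENTS RRS: sessions when each client only peers with
# each route reflector (assumption: RR-to-RR sessions are not counted).
rr_client_sessions() {
    echo $(( $1 * $2 ))
}

full_mesh_sessions 50     # prints 1225
rr_client_sessions 50 2   # prints 100
```

With two route reflectors, each host keeps 2 sessions instead of 49.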

Page 25: Deeper dive in Docker Overlay Networks

BGP EVPN

● Part of MP-BGP (multi-protocol BGP: not only IP prefixes)

● Announce VXLAN information instead of IP prefixes

    L3: IP addresses of VXLAN endpoints (VTEP)

    L2: Location of MAC addresses

● BUM (Broadcast, Unknown unicast, Multicast) traffic is unicast to all VTEPs

● Get the scalability of BGP

Page 26: Deeper dive in Docker Overlay Networks

Environment

[Diagram: network 10.0.0.0/16 with two Docker hosts, docker0 (10.0.0.10) and docker1 (10.0.1.10), and two route reflectors running quagga-rr, RR1 (10.0.0.5) and RR2 (10.0.1.5).]

Page 27: Deeper dive in Docker Overlay Networks

Creating our BGP clients on Docker hosts

docker0:~$ docker run -t -d --privileged --name quagga -p 179:179 --hostname docker0 \
    -v $(pwd)/quagga:/etc/quagga cumulusnetworks/quagga

--privileged: lets the container modify routing/forwarding

BGP configuration on docker0

router bgp 65000
  bgp router-id 10.0.0.10
  no bgp default ipv4-unicast
  neighbor reflectors peer-group
  neighbor reflectors remote-as 65000
  neighbor reflectors capability extended-nexthop
  neighbor 10.0.0.5 peer-group reflectors
  neighbor 10.0.1.5 peer-group reflectors
  address-family evpn
    neighbor reflectors activate
    advertise-all-vni

BGP configuration on Route Reflectors

router bgp 65000
  bgp router-id 10.0.0.5
  bgp cluster-id 111.111.111.111
  no bgp default ipv4-unicast
  neighbor docker peer-group
  neighbor docker remote-as 65000
  bgp listen range 10.0.0.0/16 peer-group docker
  address-family evpn
    neighbor docker activate
    neighbor docker route-reflector-client

Page 28: Deeper dive in Docker Overlay Networks

What we have so far

[Diagram: network 10.0.0.0/16 with route reflectors RR1 (10.0.0.5) and RR2 (10.0.1.5) running quagga-rr, and a quagga container attached to eth0 on each Docker host, docker0 (10.0.0.10) and docker1 (10.0.1.10).]

Page 29: Deeper dive in Docker Overlay Networks

Let's look at the BGP data

docker0:~$ docker exec -it quagga vtysh

docker0# show run

docker0# show bgp neighbors

docker0# show bgp evpn summary

BGP router identifier 10.0.0.10, local AS number 65000 vrf-id 0

Peers 2, using 42 KiB of memory

Neighbor           V  AS     MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
quagga0(10.0.0.5)  4  65000  42       43       0       0    0     00:02:01  0
quagga1(10.0.1.5)  4  65000  42       43       0       0    0     00:02:01  0

docker0# show bgp evpn route

No EVPN prefixes exist

Page 30: Deeper dive in Docker Overlay Networks

Configuring VXLAN interfaces

sudo ./setup_vxlan 42 container:quagga dstport 4789 nolearning

nolearning: only learn through EVPN

[Diagram: on each Docker host (docker0 at 10.0.0.10, docker1 at 10.0.1.10), the quagga container now holds br42 and vxlan42 next to its eth0; the route reflectors RR1 (10.0.0.5) and RR2 (10.0.1.5) are unchanged.]

Page 31: Deeper dive in Docker Overlay Networks

Let's look at the BGP data

docker0:~$ docker exec -it quagga vtysh

docker0# show bgp evpn route

BGP table version is 0, local router ID is 10.0.0.10

EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]

EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network                     Next Hop    Metric  LocPrf  Weight  Path
Route Distinguisher: 10.0.0.10:1
*> [3]:[0]:[32]:[10.0.0.10]
                               10.0.0.10                   32768   i
Route Distinguisher: 10.0.1.10:1
*>i[3]:[0]:[32]:[10.0.1.10]
                               10.0.1.10   0       100     0       i

docker0# show evpn mac vni all

Page 32: Deeper dive in Docker Overlay Networks

Let's add containers and try pinging

[Diagram: a demo container is attached to br42 on each host, with 192.168.0.10 on docker0 (10.0.0.10) and 192.168.0.20 on docker1 (10.0.1.10). br42 and vxlan42 live in the quagga container; RR1 (10.0.0.5) and RR2 (10.0.1.5) are unchanged.]

docker0:~$ sudo ./plumb br42@quagga demo 192.168.0.10/[email protected] 02:42:c0:a8:00:10

docker1:~$ sudo ./plumb br42@quagga demo 192.168.0.20/[email protected] 02:42:c0:a8:00:20

Page 33: Deeper dive in Docker Overlay Networks

What about BGP?

docker0:~$ docker exec -it quagga vtysh

docker0# show bgp evpn route

BGP table version is 0, local router ID is 10.0.0.10

Status codes: s suppressed, d damped, h history, * valid, > best, i - internal

Origin codes: i - IGP, e - EGP, ? - incomplete

EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]

EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

Route Distinguisher: 10.0.1.10:1
*>i[2]:[0]:[0]:[48]:[02:42:c0:a8:00:20]
                               10.0.1.10   0       100     0       i
* i[3]:[0]:[32]:[10.0.1.10]
                               10.0.1.10   0       100     0       i

docker0# show evpn mac vni all

VNI 42 #MACs (local and remote) 2

MAC                Type    Intf/Remote VTEP  VLAN
02:42:c0:a8:00:10  local   veth0pldemo
02:42:c0:a8:00:20  remote  10.0.1.10
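The bracketed prefixes in the `show bgp evpn route` output follow the two layouts printed in its legend. As an illustration, the sketch below splits such a prefix into named fields; the function name `describe_evpn_route` is hypothetical and only the type-2 and type-3 layouts shown in the legend are handled.

```shell
#!/bin/sh
# describe_evpn_route PREFIX
# Splits an EVPN prefix as printed by "show bgp evpn route":
#   type-2: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
#   type-3: [3]:[EthTag]:[IPlen]:[OrigIP]
describe_evpn_route() {
    # Drop the outer brackets, then turn every "]:[" separator into a space.
    # The colons inside the MAC address are left untouched.
    fields=$(printf '%s\n' "$1" | sed -e 's/^\[//' -e 's/]$//' -e 's/]:\[/ /g')
    set -- $fields
    case $1 in
        2) echo "type-2 MAC route: ESI=$2 EthTag=$3 MAClen=$4 MAC=$5" ;;
        3) echo "type-3 VTEP route: EthTag=$2 IPlen=$3 OrigIP=$4" ;;
        *) echo "unsupported route type: $1"; return 1 ;;
    esac
}

describe_evpn_route '[2]:[0]:[0]:[48]:[02:42:c0:a8:00:20]'
describe_evpn_route '[3]:[0]:[32]:[10.0.1.10]'
```

Read this way, the type-3 routes announce the VTEPs taking part in the overlay, and the type-2 route is what places 02:42:c0:a8:00:20 behind 10.0.1.10 in the MAC table.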

Page 34: Deeper dive in Docker Overlay Networks

Overview

[Diagram: the full setup on network 10.0.0.0/16. Control plane: the quagga containers on docker0 (10.0.0.10) and docker1 (10.0.1.10) exchange BGP data with RR1 (10.0.0.5) and RR2 (10.0.1.5). Data plane: the demo containers (192.168.0.10 and 192.168.0.20) communicate through br42 and vxlan42 on each host.]

Page 35: Deeper dive in Docker Overlay Networks

What's interesting about this setup?

● Standard VXLAN address distribution (used on many routers)

● Full management of BUM traffic

    ARP queries

    Broadcasts (DHCP)

    Multicast (Discovery, keepalived)

● BUM traffic is sent as unicast (not efficient)

    Possible optimizations: ARP suppression (Cumulus Quagga)

Page 36: Deeper dive in Docker Overlay Networks

What can we do with this?

Page 37: Deeper dive in Docker Overlay Networks

What if we want a second Overlay?

[Diagram: each host gets a second bridge and VXLAN interface, br66 and vxlan66, next to br42 and vxlan42 in the quagga container. A demo66 container is attached to br66, with 192.168.66.10 on docker0 (10.0.0.10) and 192.168.66.20 on docker1 (10.0.1.10). The demo containers stay on br42.]

docker0:~$ sudo ./setup_vxlan 66 container:quagga dstport 4789 nolearning

docker0:~$ docker run -d --net=none --name=demo66 debian sleep infinity

docker0:~$ sudo ./plumb br66@quagga demo66 192.168.66.10/24 02:42:c0:a8:66:10

Page 38: Deeper dive in Docker Overlay Networks

What about BGP?

docker0:~$ docker exec -it quagga vtysh

docker0# show evpn vni

Number of VNIs: 2

VNI  VxLAN IF  VTEP IP  # MACs  # ARPs  # Remote VTEPs
42   vxlan42   0.0.0.0  2       0       1
66   vxlan66   0.0.0.0  2       0       1

docker0# show evpn mac vni all

VNI 42 #MACs (local and remote) 2

MAC                Type    Intf/Remote VTEP  VLAN
02:42:c0:a8:00:10  local   veth0pldemo
02:42:c0:a8:00:20  remote  10.0.1.10

VNI 66 #MACs (local and remote) 2

MAC                Type    Intf/Remote VTEP  VLAN
02:42:c0:a8:66:10  local   veth0pldemo66
02:42:c0:a8:66:20  remote  10.0.1.10

Page 39: Deeper dive in Docker Overlay Networks

Taking advantage of broadcast: DHCP

[Diagram: a dhcp container (192.168.0.254) is attached to br42 on docker0 (10.0.0.10), next to demo (192.168.0.10). On docker1 (10.0.1.10), a demodhcp container sits next to demo (192.168.0.20); its address is not known yet and has to be obtained over the overlay.]

Page 40: Deeper dive in Docker Overlay Networks

Configuring DHCP

docker0:~$ docker run -d --net=none --name dhcp -v "$(pwd)/dhcp":/data networkboot/dhcpd eth0

docker0:~$ sudo ./plumb br42@quagga dhcp 192.168.0.254/24

docker1:~$ docker run -d --net=none --name=demodhcp debian sleep infinity

docker1:~$ sudo ./plumb br42@quagga demodhcp dhcp

docker1:~$ docker exec -it demodhcp ping 192.168.0.10

PING 192.168.0.10 (192.168.0.10): 56 data bytes

64 bytes from 192.168.0.10: icmp_seq=0 ttl=47 time=1.566 ms

DHCP configuration

subnet 192.168.0.0 netmask 255.255.255.0 {
  range 192.168.0.100 192.168.0.200;
  option routers 192.168.0.1;
  option domain-name-servers 8.8.8.8;
}

Page 41: Deeper dive in Docker Overlay Networks

Getting out of our Docker environment

[Diagram: a third host, gateway0 (10.0.0.20), joins docker0 (10.0.0.10) and docker1 (10.0.1.10) with its own quagga, br42 and vxlan42. A veth interface, vethgw, holds 192.168.0.1 on the overlay. The overlay contains demo (192.168.0.10), dhcp (192.168.0.254), demo (192.168.0.20) and a DHCP client (192.168.0.100).]

Page 42: Deeper dive in Docker Overlay Networks

Getting out of our Docker environment

gateway0:~$ ./setup_vxlan 42 host dstport 4789 nolearning

gateway0:~$ ip link add dev vethbr type veth peer name vethgw

gateway0:~$ ip link set vethbr master br42

gateway0:~$ ip addr add 192.168.0.1/24 dev vethgw

gateway0:~$ ping 192.168.0.10

PING 192.168.0.10 (192.168.0.10): 56 data bytes

64 bytes from 192.168.0.10: icmp_seq=0 ttl=47 time=0.866 ms

[Diagram: on gateway0, vethbr is attached to br42 next to vxlan42; its peer vethgw holds 192.168.0.1.]

Page 43: Deeper dive in Docker Overlay Networks

Getting out of VXLAN / Quagga

[Diagram: same setup with docker0 (10.0.0.10), docker1 (10.0.1.10) and gateway0 (10.0.0.20), plus a non-VXLAN host (10.0.0.30). gateway0 routes between 10.0.0.0/16 and 192.168.0.0/24 through vethgw (192.168.0.1) and eth0, and applies NAT.]

Page 44: Deeper dive in Docker Overlay Networks

Getting out of VXLAN / Quagga

gateway0:~$ echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward

gateway0:~$ iptables -t nat -A POSTROUTING ! -d 10.0.0.0/16 -s 192.168.0.0/24 -o eth0 -j MASQUERADE

docker1:~$ docker exec -it demodhcp ping 192.168.0.1 <= Local (VXLAN)

docker1:~$ docker exec -it demodhcp ping 10.0.0.30 <= Routed

docker1:~$ docker exec -it demodhcp ping 8.8.8.8 <= NATed

simple1:~$ ping 192.168.0.1

simple1:~$ ping 192.168.0.10
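The three annotations (local, routed, NATed) follow directly from the destination address and the MASQUERADE rule. As a sketch, the helper below classifies a destination the same way for a packet sent from the overlay; the subnets come from the commands above, while the function names are hypothetical.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on "."
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP NETWORK PREFIXLEN: succeeds if IP belongs to NETWORK/PREFIXLEN.
in_cidr() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}

# classify_destination DST: how a packet from the overlay (192.168.0.0/24)
# reaches DST, mirroring the iptables rule on gateway0.
classify_destination() {
    if in_cidr "$1" 192.168.0.0 24; then
        echo "local (VXLAN)"      # stays on the overlay
    elif in_cidr "$1" 10.0.0.0 16; then
        echo "routed"             # excluded from NAT by "! -d 10.0.0.0/16"
    else
        echo "NATed"              # masqueraded when leaving through eth0
    fi
}

classify_destination 192.168.0.1   # prints local (VXLAN)
classify_destination 10.0.0.30     # prints routed
classify_destination 8.8.8.8       # prints NATed
```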


Page 45: Deeper dive in Docker Overlay Networks

Another nice thing we can do

[Diagram: same setup with docker0 (10.0.0.10), docker1 (10.0.1.10), gateway0 (10.0.0.20) and the non-VXLAN host (10.0.0.30), plus a QEMU virtual machine attached through tap0. The VM runs dhclient and receives an address in 192.168.0.10x.]

Page 46: Deeper dive in Docker Overlay Networks

What could a real-life setup look like?

[Diagram: many Docker hosts, each running quagga, peer with route reflectors RR1 and RR2. A BGP/EVPN router sits between the VXLAN side and the routing side, where standard hosts live. It exchanges routes from the non-VXLAN infrastructure and routes to the VXLAN networks.]

Page 47: Deeper dive in Docker Overlay Networks

How does it compare to other solutions?

Solution             Data plane          Control plane
Swarm Classic        VXLAN               External KV store (Consul / Etcd)
SwarmKit             VXLAN               SwarmKit (Raft / Gossip implementation)
Flannel host-gw      Routing             Etcd / Kubernetes API
Flannel VXLAN        VXLAN               Etcd / Kubernetes API
Calico               Routing / IPIP      Etcd / BGP (IP prefixes)
Weave Classic        Custom              Custom
Weave Fast Datapath  VXLAN               Custom
Contiv               VXLAN, Routing, L2  Etcd / BGP (IP and maybe EVPN)

Disclaimer: almost no experience with any of these (based mostly on documentation and discussions)

Page 48: Deeper dive in Docker Overlay Networks

Perspectives

● FRRouting

    Quagga fork

    Cumulus has switched to FRRouting and merged EVPN support

● Open vSwitch

    Alternative to the Linux native bridge and VXLAN

    (Possibly) better performance and more features

    Not sure how Quagga/FRRouting would integrate with Open vSwitch

● Performance

    Measure impact of VXLAN

    Test VXLAN acceleration when available on NICs

● CNI plugin (to test on Kubernetes and mostly for learning purposes)

Page 49: Deeper dive in Docker Overlay Networks

Thank you!

Questions?

https://github.com/lbernail/dockercon2017

@lbernail