L2 over L3 Encapsulations: VXLAN, NVGRE, STT, Geneve, etc.

Motonori Shindo
Network & Security Business Unit, VMware
July 13, 2014

© 2014 VMware Inc. All rights reserved.



Tunneling vs Encapsulation

• Tunneling protocols

  – Signaling + encapsulation

    • Usually equipped with some sort of “signaling” mechanism, which manages the tunnel.

    • Encapsulation is the other part of a tunneling protocol.

  – E.g., PPTP, L2TP, IPsec (IKE), etc.

• Encapsulations

  – A way of wrapping (i.e., encapsulating) something

  – E.g., GRE, VXLAN, NVGRE, STT (and also Ethernet, IP, TCP, ...)

• What I’m going to talk about today is “encapsulation”

• I am not going to talk about the “control plane” today (though it’s very important)


L2 over L3 encapsulations typically seen in Network Virtualization

• GRE (Generic Routing Encapsulation) *

• VXLAN (Virtual Extensible LAN)

• NVGRE (Network Virtualization using GRE)

• STT (Stateless Transport Tunneling)

* Strictly speaking, GRE is not an L2 over L3 encapsulation, as it can encapsulate not only L2 but also L3.


VXLAN

• Proposed by Cumulus / Arista / Broadcom / Cisco / VMware / Citrix / Red Hat

  – draft-mahalingam-dutt-dcops-vxlan-09.txt

• Extends the VLAN ID (12 bits) to a VNI (24 bits)

• Encapsulation by UDP/IP

  – L3 overlay

  – Multipath

• Encapsulates Ethernet frames only

• Simple enough to be implemented in hardware

• Forming an “ecosystem”


VXLAN Header

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|R|R|R|            Reserved                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                VXLAN Network Identifier (VNI) |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
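
As a rough illustration of the VXLAN header layout, the following Python sketch packs and parses the 8-byte header with the standard `struct` module. The function names are made up for this example, and it assumes that only the I flag is set; treat it as a sketch, not a reference implementation.

```python
import struct

VXLAN_FLAG_I = 0x08  # "I" bit in the flags byte: the VNI field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build an 8-byte VXLAN header carrying a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Return the VNI from an 8-byte VXLAN header."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set, so the VNI is not valid")
    return word2 >> 8


header = pack_vxlan_header(5000)
print(header.hex())        # 0800000000138800
print(unpack_vni(header))  # 5000
```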


Fabric Network

• Service Oriented Architecture

• From a 2- or 3-tier network to leaf & spine

• High density and bandwidth required

• Layer 3 ECMP

• No oversubscription

• Low and uniform delay characteristic

• Wire-once, configure-once network

• Uniform network configuration

[Figure: fabric network diagram with two “WAN/Internet” labels]


Multipath Network

• Background

  – To support the significant increase in east-west traffic, fabric networks based on multipath are becoming popular

• Requisites

  – A given flow must always traverse the same path

  – There must be enough “entropy” to make efficient use of the fabric


Multipath by VXLAN

Encapsulated packet (header sizes in bytes):

IP (20) | UDP (8) | VXLAN (8) | original packet (Ether | IP | TCP | Data)

UDP dst port = 4789

UDP src port = Hash (src/dst MAC addr, src/dst IP addr, src/dst port number, etc. of the original packet) *

* Which fields to hash and which hash algorithm to use are not defined by the protocol. It is up to the implementation.
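
One possible way to derive the outer UDP source port is sketched below. Since the protocol leaves the hash unspecified, everything here is an assumption made for illustration: the use of CRC32, the choice of inner fields, the mapping into the 49152-65535 range, and the function name.

```python
import zlib

VXLAN_UDP_DST_PORT = 4789  # destination port shown on the slide
PORT_MIN, PORT_MAX = 49152, 65535  # assumed range for the source port


def vxlan_source_port(src_mac, dst_mac, src_ip, dst_ip, src_port, dst_port):
    """Map the inner flow's fields to a stable outer UDP source port."""
    key = "|".join(
        str(field)
        for field in (src_mac, dst_mac, src_ip, dst_ip, src_port, dst_port)
    )
    digest = zlib.crc32(key.encode())  # illustrative hash, not mandated
    return PORT_MIN + digest % (PORT_MAX - PORT_MIN + 1)


flow = ("00:00:00:00:00:01", "00:00:00:00:00:02",
        "10.1.1.1", "10.1.1.2", 12345, 80)
# The same inner flow always yields the same outer source port,
# so all of its packets follow the same path through the fabric.
print(vxlan_source_port(*flow) == vxlan_source_port(*flow))  # True
```

Because the result depends only on the inner flow, packets of one flow stay in order on one path, while different flows are likely to spread across paths.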


VXLAN Ecosystem

• Switch / Router

  – Arista, Brocade, Cisco, Cumulus, DELL, HP, Huawei, Juniper, Open vSwitch, Pica8

• Operating System

  – Linux, VMware

• Appliances

  – A10, Citrix, F5

• Testers

  – IXIA, Spirent

• ASIC / NIC

  – Broadcom, Intel (Fulcrum), Emulex, Mellanox

• Cloud Orchestrator

  – CloudStack, OpenStack, vCAC

Note: this is not an exhaustive list

This is a list of vendors who participated in the VXLAN interoperability test at INTEROP Tokyo 2014, which was entirely successful.


NVGRE

• Proposed by Microsoft / Arista / Intel / Google / HP / Broadcom / Emulex

  – draft-sridharan-virtualization-nvgre-04.txt

• 24-bit Virtual Subnet ID (VSID) and 8-bit FlowID

• Encapsulation is GRE as is:

  – Put VSID + FlowID in the Key field

  – L3 overlay

  – Multipath possible (in theory) but difficult

• Windows affinity


NVGRE Header

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| |1|0|   Reserved0     | Ver |   Protocol Type 0x6558        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Virtual Subnet ID (VSID)        |    FlowID     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
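
The NVGRE header layout can be sketched in Python as follows. The helper name is hypothetical, and the sketch assumes the plain case shown in the diagram: only the Key Present bit set, version 0, and protocol type 0x6558.

```python
import struct

GRE_KEY_PRESENT = 0x2000  # K bit set; checksum and sequence bits are 0
PROTO_TYPE = 0x6558       # protocol type shown in the diagram


def pack_nvgre_header(vsid: int, flow_id: int) -> bytes:
    """Build the 8-byte GRE header used by NVGRE."""
    if not 0 <= vsid < 2**24:
        raise ValueError("VSID must fit in 24 bits")
    if not 0 <= flow_id < 2**8:
        raise ValueError("FlowID must fit in 8 bits")
    key = (vsid << 8) | flow_id  # 32-bit Key field = VSID + FlowID
    return struct.pack("!HHI", GRE_KEY_PRESENT, PROTO_TYPE, key)


header = pack_nvgre_header(vsid=0x123456, flow_id=0xAB)
print(header.hex())  # 20006558123456ab
```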


Multipath in NVGRE

Encapsulated packet (header sizes in bytes):

IP (20) | GRE (8) | original packet (Ether | IP | TCP | Data)

FlowID = Hash (src/dst MAC addr, src/dst IP addr, src/dst port number, etc. of the original packet) *

Routers and switches need to look up the Key field in the GRE header to do ideal multipathing!

* Which fields to hash and which hash algorithm to use are not defined by the protocol. It is up to the implementation.
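
The practical consequence can be illustrated with a toy ECMP model. This is only a sketch under stated assumptions: the switch hashes the outer fields with CRC32, the addresses and the number of paths are invented, and the switch is assumed not to look at the GRE Key field.

```python
import zlib

N_PATHS = 4  # hypothetical number of equal-cost paths


def ecmp_path(outer_fields, n_paths=N_PATHS):
    """Pick a path index from the outer header fields a switch can see."""
    key = "|".join(str(f) for f in outer_fields).encode()
    return zlib.crc32(key) % n_paths


inner_flows = range(1000)  # stand-ins for 1000 different inner flows

# NVGRE, switch ignores the GRE Key: only outer IPs and protocol 47 are
# visible, so every inner flow between two hosts hashes identically.
nvgre_paths = {
    ecmp_path(("192.0.2.1", "192.0.2.2", 47)) for _ in inner_flows
}

# VXLAN: the outer UDP source port differs per inner flow.
vxlan_paths = {
    ecmp_path(("192.0.2.1", "192.0.2.2", 17, 49152 + f % 16384, 4789))
    for f in inner_flows
}

print(len(nvgre_paths))      # 1
print(len(vxlan_paths) > 1)  # True
```

If the switch did include the Key field (and thus the FlowID) in its hash, the NVGRE case would spread as well, which is the point of the note above.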


NVGRE ecosystem

• Switch / Router

  – Huawei

  – Arista and Brocade say they are going to support it, but products haven’t come out yet??

• Operating System

  – Microsoft (Windows Server 2012 R2)

• Appliances

  – F5

• ASIC / NIC

  – Emulex, Mellanox

• Cloud Orchestrator

  – System Center 2012 R2

Note: this is not an exhaustive list


STT (Stateless Transport Tunneling)

• L2 over L3 encapsulation proposed by VMware

  – draft-davie-stt-06.txt

• Why yet another L2 over L3 encapsulation?

  – Performance

  – Richer context information

  – Multipath

  – Software oriented


TSO (TCP Segmentation Offload)

• Modern NICs (shipped within the last 4-5 years) are equipped with various hardware acceleration features:

  – RSS, GSO/TSO, checksum offload, etc.

• With TSO, the NIC performs TCP segmentation on behalf of the operating system, which would otherwise do it in software

  – The operating system can now send packets of up to 64K bytes. This significantly reduces the number of packets to process (i.e., interrupts), hence far fewer context switches are needed.

• To take advantage of TSO in the NIC, STT encapsulates packets so that they look like “TCP”!


Encapsulation / Segmentation in STT

Before segmentation (header sizes in bytes):

IP (20) | TCP’ (20) | STT (18) | L2 Frame (up to 64K)

After segmentation by hardware:

IP (20) | TCP’ (20) | STT (18) | Payload 1

IP (20) | TCP’ (20) | Payload 2

・・・・

IP (20) | TCP’ (20) | Payload n
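
A back-of-the-envelope sketch of this segmentation follows. The header sizes come from the slide; the MSS of 1460 bytes (a 1500-byte MTU minus the 40 bytes of IP and TCP’ headers), the 64000-byte frame, and the function name are assumptions for illustration.

```python
import math

IP_HDR, TCP_HDR, STT_HDR = 20, 20, 18  # sizes shown on the slide


def stt_segments(frame_len: int, mss: int = 1460) -> list:
    """Return the payload size carried by each segment the NIC emits.

    The STT header is sent once, in front of the L2 frame, so it is
    part of the byte stream that gets cut into MSS-sized pieces.
    """
    total = STT_HDR + frame_len
    count = math.ceil(total / mss)
    return [mss] * (count - 1) + [total - mss * (count - 1)]


sizes = stt_segments(64000)
print(len(sizes))   # 44 segments on the wire
print(sizes[-1])    # 1238 bytes in the last segment
# Bytes on the wire, including per-segment IP and TCP' headers:
print(sum(sizes) + len(sizes) * (IP_HDR + TCP_HDR))  # 65778
```

In this example the operating system hands one large frame to the NIC instead of processing 44 separate packets itself.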


TCP-like Header

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Sequence Number(*)                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Acknowledgment Number(*)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Fields marked as * are repurposed in STT


STT Header

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version      | Flags         |  L4 Offset    |  Reserved     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Max. Segment Size          | PCP |V|     VLAN ID           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                     Context ID (64 bits)                      +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Padding                   |    data                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               |
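
To make the 18-byte size concrete, here is a minimal Python sketch that packs the fields of the STT header in the order drawn in the diagram. The function and parameter names are invented for this example, and flag semantics are not modeled.

```python
import struct


def pack_stt_header(context_id, mss, l4_offset=0, flags=0,
                    pcp=0, v=0, vlan_id=0, version=0):
    """Build the 18-byte STT header laid out as in the diagram."""
    if not 0 <= vlan_id < 2**12:
        raise ValueError("VLAN ID must fit in 12 bits")
    # PCP (3 bits), V (1 bit), and VLAN ID (12 bits) share one 16-bit word.
    tci = (pcp << 13) | (v << 12) | vlan_id
    return struct.pack(
        "!BBBBHHQH",
        version, flags, l4_offset, 0,  # first 32-bit word (last byte reserved)
        mss, tci,                      # second 32-bit word
        context_id,                    # 64-bit Context ID
        0,                             # 16 bits of padding
    )


header = pack_stt_header(context_id=0x1122334455667788, mss=1460,
                         v=1, vlan_id=100)
print(len(header))   # 18
print(header.hex())  # 0000000005b4106411223344556677880000
```

The 64-bit Context ID is what gives STT room for the “richer context information” mentioned earlier, compared with a 24-bit VNI or VSID.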


Throughput and CPU Utilization

[Figure: chart comparing Linux Bridge, OVS Bridge, OVS-GRE, and OVS-STT. Series: Throughput (Gbps, axis 0-10), CPU (Receive) and CPU (Send) (%, axis 0-100).]

Source: http://networkheresy.com/2012/06/08/the-overhead-of-software-tunneling/


Multipath in STT

Encapsulated packet (header sizes in bytes):

IP (20) | TCP’ (20) | STT (18) | original packet (Ether | IP | TCP | Data)

TCP’ dst port = 7471 (TBD)

TCP’ src port = Hash (src/dst MAC addr, src/dst IP addr, src/dst port number, etc. of the original packet) *

* Which fields to hash and which hash algorithm to use are not defined by the protocol. It is up to the implementation.


Geneve (Generic Network Virtualization Encapsulation)

• New encapsulation being proposed by VMware, Microsoft, Red Hat, Intel

  – draft-gross-geneve-00.txt

• Goals

  – Extensibility

    • Service chaining, metadata support, etc.

  – Leverage NIC offload

  – The above two at the same time! (each one is straightforward, but achieving both together is difficult)

• Highlights

  – Information can be added as Option fields in TLV format

  – Format carefully designed so that the NIC can perform TSO

  – OAM and Criticality (indicating that parsing the option fields is mandatory)


Geneve Header & Option Header

Geneve Header

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Virtual Network Identifier (VNI)       |    Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Variable Length Options                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Option

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Option Class         |      Type     |R|R|R| Length  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Variable Option Data                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
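
The TLV structure can be sketched in Python as below. The sketch assumes that both Opt Len and the option Length field count 4-byte units, that Ver is 0, and that the payload is an Ethernet frame (protocol type 0x6558). The option class 0x0102, the type 0x80, and the function names are hypothetical values chosen for illustration.

```python
import struct


def pack_option(opt_class: int, opt_type: int, data: bytes = b"") -> bytes:
    """Build one Geneve option: class, type, length, then the data."""
    if len(data) % 4 or len(data) // 4 > 31:
        raise ValueError("option data must be 0-124 bytes, multiple of 4")
    # The three R bits are zero, so the last byte is just the 5-bit length.
    return struct.pack("!HBB", opt_class, opt_type, len(data) // 4) + data


def pack_geneve(vni: int, options: bytes = b"", protocol: int = 0x6558,
                oam: bool = False, critical: bool = False) -> bytes:
    """Build a Geneve header followed by its already packed options."""
    if len(options) % 4 or len(options) // 4 > 63:
        raise ValueError("options must be 0-252 bytes, multiple of 4")
    byte0 = len(options) // 4                      # Ver = 0, then Opt Len
    byte1 = (int(oam) << 7) | (int(critical) << 6)  # O and C flags
    return struct.pack("!BBHI", byte0, byte1, protocol, vni << 8) + options


opt = pack_option(0x0102, 0x80, b"\x00\x00\x00\x01")  # hypothetical option
packet_header = pack_geneve(5000, options=opt, critical=True)
print(packet_header.hex())  # 02406558001388000102800100000001
```

Because Opt Len states the total option length up front, a receiver (or a NIC) can find the start of the inner frame without understanding any individual option.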


Geneve Implementation

• Recently implemented in Open vSwitch (OVS) and merged into the master branch on GitHub

  – The VNI can be specified

  – Geneve options can’t be specified (at this point)

  – Can’t set the OAM flag?? (I tried, but it didn’t work)

  – The Critical flag appears to be supported as long as critical options are present

• A Geneve dissector for Wireshark has also been implemented and merged into the master branch on GitHub

• Geneve-aware NICs are not available yet


Running Geneve on Open vSwitch

host-1:~$ sudo ovs-vsctl add-br br0
host-1:~$ sudo ovs-vsctl add-br br1
host-1:~$ sudo ovs-vsctl add-port br0 eth0
host-1:~$ sudo ifconfig eth0 0
host-1:~$ sudo dhclient br0
host-1:~$ sudo ifconfig br1 10.0.0.1 netmask 255.255.255.0
host-1:~$ sudo ovs-vsctl add-port br1 geneve1 -- set interface \
    geneve1 type=geneve options:remote_ip=192.168.203.149

host-2:~$ sudo ovs-vsctl add-br br0
host-2:~$ sudo ovs-vsctl add-br br1
host-2:~$ sudo ovs-vsctl add-port br0 eth0
host-2:~$ sudo ifconfig eth0 0
host-2:~$ sudo dhclient br0
host-2:~$ sudo ifconfig br1 10.0.0.2 netmask 255.255.255.0
host-2:~$ sudo ovs-vsctl add-port br1 geneve1 -- set interface \
    geneve1 type=geneve options:remote_ip=192.168.203.151


Dissecting Geneve Packets by Wireshark (1)


Dissecting Geneve Packets by Wireshark (2)


Information about Geneve

• English

  – http://tools.ietf.org/html/draft-gross-geneve-00

  – http://cto.vmware.com/geneve-vxlan-network-virtualization-encapsulations/

  – http://www.enterprisenetworkingplanet.com/netsp/geneve-generic-network-virtualization-encapsulation-protocol-advances-video.html

  – http://searchsdn.techtarget.com/news/2240219051/VMware-Microsoft-end-encapsulation-protocol-turf-war-with-GENEVE

  – http://www.plexxi.com/2014/06/attention-overlay-tunnel-construction-ahead

  – http://blog.shin.do/2014/07/geneve-on-open-vswitch/

• Japanese

  – http://blog.shin.do/2014/05/geneve-encapsulation/

  – http://blog.shin.do/2014/07/geneve-on-open-vswitch/


Geneve replaces VXLAN / STT / NVGRE ?

• Geneve replaces VXLAN?

  – NO

  – The VXLAN ecosystem has already grown big enough that it is unlikely to be replaced by something else

  – VMware will continue to support VXLAN and its ecosystem partners

• Geneve replaces STT?

  – In the short term, NO. In the long run, maybe, if

    • Geneve is accepted by the market and Geneve-aware NICs become widely available, to the same level as for STT today.

• Geneve replaces NVGRE?

  – In the short term, NO. In the long run, maybe, if

    • Geneve gets implemented on Windows and an ecosystem forms at the same level as NVGRE’s today.


Encapsulation is like a cable: use the right cable in the right place

http://cto.vmware.com/geneve-vxlan-network-virtualization-encapsulations/ 


The world is not that simple

• Some people are against Geneve

• Their claims are more or less as follows:

  – What Geneve tries to accomplish can be achieved by an existing encapsulation (such as L2TP static tunneling or VXLAN), as is or with a small extension!?

  – Service chaining and metadata should not be bound to a particular encapsulation. They should be independent of the encapsulation!?

  – 24 bits is not long enough for the VNI!?


L2TPv3 static tunneling

• Being a tunneling protocol, L2TPv3 inherently has signaling. That said, it can be used as a plain encapsulation method (i.e., a pseudowire) without signaling. This is called “L2TPv3 static tunneling”, where both ends are configured manually.

• L2TPv3 became an RFC in 2005 (RFC 3931) and has been on the market for many years. Cisco IOS and Linux (l2tpd) support L2TPv3 static tunneling.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|T|x|x|x|x|x|x|x|x|x|x|x|  Ver  |          Reserved             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Session ID                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Cookie (optional, maximum 64 bits)...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                                                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


L2TPv3 static tunneling as a L2 over L3 encapsulation

• The Session ID (32 bits) corresponds to the VNI

• L2TPv3 can be transported directly over IP or over UDP. For multipath, UDP would be better.

• There is no explicit field for context information (metadata, etc.). It has to be configured manually on both ends (if possible) and expressed implicitly as part of the Session ID

  – Therefore the 32-bit Session ID can’t be used entirely for the VNI

• Strictly speaking, there is no way in L2TPv3 to tell (in the packet) where the encapsulated packet starts so that the NIC can do TSO. However, L2TPv2 had an “offset” option for this purpose. Many L2TPv3 implementations still have this “offset” option for backward compatibility with L2TPv2. So TSO is possible (if the NIC understands this legacy option). Cisco and Linux l2tpd support the offset field.
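
The trade-off in the Session ID can be made concrete with a small sketch. The 24/8 split below is purely hypothetical (nothing in L2TPv3 defines it); both ends would have to agree on it by manual configuration.

```python
VNI_BITS = 24      # hypothetical: low 24 bits carry the virtual network ID
CONTEXT_BITS = 8   # hypothetical: high 8 bits carry implicit context


def encode_session_id(vni: int, context: int) -> int:
    """Fold a VNI and a small context value into a 32-bit Session ID."""
    if not 0 <= vni < 2**VNI_BITS:
        raise ValueError("VNI out of range")
    if not 0 <= context < 2**CONTEXT_BITS:
        raise ValueError("context out of range")
    return (context << VNI_BITS) | vni


def decode_session_id(session_id: int):
    """Split a Session ID back into (vni, context)."""
    return session_id & (2**VNI_BITS - 1), session_id >> VNI_BITS


sid = encode_session_id(vni=5000, context=3)
print(hex(sid))                # 0x3001388
print(decode_session_id(sid))  # (5000, 3)
```

Every bit given to context is a bit taken from the network identifier, which is the limitation the bullet above describes.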


VXLAN Generic Protocol Extension (a.k.a. eVXLAN)

• Proposed by Cisco, Huawei, Intel, Microsoft

  – draft-quinn-vxlan-gpe-03.txt

• An extension to VXLAN

  – Supports protocols other than Ethernet

    • IPv4 (0x01), IPv6 (0x02), Ethernet (0x03), Network Service Header [NSH] (0x04)

    • Note that “Next Protocol” is only 8 bits wide. The protocol type (usually 16 bits) has to be specifically encoded to fit into 8 bits.

  – OAM support

  – Version field

• Used by Cisco ACI

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|P|R|O|Ver|   Reserved                |Next Protocol  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                VXLAN Network Identifier (VNI) |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
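
A minimal Python sketch of the VXLAN-GPE header layout follows, using the Next Protocol values listed on this slide. The function name and the dictionary are invented for the example, and the sketch assumes the I and P flags are always set.

```python
import struct

FLAG_I, FLAG_P, FLAG_O = 0x08, 0x04, 0x01  # bit positions from the diagram
NEXT_PROTOCOL = {"ipv4": 0x01, "ipv6": 0x02, "ethernet": 0x03, "nsh": 0x04}


def pack_vxlan_gpe(vni: int, next_protocol: str,
                   oam: bool = False, version: int = 0) -> bytes:
    """Build the 8-byte VXLAN-GPE header."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= version < 4:
        raise ValueError("Ver must fit in 2 bits")
    flags = FLAG_I | FLAG_P | (FLAG_O if oam else 0)
    # Word 1: flags (8) | Ver (2) | reserved (14) | Next Protocol (8)
    word1 = (flags << 24) | (version << 22) | NEXT_PROTOCOL[next_protocol]
    return struct.pack("!II", word1, vni << 8)


header = pack_vxlan_gpe(5000, "nsh")
print(header.hex())  # 0c00000400138800
```

Compared with plain VXLAN, only the first word changes: the P flag signals that the last byte of that word names the payload protocol.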


VXLAN-GPE as an L2 over L3 encapsulation

• Mostly identical to VXLAN

  – VNI length (24 bits)

  – Multipath property

  – Hardware friendliness

• The biggest motivation for VXLAN-GPE is probably to allow service chaining by NSH (Network Service Header)

• No further extensibility


Thank You!
