Description
Alexis Dacquay is a CCIE with over 10 years of experience in the networking industry. He has designed, deployed, and supported large corporate LAN/WAN networks, and for the last four years has specialised in high-performance data centre networking to satisfy the needs of cloud providers, web 2.0, big data, HPC, HFT, and any other enterprise for which a high-performing network is critical to its business. Originally from Bretagne, he is privately a huge fan of Polish cuisine.
Topic of presentation: Architectures for Universal Data Centre Networks, topologies and overlays
Language: English
Abstract: Network integration with single- and multi-hypervisor virtualization environments.
Slide 1 (PLNOG 13, Krakow, Tuesday 30 September 2014)
Architecture for Universal DC Networks: Topologies & Overlays
Alexis Dacquay ([email protected])
Slide 2
Universal architecture
§ One design, tunable for workload, deterministic any-to-any performance
§ Integrated detailed telemetry for real-time visibility
§ Based on open standards, to avoid a technology cul-de-sac
§ Simple to design, capacity plan, scale and troubleshoot
§ Open management tools/techniques
§ Enables continuous innovation and “pay as you grow” scale
Slide 3
Cloud thinking in a nutshell: IT becomes the service provider.
- Infrastructure specific to individual apps → Applications abstracted from infrastructure
- Vertically integrated, proprietary stacks → Open technologies, maximum generalisation
- Vendor lock-in, forklift refreshes → Best-of-breed, continuous innovation
- Multiple management domains → Homogeneous, universal automation
- Complex and custom architectures → Simple, repeatable and scalable architectures
Slide 4
Cloud Architecture: Universal Platform (connecting the cloud)
One universal platform serves all workloads: Cloud, IP Storage, VM farms, Big Data, VDI, Web 2.0, HFT, HPC, Legacy.
Slide 5
Cloud Architecture: End-to-End Automation
Slide 6
Evolution, not Revolution
§ Gradual shift/unification of the skills base
§ Phase-out of legacy applications
Progressive technology adoption: manually configured → ad-hoc bash/perl scripting → automated provisioning and monitoring (Puppet, Chef, Ansible, and other IT frameworks) → physical + virtual cloud orchestration (sketched below).
DevOps, meet NetOps!
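As a minimal sketch of the "automated provisioning" stage, the Python snippet below drives a switch through Arista's JSON-RPC eAPI; the hostname, credentials and VLAN are hypothetical placeholders.

```python
# Minimal provisioning sketch via Arista eAPI (JSON-RPC over HTTPS).
# Hostname, credentials and VLAN number are hypothetical.
from jsonrpclib import Server

switch = Server("https://admin:admin@leaf1.example.net/command-api")

# runCmds(version, commands) executes CLI commands and returns JSON results
response = switch.runCmds(1, ["show version"])
print(response[0]["modelName"], response[0]["version"])

# The same channel can push configuration, replacing ad-hoc manual CLI work
switch.runCmds(1, ["enable", "configure", "vlan 2000", "name tenant-a"])
```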
Slide 7
Importance of the Underlay Network
Slide 8
Clos Principles - Avoiding Suboptimal Designs
§ Multi-tier designs
- Non-equal performance
- Unequal hop count (3 hops vs 5 hops, depending on the path)
- Cumulated oversubscription can be high: with 8:1 oversubscription at one tier and 4:1 at the next, the cumulated oversubscription is 32:1 (worked through in the sketch below)
§ The right physical topology…
- Physical architecture: Clos leaf/spine
- Consistent any-to-any latency/throughput
- Consistent performance for all racks
- Fully non-blocking architecture if required
- Simple scaling of new racks
Spine layer: 10GbE/40GbE/100GbE, Layer 2/3. Leaf layer: 40GbE/10GbE/1GbE, Layer 2/3.
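Cumulated oversubscription is simply the product of the per-tier ratios, so the slide's arithmetic is easy to check:

```python
# Oversubscription compounds multiplicatively across tiers: the slide's
# 8:1 and 4:1 tiers combine into a 32:1 worst-case ratio.
def cumulated_oversubscription(tier_ratios):
    result = 1
    for ratio in tier_ratios:
        result *= ratio
    return result

print(cumulated_oversubscription([8, 4]))  # 32, i.e. 32:1
```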
Slide 9
§ Active-active Layer 2 topologies are possible without new protocols
- The various multi-chassis link aggregation (MLAG) implementations use the known and trusted standard LACP protocol
- Achieved without new hardware or any new operational challenges
- But at large scale, the same challenges as the new protocols: VLAN and MAC explosion
- Layer 2 can scale to some level (considering only port counts) without requiring new protocols and hardware
Layer 2 Leaf-Spine – MLAG Design
Data Centre Transport
But... the Layer 2 approach only targets the VM mobility challenge. What about Layer 2 scale, multi-tenancy, simplicity, and big data environments?
10G node scale with Arista leaf/spine designs (L2 MLAG). L2-only Clos topology scaling depends on the devices' density/port count (with oversubscription); device scale ranges from 48 x 10G ports to over 1000 x 10G ports:
- 7124 leaf / 7050-64 spine: 600 interconnected 10GbE nodes
- 7050 leaf / 7050-52 spine: 1,152
- 7050 leaf / 7050-64 spine: 1,440
- 7050 leaf / 7504 spine: 4,224
- 7050 leaf / 7508 spine: 8,832
- 7050 leaf / 7504 Gen2 spine: 13,440
- 7050 leaf / 7508 Gen2 spine: 27,264
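Node counts like these follow from simple port arithmetic. A rough capacity model of my own, with hypothetical port counts, not the exact figures behind the chart:

```python
# Rough L2 leaf/spine capacity model: each leaf dedicates some ports to
# spine uplinks, the rest face servers, and the spine port count caps the
# number of leaves (one uplink per leaf per spine switch).
def max_10g_nodes(spine_ports_per_switch, leaf_ports, uplinks_per_leaf):
    max_leaves = spine_ports_per_switch
    return max_leaves * (leaf_ports - uplinks_per_leaf)

# Hypothetical 64-port spine pair and 52-port leaves with 4 uplinks each:
print(max_10g_nodes(64, 52, 4))  # 3072 attached 10G nodes
```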
Slide 10
§ Build a Clos fabric
- Add new protocols to widen the scope of VM mobility
- TRILL-based / IEEE 802.1aq (SPB) solutions
- An L3 routing (IS-IS) model for active-active forwarding
§ Issues with large L2 networks
- Can introduce MAC address explosion issues
- VLANs are limited (4K), no overlapping VLANs
- New hardware, with potential interop issues
- New protocols for the core/backbone: the unknown, with new operational and troubleshooting challenges
Data Centre Transport
A physical Clos topology, widening the scope of VM mobility with a new Layer 2 technology: a single large L2 domain.
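To put a number on the MAC explosion concern, a back-of-envelope of my own (the host and VM counts are assumptions, not figures from the slide):

```python
# In one large L2 domain, every switch may end up learning every VM MAC.
hosts = 1000
vms_per_host = 40
print(hosts * vms_per_host)  # 40,000 MAC entries pressed into each table
```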
Slide 11
§ Segmented Layer 3 design
- Routed traffic at the top of the rack
- OSPF/BGP between leaf and spine
- Proven and trusted protocols for scale
- Mature open standards for interoperability
- Minimises the size of the Layer 2 domain
- Reduces the size of the fault and broadcast domains
- A standard, scalable model for virtualised and non-virtualised solutions
Each rack is its own Layer 2 domain: Subnet/VLAN A, B, C, D.
The scope of VM mobility is restricted to within the rack.
Utilize tried and proven protocols and management tools
Data Centre Transport
For scale, the industry is converging on a Layer 3 infrastructure, with Layer 3 between leaf and spine.
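With equal-cost routes from every leaf to every spine, traffic is spread per flow by hashing. A minimal sketch of the idea (the spine names and 5-tuple are hypothetical):

```python
# ECMP in a nutshell: hash the flow's 5-tuple, pick one of the equal-cost
# spine next-hops. Deterministic per flow, spread across many flows.
import zlib

def ecmp_next_hop(flow_5tuple, next_hops):
    key = "|".join(map(str, flow_5tuple)).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

spines = ["spine1", "spine2", "spine3", "spine4"]
flow = ("10.0.1.5", "10.0.2.9", 6, 49512, 443)  # src, dst, proto, sport, dport
print(ecmp_next_hop(flow, spines))
```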
Slide 12
Data Centre Transport
§ VM mobility for compute optimisation and resilience
- For stateful vMotion/live migration, the VM's IP address must be preserved after the vMotion
- This ensures zero disruption to any client communicating with the apps residing on the migrated VM
- To ensure IP address preservation, a VM can thus only be moved to an ESXi host residing in the same subnet/VLAN
(Example: the VM keeps 128.218.10.4 before and after migration, within 128.218.10.0/24, VLAN 10.)
VM mobility and Virtualization place a requirement on the physical network
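The same-subnet constraint is easy to express in code; a small sketch (the helper below is illustrative, not a vSphere API):

```python
# Live migration keeps the VM's IP, so the target host must sit in the
# same subnet/VLAN as the source.
import ipaddress

def can_migrate(vm_ip, target_host_subnet):
    return ipaddress.ip_address(vm_ip) in ipaddress.ip_network(target_host_subnet)

print(can_migrate("128.218.10.4", "128.218.10.0/24"))  # True: same VLAN 10 subnet
print(can_migrate("128.218.10.4", "128.218.20.0/24"))  # False without an overlay
```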
Slide 13
Overlay - VXLAN Overview
§ What is an overlay network?
- Abstracts the virtualised environment from the physical topology
- Constructs L2 tunnels across the physical infrastructure
- The tunnels provide connectivity between physical and virtual endpoints
§ Physical infrastructure
- Transparent to the overlay technology
- Allows the building of an L3 infrastructure
- The physical network provides the bandwidth and scale for the communication
- Removes the scaling constraints of the physical network from the virtual one
(Diagram: a VXLAN network of logical Layer 2 tunnels across the physical Layer 3 leaf/spine infrastructure.)
Slide 14
VXLAN as Overlay Network
Slide 15
VXLAN Refresher
§ Standardized overlay technology for encapsulating layer 2 traffic on top of an IP fabric
(Diagram: Host 1 connects at Layer 2 to VTEP A; VTEP A carries Layer 2 over Layer 3 across the IP fabric on VNI 5000 to VTEP B; VTEP B delivers Layer 2 to Host 2.)
Slide 16
VXLAN Components
§ The VTEP encapsulates the Ethernet frame in a VXLAN header
- A 24-bit VNI identifier defining the VXLAN Layer 2 domain of the frame (8 bytes)
- A UDP header: the source port is a hash of the inner Ethernet header, destination port = 4789 (8 bytes), allowing load-balancing across an ECMP IP fabric that is VXLAN-transparent
- An IP header with the source and destination addresses of the local and remote VTEPs (20 bytes)
- An Ethernet header with the local VTEP MAC and the default router MAC (14 bytes, plus 4 optional)
Original Ethernet frame + 50-byte VXLAN tunnel header.
(Header fields, outer to inner: VTEP MAC and next-hop MAC; local and remote VTEP IPs; UDP header; 24-bit VNI; then the original frame with the local and remote host MACs and IPs.)
The Layer 3 network core forwards packets based on the IP/UDP info alone; the UDP source port, a hash of the inner frame, provides entropy across the ECMP network.
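The 50-byte overhead and the hashed source port can be reproduced with scapy; a sketch with hypothetical MACs, IPs and VNI:

```python
# Building the VXLAN encapsulation described above with scapy:
# outer Ethernet (14 B) + outer IP (20 B) + UDP (8 B) + VXLAN (8 B) = 50 B.
import zlib
from scapy.all import Ether, IP, UDP
from scapy.layers.vxlan import VXLAN

inner = Ether(src="00:00:00:aa:aa:aa", dst="00:00:00:bb:bb:bb")
sport = 49152 + (zlib.crc32(bytes(inner)) % 16384)  # entropy from the inner frame

outer = (
    Ether(src="00:1c:73:00:00:01", dst="00:1c:73:00:00:02")  # VTEP and next-hop MACs
    / IP(src="10.10.10.1", dst="10.10.20.1")                 # local and remote VTEP IPs
    / UDP(sport=sport, dport=4789)                           # IANA VXLAN port
    / VXLAN(vni=5000)                                        # 24-bit VNI
    / inner
)
print(len(bytes(outer)) - len(bytes(inner)))  # 50-byte tunnel header
```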
Slide 17
VXLAN Control Plane - Unicast
§ Head-end replication (HER) mode
- Removes the reliance on a multicast control plane for flooding and MAC learning
- VTEPs are configured with a "flood list" of the remote VTEPs within the VNI; broadcast/multicast traffic is replicated to the configured VTEP list for the VNI
The sequence, across VTEP-1, VTEP-2, VTEP-3 and VTEP-4:
1. The VTEP flood list is manually configured on each VTEP for each VNI. On VTEP-1: VNI 2000 → VTEP-3, VTEP-4. On VTEP-3: VNI 2000 → VTEP-1, VTEP-4. On VTEP-4: VNI 2000 → VTEP-1, VTEP-3.
2. BUM* traffic is received locally on a VTEP.
3. The VTEP creates a unicast frame for each VTEP in the flood list of the specific VNI: a separate unicast on the wire for each VTEP in the VNI (e.g. a unicast to VTEP-4).
4. The receiving VTEP learns the inner MAC and maps it to the outer source IP (the remote VTEP).
* BUM = Broadcasts, Unknown unicasts, Multicasts
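In code, HER is just a loop over the flood list; a minimal sketch (the flood lists and the send callback are hypothetical):

```python
# Head-end replication: the ingress VTEP sends one unicast-encapsulated
# copy of a BUM frame to each remote VTEP in the VNI's flood list.
FLOOD_LISTS = {
    ("VTEP-1", 2000): ["10.10.30.1", "10.10.40.1"],  # VTEP-3, VTEP-4
    ("VTEP-3", 2000): ["10.10.10.1", "10.10.40.1"],  # VTEP-1, VTEP-4
    ("VTEP-4", 2000): ["10.10.10.1", "10.10.30.1"],  # VTEP-1, VTEP-3
}

def head_end_replicate(local_vtep, vni, bum_frame, send):
    """One separate unicast on the wire per remote VTEP in the VNI."""
    for remote_vtep_ip in FLOOD_LISTS[(local_vtep, vni)]:
        send(dst_ip=remote_vtep_ip, vni=vni, payload=bum_frame)

broadcast = b"\xff" * 6 + b"\x00" * 6 + b"payload"
head_end_replicate("VTEP-1", 2000, broadcast,
                   send=lambda **kw: print("unicast to", kw["dst_ip"]))
```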
Slide 18
Network Virtualization - Capabilities
- Bare metal: hardware VTEPs enable bare-metal servers to connect to virtualized workloads
- Storage: hardware VTEPs can encap/decap at line rate for 10/40/100Gb high-performance storage
- Services: VTEPs integrate with physical (HW VTEP) and virtual instances of network services
- VMs: VTEPs can support VMs across multiple versions of virtualisation platforms
Slide 19
Network Virtualization - Optimization/Simplification
(Overlay example: VNI 2000, 150.100.100.x/24)
• VTEPs can automate VXLAN and MAC learning
• Ideally, automated provisioning of new workloads, segments, and tenants (service chaining)
• Integration with orchestrators automates many of the labour-intensive network workflows
Slide 20
Overlay Network
(Diagram: a Layer 3 IP fabric with hardware VTEPs on the ToRs. The hardware VTEPs announce only their loopbacks into OSPF; Spine1's routing table reads 10.10.10.0/24 → ToR1, 10.10.20.0/24 → ToR2, 10.10.30.1/32 → ToR3, 10.10.40.1/32 → ToR4, with VTEP addresses 10.10.10.1, 10.10.20.1, 10.10.30.1 and 10.10.40.1. Tenant VLANs 10, 20, 100 and 200, with hosts in 192.168.10.x and 192.168.20.x, map to VNI-100, VNI-200 and VNI-300 via VLAN translation on the VTEP. Tenant default gateways sit in VRF-1 and VRF-2, and an ECMP default gateway serves the physical (bare-metal) servers alongside the virtual servers.)
With a Layer 2-only service, the tenant networks are abstracted from the IP fabric: the SP cloud model.
Slide 21
Overlay Network
§ The overlay network provides transparency
- Scalable Layer 2 services across a Layer 3 transport
- Decouples the requirements of the virtualised environment from the constraints of the physical network
- The tenant networks are transparent to the transport, for Layer 3 scale
- Multi-tenancy with a 24-bit tenancy ID and overlapping VLANs (see the quick arithmetic below)
- The network becomes a flexible bandwidth platform
Scalable, multi-tenant Layer 2 services transparent to the Layer 3 transport network
(Diagram: VNI 1000, VNI 2000 and VNI 3000 run as transparent L2 services in the overlay network, riding over the Layer 3 transport of the physical infrastructure.)
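The multi-tenancy claim comes down to ID-space arithmetic; a quick check:

```python
# 12-bit VLAN ID versus 24-bit VXLAN VNI: the overlay's tenancy ID space
# is 4096 times larger, and VLANs can overlap per tenant on top of it.
print(2**12)  # 4096 VLANs
print(2**24)  # 16777216 VNIs
```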
Slide 22
VXLAN Overlay Integration with the Underlay
Slide 23
VXLAN Deployment Solutions
Hardware VTEP (small-scale DC and DCI solution):
- Manually configured or automated VTEP endpoints
- Traffic flooded via the HER distribution
- Flow-based MAC learning
- No need for multicast in the IP fabric: unicast only
- Suitable for DCI solutions and small-scale intra-DC solutions, due to the manual configuration

Software VTEP (automated VXLAN, virtual only):
- Automated VTEP endpoints
- A network virtualisation controller configures the virtual endpoints
- For virtual switches only
- Also supports protocols other than VXLAN (e.g. GRE)
- No communication between virtual and physical equipment

HW + SW VTEP (automation and integration with a third-party controller and Cloud Management Platform):
- Automated VTEP endpoints
- Driven by an orchestrator (Cloud Management Platform)
- The CMP is integrated with a third-party network virtualisation controller (NSX, Nuage, Plumgrid, etc.), or uses dedicated drivers (e.g. OpenStack)
- MAC address learning between software and hardware VTEPs
- VNI provisioning via a centralized controller
- A solution for scalable DCs with HW-to-SW VTEP automation

No multicast requirement*.
* Check the integration roadmap with your vendors of choice.
Slide 24
VXLAN Integration
(Diagram: a Cloud Management Platform drives a Network Virtualisation Controller, which in turn programs the virtual switches and the physical network/IP fabric. Network information flows between the CMP and the controller via mechanisms such as APIs, XMPP or MP-BGP; port config, MAC, VLAN and VXLAN (VNI, VTEPs) state reaches the network, software and hardware, via OVSDB, OpenFlow, APIs, …)
HW + SW networks operate:
• Head-end replication (direct or proxy)
• BUM to remote VTEPs (SW + HW)
• HW + SW MAC learning on connected VNIs (or pre-provisioned)
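As a generic illustration of what such a controller pushes down, a small, hypothetical data model (not any real controller's schema; real systems carry this state over OVSDB, OpenFlow or APIs):

```python
# Hypothetical controller-to-VTEP state: per-VNI bindings of local ports/
# VLANs plus the remote-VTEP list used for BUM head-end replication.
from dataclasses import dataclass, field

@dataclass
class VniBinding:
    vni: int
    local_ports: list   # e.g. ["Ethernet10:vlan10"]
    remote_vteps: list  # remote VTEP loopbacks for HER

@dataclass
class VtepConfig:
    vtep_ip: str
    bindings: dict = field(default_factory=dict)

    def provision(self, binding: VniBinding):
        """The controller pushes or updates one VNI binding."""
        self.bindings[binding.vni] = binding

tor3 = VtepConfig("10.10.30.1")
tor3.provision(VniBinding(5000, ["Ethernet10:vlan10"], ["10.10.40.1"]))
print(tor3.bindings[5000].remote_vteps)
```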
Slide 25
Conclusion
Slide 26
VXLAN integration solutions
End-to-end single-vendor solution:
• Vendor lock-in
• Proprietary
• Vertically integrated solution

End-to-end open-source solution:
• Open source, but…
• Not a completely tested solution
• Roadmap is unclear or scattered
• Customer bears the brunt of integration

Integrated HW + SW vendors:
• Truly open standards
• Well-tested
• Focused on customer deployments and use cases
• Aligned roadmaps
Slide 27
Thank you