Create a dynamic datacenter with software-defined networking
• Agility in deploying new services
  • IT is frequently the bottleneck for various Business Groups
  • How do you become the HERO!?
• Flexibility in deployment
  • Once a workload is deployed, it can be hard to move it around within the datacenter and across clouds
  • How can you unshackle your workloads?
• Availability
  • Infrastructure issues frequently cause services to go down, causing SLAs to be missed
  • How can you provide an even better SLA on the same infrastructure!?
• Security
  • Attacks frequently get in through one host (an unpatched one!) and then spread across the rest of the infrastructure
  • How can you model your network to make this MUCH harder?
Common Customer Challenges
We faced the same challenges
At a much higher scale!
Azure Scale Momentum
• >57% of the Fortune 500 use Azure
• >50 TRILLION storage objects
• 425 MILLION AAD users
• 1 out of 5 VMs are Linux VMs
• >5 MILLION requests/sec
• 1 TRILLION Event Hubs events/month
• 1,400,000 SQL databases in Azure
• >18 BILLION Azure Active Directory authentications/week
• >90,000 new Azure customers a month
So, how did we do it?
Start by finding the right abstractions
SDN: Building the right abstractions for Scale
• Abstract by separating management, control, and data planes
Diagram: Azure Resource Manager (Management Plane) → Controller (Control Plane) → Switch (Host)
Example: ACLs
• Management plane: Create a tenant ACL
• Control plane: Plumb these tenant ACLs to these switches
• Data plane: Apply these ACLs to these flows
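The three-plane split for ACLs can be sketched in a few lines of Python. This is a conceptual toy with hypothetical class and method names, not the actual Azure controller API (which is exposed as services, not Python objects):

```python
# Sketch of the management/control/data plane split for tenant ACLs.
# All names are hypothetical; the real planes are distributed services.

class ManagementPlane:
    """Holds tenant intent: 'create a tenant ACL'."""
    def __init__(self):
        self.tenant_acls = {}  # tenant -> list of (match, action) rules

    def create_tenant_acl(self, tenant, rules):
        self.tenant_acls[tenant] = rules

class ControlPlane:
    """Plumbs tenant ACLs to the switches hosting that tenant's VMs."""
    def __init__(self, switches):
        self.switches = switches  # switch_id -> DataPlane

    def plumb(self, tenant, rules, switch_ids):
        for sid in switch_ids:
            self.switches[sid].install(tenant, rules)

class DataPlane:
    """A host vswitch: applies installed ACLs to individual flows."""
    def __init__(self):
        self.acls = {}  # tenant -> rules

    def install(self, tenant, rules):
        self.acls[tenant] = rules

    def allow(self, tenant, flow):
        for match, action in self.acls.get(tenant, []):
            if all(flow.get(k) == v for k, v in match.items()):
                return action == "allow"
        return False  # default deny

# Wire the planes together for one tenant on two hosts.
switches = {"host1": DataPlane(), "host2": DataPlane()}
mgmt, ctrl = ManagementPlane(), ControlPlane(switches)
rules = [({"dst_port": 443}, "allow")]
mgmt.create_tenant_acl("green", rules)               # management plane
ctrl.plumb("green", rules, ["host1", "host2"])       # control plane
print(switches["host1"].allow("green", {"dst_port": 443}))  # data plane: True
print(switches["host2"].allow("green", {"dst_port": 22}))   # data plane: False
```

Note how each plane only knows its own vocabulary: the management plane knows tenants, the control plane knows switches, and the data plane knows flows.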
Solution: Host Networking
• Data plane needs to apply per-flow policy to millions of VMs
• How do we apply billions of flow policy actions to packets?
• If every host performs all packet actions for its own VMs, scale is much more tractable
• Use a tiny bit of the distributed computing power of millions of servers to solve the SDN problem
  • If millions of hosts work to implement billions of flows, each host only needs thousands
• Build the controller abstraction to push all SDN to the host
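The "each host only needs thousands" claim is just division. Illustrative round numbers, not actual Azure figures:

```python
# Back-of-envelope for pushing SDN to the hosts.
# Numbers are illustrative, not real Azure counts.
total_flow_actions = 2_000_000_000  # "billions" of flow policy actions
hosts = 1_000_000                   # "millions" of servers

flows_per_host = total_flow_actions // hosts
print(flows_per_host)  # 2000 -> each host handles thousands, not billions
```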
Virtual Networks on the Host
• A Virtual Network is essentially a set of mappings from a customer-defined address space (CAs) to the provider addresses (PAs) of hosts where VMs are located
• Separate the interface to specify a Virtual Network from the interface to plumb mappings to switches via a Network Controller
• All CA <-> PA mappings for a local VM reside on the VM's host, and are applied there
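One way to picture the host-local mapping state is a per-VNet dictionary from CA to PA. This is a hypothetical structure for illustration; the real VMSwitch flow state is far richer:

```python
# Hypothetical host-local VNet mapping tables: customer address (CA) ->
# provider address (PA) of the host where that VM lives. Each VNet has
# its own table, so CA spaces can overlap across tenants.
vnet_mappings = {
    "green": {"10.1.1.2": "10.1.1.5",   # Green VM1 on Node1
              "10.1.1.3": "10.1.1.6"},  # Green VM2 on Node2
    "blue":  {"10.1.1.2": "10.1.1.5"},  # same CA as Green VM1, different tenant
}

def lookup_pa(vnet, ca):
    """Resolve a customer address to the provider address hosting it."""
    return vnet_mappings[vnet].get(ca)

print(lookup_pa("green", "10.1.1.3"))  # 10.1.1.6
print(lookup_pa("blue", "10.1.1.2"))   # 10.1.1.5
```

The per-VNet keying is what makes overlapping customer address spaces safe: the same CA resolves independently for each tenant.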
Diagram: Customer config and the VNet description (CAs) flow from the Azure Frontend through the Northbound API to the Controller; the Controller pushes L3 forwarding policy (CAs <-> PAs) through the Southbound API to the VMSwitch on each Hyper-V host, where Blue and Green VMs each have their own CA space.
Diagram: The Frontend passes customer config (VNet description, L3 forwarding policy) to the Controllers (secondary controllers stay in sync via a consensus protocol), which program the Azure VMSwitch on each node:
• Node1 (PA 10.1.1.5): Blue VM1 (CA 10.1.1.2), Green VM1 (CA 10.1.1.2)
• Node2 (PA 10.1.1.6): Red VM1 (CA 10.1.1.2), Green VM2 (CA 10.1.1.3)
• Node3 (PA 10.1.1.7): Green S2S GW (CA 10.1.2.1), connected via a VPN GW to the Green Enterprise Network (10.2/16)
Note that CA spaces overlap: three different tenants all use 10.1.1.2.
Forwarding Policy: Traffic to on-prem
Diagram: Green VM1 (CA 10.1.1.2) on Node1 (PA 10.1.1.5) sends a packet with Src: 10.1.1.2, Dst: 10.2.0.9.
• Policy lookup at the Azure VMSwitch: 10.2/16 routes to the GW on the host with PA 10.1.1.7 (L3 forwarding policy pushed by the Controller)
• The VMSwitch encapsulates the packet: outer Src: 10.1.1.5, Dst: 10.1.1.7, GRE key: Green; inner Src: 10.1.1.2, Dst: 10.2.0.9
• On Node3 (PA 10.1.1.7), the Azure VMSwitch decapsulates and delivers the inner packet to the Green S2S GW (10.1.2.1), which forwards it over the VPN GW (L3VPN PPP) to the Green Enterprise Network (10.2/16)
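The packet walk above amounts to a prefix lookup followed by encapsulation. A toy version, with dictionaries standing in for real packet headers (policy contents mirror the diagram; function names are invented):

```python
import ipaddress

# Toy version of the on-prem forwarding walk: the vswitch matches the
# inner destination against the tenant's L3 policy, then encapsulates
# with an outer PA header and a per-VNet GRE key.
green_policy = [("10.2.0.0/16", "10.1.1.7")]  # on-prem prefix -> GW host PA

def encapsulate(inner, local_pa, policy, vnet):
    dst = ipaddress.ip_address(inner["dst"])
    for prefix, next_pa in policy:
        if dst in ipaddress.ip_network(prefix):
            return {"outer_src": local_pa, "outer_dst": next_pa,
                    "gre_key": vnet, "inner": inner}
    return None  # no matching route

pkt = {"src": "10.1.1.2", "dst": "10.2.0.9"}  # Green VM1 -> on-prem host
framed = encapsulate(pkt, local_pa="10.1.1.5",
                     policy=green_policy, vnet="green")
print(framed["outer_dst"], framed["gre_key"])  # 10.1.1.7 green
```

The receiving host strips the outer header, uses the GRE key to pick the right tenant, and delivers the untouched inner packet, which is why the VM never sees provider addresses.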
Site-to-Site VPN
• S2S connectivity to branch offices
• Connecting Virtual Networks in other Azure Sites
• BGP for route updates
• Transit routing for resiliency
Diagram: Contoso HQ (Exchange, AD/DNS) connects across the Internet through a VPN Gateway (Internet Edge) to Contoso virtual networks/VMs (IIS Servers, SQL Farm, Monitoring) and to services on public IPs.
ExpressRoute
Diagram: Corp HQ and Branch offices 1 and 2 reach the Microsoft WAN over private ExpressRoute links rather than the public internet.
ExpressRoute provides a private, dedicated, high-throughput network connection to Microsoft:
• Security
• Lower cost
• Predictable performance
• High throughput
Cloud Load Balancing
• All infrastructure runs behind an LB to enable high availability and application scale
• How do we make application load balancing scale to the cloud?
• Challenges:
  • Load balancing the load balancers
  • Hardware LBs are expensive, and cannot support the rapid creation/deletion of LB endpoints required in the cloud
  • Support 10s of Gbps per cluster
  • Support a simple provisioning model
Diagram: Internet traffic passes through an LB to Web Server VMs and IaaS VMs, with SQL Services behind NAT.
All-Software Load Balancer: Scale using the Hosts
• Goal of an LB: Map a Virtual IP (VIP) to a Dynamic IP (DIP) set of a cloud service
• SDN controller abstracts out LB/vswitch interactions
• Two steps: Load Balance (select a DIP) and NAT (translate VIP -> DIP and ports)
• Pushing the NAT to the vswitch makes the LBs stateless (ECMP) and enables direct return
Diagram: A client sends VIP traffic through the edge routers, which ECMP-spread it across stateless LB MUXes; each MUX tunnels the packet to the Azure VMSwitch on the host of the selected DIP VM (DIPs 10.1.1.2 through 10.1.1.5), where the NAT is applied. Return traffic goes from the VMSwitch directly back to the client (direct return), bypassing the MUXes. The Controller distributes the tenant definition (VIPs, # DIPs) and mappings to the MUXes, and NAT policy to the vswitches.
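The two-step split can be caricatured in Python: a MUX picks a DIP by hashing the flow's 5-tuple, so every MUX independently picks the same DIP (which is what makes ECMP across stateless MUXes safe), and the host vswitch does the VIP -> DIP rewrite. The VIP and hashing scheme here are illustrative, not Azure's actual algorithm:

```python
import hashlib

# Toy software load balancer. Addresses mirror the diagram's DIPs;
# the VIP and hash scheme are invented for illustration.
VIP = "65.52.0.1"  # hypothetical public VIP
DIPS = ["10.1.1.2", "10.1.1.3", "10.1.1.4", "10.1.1.5"]

def select_dip(five_tuple):
    """Stateless MUX step: hash the 5-tuple to a DIP. Deterministic,
    so every MUX agrees without sharing any per-flow state."""
    digest = hashlib.sha256(repr(five_tuple).encode()).digest()
    return DIPS[digest[0] % len(DIPS)]

def vswitch_nat(packet, dip):
    """Host vswitch step: rewrite VIP -> DIP. The same vswitch un-NATs
    return traffic and sends it straight to the client (direct return)."""
    return {**packet, "dst": dip}

flow = ("1.2.3.4", 51000, VIP, 80, "tcp")  # src, sport, dst, dport, proto
dip = select_dip(flow)
pkt = vswitch_nat({"src": "1.2.3.4", "dst": VIP}, dip)
print(dip, pkt["dst"])
```

Because the NAT state lives on the DIP's own host, a MUX can crash or be replaced and in-flight flows still hash to the same DIP.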
Layered Security, Protection, and Isolation
Diagram: Cloud Services & Virtual Machines sit behind successive layers: DDoS Protection, Virtual Network Isolation, DMZ & NSGs, VM Firewall, and Internet ACLs.
Network Security Groups
• Segment network to meet security needs
• 5-tuple ACLs in both directions
• Can protect Internet and internal traffic
• Enables DMZ subnets
• Associated to subnets/VMs, and now NICs
• ACLs can be updated independent of VMs
Diagram: A Virtual Network with Frontend (10.1/16), Mid-tier (10.2/16), and Backend (10.3/16) subnets; the Internet reaches the Frontend, the on-premises network (10.0/16) connects via ExpressRoute and VPNs, and NSG check marks show which tier-to-tier paths are allowed.
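A minimal first-match 5-tuple evaluator, modeled on the three-tier diagram. The rule set is invented for illustration and the real NSG model has more fields (priority, source ports, tags):

```python
import ipaddress

# Toy NSG: ordered 5-tuple-ish rules (src prefix, dst prefix, protocol,
# dst port, action), evaluated first-match. Rules are invented to match
# the three-tier diagram; "*" and None are wildcards.
def in_net(ip, prefix):
    return prefix == "*" or ipaddress.ip_address(ip) in ipaddress.ip_network(prefix)

frontend_nsg = [
    ("*",           "10.1.0.0/16", "tcp", 443,  "allow"),  # Internet -> Frontend
    ("10.1.0.0/16", "10.2.0.0/16", "tcp", 8080, "allow"),  # Frontend -> Mid-tier
    ("*",           "*",           "*",   None, "deny"),   # default deny
]

def evaluate(nsg, src, dst, proto, port):
    for r_src, r_dst, r_proto, r_port, action in nsg:
        if (in_net(src, r_src) and in_net(dst, r_dst)
                and r_proto in ("*", proto)
                and r_port in (None, port)):
            return action
    return "deny"

print(evaluate(frontend_nsg, "8.8.8.8", "10.1.0.4", "tcp", 443))   # allow
print(evaluate(frontend_nsg, "8.8.8.8", "10.3.0.4", "tcp", 1433))  # deny
```

Because the rules reference subnets rather than individual VMs, VMs can come and go (or the ACLs can be updated) without touching the other side, which is the "updated independent of VMs" property above.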
• VMs that perform specific network functions
• Focus: Security (Firewall, IDS, IPS), Router/VPN, ADC (Application Delivery Controller), WAN Optimization
• Typically Linux or FreeBSD-based platforms
• 1st and 3rd Party Appliances
• Azure Marketplace
• Available through the Azure Certified Program to ensure quality and simplify deployment
Network Virtual Functions/Appliances
Network Virtual Appliance Ecosystem
Linux Networking on Hyper-V
• Hot-add / hot-remove vNIC
  • Add or remove a virtual NIC in a running Linux virtual machine
  • Linux guest will add or remove the corresponding /dev entry
  • New in Windows Server 2016 Hyper-V
• Network throughput
  • Implemented vRSS/vSSS and various TCP offloads
  • Instrumented and tightened code paths
  • All improvements are upstream in the main Linux kernel
Diagram: Two Hyper-V hosts linked by 10G Ethernet, each running a Linux guest (8 vCPUs) with iperf3 (16 threads), achieve 9.4 Gbps throughput.
Microsoft partners with distro vendors to get improvements built in. Feature grids in Linux TechNet docs have version info.
Diagram: The WAP UI Portal calls ARM (Azure Resource Manager), which handles authentication, authorization, Role-Based Access Control (RBAC), and template handling, and keeps an xRP cache for resources. ARM fans a template out to the resource providers: Microsoft.Compute resources (Resource 1, Resource 2) go to the CRP (Compute Resource Provider), Microsoft.Storage resources (Resource 3) to the SRP (Storage Resource Provider), and Microsoft.Network resources (Resource 4, Resource 5) to the NRP (Network Resource Provider), which drives the NC (Network Controller).
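The fan-out from template to resource providers is essentially grouping resources by their namespace prefix. A sketch with a minimal stand-in template (not real ARM JSON schema):

```python
# Toy version of ARM template dispatch: group each resource by the
# provider namespace in its type, then hand each group to its resource
# provider (CRP/SRP/NRP). Resource names are invented for illustration.
template = [
    {"type": "Microsoft.Compute/virtualMachines",  "name": "vm1"},
    {"type": "Microsoft.Compute/virtualMachines",  "name": "vm2"},
    {"type": "Microsoft.Storage/storageAccounts",  "name": "stor1"},
    {"type": "Microsoft.Network/virtualNetworks",  "name": "vnet1"},
    {"type": "Microsoft.Network/networkInterfaces","name": "nic1"},
]

providers = {"Microsoft.Compute": "CRP",
             "Microsoft.Storage": "SRP",
             "Microsoft.Network": "NRP"}

def dispatch(resources):
    """Route each resource to the provider that owns its namespace."""
    plan = {}
    for res in resources:
        namespace = res["type"].split("/")[0]
        plan.setdefault(providers[namespace], []).append(res["name"])
    return plan

print(dispatch(template))
# {'CRP': ['vm1', 'vm2'], 'SRP': ['stor1'], 'NRP': ['vnet1', 'nic1']}
```

One template, one API surface, many providers: this is what lets the same template deploy identically wherever ARM runs.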
Write Your App Once, Run Anywhere
Identical across Azure and other clouds
Azure Stack Demo
Storage needs networking as well as compute!
How do we make Azure Storage scale?
RDMA: High Performance Transport
• Remote DMA primitives (e.g. Read address, Write address) implemented on-NIC
• Zero Copy (NIC handles all transfers via DMA)
• Zero CPU utilization at 40Gbps (NIC handles all packetization)
• <2μs E2E latency
• RoCEv2 enables InfiniBand RDMA transport over an IP/Ethernet network (all L3)
• Enabled at 40GbE for Azure Storage, achieving massive COGS savings by eliminating many CPUs in the rack
All the logic is in the host. Software Defined Storage now scales with the Software Defined Network.
Diagram: An application writes its local memory Buffer A at Address A to the remote Buffer B at Address B; the NICs move the data directly and Buffer B is filled, with no application-level copy in between.
Just so we’re clear…40Gbps of I/O with 0% CPU
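The one-sided Write in the diagram can be caricatured in a few lines, with a mutable byte array standing in for NIC-managed remote memory. Real RDMA goes through verbs APIs and registered memory regions; this only illustrates the shape of the operation:

```python
# Caricature of a one-sided RDMA Write: the "NIC" places bytes from a
# local buffer directly into remote memory at a given offset, with no
# receiver-side code running. Real RDMA uses verbs and registered
# memory regions; this is purely conceptual.
remote_memory = bytearray(64)  # Buffer B on the remote host

def rdma_write(local_buf, remote_mem, remote_offset):
    """NIC-style DMA: copy local bytes to the remote address."""
    remote_mem[remote_offset:remote_offset + len(local_buf)] = local_buf

buffer_a = b"block-payload"  # Buffer A on the local host
rdma_write(buffer_a, remote_memory, remote_offset=0)
print(bytes(remote_memory[:len(buffer_a)]))  # b'block-payload'
```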
It gets better – Converged Fabric
Network Performance Monitoring
Diagram: Monitoring spans fault domains (Subnet1, Subnet2) hosting www.bing.com and www.msn.com, with paths out to the Internet.
Capabilities:
• Loss & latency monitoring (intra-subnet, inter-subnet, and subnet to Internet)
• Impact assessment
• Advanced algorithms
• Network visibility
• SCOM integration
Reduce MTTD and time to resolve issues through network performance monitoring, impact assessment, fault localization, and health data.
Infrastructure Demo
• Agility
  • You get a default gallery of applications (e.g. SharePoint, Exchange, etc.)
  • You get a self-service Portal that your customers/BGs use
  • Customer picks a workload and all the underlying requirements automatically get plumbed into the infrastructure
• Flexibility
  • Workloads can get instantiated into overlays or virtual networks
  • You can move VMs around in the overlay without changing any IP addresses
  • You can move subnets to different clouds altogether and connect via gateways
  • Your apps can be written such that they work transparently in Azure or Azure Stack!
• Availability
  • Every component in the underlying system is designed to ensure it remains available
  • If one instance goes down, another one picks up
• Security
  • Use the Distributed Firewall, Network Security Groups, and Virtual Appliances for more dynamic security
• Any others? We want to hear from you!
Back to your challenges
• As a platforms company, we wanted to create a Cloud platform that would solve for real customer needs. We did – with Microsoft Azure!
• With Microsoft Azure we faced a lot of the same challenges you currently face.
• We did not want to take away choice from you. Instead of forcing you into Azure, we are bringing Azure to you as well! Now you get to choose where to put your workloads while benefiting from years of innovation, with more to come.
• We would love to hear your use cases and challenges.
If you would like to take part in surveys that shape the future of Microsoft SDN, please drop off a business card or send your contact info to [email protected].
Summary
© 2015 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.