Virtual Filtering Platform
A retrospective on 8 years of shipping Host SDN in the Public Cloud
Daniel Firestone, Tech Lead and Manager, Azure Networking Host SDN Team
Overview
• Azure and Scale
• Why Virtual Switches for SDN?
• Early implementations of Azure Host SDN
• Azure Host SDN Platform Goals
• VFP: Our platform for host SDN
• VFPv2: Addressing Challenges of Scale
• Hardware Offloads
• Conclusion and Future
[Figure: Microsoft Azure service map. App Services: web apps, mobile services, cloud services, media, HPC, integration, analytics, caching, identity, service bus. Data Services: table, Data Lake, blob storage, SQL database. Infrastructure Services: virtual machines, virtual network, VPN, traffic manager, CDN.]
Azure Scale & Momentum
Growth from 2010 to 2017:
• Compute Instances: 100K to Millions
• Azure Storage: 10's of PB to Exabytes
• Datacenter Network: 10's of Tbps to Pbps
Momentum:
• >85% of the Fortune 500 using Microsoft Cloud
• >120,000 new Azure customers a month
• 1 out of 3 Azure VMs are Linux VMs
• >60 trillion Azure storage objects
• >900 trillion requests/day
• >110 billion Azure DB requests/day
• >3 trillion Azure Event Hubs events/week
• >9 million Azure Active Directory orgs
• >18 billion Azure Active Directory authentications/week
Example #1: LB (From Ananta, SIGCOMM '13)
• All infrastructure runs behind an LB to enable high availability and application scale
• How do we make application load balancing scale to the cloud?
• Challenges:
  • How do you load balance the load balancers?
  • Hardware LBs are expensive, and cannot support the rapid creation/deletion of LB endpoints required in the cloud
  • Support 10s of Gbps per cluster
  • Need a simple provisioning model
[Figure: LB and NAT in front of Web Server VMs, IaaS VMs, and the SQL Service.]
"SDN" Approach: Software LB with NAT in VM Switch
[Figure: A client sends traffic to a VIP through the edge routers to the MUXes, which forward it over a stateless tunnel to the Azure VM Switch on each host. The VM Switch performs NAT between VIP and DIP for the VMs (DIPs 10.1.1.2, 10.1.1.3, 10.1.1.4, 10.1.1.5), and replies take the direct return path. SLBM receives the tenant definition (VIPs, # DIPs) and distributes the mappings.]
• Goal of an LB: Map a Virtual IP (VIP) to a Dynamic IP (DIP) set of a cloud service
• Two steps: Load Balance (select a DIP) and NAT (translate VIP->DIP and ports)
• Pushing the NAT to the vswitch makes the MUXes stateless (ECMP) and enables direct return
• Single controller abstracts out LB/vswitch interactions
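The two steps above (select a DIP, then NAT) can be sketched in a few lines of Python. This is a minimal illustration, not the Ananta or VFP implementation: the `VIP_MAP` contents, the hash choice, and all function names are assumptions made for the example.

```python
import hashlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


# Hypothetical tenant definition: one VIP endpoint mapped to a set of DIP endpoints.
VIP_MAP = {
    ("79.3.1.2", 80): [("10.1.1.2", 8080), ("10.1.1.3", 8080)],
}


def select_dip(flow: FiveTuple):
    """Step 1 (load balance): hash the 5-tuple to pick a DIP.

    No per-flow state is kept, so every MUX picks the same DIP for the
    same flow, which is what lets ECMP spread flows across MUXes.
    """
    dips = VIP_MAP[(flow.dst_ip, flow.dst_port)]
    digest = hashlib.sha256(repr(tuple(flow)).encode()).digest()
    return dips[int.from_bytes(digest[:8], "big") % len(dips)]


def nat_inbound(flow: FiveTuple, dip) -> FiveTuple:
    """Step 2 (NAT in the vswitch): rewrite VIP:port to DIP:port."""
    return flow._replace(dst_ip=dip[0], dst_port=dip[1])


def nat_return(reply: FiveTuple, vip) -> FiveTuple:
    """Direct return: the host rewrites the source DIP back to the VIP."""
    return reply._replace(src_ip=vip[0], src_port=vip[1])
```

Because the reverse NAT also runs on the host, the reply in this sketch never needs to pass back through a MUX.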
Example #2: Vnet
• Ideas from VL2 (SIGCOMM '09)
• Goal is to map Customer Addresses (e.g. BYO IP space) to Provider Addresses (real 10/8 addresses on the physical network)
• This requires a translation of *every* packet on the network; no hardware device on our network is scalable enough to handle this load along with all of the relevant policy
• Enables companies to create their own virtual network in the cloud, defining their own topologies, security groups, middleboxes and more
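VNET Forwarding Policy: Traffic to on-prem
[Figure: Packet walk from Green VM1 to the Green Enterprise Network (10.2/16).]
• Node 1 (PA 10.1.1.5) hosts Blue VM1 (10.1.1.2) and Green VM1 (10.1.1.2). Green VM1 sends a packet with Src: 10.1.1.2, Dst: 10.2.0.9 to the VM Switch.
• Policy lookup (L3 forwarding policy from NM): 10.2/16 routes to GW on host with PA 10.1.1.7.
• The VM Switch encapsulates the packet: outer Src: 10.1.1.5, Dst: 10.1.1.7, GRE: Green; inner Src: 10.1.1.2, Dst: 10.2.0.9.
• On Node 3 (PA 10.1.1.7), the VM Switch hands the packet to the Green VPN GW VM (10.1.2.1), which forwards Src: 10.1.1.2, Dst: 10.2.0.9 over L3VPN PPP to the VPN GW of the Green Enterprise Network.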
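The policy lookup and encapsulation in this packet walk can be sketched as a longest-prefix match followed by wrapping the customer packet in a provider header. This is a hedged illustration only: the `Packet` and `EncapPacket` types, the `GREEN_ROUTES` table, and `encap_outbound` are hypothetical names, and the route entry simply mirrors the figure.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str  # customer address (CA)
    dst: str  # customer address (CA)


@dataclass(frozen=True)
class EncapPacket:
    outer_src: str  # provider address (PA) of the sending host
    outer_dst: str  # provider address (PA) of the receiving host
    gre_key: str    # tenant identifier
    inner: Packet


# Hypothetical per-tenant policy taken from the figure: CA prefix -> PA of the next hop.
GREEN_ROUTES = {ipaddress.ip_network("10.2.0.0/16"): "10.1.1.7"}


def encap_outbound(pkt: Packet, routes, host_pa: str, tenant: str) -> Optional[EncapPacket]:
    """Longest-prefix match on the CA destination, then encapsulate with PAs."""
    dst = ipaddress.ip_address(pkt.dst)
    matches = [net for net in routes if dst in net]
    if not matches:
        return None  # no policy for this destination
    best = max(matches, key=lambda net: net.prefixlen)
    return EncapPacket(host_pa, routes[best], tenant, pkt)
```

Keying the table per tenant is what lets Blue VM1 and Green VM1 share the customer address 10.1.1.2 on the same host in this model.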
Even More VSwitch…
• 5-tuple ACLs
  • Infrastructure Protection
  • User-defined Protection
• Billing
  • Metering traffic to internet
• Rate limiting
• Security Guards
  • Spoof, ARP, DHCP, and other attacks
• More in development all the time…
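[Figure: NM/TM programs the VM Switch on Node 2 (10.2.1.6) from the tenant description. VMs shown: VM1 (10.1.1.2), VM2 (10.1.1.3), VM3 (10.1.1.4), VM4 (10.1.1.5).]
Tenant Description:
• ACL: VM2 can talk to other green VMs
• ACL: VM2 can talk to VM3 but not VM4
• Meter all traffic from VM2 outside of 10/8
• Rate limit VM2 to 800 Mbps
Billing: VM2 sent 23 MB of public internet traffic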
Early Approach to Azure Vswitch (2009-2011): Stacked SDN drivers per app
• Each SDN application is a driver module hard compiled into the vswitch, handling packets on its own
• Changes to SDN policy require kernel space changes, and an OS update
• Was revolutionary for us in shipping LB and VNET and Host SDN, but not easy to add new SDN Apps
• After a couple of years we decided we needed a more flexible Host SDN platform
[Figure: Azure Virtual Switch with stacked ACLs, LB, and VNET driver modules.]
Original Goals for Azure Host SDN Platform
• Goal 1: Provide a programming model allowing for multiple simultaneous, independent network controllers to program network applications, minimizing cross-controller dependencies
• Goal 2: Provide a MAT programming model capable of using connections as a base primitive, rather than just packets, with stateful rules as first class objects
• Goal 3: Provide a programming model that allows controllers to define their own policy and actions, rather than implementing fixed sets of network policies for predefined scenarios
Virtual Filtering Platform (VFP): Azure's SDN Dataplane
• Plugin module for WS2012+ VMSwitch
• Provides core SDN functionality for Azure networking services, including:
  • Address Virtualization for VNET
  • VIP -> DIP Translation for SLB
  • ACLs, Metering, and Security Guards
• Uses programmable rule/flow tables to perform per-packet actions
• Programmed by multiple Azure SDN controllers, supports all dataplane policy at line rate with offloads
[Figure: VFP inside the VMSwitch, between the NIC and the VMs' vNICs, hosting SLB (NAT), VNET, and ACLs, Metering, Security.]
VFP Translates L2 extensibility (ingress/egress to switch) to L3 extensibility (inbound/outbound to VM)
• Egress (from the switch) -> Inbound (to the VM)
• Ingress (to the switch) -> Outbound (from the VM)
[Figure: VFP in the VMSwitch between the NIC and the VMs' vNICs, with layers Metering, VNET, SLB, and ACLs applied per VM in the inbound (egress) and outbound (ingress) directions.]
Goal: All Policy is in the Controller - VFP is a Fast, Flexible Implementation of Policy
• To enable agility, allow controllers to specify exactly what they want to do at the flow/packet level, so they can implement new SDN scenarios without dataplane driver changes
• VFP focuses on integrating multi-controller policies and scaling the host dataplane: perf and offloads without sacrificing flexibility
• 3 Key Primitives we expose to controllers:
  • Layers: independent flow tables per controller to order the pipeline
  • Rule Matches: define which packets match which rule
  • Rule Actions: what to do with a packet for a given rule
Key Primitive: Match Action Tables
[Figure: Controllers turn the tenant description and VNet description into per-layer flow/action tables inside VFP on Node 10.4.1.5, between Blue VM1 (10.1.1.2) and the NIC.]
VNet layer (VNet Routing Policy)
  Flow            Action
  TO: 10.2/16     Encap to GW
  TO: 10.1.1.5    Encap to 10.5.1.7
  TO: !10/8       NAT out of VNET
NAT layer (NAT Endpoints)
  Flow            Action
  TO: 79.3.1.2    DNAT to 10.1.1.2
  TO: !10/8       SNAT to 79.3.1.2
ACLs layer
  Flow            Action
  TO: 10.1.1/24   Allow
  10.4/16         Block
  TO: !10/8       Allow
• VFP exposes a typed Match-Action-Table API to the agents/controllers
• One table ("Layer") per policy
• Inspired by OpenFlow and other MAT designs, but designed for multi-controller, stateful, scalable host SDN applications
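To make the layer, match, and action primitives concrete, here is a small Python sketch that loads the rules from the example tables into generic layers and evaluates a destination address against them. It is an assumption-laden toy, not the VFP API: the `Rule`, `Layer`, and `process` names are invented, matching is on destination prefix only, the layer order is chosen arbitrarily, and a layer with no matching rule is treated as pass-through.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Rule:
    prefix: str           # destination prefix to match
    action: str           # opaque action label
    negate: bool = False  # True models the "!10/8" style match

    def matches(self, dst: str) -> bool:
        inside = ipaddress.ip_address(dst) in ipaddress.ip_network(self.prefix)
        return inside != self.negate


@dataclass
class Layer:
    name: str
    rules: list = field(default_factory=list)

    def lookup(self, dst: str):
        """First matching rule wins; None means no rule matched."""
        for rule in self.rules:
            if rule.matches(dst):
                return rule.action
        return None


# Rules copied from the example tables; the order of layers is an assumption.
LAYERS = [
    Layer("ACLs", [Rule("10.1.1.0/24", "Allow"),
                   Rule("10.4.0.0/16", "Block"),
                   Rule("10.0.0.0/8", "Allow", negate=True)]),
    Layer("NAT", [Rule("79.3.1.2/32", "DNAT to 10.1.1.2"),
                  Rule("10.0.0.0/8", "SNAT to 79.3.1.2", negate=True)]),
    Layer("VNET", [Rule("10.2.0.0/16", "Encap to GW"),
                   Rule("10.1.1.5/32", "Encap to 10.5.1.7"),
                   Rule("10.0.0.0/8", "NAT out of VNET", negate=True)]),
]


def process(dst: str, layers=LAYERS):
    """Walk the layers in order, collecting (layer, action); stop on Block."""
    result = []
    for layer in layers:
        action = layer.lookup(dst)
        if action is None:
            continue  # assumption: no match means pass-through
        result.append((layer.name, action))
        if action == "Block":
            break
    return result
```

Note that nothing in `Layer` knows about NAT or VNET; a layer becomes an "ACLs layer" only through the rules loaded into it.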
Layers
[Figure: VNET, SLB NAT, and ACLs layers.]
• A VFP layer is not a built-in function: it is a generic set of rule/flow tables
• Any layer can be created at any time; it is only an "LB layer" or a "VNET layer" based on what rules are plumbed into it
• Resources like NAT pools or PA->CA mapping pools are available to any layer to implement special functionality (e.g. SLB or VNET)
Everything is Stateful
• The core primitive of most policy is a (TCP, UDP, …) connection, which translates to a two-way flow
  • 5-tuple ACLs, VIP-DIP SLB NAT, dynamic outbound SNAT, and more
• Stateful rules make it easy to reason about asymmetric policy: rules apply to whichever side started the flow, and the reverse happens automatically for the other direction
• Flow state managed by TCP connection tracker
[Figure: A VFP layer next to the VM, with IN rules and OUT rules, each creating flows in an IN flow table and an OUT flow table.]
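The "rules apply to whichever side started the flow" behavior can be sketched with a tiny connection tracker: a rule is evaluated only on the first outbound packet, and matching flow entries are created for both directions. This is an illustrative model under stated assumptions; `FlowKey`, `StatefulAclLayer`, and the port-based rule are hypothetical, and flow expiry and TCP state tracking are omitted.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str

    def reverse(self) -> "FlowKey":
        """Key of the same connection as seen in the other direction."""
        return FlowKey(self.dst_ip, self.dst_port,
                       self.src_ip, self.src_port, self.proto)


class StatefulAclLayer:
    """Outbound rules only; inbound replies are admitted via flow state."""

    def __init__(self, allowed_out_ports):
        self.allowed_out_ports = set(allowed_out_ports)  # hypothetical rule set
        self.flows = set()                               # established flow keys

    def outbound(self, key: FlowKey) -> bool:
        if key in self.flows:
            return True                      # existing flow: no rule lookup
        if key.dst_port in self.allowed_out_ports:
            self.flows.add(key)              # flow for this direction
            self.flows.add(key.reverse())    # paired flow for the replies
            return True
        return False

    def inbound(self, key: FlowKey) -> bool:
        # No inbound rules exist: only replies to tracked flows pass.
        return key in self.flows
```

The controller only had to state the outbound policy; the inbound half falls out of the flow state.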
Example: Software LB Support
[Figure: Below the VM, the SLB NAT layer (dynamic NAT rules, static NAT rules, IN and OUT flow tables) and the SLB Decap layer (OUT rules, decap rules, IN and OUT flow tables). The NAT rules draw NAT ranges from a NAT pool.]
Rules can reference Resources, like dynamic NAT pools or PA-CA mapping tables.
Similarly, VNET can be expressed as a series of (encap, decap, rewrite, etc.) rules, rather than fixed policy.
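Cool Uses of Stateful Flows: LB Fastpath
[Figure: Two hosts, one running a VM and one running Storage, each with VFP layers SLB Decap/Fastpath and SLB NAT. The MUX sends a redirect packet, after which traffic between the two hosts is encapsulated and decapsulated host to host over the FASTPATH.]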
Example VFP Layers: Support for LB, VNET, Security Groups, and Billing
[Figure: Layers attached to the VM: ACLs, VNET, SLB NAT, ILB, SLB Decap/Fastpath, Metering.]
Successfully deployed across Azure in 2012
Agility Example: Internal Load Balancing
• LB team wanted to offer CA-space LB in addition to PA-space LB
• All they had to do was create a new layer: they added new policy by specifying CA-space rule matches for NAT rules
• No new work in VFP, because we picked the right primitives
[Figure: The SLB Controller programs the ILB layer alongside the existing layers on the VM: ACLs, VNET, SLB NAT, SLB Decap/Fastpath, Metering.]
Scaling Up SDN: NIC Speeds in Azure
• 2009: 1 Gbps
• 2012: 10 Gbps
• 2015: 40 Gbps
• 2017: 50 Gbps
• Soon: 100 Gbps?
We got a 50x improvement in network throughput, but not a 50x improvement in CPU power!
[Figure: Bar chart of NIC speed in Gbps (axis 0 to 60) for 2009, 2012, 2015, and 2017.]
New Goals for VFPv2 (2013-2014)
• Goal 4: Provide a serviceability model allowing for frequent deployments and updates without requiring reboots or interrupting VM connectivity for stateful flows, and strong service monitoring
• Goal 5: Provide very high packet rates, even with a large number of tables and rules, via extensive caching
• Goal 6: Implement an efficient mechanism to offload flow policy to programmable NICs, without assuming complex rule processing
VFPv1 Layers: Challenges
• Holdover from original vswitch design: every layer independently handles, parses, and modifies packets
• Most of our layers want to be stateful, but this means independent connection tracking and flow state at each layer
• As host SDN became easy to program and widely used, people wanted to add new layers all the time
• Couldn't keep adding layers and scaling up
[Figure: Layers on the VM's path (Metering, VNET, SLB NAT, ACLs, SLB Decap/Fastpath, ILB), each with its own PARSE step and several with their own Modify step.]
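We need a better primitive for actions!
ASIC Pipeline Model: Parse Once, Modify Once
[Figure: The packet is PARSEd once into a Unified Flow ID. Each layer (Metering, VNET, SLB NAT, ACLs, SLB Decap/Fastpath, ILB) MATCHes on that ID and returns a HEADER transposition. The Transposition Engine TRANSPOSEs them into one Composite Transposition, and the packet is MODIFYed once.]
Shipped in 2014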
Header Transposition: Actions

Headers
  Header           Parameters
  Outer Ethernet   Source MAC, Dest MAC
  Outer IP         Source IP, Dest IP
  Encap            Encap Type, GRE Key / VXLAN VNI
  Inner Ethernet   Source MAC, Dest MAC
  Inner IP         Source IP, Dest IP
  TCP/UDP          Source Port, Dest Port (note: does not support Push/Pop)

Header Actions
  Action        Notes
  Pop           Remove this header. No params supported.
  Push          Push this header onto the packet. All params must be specified.
  Modify        Modify this header. All params are optional, but at least one must be specified.
  Ignore        Leave this header as is. No params supported.
  Not Present   This header is not expected to be present (based on the match conditions). No params supported.
Header Transposition: Example Actions
  Header           NAT                Encap              Decap    Encap+NAT          Decap+NAT
  Outer Ethernet   Ignore             Push (SMAC, DMAC)  Pop      Push (SMAC, DMAC)  Pop
  Outer IP         Modify (SIP, DIP)  Push (SIP, DIP)    Pop      Push (SIP, DIP)    Pop
  GRE/VXLAN        Not Present        Push (Key)         Pop      Push (Key)         Pop
  Inner Ethernet   Not Present        Modify (DMAC)      Ignore   Modify (DMAC)      Ignore
  Inner IP         Not Present        Ignore             Ignore   Modify (SIP, DIP)  Modify (SIP, DIP)
  TCP/UDP          Modify (SPt, DPt)  Ignore             Ignore   Modify (SPt, DPt)  Modify (SPt, DPt)
Allows rules to express more complex actions across headers
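As a rough illustration of how one transposition acts on a packet, the sketch below models a packet as a dictionary of headers and applies the Decap+NAT column from the table. The header names, field names, and `apply_transposition` function are assumptions for this example; real header parsing, checksums, and the relabeling of headers during encap are left out.

```python
# Hypothetical header names, in packet order.
HEADERS = ("outer_eth", "outer_ip", "encap", "inner_eth", "inner_ip", "l4")


def apply_transposition(packet: dict, transposition: dict) -> dict:
    """Return a new packet with one action applied per header.

    packet:        header name -> dict of fields (absent header = missing key)
    transposition: header name -> (action, params)
    """
    out = {name: dict(fields) for name, fields in packet.items()}
    for header, (action, params) in transposition.items():
        if header not in HEADERS:
            raise ValueError(f"unknown header {header}")
        if action == "pop":
            if header not in out:
                raise ValueError(f"cannot pop missing header {header}")
            del out[header]
        elif action == "push":
            if header in out:
                raise ValueError(f"header {header} already present")
            out[header] = dict(params)
        elif action == "modify":
            if not params:
                raise ValueError("modify needs at least one param")
            if header not in out:
                raise ValueError(f"cannot modify missing header {header}")
            out[header].update(params)
        elif action == "ignore":
            pass  # leave the header as is
        elif action == "not_present":
            if header in out:
                raise ValueError(f"header {header} unexpectedly present")
        else:
            raise ValueError(f"unknown action {action}")
    return out


# The Decap+NAT column: pop the outer headers, rewrite inner IP and ports.
DECAP_NAT = {
    "outer_eth": ("pop", {}),
    "outer_ip": ("pop", {}),
    "encap": ("pop", {}),
    "inner_eth": ("ignore", {}),
    "inner_ip": ("modify", {"dst": "10.1.1.2"}),
    "l4": ("modify", {"dst_port": 8080}),
}
```

A single table like `DECAP_NAT` describes the whole packet rewrite, which is what makes it possible to hand the action to one modify step instead of one per layer.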
Unified Parsing and Matching
  Condition                          Notes
  Source VPort                       N/A
  (Outer) Source MAC Address         N/A
  (Outer) Destination MAC Address    N/A
  (Outer) Source IP Address          IPv4 or IPv6
  (Outer) Destination IP Address     IPv4 or IPv6
  (Outer) IP Protocol                N/A
  Source Port                        Applies if Protocol == TCP or UDP
  Destination Port                   Applies if Protocol == TCP or UDP
  ICMP Type                          Applies if Protocol == ICMP (v4 or v6)
  Destination VPort                  N/A
  GRE Key / VXLAN VNI (Tenant ID)    Applies if Outer Protocol == GRE/VXLAN
  (Inner) Source MAC Address         N/A
  (Inner) Destination MAC Address    N/A
  (Inner) Source IP Address          IPv4 or IPv6
  (Inner) Destination IP Address     IPv4 or IPv6
  (Inner) IP Protocol                N/A
Header Transpositions Complete our Generic Southbound API Capability Story
• In order to enable agility, we want controllers to be able to define new types of policy dynamically without needing to change VFP.
• We already provide flexibility in:
  • Layers: Controllers can define new layers dynamically for their own policy without interfering with other controllers' layers
  • Rules: Controllers can define which rules match which packets via a consistent 5-tuple match API, nothing specific to special policies
• Header transpositions provide the key third primitive: Ability to specify what exactly a rule does once it is matched
  • All built-in rules define HTs, but controllers can define their own rules by creating new ones out of HTs on the fly
Unified Flow Tables: A Fastpath Through VFP
[Figure: First packet: rule lookups (expensive) in each layer: SLB Decap (Decap), SLB NAT (DNAT), VNET (Rewrite), ACL (Allow), Metering (Meter). The Transposition Engine composes the results into one entry in the unified flow table: Flow 1.2.3.1->1.3.4.1, 62362->80; Action Decap, DNAT, Rewrite, Meter. Second and later packets: a hash lookup (cheap) on that entry.]
Unified Flow Tables
• Single hash lookup for each packet after flow is created
• Leaves room for new layers w/o perf impact (e.g. ILB, etc.)
• Single flow table per VM can be sized with VM size
• All VFP actions can be expressed as header transpositions, e.g. encap/decap/L3 rewrite/L4 NAT
• Any set of header transpositions can be composed and expressed as one transposition
• Unified Flow Table: One match (per entire flow ID, inner and outer) and one action (header transposition) per flow
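The first-packet versus later-packet split can be sketched as a cache in front of the layer pipeline. This is a simplified model, not VFP's implementation: `UnifiedFlowTable` and `constant_layer` are hypothetical names, each layer is a stand-in function, and the "composite" is just the ordered tuple of per-layer actions rather than a truly merged header transposition.

```python
class UnifiedFlowTable:
    """Cache one composite action per flow ID in front of the layers."""

    def __init__(self, layers):
        self.layers = layers    # ordered callables: flow_id -> action or None
        self.flows = {}         # flow_id -> composite action (hash table)
        self.rule_lookups = 0   # counts the expensive per-layer lookups

    def process(self, flow_id):
        cached = self.flows.get(flow_id)
        if cached is not None:
            return cached       # second+ packet: one hash lookup
        actions = []
        for layer in self.layers:   # first packet: walk every layer
            self.rule_lookups += 1
            action = layer(flow_id)
            if action is not None:
                actions.append(action)
        composite = tuple(actions)
        self.flows[flow_id] = composite
        return composite


def constant_layer(action):
    """Stand-in for a real rule table: always returns the same action."""
    return lambda flow_id: action


# Layers from the figure; the ACL layer allows the flow and, in this
# sketch, contributes no header change (modeled as None).
PIPELINE = [
    constant_layer("Decap"),    # SLB Decap
    constant_layer("DNAT"),     # SLB NAT
    constant_layer("Rewrite"),  # VNET
    constant_layer(None),       # ACL: Allow
    constant_layer("Meter"),    # Metering
]
```

In this model, adding a sixth layer raises the cost of the first packet of each flow only; every later packet still costs a single dictionary lookup.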
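Single Root IO Virtualization (SR-IOV): Native Performance for Virtualized Workloads
[Figure: VM1 and VM2 each run TCP/IP over a VF driver bound to a virtual function (VF) on the NIC. The parent partition runs the Network Virtual Service Provider over the physical function (PF). The NIC's embedded switch connects the PF and VFs to the external switch.]
But where is the SDN Policy?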
2016: Accelerating VFP with FPGA SmartNICs
• Goal: Offload a cache of our internal (unified) flow table to the NIC
• Package Header Transpositions and Unified Flow IDs into hardware API
• Allows us to enable SR-IOV, applying virtualization policy in hardware and bypassing the host completely
Future of Host SDN: New Hardware/Software co-design models, programmable acceleration for transports, QoS, crypto, and more!
Results - Azure Accelerated Networking: Fastest Cloud Network!
• Highest bandwidth VMs of any cloud
  • DS15v2 & D15v2 VMs get up to 25 Gbps
• Consistent low latency network performance
  • Provides SR-IOV to the VM
  • 10x latency improvement
  • Increased packets per second (PPS)
  • Reduced jitter means more consistency in workloads
• Enables workloads requiring native performance to run in cloud VMs
  • >2x improvement for many DB and OLTP applications
Host Networking makes Physical Network Fast and Scalable
• Massive, distributed 40/100GbE network built on commodity hardware
  • No Hardware per tenant ACLs
  • No Hardware NAT
  • No Hardware VPN/overlay
  • No Vendor-specific control, management or data plane
• All policy is in software on hosts, and everything's a VM!
  • Network services deployed like all other services
• VFP, battle tested in the cloud, is now available in Microsoft Azure Stack for private cloud as well!
[Figure: Datacenter network topology. Tiers shown: Regional Spine, Data Center Spine, Row Spine, and Rack. Switches shown: T3-1 to T3-4, T2-1-1 to T2-4-4, T1-1 to T1-8, and T0-1 to T0-20 above racks of 40/50G servers.]
Thanks!
• VFP Developers
  • Yue Zuo, Harish Kumar Chandrappa, Praveen Balasubramanian, Vikas Bhardwaj, Somesh Chaturmohta, Milan Dasgupta, Mahmoud Elhaddad, Luis Hernandez, Nathan Hu, Alan Jowett, Hadi Katebi, Fengfen Liu, Keith Mange, Randy Miller, Claire Mitchell, Sambhrama Mundkur, Chidambaram Muthu, Gaurav Poothia, Madhan Sivakumar, Ethan Song, Khoa To, Kelvin Zou, and Qasim Zuhair
• Design Influence
  • Alireza Dabagh, Deepak Bansal, Pankaj Garg, Changhoon Kim, Hemant Kumar, Parveen Patel, Parag Sharma, Nisheeth Srivastava, Venkat Thiruvengadam, Narasimhan Venkataramaiah, Haiyong Wang
• Dave Maltz, Mark Russinovich, and Albert Greenberg for years of support