© 2013 SAMSUNG Electronics Co.

Mario Smarduch, Senior Virtualization Architect, Open Source Group, Samsung Research America (Silicon Valley), [email protected]

State of the Union: Open Source Network Function Virtualization


DESCRIPTION

Mario Smarduch, Senior Virtualization Engineer from the Samsung OSG, gives his perspective on Network Function Virtualization, including discussions of the Xen platform.


Page 1: State of the Union: Open Source Network Function Virtualization


Page 2: State of the Union: Open Source Network Function Virtualization


Talk Description

One of the hottest developments today for fixed and mobile networks is 'Network Function Virtualization', headed by the ETSI (European Telecommunications Standards Institute) ISG, which became the largest ISG in a matter of six months, with close to 70 members and 90 participants. The goals of NFV are to eliminate proprietary hardware appliances; to reduce energy, space, and hardware turnover costs; to leverage IT virtualization benefits like consolidation, time to market, multi-tenancy of heterogeneous applications, and scaling out and in; and to encourage an open ecosystem not tied to any specific hardware. However, IT virtualization is currently not fit for some NFV scenarios, such as Network Elements and User Equipment. Proprietary vendors and chip manufacturers are rushing to close this gap.

This presentation focuses on open source virtualization technology, primarily KVM-ARM, to contrast these gaps; it identifies the low-level enhancements required in the hypervisor and guest, and presents the ongoing community development to address them. Real use cases are presented to illustrate why IT virtualization is not always a fit for many NFV scenarios. A brief overview of ARM-KVM virtualization and its hardware extensions is also covered.

Page 3: State of the Union: Open Source Network Function Virtualization


Agenda

- General Public Clouds
- NFV Introduction, Status
- Cloud RAN NFV use case
- KVM (ARM) – limitations/required enhancements

Page 4: State of the Union: Open Source Network Function Virtualization


Public Cloud Control

Focus on IaaS – PaaS and SaaS build on top of each other

- NFV does have PaaS and SaaS – powerful use cases as well (see the NFV use case document)

- IaaS to grow from $4.2B in 2011 to $24B in 2016 (Source: Gartner)

[Figure: the IaaS owner's portal, scheduler and database, with agents on the compute cloud and on the storage & image cloud (RAID); each VM runs on QEMU (vCPU/IO threads, qcow2 image, virtio-blk, virtio-net) attached to a vSwitch under OpenFlow/SDN control (also VLANs, GRE, …); access via SSH, VNC]

• IaaS owner issues a new VM request via the portal server, with parameters: number of cores, memory, storage, image to load/install

• Scheduler – views the physical server/storage/network DB, selects the optimal server, loads the image, creates the RAID, creates the VM in the compute cloud
  o May need to migrate load, create NAT entries
  o For KVM, issues libvirt → QEMU commands

• Updates the DB to maintain availability

• IaaS owner is unaware of the physical topology, migration, i.e. other management – the cloud infrastructure control plane

• OpenStack equivalent components – Dashboard, Network, Compute, Image, Block Storage, …
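The scheduler step above ("selects the optimal server") can be illustrated with a best-fit placement heuristic. This is a minimal sketch under assumptions: the `Server` and `VmRequest` types and the best-fit rule are hypothetical, chosen for illustration; the slide does not say which placement policy is used, and production schedulers weigh many more factors (storage, network, affinity).

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical view of one physical server as kept in the scheduler DB."""
    name: str
    free_cores: int
    free_mem_gb: int


@dataclass
class VmRequest:
    """Hypothetical VM request as issued through the portal."""
    cores: int
    mem_gb: int
    image: str


def select_server(servers, req):
    """Best fit: among servers that can host the VM, pick the one left with the
    fewest free cores (then least free memory), keeping large holes for big VMs."""
    fits = [s for s in servers
            if s.free_cores >= req.cores and s.free_mem_gb >= req.mem_gb]
    if not fits:
        return None  # a real scheduler might migrate load to make room here
    best = min(fits, key=lambda s: (s.free_cores - req.cores,
                                    s.free_mem_gb - req.mem_gb))
    # Update the DB view so the next request sees the reduced capacity.
    best.free_cores -= req.cores
    best.free_mem_gb -= req.mem_gb
    return best
```

Best fit packs small VMs onto nearly full servers; a worst-fit rule would instead spread load, which can be preferable when headroom for live migration matters.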

Page 5: State of the Union: Open Source Network Function Virtualization


Public Cloud Network

L3 & L2 in the public cloud – scaling the cloud

- Public clouds: 40,000 physical machines, possibly up to 1,000,000 VMs

∙ 2011 Gartner: 8 VMs/server, a probable 30:1 ratio

- IaaS typically doesn't require an L2 broadcast domain; it scales through multiple VMs

- VMs are placed on unique subnets – isolated for security

- Very few apps require L2 in the cloud (broadcast, multicast – discovery of services)

∙ Large cloud providers support L2 subnets

∙ Some client/server architectures, e.g. front-end/back-end processing

- Large clouds achieve scaling through hierarchical aggregated L3 routes

Page 6: State of the Union: Open Source Network Function Virtualization


Public Cloud Network

L3 addressing hierarchy:
- 10.20.0.0/16 aggregate – bits 17–24 select one of 256 subnets of 256 addresses (8 host bits, .xxxx_xxxx): 10.20.0.0/24 … 10.20.255.0/24
- 10.20.254.0/24 aggregate – bits 25–27 select one of 8 subnets of 32 addresses (5 host bits, .x_xxxx): 10.20.254.0/27 … 10.20.254.224/27
- 10.20.254.0/27 aggregate – bits 28–30 select one of 8 subnets of 4 addresses (2 host bits, .xx): 10.20.254.0/30 … 10.20.254.28/30; each provides 1 VM IP and 1 GW IP

L2 overlay on L3 – supports isolated L2 subnets for IaaS VMs

vSW 192.168.x.x

SDN orchestration – OpenFlow switch/route on any fields

SDN – Open vSwitch with KVM:

    # ovs-vsctl add-br br0
    # ovs-vsctl add-port br0 <phys-intfc>

- qemu-ifup – ovs-vsctl adds the VM's tap device to br0
- ovs-ofctl – controls flows

DNS load balancing – for example, an A record resolving to multiple IPs

Scaling via L3
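The hierarchical aggregation above can be checked with Python's standard `ipaddress` module. A minimal sketch; assigning the first usable /30 address to the gateway and the second to the VM is an assumption, not something the slide specifies.

```python
import ipaddress

# Address plan from the slide: a /16 aggregate split into /24s, one /24 split
# into /27s, and one /27 split into /30s (one VM IP plus one gateway IP each).
aggregate = ipaddress.ip_network("10.20.0.0/16")


def split(net, new_prefix):
    """Return all subnets of `net` with the given prefix length."""
    return list(net.subnets(new_prefix=new_prefix))


level24 = split(aggregate, 24)                                # bits 17-24: 256 x 256
level27 = split(ipaddress.ip_network("10.20.254.0/24"), 27)   # bits 25-27: 8 x 32
level30 = split(level27[0], 30)                               # bits 28-30: 8 x 4


def vm_and_gateway(p2p):
    """A /30 has two usable hosts; assume the first is the gateway."""
    gw, vm = list(p2p.hosts())
    return vm, gw


if __name__ == "__main__":
    print(len(level24), level24[0], level24[-1])
    print(len(level27), level27[0], level27[-1])
    print(len(level30), level30[0], level30[-1])
    print(vm_and_gateway(level30[0]))
    # Upstream routers need only the /16 route to reach any VM in the plan.
    print(ipaddress.ip_address("10.20.254.29") in aggregate)
```

Because every /30 sits inside a /27, every /27 inside a /24, and every /24 inside the /16, each routing tier only advertises its aggregate, which is what keeps route tables small as the cloud grows.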

Page 7: State of the Union: Open Source Network Function Virtualization


Public Cloud Characteristics (IaaS)

Workloads

- Web front end, SQL database back end – eCommerce
- Social networking
- SaaS apps like email, content backup
- High Performance Computing in the cloud

Characteristics

• Resources – traditional compute: CPU, RAM, storage, network
• Response – not real-time; response driven by user perception (web interface)
• Scalability – out, in: add/remove VMs; a front-end server or load balancer distributes the load
• I/O – primarily virtualized: storage, network
• Overcommit – currently 8:1 on average, 30:1 per server in future [Source: cloudscaling]
• Orchestration – spans few VM types in a small geographic area (same pod)
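Scaling out/in by adding or removing VMs behind a load balancer reduces, in the simplest case, to a threshold rule. The sketch below assumes identical VMs, a single aggregate load figure, and a hypothetical 70% target utilization; none of these values come from the slide.

```python
import math


def target_vm_count(total_load, per_vm_capacity, target_utilization=0.7,
                    min_vms=1, max_vms=100):
    """Number of identical VMs needed so each runs at or below the target
    utilization, clamped to [min_vms, max_vms]. All defaults are assumptions."""
    needed = math.ceil(total_load / (per_vm_capacity * target_utilization))
    return max(min_vms, min(max_vms, needed))


def scale_decision(current_vms, total_load, per_vm_capacity, **kw):
    """Return ('out', n), ('in', n) or ('hold', 0): how many VMs to add/remove."""
    target = target_vm_count(total_load, per_vm_capacity, **kw)
    if target > current_vms:
        return ("out", target - current_vms)
    if target < current_vms:
        return ("in", current_vms - target)
    return ("hold", 0)
```

This independence of VMs is exactly what the next slides argue does not hold for mobile network elements, where scaling one NE ripples through the others.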

Page 8: State of the Union: Open Source Network Function Virtualization


Introduction to NFV

Mobile Network – LTE EUTRAN/EPC

Page 9: State of the Union: Open Source Network Function Virtualization


Introduction to NFV

• EUTRAN – eNodeB, UE
  - Radio – radio bearer control, admission control, mobility, scheduling (dynamic radio resource allocation for uplink and downlink)

• EPC
  - PCRF = Policy and Charging Rules Function
    ∙ Determines the QoS Class Identifier (QCI) for a data flow
    ∙ QoS – GBR/non-GBR, priority, delay, packet error loss rate; RT gaming, voice, and live streaming are the most demanding
  - HSS = Home Subscriber Server
    ∙ Subscriber profile – QoS, APN (PDN), the user's current MME
  - P-GW = Packet Data Network Gateway
    ∙ UE IP address allocation; enforces the PCRF QCI mapping onto DL bearers
  - S-GW = Serving Gateway
    ∙ Anchor for all UE IP traffic as the UE roams between eNodeBs; retains bearer info for idle UEs
  - MME = Mobility Management Entity
    ∙ Control node: UE attachment, bearer setup, UE context management (from the HSS), Tracking Area Update processing, paging, UE IDLE-to-CONNECTED transitions
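The QoS attributes the PCRF assigns per flow come from the standardized QCI table. The values below are the QCI 1–5 entries as given in 3GPP TS 23.203 (Table 6.1.7) and should be checked against the release in use; the `QciProfile` type and the helper `most_delay_sensitive` are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QciProfile:
    resource_type: str    # "GBR" or "Non-GBR"
    priority: int         # lower value = higher priority
    delay_budget_ms: int  # packet delay budget
    loss_rate: float      # packet error loss rate
    example: str


# Standardized QCIs 1-5 (3GPP TS 23.203, Table 6.1.7).
QCI_TABLE = {
    1: QciProfile("GBR", 2, 100, 1e-2, "conversational voice"),
    2: QciProfile("GBR", 4, 150, 1e-3, "conversational video (live streaming)"),
    3: QciProfile("GBR", 3, 50, 1e-3, "real-time gaming"),
    4: QciProfile("GBR", 5, 300, 1e-6, "non-conversational video (buffered streaming)"),
    5: QciProfile("Non-GBR", 1, 100, 1e-6, "IMS signalling"),
}


def most_delay_sensitive(qcis):
    """Pick the QCI with the tightest delay budget; ties broken by priority."""
    return min(qcis, key=lambda q: (QCI_TABLE[q].delay_budget_ms,
                                    QCI_TABLE[q].priority))
```

The 50 ms budget for real-time gaming is the kind of hard deadline that the following slides contrast with the user-perception-driven response of public cloud workloads.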

Page 10: State of the Union: Open Source Network Function Virtualization


Establishing Bearers

Data plane – three traffic pipes set up at call setup:
- QCI 1: GBR, 100 ms delay budget – voice
- QCI 2: GBR, 150 ms delay budget – video
- QCI 3: GBR, 50 ms delay budget – RT gaming

Control plane – an idea of the messaging in a (mobile-initiated) bearer setup

- LTE supports public safety – call setup time < 300 ms – supports group calls

- Other procedures – System Info Broadcast, UE Random Access Procedure, UE Attach/Detach, TAU, Call Termination, Paging – MME: 500–800 UE msgs/hr, heavy load 1500 msgs/hr

- Example call setup – ranges from 2–3 s

[Figure: bearers between UE, eNodeB, S-GW and P-GW – radio bearer (UE–eNodeB), S1-U bearer (eNodeB–S-GW), S5-U bearer (S-GW–P-GW); radio-side stack IP/PDCP/RLC/MAC/PHY, S1-U/S5-U stack GTP-U/UDP/IP/L2/L1]

Bearer bindings:
- UE: UL-TFT → RB ID
- eNodeB: RB ID → S1 TEID
- S-GW: S1 TEID → S5 TEID
- P-GW: QCI → DL-TFT → S5 TEID

Message flow (UE, eNodeB, MME, HSS, S-GW, P-GW): RRC Connection Establishment (several messages) → Identification & Authorization Request (long procedure) → Modify Bearer Request/Response → Bearer Resource Allocation Request → Bearer Resource Command → Create Bearer Request (determine E2E resources for the bearer; allocate TFT, map to QCI and GTP-U TEID) → activate bearer at the eNodeB, piggybacking the UE message → eNodeB tells the UE about the RRC allocation ……
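The bearer bindings listed above form a chain of lookups, one per node. A toy sketch, assuming hypothetical `BearerContext`/`Tft` types and a TFT that matches only on destination port (real TFTs carry packet filters over addresses, ports, and protocol); identifier values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class BearerContext:
    """Per-bearer identifiers along UE -> eNodeB -> S-GW -> P-GW (hypothetical)."""
    rb_id: int
    s1_teid: int
    s5_teid: int


@dataclass
class Tft:
    """Toy traffic flow template: matches on destination port only (assumption)."""
    dst_ports: set = field(default_factory=set)

    def matches(self, port):
        return port in self.dst_ports


def uplink_path(port, ul_tfts, bearers):
    """UE: UL-TFT -> RB ID; eNodeB: RB ID -> S1 TEID; S-GW: S1 TEID -> S5 TEID."""
    for rb_id, tft in ul_tfts.items():
        if tft.matches(port):
            b = bearers[rb_id]
            return (b.rb_id, b.s1_teid, b.s5_teid)
    return None  # no dedicated bearer matched; a real UE uses the default bearer


def downlink_path(port, dl_tfts, bearers_by_s5):
    """P-GW: DL-TFT -> S5 TEID; S-GW: S5 TEID -> S1 TEID; eNodeB: S1 TEID -> RB ID."""
    for s5_teid, tft in dl_tfts.items():
        if tft.matches(port):
            b = bearers_by_s5[s5_teid]
            return (b.s5_teid, b.s1_teid, b.rb_id)
    return None
```

Every entry in this chain is created by the control-plane messages above, which is why bearer setup latency depends on every NE along the path responding in time.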

Page 11: State of the Union: Open Source Network Function Virtualization


LTE EUTRAN/EPC Load Characteristics

Resources
- Radio BW, network (CN), CPU, memory, storage (varies by NE, e.g. HSS)

Response – state-machine driven
- Attachment, idle-connect, bearer setup – associated with timers/states
- Real-time sensitive – various parameters can be tuned, but the user experience suffers
- User perception is still all-important, but hard deadlines exist
- Near-native scheduling

Scalability & Orchestration
- Network tightly coupled – scaling out ripples through the NEs
- Unlike the public cloud, just adding new VMs will not do it
- Orchestration for scale out/in is extremely complex

I/O – need near native
- RAN – massive device pass-through of BBU accelerators; EPC – NIC device pass-through

Overcommit
- Delicate load calculation required for the PLMN to scale on demand where needed
- Can't apply the cloud 8:1, 30:1 ratios
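State-machine-driven behavior with timers is what makes these workloads real-time sensitive: if the VM hosting the state machine is descheduled, timer expiry is handled late. A toy UE state machine, assuming three simplified states and a hypothetical 10 s inactivity timer (real EMM/ECM procedures use more states and 3GPP-defined timers):

```python
from enum import Enum


class UeState(Enum):
    DEREGISTERED = "deregistered"
    IDLE = "idle"
    CONNECTED = "connected"


class UeFsm:
    """Toy UE state machine driven by events and one inactivity timer.
    States and the default timeout are simplifying assumptions."""

    def __init__(self, inactivity_timeout_s=10.0):
        self.state = UeState.DEREGISTERED
        self.timeout = inactivity_timeout_s
        self.deadline = None

    def on_event(self, event, now):
        if event == "attach" and self.state is UeState.DEREGISTERED:
            self._connect(now)
        elif event in ("service_request", "paging_response") and self.state is UeState.IDLE:
            self._connect(now)
        elif event == "traffic" and self.state is UeState.CONNECTED:
            self.deadline = now + self.timeout  # activity re-arms the timer
        elif event == "detach":
            self.state, self.deadline = UeState.DEREGISTERED, None
        return self.state

    def on_tick(self, now):
        """Expire the inactivity timer; a late tick delays the IDLE transition."""
        if self.state is UeState.CONNECTED and now >= self.deadline:
            self.state, self.deadline = UeState.IDLE, None
        return self.state

    def _connect(self, now):
        self.state, self.deadline = UeState.CONNECTED, now + self.timeout
```

Multiply this by hundreds of thousands of UEs per MME and the cost of a late `on_tick`, caused for example by an overcommitted host, becomes visible as signaling delay.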

Page 12: State of the Union: Open Source Network Function Virtualization


Current State of NFV

NFV ETSI ISG
- Initial white paper published Oct 2012
- Spans mobile and fixed networks
- First serious attempt to virtualize mobile/fixed networks
  ∙ Members: service providers and all ecosystem players

Proofs of concept – Cloud RAN, migration with device pass-through, cloud rGW
- NFV Infrastructure as a Service (NFVIaaS)
  ∙ Target big telco/small telco – lease NFVI as IaaS for VNFs and cloud
- VNFaaS – move enterprise CPE, and later PE, into the SP cloud to simplify Opex/Capex
  ∙ AR, NG-FW, QoS/DPI owned/provisioned by the SP
- VNPaaS – Virtual Network Platform as a Service, for example DNS, DHCP, email, FW
  ∙ Brings services closer to the APN – no tunneling back to central IT infrastructure – total control
  ∙ SP provides bare services, and the enterprise gets config tools to manage the service
- VNF Forwarding Graphs
  ∙ SDN essential – in a multi-tenant environment, OpenFlow-capable config is required, e.g. to host a small telco in the NFVI
  ∙ Need SDN orchestration – OpenStack is enhancing Quantum for SDN to span VNFs and physical network functions
- Mobile Core Virtualization – goes along with NFVIaaS (to some extent)
  ∙ Improves Self-Optimizing Networks – delivers performance where needed
- Cloud RAN – key features for SON, on-demand radio BW, Opex/Capex savings
- Virtualizing the home – vSTB, vGW – fixed-network video/internet delivery to the home

Page 13: State of the Union: Open Source Network Function Virtualization


NFV Cloud-RAN Use Case

Evolution of the Radio Access Network

• Single mode (2G, 3G) – combined BBU & RRU
• Scaled to maximum peak – waste of resources
• Baseband processing co-located with the Remote Radio Unit
  o Hard access; power is an issue in some locations

• Remote Radio Units (RRUs) distributed via fiber links
• Baseband processing supports multiple technologies
• BBU can be housed indoors, RRUs strategically distributed

• Pooling of baseband unit processing
• Capacity dynamically adjusted – for example, a sports event
• Resources maximized – delivered on demand
• Several technologies supported

[Figure: evolution from co-located BBUs, to BBUs with fiber-connected RRUs, to pooled vBBUs (LTE, LTE, UMTS) on PHY accelerators connected to the MME/S-GW and SGSN]
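The pooling argument can be made concrete: dedicated BBUs must be sized for the sum of each site's peak, while a pool only needs the peak of the aggregate load. The load figures in the usage example are made up for illustration.

```python
def per_site_capacity(site_loads):
    """Dedicated BBUs: each site is provisioned for its own peak load."""
    return sum(max(loads) for loads in site_loads)


def pooled_capacity(site_loads):
    """Pooled vBBUs: provision for the peak of the summed load per time slot."""
    return max(sum(slot) for slot in zip(*site_loads))


# Hypothetical loads (arbitrary units) in three time slots: midday, afternoon, evening.
sites = [
    [80, 20, 10],   # business district
    [10, 30, 90],   # residential area
    [5, 5, 100],    # stadium during an evening event
]
print(per_site_capacity(sites), pooled_capacity(sites))
```

Here dedicated BBUs need 270 units of capacity while the pool needs 200, because the sites peak at different times; the gain grows with the number of sites whose peaks do not coincide.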

Page 14: State of the Union: Open Source Network Function Virtualization


New Virtualization HYP Mode

Page 15: State of the Union: Open Source Network Function Virtualization


Virtualization MMU Extensions

Page 16: State of the Union: Open Source Network Function Virtualization


Interrupt Virtualization Extensions

Page 17: State of the Union: Open Source Network Function Virtualization


Device Pass-through

Architecture/cost of interrupts

- BBU cloud has hundreds of devices passed through – small cells, many RRUs and fiber links
- Libvirt and QEMU are not ready for such massive pass-through; another issue is handling faults
- RRU to/from BBU: PHY OFDMA (channels, framing, FEC) to MAC – L2 logical channels
- L3 – RRC, NAS, IP

• MMU pass-through to user
  o Devices emulated (trap to QEMU) – not this type
  o GVA → IPA → HPA – direct access to HW registers
    ▪ PCI – look up the target BARs for the HPA; QEMU selects the IPA
    ▪ DT – device node with the HPA; QEMU selects the IPA
  o No performance penalty for MMU pass-through
• Cost of exit/enter – executed in HYP mode, optimized assembler
  o Similar to a process switch; a guest switch is very costly
    ▪ More so because OS system registers are saved/restored: ~80 registers each for guest and host (GP regs, VFP/SIMD, CP15)
    ▪ All banked registers (dabt, iabt, irq, …)
  o No concept of a lightweight context switch like threads
  o Goal: avoid at all costs

[Figure: pass-through architecture. QEMU/guest user tasks and kernel drivers access the pass-through MMIO device (PHY to UE, NIC to CN; T&E device) through GVA → IPA → HPA mappings. On an IRQ the guest exits to HYP mode (PL2): guest state is saved (~80 regs: GP regs, VFP/SIMD, CP15) and host state loaded; the host takes the IRQ, injects it into the guest and updates the vGIC distributor; the guest EOI causes another exit before returning to the guest.]
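The GVA → IPA → HPA path above is a two-stage translation: the guest's own page tables map GVA to IPA, and the hypervisor's stage-2 tables map IPA to HPA, both walked by hardware, which is why MMU pass-through carries no exit cost once mappings exist. A toy model with 4 KB pages and dictionaries standing in for page tables; the addresses are hypothetical.

```python
PAGE = 4096  # assume 4 KB pages at both stages


def translate(addr, table):
    """One translation stage: map the page number, keep the offset.
    A missing entry raises KeyError, standing in for a translation fault."""
    page, offset = divmod(addr, PAGE)
    return table[page] * PAGE + offset


def guest_to_host(gva, stage1, stage2):
    """Stage 1 (guest page tables): GVA -> IPA; stage 2 (hypervisor): IPA -> HPA."""
    ipa = translate(gva, stage1)
    return translate(ipa, stage2)


# Hypothetical device register page: the host BAR is at HPA 0x4000_0000, QEMU
# placed it at IPA 0x1000_0000, and the guest driver mapped it at GVA 0xB000_0000.
stage1 = {0xB0000000 // PAGE: 0x10000000 // PAGE}
stage2 = {0x10000000 // PAGE: 0x40000000 // PAGE}
print(hex(guest_to_host(0xB0000010, stage1, stage2)))
```

A KeyError at stage 2 corresponds to a stage-2 fault, which is where trap-and-emulate devices enter QEMU; pass-through devices avoid it because the mapping is installed up front.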

Page 18: State of the Union: Open Source Network Function Virtualization


Device Pass-through

IRQ overhead and optimizations

[Figure: IRQ path for an MMIO pass-through device (PHY to UE/NIC) across the QEMU guest, KVM/host and HYP mode (PL2): guest exit, return to host, IRQ to host, inject to guest, EOI exit, vGIC distributor update]

1. Guest executes – exit to HYP mode – save guest/restore host
2. Host enables interrupts – IRQ delivered to host – 1st complete IRQ OS path
3. Inject to guest – save host, restore guest, and 2nd complete IRQ OS path
4. Guest EOI – exit, save guest/restore host
5. Update the virtual distributor
6. Resume guest – save host/restore guest

Note: this applies the most direct injection, with no irqfd and no additional threads. Testing by Virtual Open Systems reveals at least a 5x delay.

Optimization 1
• ARM supports priority drop/deactivation: after the IRQ is acked its priority drops, and it can be deactivated from the guest during EOI with no exit
• ARM can inject a hwirq
• Eliminates steps 4–6 (experimenting)

Optimization 2
• Process interrupts directly from HYP mode
  o Currently HYP mode is limited to low-level code
• Build hwirq injection to the guest
• Eliminates steps 2–6; HOWEVER, requires C code and more overhead in HYP mode

In addition
• IRQ CPU affinity must match vCPU affinity – either bind or follow the vCPU; otherwise you need IPIs, which are very slow
• Preferable: a vCPU in idle does not exit; it waits for an event
• Future GIC versions: per-IRQ direct delivery – still must handle sleeping guests
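Binding an IRQ and the vCPU thread it targets to the same physical CPU, as recommended above, can be done on a Linux host with `sched_setaffinity` for the thread and `/proc/irq/<N>/smp_affinity` for the IRQ. A sketch that defaults to a dry run; the IRQ number, thread id and CPU are hypothetical, and the mask formatting assumes a host with fewer than 32 CPUs (the kernel prints larger masks in comma-separated 32-bit groups).

```python
import os


def cpu_mask(cpus):
    """Hex bitmask as accepted by /proc/irq/<N>/smp_affinity (fewer than 32 CPUs assumed)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def pin_irq_to_vcpu(irq, vcpu_tid, pcpu, dry_run=True):
    """Bind a vCPU thread and the device IRQ it services to the same physical CPU,
    so the IRQ never has to be forwarded to the vCPU's CPU with an IPI."""
    actions = [
        f"sched_setaffinity({vcpu_tid}, {{{pcpu}}})",
        f"echo {cpu_mask([pcpu])} > /proc/irq/{irq}/smp_affinity",
    ]
    if not dry_run:  # requires root on a Linux host
        os.sched_setaffinity(vcpu_tid, {pcpu})  # affinity is per thread on Linux
        with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
            f.write(cpu_mask([pcpu]))
    return actions


print(pin_irq_to_vcpu(45, 1234, 2))
```

The alternative the slide mentions, making the IRQ follow the vCPU when it migrates, would re-run the same two steps whenever the scheduler moves the vCPU thread.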

Page 19: State of the Union: Open Source Network Function Virtualization


Dynamic Load Balancing

Cloud RAN – dynamic load balancing between VMs

- Cell sites exhibit various loads throughout the day

- vCPU hotplug
∙ Unplug/plug vCPUs – dynamically scale to demand

[Figure: multicore platform (16 cores) hosting three vBBUs; vCPUs are unplugged/plugged as load changes, and idling vCPUs/cores fall under power management]
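Inside the guest, a vCPU is taken offline or brought back by writing 0 or 1 to `/sys/devices/system/cpu/cpuN/online`. The sketch below sizes the vCPU count from load and emits those writes; the headroom, capacity figures and helper names are assumptions, and how the host reclaims an offlined vCPU (e.g. for power management) is left out.

```python
import math


def vcpus_needed(load, per_vcpu_capacity, headroom=0.2, min_vcpus=1, max_vcpus=8):
    """vCPUs required to carry `load` with some headroom, clamped to [min, max]."""
    needed = math.ceil(load * (1 + headroom) / per_vcpu_capacity)
    return max(min_vcpus, min(max_vcpus, needed))


def hotplug_plan(online, target, possible):
    """Guest-side sysfs writes that bring the number of online vCPUs to `target`.

    `online` is the set of online vCPU ids, `possible` the number of vCPUs the
    VM was created with. CPU0 is never unplugged."""
    online = set(online)
    target = max(1, min(target, possible))
    writes = []
    for cpu in range(1, possible):              # plug the lowest offline vCPUs first
        if len(online) >= target:
            break
        if cpu not in online:
            online.add(cpu)
            writes.append((f"/sys/devices/system/cpu/cpu{cpu}/online", "1"))
    for cpu in sorted(online, reverse=True):    # unplug the highest vCPUs first
        if len(online) <= target:
            break
        if cpu != 0:
            online.discard(cpu)
            writes.append((f"/sys/devices/system/cpu/cpu{cpu}/online", "0"))
    return writes
```

For example, a vBBU with four online vCPUs whose night-time load needs only two would offline cpu3 and cpu2, leaving those physical cores free to idle or to serve a busier vBBU.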

Page 20: State of the Union: Open Source Network Function Virtualization


Fast Path between Radio/CN

Zero-copy message passing – guest/guest, guest to host

- ivshmem is one example; add enhancements

[Figure: a vBBU instance made of two guests. The RAN PHY guest uses a pass-through T&E device (PHY to UE); the core-network guest uses a pass-through NIC (to CN); each guest has a shared-memory driver (SHMDRV) on a QEMU-emulated shared-memory device (SHMDEV) over a common shared memory region; HOST/KVM and HYP mode provide optimized interrupt signaling between them]

• BBU needs fast switching – radio → core network
• Can't have a full stack with expensive IPC
• Want to separate radio and core functions
• Radio
  o Dedicate CPUs and poll, or optimized device pass-through
    ▪ Dedicating is never good.
  o PHY device pass-through
    ▪ Pull off the wire directly to user space + MAC + L3 packet
    ▪ Zero copy to inter-guest shared memory
• Core network
  o Pull packets from the shared memory ring buffer
  o Tx/Rx to the core network over SCTP or GTP-U
• Issues:
  o Signaling – the signaling/interrupt path is too long (red lines in the figure)
    ▪ Guest writes to the IRQ register via UIO, exits, MMIO to QEMU
    ▪ QEMU sends an 'event' to the peer QEMU
    ▪ Emulated device on the peer injects an interrupt into the guest
    ▪ Solution: interrupt in HYP mode, coalesce
  o Discovery – to pair, a guest must discover shared memory segments dynamically
    ▪ Many vBBU clouds created/destroyed on demand
    ▪ Solution: a shared memory discovery protocol via an emulated device through QEMU (green line in the figure)
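The shared-memory ring with coalesced signaling described above can be sketched as a single-producer/single-consumer ring over one flat buffer. Assumptions: the buffer stands in for a shared region such as an ivshmem BAR, slot size and count are arbitrary, and head/tail are plain attributes for clarity (a cross-VM ring would keep them in the shared region with memory barriers). The producer only asks for a notification when the ring goes from empty to non-empty, which is one common way to coalesce interrupts.

```python
class SpscRing:
    """Single-producer/single-consumer ring of fixed-size slots in one buffer.
    Only the producer advances `head`; only the consumer advances `tail`."""

    def __init__(self, buf, slot_size=2048, slots=8):
        if len(buf) < slot_size * slots:
            raise ValueError("buffer too small for the requested ring")
        self.buf = memoryview(buf)   # must be writable, e.g. a bytearray
        self.slot_size, self.slots = slot_size, slots
        self.head = 0                # next slot to write
        self.tail = 0                # next slot to read

    def push(self, payload):
        """Copy one packet into the next free slot (2-byte length prefix).
        Returns None if the ring is full or the packet too big; otherwise
        whether the consumer must be signaled (only if the ring was empty)."""
        if self.head - self.tail == self.slots or len(payload) > self.slot_size - 2:
            return None
        was_empty = self.head == self.tail
        off = (self.head % self.slots) * self.slot_size
        self.buf[off:off + 2] = len(payload).to_bytes(2, "little")
        self.buf[off + 2:off + 2 + len(payload)] = payload
        self.head += 1
        return was_empty

    def peek(self):
        """Zero-copy view of the oldest packet, or None if the ring is empty."""
        if self.tail == self.head:
            return None
        off = (self.tail % self.slots) * self.slot_size
        n = int.from_bytes(self.buf[off:off + 2], "little")
        return self.buf[off + 2:off + 2 + n]

    def release(self):
        """Hand the oldest slot back to the producer once the packet is consumed."""
        if self.tail != self.head:
            self.tail += 1
```

Splitting `peek` from `release` keeps the zero-copy view valid while the core-network side processes it; the producer cannot reuse the slot until it is released.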

Page 21: State of the Union: Open Source Network Function Virtualization


RT-Scheduling

The network stack is time sensitive – requirements

- High-res timers a must

- Preemptibility – even PREEMPT_RT a must – prevents interrupt inversion

- Scheduling at several levels – host and guest threads

Timers

- Arch timers improvement: no exit on register updates

- But still an exit when the timer fires – needs injection

- An issue for high-res timers in the guest

- Again, near-native IRQ pass-through is important

IRQs & page faults

- Any host IRQ can prevent the guest from running

- Page frame reclaiming (PFRA) as well (if so, most likely not tuned for RT)

PREEMPT_RT

- Not really tested with virtualization

[Figure: vMME and vS-GW guests (PREEMPT_RT) on a Linux host with KVM; timer events and IRQ sources (VFIO, protocol FSM, …) add latency; with PREEMPT_RT, spinlocks become mutexes and interrupts run as threads, so a higher-priority thread executes with no interrupt/priority inversion]
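Timer wake-up latency, the quantity at stake here, is usually measured with `cyclictest` from the rt-tests suite. The same idea in a few lines of Python, as an illustration only (the interpreter adds jitter of its own): sleep to absolute deadlines and record how late each wake-up arrives, then compare host and guest.

```python
import time


def timer_latency_us(interval_us=1000, loops=200):
    """Return (min, avg, max) wake-up lateness in microseconds, cyclictest-style.

    Each iteration sleeps until an absolute deadline `interval_us` after the
    previous one and records how far past the deadline it actually woke up."""
    interval_ns = interval_us * 1000
    next_wake = time.monotonic_ns() + interval_ns
    latencies = []
    for _ in range(loops):
        delay = next_wake - time.monotonic_ns()
        if delay > 0:
            time.sleep(delay / 1e9)
        latencies.append((time.monotonic_ns() - next_wake) / 1000)
        next_wake += interval_ns  # absolute deadlines: lateness does not accumulate
    return min(latencies), sum(latencies) / len(latencies), max(latencies)


if __name__ == "__main__":
    print(timer_latency_us())
```

The maximum, not the average, is what matters for the hard deadlines discussed earlier; in a guest it includes every exit, injection and host IRQ that delayed the timer.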

Page 22: State of the Union: Open Source Network Function Virtualization


RT-Scheduling

Possible Optimizations – area of research

[Figure: the host/guest RT scheduling diagram of the previous page, with the guests running PREEMPT: timer events and IRQ sources (VFIO, protocol FSM, …) add latency; spinlocks become mutexes so a higher-priority thread executes with no interrupt/priority inversion]

Host PREEMPT_RT

- Eliminate spinlocks, replace with mutexes
- Prioritize interrupts – non-VM-targeted IRQs
- vCPUs – run at higher priority
- VM IRQs don't run as threads – timers, device pass-through IRQs
- Use priority drop/deactivation to schedule the highest-priority interrupts for VMs

Guests most likely PREEMPT only

Challenges
- Multiple VMs sharing a CPU
  ∙ Priority between them
  ∙ Priority of their IRQs
  ∙ Context switching an issue, depending on load
- OS periodic tick work – CONFIG_NO_HZ_FULL
  ∙ Promising: for a vCPU dedicated to a core, reduces tick overhead
  ∙ Also improves the tick rate with multiple vCPUs
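The host-side isolation discussed above is usually configured on the kernel command line. A sketch that builds one; `nohz_full=`, `rcu_nocbs=`, `isolcpus=` and `irqaffinity=` are existing Linux boot parameters, but the choice of cores and the combination used here are assumptions to be tuned per platform, and `nohz_full=` requires a kernel built with CONFIG_NO_HZ_FULL.

```python
def cpu_list(cpus):
    """Format CPU ids as a kernel cpulist, e.g. [2, 3, 4, 7] -> '2-4,7'."""
    cpus = sorted(set(cpus))
    ranges, start = [], None
    for i, cpu in enumerate(cpus):
        if start is None:
            start = cpu
        if i + 1 == len(cpus) or cpus[i + 1] != cpu + 1:
            ranges.append(f"{start}-{cpu}" if cpu != start else f"{cpu}")
            start = None
    return ",".join(ranges)


def rt_host_cmdline(total_cpus, rt_cores):
    """Boot parameters isolating the cores that run dedicated RT vCPU threads."""
    rt = set(rt_cores)
    if 0 in rt:
        raise ValueError("keep CPU0 as a housekeeping core")
    isolated = cpu_list(rt)
    housekeeping = cpu_list(c for c in range(total_cpus) if c not in rt)
    return " ".join([
        f"nohz_full={isolated}",        # stop the tick when one task runs there
        f"rcu_nocbs={isolated}",        # offload RCU callbacks from those cores
        f"isolcpus={isolated}",         # keep general scheduler load off them
        f"irqaffinity={housekeeping}",  # default IRQ affinity: housekeeping only
    ])


print(rt_host_cmdline(8, range(2, 8)))
```

Device pass-through IRQs for a given vBBU would then be re-pinned explicitly to that vBBU's isolated core, since the default affinity keeps everything else on the housekeeping cores.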


Page 23: State of the Union: Open Source Network Function Virtualization

Thank you.
