
WHITEPAPER

© 2016

Published by

Overcoming OpenStack Obstacles in vCPE


Introduction

OpenStack is a leading candidate for cloud management in network virtualization. The open source software orchestrates virtual machines (VMs) running on virtualized infrastructure, which is critical for enabling network operators to spin up new instances of VMs and, ultimately, deliver new services and features to customers with more speed and flexibility. Widely deployed in enterprise IT environments, OpenStack has gained broad industry support from telecom network operators as well as Network Functions Virtualization (NFV) groups such as the European Telecommunications Standards Institute (ETSI) and Open Platform for NFV (OPNFV). Notable early OpenStack deployments include AT&T, Deutsche Telekom, NTT Group, SK Telecom and Verizon.

Despite the industry support and thriving open source community, OpenStack is a controversial, even divisive, technology among network operators. This is mainly because it was not originally designed for telecom networks and so does not meet the industry's stringent carrier-grade requirements, particularly in the areas of scalability, resiliency, performance, manageability and interoperability. While the community continues to address these carrier-grade concerns, the limitations have seeded doubts among some operators about the viability of OpenStack for virtualized network infrastructure.

Open source projects, such as OpenStack, KVM and Linux, are vital to the growth of network virtualization because of the speed with which a community approach can develop and enhance new technologies. But open source software alone cannot help network operators bring services to market more quickly or improve network reliability. Today's open source code needs to be hardened for commercial deployment to realize the full benefits of network virtualization.

Some of OpenStack's limitations are particularly problematic for the virtual customer premises equipment (vCPE) use case. These challenges came to the industry's attention when, in October 2015, Peter Willis, BT's chief researcher for data networks, publicly revealed six significant problems with OpenStack that threatened BT's plans for using the open source technology in vCPE deployments to serve business customers.

This paper reviews the six obstacles that BT identified and examines the solutions for overcoming them that have been developed by Wind River.


OpenStack Obstacles Threaten vCPE Use Case

BT has found a number of issues with OpenStack that affect its ability to implement vCPE for business customers. The problems BT raised relate to scalability, security, resiliency, network binding of VNFs, service chain modification, and backwards compatibility, and are discussed in more detail here.

Obstacle 1: Binding virtual network interface cards (vNICs) to virtual network functions (VNFs).

One of the advantages of NFV is the speed with which new services can be introduced for customers via on-demand provisioning. Launching a new service can be accomplished in a matter of minutes, rather than months, simply by instantiating a new VNF in a cloud environment. For business customers, vCPE enables operators to add new services and features quickly without having to engage in manual hardware configurations and expensive truck rolls to the premises.

But there is a hitch in what should be a smooth, efficient process for adding new services. Many VNFs require their vNICs to be identified in a specific order, yet off-the-shelf OpenStack distributions do not support an effective mechanism that informs the VNF what each vNIC's order and type are. As a result, VNFs can be connected to the wrong vNICs and, in some cases, the VNF can lock up completely. It is also difficult for an operator to verify that a VNF has been connected to the correct vNIC. Without a reliable and configurable enumeration mechanism, operators have little control over the process for binding vNICs to VNFs, which ultimately impairs their ability to deliver business services.
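To illustrate the kind of enumeration mechanism that is missing, the sketch below shows how a VNF boot script could bind interfaces deterministically if the orchestrator supplied a role tag in each vNIC's metadata (e.g. via a config drive). This is a hypothetical illustration, not Wind River's or OpenStack's actual mechanism; the `role` field and the helper function are invented for the example.

```python
# Hypothetical sketch: deterministic vNIC binding inside a VNF, assuming
# the orchestrator tags each NIC's metadata with a "role" the VNF expects.

def order_vnics(metadata_links, expected_roles):
    """Map each expected role (in order) to the NIC whose metadata
    declares that role; fail loudly if any role is missing."""
    by_role = {link["role"]: link for link in metadata_links}
    ordered = []
    for role in expected_roles:
        if role not in by_role:
            raise LookupError(f"no vNIC tagged with role {role!r}")
        ordered.append(by_role[role])
    return ordered

# Metadata as the guest might receive it, in arbitrary order:
links = [
    {"mac": "fa:16:3e:00:00:02", "role": "lan"},
    {"mac": "fa:16:3e:00:00:01", "role": "wan"},
    {"mac": "fa:16:3e:00:00:03", "role": "mgmt"},
]

# The VNF binds eth0/eth1/eth2 in its required order, regardless of how
# the NICs were enumerated by the platform:
for idx, link in enumerate(order_vnics(links, ["mgmt", "wan", "lan"])):
    print(f"eth{idx} -> {link['mac']} ({link['role']})")
```

With such a scheme, a wrongly ordered or missing vNIC is detected at boot instead of silently producing a mis-wired or locked-up VNF.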

Obstacle 2: Service Chain Modification.

Service chaining (also referred to as service function chaining) is an integral component of the enterprise vCPE use case. Service chaining is an age-old networking concept, but in the context of software-defined networking (SDN) the technique is used to automate the provisioning of applications and services. For business services, service chaining enables operators to launch new services and features easily, quickly and automatically via software, without the need for manually configuring hardware at the customer's premises.

The problem with OpenStack arises when operators need to make changes to the service chain. Modifying service chains using OpenStack as-is is simply too slow and cumbersome because the open source software does not support fast, dynamic reconfiguration. For example, to add a new business service, such as a WAN accelerator, to a service chain that already includes a router and firewall, an operator would need to delete the interface on the firewall and then reconnect it. This leads to unpredictable results: the reconnected interface may or may not work, the firewall may stop working altogether, and the process adds outage time. Alternatively, the operator would have to build a completely new service chain to accommodate the new service, resulting in a service outage that can last more than five minutes. Neither scenario for service chain modification using OpenStack is acceptable for vCPE services.

"Modifying service chains using OpenStack, as is, is simply too slow and cumbersome because the open source software does not support fast, dynamic reconfiguration."


Obstacle 3: Scalability of the OpenStack Controller.

Scalability is a significant factor in determining the cost of deploying enterprise vCPE networks. Operators need to know the precise scalable capacity of OpenStack-based control nodes, and how many compute nodes each one can support, in order to conduct a thorough cost analysis of the number of servers required for deployment, where they can be located, and what type of workloads they can handle. There are several options for where operators can install the compute and control nodes in a vCPE deployment, such as in their own central office or data center, or at the customer's premises. The number of servers needed affects where they can be located; for example, some operator premises may be too small to accommodate many servers.

The problem for vCPE deployments is that vanilla OpenStack distributions are not methodically tested to give operators reliable data on how well the software scales. Currently, operators have no certainty about the number of compute nodes that an OpenStack-based controller can support. This lack of information is a significant limitation in the early planning stages for vCPE networks. An operator cannot determine the appropriate scale for different deployment scenarios; for example, it is difficult, if not impossible, to determine when it is appropriate to deploy an OpenStack-based control node to support a large region, versus a smaller scale version for a branch office or a small town. Operators are forced to conduct their own costly and time-consuming testing, or risk the consequences of uncertainty. Both are unacceptable options, and both point toward using a commercial solution to overcome the shortcomings.
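To see why a validated capacity figure matters, the back-of-envelope calculation below converts an assumed controller capacity into a server count for two deployment scenarios. The capacity and node counts are hypothetical, purely for illustration; this is exactly the arithmetic an operator cannot do reliably without tested scalability data.

```python
# Illustrative capacity planning: how many control nodes does a deployment
# need, given a (hypothetical) validated compute-nodes-per-controller figure?
import math

def controllers_needed(compute_nodes: int, nodes_per_controller: int,
                       redundancy: int = 2) -> int:
    """Controllers required, with each controller deployed as a
    redundant group of `redundancy` nodes for resiliency."""
    groups = math.ceil(compute_nodes / nodes_per_controller)
    return groups * redundancy

# Hypothetical scenarios: a large region vs. a branch office,
# assuming each controller pair handles 100 compute nodes.
print(controllers_needed(1000, nodes_per_controller=100))  # 20
print(controllers_needed(8, nodes_per_controller=100))     # 2
```

The answer, and therefore the cost model and the choice of location, is only as good as the `nodes_per_controller` input, which is precisely what untested vanilla OpenStack leaves unknown.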

Obstacle 4: Start-Up Storms.

Service outages are every network operator's worst nightmare. Whether they provide vCPE-based services to businesses or services to consumers, operators pay dearly for service disruptions through financial losses and brand damage. A start-up storm, or stampede, occurs when a piece of network infrastructure fails and causes an outage, and then, when it is subsequently restored, all of the systems related to that infrastructure try to reconnect at the same time. It is critical that the infrastructure is robust enough to cope with start-up stampedes so that the system is not overwhelmed and services can be restored as quickly as possible.

[Figure: BT's OpenStack Challenge #4. Start-up storms (or stampedes): the controller must never be overloaded. The diagram shows one control node serving many compute nodes. Behavior has been characterized up to 50 nodes so far, with characterization in progress for higher node counts. Titanium Server systems engineering provides tuning and optimizations, and system controls ensure stability during Dead Office Recovery (DOR).]


In virtualized environments, the infrastructure must be just as resilient as legacy, hardware-based solutions in these scenarios. For example, in a vCPE deployment, when a fiber is cut and restored, there could be thousands of OpenStack-based compute nodes trying to attach to the OpenStack controller node at the same time.

The problem with OpenStack today is that it is not resilient under stampede conditions, which results in outages lasting longer because they cannot be resolved quickly. Typically, there are multiple SSH (secure shell) sessions per compute node, which makes the process of reattaching too slow and computationally intensive. Often, the OpenStack-based controller becomes overloaded and does not recover without manual intervention.
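One widely used, general-purpose mitigation for reconnection stampedes (not necessarily the approach Titanium Server takes) is randomized exponential backoff: each node waits an increasing, jittered interval before retrying, so the controller's load after an outage is spread over time instead of arriving all at once.

```python
# Full-jitter exponential backoff: a standard stampede mitigation.
# Attempt i waits a random time in [0, min(cap, base * 2**i)] seconds.
import random

def backoff_delays(attempts, base=1.0, cap=300.0, rng=None):
    """Return the jittered wait (in seconds) before each retry attempt."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]

# Two nodes restarting after the same outage draw different schedules
# (seeded here only to make the example reproducible), so their
# reconnection attempts do not all land on the controller simultaneously.
print(backoff_delays(5, rng=random.Random(1)))
print(backoff_delays(5, rng=random.Random(2)))
```

The cap keeps the worst-case wait bounded, while the doubling window rapidly thins out the retry rate as the controller works through the backlog.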

Obstacle 5: Securing OpenStack over the Internet.

Security is a paramount concern for network operators, especially when it comes to delivering business services to corporate customers. It is unthinkable for an operator to deploy a new software-based network that is less secure than the previous, traditional hardware-based implementation. But, based on BT's findings, that is exactly what operators will be doing if they rely on OpenStack for VM orchestration in vCPE scenarios.

In a typical vCPE deployment, there is a centralized OpenStack control plane and distributed compute nodes, usually deployed at the customer's premises. The link between the control and compute nodes needs to be secure, but sometimes that link is the public Internet. The problem with OpenStack in the vCPE scenario is that there are too many potential attack vectors, which makes the VM orchestration inherently insecure over the public Internet.

BT found in its NFV Lab that connecting a control and compute node over the Internet required a huge amount of reconfiguration of the firewall. To make the connection work securely, BT's lab engineers had to open more than 500 pinholes, or ports, in the firewall, including ports for virtual network computing (VNC) and SSH for command line interfaces. In addition, every time the compute node's dynamic IP address changed, the firewall had to be reconfigured. Firewall reconfiguration is not only a tedious task but also a risky activity, because it can potentially leave the firewall, as well as other VNFs and services, open to malicious attacks.

Given the amount of firewall configuration required in a vCPE scenario with centralized control and distributed compute nodes, OpenStack cannot be sufficiently secured over the public Internet.

"The problem with OpenStack in the vCPE scenario is that there are too many potential attack vectors, which makes the VM orchestration inherently insecure over the public Internet."


Obstacle 6: Backwards Compatibility between OpenStack Versions.

In a distributed NFV deployment like vCPE, both the compute nodes and the control nodes are required to run the same version of OpenStack. Incompatible versions of OpenStack will cause problems in the telco cloud and potentially lead to service outages. OpenStack has a new release every six months. If an operator has a large-scale deployment with thousands of distributed compute nodes and wants to stay up to date with the latest OpenStack release, it will have to upgrade each of the nodes manually, which is expensive and time consuming, and increases the risk of disrupting services. Indeed, it could take weeks for an operator to migrate its entire cloud environment to the latest OpenStack release, which is an unreasonable amount of time. It is equally unacceptable that there could be service disruptions during upgrades due to system reboots.

While there are tools and guides available from the OpenStack community that can help with checking compatibility and API versioning, these are relatively new and do not fully address the problem for telcos. OpenStack does not provide a solution that can reliably and automatically ensure efficient upgrades, as well as version compatibility, across a network operator's entire cloud environment. OpenStack, as it is today, does not support version compatibility that is robust enough for telecom network operators.
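As a sketch of the kind of automated check such a solution would need, the snippet below models a simple N/N+1 version-skew policy, in the spirit of the OpenStack community's rolling-upgrade convention: during an upgrade, a controller may run at most one release ahead of its compute nodes. The policy and helper are assumptions for illustration; the release names are real OpenStack series of that era, used only as example data.

```python
# Illustrative version-skew gate for a rolling upgrade (assumed policy,
# not an actual OpenStack or Titanium Server API).
RELEASES = ["kilo", "liberty", "mitaka", "newton", "ocata"]

def skew_ok(controller: str, compute: str, max_skew: int = 1) -> bool:
    """True if the controller release is the same as, or at most
    `max_skew` releases ahead of, the compute release."""
    delta = RELEASES.index(controller) - RELEASES.index(compute)
    return 0 <= delta <= max_skew

print(skew_ok("mitaka", "liberty"))  # True: one release of skew is fine
print(skew_ok("newton", "liberty"))  # False: two releases of skew
print(skew_ok("liberty", "mitaka"))  # False: compute ahead of control
```

An upgrade orchestrator would run a gate like this against every compute node before and during the rollout, refusing to proceed whenever any node would fall outside the supported skew window.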


Optimization Removes OpenStack Obstacles for vCPE

The challenges BT raised are significant impediments to vCPE deployment and raise legitimate doubts about OpenStack's viability for virtualized network infrastructure in general. But open source software is not a panacea for network virtualization: open source code needs to be tuned and optimized to meet carrier resiliency and reliability requirements.

As a leading supplier of infrastructure software platforms for network virtualization, Wind River has developed and deployed solutions for OpenStack that resolve each of the obstacles discussed above. These solutions have been implemented in the Titanium Server and Titanium Server CPE virtualization platforms. The latter is a two-node configuration of Titanium Server designed for on-premises equipment; it delivers all of the reliability and performance of Titanium Server in a much smaller footprint.

Solution 1: Simplify VNF Initiation.

There are easier and more controllable ways to configure vNIC binding to VNFs. One option is to allow the VNFs to enumerate the vNIC binding order, which can be configured prior to launching the VNF. Titanium Server then ensures that the vNICs are connected to the correct networks. This solution not only guarantees efficiency and reliability in the binding process, but also improves the portability of VNFs, so that operators can initiate the same VNFs in different cloud environments without having to customize them for each and every connection.

Another option is to define vNIC binding in OpenStack HEAT templates. A HEAT template specifies relationships between resources to manage NFV infrastructure. By configuring the vNIC ordering requirements for VNFs in HEAT templates specific to each VNF, VNF initiation is more precise, less complicated and repeatable. This approach simplifies the initiation process for multiple VNFs, each of which could have different vNIC enumeration requirements.
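As a rough illustration of the idea, the snippet below models a per-VNF HEAT template as a Python dict (real HEAT templates are YAML): in a HEAT `OS::Nova::Server` resource, the order of entries in the `networks` list determines the vNIC order the guest sees. The network, image, and resource names here are hypothetical.

```python
# Illustrative per-VNF HEAT template, modeled as a Python dict.
# The `networks` list order on the server resource fixes the vNIC order:
# the first port becomes eth0, the second eth1, and so on.
vnf_template = {
    "heat_template_version": "2015-04-30",
    "resources": {
        "mgmt_port": {"type": "OS::Neutron::Port",
                      "properties": {"network": "mgmt-net"}},
        "wan_port":  {"type": "OS::Neutron::Port",
                      "properties": {"network": "wan-net"}},
        "lan_port":  {"type": "OS::Neutron::Port",
                      "properties": {"network": "lan-net"}},
        "vnf": {
            "type": "OS::Nova::Server",
            "properties": {
                "image": "firewall-vnf",
                "flavor": "m1.small",
                # List order here is the vNIC order the VNF requires.
                "networks": [
                    {"port": {"get_resource": "mgmt_port"}},
                    {"port": {"get_resource": "wan_port"}},
                    {"port": {"get_resource": "lan_port"}},
                ],
            },
        },
    },
}

# The per-VNF template makes the required ordering explicit and repeatable:
ports = [n["port"]["get_resource"]
         for n in vnf_template["resources"]["vnf"]["properties"]["networks"]]
print(ports)  # ['mgmt_port', 'wan_port', 'lan_port']
```

Because each VNF carries its own template, a VNF with a different enumeration requirement simply ships a template with a different `networks` ordering; nothing about the launch workflow changes.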

Both options allow VNFs to boot onto Titanium Server without network operators having to modify any of the VNFs, which reduces the time and complexity involved in initiating VNFs. This is particularly advantageous for vCPE when operators want to add multiple new features or services at the same time for business customers.

"Open source software is not a panacea for network virtualization. Open source code needs to be tuned and optimized to meet carrier resiliency and reliability requirements."


[Figure 1: BT's OpenStack Challenge #2. Service chain modification: OpenStack has no primitives for fast reconfiguration. Option 1: orchestrate the service chain update using OpenStack within Titanium Server, accelerated by a dedicated HEAT stack for each service. Starting from an initial chain of Services A, B and C (HEAT Stacks A, B and C), adding a new Service N either at the end of the chain or in the middle requires only two operations: change Stack C and create Stack N. Option 2: reconfigure Titanium Server vSwitch flows using SDN. Either way, a new service is added in seconds, versus weeks or months today.]

Solution 2: Simplify Service Chain Modification.

The process for modifying service chains in vCPE deployments can be accelerated dramatically, so that operators can quickly reconfigure and launch new services for business customers. The optimal solution is to leverage Titanium Server and assign a separate HEAT stack within OpenStack to each service, rather than assigning a single HEAT stack to the entire service chain. Dedicating a HEAT stack to each service means fewer changes to the service chain when making modifications.

For example, consider an enterprise customer that wants to start using a WAN accelerator service in addition to the firewall, router and malware detection services it already uses. As Figure 1 illustrates, the operator would need to make only two changes to the service chain: modify HEAT stack C (which is associated with the last service in the chain) and create HEAT stack N for the WAN accelerator. Furthermore, if an operator wanted to add the new service to the middle of the chain, that too would require just two changes, as the figure shows.
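The bookkeeping behind this can be sketched in a few lines (an illustrative model, not Titanium Server code): with one stack per service, inserting a service touches only the predecessor's stack, whose egress link is re-pointed, plus the newly created stack, wherever the insertion lands in the chain.

```python
# Illustrative model of per-service stacks in a chain: inserting a service
# changes exactly two stacks, regardless of chain length or position.
def insert_service(chain, new, position):
    """Insert `new` at `position` and report which per-service stacks
    must change: the predecessor (re-pointed to `new`) and `new` itself."""
    changed = []
    if position > 0:
        changed.append(chain[position - 1])  # its egress link is re-pointed
    changed.append(new)                      # the new stack is created
    return chain[:position] + [new] + chain[position:], changed

chain = ["router", "firewall", "malware-detection"]

# Append a WAN accelerator at the end of the chain: two changes.
chain2, changed = insert_service(chain, "wan-accelerator", len(chain))
print(changed)  # ['malware-detection', 'wan-accelerator']

# Insert it mid-chain instead: still only two changes.
chain3, changed = insert_service(chain, "wan-accelerator", 1)
print(changed)  # ['router', 'wan-accelerator']
```

Had the whole chain been one stack, any insertion would mean rebuilding or updating that single monolithic stack, which is exactly the slow, outage-prone path described above.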

Alternatively, operators could reconfigure the virtual switch (vSwitch) flows in Titanium Server using SDN. Both solutions reduce the time it takes to reconfigure a business customer's service package from months, as it is today, to seconds.

"Both solutions reduce the time it takes to reconfigure a business customer's service package from months, as it is today, to seconds."


[Figure: BT's OpenStack Challenge #3. Scalability of the controller(s) to support hundreds of compute nodes. Three configurations are shown: a small-scale solution of two nodes, each combining compute, control and storage (Titanium Server CPE, an ideal configuration for vCPE and enterprise edge use cases); a frame-level solution of 4 to 30 nodes with dedicated control and storage nodes; and a large-scale solution of hundreds of compute nodes across multiple top-of-rack switches, served by redundant control and storage nodes.]

Solution 3: Validate the Scalability of OpenStack.

The key to overcoming OpenStack's scalability issues is rigorous testing. Titanium Server has been validated to scale up to hundreds of nodes as well as down to just two nodes, which is unique among commercial NFV infrastructures. To prove the scalability of the system, Titanium Server was first tested on real hardware and, when the scale grew too large, using simulation techniques. Then it was tested for scaling down to two nodes. For any bugs detected in the testing process, patches were developed and implemented to ensure the scalability of the servers. The patches have also been shared upstream with the OpenStack community.

Rigorous testing on real hardware and via simulations, along with software enhancements and optimization, removes any doubt operators may have about Titanium Server's scalability.

The ability to scale down to two redundant OpenStack-based nodes is just as significant as being able to scale up to hundreds of nodes, especially for vCPE deployments at the edge of operator networks, where supporting multiple large servers is resource intensive. Titanium Server CPE provides compute, control and storage functionality on each node, which creates a fully redundant system that can be deployed at an enterprise customer site using just two nodes for vCPE services. Virtual machines can run on both nodes in this two-node scenario. This delivers significant capex and opex savings compared to the four or five nodes needed by an enterprise solution.

Through tuning, optimization and various OpenStack plug-ins, Titanium Server CPE offers operators flexibility in deployment size as well as certainty about the system's scalability.


Solution 4: Build Resiliency to Cope with Start-Up Storms.

OpenStack can be optimized to cope with start-up storms, ensuring that the OpenStack controller is never overloaded and manual intervention is never needed. This is another case where robust testing of Titanium Server has verified the systems engineering and tuning that make OpenStack more resilient for telco operations, particularly in vCPE deployment scenarios.

Titanium Server has been systematically reviewed and tested to address OpenStack's vulnerability in start-up stampedes as well as Dead Office Recovery (DOR) conditions, in which all power is lost to a facility that hosts servers. In the DOR scenario, Titanium Server was tested using a 50-node system. Myriad race conditions were simulated, such as powering off all the nodes at the same time and turning them back on at the same time, or deliberately overloading specific parts of the system to find weak points.

Tests prove that OpenStack can withstand start-up storms or DOR events and come back, fully restored, without time-consuming manual intervention.

Solution 5: Secure OpenStack over the Internet.

The simplest, and most elegant, solution for securing OpenStack over the public Internet is not to use the public Internet. That is, in a vCPE deployment, distribute the OpenStack control and compute nodes together so that they do not have to communicate over the Internet.

In Titanium Server CPE, the security issue with OpenStack is eliminated because the platform is designed to distribute compute and control out to the edge of the network at the customer's premises. Titanium Server's design ensures that the control and compute nodes do not need to communicate over the public Internet because they are both in a secure location at the customer premises. The NFV infrastructure still needs to communicate with the orchestration layer, however, and that is handled by leveraging standard IT security techniques, such as VPNs and firewalls, which are likely to be already in place at the customer premises.

Rather than having centralized control and distributed compute, Titanium Server CPE has centralized orchestration with low-cost distributed control and compute, which provides security and high reliability in a small-footprint solution for vCPE deployments.

Solution 6: Enable Hitless Upgrades to Ensure Compatibility between OpenStack Versions.

Network operators can keep their clouds and vCPE services up and running even when upgrading to new versions of OpenStack. But they won't find these capabilities in vanilla, off-the-shelf OpenStack. Achieving efficient, carrier-grade reliability in OpenStack upgrades requires optimization and expertise. Titanium Server features a comprehensive upgrade solution that includes hitless upgrades, live migrations and hot patching.

In a hitless upgrade, the NFV infrastructure does not have to be taken down and rebooted to complete an upgrade to a new version of OpenStack or a new version of Titanium Server. The vCPE services and VNFs remain live during the upgrade, so that there is no impact to the business customer's services. Since the VMs are migrated live, there is no service downtime. Titanium Server also supports hot patching for minor updates, which can be deployed on a running system and automatically loaded onto all of the nodes in the system.
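The overall flow of such a rolling, hitless upgrade can be sketched as follows (a simplified illustration, not Titanium Server's actual procedure; node and VM names, and the version string, are invented): VMs are live-migrated off each node before it is upgraded, so no VM is ever down while the cluster moves to the new release.

```python
# Simplified rolling-upgrade sketch: drain each node via live migration,
# upgrade it, then move on. Capacity checks and failure handling omitted.
def rolling_upgrade(nodes, new_version):
    """`nodes` maps node name -> list of running VMs. Returns the log."""
    log = []
    for node in list(nodes):
        vms = nodes[node]
        if vms:
            # Pick any other node as the live-migration target.
            target = next(n for n in nodes if n != node)
            nodes[target].extend(vms)
            nodes[node] = []
            log.append(f"migrated {len(vms)} VM(s) from {node} to {target}")
        log.append(f"upgraded {node} to {new_version}")
    return log

cluster = {"compute-0": ["fw-vnf", "router-vnf"], "compute-1": ["wan-acc"]}
for line in rolling_upgrade(cluster, "titanium-r4"):
    print(line)
```

At every step all VMs are running somewhere in the cluster, which is the essential property of a hitless upgrade; a production implementation would additionally check capacity, enforce the allowed version skew between nodes, and handle migration failures.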


Conclusion

OpenStack was not designed for telecom networks, and the open source code has yet to meet all of the requirements for resiliency and reliability that operators demand. Getting OpenStack to that level requires a significant investment in time and resources. However, that does not mean OpenStack cannot be deployed today. A solution based on open standards and open source software, hardened for commercial products, meets carriers' needs for interoperability and avoiding vendor lock-in, while also delivering the flexibility, performance and reliability they require.

Wind River Titanium Server fills the carrier-grade gaps in OpenStack, making it fit for purpose for VM orchestration for NFV. Based on open standards and open software, and supported by an ecosystem of leading technology providers, Titanium Server is suited to vCPE scenarios and ready for commercial deployment today. With Titanium Server, OpenStack is not an obstacle to vCPE.


Produced by the mobile industry for the mobile industry, Mobile World Live is the leading multimedia resource that keeps mobile professionals on top of the news and issues shaping the market. It offers daily breaking news from around the globe. Exclusive video interviews with business leaders and event reports provide comprehensive insight into the latest developments and key issues, all enhanced by incisive analysis from our team of expert commentators. Our responsive website design ensures the best reading experience on any device, so readers can keep up to date wherever they are.

We also publish five regular eNewsletters to keep the mobile industry up to speed: the Mobile World Live Daily, plus weekly newsletters on Mobile Apps, Asia, Mobile Devices and Mobile Money.

What's more, Mobile World Live produces webinars, the Show Daily publications for all GSMA events, and Mobile World Live TV, the award-winning broadcast service of Mobile World Congress and exclusive home to all GSMA event keynote presentations.

Find out more at www.mobileworldlive.com

About Wind River

A global leader in delivering software for intelligent connected systems, Wind River® offers a comprehensive, end-to-end portfolio of solutions ideally suited to address the emerging needs of IoT, from secure and managed intelligent devices at the edge, to the gateway, into the critical network infrastructure, and up into the cloud. Wind River technology is found in nearly 2 billion devices and is backed by world-class professional services and award-winning customer support.