
JOURNAL OF LIGHTWAVE TECHNOLOGY, VOL. 35, NO. 6, MARCH 15, 2017

Flexible Architecture and Autonomous Control Plane for Metro-Scale Geographically Distributed Data Centers

Payman Samadi, Senior Member, IEEE, Matteo Fiorani, Member, IEEE, Yiwen Shen, Student Member, IEEE, Lena Wosinska, Senior Member, IEEE, and Keren Bergman, Fellow, IEEE, Fellow, OSA

(Invited Paper)

Abstract—Enterprises and cloud providers are moving away from deployment of large-scale data centers and towards small- to mid-sized data centers because of their lower implementation and maintenance costs. An optical metro network is used to provide connectivity among these data centers. The optical network requires flexibility on bandwidth allocation and various levels of Quality of Service to support the new emerging applications and services, including the ones enabled by 5G. As a result, next generation optical metro networks face complex control and management issues that need to be resolved with automation. We present a converged inter/intra data center network architecture with an autonomous control plane for flexible bandwidth allocation. The architecture supports both single-rate and multi-rate data planes with two types of physical layer connections (Background and Dynamic) that provide connections with strict bandwidth and latency requirements. We demonstrate autonomous bandwidth steering between two data centers on our prototype. Leveraging a simulation platform, we show up to 5× lower transmission times and 25% less spectrum usage compared with single-rate conventional non-converged networks. This is a significant improvement in the data center network performance and energy efficiency.

Index Terms—Autonomous control plane, data center networks, network provisioning strategy, optical communication, optical metro networks, quality of service (QoS), software-defined networking (SDN), traffic monitoring.

I. INTRODUCTION

Big Data analytics and cloud computing have drastically increased the deployment of data centers ranging in number of servers from a few hundred to a million.

Manuscript received October 31, 2016; revised January 9, 2017; accepted January 10, 2017. Date of publication January 15, 2017; date of current version March 24, 2017. This work was supported in part by the CIAN NSF ERC (EEC-0812072), NSF NeTS (CNS-1423105), the DoE Advanced Scientific Computing Research under Turbo Project DE-SC0015867, and in part by the Swedish Research Council (VR).

P. Samadi, Y. Shen, and K. Bergman are with the Lightwave Research Laboratory, Department of Electrical Engineering, Columbia University, New York, NY 10024 USA (e-mail: [email protected]; [email protected]; [email protected]).

M. Fiorani and L. Wosinska are with the Optical Networks Laboratory, School of ICT, KTH Royal Institute of Technology, Stockholm 114 28, Sweden (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JLT.2017.2652480

Considering the construction and maintenance costs along with operation reliability, enterprises and cloud providers are moving towards deploying small- to mid-sized data centers rather than the more costly large-scale data centers [1]. These smaller size data centers are generally located at metro-scale distances, with optical networks providing the connectivity among them.

Increased utilization of optical metro networks is foreseeable as the upcoming 5th generation of mobile networks (5G) [2] is expected to exploit general purpose data centers to provide virtualized network services, rather than relying only on customized hardware systems. More specifically, 5G systems will benefit from a distributed data center approach, to bring the services closer to the end-users and to satisfy the tight delay requirements imposed by new critical applications [3].

Traffic trends predict that the metro network traffic will grow twice as fast as long-haul traffic between 2014–2019 and will account for more than 60% of the total IP traffic [4]. This substantial growth means that most of the traffic that enters the metro networks will stay local and will not enter the long-haul backbone networks. This trend is due to the growing adoption of Content Delivery Networks (CDN) that use the metro network to access and distribute media, such as video streaming, to their clients [5].

New emerging cloud-based and 5G systems will introduce services with strict bandwidth and latency requirements. Some of these services, such as remote control of industrial systems [6] and migration of virtualized Evolved Packet Cores (vEPCs) [7], require strict end-to-end latency. Other services, including data replication and backup, demand large bandwidths. Current optical networks are static, with fixed bandwidth provisioned throughout the network. In these static networks, it can take days or even a month to add or drop a new connection in the physical layer. The advent of Software-Defined Networking (SDN) has paved the way to move away from static networks to flexible networks. SDN is enabled by the separation of data plane and control plane and by remote management of network components from the higher layers, a feature that makes it possible to achieve dynamic networks and support connections with different bandwidth and latency requirements. In addition, given the inevitable complexity introduced in next-generation networks, human-operated networks will be



inefficient and error-prone. Autonomous and cognitive network management and provisioning can be the solution.

In this paper, we present a software-defined autonomous optical network for metro data centers. This architecture enables scaling up data centers in distance to support dynamic load variations. It also supports distribution of applications over multiple data centers to improve the service reliability and performance. Our proposed converged inter/intra architecture i) supports single-rate [8] and multi-rate dynamic optical data planes, ii) enables connections with various QoS through two types of links, namely background and dynamic, and iii) manages connections in both explicit operation (requested by the operator) and implicit operation (automated assignment based on the traffic characteristics) by the SDN control plane. We built a prototype consisting of three data center nodes, two fully emulated and one implemented only in the control plane. Experimental evaluations of the control plane capabilities are presented. We have also developed a simulation platform and evaluated the control strategy on a realistic metro network consisting of 38 nodes and 59 optical links [3].

In the rest of this paper, we first review related works on dynamic inter data center networks and architectures that enable bandwidth on demand. In Section III, we discuss our architecture, including the optical data plane and the SDN control plane. Section IV presents the simulation model and the numerical analysis. The prototype and the experimental results are presented in Section V. Section VI is a discussion of the application, scalability, and agility of our proposed architecture, and Section VII presents our conclusion.

II. RELATED WORKS

Dynamic optical networks providing efficient interconnection of metro data centers have been recently investigated in the literature [9]–[14]. [9] discusses the traffic growth trends and capacity demands in Google's optical inter data center networks. It identifies autonomous network management as one of the key solutions to meet future capacity demands and satisfy the requirements of new critical distributed applications. In [10], an elastic optical inter data center network architecture and corresponding resource allocation strategies for efficient support of cloud and big data applications are investigated. The work in [11] presents a flexible optical metro node architecture and discusses a method to provide dynamic service provisioning among metro data centers. The solutions proposed in [10], [11] are not compared with conventional metro and data center networks, and thus it is not possible to evaluate the potential benefits in terms of transfer times and resource usage.

Research works in [12]–[15] explored new optical network architectures that allow setting up direct lightpaths between racks in different metro data centers. This approach makes it possible to reduce the data transfer time among different metro data centers and potentially provide better resource usage with respect to conventional networks. However, the architectures proposed in [12], [13] rely on optical burst/packet switching and require expensive network components (e.g., semiconductor optical amplifiers and tunable wavelength converters) that might not be suitable for metro networks. The architecture proposed in [14]

instead relies on a combination of Wavelength, Time and Spatial Division Multiplexing (TDM/WDM/SDM) technologies, which might also result in high network costs using currently available network components.

III. NETWORK ARCHITECTURE

Our proposed network architecture is a converged inter/intra data center network that provides static data center-to-data center (background) and dynamic rack-to-rack and/or pod-to-pod (dynamic) connections. In this section, the single-rate and multi-rate data plane, the control plane with explicit connection requests and autonomous bandwidth steering, and the network provisioning algorithm are discussed.

A. Data Plane

Our proposed converged network architecture is shown in Fig. 1. The intra data center connectivity is through an Electrical Packet Switched (EPS) network. The inter data center connectivity is via the optical gateway [16], which is a Colorless, Directionless, Contentionless Reconfigurable Optical Add/Drop Multiplexer (CDC-ROADM) with a few hundred ports. Two types of connections are supported in this architecture: i) Background connections (shown in red) from the EPS network and ii) Dynamic connections (shown in green) directly from the racks/pods. In the single-rate scenario, all connections are at the same rate; in the multi-rate scenario, dynamic connections are at different rates (10G, 40G, 100G, etc.).

Background connections are shared by several traffic flows to/from different ToR switches. Consequently, the available bandwidth for each flow depends on how many flows share the same background connection; thus, each flow can experience a relatively small bandwidth. For this reason, background connections are utilized for short-lived and low data-rate traffic flows that have no strict requirement in terms of bandwidth, such as control information among servers running a distributed application. A set of background connections is always active to provide basic connectivity among all the distributed data centers.

Dynamic connections are dedicated to performing large data transfers between racks in different data centers. In the multi-rate scenario, the dynamic connections always employ the highest available data rate for the transmission between the racks. In this way, the data transfers can be performed efficiently and completed in a short time. At high network loads, it might happen that the setup of a dynamic connection is blocked due to the lack of wavelength resources. When this happens, the setup of the dynamic connection is rescheduled at a later time, which increases the total time required to complete the data transfer. This might negatively affect some critical data transfers, such as the migration of virtualized 5G network services. For this reason, we assign a priority to each dynamic connection and employ a control algorithm to efficiently manage the priority levels. The control algorithm is described in the next subsection.

The direct rack-to-rack and/or pod-to-pod connections in this architecture also enable transparent data center to data center connections without Electrical/Optical/Electrical conversions. Optical metro-scale transceivers supporting a transparent reach of up to 80 km provide the direct connections at low cost.
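To make the two connection types concrete, the sketch below models the connection descriptor a controller might track for each background or dynamic link; the field names and the ConnectionType/Priority enums are illustrative assumptions rather than the paper's implementation.

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConnectionType(Enum):
    BACKGROUND = "background"   # shared, EPS-to-EPS, always-on basic connectivity
    DYNAMIC = "dynamic"         # dedicated rack-to-rack / pod-to-pod lightpath


class Priority(Enum):
    BULK = 0       # delay-tolerant, e.g., bulk data transfer
    CRITICAL = 1   # delay-sensitive, e.g., vEPC migration


@dataclass
class Connection:
    src_dc: str                     # source data center, e.g. "DC1"
    dst_dc: str                     # destination data center, e.g. "DC2"
    conn_type: ConnectionType
    rate_gbps: int                  # 10, 40, 100, ... (multi-rate applies to dynamic links)
    priority: Priority = Priority.BULK
    wavelength: Optional[int] = None  # assigned by the RWA module when the lightpath is set up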


Fig. 1. Proposed converged inter/intra data center network architecture supporting applications/services with various QoS requirements via background and dynamic connections.

Fig. 2. SDN control plane workflow with explicit and autonomous (implicit) connection requests.

B. Control Plane

The SDN control plane architecture is shown in Fig. 2. The metro network controller consists of the network optimizer, topology manager, and the network database. The data center controller has the traffic monitoring and the topology manager modules. The traffic monitoring module receives the rack-to-rack traffic information from the OpenFlow switches. The network optimizer determines the connections that need to be established and utilizes the Routing and Wavelength Assignment (RWA) module to find the available path and wavelength. The topology manager receives the configurations from the network optimizer and establishes/removes connections. The network database knows the network topology and keeps track of the used paths and wavelengths. In our previous work [16], we used Redis for the inter-controller and application-to-control-plane communications; however, REST connections can be used as well. In our prototype presented in Section V, for simplicity we implemented all controllers in one RYU instance as separate modules.
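As an illustration of how the traffic monitoring module can poll rack-to-rack flow counters, the sketch below uses RYU's standard flow-statistics request/reply mechanism; the 2 s polling interval matches the prototype in Section V, while the byte-rate bookkeeping and the hand-off to the network optimizer are our own assumptions, not the authors' exact code.

from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, DEAD_DISPATCHER, set_ev_cls
from ryu.lib import hub


class TrafficMonitor(app_manager.RyuApp):
    """Periodically polls OpenFlow flow counters on the ToR/Pod switches."""

    def __init__(self, *args, **kwargs):
        super(TrafficMonitor, self).__init__(*args, **kwargs)
        self.datapaths = {}          # dpid -> datapath of each ToR/Pod switch
        self.prev_bytes = {}         # (dpid, match) -> byte count at the previous poll
        self.monitor_thread = hub.spawn(self._monitor)

    @set_ev_cls(ofp_event.EventOFPStateChange, [MAIN_DISPATCHER, DEAD_DISPATCHER])
    def _state_change_handler(self, ev):
        # Track switches as they connect to / disconnect from the controller.
        dp = ev.datapath
        if ev.state == MAIN_DISPATCHER:
            self.datapaths[dp.id] = dp
        elif ev.state == DEAD_DISPATCHER:
            self.datapaths.pop(dp.id, None)

    def _monitor(self):
        # Poll the flow counters every 2 s, as in the prototype of Section V.
        while True:
            for dp in self.datapaths.values():
                parser = dp.ofproto_parser
                dp.send_msg(parser.OFPFlowStatsRequest(dp))
            hub.sleep(2)

    @set_ev_cls(ofp_event.EventOFPFlowStatsReply, MAIN_DISPATCHER)
    def _flow_stats_reply_handler(self, ev):
        dpid = ev.msg.datapath.id
        for stat in ev.msg.body:
            key = (dpid, str(stat.match))
            delta = stat.byte_count - self.prev_bytes.get(key, 0)
            self.prev_bytes[key] = stat.byte_count
            rate_gbps = delta * 8 / 2 / 1e9   # bytes over a 2 s window -> Gbps
            # rate_gbps would then be handed to the network optimizer, which
            # aggregates per-DC-pair totals and per-flow deviations (see below).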

Data centers are divided into different subnets based on their size, e.g., data center 1 has the 10.1.x subnet, data center 2 the 10.2.x subnet, etc. ToRs and/or Pods are OpenFlow switches with permanent flow rules for global inter data center connectivity through background connections. The control plane periodically monitors the flow counters via the traffic monitoring module and calculates the total data center-to-data center traffic and the Standard Deviation (SD) of each Rack-to-Rack and/or Pod-to-Pod connection. Based on these two metrics over time, a new background or dynamic connection (implicit) is established or removed. Machine learning tools can be used to optimize the frequency of reconfiguration and the thresholds using historical data. Explicit dynamic connection requests are made from the application layer directly to the network optimizer module. The details of the network optimization algorithm for both explicit and implicit connections are as follows.

The algorithm for the implicit operation is shown in Algorithm 1. The algorithm is designed for a multi-rate converged network with two transmission rates and two levels of priority. At the beginning, a set of static background connections is established to provide all-to-all connectivity among the metro data centers. The control plane measures the traffic on the background connections. If the traffic on background connection B between source and destination data centers (DC_S and DC_D) is higher than a specified threshold, the controller runs an RWA to identify the possibility of establishing a new lightpath between DC_S and DC_D. In the positive case, the controller calculates the SD of the traffic of the different flows in B. If the SD is ≤ 1, the controller creates a new background connection between DC_S and DC_D. On the other hand, if the SD is > 1, the controller creates a new dynamic connection for carrying the flows between the Racks/Pods generating the highest traffic. The controller creates the new dynamic connection using the highest available data rate in the network. When using the highest data rate, the controller also makes sure that the distance between DC_S and DC_D is not higher than the optical reach of the transceivers. The new dynamic connection is assigned priority 0 and marked as Bulk in case the connection carries the traffic of a delay-tolerant application (i.e., bulk data transfer). On the other hand, the new dynamic connection is assigned priority 1


Algorithm 1: Control algorithm for implicit operation.
Create a set of static Background connections;
while Time < Simulation time do
    Monitor the active Background connections;
    if Traffic on Background B DC_S–DC_D > Threshold then
        Run RWA on topology database DB (k-SP FF);
        if ∃ lightpath DC_S–DC_D then
            if SD of traffic on B ≤ 1 then
                Create new Background DC_S–DC_D;
            else if SD of traffic on B > 1 then
                if Distance DC_S–DC_D > Max Distance then
                    Create new low-rate Dynamic DC_S–DC_D;
                else if Distance DC_S–DC_D ≤ Max Distance then
                    if High-rate transceiver free on source/dest. racks then
                        Create new high-rate Dynamic DC_S–DC_D;
                    else if New connection has Pr. 1 (Critical) then
                        Move a Dynamic Pr. 0 (Bulk) to Background;
                        Create new high-rate Dynamic DC_S–DC_D;
                    else if New connection has Pr. 0 (Bulk) then
                        Create new low-rate Dynamic DC_S–DC_D;
                    end if
                end if
            end if
        end if
    end if
end while

and marked as Critical if it is associated with a delay-sensitive application (i.e., vEPC migration). If the new dynamic connection has priority 1 (Critical), the controller can force an active dynamic connection with priority 0 (Bulk) to move to a lower data rate or to a background connection.
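A compact Python rendering of the implicit control loop in Algorithm 1 is sketched below; the helper functions (measure_traffic, run_rwa, create_connection, and the rest) are assumed interfaces to the monitoring, RWA, and topology manager modules rather than the authors' actual code.

from statistics import pstdev

SD_THRESHOLD = 1.0   # SD split between adding a background or a dynamic connection


def implicit_step(bg_connections, traffic_threshold_gbps, max_distance_km):
    """One monitoring iteration of the implicit control loop (sketch of Algorithm 1)."""
    for bg in bg_connections:                        # monitor active background connections
        flows = measure_traffic(bg)                  # {(src_rack, dst_rack): rate in Gbps}
        if sum(flows.values()) <= traffic_threshold_gbps:
            continue                                 # background connection not saturated
        lightpath = run_rwa(bg.src_dc, bg.dst_dc)    # k-shortest paths, first-fit wavelength
        if lightpath is None:
            continue                                 # no wavelength resources available
        if pstdev(flows.values()) <= SD_THRESHOLD:
            # Traffic is spread evenly: add another shared background connection.
            create_connection(bg.src_dc, bg.dst_dc, kind="background", path=lightpath)
        else:
            top_flow = max(flows, key=flows.get)     # rack pair generating the most traffic
            if distance(bg.src_dc, bg.dst_dc) > max_distance_km:
                create_connection(*top_flow, kind="dynamic", rate="low", path=lightpath)
            elif high_rate_transceiver_free(top_flow):
                create_connection(*top_flow, kind="dynamic", rate="high", path=lightpath)
            elif priority_of(top_flow) == 1:         # Critical: preempt a Bulk dynamic link
                move_bulk_dynamic_to_background(bg.src_dc, bg.dst_dc)
                create_connection(*top_flow, kind="dynamic", rate="high", path=lightpath)
            else:                                    # Bulk: fall back to a low-rate lightpath
                create_connection(*top_flow, kind="dynamic", rate="low", path=lightpath)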

The algorithm for the explicit operation was proposed in our previous work [8]. This algorithm is designed for single-rate converged networks and aims at minimizing the blocking probability for dynamic connections with high priority. According to this algorithm, when the metro network controller receives a new explicit dynamic connection request (R) between source data center DC_S and destination data center DC_D, it runs an RWA algorithm. If the RWA identifies an available lightpath between DC_S and DC_D, then R is served using that lightpath. In our work, R can have priority 0 (for bulk applications) or 1 (for critical applications). If the RWA does not find an available lightpath and R has priority 0, then R is rescheduled at a later time. If the RWA does not find an available lightpath and R has priority > 0, the algorithm tries to move an active connection (R′) with lower priority to a background connection. In this way, the optical resources occupied by R′ can be released and utilized to serve R. Note that this causes an increase in the transfer time for R′, since over a background connection it shares the channel capacity with other traffic.
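The explicit operation from [8] can be sketched as follows for the single-rate case; the request object and the helper calls (run_rwa, schedule_retry, find_active_dynamic, move_to_background) are illustrative assumptions.

def handle_explicit_request(request, reschedule_after_s=60):
    """Serve an explicit dynamic connection request R (sketch of the policy in [8])."""
    lightpath = run_rwa(request.src_dc, request.dst_dc)
    if lightpath is not None:
        create_connection(request.src_dc, request.dst_dc, kind="dynamic", path=lightpath)
        return True
    if request.priority == 0:                        # Bulk: simply try again later
        schedule_retry(request, delay_s=reschedule_after_s)
        return False
    # Critical: try to free resources by demoting a lower-priority active connection R'.
    victim = find_active_dynamic(priority=0)
    if victim is not None:
        move_to_background(victim)                   # R' now shares channel capacity
        lightpath = run_rwa(request.src_dc, request.dst_dc)
        if lightpath is not None:
            create_connection(request.src_dc, request.dst_dc, kind="dynamic", path=lightpath)
            return True
    schedule_retry(request, delay_s=reschedule_after_s)
    return False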

In this work, we have implemented two separate controllers for the implicit and explicit operations. However, in principle the co-existence of both operations is possible. This would require some extensions to our network control algorithms in order to avoid contentions, which will be addressed in our future work.

IV. SIMULATION MODEL AND NUMERICAL RESULTS

We developed an event-driven simulator to evaluate the benefits of the proposed converged architecture and control algorithms in a realistic network scenario. The performance of the converged single-rate architecture with explicit control plane operation is illustrated in our previous work [8]. In this paper, we present the results of the converged architecture with multi-rate data plane and implicit control plane operation.

The reference metro topology is composed of 38 nodes, 59 links, and 100 wavelengths per fiber [3], [8]. Each node represents a metro data center with 100 racks/pods, and each rack/pod switch is equipped with one 10G and one 40G WDM tunable transceiver connected to the optical gateway. The EPS is equipped with 25 10G grey transceivers connected to the optical gateway as well. The control plane employs the k-shortest paths with first-fit wavelength assignment (k-SP FF) algorithm to establish new lightpaths. We assume that rack/pod switches generate traffic flows with lognormal inter-arrival distribution. The choice of the inter-arrival distribution is based on a number of studies on real data center traffic traces that have been performed in recent years by Microsoft [19]–[21]. These studies show that the lognormal inter-arrival distribution, which is a heavy-tailed probability distribution, provides the best fit to the inter-arrival distribution of the traffic flows at the ToR switches. We vary the mean of the lognormal distribution to mimic different traffic loads. On average, half of the traffic flows have priority 0 (Bulk) and half have priority 1 (Critical). The flows represent data transfers with sizes uniformly distributed between 1 and 500 GB. We compared the performance of: (i) the Converged Multi-Rate (MR) network, (ii) the Converged Single-Rate (SR) network, and (iii) the Conventional SR network. In our simulations the three networks have the same overall capacity. In the Converged SR, only 10G transceivers are used for both background and dynamic connections, while the Conventional relies only on the background connections. To manage different levels of priority in the Conventional network, we employ a traffic engineering technique that reserves at least 3 Gbps of capacity over a background connection for Critical transfers. The results are illustrated in Fig. 3.
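The flow-generation model described above can be reproduced with a few lines of NumPy; the sigma parameter of the lognormal distribution and the example load point are assumptions, since the paper only states that the mean of the distribution is varied to sweep the load.

import numpy as np

rng = np.random.default_rng(0)


def generate_flows(n_flows, mean_interarrival_s, sigma=1.0):
    """Generate flow arrival times, sizes, and priorities as in the simulation setup."""
    # Lognormal inter-arrivals: choose mu so the distribution mean equals the target.
    mu = np.log(mean_interarrival_s) - sigma**2 / 2
    interarrivals = rng.lognormal(mean=mu, sigma=sigma, size=n_flows)
    arrival_times = np.cumsum(interarrivals)
    sizes_gb = rng.uniform(1, 500, size=n_flows)      # transfer sizes: 1-500 GB
    priorities = rng.integers(0, 2, size=n_flows)     # ~half Bulk (0), half Critical (1)
    return arrival_times, sizes_gb, priorities


# Example: 10,000 flows with a 5 s mean inter-arrival time (illustrative load point).
arrivals, sizes, prios = generate_flows(10_000, mean_interarrival_s=5.0)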

Fig. 3(a) shows the average time required to complete a data transfer between different metro data centers, as a function of the load. It can be observed that the proposed Converged MR provides on average 2.5× faster Critical transfers and 2× faster Bulk transfers with respect to the Converged SR. This is due to the use of multi-rate transmission, which allows for improving the transmission efficiency on the dynamic connections. In addition, the Converged MR provides 5× faster Critical and Bulk transfers compared to the Conventional network. This is due to the combined use of multi-rate transmission and dynamic connections for performing large data transfers. On the other hand, the Converged SR network provides on average 2.5× faster Critical transfers and 2× faster Bulk transfers with respect to the Conventional, thanks to the use of dynamic connections.

Fig. 3(b) shows the average number of wavelengths required to carry different loads. The Converged MR requires at least 20% fewer wavelengths than the Converged SR and 25% fewer


Fig. 3. Simulation results comparing the performance of the proposed Converged Multi-Rate (MR) and Single-Rate (SR) architectures with the implicit control strategy. The proposed architectures are also compared with a Conventional SR network. (a) Average transfer times, (b) average network resource usage, (c) blocking probability for new connections.

Fig. 4. Average usage of the transceivers at 10G and 40G.

wavelengths than the Conventional. On the other hand, the Converged SR requires between 5% and 10% fewer wavelengths with respect to the Conventional network. Consequently, we can conclude that the Converged networks provide a more efficient resource usage. The main reasons are that in the Converged architectures the data transfers are faster and thus occupy network resources for shorter times. In addition, the dynamic connections are always fully utilized and no bandwidth is wasted.

Fig. 3(c) shows the blocking probability in the Converged networks. The blocking probability indicates the probability that the establishment of a new connection is blocked due to the lack of wavelength resources or of free transceivers for establishing the lightpath between the source and destination data centers. It can be observed that in the Converged architectures the Critical connections have a lower blocking probability than the Bulk connections. This is due to the procedure described in Algorithm 1 that allows moving Bulk connections to background in order to serve new Critical connections. In the Converged MR the blocking probability is slightly higher than in the Converged SR, due to the fact that fewer transceivers are employed at the rack/pod switches, which leads to a slightly higher probability that a new connection will not be served due to the lack of a free transceiver.

Finally, in Fig. 4 we show the percentage of time during which the 10G and 40G transceivers at the rack/pod switches are utilized in the Converged MR network. It is shown that, for

sufficiently high traffic, the 40G transceivers are utilized almost 100% of the time. The reason is that in our control strategy (Algorithm 1) we prioritize the use of the 40G transceivers, which are the most expensive asset, so that the operator can get the maximum outcome from the investment in the more expensive 40G transceivers.

V. PROTOTYPE AND EXPERIMENTAL RESULTS

Fig. 5 shows the implemented prototype. Data centers 1 and 2 consist of 4 racks, each connected to one server. Servers are equipped with a 6-core Intel Xeon processor, 24 GB of memory, and a dual-port 10G Network Interface Card (NIC). ToR switches are implemented by logically dividing two Pica8 10G OpenFlow switches into 8 bridges. Each ToR has a 10G port to the server, a 10G port to the EPS switch with a direct-attached copper cable, and a 10G port to the optical gateway with a 10G optical transceiver. ToR1 also has a 40G uplink. The EPS is a bridge in a Pica8 10G OpenFlow switch with two 10G transceivers. Optical gateways are implemented using Calient and Polatis Optical Space Switches (OSS), a Nistica WSS, and DWDM Mux/Demux. The 10G optical transceivers are 10G SFP+ DWDM with a 24 dB power budget. Due to limitations in form-factor single-wavelength DWDM 40G transceivers, we used a (4×10G) QSFP+ with an 18 dB power budget; however, commercial single-wavelength form-factor 40G and 100G transceivers will be available soon [22].

The control plane is implemented in a controller server running the RYU OpenFlow controller and is connected to the ToRs and optical gateways via 1 Gbps campus Internet. Data centers 1 and 2 are in the 10.0.1.x and 10.0.2.x subnets, respectively. All the control plane modules are developed in Python and are integrated in a RYU application. Data center 3 is implemented only in the control plane to achieve a mesh topology. Distances between the data centers are 5–25 km.
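As an example of the permanent flow rules that give the ToRs global inter data center connectivity over a background connection, the snippet below installs a subnet-based forwarding rule with RYU's OpenFlow 1.3 parser; the port numbering and the 10.0.2.0/24 destination are illustrative assumptions for a DC1 ToR in our prototype's addressing scheme.

def install_background_route(datapath, dst_subnet, out_port, priority=100):
    """Install a permanent rule forwarding traffic for a remote DC subnet
    (e.g. 10.0.2.0/24) out of the port carrying the background connection."""
    ofproto = datapath.ofproto
    parser = datapath.ofproto_parser
    match = parser.OFPMatch(eth_type=0x0800, ipv4_dst=dst_subnet)
    actions = [parser.OFPActionOutput(out_port)]
    inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS, actions)]
    mod = parser.OFPFlowMod(datapath=datapath, priority=priority,
                            match=match, instructions=inst)
    datapath.send_msg(mod)


# Hypothetical usage on a DC1 ToR: send all 10.0.2.x traffic to the EPS-facing
# port that carries the background connection towards DC2.
# install_background_route(dp, ("10.0.2.0", "255.255.255.0"), out_port=2)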

In the first set of experiments, we evaluate the background connection. Fig. 6(a) shows the throughput as a function of time for the background connection between Data centers 1 and 2. At the start, only Rack 1 is transmitting data, thus it uses the full 10 Gbps bandwidth of the link. At 10 s, Racks 2–4 also start data transmission, hence the 10 Gbps capacity is shared


Fig. 5. (a) Prototype of the 3-node metro-scale converged data center network. DC1 and DC2 are fully implemented, while DC3 is implemented only on the control plane. (b) Snapshot of the prototype.

Fig. 6. Experimental results on the prototype: (a) Random traffic change between 4 racks on a background connection, (b) 4 simultaneous dynamic rack-to-rack connections with an explicit request, (c) Spectrum of DC1 and DC2 links consisting of 5 DWDM connections (1 background and 4 dynamic), (d) Autonomous (Implicit) bandwidth adjustment on a background connection utilized by 4 racks between DC1 and DC2, (e) Autonomous (Implicit) bandwidth adjustment by establishing a dynamic connection between DC1 and DC2 in a single-rate data plane (10G), (f) Autonomous (Implicit) bandwidth adjustment by establishing a dynamic connection between DC1 and DC2 in a multi-rate data plane (10G and 40G).

between the 4 racks. Between 30–40 s, Rack 1 again utilizes the whole bandwidth. At 40 s, the background connection is shared between Racks 1 and 2, at 4.5 Gbps each, and so on. For low priority services with loose bandwidth requirements, a background connection performs well enough; however, for higher

priority services, such unpredictable changes in throughput are not acceptable; therefore, a dedicated dynamic connection is required.

Next, we demonstrate explicit dynamic connections between data centers 1 and 2. Four Rack-to-Rack connections are


established and the throughput is measured. Fig. 6(b) shows the result, captured simultaneously on the servers. All connections utilize the maximum available bandwidth of 10 Gbps. The slight difference in the throughput of the Rack 4 connection is due to a difference in the brand of the transceiver module compared to the rest. With dynamic connections, strict latency and bandwidth demands for high priority services are guaranteed. Fig. 6(c) shows the spectrum of the connections after 25 km of transmission. Channel C34 is the background connection and channels C26, C28, C30, and C32 are the four dynamic connections.

We continue the prototype evaluation by demonstrating the control plane's capability to establish implicit connections. The control plane monitors the 4 flows between Racks 1–4 of data centers 1 and 2 on the background connection every 2 s and calculates the total throughput of all 4 every four measurements. As shown in Fig. 6(d), at the start Racks 1–4 have throughputs of 0.8, 1, 1.2, and 1.4 Gbps, respectively, 4.4 Gbps all together. Between 18–36 s, the throughput of links 1 and 2 increases; however, the overall traffic is still under the threshold, which is 9 Gbps. At 45 s, all Racks start transmitting more data, which results in link saturation at 50 s (total traffic: 9.4 Gbps, SD: 0.22). At this point, the control plane makes a new background connection since the SD is ≤ 1. Now, there are two background connections having total throughputs of 8.87 and 8.60 Gbps and SDs of 0.1 and 0.04, respectively.
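To make the decision rule concrete, the check below uses the initial per-rack rates from Fig. 6(d) as illustrative inputs (the rates at the actual decision instant differ): a low SD indicates evenly loaded flows, so once the total crosses the threshold the controller adds another shared background connection rather than a dedicated dynamic one.

from statistics import pstdev

rates_gbps = [0.8, 1.0, 1.2, 1.4]      # illustrative per-rack rates from Fig. 6(d)
total = sum(rates_gbps)                 # 4.4 Gbps here; the decision fires above 9 Gbps
spread = pstdev(rates_gbps)             # ~0.22: the flows are evenly loaded

if total > 9.0:                         # saturation threshold from the prototype
    if spread <= 1.0:
        action = "create a new background connection"
    else:
        action = "create a dynamic connection for the heaviest flow"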

Next, we evaluate autonomous dynamic connection establishment in a single-rate scenario. Fig. 6(e) shows the rack throughputs. The experiment starts with a total of 4.4 Gbps of traffic on the background connection. At 15 s, Rack 1 increases its throughput to the point that the background connection is over the threshold of 9 Gbps, with the 4 flows having an SD of 2.25. Since it is larger than 1, the control plane establishes a new dynamic connection for the flow with the largest traffic, that of Rack 1. At 32 s, Rack 2 starts transmitting more data and saturating the background link, thus the control plane establishes a new dynamic connection for Rack 2 at 42 s. The same increase in required bandwidth occurs for Racks 3 and 4, and they also get a dedicated dynamic connection.

Finally, we demonstrate an implicit dynamic connection in the multi-rate scenario. Racks 1–3 are transmitting data at 0.8, 1.2, and 1.4 Gbps, a total of 4.4 Gbps on the background connection (Fig. 6(f)). At 16 s, Rack 1 starts increasing its traffic, resulting in background link saturation at 20 s. The SD of the 3 flows is > 1, thus the control plane creates a new 40G dynamic link for the flow with the largest traffic, that of Rack 1. Now the 18 Gbps of traffic from Rack 1 is on the dynamic connection, and the traffic from Racks 2 and 3 is on the background connection between the two data centers. At 70 s, the traffic from Rack 1 drops to 1 Gbps, hence the dynamic connection is removed and all 3 Racks transmit data on the background connection.

The experimental results show the feasibility of our approach for traffic monitoring and network reconfiguration. Compared with the related works presented in Section II, which mostly consider custom optical subsystems, we showed that only by using commodity optical and electronic switches and WSSs, it

is possible to build a reconfigurable optical network and steer the bandwidth based on the traffic and application needs. Such optimal network resource allocation became possible by the combination of an intelligent control plane and flexibility in the data plane.

VI. DISCUSSION

Our proposed architecture is designed for metro-scale networking of small- to mid-sized data centers, which now represent more than 50% of the deployments in the U.S. [23]. Because large-scale data centers are approaching scalability limits due to the massive growth in network size and power consumption, we believe that distributed data centers at metro distances are the long-term solution for data center deployment. Although DWDM optical networks provide high data transfer capacity on the order of terabits/s, the exponential growth in bandwidth demand and the emerging cloud and 5G services force a shift to an architecture that delivers flexibility and automation in bandwidth provisioning.

This architectural option benefits enterprises which need to manage their own network due to privacy and security requirements. They lease a dark fiber network from an operator and manage their own metro network. In the case of shared orchestration, secure protocols need to be implemented for controller-to-controller interactions. Both implicit and explicit control planes are compatible with shared orchestration; however, in the implicit case, the data center owner has to share traffic characteristics with the metro network operator. In this case, a middleware can guarantee the security requirements.

A key consideration for any data center network architecture is scalability. The scalability of our architecture on the hardware side depends on the available OSS ports in the optical gateway. Commercial OSSs with 384 × 384 ports are now available [24]. Considering 25% of the capacity for background transceivers, the optical gateway would support 288 Pods. With 255 servers per Pod, each data center based on our architecture can have up to 70 k servers. Considering a realistic metro network with 38 nodes and 59 links [3], in total over 2.5 million servers can be supported.
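The scaling numbers above follow from simple arithmetic, reproduced below as a sanity check (the 25% background share and 255 servers per Pod are the assumptions stated in the text):

oss_ports = 384                      # commercial OSS port count [24]
pods = int(oss_ports * 0.75)         # 25% of ports reserved for background transceivers -> 288
servers_per_dc = pods * 255          # 288 * 255 = 73,440, i.e. roughly 70 k servers
metro_nodes = 38                     # reference metro topology [3]
total_servers = servers_per_dc * metro_nodes   # ~2.79 million, i.e. over 2.5 million
print(pods, servers_per_dc, total_servers)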

A major novelty in this architecture is autonomous bandwidth steering at the physical layer based on the traffic characteristics. The novelty of this feature lies in continuous rack-to-rack traffic monitoring over an inter data center network and on-demand re-provisioning of the WDM spectrum capacity at the physical layer. The applicability of this feature is determined by the control plane latency, which depends on the OpenFlow controller and hardware reconfiguration response times. OpenFlow controllers such as OpenDaylight [25] can now update flow rules at the rate of a hundred flow rules in milliseconds. The delay of our implemented optical gateway to add/drop a WDM connection in the physical layer is in the range of 100 ms, which is equal to 500 MB of data transfer on a 40 Gbps link. This delay can be improved by a few orders of magnitude by leveraging the latest 2D MEMS technology [26] and faster WSSs [27] to reduce the network capacity usage inefficiency caused by re-provisioning.
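The 500 MB figure follows directly from the reconfiguration delay and the link rate; a one-line check using the stated 100 ms and 40 Gbps values:

delay_s = 0.1                          # ~100 ms gateway reconfiguration delay
rate_bps = 40e9                        # 40 Gbps dynamic link
bytes_in_flight = rate_bps * delay_s / 8
print(bytes_in_flight / 1e6, "MB")     # 500.0 MB that could have been transferred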


VII. CONCLUSION

We presented a flexible converged architecture for metro-scale networking of small- and mid-sized data centers. The architecture supports both single-rate and multi-rate optical data planes. The SDN control plane is designed either to receive connection requests explicitly from the application layer or to monitor the traffic in the network and autonomously steer the bandwidth across the network based on the traffic characteristics. Leveraging two types of connections (background and dynamic), different levels of QoS are designed into the architecture to support connections with strict requirements for bandwidth and latency.

Our inter data center prototype demonstrates both types of connections as well as autonomous (implicit) bandwidth adjustment between racks of two data centers. Additionally, using a simulation platform on a real-size metro network, we achieve up to 5× faster transmission times and 25% lower spectrum usage compared to conventional single-rate networks. Because of its ability to support millions of servers in several data centers and the terabit capacity of the DWDM network, this architecture can also be useful as a design model for large-scale data centers to surpass scalability limits due to energy consumption and network size through distribution of clusters over short distances. Furthermore, the control plane's capability to monitor traffic and collect historical data can improve the automation of network control and management.

ACKNOWLEDGMENT

The authors would like to thank AT&T, Calient Technologies, and Polatis for their generous donations to our testbed.

REFERENCES

[1] V. Bahl, "Cloud 2020: The emergence of micro datacenter for mobile computing," Microsoft, Redmond, WA, USA, May 2015. [Online]. Available: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/Micro-Data-Centers-mDCs-for-Mobile-Computing-1.pdf

[2] 5G PPP Architecture Working Group, "View on 5G architecture," White Paper, Jul. 2016. [Online]. Available: www.5g-ppp.eu

[3] P. Ohlen et al., "Data plane and control architectures for 5G transport networks," J. Lightw. Technol., vol. 34, no. 6, pp. 1501–1508, Mar. 2016.

[4] Cisco, "Cisco visual networking index: Forecast and methodology, 2014–2019," White Paper, Aug. 2015. [Online]. Available: http://www.cisco.com/c/en/us/solutions/collateral/service-provider/ip-ngn-ip-next-generation-network/white_paper_c11-481360.html

[5] Cisco, "Cisco VNI forecast and methodology, 2015–2020," White Paper, Jun. 2016. [Online]. Available: http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/mobile-white-paper-c11-520862.html

[6] Ericsson, "5G use cases," White Paper, Nov. 2015. [Online]. Available: www.ericsson.com/5g

[7] T. Taleb et al., "EASE: EPC as a service to ease mobile core network deployment over cloud," IEEE Network, vol. 29, no. 2, pp. 78–88, Mar. 2015.

[8] M. Fiorani, P. Samadi, Y. Shen, L. Wosinska, and K. Bergman, "Flexible architecture and control strategy for metro-scale networking of geographically distributed data centers," in Proc. Eur. Conf. Opt. Commun., 2016, pp. 1–3.

[9] X. Zhao, V. Vusirikala, B. Koley, V. Kamalov, and T. Hofmeister, "The prospect of inter-data-center optical networks," IEEE Commun. Mag., vol. 51, no. 9, pp. 32–38, Sep. 2013.

[10] P. Lu, L. Zhang, X. Liu, J. Yao, and Z. Zhu, "Highly efficient data migration and backup for big data applications in elastic optical inter-data-center networks," IEEE Netw., vol. 29, no. 5, pp. 36–42, Sep./Oct. 2015.

[11] M. Schiano, A. Percelsi, and M. Quagliotti, "Flexible node architectures for metro networks," IEEE/OSA J. Opt. Commun. Netw., vol. 7, no. 12, pp. 131–140, Dec. 2015.

[12] G. Chen et al., "First demonstration of holistically-organized metro-embedded cloud platform with all-optical interconnections for virtual datacenter provisioning," in Proc. IEEE Opto-Electron. Commun. Conf., Oct. 2015, pp. 1–3.

[13] M. Fiorani, S. Aleksic, J. Chen, M. Casoni, and L. Wosinska, "Energy efficiency of an integrated intra-data-center and core network with edge caching," IEEE/OSA J. Opt. Commun. Netw., vol. 6, no. 4, pp. 421–432, Jun. 2014.

[14] S. Yan et al., "Archon: A function programmable optical interconnect architecture for transparent intra and inter data center networking," J. Lightw. Technol., vol. 33, no. 8, pp. 1586–1595, Apr. 2015.

[15] P. Samadi, J. Xu, and K. Bergman, "Virtual machine migration over optical circuit switching network in a converged inter/intra data center architecture," in Proc. Opt. Fiber Commun. Conf. Exhib., 2015, pp. 1–3.

[16] P. Samadi, K. Wen, J. Xu, and K. Bergman, "Software-defined optical network for metro-scale geographically distributed data centers," OSA Opt. Express, vol. 24, no. 11, pp. 12310–12320, May 2016.

[17] S. Sanfilippo and P. Noordhuis, Redis. [Online]. Available: http://redis.io

[18] RYU, "RYU SDN framework." [Online]. Available: https://osrg.github.io/ryu/

[19] S. Kandula, S. Sengupta, A. Greenberg, P. Patel, and R. Chaiken, "The nature of data center traffic: Measurements & analysis," in Proc. 9th ACM SIGCOMM Conf. Internet Meas. Conf., 2009, pp. 202–208.

[20] T. Benson, A. Anand, A. Akella, and M. Zhang, "Understanding data center traffic characteristics," in Proc. ACM Workshop Res. Enterprise Netw., 2009, pp. 65–72.

[21] T. Benson, A. Akella, and D. A. Maltz, "Network traffic characteristics of data centers in the wild," in Proc. 10th Internet Meas. Conf., Nov. 2010, pp. 267–280.

[22] Inphi, "Inphi debuts 100G DWDM solution for 80 km data center interconnects," Mar. 22, 2016. [Online]. Available: https://www.inphi.com/media-center/press-room/press-releases-and-media-alerts/inphi-debuts-100g-dwdm-solution-for-80km-data-center-interconnects.php

[23] Natural Resources Defense Council, "America's data centers are wasting huge amounts of energy," Issue Brief, Aug. 2014. [Online]. Available: https://www.nrdc.org/sites/default/files/data-center-efficiency-assessment-IB.pdf

[24] Polatis Series 700, 384 × 384 port software-defined optical circuit switch.[Online]. Available: http://www.polatis.com

[25] OpenDaylight, "OpenDaylight: Open source SDN platform." [Online]. Available: https://www.opendaylight.org/

[26] M. Wu, T. J. Seok, and S. Han, "Large-scale silicon photonic switches," in Proc. Int. Conf. Adv. Photon., Jul. 2016, pp. 370–375.

[27] T. A. Strasser and J. L. Wagener, "Wavelength-selective switches for ROADM applications," IEEE J. Sel. Topics Quantum Electron., vol. 16, no. 5, pp. 1150–1157, Sep./Oct. 2010.

Payman Samadi received the Ph.D. degree in photonics from McGill University, Montreal, QC, Canada, in 2012. He is currently a Research Scientist at the Lightwave Research Laboratory, Columbia University, New York, NY, USA. He was formerly with Optiwave Systems Inc., Canada, as a Product Manager during 2011–2013. He has coauthored more than 30 articles in peer-reviewed conferences and journals. His current research interests include data center network architecture, software-defined networking, application of optical interconnects for computing platforms, and big data analytics.

Matteo Fiorani received the Ph.D. degree in information and communication technologies from the University of Modena, Modena, Italy. During his Ph.D. studies he spent one year at the Vienna University of Technology, Austria. Since January 2014, he has been working as a Postdoctoral Researcher at KTH Royal Institute of Technology, Stockholm, Sweden. He was a Visiting Researcher at the University of California, Davis, CA, USA, in August/September 2015, and at Columbia University, New York, NY, USA, in April/May 2016. His research interests include 5G radio and transport technologies, optical and electronic interconnects for data centers, and software-defined networking. He has coauthored more than 40 peer-reviewed research papers (18 journal papers). He is currently involved in several European and Swedish research projects. He is Cofounder and Chair of the IEEE workshop on 5G transport networks.


Yiwen Shen was born in Shanghai, China, in 1992. He received the B.Sc. degree in electrical engineering from the University of Toronto, Toronto, ON, Canada, in 2014, and the M.Sc. degree in electrical engineering from Columbia University, New York, NY, USA, in 2015. He is currently a Graduate Research Assistant with the Lightwave Research Laboratory, Columbia University, where he is working toward the Ph.D. degree. His research interests include the development of optical software-defined networks for data-center and high-performance computing applications.

Lena Wosinska received the Ph.D. degree in photonics and the Docent degree in optical networking from KTH Royal Institute of Technology, Stockholm, Sweden, where she is currently a Full Professor of telecommunication in the School of Information and Communication Technology. She is Founder and Leader of the Optical Networks Laboratory. She has been working in several EU projects and coordinating a number of national and international research projects. Her research interests include fiber access and 5G transport networks, energy efficient optical networks, photonics in switching, optical network control, reliability and survivability, and optical data center networks. She has been involved in many professional activities, including guest editorship of IEEE, OSA, Elsevier, and Springer journals, serving as General Chair and Co-chair of several IEEE, OSA, and SPIE conferences and workshops, serving in the TPC of many conferences, as well as being a Reviewer for scientific journals and project proposals. She has been an Associate Editor of the OSA Journal of Optical Networking and the IEEE/OSA JOURNAL OF OPTICAL COMMUNICATIONS AND NETWORKING. She is currently serving on the Editorial Board of the Springer Photonic Network Communications journal and of the Wiley Transactions on Emerging Telecommunications Technologies.

Keren Bergman (S'87–M'93–SM'07–F'09) received the B.S. degree from Bucknell University, Lewisburg, PA, USA, in 1988, and the M.S. and Ph.D. degrees from the Massachusetts Institute of Technology, Cambridge, MA, USA, in 1991 and 1994, respectively, all in electrical engineering. She is currently the Charles Batchelor Professor of electrical engineering at Columbia University, New York, NY, USA, where she is also the Scientific Director of the Columbia Nano Initiative. Her current research interests include optical interconnection networks for high-performance embedded computing, optical data center networks, and silicon photonic systems-on-chip.