
Cognitive Policy Based SON Management Demonstrator

Tony Daher, Sana Ben Jemaa
Orange Labs

44 Avenue de la Republique, 92320 Chatillon, France
Email: {tony.daher,sana.benjemaa}@orange.com

Laurent Decreusefond
Telecom ParisTech

23 avenue d'Italie, 75013 Paris, France
Email: [email protected]

Abstract—The Policy Based SON Management (PBSM) framework has been introduced to manage Self-Organizing Network (SON) functions in such a way that, together, they fulfill the operator's global goals and provide a single self-organized network that can be controlled as a whole. This framework mainly translates the operator's global objectives into policies to be followed by the individual SON functions. To cope with the complexity of radio networks, due to the impact of the radio environment and of traffic dynamics, we propose to empower the PBSM with a cognition capability. The proposed Cognitive PBSM (CPBSM) relies on a Reinforcement Learning (RL) algorithm that learns the optimal configuration of the SON functions and steers them towards the global operator objectives. The visitor will see how changing the operator objectives leads to a reconfiguration of the SON functions in such a way that the new objectives are fulfilled. The operation of an RL-based cognitive management system will be illustrated, and the exploration/exploitation and scalability dilemmas will be explained.

I. Introduction

The automation of Radio Access Network (RAN) management became a market reality with the standardization [1] and the commercial release of Self-Organizing Network (SON) solutions. Today, SON functions run in several operational networks to automate individual tasks such as neighbor relation configuration, load-management-related parameter settings, etc. Hence, several SON functions run simultaneously in the same network, each of them fulfilling a specific objective. These individual objectives may sometimes conflict, and trade-offs need to be found with respect to the operator strategy. The Policy Based SON Management (PBSM) framework has been introduced to orchestrate SON functions in such a way that, together, they fulfill the operator's global goals [2].

Finding the optimal mapping between the operator goals and the actual actions or behaviors of the SON functions is a complex problem for several reasons. First, SON functions are provided by RAN vendors as black boxes, with limited leverage on their behavior through the configuration of SON function parameters such as thresholds, step sizes, parameter intervals, etc., which we denote as SON function Configuration Value (SCV) sets. Besides, the behavior of a SON function running on a given cell, for a given SCV set, depends on the radio environment, namely the propagation, and on the traffic distribution and dynamics. To cope with this complexity, we propose to empower the PBSM with a cognition capability, as illustrated in figure 1 [3]. The proposed Cognitive PBSM (CPBSM) relies on a Reinforcement Learning (RL) algorithm that learns the optimal configuration of the SON functions and steers them towards fulfilling the global operator objectives.

Fig. 1: Cognitive PBSM

The learning technique considered for the demonstrator is a centralized online learning technique based on Multi-Armed Bandits (MAB) [4].

MAB is an RL problem, formulated as a sequential decision problem, where at each iteration an agent is confronted with a set of actions called arms, each of which, when pulled, generates a reward. The agent only observes the reward of an arm after pulling it. The objective is to find a strategy that identifies the optimal action while maximizing the cumulative reward obtained during the process. In our case, a learning agent sitting on top of the SON functions takes actions by configuring all the instances of the SON functions in the network, as shown in figure 2.
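To make the decision rule concrete, here is a minimal sketch of the UCB1 strategy analyzed in [4]; the class name and interface are ours, not the demonstrator's actual implementation. Each arm would correspond to one SCV set combination, and the agent trades off exploring rarely tried arms against exploiting the empirically best one.

```python
import math

class UCB1Agent:
    """Minimal UCB1 agent (Auer et al. [4]); illustrative sketch only."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # how many times each arm was pulled
        self.values = [0.0] * n_arms  # empirical mean reward of each arm

    def select_arm(self):
        # Pull every arm once before trusting the confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        # UCB1 index = empirical mean + exploration bonus; the bonus
        # shrinks as an arm accumulates pulls, shifting the agent from
        # exploration towards exploitation.
        scores = [v + math.sqrt(2 * math.log(total) / c)
                  for v, c in zip(self.values, self.counts)]
        return scores.index(max(scores))

    def update(self, arm, reward):
        # Incremental update of the empirical mean reward of the arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```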

Fig. 2: Cognitive PBSM Functional Scheme

II. Demonstrator Description

A. Scenario

Fig. 3: Network Topology

The demonstration shows a cognitive management system for radio access mobile networks. Based on operator-defined objectives, such as maximizing end-user average bit rates and minimizing average cell loads, the cognitive management system automatically configures and controls the operation of the SON functions. The demonstrator is based on an LTE-A radio access network simulator with several SON functions deployed on a heterogeneous network, with realistic topology and propagation modeling. The considered network portion is presented in figure 3. The macro-cellular layer corresponds to a portion of a real network topology with real-like network parameters and accurate ray-tracing-based propagation. Small cells are added to the network using standard propagation models. We consider a stationary and unbalanced traffic distribution. The demonstration scenario considers three SON functions (a sketch of the resulting action space follows the list):

• Mobility Load Balancing (MLB), deployed on each macro cell, in charge of setting the mobility parameters to balance the load between neighboring macro cells.

• Cell Range Expansion (CRE), deployed on each small cell, in charge of balancing the load between the small cell and the neighboring macro cell.

• Enhanced Inter-Cell Interference Coordination (eICIC), deployed on each macro cell having at least one small cell in its coverage region. The eICIC's task is to protect small cell edge users from the high levels of interference generated by the macro cell, by tuning the number of Almost Blank Subframes (ABS) transmitted by the macro cell (an ABS contains only control signals, transmitted at reduced power).
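The scalability dilemma mentioned in the abstract follows directly from how actions are built: one action assigns one SCV set to every deployed SON instance, so the action space is the Cartesian product of the per-instance SCV sets. A minimal sketch, with hypothetical SCV sets and purely illustrative instance counts (none of these values come from the paper):

```python
from itertools import product

# Hypothetical SCV sets (illustrative values, not the demonstrator's):
mlb_scv_sets = ["conservative", "balanced", "aggressive"]  # per macro cell
cre_scv_sets = [0, 3, 6]    # assumed CRE bias values in dB, per small cell
eicic_scv_sets = [0, 2, 4]  # assumed numbers of ABS, per macro cell

n_macro, n_small = 3, 2     # illustrative instance counts

# One action = one SCV set combination over all deployed SON instances.
actions = list(product(
    *([mlb_scv_sets] * n_macro),
    *([cre_scv_sets] * n_small),
    *([eicic_scv_sets] * n_macro),
))

print(len(actions))  # 3^3 * 3^2 * 3^3 = 6561 arms for this small section
```

Even this toy section yields thousands of arms, which is why the exploration/exploitation trade-off and the scalability of the learning agent are central to the demonstration.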

For this scenario, we define the following KPIs:

• $L_{i,c,t}$ is the load of cell $i$;
• $\bar{L}_{c,t}$ is the average load in the considered section;
• $\bar{T}_{c,t}$ is the average user throughput in the considered section;
• $\bar{T}'_{c,t}$ is the average pico cell edge user throughput in the considered section;

where $c \in C$, $C$ being the set of actions, which is in this case the set of SCV set combinations for the SON instances deployed in the considered section, and $t$ is the iteration index. All variables are normalized between 0 and 1. The reward function, which reflects the operator's objectives and which the learning algorithm seeks to maximize, is therefore:

$$r_{c,t} = \omega_1 (1 - \sigma_{c,t}) + \omega_2 \bar{T}_{c,t} + \omega_3 \bar{T}'_{c,t} \qquad (1)$$

where the load variance is:

$$\sigma_{c,t} = \frac{\sum_{i=0}^{B} \left( L_{i,c,t} - \bar{L}_{c,t} \right)^2}{B}$$

$B$ is the number of cells in the considered section, and $\omega_1$, $\omega_2$ and $\omega_3$ are priority weights set by the operator.
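As a concrete illustration, the following sketch computes reward (1) from the section KPIs; the function name and argument layout are ours, and the inputs are assumed to be already normalized to [0, 1] as stated above.

```python
def reward(loads, avg_thr, pico_edge_thr, w1, w2, w3):
    """Reward (1): w1*(1 - sigma) + w2*avg. thr. + w3*pico edge thr.

    loads         -- per-cell loads L_{i,c,t} of the section, in [0, 1]
    avg_thr       -- average user throughput in the section, in [0, 1]
    pico_edge_thr -- average pico cell edge user throughput, in [0, 1]
    w1, w2, w3    -- operator priority weights
    """
    b = len(loads)
    mean_load = sum(loads) / b
    # Load variance sigma_{c,t} over the B cells of the section.
    sigma = sum((load - mean_load) ** 2 for load in loads) / b
    return w1 * (1.0 - sigma) + w2 * avg_thr + w3 * pico_edge_thr

# Well-balanced loads make sigma small, so the first term stays close to w1.
print(reward([0.40, 0.50, 0.45], avg_thr=0.6, pico_edge_thr=0.3,
             w1=0.5, w2=0.3, w3=0.2))
```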

B. Demonstrator

The CPBSM demonstrator consists of three main components: the Operator Objectives Panel, the Learning Panel and the Performance Evaluation Panel.

In the Operator Objectives Panel, the operator defines its objectives by assigning priorities to several proposed criteria, such as load balance, end-user throughput or interference level at cell edge. An example of the Operator Objectives Panel is illustrated in figure 4.

Fig. 4: Operator Objectives Panel

The Learning Panel shows the evolution of the CPBSM's action selection statistics, as well as the generated rewards. This panel, illustrated in figure 5, shows how the learning process progresses. The action selection statistics graph shows the evolution of the online learning agent's decisions. In the first stages of learning, the demonstrator shows an almost random action selection, due to the agent's lack of knowledge. After several iterations, the agent's decisions start converging towards the optimal action. This is also visible in the reward function evolution graph, where the generated rewards keep increasing over time and finally converge to a stationary state. Furthermore, the demonstration shows that the CPBSM, through its learning agent, is able to adapt to objective changes: when the operator changes its objectives, the CPBSM performs a brief relearning phase before converging to a new optimal action.
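Combining the earlier sketches, the hypothetical loop below reproduces the behavior the Learning Panel visualizes, including the relearning phase after an objective change. Here `simulate_kpis` stands in for the LTE-A simulator, and resetting the agent's statistics when the weights change is one possible design choice, not necessarily the demonstrator's.

```python
def run_demo(actions, simulate_kpis, n_iter=500):
    """Hypothetical loop behind the Learning Panel (uses UCB1Agent and
    reward() from the earlier sketches)."""
    weights = (0.5, 0.3, 0.2)  # assumed initial operator priorities
    agent = UCB1Agent(len(actions))
    history = []               # (iteration, arm, reward) for the two graphs
    for t in range(n_iter):
        if t == n_iter // 2:
            # The operator changes its objectives mid-run: new weights,
            # and the agent restarts a brief learning phase.
            weights = (0.2, 0.3, 0.5)
            agent = UCB1Agent(len(actions))
        arm = agent.select_arm()
        loads, avg_thr, edge_thr = simulate_kpis(actions[arm])
        r = reward(loads, avg_thr, edge_thr, *weights)
        agent.update(arm, r)
        history.append((t, arm, r))
    return history
```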

Fig. 5: Learning Process Evolution

The Performance Evaluation Panel compares the KPI performance of the CPBSM with that of a one-objective-driven, static configuration of the SON functions, as shown in figure 6. It is important to note that this static configuration of the SON functions is typical of those deployed in real field networks. The objective is to illustrate the gain that can be obtained by adopting the CPBSM with respect to the typical performance achievable today with SON.

Fig. 6: Performance Comparison

III. Conclusion

The demonstrator shows how a cognition capability allows the RAN management system to always make optimal decisions. The visitor will see how changing the operator objectives leads to a reconfiguration of the SON functions in such a way that the new objectives are fulfilled. The operation of an RL-based cognitive management system will be illustrated, and the exploration/exploitation and scalability dilemmas will be explained.

References

[1] 3GPP TR 36.814, "Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); Further advancements for E-UTRA physical layer aspects (Release 9)", March 2010.

[2] SEMAFOUR project web page: http://fp7-semafour.eu/.

[3] T. Daher, S. Ben Jemaa and L. Decreusefond, "Cognitive Management of Self-Organized Radio Networks Based on Multi Armed Bandit," IEEE Personal Indoor and Mobile Radio Communications (PIMRC), 2017.

[4] P. Auer, N. Cesa-Bianchi and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Machine Learning, vol. 47, no. 2-3, pp. 235-256, 2002.