17
Research Article A Mobile Network Planning Tool Based on Data Analytics Jessica Moysen, Lorenza Giupponi, and Josep Mangues-Bafalluy Centre Tecnol` ogic de Telecomunicacions de Catalunya (CTTC), Av. Carl Friedrich Gauss 7, 08860 Castelldefels, Spain Correspondence should be addressed to Jessica Moysen; [email protected] Received 3 August 2016; Accepted 14 November 2016; Published 5 February 2017 Academic Editor: Piotr Zwierzykowski Copyright © 2017 Jessica Moysen et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Planning future mobile networks entails multiple challenges due to the high complexity of the network to be managed. Beyond 4G and 5G networks are expected to be characterized by a high densification of nodes and heterogeneity of layers, applications, and Radio Access Technologies (RAT). In this context, a network planning tool capable of dealing with this complexity is highly convenient. e objective is to exploit the information produced by and already available in the network to properly deploy, configure, and optimise network nodes. is work presents such a smart network planning tool that exploits Machine Learning (ML) techniques. e proposed approach is able to predict the Quality of Service (QoS) experienced by the users based on the measurement history of the network. We select Physical Resource Block (PRB) per Megabit (Mb) as our main QoS indicator to optimise, since minimizing this metric allows offering the same service to users by consuming less resources, so, being more cost- effective. Two cases of study are considered in order to evaluate the performance of the proposed scheme, one to smartly plan the small cell deployment in a dense indoor scenario and a second one to timely face a detected fault in a macrocell network. 1. Introduction Nowadays, we are assisting to the definition of what 5G networks will look like. 3GPP has started multiple work items that will lead to the definition of a novel 5G radio and architecture [1]. e main vendors are publishing numer- ous white papers presenting their view on 5G networks and architectures. e EU commission has put in place an important 5G Infrastructure Public Private Partnership (5GPPP) program to fund research for 5G networks [2]. What is clear from all these converging visions is that network management in beyond 4G and future 5G networks has to face a whole new set of challenges, due to (1) ultradense deployments, heterogeneous nodes, networks, applications, and Radio Access Networks (RANs) and also heterogeneous spectrum access through novel technologies, such as Long Term Evolution Unlicensed (LTE-U) and License Assisted Access (LAA), all coexisting in the same setting, (2) the need to manage very dynamic networks where part of the nodes is controlled directly by the users (e.g., femtocells), energy saving policies generating a fluctuating number of nodes, active antennas, and so on, (3) the need to support 1000x traffic and 10x users and to improve energy efficiency, (4) the need to improve the experience of the users by enabling Gbps speeds and highly reduced latency, or (5) the need to manage new virtualised architectures. In this context, it is already widely recognized that the network needs to establish new procedures to become more intelligent, self-aware, and self-adaptive. An initial step in this direction has been introduced in 4G LTE networks, since Release 8, with the introduction of Self-Organising Network (SON). However, this vision needs to be further developed in 5G considering the huge complexity of these networks. As we already observed in [3], a huge amount of data is currently already generated in 4G networks during normal operations by control and management functions, and more data is expected to be gathered in 5G networks due to the densification process [4], heterogeneity in layers and technologies, the additional control and management complexity in Network Functions Virtualisation (NFV) and Soſtware Defined Network (SDN) architectures, the increas- ing relevance of Machine to Machine (M2M) and Internet of ings (IoT) communications, the increasing variety of applications and services, each with distinct traffic patterns and QoS/Quality of Experience (QoE) requirements, and so on. Hindawi Mobile Information Systems Volume 2017, Article ID 6740585, 16 pages https://doi.org/10.1155/2017/6740585

ResearchArticle A Mobile Network Planning Tool Based on

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Research ArticleA Mobile Network Planning Tool Based on Data Analytics

Jessica Moysen, Lorenza Giupponi, and Josep Mangues-Bafalluy

Centre Tecnologic de Telecomunicacions de Catalunya (CTTC), Av. Carl Friedrich Gauss 7, 08860 Castelldefels, Spain

Correspondence should be addressed to Jessica Moysen; [email protected]

Received 3 August 2016; Accepted 14 November 2016; Published 5 February 2017

Academic Editor: Piotr Zwierzykowski

Copyright © 2017 Jessica Moysen et al.This is an open access article distributed under the Creative Commons Attribution License,which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Planning future mobile networks entails multiple challenges due to the high complexity of the network to be managed. Beyond4G and 5G networks are expected to be characterized by a high densification of nodes and heterogeneity of layers, applications,and Radio Access Technologies (RAT). In this context, a network planning tool capable of dealing with this complexity is highlyconvenient. The objective is to exploit the information produced by and already available in the network to properly deploy,configure, and optimise network nodes. This work presents such a smart network planning tool that exploits Machine Learning(ML) techniques. The proposed approach is able to predict the Quality of Service (QoS) experienced by the users based on themeasurement history of the network. We select Physical Resource Block (PRB) per Megabit (Mb) as our main QoS indicator tooptimise, since minimizing this metric allows offering the same service to users by consuming less resources, so, being more cost-effective. Two cases of study are considered in order to evaluate the performance of the proposed scheme, one to smartly plan thesmall cell deployment in a dense indoor scenario and a second one to timely face a detected fault in a macrocell network.

1. Introduction

Nowadays, we are assisting to the definition of what 5Gnetworks will look like. 3GPP has started multiple workitems that will lead to the definition of a novel 5G radioand architecture [1].Themain vendors are publishing numer-ous white papers presenting their view on 5G networksand architectures. The EU commission has put in placean important 5G Infrastructure Public Private Partnership(5GPPP) program to fund research for 5Gnetworks [2].Whatis clear from all these converging visions is that networkmanagement in beyond 4G and future 5G networks has toface a whole new set of challenges, due to (1) ultradensedeployments, heterogeneous nodes, networks, applications,and Radio Access Networks (RANs) and also heterogeneousspectrum access through novel technologies, such as LongTerm Evolution Unlicensed (LTE-U) and License AssistedAccess (LAA), all coexisting in the same setting, (2) the needto manage very dynamic networks where part of the nodesis controlled directly by the users (e.g., femtocells), energysaving policies generating a fluctuating number of nodes,active antennas, and so on, (3) the need to support 1000xtraffic and 10x users and to improve energy efficiency, (4) the

need to improve the experience of the users by enabling Gbpsspeeds and highly reduced latency, or (5) the need to managenew virtualised architectures.

In this context, it is already widely recognized that thenetwork needs to establish new procedures to become moreintelligent, self-aware, and self-adaptive. An initial step inthis direction has been introduced in 4G LTE networks,since Release 8, with the introduction of Self-OrganisingNetwork (SON). However, this vision needs to be furtherdeveloped in 5G considering the huge complexity of thesenetworks. As we already observed in [3], a huge amount ofdata is currently already generated in 4G networks duringnormal operations by control and management functions,and more data is expected to be gathered in 5G networksdue to the densification process [4], heterogeneity in layersand technologies, the additional control and managementcomplexity in Network Functions Virtualisation (NFV) andSoftware Defined Network (SDN) architectures, the increas-ing relevance of Machine to Machine (M2M) and Internetof Things (IoT) communications, the increasing variety ofapplications and services, each with distinct traffic patternsand QoS/Quality of Experience (QoE) requirements, and soon.

HindawiMobile Information SystemsVolume 2017, Article ID 6740585, 16 pageshttps://doi.org/10.1155/2017/6740585

2 Mobile Information Systems

The main objective of network management is then tomake the network (1) more self-aware, by exploiting andanalysing data already generated by the network (this isexpected to drive network management from reactive to pre-dictive) and (2) more self-adaptive by exploiting intelligentcontrol decisions tools, offered by ML, based on learning andexperience.

In this paper, among many network management prob-lems, we focus on smart network planning, which givesparticular emphasis to the QoS offered to the users and theresources used by the operator to offer it. We believe that theuse of smart network planning tools is crucial and inevitablefor operators running multi-RAT, multivendor, multilayernetworks, where an overwhelming number of parametersneeds to be configured to optimise the network performance.The state of the art in network planning, in literature, andin the market offers a wide range of platforms systems andapplications proposed by the research community and/ororiented to the industry. From an industry perspective, themarket of commercial planning tools aims at providing acomplete set of solutions to design and analyse networks[5–8]. For instance, in [9], the authors present an open-source network planning tool that includes different planningalgorithms to analyse the network under different failuresand energy efficiency schemes. However, these works ingeneral focus on several configuration scenarios, RF coverageplanning, network recovery test, traffic load analysis, andforecasting traffic, among others, and not directly on QoSoffered to end-users and the resources that mobile networkoperators need to offer. Other works are more targeted toQoS estimation, but not in the area of network planning.The literature already offers different works targeting theproblem of QoS prediction and verification, such as [10, 11].In our preliminary work [12], we focus on complexmultilayerheterogeneous networks where we predict QoS indepen-dently of the physical location of the UE, that is, basedon data learned throughout the whole network. Preliminaryresults show that by abstracting from the physical positionof the measurements we can provide high accuracies in theestimations of QoS in other arbitrary regions. Furthermore,the results presented in [13] show that data analysis achievesbetter performance in a reduced space rather than in theoriginal one.

This paper presents a smart network planning tool thatworks in two steps. First, it estimates the QoS at everypoint of the network based on user-collected measurements,which may have been taken at different time instants andanywhere in the heterogeneous network. We perform thisestimation through supervised learning tools. This results inan appropriately tuned QoS prediction model, which is thenintegrated in the next step. Second, we adjust the networkparameters in order to reach certain network objectives byevaluating different combinations and calculating the result-ing QoS based on the model of step 1. Network objectives areset in terms of PRB per Mb. We focus on this specific QoSindicator because it combines information that is relevant tothe operator (PRBs) and other that is relevant to the end-user(Mb of data) into a singlemetric.Therefore, theminimizationof this indicator allows serving users with an improved

spectral efficiency and offeredQoS.More specifically, to carryout this optimisation we take advantage of GAs, which arestochastic search algorithms useful to implement learningand optimisation tasks [14], and so are adequate for ourpurpose.

In order to evaluate the performance of the proposedscheme, we consider two use cases. The first one is ina densified indoor scenario, and the second one is in amore traditional macrocell scenario. In the first use case,we focus on how to plan a dense small cell deploymentinspired in a typical 3GPP dual stripe scenario [15], where theparameters to adjust are the number of small cells and theirposition. In the second use case, we deal with self-healingaspects in macrocellular scenarios. We focus on readjustingthe parameters of the surrounding cells to quickly solve anoutage problem by automatically adjusting their antenna tiltparameters.

We evaluate the performance of the proposed planningtool through a network simulation campaign carried outover the 3GPP compliant, full protocol stack ns-3 LENAmodule.We show that the proposedML-based network plan-ning, differently from other closed optimisation approaches,is extremely flexible in terms of application problem andscenario, and, in this sense, it is generic. Without loss of gen-erality, particular attention has been focused on how to plana dense 4G small cell deployment and how to quickly solvean outage problem. As can be seen, the approach is equallyvalid and shows good performance in both studied use cases,characterized by very different optimisation problems andscenarios. As for the RAT, we have focused on 4G and itsevolution. The same technique can be applied also to 2G and3G technologies.

This paper is organised as follows. The general approachis described in Section 2. It describes the ML-based networkplanning tool, its main design principles, and algorithms weuse to build it. In Section 3, we discuss the specific designdetails and tuning of the QoS prediction model and we putall the pieces (i.e., prediction and optimisation) together. InSection 4, we present the details of the two use cases towhich the tool is applied, the simulation platform, and thesimulations results. Finally, Section 5 concludes the paper.

2. ML-Based Network PlanningTool Description

We propose designing a network planning tool, which worksin two steps. First, we propose to model the QoS through theanalysis of data extracted from the networks in the form ofmeasurements. This phase requires to first prepare the dataand then to analyse it. And second, we keep on adjusting theparameters and analysing the impact on QoS based on theprevious model. In this way, the performance of the networkis optimised tomeet certain operator targets. Figure 1 presentsthe different phases required by the proposed network plan-ning.

(1) Data Preparation.This process aims at transforming datainto a meaningful format for the estimation at hand. Thetarget is to integrate and prepare large volumes of data

Mobile Information Systems 3

Table 1: Relevant sources of information in mobile networks.

Source Information UsageControl info for short-termnetwork operation

Call/session setup, release, maintenance QoS, RRC, idleand connects mode mobility

Discarded afterusage

Control info for SONfunctions

Info on Radio link failure, intercell interference, UEmeasurements, MDT measurements, radio resource

status, cell load signalling, etc.

Heuristic algorithmstypically discard info

after

Management information forlong-term operation

Fault configuration, accounting, performance and securitymanagement (FCAPS), Operations and Management, e.g.,

in OAM aggregated statistics per eNB, on networkperformances, # users, successful/failed HO, active

bearers, information from active probing

Mainly used fortriggering engineer

intervention

Customer RelationshipInformation Complaints about bad service quality, churn info Only used by

customer service

Data preparation Data analysis Optimised network planning

(i) Reducing the highdimensional space

(ii) Building machinelearning models

Modelling QoS

(i) Collecting the data

(iii) Loading the data

(ii) Preprocessing thedata

(i) Optimisation ofdecision makingprocess

Figure 1: Architecture of the network planning tool.

over the network to provide a unified information base foranalysis. To do this, we follow the Extract-Transform-Load(ETL) process, which is responsible for pulling data out of thesource and placing it into a database. It involves 3 main steps:data extraction (E), whose objective is to collect the data fromdifferent sources; data transformation (T), which prepares thedata for the purpose of querying and analysis; data loading(L), which loads the data into the main target, most of thecases into a flat file. This process plays an important role forthe design and implementation of planning for future mobilenetworks. The objective is to create a data structure that isable to provide meaningful insights. Some examples of thekind of sources available in mobile networks are shown inTable 1 [3]. Here the data are classified based on the purposefor which they are generated in the network.The usage that isgiven nowadays in the network is also suggested in the lastcolumn. For the purpose of network planning, we plan toextract data reported by the UEs to the network in the formof UE measurements, in terms of received power, receivedquality, and offered QoS.

Once the data has been collected, we prepare the data forstoring, using the proper structure for the querying.

(2) Data Analysis.The objective of this process is to discoverpatterns in data that can lead to predictions about the future.This is done by finding this information/correlation amongthe radio measurements extracted from the network. We dothis by applying ML techniques.

(3) Optimised Network Planning.Theobjective of this processis to find the configuration parameters for the optimised

network planning based on the information extracted fromthe previous data analysis process. In the complex cellularcontext, we need to deal with several network characteristicsthat introduce high complexity, for example, the very largenumber of parameters, the strong cross-tier interference,fast fading, shadowing, and mobility of users. In orderto deal with these issues and to guarantee an appropriatenetwork planning, in this work, we propose to use GAs,which allow avoiding some of the problems of typical closedoptimisation techniques (e.g., computational intractability)in this complex and dynamic scenario. More specifically, theywork with chromosomes (i.e., a given combination of valuesfor the parameters to be tuned in the network). For each ofthe chromosomes, they calculate its fitness score based ona given objective function (in our case, the QoS predictedby the model) [17]. Then, they select the chromosomes withthe best fitness score and generate better child chromosomesby combining the selected ones and they keep on iteratinguntil the objective function of the chromosomes generatedreaches the performance target. In this way, GAs performparallel search from a population of points, which representthe values of the different parameters to be tuned in thescenario and, jointly with other techniques, they have theability to avoid local minima and use probabilistic searchrules.

2.1. Modelling the QoS. We estimate the QoS at every pointof the network based on measurements collected in differentmoments in time and from other regions of the heteroge-neous network, that is, based on the measurement historyof the network. To do this, we consider the data preparationand data analysis processes of the network planning tool.As mentioned previously, the objective of these 2 processesis to extract, prepare, and analyse the information alreadyavailable in the network to provide insightful informationfrom the analysis of it. In fact, in this kind of estimations,ML techniques can be very effective to make predictionsbased on observations. Therefore, we take advantage of MLtechniques to create a model that allows estimating the QoSby learning the relation between PHY layer measurementsand QoS measured at the UE. We propose using SL, sinceamong many applications it offers tools for estimation and

4 Mobile Information Systems

prediction of behaviours. In particular, we focus on a regres-sion problem, since we want to analyse the relationshipbetween a continuous variable (PRB per Mb) and the dataextracted from the network in the form of UEmeasurements.Many regression techniques have been developed in the SLliterature, and criteria to select the most appropriate methodinclude aspects such as the kind of relation that exists betweenthe input and the output or between the considered features,the complexity, the dimension of the dataset, the ability toseparate the information from the noise, the training speed,the prediction speed, the accuracy in the prediction, and soon. We focus on regression models, and we select the mostrepresentative approaches.We then use ensemble methods tosub-sample the training samples, prioritizing criteria such asthe low complexity and the high accuracy.

We then build a dataset of user measurements, based onthe same data contained in the Minimization of Drive Tests(MDT) database. The MDT is a standardized database usedfor different 3GPP use cases. The dataset contains trainingsamples (rows) and features (columns) and is divided into2 sets, the training set to train the model and the test setto make sure that the predictions are correct. That trainingdata develop a predictive model and evaluate the accuracyof the prediction, by inferring a function 𝑓(x), returningthe predicted output ��. The input space is represented byan 𝑛-dimensional input vector x = (𝑥(1), . . . , 𝑥(𝑛))𝑇 ∈ R𝑛.Each dimension is an input variable. In addition, a trainingset involves 𝑚 training samples ((x1, 𝑦1), . . . , (x𝑚, 𝑦𝑚)). Eachsample consists of an input vector x𝑖 and a correspondingoutput 𝑦𝑖 of one data point 𝑖. Hence, 𝑥(𝑗)𝑖 is the value of theinput variable𝑥(𝑗) in training sample 𝑖, and the error is usuallycomputed via |𝑦𝑖 − ��𝑖| or with the root mean square error.

In addition to regression analysis, we exploit Unsuper-vised Learning (UL) techniques for dimensionality reductionto filter the information in the data that is actually ofinterest (thus reducing the computational complexity) whilemaintaining the prediction accuracy. We do this by feedingthe data into an ensemble method consisting of Bagging/AdaBoost to manipulate the training samples. And after that,the SL techniques under evaluation are then applied. Detailsfor each step are given in the following, and the whole processis depicted in Figure 2.

(1) Collecting the Data. The data we take into account comesfrom mobile networks, which generate data in the form ofnetwork measurements, control, and management informa-tion (Table 1). As we mentioned previously, we focus onMDT functionality, which enables operators to collect UserEquipment (UE) measurements together with location infor-mation, if available, to be used for network management,while reducing operational costs.This feature has been intro-duced by 3GPP since Release 10; among the targets there arethe standardization of solutions for coverage optimisation,mobility, capacity optimisation, parametrization of commonchannels, and QoS verification. In this context, the literaturealready offers different solutions for this feature. An exampleof that can be observed in, [18, 19]. Since operators are alsointerested in estimating QoS performance, in Release 11, the

Building machine learning models

Reducing the high dimensional spaceFE-PCA/FS-SPCA

Bagging/Adaboost

Tuned model

Preprocessingthe data

Collectingthe data

Loading the data

Tuneregression model

SVM, DT)(e.g., k-NN, NN,

Figure 2: Modelling the QoS.

MDT functionality has been enhanced to properly dimensionandplan the network by collectingmeasurements of through-put and connectivity issues [20].Therefore, we collect for eachUE (1) the Reference Signal Received Power (RSRP) and (2)the Reference Signal Received Quality (RSRQ) coming fromthe serving and neighbouring eNBs. The size of the inputspace is [𝑙 × 𝑛]. The number of rows is the number 𝑙 of UEin the scenario, and the number of columns corresponds tothe number ofmeasurements 𝑛.The size of the output space is[𝑙×1], which corresponds to theQoS performance in terms ofthe PRB per transmitted Mb. These measurements gatheredat arbitrary points of the network throughout its lifetime areexploited to plan other arbitrary future deployments.

(2) Preprocessing the Data. In order to obtain a good per-formance during the evaluation, the input variables of thedifferent measurements must be in a similar scale and range.So, a common practice is to normalize every variable between−1 ≤ 𝑥(𝑗) ≤ 1 range and replace 𝑥(𝑗) with 𝑥(𝑗) − 𝜇(𝑗), over thedifference between themaximumandminimumvalues of theinput variables in the dataset, where 𝜇(𝑗) is the average of theinput variable (𝑗) in the dataset. The normalized data is thensplit into training and test set. We create a random partitionfrom the 𝑙 sets of input.This partition divides the observationsinto a training set of 𝑚 samples and a test set 𝑝 = 𝑙 − 𝑚samples. We randomly select approximately 𝑝 = (1/5) × 𝑙observations for the test set.

(3) Loading the Data. This process varies widely. As wementioned before, depending on the operator requirements,the data can be updated or new data can be added in ahistorical form at regular intervals. For our propose, in thisparticular work, we maintain a history of all changes to thedata loaded in the network.

(4) Reducing the High Dimensional Space. One of the prob-lems that mobile operators have to face in this kind of

Mobile Information Systems 5

networks is the huge amount of potential features we haveas input. In our particular case, features would substantiallyincrease as networks densify. Therefore, to deal with thehuge amount of features, we propose applying regressiontechniques in a reduced space, rather than in the originalone. The idea behind that comes from our previous work[12], in which we observed that using a high dimensionalspace did not result in the best performance. Therefore, wesuggest applying the regression analysis in a reduced space.As a result, we take advantage of dimensionality reductiontechniques to reduce the number of random variables underconsideration. These methods can be divided into FeatureExtraction (FE) and Feature Selection (FS) methods. Bothmethods seek to reduce the number of features in the dataset.FE methods do so by creating new combinations of features(e.g., Principal Component Analysis (PCA)), which projectthe data onto a lower dimensional subspace by identifyingcorrelated features in the data distribution. They retain the𝑐 Principal Components (PCs) with greatest variance anddiscard all others to preserve maximum information andretainminimal redundancy [21]. Correlation-based FSmeth-ods include and exclude features present in the data withoutchanging them. An example is Sparse Principal ComponentAnalysis (SPCA), which extends the classic method of PCAfor the reduction of dimensionality of data by adding sparsityconstraints on the input features; that is, by adjusting a set ofweights over the input features, it induces a matrix in whichmost of the elements are zero in the solution. In FS-SPCAthe sparsity is used to select the 𝑓 features that give the mostuseful information; as we increase the weight of SPCA, thenumber of features is reduced. That is, by adding sparsityconstraints on the input features, we promote solutions inwhich only a small number of input features capture most ofthe variance. Some preliminary work on these features waspresented in [13].

(5) Building the Machine Learning Models. We select somerepresentative regression models, prioritizing criteria such aslow complexity and high accuracy: (1) 𝑘-Nearest Neighbours(𝑘-NN), (2) Neural Networks (NN), (3) Support VectorMachines (SVMs), and (4) Decision Tree (DT), and weanalyse them by performing an empirical comparison ofthese algorithms, observing the impact on the prediction ofthe different kinds and amounts of UE measurements.

(1) 𝑘-NN can be used for classification and regression[22]. The 𝑘-NN method has the advantage of beingeasy to interpret and fast in training and parametertuning is minimal.

(2) NN is a statistical learning model inspired by thestructure of a human brain where the interconnectednodes represent the neurons to produce appropriateresponses. NN support both classification and regres-sion algorithms. NN methods require parameters ordistribution models derived from the dataset, and ingeneral they are susceptible to overfitting [23].

(3) SVMs can be used for classification and regression.The estimation accuracy of this method depends ona good setting of the regularization parameter 𝐶, 𝜖,

and the kernel parameters. This method in generalshows high accuracy in the prediction, and it can alsobehave very well with nonlinear problemswhen usingappropriate kernel methods [24].

(4) DT is a flow-chart model, which supports both clas-sification and regression algorithms. Decision treesdo not require any prior knowledge of the data, arerobust, and work well on noisy data. However, theyare dependent on the coverage of the training data,as is the case for many classifiers, and they are alsosusceptible to overfitting [25, 26].

In order to enhance the performance of each learningalgorithm described before, instead of using the same datasetto train, we can use multiple data sets by building anensemble method. Ensemble methods are learning modelsthat combine the opinions of multiple learners. This tech-nique has been investigated in a huge variety of works [27,28], where the most useful techniques have been found tobe Bagging and AdaBoost [29]. Bagging manipulates thetraining examples to generate multiple hypothesis. It runsthe learning algorithm 𝑛iter times, each one with a differentsubset of training samples. AdaBoost works similarly, but itmaintains a set of weights over the original training set andadjusts these weights by increasing the weight of samples thatare misclassified and decreasing the weight of examples thatare correctly classified [30].

In summary, once the extracted data has been processedand loaded into a file, it is fed into a dimensionality reduc-tion step, and subsequently into an ensemble step, whichmanipulates the training set by applying Bagging/AdaBoosttechniques.The learning algorithm is then applied to producea regression (see Algorithm 1).

(6) Evaluation of Accuracy. To evaluate the accuracy of themodel, the performance of the learned function is measuredon the test set. That is, we use a set of samples used to tunethe regression algorithm. For each test value, we predict theaverage QoS and compare it with the actual value in termsof the Root Mean Squared Error (RMSE) of the predictionas follows RMSE = √∑𝑝𝑖=1(𝑦𝑖 − ��𝑖)2/𝑝, where 𝑝 is the lengthof the test set, ��𝑖 indicates the predicted value, and 𝑦𝑖 is thetesting value of one data point 𝑖. In order to compare theRMSE with different scales, the input and output variablevalues are normalized as follows: NRMSE = RMSE/(𝑦max −𝑦min), where 𝑦max and 𝑦min represent the max andmin valuesin the output space 𝑌test of size [𝑝 × 1], respectively.2.2. Optimised Network Planning. As we mentioned before,the objective of this process is to close the loop by adjustingthe parameters, and so the network performance, through aGA. This results in a GA organised into different phases, asdepicted in Figure 3.

(1) Create S Feasible Solutions. We create a set S ={𝜃(1), . . . , 𝜃(𝑝size)} of feasible solutions (also called chromo-somes or individuals), where 𝑝size is the starting populationsize. We denote by 𝜃(𝑠) = (𝜃(𝑠)1 , . . . , 𝜃(𝑠)𝑀 ) the configuration

6 Mobile Information Systems

Input: Input space of size [𝑚 × 𝑛](𝑋train),output space of size [𝑚 × 1](𝑌train),Input space of size [𝑝 × 𝑛](𝑋test),output space of size [𝑝 × 1](𝑌test),number of iterations (𝑛iter)

Output: 𝑚𝑜𝑑𝑒𝑙// Given the training set (𝑋train, 𝑌train)𝑋train = {x1, . . . , x𝑚}𝑌train = {𝑦1, . . . , 𝑦𝑚}// Apply dimensional reduction step (if it is necessary)obtain 𝑐-dimensional input vectorApply regression analysis in a reduced space// Set up the data for Bagging/Adaboostfor 𝑘 = 1 to 𝑛iter do// Call regression algorithmmodel(𝑘) fl regression algorithm, namely 𝑘-NN, NN, SVM, DT// Predict the average QoSQoSpredicted fl predict(model(𝑘),𝑋test)Evaluate performances against the actual value (𝑌test) by NRMSE

end for// Result of training base learning algorithmmodel fl best(model)return (model)

Algorithm 1: Train regression algorithm.

Evaluate networkdeploymentperformance

Selection

Crossover

Mutation

yes

no

OutputDesired

performancereached?

Create S randominitial feasible

solutions

Figure 3: Optimised network planning.

parameters vector of an individual 𝑠, with 𝜃(𝑠)𝑗 denoting thevalue for the parameter of eNB 𝑗, for instance, the transmittedpower (𝜃(𝑠)𝑗txp ), the antenna tilt (𝜃(𝑠)𝑗tilt ), or the action to switchONor OFF (𝜃(𝑠)𝑗sc ) the 𝑗th small cell.

(2) Evaluate Network Deployment Performance.This functionis responsible for evaluating the network deployment perfor-mance. The objective is to calculate the objective function(fitness) of each individual. Given a particular configurationparameter vector 𝜃(𝑠), this function is responsible for return-ing the average offered QoS, that is, the average network per-formance predicted based on the measurement history of thenetwork.This function takes as input S feasible solutions andthe model produced by the regression algorithm discussed inthe previous section. That is, the output of Algorithm 1.

The behaviour of this module is as follows. In eachiteration, we collect 𝑋eval measurements in some arbitrary 𝑞points in the scenario. These measurements are obtained as aconsequence of having configured the parameters of the sce-nario according to each 𝜃 ∈ S. Based on these measurements,the QoS in the points of interest is predicted by using themodel generated in step 1 (see evalNetPerformance functionin Algorithm 2). Finally, for a given individual, the average ofthe predicted QoS at the 𝑞 points is returned as an indicatorof the performance of the system with this setup.

As wementioned before, our metric of interest is the PRBper transmitted Mb, since reducing it allows improving theQoS of the users while also improving the spectral efficiencyof the operator.Therefore, the fitness function aims at findingthe configuration of parameters for which the total PRBper transmitted Mb is minimized. The operator can targeta desired value for the total PRB per transmitted Mb, and,based on this, the network planning tool can decide whenthe objective has been achieved and interrupt the operation.Therefore, as in any GA, we try to improve the tuning ofparameters for each new generation through the processes

Mobile Information Systems 7

Input: Configuration parameters vector (𝜃),the tuned model (model)

Output: average QoSevalNetPerformance fl function(𝜃, model){// Call ns-3 network simulatorevaluate 𝜃(s) = (𝜃(𝑠)1 , . . . , 𝜃(𝑠)𝑀 )// Collect measurements at 𝑞 points in the scenario𝑋eval = {x1, . . . , x𝑞}// Predict the average QoSQoSpredicted fl predict(model, 𝑋eval)average QoSflmean(QoSpredicted)return (average QoS)}

Algorithm 2: Evaluate network deployment performance.

described below (i.e., selection, crossover, and mutation).And our measure of improvement is the average predictedQoS values for the individuals belonging to each generation.

(3) Selection. We select the best fit individuals for reproduc-tion based on their fitness, that is, this function generates anew population of individuals from the current population.This selection is known as elitist selection. Elitism copies thebest 𝑒 fittest candidates into the next generation.(4) Crossover. This function forms a new individual bycombining part of the genetic information from their parents.The idea behind crossover is that the new individual may bebetter than both parents if he takes the best characteristicsfrom each of them. We use an arithmetic crossover, whichcreates new individuals (𝛽) that are the weighted arithmeticmean of two parents. If 𝜃(𝑎) and 𝜃(𝑏) are the parents, thefunction returns as follows:

𝛽 = 𝛼 × 𝜃(𝑎) + (1 − 𝛼) × 𝜃(𝑏), (1)

where 𝛼 is a random weighting factor chosen before eachcrossover operation. That is, the arithmetic crossover oper-ator combines two parent chromosome vectors to produce anew individual, where 𝛼 is a random value between [0, 1].(5) Mutation. This function randomly selects a parameterbased on a uniform random value between a minimum andmaximum value. It maintains the diversity in the value ofthe parameters for subsequent generations. That is, it avoidspremature convergence on a local maximum or minimum.For that, we set to 𝛿 the probability of mutation in a feasiblesolution 𝜃(𝑠). If 𝛿 is too high, the convergence is slow orit never happens. Therefore, most of the times 𝛿 tends tobe small. Finally, we replace the worst fit population withnew individuals. The whole genetic algorithm is described inAlgorithm 3.

3. Design of the Network Planning Tool

This section presents a discussion on how to tune the modelfor the ultradense complex scenarios under consideration

Table 2: Overall model accuracy.

Approaches Regression model Bagging AdaBoost

(1) FE-PCA(1.1) 𝑘-NN 90.33% 91.98%(1.2) NN 93.44% 92.28%(1.3) SVM 94.70% 94.07%(1.4) DT 92.84% 93.60%

(2) FS-SPCA(2.1) 𝑘-NN 89.69% 90.88%(2.2) NN 92.41% 91.78%(2.3) SVM 93.62% 92.87%(2.4) DT 91.22% 92.08%

and then explains how this model is integrated into the globalnetwork planning tool.

3.1. Design and Evaluation of the QoS Modelling Component.This section compares the different options for each of thecomponents that define our QoS model, namely, dimension-ality reduction, ensemble methods, and regression methods.We consider different options for each of them: (1) FE-PCAand FS-SPCA for dimensionality reduction; (2) Bagging andAdaBoost as ensemble methods; and (3) 𝑘-NN, NN, SV, andDT as regression models, as anticipated in Section 2.1.

Table 2 summarizes the performance accuracy of eachlearning algorithm. Accuracy is measured as (1 −NRMSE) ×100. Based on this table, we discuss potential design deci-sions:

(1) FS-SPCA is a very useful approach if we are interestedin excluding features to retain minimal informationredundancy. This can be observed in Figure 4, whichshows the NRMSE as a function of different numberof features (𝑓) selected by the SPCA, and whereresults reveal that for 𝑓 = 92 features we obtain thelowest NRMSE value. As a consequence, we select the92 features that give us the most useful information.On the other hand, for the FE-PCA approach, weobserved in Figures 5 and 6 that we can obtain the70% of cumulative variance if we consider 𝑐 = 10PCs. That is, with the first 10 PCs we already capturethe main variability of the data. Therefore, on theone hand, FE-PCA would present less inputs to theSL step of the data processing chain at the cost of aprior processing of features. On the other hand, FS-SPCA would simplify the initial feature processing,since selected features are taken as they are, but thecost would be the higher needed storage.

In terms of the overall accuracy of the prediction, byimplementing the FS-SPCA approach, we can reducethe dimensionality of the data down to 𝑓 features andstill maintain almost the same accuracy with respectto FE-PCA approach. That is, if we consider the 92features, we lose only 1% of accuracy with respectto FE-PCA (i.e., 92 inputs versus 10 inputs to the SLstep).Therefore, it will depend on the specific networkand operator to select whether computing or storage

8 Mobile Information Systems

Input: Initial population (S),size of S (𝑝size),the tuned model (model),number of generations (𝑔),rate of elitism 𝑒,rate of mutation 𝛿

Output: solution 𝜃(∗)//Initializationfor 𝑖 = 1 to 𝑔 do

// Return the value of average QoS describing the fitness of each individual 𝜃 ∈ Sfor all 𝜃 ∈ S doaverage QoSfl evalNetPerformance(𝜃, model)

end for// Elitism based selectionselect the best 𝑒 solutions// Crossovernumber of crossover 𝑛𝑐 = (𝑝size − 𝑒)/2for 𝑗 = 1 to 𝑛𝑐 do

randomly select two solutions 𝜃(𝑎) and 𝜃(𝑏)

generate 𝜃(𝑐) by arithmetic crossover to 𝜃(𝑎) and 𝜃(𝑏)end for// Mutationfor 𝑗 = 1 to 𝑛𝑐 do

mutate each parameter of 𝜃(𝑗) under the rate 𝛿 and generate a new solutionend for// The GA keeps on iterating until the new solution reaches the performance target.

end forreturn the best solution 𝜃(∗)

Algorithm 3: GA scheme.

should be reduced and to decide whether the pricepaid in terms of accuracy is acceptable.

(2) When we build ensemble methods, SVM and NNregression models perform better when they arebagged than when they are boosted. This wasexpected, as Bagging combines many weak predictors(i.e., the predictor is only slightly correlated with thetrue prediction) to produce a strong predictor (i.e., thepredictor is well-correlated with the true prediction).This works well for algorithms where by changing thetraining set the output changes.

Theopposite behaviour can be found in 𝑘-NNandDTregression models; that is, when these algorithms areboosted themodels tend to provide better results thanwhen they are bagged.That is, in order to improve theperformance of AdaBoost, we use suboptimal values,for the number of the neighbours (𝑘) for 𝑘-NN, andthe number of trees (𝑇) for DT; that is, we use valuesthat are not that good, but at least better than random.Therefore, we make weak predictors by tuning theparameters to avoid the cases in which the regressorsrespond similarly [31]. This is not the case for SVMand NN. Since these learning algorithms do not havean input parameter that we can adjust to obtain aweak predictor without affecting the accuracy of themodel, the probability that these algorithms provide

better performance when they are boosted than whenare bagged is lower. Some initial results about how aSVM can be used as a weak predictor can be foundin [32]. Another option could be treating this kind ofalgorithm as a weak regressor by using fewer samplesto train, as stated in [33].

(3) By applying different regression models, and in par-ticular when the SVM regressionmodel is bagged, weimprove by 5% the overall accuracy of the predictionwith respect to the 𝑘-NN model. More specifically, interms of the NRMSE, 𝑘-NN exhibits an error of 10%,while SVM halves this value to 5%.

Results suggest that while all the regression modelsexhibit high accuracy, the bagged-SVM learning model isthe one that better fits our needs and exhibits more accuratepredictions.Therefore, in this work we focus on bagged-SVMto build the best model that fits the data.

3.2. Putting It Together: ML-Based Network Planning Tool. Insummary, the proposed tool exploits the power of both com-ponents working together, namely, ML-based QoSmodellingand GA-based network optimisation. The former is capableof extracting the most relevant information out of wealth ofoperational data measured in complex ultradense networksand make meaningful predictions for the operator and theend-user. This results in powerful network performance

Mobile Information Systems 9

80 90 100 110 120 13070Number of features

0.064

0.070

0.076

NRM

SE

Figure 4: NRMSE, a function of different number of featuresselected by the SPCA.

30

40

50

60

70

80

90

100

Cum

ulat

ive v

aria

nce (

%)

200 400 600 800 1000 12000PCs

Figure 5: Cumulative contribution of each PC to the original data’svariance.

4 6 8 102PCs

0

100

200

300

Varia

nces

Figure 6: Variability of the data set as a function of the 𝑐 = 10 PCs.

prediction models that are fed into the latter, which exploresand finds the best combination of parameters for configuringthe network elements, where best means the one that givesthe best predicted QoS according to the learned model. Thewhole network planning process is depicted in Figure 7.

4. Performance Evaluation of the NetworkPlanning Tool

In order to evaluate the performance of the proposed scheme,we consider two cases of study.

(A) Case Study #1: Deployment Planning in a Dense Small CellScenario. In this use case, we exploit the experience gained

throughout the network to properly dimension and deploy(i.e., locate) the small cells in an indoor deployment.The goalis to improve the QoS offered to end-users while increasingthe spectral efficiency by reducing the PRBs used per Mbtransmitted.

(B) Case Study #2: Self-Healing to Compensate for Faults.Cell Outage Compensation (COC) is applied to alleviate theoutage caused by the loss of service from a faulty cell. For thisuse case, an adequate reaction is vital for the continuity ofthe service. As a result, vendor specific Cell Outage Detection(COD) schemes have also to be designed [34–36]. In this caseof study we assume the outage has already been detected, andwe focus on readjusting the network planning (antenna tilt)to solve the outage problem.

The planning tool aims at guaranteeing that the networkmeets the operator’s needs. We proceed to design the appro-priate planning tool by defining the following aspects:

(1) Data: we define the radio measurements that weextract and analyse from the network.

(2) Parameter: we define the network parameters that weaim to tune.

(3) Action: we define the possible actions to take in orderto optimise network performance.

(4) Objective: we define the system level target.

Table 3 presents this information for each use case understudy.

The results of the use cases of application are described inthe rest of the section.Wefirst present the simulation scenarioand then the simulation results for each use case.

(A) Case of Study #1: Deployment of Indoor Small Cells. Weaim at providing a network planning of a small cell indoordeployment to improve the QoS offered to end-users andto increase the resource efficiency (in PRB per Mb) of ourplanning.

(1) Simulation Scenario. The scenario that we set up consistsof 1 Enhanced Node Base station (eNB), with 3 sectors. Weneed to plan the deployment of the small cell network definedas the standard dual stripe scenario based on 1 block of 2buildings [37]. The building has 1 floor, with 20 apartments,which results in 40 apartments, as depicted in Figure 8. Weconsider that 1 small cell is located in each apartment andthe planning will decide which one will be switched ON orOFF.The parameters used in the simulations and the learningparameters are given in Table 4.

(2) Simulation Results. The simulation starts with an initialdeployment, where each apartment of the building has ran-domly deployed a small cell. The Signal to Interference andNoise Ratio (SINR) at each point of the scenario, obtainedthrough this initial deployment, is depicted in Figure 9. Theidea is to determine themost effective number and location ofsmall cells by evaluating the performance of each individual𝜃sc configuration. The configuration 𝜃sc is represented by abinary string, with dimension𝑁sc. Each element in the binary

10 Mobile Information Systems

Building machine learning models

Reducing high dimensional spaceFS-SPCA

Bagging

Preprocessingthe data

Collectingthe data

Loading the data

Evaluate networkdeploymentperformance

Selection

Crossover

Mutation

yes

no

OutputDesired

performancereached?

Create S randominitial feasible

solutions

Optimised network planning

Subsets oftraining samples

Tune SVMusing grid search

Data analysis

Data preparation

Tune

d m

odel

Figure 7: Bagged-SVM network planning tool architecture.

Table 3: Information relevant for the network planning tool.

Information Case study #1 Case study #2Data RSRP and RSRQ coming from the serving and neighbouring

cellsRSRP and RSRQ coming from the serving and

neighbouring cells

Parameter 𝜃sc = (𝜃sc1 , . . . , 𝜃sc𝑁sc ), which is a binary vector that denotes ifthe small cells are switched ON or OFF

𝜃tilt = (𝜃tilt1 , . . . , 𝜃tilt𝑀), which is a vector that denotes thetilt value associated with each of the surrounding cells

that are trying to fill the outage gapAction Switch ON or OFF each small cell Adjusting the antenna tilt parameter

ObjectiveIncreasing the resource efficiency of our planning by

dimensioning the deployment and locating LTE indoor smallcells

Readjusting the network planning to quickly solve anoutage problem

×2

Figure 8: Scenario.

string represents if the 𝑗th small cell is ON or OFF. A value of1means that the power transmission of the 𝑗th small cell is setto 23 dBm, while a 0means that the 𝑗th small cell is switchedOFF. At each iteration (i.e., generation of the GA), the GA𝜃scprocess evaluates different strings, and when the evaluationis done, the GA𝜃sc process provides a new configuration of

−150−100−50

050

100150200250300350

y-c

oord

inat

e (m

)

−20

−10

0

10

20

30

40

50SI

NR

(dB)

100 150 200 250 300 35050x-coordinate (m)

Figure 9: Initial deployment, in which we consider one small celllocated in each apartment.

small cells. Therefore, the proposed network planning tooltakes advantage of the model generated through the data

Mobile Information Systems 11

Table 4: Case of study #1: simulation parameters.

Parameter ValuePropagation loss model HybridBuildingsShadow fading Log-normal, std = 8 dBScheduler Proportional Fair (PF)AMC model LteAmc::MiErrorModel

Transport protocol User Datagram Protocol(UDP)

Traffic model Constant bit rateLayer link protocol Radio Link Control (RLC)

Mode Unacknowledged Mode(UM)

Macro cell scenarioNumber of eNBs 1 site with 3 cellseNB Tx power 46 dBm

Small cell scenarioInitial number of small cells (𝑁sc) 40Small cell Tx power 23 dBm

LTEBandwidth 5MHzNumber of RBs 25TTI 1ms

GAType binary-valuedSize of S (𝑝𝑠𝑖𝑧𝑒) 100Elitism 𝑒 2Mutation change 𝛿 0.1

Bagged-SVMNumber of iteration (𝛾) 1000Epsilon 𝜖 0.1

Kernel Radial Basis Function(RBF)

Simulation time 0.25 s

preparation and data analysis processes to estimate the QoSat any random point in the scenario given a certain 𝜃sc. Then,it learns online through the optimised network planning toolthe 𝜃sc

(∗) best configuration to the small cell deployment byimproving the configuration with each new generation (seeFigure 10). This process is referred hereafter as GA𝜃sc .

We evaluate the system performance of each deployment,represented by a binary string of dimension𝑁sc, on the ns-3LENA module. When we get a new configuration from theGA𝜃sc , we implement it in the system simulator and evaluateit again, until the network deployment reaches the networkperformance target set by the operator.

Figures 11 and 12 show the fitness of the best individualsfound in each generation (Best) and the mean of the fitnessvalues across the entire population (Mean), in terms ofPRB/Mb and Average Throughput, respectively. Figure 11depicts the time evolution of the average PRB per Mb in thescenario. We observe that as the generations proceed and the

−150−100−50

050

100150200250300350

y-c

oord

inat

e (m

)

−20

−10

0

10

20

30

40

SIN

R (d

B)

100 150 200 250 300 35050x-coordinate (m)

Figure 10: Final deployment, in which the GA𝜃sc

scheme finds thenumber of small cells to deploy.

BestMean

0.4

0.6

0.8

1

1.2

Aver

age P

RB p

er M

b

20 40 60 80 1000Generations

Figure 11: Evolution of the average PRB per Mb.

GA𝜃sc evolves, the PRB/Mb decreases as it was expected, andso the efficiency of the planning increases because the averagethroughout increases and the PRBs consumed decrease. Thatis, at the 60th generation, the GA𝜃sc reaches the best spectralefficiency for the planning. Figure 12 shows the evolution ofthe GA𝜃sc scheme in terms of the average throughput in thewhole network. We observe that for each new generation theGA𝜃sc scheme finds the number and location of small cellsthat maximise the throughput. Figure 10 shows the resultingSINR map of the deployment. It can be seen that the SINRin the represented region of the network is also globallyimproved.

A comparison of different planning schemes is shownin Figure 13. This figure shows the SINR performance forthree deployments: (1) one where all the small cells are ONand there is a small cell in each apartment (tagged as initialdeployment), (2) the proposedGA𝜃sc approach (tagged asfinaldeployment), which results in 28 deployed small cells, and (3)a benchmark deployment based on a greedy algorithm. Thegreedy algorithm searches for the best deployment by testinga certain number 𝑚subset of string vector configurations,defining the number and position of switched ON nodes.For each configuration, the greedy algorithm computes theQoS and then it selects the configuration that provides the

12 Mobile Information Systems

BestMean

25

26

27

Aver

age T

HR

(Mbp

s)

20 40 60 800Generations

Figure 12: Evolution of the average throughput in the wholenetwork.

Final deploymentInitial deploymentGreedy deployment

0 10 20 30−10SINR (dB)

0

0.5

1

CDF

Figure 13: CDF of the SINR (dB) in the building.

best QoS results. The argument 𝑚subset is discussed in [38].It offers a tradeoff between a high number of configurations,which would result in high computational effort, and lownumber of configurations, which may result in local optima.We have tested different values against performances and wefinally selected the 𝑚subset = 10000. While the greedy searchalgorithm rarely outputs optimal solutions, it often providesreasonable suboptimal solutions [39]. This can be observedin Figure 13, where the number of small cells in the initialdeployment is 40, in the greedy deployment it is 33, and whenapplying our GA𝜃sc scheme it is 28. We observe in this figurethat, in general, the GA𝜃sc tends to work more effectively,since it provides improved QoS while deploying a reducednumber of small cells. The reason for this is that the GA app-roach makes a much deeper estimation of the state of theenvironment through the regression analysis and counts ona more sophisticated combinatorial search scheme, basedon the genetic approach, than the greedy scheme, whichperforms a limited search andmay be trapped at local optima.

Active cell

Faulty sector

Neighbouring cell

eNB1

eNB2

eNB4

eNB3

×2

Figure 14: Scenario.

(B) Case of Study #2: Self-Healing to Compensate for Faults.Aswe alreadymentioned, in this case of application, we focus onreadjusting the network planning to quickly solve an outageproblem by adjusting the antenna tilt parameter. To evaluatethis, we generate a sector fault in a typical macrocellulardeployment. That is, during a certain period of time, a sectoris not able to offer service to its users.Therefore, the proposednetwork planning tool allows setting the antenna tilt for eachcompensating sector to automatically alleviate the outagecaused by the loss of service from a faulty sector [40].

(1) Simulation Scenario.We consider a Long Term Evolution(LTE) cellular network composed of a set of M eNBs. TheM eNBs form a regular hexagonal network layout withintersite distance𝐷 and provide coverage over the entire net-work. We assume that a sector in the scenario is down (seeFigure 14). The parameters to tune are the antenna tilts ofthe cells neighbouring the affected sector. In particular, thesurrounding 𝑀 cells automatically and continuously adjusttheir antenna tilt until the coverage gap is filled. The verticalradiation pattern of a cell sector is obtained according to [41],where the gain in the horizontal plane is given by

𝐺ℎ (𝜙) = max{−12( 𝜙HPBWℎ

) , SLLℎ} . (2)

Here 𝜙 is the horizontal angle relative to the maximumgain direction, HPBWℎ is the half power beam-width forthe horizontal plane, and SLLℎ is the side lobe level for the

Mobile Information Systems 13

horizontal plane. Similarly, the gain in the vertical plane isgiven by

𝐺V (𝜌) = max{−12((𝜌 − 𝜃tilt)HPBWV

) , SLLV} . (3)

All eNBs in the scenario have the same antenna model where𝜌 is the vertical angle relative to themaximum gain direction,𝜃tilt is the tilt angle, HPBWV is the vertical half power beam-width, and SLLℎ is the vertical side lobe level. Finally, the twogain components are added by

𝐺 (𝜙, 𝜌) = max {𝐺ℎ (𝜙) + 𝐺V (𝜌) , SLL0} + 𝐺0, (4)

where SLL0 is an overall side lobe floor and 𝐺0 the antennagain. We consider a cellular network whose system perfor-mance has been evaluated on the 3GPP-compliant ns-3 LENAmodule. The parameters used in the simulations and thelearning parameters are given in Table 5. The macrocell sce-nario that we set up consists of 4 eNB with 3 sectors each,which results in 12 cells, as depicted in Figure 14.

Therefore, each feasible solution corresponds to a vectorof𝑀 = 6 tilt values (i.e., each tilt value is associated with oneof the surrounding cells that are trying to fill the outage gap).

(2) Simulation Results. We analyse performance results obt-ained through the network planning tool described in Sec-tion 2. Figures 15–17 show the fitness of the best individualfound in each generation (Best) and the mean of the fitnessvalues across the entire population (Mean). We observe that,for each generation, the population (i.e., the possible com-binations of antenna tilts) tends to get better as generationsproceed. Figure 15 depicts the time evolution of the PRBper offered Mb in the scenario during the evolution of theGA𝜃tilt . We observe that as the GA𝜃tilt evolves, the efficiency ofthe planning increases. That is, as the generations proceed,the GA𝜃tilt finds the configuration parameter vector thatminimizes the value of PRBs per transmitted Mb.

In order to analyse the overall impact of the outage andits evolution for each new generation, we depict the averagethroughput for neighbouring cells only (Figure 16) and for thewhole network (Figure 17).

More specifically, Figure 16 depicts the time evolutionof the average throughput of the compensating sectors. Thatis, it shows the performance in terms of the throughput ofthe 6 neighbouring cells, which adjust their antenna tilt inorder to compensate for the faulty sector. Figure 17 describesthe average throughput of the whole network. From thisfigure, we observe that the GA𝜃tilt scheme achieves at the 30-th generation 23Mbps of average throughput in the wholenetwork, while, in the neighbouring cells (Figure 16), theachieved throughput is only 15Mbps, which is reasonable,due to the challenging service conditions in this area.

Finally, the GA𝜃tilt scheme is compared in Figure 18with the self-organised Reinforcement Learning- (RL-) basedapproach for COC proposed in [42], where in order to designthe self-healing solution (AC𝜃tilt ), the antenna tilt is adjusted.The considered RL approach is an Actor Critic algorithm,which is already proven to outperform different solutions for

BestMean

0.9

1

1.1

Aver

age P

RB p

er M

b

10 20 30 40 500Generations

Figure 15: Evolution of the average PRB per Mb.

BestMean

0

5

10

15Av

erag

e TH

R (M

bps)

10 20 30 40 500Generations

Figure 16: Evolution of the average throughput in the neighbouringcells.

BestMean

10

15

20

25

Aver

age T

HR

(Mbp

s)

10 20 30 40 500Generations

Figure 17: Evolution of the average throughput in the wholenetwork.

14 Mobile Information Systems

Table 5: Case of study #2: Simulation parameters.

Parameter ValuePropagation loss model HybridBuildingsShadow fading Log-normal, std = 8 dBScheduler PFAMCmodel LteAmc::MiErrorModelTransport protocol UDPTraffic model Constant bit rateLayer link protocol-mode RLC-UMMacro cell scenario

MMacroEnbSites 4 (each site has 3 cells)Number of cells 12eNB Tx power 46 dBm

LTEIntersite distance𝐷 500mBandwidth 5MHzNumber of RBs 25

Antenna parametersHorizontal angle 𝜙 −180∘ ≤ 𝜙 ≤ 180∘HPBW Vertical 10∘ : Horizontal 70∘Antenna gain 𝐺0 18 dBiVertical angle 𝜃 −90∘ ≤ 𝜙 ≤ 90∘SLL Vertical −18 dB :Horizontal −20 dBSLL0 −30 dB

GAType real-valuedTilt values (min–max) 0∘–15∘

Size of S (𝑝𝑠𝑖𝑧𝑒) 100Elitism 𝑒 2Mutation change 𝛿 0.1

Bagged-SVMNumber of iteration (𝛾) 1000Epsilon 𝜖 0.1Kernel RBFSimulation time 0.25 s

COC available in literature in [42]. Therefore, we considerthis to be a good benchmark for comparison.

Figure 18 depicts the CDF of the SINR of the network.We assume the user is out of outage when its SINR is abovethe threshold of −6 dB, as explained in [42]. Therefore, inthis figure, we observe that AC𝜃tilt is able to recover 95%of UE, while the GA𝜃tilt is able to recover all the UE. Weobserve in this figure that, in general, the GA𝜃tilt tends to workeffectively, since it maintains the diversity in the values takenby the parameters during the generations and itmakes amuchdeeper estimation of the state of the environment through theregression analysis approach than the AC𝜃tilt scheme, whichconsiders only the SINR and CQI feedback from the UE todetermine the state of the environment and to estimate thegeneral behaviour of the network.

From these figures, we observe that using the planningtool to adjust the antenna tilt parameter, we are able to

0

0.5

1

CDF

−5 0 5 10 15 20−10SINR (dB)

AC𝜽tiltGA𝜽tilt

Figure 18: CDFof the average SINRvalues in the faulty sector by twodifferent COC approaches.The GA

𝜃tiltbased scheme is compared to

a RL approach presented in [16].

alleviate the outage caused by the loss of service from a faultysector.

5. Concluding Remarks

In this paper, we have defined a methodology and built atool for smart and efficient network planning that exploitsand learns from the operational history reflected in the mea-surements gathered anywhere and at any time throughoutthe network. To build the QoS prediction model based onthis historical data, we collect UE measurements accordingto 3GPPMDT functionality and we apply regression analysistechniques to estimate the Physical Resource Blocks (PRB)per transmitted Mb. This model is then used as objectivefunction in a genetic algorithm (GA𝜃). By doing this, onecan observe that each new iteration increases the efficiency ofresource usage in the network while improving the through-put offered to the user.

To demonstrate the flexibility of the proposed smart ML-based planning tool, in this paper, we have decided to apply itto two very different use cases and scenarios: (1) deploymentof indoor small cells and (2) compensation of sector faults ina traditional macrocell deployment. For use case #1, resultsdemonstrate the ability of the proposed scheme to deploysmall cells in a network in such a way that the averagethroughput is increased while the PRBs consumed per Mbtransmitted decrease, hence improving the spectral efficiency.Regarding use case #2, results demonstrate the ability of theproposed scheme to compensate 100% of outage users inthe scenario and to offer them service. We have comparedthe performance of our approach in the context of the twoproposed cases of study to state-of-the-art solutions based ona greedy algorithm for case 1 and reinforcement learning forcase 2. Our scheme outperformed state-of-the-art solutionsin both cases. We believe that the same technique can besuccessfully applied to many other planning problems ofinterest for operators.

Mobile Information Systems 15

As a futurework, wewill analyse the planning of scenarioswhere a huge number of devices have to be provided withservice. This is expected to benefit the performance of theestimation, since the data base will be enriched by manymeasurements, which is the basis of the intelligence of theapproach.

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper.

Acknowledgments

The research leading to these results has received fundingfrom the Spanish Ministry of Economy and Competitive-ness FPI Research Programme (BES-2011-047309) under theGrant TEC2010-21100. The work of J. Moysen is also fundedby 5GNORM Project (TEC2011-29700-C02-01).

References

[1] 3GPPon track to 5G, http://www.3gpp.org/ news-events/3gpp-news/1787-ontrack 5g.

[2] 5GPPP.The5GInfrastructurePublic Private Partnership, https://5g-ppp.eu/.

[3] N. Baldo, L. Giupponi, and J. Mangues-Bafalluy, “Big dataempowered self organized networks,” in Proceedings of the 20thEuropean Wireless Conference (EW ’14), pp. 181–188, Barcelona,Spain, May 2014.

[4] B. Romanous, N. Bitar, A. Imran, and H. Refai, “Network den-sification: challenges and opportunities in enabling 5G,” in Pro-ceedings of the IEEE 20th International Workshop on ComputerAided Modelling and Design of Communication Links and Net-works (CAMAD ’15), Guildford, United Kingdom, September2015.

[5] RiverbedOPNETNetOne, https://support.riverbed.com/content/support/software/steelcentral-npm/net-planner.html.

[6] Cariden MATE Design, Adquired by CISCO, http://www.cisco.com/c/dam/en/us/td/docs/net mgmt/wae/6-1/design/user/guide/MATE Design User Guide.pdf.

[7] RSoft Design Group. Metrowand, https://optics.synopsys.com/rsoft/pdfs/RSoftProductCatalog.pdf.

[8] CelPlan. CelTrace TM Wireless Global Solutions, http://www.celplan.com/products/indoor/celtrace.asp.

[9] Net2Plan, “The open-source network planner,” http://www.net-2plan.com/publications.php.

[10] F. Chernogorov and T. Nihtila, “QoS verification for minimiza-tion of drive tests in LTE networks,” in Proceedings of the IEEE75th Vehicular Technology Conference (VTC ’12), pp. 6–9, IEEE,Yokohama, Japan, May 2012.

[11] F. Chernogorov and J. Puttonen, “User satisfaction classificationfor minimization of drive tests QoS verification,” in Proceedingsof the IEEE 24th Annual International Symposium on Personal,Indoor, and Mobile Radio Communications (PIMRC ’13), pp.2165–2169, London, UK, September 2013.

[12] J. Moysen, N. Baldo, L. Giupponi, and J. Mangues-Bafalluy,“Predicting QoS in LTE HetNets based on location-ind-ependent UE measurement,” in Proceedings of the 20th IEEEInternational Workshop on Computer Aided Modelling and

Design of Communication Links and Networks, Guildford, UK,2015.

[13] J. Moysen, L. Giupponi, and J. Mangues-Bafalluy, “On thepotential of ensemble regression techniques for future mobilenetwork planning,” in Proceedings of the IEEE Symposium onComputers and Communication (ISCC ’16), pp. 477–483, IEEE,Messina, Italy, June 2016.

[14] D. E. Goldberg,Genetic Algorithms in Search, Optimization, andMachine Learning, Addison-Wesley Longman, Boston, Mass,USA, 1989.

[15] 3GPP, “Radio performance and protocol aspects (system)—RFparameters and BS conformance,” Tech. Rep. TSG RAN WG4R4-092042, 2009.

[16] J. Moysen and L. Giupponi, “A reinforcement learning basedsolution for self-healing in LTE networks,” in Proceedings ofthe 80th IEEE Vehicular Technology Conference (VTC ’14-Fall),September 2014.

[17] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press,Cambridge, Mass, USA, 1996.

[18] J. Turkka, F. Chernogorov, K. Brigatti, T. Ristaniemi, and J. Lem-piainen, “An approach for network outage detection fromdrive-testing databases,” Journal of Computer Networks andCommunications, vol. 2012, Article ID 163184, 13 pages, 2012.

[19] F. Chernogorov, S. Chernov, K. Brigatti, and T. Ristaniemi,Data Mining Approach to Detection of Random Access SleepingCell Failures in Cellular Mobile Networks, Computer Science,Networking and Internet Architecture, 2015.

[20] J. Johansson,W.A.Hapsari, S. Kelley, andG. Bodog, “Minimiza-tion of drive tests in 3GPP (Release 11),” IEEE CommunicationsMagazine, 2012.

[21] S. T. Roweis, EM Algorithms for PCA and SPCA, Advances inNeural Information Processing Systems, The MIT Press, 1998.

[22] N. S. Altman, “An introduction to kernel and nearest-neighbornonparametric regression,” The American Statistician, vol. 46,no. 3, pp. 175–185, 1992.

[23] M. C. Bishop, Pattern Recognition and Machine Learning, Busi-nes Dia, Llc. Springer Science, 2006.

[24] A. J. Smola and B. Scholkopf, “A tutorial on support vectorregression,” Statistics and Computing, vol. 14, no. 3, pp. 199–222,2004.

[25] J. R. Quinlan, Induction of Decision Trees. Machine Learning,Kluwer Academic, 1986.

[26] L. Rokach and O. Maimon, Data Mining with Decision Trees:Theory and Applications, World Scientific, 2008.

[27] D. Opitz and R. Maclin, “Popular ensemble methods: an emp-irical study,” Journal of Artificial Intelligence Research, vol. 11, pp.169–198, 1999.

[28] L. Rokach, “Ensemble-based classifiers,” inArtificial Intelligence,pp. 1–39, 2010.

[29] T. G. Dietterich, “An experimental comparison of three meth-ods for constructing ensembles of decision trees: bagging,boosting, and randomization,” Machine Learning, vol. 40, no.2, pp. 139–157, 2000.

[30] T. G. Dietterich, Machine-Learning Research: Four CurrentDirections, American Association for Artificial Intelligence,1997.

[31] X. Li, L. Wang, and E. Sung, “AdaBoost with SVM-based com-ponent classifiers,” Engineering Applications of Artificial Intelli-gence, vol. 21, no. 5, pp. 785–795, 2008.

16 Mobile Information Systems

[32] E. Mayhua-Lopez, V. Gomez-Verdejo, and A. R. Figueiras-Vidal, Boosting Ensembles with Subsampled LPSVM Learners,Universidad Carlos III de Madrid (UC3M), 2013.

[33] E. Garcıa and F. Lozano, “Boosting Support Vector Machines,”in Proceedings of the 5th International Conference on MachineLearning and Data Mining in Pattern Recognition, pp. 153–167,Leipzig, Germany, July 2007.

[34] O. Onireti, A. Zoha, J. Moysen et al., “A cell outagemanagementframework for dense heterogeneous networks,” IEEE Transac-tions on Vehicular Technology, vol. 65, no. 4, pp. 2097–2113, 2016.

[35] E. J. Khatib, R. Barco, P. Munoz, I. D. La Bandera, and I. Ser-rano, “Self-healing in mobile networks with big data,” IEEECommunications Magazine, vol. 54, no. 1, pp. 114–120, 2016.

[36] A. Gomez-Andrades, P.Munoz, I. Serrano, and R. Barco, “Auto-matic root cause analysis for LTE networks based on unsuper-vised techniques,” IEEE Transactions on Vehicular Technology,vol. 65, no. 4, pp. 2369–2386, 2016.

[37] 3GPP, “Technical specification group radio access network;Evolved universal terrestrial radio access (E-UTRA); Furtheradvancements for E-UTRA physical layer aspects,” Tech. Rep.TR 36.814, 2010.

[38] P. Romanski and L. Kotthoff, “Package FSelector,” https://cran.r-project.org/web/packages/FSelector/FSelector.pdf.

[39] M. Hazewinkel, “Preface,” Discrete Mathematics, vol. 227-228,pp. 1–4, 2001.

[40] J. Moysen, L. Giupponi, and J. Mangues-Bafalluy, “A machinelearning enabled network planning tool,” in Proceedings of the27th IEEE Personal Indoor and Mobile Radio Communications(PIMRC ’16), Valencia, Spain, 2016.

[41] F. Gunnarsson, M. N. Johansson, A. Furuskar et al., “Downtil-ted base station antennas—a simulation model proposal andimpact on HSPA and LTE performance,” in Proceedings of the68th IEEE Vehicular Technology Conference (VTC Fall ’08), pp.1–5, Calgary, Canada, September 2008.

[42] J. Moysen and L. Giupponi, “A reinforcement learning basedsolution for self-healing in LTE networks,” in Proceedings of the80th IEEE Vehicular Technology Conference (VTC Fall ’14), pp.1–6, IEEE, Vancouver, Canada, September 2014.

Submit your manuscripts athttps://www.hindawi.com

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttp://www.hindawi.com

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation http://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Applied Computational Intelligence and Soft Computing

 Advances in 

Artificial Intelligence

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Modelling & Simulation in EngineeringHindawi Publishing Corporation http://www.hindawi.com Volume 2014

The Scientific World JournalHindawi Publishing Corporation http://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014