Load balancing and handoff in LTE


Load balancing and handover joint optimization in LTE networks using Fuzzy Logic and Reinforcement Learning

Article in Computer Networks · November 2014

Impact Factor: 1.26 · DOI: 10.1016/j.comnet.2014.10.027


3 authors:

P. Muñoz, University of Malaga

Raquel Barco, University of Malaga

Isabel de la Bandera Cascales, University of Malaga



Computer Networks 76 (2015) 112–125


Load balancing and handover joint optimization in LTE networks using Fuzzy Logic and Reinforcement Learning

P. Muñoz ⇑, R. Barco, I. de la Bandera

University of Málaga, Communications Engineering Dept., Campus de Teatinos, 29071 Málaga, Spain

⇑ Corresponding author. Tel.: +34 952 134 164; fax: +34 952 132 027. E-mail addresses: [email protected] (P. Muñoz), [email protected] (R. Barco), [email protected] (I. de la Bandera).

http://dx.doi.org/10.1016/j.comnet.2014.10.027

1389-1286/© 2014 Elsevier B.V. All rights reserved.

Article info

Article history: Received 10 June 2014; received in revised form 29 October 2014; accepted 31 October 2014; available online 13 November 2014

Keywords: Load balancing; Handover; Self-organizing networks; Long-term evolution; Fuzzy logic; Reinforcement learning

Abstract

With the growing deployment of cellular networks, operators have to devote significant manual effort to network management. As a result, Self-Organizing Networks (SONs) have become increasingly important in order to raise the level of automated operation in cellular technologies. In this context, Load Balancing (LB) and Handover Optimization (HOO) have been identified by industry as key self-organizing mechanisms for the Radio Access Networks (RANs). However, most efforts have been focused on developing a stand-alone entity for each self-organizing mechanism, which will run in parallel with other entities, as well as designing coordination mechanisms in charge of stabilizing the network as a whole. Due to the importance of LB and HOO, in this paper, a unified self-management mechanism based on Fuzzy Logic and Reinforcement Learning is proposed. In particular, the proposed algorithm modifies handover parameters to optimize the main Key Performance Indicators related to LB and HOO. Results show that the proposed scheme effectively provides better performance than independent entities running simultaneously in the network.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

In recent years, cellular networks have experienced a large increase in size and complexity. As a result, mobile operators have focused attention on reducing the capital expenditures (CAPEX) and operational expenditures (OPEX) of their networks [1]. This has stimulated strong research activity in the field of Self-Organizing Networks (SON), a set of principles and concepts defined by the 3rd Generation Partnership Project (3GPP) for automating network management while improving network quality [2]. In the context of SON, certain functions have been identified as key enablers by the 3GPP, among which are Load Balancing (LB) and Handover Optimization (HOO). The former is an automated function whereby cells suffering occasional congestion can transfer load to neighbor cells that have spare resources, e.g. by adjusting mobility parameters. The latter is a solution for the automatic detection and correction of errors and suboptimal settings in the mobility configuration, which may lead to a degradation of user performance. Many efforts in the research community have been devoted to the so-called Mobility LB (MLB) and Mobility Robustness Optimization (MRO), for which the 3GPP has specified particular features [3]. Typically, these functionalities are implemented at a low level in the network architecture, meaning that they operate quickly (i.e. at time scales of the order of seconds or less) and are located in each base station of the access network. In contrast, little or no attention has been paid to LB and HOO at higher levels, e.g. at the level of the Operations, Administration, and Maintenance (OAM) system, whose functions typically operate more slowly (i.e. at time scales of the order of minutes or even hours) and are not necessarily located in the base stations (e.g. they can be located in a server on the core network). Thus, network management at this level copes with slower changes in the network, whose impact on performance can be even more important, since the underlying variations to be tracked are typically rather slow as well [4]. In addition, the data available in the OAM system is much more abundant than in the base stations, thereby allowing more efficient and powerful network management. As a result, the implementation of this kind of algorithm will provide great benefits and cost savings to operators.

As the deployment of stand-alone SON functions grows, the number of conflicts and dependencies between them increases. A conflict can occur if, for example, two individual SON functions optimize the same parameter with different goals at a network element [5]. As expected, conflicts may have a negative impact on network performance. The common solution in SON research has been to create an additional entity, usually called a coordinator, which manages the conflict. Typically, an entity causing a conflict is switched off or limited in its control strategy, e.g. by decreasing the allowed range, the maximum allowed step sizes or the periodicity at which parameter control takes place. SON coordination is a topic that has only recently been addressed in the literature. On the one hand, several studies aim to develop a functional framework for SON coordination [6–10]. On the other hand, further efforts have been devoted to specific solutions for the coordination of two or more SON functionalities [11]. Special attention has been devoted to the coordination of MLB and MRO, addressed by the SOCRATES project [12]. In particular, that study assumes the control parameters of the MLB and MRO algorithms to be independent of each other, i.e. the two algorithms do not tune the same parameters. While the MLB function adjusts the handover (HO) margin (HOM), the MRO function adjusts the Time-To-Trigger (TTT) and hysteresis parameters. The interactions exist because these two functions influence the same Key Performance Indicators (KPIs) that are used as input for the optimization algorithms. In [13], a constraint on the connection quality that is more restrictive than the one assumed in the SOCRATES project is considered. In this sense, the MLB function is restrained in favor of the HO performance optimization. In [14], to avoid the conflict between MLB and MRO, the HOM range of MLB is dynamically adjusted according to the TTT and hysteresis parameters, which are first adjusted considering the effect of the user speed.

Although coordinator-based schemes have been well accepted by the research community, they raise some issues. Specifically, the definition of operator policies becomes a complex task, since there is a trade-off between proper controllability and ease of use [10]. In addition, when limitations are applied to the control strategy (e.g. by restricting the step size), the optimal configuration may lie outside the space of possible solutions. Another problem is related to the prioritization of SON functions in a centralized coordination scheme, which is the typical implementation due to the required integration with a (centralized) legacy OAM system [15]. In this situation, the coordination entity has to process many parameter configuration requests, so that the risk of monopolization by high-priority functions is high. For this reason, the joint optimization of SON functions has also been addressed. In [16], the problem of coordinating capacity and coverage optimization with MLB is addressed. Instead of implementing an additional entity that coordinates the outcomes of each independent function, these functions are combined into one algorithm and the cellular network is then optimized towards a joint target. Similarly, in [17], instead of controlling the conflict between independent MRO and MLB functions, a joint optimization algorithm is proposed. Such an algorithm adjusts the same HO parameters for individual users (i.e. each user has individual values of the same HO parameters). This solution reduces unnecessary HOs for some users that should not be handed over to the neighbor cell. However, at the level of the OAM system, this feature is hard to implement, since statistics are rarely given at the per-user level, in addition to the high signaling cost that this kind of optimization would involve. In [18], the proposed MRO and MLB algorithm prioritizes the MRO part, since KPIs related to the connection quality (e.g. radio link failures) are considered first. However, other important KPIs from the MRO viewpoint, such as those associated with unnecessary HOs, are not taken into account in the study, which makes it more difficult to achieve optimal performance.

For all these reasons, this paper proposes a novel unified algorithm for both LB and HOO in Long-Term Evolution (LTE) networks. This algorithm is based on a Fuzzy System (FS) that tunes the handover (HO) parameters at the cell adjacency level to improve network performance. The FS is optimized by the Q-Learning algorithm, which drives it to select the most appropriate action for LB and/or HOO reasons. The decision of which action the FS should take depends on the past actions taken by the FS, whose impact on network performance was measured through the KPIs. With the proposed solution, the complexity of the SON coordination entity would be reduced, as it is freed from the coordination of two important SON functions. In addition, the proposed algorithm is expected to achieve better performance, as its space of candidate solutions is not as restricted as it would be if a coordinator-based scheme or some type of prioritization algorithm were used.

The rest of the paper is organized as follows. Section 2 formulates the problem and introduces the mobility algorithm in LTE networks and the system performance metrics. In Section 3, the design of the proposed FS as well as its optimization process is described. Section 4 presents the simulation setup and discusses the simulation results. Finally, Section 5 presents the main conclusions of the study.

2. System model

The HO is the procedure that preserves the connection when the user moves around the network. As LTE is being deployed with a frequency reuse of one (i.e. the same frequency is shared by all cells), the intra-frequency HO is very common in these networks. More specifically, the most widely used algorithm for the HO-triggering decision is the 3GPP A3 event [19]. Roughly, this algorithm triggers the execution of an HO if the neighbor cell becomes better than the serving cell by an offset during a specific time period determined by the TTT parameter. Formally, it is expressed as:

$$\mathrm{RSRP}_j > \mathrm{RSRP}_i + \mathrm{HOM}_{i \to j}, \tag{1}$$

where $\mathrm{RSRP}_i$ and $\mathrm{RSRP}_j$ are the averaged values of the Reference Signal Received Power (RSRP) measured for serving cell $i$ and target cell $j$, respectively, and $\mathrm{HOM}_{i \to j}$ is the HOM from cell $i$ to cell $j$. Note that the symmetric $\mathrm{HOM}_{j \to i}$ is also defined in the opposite direction of the adjacency (i.e. the pair of neighboring cells).

In contrast with HO-triggering decisions based on absolute comparisons (e.g. the serving/neighbor cell below/above a threshold), the A3 event consists of a relative comparison that simplifies the configuration of its parameters, since they are independent of the absolute received power levels, which may depend on diverse context factors. However, the HOM in Eq. (1) is broken down into several terms by the 3GPP, so that:

$$\mathrm{HOM}_{i \to j} = \mathrm{Hys} + \mathrm{Of}_i - \mathrm{Of}_j + \mathrm{Oc}_i - \mathrm{Oc}_j + \mathrm{Off}, \tag{2}$$

where $\mathrm{Hys}$ and $\mathrm{Off}$ are the hysteresis and offset parameters, respectively, for this event, $\mathrm{Of}_i$ and $\mathrm{Oc}_i$ are the frequency-specific and cell-specific offsets, respectively, for serving cell $i$, and $\mathrm{Of}_j$ and $\mathrm{Oc}_j$ are the frequency-specific and cell-specific offsets, respectively, for neighbor cell $j$. While only one value of $\mathrm{Hys}$ and $\mathrm{Off}$ corresponding to the A3 event can be used for all the cells and deployed frequencies in the network, $\mathrm{Oc}_i$ and $\mathrm{Oc}_j$ can be defined per cell and $\mathrm{Of}_i$ and $\mathrm{Of}_j$ can be defined per frequency layer. In addition, the definition of $\mathrm{Hys}$ implies the existence of another inequality in which this term has the opposite sign:

$$\mathrm{RSRP}_j < \mathrm{RSRP}_i - \mathrm{Hys} + \mathrm{Of}_i - \mathrm{Of}_j + \mathrm{Oc}_i - \mathrm{Oc}_j + \mathrm{Off}. \tag{3}$$

This inequality is called the leaving condition for this event. Assuming that the entering condition given by Eq. (1) was previously satisfied, the leaving condition must be satisfied to reset the TTT parameter. By optimizing $\mathrm{Hys}$, the impact of signal fluctuations on the handover process can be effectively reduced.
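The interplay between the entering condition, the leaving condition and the TTT can be illustrated with a minimal Python sketch. It is not taken from the paper: the class name `A3Trigger`, its fields, and the choice to express the TTT as a number of measurement periods are assumptions made only for illustration. The leaving threshold follows from Eq. (3), which is Eq. (1) with the sign of Hys flipped.

```python
from dataclasses import dataclass


@dataclass
class A3Trigger:
    """A3 entering/leaving logic for one (serving i, neighbor j) pair (hypothetical helper)."""
    hom_db: float      # HOM_{i->j} of Eq. (1), in dB
    hys_db: float      # Hys of Eqs. (2)-(3), in dB
    ttt_periods: int   # TTT as a count of measurement periods (assumption)
    running: bool = False
    elapsed: int = 0

    def update(self, rsrp_i_dbm: float, rsrp_j_dbm: float) -> bool:
        """Process one averaged RSRP pair; return True when the HO is triggered."""
        entering = rsrp_j_dbm > rsrp_i_dbm + self.hom_db  # Eq. (1)
        # Eq. (3) equals Eq. (1) with the sign of Hys flipped, hence HOM - 2*Hys.
        leaving = rsrp_j_dbm < rsrp_i_dbm + self.hom_db - 2.0 * self.hys_db
        if not self.running:
            # The timer starts only when the entering condition holds.
            self.running = entering
            self.elapsed = 0
        elif leaving:
            # The leaving condition resets the timer.
            self.running = False
            self.elapsed = 0
        if not self.running:
            return False
        self.elapsed += 1
        if self.elapsed < self.ttt_periods:
            return False
        self.running = False  # HO triggered; start over for the next event
        self.elapsed = 0
        return True
```

With `hys_db` set to zero, the entering and leaving thresholds coincide, which corresponds to the single-parameter model adopted later in this section.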

In general, the HOO function is directly related to the parameter $\mathrm{Hys}$, so that its optimization could only be performed at the event level. Conversely, the LB function is more related to HO parameters that are defined at the cell level (e.g. $\mathrm{Oc}_i$ and $\mathrm{Oc}_j$). In practice, different parameters are optimum for different cell pairs (e.g. due to the shadowing variations). Hence, the model adopted in this paper is based on a joint optimization of LB and HOO at the cell level. Specifically, this model requires only one formula (e.g. the entering condition given by Eq. (1)) and one parameter, denoted as HOM and defined per adjacency. The only condition is that both HOM values in the adjacency (i.e. $\mathrm{HOM}_{i \to j}$ and $\mathrm{HOM}_{j \to i}$) are simultaneously tuned to perform the joint optimization of LB and HOO. Note that this model complies with the 3GPP specifications if the parameter $\mathrm{Hys}$ is set to zero and the rest of the parameters are grouped into a single parameter defined per adjacency.

Fig. 1(a). Adjustment of HOM for MRO purposes (RSRP of cell i and cell j, with HOM_ji and HOM_ij marked).

To facilitate the joint optimization, the parameters need to be expressed in a more understandable way, according to the following relationship in an adjacency $x$:

$$\begin{cases} H^{(x)} + O^{(x)} = \mathrm{HOM}^{(x)}_{i \to j}, \\ H^{(x)} - O^{(x)} = \mathrm{HOM}^{(x)}_{j \to i}, \end{cases} \tag{4}$$

where $H^{(x)}$ is the parameter related to HOO that represents a hysteresis and $O^{(x)}$ is the parameter related to LB that represents a certain offset. Intuitively, the HOM determines the area where the users connected to a cell would perform an HO toward a neighbor cell. On the one hand, in the context of HOO and assuming that $O^{(x)} = 0$, the parameter $\mathrm{HOM}^{(x)}_{i \to j}$ and the symmetric $\mathrm{HOM}^{(x)}_{j \to i}$ can be set to the same positive value (given by $H^{(x)}$) so that a certain symmetric region between the two cells is ensured in order to avoid unnecessary HOs. In Fig. 1(a), the RSRP of the neighbor and the serving cells are represented. For instance, if a user connected to cell $i$ moves to cell $j$, it connects to cell $j$ when the RSRP from cell $j$ is equal to the RSRP from cell $i$ plus $\mathrm{HOM}^{(x)}_{i \to j}$. As in this case the HOMs are assumed to be symmetric, the same value is applied to the opposite situation (i.e. when the user moves from cell $j$ to cell $i$, $\mathrm{HOM}^{(x)}_{j \to i}$ is used). Decreasing such a symmetric region (i.e. both HOMs) makes it easier for the user equipment (UE) to perform an HO to a neighbor cell, because the UE needs a lower received power from the target cell to trigger an HO. Conversely, increasing the HOMs makes it more difficult to perform an HO, meaning that the user spends more time attached to the serving cell while the connection quality is getting worse, which could lead to call dropping. However, under this configuration, possible unnecessary HOs due to signal fluctuations can be avoided. In this sense, the HOO function aims to reduce the inefficient usage of network resources due to unnecessary HOs provided that the level of call dropping is low enough. An important KPI closely related to the problem of unnecessary HOs is the HO Ratio (HOR), defined in this work as the number of HOs divided by the total number of carried calls.

Fig. 1(b). Adjustment of HOM for MLB purposes (RSRP of cell i and cell j, with HOM_ji and HOM_ij marked).
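As a small illustration of Eq. (4), the following Python sketch converts between the pair $(H^{(x)}, O^{(x)})$ and the two HOM values of an adjacency. The function names are hypothetical and the inverse mapping is simply obtained by adding and subtracting the two lines of Eq. (4).

```python
from typing import Tuple


def hom_from_components(h_db: float, o_db: float) -> Tuple[float, float]:
    """Eq. (4): return (HOM_{i->j}, HOM_{j->i}) from hysteresis H and offset O."""
    return h_db + o_db, h_db - o_db


def components_from_hom(hom_ij_db: float, hom_ji_db: float) -> Tuple[float, float]:
    """Inverse of Eq. (4): return (H, O) from the two HOM values."""
    h_db = (hom_ij_db + hom_ji_db) / 2.0   # half the sum gives the hysteresis
    o_db = (hom_ij_db - hom_ji_db) / 2.0   # half the difference gives the offset
    return h_db, o_db
```

For example, $H^{(x)} = 3$ dB and $O^{(x)} = 0$ yield the symmetric setting of 3 dB in both directions, which is the HOO-only case described above.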

On the other hand, maintaining the symmetric region, both HOMs can also be jointly tuned by means of $O^{(x)}$ (e.g. $\mathrm{HOM}^{(x)}_{i \to j}$ is increased while $\mathrm{HOM}^{(x)}_{j \to i}$ is decreased) so that the service area of these cells is modified for LB purposes. In this case, both HOMs are modified by the same magnitude to preserve the hysteresis region. However, those variations in HOM should have opposite signs to modify the service area of the two cells. In Fig. 1(b), it is observed that $\mathrm{HOM}^{(x)}_{i \to j}$ has been increased, while $\mathrm{HOM}^{(x)}_{j \to i}$ has been decreased. As a result, the service area of cell $i$ is larger. Considering, for example, that cell $j$ is overloaded, its service area is reduced while the service area of the adjacent cells with spare resources (e.g. cell $i$) is increased to take users from the edge of the congested cell. Thus, cells suffering occasional congestion can transfer load to neighbor cells that have free resources by adjusting mobility parameters. Note that, since HOMs are defined on an adjacency basis, cell service areas can be not only re-sized but also re-shaped. The most important benefit of LB is that call blocking in the network is reduced, especially in highly loaded cells. The function responsible for accepting or blocking a call is the Call Admission Control (CAC). Such a function checks the availability of free resources in the candidate cell before taking a decision. In this paper, a ‘worst-case’ criterion has been adopted to accept calls, i.e. the user is accepted if the highest number of radio resources needed to maintain a connection (worst-case requirement) is less than or equal to the number of radio resources available in the candidate cell. If the condition is not satisfied by any candidate cell, then the call is blocked. To quantify call blocking, network operators typically use the Call Blocking Ratio (CBR), defined as the number of blocked calls divided by the number of call attempts.
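The ‘worst-case’ admission criterion described above can be sketched in a few lines of Python. This is only an illustration: the function name, the use of a dictionary of free resources per candidate cell, and the assumption that candidates are examined in the order given are not taken from the paper.

```python
from typing import Dict, Optional


def admit_call(worst_case_resources: int,
               free_resources: Dict[str, int]) -> Optional[str]:
    """Return the first candidate cell that can host the call, or None if it is blocked.

    worst_case_resources: highest number of radio resources the connection may need.
    free_resources: free radio resources per candidate cell, in order of preference
                    (the ordering is an assumption of this sketch).
    """
    for cell_id, free in free_resources.items():
        # Worst-case criterion: accept only if the worst-case need fits.
        if worst_case_resources <= free:
            return cell_id
    return None  # no candidate satisfies the condition: the call is blocked
```

A call needing at most 4 resources is thus accepted by a cell with exactly 4 free resources, and blocked only when every candidate has fewer.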

Finally, the actions performed by LB and HOO may also involve a decrease in the connection quality. To explain this, on the one hand, consider the HOM being decreased for LB reasons. The probability that the target cell is preferred to the serving cell increases, even if the connection quality is worse due to negative HOM values. If this is the case, some users, usually located at the cell edge, will be handed over to the target cell and experience worse radio conditions as a result of the performed HO. Thus, negative values of the HOM will increase the risk of dropping. On the other hand, when the HOM is increased for either LB or HOO reasons, the user will spend more time attached to the serving cell, delaying an HO towards cells with better radio conditions. In this case, the probability of call dropping will also increase. For these reasons, a KPI widely used by network operators is the Call Dropping Ratio (CDR), defined as the number of dropped calls divided by the number of finished calls. Typically, call dropping may occur because the connection quality is bad, but also because there are no available resources due to an overload situation. In this paper, the calculation of the CDR includes only those calls dropped due to bad radio conditions. In particular, a call is dropped when the Signal-to-Interference-plus-Noise Ratio (SINR) is below a certain threshold during a specific time interval. Call dropping due to an overload situation is assumed to be negligible due to the correct operation of the CAC in the network, since enough resources are guaranteed by the CAC for the accepted calls.
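The three KPIs defined in this section (HOR, CBR and CDR) are simple ratios of counters. The sketch below computes them from a hypothetical counter record; the class and field names are assumptions, and returning 0.0 when a denominator is zero is a convention chosen here rather than something stated in the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Counters:
    """Raw counters collected over one measurement period (hypothetical record)."""
    call_attempts: int
    blocked_calls: int
    carried_calls: int
    finished_calls: int
    dropped_calls: int   # only drops caused by bad radio conditions
    handovers: int


def ratio(numerator: int, denominator: int) -> float:
    """Safe ratio; 0.0 for an empty denominator is a convention of this sketch."""
    return numerator / denominator if denominator else 0.0


def kpis(c: Counters) -> Dict[str, float]:
    """Return the KPIs as defined in the text."""
    return {
        "HOR": ratio(c.handovers, c.carried_calls),       # HOs per carried call
        "CBR": ratio(c.blocked_calls, c.call_attempts),   # blocked per attempt
        "CDR": ratio(c.dropped_calls, c.finished_calls),  # dropped per finished call
    }
```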

3. Joint optimization algorithm

This section explains the proposed algorithm for LB and HOO. The first part covers the design of the FS, describing the inputs, the outputs and the behavior of the system. The second part is devoted to the optimization technique used to guide the FS in the action selection.

3.1. The fuzzy system

As an alternative to Classical Logic, Fuzzy Logic is a mathematical discipline that introduces a degree of vagueness when an assertion is made [20]. The design of an FS for control problems is one of the most important application areas of Fuzzy Logic [21]. Its main benefit is that a system can be controlled by using linguistic terms such as high or low, instead of providing a numerical value, when defining the reference values of the controller. Experience has shown that Fuzzy Logic Controllers (FLCs) can yield results superior to those obtained by conventional control algorithms. In particular, the methodology of the FLC becomes very useful when the processes are too complex for analysis by conventional quantitative techniques or when the available sources of information are interpreted qualitatively, inexactly, or uncertainly [22].

The proposed FS is designed on the basis of FLCs, but there are some differences. From the operational perspective, it combines the functionalities of LB and HOO, i.e. decreasing the call blocking and the HO signaling load, respectively, while at the same time preserving the connection quality. To achieve this, the LB part is inspired by the LB algorithm proposed in [23] and the HOO part is inspired by the HOO algorithm proposed in [24]. In both cases, the developed algorithms implement only a single SON function, which iteratively adjusts the HOM to optimize the respective KPIs. However, in the case of LB, the HOM variations in both directions of any adjacency have the same magnitude but opposite signs. In this paper, this is equivalent to modifying $O^{(x)}$, as shown in Eq. (4). In the case of HOO, only the magnitude of the HOM variations is changed (i.e. the sign remains unchanged), which is equivalent to adjusting $H^{(x)}$ in Eq. (4).

The design of an FS that integrates both functionalities poses new challenges. Typically, in the context of FLCs, if the system has two outputs, the design is broken down into two new FLCs with one output each. In this paper, the proposed joint optimization algorithm affects two different components of the HOM parameter, $O^{(x)}$ and $H^{(x)}$, whose combination results in the values of $\mathrm{HOM}^{(x)}_{i \to j}$ and $\mathrm{HOM}^{(x)}_{j \to i}$, both set in the adjacency $x$. In principle, this design may require two separate FLCs, as two outputs are involved. However, such a solution is directly related to the coordinator-based schemes mentioned in Section 1, in which the FLCs would be coordinated by an upper-level entity. To avoid the problems linked to this kind of solution, in this paper, the proposed FS integrates both functionalities into one entity, whose structure is depicted in Fig. 2. It is assumed that each adjacency in the network has this entity implemented. As observed, the inputs of the FS in the adjacency $x$ are $\mathrm{CBR}^{(x)}_{ji}$, $\mathrm{HOR}^{(x)}$ and $\mathrm{HOM}^{(x)}_{ij}$. The first input is a derived KPI, calculated as:

$$\mathrm{CBR}^{(x)}_{ji} = \mathrm{CBR}^{(x)}_{j} - \mathrm{CBR}^{(x)}_{i}, \tag{5}$$

where $\mathrm{CBR}^{(x)}_{i}$ and $\mathrm{CBR}^{(x)}_{j}$ are the CBR measured in cell $i$ and cell $j$, respectively. This input makes it possible to balance the traffic between adjacent cells. In principle, $\mathrm{CBR}^{(x)}_{ji}$ could take negative values if cell $i$ has a higher CBR than cell $j$. However, to simplify the behavior of the FS, the FS is always applied in the direction of the adjacency in which cell $j$ has a CBR equal to or higher than that of cell $i$. The second input, $\mathrm{HOR}^{(x)}$, is used to reduce the HO signaling load when possible. This KPI is calculated considering the HOs and calls carried in both cells of the adjacency. The third input, $\mathrm{HOM}^{(x)}_{ij}$, is the current value of the HOM, whose aim is to determine when the HOM is reaching high values, in which case the connection quality would be significantly impacted. More precisely, the current value of HOM is the one taken in the opposite direction of the CBR difference, i.e. from cell $i$ to cell $j$.

Fig. 2. Scheme of the proposed FS (block labels: Fuzzifier, Inference, Conversion, Network).

Finally, the output of the FS, called $Y^{(x)}$, is a variable whose possible values refer to simultaneous variations in $O^{(x)}$ and $H^{(x)}$. More specifically, one output value is given by the concatenation of two fields, the former corresponding to the variation in $O^{(x)}$ and the latter to the variation in $H^{(x)}$, i.e.:

$$Y^{(x)} = (\Delta O^{(x)}, \Delta H^{(x)}). \tag{6}$$

This solution overcomes the limitation of FLCs whose output is a single variable that can only take discrete or continuous values in a certain range. Let us also assume that the variation for the component $O^{(x)}$ in a certain step of the algorithm can only be $-d$, $0$ or $+d$, while the variation for $H^{(x)}$ can only be $-s$, $0$ or $+s$. For example, if the output is $Y^{(x)} = (+d, 0)$, then the assignment can be formally expressed as:

OðxÞðt þ 1Þ OðxÞðtÞ þ d; ð7ÞHðxÞðt þ 1Þ HðxÞðtÞ þ 0: ð8Þ
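The orientation rule attached to Eq. (5) and the assignment of Eqs. (7) and (8) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `orient_adjacency` and `apply_output` are hypothetical, and the step sizes are taken from the values listed in Table 2.

```python
from typing import Tuple

DELTA = 1.0  # step for O(x) in dB (value listed in Table 2)
TAU = 0.5    # step for H(x) in dB (value listed in Table 2)


def orient_adjacency(cbr_i: float, cbr_j: float) -> Tuple[float, bool]:
    """Return (CBR_ji, swapped) so that the difference of Eq. (5) is never negative.

    `swapped` is True when the roles of cells i and j had to be exchanged,
    i.e. when the FS must be applied in the opposite direction of the adjacency.
    """
    if cbr_j >= cbr_i:
        return cbr_j - cbr_i, False
    return cbr_i - cbr_j, True


def apply_output(o: float, h: float, y: Tuple[float, float]) -> Tuple[float, float]:
    """Apply the two-field output Y = (dO, dH) as in Eqs. (7) and (8)."""
    d_o, d_h = y
    return o + d_o, h + d_h


# Hypothetical numbers: cell i is the more congested one, so the roles are swapped.
diff, swapped = orient_adjacency(cbr_i=0.05, cbr_j=0.01)
new_o, new_h = apply_output(o=0.0, h=3.0, y=(+DELTA, 0.0))
```

Keeping the output as a pair makes it explicit that only one of the two components changes for each of the candidate actions considered in this work.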

Once the inputs and the output have been determined, the next step is their characterization from the fuzzy logic perspective. Starting with the inputs, it is necessary to define the fuzzy sets and membership functions associated with them, as shown in Fig. 3. Each fuzzy set should be identified with a linguistic term (e.g. ‘low’ or ‘very low’). The need to work with fuzzy sets comes from the existence of concepts with no clear boundaries in their definition. In this context, when working with KPIs, it is often difficult to determine the value from which a KPI is considered to be jeopardized. For this reason, two fuzzy sets, ‘low’ and ‘high’, have been defined for each input. In the case of HOM, the objective of defining two fuzzy sets is to identify when the HOM is close to saturation, since the CDR may be negatively affected. In addition, for each fuzzy set of the inputs, a membership function, denoted by $\mu_V(u)$, quantifies the degree of membership of a given input value $u$ to a certain fuzzy set $V$, with a value between 0 and 1. Thus, unlike in classical sets, the transition between both values is gradual. For simplicity, as shown in Fig. 3, the selected membership functions are triangle-shaped or trapezoid-shaped.

Fig. 3. Membership functions of the fuzzy sets for each input: (a) $\mathrm{CBR}^{(x)}_{ji}$, (b) $\mathrm{HOR}^{(x)}$ and (c) $\mathrm{HOM}^{(x)}_{ij}$. Each panel shows a ‘low’ and a ‘high’ set with membership between 0 and 1; the axis ticks recoverable from the figure are 0.03 for $\mathrm{CBR}^{(x)}_{ji}$, 1 for $\mathrm{HOR}^{(x)}$, and 6 and 8 dB for $\mathrm{HOM}^{(x)}_{ij}$.
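As an illustration of the fuzzification step, the following Python sketch builds complementary ‘low’ and ‘high’ trapezoid-shaped membership functions. The breakpoints are assumptions: only the ticks 0.03, 1, 6 and 8 dB are legible in Fig. 3, so the lower breakpoints for CBR and HOR (set to 0 here) and the exact shapes are hypothetical, as are the names `BREAKPOINTS` and `fuzzify`.

```python
from typing import Dict


def high(u: float, a: float, b: float) -> float:
    """Membership in 'high': 0 below a, 1 above b, linear in between."""
    if u <= a:
        return 0.0
    if u >= b:
        return 1.0
    return (u - a) / (b - a)


def low(u: float, a: float, b: float) -> float:
    """Membership in 'low', taken here as the complement of 'high'."""
    return 1.0 - high(u, a, b)


# Assumed breakpoints (a, b) per input; only some ticks are readable in Fig. 3.
BREAKPOINTS = {"cbr": (0.0, 0.03), "hor": (0.0, 1.0), "hom": (6.0, 8.0)}


def fuzzify(name: str, u: float) -> Dict[str, float]:
    """Return the membership degree of value u in both fuzzy sets of an input."""
    a, b = BREAKPOINTS[name]
    return {"low": low(u, a, b), "high": high(u, a, b)}


mu_hom = fuzzify("hom", 7.0)  # halfway between the two HOM breakpoints
```

With these assumed breakpoints, an HOM of 7 dB belongs equally to ‘low’ and ‘high’, which is the gradual transition described in the text.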

The core of the FS is given by a rule base that represents the dynamic behavior of the FS through a set of linguistic rules derived from expert knowledge. Such a rule base comprises a collection of fuzzy rules following an IF-THEN syntax to set the control strategy, e.g.:

$$\text{IF } (\mathrm{CBR}^{(x)}_{ji} \text{ is high}) \;\&\; (\mathrm{HOR}^{(x)} \text{ is low}) \;\&\; (\mathrm{HOM}^{(x)}_{ij} \text{ is low}) \text{ THEN } Y^{(x)} = (+\delta, 0). \tag{9}$$

To define these rules, the knowledge and experience of human experts are normally required. Each antecedent of the rules represents an input state, and the number of rules is derived from the combination of all fuzzy sets among the different inputs (here, two sets for each of three inputs, i.e. $2^3 = 8$ rules). In this sense, the definition of the rule base must be complete (i.e. all fuzzy rules defined), so that the FS can generate an appropriate action for every input state in the system. The rule base for the proposed FS is shown in Table 1. The definition of each rule is as follows. First, rule 1 is activated when the CBR is balanced, and the HOR and HOM remain at low values, meaning that no change in HOM is needed (i.e. $Y^{(x)} = [0, 0]$). Rules 2–4 have in common that they are triggered when the HOM is low. As stated before, a low HOM means that the connection quality is not jeopardized due to changes in HOM, so that both LB and HOO actions can be applied without any restriction. For this reason, rule 2, which is triggered when only the CBR presents undesired behavior, implements an LB action (i.e. $Y^{(x)} = [+\delta, 0]$). Specifically, this rule shrinks the service area of the congested cell, in order to send traffic to the adjacent cell. Regarding rule 3, it is activated due to HOO reasons, i.e. when the HOR reaches high values. In this case, the proposed solution is to increase the symmetric region between adjacent cells (i.e. $Y^{(x)} = [0, +\tau]$), so that the number of unnecessary HOs can be reduced. To activate rule 4, the CBR and HOR must have high values simultaneously. In principle, it is unclear whether the performed action should be related to LB or to HOO. This decision may depend on the particular scenario, so that trial-and-error strategies (e.g. reinforcement learning) are appropriate in this case. The next section explains how to select the optimal consequent for this rule.

Table 1. Proposed sets of fuzzy rules (L = low, H = high).

| Rule no. | Input 1: $\mathrm{CBR}^{(x)}_{ji}$ | Input 2: $\mathrm{HOR}^{(x)}$ | Input 3: $\mathrm{HOM}^{(x)}_{ij}$ | Candidate action(s) $[\Delta O^{(x)}, \Delta H^{(x)}]$ |
|---|---|---|---|---|
| 1 | L | L | L | $[0, 0]$ |
| 2 | H | L | L | $[+\delta, 0]$ |
| 3 | L | H | L | $[0, +\tau]$ |
| 4 | H | H | L | $[+\delta, 0]$, $[0, +\tau]$ |
| 5 | L | L | H | $[-\delta, 0]$, $[0, -\tau]$ |
| 6 | H | L | H | $[0, 0]$, $[-\delta, 0]$ |
| 7 | L | H | H | $[0, 0]$, $[0, -\tau]$ |
| 8 | H | H | H | $[-\delta, 0]$, $[0, -\tau]$ |
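The rule base of Table 1 maps naturally onto a lookup table. The Python sketch below is one possible encoding, not the authors' implementation: the dictionary layout and the helper names `is_complete` and `ambiguous_states` are hypothetical, and the step values are those listed in Table 2.

```python
from itertools import product

D, T = 1.0, 0.5  # steps for O(x) and H(x) in dB (values listed in Table 2)

# Key: fuzzy set of (CBR_ji, HOR, HOM_ij); value: candidate actions (dO, dH).
RULES = {
    ("L", "L", "L"): [(0.0, 0.0)],               # rule 1
    ("H", "L", "L"): [(+D, 0.0)],                # rule 2
    ("L", "H", "L"): [(0.0, +T)],                # rule 3
    ("H", "H", "L"): [(+D, 0.0), (0.0, +T)],     # rule 4
    ("L", "L", "H"): [(-D, 0.0), (0.0, -T)],     # rule 5
    ("H", "L", "H"): [(0.0, 0.0), (-D, 0.0)],    # rule 6
    ("L", "H", "H"): [(0.0, 0.0), (0.0, -T)],    # rule 7
    ("H", "H", "H"): [(-D, 0.0), (0.0, -T)],     # rule 8
}


def is_complete(rules) -> bool:
    """Check that every combination of fuzzy sets has a rule defined."""
    return set(rules) == set(product("LH", repeat=3))


def ambiguous_states(rules):
    """Return the input states with more than one candidate consequent."""
    return [state for state, actions in rules.items() if len(actions) > 1]
```

The states returned by `ambiguous_states` are the ones whose consequent is left open in Table 1 and therefore has to be chosen by some additional mechanism.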

The remaining rules (i.e. rules 5, 6, 7 and 8) are all linked to high values of the HOM, which may indicate that the connection quality is very poor, especially for cell-edge users. Rule 5 is activated when the CBR and HOR exhibit low values, meaning that the problems related to LB/HOO were mitigated by increasing the HOM. The objective of rule 5 is to decrease the HOM, since the high HOM may no longer be necessary. If it is still necessary, the concerned KPI will be affected and, therefore, the FS will perform the appropriate action. The problem is that such a decrease in HOM can be due to LB or HOO reasons. As in rule 4, the application of trial-and-error strategies will help to make this decision in those situations. Rules 6 and 7 are related to more extreme situations in which the existing problem has led the FS to reach high values of HOM, but it has not been mitigated. In addition, the fact that the HOM reaches large values due to successive LB and HOO actions may negatively affect the network performance. In this sense, there are essentially two different situations related to this issue. One is given by severe congestion, in which LB actions would greatly modify service areas. As a result, any HOO action to reduce unnecessary HOs (e.g. due to high mobility) would involve larger HOM values that may degrade cell-edge users' performance. Given that the HOM is saturated due to LB, the action of rule 6 should be to lead the HOM towards lower values or leave it unchanged, so that subsequent HOO actions can be effectively applied. The other situation is given by the presence of very high mobility in the network, in which the HOM will reach high values as a result of HOO actions. Under this assumption, any congestion arising in the network would lead the LB part to work with large HOM values, which is not desirable. Thus, rule 7 is intended to leave unchanged, or even reduce, the symmetric region, provided that the HOM is saturated due to HOO reasons.

Fig. 4. Optimization scheme. (Block labels recoverable from the figure: agent, comprising the Fuzzy System and Q-Learning with its policy and value function; environment (network); state: CBR, HOR, HOM; action: HOM; reward: CDR.)


Finally, rule 8 refers to a situation in which the CBR and HOR are simultaneously high, which may occur, for example, when there are both severe congestion and high-mobility users in the network. In this case, the solution could be to enlarge the service area of the congested cell or to reduce the symmetric region in the adjacency. Since the objective would be to favor HOO or LB, respectively, the option that simultaneously enlarges the service area of the congested cell and reduces the symmetric region in the adjacency is discarded. Other alternatives would cause the connection quality to be significantly worsened. The specific action for this rule will also be determined by the optimization algorithm explained in the next section.

Once the FS has been defined, its operation is as follows. As shown in Fig. 2, the first step is given by the fuzzifier, the process by which membership values (one for each fuzzy value of the linguistic variable) are assigned to a numerical input value by using the membership functions. The next step of the FS is given by the inference, which calculates the degree of truth of each activated rule as follows:

$$\alpha^{(x)}_{k} = \mu_{K_1}(\mathrm{CBR}^{(x)}_{ji}) \wedge \mu_{K_2}(\mathrm{HOR}^{(x)}) \wedge \mu_{K_3}(\mathrm{HOM}^{(x)}_{ij}), \tag{10}$$

where $\alpha^{(x)}_{k}$ is the degree of truth for rule $k$ in the adjacency $x$, and $\mu_{K_1}$, $\mu_{K_2}$ and $\mu_{K_3}$ are the membership functions corresponding to the fuzzy sets $K_1$, $K_2$ and $K_3$, respectively, involved in rule $k$. The intersection of the fuzzy sets, denoted here by ‘$\wedge$’, is implemented by using the min-operator, which takes the minimum value of the arguments. Finally, unlike the structure of a typical FLC, the proposed FS does not implement the module known as defuzzifier, where the activated fuzzy rules are all aggregated to produce a non-fuzzy value. The reason for this is that the output of the proposed scheme is given by a two-dimensional variable whose elements are not correlated (e.g. $O^{(x)}$ can be either increased or decreased while $H^{(x)}$ does not change). Thus, the fuzziness between different consequents would not be applicable in this work. Instead, the output of the proposed FS is given by the consequent of the rule whose degree of truth is the highest. It can be formally expressed as:

$$\mathrm{output}^{(x)} = Y^{(x)}\!\left( \arg\max_{k} \alpha^{(x)}_{k} \right). \tag{11}$$
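The inference of Eq. (10) and the winner-takes-all selection of Eq. (11) can be sketched as follows. This is a minimal sketch under stated assumptions: the membership degrees are hypothetical numbers, the function names are illustrative, and ties are resolved in favor of the first rule encountered, which the paper does not specify.

```python
from itertools import product
from typing import Dict, Tuple

Rule = Tuple[str, str, str]  # fuzzy sets of (CBR_ji, HOR, HOM_ij)


def degrees_of_truth(mu_cbr: Dict[str, float],
                     mu_hor: Dict[str, float],
                     mu_hom: Dict[str, float]) -> Dict[Rule, float]:
    """Eq. (10): degree of truth of each rule using the min-operator."""
    alphas = {}
    for k1, k2, k3 in product(("low", "high"), repeat=3):
        alphas[(k1, k2, k3)] = min(mu_cbr[k1], mu_hor[k2], mu_hom[k3])
    return alphas


def winning_rule(alphas: Dict[Rule, float]) -> Rule:
    """Eq. (11): the rule with the highest degree of truth (first one on ties)."""
    return max(alphas, key=alphas.get)


# Hypothetical membership degrees produced by the fuzzifier.
mu_cbr = {"low": 0.2, "high": 0.8}
mu_hor = {"low": 0.7, "high": 0.3}
mu_hom = {"low": 0.9, "high": 0.1}

alphas = degrees_of_truth(mu_cbr, mu_hor, mu_hom)
state = winning_rule(alphas)  # antecedent of rule 2 in Table 1
```

Because no defuzzifier is used, only the single winning rule contributes to the output, in contrast with a typical FLC where all activated rules are aggregated.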

Finally, to download these changes to the network configuration, two more steps are necessary. As represented in Fig. 2, the former is the update of $O^{(x)}$ and $H^{(x)}$ considering the parameter variations given by the FS, and the latter is the conversion from $O^{(x)}$ and $H^{(x)}$ to $\mathrm{HOM}^{(x)}_{i \to j}$ and $\mathrm{HOM}^{(x)}_{j \to i}$, given their relationship expressed in Eq. (4).

3.2. Optimization of the fuzzy system

In the proposed FS, there are certain rules (4–8 in Table 1) with more than one consequent defined. This means that, a priori, it is unclear which is the most appropriate action for these rules, since it may depend on many context factors (e.g. the environment, the traffic distribution patterns, the user mobility, etc.) at the moment of the execution of the rules, or simply on the interactions between the objectives of LB and HOO. Different strategies have been investigated to create, adapt or refine rules [25–29]. In this sense, mobile operators usually do not have the complete knowledge to take proper actions in every network state. Thus, due to the complex nature of network management, Reinforcement Learning (RL) is of particular interest in this context, as the system is able to learn from its own experience. In addition, unlike other mathematical approaches (e.g. supervised learning in Neural Networks), RL does not require a training data set. For this reason, in this work, the popular RL algorithm known as Q-Learning has been adopted, so that the best consequent for each fuzzy rule can be found through learning from interaction.

The combination of fuzzy logic and RL has been addressed in some previous works [29–33]. However, the proposed optimization algorithm differs from the common implementation of the fuzzy Q-Learning algorithm [34]. This is because, in the case of a typical FLC, the q-function (i.e. a characteristic function of fuzzy Q-Learning optimization) is updated according to the degree of activation of each triggered rule of the FLC. As a consequence, the q-function can be updated for more than one input state, i.e. there exists a certain degree of fuzziness. Conversely, in the case of the proposed fuzzy system, the update of the q-function is only made for one input state, as only one rule can be activated at each optimization step.

In RL, an agent is driven to take actions in an environment in order to maximize a cumulative reward. The optimization scheme showing the combination of the FS and the learning entity is depicted in Fig. 4. The basic elements in RL are the agent, the environment, the states, the actions, the policy, the reward and the value function. In this case, the agent that takes the actions is the proposed FS, while the environment corresponds to the cellular network. The states are given by the combination of the fuzzy sets of the FS. Note that, for each state, there is one fuzzy rule defined. The actions are given by the candidate consequents of the rules and they represent a specific variation in the HOM. The policy defines how the agent has to act at a given time. The reward is a numerical value that expresses the intrinsic desirability of being in a certain state. While the reward indicates what is good in an immediate sense, the value function specifies what is good in the long run. In particular, the value function is a mapping between each state and the total amount of reward that an agent can expect to accumulate over the future, starting from that state. In this sense, the objective of the agent is not to obtain the maximum immediate reward, but to maximize the total reward that the agent receives in the long run.

Fig. 5. Pseudo-code of the optimization algorithm.

RL methods are characterized by two important features: the trial-and-error search and the fact that actions may affect not only the immediate reward but also the subsequent rewards. As previously stated, the agent has to maximize the received reward in the long term (or expected cumulative reward), which is the discounted sum of the rewards that will be obtained from the input states visited in the future:

$$R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \tag{12}$$

where $r$ is the numerical reward obtained at each optimization step after performing an action and $\gamma$ is the discount rate determining the relative importance of future rewards. In this paper, the action performed by an activated fuzzy rule will be rewarded positively if the connection quality is not significantly degraded. The immediate reward, $r$, can be formally expressed by defining a specific threshold for the CDR, which is the KPI that estimates the connection quality, as stated in Section 2. Then, those actions leading to a CDR equal to or less than the threshold should be rewarded with a positive value, while those actions producing a CDR higher than the threshold should be punished with a negative value. Considering this, the formula for the reward is expressed as:

$$r^{(x)} = \begin{cases} c & \text{if } \mathrm{CDR}^{(x)}_{\mathrm{measured}} \le \mathrm{CDR}_{\mathrm{th}}, \\ -c & \text{otherwise}, \end{cases} \tag{13}$$

where $r^{(x)}$ is the reward for the adjacency $x$ and $c$ is a constant that can be expressed as a common factor in the definition of the cumulative reward in Eq. (12), so that its effect is a scaling transformation that can be used to avoid underflow/overflow issues in storing the q-function. In this paper, $c = 10$ is assumed. In addition, $\mathrm{CDR}_{\mathrm{th}}$ is the threshold defined at the network level to determine bad quality and $\mathrm{CDR}^{(x)}_{\mathrm{measured}}$ is the maximum CDR between both cells in the adjacency, i.e.:

$$\mathrm{CDR}^{(x)}_{\mathrm{measured}} = \max\left\{ \mathrm{CDR}^{(x)}_{i}, \mathrm{CDR}^{(x)}_{j} \right\}. \tag{14}$$
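Eqs. (12) to (14) can be illustrated with a short Python sketch. The threshold and the discount rate used below are hypothetical values chosen only for the example (the paper's settings for them are not given in this section); only $c = 10$ comes from the text. The infinite sum of Eq. (12) is truncated to a finite reward sequence.

```python
from typing import Sequence

C = 10.0  # reward magnitude c, as assumed in the paper


def reward(cdr_i: float, cdr_j: float, cdr_th: float, c: float = C) -> float:
    """Eqs. (13)-(14): +c if the worst CDR of the adjacency meets the threshold."""
    cdr_measured = max(cdr_i, cdr_j)  # Eq. (14)
    return c if cdr_measured <= cdr_th else -c


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Eq. (12) truncated to a finite sequence r_{t+1}, r_{t+2}, ..."""
    total = 0.0
    for r in reversed(rewards):  # Horner-style accumulation from the last reward
        total = r + gamma * total
    return total


# Hypothetical threshold of 2% and discount rate of 0.5.
r_ok = reward(0.010, 0.015, cdr_th=0.02)
r_bad = reward(0.010, 0.030, cdr_th=0.02)
ret = discounted_return([10.0, 10.0, -10.0], gamma=0.5)  # 10 + 5 - 2.5
```

Taking the maximum CDR of the two cells means that an action is punished as soon as either cell of the adjacency exceeds the threshold.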

In the proposed FS, to quantify the benefits of executing a certain rule consequent (i.e. the action) provided that a rule has been activated (i.e. the state), the value $q$ of a state-action pair $[s, a]$ is defined. It is a discrete function, denoted by $q[s, a]$, that expresses the expected cumulative reward that can be received when taking action $a$ from state $s$. In this work, a discrete version of the Q-Learning algorithm is considered, where the learned q-function directly approximates the optimal one independently of the policy followed by the agent [35].

The pseudo-code of the optimization algorithm is shown in Fig. 5. After initializing the q-function, the selection of the consequent for each rule (step 1) is made by using a certain exploration/exploitation policy. Exploration is needed since trying actions that have not yet been selected is the only way to discover new actions that could provide much more reward than other actions already tested. Exploitation is also needed since the current knowledge must be exploited to obtain reward. A widely used policy is the so-called $\epsilon$-greedy policy, defined as:

$$a_i = \arg\max_{k} q[i, k] \quad \text{with probability } 1 - \epsilon, \tag{15}$$
$$a_i = \mathrm{random}\{a_k,\; k = 1, 2, \ldots, J\} \quad \text{with probability } \epsilon, \tag{16}$$

where $a_i$ is the selected consequent for rule $i$ and $\epsilon$ determines the trade-off between exploration and exploitation during the optimization process (e.g. $\epsilon = 0$ means no exploration, so that the best action is always selected).
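The $\epsilon$-greedy selection of Eqs. (15) and (16) can be sketched in Python as below. The function name `epsilon_greedy` and the representation of the q-values of one rule as a plain list are assumptions made for illustration; the function returns the index of the selected consequent.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(q_row: Sequence[float], epsilon: float,
                   rng: Optional[random.Random] = None) -> int:
    """Return the index of the consequent selected for one rule.

    With probability epsilon a consequent is drawn uniformly at random
    (Eq. (16)); otherwise the one with the highest q-value is taken (Eq. (15)).
    """
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda k: q_row[k])


# Hypothetical q-values for a rule with two candidate consequents.
greedy_choice = epsilon_greedy([0.1, 2.0], epsilon=0.0)
explored = epsilon_greedy([0.1, 2.0], epsilon=1.0, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the exploration reproducible, which is convenient when comparing runs of a simulator.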

Each time an action (i.e. a variation in HOM) is performed, the network should evolve to a new state, $s'$, in which the KPIs are collected again. At this time, the reward of the action is computed by using Eq. (13), as stated in step 2 (Fig. 5). Then, the so-called value of the new state, denoted by $v[s']$, is calculated as:

$$v[s'] = \max_{k} q[s', a_k]. \tag{17}$$

While the q-function quantifies the value of taking an action when starting from a given state, the v-function estimates the value of being in that state regardless of the action to be taken. Note also that the new state $s'$ is specified by the new activated fuzzy rule in the FS. From $v[s']$, an error signal is calculated as follows:

$$\Delta q = r + \gamma \cdot v[s'] - q[s, a_i], \tag{18}$$

where $\gamma$ is a discount factor. As observed, the first part of the formula is the q-function calculated as the sum of the immediate reward $r$ for state $s$ and the discounted expected value of the next state, $\gamma \cdot v[s']$. This is equivalent to Eq. (12), where the immediate reward and future rewards (i.e. the expected value of the next state) are accumulated. The last part in Eq. (18) is taken from the stored $q[s, a]$. As a result, $q[s, a]$ will be updated in the direction of the optimal q-function independently of the policy followed by the agent (step 3 in Fig. 5). Such an update is made by utilizing an ordinary gradient descent, i.e.:

$$q[s, a_i] \leftarrow q[s, a_i] + \eta \cdot \Delta q, \tag{19}$$

where $\eta$ is a learning rate. The above-described process is repeated for the new current state (steps 4 and 5 in Fig. 5), starting with the action selection (step 1).

Fig. 6. Block diagram of the simulation process.

Table 2. Simulation parameters.

| Parameter | Configuration |
|---|---|
| Cellular layout | Hexagonal grid, 57 cells (3 × 19 sites), cell radius 0.5 km |
| Transmission direction | Downlink |
| Carrier frequency | 2.0 GHz |
| System bandwidth | 1.4 MHz |
| Frequency reuse | 1 |
| Propagation model | Okumura–Hata with wrap-around; log-normal slow fading, $\sigma_{sf}$ = 8 dB and correlation distance = 50 m |
| Channel model | Multipath fading, EPA model |
| Mobility model | Random direction; low speed = 3 km/h; high speed = 50 km/h |
| Service model | Constant bit rate (voice call), Poisson traffic arrival, mean call duration 120 s, 16 kbps |
| Base station model | Tri-sectorized antenna, SISO, EIRP$_{\max}$ = 43 dBm |
| Scheduler | Time domain: Round-Robin; frequency domain: Best Channel |
| Power control | Equal transmit power per PRB |
| Link adaptation | Fast, CQI based, perfect estimation |
| Handover | Time-To-Trigger = 100 ms; HOM: [−24, 24] dB |
| Call dropping | SINR < −6.9 dB |
| Traffic distribution | Unevenly distributed in space |
| Time resolution | 100 TTI (100 ms) |
| Loop time | 12 min |
| Simulation duration | 3200 min |
| Optimization algorithm | $\delta$ = 1 dB, $\tau$ = 0.5 dB |
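The q-function update of Section 3.2, i.e. Eqs. (17) to (19), can be sketched in Python as follows. This is not the pseudo-code of Fig. 5, which is not reproduced here; the dictionary-of-lists storage, the state labels and the values of the discount factor and learning rate are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List


def q_update(q: Dict[Hashable, List[float]], s: Hashable, a: int, r: float,
             s_next: Hashable, gamma: float, eta: float) -> float:
    """Update q[s][a] in place and return the error signal of Eq. (18)."""
    v_next = max(q[s_next])                  # Eq. (17): value of the new state
    delta_q = r + gamma * v_next - q[s][a]   # Eq. (18): error signal
    q[s][a] += eta * delta_q                 # Eq. (19): move towards the target
    return delta_q


# Hypothetical q-table: one list of q-values per rule (state),
# one entry per candidate consequent of that rule.
q = {"rule4": [0.0, 0.0], "rule2": [4.0]}
dq = q_update(q, s="rule4", a=0, r=10.0, s_next="rule2", gamma=0.5, eta=0.1)
```

Only the entry of the single activated rule is touched at each step, which matches the statement that the q-function is updated for one input state only.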

4. Performance analysis

4.1. Analysis setup

To assess the performance of the proposed joint optimization algorithm, a dynamic system-level simulator for LTE macrocells has been used [36]. This simulator executes a selectable number of optimization loops to emulate the tuning process. Each loop comprises 7000 simulation steps, equivalent to 12 min of actual network time. Each simulation step includes updating user positions, propagation computation, generation of new calls, and radio resource management algorithms. At the end of each loop, measurements and reliable statistics are obtained to be used in the following optimization loop. Thus, in a certain loop, steps 1–5 of the algorithm described in Fig. 5 are executed once.

The simulated scenario includes a macro-cellular environment with a layout consisting of 19 tri-sectorized sites evenly distributed in the scenario, as shown in Fig. 6. The main simulation parameters are summarized in Table 2. For simplicity, only the downlink is considered in the simulation. The service provided to users is the voice call, as it is the main service affected by the tuning process. The traffic is unevenly distributed in space, with some cells in the center of the scenario having higher traffic density than the surrounding cells. In addition, to thoroughly assess the proposed algorithm, three different configurations have been considered. Firstly, the simulated high-load scenario is given by the presence of a greater number of users moving at low speed (3 km/h) around the scenario, where the CBR is expected to be high. Secondly, the simulated high-mobility scenario is given by the presence of high-speed users (50 km/h), which in principle would lead to a high HOR. In this case, the number of users is not high, but the unevenly distributed traffic in the scenario can lead to congestion situations, especially in the central area. Finally, the third scenario is a combination of the two previous scenarios, so that high-load and high-speed users are simulated.

Fig. 7. Sensitivity analysis for $\delta$ and $\tau$ in different scenarios: (a) high-load, (b) high-mobility and (c) high-load and high-mobility.

To compare the proposed method with reference cases, as shown in Fig. 6, the independent SON functions of LB and HOO, taken from [23,24] respectively, have been implemented and simulated in two different ways. In one of them, only one functionality is active in the network, while in the other configuration both LB and HOO functions are simultaneously executed in an uncoordinated way. In addition, a baseline optimization scheme following the main principles addressed in [18] has been implemented. This scheme prioritizes the HOO part depending on whether the connection quality is jeopardized or not. More specifically, if the CDR is above a certain threshold, only the HOO function is executed. The performance of these approaches will be assessed by looking at the main related KPIs, in particular, the overall HOR, CBR and CDR. A Figure-of-Merit (FoM), $\Phi$, that combines the previous KPIs into a scalar value has also been considered. This FoM characterizes, qualitatively, the overall performance of the evaluated approaches. Formally, $\Phi$ is defined as [37]:

$$\Phi = \lambda \cdot \left( \mathrm{CBR}[\%] + (1 - \mathrm{CBR}[\%]/100) \cdot \mathrm{CDR}[\%] \right) + \mathrm{HOR}, \tag{20}$$

where $\lambda$ is a constant weight determining the relative importance of the CBR and CDR (both related to user dissatisfaction) compared with the HO signaling cost given by HOR. In this study, $\lambda$ equal to 1 is assumed.
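Eq. (20) is straightforward to evaluate; the Python sketch below does so for hypothetical KPI values (the numbers are illustrative, not results from the paper). The function name `figure_of_merit` is an assumption.

```python
def figure_of_merit(cbr_pct: float, cdr_pct: float, hor: float,
                    lam: float = 1.0) -> float:
    """Eq. (20): user dissatisfaction weighted by lam, plus the HO signaling cost.

    cbr_pct and cdr_pct are percentages; the factor (1 - CBR/100) accounts for
    the fact that only accepted (non-blocked) calls can be dropped.
    """
    dissatisfaction = cbr_pct + (1.0 - cbr_pct / 100.0) * cdr_pct
    return lam * dissatisfaction + hor


# Hypothetical KPIs: CBR = 5 %, CDR = 2 %, HOR = 1.
phi = figure_of_merit(cbr_pct=5.0, cdr_pct=2.0, hor=1.0)  # 5 + 0.95 * 2 + 1
```

Lower values indicate a better trade-off, since every term of the FoM is a cost.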

4.2. Simulation results

First, a sensitivity analysis for determining the optimal values of $\delta$ and $\tau$ (i.e. the variation of the $O$ and $H$ components, respectively) has been carried out. Fig. 7 shows the mean of the related KPIs and the proposed FoM, $\Phi$, for the three different situations: high-load, high-mobility and both together. As observed, $\Phi$ is a combination of the KPIs related to user dissatisfaction (i.e. the CBR and CDR) and the KPI related to the HO signaling cost (i.e. the HOR).

In the high-load scenario (Fig. 7(a)), the variations of $\delta$ and $\tau$ have low impact on HOR since users have low mobility, meaning that the impact of HOR on $\Phi$ will also be minor. Due to this, the variations in $\Phi$ are mainly given by the user dissatisfaction. In this sense, there is a clear trade-off between CBR and CDR, i.e. while CBR is reduced (by increasing $\delta$), CDR is greater. However, for high values of $\delta$, the variations in CBR are greater than in CDR. As a consequence, the best values of $\Phi$ (i.e. the lowest) correspond to high values of $\delta$. This is in contrast to the situations with high mobility, as explained below.




The second scenario, given by high mobility (Fig. 7(b)), shows that HOR increases for larger values of $\delta$, especially when $\tau$ is 1 dB. The reason for this is that resizing the cell service areas for load balancing purposes leads cell-edge users to be under worse radio conditions after performing an HO, so that the probability of performing a new HO to other neighbor cells is increased. Conversely, provided that $\delta$ is low (avoiding the effect of load balancing on HOR), HOR decreases for larger values of $\tau$. This is in line with the optimization of the $H$ component, i.e. increasing $H$ makes it more difficult to perform an HO and reduces the HO frequency. The main drawback in this case is that the CDR is negatively affected. The high-mobility scenario also produces lower values of CBR and CDR because the traffic is geographically dispersed due to the high speed of the users. The configuration ($\delta = 1$, $\tau = 0.5$) dB provides the lowest value of $\Phi$, as a result of a better trade-off between HO signaling and user dissatisfaction.

The above analysis can also be extended to the scenario that combines both high load and high mobility (Fig. 7(c)). Since an important objective of the proposed algorithm is to optimize mobility and load balancing without jeopardizing the connection quality, the high values of CDR measured in this scenario establish the possible range of variation of $\delta$ and $\tau$. In particular, it is observed that values of $\tau$ above 0.5 dB involve a CDR greater than 5%, which would cause serious inconvenience to operators. Leaving $\tau$ fixed to 0.5 dB, the increase of $\delta$ can also lead to high values of CDR. In particular, values above 3 dB would significantly jeopardize the CDR. For this reason, the range of $\delta$ and $\tau$ analyzed in this work does not exceed the limits shown in Fig. 7. As in the previous high-mobility scenario, the optimal configuration is ($\delta = 1$, $\tau = 0.5$) dB, meaning that this setting can be reasonably used to evaluate the performance of the proposed algorithm against other approaches.

The comparison of the proposed fuzzy system with other approaches is represented in Fig. 8, where the evolution over time of the KPIs for each strategy is depicted. For the sake of clarity, the represented values have been averaged with the six subsequent samples. The initial situation is given by low traffic and low mobility. After about 200 min, the central cells of the scenario become crowded, so that many users are blocked, increasing the CBR. Looking more closely at this indicator, the evaluated approaches reach CBR values of about 5% when the traffic change occurs. The HOO configuration is not able to solve this problem, keeping the CBR at such high values, while the LB configuration achieves a reduction of 2% in a few optimization steps. Conversely, the gain in CBR obtained by the uncoordinated alternative, the baseline scheme and the proposed fuzzy system is more moderate. A higher number of users also means more interference in the network, so that the connection quality of the users is worse, increasing the CDR. This increase is more pronounced in the case of the uncoordinated approach. To explain this, note that the LB and HOO functions are simultaneously changing the HOM from the first optimization steps. As the CDR is not significantly affected by these changes (due to the low interference conditions), the HOM reaches large values. As a result, when the congestion situation occurs, the HOM values are so large that the CDR becomes high. After this, the SON functions attempt to reduce this KPI. In the case of the baseline approach, this effect on CDR is attenuated because the LB function is switched off when the CDR becomes high. Since HOMs are not adjusted by the LB function, the level of CDR is not as high as with the uncoordinated scheme. The rest of the configurations, i.e. LB, HOO and the fuzzy system, keep the CDR constant at around 2%. Due to the presence of only low-speed users, the HOR is about 1.
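The smoothing applied to the curves of Fig. 8 (each value averaged with the six subsequent samples) can be sketched as a forward moving average. How the last samples of the series are handled is not stated in the paper; shortening the window at the end, as done below, is an assumption, as is the function name `forward_average`.

```python
from typing import List, Sequence


def forward_average(samples: Sequence[float], n_subsequent: int = 6) -> List[float]:
    """Average each sample with up to n_subsequent samples that follow it.

    Near the end of the series fewer samples are available, so the window
    is simply shortened (an assumption, not taken from the paper).
    """
    out = []
    for i in range(len(samples)):
        window = samples[i:i + n_subsequent + 1]
        out.append(sum(window) / len(window))
    return out


# Small illustration with one subsequent sample instead of six.
smoothed = forward_average([1.0, 2.0, 3.0], n_subsequent=1)
```

With one optimization loop every 12 min, a window of seven samples spans roughly 84 min of network time, which explains why abrupt changes in the plotted KPIs appear slightly anticipated and smoothed.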

The offered traffic experiences a small reduction ataround min. 1000, but it is not until around min. 1200when the users move at high-speed. The scenario ofhigh-mobility starts at this moment and the HOR isabruptly increased to values above 10, except in the caseof the proposed fuzzy system, whose values over the timeare below 7. Thus, the performance of the proposed tech-nique in terms of HOR is clearly better than the rest ofthe strategies. It is also noted that the trajectory of HORfollowed by the uncoordinated and baseline approachesis very similar since the HOO function is active duringthe entire simulation. Regarding the CBR and CDR, the LBand the proposed approaches lead to values below 1%,while the rest of strategies produce undesirably higher val-ues. Note also that, for all the cases, the CBR decreases sig-nificantly from the previous situation (i.e. before min.1200) because the traffic load is geographically disperseddue to the presence of fast users.

The situation after ≈2100 min is given by a new increase of the offered traffic, so that the last part of the simulation includes both high load and high mobility. Looking at the HOR, the proposed method remains at low values, being the best approach from this perspective at any time. Similarly, the LB approach keeps a relatively constant but higher level of HOR values, since no actions to reduce HO signaling take place in this case. The HOO, the uncoordinated and the baseline approaches lead to a gradual increase in this KPI. The reason for this is that these strategies implement the same HOO function, which attempts to decrease the high peak in the CDR at the expense of increasing the number of HOs. However, the impact of these three methods on the CDR is not the same. In particular, the baseline approach provides lower CDR values than those obtained by the uncoordinated approach because the LB function is switched off when the CDR is jeopardized after the variation in traffic load. The HOO approach gives even lower values of CDR since the LB function is never executed during the simulation. From the CBR perspective, the LB approach provides the lowest values while the CDR is also quite low, similar to the fuzzy system. The proposed method gives better CBR than the remaining approaches and, as previously stated, the HOR is the lowest as well.
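The coordination rule attributed to the baseline scheme (the HOO function always runs, while the LB function is switched off when the CDR becomes high) can be sketched as follows. This is only an illustration of the rule as described in the text: the function name `active_functions` and the threshold value are hypothetical, since this section does not state the CDR level at which the baseline disables LB.

```python
def active_functions(cdr_percent, cdr_threshold=2.0):
    """Return the SON functions allowed to adjust the HOM in this step.

    Assumption: `cdr_threshold` is a placeholder value chosen for the
    example; it is not taken from the paper.
    """
    functions = {"HOO"}  # HOO stays active during the entire simulation
    if cdr_percent <= cdr_threshold:
        functions.add("LB")  # LB only acts while the CDR is acceptable
    return functions


low_cdr_step = active_functions(1.0)
high_cdr_step = active_functions(5.0)
```

Under this sketch, the uncoordinated case would correspond to returning both functions regardless of the CDR, which is consistent with the text's explanation of why its CDR peak is higher.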

The evolution of U over time (Fig. 8(d)) shows the suitability of the evaluated methods in each scenario. Note that the strategy with the lowest value of U establishes a good trade-off between HO signaling and user dissatisfaction. In the first scenario, given by high-load conditions, the best method is the execution of LB alone, which significantly reduces the CBR but at the expense of an increase in the CDR that is higher than in the case of the proposed scheme. This is because the scenario has low mobility and does not require any

Fig. 8. Temporal evolution of (a) HOR, (b) CBR, (c) CDR and (d) U for different approaches (LB, HOO, Uncoordinated, Baseline, Fuzzy System).


optimization from the HOO perspective. In the second scenario, determined by the presence of high-speed users, the proposed fuzzy system provides the lowest value of U, since it considerably reduces the HOR. At the beginning of the third scenario (a combination of the two previous ones), the proposed scheme also achieves lower U values than those obtained by the LB approach since the latter method needs more iterations to reduce the CBR. Thus, it can be highlighted that the proposed joint optimization method is the only solution that, in the presence of mobility and congestion problems (i.e. scenarios two and three), reduces both the HOR and the CBR, which are the objectives of the HOO and LB, respectively. In this sense, note that the LB approach does not reduce the HOR in the second scenario, which is mainly determined by high mobility.

5. Conclusion

In this paper, a novel joint optimization algorithm for LB and HOO functions has been proposed. First, the optimized parameter HOM is broken down into two components, O(x) and H(x), which are directly related to LB and HOO, respectively. Then, an FS that adjusts the HOM components at the cell adjacency level for the joint optimization of both functions is proposed. Finally, the FS is teamed with the Q-Learning algorithm, which leads the FS to select suitable actions from the LB/HOO perspective, without jeopardizing the connection quality of the active users in the network. The proposed technique has been compared with a baseline scheme based on the existing literature and with the reference cases in which LB and HOO operate separately or even simultaneously in an uncoordinated way. In addition, these techniques have been assessed in extreme scenarios in which the HOM reaches large values, such as those with high traffic load and/or high mobility.
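For readers unfamiliar with the learning component, the following is a minimal sketch of the standard tabular Q-Learning update of Watkins and Dayan. It is not the fuzzy Q-Learning variant used with the FS in this paper; the state names, action names, reward, learning rate `alpha` and discount factor `gamma` are all hypothetical values chosen for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one standard Q-Learning update and return the new Q-value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical states and actions; Q-values start at zero.
q_table = defaultdict(float)
hom_actions = ("decrease", "keep", "increase")
new_value = q_update(q_table, "high_load", "increase", reward=1.0,
                     next_state="balanced", actions=hom_actions)
```

Starting from an all-zero table, a single update with reward 1.0 moves the selected Q-value a fraction `alpha` of the way toward the target, which is why small learning rates give gradual, stable adaptation.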

Results show that the proposed scheme effectively improves network performance over the reference cases. In particular, the HOR in the presence of high-mobility users can be reduced to half, while the user dissatisfaction in terms of the CBR and CDR stays at values similar to those of the baseline schemes. In addition, it is the only solution that is able to partially alleviate a congestion situation and to reduce the number of HOs, which are the main objectives of the LB and HOO, respectively. Unlike other reference methods, the proposed technique does not produce high peaks in the KPIs when the situation changes abruptly, e.g. when some cells become congested. In the context of SON, it is highlighted that the complexity of the SON entity that coordinates specific SON functions would be reduced, as it is freed from the coordination of two important SON functions, LB and HOO. Finally, an advantage of using fuzzy logic is that the proposed design is easy to implement.

Acknowledgment

This work has been partially supported by the Junta de Andalucía (Excellence Research Program, Projects P08-TIC-4052 and P12-TIC-2905).

References

[1] L.C. Schmelz et al., Self-configuration, -optimisation and -healing in wireless networks, in: Wireless World Research Forum Meeting, vol. 20, 2008.

[2] 3GPP, Evolved Universal Terrestrial Radio Access (E-UTRA) and Evolved Universal Terrestrial Radio Access Network (E-UTRAN); Overall description; Stage 2, version 11.4.0 (2012-12), TS 36.300.

[3] 3GPP, Self-Organizing Networks (SON) Policy Network Resource Model (NRM) Integration Reference Point (IRP); Requirements, version 11.1.0 (2012-12), TS 32.521.

[4] I. Viering, M. Döttling, A. Lobinger, A mathematical perspective of self-optimizing wireless networks, in: Proc. of International Conference on Communications (ICC ’09), 2009.

[5] 3GPP, Self-Organizing Networks (SON) Policy Network Resource Model (NRM) Integration Reference Point (IRP); Information Service (IS), version 11.4.0 (2012-12), TS 32.522.

[6] K. Tsagkaris, N. Koutsouris, P. Demestichas, R. Combes, SON coordination in a unified management framework, in: Proc. of IEEE 77th Vehicular Technology Conference (VTC), Spring, 2013.

[7] X. Gelabert, B. Sayrac, S. Ben Jemaa, A heuristic coordination framework for self-optimizing mechanisms in LTE HetNets, IEEE Trans. Veh. Technol. 63 (3) (2013) 1320–1334.

[8] R. Combes, Z. Altman, E. Altman, Coordination of autonomic functionalities in communications networks, in: CoRR abs/1209.1236, 2012.

[9] H. Lateef, A. Imran, A. Abu-Dayya, A framework for classification of self-organising network conflicts and coordination algorithms, in: Proc. of IEEE 24th International Symposium on Personal Indoor and Mobile Radio Communications (PIMRC), 2013.

[10] L. Schmelz, M. Amirijoo, A. Eisenblaetter, R. Litjens, M. Neuland, J. Turk, A coordination framework for self-organisation in LTE networks, in: Proc. of IEEE International Symposium on Integrated Network Management (IM), 2011 IFIP, 2011, pp. 193–200.

[11] P. Vlacheas, E. Thomatos, K. Tsagkaris, P. Demestichas, Operator-governed SON coordination in downlink LTE networks, in: Proc. of Future Network & Mobile Summit (FutureNetw), 2012.

[12] INFSO-ICT-216284 SOCRATES, Framework for the Development of Self-organisation Methods, Tech. Rep. Deliverable D2.4, Version 1.0.3, September, 2008.

[13] W. Li, X. Duan, S. Jia, L. Zhang, Y. Liu, J. Lin, A dynamic hysteresis-adjusting algorithm in LTE self-organization networks, in: Proc. of IEEE 75th Vehicular Technology Conference (VTC), Spring, 2012.

[14] Y. Li, M. Li, B. Cao, Y. Wang, W. Liu, Dynamic optimization of handover parameters adjustment for conflict avoidance in long term evolution, China Commun. 10 (1) (2013) 56–71.

[15] R. Romeikat, H. Sanneck, T. Bandh, Efficient, dynamic coordination of request batches in C-SON systems, in: Proc. of IEEE 77th Vehicular Technology Conference (VTC), Spring, 2013.

[16] H. Klessig, A. Fehske, G. Fettweis, J. Voigt, Improving coverage and load conditions through joint adaptation of antenna tilts and cell selection rules in mobile networks, in: Proc. of International Symposium on Wireless Communication Systems (ISWCS), 2012.

[17] J. Chen, H. Zhuang, B. Andrian, Y. Li, Difference-based joint parameter configuration for MRO and MLB, in: Proc. of IEEE 75th Vehicular Technology Conference (VTC), Spring, 2012.

[18] W.-Y. Li, X. Zhang, S.-C. Jia, X.-Y. Gu, L. Zhang, X.-Y. Duan, J.-R. Lin, A novel dynamic adjusting algorithm for load balancing and handover co-optimization in LTE SON, J. Comput. Sci. Technol. 28 (3) (2013) 437–444.

[19] 3GPP, Evolved Universal Terrestrial Radio Access (E-UTRA); Radio Resource Control (RRC); Protocol specification, version 11.2.0 (2012-12), TS 36.331.

[20] T. Ross, Fuzzy Logic with Engineering Applications, Wiley, 2010.

[21] A. Engelbrecht, Computational Intelligence: An Introduction, John Wiley & Sons, 2007.

[22] C. Lee, Fuzzy logic in control systems: fuzzy logic controller. I, IEEE Trans. Syst., Man Cybernet. 20 (2) (1990) 404–418.

[23] P. Muñoz, R. Barco, I. de la Bandera, Optimization of load balancing using fuzzy Q-Learning for next generation wireless networks, Expert Syst. Appl. 40 (4) (2013) 984–994.

[24] P. Muñoz, R. Barco, I. de la Bandera, On the potential of handover parameter optimization for self-organizing networks, IEEE Trans. Veh. Technol. 62 (5) (2013) 1895–1905.

[25] K.C. Foong, C.T. Chee, L.S. Wei, Adaptive network fuzzy inference system (ANFIS) handoff algorithm, in: Proc. of the International Conference on Future Computer and Communication (ICFCC), 2009.

[26] A. Çalhan, C. Çeken, An optimum vertical handoff decision algorithm based on adaptive fuzzy logic and genetic algorithm, Wireless Pers. Commun. (2010) 1–18.

[27] L. Giupponi, R. Agustí, J. Pérez-Romero, O. Sallent, A framework for JRRM with resource reservation and multiservice provisioning in heterogeneous networks, Mobile Networks Appl. 11 (2006) 825–846.

[28] M. Dirani, Z. Altman, Self-organizing networks in next generation radio access networks: application to fractional power control, Comput. Networks 55 (2) (2011) 431–438.

[29] R. Nasri, A. Samhat, Z. Altman, A new approach of UMTS-WLAN load balancing; algorithm and its dynamic optimization, in: Proc. of IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks, 2007.

[30] A. Galindo-Serrano, L. Giupponi, Downlink femto-to-macro interference management based on fuzzy Q-learning, in: Proc. of International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), 2011.

[31] M. Haddad, Z. Altman, S. Elayoubi, E. Altman, A Nash–Stackelberg fuzzy Q-learning decision approach in heterogeneous cognitive networks, in: Proc. of IEEE Global Telecommunications Conference (GLOBECOM), 2010.

[32] R. Razavi, S. Klein, H. Claussen, A fuzzy reinforcement learning approach for self-optimization of coverage in LTE networks, Bell Labs Tech. J. 15 (3) (2010) 153–175.

[33] Y.H. Chen, C.J. Chang, C.Y. Huang, Fuzzy Q-learning admission control for WCDMA/WLAN heterogeneous networks with multimedia traffic, IEEE Trans. Mobile Comput. 8 (11) (2009) 1469–1479.

[34] P.Y. Glorennec, Fuzzy Q-learning and dynamical fuzzy Q-learning, in: Proc. of the Third IEEE Conference on Fuzzy Systems, vol. 1, 1994, pp. 474–479.

[35] C. Watkins, P. Dayan, Technical note: Q-learning, Mach. Learn. 8 (3) (1992) 279–292.

[36] P. Muñoz, I. de la Bandera, F. Ruiz, S. Luna-Ramírez, R. Barco, M. Toril, P. Lázaro, J. Rodríguez, Computationally-efficient design of a dynamic system-level LTE simulator, Int. J. Electron. Telecommun. 57 (3) (2011) 347–358.

[37] J. Ruiz-Avilés, S. Luna-Ramírez, M. Toril, F. Ruiz, Traffic steering by self-tuning controllers in enterprise LTE femtocells, EURASIP J. Wireless Commun. Network. 2012 (337) (2012).

Pablo Muñoz received his M.Sc. and Ph.D. degrees in Telecommunication Engineering from the University of Málaga (Spain) in 2008 and 2013, respectively. He is currently working with the Communications Engineering Department at the same university. Since September 2009, he has been a Ph.D. Fellow, working on self-optimization of mobile radio access networks and radio resource management.

Raquel Barco received the M.Sc. degree in Telecommunication Engineering in 1997 and the Ph.D. degree in 2007 from the University of Málaga, Spain. From 1998 to 2000, she worked at the European Space Agency, Darmstadt, Germany. From 2000 to 2003, she worked part-time for Nokia Networks. Currently, she is Associate Professor at the Communication Engineering Department, University of Málaga. She has published more than 50 papers in international journals and conferences and she has been involved in several projects with companies. Her research interests are in the field of mobile communication systems, especially Self-Organizing Networks.

Isabel de la Bandera received her M.Sc. degree in Telecommunication Engineering from the University of Málaga (Spain) in 2009. In 2008, she was with the Communications Engineering Department at the same university, working on RFID projects. Since February 2010, she has been with the same department working on projects about radio resource management in next generation mobile networks, and she is working toward the Ph.D. degree in Telecommunications Engineering.