Abbas et al. EURASIP Journal on Wireless Communications and Networking (2015) 2015:174, DOI 10.1186/s13638-015-0381-7

Recent advances on artificial intelligence and learning techniques in cognitive radio networks

Nadine Abbas*, Youssef Nasser and Karim El Ahmad

Abstract

Cognitive radios are expected to play a major role towards meeting the exploding traffic demand over wireless systems. A cognitive radio node senses the environment, analyzes the outdoor parameters, and then makes decisions for dynamic time-frequency-space resource allocation and management to improve the utilization of the radio spectrum. For efficient real-time processing, the cognitive radio is usually combined with artificial intelligence and machine-learning techniques so that an adaptive and intelligent allocation is achieved. This paper firstly presents the cognitive radio networks, resources, objectives, constraints, and challenges. Then, it introduces artificial intelligence and machine-learning techniques and emphasizes the role of learning in cognitive radios. Then, a survey on the state of the art of machine-learning techniques in cognitive radios is presented. The literature survey is organized based on different artificial intelligence techniques such as fuzzy logic, genetic algorithms, neural networks, game theory, reinforcement learning, support vector machine, case-based reasoning, entropy, Bayesian, Markov model, multi-agent systems, and artificial bee colony algorithm. This paper also discusses the cognitive radio implementation and the learning challenges foreseen in cognitive radio applications.

Keywords: Cognitive radio; Artificial intelligence; Adaptive and flexible radio access techniques

*Correspondence: [email protected]. Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon.

© 2015 Abbas et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.

1 Review

1.1 Introduction

According to the Cisco Visual Networking Index, global IP traffic will reach 168 exabytes per month by 2019 [1], and the number of devices will be three times the global population. In addition, the resources in terms of power and bandwidth are scarce. Therefore, novel solutions are needed to minimize energy consumption and optimize resource allocation. Cognitive radio (CR) was introduced by Joseph Mitola III and Gerald Q. Maguire in 1999 for flexible spectrum access [2]. Basically, they defined cognitive radio as the integration of model-based reasoning with software radio technologies [3]. In 2005, Simon Haykin gave a review of the cognitive radio concept and treated it as brain-empowered wireless communications [4]. Cognitive radio is a radio or system that senses the environment, analyzes its transmission parameters, and then makes decisions for dynamic time-frequency-space resource allocation and management to improve the utilization of the radio electromagnetic spectrum.

Generally, radio resource management aims at optimizing the utilization of various radio resources such that the performance of the radio system is improved. For instance, the authors in [5] proposed an optimal resource (power and bandwidth) allocation in cognitive radio networks (CRNs), specifically in the scenario of spectrum underlay, while taking into consideration the limitations of interference temperature limits. The optimization formulations provide optimal solutions for resource allocation, sometimes at the detriment of global convergence, computation time, and complexity.

To reduce the complexity and achieve efficient real-time resource allocation, cognitive radio networks need to be equipped with learning and reasoning abilities. The cognitive engine needs to coordinate the actions of the CR by making use of machine-learning techniques.
As defined by Haykin in [4], "cognitive radio is an intelligent wireless communication system that is aware of its environment and uses the methodology of understanding-by-building to learn from the environment and adapt to statistical variations in the input stimuli." Therefore, a CR is expected to be intelligent and capable of learning from its experience by interacting with its RF environment. Accordingly, learning is an indispensable component of CR that can be provided using artificial intelligence and machine-learning techniques.

In this paper, we firstly present the cognitive radio system principle, its main resources, parameters, and objectives. Then, we introduce the artificial intelligence techniques, the learning cycle, the role, and the importance of learning in cognitive radios. The paper then discusses a literature survey on the state-of-the-art achievements in cognitive radios that use learning techniques. Several surveys were conducted to study the application of learning techniques in cognitive radio tasks; however, they still lack some components of a comprehensive study on cognitive radio systems. For instance, the authors of [6] presented a brief survey on artificial intelligence techniques; however, their work's focus was on CR application and testbed development and implementation. The authors in [7] presented a survey on different learning techniques such as reinforcement learning, game theory, neural networks, support vector machine, and Markov model. They also discussed their strengths, weaknesses, and the challenges in applying these techniques in CR tasks. In [8], the authors considered game theory, reinforcement learning, and reasoning approaches such as Bayesian networks, fuzzy logic, and case-based reasoning. In contrast to the literature, we present a comprehensive survey considering all the learning techniques that were used in cognitive networks. The survey is organized based on different artificial intelligence approaches including the following: (a) fuzzy logic, (b) genetic algorithms, (c) neural networks, (d) game theory, (e) reinforcement learning, (f) support vector machine, (g) case-based reasoning, (h) decision tree, (i) entropy, (j) Bayesian, (k) Markov model, (l) multi-agent systems, and (m) artificial bee colony algorithm.

The main contributions of this paper are as follows: (1) it provides a comprehensive study on learning approaches and presents their application in CR networks, evaluation, strengths, complexity, limitations, and challenges; (2) this paper also presents different cognitive radio tasks, as well as the challenges that face cognitive radio implementations; (3) it evaluates the application of the learning techniques to cognitive radio tasks; and (4) it categorizes learning approaches based on their implementations and their application in performing major cognitive radio tasks such as spectrum sensing and decision-making.

This paper is organized as follows: cognitive radio networks, resources, objectives, and challenges are presented in Section 1.2. Learning in cognitive radios is presented in Section 1.3. Artificial intelligence and its learning role are introduced in Section 1.3.1. The literature survey is presented in Section 1.3.2. Learning techniques challenges, strengths, weaknesses, and limitations are presented in Section 1.4.1. Discussion on applying learning techniques in cognitive radio networks is presented in Section 1.4.2. Finally, conclusions are drawn in Section 2.
1.2 Cognitive radio

Cognitive radio provides the radio system with the intelligence to maintain highly reliable communication with efficient utilization of the radio spectrum. In this section, we present the cognitive radio cycle, tasks, and corresponding challenges.

1.2.1 The cognitive cycle

As shown in Fig. 1, the wireless communications system is formed by base stations and radio networks where some are primary users (PUs) or networks that own the spectrum and others are secondary users (SUs) that may use the spectrum when it is available and not occupied by other networks. As shown in Fig. 2, the cognitive radio network follows the cognitive cycle for best resource management and network performance. It starts by sensing the environment, analyzing the outdoor parameters, and then making decisions for dynamic resource allocation and management to improve the utilization of the radio electromagnetic spectrum [9]. These could be briefly described as follows.

Fig. 1 Wireless communication network formed by cognitive radio networks

Fig. 2 Learning process in cognitive radios

Sensing the environment: In cognitive radio networks, the primary network has priority over the secondary network in using the spectrum. The secondary network may use the available spectrum but without causing harmful interference to the primary network. Therefore, it needs to primarily quantify and sense its surrounding environment parameters such as (1) channel characteristics between base station and users; (2) availability of spectrum and power; (3) availability of spectrum holes in frequency, time, and space; (4) user and application requirements; (5) power consumption; and (6) local policies and other limitations [10].

Analyzing the environment parameters: The sensed environment parameters will be used as inputs for resource management in all dimensions such as time, frequency, and space. The main resource allocation objectives in CR include but are not limited to (1) minimizing the bit error rate, (2) minimizing the power consumption, (3) minimizing the interference, (4) maximizing the throughput, (5) improving the quality of service, (6) maximizing the spectrum efficiency, and (7) maximizing the user quality of experience. In practice, cognitive radio aims at satisfying multiple objectives; however, the combination of some objectives may create conflicting solutions, such as minimizing the power consumption and the bit error rate simultaneously. Therefore, tradeoff solutions are needed to guarantee a balance between the objective functions [11].

Making decisions for different decision variables: In order to achieve the objectives mentioned before, the CR network needs to make decisions concerning the following important variables: (1) power control, (2) frequency band allocation, (3) time slot allocation, (4) adaptive modulation and coding, (5) frame size, (6) symbol rate, (7) rate control, (8) antenna selection and parameters, (9) scheduling, (10) handover, (11) admission control, (12) congestion control, (13) load control, (14) routing plan, and (15) base station deployment [12]. The decision-making can be based on optimization algorithms; however, in order to reduce the complexity and achieve efficient real-time resource allocation, cognitive radio networks use machine learning and artificial intelligence.
1.2.2 Cognitive radio tasks and corresponding challenges

The major role of cognitive radio is to identify spectrum holes across multiple dimensions such as time, frequency, and space and accordingly adjust its transmission parameters such as modulation and coding, frequency and time slot allocation, power control, and antenna parameters. Therefore, the CR behaving as a secondary user needs to dynamically reconfigure itself to avoid any noticeable interference to the primary user by efficiently using its (1) cognitive capability, (2) reconfigurable capability, and (3) learning capability. However, these capabilities are subject to many challenges, presented as follows.

Task 1: The cognitive capability can be achieved using efficient situation awareness and spectrum-sensing techniques. It includes location and geographical awareness, RF environment, network topology, and operational knowledge. The main challenges facing spectrum-sensing techniques are the decision accuracy on spectrum availability; sensing duration, frequency, and periodicity; uncertainty in background noise power, especially at low SNR due to multi-path fading and shadowing; detection of spread spectrum primary signals; and sensing with limited information about the environment. To improve the spectrum-sensing performance, cooperative sensing and geo-location technology were proposed. First, cooperative detection showed collaborative communications gains; however, it still faces many challenges such as overhead, developing an efficient cooperation framework including information sharing algorithms and networks, and dynamic information exchange with minimum delay. Second, combining geo-location technology with spectrum sensing may reduce the complexity, power, and cost at the CR device. The CR will be using database look-up for location awareness as well as unused spectrum and channels. Geo-location technology also faces many challenges such as updating the databases, efficient look-up techniques and algorithms, and accuracy, trust, and security of the databases [13-15].

Task 2: After performing spectrum sensing and situation awareness, the CR network needs to use its reconfigurable capability to dynamically adjust operational and transmission parameters and policies to achieve the highest performance gain, such as maximizing the utilization of the spectrum and throughput, reducing the energy and power consumption, and reducing the interference level while meeting users' quality of service (QoS) requirements such as rate, bit error rate, and delay. The main reconfigurable parameters are those listed in Section 1.2.1, for instance power control, frequency band allocation, time slot allocation, adaptive modulation and coding, frame size, symbol rate, rate control, antenna selection and parameters, scheduling, handover, admission control, congestion control, load control, routing plan, and base station deployment. The reconfigurable capability is based on decision-making, which can be based on optimization algorithms. The main challenge here concerns the complexity and the convergence of these techniques within a limited time. In order to reduce the complexity and achieve efficient real-time resource allocation, cognitive radio networks use machine learning and artificial intelligence. This decision-making is based on models built using the CR learning capability, which is based on the environment information. However, the latter may not be complete or accurate due to limited training data. In addition, the decision-making procedure needs to be dynamic and fast [16]. Therefore, focusing future research contributions on any of these two aspects is needed, as they are the bottleneck of the reconfiguration capability in CR networks.

Task 3: The learning capability is used to build and develop the learning model for decision-making. The main challenge here is to enable the devices to learn from past decisions and use this knowledge to improve their performance. Some learning techniques may require previous knowledge of the system, predefined rules, policies and architecture, and a large number of iterations, which may increase the delay and reduce the efficiency of the system. Therefore, the choice of a learning technique for performing a specific CR task is considered a challenge, as well as the accuracy and efficiency of the techniques.
1.3 Learning in cognitive radio networks

Learning in cognitive radios has recently gained a lot of interest in the literature. In this section, artificial intelligence and machine learning are introduced, as well as a survey of the state-of-the-art achievements in applying learning techniques in cognitive radio networks.

1.3.1 Introduction to artificial intelligence and machine learning

Artificial intelligence aims at making machines perform tasks in a manner similar to an expert. The intelligent machine will perceive its environment and take actions to maximize its own utility. The central problems in artificial intelligence include deduction, reasoning, problem solving, knowledge representation, and learning [17].

The major steps in machine learning in cognitive radios are shown in Fig. 2 and can be presented as follows: (1) sensing the radio frequency (RF) parameters such as channel quality, (2) observing the environment and analyzing its feedback such as ACK responses, (3) learning, (4) keeping the decisions and observations for updating the model and obtaining better accuracy in future decision-making, and finally (5) deciding on issues of resource management and adjusting the transmission parameters accordingly [7, 18].
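To make the five steps concrete, the following minimal Python sketch shows one way a sense-observe-learn-decide loop could be organized. It is only an illustration of the cycle described above: the CognitiveEngine class, its methods, and the simulated SNR/ACK feedback are assumptions made for this example, not an interface taken from the surveyed works.

```python
import random

class CognitiveEngine:
    """Toy cognitive engine: keeps per-channel statistics and picks channels."""

    def __init__(self, n_channels):
        self.n_channels = n_channels
        self.successes = [0] * n_channels   # ACKed transmissions per channel
        self.trials = [0] * n_channels      # attempted transmissions per channel

    def sense(self):
        """Step (1): sense RF parameters (here, a simulated per-channel SNR in dB)."""
        return [random.uniform(0.0, 20.0) for _ in range(self.n_channels)]

    def decide(self, snr):
        """Step (5): pick a channel, trading off learned history and the current SNR."""
        def score(ch):
            history = self.successes[ch] / self.trials[ch] if self.trials[ch] else 0.5
            return 0.5 * history + 0.5 * snr[ch] / 20.0
        return max(range(self.n_channels), key=score)

    def learn(self, channel, ack):
        """Steps (3)-(4): update and store the model using the observed feedback."""
        self.trials[channel] += 1
        self.successes[channel] += int(ack)

engine = CognitiveEngine(n_channels=4)
for _ in range(200):
    snr = engine.sense()                         # (1) sense the RF environment
    channel = engine.decide(snr)                 # (5) decide on the resource to use
    ack = random.random() < snr[channel] / 20.0  # (2) observe feedback (ACK or loss)
    engine.learn(channel, ack)                   # (3)-(4) learn and keep the observation
print("estimated per-channel success rates:",
      [round(s / t, 2) if t else None for s, t in zip(engine.successes, engine.trials)])
```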
In [6], Zhao et al. introduced the concepts of cognitive radio from the perspectives of artificial intelligence and machine learning. Moreover, the authors presented the possible applications and fundamental ideas that drive the CR technology.

Artificial intelligence may be represented by, but is not limited to, the following learning techniques: fuzzy logic, genetic algorithms, neural networks, game theory, reinforcement learning, support vector machine, case-based reasoning, decision tree, entropy, Bayesian, Markov model, multi-agent systems, and artificial bee colony algorithm. The mentioned approaches are the main techniques used and applied in CR networks.

1.3.2 Applying artificial intelligence techniques to CRs

In this section, a survey of the state-of-the-art research considering learning in CRs is presented. The works are grouped based on the artificial intelligence and learning techniques used for CR.

Fuzzy logic

The fuzzy set theory was proposed by Lotfi A. Zadeh in 1965 to solve and model uncertainty, ambiguity, imprecision, and vagueness using mathematical and empirical models [19]. The variables in fuzzy logic are not limited to only two values (true or false) as is the case in classical and crisp sets [8]. A fuzzy element has a degree of membership or compatibility with the set and its negation. Fuzzy logic provides the system with (1) approximate reasoning by taking fuzzy variables as an input and producing a decision by using sets of if-then rules, (2) decision-making capability under uncertainty by predicting consequences, (3) learning from old experience, and (4) generalization to adapt to new situations [20, 21]. In general, the inputs for the fuzzy inference system (FIS) need to be fuzzified or categorized into levels or degrees such as low, medium, and high. The FIS, using if-then rules, will then determine the output of the system.

The authors in [22-24] and [25] applied fuzzy logic theory in cognitive radio to address the following objectives: bandwidth allocation, interference and power management, spectrum availability assessment methods, and resource allocation. In [22], the authors proposed a centralized fuzzy inference system that can allocate the available bandwidth among cognitive users considering traffic intensity, type, and quality of service priority. The secondary users (SUs) have to submit bandwidth requests to the master SU, which uses fuzzy logic to grant the SU bandwidth access. First, the master SU assesses the traffic intensity of the SU queue and of the bandwidth allocation queue to determine the allowed access latency for SUs. Second, combining the allowed access latency with the traffic type and priority, the master will be able to decide on the amount of bandwidth that may be allocated to the requesting SU. Depending on the combination of access latency and traffic priority, the bandwidth to be allocated is characterized as very high, high, medium, low, or very low.

In [23], Aryal et al. presented an approach for power management while reducing interference and maintaining quality of service. Their algorithm considers the number of users, mobility, spectrum efficiency, and a synchronization constraint. These inputs are categorized as low, moderate, high, and very high. Fuzzy rules are then used to determine the power adjustment as (1) remain unchanged, (2) slightly increase, (3) moderately increase, (4) highly increase, and (5) fully increase.

The authors in [24] used fuzzy logic to determine the proper method for detecting available bandwidth. Four input parameters are considered for the selection of the spectrum-sensing method: (1) required probability of detection, (2) operational signal-to-noise ratio (SNR), (3) available time for performing the detection, and (4) a priori information. The outputs are (1) energy detection, (2) correlation detection, (3) feature detection, (4) matched filtering, and (5) cooperative energy detection. The input parameters are first fuzzified from measurable values to fuzzy linguistic variables by using input membership functions such as low, medium, or high. Based on the if-then rules, the fuzzy values of the input parameters will then specify the method for spectrum availability detection.

Qin et al. in [25] proposed the use of fuzzy inference rules for resource management in a distributed heterogeneous wireless environment. The fuzzy convergence is designed in two levels. First, the local convergence calculation is based on local parameters such as interference power, bandwidth of a frequency band, and path loss index. Second, the local convergence calculations collected from all nodes will be aggregated to generate a global control for each node. The aggregation weights are identified using the node's control, the link state aggregation data, and the amount of link states.
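As a concrete illustration of the fuzzified inputs and if-then rules described above, the sketch below implements a tiny Mamdani-style inference step in Python. The choice of inputs (traffic load and priority), the triangular membership functions, and the rule consequents are illustrative assumptions, not the rule bases of [22-25].

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(value):
    """Map a crisp 0-10 reading to degrees of membership in low/medium/high."""
    return {"low": tri(value, -1, 0, 5), "medium": tri(value, 2, 5, 8), "high": tri(value, 5, 10, 11)}

# Rule base: (load level, priority level) -> crisp bandwidth grant (MHz) for that rule.
RULES = {
    ("low", "high"): 8.0, ("low", "medium"): 6.0, ("low", "low"): 4.0,
    ("medium", "high"): 6.0, ("medium", "medium"): 4.0, ("medium", "low"): 2.0,
    ("high", "high"): 4.0, ("high", "medium"): 2.0, ("high", "low"): 1.0,
}

def granted_bandwidth(load, priority):
    """Mamdani-style inference: min for AND, weighted-average defuzzification."""
    mu_load, mu_prio = fuzzify(load), fuzzify(priority)
    num = den = 0.0
    for (l, p), grant in RULES.items():
        strength = min(mu_load[l], mu_prio[p])   # firing strength of the rule
        num += strength * grant
        den += strength
    return num / den if den else 0.0

print(granted_bandwidth(load=3.0, priority=9.0))  # lightly loaded, high priority -> large grant
print(granted_bandwidth(load=9.0, priority=2.0))  # heavily loaded, low priority  -> small grant
```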
Genetic algorithms

Genetic algorithms (GA) originated in the work of Friedberg (1958), who attempted to produce learning by mutating small FORTRAN programs. Therefore, by making an appropriate series of small mutations to a machine code program, one can generate a program with good performance for any particular simple task [18]. Genetic algorithms simply search in the space, with the goal of finding an element or solution that maximizes the fitness function by evolving a population of solutions, or chromosomes, towards better solutions. The chromosomes are represented as a string of binary digits. This string grows as more parameters are used by the system. The search is parallel instead of processing a single solution, because each element can be seen as a separate search. A genetic algorithm-based engine can provide awareness-processing, decision-making, and learning for cognitive radios [26, 27].

The authors in [28] used genetic algorithms for enhancing the system performance by solving multi-objective problems that aim to minimize the bit error rate and power and maximize the throughput. The authors encoded the operating variables for different numbers of subcarriers into a chromosome, including the center frequency, transmission power, and modulation type. They presented several approaches such as population adaptation, variable quantization, variable adaptation, and multi-objective genetic algorithms. Their results showed that the information from the past states of the environment from previous cognition cycles can be used to reduce the convergence time of the GA and that genetic algorithms can enhance the convergence of optimization problems.

In [29], the authors addressed spectrum optimization in CRs using elitism in genetic algorithms. They used four parameters for the chromosome structure representation: frequency, power, bit error rate, and modulation scheme. The solution provided the most efficient performance for the users' quality of service, subject to different constraints on the genes in the chromosome structure. Elitism is used for the selection of the best chromosomes among the population to be transferred to the next generation before performing crossover and mutation. This prevents the loss of the most likely solutions in the available pool of solutions.

Hauris et al. used genetic algorithms in [30] for RF parameter optimization in CR. The genes used are modulation and coding schemes, antenna parameters, transmit and receive antenna gains, receiver noise figure, transmit power, data rate, coding gain, bandwidth, and frequency. The fitness measure is calculated from the following key performance parameters: link margin, C/I, data rate, and spectral efficiency. The maximum fitness measure and its associated chromosomes are tracked and saved. The best member is then utilized as the optimal solution for setting the RF parameters.

The authors of [31] presented cognitive radio resource allocation based on a modified genetic algorithm named niche adaptive genetic algorithm (NAGA). NAGA solves the problem of fixed crossover and mutation probabilities and adaptively adjusts them to achieve optimal performance. The goal was to determine the assignment of the subcarrier and modulation schemes to the users, in order to maximize the total transfer speed of CR networks while satisfying minimum rate requirements, maximum allowed bit error rate, and total power consumption limits.
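The following short Python sketch illustrates the chromosome-fitness-selection loop described above on a toy CR parameter adaptation problem (choosing a transmit power and a modulation). The gene values, the fitness function trading spectral efficiency against power, and the elitist selection are assumptions made for illustration; they do not reproduce the encodings used in [28-31].

```python
import random

# Candidate transmission settings (the "genes"); values are illustrative.
POWERS = [1, 5, 10, 20]                        # mW
MODS = {"BPSK": 1, "QPSK": 2, "16QAM": 4}      # bits per symbol

def fitness(chrom):
    """Toy multi-objective fitness: reward spectral efficiency, penalize power."""
    power, mod = chrom
    return MODS[mod] - 0.05 * power

def random_chrom():
    return (random.choice(POWERS), random.choice(list(MODS)))

def crossover(a, b):
    return (a[0], b[1])                        # take the power gene from a, modulation from b

def mutate(chrom, rate=0.1):
    power, mod = chrom
    if random.random() < rate:
        power = random.choice(POWERS)
    if random.random() < rate:
        mod = random.choice(list(MODS))
    return (power, mod)

def evolve(pop_size=20, generations=30, elite=2):
    pop = [random_chrom() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:elite]                       # elitism: carry the best chromosomes over unchanged
        while len(nxt) < pop_size:
            p1, p2 = random.sample(pop[:10], 2) # parents drawn from the fitter half
            nxt.append(mutate(crossover(p1, p2)))
        pop = nxt
    return max(pop, key=fitness)

print(evolve())   # e.g. (1, '16QAM'): high-order modulation at the lowest power
```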
Neural networks

Neural networks were introduced by Warren McCulloch and Walter Pitts in 1943 and were inspired by the central nervous system. Similar to the biological neural network, the artificial neural network is formed by nodes, also called neurons or processing elements, which are connected together to form a network. The artificial neural network gets information from all neighboring neurons and gives an output depending on its weights and activation functions. The adaptive weights may represent the connection strengths between neurons. To accomplish the learning process, the weights need to be adjusted until the output of the network is approximately equal to the desired output. Artificial neural networks have been used to make the cognitive radio learn from the environment and take decisions, in order to improve the quality of service of the communication system [32, 33].

In [34], Xuezhi Tan et al. addressed the problem of spectrum scarcity and inefficiency in current communication networks by introducing a new solution using artificial neural networks (ANNs) to replace the current frequency allocation system. They presented a theoretical analysis of two different scenarios: the single-user case and the multi-user scenario with weighted allocation. They focused on the back propagation theory, mainly formed by the idea of information being exchanged forward and error being transmitted backward.

The authors in [35] tried to improve the performance of spectrum sensing in CR networks based on a new ANN solution. The authors installed an ANN at every secondary user to predict the sensing probabilities of these units. Their idea was to create a new cooperative spectrum-sensing system through the collaboration of the SUs equipped with ANN capabilities and a fusion center using the theory of the belief propagation network. The results showed a globally low false-alarm probability for the CR network.

Yang et al. in [36] proposed a design of the cognitive engine based on genetic algorithms and a radial basis function (RBF) network in order to adjust the parameters of the system so as to effectively adapt to the environment as it changes. They utilized a decision-making table to train the RBF learning model, whereas they mainly made use of the GA to adjust the operating parameters of the RBF neural network such as the data rate, MAC window, and transmitting power.

The authors in [37] presented a general review of the main spectrum-sensing methods and then proposed their own automatic modulation classification detection method. The main idea of [37] is based on the fact that the secondary user is not supposed to have a priori information about the primary user's signal type and is not supposed to address the issue of the hidden node. The authors developed the digital classifier using an ANN that allows the user to detect all forms of primary radio signals, whether weak, strong, pre-known, or unknown.
Game theory

The first known discussion of game theory occurred in a letter written by James Waldegrave in 1713. Game theory is used as a decision-making technique where several players must make choices that consequently affect the interests of the other players. Each player decides on his actions based on the history of actions selected by other players in previous rounds of the game. In cognitive networks, the CR networks are the players in the game. The actions are settings of the RF parameters such as transmit power and channel selection. These actions will be taken by CR networks based on observations represented by environment parameters such as channel availability, channel quality, and interference. Therefore, each CR network will learn from its past actions, observe the actions of the other CR networks, and modify its actions accordingly [38, 39].

Abdulghfoor et al. in [40] introduced the concepts of modeling resource allocation in ad hoc cognitive radio networks with game theory and compared two scenarios related to the presence or absence of cooperation using game theory. They also outlined game theory applications in other layers and future directions for using game theory in CR. The authors showed that game theory can be used to design efficient distributed algorithms in ad hoc cognitive radio networks, whereas its application in the MAC layer proved to be most challenging.

The authors in [41] addressed spectrum sensing in CR networks. They proposed a single framework for cooperative spectrum sensing of CR networks as well as for self-organization of femtocells by relying on the data generated by the CR users to help in a better understanding of the environment. They proposed the creation of a large spectrum database that relies on macrocell and femtocell networks. The game theory approach provides mutual benefits to the CR users, and each femtocell is considered as a player in the game of spectrum sensing and spectrum utilization. The results showed a lower false alarm probability and a tradeoff between the gain increase and the size of the coalition, as the time to generate the reports has a negative effect on the matter.

In [42], Pandit et al. approached the problem of the fixed spectrum allocation policy from an economic point of view, where they proposed a simulation model to improve the bandwidth allocation between the primary and secondary users of a CR network. The authors developed an algorithm that minimizes the cost of the bandwidth utilization while maximizing the effectiveness of the SUs in the CR network. The authors utilized game theory as a utility to model the payoffs between the SUs and the PU, considered as players. In their approach, the PU's main aim was to maximize its revenue, whereas the SUs' aim was to improve their QoS satisfaction at an acceptable cost.

The authors in [43] addressed spectrum management in CR. They proposed spectrum trading, i.e., spectrum management without game theory (SMWG), and spectrum competition, i.e., spectrum management with game theory (SMG). Considering the SMWG first, the authors introduced a competition factor to model the spectrum competition between the different PUs. They also introduced a new QoS level function that relates to the spectrum availability and varies according to SU requirements. In both cases, they assumed that the tradeoff is between the PUs' desire to maximize revenue and the SUs' desire to obtain the desired QoS level. In the SMG model, they formulated two games, one relying on a Bertrand game with the Nash equilibrium being the solution and the other using the Stackelberg game having the same solution.
Reinforcement learning

Reinforcement learning (RL) plays a key role in the multi-agent domain, as it allows the agents to discover the situation and take actions using trial and error to maximize the cumulative reward, as illustrated in Fig. 3. The basic reinforcement learning model consists of (1) environment states, (2) actions, (3) rules for transition between states, (4) immediate rewards of transition rules, and (5) agent observation rules. In RL, an agent needs to consider the immediate rewards and the consequences of its actions to maximize the long-term system performance [44, 45].

Fig. 3 The agent-environment interaction in reinforcement learning

In [46], Yau tried to incorporate RL to correctly complete the cognition cycle in centralized and static mobile networks. The RL approach was applied at the level of the SUs, where they dynamically rank channels according to PU utilization and packet error rate during data transmission to increase the throughput of SUs and the QoS levels while reducing delays.

The authors in [47] considered routing in cognitive networks. They proposed a new RL system that jointly works on channel selection and routing for a multi-hop CR network. The RL incorporated a system of errors and rewards based on each decision, and hence, every agent tried to maximize its own rewards. After trial and error, the CR users of RL will reach an optimal state in their decision-making where they maximize their spectrum utilization performance. The authors also used the feedback obtained from the environment and modeled the problem using the Markov decision process. The key drivers of the channel selection can be represented by link cost and transmission time. This method allows the users to choose the best route available through continuous and efficient determination of the next hop.

Zhou et al. in [48] addressed the problem of high power consumption generated due to overhead communication between different CR users. They proposed a new power control scheme relying on RL to eliminate the need for information sharing on interference and power strategies of each CR element. The CR users compete over repeated rounds to maximize their own objective while taking into consideration the interference constraints imposed by the network.

In [49], the authors used reinforcement learning on the secondary user node for spectrum sensing, PU signal detection, and transmission decisions in cognitive radio networks. The SU will learn the behavior of the PU transmissions to dynamically fill the spectrum holes.
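The sketch below shows the trial-and-error loop in its simplest form: a stateless Q-learning (multi-armed bandit) agent that learns which channel to access from ACK/collision feedback. The channel idle probabilities, the reward values, and the epsilon-greedy policy are assumptions chosen for the example, not the schemes of [46-49].

```python
import random

random.seed(0)
IDLE_PROB = [0.2, 0.9, 0.6, 0.4]   # assumed probability that each channel is free of PU activity
N_CH = len(IDLE_PROB)
alpha, eps = 0.1, 0.1              # learning rate and exploration probability

Q = [0.0] * N_CH                   # one action value per channel (stateless special case of Q-learning)

for _ in range(5000):
    # epsilon-greedy channel selection
    if random.random() < eps:
        a = random.randrange(N_CH)
    else:
        a = max(range(N_CH), key=lambda c: Q[c])
    # environment feedback: +1 for a PU-free transmission, -1 for a collision with the PU
    reward = 1.0 if random.random() < IDLE_PROB[a] else -1.0
    Q[a] += alpha * (reward - Q[a])

print("learned channel values:", [round(q, 2) for q in Q])  # channel 1 should score highest
```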
Support vector machine

Support vector machines (SVM) are supervised learning models used for classification and regression analysis. In the learning phase, the SVM uses the training data to come up with margins that separate the classes, as shown in Fig. 4. A new entry or object is then classified based on these margins and the compatibility or distance between the object and the class [50, 51].

Fig. 4 SVM classifier separates the training data into two classes by determining a linear line representing the greatest separation between both sides

In [52], the authors used SVM to add a learning design to the CR engine. The proposed model is based on bit error rate (BER), SNR, data rate, and modulation mode. For training data, three channel models are considered: a flat fading model, a deep fading model, and a no fading model. Once the model is built, the data is tested taking the BER, SNR, and data rate as inputs.

Dandan et al. in [53] used SVM for spectrum sensing and real-time detection. The sample data was classified as a primary user or not by training and testing on the proposed SVM classification model. The classes are determined based on the received signal as follows: if the detected signal is formed by the signal and AWGN noise, the class will be denoted as PU detected. When the signal is only AWGN noise, the class indicates no PU. The parameters considered in this work included carrier, pulse sequence, repeatability extension, and circulation prefix processing.

The authors in [54] used SVM for medium access control (MAC) identification. The proposed SVM model enables CR users to sense and identify the MAC protocol types of the existing transmissions and to adapt their transmission parameters accordingly. The authors used three different kernel functions for SVM: linear, polynomial, and radial basis functions.

In [55], the authors proposed applying SVM to eigenvalue-based spectrum sensing for multi-antenna cognitive radios. They built their training model by observing N samples and generating their covariance matrix eigenvalues. The new data point is then classified based on the annotated training data set and the SVM kernel to indicate the presence or absence of the PU.

The authors in [56] used SVM to solve the problem of beam-forming design in cognitive networks. They considered a CR network with relaying capabilities where the cognitive base station shares the spectrum with the primary network and can act as a relay to assist the primary data transmission. First, they aimed to minimize the total transmit power of the cognitive base station while maintaining the QoS requirements and keeping the mutual interference level acceptable. SVM was used to solve the optimization problem for the beam-forming weight vectors.
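To illustrate the classification-based sensing idea (as in [52, 53, 55]) in code, the sketch below trains an SVM on synthetic received-signal features and uses it to decide whether a PU is present. It assumes NumPy and scikit-learn are available; the two features (block energy and a normalized autocorrelation peak), the simulated signals, and the RBF kernel are illustrative choices, not those of the surveyed papers.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def observations(pu_present, n_obs=200, block_len=64):
    """Each observation is (mean energy, normalized autocorrelation peak) of a received block."""
    feats = []
    for _ in range(n_obs):
        noise = rng.normal(0.0, 1.0, block_len)
        tone = np.sin(0.3 * np.arange(block_len)) if pu_present else 0.0
        x = tone + noise
        energy = np.mean(x ** 2)
        ac = np.abs(np.correlate(x, x, mode="full"))
        peak = np.max(ac[block_len:]) / ac[block_len - 1]   # largest non-zero-lag peak over zero lag
        feats.append([energy, peak])
    return np.array(feats)

# Labeled training set: 1 = PU present, 0 = channel idle.
X_train = np.vstack([observations(True), observations(False)])
y_train = np.array([1] * 200 + [0] * 200)
clf = SVC(kernel="rbf").fit(X_train, y_train)

# Classify freshly sensed blocks.
X_test = np.vstack([observations(True, 50), observations(False, 50)])
y_test = np.array([1] * 50 + [0] * 50)
print("detection accuracy:", (clf.predict(X_test) == y_test).mean())
```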
Case-based reasoning

Li D. Xu introduced the concept of case-based reasoning (CBR) in [57], which relies on past problems and solutions to solve current similar situations. CBR systems build an information database about past situations, problems, and their solutions and rewards, as shown in Fig. 5. New problems are then solved by finding the most similar case in memory and inferring the solution to the current situation [58].

Fig. 5 Case-based reasoning concept illustration

Ken-Shin Huan et al. introduced a new space-efficient and multi-objective CBR method in [59] to solve the high storage space required by the traditional CBR methods. The authors considered all possible cases in relation to their objectives in order to develop their model as accurately as possible. Their method relied on the divide-and-conquer technique using unity functions.

Reddy in [60] designed an efficient spectrum allocation using CBR and collaborative filtering approaches. They used case-based reasoning to identify a channel preferred by the secondary user. They then used automatic collaborative priority filtering to assign a channel to the highest prioritized user.

The authors in [61] used CBR for proper link management, network traffic balance, and system efficiency. Each case contains the problem, a solution, and its corresponding result, to provide the CR with better information utilization on the input, previous decisions, and their consequences. They aimed to reduce access time by finding similarities between cases and bucketing them.

In [62], the authors used a CBR quantum genetic algorithm to adjust and optimize CR parameters. An environment change factor was used to measure the similarity between the current problem and the cases in the database and to initialize the quantum bits, in order to avoid the blindness of the initial population search and speed up the optimization of the quantum genetic algorithm.

Decision tree

Decision tree learning uses a decision tree to create a model that predicts the value of a target class based on several input variables. A decision tree has a structure similar to that of a flowchart, where each node is an attribute and the topmost node is the root node, as shown in Fig. 6. Each branch represents the outcome of a test, and each leaf node holds a class label [63].

Fig. 6 Decision tree chart where each node is an attribute and each leaf node holds a class label

In [64], the authors used decision tree learning to find the optimal wideband spectrum-sensing order. The root node is the start point, and the leaf node is the channel selected at every stage. At every node, one strategy is selected based on certain rules to produce the corresponding child node and construct a branch. At the end leaves of the tree, the sensing order can be tracked backward to the root.

The authors in [65] used a decision tree for cognitive routing so that the nodes can learn their environment and adapt their parameters and decisions accordingly. A sender will then use the decision tree to select the most appropriate and reliable next-hop neighbor.

In [66], the authors used a decision tree for video routing in CR mesh networks. They aimed to improve the peak signal-to-noise ratio (PSNR) of the received video by considering channel status, nodes supporting video frame quality of service, effects of spectrum stability, and bandwidth availability.

The authors in [67] addressed cooperative spectrum sensing using distributed detection theory. CRs cooperate to sense the spectrum and classify overlapping air interfaces. The authors proposed the decomposition of an M-ary subtest into a set of binary tests, represented in a decision tree, to make the classification process simpler.
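As a minimal code illustration of decision-tree-based channel or next-hop selection in the spirit of [64-66], the sketch below fits a small tree on hand-made channel measurements and prints the learned rules. The features, labels, and the use of scikit-learn are assumptions made for this example only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hand-made training set: [SNR (dB), PU duty cycle, link delay (ms)] -> channel usable? (1/0)
X = np.array([[18, 0.10, 5], [15, 0.20, 8], [6, 0.70, 20], [9, 0.50, 15],
              [20, 0.05, 4], [7, 0.80, 25], [12, 0.30, 10], [5, 0.90, 30]])
y = np.array([1, 1, 0, 0, 1, 0, 1, 0])

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["snr_db", "pu_duty", "delay_ms"]))  # the learned branch tests
print("new channel usable?", tree.predict([[14, 0.25, 9]]))                # classify a newly sensed channel
```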
Entropy

In 1948, Claude E. Shannon introduced entropy in his paper "A Mathematical Theory of Communication" [68]. Entropy is a measure of the uncertainty in a random variable. It is also defined as a logarithmic measure of the rate of transfer of information in a particular message.

Zhao in [69] proposed (1) an entropy detector based on spectrum power density that provides better detection with lower computational complexity and (2) a two-stage entropy detection scheme to improve the performance of the entropy detector based on spectrum power density at low SNR.

In [70-72], entropy was used for spectrum sensing. The authors in [70] addressed spectrum sensing in a maritime environment where ships are far from land. Ship-to-ship/ship-to-shore communication and ship-to-ship ad hoc networks in the deep sea have been realized with the support of satellite communication links.

The authors in [71] aimed to increase the spectrum-sensing performance and reliability in a cooperative wideband-sensing environment. They applied entropy estimation in each subchannel with elimination of multiple suspicious CR users. Soft decision fusion methods, weighted gain combining and equal gain combining, were used to improve the reliability of cooperative sensing. In addition, the generalized extreme studentized deviate (GESD) test was used to detect outliers and eliminate suspicious cognitive users. The authors extended their work in [72]; they performed a hardware-in-the-loop simulation of the developed algorithm in a field programmable gate array (FPGA). The cooperative wideband spectrum sensing was based on entropy estimation and exclusion of suspicious CR users using the GESD test and a sigma limit test.
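A toy version of an entropy-based detector in the spirit of [69-72] is sketched below: the Shannon entropy of the histogram of FFT magnitudes is lower when a primary signal concentrates energy in a few bins than for noise alone. The block length, the tone used as the "PU signal", the number of histogram bins, and the midpoint threshold are assumptions for the example; a practical detector would set the threshold from a target false-alarm rate.

```python
import numpy as np

rng = np.random.default_rng(1)

def spectral_entropy(x, n_bins=16):
    """Shannon entropy (bits) of the histogram of FFT magnitudes of a received block."""
    mag = np.abs(np.fft.rfft(x))
    hist, _ = np.histogram(mag, bins=n_bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

n = 1024
noise_only = rng.normal(0.0, 1.0, n)
with_pu = rng.normal(0.0, 1.0, n) + 0.8 * np.cos(2 * np.pi * 0.12 * np.arange(n))

h_noise = spectral_entropy(noise_only)
h_pu = spectral_entropy(with_pu)
threshold = 0.5 * (h_noise + h_pu)        # illustrative threshold placed between the two cases
print(f"entropy noise-only: {h_noise:.2f}, with PU: {h_pu:.2f}")
print("decision for the PU block:", "PU present" if h_pu < threshold else "idle")
```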
Bayesian approach

Bayesian networks are graphical probabilistic models that rely on the interaction between the different nodes to achieve learning for and from every node involved in the process. Bayesian networks (BNs) have a role in the decision-making process if combined with utilities in order to form influence diagrams [73, 74].

Yuqing Huang et al. in [75] proposed a CR learning interference and decision-making engine based on Bayesian networks. The authors made use of the junction tree algorithm to model interference using probabilistic models obtained from the BNs. The authors developed their CR model to adapt the radio parameters to ensure the QoS of the users.

In [76], the authors addressed multi-channel sensing and access in distributed networks, with and without constraints on the number of channels that SUs are able to sense and access. They proposed a cooperative approach for estimating the channel state and used Bayesian learning to solve the multi-channel sensing problem.

The authors in [77] presented a Bayesian approach for spectrum sensing to maximize the spectrum utilization in CRNs. They aimed to detect known-order multi-phase shift keying (MPSK)-modulated primary signals over AWGN channels based on the Bayesian decision rule.

Zhou et al. in [78] proposed a cooperative spectrum-sensing scheme based on a Bayesian reputation model in CRNs where malicious secondary users may occur. They suggested the use of SUs' reputation degrees to reflect their service quality. These reputation degrees are updated based on the Bayesian reputation model to distinguish the trustworthiness of the reports from SUs and track the behaviors of malicious SUs.

In [79], the authors exploited sparsity in cooperative spectrum sensing. Sparsity occurs as follows: (a) in the frequency domain when primary users occupy a small part of the system bandwidth and (b) in the space domain when the number of users is small and their locations occupy a small fraction of the area. In their proposed model, the authors used the theory of Bayesian hierarchical prior modeling in the framework of sparse Bayesian learning.

Markov model

The Markov model is used to model a random process changing from one state to another over time. The random process is memoryless, where future states depend only on the present state [80, 81]. In Markov models, the states are visible to the observer; however, in the hidden Markov model (HMM), some states are hidden or not explicitly visible [82].

The authors in [83] used an HMM for blind source separation to identify spectrum holes. Using the hidden Markov model, a CR engine will predict the primary user's next sensing frame.

In [84], Pham et al. addressed spectrum handoff in cognitive radio networks based on the hidden Markov model. Spectrum handoff occurs when an SU needs to switch to a new idle channel to continue its data transmission because a PU needs the current channel. Therefore, the secondary user needs to study the behavior of the primary user and predict its future behaviors to perform spectrum handoff and ensure a continuous transmission.

Li et al. in [85] used the Markov model for modeling and analyzing the competitive spectrum access among cognitive radio networks. The cognitive network is formed by multiple dissimilar SUs and channels. They proposed decomposing the complex Markov model into a bank of separated Markov chains for each user. They focused on evaluating the SU throughput under uncertain spectrum access strategies.

The authors in [86] used the Markov decision process for dynamic spectrum access in cognitive networks. They used the HMM to model a wireless channel and predict the channel state. They decided on spectrum sensing, channel selection, modulation and coding schemes, transmitting power, and link layer frame size to minimize energy consumption.
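The sketch below illustrates the simplest member of this family: a fully observed two-state Markov chain of PU occupancy whose transition probabilities are estimated from past observations and then used to predict whether the next slot will be idle. It is a deliberately reduced stand-in for the HMM-based predictors of [83, 84, 86]; the true transition matrix and the transmit threshold are assumptions for the example.

```python
import random

random.seed(0)

# Simulate a PU occupancy sequence from a two-state Markov chain (0 = idle, 1 = busy).
P_TRUE = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}
state, seq = 0, []
for _ in range(10_000):
    seq.append(state)
    state = 0 if random.random() < P_TRUE[state][0] else 1

# Estimate the transition matrix from the observed sequence (maximum-likelihood counts).
counts = {0: {0: 0, 1: 0}, 1: {0: 0, 1: 0}}
for prev, nxt in zip(seq, seq[1:]):
    counts[prev][nxt] += 1
P_est = {s: {t: counts[s][t] / sum(counts[s].values()) for t in (0, 1)} for s in (0, 1)}
print("estimated transition probabilities:", P_est)

# Predict the next slot: transmit only if the channel is likely to stay idle.
current = seq[-1]
p_idle_next = P_est[current][0]
print("P(idle next slot) =", round(p_idle_next, 2),
      "-> SU transmits" if p_idle_next > 0.8 else "-> SU defers")
```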
Multi-agent systems

Jacques Ferber introduced multi-agent systems (MAS) as smart entities aware of their surroundings, capable of skillfully acting and communicating independently. MAS contain the environment, objects, agents, and the different relations between these entities. MAS find their applications mainly in problem solving and in the creation of virtual worlds [87, 88].

In [89], Emna Trigui et al. introduced a novel approach to address the spectrum handoff within the CR domain. Their approach allows CR terminals to always switch to the spectrum band that offers the best conditions by using multi-agent system negotiation. They considered the mobile CR terminals and primary users as agents that negotiate on prices and bandwidth, trying to maximize their own profits.

The authors in [90] addressed the issue of real-time CR resource management by relying on multi-agent systems. They considered the scenario of a user stepping into a zone with bad QoS. The authors used several learning algorithms such as K-NN and decision trees in order to classify data.

In [91], the author addressed the issue of resource management and proposed a negotiation model to reduce the overhead by relying on a novel spectrum access scheme that eliminates negotiation. The author relied on game theory and multi-agent Q-learning in order to create his model.

Mir et al. in [92] used a multi-agent system for dynamic spectrum sharing in cognitive radio networks. They proposed a cognitive radio network where agents are deployed over each primary and secondary user device. Accordingly, when the SU needs spectrum, its agent will cooperate and communicate with PU agents for spectrum sharing.

Artificial bee colony

The artificial bee colony (ABC) concept was introduced by Dervis Karaboga in 2005, motivated by the intelligent behavior of honey bees. In [93], ABC is defined as a heuristic approach that has the advantages of memory, multi-characters, local search, and a solution improvement mechanism. In the ABC model, the colony consists of three groups of bees: employed bees, onlookers, and scouts. The objective is to determine the locations of the best sources of food. The employed bees will look for food sources; if the nectar amount of a new source is higher than that of the one in their memory, they will memorize the new position and forget the previous one. The position of a food source represents a possible solution to the optimization problem, and the nectar amount of the source corresponds to the quality or fitness of the solution [94, 95].

In [96], Sultan et al. applied artificial bee colony optimization to the problem of relay selection and transmit power allocation in CR networks. The authors aimed at maximizing the SNR at the secondary destination while keeping the level of interference low. In the ABC model, the fitness function can be represented by the SNR, and the interference threshold level is the main constraint. The authors defined the role of the employed bees to search for solutions by comparing the neighboring food sources with the one they memorized and updating their memory with the best solution that improves the fitness function and satisfies the constraints.

The authors in [97] used ABC for multiple relay selection in a cooperative cognitive relay network. Their aim is to find the optimal SNR, considered as the fitness value, and the best relay to cooperate, considered as the best food source.

The authors in [98] and [99] used ABC for spectrum allocation in cognitive radio networks. In [98], Ghasemi et al. aimed to optimize spectrum utilization while maintaining fairness. The position of a food source represented a possible solution of the optimization problem, and the amount of a food source corresponded to the quality of the solution.

In [99], the authors addressed channel allocation and fairness among devices. They aimed to maximize a utility function that considered the total reward, the interference constraint, and the conflict-free channel assignment.

Pradhan in [100] studied the performance of ABC in CR parameter adaptation. The control parameters considered were the transmit power and the type of modulation (BPSK, QPSK, and QAM). The author compared the performance of ABC with the performance of the genetic algorithm and particle swarm optimization in terms of fitness convergence characteristics, optimal fitness, and computation time.
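The following Python sketch walks through repeated employed-bee, onlooker-bee, and scout-bee phases on a toy power allocation problem with a soft interference constraint, in the spirit of [96]. The objective function, bounds, colony size, and abandonment limit are all illustrative assumptions rather than the formulation of any surveyed paper.

```python
import random

random.seed(1)
BOUNDS = (0.0, 10.0)                  # allowed SU transmit power range (illustrative)
N_SOURCES, LIMIT, CYCLES = 10, 15, 100

def objective(p):
    """Toy fitness: reward total SU power, penalize violating a PU interference cap."""
    reward = p[0] + p[1]
    penalty = 5.0 * max(0.0, p[0] + 0.5 * p[1] - 8.0)   # soft interference-temperature constraint
    return reward - penalty

def random_source():
    return [random.uniform(*BOUNDS), random.uniform(*BOUNDS)]

def neighbour(x, other):
    """Perturb one dimension of a food source towards/away from another source."""
    j = random.randrange(len(x))
    v = list(x)
    v[j] = min(max(v[j] + random.uniform(-1, 1) * (x[j] - other[j]), BOUNDS[0]), BOUNDS[1])
    return v

def try_improve(i, sources, trials):
    """Greedy replacement step shared by employed and onlooker bees."""
    cand = neighbour(sources[i], random.choice(sources))
    if objective(cand) > objective(sources[i]):
        sources[i], trials[i] = cand, 0
    else:
        trials[i] += 1

sources = [random_source() for _ in range(N_SOURCES)]
trials = [0] * N_SOURCES
for _ in range(CYCLES):
    for i in range(N_SOURCES):                 # employed bees: one local search per food source
        try_improve(i, sources, trials)
    quality = [max(objective(s), 1e-9) for s in sources]
    for _ in range(N_SOURCES):                 # onlookers: revisit sources proportionally to quality
        i = random.choices(range(N_SOURCES), weights=quality)[0]
        try_improve(i, sources, trials)
    for i in range(N_SOURCES):                 # scouts: abandon sources that stopped improving
        if trials[i] > LIMIT:
            sources[i], trials[i] = random_source(), 0

best = max(sources, key=objective)
print("best power pair:", [round(v, 2) for v in best], "objective:", round(objective(best), 2))
```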
1.4 Discussion: challenges and evaluations

Firstly, this section presents the learning techniques evaluated against each other within a CR context. In addition, the challenges facing these learning techniques, their strengths, and their limitations are presented.

1.4.1 Learning techniques evaluation: strengths, limitations, and challenges

Learning techniques may have many advantages; however, their implementation may face many challenges. The strengths and limitations as well as the implementation challenges of the previously mentioned learning techniques are presented in this subsection and summarized in Table 1.

Table 1 Learning techniques applications in CR, strengths and limitations
- Neural networks. Strengths: adaptation ability to minor changes; construction using few examples, thus reducing the complexity. Limitations and challenges: require training data labels; poor generalization; overfitting.
- Support vector machine. Strengths: generalization ability; robustness against noise and outliers. Limitations and challenges: requires training data labels and previous knowledge of the system; complex with large problems.
- Genetic algorithms. Strengths: multi-objective optimization; dynamically configure the CR based on environment changes. Limitations and challenges: require prior knowledge of the system; need a suitable fitness function; high complexity with large problems.
- Game theory. Spectrum sensing: related to the capabilities of the spectrum-sensing technique used. Strengths: reduces the complexity of adaptation; solutions for multi-agent systems. Limitations and challenges: requires prior knowledge of the system and labeled training data.
- Reinforcement learning. Strengths: learning autonomously using feedback; self-adaptation progressively in real time. Limitations and challenges: needs a learning phase of the policies.
- Fuzzy logic. Spectrum sensing: related to the capabilities of the spectrum-sensing technique used. Strengths: simplicity; decisions are directly inferred from rules. Limitations and challenges: needs rules derivation; accuracy is based on these rules.
- Entropy approach. Strengths: statistical model. Limitations and challenges: requires prior knowledge of the system.
- Decision tree. Spectrum sensing: related to the capabilities of the spectrum-sensing technique used. Strengths: simplicity; decision using tree branches. Limitations and challenges: requires prior knowledge of the system; may suffer overfitting; requires labeled training data.
- Artificial bee colony. Strengths: parallel search for solutions. Limitations and challenges: requires prior knowledge of the system; requires a fitness function.
- Bayesian. Strengths: probabilistic models. Limitations and challenges: requires prior knowledge of the system; may face computational complexity.
- Markov model. Strengths: statistical models; scalable. Limitations and challenges: requires prior knowledge of the system; may face computational complexity.
- Case-based reasoning. Strengths: finds an acceptable solution based on the existing case found in the case database. Limitations and challenges: complex search in large databases; requires predefined and relevant cases; mistakes propagation.

Neural networks

Inspired by the biological nervous system, the artificial neural network is used to accomplish the learning process, identify new patterns, perform classifications, and improve the decision-making process. ANNs are based on empirical risk minimization. They provide an adaptation ability to minor changes of the environment and confidence information about the decision made. ANNs have the ability to describe a multitude of functions and are conceptually scalable. They are not sequential or deterministic; rather, they process in parallel, which permits solutions to problems where multiple constraints must be satisfied simultaneously. In addition, the neural network will have the ability to work in forward propagation mode as an analytical tool on other data once the network is trained to a satisfactory level. The output of a forward propagation run is the predicted model for the data, which can then be used for further analysis and interpretation. Neural networks can be constructed with only a few samples, thus reducing the complexity of the solution. ANNs can be combined with CBR and GA in the training phase.

ANNs are categorized under supervised learning, which requires prior knowledge of the environment and training data. Neural networks may face slow training depending on the network size. In addition, it is possible to overtrain a neural network, which means that the network has been trained to respond exactly to only one type of input. The network will then be memorizing all input situations instead of learning. The ANN will then face overfitting when the developed model is not general. ANNs may also face difficulty with infinite recursion and structured representations. Neural networks provide multiple solutions associated with local minima and for this reason may not be robust over different samples. Since artificial neural networks require training data labels, the outcome accuracy and algorithm performance are highly related to the data used for training the model. Therefore, the main challenge is to collect the training data and make sure to use clean and task-relevant data in the training phase. The second challenge is to deal with the data and the choice of initial parameters and attributes, which can be nonlinear, complex, and numerous and may lead to slow training performance.
Support vector machines are supervised learning models used for classification, regression analysis, and outlier detection. They provide high performance in many real-world problems due to their generalization ability and robustness against noise and outliers. The basic idea of SVMs is detecting the soft boundary of a given set of samples so as to classify new points as belonging to that set or not. A good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class, since in general the larger the margin, the lower the generalization error of the classifier. SVMs map the input features into a high-dimensional feature space in which they become separable. This mapping from the input vector space to the feature space is a nonlinear mapping achieved by using kernel functions. SVMs can accommodate high-dimensional spaces and can still be effective in cases where the number of dimensions is greater than the number of samples. SVMs produce very efficient classifiers with high prediction accuracy and less overfitting even if training examples contain errors. SVMs deliver a unique solution, since the optimality problem is convex, which is considered an advantage compared to neural networks, which provide multiple solutions associated with local minima. SVMs are versatile: different kernel functions can be specified for the decision function.

Table 1 Learning techniques applications in CR, strengths and limitations
Neural networks. Strengths: adaptation ability to minor changes; construction using few examples, thus reducing the complexity. Limitations and challenges: require training data labels; poor generalization; overfitting.
Support vector machine. Strengths: generalization ability; robustness against noise and outliers. Limitations and challenges: requires training data labels and previous knowledge of the system; complex with large problems.
Genetic algorithms. Strengths: multi-objective optimization; dynamically configure the CR based on environment changes. Limitations and challenges: require prior knowledge of the system; need a suitable fitness function; high complexity with large problems.
Game theory. Spectrum sensing (SS): related to the capabilities of the spectrum-sensing technique used. Strengths: reduces the complexity of adaptation; solutions for multi-agent systems. Limitations and challenges: requires prior knowledge of the system and labeled training data.
Reinforcement learning. Strengths: learning autonomously using feedback; self-adaptation progressively in real time. Limitations and challenges: needs a learning phase of the policies.
Fuzzy logic. Spectrum sensing (SS): related to the capabilities of the spectrum-sensing technique used. Strengths: simplicity, decisions are directly inferred from rules. Limitations and challenges: needs rules derivation; accuracy is based on these rules.
Entropy approach. Strengths: statistical model. Limitations and challenges: requires prior knowledge of the system.
Decision tree. Spectrum sensing (SS): related to the capabilities of the spectrum-sensing technique used. Strengths: simplicity; decision using tree branches. Limitations and challenges: requires prior knowledge of the system; may suffer overfitting; requires labeled training data.
Artificial bee colony. Strengths: parallel search for solutions. Limitations and challenges: requires prior knowledge of the system; requires a fitness function.
Bayesian. Strengths: probabilistic models. Limitations and challenges: requires prior knowledge of the system; may face computational complexity.
Markov model. Strengths: statistical models; scalable. Limitations and challenges: requires prior knowledge of the system; may face computational complexity.
Case-based reasoning. Strengths: finds an acceptable solution based on the existing case found in the case database. Limitations and challenges: complex search in large databases; requires predefined and relevant cases; mistakes propagation.
Common kernels are provided, but it is also possible to specify custom kernels. SVMs may provide high performance in small problems; however, their complexity increases in large problems. If the number of features is much greater than the number of samples, the method is likely to give poor performance. Support vector machines are powerful tools, but their computation and storage requirements increase rapidly with the number of training vectors. The core of an SVM is a quadratic programming problem, separating support vectors from the rest of the training data. SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation. Therefore, SVMs are considered computationally expensive, and they run slowly with long training times.

By introducing the kernel, SVMs gain flexibility in the choice of the form of the functional margin; the kernel function can be linear, polynomial, sigmoid, Gaussian, radial basis function, or a custom kernel supplied as a Python function or by pre-computing the Gram matrix. Choosing the appropriate and convenient kernel function may be challenging since it affects the algorithm performance, efficiency, and accuracy. In addition, the size of the kernel cache has a strong impact on run times for larger problems. Support vector machine algorithms require labeled training data and previous knowledge of the system, which may also be challenging due to the diversity of the features affecting the decisions. SVMs are not scale invariant, so it is highly recommended to scale the data for efficient system performance.

Genetic algorithms are a randomized heuristic search strategy where the population is composed of candidate solutions obtained and evolved via mutation and crossover. GAs solve multi-objective optimization problems and dynamically configure the CR in response to the changing wireless environment. They aim to optimize nonlinear systems with a large number of variables by providing multiple solutions with fitness scores measuring how close a candidate is to being a solution. Genetic algorithms search in parallel from a population. Therefore, they are able to avoid being trapped in a local optimal solution, unlike traditional methods, which search from a single point. GAs are thus faster and consume less memory than an exhaustive search of a very large space. In addition, GAs are simple and easy to apply since only the values of the fitness objective function are used for optimization purposes.

Optimization based on a fitness function cannot assure constant optimization response times, which limits the use of genetic algorithms in real-time applications. In addition, a GA may not converge to a global optimum, especially when the population has many subjects and performance metrics. The algorithm may get stuck on local maxima and may not provide optimal and complete solutions. The algorithm performance highly depends on the fitness function, which may generate bad chromosome blocks in spite of the fact that only good chromosome blocks cross over. GAs are considered slow since the evaluation of the fitness is computationally intensive. The main challenge is that GAs require prior knowledge-based learning and derived fitness functions; thus, new rules are formulated on the basis of training examples and/or patterns observed in previous iterations of the search.
Setting up the problem and selecting the parameters that express the fitness function are critical to bias the next generation towards better genes. Determining the chromosomal representation of the parameters, their domain, and their range is also challenging since these depend on the studied problem.

Game theory reduces the complexity of adaptation algorithms in large cognitive networks. It provides solutions for decentralized multi-agent systems, similar to the players in a game, under partial observability assumptions. Games can be either cooperative or competitive. In cooperative games, all players are concerned with the overall benefits and are not very worried about their own personal benefit. In competitive games, however, every user is mainly concerned with his personal payoff, and therefore all of its decisions are made competitively. The primary applications of game theory are in linear programming, statistical decision-making, and operations research.

One of the most basic limitations of game theory is that each player must know the cost functions of the other players. Game theory can use Bayesian analysis to reason about the cost based on observations of the actions chosen by the players. In addition, in the case of nonzero-sum games, multiple Nash equilibria may exist. Game theory is also limited by the number of players in the game, since the analysis of the gaming strategies becomes increasingly complicated. Therefore, game theory requires prior knowledge of the system as well as labeled training data. The players require knowledge of different parameters such as SINR, power, and price from base stations, which is impractical in many situations.

Reinforcement learning can be used by agents to learn autonomously using the feedback of their actions. RL is used with multi-agent systems to solve classification and decision-making tasks. Reinforcement learning uses rewards to learn a successful agent function, so the system can achieve self-adaptation progressively in real time. RL agents learn to interact with an environment with the goal of optimizing the cumulative reward.

The main limitation of RL is that the actions of the agents and the reward function have to be defined based on the system and task requirements, and feedback is needed to make decisions. Using RL, the system engine needs to perform a learning phase to acquire the optimal and suboptimal policies. In dynamic environments, it would be challenging, sometimes even impossible, to provide the agents with the correct actions associated with the current situation.

Fuzzy logic is flexible, provides an intuitive knowledge-base design with easy computation, and offers simple implementation and interpretation. Fuzzy logic is applied to systems that are difficult to model and have unclear quality boundaries. Instead of using complicated mathematical formulations, fuzzy logic uses human-understandable fuzzy sets and inference rules to obtain the solution that satisfies the desired system objectives, as in the sketch below. Therefore, the main advantage of fuzzy logic is its low complexity and its suitability for real-time applications such as cognitive radio applications. Using fuzzy logic, solutions can be obtained from imprecise, noisy, and incomplete input information. There may be some disadvantages to fuzzy logic; for example, the stability, accuracy, and optimality of the system are not guaranteed. In addition, increasing the dimensionality may result in inefficient and memory-intensive settings for most functions.
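The sketch below is a minimal, illustrative fuzzy inference in Python: it grades the spectrum-access suitability of a channel from an SINR reading and an estimated PU activity level using hand-written membership functions and two rules. The membership breakpoints, rule set, and variable names are assumptions made for illustration only, not rules taken from the surveyed papers.

```python
def low(x, lo, hi):
    """Membership in a 'low' fuzzy set: 1 below lo, 0 above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)

def high(x, lo, hi):
    """Membership in a 'high' fuzzy set (complement of 'low')."""
    return 1.0 - low(x, lo, hi)

def access_suitability(sinr_db, pu_activity):
    """Fuzzy grade in [0, 1] for letting an SU access the channel.

    Rule 1: IF SINR is high AND PU activity is low THEN suitability is high.
    Rule 2: IF SINR is low OR PU activity is high THEN suitability is low.
    """
    sinr_high = high(sinr_db, 5.0, 15.0)          # assumed breakpoints in dB
    activity_low = low(pu_activity, 0.2, 0.6)     # assumed PU duty-cycle breakpoints
    rule_high = min(sinr_high, activity_low)                      # AND -> min
    rule_low = max(1.0 - sinr_high, 1.0 - activity_low)           # OR -> max
    # Defuzzification: weighted average of the rule consequents (high = 1, low = 0).
    return rule_high / (rule_high + rule_low) if (rule_high + rule_low) else 0.0

# Example: strong SINR and a mostly idle primary user favor access.
print(round(access_suitability(sinr_db=12.0, pu_activity=0.1), 2))
print(round(access_suitability(sinr_db=4.0, pu_activity=0.7), 2))
```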
Fuzzy logic highly depends on the fuzzy inference rules, which may be tuned manually, and the settings may be obtained by trial and error. The derivation of the rules is challenging: they need to be relevant, rely on relevant features, and be updated when needed. Therefore, the accuracy of a fuzzy logic system depends on the completeness and accuracy of its rules, and the system may return inappropriate decisions if the rules are not suitable and accurate.

Entropy is a measure of the uncertainty in a random variable which quantifies the expected value of the information contained in a message. The information entropy can therefore be seen as a numerical measure which describes how uninformative a particular probability distribution is. Independently of any Bayesian considerations, the problem can be treated as a constrained optimization problem with the entropy function as the objective function. Entropy can be combined with reinforcement learning and the Markov model. Although entropy is often used as a characterization of the information content of a data source, this information content is not absolute: it depends crucially on the probabilistic model. In addition, it requires prior knowledge of the system (mainly prior knowledge of the model-describing parameters to be learned).

Decision tree learning uses a decision tree to create a model that predicts the value of a target variable based on several input variables. The decision tree is simple to understand and to interpret by visualization, and the decision can be made directly by following the tree branches, as illustrated below. The cost of using the tree for prediction and decision-making is logarithmic in the number of data points used to train the tree. Decision tree learning can handle different types of data, such as numerical and categorical data, and multi-output problems. Each path of the decision tree contains the elements of the path and an assigned risk factor, which is the estimated likelihood of occurrence of the terminal event in that path. Decision trees depict future decision points and possible chance events, and they add to the confidence and accuracy of the decisions.

A decision tree needs labeled training data and does not support missing values. The decision tree may face overfitting, which makes the system model lose its learning aspect. The model is highly related to the input training data, which may make the system unstable because small variations in the data might result in a completely different tree being generated. In addition, the problem of learning an optimal decision tree is NP-complete, which requires heuristic algorithms to obtain solutions; such algorithms do not guarantee globally optimal decision trees. The main challenge is that the decision tree requires labeled training data and prior knowledge of the system. The selection of the training data is critical: on one hand, decision trees may suffer overfitting when all cases are taken into consideration for building the tree; on the other hand, decision trees may be biased if some classes dominate the training data set. Therefore, a balanced labeled data set without missing values is required prior to fitting the decision tree.
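As a small illustration of the tree-branch decisions described above, the following Python sketch fits a shallow scikit-learn decision tree on synthetic, labeled sensing reports and prints the learned branches; the features, labels, and thresholds are illustrative assumptions rather than data from any surveyed system.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)

# Synthetic labeled reports: [measured energy, detected PU duty cycle],
# label 1 = channel should be avoided, 0 = channel usable by the SU.
n = 500
energy = rng.uniform(0.0, 2.0, n)
duty = rng.uniform(0.0, 1.0, n)
X = np.column_stack([energy, duty])
y = ((energy > 1.0) | (duty > 0.6)).astype(int)   # assumed ground-truth rule

# A shallow tree keeps the branches easy to read and limits overfitting.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The exported branches show how the decision is read directly off the tree.
print(export_text(tree, feature_names=["energy", "pu_duty_cycle"]))
print("decision for energy=0.4, duty=0.2:", tree.predict([[0.4, 0.2]])[0])
```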
The artificial bee colony algorithm finds the optimal position of a food source representing a possible solution to the optimization problem. ABC searches in parallel over several constructive computational threads based on local problem data and a dynamic memory structure containing information on the quality of the previously obtained results. The ABC algorithm handles both uni-modal and multi-modal numerical optimization. However, the problems addressed by the artificial bee colony are NP-hard and of high dimensionality. ABC is very effective in solving small- to medium-sized generalized assignment problems. Its efficiency depends on the size of the solution space, the number of variables and constraints used in the problem modeling, and the structure of the solution space, such as its convexity. In addition, ABC needs prior knowledge of the system.

Bayesian approach is based on probabilistic learning. It provides exact inferences which do not rely on large-sample approximations and have simple interpretations. Bayesian inference estimates a full probability model and allows prior knowledge and previous results to be used in the current model. Bayesian inference provides a statistical decision to facilitate decision-making; it includes uncertainty in the probability model, yielding more realistic predictions. The Bayesian approach does not face overfitting since it uses observed data only. However, the Bayesian approach is only useful when the prior knowledge is reliable and the distributions of all parameters are known. It may also face computational complexity since it involves high-dimensional integrals.

Markov model generates sequences of observation symbols by making transitions from state to state in time or space with fixed probabilities. The Markov chain contains a finite number of states and the corresponding transition probabilities. The current state of the system depends on previous events and their successive structure. For instance, a first-order Markov chain takes account of only a single step in the process, but it can be extended so that the probabilities associated with each transition depend on multiple events earlier than the immediately preceding one. Markov models are relatively easy to derive and infer from successive data. The Markov model is scalable and can model complex statistical systems. It can be used for classification and prediction based on experience, and it does not require deep insight into the mechanisms of dynamic change: the basic transition matrix summarizes the essential parameters of dynamic change.

However, the Markov model is computationally complex and expensive, both in terms of memory and time. The Markov model requires a good training sequence; in some cases, the available data will be insufficient to estimate reliable transition probabilities, especially for rare transitions. In addition, the Markov model is a successional model whose performance efficiency depends on predictions of the system behavior over time. The construction of Markov models requires accurate data to determine the transition probabilities and state conditions.
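To illustrate how such a chain can be used in a CR setting, the short Python sketch below estimates the transition matrix of a two-state (idle/busy) primary-user occupancy model from an observed state sequence and predicts the probability that a channel stays idle; the observation sequence and state labels are synthetic assumptions for illustration.

```python
import numpy as np

# Synthetic observed channel states over sensing slots: 0 = idle, 1 = busy (PU active).
rng = np.random.default_rng(2)
true_P = np.array([[0.9, 0.1],    # assumed ground-truth transition matrix
                   [0.3, 0.7]])
states = [0]
for _ in range(5000):
    states.append(rng.choice(2, p=true_P[states[-1]]))

# Estimate the transition probabilities by counting observed transitions.
counts = np.zeros((2, 2))
for s, s_next in zip(states[:-1], states[1:]):
    counts[s, s_next] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)

print("estimated transition matrix:\n", np.round(P_hat, 3))

# Probability the channel is still idle k slots after being observed idle.
k = 3
p_idle_after_k = np.linalg.matrix_power(P_hat, k)[0, 0]
print(f"P(idle after {k} slots | idle now) =", round(p_idle_after_k, 3))
```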
Case-based reasoning is capable of solving problems based on predefined cases. CBR is used to determine an acceptable solution based on an existing case found in the case database. CBR allows learning without knowledge of how the rules and cases were created. The system learns from previous and new situations and updates its database accordingly, which makes development and maintenance easier. CBR can be used when a domain model is difficult to generate, complex, or involves multiple-variable situations.

CBR faces many limitations. It can only be used when records of initial and previously successful solutions exist. It can take a large storage space to hold all the cases and may face a long processing time to find similar cases in the database. CBR relies on previous cases, which might include irrelevant patterns; therefore, CBR needs appropriate training before deployment, since incorrectly solved cases may lead to mistake propagation. In addition, populating and searching a large database can be time consuming and difficult, which highlights the need for fast and efficient case-selection algorithms. Removing irrelevant patterns may reduce the case retrieval time and lead to higher efficiency.

Multi-agent systems are suitable for multi-player decision problems. They provide negotiation-, learning-, and cooperation-based approaches between agents, which in our case can be cognitive radio users. MAS are quick, reliable, and flexible. They can accommodate networks with multiple users; however, large networks may cause high cost and complexity. MAS can be combined with RL and game theory, which also provide solutions using interactions between users to enhance the performance of the system.

1.4.2 Applying learning in cognitive radio
The various cognitive radio network tasks need different learning techniques to learn and adapt to any change in the environment. In addition, a CR may not have any prior knowledge of the operating RF environment, such as noise, interference, and traffic. Therefore, the most suitable learning technique depends on (1) the available information, (2) the network characteristics, and (3) the CR task and problem to address.

Supervised vs. unsupervised learning The availability of information affects the choice of learning techniques. In this scope, we differentiate between supervised and unsupervised learning techniques. Supervised learning is used when the training data is labeled and the CR has prior information about the environment. For instance, if the CR knows some signal characteristics, supervised training algorithms may help in achieving better signal detection and system performance. Supervised learning techniques such as decision trees, neural networks, support vector machines, and CBR may differ in their strengths, limitations, challenges, and applications in cognitive radio networks.

The ANN algorithm is based on empirical risk minimization. It provides real-time solutions which allow fast responses to a changing radio environment. For instance, an ANN can be used at the secondary user end in cognitive radios to act dynamically whenever PU activity is detected over the channel in order to avoid collision. ANNs can be constructed from only a few examples, thus reducing the complexity of the solution, and can perform classification and assist in the solution adaptation process. ANNs can be used in spectrum sensing and for adapting radio parameters in CR. SVM algorithms are based on structural risk minimization. They provide superior performance compared to ANNs, especially for small training sets, since they avoid the problem of overfitting. The SVM classifier is embedded in a CR terminal of the secondary network and can be used for spectrum sensing and decision-making. CBR generally provides good or reasonable solutions; however, optimal solutions cannot be guaranteed. A CR can use CBR to determine acceptable actions for the current environment based on the existing cases in the database.
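The following minimal Python sketch illustrates the case-retrieval step of such a CBR engine: it keeps a small case base of (observed situation, chosen action) pairs and returns the action of the most similar stored case for a new situation. The feature vectors, distance metric, and actions are illustrative assumptions, not the design of any surveyed system.

```python
import math

# Case base: each case maps an observed situation (SINR in dB, PU duty cycle,
# queue load) to the action that previously worked well in that situation.
case_base = [
    ((18.0, 0.1, 0.2), "use channel, QAM-64"),
    ((10.0, 0.2, 0.5), "use channel, QPSK"),
    (( 6.0, 0.7, 0.4), "switch channel"),
    (( 3.0, 0.9, 0.8), "defer transmission"),
]

def distance(a, b):
    """Euclidean distance between two situation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve_action(situation):
    """Return the action of the nearest stored case (the 'retrieve' step of CBR)."""
    best_case = min(case_base, key=lambda case: distance(case[0], situation))
    return best_case[1]

def retain(situation, action):
    """Store a newly solved case so future retrievals can reuse it."""
    case_base.append((situation, action))

# A new situation close to the second stored case reuses its action.
print(retrieve_action((11.0, 0.25, 0.45)))
```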
In general, the database may not include all possible situations a CR may encounter; therefore, a CR needs to learn new cases, generate new actions, and update the case database. A decision tree is simple to understand and to interpret since trees can be visualized. Unlike other methods, decision trees can process categorical and numerical data even without data normalization.

In contrast with the supervised approaches, unsupervised algorithms, such as game theory, reinforcement learning, genetic algorithms, the Markov model, ABC, entropy, and Bayesian approaches, do not require labeled training data. Unsupervised learning is used when some RF components are unknown to the CR network, which then has the opportunity to operate autonomously without any prior knowledge.

Game theory, RL, and the Markov model can be used in the case of multi-agent interaction for decision-making, sensing, and learning the environment, whereas the other techniques are used as single-agent learning processes. Single-agent and multi-agent systems and network characteristics are discussed further in the next subsection. Game theory presents a suitable platform for modeling rational behavior among cognitive radio users. It can be applied to power control, resource allocation, spectrum management, and routing algorithms. The decisions made by the agents define an outcome for the game in a way that maximizes utility based on the payoff matrix; the best prediction for the outcome of a game is the equilibrium of the game. Unlike in the game-theoretical setting, in reinforcement learning, agents are not assumed to have full access to the payoff matrix. The agents are taken to be players in a normal-form game, which is played repeatedly in order to improve their strategy over time. Reinforcement learning allows agents to learn from their past states in order to better perform their following actions and moves. For instance, using RL, each SU senses the spectrum, perceives its current transmission parameters, and takes the necessary actions when a PU appears. Penalty values are enforced when the SUs interfere with PUs, which improves the SU learning and the overall spectrum usage. The Markov model can be used when the problem considers network states and situations. This model can be used to identify sequences of observations; for instance, the CR will be able to observe, recognize, and classify received stimuli. Note that RL and Markov models can be used for single-agent and multi-agent systems.

When states or candidate solutions have a large number of successors, searching such spaces with traditional search methods would be difficult. Genetic algorithms are then used to improve the model-training efficiency. Fuzzy logic approaches do not model the interaction between SUs and PUs for spectrum access; this modeling can be efficiently performed using Markov chains. Fuzzy logic can be used in decision-making to select the best-suited SU for spectrum access, for multi-hop routing, and for detecting unauthorized users. Fuzzy logic approaches use a straightforward inference method based on predefined rules. Therefore, if the considered problem relies on rules with high uncertainty, fuzzy logic is recommended; however, when specific cases are considered, case-based reasoning is advised. Unlike Bayesian networks, fuzzy logic handles feedback loops. The Bayesian approach provides a statistical decision to facilitate decision-making. It includes uncertainty in the probability model, yielding more realistic predictions.
Independently of any Bayesian considerations, the constrained optimization problem may have the entropy function as its objective function. Entropy is a measure of the uncertainty in a random variable which quantifies the expected value of the information contained in a message. It can be combined with reinforcement learning and the Markov model.

Single-agent vs. multi-agent The learning technique also depends on the system characteristics, such as single-agent, multi-agent, centralized, and decentralized systems [7]. In single-agent systems, an agent makes decisions without interactions with other agents in the network. Based on the network characteristics and the available information, the user will learn the system, take actions, and reconfigure its parameters accordingly. All the abovementioned techniques can be used in single-agent systems except game theory, where multiple users must interact to achieve performance gains.

In multi-agent systems, the users may interact, negotiate, and cooperate to provide more effective communication between network entities. Using MAS in cognitive radio networks allows the users to manage their own spectrum dynamically and in a decentralized manner. The agents perceive their environment and react accordingly. Using MAS, the interaction between agents can be cooperative, where all agents are concerned with the system benefits and performance without considering their own benefits; for instance, cooperative game theory is considered to avoid interference by reducing the transmission power of SUs. In competitive games, however, the agents make decisions competitively to maximize their individual gains. The association of MAS and the CR can provide more effective communication between network entities and can lead to a better exploitation of the unused spectrum and optimal resource management while reducing the risk of interference.

Matching learning and CR tasks In general, the CR tasks can be divided into two main categories: (1) cognitive capability for spectrum sensing and (2) reconfigurable capability for decision-making and resource management. Spectrum sensing with supervised training data can be performed using classification techniques such as neural networks and SVM. Some tasks related to spectrum sensing, such as choosing a spectrum-sensing technique and the spectrum-sensing order, can be done using fuzzy logic, game theory, and decision trees, respectively [24, 41, 64]. In addition, Markov and entropy approaches can be used for spectrum sensing [101].

For decision-making, several learning techniques can be used, such as decision trees, ABC, fuzzy logic, genetic algorithms, case-based reasoning, game theory, reinforcement learning, entropy, and Markov approaches. The use of these techniques may differ based on the system characteristics. For instance, game theory is used in decentralized multi-agent systems, while reinforcement learning can be used in centralized and single-agent systems. Neural networks and SVM can also be used for some applications in decision-making: in [37], neural networks are used to adjust the parameters of the system so as to effectively adapt to the environment as it changes, and in [52], the SVM technique builds a model for determining the modulation scheme using the following inputs: bit error rate, SNR, and data rate.
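As a hedged illustration of that last idea, the Python sketch below trains a scikit-learn SVM on synthetic (bit error rate, SNR, data rate) samples to choose among three modulation schemes, in the spirit of the model described in [52]; the feature scaling, RBF kernel, labels, and data-generation rule are assumptions made for illustration, not the setup of the cited work.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)

# Synthetic training set: columns are [bit error rate, SNR in dB, data rate in Mbps].
n = 1500
ber = rng.uniform(1e-6, 1e-2, n)
snr = rng.uniform(0.0, 30.0, n)
rate = rng.uniform(1.0, 50.0, n)
X = np.column_stack([ber, snr, rate])

# Assumed labeling rule: favor denser modulations as SNR rises and BER stays low.
y = np.where((snr > 20) & (ber < 1e-3), "QAM",
             np.where(snr > 10, "QPSK", "BPSK"))

# Scaling matters because SVMs are not scale invariant; the RBF kernel is one
# common choice among the kernels mentioned above.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)

# Decide a modulation scheme for a fresh link measurement.
print(clf.predict([[5e-4, 24.0, 20.0]])[0])   # expected: QAM
print(clf.predict([[5e-3, 6.0, 4.0]])[0])     # expected: BPSK
```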
We remind the reader that the learning techniques and their applications in CR are summarized in Table 1. Determining the most suitable learning technique for each task in general may not be feasible; rather, the choice of learning technique will depend on the main problem objective and type, the available information, and the network characteristics.

2 Conclusions
This paper presented a survey on the applications of machine-learning techniques to cognitive radio networks. It introduced recent advances on cognitive radio networks and artificial intelligence and emphasized the role of learning in cognitive radios. The literature review of the state-of-the-art achievements in applying machine-learning techniques to CR was presented and categorized under the major artificial intelligence techniques. This paper also presented CR tasks and challenges, together with an evaluation of the learning techniques and of the challenges of applying them in cognitive radio networks. Finally, the paper provided different points of view, from different angles, on the application of learning in CR networks.

Competing interests
The authors declare that they have no competing interests.

Received: 1 September 2014 Accepted: 13 May 2015

References
1. Cisco, Cisco visual networking index: forecast and methodology, 2014-2019. Cisco and/or its affiliates White Paper, 1–14 (2012)
2. J Mitola, GQ Maguire, Cognitive radio: making software radios more personal. IEEE Personal Commun. 6(4), 13–18 (1999)
3. J Mitola III, in Dissertation for the degree of Doctor of Technology in Teleinformatics. Cognitive radio: an integrated agent architecture for software defined radio (Royal Institute of Technology (KTH), Stockholm, Sweden, 2000)
4. S Haykin, Cognitive radio: brain-empowered wireless communications. IEEE J. Selected Areas Commun. 23(2), 201–220 (2005)
5. L Wang, W Xu, Z He, J Lin, in Proceedings of the Second International Conference on Power Electronics and Intelligent Transportation System. Algorithms for optimal resource allocation in heterogeneous cognitive radio networks (Shenzhen, China, 2009)
6. Y Zhao, L Morales-Tirado, in Proceedings of the International Conference on Computing, Networking and Communications. Cognitive radio technology: principles and practice (Maui, Hawaii, 2012)
7. M Bkassiny, Y Li, SK Jayaweera, A survey on machine-learning techniques in cognitive radios. IEEE Commun. Surveys Tutor. 15(3), 1136–1159 (2013)
8. L Gavrilovska, V Atanasovski, I Macaluso, L DaSilva, Learning and reasoning in cognitive radio networks. IEEE Commun. Surveys Tutor. 15(4), 1761–1777 (2013)
9. E Biglieri, AJ Goldsmith, LJ Greenstein, NB Mandayam, HV Poor, Principles of Cognitive Radio. (Cambridge University Press, New York, USA, 2012)
10. F Khan, K Nakagawa, Comparative study of spectrum sensing techniques in cognitive radio networks, (Sousse, Tunisia, 2013)
11. AM Wyglinski, M Nekovee, T Hou, Cognitive Radio Communications and Networks: Principles and Practice. (Elsevier, San Diego, California, USA, 2009)
12. RC Qiu, Z Hu, H Li, MC Wicks, Cognitive Radio Communications and Networking: Principles and Practice. (John Wiley and Sons, United Kingdom, 2012)
13. R Umar, AUH Sheikh, in Proceedings of the International Conference on Multimedia Computing and Systems. Cognitive radio oriented wireless networks: challenges and solutions (Tangier, Morocco, 2012)
14. A Ghasemi, ES Sousa, Spectrum sensing in cognitive radio networks: requirements, challenges and design trade-offs. IEEE Commun. Mag. 46(4), 32–39 (2008)
15. B Wang, KJR Liu, Advances in cognitive radio networks: a survey. IEEE J. Selected Topics Signal Process. 5(1), 5–23 (2011)
16. VT Nguyen, F Villain, Y Le Guillou, in Proceedings of the 3rd International Conference on Awareness Science and Technology. Cognitive radio systems: overview and challenges (Dalian, China, 2011)
17. WA Woods, Important issues in knowledge representation. Proceedings of the IEEE. 74(10), 1322–1334 (1986)
18. SJ Russell, P Norvig, Artificial Intelligence: A Modern Approach. (Pearson Education Limited, London, United Kingdom, 2014)
19. LA Zadeh, Fuzzy sets. Information and Control. 8, 338–353 (1965)
20. LA Zadeh, Communication fuzzy algorithms. Inform. Control. 12, 94–102 (1968)
21. EP Dadios, Fuzzy Logic - Algorithms, Techniques and Implementations. (InTech, Rijeka, Croatia, 2012). doi:10.5772/2663
22. P Kaur, M Uddin, A Khosla, in Proceedings of the Eighth International Conference on ICT and Knowledge Engineering. Fuzzy based adaptive bandwidth allocation scheme in cognitive radio networks (Bangkok, Thailand, 2010)
23. SR Aryal, H Dhungana, K Paudyal, in Proceedings of the Third Asian Himalayas International Conference on Internet. Novel approach for interference management in cognitive radio (Kathmandu, Nepal, 2012)
24. M Matinmikko, J Del Ser, T Rauma, M Mustonen, Fuzzy-logic based framework for spectrum availability assessment in cognitive radio systems. IEEE J. Selected Areas Commun. 31(11), 1136–1159 (2013)
25. H Qin, L Zhu, D Li, in Proceedings of the Eighth International Conference on Wireless Communications, Networking and Mobile Computing. Artificial mapping for dynamic resource management of cognitive radio networks (Shanghai, China, 2012)
26. K Lee, M El-Sharkawi, Modern Heuristic Optimization Techniques: Theory and Applications to Power Systems. (John Wiley and Sons, Hoboken, New Jersey, USA, 2008)
27. M Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms. (John Wiley and Sons, Hoboken, New Jersey, USA, 2003)
28. S Chen, TR Newman, JB Evans, AM Wyglinski, Genetic algorithm-based optimization for cognitive radio networks. IEEE Sarnoff Symposium (Princeton, New Jersey, USA, 12-14 April 2010)
29. M Riaz Moghal, MA Khan, HA Bhatti, in Proceedings of the 6th International Conference on Emerging Technologies. Spectrum optimization in cognitive radios using elitism in genetic algorithms (Islamabad, Pakistan, 2010)
30. JF Hauris, D He, G Michel, C Ozbay, in Proceedings of the IEEE Military Communications Conference. Cognitive radio and RF communications design optimization using genetic algorithms (Orlando, Florida, USA, 2007)
31. C Zeng, Y Zu, in Proceedings of the International Conference on Communication Technology and Application. Cognitive radio resource allocation based on niche adaptive genetic algorithm (Beijing, China, 2011)
32. R Rojas, Neural Networks: A Systematic Introduction. (Springer, Berlin Heidelberg, Germany, 1996)
33. SO Haykin, Neural Networks and Learning Machines. (Pearson Prentice Hall, New Jersey, USA, 2008)
34. X Tan, H Huang, L Ma, in Proceedings of the IEEE TENCON Spring Conference. Frequency allocation with artificial neural networks in cognitive radio system (Sydney, Australia, 2013)
35. T Zhang, M Wu, C Liu, in Proceedings of the 8th International Conference on Wireless Communications, Networking and Mobile Computing. Cooperative spectrum sensing based on artificial neural network for cognitive radio systems (Shanghai, China, 2012)
36. Y Yang, H Jiang, C Liu, Z Lan, in Proceedings of the Spring Congress on Engineering and Technology. Research on cognitive radio engine based on genetic algorithm and radial basis function neural network (Xian, China, 2012)
37. JJ Popoola, R Van Olst, in Proceedings of the AFRICON Conference. Application of neural network for sensing primary radio signals in a cognitive radio environment (Livingstone, Zambia, 2011)
38. D Bellhouse, The problem of Waldegrave. Electronic J. History Probability Stat. 3(2), 1–12 (2007)
39. Z Han, D Niyato, W Saad, T Basar, A Hjørungnes, Game Theory in Wireless and Communication Networks: Theory, Models, and Applications. (Cambridge University Press, New York, USA, 2012)
40. OB Abdulghfoor, M Ismail, R Nordin, in Proceedings of the IEEE International Conference on Space Science and Communication. Application of game theory to underlay ad-hoc cognitive radio networks: an overview (Melaka, Malaysia, 2013)
41. Y Li, H Zhang, T Asami, in Proceedings of the Wireless Communications and Networking Conference. On the cooperation between cognitive radio users and femtocell networks for cooperative spectrum sensing and self-organization (Shanghai, China, 2013)
42. S Pandit, G Singh, in Proceedings of the 3rd International Advance Computing Conference. Spectrum sharing in cognitive radio using game theory (Ghaziabad, India, 2013)
43. S Alrabaee, A Agarwal, N Goel, M Zaman, M Khasawneh, in Proceedings of the Seventh International Conference on Broadband, Wireless Computing, Communication and Applications. Comparison of spectrum management without game theory (SMWG) and with game theory (SMG) for network performance in cognitive radio network (Victoria, British Columbia, Canada, 2012)
44. L Busoniu, R Babuska, B De Schutter, A comprehensive survey of multiagent reinforcement learning. IEEE Trans. Systems, Man, and Cybernetics, Part C: Applications and Reviews. 38(2), 156–172 (2008)
45. M van Otterlo, M Wiering, Reinforcement Learning: State-of-the-Art. (Springer, Berlin Heidelberg, Germany, 2012)
46. K-LA Yau, in Proceedings of the International Conference on Wireless Communications and Applications. Reinforcement learning approach for centralized cognitive radio systems (Kuala Lumpur, Malaysia, 2012)
47. SS Barve, P Kulkarni, in Proceedings of the International Conference on Comp