

3610 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 62, NO. 10, OCTOBER 2014

Utility-Based Resource Allocation for Multi-Channel Decentralized Networks

Min Sheng, Member, IEEE, Chao Xu, Xijun Wang, Member, IEEE, Yan Zhang, Member, IEEE, Weijia Han, Member, IEEE, and Jiandong Li, Senior Member, IEEE

Abstract—A decentralized architecture makes future wireless networks more flexible and scalable. However, due to the lack of a central authority (e.g., a BS or AP), the limitation of spectrum resources, and the coupling among different users, designing efficient resource allocation strategies for decentralized networks is highly challenging. In this paper, we address the distributed channel selection and power control problem for a decentralized network consisting of multiple users, i.e., transmit-receive pairs. Particularly, we first take the users' interactions into account and formulate the distributed resource allocation problem as a non-cooperative transmission control game (NTCG). Then, a utility-based transmission control algorithm (UTC) is developed based on the formulated game. Our proposed algorithm is completely distributed, as there is no information exchange among different users, and hence is especially appropriate for this decentralized network. Furthermore, we prove that the global optimal solution can be asymptotically obtained with the devised algorithm and, more importantly, in contrast to existing utility-based algorithms, our method does not require the converging point to be a Nash equilibrium (NE) of the formulated game. In this light, our algorithm can be adopted to achieve efficient resource allocation in more general use cases.

Index Terms—Decentralized networks, distributed resource allocation, learning, game theory.

I. INTRODUCTION

DECENTRALIZED networks are infrastructure-less wireless networks consisting of multiple transmit-receive pairs, where each transmitter can dynamically adjust its transmission parameters and transmit data to its receiver [1]–[4]. Compared to conventional networks controlled by central authorities, e.g., BSs or APs, decentralized networks have more flexibility and scalability, and hence span a large number of real-world implementations, e.g., military communications, disaster relief, or sensor networking [2], [4], [5].

Manuscript received January 27, 2014; revised June 18, 2014; accepted August 24, 2014. Date of publication September 11, 2014; date of current version October 17, 2014. This work was supported in part by the National Natural Science Foundation of China under Grants 61231008, 61172079, 61201141, 61301176, and 91338114, by the 863 Project under Grant 2014AA01A701, and by the 111 Project under Grant B08038. The associate editor coordinating the review of this paper and approving it for publication was Y. J. Zhang.

The authors are with the State Key Laboratory of ISN, Xidian University, Xi'an 710071, China.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCOMM.2014.2357028

The main characteristics of a decentralized network can be summarized as follows.

1) The lack of a central controller. In such an infrastructure-less network, each transmitter is responsible for tuning its transmission strategy, e.g., transmission frequency, bandwidth, power, modulation, etc., based on its local observation. Therefore, self-organization is a fundamental capability for a decentralized network [6], [7].

2) The limitation of spectrum resources. The available channels are limited in a decentralized network, and hence users must compete for this precious resource to improve their individual performance, e.g., transmission rate or energy efficiency, thereby satisfying their individual QoS requirements.

3) The coupling among different users. Interference occurs when different users transmit on the same channel simultaneously. Therefore, each user's performance is affected by the operational parameters of other users. In other words, the users are coupled.

According to the above three characteristics, there exist two kinds of conflicts in a decentralized network. One is the conflict between different users, which is caused by the last two characteristics, i.e., the limitation of spectrum resources and the coupling among different users. The other is the conflict between system performance and individual requirements, which is mainly introduced by the lack of a central controller. In fact, these two conflicts generally drive a decentralized network to operate at an inefficient point, and the resulting efficiency loss is termed the price of anarchy (PoA). For instance, consider some users who operate on the same channel: if all of them want to maximize their own transmission rate through power control, then the maximum transmit power will be adopted by everyone. Obviously, this is not an efficient power control scheme for the system [8], [9].

In this light, to exploit the benefits promised by decentralized networks, it is essential to design distributed resource allocation strategies that fully consider these two conflicts. Fortunately, game theory, which provides a suitable paradigm to analyze the interrelationship between decision makers, can be naturally adopted to deal with the first conflict [2], [6]–[8], [10], [11]. However, designing globally optimal or even Pareto-efficient (Pareto-optimal)¹ distributed resource allocation algorithms for a decentralized network is still an open problem [2], [6], [7].

¹ Generally speaking, it is easy to prove that the global optimal solution is also Pareto-optimal, but not vice versa.

0090-6778 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


In this paper, we consider a multi-user multi-channel decentralized network, where each user (consisting of a transmitter and receiver pair) is capable of performing channel selection and power allocation to satisfy its transmission rate requirement. In addition, to avoid high communication overhead, we focus on a network where there is no information exchange among different users, i.e., no common control channel (CCC) is introduced. We note that this consideration makes the scenario more practical but, on the other hand, makes it more difficult to design efficient resource allocation strategies [2]–[4], [12]–[14].

Because of the limitation of spectrum resources and the coupling among different users, not all the rate requirements of users (i.e., transmit-receive pairs) can be simultaneously satisfied [4]. Furthermore, recalling that there is no central controller responsible for scheduling users' transmissions, it is a great challenge to provide a hard rate guarantee to every user in this decentralized network. For this reason, as studied in previous work [15]–[18], we consider softening users' requirements and use a sigmoid function to measure their satisfaction. Specifically, a user has very limited satisfaction when its transmission rate is below the requirement, but the satisfaction rapidly reaches an asymptotic value when its transmission rate is above the requirement. Based on this, we formulate the distributed channel selection and power control problem as a non-cooperative transmission control game (NTCG). To overcome the lack of communication between different users, a utility-based learning approach is adopted² and a Utility-based Transmission Control algorithm (UTC) is developed, with which each user can configure its operational parameters just by measuring local interference. More importantly, although there is no guarantee that a Nash equilibrium (NE) for NTCG always exists, it is proved that the decentralized network can operate at a global optimal point by implementing UTC. Finally, simulation results verify the validity of our analysis and demonstrate that the performance of our algorithm (e.g., convergence speed, achieved overall utility, etc.) is better than that of the existing distributed algorithms.

The remainder of this paper is organized as follows. In Section II, the related work is presented. Section III describes the system model and formulates the distributed channel selection and power control problem. In Section IV, we develop a utility-based transmission control algorithm and analyze its complexity as well as efficiency. Finally, numerical and simulation results are presented and analyzed in Section V, and conclusions are drawn in Section VI.

II. RELATED WORK

The game-theoretic approach has been applied extensively to design distributed resource allocation schemes in wireless communication systems from the perspective of both transmission rate and energy efficiency [9], [19]–[22]. In [19]–[21], the problem of concern was formulated as a potential game [23], and then a best response dynamic (BRD) was adopted to achieve a pure-strategy NE. However, as discussed in the seminal work [23], a potential game may admit multiple pure-strategy NE solutions. Hence, for such a game the operating point achieved by BRD depends entirely on the starting point and may be inefficient. To improve the efficiency of the devised strategy, a pricing technique was introduced in [9], [22], and the Pareto efficiency of the achieved NE was proved.

² The definition of utility-based learning approaches will be formally given in the next section.

We note that all of the above schemes require a CCC for information exchange among different agents. Hence, they are not suitable for the decentralized network, and developing so-called utility-based or payoff-based learning algorithms is necessary. Specifically, when implementing this type of algorithm, each user only needs to access the history of its own actions and utilities, and makes its decision with this local information [24]. To this end, distributed schemes based on stochastic learning, no-regret learning, and reinforcement learning have been proposed in [12]–[14], respectively. It should be noted that all the algorithms devised in [12]–[14] are utility-based, but the converging solution is a probability distribution over the set of available strategies. Therefore, the performance can only be evaluated from a statistical perspective in [12]–[14], i.e., the performance of each implementation is unpredictable [4].

Recently, some studies have begun to focus on developing utility-based resource allocation strategies that asymptotically converge to a fixed configuration (e.g., a pure-strategy NE) instead of a probability distribution [3], [4]. In [3], the distributed channel selection problem was formulated as a potential game, and then a utility-based learning algorithm was proposed, which could converge to a pure NE. Furthermore, not only channel selection but also power control was considered in [4], and another utility-based strategy was designed for one class of non-cooperative games. To be more specific, under the assumption that the set of NE for the proposed game is not empty and there is at least one NE maximizing the social welfare (i.e., the sum of the utilities of all users), the proposed distributed channel selection and power control scheme can asymptotically converge to the global optimal solution [4].

Actually, the above assumption is less plausible in many general cases. The reason is twofold: 1. For a non-cooperative game there is no guarantee that a pure-strategy NE always exists [25].³ Particularly, one example can be found in [21], which is termed the signal-to-interference-plus-noise ratio (SINR) maximization game. 2. Even if the formulated non-cooperative game admits a NE, the Pareto efficiency of its NE is hard to guarantee [19]–[21], [25]. Obviously, it is more difficult to satisfy the stricter requirement that there exists some NE which maximizes the social welfare. In this work, a novel utility-based resource allocation algorithm is developed. More importantly, it is proved that, even if there is no NE for the formulated game, we can still asymptotically obtain the globally optimal solution with our proposed algorithm.

III. SYSTEM MODEL AND PROBLEM FORMULATION

As depicted in Fig. 1, we consider a decentralized network featuring $N$ communicating users, each consisting of a transmit-receive pair. Particularly, to transmit data, every user chooses one channel from the $K$ orthogonal channels, each of which has bandwidth $B_0$. We consider that each channel can be assigned to multiple users, and interference occurs when a channel is simultaneously utilized by more than one user. Without loss of generality, we suppose $N \geq K$. For notational simplicity, let $\mathcal{N}$ and $\mathcal{K}$ denote the sets of users and channels, respectively, i.e., $\mathcal{N} = \{1, 2, \cdots, N\}$ and $\mathcal{K} = \{1, 2, \cdots, K\}$. Additionally, we denote the channel selected by user $n$ by $c_n \in \mathcal{K}$. In this paper, we consider that there is no CCC or central authority for coordination among users. That is, all users are autonomous.

Fig. 1. Illustration of a decentralized network, where each user consists of one transmitting node and one receiving node.

³ Since the mixed-strategy NE will not be considered in this work, we use NE to denote the pure-strategy NE hereafter for brevity.

Let $G \in \mathbb{R}^{N \times N \times K}$ be the channel power gain matrix, where $g_{n,m}^{k}$ represents the channel gain between transmitter $n$ and receiver $m$ on channel $k$. We assume the channel condition is static during the underlying operational period, e.g., the quasi-static scenario. The additive noise is modelled as a zero-mean Gaussian random variable, and then, for user $n$, its signal-to-interference-plus-noise ratio (SINR) can be expressed as

$$\gamma_n = \frac{p_n g_{n,n}^{c_n}}{I_n^{c_n} + B_0 N_0} = \frac{p_n g_{n,n}^{c_n}}{\sum_{m \in \mathcal{N},\, m \neq n} \delta(c_m, c_n)\, p_m g_{m,n}^{c_n} + B_0 N_0}, \tag{1}$$

where $I_n^{c_n}$ represents the interference caused to user $n$ on channel $c_n$, $p_n$ is the transmit power of user $n$, and $N_0$ is the noise power density. In addition, the indicator function $\delta(c_m, c_n)$ indicates whether users $m$ and $n$ use the same channel simultaneously: if $c_m = c_n$, $\delta(c_m, c_n) = 1$; otherwise $\delta(c_m, c_n) = 0$. In this paper, we consider that each user $n$ can choose the transmit power $p_n$ from a finite set $\mathcal{P}_n = \{p_n^1, p_n^2, \cdots, p_n^{\max}\}$ [4], [12].

Based on the above, the achievable transmission rate of user $n$ can be expressed as

$$R_n = B_0 \log_2(1 + \gamma_n). \tag{2}$$
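As a minimal sketch of how (1) and (2) can be evaluated numerically, the following Python functions compute the SINR and rate of one user from a channel assignment and a power vector. The function names and the nested-list layout `gain[k][m][n]` (channel, transmitter, receiver) are assumptions made for this illustration and do not come from the paper.

```python
import math

def sinr(n, channels, powers, gain, bandwidth, noise_density):
    """SINR of user n following (1).

    gain[k][m][n] is the power gain from transmitter m to receiver n on
    channel k (an indexing convention assumed for this sketch).
    """
    k = channels[n]
    signal = powers[n] * gain[k][n][n]
    # Only users sharing channel k contribute, i.e. delta(c_m, c_n) = 1.
    interference = sum(
        powers[m] * gain[k][m][n]
        for m in range(len(channels))
        if m != n and channels[m] == k
    )
    return signal / (interference + bandwidth * noise_density)

def rate(n, channels, powers, gain, bandwidth, noise_density):
    """Achievable rate of user n following (2)."""
    gamma = sinr(n, channels, powers, gain, bandwidth, noise_density)
    return bandwidth * math.log2(1.0 + gamma)
```

With two users, unit direct gains, cross gains of 0.5, and unit bandwidth, noise density, and powers, sharing a channel gives user 0 an SINR of 1/1.5, while separate channels give an SINR of 1 and hence a rate of 1.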

By adopting different channels and power levels, a user obtains different achievable rates. According to (1) and (2), if user $n$ transmits on channel $c_n$, $R_n$ is maximized with power $p_n^{\max}$ when there is no interference. Therefore, the upper bound of the rate $R_n$ for user $n$ can be defined as

$$R_n^{\max} = \max\left\{ B_0 \log_2\left(1 + \frac{p_n^{\max} g_{n,n}^{c_n}}{B_0 N_0}\right) \,\middle|\, c_n \in \mathcal{K} \right\}. \tag{3}$$

Moreover, we consider that each user $n$ has a rate requirement $R_n^{\min}$ to satisfy its QoS requirement and assume that $0 \leq R_n^{\min} \leq R_n^{\max}$.

Intuitively, in this network not all the users' rate requirements can be guaranteed when they transmit simultaneously, especially when all the users' rate requirements are high [4]. For instance, if $R_n^{\min} = R_n^{\max}$, $\forall n \in \mathcal{N}$, then at most $K$ transmissions can be permitted. Here, to get around this problem, we consider softening each user's rate requirement and measure its degree of satisfaction with a sigmoid function. In fact, this approach has been widely adopted in radio resource management [15]–[18]. To this end, the utility of each individual user can be expressed as

$$U_n(R_n) = \frac{1}{1 + e^{-\beta_n (R_n - R_n^{\min})}}, \quad \forall n \in \mathcal{N}, \tag{4}$$

where $\beta_n$ is a constant determining the steepness of the satisfaction curve, and both $R_n$ and $R_n^{\min}$ are considered to have units of Mbps. It is clear from the above equation that $U_n(R_n)$ is a monotonically increasing function of $R_n$, i.e., individual users are more satisfied when they have a higher rate. Furthermore, since $\lim_{R_n \to 0} U_n(R_n) = \frac{1}{1 + e^{\beta_n R_n^{\min}}} > 0$ and $\lim_{R_n \to \infty} U_n(R_n) = 1$, the utility of each user $n$ lies between 0 and 1, i.e., $U_n(R_n) \in (0, 1)$. We note that although a higher utility means a higher spectral efficiency for a given bandwidth, the value of the former cannot directly reflect the value of the latter. Therefore, in the simulation results, not only the overall utility $U$ but also the average rate $\bar{R}$ is recorded and shown to evaluate the efficiency of different algorithms.
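The sigmoid utility in (4) is straightforward to evaluate. The sketch below (the function name and argument names are illustrative assumptions) shows the three properties used in the text: the utility equals 0.5 exactly at the required rate, it increases with the rate, and it stays strictly inside (0, 1) for moderate arguments.

```python
import math

def utility(rate_mbps, rate_min_mbps, beta):
    """Sigmoid satisfaction of (4); both rates are assumed to be in Mbps."""
    return 1.0 / (1.0 + math.exp(-beta * (rate_mbps - rate_min_mbps)))

# Value at zero rate matches the limit 1 / (1 + exp(beta * R_min)) from the text.
u_zero = utility(0.0, 2.0, 1.0)
# Value at the requirement itself is exactly one half.
u_req = utility(2.0, 2.0, 3.0)
```

Note that `math.exp` overflows for very large positive arguments, so a production version would need to guard the case where the rate is far below the requirement and `beta` is large.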

Before starting a transmission, each individual user must decide which power level to adopt and which channel to transmit on. For notational simplicity, we refer to a pair of channel index and power level as a strategy $s_n$, i.e.,

$$s_n = (c_n, p_n) \in \mathcal{S}_n, \quad \mathcal{S}_n = \mathcal{K} \times \mathcal{P}_n, \quad \forall n \in \mathcal{N}. \tag{5}$$

From (1), (2), and (4), we note that each user's rate is affected by the transmissions of other users and that a higher rate brings higher satisfaction to a user. Therefore, to improve its degree of satisfaction or utility, each user should choose its own strategy by considering the actions of other users. That is, there is a coupling among the strategies employed by different users. To study the conflict among different users, NTCG is formulated, and hereafter the terms user and player will be used interchangeably.

Definition: NTCG: NTCG can be represented by the tuple

$$\mathcal{G} = \Gamma\left(\mathcal{N}, (\mathcal{S}_n)_{n \in \mathcal{N}}, (U_n)_{n \in \mathcal{N}}\right). \tag{6}$$

Particularly, $\mathcal{N}$ denotes the set of players, which is identical to the user set. For each player $n$, its strategy space $\mathcal{S}_n$ is defined as shown in (5). Given a strategy profile

$$(s_n)_{n \in \mathcal{N}} = (s_1, s_2, \cdots, s_N) \in (\mathcal{S}_n)_{n \in \mathcal{N}}, \tag{7}$$

the utility function of each player $n$ is

$$U_n\left((s_n)_{n \in \mathcal{N}}\right) = U_n\left(R_n\left((s_n)_{n \in \mathcal{N}}\right)\right), \quad \forall n \in \mathcal{N}, \tag{8}$$

where $R_n((s_n)_{n \in \mathcal{N}})$ represents the achievable rate when player $n$ adopts the strategy $s_n = (c_n, p_n)$, i.e.,

$$R_n\left((s_n)_{n \in \mathcal{N}}\right) = B_0 \log_2\left(1 + \frac{p_n g_{n,n}^{c_n}}{I_n^{c_n}(s_{-n}) + B_0 N_0}\right). \tag{9}$$

In (9), $s_{-n} = (s_1, \cdots, s_{n-1}, s_{n+1}, \cdots, s_N)$ is the strategy profile of all players other than player $n$, and $I_n^{c_n}(s_{-n})$ represents the interference caused to player $n$ on channel $c_n$.

Obtaining the optimal channel selection and power control strategy for this decentralized network is equivalent to solving the following combinatorial problem $\mathbf{P}$, which is NP-hard.

$$\mathbf{P}: \quad \max_{\mathbf{c}, \mathbf{p}} \sum_{n \in \mathcal{N}} U_n\left((c_n, p_n)_{n \in \mathcal{N}}\right) \tag{10}$$

$$\text{s.t.} \quad \mathbf{c} \in \{(c_1, c_2, \cdots, c_N) \mid \forall c_n \in \mathcal{K}, \forall n \in \mathcal{N}\}, \tag{11}$$

$$\mathbf{p} \in \{(p_1, p_2, \cdots, p_N) \mid \forall p_n \in \mathcal{P}_n, \forall n \in \mathcal{N}\}. \tag{12}$$

The objective function (10) means that our objective is to maximize the social welfare or overall utility, which is determined by both the achievable and required rates of users, i.e., $(R_1, R_2, \cdots, R_N)$ and $(R_1^{\min}, R_2^{\min}, \cdots, R_N^{\min})$. In addition, constraints (11) and (12) specify each individual user's available channel and power level sets, respectively.

Unfortunately, the above problem is an integer program, which is extremely difficult to solve. Moreover, because there is no central authority controlling the users in this decentralized network, developing a completely distributed algorithm to obtain the optimal solution of $\mathbf{P}$ is important and non-trivial.
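To make the size of the search space concrete, the sketch below solves a toy instance of P in (10)–(12) by exhaustive enumeration. It is a centralized reference only, not the distributed method of this paper, and all names and the `gain[k][m][n]` layout are assumptions for illustration. With $|\mathcal{S}_n| = K |\mathcal{P}_n|$ strategies per user, the loop visits $\prod_n |\mathcal{S}_n|$ profiles, which is why enumeration is hopeless beyond very small networks.

```python
import itertools
import math

def social_welfare(profile, gain, b0, n0, r_min, beta):
    """Sum of sigmoid utilities (10) for profile = ((c_1, p_1), ..., (c_N, p_N))."""
    total = 0.0
    for n, (c_n, p_n) in enumerate(profile):
        interference = sum(
            p_m * gain[c_n][m][n]
            for m, (c_m, p_m) in enumerate(profile)
            if m != n and c_m == c_n
        )
        r_n = b0 * math.log2(1.0 + p_n * gain[c_n][n][n] / (interference + b0 * n0))
        total += 1.0 / (1.0 + math.exp(-beta[n] * (r_n - r_min[n])))
    return total

def solve_by_enumeration(num_channels, power_sets, gain, b0, n0, r_min, beta):
    """Return (best_profile, best_welfare) by checking every joint strategy."""
    strategy_sets = [
        list(itertools.product(range(num_channels), p_set)) for p_set in power_sets
    ]
    best = max(
        itertools.product(*strategy_sets),
        key=lambda prof: social_welfare(prof, gain, b0, n0, r_min, beta),
    )
    return best, social_welfare(best, gain, b0, n0, r_min, beta)
```

For two users, two channels, and two power levels each, this checks 16 profiles; in the symmetric toy case with cross gains below the direct gains, the best profile places the users on different channels at maximum power.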

IV. DISTRIBUTED ALGORITHM DESIGN

In this section, we develop a utility-based algorithm for NTCG to obtain the solution of $\mathbf{P}$ given in Section III. We first show that the existence of a NE for NTCG cannot be guaranteed, and then develop a utility-based algorithm. At the end of this section, we investigate the complexity of the proposed algorithm and, finally, prove that our algorithm asymptotically converges to the global optimal solution under the given condition, no matter whether this solution is a NE of the formulated game or not.

A. NE for NTCG

Recalling that there is no CCC for exchanging information among different players, a utility-based learning algorithm is considered more appropriate for this distributed environment. Actually, in the recent work [4], a similar problem has been studied and an efficient utility-based learning algorithm has been proposed. Particularly, for the game formulated there, the authors of [4] have proved that if there exists a NE which maximizes the social welfare, this NE can be achieved with their proposed distributed algorithm. To check whether the algorithm devised in [4] can also be adopted to solve our problem $\mathbf{P}$, we first discuss the existence of a NE for NTCG.

For a non-cooperative game, the (pure-strategy) NE is a standard solution concept representing an equilibrium state, under which no player can unilaterally improve its own utility by choosing a different strategy [25]. Mathematically speaking, if a profile $s^* = (s_1^*, s_2^*, \cdots, s_N^*)$ in the strategy space $(\mathcal{S}_n)_{n \in \mathcal{N}}$ is a NE, then we have

$$U_n\left(s_n^*, s_{-n}^*\right) \geq U_n\left(s_n, s_{-n}^*\right), \quad \forall s_n \in \mathcal{S}_n, \forall n \in \mathcal{N}, \tag{13}$$

where $s_{-n}^* = (s_1^*, \cdots, s_{n-1}^*, s_{n+1}^*, \cdots, s_N^*)$.
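Condition (13) can be checked mechanically for any finite game. The following sketch is generic rather than specific to NTCG: the helper names and the callback `utility_fn(n, profile)` are assumptions for illustration. It tests every unilateral deviation and lists all pure-strategy NE by enumeration.

```python
import itertools

def is_pure_nash(profile, strategy_sets, utility_fn):
    """True if no player can gain by a unilateral deviation, as in (13).

    profile is a tuple with one strategy per player; utility_fn(n, profile)
    returns the utility of player n under that profile.
    """
    for n, own_set in enumerate(strategy_sets):
        current = utility_fn(n, profile)
        for alternative in own_set:
            deviated = profile[:n] + (alternative,) + profile[n + 1:]
            if utility_fn(n, deviated) > current:
                return False
    return True

def pure_nash_equilibria(strategy_sets, utility_fn):
    """All pure-strategy NE of a finite game, found by exhaustive search."""
    return [
        profile
        for profile in itertools.product(*strategy_sets)
        if is_pure_nash(profile, strategy_sets, utility_fn)
    ]
```

A two-player coordination game (both players earn 1 when their choices match) has two pure NE, whereas matching pennies (one player wins on a match, the other on a mismatch) has none, which illustrates that a pure-strategy NE need not exist in general.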

Theorem 1: There is no guarantee that a NE for NTCG always exists.

Proof: For each player $n$, we have

$$\begin{aligned}
\arg\max_{s_n} U_n(s_n, s_{-n}) &= \arg\max_{s_n} \frac{1}{1 + e^{-\beta_n \left(R_n(s_n, s_{-n}) - R_n^{\min}\right)}} \\
&= \arg\max_{(c_n, p_n^{\max})} R_n\left((c_n, p_n^{\max}), s_{-n}\right) \\
&= \arg\max_{(c_n, p_n^{\max})} \gamma_n\left((c_n, p_n^{\max}), s_{-n}\right).
\end{aligned} \tag{14}$$

It is noted from the above equation that NTCG is identical to the SINR-maximization game introduced in [21]. According to the conclusion drawn from a "toy" two-user case in [21], the existence of a NE for the SINR-maximization game cannot be guaranteed, which further indicates that a NE for NTCG may not exist either. Note that a counterexample can be easily derived with the parameters given in Table I in [21], and hence it is omitted here.

Now, the proof is complete. □

Consequently, the utility-based algorithm developed in [4] cannot be directly applied to solve the problem addressed in this work. Hence, a novel utility-based algorithm is devised in the following subsection.
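The chain of equalities in (14) says that a player's best response always uses its maximum power and simply picks the channel with the highest resulting SINR. A small numerical check of this claim, under assumed toy parameters and with hypothetical helper names and the `gain[k][m][n]` layout chosen for this sketch, can look as follows.

```python
import math

def user_utility(n, profile, gain, b0, n0, r_min_n, beta_n):
    """Utility (4) of player n for profile = ((c_1, p_1), ..., (c_N, p_N))."""
    c_n, p_n = profile[n]
    interference = sum(
        p_m * gain[c_n][m][n]
        for m, (c_m, p_m) in enumerate(profile)
        if m != n and c_m == c_n
    )
    r_n = b0 * math.log2(1.0 + p_n * gain[c_n][n][n] / (interference + b0 * n0))
    return 1.0 / (1.0 + math.exp(-beta_n * (r_n - r_min_n)))

def best_response(n, profile, num_channels, power_set, gain, b0, n0, r_min_n, beta_n):
    """Strategy (c_n, p_n) maximizing player n's utility with the others fixed."""
    candidates = [(c, p) for c in range(num_channels) for p in power_set]
    return max(
        candidates,
        key=lambda s: user_utility(
            n, profile[:n] + (s,) + profile[n + 1:], gain, b0, n0, r_min_n, beta_n
        ),
    )
```

In the toy case where the other user occupies channel 0, the best response of user 0 is the free channel 1 at maximum power. In floating point the sigmoid can saturate to exactly 1.0 for very large rates, in which case ties between power levels are possible, so the check is only meaningful for moderate parameter values.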

B. Utility-Based Distributed Transmission Control Algorithm

When devising a utility-based learning algorithm, two components must be specified for each player: the state profile and the learning model (dynamics) [2], [24]. More specifically, the former describes each player's available local information, and the latter tells the players how to make their decisions based on this information. In this subsection, we first define the state profile and learning model for each player in NTCG in detail. Then, a utility-based distributed transmission control algorithm is proposed.⁴

1) State Profile: At each decision moment $t \in \{1, 2, \cdots\}$, we describe the state profile of player $n$ with a triplet $L_n(t) = (s_n(t), U_n(t), \alpha_n(t))$, where $s_n(t)$, $U_n(t)$, and $\alpha_n(t) \in \{0, 1\}$ represent its strategy, utility, and mood, respectively. We note that the binary variable $\alpha_n(t)$ captures the player's desire to change the currently adopted strategy, which will be specified in detail when introducing the learning model.

⁴ The utility-based learning approach implemented in this paper can also be viewed as a state-based learning approach. The term "utility-based" is adopted by [2], [24] and the references therein.

2) Learning Model: Motivated by Marden's work [26], a utility-based learning model is adopted in this paper, with which each individual player $n$ can update its $s_n(t)$, $U_n(t)$, and $\alpha_n(t)$ in sequence at each decision moment $t$. To be specific, at the beginning of time $t$, player $n$ first determines the probability distribution over the set of its available strategies (i.e., its mixed strategy)

$$Q_n(t) = \left(q_n^1(t), q_n^2(t), \cdots, q_n^{|\mathcal{S}_n|}(t)\right), \tag{15}$$

where $|\cdot|$ represents the cardinality of a set, and $q_n^j(t)$ is the probability of choosing the $j$th strategy at time $t$, i.e.,

$$q_n^j(t) \geq 0, \ \forall j \in \{1, 2, \cdots, |\mathcal{S}_n|\}, \quad \sum_{j=1}^{|\mathcal{S}_n|} q_n^j(t) = 1. \tag{16}$$

In other words, the probability distribution $Q_n(t)$ describes the player's dynamics. Player $n$ updates $Q_n(t)$ based on its previous mood $\alpha_n(t-1)$ and action $s_n(t-1)$. Particularly, if $\alpha_n(t-1) = 0$,

$$q_n^{i(f_n)}(t) = \frac{1}{|\mathcal{S}_n|}, \quad \forall f_n \in \mathcal{S}_n, \tag{17}$$

where $i(f_n)$ denotes the index of strategy $f_n$ in $\mathcal{S}_n$. The rule in (17) means that if the previous mood is 0, the player chooses each strategy with equal probability. On the other hand, if $\alpha_n(t-1) = 1$,

$$q_n^{i(f_n)}(t) = \begin{cases} \dfrac{\varepsilon^w}{|\mathcal{S}_n| - 1}, & \forall f_n \in \mathcal{S}_n,\ f_n \neq s_n(t-1), \\ 1 - \varepsilon^w, & \text{otherwise}, \end{cases} \tag{18}$$

where $\varepsilon$ is a constant belonging to $(0, 1)$ and $w$ is a constant greater than $N$. The above equation means that if the previous mood is 1, the player changes its strategy to a given different one (i.e., $f_n \neq s_n(t-1)$) with probability $\frac{\varepsilon^w}{|\mathcal{S}_n| - 1}$, while the same strategy (i.e., $f_n = s_n(t-1)$) is adopted with probability $1 - \varepsilon^w$. Since $\varepsilon^w$ is generally much less than 1, (18) implies that the player switches to a different strategy with a much smaller probability when its mood is 1, i.e., $1 - \varepsilon^w \gg \frac{\varepsilon^w}{|\mathcal{S}_n| - 1}$. The main motivation behind using (17) and (18) to update $Q_n(t)$ is that this rule makes each player more likely to keep a strategy that sets its mood to 1.
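The two update rules (17) and (18) can be written as one small function that returns the probability vector $Q_n(t)$. This is a minimal sketch with assumed names; strategies are referred to by their zero-based index, and the mood-1 branch presumes at least two strategies.

```python
def strategy_distribution(num_strategies, prev_index, prev_mood, epsilon, w):
    """Return Q_n(t) as a list of probabilities over strategy indices.

    prev_index is the zero-based index of s_n(t-1); prev_mood is alpha_n(t-1).
    """
    if prev_mood == 0:
        # Rule (17): explore uniformly over all strategies.
        return [1.0 / num_strategies] * num_strategies
    # Rule (18): keep the previous strategy with probability 1 - epsilon**w and
    # spread the remaining epsilon**w evenly over the other strategies.
    explore = epsilon ** w
    probs = [explore / (num_strategies - 1)] * num_strategies
    probs[prev_index] = 1.0 - explore
    return probs
```

A strategy index can then be drawn with the standard library call `random.choices(range(num_strategies), weights=probs)[0]`.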

After that, player $n$ chooses an action $s_n(t)$ based on the probability distribution $Q_n(t)$, calculates its utility $U_n(t)$ by measuring the interference, and finally updates its mood $\alpha_n(t)$ with Algorithm 1.

Algorithm 1 Mood updating algorithm

1: if $\alpha_n(t-1) = 1$ then
2:   if ($s_n(t) = s_n(t-1)$) and ($U_n(t) = U_n(t-1)$) then
3:     Set $\alpha_n(t)$ to 1 and stop.
4:   else
5:     Go to 10.
6:   end if
7: else
8:   Go to 10.
9: end if
10: Set $\alpha_n(t)$ to 1 and 0 with probability $\rho_1 = \varepsilon^{1 - U_n(t)}$ and $\rho_0 = 1 - \rho_1$, respectively.
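Algorithm 1 translates into a few lines of code. The sketch below uses assumed names and lets the caller pass a random source so the stochastic branch can be tested; it reads the algorithm as keeping mood 1 only when both strategy and utility are unchanged, and otherwise drawing the mood with $\rho_1 = \varepsilon^{1 - U_n(t)}$.

```python
import random

def update_mood(prev_mood, strategy, prev_strategy, utility, prev_utility,
                epsilon, rng=random):
    """Return alpha_n(t) following Algorithm 1.

    rng only needs a random() method returning a float in [0, 1).
    """
    if prev_mood == 1 and strategy == prev_strategy and utility == prev_utility:
        return 1  # lines 1-3: nothing changed, stay at mood 1
    rho_1 = epsilon ** (1.0 - utility)  # line 10
    return 1 if rng.random() < rho_1 else 0
```

Because $\varepsilon \in (0, 1)$, $\rho_1$ grows toward 1 as the utility approaches 1, so high-utility strategies are more likely to set the mood to 1. The exact equality test on utilities mirrors the pseudocode; with noisy measurements a tolerance would presumably be needed in practice.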

3) UTC: Now, based on the state profile and learning model described above, UTC is developed and shown in Algorithm 2, where players can update their strategies in parallel. Similar to [4], the stop criterion of this algorithm can be one of the following: 1) the preset maximum iteration number $T$ is reached, or 2) for each player $n$, the variation of its utility during a period is trivial.

Algorithm 2 UTC

1: Initialize iteration count $t = 0$, mood $\alpha_n(t) = 0$ and strategy counter $V_n = (v_n^1, v_n^2, \cdots, v_n^{|\mathcal{S}_n|}) = (0)_{1\times|\mathcal{S}_n|}$, $\forall n \in \mathcal{N}$. Each player $n$ randomly chooses its initial strategy $s_n(t)$ and then measures its utility $U_n(t)$.
2: repeat
3:   Set $t = t + 1$
4:   for $n = 1$ to $N$ users do
5:     Update state profile $L_n(t)$:
6:     if $\alpha_n(t-1) = 0$ then
7:       Calculate $Q_n(t)$ with (17).
8:     else
9:       Calculate $Q_n(t)$ with (18).
10:    end if
11:    Choose a strategy $s_n(t)$, measure the utility $U_n(t)$, and update its mood $\alpha_n(t)$.
12:    Update strategies count $V_n$:
13:    if $\alpha_n(t) = 1$ then
14:      Update $V_n$ with (19).
15:    end if
16:  end for
17: until the stop criterion is satisfied.
18: Each player $n$ decides its strategy $s_n^D$ according to (20).

During the initialization of Algorithm 2, each player $n$ randomly chooses its own strategy, sets its mood to 0, and initializes the strategy counter $V_n$, where $(0)_{1\times|\mathcal{S}_n|}$ represents the $|\mathcal{S}_n|$-dimensional null vector. We note that the elements of vector $V_n$ are used to count the number of times that $\alpha_n = 1$ when different strategies are adopted. For instance, $v_n^i$ represents the number of times that the $i$th strategy makes the mood of player $n$ equal to 1. When the initialization is completed, the algorithm enters a loop, in which each individual player $n$ first updates its state profile $L_n(t) = (s_n(t), U_n(t), \alpha_n(t))$ with the devised utility-based learning model at each iteration. We note that, in practice, the SINR estimation can be done by sending a pilot or training sequence from the transmitter to the receiver [27]. Therefore, the utility can be measured by each autonomous user. Then, the strategy counter $V_n = (v_n^1, v_n^2, \cdots, v_n^{|\mathcal{S}_n|})$ is updated based on the current mood $\alpha_n(t)$. If $\alpha_n(t) = 1$,

$$v_n^{i(s_n(t))} = v_n^{i(s_n(t))} + 1, \quad \forall s_n(t) \in \mathcal{S}_n, \tag{19}$$

where $v_n^{i(s_n(t))}$ is the $i(s_n(t))$th entry in vector $V_n$. Intuitively, this updating rule implies that each player records the strategy which makes its mood equal to 1. When the loop is exited, individual players make their final decisions:

$$s_n^{D} = \arg_{s_n}\left( v_n^{i(s_n)} = \max\left\{ v_n^{1}, v_n^{2}, \cdots, v_n^{|\mathcal{S}_n|} \right\} \right), \quad \forall n \in \mathcal{N}. \tag{20}$$

From (20), we note that the strategy recorded most frequently will eventually be adopted by each user.

We choose the above decision rule for two reasons. Firstly, it only requires simple comparison operations when making the final decision, as shown in (20). Secondly, it allows the solution of problem P to be asymptotically achieved under the given condition, which will be proved in the following subsection. We note that with the adopted learning model, the system dynamics can be depicted as a perturbed Markov process in which the parameter ε > 0 is the perturbation factor. Therefore, to show that the algorithm converges to the optimal strategy profile, it is essential to prove that the learning process of our algorithm leads to a stochastically stable strategy profile which maximizes the overall utility. A similar idea has also been adopted in [4] when designing the utility-based learning algorithm. However, the authors of [4] adopted a quaternary variable instead of a binary variable to depict each user's mood, which introduces a much larger state space to capture the system dynamics and hence makes their algorithm converge more slowly than ours. This will be illustrated through simulation results in the following section.

Moreover, it is worth noting that Algorithm 2 is simple and completely distributed. In particular, when each player updates its own state profile, it does not require any prior information about other players, thereby avoiding a large communication overhead.
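To make the flow of Algorithm 2 concrete, the following Python sketch runs a UTC-style loop on a toy game. It is a simplified illustration under several assumptions that are not in the paper: the names `run_utc` and `toy_utility` are hypothetical, the stop criterion is a fixed iteration budget (criterion 1), utilities are normalized to $[0, 1]$, ties in (20) are broken by the lowest index, and the toy utility (1.0 without a channel collision, 0.2 with one) merely stands in for the measured utility.

```python
import random


def run_utc(strategy_counts, utility_fn, eps, w, max_iter, seed=0):
    """Run a UTC-style loop; utility_fn(n, profile) must return a value in [0, 1]."""
    rng = random.Random(seed)
    num_players = len(strategy_counts)
    # Line 1: random initial strategies, moods set to 0, counters set to zero.
    actions = [rng.randrange(k) for k in strategy_counts]
    utils = [utility_fn(n, actions) for n in range(num_players)]
    moods = [0] * num_players
    counters = [[0] * k for k in strategy_counts]

    for _ in range(max_iter):  # lines 2-17
        new_actions = []
        for n, k in enumerate(strategy_counts):
            if moods[n] == 0 or k == 1:
                probs = [1.0 / k] * k  # rule (17)
            else:
                probs = [eps ** w / (k - 1)] * k  # rule (18)
                probs[actions[n]] = 1.0 - eps ** w
            new_actions.append(rng.choices(range(k), weights=probs)[0])
        # Players act in parallel, so utilities are measured on the joint profile.
        new_utils = [utility_fn(n, new_actions) for n in range(num_players)]
        for n in range(num_players):
            unchanged = (new_actions[n] == actions[n]
                         and new_utils[n] == utils[n])
            if not (moods[n] == 1 and unchanged):
                # Algorithm 1, line 10: mood 1 with probability eps^(1 - U).
                rho_1 = eps ** (1.0 - new_utils[n])
                moods[n] = 1 if rng.random() < rho_1 else 0
            if moods[n] == 1:
                counters[n][new_actions[n]] += 1  # rule (19)
        actions, utils = new_actions, new_utils

    # Line 18 / rule (20): pick the most frequently recorded strategy.
    return [max(range(len(c)), key=c.__getitem__) for c in counters]


def toy_utility(n, profile):
    """Hypothetical utility: 1.0 if player n has its channel alone, else 0.2."""
    collides = any(m != n and ch == profile[n] for m, ch in enumerate(profile))
    return 0.2 if collides else 1.0


# Two players sharing two channels; w = 3 exceeds the number of players.
decisions = run_utc([2, 2], toy_utility, eps=0.001, w=3, max_iter=500, seed=1)
print(decisions)
```

In this toy setting the two players are expected to settle on different channels, since a collision-free profile gives both of them utility 1 and therefore mood 1 with certainty. Note that the toy game has two optimal profiles, so the uniqueness condition of the convergence result does not hold here; the sketch only illustrates the mechanics of the loop.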

C. Complexity and Efficiency Analysis of UTC

In this subsection, we first present the complexity analysis for the proposed algorithm UTC. Then, we analyze its efficiency and give the main result in Theorem 2.

UTC consists of two main parts. The first one is the loop from line 2 to line 17, which is independently executed by each player. The second one is the step in line 18, in which each player n makes its own final decision with (20). Note that the first part (i.e., from line 2 to line 17) only involves basic arithmetic operations and random number generation, and hence has a computational complexity of O(1) for each iteration. In addition, (20) requires player n to compare all |Sn| elements in the vector Vn. Therefore, the complexity of this algorithm explicitly depends on both the stop criterion of the loop and the size of the player's strategy space. Particularly, for the two stop criteria described earlier, the complexities are O(T + L) and O(E + L), respectively, where T is the preset maximum iteration number, L = max{|S1|, |S2|, · · · , |SN|}, and E is the number of iterations required for convergence (i.e., the convergence speed of the algorithm). Moreover, it should be noted that the convergence speed E is related to the value of the parameter ε, which will be further discussed at the end of this subsection.

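Theorem 2: Let $(s_n^O)_{n\in\mathcal{N}} \in (\mathcal{S}_n)_{n\in\mathcal{N}}$ denote the solution of problem P, i.e.,

$$\left(s_n^{O}\right)_{n\in\mathcal{N}} = \arg\max_{(s_n)_{n\in\mathcal{N}}} \sum_{n\in\mathcal{N}} U_n\left((s_n)_{n\in\mathcal{N}}\right). \tag{21}$$

When $(s_n^O)_{n\in\mathcal{N}}$ is unique and $\varepsilon$ is sufficiently small, i.e., $\varepsilon \to 0$, the solution of UTC asymptotically converges to $(s_n^O)_{n\in\mathcal{N}}$, i.e.,

$$\Pr\left( \lim_{T\to\infty,\ \varepsilon\to 0} \left(s_n^{D}\right)_{n\in\mathcal{N}} = \left(s_n^{O}\right)_{n\in\mathcal{N}} \right) = 1, \tag{22}$$

where $T$ is the number of iterations.

Proof: The proof is given in Appendix A. ■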

We note that there is no requirement that the optimal solution $(s_n^O)_{n\in\mathcal{N}}$ be an NE of the formulated game. Hence, this efficient point may be missed by existing utility-based resource allocation algorithms that are designed to reach an NE [2]–[4]. In addition, it is worth noting that, for a given ε, a much larger state space makes the convergence of the proposed algorithm much slower, i.e., there is a curse of dimensionality. This is mainly because the considered resource allocation is essentially a combinatorial problem, which is generally NP-hard. On the other hand, there is a tradeoff between the efficiency and the convergence speed of our algorithm, which can be tuned by adjusting ε. Specifically, a smaller ε leads to slower convergence, but the algorithm is more likely to converge to the global optimal solution $(s_n^O)_{n\in\mathcal{N}}$. For this reason, if ε is properly set, our algorithm still works in scenarios where the state space becomes large. In other words, a tradeoff between convergence speed and accuracy can be made when implementing this algorithm in practice. This conclusion will be confirmed with simulation results in the following section.

V. RESULTS AND ANALYSIS

A. Simulation Scenario

To evaluate the performance of our proposed algorithm, we conduct simulations of a decentralized network consisting of N transmit-receive pairs, which are randomly deployed in a circular region of radius r m. Meanwhile, the distance between each transmit-receive pair is a uniform random variable between 0 and D m. We assume that all the channels undergo independent and identically distributed log-normal shadow fading as well as path loss; moreover, the path loss exponent α and the shadow fading standard deviation σψ are set to 3 and 4 dB, respectively. We note that this channel model has been confirmed empirically to accurately model the variation in received power in some outdoor and indoor radio propagation environments, see e.g., [28] and references therein. In addition, a shadow fade lasts for multiple seconds or minutes, and hence changes at a much slower time-scale [27].
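The channel model described above can be sketched as follows. This is a minimal illustration rather than the authors' simulator: the function names, the 1 m reference distance with unit gain, the clamping of very short links to `min_distance_m`, and the uniform placement inside the disc are all assumptions made here for the sake of a runnable example.

```python
import math
import random


def channel_gain_db(distance_m, path_loss_exp=3.0, shadow_std_db=4.0,
                    min_distance_m=1.0, rng=random):
    """Channel gain in dB from path loss plus log-normal shadowing.

    Assumes unit gain at a 1 m reference distance; distances below
    `min_distance_m` are clamped to avoid log10(0).
    """
    d = max(distance_m, min_distance_m)
    path_loss_db = 10.0 * path_loss_exp * math.log10(d)
    # Log-normal shadowing is Gaussian when expressed in dB.
    shadowing_db = rng.gauss(0.0, shadow_std_db)
    return shadowing_db - path_loss_db


def random_point_in_disc(radius_m, rng=random):
    """Draw a point uniformly at random inside a disc of the given radius."""
    r = radius_m * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return (r * math.cos(theta), r * math.sin(theta))


def db_to_linear(value_db):
    """Convert a dB value to linear scale."""
    return 10.0 ** (value_db / 10.0)
```

With the shadowing switched off (`shadow_std_db=0.0`), a 10 m link with exponent 3 has a gain of −30 dB, i.e., a linear gain of $10^{-3}$.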


TABLE I. SIMULATION PARAMETERS

We consider a three-level power set for each user, i.e., low, medium, and high power levels, which are set to −20 dBW, −10 dBW, and 0 dBW, respectively. Besides that, for each user $n$, the minimal rate requirement $R_n^{\min}$ and the steepness of the sigmoid function $\omega_n$ are set to $\frac{1}{10}R_n^{\max}$ and 10, respectively. In addition, each individual simulation result is obtained by averaging over 1000 independent realizations of the users' locations and channel conditions. Unless specified otherwise, the simulation parameters are adopted as listed in Table I.

B. Convergence of UTC

Before delving into the performance of the proposed distributed resource allocation algorithm UTC, we first investigate its convergence behavior and examine the impact of the algorithm parameter ε. According to Theorem 2, we provide the maximum overall utility as shown in (10) as a benchmark result. To solve problem P within an acceptable period of time, a simplified scenario is considered in this simulation. Particularly, we focus on the case where there are K = 5 channels and all the users transmit with the high power level, i.e., $\mathcal{P}_n = \{0\ \text{dBW}\}$, $\forall n \in \mathcal{N}$. The simulation results for N = K and N = 2K users are illustrated in Fig. 2(a) and (b), respectively.

It can be seen from Fig. 2 that when ε becomes smaller, UTC converges more slowly but achieves a higher overall utility. Besides that, although there is a small gap between the performance of UTC and that of enumeration, our algorithm converges with far fewer iterations than required by the latter (i.e., $\prod_{n\in\mathcal{N}} |\mathcal{S}_n| = \prod_{n\in\mathcal{N}} K|\mathcal{P}_n|$). Considering the scenario consisting of 10 users as an example, enumeration needs $5^{10} = 9765625$ iterations, whereas our algorithm converges in about 40 and 100 iterations when ε is set to $10^{-3}$ and $10^{-5}$, respectively. Moreover, if ε is set to $10^{-5}$, the relative difference between the overall utility achieved by enumeration and that achieved with our algorithm is only around 0.4%. Recalling Theorem 2, we note that this gap may stem from the fact that ε is not small enough and that there is no guarantee that the optimal solution of P is unique in each round of simulation.
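The size of the search space faced by enumeration follows directly from the product $\prod_{n\in\mathcal{N}} K|\mathcal{P}_n|$. The short Python sketch below, with the hypothetical helper name `joint_strategy_space_size`, reproduces the count for the simplified scenario (K = 5 channels, one power level per user).

```python
from math import prod


def joint_strategy_space_size(num_channels, power_levels_per_user):
    """Number of joint strategy profiles: product over users of K * |P_n|."""
    return prod(num_channels * p for p in power_levels_per_user)


# Simplified scenario: K = 5 channels, a single power level per user.
print(joint_strategy_space_size(5, [1] * 5))   # N = 5 users: 5**5 = 3125
print(joint_strategy_space_size(5, [1] * 10))  # N = 10 users: 5**10 = 9765625
```

The count grows exponentially with the number of users, which is why exhaustive search is only used as a benchmark in this small scenario.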

Fig. 2. Convergence of UTC with respect to $\varepsilon$, where the number of channels is $K = 5$ and the overall utility is $U = \sum_{n\in\mathcal{N}} U_n$. (a) $N = K = 5$ users. (b) $N = 2K = 10$ users.

Next, we compare the convergence behavior of UTC with that of the Trial and Error Learning algorithm (TEL) proposed in [4]. For a fair comparison, the perturbation factor ε is set to 0.01 in both UTC and TEL. Furthermore, the necessary mapping functions suggested in [4] are also adopted in our simulation,$^{5}$ i.e., G(x) = −0.2x + 0.2 and F(x) = −0.2/N + 0.2/N. In this simulation, there are N = 25 users and K = 5 channels, and for each user n the available transmit power set Pn is set as shown in Table I. The convergence in terms of the overall utility U and of the average transmission rate $\bar{R} = \frac{\sum_{n\in\mathcal{N}} R_n}{N}$ is illustrated in Fig. 3(a) and (b), respectively.

From the simulation results, we note that our algorithm UTC converges much faster than TEL. There are two main reasons. On the one hand, TEL introduces four states to depict each user's mood, whereas only two states are adopted in our algorithm. This difference indicates that TEL has a much larger state space to capture the system dynamics, which further means that each player has to search more states before making its final decision. On the other hand, as discussed previously, the main result in [4] is that when ε → 0 and there is at least one NE which can maximize the overall utility, TEL asymptotically converges to an NE with which the maximum overall utility can be achieved. However, there is no guarantee that an NE for NTCG always exists, which is the main result given in Theorem 1 in Section IV. Besides that, we can see that both a higher overall utility and a higher average rate are achieved by UTC. This is mainly because the goal of our algorithm is to find the global optimal solution of P rather than to reach an NE of the formulated game.

$^{5}$The design requirements of these two mapping functions G(x) and F(x) are given in (6) and (7) in [4], and two instances are suggested by simulations below the two equations, respectively.

Fig. 3. Convergence comparison of TEL developed in [4] and our algorithm UTC, where there are N = 25 users sharing K = 5 channels. (a) Overall utility $U$ vs. the number of iterations $T$. (b) Average rate $\bar{R}$ vs. the number of iterations $T$.

C. Performance Comparison

In this section we evaluate the performance of our algorithm UTC with the following metrics:

• Overall utility $U$: The sum utility of all players, i.e., $U = \sum_{n\in\mathcal{N}} U_n$.

• Average transmission rate $\bar{R}$: The average transmission rate achieved by the users, i.e., $\bar{R} = \frac{\sum_{n\in\mathcal{N}} R_n}{N}$.

• User satisfaction ratio $\eta_s$: The ratio of users whose rate requirements are met over the total number of users $N$, i.e., $\eta_s = \frac{|\mathcal{N}_0|}{N}$, where $R_n \geq R_n^{\min}$, $\forall n \in \mathcal{N}_0$. Note that $|\cdot|$ denotes the cardinality of a set.
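The three metrics above translate directly into code. The following Python sketch uses hypothetical function names and assumes that the per-user utilities, rates, and minimum rate requirements are available as plain lists of equal length.

```python
def overall_utility(utilities):
    """Overall utility U: sum of the per-user utilities U_n."""
    return sum(utilities)


def average_rate(rates):
    """Average transmission rate: sum of R_n divided by the number of users N."""
    return sum(rates) / len(rates)


def satisfaction_ratio(rates, min_rates):
    """User satisfaction ratio: fraction of users with R_n >= R_n^min."""
    satisfied = sum(1 for r, r_min in zip(rates, min_rates) if r >= r_min)
    return satisfied / len(rates)
```

For instance, with rates `[1.0, 0.5, 2.0, 0.1]` and a common requirement of 0.8 for every user, two of the four users are satisfied, so the satisfaction ratio is 0.5.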

Fig. 4. Overall utility $U$ vs. the number of users $N$.

Fig. 5. Average rate $\bar{R}$ vs. the number of users $N$.

We compare our algorithm UTC with three distributed schemes, which are presented as follows.

• Random: With this algorithm, each user $n$ randomly chooses a strategy $s_n$ from its strategy space $\mathcal{S}_n = \mathcal{K} \times \mathcal{P}_n$. Therefore, the performance of this method can be regarded as the baseline.

• Greedy transmission control (GTC): This greedy algorithm is proposed in [29]. Each user measures the interference on all channels and then transmits on the channel with the minimum interference at the maximum transmit power. This process is repeated until the stop criterion is satisfied.

• TEL: This utility-based distributed learning algorithm is developed in [4]. Note that when implementing this algorithm, the corresponding mapping functions G(x) and F(x) are set the same as those adopted in the previous subsection (Section V-B).
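The two simplest schemes in the list above can be sketched as single decision steps. This is only an illustration of the descriptions given here, with hypothetical function names; in particular, `greedy_strategy` captures one update of a GTC-style user and does not claim to reproduce the details of the algorithm in [29].

```python
import random


def random_strategy(num_channels, power_levels, rng=random):
    """Random baseline: draw (channel, power) uniformly from K x P_n."""
    return (rng.randrange(num_channels), rng.choice(power_levels))


def greedy_strategy(interference_per_channel, power_levels):
    """One GTC-style step: least-interfered channel, maximum transmit power.

    interference_per_channel: measured interference on each channel
    (any consistent linear unit); ties go to the lowest channel index.
    """
    best_channel = min(range(len(interference_per_channel)),
                       key=interference_per_channel.__getitem__)
    return (best_channel, max(power_levels))


# Power levels in dBW, as in the simulation setup.
print(greedy_strategy([0.3, 0.05, 0.2], [-20, -10, 0]))  # (1, 0)
```

Because every user always selects the maximum power, such a greedy step ignores the interference it causes to others.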

When running UTC and TEL, we use the parameter setting suggested in [4] and hence set the perturbation factor ε to $10^{-2}$. In addition, for a fair comparison, all the algorithms are executed in parallel and the maximum iteration number T is set to $10^{4}$ [4].

Figs. 4 and 5 illustrate the overall utility $U$ and the corresponding average rate $\bar{R}$ versus the number of users, respectively. As can be observed from the simulation results, when there are more transmitting users, the growth of the overall utility gradually slows down and the average rate becomes lower. This is because a higher user density causes more interference in the network and, in turn, both the achieved utility and the transmission rate of each user decrease. In addition, we can see that when the number of communicating users in the network is small (for example, N ≤ 15), the performance of GTC is good. However, when there are more users, its performance becomes worse. This is mainly caused by the greedy behavior of users when implementing this algorithm, i.e., they always transmit with the maximum power to improve their own utilities. As discussed in previous studies [8], [9], such a greedy method may cause severe interference in the system and may ultimately become an inefficient resource allocation strategy.

Fig. 6. Performance comparison in terms of satisfaction.

Additionally, we note that both our algorithm and TEL perform much better than the baseline algorithm (i.e., Random). Meanwhile, compared with TEL, there is also an improvement in performance with our algorithm. For instance, when there are N = 50 users, UTC has around 9.7% higher overall utility (i.e., from 40.08 to 44.77) and 12.4% higher average rate (i.e., from 0.884 Mbps to 0.994 Mbps) than TEL, respectively. Therefore, we can conclude that the interference mitigation capability of UTC is the best among these four distributed algorithms. It should be noted that the reason for this improvement is similar to that stated in the previous subsection.

Next, we compare the performance of these four algorithms from the perspective of user satisfaction, which is demonstrated in Fig. 6. Particularly, Fig. 6(a) illustrates the user satisfaction ratio ηs versus the number of users N, and Fig. 6(b) compares the cumulative distribution function (CDF) of the number of satisfied users (i.e., |N0| = ηs · N) for the four algorithms when there are N = 50 users. Three observations can be made from Fig. 6. Firstly, not all the rate requirements of users can be met, especially when the number of users is large. We note that this result is consistent with the statement given in Section III. Secondly, ηs decreases with N, because more users result in higher interference in the network. Thirdly, compared with the other resource allocation schemes, more users can satisfy their rate requirements with our algorithm. Furthermore, combining the results shown in Figs. 4–6, we can see that our algorithm is able to achieve better performance at both the system and the individual level. This is mainly because both the selfishness of users and the welfare of the whole network are taken into account when developing our algorithm.

VI. CONCLUSION

In this paper, we have addressed the issue of distributed channel selection and power control in decentralized networks and proposed a distributed resource allocation algorithm that requires no information exchange. More importantly, we have theoretically proved that the network can asymptotically operate at the global optimal point with our proposed algorithm under the given condition. Simulation results verified the validity of our analysis and demonstrated that our algorithm outperformed the compared schemes in terms of different metrics in the considered scenarios. One possible extension in future work is to consider the time-varying characteristics of the network topology and to speed up the convergence.

APPENDIX A
PROOF OF THEOREM 2

Proof: The learning model in Algorithm 2 introduces a Markov process over the finite state space $Z = \prod_{n\in\mathcal{N}} (\mathcal{S}_n \times \mathcal{U}_n \times \mathcal{A}_n)$, where $\mathcal{U}_n$ is the finite range of $U_n$ over all strategy profiles $(s_n)_{n\in\mathcal{N}} \in (\mathcal{S}_n)_{n\in\mathcal{N}}$, and $\mathcal{A}_n = \{0, 1\}$ is the set of moods. Accordingly, for every scalar $\varepsilon > 0$ such a Markov process is perturbed, and we denote this "perturbed" Markov process by $\mathrm{MP}_\varepsilon$. Before giving the proof in detail, we start by introducing some necessary definitions.

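Definition 2: Interdependence: An $N$-person game $\Gamma(\mathcal{N}, (\mathcal{S}_n)_{n\in\mathcal{N}}, (U_n)_{n\in\mathcal{N}})$ is interdependent if, for every strategy profile $(s_n)_{n\in\mathcal{N}} \in (\mathcal{S}_n)_{n\in\mathcal{N}}$ and every subset of players $H \subseteq \mathcal{N}$, there exists a player $g \notin H$ and a choice of strategies $(s'_h)_{h\in H} \in (\mathcal{S}_h)_{h\in H}$ such that

$$U_g\left((s'_h)_{h\in H}, (s_n)_{n\in\{\mathcal{N}/H\}}\right) \neq U_g\left((s_h)_{h\in H}, (s_n)_{n\in\{\mathcal{N}/H\}}\right). \tag{23}$$

In other words, given a strategy profile $(s_n)_{n\in\mathcal{N}} \in (\mathcal{S}_n)_{n\in\mathcal{N}}$, every subset of players $H$ can cause a utility (welfare) change for some player in $\{\mathcal{N}/H\}$ by performing a proper change in their strategies.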
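For a small finite game, Definition 2 can be checked by brute force. The Python sketch below is an illustration only: the names `is_interdependent`, `_can_affect_outsider`, and `collision_utility` are hypothetical, only non-empty proper subsets $H$ are enumerated (otherwise no player $g \notin H$ exists), and the two-player collision utility is a toy stand-in rather than the utility used in the paper.

```python
from itertools import combinations, product


def _can_affect_outsider(profile, subset, outsiders, strategy_sets, utility_fn):
    """True if some joint deviation by `subset` changes an outsider's utility."""
    for alternative in product(*(strategy_sets[h] for h in subset)):
        changed = list(profile)
        for h, s in zip(subset, alternative):
            changed[h] = s
        changed = tuple(changed)
        if any(utility_fn(g, changed) != utility_fn(g, profile)
               for g in outsiders):
            return True
    return False


def is_interdependent(strategy_sets, utility_fn):
    """Brute-force check of the interdependence condition (23).

    strategy_sets: one list of strategies per player.
    utility_fn(g, profile): utility of player g under the tuple `profile`.
    """
    players = range(len(strategy_sets))
    for profile in product(*strategy_sets):
        for size in range(1, len(strategy_sets)):
            for subset in combinations(players, size):
                outsiders = [g for g in players if g not in subset]
                if not _can_affect_outsider(profile, subset, outsiders,
                                            strategy_sets, utility_fn):
                    return False
    return True


def collision_utility(g, profile):
    """Toy utility: 1.0 if player g is alone on its channel, else 0.2."""
    shared = any(m != g and ch == profile[g] for m, ch in enumerate(profile))
    return 0.2 if shared else 1.0


print(is_interdependent([[0, 1], [0, 1]], collision_utility))
```

In the two-player, two-channel toy game either player can always create or remove a collision for the other, so the check succeeds. A game in which each utility depends only on the player's own strategy fails the check, because no deviation by others can change it.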

Definition 3: Stochastically stable states: For a perturbed Markov process $\mathrm{MP}_\varepsilon$, the elements of the support of the limiting stationary distribution are referred to as the stochastically stable states. Specifically, a state $\mathcal{T} \in Z$ is stochastically stable if and only if $\lim_{\varepsilon\to 0} \pi(\mathcal{T}, \varepsilon) > 0$, where $\pi(\mathcal{T}, \varepsilon)$ is the stationary distribution of the perturbed process.

We divide the proof of Theorem 2 into three steps, S1–S3, which we now elaborate formally.

Step S1:

Proposition 1: UTCG G is an interdependent game.

Proof: According to (4) and (8), the utility $U_n$ of each player is an increasing function of its achievable rate $R_n$. We consider the situation in which a single player $h$ changes its strategy while the other players keep theirs. Suppose that the current strategy profile is $(s_n)_{n\in\mathcal{N}} = ((c_n, p_n))_{n\in\mathcal{N}}$. The situation can then be divided into three disjoint cases: 1) $\forall n \in \{\mathcal{N}/\{h\}\}$, $c_h \neq c_n$; 2) $\forall n \in \{\mathcal{N}/\{h\}\}$, $c_h = c_n$; 3) $\exists n \in \{\mathcal{N}/\{h\}\}$, $c_h = c_n$, and meanwhile $\exists m \in \{\mathcal{N}/\{h\}\}$, $c_h \neq c_m$.

In the first case, for any player $g \in \{\mathcal{N}/\{h\}\}$, if player $h$ changes its strategy to $s'_h = (c'_h, p_h)$ where $c'_h = c_g$, player $g$ will suffer higher interference and achieve a lower rate $R_g$, which reduces its utility $U_g$. In the second case, if there exists a player $g \in \{\mathcal{N}/\{h\}\}$ whose channel index $c_g = c_h$, $h$ can change its strategy to realize $c'_h \neq c_g$, which implies that player $g$ will suffer lower interference and obtain a higher utility. Additionally, similar to the above discussions, it is easy to prove that in the third case there is a player $g \in \{\mathcal{N}/\{h\}\}$ whose welfare can be changed when player $h$ properly changes its strategy. Therefore, the game UTCG G is interdependent. ■

Step S2:

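Proposition 2: For the introduced perturbed Markov process $\mathrm{MP}_\varepsilon$ over $Z = \prod_{n\in\mathcal{N}} (\mathcal{S}_n \times \mathcal{U}_n \times \mathcal{A}_n)$, a state $\mathcal{T} = ((s_n)_{n\in\mathcal{N}}, (u_n)_{n\in\mathcal{N}}, (\alpha_n)_{n\in\mathcal{N}}) \in Z$ is stochastically stable if and only if its strategy profile maximizes the social welfare, i.e.,

$$(s_n)_{n\in\mathcal{N}} = \arg\max_{(f_n)_{n\in\mathcal{N}}} \sum_{n\in\mathcal{N}} U_n\left((f_n)_{n\in\mathcal{N}}\right). \tag{24}$$

Moreover, in such a state, we have $u_n = U_n((s_n)_{n\in\mathcal{N}})$ and $\alpha_n = 1$, $\forall n \in \mathcal{N}$. Therefore, such a state can be equivalently represented as $\mathcal{T} = ((s_n)_{n\in\mathcal{N}}, (\alpha_n = 1)_{n\in\mathcal{N}})$.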

Proof: According to the conclusion of Proposition 1 and the proof of Theorem 1 in [26], Proposition 2 can be proved with the theory of resistance trees for regular perturbed Markov decision processes, which can be found in [30]. The detailed proof and related definitions can be found in [26]. Here, we only provide an outline of the proof, which consists of four steps.

First, for the unperturbed process $\mathrm{MP}_0$, we need to prove that the recurrence classes are $D_0$ and all singletons $\mathcal{T} \in C_0$, where $C_0$ denotes the subset of states in which each agent's mood is 1 and the benchmark action and utility are aligned. In other words, if $((s_n)_{n\in\mathcal{N}}, (u_n)_{n\in\mathcal{N}}, (\alpha_n)_{n\in\mathcal{N}}) \in C_0$ then $u_n = U_n((s_n)_{n\in\mathcal{N}})$ and $\alpha_n = 1$. Additionally, $D_0$ represents the set of states in which the mood of everyone is 0, i.e., if $((s_n)_{n\in\mathcal{N}}, (u_n)_{n\in\mathcal{N}}, (\alpha_n)_{n\in\mathcal{N}}) \in D_0$ then $u_n = U_n((s_n)_{n\in\mathcal{N}})$ and $\alpha_n = 0$. This proof can be completed based on Proposition 1, i.e., the interdependence of UTCG.

Second, we need to prove that the stochastic potential of each state $\mathcal{T} \in C_0$ is $\gamma(\mathcal{T}) = w(|C_0| - 1) + \sum_{n\in\mathcal{N}} (1 - u_n)$, where $w$ is a constant larger than $N$. Here, the stochastic potential $\gamma(\mathcal{T})$ is the minimum resistance over all trees rooted at the state $\mathcal{T}$ [30]. This conclusion can be drawn by showing that the upper bound and the lower bound for the stochastic potential $\gamma(\mathcal{T})$ are the same.

Then, we need to apply the criterion for determining the stochastically stable states introduced in [30]. To be specific, the criterion shows that the stochastically stable states are precisely those states contained in the recurrence classes with the minimum stochastic potential.

Finally, by way of contradiction, it can be shown that only the recurrence classes (i.e., all singletons) in $C_0$ can be the candidates for stochastically stable states. Hence, the main conclusion in this proposition can be directly drawn, since

$$\arg\min_{\mathcal{T}\in C_0} \gamma(\mathcal{T}) = \arg\min_{\mathcal{T}\in C_0} \left( w\left(|C_0| - 1\right) + \sum_{n\in\mathcal{N}} (1 - u_n) \right) = \arg\max_{\mathcal{T}\in C_0} \left( \sum_{n\in\mathcal{N}} U_n\left((s_n)_{n\in\mathcal{N}}\right) \right). \tag{25}$$
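The last equality in (25) can be spelled out in one intermediate step. The derivation below is a sketch that uses only quantities already defined: for states in $C_0$ we have $u_n = U_n((s_n)_{n\in\mathcal{N}})$, the term $w(|C_0|-1)$ does not depend on the particular state $\mathcal{T}$, and $N = |\mathcal{N}|$ is the number of players.

```latex
\begin{align*}
\arg\min_{\mathcal{T}\in C_0}\Big( w(|C_0|-1) + \sum_{n\in\mathcal{N}} (1-u_n) \Big)
&= \arg\min_{\mathcal{T}\in C_0}\Big( \underbrace{w(|C_0|-1) + N}_{\text{independent of } \mathcal{T}} - \sum_{n\in\mathcal{N}} u_n \Big) \\
&= \arg\max_{\mathcal{T}\in C_0} \sum_{n\in\mathcal{N}} u_n
 = \arg\max_{\mathcal{T}\in C_0} \sum_{n\in\mathcal{N}} U_n\big((s_n)_{n\in\mathcal{N}}\big).
\end{align*}
```

Minimizing the stochastic potential over $C_0$ is therefore the same as maximizing the sum utility, because the two objectives differ only by a constant and a sign.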

Step S3:

If the social optimal solution is unique, then there is only one stochastically stable state for the perturbed Markov process $\mathrm{MP}_\varepsilon$. According to Proposition 2, if the stochastically stable state $\mathcal{T} \in Z$ is unique, then we have

$$\lim_{\varepsilon\to 0} \pi(\mathcal{T}, \varepsilon) = \lim_{\varepsilon\to 0} \Pr\left((s_n)_{n\in\mathcal{N}}, (\alpha_n = 1)_{n\in\mathcal{N}}\right) = \lim_{\varepsilon\to 0} \prod_{n\in\mathcal{N}} \Pr(s_n, \alpha_n = 1) = \lim_{T\to\infty,\ \varepsilon\to 0} \prod_{n\in\mathcal{N}} \frac{t(s_n, \alpha_n = 1)}{T} = 1, \tag{26}$$

where $t(s_n, \alpha_n = 1)$ is the number of occurrences of the corresponding state during period $T$. In Algorithm 2, each player chooses the most frequently recorded strategy which makes its mood equal to 1 (as shown in (20)). Therefore, the unique efficient strategy profile can be achieved by applying the proposed distributed approach.

Now, the proof is completed. ■

REFERENCES

[1] J. Huang, R. Berry, and M. Honig, “Distributed interference compensation for wireless networks,” IEEE J. Sel. Areas Commun., vol. 24, no. 5, pp. 1074–1084, May 2006.

[2] L. Rose, S. Lasaulce, S. Perlaza, and M. Debbah, “Learning equilibria with partial information in decentralized wireless networks,” IEEE Commun. Mag., vol. 49, no. 8, pp. 136–142, Aug. 2011.

[3] Q. Wu et al., “Distributed channel selection in time-varying radio environment: Interference mitigation game with uncoupled stochastic learning,” IEEE Trans. Veh. Technol., vol. 62, no. 9, pp. 4524–4538, Nov. 2013.

[4] L. Rose, S. Perlaza, C. Le Martret, and M. Debbah, “Self-organization in decentralized networks: A trial and error learning approach,” IEEE Trans. Wireless Commun., vol. 13, no. 1, pp. 268–279, Jan. 2014.

[5] W. Kiess and M. Mauve, “A survey on real-world implementations of mobile ad-hoc networks,” Ad Hoc Netw., vol. 5, no. 3, pp. 324–339, Apr. 2007.

[6] O. Aliu, A. Imran, M. Imran, and B. Evans, “A survey of self organisation in future cellular networks,” IEEE Commun. Surveys Tuts., vol. 15, no. 1, pp. 336–361, 2013.

[7] M. Peng, D. Liang, Y. Wei, J. Li, and H.-H. Chen, “Self-configuration and self-optimization in LTE-advanced heterogeneous networks,” IEEE Commun. Mag., vol. 51, no. 5, pp. 36–45, May 2013.

[8] A. MacKenzie and S. Wicker, “Game theory and the design of self-configuring, adaptive wireless networks,” IEEE Commun. Mag., vol. 39, no. 11, pp. 126–131, Nov. 2001.

[9] F. Wang, M. Krunz, and S. Cui, “Price-based spectrum management in cognitive radio networks,” IEEE J. Sel. Topics Signal Process., vol. 2, no. 1, pp. 74–87, Feb. 2008.

[10] A. Zappone, Z. Chong, E. Jorswieck, and S. Buzzi, “Energy-aware competitive power control in relay-assisted interference wireless networks,” IEEE Trans. Wireless Commun., vol. 12, no. 4, pp. 1860–1871, Apr. 2013.

[11] G. Bacci, E. V. Belmega, P. Mertikopoulos, and L. Sanguinetti, “Energy-aware competitive link adaptation in small-cell networks,” in Proc. Int. Workshop Resource Allocation Wireless Netw., Hammamet, Tunisia, May 2014, pp. 1–8.


[12] M. Bennis, S. Perlaza, P. Blasco, Z. Han, and H. Poor, “Self-organization in small cell networks: A reinforcement learning approach,” IEEE Trans. Wireless Commun., vol. 12, no. 7, pp. 3202–3212, Jul. 2013.

[13] P. S. Sastry, V. V. Phansalkar, and M. Thathachar, “Decentralized learning of Nash equilibria in multi-person stochastic games with incomplete information,” IEEE Trans. Syst., Man, Cybern., vol. 24, no. 5, pp. 769–777, May 1994.

[14] Z. Han, C. Pandana, and K. Liu, “Distributive opportunistic spectrum access for cognitive radio using correlated equilibrium and no-regret learning,” in Proc. IEEE WCNC, Kowloon, Hong Kong, 2007, pp. 11–15.

[15] M. Xiao, N. Shroff, and E. K. P. Chong, “A utility-based power-control scheme in wireless cellular systems,” IEEE/ACM Trans. Netw., vol. 11, no. 2, pp. 210–221, Apr. 2003.

[16] J. Zhang and Q. Zhang, “Stackelberg game for utility-based cooperative cognitive radio networks,” in Proc. 10th ACM Int. Symp. Mobile Ad Hoc Netw. Comput., pp. 23–32, 2009.

[17] H. Lin, M. Chatterjee, S. Das, and K. Basu, “ARC: An integrated admission and rate control framework for competitive wireless CDMA data networks using noncooperative games,” IEEE Trans. Mobile Comput., vol. 4, no. 3, pp. 243–258, May/Jun. 2005.

[18] D. T. Ngo, L. B. Le, T. Le-Ngoc, E. Hossain, and D. I. Kim, “Distributed interference management in two-tier CDMA femtocell networks,” IEEE Trans. Wireless Commun., vol. 11, no. 3, pp. 979–989, Mar. 2012.

[19] Q. D. La, Y. Chew, and B.-H. Soong, “An interference-minimization potential game for OFDMA-based distributed spectrum sharing systems,” IEEE Trans. Veh. Technol., vol. 60, no. 7, pp. 3374–3385, Sep. 2011.

[20] Q. D. La, Y. Chew, and B.-H. Soong, “Performance analysis of downlink multi-cell OFDMA systems based on potential game,” IEEE Trans. Wireless Commun., vol. 11, no. 9, pp. 3358–3367, Sep. 2012.

[21] S. Buzzi, G. Colavolpe, D. Saturnino, and A. Zappone, “Potential games for energy-efficient power control and subcarrier allocation in uplink multicell OFDMA systems,” IEEE J. Sel. Topics Signal Process., vol. 6, no. 2, pp. 89–103, Apr. 2012.

[22] C. Xu, M. Sheng, C. Yang, X. Wang, and L. Wang, “Pricing-based multiresource allocation in OFDMA cognitive radio networks: An energy efficiency perspective,” IEEE Trans. Veh. Technol., vol. 63, no. 5, pp. 2336–2348, Jun. 2014.

[23] D. Monderer and L. Shapley, “Potential games,” Games Econ. Behavior, vol. 14, no. 1, pp. 124–143, May 1996.

[24] R. Cominetti, E. Melo, and S. Sorin, “A payoff-based learning procedure and its application to traffic games,” Games Econ. Behavior, vol. 70, no. 1, pp. 71–83, Sep. 2010.

[25] R. B. Myerson, Game Theory: Analysis of Conflict. Cambridge, MA, USA: Harvard Univ. Press, 2013.

[26] J. R. Marden, L. Y. Pao, and H. P. Young, “Achieving Pareto optimality through distributed learning,” Dept. Econ., Univ. Oxford, Oxford, U.K., Jul. 2011, Tech. Rep.

[27] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge, U.K.: Cambridge Univ. Press, 2005.

[28] A. Goldsmith, Wireless Communications. Cambridge, U.K.: Cambridge Univ. Press, 2004.

[29] B. Babadi and V. Tarokh, “Gadia: A greedy asynchronous distributed interference avoidance algorithm,” IEEE Trans. Inf. Theory, vol. 56, no. 12, pp. 6228–6252, Dec. 2010.

[30] H. P. Young, “The evolution of conventions,” Econometrica, vol. 61, no. 1, pp. 57–84, Jan. 1993.

Min Sheng (M’03) received the M.S. and Ph.D. degrees in communication and information systems from Xidian University, Shaanxi, China, in 1997 and 2000, respectively.

She is currently a Full Professor at the Broadband Wireless Communications Laboratory, the School of Telecommunications Engineering, Xidian University. Her general research interests include mobile ad hoc networks, wireless sensor networks, wireless mesh networks, third generation (3G)/fourth generation (4G) mobile communication systems, dynamic radio resource management (RRM) for integrated services, cross-layer algorithm design and performance evaluation, cognitive radio and networks, cooperative communications, and medium access control (MAC) protocols. She has published two books and over 50 papers in refereed journals and conference proceedings.

Dr. Sheng was selected for the New Century Excellent Talents in University program by the Ministry of Education of China, and received the Young Teachers Award from the Fok Ying-Tong Education Foundation, China, in 2008.

Chao Xu received the B.S. degree in electronic information engineering from Xidian University, Xi’an, China, in 2009, where he is currently working toward the Ph.D. degree in communication and information systems with the Institute of Information and Science, Broadband Wireless Communications Laboratory, School of Telecommunications Engineering.

From June to September 2014, he was a visiting student with the Singapore University of Technology and Design, Singapore, under the supervision of Prof. Tony Q. S. Quek. His research interests focus on dynamic radio resource management, cognitive radio and networks, energy efficient transmission, distributed algorithm design, and the applications of game theory and learning theory in wireless communications.

Xijun Wang (M’12) received the B.S. degree with distinction in telecommunications engineering from Xidian University, Xi’an, Shaanxi, China, in 2005. He received the Ph.D. degree in electronic engineering from Tsinghua University, Beijing, China, in January 2012.

Since 2012, he has been with the School of Telecommunications Engineering, Xidian University, where he is currently an Assistant Professor. His research interests include wireless communications, cognitive radios, and interference management.

Dr. Wang served as a Publicity Chair of IEEE/CIC ICCC 2013. He was a recipient of the 2005 “Outstanding Graduate of Shaanxi Province” Award, the Excellent Paper Award at the 6th International Student Conference on Advanced Science and Technology in 2011, and the Best Paper Award at IEEE/CIC ICCC 2013.

Yan Zhang (M’12) received the B.S. and Ph.D. degrees from Xidian University, Xi’an, China, in 2005 and 2010, respectively. He is currently an Associate Professor at Xidian University.

His research interests include cooperative cognitive networks, self-organizing networks, media access protocol design, energy-efficient transmission, and dynamic radio resource management (RRM) in heterogeneous networks.

Weijia Han (S’07–M’11) received the B.S. degree from Northwest University, China, the M.S. degree from Queen’s University Belfast, UK, and the Ph.D. degree from Xidian University, Xi’an, China. He is currently a Lecturer at Xidian University, Xi’an, China.

His research interests include sensing in cognitive radio networks, resource management and network optimization, and cognitive media access protocol and algorithm design.

Jiandong Li (SM’05) received the B.S., M.S., and Ph.D. degrees in communications and electronic systems from Xidian University, Xi’an, China, in 1982, 1985, and 1991, respectively.

In 1985, he joined Xidian University, where he has been a Professor since 1994 and the Vice-President since 2012. His current research interests and projects consist of mobile communications, broadband wireless systems, ad hoc networks, cognitive and software radio, self-organizing networks, and game theory for wireless networks.

Dr. Li is a Senior Member of the China Institute of Electronics and a Fellow of the China Institute of Communication. He was a member of the PCN Specialist Group for the China 863 Communication High Technology Program between January 1993 and October 1994 and from 1999 to 2000. He is also a member of the Communication Specialist Group for The Ministry of Industry and Information.