Sum Throughput Maximization in a Cognitive Multiple Access ... · throughput optimization in a fading multiple access cognitive radio (CR) network, where the secondary user (SU) transmitters

2332-7731 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCCN.2019.2908860, IEEETransactions on Cognitive Communications and Networking

Sum Throughput Maximization in a CognitiveMultiple Access Channel with CooperativeSpectrum Sensing and Energy Harvesting

Sinchan Biswas, Student Member, IEEE, Subhrakanti Dey, Senior Member, IEEE, and Amirpasha Shirazinia

Abstract—This paper focuses on the problem of sensingthroughput optimization in a fading multiple access cognitiveradio (CR) network, where the secondary user (SU) transmittersparticipate in cooperative spectrum sensing and are capableof harvesting energy and sharing energy with each other. Weformulate the optimization problem as a maximization of theexpected achievable sum-rate over a finite horizon, subject toan average interference constraint at the primary receiver, peakpower constraints and energy causality constraints at the SUtransmitters. The optimization problem is a non-convex, mixedinteger non-linear program (MINLP) involving the binary actionto sense the spectrum or not, and the continuous variables suchas the transmission power, shared energy and sensing time.The problem is analyzed under two different assumptions onthe available information pattern: (i) non-causal channel stateinformation (CSI), energy state information (ESI) and infinitebattery capacity, and (ii) the more realistic scenario of the causalCSI/ESI and finite battery. In the non-casual case, this problemcan be solved by an exhaustive search over the decision variableor a MINLP solver for smaller problem dimensions, and a novelheuristic policy for larger problems, combined with an iterativealternative optimization method for the continuous variables.The causal case with finite battery is optimally solved using adynamic programming (DP) methodology, whereas a number ofsub-optimal algorithms are proposed to reduce the computationalcomplexity of DP. Extensive numerical simulations are carriedout to illustrate the performance of the proposed algorithms. Oneof the main findings indicates that the energy sharing is morebeneficial when there is a significant asymmetry between averageharvested energy levels/channel gains of different SUs.

Index Terms—Energy harvesting, Cognitive Radio, MultipleAccess Channel, Spectrum Sensing, Fading Channel

I. INTRODUCTION

TWO of the essential and limited resources in wirelesscommunications are radio spectrum and energy. With the

advent of advanced data hungry mobile devices, the demandsfor both of them have increased significantly. The demandsfor spectrum usage have been increasing due to applicationsconsuming high bandwidth. The traditional rigid spectrumallocation policy is found to be fairly inefficient to mitigatethis problem [1]. To combat this issue, a dynamic spectrumallocation policy paradigm of cognitive radio (CR) networkhas been proposed in the literature [2]. In particular, in theinterweave paradigm of CR, the unlicensed secondary users(SU) are capable of actively sensing the licensed primary user

S. Biswas and S. Dey are with the Division of Signals and Systems, UppsalaUniversity, Sweden.

A. Shirazinia is Senior Data Scientist at Swedbank, Sweden.A related work [39] has been published in EUSIPCO 2016.

(PU) spectrum and using it whenever it is detected to benot utilized by the PU [3]. Similarly, the SUs have to vacatethe PU spectrum as soon as it senses PU activity. In such aframework, sensing of PU spectrum has become a significanttask. Significant research has been focused on different spec-trum sensing algorithms like energy detection, cyclostationarydetection, matched filter detection etc [4]. The main focusof such algorithms is to increase the efficiency of spectrumutilization, while protecting PU from harmful interference. Ina multiuser secondary network with varying degree of sensingperformances, it has been well established that cooperativespectrum sensing improves the PU detection performance. Acontemporary survey of energy-efficient cooperative spectrumsensing algorithms can be found in [5] (see also referencestherein).

Alongside spectrum utilization efficiency, the efficiency inutilizing the energy has also become a significant issue. Justlike the spectrum requirement, the communication overhead,signal transmission and detection at the receiver all requiresignificant amounts of energy depending on the application.The usual battery powered electrical sources are unable to fullyreplenish these ever-growing requirements because of its non-renewable feature. In many applications, it is difficult to re-place the batteries attached to the wireless devices periodicallybecause of their remote locations and the required labour costs.To address this issue, the usage of renewable energy has beenproposed in the literature [6]. Such energy from the wind, solaror other unconventional power sources are not only efficient interms of fulfilling the energy requirement but environmentallysustainable as well. This has led to extensive research activitiesin the domain of energy harvesting wireless communicationsand networking [7], [8].

Significant recent research has been directed towards theutilization of energy harvesting in wireless networks. Thetransmission policies have been investigated for minimizationof transmission time completion with infinite battery capacity[9] and finite sized battery [10]. Similar problems have beenaddressed in the context of different channel models. Theoptimal policies have been derived in the context of multiusersettings with relay channel [11], broadcast channels [12],interference channels [13] and multiple access channels [14].

CR network with energy harvesting capabilities effectivelyuse both of the above mentioned resources, which is the focusof the present work. There has been significant research in thisdomain. Opportunistic channel access and energy harvestingprocess has been studied in this framework in [15], [16].



2

Optimal spectrum sensing policy in an energy harvestingCR network has been investigated in [17]. Opportunities andchallenges in such a network has been surveyed in [18]. Theenergy harvesting feature creates a reliable energy efficientsystem, while the dynamic spectrum allocation leads to moreefficient spectrum utilization. The cooperative energy sharingfeature has been incorporated into the system to satisfy theenergy requirements of the more vulnerable users in terms ofaverage harvested energy.

A. Background

This paper focuses on the specific case of CR networks withSUs having energy harvesting and sharing capabilities. Thechannels between SUs and the fusion centre (FC) constitutea fading multiple access channel. For such a network, theachievable sum-rate is generally used as a throughput measure.Optimizing the achievable sum-rate in a CR network has beeninvestigated with respect to different fading channel modelsin [19], [20]. As spectrum sensing is one of the primaryfunctionalities in CR networks, the sensing throughput tradeoffhas been rigorously studied in [21]. Capacity analysis ofwireless systems with energy harvesting capabilities has beenstudied as well in [22]. Such analysis has also been extended tothe CR networks in the literature [23], [24]. In a separate fieldof literature, distortion minimization in a multi-sensor settinghas been studied in an energy harvesting wireless sensornetwork (WSN) with wireless energy transfer mechanisms[25].

There has been some recent research in CR networkswith SUs having energy harvesting capabilities. In [26], thespectrum sensing strategy is studied for a system with non-linear energy harvesting model. In [27], achievable throughputoptimization has been investigated with respect to sensing timeand sensing threshold, whereas [28] addresses the problemof optimizing the throughput with respect to sensing energy,transmission energy and spectrum sensing time. In [29], theauthors analyze the asymptotic activity behavior of a single SUin a hybrid energy harvesting scenario, and subsequently ana-lyze the optimal sum throughput with respect to optimizationparameters such as asymptotic active probabilities, sensingdurations and detection thresholds over a single time slot fora collection of heterogenous SUs, followed by an algorithmfor selecting the best subset of cooperating SUs. In [30], theauthors of [29] tackled the problem of throughput optimizationwith different sets of fusion rules. In [31], the same authorsaddressed the issue of protecting PU from collision with SUtraffic. In [32], the cross-layer problem involving physicallayer transmission power and network layer delay is addressedwith a game-theoretic formulation. In [33], throughput isoptimized with respect to the sensing threshold and the amountof energy being harvested. In [34], the problem of optimalspectrum sensing strategy is studied in energy harvesting CRnetworks. In [35], this optimization is investigated with respectto sensing time and energy harvesting time, considering a slot-ted energy arrival policy. In [36], the problem is studied in thescenario of multislot spectrum sensing and energy harvesting.In [37], the optimal energy scheduling problem is studied in

the fading multiple access channel with energy harvesting.In [38], optimal power allocation policy is analyzed whenonly causal information is available to the energy harvestingtransmitters.

We partially addressed the problem of throughput optimiza-tion in a CR fading multiple access channel with energyharvesting secondary transmitters in [39] in a cooperativespectrum sensing framework. It is obvious that there is a trade-off between the accuracy of spectrum sensing process and thethroughput measure. This is due to the fact that, if the sensingtime is long enough, then the sensing accuracy is high, but onthe other hand, the transmission time decreases, which in turnreduces the throughput. In [39], the optimization was carriedout with respect to the sensing time, SU transmission power,and the binary decision variables concerning (i) whether tosense (as sensing also costs energy) and (ii) the outcome ofthe spectrum sensing decision regarding whether to accessthe spectrum or not. In our current submission, we removethe binary variable regarding the spectrum access decision byconsidering explicit expressions for detection and false alarmprobability for the cooperative spectrum sensing process, andwe extend the paper to consider the additional feature ofenergy sharing between neighbouring sensors. The energysharing mechanism considered here is assumed to be direc-tional, which can be achieved via non radio-frequency wire-less power transfer over shorter distances or radio frequencybased wireless power transfer over longer distances employingenergy beamforming technology [40], [41]. In addition, asopposed to [39], where the PU interference at the secondarybase station was ignored, we consider the presence of PUinterference explicitly in the SU sum-throughput expressionwhen the PU is present. In this setting, we study the problemof jointly maximizing the achievable sum-rate for a CR fadingmultiple access channel with respect to the binary decisionvariables regarding whether each SU senses for the PU, andthe continuous variables such as the SU transmission powers,sensing times and energy transferred between neighbouringsensors under an average interference constraint at the PUreceiver, energy causality and peak power constraints at theSU transmitters.

B. ContributionsWe consider the problem of maximizing the achievable sum-

rate with respect to the four optimization variables mentionedabove. The specific contributions of this paper are listed below:• We solve the problem of expected sum throughput max-

imization over a finite time horizon in a fading mul-tiple access cognitive network where the SUs havingenergy harvesting and sharing capabilities. We explorethe problem in both finite and infinite battery capacitysettings, and with non-causal and causal informationpatterns. In the case where all channel conditions andharvested energy realizations are known non-causally forthe entire time horizon, an analysis is provided for theoptimization of the continuous variables (SU transmissionpower, shared energy and sensing times) for the infinitebattery capacity setting, using an iterative alternating op-timization approach, which only ensures locally optimal



3

solutions, as the problem is jointly non-convex in thecontinuous variables.

• In case of a non-causal information pattern, the over-all optimization problem is a mixed integer nonlinearprogramming problem, where the underlying nonlinearprogramming problem is non-convex. It is well knownthat (see [42], [43]) determining of a global solutionof the non-convex MINLP problem is NP-hard. Wedo not pursue global optimization methods to avoidtheir prohibitive computational complexity in the case oflonger horizons. Instead, for smaller problem dimensions(small number of SUs and a short time horizon), weuse an exhaustive search policy to find the optimalbinary variables regarding the SUs’ decision to sense thespectrum. We also provide a numerical study comparingthe results with those obtained using a global optimizationsoftware. For larger problem dimensions, using either theexhaustive search or global optimization is impractical,and therefore we propose a heuristic policy to solvethe mixed integer non-linear program with respect tothe spectrum sensing decisions resulting in a moderatecomputational complexity.

• The problem pertaining to the finite battery capacity withcausal CSI and ESI is studied using a dynamic program-ming based stochastic control algorithm resulting in theoptimal sequence of the binary spectrum sensing deci-sion variables, and discretized values of the continuousvariables in order to facilitate practical implementation.

• We also propose two other sub-optimal policies to mit-igate the exponential complexity associated with thedynamic programming (DP) algorithm. The first one isa moving-horizon limited look-ahead policy, which has arelatively low complexity especially when the length ofthe finite horizon is high. The second one is an ad-hocpolicy to find sub-optimal values of all three continuousoptimization variables without involving the dynamicprogramming algorithm, based on insights obtained fromthe non-causal case.

• Finally, we compare a non-adaptive policy for sensingtime optimization (i.e. a fixed sensing time is adoptedacross all time slots) with an adaptive sensing timeoptimization (where the sensing time is optimized in adynamical fashion across the time slots) in the abovementioned setting.

C. OrganizationThe rest of the paper is organized as follows. In Section

II we discuss the system model. In Section III we describethe optimization problem and the procedure for finding theoptimal solution. In Section IV we provide an analysis forthe optimal solution of the problem for the infinite batteryand non-causal CSI/ESI scenario. In Section V we discussthe proposed policy for causal CSI and finite battery scenariousing the dynamic programming algorithm. In Section VIwe propose two sub-optimal policies in order to mitigatethe complexity of dynamic programming algorithm. SectionVII contains simulation results followed by some concludingremarks in Section VIII.

N

N

Fig. 1: Figure of System Model

II. SYSTEM MODEL

We consider a system model (see Figure 1) comprised ofa PU, N SUs and a FC, the role of which can be performedby a secondary base station (SBS). We adopt a cooperativespectrum sensing model, where individual SUs sense the PUspectrum locally and transmit their local decisions to the SBS.The SBS collates all the information from the SUs and arrivesat a global decision, which in turn is broadcast to the SUs by acontrol channel, so that the SUs can decide to transmit or not.We assume that time is slotted as shown in Fig.2. the energyharvesting and sharing process is assumed to be independentof the sensing and data transmission process, as we do notrestrict the harvesting and sharing to be based on RF energyharvesting only, and these actions are carried out by the sensorvia a unit separate from the transceiver. We consider a finitetime horizon of M slots, where each slot is assumed to be Ttime units in size. In each slot, the PU is either present withprobability µ or absent with probability (1 − µ). In the kth

time slot, as illustrated in Fig. 2, τk time units are used forthe spectrum sensing process and the residual (T − τk) timeunits are used for the transmission process, provided the PUspectrum is found to be vacant. Since sensing costs energy,in the kth time slot, the ith SU decides to either sense thePU spectrum or be idle based on the binary decision variableai,k, 1 ≤ i ≤ N, 1 ≤ k ≤ M . ai,k ∈ {0, 1}, where 1 (or0) represents the decision to sense the PU spectrum (or not).Note that if Bi,k, the battery energy level of the ith SU at thebeginning of the kth time slot, is smaller than psτk, whereps is the sensing power, ai,k = 0. However, if Bi,k > psτk,ai,k ∈ {0, 1}.

A. Energy harvesting and battery dynamics

As indicated in Fig. 2, we assume an energy full-duplexenergy harvesting scenario [29], where each SU has sep-arate energy harvesting/sharing units and spectrum sensingand information transmitter/receivers so that these harvest-ing/energy sharing and spectrum sensing/information trans-mission/reception processes can occur simultaneously andindependently. This is not unusual where the SU can harvestenergy from non-radio frequency sources, such as solar, wind



4

Sensing time τk Transmission time (T − τk)

Time Slot: T time units

Energy harvesting and sharing Process

Fig. 2: Frame Structure

or vibrations etc. We also make the standard assumption thatthe energy harvested in slot k is available for use in slot k+1onward.

When the spectrum sensing decision ai,k = 1, the SUtransmits with a transmission power pi,k. With the assumptionof a finite battery size of Bmax at each SU, Bi,k can beexpressed as follows:

Bi,k+1 = min

{Bmax, Bi,k − Ei,k +Hi,k

−∑

m∈NT,i

Ti,m(k) +∑

m∈NR,i

ηm,iTm,i(k)

}(1)

where Hi,k is a random process denoting the amount of energyharvested by the ith SU in the kth time slot. Ti,m(k) is theamount of energy transferred from the ith SU to the mth

SU in the kth time slot. Ei,k, ηm,i,NT,i,NR,i represents theamount of energy used by the ith SU in the kth time slot, theenergy transfer efficiency from the mth SU to the ith SU, theset of SUs which ith SU shares its energy to and the set ofSUs that the ith SU received shared energy from, respectively.Although our subsequent analysis does not strictly need it, weassume that Hi,k is independent and identically distributed(i.i.d.) across time and independent across the SU terminals.

In the case where ai,k is a variable of optimization, Ei,k,can be subdivided into sensing energy and transmission energyas follows:

Ei,k =

{ai,kpsτk + pi,k(T − τk)

{(1− Pd(τk))µ+ (1− Pfa(τk))(1− µ)}

}(2)

where Pd(τk) and Pfa(τk) denote the probability of detectionand the probability of false alarm at the FC in the kth time slot,respectively. Note that in order to avoid an additional binarydecision variable regarding whether to access the spectrumor not, we consider the above energy consumption model,which is equivalent to accessing the spectrum with probability(1 − Pd(τk))µ + (1 − Pfa(τk))(1 − µ) in the kth slot.This approximation becomes exact when the time horizon Mbecomes sufficiently large. See [39] for a treatment where thespectrum access decision is also treated as a binary variable.

Applying the recursive definition of (1) we can write:

Bi,k = min

{Bmax, Bi −

k∑r=1

Ei,r +k−1∑r=1

Hi,r

−k∑r=1

∑m∈NT,i

Ti,m(r) +k−1∑r=1

∑m∈NR,i

ηm,iTm,i(r)

}(3)

where Ei,k is defined in (2) and Bi represents the initialbattery state for the ith SU. In the case of infinite batterycapacity, the above expression in (3) simplifies to:

Bi,k = Bi −k∑r=1

Ei,r +k−1∑r=1

Hi,r −k∑r=1

∑m∈NT,i

Ti,m(r)

+k−1∑r=1

∑m∈NR,i

ηm,iTm,i(r) (4)

B. Spectrum Sensing Model

If an SU decides to sense the PU spectrum, the spectrumsensing time τk for the kth time slot is divided into anumber of mini-slots, where the length of each mini-slot ispre-decided. In each mini-slot the SUs take sensing samplesand try to determine the PU spectrum availability by usinga binary hypothesis testing problem based on the followingsignal model:

H0 : yi,k,m = ni,k,m

H1 : yi,k,m =√qixk + ni,k,m (5)

where xk is the PU transmitted signal for the kth time slot,which is real valued and distributed as xk ∼ N (0, σ2

x). qi isthe channel power gain between the PU transmitter and the ith

SU receiver, which is assumed to be constant throughout thesensing process. yi,k,m and ni,k,m are the real valued receivedsignal and noise signal component respectively at the ith SUreceiver for the mth mini-slot of the kth time slot. The noiseis distributed i.i.d. as ni,k,m ∼ N (0, σ2

n). For the spectrumsensing rule, we use an energy detection policy [44] in eachSU, such that the local spectrum sensing decision at the kth

time slot in the ith SU is determined by the following rule:

θi,k = I(1

Sk

Sk∑m=1

y2i,k,m ≥ ε)

where I is the indicator function, Sk is the number of mini-slots in a particular time slot τk and ε is the detection threshold.However, we should note the fact that the number of mini-slotsSk being a discrete variable doesn’t imply that the spectrumsensing time τk being a discrete variable. Thus in general ifthe length of each mini-slot is τmini, a continuous variable,the corresponding number of mini-slots can be determined asSk =

⌊τk

τmini

⌋. All the local decisions θi,k from the SUs are

sent to the FC by control channels and collated using an ORlogic fusion rule to achieve the global decision θk. Note onceagain that our analysis is not restrictive to any particular typeof a fusion rule and other fusion rules such as the ”majoritylogic” rule can also be easily considered (see Remark 2 below).



5

Remark 1. A number of additional sources of energy con-sumption has been ignored in this work, such as circuit powerconsumption, energy consumption due to signalling overheadof side-information over control channels etc. The rationale isthat (i) the circuit power consumption is assumed to be muchlower compared to energy consumption in communication,especially over longer distances, and (ii) a non-trivial circuitpower consumption can be grouped together with the sensingenergy (whenever ai,k = 1) whereas energy consumptiondue to signalling overhead can be easily subtracted fromthe maximum battery capacity without affecting the analysis.The nature of the numerical results will not be affectedconsiderably as long as such energy consumption is smallcompared to energy consumed due to communication.

The probability of detection and false alarm for the ORfusion rule then can be written as Pd = Pr {θk = 1|H1} andPfa = Pr {θk = 1|H0}. Pd(τk) and Pfa(τk) is given by thefollowing:

Pd(τk) = 1−Πi∈Ak(1− p(i)d (τk))

Pfa(τk) = 1−Πi∈Ak(1− p(i)fa(τk))

where Ak is defined as the set Ak = {i : ai,k = 1}, andp(i)d (τk) and p

(i)fa(τk) are the probability of detection and the

probability of false alarm at the ith SU in kth time slot, giventhe sensing time τk and they can be computed by the followingexpressions [21]:

p(i)d (τk) = Q

{(ε

σ2n

− γi − 1)

√τkfs

2γi + 1

}

p(i)fa(τk) = Q

{(ε

σ2n

− 1)√τkfs

}(6)

In the above two equations fs is the sampling rate and γi =qiσ2n

is the signal to noise ratio corresponding to the channelbetween the PU transmitter and the ith SU receiver.

Remark 2. If majority logic fusion rule is implementedwith identical detection and false alarm probabilities pd, pfarespectively, for all SUs, then Pd and Pfa can be calculatedas follows:

Pd(τk) =

|Ak|∑l=b|Ak|/2c+1

(|Ak|l

)pld(1− pd)|Ak|−l

Pfa(τk) =

|Ak|∑l=b|Ak|/2c+1

(|Ak|l

)plfa(1− pfa)|Ak|−l

where |Ak| denotes the cardinality of the set Ak, and bxcdenotes the nearest integer less than x.

C. Communication channel models

We assume that the spectrum sensing channel gain (qi), andthe communication channel gains between the SU transmittersand PU receiver (gi,k), as well as the SU transmitters and theFC (hi,k) are i.i.d. exponentially distributed with unity meanunless otherwise stated. All channel gains are assumed to be

mutually statistically independent. Interference caused by thePU at the FC is treated as noise.

For easier readability we have summarized the relevantparameters for our subsequent problem formulation in TableI.

TABLE I: System Parameters

τk Time taken to perform the spectrum sensing in the kth

slotpi,k Transmission power for the ith SU in the kth time slotTi,m(k) Energy shared from the ith SU to the mth SU in the

kth time slotµ PU activity probabilityNT,i Set of SUs that the ith SU shares its energy toNR,i Set of SUs that the ith SU receives shared energy fromηm,i Energy transfer efficiency from the mth SU to the ith

SUhi,k Channel gain between the ith SU transmitter and the

FC in the kth time slotgi,k Channel gain between the ith SU transmitter and the

PU Receiver in the kth time slotqi Channel power gain between the PU transmitter and

and the ith SUps Power required to sense for the single unit of timePmax Peak power limit on pi,kBi Initial battery state for the ith SUHi,k Energy harvested for the ith SU for the kth time slotai,k Decision to sense the spectrum or not for the ith SU

in kth time slotQavg Average interference limit to the PUσ2 Noise variance in the channel between the SU trans-

mitter and the FCσ2in Interference caused by the PU presence

III. PROBLEM FORMULATION

In this section, we formulate the optimization problemconsidered in this paper, namely, an achievable sum rate max-imization for a SU multiple-access channel, under the abovementioned cooperative spectrum sensing framework, with anaverage interference constraint [45] at the PU receiver, andenergy harvesting as well as peak transmit power constraintsat the SU transmitters.

A. Achievable Sum Rate Maximization

The achievable sum-rate expression, also weighted by theprobability of spectrum access, has two components. Onecomponent corresponds to the case when the PU is active butnot detected by the spectrum sensing process, and the otherone corresponds to when it is not active and is also correctlydetected to be idle. The respective weighted achievable sum-



6

rate can be expressed as the following:

D1(pi,k, τk) =µ

M

T − τkT

(1− Pd(τk))

log2

{1 +

∑Ni=1 pi,khi,kσ2 + σ2

in

}D2(pi,k, τk) =

1− µM

T − τkT

(1− Pfa(τk))

log2

{1 +

∑Ni=1 pi,khi,kσ2

}(7)

where the PU interference (treated as noise) is given byσ2in. The sensing throughput maximization problem can be

formulated as the following (for all 1 ≤ i ≤ N , 1 ≤ k ≤M ):

maxai,k,pi,k,τk,Ti,m(k)

E

{M∑k=1

{D1(pi,k, τk) +D2(pi,k, τk)}

}(8)

s.t.1

M

M∑k=1

E

{T − τkT

(1− Pd(τk))N∑i=1

pi,kgi,k

}≤ Qavg

(9)0 ≤ pi,k ≤ Pmax;∀i, k; 0 ≤ τk ≤ T ;∀k0 ≤ Ti,m(k)∀i,m, k (10)k∑r=1

(ai,rpsτr + pi,r(T − τr)((1− Pd(τr))µ

+ (1− Pfa(τr))(1− µ))) ≤ Bi +k−1∑r=1

Hi,r

−k∑r=1

∑m∈NT,i

Ti,m(r) +k−1∑r=1

∑m∈NR,i

ηm,iTm,i(r) a.s.;∀i, k

ai,k =

{0 if Bi,k < psτk

∈ {0, 1} otherwise a.s.;∀i, k (11)

where a.s. stands for almost surely.Remark It is important to note that the average interference

term in constraint has been normalized by the primary activityfactor µ, where Qavg = Q/µ, Q being the true averageinterference limit, since no interference is caused when theprimary is not active. We would also like to point out thatthe achievable sum-throughput expression used in the aboveoptimization problem for the multiple access channel hasthe underlying assumption that the codewords used for datatransmission at each SU are independently generated from thecorresponding statistically independent Gaussian codebooks[46], followed by successive decoding at the receiver.

The problem under consideration is a joint optimizationwith respect to the transmission power pi,k, transferred energyTi,m(k), sensing time τk and the decision to sense ai,k.The objective is to maximize the weighted achievable sumrate expression (8), with respect to an average interferenceconstraint ((9)) imposed over the entire time horizon, whereas(10) signifies the short term transmission power and sensingtime and transferred energy constraints imposed at each timeslot. The last constraint (11) is the energy causality constraintwhich ensures for every SU, Ei,k ≤ Bi,k ∀i, k, and states

that ai,k = 0 (no sensing is possible) when there is not enoughenergy to sense in the beginning of the k-th time slot, whereasai,k ∈ {0, 1} otherwise.

This optimization problem is a mixed-integer nonlinear pro-gramming (MINLP) problem with respect to the binary vari-able ai,k, and the continuous variables pi,k, τk and Ti,m(k). Inthis particular case, the nonlinear programming problem withrespect to the continuous variables is non-convex, and hencethe associated MINLP problem is NP-hard [42], [47]. In thenon-causal case where all channel and harvested energy infor-mation is known a priori, one can still use global optimizationtools, which demand exponential computational complexitywith respect to M,N . Consequently, for the case of smallM,N , we use an exhaustive search method to determine ai,k,followed by an alternative iterative optimization method for thecontinuous optimization variables as described below. Notingthat for a fixed τk, the resulting problem becomes jointlyconvex in pi,k, Ti,m(k) and can be solved efficiently. On theother hand for fixed pi,k and Ti,m(k), it results in a non-convex problem in τk, which can also be locally optimized.This iterative process is continued until a locally optimumsolution is found and the algorithm converges within a certaintolerance. We compare the performance of this approach witha global optimization tool based solution and reveal thatthe corresponding solutions are very close. For large valuesof M,N , both approaches quickly fail to be scalable. Notehowever, that a global optimization approach cannot be usedin the causal case, as this involves a dynamic programmingbased approach, where the associated value functions at eachiteration of the dynamic programming do not have a closedform expression.

B. Heuristic Policy for ai,kIn the case of medium to large M,N , one cannot employ the

exhaustive search method to optimize over the binary variableai,k. Instead, we propose a heuristic policy to determineai,k. The policy focuses on the fact that an upper bound onprobability of false alarm i.e. Pfa ≤ κ, for some κ > 0, which,in turn imposes a lower bound on sensing time as τl ≤ τk. Thevalue of such a lower bound τl can be determined according

to [21] τl = 1fs

{Q−1(Pfa)

ε

σ2n−1

}2

, where Q is the cumulative

distribution function of the standard Gaussian random variable.This heuristic policy proposes that if a particular SU has morethan enough energy in its battery to perform the spectrumsensing operation for τl time units, the decision to sense istaken affirmatively i.e. ai,k = 1, and vice versa. It should benoted that the decision making of spectrum access decisionvariable ai,k for this sub-optimal heuristic policy is takenby individual SUs without any centralized control, which isdifferent from the optimal policy. Thus for this policy, if aparticular SU decides not to sense i.e., ai,k = 0, implying thatthe i-th SU doesn’t have enough energy for sensing for the k-th time slot, therefore setting the transmission power pi,k = 0.In the optimal policy however, ai,k = 0 does not necessarilyimply pi,k = 0, as an SU may choose not to participate insensing, but still may access the spectrum if dictated by theoptimum solution.



7

In the following section we analyze the optimization prob-lem in the context of non-causal CSI and ESI and infinitebattery capacity. For a general M -horizon problem we providean analysis for determining pi,k and Ti,m(k) for a fixed τk,and discuss feasibility conditions for finding a locally optimalsensing time given fixed pi,k and Ti,m(k).

IV. NON-CAUSAL OPTIMIZATION WITH INFINITE BATTERY

We discuss the non-causal throughput maximization prob-lem over a finite time horizon in the context of infinite batteryin two different settings: adaptive and non-adaptive sensingtime scenarios. Section IV.A focuses on the adaptive sensingtime strategy, where the sensing time is optimized for eachtime slot, whereas Section IV.B describes the algorithm forthe optimization. We study the non-adaptive sensing timeoptimization strategy in Section IV.C, where the sensing timeis kept at a fixed optimized value for all time slots.

A. Adaptive Optimization with Non-Causal CSI

In this subsection we derive the optimal transmission power(pi,k) and shared energy (Ti,m(k)) while keeping the sensingtime τk fixed. The optimization problem under considerationis then a jointly convex problem in pi,k and Ti,m,(k). Inthe following subsection, we derive the optimal sensing timekeeping pi,k and Ti,m(k) fixed. The channel state informationgi,k, hi,k and the energy state information of the battery Hi,k

are assumed to be known non-causally for this scenario. Thefollowing derivation assumes that ai,k is pre-determined eitheraccording to an exhaustive search strategy or the heuristicpolicy proposed in Section III.B. The optimal values aredetermined using the KKT necessary conditions. As the overallproblem is non-convex, only local optima can be guaranteed.

1) Transmission Power and Transferred Energy Optimiza-tion: We first provide an analysis for obtaining the optimaltransmission power and shared energy parameters in the con-text of non-causal CSI and ESI, assuming the battery capacityis infinite.

The problem for optimizing the transmission power andtransferred energy for an M -horizon setting is the following:

maxpi,k,Ti,m(k)

M∑k=1

{D1(pi,k) +D2(pi,k)} (12)

s.t.1

M

M∑k=1

T − τkT

(1− Pd(τk))N∑i=1

pi,kgi,k ≤ Qavg (13)

0 ≤ pi,k ≤ Pmax;∀i, k, 0 ≤ Ti,m(k), ∀i,m, k (14)k∑r=1

(ai,rpsτr + pi,r(T − τr)

{(1− Pd(τr))µ+ (1− Pfa(τr))(1− µ)}) ≤ Bi +k−1∑r=1

Hi,r

−k∑r=1

∑m∈NT,i

Ti,m(r) +k−1∑r=1

∑m∈NR,i

ηm,iTm,i(r);∀i, k

(15)

Remark: In the objective function, D1 and D2 are defined in(7). The expectations from (12) and (13) are removed withrespect to (8) and (9), because the random parameters suchas channel gains and energy arrival process are known non-causally in this case.

We formulate the Lagrangian for the problem as following:

L({pi,k} , {Ti,m(k)} , λ, {δi,k} , {αi,k} , {υi,m,k} , {βi,k})

=M∑k=1

D1(pi,k) +M∑k=1

D2(pi,k)

− λ

{1

M

M∑k=1

T − τkT

(1− Pd(τk))N∑i=1

pi,kgi,k −Qavg

}

+M∑k=1

N∑i=1

δi,kpi,k −M∑k=1

N∑i=1

αi,k(pi,k − Pmax)

+M∑k=1

N∑i=1

N∑m=1,m6=i

υi,m,kTi,m(k)−M∑k=1

N∑i=1

βi,k×{k∑r=1

(ai,rpsτr + pi,r(T − τr)

{(1− Pd(τr))µ+ (1− Pfa(τr))(1− µ)})−Bi −k−1∑r=1

Hi,r

+k∑r=1

∑m∈NT,i

Ti,m(r)−k−1∑r=1

∑m∈NR,i

ηm,iTm,i(r)

}(16)

where ai,k satisfy the constraint in (11). λ, δi,k, αi,k, υi,m,kand βi,k are the non-negative dual variables associated withthe average interference constraint, transmission power lowerand upper limits, transferred energy constraints and energycausality constraint respectively.

We define C as a set of pi,k and Ti,m(k)’s which satisfy(13),(14) and (15). We also define the vector of Lagrangeparameters as λ = (λ, {δi,k} , {αi,k} , {υi,m,k} , {βi,k}). Thenthe Lagrange dual function is expressed as:

g(λ) = max{pi,k,Ti,m(k)}∈C

L({pi,k} , {Ti,m(k)} , λ, {δi,k} , {αi,k} ,

{υi,m,k} , {βi,k})(17)

The dual problem is defined as:

minλ�0

g(λ) (18)

where � denotes elementwise inequality. The above men-tioned dual problem can be solved using a sub-gradientmethod [48]. We determine the dual function g(λ) bysolving the KKT conditions for the optimization prob-lem, where the primary and dual solutions are denoted asp?i,k, δ

?i,k, α

?i,k, T

?i,m(k), υ?i,m,k, β

?i,k. The optimality conditions



8

are given as the following:

∂L∂pi,k

= χ1 + χ2 + δ?i,k − α?i,k

− T − τkMT

λgi,k(1− Pd(τk))(T − τk)((1− Pd(τk))µ

+ (1− Pfa(τk))(1− µ))M∑r=k

β?i,r = 0 ∀i, k

∂L∂Ti,m(k)

= 0 ∀i,m, k

λ

{1

M

M∑k=1

T − τkT

(1− Pd(τk))N∑i=1

pi,kgi,k −Qavg

}= 0

δ?i,kp?i,k = 0 ∀i, k; υ?i,m,kT

?i,m(k) = 0 ∀i,m, k

α?i,k(p?i,k − Pmax) = 0 ∀i, kλ � 0, (13),(14),(15) hold. (19)

where χ1, χ2 represents the gradients of the achievable sumrate with respect to pi,k when the PU is active and idle,respectively. The expression for χ1 and χ2 can be written as:

χ1 =dD1

pi,k=µ(T − τk)

MT log 2

hi,k(1− Pd(τk))

σ2 + σ2in +

∑kl=1 p

?l,khl,k

(20)

χ2 =dD2

pi,k=

(1− µ)(T − τk)

MT log 2

hi,k(1− Pfa(τk))

σ2 +∑kl=1 p

?l,khl,k

(21)

a) Optimal Transmission Power: From the above men-tioned KKT conditions we find p?i,k by following the sameapproach as [19]. To derive p?i,k we propose the followinglemma:

Lemma 1: Let i and j be two arbitrary SUs, wherei, j ∈ {1, 2, . . . , N} with p?i,k > 0 and p?j,k = 0, then thefollowing must be satisfied: di,k

λei,k+fi,k≥ dj,k

λej,k+fj,k∀k, where

di,k, ei,k, fi,k are given by the following expressions:

di,k =T − τkMT log 2

hi,k; ei,k =T − τkMT log 2

gi,k(1− Pd(τk))

fi,k = (T − τk)((1− Pd(τk))µ+ (1− Pfa(τk))(1− µ))

×M∑r=k

β?i,r

Proof. Please refer to Appendix A.

Now let π be a permutation over the set {1, 2, . . . , N} suchthat dπ(i),k

λeπ(i),k+fπ(i),k≥ dπ(j),k

λeπ(j),k+fπ(j),kwhen i < j, i, j ∈

{1, 2, . . . , N}. Suppose that there are |I| SUs that can transmitwhere I ⊆ {1, 2, . . . , N}. It can be checked that I ={π(1), π(2), . . . , π(|I|)}. The following lemma provides away to determine |I|.

Lemma 2: |I| is given by the largest value of x such that:

dπ(x),k

λeπ(x),k + fπ(x),k>

{µ(1− Pd(τk))

σ2 + σ2in +

∑x−1y=1 Pmaxhπ(y),k

+

(1− µ)(1− Pfa(τk))

σ2 +∑x−1y=1 Pmaxhπ(y),k

}−1

and the optimal transmission power for the π(I)th SU can beexpressed as p?π(|I|),k = min{Pmax, P ∗k }, whereas p?π(a),k =Pmax, a < |I| and p?π(a),k = 0, a > |I|, and P ∗k isthe solution to the nonlinear (quadratic) equation (43) (seeAppendix B).

Proof. Please refer to Appendix B.

b) Optimal Transferred Energy: The KKT conditionscorresponding to Ti,m(k) are as follows:

∂L∂Ti,m(k)

{≥ 0 for T ?i,m(k) = 0= 0 for T ?i,m(k) > 0

(22)

The necessary condition for the energy transfer is summarizedin the following lemma:

Lemma 4: If it is optimal to transfer energy from theith SU to the mth SU at time slot k, i.e. T ?i,m(k) > 0,then νm,k+1 = ηi,mνi,k, where νm,k is the inverted sum ofthe future Lagrangian multipliers and expressed as : νm,k ={∑M

r=k βm,r

}−1.

Proof. As according to the KKT condition (22), we know thatif T ?i,m(k) > 0 then ∂L

∂Ti,m(k) = 0, and υ?i,m,k = 0. Thus,

we have∑Mr=k β

?i,r = ηi,m

∑Mr=k+1 β

?m,r, which leads to the

required result.

Similar to [25], the above result can be used to show thatenergy transfer in both directions between a pair of SUs atthe same time slot cannot be an optimal since the efficiencyfactor ηi,m ≤ 1 .

2) Optimal Sensing Time: Fixing the transmission powerand the transferred energy would result in the optimizationproblem:

max{τk}∈(0,T )

M∑k=1

{D1(τk) +D2(τk)}

s.t.1

M

M∑k=1

T − τkT

(1− Pd(τk))N∑i=1

pi,kgi,k ≤ Qavg

k∑r=1


+ (1− Pfa(τr))(1− µ))) ≤ Bi +k−1∑r=1

Hi,r

−k∑r=1

∑m∈NT,i

Ti,m(r) +k−1∑r=1

∑m∈NR,i

ηm,iTm,i(r), 0 ≤ τk ≤ T

(23)

where the lower bound on τk is changed to τl in case of theheuristic policy for determining ai,k.



9

We formulate the Lagrangian for the problem as follows:

L =M∑k=1

D1(τk) +M∑k=1

D2(τk)− λ(1

M

M∑k=1

T − τkT

(1− Pd(τk))N∑i=1

pi,kgi,k −Qavg)

−M∑k=1

N∑i=1

βi,k(k∑r=1


+ (1− Pfa(τr))(1− µ)))−Bi −k−1∑r=1

Hi,r

+k∑r=1

∑m∈NT,i

Ti,m(r)−k−1∑r=1

∑m∈NR,i

ηm,iTm,i(r)) (24)

The gradient of the Lagrangian with respect to τk can bewritten as:

∂L∂τk

= (λω(3)k − ω

1k)(1− Pd)− ω(2)

k (1− Pfa)− ω(4)k

+ (λω(3)k + ω5

k − ω(1)k )(T − τk)

∂Pd∂τk

+ (ω(6)k − ω

(2)k )(T − τk)

∂Pfa∂τk

where ω(1)k , ω

(2)k ω

(3)k ω

(4)k ω

(5)k ω

(6)k are given by the following

expressions:

ω(1)k =

µ

MTlog

{1 +


in

}

ω(2)k =

(1− µ)

MTlog

{1 +

∑Ni=1 pi,khi,kσ2

}

ω(3)k =

1

MT

N∑i=1

pi,kgi,k

ω(4)k =

N∑i=1

(ai,kps − pi,k((1− Pd)µ

+ (1− Pfa)(1− µ)))M∑r=k

βi,r

ω(5)k =

N∑i=1

µpi,k

M∑r=k

βi,r; ω(6)k =

N∑i=1

(1− µ)pi,k

M∑r=k

βi,r

This optimization problem is not convex in τk and a locallyoptimum solution for the sensing time can be found by solvingthe necessary KKT conditions ∂L

∂τk= 0 for 0 < τk < T , while

∂L∂τk≥ 0 for τk = T and ∂L

∂τk≤ 0 for τk = 0 or τk = τl.

The feasibility of obtaining a locally optimal τk is guaran-teed if the following conditions are satisfied:

∂L∂τk|τk→T = (λω

(3)k − ω

1k)(1− Pd)

− ω(2)k (1− Pfa)− ω(4)

k > 0

∂L∂τk|τk→0 < 0 (25)

For a given set of parameters defining the optimization prob-lem, one needs to check whether there is a set of Lagrangeparameters λ, βi,r that satisfy the above mentioned conditions.To this end, one can build up a lookup table of the relevantLagrange parameters for given sets of optimization problemparameters (using numerical methods as no direct closed formrelationship exists), and use this for a feasibility check.

If feasible, the equation ∂L∂τk

= 0 can be solved by applyinga bisection search method in the interval 0 ≤ τk ≤ T orτl ≤ τk ≤ T , depending on the method for finding ai,k.

B. Algorithm for Locally Optimal Transmission Power, SharedEnergy and Sensing Time

In this section we summarize the above mentioned opti-mization procedure in an algorithmic form (Algorithm 1) forcomputing locally optimal values of the transmission power,sensing time and shared energy. We assume that there are pre-decided tolerance values ε and δ. C(pi,k, τk) represents theachievable sum rate corresponding to the transmission powerpi,k and sensing time τk.

Algorithm 1 Algorithm for calculation of optimal parametersfor fixed ai,k

1: Initialization: Choose a feasible initial sensing time τ0kfor all k ∈ {1, 2, . . . ,M}, tolerance of the algorithm δ,Lagrange parameter λ0 and step-size of the sub-gradientalgorithm ε. Choose also the initial achievable sum rateas 0.

2: repeat3: For l = 0, 1, . . . ( l is an iteration number)4: Determine the optimal transmission power pl+1

i,k bysolving the quadratic non-linear equation (43).

5: Compute the shared energy T l+1i,m (k) by solving the

following iteration:

T l+1i,m (k) =

{T li,m(k)−ε(

M∑r=k

βli,r−ηi,mM∑

r=k+1

βlm,r)

}+

(26)where βi,k’s are determined by a dual sub-gradientalgorithm.

6: Fixing the pl+1i,k and T l+1

i,k , the τ l+1k is solved by

applying a bisection search method.7: Solve for λ in the optimization problem (18) by a dual

sub-gradient method given as

λl+1 =

{λl − ε

{Qavg − E

{∑Mk=1

T−τ l+1k

MT (1 −

Pd(τl+1k ))

∑Ni=1 p

l+1i,k gi,k

}}}+

.

8: until Convergence |C(pl+1i,k , τ

l+1k ) − C(pli,k, τ lk)| ≤

δ ∀i,m, k, where C(pi,k, τk) is the achievable sum rateexpression from (8).

Convergence analysis: The objective function of the op-timization problem in (8) is non-decreasing from iterationto iteration of the alternating optimization procedure as ateach iteration it is maximized with respect to a parameter



10

by fixing the other parameters. It can be easily shown thatby considering only the short term constraints (10), in theextreme scenario, i.e, ai,k = 1 ,∀i, k,∀k, τ?k = 0 ,∀k andp?i,k = Pmax ,∀i, k, the achievable sum rate is upper boundedby the following expression:

E

{M∑k=1

µ

2Mlog2

{1 +

∑Ni=1 Pmaxhi,kσ2 + σ2

in

}

+M∑k=1

(1− µ)

2Mlog2

{1 +

∑Ni=1 Pmaxhi,k

σ2

}}

where the expectation is taken over hi,k. Note that the ad-ditional constraints can only limit the feasible set furtherand thus lower the objective function value. Using Jensen’sinequality, we can write:

E

{M∑k=1

µ

2Mlog2

{1 +

∑Ni=1 Pmaxhi,kσ2 + σ2

in

}

+

M∑k=1

(1− µ)

2Mlog2

{1 +

∑Ni=1 Pmaxhi,k

σ2

}}

≤M∑k=1

µ

2Mlog2

{1 +

∑Ni=1 PmaxE {hi,k}σ2 + σ2

in

}

+

M∑k=1

(1− µ)

2Mlog2

{1 +

∑Ni=1 PmaxE {hi,k}

σ2

}

=

M∑k=1

µ

2Mlog2

{1 +

∑Ni=1 Pmaxσ2 + σ2

in

}

+

M∑k=1

(1− µ)

2Mlog2

{1 +

∑Ni=1 Pmaxσ2

}(27)

Thus, the objective function is upper bounded by the achiev-able sum rate using a feasible policy involving ai,k, pi,k, τk.Since a non-decreasing sequence that is upper bounded mustconverge, it follows that the alternating optimization algorithmabove converges to a local optimum.

C. Non-adaptive Optimization with Non-Causal CSI

In this section we propose a policy (albeit sub-optimal)where instead of optimizing the sensing time for each indi-vidual time slot, we use a constant (but optimized) sensingtime for all time slots. In this case, the analysis for thetransmission power and transferred energy (for a fixed sensingtime) remains the same as before. A locally optimum non-adaptive sensing time can be found by solving the following

optimization problem (where pi,k, Ti,m(k) are kept fixed):

max{τ}

M∑k=1

{D1(τ) +D2(τ)}

s.t.1

M

M∑k=1

T − τT

(1− Pd(τ))N∑i=1

pi,kgi,kai,k ≤ Qavg

0 ≤ τ ≤ Tk∑r=1

(ai,rpsτ + pi,r(T − τ)((1− Pd(τ))µ

+ (1− Pfa(τ))(1− µ))) ≤ Bi +k−1∑r=1

Hi,r

−k∑r=1

∑m∈NT,i

Ti,m(r) +k−1∑r=1

∑m∈NR,i

ηm,iTm,i(r) (28)

We can formulate a Lagrangian L∞ similar to (24), and solvefor a locally optimum τ∗ by solving ∂L∞

∂τ = 0. Feasibilityand convergence analysis of this setting can be carried out ina similar fashion to the adaptive case.

V. CAUSAL OPTIMIZATION WITH FINITE BATTERY

In this section, we consider the finite horizon achievablesum rate optimization problem under the more realistic as-sumption of causal CSI and ESI, and a finite battery setting.Similar to the previous section, this optimization can be donefor non-adaptive and adaptive sensing time both. For spacelimitations, we only consider the adaptive sensing time strategybelow.

A. Achievable Sum Rate Optimization with Causal CSI andESI

The throughput optimization in the context of causal CSIand ESI with finite battery is a stochastic control problemand can be solved by dynamic programming (DP) techniques.Note that in the case of DP, at each stage, it is natural toconsider the optimization over ai,k in an exhaustive search,along with optimization over discretized values of all thecontinuous optimization variables for facilitating numericalimplementation.

1) Information Pattern: In each time slot the FC receivesthe CSI between the PU transmitter and the SU receiversgk = {g1,k, g2,k, . . . , gN,k} causally with the assumption of apriori knowledge of the PU transmission levels and channelreciprocity. The CSI between the SU transmitter and the FChk = {h1,k, h2,k, . . . , hN,k} is also received at the FC bychannel training, estimation and feedback mechanisms. Thebattery state information of the individual SUs are collectedat the FC and collated as Bk = B1,k, B2,k, . . . , BN,k. Theinformation available at the kth time slot is given by the tupleJk = {gk,hk,Bk,Jk−1}.



11

2) Dynamic Programming Algorithm: We denote the im-mediate per-stage cost function for the stochastic controlproblem as

C(pi,k, τk) = D1(pi,k, τk) +D2(pi,k, τk)

−λ

{T − τkMT

(1− Pd(τk))N∑i=1

pi,kgi,k −Qavg

}(29)

Here, λ is the Lagrange parameter corresponding to theaverage interference constraint. D1 and D2 are defined in (7).

Combining all the other short term constraints on thetransmission power, shared energy and sensing time, and theenergy causality constraint, along with the binary search spaceof ai,k, we define the feasible set for the optimization variablesas:

A =

{(ai,k, pi,k, Ti,m(k), τk) : ai,k, pi,k, Ti,m(k), τk

satisfy(10), (11)

}For a fixed λ, the optimal values of transmission power,sensing time and energy shared can be determined by thefollowing theorem:

Theorem 1. If the initial condition is J1 = {g1,h1,B1},then the value of the finite horizon finite battery problem withcausal information is given by V1(g1,h1,B1), which can becomputed by the backward Bellman dynamic programmingequation:

Vk(gk,hk,Bk) = max(ai,k,pi,k,Ti,m(k),τk)∈A

[C(pi,k, τk)

+ E[Vk+1(gk+1,hk+1,Bk+1)|ai,k, pi,k, Ti,m(k), τk]],

k = 0, 1, . . . ,M − 1 (30)

where the terminal condition is given by VM (gM ,hM ,BM ) isgiven as the solution to the optimization problem (8)-(11) for asingle slot k = M with the causal knowledge of gM ,hM ,BM .

Proof. The proof can be obtained by the standard optimalityconditions for the finite horizon stochastic control problem[49], and is omitted for brevity.

The solution of the causal optimization problem, which canbe computed numerically, is obtained as:{

a?i,k, p?i,k, T

?i,m(k), τ?k

}= argmaxai,k,pi,k,Ti,m(k),τk∈A

[C(pi,k, τk) + E[Vk+1(gk+1,hk+1,Bk+1)|ai,k,pi,k, Ti,m(k), τk]] (31)

3) Sub-gradient Method for the Lagrange parameters: Theoptimal value of λ from (29) can be found by solving thefollowing equation:

λ

{E

{M∑k=1

T − τkMT

(1− Pd(τk))N∑i=1

pi,kgi,k

}−Qavg

}= 0

(32)

This is achieved by using a sub-gradient algorithm [36] byupdating the value of λ

λl+1 =

{λl − αl

{Qavg

− E

{M∑k=1

T − τkMT

(1− Pd(τk))N∑i=1

pi,kgi,k

}}}+

(33)

where l is the iteration index and αl is a scalar step sizeparameter for the lth iteration satisfying

∑∞l=1 α

l = ∞ and∑∞l=1(αl)2 <∞, and {x}+ = max {x, 0}.Note that the DP algorithm (31) and the subgradient based

update (33) are performed offline, iteratively until conver-gence, purely based on the statistics of the channel gainsand harvested energy information. Since there is no closedform expression to the optimal solutions, one needs to dis-cretize the search space for the continuous control variablespi,k, Ti,m(k) and τk, and the state variables gk,hk,Bk. Withthese discretized search space, the FC creates a lookup tablewhich is shared with all SUs. In real time the FC receives thecontinuous valued channel gains and battery states and checkfor the closest quantization point in its lookup table. The tableindex for the optimal sensing time, transmission power andenergy shared is fetched from the look-up table and sent tothe individual SUs, and then used by the SUs for sensing,information transmission and energy sharing.

VI. SUB-OPTIMAL POLICIES

In order to apply the dynamic programming algorithm fora fixed value of λ, we discretize both the state space parame-ters gk,hk,Bk and action space parameters pi,k, Ti,m(k), τk,where i = 1, 2, . . . , N and k = 1, 2, . . . ,M . If we considerA and S to be the sets of discretized action (control) spaceand state space respectively for a given time slot (consideringall the users together), then the complexity of the dynamicprogramming algorithm becomes O

((2N |A||S|)M

), where

the factor 2N comes from the binary variable ai,k for Nsecondary users. To combat this exponential complexity, in thissection we propose a number of low-complexity suboptimalpolicies.A. Limited Look-ahead Policy

Here we propose a limited 2-Horizon look-ahead policybased on the approximate dynamic programming literature[39]. For this policy, only a two horizon lookup table in thebackward Bellman algorithm is created. For any time slot, thecausal optimization is performed with the help of the abovementioned lookup table considering only one future time slotat a time.

B. Ad-Hoc Policy

Here we formulate a sub-optimal policy based on the nec-essary conditions determined in the Section IV for non-causalinformation with infinite battery and extend that approach tothe causal information scenario with finite battery capacity.To alleviate the complexity associated with optimizing theachievable sum-rate over all time slots, we implement a greedyalgorithm and formulate the optimization problem in terms of



12

individual time slots. The optimization of the sensing timeτk is done through a discretization process similar to the DPalgorithm. The parameter corresponding to the decision tosense ai,k is determined using the heuristic policy proposedin Section III.B. The simplified optimization problem overindividual time slots becomes:

maxpi,k

µ

M

T − τkT

(1− Pd(τk)) log2

{1 +


in

}

+1− µM

T − τkT

(1− Pfa(τk)) log2

{1 +

∑Ni=1 pi,khi,kσ2

}

s.t.T − τkT

(1− Pd(τk))N∑i=1

pi,kgi,k ≤ Qavg (34)

0 ≤ pi,k ≤ Pmax∀i, k; 0 ≤ τk ≤ T ∀k; 0 ≤ Ti,m(k) ∀i,m, k(ai,kpsτk + pi,k(T − τk)

{(1− Pd(τk))µ+ (1− Pfa(τk))(1− µ)}) ≤ Bi,k (35)

We can solve this problem using the same approach as in[19]. Using the techniques used in Section IV, we concludethe following:

• Only one SU in the set {1, 2, . . . , N} would be able totransmit power from the open set (0, Pmax). All the otherSUs either transmit with Pmax or do not transmit in thetime slot under consideration.

• If i 6= j, and pi,k > 0 and pj,k = 0, then hi,kλ?µgi,k+δ∗i

≥hi,k

λ?µgi,k+δ∗j, where λ? and δ∗i are the optimal Lagrange

parameters corresponding to (34) and (35), respectively.• The above inequality also holds for the scenario whenpi,k = Pmax and pj,k ∈ [0, Pmax).

Thus we infer that for such choices of i and j , the followingcondition holds:{

gj,khj,k− gi,khi,k

}≥ 1

λ?µ

{δ?ihi,k−

δ?jhj,k

}

If we fix λ? > 0 and δ?j = 0 then we can have hi,kgi,k≥ hj,k

gj,k.

Using the above mentioned policy we sort all the SUs ac-cording to the descending order of the ratio of their direct andinterference channel gains. Starting with the SU correspondingto the highest ratio, we determine the SU transmission powerof the according to the following expression:

p?i,k = min

{Pmax,

TQavg(T − τk)gi,k(1− Pd(τk))

,

Bi,k − psτk(T − τk) {(1− Pd(τk))µ+ (1− Pfa(τk))(1− µ)}

}(36)

If there is still energy left in the battery after the transmissionprocess, the rest of the energy is shared to the SU withhighest µhµg , ensuring that there is no battery overflow [25]. The

optimal transmission energy shared is given by the followingexpression:

T ?i,m(k) = min

{max

{{Bmax −Bm,k + Em,k

−2µHm

}/ηi,m; 0

};Bi,k

}(37)

C. Computational complexity

Heuristic Policy for ai,k: The proposed heuristic policy forfinding the spectrum access decision variable ai,k ensures thatindividual SUs can determine their corresponding spectrumaccess decision ai,k in a decentralized fashion. Comparedto the optimal policy, no look-up table is required in theFC for specifically optimizing ai,k for the heuristic policy.Thus in this scenario, the value of ai,k can be instantaneouslycomputed as the indicator variable IBi,k≥psτl , resulting in acomputational complexity of O(1) for a given i, k. However,for optimizing the rest of the optimization variables like τk,pi,k and Ti,m(k) look up table needs to be created and storedin FC as usual via dynamic programming, resulting in an ex-ponential complexity of O

((|A||S|)M

). Therefore the overall

complexity of the heuristic algorithm is O(MN(|A||S|)M

).

Limited Look ahead Policy: For this sub-optimal policy,the look up table is only stored for two time slots. Thus the cor-responding computational complexity is O(M(2N |A||S|)2).Ad-hoc Policy: For the ad-hoc policy, the value of thetransmission power pi,k, Ti,m(k) are obtained by the closedform expressions in (36-37), for each k, while ai,k is computedusing the heuristic policy and τk is computed using a set ofdiscretized values, say of cardinality |Aτ |. In this case, thecomplexity is O(MN2(|Aτ ||S|)M ).

VII. SIMULATION RESULTS

In this section we present some numerical results to illus-trate the performance of the various algorithms consideredin the previous sections. We assume the energy harvestingprocess at the SUs is an exponentially distributed randomprocess with a mean of 1 µ J, unless otherwise stated. The PUactivity probability is set to µ = 0.8 (we also present someresults for µ = 0.2 later in this section). The sensing channelsignal to noise ratio (SNR) is assumed to be −15dB. The PUsignal variance is taken to be σ2

x = 1 mW. The length of atime slot is taken as T = 2 ms, and the individual mini-slotlength is 0.001 ms. The probability of false alarm limit of theheuristic policy κ is taken to be 0.1. The sampling frequencyis assumed to be 1 MHz and the normalized threshold ofdetection is assumed to be ε

σ2n

= 1.006. This corresponds toa minimum sensing time limit τl = 0.05 ms. This constrainton the minimum sensing time has been applied only to theheuristic policy based methods.

We assume a secondary network with N = 2 users andwith a maximum transmission power of 10 mW each. ThePU interference and noise variance for the channel betweenthe SUs and the FC is taken to be σ2

in = σ2 = 4 × 10−10

W. Assuming the maximum average received SNR at the FC



13

of approximately 5 dB (with the primary interference active),this corresponds to a path loss of approximately −87 dB witha path loss exponent of 2.7 over a distance of 100 metresbetween the SU and the FC, using the simplified path lossmodel from [p. 46, [50]], where the reference distance forthe antenna far field is 10 m, and transmission frequency at2.4 GHz. The energy transfer efficiency for energy sharing isassumed to be 0.4. The average interference limit (normalizedby the path loss factor between the SUs and the PU receiver)Qavg = 5 mW. The initial battery level for each SU is assumedto be 0.4 µ J. For the dynamic programming simulationthe state space parameters gk,hk,Hk and the action spaceparameters pi,k, Ti,m(k), τk are quantized into 5 differentdiscrete levels.

In Fig.3, we plot the average sensing time (averaged overthe time horizon) for the non-causal CSI and ESI denoted byτavg with respect to the average harvested energy denoted byµH , keeping the mean of the channel mean gains denoted byµg and µh constant, for M = 2, 3, 4. The averages are takenover 50 Monte-Carlo simulations. From Fig.3, it is evidentthat the average sensing time τavg decreases monotonicallyby increasing the length of the horizon. As a numericalcomparison, for mean harvested energy µH = 3.5µJ , thenormalized average sensing times corresponding to M = 3and M = 4 are 12.1 and 31.7 percent less compared toM = 2 respectively. This is due to the fact that increasing thelength of horizon M in the non-causal CSI and battery statescenario helps to spread out the sensing time over multipletime slots, which means on average the transmission timeincreases and the sensing time decreases with increasing M .A similar behaviour of τavg is observed when plotted againstincreasing µh/µg , keeping µH fixed, and hence not shownhere.

Fig.4 compares the average throughput with respect tothe ratio of mean channel gain µh/µg keeping µH fixedfor horizon length M = 2 and M = 3 with non-causalinformation. The spectrum sensing decision variable is opti-mized using both the exhaustive search and a MINLP solvercalled MIDACO [51]. From the figure, it is noticeable thatboth approaches result in approximately the same throughput.Although the MINLP solver is computationally faster, neitherof these methods is scalable for moderate to large values ofM,N .. Fig.5 shows the average throughput plotted against thebattery capacity Bmax for the adaptive and the non-adaptiveoptimization policy with non-causal and causal information.These simulations are averaged over 100 Monte-Carlo iter-ations. We can notice that for Bmax = 1.4µJ the averagethroughput corresponding to causal non-adaptive policy is 17.5percent less compared to its adaptive counterpart. As expected,the adaptive policy out-performs its non-adaptive counterpartat the expense of higher computational complexity.

Fig.6 shows the optimized average throughput, i.e. averagethroughput plotted against battery capacity Bmax with non-causal CSI and battery state scenario with the exhaustivesearch technique (for the decision to sense), causal CSI andESI with exhaustive search and heuristic policy (for thedecision to sense). It is noticeable that for M = 4 andBmax = 1.4µJ the average throughput corresponding to

causal adaptive policy and heuristic policy are 5 and 11.7 per-cent less compared to its non-causal counterpart, respectively.As expected the average throughput increases with increasinghorizon length and non-causal CSI/ESI scenario provides anupper bound for the causal counterpart. As the heuristic policyis less computationally complex, we are able to simulate theaverage sum throughput with causal information with thispolicy for horizon M = 10, 15, 20 in Fig. 9. As expected, it isseen that the average sum throughput increases with increasinghorizon length.

3.5 3.6 3.7 3.8 3.9 4 4.1 4.2 4.3 4.4

Average Harvested Energy H

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

0.22

No

rma

lize

d A

ve

rag

e S

en

sin

g t

ime

avg/T

Horizon M=2

Horizon M=3

Horizon M=4

Fig. 3: Normalized average sensing time τavg vs Averageharvested energy µH with ratio of SU Tx-FC Rx and SU Tx-PU Rx average channel gain µh/µg fixed.

3.5 3.6 3.7 3.8 3.9 4 4.1 4.2 4.3 4.4

Mean channel gain ratio h/

g

0.5

0.55

0.6

0.65

0.7

0.75

0.8

0.85

0.9

Ave

rag

e T

hro

ug

hp

ut

(Bits/s

ec/H

z)

Horizon M=3 Exhaustive Serach Optimization

Horizon M=3 Optimization with MINLP solver

Horizon M=2 Exhaustive Serach Optimization

Horizon M=2 Optimization with MINLP solver

Fig. 4: Average Throughput vs ratio of SU Tx-FC Rx and SUTx-PU Rx average channel gain µh/µg with average harvestedenergy µH fixed with Exhaustive Search and MINLP solver

We have compared the policies proposed in our previouswork in [39] with the ones introduced in this present workin Fig.7. The figure includes plots for average throughputwith respect to battery capacity Bmax for Non-causal, causaland heuristic policy with horizon length M = 4. The plots



14

0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4

Bmax

( J)

0.15

0.2

0.25

0.3

0.35

0.4

0.45A

ve

rag

e T

hro

ug

hp

ut

(Bits/s

ec/H

z)

M=4 Non-Causal Adaptive Policy

M=4 Non-Causal Non-Adaptive Policy

M=4 Causal Adptive Policy

M=4 Causal Non-Adaptive policy

Fig. 5: Average Throughput vs Battery capacity Bmax forAdaptive and Non-Adaptive Policy for Horizon length M=4.

0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4

Bmax

( J)

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

Avera

ge T

hro

ughput (B

its/s

ec/H

z)

M=4 Non-Causal Policy

M=4 Causal Policy

M=4 Heuristic Policy


M=3 Causal Policy



M=2 Causal Policy

M=2 Heusristic Policy

Fig. 6: Average Throughput vs Battery capacity Bmax Non-Causal, Causal and Heuristic Policy.

show that the policies introduced in this work outperformstheir counterpart policies introduced in [39]. For a numericalcomparison, for Bmax = 0.8µJ , the average throughputcorresponding to the causal policy introduced in [39] is 13.3percent less than its counterpart introduced in this paper. Thisis due to the fact that in our previous work, one additionaloptimization variable was spectrum sensing decision θk, whichcould only take binary values, whereas in our present work,the problem formulation includes parameters like probabilityof detection Pd and probability of false alarm Pfa, whichare continuous functions of sensing time parameter τk. Wehave also simulated comparative plots for Non-causal, causaland heuristic policy when the PU traffic is comparatively lightin Fig.8. For this figure, we have assumed the PU activityprobability to be µ = 0.2. The plots show that the averagethroughput for this scenario is higher compared to the scenarioinvolving µ = 0.8. This is due to the fact that PU spectrum

0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4

Bmax

( J)

0.15

0.2

0.25

0.3

0.35

0.4

0.45

Ave

rag

e T

hro

ug

hp

ut

(Bits/s

ec/H

z)

M=4 Non-Causal Proposed Policy

M=4 Causal Proposed Policy

M=4 Heuristic Proposed Policy

M=4 Non-Causal Policy in [39]

M=4 Causal Policy in [39]

M=4 Hesuristic Policy in [39]

Fig. 7: Average Throughput comparison between Proposedpolicy and the policy in [38] for Non-Causal, Causal andHeuristic Policy vs Battery Capacity Bmax for Horizon lengthM=4.

0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4

Bmax

( J)

0.25

0.3

0.35

0.4

0.45

0.5

0.55

0.6

0.65

0.7

0.75

Ave

rag

e T

hro

ug

hp

ut

(Bits/s

ec/H

z)


M=4 Causal Policy


Fig. 8: Average throughput for Non-Causal, Causal andHeuristic Policy vs Battery capacity Bmax for Horizon lengthM=4 and PU activity probability µ = 0.2.

availability for µ = 0.2 is much more significant compared tothe case when µ = 0.8.

Fig.10 shows the ratio of average throughput betweenheuristic and non-causal policy for horizon length M = 4with respect to varying level of battery capacity Bmax. Wealso provide an upper bound based on a feasible policy for thenon-causal case for this ratio of average throughput betweenthe heuristic and the non-causal policy. We assume that thesensing time for every time slot to be τk = τl. We also assumethat no energy is shared between the SUs. Assuming that eachSU transmits the same amount of power at every time instant,we can derive the following expression as an upper bound of



15

0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6

Bmax

( J)

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1A

ve

rag

e T

hro

ug

hp

ut

(Bits/s

ec/H

z)

Horizon M=10 Heuristic Policy



Fig. 9: Average Throughput vs Battery capacity Bmax Heuris-tic Policy for Horizon length M=10,15,20.

0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4

Bmax

( J)

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Th

rou

gh

pu

t R

atio

be

twe

en

He

uristic a

nd

No

n-c

au

sa

l P

olic

y

A Upper Bound on Throughput Ratio

Simulated Throughput Ratio

A Lower Bound on Throughput Ratio

Fig. 10: Throughput Ratio between Heuristic and Non-CausalPolicy vs Battery Capacity Bmax for Horizon length M=4.

pi,k.

pi,k ≤ min

{Pmax,

Qavg

N T−τlT (1− Pd(τl))

∑Mk=1

∑Ni=1 gi,k

,

min{Bmax, Bi +

∑k−1r=1 Hi,r − psτl(

∑kr=1 ai,r)

}k(T − τl)((1− Pd(τl))µ+ (1− Pfa(τl))(1− µ)

}(38)

We also assume that at every time slot at least one SU decidesto sense, the probability of which is very high when thenumber of SUs becomes large. This allows us to use thetightest upper bound which satisfies all such MN constraintsin (38) for the transmission by all SUs at all time slots. Wepropose a lower bound on the throughput ratio, by usingthe upper bound expression for the non-causal policy fromequation (27). The expression in (27) assumes that the inter-ference constraint is absent from the optimization problem,

0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4

Bmax

( J)

0.2

0.22

0.24

0.26

0.28

0.3

0.32

0.34

0.36

0.38

0.4

Avera

ge T

hro

ughput (B

its/s

ec/H

z)

Horizon M=4 Causal Adptive Policy

Horizon M=4 Causal Limited Lookahead Policy

Horizon M=4 Causal Ad-hoc Policy

Fig. 11: Average Throughput vs Battery capacity Bmax Hori-zon length M=4 with Causal information for Adaptive, Limitedlook-ahead and Ad-Hoc Policy

0.2 0.3 0.4 0.5 0.6 0.7 0.8

Energy Transfer Efficiency

0.8

0.85

0.9

0.95

1

1.05

1.1

Avera

ge T

hro

ughput (B

its/s

ec/H

z)

Horizon M=4 Non Causal Adaptive Policy

Horizon M=4 Causal Adaptive Policy


Fig. 12: Average Throughput vs Energy Transfer Efficiency ηHorizon length M=4 with Non-Causal, Causal and HeuristicPolicy.

resulting in an enlarged feasible set of transmission powerand sensing time. Thus the corresponding lower bound israther loose compared to its upper bound counterpart. Thethroughput ratio between the Heuristic and Non-causal policyincreases gradually with battery capacity Bmax, because asthe battery capacity Bmax increases, on average the amountof energy that can be stored without overflow in the batteryalso increases. This further makes the optimal solutions fornon-causal scenario less superior compared to its heuristiccounterpart, as both policies have more energy at their disposalfor the sensing, transmission and energy sharing process.

In Fig. 11, we compared the average throughput with respectto the battery capacity Bmax with causal information andadaptive sensing time, limited look-ahead policy and ad-hocpolicy. From the simulation, we notice that for M = 4 andBmax = 1.4µJ the average sum throughput correspondingto causal limited look-ahead policy and ad-hoc policy are



16

13.7 and 20 percent lower compared to its causal adaptivecounterpart respectively. Between the limited look-ahead andad-hoc policy the first one is more computationally complex,but performs better in terms of average sum throughput,which again illustrates the trade-off between complexity andperformance.

In Fig.12, we use an asymmetric energy harvesting andchannel gain model. This situation is different compared toall the previous figures, where we assumed that the meanchannel gains µg and µh and mean harvested energy µH arethe same for the two SUs. Intuitively, SUs do not have anyincentive to share energy with their neighbours if their CSIand ESI are symmetric. We assume that the mean channelgain between the SU transmitters and the PU receiver, µg = 1for both SUs, but the mean channel gain between the SUtransmitters and the FC is different for different SUs. Wetake µ1

h = 1 and µ2h = 10. The harvested energy mean is

taken as µ1H = 10µJ and µ2

H = 1µJ . The battery capacityis assumed to be Bmax = 1.5µJ . The figure shows thatthe average sum throughput indeed increases with increasingenergy transfer efficiency, indicating an increased incentive toshare energy in the asymmetric case. This increase is moreprominent in the non-causal and causal dynamic programminggraphs as compared to the heuristic policy, where the increaseis marginal in nature.

VIII. CONCLUSIONS

This paper deals with a mixed integer non-linear pro-gramming problem of maximizing a finite horizon expectedachievable sum rate of a cognitive multiple access channelconsisting of energy harvesting SUs under long term averageinterference constraints at the PU with a cooperative spectrumsensing model, where the SUs are also capable of directionalenergy sharing with their neighbours. We first investigatedthis optimization problem with non-causal CSI and infinitebattery settings, where the integer variable for decision tosense is optimized by an exhaustive search approach whenthe horizon length and number of SUs are small, and a lesscomputationally complex heuristic policy in the case of longerhorizon lengths and moderate to large number of SUs. Inthe non-causal case, we also provided analytical solutionsfor the sensing time, transmission power, energy shared. Theperformance in the non-causal scenario is used to providean upper bound on the performance of its causal CSI (andESI) counterpart. The causal CSI and ESI scenario withfinite battery setting is handled using dynamic programmingtechniques. To combat the curse of dimensionality of dynamicprogramming, we also propose two sub-optimal policies forthe scenario involving causal information, namely limited look-ahead and ad-hoc policy. Through simulations we foundthat the energy sharing is only beneficial in the situationinvolving asymmetric energy harvesting and channel conditionscenarios, i.e, when some SU is harvesting significantly lesson average compared to other SUs, but has a better channelfor data transmission.

APPENDIX APROOF OF LEMMA 1

Proof. Since p?j,k = 0 and p?i,k > 0 from the KKT conditionit follows that δ?i,k = 0, α?j,k = 0, and δ?j,k ≥ 0, α?jik ≥ 0.Using the KKT optimality conditions, it can be deduced aftersome basic algebra that:

(T − τk)

MT log 2hi,k

{µ(1− Pd(τk))

σ2 + σ2in +

∑Nl=1 p

?l,khl,k

+(1− µ)(1− Pfa(τk))

σ2 +∑Nl=1 p

?l,khl,k

}≥ T − τk

MTλgi,k(1− Pd(τk))

+(T−τk) {(1− Pd(τk))µ+ (1− Pfa(τk))(1− µ)}M∑r=k

β?i,r

(39)

(T − τk)

MT log 2hj,k

{µ(1− Pd(τk))

σ2 + σ2in +

∑Nl=1 p

?l,khl,k

+(1− µ)(1− Pfa(τk))

σ2 +∑Nl=1 p

?l,khl,k

}≤ T − τk

MTλgj,k(1− Pd(τk))

+(T−τk) {(1− Pd(τk))µ+ (1− Pfa(τk))(1− µ)}M∑r=k

β?j,r

(40)

The above two equations imply the required result.

APPENDIX BPROOF OF LEMMA 2

Proof. First, we need the following lemma:Lemma 3: The optimal solution of the problem has at mostone user indexed by i that satisfies 0 < p?i,k < Pmax wherei = π(|I|), and the following condition must hold for theoptimal transmission power:

dπ(|I|),k

λeπ(|I|),k + fπ(|I|),k=

{µ(1− Pd(τk))

σ2 + σ2in +

∑|I|c=1 Pmaxhπ(c),k

+(1− µ)(1− Pfa(τk))

σ2 +∑|I|c=1 Pmaxhπ(c),k

}−1(41)

Proof. (By contradiction) We assume that there exist two usersi and j such that with 0 < p?i,k < Pmax and 0 < p?j,k < Pmax.From the KKT condition we determine that δ?i,k = δ?j,k =0 and α?i,k = α?j,k = 0 respectively. Using these values wecan deduce: di,k

λei,k+fi,k=

dj,kλej,k+fj,k

. Since hi,k and gi,k’s areindependent of hj,k and gj,k’s and they are drawn from acontinuous distribution and λ is constant, it can be inferred thatthe above equality is satisfied with a probability of measurezero. Thus we can deduce that there is at most one user (say



17

user i) with 0 < p?i,k < Pmax. Thus the following expressionholds for transmission power for ith SU:

di,kλei,k + fi,k

=

{µ(1− Pd(τk))

σ2 + σ2in +


+(1− µ)(1− Pfa(τk))


}−1(42)

Using the KKT conditions it is easy to check that for anyuser z ∈ I, z 6= i, with p?z,k > 0 must satisfy dz,k

λez,k+fz,k≥

di,kλei,k+fi,k

. Thus it follows that i = π(|I|).

From this result, we can now infer that there are only twopossible sets of solutions for p?i,k, k ∈ I:

• Case I: p?π(a),k = Pmax, a = 1, 2, . . . , |I|.• Case II: p?π(a),k = Pmax, a = 1, 2, . . . , |I| − 1

The expression for p?π(|I|),k can be solved by solving thefollowing non-linear equation:

{µ(1− Pd(τk))

ρ1 + p?π(|I|),khπ(|I|),k+

(1− µ)(1− Pfa(τk))

ρ2 + p?π(|I|),khπ(|I|),k

}−1= ω

(43)where

ρ1 = σ2 + σ2in +

|I|−1∑c=1

Pmaxhπ(c),k

ρ2 = σ2 +

|I|−1∑c=1

Pmaxhπ(c),k

ω =dπ(|I|),k

λeπ(|I|),k + fπ(|I|),k

Since p?π(|I|),k ≤ Pmax, thus we can write the optimaltransmission power as p?π(|I|),k = min{Pmax, P ∗k }, where P ∗kis the feasible solution to the quadratic equation (43).

All that is required now is to show the fact that optimalnumber of active users |I| is the largest value of x such that :

dπ(x),k

λeπ(x),k + fπ(x),k>

{µ(1− Pd(τk))

σ2 + σ2in +

∑x−1c=1 Pmaxhπ(c),k

+(1− µ)(1− Pfa(τk))

σ2 +∑x−1c=1 Pmaxhπ(c),k

}−1(44)

It can be shown that both the case I and case II, for any SUπ(b), b = 1, . . . , |I|, the above inequality holds. Since fromLemma 1, it can be said that its left hand side decreases with x,while the right hand side increases with x, thus it is sufficientto show that the inequality holds for b = |I|. Thus in this case

δ?π(|I|),k = 0 and α?π(|I|),k ≥ 0, we have

dπ(|I|),k

λeπ(|I|),k + fπ(|I|),k≥

{µ(1− Pd(τk))

σ2 + σ2in +

∑|I|c=1 p

?|I|,khπ(c),k

+

(1− µ)((1− Pfa(τk)))

σ2 +∑|I|c=1 p

?|I|,khπ(c),k

}−1

>

{µ(1− Pd(τk))

σ2 + σ2in +

∑|I|−1c=1 Pmaxhπ(c),k

+(1− µ)(1− Pfa(τk))

σ2 +∑|I|−1c=1 Pmaxhπ(c),k

}−1Next we have to show that for any user π(j), j = |I| +1, . . . ,M , the inequality does not hold. Again it is sufficientto show that it does not hold for π(|I| + 1). For that userδ?π(|I|+1),k ≥ 0 and α?π(|I|+1),k = 0, we have

dπ(|I|+1),k

λeπ(|I|+1),k + fπ(|I|+1),k≤

{µ(1− Pd(τk))

σ2 + σ2in +

∑|I|c=1 p

?|I|,khπ(c),k

+(1− µ)(1− Pfa(τk))

σ2 +∑|I|c=1 p

?|I|,khπ(c),k

}−1

≤

{µ(1− Pd(τk))

σ2 + σ2in +


+(1− µ)(1− Pfa(τk))


}−1Thus we can determine |I|, and the proof is complete.

REFERENCES

[1] “Report of the Spectrum Efficiency Working Group.” FCC, Tech. Rep.,Nov. 2002.

[2] S. Kim, J. Lee, H. Wang, and D. Hong, “Sensing performance of energydetector with correlated multiple antennas,” IEEE Signal ProcessingLetters, vol. 16, no. 8, pp. 671–674, Aug 2009.

[3] S. Haykin, “Cognitive radio: brain-empowered wireless communica-tions,” IEEE Journal on Selected Areas in Communications, vol. 23,no. 2, pp. 201–220, Feb 2005.

[4] T. Yucek and H. Arslan, “A survey of spectrum sensing algorithms forcognitive radio applications,” IEEE Communications Surveys Tutorials,vol. 11, no. 1, pp. 116–130, 2009.

[5] K. Cichon, A. Kliks, and H. Bogucka, “Energy-efficient cooperativespectrum sensing: A survey,” IEEE Communications Surveys and Tuto-rials, vol. 18, no. 3, pp. 1861–1886, 2006.

[6] C. Han, T. Harrold, S. Armour, I. Krikidis, S. Videv, P. M. Grant,H. Haas, J. Thompson, I. Ku, C.-X. Wang, T. A. Le, M. Nakhai,J. Zhang, and L. Hanzo, “Green radio: radio techniques to enable energy-efficient wireless networks,” IEEE Communications Magazine, vol. 49,no. 6, pp. 46–54, June 2011.

[7] L. X. Cai, H. V. Poor, Y. Liu, T. H. Luan, X. Shen, and J. W. Mark,“Dimensioning network deployment and resource management in greenmesh networks,” IEEE Wireless Communications, vol. 18, no. 5, pp.58–65, October 2011.

[8] S. Ulukus, A. Yener, E. Erkip, O. Simeone, M. Zorzi, P. Grover, andK. Huang, “Energy harvesting wireless communications: A review ofrecent advances,,” IEEE J. on Sel. Areas in Commun., vol. 33, no. 3,pp. 360–381, March 2015.

[9] J. Yang and S. Ulukus, “Optimal packet scheduling in an energy harvest-ing communication system,” IEEE Transactions on Communications,vol. 60, no. 1, pp. 220–230, January 2012.

[10] K. Tutuncuoglu and A. Yener, “Optimum transmission policies forbattery limited energy harvesting nodes,” IEEE Transactions on WirelessCommunications, vol. 11, no. 3, pp. 1180–1189, March 2012.

[11] H. Li, N. Jaggi, and B. Sikdar, “Relay scheduling for cooperativecommunications in sensor networks with energy harvesting,” IEEETransactions on Wireless Communications, vol. 10, no. 9, pp. 2918–2928, September 2011.



18

[12] M. A. Antepli, E. Uysal-Biyikoglu, and H. Erkal, “Optimal packetscheduling on an energy harvesting broadcast link,” IEEE Journal onSelected Areas in Communications, vol. 29, no. 8, pp. 1721–1731,September 2011.

[13] H. Huang and V. K. N. Lau, “Decentralized delay optimal control forinterference networks with limited renewable energy storage,” IEEETransactions on Signal Processing, vol. 60, no. 5, pp. 2552–2561, May2012.

[14] J. Yang and S. Ulukus, “Optimal packet scheduling in a multipleaccess channel with rechargeable nodes,” in 2011 IEEE InternationalConference on Communications (ICC), June 2011, pp. 1–5.

[15] S. Lee, R. Zhang, and K. Huang, “Opportunistic wireless energyharvesting in cognitive radio networks,” IEEE Transactions on WirelessCommunications, vol. 12, no. 9, pp. 4788–4799, Sep. 2013.

[16] O. Elnahas, M. Elsabrouty, O. Muta, and H. Furukawa, “Game theoreticapproaches for cooperative spectrum sensing in energy-harvesting cog-nitive radio networks,” IEEE Access, vol. 6, pp. 11 086–11 100, 2018.

[17] H. S. Lee, M. E. Ahmed, and D. I. Kim, “Optimal spectrum sensingpolicy in RF-powered cognitive radio networks,” IEEE Transactions onVehicular Technology, vol. 67, no. 10, pp. 9557–9570, Oct 2018.

[18] J. Ren, J. Hu, D. Zhang, H. Guo, Y. Zhang, and X. Shen, “RF energyharvesting and transfer in cognitive radio sensor networks: Opportunitiesand challenges,” IEEE Communications Magazine, vol. 56, no. 1, pp.104–110, Jan 2018.

[19] R. Zhang, S. Cui, and Y. C. Liang, “On ergodic sum capacity of fadingcognitive multiple-access and broadcast channels,” IEEE Transactionson Information Theory, vol. 55, no. 11, pp. 5161–5178, Nov 2009.

[20] D. Xu and Q. Li, “Joint power control and time allocation for wirelesspowered underlay cognitive radio networks,” IEEE Wireless Communi-cations Letters, vol. 6, no. 3, pp. 294–297, June 2017.

[21] Y. C. Liang, Y. Zeng, E. C. Y. Peh, and A. T. Hoang, “Sensing-throughput tradeoff for cognitive radio networks,” IEEE Transactionson Wireless Communications, vol. 7, no. 4, pp. 1326–1337, April 2008.

[22] O. Ozel and S. Ulukus, “Information-theoretic analysis of an energyharvesting communication system,” in 2010 IEEE 21st InternationalSymposium on Personal, Indoor and Mobile Radio CommunicationsWorkshops, Sept 2010, pp. 330–335.

[23] S. Park and D. Hong, “Achievable throughput of energy harvesting cog-nitive radio networks,” IEEE Transactions on Wireless Communications,vol. 13, no. 2, pp. 1010–1022, February 2014.

[24] S. Park, H. Kim, and D. Hong, “Cognitive radio networks with energyharvesting,” IEEE Transactions on Wireless Communications, vol. 12,no. 3, pp. 1386–1397, March 2013.

[25] S. Knorn, S. Dey, A. Ahlen, and D. E. Quevedo, “Distortion mini-mization in multi-sensor estimation using energy harvesting and energysharing,” IEEE Transactions on Signal Processing, vol. 63, no. 11, pp.2848–2863, June 2015.

[26] Y. H. Bae and J. W. Baek, “Sensing strategy exploiting channel memoryin cr network with rf energy harvesting,” IEEE Communications Letters,vol. 22, no. 12, pp. 2539–2542, Dec 2018.

[27] S. Yin, Z. Qu, and S. Li, “Achievable throughput optimization in energyharvesting cognitive radio systems,” IEEE Journal on Selected Areas inCommunications, vol. 33, no. 3, pp. 407–422, March 2015.

[28] Z. Li, B. Liu, J. Si, and F. Zhou, “Optimal spectrum sensing intervalin energy-harvesting cognitive radio networks,” IEEE Transactions onCognitive Communications and Networking, vol. 3, no. 2, pp. 190–200,June 2017.

[29] A. Celik, A. Alsharoa, and A. E. Kamal, “Hybrid energy harvesting-based cooperative spectrum sensing and access in heterogeneous cogni-tive radio networks,” IEEE Transactions on Cognitive Communicationsand Networking, vol. 3, no. 1, pp. 37–48, March 2017.

[30] A. Celik and A. E. Kamal, “Multi-objective clustering optimizationfor multi-channel cooperative spectrum sensing in heterogeneous greencrns,” IEEE Transactions on Cognitive Communications and Network-ing, vol. 2, no. 2, pp. 150–161, June 2016.

[31] ——, “Green cooperative spectrum sensing and scheduling in hetero-geneous cognitive radio networks,” IEEE Transactions on CognitiveCommunications and Networking, vol. 2, no. 3, pp. 238–248, Sep. 2016.

[32] T. Zhang, W. Chen, and F. Yang, “Balancing delay and energy efficiencyin energy harvesting cognitive radio networks: A stochastic stackelberggame approach,” IEEE Transactions on Cognitive Communications andNetworking, vol. 3, no. 2, pp. 201–216, June 2017.

[33] G. Han, J. K. Zhang, and X. Mu, “Joint optimization of energyharvesting and detection threshold for energy harvesting cognitive radionetworks,” IEEE Access, vol. 4, pp. 7212–7222, 2016.

[34] F. Mohammadian, Z. Pourgharehkhan, A. Taherpour, and T. Khattab,“Optimal collaborative energy harvesting spectrum sensing with limitedtime resource,” in 2016 IEEE Wireless Communications and NetworkingConference, April 2016, pp. 1–7.

[35] X. Liu, W. Lu, L. Ye, F. Li, and D. Zou, “Joint resource allocation ofspectrum sensing and energy harvesting in an energy-harvesting-basedcognitive sensor network,” Sensors, vol. 17, no. 3, 2017. [Online].Available: http://www.mdpi.com/1424-8220/17/3/600

[36] X. Liu, W. Lu, F. Li, M. Jia, and X. Gu, “Optimal simultaneous multislotspectrum sensing and energy harvesting in cognitive radio,” in 2016IEEE Global Communications Conference (GLOBECOM), Dec 2016,pp. 1–6.

[37] Z. Wang, V. Aggarwal, and X. Wang, “Iterative dynamic water-filling forfading multiple-access channels with energy harvesting,” IEEE Journalon Selected Areas in Communications, vol. 33, no. 3, pp. 382–395,March 2015.

[38] ——, “Power allocation for energy harvesting transmitter with causalinformation,” IEEE Transactions on Communications, vol. 62, no. 11,pp. 4080–4093, Nov 2014.

[39] S. Biswas, A. Shirazinia, and S. Dey, “Sensing throughput optimizationin cognitive fading multiple access channels with energy harvestingsecondary transmitters,” in 2016 24th European Signal ProcessingConference (EUSIPCO), Aug 2016, pp. 577–581.

[40] X. Chen, C. Yuen, and Z. Zhang, “Wireless energy and informationtransfer tradeoff for limited-feedback multiantenna systems with energybeamforming,” IEEE Transactions on Vehicular Technology, vol. 63,no. 1, pp. 407–412, Jan 2014.

[41] L. Liu, R. Zhang, and K.-C. Chua, “Multi-antenna wireless poweredcommunication with energy beamforming,” IEEE Transactions on Com-munications, vol. 62, no. 12, December 2014.

[42] C. Floudas, Nonlinear and Mixed-Integer Optimization. Fundamentalsand Applications. Oxford, UK: Oxford University Press, 1995.

[43] K. G. Murty and S. N. Kabadi, “Some NP-complete problems inquadratic and nonlinear programming,” Mathematical Programming,vol. 39, no. 2, pp. 117–129, June 1987.

[44] S. Atapattu, C. Tellambura, and H. Jiang, “Energy detection basedcooperative spectrum sensing in cognitive radio networks,” IEEE Trans-actions on Wireless Communications, vol. 10, no. 4, pp. 1232–1241,April 2011.

[45] X. Kang, Y. Liang, A. Nallanathan, H. K. Garg, and R. Zhang, “Opti-mal power allocation for fading channels in cognitive radio networks:Ergodic capacity and outage capacity,” IEEE Transactions on WirelessCommunications, vol. 8, no. 2, pp. 940–950, Feb 2009.

[46] T. Cover, R. McEliece, and E. Posner, “Asynchronous multiple-accesschannel capacity,” IEEE Transactions on Information Theory, vol. 27,no. 4, pp. 409–413, July 1981.

[47] J. Lee and S. Leyffer, Eds., Mixed Integer Nonlinear Programming.New York: Springer Science, 2012.

[48] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge, UK:Cambridge University Press, 2004.

[49] D. P. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed.Athena Scientific, 2005, vol. I.

[50] A. Goldsmith, Wireless Communications. New York: CambridgeUniversity Press, 2005.

[51] M. Schlueter, S. O. Erb, M. Gerdts, S. Kemble, and J.-J. Ruckmann,“MIDACO on MINLP space applications,” Advances in Space Research,vol. 51, no. 7, pp. 1116–1131, 2013.

Documents

Sum Throughput Maximization in a Cognitive Multiple Access ... · throughput optimization in a fading multiple access cognitive radio (CR) network, where the secondary user (SU) transmitters