
RESEARCH Open Access

Dynamic position changeable Stackelberg game for user-provided network control algorithms

Sungwook Kim

Abstract

Nowadays, wireless mobile services have been going through a paradigm shift due to three reasons: (i) the increasing needs for ubiquitous connectivity, (ii) an unprecedented volume of mobile data traffic, and (iii) technologically advanced mobile devices with enhanced capabilities. To address these challenges, the user-provided network (UPN) is an emerging technology that can be extensively deployed to provide substantial improvements in cellular coverage and capacity. In this study, we focus on the design of a novel UPN control scheme based on game theory. Motivated by the Stackelberg game model, our proposed scheme allows mobile devices to play changeable roles, i.e., hosts or clients, to improve the UPN system performance; this makes it a suitable approach under dynamically changing UPN environments. Operating in a decentralized, individually non-cooperative manner, the scheme captures the dynamics of the UPN system while advancing the design of future networks. Simulations and performance analysis verify the efficiency of the proposed scheme, showing that our approach can outperform existing schemes in terms of bandwidth utilization, quality of experience (QoE) of service success, delay and throughput ratios, and normalized users’ profit.

Keywords: User-provided network, Position changeable Stackelberg game, Quality of experience, Reinforcement learning, Game theory

Correspondence: [email protected]
Department of Computer Science, Sogang University, 35 Baekbeom-ro (Sinsu-dong), Mapo-gu, Seoul 121-742, South Korea

© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Kim EURASIP Journal on Wireless Communications and Networking (2017) 2017:214 DOI 10.1186/s13638-017-1004-2

1 Introduction

Today, we are witnessing technological advances that herald the advent of a new era in telecommunication networks. With widespread wireless techniques and a variety of user-friendly terminals, cellular networks are expected to face new challenges. The traditional existing cellular systems might run out of capacity in the near future due to significantly increasing machine-to-machine (M2M) data traffic with various service requirements [1, 2]. With the emergence of a variety of new wireless network paradigms, it is envisioned that a new era of personalized services has arrived, which emphasizes users’ quality of experience (QoE). Fifth-generation (5G) network standards represent a promising technology to support users’ QoE, which will become one of the key features in future networks [3].

QoE is a measure of customers’ experiences with services. It focuses on the entire service experience and is a more holistic evaluation than the more narrowly focused user experience. Traditionally, quality of service (QoS) parameters have been employed to evaluate the network-oriented service quality. However, QoE expands this horizon to capture people’s esthetic and even hedonic needs. Therefore, QoE emphasizes the end-to-end performance from both the subjective and objective perspectives and takes user perception into consideration in the loop [4, 5]. While the QoE concept is easily motivated, supporting it in device-to-device, ultra-reliable, and massive machine-based 5G networks remains a difficult and complex problem.

Within the past few years, the increasing demand for mobile data in current cellular networks and the proliferation of advanced handheld devices have led to a new network paradigm, known as the user-provided network (UPN). The concept of UPN was first defined by R. Sofia as “such a networking technology where the end-user is, at the same time a customer and a provider of network access” [6]. Based on the socio-technological advance, mobile devices have surplus capacities and can act as “micro-providers” that offer related services to other nearby users without additional network infrastructures. In particular, the rise of social networks inspires end users’ willingness to be micro-providers and to spread common interests by means of shared connectivity [4].

Recently, two UPN models have been implemented through different approaches. One method enables mobile devices to create a mesh network and share their Internet connections without the intervention of network operators. The other method adopts a virtual mobile operator that enables its subscribers to act as mobile Wi-Fi hotspots to serve non-subscribers by offering some free data quota. Open Garden and Karma are well-known startup examples of these two methods of UPN services, respectively [7]. Even though these two methods have different control mechanisms, there has been growing consensus that UPN operations rely on the participation of self-organizing user equipments (UEs). Therefore, a central issue in UPNs is that users must agree to serve each other. However, participating users have conflicting interests, and existing UPN methods lack proper control algorithms to coordinate these conflicts of interest [8].

In this paper, we study and design a novel UPN control scheme where individual users share their mobile connections and act as hosts for other users, who act as clients in the hosts’ vicinity. By providing UPN services, users in the proposed scheme can benefit from sharing leftover resources with other users. To share users’ connectivity and resources both fairly and efficiently, we need a new control paradigm. Nowadays, the game theoretic approach is widely recognized as a practical perspective for the implementation of real-world network operations. Game theory is the study of strategic interactions between multiple intelligent rational decision makers trying to maximize the expected value of their own payoffs, which is measured in some utility scale [8]. Motivated by the UPN situation, in which users are rational individuals able to make control decisions logically in order to pursue their own interests, we have adopted a game theoretic mechanism. In this way, we are able to ease the heavy computational burden of theoretically optimal centralized solutions.

To model the interaction among multiple hosts and clients, the Stackelberg game is a suitable model; it was initially proposed by the German economist H. von Stackelberg in 1934 to explain some economic monopolization phenomena. In a classical Stackelberg game, one player acts as a leader and the rest as followers, and the main goal is to find an optimal strategy for the leader, assuming that the followers react rationally, i.e., they optimize their objective functions given the leader’s actions [8, 9]. In this study, we devise a new Stackelberg game, called the position changeable Stackelberg (PCS) game, to adapt effectively to the UPN situation. In the PCS game, each individual user can dynamically be a leader, i.e., host, or a follower, i.e., client, to maximize his payoff. Based on the current position, each user learns the UPN condition and dynamically selects his strategy.

The success of UPN systems relies on the users’ willingness to contribute their Internet connectivity and network resources [10]. For practical UPN operations, it is important to design effective incentive mechanisms for encouraging hosts to participate in UPN services. As examples, the Karma UPN system offers some free data quota reward to UPN hosts who share their connectivity, and the Open Garden UPN system designs a distributed bargaining-based virtual currency approach to fairly and efficiently share the resource. However, these prior studies focused neither on interactions between a particular host and his clients nor on dynamic UPN structures with changeable topologies [10].

In the proposed scheme, hosts collect the UPN service fee from their clients; it is an incentive for each host. To decide the best price strategy, each host uses a reinforcement learning algorithm in a distributed manner. By taking into account the temporal and spatial UPN system status, an individual host makes intelligent decisions. Based on the PCS game and reinforcement learning, our proposed UPN control scheme mainly considers three design and operational issues: (i) which position is better for each user in the current UPN topology, (ii) what is an appropriate incentive for each host, and (iii) how users select their strategies with imperfect information. To effectively address these issues, we focus on design principles such as feasibility, self-adaptability, and effectiveness in providing a desirable solution. Although several UPN control schemes have been proposed, no systematic study based on a realistic UPN scenario has been conducted. To the best of our knowledge, there has been very little research on UPN control problems that integrates game theory and reinforcement learning algorithms.

1.1 Contribution

This study generalizes the UPN control algorithm in the following aspects. First, users in the UPN can dynamically decide their positions. To model the interaction among position changeable users, we employ the PCS game. Second, the host’s pricing policy for UPN services is developed as an incentive mechanism. Third, a novel price decision process is designed with practical assumptions. Using a new reinforcement learning approach, the hosts select their price strategies through individual and social learning. Finally, a fair-balanced solution can be obtained under diversified UPN situations. In summary, the main contributions are as follows:

• Position changeable Stackelberg game model: motivated by diversified UPN situations, we introduce a new game model that captures a variety of system characteristics. This approach is generic and applicable to various UPN scenarios.

• Well-balanced network performance: we model the interaction of users by considering the responsive tradeoff between hosts and clients. Through a feedback-based interactive process, we analyze the impact of the host’s price strategy and provide an effective incentive mechanism for an attractive UPN performance.

• Implementation practicality: as game players, users learn how to modify their prior knowledge and select their strategies with bounded rationality. Therefore, the decision mechanism is implemented with low complexity. It is practical and suitable for real-world UPN operations.

• Solution concept: the main idea of our PCS game lies in its responsiveness to the reciprocal combination of optimality and practicality. Players act with self-adaptability and real-time effectiveness. The concept of our solution is to approximate an efficient UPN status using individual and social learning approaches.

• Conclusions: the numerical study shows that our game-based approach can increase the bandwidth utilization, service QoE, and users’ profit by 5 to 10% under different service request rates, compared with the existing IDME [10] and MGDT [11] schemes.

1.2 Organization

The remainder of this article is organized as follows. In the next section, we review some related UPN schemes and their problems. In Section 3, we provide a detailed description of the proposed PCS game model and UPN control algorithms. In particular, this section provides fresh insights into the benefits and design of the PCS game-based UPN control approach. For convenience, the main steps of the proposed scheme are then listed. In Section 4, we validate the performance of the proposed scheme by means of a comparison with some existing methods. Finally, we present our conclusion and discuss the remaining open challenges in this area along with possible solutions.

2 Related work

There has been considerable research into the design of UPN control schemes. Sofia et al. [6] first defined the concept of UPN, where the end users can be consumers and providers of Internet access at the same time. The authors provided a characterization of UPN and a comparison of connectivity features for the UPN against ad hoc and multi-hop networks. To clarify the main differences from other autonomic networks, this article investigated a paradigm shift in Internet services and wholesale models [6].

The article [4] surveyed recent technical developments in the fields of QoE and UPN. After that, a new UPN-based framework in mobile networks was proposed. In particular, the importance of end users in the QoE provisioning chain was highlighted. To illustrate the proposed UPN-based networks, a case study was conducted with the proposed adaptive resource scheduling method for QoE improvement. Some possible challenges and future research were also discussed [4].

Iosifidis et al. [12] analyzed the design challenges of incentive mechanisms for encouraging user engagement in UPNs. Motivated by recently launched business models, they discussed the technical issues pertaining to resource allocation for UPN services and explained the importance of incorporating incentive schemes. They then analyzed mobile UPN models that were inspired by Open Garden and Karma. Finally, the challenges in designing incentive mechanisms for these services and two appropriate solutions were discussed [12].

Gao et al. [13] designed an optimal pricing and reimbursing strategy for a mobile virtual network operator that maximized its total revenue while considering the necessary incentives to the operator. They employed a game-theoretic analysis and modeled the interaction between the operator and the hosts as a two-stage leader-follower Stackelberg game. In the first stage, the operator decided the price and free data quota reimbursing plan. In the second stage, every host decided how much data would be consumed for its own needs and how much data would be forwarded for clients. Finally, they systematically analyzed the game equilibrium, which can capture a variety of system characteristics, including the service types of hosts or clients, the capacities of hosts’ Internet connections, and the energy consumption patterns of hosts [13].

The Matching Game based Data Trading (MGDT) scheme [11] considered a new operator-supervised UPN scheme based on the matching game model. In this scheme, users shared connectivity and acted as access points for other users. To incentivize user participation in the UPN, the MGDT scheme allowed the users to trade their data plans and obtain a profit by selling and buying leftover data capacities from each other. To formulate the buyer-seller data trading process, a many-to-one matching game was designed and a distributed algorithm was developed that enabled users to self-organize into a stable matching. By combining matching theory and market equilibrium, the MGDT scheme ensured dynamic adaptation of price to data demand and supply and calculated operator gains and the benchmark price that would encourage users to join the UPN [11].

The Incentive Design and Market Evolution (IDME) scheme [10] studied the user membership evolution in an operator-assisted UPN and the operator’s best strategy to maximize his profit. Simply, two key questions in the IDME scheme were (i) what was the best membership choice for each mobile user and (ii) what was the operator’s best pricing strategy. To effectively answer these two questions, the IDME scheme formulated the interaction between the operator and mobile devices as a two-stage game. In stage I, the operator determined the usage-based pricing and quota-based incentive mechanism for the data usage. In stage II, the mobile devices made their decisions about whether to be a host or a client [10].

These earlier studies [4, 6, 10–13] have attracted considerable attention while introducing unique challenges in handling the UPN control problems. In this paper, we demonstrate that our proposed scheme significantly outperforms the existing IDME [10] and MGDT [11] schemes.

3 The proposed PCS game-based UPN algorithms

In this section, we provide a brief introduction to our PCS game model, which forms the theoretical basis of the proposed UPN control scheme. By adopting a new PCS game-based approach, we design effective control protocols to adapt to the dynamically changing UPN environments.

3.1 PCS game model and players’ utility functions

Currently, there are several UPN models that differ in the architectures and services they offer to users. However, one common aspect of these models is that UPNs consist of several network agents that are organized in a mesh topology where all agents cooperate with each other to improve network performance. Generally, there are two types of agents: base stations, i.e., eNBs, and UEs. The function of eNBs is to support Internet connectivity by acting as gateways to connect backbone networks. UEs participate in UPN services to advance traffic management and reduce energy consumption. The main concern of the UPN idea is to exploit the diversity of UEs’ needs, resources, and Internet connectivity by building an autonomous intervention mechanism.

We consider the operation of a multicellular network that consists of a set $\mathbb{B} = \{B_1, \ldots, B_n\}$ of eNBs and a set $\mathbb{G} = \{G_1, \ldots, G_y\}$ of UEs. Some UEs may be in range of a $B \in \mathbb{B}$, while some others may not. Each eNB has a set of bandwidth resource blocks that can be allocated to UEs in each time period. We assume that UEs may change their locations and hence have different neighbors in different time periods. Such a mobility assumption allows us to get closed form expressions and gain valuable insights into the design of UPN.

Different UEs are in different situations. In our model, there are two types of UEs, i.e., hosts and clients in the UPN. A host can connect to the eNB directly and share his Internet connection with clients. Therefore, hosts serve as temporary gateways for clients. A client can connect to the eNB directly, only for his own service, or connect to the eNB via a nearby host. The host’s goal is to maximize his profit, which depends on the revenue collected from his clients, and the objective of the client is also to maximize his profit, which depends on the outcome from UPN services. According to their current status, UEs dynamically select their positions, and their decisions are adaptively changeable over time. In this study, the UPN control scheme is formulated as a new PCS game model. As game players, UEs select their positions and strategies based on the interactions with other UEs. At each time period of gameplay, we formally define the PCS game model $G_{PCS} = \{\mathbb{B}, \mathbb{G}, \{S_{G^h}, S_{G^c}\}, \{U_{G^h}, U_{G^c}\}, L^{P}_{G}, T\}$ as follows:

• $\mathbb{B} = \{B_1, \ldots, B_n\}$ represents a set of eNBs, and each $B$ has orthogonal bandwidth blocks.

• $\mathbb{G} = \{G_1, \ldots, G_y\}$ is the finite set of PCS game players, i.e., UEs. $\mathbb{G}$ can be divided into three subsets $\mathbb{G}^h$, $\mathbb{G}^c$, and $\mathbb{G}^n$; $\mathbb{G}^h$ is the subset of hosts ($G_k^h \in \mathbb{G}^h \subset \mathbb{G}$), $\mathbb{G}^c$ is the subset of clients ($G_i^c \in \mathbb{G}^c \subset \mathbb{G}$), and $\mathbb{G}^n$ is the subset of UEs that temporarily give up service requests ($G_l^n \in \mathbb{G}^n \subset \mathbb{G}$).

• In $\{S_{G^h}, S_{G^c}\}$, $S_{G^h} = \{s^h_{\min}, \ldots, s^h_i, \ldots, s^h_{\max}\}$ is the set of host’s strategies, where $s^h_i \in S_{G^h}$ means the $i$th price level for client’s communications. $S_{G^c}$ is the set of client’s available strategies; each client has three strategy options: (i) select one host among his neighboring hosts for relay communications, (ii) connect directly to his corresponding eNB, or (iii) become a member of $\mathbb{G}^n$.

• $U_{G^h}$ is the payoff received by the host, and $U_{G^c}$ is the payoff received by the client during the UPN operation.

• $L^{s}_{G}$ is the learning value for the host’s strategy $s$; $L^{s}_{G}$ is used to estimate the probability distribution ($P_G$) for the next strategy selection.

• $T = \{H_1, \ldots, H_t, H_{t+1}, \ldots\}$ denotes time, which is represented by a sequence of time steps with imperfect information for the PCS game process.

Table 1 lists the notations used in this paper.

Table 1 Parameters used in the proposed algorithm

| Notation | Explanation |
| --- | --- |
| $\mathbb{B}$ | The set of eNBs, where $\mathbb{B} = \{B_1, \ldots, B_n\}$ |
| $\mathbb{G}$ | The set of UEs, where $\mathbb{G} = \{G_1, \ldots, G_y\}$ |
| $\mathbb{G}^h$ | The subset of hosts, $\mathbb{G}^h \subset \mathbb{G}$ |
| $G_k^h$ | The $k$th host, where $G_k^h \in \mathbb{G}^h$ |
| $\mathbb{G}^c$ | The subset of clients, $\mathbb{G}^c \subset \mathbb{G}$ |
| $G_i^c$ | The $i$th client, where $G_i^c \in \mathbb{G}^c$ |
| $\mathbb{G}^n$ | The subset of UEs that temporarily give up service requests, $\mathbb{G}^n \subset \mathbb{G}$ |
| $G_l^n$ | The $l$th UE, where $G_l^n \in \mathbb{G}^n$ |
| $S_{G^h}$ | The set of host’s strategies |
| $s_j^h$ | The $j$th price level for client’s communications, where $s_j^h \in S_{G^h}$ |
| $S_{G^c}$ | The set of client’s available strategies |
| $U_{G^h}$ | The payoff received by the host |
| $U_{G^c}$ | The payoff received by the client |
| $L^{s}_{G}$ | The learning value for the host’s strategy $s$ |
| $N_{G_k^h}$ | The set of $G_k^h$’s neighboring clients |
| $P_h^{G_k^h}(\cdot)$ | The price function to clients |
| $C_h^{G_k^h}(\cdot)$ | The cost function of $G_k^h$ |
| $D_h^{G_k^h}(\cdot)$ | The expense function of $G_k^h$ |
| $r_c^{G_i^c}$ | The bandwidth amount of $G_i^c$’s UPN service |
| $F_s(\cdot)$ | The host’s UPN price strategy per bit, where $F_s(\cdot) \in S_{G^h}$ |
| $\alpha$ | The connected eNB’s minimum charge |
| $\Theta^{B}(\cdot)$ | The current bandwidth utilization |
| $m_h^{G_k^h}$ | $G_k^h$’s expense factor per bit of UPN services |
| $U_c^{G_i^c}(\cdot)$ | The utility function of the client $G_i^c$ |
| $Q_c^{G_i^c}(\cdot)$ | The outcome function of $G_i^c$ |
| $I_c^{G_i^c}(\cdot)$ | The charge function of $G_i^c$ |
| $K_c^{G_i^c}(\cdot)$ | The expense function of $G_i^c$ |
| $\psi_c^{G_i^c}$ | $G_i^c$’s satisfaction factor per bit of UPN service |
| $\zeta_c^{G_i^c}$ | $G_i^c$’s expense factor per bit of UPN service |
| $PS_B(\cdot)$ | The $B$’s price strategy |
| $\phi_{H_t}^{G_k}(B_j)$ | The allocated RU amount for the $G_k$ at time $H_t$ in the $B_j$ |
| $\vartheta_{H_t}^{G_k}$ | The amount of RU requested by the $G_k$ at time $H_t$ |
| $R_{H_t}^{B_j}$ | The available RU amount of $B_j$ |
| $A(B_j)$ | The set of UEs that request new RUs from the $B_j$ |
| $\gamma_{H_t}^{G_l}$ | The $G_l$’s bargaining power at time $H_t$ |
| $E_{H_t}^{G_l}$ | The entropy for the $G_l$ at time $H_t$ |
| $F_{G_l}$ | The set of the neighboring UEs of $G_l$ |
| $O(G_l, G_f)$ | The relative mobility between $G_l$ and $G_f$ |
| $x$ | The learning rate that models how the $L$ values are updated |
| $\Lambda(B_g)$ | The set of hosts that are connected to the $B_g$ |
| $v$ | The individual learning value |
| $w$ | The social learning value |
| $\beta$ | The control factor for the weighted average between different learning approaches |
| $P$ | The strategy selection distribution for each host |
| $p_{s_j^h}^{G_k^h}(H_t)$ | The $s_j^h$ strategy selection probability by the $G_k^h$ at time $H_t$ |
| $\kappa$ | The service success ratio in the UPN system |
| $\tau$ | The delay rate in the UPN system |
| $z$ | The UPN system throughput |

In the UPN system, eNBs charge their subscribers, i.e., hosts and clients, based on a usage-based price mechanism. If traffic grows unexpectedly in a specific eNB, it may create local traffic congestion. To alleviate this kind of traffic overload, the current price ($P_B$) should increase. If few subscribers access an eNB, the current price should decrease to attract subscribers to participate in UPN services. In the same manner, hosts charge their clients based on the current traffic load. According to the current price, individual clients act independently and selfishly; all game players select their strategies to maximize their payoffs.

Each host can connect to the eNB anytime via his device. At the same time, this device can provide Internet connections for nearby clients as a gateway. Therefore, the communication resource assigned to a host can be shared with clients, and clients need to pay the price for their UPN services. Depending on the payment, cost, and expense functions of UPN services, the utility function of the host $G_k^h$ at time $H_t$, $U_h^{G_k^h}(H_t)$, can be defined as follows:

$$
U_h^{G_k^h}(H_t) = \sum_{G_i^c \in N_{G_k^h}} \left( P_h^{G_k^h}\left(r_c^{G_i^c}(H_t)\right) - C_h^{G_k^h}\left(\Theta^{B}_{G_k^h}(H_t),\, r_c^{G_i^c}(H_t)\right) - D_h^{G_k^h}\left(r_c^{G_i^c}(H_t)\right) \right) \tag{1}
$$

$$
\text{s.t. }
\begin{cases}
P_h^{G_k^h}\left(r_c^{G_i^c}(H_t)\right) = F_s\left(G_k^h, H_t\right) \times r_c^{G_i^c}(H_t) \\
C_h^{G_k^h}\left(\Theta^{B}_{G_k^h}(H_t),\, r_c^{G_i^c}(H_t)\right) = \left(\alpha + \Theta^{B}(H_t)\right) \times r_c^{G_i^c}(H_t) \\
D_h^{G_k^h}\left(r_c^{G_i^c}(H_t)\right) = m_h^{G_k^h} \times r_c^{G_i^c}(H_t)
\end{cases}
$$

where $P_h^{G_k^h}(\cdot)$ is the price function to clients at time $H_t$, and $C_h^{G_k^h}(\cdot)$ and $D_h^{G_k^h}(\cdot)$ are the cost and expense functions of $G_k^h$, respectively, for the UPN services. $N_{G_k^h}$ is the set of $G_k^h$’s neighboring clients, i.e., the set over which Eq. (1) sums, and $r_c^{G_i^c}$ is the bandwidth amount of $G_i^c$’s UPN service. $F_s(G_k^h, H_t)$ is the $G_k^h$’s UPN price strategy per bit at time $H_t$, where $F_s(G_k^h, H_t) \in S_{G^h}$. Let $\alpha$ and $\Theta^{B}(H_t)$ denote the connected eNB’s minimum charge and bandwidth utilization at time $H_t$, respectively. $m_h^{G_k^h}$ is the $G_k^h$’s expense factor per bit of UPN services.

Each client can connect to the backbone Internet directly via the eNB or indirectly via a nearby host. If there are multiple hosts in his coverage area, the client selects one of them to maximize his payoff. If $G_k^h$ is selected as a host, the utility function of the client $G_i^c$ at time $H_t$, $U_c^{G_i^c}(G_k^h, H_t)$, can be defined as follows:

$$
U_c^{G_i^c}\left(G_k^h, H_t\right) = Q_c^{G_i^c}\left(r_c^{G_i^c}(H_t)\right) - I_c^{G_i^c}\left(G_k^h,\, r_c^{G_i^c}(H_t)\right) - K_c^{G_i^c}\left(r_c^{G_i^c}(H_t)\right) \tag{2}
$$

$$
\text{s.t. }
\begin{cases}
Q_c^{G_i^c}\left(r_c^{G_i^c}(H_t)\right) = \psi_c^{G_i^c} \times r_c^{G_i^c}(H_t) \\
I_c^{G_i^c}\left(G_k^h,\, r_c^{G_i^c}(H_t)\right) = \min\left( \min_{G_k^h \in N_{G_i^c}} \left\{ F_s\left(G_k^h, H_t\right) \right\},\; PS_B(H_t) \right) \times r_c^{G_i^c}(H_t) \\
K_c^{G_i^c}\left(r_c^{G_i^c}(H_t)\right) = \zeta_c^{G_i^c} \times r_c^{G_i^c}(H_t)
\end{cases}
$$

where $Q_c^{G_i^c}(\cdot)$, $I_c^{G_i^c}(\cdot)$, and $K_c^{G_i^c}(\cdot)$ are the outcome, charge, and expense functions of $G_i^c$, respectively, and $N_{G_i^c}$ is the set of $G_i^c$’s neighboring hosts. Let $\psi_c^{G_i^c}$ and $\zeta_c^{G_i^c}$ denote the $G_i^c$’s satisfaction and expense factors per bit of UPN service, respectively. $PS_B(H_t)$ represents the corresponding eNB’s price strategy at time $H_t$, e.g., $[\alpha + \Theta^{B}(H_t)]$.

As discussed previously, there are three positions of UEs: host ($\mathbb{G}^h$), client ($\mathbb{G}^c$), or a member of $\mathbb{G}^n$. At each time, UEs try to decide their best positions. Hence, each UE selects his position $X$ to maximize his payoff ($U$), where $X \in \mathbb{G}^h \cup \mathbb{G}^c \cup \mathbb{G}^n$.

$$
X = \arg\max_{X \in \mathbb{G}^h \cup \mathbb{G}^c \cup \mathbb{G}^n} \left\{ U_h^{X \in \mathbb{G}^h},\; U_c^{X \in \mathbb{G}^c},\; U_c^{X \in \mathbb{G}^n} \right\} \tag{3}
$$

$$
\text{s.t. } U_h^{X \in \mathbb{G}^h} = U_h^{G^h}(H_t), \quad U_c^{X \in \mathbb{G}^c} = U_c^{G^c}\left(G^h, H_t\right), \quad \text{and} \quad U_c^{X \in \mathbb{G}^n} = 0
$$

3.2 Strategy selection mechanisms in the PCS game

During the UPN operation, eNBs have to manage radio resources to optimize all UEs’ communications. Therefore, the UPN performance depends on the radio resource management algorithm and its implementation. The resource allocation process in each eNB is performed by the radio resource management (RRM) entity, which dynamically distributes the radio resource to each active UE in its coverage area. The amount of radio resource allocation is specified in terms of resource units (RUs), where one RU is the minimum allocation amount, e.g., 128 Mbps. The RRM must be able to meet UEs’ requirements while maximizing the resource usage in a flat radio network structure [14].

In this study, we use the main concept of the Nash bargaining solution (NBS) to allocate RUs. The NBS is an effective tool to achieve a mutually desirable solution with a good balance between efficiency and fairness [8]. However, in many cases, the assumptions of the traditional NBS are unrealistic in real-life environments. In particular, if we consider a scenario of dynamic UPNs, the classical NBS cannot be directly applied to the RU allocation problem in UPNs. To avoid the inevitable burden of uncertainty, it is necessary to adopt a new NBS approach that adapts to current network changes.

In consideration of UEs’ mobility, we develop an iterative resource allocation algorithm, which is formulated in a sequential Nash bargaining manner. The main concept of sequential Nash bargaining is to observe the current system environment and dynamically update the NBS to adapt to the network dynamics. This approach can relax the traditional NBS assumption that all information is completely known. Therefore, during the step-by-step iteration, the eNB adaptively assigns RUs to all UEs that request resources. In the $B_j$, the allocated RU amount for the $G_k$ at time $H_t$, $\phi_{H_t}^{G_k}(B_j)$, is defined as follows:

$$
\phi_{H_t}^{G_k}(B_j) =
\begin{cases}
\vartheta_{H_t}^{G_k}, & \text{if } R_{H_t}^{B_j} \ge \sum\limits_{G_l \in A(B_j)} \vartheta_{H_t}^{G_l} \\[6pt]
\begin{aligned}
& \arg\max_{\left(\ldots,\, \vartheta_{H_t}^{G_l \in A(B_j)},\, \ldots\right)} \left\{ \sum_{G_l \in A(B_j)} \gamma_{H_t}^{G_l} \times \log\left(\vartheta_{H_t}^{G_l}\right) \right\} \\
& = \arg\max_{\left(\ldots,\, \vartheta_{H_t}^{G_l \in A(B_j)},\, \ldots\right)} \left\{ \log \prod_{G_l \in A(B_j)} \left(\vartheta_{H_t}^{G_l}\right)^{\gamma_{H_t}^{G_l}} \right\}, \quad \text{s.t. } \sum_{G_l \in A(B_j)} \vartheta_{H_t}^{G_l} = R_{H_t}^{B_j}
\end{aligned} & \text{otherwise}
\end{cases}
\tag{4}
$$

where $\vartheta_{H_t}^{G_k}$ is the amount of RU requested by the $G_k$ at time $H_t$ and $R_{H_t}^{B_j}$ is the available RU amount of $B_j$. $A(B_j)$ is the set of UEs that currently request new RUs from the $B_j$. $\gamma_{H_t}^{G_l}$ is the $G_l$’s bargaining power at time $H_t$, which is the relative ability to exert influence over other UEs. Usually, the bargaining solution is strongly dependent on the bargaining powers. If different bargaining powers are used, the UE with a higher bargaining power obtains more RUs. In the proposed algorithm, the bargaining powers are determined by the UEs’ mobility. Let $E_{H_t}^{G_l}$ be the entropy for the $G_l$ at time $H_t$. Usually, the basic concept of entropy is the uncertainty and a measure of the disorder in a system [15–17]. To evaluate $G_l$’s stability in the UPN, $E_{H_t}^{G_l}$ represents the topological change, which is a natural quantification of the effect of $G_l$’s mobility in the UPN; it is calculated as follows:

$$
E_{H_t}^{G_l} = \frac{ -\sum\limits_{G_k \in F_{G_l}} P_{G_k}(H_t, \Delta H) \times \log\left(P_{G_k}(H_t, \Delta H)\right) }{ \log\left(C\left(F_{G_l}\right)\right) } \tag{5}
$$

$$
\text{s.t. } P_{G_k}(H_t, \Delta H) = \frac{ O(G_l, G_k) }{ \sum\limits_{G_f \in F_{G_l}} O(G_l, G_f) }
$$

where $\Delta H$ is a time interval, $F_{G_l}$ denotes the set of the neighboring UEs of $G_l$, and $C(F_{G_l})$ is the cardinality (degree) of the set $F_{G_l}$. $O(G_l, G_f)$ represents a measure of the relative mobility between $G_l$ and $G_f$. Sensor technology has developed rapidly in recent years, and on-board units (OBUs) are now embedded in most sensors of UEs [18]. OBUs provide an event data recorder, a global positioning system, forward and backward radar, sensing facilities, and a short-range wireless interface. Using OBUs, each UE can easily estimate the relative mobility of other UEs.
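A minimal sketch of the entropy in Eq. (5) is given below, assuming the relative mobility measures O(G_l, G_f) toward each neighbor are already available as non-negative numbers. The function name and the handling of degenerate cases (fewer than two neighbors, or all-zero measurements, where log C(F) or the normalizing sum vanishes) are assumptions made for this illustration and are not specified in the text.

```python
import math


def mobility_entropy(relative_mobility):
    """Eq. (5): normalized entropy of one UE.

    relative_mobility -- list of O(G_l, G_f) values, one per neighboring UE.
    Dividing by log(C(F)) bounds the result between 0 and 1.
    """
    n = len(relative_mobility)
    total = sum(relative_mobility)
    if n < 2 or total <= 0:
        # Assumption: log(1) = 0 would divide by zero, so report 0 here.
        return 0.0
    probabilities = [o / total for o in relative_mobility]
    # Terms with zero probability contribute nothing (limit of p*log p is 0).
    raw = -sum(p * math.log(p) for p in probabilities if p > 0)
    return raw / math.log(n)


if __name__ == "__main__":
    print(mobility_entropy([1.0, 1.0, 1.0, 1.0]))  # evenly spread mobility
    print(mobility_entropy([3.0, 1.0]))            # skewed toward one neighbor
```

Evenly spread relative mobility gives the largest value, while mobility concentrated on a single neighbor drives the value toward zero.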

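The entropy $E_{H_t}^{G_l}$ is normalized as $0 \le E_{H_t}^{G_l} \le 1$. If the value of $E_{H_t}^{G_l}$ is close to 1 (or close to 0), the $G_l$ is stable (or unstable) [16]. Finally, the $G_l$’s bargaining power at time $H_t$, $\gamma_{H_t}^{G_l}$, is calculated as follows:

$$
\gamma_{H_t}^{G_l} = \frac{ E_{H_t}^{G_l} }{ \sum\limits_{G_k \in A(B_j)} E_{H_t}^{G_k} } \tag{6}
$$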
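To make the link between Eqs. (4) and (6) concrete, the sketch below computes bargaining powers from entropies and then allocates RUs. It relies on a standard result that is not spelled out in the text: maximizing the weighted sum of logarithms subject to the allocations summing to R yields allocations proportional to the weights, and since the powers of Eq. (6) sum to 1, each UE receives its bargaining power times R. The function names are hypothetical, and the sketch assumes divisible RUs and does not cap an allocation at the UE's own request.

```python
def bargaining_powers(entropies):
    """Eq. (6): each UE's entropy divided by the entropy sum over A(B_j).

    entropies -- dict mapping a UE identifier to its entropy value.
    """
    total = sum(entropies.values())
    if total <= 0:
        # Assumption: with no usable entropy information, use equal powers.
        return {ue: 1.0 / len(entropies) for ue in entropies}
    return {ue: e / total for ue, e in entropies.items()}


def allocate_rus(requests, entropies, available):
    """Eq. (4): grant all requests if they fit, otherwise split by bargaining power.

    requests  -- dict mapping a UE identifier to its requested RU amount
    entropies -- dict mapping the same identifiers to entropy values
    available -- RU amount the eNB can currently hand out
    """
    if available >= sum(requests.values()):
        return dict(requests)
    gamma = bargaining_powers(entropies)
    # Closed form of the weighted-log maximization under the sum constraint.
    return {ue: gamma[ue] * available for ue in requests}


if __name__ == "__main__":
    requests = {"a": 6.0, "b": 6.0}
    entropies = {"a": 0.9, "b": 0.3}
    print(allocate_rus(requests, entropies, available=20.0))
    print(allocate_rus(requests, entropies, available=8.0))
```

In the congested case the UE with the larger entropy receives the larger share, which matches the statement that a higher bargaining power obtains more RUs.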

As a host, the Gh’s strategy (Sh) represents the pricelevel of client’s communications. Therefore, the payoff of

Gh UGh

h

� �can be defined as a function of Sh and re-

quested client services rGc

c

� �. Interactions between Gh

and Gc continue repeatedly during the UPN process overtime. Therefore, the Gh should consider the reactionsfrom Gc to determine his strategy Sh. Under incompleteand asymmetric information situation, Gh should learnfrom the current environment, build knowledge, andultimately make strategy decisions to maximize his pay-off. Until now, several learning algorithms have beendeveloped to learn from the dynamic environments.However, traditional learning algorithms are not suffi-cient to follow rapid UPN changes. In particular, slowreaction to system fluctuations is a main drawback.In this study, we develop a new learning algorithm to

decide effectively the host’s price policy for UPN ser-vices. Usually, learning is divided into two categories:

individual learning and social learning. Individual learn-ing refers to trial and error, or insight temporal learning,and social learning refers to spatial leaning throughinteractions with other individuals. The main novelty ofour learning method is a joint-design manner concern-ing individual and social learning approaches. If thestrategy Shj is selected at time Ht − 1 by the Gh

k in the Bg’s

coverage area, the Ghk updates the strategy Shj ’s learning

value for the next time step LshjGh

kHtð Þ

� according to the

following method:

LshjGh

kHtð Þ ¼ 1−xð Þ � L

shjGh

kHt−1ð Þ

� þ x� vþ w½ �ð Þ ð7Þ

s:t:; v ¼ 1−βð Þ � UGh

kh Ht−1ð Þ and

w ¼ β�X

Ghi ∈Λ Bgð Þ

LshjGh

iHt−1ð Þ

Λ Bg� ��

8><>:

9>=>;

where $x$ is a learning rate that models how the $L$ values are updated and $\Lambda(B_g)$ is the set of hosts that are connected to $B_g$. $\lVert \Lambda(B_g) \rVert$ is the cardinality of the set $\Lambda(B_g)$. In Eq. (7), $v$ and $w$ represent the individual and social learning values, respectively, and $\beta$ is a control factor for the weighted average between the two learning approaches. When a UE’s mobility in $G^h_k$’s coverage area is high, we place more emphasis on social learning, i.e., on $w$. In this case, a higher value of $\beta$ is more suitable. When a UE’s mobility in $G^h_k$’s coverage area is low, the $L(\cdot)$ value should be strongly dependent on other UEs’ responses in $G^h_k$’s area. In this case, a lower value of $\beta$ is preferable to emphasize individual learning, i.e., $v$. In the proposed scheme, the value of $\beta$ is dynamically adjusted based on the current average entropy of the UEs that are covered by $G^h_k$. That is to say, $\beta$ is the arithmetic mean of each UE’s entropy, i.e., $E^{G}_{H_{t-1}}$, according to Eq. (5). Therefore, we can learn the best strategy at both the individual and the social level with incomplete information about UPN conditions.

Based on the $L(\cdot)$ values, a strategy selection distribution ($P$) for each host can be defined. During the UPN process, we determine $P^{G^h_k}(H_t) = \left[ p^{G^h_k}_{s^h_{min}}(H_t), \ldots, p^{G^h_k}_{s^h_j}(H_t), \ldots, p^{G^h_k}_{s^h_{max}}(H_t) \right]$ as the probability distribution of $G^h_k$’s strategy selection at time $H_t$; it is sequentially modified over time. In this study, $p^{G}_{s}(\cdot)$ is defined based on the Boltzmann distribution, which has been used extensively in various machine learning algorithms. Finally, the probability that $G^h_k$ selects strategy $s^h_j$ at time $H_t$, $p^{G^h_k}_{s^h_j}(H_t)$, is given by:

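$$p^{G^h_k}_{s^h_j}(H_t) = \frac{\mathrm{EXP}\left( L^{s^h_j}_{G^h_k}(H_{t-1}) \right) \times \min\left\{ \max\left( U^{G^h_k}_h(H_{t-1}, s^h_j), 0 \right), 1 \right\}}{\sum_{i=\min}^{\max} \left( \mathrm{EXP}\left( L^{s^h_i}_{G^h_k}(H_{t-1}) \right) \times \min\left\{ \max\left( U^{G^h_k}_h(H_{t-1}, s^h_j), 0 \right), 1 \right\} \right)} \qquad (8)$$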

where $U^{G^h_k}_h(H_{t-1}, s^h_j)$ is $G^h_k$’s payoff with the strategy $s^h_j$ at time $H_{t-1}$. The min{max(⋅)} term is used to exclude non-profitable strategies from selection at time $H_t$.
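The update in Eq. (7) and the selection rule in Eq. (8) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the simulator used in this study: the function names `update_learning_value` and `selection_distribution` are hypothetical; the peer list is assumed to hold the L values of the same strategy for all hosts in Λ(B_g); the clamp min{max(·, 0), 1} is assumed to multiply the exponential and to be evaluated per strategy inside the normalizing sum, so that the probabilities add up to one; and the uniform fallback when every strategy is gated out is an added assumption.

```python
import math


def update_learning_value(prev_l, own_payoff, peer_ls, x, beta):
    """Eq. (7): blend the previous L value with individual (v) and social (w) terms."""
    v = (1.0 - beta) * own_payoff             # individual learning term
    w = beta * (sum(peer_ls) / len(peer_ls))  # social term: mean L over hosts in Lambda(B_g)
    return (1.0 - x) * prev_l + x * (v + w)


def selection_distribution(l_values, payoffs):
    """Eq. (8): Boltzmann weights gated by the payoff clamped to [0, 1]."""
    weights = [
        math.exp(l) * min(max(u, 0.0), 1.0)   # a non-positive payoff gives weight 0
        for l, u in zip(l_values, payoffs)
    ]
    total = sum(weights)
    if total == 0.0:
        # Assumption: if no strategy is profitable, fall back to a uniform choice.
        return [1.0 / len(weights)] * len(weights)
    return [wgt / total for wgt in weights]
```

As a worked example, with x = 0.7 (Table 2), an illustrative β = 0.5, a previous L value of 0.2, a payoff of 0.6, and peer values [0.2, 0.4], the update gives v = 0.3, w = 0.15, and a new L value of 0.3 × 0.2 + 0.7 × 0.45 = 0.375.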

3.3 Main steps of proposed UPN control algorithm
In this paper, we propose a novel UPN control scheme based on the PCS game, which is implemented as a distributed and dynamic repeated game for opportunistic UPN operations. In the proposed scheme, individual UEs are game players and attempt to maximize their payoffs through a step-by-step interactive game process. Therefore, UEs can learn the current UPN situation and determine their best position and strategies. Generally, well-known solution concepts of game theory are presented in closed-form expressions under complete information. However, they cannot capture the adaptation issue of UPN operations over time.

Usually, control algorithms that solve classical optimization problems have exponential time complexity. Furthermore, they have mostly been developed in a static setting. These methods are impractical to implement for realistic system operations. In this study, we do not try to obtain an optimal solution based on the traditional optimization approach. Instead, an interactive game model is proposed. Using feedback-based self-monitoring and distributed learning techniques, control decisions are made dynamically while adapting to the current UPN situation. This decision mechanism is implemented with polynomial complexity. In addition, we can transfer the computational burden from a central system to UEs in a distributed online fashion. Therefore, it is a suitable approach for the real-world UPN system from the viewpoint of practical operations. The main steps of the proposed scheme are described as follows (see Fig. 1).

Step 1: Control parameters are determined by the simulation scenario (see Table 2).
Step 2: At the initial time, the L learning values in each host UE are equally distributed. This starting guess guarantees that each host’s price strategy enjoys the same benefit at the beginning of the PCS game.
Step 3: During the PCS game, UEs move freely among eNBs’ coverage areas, and each UE constantly checks connection availability to his corresponding eNB.
Step 4: At every time step (H), each individual UE estimates his available payoffs depending on different positions and decides independently his position among {$G^h \cup G^c \cup G^n$} to maximize his payoff according to Eqs. (1)–(3).
Step 5: If a UE can directly connect to his corresponding eNB, it monitors UPN service requests from his neighboring client UEs. Otherwise, the UE requests UPN services from his neighboring host UEs.
Step 6: When the total bandwidth amount requested by UEs is larger than the eNB’s capacity, the eNB distributes his available bandwidth through the sequential NBS according to Eq. (4). By considering each UE’s stability, the bargaining powers (γ) of UEs are adaptively adjusted using Eqs. (5)–(6).
Step 7: During each time period (H), a host UE ($G^h$) dynamically calculates current learning values L(·) for each strategy $S_h$ according to Eq. (7). Based on the individual and social learning viewpoints, L(·) values effectively reflect the UPN’s temporal and spatial situations.
Step 8: The probability distribution ($P^{G^h}$) for each $G^h$’s strategy selection is dynamically adjusted based on the obtained learning values (L(·)) using Eq. (8).
Step 9: Based on the interactive feedback process, the dynamics of our PCS game can cause a cascade of interactions among the UEs. As game players, UEs dynamically choose their best position and strategies in an online distributed fashion.
Step 10: Under the dynamic UPN environment, individual UEs are constantly self-monitoring for the next game process; go to Step 3.
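The per-UE decisions in Steps 4 to 6 can be sketched as follows. This is only an illustrative skeleton under stated assumptions: the payoff estimates of Eqs. (1)–(3) are taken as given inputs rather than computed, the labels "host", "client", and "none" are hypothetical stand-ins for the positions G^h, G^c, and G^n, and all function names are invented for this example.

```python
def choose_position(estimated_payoffs):
    """Step 4: pick the position whose estimated payoff is largest.

    estimated_payoffs maps a position label to the payoff the UE expects
    in that position (the values would come from Eqs. (1)-(3)).
    """
    return max(estimated_payoffs, key=estimated_payoffs.get)


def step5_action(has_direct_link):
    """Step 5: behavior depends on whether the UE can reach its eNB directly."""
    if has_direct_link:
        return "monitor_client_requests"   # listen for neighboring client requests
    return "request_upn_service"           # ask a neighboring host for service


def needs_bargaining(requested_bandwidths, enb_capacity):
    """Step 6: the sequential NBS is triggered only when demand exceeds capacity."""
    return sum(requested_bandwidths) > enb_capacity
```

For instance, a UE that estimates payoffs of 0.4 as a host, 0.7 as a client, and 0.1 otherwise would take the client position in this sketch.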

Fig. 1 The flowchart of proposed UPN control algorithm

Table 2 System parameters used in the simulation experiments

| Application type | Applications | Minimum bandwidth requirement for QoE | Bandwidth requirement | Average connection duration |
|---|---|---|---|---|
| I | Voice telephony | 128 Mbps | 256 Mbps | 1800 s (30 min) |
| II | Video-phone | 256 Mbps | 512 Mbps | 1800 s (30 min) |
| III | Paging | 384 Mbps | 768 Mbps | 300 s (5 min) |
| IV | E-mail | 512 Mbps | 1.28 Gbps | 300 s (5 min) |
| V | Remote-login | 640 Mbps | 1.28 Gbps | 1800 s (30 min) |
| VI | Data on demand | 768 Mbps | 1.54 Gbps | 1800 s (30 min) |
| VII | Tele-conference | 896 Mbps | 1.82 Gbps | 3000 s (50 min) |
| VIII | Digital audio | 1.28 Gbps | 2.56 Gbps | 1200 s (20 min) |

| Parameter | Value | Description |
|---|---|---|
| n | 50 | The number of eNBs in the UPN system |
| y | 1000 | The number of UEs in the UPN system |
| α | 0.3 | The B’s minimum charge for a connected host UE |
| m | [0.2~0.3] | The expense strategy factor of a host UE; it is randomly selected from the range |
| ψ | [2.0~3.0] | The client UE’s satisfaction factor; it is randomly selected from the range |
| ζ | [0.1~0.2] | The client UE’s expense factor; it is randomly selected from the range |
| x | 0.7 | The learning rate used to update the L values |

4 Performance evaluation
In this section, we evaluate the performance of our proposed protocol and compare it with that of the IDME [10] and MGDT [11] schemes. Based on the simulation results, we confirm the superiority of the proposed approach. There are other performance analysis methods, such as theoretical or numerical analysis. However, these methods are limited in scope because they offer limited possibilities for modeling dynamic behavior. Therefore, for complex algorithms such as our proposed scheme, a theoretical or numerical model cannot be made tractable without many simplifications, and such a simplified model cannot provide a precise performance evaluation. In contrast to these methods, a simulation analysis allows more complex and realistic modeling of a real-world system. Therefore, in this paper, we propose a simulation model for the performance evaluation of our UPN scheme. To ensure a fair comparison, the following assumptions and system scenario were used:

• The simulated system consists of 50 eNBs, and the number of UEs is 1000. The bandwidth capacity of each eNB is 100 Gbps, and the bandwidth resource blocks of each eNB are orthogonal to each other.

• UEs can travel in one of six directions with equal probability. For simplicity, we consider four cases of user velocity: fast speed (120 km/h), medium speed (60 km/h), slow speed (20 km/h), and zero speed (stationary; 0 km/h); the velocity is randomly selected.

• Based on the speed, a UE’s residence time in an eNB area is estimated as (i) 30 s for a fast speed UE, (ii) 60 s for a medium speed UE, (iii) 180 s for a slow speed UE, and (iv) the same as the service duration time for a stationary UE.

• According to the UE’s characteristics, new service requests are generated by a Poisson process with rate λ (services/s), which is varied from 0 to 3.

• There are eight different service requests; UEs randomly generate different service requests.

• In order to represent various application services, eight different traffic types are assumed based on connection duration, bandwidth requirement, and required QoE. They are generated with equal probability.

• The durations of service applications are exponentially distributed.

• The price strategies in $S_h$ are defined as $s^h_{min=1} = 1$, $s^h_2 = 1.2$, $s^h_3 = 1.4$, $s^h_4 = 1.6$, $s^h_5 = 1.8$, and $s^h_{max=6} = 2$; the communication resource unit is 1 bps.

• System performance measures obtained on the basis of 100 simulation runs are plotted as functions of the service request generation rate.

• For simplicity, we assume the absence of physical obstacles and interference in the experiments.
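The traffic assumptions listed above can be sketched as a small workload generator. This is a hedged illustration, not the simulator used for the reported results: the names `generate_requests` and `residence_time` are hypothetical, the mean durations are read from Table 2, the stream is modeled as a single arrival process, and treating λ = 0 as "no arrivals" is an assumption made to avoid a division by zero.

```python
import random

# Mean connection duration in seconds per application type I-VIII (Table 2).
MEAN_DURATION_S = {1: 1800, 2: 1800, 3: 300, 4: 300, 5: 1800, 6: 1800, 7: 3000, 8: 1200}
# Residence time in seconds for the moving speed classes (km/h).
RESIDENCE_TIME_S = {120: 30, 60: 60, 20: 180}


def generate_requests(rate, horizon_s, rng):
    """Return (arrival_time, app_type, duration) tuples within horizon_s seconds."""
    requests = []
    if rate <= 0:
        return requests                   # assumption: rate 0 means no arrivals
    t = rng.expovariate(rate)             # exponential gaps give a Poisson process
    while t < horizon_s:
        app_type = rng.randint(1, 8)      # eight traffic types, equal probability
        duration = rng.expovariate(1.0 / MEAN_DURATION_S[app_type])
        requests.append((t, app_type, duration))
        t += rng.expovariate(rate)
    return requests


def residence_time(speed_kmh, service_duration_s):
    """Residence time in an eNB area; a stationary UE stays for the whole service."""
    if speed_kmh == 0:
        return service_duration_s
    return RESIDENCE_TIME_S[speed_kmh]
```

A call such as `generate_requests(2.0, 10.0, random.Random(0))` yields a time-ordered list of requests for a 10 s window; passing a seeded `random.Random` keeps a run reproducible.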

To demonstrate the validity of our proposed method, we measured the bandwidth utilization, QoE of service success ratio, delay rate and system throughput, and normalized UEs’ profit. Table 2 shows the system parameters used in the simulation. The major system control parameters presented in Table 2 facilitate the development and implementation of our simulator.

Figure 2 compares the bandwidth utilization of each scheme. In this study, the bandwidth utilization is measured as the percentage of actually used bandwidth, and it is a key factor to estimate the resource usability in UPN systems. All schemes exhibit a similar trend; however, the proposed scheme outperforms the existing methods from low to high service load distributions. By using a dynamic PCS game model, UEs in our scheme adaptively select their positions and strategies, which improves the bandwidth utilization compared to the other schemes.

Figure 3 compares the QoE in the UPN system. In this study, we develop a QoE model using the popular sigmoid function with respect to service provision, delay, and system throughput [19, 20], that is,

$$QoE = \frac{1}{1 + e^{-(\kappa + \tau + z)}} \qquad (9)$$

where κ is the service success ratio, τ is the delay rate, and z is the UPN system throughput; κ, τ, and z are normalized values and are the parameters constraining the quantization of QoE. The QoE gain in the UPN system achieved by the proposed scheme is a result of our scheme’s self-adaptability and real-time effectiveness. Therefore, all UEs in the proposed scheme make control decisions strategically to ensure services. For this reason, the proposed scheme attains superior QoE to the other existing schemes.

The curves in Fig. 4 indicate the normalized UEs’ profit in the UPN system. As game players, all UEs adaptively select their positions in a distributed online manner. From the viewpoint of a host, the main goal is to adaptively decide the price strategy to maximize his total revenue. From a client perspective, the main concern is to choose the best connection option to maximize his payoff. According to an interactive feedback mechanism, hosts and clients learn the current UPN environments and attempt to improve their profits. At every game period, our learning-based approach can provide synergistic and complementary features to adapt to dynamic UPN situations.

The simulation results shown in Figs. 2, 3, and 4 demonstrate that the proposed scheme, which uses a learning-based PCS game model, can monitor the current UPN conditions and adapt to highly dynamic system situations. In particular, all UEs in our approach gain real-time information from the UPN environment and make intelligent decisions in a self-adapting manner. The simulation results indicate that the proposed scheme attains an attractive UPN performance, something that the IDME [10] and MGDT [11] schemes cannot offer.
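The QoE model of Eq. (9) is a plain sigmoid of the summed indicators, which can be sketched as below. The function name `qoe` is hypothetical, and the inputs are assumed to be already normalized as described for κ, τ, and z.

```python
import math


def qoe(kappa, tau, z):
    """Eq. (9): sigmoid of the summed, normalized indicators."""
    return 1.0 / (1.0 + math.exp(-(kappa + tau + z)))
```

The output always lies strictly between 0 and 1 and grows with the sum κ + τ + z; when the three inputs sum to zero, the value is exactly 0.5.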

Fig. 2 Bandwidth utilization of the UPN system

Fig. 3 Service QoE in the UPN system

5 Conclusions
Today, technology can enable widespread communications without depending upon traditional network structures. From a global perspective, UPN has a tremendous potential and is introducing a paradigm shift in network services while allowing the end user to be a host or a client. In this article, we have proposed a new UPN control scheme based on the PCS game model. As game players, UEs decide their positions considering the mutual-interaction relationship and learn better strategies under dynamic UPN environments. Using feedback-based self-monitoring and distributed learning techniques, game players dynamically adapt to the current UPN situation and effectively maximize their expected benefits. Based on the unique features of UPNs, our proposed scheme can provide satisfactory services under incomplete information conditions. To demonstrate the validity of our approach, we compared our scheme with existing schemes. Our simulation analysis indicates that our approach can outperform the existing schemes in a simulation environment.

There are many fascinating directions for future work. Our next step is to develop a new mechanism design to provide adaptive incentives to hosts. It will differ from classical mechanism design by adopting distributional assumptions about the agents. For the solution to be viable, we will also consider computational constraints. Another interesting direction is to study the impact of users’ social relationships in a multi-client multi-host UPC model. In addition, our studies will address the optimality issues in the UPN system from the operator’s perspective.

Abbreviations
5G: 5 Generation; IDME: Incentive Design and Market Evolution; M2M: Machine-to-machine; MGDT: Matching Game-Based Data Trading; NBS: Nash bargaining solution; OBUs: On-board units; PCS: Position changeable Stackelberg; QoE: Quality of experience; QoS: Quality of Service; RRM: Radio resource management; RUs: Resource units; UEs: User equipments; UPN: User-provided network

Acknowledgements
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2014-0-00636) supervised by the IITP (Institute for Information & Communications Technology Promotion) and was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2015R1D1A1A01060835).

Funding
This study is funded by the MSIT (Ministry of Science and ICT) and the National Research Foundation of Korea (NRF).

Availability of data and materials
Please contact the corresponding author at [email protected].

Author’s information
Sungwook Kim received the BS and MS degrees in computer science from Sogang University, Seoul, in 1993 and 1995, respectively. In 2003, he received the PhD degree in computer science from Syracuse University, Syracuse, New York, supervised by Prof. Pramod K. Varshney. He is currently a professor in the Department of Computer Science & Engineering and is a research director of the Internet Communication Control research laboratory (ICC Lab.). His current research interests are in game theory and network design applications.

Competing interests
The author declares no competing interests.

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 17 August 2017 Accepted: 3 December 2017

Fig. 4 Normalized UEs’ profit in the UPN system

References
1. M Dighriri, ASD Alfoudi, GM Lee, T Baker, Data Traffic Model in Machine to Machine Communications over 5G Network Slicing, IEEE DeSE’ (2016), pp. 239–244
2. K-C Chen, Machine-to-machine Communications for Healthcare. JCSE 6(2), 119–126 (2012)
3. Y Wang, P Li, L Jiao, S Zhou, N Cheng, XS Shen, P Zhang, A data-driven architecture for personalized QoE management in 5G wireless networks. IEEE Wirel. Commun. 24(1), 102–110 (2017)
4. Y Wang, X Lin, User-provided networking for QoE provisioning in mobile networks. IEEE Wirel. Commun. 22(4), 26–33 (2015)
5. KS Kim, S Uno, MW Kim, Adaptive QoS mechanism for wireless mobile network. JCSE 4(2), 153–172 (2010)
6. R Sofia, P Mendes, User-provided networks: Consumer as provider. IEEE Commun. Mag. 46(12), 86–91 (2008)
7. B Lorenzo, F Gomez-Cuba, J Garcia-Rois, FJ Gonzalez-Castano, JC Burguillo, A Microeconomic Approach to Data Trading in User Provided Networks, IEEE Globecom’2015 (2015), pp. 1–7
8. S Kim, Game Theory Applications in Network Design (IGI Global, Hershey, 2014)
9. S Kim, Multi-leader multi-follower Stackelberg model for cognitive radio spectrum sharing scheme. Comput. Netw. 56(17), 3682–3692 (2012)
10. MM Khalili, L Gao, J Huang, BH Khalaj, Incentive Design and Market Evolution of Mobile User-Provided Networks, IEEE INFOCOM’2015 (2015), pp. 498–503
11. B Lorenzo, F Javier Gonzalez-Castano, A Matching Game for Data Trading in Operator-Supervised User-Provided Networks, IEEE ICC’2016 (2016), pp. 1–7
12. G Iosifidis, G Lin, J Huang, L Tassiulas, Incentive mechanisms for user-provided networks. IEEE Commun. Mag. 52(9), 20–27 (2014)
13. G Lin, G Iosifidis, J Huang, L Tassiulas, Hybrid Data Pricing for Network-Assisted User-Provided Connectivity, IEEE INFOCOM’2014 (2014), pp. 682–690
14. IW Mustika, I Nurcahyani, Proportional Fairness with Adaptive Bandwidth Allocation for Video Service in Downlink LTE, IEEE COMNESTAT’ (2015), pp. 54–59
15. MS Jeon, S-K Kim, J-H Yoon, JY Lee, S-B Yang, A direction entropy-based forwarding scheme in an opportunistic network. JCSE 8(3), 173–179 (2014)
16. S Kim, Adaptive MANET multipath routing algorithm based on the simulated annealing approach. Sci. World J. 2014, 1–9 (2014)
17. CT Hieu, CS Hong, A connection entropy-based multi-rate routing protocol for mobile ad hoc networks. JCSE 4(3), 225–239 (2010)
18. Y Wang, Y Liu, J Zhang, H Ye, Z Tan, Cooperative store–carry–forward scheme for intermittently connected vehicular networks. IEEE Trans. Veh. Technol. 66(1), 777–784 (2017)
19. JY Kim, I Bang, DK Sung, Y Yi, B-H Kim, Design of a Multi-Variable QoE Function Based on the Remaining Battery Energy, IEEE PIMRC’2015 (2015), pp. 976–980
20. C-K Tham, T Luo, Quality of contributed service and market equilibrium for participatory sensing. IEEE Trans. Mob. Comput. 14(4), 829–842 (2015)

