PHAN, PARK, AND VAN DER SCHAAR: NEAR-OPTIMAL … · 2018. 11. 19. · signals on the channel access outcomes. We show that with public signals it is possible to design near-optimal

arX

iv:1

006.

3782

v1 [

cs.N

I] 1

8 Ju

n 20

10PHAN, PARK, AND VAN DER SCHAAR: NEAR-OPTIMAL DEVIATION-PROOF MEDIUM ACCESS CONTROL DESIGNS IN WIRELESS NETWORKS 1

Near-Optimal Deviation-Proof Medium AccessControl Designs in Wireless Networks

Khoa Tran Phan, Jaeok Park, and Mihaela van der Schaar

Abstract—Distributed medium access control (MAC) protocolsare essential for the proliferation of low cost, decentralizedwireless local area networks (WLANs). Most MAC protocols aredesigned with the presumption that nodes comply with prescribedrules. However, selfish nodes have natural motives to manipulateprotocols in order to improve their own performance. This oftendegrades the performance of other nodes as well as that of theoverall system. In this work, we propose a class of protocolsthat limit the performance gain which nodes can obtain throughselfish manipulation while incurring only a small efficiency loss.The proposed protocols are based on the idea of a review strategy,with which nodes collect signals about the actions of other nodesover a period of time, use a statistical test to infer whetheror notother nodes are following the prescribed protocol, and trigger apunishment if a departure from the protocol is perceived. Weconsider the cases of private and public signals and provideanalytical and numerical results to demonstrate the propertiesof the proposed protocols.

Index Terms—Deviation-proof protocols, game theory, MACprotocols, repeated games.

I. I NTRODUCTION

I N WIRELESS communication networks, multiple nodesoften share a common channel and contend for access. To

resolve contention among nodes, many different MAC proto-cols have been devised and are currently used in internationalstandards (e.g., IEEE 802.11a/b/g protocols) [1]. When a MACprotocol is designed, two types of node behavior can be as-sumed. One iscooperativenodes that comply with prescribedprotocols, and the other isselfishnodes that are capable ofmanipulating prescribed protocols in order to improve theirown performance. With cooperative nodes, a MAC protocolcan be designed to optimize the system performance [2]–[6].However, such a protocol is not robust to selfish manipulationin that selfish nodes that can re-configure the software orfirmware may want to deviate from the protocol in pursuit oftheir self-interest [7]. Thus, selfish manipulation often resultsin a suboptimal outcome, different from the one desired bythe protocol designer [8]–[10]. On the other hand, a MACprotocol can be designed assuming selfish nodes so that theprotocol isdeviation-proof in the sense that selfish nodes donot find it profitable to deviate from the protocol. However,the incentive constraints imposed by the presence of selfishnodes in general restrict the system performance [5], [6]. Inthis paper, we aim to resolve the tension between the selfishmanipulation and optimal performance by proposing a class ofslotted MAC protocols that limit the performance gain fromselfish manipulation while incurring only a small efficiency

The authors are with Electrical Engineering Department, University ofCalifornia, Los Angeles (UCLA), 420 Westwood Plaza, Los Angeles, CA90095-1594, USA. e-mail:{kphan, jaeok, mihaela}@ee.ucla.edu.

loss compared to the optimal performance achievable withcooperative nodes.

Recently, a variety of slotted MAC protocols have beendesigned and analyzed using a game theoretic framework.With cooperative nodes, protocols can be designed to achievesystem-wide optimal outcomes. In [3], a class of slotted MACprotocols is proposed in which nodes can self-coordinate theirtransmission slots based on their past transmission actionsand feedback information to achieve a time division mul-tiple access (TDMA) outcome. In [4], the authors proposegeneralized slotted Aloha protocols that maximize systemthroughput given a short-term fairness constraint. In [5],[6],variations of slotted MAC protocols with different captureeffects, prioritization, and power diversity are studied.It hasbeen demonstrated that with cooperative nodes one can obtainoptimal throughput and expected delay as well as systemstability.

Selfish behavior in MAC protocols has also been analyzedusing game theory. In [11], the authors establish the stabilityregion for a slotted Aloha system with multipacket receptionand selfish nodes. In [12], the authors study the existence ofand convergence to Nash equilibrium in a slotted Aloha systemwhere selfish nodes have quality-of-service requirements.It isoften observed that selfish behavior often leads to suboptimaloutcomes. For example, a prisoners’ dilemma phenomenonarises among selfish nodes using the generalized slotted Alohaprotocols of [4]. A decrease in system throughput, especiallywhen the workload increases due to the selfish behaviorof nodes, is observed in [5], [6]. In the 802.11 distributedMAC protocol, competition among selfish nodes results in aninefficient use of the shared channel in Nash equilibria [8].

Research efforts have been made to devise MAC protocolsthat sustain optimal outcomes among selfish nodes. In [13],the authors induce selfish nodes to behave cooperatively in aslotted random access network by introducing an interveningnode that monitors the actions of nodes and decides itsintervention level accordingly. Pricing has also been usedas amethod to incentivize selfish nodes. In [5], the authors avoidthe degradation of system throughput due to selfish behaviorby adding a cost of transmissions and retransmissions. In [14],the network charges nodes for each successfully transmittedpacket, and the authors consider the problem of adjustingthe price-per-packet to achieve a desired operating point.Theabove approaches, however, require a central entity, whichmaynot be available in a distributed environment. In the case ofanintervention mechanism, an intervening node that is capable ofmonitoring and intervening should be present in the system.Inthe case of a pricing mechanism, a billing authority is neededto charge payments depending on the usage of the network.

http://arxiv.org/abs/1006.3782v1

PHAN, PARK, AND VAN DER SCHAAR: NEAR-OPTIMAL DEVIATION-PROOF MEDIUM ACCESS CONTROL DESIGNS IN WIRELESS NETWORKS 2

In this paper, we propose an approach that decentralizes intonodes the burden of monitoring and punishing.

To this end, we rely on the theory of repeated games[15] to sustain cooperation among selfish nodes. When thenodes in a system interact repeatedly, they can make theirdecisions dependent on their past observations. Thus, nodescan trigger a punishment when they observe a deviation froma predetermined operating point. If the loss due to punishmentoutweighs the gain from deviation, selfish nodes do not havean incentive to deviate from a predetermined operating point.The idea of using a repeated game strategy to build a deviation-proof protocol has recently been applied to several problems incommunications and networking (see, for example, [16]–[19]).However, most existing work assumes perfect monitoring,where players observe decisions that other players make. Withperfect monitoring, it is relatively easy to construct a deviation-proof protocol by using a trigger strategy, which is commonlyused to prove various versions of the Folk theorem.

In our work, we consider a scenario where the decisionsof nodes are their transmission probabilities, which cannotbe observed directly. In order to design a deviation-proofprotocol, we use the idea of a review strategy [20], [21] withwhich nodes collect imperfect signals about the decisions ofother nodes, perform a statistical test to determine whetheror not a deviation has occurred, and trigger a punishment ifthey conclude so. Our main contributions in this paper can besummarized as follows.

• We model a slotted multiple access communicationsscenario as a repeated game, which allows us to adopta repeated game strategy, including a review strategy, todesign a protocol.

• We first consider the case where nodes observe privatesignals on the channel access outcomes. We designdeviation-proof protocols assuming that a deviating nodecan employ only a deviation strategy using a constanttransmission probability. We provide a necessary andsufficient condition for a given protocol to be deviation-proof. We show that the efficiency loss of a deviation-proof protocol can be made arbitrarily small if there isa statistical test that becomes perfect as more signals areaccumulated.

• We also consider the case where nodes observe publicsignals on the channel access outcomes. We show thatwith public signals it is possible to design near-optimaldeviation-proof protocols even when nodes can use anydeviation strategy.

• Besides slotted MAC protocols, we provide a possibleapplication of our design methodology to the case ofCSMA/CA protocols with selfish nodes.

• We illustrate the properties of the proposed protocols withnumerical results.

The proposed protocols are fully distributed in the sense thatthey need no central entity to coordinate the operation of nodesand that nodes take actions depending solely on their own localinformation without communicating with other nodes.

The rest of this paper is organized as follows. In SectionII, we formulate a repeated game model for slotted multipleaccess communications. In Section III, we propose and analyze

deviation-proof protocols based on a review strategy whensignals are private, with an example presented in Section IV.In Section V, we investigate deviation-proof protocols whensignals are public, with an example presented in Section VI.In Section VII, we discuss a possible extension of the proposedprotocols to a CSMA/CA network with selfish nodes. Weconclude the paper in Section VIII.

II. REPEATEDGAME FRAMEWORK FORSLOTTED

MULTIPLE ACCESSCOMMUNICATIONS

A. Stage Game

We consider a wireless communication network with a setN = {1, 2, . . . , N} of N nodes interacting over time. Time isdivided into slots of equal length, and in each slot, a node hasa packet to transmit (i.e., saturated arrivals) and can attemptto send the packet or wait. Due to interference in the sharedcommunication channel, a packet is transmitted successfullyonly if there is no other packet transmitted in the same slot.If more than one transmission takes place in a slot, a collisionoccurs and no packet is transmitted successfully. We modelthe interaction of nodes in a single slot as a non-cooperativegame in normal form, called therandom access game.

The set of pure actions available to nodei ∈ N in a slotis Ai , {T,W}, whereT stands for “transmit” andW for“wait.” We denote the pure action of nodei by ai ∈ Ai and apure action profile bya , (a1, . . . , aN ) ∈ A ,

∏

i∈N Ai.A mixed action for nodei is a probability distribution onAi. Since there are only two pure actions, a mixed actionfor node i can be represented by a transmission probabilitypi ∈ [0, 1], and the set of mixed actions for nodei can bewritten asPi , [0, 1]. A mixed action profile is denoted byp , (p1, . . . , pN ) ∈ P ,

∏

i∈N Pi. The payoff function ofnodei is defined byui : A → R, whereui(a) = 1 if ai = Tandaj = W for all j 6= i andui(a) = 0 otherwise. That is, anode receives payoff 1 if it has a successful transmission and0 otherwise. Then, the expected payoff of a node is given bythe probability that it has a successful transmission, and witha slight abuse of notation, the payoff of nodei when mixedaction profilep is chosen can be written as

ui(p) = pi∏

j∈N\{i}

(1− pj).

The random access game is defined by the tupleΓ ,

〈N , (Ai)i∈N , (ui)i∈N 〉. It is well-known from the static anal-ysis of the random access game that there is at least one nodeichoosingpi = 1 at any pure strategy Nash equilibrium (NE)[9], [13]. That is, when nodes myopically maximize their ownpayoffs, there is at least one node always transmitting its pack-ets, and thus there can be at most one node obtaining a positivepayoff. Moreover, in the unique symmetric NE, every nodetransmits with probability 1, which results in zero payoff forevery node. On the other hand, the symmetric Pareto optimal(PO) outcome is achieved when each node choosespc = 1/N ,which yields a positive payoffuPO = (1 − 1/N)N−1/N forevery node [22]. We callpc the cooperation probabilityanduPO the optimal payoff.


B. Repeated Game

We now formulate the repeated random access game, wherethe actions of a node can depend on its past observations, orinformation histories. Time slots are indexed byt = 1, 2, . . ..At the end of each slot, nodes obtain signals on the pure actionprofile chosen in the slot. LetZi be the finite set of signalsthat nodei can receive. DefineZ ,

∏

i∈N Zi, and letQbe a mapping fromA to ∆(Z), whereQ(a) represents thedistribution of signals when nodes choose pure action profilea. A signal structureis specified by the pair(Z, Q). We saythat signals areprivate if there existz , (z1, . . . , zN ) ∈ Zand a ∈ A such thatzi 6= zj for some i, j ∈ N and z

occurs with positive probability inQ(a). We say that signalsare public if they are not private. That is, signals are privateif it is possible for nodes to receive different signals, whereassignals are public if signal realization is the same for all nodes.

The history of nodei in slot t, denoted byhti, contains the

signals that nodei has received by the end of slott− 1. Thatis, ht

i = (z0i , . . . , zt−1i ), for t = 1, 2, . . ., wherezti represents

the signal that nodei receives in slott and z0i is set as anarbitrary element ofZi.1 The set of slott histories of nodei is written asHt

i , and the set of all possible histories ofnodei is given byHi , ∪∞

t=1Hti . The (behavior) strategy of

nodei specifies a mixed action for nodei in the stage gameconditional on a history it reaches. Thus, it can be representedby a mappingσi : Hi → Pi. We useΣi to denote the set ofstrategies of nodei. We define aprotocolas a strategy profileσ , (σ1, . . . , σN ) ∈ Σ ,

∏

i∈N Σi. To evaluate payoffs inthe repeated game model, we use the limit of means criterionsince the length of a slot is typically short.2, A protocol σinduces a probability distribution on the sequences of mixedaction profiles{pt}∞t=1, wherept is the mixed action profile inslot t. The payoff of nodei under protocolσ can be expressedas

Ui(σ) = limJ→∞

E

[

1

J

J∑

t=1

ui(pt)

∣

∣

∣

∣

σ

]

,

assuming that the limit exists. If the limit does not exist, wereplace the operatorlim by lim inf .3

We say that a signal structure(Z, Q) is symmetric ifZ1 = · · · = ZN and the signal distributionQ is preservedunder permutations of indices for nodes. With a symmetricsignal structure, we haveH1 = · · · = HN and thusΣ1 =· · · = ΣN sinceP1 = · · · = PN . We say that a protocolσ issymmetric if it prescribes the same strategy to every node, i.e.,σ1 = · · · = σN . In the remainder of this paper, we assumethat the signal structure is symmetric and focus on symmetricprotocols. Since a symmetric protocol can be represented with

1In slot t ≥ 2, nodei also knows its past mixed actions(p1i , . . . , pt−1

i )

and their realizations(a1i , . . . , at−1

i ). However, since we focus on repeatedgame strategies using only past signals, we do not include them in our historyspecification.

2For example, the slot duration of the 802.11 DCF basic accessmethod is20µs [1].

3Although we consider the limit of means criterion, the following resultscan be extended with a complication to the case of the discounting criterionas long as the discount factor is close to 1, as in [20]. Our analysis can alsobe extended to the case where a node incurs a transmission cost whenever itattempts transmission, as long as the cost is small.

a strategy, we use the two terms “protocol” and “strategy”interchangeably. Also, we useU(σ1;σ2) to denote the payoffof a node when it follows strategyσ1 while every other nodefollows strategyσ2. Note that a symmetric protocol yieldsthe same payoff to every node, thus achieving fairness amongnodes.

C. Deviation-Proof Protocols and the Efficiency Loss

The goal of this paper is to build a protocol that fulfillsthe following two requirements: (i) selfish nodes do not gainfrom manipulating the protocol, and (ii) the protocol achievesan optimal outcome. We formalize the first requirement usingthe concept of deviation-proofness while evaluating the secondrequirement using the concept of efficiency loss.

Definition 1: A protocol σ ∈ Σi is deviation-proof (DP)against a strategyσ′ ∈ Σi if

U(σ;σ) ≥ U(σ′;σ).

Whenσ is DP againstσ′, a node cannot gain by deviating toσ′ while other nodes followσ. Hence, if a deviating node hasonly one possible deviation strategyσ′, a protocolσ that is DPagainstσ′ satisfies the first requirement. However, in principle,a deviating node can choose any strategy inΣi, in which casewe need a stronger concept than deviation-proofness.

Let Σc ⊂ Σi be the set of all constant strategies that pre-scribe a fixed transmission probabilitypd, called thedeviationprobability, in every slot regardless of the history.

Definition 2: A protocolσ ∈ Σi is robustǫ-deviation-proof(robustǫ-DP) if

U(σ;σ) + ǫ ≥ U(σ′;σ) for all σ′ ∈ Σc.

In words, if a protocolσ is robustǫ-DP, a node cannot gainmore thanǫ by deviating to a constant strategy using a fixeddeviation probability. If there is a fixed cost of manipulatinga given protocol and a deviation strategy is constrained toconstant strategies, then a robustǫ-DP protocol can preventa deviation by havingǫ smaller than the cost. When there isno restriction on possible deviation strategies, the followingconcept is relevant.

Definition 3: A protocol σ ∈ Σi is ǫ-Nash equilibrium(ǫ-NE) if

U(σ;σ) + ǫ ≥ U(σ′;σ) for all σ′ ∈ Σi.

We define the system payoff as the sum of the payoffs ofall the nodes in the system. Then, the system payoff whenall nodes follow a protocolσ is given byV (σ) , NU(σ;σ).SinceNuPO is the maximum system payoff in the stage gameachievable with a symmetric action profile, we measure theefficiency loss from using a protocol by the following concept.

Definition 4: The efficiency lossof a protocolσ ∈ Σi isdefined as

C(σ) = NuPO − V (σ). (1)

Definition 5: A protocolσ ∈ Σi is δ-Pareto optimal(δ-PO)if

C(σ) ≤ δ.


TABLE IMAIN RESULTS

Section Signal Test Robustness to selfish manipulation Optimality

III (Proposition 2) Private (general) Asymptotically perfect test DP against a strategy usingδ-POa constant transmission probability

IV (Theorem 2) Private (ACK feedback) ACK ratio test robustǫ-DP δ-PO

V (Proposition 5) Public (general) Asymptotically perfect testDP against a strategy using a constant

δ-POtransmission probability in a review phaseVI (Theorem 4) Public (ternary feedback) Idle slot ratio test ǫ-NE δ-PO

A δ-PO protocol is a protocol that yields an efficiency lossless than or equal toδ. Let σc be the strategy that prescribesthe cooperation probabilitypc in every slot regardless ofthe history.4 Then U(σc;σc) = uPO, and thusσc achievesfull efficiency (i.e.,0-PO). However,σc is not DP against aconstant deviation strategy withpd > pc as a deviating nodecan increase its payoff frompc(1−pc)

N−1 to pd(1−pc)N−1.

We construct DP protocols that achieve a near-optimal systempayoff in the following sections, whose main results aresummarized in Table I.

III. D EVIATION -PROOFPROTOCOLSWHEN SIGNALS ARE

PRIVATE

A. Description of Protocols with Private Signals

In this section, we consider private signals. As pointedout in [23], when signals are private, it is difficult, if notimpossible, to construct a NE that has a simple structure andis easy to compute. Thus, we focus on a simpler problemof constructing a DP protocol against a constant deviationstrategyσd ∈ Σc. Since a simple protocol such asσc isDP againstσd with pd ∈ [0, pc], we restrict our attention todeviation strategies withpd ∈ (pc, 1]. Note that the restrictionto constant deviation strategies is relevant when a deviatingnode has a limited deviation capability in the sense that it canreset its transmission probability only at the beginning.

We build a protocol based on a review strategy. When a nodeuses a review strategy, it starts from a review phase for whichit transmits with probabilitypc and collects signals. When thereview phase ends, the node performs a statistical test whosenull hypothesis is that every node transmitted with probabilitypc during the review phase, using the collected signals. Then,the node moves to a reciprocation phase for which it transmitswith probability pc (cooperation phase) if the test is passedand with probability 1 (punishment phase) if the test fails.When the reciprocation phase ends, a new review phase begins.A review strategy, denoted byσr, can be characterized bythree elements,(R,L,M), whereR is a statistical test, andL andM are natural numbers that represent the lengths of areview phase and a reciprocation phase, respectively. Thus,we sometimes writeσr as σr(R,L,M). With a protocolbased on review strategyσr(R,L,M), each node performsthe statistical testR after slot l(L + M) + L based on thesignals (zl(L+M)+1

i , . . . , zl(L+M)+Li ) collected in the recent

review phase, forl = 0, 1, . . .. A schematic representation ofa review strategy with private signals is provided in Fig. 1.

4Note that σc corresponds to a slotted Aloha protocol that does notdistinguish new and backlogged packets as in [12].

R i h (L l t )Review phase (L slots)

(cooperate and

collect signals)

Statistical test

fail pass

Punishment Cooperation

fail pass

Punishment

phase

(M slots)

Cooperation

phase

(M slots)

Fig. 1. Review strategy with private signals

The review strategies in [20] differ from the review strate-gies described above in that in [20] a new review phase beginswithout having a reciprocation phase if the test is passed. Akey difference between the model of [20] and ours is that inthe principal-agent model of [20] only the principal reviewsthe performance of the agent whereas in our model multiplenodes simultaneously review the performance of other nodes.When signals are private, nodes do not know the results ofthe test performed by other nodes. Hence, without a recipro-cation phase followed by a successful review, nodes cannotdistinguish a deviating node from a punishing node and thuscannot coordinate to begin a new review phase. This problemcan be avoided when signals are public, because the resultsof the test are the same across nodes in the case of publicsignals.5 Thus, a review strategy is modified accordingly inSection V, where we consider public signals.

B. Analysis of Protocols with Private Signals

1) Existence of Deviation-Proof Protocols:For the sake ofanalysis, we consider a fixed constant deviation strategyσd ∈Σc and the corresponding deviation probabilitypd ∈ (pc, 1].Given a symmetric protocol that prescribes a review strategy,we can compute two probabilities of errors.

• False punishment probabilityPf (R,L): probability thatthere is at least one node whose test fails after a reviewphase when nodes follow a protocolσr.

• Miss detection probabilityPm(R,L; pd): probability thatthere is no node among those followingσr whose testfails after a review phase when there is exactly one nodedeviating toσd.

5Alternatively, this problem can be avoided by having a node that has afailed test broadcast that it moves to a punishment phase, asin [21]. However,this requires communication among nodes, which we do not allow in thispaper.


Since the payoff of every node is zero when there are twoor more punishing nodes, we need to have a small falsepunishment probability to achieve a small efficiency loss. Onthe other hand, in order to punish a deviating node effectively,we need to have a small miss detection probability. Indeed, aswill be shown in Proposition 2, achieving smallPf andPm

is sufficient to design a near-optimal DP protocol.The payoff of a node when every node follows a review

strategyσr is given by

U(σr;σr) =(1− pc)

N−1

L+M

(

pcL+ pc (1− Pf )M

+(

(1− Pf )N−1

N

(

1− (1− Pf )1

N

))

M

)

.

The payoff of a node choosing deviation strategyσd whileother nodes followσr is given by

U(σd;σr) =pd(1− pc)

N−1

L+M(L+ PmM) .

By Definition 1,σr is DP againstσd if and only if

U(σr;σr) ≥ U(σd;σr). (2)

The following theorem provides a necessary and sufficientcondition for a review strategy to be DP againstσd.

Theorem 1:Given pd ∈ (pc, 1], protocol σr(R,L,M) isDP againstσd if and only if g(R,L; pd) > 0 and M ≥Mmin(R,L; pd), where

g(R,L; pd) , (1− Pf (R,L))N−1

N − (1− pc) (1− Pf (R,L))

− pdPm(R,L; pd) (3)

and

Mmin(R,L; pd) ,(pd − pc)L

g(R,L; pd).

Proof: Note that the net payoff gain from deviating to thedeviation strategyσd is given by

U(σd;σr)− U(σr;σr)

=(1− pc)

N−1

L+M

(

(pd − pc)L− g(R,L; pd)M)

. (4)

The first term in (4) is the gain during a review phase whilethe second term is the loss during a reciprocation phase.By (2), σr is DP againstσd if and only if (pd − pc)L ≤g(R,L; pd)M . It is easy to check thatg(R,L; pd) > 0 andM ≥ Mmin(R,L; pd) imply (pd − pc)L ≤ g(R,L; pd)M .Suppose that(pd−pc)L ≤ g(R,L; pd)M . Since(pd−pc)L >0, we must haveg(R,L; pd) > 0, which in turn impliesM ≥ Mmin(R,L; pd).

Theorem 1 shows that for a given statistical testR, wecan construct a DP protocol based on the test if and only ifthere exists a natural numberL such thatg(R,L; pd) > 0.Once we find suchL, we can use it as the length of areview phase and then choose a natural numberM satisfyingM ≥ Mmin(R,L; pd) to determine the length of a recipro-cation phase. An immediate consequence of Theorem 1 isthat if protocolσr(R,L,M) is DP againstσd, then protocolσr(R,L,M ′) with M ′ ≥ M is also DP againstσd. Thus,Mmin(R,L; pd) can be interpreted as the minimum length of a

reciprocation phase to makeσr(R,L,M) DP againstσd. Thefollowing result provides a sufficient condition onR underwhich we can findL such thatg(R,L; pd) > 0 and thus a DPprotocol based onR can be constructed.

Corollary 1: Given pd ∈ (pc, 1], suppose thatR satisfieslimL→∞ Pf (R,L) = 0 and limL→∞ Pm(R,L; pd) = 0. Thenthere existsL such thatg(R,L; pd) > 0.

Proof: By (3), limL→∞ Pf (R,L) = 0 andlimL→∞ Pm(R,L; pd) = 0 imply that limL→∞ g(R,L; pd) =pc > 0. Thus,g(R,L; pd) > 0 for sufficiently largeL.

Combining Theorem 1 and Corollary 1, we can see thatif test R is “asymptotically perfect” in the sense that thetwo probabilities of errors converge to zero as the test isperformed using more signals, then we can always design areview strategy based onR that is DP againstσd.

2) Near-Optimal Deviation-Proof Protocols:Suppose thatevery node follows a review strategyσr . Since signals provideonly imperfect information about the transmission probabil-ities of other nodes, it is possible that a punishment istriggered, which results in an efficiency loss as confirmed inthe following proposition. We useΣr to denote the set of allreview strategies with private signals.

Proposition 1: C(σr) ≥ 0 for all σr ∈ Σr (with equality ifand only ifPf = 0).

Proof: Fix a protocolσr(R,L,M) ∈ Σr. By (1), we canexpress the efficiency loss ofσr as

C(σr) =NM

L+M(1− pc)

N−1

(

pcPf − (1 − Pf )N−1

N + (1 − Pf ))

. (5)

Since(1 − Pf )N−1

N is concave, we have(1− Pf )N−1

N ≤ 1−N−1N Pf for Pf ∈ [0, 1], with equality if and only ifPf = 0.

Using pc = 1/N , we obtain the result.Proposition 1 says that there is always a positive efficiency

loss resulting from a review strategy unless there is a perfectstatistical test in the sense that punishment is never triggeredwhen every node followsσr (i.e.,Pf = 0). Punishment resultsin an efficiency loss because the system payoff is the sameasNuPO when there is only one punishing node while it iszero when there are two or more. Hence, a longer punishmentinduces a larger efficiency loss. As can be seen from (5),for given R andL, C(σr) is non-decreasing (and increasingif Pf > 0) in M . Therefore, if we find(R,L) such thatg(R,L; pd) > 0, choosingM = ⌈Mmin(R,L; pd)⌉ minimizesthe efficiency loss while havingσr(R,L,M) DP againstσd,where⌈·⌉ denotes the ceiling function. This observation allowsus to reduce the design choice from(R,L,M) to (R,L). Thefollowing proposition provides a sufficient condition on thestatistical test for constructing a near-optimal DP protocol.

Proposition 2: Given pd ∈ (pc, 1], suppose thatR satisfieslimL→∞ Pf (R,L) = 0 and limL→∞ Pm(R,L; pd) = 0. Thenfor any δ > 0, there existL andM such thatσr(R,L,M) isDP againstσd andδ-PO.

Proof: Since limL→∞ g(R,L; pd) = pc > 0, there existsL1 such thatg(R,L; pd) > 0 for all L ≥ L1. By Theorem 1,σr(R,L, ⌈Mmin(R,L; pd)⌉) is DP againstσd for all L ≥ L1.


SinceC(σr) is non-decreasing inM , we have

0 ≤ C(σr) ≤N (Mmin(R,L; pd) + 1)

L+ (Mmin(R,L; pd) + 1)(1− pc)

N−1

[

pcPf − (1 − Pf )N−1

N + (1− Pf )]

. (6)

Note that limL→∞ Mmin(R,L; pd)/L = (pd − pc)/pc, andthus the right-hand side of (6) converges to zero asL goes toinfinity, which implies limL→∞ C(σr) = 0. Therefore, thereexistsL2 such thatC(σr) < δ for all L ≥ L2. ChooseL ≥max{L1, L2} andM = ⌈Mmin(R,L; pd)⌉ to obtain a protocolwith the desired properties.

Proposition 2 shows that the efficiency loss of a DP protocolcan be made arbitrarily small when there is an asymptoticallyperfect statistical test. It also points out a trade-off betweenoptimality and implementation cost. In order to make theefficiency loss within a small desired level,L should be chosensufficiently large, which requires largeM by the relationshipM = ⌈Mmin(R,L; pd)⌉. At the same time, asL and Mbecome larger, each node needs to maintain longer memory toexecute a review strategy, which can be considered as higherimplementation cost.

We make a couple of remarks. First, the constructed DPprotocols are DP against multiple nodes deviating toσd.The payoff gain from deviation decreases with the numberof deviating nodes. Hence, if a protocol can deter a singlenode from deviating toσd, it can also deter multiple nodesfrom doing so. Second, the constructed DP protocols areDP against a more general class of deviation strategies withwhich a permanent deviation topd occurs in an arbitrary slot(determined deterministically or randomly). A deviating nodecannot gain starting from a review phase after a deviationoccurs, and without discounting its temporary gain is smallerthan the perpetual loss.

IV. PROTOCOLSBASED ON THEACK RATIO TEST

A. Description of the ACK Signal Structure and ProtocolsBased on the ACK Ratio Test

In this section, we illustrate the results in Section IIIby considering a particular signal structure and a particularstatistical test. In the slotted Aloha protocol in [24], a nodereceives an acknowledgement (ACK) signal if it transmits itspacket successfully and no signal otherwise. In the ACK signalstructure, the signal space can be written asZi = {S, F}, forall i ∈ N , wherezi = S means that nodei receives an ACKsignal andF means that it does not. We assume that thereis no error in the transmission and reception of ACK signals.The signal distributionQ is such thatQ(a) puts probabilitymass 1 onz ∈ Z with zi = S and zj = F for all j 6= iif ai = T and aj = W for all j 6= i, for eachi ∈ N , andprobability mass 1 onzi = F for all i otherwise. Since it ispossible for nodes to receive different signals (only one nodereceives signalS when a success occurs), ACK signals areprivate.

In a review strategy with the ACK signal structure, a nodeuses its ACK signals collected in a review phase to perform astatistical test. We propose a particular statistical testcalled theACK ratio test. The test statistic of the ACK ratio test is the

R

1

R

2

{F}

….

C

1

C

M

P

1

P

M

….

….

Start

{S,F}

{F}

{S}

{S,F}

{S,F} {S,F}

Review

Phase

Reciprocation

Phase

R

L+1

R

2L

R

L

{F}

R

2L-1R

3L-3

….

{S, F}{F}

{S}

{S}

{S,F}{F}

{F}

{S,F} {S}

{S}

{S,F}

{S}

{S,F}

{S,F}

R

L+2

{F}{S}

….

….

Fig. 2. Automaton representation of a review strategy basedon the ACKratio test with parameters satisfying1 ≤ L(qc −B) < 2.

ratio of the number of ACK signals obtained in a review phaseto the length of a review phase, i.e.,

∑Lk=1 χ{z

τ+ki = S}/L,

whereχ is an indicator function andτ + 1 represents a slotwhen a review phase begins. The test is passed if the statisticexceeds a threshold value,qc−B, whereqc , pc(1− pc)

N−1

and B ∈ (0, qc), and fails otherwise. Note thatqc is theexpected value of the ACK ratio when every node transmitswith probabilitypc. If there is a deviating node, the ACK ratiotends to be smaller because its expected value is reduced toqd , pc(1−pc)

N−2(1−pd). The ACK ratio test is designed todistinguish between these two events statistically while havingB as a “margin of error.” Since the ACK ratio test can beidentified with B, we useB instead ofR to represent theACK ratio test.

A review strategy based on the ACK ratio test,σr(B,L,M),can be represented formally as follows:

σr(hti) =

pc, t ∈ [l(L+M) + 1, l(L+M) + L],1, t ∈ [l(L+M) + L+ 1, (l+ 1)(L+M)],

∑l(L+M)+Lk=l(L+M)+1 χ{z

ki = S}/L ≤ qc −B

pc, t ∈ [l(L+M) + L+ 1, (l+ 1)(L+M)],∑l(L+M)+L

k=l(L+M)+1 χ{zki = S}/L > qc −B

for l = 0, 1, . . .. Fig. 2 shows an automaton representation ofthe review strategyσr for 1 ≤ L(qc −B) < 2 so that a nodetriggers punishment if it obtains less than two successes ina review phase. Each state transition is labeled by the set ofsignals that induce the transition. In a reciprocation phase, anode goes through either states P1 to PM (punishment phase)or states C1 to CM (cooperation phase) depending on thenumber of ACK signals obtained in the review phase. Notethat the number of states in the automaton representation ofprotocol σr(B,L,M) is given by Ns(σ

r) = kL − k(k −1)/2 + 2M , wherek ≥ 2 is the natural number satisfyingk − 2 ≤ L(qc −B) < k − 1.


B. Analytical Results

Let F (y;n, p) be the cumulative distribution function of abinomial random variable with total number of trialsn andprobability of successp, i.e.,

F (y;n, p) =

⌊y⌋∑

m=0

(

n

m

)

pm(1− p)n−m,

where⌊·⌋ denotes the floor function. Suppose that every nodetransmits with probabilitypc in a review phase. Then, thenumber of ACK signals that a node receives in the reviewphase follows a binomial distribution with parametersL andqc. Thus, the probability that a punishment is triggered bynodei is given by

Pr

{

L∑

k=1

χ{zτ+ki = S}/L ≤ qc −B

}

= F (L(qc −B);L, qc),

and the false punishment probability is given by

Pf (B,L) = 1− [1− F (L(qc −B);L, qc)]N.

Suppose that there is exactly one deviating node usingσd, i.e.,transmitting with probabilitypd in a review phase. Then, thesuccess probability in the binomial distribution changes to qd,and thus the miss detection probability is given by

Pm(B,L; pd) = [1− F (L(qc −B);L, qd)]N−1

.

The monotonicity ofPf and Pm with respect to the testparameterB is readily obtained.

Proposition 3: Given pd ∈ (pc, 1] and L, Pf (B,L) andPm(B,L; pd) are non-increasing and non-decreasing inB ∈(0, qc), respectively.

Proof: The proof is straightforward by noting thatF (L(qc − B);L, qc) and F (L(qc − B);L, qd) are non-increasing inB ∈ (0, qc).

As the margin of error is larger, it is more likely that thetest is passed, yielding a smaller false punishment probabilityand a larger miss detection probability. The following lemmaexamines the asymptotic properties ofPf and Pm as Lbecomes large.

Lemma 1:Given pd ∈ (pc, 1], limL→∞ Pf (B,L) = 0for all B ∈ (0, qc), limL→∞ Pm(B,L; pd) = 0 for allB ∈ (0, qc − qd), and limL→∞ Pm(B,L; pd) = 1 for allB ∈ (qc − qd, qc).

Proof: Since χ{zτ+ki = S}, for k = 1, . . . , L, can be

considered asL i.i.d. random variables, we can apply thestrong law of large numbers to the ACK ratio [25]. When everynode transmits with probabilitypc, the ACK ratio convergesalmost surely toqc asL goes to infinity, which implies thatthe false punishment probability goes to zero for allB > 0.When there is exactly one node transmitting with probabilitypd, the ACK ratio of a node transmitting with probabilitypcconverges almost surely toqd asL goes to infinity. Hence, ifqd < qc−B (resp.qd > qc−B), the miss detection probabilitygoes to zero (resp. one).

Lemma 1 provides a sufficient condition on the ACK ratiotest to apply Proposition 2.

Proposition 4: Suppose thatB ∈ (0, qc − qd). For anyδ >0, there existL andM such thatσr(B,L,M) is DP againstσd andδ-PO.

Proof: The proposition follows from Lemma 1 and Propo-sition 2.

Proposition 4 states that for givenpd ∈ (pc, 1], we canconstruct a protocolσr that is DP againstσd and achieves anarbitrarily small PoS by settingB such that0 < B < qc−qd =pc(1− pc)

N−2(pd − pc). Note that aspd is larger, it is easierto detect a deviation, and thus we have a wider range ofBthat renders deviation-proofness and near-optimality.

So far we have considered a constant deviation strategyσd

prescribing a fixed deviation probabilitypd and designed aprotocol that is DP againstσd. However, it is natural to regardpd as a choice by a deviating node, and thus in principle itcan be any probability. Now we allow the possibility that adeviating node can use any constant deviation strategy, andweobtain the following result.

Theorem 2:For anyǫ > 0 andδ > 0, there existB, L, andM such thatσr(B,L,M) is robustǫ-DP andδ-PO.

Proof: The proof is relegated to Appendix A.We can interpretǫ andδ as performance requirements. Re-

quiring smallerǫ makes protocols more robust while requiringsmallerδ results in a higher system payoff. In addition to thetrade-off between optimality and implementation cost alreadymentioned following Proposition 2, we can identify a similartrade-off between robustness and implementation cost in thatsmaller ǫ in general requires largerL andM to construct arobustǫ-DP protocol.

C. Numerical Results

To provide numerical results, we consider a network with5 nodes, i.e.,N = 5 and pc = 1/N = 0.2. Fig. 3 plots thefalse punishment probabilityPf (B,L) and the miss detectionprobabilityPm(B,L; pd) while varying the length of a reviewphaseL. Fig. 3(a) shows thatPf (B,L) exhibits a decreasingtendency asL increases, with discontinuities occurring at thepoints where the floor function ofL(qc − B) has a jump.We can also see thatPf is smaller for largerB, as shown inProposition 3. The upper threshold for the parameterB to yieldlimL→∞ Pm(R,L; pd) = 0 in Lemma 1 isqc − qd = 0.0512for pd = 0.7. We can see from Fig. 3(b) thatPm(B,L; pd)approaches 0 asL becomes large whenB is smaller thanthis threshold, whereas it approaches 1 whenB exceeds thethreshold. Fig. 3(b) also shows that, for fixedB, Pm(B,L; pd)is smaller for largerpd, i.e., as the deviation becomes greedier,it is more likely to be detected.

Fig. 4 plots the relationship between the length of a reviewphaseL and the minimum length of a reciprocation phase⌈Mmin(B,L; pd)⌉ to have a DP protocol for different valuesof B and pd. In Fig. 4(a), we fixpd = 0.7 and considerB = 0.04 and 0.06. Note that whenB = 0.06, some valuesof L result in large minimum values ofM , which are notdisplayed in Fig. 4(a). Also, the values ofL with which noDP protocol can be constructed for givenB and pd (i.e.,g(B,L; pd) ≤ 0) are indicated with⌈Mmin(B,L; pd)⌉ = 0in Fig. 4(a). For example, we cannot construct a DP protocol


20 40 60 80 100 120 140 160 180 2000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

L

Pf(B

,L)

B = 0.04

B = 0.06

(a)

20 40 60 80 100 120 140 160 180 2000

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

L

Pm

(B,L

;pd)

B = 0.04, pd = 0.7

B = 0.06, pd = 0.7

B = 0.06, pd = 0.85

(b)

Fig. 3. Pf (B,L) andPm(B,L; pd) versus the length of a review phaseL.

20 40 60 80 100 120 1400

100

200

300

400

500

600

700

L

⌈Mm

in(B

,L

;pd)⌉

B = 0.04

B = 0.06

(a)

20 40 60 80 100 120 140 160 180 2000

100

200

300

400

500

600

700

L

⌈Mm

in(B

,L;p

d)⌉

pd = 0.7

pd = 0.85

(b)

Fig. 4. The minimum length of a reciprocation phase⌈Mmin(B,L; pd)⌉ versus the length of a review phaseL: (a) pd = 0.7, and (b)B = 0.04.

10 20 30 40 50 60 70 80 90 1000

0.05

0.1

0.15

0.2

0.25

L

C(σ

r )

B = 0.04

B = 0.06

(a)

10 20 30 40 50 60 70 80 90 1000

0.05

0.1

0.15

0.2

0.25

L

C(σ

r )

pd = 0.6

pd = 0.85

(b)

Fig. 5. Efficiency lossC(σr) versus the length of a review phaseL: (a) pd = 0.7, and (b)B = 0.04.

using L such that 42 ≤ L ≤ 45 or 84 ≤ L ≤ 91when B = 0.06 and pd = 0.7. When B = 0.04, we canconstruct a DP protocol using anyL ≥ 10. In Fig. 4(b),we fix B = 0.04 and considerpd = 0.7 and 0.85. For theconsidered values ofpd, we observe that the minimum lengthof a reciprocation phase is increasing inpd for the most valuesof L. Also, in general, a longer review phase requires a longerreciprocation phase for fixedpd although a reverse relationshipmay be obtained, especially whenL is small. Note thatL and⌈Mmin(B,L; pd)⌉ have a linear relationship in the limit sincelimL→∞ Mmin(B,L; pd)/L = (pd − pc)/pc.

Fig. 5 plots efficiency lossC(σr) against the length of areview phaseL when the length of a reciprocation phase ischosen as⌈Mmin(B,L; pd)⌉ for different values ofB andpd.The points where efficiency loss is shown as 0 in Fig. 5(a)are where no DP protocol exists for the given parameters.We can observe that asL increases, efficiency loss tends todecrease to 0, which is consistent with Proposition 4. Fig.5(a) shows that for fixedpd = 0.7, efficiency loss is smallerwhen B = 0.06 than whenB = 0.04. This is because thefalse punishment probability of the former case is smaller thanthat of the latter case as shown in Fig. 3(a). Fig. 5(b) shows


TABLE IIPARAMETERS AND THE EFFICIENCY LOSS OFOPTIMAL PROTOCOLS

pd 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1(L,M) (22,101) (23,101) (23,94) (23,91) (23,90) (23,92) (23,96) (23,102) (22,106)C(σr) 0.0570 0.0490 0.0483 0.0480 0.0479 0.0481 0.0485 0.0490 0.0575

that efficiency loss is almost the same for the two considereddeviation probabilities whenB = 0.04.

D. Deviation-Proof Protocols with Complexity Considerations

1) Protocol Design Problem with a Complexity Constraint:So far we have explored the possibility of constructing near-optimal deviation-proof protocols based on a review strategy.We mention briefly how to incorporate complexity consid-erations in the protocol design problem. One approach tomeasure the complexity of a repeated game strategy is touse the number of the states of the smallest automaton thatcan implement the strategy [26]. Thus, we can formulate thefollowing protocol design problem, assuming that the deviationstrategy is fixed asσd.

minimize C(σr(B,L,M))

subject to σr is DP againstσd (7)

Ns(σr) ≤ Ns

The second constraint can be interpreted as a complexityconstraint which bounds the number of states in the automatonrepresentation ofσr. Without a complexity constraint, effi-ciency loss can be made arbitrarily small while satisfying thefirst constraint by choosing sufficiently largeL, as shown inProposition 4. Thus, the second constraint preventsL fromgrowing without bound.

2) Protocol Design Method:We propose a method to findan optimal protocol that solves the protocol design problem(7).

• Step 1. Determine a finite setB ⊂ (0, qc) as the set ofpossible values ofB.

• Step 2. Fix B ∈ B. Identify the set of feasible(L,M) inthe sense that(L,M) satisfies the second constraint of(7) givenB.

• Step 3. Fix feasibleL, and check whetherg(B,L; pd) in(3) is positive. If so, chooseM as the smallest feasiblevalue of M larger than or equal toMmin(B,L; pd),which we denote byM(B,L), if such a value exists.Then,σr(B,L,M(B,L)) is a protocol that satisfies bothconstraints of (7).

• Step 4. By varying B and L, obtain protocols thatsatisfy both constraints. Among these protocols, choosea protocol that yields the smallest efficiency loss.

As an illustrative example, we considerN = 5 and setNs = 28 = 256 so that protocols can be implemented using 8-bit memory. For simplicity, we fixB at 0.04, i.e.,B = {0.04}.Table II presents the parameters(L,M) and the efficiency lossof optimal protocols for different deviation probabilities. Wecan see that the optimal protocols have different parametersfor different values ofpd. Due to jumps in the efficiencyloss curves as shown in Fig. 5, the optimal protocols do notnecessarily have the longest possible review phase.

V. DEVIATION -PROOFPROTOCOLSWHEN SIGNALS ARE

PUBLIC

A. Motivation

As mentioned in Section III-A, when signals are private,nodes do not know the results of the test that other nodesperform. Hence, nodes need to have a reciprocation phaseregardless of the results of the test in order to synchronizethe beginning of a review phase across nodes. However, thisstructure of a review strategy creates a weakness that can beexploited by a deviating node. A deviating node can cooperatein a review phase to avoid punishment and then defect in areciprocation phase to obtain a payoff gain. To exclude sucha “smart” deviation, in Sections III and IV we have focusedon constant deviation strategies when designing DP protocols.However, this complication does not arise when signals arepublic. Since the result of the test is commonly known amongnodes, a reciprocation phase can be skipped when the testis passed, eliminating the room for exploitation. This addedrobustness of protocols with public signals can be regardedas thevalue of public signalswhen the signal structure is adesign choice.

B. Description of Protocols with Public Signals

When signals are public, nodes receive a common signal,and thus we usezt, without subscripti, to denote the signalin slot t. A review strategy with public signals is the sameas the one described in Section III-A except that there isno cooperation phase. That is, a new review phase beginsimmediately if the statistical test is passed. If the test fails,a punishment phase occurs as before. Since we focus onsymmetric protocols, all nodes use the same statistical test andperform the test based on the same signals. Hence, all nodesobtain the same result of the test, and thus they are alwaysin the same phase. We useσr(R,L,M) to denote the reviewstrategy with public signals that uses testR and hasL andM as the lengths of a review phase and a punishment phase,respectively.

C. Analysis of Protocols with Public Signals

We first consider a fixed deviation strategyσd that has thesame structure as the prescribed review strategyσr . That is, adeviating node transmits with probabilitypd in a review phaseand withpr in a punishment phase. Since no node obtains apositive payoff in a punishment phase, the choice ofpr doesnot affect the analysis, and thus for analysis onlypd matters.For the same reason as in Section III, we focus on the casewherepd > pc.

As in the case of private signals, we can compute two prob-abilities of errors: the false punishment probabilityPf (R,L)and the miss detection probabilityPm(R,L; pd). Since a


punishment phase occurs with probabilityPf and results inzero payoff for every node when all nodes follow a reviewstrategy, we have

U(σr ; σr) =Lqc

L+ PfM.

Note that(L+PfM) is the average length of an epoch, definedas a review phase and the following punishment phase if oneexists, andLqc is the accumulated expected payoff for a nodein an epoch. The payoff of a node choosing deviation strategyσd while other nodes followσr is given by

U(σd; σr) =Lqd

L+ (1− Pm)M.

The efficiency loss ofσr can be computed as

C(σr) =NPfMqc

L+ PfM, (8)

which is always nonnegative (positive ifPf > 0). Note that thenonnegativity of the efficiency loss does not requirepc = 1/N ,unlike in the case of private signals (see Proposition 1). Thefollowing theorem is an analogue of Theorem 1 for the caseof public signals.

Theorem 3:Given pd ∈ (pc, 1], protocol σr(R,L,M) isDP againstσd if and only if g(R,L; pd) > 0 and M ≥Mmin(R,L; pd), where

g(R,L; pd) , pc(

1− Pm(R,L; pd))

− pdPf (R,L)

and

Mmin(R,L; pd) ,(pd − pc)L

g(R,L; pd).

Proof: U(σr; σr) ≥ U(σd; σr) if and only if (pd −pc)L ≤ g(R,L; pd)M . Note that (1 − pc)

N−1(pd − pc)Lis the gain from deviation in a review phase while(1 −pc)

N−1g(R,L; pd)M is the expected loss from deviation ina punishment phase. The result can be obtained by using asimilar argument as in the proof of Theorem 1.

Theorem 3 shows that for a given statistical testR, we canconstruct a DP protocol based on the test if and only if thereexists a natural numberL such thatg(R,L; pd) > 0. Once wefind suchL, we can use it as the length of a review phaseand then chooseM larger than or equal toMmin(R,L; pd)to determine the length of a punishment phase. SinceC(σr)is non-decreasing inM for fixed R and L as can be seenfrom (8), the efficiency loss is minimized for given(R,L)by settingM = ⌈Mmin(R,L; pd)⌉ so that the length of apunishment phase is just enough to prevent deviation. Again,this observation reduces the design choices for a reviewstrategy from(R,L,M) to (R,L). The next result is ananalogue of Proposition 2, showing that if an asymptoticallyperfect statistical test is available, we can construct a near-optimal DP protocol.

Proposition 5: Given pd ∈ (pc, 1], suppose thatR satisfieslimL→∞ Pf (R,L) = 0 and limL→∞ Pm(R,L; pd) = 0. Thenfor any δ > 0, there existL andM such thatσr(R,L,M) isDP againstσd andδ-PO.

Proof: The proof is similar to that of Proposition 2, andthus is omitted for brevity.

R

1

R

2

{1,e}

P

1

P

M

…. …

.

Start

{1,e}

{0}

Review

Phase

Punishment

Phase

R

L+1

R

2L

R

L

{1,e}

R

2L-1R

3L-3

….

{0,1,e}{1,e}

{0}

{0}

{0,1,e}{1,e}

{1,e}

{0}

{0}

{0} {0,1,e}

R

L+2

{1,e}{0}

{0,1,e}

{0,1,e}

{0,1,e}

{0,1,e}

….

….

Fig. 6. Automaton representation of a review strategy basedon the idle slotratio test with parameters satisfying1 ≤ L(qc −B) < 2.

VI. PROTOCOLSBASED ON THE IDLE SLOT RATIO TEST

A. Description of the Ternary Signal Structure and ProtocolsBased on the Idle Slot Ratio Test

To illustrate the results in Section V, we consider the ternarysignal structure as in [27], [28], whose signal space can bewritten asZ = {0, 1, e}. Nodes receive signal 0 if the slotis idle, 1 if there is a success, ande if there is a collision.Signals under the ternary signal structure are public becausenodes always receive a common signal. We consider a reviewstrategy with which nodes use the fraction of idle slots in areview phase, or the idle slot ratio, as the test statistics.Ifevery node transmits with probabilitypc, the expected valueof the idle slot ratio isqc , (1 − pc)

N . On the other hand,if there is exactly one deviating node that transmits withprobability pd during a review phase, the expected value isreduced toqd , (1− pd)(1− pc)

N−1. The idle slot ratio testis passed if the idle slot ratio,

∑Lk=1 χ{z

τ+k = 0}/L, exceedsa threshold value,qc − B, and fails otherwise. Fig. 6 showsan automaton representation of a review strategyσr whoseparameters satisfying1 ≤ L(qc − B) < 2. State transitionoccurs depending on the received signals, as depicted in Fig. 6.When a review phase ends, nodes either start a new reviewphase or move to a punishment phase depending on whetherthe number of idle slots in the review phase exceedsL(qc−B)or not.

B. Analytical Results

Suppose that every node follows a review strategy basedon the idle slot ratio test,σr(B,L,M). Then, every nodetransmits with probabilitypc in a review phase, and thenumber of idle slots occurring in a review phase follows abinomial distribution with parametersL andqc. Thus, the falsepunishment probability is given by

Pf (B,L) = F (L(qc −B);L, qc).


Since a deviating node using transmission probabilitypdchanges the “success probability” of the binomial distributionfrom qc to qd, the miss detection probability is given by

Pm(B,L; pd) = 1− F (L(qc −B);L, qd).

The monotonicity ofPf and Pm with respect to the marginof errorB is stated as follows.

Proposition 6: Given pd ∈ (pc, 1] and L, Pf (B,L) andPm(B,L; pd) are non-increasing and non-decreasing inB ∈(0, qc), respectively.

Proof: The proof is straightforward by noting thatF (L(qc−B);L, qc) andF (L(qc−B);L, qd) is non-increasingin B ∈ (0, qc).

The next lemma examines the asymptotic properties ofPf

and Pm asL becomes large.Lemma 2:Given pd ∈ (pc, 1], limL→∞ Pf (B,L) = 0

for all B ∈ (0, qc), limL→∞ Pm(B,L; pd) = 0 for allB ∈ (0, qc − qd), and limL→∞ Pm(B,L; pd) = 1 for allB ∈ (qc − qd, qc).

Proof: We use the same approach as in the proof ofLemma 1. When every node transmits with probabilitypc,the idle slot ratio converges almost surely toqc asL goes toinfinity, which implies that the false punishment probabilitygoes to zero for allB > 0. When there is exactly one nodetransmitting with probabilitypd, the idle slot ratio convergesalmost surely toqd asL goes to infinity. Hence, ifqd < qc−B(resp.qd > qc−B), the miss detection probability goes to zero(resp. one).

Lemma 2 gives a sufficient condition on the idle slot ratiotest to apply Proposition 5.

Proposition 7: Suppose thatB ∈ (0, qc − qd). For anyδ >0, there existL andM such thatσr(B,L,M) is DP againstσd andδ-PO.

Proof: The proposition follows from Lemma 2 and Propo-sition 5.

Proposition 7 states that for givenpd ∈ (pc, 1], we canalways construct a protocol based on the idle slot ratio testthatis DP againstσd and achieves an arbitrarily small efficiencyloss by choosingB such that0 < B < qc− qd = (pd−pc)(1−pc)

N−1. As in the case of the ACK ratio test, we have a widerrange ofB that renders deviation-proofness aspd is larger.

We have considered deviation strategies that prescribe aconstant transmission probability in a review phase. We nowconsider the case where a deviating node can use any strategyin Σi, which includes strategies that adjust transmission proba-bilities depending on the signals obtained in the current reviewphase. The following theorem shows that we can construct aprotocol based on the idle slot ratio test that is deviation-proofand near-optimal.

Theorem 4:For anyǫ > 0 andδ > 0, there existB, L, andM such thatσr(B,L,M) is ǫ-NE andδ-PO.

Proof: The proof is relegated to Appendix B.The interpretation ofǫ andδ as performance requirements as

well as the trade-off between performance and implementationcost, as discussed following Theorem 2, is still valid in thecaseof public signals.

Remark: Protocols with sliding windows.Suppose that morethan L(qc − B) idle slots have occurred before the end

of a review phase. Then, a deviating node, knowing thata punishment will not occur regardless of the outcome inthe remaining slots of the review phase, can increase itstransmission probability for the remainder of the review phaseto obtain a payoff gain. We can make a protocol based on areview strategy robust to such a manipulation by having slidingwindows for review phases. In a review strategy with slidingwindows, a review phase begins in each slot unless there is anew or ongoing punishment. Once the idle slot ratio test basedon the recentL signals fails, a review stops and punishmentoccurs forM slots. Once a punishment phase ends, a reviewphase begins in each slot until another punishment occurs. Adetailed analysis of this protocol is left for future research.

C. Numerical Results

We provide numerical results to demonstrate the findingson DP protocols with public signals. Again, we consider anetwork withN = 5 andpc = 1/N = 0.2 while varyingpdand the protocol parameters.

Fig. 7 plotsPf andPm against the length of a review phaseL for B = 0.1 and 0.25 when pd = 0.7. As in the case ofprivate signals,Pf tends to decrease withL and approacheszero for largeL. Also, Pf is smaller for a larger marginof error B. Note that the upper threshold forB to yieldlimL→∞ Pm(B,L; pd) = 0 in Lemma 2 isqc − qd = 0.2048.We can see that whenB is larger than this threshold,Pm

tends to increases withL and approaches 1 for largeL. On thecontrary, whenB is smaller than the threshold,Pm approacheszero for largeL, making the test asymptotically perfect.

Fig. 8 plots the minimum length of a reciprocation phase⌈Mmin(B,L; pd)⌉ to have a DP protocol as a function ofthe length of a review phaseL. We can see that for fixedpd = 0.7, a longer reciprocation phase is needed for largerB, except whenL is small, and that DP protocols cannotbe constructed with some small values ofL whenB = 0.1(displayed as⌈Mmin(B,L; pd)⌉ = 0). Also, whenB = 0.1,a longer reciprocation phase is needed forpd = 1 thanfor pd = 0.7. The efficiency loss of DP protocols with theminimum length of a reciprocation phase is shown in Fig. 9.We can see that largerB results in smaller efficiency loss,becausePf is smaller for largerB as shown in Proposition6. Also, efficiency loss approaches zero asL becomes large,which is consistent with Proposition 7.

VII. E XTENSION TO A CSMA/CA NETWORK WITH

SELFISH NODES

In this section, we discuss how the proposed protocolsbased on a review strategy can be modified for a CSMA/CAnetwork. In [9], the authors consider a CSMA/CA networkin which a selfish node uses a fixed contention window size.They show a discrepancy between NE and Pareto optimum.The contention window size of each node at the unique POoutcome is denoted byW ∗, which results in a transmissionprobability pc = 2/(W ∗ + 1). The optimal payoffuPO, i.e.,the throughput at Pareto optimum, can be computed using Eq.(1) of [9], based on the model of [29].


10 20 30 40 50 60 70 80 90 1000

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

L

Pf(B

,L

)

B = 0.1

B = 0.25

(a)

0 10 20 30 40 50 60 70 80 90 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

L

Pm

(B,L

;pd)

B = 0.1

B = 0.25

(b)

Fig. 7. Pf (B,L) and Pm(B,L; pd) versus the length of a review phaseL whenpd = 0.7.

10 20 30 40 50 60 70 80 90 1000

500

1000

1500

2000

2500

3000

3500

4000

L

⌈Mm

in(B

,L;p

d)⌉

B = 0.1

B = 0.25

(a)

10 20 30 40 50 60 70 80 90 1000

200

400

600

800

1000

1200

1400

L

⌈Mm

in(B

,L;p

d)⌉

pd = 0.7

pd = 1

(b)

Fig. 8. The minimum length of a punishment phase⌈Mmin(B,L; pd)⌉ versus the length of a review phaseL: (a) pd = 0.7, and (b)B = 0.1.

10 20 30 40 50 60 70 80 90 1000

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

L

C(σ

r)

B = 0.1

B = 0.25

(a)

10 20 30 40 50 60 70 80 90 1000

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

L

C(σ

r)

pd = 0.7

pd = 1

(b)

Fig. 9. Efficiency lossC(σr) versus the length of a review phaseL: (a) pd = 0.7, and (b)B = 0.1.

A review strategy for a CSMA/CA network can be describedas follows, assuming private signals (i.e., sensing informationis private). At the beginning, nodes are synchronized to starta review phase. In a review phase, which lasts forL timeperiod, each node sets its window size atW ∗. After a reviewphase, each node computes its actual throughput, denoted byτi, and compares it withuPO, the expected throughput whenno node has deviated fromW ∗. A deviating node choosesits window sizeW d smaller thanW ∗ in order to increase itstransmission probability frompc and thus to obtain a higherthroughput. Since a deviation decreases the throughput of the

well-behaved nodes, we can design a test such that the testperformed by nodei is passed if and only ifτi > uPO −B for some constantB ∈ (0, uPO). If the test of nodei ispassed, nodei moves to a cooperation phase during which itcontinues to set its window size atW ∗. Otherwise, it movesto a punishment phase during which it sets its window size atthe minimum value 1. A reciprocation phase lasts forM timeperiod, and a new review phase begins after a reciprocationphase.

As in a slotted Aloha network,τi converges almost surelyto uPO asL goes to infinity, and thus the proposed test can be


made asymptotically perfect by choosing an appropriate valueof B. Hence, when window sizes take discrete values, we canconstruct a protocol that is DP against any constant deviationstrategy and achieves a small efficiency loss, following asimilar approach to Theorem 2. We omit the details due tolack of space.

VIII. C ONCLUSION

It is well-known that the decentralized operation of multipleaccess communication systems with selfish nodes often resultsin an inefficient use of a shared medium. To overcome thisproblem, we have proposed new classes of slotted MAC pro-tocols that are robust to selfish manipulation while achievingnear-optimality. The proposed protocols are based on the ideaof a review strategy in the theory of repeated games. Withthe proposed protocols, nodes perform a statistical test todetermine whether a deviation has occurred and trigger a pun-ishment when they conclude so. We have provided conditionsunder which we can design deviation-proof protocols with asmall efficiency loss and illustrated the results with particularstatistical tests. Our framework and design methodology arenot limited to multiple access communications. They can beapplied to other networking and communication scenarios inwhich agents obtain imperfect signals about the decisions ofother agents and a deviation influences the distribution ofsignals.

APPENDIX APROOF OFTHEOREM 2

Choose arbitraryǫ > 0 and δ > 0. Define pǫ ,

pc + ǫ/(1 − pc)N−1. Note thatpǫ is the minimum deviation

probability with which a deviating node gains at leastǫ in aslot when other nodes transmit with probabilitypc. ChooseB ∈ (0, ǫ/(N − 1)). Note thatqc − qd ≥ ǫ/(N − 1) for allpd ∈ [pǫ, 1]. Define

g(B,L) , (1− Pf (B,L))N−1

N − (1− pc) (1− Pf (B,L))

− Pm(B,L; pǫ).

Since Pm(B,L; pd) is non-increasing in pd, we haveg(B,L; pd) ≥ g(B,L) for all pd ∈ [pǫ, 1], whereg(B,L; pd) is defined in (3). Also, by Lemma 1, we havelimL→∞ Pf (B,L) = 0 and limL→∞ Pm(B,L; pǫ) = 0.Therefore,limL→∞ g(B,L) = pc, and thus there existsL1

such that g(B,L; pd) > 0 for all pd ∈ [pǫ, 1], for allL ≥ L1. Define M(L) , ⌈(1 − pc)L/g(B,L)⌉. SinceM(L) ≥ Mmin(B,L; pd) for all pd ∈ [pǫ, 1], protocolσr(B,L, M(L)) is DP against all constant strategies usingpd ∈ [pǫ, 1], for all L ≥ L1.

SinceC(σr) is non-decreasing inM , we have

0 ≤ C(σr(B,L, M(L))) ≤N ((1 − pc)L/g(B,L) + 1)

L+ ((1 − pc)L/g(B,L) + 1)

×(1− pc)N−1

[

pcPf − (1− Pf )N−1

N + (1− Pf )]

.

Therefore,limL→∞ Ps(σr) = 0, and there existsL2 such that

C(σr) < δ for all L ≥ L2. ChooseL ≥ max{L1, L2} andM = M(L). Then σr(B,L,M) is DP against all constant

strategies usingpd ∈ [pǫ, 1] and satisfiesC(σr) < δ. Finally,note that the payoff gain from deviating to a constant strategyusingpd ∈ [0, pǫ) is bounded above byǫ. Hence,σr(B,L,M)is robustǫ-DP andδ-PO. This completes the proof.

APPENDIX BPROOF OFTHEOREM 4

Consider the problem of a deviating node maximizing itspayoff given that all the other nodes use a review strategyσr(B,L,M), i.e., maxσ∈Σi

U(σ; σr). We can define a statespace with totalL(L + 1)/2 + M states, where a state is apair consisting of the slot position and the number of idle slotssince the beginning of the current review phase in the case ofa review phase while it is the slot position in the case of apunishment phase. By the principle of dynamic programming,we can obtain a stationary optimal strategy, denoted byσ∗.Let pt be the expected value of the transmission probabilityof a node usingσ∗ in slot t of a review phase (conditional onnull history) when other nodes followσr. Let It = χ{zt = 0}.SinceE[It] = (1− pt)(1 − pc)

N−1, we have

U(σ∗; σr) =(1− pc)

N−1∑τ+L

t=τ+1 pt

L+ P ∗fM

=L(1− pc)

N−1 − E[

∑τ+Lt=τ+1 It

]

L+ P ∗fM

, (9)

whereτ + 1 is the first slot of a review phase andP ∗f is the

punishment probability when the deviating node usesσ∗, i.e.,P ∗f = Pr

{

∑τ+Lt=τ+1 It ≤ L(qc −B)

}

. Since∑τ+L

t=τ+1 It ≥ 0,using Markov’s inequality, we have

E

[

τ+L∑

t=τ+1

It

]

≥ (1− P ∗f )L(qc −B). (10)

Combining (9) and (10), we obtain

U(σ; σr) ≤Lqc + P ∗

f (1− pc)NL+ (1 − P ∗

f )BL

L+ P ∗fM

(11)

for all σ ∈ Σi.Choose arbitraryǫ > 0 andδ > 0. Following [20], we relate

the choice ofM andB to L as follows:

B = βLρ−1, β > 0,1

2< ρ < 1

M = µL, µ > 0

Fix β, ρ, andµ such thatβ > 0, 1/2 < ρ < 1, andµ > N−1.By Chebychev’s inequality,

Pf (B,L) ≤qc(1− qc)

B2L=

qc(1− qc)

β2L2ρ−1. (12)

Also, note that

C(σr) =NPfMqc

L+ PfM=

NPfµqc

1 + Pfµ. (13)

SincePf (B,L) in (12) converges to zero asL goes to infinity,we can achieve an arbitrarily small efficiency loss in (13) bychoosing sufficiently largeL. In other words, for anyδ > 0,


there existsL′δ such thatC(σr) < δ for all L ≥ L′

δ. Withµ > N − 1, the upper bound on the deviation payoff in (11)

qc + P ∗f (1− pc)

N + (1− P ∗f )βL

ρ−1

1 + P ∗f µ

is decreasing inP ∗f . Thus, the deviation payoff is bounded

above byqc + βLρ−1.ChooseL such that

L ≥ max

{

L′δ, L

′Nǫ/2,

(2β

ǫ

)1

1−ρ

}

.

SinceL ≥ L′Nǫ/2, we have

qc −ǫ

2< U(σr; σr) ≤ U(σ∗; σr). (14)

SinceL ≥ (2β/ǫ)1/(1−ρ), we have

U(σ∗; σr) ≤ qc + βLρ−1 ≤ qc + ǫ/2. (15)

Then, by (14) and (15), we obtain an upper bound on thedeviation gain as

U(σ∗; σr)− U(σr; σr) ≤ ǫ,

which proves thatσr(B,L,M) is anǫ-NE. Lastly, sinceL ≥L′δ, we haveC(σr) < δ, and thusσr(B,L,M) is δ-PO.

REFERENCES

[1] Wireless LAN Media Access Control (MAC) and Physical Layer (PHY)Specifications, IEEE 802.11, Aug. 1999.

[2] L. Chen, S. H. Low, and J. C. Doyle, “Random access game andmediumaccess control design,”IEEE/ACM Trans. Netw., to be published.

[3] J. Park and M. van der Schaar, “Medium access control protocols withmemory,” IEEE/ACM Trans. Netw., to be published.

[4] R. T. Ma, V. Misra, and D. Rubenstein, “An analysis of generalizedslotted-Aloha protocols,”IEEE/ACM Trans. Netw., vol. 17, no. 3, pp.936–949, Jun. 2009.

[5] E. Altman, R. El Azouzi, and T. Jimenez, “Slotted Aloha asa gamewith partial information,” Comput. Networks, vol. 45, no.6, pp. 701–713, Aug. 2004.

[6] R. El Azouzi, T. Jimenez, E. S. Sabir, S. Benarfa, and E. H.Bouyakhf,“Cooperative and non-cooperative control for slotted Aloha with randompower level selections algorithms,” inProc. VALUETOOLS, Nantes,France, Oct. 2007.

[7] L. Buttyan and J.-P. Hubaux,Security and Cooperation in WirelessNetworks. Cambridge, U.K.: Cambridge Univ. Press, 2008.

[8] G. Tan and J. Guttag, “The 802.11 MAC protocol leads to inefficientequilibria,” in Proc. INFOCOM, Miami, FL, Mar. 2005.

[9] M. Cagalj, S. Ganeriwal, I. Aad, and J.-P. Hubaux, “On selfish behaviorin CSMA/CA networks,” inProc. INFOCOM, Miami, FL, Mar. 2005.

[10] P. Kyasanur and N. H. Vaidya, “Selfish MAC layer misbehavior inwireless networks,”IEEE Trans. Mobile Comput., vol. 4, no. 5, pp.502–516, Sep. 2005.

[11] A. B. MacKenzie and S. B. Wicker, “Stability of multipacket slottedAloha with selfish users and perfect information,” inProc. INFOCOM,San Francisco, CA, Apr. 2003.

[12] Y. Jin and G. Kesidis, “Equilibria of a non-cooperativegame forheterogeneous users of an Aloha network,”IEEE Commun. Lett., vol.6, no. 7, pp. 282–284, Jul. 2002.

[13] J. Park and M. van der Schaar, “Stackelberg contention games inmultiuser networks,”EURASIP J. Advances Signal Process., vol. 2009,Article ID 305978, 15 pages, 2009.

[14] Y. Jin and G. Kesidis, “A pricing strategy for an Aloha network ofheterogeneous users with inelastic bandwidth requirements,” in Proc.CISS, Princeton, NJ, Mar. 2002.

[15] G. Mailath and L. Samuelson,Repeated Games and Reputations: Long-Run Relationships. Oxford, U.K.: Oxford Univ. Press, 2006.

[16] Y. Wu, B. Wang, K. J. R. Liu, and T. C. Clancy, “Repeated open spec-trum sharing game with cheat-proof strategies,”IEEE Trans. WirelessCommun., vol. 8, no. 4, pp. 1922–1933, Apr. 2009.

[17] D. Niyato and E. Hossain, “Competitive pricing for spectrum sharingin cognitive radio networks: dynamic game, inefficiency of Nash equi-librium, and collusion,”IEEE J. Sel. Areas Commun., vol. 6, no. 1, pp.192–202, Jan. 2008.

[18] R. J. La and V. Anantharam, “Optimal routing control: repeated gameapproach,”IEEE Trans. Autom. Control, vol. 47, no. 3, pp. 437–450,Mar. 2002.

[19] C. Pandana, Z. Han, and K. J. R. Liu, “Cooperation enforcementand learning for optimizing packet forwarding in autonomous wirelessnetworks,”IEEE Trans. Wireless Commun., vol. 7, no. 8, pp. 3150–3163,Aug. 2008.

[20] R. Radner, “Repeated principal-agent games with discounting,” Econo-metrica, vol. 53, no. 5, pp. 1173–1198, Sep. 1985.

[21] R. Radner, “Repeated partnership games with imperfectmonitoring andno discounting,”Review Econ. Stud., vol. 53, no. 1, pp. 43–57, Jan.1986.

[22] J. L. Massey and P. Mathys, “The collision channel without feedback,”IEEE Trans. Inf. Theory, vol. AP-31, no. 2, pp. 192–204, Mar. 1985.

[23] M. Kandori, “Introduction to repeated games with private monitoring,”J. Econ. Theory, vol. 102, no. 1, pp. 1–15, Jan. 2002.

[24] L. G. Roberts, “Aloha packet system with and without slots and capture,”ACM SIGCOMM Comput. Commun. Rev., vol. 5, no. 2, pp. 28–42, Apr.1975.

[25] P. Billingsley, Probability and Measure. New York: Wiley, 1995.[26] E. Kalai and W. Stanford, “Finite rationality and interpersonal complex-

ity in repeated games,”Econometrica, vol. 56, no. 2, pp. 397–410, Mar.1988.

[27] D. Bertsekas and R. Gallager,Data Networks. 2nd ed. Saddle River, NJ:Prentice Hall, 1992.

[28] B. Hajek and T. van Loon, “Decentralized dynamic control of amultiaccess broadcast channel,”IEEE Trans. Autom. Control, vol. 27,no. 3, pp. 559–569, Jun. 1982.

[29] G Bianchi, “Performance analysis of the IEEE 802. 11 distributedcoordination function,”IEEE J. Sel. Areas Commun., vol. 18, no. 3,pp. 535–547, Mar. 2000.

20 40 60 80 100 120

L

B = 0.04

B = 0.06

20 30 40 50 60 70 80 90

L

B = 0.04

B = 0.06

20 30 40 50 60 70 80 90

L

pd

pd

10 20 30 40 50 60 70 80 90

L

B = 0.1

B = 0.25

10 20 30 40 50 60 70 80 90

L

p

p

Documents

PHAN, PARK, AND VAN DER SCHAAR: NEAR-OPTIMAL … · 2018. 11. 19. · signals on the channel access outcomes. We show that with public signals it is possible to design near-optimal