
Socially-Optimal Design of Crowdsourcing

Platforms with Reputation Update Errors

Yuanzhang Xiao, Yu Zhang, and Mihaela van der Schaar

Abstract

Crowdsourcing systems (e.g. Yahoo! Answers and Amazon Mechanical Turk) provide a platform

for requesters, who have tasks to solve, to ask for help from workers. Vital to the proliferation of

crowdsourcing systems is incentivizing the workers to exert high effort to provide high-quality services.

In particular, this incentive should be provided, preferably, without monetary payment, which adds extra

infrastructure to the system. Reputation mechanisms have been shown to work effectively as incentive

schemes in crowdsourcing systems. A reputation agency updates the reputations of the workers based

on the requesters’ reports on the quality of the workers’ services. A low-reputation worker is less likely

to get served when it requests for help, which provides incentive for the workers to obtain a high

reputation by exerting high effort. However, reputation update errors are inevitable, because of either

system errors such as loss of reports, or inaccurate reports, resulting from the difficulty in accurately

assessing the quality of a worker’s service. The reputation update error prevents existing reputation

mechanisms from achieving the social optimum. In this paper, we propose a simple binary reputation

mechanism, which has only two reputation labels (“good” and “bad”). To the best of our knowledge, our

proposed reputation mechanism is the first that is proven to be able to achieve the social optimum even

in the presence of reputation update errors. We give the conditions under which the designed binary

reputation mechanism can achieve the social optimum, and discuss how to design the reputation rules,

such that the conditions for achieving social optimum can be fulfilled.

I. INTRODUCTION

Crowdsourcing systems (e.g. Yahoo! Answers and Amazon Mechanical Turk) provide a plat-

form for a user to elicit collective efforts from the other users in order to solve a task. In a

typical crowdsourcing system, a user can either post tasks and request for help as a requester, or

solve the tasks posted by others as a worker. In some crowdsourcing systems [1][2], the workers

are rewarded by monetary payments from the requesters. The payment is usually paid when the


worker is assigned with the task, instead of after the worker completes the task. This creates

an incentive problem, namely the worker may want to exert low effort on the task since it has

already been paid. In other systems [3], the servers are rewarded by the benefit obtained from

other users’ services. In such systems without monetary payment, it is even more difficult to

provide the servers with the incentive to exert high efforts. In summary, the incentive provision

for the servers is vital to the effectiveness of crowdsourcing systems.

One effective incentive mechanism is the reputation mechanism [1]–[3]. In a reputation mech-

anism, there is a reputation agency, who assigns reputations to all the users based on their

behaviors. If a user exerted high effort in the past as a server, it will receive a high reputation

as a summary of its good behaviors in the past. Similarly, a user’s low reputation indicates that

it used to exert low efforts as a server. Since a low-reputation user may get its task served with

low effort, the users have incentive to get high reputation by exerting high efforts on the others’

tasks. Built on this intuition, the existing reputation mechanisms work well, except when there

are reputation update errors. The reputation update error comes from two sources. First, a client

cannot perfectly assess the service quality of a server, and hence may report the service quality
as low to the reputation agency even when the server actually exerts high effort. Second, the

reputation agency may miss the client’s report on the service quality, or update the reputation

in a wrong way by mistake. In the presence of reputation update errors, the existing reputation

mechanisms [1]–[3] have performance loss that is increasing with the update error probability.

This performance loss in [1]–[3] partially comes from the restriction of attention on stationary

Markov strategies. As we will see later, once we remove this restriction, the social optimum can

be achieved.

In this paper, we show that the social optimum can be achieved in the presence of reputation

update errors, if we do not restrict our attention to stationary Markov strategies as in [1]–[3]. In

other words, we allow the users to take different actions given the same current state at different

time instants, which increases the strategy space to be considered significantly and potentially

complicates the optimal strategy. Nevertheless, we rigorously prove that under certain conditions,

the social optimum can be achieved by a class of simple reputation mechanisms, namely binary

reputation mechanisms that assign binary reputation labels to the users. We derive the conditions

under which the social optimum can be achieved, which provide guidelines for the design of

reputation update rules. In addition, we further simplify the optimal strategy by proving that


strategies of a simple structure can be optimal, and propose an algorithm to construct the optimal

strategy.

The rest of the paper is organized as follows. In Section II, we discuss related works. In
Section III, we describe the model of crowdsourcing systems, and in Section IV we formulate the
interaction as a stochastic game. Then we design the optimal reputation mechanisms in Section V.
Simulation results in Section VI demonstrate the performance improvement of the proposed
reputation mechanism. Finally, Section VII concludes the paper.

II. RELATED WORKS

The idea of social norm and reputation mechanism was originally proposed in [4]. Assuming

that there is no reputation update error, [4] proposed a simple reputation mechanism, in which

any deviating user will be assigned with the lowest reputation forever and will be effectively

excluded from the system. Without reputation update errors, this reputation mechanism with

the most severe punishment can achieve social optimum. However, with reputation update

errors, eventually all the users will have the lowest reputation and receive no payoff. Hence,

the performance loss of the reputation mechanism in [4] is large under reputation update errors.

Based on the idea in [4], [5][6][7] analyzed the performance of different reputation
mechanisms through simulation, and showed that simple reputation mechanisms suffer
performance loss under reputation update errors.

[8] is the first one that rigorously analyzes a class of reputation mechanisms under reputation

update errors. For the class of reputation mechanisms considered, [8] quantifies the performance

loss in terms of the reputation update error probability. Since there is only one long-lived player in

the model of [8], the reputation mechanisms in [8] are actually equivalent to a class of reputation

mechanisms that we will analyze in Section V-C. Our analysis on this subclass of reputation

mechanisms shows the same intuition as in [8]. However, we will also design another class

of reputation mechanisms that asymptotically achieve social optimum under reputation update

errors. The contrast between these two classes of reputation mechanisms shows that differential

punishment is crucial to achieve social optimum. In our setting, we can punish users with low

reputation and reward users with high reputation, in order to have differential punishment. On

the contrary, in [8], because there is only one long-lived player, all the users (considered as the
single long-lived player) are punished or rewarded simultaneously.

[1]–[3] also rigorously analyze a class of reputation mechanisms in the limit case when the


TABLE I

COMPARISON WITH RELATED WORKS.

User Strategy | Deviation-proof | Discount factor | Performance loss (due to update errors)

[1]–[3] Stationary Markov Yes δ < 1 Yes

[5][6][7] Stationary Markov No δ < 1 Yes

[8] Stationary Markov Yes δ < 1 Yes

[4] Nonstationary Yes δ → 1 Yes

This work Nonstationary Yes δ < 1 No

reputation update error probability goes to 0. However, under positive reputation update error

probabilities, there will also be performance loss.

Our paper is the first one that proposes a class of socially optimum reputation mechanisms

under reputation update errors. Although we model the interaction among the users in the

crowdsourcing platform as a stochastic game, our results are fundamentally different from the

folk theorem-type results in repeated games [9] and in stochastic games [10] in two aspects.

First, [9] and [10] show that for any feasible payoff profile v, there exists a threshold discount factor
δ̲, such that under any δ ≥ δ̲, the payoff profile v can be achieved at the equilibrium. However,
they cannot obtain the lower bound δ̲. Hence, they cannot construct the equilibrium strategy
that achieves v, because a feasible discount factor δ ≥ δ̲ is needed to compute the strategy. In
contrast, for a payoff profile v that is arbitrarily close to the social optimum (b − c) · 1_N, we can
analytically determine the lower bound of feasible discount factors δ̲, and thus can construct the

equilibrium strategy to achieve v based on a feasible discount factor.

Second, although both our work and [9][10] build on the theory of self-generating sets [11],

we cannot use the theory of self-generating sets in the same way as [9] and [10] did, because of

some restrictions on the strategies in our setting. We will discuss this in detail in Section V-A.

In Table I, we compare the proposed work with existing reputation mechanisms.


III. SYSTEM MODEL

A. Basic Setup

Consider a crowdsourcing platform with N users. Denote the set of users by N = {1, . . . , N}.

We assume that the number of users N is displayed on the platform and is known to all the

users. Each user has plenty of tasks to solve, and possesses some resources (e.g. knowledge)

valuable to the other users’ tasks. A user can request for help to solve its task as a requester, and

can provide service for another user as a worker. Since the users usually stay in the platform

for a long period of time, we divide time into periods labeled by t = 0, 1, 2, . . .. In each period

t, the users act in the following order:

• Each user requests for help to solve its tasks.

• Each user is matched to another user’s task by a matching rule.

• Each user chooses to exert high or low effort in solving the task.

A matching is defined as a mapping m : N → N , where user i, as a worker, is matched to

the task of user m(i). Since a user cannot be matched to itself, we denote the set of all possible

matchings as

M = {m : m(i) ≠ i, ∀i ∈ N}. (1)

A matching rule is then defined as a probability distribution µ on the set of all possible matchings

M . In this paper, we will focus on the uniformly random matching rule, which satisfies

\mu(m) = \left( N! \sum_{i=2}^{N} \frac{(-1)^i}{i!} \right)^{-1}, \quad \forall m \in M. \quad (2)
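For concreteness, the uniformly random matching rule in (2) puts equal probability on every matching with m(i) ≠ i (a derangement of the N users). The following Python sketch is an illustration only, not part of the mechanism specification, and the function name is ours; it draws such a matching by rejection sampling, which is exact because every permutation is equally likely before rejection.

    import random

    def sample_uniform_matching(N):
        """Draw a matching m with m(i) != i for all i, uniformly over all derangements,
        realizing the uniformly random matching rule mu in (2)."""
        while True:
            perm = list(range(N))
            random.shuffle(perm)                      # uniform over all N! permutations
            if all(perm[i] != i for i in range(N)):   # keep only matchings with m(i) != i
                return perm                           # perm[i] is the requester m(i) served by worker i

    # example: draw one matching for N = 10 users
    m = sample_uniform_matching(10)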

We assume that there is no cost for requesting for service, and that the users always have

tasks to solve. Hence, each user requests for service in every period. A user pays a high or low

cost to exert high or low effort, respectively, as a worker. We assume that the cost of high or

low effort is the same across all the users. A user receives a high or low benefit as a requester

if the matched worker exerts high or low effort, respectively. We also assume that the high or

low benefit is the same across all the users. Since only the difference in the costs of high effort

and low effort matters, we normalize the worker’s cost of exerting low effort to 0, and set the

cost of exerting high effort as c > 0. For the same reason, we normalize the requester’s received

benefit from the worker’s low effort to 0, and set the benefit from high effort as b > 0. Hence,


TABLE II

THE GIFT-GIVING GAME BETWEEN A CLIENT AND A SERVER.

high effort low effort

request (b,−c) (0, 0)

the interaction of a requester and a worker can be summarized by the “gift-giving” game in

Table II, where the row player is the requester and the column player is the worker.

We can see that in the unique Nash equilibrium of the gift-giving game, the worker will exert

low effort, which results in a zero payoff for both the requester and the worker. We are interested

in the scenarios where b > c, namely exerting high effort leads to the social optimum. Our goal

is to design an incentive scheme such that it is in the workers' self-interest to exert

high effort.

B. Binary Reputation Mechanisms

A powerful incentive scheme is the reputation mechanism, which exploits the repeated in-

teraction of the requesters and the workers, and assigns each user a reputation as the summary

of its past behavior. In this paper, we focus on the simplest reputation mechanisms, namely

binary reputation mechanisms, and will show that binary reputation mechanisms can achieve

social optimum. A binary reputation mechanism assigns binary reputations to the users. Denote

the set of binary reputations by Θ = {0, 1}, and the reputation of user i by θi, ∀i ∈ N . The

reputation profile is defined as the vector of all the users’ reputations, θ = (θ1, . . . , θN). The

reputation profile contains the users’ private information and is not known to the users. However,

an important statistic, namely the reputation distribution, can be listed on the platform and

observed by all the users. A reputation distribution is defined as a tuple s(θ) = (s0(θ), s1(θ)),

where s_1(θ) = \sum_{i ∈ N} θ_i is the number of users with reputation 1, and s_0(θ) = \sum_{i ∈ N} (1 − θ_i) is

the number of users with reputation 0. The set of all possible reputation distributions is denoted

S. Note, however, that when a worker is matched to a requester, it is informed by the platform

of its requester’s reputation.

As described before, a user always requests service as a requester because there is no cost

to request. Hence, a user only needs to determine its effort level as a worker. Consequently, we


model each user’s action as a contingent plan of exerting high or low effort based on its own

reputation and the reputation of the requester matched to it. Formally, each user i’s action, denoted

by αi, is a mapping αi : Θ × Θ → Z, where Z = {0, 1} is the set of effort levels with z = 0

representing “low effort”. Then αi(θm(i), θi) denotes user i’s effort level as a worker when it is

matched to a requester with reputation θm(i). We write the action set as A = {α|α : Θ×Θ → Z},

and the joint action profile of all the users as α = (α1, . . . , αN).

After each worker exerts effort based on its action, the requester reports its assessment of the

effort level to the platform. The report is defined as a mapping R : Z → ∆(Z), where ∆(Z)

is the probability distribution over Z. For example, R(1|z) is the probability that the requester

reports “high effort” given the worker’s actual effort level z. An example report could be

R(z'|z) = 1 − ε if z' = z, and R(z'|z) = ε if z' ≠ z, (3)

where ε ∈ [0, 0.5) is the report error probability.1 This report error may be caused by the

requester’s inaccurate assessment of the effort level and by the system error of the platform (e.g.

loss of reports). Since the users are homogeneous, we assume the same report mapping R for

all the users.

Based on the requester’s report, the platform will update the worker’s reputation based on the

reputation update rule. Again, since the users are homogeneous, we assume that the reputation

update rule is the same for the users. Then the reputation update rule, denoted by τ , is defined as

a mapping τ : Θ×Θ× Z → ∆(Θ). For example, τ(θ′w|θr, θw, z) is the probability distribution

of the worker’s updated reputation θ′w, given the reputation of the requester θr, the worker’s

own reputation θw, and the requester’s report z. We focus on a class of reputation update rules

1We confine the report error probability ε to be smaller than 0.5, because with an error probability ε = 0.5, the report contains

no information about the effort level. If an error probability is larger than 0.5, the platform can use the opposite of the report

as an indication of the effort level, which is equivalent to the case with an error probability smaller than 0.5.


defined as

τ(θ'_w | θ_r, θ_w, z) =
  β_{θ_w}^+,       if θ'_w = 1 and z ≥ α_0(θ_r, θ_w),
  1 − β_{θ_w}^+,   if θ'_w = 0 and z ≥ α_0(θ_r, θ_w),
  1 − β_{θ_w}^−,   if θ'_w = 1 and z < α_0(θ_r, θ_w),
  β_{θ_w}^−,       if θ'_w = 0 and z < α_0(θ_r, θ_w),
for θ_w = 0, 1,

where α0 is the platform’s recommended action, based on which the reputation is updated. In

the above reputation update rule, if the reported effort level is not lower than the one specified

by the recommended action, a worker with reputation θw will have reputation 1 with probability

β+θw

; otherwise, it will have reputation 0 with probability β−θw.

In summary, a reputation mechanism (Θ, τ) is completely determined by the set of reputations

Θ and the reputation update rule τ .

Finally, we summarize the repeated interaction among the users in the crowdsourcing platform

with a reputation mechanism as follows.

1) At the beginning of period t, the platform displays the current reputation distribution s(θ),

and announces the recommended action α0.

2) Each user i requests for services, following which it is matched as a worker to the task

of user m(i) with probability µ(m). It is then informed by the platform of the requester’s

reputation θm(i). According to its action αi, it chooses its effort level αi(θm(i), θi).

3) Each requester reports its assessment of the effort level to the platform.

4) At the end of period t, the platform updates the reputation profile based on the reputation

update rule τ .
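To make the timing of one period concrete, the following sketch simulates steps 1)-4) above for a symmetric action α and a recommended action α_0. It is an illustration of the model only (the helper names and data layout are ours), with the reputation update rule parameterized by β_0^+, β_0^−, β_1^+, β_1^−.

    import random

    def draw_matching(N):
        # uniformly random matching with m(i) != i, as in the sketch after (2)
        while True:
            perm = list(range(N))
            random.shuffle(perm)
            if all(perm[i] != i for i in range(N)):
                return perm

    def one_period(theta, alpha, alpha0, eps, beta_plus, beta_minus):
        """Simulate one period: matching, effort choice, noisy report, reputation update.

        theta      : list of 0/1 reputations
        alpha      : alpha[(theta_r, theta_w)] -> effort level in {0, 1}, used by every worker
        alpha0     : recommended action, same form as alpha
        eps        : report error probability in [0, 0.5)
        beta_plus  : {0: beta_0^+, 1: beta_1^+}, prob. of reputation 1 when report >= recommended effort
        beta_minus : {0: beta_0^-, 1: beta_1^-}, prob. of reputation 0 when report <  recommended effort
        """
        N = len(theta)
        m = draw_matching(N)                    # worker i serves requester m[i]
        new_theta = list(theta)
        for i in range(N):
            r, w = theta[m[i]], theta[i]
            effort = alpha[(r, w)]
            report = effort if random.random() > eps else 1 - effort     # noisy report as in (3)
            if report >= alpha0[(r, w)]:        # reported effort meets the recommendation
                new_theta[i] = 1 if random.random() < beta_plus[w] else 0
            else:                               # reported effort falls below the recommendation
                new_theta[i] = 0 if random.random() < beta_minus[w] else 1
        return new_theta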

IV. STOCHASTIC GAME FORMULATION

In this section, we first formulate the repeated interaction among the users in the crowdsourcing

platform with reputation mechanisms as a stochastic game. Then we make some restrictions on

the users’ strategies based on the users’ knowledge of the system, and define the correspond-

ing equilibrium, considering the above restrictions. Finally, we define the platform designer’s

problem.


A. The Stochastic Game

A stochastic game is described by the set of players, the set of states with the state transition

probability, the set of actions, and the players’ stage-game payoff functions. We consider the

platform as a player, which is indexed by 0, and define the set of players as {0} ∪ N . The

state is defined as the reputation profile θ, and the set of states as ΘN . The platform’s action

is the recommended action, which is defined in the same way as the users’ actions, namely

α0 : Θ × Θ → {0, 1}. We write the state transition probability as q(θ′|θ, α0, α), which is the

probability that the next state is θ′ given the current state θ, the platform’s recommended action,

and the joint action profile α. Note that the state transition probability q is determined by the

matching rule µ, the report function R, and the reputation update rule τ . Finally, each user i’s

stage-game payoff function is ui(θ, α0, α), which depends² on the state θ and the joint action

profile α. We define the platform’s payoff as a constant u0(θ, α0, α) = 0,∀θ, α0, α, such that it

will follow the platform designer’s decisions on how to choose the recommended action. Note

that the platform designer’s goal (or payoff) is the social welfare, which will be defined later.

The users interact in the platform repeatedly. Hence, each user should have a strategy, which

is a contingent plan of which action to take based on the history. The history is the collection

of the past and the current states (i.e. reputation profiles) and its own service qualities received

from the workers. This history is different for different users, who have different records of

service qualities, and thus is called private history. By contrast, we define the public history as

the collection of states, which is the same for all the users. In this paper, we focus on public

strategies, which determine the actions based on public histories only. Public strategies are less

complicated than private strategies, and more importantly, will be proven to be able to achieve

the social optimum.

Denote the public history at period t as ht = (θ0, . . . ,θt), where θt is the reputation profile

at the beginning of period t, and the set of public histories at period t as Ht = (ΘN)t+1. Then

each user i's strategy is defined as a mapping π_i : ∪_{t=0}^∞ H^t → A from the set of all possible histories to

the action set. Similarly, we define the platform’s recommended strategy as π0 : ∪∞t=0Ht → A.

2Note that, although the stage-game payoff does not depend on the recommended action, we write the recommended action

α0 as an argument of the payoff function by the convention of game theory. However, the recommended action does affect the

long-term payoff through the state transition probability.


The joint strategy profile of all the users is written as π = (π1, . . . , πN). We write the set of all

strategies and the set of all strategy profiles as Π and ΠN , respectively. Given the matching rule

µ, the initial reputation profile θ0, the report function R, and the reputation mechanism (Θ, τ),

the recommended strategy π0 and the joint strategy profile π induce a probability distribution

over the set of all the histories H∞. Taking the expectation with respect to this probability

distribution, each user i receives a discounted average payoff Ui(θ0, π0, π) defined as

U_i(θ^0, π_0, π) = E_{h^∞} \left\{ (1 − δ) \sum_{t=0}^{∞} δ^t u_i(θ^t, π_0(h^t), π(h^t)) \right\},

where δ ∈ [0, 1) is the common discount factor of all the users. The discount factor δ is the rate

at which the users discount future payoffs, and reflects the patience of the users. A more patient

user has a larger discount factor.
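As a purely numerical illustration (not part of the model), the discounted average payoff above can be approximated from a finite simulated payoff stream as in the sketch below; the function name is ours.

    def discounted_average(payoffs, delta):
        """Approximate (1 - delta) * sum_t delta^t * u_t from a finite stream of stage payoffs."""
        return (1 - delta) * sum((delta ** t) * u for t, u in enumerate(payoffs))

    # example: a user earning b - c = 1 in every period has discounted average payoff close to 1
    print(discounted_average([1.0] * 2000, delta=0.99))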

B. Symmetric Feasible Strategy Profiles

First, since the users are homogeneous, we assume that all the users adopt the same strategy,

namely πi = π, ∀i ∈ N . In particular, we write the symmetric joint strategy profile as π · 1N ,

where 1N is a vector of 1’s with length N . Note that although all the users adopt the same action

in each period, they will choose different effort levels since both their own and their requesters’

reputations are different.

Second, since the users know the reputation distributions only, but not the reputation profiles,

we restrict the users’ strategies to be indifferent in reputation profiles that have the same

reputation distribution. We call such strategies feasible strategies, since it is infeasible for the

users to adopt strategies that require the knowledge of reputation profiles. Formally, we define

the feasible strategies as follows.

Definition 1: A strategy π is feasible, if for all t ≥ 0 and for all h^t, ĥ^t ∈ H^t, we have
π(h^t) = π(ĥ^t), if s(θ^k) = s(θ̂^k), k = 0, 1, . . . , t. (4)

We write the set of all feasible strategies as Πf .

In this paper, we focus on symmetric feasible strategy profiles, which is defined as follows.

Definition 2: A symmetric feasible strategy profile is a symmetric strategy profile π · 1N , in

which the strategy π is feasible, namely π ∈ Πf . We write the set of all symmetric feasible

strategy profiles as ΠNf .


If the users adopt symmetric feasible strategies, their stage-game payoffs can be calculated as

u_i(θ, α_0, (α_i, α · 1_{N−1})) = b \left[ \frac{s_{θ_i} − 1}{N − 1} α(θ_i, θ_i) + \frac{s_{1−θ_i}}{N − 1} α(θ_i, 1 − θ_i) \right]   (5)
    − c \left[ \frac{s_{θ_i} − 1}{N − 1} α_i(θ_i, θ_i) + \frac{s_{1−θ_i}}{N − 1} α_i(1 − θ_i, θ_i) \right],

where (αi, α · 1N−1) is the action profile in which user i chooses αi and all the other users

choose α. We can see that each user’s payoff can be determined by the reputation distribution

s(θ) only.
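Equation (5) can be evaluated directly from the reputation distribution. A small sketch follows (names ours; s is the dictionary {0: s_0(θ), 1: s_1(θ)} and actions are tables keyed by (θ_r, θ_w)).

    def stage_payoff(theta_i, alpha_i, alpha, s, b, c, N):
        """Expected stage-game payoff of a user with reputation theta_i, per (5)."""
        other = 1 - theta_i
        # benefit as a requester: the matched worker plays alpha and has reputation theta_i or 1 - theta_i
        benefit = b * ((s[theta_i] - 1) / (N - 1) * alpha[(theta_i, theta_i)]
                       + s[other] / (N - 1) * alpha[(theta_i, other)])
        # cost as a worker: the matched requester has reputation theta_i or 1 - theta_i
        cost = c * ((s[theta_i] - 1) / (N - 1) * alpha_i[(theta_i, theta_i)]
                    + s[other] / (N - 1) * alpha_i[(other, theta_i)])
        return benefit - cost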

Finally, since the action set A has 2^4 = 16 elements, the complexity of choosing the action is
large for the users. Hence, we consider strategies that choose actions from a subset B ⊆ A,
and define Π_f(B) as the set of symmetric feasible strategies restricted to the subset of actions

B. We are particularly interested in two subsets of actions, A_afs ≜ {α^a, α^f, α^s} with three actions
and A_as ≜ {α^a, α^s} with two actions. Specifically, the action α^a is the altruistic action, which

we define as

αa(θr, θw) = 1,∀θr, θw ∈ {0, 1}, (6)

where the worker exerts high effort regardless of the worker’s and the requester’s reputations.

The action αf is the fair action, which we define as

α^f(θ_r, θ_w) = 1 if θ_w ≤ θ_r, and 0 if θ_w > θ_r, (7)

where the worker exerts high effort only when the requester has a higher or equal reputation.

The action αs is the selfish action, which we define as

αs(θr, θw) = 0,∀θr, θw ∈ {0, 1}, (8)

where the worker exerts low effort regardless of the worker’s and the requester’s reputations.
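For later reference, the three actions (6)-(8) can be written compactly as tables keyed by (θ_r, θ_w); this encoding is a convenience of ours and mirrors the definitions above.

    # altruistic action (6): high effort toward every requester
    alpha_a = {(r, w): 1 for r in (0, 1) for w in (0, 1)}
    # fair action (7): high effort only when the requester's reputation is at least the worker's
    alpha_f = {(r, w): (1 if w <= r else 0) for r in (0, 1) for w in (0, 1)}
    # selfish action (8): low effort toward every requester
    alpha_s = {(r, w): 0 for r in (0, 1) for w in (0, 1)}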

As we will show later, a pair of a recommended strategy and a strategy profile (π_0, π · 1_N) ∈
Π_f(A_afs) × Π_f^N(A_afs) can achieve the social optimum at the equilibrium if we design the reputation
update rules carefully, while a pair (π_0, π · 1_N) ∈ Π_f(A_as) × Π_f^N(A_as) incurs performance loss
at the equilibrium under all reputation update rules unless the reputation update error ε = 0.


C. Definition of The Equilibrium

As in the game theory literature on stochastic games [10][12], we use public perfect equi-

librium (PPE) as the equilibrium concept, but adapt PPE in our setting because we focus on

feasible strategy profiles. In a PPE, no user has the incentive to deviate following any history.

This is a stronger requirement than that in the Nash equilibrium (NE), which requires that the

users do not deviate following the histories that happen in the equilibrium. However, in our

setting, due to the reputation update error, all the states occur with positive probabilities under

all the strategy profiles. Hence, NE is equivalent to PPE, which means that PPE is the only

appropriate equilibrium concept.

Before we define PPE, we need to define the continuation strategy π|ht , which is a mapping

π|ht : ∪∞k=0Hk → A with π|ht(hk) = π(hthk), where hthk is the concatenation of ht and

hk. Keeping in mind that we focus on symmetric feasible strategy profiles, we formally define

symmetric feasible PPE (SF-PPE) as follows.

Definition 3: A pair of a feasible recommended strategy and a symmetric feasible strategy

profile (π0, π · 1N) ∈ Πf ×ΠNf is a SF-PPE, if for all t ≥ 0, for all ht ∈ Ht, and for all i ∈ N ,

we have

Ui(θt, π0|ht , π|ht · 1N) (9)

≥ Ui(θt, π0|ht , (πi|ht , π|ht · 1N−1)), ∀πi|ht ∈ Πf ,

where (πi|ht , π|ht · 1N−1) is the continuation strategy profile in which user i deviates to πi|ht

and the other users follow the strategy π|ht .

The major difference between SF-PPE and conventional PPE is that in PPE, we need to prevent

the users from deviating to any strategy π ∈ Π. However, in SF-PPE, we need to prevent the

deviation to feasible strategies π ∈ Πf only, because the users do not know the reputation

profiles.

D. The Platform Design Problem

The goal of the platform designer is to maximize the social welfare at the equilibrium in the

worst case (with respect to different initial reputation distributions). Hence, the platform design


problem can be formulated as follows.

\max_{τ, (π_0, π·1_N) ∈ Π_f × Π_f^N} \; \min_{θ^0 ∈ Θ^N} \; \frac{1}{N} \sum_{i ∈ N} U_i(θ^0, π_0, π · 1_N)   (10)
s.t. (π_0, π · 1_N) is a SF-PPE.

The social optimum is b− c, achieved by the workers exerting high efforts all the time, which

is not an equilibrium strategy. In the next section, we will show that under properly designed

reputation update rules, the social optimum b− c can be asymptotically achieved at the SF-PPE

when the users are sufficiently patient (i.e. the discount factor δ → 1).

V. SOCIALLY OPTIMAL DESIGN

In this section, we design reputation mechanisms that asymptotically achieve social optimum

at the equilibrium, even when the reputation update error probability ε > 0. In our design, we use the APS

technique, named after the authors of the seminal paper [11], which is also used to prove the folk

theorem for repeated games in [9] and for stochastic games in [10]. We will briefly introduce the

APS technique first. Meanwhile, more importantly, we will illustrate why we cannot use APS

in our setting in the same way as [9] and [10] did. Then, we will show how we use APS in a

different way in our setting, in order to design the optimal reputation mechanism and to construct

the equilibrium strategy. Finally, we analyze the performance of a class of simple but suboptimal

strategies, which sheds light on why the proposed strategy can achieve social optimum.

A. The APS Technique

APS [11] provides a characterization of the set of PPE payoffs. It builds on the idea of

self-generating sets described as follows. Define a set Wθ ⊂ RN for every state θ ∈ ΘN ,

and write (Wθ′)θ′∈ΘN as the collection of these sets. Then we have the following definitions

[11][10][12]. First, we say a payoff profile v(θ) ∈ RN is decomposable on (Wθ′)θ′∈ΘN given θ,

if there exists a recommended action α0, an action profile α∗, and a continuation payoff function

γ : ΘN → ∪θ′∈ΘNWθ′ with γ(θ′) ∈ Wθ′ , such that for all i ∈ N and for all αi ∈ A,

v_i = (1 − δ) u_i(θ, α_0, α^*) + δ \sum_{θ'} γ_i(θ') q(θ'|θ, α_0, α^*)   (11)
    ≥ (1 − δ) u_i(θ, α_0, (α_i, α^*_{−i})) + δ \sum_{θ'} γ_i(θ') q(θ'|θ, α_0, (α_i, α^*_{−i})).


Then, we say a set (Wθ)θ∈ΘN is a self-generating set, if for any θ, every payoff profile v(θ) ∈

Wθ is decomposable on (Wθ′)θ′∈ΘN given θ. The important property of self-generating sets is

that any self-generating set is a set of PPE payoffs [11][10][12].

Based on the idea of self-generating sets, [9] and [10] proved the folk theorem for repeated

games and stochastic games, respectively. However, we cannot use APS in the same way as

[9] and [10] did for the following two reasons. First, we restrict our attention to symmetric
feasible strategy profiles, which imposes the following constraints on the self-generating sets and

the continuation payoff functions.

Lemma 1: Suppose that a self-generating set (Wθ)θ∈ΘN is a set of SF-PPE payoffs. Then for

all θ, every payoff profile v(θ) ∈ Wθ should satisfy

vi(θ) = vj(θ), ∀i, j ∈ N s.t. θi = θj, (12)

and should be decomposed by a continuation payoff function such that for all θ′,

γ_i(θ') = γ_j(θ'), ∀i, j ∈ N s.t. θ'_i = θ'_j. (13)

Proof: For a pair of a feasible recommended strategy and a symmetric feasible strategy

profile (π0, π · 1N), we have for all θ,

U_i(θ, π_0, π · 1_N) = U_j(θ, π_0, π · 1_N), ∀i, j ∈ N s.t. θ_i = θ_j. (14)

Since for all θ, any payoff profile v(θ) ∈ Wθ can be achieved by a SF-PPE, it should satisfy

vi(θ) = vj(θ), ∀i, j ∈ N s.t. θi = θj. (15)

Since the range of the continuation payoff function is in the self-generating set, for the same

reason, any continuation payoff function should satisfy for all θ′,

γ_i(θ') = γ_j(θ'), ∀i, j ∈ N s.t. θ'_i = θ'_j. (16)

Since the restrictions in Lemma 1 are not considered in [9][10], we cannot apply APS in the

same way as [9] and [10] did.

Moreover, when deriving the sufficient conditions for the folk theorem to hold, both [9] and

[10] use the NE of the stage game to decompose the payoffs in the interior of the self-generating

set. This prevents them from obtaining the lower bound of the discount factor. More specifically,


for any feasible payoff profile v, they can prove that there exists a threshold discount factor δ̲, such that
under any δ ≥ δ̲, the payoff profile v can be achieved at PPE. However, they cannot obtain the
lower bound δ̲. Because of this, they cannot construct the equilibrium strategy that achieves v,
because a feasible discount factor δ ≥ δ̲ is needed to compute the strategy. In contrast, for a
payoff profile v that is arbitrarily close to the social optimum (b − c) · 1_N, we can analytically
determine the lower bound of feasible discount factors δ̲, and thus can construct the equilibrium

strategy to achieve v based on a feasible discount factor.

B. Socially Optimal Design

Now we use APS in a different way to prove that social optimum can be achieved at a

SF-PPE. For the given ε0 and ε1, we construct the following self-generating set. For θ with

1 ≤ s1(θ) ≤ N − 1,

W_θ = { v : v_i = v_{θ_i}, ∀i ∈ N },   (17)
where (v_0, v_1) satisfies
v_1 − v_0 ≥ ε_0 − ε_1,   (18)
v_1 + \frac{c}{(N−1)b} v_0 ≤ z_2 ≜ \left( 1 + \frac{c}{(N−1)b} \right)(b − c) − \frac{c}{(N−1)b} ε_0 − ε_1,   (19)
v_1 − \frac{b(N−2)}{(N−1)(b−c)} v_0 ≤ z_3 ≜ − \frac{\frac{b(N−2)}{(N−1)(b−c)} − 1}{1 + \frac{c}{(N−1)b}} z_2.   (20)
For θ with s_1(θ) = 0,
W_{0_N} = { v : v_i = v_0, ∀i ∈ N },   (21)
where v_0 ∈ \left[ \frac{ε_0 − ε_1 − z_3}{\frac{b(N−2)}{(N−1)(b−c)} − 1}, \; \frac{z_2 − (ε_0 − ε_1)}{1 + \frac{c}{(N−1)b}} \right]. For θ with s_1(θ) = N,
W_{1_N} = { v : v_i = v_1, ∀i ∈ N },   (22)
where v_1 ∈ \left[ \frac{\frac{b(N−2)}{(N−1)(b−c)}(ε_0 − ε_1) − z_3}{\frac{b(N−2)}{(N−1)(b−c)} − 1}, \; \frac{\left( \frac{b(N−2)}{(N−1)(b−c)} − 1 \right) z_2 + \frac{c}{(N−1)b} z_3}{\frac{b(N−2)}{(N−1)(b−c)} + \frac{c}{(N−1)b}} \right].

We can see that each set Wθ is bounded by hyperplanes, which is different from the smooth

set (e.g. a ball) used in [9] and [10].


Apart from the restrictions in Lemma 1, we impose more restrictions on the continuation

payoff function to make it even simpler. Specifically, we focus on the simple continuation payoff

functions defined as below.

Definition 4: We say a continuation payoff function γ : ΘN → ∪θWθ is a simple continuation

payoff function, if for any given state θ, and any given payoff profile v ∈ Wθ, γ satisfies

γi(θ) = γj(θ′), ∀θ, θ′ ∈ ΘN , and ∀i, j ∈ N s.t. θi = θ′j.

Given a state θ and a payoff v ∈ Wθ, a simple continuation payoff function assigns the same

continuation payoff for the users with the same reputation under all the future states.

Using the above self-generating set and simple continuation payoff functions, we prove that

under certain conditions on the reputation update rule τ and the discount factor δ, the social

optimum can be asymptotically achieved at a SF-PPE. For better presentation of the theorem, we

define a few auxiliary variables first. Define κ_1 ≜ \frac{b(N−2)}{(N−1)(b−c)} − 1 and κ_2 ≜ 1 + \frac{c}{(N−1)b}. In addition,

we define δ̲, which will be proved to be the lower bound on the feasible discount factors, as
follows:
δ̲ ≜ \max\Bigg\{ \max_{s_1 ∈ \{1,…,N−1\}: \frac{s_1}{N−1}b + \frac{N−s_1}{N−1}c > ε_0 − ε_1} \frac{ε_0 − ε_1 − \left( \frac{s_1}{N−1}b + \frac{N−s_1}{N−1}c \right)}{(ε_0 − ε_1)\left( \frac{N−s_1}{N−1}β_1^+ + \frac{s_1−1}{N−1}x_1^+ \right) − \left( \frac{s_1}{N−1}b + \frac{N−s_1}{N−1}c \right)},
\max_{θ ∈ \{0,1\}} \frac{c}{c + (1−2ε)\left( β_θ^+ − (1−β_θ^−) \right)(ε_0 − ε_1)},   (23)
\max_{θ ∈ \{0,1\}} \frac{b − c + \frac{c\,x_θ^+}{(1−2ε)\left[ β_θ^+ − (1−β_θ^−) \right]}}{b − c + \frac{c\,x_θ^+}{(1−2ε)\left[ β_θ^+ − (1−β_θ^−) \right]} + \frac{κ_1 z_2 + (κ_2 − 1) z_3}{κ_1 + κ_2} − \frac{(1+κ_1)(ε_0 − ε_1) − z_3}{κ_1}} \Bigg\}.

Theorem 1: Choose any small real numbers ε_1 > 0 and ε_0 ∈ (ε_1, (1 + κ_2/κ_1) ε_1). If the following
three sets of conditions are satisfied
• Condition 1: β_1^+ > 1 − β_1^− and x_1^+ ≜ (1 − ε)β_1^+ + ε(1 − β_1^−) > \frac{1}{1 + \frac{c}{(N−1)b}};
• Condition 2: β_0^+ > 1 − β_0^− and x_0^+ ≜ (1 − ε)β_0^+ + ε(1 − β_0^−) < \frac{1 − β_1^+}{\frac{c}{(N−1)b}};
• Condition 3: δ ∈ [δ̲, 1);
then there exists a SF-PPE (π_0, π_0 · 1_N) ∈ Π_f(A_afs) × Π_f^N(A_afs) that achieves
U_i(θ^0, π_0, π_0 · 1_N) = b − c − ε_{θ_i}, ∀i ∈ N,   (24)
starting from any initial reputation profile θ^0.

Proof: See Appendix A.


Theorem 1 gives the sufficient conditions under which the social optimum can be asymptot-

ically achieved. The first two conditions are the restrictions on the optimal reputation update

rules. First, the probability that the reputation goes up when the effort level is not lower than

the recommended one, β+θ , should be larger than 1 − β−θ , which is the probability that the

reputation goes up when the effort level is lower than the recommended one. Second, for the

users with reputation 1, the expected probability that the reputation goes up when the effort

level is not lower than the recommended one, x+1 , should be larger than the threshold specified

in Condition 1. Meanwhile, for the users with reputation 0, the expected probability that the

reputation goes up when the effort level is not lower than the recommended one, x+0 , should

be smaller than the threshold specified in Condition 2. In this way, a user will prefer to have

reputation 1 because x_0^+ < x_1^+.

Condition 3 gives us an analytical lower bound for the discount factor. This is important

for constructing the equilibrium (π0, π0 · 1N), because a feasible discount factor is needed to

determine the equilibrium. Note that in [9] and [10], the lower bound for the discount factor

cannot be obtained analytically.
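As a numerical aid (not part of the proof), the sketch below computes κ_1, κ_2, and x_θ^+ for given parameters, checks the monotonicity requirement β_θ^+ > 1 − β_θ^− shared by Conditions 1 and 2 together with the x_1^+ threshold of Condition 1, and evaluates the incentive-compatibility term of δ̲ (the second maximum in (23)). The function name is ours.

    def theorem1_quantities(b, c, N, eps, beta_plus, beta_minus, eps0, eps1):
        """Compute kappa_1, kappa_2, x_theta^+, and the incentive term of the bound in (23)."""
        kappa1 = b * (N - 2) / ((N - 1) * (b - c)) - 1
        kappa2 = 1 + c / ((N - 1) * b)
        x_plus = {t: (1 - eps) * beta_plus[t] + eps * (1 - beta_minus[t]) for t in (0, 1)}
        monotone = all(beta_plus[t] > 1 - beta_minus[t] for t in (0, 1))   # required by Conditions 1 and 2
        cond1 = monotone and x_plus[1] > 1 / kappa2                        # threshold on x_1^+ in Condition 1
        delta_ic = max(c / (c + (1 - 2 * eps) * (beta_plus[t] - (1 - beta_minus[t])) * (eps0 - eps1))
                       for t in (0, 1))                                    # second maximum in (23)
        return kappa1, kappa2, x_plus, monotone, cond1, delta_ic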

Remark 1: Note that the SF-PPE has a desirable property that the user’s strategy π = π0 is

the same as the recommended strategy. Hence, the platform can announce the recommended

action in each period, such that the users can follow the recommended action. In this way, the

users do not need to calculate the action to take in each period by themselves.

Once the conditions for achieving social optimum are satisfied, we know that an optimal

equilibrium strategy exists. Now we show how to construct the optimal equilibrium strategy.

The optimal equilibrium strategy is not stationary, in that given the same state, the actions

taken can be different at different periods. Hence, at each period t, we need to determine the

recommended action αt0, which is the same as the users’ actions α. The algorithm of constructing

the recommended strategy, given a sufficiently large discount factor δ, is described in Table III.

Remark 2: Note that in the algorithm, if all the users have the same reputation, the altruistic

action αa and the selfish action αs are used; otherwise, the altruistic action αa and the fair action

αf are used.


TABLE III
THE ALGORITHM OF CONSTRUCTING THE EQUILIBRIUM STRATEGY.

Require: ε_0, ε_1, δ ≥ δ̲, s(θ)
Initialization: t = 0, v_θ = b − c − ε_θ for θ = 0, 1.
repeat
  if s_1(θ) = 0 then
    if v_0 ≥ (1 − δ)\left[ b − c + \frac{(1−ε)β_0^+ + ε(1−β_0^−)}{(1−2ε)(β_0^+ − (1−β_0^−))} c \right] + δ \frac{ε_0 − ε_1 − z_3}{\frac{b(N−2)}{(N−1)(b−c)} − 1} then
      α_0^t = α^t = α^a
      v_0 ← \frac{v_0}{δ} − \frac{1−δ}{δ}\left[ b − c + \frac{(1−ε)β_0^+ + ε(1−β_0^−)}{(1−2ε)(β_0^+ − (1−β_0^−))} c \right],  v_1 ← v_0 + \frac{1−δ}{δ} \frac{c}{(1−2ε)(β_0^+ − (1−β_0^−))}
    else
      α_0^t = α^t = α^s
      v_0 ← \frac{v_0}{δ},  v_1 ← v_0
    end
  else if s_1(θ) = N then
    if v_1 ≥ (1 − δ)\left[ b − c + \frac{(1−ε)β_1^+ + ε(1−β_1^−)}{(1−2ε)(β_1^+ − (1−β_1^−))} c \right] + δ \frac{ε_0 − ε_1 − z_3}{\frac{b(N−2)}{(N−1)(b−c)} − 1} then
      α_0^t = α^t = α^a
      v_1 ← \frac{v_1}{δ} − \frac{1−δ}{δ}\left[ b − c + \frac{(1−ε)β_1^+ + ε(1−β_1^−)}{(1−2ε)(β_1^+ − (1−β_1^−))} c \right],  v_0 ← v_1 − \frac{1−δ}{δ} \frac{c}{(1−2ε)(β_1^+ − (1−β_1^−))}
    else
      α_0^t = α^t = α^s
      v_1 ← \frac{v_1}{δ},  v_0 ← v_1
    end
  else
    if \frac{(1 + κ_1 x_0^+) v_1 − (1 + κ_1 x_1^+) v_0}{x_1^+ − x_0^+} ≤ δ z_3 − (1 − δ)(b − c) κ_1 then
      α_0^t = α^t = α^a
      v_1' ← \frac{1}{δ} \frac{(1 − x_0^+) v_1 − (1 − x_1^+) v_0}{x_1^+ − x_0^+} − \frac{1−δ}{δ}(b − c),  v_0' ← \frac{1}{δ} \frac{x_1^+ v_0 − x_0^+ v_1}{x_1^+ − x_0^+} − \frac{1−δ}{δ}(b − c)
      v_1 ← v_1',  v_0 ← v_0'
    else
      α_0^t = α^t = α^f
      v_1' ← \frac{1}{δ} \frac{(1 − x_0^+) v_1 − (1 − x_{s_1(θ)}^+) v_0}{x_{s_1(θ)}^+ − x_0^+} − \frac{1−δ}{δ} \frac{\left( b − \frac{s_1(θ)−1}{N−1} c \right)(1 − x_0^+) − \left( \frac{s_0(θ)−1}{N−1} b − c \right)(1 − x_{s_1(θ)}^+)}{x_{s_1(θ)}^+ − x_0^+}
      v_0' ← \frac{1}{δ} \frac{x_{s_1(θ)}^+ v_0 − x_0^+ v_1}{x_{s_1(θ)}^+ − x_0^+} − \frac{1−δ}{δ} \frac{\left( \frac{s_0(θ)−1}{N−1} b − c \right) x_{s_1(θ)}^+ − \left( b − \frac{s_1(θ)−1}{N−1} c \right) x_0^+}{x_{s_1(θ)}^+ − x_0^+}
      v_1 ← v_1',  v_0 ← v_0'
    end
  end
  t ← t + 1
until ∅ (the loop runs in every period)


C. Performance Analysis of Strategies Restricted on Aas

Previously, we have shown that under certain conditions, the social optimum can be achieved

by feasible recommended strategies and symmetric feasible strategy profiles restricted on the

subset of actions A_afs. Now we consider an even simpler class of strategies, restricted on the
subset of actions A_as. The following proposition quantifies the performance loss of the optimal

feasible recommended strategy and symmetric feasible strategy profile restricted on Aas.

Before stating the proposition, we first define

ρ(θ, α_0, S_B) ≜ \max_{i ∈ N} \max\Bigg\{
\frac{\frac{s_{θ_i} − 1}{N−1}}{\sum_{s' ∈ S \setminus S_B} \left[ q(s'|θ, α_0, α^a · 1_N) − q(s'|θ, α_0, α_i = α^0, α^a · 1_N) \right]},
\frac{\frac{s_{1−θ_i}}{N−1}}{\sum_{s' ∈ S \setminus S_B} \left[ q(s'|θ, α_0, α^a · 1_N) − q(s'|θ, α_0, α_i = α^1, α^a · 1_N) \right]},   (25)
\frac{1}{\sum_{s' ∈ S \setminus S_B} \left[ q(s'|θ, α_0, α^a · 1_N) − q(s'|θ, α_0, α_i = α^{01}, α^a · 1_N) \right]} \Bigg\}.

Proposition 1: Starting from any initial reputation profile θ, the maximum social welfare

achievable by feasible recommended strategies and symmetric feasible strategy profiles restricted

on Aas, namely (π0, π · 1N) ∈ Πf (Aas)× ΠN

f (Aas), is at most

b − c − c · ρ(θ, α_0^*, S_B^*) \sum_{s' ∈ S_B^*} q(s'|θ, α_0^*, α^a · 1_N),   (26)

where α∗0, the optimal recommended action, and S∗B, the optimal subset of reputation distribu-

tions, are the solutions to the following optimization problem:

\min_{α_0} \min_{S_B ⊂ S} \; ρ(θ, α_0, S_B) \sum_{s' ∈ S_B} q(s'|θ, α_0, α^a · 1_N)   (27)
s.t. \sum_{s' ∈ S \setminus S_B} q(s'|θ, α_0, α^a · 1_N) > \sum_{s' ∈ S \setminus S_B} q(s'|θ, α_0, α_i = α^0, α^a · 1_N), ∀i ∈ N,
     \sum_{s' ∈ S \setminus S_B} q(s'|θ, α_0, α^a · 1_N) > \sum_{s' ∈ S \setminus S_B} q(s'|θ, α_0, α_i = α^1, α^a · 1_N), ∀i ∈ N,
     \sum_{s' ∈ S \setminus S_B} q(s'|θ, α_0, α^a · 1_N) > \sum_{s' ∈ S \setminus S_B} q(s'|θ, α_0, α_i = α^{01}, α^a · 1_N), ∀i ∈ N.

Proof: See Appendix B.

The following corollary shows that the maximum social welfare achievable by (π0, π · 1N) ∈

Πf (Aas)×ΠN

f (Aas) is bounded away from the social optimum b−c, unless there is no reputation

update error. In contrast, if we can use the fair action αf , the social optimum can be asymptotically


achieved when the discount factor goes to 1. In other words, the differential punishment introduced

by the fair action is crucial for achieving social optimum.

Corollary 1: Suppose that the reputation update error ε > 0. Then, starting from any initial

reputation profile θ, the maximum social welfare achievable by feasible recommended strategies

and symmetric feasible strategy profiles restricted on Aas, namely (π0, π · 1N) ∈ Πf (Aas) ×

ΠNf (Aas), is bounded away from the social optimum b− c, regardless of the discount factor.

Proof: We can see from (28) that the distance between the maximum social welfare
achievable by (π_0, π · 1_N) ∈ Π_f(A_as) × Π_f^N(A_as) and the social optimum b − c, namely
c · ρ(θ, α_0^*, S_B^*) \sum_{s' ∈ S_B^*} q(s'|θ, α_0^*, α^a · 1_N),   (28)
is independent of the discount factor δ.

Moreover, since ρ(θ, α_0, S_B) > 0, the distance can be zero only when \sum_{s' ∈ S_B^*} q(s'|θ, α_0^*, α^a · 1_N) = 0.
However, when the reputation update error ε is strictly larger than 0, any reputation profile can occur
with positive probability. Hence, we must choose S_B^* = ∅ to have \sum_{s' ∈ S_B^*} q(s'|θ, α_0^*, α^a · 1_N) = 0.
In this case, the constraint
\sum_{s' ∈ S \setminus S_B} q(s'|θ, α_0, α^a · 1_N) > \sum_{s' ∈ S \setminus S_B} q(s'|θ, α_0, α_i = α^0, α^a · 1_N)   (29)
cannot be satisfied, since \sum_{s' ∈ S \setminus S_B} q(s'|θ, α_0, α_i = α^0, α^a · 1_N) = \sum_{s' ∈ S} q(s'|θ, α_0, α_i = α^0, α^a · 1_N) = 1.

VI. SIMULATION RESULTS

We compare against the state-of-the-art reputation mechanism, namely the reputation mech-

anism with optimal stationary Markov strategies in [3]. The reputation mechanism in [3] has

been shown to outperform existing reputation mechanisms [1] and BitTorrent protocols [?]. In

a stationary Markov strategy, the action to take depends on the current state only, which is

independent of when the current state appears. In contrast, the proposed strategy depends on the

history of states. The same current state may lead to different actions in different periods. In

our experiments, we consider a system with N = 10 users and discount factor δ = 0.99. In the

simulation of the optimal stationary Markov strategy [3], the recommended action is fixed to be

the fair action αf .


[Figure 1 appears here.]
Fig. 1. Comparison of the social welfare under different reputation update errors (x-axis: error probability; y-axis: normalized social welfare; curves: proposed mechanism and the optimal stationary Markov strategy for b = 2, 3, 4, 5).

We first compare the social welfare (i.e. the average payoff), which has been normalized to the

social optimum b− c, of the two reputation mechanisms under different benefit values. We can

see from Fig. 1 that as the error probability grows, the social welfare of the optimal stationary

Markov strategy decreases, and drops to 0 when the error probability is large (e.g. when ε > 0.4).

Then, we compare the evolution of the states, the recommended actions, and the actions taken

by the users in these two reputation mechanisms, in order to see how they work differently. The

evolution under the optimal stationary Markov strategy is shown in Table IV, and that of the

proposed strategy is shown in Table V. The first difference is that in the reputation mechanism

with stationary Markov strategies [3], the recommended action is fixed and the action taken

can be different from the recommended action. On the contrary, in the proposed reputation

mechanism, the recommended action is not fixed over time and the action taken is always the

same as the recommended action. The second difference is that in stationary Markov strategies,

the actions taken in different periods are the same as long as the current state is the same. In

contrast, in the proposed strategy, the actions taken can be different even at the same current

state.

VII. CONCLUSION

In this paper, we proposed a design framework for binary reputation mechanisms that can

achieve the social optimum in the presence of reputation update errors. We discovered the


TABLE IV

EVOLUTION OF STATES AND ACTIONS TAKEN IN THE OPTIMAL STATIONARY MARKOV STRATEGY.

Period State Recommended action (stationary Markov) Action taken (stationary Markov)

0 (4,6) Fair Selfish

1 (4,6) Fair Selfish

2 (6,4) Fair Selfish

3 (6,4) Fair Selfish

4 (4,6) Fair Selfish

5 (4,6) Fair Selfish

6 (3,7) Fair Altruistic

7 (1,9) Fair Altruistic

8 (1,9) Fair Altruistic

9 (2,8) Fair Altruistic

TABLE V

EVOLUTION OF STATES, RECOMMENDED ACTIONS, AND ACTIONS TAKEN IN THE PROPOSED STRATEGY.

Period state Recommended action (proposed) Action taken (proposed)

0 (4,6) Altruistic Altruistic

1 (3,7) Fair Fair

2 (5,5) Altruistic Altruistic

3 (4,6) Fair Fair

4 (5,5) Altruistic Altruistic

5 (6,4) Altruistic Altruistic

6 (4,6) Fair Fair

7 (2,8) Altruistic Altruistic

8 (1,9) Altruistic Altruistic

9 (1,9) Altruistic Altruistic

enforceability conditions, which guarantee that the social optimum can be achieved. We proposed

a class of reputation update rules, which can fulfill the enforceability conditions when the

parameters are chosen carefully. Moreover, we constructed the optimal recommended strategy

in the reputation mechanism. The computational complexity is greatly reduced, since only three

actions are proved to be enough for the optimal strategy. Simulation results demonstrate the


significant performance gain of the proposed reputation mechanism over the state-of-the-art

mechanisms, especially when the reputation update error probability is large.

APPENDIX A

PROOF OF THEOREM 1

A. Outline of the proof

We derive the conditions under which the set (Wθ)θ∈ΘN is a self-generating set. Specifically,

we derive the conditions under which any payoff profile v ∈ Wθ is decomposable on (Wθ′)θ′∈ΘN

given θ, for all θ ∈ ΘN .

B. When users have different reputations

1) Preliminaries: We first focus on the states θ with 1 ≤ s1(θ) ≤ N − 1, and derive the

conditions under which any payoff profile v ∈ Wθ can be decomposed by (α0 = αa, αa ·1N) or

(α0 = αf , αf · 1N). First, v could be decomposed by (αa, αa · 1N), if there exists a continuation

payoff function γ : ΘN → ∪θ′∈ΘNWθ′ with γ(θ′) ∈ Wθ′ , such that for all i ∈ N and for all

αi ∈ A,

v_i = (1 − δ) u_i(θ, α^a, α^a · 1_N) + δ \sum_{θ'} γ_i(θ') q(θ'|θ, α^a, α^a · 1_N)   (30)
    ≥ (1 − δ) u_i(θ, α^a, (α_i, α^a · 1_{N−1})) + δ \sum_{θ'} γ_i(θ') q(θ'|θ, α^a, (α_i, α^a · 1_{N−1})).

Since we focus on simple continuation payoff functions, all the users with the same future

reputation will have the same continuation payoff regardless of the recommended action α0, the

action profile (αi, α · 1N−1), and the future state θ′. Hence, we write the continuation payoffs

for the users with future reputation 1 and 0 as γ1 and γ0, respectively. Consequently, the above

conditions on decomposability can be simplified to

v_i = (1 − δ) u_i(θ, α^a, α^a · 1_N)   (31)
    + δ \left[ γ_1 \sum_{θ': θ'_i = 1} q(θ'|θ, α^a, α^a · 1_N) + γ_0 \sum_{θ': θ'_i = 0} q(θ'|θ, α^a, α^a · 1_N) \right]
    ≥ (1 − δ) u_i(θ, α^a, (α_i, α^a · 1_{N−1}))
    + δ \left[ γ_1 \sum_{θ': θ'_i = 1} q(θ'|θ, α^a, (α_i, α^a · 1_{N−1})) + γ_0 \sum_{θ': θ'_i = 0} q(θ'|θ, α^a, (α_i, α^a · 1_{N−1})) \right].


First, consider the case when user i has reputation 1 (i.e. θ_i = 1). Based on (5), we can calculate
the stage-game payoff as u_i(θ, α^a, α^a · 1_N) = b − c. The term \sum_{θ': θ'_i = 1} q(θ'|θ, α^a, α^a · 1_N) is
the probability that user i has reputation 1 in the next state. Since user i's reputation update is
independent of the other users' reputation updates, we can calculate this probability as
\sum_{θ': θ'_i = 1} q(θ'|θ, α^a, α^a · 1_N) = [(1 − ε)β_1^+ + ε(1 − β_1^−)] \sum_{m ∈ M: θ_{m(i)} = 1} μ(m)   (32)
    + [(1 − ε)β_1^+ + ε(1 − β_1^−)] \sum_{m ∈ M: θ_{m(i)} = 0} μ(m)   (33)
    = (1 − ε)β_1^+ + ε(1 − β_1^−) = x_1^+.   (34)
Similarly, we can calculate \sum_{θ': θ'_i = 0} q(θ'|θ, α^a, α^a · 1_N), the probability that user i has reputation
0 in the next state, as
\sum_{θ': θ'_i = 0} q(θ'|θ, α^a, α^a · 1_N) = [(1 − ε)(1 − β_1^+) + εβ_1^−] \sum_{m ∈ M: θ_{m(i)} = 1} μ(m)   (35)
    + [(1 − ε)(1 − β_1^+) + εβ_1^−] \sum_{m ∈ M: θ_{m(i)} = 0} μ(m)   (36)
    = (1 − ε)(1 − β_1^+) + εβ_1^− = 1 − x_1^+.   (37)
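The closed forms (32)-(37) can be spot-checked by Monte Carlo simulation of a single worker's reputation update under the altruistic action; the following self-contained sketch (names ours) estimates the probability of keeping reputation 1, which should be close to x_1^+.

    import random

    def mc_prob_keep_rep1(beta1_plus, beta1_minus, eps, trials=200000):
        """Estimate Pr(next reputation = 1) for a reputation-1 worker exerting high effort
        when high effort is recommended; compare with x_1^+ = (1-eps)*beta1_plus + eps*(1-beta1_minus)."""
        hits = 0
        for _ in range(trials):
            report = 1 if random.random() > eps else 0      # high effort, noisy report
            if report >= 1:                                  # report meets the recommendation
                hits += random.random() < beta1_plus
            else:                                            # report falls below the recommendation
                hits += random.random() < 1 - beta1_minus
        return hits / trials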

Now we discuss what happens if user i deviates. Since the recommended action αa is to exert

high effort for all the users, user i can deviate to the other three actions, namely “exert high effort

for reputation-1 users only”, “exert high effort for reputation-0 users only”, “exert low effort

for all the users”. We can calculate the corresponding stage-game payoff and state transition

probabilities under each deviation.

• "exert high effort for reputation-1 users only" (α_i(1, θ_i) = 1, α_i(0, θ_i) = 0):
u_i(θ, α^a, (α_i, α^a · 1_{N−1})) = b − c \sum_{m ∈ M: θ_{m(i)} = 1} μ(m) = b − c \frac{s_1(θ) − 1}{N − 1}   (38)
\sum_{θ': θ'_i = 1} q(θ'|θ, α^a, (α_i, α^a · 1_{N−1}))   (39)
    = [(1 − ε)β_1^+ + ε(1 − β_1^−)] \sum_{m ∈ M: θ_{m(i)} = 1} μ(m) + [(1 − ε)(1 − β_1^−) + εβ_1^+] \sum_{m ∈ M: θ_{m(i)} = 0} μ(m)
    = [(1 − ε)β_1^+ + ε(1 − β_1^−)] \frac{s_1(θ) − 1}{N − 1} + [(1 − ε)(1 − β_1^−) + εβ_1^+] \frac{s_0(θ)}{N − 1}.
\sum_{θ': θ'_i = 0} q(θ'|θ, α^a, (α_i, α^a · 1_{N−1}))   (40)
    = [(1 − ε)(1 − β_1^+) + εβ_1^−] \sum_{m ∈ M: θ_{m(i)} = 1} μ(m) + [(1 − ε)β_1^− + ε(1 − β_1^+)] \sum_{m ∈ M: θ_{m(i)} = 0} μ(m)
    = [(1 − ε)(1 − β_1^+) + εβ_1^−] \frac{s_1(θ) − 1}{N − 1} + [(1 − ε)β_1^− + ε(1 − β_1^+)] \frac{s_0(θ)}{N − 1}.

• "exert high effort for reputation-0 users only" (α_i(1, θ_i) = 0, α_i(0, θ_i) = 1):
u_i(θ, α^a, (α_i, α^a · 1_{N−1})) = b − c \sum_{m ∈ M: θ_{m(i)} = 0} μ(m) = b − c \frac{s_0(θ)}{N − 1}   (41)
\sum_{θ': θ'_i = 1} q(θ'|θ, α^a, (α_i, α^a · 1_{N−1}))   (42)
    = [(1 − ε)(1 − β_1^−) + εβ_1^+] \sum_{m ∈ M: θ_{m(i)} = 1} μ(m) + [(1 − ε)β_1^+ + ε(1 − β_1^−)] \sum_{m ∈ M: θ_{m(i)} = 0} μ(m)
    = [(1 − ε)(1 − β_1^−) + εβ_1^+] \frac{s_1(θ) − 1}{N − 1} + [(1 − ε)β_1^+ + ε(1 − β_1^−)] \frac{s_0(θ)}{N − 1}.
\sum_{θ': θ'_i = 0} q(θ'|θ, α^a, (α_i, α^a · 1_{N−1}))   (43)
    = [(1 − ε)β_1^− + ε(1 − β_1^+)] \sum_{m ∈ M: θ_{m(i)} = 1} μ(m) + [(1 − ε)(1 − β_1^+) + εβ_1^−] \sum_{m ∈ M: θ_{m(i)} = 0} μ(m)
    = [(1 − ε)β_1^− + ε(1 − β_1^+)] \frac{s_1(θ) − 1}{N − 1} + [(1 − ε)(1 − β_1^+) + εβ_1^−] \frac{s_0(θ)}{N − 1}.

• "exert low effort for all the users" (α_i(1, θ_i) = 0, α_i(0, θ_i) = 0):
u_i(θ, α^a, (α_i, α^a · 1_{N−1})) = b   (44)
\sum_{θ': θ'_i = 1} q(θ'|θ, α^a, (α_i, α^a · 1_{N−1}))   (45)
    = [(1 − ε)(1 − β_1^−) + εβ_1^+] \sum_{m ∈ M: θ_{m(i)} = 1} μ(m) + [(1 − ε)(1 − β_1^−) + εβ_1^+] \sum_{m ∈ M: θ_{m(i)} = 0} μ(m)
    = (1 − ε)(1 − β_1^−) + εβ_1^+.
\sum_{θ': θ'_i = 0} q(θ'|θ, α^a, (α_i, α^a · 1_{N−1}))   (46)
    = [(1 − ε)β_1^− + ε(1 − β_1^+)] \sum_{m ∈ M: θ_{m(i)} = 1} μ(m) + [(1 − ε)β_1^− + ε(1 − β_1^+)] \sum_{m ∈ M: θ_{m(i)} = 0} μ(m)
    = (1 − ε)β_1^− + ε(1 − β_1^+).


Plugging the above expressions into (31), we can simplify the incentive compatibility con-
straints (i.e. the inequality constraints) to
(1 − 2ε)[β_1^+ − (1 − β_1^−)](γ_1 − γ_0) ≥ \frac{1 − δ}{δ} c,   (47)
under all three deviating actions.
Hence, if user i has reputation 1, the decomposability constraints (31) reduce to
v_1 = (1 − δ)(b − c) + δ [x_1^+ γ_1 + (1 − x_1^+) γ_0],   (48)
where v_1 is the payoff of the users with reputation 1, and
(1 − 2ε)[β_1^+ − (1 − β_1^−)](γ_1 − γ_0) ≥ \frac{1 − δ}{δ} c.   (49)
Similarly, if user i has reputation 0, we can reduce the decomposability constraints (31) to
v_0 = (1 − δ)(b − c) + δ [x_0^+ γ_1 + (1 − x_0^+) γ_0],   (50)
and
(1 − 2ε)[β_0^+ − (1 − β_0^−)](γ_1 − γ_0) ≥ \frac{1 − δ}{δ} c.   (51)
For the above incentive compatibility constraints (the above two inequalities) to hold, we need
to have β_1^+ − (1 − β_1^−) > 0 and β_0^+ − (1 − β_0^−) > 0, which are part of Condition 1 and Condition
2. Now we will derive the rest of the sufficient conditions in Theorem 1.

The above two equalities determine the continuation payoffs γ_1 and γ_0 as below:
γ_1 = \frac{1}{δ} \frac{(1 − x_0^+) v_1 − (1 − x_1^+) v_0}{x_1^+ − x_0^+} − \frac{1 − δ}{δ} (b − c),
γ_0 = \frac{1}{δ} \frac{x_1^+ v_0 − x_0^+ v_1}{x_1^+ − x_0^+} − \frac{1 − δ}{δ} (b − c).   (52)
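In numeric form, (52) reads as follows (a helper sketch, names ours); it returns the continuation payoffs that decompose a target pair (v_0, v_1) by the altruistic action.

    def continuation_payoffs_altruistic(v0, v1, x0_plus, x1_plus, delta, b, c):
        """Continuation payoffs (gamma1, gamma0) solving (48) and (50), i.e. equation (52)."""
        gamma1 = ((1 - x0_plus) * v1 - (1 - x1_plus) * v0) / (x1_plus - x0_plus) / delta \
                 - (1 - delta) / delta * (b - c)
        gamma0 = (x1_plus * v0 - x0_plus * v1) / (x1_plus - x0_plus) / delta \
                 - (1 - delta) / delta * (b - c)
        return gamma1, gamma0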

Now we consider the decomposability constraints if we want to decompose a payoff profile

v ∈ Wθ using the fair action αf . Since we focus on decomposition by simple continuation

payoff functions, we write the decomposition constraints as

v_i = (1 − δ) u_i(θ, α^f, α^f · 1_N)   (53)
    + δ \left[ γ_1 \sum_{θ': θ'_i = 1} q(θ'|θ, α^f, α^f · 1_N) + γ_0 \sum_{θ': θ'_i = 0} q(θ'|θ, α^f, α^f · 1_N) \right]
    ≥ (1 − δ) u_i(θ, α^f, (α_i, α^f · 1_{N−1}))
    + δ \left[ γ_1 \sum_{θ': θ'_i = 1} q(θ'|θ, α^f, (α_i, α^f · 1_{N−1})) + γ_0 \sum_{θ': θ'_i = 0} q(θ'|θ, α^f, (α_i, α^f · 1_{N−1})) \right].


Due to space limitation, we omit the details and directly give the simplification of the above

decomposability constraints as follows. First, the incentive compatibility constraints (i.e. the

inequality constraints) are simplified to

(1 − 2ε)[β_1^+ − (1 − β_1^−)](γ_1 − γ_0) ≥ \frac{1 − δ}{δ} c,   (54)
and
(1 − 2ε)[β_0^+ − (1 − β_0^−)](γ_1 − γ_0) ≥ \frac{1 − δ}{δ} c,   (55)

under all three deviating actions. Note that the above incentive compatibility constraints are the

same as the ones when we want to decompose the payoffs using the altruistic action αa.

Then, the equality constraints in (53) can be simplified as follows. For the users with reputation

1, we have

v_1 = (1 − δ) \left( b − \frac{s_1(θ) − 1}{N − 1} c \right) + δ \left[ x_{s_1(θ)}^+ γ_1 + (1 − x_{s_1(θ)}^+) γ_0 \right],   (56)
where
x_{s_1(θ)}^+ ≜ \left[ (1 − ε) \frac{s_1(θ) − 1}{N − 1} + \frac{s_0(θ)}{N − 1} \right] β_1^+ + \left( ε \frac{s_1(θ) − 1}{N − 1} \right) (1 − β_1^−).   (57)
For the users with reputation 0, we have
v_0 = (1 − δ) \left( \frac{s_0(θ) − 1}{N − 1} b − c \right) + δ \left[ x_0^+ γ_1 + (1 − x_0^+) γ_0 \right].   (58)

The above two equalities determine the continuation payoffs \gamma_1 and \gamma_0 as below:

\gamma_1 = \frac{1}{\delta}\cdot\frac{(1-x_0^+)v_1 - (1-x^+_{s_1(\theta)})v_0}{x^+_{s_1(\theta)} - x_0^+} - \frac{1-\delta}{\delta}\cdot\frac{\left(b-\frac{s_1(\theta)-1}{N-1}c\right)(1-x_0^+) - \left(\frac{s_0(\theta)-1}{N-1}b-c\right)(1-x^+_{s_1(\theta)})}{x^+_{s_1(\theta)} - x_0^+},
\gamma_0 = \frac{1}{\delta}\cdot\frac{x^+_{s_1(\theta)}v_0 - x_0^+ v_1}{x^+_{s_1(\theta)} - x_0^+} - \frac{1-\delta}{\delta}\cdot\frac{\left(\frac{s_0(\theta)-1}{N-1}b-c\right)x^+_{s_1(\theta)} - \left(b-\frac{s_1(\theta)-1}{N-1}c\right)x_0^+}{x^+_{s_1(\theta)} - x_0^+}.   (59)
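Analogously to the altruistic-action case, the sketch below (again with hypothetical parameter values and our own variable names) computes x^+_{s_1(\theta)} from (57) and the continuation payoffs from (59), and verifies them against (56) and (58):

    # Hypothetical parameters; not taken from the paper.
    delta, b, c, eps = 0.95, 3.0, 1.0, 0.1
    N, s1 = 10, 6
    s0 = N - s1
    beta1_plus, beta1_minus = 0.9, 0.8
    x0p = 0.3                     # x_0^+

    # x^+_{s_1(theta)} as defined in (57).
    xs1p = ((1 - eps) * (s1 - 1) / (N - 1) + s0 / (N - 1)) * beta1_plus \
           + (eps * (s1 - 1) / (N - 1)) * (1 - beta1_minus)

    u1 = b - (s1 - 1) / (N - 1) * c     # stage payoff of a reputation-1 user, cf. (56)
    u0 = (s0 - 1) / (N - 1) * b - c     # stage payoff of a reputation-0 user, cf. (58)
    v1, v0 = 1.9, 1.2                   # target payoffs

    # Continuation payoffs per (59).
    gamma1 = ((1 - x0p) * v1 - (1 - xs1p) * v0) / (delta * (xs1p - x0p)) \
             - (1 - delta) / delta * (u1 * (1 - x0p) - u0 * (1 - xs1p)) / (xs1p - x0p)
    gamma0 = (xs1p * v0 - x0p * v1) / (delta * (xs1p - x0p)) \
             - (1 - delta) / delta * (u0 * xs1p - u1 * x0p) / (xs1p - x0p)

    # Round-trip check against (56) and (58).
    assert abs((1 - delta) * u1 + delta * (xs1p * gamma1 + (1 - xs1p) * gamma0) - v1) < 1e-9
    assert abs((1 - delta) * u0 + delta * (x0p * gamma1 + (1 - x0p) * gamma0) - v0) < 1e-9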

2) Sufficient conditions: Now we derive the sufficient conditions under which any payoff profile v \in W_\theta can be decomposed by (\alpha_0 = \alpha^a, \alpha^a \cdot \mathbf{1}_N) or (\alpha_0 = \alpha^f, \alpha^f \cdot \mathbf{1}_N). Specifically, we will derive the conditions such that for any payoff profile v \in W_\theta, at least one of the two decomposability constraints (31) and (53) is satisfied. From the preliminaries, we know that the incentive compatibility constraints in (31) and (53) can be simplified into the same constraints:

(1-2\varepsilon)[\beta_1^+ - (1-\beta_1^-)](\gamma_1 - \gamma_0) \ge \frac{1-\delta}{\delta}\cdot c,   (60)


and

(1-2\varepsilon)[\beta_0^+ - (1-\beta_0^-)](\gamma_1 - \gamma_0) \ge \frac{1-\delta}{\delta}\cdot c.   (61)

The above constraints impose a constraint on the discount factor, namely

\delta \ge \max_{\theta\in\Theta} \frac{c}{c + (1-2\varepsilon)[\beta_\theta^+ - (1-\beta_\theta^-)](\gamma_1-\gamma_0)}.   (62)

Since \gamma_1 and \gamma_0 should satisfy \gamma_1 - \gamma_0 \ge \varepsilon_0 - \varepsilon_1, the above constraint can be rewritten as

\delta \ge \max_{\theta\in\Theta} \frac{c}{c + (1-2\varepsilon)[\beta_\theta^+ - (1-\beta_\theta^-)](\varepsilon_0 - \varepsilon_1)},   (63)

which is part of Condition 3 in Theorem 1.
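As a rough numerical illustration of the discount-factor condition (63), the following Python sketch (all parameter values are hypothetical and only for illustration; the variable names are ours) computes the implied lower bound on \delta:

    # Hypothetical parameters; eps0 - eps1 is the minimum continuation-payoff gap gamma1 - gamma0.
    eps = 0.1                                  # reputation update error
    c = 1.0                                    # cost of exerting high effort
    eps0, eps1 = 0.4, 0.1
    beta = {1: (0.9, 0.8), 0: (0.85, 0.75)}    # theta -> (beta_theta^+, beta_theta^-)

    def delta_lower_bound():
        """Lower bound from (63): max over theta of c / (c + (1-2*eps)*[beta+ - (1-beta-)]*(eps0-eps1))."""
        bounds = []
        for bp, bm in beta.values():
            gain = (1 - 2 * eps) * (bp - (1 - bm)) * (eps0 - eps1)
            bounds.append(c / (c + gain))
        return max(bounds)

    print("delta must be at least", delta_lower_bound())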

In addition, the continuation payoffs \gamma_1 and \gamma_0 should satisfy the constraints of the self-generating set, namely

\gamma_1 - \gamma_0 \ge \varepsilon_0 - \varepsilon_1,   (64)

\gamma_1 + \frac{c}{(N-1)b}\cdot\gamma_0 \le z_2 \triangleq \left(1 + \frac{c}{(N-1)b}\right)(b-c) - \frac{c}{(N-1)b}\,\varepsilon_0 - \varepsilon_1,   (65)

\gamma_1 - \frac{b\frac{N-2}{N-1}}{b-c}\cdot\gamma_0 \le z_3 \triangleq -\frac{\frac{b\frac{N-2}{N-1}}{b-c} - 1}{1 + \frac{c}{(N-1)b}}\cdot z_2.   (66)

We can plug the expressions of the continuation payoffs \gamma_1 and \gamma_0 in (52) and (59) into the above constraints. Specifically, if a payoff profile v is decomposed by the altruistic action, the following constraints should be satisfied for the continuation payoff profile to be in the self-generating set (for notational simplicity, we define \kappa_1 \triangleq \frac{b\frac{N-2}{N-1}}{b-c} - 1 and \kappa_2 \triangleq 1 + \frac{c}{(N-1)b}):

\frac{1}{\delta}\cdot\frac{v_1 - v_0}{x_1^+ - x_0^+} \ge \varepsilon_0 - \varepsilon_1,   (αa-1)

\frac{1}{\delta}\cdot\left\{\frac{(1-\kappa_2 x_0^+)v_1 - (1-\kappa_2 x_1^+)v_0}{x_1^+ - x_0^+} - \kappa_2\cdot(b-c)\right\} \le z_2 - \kappa_2\cdot(b-c),   (αa-2)

\frac{1}{\delta}\cdot\left\{\frac{(1+\kappa_1 x_0^+)v_1 - (1+\kappa_1 x_1^+)v_0}{x_1^+ - x_0^+} + \kappa_1\cdot(b-c)\right\} \le z_3 + \kappa_1\cdot(b-c).   (αa-3)

The constraint (αa-1) is satisfied for all v_1 and v_0 as long as x_1^+ > x_0^+, because v_1 - v_0 > \varepsilon_0 - \varepsilon_1, |x_1^+ - x_0^+| < 1, and \delta < 1.
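For concreteness, here is a small sketch (hypothetical numbers, reusing our placeholder variable names; z_2 is a made-up value for the self-generating set bound) that checks the three constraints above for a candidate payoff pair:

    # Hypothetical parameters, chosen only to illustrate the constraint checks.
    delta, b, c = 0.95, 3.0, 1.0
    N = 10
    x1p, x0p = 0.8, 0.3
    eps0, eps1 = 0.4, 0.1
    kappa1 = (b * (N - 2) / (N - 1)) / (b - c) - 1
    kappa2 = 1 + c / ((N - 1) * b)
    z2 = 1.8                      # hypothetical bound of the self-generating set
    z3 = -kappa1 / kappa2 * z2    # by definition (cf. the discussion after (89))

    def alpha_a_constraints(v1, v0):
        """Return booleans for (alpha_a-1), (alpha_a-2), (alpha_a-3)."""
        lhs1 = (v1 - v0) / (delta * (x1p - x0p))
        lhs2 = (((1 - kappa2 * x0p) * v1 - (1 - kappa2 * x1p) * v0) / (x1p - x0p)
                - kappa2 * (b - c)) / delta
        lhs3 = (((1 + kappa1 * x0p) * v1 - (1 + kappa1 * x1p) * v0) / (x1p - x0p)
                + kappa1 * (b - c)) / delta
        return (lhs1 >= eps0 - eps1,
                lhs2 <= z2 - kappa2 * (b - c),
                lhs3 <= z3 + kappa1 * (b - c))

    print(alpha_a_constraints(v1=1.9, v0=1.6))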

Since both the left-hand side (LHS) and the right-hand side (RHS) of (αa-2) are smaller than 0, we have

(αa-2) \Leftrightarrow \delta \le \frac{\frac{(1-\kappa_2 x_0^+)v_1 - (1-\kappa_2 x_1^+)v_0}{x_1^+ - x_0^+} - \kappa_2\cdot(b-c)}{z_2 - \kappa_2\cdot(b-c)}.   (67)


The RHS of (αa-3) is larger than 0. Hence, we have

(αa-3) \Leftrightarrow \delta \ge \frac{\frac{(1+\kappa_1 x_0^+)v_1 - (1+\kappa_1 x_1^+)v_0}{x_1^+ - x_0^+} + \kappa_1\cdot(b-c)}{z_3 + \kappa_1\cdot(b-c)}.   (68)

If a payoff profile v is decomposed by the fair action, the following constraints should be satisfied for the continuation payoff profile to be in the self-generating set:

\frac{1}{\delta}\cdot\left[\frac{v_1 - v_0}{x^+_{s_1(\theta)} - x_0^+} - \frac{\frac{s_1(\theta)}{N-1}b + \frac{s_0(\theta)}{N-1}c}{x^+_{s_1(\theta)} - x_0^+}\right] \ge \varepsilon_0 - \varepsilon_1 - \frac{\frac{s_1(\theta)}{N-1}b + \frac{s_0(\theta)}{N-1}c}{x^+_{s_1(\theta)} - x_0^+},   (αf-1)

\frac{1}{\delta}\cdot\left[\frac{(1-\kappa_2 x_0^+)v_1 - (1-\kappa_2 x^+_{s_1(\theta)})v_0}{x^+_{s_1(\theta)} - x_0^+} - \frac{(1-\kappa_2 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1}c\right) - (1-\kappa_2 x^+_{s_1(\theta)})\left(\frac{s_0(\theta)-1}{N-1}b - c\right)}{x^+_{s_1(\theta)} - x_0^+}\right]
\le z_2 - \frac{(1-\kappa_2 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1}c\right) - (1-\kappa_2 x^+_{s_1(\theta)})\left(\frac{s_0(\theta)-1}{N-1}b - c\right)}{x^+_{s_1(\theta)} - x_0^+},   (αf-2)

\frac{1}{\delta}\cdot\left[\frac{(1+\kappa_1 x_0^+)v_1 - (1+\kappa_1 x^+_{s_1(\theta)})v_0}{x^+_{s_1(\theta)} - x_0^+} - \frac{(1+\kappa_1 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1}c\right) - (1+\kappa_1 x^+_{s_1(\theta)})\left(\frac{s_0(\theta)-1}{N-1}b - c\right)}{x^+_{s_1(\theta)} - x_0^+}\right]
\le z_3 - \frac{(1+\kappa_1 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1}c\right) - (1+\kappa_1 x^+_{s_1(\theta)})\left(\frac{s_0(\theta)-1}{N-1}b - c\right)}{x^+_{s_1(\theta)} - x_0^+}.   (αf-3)

Since \frac{v_1 - v_0}{x^+_{s_1(\theta)} - x_0^+} > \varepsilon_0 - \varepsilon_1, the constraint (αf-1) is satisfied for all v_1 and v_0 if v_1 - v_0 \ge \frac{s_1(\theta)}{N-1}b + \frac{s_0(\theta)}{N-1}c. Hence, the constraint (αf-1) is equivalent to

\delta \ge \frac{v_1 - v_0 - \left(\frac{s_1(\theta)}{N-1}b + \frac{s_0(\theta)}{N-1}c\right)}{(\varepsilon_0 - \varepsilon_1)(x^+_{s_1(\theta)} - x_0^+) - \left(\frac{s_1(\theta)}{N-1}b + \frac{s_0(\theta)}{N-1}c\right)}, \quad \text{for } \theta \text{ s.t. } \frac{s_1(\theta)}{N-1}b + \frac{s_0(\theta)}{N-1}c \ge v_1 - v_0.   (69)

For (αf-2), we want to make the RHS have the same (minus) sign under any state \theta, which is true if

1 - \kappa_2 x_0^+ > 0, \quad 1 - \kappa_2 x^+_{s_1(\theta)} < 0, \quad \frac{1 - \kappa_2 x^+_{s_1(\theta)}}{1 - \kappa_2 x_0^+} \ge -(\kappa_2 - 1), \quad s_1(\theta) = 1, \ldots, N-1,   (70)

which leads to

x^+_{s_1(\theta)} > \frac{1}{\kappa_2}, \quad x_0^+ < \frac{1}{\kappa_2}, \quad x_0^+ < \frac{1 - x^+_{s_1(\theta)}}{\kappa_2 - 1}, \quad s_1(\theta) = 1, \ldots, N-1,   (71)

\Leftrightarrow \quad \frac{N-2}{N-1}x_1^+ + \frac{1}{N-1}\beta_1^+ > \frac{1}{\kappa_2}, \quad x_0^+ < \min\left\{\frac{1}{\kappa_2}, \frac{1-\beta_1^+}{\kappa_2-1}\right\}.   (72)

Since the RHS of (αf-2) is smaller than 0, we have

(αf-2) \Leftrightarrow \delta \le \frac{\frac{(1-\kappa_2 x_0^+)v_1 - (1-\kappa_2 x^+_{s_1(\theta)})v_0}{x^+_{s_1(\theta)} - x_0^+} - \frac{(1-\kappa_2 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1}c\right) - (1-\kappa_2 x^+_{s_1(\theta)})\left(\frac{s_0(\theta)-1}{N-1}b - c\right)}{x^+_{s_1(\theta)} - x_0^+}}{z_2 - \frac{(1-\kappa_2 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1}c\right) - (1-\kappa_2 x^+_{s_1(\theta)})\left(\frac{s_0(\theta)-1}{N-1}b - c\right)}{x^+_{s_1(\theta)} - x_0^+}}.   (73)


For (αf-3), since \frac{1+\kappa_1 x^+_{s_1(\theta)}}{1+\kappa_1 x_0^+} < 1 + \kappa_1, the RHS is always smaller than 0. Hence, we have

(αf-3) \Leftrightarrow \delta \le \frac{\frac{(1+\kappa_1 x_0^+)v_1 - (1+\kappa_1 x^+_{s_1(\theta)})v_0}{x^+_{s_1(\theta)} - x_0^+} - \frac{(1+\kappa_1 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1}c\right) - (1+\kappa_1 x^+_{s_1(\theta)})\left(\frac{s_0(\theta)-1}{N-1}b - c\right)}{x^+_{s_1(\theta)} - x_0^+}}{z_3 - \frac{(1+\kappa_1 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1}c\right) - (1+\kappa_1 x^+_{s_1(\theta)})\left(\frac{s_0(\theta)-1}{N-1}b - c\right)}{x^+_{s_1(\theta)} - x_0^+}}.   (74)

We briefly summarize the requirements on \delta that we have obtained so far. To make the continuation payoff profile lie in the self-generating set under the decomposition by \alpha^a, we have one upper bound on \delta resulting from (αa-2) and one lower bound on \delta resulting from (αa-3). To make the continuation payoff profile lie in the self-generating set under the decomposition by \alpha^f, we have two upper bounds on \delta resulting from (αf-2) and (αf-3), and one lower bound on \delta resulting from (αf-1). First, we want to eliminate the upper bounds, namely make the upper bounds larger than 1, such that \delta can be arbitrarily close to 1.
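As an illustration of how these bounds interact, the following sketch (hypothetical numbers, reusing the placeholder names from the earlier sketches; z_2 is a made-up value) evaluates the upper bound (67) and the lower bound (68) for one candidate payoff pair. With arbitrary parameter values the two bounds need not be compatible, which is precisely why the remainder of this subsection chooses z_2 and z_3 so that the upper bounds exceed 1:

    # Hypothetical parameters; z2 is a placeholder value for the self-generating set bound.
    b, c, N = 3.0, 1.0, 10
    x1p, x0p = 0.8, 0.3
    kappa1 = (b * (N - 2) / (N - 1)) / (b - c) - 1
    kappa2 = 1 + c / ((N - 1) * b)
    z2 = 1.8
    z3 = -kappa1 / kappa2 * z2
    v1, v0 = 1.9, 1.6

    # Upper bound on delta from (67) and lower bound on delta from (68).
    num_upper = ((1 - kappa2 * x0p) * v1 - (1 - kappa2 * x1p) * v0) / (x1p - x0p) - kappa2 * (b - c)
    delta_upper = num_upper / (z2 - kappa2 * (b - c))
    num_lower = ((1 + kappa1 * x0p) * v1 - (1 + kappa1 * x1p) * v0) / (x1p - x0p) + kappa1 * (b - c)
    delta_lower = num_lower / (z3 + kappa1 * (b - c))

    print("upper bound on delta from (67):", delta_upper)
    print("lower bound on delta from (68):", delta_lower)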

To eliminate the following upper bound resulting from (αa-2),

\delta \le \frac{\frac{(1-\kappa_2 x_0^+)v_1 - (1-\kappa_2 x_1^+)v_0}{x_1^+ - x_0^+} - \kappa_2\cdot(b-c)}{z_2 - \kappa_2\cdot(b-c)},   (75)

we need to have (since z_2 - \kappa_2\cdot(b-c) < 0)

\frac{(1-\kappa_2 x_0^+)v_1 - (1-\kappa_2 x_1^+)v_0}{x_1^+ - x_0^+} \le z_2, \quad \forall v_1, v_0.   (76)

The LHS of the above inequality is maximized when v_0 = \frac{z_2-z_3}{\kappa_1+\kappa_2} and v_1 = v_0 + \frac{\kappa_1 z_2+\kappa_2 z_3}{\kappa_1+\kappa_2}. Hence, the above inequality is satisfied if

\frac{(1-\kappa_2 x_0^+)\left(\frac{z_2-z_3}{\kappa_1+\kappa_2} + \frac{\kappa_1 z_2+\kappa_2 z_3}{\kappa_1+\kappa_2}\right) - (1-\kappa_2 x_1^+)\frac{z_2-z_3}{\kappa_1+\kappa_2}}{x_1^+ - x_0^+} \le z_2   (77)

\Leftrightarrow \frac{1 - x_1^+ - x_0^+(\kappa_2-1)}{x_1^+ - x_0^+}\cdot\frac{\kappa_1}{\kappa_1+\kappa_2}\,z_2 \le -\,\frac{1 - x_1^+ - x_0^+(\kappa_2-1)}{x_1^+ - x_0^+}\cdot\frac{\kappa_2}{\kappa_1+\kappa_2}\,z_3.   (78)

Since x_0^+ < \frac{1-\beta_1^+}{\kappa_2-1} < \frac{1-x_1^+}{\kappa_2-1}, we have

z_2 \le -\frac{\kappa_2}{\kappa_1}z_3.   (79)

To eliminate the following upper bound resulting from (αf-2),

\delta \le \frac{\frac{(1-\kappa_2 x_0^+)v_1 - (1-\kappa_2 x^+_{s_1(\theta)})v_0}{x^+_{s_1(\theta)} - x_0^+} - \frac{(1-\kappa_2 x_0^+)\left(b-\frac{s_1(\theta)-1}{N-1}c\right) - (1-\kappa_2 x^+_{s_1(\theta)})\left(\frac{s_0(\theta)-1}{N-1}b-c\right)}{x^+_{s_1(\theta)} - x_0^+}}{z_2 - \frac{(1-\kappa_2 x_0^+)\left(b-\frac{s_1(\theta)-1}{N-1}c\right) - (1-\kappa_2 x^+_{s_1(\theta)})\left(\frac{s_0(\theta)-1}{N-1}b-c\right)}{x^+_{s_1(\theta)} - x_0^+}},   (80)


we need to have (since z_2 - \frac{(1-\kappa_2 x_0^+)\left(b-\frac{s_1(\theta)-1}{N-1}c\right) - (1-\kappa_2 x^+_{s_1(\theta)})\left(\frac{s_0(\theta)-1}{N-1}b-c\right)}{x^+_{s_1(\theta)} - x_0^+} < 0)

\frac{(1-\kappa_2 x_0^+)v_1 - (1-\kappa_2 x^+_{s_1(\theta)})v_0}{x^+_{s_1(\theta)} - x_0^+} \le z_2, \quad \forall v_1, v_0.   (81)

Similarly, the LHS of the above inequality is maximized when v_0 = \frac{z_2-z_3}{\kappa_1+\kappa_2} and v_1 = v_0 + \frac{\kappa_1 z_2+\kappa_2 z_3}{\kappa_1+\kappa_2}. Hence, the above inequality is satisfied if

\frac{(1-\kappa_2 x_0^+)\left(\frac{z_2-z_3}{\kappa_1+\kappa_2} + \frac{\kappa_1 z_2+\kappa_2 z_3}{\kappa_1+\kappa_2}\right) - (1-\kappa_2 x^+_{s_1(\theta)})\frac{z_2-z_3}{\kappa_1+\kappa_2}}{x^+_{s_1(\theta)} - x_0^+} \le z_2   (82)

\Leftrightarrow \frac{1 - x^+_{s_1(\theta)} - x_0^+(\kappa_2-1)}{x^+_{s_1(\theta)} - x_0^+}\cdot\frac{\kappa_1}{\kappa_1+\kappa_2}\,z_2 \le -\,\frac{1 - x^+_{s_1(\theta)} - x_0^+(\kappa_2-1)}{x^+_{s_1(\theta)} - x_0^+}\cdot\frac{\kappa_2}{\kappa_1+\kappa_2}\,z_3.   (83)

Since x_0^+ < \frac{1-\beta_1^+}{\kappa_2-1} < \frac{1-x^+_{s_1(\theta)}}{\kappa_2-1}, we have

z_2 \le -\frac{\kappa_2}{\kappa_1}z_3.   (84)

To eliminate the following upper bound resulting from (αf-3),

\delta \le \frac{\frac{(1+\kappa_1 x_0^+)v_1 - (1+\kappa_1 x^+_{s_1(\theta)})v_0}{x^+_{s_1(\theta)} - x_0^+} - \frac{(1+\kappa_1 x_0^+)\left(b-\frac{s_1(\theta)-1}{N-1}c\right) - (1+\kappa_1 x^+_{s_1(\theta)})\left(\frac{s_0(\theta)-1}{N-1}b-c\right)}{x^+_{s_1(\theta)} - x_0^+}}{z_3 - \frac{(1+\kappa_1 x_0^+)\left(b-\frac{s_1(\theta)-1}{N-1}c\right) - (1+\kappa_1 x^+_{s_1(\theta)})\left(\frac{s_0(\theta)-1}{N-1}b-c\right)}{x^+_{s_1(\theta)} - x_0^+}},   (85)

we need to have (since z_3 - \frac{(1+\kappa_1 x_0^+)\left(b-\frac{s_1(\theta)-1}{N-1}c\right) - (1+\kappa_1 x^+_{s_1(\theta)})\left(\frac{s_0(\theta)-1}{N-1}b-c\right)}{x^+_{s_1(\theta)} - x_0^+} < 0)

\frac{(1+\kappa_1 x_0^+)v_1 - (1+\kappa_1 x^+_{s_1(\theta)})v_0}{x^+_{s_1(\theta)} - x_0^+} \le z_3, \quad \forall v_1, v_0.   (86)

Again, the LHS of the above inequality is maximized when v_0 = \frac{z_2-z_3}{\kappa_1+\kappa_2} and v_1 = v_0 + \frac{\kappa_1 z_2+\kappa_2 z_3}{\kappa_1+\kappa_2}. Hence, the above inequality is satisfied if

\frac{(1+\kappa_1 x_0^+)\left(\frac{z_2-z_3}{\kappa_1+\kappa_2} + \frac{\kappa_1 z_2+\kappa_2 z_3}{\kappa_1+\kappa_2}\right) - (1+\kappa_1 x^+_{s_1(\theta)})\frac{z_2-z_3}{\kappa_1+\kappa_2}}{x^+_{s_1(\theta)} - x_0^+} \le z_3   (87)

\Leftrightarrow \frac{1 - x^+_{s_1(\theta)} + x_0^+(\kappa_1+1)}{x^+_{s_1(\theta)} - x_0^+}\cdot\frac{\kappa_1}{\kappa_1+\kappa_2}\,z_2 \le -\,\frac{1 - x^+_{s_1(\theta)} + x_0^+(\kappa_1+1)}{x^+_{s_1(\theta)} - x_0^+}\cdot\frac{\kappa_2}{\kappa_1+\kappa_2}\,z_3.   (88)

Since 1 - x^+_{s_1(\theta)} + x_0^+(\kappa_1+1) > 0, we have

z_2 \le -\frac{\kappa_2}{\kappa_1}z_3.   (89)


In summary, to eliminate the upper bounds on \delta, we only need to have z_2 \le -\frac{\kappa_2}{\kappa_1}z_3, which is satisfied since we define z_3 \triangleq -\frac{\kappa_1}{\kappa_2}z_2.
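As a quick sanity check of this choice (hypothetical values of b, c and N, reusing the placeholder names from the earlier sketches):

    # With z3 defined as -kappa1/kappa2 * z2, the requirement z2 <= -(kappa2/kappa1) * z3
    # holds with equality, so the upper bounds on delta are eliminated.
    b, c, N = 3.0, 1.0, 10
    kappa1 = (b * (N - 2) / (N - 1)) / (b - c) - 1
    kappa2 = 1 + c / ((N - 1) * b)
    z2 = 1.8
    z3 = -kappa1 / kappa2 * z2
    assert abs(z2 - (-kappa2 / kappa1 * z3)) < 1e-12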

Now we derive the analytical lower bound on \delta based on the lower bounds resulting from (αa-3) and (αf-1):

(αa-3) \Leftrightarrow \delta \ge \frac{\frac{(1+\kappa_1 x_0^+)v_1 - (1+\kappa_1 x_1^+)v_0}{x_1^+ - x_0^+} + \kappa_1\cdot(b-c)}{z_3 + \kappa_1\cdot(b-c)},   (90)

and

\delta \ge \frac{v_1 - v_0 - \left(\frac{s_1(\theta)}{N-1}b + \frac{s_0(\theta)}{N-1}c\right)}{(\varepsilon_0 - \varepsilon_1)(x^+_{s_1(\theta)} - x_0^+) - \left(\frac{s_1(\theta)}{N-1}b + \frac{s_0(\theta)}{N-1}c\right)}, \quad \text{for } \theta \text{ s.t. } \frac{s_1(\theta)}{N-1}b + \frac{s_0(\theta)}{N-1}c \ge v_1 - v_0.   (91)

We define an intermediate lower bound based on the latter inequality along with the inequalities resulting from the incentive compatibility constraints:

\delta' = \max\left\{ \max_{s_1\in\{1,\ldots,N-1\}:\ \frac{s_1}{N-1}b + \frac{N-s_1}{N-1}c > \varepsilon_0-\varepsilon_1} \frac{\varepsilon_0-\varepsilon_1 - \left(\frac{s_1}{N-1}b + \frac{N-s_1}{N-1}c\right)}{(\varepsilon_0-\varepsilon_1)\left(\frac{N-s_1}{N-1}\beta_1^+ + \frac{s_1-1}{N-1}x_1^+\right) - \left(\frac{s_1}{N-1}b + \frac{N-s_1}{N-1}c\right)},\ \max_{\theta\in\{0,1\}} \frac{c}{c + (1-2\varepsilon)(\beta_\theta^+ - (1-\beta_\theta^-))(\varepsilon_0-\varepsilon_1)} \right\}.   (92)
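A small numerical sketch of (92), again with hypothetical parameter values and our own placeholder names:

    # Hypothetical parameters for illustrating the intermediate lower bound delta' in (92).
    b, c, N, eps = 3.0, 1.0, 10, 0.1
    eps0, eps1 = 0.4, 0.1
    beta = {1: (0.9, 0.8), 0: (0.85, 0.75)}   # theta -> (beta_theta^+, beta_theta^-)
    beta1_plus = beta[1][0]
    x1p = 0.8                                  # x_1^+

    candidates = []
    for s1 in range(1, N):
        gain = s1 / (N - 1) * b + (N - s1) / (N - 1) * c
        if gain > eps0 - eps1:                 # only these s1 enter the first max in (92)
            denom = (eps0 - eps1) * ((N - s1) / (N - 1) * beta1_plus
                                     + (s1 - 1) / (N - 1) * x1p) - gain
            candidates.append(((eps0 - eps1) - gain) / denom)

    for theta in (0, 1):
        bp, bm = beta[theta]
        candidates.append(c / (c + (1 - 2 * eps) * (bp - (1 - bm)) * (eps0 - eps1)))

    delta_prime = max(candidates)
    print("delta' =", delta_prime)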

Then the lower bound can be written as \delta = \max\{\delta', \delta''\}, where \delta'' is the lower bound that we will derive for the case when the users have the same reputation. If the payoffs v_1 and v_0 satisfy the constraint resulting from (αa-3), namely satisfy

\frac{(1+\kappa_1 x_0^+)v_1 - (1+\kappa_1 x_1^+)v_0}{x_1^+ - x_0^+} \le \delta z_3 - (1-\delta)\kappa_1\cdot(b-c),   (93)

then we use \alpha^a to decompose v_1 and v_0. Otherwise, we use \alpha^f to decompose v_1 and v_0.

C. When the users have the same reputation

Now we derive the conditions under which any payoff profile in W_{\mathbf{1}_N} and W_{\mathbf{0}_N} can be decomposed.

If all the users have reputation 1, namely \theta = \mathbf{1}_N, to decompose v \in W_{\mathbf{1}_N}, we need to find a recommended action \alpha_0 and a simple continuation payoff function \gamma such that for all i \in N and for all \alpha_i \in A,

v_i = (1-\delta)u_i(\theta, \alpha_0, \alpha_0 \cdot \mathbf{1}_N) + \delta\sum_{\theta'} \gamma_i(\theta')q(\theta'|\theta, \alpha_0, \alpha_0 \cdot \mathbf{1}_N)   (94)
\ge (1-\delta)u_i(\theta, \alpha_0, \alpha_i, \alpha_0 \cdot \mathbf{1}_{N-1}) + \delta\sum_{\theta'} \gamma_i(\theta')q(\theta'|\theta, \alpha_0, \alpha_i, \alpha_0 \cdot \mathbf{1}_{N-1}).


When all the users have the same reputation, the altruistic action \alpha^a is equivalent to the fair action \alpha^f. Hence, we use the altruistic action and the selfish action to decompose the payoff profiles.

If we use the altruistic action \alpha^a to decompose a payoff profile v, we have

v_1 = (1-\delta)(b-c) + \delta\left(x_1^+\gamma_1 + (1-x_1^+)\gamma_0\right),   (95)

and the incentive compatibility constraint

(1-2\varepsilon)[\beta_1^+ - (1-\beta_1^-)](\gamma_1-\gamma_0) \ge \frac{1-\delta}{\delta}c.   (96)

Setting \gamma_1 = \gamma_0 + \frac{1-\delta}{\delta}\cdot\frac{c}{(1-2\varepsilon)[\beta_1^+-(1-\beta_1^-)]} and noticing that \gamma_0 \in \left[\frac{(1+\kappa_1)(\varepsilon_0-\varepsilon_1)-z_3}{\kappa_1}, \frac{\kappa_1 z_2+(\kappa_2-1)z_3}{\kappa_1+\kappa_2}\right], we get a lower bound on v_1 that can be decomposed by \alpha^a:

v_1 = (1-\delta)(b-c) + \delta\left[\gamma_0 + x_1^+\frac{1-\delta}{\delta}\cdot\frac{c}{(1-2\varepsilon)[\beta_1^+-(1-\beta_1^-)]}\right]   (97)
\ge (1-\delta)\left[b - c + \frac{c\,x_1^+}{(1-2\varepsilon)[\beta_1^+-(1-\beta_1^-)]}\right] + \delta\cdot\frac{(1+\kappa_1)(\varepsilon_0-\varepsilon_1)-z_3}{\kappa_1}.   (98)

If we use the selfish action \alpha^s to decompose a payoff profile v, we have

v_1 = \delta\left(x_1^+\gamma_1 + (1-x_1^+)\gamma_0\right).   (99)

Since the selfish action is a Nash equilibrium of the stage game, the incentive compatibility constraint is satisfied as long as we set \gamma_1 = \gamma_0. Hence, we have v_1 = \delta\gamma_0. Again, noticing that \gamma_0 \in \left[\frac{(1+\kappa_1)(\varepsilon_0-\varepsilon_1)-z_3}{\kappa_1}, \frac{\kappa_1 z_2+(\kappa_2-1)z_3}{\kappa_1+\kappa_2}\right], we get an upper bound on v_1 that can be decomposed by \alpha^s:

v_1 = \delta\gamma_0 \le \delta\,\frac{\kappa_1 z_2 + (\kappa_2-1)z_3}{\kappa_1+\kappa_2}.   (100)

In order to decompose any payoff profile v \in W_{\mathbf{1}_N}, the lower bound on v_1 that can be decomposed by \alpha^a must be smaller than the upper bound on v_1 that can be decomposed by \alpha^s, which leads to

(1-\delta)\left(b - c + \frac{c\,x_1^+}{(1-2\varepsilon)[\beta_1^+-(1-\beta_1^-)]}\right) + \delta\,\frac{(1+\kappa_1)(\varepsilon_0-\varepsilon_1)-z_3}{\kappa_1} \le \delta\,\frac{\kappa_1 z_2+(\kappa_2-1)z_3}{\kappa_1+\kappa_2}

\Rightarrow \delta \ge \frac{b - c + \frac{c\,x_1^+}{(1-2\varepsilon)[\beta_1^+-(1-\beta_1^-)]}}{b - c + \frac{c\,x_1^+}{(1-2\varepsilon)[\beta_1^+-(1-\beta_1^-)]} + \frac{\kappa_1 z_2+(\kappa_2-1)z_3}{\kappa_1+\kappa_2} - \frac{(1+\kappa_1)(\varepsilon_0-\varepsilon_1)-z_3}{\kappa_1}}.   (101)

Finally, following the same procedure, we derive the lower bound on \delta when all the users have reputation 0, namely \theta = \mathbf{0}_N. Similarly, in this case, the altruistic action \alpha^a is equivalent to the fair action \alpha^f. Hence, we use the altruistic action and the selfish action to decompose the payoff profiles.

If we use the altruistic action \alpha^a to decompose a payoff profile v, we have

v_0 = (1-\delta)(b-c) + \delta\left(x_0^+\gamma_1 + (1-x_0^+)\gamma_0\right),   (102)

and the incentive compatibility constraint

(1-2\varepsilon)[\beta_0^+ - (1-\beta_0^-)](\gamma_1-\gamma_0) \ge \frac{1-\delta}{\delta}c.   (103)

If we use the selfish action \alpha^s to decompose a payoff profile v, we have

v_0 = \delta\left(x_0^+\gamma_1 + (1-x_0^+)\gamma_0\right).   (104)

Note that when \theta = \mathbf{0}_N, if we substitute \beta_0^+, \beta_0^-, x_0^+ for \beta_1^+, \beta_1^-, x_1^+, respectively, the decomposability constraints become the same as those when \theta = \mathbf{1}_N. Hence, we derive a similar lower bound on \delta:

\delta \ge \frac{b - c + \frac{c\,x_0^+}{(1-2\varepsilon)[\beta_0^+-(1-\beta_0^-)]}}{b - c + \frac{c\,x_0^+}{(1-2\varepsilon)[\beta_0^+-(1-\beta_0^-)]} + \frac{\kappa_1 z_2+(\kappa_2-1)z_3}{\kappa_1+\kappa_2} - \frac{(1+\kappa_1)(\varepsilon_0-\varepsilon_1)-z_3}{\kappa_1}}.   (105)

Finally, we can obtain the lower bound on \delta when the users have the same reputation as

\delta'' = \max_{\theta\in\{0,1\}} \frac{b - c + \frac{c\,x_\theta^+}{(1-2\varepsilon)[\beta_\theta^+-(1-\beta_\theta^-)]}}{b - c + \frac{c\,x_\theta^+}{(1-2\varepsilon)[\beta_\theta^+-(1-\beta_\theta^-)]} + \frac{\kappa_1 z_2+(\kappa_2-1)z_3}{\kappa_1+\kappa_2} - \frac{(1+\kappa_1)(\varepsilon_0-\varepsilon_1)-z_3}{\kappa_1}}.   (106)

Together with the lower bound \delta' derived for the case when the users have different reputations, we can get the lower bound \delta specified in Condition 3 of Theorem 1.
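A small numerical sketch of (106), again with hypothetical parameter values (the same placeholder names as in the earlier sketches); a value larger than 1 simply means that these hypothetical parameters would not satisfy Condition 3:

    # Hypothetical parameters for illustrating the lower bound delta'' in (106).
    b, c, N, eps = 3.0, 1.0, 10, 0.1
    eps0, eps1 = 0.4, 0.1
    kappa1 = (b * (N - 2) / (N - 1)) / (b - c) - 1
    kappa2 = 1 + c / ((N - 1) * b)
    z2 = 1.8
    z3 = -kappa1 / kappa2 * z2
    beta = {1: (0.9, 0.8), 0: (0.85, 0.75)}   # theta -> (beta_theta^+, beta_theta^-)
    xplus = {1: 0.8, 0: 0.3}                  # theta -> x_theta^+

    def bound(theta):
        bp, bm = beta[theta]
        a = b - c + c * xplus[theta] / ((1 - 2 * eps) * (bp - (1 - bm)))
        upper = (kappa1 * z2 + (kappa2 - 1) * z3) / (kappa1 + kappa2)
        lower = ((1 + kappa1) * (eps0 - eps1) - z3) / kappa1
        return a / (a + upper - lower)

    delta_double_prime = max(bound(0), bound(1))
    print("delta'' =", delta_double_prime)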

APPENDIX B

PROOF OF PROPOSITION 1

We prove that for any self-generating set (W_\theta)_{\theta\in\Theta^N}, the maximum payoff in (W_\theta)_{\theta\in\Theta^N}, namely \max_{\theta\in\Theta^N}\max_{v\in W_\theta}\max_{i\in N} v_i, is bounded away from the social optimum b-c, regardless of the discount factor. In this way, we can prove that any equilibrium payoff is bounded away from the social optimum. In addition, we analytically quantify the efficiency loss, which is independent of the discount factor.

Since the strategies are restricted to the subset of actions A^{as}, in each period, all the users will receive the same stage-game payoff, either (b-c) or 0, regardless of the matching rule and the reputation profile. Hence, the expected discounted average payoff for each user is the same. More precisely, at any given history h^t = (\theta^0, \ldots, \theta^t), we have

U_i(\theta^t, \pi_0|_{h^t}, \pi|_{h^t} \cdot \mathbf{1}_N) = U_j(\theta^t, \pi_0|_{h^t}, \pi|_{h^t} \cdot \mathbf{1}_N), \quad \forall i, j \in N,   (107)

for any (\pi_0, \pi \cdot \mathbf{1}_N) \in \Pi_f(A^{as}) \times \Pi_f^N(A^{as}). As a result, when we restrict to the action set A^{as}, the self-generating set (W_\theta)_{\theta\in\Theta^N} satisfies, for any \theta and any v \in W_\theta,

v_i = v_j, \quad \forall i, j \in N.   (108)

Given any self-generating set (W_\theta)_{\theta\in\Theta^N}, define the maximum payoff \bar{v} as

\bar{v} \triangleq \max_{\theta\in\Theta^N} \max_{v\in W_\theta} \max_{i\in N} v_i.   (109)

Now we derive the upper bound on \bar{v} by looking at the decomposability constraints.

To decompose the payoff profile \bar{v}\cdot\mathbf{1}_N, we must find a recommended action \alpha_0 \in A^{as}, an action profile \alpha\cdot\mathbf{1}_N with \alpha \in A^{as}, and a continuation payoff function \gamma : \Theta^N \to \cup_{\theta'\in\Theta^N} W_{\theta'} with \gamma(\theta') \in W_{\theta'}, such that for all i \in N and for all \alpha_i \in A,

\bar{v} = (1-\delta)u_i(\theta, \alpha_0, \alpha\cdot\mathbf{1}_N) + \delta\sum_{\theta'} \gamma_i(\theta')q(\theta'|\theta, \alpha_0, \alpha\cdot\mathbf{1}_N)   (110)
\ge (1-\delta)u_i(\theta, \alpha_0, \alpha_i, \alpha\cdot\mathbf{1}_{N-1}) + \delta\sum_{\theta'} \gamma_i(\theta')q(\theta'|\theta, \alpha_0, \alpha_i, \alpha\cdot\mathbf{1}_{N-1}).

Note that we do not require the users’ action α to be the same as the recommended action α0,

and that we also do not require the continuation payoff function γ to be a simple continuation

payoff function.

First, the payoff profile \bar{v}\cdot\mathbf{1}_N cannot be decomposed by a recommended action \alpha_0 and the selfish action \alpha^s. Otherwise, since \gamma(\theta') \in W_{\theta'}, we would have

\bar{v} = (1-\delta)\cdot 0 + \delta\sum_{\theta'} \gamma_i(\theta')q(\theta'|\theta, \alpha_0, \alpha^s\cdot\mathbf{1}_N) \le \delta\sum_{\theta'} \bar{v}\cdot q(\theta'|\theta, \alpha_0, \alpha^s\cdot\mathbf{1}_N) = \delta\cdot\bar{v} < \bar{v},

which is a contradiction.

Since we must use a recommended action \alpha_0 and the altruistic action \alpha^a to decompose \bar{v}\cdot\mathbf{1}_N, we can rewrite the decomposability constraint as

\bar{v} = (1-\delta)(b-c) + \delta\sum_{\theta'} \gamma_i(\theta')q(\theta'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N)   (111)
\ge (1-\delta)u_i(\theta, \alpha_0, \alpha_i, \alpha^a\cdot\mathbf{1}_{N-1}) + \delta\sum_{\theta'} \gamma_i(\theta')q(\theta'|\theta, \alpha_0, \alpha_i, \alpha^a\cdot\mathbf{1}_{N-1}).


Since the continuation payoffs under different reputation profiles \theta, \theta' that have the same reputation distribution s(\theta) = s(\theta') are the same, namely \gamma(\theta) = \gamma(\theta'), the continuation payoff depends only on the reputation distribution. For notational simplicity, with some abuse of notation, we write \gamma(s) as the continuation payoff when the reputation distribution is s, write q(s'|\theta, \alpha_0, \alpha_i, \alpha^a\cdot\mathbf{1}_{N-1}) as the probability that the next state has reputation distribution s', and write u_i(s, \alpha^a, \alpha_i, \alpha^a\cdot\mathbf{1}_{N-1}) as the stage-game payoff when the current reputation distribution is s. Then the decomposability constraint can be rewritten as

\bar{v} = (1-\delta)(b-c) + \delta\sum_{s'} \gamma_i(s')q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N)   (112)
\ge (1-\delta)u_i(s, \alpha_0, \alpha_i, \alpha^a\cdot\mathbf{1}_{N-1}) + \delta\sum_{s'} \gamma_i(s')q(s'|\theta, \alpha_0, \alpha_i, \alpha^a\cdot\mathbf{1}_{N-1}).

Now we focus on a subclass of continuation payoff functions, and derive the maximum payoff \bar{v} achievable under this subclass of continuation payoff functions. Later, we will prove that we cannot increase \bar{v} by choosing other continuation payoff functions. Specifically, we focus on a subclass of continuation payoff functions that satisfy

\gamma_i(s) = x_A, \quad \forall i \in N, \ \forall s \in S_A \subset S,   (113)
\gamma_i(s) = x_B, \quad \forall i \in N, \ \forall s \in S_B \subset S,   (114)

where S_A and S_B are subsets of the set of reputation distributions S that have no intersection, namely S_A \cap S_B = \emptyset. In other words, we assign the two continuation payoff values to two subsets of reputation distributions, respectively. Without loss of generality, we assume x_A \ge x_B.

Now we derive the incentive compatibility constraints. There are three actions to deviate to: the action \alpha^0 in which the user does not serve users with reputation 0, the action \alpha^1 in which the user does not serve users with reputation 1, and the action \alpha^{01} in which the user does not serve anyone. The corresponding incentive compatibility constraints for a user i with reputation \theta_i = 1 are

\sum_{s'\in S_A} \left[q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N) - q(s'|\theta, \alpha_0, \alpha_i = \alpha^0, \alpha^a\cdot\mathbf{1}_{N-1})\right](x_A - x_B) \ge \frac{1-\delta}{\delta}\,\frac{s_0}{N-1}\,c,
\sum_{s'\in S_A} \left[q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N) - q(s'|\theta, \alpha_0, \alpha_i = \alpha^1, \alpha^a\cdot\mathbf{1}_{N-1})\right](x_A - x_B) \ge \frac{1-\delta}{\delta}\,\frac{s_1-1}{N-1}\,c,
\sum_{s'\in S_A} \left[q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N) - q(s'|\theta, \alpha_0, \alpha_i = \alpha^{01}, \alpha^a\cdot\mathbf{1}_{N-1})\right](x_A - x_B) \ge \frac{1-\delta}{\delta}\,c.   (115)


Similarly, the corresponding incentive compatibility constraints for a user j with reputation \theta_j = 0 are

\sum_{s'\in S_A} \left[q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N) - q(s'|\theta, \alpha_0, \alpha_j = \alpha^0, \alpha^a\cdot\mathbf{1}_{N-1})\right](x_A - x_B) \ge \frac{1-\delta}{\delta}\,\frac{s_0-1}{N-1}\,c,
\sum_{s'\in S_A} \left[q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N) - q(s'|\theta, \alpha_0, \alpha_j = \alpha^1, \alpha^a\cdot\mathbf{1}_{N-1})\right](x_A - x_B) \ge \frac{1-\delta}{\delta}\,\frac{s_1}{N-1}\,c,
\sum_{s'\in S_A} \left[q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N) - q(s'|\theta, \alpha_0, \alpha_j = \alpha^{01}, \alpha^a\cdot\mathbf{1}_{N-1})\right](x_A - x_B) \ge \frac{1-\delta}{\delta}\,c.   (116)

We can summarize the above incentive compatibility constraints as

x_A - x_B \ge \frac{1-\delta}{\delta}\,c\cdot\rho(\theta, \alpha_0, S_B),   (117)

where

\rho(\theta, \alpha_0, S_B) \triangleq \max_{i\in N} \max\Bigg\{ \frac{\frac{s_{\theta_i}-1}{N-1}}{\sum_{s'\in S\setminus S_B}\left[q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N) - q(s'|\theta, \alpha_0, \alpha_i = \alpha^{\theta_i}, \alpha^a\cdot\mathbf{1}_{N-1})\right]},   (118)
\frac{\frac{s_{1-\theta_i}}{N-1}}{\sum_{s'\in S\setminus S_B}\left[q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N) - q(s'|\theta, \alpha_0, \alpha_i = \alpha^{1-\theta_i}, \alpha^a\cdot\mathbf{1}_{N-1})\right]},   (119)
\frac{1}{\sum_{s'\in S\setminus S_B}\left[q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N) - q(s'|\theta, \alpha_0, \alpha_i = \alpha^{01}, \alpha^a\cdot\mathbf{1}_{N-1})\right]}\Bigg\}.   (120)

Since the maximum payoff \bar{v} satisfies

\bar{v} = (1-\delta)(b-c) + \delta\left[x_A \sum_{s'\in S\setminus S_B} q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N) + x_B \sum_{s'\in S_B} q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N)\right],   (121)

to maximize \bar{v}, we choose x_B = x_A - \frac{1-\delta}{\delta}c\cdot\rho(\theta, \alpha_0, S_B). Since x_A \le \bar{v}, we have

\bar{v} = (1-\delta)(b-c) + \delta\left[x_A - \frac{1-\delta}{\delta}c\cdot\rho(\theta, \alpha_0, S_B)\sum_{s'\in S_B} q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N)\right]   (122)
\le (1-\delta)(b-c) + \delta\left[\bar{v} - \frac{1-\delta}{\delta}c\cdot\rho(\theta, \alpha_0, S_B)\sum_{s'\in S_B} q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N)\right],   (123)

which leads to

\bar{v} \le b - c - c\cdot\rho(\theta, \alpha_0, S_B)\sum_{s'\in S_B} q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N).   (124)
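To illustrate how \rho and the resulting efficiency loss in (124) can be evaluated, here is a toy sketch with made-up transition probabilities over a small set of reputation distributions (every number and label below is hypothetical and only meant to show the computation):

    # Toy example: three possible reputation distributions, labelled 'sA1', 'sA2', 'sB'.
    # q_rec: next-distribution probabilities when everyone follows the altruistic action.
    # q_dev: the same probabilities under the three single-user deviations.
    q_rec = {'sA1': 0.6, 'sA2': 0.3, 'sB': 0.1}
    q_dev = {
        'dev0':  {'sA1': 0.4, 'sA2': 0.3, 'sB': 0.3},   # deviation "do not serve reputation-0 users"
        'dev1':  {'sA1': 0.5, 'sA2': 0.2, 'sB': 0.3},   # deviation "do not serve reputation-1 users"
        'dev01': {'sA1': 0.3, 'sA2': 0.2, 'sB': 0.5},   # deviation "serve no one"
    }
    gain = {'dev0': 0.4, 'dev1': 0.5, 'dev01': 1.0}     # deviation gains (fractions of c), hypothetical
    S_B = {'sB'}
    b, c = 3.0, 1.0

    def mass_outside(q, bad):
        """Probability mass outside S_B."""
        return sum(p for s, p in q.items() if s not in bad)

    # rho as in (118)-(120): worst ratio of deviation gain to the drop in "good" transition mass.
    rho = max(gain[d] / (mass_outside(q_rec, S_B) - mass_outside(q_dev[d], S_B)) for d in q_dev)

    # Efficiency-loss bound (124): the achievable payoff is at most b - c minus this term.
    loss = c * rho * sum(p for s, p in q_rec.items() if s in S_B)
    print("rho =", rho, " payoff bound =", b - c - loss)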

Hence, the maximum payoff \bar{v} satisfies

\bar{v} \le b - c - c\cdot\min_{S_B\subset S}\left\{\rho(\theta, \alpha_0, S_B)\sum_{s'\in S_B} q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N)\right\},   (125)

where S_B satisfies, for all i \in N,

\sum_{s'\in S\setminus S_B} q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N) \ge \sum_{s'\in S\setminus S_B} q(s'|\theta, \alpha_0, \alpha_i = \alpha^0, \alpha^a\cdot\mathbf{1}_{N-1}),
\sum_{s'\in S\setminus S_B} q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N) \ge \sum_{s'\in S\setminus S_B} q(s'|\theta, \alpha_0, \alpha_i = \alpha^1, \alpha^a\cdot\mathbf{1}_{N-1}),
\sum_{s'\in S\setminus S_B} q(s'|\theta, \alpha_0, \alpha^a\cdot\mathbf{1}_N) \ge \sum_{s'\in S\setminus S_B} q(s'|\theta, \alpha_0, \alpha_i = \alpha^{01}, \alpha^a\cdot\mathbf{1}_{N-1}).   (126)

Following the same logic as in the proof of Proposition 6 in [8], we can prove that we cannot achieve a higher maximum payoff by using other continuation payoff functions.

REFERENCES

[1] Y. Zhang and M. van der Schaar, “Reputation-based incentive protocols in crowdsourcing applications,” in Proceedings of the IEEE INFOCOM, ser. INFOCOM ’12, 2012.
[2] C.-J. Ho, Y. Zhang, and M. van der Schaar, “Towards social norm design for crowdsourcing markets,” in Proceedings of the AAAI Human Computation Workshop, ser. HCOMP ’12, 2012.
[3] Y. Zhang and M. van der Schaar, “Strategic learning and robust protocol design for online communities with selfish users,” IEEE Transactions on Automatic Control, 2011.
[4] M. Kandori, “Social norms and community enforcement,” Review of Economic Studies, vol. 59, no. 1, pp. 63–80, 1992.
[5] A. Blanc, Y.-K. Liu, and A. Vahdat, “Designing incentives for peer-to-peer routing,” in Proceedings of the IEEE INFOCOM, ser. INFOCOM ’05, 2005.
[6] M. Feldman, C. Papadimitriou, J. Chuang, and I. Stoica, “Free-riding and whitewashing in peer-to-peer systems,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 5, pp. 1010–1019, 2006.
[7] B. Q. Zhao, J. C. S. Lui, and D.-M. Chiu, “Analysis of adaptive incentive protocols for P2P networks,” in Proceedings of the IEEE INFOCOM, ser. INFOCOM ’09, 2009.
[8] C. Dellarocas, “Reputation mechanism design in online trading environments with pure moral hazard,” Information Systems Research, vol. 16, no. 2, pp. 209–230, 2005.
[9] D. Fudenberg, D. K. Levine, and E. Maskin, “The folk theorem with imperfect public information,” Econometrica, vol. 62, no. 5, pp. 997–1039, 1994.
[10] J. Hörner, T. Sugaya, S. Takahashi, and N. Vieille, “Recursive methods in discounted stochastic games: An algorithm for δ → 1 and a folk theorem,” Econometrica, vol. 79, no. 4, pp. 1277–1318, 2011.
[11] D. Abreu, D. Pearce, and E. Stacchetti, “Toward a theory of discounted repeated games with imperfect monitoring,” Econometrica, vol. 58, no. 5, pp. 1041–1063, 1990.
[12] G. Mailath and L. Samuelson, Repeated Games and Reputations: Long-Run Relationships. Oxford, U.K.: Oxford University Press, 2006.
