Socially-Optimal Design of Crowdsourcing
Platforms with Reputation Update Errors
Yuanzhang Xiao, Yu Zhang, and Mihaela van der Schaar
Abstract
Crowdsourcing systems (e.g. Yahoo! Answers and Amazon Mechanical Turk) provide a platform
for requesters, who have tasks to solve, to ask for help from workers. Vital to the proliferation of
crowdsourcing systems is incentivizing the workers to exert high effort to provide high-quality services.
Preferably, this incentive should be provided without monetary payment, which would add extra infrastructure to the system. Reputation mechanisms have been shown to work effectively as incentive
schemes in crowdsourcing systems. A reputation agency updates the reputations of the workers based
on the requesters’ reports on the quality of the workers’ services. A low-reputation worker is less likely
to be served when it requests help, which gives the workers an incentive to obtain a high
reputation by exerting high effort. However, reputation update errors are inevitable, because of either
system errors such as loss of reports, or inaccurate reports, resulting from the difficulty in accurately
assessing the quality of a worker’s service. The reputation update error prevents existing reputation
mechanisms from achieving the social optimum. In this paper, we propose a simple binary reputation
mechanism, which has only two reputation labels (“good” and “bad”). To the best of our knowledge, the proposed reputation mechanism is the first proven to achieve the social optimum even in the presence of reputation update errors. We give the conditions under which the designed binary
reputation mechanism can achieve the social optimum, and discuss how to design the reputation rules,
such that the conditions for achieving social optimum can be fulfilled.
I. INTRODUCTION
Crowdsourcing systems (e.g. Yahoo! Answers and Amazon Mechanical Turk) provide a plat-
form for a user to elicit collective efforts from the other users in order to solve a task. In a
typical crowdsourcing system, a user can either post tasks and request help as a requester, or
solve the tasks posted by others as a worker. In some crowdsourcing systems [1][2], the workers
are rewarded with monetary payments from the requesters. The payment is usually made when the worker is assigned the task, rather than after the worker completes it. This creates an incentive problem: the worker may exert low effort on the task since it has already been paid. In other systems [3], the workers are rewarded by the benefit obtained from other users’ services. In such systems without monetary payment, it is even more difficult to give the workers an incentive to exert high effort. In summary, incentive provision for the workers is vital to the effectiveness of crowdsourcing systems.
One effective incentive mechanism is the reputation mechanism [1]–[3]. In a reputation mechanism, a reputation agency assigns reputations to all the users based on their behaviors. If a user exerted high effort in the past as a worker, it receives a high reputation as a summary of its good past behavior. Similarly, a user’s low reputation indicates that it used to exert low effort as a worker. Since a low-reputation user may get its task served with low effort, the users have an incentive to obtain a high reputation by exerting high effort on the others’ tasks. Built on this intuition, the existing reputation mechanisms work well, except when there
are reputation update errors. The reputation update error comes from two sources. First, a requester cannot perfectly assess the service quality of a worker, and hence may report the service quality as low to the reputation agency even when the worker actually exerted high effort. Second, the reputation agency may miss the requester’s report on the service quality, or update the reputation incorrectly by mistake. In the presence of reputation update errors, the existing reputation mechanisms [1]–[3] suffer a performance loss that increases with the update error probability.
This performance loss in [1]–[3] partially comes from the restriction of attention on stationary
Markov strategies. As we will see later, once we remove this restriction, the social optimum can
be achieved.
In this paper, we show that the social optimum can be achieved in the presence of reputation
update errors, if we do not restrict our attention to stationary Markov strategies as in [1]–[3]. In
other words, we allow the users to take different actions given the same current state at different
time instants, which increases the strategy space to be considered significantly and potentially
complicates the optimal strategy. Nevertheless, we rigorously prove that under certain conditions,
the social optimum can be achieved by a class of simple reputation mechanisms, namely binary
reputation mechanisms that assign binary reputation labels to the users. We derive the conditions
under which the social optimum can be achieved, which provide guidelines for the design of
reputation update rules. In addition, we further simplify the optimal strategy by proving that
October 19, 2012 DRAFT
3
strategies of a simple structure can be optimal, and propose an algorithm to construct the optimal
strategy.
The rest of the paper is organized as follows. In Section II, we discuss related works. In Section III, we describe the model of crowdsourcing systems. In Section IV, we formulate the interaction as a stochastic game and define the equilibrium. We then design the optimal reputation mechanisms in Section V. Simulation results in Section VI demonstrate the performance improvement of the proposed reputation mechanism. Finally, Section VII concludes the paper.
II. RELATED WORKS
The idea of social norms and reputation mechanisms was originally proposed in [4]. Assuming that there is no reputation update error, [4] proposed a simple reputation mechanism, in which any deviating user is assigned the lowest reputation forever and is effectively excluded from the system. Without reputation update errors, this reputation mechanism with the most severe punishment can achieve the social optimum. However, with reputation update errors, eventually all the users will have the lowest reputation and receive no payoff. Hence, the performance loss of the reputation mechanism in [4] is large under reputation update errors.
Based on the idea in [4], the works [5][6][7] analyzed the performance of different reputation mechanisms through simulation, and showed that some simple reputation mechanisms suffer performance loss under reputation update errors.
The work in [8] is the first to rigorously analyze a class of reputation mechanisms under reputation update errors. For the class of reputation mechanisms considered, [8] quantifies the performance loss in terms of the reputation update error probability. Since there is only one long-lived player in the model of [8], the reputation mechanisms in [8] are actually equivalent to a class of reputation mechanisms that we will analyze in Section V-C. Our analysis of this subclass of reputation mechanisms shows the same intuition as in [8]. However, we will also design another class of reputation mechanisms that asymptotically achieve the social optimum under reputation update errors. The contrast between these two classes of reputation mechanisms shows that differential punishment is crucial for achieving the social optimum. In our setting, we can punish users with low reputation and reward users with high reputation, in order to implement differential punishment. On the contrary, in [8], because there is only one long-lived player, all the users (considered as the single long-lived player) are punished or rewarded simultaneously.
[1]–[3] also rigorously analyze a class of reputation mechanisms, in the limit case when the reputation update error probability goes to 0. However, under positive reputation update error probabilities, there is again a performance loss.

TABLE I
COMPARISON WITH RELATED WORKS.

             User strategy       Deviation-proof   Discount factor   Performance loss (due to update errors)
[1]–[3]      Stationary Markov   Yes               δ < 1             Yes
[5][6][7]    Stationary Markov   No                δ < 1             Yes
[8]          Stationary Markov   Yes               δ < 1             Yes
[4]          Nonstationary       Yes               δ → 1             Yes
This work    Nonstationary       Yes               δ < 1             No
Our paper is the first to propose a class of socially optimal reputation mechanisms under reputation update errors. Although we model the interaction among the users in the crowdsourcing platform as a stochastic game, our results are fundamentally different from the folk theorem-type results in repeated games [9] and in stochastic games [10] in two aspects.
First, [9] and [10] show that for any feasible payoff profile v, there exists a threshold discount factor $\underline{\delta}$ such that under any δ ≥ $\underline{\delta}$, the payoff profile v can be achieved at the equilibrium. However, they cannot obtain the lower bound $\underline{\delta}$ analytically. Hence, they cannot construct the equilibrium strategy that achieves v, because a feasible discount factor δ ≥ $\underline{\delta}$ is needed to compute the strategy. In contrast, for a payoff profile v that is arbitrarily close to the social optimum (b − c) · 1N, we can analytically determine the lower bound $\underline{\delta}$ of feasible discount factors, and thus can construct the equilibrium strategy achieving v based on a feasible discount factor.
Second, although both our work and [9][10] build on the theory of self-generating sets [11],
we cannot use the theory of self-generating sets in the same way as [9] and [10] did, because of
some restrictions on the strategies in our setting. We will discuss this in detail in Section V-A.
In Table I, we compare the proposed work with existing reputation mechanisms.
III. SYSTEM MODEL
A. Basic Setup
Consider a crowdsourcing platform with N users. Denote the set of users by N = {1, . . . , N}.
We assume that the number of users N is displayed on the platform and is known to all the
users. Each user has plenty of tasks to solve, and possesses some resources (e.g. knowledge)
valuable to the other users’ tasks. A user can request help to solve its task as a requester, and
can provide service for another user as a worker. Since the users usually stay in the platform
for a long period of time, we divide time into periods labeled by t = 0, 1, 2, . . .. In each period
t, the users act in the following order:
• Each user requests help to solve its tasks.
• Each user is matched to another user’s task by a matching rule.
• Each user chooses to exert high or low effort in solving the task.
A matching is defined as a mapping m : N → N, where user i, as a worker, is matched to the task of user m(i). Since a user cannot be matched to itself, we denote the set of all possible matchings as
\[
M = \{ m : m(i) \neq i,\ \forall i \in \mathcal{N} \}. \tag{1}
\]
A matching rule is then defined as a probability distribution µ over the set of all possible matchings M. In this paper, we will focus on the uniformly random matching rule, which satisfies
\[
\mu(m) = \frac{1}{N! \sum_{i=2}^{N} \frac{(-1)^i}{i!}}, \quad \forall m \in M. \tag{2}
\]
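As a quick sanity check (a hypothetical sketch, not part of the paper), the denominator in (2) is the number of derangements of N elements, so the uniformly random matching rule puts equal probability on every feasible matching:

```python
from itertools import permutations
from math import factorial

def derangements(n):
    """All matchings m with m(i) != i, i.e. derangements of {0, ..., n-1}."""
    return [p for p in permutations(range(n)) if all(p[i] != i for i in range(n))]

def num_derangements(n):
    """Closed form N! * sum_{i=2}^{N} (-1)^i / i!, the denominator in Eq. (2)."""
    return round(factorial(n) * sum((-1) ** i / factorial(i) for i in range(2, n + 1)))

n = 6
M = derangements(n)
assert len(M) == num_derangements(n)  # both count the derangements
# Under the uniformly random matching rule, mu(m) = 1 / D_n for every m.
print(f"D_{n} = {len(M)}, mu(m) = {1 / len(M):.6f}")
```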
We assume that there is no cost for requesting service, and that the users always have tasks to solve. Hence, each user requests service in every period. A user pays a high or low
cost to exert high or low effort, respectively, as a worker. We assume that the cost of high or
low effort is the same across all the users. A user receives a high or low benefit as a requester
if the matched worker exerts high or low effort, respectively. We also assume that the high or
low benefit is the same across all the users. Since only the difference in the costs of high effort
and low effort matters, we normalize the worker’s cost of exerting low effort to 0, and set the
cost of exerting high effort as c > 0. For the same reason, we normalize the requester’s received
benefit from the worker’s low effort to 0, and set the benefit from high effort as b > 0. Hence, the interaction of a requester and a worker can be summarized by the “gift-giving” game in Table II, where the row player is the requester and the column player is the worker.

TABLE II
THE GIFT-GIVING GAME BETWEEN A REQUESTER AND A WORKER.

            high effort    low effort
request     (b, −c)        (0, 0)
We can see that in the unique Nash equilibrium of the gift-giving game, the worker exerts low effort, which results in a zero payoff for both the requester and the worker. We are interested in the scenarios where b > c, namely where exerting high effort leads to the social optimum. Our goal is to design an incentive scheme such that it is in the workers’ self-interest to exert high effort.
B. Binary Reputation Mechanisms
A powerful incentive scheme is the reputation mechanism, which exploits the repeated in-
teraction of the requesters and the workers, and assigns each user a reputation as the summary
of its past behavior. In this paper, we focus on the simplest reputation mechanisms, namely
binary reputation mechanisms, and will show that binary reputation mechanisms can achieve
social optimum. A binary reputation mechanism assigns binary reputations to the users. Denote
the set of binary reputations by Θ = {0, 1}, and the reputation of user i by θi, ∀i ∈ N . The
reputation profile is defined as the vector of all the users’ reputations, θ = (θ1, . . . , θN). The
reputation profile contains the users’ private information and is not known to the users. However, an important statistic, namely the reputation distribution, can be listed on the platform and observed by all the users. A reputation distribution is defined as a tuple s(θ) = (s0(θ), s1(θ)), where s1(θ) = Σi∈N θi is the number of users with reputation 1, and s0(θ) = Σi∈N (1 − θi) is the number of users with reputation 0. The set of all possible reputation distributions is denoted
S. Note, however, that when a worker is matched to a requester, it is informed by the platform
of its requester’s reputation.
As described before, a user always requests service as a requester because there is no cost to request. Hence, a user only needs to determine its effort level as a worker. Consequently, we model each user’s action as a contingent plan of exerting high or low effort based on its own reputation and the reputation of the requester matched to it. Formally, each user i’s action, denoted by αi, is a mapping αi : Θ × Θ → Z, where Z = {0, 1} is the set of effort levels, with z = 0 representing “low effort” and z = 1 representing “high effort”. Then αi(θm(i), θi) denotes user i’s effort level as a worker when it is matched to a requester with reputation θm(i). We write the action set as A = {α | α : Θ × Θ → Z}, and the joint action profile of all the users as α = (α1, . . . , αN).
After each worker exerts effort based on its action, the requester reports its assessment of the effort level to the platform. The report is defined as a mapping R : Z → ∆(Z), where ∆(Z) is the set of probability distributions over Z. For example, R(1|z) is the probability that the requester reports “high effort” given the worker’s actual effort level z. An example report mapping is
\[
R(z'|z) = \begin{cases} 1-\epsilon, & z'=z \\ \epsilon, & z'\neq z, \end{cases} \tag{3}
\]
where ε ∈ [0, 0.5) is the report error probability.1 This report error may be caused by the
requester’s inaccurate assessment of the effort level and by the system error of the platform (e.g.
loss of reports). Since the users are homogeneous, we assume the same report mapping R for
all the users.
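For illustration, the symmetric report channel in (3) simply flips the true effort level with probability ε; a minimal sketch (hypothetical helper, not from the paper):

```python
import random

def report(z, eps):
    """Sample the requester's report under Eq. (3): the true effort level
    z in {0, 1} is reported correctly w.p. 1 - eps and flipped w.p. eps."""
    return z if random.random() >= eps else 1 - z

# Empirical check: reports match the true effort level with probability ~1 - eps.
eps = 0.1
samples = [report(1, eps) for _ in range(100_000)]
print(f"fraction reporting high effort: {sum(samples) / len(samples):.3f}")  # ~0.9
```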
Based on the requester’s report, the platform updates the worker’s reputation according to the reputation update rule. Again, since the users are homogeneous, we assume that the reputation update rule is the same for all the users. The reputation update rule, denoted by τ, is defined as a mapping τ : Θ × Θ × Z → ∆(Θ). For example, τ(θ′w|θr, θw, z) is the probability that the worker’s updated reputation is θ′w, given the reputation of the requester θr, the worker’s own reputation θw, and the requester’s report z. We focus on a class of reputation update rules defined as
\[
\tau(\theta_w'|\theta_r,\theta_w,z) = \begin{cases}
\beta^+_{\theta_w}, & \theta_w' = 1,\ z \geq \alpha_0(\theta_r,\theta_w) \\
1-\beta^+_{\theta_w}, & \theta_w' = 0,\ z \geq \alpha_0(\theta_r,\theta_w) \\
1-\beta^-_{\theta_w}, & \theta_w' = 1,\ z < \alpha_0(\theta_r,\theta_w) \\
\beta^-_{\theta_w}, & \theta_w' = 0,\ z < \alpha_0(\theta_r,\theta_w)
\end{cases}, \quad \text{for } \theta_w = 0, 1,
\]
where α0 is the platform’s recommended action, based on which the reputation is updated. Under this rule, if the reported effort level is not lower than the level specified by the recommended action, a worker with reputation θw is assigned reputation 1 with probability β+θw; otherwise, it is assigned reputation 0 with probability β−θw.

¹We confine the report error probability ε to be smaller than 0.5, because with an error probability ε = 0.5, the report contains no information about the effort level. If the error probability were larger than 0.5, the platform could use the opposite of the report as an indication of the effort level, which is equivalent to the case with an error probability smaller than 0.5.
In summary, a reputation mechanism (Θ, τ) is completely determined by the set of reputations
Θ and the reputation update rule τ .
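The two branches of the update rule can be made explicit with a small sketch (hypothetical code; alpha0 is the recommended action and beta_plus, beta_minus are the design parameters):

```python
import random

def update_reputation(theta_w, theta_r, z_reported, alpha0, beta_plus, beta_minus):
    """Sample the worker's next reputation under the update rule tau.
    beta_plus[theta] and beta_minus[theta] index the parameters by reputation."""
    if z_reported >= alpha0(theta_r, theta_w):
        # Reported effort meets the recommended level: reputation 1 w.p. beta_plus.
        return 1 if random.random() < beta_plus[theta_w] else 0
    # Reported effort falls short of the recommendation: reputation 0 w.p. beta_minus.
    return 0 if random.random() < beta_minus[theta_w] else 1
```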
Finally, we summarize the repeated interaction among the users in the crowdsourcing platform
with a reputation mechanism as follows.
1) At the beginning of period t, the platform displays the current reputation distribution s(θ),
and announces the recommended action α0.
2) Each user i requests for services, following which it is matched as a worker to the task
of user m(i) with probability µ(m). It is then informed by the platform of the requester’s
reputation θm(i). According to its action αi, it chooses its effort level αi(θm(i), θi).
3) Each requester reports its assessment of the effort level to the platform.
4) At the end of period t, the platform updates the reputation profile based on the reputation
update rule τ .
IV. STOCHASTIC GAME FORMULATION
In this section, we first formulate the repeated interaction among the users in the crowdsourcing platform with reputation mechanisms as a stochastic game. Then we impose some restrictions on the users’ strategies based on the users’ knowledge of the system, and define the corresponding equilibrium under these restrictions. Finally, we define the platform designer’s problem.
A. The Stochastic Game
A stochastic game is described by the set of players, the set of states with the state transition
probability, the set of actions, and the players’ stage-game payoff functions. We consider the
platform as a player, which is indexed by 0, and define the set of players as {0} ∪ N . The
state is defined as the reputation profile θ, and the set of states as ΘN . The platform’s action
is the recommended action, which is defined in the same way as the users’ actions, namely
α0 : Θ × Θ → {0, 1}. We write the state transition probability as q(θ′|θ, α0, α), which is the
probability that the next state is θ′ given the current state θ, the platform’s recommended action,
and the joint action profile α. Note that the state transition probability q is determined by the
matching rule µ, the report function R, and the reputation update rule τ . Finally, each user i’s
stage-game payoff function is ui(θ, α0, α), which depends on2 the state θ and the joint action
profile α. We define the platform’s payoff as a constant u0(θ, α0, α) = 0,∀θ, α0, α, such that it
will follow the platform designer’s decisions on how to choose the recommended action. Note
that the platform designer’s goal (or payoff) is the social welfare, which will be defined later.
The users interact in the platform repeatedly. Hence, each user should have a strategy, which
is a contingent plan of which action to take based on the history. The history is the collection
of the past and the current states (i.e. reputation profiles) and its own service qualities received
from the workers. This history is different for different users, who have different records of
service qualities, and thus is called private history. By contrast, we define the public history as
the collection of states, which is the same for all the users. In this paper, we focus on public
strategies, which determine the actions based on public histories only. Public strategies are less
complicated than private strategies, and more importantly, will be proven to be able to achieve
the social optimum.
Denote the public history at period t as h^t = (θ^0, . . . , θ^t), where θ^t is the reputation profile at the beginning of period t, and the set of public histories at period t as H^t = (Θ^N)^{t+1}. Then each user i’s strategy is defined as a mapping πi : ∪_{t=0}^∞ H^t → A from the set of all possible histories to the action set. Similarly, we define the platform’s recommended strategy as π0 : ∪_{t=0}^∞ H^t → A.
2Note that, although the stage-game payoff does not depend on the recommended action, we write the recommended action
α0 as an argument of the payoff function by the convention of game theory. However, the recommended action does affect the
long-term payoff through the state transition probability.
The joint strategy profile of all the users is written as π = (π1, . . . , πN). We write the set of all
strategies and the set of all strategy profiles as Π and ΠN , respectively. Given the matching rule
µ, the initial reputation profile θ0, the report function R, and the reputation mechanism (Θ, τ),
the recommended strategy π0 and the joint strategy profile π induce a probability distribution
over the set of all the histories H∞. Taking the expectation with respect to this probability
distribution, each user i receives a discounted average payoff Ui(θ^0, π0, π) defined as
\[
U_i(\theta^0, \pi_0, \pi) = \mathbb{E}_{h^\infty}\Big\{(1-\delta)\sum_{t=0}^{\infty}\delta^t u_i\big(\theta^t, \pi_0(h^t), \pi(h^t)\big)\Big\},
\]
where δ ∈ [0, 1) is the common discount factor of all the users. The discount factor δ is the rate
at which the users discount future payoffs, and reflects the patience of the users. A more patient
user has a larger discount factor.
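As a quick numerical illustration (hypothetical helper), the discounted average payoff of a truncated stage-payoff stream follows directly from the definition:

```python
def discounted_average(stage_payoffs, delta):
    """Truncated discounted average payoff (1 - delta) * sum_t delta^t * u_t."""
    return (1 - delta) * sum(u * delta ** t for t, u in enumerate(stage_payoffs))

# A constant stream of b - c = 1 has discounted average payoff close to 1.
print(discounted_average([1.0] * 2000, delta=0.99))  # ~1.0
```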
B. Symmetric Feasible Strategy Profiles
First, since the users are homogeneous, we assume that all the users adopt the same strategy,
namely πi = π, ∀i ∈ N . In particular, we write the symmetric joint strategy profile as π · 1N ,
where 1N is a vector of 1’s with length N . Note that although all the users adopt the same action
in each period, they will choose different effort levels since both their own and their requesters’
reputations are different.
Second, since the users know only the reputation distributions, but not the reputation profiles, we restrict the users’ strategies to be the same across reputation profiles that have the same reputation distribution. We call such strategies feasible strategies, since it is infeasible for the users to adopt strategies that require knowledge of the reputation profiles. Formally, we define feasible strategies as follows.
Definition 1: A strategy π is feasible if, for all t ≥ 0 and for all h^t, ĥ^t ∈ H^t, we have
\[
\pi(h^t) = \pi(\hat{h}^t) \quad \text{whenever } s(\theta^k) = s(\hat{\theta}^k),\ k = 0, 1, \ldots, t. \tag{4}
\]
We write the set of all feasible strategies as Πf .
In this paper, we focus on symmetric feasible strategy profiles, which are defined as follows.
Definition 2: A symmetric feasible strategy profile is a symmetric strategy profile π · 1N in which the strategy π is feasible, namely π ∈ Πf. We write the set of all symmetric feasible strategy profiles as Π_f^N.
If the users adopt symmetric feasible strategies, their stage-game payoffs can be calculated as
\[
u_i(\theta,\alpha_0,(\alpha_i,\alpha\cdot\mathbf{1}_{N-1})) = b\cdot\Big[\frac{s_{\theta_i}-1}{N-1}\alpha(\theta_i,\theta_i)+\frac{s_{1-\theta_i}}{N-1}\alpha(\theta_i,1-\theta_i)\Big] - c\cdot\Big[\frac{s_{\theta_i}-1}{N-1}\alpha_i(\theta_i,\theta_i)+\frac{s_{1-\theta_i}}{N-1}\alpha_i(1-\theta_i,\theta_i)\Big], \tag{5}
\]
where (αi, α · 1N−1) is the action profile in which user i chooses αi and all the other users choose α. We can see that each user’s payoff is determined by the reputation distribution s(θ) only.
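A direct transcription of (5) (hypothetical helper; s is the reputation distribution and alpha_i, alpha are action functions) makes explicit that the payoff depends on θ only through s(θ):

```python
def stage_payoff(theta_i, s, alpha_i, alpha, b, c, n):
    """Expected stage payoff of a user with reputation theta_i under Eq. (5).
    s[r]: number of users with reputation r; actions map (theta_r, theta_w) to {0, 1}."""
    w_same = (s[theta_i] - 1) / (n - 1)   # prob. the matched partner has the same reputation
    w_diff = s[1 - theta_i] / (n - 1)     # prob. the matched partner has the other reputation
    benefit = b * (w_same * alpha(theta_i, theta_i) + w_diff * alpha(theta_i, 1 - theta_i))
    cost = c * (w_same * alpha_i(theta_i, theta_i) + w_diff * alpha_i(1 - theta_i, theta_i))
    return benefit - cost
```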
Finally, since the action set A has 2^4 = 16 elements, the complexity of choosing an action is large for the users. Hence, we consider strategies that choose actions from a subset B ⊆ A, and define Πf(B) as the set of symmetric feasible strategies restricted to the subset of actions B. We are particularly interested in two subsets of actions: A_afs ≜ {α^a, α^f, α^s} with three actions, and A_as ≜ {α^a, α^s} with two actions. Specifically, the action α^a is the altruistic action, which we define as
αa(θr, θw) = 1,∀θr, θw ∈ {0, 1}, (6)
where the worker exerts high effort regardless of the worker’s and the requester’s reputations.
The action α^f is the fair action, which we define as
\[
\alpha^f(\theta_r, \theta_w) = \begin{cases} 0, & \theta_w > \theta_r \\ 1, & \theta_w \leq \theta_r, \end{cases} \tag{7}
\]
where the worker exerts high effort only when the requester has a higher or equal reputation.
The action αs is the selfish action, which we define as
αs(θr, θw) = 0,∀θr, θw ∈ {0, 1}, (8)
where the worker exerts low effort regardless of the worker’s and the requester’s reputations.
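The three actions translate into one-line functions (a hypothetical sketch mirroring (6)-(8)):

```python
def alpha_altruistic(theta_r, theta_w):
    return 1                                # high effort for everyone, Eq. (6)

def alpha_fair(theta_r, theta_w):
    return 1 if theta_w <= theta_r else 0   # high effort only for requesters whose
                                            # reputation is at least one's own, Eq. (7)

def alpha_selfish(theta_r, theta_w):
    return 0                                # low effort for everyone, Eq. (8)
```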
As we will show later, a pair of a recommended strategy and a strategy profile (π0, π · 1N) ∈ Πf(A_afs) × Π_f^N(A_afs) can achieve the social optimum at the equilibrium if we design the reputation update rules carefully, while any pair (π0, π · 1N) ∈ Πf(A_as) × Π_f^N(A_as) incurs a performance loss at the equilibrium under all reputation update rules unless the reputation update error ε = 0.
C. Definition of The Equilibrium
As in the game theory literature on stochastic games [10][12], we use public perfect equi-
librium (PPE) as the equilibrium concept, but adapt PPE in our setting because we focus on
feasible strategy profiles. In a PPE, no user has the incentive to deviate following any history.
This is a stronger requirement than that in the Nash equilibrium (NE), which requires that the
users do not deviate following the histories that happen in the equilibrium. However, in our
setting, due to the reputation update error, all the states occur with positive probabilities under
all the strategy profiles. Hence, NE is equivalent to PPE, which means that PPE is the only
appropriate equilibrium concept.
Before we define PPE, we need to define the continuation strategy π|_{h^t}, which is a mapping π|_{h^t} : ∪_{k=0}^∞ H^k → A with π|_{h^t}(h^k) = π(h^t h^k), where h^t h^k is the concatenation of h^t and h^k. Keeping in mind that we focus on symmetric feasible strategy profiles, we formally define
symmetric feasible PPE (SF-PPE) as follows.
Definition 3: A pair of a feasible recommended strategy and a symmetric feasible strategy profile (π0, π · 1N) ∈ Πf × Π_f^N is a SF-PPE if, for all t ≥ 0, for all h^t ∈ H^t, and for all i ∈ N, we have
\[
U_i(\theta^t, \pi_0|_{h^t}, \pi|_{h^t}\cdot\mathbf{1}_N) \geq U_i(\theta^t, \pi_0|_{h^t}, (\pi_i|_{h^t}, \pi|_{h^t}\cdot\mathbf{1}_{N-1})), \quad \forall \pi_i|_{h^t} \in \Pi_f, \tag{9}
\]
where (πi|_{h^t}, π|_{h^t} · 1N−1) is the continuation strategy profile in which user i deviates to πi|_{h^t} and the other users follow the strategy π|_{h^t}.
The major difference between SF-PPE and conventional PPE is that in PPE, we need to prevent
the users from deviating to any strategy π ∈ Π. However, in SF-PPE, we need to prevent the
deviation to feasible strategies π ∈ Πf only, because the users do not know the reputation
profiles.
D. The Platform Design Problem
The goal of the platform designer is to maximize the social welfare at the equilibrium in the worst case (with respect to different initial reputation distributions). Hence, the platform design problem can be formulated as follows:
\[
\max_{\tau,\ (\pi_0,\pi\cdot\mathbf{1}_N)\in\Pi_f\times\Pi_f^N}\ \min_{\theta^0\in\Theta^N}\ \frac{1}{N}\sum_{i\in\mathcal{N}}U_i(\theta^0,\pi_0,\pi\cdot\mathbf{1}_N) \tag{10}
\]
\[
\text{s.t.}\quad (\pi_0,\pi\cdot\mathbf{1}_N)\ \text{is a SF-PPE}.
\]
The social optimum is b − c, achieved when the workers exert high effort all the time; however, always exerting high effort is not by itself an equilibrium strategy. In the next section, we will show that under properly designed reputation update rules, the social optimum b − c can be asymptotically achieved at the SF-PPE when the users are sufficiently patient (i.e. the discount factor δ → 1).
V. SOCIALLY OPTIMAL DESIGN
In this section, we design reputation mechanisms that asymptotically achieve the social optimum at the equilibrium, even when the reputation update error probability ε > 0. In our design, we use the APS technique, named after the authors of the seminal paper [11], which is also used to prove the folk theorem for repeated games in [9] and for stochastic games in [10]. We will briefly introduce the
APS technique first. Meanwhile, more importantly, we will illustrate why we cannot use APS
in our setting in the same way as [9] and [10] did. Then, we will show how we use APS in a
different way in our setting, in order to design the optimal reputation mechanism and to construct
the equilibrium strategy. Finally, we analyze the performance of a class of simple but suboptimal
strategies, which sheds light on why the proposed strategy can achieve social optimum.
A. The APS Technique
APS [11] provides a characterization of the set of PPE payoffs. It builds on the idea of
self-generating sets described as follows. Define a set Wθ ⊂ RN for every state θ ∈ ΘN ,
and write (Wθ′)θ′∈ΘN as the collection of these sets. Then we have the following definitions
[11][10][12]. First, we say a payoff profile v(θ) ∈ RN is decomposable on (Wθ′)θ′∈ΘN given θ,
if there exists a recommended action α0, an action profile α∗, and a continuation payoff function
γ : Θ^N → ∪_{θ′∈Θ^N} W_{θ′} with γ(θ′) ∈ W_{θ′}, such that for all i ∈ N and for all αi ∈ A,
\[
v_i = (1-\delta)u_i(\theta, \alpha_0, \alpha^*) + \delta\sum_{\theta'}\gamma_i(\theta')q(\theta'|\theta, \alpha_0, \alpha^*) \tag{11}
\]
\[
\geq (1-\delta)u_i(\theta, \alpha_0, (\alpha_i, \alpha^*_{-i})) + \delta\sum_{\theta'}\gamma_i(\theta')q(\theta'|\theta, \alpha_0, (\alpha_i, \alpha^*_{-i})).
\]
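To make the definition concrete, a hypothetical numerical check of (11) for a candidate tuple (α0, α∗, γ) could look as follows (all model primitives are passed in as functions; this is a sketch, not the paper’s construction):

```python
def is_decomposable(v, theta, alpha0, alpha_star, gamma, u, q, delta, states, actions, n_users):
    """Check Eq. (11): v is decomposed by (alpha0, alpha_star, gamma) at state theta
    if the promised payoffs are exact and no unilateral deviation is profitable."""
    for i in range(n_users):
        cont = sum(gamma(tp)[i] * q(tp, theta, alpha0, alpha_star) for tp in states)
        promised = (1 - delta) * u(i, theta, alpha0, alpha_star) + delta * cont
        if abs(v[i] - promised) > 1e-9:
            return False                     # equality part of (11) fails
        for a_i in actions:                  # unilateral deviations of user i
            dev_profile = alpha_star[:i] + (a_i,) + alpha_star[i + 1:]
            dev = (1 - delta) * u(i, theta, alpha0, dev_profile) + delta * sum(
                gamma(tp)[i] * q(tp, theta, alpha0, dev_profile) for tp in states)
            if dev > v[i] + 1e-9:
                return False                 # inequality part of (11) fails
    return True
```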
Then, we say a set (Wθ)θ∈ΘN is a self-generating set, if for any θ, every payoff profile v(θ) ∈
Wθ is decomposable on (Wθ′)θ′∈ΘN given θ. The important property of self-generating sets is
that any self-generating set is a set of PPE payoffs [11][10][12].
Based on the idea of self-generating sets, [9] and [10] proved the folk theorem for repeated
games and stochastic games, respectively. However, we cannot use APS in the same way as
[9] and [10] did for the following two reasons. First, we restrict our attention on symmetric
feasible strategy profiles, which impose the following constraints on the self-generating sets and
the continuation payoff functions.
Lemma 1: Suppose that a self-generating set (W_θ)_{θ∈Θ^N} is a set of SF-PPE payoffs. Then for all θ, every payoff profile v(θ) ∈ W_θ should satisfy
\[
v_i(\theta) = v_j(\theta), \quad \forall i, j \in \mathcal{N} \text{ s.t. } \theta_i = \theta_j, \tag{12}
\]
and should be decomposed by a continuation payoff function such that for all θ′,
\[
\gamma_i(\theta') = \gamma_j(\theta'), \quad \forall i, j \in \mathcal{N} \text{ s.t. } \theta_i' = \theta_j'. \tag{13}
\]
Proof: For a pair of a feasible recommended strategy and a symmetric feasible strategy profile (π0, π · 1N), we have for all θ,
\[
U_i(\theta, \pi_0, \pi\cdot\mathbf{1}_N) = U_j(\theta, \pi_0, \pi\cdot\mathbf{1}_N), \quad \forall i, j \in \mathcal{N} \text{ s.t. } \theta_i = \theta_j. \tag{14}
\]
Since for all θ, any payoff profile v(θ) ∈ W_θ can be achieved by a SF-PPE, it should satisfy
\[
v_i(\theta) = v_j(\theta), \quad \forall i, j \in \mathcal{N} \text{ s.t. } \theta_i = \theta_j. \tag{15}
\]
Since the range of the continuation payoff function lies in the self-generating set, for the same reason, any continuation payoff function should satisfy, for all θ′,
\[
\gamma_i(\theta') = \gamma_j(\theta'), \quad \forall i, j \in \mathcal{N} \text{ s.t. } \theta_i' = \theta_j'. \tag{16}
\]
Since the restrictions in Lemma 1 are not considered in [9][10], we cannot apply APS in the
same way as [9] and [10] did.
Moreover, when deriving the sufficient conditions for the folk theorem to hold, both [9] and [10] use the NE of the stage game to decompose the payoffs in the interior of the self-generating set. This prevents them from obtaining the lower bound of the discount factor. More specifically, for any feasible payoff profile v, they can prove that there exists a threshold discount factor $\underline{\delta}$ such that under any δ ≥ $\underline{\delta}$, the payoff profile v can be achieved at PPE. However, they cannot obtain the lower bound $\underline{\delta}$ analytically, and hence cannot construct the equilibrium strategy that achieves v, because a feasible discount factor δ ≥ $\underline{\delta}$ is needed to compute the strategy. In contrast, for a payoff profile v that is arbitrarily close to the social optimum (b − c) · 1N, we can analytically determine the lower bound $\underline{\delta}$ of feasible discount factors, and thus can construct the equilibrium strategy that achieves v based on a feasible discount factor.
B. Socially Optimal Design
Now we use APS in a different way to prove that the social optimum can be achieved at a SF-PPE. For the given ε0 and ε1 (specified later in Theorem 1), we construct the following self-generating set. For θ with 1 ≤ s1(θ) ≤ N − 1,
\[
W_\theta = \big\{ v : v_i = v^{\theta_i},\ \forall i \in \mathcal{N} \big\}, \tag{17}
\]
where (v^0, v^1) satisfies
\[
v^1 - v^0 \geq \epsilon_0 - \epsilon_1, \tag{18}
\]
\[
v^1 + \frac{c}{(N-1)b}\,v^0 \leq z_2 \triangleq \Big(1+\frac{c}{(N-1)b}\Big)(b-c) - \frac{c}{(N-1)b}\,\epsilon_0 - \epsilon_1, \tag{19}
\]
\[
v^1 - \frac{b}{\frac{N-2}{N-1}b-c}\,v^0 \leq z_3 \triangleq -\frac{\frac{b}{\frac{N-2}{N-1}b-c}-1}{1+\frac{c}{(N-1)b}}\cdot z_2. \tag{20}
\]
For θ with s1(θ) = 0,
\[
W_{\mathbf{0}_N} = \big\{ v : v_i = v^0,\ \forall i \in \mathcal{N} \big\}, \tag{21}
\]
where
\[
v^0 \in \Bigg[\frac{\epsilon_0-\epsilon_1-z_3}{\frac{b}{\frac{N-2}{N-1}b-c}-1},\ \frac{z_2-(\epsilon_0-\epsilon_1)}{1+\frac{c}{(N-1)b}}\Bigg].
\]
For θ with s1(θ) = N,
\[
W_{\mathbf{1}_N} = \big\{ v : v_i = v^1,\ \forall i \in \mathcal{N} \big\}, \tag{22}
\]
where
\[
v^1 \in \Bigg[\frac{\frac{b}{\frac{N-2}{N-1}b-c}(\epsilon_0-\epsilon_1)-z_3}{\frac{b}{\frac{N-2}{N-1}b-c}-1},\ \frac{\big(\frac{b}{\frac{N-2}{N-1}b-c}-1\big)z_2+\frac{c}{(N-1)b}z_3}{\frac{b}{\frac{N-2}{N-1}b-c}+\frac{c}{(N-1)b}}\Bigg].
\]
We can see that each set Wθ is bounded by hyperplanes, which is different from the smooth
set (e.g. a ball) used in [9] and [10].
Apart from the restrictions in Lemma 1, we impose more restrictions on the continuation
payoff function to make it even simpler. Specifically, we focus on the simple continuation payoff
functions defined as below.
Definition 4: We say a continuation payoff function γ : Θ^N → ∪_θ W_θ is a simple continuation payoff function if, for any given state θ and any given payoff profile v ∈ W_θ, γ satisfies
\[
\gamma_i(\theta') = \gamma_j(\theta''), \quad \forall \theta', \theta'' \in \Theta^N,\ \forall i, j \in \mathcal{N} \text{ s.t. } \theta_i' = \theta_j''.
\]
Given a state θ and a payoff v ∈ Wθ, a simple continuation payoff function assigns the same
continuation payoff for the users with the same reputation under all the future states.
Using the above self-generating set and simple continuation payoff functions, we prove that
under certain conditions on the reputation update rule τ and the discount factor δ, the social
optimum can be asymptotically achieved at a SF-PPE. For better presentation of the theorem, we
define a few auxiliary variables first. Define
\[
\kappa_1 \triangleq \frac{b}{\frac{N-2}{N-1}b-c}-1, \qquad \kappa_2 \triangleq 1+\frac{c}{(N-1)b}.
\]
In addition, we define $\underline{\delta}$, which will be proved to be the lower bound on the feasible discount factors, as follows:
\[
\underline{\delta} \triangleq \max\Bigg\{ \max_{\substack{s_1\in\{1,\ldots,N-1\}:\\ \frac{s_1}{N-1}b+\frac{N-s_1}{N-1}c>\epsilon_0-\epsilon_1}} \frac{\epsilon_0-\epsilon_1-\big(\frac{s_1}{N-1}b+\frac{N-s_1}{N-1}c\big)}{(\epsilon_0-\epsilon_1)\big(\frac{N-s_1}{N-1}\beta_1^+ + \frac{s_1-1}{N-1}x_1^+\big)-\big(\frac{s_1}{N-1}b+\frac{N-s_1}{N-1}c\big)}, \tag{23}
\]
\[
\max_{\theta\in\{0,1\}} \frac{c}{c+(1-2\epsilon)\big(\beta_\theta^+-(1-\beta_\theta^-)\big)(\epsilon_0-\epsilon_1)},
\]
\[
\max_{\theta\in\{0,1\}} \frac{b-c+\frac{c\,x_\theta^+}{(1-2\epsilon)[\beta_\theta^+-(1-\beta_\theta^-)]}}{b-c+\frac{c\,x_\theta^+}{(1-2\epsilon)[\beta_\theta^+-(1-\beta_\theta^-)]}+\frac{\kappa_1z_2+(\kappa_2-1)z_3}{\kappa_1+\kappa_2}-\frac{(1+\kappa_1)(\epsilon_0-\epsilon_1)-z_3}{\kappa_1}}\Bigg\}.
\]
Theorem 1: Choose any small real numbers ε1 > 0 and ε0 ∈ (ε1, (1 + κ2/κ1)ε1). If the following three sets of conditions are satisfied:
• Condition 1: β+1 > 1 − β−1 and x+1 ≜ (1 − ε)β+1 + ε(1 − β−1) > 1/(1 + c/((N−1)b));
• Condition 2: β+0 > 1 − β−0 and x+0 ≜ (1 − ε)β+0 + ε(1 − β−0) < (1 − β+1)/(c/((N−1)b));
• Condition 3: δ ∈ [$\underline{\delta}$, 1);
then there exists a SF-PPE (π0, π0 · 1N) ∈ Πf(A_afs) × Π_f^N(A_afs) that achieves
\[
U_i(\theta^0, \pi_0, \pi_0\cdot\mathbf{1}_N) = b - c - \epsilon_{\theta_i^0}, \quad \forall i \in \mathcal{N}, \tag{24}
\]
starting from any initial reputation profile θ^0.
Proof: See Appendix A.
Theorem 1 gives the sufficient conditions under which the social optimum can be asymptot-
ically achieved. The first two conditions are the restrictions on the optimal reputation update
rules. First, the probability that the reputation goes up when the effort level is not lower than
the recommended one, β+θ , should be larger than 1 − β−θ , which is the probability that the
reputation goes up when the effort level is lower than the recommended one. Second, for the
users with reputation 1, the expected probability that the reputation goes up when the effort
level is not lower than the recommended one, x+1 , should be larger than the threshold specified
in Condition 1. Meanwhile, for the users with reputation 0, the expected probability that the
reputation goes up when the effort level is not lower than the recommended one, x+0 , should
be smaller than the threshold specified in Condition 2. In this way, a user will prefer to have
reputation 1 because x+0 < x+1.
Condition 3 gives us an analytical lower bound for the discount factor. This is important
for constructing the equilibrium (π0, π0 · 1N), because a feasible discount factor is needed to
determine the equilibrium. Note that in [9] and [10], the lower bound for the discount factor
cannot be obtained analytically.
Remark 1: Note that the SF-PPE has a desirable property that the user’s strategy π = π0 is
the same as the recommended strategy. Hence, the platform can announce the recommended
action in each period, such that the users can follow the recommended action. In this way, the
users do not need to calculate the action to take in each period by themselves.
Once the conditions for achieving social optimum are satisfied, we know that an optimal
equilibrium strategy exists. Now we show how to construct the optimal equilibrium strategy.
The optimal equilibrium strategy is not stationary, in that given the same state, the actions
taken can be different at different periods. Hence, at each period t, we need to determine the
recommended action αt0, which is the same as the users’ actions α. The algorithm of constructing
the recommended strategy, given a sufficiently large discount factor δ, is described in Table III.
Remark 2: Note that in the algorithm, if all the users have the same reputation, the altruistic
action αa and the selfish action αs are used; otherwise, the altruistic action αa and the fair action
αf are used.
TABLE III
THE ALGORITHM OF CONSTRUCTING THE EQUILIBRIUM STRATEGY.

Require: ε0, ε1, δ ≥ $\underline{\delta}$; the reputation distribution s(θ) observed in each period
Initialization: t = 0; v^θ = b − c − ε_θ for θ = 0, 1.
repeat
    if s1(θ) = 0 then
        if v^0 ≥ (1−δ)[b − c + x+0 c / ((1−2ε)(β+0 − (1−β−0)))] + δ(ε0 − ε1 − z3)/κ1 then
            α^t_0 = α^t = α^a
            v^0 ← v^0/δ − ((1−δ)/δ)[b − c + x+0 c / ((1−2ε)(β+0 − (1−β−0)))]
            v^1 ← v^0 + ((1−δ)/δ) · c / ((1−2ε)(β+0 − (1−β−0)))
        else
            α^t_0 = α^t = α^s;  v^0 ← v^0/δ;  v^1 ← v^0
        end if
    else if s1(θ) = N then
        if v^1 ≥ (1−δ)[b − c + x+1 c / ((1−2ε)(β+1 − (1−β−1)))] + δ(ε0 − ε1 − z3)/κ1 then
            α^t_0 = α^t = α^a
            v^1 ← v^1/δ − ((1−δ)/δ)[b − c + x+1 c / ((1−2ε)(β+1 − (1−β−1)))]
            v^0 ← v^1 − ((1−δ)/δ) · c / ((1−2ε)(β+1 − (1−β−1)))
        else
            α^t_0 = α^t = α^s;  v^1 ← v^1/δ;  v^0 ← v^1
        end if
    else
        if [(1 + κ1x+0)v^1 − (1 + κ1x+1)v^0]/(x+1 − x+0) ≤ δz3 − (1−δ)(b−c)κ1 then
            α^t_0 = α^t = α^a
            v^1′ ← (1/δ)[(1−x+0)v^1 − (1−x+1)v^0]/(x+1 − x+0) − ((1−δ)/δ)(b−c)
            v^0′ ← (1/δ)[x+1 v^0 − x+0 v^1]/(x+1 − x+0) − ((1−δ)/δ)(b−c)
        else
            α^t_0 = α^t = α^f
            v^1′ ← (1/δ)[(1−x+0)v^1 − (1−x+s1(θ))v^0]/(x+s1(θ) − x+0)
                   − ((1−δ)/δ)[(b − ((s1(θ)−1)/(N−1))c)(1−x+0) − (((s0(θ)−1)/(N−1))b − c)(1−x+s1(θ))]/(x+s1(θ) − x+0)
            v^0′ ← (1/δ)[x+s1(θ) v^0 − x+0 v^1]/(x+s1(θ) − x+0)
                   − ((1−δ)/δ)[(b − ((s1(θ)−1)/(N−1))c)x+0 − (((s0(θ)−1)/(N−1))b − c)x+s1(θ)]/(x+s1(θ) − x+0)
        end if
        v^1 ← v^1′;  v^0 ← v^0′
    end if
    t ← t + 1
until the game ends
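A compact sketch of one step of the construction in Table III (hypothetical code; x0p, x1p, xsp stand for x+0, x+1, x+s1(θ), and g0, g1 for c/((1−2ε)(β+θ − (1−β−θ))) with θ = 0, 1):

```python
def next_action_and_targets(s1, n, v0, v1, delta, b, c, z3, eps0, eps1,
                            kappa1, x0p, x1p, xsp, g0, g1):
    """Pick the period-t recommended action for the current reputation distribution
    (s1 users with reputation 1) and update the target payoffs (v0, v1)."""
    if s1 == 0:
        thr = (1 - delta) * (b - c + x0p * g0) + delta * (eps0 - eps1 - z3) / kappa1
        if v0 >= thr:
            v0n = v0 / delta - (1 - delta) / delta * (b - c + x0p * g0)
            return "altruistic", v0n, v0n + (1 - delta) / delta * g0
        return "selfish", v0 / delta, v0 / delta
    if s1 == n:
        thr = (1 - delta) * (b - c + x1p * g1) + delta * (eps0 - eps1 - z3) / kappa1
        if v1 >= thr:
            v1n = v1 / delta - (1 - delta) / delta * (b - c + x1p * g1)
            return "altruistic", v1n - (1 - delta) / delta * g1, v1n
        return "selfish", v1 / delta, v1 / delta
    lhs = ((1 + kappa1 * x0p) * v1 - (1 + kappa1 * x1p) * v0) / (x1p - x0p)
    if lhs <= delta * z3 - (1 - delta) * (b - c) * kappa1:
        v1n = ((1 - x0p) * v1 - (1 - x1p) * v0) / (x1p - x0p) / delta \
              - (1 - delta) / delta * (b - c)
        v0n = (x1p * v0 - x0p * v1) / (x1p - x0p) / delta \
              - (1 - delta) / delta * (b - c)
        return "altruistic", v0n, v1n
    s0 = n - s1
    u1 = b - (s1 - 1) / (n - 1) * c   # stage payoff of reputation-1 users under the fair action
    u0 = (s0 - 1) / (n - 1) * b - c   # stage payoff of reputation-0 users under the fair action
    v1n = (((1 - x0p) * v1 - (1 - xsp) * v0)
           - (1 - delta) * (u1 * (1 - x0p) - u0 * (1 - xsp))) / (delta * (xsp - x0p))
    v0n = ((xsp * v0 - x0p * v1)
           - (1 - delta) * (u1 * x0p - u0 * xsp)) / (delta * (xsp - x0p))
    return "fair", v0n, v1n
```

Iterating this map period by period yields the nonstationary recommended strategy: the same reputation distribution can trigger different actions at different times, depending on the current target payoffs.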
C. Performance Analysis of Strategies Restricted on A_as
Previously, we have shown that under certain conditions, the social optimum can be achieved by feasible recommended strategies and symmetric feasible strategy profiles restricted on the action subset A_afs. Now we consider an even simpler class of strategies, restricted on the action subset A_as. The following proposition quantifies the performance loss of the optimal feasible recommended strategy and symmetric feasible strategy profile restricted on A_as.
Before stating the proposition, we first define
\[
\rho(\theta,\alpha_0,S_B) \triangleq \max_{i\in\mathcal{N}} \max\Bigg\{
\frac{\frac{s_{\theta_i}-1}{N-1}}{\sum_{s'\in S\setminus S_B}\big[q(s'|\theta,\alpha_0,\alpha^a\cdot\mathbf{1}_N)-q(s'|\theta,\alpha_0,(\alpha_i{=}\alpha^0,\alpha^a\cdot\mathbf{1}_{N-1}))\big]}, \\
\frac{\frac{s_{1-\theta_i}}{N-1}}{\sum_{s'\in S\setminus S_B}\big[q(s'|\theta,\alpha_0,\alpha^a\cdot\mathbf{1}_N)-q(s'|\theta,\alpha_0,(\alpha_i{=}\alpha^1,\alpha^a\cdot\mathbf{1}_{N-1}))\big]}, \tag{25} \\
\frac{1}{\sum_{s'\in S\setminus S_B}\big[q(s'|\theta,\alpha_0,\alpha^a\cdot\mathbf{1}_N)-q(s'|\theta,\alpha_0,(\alpha_i{=}\alpha^{01},\alpha^a\cdot\mathbf{1}_{N-1}))\big]}
\Bigg\}.
\]
Proposition 1: Starting from any initial reputation profile θ, the maximum social welfare achievable by feasible recommended strategies and symmetric feasible strategy profiles restricted on A_as, namely (π0, π · 1N) ∈ Πf(A_as) × Π_f^N(A_as), is at most
\[
b - c - c\cdot\rho(\theta,\alpha_0^*,S_B^*)\sum_{s'\in S_B^*}q(s'|\theta,\alpha_0^*,\alpha^a\cdot\mathbf{1}_N), \tag{26}
\]
where α0*, the optimal recommended action, and S_B*, the optimal subset of reputation distributions, are the solutions to the following optimization problem:
\[
\min_{\alpha_0}\ \min_{S_B\subset S}\ \rho(\theta,\alpha_0,S_B)\sum_{s'\in S_B}q(s'|\theta,\alpha_0,\alpha^a\cdot\mathbf{1}_N) \tag{27}
\]
\[
\text{s.t.}\ \sum_{s'\in S\setminus S_B}q(s'|\theta,\alpha_0,\alpha^a\cdot\mathbf{1}_N) > \sum_{s'\in S\setminus S_B}q(s'|\theta,\alpha_0,(\alpha_i{=}\alpha^0,\alpha^a\cdot\mathbf{1}_{N-1})),\ \forall i\in\mathcal{N},
\]
\[
\phantom{\text{s.t.}}\ \sum_{s'\in S\setminus S_B}q(s'|\theta,\alpha_0,\alpha^a\cdot\mathbf{1}_N) > \sum_{s'\in S\setminus S_B}q(s'|\theta,\alpha_0,(\alpha_i{=}\alpha^1,\alpha^a\cdot\mathbf{1}_{N-1})),\ \forall i\in\mathcal{N},
\]
\[
\phantom{\text{s.t.}}\ \sum_{s'\in S\setminus S_B}q(s'|\theta,\alpha_0,\alpha^a\cdot\mathbf{1}_N) > \sum_{s'\in S\setminus S_B}q(s'|\theta,\alpha_0,(\alpha_i{=}\alpha^{01},\alpha^a\cdot\mathbf{1}_{N-1})),\ \forall i\in\mathcal{N}.
\]
Proof: See Appendix B.
The following corollary shows that the maximum social welfare achievable by (π0, π · 1N) ∈ Πf(A_as) × Π_f^N(A_as) is bounded away from the social optimum b − c, unless there is no reputation update error. In contrast, if we can use the fair action αf, the social optimum can be asymptotically achieved when the discount factor goes to 1. In other words, the differential punishment introduced by the fair action is crucial for achieving the social optimum.
Corollary 1: Suppose that the reputation update error ε > 0. Then, starting from any initial reputation profile θ, the maximum social welfare achievable by feasible recommended strategies and symmetric feasible strategy profiles restricted on A_as, namely (π0, π · 1N) ∈ Πf(A_as) × Π_f^N(A_as), is bounded away from the social optimum b − c, regardless of the discount factor.
Proof: We can see from (26) that the distance between the maximum social welfare achievable by (π0, π · 1N) ∈ Πf(A_as) × Π_f^N(A_as) and the social optimum b − c,
\[
c\cdot\rho(\theta,\alpha_0^*,S_B^*)\sum_{s'\in S_B^*}q(s'|\theta,\alpha_0^*,\alpha^a\cdot\mathbf{1}_N), \tag{28}
\]
is independent of the discount factor δ.
Moreover, since ρ(θ, α0, S_B) > 0, the distance can be zero only when Σ_{s′∈S_B*} q(s′|θ, α0*, α^a · 1N) = 0. However, when the reputation update error ε is strictly larger than 0, any reputation profile can occur with positive probability. Hence, we must choose S_B* = ∅ to have Σ_{s′∈S_B*} q(s′|θ, α0*, α^a · 1N) = 0. In this case, the constraint
\[
\sum_{s'\in S\setminus S_B}q(s'|\theta,\alpha_0,\alpha^a\cdot\mathbf{1}_N) > \sum_{s'\in S\setminus S_B}q(s'|\theta,\alpha_0,(\alpha_i{=}\alpha^0,\alpha^a\cdot\mathbf{1}_{N-1})) \tag{29}
\]
cannot be satisfied, since Σ_{s′∈S\S_B} q(s′|θ, α0, (αi = α^0, α^a · 1N−1)) = Σ_{s′∈S} q(s′|θ, α0, (αi = α^0, α^a · 1N−1)) = 1.
VI. SIMULATION RESULTS
We compare against the state-of-the-art reputation mechanism, namely the reputation mech-
anism with optimal stationary Markov strategies in [3]. The reputation mechanism in [3] has
been shown to outperform existing reputation mechanisms [1] and BitTorrent protocols [?]. In
a stationary Markov strategy, the action to take depends on the current state only, which is
independent of when the current state appears. In contrast, the proposed strategy depends on the
history of states. The same current state may lead to different actions in different periods. In
our experiments, we consider a system with N = 10 users and discount factor δ = 0.99. In the
simulation of the optimal stationary Markov strategy [3], the recommended action is fixed to be
the fair action αf .
[Figure 1 plots the normalized social welfare versus the reputation update error probability (from 0.01 to 0.46), for the proposed mechanism and the optimal stationary Markov strategy with b = 2, 3, 4, 5.]
Fig. 1. Comparison of the social welfare under different reputation update errors.
We first compare the social welfare (i.e. the average payoff, normalized by the social optimum b − c) of the two reputation mechanisms under different benefit values. We can see from Fig. 1 that as the error probability grows, the social welfare of the optimal stationary Markov strategy decreases, and drops to 0 when the error probability is large (e.g. when ε > 0.4).
Then, we compare the evolution of the states, the recommended actions, and the actions taken
by the users in these two reputation mechanisms, in order to see how they work differently. The
evolution under the optimal stationary Markov strategy is shown in Table IV, and that of the
proposed strategy is shown in Table V. The first difference is that in the reputation mechanism
with stationary Markov strategies [3], the recommended action is fixed and the action taken
can be different from the recommended action. On the contrary, in the proposed reputation
mechanism, the recommended action is not fixed over time and the action taken is always the
same as the recommended action. The second difference is that in stationary Markov strategies,
the actions taken in different periods are the same as long as the current state is the same. In
contrast, in the proposed strategy, the actions taken can be different even at the same current
state.
TABLE IV
EVOLUTION OF STATES AND ACTIONS TAKEN IN THE OPTIMAL STATIONARY MARKOV STRATEGY.

Period   State   Recommended action (stationary Markov)   Action taken (stationary Markov)
0        (4,6)   Fair                                      Selfish
1        (4,6)   Fair                                      Selfish
2        (6,4)   Fair                                      Selfish
3        (6,4)   Fair                                      Selfish
4        (4,6)   Fair                                      Selfish
5        (4,6)   Fair                                      Selfish
6        (3,7)   Fair                                      Altruistic
7        (1,9)   Fair                                      Altruistic
8        (1,9)   Fair                                      Altruistic
9        (2,8)   Fair                                      Altruistic

TABLE V
EVOLUTION OF STATES, RECOMMENDED ACTIONS, AND ACTIONS TAKEN IN THE PROPOSED STRATEGY.

Period   State   Recommended action (proposed)   Action taken (proposed)
0        (4,6)   Altruistic                      Altruistic
1        (3,7)   Fair                            Fair
2        (5,5)   Altruistic                      Altruistic
3        (4,6)   Fair                            Fair
4        (5,5)   Altruistic                      Altruistic
5        (6,4)   Altruistic                      Altruistic
6        (4,6)   Fair                            Fair
7        (2,8)   Altruistic                      Altruistic
8        (1,9)   Altruistic                      Altruistic
9        (1,9)   Altruistic                      Altruistic

VII. CONCLUSION

In this paper, we proposed a design framework for binary reputation mechanisms that can achieve the social optimum in the presence of reputation update errors. We discovered the
enforceability conditions, which guarantee that the social optimum can be achieved. We proposed
a class of reputation update rules, which can fulfill the enforceability conditions when the
parameters are chosen carefully. Moreover, we constructed the optimal recommended strategy in the reputation mechanism. The computational complexity is greatly reduced, since only three actions are proved to be sufficient for the optimal strategy. Simulation results demonstrate the significant performance gain of the proposed reputation mechanism over state-of-the-art mechanisms, especially when the reputation update error probability is large.
APPENDIX A
PROOF OF THEOREM 1
A. Outline of the proof
We derive the conditions under which the set (Wθ)θ∈ΘN is a self-generating set. Specifically,
we derive the conditions under which any payoff profile v ∈ Wθ is decomposable on (Wθ′)θ′∈ΘN
given θ, for all θ ∈ ΘN .
B. When users have different reputations
1) Preliminaries: We first focus on the states θ with 1 ≤ s1(θ) ≤ N − 1, and derive the conditions under which any payoff profile v ∈ Wθ can be decomposed by (α0 = α^a, α^a · 1N) or (α0 = α^f, α^f · 1N). First, v can be decomposed by (α^a, α^a · 1N) if there exists a continuation payoff function γ : Θ^N → ∪_{θ′∈Θ^N} W_{θ′} with γ(θ′) ∈ W_{θ′}, such that for all i ∈ N and for all αi ∈ A,
\[
v_i = (1-\delta)u_i(\theta,\alpha^a,\alpha^a\cdot\mathbf{1}_N) + \delta\sum_{\theta'}\gamma_i(\theta')q(\theta'|\theta,\alpha^a,\alpha^a\cdot\mathbf{1}_N) \tag{30}
\]
\[
\geq (1-\delta)u_i(\theta,\alpha^a,(\alpha_i,\alpha^a\cdot\mathbf{1}_{N-1})) + \delta\sum_{\theta'}\gamma_i(\theta')q(\theta'|\theta,\alpha^a,(\alpha_i,\alpha^a\cdot\mathbf{1}_{N-1})).
\]
Since we focus on simple continuation payoff functions, all the users with the same future reputation have the same continuation payoff regardless of the recommended action α0, the action profile (αi, α · 1N−1), and the future state θ′. Hence, we write the continuation payoffs for the users with future reputation 1 and 0 as γ1 and γ0, respectively. Consequently, the above decomposability conditions simplify to
\[
v_i = (1-\delta)u_i(\theta,\alpha^a,\alpha^a\cdot\mathbf{1}_N) + \delta\Big[\gamma^1\sum_{\theta':\theta_i'=1}q(\theta'|\theta,\alpha^a,\alpha^a\cdot\mathbf{1}_N) + \gamma^0\sum_{\theta':\theta_i'=0}q(\theta'|\theta,\alpha^a,\alpha^a\cdot\mathbf{1}_N)\Big] \tag{31}
\]
\[
\geq (1-\delta)u_i(\theta,\alpha^a,(\alpha_i,\alpha^a\cdot\mathbf{1}_{N-1})) + \delta\Big[\gamma^1\sum_{\theta':\theta_i'=1}q(\theta'|\theta,\alpha^a,(\alpha_i,\alpha^a\cdot\mathbf{1}_{N-1})) + \gamma^0\sum_{\theta':\theta_i'=0}q(\theta'|\theta,\alpha^a,(\alpha_i,\alpha^a\cdot\mathbf{1}_{N-1}))\Big].
\]
First, consider the case when user i has reputation 1 (i.e. θi = 1). Based on (5), the stage-game payoff is ui(θ, α^a, α^a · 1N) = b − c. The term Σ_{θ′:θ′i=1} q(θ′|θ, α^a, α^a · 1N) is the probability that user i has reputation 1 in the next state. Since user i’s reputation update is independent of the other users’ reputation updates, we can calculate this probability as
\[
\sum_{\theta':\theta_i'=1}q(\theta'|\theta,\alpha^a,\alpha^a\cdot\mathbf{1}_N) = [(1-\epsilon)\beta_1^++\epsilon(1-\beta_1^-)]\Bigg[\sum_{m\in M:\theta_{m(i)}=1}\mu(m)+\sum_{m\in M:\theta_{m(i)}=0}\mu(m)\Bigg] = (1-\epsilon)\beta_1^++\epsilon(1-\beta_1^-) = x_1^+. \tag{32}
\]
Similarly, we can calculate Σ_{θ′:θ′i=0} q(θ′|θ, α^a, α^a · 1N), the probability that user i has reputation 0 in the next state, as
\[
\sum_{\theta':\theta_i'=0}q(\theta'|\theta,\alpha^a,\alpha^a\cdot\mathbf{1}_N) = (1-\epsilon)(1-\beta_1^+)+\epsilon\beta_1^- = 1-x_1^+. \tag{35}
\]
Now we discuss what happens if user i deviates. Since the recommended action α^a is to exert high effort for all the users, user i can deviate to three other actions: “exert high effort for reputation-1 users only”, “exert high effort for reputation-0 users only”, and “exert low effort for all the users”. We can calculate the corresponding stage-game payoff and state transition probabilities under each deviation.
• “Exert high effort for reputation-1 users only” (αi(1, θi) = 1, αi(0, θi) = 0):
\[
u_i(\theta,\alpha^a,(\alpha_i,\alpha^a\cdot\mathbf{1}_{N-1})) = b - c\sum_{m\in M:\theta_{m(i)}=1}\mu(m) = b - c\cdot\frac{s_1(\theta)-1}{N-1}, \tag{38}
\]
\[
\sum_{\theta':\theta_i'=1}q(\theta'|\theta,\alpha^a,(\alpha_i,\alpha^a\cdot\mathbf{1}_{N-1})) = [(1-\epsilon)\beta_1^++\epsilon(1-\beta_1^-)]\frac{s_1(\theta)-1}{N-1} + [(1-\epsilon)(1-\beta_1^-)+\epsilon\beta_1^+]\frac{s_0(\theta)}{N-1}, \tag{39}
\]
\[
\sum_{\theta':\theta_i'=0}q(\theta'|\theta,\alpha^a,(\alpha_i,\alpha^a\cdot\mathbf{1}_{N-1})) = [(1-\epsilon)(1-\beta_1^+)+\epsilon\beta_1^-]\frac{s_1(\theta)-1}{N-1} + [(1-\epsilon)\beta_1^-+\epsilon(1-\beta_1^+)]\frac{s_0(\theta)}{N-1}. \tag{40}
\]
• “Exert high effort for reputation-0 users only” (αi(1, θi) = 0, αi(0, θi) = 1); below, q(·) abbreviates q(θ′|θ, α^a, (αi, α^a · 1N−1)):
\[
u_i(\theta,\alpha^a,(\alpha_i,\alpha^a\cdot\mathbf{1}_{N-1})) = b - c\cdot\frac{s_0(\theta)}{N-1}, \tag{41}
\]
\[
\sum_{\theta':\theta_i'=1}q(\cdot) = [(1-\epsilon)(1-\beta_1^-)+\epsilon\beta_1^+]\frac{s_1(\theta)-1}{N-1} + [(1-\epsilon)\beta_1^++\epsilon(1-\beta_1^-)]\frac{s_0(\theta)}{N-1}, \tag{42}
\]
\[
\sum_{\theta':\theta_i'=0}q(\cdot) = [(1-\epsilon)\beta_1^-+\epsilon(1-\beta_1^+)]\frac{s_1(\theta)-1}{N-1} + [(1-\epsilon)(1-\beta_1^+)+\epsilon\beta_1^-]\frac{s_0(\theta)}{N-1}. \tag{43}
\]
• “Exert low effort for all the users” (αi(1, θi) = 0, αi(0, θi) = 0):
\[
u_i(\theta,\alpha^a,(\alpha_i,\alpha^a\cdot\mathbf{1}_{N-1})) = b, \tag{44}
\]
\[
\sum_{\theta':\theta_i'=1}q(\cdot) = (1-\epsilon)(1-\beta_1^-)+\epsilon\beta_1^+, \tag{45}
\]
\[
\sum_{\theta':\theta_i'=0}q(\cdot) = (1-\epsilon)\beta_1^-+\epsilon(1-\beta_1^+). \tag{46}
\]
Plugging the above expressions into (31), we can simplify the incentive compatibility constraints (i.e. the inequality constraints) to
\[
(1-2\epsilon)\big[\beta_1^+-(1-\beta_1^-)\big](\gamma^1-\gamma^0) \geq \frac{1-\delta}{\delta}\,c, \tag{47}
\]
under all three deviating actions.
Hence, if user i has reputation 1, the decomposability constraints (31) reduce to
\[
v^1 = (1-\delta)(b-c) + \delta\big[x_1^+\gamma^1 + (1-x_1^+)\gamma^0\big], \tag{48}
\]
where v^1 is the payoff of the users with reputation 1, and
\[
(1-2\epsilon)\big[\beta_1^+-(1-\beta_1^-)\big](\gamma^1-\gamma^0) \geq \frac{1-\delta}{\delta}\,c. \tag{49}
\]
Similarly, if user i has reputation 0, the decomposability constraints (31) reduce to
\[
v^0 = (1-\delta)(b-c) + \delta\big[x_0^+\gamma^1 + (1-x_0^+)\gamma^0\big], \tag{50}
\]
and
\[
(1-2\epsilon)\big[\beta_0^+-(1-\beta_0^-)\big](\gamma^1-\gamma^0) \geq \frac{1-\delta}{\delta}\,c. \tag{51}
\]
For the above incentive compatibility constraints (the two inequalities) to hold, we need β+1 − (1 − β−1) > 0 and β+0 − (1 − β−0) > 0, which are part of Condition 1 and Condition 2. Now we will derive the rest of the sufficient conditions in Theorem 1.
The two equalities (48) and (50) determine the continuation payoffs γ1 and γ0 as
\[
\gamma^1 = \frac{1}{\delta}\cdot\frac{(1-x_0^+)v^1-(1-x_1^+)v^0}{x_1^+-x_0^+} - \frac{1-\delta}{\delta}(b-c), \qquad
\gamma^0 = \frac{1}{\delta}\cdot\frac{x_1^+v^0-x_0^+v^1}{x_1^+-x_0^+} - \frac{1-\delta}{\delta}(b-c). \tag{52}
\]
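For intuition, (52) is simply the solution of the linear system formed by (48) and (50); a hypothetical numerical check:

```python
def continuation_payoffs(v0, v1, x0p, x1p, delta, b, c):
    """Solve Eq. (52): the continuation payoffs (gamma0, gamma1) decomposing
    the promised payoffs (v0, v1) under the altruistic action."""
    g1 = ((1 - x0p) * v1 - (1 - x1p) * v0) / (x1p - x0p) / delta - (1 - delta) / delta * (b - c)
    g0 = (x1p * v0 - x0p * v1) / (x1p - x0p) / delta - (1 - delta) / delta * (b - c)
    # Sanity checks against Eqs. (48) and (50):
    assert abs(v1 - ((1 - delta) * (b - c) + delta * (x1p * g1 + (1 - x1p) * g0))) < 1e-9
    assert abs(v0 - ((1 - delta) * (b - c) + delta * (x0p * g1 + (1 - x0p) * g0))) < 1e-9
    return g0, g1
```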
Now we consider the decomposability constraints if we want to decompose a payoff profile v ∈ Wθ using the fair action α^f. Since we focus on decomposition by simple continuation payoff functions, the decomposability constraints are
\[
v_i = (1-\delta)u_i(\theta,\alpha^f,\alpha^f\cdot\mathbf{1}_N) + \delta\Big[\gamma^1\sum_{\theta':\theta_i'=1}q(\theta'|\theta,\alpha^f,\alpha^f\cdot\mathbf{1}_N) + \gamma^0\sum_{\theta':\theta_i'=0}q(\theta'|\theta,\alpha^f,\alpha^f\cdot\mathbf{1}_N)\Big] \tag{53}
\]
\[
\geq (1-\delta)u_i(\theta,\alpha^f,(\alpha_i,\alpha^f\cdot\mathbf{1}_{N-1})) + \delta\Big[\gamma^1\sum_{\theta':\theta_i'=1}q(\theta'|\theta,\alpha^f,(\alpha_i,\alpha^f\cdot\mathbf{1}_{N-1})) + \gamma^0\sum_{\theta':\theta_i'=0}q(\theta'|\theta,\alpha^f,(\alpha_i,\alpha^f\cdot\mathbf{1}_{N-1}))\Big].
\]
Due to space limitations, we omit the details and directly give the simplification of the above decomposability constraints. First, the incentive compatibility constraints (i.e. the inequality constraints) simplify to
\[
(1-2\epsilon)\big[\beta_1^+-(1-\beta_1^-)\big](\gamma^1-\gamma^0) \geq \frac{1-\delta}{\delta}\,c \tag{54}
\]
and
\[
(1-2\epsilon)\big[\beta_0^+-(1-\beta_0^-)\big](\gamma^1-\gamma^0) \geq \frac{1-\delta}{\delta}\,c, \tag{55}
\]
under all three deviating actions. Note that these incentive compatibility constraints are the same as the ones obtained when decomposing the payoffs using the altruistic action α^a.
Then, the equality constraints in (53) can be simplified as follows. For the users with reputation 1, we have
\[
v^1 = (1-\delta)\Big(b-\frac{s_1(\theta)-1}{N-1}c\Big) + \delta\big[x_{s_1(\theta)}^+\gamma^1 + (1-x_{s_1(\theta)}^+)\gamma^0\big], \tag{56}
\]
where
\[
x_{s_1(\theta)}^+ \triangleq \Big[(1-\epsilon)\frac{s_1(\theta)-1}{N-1}+\frac{s_0(\theta)}{N-1}\Big]\beta_1^+ + \epsilon\,\frac{s_1(\theta)-1}{N-1}(1-\beta_1^-). \tag{57}
\]
For the users with reputation 0, we have
\[
v^0 = (1-\delta)\Big(\frac{s_0(\theta)-1}{N-1}b-c\Big) + \delta\big[x_0^+\gamma^1+(1-x_0^+)\gamma^0\big]. \tag{58}
\]
The above two equalities determine the continuation payoffs γ1 and γ0 as
\[
\gamma^1 = \frac{1}{\delta}\cdot\frac{(1-x_0^+)v^1-(1-x_{s_1(\theta)}^+)v^0}{x_{s_1(\theta)}^+-x_0^+} - \frac{1-\delta}{\delta}\cdot\frac{\big(b-\frac{s_1(\theta)-1}{N-1}c\big)(1-x_0^+)-\big(\frac{s_0(\theta)-1}{N-1}b-c\big)(1-x_{s_1(\theta)}^+)}{x_{s_1(\theta)}^+-x_0^+},
\]
\[
\gamma^0 = \frac{1}{\delta}\cdot\frac{x_{s_1(\theta)}^+v^0-x_0^+v^1}{x_{s_1(\theta)}^+-x_0^+} - \frac{1-\delta}{\delta}\cdot\frac{\big(b-\frac{s_1(\theta)-1}{N-1}c\big)x_0^+-\big(\frac{s_0(\theta)-1}{N-1}b-c\big)x_{s_1(\theta)}^+}{x_{s_1(\theta)}^+-x_0^+}. \tag{59}
\]
2) Sufficient conditions: Now we derive the sufficient conditions under which any payoff profile v ∈ Wθ can be decomposed by (α0 = α^a, α^a · 1N) or (α0 = α^f, α^f · 1N). Specifically, we derive the conditions such that for any payoff profile v ∈ Wθ, at least one of the two decomposability constraints (31) and (53) is satisfied. From the preliminaries, we know that the incentive compatibility constraints in (31) and (53) can be simplified into the same constraints:
\[
(1-2\epsilon)\big[\beta_1^+-(1-\beta_1^-)\big](\gamma^1-\gamma^0) \geq \frac{1-\delta}{\delta}\,c, \tag{60}
\]
and
\[
(1-2\epsilon)\big[\beta_0^+-(1-\beta_0^-)\big](\gamma^1-\gamma^0) \geq \frac{1-\delta}{\delta}\,c. \tag{61}
\]
The above constraints impose the following constraint on the discount factor:
\[
\delta \geq \max_{\theta\in\Theta} \frac{c}{c+(1-2\epsilon)\big[\beta_\theta^+-(1-\beta_\theta^-)\big](\gamma^1-\gamma^0)}. \tag{62}
\]
Since γ1 and γ0 should satisfy γ1 − γ0 ≥ ε0 − ε1, the above constraint can be rewritten as
\[
\delta \geq \max_{\theta\in\Theta} \frac{c}{c+(1-2\epsilon)\big[\beta_\theta^+-(1-\beta_\theta^-)\big](\epsilon_0-\epsilon_1)}, \tag{63}
\]
which is part of Condition 3 in Theorem 1.
In addition, the continuation payoffs γ1 and γ0 should satisfy the constraints of the self-generating set, namely
\[
\gamma^1-\gamma^0 \geq \epsilon_0-\epsilon_1, \tag{64}
\]
\[
\gamma^1 + \frac{c}{(N-1)b}\,\gamma^0 \leq z_2, \tag{65}
\]
\[
\gamma^1 - \frac{b}{\frac{N-2}{N-1}b-c}\,\gamma^0 \leq z_3, \tag{66}
\]
with z2 and z3 as defined in (19) and (20).
We can plug the expressions of the continuation payoffs γ1 and γ0 in (52) and (59) into the above constraints. Specifically, if a payoff profile v is decomposed by the altruistic action, the following constraints should be satisfied for the continuation payoff profile to lie in the self-generating set (recall κ1 and κ2 defined before Theorem 1):
\[
\frac{1}{\delta}\cdot\frac{v^1-v^0}{x_1^+-x_0^+} \geq \epsilon_0-\epsilon_1, \tag{$\alpha^a$-1}
\]
\[
\frac{1}{\delta}\Big\{\frac{(1-\kappa_2x_0^+)v^1-(1-\kappa_2x_1^+)v^0}{x_1^+-x_0^+}-\kappa_2(b-c)\Big\} \leq z_2-\kappa_2(b-c), \tag{$\alpha^a$-2}
\]
\[
\frac{1}{\delta}\Big\{\frac{(1+\kappa_1x_0^+)v^1-(1+\kappa_1x_1^+)v^0}{x_1^+-x_0^+}+\kappa_1(b-c)\Big\} \leq z_3+\kappa_1(b-c). \tag{$\alpha^a$-3}
\]
The constraint (αa-1) is satisfied for all v1 and v0 as long as x+1 > x+0, because v1 − v0 ≥ ε0 − ε1, |x+1 − x+0| < 1, and δ < 1.
Since both the left-hand side (LHS) and the right-hand side (RHS) of (αa-2) are smaller than 0, we have
\[
(\alpha^a\text{-}2) \iff \delta \leq \frac{\frac{(1-\kappa_2x_0^+)v^1-(1-\kappa_2x_1^+)v^0}{x_1^+-x_0^+}-\kappa_2(b-c)}{z_2-\kappa_2(b-c)}. \tag{67}
\]
The RHS of (αa-3) is larger than 0. Hence, we have
\[
(\alpha^a\text{-}3) \iff \delta \geq \frac{\frac{(1+\kappa_1x_0^+)v^1-(1+\kappa_1x_1^+)v^0}{x_1^+-x_0^+}+\kappa_1(b-c)}{z_3+\kappa_1(b-c)}. \tag{68}
\]
If a payoff profile v is decomposed by the fair action, the following constraints should be satisfied for the continuation payoff profile to lie in the self-generating set:
\[
\frac{1}{\delta}\Bigg[\frac{v^1-v^0}{x_{s_1(\theta)}^+-x_0^+} - \frac{\frac{s_1(\theta)}{N-1}b+\frac{s_0(\theta)}{N-1}c}{x_{s_1(\theta)}^+-x_0^+}\Bigg] \geq \epsilon_0-\epsilon_1 - \frac{\frac{s_1(\theta)}{N-1}b+\frac{s_0(\theta)}{N-1}c}{x_{s_1(\theta)}^+-x_0^+}, \tag{$\alpha^f$-1}
\]
\[
\frac{1}{\delta}\Bigg[\frac{(1-\kappa_2x_0^+)v^1-(1-\kappa_2x_{s_1(\theta)}^+)v^0}{x_{s_1(\theta)}^+-x_0^+} - \frac{(1-\kappa_2x_0^+)\big(b-\frac{s_1(\theta)-1}{N-1}c\big)-(1-\kappa_2x_{s_1(\theta)}^+)\big(\frac{s_0(\theta)-1}{N-1}b-c\big)}{x_{s_1(\theta)}^+-x_0^+}\Bigg] \leq z_2 - \frac{(1-\kappa_2x_0^+)\big(b-\frac{s_1(\theta)-1}{N-1}c\big)-(1-\kappa_2x_{s_1(\theta)}^+)\big(\frac{s_0(\theta)-1}{N-1}b-c\big)}{x_{s_1(\theta)}^+-x_0^+}, \tag{$\alpha^f$-2}
\]
\[
\frac{1}{\delta}\Bigg[\frac{(1+\kappa_1x_0^+)v^1-(1+\kappa_1x_{s_1(\theta)}^+)v^0}{x_{s_1(\theta)}^+-x_0^+} - \frac{(1+\kappa_1x_0^+)\big(b-\frac{s_1(\theta)-1}{N-1}c\big)-(1+\kappa_1x_{s_1(\theta)}^+)\big(\frac{s_0(\theta)-1}{N-1}b-c\big)}{x_{s_1(\theta)}^+-x_0^+}\Bigg] \leq z_3 - \frac{(1+\kappa_1x_0^+)\big(b-\frac{s_1(\theta)-1}{N-1}c\big)-(1+\kappa_1x_{s_1(\theta)}^+)\big(\frac{s_0(\theta)-1}{N-1}b-c\big)}{x_{s_1(\theta)}^+-x_0^+}. \tag{$\alpha^f$-3}
\]
Since (v1 − v0)/(x+s1(θ) − x+0) > ε0 − ε1, the constraint (αf-1) is satisfied for all v1 and v0 if v1 − v0 ≥ (s1(θ)/(N−1))b + (s0(θ)/(N−1))c. Hence, the constraint (αf-1) is equivalent to
\[
\delta \geq \frac{v^1-v^0-\big(\frac{s_1(\theta)}{N-1}b+\frac{s_0(\theta)}{N-1}c\big)}{(\epsilon_0-\epsilon_1)\big(x_{s_1(\theta)}^+-x_0^+\big)-\big(\frac{s_1(\theta)}{N-1}b+\frac{s_0(\theta)}{N-1}c\big)}, \quad \text{for } \theta \text{ s.t. } \frac{s_1(\theta)}{N-1}b+\frac{s_0(\theta)}{N-1}c \geq v^1-v^0. \tag{69}
\]
For (αf-2), we want the RHS to have the same (negative) sign under any state θ, which is true if

1 - \kappa_2 x_0^+ > 0, \quad 1 - \kappa_2 x_{s_1(\theta)}^+ < 0, \quad \frac{1 - \kappa_2 x_{s_1(\theta)}^+}{1 - \kappa_2 x_0^+} \ge -(\kappa_2 - 1), \quad s_1(\theta) = 1, \ldots, N-1,   (70)

which leads to

x_{s_1(\theta)}^+ > \frac{1}{\kappa_2}, \quad x_0^+ < \frac{1}{\kappa_2}, \quad x_0^+ < \frac{1 - x_{s_1(\theta)}^+}{\kappa_2 - 1}, \quad s_1(\theta) = 1, \ldots, N-1,   (71)

\Leftrightarrow \quad \frac{N-2}{N-1} x_1^+ + \frac{1}{N-1} \beta_1^+ > \frac{1}{\kappa_2}, \quad x_0^+ < \min\left\{ \frac{1}{\kappa_2}, \frac{1 - \beta_1^+}{\kappa_2 - 1} \right\}.   (72)
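The conditions in (72) are easy to check for candidate reputation-update probabilities; a minimal checker (illustrative only, with hypothetical inputs) is sketched below.

    def rhs_sign_conditions_hold(x1p, x0p, beta1p, N, kappa2):
        # Conditions (72), which make the RHS of (alpha_f-2) negative in every state.
        cond1 = (N - 2) / (N - 1) * x1p + beta1p / (N - 1) > 1 / kappa2
        cond2 = x0p < min(1 / kappa2, (1 - beta1p) / (kappa2 - 1))
        return cond1 and cond2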
Since the RHS of (αf-2) is smaller than 0, we have

(αf-2) \Leftrightarrow \delta \le \frac{\frac{(1-\kappa_2 x_0^+) v_1 - (1-\kappa_2 x_{s_1(\theta)}^+) v_0}{x_{s_1(\theta)}^+ - x_0^+} - \frac{(1-\kappa_2 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1} c\right) - (1-\kappa_2 x_{s_1(\theta)}^+)\left(\frac{s_0(\theta)-1}{N-1} b - c\right)}{x_{s_1(\theta)}^+ - x_0^+}}{z_2 - \frac{(1-\kappa_2 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1} c\right) - (1-\kappa_2 x_{s_1(\theta)}^+)\left(\frac{s_0(\theta)-1}{N-1} b - c\right)}{x_{s_1(\theta)}^+ - x_0^+}}.   (73)
For (αf-3), since \frac{1+\kappa_1 x_{s_1(\theta)}^+}{1+\kappa_1 x_0^+} < 1 + \kappa_1, the RHS is always smaller than 0. Hence, we have

(αf-3) \Leftrightarrow \delta \le \frac{\frac{(1+\kappa_1 x_0^+) v_1 - (1+\kappa_1 x_{s_1(\theta)}^+) v_0}{x_{s_1(\theta)}^+ - x_0^+} - \frac{(1+\kappa_1 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1} c\right) - (1+\kappa_1 x_{s_1(\theta)}^+)\left(\frac{s_0(\theta)-1}{N-1} b - c\right)}{x_{s_1(\theta)}^+ - x_0^+}}{z_3 - \frac{(1+\kappa_1 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1} c\right) - (1+\kappa_1 x_{s_1(\theta)}^+)\left(\frac{s_0(\theta)-1}{N-1} b - c\right)}{x_{s_1(\theta)}^+ - x_0^+}}.   (74)
We briefly summarize the requirements on δ obtained so far. To keep the continuation payoff profile in the self-generating set under the decomposition of αa, we have one upper bound on δ resulting from (αa-2) and one lower bound on δ resulting from (αa-3). To keep the continuation payoff profile in the self-generating set under the decomposition of αf, we have two upper bounds on δ resulting from (αf-2) and (αf-3), and one lower bound on δ resulting from (αf-1). First, we want to eliminate the upper bounds, namely make them larger than 1, so that δ can be arbitrarily close to 1.
To eliminate the following upper bound resulting from (αa-2),

\delta \le \frac{\frac{(1-\kappa_2 x_0^+) v_1 - (1-\kappa_2 x_1^+) v_0}{x_1^+ - x_0^+} - \kappa_2 \cdot (b-c)}{z_2 - \kappa_2 \cdot (b-c)},   (75)

we need to have (since z2 − κ2 · (b − c) < 0)

\frac{(1-\kappa_2 x_0^+) v_1 - (1-\kappa_2 x_1^+) v_0}{x_1^+ - x_0^+} \le z_2, \quad \forall v_1, v_0.   (76)

The LHS of the above inequality is maximized when v0 = (z2 − z3)/(κ1 + κ2) and v1 = v0 + (κ1 z2 + κ2 z3)/(κ1 + κ2). Hence, the above inequality is satisfied if

\frac{(1-\kappa_2 x_0^+)\left(\frac{z_2 - z_3}{\kappa_1+\kappa_2} + \frac{\kappa_1 z_2 + \kappa_2 z_3}{\kappa_1+\kappa_2}\right) - (1-\kappa_2 x_1^+)\frac{z_2 - z_3}{\kappa_1+\kappa_2}}{x_1^+ - x_0^+} \le z_2   (77)

\Leftrightarrow \quad \frac{1 - x_1^+ - x_0^+(\kappa_2 - 1)}{x_1^+ - x_0^+} \cdot \frac{\kappa_1}{\kappa_1+\kappa_2} \cdot z_2 \le -\frac{1 - x_1^+ - x_0^+(\kappa_2 - 1)}{x_1^+ - x_0^+} \cdot \frac{\kappa_2}{\kappa_1+\kappa_2} \cdot z_3.   (78)
Since x+0 < (1 − β+1)/(κ2 − 1) < (1 − x+1)/(κ2 − 1), the common factor (1 − x+1 − x+0(κ2 − 1))/(x+1 − x+0) is positive, and we have

z_2 \le -\frac{\kappa_2}{\kappa_1} z_3.   (79)
To eliminate the following upper bound resulting from (αf-2),

\delta \le \frac{\frac{(1-\kappa_2 x_0^+) v_1 - (1-\kappa_2 x_{s_1(\theta)}^+) v_0}{x_{s_1(\theta)}^+ - x_0^+} - \frac{(1-\kappa_2 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1} c\right) - (1-\kappa_2 x_{s_1(\theta)}^+)\left(\frac{s_0(\theta)-1}{N-1} b - c\right)}{x_{s_1(\theta)}^+ - x_0^+}}{z_2 - \frac{(1-\kappa_2 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1} c\right) - (1-\kappa_2 x_{s_1(\theta)}^+)\left(\frac{s_0(\theta)-1}{N-1} b - c\right)}{x_{s_1(\theta)}^+ - x_0^+}},   (80)

we need to have (since the denominator of (80) is smaller than 0)

\frac{(1-\kappa_2 x_0^+) v_1 - (1-\kappa_2 x_{s_1(\theta)}^+) v_0}{x_{s_1(\theta)}^+ - x_0^+} \le z_2, \quad \forall v_1, v_0.   (81)
Similarly, the LHS of the above inequality is maximized when v0 = (z2 − z3)/(κ1 + κ2) and v1 = v0 + (κ1 z2 + κ2 z3)/(κ1 + κ2). Hence, the above inequality is satisfied if

\frac{(1-\kappa_2 x_0^+)\left(\frac{z_2 - z_3}{\kappa_1+\kappa_2} + \frac{\kappa_1 z_2 + \kappa_2 z_3}{\kappa_1+\kappa_2}\right) - (1-\kappa_2 x_{s_1(\theta)}^+)\frac{z_2 - z_3}{\kappa_1+\kappa_2}}{x_{s_1(\theta)}^+ - x_0^+} \le z_2   (82)

\Leftrightarrow \quad \frac{1 - x_{s_1(\theta)}^+ - x_0^+(\kappa_2 - 1)}{x_{s_1(\theta)}^+ - x_0^+} \cdot \frac{\kappa_1}{\kappa_1+\kappa_2} \cdot z_2 \le -\frac{1 - x_{s_1(\theta)}^+ - x_0^+(\kappa_2 - 1)}{x_{s_1(\theta)}^+ - x_0^+} \cdot \frac{\kappa_2}{\kappa_1+\kappa_2} \cdot z_3.   (83)

Since x+0 < (1 − β+1)/(κ2 − 1) < (1 − x+s1(θ))/(κ2 − 1), we again have

z_2 \le -\frac{\kappa_2}{\kappa_1} z_3.   (84)
To eliminate the following upper bound resulting from (αf-3),

\delta \le \frac{\frac{(1+\kappa_1 x_0^+) v_1 - (1+\kappa_1 x_{s_1(\theta)}^+) v_0}{x_{s_1(\theta)}^+ - x_0^+} - \frac{(1+\kappa_1 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1} c\right) - (1+\kappa_1 x_{s_1(\theta)}^+)\left(\frac{s_0(\theta)-1}{N-1} b - c\right)}{x_{s_1(\theta)}^+ - x_0^+}}{z_3 - \frac{(1+\kappa_1 x_0^+)\left(b - \frac{s_1(\theta)-1}{N-1} c\right) - (1+\kappa_1 x_{s_1(\theta)}^+)\left(\frac{s_0(\theta)-1}{N-1} b - c\right)}{x_{s_1(\theta)}^+ - x_0^+}},   (85)

we need to have (since the denominator of (85) is smaller than 0)

\frac{(1+\kappa_1 x_0^+) v_1 - (1+\kappa_1 x_{s_1(\theta)}^+) v_0}{x_{s_1(\theta)}^+ - x_0^+} \le z_3, \quad \forall v_1, v_0.   (86)
Again, the LHS of the above inequality is maximized when v0 = (z2 − z3)/(κ1 + κ2) and v1 = v0 + (κ1 z2 + κ2 z3)/(κ1 + κ2). Hence, the above inequality is satisfied if

\frac{(1+\kappa_1 x_0^+)\left(\frac{z_2 - z_3}{\kappa_1+\kappa_2} + \frac{\kappa_1 z_2 + \kappa_2 z_3}{\kappa_1+\kappa_2}\right) - (1+\kappa_1 x_{s_1(\theta)}^+)\frac{z_2 - z_3}{\kappa_1+\kappa_2}}{x_{s_1(\theta)}^+ - x_0^+} \le z_3   (87)

\Leftrightarrow \quad \frac{1 - x_{s_1(\theta)}^+ + x_0^+(\kappa_1 + 1)}{x_{s_1(\theta)}^+ - x_0^+} \cdot \frac{\kappa_1}{\kappa_1+\kappa_2} \cdot z_2 \le -\frac{1 - x_{s_1(\theta)}^+ + x_0^+(\kappa_1 + 1)}{x_{s_1(\theta)}^+ - x_0^+} \cdot \frac{\kappa_2}{\kappa_1+\kappa_2} \cdot z_3.   (88)

Since 1 − x+s1(θ) + x+0(κ1 + 1) > 0, we have

z_2 \le -\frac{\kappa_2}{\kappa_1} z_3.   (89)
In summary, to eliminate the upper bounds on δ, we only need z2 ≤ −(κ2/κ1)z3, which is satisfied (with equality) since we define z3 ≜ −(κ1/κ2)z2.
Now we derive the analytical lower bound on δ based on the lower bounds resulting from (αa-3) and (αf-1):

(αa-3) \Leftrightarrow \delta \ge \frac{\frac{(1+\kappa_1 x_0^+) v_1 - (1+\kappa_1 x_1^+) v_0}{x_1^+ - x_0^+} + \kappa_1 \cdot (b-c)}{z_3 + \kappa_1 \cdot (b-c)},   (90)

and

\delta \ge \frac{v_1 - v_0 - \left(\frac{s_1(\theta)}{N-1} b + \frac{s_0(\theta)}{N-1} c\right)}{(\varepsilon_0 - \varepsilon_1)\left(x_{s_1(\theta)}^+ - x_0^+\right) - \left(\frac{s_1(\theta)}{N-1} b + \frac{s_0(\theta)}{N-1} c\right)}, \quad \text{for } \theta \text{ s.t. } \frac{s_1(\theta)}{N-1} b + \frac{s_0(\theta)}{N-1} c \ge v_1 - v_0.   (91)
We define an intermediate lower bound based on the latter inequality, together with the inequalities resulting from the incentive compatibility constraints:

\delta' = \max\left\{ \max_{s_1 \in \{1,\ldots,N-1\}:\, \frac{s_1}{N-1} b + \frac{N-s_1}{N-1} c > \varepsilon_0 - \varepsilon_1} \frac{\varepsilon_0 - \varepsilon_1 - \left(\frac{s_1}{N-1} b + \frac{N-s_1}{N-1} c\right)}{(\varepsilon_0 - \varepsilon_1)\left(\frac{N-s_1}{N-1}\beta_1^+ + \frac{s_1-1}{N-1} x_1^+\right) - \left(\frac{s_1}{N-1} b + \frac{N-s_1}{N-1} c\right)}, \; \max_{\theta \in \{0,1\}} \frac{c}{c + (1-2\varepsilon)\left(\beta_\theta^+ - (1-\beta_\theta^-)\right)(\varepsilon_0 - \varepsilon_1)} \right\}.   (92)
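The intermediate bound δ′ can be computed directly from (92); the sketch below is illustrative only (all inputs are assumptions) and expands x+s1 as ((N − s1)β+1 + (s1 − 1)x+1)/(N − 1), as in (92).

    def delta_prime(N, b, c, eps, eps0, eps1, beta_plus, beta_minus, x1_plus):
        gap = eps0 - eps1
        candidates = []
        # First term of (92): over s1 with s1*b/(N-1) + (N-s1)*c/(N-1) > eps0 - eps1.
        for s1 in range(1, N):
            w = s1 / (N - 1) * b + (N - s1) / (N - 1) * c
            if w > gap:
                xs = (N - s1) / (N - 1) * beta_plus[1] + (s1 - 1) / (N - 1) * x1_plus
                candidates.append((gap - w) / (gap * xs - w))
        # Second term of (92): the incentive compatibility bound, as in (63).
        for t in (0, 1):
            candidates.append(
                c / (c + (1 - 2 * eps) * (beta_plus[t] - (1 - beta_minus[t])) * gap))
        return max(candidates)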
Then the lower bound can be written as δ = max{δ′, δ′′}, where δ′′ is the lower bound that we will derive for the case when the users have the same reputation. If the payoffs v1 and v0 satisfy the constraint resulting from (αa-3), namely

\frac{(1+\kappa_1 x_0^+) v_1 - (1+\kappa_1 x_1^+) v_0}{x_1^+ - x_0^+} \le \delta z_3 - (1-\delta)\kappa_1 \cdot (b-c),   (93)

then we use αa to decompose v1 and v0. Otherwise, we use αf to decompose v1 and v0.
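In other words, the choice of the decomposing action reduces to a simple test of (93); the following sketch (illustrative only, with all parameters assumed given) implements this selection rule.

    def decomposing_action(v1, v0, x1p, x0p, kappa1, z3, b, c, delta):
        # Use the altruistic action if (93) holds; otherwise use the fair action.
        lhs = ((1 + kappa1 * x0p) * v1 - (1 + kappa1 * x1p) * v0) / (x1p - x0p)
        if lhs <= delta * z3 - (1 - delta) * kappa1 * (b - c):
            return "altruistic"
        return "fair"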
C. When the users have the same reputation
Now we derive the conditions under which any payoff profile in W1N and W0N can be
decomposed.
If all the users have reputation 1, namely θ = 1N , to decompose v ∈ W1N , we need to find
a recommended action α0 and a simple continuation payoff function γ such that for all i ∈ N
and for all αi ∈ A,
v_i = (1-\delta) u_i(\theta, \alpha_0, \alpha_0 \cdot \mathbf{1}_N) + \delta \sum_{\theta'} \gamma_i(\theta') q(\theta' | \theta, \alpha_0, \alpha_0 \cdot \mathbf{1}_N)   (94)
\ge (1-\delta) u_i(\theta, \alpha_0, (\alpha_i, \alpha_0 \cdot \mathbf{1}_{N-1})) + \delta \sum_{\theta'} \gamma_i(\theta') q(\theta' | \theta, \alpha_0, (\alpha_i, \alpha_0 \cdot \mathbf{1}_{N-1})).
When all the users have the same reputation, the altruistic action αa is equivalent to the fair
action αf . Hence, we use the altruistic action and the selfish action to decompose the payoff
profiles.
If we use the altruistic action αa to decompose a payoff profile v, we have
v_1 = (1-\delta)(b-c) + \delta\left(x_1^+ \gamma_1 + (1-x_1^+)\gamma_0\right),   (95)

and the incentive compatibility constraint

(1-2\varepsilon)\left[\beta_1^+ - (1-\beta_1^-)\right](\gamma_1 - \gamma_0) \ge \frac{1-\delta}{\delta} c.   (96)
Setting γ1 = γ0 + ((1 − δ)/δ) · c/((1 − 2ε)[β+1 − (1 − β−1)]) and noticing that γ0 ∈ [((1 + κ1)(ε0 − ε1) − z3)/κ1, (κ1 z2 + (κ2 − 1)z3)/(κ1 + κ2)], we get a lower bound on v1 that can be decomposed by αa:

v_1 = (1-\delta)(b-c) + \delta\left(\gamma_0 + x_1^+ \cdot \frac{1-\delta}{\delta} \cdot \frac{c}{(1-2\varepsilon)\left[\beta_1^+ - (1-\beta_1^-)\right]}\right)   (97)

\ge (1-\delta)\left(b-c + \frac{c\, x_1^+}{(1-2\varepsilon)\left[\beta_1^+ - (1-\beta_1^-)\right]}\right) + \delta \cdot \frac{(1+\kappa_1)(\varepsilon_0 - \varepsilon_1) - z_3}{\kappa_1}.   (98)
If we use the selfish action αs to decompose a payoff profile v, we have

v_1 = \delta\left(x_1^+ \gamma_1 + (1-x_1^+)\gamma_0\right).   (99)

Since the selfish action is an NE of the stage game, the incentive compatibility constraint is satisfied as long as we set γ1 = γ0. Hence, we have v1 = δγ0. Again, noticing that γ0 ∈ [((1 + κ1)(ε0 − ε1) − z3)/κ1, (κ1 z2 + (κ2 − 1)z3)/(κ1 + κ2)], we get an upper bound on v1 that can be decomposed by αs:

v_1 = \delta \gamma_0 \le \delta \cdot \frac{\kappa_1 z_2 + (\kappa_2-1) z_3}{\kappa_1+\kappa_2}.   (100)
In order to decompose any payoff profile v ∈ W1N, the lower bound on v1 that can be decomposed by αa must be smaller than the upper bound on v1 that can be decomposed by αs, which leads to

(1-\delta)\left(b-c + \frac{c\, x_1^+}{(1-2\varepsilon)\left[\beta_1^+ - (1-\beta_1^-)\right]}\right) + \delta \frac{(1+\kappa_1)(\varepsilon_0 - \varepsilon_1) - z_3}{\kappa_1} \le \delta \frac{\kappa_1 z_2 + (\kappa_2-1) z_3}{\kappa_1+\kappa_2}

\Rightarrow \delta \ge \frac{b-c + \frac{c\, x_1^+}{(1-2\varepsilon)\left[\beta_1^+ - (1-\beta_1^-)\right]}}{b-c + \frac{c\, x_1^+}{(1-2\varepsilon)\left[\beta_1^+ - (1-\beta_1^-)\right]} + \frac{\kappa_1 z_2 + (\kappa_2-1) z_3}{\kappa_1+\kappa_2} - \frac{(1+\kappa_1)(\varepsilon_0 - \varepsilon_1) - z_3}{\kappa_1}}.   (101)
Finally, following the same procedure, we derive the lower bound on δ when all the users
have reputation 0, namely θ = 0N . Similarly, in this case, the altruistic action αa is equivalent
to the fair action αf . Hence, we use the altruistic action and the selfish action to decompose the
payoff profiles.
If we use the altruistic action αa to decompose a payoff profile v, we have
v_0 = (1-\delta)(b-c) + \delta\left(x_0^+ \gamma_1 + (1-x_0^+)\gamma_0\right),   (102)

and the incentive compatibility constraint

(1-2\varepsilon)\left[\beta_0^+ - (1-\beta_0^-)\right](\gamma_1 - \gamma_0) \ge \frac{1-\delta}{\delta} c.   (103)

If we use the selfish action αs to decompose a payoff profile v, we have

v_0 = \delta\left(x_0^+ \gamma_1 + (1-x_0^+)\gamma_0\right).   (104)
Note that when θ = 0N, if we substitute β+0, β−0, x+0 with β+1, β−1, x+1, respectively, the decomposability constraints become the same as those when θ = 1N. Hence, we derive a similar lower bound on δ:

\delta \ge \frac{b-c + \frac{c\, x_0^+}{(1-2\varepsilon)\left[\beta_0^+ - (1-\beta_0^-)\right]}}{b-c + \frac{c\, x_0^+}{(1-2\varepsilon)\left[\beta_0^+ - (1-\beta_0^-)\right]} + \frac{\kappa_1 z_2 + (\kappa_2-1) z_3}{\kappa_1+\kappa_2} - \frac{(1+\kappa_1)(\varepsilon_0 - \varepsilon_1) - z_3}{\kappa_1}}.   (105)
Finally, we can obtain the lower bound on δ when the users have the same reputation as

\delta'' = \max_{\theta \in \{0,1\}} \frac{b-c + \frac{c\, x_\theta^+}{(1-2\varepsilon)\left[\beta_\theta^+ - (1-\beta_\theta^-)\right]}}{b-c + \frac{c\, x_\theta^+}{(1-2\varepsilon)\left[\beta_\theta^+ - (1-\beta_\theta^-)\right]} + \frac{\kappa_1 z_2 + (\kappa_2-1) z_3}{\kappa_1+\kappa_2} - \frac{(1+\kappa_1)(\varepsilon_0 - \varepsilon_1) - z_3}{\kappa_1}}.   (106)
Together with the lower bound δ′ derived for the case when the users have different reputations,
we can get the lower bound δ specified in Condition 3 of Theorem 1.
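Numerically, δ′′ in (106), and hence the overall bound δ = max{δ′, δ′′}, can be evaluated as in the following illustrative sketch (all inputs are assumptions; the two endpoints below are the endpoints of the interval for γ0 used in (98) and (100)).

    def delta_double_prime(b, c, eps, x_plus, beta_plus, beta_minus,
                           kappa1, kappa2, z2, z3, eps0, eps1):
        gamma0_max = (kappa1 * z2 + (kappa2 - 1) * z3) / (kappa1 + kappa2)
        gamma0_min = ((1 + kappa1) * (eps0 - eps1) - z3) / kappa1
        values = []
        for t in (0, 1):
            a = b - c + c * x_plus[t] / (
                (1 - 2 * eps) * (beta_plus[t] - (1 - beta_minus[t])))
            values.append(a / (a + gamma0_max - gamma0_min))  # as in (101) and (105)
        return max(values)  # (106)

    # Overall lower bound on the discount factor:
    #   delta = max(delta_prime(...), delta_double_prime(...))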
APPENDIX B
PROOF OF PROPOSITION 1
We prove that for any self-generating set (Wθ)θ∈ΘN, the maximum payoff in (Wθ)θ∈ΘN, namely max_{θ∈Θ^N} max_{v∈Wθ} max_{i∈N} vi, is bounded away from the social optimum b − c, regardless of the discount factor. In this way, we prove that any equilibrium payoff is bounded away from the social optimum. In addition, we analytically quantify the efficiency loss, which is independent of the discount factor.

Since the strategies are restricted to the subset of actions Aas, in each period all the users receive the same stage-game payoff, either (b − c) or 0, regardless of the matching rule and the reputation profile. Hence, the expected discounted average payoff for each user is the same. More precisely, at any given history ht = (θ0, ..., θt), we have

U_i(\theta^t, \pi_0|_{h^t}, \pi|_{h^t} \cdot \mathbf{1}_N) = U_j(\theta^t, \pi_0|_{h^t}, \pi|_{h^t} \cdot \mathbf{1}_N), \quad \forall i, j \in \mathcal{N},   (107)

for any (π0, π · 1N) ∈ Πf(Aas) × ΠNf(Aas). As a result, when we restrict to the action set Aas, the self-generating set (Wθ)θ∈ΘN satisfies, for any θ and any v ∈ Wθ,

v_i = v_j, \quad \forall i, j \in \mathcal{N}.   (108)
Given any self-generating set (Wθ)θ∈ΘN, define the maximum payoff v̄ as

\bar{v} \triangleq \max_{\theta \in \Theta^N} \max_{v \in \mathcal{W}_\theta} \max_{i \in \mathcal{N}} v_i.   (109)

Now we derive an upper bound on v̄ by examining the decomposability constraints.
To decompose the payoff profile v̄ · 1N, we must find a recommended action α0 ∈ Aas, an action profile α · 1N with α ∈ Aas, and a continuation payoff function γ : ΘN → ∪θ′∈ΘN Wθ′ with γ(θ′) ∈ Wθ′, such that for all i ∈ N and for all αi ∈ A,

\bar{v} = (1-\delta) u_i(\theta, \alpha_0, \alpha \cdot \mathbf{1}_N) + \delta \sum_{\theta'} \gamma_i(\theta') q(\theta' | \theta, \alpha_0, \alpha \cdot \mathbf{1}_N)   (110)
\ge (1-\delta) u_i(\theta, \alpha_0, (\alpha_i, \alpha \cdot \mathbf{1}_{N-1})) + \delta \sum_{\theta'} \gamma_i(\theta') q(\theta' | \theta, \alpha_0, (\alpha_i, \alpha \cdot \mathbf{1}_{N-1})).
Note that we do not require the users’ action α to be the same as the recommended action α0,
and that we also do not require the continuation payoff function γ to be a simple continuation
payoff function.
First, the payoff profile v̄ · 1N cannot be decomposed by a recommended action α0 and the selfish action αs. Otherwise, since γ(θ′) ∈ Wθ′, we would have

\bar{v} = (1-\delta) \cdot 0 + \delta \sum_{\theta'} \gamma_i(\theta') q(\theta' | \theta, \alpha_0, \alpha^s \cdot \mathbf{1}_N) \le \delta \sum_{\theta'} \bar{v} \cdot q(\theta' | \theta, \alpha_0, \alpha^s \cdot \mathbf{1}_N) = \delta \bar{v} < \bar{v},

which is a contradiction.
Since we must use a recommended action α0 and the altruistic action αa to decompose v̄ · 1N, we can rewrite the decomposability constraint as

\bar{v} = (1-\delta)(b-c) + \delta \sum_{\theta'} \gamma_i(\theta') q(\theta' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N)   (111)
\ge (1-\delta) u_i(\theta, \alpha_0, (\alpha_i, \alpha^a \cdot \mathbf{1}_{N-1})) + \delta \sum_{\theta'} \gamma_i(\theta') q(\theta' | \theta, \alpha_0, (\alpha_i, \alpha^a \cdot \mathbf{1}_{N-1})).
Since the continuation payoffs under different reputation profiles θ, θ′ that have the same reputation distribution s(θ) = s(θ′) are the same, namely γ(θ) = γ(θ′), the continuation payoff depends only on the reputation distribution. For notational simplicity, with some abuse of notation, we write γ(s) for the continuation payoff when the reputation distribution is s, write q(s′|θ, α0, (αi, αa · 1N−1)) for the probability that the next state has reputation distribution s′, and write ui(s, α0, (αi, αa · 1N−1)) for the stage-game payoff when the current reputation distribution is s. Then the decomposability constraint can be rewritten as

\bar{v} = (1-\delta)(b-c) + \delta \sum_{s'} \gamma_i(s') q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N)   (112)
\ge (1-\delta) u_i(s, \alpha_0, (\alpha_i, \alpha^a \cdot \mathbf{1}_{N-1})) + \delta \sum_{s'} \gamma_i(s') q(s' | \theta, \alpha_0, (\alpha_i, \alpha^a \cdot \mathbf{1}_{N-1})).
Now we focus on a subclass of continuation payoff functions, and derive the maximum payoff v̄ achievable under this subclass. Later, we will prove that v̄ cannot be increased by choosing other continuation payoff functions. Specifically, we focus on the subclass of continuation payoff functions that satisfy

\gamma_i(s) = x_A, \quad \forall i \in \mathcal{N}, \; \forall s \in S_A \subset S,   (113)
\gamma_i(s) = x_B, \quad \forall i \in \mathcal{N}, \; \forall s \in S_B \subset S,   (114)

where SA and SB are disjoint subsets of the set of reputation distributions S, namely SA ∩ SB = ∅. In other words, we assign the two continuation payoff values to two subsets of reputation distributions, respectively. Without loss of generality, we assume xA ≥ xB.
Now we derive the incentive compatibility constraints. There are three actions to deviate to: the action α0, in which the user does not serve users with reputation 0; the action α1, in which the user does not serve users with reputation 1; and the action α01, in which the user does not serve anyone. The corresponding incentive compatibility constraints for a user i with reputation θi = 1 are

\sum_{s' \in S_A} \left[ q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) - q(s' | \theta, \alpha_0, (\alpha_i = \alpha^0, \alpha^a \cdot \mathbf{1}_{N-1})) \right] (x_A - x_B) \ge \frac{1-\delta}{\delta} \frac{s_0}{N-1} c,
\sum_{s' \in S_A} \left[ q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) - q(s' | \theta, \alpha_0, (\alpha_i = \alpha^1, \alpha^a \cdot \mathbf{1}_{N-1})) \right] (x_A - x_B) \ge \frac{1-\delta}{\delta} \frac{s_1 - 1}{N-1} c,
\sum_{s' \in S_A} \left[ q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) - q(s' | \theta, \alpha_0, (\alpha_i = \alpha^{01}, \alpha^a \cdot \mathbf{1}_{N-1})) \right] (x_A - x_B) \ge \frac{1-\delta}{\delta} c.   (115)
Similarly, the corresponding incentive compatibility constraints for a user j with reputation θj = 0 are

\sum_{s' \in S_A} \left[ q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) - q(s' | \theta, \alpha_0, (\alpha_j = \alpha^0, \alpha^a \cdot \mathbf{1}_{N-1})) \right] (x_A - x_B) \ge \frac{1-\delta}{\delta} \frac{s_0 - 1}{N-1} c,
\sum_{s' \in S_A} \left[ q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) - q(s' | \theta, \alpha_0, (\alpha_j = \alpha^1, \alpha^a \cdot \mathbf{1}_{N-1})) \right] (x_A - x_B) \ge \frac{1-\delta}{\delta} \frac{s_1}{N-1} c,
\sum_{s' \in S_A} \left[ q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) - q(s' | \theta, \alpha_0, (\alpha_j = \alpha^{01}, \alpha^a \cdot \mathbf{1}_{N-1})) \right] (x_A - x_B) \ge \frac{1-\delta}{\delta} c.   (116)
We can summarize the above incentive compatibility constraints as

x_A - x_B \ge \frac{1-\delta}{\delta} c \cdot \rho(\theta, \alpha_0, S_B),   (117)

where

\rho(\theta, \alpha_0, S_B) \triangleq \max_{i \in \mathcal{N}} \max\left\{ \frac{\frac{s_{\theta_i} - 1}{N-1}}{\sum_{s' \in S \setminus S_B} \left[ q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) - q(s' | \theta, \alpha_0, (\alpha_i = \alpha^{\theta_i}, \alpha^a \cdot \mathbf{1}_{N-1})) \right]},   (118)
\frac{\frac{s_{1-\theta_i}}{N-1}}{\sum_{s' \in S \setminus S_B} \left[ q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) - q(s' | \theta, \alpha_0, (\alpha_i = \alpha^{1-\theta_i}, \alpha^a \cdot \mathbf{1}_{N-1})) \right]},   (119)
\frac{1}{\sum_{s' \in S \setminus S_B} \left[ q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) - q(s' | \theta, \alpha_0, (\alpha_i = \alpha^{01}, \alpha^a \cdot \mathbf{1}_{N-1})) \right]} \right\}.   (120)
Since the maximum payoff v̄ satisfies

\bar{v} = (1-\delta)(b-c) + \delta\left[ x_A \sum_{s' \in S \setminus S_B} q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) + x_B \sum_{s' \in S_B} q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) \right],   (121)

to maximize v̄, we choose xB = xA − ((1 − δ)/δ) c · ρ(θ, α0, SB). Since xA ≤ v̄, we have

\bar{v} = (1-\delta)(b-c) + \delta\left[ x_A - \frac{1-\delta}{\delta} c \cdot \rho(\theta, \alpha_0, S_B) \sum_{s' \in S_B} q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) \right]   (122)
\le (1-\delta)(b-c) + \delta\left[ \bar{v} - \frac{1-\delta}{\delta} c \cdot \rho(\theta, \alpha_0, S_B) \sum_{s' \in S_B} q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) \right],   (123)

which leads to

\bar{v} \le b - c - c \cdot \rho(\theta, \alpha_0, S_B) \sum_{s' \in S_B} q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N).   (124)
Hence, the maximum payoff v̄ satisfies

\bar{v} \le b - c - c \cdot \min_{S_B \subset S} \left\{ \rho(\theta, \alpha_0, S_B) \sum_{s' \in S_B} q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) \right\},   (125)
where SB satisfies, for all i ∈ N,

\sum_{s' \in S \setminus S_B} q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) \ge \sum_{s' \in S \setminus S_B} q(s' | \theta, \alpha_0, (\alpha_i = \alpha^0, \alpha^a \cdot \mathbf{1}_{N-1})),
\sum_{s' \in S \setminus S_B} q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) \ge \sum_{s' \in S \setminus S_B} q(s' | \theta, \alpha_0, (\alpha_i = \alpha^1, \alpha^a \cdot \mathbf{1}_{N-1})),
\sum_{s' \in S \setminus S_B} q(s' | \theta, \alpha_0, \alpha^a \cdot \mathbf{1}_N) \ge \sum_{s' \in S \setminus S_B} q(s' | \theta, \alpha_0, (\alpha_i = \alpha^{01}, \alpha^a \cdot \mathbf{1}_{N-1})).   (126)
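For small systems, the minimization in (125) subject to (126) can be carried out by brute force over the subsets SB. The sketch below is illustrative only: it takes hypothetical transition kernels q_on (under the altruistic action by everyone) and q_dev[d] (under each unilateral deviation d, indexed by user–action pairs) as dictionaries over S, and the corresponding gains (the numerators in (118)–(120)) as inputs.

    from itertools import combinations

    def vbar_upper_bound(b, c, S, q_on, q_dev, gains):
        # q_on[s']: probability of reaching distribution s' under the altruistic action;
        # q_dev[d][s']: the same probability under unilateral deviation d;
        # gains[d]: the corresponding numerator in (118)-(120), e.g. (s0 - 1)/(N - 1).
        best = None
        for r in range(1, len(S)):
            for SB in combinations(S, r):
                SA = [s for s in S if s not in SB]
                diffs = {d: sum(q_on[s] - q_dev[d][s] for s in SA) for d in q_dev}
                if any(diff <= 0 for diff in diffs.values()):
                    continue  # this S_B violates (126) (or makes rho undefined)
                rho = max(gains[d] / diffs[d] for d in q_dev)
                loss = rho * sum(q_on[s] for s in SB)
                if best is None or loss < best:
                    best = loss
        return b - c - c * best if best is not None else None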
Following the same logic as in the proof of Proposition 6 in [8], we can prove that no other continuation payoff function achieves a higher maximum payoff.
REFERENCES
[1] Y. Zhang and M. van der Schaar, “Reputation-based incentive protocols in crowdsourcing applications,” in Proceedings of
the IEEE INFOCOM, ser. INFOCOM ’12, 2012.
[2] C.-J. Ho, Y. Zhang, and M. van der Schaar, “Towards social norm design for crowdsourcing markets,” in Proceedings of
the AAAI Human Computation Workshop, ser. HCOMP’2012, 2012.
[3] Y. Zhang and M. van der Schaar, “Strategic learning and robust protocol design for online communities with selfish users,”
IEEE Transactions on Automatic Control, 2011.
[4] M. Kandori, “Social norms and community enforcement,” Review of Economic Studies, vol. 59, no. 1, pp. 63–80, 1992.
[5] A. Blanc, Y.-K. Liu, and A. Vahdat, “Designing incentives for peer-to-peer routing,” in Proceedings of the IEEE INFOCOM,
ser. INFOCOM ’05, 2005.
[6] M. Feldman, C. Papadimitriou, J. Chuang, and I. Stoica, “Free-riding and whitewashing in peer-to-peer systems,” IEEE
Journal on Selected Areas in Communications, vol. 24, no. 5, pp. 1010–1019, 2006.
[7] B. Q. Zhao, J. C. S. Lui, and D.-M. Chiu, “Analysis of adaptive incentive protocols for p2p networks,” in Proceedings of
the IEEE INFOCOM, ser. INFOCOM ’09, 2009.
[8] C. Dellarocas, “Reputation mechanism design in online trading environments with pure moral hazard,” Information Systems
Research, vol. 16, no. 2, pp. 209–230, 2005.
[9] D. Fudenberg, D. K. Levine, and E. Maskin, “The folk theorem with imperfect public information,” Econometrica, vol. 62,
no. 5, pp. 997–1039, 1994.
[10] J. Hörner, T. Sugaya, S. Takahashi, and N. Vieille, “Recursive methods in discounted stochastic games: An algorithm for δ → 1 and a folk theorem,” Econometrica, vol. 79, no. 4, pp. 1277–1318, 2011.
[11] D. Abreu, D. Pearce, and E. Stacchetti, “Toward a theory of discounted repeated games with imperfect monitoring,”
Econometrica, vol. 58, no. 5, pp. 1041–1063, 1990.
[12] G. Mailath and L. Samuelson, Repeated Games and Reputations: Long-run Relationships. Oxford, U.K.: Oxford University
Press, 2006.