Biology & Philosophy, ISSN 0169-3867, Biol Philos, DOI 10.1007/s10539-012-9327-1

An evolutionary perspective on the long-term efficiency of costly punishment


Ulrich J. Frey & Hannes Rusch


Received: 24 October 2011 / Accepted: 24 May 2012

© Springer Science+Business Media B.V. 2012

Abstract Many studies show that punishment, although able to stabilize cooperation

at high levels, destroys gains, which makes it less efficient than alternatives

with no punishment. Standard public goods games (PGGs) in fact show exactly

these patterns. However, both evolutionary theory and real world institutions give

reason to expect institutions with punishment to be more efficient, particularly in the

long run. Long-term cooperative partnerships with punishment threats for non-cooperation

should outperform defection-prone non-punishing ones. This article

demonstrates that fieldwork data from hunter-gatherers, common pool resource

management cases and even PGGs support this hypothesis. Although earnings in

PGGs with a punishment option may be lower at the beginning, efficiency increases

dramatically over time. Most ten-period PGGs cannot capture this change because

their time horizon is too short.

Keywords Efficiency · Punishment · Public goods games · Cooperation · Hunter-gatherer · Evolution

Introduction

Since Darwin, the question as to why humans and other animals (even bacteria, see

e.g. Brockhurst et al. 2008) engage in cooperative behavior has led to many theories

on how to explain this evolutionary puzzle. Most notable are approaches that focus

on direct reciprocity (Guala 2012; Burnham and Johnson 2005; Gachter and

Herrmann 2009), indirect reciprocity (Leimar and Hammerstein 2001; Rockenbach

and Milinski 2006), spatial considerations (Helbing and Yu 2009) and cultural

group selection (Boyd et al. 2003). It has been noted (West et al. 2011) that all of

these can be subsumed under inclusive fitness theory (Hamilton 1964a, b).

U. J. Frey (✉) · H. Rusch

Center for Philosophy and the Foundations of Science, Justus-Liebig-University Giessen,

Rathenaustrasse 8, 35394 Giessen, Germany

e-mail: [email protected]

This article confines itself to the questions: Which evolutionarily plausible settings explain the peculiarities of cooperative behavior in humans? And in

particular: why do humans engage in punishment since it is individually costly? We

suggest that humans punish because it has been more efficient in our ancestral

environment: even taking its costs into account, punishment increases individual

returns. This suggestion is controversial, since most experimental laboratory studies

conclude that punishment is inferior in efficiency to non-punishment.

Cooperative behavior of humans has been addressed in many ways during the last

decades. One way has been to conduct experimental tests in the laboratory (e.g.

Rockenbach and Milinski 2006); another has been tests under artificial conditions in

the field (e.g. Cardenas 2003); a third is the study of real world systems (Ostrom

1990). In the laboratory, the most frequently used experimental design has been

public goods games (PGGs; see e.g. Fehr and Fischbacher 2003). This design nicely

captures the social dilemma involved, without being too complex. This article

restricts itself to PGGs, because they resemble real world cooperation problems

most precisely. Close resemblance is important because ecological validity is indeed

a problem as will be discussed later. The literature on PGGs is enormous; for

comprehensive reviews and surveys see (Zelmer 2003; Ledyard 1995; Chaudhuri

2011; Gachter and Herrmann 2009).

This paper is organized as follows: the following section (‘‘Punishment as a

means to increase cooperation’’) explains why sanctions, at first glance, pose a

puzzle to game theory and evolutionary biology. The section ‘‘Is punishment really

less efficient?’’ discusses evidence from the laboratory (PGGs) concerning

efficiency of punishment, focusing particularly on the time horizon and the trend

of punishment. The section ‘‘Punishment in social-ecological systems’’ introduces

results from social-ecological systems where sanctions are surprisingly rare and

cheap at the same time, supporting our hypothesis. The ‘‘Punishment in hunter-gatherer

societies’’ section analyzes data from hunter-gatherers, again suggesting

that long-term cooperation supported by punishment rather than short-term

defection is more efficient. The converging evidence from these three sections

leads to the hypothesis that punishment is more efficient than no sanctions

(‘‘Hypothesis: punishment is efficient and attuned to long-term repeated interactions’’).

‘‘Methods’’ describes the laboratory data that is analyzed. ‘‘Results’’

presents evidence that—in the long run—efficiency (earnings) is higher with

sanctions than without sanctions. These findings are discussed in section

‘‘Discussion’’.

Punishment as a means to increase cooperation

Even though we all witness that punishing behavior is ubiquitous in humans,

substantial theoretical problems remain. On the one hand, the rational payoff

maximization assumption predicts no cooperation in one-shot, anonymous encounters.

However, if participants interact repeatedly while the number of interactions is

not known, cooperation and punishment may be rational as predicted by the folk


theorems (Binmore 2006). Even when the number of interactions is known—as in

most 10-period PGGs—few people engage in backward induction. Since most

people do not behave as if others are perfectly rational, cooperation again becomes

possible (Gintis 2009).

Still, the dominant strategy is zero contribution at all times for all participants in

standard PGGs. The same logic applies to punishment. Nobody should punish since

this incurs costs, thus representing a second-order dilemma. Therefore, a payoff

maximizing player should not punish at all, at least not in one-shot anonymous

encounters. Evolutionary theory essentially predicts the same since each individual

should maximize his payoff in terms of inclusive fitness (compare e.g. West et al.

2011). Therefore, the behavior actually displayed in the laboratory and the real

world makes the problem of punishment even more puzzling: why does anyone

punish at all if it decreases overall efficiency and increases individual costs?

However, evolutionary theory also predicts the existence of mechanisms for coping

with free riders (Clutton-Brock and Parker 1995). In some situations this may

amount to nothing more than simply walking away (Gurven 2004), in others this

might be punishment.

‘‘Punishment’’ has multiple meanings in the literature. In this paper it is short for

‘‘costly punishment’’: punishment that may have payoff advantages for the punisher

even with costs included (therefore it is selfish punishment). Costly punishment

hypotheses assume that there is an ultimate benefit for the punisher, namely

increased inclusive fitness (West et al. 2007). In sum, punishment is an enforcement

mechanism to prevent free riders from reaping the rewards of their strategy.

The existing literature is not in agreement concerning the value of punishment

(P). Although the well-known decline of contributions in PGGs in treatments

without punishment can be stopped by punishment while also stabilizing

contributions at a high level, it nevertheless seems to be an inferior alternative to

non-punishment (NP, see e.g. Fehr and Gachter 2002). Why is that? Punishment

destroys welfare in a twofold manner; both the punisher and the punished lose part

of their earnings. The punishment costs are subtracted from their respective

earnings. Therefore, most studies conclude that earnings are lower than in

non-punishment treatments.
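
The twofold welfare loss described above can be made concrete with a minimal sketch of a single PGG period. The function name `pgg_payoffs` and all parameter values (endowment of 20 tokens, marginal per capita return of 0.4, punishment effectiveness of 1:3) are illustrative assumptions that merely resemble common designs; they do not reproduce the setup of any particular study.

```python
def pgg_payoffs(contributions, punishment, endowment=20.0, mpcr=0.4, effectiveness=3.0):
    """Return one-period payoffs in a public goods game with costly punishment.

    contributions[i] : tokens player i puts into the public good
    punishment[i][j] : tokens player i spends on punishing player j
    effectiveness    : tokens deducted from the target per token spent (1:3 -> 3.0)
    All default values are illustrative assumptions.
    """
    n = len(contributions)
    public_return = mpcr * sum(contributions)  # each player receives the same return
    payoffs = []
    for i in range(n):
        spent = sum(punishment[i])  # cost borne by the punisher
        received = effectiveness * sum(punishment[j][i] for j in range(n))
        payoffs.append(endowment - contributions[i] + public_return - spent - received)
    return payoffs


# Three cooperators contribute everything; the fourth player free rides.
contributions = [20, 20, 20, 0]
no_punishment = [[0] * 4 for _ in range(4)]
# Each cooperator spends one token on punishing the free rider.
with_punishment = [[0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0]]

np_payoffs = pgg_payoffs(contributions, no_punishment)
p_payoffs = pgg_payoffs(contributions, with_punishment)
welfare_destroyed = sum(np_payoffs) - sum(p_payoffs)
print(np_payoffs, p_payoffs, welfare_destroyed)
```

In this hypothetical period, the three tokens spent by the punishers and the nine tokens deducted from the free rider are both lost to the group. Within a single period punishment can therefore only lower total earnings; any efficiency gain has to come from higher contributions in later periods.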

Is punishment really less efficient?

However, we believe this to be only part of the picture, for three reasons. First,

PGGs are short: they usually last only 10 periods. Second, the

results have been interpreted with an unfortunate focus on static measures instead of

also taking into account the trend over time. Third, there has been a failure to draw

the appropriate conclusions from expenditures that were extremely high at the

beginning but then low afterwards.

To make this argument explicit, Table 1 lists major

studies dealing with the efficiency of punishment and thus presents the facts on

which our analysis is based.

Table 2 (below)


Table 1 Comparison of studies on efficiency of P

| Reference | Partner/stranger | Periods | Efficiency | P or NP more efficient? | Difference in earnings (tokens or percentage) | Sign. of difference |
|---|---|---|---|---|---|---|
| Gachter et al. (2008) | Partner | 10 | 1:3 (a) | NP (overall) | NP +4.68 MU per period | p = 0.0329 |
| | Partner | 50 | 1:3 (a) | P (from period 7 onwards) | P +2.98 MU per period | p = 0.0065 |
| Gurerk et al. (2006) (calculated from SOM) | Partner | 30 | 1:3 | NP (overall) | NP +17 tokens (60:77) | Not reported |
| Herrmann et al. (2008) (SOM) | Partner | 10 | 1:3 | NP | 13 subject pools negative (-0.4 % to -57.9 %); 3 positive (+9.1 % to +0.5 %) | Not reported |
| Egas and Riedl (2008) | Stranger | 6 | 1:1, 1:2, 3:1, 3:3 | NP (overall) | Relative gains of P versus NP | No sig. in- or decrease over 6 periods |
| Page et al. (2005) | Regrouping according to rank | 20 | 1:4 | P (overall) | 77 % (NP) versus 81 % (P) of max. possible earnings | Not reported |
| Bochet et al. (2006) | Partner | 10 | 1:4 | NP (overall) (P higher from period 8 onwards) | NP +0.31 experimental Dollars | Not reported |

(a) A punishment effectiveness (P) of 1:3 means that one token invested by the punisher reduces 3 tokens of the punished player


Summarizing Tables 1 and 2 (the latter includes those studies that are analyzed in

greater detail), the majority of studies concerning the efficiency of punishment sees

punishment as a necessary evil to keep contributions high and stable. Since it

destroys welfare, it is generally considered to be inferior in efficiency to NP

treatments and this claim is seemingly well supported by experimental data, as

presented above and below. Only 3 out of 9 treatments in Tables 1 and 2 can claim a

higher efficiency of P, whereas 6 see the NP baseline outperform P institutions. One

study (Egas and Riedl 2008) does not find a significant difference and another one

(Nikiforakis and Normann 2008) differentiates between lower and higher

effectiveness of punishment. However, as soon as attention is focused on later

periods (the long-term perspective) the results begin to change: in 8 compared to 6

treatments P is more efficient than NP in some later period—and always

consistently so onwards.

Apart from efficiency, punishment as a mechanism to deter free riders faces two

other problems: antisocial punishment and counter-punishment. Recent research has

pointed out that humans do not punish ‘‘rationally’’. The theoretical expectation that

punishment should be directed against those that free ride the most is violated.

Punishment is often so-called antisocial punishment. Antisocial punishment is

sanctioning people who behave prosocially, that is, who contribute the same amount

or more than the punisher to the public good (Herrmann et al. 2008). A quarter of all

investments in punishment may be antisocial punishment (Nikiforakis 2008). A second

strategy in humans seems to consist in punishing those that punished the punishing

player previously. Reasons for such counter-punishment seem to be partly revenge,

partly strategic attempts to reduce one’s own punishment (Nikiforakis 2008). Such

behavior destroys a large part of the motivation to cooperate as well as part of the gains.

Before we continue the analysis of the efficiency of P in the laboratory, sanctioning in

the real world will be explored. The first example, complex social-ecological systems,

demonstrates that punishment is a precondition for success and can be very efficient.

The second example, social interactions in hunter-gatherer societies, shows the

evolutionary setting in which punishment mechanisms of humans evolved.

Punishment in social-ecological systems

One persistent criticism of laboratory experiments is their lack of ecological

validity. One example is the restart effect in PGGs (Andreoni 1988). If subjects are

unexpectedly told that the PGG starts anew, groups in partner treatments largely

repeat their behavior from the first round, increasing their contributions considerably

again. Criticism is not only limited to artificial settings, but extends to the

selection of participants as well. More than 96 % of the subjects in leading

psychology journals are from western, industrialized countries (and most of them

are US-American undergraduates) who may well constitute an outlier group in

cooperative behavior (Henrich et al. 2010). For these reasons, real world examples

of punishment are important.

More than 20 years ago, Ostrom convincingly demonstrated with a worldwide data set of over 80 institutions managing common pool resources such as irrigation systems or fisheries that monitoring and punishment (here called sanctions) are essential for the success of cooperative groups (Ostrom 1990). This has been found to be a valid conclusion over and over again, including other common pool resource systems such as forests (e.g. Chhatre and Agrawal 2008). Consistently, systems without punishment mechanisms face severe rule-breaking and, in the end, cooperation and hence the systems break down. Therefore, the existence of sanctions is a sine qua non condition for successful management (Gibson et al. 2005).

Table 2 Comparison of studies analyzed here

| Reference | N | GS | E | M | T | F | P/S | P/NP | Difference in earnings between P and non-P | Significance of difference and t value of t test |
|---|---|---|---|---|---|---|---|---|---|---|
| Sefton et al. (2007) | 144 | 4 | 6 | 0.4 | 10+10 | 1:2 | P | NP (overall) | Period 11 cmp. to 20 in P treatment: average increase = 10.2 %; (P vs. NP earnings 11–20) | p = 0.044 (t = 2.393); p < 0.0001 (t = 6.888) |
| Fehr and Gachter (2000) | 112 | 4 | 20 | 0.4 | 10+10 | 1:2 | S | NP (P from period 18 onwards) | Not reported | Not reported |
| | | 4 | 20 | 0.4 | 10+10 | 1:2 | P | NP (P from period 4 onwards) | Not reported | Not reported |
| Nikiforakis and Normann (2008) | 120 | 4 | 20 | 0.6 | 10 | 1:1, 1:2 | P | NP | P 1:1 never; P 1:2 from period 8 onward | Not reported |
| | | 4 | 20 | 0.6 (derived) | 10 | 1:3, 1:4 | P | P | P 1:3 from period 6 onward; P 1:4 from period 4 onward | Not reported |

N participants, GS group size, E endowment, T periods, M marginal per capita return, F punishment effectiveness, P/S partner or stranger, P/NP P or NP more efficient

Besides establishing that sanctions are a necessity, one other result of common

pool resource research is important. Successful institutions are able to minimize expenditures on punishment by adapting their rules to local circumstances (e.g.

‘‘The initial sanctions used in these systems are surprisingly low’’, Ostrom 1990,

p. 94). This is achieved by a system of graduated sanctions. Additionally, due to

smart monitoring the number of sanctions is often very low. One example taken

from irrigation may demonstrate this. In some systems, farmers have an allocated

time slice when they are allowed to water their fields. When it is their time, they

open the gate to their fields thus diverting the flow of water. Each farmer knows that

the farmer who is next in line waits for the time slice to end so that he can then

water his fields. Such a rotation rule embeds monitoring within the system and does

not require outside help—thus reducing costs to a minimum. Defection will be

detected within minutes by the person concerned (who has the highest motivation to

end that particular defecting behavior) and will subsequently be punished by the

community (Ostrom 1990). However, such well adjusted rules develop neither

automatically nor quickly. It takes years (sometimes centuries) to refine rule sets to

minimize costs for the community and to guarantee fair treatment to all participants.

Often enough, communities fail to develop such adapted rules with low transaction

costs.

Still, this evidence supports unequivocally the claim that punishment is a key

element of human cooperative and coordinative interaction, because sanctions are

definitively one critical aspect of the success of institutions and groups. Furthermore,

sanctions are often efficient and associated with low costs for both punisher

and punished (Ostrom 1992). Our hypothesis (see ‘‘Hypothesis: punishment is

efficient and attuned to long-term repeated interactions’’ below) suggests that

strategies and efficiency may be tuned towards such long-term cooperation and

stresses the pivotal role of punishment. But is this hypothesis compatible with the

settings of hunter-gatherer social interactions? How universal are efficient

punishment institutions?

Punishment in hunter-gatherer societies

It seems certain that humans lived for millions of years in small groups as hunter-gatherers

where everyone knew everyone else (Hill 2001). Exogamy and long-range

trade existed but group members still knew each other. Our argument focuses on

two structural characteristics of such a supposed setting. First, social interactions are

frequent, repeated and long-term, often over years or decades, and include many

people, including kin. Second, the payoff structure of most interactions is skewed

towards cooperation—cooperation pays much more than defection. A simple


example would be supervising children while another individual gathers food, i.e.

classical division of labor. In this example, leaving the child would yield a small,

short-term benefit while at the same time it would bring an end to a profitable

mutual partnership. From these two structural properties we infer that long-term

alliances and partnerships should thrive and render substantial advantages to a

cooperative person while punishment should be inexpensive and rare. Although the

problem of free riding persists, free riders can only gain a very small advantage by

cheating in a few transactions—as soon as they are found out, they are publicly

denounced. If they continue cheating, they gradually lose access to sharing networks

and are shunned and excluded from the group in more extreme cases. Therefore,

such small advantages quickly turn into huge losses if compared with the

perspective of years of cooperation and mutual gain.
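
The argument that a small cheating advantage turns into a large loss over a long horizon can be illustrated with a minimal sketch. It assumes, purely for illustration, a constant gain per period from cooperation, a fixed extra bonus from cheating, and complete exclusion after a fixed number of periods; the function names and all numbers are hypothetical and not taken from ethnographic data.

```python
def cumulative_payoff(horizon, gain_per_period, cheat_bonus=0.0, periods_until_exclusion=None):
    """Total payoff over `horizon` periods.

    A cooperator (periods_until_exclusion=None) earns gain_per_period every period.
    A cheater earns an additional cheat_bonus per period, but is excluded after
    periods_until_exclusion periods and earns nothing afterwards.
    """
    if periods_until_exclusion is None:
        active_periods = horizon
    else:
        active_periods = min(horizon, periods_until_exclusion)
    return active_periods * (gain_per_period + cheat_bonus)


def break_even_horizon(gain_per_period, cheat_bonus, periods_until_exclusion):
    """Smallest horizon at which steady cooperation earns strictly more than cheating."""
    if gain_per_period <= 0:
        raise ValueError("cooperation must yield a positive gain per period")
    horizon = 1
    while (cumulative_payoff(horizon, gain_per_period)
           <= cumulative_payoff(horizon, gain_per_period, cheat_bonus, periods_until_exclusion)):
        horizon += 1
    return horizon


# Hypothetical values: cooperation yields 10 units per period, cheating adds 2,
# and the cheater is excluded from the sharing network after 3 periods.
print(break_even_horizon(10, 2, 3))
print(cumulative_payoff(100, 10), cumulative_payoff(100, 10, 2, 3))
```

Under these assumed values the cheater is ahead only while still tolerated; once excluded, the cooperator's lead grows with every additional period of the partnership.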

Two alternative hypotheses have been suggested. The first posits that the main

reason for food sharing is costly signaling by male hunters (Hawkes and Bird 2002).

Hence, food sharing would not be a cooperative activity, but would increase

individual reproductive benefits, because successful hunters have more or harder

working women (Hawkes and Bird 2002). Still, this and the hypothesis above are

complementary, not conflicting (Hawkes 1991). The second suggestion sees intra-group

cooperation as mainly driven by conflicts between groups (Bowles 2006,

2009) e.g. over mates or other resources (Seabright 2010, pp. 60–64). Such conflicts

were both frequent and lethal as shown by archaeological and ethnographic

evidence (Ember 1978).

However, we think that the asymmetries in payoffs coupled with long-term

relationships as described above are the most relevant characteristics for the

explanation of cooperation in food-sharing. In addition, cooperation in food-sharing

does not run contrary to claims that conflicts over other resources are quite frequent

(e.g. mate selection, Voland 2009).

These characteristics are in fact in place in hunter-gatherer societies. First, the

time horizon is clearly a very long one—covering many years. For example, young

hunters lacking hunting skills receive their full share of food throughout the years in

which they learn to contribute game to the group (Hill 2001). Second, the payoff

asymmetry is large and even relevant for survival. Especially food sharing is a case

in point. Successful hunts result in quantities of meat that the family of the hunter

often cannot consume completely. Therefore, these surplus calories are of much less

use to this particular individual or family than to the family of another hunter, who

may have been unsuccessful that day. The importance of cooperation when payoffs

are asymmetrical is underlined by the fact that in hunter-gatherer societies even

experienced hunters are—depending on prey size and numbers of hunters—

successful on only 10–50 % of all days on average (Hill and Hurtado 2009).
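
A short sketch may illustrate why sharing matters when individual success rates are this low. It assumes, as a simplification not made in the text, that hunters succeed independently and with the same probability; the function name `prob_any_success` and the group sizes are hypothetical.

```python
def prob_any_success(p_success, n_hunters):
    """Probability that at least one of n hunters is successful on a given day,
    assuming independent and identical success chances (a simplifying assumption)."""
    return 1.0 - (1.0 - p_success) ** n_hunters


# Daily success rates spanning the reported 10-50 % range, for hypothetical group sizes.
for p in (0.1, 0.3, 0.5):
    chances = [round(prob_any_success(p, n), 3) for n in (1, 4, 8)]
    print(p, chances)
```

Under this assumption, a family relying on a single hunter with a 50 % daily success rate goes without meat on half of all days, whereas a sharing network of four such hunters has meat available on roughly 94 % of days.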

An elegant evolutionary solution to the problem of unpredictable hunting success

is a tightly knit system of long-term reciprocity interwoven with kinship relations.

Although this solution addresses the unpredictability of hunting

success, the problem of free riding is only partly solved.

Even though it is very easy in small communities to see who does not contribute, and

even though the advantages of long-term cooperation are massive in comparison to

short-term free riding, there still are free riders (Gurven 2004).


The theoretical considerations above predict terminating non-profitable partnerships

as the most efficient solution for dealing with free riders (see also Semmann

et al. 2003; Aktipis 2011). What is found in many hunter-gatherer societies is

something very similar: defectors are ‘‘shunned’’—that is, they are excluded from

social reciprocal networks (Gurven 2004). Defection, therefore, is not tolerated in

most societies (with a few exceptions, see Gurven 2004). First punishment measures

such as social pressure (via gossip, name calling, contempt, etc.) are quickly

intensified until the defector provides an equal share. If this does not happen, which

is very rare, then the termination of cooperation is the ultimate punishment; the

defector is excluded from the community. Therefore, punishment often consists in

the removal of benefits of cooperation. Social pressure is simple and inexpensive to

the punishers, but costly to the punished, particularly if a group works together as a

group. However, if it comes to a rare exclusion, the loss of a member may be

expensive for the group as well. This argument has to remain somewhat speculative,

since there is little hard data. The existing data, however, points in the direction of

the argument above:

But it is fair to say that there is currently no evidence that cooperation is

sustained by strong negative reciprocity in small societies. And whatever

evidence there is, it rather points in the direction of cheap mechanisms like

ostracism and coalitional punishment. (Guala 2012, p. 11).

To sum up: the described setting of small groups whose members are dependent

on one another for reproduction and daily survival as well as the above

considerations serve as the basis for our hypothesis. Assuming that humans lived

for millions of years in such or similar settings, it seems reasonable to assume that

long-term cooperation rather than short-term defection is the key for each individual

to reap the highest benefits.

These considerations are strongly supported by the strand of research discussed

above (‘‘Punishment in social-ecological systems’’): the ability of many modern

small communities to overcome social dilemmas in many real world settings.

Among the success factors are low cost sanctions, graduated social punishment

(shunning) and clear group boundaries which allow exclusion (Ostrom 1990).

Hypothesis: punishment is efficient and attuned to long-term repeated

interactions

Both evolutionary theory—as argued above (‘‘Is punishment really less efficient?’’)—

and social-ecological systems (‘‘Punishment in social-ecological systems’’), where

cooperation is critical and free riders pose a danger, support the claim of

Hypothesis 1: Punishment institutions should be more efficient in payoff than

non-punishment institutions in the long run.

If this is true, environments with punishment options should generate higher

earnings even with costs subtracted than corresponding settings without punishment.

With regard to hypothesis 2, this should not be observable in general, but only

if interactions are long-term.


Hypothesis 2: The highest efficiency should be linked to long-term repeated

interactions within the in-group. The lowest efficiency is therefore expected for

short-term interactions with strangers. There is a continuous gradation between

these extremes.

Thus, we expect to find a higher efficiency of punishment treatments in

laboratory experiments in the long run and a competitive edge of sanctioning

institutions in the real world. To date, this seems to be a decided case against punishment for most researchers and an open question to others:

Hence, whether punishment opportunities yield welfare improvements irrespective

of punishment effectiveness in the long run remains an open question.

(Nikiforakis and Normann 2008, p. 365).

As argued above, it is important to keep in mind the time frame—which ranges

from minutes in laboratory PGGs to years in hunter-gatherer partnerships and to

centuries in common pool resource management. More than 20 years ago,

increasing the time horizon to foster cooperation was suggested by Axelrod; he

dubbed it ‘‘the shadow of the future’’ (Axelrod 1984/2000). Since the time horizon

in laboratory experiments is extremely short and punishment is expensive for the

punisher in comparison to real world settings (see ‘‘Punishment in social-ecological

systems’’ and ‘‘Punishment in hunter-gatherer societies’’) it may well be that a

higher efficiency of punishment treatments is barely visible.

Methods

Data from various previously published standard laboratory PGG studies is

reanalyzed here. Using these independent data has several advantages:

• the sample size is larger,

• possible biases are reduced (e.g. expectancy effects, see Frey 2007),

• effects are more robust if found in all samples,

• data quality is high since all work has undergone peer review processes.

The focus is on three studies: Nikiforakis and Normann (2008), Fehr and Gachter

(2000), Sefton et al. (2007); for details see Table 2. The data sets were chosen

because they concentrate on efficiency in punishment while adding baselines. Also,

data availability is an issue, but wherever possible, additional studies are used to

underscore a particular point of our argument.

Results

Earnings are higher in the long run in P treatments

Recall hypothesis 1: punishment institutions should be more efficient in payoff than

non-punishment institutions in the long run.


A precondition for all later analyses is the question as to whether the time horizon

in PGGs is long enough to observe all relevant processes pertaining to efficiency.

One study (Herrmann et al. 2008, SOM) reports that NP treatments are clearly

superior in efficiency to P, since 13 out of 16 subject pools in different countries

earned less money when punishment was available (see Table 1). Moreover, there is

a large asymmetry: the worst punishment treatment is -57.9 % in efficiency

(percentage change compared to the NP baseline), whereas the best punishment

treatment is +9.1 % in comparison to NP.

However, a different picture emerges when the trend in time is analyzed. All but

the worst pool in average earnings (Muscat) show a clearly positive trend in

earnings in the P treatments. In addition, seven pools have a P/NP ratio of greater

than 1 (earnings in P/earnings in NP) in the last four periods (7–10). This means that

from period seven onwards already, punishment is superior to NP in 44 % of the

cases—a very short time interval indeed. So it might be suspected that punishment

is not as inefficient as it appears and maybe even superior to NP in the long run.

More support for this claim is presented in ‘‘Discussion’’ section.
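
The P/NP ratio used above can be sketched as follows. The two earnings series are hypothetical numbers chosen only to show how an overall ratio below 1 can coexist with a ratio above 1 in the last four periods; they are not data from Herrmann et al. (2008). The function name `p_np_ratio` is likewise an assumption.

```python
def p_np_ratio(p_earnings, np_earnings, periods):
    """Ratio of mean P earnings to mean NP earnings over the given 1-based periods."""
    periods = list(periods)
    p_mean = sum(p_earnings[t - 1] for t in periods) / len(periods)
    np_mean = sum(np_earnings[t - 1] for t in periods) / len(periods)
    return p_mean / np_mean


# Hypothetical average earnings per period for one subject pool (not real data).
p_series = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
np_series = [26, 25, 25, 24, 24, 23, 23, 22, 22, 21]

overall = p_np_ratio(p_series, np_series, range(1, 11))  # all ten periods
late = p_np_ratio(p_series, np_series, range(7, 11))     # periods 7-10 only

# Seven of the 16 subject pools have a late ratio greater than 1.
share_of_pools = 7 / 16
print(round(overall, 3), round(late, 3), round(share_of_pools * 100))
```

In this illustration the overall comparison favors NP while the late-period comparison favors P, which is the pattern the trend analysis is meant to detect. The share of seven out of 16 pools corresponds to the 44 % stated above.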

At first, it seems counterintuitive that P treatments should ever be more efficient

than NP ones, because they are always encumbered by the costs of punishment.

True, but there is one exception: if punishment leads to high and stable contributions

and costs are low, P treatments can lead to a higher efficiency. Surprisingly, both

conditions often seem to be satisfied. Much has been written on the typical decay of

contributions in PGGs without punishment (e.g. Fehr and Fischbacher 2003) and

high and stable contributions in punishment treatments (e.g. Fehr and Gachter

2000). The following figure (Fig. 1), which is typical for many results, demonstrates

that this is in fact so. Moreover, it shows the pattern of decay in NP versus stable or

increasing contributions in P treatments in the long run.

Fig. 1 Declining NP versus increasing P contributions in the long run (data from Gachter et al. 2008, authors’ calculations and graphic)

The next figure (Fig. 2) shows earnings for five different treatments (baseline vs. punishment with different effectiveness in punishment) without costs. It demonstrates very clearly two facts: the more effective the punishment, the higher the contributions and earnings; and earnings increase with the effectiveness of P and decrease in NP treatments over time.

Note the very similar starting points of the different treatments and their fast

divergence. Earnings begin to separate between the treatments as early as period

two (a Kruskal–Wallis test shows highly significant differences among the

treatments in earnings without costs; df = 4, p < 0.001). It is also striking that

only the two more effective punishment treatments manage to have an increasing

trend in earnings over time. The reason for these higher earnings is the higher

contributions in these treatments compared to the baseline. This does not explain,

however, their increasing trend. This is due to another fact, which has been

somewhat neglected in the research literature. The expenditures on punishment

decline significantly and are close to zero in the later periods (except for the last).

The following figure (Fig. 3) demonstrates these typically decreasing expenditures on

punishment in the same treatments as shown in Fig. 2.

This result can be generalized. As the next figure (Fig. 4) shows, the costs of

punishment are very high in the first periods in all analyzed studies. Between 27 % and 70 %

of all earnings are destroyed in the first period; the mean for the first five

Fig. 2 Higher earnings for more effective punishment without costs (data from Nikiforakis and Normann 2008, authors’ calculations and graphics)

periods is 22 %. The mean for all periods and all punishment treatments in all

studies is 16 %. This changes noticeably with only 10 % of all earnings being

destroyed in the last five periods. Keep in mind that, from an ecological standpoint, a

10-period PGG is ultrashort with regard to the time frame of cooperation,

and that its sanctions are very costly compared with real-world measures.
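
The percentages above can be read as the share of earnings lost to punishment in each period. The following Python sketch shows one plausible way to compute such shares and their averages over the first and last five periods. It assumes that losses comprise the costs to punishers plus the deductions from the punished, measured against earnings before punishment. The function name `share_destroyed` and all numbers are hypothetical and do not reproduce the data of the analyzed studies.

```python
def share_destroyed(gross_earnings, punishment_losses):
    """Per-period fraction of gross group earnings destroyed by punishment."""
    if len(gross_earnings) != len(punishment_losses):
        raise ValueError("series must have the same number of periods")
    return [loss / gross for gross, loss in zip(gross_earnings, punishment_losses)]


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Hypothetical 10-period series: group earnings before punishment and total
# losses (costs to punishers plus deductions from the punished) per period.
gross = [100] * 10
losses = [30, 20, 15, 10, 5, 4, 3, 2, 2, 9]

shares = share_destroyed(gross, losses)
first_five = mean(shares[:5])
last_five = mean(shares[5:])
overall = mean(shares)

print(f"first period: {shares[0]:.0%}")
print(f"first five periods: {first_five:.0%}")
print(f"last five periods: {last_five:.0%}")
print(f"all periods: {overall:.0%}")
```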

Expenditures sometimes even seem to be confined to the first few periods of

repeated interactions (and the last period). For example, a study by Janssen et al.

(2010), which uses a graphical common pool resource game, finds over four times the

number of punishment events in the first period as in period 2, with period

3 again showing less than half of that. After period 3, punishment events occur only

sporadically (in 5 periods) compared to 13 periods without any punishment (Janssen

et al. 2010, SOM).

The exception is the last period with sharply decreasing contributions in almost

all treatments. In punishment treatments this is accompanied by sharply increasing

punishment. This seemingly irrational last round punishment has been puzzling

researchers for a long time. We will suggest an explanation in the “Discussion” section.

Since Hypothesis 1 emphasizes the long-run differences between earnings,

these differences should become smaller over time (closing the gap between P and NP), be

on par for some time and then increase again in reverse order (with P being more

efficient). A statistical comparison of the data in Nikiforakis and Normann (2008)

Fig. 3 Declining expenditures on punishment (data from Nikiforakis and Normann 2008, authors’ calculations and graphics; note the increasing costs in the last period of three out of four treatments, which is an effect of the anticipated end of the game)

comes to exactly that conclusion. Thus, efficiency in P treatments is indeed lower at

the start, but increases over time, even in 10-period PGGs. Another study is even

more conclusive. It analyzes a “long-term” PGG with 50 periods (Gachter et al.

2008). Its conclusions run contrary to the received wisdom, but very much along the

lines presented here. Contributions and earnings are significantly higher in the P

treatment (Gachter et al. 2008, p. 1510). Hence, simply extending a PGG from 10 to

50 rounds reverses the usual picture, though this pattern is derived from one study

only.

Further support comes from Gurerk et al. (2006) who compare high contributors

in P with free riders in NP and find a statistically significant difference in earnings in favor of the

punishment treatment. Earnings there are higher from period 5 onward.

This result suggests that cooperative individuals can outperform free riders provided

they may assort themselves or punish free riders intruding into their circles. Another

study finds a near perfect 98.24 % of maximal possible earnings in a P treatment of

a group that plays a PGG a second time the day after the first PGG (Casari 2003).

This is good evidence from a situation with a (slightly) longer time horizon: a group

that stays together for more than a minimal time period and has apparently already

established a group norm of cooperation.

We conclude with a graph that wraps up our argument in a nutshell (Fig. 5).

Earnings in both P treatments are well below those in the NP treatments in the first few

periods. However, this changes rapidly—faster in the partner treatment than in the

Fig. 4 Declining expenditures on punishment (data from Fehr and Gachter 2000; Sefton et al. 2007; Nikiforakis and Normann 2008, authors’ calculations and graphics)

stranger treatment as predicted by Hypothesis 2. Both NP treatments show

decreasing, both P treatments increasing earnings. In the partner P treatment,

earnings are higher than in NP from period 4 onwards; in the stranger treatment, in period 10.
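
The reversal described above amounts to locating the period from which earnings in P stay above earnings in NP. A minimal Python sketch of this check follows. The function name `crossover_period` and the earnings series are hypothetical and do not reproduce the data shown in Fig. 5.

```python
def crossover_period(earnings_p, earnings_np):
    """First period (1-indexed) from which P earnings stay above NP earnings.

    Returns None if P earnings are not above NP earnings in the final period,
    i.e. if no lasting reversal occurs within the observed periods.
    """
    crossover = None
    for period, (p, np_) in enumerate(zip(earnings_p, earnings_np), start=1):
        if p > np_:
            if crossover is None:
                crossover = period  # start of a run in which P is ahead
        else:
            crossover = None  # run interrupted, start over
    return crossover


# Hypothetical mean earnings per period in a 10-period game.
np_series = [40, 39, 38, 37, 36, 35, 34, 33, 32, 30]
p_fast = [28, 32, 35, 36, 38, 40, 41, 42, 43, 41]   # quick reversal
p_slow = [25, 27, 29, 30, 31, 32, 33, 33, 34, 33]   # late reversal
p_never = [20] * 10                                 # no reversal

print("fast:", crossover_period(p_fast, np_series))
print("slow:", crossover_period(p_slow, np_series))
print("never:", crossover_period(p_never, np_series))
```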

Contrary evidence

Although this article analyzed a number of studies and generally found an

increasing efficiency in P treatments in the later phases of PGGs, there are a few

studies where this is definitely not the case.

These studies (e.g. Egas and Riedl 2008; Dreber et al. 2008; Fehr and Gachter

2002) find evidence that punishment does not pay and that there are no significant

differences in earnings in P treatments from the first periods to later ones, and therefore

no increasing efficiency. Dreber et al. (2008) for example find that participants who

punish do worse than those that do not (hence their paper’s title “Winners don’t

punish”). Reacting to defection by defecting is better than P in terms of payoff. This

is not surprising given the fact that the P ratio is high (1:4) and P destroys gains by

reducing earnings of punisher and punished. In comparison, defection results only in

lower gains. How can this be explained in the light of our hypotheses?

In fact, the explanation is straightforward since all three studies differ in two

decisive aspects from the other studies examined. A first difference is the choice of a

“stranger” design instead of a “partner” design (Dreber et al. is actually a mixture

of stranger and partner). This means that subjects do not stay in one group, but are

Fig. 5 Comparison of stranger (NP and P) and partner (NP and P) treatments (data from Fehr and Gachter 2000; authors’ calculations and graphics)

randomly rematched each round with new players. The second difference consists in

the number of periods. All three studies use a rather small number of rounds: in

Egas and Riedl (2008) and Fehr and Gachter (2002) subjects play 6 periods, whereas in Dreber et al. (2008) the game

terminates with a probability of 0.25 each round. Thus, on average, each player

played only 3.3 rounds with the same partners (Dreber et al. 2008).

The combination of stranger treatments and short interactions leads to low trust,

no opportunity to establish group norms and in consequence to high punishment

expenditures that remain high. Although in P cooperation levels are higher,

efficiency is not:

Comparing each control experiment with its corresponding treatment, we find

that punishment increases the frequency of cooperation (Dreber et al. 2008,

p. 349).

That means that, in the short term, punishment improves cooperation levels,

but not efficiency. Without a punishment option (control treatments) defection

increases significantly over time (Dreber et al. 2008, SOM).

Therefore, evidence that seems to contradict our hypothesis actually fits quite

well with it. As predicted, cooperation levels are lowest when interacting with

strangers over short periods of time. Since group norms cannot be established,

expenditures on punishment remain high throughout the game (e.g. in Fehr and

Gachter 2002, authors’ calculations). It may be debated to what extent a randomly

assembled “group” of players in a laboratory is perceived as a kind of “in-group” in

the evolutionary sense, but many studies suggest that even tiny cues (“blue group”,

“Kandinsky group”) suffice to create a feeling of belonging to a group (Yamagishi

and Kiyonari 2000; Koopmans and Rebers 2009).

Discussion

Interpretation of laboratory results

Laboratory experiments have to be interpreted with caution. This fact has been

pointed out by researchers from fields as diverse as cognitive psychology

(Gigerenzer et al. 1999), common pool resource research (Cardenas 2000; Ostrom

et al. 1994) and behavioral ecology or evolutionary psychology (Haselton et al.

2009; Buss 2004). It is also true for cooperation research (Wiessner 2009). As

discussed above (see “Punishment in hunter-gatherer societies”), results from

stranger treatments are artifacts since there have been hardly any anonymous, one-

shot interactions with strangers in the real world (Hill 2001).

The comparisons in our study are always between P and NP treatments under

anonymous settings without face-to-face contact or communication. Such settings are

both artificial and hostile to cooperation. More realistic settings which include

reputation building (e.g. Masclet et al. 2003; Rockenbach and Milinski 2006), non-

anonymity and communication (Ostrom et al. 1994) increase cooperation enormously.

Communication, reputation and close social contact were common since humans have

been living in close social groups that are partly kin-related for millions of years (see

above). Thus, if punishment is found to be successful and efficient even in short-term

anonymous interactions with strangers in the laboratory, then it is reasonable

to assume that under more favorable conditions punishment will be more efficient by

far.

Additionally, the standard ten-period design of PGGs is probably too short to get

meaningful results about social interactions. The reversed results from a 50-period

PGG (Gachter et al. 2008) and our analysis of the neglected trend of increasing

efficiency in P treatments clearly suggest this. Efficiency is higher in P compared

with NP treatments where parameters come closest to resembling real social

situations.

To sum up the above from an evolutionary perspective: the mechanisms of direct

and indirect reciprocity have evolved in settings with small groups, face-to-face

communication, kin-relatedness with partners and long-term partnerships (i.e. over

many years) with pay-offs skewed towards cooperation. Human behavior

in the laboratory may therefore be the indiscriminate answer of cooperative brains to an

environment that is not typical of the environment they evolved in.

A parsimonious explanation of cooperation in anonymous, one-shot

interactions with non-kin might be that the evolved reciprocal mechanisms usually

do not properly discriminate between such highly artificial laboratory settings and

the real world with small groups and no or few strangers. This may even be the case

when subjects explicitly state that they understand the difference.

Likewise, the seemingly irrational behavior of dealing out punishment in the last

round has been puzzling researchers for a long time. Researchers typically

find sharply decreasing contributions in the last round in almost all treatments.

In punishment treatments this is accompanied by sharply increased punishment. Our

hypothesis suggests an explanation. Strategies for social interactions are not tied to

situations (e.g. PGGs), but to persons. Therefore, it is perfectly rational to punish at

the end of situation A, which is, in a real world setting, almost always prior in time

to a situation B, where interaction between the same two (or more) people

continues.

Costly punishment?

Evidence from hunter-gatherers shows that costly punishment is not so costly after

all. An analogy is provided by dominance hierarchies: it is expensive to establish

such a system, but once it is up and running, it is very cheap and produces far fewer

conflicts (Voland 2009). Often, the first and most cost-effective way to cope with

free riders is to simply walk away and terminate further interactions on an

individual level (Gurven 2004). This is why trust and reputation are so important:

they facilitate finding cooperative partners while avoiding free riders. Hence,

assortment is an important and reliable mechanism in humans. Assortment studies

underscore this claim. To pick out one study (Page et al. 2005): groups playing a

PGG with the option to regroup according to cooperation levels have the highest

efficiency with 88 % of the maximum attainable, followed by the combination of

regrouping and P (86 %), P alone (81 %) and the baseline (77 %). The second best

way to cope with free riders may be sanctions, since enforcing cooperative behavior

may provide direct and indirect benefits, while being efficient. This has been

demonstrated in various species ranging from meerkats to soybeans (West et al.

2011).

An important contribution to this point comes from Masclet et al. (2003). This

study compares a P treatment with a treatment that offers the option of costlessly assigning points to show

one’s disapproval regarding contributions. There is no significant difference in

contributions between the two treatments. This result indicates that disapproval at

first suffices to stop defectors, although not for long: contributions in the

disapproval treatment are lower in later periods compared to the P treatment.

Thus, it can be argued that an important part of sanctioning institutions is the threat

to punish without necessarily doing so. Still, the threat must be credible and

enforced from time to time. In turn, efficiency is high because costs are low.

Additionally, more severe punishment in small communities is typically

administered by a council or board consisting of powerful individuals; they act as

a group and have not been directly involved in the act which is punished. Their

number and their neutrality reduce antisocial punishment. The punishment itself

often consists in warnings (low cost), compensation (increasing efficiency) and in

the worst case, shunning (excluding free riders, thus increasing the cooperation

level). There are countless ways to implement sanctions in a cost-effective way. One

example comes from common pool resource systems in Italy where “individual

users could inspect other users at their own cost and impose a predetermined

sanction (a fine) when a free rider was discovered. The fine was paid to the user who

found a violator” (Casari 2003, p. 217).

Lastly, since our theory claims that P is more efficient than NP in the long run,

humans may have a preference for P environments. Support for this comes from

studies that find that humans indeed prefer institutions with punishment (after a short time) to

institutions without sanctions (Gurerk et al. 2006; Rockenbach and Milinski 2006).

Conclusion

Humans lived for millions of years in small, kin-related groups. Evidence from

hunter-gatherer societies adds that not only kin-relations are important in such

groups, but direct and indirect reciprocity with non-kin plays a major role, too (Hill

2001). This is due to the fact that humans are cooperative breeders with high

incentives to cooperate due to variable hunting success (Hill and Hurtado 2009), and

widely practiced food sharing (Gurven 2004). It has been argued here that in this

particular environment long-term cooperative partnerships are superior to short-

term defection strategies on an individual level of selection. We see mainly two

mechanisms at work: cooperation with other reliable cooperators (be they kin or

not) is preferred, and defectors are punished by either terminating social interactions

with them (cheap) or costly punishment such as compensations or exclusion from

the group (more expensive and only used if freeriding occurs repeatedly). Most of

the time, punishment is not too expensive either, because it is graduated, starting

out with warnings, etc. These inexpensive punishment mechanisms have been found

to be frequent and viable in common pool resource management situations in the

real world and in the laboratory.

Once social norms have been established, free riders face severe costs and do

better to switch to cooperative strategies. Thus, expenditures on punishment drop

rapidly and credible threats suffice in most cases. Cooperators assort with one

another, because it is their choice with whom to hunt, to trade favors, to share food,

etc. Crucially, in such an environment there are highly skewed payoffs in favor of

cooperation over defection. Short-term defection benefits are puny in comparison to

benefits derived from long-term, highly profitable cooperative partnerships and

alliances, so that only sporadic punishment is necessary for cheaters who deviate from

the established cooperative norms. They can usually be disciplined by the threat of

exclusion from the in-group, and more drastic measures than

that are only rarely needed, making punishment cheap and thus efficient.

Acknowledgments The generosity of James Walker, Martin Sefton and Robert Shupp; Ernst Fehr;

Nikos Nikiforakis and Hans-Theo Normann; Simon Gachter and Elke Renner in making their data publicly

available or allowing me to analyze them is very much appreciated. Thanks to Eckart Voland and an

anonymous reviewer for helpful discussions.

References

Aktipis CA (2011) Is cooperation viable in mobile organisms? Simple Walk Away rule favors the

evolution of cooperation in groups. Evolut Hum Behav. doi:10.1016/j.evolhumbehav.2011.01.002

Andreoni J (1988) Why free ride? Strategies and learning in public goods experiments. J Public Econom

37(3):291–304

Axelrod R (1984/2000) Die Evolution der Kooperation. Oldenbourg, Munchen

Binmore K (2006) Why do people cooperate? J Polit Philos Econ 5(1):81–96

Bochet O, Page T, Putterman L (2006) Communication and punishment in contribution experiments.

J Econ Behav Organ 60(1):11–26. doi:10.1016/j.jebo.2003.06.006

Bowles S (2006) Group competition, reproductive leveling, and the evolution of human altruism. Science

314(5805):1569–1572. doi:10.1126/science.1134829

Bowles S (2009) Did warfare among ancestral hunter-gatherers affect the evolution of human social

behaviors? Science 324(5932):1293–1298. doi:10.1126/science.1168112

Boyd R, Gintis H, Bowles S, Richerson PJ (2003) The evolution of altruistic punishment. Proc Nat Acad

Sci 100(6):3531–3535

Brockhurst MA, Buckling A, Racey D, Gardner A (2008) Resource supply and the evolution of public-

goods cooperation in bacteria. BMC Biol 6:20–26

Burnham TC, Johnson DDP (2005) The biological and evolutionary logic of human cooperation. Analyse

und Kritik 27:113–135

Buss DM (2004) Evolutionary psychology: the new science of the mind. Pearson, Boston

Cardenas S (2000) How do groups solve local commons dilemmas? Lessons from experimental

economics in the field. Environ Dev Sustain 2(3–4):305–322

Cardenas J (2003) Real wealth and experimental cooperation: experiments in the field lab. J Dev Econ

70:263–289. doi:10.1016/S0304-3878(02)00098-6

Casari M (2003) Decentralized management of common property resources: experiments with a

centuries-old institution. J Econ Behav Organ 51(2):217–247. doi:10.1016/S0167-2681(02)00098-7

Chaudhuri A (2011) Sustaining cooperation in laboratory public goods experiments: a selective survey of

the literature. Exp Econ 14(1):47–83

Chhatre A, Agrawal A (2008) Forest commons and local enforcement. Proc Nat Acad Sci 105(36):

13286–13291. doi:10.1073/pnas.0803399105

Clutton-Brock TH, Parker GA (1995) Punishment in animal societies. Nature 373(6511):209–216. doi:

10.1038/373209a0

Dreber A, Rand DG, Fudenberg D, Nowak MA (2008) Winners don’t punish.

Nature 452(7185):348–351. doi:10.1038/nature06723

Egas M, Riedl A (2008) The economics of altruistic punishment and the maintenance of cooperation.

Proc R Soc B Biol Sci 275:871–878. doi:10.1098/rspb.2007.1558

Ember CR (1978) Myths about Hunter-Gatherers. Ethnology 17(4):439–448

Fehr E, Fischbacher U (2003) The nature of human altruism. Nature 425(6960):785–791.

doi:10.1038/nature02043

Fehr E, Gachter S (2000) Cooperation and punishment in public goods experiments. Am Econ Rev

90(4):980–994

Fehr E, Gachter S (2002) Altruistic punishment in humans. Nature 415(6868):137–140

Frey U (2007) Der blinde Fleck—Kognitive Fehler in der Wissenschaft und ihre evolutionsbiologischen

Grundlagen. Ontos, Heusenstamm

Gachter S, Herrmann B (2009) Reciprocity, culture and human cooperation: previous insights and a new

cross-cultural experiment. Philos Trans R Soc Lond B Biol Sci 364(1518):791–806. doi:

10.1098/rstb.2008.0275

Gachter S, Renner E, Sefton M (2008) The long-run benefits of punishment. Science 322:1510. doi:

10.1126/science.1164744

Gibson CC, Williams JT, Ostrom E (2005) Local enforcement and better forests. World Dev

33(2):273–284

Gigerenzer G, Todd PM, ABC Research Group (eds) (1999) Simple heuristics that make us smart. Oxford

University Press, Oxford

Gintis H (2009) The bounds of reason. Game theory and the unification of the behavioral sciences.

Princeton University Press, Princeton

Guala F (2012) Reciprocity: weak or strong? What punishment experiments do (and do not) demonstrate.

Behav Brain Sci 35(1):1–15. doi:10.1017/S0140525X11000069

Gurerk O, Irlenbusch B, Rockenbach B (2006) The competitive advantage of sanctioning institutions.

Science 312(5770):108–111. doi:10.1126/science.1123633

Gurven M (2004) To give and to give not: the behavioral ecology of human food transfers. Behav Brain

Sci 27(4):543–583. doi:10.1017/S0140525X04000123

Hamilton WD (1964a) The genetical evolution of social behaviour. I. J Theor Biol 7:1–16

Hamilton WD (1964b) The genetical evolution of social behaviour. II. J Theor Biol 7:17–52

Haselton MG, Bryant GA, Wilke A, Frederick DA, Galperin A, Frankenhuis WE, Moore T (2009)

Adaptive rationality: an evolutionary perspective on cognitive bias. Soc Cognit 27(5):733–763

Hawkes K (1991) Showing off: tests of an hypothesis about men’s foraging goals. Ethol Sociobiol

12(1):29–54. doi:10.1016/0162-3095(91)90011-E

Hawkes K, Bird RB (2002) Showing off, handicap signaling, and the evolution of men’s work. Evol

Anthropol 11(2):58–67

Helbing D, Yu W (2009) The outbreak of cooperation among success-driven individuals under noisy

conditions. Proc Nat Acad Sci 106(10):3680–3685

Henrich J, Heine SJ, Norenzayan A (2010) The weirdest people in the world? Behav Brain Sci 33:61–135

Herrmann B, Thoni C, Gachter S (2008) Antisocial punishment across societies. Science 319:1362–1367.

doi:10.1126/science.1153808

Hill K (2001) Altruistic cooperation during foraging by the Ache, and the evolved human predisposition

to cooperate. Hum Nat 13(1):105–128

Hill K, Hurtado MA (2009) Cooperative breeding in South American hunter-gatherers. Proc R Soc B Biol

Sci 276(1674):3863–3870

Janssen MA, Holahan R, Lee A, Ostrom E (2010) Lab experiments for the study of social-ecological

systems. Science 328(5978):613–617

Koopmans R, Rebers S (2009) Collective action in culturally similar and dissimilar groups: an experiment

on parochialism, conditional cooperation, and their linkages. Evolut Hum Behav 30(3):201–211

Ledyard JO (1995) Public goods: a survey of experimental research. In: Kagel JH, Roth AE (eds) The

handbook of experimental economics. Princeton University Press, Princeton, pp 111–194

Leimar O, Hammerstein P (2001) Evolution of cooperation through indirect reciprocity. Proc R Soc B

Biol Sci 268:745–753. doi:10.1098/rspb.2000.1573

Masclet D, Noussair C, Tucker S, Villeval M (2003) Monetary and nonmonetary punishment in the

voluntary contributions mechanism. Am Econ Rev 93(1):366–380

Nikiforakis N (2008) Punishment and counter-punishment in public good games: can we really govern

ourselves? J Public Econom 92(1):91–112. doi:10.1016/j.jpubeco.2007.04.008

Nikiforakis N, Normann H (2008) A comparative statics analysis of punishment in public-good

experiments. Exp Econ 11(4):358–369. doi:10.1007/s10683-007-9171-3

Ostrom E (1990) Governing the commons: the evolution of institutions for collective action. Cambridge

University Press, Cambridge

Ostrom E (1992) Crafting institutions for self-governing irrigation systems. Institute for Contemporary

Studies, San Francisco

Ostrom E, Gardner R, Walker J (1994) Rules, games, and common-pool resources. University of

Michigan Press, Ann Arbor

Page T, Putterman L, Unel B (2005) Voluntary association in public goods experiments: reciprocity,

mimicry and efficiency. Econ J 115(506):1032–1053. doi:10.1111/j.1468-0297.2005.01031.x

Rockenbach B, Milinski M (2006) The efficient interaction of indirect reciprocity and costly punishment.

Nature 444:718–723. doi:10.1038/nature05229

Seabright P (2010) The company of strangers. A natural history of economic life, 2nd edn. Princeton

University Press, Princeton

Sefton M, Shupp R, Walker JM (2007) The effect of rewards and sanctions in provision of public goods.

Econ Inq 45(4):671–690. doi:10.1111/j.1465-7295.2007.00051.x

Semmann D, Krambeck H, Milinski M (2003) Volunteering leads to rock–paper–scissors dynamics in a

public goods game. Nature 425(6956):390–392. doi:10.1038/nature01986

Voland E (2009) Soziobiologie: Die Evolution von Kooperation und Konkurrenz. Spektrum,

Akademischer Verlag, Heidelberg

West SA, Griffin AS, Gardner A (2007) Social semantics: altruism, cooperation, mutualism, strong

reciprocity and group selection. J Evolut Biol 20(2):415–432.

doi:10.1111/j.1420-9101.2006.01258.x

West SA, El Mouden C, Gardner A (2011) Sixteen common misconceptions about the evolution of

cooperation in humans. Evolut Hum Behav 32(4):231–262.

doi:10.1016/j.evolhumbehav.2010.08.001

Wiessner P (2009) Experimental games and games of life among the Ju/’hoan Bushmen. Curr Anthropol

50(1):133–138. doi:10.1086/595622

Yamagishi T, Kiyonari T (2000) The group as the container of generalized reciprocity. Soc Psychol Q

63(2):116–132

Zelmer J (2003) Linear public goods experiments: a meta-analysis. Exp Econ 6(3):299–310
