Nash Equilibrium in Tullock Contests - ETH Z · Nash Equilibrium in Tullock Contests Aidas Masiliunas1 1Aix-Marseille School of Economics Controversies in Game Theory III, ETH Zurich


Nash Equilibrium in Tullock Contests

Aidas Masiliunas

Aix-Marseille School of Economics

Controversies in Game Theory III, ETH Zurich

2 June, 2016


Rent-seeking (Tullock) contest

Two players compete for a prize (16 ECU) by making costly investments ($x_1, x_2 \le 16$)

Higher investments increase the probability of winning the prize

Probability that player i receives the prize: $\frac{x_i}{x_i + x_j}$

Applications:

Competition for monopoly rents

Investments in R&D

Competition for a promotion/bonus

Political contests



Theory

$E(\pi_i) = \frac{x_i}{x_i + x_j} \cdot 16 + 16 - x_i$

$BR_i(x_j): \; x_i^* = \sqrt{16 x_j} - x_j$

RNNE (risk-neutral Nash equilibrium): $x_i^* = 4$, dominance solvable in three steps.
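A minimal Python sketch of the risk-neutral payoff and best response on this slide, assuming continuous investments and the 16 ECU prize and endowment. The function names are illustrative, and the tie rule for zero total investment is an assumption the slide does not cover. It checks that x* = 4 is a fixed point of the best response.

```python
import math

PRIZE = 16.0      # prize value in ECU, from the slide
ENDOWMENT = 16.0  # endowment from which the investment is paid


def expected_payoff(x_i: float, x_j: float) -> float:
    """Risk-neutral expected payoff E(pi_i) = x_i / (x_i + x_j) * 16 + 16 - x_i."""
    if x_i + x_j == 0:
        p_win = 0.5  # assumption: even split when nobody invests (not covered by the slide)
    else:
        p_win = x_i / (x_i + x_j)
    return p_win * PRIZE + ENDOWMENT - x_i


def best_response(x_j: float) -> float:
    """Solve the first-order condition 16 * x_j / (x_i + x_j)**2 = 1 for x_i, floored at 0."""
    return max(0.0, math.sqrt(PRIZE * x_j) - x_j)


# Symmetric equilibrium: x = sqrt(16 x) - x  =>  2x = 4 sqrt(x)  =>  x = 4.
x_star = 4.0
print(best_response(x_star))                                         # 4.0
print(expected_payoff(x_star, x_star) > expected_payoff(5, x_star))  # True
```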

[Figure: Standard preferences. Best response (vertical axis) to the other player's investment (horizontal axis, 1 to 16).]


Explanatory power of Nash equilibrium in experiments

7.04% of choices are exactly Nash

60.19% of choices are strictly dominated

Investments are spread across the whole strategy space

Experience does not help

Less stability compared to auctions


Comparative statics of Nash equilibrium

An alternative to point predictions is comparative statics

Is behaviour sensitive to changes in the Nash prediction?

Players   Nash   Mean investment
2         250    325
3         222    283
4         188    302
5         160    322
9         99     326

Source: Lim, Matros & Turocy, 2014
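The Nash column is consistent with the standard symmetric equilibrium of an n-player Tullock contest with linear costs, x* = (n − 1)V / n². The prize V is not stated on the slide; V = 1000 is an assumption inferred because it reproduces the column. With n = 2 and V = 16 the same formula gives the x* = 4 of the two-player game.

```python
import math


def symmetric_ne(n_players: int, prize: float) -> float:
    """Symmetric Nash investment per player in an n-player Tullock contest (linear costs)."""
    return (n_players - 1) * prize / n_players ** 2


PRIZE = 1000.0  # assumption: inferred from the Nash column, not stated in the source

for n in (2, 3, 4, 5, 9):
    # round half up, so 187.5 becomes 188
    print(n, math.floor(symmetric_ne(n, PRIZE) + 0.5))
```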



Why should players choose Nash equilibrium?

Interpretation #1: Nash equilibrium is the unique action profile that can be justified by common knowledge of rationality.

Rationality = maximization of expected payoff given some belief.


Rationalizable strategies

x_i               1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
BR(x_i)           3  4  4  4  4  4  4  3  3  3  2  2  1  1  1  1
BR(BR(x_i))       4  4  4  4  4  4  4  4  4  4  4  4  3  3  3  3
BR(BR(BR(x_i)))   4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4

Rationality

Rationalizable: 1, 2, 3, 4

Rationality + belief that the opponent is rational

Rationalizable: 3, 4

Rationality + belief that the opponent is rational + belief that the opponent believes in my rationality

Rationalizable: 4

Epistemic definition of Nash equilibrium: common belief in rationality + simple belief hierarchy
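The table can be reproduced by iterating the pure-strategy best-response map on the integer grid 1 to 16. This is a sketch of the table only, not a general rationalizability algorithm (which would also allow beliefs that mix over several opponent investments); the names are illustrative.

```python
STRATEGIES = range(1, 17)  # investments 1..16, as in the table
PRIZE = ENDOWMENT = 16


def payoff(x_i: int, x_j: int) -> float:
    """Risk-neutral expected payoff of investing x_i against x_j."""
    return PRIZE * x_i / (x_i + x_j) + ENDOWMENT - x_i


def best_response(x_j: int) -> int:
    """Payoff-maximising integer investment; there are no ties at the maximum on this grid."""
    return max(STRATEGIES, key=lambda x_i: payoff(x_i, x_j))


row = list(STRATEGIES)
for step in range(1, 4):
    row = [best_response(x) for x in row]  # apply BR to every entry of the previous row
    print(f"step {step}:", row, "survivors:", sorted(set(row)))
```

The survivors after the three steps are {1, 2, 3, 4}, {3, 4} and {4}, matching the three levels of reasoning listed above.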




Nash equilibrium is the unique action profile that cannot be ruled out by common knowledge of rationality:

1. Players care about expected payoffs
2. Players have the ability to calculate expected payoffs and identify dominated strategies
3. Players believe that other players satisfy 1-2, believe that the other players believe that they satisfy 1-2, and so on

Nash equilibrium is the rest point of various learning dynamics

Belief-based learning, e.g. Cournot best response, fictitious play

Assumption 3 is not necessary

Payoff-based learning, e.g. reinforcement learning

Players must be willing to explore, remember past payoffs, and receive accurate feedback.
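A sketch of the two belief-based dynamics named above on the integer grid 1 to 16, assuming both players start from the same investment and follow the same rule, so a single sequence describes both. All names are illustrative. Cournot dynamics responds to the last observed investment; fictitious play responds to the empirical frequency of all past investments.

```python
from collections import Counter

STRATEGIES = range(1, 17)
PRIZE = ENDOWMENT = 16


def payoff(x_i, x_j):
    """Risk-neutral expected payoff of investing x_i against x_j."""
    return PRIZE * x_i / (x_i + x_j) + ENDOWMENT - x_i


def best_response(belief):
    """Best response to a belief {opponent investment: probability}."""
    return max(STRATEGIES,
               key=lambda x: sum(p * payoff(x, y) for y, p in belief.items()))


def cournot(start, rounds):
    """Cournot dynamics: best-respond to the opponent's previous investment."""
    path = [start]
    for _ in range(rounds):
        path.append(best_response({path[-1]: 1.0}))
    return path


def fictitious_play(start, rounds):
    """Fictitious play: best-respond to the empirical frequencies of past investments."""
    history = [start]
    for _ in range(rounds):
        n = len(history)
        belief = {y: count / n for y, count in Counter(history).items()}
        history.append(best_response(belief))
    return history


print(cournot(16, 4))               # [16, 1, 3, 4, 4]
print(fictitious_play(16, 50)[-1])  # 4
```

Starting from 16, Cournot dynamics reaches the equilibrium investment of 4 after three updates (16, 1, 3, 4); fictitious play also settles on 4.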
























Which assumptions are violated?


Preference-based explanations: joy of winning

Participants receive non-monetary utility from winning (Parco et al., 2005; Sheremeta, 2011) or lose utility after losing (Delgado et al., 2008).

Sheremeta (2011) elicits joy of winning by implementing a contest in which the prize has no monetary value.
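A sketch of the best response with joy of winning, assuming (as a modelling choice the slide does not spell out) that winning adds a non-monetary value w on top of the 16 ECU prize. The contest then behaves like one with prize 16 + w, so the symmetric equilibrium rises to (16 + w)/4: 4.75 for w = 3 and 6 for w = 8. Names are illustrative.

```python
STRATEGIES = range(1, 17)
PRIZE = ENDOWMENT = 16


def expected_utility(x_i, x_j, w):
    """Expected utility when winning adds a non-monetary bonus w to the monetary prize."""
    p_win = x_i / (x_i + x_j)
    return p_win * (PRIZE + w) + ENDOWMENT - x_i


def best_response(x_j, w):
    """Utility-maximising integer investment against x_j."""
    return max(STRATEGIES, key=lambda x_i: expected_utility(x_i, x_j, w))


def symmetric_ne(w):
    """Continuous symmetric equilibrium: effective prize 16 + w gives x* = (16 + w) / 4."""
    return (PRIZE + w) / 4


for w in (0, 3, 8):
    print(w, symmetric_ne(w), [best_response(x_j, w) for x_j in STRATEGIES])
```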

[Figure: Joy of winning with w = 3. Best response (vertical axis) to the other player's investment (horizontal axis, 1 to 16).]

[Figure: Joy of winning with w = 8. Best response (vertical axis) to the other player's investment (horizontal axis, 1 to 16).]


Preference-based explanations: risk preferences

CRRA utility function: $u(\pi_i) = \frac{\pi_i^{1-\rho}}{1-\rho}$

Risk aversion if ρ = 0.5, risk seeking if ρ = −0.5
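A sketch of best responses under CRRA utility on the integer grid, assuming utility is applied to final money in each outcome (16 − x_i + 16 if the prize is won, 16 − x_i otherwise). Names are illustrative. With ρ = 0 it reproduces the risk-neutral best-response table.

```python
STRATEGIES = range(1, 17)
PRIZE = ENDOWMENT = 16


def crra(pi, rho):
    """CRRA utility pi**(1 - rho) / (1 - rho); requires rho != 1 and pi >= 0."""
    return pi ** (1 - rho) / (1 - rho)


def expected_utility(x_i, x_j, rho):
    p_win = x_i / (x_i + x_j)
    win = ENDOWMENT - x_i + PRIZE  # money if the prize is won
    lose = ENDOWMENT - x_i         # money if the prize is lost
    return p_win * crra(win, rho) + (1 - p_win) * crra(lose, rho)


def best_response(x_j, rho):
    """Expected-utility-maximising integer investment against x_j."""
    return max(STRATEGIES, key=lambda x_i: expected_utility(x_i, x_j, rho))


for rho in (0.0, 0.5, -0.5):  # risk neutral, risk averse, risk seeking
    print(rho, [best_response(x_j, rho) for x_j in STRATEGIES])
```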

[Figure: Risk aversion. Best response (vertical axis) to the other player's investment (horizontal axis, 1 to 16).]

[Figure: Risk seeking. Best response (vertical axis) to the other player's investment (horizontal axis, 1 to 16).]


Preference-based explanations: social preferences

Fehr & Schmidt (1999) inequality aversion:

$u(\pi_i, \pi_j) = \begin{cases} \pi_i - \alpha(\pi_j - \pi_i) & \text{if } \pi_i \le \pi_j \\ \pi_i - \beta(\pi_i - \pi_j) & \text{if } \pi_i > \pi_j \end{cases}$
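A sketch of best responses under Fehr-Schmidt preferences, assuming both players hold the 16 ECU endowment and inequality is measured on final money in each lottery outcome. Names are illustrative. The parameter pairs are the ones plotted on this slide; with α = β = 0 the risk-neutral table is recovered.

```python
STRATEGIES = range(1, 17)
PRIZE = ENDOWMENT = 16


def fehr_schmidt(pi_i, pi_j, alpha, beta):
    """alpha penalises disadvantageous inequality, beta advantageous inequality."""
    if pi_i <= pi_j:
        return pi_i - alpha * (pi_j - pi_i)
    return pi_i - beta * (pi_i - pi_j)


def expected_utility(x_i, x_j, alpha, beta):
    p_win = x_i / (x_i + x_j)
    # final money of (i, j) if i wins and if j wins
    i_wins = (ENDOWMENT - x_i + PRIZE, ENDOWMENT - x_j)
    j_wins = (ENDOWMENT - x_i, ENDOWMENT - x_j + PRIZE)
    return (p_win * fehr_schmidt(*i_wins, alpha, beta)
            + (1 - p_win) * fehr_schmidt(*j_wins, alpha, beta))


def best_response(x_j, alpha, beta):
    """Expected-utility-maximising integer investment against x_j."""
    return max(STRATEGIES, key=lambda x_i: expected_utility(x_i, x_j, alpha, beta))


for alpha, beta in ((0, 0), (0.5, 0), (1, 0)):
    print(alpha, beta, [best_response(x_j, alpha, beta) for x_j in STRATEGIES])
```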

[Figure: Fehr and Schmidt (1999) inequality aversion. Best response (vertical axis) to the other player's investment (horizontal axis, 1 to 16) for α = 0, β = 0; α = 0.5, β = 0; α = 1, β = 0.]


All preferences from Sheremeta (2015)


“Behavioral Variation in Tullock Contests”, joint with F. Mengel and Ph. Reiss

Deviations from NE could be a result of bounded rationality

Players optimize given the feedback in previous rounds.

Noisy feedback prevents players from discovering optimal actions

Research questions:

Can we identify whether deviations from NE are a result of bounded rationality or of preferences?

Is behavioral variability lower, and are choices closer to theoretical predictions, when feedback is more informative?




How informative is the feedback that players observe?

Reinforcement learning converges to NE as $t \to \infty$

In experiments, players rely on small samples of experience

Suppose that players always choose the action that yielded the highest average payoff in the past.
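The learning rule above can be sketched as follows, assuming the player remembers (action, payoff) pairs from past rounds and has tried at least one action. The function name is illustrative. The usage line shows how a single lottery draw can favour action 6 over the equilibrium action 4 (with an opponent at 4, action 4 losing pays 16 − 4 = 12, action 6 winning pays 16 − 6 + 16 = 26).

```python
from collections import defaultdict


def choose_by_average_payoff(history):
    """Return the action with the highest average payoff among those already tried.

    history: non-empty list of (action, payoff) pairs from past rounds.
    Ties go to the action that appears first in the history.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for action, payoff in history:
        totals[action] += payoff
        counts[action] += 1
    return max(counts, key=lambda a: totals[a] / counts[a])


# One round each: action 4 lost the lottery, action 6 won it.
print(choose_by_average_payoff([(4, 12.0), (6, 26.0)]))  # 6
```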










Feedback depends on the other player's choices and on lottery outcomes


Treatment 1: eliminate lottery allocation


Treatment 2: eliminate variability of opponent’s choices


Treatment 3: eliminate both


How easy is it to learn in different treatments?

Estimate the likelihood that action 4 will yield a higher average payoff than action 6.
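A Monte Carlo sketch of this estimate. The opponent model is an assumption, not necessarily the one behind the figure: the opponent invests 4 when actions are fixed and draws uniformly from 1 to 16 when actions change (a hypothetical stand-in for the baseline distribution of choices). Payoffs follow either the lottery or a prize shared in proportion to investments. All names are illustrative.

```python
import random

PRIZE = ENDOWMENT = 16


def realised_payoff(x_i, x_j, lottery, rng):
    """One round's payoff: lottery allocation, or prize shared in proportion to investments."""
    share = x_i / (x_i + x_j)
    if lottery:
        return ENDOWMENT - x_i + (PRIZE if rng.random() < share else 0)
    return ENDOWMENT - x_i + PRIZE * share


def average_payoff(action, memory, lottery, fixed, rng):
    """Average payoff of `action` over `memory` remembered rounds (memory >= 1)."""
    total = 0.0
    for _ in range(memory):
        # Hypothetical opponent: always 4 if actions are fixed, else uniform on 1..16.
        opponent = 4 if fixed else rng.randint(1, 16)
        total += realised_payoff(action, opponent, lottery, rng)
    return total / memory


def prob_4_beats_6(memory, lottery, fixed, iterations=10_000, seed=0):
    """Share of simulated iterations in which Pi(4) > Pi(6)."""
    rng = random.Random(seed)
    hits = sum(
        average_payoff(4, memory, lottery, fixed, rng)
        > average_payoff(6, memory, lottery, fixed, rng)
        for _ in range(iterations)
    )
    return hits / iterations


for lottery in (False, True):
    for fixed in (True, False):
        print(lottery, fixed, [prob_4_beats_6(m, lottery, fixed, 1_000) for m in (1, 10, 50)])
```

With a shared prize and a fixed opponent the comparison is deterministic (20 vs 19.6 ECU), so action 4 wins every iteration; with a lottery and one remembered round it wins with probability 0.5 + 0.5 × 0.4 = 0.7.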

[Figure: Π(4) > Π(6). Percentage of iterations (vertical axis, 0 to 100) by memory length (horizontal axis, 0 to 50), for four conditions: shared prize, fixed actions; shared prize, changing actions; lottery, fixed actions; lottery, changing actions.]



Procedure

40 rounds, divided into 4 blocks of 10 rounds

Each block is divided into an experimentation phase (rounds 1-5) and an incentivized phase (rounds 6-10)

Block 1: rounds 1-5 non-incentivized, rounds 6-10 incentivized

Block 2: rounds 11-15 non-incentivized, rounds 16-20 incentivized

Block 3: rounds 21-25 non-incentivized, rounds 26-30 incentivized

Block 4: rounds 31-35 non-incentivized, rounds 36-40 incentivized

One round from each block randomly chosen for payment

Incentivized numeracy test at the end of the experiment

Average earnings 15.15 euro, duration 60 minutes


Explanatory power of Nash equilibrium

                     Changing actions      Fixed actions
                     Lottery    EV         Lottery    EV
P(x = NE)            7.04%      13.33%     -          -
P(x = BR)            -          -          22.50%     65.23%
P(|x − NE| ≤ 1)      25.74%     32.78%     -          -
P(|x − BR| ≤ 1)      -          -          47.95%     83.64%
P(x > 4)             60.19%     62.78%     51.36%     16.14%

The absolute deviation from equilibrium is significantly different between the EV/Fixed treatment and the other three treatments, but not in other comparisons.
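The rows of the table can be computed as below. `targets` holds the prediction for each choice (the NE in Changing treatments, the best response to the computer's investment in Fixed treatments). The function name is illustrative, and the choices in the usage line are made up, not experimental data.

```python
def choice_statistics(choices, targets, dominated_above=4):
    """Shares of choices equal to, within one unit of, and above the reference values.

    targets[k] is the prediction (NE or BR) for choices[k];
    choices above `dominated_above` are counted in the last share.
    """
    n = len(choices)
    pairs = list(zip(choices, targets))
    return {
        "exact": sum(x == t for x, t in pairs) / n,
        "within_1": sum(abs(x - t) <= 1 for x, t in pairs) / n,
        "above_4": sum(x > dominated_above for x in choices) / n,
    }


# Hypothetical choices, for illustration only.
choices = [4, 4, 5, 10, 3, 16, 2, 8]
print(choice_statistics(choices, [4] * len(choices)))
```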


Behavioral variation

Is the distribution of choices more concentrated? (Not necessarily around NE)

Entropy measures the stochastic variation of a random variable (0 = one strategy always chosen, 4 = all 16 strategies chosen with equal frequency):

$H = -\sum_{i=1}^{16} p_i \log_2 p_i$, where $p_i$ is the frequency with which investment $i$ is chosen

                     Changing actions      Fixed actions
                     Lottery    EV         Lottery    EV
Entropy              3.22       2.79       2.45       1.50
Std. Dev.            3.28       2.56       3.15       1.16
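A sketch of the entropy measure, computed from the empirical frequencies of choices with base-2 logarithms, which is what makes the maximum for 16 strategies equal to log2(16) = 4. The function name is illustrative.

```python
import math
from collections import Counter


def entropy_bits(choices):
    """Shannon entropy (base 2) of the empirical distribution of choices."""
    n = len(choices)
    return -sum((c / n) * math.log2(c / n) for c in Counter(choices).values())


print(entropy_bits([4] * 10))            # one strategy always chosen
print(entropy_bits(list(range(1, 17))))  # all 16 strategies equally often
```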




Best-response curves in Fixed treatments


Stability of choices and convergence

Changing strategies between rounds in experimentation and incentivized rounds.




Replacing humans by computers

Playing against a computer player is different from playing against a human player: no social preferences, lower joy of winning (?)

Additional treatment replacing computers with human players.

All effects replicate if the Fixed/EV treatment is replaced by this treatment.

                     Changing actions      Fixed actions
                     Lottery    EV         Lottery    EV         EV-Human
P(x = NE)            7.04%      13.33%     -          -          -
P(x = BR)            -          -          22.50%     65.23%     50.42%
P(|x − NE| ≤ 1)      25.74%     32.78%     -          -          -
P(|x − BR| ≤ 1)      -          -          47.95%     83.64%     74.58%
P(x > 4)             60.19%     62.78%     51.36%     16.14%     23.33%
Entropy              3.22       2.79       2.45       1.50       1.13
Std. Dev.            3.28       2.56       3.15       1.16       0.91




Strategic uncertainty vs stability

Matching players to computers has two effects:

The action of the other party is stable over time, hence it is easier to learn.

Players face no strategic uncertainty, hence it is easier to optimize.

Is stability of choices necessary in addition to the removal of strategic uncertainty?

Design: the computer plays actions from the baseline contest, and players know these actions.

                     Changing actions      Changing but known    Fixed actions
                     Lottery    EV         Lottery    EV         Lottery    EV
P(x = NE)            7.04%      13.33%     -          -          -          -
P(x = BR)            -          -          7.59%      25.37%     22.50%     65.23%
P(|x − NE| ≤ 1)      25.74%     32.78%     -          -          -          -
P(|x − BR| ≤ 1)      -          -          25.00%     51.85%     47.95%     83.64%
P(x > 4)             60.19%     62.78%     62.96%     47.04%     51.36%     16.14%
















Contests with foregone payoff information

Conclusion from the first paper: when feedback is more informative about the quality of actions, players make better choices.

Can we improve the quality of feedback without changing the nature of the game?

Hypothesis: more information and higher-quality information increase the rate of learning

Design: 10 rounds of standard contest, 20 rounds of contest with foregone payoff information, 10 rounds of standard contest
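A sketch of how foregone payoff information could be generated: given the opponent's investment, compute what every investment from 1 to 16 would have earned. For the lottery it assumes a single uniform draw u decides all counterfactuals (action x wins if u < x/(x + x_j)), which is one reading of a "same random numbers" condition; the function name and this rule are assumptions, not the experiment's implementation.

```python
PRIZE = ENDOWMENT = 16
STRATEGIES = range(1, 17)


def foregone_payoffs(x_j, u, lottery=True):
    """Payoff each investment x would have earned against opponent investment x_j.

    Lottery: one shared uniform draw u in [0, 1) decides every counterfactual.
    Shared prize: x receives 16 * x / (x + x_j) with certainty.
    """
    result = {}
    for x in STRATEGIES:
        share = x / (x + x_j)
        if lottery:
            gain = PRIZE if u < share else 0
        else:
            gain = PRIZE * share
        result[x] = ENDOWMENT - x + gain
    return result


print(foregone_payoffs(4, 0.5))
```

Because the win probability rises with x, a common draw means that if some investment would have won, every larger investment would have won too.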




“Contests with foregone payoff information”


Hypotheses: reinforcement learning simulation

[Figure: Π(2) > Π(4). Percentage of iterations (vertical axis, 0 to 100) by memory length (horizontal axis, 0 to 50), for four conditions: same actions, same random numbers; different actions, same random numbers; same actions, different random numbers; different actions, different random numbers.]


Results: average investments


Results: dominated strategies


Payoff-based learning, joint with H. Nax

Calculating expected values is very complicated

Convergence is much higher when players can use a payoff table/calculator and with neutral framing

[Figure: investment by round, rounds 1 to 20.]


Summary

Nash equilibrium has very low explanatory power in Tullock contests

Explanatory power is much higher when actions have direct payoff consequences

Providing additional feedback about foregone payoffs does not improve the explanatory power

Paying the expected payoffs does not improve learning unless players know these payoffs.
