EVOLUTIONARY GAME THEORY
Heinrich H. Nax
www.nax.science
Dec 2, 2019 Agent-Based Modeling and Social System Simulation (Fall Semester 2019)
Lecture 6: Evolutionary game theory
Common knowledge of rationality and the game
Suppose that players are rational decision makers and that mutual rationality is common knowledge, that is:
I know that she knows that I will play rationally
She knows that “I know that she knows that I will play rationally”
I know that “She knows that “I know that she knows that I will play rationally””
...
Further suppose that all players know the game and that this, again, is common knowledge.
Rationality and the “as if” approach
The rationalistic paradigm in economics (Savage, The Foundations of Statistics, 1954)
A person’s behavior is based on maximizing some goal function (utility) under given constraints and information
The “as if” approach (Friedman, The Methodology of Positive Economics, 1953)
Do not theorize about the intentions behind agents’ actions but consider only the outcome (observables)
Similar to the natural sciences, where a model is seen as an approximation of reality rather than a causal explanation (e.g., Newton’s laws)
But is the claim right? Do people act as if they were rational?
Nash’s mass-action interpretation (Nash, PhD thesis, 1950)
“We shall now take up the “mass-action” interpretation of equilibrium points. In this interpretation solutions have no great significance. It is unnecessary to assume that the participants have full knowledge of the total structure of the game, or the ability and inclination to go through any complex reasoning processes. But the participants are supposed to accumulate empirical information on the relative advantages of the various pure strategies at their disposal.
...
Thus the assumptions we made in this “mass-action” interpretation lead to the conclusion that the mixed strategies representing the average behavior in each of the populations form an equilibrium.”
(bold text added for this presentation)
Nash’s mass-action interpretation (Nash, PhD thesis, 1950)
A large population of identical individuals represents each player role in a game
The game is played recurrently (t = 0, 1, 2, 3, ...):
In each period one individual from each player population is drawn randomly to play the game
Individuals observe samples of earlier behaviors in their own population and avoid suboptimal play (successful strategies are copied more frequently)
Nash’s claim: If all individuals avoid suboptimal pure strategies and the population distribution is stationary, then it constitutes a [Nash] equilibrium
Almost true! Evolutionary game theory formalizes these questions and provides answers.
The folk theorem of evolutionary game theory
Folk theorem
If the population process converges from an interior initial state, then for large t the distribution is a Nash equilibrium
If a stationary population distribution is stable, then it coincides with a Nash equilibrium
Charles Darwin: “Survival of the fittest”
The population that is best adapted to its (exogenous) environment will reproduce more
Evolutionary game theory
The population that performs best against other (endogenous) populations will survive/reproduce more
Domain of analysis
Symmetric two-player games
A symmetric two-player normal form game G = 〈N, {Si}i∈N, {ui}i∈N〉 consists of three objects:
1 Players: N = {1, 2}, with typical player i ∈ N.
2 Strategies: S1 = S2 = S with typical strategy s ∈ S.
3 Payoffs: A function ui : S × S → R mapping strategy profiles (h, k) to a payoff for each player i such that for all h, k ∈ S:
u2(h, k) = u1(k, h)
Battle of the Sexes
        Cafe    Pub
Cafe    4, 3    0, 0
Pub     0, 0    3, 4
Not symmetric since:
u1(Cafe, Cafe) ≠ u2(Cafe, Cafe)
Prisoner’s dilemma
             Cooperate    Defect
Cooperate    −1, −1       −8, 0
Defect       0, −8        −5, −5
Symmetric since:
u1(Cooperate,Cooperate) = u2(Cooperate,Cooperate) = −1
u1(Cooperate,Defect) = u2(Defect,Cooperate) = −8
u1(Defect,Cooperate) = u2(Cooperate,Defect) = 0
u1(Defect,Defect) = u2(Defect,Defect) = −5
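The symmetry condition u2(h, k) = u1(k, h) says that player 2's payoff matrix is the transpose of player 1's. A minimal sketch of that check, assuming payoffs are stored as matrices with player 1's strategies as rows (the function name `is_symmetric` is an illustrative choice, not from the lecture):

```python
import numpy as np

def is_symmetric(u1, u2):
    """Return True if u2(h, k) == u1(k, h) for all strategies h, k."""
    u1, u2 = np.asarray(u1), np.asarray(u2)
    square = u1.shape == u2.shape and u1.shape[0] == u1.shape[1]
    return bool(square and np.array_equal(u2, u1.T))

# Rows are player 1's strategies, columns are player 2's strategies.
bos_u1 = [[4, 0], [0, 3]]      # Battle of the Sexes, player 1
bos_u2 = [[3, 0], [0, 4]]      # Battle of the Sexes, player 2
pd_u1 = [[-1, -8], [0, -5]]    # Prisoner's dilemma, player 1
pd_u2 = [[-1, 0], [-8, -5]]    # Prisoner's dilemma, player 2

print(is_symmetric(bos_u1, bos_u2))  # Battle of the Sexes
print(is_symmetric(pd_u1, pd_u2))    # Prisoner's dilemma
```

The first call reports that the Battle of the Sexes is not symmetric, the second that the prisoner's dilemma is.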
Symmetric Nash equilibrium
Definition: Symmetric Nash Equilibrium
A symmetric Nash equilibrium is a strategy profile (σ∗, σ∗) such that for every player i,
ui(σ∗, σ∗) ≥ ui(σ, σ∗) for all σ
In words: If no player has an incentive to deviate from their part in a particular strategy profile, then it is a Nash equilibrium; it is symmetric if all players use the same strategy.
Proposition
In a symmetric normal form game there always exists a symmetric Nash equilibrium.
Note: Not all Nash equilibria of a symmetric game need to be symmetric.
Evolutionarily stable strategy (Maynard Smith and Price, 1973)
Definition: Evolutionarily stable strategy (ESS)
A mixed strategy σ ∈ ∆(S) is an evolutionarily stable strategy (ESS) if for every strategy τ ≠ σ there exists ε(τ) ∈ (0, 1) such that for all ε ∈ (0, ε(τ)):
U(σ, ετ + (1 − ε)σ) > U(τ, ετ + (1 − ε)σ)
Let ∆ESS be the set of evolutionarily stable strategies.
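The ESS inequality can be evaluated numerically for one mutant τ and one mutant share ε at a time. The sketch below does that for the prisoner's dilemma payoffs used in this lecture. The names `U` and `resists_invasion`, the default share `eps=1e-3`, and the tolerance are illustrative assumptions. Passing the check for a single (τ, ε) pair is only a necessary condition, not a proof that σ is an ESS.

```python
import numpy as np

def U(x, y, A):
    """Expected payoff of mixed strategy x against mixed strategy y."""
    return float(np.asarray(x) @ np.asarray(A) @ np.asarray(y))

def resists_invasion(sigma, tau, A, eps=1e-3, tol=1e-12):
    """Check U(sigma, mix) > U(tau, mix) for one mutant tau and one share eps."""
    sigma, tau = np.asarray(sigma, float), np.asarray(tau, float)
    mix = eps * tau + (1 - eps) * sigma  # post-entry population
    return U(sigma, mix, A) > U(tau, mix, A) + tol

# Prisoner's dilemma row payoffs; strategy order (Cooperate, Defect).
A_pd = [[-1, -8], [0, -5]]
cooperate, defect = [1, 0], [0, 1]

print(resists_invasion(defect, cooperate, A_pd))  # Defect facing Cooperate mutants
print(resists_invasion(cooperate, defect, A_pd))  # Cooperate facing Defect mutants
```

Defect resists Cooperate mutants, while Cooperate is invaded by Defect mutants.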
Prisoner’s dilemma
             Cooperate    Defect
Cooperate    −1, −1       −8, 0
Defect       0, −8        −5, −5
∆ESS = {Defect}
Coordination game
     A       B
A    4, 4    0, 0
B    0, 0    1, 1
Nash equilibria:
(A, A), (B, B), (0.2 · A + 0.8 · B, 0.2 · A + 0.8 · B)
All Nash equilibria are symmetric.
But the mixed Nash equilibrium is not an ESS: A earns the same against it as it earns against itself, but A does better against A (4 versus 0.8)!
Note that the mixed Nash equilibrium is trembling-hand perfect.
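A short worked derivation of the mixed equilibrium of the coordination game and of why it fails the ESS test. The symbol p (the probability placed on A) is introduced here for illustration only.

```latex
% Indifference between A and B against \sigma = pA + (1-p)B:
U(A,\sigma) = 4p, \qquad U(B,\sigma) = 1-p
\;\Longrightarrow\; 4p = 1-p \;\Longrightarrow\; p = \tfrac{1}{5}, \qquad U(\sigma,\sigma) = \tfrac{4}{5}.

% Mutant \tau = A entering with share \varepsilon:
U\bigl(\sigma,\ \varepsilon A + (1-\varepsilon)\sigma\bigr)
  = \varepsilon\,\tfrac{4}{5} + (1-\varepsilon)\,\tfrac{4}{5} = \tfrac{4}{5},
\qquad
U\bigl(A,\ \varepsilon A + (1-\varepsilon)\sigma\bigr)
  = 4\Bigl(\varepsilon + \tfrac{1-\varepsilon}{5}\Bigr) = \tfrac{4}{5} + \tfrac{16}{5}\,\varepsilon .
```

The mutant A earns strictly more than σ for every ε > 0, so the ESS inequality fails.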
Existence of ESS not guaranteed
Example: Rock, paper, scissors
     R        P        S
R    0, 0     −1, 1    1, −1
P    1, −1    0, 0     −1, 1
S    −1, 1    1, −1    0, 0
Unique Nash equilibrium, and thus symmetric:
σ = (1/3 R, 1/3 P, 1/3 S)
All pure strategies are best replies to σ and do as well against themselves as σ does against them ⇒ Not an ESS!
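The argument can be written out for the mutant τ = R, using the payoff table above; the same calculation applies to P and S.

```latex
% Every column of the row player's payoff matrix sums to zero, so \sigma earns 0 against anything:
U(\sigma, y) = 0 \quad \text{for all } y \in \Delta(S).

% The mutant R against the post-entry population:
U\bigl(R,\ \varepsilon R + (1-\varepsilon)\sigma\bigr)
  = \varepsilon\,U(R,R) + (1-\varepsilon)\,U(R,\sigma)
  = \varepsilon \cdot 0 + (1-\varepsilon)\cdot\tfrac{0 - 1 + 1}{3} = 0 .
```

Both sides of the ESS inequality equal 0, so the required strict inequality does not hold for any ε.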
Relations to normal form refinements
Propositions
If σ ∈ ∆(S) is weakly dominated, then it is not evolutionarily stable.
If σ ∈ ∆ESS, then (σ, σ) is a perfect equilibrium.
If (σ, σ) is a strict Nash equilibrium, then σ is evolutionarily stable.
Summary
Evolutionary game theory studies mutation processes (ESS)
The stable states often coincide with solution concepts from the “rational” framework
Evolutionary game theory does not explain how a population arrives at such a strategy ⇒ Learning in games and behavioral game theory
The “best” textbook: Weibull, Evolutionary Game Theory, 1995
Game theory describes interactions
Stock market:
Individuals (traders)
Strategies (buy/sell)
Outcome (profit/loss)
wired.co.uk
Fun and games:
Players (hands)
Strategies (rock-paper-scissors)
Outcome (winner/loser)
Shamma
Game theory and (distributed) control...
Biology:
Individuals (honeybees)
Strategies (foraging nectar)
Outcome (survival)
beecare.bayer.com
CONTROL THEORY:
Distributed agents (turbines)
Actions (orientation)
System performance (energy)
studyindenmark.dk
Distributed control applications
Characteristics:
Multiple decision making elements
Interdependency
No central authority
Distributed information
Collective performance
From the analyst’s point of view, this constitutes a “game”!
Centralized versus distributed control
Optimization
vs
Decentralization
Distributed information
Costly (time, energy, etc.) communication
Not just “multi component”
Not just “graph structure”
Efficiency loss
Tragedy of the commons
Price of Anarchy
Aims of today’s lecture
Understand the common principles of distributed control applications: routing, flocking, formation, coverage, assignment, cooperation, ...
Walk through details of one application: wind farm
Understand the game-theoretic parallels: players, actions, outcomes
Required reading (please ask for more!):
Marden & Shamma, “Game theory and distributed control”, Handbook of Game Theory IV, Young & Zamir (Eds.), 2015.
Game theory and distributed systems
“... the study of mathematical models of conflict and cooperation between intelligent rational decision-makers.”
Myerson, Game Theory, 1991.
“...systems are characterized by decentralization in available information, multiplicity of decision makers, and individuality of objective functions for each decision maker.”
Saksena, O’Reilly, & Kokotovic, Automatica, 1984.
Game theory: distributed efficiency loss
Local objectives ≠ collective objective
Braess Paradox
[Figure: road network from S to D through intermediate nodes A and B]
New road worsens congestion
60 people from S to D
No middle road: NE travel time 90 mins
With middle road: NE travel time 119/120 mins
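The edge costs of the figure are not legible in this text, so the sketch below assumes a standard Braess network that reproduces the quoted numbers: roads S→A and B→D take as many minutes as they have travellers, roads A→D and S→B take a fixed 60 minutes, and the middle road A→B takes 0 minutes. These costs, and the route names `top` (S→A→D), `bottom` (S→B→D) and `middle` (S→A→B→D), are assumptions. The code enumerates all splits of 60 travellers and keeps those where nobody can strictly shorten their trip by switching.

```python
from itertools import product

FIXED = 60  # assumed constant travel time on A->D and S->B (not from the source)

def times(top, bottom, middle):
    """Travel time per route given how many travellers use each route."""
    load_sa = top + middle      # S->A is shared by 'top' and 'middle'
    load_bd = bottom + middle   # B->D is shared by 'bottom' and 'middle'
    return {"top": load_sa + FIXED,
            "bottom": FIXED + load_bd,
            "middle": load_sa + load_bd}  # the middle road itself costs 0

def is_nash(counts, routes):
    """True if no single traveller can strictly shorten the trip by switching."""
    now = times(**counts)
    for r in routes:
        if counts[r] == 0:
            continue
        for q in routes:
            if q == r:
                continue
            moved = dict(counts)
            moved[r] -= 1
            moved[q] += 1
            if times(**moved)[q] < now[r]:
                return False
    return True

def equilibria(n, routes):
    """Enumerate all splits of n travellers over the allowed routes."""
    found = []
    for split in product(range(n + 1), repeat=len(routes)):
        if sum(split) != n:
            continue
        counts = {"top": 0, "bottom": 0, "middle": 0}
        counts.update(zip(routes, split))
        if is_nash(counts, routes):
            found.append(counts)
    return found

def used_times(counts):
    """Travel times on the routes that are actually used."""
    return sorted({t for r, t in times(**counts).items() if counts[r] > 0})

no_middle = equilibria(60, ["top", "bottom"])
with_middle = equilibria(60, ["top", "bottom", "middle"])
for eq in no_middle + with_middle:
    print(eq, used_times(eq))
```

Under these assumed costs the only equilibrium without the middle road is the 30/30 split with 90 minutes for everyone. With the middle road, almost everyone takes it and travel times lie between 118 and 120 minutes, close to the 119/120 quoted above.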
Game theory: Basic concepts
Elements:
Players/Agents/Actors/Individuals
Actions/Strategies/Choices/Decisions
Individual preferences over joint choices (payoffs, utility functions)
Solution concept in a distributed environment
What to expect?
Nash Equilibrium.
Everyone’s choice is a best response from an individual perspective given the choices of others.
Nash equilibrium & descriptive agenda
Game Elements
Solution Concept: NE
Rationality
“Keynes beauty contest”
Choose number between 0 and 100
Winner = Closest to 1/2 of average
NE: All pick 0
Nash equilibrium & descriptive agenda
Game Elements
Solution Concept
Rationality, Perception
“Keynes beauty contest”
Choose number between 0 and 100
Winner = Closest to 1/2 of average
Individual best reply: pick 1/2 of what YOU THINK others will play
Repeated beauty contest
[Figure panels: first-round choices; fourth- vs third-round choices]
Nagel, “Unraveling in guessing games: An experimental study”, AER, 1995.
“Our” beauty contest from lecture 1
Nash equilibrium & descriptive agenda
Game Elements
Solution Concept
Rationality Perception Evolution
“Keynes beauty contest”
Choose number between 0 and 100
Winner = Closest to 1/2 of average
Long-run outcome: All pick 0, i.e. NE
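A minimal sketch of how repeated play can drive guesses to the Nash equilibrium. It assumes a deliberately simple adaptive rule that is not taken from the lecture or from Nagel's experiments: in every round each player guesses one half of the previous round's average. The function name and the first-round guesses are hypothetical.

```python
def repeated_beauty_contest(first_guesses, rounds, factor=0.5):
    """Each round, every player guesses `factor` times the previous round's average."""
    guesses = list(first_guesses)
    averages = [sum(guesses) / len(guesses)]
    for _ in range(rounds):
        target = factor * averages[-1]         # best reply to last round's average
        guesses = [target for _ in guesses]    # everyone adapts in the same way
        averages.append(sum(guesses) / len(guesses))
    return averages

# Hypothetical first-round guesses between 0 and 100.
averages = repeated_beauty_contest([20, 35, 50, 65, 80], rounds=10)
print([round(a, 3) for a in averages])
```

The average halves every round, so it approaches 0, the Nash equilibrium, without any player having to reason through the full chain of common knowledge.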
Learning/evolutionary games
Shift of focus:
Away from the solution concept (Nash equilibrium)
Towards how players might arrive at the solution, i.e., dynamics
“The attainment of equilibrium requires a disequilibrium process.”
Arrow, 1987.
“The explanatory significance of the equilibrium concept depends on the underlying dynamics.”
Skyrms, 1992.
Distributed control: first, identify the target state; second, encourage dynamics that lead to it.
Literature
Monographs:
Weibull, Evolutionary Game Theory, 1995.
Young, Individual Strategy and Social Structure, 1998.
Fudenberg & Levine, The Theory of Learning in Games, 1998.
Samuelson, Evolutionary Games and Equilibrium Selection, 1998.
Young, Strategic Learning and Its Limits, 2004.
Sandholm, Population Dynamics and Evolutionary Games, 2010.
Surveys:
Hart, “Adaptive heuristics”, Econometrica, 2005.
Fudenberg & Levine, “Learning and equilibrium”, Annual Review of Economics, 2009.
Illustration: Fictitious play (1951)
Stages: t = 0, 1, 2, ...
Each player:
Maintains empirical frequencies (histograms) of opposing actions
Forecasts (incorrectly) that others play according to observed empirical frequencies
Selects an action that maximizes expected payoff
Bookkeeping:
$x_i(\cdot)$ = evolving empirical frequency of player i
Discrete-time:
$x_i(t+1) = x_i(t) + \frac{1}{t+1}\bigl(\mathrm{rand}[\beta_i(x_{-i}(t))] - x_i(t)\bigr)$
Continuous-time:
$\frac{dx_i}{dt} = -x_i + \beta_i(x_{-i})$
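The discrete-time rule can be sketched for a two-player zero-sum game. The code keeps action counts, which is equivalent to the incremental frequency update, and breaks ties by taking the first best reply. The game (matching pennies), the arbitrary first actions and the function name are assumptions made for illustration.

```python
import numpy as np

def fictitious_play(A, rounds):
    """Two-player zero-sum fictitious play; A holds the row player's payoffs."""
    A = np.asarray(A, dtype=float)
    n_rows, n_cols = A.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    row_counts[0] += 1  # arbitrary first actions (an assumption of this sketch)
    col_counts[0] += 1
    for _ in range(rounds):
        x_row = row_counts / row_counts.sum()   # empirical frequency of the row player
        x_col = col_counts / col_counts.sum()   # empirical frequency of the column player
        row_action = int(np.argmax(A @ x_col))       # best reply to the column history
        col_action = int(np.argmax(-(x_row @ A)))    # column player's payoff is -A
        row_counts[row_action] += 1
        col_counts[col_action] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Matching pennies: the unique Nash equilibrium mixes 50/50 for both players.
matching_pennies = [[1, -1], [-1, 1]]
x_row, x_col = fictitious_play(matching_pennies, rounds=20000)
print(x_row, x_col)
```

Actual play keeps cycling, but the empirical frequencies drift towards (0.5, 0.5).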
Descriptive agenda analysis
Meta-theorem
For [special structure games] under [specific dynamics], players exhibit [asymptotic behavior].
Theorem
For zero-sum games under fictitious play, empirical frequencies converge to NE.
Many more...
Prescriptive agenda
Design degrees of freedom:
Game elements: Players, Actions, Preferences
Evolutionary dynamics: Online adaptation
[Diagram: Game Elements, Collective Objective, Evolution, Collective Behavior]
Potential appeal:
Distributed self-organization
Adaptation to environment
Resilience to disruptions
Marden & Shamma, “Game theory and distributed control”, Handbook of Game Theory IV, Young & Zamir (Eds.), 2015.
Prescriptive agenda in action
Theorem
For potential games under restricted-movement log-linear learning, joint actions “linger” at the potential maximizer.
Distributed graph coverage
Local movements
Local information exchange
Linger at maximal coverage
Marden and Shamma, “Cooperative control and potential games”, 2009.
Yazicioglu, Egerstedt, and Shamma, “A game theoretic approach to distributed coverage of graphs by heterogeneous mobile agents”, 2013.
There and back again...
Game Elements
Solution Concept
Rationality Perception Evolution
[Diagram: Game Elements, Collective Objective, Evolution, Collective Behavior]
“The explanatory significance of the equilibrium concept depends on the underlying dynamics.”
Skyrms, 1992.
How to identify appropriate dynamics?
An application
A wind farm:
Each windmill takes a directional orientation and a blade angle
Depending on wind direction, this leads to an energy production for each windmill
The central authority (for simplicity) aims to maximize the energy total
For larger wind farms the centralized control approach has proven unsuccessful
Marden et al. 2013. “A Model-Free Approach to Wind Farm Control Using Game Theoretic Methods”. IEEE Transactions on Control Systems Technology 21(4):
“Each turbine does not have access to the functional form of the power generated by the wind farm. This is because the aerodynamic interaction between the turbines is poorly understood. [...] access to the choices of other turbines. This is because of the lack of a suitable communication system.”
Bee intermezzo
Bees:
Bees fly to different patches of flowers foraging for nectar
If nectar per flower is abundant (high payoffs), bees continue in the current patch with high probability
If a series of flowers yields low payoff, bees fly far away to a new patch
This rule governs the behavior of bees (Thuijsman et al., JTB 1995)
It has been shown to be a successful foraging strategy at the population level (implementing NE: Young 2009; even total-payoff-maximizing NE: Pradelski and Young 2012).
More formally:
Game:
Players i = 1, 2, ..., n
Finite strategy set Ai = {ai, bi, ..., ki}
Joint strategy space A = ΠiAi
Payoffs ui : A → R
How do you get windmills to play this game, by giving them private utility functions, so as to maximize total energy production?
The single turbine:
Game (given a certain wind direction):
Players i = 1, 2, ..., n (windmills/turbines)
Finite strategy set Ai = {ai, bi, ..., ki} (orientations)
Joint strategy space A = ΠiAi (wind park configuration)
Payoffs ui : A → R (own energy production)
The learning rule (pseudo code)
1. Initialize. t = 0, 1: each turbine i selects a random (benchmark) orientation $a_i^t$ resulting in power $u_i^t$
–. Windmill ‘moods’. t + 1 > 1:
if $a_i^t \neq a_i^{t-1}$ or $u_i^t \geq u_i^{t-1}$, windmill ‘content’
if $a_i^t = a_i^{t-1}$ and $u_i^t < u_i^{t-1}$, windmill ‘discontent’
2a. Benchmark update. t + 1 > 1:
if ‘content’, keep or switch benchmark according to higher payoff
if ‘discontent’, keep old benchmark
2. Action update. t + 1 > 1:
if ‘content’, play $a_i^t$ with (high) probability 1 − ε and RAND with probability ε
if ‘discontent’, windmill plays RAND with probability 1
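The pseudo code leaves several details open, so the following is only a rough sketch under stated assumptions, not the algorithm of Marden et al. It assumes two turbines, three discrete orientations, and a made-up power model in which the upstream turbine reduces the downstream turbine's output. It also reads "play $a_i^t$" as "replay the current benchmark orientation". All names (`ORIENTATIONS`, `power`, `simulate`) are hypothetical.

```python
import random

ORIENTATIONS = [0, 1, 2]  # hypothetical discrete orientations per turbine

def power(i, actions):
    """Hypothetical power model: turbine 0 is upstream and shades turbine 1."""
    own = [1.0, 0.8, 0.5][actions[i]]
    if i == 0:
        return own
    wake = [0.5, 0.2, 0.0][actions[0]]  # assumed loss caused by the upstream turbine
    return own * (1.0 - wake)

def simulate(steps, eps=0.05, seed=0):
    rng = random.Random(seed)
    n = 2
    # Step 1: two random initial orientations per turbine (t = 0, 1).
    prev_a = [rng.choice(ORIENTATIONS) for _ in range(n)]
    prev_u = [power(i, prev_a) for i in range(n)]
    bench_a, bench_u = list(prev_a), list(prev_u)
    cur_a = [rng.choice(ORIENTATIONS) for _ in range(n)]
    history = []
    for _ in range(steps):
        cur_u = [power(i, cur_a) for i in range(n)]
        next_a, moods = [], []
        for i in range(n):
            # Mood: discontent only if the same orientation now yields less power.
            discontent = cur_a[i] == prev_a[i] and cur_u[i] < prev_u[i]
            moods.append("discontent" if discontent else "content")
            # Benchmark update: a content turbine keeps the better of old and new.
            if not discontent and cur_u[i] > bench_u[i]:
                bench_a[i], bench_u[i] = cur_a[i], cur_u[i]
            # Action update: content turbines mostly replay the benchmark,
            # discontent turbines always pick a random orientation.
            if not discontent and rng.random() > eps:
                next_a.append(bench_a[i])
            else:
                next_a.append(rng.choice(ORIENTATIONS))
        history.append((tuple(cur_a), tuple(moods), sum(cur_u)))
        prev_a, prev_u, cur_a = cur_a, cur_u, next_a
    return history

history = simulate(steps=200)
print(history[-1])
```

Each turbine uses only its own orientation and its own measured power, which matches the information restrictions quoted from Marden et al. The sketch makes no claim about convergence.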
Performance
Theorem. For any desired probability p < 1, there exists ε > 0 such that, after sufficiently many iterations, total power generated is maximal with probability at least p.
Intuition:
A series of experiments leads to states with ever higher welfare until someone’s payoff goes down.
That individual becomes discontent, and its searching may cause other agents to become discontent.
Eventually the discontent agents settle into a new all-content state, where the settling probability increases with the overall welfare of the state.
Alternative approaches: cooperative control
Game:
Players i = 1, 2, ..., n
Finite strategy set Ai = {ai, bi, ..., ki}
Joint strategy space A = ΠiAi
Payoffs ui : A → R (total energy production)
Making windmills play this game, by giving them altruistic utility functions, will also maximize total energy production.
Harmony intermezzo
Recall the difference between the prisoner’s dilemma and the harmony game:
defection is the dominant strategy in the prisoner’s dilemma
cooperation is the dominant strategy in the harmony game
Prisoner’s dilemma (each cell: payoff to A, payoff to B):
                 A: Confess    A: Stay quiet
B: Confess       -6, -6        -10, 0
B: Stay quiet    0, -10        -2, -2
How to transform a prisoner’s dilemma into a harmony game by adding altruism...
Harmony intermezzo
Add 10 to every payoff and relabel Confess as Defect and Stay quiet as Cooperate (each cell: payoff to A, payoff to B):
                A: Defect                A: Cooperate
B: Defect       10-6 = 4, 10-6 = 4       10-10 = 0, 10-0 = 10
B: Cooperate    10-0 = 10, 10-10 = 0     10-2 = 8, 10-2 = 8
Harmony intermezzo
Now each player cares for self and other the same way:
Write φS for payoff for self
Write φO for payoff of other
Assume ui(φS, φO) = φS + φO, i.e. altruism/other-regarding concern
                A: Defect                A: Cooperate
B: Defect       4+4 = 8, 4+4 = 8         0+10 = 10, 10+0 = 10
B: Cooperate    10+0 = 10, 0+10 = 10     8+8 = 16, 8+8 = 16
Now any dynamic that implements Nash equilibrium in this modified harmony game would maximize total payoffs...
Staghunt game
Think of the following coordination game:
there are two actions: safe and risky
one equilibrium is when both players play safe
another is when both players play risky
risky leads to higher total payoffs
Staghunt dilemma (each cell: payoff to A, payoff to B):
            A: Risky    A: Safe
B: Risky    5, 5        4.5, 0
B: Safe     0, 4.5      2, 2
Adding altruism...
Staghunt modified
Now each player cares for self and other the same way:
Write φS for payoff for self
Write φO for payoff of other
Assume ui(φS, φO) = φS + φO, i.e. altruism/other-regarding concern
            A: Risky                    A: Safe
B: Risky    5+5 = 10, 5+5 = 10          4.5+0 = 4.5, 0+4.5 = 4.5
B: Safe     0+4.5 = 4.5, 4.5+0 = 4.5    2+2 = 4, 2+2 = 4
Now risky-risky is the unique Nash equilibrium.
Now any dynamic that implements Nash equilibrium in this game would maximize total payoffs...
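But this need not always work
Original game (each cell: payoff to A, payoff to B):
            A: Risky    A: Safe
B: Risky    5, 5        3, 0
B: Safe     0, 3        2, 2
With altruism:
            A: Risky    A: Safe
B: Risky    10, 10      3, 3
B: Safe     3, 3        4, 4
Now not every dynamic that implements Nash equilibrium in this game would maximize total payoffs: both risky-risky and safe-safe are equilibria, so we are back to a selection problem!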
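The altruism transformation ui(φS, φO) = φS + φO and its effect on the pure Nash equilibria can be checked mechanically. The sketch below assumes the layout of the tables above: rows are B's actions, columns are A's actions, and index 0 is the first listed action. The function names are illustrative.

```python
import numpy as np

def add_altruism(u_row, u_col):
    """Each player's new payoff is the sum of both players' original payoffs."""
    total = np.asarray(u_row, float) + np.asarray(u_col, float)
    return total, total.copy()

def pure_nash(u_row, u_col):
    """All pure-strategy Nash equilibria as (row index, column index) pairs."""
    u_row, u_col = np.asarray(u_row, float), np.asarray(u_col, float)
    result = []
    for r in range(u_row.shape[0]):
        for c in range(u_row.shape[1]):
            row_ok = u_row[r, c] >= u_row[:, c].max()  # row player cannot gain
            col_ok = u_col[r, c] >= u_col[r, :].max()  # column player cannot gain
            if row_ok and col_ok:
                result.append((r, c))
    return result

# (payoffs to row player B, payoffs to column player A)
shifted_pd = ([[4, 10], [0, 8]], [[4, 0], [10, 8]])     # index 0 = Defect
staghunt = ([[5, 0], [4.5, 2]], [[5, 4.5], [0, 2]])     # index 0 = Risky
counterexample = ([[5, 0], [3, 2]], [[5, 3], [0, 2]])   # index 0 = Risky

for name, (u_b, u_a) in [("PD", shifted_pd), ("Staghunt", staghunt),
                         ("Counterexample", counterexample)]:
    print(name, pure_nash(u_b, u_a), pure_nash(*add_altruism(u_b, u_a)))
```

For the prisoner's dilemma the equilibrium moves from mutual defection to mutual cooperation. For the staghunt, risky-risky becomes the unique equilibrium. In the last game both risky-risky and safe-safe survive the transformation.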
Differences in information
Own energy only: ui(φS) = φS
no information necessary about structure of the game
program dynamic offline
very specific dynamics will work
dynamic requires no feedback
Total energy: e.g. ui(φS, φO) = φS + φO
need to understand structure of the game in order to identify which specification will generate desired equilibria
more general class of dynamics will work
program dynamic offline
dynamic requires feedback about energy total as game continues
Which approach is better depends on the application.
Summary: game theory describes interactions
Economics:
Individuals (traders)
Strategies (buy/sell)
Outcome (profit/loss)
wired.co.uk
Mechanism design:
Players (doctors and hospitals)
Strategies (applications)
Outcome (Matching)
NRMP
Game theory and (distributed) control...
Biology:
Individuals (honeybees)
Strategies (foraging nectar)
Outcome (survival)
beecare.bayer.com
Distributed control:
Distributed agents (turbines)
Actions (orientation)
System performance (energy)
studyindenmark.dk
Broad agenda comparison
                  Biology    Social systems    Mechanism design    Distributed control
Game structure    given      given             manipulable         manipulable
Actions           given      given             given               given
Payoffs           given      given             given               manipulable
Information       given      given             manipulable         given
Thanks & Acknowledgements
Bary Pradelski, Jeff Shamma, Peyton Young