Decision Making in Robots and Autonomous Agents
Introduction to Game Theory; Stochastic & Bayesian Games
Subramanian Ramamoorthy, School of Informatics
4 February, 2014
Robots Often Face Strategic Adversaries

Key issue we seek to model: misaligned/conflicting interests
On Self-Interest

What does it mean to say that agents are self-interested?
• It does not necessarily mean that they want to cause harm to each other, or even that they care only about themselves.
• Instead, it means that each agent has his own description of which states of the world he likes, which can include good things happening to other agents, and that he acts in an attempt to bring about these states of the world (a better term: inter-dependent decision making)
Basic Constructs of a Game (Extensive Form)

• Move: A point of decision for the player, defined by a set of actions that could be chosen (e.g., I am holding King-10-2). In an abstract form:
• Choice: The particular action that has actually been chosen (play King)
• Play: A sequence of choices, one following another, until the game terminates (King -> 10 -> 2)

The move looks identical in a different game involving, say, passing - calling - betting
Abstract Relation Between Moves
Number refers to which player has to make a choice
Game Trees

• The previous picture is often referred to as a game tree, a listing of moves and consequences
• It is a tree in the mathematical sense
• Strictly speaking, many games should have a game graph (not just a tree) - Why?
• However, the convention is to treat the play (history of moves) as unique even if it revisits a specific state
Information Sets

• What can each player know when he makes a choice at a move?
• We are not asking about their way of playing, just: what is the most they could possibly know without violating the rules of the game?
  – e.g., think of card games with chance moves, or where a player picks a card and places it face down on the table
• Rules of the game specify which moves are indistinguishable
• Two requirements for an information set:
  – Moves must be assigned to the same player
  – Moves must have the same number of alternatives
Information Sets - Pictorially
Specification of an Extensive Form Game

• A finite tree with a distinguished node
• A partition of the nodes of the tree into n+1 sets (specifying who takes the move)
• A probability distribution over the branches of chance moves
• A refinement of the player partition into information sets (characterizes, for each player, the ambiguity of location in the game tree at each of his moves)
• An identification of corresponding branches for each of the moves in each information set
• A set of outcomes and an assignment of outcomes to each endpoint of the tree
How Do Players Actually Choose?

• Each player has a linear utility function Mi over outcomes
• Each player is fully aware of the rules, and will maximize expected utility
• Pure strategy: a prescription of a decision for each situation
• For any fixed strategy, given the rules of the game, the game tree can be evaluated directly to yield a value
• If there are chance moves, the selection of strategies by the players defines a distribution over plays, and the payoff is the expected value w.r.t. this distribution
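The last two bullets can be sketched concretely. Below is a minimal, hypothetical evaluator (the tree encoding is my own assumption, not from the slides): once every player's pure strategy is fixed, each decision node collapses to a single child, so only chance nodes remain and the value of the game is an expectation over plays.

```python
# Evaluate a game tree after all pure strategies are fixed (assumed encoding):
#   ("leaf", payoff)                  - terminal node with its payoff
#   ("move", child)                   - decision node already resolved by a strategy
#   ("chance", [(prob, child), ...])  - chance node with branch probabilities
def evaluate(node):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "move":
        return evaluate(node[1])
    # chance node: expected value over its branches
    return sum(p * evaluate(child) for p, child in node[1])

tree = ("move", ("chance", [(0.5, ("leaf", 2)),
                            (0.5, ("move", ("leaf", 0)))]))
print(evaluate(tree))  # 1.0
```

With no chance nodes the recursion simply walks the unique play; with chance nodes it computes exactly the expectation described in the last bullet.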
A Different Simple Model of a Game

• Two decision makers
  – Robot (has an action space: a)
  – Adversary (has an action space: θ)
• Cost or payoff (to use the term common in game theory) depends on the actions of both decision makers: R(a, θ)
  – denote as a matrix corresponding to the product space

This is the normal form: simultaneous choice over moves
Representing Payoffs

In a general, bi-matrix, normal form game, the combined actions (a1, a2, …, an) form an action profile a ∈ A

Figure labels: action sets of the players; payoff function, a.k.a. utility, u2(a)
Example: Rock-Paper-Scissors

• Famous children's game
• Two players; each player simultaneously picks an action, which is evaluated as follows:
  – Rock beats Scissors
  – Scissors beats Paper
  – Paper beats Rock
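The cyclic winning rule above can be encoded directly; a small sketch (the zero-sum payoff convention of +1/0/-1 is an assumption) giving player 1's payoff:

```python
# Rock-Paper-Scissors: each action beats exactly one other action.
beats = {"Rock": "Scissors", "Scissors": "Paper", "Paper": "Rock"}

def u1(a1, a2):
    """Player 1's payoff: +1 for a win, -1 for a loss, 0 for a tie."""
    if a1 == a2:
        return 0
    return 1 if beats[a1] == a2 else -1

print(u1("Rock", "Scissors"), u1("Paper", "Rock"), u1("Scissors", "Scissors"))
# 1 1 0
```

Player 2's payoff is just -u1(a1, a2), which is what makes the game zero-sum.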
TCP Game

• Imagine there are only two internet users: you and me
• Internet traffic is governed by the TCP protocol, one feature of which is the backoff mechanism: when the network is congested, back off and reduce transmission rates for a while
• Imagine that there are two implementations: C (correct, does what is intended) and D (defective)
• If you both adopt C, packet delay is 1 ms; if you both adopt D, packet delay is 3 ms
• If one adopts C but the other adopts D, then the D user gets no delay and the C user suffers a 4 ms delay
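A sketch of these payoffs as a bi-matrix, using negative delays as utilities (my encoding, consistent with the later remark that this game is the Prisoner's Dilemma):

```python
# TCP game: actions are "C" (correct backoff) and "D" (defective); utilities
# are negative delays in ms, so less delay means higher payoff.
payoff = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-4, 0),
    ("D", "C"): (0, -4),
    ("D", "D"): (-3, -3),
}

def best_response(options, opponent_action, player):
    """Action maximizing this player's payoff against a fixed opponent action."""
    def u(a):
        profile = (a, opponent_action) if player == 0 else (opponent_action, a)
        return payoff[profile][player]
    return max(options, key=u)

# D is a best response to either opponent action, so D strictly dominates C.
print(best_response(["C", "D"], "C", 0), best_response(["C", "D"], "D", 0))
```

Both players defecting (D, D) is then the predictable outcome, even though (C, C) gives both a shorter delay.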
TCP Game in Normal Form

Note that this is another way of writing a bi-matrix game: the first number represents the payoff of the row player and the second number is the payoff of the column player
Some Famous Matrix Examples - What are they Capturing?

• Prisoner's Dilemma: Cooperate or Defect (same as the TCP game)
• Bach or Stravinsky (von Neumann called it Battle of the Sexes)
• Matching Pennies: Try to get the same outcome, Heads/Tails
Different Categorization: Common Payoff

A common-payoff game is a game in which for all action profiles a ∈ A1 × ··· × An and any pair of agents i, j, it is the case that ui(a) = uj(a)

Pure coordination: e.g., driving on a side of the road
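A quick mechanical check of this definition, using the driving-side coordination example (the 0/1 payoff encoding is assumed):

```python
from itertools import product

def is_common_payoff(utils, action_sets):
    """True iff every agent gets the same payoff at every action profile."""
    for a in product(*action_sets):
        if len({u[a] for u in utils}) != 1:
            return False
    return True

# Driving game: both players get 1 if they choose the same side, 0 otherwise.
sides = ["left", "right"]
u = {(x, y): int(x == y) for x in sides for y in sides}
print(is_common_payoff([u, u], [sides, sides]))  # True
```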
Different Categorization: Constant Sum

A two-player normal-form game is constant-sum if there exists a constant c such that for each strategy profile a ∈ A1 × A2 it is the case that u1(a) + u2(a) = c

Pure competition: one player wants to coordinate; the other player does not!
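This property is also easy to test mechanically; a sketch using Matching Pennies (mentioned earlier), which is constant-sum with c = 0, i.e. zero-sum:

```python
from itertools import product

def is_constant_sum(u1, u2, actions1, actions2):
    """True iff u1(a) + u2(a) equals the same constant c at every profile."""
    sums = {u1[a] + u2[a] for a in product(actions1, actions2)}
    return len(sums) == 1

# Matching Pennies: player 1 wins (+1) on a match, player 2 wins on a mismatch.
coins = ["H", "T"]
u1 = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
u2 = {a: -v for a, v in u1.items()}
print(is_constant_sum(u1, u2, coins, coins))  # True (here c = 0)
```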
What Can Players Do?
Strategies
Expected utility
Solution Concepts

Many ways of describing what one ought to do:
– Dominance
– Minimax
– Pareto Efficiency
– Nash Equilibria
– Correlated Equilibria

Remember that in the end game theory aspires to predict behaviour given a specification of the game.
Normatively, a solution concept is a rationale for behaviour
Concept: Dominance
Concept: Iterated Dominance
Concept: Minimax
Minimax
Computing Minimax: Linear Programming
We will look at this LP formulation in Tutorial 1, along with MDPs.
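The slides compute minimax by solving an LP. As a rough stand-in (not the LP itself, which scales to large games where this does not), the sketch below finds the row player's maximin mixed strategy in a two-row zero-sum game by direct search over the mixing probability:

```python
# Maximin for a two-row zero-sum game, by brute-force search over p (illustrative).
def maximin(M, steps=1000):
    """M[i][j] = row player's payoff; row 0 is played with probability p."""
    best_p, best_v = 0.0, float("-inf")
    for i in range(steps + 1):
        p = i / steps
        # the column player responds by minimizing the row player's expectation
        v = min(p * M[0][j] + (1 - p) * M[1][j] for j in range(len(M[0])))
        if v > best_v:
            best_p, best_v = p, v
    return best_p, best_v

# Matching Pennies: the maximin mixed strategy is p = 1/2 with game value 0.
print(maximin([[1, -1], [-1, 1]]))  # (0.5, 0.0)
```

The LP formulation replaces the grid search with variables (p, v), constraints "v <= expected payoff against each column", and objective "maximize v".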
Pick-a-Hand

• There are two players: the chooser (player I) and the hider (player II)
• The hider has two gold coins in his back pocket. At the beginning of a turn, he puts his hands behind his back and either takes out one coin and holds it in his left hand, or takes out both and holds them in his right hand.
• The chooser picks a hand and wins any coins the hider has hidden there.
• She may get nothing (if the hand is empty), or she might win one coin, or two.
Pick-a-Hand, Normal Form:

• The hider could minimize losses by placing 1 coin in his left hand; the most he can lose is 1
• If the chooser can figure out the hider's plan, he will surely lose that 1
• If the hider thinks the chooser might strategise, he has an incentive to play R2, …
• All the hider can guarantee is a maximum loss of 1 coin
• Similarly, the chooser might try to maximise gain, picking R
• However, if the hider strategizes, the chooser ends up with zero
• So, the chooser can't actually guarantee winning anything
Pick-a-Hand, with Mixed Strategies

• Suppose that the chooser decides to choose R with probability p and L with probability 1 − p
• If the hider were to play the pure strategy R2, his expected loss would be 2p
• If he were to play L1, his expected loss is 1 − p
• The chooser maximizes her gains by choosing p so as to maximize min{2p, 1 − p}
• Thus, by choosing R with probability 1/3 and L with probability 2/3, the chooser assures an expected payoff of 2/3, regardless of whether the hider knows her strategy
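The arithmetic in the last two bullets can be checked directly:

```python
# The chooser's guaranteed expected payoff when playing R with probability p
# is min{2p, 1 - p}; the two terms cross at p = 1/3, which is the maximizer.
def guaranteed(p):
    return min(2 * p, 1 - p)

print(guaranteed(1 / 3))                 # ~0.667, i.e. the safety value 2/3
print(guaranteed(0.5), guaranteed(0.1))  # any other p does strictly worse
```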
Mixed Strategy for the Hider

• The hider will play R2 with some probability q and L1 with probability 1 − q
• The payoff for the chooser is 2q if she picks R, and 1 − q if she picks L
• If she knows q, she will choose the strategy corresponding to the maximum of the two values.
• If the hider knows the chooser's plan, he will choose q = 1/3 to minimize this maximum, guaranteeing that his expected payout is 2/3 (because 2/3 = 2q = 1 − q)
• The chooser can assure an expected gain of 2/3, and the hider can assure an expected loss of no more than 2/3, regardless of what either knows of the other's strategy.
Safety Value as Incentive

• Clearly, without some extra incentive, it is not in the hider's interest to play Pick-a-Hand, because he can only lose by playing.
• Thus, we can imagine that the chooser pays the hider to entice him into joining the game.
• 2/3 is the maximum amount that the chooser should pay him in order to gain his participation.
Another Game

Mixed strategies:
• Suppose player I plays T with probability p and B with probability 1 − p
• Player II plays L with probability q and R with probability 1 − q
• For player I, the expected payoff is 2(1 − q) for playing pure strategy T, and 4q + 1 for playing pure strategy B.
• If she knows q, she'll pick the strategy corresponding to max{2(1 − q), 4q + 1}
• Player II can choose q = 1/6 so as to minimize this maximum, and the expected amount player II will pay player I is 5/3.
The Game Analysed Graphically

If player I knows q, she'll pick the strategy based on max{2(1 − q), 4q + 1}. Player II can choose q = 1/6 so as to minimize this maximum. The expected amount player II will pay player I is 5/3.

For player II, the expected loss is 5(1 − p) if he plays pure strategy L and 1 + p if he plays pure strategy R; he will aim to minimize this expected payout. In order to maximize this minimum, player I will choose p = 2/3, yielding an expected gain of 5/3.
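Both guarantees above can be verified numerically:

```python
# Player II picks q to minimize max{2(1 - q), 4q + 1};
# player I picks p to maximize min{5(1 - p), 1 + p}.
def pay_I_worst(q):   # the most player I can extract once q is known
    return max(2 * (1 - q), 4 * q + 1)

def pay_II_worst(p):  # the least player I receives once p is known
    return min(5 * (1 - p), 1 + p)

print(pay_I_worst(1 / 6))   # ~1.667, i.e. 5/3
print(pay_II_worst(2 / 3))  # ~1.667: both guarantees coincide at the value 5/3
```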
General Formulation

The mixed strategies can be represented as follows, where xi are pure strategies of player I and yi are pure strategies of player II
Condition for Optimality

• A mixed strategy is optimal for player I if:
• Similarly, a mixed strategy is optimal for player II if:
A General Result
Equilibrium as a Saddle Point
Concept: Pareto Efficiency
Pareto Efficiency
Concept: Nash Equilibrium
Nash Equilibrium
Nash Equilibrium - Example
Nash Equilibrium - Example
Concept: Correlated Equilibrium
Correlated Equilibrium
Correlated Equilibrium
Stochastic Games
Stochastic Game - Setup
Stochastic Game - Policies
Stochastic Game - Example
Stochastic Game - Remarks
Nash Equilibria - Stochastic Game
Nash Equilibria - Stochastic Game
Incomplete Information

• So far, we assumed that everything relevant about the game being played is common knowledge to all the players:
  – the number of players
  – the actions available to each
  – the payoff vector associated with each action vector
• This is true even for imperfect-information games
  – The actual moves aren't common knowledge, but the game is
• Next, consider games of incomplete (not imperfect) information
  – Players are uncertain about the game being played
Bayesian Games

• We know the set G of all possible games, but we do not know which game in G is being played
  – We have enough information to put a probability distribution over games
• A Bayesian game is a class of games G that satisfies two fundamental conditions
• Condition 1:
  – The games in G have the same number of agents, and the same strategy space (set of possible strategies) for each agent. The only difference is in the payoffs of the strategies.
• This condition isn't very restrictive
  – Other types of uncertainty can be reduced to the above by reformulating the problem
An Example

• Suppose we do not know whether player 2 only has strategies L and R, or also an additional strategy C:
• If player 2 does not have strategy C, this is equivalent to having a strategy C that is strictly dominated by other strategies:
  – Nash equilibria for G1' are the same as for G1
• The problem is reduced to whether C's payoffs are those of G1' or G2
Bayesian Games

• Condition 2 (common prior):
  – The probability distribution over the games in G is common knowledge (i.e., known to all the agents)
• So a Bayesian game defines
  – the uncertainties of agents about the game being played,
  – what each agent believes the other agents believe about the game being played
• The beliefs of the different agents are posterior probabilities
  – Combine the common prior distribution with individual "private signals" (what is "revealed" to the individual players)
• The common-prior assumption rules out whole families of games
  – But it greatly simplifies the theory, so most work in game theory uses it
The Bayesian Game Model

A Bayesian game consists of
• a set of games that differ only in their payoffs
• a common (known to all players) prior distribution over them
• for each agent, a partition structure (set of information sets) over the games
Bayesian Game: Information Sets Definition

A Bayesian game is a 4-tuple (N, G, P, I)
• N is a set of agents
• G is a set of N-agent games
  – For every agent i, every game in G has the same strategy space
• P is a common prior over G
  – common: common knowledge (known to all the agents)
  – prior: probability before learning any additional information
• I = (I1, …, IN) is a tuple of partitions of G, one for each agent (information sets)
Example

• Suppose the randomly chosen game is MP
• Agent 1's information set is I1,1
  – 1 knows it is MP or PD
  – 1 can infer posterior probabilities for each
• Agent 2's information set is I2,1
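The posterior inference mentioned above is just Bayes' rule restricted to the information set: condition the common prior on the set of games the agent cannot distinguish, and renormalize. A sketch with assumed toy prior values (not the numbers on the slide):

```python
# Posterior over games given an information set (restrict prior, renormalize).
prior = {"MP": 0.3, "PD": 0.2, "Coord": 0.5}  # assumed common prior over G
I_1_1 = ["MP", "PD"]                           # agent 1's information set

def posterior(prior, info_set):
    z = sum(prior[g] for g in info_set)        # probability mass of the set
    return {g: prior[g] / z for g in info_set}

print(posterior(prior, I_1_1))  # roughly {'MP': 0.6, 'PD': 0.4}
```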
Another Interpretation: Extensive Form
Epistemic Types

• We can assume the only thing players are uncertain about is the game's utility function
• Thus we can define uncertainty directly over a game's utility function
• All this is common knowledge; each agent knows its own type
Types

An agent's type consists of all the information it has that isn't common knowledge, e.g.,
• The agent's actual payoff function
• The agent's beliefs about other agents' payoffs
• The agent's beliefs about their beliefs about his own payoff
• Any other higher-order beliefs
Strategies

Similar to what we had in imperfect-information games:
• A pure strategy for player i maps each of i's types to an action
  – what i would play if i had that type
• A mixed strategy si is a probability distribution over pure strategies
  – si(ai | θj) = Pr[i plays action ai | i's type is θj]
• Many kinds of expected utility: ex post, ex interim, and ex ante
  – They depend on what we know about the players' types
Expected Utility
Bayes-Nash Equilibrium
Computing Bayes-Nash Equilibria

• The idea is to construct a payoff matrix for the entire Bayesian game, and find equilibria on that matrix
• Write each of the pure strategies as a list of actions, one for each type:
Computing Bayes-Nash Equilibria

Compute the ex ante expected utility for each pure-strategy profile:
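A hedged sketch of this ex ante computation with an assumed two-type toy game (the type names, actions, prior, and payoffs are illustrative, not from the slides). A pure strategy for the typed player is a map from type to action; ex ante utility averages over types under the common prior:

```python
# Ex ante expected utility for player 1, who has two possible types.
prior = {"theta_a": 0.5, "theta_b": 0.5}  # assumed common prior over types
u1 = {  # u1[type][(a1, a2)] = player 1's payoff in the game for that type
    "theta_a": {("U", "L"): 2, ("U", "R"): 0, ("D", "L"): 0, ("D", "R"): 1},
    "theta_b": {("U", "L"): 0, ("U", "R"): 3, ("D", "L"): 1, ("D", "R"): 0},
}

def ex_ante_u1(s1, a2):
    """Average over types: play s1[type] against the opponent's action a2."""
    return sum(prior[t] * u1[t][(s1[t], a2)] for t in prior)

# Pure strategy "U if theta_a, D if theta_b" against opponent action L:
print(ex_ante_u1({"theta_a": "U", "theta_b": "D"}, "L"))  # 0.5*2 + 0.5*1 = 1.5
```

Filling such a value in for every pure-strategy profile yields the ordinary payoff matrix on which equilibria are then computed.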
Computing Bayes-Nash Equilibria
Computing Bayes-Nash Equilibria

… and so on
Acknowledgements

Many slides are adapted from the following sources:
• Tutorial at IJCAI 2003 by Prof Peter Stone, University of Texas
• Y. Peres, Game Theory, Alive (Lecture Notes)
• Game Theory lectures by Prof. Dana Nau, University of Maryland