Decision Making in Robots and Autonomous Agents
Introduction to Game Theory; Stochastic & Bayesian Games
Subramanian Ramamoorthy, School of Informatics
4 February, 2014
Robots Often Face Strategic Adversaries

Key issue we seek to model: misaligned/conflicting interests
On Self-Interest

What does it mean to say that agents are self-interested?
• It does not necessarily mean that they want to cause harm to each other, or even that they care only about themselves.
• Instead, it means that each agent has his own description of which states of the world he likes, which can include good things happening to other agents, and that he acts in an attempt to bring about these states of the world (a better term: inter-dependent decision making)
Basic Constructs of a Game (Extensive Form)

• Move: A point of decision for the player, defined by a set of actions that could be chosen (e.g., I am holding King-10-2). In an abstract form:
• Choice: The particular action that has actually been chosen (play King)
• Play: A sequence of choices, one following another, until the game terminates (King -> 10 -> 2)

The move looks identical in a different game involving, say, passing - calling - betting
Abstract Relation Between Moves
Number refers to which player has to make a choice
Game Trees

• The previous picture is often referred to as a game tree, a listing of moves and consequences
• It is a tree in the mathematical sense
• Strictly speaking, many games should have a game graph (not just a tree) - Why?
• However, the convention is to treat the play (history of moves) as unique even if it revisits a specific state
Information Sets

• What can each player know when he makes a choice at a move?
• We are not asking about their way of playing, just: what is the most they could possibly know without violating the rules of the game?
  – e.g., think of card games with chance moves, or where a player picks a card and places it face down on the table
• Rules of the game specify which moves are indistinguishable
• Two requirements for an information set:
  – Moves must be assigned to the same player
  – Moves must have the same number of alternatives
Information Sets - Pictorially
Specification of an Extensive Form Game

• A finite tree with a distinguished node
• A partition of the nodes of the tree into n+1 sets (specifying who takes the move)
• A probability distribution over the branches of chance moves
• A refinement of the player partition into information sets (characterizes, for each player, the ambiguity of location in the game tree at each of his moves)
• An identification of corresponding branches for each of the moves in each information set
• A set of outcomes and an assignment of outcomes to each endpoint of the tree
How Do Players Actually Choose?

• Each player has a linear utility function Mi over outcomes
• Each player is fully aware of the rules, and will maximize expected utility
• Pure strategy: a prescription of a decision for each situation
• For any fixed strategy, given the rules of the game, the game tree can be evaluated directly to yield a value
• If there are chance moves, the selection of strategies by the players defines a distribution over plays, and the payoff is the expected value w.r.t. this distribution
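The last two bullets can be sketched concretely. Below is a minimal, hypothetical evaluator (the tree encoding is my own assumption, not from the slides): once every player's pure strategy is fixed, each decision node collapses to a single child, so only chance nodes remain and the value of the game is an expectation over plays.

```python
# Evaluate a game tree after all pure strategies are fixed (assumed encoding):
#   ("leaf", payoff)                  - terminal node with its payoff
#   ("move", child)                   - decision node already resolved by a strategy
#   ("chance", [(prob, child), ...])  - chance node with branch probabilities
def evaluate(node):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "move":
        return evaluate(node[1])
    # chance node: expected value over its branches
    return sum(p * evaluate(child) for p, child in node[1])

tree = ("move", ("chance", [(0.5, ("leaf", 2)),
                            (0.5, ("move", ("leaf", 0)))]))
print(evaluate(tree))  # 1.0
```

With no chance nodes the recursion simply walks the unique play; with chance nodes it computes exactly the expectation described in the last bullet.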
A Different Simple Model of a Game

• Two decision makers
  – Robot (has an action space: a)
  – Adversary (has an action space: θ)
• Cost or payoff (to use the term common in game theory) depends on the actions of both decision makers: R(a, θ)
  – denote as a matrix corresponding to the product space

This is the normal form: simultaneous choice over moves
Representing Payoffs

In a general, bi-matrix, normal form game, the combined actions (a1, a2, …, an) form an action profile a ∈ A

Figure labels: action sets of the players; payoff function, a.k.a. utility, u2(a)
Example: Rock-Paper-Scissors

• Famous children's game
• Two players; each player simultaneously picks an action, which is evaluated as follows:
  – Rock beats Scissors
  – Scissors beats Paper
  – Paper beats Rock
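The cyclic winning rule above can be encoded directly; a small sketch (the zero-sum payoff convention of +1/0/-1 is an assumption) giving player 1's payoff:

```python
# Rock-Paper-Scissors: each action beats exactly one other action.
beats = {"Rock": "Scissors", "Scissors": "Paper", "Paper": "Rock"}

def u1(a1, a2):
    """Player 1's payoff: +1 for a win, -1 for a loss, 0 for a tie."""
    if a1 == a2:
        return 0
    return 1 if beats[a1] == a2 else -1

print(u1("Rock", "Scissors"), u1("Paper", "Rock"), u1("Scissors", "Scissors"))
# 1 1 0
```

Player 2's payoff is just -u1(a1, a2), which is what makes the game zero-sum.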
TCP Game

• Imagine there are only two internet users: you and me
• Internet traffic is governed by the TCP protocol, one feature of which is the backoff mechanism: when the network is congested, back off and reduce transmission rates for a while
• Imagine that there are two implementations: C (correct, does what is intended) and D (defective)
• If you both adopt C, packet delay is 1 ms; if you both adopt D, packet delay is 3 ms
• If one adopts C but the other adopts D, then the D user gets no delay and the C user suffers a 4 ms delay
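A sketch of these payoffs as a bi-matrix, using negative delays as utilities (my encoding, consistent with the later remark that this game is the Prisoner's Dilemma):

```python
# TCP game: actions are "C" (correct backoff) and "D" (defective); utilities
# are negative delays in ms, so less delay means higher payoff.
payoff = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-4, 0),
    ("D", "C"): (0, -4),
    ("D", "D"): (-3, -3),
}

def best_response(options, opponent_action, player):
    """Action maximizing this player's payoff against a fixed opponent action."""
    def u(a):
        profile = (a, opponent_action) if player == 0 else (opponent_action, a)
        return payoff[profile][player]
    return max(options, key=u)

# D is a best response to either opponent action, so D strictly dominates C.
print(best_response(["C", "D"], "C", 0), best_response(["C", "D"], "D", 0))
```

Both players defecting (D, D) is then the predictable outcome, even though (C, C) gives both a shorter delay.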
TCP Game in Normal Form

Note that this is another way of writing a bi-matrix game: the first number represents the payoff of the row player and the second number is the payoff of the column player
Some Famous Matrix Examples - What are they Capturing?

• Prisoner's Dilemma: Cooperate or Defect (same as the TCP game)
• Bach or Stravinsky (von Neumann called it Battle of the Sexes)
• Matching Pennies: Try to get the same outcome, Heads/Tails
Different Categorization: Common Payoff

A common-payoff game is a game in which for all action profiles a ∈ A1 × ··· × An and any pair of agents i, j, it is the case that ui(a) = uj(a)

Pure coordination: e.g., driving on a side of the road
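A quick mechanical check of this definition, using the driving-side coordination example (the 0/1 payoff encoding is assumed):

```python
from itertools import product

def is_common_payoff(utils, action_sets):
    """True iff every agent gets the same payoff at every action profile."""
    for a in product(*action_sets):
        if len({u[a] for u in utils}) != 1:
            return False
    return True

# Driving game: both players get 1 if they choose the same side, 0 otherwise.
sides = ["left", "right"]
u = {(x, y): int(x == y) for x in sides for y in sides}
print(is_common_payoff([u, u], [sides, sides]))  # True
```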
Different Categorization: Constant Sum

A two-player normal-form game is constant-sum if there exists a constant c such that for each strategy profile a ∈ A1 × A2 it is the case that u1(a) + u2(a) = c

Pure competition: one player wants to coordinate; the other player does not!
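This property is also easy to test mechanically; a sketch using Matching Pennies (mentioned earlier), which is constant-sum with c = 0, i.e. zero-sum:

```python
from itertools import product

def is_constant_sum(u1, u2, actions1, actions2):
    """True iff u1(a) + u2(a) equals the same constant c at every profile."""
    sums = {u1[a] + u2[a] for a in product(actions1, actions2)}
    return len(sums) == 1

# Matching Pennies: player 1 wins (+1) on a match, player 2 wins on a mismatch.
coins = ["H", "T"]
u1 = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
u2 = {a: -v for a, v in u1.items()}
print(is_constant_sum(u1, u2, coins, coins))  # True (here c = 0)
```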
What Can Players Do?
Strategies
Expected utility
Solution Concepts

Many ways of describing what one ought to do:
– Dominance
– Minimax
– Pareto Efficiency
– Nash Equilibria
– Correlated Equilibria

Remember that in the end game theory aspires to predict behaviour given a specification of the game.
Normatively, a solution concept is a rationale for behaviour
Concept: Dominance
Concept: Iterated Dominance
Concept: Minimax
Minimax
Computing Minimax: Linear Programming
We will look at this LP formulation in Tutorial 1, along with MDPs.
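The slides compute minimax by solving an LP. As a rough stand-in (not the LP itself, which scales to large games where this does not), the sketch below finds the row player's maximin mixed strategy in a two-row zero-sum game by direct search over the mixing probability:

```python
# Maximin for a two-row zero-sum game, by brute-force search over p (illustrative).
def maximin(M, steps=1000):
    """M[i][j] = row player's payoff; row 0 is played with probability p."""
    best_p, best_v = 0.0, float("-inf")
    for i in range(steps + 1):
        p = i / steps
        # the column player responds by minimizing the row player's expectation
        v = min(p * M[0][j] + (1 - p) * M[1][j] for j in range(len(M[0])))
        if v > best_v:
            best_p, best_v = p, v
    return best_p, best_v

# Matching Pennies: the maximin mixed strategy is p = 1/2 with game value 0.
print(maximin([[1, -1], [-1, 1]]))  # (0.5, 0.0)
```

The LP formulation replaces the grid search with variables (p, v), constraints "v <= expected payoff against each column", and objective "maximize v".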
Pick-a-Hand

• There are two players: the chooser (player I) and the hider (player II)
• The hider has two gold coins in his back pocket. At the beginning of a turn, he puts his hands behind his back and either takes out one coin and holds it in his left hand, or takes out both and holds them in his right hand.
• The chooser picks a hand and wins any coins the hider has hidden there.
• She may get nothing (if the hand is empty), or she might win one coin, or two.
Pick-a-Hand, Normal Form:

• The hider could minimize losses by placing 1 coin in his left hand; the most he can lose is 1
• If the chooser can figure out the hider's plan, he will surely lose that 1
• If the hider thinks the chooser might strategise, he has an incentive to play R2, …
• All the hider can guarantee is a maximum loss of 1 coin
• Similarly, the chooser might try to maximise gain, picking R
• However, if the hider strategizes, the chooser ends up with zero
• So, the chooser can't actually guarantee winning anything
Pick-a-Hand, with Mixed Strategies

• Suppose that the chooser decides to choose R with probability p and L with probability 1 − p
• If the hider were to play the pure strategy R2, his expected loss would be 2p
• If he were to play L1, his expected loss is 1 − p
• The chooser maximizes her gains by choosing p so as to maximize min{2p, 1 − p}
• Thus, by choosing R with probability 1/3 and L with probability 2/3, the chooser assures an expected payoff of 2/3, regardless of whether the hider knows her strategy
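The arithmetic in the last two bullets can be checked directly:

```python
# The chooser's guaranteed expected payoff when playing R with probability p
# is min{2p, 1 - p}; the two terms cross at p = 1/3, which is the maximizer.
def guaranteed(p):
    return min(2 * p, 1 - p)

print(guaranteed(1 / 3))                 # ~0.667, i.e. the safety value 2/3
print(guaranteed(0.5), guaranteed(0.1))  # any other p does strictly worse
```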
Mixed Strategy for the Hider

• The hider will play R2 with some probability q and L1 with probability 1 − q
• The payoff for the chooser is 2q if she picks R, and 1 − q if she picks L
• If she knows q, she will choose the strategy corresponding to the maximum of the two values.
• If the hider knows the chooser's plan, he will choose q = 1/3 to minimize this maximum, guaranteeing that his expected payout is 2/3 (because 2/3 = 2q = 1 − q)
• The chooser can assure an expected gain of 2/3, and the hider can assure an expected loss of no more than 2/3, regardless of what either knows of the other's strategy.
Safety Value as Incentive

• Clearly, without some extra incentive, it is not in the hider's interest to play Pick-a-Hand, because he can only lose by playing.
• Thus, we can imagine that the chooser pays the hider to entice him into joining the game.
• 2/3 is the maximum amount that the chooser should pay him in order to gain his participation.
Another Game

Mixed strategies:
• Suppose player I plays T with probability p and B with probability 1 − p
• Player II plays L with probability q and R with probability 1 − q
• For player I, the expected payoff is 2(1 − q) for playing pure strategy T, and 4q + 1 for playing pure strategy B.
• If she knows q, she'll pick the strategy corresponding to max{2(1 − q), 4q + 1}
• Player II can choose q = 1/6 so as to minimize this maximum, and the expected amount player II will pay player I is 5/3.
The Game Analysed Graphically

If player I knows q, she'll pick the strategy based on max{2(1 − q), 4q + 1}. Player II can choose q = 1/6 so as to minimize this maximum. The expected amount player II will pay player I is 5/3.

For player II, the expected loss is 5(1 − p) if he plays pure strategy L and 1 + p if he plays pure strategy R; he will aim to minimize this expected payout. In order to maximize this minimum, player I will choose p = 2/3, yielding an expected gain of 5/3.
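Both guarantees above can be verified numerically:

```python
# Player II picks q to minimize max{2(1 - q), 4q + 1};
# player I picks p to maximize min{5(1 - p), 1 + p}.
def pay_I_worst(q):   # the most player I can extract once q is known
    return max(2 * (1 - q), 4 * q + 1)

def pay_II_worst(p):  # the least player I receives once p is known
    return min(5 * (1 - p), 1 + p)

print(pay_I_worst(1 / 6))   # ~1.667, i.e. 5/3
print(pay_II_worst(2 / 3))  # ~1.667: both guarantees coincide at the value 5/3
```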
General Formulation

The mixed strategies can be represented as follows, where xi are pure strategies of player I and yi are pure strategies of player II
Condition for Optimality

• A mixed strategy is optimal for player I if:
• Similarly, a mixed strategy is optimal for player II if:
A General Result
Equilibrium as a Saddle Point
Concept: Pareto Efficiency
Pareto Efficiency
Concept: Nash Equilibrium
Nash Equilibrium
Nash Equilibrium - Example
Nash Equilibrium - Example
Concept: Correlated Equilibrium
Correlated Equilibrium
Correlated Equilibrium
Stochastic Games
Stochastic Game - Setup
Stochastic Game - Policies
Stochastic Game - Example
Stochastic Game - Remarks
Nash Equilibria - Stochastic Game
Nash Equilibria - Stochastic Game
Incomplete Information

• So far, we assumed that everything relevant about the game being played is common knowledge to all the players:
  – the number of players
  – the actions available to each
  – the payoff vector associated with each action vector
• This is true even for imperfect-information games
  – The actual moves aren't common knowledge, but the game is
• Next, consider games of incomplete (not imperfect) information
  – Players are uncertain about the game being played
Bayesian Games

• We know the set G of all possible games, but we do not know which game in G is being played
  – We have enough information to put a probability distribution over games
• A Bayesian game is a class of games G that satisfies two fundamental conditions
• Condition 1:
  – The games in G have the same number of agents, and the same strategy space (set of possible strategies) for each agent. The only difference is in the payoffs of the strategies.
• This condition isn't very restrictive
  – Other types of uncertainty can be reduced to the above by reformulating the problem
An Example

• Suppose we do not know whether player 2 only has strategies L and R, or also an additional strategy C:
• If player 2 does not have strategy C, this is equivalent to having a strategy C that is strictly dominated by other strategies:
  – Nash equilibria for G1' are the same as for G1
• The problem is reduced to whether C's payoffs are those of G1' or G2
Bayesian Games

• Condition 2 (common prior):
  – The probability distribution over the games in G is common knowledge (i.e., known to all the agents)
• So a Bayesian game defines
  – the uncertainties of agents about the game being played,
  – what each agent believes the other agents believe about the game being played
• The beliefs of the different agents are posterior probabilities
  – Combine the common prior distribution with individual "private signals" (what is "revealed" to the individual players)
• The common-prior assumption rules out whole families of games
  – But it greatly simplifies the theory, so most work in game theory uses it
The Bayesian Game Model

A Bayesian game consists of
• a set of games that differ only in their payoffs
• a common (known to all players) prior distribution over them
• for each agent, a partition structure (set of information sets) over the games
Bayesian Game: Information Sets Definition

A Bayesian game is a 4-tuple (N, G, P, I)
• N is a set of agents
• G is a set of N-agent games
  – For every agent i, every game in G has the same strategy space
• P is a common prior over G
  – common: common knowledge (known to all the agents)
  – prior: probability before learning any additional information
• I = (I1, …, IN) is a tuple of partitions of G, one for each agent (information sets)
Example

• Suppose the randomly chosen game is MP
• Agent 1's information set is I1,1
  – 1 knows it is MP or PD
  – 1 can infer posterior probabilities for each
• Agent 2's information set is I2,1
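The posterior inference mentioned above is just Bayes' rule restricted to the information set: condition the common prior on the set of games the agent cannot distinguish, and renormalize. A sketch with assumed toy prior values (not the numbers on the slide):

```python
# Posterior over games given an information set (restrict prior, renormalize).
prior = {"MP": 0.3, "PD": 0.2, "Coord": 0.5}  # assumed common prior over G
I_1_1 = ["MP", "PD"]                           # agent 1's information set

def posterior(prior, info_set):
    z = sum(prior[g] for g in info_set)        # probability mass of the set
    return {g: prior[g] / z for g in info_set}

print(posterior(prior, I_1_1))  # roughly {'MP': 0.6, 'PD': 0.4}
```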
Another Interpretation: Extensive Form
Epistemic Types

• We can assume the only thing players are uncertain about is the game's utility function
• Thus we can define uncertainty directly over a game's utility function
• All this is common knowledge; each agent knows its own type
Types

An agent's type consists of all the information it has that isn't common knowledge, e.g.,
• The agent's actual payoff function
• The agent's beliefs about other agents' payoffs
• The agent's beliefs about their beliefs about his own payoff
• Any other higher-order beliefs
Strategies

Similar to what we had in imperfect-information games:
• A pure strategy for player i maps each of i's types to an action
  – what i would play if i had that type
• A mixed strategy si is a probability distribution over pure strategies
  – si(ai | θj) = Pr[i plays action ai | i's type is θj]
• Many kinds of expected utility: ex post, ex interim, and ex ante
  – They depend on what we know about the players' types
Expected Utility
Bayes-Nash Equilibrium
Computing Bayes-Nash Equilibria

• The idea is to construct a payoff matrix for the entire Bayesian game, and find equilibria on that matrix
• Write each of the pure strategies as a list of actions, one for each type:
Computing Bayes-Nash Equilibria

Compute the ex ante expected utility for each pure-strategy profile:
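A hedged sketch of this ex ante computation with an assumed two-type toy game (the type names, actions, prior, and payoffs are illustrative, not from the slides). A pure strategy for the typed player is a map from type to action; ex ante utility averages over types under the common prior:

```python
# Ex ante expected utility for player 1, who has two possible types.
prior = {"theta_a": 0.5, "theta_b": 0.5}  # assumed common prior over types
u1 = {  # u1[type][(a1, a2)] = player 1's payoff in the game for that type
    "theta_a": {("U", "L"): 2, ("U", "R"): 0, ("D", "L"): 0, ("D", "R"): 1},
    "theta_b": {("U", "L"): 0, ("U", "R"): 3, ("D", "L"): 1, ("D", "R"): 0},
}

def ex_ante_u1(s1, a2):
    """Average over types: play s1[type] against the opponent's action a2."""
    return sum(prior[t] * u1[t][(s1[t], a2)] for t in prior)

# Pure strategy "U if theta_a, D if theta_b" against opponent action L:
print(ex_ante_u1({"theta_a": "U", "theta_b": "D"}, "L"))  # 0.5*2 + 0.5*1 = 1.5
```

Filling such a value in for every pure-strategy profile yields the ordinary payoff matrix on which equilibria are then computed.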
Computing Bayes-Nash Equilibria
Computing Bayes-Nash Equilibria

… and so on
Acknowledgements

Many slides are adapted from the following sources:
• Tutorial at IJCAI 2003 by Prof Peter Stone, University of Texas
• Y. Peres, Game Theory, Alive (Lecture Notes)
• Game Theory lectures by Prof. Dana Nau, University of Maryland