AM 121: Intro to Optimization Models and Methods

Yiling Chen, SEAS

Lecture 11: Game theory

Source: sites.fas.harvard.edu/~apm121/lectures/lec11-w.pdf

Lesson Plan

- Two player, zero-sum games
- 4 techniques to solve such games
- The minimax theorem and Nash equilibrium
- Solving poker (for those interested)

Very elegant connection between game theory and duality theory!

Example: Political Campaign

Two days, two cities. Available strategies:

- one day in each city (E)
- two days in city A
- two days in city B

Payoff table (each cell: payoff to row player, payoff to column player):

             column
             E        A        B
row   E     1, -1    2, -2    4, -4
      A     1, -1    0,  0    5, -5
      B     0,  0    1, -1   -1,  1

Because the game is zero-sum, it is enough to show the payoff to the row player:

             column
             E     A     B
row   E      1     2     4
      A      1     0     5
      B      0     1    -1

The family of games we consider

Two player, zero sum games

Denote entry $a_{ij} \in \mathbb{R}$ in payoff table, the payoff to row when row plays $i$ and column plays $j$

$r(i,j) = a_{ij}$, $c(i,j) = -a_{ij}$

$\Rightarrow c(i,j) + r(i,j) = 0$

Goal: compute a solution to the game that provides an optimal strategy for each player

Solution Concept: Nash Equilibrium

Roughly speaking, a strategy profile is a NE if neither player can gain by deviating unilaterally, given the other player's strategy.

I. Solving via Iterated removal of strictly dominated strategies

Row: Strategy $i$ is strictly dominated by strategy $i'$ when

$r(i',j) \ge r(i,j)$ for all $j$, and $r(i',j) > r(i,j)$ for some $j$

Column: Strategy $j$ is strictly dominated by strategy $j'$ when

$c(i,j') \ge c(i,j)$ for all $i$, and $c(i,j') > c(i,j)$ for some $i$

Can apply iteratively: iterated removal of strictly dominated strategies

Example: Political Campaign

Two days, two cities. Available strategies:

- one day in each city (E)
- two days in city A
- two days in city B

Can solve by iterated removal of strictly dominated strategies:

         E     A     B
   E     1     2     4
   A     1     0     5
   B     0     1    -1

Order of removal (the numbers 1 to 4 on the slide):

1. Row B (dominated by row E)
2. Column B (dominated by column E once row B is removed)
3. Row A (dominated by row E once column B is removed)
4. Column A (dominated by column E once row A is removed)

Solution is (E,E). Say game has value 1. (Payoff to row player in solution.)
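The removal procedure can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `iterated_elimination` and the convention of always scanning rows before columns are assumptions, and the dominance test is the one stated in the notes (at least as good everywhere, strictly better somewhere).

```python
def iterated_elimination(A):
    """Iteratively remove dominated pure strategies in a zero-sum matrix game.

    A[i][j] is the payoff to row; column's payoff is -A[i][j].
    Returns (surviving rows, surviving columns, removal log).
    """
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))
    log = []

    def find_dominated(own, other, payoff):
        # Return the first strategy s in `own` dominated by some other t in `own`.
        for s in own:
            for t in own:
                if t == s:
                    continue
                diffs = [payoff(t, o) - payoff(s, o) for o in other]
                if all(d >= 0 for d in diffs) and any(d > 0 for d in diffs):
                    return s
        return None

    while True:
        r = find_dominated(rows, cols, lambda i, j: A[i][j])
        if r is not None:
            rows.remove(r)
            log.append(("row", r))
            continue
        c = find_dominated(cols, rows, lambda j, i: -A[i][j])
        if c is not None:
            cols.remove(c)
            log.append(("col", c))
            continue
        return rows, cols, log


# Political campaign, strategies indexed 0 = E, 1 = A, 2 = B.
campaign = [[1, 2, 4], [1, 0, 5], [0, 1, -1]]
rows, cols, log = iterated_elimination(campaign)
print(rows, cols, log)
```

On the campaign table this leaves only (E,E), removing row B, column B, row A, column A in that order.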


Political Campaign: Variation 1

No dominated strategies!

         E     A     B
   E    -3    -2     6
   A     2     0     2
   B     5    -2    -4

The arrow from (E,E) to (B,E) indicates that (B,E) has more payoff for row than (E,E) and that row would deviate and play B if it knew column was playing E.

Not all arrows are shown! (E.g., there is also an arrow from (A,E) to (B,E), not shown.) Unique stable solution is (A,A).

Political Campaign: Variation 1

                E     A     B    min value
   E           -3    -2     6       -3
   A            2     0     2        0     <- maximin strategy
   B            5    -2    -4       -4
   max value    5     0     6
                      ^
                      minimax strategy

- min value: worst case payoff row can expect to achieve when it plays each strategy
- max value: worst case payoff column can expect to achieve when it plays each strategy
- (A,A) is a saddle point

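II. Solving via Saddle points

Minimax criterion:

- Row: $\max_i \min_j r(i,j)$
- Column: $\min_j \max_i r(i,j)$ $(= -\max_j \min_i c(i,j))$

Saddle point: a cell where the two values coincide, so the entry is the minimum of its row and the maximum of its column.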

This is why the maximin and minimax strategies form a stable point and solve this game. Even if column knows row is playing A, column does not want to deviate (and vice versa)
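A small sketch of the minimax criterion in pure strategies. The helper name `pure_minimax` is hypothetical; it computes the row minima, column maxima, and every cell that is simultaneously the minimum of its row and the maximum of its column.

```python
def pure_minimax(A):
    """Return (maximin value, minimax value, list of saddle-point cells)."""
    n_rows, n_cols = len(A), len(A[0])
    row_min = [min(row) for row in A]
    col_max = [max(A[i][j] for i in range(n_rows)) for j in range(n_cols)]
    saddles = [(i, j) for i in range(n_rows) for j in range(n_cols)
               if A[i][j] == row_min[i] and A[i][j] == col_max[j]]
    return max(row_min), min(col_max), saddles


# Strategies indexed 0 = E, 1 = A, 2 = B.
variation1 = [[-3, -2, 6], [2, 0, 2], [5, -2, -4]]
variation2 = [[0, -2, 2], [3, 4, -3], [2, 3, -4]]
print(pure_minimax(variation1))
print(pure_minimax(variation2))
```

For Variation 1 the two values agree at 0 and the only saddle point is (A,A). For Variation 2 the values differ (-2 versus 2), so the list of saddle points is empty.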

Political Campaign: Variation 2

Strategy B for row is dominated by strategy A. Apply minimax criterion to the remaining rows:

                E     A     B    min value
   E            0    -2     2       -2     <- maximin strategy for row
   A            3     4    -3       -3
   B            2     3    -4     (dominated)
   max value    3     4     2
                            ^
                            minimax strategy for column

But (E,B) is not a saddle point: the maximin value is -2 and the minimax value is 2.
$\Rightarrow$ column can deviate and play A, then row can deviate and play A, and so on.

Political Campaign: Variation 2

Consider best-response dynamics:

         E     A     B
   E     0    -2     2
   A     3     4    -3
   B     2     3    -4

No stable solution! At every pair of pure strategies, at least one player, knowing the other's strategy, has a profitable deviation.
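Best-response dynamics on this table can be traced with a short script. This is a sketch under assumptions not fixed by the slide: the players alternate, column moves first, and the names `best_response_path` and `is_stable` are made up for illustration.

```python
def best_response_path(A, start, steps):
    """Alternate best responses, column first, and record the visited cells."""
    i, j = start
    path = [(i, j)]
    for t in range(steps):
        if t % 2 == 0:
            # Column minimizes row's payoff within the current row.
            j = min(range(len(A[0])), key=lambda c: A[i][c])
        else:
            # Row maximizes its payoff within the current column.
            i = max(range(len(A)), key=lambda r: A[r][j])
        path.append((i, j))
    return path


def is_stable(A, i, j):
    """True if neither player gains by deviating unilaterally from cell (i, j)."""
    row_happy = all(A[r][j] <= A[i][j] for r in range(len(A)))
    col_happy = all(A[i][c] >= A[i][j] for c in range(len(A[0])))
    return row_happy and col_happy


# Variation 2, strategies indexed 0 = E, 1 = A, 2 = B; start at (E, B).
variation2 = [[0, -2, 2], [3, 4, -3], [2, 3, -4]]
path = best_response_path(variation2, (0, 2), steps=4)
stable_cells = [(i, j) for i in range(3) for j in range(3) if is_stable(variation2, i, j)]
print(path, stable_cells)
```

Starting from (E,B) the play cycles through (E,A), (A,A), (A,B) and back to (E,B), and no cell passes the stability check.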

Simpler Example: Odds and Evens

Two players (Odd and Even) simultaneously choose the number of fingers (1 or 2) to put out. If the sum is odd, then Odd wins $1 from Even. If the sum is even, Even wins $1 from Odd.

                 Even
               one    two
Odd    one     -1     +1
       two     +1     -1

(entries are payoffs to Odd, the row player)

No stable solution. How do you play this game?

Mixed strategies

Let $x = (x_1, \dots, x_m)$, $x_i \ge 0$, $\sum_i x_i = 1$ denote a mixed strategy of row player; $x_i$ is the probability with which row plays each strategy $i \in \{1,\dots,m\}$.

Define a maximin strategy $x^*$ as solving

$\max_x \min_{j \in \{1,\dots,n\}} \sum_{i=1}^m x_i a_{ij}$

Find the mixed strategy $x$ that maximizes my worst-case payoff (given that column knows what I will do)

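Graphical method: plot expected value to row for each pure strategy of column as mixed strategy of row varies

[Figure: Odds and Evens. Expected value (EV) to the row player on the vertical axis (from -1 to 1) against $x_1$ on the horizontal axis (from 0 to 1). One line per pure strategy of column: EV when column plays one, EV when column plays two.]

Lower envelope is the minimal payoff row would receive for each strategy $(x_1, 1-x_1)$

Maximin for row: the highest point of the lower envelope, at $x_1 = 1/2$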
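The two lines in this plot can be written out explicitly. A short derivation, using only the Odds and Evens payoff table (row player is Odd, who plays "one" with probability $x_1$):

```latex
\begin{align*}
\text{EV if Even plays one:} \quad & (-1)\,x_1 + (+1)(1 - x_1) = 1 - 2x_1 \\
\text{EV if Even plays two:} \quad & (+1)\,x_1 + (-1)(1 - x_1) = 2x_1 - 1 \\
\text{Lower envelope:} \quad & \min\{1 - 2x_1,\ 2x_1 - 1\} = -\lvert 1 - 2x_1 \rvert \\
\text{Maximin:} \quad & \max_{0 \le x_1 \le 1} \bigl(-\lvert 1 - 2x_1 \rvert\bigr) = 0
  \quad \text{attained at } x_1 = \tfrac{1}{2}
\end{align*}
```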

Mixed strategies

Let $x = (x_1, \dots, x_m)$, $x_i \ge 0$, $\sum_i x_i = 1$ denote a mixed strategy of row player; $x_i$ is the probability with which row plays each strategy $i \in \{1,\dots,m\}$.

Let $y = (y_1, \dots, y_n)$, $y_j \ge 0$, $\sum_j y_j = 1$ denote a mixed strategy of column player.

Define a maximin strategy $x^*$ as solving

$\max_x \min_{j \in \{1,\dots,n\}} \sum_{i=1}^m x_i a_{ij}$

Define a minimax strategy $y^*$ as solving

$\min_y \max_{i \in \{1,\dots,m\}} \sum_{j=1}^n y_j a_{ij}$

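Graphical method: plot expected value to row for each pure strategy of row as mixed strategy of column varies

[Figure: Odds and Evens. EV to the row player on the vertical axis (from -1 to 1) against $y_1$ on the horizontal axis (from 0 to 1). One line per pure strategy of row: EV when row plays one, EV when row plays two.]

Upper envelope is the maximal payoff $r$ $(= -c)$ to row for each strategy $(y_1, 1-y_1)$ by column

Minimax for column: the lowest point of the upper envelope, at $y_1 = 1/2$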

Mixed strategies

$x^* = (1/2, 1/2)$ and $y^* = (1/2, 1/2)$ as solutions

- Note: the value of the maximin strategy to row = the value of the minimax strategy to column = 0
- Also a stable solution: if column plays $y^*$ then row does not wish to deviate from $x^*$, and vice versa
- Consider this a solution to the game

III. Solving via the Graphical method

The graphical method can be used in a game with only two pure strategies for each player.

- Row: vary $x_1$ and plot EV to row for each possible strategy $j \in \{1,2\}$ of column. Take the lower envelope and find the $x_1^*$ that maximizes it.
- Column: vary $y_1$ and plot EV to row for each possible strategy $i \in \{1,2\}$ of row. Take the upper envelope and find the $y_1^*$ that minimizes it.

Can also use graphical method as long as one of the players has only two pure strategies; can solve for other player algebraically.
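The graphical method has a direct computational counterpart when row has two pure strategies: each column strategy gives a straight line in $x_1$, and the lower envelope of straight lines attains its maximum at an endpoint or where two lines cross. The sketch below checks exactly those candidate points with exact fractions. The function name `solve_two_row_game` is an assumption for illustration.

```python
from fractions import Fraction
from itertools import combinations


def solve_two_row_game(A):
    """Row has two pure strategies (A is 2 x n). Return (x1*, game value to row)."""
    top, bottom = A
    # Column j gives the line EV_j(x1) = bottom[j] + (top[j] - bottom[j]) * x1.
    lines = [(Fraction(b), Fraction(t) - Fraction(b)) for t, b in zip(top, bottom)]
    candidates = {Fraction(0), Fraction(1)}
    for (c1, s1), (c2, s2) in combinations(lines, 2):
        if s1 != s2:  # non-parallel lines cross exactly once
            x = (c2 - c1) / (s1 - s2)
            if 0 <= x <= 1:
                candidates.add(x)

    def lower_envelope(x):
        return min(c + s * x for c, s in lines)

    best = max(sorted(candidates), key=lower_envelope)
    return best, lower_envelope(best)


odds_evens = [[-1, 1], [1, -1]]
campaign_2x3 = [[0, -2, 2], [3, 4, -3]]   # Variation 2 after removing row B
print(solve_two_row_game(odds_evens))
print(solve_two_row_game(campaign_2x3))
```

For Odds and Evens this returns $x_1^* = 1/2$ with value 0. For the 2 x 3 campaign table (rows E and A) it returns $x_1^* = 7/11$ with value 2/11.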

IV. Solving via LP

Computing maximin strategy via LP

$\max_x \min_j \sum_{i=1}^m a_{ij} x_i$
s.t. $\sum_{i=1}^m x_i = 1$
     $x_i \ge 0$

              y1    y2    y3
              E     A     B
   x1   E     0    -2     2
   x2   A     3     4    -3

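Computing maximin strategy via LP

$\max_x \min_j \sum_{i=1}^m a_{ij} x_i$
s.t. $\sum_{i=1}^m x_i = 1$
     $x_i \ge 0$

is equivalent to the linear program

max $v$
s.t. $v \le \sum_{i=1}^m a_{ij} x_i$, $\forall j \in \{1,\dots,n\}$
     $\sum_{i=1}^m x_i = 1$
     $x_i \ge 0$, $\forall i \in \{1,\dots,m\}$
     $v$ free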

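Example: Political campaign

              y1    y2    y3
              E     A     B
   x1   E     0    -2     2
   x2   A     3     4    -3

General form:

max $v$
s.t. $v \le \sum_{i=1}^m a_{ij} x_i$, $\forall j$
     $\sum_{i=1}^m x_i = 1$
     $x_i \ge 0$, $\forall i$; $v$ free

For this table:

max $v$
s.t. $-3x_2 + v \le 0$
     $2x_1 - 4x_2 + v \le 0$
     $-2x_1 + 3x_2 + v \le 0$
     $x_1 + x_2 = 1$
     $x_1, x_2 \ge 0$; $v$ free

Solution: $x^* = (7/11, 4/11)$, $v^* = 2/11$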
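The maximin LP is small enough to solve exactly without a solver library. The sketch below is not the simplex method: it is a brute-force enumeration of basic solutions (pick $m$ inequalities to hold with equality, solve the resulting linear system together with $\sum_i x_i = 1$, keep the feasible point with the largest $v$). In practice one would pass the same LP to an LP solver; the function names `solve_exact` and `maximin_lp` are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations


def solve_exact(M, b):
    """Solve the square system M z = b with exact fractions; None if singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def maximin_lp(A):
    """max v  s.t.  v <= sum_i a_ij x_i for all j,  sum_i x_i = 1,  x >= 0."""
    m, n = len(A), len(A[0])
    # Unknowns z = (x_1, ..., x_m, v). Each row below is an inequality
    # written as an equation that holds when the inequality is tight.
    payoff_rows = [[-A[i][j] for i in range(m)] + [1] for j in range(n)]
    sign_rows = [[1 if k == i else 0 for k in range(m)] + [0] for i in range(m)]
    sum_row = [1] * m + [0]
    best = None
    for tight in combinations(payoff_rows + sign_rows, m):
        z = solve_exact([sum_row] + list(tight), [1] + [0] * m)
        if z is None:
            continue
        x, v = z[:m], z[m]
        feasible = all(xi >= 0 for xi in x) and all(
            v <= sum(A[i][j] * x[i] for i in range(m)) for j in range(n))
        if feasible and (best is None or v > best[1]):
            best = (x, v)
    return best


x_star, v_star = maximin_lp([[0, -2, 2], [3, 4, -3]])
print(x_star, v_star)
```

On the political campaign table this reproduces $x^* = (7/11, 4/11)$ and $v^* = 2/11$. The enumeration grows combinatorially with the size of the game, so it is only meant for small tables like this one.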

Computing minimax strategy via LP

$\min_y \max_i \sum_{j=1}^n y_j a_{ij}$
s.t. $\sum_{j=1}^n y_j = 1$
     $y_j \ge 0$

              y1    y2    y3
              E     A     B
   x1   E     0    -2     2
   x2   A     3     4    -3

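Computing minimax strategy via LP

$\min_y \max_i \sum_{j=1}^n y_j a_{ij}$
s.t. $\sum_{j=1}^n y_j = 1$
     $y_j \ge 0$

is equivalent to the linear program

min $w$
s.t. $w \ge \sum_{j=1}^n a_{ij} y_j$, $\forall i \in \{1,\dots,m\}$
     $\sum_{j=1}^n y_j = 1$
     $y_j \ge 0$, $\forall j \in \{1,\dots,n\}$
     $w$ free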

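Example: Political campaign

              y1    y2    y3
              E     A     B
   x1   E     0    -2     2
   x2   A     3     4    -3

General form:

min $w$
s.t. $w \ge \sum_{j=1}^n a_{ij} y_j$, $\forall i$
     $\sum_{j=1}^n y_j = 1$
     $y_j \ge 0$, $\forall j$; $w$ free

For this table:

min $w$
s.t. $2y_2 - 2y_3 + w \ge 0$
     $-3y_1 - 4y_2 + 3y_3 + w \ge 0$
     $y_1 + y_2 + y_3 = 1$
     $y_1, y_2, y_3 \ge 0$; $w$ free

Solution: $y^* = (0, 5/11, 6/11)$, $w^* = 2/11$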

Minimax Theorem

Theorem (von Neumann 1928). For every zero sum, two-player game the pair of strategies $(x^*, y^*)$, optimal according to the minimax criterion, is a Nash equilibrium, with $v^* = w^* = v$ (the value of the game).

Definition. Strategies $(x, y)$ are a Nash equilibrium of a game if $r(x,y) \ge r(x',y)$ for all $x'$ and $c(x,y) \ge c(x,y')$ for all $y'$.

Theorem (Nash 1950). Existence of NE in all matrix-form games.

Given $(x,y)$, the probability of play $(i,j)$ is $x_i y_j$, so expected payoff $r(x,y) = \sum_i \sum_j x_i a_{ij} y_j$; also have $c(x,y) = -r(x,y)$

Example: Political campaign

              y1    y2    y3
              E     A     B
   x1   E     0    -2     2
   x2   A     3     4    -3

Row (maximin):

max $v$
s.t. $-3x_2 + v \le 0$
     $2x_1 - 4x_2 + v \le 0$
     $-2x_1 + 3x_2 + v \le 0$
     $x_1 + x_2 = 1$
     $x_1, x_2 \ge 0$; $v$ free

Column (minimax):

min $w$
s.t. $2y_2 - 2y_3 + w \ge 0$
     $-3y_1 - 4y_2 + 3y_3 + w \ge 0$
     $y_1 + y_2 + y_3 = 1$
     $y_1, y_2, y_3 \ge 0$; $w$ free

- The minimax problem is the dual of the maximin problem.
- Both are feasible (easy to see).
- Duality theorem $\Rightarrow$ they must have the same value: $w^* = v^*$
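The duality argument can be checked numerically on this example: any feasible $x$ gives row a guaranteed payoff $v(x)$, any feasible $y$ caps row's payoff at $w(y)$, and $v(x) \le w(y)$ always. When the two numbers meet, both strategies are optimal. The helper names `row_guarantee` and `column_guarantee` are hypothetical.

```python
from fractions import Fraction as F

A = [[0, -2, 2], [3, 4, -3]]           # rows E, A; columns E, A, B
x_star = [F(7, 11), F(4, 11)]
y_star = [F(0), F(5, 11), F(6, 11)]


def row_guarantee(A, x):
    """v(x) = min_j sum_i x_i a_ij: what row secures by playing x."""
    return min(sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0])))


def column_guarantee(A, y):
    """w(y) = max_i sum_j a_ij y_j: the most row can get when column plays y."""
    return max(sum(A[i][j] * y[j] for j in range(len(A[0]))) for i in range(len(A)))


v = row_guarantee(A, x_star)
w = column_guarantee(A, y_star)
print(v, w)

# Weak duality for an arbitrary (non-optimal) pair: uniform mixing by both players.
v_uniform = row_guarantee(A, [F(1, 2), F(1, 2)])
w_uniform = column_guarantee(A, [F(1, 3)] * 3)
print(v_uniform, w_uniform)
```

Here $v(x^*) = w(y^*) = 2/11$, while the uniform strategies give the looser bounds $-1/2 \le 4/3$.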

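General 2 x 2 game:

            y1     y2
   x1      a11    a12
   x2      a21    a22

Row:

max $0x_1 + 0x_2 + v$
s.t. $-a_{11}x_1 - a_{21}x_2 + v \le 0$
     $-a_{12}x_1 - a_{22}x_2 + v \le 0$
     $x_1 + x_2 = 1$
     $x_1, x_2 \ge 0$; $v$ free

Column:

min $0y_1 + 0y_2 + w$
s.t. $-a_{11}y_1 - a_{12}y_2 + w \ge 0$
     $-a_{21}y_1 - a_{22}y_2 + w \ge 0$
     $y_1 + y_2 = 1$
     $y_1, y_2 \ge 0$; $w$ free

In matrix form:

$\max\ \begin{pmatrix} 0^T & 1 \end{pmatrix} \begin{pmatrix} x \\ v \end{pmatrix}$ s.t. $-A^T x + \mathbf{1} v \le 0$, $\mathbf{1}^T x = 1$, $x \ge 0$, $v$ free

$\min\ \begin{pmatrix} 0^T & 1 \end{pmatrix} \begin{pmatrix} y \\ w \end{pmatrix}$ s.t. $-A y + \mathbf{1} w \ge 0$, $\mathbf{1}^T y = 1$, $y \ge 0$, $w$ free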

Minimax Theorem (von Neumann 1928): duality gives $v^* = w^*$.

$\Rightarrow$ still need to show that $(x^*, y^*)$ is a Nash equilibrium

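Primal (row):

max $v$
s.t. $v \le \sum_{i=1}^m a_{ij} x_i$, $\forall j$
     $\sum_{i=1}^m x_i = 1$
     $x_i \ge 0$, $\forall i$; $v$ free

Dual (column):

min $w$
s.t. $w \ge \sum_{j=1}^n a_{ij} y_j$, $\forall i$
     $\sum_{j=1}^n y_j = 1$
     $y_j \ge 0$, $\forall j$; $w$ free

Complementary slackness: $y_j^* s_j^* = 0$, $\forall j$, where $s_j^* = \sum_i a_{ij} x_i^* - v^*$ is the slack in the $j$th primal constraint

$\Rightarrow \sum_j y_j^* \left( \sum_i a_{ij} x_i^* - v^* \right) = 0$

$\Rightarrow \sum_j \sum_i y_j^* a_{ij} x_i^* = \sum_j y_j^* v^* = v^*$

Shows that $v^* = \sum_i \sum_j x_i^* a_{ij} y_j^*$ = expected payoff to row given $(x^*, y^*)$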

For any mixed strategies $x$ and $y$:

$\sum_i \sum_j x_i a_{ij} y_j^* \le \sum_i x_i w^* = w^* = v^* = \sum_j y_j v^* \le \sum_i \sum_j x_i^* a_{ij} y_j$

- First inequality: multiply $i$th dual constraint by $x_i$ and sum. $\Rightarrow$ $x^*$ is best-response for row player (recall that $v^* = \sum_i \sum_j x_i^* a_{ij} y_j^*$)
- Middle equality $w^* = v^*$: strong duality
- Last inequality: multiply $j$th primal constraint by $y_j$ and sum. $\Rightarrow$ $y^*$ is best-response for column player (recall that $w^* = \sum_i \sum_j x_i^* a_{ij} y_j^*$)
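The best-response property can be tested directly. Because the expected payoff is linear in each player's own mixed strategy, it is enough to compare against pure deviations. A minimal sketch on the political campaign example; `expected_payoff` and `is_nash` are illustrative names.

```python
from fractions import Fraction as F


def expected_payoff(A, x, y):
    """r(x, y) = sum_i sum_j x_i a_ij y_j, the expected payoff to row."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(A)) for j in range(len(A[0])))


def is_nash(A, x, y):
    """Check that no pure deviation helps row (wants more) or column (wants less)."""
    m, n = len(A), len(A[0])
    r = expected_payoff(A, x, y)
    row_ok = all(sum(A[i][j] * y[j] for j in range(n)) <= r for i in range(m))
    col_ok = all(sum(x[i] * A[i][j] for i in range(m)) >= r for j in range(n))
    return row_ok and col_ok


A = [[0, -2, 2], [3, 4, -3]]           # rows E, A; columns E, A, B
x_star = [F(7, 11), F(4, 11)]
y_star = [F(0), F(5, 11), F(6, 11)]

print(expected_payoff(A, x_star, y_star))   # equals the game value
print(is_nash(A, x_star, y_star))
print(is_nash(A, [1, 0], [0, 0, 1]))        # the pure pair (E, B)
```

The pair $(x^*, y^*)$ passes the check with expected payoff 2/11, while the pure pair (E,B) fails because column would rather switch to A.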

Minimax Theorem (von Neumann 1928)

Useful: shows that we can solve two-player, zero sum games via linear programming!

Example: Tic-tac-toe

Model as an extensive-form game:

[Figure: game tree. Player 1 moves first and chooses corner, center, or side; player 2 then replies (for example corner or side), and the players keep alternating.]

Hint: what is a (pure) strategy?

- Can be put in matrix form by constructing all possible strategies
- Strategy == complete description of how to play in all possible states of the game

Given this, possible to construct the payoff matrix

Example: Poker

Two players, A and B; A goes first. Goal is to win back the kitty. Each dealt one card (the cards are numbered 1, 2, 3). Game stops when bet followed by bet, pass followed by pass, or bet followed by pass.

In first two, winner decided by comparing cards. In third, player with bet wins.

Possible plays:

- A pass, B pass: $1 to holder of higher card
- A pass, B bet, A pass: $1 to B
- A pass, B bet, A bet: $2 to holder of higher card
- A bet, B pass: $1 to A
- A bet, B bet: $2 to holder of higher card

(Kuhn, 1950)

What are the strategies?

Player A chooses one of three lines:

1. Pass. If B bets, pass again.
2. Pass. If B bets, bet.
3. Bet.

Player B chooses one of four lines:

1. Pass no matter what.
2. If A passes, pass. If A bets, bet.
3. If A passes, bet. If A bets, pass.
4. Bet no matter what.

Pure strategy = statement about which line to use for each possible card that player is dealt. Defined by triples $(a_1, a_2, a_3)$ and $(b_1, b_2, b_3)$, where $a_i$ is the line the row player (A) will use when holding card $i$ and $b_i$ is the line the column player (B) will use when holding card $i$.

E.g., (3,1,2) for player A and (3,2,4) for player B

How to determine the payoff $a_{ij}$?

E.g., (3,1,2) for player A and (3,2,4) for player B. Six ways in which cards can be dealt:

   card dealt    betting session                 payment A to B
    A     B
    1     2      A bets, B bets                        2
    1     3      A bets, B bets                        2
    2     1      A passes, B bets, A passes            1
    2     3      A passes, B bets, A passes            1
    3     1      A passes, B bets, A bets             -2
    3     2      A passes, B passes                   -1

Average payment = (2+2+1+1-2-1)/6 = 1/2
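The table can be reproduced mechanically from the rules. The sketch below encodes the three lines for A and four lines for B as listed in these notes and averages the payment over the six deals, which are assumed equally likely (consistent with dividing by 6 above). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def payment_a_to_b(card_a, card_b, line_a, line_b):
    """Dollars A pays B on one deal (negative means B pays A)."""
    showdown = 1 if card_b > card_a else -1   # +1 when B holds the higher card
    if line_a == 3:                           # A bets
        b_answers_bet = line_b in (2, 4)      # B lines 2 and 4 meet a bet with a bet
        return 2 * showdown if b_answers_bet else -1
    b_bets_after_pass = line_b in (3, 4)      # B lines 3 and 4 bet after A passes
    if not b_bets_after_pass:
        return showdown                       # pass, pass: $1 to the higher card
    # A passed and B bet: A line 2 bets ($2 showdown), A line 1 passes ($1 to B).
    return 2 * showdown if line_a == 2 else 1


def average_payment(strategy_a, strategy_b):
    """Average payment from A to B over the six deals (A's card, B's card)."""
    deals = list(permutations((1, 2, 3), 2))
    total = sum(payment_a_to_b(a, b, strategy_a[a - 1], strategy_b[b - 1])
                for a, b in deals)
    return Fraction(total, len(deals))


per_deal = [payment_a_to_b(a, b, (3, 1, 2)[a - 1], (3, 2, 4)[b - 1])
            for a, b in permutations((1, 2, 3), 2)]
print(per_deal, average_payment((3, 1, 2), (3, 2, 4)))
```

For (3,1,2) against (3,2,4) the per-deal payments are 2, 2, 1, 1, -2, -1, giving the average of 1/2 found above.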

How many combinations are there?

- Player A has 3 x 3 x 3 = 27 pure strategies
- Player B has 4 x 4 x 4 = 64 pure strategies
- Hence there are 27 x 64 = 1728 pairs
- Can first apply iterated elimination of strictly dominated strategies.

Eliminating strictly dominated strategies

- Player holding 1 should never answer a bet with a bet, since the player will lose regardless and will lose less by passing.
  Prune for A: (2,a2,a3). Prune for B: (2,b2,b3), (4,b2,b3).
- Player holding 3 should never answer a bet with a pass, since by passing the player will lose but by betting the player will win.
  Prune for A: (a1,a2,1). Prune for B: (b1,b2,1), (b1,b2,3).
- Player holding 3 should always answer a pass with a bet, since in either case the player will win, but answering with a bet opens the possibility that the opponent will bet and increases the size of the win.
  Prune for B: (b1,b2,2).

Player A has 2 x 3 x 2 = 12 pure strategies and player B has 2 x 4 x 1 = 8 pure strategies. Dropped to 96 pairs.

- When holding a 2, player A should not play line 3. Prune (a1,3,a3). Player B has either a 1 (in which case plays lines 1 or 3) or a 3 (in which case plays line 4); either way, better for A to pass in first round.
- When holding a 2, player B should not play lines 3 or 4. Prune (b1,3,b3) and (b1,4,b3).

Player A has 2 x 2 x 2 = 8 pure strategies and player B has 2 x 2 x 1 = 4 pure strategies. Dropped to 32 pairs.
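The strategy counts can be confirmed by enumerating the triples and applying the pruning rules as filters. A minimal sketch; the variable names are arbitrary, and each filter simply mirrors one "Prune" pattern from the list above.

```python
from itertools import product

a_all = list(product((1, 2, 3), repeat=3))       # (a1, a2, a3)
b_all = list(product((1, 2, 3, 4), repeat=3))    # (b1, b2, b3)

# First round: rules for a player holding 1 or 3.
a_round1 = [s for s in a_all if s[0] != 2 and s[2] != 1]
b_round1 = [s for s in b_all if s[0] not in (2, 4) and s[2] not in (1, 2, 3)]

# Second round: rules for a player holding 2.
a_round2 = [s for s in a_round1 if s[1] != 3]
b_round2 = [s for s in b_round1 if s[1] not in (3, 4)]

print(len(a_all), len(b_all), len(a_all) * len(b_all))
print(len(a_round1), len(b_round1), len(a_round1) * len(b_round1))
print(len(a_round2), len(b_round2), len(a_round2) * len(b_round2))
print(a_round2)
print(b_round2)
```

The counts come out as 27 and 64 (1728 pairs), then 12 and 8 (96 pairs), then 8 and 4 (32 pairs).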

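Construct the payoff matrix

Rows are player A's remaining strategies, columns are player B's. Each entry is the average payment from A to B, computed as in the (3,1,2) versus (3,2,4) example; entries left blank on the slide are 0.

             (1,1,4)   (1,2,4)   (3,1,4)   (3,2,4)
(1,1,2)         0         0        1/6       1/6
(1,1,3)         0       -1/6       1/3       1/6
(1,2,2)        1/6       1/6      -1/6      -1/6
(1,2,3)        1/6        0         0       -1/6
(3,1,2)       -1/6       1/3        0        1/2
(3,1,3)       -1/6       1/6       1/6       1/2
(3,2,2)         0        1/2      -1/3       1/6
(3,2,3)         0        1/3      -1/6       1/6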



Solution: $x^* = (1/2, 0, 0, 1/3, 0, 0, 0, 1/6)$, $y^* = (2/3, 0, 0, 1/3)$

Player A:

- when holding 1, mix lines 1 and 3 in 5:1 proportion
- when holding 2, mix lines 1 and 2 in 1:1 proportion
- when holding 3, mix lines 2 and 3 in 1:1 proportion

Player B:

- when holding 1, mix lines 1 and 3 in 2:1 proportion
- when holding 2, mix lines 1 and 2 in 2:1 proportion
- when holding 3, use line 4.

Players sometimes bluff and sometimes underbid!

Remaining strategies:

- Player A: {(1,1,2),(1,1,3),(1,2,2),(1,2,3),(3,1,2),(3,1,3),(3,2,2),(3,2,3)}
- Player B: {(1,1,4),(1,2,4),(3,1,4),(3,2,4)}
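The stated solution can be checked against the 8 x 4 matrix. The sketch below makes two assumptions, both flagged in the comments: blank cells in the slide's matrix are read as 0, and entries are read as average payments from A to B (which matches the worked entry of 1/2 for (3,1,2) versus (3,2,4)). Under that reading A wants the payment small and B wants it large. The helper `line_mix` is a hypothetical name.

```python
from fractions import Fraction as F

a_strategies = [(1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3),
                (3, 1, 2), (3, 1, 3), (3, 2, 2), (3, 2, 3)]
b_strategies = [(1, 1, 4), (1, 2, 4), (3, 1, 4), (3, 2, 4)]

# Assumption: entries in sixths, blanks on the slide taken as 0,
# each entry = average payment from A to B.
sixths = [[0, 0, 1, 1], [0, -1, 2, 1], [1, 1, -1, -1], [1, 0, 0, -1],
          [-1, 2, 0, 3], [-1, 1, 1, 3], [0, 3, -2, 1], [0, 2, -1, 1]]
M = [[F(k, 6) for k in row] for row in sixths]

x_star = [F(1, 2), 0, 0, F(1, 3), 0, 0, 0, F(1, 6)]
y_star = [F(2, 3), 0, 0, F(1, 3)]

# What A pays against each pure strategy of B, and what B collects
# against each pure strategy of A.
a_pays = [sum(x_star[i] * M[i][j] for i in range(8)) for j in range(4)]
b_gets = [sum(M[i][j] * y_star[j] for j in range(4)) for i in range(8)]
print(max(a_pays), min(b_gets))


def line_mix(strategies, weights, card):
    """Probability of each line when holding `card`, under a mixed strategy."""
    mix = {}
    for s, w in zip(strategies, weights):
        mix[s[card - 1]] = mix.get(s[card - 1], 0) + w
    return {line: p for line, p in mix.items() if p > 0}


print(line_mix(a_strategies, x_star, 1))   # A holding 1
print(line_mix(b_strategies, y_star, 2))   # B holding 2
```

With these inputs A's worst-case payment and B's guaranteed receipt both equal 1/18, so neither player can improve by deviating. The per-card mixes recover the stated proportions, for example 5:1 between lines 1 and 3 for A holding a 1.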
