How to Win Texas Hold’em Poker
Richard Mealing
Machine Learning and Optimisation Group
School of Computer Science
University of Manchester
1 / 44
How to Play Texas Hold’em Poker
1 Deal 2 private cards per player
2 1st (sequential) betting round
3 Deal 3 shared cards (“flop”)
4 2nd betting round
5 Deal 1 shared card (“turn”)
6 3rd betting round
7 Deal 1 shared card (“river”)
8 4th (final) betting round
If all but 1 player folds, that player wins the pot (total bet)
Otherwise at the end of the game hands are compared (“showdown”) and the player with the best hand wins the pot
2 / 44
How to Play Texas Hold’em Poker
Ante = forced bet (everyone pays)
Blinds = forced bets (2 people pay big/small)
If players > 2 then (big blind player, small blind player, dealer)
If players = 2 (“heads-up”) then (big blind, small blind/dealer)
No-Limit Texas Hold’em lets you bet all your money in a round
Minimum bet = big blind
Maximum bet = all your money
Limit Texas Hold’em Poker has fixed betting limits
A $4/$8 game means in betting rounds 1 & 2 bets = $4 and in betting rounds 3 & 4 bets = $8
Big blind usually equals “small” bet e.g. $4 and small blind is usually 50% of big blind e.g. $2
Total number of raises per betting round is usually capped at 4 or 5
4 / 44
1-Card Poker Trees
1 Game tree - both players’ private cards are known
2 Public tree - both players’ private cards are hidden
3 P1 information set tree - P2’s private card is hidden
4 P2 information set tree - P1’s private card is hidden
9 / 44
Heads-Up Limit Texas Hold’em Poker Tree Size
[Figure: betting tree for one round, with fold (F), call (C) and raise (R) actions]
Cards Dealt
P1 dealt 2 private cards = $\binom{52}{2} = 1326$
P2 dealt 2 private cards = $\binom{50}{2} = 1225$
1st betting round = 29, 9 continuing
Flop dealt = $\binom{48}{3} = 17296$
2nd betting round = 29, 9 continuing
Turn dealt = 45
3rd betting round = 29, 9 continuing
River dealt = 44
4th betting round = 29
10 / 44
Heads-Up Limit Texas Hold’em Poker Tree Size
Player 1 Deal = 1
Player 2 Deal = 1326
1st Betting Round = 1326 * 1225 * 29
2nd Betting Round = 1326 * 1225 * 9 * 17296 * 29
3rd Betting Round = 1326 * 1225 * 9 * 17296 * 9 * 45 * 29
4th Betting Round = 1326 * 1225 * 9 * 17296 * 9 * 45 * 9 * 44 * 29
Total = $1.179 \times 10^{18}$ (quintillion)
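The total can be checked with a short Python sketch. It only multiplies out the per-stage counts listed on these slides; the constant names and the `tree_size` helper are illustrative and not part of the original deck.

```python
from math import comb

P1_HANDS = comb(52, 2)        # 1326 ways to deal P1's two private cards
P2_HANDS = comb(50, 2)        # 1225 ways to deal P2's two private cards
FLOPS = comb(48, 3)           # 17296 possible flops
TURNS, RIVERS = 45, 44        # single cards left for the turn and the river
BETTING, CONTINUING = 29, 9   # per betting round, as stated on the slide


def tree_size():
    """Return the per-stage counts in the order used on the slide."""
    sizes = [1, P1_HANDS]                  # Player 1 deal, Player 2 deal
    reach = P1_HANDS * P2_HANDS            # ways to arrive at the 1st betting round
    sizes.append(reach * BETTING)
    for boards in (FLOPS, TURNS, RIVERS):
        reach *= CONTINUING * boards       # continue betting, then deal shared cards
        sizes.append(reach * BETTING)
    return sizes


sizes = tree_size()
total = sum(sizes)
print(f"{total:.3e}")  # about 1.179e+18
```

The last betting round dominates the sum, which is why abstraction focuses on shrinking the number of card combinations rather than the betting tree.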
11 / 44
Abstraction
Lossless
Suit isomorphism: at the start (pre-flop) two hands are strategically the same if each of their cards’ ranks match and they are both “suited” or “off-suit” e.g. (A♠K♠, A♣K♣) or (T♣J♠, T♦J♥). The 169 equivalence classes reduce the possible pairs of starting hands (one per player) from 1624350 to 28561
Lossy
Bucketing (binning) groups hands into equivalence classes e.g. based on their probability of winning at showdown against a random hand
Imperfect recall eliminates past information
Betting round reduction
Betting round elimination
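The lossless suit-isomorphism count can be reproduced with a small Python sketch. The card encoding (rank character followed by suit character) and the `canonical` helper are assumptions made for illustration; the sketch only groups hands by rank pair and suited/off-suit status, as described above.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"


def canonical(hand):
    """Map a two-card hand such as ("As", "Ks") to its pre-flop class, e.g. "AKs"."""
    (r1, s1), (r2, s2) = hand
    hi, lo = sorted((r1, r2), key=RANKS.index, reverse=True)
    if hi == lo:
        return hi + lo                      # pocket pair, suits are irrelevant
    return hi + lo + ("s" if s1 == s2 else "o")


deck = [r + s for r in RANKS for s in SUITS]
hands = list(combinations(deck, 2))         # all 1326 starting hands
classes = {canonical(h) for h in hands}

print(len(hands), len(classes), len(classes) ** 2)
```

The classes split into 13 pairs, 78 suited and 78 off-suit rank combinations, and squaring the class count gives the 28561 figure quoted for two players.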
12 / 44
Abstraction
Heads-up Limit Texas Hold’em poker has around $10^{18}$ states
Abstraction can reduce the game to e.g. $10^{7}$ states
Nesterov’s excessive gap technique can find approximate Nash equilibria in a game with $10^{10}$ states
Counterfactual regret minimization can find approximate Nash equilibria in a game with $10^{12}$ states
13 / 44
Nash Equilibrium
Game theoretic solution
Set of strategies, 1 per player, such that no one can do better by changing their strategy if the others keep their strategies fixed
Nash proved that in every game with finitely many players and pure strategies there is at least 1 (possibly mixed) Nash equilibrium
14 / 44
Annual Computer Poker Competition 2012
Heads-up Limit Texas Hold’em
Total Bankroll:
1 Slumbot (Eric Jackson, USA)
2 Little Rock (Rod Byrnes, Australia) and Zbot (Ilkka Rajala, Finland)
Bankroll Instant Run-off:
1 Slumbot (Eric Jackson, USA)
2 Hyperborean (University of Alberta, Canada)
3 Zbot (Ilkka Rajala, Finland)
Heads-up No-Limit Texas Hold’em
Total Bankroll:
1 Little Rock (Rod Byrnes, Australia)
2 Hyperborean (University of Alberta, Canada)
3 Tartanian5 (Carnegie Mellon University, USA)
Bankroll Instant Run-off:
1 Hyperborean (University of Alberta, Canada)
2 Tartanian5 (Carnegie Mellon University, USA)
3 Neo Poker Bot (Alexander Lee, Spain)
3-player Limit Texas Hold’em
Total Bankroll:
1 Hyperborean (University of Alberta, Canada)
2 Little Rock (Rod Byrnes, Australia)
3 Neo Poker Bot (Alexander Lee, Spain) and Sartre (University of Auckland, New Zealand)
Bankroll Instant Run-off:
1 Hyperborean (University of Alberta, Canada)
2 Little Rock (Rod Byrnes, Australia)
3 Neo Poker Bot (Alexander Lee, Spain) and Sartre (University of Auckland, New Zealand)
Source: http://www.computerpokercompetition.org/index.php/competitions/results/90-2012-results
15 / 44
Annual Computer Poker Competition
Total Bankroll = total money won against all agents
Bankroll Instant Run-off
1 Set S = all agents
2 Set N = agents in a game
3 Play every $\binom{|S|}{|N|}$ possible matches between agents in S storing each agent’s total bankroll
4 Remove the agent(s) with the lowest total bankroll from S
5 Repeat steps 3 and 4 until S only contains |N| agents
6 Play a match between the last |N| agents and rank them according to their total bankroll in this game
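The run-off procedure can be sketched in Python as follows. This is a minimal sketch, not the competition's actual code: `play_match` is a hypothetical callback that returns each participant's winnings, the toy `skill` table stands in for real matches, and the tie handling (never eliminating so many agents that fewer than |N| remain) is an assumption.

```python
from itertools import combinations


def instant_runoff(agents, n, play_match):
    """Rank the last n surviving agents, best first."""
    remaining = list(agents)
    while len(remaining) > n:
        bankroll = dict.fromkeys(remaining, 0.0)
        # Play every possible match between the remaining agents.
        for group in combinations(remaining, n):
            for agent, winnings in play_match(group).items():
                bankroll[agent] += winnings
        lowest = min(bankroll.values())
        losers = [a for a in remaining if bankroll[a] == lowest]
        # Assumed tie-break: always keep enough agents for a final match.
        losers = losers[: len(remaining) - n]
        remaining = [a for a in remaining if a not in losers]
    final = play_match(tuple(remaining))
    return sorted(remaining, key=final.get, reverse=True)


# Toy heads-up "match": each agent wins the difference in a made-up skill score.
skill = {"A": 3, "B": 2, "C": 1, "D": 0}


def toy_match(group):
    a, b = group
    return {a: skill[a] - skill[b], b: skill[b] - skill[a]}


print(instant_runoff(skill, 2, toy_match))
```

In the toy example D is removed first, then C, and the final match ranks A ahead of B.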
16 / 44
Extensive-Form Game
A finite set of players N = {1, 2, ..., |N|} ∪ {c}
A finite set of action sequences or histories e.g. H = {(), ..., (A♥A♠), ...}
Z ⊆ H terminal histories e.g. Z = {..., (A♥A♠, 2♦7♣, r, F), ...}
A(h) = {a : (h, a) ∈ H} actions available after history h ∈ H\Z
P(h) ∈ N ∪ {c} player who takes an action after history h ∈ H\Z
u_i : Z → ℝ utility function for player i
17 / 44
Extensive-Form Game
$f_c$ maps every history h where P(h) = c to an independent probability distribution $f_c(a \mid h)$ for all a ∈ A(h)
$\mathcal{I}_i$ is an information partition for player i (a set of nonempty subsets of X where each element of X is in exactly 1 subset; here X = {h ∈ H : P(h) = i})
$I_j \in \mathcal{I}_i$ is player i’s jth information set containing indistinguishable histories e.g. $I_j$ = {..., (A♥A♠, 2♦7♣), ..., (A♥A♠, 6♣3♠), ...}
Player i’s strategy $\sigma_i$ is a function that assigns a distribution over $A(I_j)$ for all $I_j \in \mathcal{I}_i$ where $A(I_j) = A(h)$ for any $h \in I_j$
A strategy profile σ is a strategy for each player $\sigma = \{\sigma_1, \sigma_2, ..., \sigma_{|N|}\}$
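To make the definitions concrete, here is a minimal Python sketch of a 1-card poker game (cards J and K) as an extensive-form game. The encoding is an assumption: histories are tuples of the two dealt cards followed by actions, each player antes 1, a raise adds 1, and payoffs are given for player 1. The function names are illustrative.

```python
CHANCE = "c"
CARDS = ("J", "K")


def player(h):
    """P(h): chance deals both cards, then players 1 and 2 alternate."""
    if len(h) < 2:
        return CHANCE
    return 1 if len(h[2:]) % 2 == 0 else 2


def actions(h):
    """A(h) for a non-terminal history h."""
    if len(h) < 2:
        return list(CARDS)
    bets = h[2:]
    if bets in ((), ("C",)):
        return ["C", "R"]      # no raise yet: call/check or raise
    return ["F", "C"]          # facing a raise: fold or call


def is_terminal(h):
    """True if h is in Z."""
    return h[2:] in (("C", "C"), ("R", "F"), ("R", "C"),
                     ("C", "R", "F"), ("C", "R", "C"))


def utility_p1(z):
    """u_1(z); in this zero-sum game u_2(z) = -u_1(z)."""
    p1_card, p2_card, bets = z[0], z[1], z[2:]
    if bets[-1] == "F":
        return 1 if player(z[:-1]) == 2 else -1   # the folder loses the ante
    stake = 2 if "R" in bets else 1
    if p1_card == p2_card:
        return 0
    return stake if p1_card == "K" else -stake


def info_set(h):
    """The acting player sees only their own card and the betting so far."""
    i = player(h)
    return (i, h[i - 1], h[2:])


print(actions(("J", "J")), player(("J", "J")))
```

Two histories that differ only in the opponent's card, such as (K, J, C, R) and (K, K, C, R), map to the same `info_set` value, which is exactly what makes them indistinguishable to player 1.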
18 / 44
Nash Equilibrium
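Nash Equilibrium:
$$u_1(\sigma) \ge \max_{\sigma'_1 \in \Sigma_1} u_1(\sigma'_1, \sigma_2)$$
$$u_2(\sigma) \ge \max_{\sigma'_2 \in \Sigma_2} u_2(\sigma_1, \sigma'_2)$$
ε-Nash Equilibrium:
$$u_1(\sigma) + \varepsilon \ge \max_{\sigma'_1 \in \Sigma_1} u_1(\sigma'_1, \sigma_2)$$
$$u_2(\sigma) + \varepsilon \ge \max_{\sigma'_2 \in \Sigma_2} u_2(\sigma_1, \sigma'_2)$$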
19 / 44
Extensive-Form Game
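[Figure: game tree of 1-card poker. Chance deals J or K (probability 0.5 each) to P1 and then to P2; edges show action probabilities under σ, leaves show P1’s payoff. C = call (or check), R = raise, F = fold.]
Strategy shown in the figure:
I1 (P1 holds J, first to act): C 0.6, R 0.4
I2 (P1 holds K, first to act): C 0.3, R 0.7
I3 (P2 holds J, after C): C 0.8, R 0.2
I4 (P2 holds J, after R): F 1.0, C 0.0
I5 (P2 holds K, after C): C 0.1, R 0.9
I6 (P2 holds K, after R): F 0.0, C 1.0
I7 (P1 holds J, after C, R): F 1.0, C 0.0
I8 (P1 holds K, after C, R): F 0.0, C 1.0
P1’s payoffs at the leaves:
C, C: 0 if the cards tie, 1 if P1 holds K against J, -1 if P1 holds J against K
C, R, F (P1 folds): -1
R, F (P2 folds): 1
C, R, C or R, C: 0 if the cards tie, 2 if P1 holds K against J, -2 if P1 holds J against K
$\mathcal{I}_1 = \{I_1, I_2, I_7, I_8\}$ and $\mathcal{I}_2 = \{I_3, I_4, I_5, I_6\}$
$A((J, J)) = \{C, R\}$ and $P((J, J)) = 1$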
20 / 44
Extensive-Form Game
[Figure: the 1-card poker game tree, as on the earlier Extensive-Form Game slide]
$f_c(J \mid (J)) = 0.5$ and $f_c(K \mid (J)) = 0.5$
$\sigma_1(I_1, C) = 0.6$ and $\sigma_1(I_1, R) = 0.4$
30 / 44
Counterfactual Regret Minimization
Counterfactual regret minimization minimizes the maximum counterfactual regret (over all actions) at every information set
Minimizing counterfactual regrets minimizes overall regret
In a two-player zero-sum game at time T, if both players’ average overall regret is less than ε, then $\bar{\sigma}^T$ is a 2ε Nash equilibrium.
33 / 44
Counterfactual Regret Minimization
Counterfactual Value
$$v_i(I_j \mid \sigma) = \sum_{n \in I_j} \pi^{\sigma}_{-i}(\text{root}, n)\, u_i(n)$$
$$u_i(n) = \sum_{z \in Z[n]} \pi^{\sigma}(n, z)\, u_i(z)$$
$v_i(I_j \mid \sigma)$ is the counterfactual value to player i of information set $I_j$ given strategy profile σ
$\pi^{\sigma}_{-i}(\text{root}, n)$ is the probability of reaching node n from the root ignoring player i’s contributions according to strategy profile σ
$\pi^{\sigma}(n, z)$ is the probability of reaching node z from node n according to strategy profile σ
$u_i(n)$ is the payoff to player i at node n if it is a leaf node or its expected payoff if it is a non-leaf node
$Z[n]$ is the set of terminal nodes that can be reached from node n
34 / 44
Counterfactual Regret Minimization
[Figure: the 1-card poker game tree under strategy profile σ, with information set I8 (P1 holds K, after C, R) containing two nodes]
$$\begin{aligned}
v_1(I_8 \mid \sigma) &= \sum_{n \in I_8} \pi^{\sigma}_{-1}(\text{root}, n)\, u_1(n) \\
&= 0.5 \cdot 0.5 \cdot 0.2 \cdot (0.0 \cdot (-1) + 1.0 \cdot 2) + 0.5 \cdot 0.5 \cdot 0.9 \cdot (0.0 \cdot (-1) + 1.0 \cdot 0) \\
&= 0.1
\end{aligned}$$
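The same calculation can be written as a short Python sketch. The numbers are read off the tree figure (chance probabilities, P2's raise probabilities at I3 and I5, and P1's payoffs below I8); the dictionary layout and function name are illustrative assumptions.

```python
CHANCE = 0.5                                  # each card is dealt with probability 0.5
SIGMA2_RAISE = {"J": 0.2, "K": 0.9}           # P2 raises after P1's call, by P2's card
SIGMA1_I8 = {"F": 0.0, "C": 1.0}              # P1's strategy at I8
# P1's payoff below I8 (P1 holds K), keyed by (P2's card, P1's action).
PAYOFF_P1 = {("J", "F"): -1, ("J", "C"): 2, ("K", "F"): -1, ("K", "C"): 0}


def counterfactual_value_i8(sigma1_i8):
    """v_1(I8 | sigma): sum over the two nodes in I8."""
    value = 0.0
    for p2_card in ("J", "K"):
        # Reach probability without P1's own actions: two chance deals and P2's raise.
        reach = CHANCE * CHANCE * SIGMA2_RAISE[p2_card]
        expected = sum(sigma1_i8[a] * PAYOFF_P1[(p2_card, a)] for a in ("F", "C"))
        value += reach * expected
    return value


print(round(counterfactual_value_i8(SIGMA1_I8), 3))  # 0.1
```

Note that P1's own probability of calling at I2 (0.3) does not appear in the reach term: the counterfactual value asks what I8 is worth if P1 plays to reach it.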
35 / 44
Counterfactual Regret Minimization
Counterfactual Regret
$$r(I_j, a) = v_i(I_j \mid \sigma_{I_j \to a}) - v_i(I_j \mid \sigma)$$
$r(I_j, a)$ is the counterfactual regret of not playing action a at information set $I_j$
Positive regret means the player would have preferred to play action a rather than their strategy
Zero regret means the player was indifferent between their strategy and action a
Negative regret means the player preferred their strategy rather than playing action a
36 / 44
Counterfactual Regret Minimization
[Figure: the 1-card poker game tree with P1’s strategy at I8 changed to always fold (F 1.0, C 0.0)]
$$\begin{aligned}
v_1(I_8 \mid \sigma_{I_8 \to F}) &= 0.5 \cdot 0.5 \cdot 0.2 \cdot (1.0 \cdot (-1) + 0.0 \cdot 2) + 0.5 \cdot 0.5 \cdot 0.9 \cdot (1.0 \cdot (-1) + 0.0 \cdot 0) \\
&= -0.275
\end{aligned}$$
$$r_1(I_8, F) = v_1(I_8 \mid \sigma_{I_8 \to F}) - v_1(I_8 \mid \sigma) = -0.275 - 0.1 = -0.375$$
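A small Python sketch reproduces this regret and also computes the regret for calling. The reach probabilities and payoffs are taken from the tree figure; the names `REACH`, `PAYOFF`, `value` and `regret` are illustrative assumptions.

```python
# Reach probabilities of the two nodes in I8, excluding P1's own actions.
REACH = {"J": 0.5 * 0.5 * 0.2, "K": 0.5 * 0.5 * 0.9}   # keyed by P2's card
# P1's payoff for each action at I8, keyed by P2's card.
PAYOFF = {"J": {"F": -1, "C": 2}, "K": {"F": -1, "C": 0}}
SIGMA_I8 = {"F": 0.0, "C": 1.0}                          # P1's current strategy at I8


def value(strategy):
    """Counterfactual value of I8 when P1 plays `strategy` there."""
    return sum(REACH[card] * sum(strategy[a] * PAYOFF[card][a] for a in strategy)
               for card in REACH)


def regret(action):
    """r(I8, action) = v(I8 | sigma with I8 -> action) - v(I8 | sigma)."""
    forced = {a: 1.0 if a == action else 0.0 for a in SIGMA_I8}
    return value(forced) - value(SIGMA_I8)


print(round(regret("F"), 3), round(regret("C"), 3))  # -0.375 0.0
```

The regret for C is zero because the current strategy already calls with probability 1, so forcing C changes nothing.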
37 / 44
Counterfactual Regret Minimization
Cumulative Counterfactual Regret
$$R^T(I_j, a) = \sum_{t=1}^{T} r^t(I_j, a)$$
$R^T(I_j, a)$ is the cumulative counterfactual regret of not playing action a at information set $I_j$ for T time steps
Positive cumulative regret means the player would have preferred to play action a rather than their strategy over those T steps
Zero cumulative regret means the player was indifferent between their strategy and action a over those T steps
Negative cumulative regret means the player preferred their strategy rather than playing action a over those T steps
38 / 44
Counterfactual Regret Minimization
Regret Matching
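$$\sigma^{T+1}(I_j, a) = \begin{cases} \dfrac{R^{T,+}(I_j, a)}{\sum_{a' \in A(I_j)} R^{T,+}(I_j, a')} & \text{if denominator is positive} \\[2ex] \dfrac{1}{|A(I_j)|} & \text{otherwise} \end{cases}$$
$$R^{T,+}(I_j, a) = \max(R^T(I_j, a), 0)$$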
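Regret matching is a few lines of Python. This sketch takes the cumulative regrets of one information set as a dictionary; the function name and the example regret values are made up for illustration.

```python
def regret_matching(cumulative_regret):
    """Return the next strategy at one information set.

    cumulative_regret maps each action to R^T(I, a).
    """
    positive = {a: max(r, 0.0) for a, r in cumulative_regret.items()}
    total = sum(positive.values())
    if total > 0:
        # Play each action in proportion to its positive cumulative regret.
        return {a: p / total for a, p in positive.items()}
    # No action has positive regret: fall back to the uniform strategy.
    return {a: 1.0 / len(cumulative_regret) for a in cumulative_regret}


print(regret_matching({"F": -1.0, "C": 3.0, "R": 1.0}))
print(regret_matching({"F": -0.375, "C": 0.0}))
```

The first call puts weight 0.75 on C and 0.25 on R; the second has no positive regret, so it returns the uniform strategy over F and C.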
39 / 44
Counterfactual Regret Minimization
1 Initialise the strategy profile σ e.g. for all i ∈ N, for all $I_j \in \mathcal{I}_i$ and for all $a \in A(I_j)$ set $\sigma(I_j, a) = \frac{1}{|A(I_j)|}$
2 For each player i ∈ N, for all $I_j \in \mathcal{I}_i$ and for all $a \in A(I_j)$ calculate $r(I_j, a)$ and add it to $R(I_j, a)$
3 For each player i ∈ N, for all $I_j \in \mathcal{I}_i$ and for all $a \in A(I_j)$ use regret matching to update $\sigma(I_j, a)$
4 Repeat from 2
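The loop above can be sketched in Python on the smallest possible case: a two-player zero-sum matrix game, where each player has a single information set and counterfactual regret reduces to ordinary regret. This is a simplification chosen for brevity, not the poker implementation; the payoff matrix and function names are assumptions.

```python
def regret_matching(cum_regret):
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(cum_regret)] * len(cum_regret)


def self_play(payoff, iterations):
    """payoff[a][b] is the row player's utility; the column player gets its negative."""
    rows, cols = len(payoff), len(payoff[0])
    regret = {1: [0.0] * rows, 2: [0.0] * cols}
    strategy_sum = {1: [0.0] * rows, 2: [0.0] * cols}
    for _ in range(iterations):
        # Steps 1 and 3: zero regrets give the uniform start, later ones use regret matching.
        s1 = regret_matching(regret[1])
        s2 = regret_matching(regret[2])
        # Value of each pure action against the opponent's current strategy.
        v1 = [sum(s2[b] * payoff[a][b] for b in range(cols)) for a in range(rows)]
        v2 = [sum(s1[a] * -payoff[a][b] for a in range(rows)) for b in range(cols)]
        ev1 = sum(p * v for p, v in zip(s1, v1))
        ev2 = sum(p * v for p, v in zip(s2, v2))
        # Step 2: add this iteration's regrets to the cumulative regrets.
        for a in range(rows):
            regret[1][a] += v1[a] - ev1
            strategy_sum[1][a] += s1[a]
        for b in range(cols):
            regret[2][b] += v2[b] - ev2
            strategy_sum[2][b] += s2[b]
    # The average strategies, not the final ones, approach an equilibrium.
    return ([x / iterations for x in strategy_sum[1]],
            [x / iterations for x in strategy_sum[2]])


PAYOFF = [[2, -1], [-1, 1]]   # unique equilibrium: both players mix (0.4, 0.6)
avg1, avg2 = self_play(PAYOFF, 10000)
print(avg1, avg2)
```

For a game tree such as poker, the same update is applied at every information set, with action values replaced by counterfactual values weighted by the opponents' reach probabilities.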
40 / 44
Counterfactual Regret Minimization
Cumulative counterfactual regret is bounded by
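$$R^T_i(I_j) \le \left(\max_z u_i(z) - \min_z u_i(z)\right) \sqrt{|A(I_j)|}\, \sqrt{T}$$
Total counterfactual regret is bounded by
$$R^T_i \le |\mathcal{I}_i| \left(\max_z u_i(z) - \min_z u_i(z)\right) \sqrt{\max_{h : P(h) = i} |A(h)|}\, \sqrt{T}$$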
41 / 44
Counterfactual Regret Minimization
(a) Number of game states, number of iterations, computation time, andexploitability of the resulting strategy for different sized abstractions
(b) Convergence rates for three different sized abstractions, x-axis showsiterations divided by the number of information sets in the abstraction
Source: 2008 - “Regret Minimization in Games with Incomplete Information” - Zinkevich et al
42 / 44
Summary
If you want to win (in expectation) at Texas Hold’em poker (against exploitable players) then. . .
1 Abstract the version of Texas Hold’em poker you are interested in so it has at most $10^{12}$ game states
2 Run the counterfactual regret minimization algorithm on the abstraction for T iterations and obtain the average strategy profile $\bar{\sigma}^T_{abs}$
3 Map the average strategy profile $\bar{\sigma}^T_{abs}$ for the abstracted game to one $\bar{\sigma}^T$ for the real game
4 Play your average strategy profile $\bar{\sigma}^T$ against your (exploitable) opponents
43 / 44
References
1 Annual Computer Poker Competition Website - http://www.computerpokercompetition.org/
2 2008 - “Regret Minimization in Games with Incomplete Information” - Zinkevich et al - http://martin.zinkevich.org/publications/regretpoker.pdf
3 2007 - “Robust Strategies and Counter-Strategies: Building a Champion Level Computer Poker Player” - Johanson - http://poker.cs.ualberta.ca/publications/johanson.msc.pdf
4 2013 - “Monte Carlo Sampling and Regret Minimization for Equilibrium Computation and Decision-Making in Large Extensive Form Games” - Lanctot - http://era.library.ualberta.ca/public/view/item/uuid:482ae86c-2045-4c12-b91c-3e7ce09bc9ae
44 / 44