Game Playing State-of-the-Art

Source: inst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-8.pdf (http://bit.ly/3GEMok9)

Checkers: Chinook ended the 40-year reign of champion Marion Tinsley using a complete 8-piece endgame database. 2007: Checkers solved!

Chess: 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second, and used sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.

Go: Human champions are being beaten. In Go, b > 300! Classic programs use pattern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion methods.

Pacman


Behavior from Computation


Demo: Mystery Pacman (L6D1)


Video of Demo Mystery Pacman


Adversarial Games


Types of Games


Many different kinds of games!

Axes:
  • Deterministic or stochastic?
  • One, two, or more players?
  • Zero sum?
  • Perfect information (can you see the state)?

Want algorithms for calculating a strategy (policy) which recommends a move from each state.


Deterministic Games


Many possible formalizations; one is:
  • States: S (start at s0)
  • Players: P = {1...N} (usually take turns)
  • Actions: A (may depend on player / state)
  • Transition function: S × A → S
  • Terminal test: S → {t, f}
  • Terminal utilities: S × P → R

Solution for a player is a policy: S → A.
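The components above can be sketched as code. This is a minimal illustration using a tiny Nim-style game (two players alternately take 1 or 2 coins from a pile; whoever takes the last coin wins); the game and all function names are hypothetical, not from the lecture.

```python
# Illustrative encoding of the formalization: states, players, actions,
# transition function, terminal test, and terminal utilities.

N_PLAYERS = 2  # P = {0, 1}, taking turns

def initial_state():
    # State s0: (coins remaining, player to move)
    return (5, 0)

def actions(state):
    # A may depend on the state: you cannot take more coins than remain.
    coins, _ = state
    return [a for a in (1, 2) if a <= coins]

def transition(state, action):
    # Transition function S × A → S: remove coins, pass the turn.
    coins, player = state
    return (coins - action, (player + 1) % N_PLAYERS)

def is_terminal(state):
    # Terminal test S → {t, f}: the game ends when the pile is empty.
    return state[0] == 0

def terminal_utility(state, player):
    # Terminal utilities S × P → R: the player who took the last coin
    # (i.e., the one who just moved) wins +1, the other gets -1.
    winner = (state[1] - 1) % N_PLAYERS
    return 1 if player == winner else -1
```

A policy S → A would then be any function mapping each non-terminal state to one of its legal actions.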


Zero-Sum Games


Zero-sum games:
  • Agents have opposite utilities (values on outcomes)
  • Lets us think of a single value that one maximizes and the other minimizes
  • Adversarial, pure competition

General games:
  • Agents have independent utilities (values on outcomes)
  • Cooperation, indifference, competition, and more are all possible
  • More later on non-zero-sum games


Adversarial Search


Single-Agent Trees

[Tree diagram: root value 8; leaf values 2, 0, …, 2, 6, …, 4, 6]


Value of Game Tree

Value of a state: the utility of the best achievable outcome from that state.

[Tree diagram: root value 8; leaf values 2, 0, …, 2, 6, …, 4, 6]

Terminal states: V(s) = known
Non-terminal states: V(s) = max_{s′ ∈ children(s)} V(s′)
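The recursion above can be sketched directly. This assumes a nested-tuple tree encoding (not from the slides): a leaf is a number (a known terminal utility), and an internal node is a tuple of children.

```python
# Value of a single-agent tree: terminal states have known values,
# non-terminal states take the max over their children's values.

def value(node):
    if isinstance(node, (int, float)):
        return node                          # terminal: V(s) = known
    return max(value(child) for child in node)  # V(s) = max over children

# A root with two subtrees, leaves (2, 0) and (4, 6):
print(value(((2, 0), (4, 6))))  # 6
```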


Adversarial Game Tree

[Adversarial game tree diagram: root value 8; leaf values 2, 0, …, 2, 6, …, 4, 6]


Minimax Values

[Tree diagram: terminal values +8, −10, −5, −8]

Terminal states: V(s) = known
Max states (under the agent’s control): V(s) = max_{s′ ∈ succ(s)} V(s′)
Min states (under the opponent’s control): V(s) = min_{s′ ∈ succ(s)} V(s′)


Tic-Tac-Toe Game Tree


Adversarial Search (Minimax)

[Tree diagram: minimax value 5 at the root; min-node values 2 and 5; leaf values 8, 2, 5, 6]

Minimax values: computed recursively.
Terminal values: part of the game.

Deterministic, zero-sum games:
  • Tic-tac-toe, chess, checkers
  • One player maximizes the result
  • The other minimizes the result

Minimax search:
  • A state-space search tree
  • Players alternate turns
  • Compute each node’s minimax value: the best achievable utility against a rational (optimal) adversary
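Minimax values also yield the recommended move (the policy): at the root, pick the action whose subtree has the highest minimax value. A sketch, again assuming a plain nested-list tree encoding with hypothetical names:

```python
# Pick the root action with the best minimax value. Subtrees are numbers
# (terminal values) or lists of successors; the opponent (min) moves next.

def minimax_value(node, maximizing):
    if isinstance(node, (int, float)):
        return node
    vals = [minimax_value(c, not maximizing) for c in node]
    return max(vals) if maximizing else min(vals)

def best_move(successors):
    # successors: dict mapping each root action to its resulting subtree.
    return max(successors, key=lambda a: minimax_value(successors[a], False))

# Matching the 8/2/5/6 leaves above: 'left' -> min(8, 2) = 2,
# 'right' -> min(5, 6) = 5, so the root's value is 5 and it moves right.
moves = {"left": [8, 2], "right": [5, 6]}
print(best_move(moves))  # right
```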


Page 78: Minimax Implementation

def value(state):
    if the state is terminal: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):    # V(s) = max_{s′ ∈ succ(s)} V(s′)
    initialize v = −∞
    for each successor s of state:
        v = max(v, min-value(s))
    return v

def min-value(state):    # V(s) = min_{s′ ∈ succ(s)} V(s′)
    initialize v = +∞
    for each successor s of state:
        v = min(v, max-value(s))
    return v

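The pseudocode above maps almost line-for-line onto Python. This is a minimal sketch, assuming the game is described by three hypothetical callbacks (`is_terminal`, `utility`, `successors`) plus a flag saying whose turn it is; the tiny dict-based tree is the two-ply example from the earlier slide:

```python
def minimax_value(state, is_terminal, utility, successors, is_max):
    """Recursively compute the minimax value of `state`.

    `is_terminal`, `utility`, and `successors` are assumed callbacks
    describing the game; `is_max` says whether MAX moves next.
    """
    if is_terminal(state):
        return utility(state)
    values = (minimax_value(s, is_terminal, utility, successors, not is_max)
              for s in successors(state))
    return max(values) if is_max else min(values)

# Two-ply tree from the slide: terminals 8, 2 and 5, 6 under two MIN nodes.
tree = {"root": ["L", "R"], "L": [8, 2], "R": [5, 6]}

def is_terminal(s):
    return isinstance(s, int)

def utility(s):
    return s

def successors(s):
    return tree[s]

print(minimax_value("root", is_terminal, utility, successors, True))  # → 5
```

A single `value` dispatcher with a min/max switch, as here, avoids duplicating the loop body between max-value and min-value.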

Page 96: Minimax Example

Terminal values: 3, 12, 8 | 2, 4, 6 | 14, 5, 20
Min-node values: 3, 2, 5
Root (max) value: 5

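To check the example, the tree's value can be computed directly as a max over per-branch minima (the grouping of the nine terminals under three MIN nodes is assumed from the slide layout):

```python
# Example tree from the slide: three MIN nodes over grouped terminals, MAX at root.
leaves = [[3, 12, 8], [2, 4, 6], [14, 5, 20]]

min_values = [min(group) for group in leaves]  # values of the MIN nodes
root_value = max(min_values)                   # value of the MAX root

print(min_values, root_value)  # → [3, 2, 5] 5
```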

Page 101: Minimax Efficiency

How efficient is minimax?
- Just like (exhaustive) DFS
- Time: O(b^m)
- Space: O(bm)

Example: for chess, b ≈ 35, m ≈ 100
- Exact solution is completely infeasible
- But do we need to explore the whole tree?

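The infeasibility claim is easy to sanity-check with the slide's numbers (the 1M-node budget comes from the Resource Limits example later in the deck):

```python
import math

b, m = 35, 100            # chess: branching factor and typical game length
budget = 10_000 * 100     # 10K nodes/sec for 100 seconds = 1M nodes

# A full minimax tree has on the order of b^m leaves: about 10^154 positions.
exponent = m * math.log10(b)
print(f"b^m ≈ 10^{exponent:.0f} nodes vs a budget of {budget:,}")
```

The gap of roughly 148 orders of magnitude is why pruning and depth limits, not faster hardware, are the relevant tools.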

Page 109: Minimax Properties

Terminal values (part of the game): 10, 10 | 9, 100
Minimax values (computed recursively): min nodes 10 and 9, root 10

Minimax is optimal against a perfect player. Otherwise?

Demo: min vs exp (L6D2, L6D3)

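The slide's tree makes the "Otherwise?" concrete: against a perfect minimizer the risky branch is worth only 9, but against a uniformly random opponent its expected value is far higher (a preview of expectimax). A quick check with the slide's numbers:

```python
left, right = [10, 10], [9, 100]   # terminal values under the two MIN nodes

# Perfect (minimizing) opponent: minimax takes the safe branch.
minimax_value = max(min(left), min(right))        # max(10, 9) = 10

# Uniformly random opponent: expectimax prefers the risky branch.
mean = lambda vals: sum(vals) / len(vals)
expectimax_value = max(mean(left), mean(right))   # max(10.0, 54.5) = 54.5

print(minimax_value, expectimax_value)  # → 10 54.5
```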

Page 116: Video of Demo Min vs. Exp (Min)

Page 117: Video of Demo Min vs. Exp (Exp)

Page 118: Resource Limits

Problem: in realistic games, we cannot search to the leaves!

Solution: depth-limited search
- Search only to a limited depth in the tree
- Use an evaluation function for non-terminal positions

Example:
- Suppose we have 100 seconds and can explore 10K nodes per second
- So we can check 1M nodes per move
- Alpha-beta reaches about depth 8: a decent chess program

- The guarantee of optimal play is gone
- More plies make a BIG difference
- Use iterative deepening for an anytime algorithm


Page 133: Game Playing State-of-the-Artinst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-8.pdfchampion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used

Depth Matters

- Evaluation functions are always imperfect.
- The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters.
- An important example of the tradeoff between complexity of features and complexity of computation.

Demo: depth-limited (L6D4, L6D5)


Video of Demo Limited Depth (2)


Video of Demo Limited Depth (10)


Evaluation Functions


- Evaluation functions score non-terminals in depth-limited search.

- Ideal function: returns the actual minimax value of the position.

- In practice: typically a weighted linear sum of features:

  Eval(s) = w1 f1(s) + w2 f2(s) + ··· + wn fn(s)

- Example: f1(s) = (num white queens − num black queens), etc.
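The weighted-linear-sum form can be sketched directly. The features and weights below (queen and pawn differences, material values 9 and 1) and the dict-based board summary are illustrative assumptions, not anything prescribed by the slides.

```python
def linear_eval(state, weights, features):
    # Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical board summary, just enough for the two features below.
state = {"white_queens": 1, "black_queens": 0,
         "white_pawns": 5, "black_pawns": 7}

features = [
    lambda s: s["white_queens"] - s["black_queens"],  # f1: queen difference
    lambda s: s["white_pawns"] - s["black_pawns"],    # f2: pawn difference
]
weights = [9.0, 1.0]  # assumed material values

score = linear_eval(state, weights, features)  # 9*1 + 1*(-2) = 7.0
```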


Evaluation for Pacman

Demo: thrashing d=2, thrashing d=2 (fixed evaluation function), smart ghosts coordinate (L6D6, 7, 8, 10)


Video of Demo Thrashing (d=2)


Why Pacman Starves

A danger of replanning agents!
- He knows his score will go up by eating the dot now (west, east).
- He knows his score will go up just as much by eating the dot later (east, west).
- There are no point-scoring opportunities after eating the dot (within the horizon, two here).
- Therefore, waiting seems just as good as eating: he may go east, then back west in the next round of replanning!


Video of Demo Thrashing – Fixed (d=2)


Video of Demo Smart Ghosts (Coordination)


Video of Demo Smart Ghosts (Coordination) –Zoomed In


Game Tree Pruning


Minimax Example

[Figure: a two-ply game tree. The MAX root has three MIN children with leaf values (3, 12, 8), (2, 4, 6), and (14, 5, 20); their minimax values are 3, 2, and 5, so the root's value is 5.]


Minimax Pruning

[Figure: the same tree with pruning. After the first branch establishes a root value of at least 3, seeing the leaf 2 shows the middle branch is worth ≤ 2, so its remaining leaves (4 and 6) are pruned; the third branch evaluates to 5 and the root's value is 5.]


Alpha-Beta Pruning

General configuration (MIN version):
- We are computing MIN-VALUE at some node n.
- We are looping over n's children.
- n's estimate of the min is dropping.
- Who cares about n's value? MAX.
- α = the best MAX value on the path to the root.
- If n's value drops below α, then MAX will never choose n, so the search can "prune" n's remaining children.

Symmetric for the MAX version.


Alpha-Beta Implementation

α: MAX's best option on the path to the root
β: MIN's best option on the path to the root

def min-value(state, α, β):
    initialize v = +∞
    for each successor s of state:
        v = min(v, value(s, α, β))
        if v ≤ α: return v
        β = min(β, v)
    return v

def max-value(state, α, β):
    initialize v = −∞
    for each successor s of state:
        v = max(v, value(s, α, β))
        if v ≥ β: return v
        α = max(α, v)
    return v
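A runnable translation of the alpha-beta pseudocode, with the standard pruning conditions: MIN returns early when v ≤ α, MAX when v ≥ β. The list-based tree representation and the `visited` log are illustrative additions; the tree is the one from the pruning example, so the leaves 4 and 6 should never be evaluated.

```python
import math

def alphabeta(node, maximizing, alpha, beta, visited):
    # A node is a number (leaf utility) or a list of child nodes.
    if isinstance(node, (int, float)):
        visited.append(node)          # record each leaf we actually evaluate
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta, visited))
            if v >= beta:             # MIN above will never let play reach here
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, True, alpha, beta, visited))
            if v <= alpha:            # MAX above will never choose this node
                return v
            beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 20]]
seen = []
root = alphabeta(tree, True, -math.inf, math.inf, seen)
# root == 5; seen == [3, 12, 8, 2, 14, 5, 20] -- leaves 4 and 6 were pruned
```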

Page 190: Game Playing State-of-the-Artinst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-8.pdfchampion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used

Alpha-Beta Implementation

α - MAX’s best option on path to root.

β - MIN’s best option on path to root.

def min-value(state , α,β ):initialize v = +∞.for each successor s of state:

v = min(v, value(s, α,β ))if v ≤ α return vβ =min(β ,v)

return v

def max-value(state , α,β ):initialize v = −∞.for each successor s of state:

v = max(v, value(s, α,β ))if v ≤ α return vα =min(α,v)

return v

Page 191: Game Playing State-of-the-Artinst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-8.pdfchampion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used

Alpha-Beta Implementation

α - MAX’s best option on path to root.

β - MIN’s best option on path to root.

def min-value(state , α,β ):

initialize v = +∞.for each successor s of state:

v = min(v, value(s, α,β ))if v ≤ α return vβ =min(β ,v)

return v

def max-value(state , α,β ):initialize v = −∞.for each successor s of state:

v = max(v, value(s, α,β ))if v ≤ α return vα =min(α,v)

return v

Page 192: Game Playing State-of-the-Artinst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-8.pdfchampion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used

Alpha-Beta Implementation

α - MAX’s best option on path to root.

β - MIN’s best option on path to root.

def min-value(state , α,β ):initialize v = +∞.

for each successor s of state:v = min(v, value(s, α,β ))if v ≤ α return vβ =min(β ,v)

return v

def max-value(state , α,β ):initialize v = −∞.for each successor s of state:

v = max(v, value(s, α,β ))if v ≤ α return vα =min(α,v)

return v

Page 193: Game Playing State-of-the-Artinst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-8.pdfchampion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used

Alpha-Beta Implementation

α - MAX’s best option on path to root.

β - MIN’s best option on path to root.

def min-value(state , α,β ):initialize v = +∞.for each successor s of state:

v = min(v, value(s, α,β ))if v ≤ α return vβ =min(β ,v)

return v

def max-value(state , α,β ):initialize v = −∞.for each successor s of state:

v = max(v, value(s, α,β ))if v ≤ α return vα =min(α,v)

return v

Page 194: Game Playing State-of-the-Artinst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-8.pdfchampion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used

Alpha-Beta Implementation

α - MAX’s best option on path to root.

β - MIN’s best option on path to root.

def min-value(state , α,β ):initialize v = +∞.for each successor s of state:

v = min(v, value(s, α,β ))

if v ≤ α return vβ =min(β ,v)

return v

def max-value(state , α,β ):initialize v = −∞.for each successor s of state:

v = max(v, value(s, α,β ))if v ≤ α return vα =min(α,v)

return v

Page 195: Game Playing State-of-the-Artinst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-8.pdfchampion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used

Alpha-Beta Implementation

α - MAX’s best option on path to root.

β - MIN’s best option on path to root.

def min-value(state , α,β ):initialize v = +∞.for each successor s of state:

v = min(v, value(s, α,β ))if v ≤ α return v

β =min(β ,v)return v

def max-value(state , α,β ):initialize v = −∞.for each successor s of state:

v = max(v, value(s, α,β ))if v ≤ α return vα =min(α,v)

return v

Page 196: Game Playing State-of-the-Artinst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-8.pdfchampion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used

Alpha-Beta Implementation

α - MAX’s best option on path to root.

β - MIN’s best option on path to root.

def min-value(state , α,β ):initialize v = +∞.for each successor s of state:

v = min(v, value(s, α,β ))if v ≤ α return vβ =min(β ,v)

return v

def max-value(state , α,β ):initialize v = −∞.for each successor s of state:

v = max(v, value(s, α,β ))if v ≤ α return vα =min(α,v)

return v

Page 197: Game Playing State-of-the-Artinst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-8.pdfchampion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used

Alpha-Beta Implementation

α - MAX’s best option on path to root.

β - MIN’s best option on path to root.

def min-value(state , α,β ):initialize v = +∞.for each successor s of state:

v = min(v, value(s, α,β ))if v ≤ α return vβ =min(β ,v)

return v

def max-value(state , α,β ):initialize v = −∞.for each successor s of state:

v = max(v, value(s, α,β ))if v ≤ α return vα =min(α,v)

return v

Page 198: Game Playing State-of-the-Artinst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-8.pdfchampion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used

Alpha-Beta Implementation

α - MAX’s best option on path to root.

β - MIN’s best option on path to root.

def min-value(state , α,β ):initialize v = +∞.for each successor s of state:

v = min(v, value(s, α,β ))if v ≤ α return vβ =min(β ,v)

return v

def max-value(state , α,β ):

initialize v = −∞.for each successor s of state:

v = max(v, value(s, α,β ))if v ≤ α return vα =min(α,v)

return v

Page 199: Game Playing State-of-the-Artinst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-8.pdfchampion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used

Alpha-Beta Implementation

α - MAX’s best option on path to root.

β - MIN’s best option on path to root.

def min-value(state , α,β ):initialize v = +∞.for each successor s of state:

v = min(v, value(s, α,β ))if v ≤ α return vβ =min(β ,v)

return v

def max-value(state , α,β ):initialize v = −∞.

for each successor s of state:v = max(v, value(s, α,β ))if v ≤ α return vα =min(α,v)

return v

Page 200: Game Playing State-of-the-Artinst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-8.pdfchampion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used

Alpha-Beta Implementation

α - MAX’s best option on path to root.

β - MIN’s best option on path to root.

def min-value(state , α,β ):initialize v = +∞.for each successor s of state:

v = min(v, value(s, α,β ))if v ≤ α return vβ =min(β ,v)

return v

def max-value(state , α,β ):initialize v = −∞.for each successor s of state:

v = max(v, value(s, α,β ))if v ≤ α return vα =min(α,v)

return v



Alpha-Beta Pruning Properties

This pruning has no effect on the minimax value computed for the root!

Values of intermediate nodes might be wrong
- Important: the root's children may be wrong
- → the most naive version is not suitable for action selection

Good child ordering improves the effectiveness of pruning

With "perfect ordering":
- Time complexity drops to O(b^(m/2))
- Doubles the solvable depth!
- Full search of, e.g., chess is still hopeless...

This is a simple example of metareasoning (computing about what to compute)
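The ordering effect is easy to observe by counting leaf evaluations. The sketch below reuses the nested-list tree encoding from before and adds a leaf counter; both the encoding and the counter are illustrative assumptions, not part of the slides. The same set of leaves is searched in two orders: the root value is identical (pruning never changes it), but exploring the best branch first prunes more.

```python
import math

def ab_value(node, alpha=-math.inf, beta=math.inf, maximizing=True, stats=None):
    """Alpha-beta on a nested-list game tree (leaves are utilities,
    MAX and MIN layers alternate). stats["leaves"] counts leaf
    evaluations, so different child orderings can be compared."""
    if stats is None:
        stats = {"leaves": 0}
    if not isinstance(node, list):          # leaf
        stats["leaves"] += 1
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        cv = ab_value(child, alpha, beta, not maximizing, stats)
        if maximizing:
            v = max(v, cv)
            if v >= beta:                   # prune: MIN above won't allow v
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, cv)
            if v <= alpha:                  # prune: MAX above won't allow v
                return v
            beta = min(beta, v)
    return v

good = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # best MIN branch first
bad = [[2, 5, 14], [6, 4, 2], [8, 12, 3]]    # fully reversed order
for tree in (good, bad):
    stats = {"leaves": 0}
    print(ab_value(tree, stats=stats), stats["leaves"])  # value is 3 both times
```

With the good order, 7 of the 9 leaves are evaluated; with the reversed order, all 9 are. On deeper trees the gap grows toward the O(b^(m/2)) best case.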



Alpha-Beta Quiz


Alpha-Beta Quiz 2


Next Time: Uncertainty!