49
Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer

Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Embed Size (px)

Citation preview

Page 1: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Games

Tamara Berg

CS 560 Artificial Intelligence

Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer

Page 2: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Announcements

• I sent out an email to the class mailing list on Thurs (making the medium and big cheese mazes extra credit).

• HW2 will be released later this week.

Page 3: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Adversarial search(Chapter 5)

World Champion chess player Garry Kasparov is defeated by IBM’s Deep

Blue chess-playing computer in a six-game match in May, 1997

(link)

© Telegraph Group Unlimited 1997

Page 4: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Why study games?

• Games are a traditional hallmark of intelligence• Games are easy to formalize• Games can be a good model of real-world

competitive or cooperative activities– Military confrontations, negotiation, auctions, etc.

Page 5: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Types of game environments

Deterministic Stochastic

Perfect information(fully observable)

Imperfect information(partially observable)

Chess, checkers, go Backgammon, monopoly

Battleship Scrabble, poker, bridge

Page 6: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alternating two-player zero-sum games

• Players take turns• Each game outcome or terminal state has a

utility for each player (e.g., 1 for win, 0 for loss)• The sum of both players’ utilities is a constant

Page 7: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Games vs. single-agent search

• We don’t know how the opponent will act– The solution is not a fixed sequence of actions from start

state to goal state, but a strategy or policy (a mapping from state to best move in that state)

• Efficiency is critical to playing well– The time to make a move is limited– The branching factor, search depth, and number of

terminal configurations are huge• In chess, branching factor ≈ 35 and depth ≈ 100, giving a search

tree of 10154 nodes– Number of atoms in the observable universe ≈ 1080

– This rules out searching all the way to the end of the game

Page 8: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Search problem components• Initial state• Actions• Transition model

– What state results fromperforming a given action in a given state?

• Goal state• Path cost

– Assume that it is a sum of nonnegative step costs

• The optimal solution is the sequence of actions that gives the lowest path cost for reaching the goal

Initialstate

Goal state

Page 9: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Deterministic Games

Many possible formulations, e.g.:• States: S (start at initial state)• Players: P={1…N} (usually take turns)• Actions: A (may depend on player/state)• Transition Model: SxA -> S• Terminal Test: S->{t,f}• Terminal Utilities: SxP -> R

Solution for a player is a policy: S->A

Page 10: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Deterministic Single-Player

Deterministic, single player, perfect information (e.g. 8-puzzle, rubik’s cube):• Know the rules, know what actions do,

know when you win

… it’s just search!

Slight reinterpretation:• Each node stores a value: the best

outcome it can reach• This is the maximal value of its children • Note we don’t have path sums as

before (utilities at end)

After search, pick moves that lead to best outcome

Page 11: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,
Page 12: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,
Page 13: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Game tree• A game of tic-tac-toe between two players, “MAX” (X) and “MIN” (O)

Page 14: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,
Page 15: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

A more abstract game tree

Terminal utilities (for MAX)

3 2 2

3

A two-ply game

Page 16: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

A more abstract game tree

• Minimax value of a node: the utility (for MAX) of being in the corresponding state, assuming perfect play on both sides

• Minimax strategy: Choose the move that gives the best worst-case payoff

3 2 2

3

Page 17: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Computing the minimax value of a node

• Minimax(node) = Utility(node) if node is terminal maxaction Minimax(Succ(node, action)) if player = MAX

minaction Minimax(Succ(node, action)) if player = MIN

3 2 2

3

Page 18: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Optimality of minimax

• The minimax strategy is optimal against an optimal opponent

• What if your opponent is suboptimal?– Your utility can only be higher than if

you were playing an optimal opponent!– A different strategy may work better for

a sub-optimal opponent, but it will necessarily be worse against an optimal opponent

11

Example from D. Klein and P. Abbeel

Page 19: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

More general games

• More than two players, non-zero-sum• Utilities are now tuples• Each player maximizes their own utility at their node• Utilities get propagated (backed up) from children to parents

4,3,2 7,4,1

4,3,2

1,5,2 7,7,1

1,5,2

4,3,2

Page 20: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

10 9

x

Page 21: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-beta pruning• It is possible to compute the exact minimax decision

without expanding every node in the game tree

Page 22: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-beta pruning• It is possible to compute the exact minimax decision

without expanding every node in the game tree

3

3

Page 23: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-beta pruning• It is possible to compute the exact minimax decision

without expanding every node in the game tree

3

3

2

Page 24: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-beta pruning• It is possible to compute the exact minimax decision

without expanding every node in the game tree

3

3

2 14

Page 25: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-beta pruning• It is possible to compute the exact minimax decision

without expanding every node in the game tree

3

3

2 5

Page 26: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-beta pruning• It is possible to compute the exact minimax decision

without expanding every node in the game tree

3

3

2 2

Page 27: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

General alpha-beta pruning• Consider a node n in the tree

---

• If player has a better choice at:– Parent node of n– Or any choice point further up

• Then n will never be reached in play.

• Hence, when that much is known about n, it can be pruned.

Page 28: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-beta Algorithm• Depth first search

– only considers nodes along a single path from root at any time

a = highest-value choice found at any choice point of path for MAX

(initially, a = −infinity)

b = lowest-value choice found at any choice point of path for MIN

(initially, = +infinity)

• Pass current values of a and b down to child nodes during search.• Update values of a and b during search as:

– MAX updates at MAX nodes– MIN updates at MIN nodes

• Prune remaining branches at a node when a ≥ b

Page 29: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example Revisited

, , initial valuesDo DF-search until first leaf

=− =+

=− =+

, , passed to kids

Page 30: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example (continued)

MIN updates , based on kids

=− =+

=− =3

Page 31: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example (continued)

=− =3

MIN updates , based on kids.No change.

=− =+

Page 32: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example (continued)

=− =3

MIN updates , based on kids.No change.

=− =+

Page 33: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example (continued)

MAX updates , based on kids.=3 =+

3 is returnedas node value.

Page 34: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example (continued)

=3 =+

=3 =+

, , passed to kids

Page 35: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example (continued)

=3 =+

=3 =2

MIN updates ,based on kids.

Page 36: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example (continued)

=3 =2

≥ ,so prune.

=3 =+

Page 37: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example (continued)

2 is returnedas node value.

MAX updates , based on kids.No change. =3

=+

Page 38: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example (continued)

,=3 =+

=3 =+

, , passed to kids

Page 39: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example (continued)

,

=3 =14

=3 =+

MIN updates ,based on kids.

Page 40: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example (continued)

,

=3 =5

=3 =+

MIN updates ,based on kids.

Page 41: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example (continued)

=3 =+

2 =3 =2

MIN updates ,based on kids.

Page 42: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example (continued)

=3 =+ 2 is returned

as node value.

2

Page 43: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-Beta Example (continued)

2

Page 44: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-beta pruning

• Pruning does not affect final result• Amount of pruning depends on move ordering

Page 45: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Example

3 4 1 2 7 8 5 6

-which nodes can be pruned?

Page 46: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Answer to Example

3 4 1 2 7 8 5 6

-which nodes can be pruned?

Answer: NONE! Because the most favorable nodes for both are explored last (i.e., in the diagram, are on the right-hand side).

Max

Min

Max

Page 47: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Second Example(the exact mirror image of the first example)

6 5 8 7 2 1 3 4

-which nodes can be pruned?

Page 48: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Answer to Second Example(the exact mirror image of the first example)

6 5 8 7 2 1 3 4

-which nodes can be pruned?

Min

Max

Max

Answer: LOTS! Because the most favorable nodes for both are explored first (i.e., in the diagram, are on the left-hand side).

Page 49: Games Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore,

Alpha-beta pruning

• Pruning does not affect final result• Amount of pruning depends on move ordering

– Should start with the “best” moves (highest-value for MAX or lowest-value for MIN)

– For chess, can try captures first, then threats, then forward moves, then backward moves

– Can also try to remember “killer moves” from other branches of the tree

• With perfect ordering, the time to find the best move is reduced to O(bm/2) from O(bm)– Depth of search you can afford is effectively doubled