Adversarial Search and Game Playing
(Where making good decisions requires respecting your opponent)

Dr. Ali Abdallah
(dr_ali_abdallah@hotmail.com)
Games like Chess or Go are compact settings which involve uncertainty and interaction.
For centuries humans have used them to exercise their intelligence.
Recently, there has been great success in building game programs that challenge human supremacy.
Uncertainty
Here, uncertainty is caused by the actions of another agent (MIN), which competes with our agent (MAX).
MIN wants MAX to fail (and vice versa).
No plan exists that guarantees MAX's success regardless of which actions MIN executes (the same is true for MIN).
At each turn, the choice of which action to perform must be made within a specified time limit.
The state space is enormous: only a tiny fraction of this space can be explored within the time limit.
Specific Setting
Two-player, turn-taking, deterministic, fully observable, time-constrained game:
• State space
• Initial state
• Successor function: it tells which actions can be executed in each state and gives the successor state for each action
• MAX's and MIN's actions alternate, with MAX playing first in the initial state
• Terminal test: it tells if a state is terminal and, if yes, whether it is a win or a loss for MAX, or a draw
• All states are fully observable
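The elements above can be sketched as a minimal interface. This is an illustrative sketch only; the class and method names are assumptions, not from the lecture.

```python
class Game:
    """Two-player, turn-taking, deterministic, fully observable game."""

    def initial_state(self):
        """Return the initial state (MAX plays first)."""
        raise NotImplementedError

    def successors(self, state):
        """Successor function: yield (action, next_state) pairs
        for the actions executable in `state`."""
        raise NotImplementedError

    def is_terminal(self, state):
        """Terminal test."""
        raise NotImplementedError

    def utility(self, state):
        """For a terminal state: +1 (win for MAX), -1 (loss), 0 (draw)."""
        raise NotImplementedError
```

A concrete game (tic-tac-toe, chess, ...) would subclass this and fill in the four methods.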
Game Tree
[Figure: the tic-tac-toe game tree. MAX's play and MIN's play alternate by level, giving alternating MAX nodes and MIN nodes; one branch ends in a terminal state (win for MAX). Here, symmetries have been used to reduce the branching factor.]
Game Tree
[Figure: the same game tree; a terminal state (win for MAX) is marked.]
In general, the branching factor and the depth of terminal states are large. Chess:
• Number of states: ~10^40
• Branching factor: ~35
• Number of total moves in a game: ~100
Choosing an Action: Basic Idea
1) Using the current state as the initial state, build the game tree uniformly to the maximal depth h (called horizon) feasible within the time limit
2) Evaluate the states of the leaf nodes
3) Back up the results from the leaves to the root and pick the best action assuming the worst from MIN
Minimax algorithm
Evaluation Function
Function e: state s → number e(s)
e(s) is a heuristic that estimates how favorable s is for MAX:
• e(s) > 0 means that s is favorable to MAX (the larger the better)
• e(s) < 0 means that s is favorable to MIN
• e(s) = 0 means that s is neutral
Example: Tic-Tac-Toe
e(s) = (number of rows, columns, and diagonals open for MAX) − (number of rows, columns, and diagonals open for MIN)
For the three boards shown: 8 − 8 = 0, 6 − 4 = 2, 3 − 3 = 0
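The open-lines evaluation above can be sketched as follows. The board encoding ('X' for MAX, 'O' for MIN, None for empty) and the helper names are assumptions for illustration.

```python
def all_lines():
    """The 8 lines of the board: 3 rows, 3 columns, 2 diagonals."""
    rows = [[(r, c) for c in range(3)] for r in range(3)]
    cols = [[(r, c) for r in range(3)] for c in range(3)]
    diags = [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
    return rows + cols + diags

def evaluate(board):
    """e(s) = lines still open for 'X' (MAX) minus lines still open for 'O' (MIN)."""
    def open_for(player):
        # A line is open for a player if the opponent has no mark in it.
        opponent = 'O' if player == 'X' else 'X'
        return sum(1 for line in all_lines()
                   if all(board[r][c] != opponent for (r, c) in line))
    return open_for('X') - open_for('O')

empty = [[None] * 3 for _ in range(3)]
# On the empty board all 8 lines are open to both players, so e = 8 - 8 = 0.
```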
Construction of an Evaluation Function
Usually a weighted sum of "features":

    e(s) = Σ_{i=1..n} w_i f_i(s)

Features may include:
• Number of pieces of each type
• Number of possible moves
• Number of squares controlled
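A minimal sketch of the weighted-sum form; the two features and their weights below are invented for illustration, not taken from any real program.

```python
def weighted_eval(state, features, weights):
    """e(s) = sum_i w_i * f_i(s), for feature functions f_i and weights w_i."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Two toy features on a state represented as a dict (hypothetical keys):
features = [lambda s: s["my_pieces"] - s["their_pieces"],   # material balance
            lambda s: s["my_moves"] - s["their_moves"]]     # mobility
weights = [3.0, 1.0]
s = {"my_pieces": 5, "their_pieces": 4, "my_moves": 10, "their_moves": 8}
# weighted_eval(s, features, weights) computes 3.0 * 1 + 1.0 * 2 = 5.0
```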
Backing up Values
[Figure: Tic-Tac-Toe tree at horizon h = 2. Leaf evaluations such as 6−5 = 1, 5−6 = −1, and 5−5 = 0 are backed up: each MIN node gets the minimum of its children (−1, −2, 1, 1 in the figure), and the best move at the root leads to the largest of these values (1).]
Continuation
[Figure: the tree continued below the chosen move; backed-up values at the next horizon are shown.]
Why use backed-up values?
At each non-leaf node N, the backed-up value is the value of the best state that MAX can reach at depth h if MIN plays well (by the same criterion as MAX applies to itself).
If e is to be trusted in the first place, then the backed-up value is a better estimate of how favorable STATE(N) is than e(STATE(N)).
Minimax Algorithm
1. Expand the game tree uniformly from the current state (where it is MAX's turn to play) to depth h
2. Compute the evaluation function at every leaf of the tree
3. Back up the values from the leaves to the root of the tree as follows:
   a. A MAX node gets the maximum of the evaluations of its successors
   b. A MIN node gets the minimum of the evaluations of its successors
4. Select the move toward a MIN node that has the largest backed-up value
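The four steps above can be sketched directly. `successors`, `is_terminal`, and `evaluate` are assumed, game-specific helpers passed in as parameters.

```python
def minimax(state, depth, maximizing, successors, is_terminal, evaluate):
    """Return the backed-up value of `state` at horizon `depth`."""
    if depth == 0 or is_terminal(state):
        return evaluate(state)                  # step 2: evaluate the leaves
    values = [minimax(s, depth - 1, not maximizing,
                      successors, is_terminal, evaluate)
              for s in successors(state)]
    return max(values) if maximizing else min(values)   # steps 3a / 3b

def best_move(state, depth, successors, is_terminal, evaluate):
    """Step 4: pick the successor (a MIN node) with the largest backed-up value."""
    return max(successors(state),
               key=lambda s: minimax(s, depth - 1, False,
                                     successors, is_terminal, evaluate))

# Toy tree: nested lists are internal nodes, ints are leaf evaluations.
succ = lambda s: s
term = lambda s: isinstance(s, int)
ev = lambda s: s
```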
Game Playing (for MAX)
Repeat until a terminal state is reached:
1. Select move using Minimax
2. Execute move
3. Observe MIN's move
Note that at each cycle the large game tree built to horizon h is used to select only one move.
Everything is repeated at the next cycle (a sub-tree of depth h−2 can be re-used).
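The cycle above can be sketched as follows; every helper passed in is a hypothetical, game-specific function (in particular, `select_move` would run Minimax to horizon h).

```python
def play(state, apply_move, is_terminal, select_move, observe_opponent):
    """Play out a game for MAX: select, execute, observe, repeat."""
    while not is_terminal(state):
        move = select_move(state)          # 1. select move using Minimax
        state = apply_move(state, move)    # 2. execute move
        if is_terminal(state):
            break
        state = observe_opponent(state)    # 3. observe MIN's move
    return state
```

For instance, with a trivial counting "game" where each side adds to a counter and states ≥ 3 are terminal, the loop terminates as soon as MAX's move reaches 3.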
Can we do better?
Yes! Much better!
Pruning
[Figure: the root's left subtree backs up the value 3, while a leaf of value −1 bounds a sibling MIN node from above. This part of the tree can't have any effect on the value that will be backed up to the root.]
Example
[Figure sequence: alpha-beta bounds on a small tree. As leaves with values 2, 1, and −1 are evaluated left to right, a MIN node first gets β = 2, then β = 1; the MAX node above it gets α = 1; a later leaf of value −1 gives its MIN node β = −1.]
The beta value of a MIN node is an upper bound on the final backed-up value. It can never increase.
The alpha value of a MAX node is a lower bound on the final backed-up value. It can never decrease.
Search can be discontinued below any MIN node whose beta value is less than or equal to the alpha value of one of its MAX ancestors.
Alpha-Beta Pruning
Explore the game tree to depth h in depth-first manner
Back up alpha and beta values whenever possible
Prune branches that can't lead to changing the final decision
Alpha-Beta Algorithm
Update the alpha/beta value of the parent of a node N when the search below N has been completed or discontinued
Discontinue the search below a MAX node N if its alpha value is ≥ the beta value of a MIN ancestor of N
Discontinue the search below a MIN node N if its beta value is ≤ the alpha value of a MAX ancestor of N
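The rules above translate into the standard alpha-beta recursion. As before, `successors`, `is_terminal`, and `evaluate` are assumed, game-specific helpers.

```python
def alphabeta(state, depth, alpha, beta, maximizing,
              successors, is_terminal, evaluate):
    """Minimax value of `state` at horizon `depth`, with alpha-beta cutoffs."""
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for s in successors(state):
            value = max(value, alphabeta(s, depth - 1, alpha, beta, False,
                                         successors, is_terminal, evaluate))
            alpha = max(alpha, value)       # alpha can never decrease
            if alpha >= beta:
                break                       # prune: a MIN ancestor forbids this
        return value
    else:
        value = float("inf")
        for s in successors(state):
            value = min(value, alphabeta(s, depth - 1, alpha, beta, True,
                                         successors, is_terminal, evaluate))
            beta = min(beta, value)         # beta can never increase
            if beta <= alpha:
                break                       # prune: a MAX ancestor forbids this
        return value

# Toy tree: nested lists are internal nodes, ints are leaf evaluations.
succ = lambda s: s
term = lambda s: isinstance(s, int)
ev = lambda s: s
```

On the toy tree `[[3, 12], [2, 4]]`, once the first MIN child backs up 3 (so α = 3), the second MIN child is cut off after its first leaf 2, because β = 2 ≤ α.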
Example
[Figure sequence: a step-by-step alpha-beta trace on a larger tree with leaf values 0, 5, −3, … along the bottom. Working left to right, α and β values are updated and subtrees are pruned at each cutoff, until the root's backed-up value is determined and the best move identified.]
How much do we gain?
Consider these two cases:
[Figure: two orderings of the same subtree. Left: the first MIN child backs up 3 (β = 3, so α = 3 at the root); in the second MIN child the first leaf is −1, so β = −1 ≤ α and the remaining leaf (4) is pruned. Right: the leaf 4 is examined first (β = 4 > α), so the −1 leaf must also be examined and nothing is pruned.]
How much do we gain?
Assume a game tree of uniform branching factor b.
Minimax examines O(b^h) nodes; so does alpha-beta in the worst case.
The gain for alpha-beta is maximum when:
• The MIN children of a MAX node are ordered in decreasing backed-up values
• The MAX children of a MIN node are ordered in increasing backed-up values
Then alpha-beta examines O(b^{h/2}) nodes [Knuth and Moore, 1975].
But this requires an oracle (if we knew how to order nodes perfectly, we would not need to search the game tree).
If nodes are ordered at random, then the average number of nodes examined by alpha-beta is ~O(b^{3h/4}).
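A small experiment (not from the lecture) can illustrate the claim: run alpha-beta on the same random tree with and without oracle ordering and count visited nodes. All names here are illustrative.

```python
import random

def count_ab(tree, alpha, beta, maximizing, counter):
    """Alpha-beta over nested lists (float leaves), counting visited nodes."""
    counter[0] += 1
    if isinstance(tree, float):
        return tree
    if maximizing:
        v = float("-inf")
        for child in tree:
            v = max(v, count_ab(child, alpha, beta, False, counter))
            alpha = max(alpha, v)
            if alpha >= beta:
                break
        return v
    v = float("inf")
    for child in tree:
        v = min(v, count_ab(child, alpha, beta, True, counter))
        beta = min(beta, v)
        if beta <= alpha:
            break
    return v

def exact(tree, maximizing):
    """Plain minimax value: this is the 'oracle'."""
    if isinstance(tree, float):
        return tree
    vals = [exact(c, not maximizing) for c in tree]
    return max(vals) if maximizing else min(vals)

def order(tree, maximizing):
    """Oracle ordering: decreasing child values under MAX, increasing under MIN."""
    if isinstance(tree, float):
        return tree
    kids = [order(c, not maximizing) for c in tree]
    kids.sort(key=lambda c: exact(c, not maximizing), reverse=maximizing)
    return kids

def random_tree(depth, b):
    if depth == 0:
        return random.random()
    return [random_tree(depth - 1, b) for _ in range(b)]

random.seed(0)
t = random_tree(8, 3)                               # b = 3, h = 8
n_rand, n_best = [0], [0]
v1 = count_ab(t, float("-inf"), float("inf"), True, n_rand)
v2 = count_ab(order(t, True), float("-inf"), float("inf"), True, n_best)
# Same root value either way; the oracle-ordered tree visits far fewer nodes.
```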
Heuristic Ordering of Nodes
Order the nodes below the root according to the values backed up at the previous iteration
Order MAX (resp. MIN) nodes in decreasing (resp. increasing) values of the evaluation function e computed at these nodes
Other Improvements
Adaptive horizon + iterative deepening
Extended search: retain the k ≥ 1 best paths, instead of just one, and extend the tree at greater depth below their leaf nodes (to help deal with the "horizon effect")
Singular extension: if a move is obviously better than the others in a node at horizon h, then expand this node along this move
Use transposition tables to deal with repeated states
Null-move search
State-of-the-Art

Checkers: Tinsley vs. Chinook
Name: Marion Tinsley
Profession: teaches mathematics
Hobby: checkers
Record: over 42 years, lost only 3 games of checkers
World champion for over 40 years
Mr. Tinsley suffered his 4th and 5th losses against Chinook.

Chinook
First computer to become official world champion of checkers!
Chess: Kasparov vs. Deep Blue

                Kasparov                  Deep Blue
Height          5'10"                     6'5"
Weight          176 lbs                   2,400 lbs
Age             34 years                  4 years
Computers       50 billion neurons        32 RISC processors + 256 VLSI chess engines
Speed           2 pos/sec                 200,000,000 pos/sec
Knowledge       Extensive                 Primitive
Power Source    Electrical/chemical       Electrical
Ego             Enormous                  None
1997: Deep Blue wins the match 3.5 to 2.5 (2 wins, 1 loss, and 3 draws)
Chess: Kasparov vs. Deep Junior
August 2, 2003: Match ends in a 3-3 tie!
Deep Junior: 8 CPUs, 8 GB RAM, Windows 2000
2,000,000 pos/sec; available at $100
Othello: Murakami vs. Logistello
Takeshi Murakami, World Othello Champion
1997: The Logistello software crushed Murakami by 6 games to 0
Go: Goemate vs. ??
Name: Chen Zhixing
Profession: retired
Computer skills: self-taught programmer
Author of Goemate (arguably the best Go program available today)
Gave Goemate a 9-stone handicap and still easily beat the program, thereby winning $15,000
Go has too high a branching factor for existing search techniques
Current and future software must rely on huge databases and pattern-recognition techniques
Secrets
Many game programs are based on alpha-beta + iterative deepening + extended/singular search + transposition tables + huge databases + ...
For instance, Chinook searched all checkers configurations with 8 pieces or less and created an endgame database of 444 billion board configurations.
The methods are general, but their implementation is dramatically improved by many specifically tuned enhancements (e.g., the evaluation functions), much like an F1 racing car.
Perspective on Games: Con and Pro
Chess is the Drosophila of artificial intelligence. However, computer chess has developed much as genetics might have if the geneticists had concentrated their efforts starting in 1910 on breeding racing Drosophila. We would have some science, but mainly we would have very fast fruit flies.
John McCarthy
Saying Deep Blue doesn't really think about chess is like saying an airplane doesn't really fly because it doesn't flap its wings.
Drew McDermott
Other Types of Games
Multi-player games, with alliances or not
Games with randomness in the successor function (e.g., rolling dice): Expectiminimax algorithm
Games with partially observable states (e.g., card games): search of belief state spaces
See R&N pp. 175-180