Adversarial Search:

Nine Men’s Morris, Minimax, and Alpha Beta Pruning

Search Algorithms

Artificial Intelligence Laboratory

This document is a TAILS module.

Stephanie E. August and Matthew J. Shields

October 25, 2015

Loyola Marymount University

1 The Idea

1.1 Purpose

The purpose of this lab is to familiarize the experimenter with the adversarial search algorithm named minimax via an interactive two-player game called Nine Men’s Morris. The experimenter is provided with the interactive game and a working implementation of the basic depth-limited minimax algorithm. While performing the experiments in this lab, the student learns how slight modifications to the algorithm can greatly enhance its performance. First, the student is instructed to add Alpha-Beta pruning, which eliminates from consideration portions of the search space that cannot affect the move that minimax selects. Next, the student is encouraged to modify the evaluation function and the search depth cut-off to learn how those changes affect game play.

1.2 Background

1.2.1 Minimax Algorithm

Minimax is an adversarial search algorithm commonly used to solve turn-based games having two or more players. This recursive algorithm implements a depth-first search of a game tree. The game tree is built by playing the game out in memory so as to maximize the possible loss of the other player, or adversary. In other words, the game tree is built such that the root node is the current state of the game and the subsequent levels of the tree alternately represent the hypothetical moves of the starting player and the adversary. The children of the root node include all of the possible game states for each move that the current player can make on their turn. The root node’s grandchildren include all of the possible game states two turns, or two ply, in the future: all the possible moves that the adversary can make in response to the starting player’s move. The great-grandchildren of the root include all of the possible game states three ply into the future, where the starting player is responding to the adversary’s move, and so on.

The algorithm assumes that the player whose turn is represented by the root node wants to maximize the value of the root’s children, that is, to select a move that is to the starting player’s maximum advantage, given the moves that the player’s opponent might make. The next depth of the tree portrays the moves that the adversary might take in response to the player’s initial move. The algorithm assumes that the adversary would like to maximize the adversary’s own advantage, which has the effect of inflicting the greatest possible damage on the starting player.

Figure 1: 2-ply game tree for Tic-Tac-Toe showing MINIMAX Algorithm.

A search tree for tic-tac-toe is presented in Figure 1 to illustrate this. The evaluation function in this example was taken from Russell and Norvig (2003). $X_n$ is defined as the number of rows, columns, or diagonals with just $n$ X’s. $O_n$ is defined as the number of rows, columns, or diagonals with just $n$ O’s. The utility function assigns +1 to any position with $X_3 = 1$ and -1 to any position with $O_3 = 1$. All other terminal positions have utility 0. For each non-terminal position $s$ the linear evaluation function is defined as

$$\mathrm{Eval}(s) = \bigl(3X_2(s) + X_1(s)\bigr) - \bigl(3O_2(s) + O_1(s)\bigr)$$

While the minimax algorithm is running, it first generates the MIN nodes one ply below the root. It then generates and evaluates the nodes below each newly generated MIN node, working from left to right and top to bottom. In this example the search ends at a depth of 2 ply. Since minimax is a recursive function, the evaluations of the nodes at the lowest ply are returned to their MIN node parent, which keeps the minimum of them. This value becomes the MIN node’s MINIMAX backed-up value. The MIN nodes then return their backed-up values to the root node, which keeps the maximum. The automated player then selects the move that leads to the MIN node with the maximum backed-up value. Against an opponent who always replies with its best move, this move gives the current player the best outcome that the evaluation function can foresee at this depth. This search is repeated, with all nodes regenerated and reevaluated, for every move that the agent makes.
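The backing-up of values described above can be sketched as a short recursive function. This is a minimal illustration under stated assumptions rather than the lab’s own agent code: the function name `minimax`, the `successors` and `evaluate` callbacks, and the small hand-made tree (`TREE`, `LEAF`) are all hypothetical.

```python
def minimax(state, depth, maximizing, successors, evaluate):
    """Depth-limited minimax. Returns (backed-up value, best child or None)."""
    children = successors(state)
    if depth == 0 or not children:
        # Cut-off or terminal position: fall back on the static evaluation.
        return evaluate(state), None
    best_value, best_child = None, None
    for child in children:
        value, _ = minimax(child, depth - 1, not maximizing, successors, evaluate)
        if best_value is None:
            better = True
        elif maximizing:
            better = value > best_value
        else:
            better = value < best_value
        if better:
            best_value, best_child = value, child
    return best_value, best_child


# A hypothetical 2-ply tree: the root is a MAX node, A, B and C are MIN nodes.
TREE = {"root": ["A", "B", "C"],
        "A": ["a1", "a2", "a3"],
        "B": ["b1", "b2", "b3"],
        "C": ["c1", "c2", "c3"]}
LEAF = {"a1": 3, "a2": 12, "a3": 8,
        "b1": 2, "b2": 4, "b3": 6,
        "c1": 14, "c2": 5, "c3": 2}

value, move = minimax("root", 2, True,
                      lambda s: TREE.get(s, []),
                      lambda s: LEAF[s])
```

The MIN nodes back up 3, 2 and 2, so the root’s value is 3 and the selected move is "A".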

Figure 2: Two-ply game tree for Tic-Tac-Toe showing pruned branches circled in red.

1.2.2 Alpha-Beta Pruning

Alpha-Beta pruning is found in almost every implementation of the MINIMAX algorithm. This is because it can be added to the algorithm with very little effort, and it can greatly increase the performance of the algorithm by pruning off branches that do not need to be evaluated. It does this by passing two new values, alpha and beta, into the recursive maxValue and minValue calls. Alpha is the best value that MAX can already guarantee along the current path, and beta is the best value that MIN can already guarantee. The maxValue function raises alpha while the minValue function lowers beta. When in a MAX ply, if a child node evaluates to a value that is greater than or equal to beta, then all remaining children are pruned without being evaluated. When in a MIN ply the opposite is true: if a child node evaluates to a value that is less than or equal to alpha, then all remaining children are pruned without being evaluated. The main idea for the MAX ply is that if a child evaluates to something greater than or equal to beta, then the MIN player above already has a reply that is at least as good for MIN and will never allow play to reach this node, so the branch is abandoned. The opposite is true for the MIN ply. Figure 2 shows the branches that would be pruned from the 2-ply Tic-Tac-Toe search tree circled in red.
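The pruning rule can be sketched as a pair of mutually recursive functions. The names `max_value` and `min_value` mirror the maxValue and minValue calls mentioned above, but the signatures, the `successors` and `evaluate` callbacks, and the toy tree are assumptions made for this example, not the lab’s actual code.

```python
import math


def max_value(state, depth, alpha, beta, successors, evaluate):
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)
    value = -math.inf
    for child in children:
        value = max(value, min_value(child, depth - 1, alpha, beta,
                                     successors, evaluate))
        if value >= beta:
            return value          # MIN above will never allow this: prune
        alpha = max(alpha, value)
    return value


def min_value(state, depth, alpha, beta, successors, evaluate):
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)
    value = math.inf
    for child in children:
        value = min(value, max_value(child, depth - 1, alpha, beta,
                                     successors, evaluate))
        if value <= alpha:
            return value          # MAX above already has something better: prune
        beta = min(beta, value)
    return value


# Hypothetical 2-ply tree; `evaluated` records which leaves are actually scored.
TREE = {"root": ["A", "B", "C"],
        "A": ["a1", "a2", "a3"],
        "B": ["b1", "b2", "b3"],
        "C": ["c1", "c2", "c3"]}
LEAF = {"a1": 3, "a2": 12, "a3": 8,
        "b1": 2, "b2": 4, "b3": 6,
        "c1": 14, "c2": 5, "c3": 2}
evaluated = []


def score(leaf):
    evaluated.append(leaf)
    return LEAF[leaf]


root_value = max_value("root", 2, -math.inf, math.inf,
                       lambda s: TREE.get(s, []), score)
```

On this tree the root value is still 3, but leaves `b2` and `b3` are never scored: once `b1` returns 2, which is no better than the alpha of 3 already secured through A, the rest of B is pruned.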

1.2.3 Nine Men’s Morris

Nine Men’s Morris (9MM) is a strategy board game in the windmills game category. According to Botermans and Fankbonner, the first stone windmills game board was found in a small Irish town’s graveyard. That board dated back to 2000 BCE. Another windmills game board was found on the ceiling of an ancient Egyptian temple in Kurna, which dated back to 1400 BCE. The oldest Nine Men’s Morris version of the windmills games was found in Ceylon, where the game board is engraved in the steps of a hill in Mihintale between 17 CE and 19 CE. This version of the game gained popularity throughout Europe during the middle ages around 1600 CE. It has since lost its popularity, and many people have never heard of this game. The game has very simple rules yet requires quite a bit of strategy to master. Nine Men’s Morris is considered a non-trivial game.

Rules: The board consists of three concentric squares with the faces of each square connected by four intersecting lines, as shown in Figure 3. Each intersection is a space on the board that can be occupied by one piece from either player.

Figure 3: Nine Men’s Morris game board.

This game belongs to a category of games referred to as the windmills. In this game a mill (short for windmill) is three pieces belonging to the same player in a row. When a player makes a mill, the player must remove one of the opponent’s pieces from any space on the board that is not also in a mill. There are two ways to win: either the opponent has only 2 pieces remaining on the board, or the opponent has no moves available. The game has three distinct phases of play: populate, fight, and flight. Populate and fight always occur. Flight might or might not occur, depending on how the victor wins the game. The rules for mills as described above apply in all phases of the game.
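The mill rule can be sketched in code once the 24 spaces are numbered. The sketch below assumes the spaces are numbered 0 to 23 by reading the board left to right and top to bottom, and that each space holds 'e' (empty), 'r' (red) or 'b' (blue); the `MILLS` table is derived from that assumed numbering, not taken from the lab’s source, and the helper names `mills_of` and `removable` are hypothetical.

```python
# Assumed numbering (left to right, top to bottom):
#
#   0 ----- 1 ----- 2
#   |  3 -- 4 -- 5  |
#   |  |  6-7-8  |  |
#   9-10-11   12-13-14
#   |  | 15-16-17|  |
#   | 18 --19 -- 20 |
#   21 ---- 22 ---- 23
MILLS = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11),
    (12, 13, 14), (15, 16, 17), (18, 19, 20), (21, 22, 23),   # horizontal
    (0, 9, 21), (3, 10, 18), (6, 11, 15), (1, 4, 7),
    (16, 19, 22), (8, 12, 17), (5, 13, 20), (2, 14, 23),      # vertical
]


def mills_of(board, player):
    """Return every mill whose three spaces are all held by `player`."""
    return [mill for mill in MILLS if all(board[i] == player for i in mill)]


def removable(board, opponent):
    """Spaces holding an `opponent` piece that is not part of any mill."""
    in_mill = {i for mill in mills_of(board, opponent) for i in mill}
    return [i for i, cell in enumerate(board)
            if cell == opponent and i not in in_mill]


# Example position: blue holds the top row plus space 9; red holds 3 and 4.
board = ["e"] * 24
for i in (0, 1, 2, 9):
    board[i] = "b"
for i in (3, 4):
    board[i] = "r"
```

In this position blue has one mill, (0, 1, 2). If red were entitled to remove a blue piece, only the piece on space 9 would qualify, because the other three blue pieces are protected by the mill.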

Each player starts the game in the populate phase. During this phase the players take turns placing each of their nine pieces onto empty spaces on the board. Blue always goes first. Players are not allowed to move a piece that has already been placed on the board. When both players have placed all of their pieces on the board, both players enter the fight phase of the game.

During the fight phase of the game, players take turns moving one of their own pieces to a vacant adjacent space. Game play at this point is according to each player’s winning strategy. A player might try to block all of the opponent’s moves, or attempt to build mills and remove the opponent’s pieces.

When a player has only three pieces remaining on the board, that player, and only that player, enters the flight phase. During this phase the player can move a piece to any open space on the board, even if it is on the opposite side of the board.

2 Applications

The Minimax algorithm is an essential adversarial search algorithm that has been applied to problems ranging from zero-sum game play to real-time pursuit evasion. The application of the Minimax algorithm to real-world problems is no different from that of any other algorithm, in that it is often combined with other algorithms or concepts into a hybrid. The following three real-world applications are considered below: Deep Blue, Envelope Constrained Filters, and Pursuit Evasion.

Deep Blue II was a supercomputer designed by IBM to play chess at a Grand Master level. In general, Deep Blue implements a depth-limited Minimax algorithm with alpha-beta pruning and NegaScout, which incorporates singular extensions. To improve the efficiency of alpha-beta pruning, the move generator was designed such that the most promising moves were generated first. Singular extension algorithms identify the interesting branches of the search tree while it is being searched. Deep Blue used singular extensions to determine which branches it should search to a deeper ply, as deep as 30 or 40 ply (i.e., 30 or 40 consecutive moves). Singular extension also allowed Deep Blue to search branches that would normally be cut off during alpha-beta pruning. Deep Blue used a combination of software and 256 processors working in parallel that were specifically designed to run a variation of Minimax with an extremely intelligent evaluation function. The software would start the search and introduce changes to the search such as singular extensions; it would then pass the search of the last 5 ply to the hardware [Ham]. The processors could analyze a combined 200 million board positions per second [Kur]. Deep Blue beat the World Chess Champion Garry Kasparov in 1997 with a match score of 3.5 to 2.5.

Envelope Constrained Filters (ECFs) provide an output response that stays within predefined upper and lower limits. The upper and lower limits are defined by time functions which form the filter output window in the time domain [Pet1]. ECFs are used for radar pulse compression and had previously been implemented using a linear programming approach. This linear programming approach had an inherent design problem: it could not minimize the sidelobes of the filter response. Petrovic et al. developed a non-linear approach based on the Minimax algorithm which could successfully minimize the sidelobes. Sidelobes are weaker copies of the main beam that appear alongside it; they are essentially noise. Sidelobes are caused by the shape of the producing antenna’s aperture. Figure 4 shows a diagram of a typical antenna pattern and what sidelobes look like.

Honeywell has developed a 3D Minimax Pursuit Evasion algorithm. This algorithm assumes that the missile and aircraft can both accelerate perpendicular to their velocity and that the aircraft can maneuver in an optimal way to avoid the missile [Fri]. In this application of the Minimax algorithm, the pursuer (the missile) wants to minimize the miss distance, while the evader (the aircraft) wants to maximize some cost function.

These examples provide some understanding of how the Minimax algorithm can be used to solve real-world problems.

Figure 4: Typical Antenna Pattern [Mr. PIM at English Wikipedia, April 6, 2007]

3 Input/Process/Output

3.1 Input

The input to this laboratory consists of two distinct pieces: player moves and adversarial agent moves. Experiments performed over the course of this laboratory require the experimenter to modify the adversarial agent code and play games against the modified agent to evaluate its performance.

3.2 Process

The process of this laboratory is to have the experimenter modify the adversarial agent and play games against that agent to demonstrate the effect of changes made to the agent code. The player by default is blue and makes the first move to start the game. The rules specifying how the game is played can be found in Section 1.2 (Background) of the idea statement.

3.3 Output

The output of this laboratory will be a set of statistics for each experiment showing the number of wins, losses, and ties that each modified agent achieved against its human competitor. The statistics, although somewhat skewed by the human aspect of the process, should show the improvement or degradation of the adversarial agent’s performance for each of the code changes performed in each experiment. The author of this laboratory realizes that a more solid scientific approach would be to have a static agent, rather than a human player, play against an agent that is modified for each experiment. This would take the human aspect out of the process, but in the author’s opinion it would also take the fun out of the laboratory. The fun factor is important here because the goal of this laboratory is to spark the experimenter’s interest in artificial intelligence.

4 Design Description

4.1 Introduction

4.1.1 Purpose

This software design document describes the architecture and system design of the Nine Men’s Morris computer game.

4.1.2 Scope

The purpose of the Nine Men’s Morris game is to familiarize the reader with the adversarial search algorithm named Minimax via an interactive two-player game called Nine Men’s Morris. The reader will be provided with the interactive game and a working implementation of the basic depth-limited Minimax algorithm. This project will be designed such that the reader can modify and improve the Minimax algorithm with no impact on the game engine’s core. The main focus of this design document will be on the Minimax algorithm and the state description used to define a single instance of a Nine Men’s Morris game.

4.1.3 Overview

This document is organized in a top-down manner, starting with a system overview which provides a general description of the functionality, context, and design of the Nine Men’s Morris laboratory. Next, a high-level overview of the system architecture is provided, which includes the architectural design, a decomposition description, and a rationale for the design decisions that were made for this system. Next, a detailed description of the data design is covered, which defines the state description of a single instance of a Nine Men’s Morris game. Next, a low-level description of the Minimax components is specified, including brief descriptions of each component, their attributes, and operations. Finally, an overview of the Human Machine Interface (HMI) is provided, including instructions detailing how the HMI is used.

4.1.4 Graphical Notation

The graphical notation used in the diagrams and throughout this document is described in the table below:

4.2 System Overview

The Nine Men’s Morris laboratory consists of one to three active components at any given time. The HMI component is the only component required to play the Nine Men’s Morris game, but optionally player one and/or player two can be played by adversarial agents. In Figure 5 the user interacts with the HMI by either playing against themselves or by selecting an agent for player one and/or player two. If the user selects an agent for player one and/or player two, the HMI listens on a port and launches the agent program, which then attaches to the listening port. The player one and player two agents each have their own port number and attach independently of one another.

4.3 System Architecture

4.3.1 Architectural Design

As shown in Figure 6, the software architecture for the Nine Men’s Morris system consists of one to three components. The HMI enforces all of the game rules and provides the user with an interactive interface to evaluate the performance of one or two adversarial agents simultaneously. The user can select an agent to play as player one and an agent to play as player two. Over the course of this laboratory the user may be instructed to create a new enhanced agent. The user can then play their enhanced agent against the baseline agent to evaluate the performance of the enhanced agent.

Figure 5: System Overview Diagram - The focus in this diagram is the HMI

Figure 6: Software Architecture Diagram

4.3.2 Decomposition Description

The decomposition description is limited to the baseline agent that is provided with this laboratory. Figure 7 shows the Class Association Diagram for the baseline adversarial agent. An adversarial agent program is composed of its socket information and a Minimax object. The Minimax object encapsulates all of the attributes and operations needed to perform a Minimax search, as well as the current board state. A Nine Men’s Morris state is composed of all of the state information about a single instance of a game, including a board state and the action taken to achieve that state. An action is composed of a start node and a destination node, which represent the node from which a piece was moved and the node to which it was moved. A game board is composed of nodes and mills, both of which contain the details about where all of the pieces can be found on the board.

Figure 8 details the sequence of events during nominal game play with two agents playing against one another. This sequence diagram is meant to capture the communications between the HMI and the agents.

Figure 9 details the sequence of events that take place within an agent during nominal game play. This sequence diagram is meant to capture the communications between each of the components that make up an agent.

4.3.3 Design Rationale

The system architecture shown in Figure 6 above separates the agents from the HMI. This separation was architected into the project to keep agent implementation apart from the game’s framework. The decision to attach the agents to the HMI through a TCP socket connection was made to allow an agent to be implemented in any programming language that supports TCP sockets. This provides the flexibility necessary to allow a programmer to write an agent in the programming language of their choice.

Figure 7: Adversarial Agent Class Association Diagram

Figure 8: Sequence Diagram - High level showing nominal game play.

Figure 9: Sequence Diagram - Low level showing communication between agent components.

5 Implementation-Specific Design Description

5.1 Data Design

5.1.1 Data Description

The baseline agent stores its state data in the form of a search tree where each node of the tree contains the following key information:

• The board state

• The action taken to reach the board state

• The turn count

• The blue player’s current phase of game play

• The red player’s current phase of game play

• A value which is the utility of the state

• A reference to the child state with the highest utility

The board state is queried from the HMI, which returns a string of 24 characters separated by spaces. Each character belongs to the set {e, r, b}, where 'e' denotes an empty node, 'r' denotes a node populated by a red piece, and 'b' denotes a node populated by a blue piece. The nodes are ordered in the list by reading the board from left to right and top to bottom. The position of each character in the string of 24 characters is shown in Figure 10, which maps the node identifiers to the game board. The action taken to reach the board state is populated for all states with the exception of the initial state passed into the Minimax algorithm.
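Reading the board string described above might look like the following sketch. The function names `parse_board` and `count_pieces` are hypothetical and are not part of the baseline agent; only the format (24 space-separated characters drawn from e, r and b) comes from the description above.

```python
VALID_CELLS = {"e", "r", "b"}


def parse_board(text):
    """Split the HMI's board string into a list of 24 single-character cells."""
    cells = text.split()
    if len(cells) != 24:
        raise ValueError("expected 24 nodes, got %d" % len(cells))
    bad = [c for c in cells if c not in VALID_CELLS]
    if bad:
        raise ValueError("unexpected node values: %r" % bad)
    return cells


def count_pieces(cells):
    """Number of red and blue pieces currently on the board."""
    return {"r": cells.count("r"), "b": cells.count("b")}


# Example: blue on node 0, red on node 1, everything else empty.
example = " ".join(["b", "r"] + ["e"] * 22)
cells = parse_board(example)
```

Validating the length and alphabet up front means a malformed reply fails loudly at the point where it is read, rather than deep inside the search.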

The turn count is the sum of the number of turns taken by player one and player two since the beginning of the current game. The blue player’s current phase (and likewise the red player’s) belongs to the set {POPULATE, FIGHT, FLIGHT}, where "POPULATE" means the player still has pieces that are not on the board; "FIGHT" means the player has placed all their pieces on the board and has left the "POPULATE" phase; and "FLIGHT" means the player has only three pieces remaining on the board.
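The three phase definitions translate directly into a small decision function. The sketch below is an assumption about how such a check could be written; the `Phase` enum, the function `current_phase`, and its two parameters are hypothetical names and do not come from the baseline agent.

```python
from enum import Enum


class Phase(Enum):
    POPULATE = "POPULATE"
    FIGHT = "FIGHT"
    FLIGHT = "FLIGHT"


def current_phase(pieces_in_hand, pieces_on_board):
    """Phase of one player, given unplaced pieces and pieces on the board."""
    if pieces_in_hand > 0:
        return Phase.POPULATE      # still has pieces that are not on the board
    if pieces_on_board == 3:
        return Phase.FLIGHT        # all placed, only three remaining
    return Phase.FIGHT             # all placed, more than three remaining
```

The populate check comes first so that a player who has only three pieces on the board but still holds unplaced pieces is reported as POPULATE rather than FLIGHT. A player with fewer than three pieces has already lost, so the caller is expected to test for the end of the game before asking for a phase.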

The value, which is the utility of the board state, is populated by the Minimax adversarial search algorithm during execution.

The reference to the successor state with the largest value exists merely for convenience.
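The fields listed in this section can be gathered into a small record type. The following sketch is illustrative only: the attribute names are Python-style renderings chosen for this example, node identifiers are assumed to be integers, and the actual State and Action classes in the baseline agent may be organized differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    start_node: Optional[int]            # None while placing a new piece (assumed)
    destination_node: int
    opponent_node: Optional[int] = None  # piece removed after forming a mill


@dataclass
class State:
    board: List[str]                     # 24 cells, each 'e', 'r' or 'b'
    action: Optional[Action]             # None for the initial state
    turn_count: int
    blue_phase: str
    red_phase: str
    value: Optional[float] = None        # filled in by the Minimax search
    best_child: Optional["State"] = None # successor with the largest value


# The initial state handed to the search has no action; a child records its move.
root = State(board=["e"] * 24, action=None, turn_count=0,
             blue_phase="POPULATE", red_phase="POPULATE")
child_board = ["b"] + ["e"] * 23
child = State(board=child_board, action=Action(None, 0), turn_count=1,
              blue_phase="POPULATE", red_phase="POPULATE", value=0.0)
root.best_child = child
```

Keeping `best_child` on the state means the agent can read the chosen move straight from `root.best_child.action` once the search returns, which is the convenience the text refers to.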

Figure 10: Map of Nodes on Nine Men’s Morris board.

5.1.2 Data Dictionary

Details about every class including their attributes and methods are listed below:

Class Summary

Action: The Action class is a simple class that defines the details about a move in the Nine Men’s Morris board game.

Agent: The Agent class is the adversarial agent that knows how to play the board game Nine Men’s Morris.

Board: The Board class contains the state of a single instance in a Nine Men’s Morris game.

Main: The Main class conducts the execution of the agent program.

Mill: The Mill class is a simple class that defines the details about a single mill in the Nine Men’s Morris board game.

MiniMax: The MiniMax class encapsulates all of the fields and methods used for analyzing a Nine Men’s Morris game state using the Artificial Intelligence adversarial search algorithm called MiniMax.

Node: The Node class is a simple class that defines the details about a node in the Nine Men’s Morris board game.

State: The State class contains all of the state details for a single turn in a Nine Men’s Morris game.

Action Attribute Detail

Node destinationNode: The node on the game board where a player’s piece can move.

Node opponentNode: If not null, the opponent’s piece that will be removed from the board as the result of this player creating a mill.

Node startNode: The node on the game board where a player’s piece is currently located.

6 Test suite and Drivers

7 Experiments

This lab consists of several distinct activities or experiments, listed below. Each activity provides the experimenter with the information necessary to complete this lab.

7.1 Location

All of the experiments and exercises of this section are located in the Adversarial Search/Experiments folder.

7.2 Experiment 1: Learn the 9MM game by playing it

This section will contain information about how to start and play the TAILS version of 9MM, and generally what the user can expect to see while running the application.

7.3 Experiment 2: Study the design of the code

a. Sketch a design for the 9MM code as you envision it. The design can be graphical or narrative, high level, and contained within one or two pages.

b. Study the implementation-independent design description provided.

c. How close was your design to the actual design? Record your thoughts.

d. What is the biggest surprise about the actual design of the code? Record your thoughts.

e. What part of the design is the most challenging to understand? Record your thoughts.

f. Exchange your design and reflection with your lab partner. Do your designs complement one another or conflict? Discuss any points where they differ. Summarize your team’s discussion.

g. Discuss the implementation-independent design description provided, especially the aspects you each found challenging. Summarize your discussion.

8 Source Code

8.1 Location

The source code for this module can be found under the Software Application/Agent Architecture Application/Program folder path.

8.2 Source code’s headers

The main source files for the online application of this module begin with a header which contains the following information:

<filename>.<file extension>: a short description of how and where the code is being used.

<filename>.<file extension>: the author’s name.

And, if one exists, a change history.

9 Complexity Analysis

9.1 Time and Space Complexity of MINIMAX algorithm

Let us imagine a partial game tree for the tic-tac-toe game, like the one in Figure 11, searched using the minimax algorithm.

Figure 11: Partial tree for a Tic-Tac-Toe game

9.1.1 Time

All the nodes in the tree have to be generated once at some point, and the assumption is that it costs a constant time $c$ to generate a node (the actual times can vary, so $c$ can be taken as the longest time needed to generate any node). The order is determined by the algorithm and ensures that nodes don’t have to be repeatedly expanded.

If we denote the branching factor by $b$, we see from the figure that it costs $c\,b^0$ to generate level zero. The next level in the tree has $b^1$ nodes and costs $c\,b^1$ to generate. Continuing like this, the cost to generate the $m$th level is $c\,b^m$.

At the deepest level of the tree, at depth $d$, there are $b^d$ nodes, so the work at that level is $c\,b^d$. The total amount of work done to this point is $c\,b^0 + c\,b^1 + \dots + c\,b^d$.

For the complexity we only look at the fastest-rising term and drop the constant, so we get:

$$O(c + c\,b + c\,b^2 + \dots + c\,b^d) = O(b^d)$$
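The sum above can be checked numerically. The sketch below walks a uniform tree depth-first and counts every node it generates, then compares the count with the closed form of the geometric series, $(b^{d+1} - 1)/(b - 1)$. The function names `generate` and `closed_form` are hypothetical, and a uniform tree (every node has exactly $b$ children) is an assumption of the example.

```python
def generate(b, depth_left):
    """Visit a uniform tree depth-first; return how many nodes were generated."""
    if depth_left == 0:
        return 1                          # a leaf counts as one generated node
    return 1 + sum(generate(b, depth_left - 1) for _ in range(b))


def closed_form(b, d):
    """b^0 + b^1 + ... + b^d for b >= 2."""
    return (b ** (d + 1) - 1) // (b - 1)


b, d = 3, 4
total = generate(b, d)                    # nodes on all levels 0..d
leaves = b ** d                           # nodes on the deepest level alone
```

For $b = 3$ and $d = 4$ the tree has 121 nodes, of which 81 are on the deepest level. The total never exceeds $b/(b-1)$ times the size of the last level, which is why the whole sum is $O(b^d)$.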

9.1.2 Space

Let us assume a smaller tree with branching factor $b = 3$ and the special notation of Figure 12. The figure shows the algorithm at different stages for $b = 3$. A star {*} indicates currently expanded nodes, a question mark {?} indicates unknown nodes, and a plus sign {+} indicates nodes whose score has been fully calculated.

Figure 12: A small tree with b = 3

In order to calculate the score of a node, you expand the node, pick a child, and recursively expand until you reach a leaf node at depth $d$. Once a child node is fully calculated, you move on to the next child node. Once all $b$ child nodes are calculated, the parent’s score is calculated based on the children, and at that point the child nodes can be removed from storage. This is illustrated in Figure 12, where the algorithm is shown at 4 different stages.

At any time you have one path expanded, and you need $c\,b$ storage to store all the child nodes at every level. Here again the assumption is that you need a constant amount of space per node. The key is that any subtree can be summarised by its root. Since the maximal length of a path is $d$, you will need at most $c\,b\,d$ of space. As above, we can drop the constant and we get $O(c\,b\,d) = O(b\,d)$.
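The storage argument can be simulated by tracking how many nodes are held at once during a depth-first walk. This is a sketch under the same assumptions as the text (a uniform tree, all $b$ children of an expanded node kept until the parent is scored); the function name `peak_storage` is hypothetical.

```python
def peak_storage(b, d):
    """Peak number of nodes held at once by depth-first minimax on a uniform tree."""
    peak = 0

    def visit(depth, stored):
        nonlocal peak
        peak = max(peak, stored)
        if depth == d:
            return                        # leaf: nothing further to expand
        # Expanding this node generates its b children, which stay in storage
        # until every one of them has been scored.
        for _ in range(b):
            visit(depth + 1, stored + b)
        # On return, the children are dropped; only this node's score remains.

    visit(0, 1)                           # initially only the root is stored
    return peak


peak = peak_storage(3, 4)
```

For $b = 3$ and $d = 4$ the peak is 13 nodes (the root plus $b \cdot d = 12$ children along the current path), even though the full tree contains 121 nodes.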

9.2 Time and Space Complexity of Alpha-Beta Pruning

The benefit of alpha-beta pruning lies in the fact that branches of the search tree can be eliminated. This way, the search time can be limited to the 'more promising' subtree, and a deeper search can be performed in the same time. Like its predecessor, it belongs to the branch and bound class of algorithms. The optimization reduces the effective depth to slightly more than half that of simple minimax if the nodes are evaluated in an optimal or near-optimal order (best choice for side on move ordered first at each node).

9.2.1 Time

With an (average or constant) branching factor of $b$ and a search depth of $d$ plies, the maximum number of leaf node positions evaluated (when the move ordering is pessimal) is $O(b \cdot b \cdots b) = O(b^d)$, the same as a simple minimax search. If the move ordering for the search is optimal (meaning the best moves are always searched first), the number of leaf node positions evaluated is about $O(b \cdot 1 \cdot b \cdot 1 \cdots b)$ for odd depth and $O(b \cdot 1 \cdot b \cdot 1 \cdots 1)$ for even depth, or $O(b^{d/2})$. In the latter case, where the ply of a search is even, the effective branching factor is reduced to its square root, or, equivalently, the search can go twice as deep with the same amount of computation.

The explanation of $b \cdot 1 \cdot b \cdot 1 \cdots$ is that all the first player’s moves must be studied to find the best one, but for each, only the best second player’s move is needed to refute all but the first (and best) first player move; alpha-beta ensures no other second player moves need be considered. When moves are instead examined in random order, alpha-beta pruning examines roughly $O(b^{3d/4})$ nodes for moderate $b$.
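The best-case figure can be observed on a small experiment. The sketch below counts the leaves that alpha-beta evaluates on a uniform tree whose leaves all have the same value; with cut-offs on "greater than or equal" and "less than or equal", such a tree behaves like a perfectly ordered one, which is an assumption chosen to keep the example short. The count is compared with the standard best-case leaf count $b^{\lceil d/2 \rceil} + b^{\lfloor d/2 \rfloor} - 1$. The function names are hypothetical.

```python
import math


def alphabeta_leaves(b, d):
    """Leaves evaluated by alpha-beta on a uniform tree whose leaves all score 0."""
    evaluated = 0

    def search(depth, alpha, beta, maximizing):
        nonlocal evaluated
        if depth == d:
            evaluated += 1
            return 0
        value = -math.inf if maximizing else math.inf
        for _ in range(b):
            child = search(depth + 1, alpha, beta, not maximizing)
            if maximizing:
                value = max(value, child)
                if value >= beta:
                    break                 # beta cut-off
                alpha = max(alpha, value)
            else:
                value = min(value, child)
                if value <= alpha:
                    break                 # alpha cut-off
                beta = min(beta, value)
        return value

    search(0, -math.inf, math.inf, True)
    return evaluated


def best_case_leaves(b, d):
    """Best-case leaf count: b^ceil(d/2) + b^floor(d/2) - 1."""
    return b ** math.ceil(d / 2) + b ** (d // 2) - 1


pruned = alphabeta_leaves(3, 4)
unpruned = 3 ** 4
```

For $b = 3$ and $d = 4$, alpha-beta evaluates 17 leaves where plain minimax would evaluate 81, which is on the order of $b^{d/2} = 9$ rather than $b^d$.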

9.2.2 Space

The effectiveness of alpha-beta pruning is highly dependent on the order in which the states are examined. In an optimized order, alpha-beta needs to examine only $O(b^{d/2})$ nodes to pick the best move, instead of $O(b^d)$ for minimax. This means that the effective branching factor becomes $\sqrt{b}$ instead of $b$. Counting only the child nodes that are actually examined at each level of the current path, the space needed by alpha-beta pruning in this case is about $c\,\sqrt{b}\,d$ for some constant $c$, that is, $O(\sqrt{b}\,d)$.
