
Game Applications of Deep Neural Networks

Savanna Endicott, Carleton University, [email protected]

December 15, 2017

Abstract

In the past ten years, neural networks have made a significant impact on machine learning by enabling computers to make decisions that were previously difficult for them. This began predominantly with recognizing photographic content, and is now being applied in a variety of contexts. Recently, there have been significant achievements for neural networks that select the best moves to play in games. DeepMind's AlphaGo defeated 18-time Go world champion Lee Sedol in 2016, and since then neural networks have been given serious consideration in this field. This study analyzes which games are best suited to a neural network solution, and how to design a game-specific neural network that successfully selects the best moves.

I. Introduction

Neural networks are systems inspired by neurons in the human brain, with the goal of achieving tasks previously too difficult for computers.

Neural networks are currently in popular use for speech and image recognition, and are becoming an important research development in the field of autonomous vehicles. With the success of DeepMind's AlphaGo against 18-time Go world champion Lee Sedol, they have also broken into the game-playing industry. How applicable this style of machine learning is to general gameplay, however, is still unknown. The research discussed in this paper is intended to help identify which games make for good applications of neural networks and how to design them.

This research is intended to aid in the design of neural networks used to select moves in games, particularly strategic board games. This involves strategies for deciding whether or not deep learning will apply well to a given game, as well as guidance for the design of the system and important decisions that arise. The research is supported by the design of a player that uses deep learning strategies to select moves in the game of Blokus.

Developing a neural network approach to a game is no easy task, due to the computational capabilities of the average developer, the understanding of calculus and neural networks required, and limited learning resources. This paper is meant to reduce the learning curve by describing the types of problems commonly faced and the factors that contribute to an approach's success. As successful implementations of neural networks in game play are a recent development, background knowledge about important events in this field will also be imparted, along with predictions of future research.

II. Concept Overview

Before getting into specific applications and implementation details, it is important to describe some of the major concepts of machine learning and define terms that will appear in this paper.

In general, artificial intelligence refers to the ability of computers to perform tasks that would normally require human intelligence. In the context of this research, it refers to decision making. Machine learning is a method of solving these tasks that does not directly compute the solution, but learns how to solve tasks over time using knowledge gained from previous attempts. This approach is intended to solve problems in the same manner that a human would, while still taking advantage of computational capabilities beyond those of a human. The terms deep learning and deep neural networks are used interchangeably. They refer to neural networks that are unsupervised and learn from unlabeled data. This means the desired output isn't necessarily known and the network doesn't depend on initial sample pairs of input and output values. To achieve success without knowing the desired output we use reinforcement learning, which, generally put, is the ability of a system to learn its behaviour from the environment's feedback.

The general structure of a neural network appears as follows:

Figure 1: Neural Network with a Single Hidden Layer, from tutorialspoint.com/artificial_intelligence

As seen in Figure 1, neural networks traditionally have one input layer and one output layer, in addition to one or more hidden layers. The input layer neurons are what we call features of an input variable. For example, in the case of images, the input variable could be an image represented as 64 pixels. Possible input features would be the red quantities of the pixels, the green quantities of the pixels, and the blue quantities of the pixels, each also being an array of equal dimension to the photo. The output value would then be 1: cat, 0: not a cat.
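
As a rough illustration of this input layout, the sketch below (Python with NumPy, using made-up data rather than anything from the systems discussed here) builds the three colour planes of a toy 8x8 image and flattens them into the vector that would feed the input layer, with a single binary cat/not-cat label as the desired output.

    import numpy as np

    # A toy 8x8 RGB image: three colour channels, each an 8x8 grid of values in [0, 1].
    image = np.random.rand(8, 8, 3)

    # Split the image into one feature plane per channel, mirroring the
    # "red / green / blue quantities" description above.
    red_plane   = image[:, :, 0]
    green_plane = image[:, :, 1]
    blue_plane  = image[:, :, 2]

    # Stack the planes back into an (8, 8, 3) tensor and flatten it into the
    # vector fed to the input layer.
    features = np.stack([red_plane, green_plane, blue_plane], axis=-1)
    input_vector = features.reshape(-1)   # 8 * 8 * 3 = 192 input neurons

    # The desired output is a single binary value: 1 for "cat", 0 for "not a cat".
    label = 1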

With these concepts and terms defined, we can now proceed into a background of applications of neural networks, and discuss finer details.

III. Background

Two of the most successful and common uses of deep learning are speech and image recognition. One of the first and arguably most widely known achievements of neural networks was presented by Google in 2012 in the form of identifying images of cats (Le et al., 2012). Although the task is relatively trivial, the implications are very important. It provided evidence that neural networks can complete tasks that humans are able to do with ease, but that previously could not be achieved reliably by computers. This work launched a series of developments in image recognition using neural networks which is still highly active today. Though an important achievement for deep learning, one could say Google was only scratching the surface.

Google now uses neural networks in language processing, another large application area of deep learning. Google Assistant's speech recognition AI uses deep neural networks to better understand the user's verbal requests and commands. Apple's Siri uses deep neural networks to recognize when she is being spoken to (Siri Team, 2017). Apple also used deep neural networks to create on-device face detection, whose debut came with iOS 10 (Computer Vision Machine Learning Team, 2017).

Additionally, neural networks are currently driving a lot of development in the field of autonomous vehicles (Tian et al., 2017). These networks are relatively more complex than the methods described in this research, as they generally use hybrid or custom neural network architectures.

IV. Developments in Go

In 1997, IBM's Deep Blue defeated the reigning world chess champion, Garry Kasparov. It used brute-force deterministic tree search methods along with the aid of human knowledge. This was one of the first breakthroughs in the field of game playing for artificial intelligence. This method cannot apply to Go, because the search space for Go is much larger. The number of moves a player has to consider at any given time is much larger than in chess, and each move has a large impact on the rest of the game.

Figure 2: Example Board State of a game of Go, from gobeginnings.files.wordpress.com

The first big advancement in learning to play Go was the use of Monte Carlo tree search. The basic structure is a tree in which random moves are selected from the root down to a leaf until the result of a game is reached; that result is then propagated back up through the tree to update the weights of each move. Backpropagation is an important concept, also used in supervised learning, whose purpose is to correct the system's errors over time. In 2008, a system called MoGo achieved dan level in 9x9 Go, and in 2012, the Zen program defeated an amateur 2 dan player on a standard 19x19 board. Monte Carlo tree search was gaining some traction, but the programs could not compete with the best Go players in the world. That didn't happen until DeepMind's AlphaGo came on the scene in 2016.
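
To make the loop of selection, random rollout, and backpropagation more concrete, the following is a minimal sketch of Monte Carlo tree search in Python. It assumes a hypothetical GameState class with legal_moves(), play(move), is_over(), winner(), and current_player, and it is only meant to illustrate the general structure, not the MoGo, Zen, or AlphaGo implementations.

    import math
    import random

    class Node:
        """One node of the search tree; holds win/visit statistics for a move."""
        def __init__(self, state, parent=None, move=None):
            self.state = state
            self.parent = parent
            self.move = move
            self.children = []
            self.untried_moves = state.legal_moves()
            self.wins = 0.0
            self.visits = 0

    def best_uct_child(node, c=1.4):
        # UCB1: balance average reward against how rarely a child has been explored.
        return max(node.children,
                   key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(node.visits) / ch.visits))

    def mcts(root_state, iterations=1000):
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            # 1. Selection: descend while the node is fully expanded and non-terminal.
            while not node.untried_moves and node.children:
                node = best_uct_child(node)
            # 2. Expansion: try one previously untried move.
            if node.untried_moves:
                move = node.untried_moves.pop()
                child = Node(node.state.play(move), parent=node, move=move)
                node.children.append(child)
                node = child
            # 3. Simulation: play random moves until the game ends.
            state = node.state
            while not state.is_over():
                state = state.play(random.choice(state.legal_moves()))
            winner = state.winner()
            # 4. Backpropagation: update win/visit counts back up to the root.
            while node is not None:
                node.visits += 1
                if node.parent is not None and winner == node.parent.state.current_player:
                    node.wins += 1
                node = node.parent
        # Return the move that was explored the most.
        return max(root.children, key=lambda ch: ch.visits).move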

DeepMind Technologies is a British company acquired by Google in 2014. DeepMind published results of using deep learning to play 7 Atari games in 2013 (Mnih et al., 2013), including Pong and Space Invaders. Although this was a significant step in the right direction, the major influence that sparked interest in neural networks for games was AlphaGo, which uses deep learning to select moves in the game of Go (Silver et al., 2016). AlphaGo beat Lee Sedol in a five-game match, with a score of 4:1. AlphaGo was awarded an honorary 9-dan (master) level for this achievement. This was a huge breakthrough in machine learning because it forced the world to take game-playing neural network machines seriously.

AlphaGo uses policy and value networks and a Monte Carlo tree search. The policy networks include a supervised learning policy network trained on expert move data, a fast policy used to quickly sample actions during rollouts, and a policy that performs reinforcement learning in order to optimize outcomes. The value network predicts the winner of games played by the reinforcement learning policy against itself. These policies are combined with Monte Carlo tree search, discussed earlier in this section. More details of the implementation will follow as we discuss implementing our own neural network approach to the game of Blokus.

When implementing Blokus, I used AlphaGo to gather inspiration and ideas about how to design the various components. It is important to note that when I discuss "AlphaGo's" details, I am referring to the version published in January 2016 in Mastering the game of Go with deep neural networks and tree search (Silver et al., 2016).

V. Designing the Input for the Neural Network

Input design is an essential part of implementing a game-specific neural network. It has the ability to make or break a project, and in my opinion directly determines how effective a system will be. Before creating a model of the input, it is important to determine if we can create a model that will result in an effective system. In this section we will discuss what aspects of a game make neural networks a good or bad choice, the first steps to take, and important design decisions and considerations.

i. Model Complexity

An important step in deciding whether or not to use a deep learning approach is to compare the representation of states for the game to the structure of common representations used in successful implementations, for example, an image. If your game state is very similar to an image, it is likely a good application of neural networks. But what does it mean to be similar to an image? As mentioned before, an image would be represented as a grid, where each position in the grid can be represented by a set of attributes with identical structure, and this accumulation of positions gives a full representation of the game's state.

This is the case in the example of Go. Each grid position has a state of black, white, or empty. Additionally, we can use attributes like the liberties available and the legality of positions to add context. We will discuss the design of Go further as we go along, as it is not important to fully understand what this means right now. The key point is that simple attributes of each square in the grid can create a full image of the state, and any empty space should be considered as a possible move.

In the case of Chess, we also have a grid where we can store attributes. However, these attributes are significantly more complex. For example, there are 6 different pieces that can be in any given square. Each of these pieces can be played differently, compared to Go where each piece has the same capabilities. Additionally, the number of spaces a player can move to on the board at any given time is more limited than in Go.

What should be noted here is the difference between move complexities. In Go, the results of each action are quite different from each other and relatively complex in their dynamics, but the complexity of the actual action is very low. There are 19x19 places to move on the board. Each has the same attributes as the others, with different values for those attributes. Each of the values is also relatively simple, with 2-3 possible values. This can be represented in a very similar fashion to an image. In Chess, on the other hand, a piece can only move to certain locations and each possible piece/location action has a unique result. The problem is that there is no continuity in the board's representation. What is meant by this is that the values of positions are not closely related, even if they only have one changed value of input (e.g. the piece used). Representing this in a way that will be easily interpreted by a neural network is very difficult, and such systems have not been found to play better than various human-knowledge-based AIs.

Not only should we strongly consider this state-action model before we decide to implement the neural network, we should also put a large emphasis on the model's design throughout the project. A mistake I made during the development of a neural network for Blokus within a small amount of time (approximately 100 hours, including learning about machine learning and AlphaGo) was to quickly move forward with a model that made sense to me at that time. I would have achieved much more success in my project if I had taken the time early on to understand the implications of my problem for the model, and which aspects of the model were more challenging in the context of Blokus.

Another factor to consider is the number of possible moves in the entire game. For example, in Go we wouldn't expect to play 10 games before seeing a given move twice. There are games, such as Blokus, as we will see, where some moves are very rare, and so the estimated probability of such a move winning, compared with frequently seen moves, is most likely not representative.

In my opinion, designing the input model is the most important aspect of designing a neural network. To do this well, one needs to have an understanding of how neural networks manipulate this model in order to develop an understanding of move quality. The ability to design a strong representational model of the input facilitates and speeds up the process drastically. In my case, with a very brief learning period before beginning, I took a trial and error approach to the design of the model, which had negative impacts on the project. In order to avoid this in the future, important design components of the input will be discussed, and examples from AlphaGo and my implementation of Blokus will be given.

ii. State Representation

First we need to decide how to model our state. The state is simply the variable from which we will create our features. In most board games, we can use a basic board state that would be maintained over the course of the game anyway. Figure 2 is an example of a board state in Go. In this research, these are the only types of games that are discussed. The state should be a simple two-dimensional array holding the state of each square. For checkers, a possible state to create features from would be an 8x8 array where each position has a value of 0 if it is empty, 1 if a white piece is there, or 2 if a black piece is there.
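
A tiny sketch of that checkers state, using NumPy (the placement of the starting pieces on dark squares is just an illustrative convention):

    import numpy as np

    # 8x8 checkers state: 0 = empty, 1 = white piece, 2 = black piece.
    board = np.zeros((8, 8), dtype=np.int8)

    # Place the three starting rows for each player on the dark squares.
    for row in range(3):
        for col in range(8):
            if (row + col) % 2 == 1:
                board[row, col] = 2            # black starts at the top
            if (7 - row + col) % 2 == 1:
                board[7 - row, col] = 1        # white starts at the bottom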

Each time we go to make a move, we use our state as input to the neural network. We can also use the result of the game in the case of a supervised learning policy. Then, when selecting moves using our network, we can send in the state that would result from each possible move, and select the move the network assigns the highest probability of winning, based on comparison with past moves.
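
A sketch of that selection loop is shown below. Every name here (legal_moves, apply_move, build_features, network) is a placeholder passed in by the caller rather than part of any particular library, so this only illustrates the idea of scoring the state that would result from each candidate move.

    def select_move(board, player, legal_moves, apply_move, build_features, network):
        """Pick the legal move whose resulting state the network scores highest."""
        best_move, best_score = None, -1.0
        for move in legal_moves(board, player):
            next_state = apply_move(board, move, player)   # state after the candidate move
            features = build_features(next_state, player)  # input planes for the network
            win_probability = network.predict(features)    # network's estimate of winning
            if win_probability > best_score:
                best_move, best_score = move, win_probability
        return best_move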

iii. Feature Design

Designing features for games is relatively complex. It is important to select features wisely, as they will affect how your network perceives the game state. They should include all basic features of the state, as well as any attributes that apply to each position of the board and are not biased. A trick I use to determine which features to include is to ask whether a candidate provides knowledge about the state. Anything that provides knowledge about the state that would otherwise be unknown should be included in the features. Anything that does not, should not. I was inclined to add feature details that would perhaps encourage the network to select actions that I would play, given my strategy; however, this does not make sense for input features to a neural network.

It is a bit hard to explain exactly how to select features, but it will become clearer as we look at the examples of AlphaGo and Blokus. The features used for input to AlphaGo's networks can be examined in Figure 3, a table I made based on Extended Data Table 2 from Mastering the game of Go with deep neural networks and tree search (Silver et al., 2016).

Feature                 # of Planes
Stone colour            3
Ones                    1
Turns since             8
Liberties               8
Capture size            8
Self-atari size         8
Liberties after move    8
Ladder capture          1
Ladder escape           1
Sensibleness            1
Zeros                   1

Figure 3: Input features for AlphaGo's neural networks

Figure 3 states the number of planes associated with each feature. These planes are N1xN2 matrices, where N1 and N2 are the dimensions of the game's board. In the case of AlphaGo, N1 and N2 are both 19. So a feature set for some state of the board would be represented by 48 matrices with dimensions 19x19, which is a lot of data. When trying to understand this, I had difficulty at first, as I do not understand the game of Go very well and have never played. To clarify, the reason Liberties have 8 planes is that 8 binary planes are used to represent whether the stone at a given position has x liberties, so each plane is for a specific number of liberties from 1 to 8. Almost every plane is a direct inference of the state, and includes rules from the game. Looking at the features, it can perhaps start to become clear why AlphaGo was so successful.
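
As an illustration of how a single board property expands into several binary planes, the sketch below builds eight liberty planes from a per-position liberty count with NumPy. The clamping of counts above eight into the last plane is my own choice for the sketch, and computing the liberty counts themselves is assumed to be done elsewhere.

    import numpy as np

    def liberty_planes(liberty_counts, max_liberties=8):
        """Turn a 19x19 array of per-position liberty counts into 8 binary planes.

        Plane k (0-based) is 1 wherever the stone at that position has exactly
        k+1 liberties; counts above the maximum are clamped into the last plane."""
        size = liberty_counts.shape[0]
        planes = np.zeros((max_liberties, size, size), dtype=np.float32)
        for k in range(1, max_liberties + 1):
            if k < max_liberties:
                planes[k - 1] = (liberty_counts == k)
            else:
                planes[k - 1] = (liberty_counts >= k)
        return planes

    # Example: one stone on the board that happens to have 3 liberties.
    counts = np.zeros((19, 19), dtype=np.int32)
    counts[3, 3] = 3
    planes = liberty_planes(counts)   # planes[2][3, 3] == 1, all other planes are 0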

The features used clearly give a strong representation of a state and its implications on the board, in such a way that the feature planes can be compared to each other. The high number of features as well as their standard shape work well together in the hidden layers, where they are represented in various forms. However, the issue is that this is a very complex set of features, which requires a lot of computational power to use. I have come across many relatively professional and advanced open source imitations of AlphaGo, and none of them were able to use this full set of features due to the required time and space complexities. That said, some were able to achieve the abilities of an amateur dan player using a subset of the features.

iv. Action Representation

Figure 4: Example Chess Move from chessfox.com

Action representation is also important, and more difficult in some scenarios. For example, in Go it is quite simple, as an action can be represented by a position on the board given by x and y between 0 and 18. Actions that cannot be represented this way, however, need some sort of alternative. I will give chess as an example, because despite not being a good application of neural networks, it is a widely understood board game that I can use to give examples of game features. Additionally, using it as an example will expose some of the reasons why deep learning doesn't apply as well. In chess, an action could be represented as two positions on the board: the starting position and the position the piece moved to. For example, in Figure 4 the move could be represented as ([2,1],[2,2]). However, that only works if you include some information regarding piece type inside your features. The features of that state would need to result in quite different planes in order for us to use them to determine which piece to use (and not only which position). Consequently, we introduce a barrier between each possible action, which again explains the challenges of chess and other games that produce more complex action states. The more we can reduce the number of possible actions and increase their relation to each other, the better our network will be able to interpret this information.
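
A small illustration of this encoding, with a made-up board and piece numbering, shows why the features must make the piece on the starting square recoverable:

    # A chess action encoded as (from_square, to_square), as in Figure 4.
    action = ((2, 1), (2, 2))

    # To interpret the action, the network's input features must make the piece
    # type at the from-square recoverable, e.g. via a lookup into the board state.
    EMPTY, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING = range(7)
    board = [[EMPTY] * 8 for _ in range(8)]
    board[2][1] = PAWN                 # place a pawn on the from-square

    from_sq, to_sq = action
    moving_piece = board[from_sq[0]][from_sq[1]]   # PAWN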

VI. Designing the Network

When designing a network, we first need to choose an architecture. Convolutional Neural Networks (CNNs) are commonly used in game contexts. I have limited knowledge of the details of other forms of neural networks. However, the type of input produced by a board game is very similar to that of an image. CNNs are the architecture of choice for image recognition, so intuitively they would apply best to our problem. This is in comparison to Recurrent Neural Networks (RNNs), which are generally more useful for areas such as speech recognition and do not provide obvious overlap with the representation of game states. A third option would be to create a custom architecture, perhaps a hybrid of CNNs and RNNs as is sometimes used for autonomous vehicle applications. Having limited experience with deep learning, I decided not to take this approach.

In AlphaGo, the reinforcement learning policy network could be interpreted as the "brains" of the operation, despite the "expert knowledge" used by the supervised learning network. The end goal is for the system to beat the expert players, and not to play similarly to them, by learning in ways that leverage computational abilities beyond those of a human brain. Reinforcement learning also helps us to recover from flaws in the system. An important example of a flaw mended by reinforcement learning is overfitting to data. A metaphor for this could be a child learning from only one person their whole life, or from a small group of people with similar opinions. The child's brain may become overfitted to its peers' limited way of thinking. Another flaw it helps avoid is learning from incorrect or damaging actions.

Gradient Descent iteratively updates a set of parameters to minimize these types of errors. The reinforcement learning policy of AlphaGo uses Stochastic Gradient Descent (SGD) to update its weights at every step through the tree that does not result in the game ending. The difference between Gradient Descent and SGD is that the former requires you to run through all samples in the training set to make one parameter update, whereas the latter only uses one training sample or a small subset. Using regular Gradient Descent with a large number of training samples would take significantly longer. The trade-off is that the error is slightly less minimized in SGD; however, SGD offers a significant advantage in that it makes progress right away with each new sample.
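
The contrast can be sketched on a toy problem. The snippet below fits a single weight with full-batch Gradient Descent and then with per-sample SGD; the data, learning rate, and epoch counts are arbitrary choices for illustration, not values from AlphaGo.

    import numpy as np

    # Toy data: predict y from x with a single weight w (squared-error loss).
    x = np.random.rand(1000)
    y = 3.0 * x + np.random.normal(scale=0.1, size=1000)
    lr = 0.1

    # Full-batch gradient descent: every update uses the gradient over ALL samples.
    w = 0.0
    for epoch in range(100):
        grad = np.mean(2 * (w * x - y) * x)   # dL/dw averaged over the whole set
        w -= lr * grad

    # Stochastic gradient descent: each update uses the gradient of ONE sample,
    # so progress starts immediately and each step is far cheaper.
    w_sgd = 0.0
    for epoch in range(5):
        for xi, yi in zip(x, y):
            grad = 2 * (w_sgd * xi - yi) * xi
            w_sgd -= lr * grad

    print(round(w, 2), round(w_sgd, 2))   # both should approach 3.0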

The structure used by AlphaGo for both the Supervised Learning and Reinforcement Learning policy networks alternates between convolutional layers and rectifier nonlinearities, and its last layer is a softmax layer that outputs a probability distribution over all legal moves. The design of the convolutional layers is critical to designing a convolutional network. To get an idea of how to approach this, it is important to understand the different attributes of the layers. One of the important features of each layer is the activation function used.

i. Activation Functions

AlphaGo uses the Rectified Linear Unit (ReLU) activation function for all of its hidden layers. There are several choices of activation function, and it can be beneficial to try a couple of different ones, if not for the network, then for learning. Commonly used activation functions and their curves can be seen in Figure 5.

Figure 5: Activation functions from cdn-images-1.medium.com

In Andrew Ng's course Neural Networks and Deep Learning (Coursera Inc., 2017), he explains that the sigmoid function is a good choice only when working with binary classification. When the input to the activation function is either very large or very small, the slope of the function is close to 0, slowing gradient descent. ReLU is currently the most successful and widely-used activation function (Ramachandran et al., 2017). The reason it is such a popular option is that it lessens this slowing of gradient descent, speeding up the entire learning process significantly. However, there are other options to consider. A successful imitation of AlphaGo that I found claims that using ELU greatly sped up the learning (McGlynn, 2016). Additionally, the Leaky ReLU activation function, as seen in Figure 5, has been claimed to improve speed and fault handling by a group from Stanford University (Maas et al., 2013).
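
For reference, these activation functions can be written in a few lines of NumPy; the slope and scale constants below are the commonly used defaults, not values prescribed by any of the papers cited.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))        # saturates for large |x|, slope near 0

    def relu(x):
        return np.maximum(0.0, x)              # zero for negatives, identity otherwise

    def leaky_relu(x, alpha=0.01):
        return np.where(x > 0, x, alpha * x)   # small slope keeps gradients alive for x < 0

    def elu(x, alpha=1.0):
        return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

    z = np.linspace(-3, 3, 7)
    print(relu(z))    # [0. 0. 0. 0. 1. 2. 3.]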

ii. Convolutional Layers

TensorFlow simplifies the creation of convolutional neural network layers (Google, 2017). To see which input parameters work best, a solution could be to simply try all the classic example inputs and compare their results. However, it is perhaps better to actually understand the parameters that go into these layers, in order to get a better understanding of how the entire architecture functions.

TensorFlow has a function tf.layers.conv2d (Google, 2017) which, among other things, takes in inputs, the number of filters, the dimensions of the filters, a padding specification, and an activation function. Input is straightforward: feed it the input layer constructed earlier (see Section V, Designing the Input for the Neural Network). Understanding what the filters are used for is important to understanding how these convolutional layers work and what they achieve.

The idea behind convolutional layers is that each filter is an area that will be looked at and checked for some sort of signal activation. For example, in images it could be a curve or an edge. The math behind this is a dot product of each filter with a small piece of the input matrix. The more filters we have, the more of the image we preserve. However, the filters also correspond to the output dimension of our features, so using a higher number of filters will not necessarily provide any additional information and will drastically grow our computational requirements. The dimensions of the filter come down to the general question of how large an area contains highly dependent pixels. For example, if we said 1x1 for the filter, we would be stating that pixels are completely independent of their neighbors, whereas if we go bigger, we are saying large areas have pixels that are related to each other in a significant way. In images, if we were looking for, say, a cat, we would want to use a filter big enough that it could recognize a curve without the entire paw. Of course we don't know how big the paw will be in the photo, which is a reason to use a variety of filter dimensions in different layers of the network.

Another commonly specified attribute is the stride. This is simply the number of positions you shift horizontally or vertically before taking a new filter. For example, if you had a 3x3 filter size and took your first filter from x = 0...2 and y = 0...2, the next filter would jump x and y to a certain new position. The default stride is 1, so in this case we would have x = 1...3, y = 1...3. The TensorFlow input for this is an array of length 4. The first and last items in the array are the samples and the features, which have to be set to 1. The other two items in the array are the width and height strides that can be specified. In the case of games, it is most likely better to keep the default; however, there is no harm in trying things out.
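
Putting these parameters together, a single layer might be created as in the sketch below (TensorFlow 1.x API, with an illustrative 19x19, 48-plane, channels-last input; the exact shapes and values are examples, not a prescription):

    import tensorflow as tf   # TensorFlow 1.x API

    # Placeholder for a batch of board states: 19x19 positions with 48 feature
    # planes, stored channels-last (batch, height, width, planes).
    board_input = tf.placeholder(tf.float32, shape=[None, 19, 19, 48])

    # One convolutional layer: 192 filters of size 5x5, stride 1 in each direction,
    # 'same' padding so the 19x19 spatial size is preserved, ReLU activation.
    conv1 = tf.layers.conv2d(
        inputs=board_input,
        filters=192,
        kernel_size=[5, 5],
        strides=(1, 1),
        padding='same',
        activation=tf.nn.relu)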

I will not go into the reasoning behind each of the selected parameters for AlphaGo, as these values are implementation-specific; however, Figure 6 below gives a brief breakdown of the parameters used in AlphaGo's 12 hidden layers.

Layer 1
  • Zero pads the input into a 23x23 matrix
  • Applies 192 5x5 filters with stride 1
  • Applies ReLU

Layers 2-12
  • Zero pads the previous hidden layer into a 21x21 matrix
  • Applies 192 3x3 filters with stride 1
  • Applies ReLU

Output layer
  • Applies 1 1x1 filter with stride 1
  • Applies a position bias
  • Applies the softmax function

Figure 6: Details of AlphaGo's Policy Neural Network
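
As a rough approximation of Figure 6, the sketch below expresses a similar stack with TensorFlow 1.x layers. 'same' padding is used in place of the explicit zero-padding to 23x23 and 21x21 (it yields the same 19x19 output size), and the per-position bias is folded into the final convolution's ordinary bias, so this should be read as an outline of the shape of the network rather than AlphaGo's exact implementation.

    import tensorflow as tf   # TensorFlow 1.x API

    def policy_network(board_input):
        """board_input: (batch, 19, 19, n_planes) tensor of feature planes."""
        # Layer 1: 192 filters, 5x5, stride 1, ReLU ('same' padding keeps 19x19).
        net = tf.layers.conv2d(board_input, filters=192, kernel_size=5,
                               padding='same', activation=tf.nn.relu)
        # Layers 2-12: eleven more layers of 192 3x3 filters with ReLU.
        for _ in range(11):
            net = tf.layers.conv2d(net, filters=192, kernel_size=3,
                                   padding='same', activation=tf.nn.relu)
        # Output layer: a single 1x1 filter producing one score per board position,
        # then a softmax over all 19x19 = 361 positions.
        logits = tf.layers.conv2d(net, filters=1, kernel_size=1, padding='same')
        logits = tf.reshape(logits, [-1, 19 * 19])
        return tf.nn.softmax(logits)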

VII. Implementation of Blokus

As part of this research, I began the implementation of a neural network player for Blokus, with a model inspired by AlphaGo. I designed the input to the neural network and its features, and implemented the game and the neural network. However, I ran out of time before I was able to determine how to feed the input properly into the reinforcement learning network. This section includes a description of the game, the design of the input for the neural networks, challenges faced during the project, and future work.

i. Blokus Rules

Blokus is traditionally a 4-player game played on a 20x20 board. Each player has 21 pieces, each of which has a unique shape. The shapes are based on the free polyominoes of 1 to 5 squares. Each turn, a player has to play a piece if they are able. On their first turn, they must play a piece such that a tile of the piece lies in their designated corner tile of the board (e.g. [0,0], [19,19], ...). After that, they can only play a piece in a given location if one of its corners touches a corner of another of their own pieces, and if none of its sides lies along the side of another of their own pieces (see Figure 7).

ii. Input Design

Figure 7: Example Blokus board layout, from https://target.scene7.com

When starting the design and implementation of Blokus, I did not have the level of understanding displayed in this document. It did not stand out to me that Blokus could have issues similar to those of Chess, and that it did not have some of the important properties that made Go applicable to a neural network approach. As discussed in Section V, Designing the Input for the Neural Network, this game has a significantly complex action state that does not offer a simple solution for highly related input features. As in Chess, the piece one uses results in a unique action and, consequently, has a unique effect on the value of the move. Unfortunately, this applies to Blokus more dramatically. Not only are there 21 different pieces as opposed to 6, but they have different states. The different states are the unique rotations and flips of the piece, where each unique state forms independent features as in chess. Fortunately, not all of them have 8 unique states, one for each of the four rotations before and after flipping. Figure 8 displays the number of shapes that have x unique sets of tile positions, meaning all flips and rotations that result in a unique set of tile positions, where x is between 1 and 8. This means there are consequently 102 unique actions for each position on the board where the move fits in the boundaries (8x7 + 4x10 + 2x2 + 1x2 = 102).

# of unique forms    # of shapes
8                    7
4                    10
2                    2
1                    2

Figure 8: The number of unique tile positions for each shape

Compared to AlphaGo's 1 possible state per position, and even chess's 6 states per position, this already casts doubt on whether Blokus will be a good application for neural networks. Additionally, the number of moves possible at a given board state is more similar to chess than to Go, as there are several options for pieces and only a limited number of locations where they can be placed. The number of pieces/shapes you can place is greater than in chess; however, the number of locations where a corner will be available is significantly smaller. Generally, the features of the variety of actions will not be continuous.

Selecting features took a significant amount of time, as I did not initially visualize the problem in a way that was appropriate for neural networks. It took a lot of analysis and trial and error to understand how the feature planes of AlphaGo are formed and how they are used, as this is not explained in the paper. Once I had a better understanding of deep learning, I was able to understand AlphaGo's approach, and finally design the feature planes for Blokus. Figure 9 displays the feature planes I selected for Blokus.

Feature         # of Planes
Tile colour     5
Liberties       4
Sensibleness    1
Zeros           1

Figure 9: Input features for the Blokus neural networks

The main reason that there are only four types of feature planes is that there are fewer rules in Blokus. At first, I had wanted to add biases according to strategies I favour; however, this would be incorrectly implementing a neural network. There are a total of 11 feature planes here. The colour planes are the same as AlphaGo's, except that there are 4 players instead of 2, so there are 2 extra planes. The liberties in this case refer to tiles that are attached to corners, are empty, and do not share side lengths with another piece of that player's colour. Each player's liberty plane is used to enable a blocking strategy. It is a binary plane, where 1 means the tile is a liberty and 0 means it is not. These planes represent the possible locations for each player to go, but do not acknowledge piece options. For example, if there is an empty liberty where a player could go, but only with their single-square piece, and they have already played it, it will still count as a liberty even though they cannot play there. Sensibleness is the legality plane, which shows places where the player could place a tile, without acknowledging corner attachments. A tile is legal if it is empty and none of its edges is adjacent to a tile of that player's colour.

The features are created from the player's colour and the board state, a 19x19 matrix with values -1, 0, 1, 2, or 3, where -1 means the location is empty and the other four values indicate which player has a tile there. The action is represented as a 5-tuple (piece, x, y, rotation, flip), where piece is a number 1-21 identifying the piece of choice from a list, x and y are the "center tile" position for the piece, rotation is the number of clockwise turns between 0 and 3, and flip is 1 or 0 (flipped or not flipped).
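
A sketch of these representations in NumPy is given below; it follows one possible reading of the description above (one colour plane per player plus one for empty squares), and the liberty and sensibleness planes are omitted since their computation depends on the game logic.

    import numpy as np

    BOARD_SIZE = 19   # board dimension used in the text's representation
    EMPTY = -1

    def colour_planes(board):
        """5 binary planes: one per player (0-3) plus one for empty squares."""
        planes = [np.float32(board == p) for p in range(4)]
        planes.append(np.float32(board == EMPTY))
        return np.stack(planes)

    # Board state: -1 everywhere means an empty board.
    board = np.full((BOARD_SIZE, BOARD_SIZE), EMPTY, dtype=np.int8)
    board[0, 0] = 0                            # player 0 has a tile in their corner

    planes = colour_planes(board)              # shape (5, 19, 19)

    # An action is a 5-tuple: (piece, x, y, rotation, flip).
    action = (3, 0, 1, 2, 0)   # piece #3, centred at (0, 1), rotated twice, not flipped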

iii. Pain Points and Major Blocking Sources

As discussed in section ii, the first major block I had in this project was trying to design a proper input model for the neural network, and generally trying to understand how AlphaGo's network functioned. Once I had a solid understanding, had implemented my input to the neural network including features, and had set up basic TensorFlow classes and methods, I was nearly out of time. The major stepping stone I did not overcome, primarily due to time, was simply feeding the input into a neural network. TensorFlow's data formatting examples for self-created input to their functions are exclusively images, and the process of creating persistent tensors and sessions is vaguely described. I found the documentation quite difficult to understand. The examples I used were image-based, with the exception of the AlphaGo imitation (McGlynn, 2016). That implementation feeds inputs to the network behind the scenes, as each move is simply a position (x,y) in the board state. The training function outputs a set of probabilities for each position in the board state, and McGlynn (2016) simply chooses the index with the maximum probability from that list. Having a 5-tuple action representation and over 40,000 possible moves, this is not feasible for Blokus. Moving forward, I would like to find a solution to this problem that is more efficient. My thinking was to simply run samples (possible actions) through the network one by one and keep track of the probabilities produced for the possible actions. However, I hope to find a cleaner and more dynamic solution. A helpful strategy would be to use a smaller board; however, this greatly changes the game, and does not actually aid in representing the possible moves, it only lowers their quantity.

VIII. General Implementation Advice

Through the learning process of this project, I have developed a list of tips for implementing a neural network for games. This discusses high-level decisions and strategies to use while programming, as well as the research process used to gather the knowledge required for this type of project. This advice comes from my own personal experience, and should therefore be considered along with the context of a project and the resources available.

1. Do not start by learning the low-level math. Read about convolutional neural networks, their layers, and their architectures first, before trying to understand the detailed math behind these concepts. Although the math is important to understanding how deep learning works, a higher-level understanding of how neural networks function and interact should come first. This will allow you to develop a model of how these concepts apply to your project early in the process. After these concepts are understood, more knowledge of their details will be required and more easily absorbed.

2. Take the time to understand input model designs before implementing one. Although designing an input model may seem intuitive, having a full understanding of how your design will be interpreted by a neural network will greatly reduce implementation time. This will result in a better chance of moving into the network design earlier in the project's timeframe.

3. Focus on one reinforcement learning policy network. Although a set of neural networks identical to AlphaGo's seems appealing, these networks may not be necessary for a different project to be successful, and may require Google-level computational capabilities to achieve. Generally, the reinforcement learning policy network should be held at a higher priority than the supervised learning network. The supervised learning network by itself does not represent the capabilities of the neural network; it represents the capabilities of the players who created the data. Supervised networks should only be used as a tool to help kick-start the reinforcement learning policy's learning. The reinforcement learning policy is the policy that, well, does the machine learning! It is where the designs of the input and the network itself can be tested and improved. Once content with a version of this policy, additional networks can be added to enhance the system or to compare with AlphaGo's abilities. Time should be allocated appropriately to all aspects of the project in order to prioritize the development of the reinforcement learning policy, which leads to the next point.

4. Don't spend a lot of time on the game development or its output. I spent at least half of my time implementing the game and producing output samples in a way that I thought would speed up the development process, and it had the reverse effect. I developed 3 AI players based on an alpha-beta minimax strategy, with different heuristics. The purpose of the AI players was to provide a variety of input data for the supervised learning network. This is another example of time wasted due to a lack of understanding. I then formatted the sample game output in various ways to help determine the desired input to the network, which took a very long time. Making matters worse, the AI players are quite slow, playing one game in over a minute. If I were to go back and redo the process, I would not develop the AI players at all. As mentioned in the last point, I would have saved a lot of time by focusing on the reinforcement learning network development and feeding it features, then modifying those features as I went. The changes needed for the game itself are discovered along the way, and are hard to predict. Trying to implement them beforehand (e.g. functions needed for features, sample output formatting and creation, parsing of that sample data) wasted a lot of time, and a lot of it ended up not being used.

5. Learn a simple, similar open source implementation thoroughly. Find a project on GitHub that is finished and well documented, and go through the important files. These projects tend to include a ton of unused and possibly incorrect files, so understanding a full walkthrough of the code actually used in a program similar to yours will help you know what to look for when analyzing other projects. Additionally, any time they exclude something that would normally be included, even if there's an explanation, try including it first. What didn't work for them might still work for you, and if not, then you can develop theories as to why it can have a harmful effect on the results.

6. Try to keep the focus of your learning as directed as possible. Neural networks are becoming very popular, and reading about various projects and applications is very interesting. That said, all of this information can make your project much more confusing than it has to be. Try to research as much as possible inside the area of relevance of your project without straying into other interesting applications of deep learning. Of course, this only applies to significantly short-term projects. If the goal was to come up with unique ways of developing a neural network to play a game, researching a wide variety of successful architectures in unrelated topics would be important. This could also identify what specific qualities of an architecture apply in various contexts. Although focusing on relevant material is important, narrowing your research too quickly can also be problematic.

7. Don't jump right into TensorFlow. TensorFlow is a great tool, and can save a lot of time, but trying to use it before having programmed smaller, more generic steps is difficult if a developer is unfamiliar with the library. Sometimes code that is the simplest is the hardest to use, because its simplification makes it difficult to read and understand. Additionally, using library methods that encompass many complex steps makes code difficult to test and debug, especially if those steps aren't understood. This is a case where I would recommend Andrew Ng's Neural Networks and Deep Learning course (Ng, 2017). The course includes programming assignments that break down each component of a neural network, and therefore imparts an understanding of the work being done by TensorFlow behind the scenes.

IX. Future Work

Any future trials of my project will include integrating the input into the neural network to test the system as a whole, trying out different activation functions and/or combinations of them, using different filter quantities and dimensions, specifying strides, and more. The substitution of smaller-sized boards will need to be tested as well, as it may greatly reduce the difficulty of the problem.

Additionally, I would like to try other games that may be more applicable to neural networks. I think it would be interesting to investigate uses of neural networks in non-board-based games other than Atari games or first-person-shooter games, where they are currently most popular. The project could also take a major turn and attempt to be completely unsupervised, as inspired by AlphaGo Zero.

AlphaGo Zero is a new version of AlphaGo that does not leverage any human expert data (Silver et al., 2017). AlphaGo Zero defeated the version of AlphaGo discussed in this paper by 100 games to 0, and also surpassed AlphaGo Master, the version that defeated world champion Ke Jie in 2017 (Silver et al., 2017). AlphaGo Zero is arguably the strongest Go player in history, human or computer. Only published this October, the research paper contains techniques that could potentially be used to facilitate the implementation of neural networks for games that do not have expert data available or for which data is difficult to produce.

X. Conclusion

Deep learning is currently a very popular development, being experimented with in very interesting fields such as speech and image recognition, autonomous vehicles, and recommendation systems. I believe a large reason this is such an important area of development is the combination of computational abilities with human abilities. Learning is a very important characteristic of the human brain that, if leveraged correctly, could result in a variety of systems that behave much more dynamically than current artificial intelligence.


Theoretically, a player of a strategic board game that can attain the same knowledge as a human expert while leveraging a machine's efficiency of move analysis should defeat any human player. This type of machine would also avoid human-like mistakes by not "missing" anything, if it were correctly designed. AlphaGo is a huge step toward this. However, the field still has many developments to make before these techniques can be applied to a larger variety of games. This research is meant to develop an understanding of AlphaGo's strategic strengths as well as its shortcomings when applying its techniques to other games. Despite the fact that my Blokus player is unfinished, I feel that I have learned a lot from applying AlphaGo's techniques to it, and hope to have more success in the future.

XI. References

1. Le, Q., et al. "Building High-level Features Using Large Scale Unsupervised Learning", Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012.

2. Mnih, V., et al. "Playing Atari with Deep Reinforcement Learning." DeepMind Technologies, 1 Jan. 2013, deepmind.com/research/publications/playing-atari-deep-reinforcement-learning/

3. Maas, A., et al. "Rectifier Nonlinearities Improve Neural Network Acoustic Models", Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013, JMLR: W&CP, 2013.

4. Silver, D., et al. "Mastering the game of Go with deep neural networks and tree search." Nature, Nature Publishing Group, 27 Jan. 2016, nature.com/articles/nature16961

5. McGlynn, G. "Go-playing neural network in Python using TensorFlow." GitHub, 10 May 2016, github.com/TheDuck314/go-NN

6. Ng, Andrew. "Neural Networks and Deep Learning." Coursera, Coursera Inc., Jan. 2017, coursera.org

7. Tian, Y., et al. "DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars." arXiv:1708.08559v1 [cs.SE], 28 Aug. 2017, arxiv.org/pdf/1708.08559.pdf

8. Siri Team. "Hey Siri: An On-device DNN-powered Voice Trigger for Apple's Personal Assistant." Machine Learning Journal, Apple Inc., 2017, machinelearning.apple.com/2017/10/01/hey-siri.html

9. Computer Vision Machine Learning Team. "An On-device Deep Neural Network for Face Detection." Machine Learning Journal, Apple Inc., 2017, machinelearning.apple.com

10. Ramachandran, P., et al. "Searching for Activation Functions." Cornell University Library, arXiv:1710.05941 [cs.NE], 16 Oct. 2017, arxiv.org/abs/1710.05941

11. Silver, D., et al. "Mastering the game of Go without human knowledge." Nature, Nature Publishing Group, 18 Oct. 2017, nature.com/articles/nature24270

12. Google. "tf.layers.conv2d." TensorFlow 1.4, Creative Commons 3.0 Attribution License, 2 Nov. 2017, tensorflow.org/api_docs/python/tf/layers/conv2d
