Learning to Play: History of Machine Learning in Classic Games


How machines learn to play
Ofer Egozi

Not a deck about AI in general, nor even about ML in general, but specifically about interesting efforts to use AI to play games, which has been an academic interest of mine since the first AI courses I took.

Artificial Intelligence
Let's start from the basics. Actually, let's go straight to the cutting edge! Here's a demo of a multi-layered convolutional neural network using feed-forward training to perform supervised learning.


Well-defined rules
A clear, measurable goal
A task one can train for
Bonus: human opponents!

AI was always about making algorithms do tasks that humans need intelligence to do, and AI researchers have always been fascinated with game playing. Let's see what makes games such an interesting task.

Let's get started with a very simple game: tic-tac-toe.

The ADDITRON
Josef Kates
1950

Josef Kates was a young Austrian Jew whose family fled the Nazis to Canada, where he worked at a radar parts manufacturer called Rogers Majestic. Kates designed an advanced version of the vacuum tube that could do basic arithmetic, called the Additron, and was waiting for the company to patent and commercialize it. In the meantime, he wanted to demonstrate its potential power, so he built a computer demonstration for a science fair the company took part in.


Bertie the Brain was the first known computer game ever implemented. Kates programmed it so the playing level could be tuned down to let people win sometimes. It was an extremely popular exhibit, but it was dismantled after the exhibition. Unfortunately, the Additron became obsolete by the time the patent process ended, so it was abandoned. Recognize the player? That's the comedian Danny Kaye, happy after winning.

Early days: MiniMax
Build a tree of game states (from the current state)
Well-defined transition rules

Define a function to score each state
How close are we to the goal (a winning board)?

Choose the path that maximizes our gain and minimizes the opponent's gain

We assume the opponent plays well and won't miss opportunities. Once you have the tree, this is what AI calls a search problem: you search through the state space, applying what some of you know as DFS, BFS, and heuristic approaches such as A*. Luckily, here the search space is not very large.

Tic-tac-toe has only 765 unique states. Solved!

Remember where this screenshot is from?


5 × 10^20 (500 billion billion) possible positions


Arthur L. Samuel
1956

1956: Samuel's checkers player (IBM). It actually learned from playing, discovering more board positions.


1989: Schaeffer's Chinook (U. Alberta). The only thing that's changed is the glasses style.


1992: Chinook loses to the world champion


Marion Tinsley

Marion Tinsley: a preacher with a PhD, and the best player ever. He lost only 7 games in a 45-year career. "I studied the game so intensely, the board is burnt onto my mind." "I'm human, it's just that I have a better programmer. God gave me a logical mind."


"With his passing, we lost not only a feared adversary but also a friend. Every member of our team had the deepest respect and admiration for Tinsley. It was a privilege to know him."

Tinsley resigned his title in order to play. 1992: he beat Chinook 4-2 with 33 draws (the 2 were Tinsley's 6th and 7th losses ever). "You're going to regret that." 1994: Tinsley resigned the match after 6 draws, citing health reasons, and Chinook was declared the winner. 1995: Tinsley asked for a rematch to regain his title. His cancer relapsed, and he died 3 months later.


I actually got to meet Schaeffer. He decided to prove that humans can't beat the machine after receiving hate mail following Tinsley's death. The challenge was actually all about hardware; at one point it made more sense to wait 3 years for 64-bit processors to become common than to modify the whole software.

10^120 possible positions (checkers squared)




We all know Shannon for information theory, but chess was a hobby of his. In this paper he practically invented the use of minimax. We can see the basic traits of games for AI right here in this quote; a very fascinating and easy-to-read paper.


1980
160,000 positions per second

Alpha-beta pruning
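Alpha-beta pruning keeps minimax exact while skipping branches that can no longer affect the result. A small sketch (mine, not from the talk) on an explicit game tree, where leaves are static evaluations and internal nodes are lists of children:

```python
# Alpha-beta pruning: alpha is the best value the maximizer can already
# guarantee, beta the best the minimizer can. When alpha >= beta, the
# current branch can never be reached under best play, so we stop.
def alphabeta(node, alpha=float('-inf'), beta=float('inf'), maximizing=True):
    if isinstance(node, int):              # leaf: static evaluation
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:              # opponent will never allow this line
                break
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# Three min-nodes under a max root: the value is 3, and pruning skips
# the 9 and the 7 entirely.
tree = [[3, 5], [2, 9], [0, 7]]
print(alphabeta(tree))  # -> 3
```

With good move ordering this roughly squares the depth a chess program can search in the same time, which is why every serious engine of that era used it.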



200 million positions per second

1996: Kasparov won 4-2
1997: Deep Blue won 3.5-2.5

Generic Learning
So far, humans were central in the learning process
Pre-encoding the allowed moves
Providing the winning states

Can machines learn on their own, like real toddlers?







2013
Input:
1) 33,600 raw pixels
2) Target score

Output: player!

And that's it!...
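To make the "pixels in, actions out" idea concrete, here is a toy forward pass in the spirit of DeepMind's Atari agent. The shapes are illustrative assumptions, not the published architecture: a 210×160 frame flattened to 33,600 values, one hidden layer, and one score per joystick action.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pixels = 33_600   # 210 x 160 greyscale frame, flattened (as on the slide)
n_hidden = 64       # hypothetical hidden width, just for illustration
n_actions = 18      # the full Atari joystick action set

# Untrained random weights; training would adjust these from the score signal.
W1 = rng.normal(0, 0.01, (n_hidden, n_pixels))
W2 = rng.normal(0, 0.01, (n_actions, n_hidden))

def action_scores(frame):
    """Map a flattened frame to one score per joystick action."""
    hidden = np.maximum(0, W1 @ frame)   # ReLU hidden layer
    return W2 @ hidden

frame = rng.random(n_pixels)             # stand-in for a real game frame
action = int(np.argmax(action_scores(frame)))  # play the best-looking action
```

The player is then just a loop: read a frame, compute the scores, press the argmax action. Everything interesting lives in how the weights get learned, which is the part we're skipping.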

We're skipping entire ML courses now

What's fundamentally different about Deep Learning?
No predefined rules: a generic system
("A bishop moves this way, and a knight this way")
No domain knowledge: the system finds the features
("count the pieces within 3 steps of the king")
Deep learning

In classic machine learning the builder provides a lot of domain knowledge, like a teacher teaching kids in a course. In deep learning it's more like learning on your own, just from watching many games being played.

We're skipping even more entire ML courses now

Uses an Artificial Neural Network, with LOTS of data
Deep Learning == multiple hidden layers
Deep learning

Magic happens HERE
Magic built-in


Showtime!
Deep learning in 8-bit


2016
#positions > #atoms in the universe
1,202 CPUs and 176 GPUs


It's a lot more difficult to use minimax and alpha-beta pruning in Go. After the acquisition, DeepMind was used for a lot more than this; for example, they optimized Google's electricity consumption, which already paid off the acquisition.

How about AI building the game?...


Generative language models
Not done with skipping ML courses just yet... First, let's divert to literature for a bit, shall we?...

Robert Cohn was once middleweight boxi?


What's the next letter? How do you know?... Generative models are models that learn the patterns in their input well enough to then produce output that resembles the input in those hidden patterns. We'll talk text, but the same holds for speech, images, etc.

Generative language models

Feed the book into an RNN, let it train itself

100 iterations:
1000 iterations:

A Recurrent Neural Network learns a sequence
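A full RNN is out of scope here, but the "learn the patterns, then sample text that looks like the input" idea can be shown in miniature with a character-bigram model (my own sketch, deliberately much simpler than an RNN; the training string is a stand-in for the book):

```python
import random
from collections import defaultdict, Counter

# Count which character tends to follow which in the training text.
text = "robert cohn was once middleweight boxing champion of princeton. "

follows = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    follows[a][b] += 1

def generate(seed, length, rng):
    """Extend `seed` one character at a time, sampling from learned counts."""
    out = seed
    for _ in range(length):
        counter = follows[out[-1]]
        chars, weights = zip(*counter.items())
        out += rng.choices(chars, weights=weights)[0]
    return out

print(generate("ro", 40, random.Random(0)))
```

Even this trivial model produces plausible-looking letter runs; an RNN does the same thing with a learned hidden state instead of raw counts, which is why its output improves so visibly between 100 and 1000 iterations.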

Generative language models

Feed the book into an ANN, let it train itself