Monte Carlo Go Has a Way to Go

Haruhiro Yoshimoto (*1), Kazuki Yoshizoe (*1), Tomoyuki Kaneko (*1), Akihiro Kishimoto (*2), Kenjiro Taura (*1)

(*1) University of Tokyo
(*2) Future University Hakodate

Adapted from the slides presented at AAAI 2006

Games in AI

• Ideal test bed for AI research
  – Clear results
  – Clear motivation
  – Good challenge
• Success in search-based approach
  – chess (1997, Deep Blue)
  – and others
• Not successful in the game of Go
  – Go is to Chess as Poetry is to Double-entry accounting
  – It goes to the core of artificial intelligence, which involves the study of learning and decision-making, strategic thinking, knowledge representation, pattern recognition and, perhaps most intriguingly, intuition

The game of Go

• A 4,000-year-old board game from China
• Standard size 19×19
• Two players, Black and White, place stones in turn
• Stones cannot be moved, but can be captured and taken off
• Larger territory wins

Terminology of Go

Block - connected stones of the same color
Liberty - adjacent empty intersection
Captured - when no liberty available
Eye - surrounded region providing one or more safe liberties

Playing Strength

[Figure: rank scale running from weak to strong. Labels: Kyu (student) 20, 15, 10, 5, 4, 3, 2, 1; Dan 1 to 9; Master; Professional; Handicap stones; Computer programs.]

A $1.2M prize was set for beating a professional with no handicap (expired!!!)

Handtalk in 1997 claimed $7,700 for winning an 11-stone handicap match against an 8-9 year old master

Difficulties in Computer Go

• Large search space
  – the game becomes progressively more complex, at least for the first 100 ply

                   Chess     Go
Board size         8×8       19×19
Depth              ~80       ~300
Branching factor   35        235
Search space       10^40     10^170

Difficulties in Computer Go

• Lack of good evaluation function
  – a material advantage does not mean a simple way to victory, and may just mean that short-term gain has been given priority
  – legal moves around 150–250, usually <50 acceptable (even <10), but computers have a hard time distinguishing them.
• Very high degree of pattern recognition involved in human capacity to play well.

Why Monte Carlo Go?

• Success in other domains
  Bridge [Ginsberg:1999], Poker [Billings et al.:2002]
• Reasonable position evaluation based on sampling
  search space from O(b^d) to O(Nbd)
• Easy to parallelize
• Can win against search-based approach
  – Crazy Stone won the 11th Computer Olympiad in 9x9 Go
  – MoGo 19th, 20th KGS 9x9 winner, rated highest on CGOS

Replace evaluation function by random sampling [Brugmann:1993, Bouzy:2003]
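To make the O(b^d) versus O(Nbd) contrast concrete, here is a small arithmetic sketch. It assumes that b is the branching factor, d the game length in moves, and N the number of random games sampled per candidate move; the function names and the choice N = 1000 are illustrative and not taken from the slides.

```python
def full_search_nodes(b: int, d: int) -> int:
    """Leaf count of a full-width search: b moves per position, d plies deep."""
    return b ** d


def monte_carlo_moves(n: int, b: int, d: int) -> int:
    """Moves played by flat Monte Carlo: n random games of about d moves
    for each of the b candidate moves."""
    return n * b * d


if __name__ == "__main__":
    b, d = 235, 300  # branching factor and depth quoted for 19x19 Go
    n = 1000         # assumed number of sample games per candidate move
    print(len(str(full_search_nodes(b, d))), "decimal digits in b^d")
    print(monte_carlo_moves(n, b, d), "moves played for N*b*d")
```

The first quantity grows exponentially with depth, while the second grows only linearly in each of N, b and d, which is what makes sampling feasible at all.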

Basic idea of Monte Carlo Go

• Generate next moves by 1-ply search
• Play a number of random games and compute the expected score
• Choose the move with the maximal score
• The only domain-dependent information is eye.
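The steps above can be sketched as a generic move chooser. This is a minimal illustration, not the authors' implementation: the callbacks `legal_moves`, `play` and `playout_score` are hypothetical placeholders for a real Go engine, and the random playout (including the rule of not filling one's own eyes) is assumed to live inside `playout_score`.

```python
import random
from typing import Any, Callable, Iterable, Tuple


def choose_move(
    state: Any,
    legal_moves: Callable[[Any], Iterable[Any]],
    play: Callable[[Any, Any], Any],
    playout_score: Callable[[Any, random.Random], float],
    n_samples: int,
    rng: random.Random,
) -> Tuple[Any, float]:
    """Flat Monte Carlo: 1-ply expansion, then average random-game scores."""
    best_move, best_avg = None, float("-inf")
    for move in legal_moves(state):              # 1-ply search
        child = play(state, move)
        total = sum(playout_score(child, rng) for _ in range(n_samples))
        avg = total / n_samples                  # estimated expected score
        if avg > best_avg:                       # keep the maximal average
            best_move, best_avg = move, avg
    return best_move, best_avg


if __name__ == "__main__":
    rng = random.Random(0)
    true_value = {"A": 7.0, "B": 3.0}  # hypothetical expected scores
    move, avg = choose_move(
        state=None,
        legal_moves=lambda s: ["A", "B"],
        play=lambda s, m: m,
        playout_score=lambda child, r: r.gauss(true_value[child], 10.0),
        n_samples=1000,
        rng=rng,
    )
    print(move, round(avg, 2))
```

The toy game in the usage section replaces real playouts with noisy draws around a fixed value, which is enough to show how the averages separate the two candidates.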

Terminal Position of Go

Larger territory wins

Territory = surrounded area + stones

▲ Black’s territory is 36 points
× White’s territory is 45 points

White wins by 9 points
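The rule "territory = surrounded area + stones" can be sketched with a flood fill over empty regions. The board encoding used here (a list of strings with `B`, `W` and `.`) is an assumption made for illustration, and the sketch presumes a finished position with no dead stones left on the board.

```python
from typing import Dict, List


def area_score(board: List[str]) -> Dict[str, int]:
    """Count stones plus empty regions bordered by a single color."""
    rows, cols = len(board), len(board[0])
    score = {"B": 0, "W": 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            cell = board[r][c]
            if cell in score:
                score[cell] += 1          # every stone counts one point
            elif (r, c) not in seen:
                # Flood-fill this empty region and record bordering colors.
                size, borders, stack = 0, set(), [(r, c)]
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols:
                            neighbor = board[ny][nx]
                            if neighbor == ".":
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    stack.append((ny, nx))
                            else:
                                borders.add(neighbor)
                if len(borders) == 1:     # surrounded by one color only
                    score[borders.pop()] += size
    return score


if __name__ == "__main__":
    tiny = ["..BW",
            "..BW",
            "BBBW"]
    print(area_score(tiny))
```

Regions touching both colors are left uncounted, which is one simple way to treat neutral points.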

Example

• Play many sample games
  – Each player plays randomly
• Compute average points for each move
• Select the move that has the highest average

[Figure: two sample games after move A, with the rest of each game played randomly: a 9-point win for Black and a 5-point win for Black]

move A: (5 + 9) / 2 = 7 points

Monte Carlo Go and Sample Size

• Can reduce statistical errors with additional samples
• The relationship between sample size and strength has not yet been investigated
  – Sampling error ~ 1/√N
  – N: # of random games

Diminishing returns must appear

[Figure: Monte Carlo with 1000 sample games is stronger than Monte Carlo with 100 sample games]
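A short numerical sketch of why diminishing returns are expected: the standard error of an average over N independent samples shrinks as 1/√N, so each halving of the error costs four times as many games. The score standard deviation of 10 points below is a hypothetical value chosen only for illustration.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean of n independent samples with std sigma."""
    return sigma / math.sqrt(n)


def samples_needed(sigma: float, target_error: float) -> int:
    """Smallest n whose standard error is at most target_error."""
    return math.ceil((sigma / target_error) ** 2)


if __name__ == "__main__":
    sigma = 10.0  # assumed spread of random-game scores, in points
    for n in (100, 400, 1600, 6400):
        print(n, standard_error(sigma, n))
```

Going from 100 to 400 games removes half a point of error in this setting, while going from 1600 to 6400 removes only an eighth of a point.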

Our Monte Carlo Go Implementation

• basic Monte Carlo Go
• atari-50 enhancement: utilization of simple Go knowledge in move selection
• progressive pruning [Bouzy 2003]: statistical move pruning in simulations

Atari-50 Enhancement

• Basic Monte Carlo: assign uniform probability for each move in sample game (no eye filling)
• Atari-50: higher probability for capture moves
  – Capture is “mostly” a good move
  – 50%

[Figure: Move A captures black stones]
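One possible reading of the atari-50 policy is sketched below. The slide gives only "50%", so the exact rule here is an assumption: when capture moves exist, one of them is picked with probability 0.5, and otherwise the move is drawn uniformly from all legal moves. The function name and the `p_capture` parameter are hypothetical.

```python
import random
from typing import Sequence, TypeVar

M = TypeVar("M")


def atari50_choice(
    legal: Sequence[M],
    captures: Sequence[M],
    rng: random.Random,
    p_capture: float = 0.5,
) -> M:
    """Pick a playout move, favoring captures (assumed interpretation)."""
    if captures and rng.random() < p_capture:
        return rng.choice(captures)   # biased branch: capture moves only
    # Fallback is the basic uniform policy; captures can still be drawn here.
    return rng.choice(legal)


if __name__ == "__main__":
    rng = random.Random(0)
    legal = ["A", "B", "C", "D"]
    picks = [atari50_choice(legal, ["A"], rng) for _ in range(10000)]
    print(picks.count("A") / len(picks))
```

Setting `p_capture=0.0` recovers the basic uniform policy, which makes it easy to compare the two playout styles with the same code.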

Progressive Pruning [Bouzy2003]

• Try sampling with smaller sample size
• Prune statistically inferior moves

[Figure: score for each move]

Can assign more sample games to promising moves
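The idea can be sketched as batches of samples followed by a statistical test. This is an illustrative variant, not the rule from [Bouzy2003]: a move is dropped when the upper end of its confidence interval falls below the lower end of the best move's interval, and the interval width (`ratio` standard errors), `batch` and `rounds` are assumed parameters.

```python
import math
import random
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def progressive_pruning(
    moves: Sequence[Hashable],
    sample: Callable[[Hashable], float],
    batch: int = 100,
    rounds: int = 10,
    ratio: float = 2.0,
) -> Tuple[Hashable, List[Hashable]]:
    """Sample in batches and prune moves that are statistically inferior.

    Assumes batch >= 2 and rounds >= 1.
    """
    scores: Dict[Hashable, List[float]] = {m: [] for m in moves}
    alive = list(moves)
    for _ in range(rounds):
        if len(alive) == 1:
            break
        bounds = {}
        for m in alive:                       # only surviving moves get samples
            scores[m].extend(sample(m) for _ in range(batch))
            xs = scores[m]
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
            half = ratio * math.sqrt(var / len(xs))
            bounds[m] = (mean - half, mean + half)
        best_lower = max(lo for lo, _ in bounds.values())
        alive = [m for m in alive if bounds[m][1] >= best_lower]
    best = max(alive, key=lambda m: sum(scores[m]) / len(scores[m]))
    return best, alive


if __name__ == "__main__":
    rng = random.Random(1)
    means = {"A": 10.0, "B": 0.0, "C": -10.0}  # hypothetical true scores
    print(progressive_pruning(list(means), lambda m: rng.gauss(means[m], 5.0)))
```

Because pruned moves stop receiving samples, the remaining budget concentrates on the candidates that are still hard to tell apart.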

Experimental Design

• Machine
  – Intel Xeon Dual CPU at 2.40 GHz with 2 GB memory
  – Use 64 PCs (128 processors) connected by 1GB/s network
• Three versions of programs
  – BASIC: Basic Monte Carlo Go
  – ATARI: BASIC + Atari-50 enhancement
  – ATARIPP: ATARI + Progressive Pruning
• Experiments
  – 200 self-play games
  – Analysis of decision quality from 58 professional games

Diminishing Returns

[Figure: 4*N samples vs N samples for each move]

Additional enhancements and Winning Percentage

[Figure]

Decision Quality of Each Move

[Figure: 3×3 example board (labels a, b, c and 1, 2, 3) showing the evaluation score of the “Oracle” (64 million sample games) for each move; the values shown are 15, 30, 25, 20, 17, 10, 7, 21, 12]

Selected move for 100 sample game Monte Carlo Go:
2b -> 9 times
2c -> 1 time

Average error of one move is
((30 – 30) * 9 + (30 - 15) * 1) / 10 = 1.5 points
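The worked arithmetic above can be expressed as a small function. The oracle scores for 2b (30) and 2c (15) are read off the slide's formula; the function and argument names are illustrative.

```python
from typing import Dict


def average_error(oracle: Dict[str, float], selections: Dict[str, int]) -> float:
    """Mean gap between the oracle's best score and the chosen move's score."""
    best = max(oracle.values())
    trials = sum(selections.values())
    lost = sum((best - oracle[move]) * count for move, count in selections.items())
    return lost / trials


if __name__ == "__main__":
    oracle = {"2b": 30, "2c": 15}        # oracle scores taken from the formula
    selections = {"2b": 9, "2c": 1}      # how often each move was selected
    print(average_error(oracle, selections))
```

A value of 0 would mean the sampled player always agrees with the oracle's best move.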

Decision Quality of Each Move (Basic)

[Figure]

Decision Quality of Each Move (with Atari50 Enhancement)

[Figure]

Summary of Experimental Results

• Additional enhancements improve strength of Monte Carlo Go
• Returns diminish eventually
• With additional enhancements, diminishing returns set in sooner
• Need to collect more samples in the early stage of a 9x9 Go game

Conclusions and Future Work

• Conclusions
  – Additional samples achieve only small improvements
    • Not like search algorithm, e.g. chess
  – Good at strategy, not tactics
    • blunder due to lack of domain knowledge
  – Easy to evaluate
  – Easy to parallelize
  – The way for Monte Carlo Go to go
    Small sample games with many enhancements will be promising
• Future Work
  – Adjust probability with pattern matching
  – Learning
  – Search + Monte Carlo Go
    • MoGo (exploration-exploitation in the search tree using UCT)
  – Scale to 19×19

Reference:

• Go wiki http://en.wikipedia.org/wiki/Go_(board_game)
• Gnu Go http://www.gnu.org/software/gnugo/
• KGS Go Server http://www.gokgs.com
• CGOS 9x9 Computer Go Server http://cgos.boardspace.net

Questions?
