Monte Carlo Go Has a Way to Go
Haruhiro Yoshimoto (*1), Kazuki Yoshizoe (*1), Tomoyuki Kaneko (*1), Akihiro Kishimoto (*2), Kenjiro Taura (*1)
(*1) University of Tokyo, (*2) Future University Hakodate
Adapted from the slides presented at AAAI 2006
Games in AI
• Ideal test bed for AI research
– Clear results
– Clear motivation
– Good challenge
• Success in search-based approach
– chess (1997, Deep Blue)
– and others
• Not successful in the game of Go
– Go is to Chess as Poetry is to Double-entry accounting
– It goes to the core of artificial intelligence, which involves the study of learning and decision-making, strategic thinking, knowledge representation, pattern recognition and, perhaps most intriguingly, intuition
The game of Go
• A 4,000-year-old board game from China
• Standard size 19×19
• Two players, Black and White, place stones in turn
• Stones cannot be moved, but can be captured and taken off the board
• Larger territory wins
Terminology of Go
Block - connected stones of the same color
Liberty - adjacent empty intersection
Captured - when no liberty is available
Eye - surrounded region providing one or more safe liberties
Playing Strength
[Figure: rank scale running from weak to strong: kyu (student) ranks 20, 15, 10, 5, 4, 3, 2, 1, then dan ranks 1 to 9, with labels for master and professional levels, handicap stones, and the range reached by computer programs]
A $1.2M prize was set for beating a professional with no handicap (expired!)
Handtalk claimed $7,700 in 1997 for winning an 11-stone handicap match against an 8- to 9-year-old master
Difficulties in Computer Go
• Large search space
– the game becomes progressively more complex, at least for the first 100 ply

                   Chess    Go
Board size         8×8      19×19
Depth              ~80      ~300
Branching factor   35       235
Search space       10^40    10^170
Difficulties in Computer Go
• Lack of a good evaluation function
– a material advantage does not mean a simple way to victory, and may just mean that short-term gain has been given priority
– there are around 150–250 legal moves, of which usually <50 (even <10) are acceptable, but computers have a hard time distinguishing them
• Very high degree of pattern recognition involved in the human capacity to play well
Why Monte Carlo Go?
• Success in other domains: Bridge [Ginsberg:1999], Poker [Billings et al.:2002]
• Reasonable position evaluation based on sampling: search space from O(b^d) to O(Nbd)
• Easy to parallelize
• Can win against search-based approach
– Crazy Stone won the 11th Computer Olympiad in 9x9 Go
– MoGo: 19th and 20th KGS 9x9 winner, rated highest on CGOS
Replace evaluation function by random sampling [Brugmann:1993, Bouzy:2003]
Basic idea of Monte Carlo Go
• Generate next moves by 1-ply search
• Play a number of random games and compute the expected score
• Choose the move with the maximal score
• The only domain-dependent information is the eye.
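The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the callables `legal_moves`, `play`, and `playout_score` are hypothetical stand-ins for a real Go engine (move generation, move application, and one random game played to the end, with eye-filling moves excluded).

```python
import random
from statistics import mean
from typing import Callable, Dict, Hashable, Iterable, Tuple, TypeVar

State = TypeVar("State")
Move = Hashable


def monte_carlo_move(
    state: State,
    legal_moves: Callable[[State], Iterable[Move]],        # hypothetical 1-ply move generator
    play: Callable[[State, Move], State],                  # hypothetical: apply a move
    playout_score: Callable[[State, random.Random], float],  # hypothetical: score of one random game
    n_samples: int,
    rng: random.Random,
) -> Tuple[Move, Dict[Move, float]]:
    """Pick the move whose random games give the highest average score."""
    averages: Dict[Move, float] = {}
    for move in legal_moves(state):
        child = play(state, move)
        # Play n_samples random games from the position after this move.
        averages[move] = mean(playout_score(child, rng) for _ in range(n_samples))
    best = max(averages, key=averages.get)
    return best, averages
```

The function is deliberately independent of the rules of Go: all domain knowledge lives in the three callables, which mirrors the point that the eye rule is the only Go-specific information needed.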
Terminal Position of Go
Larger territory wins
Territory = surrounded area + stones
▲ Black's territory is 36 points
× White's territory is 45 points
White wins by 9 points
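The rule "territory = surrounded area + stones" can be sketched with a flood fill. This is an illustrative sketch only: the board encoding (a list of strings using `B`, `W`, and `.`) is an assumption made for the example, and the code assumes a finished position with dead stones already removed.

```python
from typing import List, Tuple


def area_score(board: List[str]) -> Tuple[int, int]:
    """Return (black_points, white_points): stones plus surrounded empty area."""
    rows, cols = len(board), len(board[0])
    points = {"B": 0, "W": 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            cell = board[r][c]
            if cell in points:
                points[cell] += 1          # every stone counts one point
            elif (r, c) not in seen:
                # Flood-fill the empty region and record bordering colors.
                region_size, borders = 0, set()
                stack = [(r, c)]
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    region_size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols:
                            neighbor = board[ny][nx]
                            if neighbor == ".":
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    stack.append((ny, nx))
                            else:
                                borders.add(neighbor)
                # A region belongs to a color only if that color alone borders it.
                if len(borders) == 1:
                    points[borders.pop()] += region_size
    return points["B"], points["W"]
```

On a 9×9 board the two totals sum to at most 81, which matches the 36 + 45 split in the slide's example.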
Example
• Play many sample games
– Each player plays randomly
• Compute average points for each move
• Select the move that has the highest average
[Figure: after move A the rest of the game is played randomly; one sample game ends in a 9-point win for Black, another in a 5-point win for Black]
move A: (5 + 9) / 2 = 7 points
Monte Carlo Go and Sample Size
• Can reduce statistical errors with additional samples
• Relationship between sample size and strength has not yet been investigated
– Sampling error ~ 1/√N
– N: # of random games
• Diminishing returns must appear
[Figure: Monte Carlo with 1000 sample games, stronger than Monte Carlo with 100 sample games]
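To see why diminishing returns are expected, recall that the standard error of an average over N independent random games shrinks like 1/√N. The sketch below makes the arithmetic concrete; the per-game score spread `sigma` of 20 points is a made-up value for illustration, not a figure from the experiments.

```python
import math


def standard_error(sigma: float, n_games: int) -> float:
    """Standard error of the mean score over n_games independent random games."""
    return sigma / math.sqrt(n_games)


# Hypothetical spread of 20 points per random game (illustrative assumption).
for n in (100, 400, 1600):
    print(n, standard_error(20.0, n))
```

Each fourfold increase in the number of sample games only halves the error, so every further gain in accuracy costs four times as much computation as the previous one.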
Our Monte Carlo Go Implementation
• basic Monte Carlo Go
• atari-50 enhancement: utilization of simple Go knowledge in move selection
• progressive pruning [Bouzy 2003]: statistical move pruning in simulations
Atari-50 Enhancement
• Basic Monte Carlo: assign uniform probability to each move in a sample game (no eye filling)
• Atari-50: higher probability for capture moves
– Capture is "mostly" a good move
– 50%
[Figure: move A captures black stones]
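One possible reading of the atari-50 rule is sketched below: during a sample game, if any capturing move exists, play one with probability 50%, and otherwise fall back to the uniform choice of basic Monte Carlo. The slide does not spell out the exact probability assignment, so this interpretation, the function name `atari50_choice`, and its parameters are assumptions made for illustration.

```python
import random
from typing import Sequence, TypeVar

M = TypeVar("M")


def atari50_choice(
    legal: Sequence[M],
    captures: Sequence[M],
    rng: random.Random,
    p_capture: float = 0.5,
) -> M:
    """Choose a playout move, favoring captures (assumed reading of atari-50).

    legal:    all legal, non-eye-filling moves (must be non-empty)
    captures: the subset of legal moves that capture stones (may be empty)
    """
    if captures and rng.random() < p_capture:
        return rng.choice(captures)   # biased branch: play a capture
    return rng.choice(legal)          # basic Monte Carlo: uniform choice
```

With four legal moves of which one is a capture, this scheme plays the capture about 62.5% of the time (0.5 + 0.5 × 1/4) instead of 25% under the uniform policy.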
Progressive Pruning [Bouzy2003]
• Try sampling with smaller sample size
• Prune statistically inferior moves
• Can assign more sample games to promising moves
[Figure: score plotted for each move]
Experimental Design
• Machine
– Intel Xeon Dual CPU at 2.40 GHz with 2 GB memory
– Use 64 PCs (128 processors) connected by 1GB/s network
• Three versions of programs
– BASIC: Basic Monte Carlo Go
– ATARI: BASIC + Atari-50 enhancement
– ATARIPP: ATARI + Progressive Pruning
• Experiments
– 200 self-play games
– Analysis of decision quality from 58 professional games
Diminishing Returns
4*N samples vs N samples for each move
Additional enhancements and Winning Percentage
Decision Quality of Each Move
[Figure: 3×3 example board (columns a to c, rows 1 to 3) showing the evaluation score of the "Oracle" (64 million sample games) for each point, and the moves selected by Monte Carlo Go with 100 sample games]
Selected moves: 2b -> 9 times, 2c -> 1 time
Average error of one move is ((30 - 30) * 9 + (30 - 15) * 1) / 10 = 1.5 points
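The average-error measure in this example can be written as a short function. The names `average_error`, `best_score`, and `selections` are chosen here for illustration; the numbers in the usage line are the ones from the slide (the oracle's best move scores 30, and the move chosen once scores 15).

```python
from typing import Iterable, Tuple


def average_error(best_score: float, selections: Iterable[Tuple[float, int]]) -> float:
    """Mean oracle-score loss per decision.

    selections: (oracle score of the selected move, number of times selected)
    """
    pairs = list(selections)
    total = sum(times for _, times in pairs)
    lost = sum((best_score - score) * times for score, times in pairs)
    return lost / total


# Best move (30 points) chosen 9 times, a 15-point move chosen once.
print(average_error(30, [(30, 9), (15, 1)]))
```

An error of zero means the sampled player always agreed with the oracle's best move; larger values mean more points are given up per decision on average.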
Decision Quality of Each Move (Basic)
Decision Quality of Each Move (with Atari-50 Enhancement)
Summary of Experimental Results
• Additional enhancements improve the strength of Monte Carlo Go
• Returns diminish eventually
• Additional enhancements reach diminishing returns more quickly
• Need to collect more samples in the early stage of a 9x9 Go game
Conclusions and Future Work
• Conclusions
– Additional samples achieve only small improvements
  • Unlike search algorithms, e.g. in chess
– Good at strategy, not tactics
  • blunders due to lack of domain knowledge
– Easy to evaluate
– Easy to parallelize
– The way for Monte Carlo Go to go: small sample games with many enhancements will be promising
• Future Work
– Adjust probability with pattern matching
– Learning
– Search + Monte Carlo Go
  • MoGo (exploration-exploitation in the search tree using UCT)
– Scale to 19×19
Reference:
• Go wiki http://en.wikipedia.org/wiki/Go_(board_game)
• Gnu Go http://www.gnu.org/software/gnugo/
• KGS Go Server http://www.gokgs.com
• CGOS 9x9 Computer Go Server http://cgos.boardspace.net
Questions?