

Monte-Carlo Tree Search in Crazy Stone

Rémi Coulom

Université Charles de Gaulle, INRIA, CNRS, Lille, France

November 8th-10th, 2007

UEC and 12th Game Programming Workshop, Japan


Talk Outline

1 Introduction

2 Crazy Stone’s Algorithm

    Principles of Monte-Carlo Evaluation

    Tree Search

    Patterns

3 Playing Style

4 Conclusion


A New Approach to Go

The Challenge of Go

strongest programs weaker than amateur humans

Difficulty of Position Evaluation

has to be dynamic

unlike quiescence search + static evaluation of western chess

local search lacks global understanding

The Monte-Carlo Approach

random playouts

dynamic evaluation with global understanding


The Monte-Carlo Revolution: Pioneers

1993: Bernd Brügmann (Gobble)

Not considered seriously

2000-2005: The Paris School

Bernard Helmstetter (Oleg)

Tristan Cazenave (Golois)

Bruno Bouzy (Indigo)

Guillaume Chaslot (Mango), joined in 2005


The Monte-Carlo Revolution: Success

2006: Success on small boards

Crazy Stone wins 9×9 Computer Olympiad

Viking (Magnus Persson), then Crazy Stone, then MoGo (Yizao Wang and Sylvain Gelly) lead 9×9 CGOS

2007: Success on all boards

MoGo wins 19×19 Computer Olympiad

Steenvreter (Erik van der Werf) wins 9×9

Crazy Stone beats KCC Igo with a score of 15-4 on 19×19


Principle: Random Playouts

One Playout

Play at random

Don’t fill up eyes

Position Evaluation

Run many playouts

Average them
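
As a minimal sketch of the evaluation principle above, the following Python code averages the outcomes of uniformly random playouts. A Go board is too long for a short example, so a toy take-away game (take 1 to 3 stones, whoever takes the last stone wins) stands in for Go. The game, the function names, and the playout counts are assumptions made for illustration only, and the "don't fill up eyes" rule has no counterpart in this toy game.

```python
import random

def random_playout(stones, rng):
    """Play one uniformly random game; return 1 if the player to move wins, else 0."""
    player = 0  # 0 = the player to move in the position being evaluated
    winner = 1  # with no stones left, the player to move has already lost
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))  # random legal move
        if stones == 0:
            winner = player  # whoever takes the last stone wins
        player = 1 - player
    return 1 if winner == 0 else 0

def evaluate(stones, n_playouts, rng):
    """Monte-Carlo evaluation: run many playouts and average them."""
    wins = sum(random_playout(stones, rng) for _ in range(n_playouts))
    return wins / n_playouts
```

Calling `evaluate(10, 10_000, random.Random(0))` returns an estimated winning rate between 0 and 1 for the player to move with 10 stones left.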


Move-Selection Method

[Figure: three candidate moves with playout results 4/10, 3/10, 9/10]

Algorithm

N playouts for every move

pick the best winning rate

Cost

error shrinks like $1/\sqrt{N}$

0.01 precision requires ∼10,000 playouts
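
The move-selection rule and its cost can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and the precision figure is read as roughly two standard errors of a binomial winning rate, which is one plausible interpretation rather than something the slides spell out.

```python
import math

def pick_best(results):
    """results: list of (wins, playouts) per move; return the index of the best winning rate."""
    return max(range(len(results)), key=lambda i: results[i][0] / results[i][1])

def standard_error(p, n):
    """Standard error of a winning rate p estimated from n independent playouts."""
    return math.sqrt(p * (1.0 - p) / n)

# Playout counts taken from the slide's figure: 4/10, 3/10, 9/10.
best = pick_best([(4, 10), (3, 10), (9, 10)])

# Worst case p = 0.5: the standard error is 0.5 / sqrt(N).
worst_case = standard_error(0.5, 10_000)
```

With 10,000 playouts the worst-case standard error is about 0.005, so two standard errors give about 0.01. Halving the error again would take four times as many playouts.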


Efficient Playout Allocation

[Figure: three candidate moves with playout results 4/9, 2/6, 14/15]

Idea

more playouts to best moves

UCB: Upper Confidence Bound

$\mathrm{UCB}_i = \frac{W_i}{N_i} + c\sqrt{\frac{\log t}{N_i}}$

$W_i$: wins (move $i$)

$N_i$: playouts (move $i$)

$c$: exploration parameter

$t$: playouts (all moves)
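
The UCB rule can be sketched in Python as follows, using the counts from the figure (4/9, 2/6, 14/15, so 30 playouts in total). The function names and the values chosen for the exploration parameter are illustrative assumptions. The sketch also assumes every move has been tried at least once, so that no division by zero occurs.

```python
import math

def ucb_scores(wins, playouts, c=1.0):
    """Winning rate plus exploration bonus for each move (every N_i must be >= 1)."""
    t = sum(playouts)  # playouts over all moves
    return [w / n + c * math.sqrt(math.log(t) / n) for w, n in zip(wins, playouts)]

def select_move(wins, playouts, c=1.0):
    """Index of the move with the highest upper confidence bound."""
    scores = ucb_scores(wins, playouts, c)
    return max(range(len(scores)), key=scores.__getitem__)

wins, playouts = [4, 2, 14], [9, 6, 15]
greedy_choice = select_move(wins, playouts, c=1.0)       # favours the 14/15 move
exploratory_choice = select_move(wins, playouts, c=5.0)  # favours the least-tried move
```

A small `c` concentrates playouts on the move with the best winning rate, while a large `c` pushes playouts toward moves that have been tried least.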


Recursive Tree Search: UCT

[Figure: playout results 3/9, 2/6, 9/15]

Apply UCB to every position visited more than $N_0$ times

No min-max backup: back up the average outcome

Proved convergence to the min-max value

Best-first tree growth
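
The points above can be illustrated with a compact UCT sketch in Python. It is not Crazy Stone's implementation: a toy take-away game (take 1 to 3 stones, last stone wins) replaces Go, and the class name, function names, and the default values of `c`, `n0`, and the simulation count are all assumptions chosen for illustration.

```python
import math
import random

class Node:
    def __init__(self, stones):
        self.stones = stones   # stones left; the player to move takes 1 to 3
        self.visits = 0
        self.wins = 0.0        # wins for the player who moved into this position
        self.children = {}     # move -> Node, created once UCB is applied here

def random_playout(stones, rng):
    """Return 1.0 if the player to move wins a uniformly random game, else 0.0."""
    player, winner = 0, 1      # with no stones left, the player to move has lost
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            winner = player
        player = 1 - player
    return 1.0 if winner == 0 else 0.0

def simulate(node, rng, c, n0):
    """Run one simulation; return the outcome for the player to move at `node`."""
    if node.stones == 0:
        result = 0.0                                   # terminal: player to move lost
    elif node.visits <= n0:
        result = random_playout(node.stones, rng)      # not visited often enough yet
    else:
        if not node.children:                          # best-first tree growth
            for m in range(1, min(3, node.stones) + 1):
                node.children[m] = Node(node.stones - m)
        log_t = math.log(node.visits)
        def ucb(child):
            if child.visits == 0:
                return math.inf                        # try every move once
            return child.wins / child.visits + c * math.sqrt(log_t / child.visits)
        child = max(node.children.values(), key=ucb)
        result = 1.0 - simulate(child, rng, c, n0)     # opponent's loss is our win
    node.visits += 1
    node.wins += 1.0 - result  # average backup, no min-max operator
    return result

def best_move(stones, n_simulations=5000, c=1.0, n0=5, seed=0):
    """Search from `stones` and return the most-visited move at the root."""
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(n_simulations):
        simulate(root, rng, c, n0)
    return max(root.children, key=lambda m: root.children[m].visits)
```

In this toy game the winning strategy is to leave the opponent a multiple of 4 stones, which gives a simple check that the averaged values steer the search toward the min-max move.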


Efficiency of Tree Search

Successes

gold in the Turin Olympiad on 9×9

9×9 level on KGS: about 10k

strength scales with thinking time

only domain knowledge: don’t fill eyes, and extend when in atari

Limits

Not deep enough, even on 9×9

Too many moves on 19×19

19×19 level on KGS: about 30k


Patterns

learnt from human games

Combine several features:

    shape (surrounding stones)

    distance to previous move

    capture, extension, ...

Probability distribution over moves

Used in playouts

[Figure legend: high probability / low probability]
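
One way to turn combined features into a probability distribution over moves is to multiply a weight per feature and normalise. The slides do not state which model Crazy Stone uses, so the multiplicative rule below is an assumption, and the feature names, weights, and function names are hypothetical placeholders rather than learnt values.

```python
import random

# Hypothetical feature weights; in the slides these are learnt from human games.
WEIGHTS = {"capture": 30.0, "extension": 10.0, "near_previous_move": 4.0, "good_shape": 2.0}

def move_strength(features, weights):
    """Combine a move's features by multiplying their weights (assumed model)."""
    strength = 1.0
    for feature in features:
        strength *= weights.get(feature, 1.0)  # unknown features are neutral
    return strength

def move_probabilities(candidates, weights):
    """candidates: move -> list of features. Returns move -> probability."""
    strengths = {m: move_strength(fs, weights) for m, fs in candidates.items()}
    total = sum(strengths.values())
    return {m: s / total for m, s in strengths.items()}

def sample_move(candidates, weights, rng):
    """Draw one playout move according to the pattern probabilities."""
    probs = move_probabilities(candidates, weights)
    moves = list(probs)
    return rng.choices(moves, weights=[probs[m] for m in moves], k=1)[0]

candidates = {"A": ["capture"], "B": ["good_shape", "near_previous_move"], "C": []}
probs = move_probabilities(candidates, WEIGHTS)
```

In this example the strengths are 30, 8, and 1, so move A is drawn in about 77% of playouts while the featureless move C is still occasionally tried.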


Random playout with patterns


Comparison 1

[Figure panels: no patterns | patterns]


Comparison 2

[Figure panels: no patterns | patterns]


Progressive Widening

Sort moves with patterns

Keep best moves only

Progressively add more
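
The three steps above can be sketched in Python. The slides do not give the schedule by which moves are added, so the geometric schedule here (`first_step`, `growth`) is a hypothetical choice, as are the function names and the example scores.

```python
def allowed_width(visits, first_step=10, growth=2.0):
    """Number of moves searched after `visits` simulations (hypothetical schedule).

    One more move is unlocked at first_step, first_step * growth, and so on.
    `growth` must be greater than 1 for the loop to terminate.
    """
    width, threshold = 1, float(first_step)
    while visits >= threshold:
        width += 1
        threshold *= growth
    return width

def candidate_moves(pattern_scores, visits, first_step=10, growth=2.0):
    """Sort moves by pattern score and keep only the best ones for this visit count."""
    ranked = sorted(pattern_scores, key=pattern_scores.get, reverse=True)
    return ranked[:allowed_width(visits, first_step, growth)]

scores = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
early = candidate_moves(scores, visits=0)    # only the top-ranked move
later = candidate_moves(scores, visits=25)   # the top three moves
```

Rarely visited positions search only the moves the patterns rank highest, and weaker-looking moves join the search as the visit count grows, so no move is excluded forever.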


Playing Strength

Stronger than classical programs on 19×19

Ranked 2k on KGS


Crazy Fuseki

[Figure: game between MoGo and Crazy Stone]


Play in the Center

[Figure: game between GNU Go and Crazy Stone]


Win by 0.5, Lose by a lot

[Figure: game between Crazy Stone and Jimmy]


Speculative Attacks: Provoke Opponent Blunder

[Figure: game between Go Intellect and Crazy Stone]


Speculative Attacks: Another Tricky Move

[Figure: game between Miel (human) and Crazy Stone]


Ugly Blunder

[Figure: game between Crazy Stone and a human]


Future of Monte-Carlo Search

Improving Crazy Stone further

More knowledge: playouts + progressive widening

Adaptive playouts


Adaptive playouts

adaptive UCB policy

static playout policy

Interesting ideas in RLGO (David Silver)


Future of Monte-Carlo Search

Application to Other Domains

Other games (Hex, Clobber)

Automated book learning (for chess?)

Automated Planning in general


If You Wish to Know More

http://remi.coulom.free.fr/Hakone2007/

Download these slides

Download papers

Connect to KGS and play against Crazy Stone
