From Monte-Carlo to win rate first search for “Dobutsu Shogi” 2010/05/22 IHARA Takehiro

Discussion of the Monte-Carlo search method for computer games, especially shogi (Japanese chess), presenting "win rate first search."

Abstract

• On algorithms for computer shogi (Japanese chess)

• Contents
  – Exhibition game of Dobutsu Shogi
  – Min-max method (conventional)
  – Monte-Carlo method (conventional)
  – Win rate first search (presented)

Dobutsu Shogi

• These slides discuss computer game algorithms using Dobutsu Shogi

• Dobutsu Shogi: a miniature shogi
• Shogi: Japanese chess
• Dobutsu: animal
• Normal shogi is too large for examining new methods

Rule of Dobutsu Shogi 1

• Five kinds of pieces
• The initial position is as shown in the figure
• You win if you capture the opponent's lion
• You win if your lion reaches the opposite end
• A chick promotes to a chicken

Rule of Dobutsu Shogi 2

All pieces move one step:

• Chick: forward
• Chicken: vertical, horizontal, and forward-diagonal
• Lion: any of the 8 surrounding squares
• Elephant: diagonal
• Giraffe: vertical and horizontal

You can reuse (drop) the pieces that you captured
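The movement rules above can be encoded as one-step offsets. The following is a minimal sketch, assuming a 4-row by 3-column board indexed as (row, column) with Black moving toward row 0; `STEPS` and `destinations` are hypothetical names, and drops and promotion are left out.

```python
# Step offsets (d_row, d_col) for Black, who is assumed to move toward row 0,
# so "forward" is d_row = -1. White uses the same offsets with d_row negated.
STEPS = {
    "chick":    [(-1, 0)],                                  # forward
    "chicken":  [(-1, 0), (1, 0), (0, -1), (0, 1),          # vertical, horizontal
                 (-1, -1), (-1, 1)],                        # forward-diagonal
    "lion":     [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)],                    # all 8 surrounding squares
    "elephant": [(-1, -1), (-1, 1), (1, -1), (1, 1)],       # diagonal
    "giraffe":  [(-1, 0), (1, 0), (0, -1), (0, 1)],         # vertical, horizontal
}

ROWS, COLS = 4, 3  # Dobutsu Shogi board: 4 rows x 3 columns


def destinations(piece, row, col, black=True, own=frozenset()):
    """Squares `piece` at (row, col) can step to: on the board and not occupied by `own`."""
    sign = 1 if black else -1
    result = []
    for dr, dc in STEPS[piece]:
        r, c = row + sign * dr, col + dc
        if 0 <= r < ROWS and 0 <= c < COLS and (r, c) not in own:
            result.append((r, c))
    return result


print(len(destinations("lion", 1, 1)))  # 8
print(destinations("chick", 3, 1))      # [(2, 1)]
```

Squares holding the mover's own pieces are passed in `own`; a square holding an opponent piece stays in the result, which corresponds to a capture.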

Copyright of Dobutsu Shogi

• I do not know who holds the copyright:
  – FUJITA Maiko (illustrations)
  – KITAO Madoka (rules)
  – LPSA (the association both designers had belonged to)
  – GENTOSHA Education (toy seller)

Illustrations in these slides

• Because of this complicated copyright situation, these slides use illustrations from the website below instead of FUJITA's

• “SOZAIYA JUN” (http://park18.wakwak.com/~osyare/)

Exhibition initial position

Black: win rate first search (presented)
White: min-max method, search depth 9, evaluation function composed only of piece values (conventional)

Exhibition 1st move

Black advanced giraffe

Exhibition 2nd move

White advanced giraffe

Exhibition 3rd move

Black took the chick with its chick

Exhibition 4th move

White took the chick with its elephant

Exhibition 5th move

Black advanced elephant

Exhibition 6th move

White dropped chick for defense

Exhibition 7th move

Black moved giraffe backward

Exhibition 8th move

White advanced giraffe

Exhibition 9th move

Black dropped chick for defense

Exhibition 10th move

White took the elephant with its giraffe

Exhibition 11th move

Black took the giraffe with its lion

Exhibition 12th move

White dropped an elephant

This elephant combination is a strong formation

Exhibition 13th move

Black lion escaped

Exhibition 14th move

White advanced lion

Exhibition 15th move

Black dropped a giraffe, giving check

Exhibition 16th move

White lion escaped

Exhibition 17th move

Black advanced giraffe

Black forced White to choose between taking the giraffe and moving the elephant to safety

Exhibition 18th move

White took the giraffe with its elephant

Exhibition 19th move

Black took the elephant with its lion

Exhibition 20th move

White dropped giraffe

Exhibition 21st move

Black dropped elephant behind lion

Exhibition 22nd move

White moved elephant backward

Exhibition 23rd move

Black advanced elephant

Exhibition 24th move

White checked with its giraffe

Exhibition 25th move

Black took the giraffe with its elephant

Exhibition 26th move

White took the elephant with its chick

If White had taken with its elephant instead, White would have been mated

Exhibition 27th move

Black lion escaped

Exhibition 28th move

White dropped elephant

Exhibition 29th move

Black checked with its giraffe

Exhibition 30th move

White took the giraffe with its elephant

Exhibition 31st move

Black took the chick with its lion, and White resigned

After this: White drops a giraffe beside the lion, Black's giraffe takes the elephant with check, White's lion takes it, Black's chick advances, White's lion moves backward, and Black drops a chick for checkmate

Min-max method

• A conventional method
• Currently the most successful method for shogi
• Explained with a tree structure on the following pages

Min-max example: depth 3

[Figure: search tree with the present board position at the root, board positions after 1 and 2 moves in between, and board positions after 3 moves at the leaves]

Min-max

Suppose the scores after 3 moves are known:

-4 -3 10 3 -9 5 23 -8

Min-max

The score of each position after 2 moves is the maximum of its children's scores:

After 3 moves: -4 -3 10 3 -9 5 23 -8
After 2 moves: -3 10 5 23

Min-max

The score of each position after 1 move is the minimum of its children's scores:

After 3 moves: -4 -3 10 3 -9 5 23 -8
After 2 moves: -3 10 5 23
After 1 move: -3 5

Min-max

Select the move having the maximum score:

After 3 moves: -4 -3 10 3 -9 5 23 -8
After 2 moves: -3 10 5 23
After 1 move: -3 5
Selected: 5
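The worked example above can be reproduced in a few lines. This is a minimal sketch, assuming the tree is written as nested Python lists whose innermost numbers are the scores after 3 moves; `minimax` is a hypothetical helper, not code from the slides.

```python
def minimax(node, maximizing=True):
    """Min-max value of a tree given as nested lists; numbers are leaf scores."""
    if not isinstance(node, list):
        return node  # leaf: score after the last searched move
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Scores after 3 moves, grouped as in the slides: 2 moves, 2 replies each, 2 moves each.
tree = [[[-4, -3], [10, 3]], [[-9, 5], [23, -8]]]

after_2 = [minimax(pos, True) for move in tree for pos in move]  # we pick the maximum
after_1 = [minimax(move, False) for move in tree]                # opponent picks the minimum
best_move = after_1.index(max(after_1))

print(after_2)        # [-3, 10, 5, 23]
print(after_1)        # [-3, 5]
print(minimax(tree))  # 5
print(best_move)      # 1, i.e. the second move
```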

Min-max method

• In theory, you can select the move that leads to the maximum score after N moves

• In theory, if we could obtain the scores at the end of the game, we would always play the best move (and win whenever winning is possible)

• In practice, the computational cost is too large to calculate all moves

Min-max method

• Many methods for reducing the computational cost have been proposed, but they are not covered in these slides (reducing the number of searched nodes this way is called pruning)
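For illustration only: alpha-beta pruning is one widely used pruning method (the slides do not say which methods they mean). The sketch below assumes the 3-move example tree is written as nested Python lists with leaf scores as numbers; `alphabeta` and `scored` are hypothetical names. It returns the same value 5 while scoring only 6 of the 8 leaves.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, scored=None):
    """Min-max value with alpha-beta pruning; leaves actually scored are appended to `scored`."""
    if not isinstance(node, list):
        if scored is not None:
            scored.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, scored))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizing player above will never allow this position
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, scored))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizing player above already has a better alternative
    return value


tree = [[[-4, -3], [10, 3]], [[-9, 5], [23, -8]]]  # scores after 3 moves
scored = []
root_value = alphabeta(tree, scored=scored)
print(root_value)  # 5
print(scored)      # [-4, -3, 10, -9, 5, 23]: leaves 3 and -8 are never scored
```

For example, once the first leaf under the second reply shows 10, that reply is worth at least 10 to us, so the opponent, who already has a reply worth -3, will never choose it, and leaf 3 is skipped.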

Conclusion of the min-max method

• It uses a tree structure
• Scores after N moves are needed
• Pruning is needed

Monte-Carlo method

• Although I do not know the history of the Monte-Carlo method, it has been successful for computer Go (more precisely, Monte-Carlo tree search has)

• It is said that it is still difficult to apply to computer shogi (or other chess-like games)

Outline of Monte-Carlo

• Repeat random moves

• Then the game finishes and the winner is revealed

• Playing the game to the end with random moves is called a playout

[Figure: from the first move, random moves continue to the end of the game; this sequence is a playout]

Outline of Monte-Carlo

• Repeat playouts
• Obtain the win rate of each first move:
  (number of wins) / (number of playouts)
• Finally, select the move having the highest win rate
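A minimal sketch of this flat Monte-Carlo procedure follows. To stay short it does not use Dobutsu Shogi: it assumes a hypothetical toy game (a pile of stones, each player removes 1 or 2, and whoever takes the last stone wins), and the function names and the 2000 playouts per first move are illustrative choices.

```python
import random


def legal_moves(stones):
    """Toy game (stand-in for shogi): remove 1 or 2 stones; taking the last stone wins."""
    return [m for m in (1, 2) if m <= stones]


def random_playout(stones, rng):
    """Play random moves to the end; True if the player to move now takes the last stone."""
    to_move_now = True
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return to_move_now
        to_move_now = not to_move_now


def flat_monte_carlo(stones, playouts_per_move, rng):
    """Win rate of each first move = (number of wins) / (number of playouts)."""
    win_rates = {}
    for move in legal_moves(stones):
        wins = 0
        for _ in range(playouts_per_move):
            rest = stones - move
            # We win at once if we took the last stone, otherwise if the opponent loses the playout.
            if rest == 0 or not random_playout(rest, rng):
                wins += 1
        win_rates[move] = wins / playouts_per_move
    best = max(win_rates, key=win_rates.get)  # finally, pick the highest win rate
    return best, win_rates


best, rates = flat_monte_carlo(4, 2000, random.Random(0))
print(best, rates)
```

With 4 stones, taking 1 is the winning move. Under uniformly random play its exact win rate is 0.75, against 0.5 for taking 2, so the estimates should land near those values.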

Outline of Monte-Carlo

• That is the whole outline
• For Go, this method became stronger by combining it with a tree structure, which gave Monte-Carlo tree search (not covered in these slides)

• Another improvement is that playouts choose moves using knowledge of Go instead of simple random moves

Example of Go knowledge

• Look at 3x3 squares
• Give a low probability to placing a black stone at the center of the upper figure

• Give a high probability to placing a black stone at the center of the lower figure
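Biasing playouts this way amounts to sampling each move with probability proportional to a weight instead of uniformly. Here is a minimal sketch, assuming a hypothetical `weight` function. The 3x3 patterns themselves appear only in the slide figures, so the labels and weights below are placeholders.

```python
import random


def biased_playout_move(moves, weight, rng):
    """Choose a move with probability proportional to weight(move) instead of uniformly."""
    return rng.choices(moves, weights=[weight(m) for m in moves], k=1)[0]


# Placeholder weights standing in for "low probability" and "high probability" 3x3 patterns.
PATTERN_WEIGHT = {"bad_shape": 0.1, "neutral": 1.0, "good_shape": 5.0}

rng = random.Random(1)
moves = ["bad_shape", "neutral", "good_shape"]
counts = {m: 0 for m in moves}
for _ in range(10000):
    counts[biased_playout_move(moves, PATTERN_WEIGHT.get, rng)] += 1
print(counts)  # "good_shape" is picked most often, "bad_shape" rarely
```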

Monte-Carlo for shogi

• The simple Monte-Carlo method does not work for shogi (too many bad moves appear)

• A likely cause is that in shogi only a few of all legal moves are good

• I do not want to use shogi knowledge, whether from machine learning or from manual settings

Why Monte-Carlo for shogi?

• It can determine a move from results at the end of the game, which seems beautiful

• No evaluation function and no preset knowledge are needed

Discussion: Monte-Carlo using a tree

Simple random moves lead to equal win rates for green and red

In truth, green wins and red loses; this shows the importance of a tree structure

Discussion: Monte-Carlo using a tree

Suppose you have obtained the win rates after 3 moves:

0.1 0.3 0.7 0.8 0.2 0.6 0.9 0.4

Obtain the win rates of green and red from these rates by playouts

Discussion: Monte-Carlo using a tree

Ideally, the rates equal those of the min-max method:

After 3 moves: 0.1 0.3 0.7 0.8 0.2 0.6 0.9 0.4
After 2 moves (maximum): 0.3 0.8 0.6 0.9
After 1 move (minimum): 0.3 0.6
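The gap between plain averaging and the ideal min-max values can be checked numerically. The sketch below assumes the eight win rates are grouped pairwise as in the figure. It treats an equal-weight average as what uniformly random moves down the tree would estimate. The helper names are hypothetical.

```python
# Win rates after 3 moves from the slides, grouped by parent node.
tree = [[[0.1, 0.3], [0.7, 0.8]], [[0.2, 0.6], [0.9, 0.4]]]


def minimax_rate(node, maximizing):
    """Ideal value: the player to move always picks the best child."""
    if not isinstance(node, list):
        return node
    rates = [minimax_rate(child, not maximizing) for child in node]
    return max(rates) if maximizing else min(rates)


def random_move_rate(node):
    """What uniformly random moves would give: an equal-weight average of the children."""
    if not isinstance(node, list):
        return node
    return sum(random_move_rate(child) for child in node) / len(node)


# After each first move the opponent chooses, so those nodes are minimizing.
ideal = [minimax_rate(first, maximizing=False) for first in tree]
naive = [random_move_rate(first) for first in tree]
print(ideal)  # [0.3, 0.6]
print(naive)  # close to [0.475, 0.525]
```

The averages sit close to each other, while the min-max values are clearly separated, because averaging lets weak children pull their parent's value down.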

Discussion: Monte-Carlo using a tree

• Q: How do you calculate the parent node's 0.6 from its children 0.2 and 0.6?

• A: Ignore 0.2

[Figure: parent node 0.6 with children 0.2 and 0.6]

Discussion: Monte-Carlo using a tree

• Q: How do you ignore 0.2?
• A1: Always search the node with the maximum win rate
• A2: Sometimes search nodes randomly

[Figure: parent node 0.6 with children 0.2 and 0.6]

Discussion: Monte-Carlo using a tree

Search the node that has the maximum win rate

After 3 moves: 0.1 0.3 0.7 0.8 0.2 0.6 0.9 0.4

This tactic finds the best path

Win rate first search

• Remember the win rate of each searched node
• Almost always search the node that has the maximum win rate
• Sometimes search randomly (ideally this is not needed)
• Then this algorithm finds the best move
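A minimal sketch of this selection step follows. It assumes each child's stored win rate is available through a caller-supplied function. It also assumes the occasional random search happens with a fixed probability `epsilon`; the slides give neither the mechanism nor a value, and `select_child` is a hypothetical name.

```python
import random


def select_child(children, win_rate, epsilon, rng):
    """Win rate first: usually the child with the highest stored win rate,
    but with probability `epsilon` a uniformly random child instead."""
    if rng.random() < epsilon:
        return rng.choice(children)
    return max(children, key=win_rate)


stored = {"a": 0.4, "b": 0.6, "c": 0.3}  # illustrative stored win rates
rng = random.Random(0)
print(select_child(list(stored), stored.get, epsilon=0.0, rng=rng))  # b
```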

Additional explanation

• Update the win rates at every playout
• Keep each win rate as a numerator and a denominator
• When the playout is won, add a constant to both the numerator and the denominator
• When the playout is lost, add the constant only to the denominator
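A minimal sketch of this bookkeeping follows. The slides only say "a constant number", so the step of 1 and the starting value 1/1 are assumptions, and `WinRate` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class WinRate:
    """A win rate stored as numerator / denominator and updated after each playout."""
    num: float = 1.0   # assumed starting value (1/1); not specified on the slide
    den: float = 1.0
    step: float = 1.0  # the "constant number"; its value is an assumption

    def update(self, won: bool) -> None:
        if won:
            self.num += self.step  # won: add the constant to the numerator...
        self.den += self.step      # ...and, win or lose, to the denominator

    @property
    def value(self) -> float:
        return self.num / self.den


rate = WinRate()
rate.update(won=True)   # 2/2
rate.update(won=False)  # 2/3
print(rate.value)
```

Keeping the pair rather than the ratio lets each playout be folded in with at most two additions, and a larger `step` makes the rate move away from its starting value faster.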

Problems of the presented method

• Win rates of nodes that have not been searched yet (discussed on the following pages)

• Many other issues are probably hidden, though I have not identified them

Unreached node

• A node that has not been searched has no win rate

[Figure: nodes with win rates 0.4, 0.6 and 0.3, and an unreached node]

Another win rate

• Up to this page, no shogi knowledge appears; only the graph is used

• This win rate uses shogi knowledge
• The win rate is calculated per kind of move
• For example, capturing a piece, promotion, etc.

Another win rate

• Calculate the win rate from these factors:
  – Piece position before and after the move
  – Kind of piece moving and kind of piece taken
  – Whether the position is controlled or not

• A win rate table for all combinations of these factors is prepared

• These win rates are learned by playouts; their values are not preset

Another, smaller win rate

• Another, smaller win rate table is prepared:
  – Kind of piece moving and kind of piece taken
  – Whether the position is controlled or not

• Since it is small, it learns fast
• It is used when the larger “another win rate” has not been learned yet
• If none of the three kinds of win rate has been learned, let the win rate be 1
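The lookup order on these slides can be sketched as below. Everything concrete here is an assumption: the `Move` fields, the use of `None` as the source square of a drop, and the rule that an entry counts as learned after `MIN_PLAYOUTS` playouts. `controlled` is just a flag; the slides do not say which square it refers to.

```python
from typing import NamedTuple, Optional


class Move(NamedTuple):
    src: Optional[tuple]     # square before the move (None for a drop; an assumption)
    dst: tuple               # square after the move
    piece: str               # kind of piece moving
    captured: Optional[str]  # kind of piece taken, if any
    controlled: bool         # "controlled or not" flag (exact meaning assumed)


class Stat:
    """Wins and playouts for one table entry."""
    def __init__(self):
        self.wins = 0
        self.playouts = 0

    def rate(self):
        return self.wins / self.playouts


MIN_PLAYOUTS = 10  # hypothetical threshold for an entry to count as "learned"


def large_key(m):
    return (m.src, m.dst, m.piece, m.captured, m.controlled)


def small_key(m):
    return (m.piece, m.captured, m.controlled)


def learned(stat):
    return stat is not None and stat.playouts >= MIN_PLAYOUTS


def record(m, won, large, small):
    """After a playout, update both move-kind tables for a move that was played."""
    for table, key in ((large, large_key(m)), (small, small_key(m))):
        stat = table.setdefault(key, Stat())
        stat.playouts += 1
        stat.wins += int(won)


def estimate(node_stat, m, large, small):
    """Node's own win rate if learned, else the larger table, else the smaller one, else 1."""
    for stat in (node_stat, large.get(large_key(m)), small.get(small_key(m))):
        if learned(stat):
            return stat.rate()
    return 1.0


large, small = {}, {}
m = Move((3, 1), (2, 1), "chick", "chick", True)
print(estimate(None, m, large, small))  # 1.0: nothing learned yet
for i in range(10):
    record(m, won=(i < 3), large=large, small=small)
print(estimate(None, m, large, small))  # 0.3 from the larger table
```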

Conclusion of the presented method

• The win rates of all searched nodes are remembered and learned by playouts

• In playouts, select the node that has the highest win rate (“win rate first search”)

• Sometimes select a node randomly
• If a win rate has not been learned, other win rates are used
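Putting the pieces together, here is a minimal end-to-end sketch of win rate first search. It is not the author's program. It runs on a hypothetical toy game (remove 1 or 2 stones; whoever takes the last stone wins) instead of Dobutsu Shogi. Every searched node stores a numerator and a denominator, and unreached nodes are treated as having win rate 1. The move-kind tables are omitted. `epsilon`, the step of 1 and the playout count are assumed values.

```python
import random


def legal_moves(stones):
    """Toy game (stand-in for shogi): remove 1 or 2 stones; taking the last stone wins."""
    return [m for m in (1, 2) if m <= stones]


class Node:
    def __init__(self):
        self.children = {}  # move -> Node
        self.num = 0.0      # win rate numerator, for the player who moved into this node
        self.den = 0.0      # win rate denominator

    def win_rate(self):
        return self.num / self.den if self.den > 0 else 1.0  # unreached: win rate 1


def playout(root, stones, epsilon, rng, step=1.0):
    """One playout from the root to the end of the game, then update every node on the path."""
    node, player, path = root, 0, []
    while stones > 0:
        moves = legal_moves(stones)
        for m in moves:
            node.children.setdefault(m, Node())
        if rng.random() < epsilon:
            move = rng.choice(moves)  # sometimes search randomly
        else:                         # otherwise: highest win rate first
            move = max(moves, key=lambda mv: node.children[mv].win_rate())
        node = node.children[move]
        path.append((node, player))   # remember who made this move
        stones -= move
        player = 1 - player
    winner = 1 - player               # the player who took the last stone
    for visited, mover in path:
        if mover == winner:
            visited.num += step       # win: numerator and denominator
        visited.den += step           # lose: denominator only


def win_rate_first_search(stones, playouts=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    root = Node()
    for _ in range(playouts):
        playout(root, stones, epsilon, rng)
    return max(root.children, key=lambda mv: root.children[mv].win_rate())


print(win_rate_first_search(4))  # 1: leaves 3 stones, a lost position for the opponent
```

Because unreached nodes default to 1, every untried move is tried before learned rates take over. The occasional random choice is what lets a move that lost its first playouts be searched again; without it, a move stuck at 0/1 would never be revisited in this toy.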

Conditions of the simulated games

• Win rate first search vs. a simple min-max method (evaluation function composed only of piece values)

• If a game reaches 80 moves, it is counted as even (a special rule for this simulation)

Results of simulation 1

Number of playouts            10000    30000    100000
Presented method as Black     22-76    44-52    48-49
Presented method as White     16-81    30-68    61-35

• Win-loss record of the presented method in 100 games (some games were even)
• Depth of the min-max method is 6
• The more playouts, the stronger the method

Results of simulation 2

Depth of min-max              4       5       6       7       8       9
Presented method as Black     94-6    77-20   48-49   37-61   24-73   14-85
Presented method as White     78-21   78-20   61-35   38-57   40-52   20-74

• Win-loss record of the presented method in 100 games (some games were even)
• The presented method uses 100000 playouts
• It is roughly as strong as depth-6 min-max

Impressions of a human viewer

• The presented method frequently plays bad moves

• Although it is a variant of the Monte-Carlo method, it can find mating sequences

• It is good at finding narrow routes
• Differences in the number of playouts clearly show up as differences in strength

Conclusion and future issues

• Conclusion
  – Playouts by win rate first
  – Moves selected without preset knowledge
  – Moves selected by playout results

• Future
  – Someone could apply it to Go or other chess-like games
  – I will return to my research on speech signal processing