win rate first search

From Monte-Carlo to win rate first search for “Dobutsu Shogi”

2010/05/22, IHARA Takehiro

Abstract

• On algorithms for computer Shogi (Japanese chess)

• Contents
  – Exhibition of Dobutsu Shogi
  – Min-max method (conventional)
  – Monte-Carlo method (conventional)
  – Win rate first search (presented)

Dobutsu Shogi

• These slides discuss computer game algorithms, using Dobutsu Shogi as the example

• Dobutsu Shogi: a miniature shogi
• Shogi: Japanese chess
• Dobutsu: animal
• Normal shogi is too large for examining new methods

Rules of Dobutsu Shogi 1

Five kinds of pieces
Initial position as shown in the figure
You win if you capture the opponent's lion
You win if your lion reaches the opposite end

A chick promotes to a chicken

Rules of Dobutsu Shogi 2

All pieces move one step:
  – Chick: forward
  – Chicken: vertically, horizontally, or diagonally forward
  – Lion: to any of the 8 surrounding squares
  – Elephant: diagonally
  – Giraffe: vertically or horizontally

You can reuse (drop) pieces that you have captured

Copyright of Dobutsu Shogi

• I do not know who holds the copyright
  – FUJITA Maiko (illustrations)
  – KITAO Madoka (rules)
  – LPSA (the organization both designers belonged to)
  – GENTOSHA Education (toy seller)

Illustrations in these slides

• Because the copyright situation is complex, these slides use illustrations from the website below instead of FUJITA's

• “SOZAIYA JUN” (http://park18.wakwak.com/~osyare/)

Exhibition initial position

• Black: win rate first search (presented)
• White: min-max method, search depth 9, evaluation function using piece values only (conventional)

Exhibition 1st move

Black advanced giraffe

Exhibition 2nd move

White advanced giraffe

Exhibition 3rd move

Black took chick by chick

Exhibition 4th move

White took chick by elephant

Exhibition 5th move

Black advanced elephant

Exhibition 6th move

White dropped chick for defense

Exhibition 7th move

Black moved giraffe backward

Exhibition 8th move

White advanced giraffe

Exhibition 9th move

Black dropped chick for defense

Exhibition 10th move

White took elephant by giraffe


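Exhibition 11th move

Black took giraffe by lion

Exhibition 12th move

White dropped elephant
This elephant combination is a strong formation

Exhibition 13th move

Black lion escaped

Exhibition 14th move

White advanced lion

Exhibition 15th move

Black dropped giraffe with check

Exhibition 16th move

White lion escaped

Exhibition 17th move

Black advanced giraffe
Black forced White to choose between taking the giraffe and moving the elephant away

Exhibition 18th move

White took giraffe by elephant

Exhibition 19th move

Black took elephant by lion

Exhibition 20th move

White dropped giraffe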

Exhibition 21st move

Black dropped elephant behind lion

Exhibition 22nd move

White moved elephant backward

Exhibition 23rd move

Black advanced elephant


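Exhibition 24th move

White checked with giraffe

Exhibition 25th move

Black took giraffe by elephant

Exhibition 26th move

White took elephant by chick
If White had taken with the elephant instead, White would have been mated

Exhibition 27th move

Black lion escaped

Exhibition 28th move

White dropped elephant

Exhibition 29th move

Black checked with giraffe

Exhibition 30th move

White took giraffe by elephant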

Exhibition 31st move

Black took chick by lion, and White resigned
Expected continuation: White drops a giraffe beside the lion, Black's giraffe takes the elephant with check, White's lion takes it, Black's chick advances, White's lion moves backward, Black drops a chick: checkmate

Min-max method

• A conventional method
• Today the most successful method for shogi
• Explained with a tree structure on the following pages

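Min-max example: depth 3

(Figure: a tree from the present board position, through the positions after 1 and 2 moves, to the positions after 3 moves)

Min-max

Suppose the scores after 3 moves are known:
after 3 moves: -4 -3 10 3 -9 5 23 -8

Min-max

Scores after 2 moves are the maximum of each pair of children, because it is our turn to move there:
after 3 moves: -4 -3 10 3 -9 5 23 -8
after 2 moves: -3 10 5 23

Min-max

Scores after 1 move are the minimum of each pair of children, because it is the opponent's turn to move there:
after 3 moves: -4 -3 10 3 -9 5 23 -8
after 2 moves: -3 10 5 23
after 1 move: -3 5

Min-max

Select the move having the maximum score:
after 1 move: -3 5
selected: 5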
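The three steps above amount to a short recursion. A minimal sketch, assuming the tree is stored as nested lists with the scores after 3 moves as leaves; the names `minimax` and `best_move` are hypothetical:

```python
def minimax(node, maximizing):
    """Min-max value of a tree of nested lists whose leaves are scores."""
    if not isinstance(node, list):
        return node  # leaf: score after N moves
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(tree):
    """Index of the first move with the maximum min-max score.

    After our first move it is the opponent's turn, so each subtree is minimized.
    """
    return max(range(len(tree)), key=lambda i: minimax(tree[i], maximizing=False))


# The depth-3 example: two moves at each position, scores after 3 moves as leaves.
tree = [[[-4, -3], [10, 3]], [[-9, 5], [23, -8]]]
print(minimax(tree, maximizing=True))  # 5
print(best_move(tree))                 # 1 (the second move)
```

This visits every leaf, on the order of b^N positions for b moves per position, which is why practical programs rely on pruning.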

Min-max method

• In theory, you can select the move that has the maximum score after N moves

• In theory, if we could obtain the score at the end of the game, we would always select the best move

• In practice, the computational cost is too large to search all moves

Min-max method

• Many methods for reducing the computational cost have been proposed, but they are not covered in these slides (reducing the number of searched nodes is called pruning)

Conclusion of min-max method

• It uses a tree structure
• Scores after N moves are needed
• Pruning is needed

Monte-Carlo method

• Although I do not know the history of the Monte-Carlo method, it has been successful for computer “Go” (more precisely, Monte-Carlo tree search has been successful)

• It is said to be still difficult to apply to computer shogi (or other chess-like games)

Outline of Monte-Carlo

• Repeat random moves

• Then the game ends and the winner is known

• Playing the game to its end with random moves is called a playout

(Figure: from the first move, random moves continue until the end of the game; this is one playout)


• Repeat playouts
• Obtain the win rate of each first move: (number of wins) / (number of playouts)
• Finally, select the move with the highest win rate
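The outline can be sketched in a few lines. Shogi is too large for a short example, so this sketch swaps in a hypothetical toy game: a pile of stones, each turn removes 1 to 3 stones, and whoever takes the last stone wins. The function names and the number of playouts are assumptions for illustration, not the author's code.

```python
import random


def legal_moves(stones):
    """Hypothetical toy game: remove 1, 2, or 3 stones (no more than remain)."""
    return [n for n in (1, 2, 3) if n <= stones]


def playout(stones, rng):
    """Play random moves to the end; True if the player to move at the start wins."""
    first_player_to_move = True
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return first_player_to_move  # whoever took the last stone wins
        first_player_to_move = not first_player_to_move


def flat_monte_carlo(stones, playouts_per_move, rng):
    """Win rate of each first move = wins / playouts; pick the highest."""
    rates = {}
    for move in legal_moves(stones):
        wins = 0
        for _ in range(playouts_per_move):
            rest = stones - move
            # We win if we took the last stone, or if the opponent
            # (who moves first in the playout) loses the playout.
            if rest == 0 or not playout(rest, rng):
                wins += 1
        rates[move] = wins / playouts_per_move
    best = max(rates, key=rates.get)
    return best, rates


best, rates = flat_monte_carlo(5, 2000, random.Random(0))
print(best, rates)
```

From 5 stones the win rates come out near 2/3 for taking 1 stone and near 1/2 for the other moves, so the sketch picks 1, which happens to be the correct move in this toy game (it leaves a multiple of 4).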


• That is the whole outline
• For “Go”, this method became stronger by combining it with a tree structure, giving Monte-Carlo tree search (not covered in these slides)

• Another improvement is that playouts choose moves using knowledge of “Go” instead of simple random moves

Example of knowledge of “Go”

• Look at a 3x3 area of the board
• Give a low probability to playing a black stone at the center of the upper figure

• Give a high probability to playing a black stone at the center of the lower figure
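Pattern knowledge of this kind changes only how a playout picks its next move. A minimal sketch, assuming a board stored as strings ('B' black, 'W' white, '.' empty, '#' for off-board squares in patterns). The pattern table and its weights are hypothetical, because the figures with the actual patterns are not reproduced here.

```python
import random

# Hypothetical weights keyed by the 3x3 area around a candidate point
# (the centre of each pattern is the candidate point itself).
PATTERN_WEIGHTS = {
    ("BBB", "B.B", "BBB"): 0.05,  # low: black would fill its own surrounded point
    ("...", "W.W", "..."): 3.0,   # high: hypothetical pattern, illustrative only
}
DEFAULT_WEIGHT = 1.0


def area3x3(board, r, c):
    """3x3 area around (r, c); squares off the board are written as '#'."""
    rows = []
    for dr in (-1, 0, 1):
        row = ""
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < len(board) and 0 <= cc < len(board[0])
            row += board[rr][cc] if inside else "#"
        rows.append(row)
    return tuple(rows)


def playout_move(board, candidates, rng):
    """Pick a black move at random, weighted by the pattern around each point."""
    weights = [PATTERN_WEIGHTS.get(area3x3(board, r, c), DEFAULT_WEIGHT) for r, c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


board = [
    ".....",
    ".BBB.",
    ".B.B.",
    ".BBB.",
    ".....",
]
rng = random.Random(0)
picks = [playout_move(board, [(2, 2), (0, 0)], rng) for _ in range(1000)]
print(picks.count((2, 2)), picks.count((0, 0)))  # the surrounded point is rarely chosen
```

Unlisted patterns keep the default weight, so the playout stays random except where knowledge says otherwise.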

Monte-Carlo for shogi

• The simple Monte-Carlo method does not work for shogi (too many bad moves appear)

• A likely cause is that, in shogi, only a few of all legal moves are good

• I do not want to use shogi knowledge, whether obtained by machine learning or set manually

Why Monte-Carlo for shogi?

• It determines moves from results at the end of the game, which seems beautiful

• No evaluation function and no preset knowledge are needed

Discussion: Monte-Carlo using a tree

Simple random moves give green and red equal win rates

In truth, green wins and red loses
This shows the importance of the tree structure


Suppose you obtain the win rates after 3 moves:

0.1 0.3 0.7 0.8 0.2 0.6 0.9 0.4

Obtain the win rates of green and red from these playout rates after 3 moves

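Discussion: Monte-Carlo using a tree
Ideally, the rates are equal to those of the min-max method

after 3 moves: 0.1 0.3 0.7 0.8 0.2 0.6 0.9 0.4

after 2 moves (maximum of each pair): 0.3 0.8 0.6 0.9

after 1 move (minimum of each pair): 0.3 0.6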
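The difference between plainly averaging playout results and min-max backup can be checked on the numbers above. A minimal sketch, assuming the first move owns the first four rates and the second move owns the last four (which of the two is green and which is red is not stated here):

```python
from statistics import mean

# Win rates after 3 moves, in the order shown above.
leaves = [0.1, 0.3, 0.7, 0.8, 0.2, 0.6, 0.9, 0.4]


def pairs(values):
    """Group sibling values two by two (two moves per position)."""
    return [values[i:i + 2] for i in range(0, len(values), 2)]


# Plain averaging: every leaf under a first move counts equally.
averaged = [mean(leaves[:4]), mean(leaves[4:])]

# Min-max backup: our turn after 2 moves (max), opponent's turn after 1 move (min).
after_two = [max(p) for p in pairs(leaves)]
after_one = [min(p) for p in pairs(after_two)]

print(averaged)   # about [0.475, 0.525]
print(after_two)  # [0.3, 0.8, 0.6, 0.9]
print(after_one)  # [0.3, 0.6]
```

Averaging shrinks the gap between the two first moves to 0.475 vs 0.525, while the min-max backup gives 0.3 vs 0.6; with less favorable numbers, averaging can even rank the two moves the wrong way round.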

Discussion: Monte-Carlo using a tree

• Q: How do you calculate the parent node's 0.6 from its children 0.2 and 0.6?

• A: Ignore 0.2

(Figure: children 0.2 and 0.6 under parent 0.6)

Discussion: Monte-Carlo using a tree

• Q: How do you ignore 0.2?
• A1: Always search the node with the maximum win rate
• A2: Sometimes search a node at random

(Figure: children 0.2 and 0.6 under parent 0.6)


Search the node that has the maximum win rate

0.1 0.3 0.7 0.8 0.2 0.6 0.9 0.4

This tactic finds the best path

Win rate first search

• Remember the win rate of each searched node
• Almost always search the node that has the maximum win rate
• Sometimes search randomly (ideally this is not needed)
• Then this algorithm finds the best move

Additional explanation

• Update win rates at every playout
• Keep each win rate as a numerator and a denominator
• When the playout is won, add a constant to both the numerator and the denominator
• When the playout is lost, add the constant to the denominator only
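Putting the pieces together, here is a minimal sketch of win rate first search on the toy depth-3 tree from the discussion pages; it is not the author's implementation. Assumptions: each leaf's "playout" is a biased coin that the root player wins with the probability shown on that leaf; the update constant `STEP`, the random-choice probability `EPSILON`, and the iteration count are arbitrary; and each node stores its win rate from the point of view of the player who moved into it, so every player simply picks the child with the highest rate. Unreached nodes get win rate 1, following the rule for unlearned win rates.

```python
import random

# Toy tree from the discussion pages: depth 3, two moves per position.
# Each leaf holds the probability that the root player wins a playout there
# (a stand-in for real random playouts; an assumption of this sketch).
LEAF_RATES = [0.1, 0.3, 0.7, 0.8, 0.2, 0.6, 0.9, 0.4]
DEPTH, BRANCH = 3, 2
STEP = 1.0     # constant added per playout (hypothetical value)
EPSILON = 0.1  # probability of a random choice (hypothetical value)


class Node:
    def __init__(self, depth=0, index=0):
        self.depth, self.index = depth, index
        self.num = 0.0  # wins of the player who moved into this node
        self.den = 0.0  # playouts through this node
        self.children = []
        if depth < DEPTH:
            self.children = [Node(depth + 1, index * BRANCH + i) for i in range(BRANCH)]

    def rate(self):
        # Unreached (unlearned) node: win rate 1.
        return self.num / self.den if self.den else 1.0


def search_once(root, rng):
    """One playout: descend by win rate first, then update the path."""
    path, node = [], root
    while node.children:
        if rng.random() < EPSILON:
            node = rng.choice(node.children)  # sometimes search randomly
        else:
            node = max(node.children, key=Node.rate)  # almost always: highest win rate
        path.append(node)
    root_player_won = rng.random() < LEAF_RATES[node.index]
    for n in path:
        entered_by_root_player = n.depth % 2 == 1  # odd-depth nodes follow the root player's moves
        n.den += STEP
        if entered_by_root_player == root_player_won:
            n.num += STEP


def win_rate_first_search(iterations=20000, seed=0):
    rng = random.Random(seed)
    root = Node()
    for _ in range(iterations):
        search_once(root, rng)
    # Choose the first move that was searched most often.
    best = max(range(BRANCH), key=lambda i: root.children[i].den)
    return best, root


best, root = win_rate_first_search()
print(best, [round(child.rate(), 2) for child in root.children])
```

With these settings the search should concentrate its visits on the second first move, whose min-max win rate is 0.6 rather than 0.3; the stored rates only approximate the min-max values because the occasional random choices are averaged in.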

Problems of presented method

• Win rates of nodes that have not been searched yet are discussed on the following pages

• Many other issues are probably hidden, though I have not identified them

Unreached node

• A node that has not been searched yet has no win rate

(Figure: nodes with win rates 0.4, 0.6, and 0.3, and an unreached node)

Another win rate

• Up to this page, no shogi knowledge has been used; only the graph is used

• This win rate uses knowledge of shogi
• The win rate is calculated per kind of move
• For example: taking a piece, promotion, etc.

Another win rate

• Calculate a win rate from these factors
  – Piece position before and after the move
  – Kinds of the moving piece and the taken piece
  – Whether the square is controlled or not

• A win rate table covering all combinations of these factors is prepared

• The values in the table are not preset; they are learned by playouts

Another smaller win rate

• Another, smaller win rate table is prepared
  – Kinds of the moving piece and the taken piece
  – Whether the square is controlled or not

• Since it is small, it learns fast
• It is used while the larger table's win rate has not been learned yet
• If none of the three kinds of win rate has been learned, the win rate is set to 1
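The fallback order can be sketched as follows. This is an illustration under assumptions: the table keys, the dictionary fields of a move (`from`, `to`, `piece`, `captured`, `controlled`), and the helper names are hypothetical, since the slides do not give the exact encoding; the update reuses the numerator/denominator scheme described earlier.

```python
class RateTable:
    """Win rate table keyed by move features, learned from playouts."""

    def __init__(self, step=1.0):
        self.step = step  # constant added per playout (hypothetical value)
        self.num, self.den = {}, {}

    def update(self, key, won):
        self.den[key] = self.den.get(key, 0.0) + self.step
        if won:
            self.num[key] = self.num.get(key, 0.0) + self.step

    def rate(self, key):
        """Learned win rate, or None if this key has never been updated."""
        den = self.den.get(key, 0.0)
        return self.num.get(key, 0.0) / den if den else None


def table_keys(move):
    """Keys for the larger and the smaller table (hypothetical move fields)."""
    large = (move["from"], move["to"], move["piece"], move["captured"], move["controlled"])
    small = (move["piece"], move["captured"], move["controlled"])
    return large, small


def move_win_rate(node_rate, move, large_table, small_table):
    """Node's own rate if learned, else the larger table, else the smaller one, else 1."""
    if node_rate is not None:
        return node_rate
    large_key, small_key = table_keys(move)
    for table, key in ((large_table, large_key), (small_table, small_key)):
        learned = table.rate(key)
        if learned is not None:
            return learned
    return 1.0


# Example: a giraffe drop (no source square) onto an uncontrolled square.
move = {"from": None, "to": (2, 1), "piece": "giraffe", "captured": None, "controlled": False}
large, small = RateTable(), RateTable()
print(move_win_rate(None, move, large, small))  # 1.0: nothing learned yet
small.update(table_keys(move)[1], won=False)
print(move_win_rate(None, move, large, small))  # 0.0: smaller table learned
```

The smaller key ignores squares, so many different moves share one entry; that is why it fills up, and can be used, long before the larger table.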

Conclusion of presented method

• Win rates of all searched nodes are remembered and learned by playouts

• In playouts, select the node that has the highest win rate (“win rate first search”)

• Sometimes select a node randomly
• If a node's win rate has not been learned, the other win rates are used

Condition of simulation game

• Win rate first search vs. simple min-max method (evaluation function using piece values only)

• If a game reaches 80 moves, it is counted as a draw (a special rule for this simulation)

Results of simulation 1

Number of playouts           10000   30000   100000
Presented method as Black    22-76   44-52   48-49
Presented method as White    16-81   30-68   61-35

• Win-loss of the presented method in 100 games (some games were draws)
• Depth of the min-max method is 6
• The more playouts, the stronger the method

Results of simulation 2

Depth of min-max             4       5       6       7       8       9
Presented method as Black    94-6    77-20   48-49   37-61   24-73   14-85
Presented method as White    78-21   78-20   61-35   38-57   40-52   20-74

• Win-loss of the presented method in 100 games (some games were draws)
• 100000 playouts for the presented method
• About as strong as 6-depth min-max

Impression by human viewer

• The presented method frequently makes bad moves

• Although it is a variant of the Monte-Carlo method, it can find mating sequences

• It is good at finding narrow paths
• Differences in the number of playouts clearly show up as differences in strength

Conclusion and future issue

• Conclusion
  – Playouts guided by win rate first
  – Moves selected without preset knowledge
  – Moves selected by the results of playouts

• Future
  – Someone may apply it to “Go” or other chess-like games
  – I return to research on speech signal processing
