Enhancements for Multi-Player Monte-Carlo Tree Search

J. (Pim) A.M. Nijssen, Mark H.M. Winands

29 September 2010

5 October 2010 Enhancements for Multi-Player Monte-Carlo Tree Search 2

Overview
• Introduction
• Progressive History
• MP-MCTS-Solver
• Test domains
• Experiments and Results
• Conclusions
• Future Research

Introduction
• Enhancements for Multi-Player Monte-Carlo Tree Search
  – More than 2 players
  – Techniques
    • maxn (Luckhardt and Irani, 1986)
    • Paranoid (Sturtevant and Korf, 2000)
  – Games
    • Chinese Checkers
    • Hearts

Introduction
• Enhancements for Multi-Player Monte-Carlo Tree Search
  – Best-first search technique
  – Monte Carlo simulations
  – Four phases
    • Selection (UCT)
    • Expansion (1 node per sample)
    • Playout (ε-greedy)
    • Backpropagation
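The selection and playout phases can be illustrated with a minimal sketch. The function names, the (score, visits) representation of children, and the choice to try unvisited children first are assumptions made for illustration; the defaults C = 0.2 and ε = 0.05 are the settings reported on the experiments slide.

```python
import math
import random


def uct_value(child_score, child_visits, parent_visits, c=0.2):
    """Average score of the child plus the UCT exploration bonus."""
    if child_visits == 0:
        # Assumption: unvisited children are selected before any other.
        return math.inf
    exploit = child_score / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore


def select_child(children, parent_visits, c=0.2):
    """Return the index of the child with the highest UCT value.

    `children` is a list of (score, visits) pairs, where the score is
    taken from the point of view of the player to move at the parent.
    """
    return max(
        range(len(children)),
        key=lambda i: uct_value(children[i][0], children[i][1], parent_visits, c),
    )


def epsilon_greedy_move(moves, heuristic, epsilon=0.05, rng=random):
    """Playout policy: a random move with probability epsilon,
    otherwise the move that the (hypothetical) heuristic rates highest."""
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=heuristic)
```

Expansion and backpropagation are left out here; in a full search loop, `select_child` would be applied repeatedly from the root until a node with unexpanded moves is reached.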

Introduction
• Enhancements for Multi-Player Monte-Carlo Tree Search
  – Stores tuple of size N in nodes
  – Game returns tuple of size N
    • Winner gets a score of 1, losers get a score of 0
    • Score is split in case of multiple winners
      – e.g. [½, ½, 0] is returned if Players 1 and 2 both win
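As a rough illustration of the score tuple, the sketch below builds the result of a finished game and adds it to the nodes on the simulated path. The names `result_tuple`, `Node` and `backpropagate` are hypothetical, and players are indexed from 0, so Players 1 and 2 correspond to indices 0 and 1.

```python
from fractions import Fraction


def result_tuple(winners, num_players):
    """Split a total score of 1 evenly among the winners; losers get 0."""
    winners = set(winners)
    if not winners:
        raise ValueError("at least one winner is required")
    share = Fraction(1, len(winners))
    return [share if p in winners else Fraction(0) for p in range(num_players)]


class Node:
    """Tree node holding one accumulated score per player."""

    def __init__(self, num_players):
        self.scores = [Fraction(0)] * num_players
        self.visits = 0


def backpropagate(path, result):
    """Add the game result to every node visited in this simulation."""
    for node in path:
        node.visits += 1
        node.scores = [s + r for s, r in zip(node.scores, result)]
```

Exact fractions are used only to keep the example free of rounding; floating-point scores would serve equally well in practice.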

Introduction
• Enhancements for Multi-Player Monte-Carlo Tree Search
  – Progressive History
  – Multi-Player Monte-Carlo Tree Search Solver

Progressive History
• Combination of Progressive Bias (Chaslot et al., 2008) and the history heuristic (Schaeffer, 1983)
• Move selection strategy uses action information
• More information available
• Information is less accurate
• Influence decreases over time

Progressive History

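$$v_i = \frac{s_i}{n_i} + C \times \sqrt{\frac{\ln(n_p)}{n_i}} + \frac{s_a}{n_a} \times \frac{W}{n_i - s_i + 1}$$

• UCT: $\frac{s_i}{n_i} + C \times \sqrt{\frac{\ln(n_p)}{n_i}}$
• History heuristic: $\frac{s_a}{n_a}$
• Progressive Bias: $\frac{W}{n_i - s_i + 1}$
• Divide by number of losses: the denominator $n_i - s_i + 1$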
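A minimal sketch of how the Progressive History value of a child might be computed. It adds a history term, the average score of the move weighted by W and divided by the number of losses plus one, to the plain UCT value. The function name and the reading of the symbols (s_i and n_i as score and visit count of child i, n_p as visit count of the parent, s_a and n_a as score and play count of move a) are assumptions, as is returning a history average of 0 for a move that has never been played.

```python
import math


def progressive_history_value(s_i, n_i, n_p, s_a, n_a, w, c=0.2):
    """UCT value of child i plus the Progressive History term.

    Requires n_i > 0 and n_p > 0.
    """
    uct = s_i / n_i + c * math.sqrt(math.log(n_p) / n_i)
    # History heuristic: average score of move a over all simulations.
    history = s_a / n_a if n_a > 0 else 0.0
    # Progressive Bias weight; n_i - s_i counts the losses of child i,
    # so the term fades more slowly for children that keep scoring well.
    bias = w / (n_i - s_i + 1)
    return uct + history * bias
```

With `w=0` the function reduces to plain UCT, which matches the W = 0 baseline row in the results tables.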

MP-MCTS-Solver
• Multi-Player version of MCTS-Solver (Winands et al., 2008)
• Updating game-theoretical values
• Update rules
  – Standard (mate in one, one winner)
  – Paranoid
  – First winner
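The standard update rule can be sketched as follows. This is one possible reading of the note "mate in one, one winner", not a confirmed description of the authors' implementation: a node is proven when the player to move has a child that is a proven win for that player, or when every child is proven with the same single winner. The function names and the use of `None` for an unproven value are assumptions; the Paranoid and First winner rules are not covered.

```python
def proven_winner(value):
    """Index of the single winner in a proven tuple such as [0, 1, 0],
    or None if the value is unproven or has no unique winner."""
    if value is None:
        return None
    winners = [p for p, v in enumerate(value) if v == 1]
    if len(winners) == 1 and sum(value) == 1:
        return winners[0]
    return None


def standard_update(child_values, player_to_move):
    """Return the proven tuple for a node, or None if nothing is proven.

    `child_values` holds one entry per child: a proven tuple or None.
    """
    # Mate in one: the player to move can pick a proven winning child.
    for value in child_values:
        if proven_winner(value) == player_to_move:
            return value
    # One winner: every child is proven and the same player wins in all.
    winners = {proven_winner(value) for value in child_values}
    if len(winners) == 1 and None not in winners:
        return child_values[0]
    return None
```

When children are proven but disagree on the winner, this sketch leaves the node unproven; that is the situation the other two update rules are meant to resolve.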

MP-MCTS-Solver

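[Figure: example search tree with root A, children B, C and D, and leaves E, F, G, H and I; the levels are labelled Player 3 and Player 1. Nodes carry proven score tuples such as [1,0,0] and [0,1,0]. For the node marked [?], the Paranoid update rule gives [0,1,0] and the First winner rule gives [1,0,0].]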

Test domains
• Multi-player games
• Zero-sum
• Perfect information
• Focus
• Chinese Checkers

Focus
• Capturing pieces by creating stacks
• Goal
  – Total number of pieces captured
  – Number of pieces captured from each opponent

Focus
• Moving
  – Only stacks one owns
  – Orthogonally
  – Move as many squares as the number of pieces
  – Maximum stack size is 5
• Capture pieces by creating larger stacks
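The movement and capture rules above might be modelled along the following lines. Several details are assumptions not stated on the slide: the board is treated as a plain square grid (the real Focus board has its corners removed), stacks are lists of owner ids from bottom to top, and pieces beyond the maximum stack size are taken from the bottom of the stack, with only opponent pieces counted as captured.

```python
MAX_STACK = 5
ORTHOGONAL = ((1, 0), (-1, 0), (0, 1), (0, -1))


def destinations(row, col, pieces_moved, size=8):
    """Squares reachable by moving `pieces_moved` pieces exactly that
    many squares in an orthogonal direction on a size x size grid."""
    targets = []
    for dr, dc in ORTHOGONAL:
        r, c = row + dr * pieces_moved, col + dc * pieces_moved
        if 0 <= r < size and 0 <= c < size:
            targets.append((r, c))
    return targets


def land_on(moving, target, mover):
    """Place the moving pieces on top of the target stack.

    Returns the resulting stack (at most MAX_STACK high) and the list
    of opponent pieces removed from the bottom, i.e. the captures.
    """
    combined = target + moving
    excess = max(0, len(combined) - MAX_STACK)
    removed, kept = combined[:excess], combined[excess:]
    captured = [owner for owner in removed if owner != mover]
    return kept, captured
```

For example, `land_on([1, 1, 1], [2, 2, 1], mover=1)` produces a stack of six pieces, so the bottom piece (owned by player 2) is removed and counted as a capture.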

Chinese Checkers
• Goal: move pieces to other side of the board
• Move pieces to adjacent fields or jump over other pieces
  – Sequential jumps

Experiments and Results
• Processor: AMD64 2.4 GHz
• Programming language: Java 6
• MCTS settings: C = 0.2, ε = 0.05
• Time: 2.5 s per turn
• 3360 games per tournament
• All possible configurations

Experiments and Results
• Progressive History in Focus

W      2 players   3 players   4 players
0      52.0%       51.2%       50.8%
0.5    59.0%       61.1%       57.5%
0.1    59.8%       63.0%       58.9%
0.25   61.3%       62.9%       59.4%
0.5    64.1%       65.5%       59.9%
1      66.0%       65.4%       58.2%
3      62.2%       65.2%       59.6%
5      57.9%       63.8%       59.6%
7.5    51.3%       60.6%       57.1%
10     47.4%       57.8%       56.9%

Experiments and Results
• Progressive History in Chinese Checkers

W      2 players   3 players   4 players
0.25   52.8%       59.0%       56.9%
0.5    58.2%       62.8%       58.3%
1      67.8%       63.5%       61.9%
3      79.9%       66.7%       66.4%
5      83.5%       65.8%       66.8%
10     83.2%       65.3%       69.6%
15     81.0%       65.0%       69.2%
20     60.8%       60.2%       63.2%

Experiments and Results
• Divide by number of losses

Game               2 players   3 players   4 players
Focus              64.8%       61.0%       52.0%
Chinese Checkers   57.6%       54.8%       53.9%

Experiments and Results
• MP-MCTS-Solver in Focus

Update rule    2 players   3 players   4 players
Standard       53.0%       54.9%       53.3%
Paranoid       51.9%       50.4%       44.9%
First winner   52.8%       51.5%       43.4%

Conclusions
• Progressive History
  – Significant enhancement in Chinese Checkers and Focus
  – Dividing by number of losses in Progressive Bias part increases performance
• MP-MCTS-Solver
  – Small but significant enhancement in Focus
  – Standard update rule works best

Future Research
• Test Progressive History in other games
• Compare Progressive History with similar techniques, like RAVE, prior knowledge (Gelly and Silver, 2007), Gibbs Sampling (Björnsson and Finnsson, 2009), etc.
• Create new update rules for MP-MCTS-Solver
