Machine Learning in Computer Game Players
Chikayama & Taura Lab.
M1 Ayato Miki
Outline
1. Introduction
2. Computer Game Players
3. Machine Learning in Computer Game Players
4. Tuning Evaluation Functions
◦ Supervised Learning
◦ Reinforcement Learning
◦ Evolutionary Algorithms
5. Conclusion
1. Introduction
Improvements in computer game players
◦ DEEP BLUE defeated Kasparov in 1997
◦ GEKISASHI and TANASE SHOGI at WCSC 2008
Strong computer game players are usually developed by strong human players
◦ Input heuristics manually
◦ Devote a lot of time and energy to tuning
Machine Learning for Games
Machine learning enables automatic tuning using a large amount of data
The developer does not need to be an expert at the game
2. Computer Game Players
Games
Game Trees
Game Tree Search
Evaluation Function
Games
Turn-based games
◦ ex. tic-tac-toe, chess, shogi, poker, mahjong…
Additional classification
◦ two-player or otherwise
◦ zero-sum or otherwise
◦ deterministic or non-deterministic
◦ perfect or imperfect information
Game Tree Model
Game Trees
[Figure: a game tree. Nodes alternate between the player's turn and the opponent's turn; edges are moves such as move 1 and move 2.]
Game Tree Search
ex. Minimax search algorithm
[Figure: a minimax tree. Max nodes take the maximum of their children and Min nodes the minimum; the leaf values propagate up through Min values 5 and 3 to a root (Max) value of 5.]
Game Tree Search
It is difficult to search all the way to the leaf nodes
◦ shogi has about 10^220 possible positions
Instead, stop the search at a practicable depth and "evaluate" the frontier nodes
◦ using an evaluation function
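The search above can be sketched as plain minimax over a small explicit tree (a hedged illustration; a real player cuts off at a fixed depth and calls the evaluation function at the frontier):

```python
# A minimal minimax sketch over an explicit game tree: a leaf is a number
# (an evaluation-function value) and an inner node is a list of children.
# The tree below mirrors the slide's example and is illustrative only.
def minimax(node, maximizing=True):
    if not isinstance(node, list):                 # leaf: return its evaluation
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Max root over two Min nodes: min(8, 5) = 5 and min(3, 6) = 3, so the root is 5.
tree = [[8, 5], [3, 6]]
print(minimax(tree))  # → 5
```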
Evaluation Function
Estimates the superiority of a position
Elements
◦ feature vector of the position
◦ parameter vector

V(s) = ω · f(s)

f(s) : feature vector of position s
ω : parameter vector
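A linear evaluation function of this form can be sketched as follows (the feature values and weights below are made-up illustrations):

```python
# A minimal linear evaluation V(s) = ω · f(s): the score of a position is
# the inner product of its feature vector and the parameter vector.
def evaluate(features, weights):
    """Score a position from its feature vector f(s) and parameters ω."""
    return sum(w * x for w, x in zip(weights, features))

f_s = [2, -1, 3]         # f(s): feature vector of some position (invented)
omega = [1.0, 0.5, 2.0]  # ω: parameter vector (invented)
print(evaluate(f_s, omega))  # → 7.5
```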
3. Machine Learning in Computer Game Players
Initial work
◦ Samuel's research [1959]
Learning objective
◦ What do computer game players learn?
Samuel's Checker Player [1959]
Many useful techniques
◦ Rote learning
◦ Quiescence search
◦ 3-layer neural network evaluation function
And some machine learning techniques
◦ Learning through self-play
◦ Temporal-difference learning
◦ Comparison training
Learning Objective
Opening book
Search control
Evaluation function
Learning Evaluation Functions
Automatic construction of the evaluation function
◦ Construct and select a feature vector automatically
◦ ex. GLEM [Buro, 1998]
◦ Difficult
Tuning evaluation function parameters
◦ Make a feature vector manually and tune its parameters automatically
◦ Easy and effective
4. Tuning Evaluation Functions
Supervised Learning
Reinforcement Learning
Evolutionary Algorithms
Supervised Learning
Provide the program with example positions and their exact evaluation values
Adjust the parameters so as to minimize the error between the evaluation function outputs and the exact values
[Figure: example positions labeled with exact values such as 20, 50, 10, and 40; the error is the gap between V(s) and the exact value.]
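A minimal sketch of this tuning loop for a linear evaluation V(s) = ω · f(s), using gradient descent on the squared error (the training positions and feature vectors below are invented for illustration):

```python
# Supervised tuning sketch: adjust ω to minimize the squared error between
# V(s) = ω · f(s) and the exact labels, by stochastic gradient descent.
def train(data, n_features, lr=0.05, epochs=500):
    w = [0.0] * n_features
    for _ in range(epochs):
        for features, target in data:
            v = sum(wi * xi for wi, xi in zip(w, features))
            err = v - target                 # V(s) − exact value
            for i, xi in enumerate(features):
                w[i] -= lr * err * xi        # descend the squared-error gradient
    return w

# Invented labeled positions: two feature elements, exact values as labels.
data = [([1, 0], 20.0), ([0, 1], 50.0), ([1, 1], 70.0)]
w = train(data, 2)
print([round(x, 1) for x in w])  # → [20.0, 50.0]
```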
Difficulty of Hard Supervised Training
Manually labeling positions is costly
Exact quantitative evaluation is difficult
→ Consider a softer approach
Comparison Training
A soft form of supervised training
Requires only the relative order of the possible moves
◦ Easier and more intuitive
Bonanza [Hoki, 2006]
Comparison training using records of expert games
Simple relative order: the expert move > other moves
Bonanza Method
Based on optimal control theory
Minimize the cost function J:

J(ω; s_1, …, s_N) = Σ_{i=1}^{N} l(s_i, ω)

l(s_i, ω) : error function
N : total number of example positions
s_i : example positions in the records
Bonanza Method
Error function:

l(s, ω) = Σ_{m=1}^{M} T[ ξ(s'_m, ω) − ξ(s'_0, ω) ]

s'_m : child position after possible move m
s'_0 : child position after the move played in the record
M : total number of possible moves
ξ : minimax search value
T : order discriminant function
Order Discriminant Function
Sigmoid function:

T(x) = 1 / (1 + e^(−kx))

◦ k is a parameter that controls the gradient
◦ When k → ∞, T(x) becomes the step function
◦ In that case, the error function means "the number of moves that were considered to be better than the move in the record"
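A small sketch of this comparison loss: it softly counts how many sibling moves are valued above the expert move (the search values here are made-up stand-ins for ξ(s'_m, ω)):

```python
import math

# Comparison-loss sketch: sum T(v_m − v_expert) over the non-expert moves,
# where T is the sigmoid order discriminant function.
def order_loss(expert_value, other_values, k=5.0):
    def T(x):  # sigmoid; approaches the step function as k → ∞
        return 1.0 / (1.0 + math.exp(-k * x))
    return sum(T(v - expert_value) for v in other_values)

# With a large k, the loss ≈ the number of moves ranked above the expert move:
# here only 0.9 exceeds 0.8, so the (rounded) loss is 1.
print(round(order_loss(0.8, [0.9, 0.5, 0.2], k=100.0)))  # → 1
```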
Bonanza
30,000 professional game records and 30,000 high-rating game records from SHOGI CLUB 24 were used
The weight parameters of about 10,000 feature elements were tuned
Bonanza won the World Computer Shogi Championship 2006
Problem of Supervised Learning
It is costly to accumulate a training data set
◦ Manual labeling takes a lot of time
◦ Using expert records has been successful
But what if there are not enough expert records?
◦ New games
◦ Minor games
Another approach needs no training set
◦ ex. Reinforcement Learning (next)
Reinforcement Learning
The learner gets "a reward" from the environment
In the domain of games, the reward is the final outcome (win/lose)
Reinforcement learning requires only objective information about the game
Reinforcement Learning
[Figure: the final outcome (e.g. +100, +200, −100) is the only reward, and it must be propagated back from the end of the game through all the earlier positions with decaying values.]
Inefficient in games…
Temporal-Difference Learning
[Figure: each value estimate is updated toward the reward plus the next position's estimate, instead of waiting for the final outcome.]

TD error = r + V(s_{t+1}) − V(s_t)
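A minimal TD(0) update sketch on a toy four-state chain (the states, reward, and step size are invented for illustration):

```python
# One TD(0) update: move V(s) toward r + V(s_next) by step size alpha.
def td_update(V, s, s_next, r, alpha=0.5):
    td_error = r + V[s_next] - V[s]   # TD error = r + V(s_{t+1}) − V(s_t)
    V[s] += alpha * td_error
    return V

V = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0}
# Play one episode 0 → 1 → 2 → 3; only the final transition is rewarded.
for s, s_next, r in [(0, 1, 0.0), (1, 2, 0.0), (2, 3, 1.0)]:
    td_update(V, s, s_next, r)
print(V[2])  # → 0.5: the final reward starts propagating backwards
```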
TD-Gammon [Tesauro, 1992]
Trained through self-play

Version       | Features                   | Strength
TD-Gammon 0.0 | Raw board information      | Top of computer players
TD-Gammon 1.0 | Plus additional heuristics | World-championship level
Problems of Reinforcement Learning
Falling into a local optimum
◦ Lack of playing variation
Solutions
◦ Add intentional randomness
◦ Play against various players (computer/human)
Credit Assignment Problem (CAP)
◦ Not clear which action was effective
Evolutionary Algorithm
1. Initialize population
2. Randomly vary individuals
3. Evaluate "fitness"
4. Apply selection (then repeat from 2)
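The cycle above can be sketched as a minimal elitist evolutionary loop (the toy fitness function, population size, and mutation scale are made up; this is not the exact setup of any particular study):

```python
import random

# Evolutionary-loop sketch: initialize, mutate, evaluate fitness, select.
def evolve(fitness, n_params=3, pop_size=10, generations=50, sigma=0.1):
    population = [[random.uniform(-1, 1) for _ in range(n_params)]
                  for _ in range(pop_size)]                 # 1. initialize
    for _ in range(generations):
        offspring = [[w + random.gauss(0, sigma) for w in parent]
                     for parent in population]              # 2. randomly vary
        everyone = population + offspring
        everyone.sort(key=fitness, reverse=True)            # 3. evaluate fitness
        population = everyone[:pop_size]                    # 4. apply selection
    return population[0]

# Toy objective: parameters should drift toward [1, 1, 1].
random.seed(0)
best = evolve(lambda w: -sum((x - 1) ** 2 for x in w))
print(best)  # the best individual approaches [1, 1, 1]
```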
Research of Fogel et al. [2004]
An evolutionary algorithm for a chess player
Using an open-source chess program
◦ Attempt to tune its parameters
Initialization
Create 10 initial parents
◦ Initialize parameters with random values
Variation
Create 10 offspring from each surviving parent by mutating the parental parameters:

ω'_i = ω_i + N(0, s'_i)

N(μ, σ) : Gaussian random variable
s'_i : strategy parameter
Evaluate Fitness and Selection
Each player plays ten games against randomly selected opponents
◦ Select 10 opponents randomly
The ten best players become the parents of the next generation
Material value
Positional value
Weights and biases of three neural networks
43
Tuned Parameters
Three Neural Networks
Each network has 3 layers (16 inputs, 10 hidden units, 1 output)
◦ Input = arrangement of a specific area (front 2 rows, back 2 rows, or center 4×4 square)
◦ Hidden = 10 units
◦ Output = worth of the area arrangement
Result
10 independent trials (each with 50 generations)
Initial rating = 2066 (Expert)
◦ the rating of the open-source player
Best rating = 2437 (Senior Master)
But the program cannot yet compete with the strongest chess programs (rating ~2800)
Characteristics

Method                 | Advantages               | Disadvantages
Supervised Learning    | Direct and effective     | Manual labeling cost
Reinforcement Learning | Wide application         | Local optima, CAP
Evolutionary Algorithm | Wide application, no CAP | Indirect, random dispersion
Future Work
Automatic position labeling
◦ Using records or computer play
Sophisticated rewards
◦ Consider the opponent's strength
◦ Move analysis for credit assignment
Experiments in other games