Demo Play: http://youtu.be/rA9SgaoY1jY
TETRIS WAR
Team 4
2008.12.1
JungKyu Lee
SangYun Kim
Shinnyue Kang
Contents

• General idea description
  – Approximation using a feature-based MDP
  – Policy iteration
• Apply to Tetris
  – Problem description
  – MDP formulation
  – Feature-based MDP formulation
• Result
• Conclusion
General Idea

• Infinite-horizon MDP with discount factor α, 0 < α ≤ 1
• Goal: find a policy that maximizes the value function (cost-to-go vector) V
  – Let the policy be π = {μ0, μ1, ...}, where each μt : X → A maps states to actions
Cost-to-go value

• Definition of the optimal cost-to-go vector V*:
  V*(i) = max_π E[ Σ_{t≥0} α^t r(i_t, μ_t(i_t)) | i_0 = i ]
• By Bellman's optimality equation,
  V*(i) = max_{a∈A} Σ_j p_ij(a) ( r(i, a, j) + α V*(j) )
• Using an optimal stationary policy π* = {μ*, μ*, ...},
• the optimality equation is given as
  V*(i) = Σ_j p_ij(μ*(i)) ( r(i, μ*(i), j) + α V*(j) )
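The Bellman backup can be sketched for a small generic MDP; the arrays P, R and the discount α below are illustrative stand-ins (Tetris itself is far too large for this tabular form):

```python
import numpy as np

# Minimal sketch of the Bellman optimality backup on a small generic MDP.
# P[a] is an |X| x |X| transition matrix and R[a] an |X| x |X| reward matrix;
# both are hypothetical stand-ins, not the (huge) Tetris model itself.
def bellman_backup(V, P, R, alpha):
    """One sweep of V(i) <- max_a sum_j P_ij(a) * (R(i,a,j) + alpha * V(j))."""
    q = np.stack([(P[a] * (R[a] + alpha * V[None, :])).sum(axis=1)
                  for a in range(len(P))])   # Q(a, i) for every action
    return q.max(axis=0)                     # greedy over actions

# Repeating the backup until V stops changing gives the fixed point V*
# (assuming alpha < 1 so the backup is a contraction).
def value_iteration(P, R, alpha, tol=1e-8):
    V = np.zeros(P[0].shape[0])
    while True:
        V_new = bellman_backup(V, P, R, alpha)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```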
Policy iteration

• Policy iteration improves the current policy μ_t greedily:
  μ_{t+1}(i) = argmax_{a∈A} Σ_j p_ij(a) ( r(i, a, j) + α V_{μ_t}(j) )
• The value is updated as follows:
  V_{t+1} = V_{μ_{t+1}}
• The vector V_{μ_{t+1}} has components given by
  V_{μ_{t+1}}(i) = E[ Σ_{k≥0} α^k r(i_k, μ_{t+1}(i_k)) | i_0 = i ]
• Temporal difference (TD) associated with each transition (i, j) under μ_{t+1}:
  d(i, j) = r(i, μ_{t+1}(i), j) + α V(j) − V(i)
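A minimal tabular sketch of this loop, reusing the same hypothetical small-MDP arrays; exact policy evaluation stands in here for the simulation-based evaluation used later in the project:

```python
import numpy as np

# Tabular policy iteration matching the update above. P, R, alpha are the
# same hypothetical small-MDP arrays as in the previous sketch.
def policy_iteration(P, R, alpha, iters=100):
    n_actions, n_states = len(P), P[0].shape[0]
    mu = np.zeros(n_states, dtype=int)             # current policy mu_t
    for _ in range(iters):
        # Policy evaluation: solve V = r_mu + alpha * P_mu V exactly.
        P_mu = np.array([P[mu[i]][i] for i in range(n_states)])
        r_mu = np.array([(P[mu[i]][i] * R[mu[i]][i]).sum()
                         for i in range(n_states)])
        V = np.linalg.solve(np.eye(n_states) - alpha * P_mu, r_mu)
        # At this V, the TD  d(i,j) = r + alpha*V(j) - V(i)  is zero in
        # expectation along transitions generated by mu.
        # Policy improvement: mu_{t+1}(i) = argmax_a sum_j P_ij(a)(r + alpha V(j)).
        q = np.stack([(P[a] * (R[a] + alpha * V[None, :])).sum(axis=1)
                      for a in range(n_actions)])
        mu_new = q.argmax(axis=0)
        if np.array_equal(mu_new, mu):
            break                                  # policy is stable: done
        mu = mu_new
    return mu, V
```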
Tetris

• Board size
  – Width 10; height 22
• Blocks
  – 7 pieces, each drawn with a predetermined probability
• Score
  – (number of erased lines)² × 100
• Actions
  – Left, Right, Rotate, No move
MDP formulation

• MDP model for Tetris
  – States: X = {wall configuration + current piece}
  – Actions: A = {rotate, right, left, no move}
  – Transitions: deterministic new wall after (i, a), plus a uniformly random new piece
  – Reward: r(i, a, j) = number of lines removed after (i, a)
• A value function can be computed only on the set of wall configurations.
• The optimal value function V* is the best average score!
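A sketch of this transition structure; `place_piece` is a hypothetical helper for the deterministic drop, and pieces are drawn uniformly here for simplicity:

```python
import random

# Sketch of the Tetris MDP described above. `place_piece` is a hypothetical
# helper that drops the current piece onto the wall after the chosen
# rotation/translation action and returns (new_wall, lines_removed).
PIECES = list(range(7))   # the 7 tetrominoes, drawn uniformly here

def step(state, action, place_piece):
    wall, piece = state
    new_wall, lines_removed = place_piece(wall, piece, action)  # deterministic part
    next_piece = random.choice(PIECES)                          # random part
    reward = lines_removed                                      # r(i, a, j)
    return (new_wall, next_piece), reward
```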
Approximation for Tetris

• The number of states is far too large to compute the value function exactly
  – Use a feature-based MDP instead
• Features (as extracted in the sketch below)
  – Height of each column of the board
  – Absolute height difference between adjacent columns
  – Maximum column height
  – Number of holes
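A possible extraction of these features, assuming the board is a 22×10 boolean numpy array with row 0 at the top; the trailing constant bias feature is an extra assumption of this sketch, not listed above:

```python
import numpy as np

# Feature extraction for a board given as a boolean array of shape
# (height=22, width=10), True where a cell is occupied; row 0 is the top.
def features(board):
    h, w = board.shape
    tops = np.argmax(board, axis=0)                     # first occupied row
    heights = np.where(board.any(axis=0), h - tops, 0)  # 0 for empty columns
    diffs = np.abs(np.diff(heights))                    # adjacent-column differences
    max_height = heights.max()
    # Holes: empty cells lying below the top occupied cell of their column.
    holes = sum(int((~board[h - heights[c]:, c]).sum()) for c in range(w))
    # Trailing 1.0 is a constant bias feature (an assumption of this sketch).
    return np.concatenate([heights, diffs, [max_height, holes], [1.0]])
```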
Value function

• We define an approximate value function Ṽ using the features above
• where φ(i) is the vector of features for state i and w is a weight vector
• Finally, Ṽ is linear in the weights:
  Ṽ(i, w) = Σ_k w_k φ_k(i) = wᵀ φ(i)
• Our decision rule is as follows (see the sketch below):
  μ(i) = argmax_{a∈A} [ r(i, a) + Ṽ(f(i, a), w) ], where f(i, a) is the wall reached from state i under action a
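A sketch of this decision rule, building on the `features` extractor above; `successor` is a hypothetical helper returning the deterministic next wall and the immediate reward for an action:

```python
import numpy as np

# Greedy one-step lookahead with the linear value function V~(i) = w . phi(i).
# `features` is the extractor sketched earlier; `successor` is a hypothetical
# helper returning (next_board, reward) for placing the piece under an action.
def value(w, board):
    return float(w @ features(board))     # V~(i) = sum_k w_k * phi_k(i)

def choose_action(w, state, actions, successor):
    def score(a):
        next_board, reward = successor(state, a)
        return reward + value(w, next_board)
    return max(actions, key=score)        # argmax_a [ r(i,a) + V~(f(i,a)) ]
```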
Weight vector

• Iterate on the weights so that Ṽ approaches the optimal value function V* (Ṽ → V*) within policy iteration
• The new weight vector minimizes a least-squares criterion over the simulated games (equation 1):
  w_{t+1} = argmin_w Σ_{m=1}^{M} Σ_{k=0}^{N_m−1} ( Ṽ(i_k^m, w) − Ṽ(i_k^m, w_t) − Σ_{q=k}^{N_m−1} λ^{q−k} d(i_q^m, i_{q+1}^m) )²
  – M : number of games
  – i_0^m, …, i_{N_m}^m : state sequence of the m-th game
  – i_{N_m}^m : termination state of game m
Minimum squared error technique using the pseudoinverse

• To solve equation (1):
• Goal: find a weight vector a satisfying the following system:
  Y a = b
  – d : # of features (columns of Y)
  – n : # of samples (rows of Y)
• Formal solution, if Y were square and nonsingular:
  a = Y⁻¹ b
• Error vector:
  e = Y a − b
Squared-error criterion function

• Minimize the squared length of the error vector e = Y a − b
• Define the error criterion function
  J(a) = ‖Y a − b‖² = Σ_{i=1}^{n} (aᵀ y_i − b_i)²
• Using the gradient to simplify,
  ∇J(a) = Σ_{i=1}^{n} 2 (aᵀ y_i − b_i) y_i = 2 Yᵀ (Y a − b)
• The necessary condition ∇J(a) = 0 yields
  Yᵀ Y a = Yᵀ b, so a = (Yᵀ Y)⁻¹ Yᵀ b = Y† b
  – Y† : pseudoinverse of Y
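A small numpy check of this solution; the data below are made up purely for illustration:

```python
import numpy as np

# Demonstration that the pseudoinverse solves min_a ||Y a - b||^2.
rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 5))      # n = 20 samples, d = 5 features
b = rng.normal(size=20)

a = np.linalg.pinv(Y) @ b         # a = Y† b
# Equivalent, and numerically preferred in practice:
a_lstsq, *_ = np.linalg.lstsq(Y, b, rcond=None)

assert np.allclose(a, a_lstsq)
# The normal equations Yᵀ Y a = Yᵀ b hold at the minimizer:
assert np.allclose(Y.T @ Y @ a, Y.T @ b)
```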
Apply to the Tetris problem

• Let each row of Y be a feature vector φ(i_k^m)ᵀ, and let the corresponding entry of b be the target Ṽ(i_k^m, w_t) + Σ_{q=k}^{N_m−1} λ^{q−k} d(i_q^m, i_{q+1}^m)
• Equation (1) then becomes the least-squares problem Y w ≈ b, solved by w = Y† b
  – M : # of games
  – n : # of samples (board states visited over all M games)
  – φ : feature vector
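A sketch tying equation (1) to the pseudoinverse solve; `play_game` is a hypothetical helper returning the boards visited in one simulated game and the temporal differences along its transitions, and `features` is the extractor sketched earlier:

```python
import numpy as np

# One weight update built from M simulated games, as in equation (1).
# `play_game(w)` is a hypothetical helper returning (boards, tds), where
# tds[q] = d(i_q, i_{q+1}) along the game; len(tds) == len(boards) - 1.
def update_weights(w, play_game, n_games, lam=0.6):
    rows, targets = [], []
    for _ in range(n_games):                    # M games
        boards, tds = play_game(w)
        n = len(tds)
        for k in range(n):
            phi = features(boards[k])           # one row of Y: phi(i_k^m)
            # Matching entry of b: V~(i_k, w) + sum_{q>=k} lam^(q-k) * d_q.
            target = float(w @ phi) + sum(lam ** (q - k) * tds[q]
                                          for q in range(k, n))
            rows.append(phi)
            targets.append(target)
    Y, b = np.array(rows), np.array(targets)    # n samples in total
    return np.linalg.pinv(Y) @ b                # new w = Y† b
```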
Simulation Result

• Settings
  – λ = 0.6
  – Tested over 100 games, using random seeds 0 to 100
• The simple TD algorithm is our heuristic baseline
• Our learning algorithm improves the heuristic algorithm's score by 2010%
Simulation result

[Results figure omitted.]
Conclusion

• Goal of the project
  – Build an algorithm that achieves the highest average score
• Our learning algorithm is powerful
  – Its average and maximum scores compare favorably with the heuristic algorithm
• Problem: deviation
  – Deviation: difference between the highest and lowest scores
  – Our learning algorithm gives a large deviation
• Suggestion
  – Reduce the deviation without lowering the average score