Evolving Winning Controllers for Virtual Race Cars
Yonatan Shichel & Moshe Sipper
Outline
• Introduction
  – Artificial Intelligence
  – AI in games
    • Robocode: Java-based tank-battle simulator
    • RARS: Robot Auto Racing Simulator
  – Evolutionary Computation
    • Key concepts in evolution
    • Genetic Algorithms (GA)
    • Genetic Programming (GP)
• GP-RARS: evolution of winning controllers for virtual race cars
  – Game description
  – Previous work
  – Evolutionary environment setup & calibration
  – Experiments and Results
  – Discussion
  – Result Analysis
• Concluding Remarks
Artificial Intelligence (AI)
Definition (Russell & Norvig, 2003):
“systems that [act/think] [like humans/rationally]”
AI in Games
• games are natural candidates for AI
• games provide a variety of challenges
• games allow exploration of real-world realms
• games allow comparison to human behavior
• games can be rewarding to master
• games are fun!
Robocode
• tank-battle simulation
• Java-based, open-source programming game
• simplistic physical model
• active gamer community
  – extensive online robot library
  – ongoing tournaments
RARS: Robot Auto Racing Simulator
• car-race simulation
• C++-based, open-source programming game
• sophisticated physical model
• inactive gamer community
  – limited online robot library
  – tournaments held between 1995 and 2003
Evolutionary Computation
“a family of algorithmic approaches aimed at finding optimal solutions to search problems of high complexity”
Key concepts in Evolution
The Origin of Species (Darwin, 1859):
• a population is composed of many individuals
• individuals differ in characteristics, which are inheritable by means of sexual reproduction
• the environment consists of limited resources, leading to a struggle for survival
• fitter individuals are more likely to survive and reproduce, passing their characteristics to their offspring
• as time passes, populations slowly adapt to their surrounding environment
Genetic Algorithms (GA)
Inspired by Darwin’s evolutionary principles:
• a fixed-size population is composed of many solution instances for the problem at hand
• solutions are encoded in genomes
• a fitness function determines how fit each individual is
• the population is re-populated in each generation
• fitter individuals have a higher probability of being selected for the next generation
• genetic operators (crossover and mutation) are applied to selected individuals to create new individuals
• the process is repeated for many generations
Genetic Algorithms (GA)
A schematic flow of a basic GA:
  g = 0
  initialize population P0
  evaluate P0            // assign fitness values to individuals
  while (termination condition not met) do
      g = g + 1
      select Pg from Pg-1
      crossover Pg
      mutate Pg
      evaluate Pg
  end while
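The schematic above can be made concrete with a small Python sketch. It is not the GP-RARS setup: the bit-string genome, the toy "OneMax" fitness (number of 1-bits), tournament selection, one-point crossover and bit-flip mutation are all assumptions chosen only to show each step of the loop.

```python
import random

# Minimal GA sketch following the schematic flow.
# Assumptions (not from the slides): bit-string genomes, OneMax fitness,
# tournament selection, one-point crossover and bit-flip mutation.
GENOME_LEN = 20
POP_SIZE = 30  # must be even: offspring are produced in pairs


def fitness(genome):
    return sum(genome)


def tournament(pop, fits, k=3):
    # sample k distinct individuals and return the fittest of them
    contenders = random.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]


def crossover(a, b):
    # one-point crossover: swap the tails after a random cut point
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(genome, rate=0.01):
    # flip each bit independently with probability `rate`
    return [bit ^ 1 if random.random() < rate else bit for bit in genome]


def run_ga(max_generations=50, seed=0):
    random.seed(seed)
    g = 0
    pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
    fits = [fitness(ind) for ind in pop]                       # evaluate P0
    while g < max_generations and max(fits) < GENOME_LEN:      # termination condition
        g += 1
        selected = [tournament(pop, fits) for _ in range(POP_SIZE)]  # select Pg from Pg-1
        pop = []
        for i in range(0, POP_SIZE, 2):                        # crossover Pg
            pop.extend(crossover(selected[i], selected[i + 1]))
        pop = [mutate(ind) for ind in pop]                     # mutate Pg
        fits = [fitness(ind) for ind in pop]                   # evaluate Pg
    return g, max(fits)


print(run_ga())  # (generations used, best fitness found)
```

The termination condition here is "optimum found or generation budget spent"; any other stopping rule slots into the same `while` test.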
Genetic Algorithms (GA)
GA customization:
• genome representation
• fitness measure
• selection method
• crossover method
• mutation method
• termination condition
• initial population creation
Genetic Programming (GP)
“an evolutionary computation approach aimed at the creation of computer programs rather than static solutions”
Genetic Programming (GP)
• individual’s genome is composed of LISP expressions
• LISP expressions are composed of functions and terminals
• LISP expressions evaluate to numeric values, hence representing functions
• genetic operators are defined to operate on (and return) LISP expressions
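As a minimal sketch, assuming LISP expressions are mirrored as nested Python tuples (function name first, then its arguments), evaluating a genome is a short recursion over functions and terminals. The three-entry operator table is illustrative only.

```python
import operator

# Functions available to the trees (an illustrative subset).
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr, env):
    """Evaluate a LISP-style tree such as ("+", ("*", "x", "x"), 1)."""
    if isinstance(expr, tuple):                     # function node
        func = FUNCTIONS[expr[0]]
        args = [evaluate(arg, env) for arg in expr[1:]]
        return func(*args)
    if isinstance(expr, str):                       # variable terminal
        return env[expr]
    return expr                                     # numeric constant terminal


# (+ (* x x) 1), i.e. x^2 + 1
tree = ("+", ("*", "x", "x"), 1)
print(evaluate(tree, {"x": 3}))  # 10
```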
Genetic Programming (GP)
subtree substitution crossover:
[Figure: a randomly chosen subtree is exchanged between two parent trees]
• parents: (+ (* x x) 1), i.e. x²+1, and (- 1 (* 1 (+ x 1))), i.e. -x
• the subtrees (* x x) and (* 1 (+ x 1)) are exchanged
• offspring: (+ (* 1 (+ x 1)) 1), i.e. x+2, and (- 1 (* x x)), i.e. 1-x²
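A sketch of subtree substitution crossover on the same tuple encoding; addressing nodes by paths of child indices is an assumption of this sketch. With the crossover points fixed to the ones in the example it reproduces both offspring. `random_crossover` picks both points uniformly, whereas many GP systems bias the choice toward internal nodes.

```python
import random


def subtree_paths(tree, path=()):
    """Yield the path (tuple of child indices) of every node, root included."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(p1, p2, path1, path2):
    """Swap the subtree at path1 in p1 with the subtree at path2 in p2."""
    s1, s2 = get_subtree(p1, path1), get_subtree(p2, path2)
    return replace_subtree(p1, path1, s2), replace_subtree(p2, path2, s1)


def random_crossover(p1, p2, rng=random):
    """Subtree substitution crossover with uniformly chosen crossover points."""
    path1 = rng.choice(list(subtree_paths(p1)))
    path2 = rng.choice(list(subtree_paths(p2)))
    return crossover(p1, p2, path1, path2)


parent1 = ("+", ("*", "x", "x"), 1)          # x^2 + 1
parent2 = ("-", 1, ("*", 1, ("+", "x", 1)))  # -x
child1, child2 = crossover(parent1, parent2, (1,), (2,))
print(child1)  # ('+', ('*', 1, ('+', 'x', 1)), 1)   i.e. x + 2
print(child2)  # ('-', 1, ('*', 'x', 'x'))           i.e. 1 - x^2
```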
Genetic Programming (GP)
A schematic flow of a basic GP:
  g = 0
  initialize population P0
  evaluate P0            // assign fitness values to individuals
  while (termination condition not met) do
      g = g + 1
      while (Pg is not full) do
          OP = choose a genetic operator
          select individual(s) from Pg-1 according to OP's inputs
          apply OP to the selected individuals
          add the resulting individuals to Pg
      end while
      evaluate Pg
  end while
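The generation-filling loop of the GP schematic, sketched in Python. The operator table reuses the reproduction/crossover/mutation probabilities listed under Crossover & Mutation (40%/50%/10%), but the demo evolves plain numbers rather than trees: its blend "crossover", Gaussian mutation and two-way tournament are stand-ins chosen only to keep the sketch short.

```python
import random


def run_gp(population, fitness, operators, select, generations, rng):
    """Generational loop of the GP schematic (higher fitness = better).

    operators: list of (probability, arity, apply), where apply(parents, rng)
    returns a list of offspring; select(population, fits, rng) returns one parent.
    """
    weights = [p for p, _, _ in operators]
    fits = [fitness(ind) for ind in population]                  # evaluate P0
    for _ in range(generations):                                 # termination: fixed generation count
        new_pop = []
        while len(new_pop) < len(population):                    # while Pg is not full
            _, arity, apply = rng.choices(operators, weights=weights)[0]    # choose OP
            parents = [select(population, fits, rng) for _ in range(arity)]  # OP's inputs
            new_pop.extend(apply(parents, rng))                  # add offspring to Pg
        population = new_pop[:len(population)]                   # a 2-offspring OP may overshoot
        fits = [fitness(ind) for ind in population]              # evaluate Pg
    return population, fits


def tournament2(population, fits, rng):
    i, j = rng.randrange(len(population)), rng.randrange(len(population))
    return population[i] if fits[i] >= fits[j] else population[j]


def blend(parents, rng):
    # toy stand-in for crossover on numbers
    t = rng.random()
    a, b = parents
    return [t * a + (1 - t) * b, (1 - t) * a + t * b]


toy_ops = [
    (0.4, 1, lambda ps, rng: [ps[0]]),                      # reproduction
    (0.5, 2, blend),                                        # "crossover"
    (0.1, 1, lambda ps, rng: [ps[0] + rng.gauss(0, 1)]),    # mutation
]
rng = random.Random(42)
start = [rng.uniform(-10, 10) for _ in range(20)]
final_pop, final_fits = run_gp(start, lambda x: -(x - 3) ** 2, toy_ops, tournament2, 30, rng)
print(max(final_fits))
```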
Basic Rules
• one or more cars drive on a track for a given number of laps
• cars are damaged when colliding or driving off the track
• a car may be disabled and disqualified if its damage exceeds a certain level
• the winner is the driver that finishes first
Game Variants
• number of cars: one, two, multiple
• number of tracks: one, multiple
• race length: short, long
• controller program: generic, specialized
• driver class: reactive (c2), optimal-path (c1)
Controlling the Car
• movement: desired speed variable
• steering: wheel angle variable
• fuel & damage: pit stop request flag
Car Sensors
situation variables:
• current speed, drift speed and heading
• current track segment ID
• position on current track segment
• distances from left and right road shoulders
• distance to next track segment
• radii and lengths of current and next track segments
additional data:
• complete track layout
• nearby cars information
The Challenge
PEAS description (Russell & Norvig, 2003):
• Performance measure
• Environment
• Actuators
• Sensors
The Challenge
is the environment...   RARS                 GP-RARS
...observable?          fully                fully
...deterministic?       partially            partially
...episodic?            no                   no
...static?              either               static
...discrete?            continuous           continuous
...single agent?        single or multiple   single
"static" indicates whether the environment can change without the intervention of the active agent. The basic RARS game can be non-static when more than one agent is active; GP-RARS is single-car and thus fully static.
Previous Work
• planning approaches:
  – Genetic Algorithms (Eleveld, Sáez)
  – A* search (Pajala)
• reactive approaches:
  – Decision Trees (Wang)
  – Action Tables (Cleland)
  – Artificial Neural Networks (Ng, Pyeatt, Coulom)
  – Evolving Neural Networks (Stanley)
Evolutionary Setup & Calibration
• genome representation
• fitness measure
• selection method
• crossover method
• mutation method
• termination condition
• initial population creation
Genome Representation
• each individual is composed of two trees:
  – a steering tree
  – a throttling tree
• trees evaluate to numeric values, which are truncated to fit game-world restrictions
• trees are defined using an extensive set of functions and terminals, both simple and complex
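A sketch of how a two-tree individual might drive the car, assuming each evolved tree has already been compiled into a Python function of the sensor readings. `MAX_WHEEL_ANGLE`, `MAX_SPEED` and the non-negative speed floor are hypothetical placeholders: the slides only say that outputs are truncated to game-world restrictions, not what those limits are.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Sensors = Dict[str, float]

# Hypothetical limits (placeholders, not actual RARS constants).
MAX_WHEEL_ANGLE = 1.0
MAX_SPEED = 400.0


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


@dataclass
class Driver:
    steering: Callable[[Sensors], float]    # evolved steering tree (wheel angle)
    throttling: Callable[[Sensors], float]  # evolved throttling tree (desired speed)

    def control(self, sensors: Sensors) -> Tuple[float, float]:
        # raw tree outputs are truncated to the (assumed) game-world limits
        angle = clamp(self.steering(sensors), -MAX_WHEEL_ANGLE, MAX_WHEEL_ANGLE)
        speed = clamp(self.throttling(sensors), 0.0, MAX_SPEED)
        return angle, speed


driver = Driver(steering=lambda s: 5.0 * s["a"], throttling=lambda s: 1000.0)
print(driver.control({"a": 0.5}))  # (1.0, 400.0)
```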
Genome Representation
• terminal set (simple): {cur-rad, nex-rad, to-end, nex-len, v, vn, to-lft, to-rgt, track-width, random-constant, 0, 1}
• terminal set (complex): {a, a-angle, off-center, inner-wall, outer-wall, closest-wall}
• function set: {add(2), sub(2), mul(2), div(2), abs(1), neg(1), tan(1), if-greater(4), if-positive(3), if-cur-straight(2), if-nex-straight(2)}
a subset of these terminals and functions was chosen after a calibration process
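The slides list the function set by name and arity only. The evaluator below is a sketch of one plausible set of semantics: protected division (returning 1.0 for a zero divisor), if-greater(x, y, p, q) returning p when x > y, if-positive(x, p, q) returning p when x > 0, and if-cur-straight/if-nex-straight testing whether the segment radius is 0 are all assumptions, not confirmed GP-RARS behavior.

```python
import math

# Assumed semantics: names and arities come from the function set,
# the behavior of div and the if-* functions is a guess for illustration.
FUNCTIONS = {
    "add": (2, lambda x, y: x + y),
    "sub": (2, lambda x, y: x - y),
    "mul": (2, lambda x, y: x * y),
    "div": (2, lambda x, y: x / y if y != 0 else 1.0),   # protected division
    "abs": (1, abs),
    "neg": (1, lambda x: -x),
    "tan": (1, math.tan),
    "if-greater": (4, lambda x, y, p, q: p if x > y else q),
    "if-positive": (3, lambda x, p, q: p if x > 0 else q),
}
# Segment tests read the track state; radius 0 meaning "straight" is assumed.
SEGMENT_TESTS = {"if-cur-straight": "cur-rad", "if-nex-straight": "nex-rad"}


def evaluate(expr, env):
    """Evaluate a tree of nested tuples, e.g. ("add", "v", 1), against sensor values."""
    if isinstance(expr, tuple):
        name, args = expr[0], expr[1:]
        if name in SEGMENT_TESTS:
            straight = env[SEGMENT_TESTS[name]] == 0
            return evaluate(args[0] if straight else args[1], env)
        arity, fn = FUNCTIONS[name]
        if len(args) != arity:
            raise ValueError(f"{name} expects {arity} arguments, got {len(args)}")
        return fn(*(evaluate(arg, env) for arg in args))
    if isinstance(expr, str):
        return env[expr]        # terminal: a sensor reading
    return float(expr)          # terminal: a numeric constant


sensors = {"a": 0.5, "a-angle": 0.3, "v": 40.0, "cur-rad": 0.0, "nex-rad": 120.0}
print(evaluate(("if-greater", 0.7, "a", "a-angle", ("mul", "v", -1)), sensors))  # 0.3
print(evaluate(("div", "v", 0), sensors))                                       # 1.0
print(evaluate(("if-cur-straight", 1, 0), sensors))                             # 1.0
```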
Fitness Measure
• fitness evaluation is performed on a single-lap, single-car race on one track: sepang
• the track is believed to exhibit various track features
• two fitness measures were used:
  – race distance
  – modified race time
Selection Method
• several methods were examined for a 250-individual population:
  – tournament of k, with k = {2,3,4,5,6,7}
  – fitness-proportionate selection
  – square-fitness-proportionate selection
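The examined selection schemes can be sketched as follows, assuming fitness is non-negative and higher is better (a race-time measure would have to be converted first). Squaring the fitness before the proportionate draw raises selection pressure: with fitnesses 1, 2, 3, 4 the best individual's share rises from 4/10 to 16/30.

```python
import random


def tournament_select(pop, fits, k, rng):
    """Tournament of k: sample k distinct individuals, return the fittest."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]


def proportionate_select(pop, fits, rng, power=1):
    """power=1: fitness-proportionate; power=2: square-fitness-proportionate.

    Assumes non-negative fitness where higher is better.
    """
    weights = [f ** power for f in fits]
    if sum(weights) == 0:
        return rng.choice(pop)          # all weights zero: fall back to a uniform pick
    return rng.choices(pop, weights=weights)[0]


rng = random.Random(7)
pop = ["d1", "d2", "d3", "d4"]
fits = [1.0, 2.0, 3.0, 4.0]
print(tournament_select(pop, fits, 4, rng))          # d4: with k = population size the best always wins
print(proportionate_select(pop, fits, rng, power=2))
```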
Crossover & Mutation
• crossover: subtree substitution
• mutation: random subtree growth
• operator probabilities:
  – 40% reproduction
  – 50% crossover
  – 10% mutation, split into:
    • 5% random constant mutation
    • 5% structural (subtree) mutation
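A sketch of the operator choice and both mutation types on tuple-encoded trees. The probabilities are the ones listed above; the primitive subset, the 0.3 early-stop chance in `grow`, the maximum depth of 3 and Gaussian noise for constant mutation are assumptions for illustration.

```python
import random

# Hypothetical primitive subset, for illustration only.
FUNCS = {"add": 2, "sub": 2, "mul": 2, "neg": 1}
TERMS = ["a", "v", "cur-rad", "random-constant"]

# Operator probabilities from the slide; the 10% mutation share is split evenly.
OPERATORS = {"reproduction": 0.40, "crossover": 0.50,
             "constant mutation": 0.05, "subtree mutation": 0.05}


def choose_operator(rng):
    names = list(OPERATORS)
    return rng.choices(names, weights=[OPERATORS[n] for n in names])[0]


def grow(depth, rng):
    """Random subtree growth: a tree of depth at most `depth`."""
    if depth == 0 or rng.random() < 0.3:
        term = rng.choice(TERMS)
        return rng.uniform(-1.0, 1.0) if term == "random-constant" else term
    name = rng.choice(list(FUNCS))
    return (name,) + tuple(grow(depth - 1, rng) for _ in range(FUNCS[name]))


def paths(tree, path=()):
    """Yield the path (tuple of child indices) of every node."""
    yield path
    if isinstance(tree, tuple):
        for i in range(1, len(tree)):
            yield from paths(tree[i], path + (i,))


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], new),) + tree[i + 1:]


def subtree_mutation(tree, rng, max_depth=3):
    """Structural mutation: replace a random node with a freshly grown subtree."""
    return put(tree, rng.choice(list(paths(tree))), grow(max_depth, rng))


def constant_mutation(tree, rng, sigma=0.1):
    """Perturb one numeric constant with Gaussian noise (noise model assumed)."""
    consts = [p for p in paths(tree) if isinstance(get(tree, p), float)]
    if not consts:
        return tree
    p = rng.choice(consts)
    return put(tree, p, get(tree, p) + rng.gauss(0.0, sigma))


rng = random.Random(3)
print(choose_operator(rng))
print(constant_mutation(("add", "v", 0.5), rng))
print(subtree_mutation(("mul", "a", "v"), rng))
```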
Initialization & Termination
• initial population creation:
  – Koza’s ‘ramped-half-and-half’ method: for each k = {4,5,6,7,8}:
    • 10% of the trees are grown to a depth of up to k
    • 10% of the trees are grown to a depth of exactly k
• termination condition:
  – evolution stops after 255 generations
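Ramped half-and-half as described above, sketched with a hypothetical four-function, four-terminal primitive set. With 5 depths, each depth k receives 10% "grow" trees (depth at most k) and 10% "full" trees (depth exactly k). Depth here counts edges from the root, so a lone terminal has depth 0; that convention is an assumption of this sketch.

```python
import random

# Hypothetical primitive set, for illustration only.
FUNCS = {"add": 2, "sub": 2, "mul": 2, "neg": 1}
TERMS = ["a", "v", "cur-rad", "to-end"]


def full(depth, rng):
    """Every branch reaches exactly `depth`."""
    if depth == 0:
        return rng.choice(TERMS)
    name = rng.choice(list(FUNCS))
    return (name,) + tuple(full(depth - 1, rng) for _ in range(FUNCS[name]))


def grow(depth, rng):
    """Branches may stop early, so the depth is at most `depth`."""
    if depth == 0:
        return rng.choice(TERMS)
    prim = rng.choice(TERMS + list(FUNCS))   # picking a terminal ends the branch
    if prim in TERMS:
        return prim
    return (prim,) + tuple(grow(depth - 1, rng) for _ in range(FUNCS[prim]))


def depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


def ramped_half_and_half(pop_size, rng, depths=(4, 5, 6, 7, 8)):
    """For each depth k: one share of grow trees, one share of full trees."""
    pop = []
    share = pop_size // (2 * len(depths))    # 10% each when there are 5 depths
    for k in depths:
        pop += [grow(k, rng) for _ in range(share)]
        pop += [full(k, rng) for _ in range(share)]
    while len(pop) < pop_size:               # top up any rounding remainder
        pop.append(grow(depths[-1], rng))
    return pop


population = ramped_half_and_half(250, random.Random(0))
print(len(population), [depth(t) for t in population[25:30]])
```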
Experiments & Results
• several evolutionary runs were made
• the two best runs were taken, and the best driver of the last generation was extracted from each
• each driver was then tested in 10 single-lap, single-car races
Result Comparison
• comparison to human-crafted drivers
  – on the training track
  – on ‘unseen’ tracks
• comparison to machine-crafted drivers
Result Comparison
single-car, single-lap race on sepang (class: 1 = optimal-path, 2 = reactive)
 #   Driver      Class   Lap Time (sec.)
 1   Dodger13    1       146.3 ± 0.1
 2   K1999       1       146.6 ± 0.1
 3   K2001       1       147.1 ± 0.1
 4   SmoothB4    1       148.3 ± 0.1
 5   Bulle2      1       150.4 ± 0.1
 6   Sparky5     1       150.4 ± 0.1
 7   SmoothB3    1       153.3 ± 0.1
 8   Felix16     1       153.6 ± 0.1
 9   SmoothB2    1       156.5 ± 0.1
10   GPSingle1   -       160.0 ± 0.4
11   GPSingle2   -       160.9 ± 0.3
12   Vector      2       160.1 ± 0.1
13   WappuCar    2       161.7 ± 0.1
14   Apex8       2       162.5 ± 0.2
15   Djoefe      2       163.7 ± 0.1
16   Ali2        2       164.1 ± 0.1
17   Mafanja     2       164.4 ± 0.3
18   SBv1r4      2       165.7 ± 0.1
19   Burns       2       168.4 ± 5.7
20   Eagle       2       169.3 ± 0.6
21   Bulle       2       169.5 ± 0.2
22   Magic       2       174.0 ± 0.1
23   JR001       2       178.5 ± 0.1
Result Comparison
Aug. 2004 season results (16 tracks)
# Driver 1st 2nd 3rd total
1 Vector 6 3 2 11
2 Eagle 3 2 1 6
3 GPSingle2 2 3 4 9
4 GPSingle1 2 2 2 6
5 SBv1r4 1 1 2 4
6 Bulle 1 1
7 Mafanja 2 2 4
8 Magic 2 2
9 WappuCar 1 1 2
10 Djoefe 2 2
11 Burns 1 1
12 Ali2
13 Apex8
14 JR001
Result Comparison
Previous Works' Results (times in seconds)
Author (method)                 Track     Reported Time   GP-Single-1    GP-Single-2
Eleveld (GA)                    v01       37.8 ± 0.1      38.1 ± 1.7     34.9 ± 0.1
                                suzuka    149.7 ± 0.1     177.1 ± 5.2    167.5 ± 0.3
                                race7     85.7 ± 0.2      61.9 ± 0.6     63.3 ± 0.4
Ng et al. (ANN)                 v03       59.4            55.3 ± 0.5     49.3 ± 0.1
                                oval      33.0            31.0 ± 0.1     30.8 ± 0.1
                                complex   209.0           196.2 ± 6.0    204.6 ± 1.3
Coulom (ANN)                    clkwis    38.0            37.8 ± 0.1     36.4 ± 0.1
Cleland (Action Tables)         v01       37.4            38.1 ± 1.7     34.9 ± 0.1
Stanley et al. (Evolving ANN)   clkwis    37.6 / 37.9     37.8 ± 0.1     36.4 ± 0.1
Conclusions
• GP-Drivers rank higher than any human-crafted driver in their class when racing on their training track
• GP-Drivers rank among the top human-crafted drivers in their class when racing on new, unseen tracks
• GP-Drivers perform better than any machine-crafted driver developed by past RARS researchers
Genome Representation
only a subset of the terminals and functions was “chosen” by evolution, i.e., actually appears in the best-of-run individuals
Genetic Analysis
GP-Single-2, Steering
(% (% (% (% (ifg 0.70230484 a α (* n -0.9850136)) (- a (neg a))) (- (% 1.0 (% v a)) (neg a))) (- (- (* n (neg n)) (neg a)) (neg a))) (- (% 1.0 (% v a)) (neg (% (% 1.0 (% v a)) (% v a)))))
replacing (% 1.0 (% v a)) with the equivalent (% a v):
(% (% (% (% (ifg 0.70230484 a α (* n -0.9850136)) (- a (neg a))) (- (% a v) (neg a))) (- (- (* n (neg n)) (neg a)) (neg a))) (- (% a v) (neg (% (% a v) (% v a)))))
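simplifying further, (- a (neg a)) becomes (+ a a), (- (% a v) (neg a)) becomes (+ (% a v) a), (* n (neg n)) becomes (neg (* n n)), and (% (% a v) (% v a)) becomes (* (% a v) (% a v)):
(% (% (% (% (ifg 0.70230484 a α (* n -0.9850136)) (+ a a)) (+ (% a v) a)) (- (- (neg (* n n)) (neg a)) (neg a))) (- (% a v) (neg (* (% a v) (% a v)))))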
behavior depends on the distance, a, to the upcoming curve: when the next turn is far enough away, the controller only slightly adjusts the wheel angle to keep the car from drifting off the track; when approaching a curve, however, it steers according to the relative curve angle, so steep curves result in extreme wheel-angle values.
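The simplification steps can be checked numerically. This sketch evaluates the original and the simplified steering trees on random inputs, assuming % is plain division and that ifg returns its third argument when the first exceeds the second; every divisor is kept away from zero, and n and alpha (α) are treated as opaque numeric terminals.

```python
import math
import random

# Assumed semantics (not defined on the slides): % is ordinary division,
# (ifg x y p q) returns p when x > y, otherwise q.


def ifg(x, y, p, q):
    return p if x > y else q


def neg(x):
    return -x


def steering_original(a, alpha, n, v):
    inv = 1.0 / (v / a)                                  # (% 1.0 (% v a))
    t = ifg(0.70230484, a, alpha, n * -0.9850136)
    t = t / (a - neg(a))
    t = t / (inv - neg(a))
    t = t / ((n * neg(n) - neg(a)) - neg(a))
    return t / (inv - neg(inv / (v / a)))


def steering_simplified(a, alpha, n, v):
    r = a / v                                            # (% a v)
    t = ifg(0.70230484, a, alpha, n * -0.9850136)
    t = t / (a + a)
    t = t / (r + a)
    t = t / ((neg(n * n) - neg(a)) - neg(a))
    return t / (r - neg(r * r))


rng = random.Random(0)
for _ in range(1000):
    a, alpha = rng.uniform(0.2, 2.0), rng.uniform(-1.0, 1.0)
    n, v = rng.uniform(-0.5, 0.5), rng.uniform(1.0, 60.0)
    assert math.isclose(steering_original(a, alpha, n, v),
                        steering_simplified(a, alpha, n, v), rel_tol=1e-9)
print("simplified steering tree matches the original on all samples")
```

Read directly off the simplified tree, the steering output is ifg(...) / [2a · (a/v + a) · (2a - n²) · (a/v + (a/v)²)].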
Genetic Analysis
GP-Single-2, Throttling
(ifpos (abs (% v a)) (- (% 1.0 (% v a)) (neg (- (* n (* n -0.86818504)) (neg a)))) (% (neg (- (- (* n (neg toright)) (neg a)) (neg a))) (- (% 1.0 (% v a))(neg (% (* n (neg n)) (% v a))))))
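since (abs (% v a)) is never negative, the if-positive test succeeds whenever v is non-zero, so the throttling tree effectively reduces to its first branch: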
(- (% 1.0 (% v a)) (neg (- (* n (* n -0.86818504)) (neg a))))
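The throttle reduction can be checked the same way, under the assumed semantics that if-positive returns its second argument when its first is positive and that % is plain division. The sketch also multiplies out the remaining branch into the closed form a/v + a - 0.86818504·n².

```python
import math
import random

# Assumed semantics: (ifpos x p q) returns p when x > 0, otherwise q; % is division.


def neg(x):
    return -x


def ifpos(x, p, q):
    return p if x > 0 else q


def throttle_tree(a, n, v, toright):
    inv = 1.0 / (v / a)                                        # (% 1.0 (% v a))
    then_branch = inv - neg(n * (n * -0.86818504) - neg(a))
    else_branch = (neg((n * neg(toright) - neg(a)) - neg(a))
                   / (inv - neg((n * neg(n)) / (v / a))))
    return ifpos(abs(v / a), then_branch, else_branch)


def throttle_reduced(a, n, v):
    # first branch multiplied out: a/v + a - 0.86818504 * n^2
    return a / v + a - 0.86818504 * n * n


rng = random.Random(1)
for _ in range(1000):
    a, n = rng.uniform(0.2, 2.0), rng.uniform(-0.5, 0.5)
    v, toright = rng.uniform(1.0, 60.0), rng.uniform(0.0, 10.0)
    assert math.isclose(throttle_tree(a, n, v, toright), throttle_reduced(a, n, v),
                        rel_tol=1e-9, abs_tol=1e-12)
print("throttle tree matches a/v + a - 0.86818504*n^2 on all samples")
```

When v = 0 the test value (abs (% v a)) is 0 and the else branch is taken, so the reduction holds only while the car is moving.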
Future Work
• apply GP to other RARS variants
  – multiple-car scenarios
  – long (endurance) races
• use GA to plan optimal paths
• migrate research to TORCS
Bibliography
• Russell, Stuart and Norvig, Peter. Artificial Intelligence: A Modern Approach. 2nd edition. Prentice Hall, 2003. ISBN 0-13-790395-2
• Darwin, Charles. On the Origin of Species: By Means of Natural Selection or the Preservation of Favoured Races in the Struggle for Life. London: John Murray, 1859. ISBN 0-486-45006-6
• Shichel, Yehonatan, Ziserman, Eran and Sipper, Moshe. GP-Robocode: Using Genetic Programming to Evolve Robocode Players. 8th European Conference on Genetic Programming. Springer, 2005, pp. 143-154
• Eleveld, Doug. [Online] http://rars.sourceforge.net/selection/douge1.txt
• Pajala, Jussi. [Online] http://rars.sourceforge.net/selection/jussi.html
• Wang, Zhijin. Car Simulation Using Reinforcement Learning. Computer Science Department, University of British Columbia, Vancouver, B.C., Canada, 2003
• Ng, Kim C, et al. MoNiF: a modular neuro-fuzzy controller for race car navigation. IEEE International Symposium on Computational Intelligence in Robotics and Automation, Monterey, CA, USA, 1997, pp. 74-79. ISBN 0-8186-8138-1
• Pyeatt, Larry D and Howe, Adele E. Learning to Race: Experiments with a Simulated Race Car. 11th International Florida Artificial Intelligence Research Society Conference, Sanibel Island, Florida, USA, 1998
• Coulom, Rémi. Reinforcement Learning Using Neural Networks, with Applications to Motor Control. PhD Thesis, Institut National Polytechnique de Grenoble, 2002
• Cleland, Ben. Reinforcement Learning for Racecar Control. M.Sc. Thesis, University of Waikato, 2006
• Stanley, Kenneth, et al. Neuroevolution of an automobile crash warning system. Genetic and Evolutionary Computation Conference, 2005, pp. 1977-1984. ISBN 1-59593-010-8
• Sáez, Yago, et al. Driving Cars by Means of Genetic Algorithms. Parallel Problem Solving from Nature – PPSN X. Springer, 2008, pp. 1101-1110