Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping. Jacob Schrum and Risto Miikkulainen University of Texas at Austin Department of Computer Science. Typical Uses of MOEAs. Where have MOEAs proven themselves? Wireless Sensor Networks (Woehrle et al, 2010) - PowerPoint PPT Presentation
Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping
Jacob Schrum and Risto Miikkulainen
University of Texas at Austin
Department of Computer Science
Typical Uses of MOEAs
Where have MOEAs proven themselves?
  Wireless Sensor Networks (Woehrle et al., 2010)
  Groundwater Management (Siegfried et al., 2009)
  Hydrologic model calibration (Tang et al., 2006)
  Epoxy polymerization (Deb et al., 2004)
  Voltage-controlled oscillator design (Chu et al., 2004)
  Multi-spindle gear-box design (Deb & Jain, 2003)
  Foundry casting scheduling (Deb & Reddy, 2001)
  Multipoint airfoil design (Poloni & Pediroda, 1997)
  Design of aerodynamic compressor blades (Obayashi, 1997)
  Electromagnetic system design (Michielssen & Weile, 1995)
  Microprocessor design (Stanley & Mudge, 1995)
  Design of laminated ceramic composites (Belegundu et al., 1994)
Many engineering/design problems!
New Domains for MOEAs
Simulated agents often face multiple objectives
Automatic discovery of intelligent behavior
  Video game opponents in Unreal Tournament (van Hoorn, 2009)
  Predator/prey scenarios (Schrum & Miikkulainen, 2009)
  Race car driving in TORCS (Agapitos et al., 2008)
Comparatively little so far
Direct application of MOEA seldom successful
Success often depends on “shaping”
What is Shaping?
Term from Behavioral Psychology
Identified by B. F. Skinner (1938)
Task-Based Example: Train rat to press lever
  First reward proximity
  Then any interaction with lever
  Then actual pressing of lever
Evolutionary Shaping
Environment changes, making task harder
Evolution shapes behavior across generations
Example: Migration given continental drift [1]
  Animals become accustomed to short migration
  Continental drift increases distance of migration
  Ability to travel increasing distances required
EC models with incremental evolution (e.g. [2])
[Images: Arctic Tern, Atlantic Salmon]
[1] B. F. Skinner. The shaping of phylogenic behavior. Experimental Analysis of Behavior. 1975.
[2] Schrum and Miikkulainen. Constructing Complex NPC Behavior via Multiobjective Neuroevolution. 2008.
Fitness-Based Shaping
Not extensively used
Little/no domain knowledge needed
Multiobjective approach a good fit
Selection criteria change
  Exploiting ignored objectives (TUG)
  Exploiting unfilled niches (BD)
[Figure: Objective Space panel with a point that is dominated, but exploiting a mostly ignored objective; Behavior Space panel with crowded niches and uncrowded niches]
Multiobjective Optimization
Pareto dominance: $\vec{v} \succ \vec{u}$ iff
  1. $\forall i \in \{1, \ldots, n\}: v_i \ge u_i$
  2. $\exists i \in \{1, \ldots, n\}: v_i > u_i$
Assumes maximization
Want nondominated points
NSGA-II used in this work
What to evolve?
  NNs as control policies
[Figure: nondominated points]
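The dominance test on this slide can be sketched in a few lines of Python. This is only an illustration, assuming maximization as the slide states; the function names `dominates` and `nondominated` and the sample scores are made up for the example, and the quadratic filter is not the sorting procedure NSGA-II itself uses.

```python
from typing import List, Sequence


def dominates(v: Sequence[float], u: Sequence[float]) -> bool:
    """True if v Pareto-dominates u, assuming every objective is maximized."""
    if len(v) != len(u):
        raise ValueError("score vectors must have the same length")
    no_worse = all(vi >= ui for vi, ui in zip(v, u))        # condition 1
    strictly_better = any(vi > ui for vi, ui in zip(v, u))  # condition 2
    return no_worse and strictly_better


def nondominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective scores for five individuals.
scores = [(3, 1), (2, 2), (1, 3), (1, 1), (2, 1)]
front = nondominated(scores)
print(front)  # [(3, 1), (2, 2), (1, 3)]
```

Here (1, 1) is dominated by (2, 2) and (2, 1) is dominated by (3, 1), so only the three trade-off points remain.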
Constructive Neuroevolution
Genetic Algorithms + Neural Networks
Build structure incrementally (complexification)
Good at generating control policies
Three basic mutations (no crossover used)
  Perturb Weight
  Add Connection
  Add Node
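As a rough sketch of the three mutations, the following Python uses a hypothetical `Genome` encoding (a node count plus a list of weighted links). The encoding, the Gaussian noise, the weight ranges, and the choice to implement Add Node by splitting an existing link are all assumptions for illustration; the slides do not specify them.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Genome:
    """Hypothetical network encoding: node ids 0..num_nodes-1 plus weighted links."""
    num_nodes: int
    links: List[Tuple[int, int, float]] = field(default_factory=list)  # (source, target, weight)


def perturb_weight(g: Genome, rng: random.Random, scale: float = 0.5) -> None:
    """Add Gaussian noise to one randomly chosen link weight."""
    i = rng.randrange(len(g.links))
    src, dst, w = g.links[i]
    g.links[i] = (src, dst, w + rng.gauss(0.0, scale))


def add_connection(g: Genome, rng: random.Random) -> None:
    """Link two randomly chosen nodes with a random weight (duplicates not checked)."""
    src, dst = rng.randrange(g.num_nodes), rng.randrange(g.num_nodes)
    g.links.append((src, dst, rng.uniform(-1.0, 1.0)))


def add_node(g: Genome, rng: random.Random) -> None:
    """Split a random link into source -> new node -> target (assumed scheme)."""
    src, dst, w = g.links.pop(rng.randrange(len(g.links)))
    new = g.num_nodes
    g.num_nodes += 1
    g.links.append((src, new, 1.0))
    g.links.append((new, dst, w))


rng = random.Random(0)
genome = Genome(num_nodes=3, links=[(0, 2, 0.4), (1, 2, -0.7)])
add_connection(genome, rng)  # structure grows by one link
add_node(genome, rng)        # one link is replaced by a node and two links
perturb_weight(genome, rng)  # structure unchanged, one weight nudged
print(genome.num_nodes, len(genome.links))  # 4 4
```

Only the structural mutations change the size of the network, which is what lets structure be built up incrementally from a small starting point.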
Targeting Unachieved Goals (TUG)
Main ideas:
  Temporarily deactivate “easy” objectives
  Focus on “hard” objectives
“Hard” and “easy” defined in terms of goal values
  Easy: average fitness “persists” above goal (achieved)
  Hard: goal not yet achieved
Objectives reactivated when no longer achieved
Increase goal values when all achieved
[Figure: evolution focused on the hard objectives]
TUG Example
[Figure: goal achieved; other goals also achieved → goals increase; reset recency-weighted average; noisy evaluations]
Behavioral Diversity (BD)
Originally developed for single-objective tasks [3]
  Add behavioral diversity objective
  Encourage exploration of new behaviors
  Domain-specific behavior measure required
Extensions in this work:
  Multiobjective task
  Domain independent method
  Only requires policy mapping $\mathbb{R}^N$ to $\mathbb{R}^M$, e.g. NNs
[Figure: policy mapping N senses to M actions]
[3] J.-B. Mouret and S. Doncieux. Using behavioral exploration objectives to solve deceptive problems in neuro-evolution. 2009.
Behavioral Diversity Details
Behavior vector:
  Given input vectors, concatenate outputs
Behavioral diversity objective:
  AVG distance from other behavior vectors
[Figure: example numeric vectors and the resulting behavior vector (2.4 4.3 0.7 4.2 2.1 3.5 …); a point with high average distance from other points]
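The behavior vector and diversity objective can be sketched as follows. This is a minimal illustration, not the authors' code: Euclidean distance is an assumption (the slide only says "AVG distance"), and the toy linear policies, the input count, and the names `behavior_vector` and `behavioral_diversity` are invented for the example.

```python
import math
import random
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], List[float]]


def behavior_vector(policy: Policy, inputs: List[Sequence[float]]) -> List[float]:
    """Concatenate the policy's outputs for each input vector."""
    vec: List[float] = []
    for x in inputs:
        vec.extend(policy(x))
    return vec


def behavioral_diversity(vectors: List[List[float]]) -> List[float]:
    """Average Euclidean distance from each behavior vector to all the others."""
    n = len(vectors)
    if n < 2:
        return [0.0] * n
    scores = []
    for i, a in enumerate(vectors):
        total = sum(math.dist(a, b) for j, b in enumerate(vectors) if j != i)
        scores.append(total / (n - 1))
    return scores


def make_linear_policy(w: float) -> Policy:
    """Toy stand-in for a neural network: 3 inputs in, 2 outputs out."""
    return lambda x: [w * sum(x), -w * sum(x)]


rng = random.Random(1)
# The same random input vectors are shown to every policy in the population.
inputs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
population = [make_linear_policy(w) for w in (0.1, 0.2, 5.0)]
vectors = [behavior_vector(p, inputs) for p in population]
bd = behavioral_diversity(vectors)
print(bd)
```

The third policy behaves very differently from the other two, so it receives the highest diversity score; in the multiobjective setting this score would be one more objective alongside the domain objectives.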
Battle Domain
Evolved monsters (blue)
  Monsters can hurt fighter
Scripted fighter (green)
  Bat can hurt monsters
Three objectives
  Deal damage
  Avoid damage
  Stay alive
Previous work required incremental evolution to solve
Experimental Comparison
NN copied to 4 monsters
  Homogeneous teams
In paper
  Control: Plain NSGA-II
  TUG: NSGA-II with TUG using expert initial goals
  BD: NSGA-II with BD using random input vectors
Additional methods since publication
  TUG-Low: NSGA-II with TUG using minimal initial goals
  BD-Obs: NSGA-II with BD using inputs from evaluations
Each repeated 30 times
Attainment Surfaces [4]
Result attainment surface
  Shows space dominated by single Pareto front
Summary attainment surface s
  Union of space dominated in at least s out of n runs
  Surface s weakly dominates s+1, etc.
[Figure: Pareto fronts (approximation sets), result attainment surfaces, and summary attainment surfaces 1 to 3; individual surfaces intersect]
[4] J. Knowles. A summary-attainment surface plotting method for visualizing the performance of stochastic multiobjective optimizers. 2005.
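One way to read the "at least s out of n runs" definition is as a membership test for a single point. The sketch below assumes maximization and treats a point as attained by a run when some solution on that run's front is at least as good in every objective; the function names and the three small fronts are hypothetical, and this is not the plotting method of [4].

```python
from typing import List, Sequence

Point = Sequence[float]


def attains(front: List[Point], p: Point) -> bool:
    """True if some solution in the front weakly dominates p (maximization)."""
    return any(all(fi >= pi for fi, pi in zip(f, p)) for f in front)


def attainment_count(fronts: List[List[Point]], p: Point) -> int:
    """Number of runs whose Pareto front attains the point p."""
    return sum(attains(front, p) for front in fronts)


def in_summary_surface(fronts: List[List[Point]], p: Point, s: int) -> bool:
    """True if p lies in the space dominated in at least s of the runs."""
    return attainment_count(fronts, p) >= s


# Hypothetical Pareto fronts from n = 3 runs with two objectives.
runs = [
    [(4, 1), (2, 3)],
    [(3, 2)],
    [(1, 4), (3, 1)],
]
print(attainment_count(runs, (2, 2)))       # 2
print(in_summary_surface(runs, (2, 2), 2))  # True
print(in_summary_surface(runs, (2, 2), 3))  # False
```

Because any point attained in at least s+1 runs is also attained in at least s runs, the region for surface s always contains the region for surface s+1.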
Final Summary Attainment Surfaces
[Figure: panels for Control, TUG, BD, TUG-Low, and BD-Obs; animation from worst to best summary attainment surface]
Hypervolume Metric [5]
Hypervolume of result attainment surface
  Simply “volume” for 3 domain objectives
WRT reference point
  Slightly less than minimum scores
Pareto-compliant metric
  $F_1 \succ F_2 \Rightarrow HV(F_1) > HV(F_2)$
[Figure: Hypervolume = A + B + C + D]
[5] E. Zitzler and L. Thiele. Multiobjective optimization using evolutionary algorithms – a comparative case study. 1998.
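To make the metric concrete, here is a small sketch that computes hypervolume for two maximized objectives, where it reduces to an area built from rectangular strips, much like the A + B + C + D regions in the figure. The two-objective restriction is a simplification for illustration (the domain in these slides has three objectives), and `hypervolume_2d` and the sample fronts are hypothetical.

```python
from typing import List, Tuple


def hypervolume_2d(front: List[Tuple[float, float]], ref: Tuple[float, float]) -> float:
    """Area dominated by the front and bounded below by ref (maximization)."""
    # Keep only points that strictly improve on the reference point.
    pts = [p for p in front if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:
            # New horizontal strip between the previous height and y.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


better = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
worse = [(2.0, 1.0), (1.0, 2.0)]  # every point here is dominated by a point in `better`
print(hypervolume_2d(better, ref=(0.0, 0.0)))  # 6.0
print(hypervolume_2d(worse, ref=(0.0, 0.0)))   # 3.0
```

The dominated front receives the smaller value, which is the Pareto-compliance property stated on the slide.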
Hypervolume
Successful Behaviors
[Videos: BD, BD-Obs, TUG, TUG-Low]
Discussion
Control: more extreme trade-offs
BD: more precise timing
BD-Obs and BD similar
  “Real” inputs give no advantage
TUG: more teamwork
  Particular initial objectives
TUG-Low more like BD than TUG
ALL are better than Control
Future Work
How to combine TUG and BD
  Naïve combination doesn’t work
Scaling up
  Many objectives
  More complex domains
  Current work in Unreal Tournament promising
Conclusion
BD and TUG improve MO evolution
Domain independence!
  Contrast to task-based shaping
Expand MOEAs to a new range of domains
Questions?
Email: [email protected]@cs.utexas.edu
See movies at: http://nn.cs.utexas.edu/?fitness-shaping
TUG Details
Persistence:
  Recency-weighted average $r_t$ surpasses goal $g_o$
  $r_t \leftarrow r_{t-1} + \alpha \, (x_t - r_{t-1})$
Goals:
  Initial values based on domain knowledge
  Or simply the minimal values for objectives
  Increase each goal when all are achieved
  $g_o \leftarrow g_o + \eta \, (o_{\max} - g_o)$
Objectives reactivated when no longer achieved
[Figure: goal achieved]
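The bookkeeping described on this slide might look roughly like the following sketch. Several details are assumptions rather than anything stated in the slides: the step-size parameters `alpha` and `eta`, resetting each average to the objective's minimum score after goals increase, and the class and field names. The objective names and numbers in the usage example are invented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Objective:
    goal: float              # current goal value for this objective
    max_score: float         # best possible score in this objective
    min_score: float = 0.0   # worst possible score in this objective
    avg: float = 0.0         # recency-weighted average of population performance
    active: bool = True      # inactive objectives are ignored by selection


class TUG:
    """Minimal sketch of Targeting Unachieved Goals bookkeeping."""

    def __init__(self, objectives: Dict[str, Objective], alpha: float, eta: float):
        self.objectives = objectives
        self.alpha = alpha  # assumed step size for the recency-weighted average
        self.eta = eta      # assumed step size for goal increases
        for o in objectives.values():
            o.avg = o.min_score

    def update(self, population_avg: Dict[str, float]) -> List[str]:
        """Process one generation; return the objectives to use for selection."""
        for name, o in self.objectives.items():
            o.avg += self.alpha * (population_avg[name] - o.avg)
            # Achieved objectives are switched off; they switch back on
            # automatically if the average later falls to or below the goal.
            o.active = o.avg <= o.goal
        if not any(o.active for o in self.objectives.values()):
            # All goals achieved: raise every goal and start over.
            for o in self.objectives.values():
                o.goal += self.eta * (o.max_score - o.goal)
                o.avg = o.min_score  # assumed reset value
                o.active = True
        return [n for n, o in self.objectives.items() if o.active]


tug = TUG(
    {"damage": Objective(goal=10.0, max_score=100.0),
     "alive": Objective(goal=5.0, max_score=10.0)},
    alpha=0.5, eta=0.5,
)
step1 = tug.update({"damage": 4.0, "alive": 8.0})   # neither goal achieved yet
step2 = tug.update({"damage": 4.0, "alive": 8.0})   # "alive" achieved, so it is deactivated
step3 = tug.update({"damage": 30.0, "alive": 8.0})  # all achieved, so goals increase
print(step1, step2, step3)
print(tug.objectives["damage"].goal, tug.objectives["alive"].goal)  # 55.0 7.5
```

In a full system the returned list would decide which objectives the multiobjective selection step (NSGA-II in this work) considers in that generation.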
TUG Cycles