Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping


Jacob Schrum and Risto Miikkulainen

University of Texas at Austin

Department of Computer Science

Typical Uses of MOEAs

Where have MOEAs proven themselves?
- Wireless sensor networks (Woehrle et al, 2010)
- Groundwater management (Siegfried et al, 2009)
- Hydrologic model calibration (Tang et al, 2006)
- Epoxy polymerization (Deb et al, 2004)
- Voltage-controlled oscillator design (Chu et al, 2004)
- Multi-spindle gear-box design (Deb & Jain, 2003)
- Foundry casting scheduling (Deb & Reddy, 2001)
- Multipoint airfoil design (Poloni & Pediroda, 1997)
- Design of aerodynamic compressor blades (Obayashi, 1997)
- Electromagnetic system design (Michielssen & Weile, 1995)
- Microprocessor design (Stanley & Mudge, 1995)
- Design of laminated ceramic composites (Belegundu et al, 1994)

Many engineering/design problems!

New Domains for MOEAs

Simulated agents often face multiple objectives:
- Automatic discovery of intelligent behavior
- Video game opponents in Unreal Tournament (van Hoorn, 2009)
- Predator/prey scenarios (Schrum & Miikkulainen, 2009)
- Race car driving in TORCS (Agapitos et al, 2008)

Comparatively little work so far:
- Direct application of an MOEA is seldom successful
- Success often depends on "shaping"

What is Shaping?

- Term from behavioral psychology
- Identified by B. F. Skinner (1938)
- Task-based example: train a rat to press a lever
  - First reward proximity to the lever
  - Then reward any interaction with the lever
  - Then reward only actual pressing of the lever

Evolutionary Shaping

- Environment changes, making the task harder
- Evolution shapes behavior across generations
- Example: migration given continental drift [1]
  - Animals become accustomed to a short migration
  - Continental drift increases the distance of the migration
  - Ability to travel increasing distances is required
- EC models with incremental evolution (e.g. [2])

[1] B. F. Skinner. The shaping of phylogenic behavior. Experimental Analysis of Behavior. 1975.
[2] J. Schrum and R. Miikkulainen. Constructing complex NPC behavior via multiobjective neuroevolution. 2008.

[Figures: Arctic Tern; Atlantic Salmon]

Fitness-Based Shaping

- Not extensively used
- Little or no domain knowledge needed
- Multiobjective approach is a good fit
- Selection criteria change:
  - Exploiting ignored objectives (TUG)
  - Exploiting unfilled niches (BD)

[Figure: behavior space with crowded and uncrowded niches; objective space with a point that is dominated but exploits a mostly ignored objective]

Multiobjective Optimization

Pareto dominance: $\mathbf{v} \succ \mathbf{u} \iff \forall i \in \{1,\dots,n\}: v_i \ge u_i \;\wedge\; \exists i \in \{1,\dots,n\}: v_i > u_i$
- Assumes maximization
- Want nondominated points
- NSGA-II used in this work

What to evolve? NNs as control policies
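A minimal Python sketch of the dominance test and the nondominated filter, assuming every objective is maximized as on the slide. The names `dominates` and `nondominated` are illustrative. NSGA-II itself goes further: it sorts all points into successive fronts and breaks ties by crowding distance, and this sketch leaves both steps out.

```python
from typing import List, Sequence


def dominates(v: Sequence[float], u: Sequence[float]) -> bool:
    """True if v Pareto-dominates u (maximization): >= everywhere, > somewhere."""
    at_least_as_good = all(vi >= ui for vi, ui in zip(v, u))
    strictly_better = any(vi > ui for vi, ui in zip(v, u))
    return at_least_as_good and strictly_better


def nondominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Points not dominated by any other point (the first Pareto front)."""
    # A point never dominates itself, so comparing against the full list is safe.
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, `nondominated([(3, 1), (2, 2), (1, 1), (1, 3)])` keeps `(3, 1)`, `(2, 2)` and `(1, 3)`, because `(1, 1)` is dominated by `(2, 2)`.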

Constructive Neuroevolution

- Genetic algorithms + neural networks
- Build structure incrementally (complexification)
- Good at generating control policies
- Three basic mutations (no crossover used):
  - Perturb weight
  - Add connection
  - Add node
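The three mutations can be illustrated on a toy genome. The representation here (a node count plus a list of weighted links) is hypothetical and is not the authors' encoding. `add_node` splits an existing link in the NEAT style, which is an assumption about how Add Node works in this setting. `add_connection` ignores input/output and recurrence constraints to keep the example short. Crossover is omitted, matching the slide.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    src: int
    dst: int
    weight: float


@dataclass
class Genome:
    num_nodes: int                                   # nodes are numbered 0..num_nodes-1
    links: List[Link] = field(default_factory=list)


def perturb_weight(g: Genome, rng: random.Random, scale: float = 0.5) -> None:
    """Add Gaussian noise to one randomly chosen link weight."""
    if g.links:
        link = rng.choice(g.links)
        link.weight += rng.gauss(0.0, scale)


def add_connection(g: Genome, rng: random.Random) -> None:
    """Add a random link between two nodes that are not yet connected."""
    existing = {(l.src, l.dst) for l in g.links}
    candidates = [(a, b) for a in range(g.num_nodes) for b in range(g.num_nodes)
                  if a != b and (a, b) not in existing]
    if candidates:
        a, b = rng.choice(candidates)
        g.links.append(Link(a, b, rng.uniform(-1.0, 1.0)))


def add_node(g: Genome, rng: random.Random) -> None:
    """Split a random link a->b into a->new->b, keeping the old weight on new->b."""
    if not g.links:
        return
    old = g.links.pop(rng.randrange(len(g.links)))
    new = g.num_nodes
    g.num_nodes += 1
    g.links.append(Link(old.src, new, 1.0))
    g.links.append(Link(new, old.dst, old.weight))
```

Setting the incoming weight to 1.0 and reusing the old weight on the outgoing link keeps the new node's initial effect close to that of the link it replaces. This is the usual motivation for splitting links rather than adding isolated nodes.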

Targeting Unachieved Goals (TUG)

Main ideas:
- Temporarily deactivate "easy" objectives
- Focus on "hard" objectives
- "Hard" and "easy" are defined in terms of goal values
  - Easy: average fitness "persists" above the goal (achieved)
  - Hard: goal not yet achieved
- Objectives are reactivated when no longer achieved
- Goal values increase when all goals are achieved
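The TUG main ideas can be expressed in code as follows. This assumes each objective has a recency-weighted average and a goal value. Objectives whose average has not surpassed their goal stay active, and selection compares individuals only on those objectives. The helper names are hypothetical, and so is the fallback to all objectives when every goal is achieved. Neither is the authors' implementation.

```python
from typing import List, Sequence, Tuple


def active_objectives(averages: Sequence[float], goals: Sequence[float]) -> List[int]:
    """Indices of "hard" objectives: those whose average has not surpassed the goal."""
    return [i for i, (avg, goal) in enumerate(zip(averages, goals)) if avg <= goal]


def selection_objectives(averages: Sequence[float], goals: Sequence[float]) -> List[int]:
    """Objectives used for selection this generation.

    Assumption: if every goal is achieved, fall back to all objectives; the goals
    are then raised (not shown here), which makes objectives hard again.
    """
    active = active_objectives(averages, goals)
    return active if active else list(range(len(goals)))


def project(scores: Sequence[float], active: List[int]) -> Tuple[float, ...]:
    """Keep only the active components of one individual's fitness vector."""
    return tuple(scores[i] for i in active)
```

With averages `[5.0, 1.0, 3.0]` and goals `[4.0, 2.0, 3.0]`, objective 0 counts as easy and is dropped. Dominance is then computed on the projected vectors, so `(10.0, 20.0, 30.0)` is compared as `(20.0, 30.0)`.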

TUG Example

[Figure: objective averages over evolution, with hard objectives highlighted. Annotations: goal achieved; other goals also achieved → goals increase; reset recency-weighted average; noisy evaluations]

Behavioral Diversity (BD)

- Originally developed for single-objective tasks [3]
  - Add a behavioral diversity objective
  - Encourage exploration of new behaviors
  - Domain-specific behavior measure required
- Extensions in this work:
  - Multiobjective task
  - Domain-independent method
  - Only requires a policy mapping $\mathbb{R}^N$ (senses) to $\mathbb{R}^M$ (actions), e.g. NNs

[3] J.-B. Mouret and S. Doncieux. Using behavioral exploration objectives to solve deceptive problems in neuro-evolution. 2009.

Behavioral Diversity Details

- Behavior vector: given a set of input vectors, concatenate the outputs
- Behavioral diversity objective: average distance from other behavior vectors

[Figure: network outputs for several input vectors concatenated into one behavior vector; the highlighted point has a high average distance from the other points]
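The behavior vector and the diversity objective can be sketched as follows. This assumes a policy is any function from a sense vector to an action vector, and that "distance" means Euclidean distance, since the slides do not name a metric. `random_inputs` mimics the BD variant's fixed random input vectors. Its uniform [-1, 1] range and the seed are hypothetical. For BD-Obs, the input list would instead come from inputs observed during evaluations.

```python
import math
import random
from typing import Callable, List, Sequence

# A policy maps an N-dimensional sense vector to an M-dimensional action vector.
Policy = Callable[[Sequence[float]], List[float]]


def random_inputs(num_vectors: int, num_senses: int, seed: int = 0) -> List[List[float]]:
    """Fixed random input vectors shared by every policy (range and seed assumed)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(num_senses)] for _ in range(num_vectors)]


def behavior_vector(policy: Policy, inputs: List[Sequence[float]]) -> List[float]:
    """Concatenate the policy's outputs over the input vectors, in order."""
    vec: List[float] = []
    for x in inputs:
        vec.extend(policy(x))
    return vec


def diversity_scores(vectors: List[List[float]]) -> List[float]:
    """Average Euclidean distance from each behavior vector to all the others."""
    n = len(vectors)
    if n < 2:
        return [0.0] * n
    return [sum(math.dist(vectors[i], vectors[j]) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]
```

Every policy is probed with the same inputs, so behavior vectors line up component by component. The diversity score is then added as one more objective alongside the domain objectives.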

Battle Domain

- Evolved monsters (blue)
  - Monsters can hurt the fighter
- Scripted fighter (green)
  - The fighter's bat can hurt monsters
- Three objectives:
  - Deal damage
  - Avoid damage
  - Stay alive
- Previous work required incremental evolution to solve

Experimental Comparison

- NN copied to 4 monsters (homogeneous teams)
- In paper:
  - Control: plain NSGA-II
  - TUG: NSGA-II with TUG using expert initial goals
  - BD: NSGA-II with BD using random input vectors
- Additional methods since publication:
  - TUG-Low: NSGA-II with TUG using minimal initial goals
  - BD-Obs: NSGA-II with BD using inputs from evaluations
- Each repeated 30 times

Attainment Surfaces [4]

- Result attainment surface: shows the space dominated by a single Pareto front
- Summary attainment surface s: space dominated in at least s out of n runs
  - Surface s weakly dominates surface s+1, etc.

[Figure: Pareto fronts (approximation sets), their result attainment surfaces, and summary attainment surfaces 1, 2 and 3; individual surfaces intersect]

[4] J. Knowles. A summary-attainment surface plotting method for visualizing the performance of stochastic multiobjective optimizers. 2005.
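Summary attainment surfaces rest on a simple test. A point z is attained by a run if some point on that run's Pareto front weakly dominates z. The following Python sketch uses the maximization convention from earlier, and the function names are illustrative. The s-th summary attainment surface is the boundary of the region where `attained_by_at_least(fronts, z, s)` holds.

```python
from typing import List, Sequence


def attains(front: List[Sequence[float]], z: Sequence[float]) -> bool:
    """True if some point of the front weakly dominates z (maximization)."""
    return any(all(p_i >= z_i for p_i, z_i in zip(p, z)) for p in front)


def attained_by_at_least(fronts: List[List[Sequence[float]]],
                         z: Sequence[float], s: int) -> bool:
    """True if z is attained by at least s of the n runs' fronts."""
    return sum(attains(front, z) for front in fronts) >= s
```

Raising s shrinks this region, which is why surface s weakly dominates surface s+1.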

Final Summary Attainment Surfaces

[Figure: summary attainment surfaces for Control, TUG, BD, TUG-Low and BD-Obs; animation steps from the worst to the best summary attainment surface]

Hypervolume Metric [5]

- Hypervolume of the result attainment surface
  - Simply "volume" for the 3 domain objectives
  - Measured with respect to a reference point slightly less than the minimum scores
- Pareto-compliant metric: $F_1 \succ F_2 \Rightarrow HV(F_1) > HV(F_2)$

[Figure: Hypervolume = A + B + C + D]

[5] E. Zitzler and L. Thiele. Multiobjective optimization using evolutionary algorithms – a comparative case study. 1998.
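For small fronts, the hypervolume can be computed exactly by cutting objective space into a grid at every point coordinate and summing the cells dominated by some point. This brute-force method is chosen for clarity. It is not necessarily how the cited work computes hypervolume, and its cost grows exponentially with the number of objectives. The sketch assumes maximization and a reference point below all scores.

```python
import itertools
from typing import List, Sequence


def hypervolume(points: List[Sequence[float]], ref: Sequence[float]) -> float:
    """Exact hypervolume (maximization) as the volume of the union of boxes [ref, p].

    Cost is O(k^d * n) for n points, d objectives and k breakpoints per axis,
    so this is only meant for small fronts.
    """
    d = len(ref)
    # Breakpoints per dimension: the reference value plus every coordinate above it.
    axes = [sorted({ref[i]} | {p[i] for p in points if p[i] > ref[i]}) for i in range(d)]
    total = 0.0
    for cell in itertools.product(*(range(len(a) - 1) for a in axes)):
        upper = [axes[i][c + 1] for i, c in enumerate(cell)]
        # A cell is covered if some point weakly dominates its upper corner.
        if any(all(p[i] >= upper[i] for i in range(d)) for p in points):
            vol = 1.0
            for i, c in enumerate(cell):
                vol *= axes[i][c + 1] - axes[i][c]
            total += vol
    return total
```

With points `(1, 2)` and `(2, 1)` and reference `(0, 0)`, the union of the two rectangles has area `2 + 2 - 1 = 3`. Adding a dominating point can only grow this union, which is the intuition behind Pareto compliance.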

Hypervolume

Successful Behaviors

[Videos: successful behaviors evolved with BD, BD-Obs, TUG and TUG-Low]

Discussion

- Control: more extreme trade-offs
- BD: more precise timing
- BD-Obs and BD similar
  - "Real" inputs give no advantage
- TUG: more teamwork
  - Particular initial objectives
- TUG-Low more like BD than TUG
- All are better than Control

Future Work

- How to combine TUG and BD
  - Naïve combination doesn't work
- Scaling up
  - Many objectives
  - More complex domains
  - Current work in Unreal Tournament promising

Conclusion

- BD and TUG improve multiobjective evolution
- Domain independence!
  - Contrast to task-based shaping
- Expand MOEAs to a new range of domains

Questions?

Email: [email protected]@cs.utexas.edu

See movies at:

http://nn.cs.utexas.edu/?fitness-shaping


TUG Details

- Persistence: the recency-weighted average $r_t$ of an objective surpasses its goal, where
  $r_{t+1} = r_t + \alpha\,(x_{t+1} - r_t)$ with step size $\alpha$
- Goals:
  - Initial values based on domain knowledge
  - Or simply the minimal values for the objectives
  - Increase each goal $g_o$ when all are achieved: $g_o \leftarrow g_o + \beta\,(\max_o - g_o)$
- Objectives reactivated when no longer achieved
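The TUG update rules can be sketched as a small state object. This assumes $x_{t+1}$ is the population's average score in each objective for the new generation, and it reads the goal-increase rule as $g_o \leftarrow g_o + \beta(\max_o - g_o)$. Both readings are assumptions, and the default values of `alpha` and `beta` are hypothetical. The TUG example also resets the recency-weighted average after goals increase. That reset is left out here because the slides do not say what value it resets to.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TugState:
    goals: List[float]      # current goal g_o for each objective
    averages: List[float]   # recency-weighted average r_t for each objective
    alpha: float = 0.2      # step size of the average (hypothetical value)
    beta: float = 0.5       # fraction of the gap to max_o added to a goal (hypothetical)

    def achieved(self) -> List[bool]:
        """An objective is achieved while its average surpasses its goal."""
        return [r > g for r, g in zip(self.averages, self.goals)]

    def update(self, avg_fitness: Sequence[float], max_scores: Sequence[float]) -> bool:
        """Fold in one generation's average fitness; raise all goals if all are achieved.

        Returns True when the goals were increased.
        """
        # r_{t+1} = r_t + alpha * (x_{t+1} - r_t)
        self.averages = [r + self.alpha * (x - r) for r, x in zip(self.averages, avg_fitness)]
        if all(self.achieved()):
            # g_o <- g_o + beta * (max_o - g_o)
            self.goals = [g + self.beta * (m - g) for g, m in zip(self.goals, max_scores)]
            return True
        return False
```

The recency-weighted average smooths noisy evaluations. A single lucky generation is therefore unlikely to mark a goal as achieved, which matches the idea of fitness "persisting" above the goal.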

TUG Cycles

[Figure: TUG cycles; annotation: goal achieved]
