Evolution of Teamwork in Multiagent Systems

Research Preparation Examination

by Jacob Schrum

Why Multiple Agents?

• Many applications
  – Physical World: robotics, autonomous automobiles, military applications, network systems
  – Artificial World: games, graphics, entertainment, artificial life

Why Multiagent Perspective?

• Decentralized control
  – Failure recovery
  – Individual agents simpler than whole
  – Some environments don’t support central control
• Human interaction
  – Humans are also agents
  – Agents interacting with humans are in MAS

Teamwork in Multiagent Systems

• Problem divided amongst many agents
• Teamwork often required for success
• Communication sometimes an issue
• How to learn teamwork: open question

Direct Approach: Careful Design

• Hand code everything
• Benefits:
  – Understand end product
• Drawbacks:
  – Not general
  – Difficult
  – Programmer time
• Common in:
  – Robotics
  – Video games
  – Most deployed systems
• What if no one knows how to program it?

Learn it: Reinforcement Learning

• Environment is Markov Decision Process
• Learn optimal policy
  – Depends on value function (TD methods)
  – Proven convergence in tabular case
  – Function approximation needed for bigger problems
• Problems with Partially Observable MDPs
• Successes in
  – Predator/prey scenarios (Tan 1993)
  – Soccer keepaway (Kalyanakrishnan, Stone 2009)
  – RoboCup soccer (many…)
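
The value-function idea behind TD methods can be illustrated with one tabular Q-learning update. This is a minimal sketch and not code from the talk: the function name `q_update`, the learning rate `alpha`, the discount `gamma`, and the toy states are all illustrative assumptions.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning (temporal-difference) update in place.

    q maps (state, action) pairs to value estimates; unseen pairs default to 0.
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # TD error: gap between the bootstrapped target and the current estimate.
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Hypothetical two-action example starting from an empty table.
q_table = defaultdict(float)
new_value = q_update(
    q_table, "s0", "left", reward=1.0, next_state="s1", actions=["left", "right"]
)
```

The table grows with the number of state-action pairs, which is why larger problems need function approximation instead.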

Breed it: Evolution

• Based on evolution via natural selection
• Benefits:
  – Less restrictive policy representation
  – Demonstrated success in POMDP domains
• Drawbacks:
  – Computationally intensive
  – Time intensive
• Focus of talk

Evolution Basics

1. Initialize population P
2. Evaluate all p in P (assign fitness)
3. Derive P’ by selecting/modifying members of P based on their fitness scores
4. Repeat from step 2 with P’ as P until done

• P’ is usually similar to P, but slightly better
• Many variations:
  – Genetic Algorithms, Evolution Strategies, etc.
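
The four steps can be sketched as a short generational loop. This is one possible instance, not the method used in the talk: truncation selection (keep the better half), Gaussian mutation, and the toy fitness function are assumptions chosen for brevity.

```python
import random


def evolve(fitness, init, mutate, pop_size=20, generations=50, seed=0):
    """Generational loop following the four steps (truncation selection is an assumption)."""
    rng = random.Random(seed)
    # Step 1: initialize population P.
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: evaluate all p in P and rank them by fitness.
        scored = sorted(population, key=fitness, reverse=True)
        # Step 3: derive P' by keeping the better half and adding mutated copies.
        parents = scored[: pop_size // 2]
        children = [
            mutate(rng.choice(parents), rng) for _ in range(pop_size - len(parents))
        ]
        # Step 4: P' becomes P for the next generation.
        population = parents + children
    return max(population, key=fitness)


# Toy problem (hypothetical): maximize -(x - 3)^2, whose optimum is x = 3.
best = evolve(
    fitness=lambda x: -(x - 3.0) ** 2,
    init=lambda rng: rng.uniform(-10.0, 10.0),
    mutate=lambda x, rng: x + rng.gauss(0.0, 0.5),
)
```

Because the parents survive unchanged, the best fitness never decreases between generations, which matches the remark that P’ is usually similar to P but slightly better.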

Evolution in Multiagent Systems

1. Team Composition
   A. Homogeneous
   B. Heterogeneous
   C. Heterogeneous from Subpopulations (pick one member from each subpopulation to make a team)
   D. Entire population
2. Type of Selection
   A. Individual
   B. Team
   C. Self-Selection
3. Multiple Objectives

1.A. Homogeneous Teams

• Team members share same policy
• Members know what to expect from team members
• One individual evaluated per trial
• Evaluations reliable because of consistent team composition

1.B. Heterogeneous Teams

• Team composed of several policies
• Uncertainty as to who teammates will be
• Multiple individuals evaluated per trial
• Evaluation differs depending on choice of team members

1.C. Subpopulations

• Each slot filled by representative from specific subpopulation
• Subpopulations specialize
• Individuals know what to expect of members in each slot
• Team composition is still heterogeneous

1.D. Entire Population

• The entire population is seen as a cooperating team
• Team level selection not possible
• Population may divide into competing subpopulations
  – Mating restrictions
  – Genetic/Tag-based recognition
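
The team compositions 1.A through 1.C differ mainly in how a team is assembled for a trial. The sketch below illustrates that difference only; the function names and the string stand-ins for policies are hypothetical. In case 1.D no assembly step is needed, because the whole population is the team.

```python
import random


def homogeneous_team(individual, team_size):
    """1.A: every slot is filled by a copy of the same policy."""
    return [individual] * team_size


def heterogeneous_team(population, team_size, rng):
    """1.B: slots are filled by distinct members drawn from one population."""
    return rng.sample(population, team_size)


def subpopulation_team(subpopulations, rng):
    """1.C: slot i is always filled by a representative of subpopulation i."""
    return [rng.choice(subpop) for subpop in subpopulations]


rng = random.Random(0)
population = ["p0", "p1", "p2", "p3", "p4", "p5"]
subpops = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"]]

team_a = homogeneous_team("p0", 3)
team_b = heterogeneous_team(population, 3, rng)
team_c = subpopulation_team(subpops, rng)
```

In `team_b` a member cannot predict its teammates, while in `team_c` each slot always comes from the same subpopulation, which is what lets subpopulations specialize.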

2.A. Individual Selection

• Individuals selected based on own fitness
• Commonly used with heterogeneous teams
• Can result in selfish behaviors
• Altruism relevant
  – Sacrificing own fitness to raise fitness of another
• Reciprocity relevant
  – Helping another to get help in return

2.B. Team Selection

• Individuals selected based on team fitness
  – Common fitness, sum, average, etc.
• Commonly used with homogeneous teams
• Enables slackers in heterogeneous teams
• Altruism and reciprocity have no meaning
• No credit assignment problems between members
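
The contrast between individual and team selection comes down to how scores from a trial are assigned to members. A minimal sketch follows, assuming each member's contribution can be measured as a single number; the function names and the example values are hypothetical.

```python
def individual_fitness(contributions):
    """2.A: each member is scored on its own contribution."""
    return list(contributions)


def team_fitness(contributions, aggregate="sum"):
    """2.B: every member receives the same score derived from the whole team."""
    total = sum(contributions)
    shared = total if aggregate == "sum" else total / len(contributions)
    return [shared] * len(contributions)


# Hypothetical trial: two productive members and one slacker.
contributions = [5.0, 4.0, 0.0]
per_individual = individual_fitness(contributions)
per_team = team_fitness(contributions, aggregate="average")
```

Under team selection the slacker receives the same score as its teammates, which is how slackers can persist in heterogeneous teams. With homogeneous teams all members share one policy, so the shared score raises no credit assignment question.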

2.C. Self-Selection

• Individuals choose when and with whom to mate
• Common in Artificial Life simulations
  – AL studies emergence of biological phenomena
  – Usually involves a spatial component
• Extinction is possible
  – Auto restart
  – Spawn new members

3. Multiple Objectives

• Assume individual has fitness scores:
  – F = (f1,…,fN) in objectives 1 through N
• Which values of F are best?
• Traditional approach
  – fitness(F) = f1*w1 + … + fN*wN for weights w1,…,wN
• Pareto-based approach
  – Partition population into non-dominated Pareto fronts
  – Assign fitness based on Pareto-front
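
The two approaches can be compared on a small example. The sketch below assumes every objective is maximized; the function names and the score vectors are illustrative, not taken from the talk.

```python
def weighted_sum(scores, weights):
    """Traditional approach: collapse the objective vector F into one number."""
    return sum(f * w for f, w in zip(scores, weights))


def dominates(a, b):
    """Pareto dominance (maximization): a is no worse everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


# Hypothetical score vectors in two objectives.
f_a = (3.0, 1.0)
f_b = (1.0, 3.0)
f_c = (1.0, 1.0)

equal_weights = (0.5, 0.5)
sum_a = weighted_sum(f_a, equal_weights)
sum_b = weighted_sum(f_b, equal_weights)
```

With equal weights `f_a` and `f_b` receive the same scalar fitness, and changing the weights changes which one wins. Under Pareto dominance neither dominates the other, both dominate `f_c`, and no weights need to be chosen.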

Pareto Front Example

• Each point represents an individual’s scores
• Point dominates other points in its box
• 3 Pareto fronts of non-dominated points
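
Partitioning a population into fronts can be sketched by repeatedly peeling off the non-dominated points. This is a simple quadratic-time illustration assuming maximization in every objective; the point coordinates are made up and are not the ones plotted on the slide.

```python
def dominates(a, b):
    """Pareto dominance (maximization): a is no worse everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(points):
    """Peel off successive non-dominated fronts until no points remain."""
    remaining = list(points)
    fronts = []
    while remaining:
        # A point belongs to the current front if nothing left dominates it.
        front = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts


# Hypothetical scores in two objectives.
points = [(1, 5), (3, 3), (5, 1), (1, 3), (3, 1), (1, 1)]
fronts = pareto_fronts(points)
```

For these points the result is three fronts, and fitness can then be assigned by front index so that members of earlier fronts are preferred.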

Case Studies

• Review State of the Art
• For each study:
  – Classify type of selection
  – Classify team composition
  – Identify unanswered questions
  – Future research directions

AntFarm

• Evolve foraging behavior
  – Pheromones to communicate
• Individual selection
• Entire population as a team
• No cooperative foraging!
  – Likely cause: individual selection

• Individual selection offers less incentive for teamwork
• Teamwork especially difficult when there is only one team

* AntFarm: Towards Simulated Evolution. Collins, Jefferson. 1991

Evolving Communication

• Exploration task
  – Pheromones to communicate
• Team selection
• Homogeneous teams vs. static bots
• Pairs of objectives, Pareto-based
• Different behaviors in different runs
  – Compromise strategy
  – Blocking strategy

• Teamwork possible with homogeneous teams
• Need to move beyond grid-worlds
• Move beyond two objectives

* Emergence of Communication in Competitive Multi-Agent Systems: A Pareto Multi-Objective Approach. McPartland, Nolfi, Abbass. 2005

SwarmEvolveTags

• Birds visit food stations
• Energy can be shared
  – Sharing based on tags
• Self-selection
• Entire population as team
  – Competing subpopulations emerged

• Cooperation in entire population without team selection
• Altruism via aiding similar individuals
• Teamwork as a result of subpopulation homogeneity

* Tags and the Evolution of Cooperation in Complex Environments. Spector, Klein, Perry. 2004

* Evolution of cooperation without reciprocity. Riolo, Cohen, Axelrod. 2001

Legion-I

• Roman legions defend countryside and cities
• Team level selection
• Homogeneous teams
• Multi-modal behavior
  – Defend city
  – Pursue barbarians

• Homogeneous team members must fill all roles
• Could not learn more complicated/strategic tasks
  – Example: building roads to speed up travel

* Neuroevolution for Adaptive Teams. Bryant, Miikkulainen. 2003

Role-Based Cooperation

• Toroidal predator/prey grid world
• Individual selection
  – Team fitness shared by team members
• Multi-Agent ESP: subpopulation based
• Simple non-communicating method outperforms communicating method

• Teamwork without homogeneity
• Communication not always needed
• May only apply to simple domains
• Still need to scale up complexity
• Get away from grid worlds

* Coevolution of Role-Based Cooperation in Multi-Agent Systems. Yong, Miikkulainen. 2007

NERO

• Machine Learning game
  – Human interaction via fitness function
• Individual selection
• Entire population is team
• Multiple objectives
  – User defines weights dynamically

• Maintenance of fitness function
• Old behaviors can be forgotten when learning new ones
• Need to learn multiple tasks simultaneously

* Evolving Neural Network Agents in the NERO Videogame. Stanley, Bryant, Miikkulainen. 2005

Pareto Multi-objective NPCs

• Evolved monsters vs. bot with stick
• Individual selection
• Large heterogeneous teams of 15
  – Third of entire population
• Multiple objectives, Pareto-based
  – Credit assignment trick

• Learns multiple objectives simultaneously
• Different runs can lead to very different results
  – Different areas of trade-off surface
• Population becomes mostly homogeneous

* Constructing Complex NPC Behavior via Multi-Objective Neuroevolution. Schrum, Miikkulainen. 2008

Dead End Game

• Human prey vs. predators
• Offline evolution vs. bot
  – Team level selection
  – Homogeneous teams
• Online evolution vs. human
  – Individual selection
  – Small heterogeneous team

• Different configurations appropriate at different levels
• Sometimes the domain leaves no choice

* Interactive Opponents Generate Interesting Games. Yannakakis, Hallam. 2004

Cooperating Robots

• Retrieve tokens
• Simulation → Robots
• Compared selection levels
  – Individual vs. Team
• Compared team compositions
  – Homogeneous vs. heterogeneous

• Homogeneous better with teamwork and altruism
• Homogeneous best with team selection
• Heterogeneous best with individual selection
• Did not consider subpopulations
• Tasks only involved foraging (no other objectives)

* Genetic Team Composition and Level of Selection in the Evolution of Cooperation. Waibel, Keller, Floreano. 2008

Summary of Issues

• More complexity
  – Move beyond grid worlds
  – Need multiple contradictory objectives
  – Act in continuous, real-time world
• Best evolutionary configuration
  – More comparisons between team compositions, especially subpopulation-based method
  – Task/configuration pairings?
  – Credit assignment issues
• Multi-modal behavior
  – What to do and when

Experiment

• Four monsters vs. bot with stick
  – Smaller team makes task harder
• Compare homogeneous, heterogeneous and subpopulation
  – Homogeneous uses team selection
  – Others use individual selection
• Multiple objectives:
  – Group damage
  – Individual injury
  – Individual time alive

Heterogeneous Results

• Many generations (600+)
  – Not that long in real time
• Mostly selfish
  – Good teamwork can arise though (Baiting)
• Teamwork depends on population being homogeneous

[Figure labels: Teamwork, Selfish]

Homogeneous Results

• Fewer generations (100-200)
  – Actually longer in real time
• Always some form of teamwork
  – Baiting
  – Timed Assault

[Figure labels: Baiting, Timed Assault]

Subpopulations Results

• Many generations (400+)
• Each generation takes a lot of real time
• Easy for slacker subpopulation to persist
• Limited teamwork
  – Only some members participate

[Figure label: Cooperating Pair]

Discussion

• Can subpopulation method do better?
  – Better credit assignment
  – Team level selection (how?)
• Speed up homogeneous and subpopulations
• Heterogeneous: discourage selfishness

Future Research Questions

• Credit assignment issues
  – Cooperating individuals cannot be identified
  – Objectives define best evolutionary configuration?
• Complex domains/real problems
  – Many objectives
  – Continuous, real-time
• Potential challenge domains
  – RoboCup Soccer
  – Unreal Tournament

Conclusion

• Teamwork in Multiagent Systems important area
• Evolution has been successful
• Better understand why
  – Team configuration
  – Level of selection
  – Presence/absence of credit assignment problems
• Apply to harder domains
  – Real-time
  – Continuous/noisy
  – Multiple contradictory objectives

[email protected]@cs.utexas.edu

Auxiliary Slides

Cooperation Without Reciprocity

• Abstract study of the evolution of cooperation
• Donor/recipient model
• 3 random pairings with option of donating fitness c so that recipient can gain fitness b
• Choice to donate based on similarity of tags
• Individual selection with entire population as team
  – Subpopulations emerged based on tags

• Donation rate changes cyclically, but generally stays high (73%) for c < b
• Need to apply in actual domain requiring teamwork

* Evolution of cooperation without reciprocity. Riolo, Cohen, Axelrod. 2001
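
The donor/recipient interaction can be sketched as follows. The slide says only that donation depends on tag similarity, so the specific rule used here (donate when the tag difference is within the donor's own tolerance) is an assumption, as are the class name, the population size, and the cost and benefit values. Reproduction and selection are omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    tag: float        # arbitrary visible marker in [0, 1]
    tolerance: float  # how different a recipient's tag may be (assumed rule)
    score: float = 0.0


def maybe_donate(donor, recipient, cost, benefit):
    """Donor pays `cost` so the recipient gains `benefit`, if the tags are similar enough."""
    if abs(donor.tag - recipient.tag) <= donor.tolerance:
        donor.score -= cost
        recipient.score += benefit
        return True
    return False


def play_round(agents, pairings, cost, benefit, rng):
    """Give every agent `pairings` chances to donate to a randomly chosen other agent."""
    donations = 0
    for donor in agents:
        others = [a for a in agents if a is not donor]
        for _ in range(pairings):
            donations += maybe_donate(donor, rng.choice(others), cost, benefit)
    # Fraction of pairings that ended in a donation.
    return donations / (len(agents) * pairings)


rng = random.Random(0)
agents = [Agent(tag=rng.random(), tolerance=rng.random()) for _ in range(50)]
rate = play_round(agents, pairings=3, cost=0.1, benefit=1.0, rng=rng)
```

Here `rate` is the donation rate for a single round with random tags and tolerances. Reproducing the cyclic behavior described on the slide would require adding selection and mutation over many generations.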

Cooperation Without Reciprocity Results

Team Composition in MAS

• Taxonomy proposed by Stone*:
  – Heterogeneous Communicating Agents
  – Heterogeneous Non-communicating Agents
  – Homogeneous Communicating Agents
  – Homogeneous Non-communicating Agents
• Definition of communication is broad:
  – Message passing, blackboard, information sharing, etc.

* Multiagent Systems: A Survey from a Machine Learning Perspective. Stone. 2000
