Evolution of Teamwork in Multiagent Systems
Research Preparation Examination
by Jacob Schrum

Why Multiple Agents?
• Many applications
  – Physical World
    • Robotics
    • Autonomous automobiles
    • Military applications
    • Network Systems
  – Artificial World
    • Games
    • Graphics
    • Entertainment
    • Artificial Life

Why Multiagent Perspective?
• Decentralized control
  – Failure recovery
  – Individual agents simpler than whole
  – Some environments don’t support central control
• Human interaction
  – Humans are also agents
  – Agents interacting with humans are in MAS

Teamwork in Multiagent Systems
• Problem divided amongst many agents
• Teamwork often required for success
• Communication sometimes an issue
• How to learn teamwork: open question

Direct Approach: Careful Design
• Hand code everything
• Benefits:
  – Understand end product
• Drawbacks:
  – Not general
  – Difficult
  – Programmer time
• Common in:
  – Robotics
  – Video games
  – Most deployed systems
• What if no one knows how to program it?

Learn it: Reinforcement Learning
• Environment is Markov Decision Process
• Learn optimal policy
  – Depends on value function (TD methods)
  – Proven convergence in tabular case
  – Function approximation needed for bigger problems
• Problems with Partially Observable MDPs
• Successes in
  – Pred/Prey Scenarios (Tan 1993)
  – Soccer keep away (Kalyanakrishnan, Stone 2009)
  – Robocup soccer (many…)
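
Note: for readers unfamiliar with TD methods, the standard tabular TD(0) value update is shown below as a reminder. The symbols are not from the slides: \alpha is an assumed learning rate, \gamma an assumed discount factor, and r_{t+1} the reward observed after leaving state s_t.

```latex
V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]
```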

Breed it: Evolution
• Based on evolution via natural selection
• Benefits:
  – Less restrictive policy representation
  – Demonstrated success in POMDP domains
• Drawbacks:
  – Computationally intensive
  – Time intensive
• Focus of talk

Evolution Basics
1. Initialize population P
2. Evaluate all p in P (assign fitness)
3. Derive P’ by selecting/modifying members of P based on their fitness scores
4. Repeat from step 2 with P’ as P until done
• P’ is usually similar to P, but slightly better
• Many variations:
  – Genetic Algorithms, Evolution Strategies, etc.
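
Note: the four steps above can be sketched as a short loop. This is a minimal illustration, not the method used in any of the studies discussed here. The real-valued genome, truncation selection, Gaussian mutation, the toy `sphere` objective, and all parameter values are assumptions chosen for brevity.

```python
import random

def evolve(fitness, genome_len=10, pop_size=20, generations=50, seed=0):
    """Minimal generational loop: evaluate, select, mutate, repeat."""
    rng = random.Random(seed)
    # Step 1: initialize population P with random real-valued genomes.
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: evaluate all p in P (higher fitness is better).
        ranked = sorted(population, key=fitness, reverse=True)
        # Step 3: derive P' by keeping the better half (truncation selection)
        # and filling the rest with Gaussian-mutated copies of survivors.
        parents = ranked[:pop_size // 2]
        children = [[g + rng.gauss(0.0, 0.1) for g in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        # Step 4: repeat from step 2 with P' as P.
        population = parents + children
    return max(population, key=fitness)

def sphere(genome):
    # Toy objective (assumption): best possible value is 0, at the all-zero genome.
    return -sum(g * g for g in genome)

best = evolve(sphere)
```

Because the better half survives unchanged, the best fitness in P’ is never worse than in P, which matches the "similar to P, but slightly better" remark.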

Evolution in Multiagent Systems
1. Team Composition
   A. Homogeneous
   B. Heterogeneous
   C. Heterogeneous from Subpopulations (pick one member from each subpopulation to make a team)
   D. Entire population
2. Type of Selection
   A. Individual
   B. Team
   C. Self-Selection
3. Multiple Objectives

1.A. Homogeneous Teams
• Team members share same policy
• Members know what to expect from team members
• One individual evaluated per trial
• Evaluations reliable because of consistent team composition

1.B. Heterogeneous Teams
• Team composed of several policies
• Uncertainty as to who teammates will be
• Multiple individuals evaluated per trial
• Evaluation differs depending on choice of team members

1.C. Subpopulations
• Each slot filled by representative from specific subpopulation
• Subpopulations specialize
• Individuals know what to expect of members in each slot
• Team composition is still heterogeneous
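
Note: the three team compositions described so far (1.A to 1.C) differ only in how the slots of a team are filled. The sketch below makes that concrete. The function names and the string stand-ins for policies are hypothetical, and random sampling is just one assumed way to pick team members.

```python
import random

def homogeneous_team(policy, team_size):
    # 1.A: every slot runs the same policy.
    return [policy] * team_size

def heterogeneous_team(population, team_size, rng):
    # 1.B: slots are filled by distinct individuals drawn from one population.
    return rng.sample(population, team_size)

def subpopulation_team(subpopulations, rng):
    # 1.C: slot i is always filled by a member of subpopulation i.
    return [rng.choice(subpop) for subpop in subpopulations]

rng = random.Random(0)
population = [f"policy_{i}" for i in range(6)]
subpops = [[f"slot{s}_policy_{i}" for i in range(3)] for s in range(3)]

team_a = homogeneous_team(population[0], 3)
team_b = heterogeneous_team(population, 3, rng)
team_c = subpopulation_team(subpops, rng)
```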

1.D. Entire Population
• The entire population is seen as a cooperating team
• Team level selection not possible
• Population may divide into competing subpopulations
  – Mating restrictions
  – Genetic/Tag-based recognition

2.A. Individual Selection
• Individuals selected based on own fitness
• Commonly used with heterogeneous teams
• Can result in selfish behaviors
• Altruism relevant
  – sacrificing own fitness to raise fitness of another
• Reciprocity relevant
  – helping another to get help in return

2.B. Team Selection
• Individuals selected based on team fitness
  – Common fitness, sum, average, etc.
• Commonly used with homogeneous teams
• Enables slackers in heterogeneous teams
• Altruism and reciprocity have no meaning
• No credit assignment problems between members
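
Note: the difference between individual and team selection comes down to which fitness value each member is credited with. The sketch below is a hypothetical illustration; the function names and scores are assumptions. It shows why team selection can let a slacker persist in a heterogeneous team.

```python
def individual_fitness(scores):
    # Individual selection: each member is credited with its own score.
    return list(scores)

def team_fitness(scores, combine="average"):
    # Team selection: every member receives the same team-level value.
    total = sum(scores)
    value = total / len(scores) if combine == "average" else total
    return [value] * len(scores)

scores = [9.0, 6.0, 0.0]  # the third member contributes nothing

by_individual = individual_fitness(scores)
by_team_avg = team_fitness(scores)
by_team_sum = team_fitness(scores, combine="sum")
```

Under team selection the non-contributing member receives the same credit as its teammates, so selection cannot tell it apart from them.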

2.C. Self-Selection
• Individuals choose when and with whom to mate
• Common in Artificial Life simulations
  – AL studies emergence of biological phenomena
• Usually involves a spatial component
• Extinction is possible
  – Auto restart
  – Spawn new members

3. Multiple Objectives
• Assume individual has fitness scores:
  – F = (f1,…,fN) in objectives 1 through N
• Which values of F are best?
• Traditional approach
  – fitness(F) = f1*w1 + … + fN*wN for weights w1,…,wN
• Pareto-based approach
  – Partition population into non-dominated Pareto fronts
  – Assign fitness based on Pareto-front
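
Note: the two approaches can be compared on a small example. The sketch assumes all objectives are maximized and uses the usual definition of Pareto dominance; the score vectors and weights are made up for illustration.

```python
def weighted_sum(scores, weights):
    # Traditional approach: fitness(F) = f1*w1 + ... + fN*wN.
    return sum(f * w for f, w in zip(scores, weights))

def dominates(a, b):
    # a dominates b if it is at least as good in every objective
    # and strictly better in at least one (maximization assumed).
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

a, b, c = (3, 1), (1, 3), (1, 1)

equal_weights = (weighted_sum(a, (1, 1)), weighted_sum(b, (1, 1)))
skewed_weights = (weighted_sum(a, (2, 1)), weighted_sum(b, (2, 1)))
```

With equal weights a and b tie, and with skewed weights a wins, so the ranking depends on the chosen weights. Under dominance neither a nor b dominates the other, while both dominate c.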

Pareto Front Example
• Each point represents an individual’s scores
• Point dominates other points in its box
• 3 Pareto fronts of non-dominated points
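
Note: partitioning a population into successive Pareto fronts can be sketched as repeated peeling of the non-dominated points. This is a simple quadratic-time illustration, not the specific sorting procedure used in the work discussed here. Maximization is assumed and the points are invented so that three fronts result.

```python
def dominates(a, b):
    # a dominates b: no worse in every objective, strictly better in one.
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_fronts(points):
    """Repeatedly peel off the points that no remaining point dominates."""
    remaining = list(points)
    fronts = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

points = [(1, 5), (3, 4), (5, 1), (2, 3), (4, 0), (1, 2)]
fronts = pareto_fronts(points)
# fronts == [[(1, 5), (3, 4), (5, 1)], [(2, 3), (4, 0)], [(1, 2)]]
```

Fitness can then be assigned by front index, with members of the first front ranked highest.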

Case Studies
• Review State of the Art
• For each study:
  – Classify type of selection
  – Classify team composition
  – Identify unanswered questions
  – Future research directions

AntFarm
• Evolve foraging behavior
  – Pheromones to communicate
• Individual selection
• Entire population as a team
• No cooperative foraging!
  – Likely cause: individual selection
• Individual selection offers less incentive for teamwork
• Teamwork especially difficult when there is only one team
* AntFarm: Towards Simulated Evolution. Collins, Jefferson. 1991

Evolving Communication
• Exploration task
  – Pheromones to communicate
• Team selection
• Homogeneous teams vs. static bots
• Pairs of objectives, Pareto-based
• Different behaviors in different runs
  – Compromise strategy
  – Blocking strategy
• Teamwork possible with homogeneous teams
• Need to move beyond grid-worlds
• Move beyond two objectives
* Emergence of Communication in Competitive Multi-Agent Systems: A Pareto Multi-Objective Approach. McPartland, Nolfi, Abbass. 2005

SwarmEvolveTags
• Birds visit food stations
• Energy can be shared
  – Sharing based on tags
• Self-selection
• Entire population as team
  – Competing subpopulations emerged
• Cooperation in entire population without team selection
• Altruism via aiding similar individuals
• Teamwork as a result of subpopulation homogeneity
* Tags and the Evolution of Cooperation in Complex Environments. Spector, Klein, Perry. 2004
* Evolution of cooperation without reciprocity. Riolo, Cohen, Axelrod. 2001

Legion-I
• Roman legions defend countryside and cities
• Team level selection
• Homogeneous teams
• Multi-modal behavior
  – Defend city
  – Pursue barbarians
• Homogeneous team members must fill all roles
• Could not learn more complicated/strategic tasks
  – Example: building roads to speed up travel
* Neuroevolution for Adaptive Teams. Bryant, Miikkulainen. 2003

Role-Based Cooperation
• Toroidal predator/prey grid world
• Individual selection
  – Team fitness shared by team members
• Multi-Agent ESP: subpopulation based
• Simple non-communicating method outperforms communicating method
• Teamwork without homogeneity
• Communication not always needed
• May only apply to simple domains
• Still need to scale up complexity
• Get away from grid worlds
* Coevolution of Role-Based Cooperation in Multi-Agent Systems. Yong, Miikkulainen. 2007

NERO
• Machine Learning game
  – Human interaction via fitness function
• Individual selection
• Entire population is team
• Multiple objectives
  – User defines weights dynamically
• Maintenance of fitness function
• Old behaviors can be forgotten when learning new ones
• Need to learn multiple tasks simultaneously
* Evolving Neural Network Agents in the NERO Videogame. Stanley, Bryant, Miikkulainen. 2005

Pareto Multi-objective NPCs
• Evolved monsters vs. bot with stick
• Individual selection
• Large heterogeneous teams of 15
  – Third of entire population
• Multiple objectives, Pareto-based
  – Credit assignment trick
• Learns multiple objectives simultaneously
• Different runs can lead to very different results
  – Different areas of trade-off surface
• Population becomes mostly homogeneous
* Constructing Complex NPC Behavior via Multi-Objective Neuroevolution. Schrum, Miikkulainen. 2008

Dead End Game
• Human prey vs. predators
• Offline evolution vs. bot
  – Team level selection
  – Homogeneous teams
• Online evolution vs. human
  – Individual selection
  – Small heterogeneous team
• Different configurations appropriate at different levels
• Sometimes the domain leaves no choice
* Interactive Opponents Generate Interesting Games. Yannakakis, Hallam. 2004

Cooperating Robots
• Retrieve tokens
• Simulation → Robots
• Compared selection levels
  – Individual vs. Team
• Compared team compositions
  – Homogeneous vs. heterogeneous
• Homogeneous better with teamwork and altruism
• Homogeneous best with team selection
• Heterogeneous best with individual selection
• Did not consider subpopulations
• Tasks only involved foraging (no other objectives)
* Genetic Team Composition and Level of Selection in the Evolution of Cooperation. Waibel, Keller, Floreano. 2008

Summary of Issues
• More complexity
  – Move beyond grid worlds
  – Need multiple contradictory objectives
  – Act in continuous, real-time world
• Best evolutionary configuration
  – More comparisons between team compositions
    • Especially subpopulation-based method
  – Task/configuration pairings?
  – Credit assignment issues
• Multi-modal behavior
  – What to do and when

Experiment
• Four monsters vs. bot with stick
  – Smaller team makes task harder
• Compare homogeneous, heterogeneous and subpopulation
  – Homogeneous uses team selection
  – Others use individual selection
• Multiple objectives:
  – Group damage
  – Individual injury
  – Individual time alive

Heterogeneous Results
• Many generations (600+)
  – Not that long in real time
• Mostly selfish
  – Good teamwork can arise though (Baiting)
• Teamwork depends on population being homogeneous
[Figure panels: Teamwork; Selfish]

Homogeneous Results
• Fewer generations (100-200)
  – Actually longer in real time
• Always some form of teamwork
  – Baiting
  – Timed Assault
[Figure panels: Baiting; Timed Assault]

Subpopulations Results
• Many generations (400+)
• Each generation takes a lot of real time
• Easy for slacker subpopulation to persist
• Limited teamwork
  – Only some members participate
[Figure panel: Cooperating Pair]

Discussion
• Can subpopulation method do better?
  – Better credit assignment
  – Team level selection (how?)
• Speed up homogeneous and subpopulations
• Heterogeneous: discourage selfishness

Future Research Questions
• Credit assignment issues
  – Cooperating individuals cannot be identified
  – Objectives define best evolutionary configuration?
• Complex domains/real problems
  – Many objectives
  – Continuous, real-time
• Potential challenge domains
  – Robocup Soccer
  – Unreal Tournament

Conclusion
• Teamwork in Multiagent Systems important area
• Evolution has been successful
• Better understand why
  – Team configuration
  – Level of selection
  – Presence/absence of credit assignment problems
• Apply to harder domains
  – Real-time
  – Continuous/noisy
  – Multiple contradictory objectives
[email protected]@cs.utexas.edu

Auxiliary Slides

Cooperation Without Reciprocity
• Abstract study of the evolution of cooperation
• Donor/recipient model
• 3 random pairings with option of donating fitness c so that recipient can gain fitness b
• Choice to donate based on similarity of tags
• Individual selection with entire population as team
  – Subpopulations emerged based on tags
• Donation rate changes cyclically, but generally stays high (73%) for c < b
• Need to apply in actual domain requiring teamwork
* Evolution of cooperation without reciprocity. Riolo, Cohen, Axelrod. 2001
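
Note: one round of the donor/recipient model can be sketched as follows. The slide only says that donation depends on tag similarity. Representing each agent as a (tag, tolerance) pair and donating when the tag difference is within the donor's tolerance is an assumption about how that similarity test works. The values of `cost` (c) and `benefit` (b) are illustrative, and the function name is hypothetical.

```python
import random

def play_round(agents, cost=0.1, benefit=1.0, pairings=3, rng=None):
    """agents: list of (tag, tolerance) pairs. Returns (scores, donation rate)."""
    rng = rng or random.Random(0)
    scores = [0.0] * len(agents)
    donations = 0
    for i, (tag, tolerance) in enumerate(agents):
        others = [k for k in range(len(agents)) if k != i]
        for _ in range(pairings):
            j = rng.choice(others)  # random potential recipient
            # Assumed rule: donate only if the recipient's tag is close enough.
            if abs(tag - agents[j][0]) <= tolerance:
                scores[i] -= cost      # donor pays c
                scores[j] += benefit   # recipient gains b
                donations += 1
    return scores, donations / (len(agents) * pairings)

clustered = [(0.5, 0.0)] * 4                        # identical tags
scattered = [(0.1, 0.0), (0.4, 0.0), (0.7, 0.0)]    # distinct tags, no tolerance

clustered_scores, clustered_rate = play_round(clustered)
scattered_scores, scattered_rate = play_round(scattered)
```

When c < b, a group of agents with matching tags gains fitness overall from donating, which is consistent with the emergence of tag-based subpopulations noted above.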

Cooperation Without Reciprocity Results

Team Composition in MAS
• Taxonomy proposed by Stone*:
  – Heterogeneous Communicating Agents
  – Heterogeneous Non-communicating Agents
  – Homogeneous Communicating Agents
  – Homogeneous Non-communicating Agents
* Multiagent Systems: A Survey from a Machine Learning Perspective. Stone. 2000
• Definition of communication is broad:
  – Message passing, blackboard, information sharing, etc.