Evolutionary Iterated Prisoner’s Dilemma Game H.-T. Kim Evolutionary Computation, 2009

Slide 1
Evolutionary Iterated Prisoner's Dilemma Game
H.-T. Kim
Evolutionary Computation, 2009

Slide 2
Outline
  • Evolutionary Prisoner's Dilemma Game
  • Prisoner's Dilemma Game
  • Iterated Prisoner's Dilemma Game
  • N-person Iterated Prisoner's Dilemma Game
  • Robert Axelrod's nIPD game
  • Evolution of Iterated Prisoner's Dilemma Game Strategies in Structured Demes Under Random Pairing in Game Playing
  • Simulation on Worksite Interactions between Laborers and Firms by using Multi-Agent based Evolutionary Computation

Slide 3
Prisoner's Dilemma Game
[Slide text lost in extraction. The slide presented a payoff table for two players, A and B.]

Slide 4
Prisoner's Dilemma Game
  • Formulated in 1950 by Merrill Flood and Melvin Dresher
  • 2-player game [...]
  • R: reward payoff (mutual cooperation)
  • T: temptation payoff (defecting against a cooperator)
  • P: punishment payoff (mutual defection)
  • S: sucker's payoff (cooperating against a defector)

Slide 5
Iterated Prisoner's Dilemma Game
  • IPD: the 2-player Prisoner's Dilemma played repeatedly [...]

Slide 6
N-person Iterated Prisoner's Dilemma Game
  • nIPD: extends the 2-player game to n players
  • Real-world problem
  • Robert Axelrod's nIPD [...]
  • Step 1) [...]
  • Step 2) [...]

Slide 7
Robert Axelrod's nIPD game: Step 1
  • IPD [...]
  • Round-robin tournament
  • 63 [...] TIT FOR TAT
  • TIT FOR TAT: [...]

Slide 8
Robert Axelrod's nIPD game: Step 2, Encoding
  • C: Cooperation, D: Defection
  • Memory of 1 previous round [...] gives 4 cases
  • Previous-round outcomes: CC (case 1), CD (case 2), DC (case 3), DD (case 4)
  • Ex) TIT FOR TAT [...]

Slide 9
Robert Axelrod's nIPD game: Step 2, Encoding with 3 previous rounds
  • CC CC CC (case 1)
  • CC CC CD (case 2)
  • CC CC DC (case 3)
  • [...]
  • DD DD DC (case 63)
  • DD DD DD (case 64)
  • 64-bit + 6-bit encoding
  • 64 bits: [...]
  • 6 bits: 3 [...]
  • Ex) CCDCDDDC [...] DC CCDDCD
  • 2^70 [...]

Slide 10
Robert Axelrod's nIPD game: Step 2
  • Fitness: payoff
  • Population: 20
  • Generations: 50
  • TIT FOR TAT [...]
  • 10~20 population [...]

Slide 11
Evolution of Iterated Prisoner's Dilemma Game Strategies in Structured Demes Under Random Pairing in Game Playing
Hisao Ishibuchi, Member, IEEE, and Naoki Namikawa, Student Member, IEEE
IEEE Transactions on Evolutionary Computation, vol. 9, no. 6, December 2005

Slide 12
Outline
  • Introduction
  • Two neighborhood structures
  • IPD game structure
  • Mating strategy
  • Simulation: standard pairing scheme
  • Random pairing scheme
  • Simulation: random pairing scheme
  • Conclusion

Slide 13
Introduction
  • Spatial IPD game
  • Framework of structured demes
  • Cells of a two-dimensional grid-world
  • Two neighborhood structures
  • Interaction among players through the IPD game
  • Interaction among players for mating strategies
  • Similar to the world of territorial animals or plants
  • Random pairing scheme: a player plays the game with a randomly chosen neighbor at every round
  • Demonstrates the evolution of cooperative behavior (under random pairing)

Slide 14
Basic structure: Payoff Matrix
  • Payoff matrix of the game [matrix lost in extraction]

Slide 15
Basic structure: Strategy Encoding
  • Each player has a single strategy
  • Every strategy is represented by a 5-bit binary string
  • Example of a strategy (TIT-FOR-TAT) [figure lost in extraction]

Slide 16
IPD game structure: World and Neighborhood
  • Uses a 31 x 31 grid-world
  • Each player is located in one cell
  • 961 players in total
  • Examples of neighborhood structures [figure lost in extraction]

Slide 17
IPD game structure: Game play and Fitness
  • N_IPD(i): the set of player i and its neighbors
  • Game play
  • The game is iterated for a pre-specified number of rounds (e.g., 100 rounds)
  • Each player plays the game only against its neighbors
  • Opponents are selected randomly
  • Fitness: the average payoff obtained per round of the game

Slide 18
Mating strategy: formulation
  • N_GA(i): the set of player i and its neighbors
  • N_IPD(i) = N_GA(i) does not always hold
  • Parents are selected from N_GA(i) using roulette wheel selection
  • Selection probability of strategy j [formula lost in extraction]
  • f(s_i): fitness of player i with strategy s_i
  • f_min(N_GA(i)): minimum fitness among the players in N_GA(i)

Slide 19
Mating strategy: crossover and mutation
  • One-point crossover
  • Bit-flip mutation

Slide 20
Simulation
  • Two kinds of simulation
  • Simulate the two neighborhood structures with the standard pairing scheme: verify the effect of the two neighborhood structures on the evolution of cooperative behavior
  • Simulate the two neighborhood structures with the random pairing scheme: examine the effect of the random pairing scheme on the evolution of cooperative behavior
  • 961 spatially fixed players (31 x 31 grid-world)
  • Mistake (noisy IPD model): a player chooses an action different from the one its strategy prescribes

Slide 21
Standard Pairing Simulation: Parameter Setting
Case of two neighborhood structures

  Parameter                  Value
  Mistake probability        0, 0.001, 0.01, 0.1
  Crossover probability      1.0
  Mutation probability       1 / (5 * 961)
  Termination of IPD game    100 rounds
  Termination of evolution   1000 generations

Slide 22
Standard Pairing Simulation: Result
  • N_IPD has a significant effect on the evolution of cooperative behavior
  • N_GA has a much smaller effect than N_IPD
  • A small N_IPD facilitates the evolution of cooperative behavior

Slide 23
Standard Pairing Simulation: Result (2)
  • Better results were obtained with smaller mistake probabilities
  • Cooperative behavior evolved independently of the two neighborhood structures

Slide 24
Random Pairing Scheme
  • Every player chooses its opponent randomly from N_IPD at every round of the game
  • The memory of an interaction with one neighbor may influence a player's future action against another neighbor

Slide 25
Random Pairing Simulation: Result (1)
  • The same parameter specifications were used as in the previous simulation
  • The evolution of cooperative behavior is very difficult to achieve
  • An increase in the number of opponents decreases the probability of playing against the same opponent
  • Decrease in average payoff

Slide 26
Random Pairing Simulation: Result (2)
  • Strategies characterized by the genetic form 1***1

  Parameter             Value
  Mistake probability   0
  N_IPD(i)              3
  N_GA(i)               5

Slide 27
Random Pairing Simulation: Result (3)
  • Strategies characterized by the genetic form ****0
  • The existence of strategies of this type prevents the consecutive occurrence of mutual cooperation

  Parameter             Value
  Mistake probability   0
  N_IPD(i)              5
  N_GA(i)               5

Slide 28
Random Pairing Simulation: Result (4)
  • Strategies characterized by the genetic form 11**1
  • These strategies have the ability to recover from mutual defection (D, D)
  • This ability seems to be important in a noisy situation

  Parameter             Value
  Mistake probability   0.01
  N_IPD(i)              3
  N_GA(i)               5

Slide 29
Random Pairing Simulation: Result (5)

  Parameter             Value
  Mistake probability   0.01
  N_IPD(i)              5
  N_GA(i)               5

  • The TFT strategy 10011 increased its percentage to almost 100%
  • A higher average payoff was obtained from strategies of the form 11**1 than from the TFT strategy 10011

Slide 30
Other Simulations

Slide 31
Conclusion
  • Formulated a spatial IPD game using the concept of two neighborhood structures
  • Interaction among players through the IPD game
  • Mating strategies
  • Computer simulation: the use of a small interaction neighborhood facilitated the evolution of cooperative behavior
  • Introduced a random pairing scheme with the two neighborhood structures
  • Computer simulation: cooperative behavior evolved when the smallest interaction neighborhood was used
  • Future work
  • Explain the results of the random pairing scheme simulation
  • Use a stochastic strategy represented by a string of real numbers between 0 and 1
  • Evolution of cooperative behavior under the random pairing scheme in a large interaction neighborhood

Slide 32
Simulation on Worksite Interactions between Laborers and Firms by using Multi-Agent based Evolutionary Computation
Hee-Taek Kim and Sung-Bae Cho
Soft Computing Laboratory, Yonsei University
Social Simulation Workshop at the International Joint Conference on Artificial Intelligence

Slide 33
Motivation
  • Laborers and firms form a strategic relationship
  • What is the rational strategy from the position of a laborer or a firm?
  • Can we derive a mutually beneficial relationship between laborers and firms?
  • General economic belief
  • A laborer tends to cooperate with cooperative firms
  • A firm tends to cooperate with cooperative laborers
[Figure: exchange of wage and labor between firm and laborer, with the captions "High wage..." and "Low wage, but high productivity"]

Slide 34
Introduction of the Simulation Model
  • Construct a computational work-site interaction model
  • Multi-agent based approach
  • Consists of worker agents and firm agents
  • Adaptive agents are implemented using evolutionary computation
  • Simulate the interaction between workers and firms
  • Workers and firms interact with each other
  • They form collaborative or competitive relationships

Slide 35
Evolutionary Computation
  • Based on Darwinism: survival of the fittest
  • Applies the theory of evolution to computation
  • Widely used for modeling social phenomena
  • Individual population, behavioral rule, selection and reproduction
  • Each individual can adapt to a dynamic environment
  • Basic evolution process
[Figure: basic evolution process with the stages Population, Selection, Reproduction (crossover and mutation), Calculate fitness]

Slide 36
Simulation Process: Laborers Phase
  • The interaction protocol between workers and firms is divided into two phases: the laborers phase and the firms phase
  • Laborers decide whether or not to resign from their firm
  • Laborers decide whether to cooperate with or defect against their employer

Slide 37
Simulation Process: Firms Phase
  • Firms decide whether to cooperate with or defect against their laborers

Slide 38
Overall Process of Simulation

Slide 39
Simulation framework

Slide 40
Internal Attributes: Laborer

  Type     Attribute             Description
  int      ID                    Unique identifier of this laborer
  int      employedFirmID        Unique identifier of the firm that employs this laborer
  double   asset                 Total asset of this laborer
  double   productivity          The productivity offered to the firm
  double   livingCost            Living expenses per generation; subtracted from asset
  int      state                 Current state { WORKING, JOBLESS, FRESH, FAILED }
  int      continues             The number of generations from employment until now
  Array    chromosome            Array of integers representing the strategy of this laborer
  Array    firmCareer            After resignation, a laborer is never employed by the same firm again
  Queue    firmPastBehaviors     The cooperation history of the firm that employs this laborer
  Queue    laborerPastBehaviors  The cooperation history of this laborer

Slide 41
Internal Attributes: Firm

  Type     Attribute        Description
  int      ID               Unique identifier of the firm
  double   capital          Total capital of this firm; corresponds to a laborer's asset
  double   supportingCost   The cost of maintaining the firm
  Array    chromosome       Array of integers representing the strategy of this firm
  Array    myLaborers       Array of the laborers employed by this firm

Slide 42
Action of Agent
  • Cooperation and defection
  • Laborer
  • Cooperation: high productivity (Prod_H)
  • Defection: low productivity (Prod_L)
  • Resign: resign from the opponent firm
  • Firm
  • Cooperation: high wage (Wage_H)
  • Defection: low wage (Wage_L)

  Payoff (Laborer, Firm)   Firm cooperation                Firm defection
  Laborer cooperation      (Wage_H, Prod_H - Wage_H)       (Wage_L, Prod_H - Wage_L)
  Laborer defection        (Wage_H, Prod_L - Wage_H)       (Wage_L, Prod_L - Wage_L)

Slide 43
Behavioral Strategy of Agent
  • The behavioral strategy determines the current action of the agent
  • Each individual has its own strategy
  • All strategies evolve as the simulation progresses

Slide 44
Evolutionary Engine: Fitness evaluation
  • Firm: the capital attribute is treated as the fitness of the firm
  • Laborer: the asset attribute is treated as the fitness of the laborer
  • Selection: roulette wheel selection
  • Probability of selection [formula lost in extraction]

Slide 45
Evolutionary Engine: Crossover and mutation
  • One-point crossover
  • One-point bit mutation
  • Elimination: incapable agents are eliminated from the simulation

Slide 46
Experimental Design

  Group     Description                                    Value
  Firm      Initial capital                                2000
  Firm      Initial number of laborers per firm            10
  Firm      Maximum number of laborers per firm            30
  Firm      supportingCost                                 30
  Firm      Wage_H                                         12
  Firm      Wage_L                                         Wage_H / 2
  Laborer   Initial asset                                  200
  Laborer   livingCost                                     10
  Laborer   Prod_H                                         18
  Laborer   Prod_L                                         Prod_H / 2
  Other     Initial number of firms                        30
  Other     Maximum capacity of history queue              10
  Other     Mistake probability                            0.01

  Description                                              Value
  Initial population of firms                              30
  Maximum population of firms                              Infinite
  Initial population of laborers                           330
  Maximum population of laborers                           Infinite
  Increment rate of laborer population (reproduce rate)    0.005
  Mutation rate                                            0.005
  Selection method                                         Roulette wheel
  Crossover method                                         1-point crossover

  Payoff (Worker, Firm), i.e. (wage, productivity - wage):

                        Firm cooperation   Firm defection
  Worker cooperation    (12, 6)            (6, 12)
  Worker defection      (12, -3)           (6, 3)

Slide 47
Experimental Result

Slide 48
Experimental Result (2)

Slide 49
Conclusion
  • Second experiment: forbid the resignation of laborers
  • Laborers cannot escape from a vicious firm
  • Firms simply exploit faithful laborers
  • This results in the breakdown of all agents because of the selfish behavior of the firms

Slide 50
Current Works
  • Extend the 2*2 interaction model
  • Continuous model based on linear algebra
  • Asset/livingCost * X_1 + RecentGivenPay * X_2 + Continuous * X_3
  • Besides the previous activity of the opponent agent, many other factors can affect the current action of an agent
  • Environmental information, the agent's own current state, the opponent's state, and so on
  • Test various policies on the simulation model and analyze their effects

Slide 51
nIPD game
  • Robert Axelrod's nIPD game
  • Population, Generation, Mutation Rate, Crossover Rate, payoff matrix [...]
  • Prisoner's dilemma game reference: http://www.aistudy.co.kr/biology/genetic/example_mitchell.htm
  • [...] 3 [...] 6 bits [...]
  • [...] VS2008 (C, C++, C#)

Slide 52
  • Population, Generation, Mutation Rate, Crossover Rate, payoff matrix [...]
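
The case numbering on Slide 9 (CC CC CC is case 1, DD DD DD is case 64) can be read as a base-2 index over the last three rounds. The sketch below is only an illustration of that reading. It assumes C maps to 0 and D to 1, and that the 70-letter chromosome stores the 64 responses first and the 6 letters describing the assumed initial history last. The names `case_number`, `split_chromosome` and `next_move` are hypothetical and do not come from the slides.

```python
def case_number(history):
    """Return the 1-based case number for a three-round history such as 'CC CC CD'.

    Assumption: C counts as binary 0 and D as binary 1, which reproduces the
    numbering listed on Slide 9 (case 1 = CC CC CC, case 64 = DD DD DD).
    """
    letters = history.replace(" ", "").upper()
    if len(letters) != 6 or set(letters) - {"C", "D"}:
        raise ValueError("history must contain exactly six C/D letters")
    return int(letters.translate(str.maketrans("CD", "01")), 2) + 1


def split_chromosome(chromosome):
    """Split a 70-letter chromosome into (64 responses, 6 initial-history letters).

    Assumption: the 64 response letters come first, as in '64bit + 6bit'.
    """
    if len(chromosome) != 70:
        raise ValueError("chromosome must have 64 + 6 = 70 letters")
    return chromosome[:64], chromosome[64:]


def next_move(chromosome, history):
    """Look up the move ('C' or 'D') that the chromosome prescribes for a history."""
    responses, _ = split_chromosome(chromosome)
    return responses[case_number(history) - 1]
```

For example, a chromosome that cooperates in every case except case 64 would return `"D"` only for the history `"DD DD DD"`.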
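
The 5-bit strategies of Slides 15 and 26 to 29 can be sketched as a lookup on the previous round. The bit layout below is an assumption inferred from the slides rather than a quotation: 1 means cooperate, bit 0 is the first-round action, and bits 1 to 4 answer the previous (own, opponent) outcomes (D, D), (C, D), (D, C), (C, C). This layout is consistent with TFT being 10011, with 11**1 recovering from (D, D), and with ****0 breaking mutual cooperation. The payoff matrix is not reproduced because it did not survive extraction. `next_action` and `play` are hypothetical helper names.

```python
import random

COOPERATE, DEFECT = 1, 0  # assumed bit meaning: 1 = cooperate, 0 = defect

# Assumed layout: which bit answers which previous (own, opponent) outcome.
RESPONSE_BIT = {
    (DEFECT, DEFECT): 1,
    (COOPERATE, DEFECT): 2,
    (DEFECT, COOPERATE): 3,
    (COOPERATE, COOPERATE): 4,
}


def next_action(strategy, last_own=None, last_opp=None, mistake_prob=0.0, rng=random):
    """Return the action of a 5-bit strategy string such as '10011'.

    With probability mistake_prob the action is flipped (the noisy IPD
    model of Slide 20).
    """
    if last_own is None or last_opp is None:
        bit = 0  # first round: no history yet
    else:
        bit = RESPONSE_BIT[(last_own, last_opp)]
    action = int(strategy[bit])
    if rng.random() < mistake_prob:
        action = 1 - action  # mistake: play the opposite action
    return action


def play(strategy_a, strategy_b, rounds=100, mistake_prob=0.0, rng=random):
    """Play an iterated game and return both action histories."""
    hist_a, hist_b = [], []
    last_a = last_b = None
    for _ in range(rounds):
        a = next_action(strategy_a, last_a, last_b, mistake_prob, rng)
        b = next_action(strategy_b, last_b, last_a, mistake_prob, rng)
        hist_a.append(a)
        hist_b.append(b)
        last_a, last_b = a, b
    return hist_a, hist_b
```

Under these assumptions, two TFT players (`"10011"`) cooperate in every round when the mistake probability is 0, while a strategy of the form 11**1 such as `"11011"` cooperates again right after mutual defection.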
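
Slide 18 names roulette wheel selection, f(s_i) and the minimum fitness in N_GA(i), but the formula itself was lost. The sketch below assumes one common form, in which the selection probability of a neighbor is proportional to its fitness minus the neighborhood minimum. That form is an assumption, not a reconstruction of the slide. The uniform fallback for identical fitness values is also an added assumption, and both function names are hypothetical.

```python
import random


def selection_probabilities(fitness):
    """Map {player: fitness} over a neighborhood to {player: probability}.

    Assumed form: p(j) is proportional to f(s_j) - f_min, where f_min is the
    minimum fitness in the neighborhood. If all fitness values are equal,
    fall back to a uniform distribution (an added assumption).
    """
    f_min = min(fitness.values())
    shifted = {player: f - f_min for player, f in fitness.items()}
    total = sum(shifted.values())
    if total == 0:
        return {player: 1.0 / len(fitness) for player in fitness}
    return {player: s / total for player, s in shifted.items()}


def roulette_select(fitness, rng=random):
    """Draw one parent from the neighborhood by roulette wheel selection."""
    probs = selection_probabilities(fitness)
    players = list(probs)
    weights = [probs[player] for player in players]
    return rng.choices(players, weights=weights, k=1)[0]
```

Note that under this form the player with the minimum fitness is never selected unless all players tie.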
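
The numeric payoff table on Slide 46 follows from the symbolic table on Slide 42 once the wage and productivity values are substituted. The short check below uses the values from the experimental design (Wage_H = 12, Wage_L = Wage_H / 2, Prod_H = 18, Prod_L = Prod_H / 2). The function name `stage_payoff` is hypothetical.

```python
WAGE_H = 12.0
WAGE_L = WAGE_H / 2   # low wage is half of the high wage
PROD_H = 18.0
PROD_L = PROD_H / 2   # low productivity is half of the high productivity


def stage_payoff(laborer_cooperates, firm_cooperates):
    """Return (laborer payoff, firm payoff) = (wage, productivity - wage)."""
    productivity = PROD_H if laborer_cooperates else PROD_L
    wage = WAGE_H if firm_cooperates else WAGE_L
    return wage, productivity - wage


if __name__ == "__main__":
    for laborer in (True, False):
        for firm in (True, False):
            print(laborer, firm, stage_payoff(laborer, firm))
```

The firm's payoff turns negative only when it pays the high wage to a defecting laborer, since 9 - 12 = -3.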