Iterated Prisoner’s Dilemma Game in Evolutionary Computation


2003. 10. 2

Seung-Ryong Yang

http://www.yonsei.ac.kr/


Agenda

Motivation

Iterated Prisoner’s Dilemma Game

Related Works

Strategic Coalition

Improving Generalization Ability

Experimental Results

Conclusion


Motivation

Evolutionary approach

Understanding complex behaviors by investigating simulation results produced by an evolutionary process

Providing a way to find optimal strategies in a dynamic environment

IPD game

Models complex phenomena such as social and economic behaviors

Provides a testbed for modeling a dynamic environment

Objectives

Obtaining multiple good strategies

Forming coalitions to improve generalization ability


Iterated Prisoner’s Dilemma Game (1/2)

Overview

Prisoner’s possible choice

Defection

Cooperation

Characteristics

Non-cooperative

Non-zero-sum

Types of Game

2IPD (2-player Iterated Prisoner’s Dilemma) game

NIPD (N-player Iterated Prisoner’s Dilemma) game

Payoff Matrix of 2IPD Game by Axelrod, R. (1984). Entries are row player / column player.

            Cooperate   Defect
Cooperate   R / R       S / T
Defect      T / S       P / P

$$T > R > P > S, \qquad 2R > T + S$$

            Cooperate   Defect
Cooperate   3 / 3       0 / 5
Defect      5 / 0       1 / 1
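A minimal sketch of the numeric payoff matrix above as a lookup table, together with a check of the two standard Prisoner's Dilemma conditions (T > R > P > S and 2R > T + S). The 0/1 move encoding and the function names are assumptions made for illustration; they do not come from the slides.

```python
# Payoff values taken from the numeric matrix: T=5, R=3, P=1, S=0.
T, R, P, S = 5, 3, 1, 0

# Assumed encoding for illustration: 0 = cooperate, 1 = defect.
COOPERATE, DEFECT = 0, 1

# PAYOFF[(row_move, column_move)] -> (row_payoff, column_payoff)
PAYOFF = {
    (COOPERATE, COOPERATE): (R, R),
    (COOPERATE, DEFECT): (S, T),
    (DEFECT, COOPERATE): (T, S),
    (DEFECT, DEFECT): (P, P),
}


def is_prisoners_dilemma(t, r, p, s):
    """Return True when T > R > P > S and 2R > T + S both hold."""
    return t > r > p > s and 2 * r > t + s


def play_round(row_move, column_move):
    """Return the (row, column) payoffs for one round."""
    return PAYOFF[(row_move, column_move)]
```

The second condition, 2R > T + S, is what keeps mutual cooperation better than taking turns exploiting each other.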


Iterated Prisoner’s Dilemma Game (2/2)

Representation of Strategy

[Figure: history table. Own history and opponent's history are each listed from recent action to last action; the strategy is a bit string (0 1 0 ∙∙∙ 1) with one entry per history.]

l = 2 : Example History 11 01

2N History
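The history-table representation can be sketched as a lookup: the joint history selects one bit of the strategy string, and that bit is the next move. This is only an illustration under stated assumptions: moves are encoded as 0 = cooperate and 1 = defect, each history is stored oldest move first, and own history bits come before the opponent's. The helper names are hypothetical.

```python
from typing import List, Sequence

COOPERATE, DEFECT = 0, 1  # assumed encoding


def history_index(own: Sequence[int], opp: Sequence[int]) -> int:
    """Pack own history followed by opponent history into a table index."""
    index = 0
    for bit in list(own) + list(opp):
        index = (index << 1) | bit
    return index


def next_move(table: Sequence[int], own: Sequence[int], opp: Sequence[int]) -> int:
    """Look up the move stored for the current joint history."""
    return table[history_index(own, opp)]


def tit_for_tat_table(l: int = 2) -> List[int]:
    """Table with 2**(2*l) entries that repeats the opponent's latest move.

    The latest opponent move is the lowest bit of the index, because the
    opponent's history is packed last and stored oldest move first.
    """
    return [index & 1 for index in range(2 ** (2 * l))]
```

With l = 2 the table has 16 entries, and the example history 11 01 maps to entry 13 under this packing.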


Related Works

Previous Study

Paul J. Darwen and Xin Yao (1997) : Speciation as Automatic Categorical Modularization

Onn M. Shehory, et al. (1998) : Multi-agent Coordination through Coalition Formation

Y. G. Seo and S. B. Cho (1999) : Exploiting Coalition in Co-Evolutionary Learning

Issues

Work on coalition formation in multi-agent environments covers a broad range of topics

Darwen and Yao studied coalitions in the IPD game, but with a different approach

Previous work focused on cooperation, the number of players, payoff variances, etc.


What is Different?

Co-evolutionary Learning

Selection Method

Rank Based

Roulette wheel

Tournament

Coalition Formation

Coalitions survive into the next generation

The condition for forming a coalition is flexible

Decision Making in Coalition

Applying several decision-making methods to the coalition

Borda Function, Condorcet Function

Average Payoff, Highest Payoff

Weighted Voting


Evolving Strategy

To evolve strategies, we use:

Genetic algorithm

Co-evolutionary learning

Strategic coalition

Evolutionary Process


Evolution of Agents (1/2)

[Figure: coalitions (C1, Ci, Cj, Ck, Cl) shown in three panels: Before Population, Current Population, Next Population]

Evolution of Agents

Agents can develop their strategies using co-evolutionary learning

Weak agents are removed from the population

Evolution of Coalition

A formed coalition survives to the next generation

Agents can join a coalition generation by generation

A coalition survives or grows


Evolution of Agents (2/2)

Problem : the population may end up evolving from weak agents

Caused by removing the better agents, which belong to a coalition, from the population

Making new agents by mixing the better agents within a coalition

[Figure: agents A1 and A2 are taken by random extraction from a coalition in the population (Ci, Cj, Ck) and, after mutation, give a new agent Ai. Repeated as many times as there are agents in the coalition.]


Strategic Coalition (1/2)

What is Coalition?

A cooperative game as a set A of agents in which each subset of A is called coalition － Matthias Klusch and Andreas Gerber, 2002

A group of agents that work jointly in order to accomplish their tasks － Onn M. Shehory, 1995

Coalition in the IPD game

Forming coalition through round-robin game

Pursuing more payoff using generalization ability

Coalition forms autonomously without supervision


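Strategic Coalition (2/2)

Definitions

Definition 1 : Coalition Value

$$C_p = \frac{1}{|C|} \sum_{i=1}^{|C|} p_i \cdot w_i \quad \text{where} \quad w_i = \frac{p_i}{\sum_{i=1}^{|C|} p_i} \qquad (1)$$

Definition 2 : Payoff Function

$$T > R > P > S, \qquad 2R > T + S$$

Definition 3 : Coalition Identification

Definition 4 : Decision Making

$$D_C = \begin{cases} 0\ (\text{Cooperate}) & \text{if } 1 < \dfrac{\sum_{i=1}^{|C|} C_i^C \cdot w_i}{\sum_{i=1}^{|C|} C_i^D \cdot w_i} \\[2ex] 1\ (\text{Defect}) & \text{if } 0 < \dfrac{\sum_{i=1}^{|C|} C_i^C \cdot w_i}{\sum_{i=1}^{|C|} C_i^D \cdot w_i} \le 1 \end{cases} \qquad (2)$$

Definition 5 : Payoff Distribution

[Equation (3): not recoverable from the extracted text; the surviving symbols are p_i, w_i, S, |C| and Rank_i]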


Coalition Formation (1/2)

[Figure: agents A1, A2, ..., An in the initial population play the 2IPD game and form coalitions, giving a population that includes coalitions C1, C2, ..., Ci. Panel titles: Initial Population, Population Including Coalition. Step labels: 2IPD game, Form Coalition.]


Coalition Formation (2/2)

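Algorithm

[Flowchart. Nodes, in extraction order: 2IPD Game; Exceeds iteration per generation?; Game type? (Agent vs. Agent, Agent vs. Coalition, Coalition vs. Coalition); Satisfy condition for forming coalition?; Forming Coalition; Joining Coalition; Genetic Operation; Satisfy condition?; Stop. Branches are labeled Y / N. Three numbered conditions (1 to 3) relate the payoffs p_i and p_j, and the coalition C, to S, T and the constant 2; their exact form is not recoverable from the extracted text.]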

Forming coalition

1. Round-robin 2IPD game

2. Obtain rank

3. Determine confidence of agent according to the rank

Joining coalition

1. Round-robin 2IPD game

2. Obtain rank

3. If number of agents > max. number of agents within a coalition, remove the weakest agent

4. Determine confidence of each agent
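The joining steps above can be sketched as follows. This is an illustration under stated assumptions, not the presenter's implementation: payoffs are assumed to come from a round-robin 2IPD game that has already been played, and the linear rank-to-confidence mapping in `rank_based_confidence` is hypothetical, since the slides only say that confidence is determined from rank.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    name: str
    payoff: float            # average payoff from the round-robin 2IPD game
    confidence: float = 0.0  # weight of the agent inside the coalition


def rank_based_confidence(n: int) -> List[float]:
    """Hypothetical linear weights: rank 1 gets the largest share, sum is 1."""
    total = n * (n + 1) / 2
    return [(n - rank) / total for rank in range(n)]


def join_coalition(members: List[Agent], newcomer: Agent, max_size: int) -> List[Agent]:
    """Add a newcomer, then re-rank, trim and re-weight the coalition."""
    # Steps 1-2: rank all agents by their round-robin payoff (best first).
    ranked = sorted(members + [newcomer], key=lambda a: a.payoff, reverse=True)
    # Step 3: if the coalition is too large, remove the weakest agent.
    if len(ranked) > max_size:
        ranked.pop()
    # Step 4: determine the confidence of each agent from its rank.
    for agent, confidence in zip(ranked, rank_based_confidence(len(ranked))):
        agent.confidence = confidence
    return ranked
```

For example, adding an agent with payoff 2.5 to a full two-member coalition with payoffs 3.0 and 2.0 drops the 2.0 agent and leaves confidences of 2/3 and 1/3.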


Coalition Decision Making

Decision making

To decide the coalition's opinion

Use the weighted voting method

Sharing profits

Distribute the payoff according to each agent's confidence

Rank influences each weight

Determining next action of coalition

• $C_i^C$ : Weight for cooperation of coalition $C_i$

• $C_i^D$ : Weight for defection of coalition $C_i$

[Figure: the previous actions of Ci, Cj, Ck and Cl are summed (∑) into the weights for cooperation and defection, which give the next action, C or D]
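A minimal sketch of the weighted voting method named above: each member proposes a move, the weights of the members voting for cooperation and for defection are summed separately, and the larger sum wins. The 0/1 encoding, the function name, and the choice to defect on a tie are assumptions for illustration.

```python
from typing import Sequence

COOPERATE, DEFECT = 0, 1  # assumed encoding


def coalition_decision(votes: Sequence[int], weights: Sequence[float]) -> int:
    """Return the coalition's move from its members' weighted votes."""
    cooperation_weight = sum(w for v, w in zip(votes, weights) if v == COOPERATE)
    defection_weight = sum(w for v, w in zip(votes, weights) if v == DEFECT)
    # Cooperate only when the cooperation weight is strictly larger;
    # ties fall to defection (an assumption of this sketch).
    return COOPERATE if cooperation_weight > defection_weight else DEFECT
```

A single high-confidence member can outvote several low-confidence ones: votes (C, D, D) with weights (0.6, 0.2, 0.2) give cooperation.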


Weight of Agents

Adjusting weight

Gives an incentive to agents in the coalition

Reflects the decision making of the coalition

[Figure: the previous actions of Ci, Cj, Ck and Cl are summed (∑) into the weights for cooperation and defection to give the next action (C or D), with an "Adjusting weight" step]


Improving Generalization Ability (1/2)

Problem of one good strategy

Not adaptive to dynamic environment

Obtain multiple good strategies for specific environment

Example: biological immune system

Method

Fitness sharing

Adjust confidences of multiple strategies by evolution

Co-evolution

Coalition formation
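Fitness sharing, the first method listed above, can be sketched in its textbook form: each individual's raw fitness is divided by a niche count that grows with the number of similar individuals, so crowded regions of the search space are penalized and several distinct good strategies can coexist. The slides do not give the variant used; the Hamming distance measure and the parameters `sigma_share` and `alpha` are assumptions.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def shared_fitness(
    population: Sequence[Sequence[int]],
    raw_fitness: Sequence[float],
    sigma_share: float = 3.0,
    alpha: float = 1.0,
) -> List[float]:
    """Divide each raw fitness by the individual's niche count."""
    shared = []
    for i, individual in enumerate(population):
        niche_count = 0.0
        for other in population:
            distance = hamming(individual, other)
            if distance < sigma_share:
                # Closer neighbours contribute more; self contributes 1.
                niche_count += 1.0 - (distance / sigma_share) ** alpha
        shared.append(raw_fitness[i] / niche_count)
    return shared
```

Because every individual counts itself, the niche count is at least 1 and the division is always defined.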


Improving Generalization Ability (2/2)

How well a player performs against an unknown player

Evaluation

[Figure: evaluation procedure. Random generation of 100 strategies, then the 2IPD game, then extraction of the top strategies in the population (ranks 1 to 10, e.g. 1 0001110..., 2 0000100..., 3 0100100..., 4 0001100..., 5 0010010..., 10 0000010...). The top, genetically evolved strategies then play the IPD game.]


Test Strategy

Test Strategies

Strategy      Characteristics                                                    Example Strategy
Tit-for-Tat   Initially cooperate, and then follow opponent                      0 0 1 0 1 1 0 0
Trigger       Initially cooperate. Once opponent defects, continuously defect    0 0 0 1 1 1 1 1
AllD          Always defect                                                      1 1 1 1 1 1 1 1
CDCD          Cooperate and defect over and over                                 0 1 0 1 0 1 0 1
CCD           Cooperate and cooperate and defect                                 0 0 1 0 0 1 0 0
Random        Random move                                                        1 1 0 1 0 0 1 1
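The six test strategies can be sketched as small functions of the two move histories, with a helper that plays a match and returns average payoffs. The encoding 0 = cooperate, 1 = defect is inferred from the example rows (AllD is all 1s); the function signatures, the seeded random strategy, and the `play` helper are assumptions for illustration.

```python
import random
from typing import Callable, List, Tuple

C, D = 0, 1  # 0 = cooperate, 1 = defect
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

# A strategy maps (own_history, opponent_history) to its next move.
Strategy = Callable[[List[int], List[int]], int]


def tit_for_tat(own: List[int], opp: List[int]) -> int:
    return C if not opp else opp[-1]


def trigger(own: List[int], opp: List[int]) -> int:
    return D if D in opp else C


def all_d(own: List[int], opp: List[int]) -> int:
    return D


def cdcd(own: List[int], opp: List[int]) -> int:
    return C if len(own) % 2 == 0 else D


def ccd(own: List[int], opp: List[int]) -> int:
    return D if len(own) % 3 == 2 else C


def make_random(seed: int) -> Strategy:
    rng = random.Random(seed)
    return lambda own, opp: rng.choice((C, D))


def play(a: Strategy, b: Strategy, rounds: int) -> Tuple[float, float]:
    """Play a match and return the average payoff per round of each side."""
    history_a: List[int] = []
    history_b: List[int] = []
    total_a = total_b = 0
    for _ in range(rounds):
        move_a = a(history_a, history_b)
        move_b = b(history_b, history_a)
        payoff_a, payoff_b = PAYOFF[(move_a, move_b)]
        total_a += payoff_a
        total_b += payoff_b
        history_a.append(move_a)
        history_b.append(move_b)
    return total_a / rounds, total_b / rounds
```

For instance, over 10 rounds Tit-for-Tat against AllD is exploited once and then defects, so the averages are 0.9 and 1.4.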


Example of Game

[Figure: example game, Tit-for-Tat vs. an evolved strategy. Each side is shown as a bit string indexed by history (0 to 15), with the moves chosen in rounds 1 to 5 and the per-round payoffs (3, 5, 1, 1, 1 for one side and 3, 0, 1, 1, 1 for the other).]


Test Environment

Population size : 100

Crossover rate : 0.3

Mutation rate : 0.001

Number of generations : 200

Number of iterations : a third of population

Training set : Well-known 6 strategies
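The parameters above can be collected into a small genetic-algorithm sketch. The slides give only the rates and sizes; the 20-bit genome length (matching the length of the evolved genotypes reported in the result tables), one-point crossover, and bit-flip mutation are assumptions for illustration.

```python
import random
from typing import List, Tuple

POPULATION_SIZE = 100
CROSSOVER_RATE = 0.3
MUTATION_RATE = 0.001
GENERATIONS = 200
GENOME_LENGTH = 20  # assumption: length of one strategy bit string


def random_population(rng: random.Random) -> List[List[int]]:
    """Create POPULATION_SIZE random bit strings."""
    return [
        [rng.randint(0, 1) for _ in range(GENOME_LENGTH)]
        for _ in range(POPULATION_SIZE)
    ]


def one_point_crossover(
    a: List[int], b: List[int], rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Swap the tails of two parents with probability CROSSOVER_RATE."""
    if rng.random() >= CROSSOVER_RATE:
        return a[:], b[:]
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(genome: List[int], rng: random.Random) -> List[int]:
    """Flip each bit independently with probability MUTATION_RATE."""
    return [bit ^ 1 if rng.random() < MUTATION_RATE else bit for bit in genome]
```

Passing an explicit `random.Random` instance keeps runs reproducible when a seed is fixed.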

Experimental Result


[Figure: payoff (y-axis, 0 to 4) of the superior 10 strategies (x-axis, 1 to 10). Series: Coalition Payoff, Coalition S.D., Random Payoff, Random S.D.]

Evolved Strategy vs. Random

Rank  Genotype of evolved strategy  Evolved Avg. Payoff  Evolved S.D.  Random Avg. Payoff  Random S.D.
1     10111001111010111101          3.080000             1.998399      0.480000            0.499600
2     00111001111010111101          2.800000             1.989975      0.550000            0.497494
3     10111011111111111111          2.920000             1.998399      0.520000            0.499600
4     00111011111111111101          2.880000             1.996397      0.570000            0.667158
5     00111011111011111011          2.940000             1.989070      0.540000            0.555338
6     00110000111111111111          2.680000             1.690444      2.350000            1.996873
7     00111011111111111011          3.040000             1.999600      0.490000            0.499900
8     00111011111011111011          3.160000             1.993590      0.500000            0.670820
9     10111111111111111111          3.480000             1.941546      0.380000            0.485386
10    10111001111111111101          2.760000             1.985548      0.560000            0.496387

Random is one of the weakest strategies in the 2IPD game. The evolved strategies perform well in this game: all of them win against the Random test strategy with high payoffs.

Experimental Result


[Figure: payoff (y-axis, 0 to 4) of the superior 10 strategies (x-axis, 1 to 10). Series: Coalition Payoff, Coalition S.D., TFT Payoff, TFT S.D.]

Evolved Strategy vs. Tit-for-Tat

Rank  Genotype of evolved strategy  Evolved Avg. Payoff  Evolved S.D.  Tit-for-Tat Avg. Payoff  Tit-for-Tat S.D.
1     11000100001011011100          3.020000             1.636948      2.640000                 2.061650
2     01101100001010011100          3.000000             0.000000      3.000000                 0.000000
3     10001000001011011100          1.040000             0.397995      0.990000                 0.099499
4     00000100001010011100          1.080000             0.560000      1.020000                 0.423792
5     10001000001011011100          2.980000             0.345832      2.970000                 0.411218
6     01010100001011011100          3.000000             1.624808      2.670000                 2.044774
7     11001000001010011100          1.040000             0.397995      0.990000                 0.099499
8     11001100001011011110          3.000000             0.000000      3.000000                 0.000000
9     01110100001011011100          3.020000             1.636948      2.640000                 2.061650
10    01010100011011011100          3.000000             0.000000      3.000000                 0.000000

Tit-for-Tat is a mimic strategy that plays “cooperation” on the first move of the 2IPD game. The evolved strategies respond appropriately and do not lose the game, which supports their generalization ability.

Experimental Result


[Figure: payoff (y-axis, 0 to 4) of the superior 10 strategies (x-axis, 1 to 10). Series: Coalition Payoff, Coalition S.D., Trigger Payoff, Trigger S.D.]

Evolved Strategy vs. Trigger

Rank  Genotype of evolved strategy  Evolved Avg. Payoff  Evolved S.D.  Trigger Avg. Payoff  Trigger S.D.
1     10111011110011101000          1.040000             0.397995      0.990000             0.099499
2     10111011110011101001          1.040000             0.397995      0.990000             0.099499
3     00111011110011111000          1.060000             0.443170      1.010000             0.223383
4     10111011110011111001          1.040000             0.397995      0.990000             0.099499
5     10111011110011111001          1.080000             0.483322      1.030000             0.298496
6     10111011110011111001          1.040000             0.397995      0.990000             0.099499
7     10111111110010111000          1.040000             0.397995      0.990000             0.099499
8     00111011110011111001          1.040000             0.397995      0.990000             0.099499
9     10111011110011111001          1.060000             0.443170      1.010000             0.223383
10    00111011110011111001          1.040000             0.397995      0.990000             0.099499

Trigger never forgives an opponent's defection. The way to win a game against Trigger is therefore to choose “defection” repeatedly as well.

Experimental Result


[Figure: payoff (y-axis, 0 to 4) of the superior 10 strategies (x-axis, 1 to 10). Series: Coalition Payoff, Coalition S.D., AllD Payoff, AllD S.D.]

Evolved Strategy vs. AllD

Rank  Genotype of evolved strategy  Evolved Avg. Payoff  Evolved S.D.  AllD Avg. Payoff  AllD S.D.
1     00111111111110101111          1.000000             0.000000      1.000000          0.000000
2     00111111111110101111          1.000000             0.000000      1.000000          0.000000
3     00111111111110101111          1.000000             0.000000      1.000000          0.000000
4     00111011111110101111          1.000000             0.000000      1.000000          0.000000
5     10111111111110101111          1.000000             0.000000      1.000000          0.000000
6     00111111111110101111          1.000000             0.000000      1.000000          0.000000
7     10111011111110101111          1.000000             0.000000      1.040000          0.397995
8     00111111111110101111          1.000000             0.000000      1.040000          0.397995
9     00111111111110101011          1.000000             0.000000      1.000000          0.000000
10    00111111111110101111          1.000000             0.000000      1.000000          0.000000

The only way not to lose against AllD is to choose “defection” on every move. Cooperation cannot pay off in this game.

Experimental Result


Number of Coalition

[Figure: number of coalitions (y-axis, 0 to 30) by generation (x-axis, 0 to 100)]

Coalitions survive into the next generation. Most coalitions are formed early in the evolutionary process. This keeps genetic diversity high and gives better choices against opponents. A coalition can grow if agents satisfy the conditions for joining.

Experimental Result


Comparing the Results

The evolved strategies get more payoff against Random, CCD and CDCD than against Tit-for-Tat, Trigger and AllD. This indicates that the evolved strategies exploit their opponents' actions well.

Experimental Result


Bias of the Strategy

[Figure: bias (y-axis, 0.4 to 1.1) by generation (x-axis, 0 to 200), one curve each for Random, TFT, Trigger, AllD, CDCD and CCD]

Bias shows how the strategies select their next choice against their opponents. A higher bias means that a strategy chooses “cooperation” more often than “defection”, and vice versa.

Experimental Result


Conclusions

Conclusion

Strategic coalition might be a robust method that can adapt to a dynamic environment

Decision-making methods influence the results, but not seriously

The strategies evolved with coalitions generalize well against various opponents

Discussion

Can the strategic coalition be adapted to the n-IPD game?

Which parameters in the IPD game influence generalization ability?

How can opponent strategies for testing be made?

How can this approach be applied to the real world?


Examples (1)

Market Observer


Examples (2)

Forest Prediction
