
SIGNALING GAME · BEHAVIOURAL STRATEGIES & UPDATE DYNAMICS · MODELING PRAGMATIC PHENOMENA

Models of Language Evolution
Session 09: Evolution of Pragmatic Strategies

Roland Mühlenbernd
University of Tübingen


SIGNALING GAME: DEFINITION

A signaling game is a tuple SG = 〈{S,R}, T, Pr, M, A, U_S, U_R〉 with

- {S,R}: set of players
- T: set of states
- Pr: prior beliefs, Pr ∈ Δ(T)
- M: set of messages
- A: set of receiver actions
- U_S, U_R: utility functions T × M × A → ℝ


SIGNALING GAME: EXAMPLE

A standard Lewis game is defined as:

- Set of players {S,R}
- Set of states T = {t1, t2}
- Equiprobable prior beliefs: Pr(t1) = Pr(t2) = .5
- Set of messages M = {m1, m2} (no costs)
- Set of actions A = {a1, a2}
- Utility function:

  U_{S,R}(t_i, m_j, a_k) = 1 if i = k, 0 otherwise

        a1    a2
  t1   1,1   0,0
  t2   0,0   1,1


STRATEGIES

The players' "actions" can be represented as pure strategies.
For the Lewis game there are 4 strategies for each player:

  S1: t1 → m1, t2 → m2        R1: m1 → a1, m2 → a2
  S2: t1 → m2, t2 → m1        R2: m1 → a2, m2 → a1
  S3: t1 → m1, t2 → m1        R3: m1 → a1, m2 → a1
  S4: t1 → m2, t2 → m2        R4: m1 → a2, m2 → a2


EXPECTED UTILITIES

The expected utility for a combination of strategies is given as:

  EU(S_i, R_j) = Σ_{t∈T} Pr(t) × U(t, S_i(t), R_j(S_i(t)))    (1)

        R1   R2   R3   R4
  S1     1    0   .5   .5
  S2     0    1   .5   .5
  S3    .5   .5   .5   .5
  S4    .5   .5   .5   .5

Table: Expected utilities for all strategy combinations of the Lewis game
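As a sanity check, the table can be recomputed directly from equation (1); a small Python sketch (all names here are mine, not part of the original model):

```python
# Recomputing Table 1: EU(S_i, R_j) = sum_t Pr(t) * U(t, S_i(t), R_j(S_i(t)))
# for the standard Lewis game.

PR = {"t1": 0.5, "t2": 0.5}  # equiprobable priors

def U(t, m, a):
    """Utility 1 iff the receiver's action index matches the state index."""
    return 1 if t[1] == a[1] else 0

# The four pure sender strategies (state -> message) ...
S = {
    "S1": {"t1": "m1", "t2": "m2"},
    "S2": {"t1": "m2", "t2": "m1"},
    "S3": {"t1": "m1", "t2": "m1"},
    "S4": {"t1": "m2", "t2": "m2"},
}
# ... and the four pure receiver strategies (message -> action)
R = {
    "R1": {"m1": "a1", "m2": "a2"},
    "R2": {"m1": "a2", "m2": "a1"},
    "R3": {"m1": "a1", "m2": "a1"},
    "R4": {"m1": "a2", "m2": "a2"},
}

def EU(s, r):
    return sum(PR[t] * U(t, s[t], r[s[t]]) for t in PR)

table = {(si, rj): EU(S[si], R[rj]) for si in S for rj in R}
# e.g. table[("S1", "R1")] == 1.0 and table[("S2", "R1")] == 0.0
```

Only the two signaling systems 〈S1,R1〉 and 〈S2,R2〉 reach expected utility 1; the constant strategies S3 and S4 yield .5 against everything.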


SIGNALING GAMES AS STATIC GAMES

- static games: agents choose simultaneously
- SG as static game: agents choose strategies
- a strategy represents a "contingency plan": what would an agent do in each state


SIGNALING GAMES AS STATIC GAMES

Extensions:

1. multi-agent system
   - graph G = 〈N,V〉 as interaction structure
   - nodes represent agents, edges represent connections for interaction
   - different structures: grid, small world...

2. update rules
   - imitate the majority
   - imitate the best
   - conditional imitation
   - best response
     - against mixed strategy over all neighbours
     - against mixed strategy over preplayed rounds (fictitious play)


SIGNALING GAMES AS DYNAMIC GAMES

- dynamic games: agents choose in sequence
- SG as dynamic game: agents play behavioural strategies
- a behavioural strategy represents a behaviour: what would an agent do in a given situation


BEHAVIOURAL STRATEGIES

Behavioural strategies are functions that map choice points to probability distributions over the actions available at that choice point.

- behavioural sender strategy: σ ∈ S = (Δ(M))^T
- behavioural receiver strategy: ρ ∈ R = (Δ(A))^M

Example:

  σ:  t1 → [m1 → .9, m2 → .1]        ρ:  m1 → [a1 → .33, a2 → .67]
      t2 → [m1 → .5, m2 → .5]            m2 → [a1 → 1,   a2 → 0]
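The example strategies can be represented as nested dictionaries mapping each choice point to a distribution; `is_behavioural` is an illustrative validity check of my own, not part of the model:

```python
# The example behavioural strategies as nested dictionaries
# (choice point -> probability distribution over available actions).

sigma = {  # behavioural sender strategy in (Delta(M))^T
    "t1": {"m1": 0.9, "m2": 0.1},
    "t2": {"m1": 0.5, "m2": 0.5},
}
rho = {    # behavioural receiver strategy in (Delta(A))^M
    "m1": {"a1": 0.33, "a2": 0.67},
    "m2": {"a1": 1.0, "a2": 0.0},
}

def is_behavioural(strategy, tol=1e-9):
    """Each choice point must carry a genuine probability distribution."""
    return all(
        abs(sum(dist.values()) - 1.0) < tol and all(p >= 0 for p in dist.values())
        for dist in strategy.values()
    )

assert is_behavioural(sigma) and is_behavioural(rho)
```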


SIGNALING GAMES AS DYNAMIC GAMES

Extensions:

1. multi-agent system
   - graph G = 〈N,V〉 as interaction structure
   - nodes represent agents, edges represent connections for interaction
   - different structures: grid, small world...

2. update rules
   - imitate the majority
   - imitate the best
   - conditional imitation
   - best response ??
     - against mixed strategy over all neighbours
     - against mixed strategy over preplayed rounds (fictitious play)


BEHAVIOURAL STRATEGIES

- Behavioural strategies represent probabilistic behaviour.
  Example: σ(m1|t2) = .5: in state t2 the sender sends message m1 with a probability of .5

- Behavioural strategies represent beliefs.
  Example: ρ(a1|m1) = .33: the sender believes that the receiver construes message m1 as a1 with a probability of .33

  σ:  t1 → [m1 → .9, m2 → .1]        ρ:  m1 → [a1 → .33, a2 → .67]
      t2 → [m1 → .5, m2 → .5]            m2 → [a1 → 1,   a2 → 0]


EXPECTED UTILITY & BEST RESPONSE

- While for a static SG the expected utility EU(S_i, R_j) was defined for a strategy pair 〈S_i, R_j〉, for a dynamic SG it is defined as follows:

  EU_S(m|t, ρ) = Σ_{a∈A} ρ(a|m) × U(t, m, a)    (2)

  EU_R(a|m, σ) = Σ_{t∈T} σ(m|t) × U(t, m, a)    (3)

- The behavioural strategies σ and ρ represent beliefs about the interlocutor
- Just as for static games, to play best response means to maximize expected utility
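Equations (2) and (3) translate directly into code; the Lewis utility and all function names below are my own sketch:

```python
# Expected utilities (2) and (3) and best response against them,
# using the Lewis utility (action index must match state index).

def U(t, m, a):
    return 1 if t[1] == a[1] else 0

def EU_S(m, t, rho):
    """(2): sender's expected utility of sending m in t, given belief rho."""
    return sum(p * U(t, m, a) for a, p in rho[m].items())

def EU_R(a, m, sigma):
    """(3): receiver's expected utility of playing a after m, given belief sigma."""
    return sum(sigma[t][m] * U(t, m, a) for t in sigma)

def best_response_S(t, rho, M=("m1", "m2")):
    return max(M, key=lambda m: EU_S(m, t, rho))

def best_response_R(m, sigma, A=("a1", "a2")):
    return max(A, key=lambda a: EU_R(a, m, sigma))

# The running example strategies:
rho = {"m1": {"a1": 0.33, "a2": 0.67}, "m2": {"a1": 1.0, "a2": 0.0}}
sigma = {"t1": {"m1": 0.9, "m2": 0.1}, "t2": {"m1": 0.5, "m2": 0.5}}

# In state t1 the sender compares EU_S(m1|t1, rho) = .33 against
# EU_S(m2|t1, rho) = 1, so m2 is the best response.
assert best_response_S("t1", rho) == "m2"
```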


BELIEF LEARNING

- Behavioural strategies represent beliefs about the interlocutor
- The belief is a result of observation
- Example:

  Sender's observations (SO):      resulting belief:
        a1   a2
  m1     8    2                    ρ:  m1 → [a1 → .8,  a2 → .2]
  m2     7   13                        m2 → [a1 → .35, a2 → .65]

  Receiver's observations (RO):    resulting belief:
        t1   t2
  m1     6    0                    σ:  t1 → [m1 → .6, m2 → .4]
  m2     4    4                        t2 → [m1 → 0,  m2 → 1]


BELIEF LEARNING & BEST RESPONSE

- After a played game both interlocutors can observe the resulting game path and update their beliefs accordingly
- Example:
  - Given the following observations and corresponding beliefs (the +1 entries show the update after the round below):

      SO    a1    a2        RO    t1    t2
      m1   8+1     2        m1   6+1     0
      m2     7    13        m2     4     4

  - the sender is faced with state t1
  - EU(m1|t1, ρ) = .8, EU(m2|t1, ρ) = .35
  - m1 maximises EU, thus sending m1 is best response
  - the receiver has to construe message m1:
  - EU(a1|m1, σ) = .6, EU(a2|m1, σ) = 0
  - a1 maximises EU, thus playing a1 is best response
  - players observe the resulting game path 〈t1, m1, a1〉 and update observation counts and beliefs accordingly
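The belief-learning cycle (counts → beliefs → best response → updated counts) can be sketched in a few lines; `belief`, `SO`, and `RO_by_state` are my names, and the receiver's observations are grouped per state so that each row normalises into σ(·|t):

```python
def belief(counts):
    """Row-normalise observation counts into a behavioural strategy."""
    return {
        key: {x: n / sum(row.values()) for x, n in row.items()}
        for key, row in counts.items()
    }

# sender's observations of receiver behaviour (message -> action counts)
SO = {"m1": {"a1": 8, "a2": 2}, "m2": {"a1": 7, "a2": 13}}
# receiver's observations, grouped per state
RO_by_state = {"t1": {"m1": 6, "m2": 4}, "t2": {"m1": 0, "m2": 4}}

rho = belief(SO)             # rho(a1|m1) = .8, rho(a1|m2) = .35
sigma = belief(RO_by_state)  # sigma(m1|t1) = .6, sigma(m1|t2) = 0

# Sender in t1: only a1 matches t1, so EU(m|t1, rho) = rho(a1|m); m1 wins.
assert max(("m1", "m2"), key=lambda m: rho[m]["a1"]) == "m1"
# Receiver construing m1: EU(a1|m1, sigma) = sigma(m1|t1) = .6 beats
# EU(a2|m1, sigma) = sigma(m1|t2) = 0, so a1 wins.

# Both observe the game path <t1, m1, a1> and update their counts:
SO["m1"]["a1"] += 1            # 8 -> 9
RO_by_state["t1"]["m1"] += 1   # 6 -> 7
```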


EXAMPLE: RESULT IN A SW NETWORK

Figure: Resulting structure after 30 simulation steps of 100 BL agents playing the Lewis game on a SW network. The colours blue and green represent the two signaling systems as target strategies.


REINFORCEMENT LEARNING

For a given signaling game SG:

- the sender has an urn ft for each t ∈ T, filled with balls of type m ∈ M
- the receiver has an urn fm for each m ∈ M, filled with balls of type a ∈ A

Example:

         m1   m2              a1   a2
  ft1     8    2      fm1      2    3
  ft2     7   13      fm2      7    1


REINFORCEMENT LEARNING

- If the sender is faced with state t, she draws a ball from urn ft and sends the message matching the ball type m.
- If the receiver receives message m, he draws a ball from urn fm and plays the action matching the ball type a.
- Behavioural strategies represent probabilistic behaviour
- After a played round, successful communication is reinforced

The example urns above correspond to the behavioural strategies:

  σ:  t1 → [m1 → .8,  m2 → .2]        ρ:  m1 → [a1 → .4,   a2 → .6]
      t2 → [m1 → .35, m2 → .65]           m2 → [a1 → .875, a2 → .125]


REINFORCEMENT LEARNING

Example: Given the following urn settings (the +1 entries show the reinforcement from Play 1):

        m1    m2              a1   a2
  ft1    8   2+1      fm1      2    3
  ft2    7    13      fm2    7+1    1

Play 1:
- the sender is faced with t1
- she draws ball type m2 from urn ft1
- the receiver has to construe m2
- he draws ball type a1 from urn fm2
- communication per 〈t1, m2, a1〉 is successful: reinforcement

Play 2:
- the sender is faced with t2
- she draws ball type m1 from urn ft2
- the receiver has to construe m1
- he draws ball type a1 from urn fm1
- communication per 〈t2, m1, a1〉 isn't successful: no reinforcement


REINFORCEMENT LEARNING

Possible extensions:

- negative reinforcement: decrease the number of appropriate balls if communication is not successful
- lateral inhibition: for successful communication, not only increase the number of appropriate balls but also decrease the number of all other balls in the same urn
- limited memory: consider only the last n observations
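A minimal urn model with the lateral-inhibition extension might look as follows; the class, the `play_round` helper, and the `amount`/`inhibition` parameters are my own illustrative choices:

```python
import random

class Urn:
    """Polya-style urn: ball counts per type; reinforcement adds balls."""

    def __init__(self, balls):
        self.balls = dict(balls)

    def draw(self, rng=random):
        types, counts = zip(*self.balls.items())
        return rng.choices(types, weights=counts)[0]

    def reinforce(self, ball_type, amount=1, inhibition=0):
        self.balls[ball_type] += amount
        for other in self.balls:
            if other != ball_type:  # lateral inhibition: punish competitors
                self.balls[other] = max(0, self.balls[other] - inhibition)

# urn settings from the example slide
sender_urns = {"t1": Urn({"m1": 8, "m2": 2}), "t2": Urn({"m1": 7, "m2": 13})}
receiver_urns = {"m1": Urn({"a1": 2, "a2": 3}), "m2": Urn({"a1": 7, "a2": 1})}

def play_round(t, matching_action, inhibition=0):
    m = sender_urns[t].draw()
    a = receiver_urns[m].draw()
    if a == matching_action:  # successful communication: reinforce both draws
        sender_urns[t].reinforce(m, inhibition=inhibition)
        receiver_urns[m].reinforce(a, inhibition=inhibition)
    return t, m, a

play_round("t1", "a1", inhibition=1)
```

With `inhibition=0` this is plain reinforcement learning; a positive `inhibition` makes competing ball types die out faster, which is what speeds up convergence in the simulations below.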


EXAMPLE: RESULT IN A SW NETWORK

Figure: Resulting structure after 300 simulation steps of 100 RL agents playing the Lewis game (with lateral inhibition) on a SW network. The colours blue and green represent the two signaling systems as target strategies.


BELIEF LEARNING VS. REINFORCEMENT LEARNING

             behavioural   rational   learning speed
  BL + BR        ✓            ✓           fast
  RL             ✓            −           slow


NEO-GRICEAN PRAGMATICS

- A conversational implicature is a pragmatic phenomenon where an utterance's intended meaning differs from its literal meaning.
- Interlocutors can resolve the difference between the intended pragmatic interpretation (PI) and the literal interpretation (LI) via cooperation principles.

Levinson (2000) subdivided GCIs (generalized conversational implicatures) into:

- Q-Implicature
- I-Implicature
- M-Implicature


Q-IMPLICATURE

(1) "Some boys came to the party."
    LI: Some, maybe all boys came.    ∃ = ∃¬∀ ∨ ∀
    PI: Some but not all boys came.   ∃¬∀

Strategy for LI: t∀ → {mall, msome}, t∃¬∀ → {msome, msbna}; the receiver construes mall as a∀, msbna as a∃¬∀, and msome at chance.

Strategy for PI: t∀ → mall → a∀ and t∃¬∀ → msome → a∃¬∀; msome is construed exhaustively and msbna is no longer needed.


MODELLING Q-IMPLICATURE

Parameter settings:

- T = {t∀, t∃¬∀}
- M = {mall, msome, msbna}
- A = {a∀, a∃¬∀}
- Pr(t∀) = Pr(t∃¬∀) = .5
- κ(msbna) = 1, κ(mall) = κ(msome) > 1
- Initial LI strategy (urn setting): msome is sent with probability .5 in either state and construed at chance

          mall   msome   msbna               a∀   a∃¬∀
  ft∀       50      50       0    fmall     100      0
  ft∃¬∀      0      50      50    fmsome     50     50
                                  fmsbna      0    100


SIMULATION & RESULTS

- 200 RL agents play the Q-Implicature game repeatedly on a total network with random partners
- all agents start with the initial urn setting that represents LI
- the simulation ends when all agents have learned a pure strategy

Results: [plot: percentage of agents ending up with the LI strategy vs. the PI strategy, as a function of the message costs κ(msome) = κ(mall) ∈ [1, 5]]
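The simulation regime can be sketched for the plain Lewis game (the implicature games additionally use message costs and larger state/message sets); population size, the convergence threshold, and all names below are simplifying assumptions of mine:

```python
import random

def draw(urn, rng):
    """Draw a ball type with probability proportional to its count."""
    types, weights = zip(*urn.items())
    return rng.choices(types, weights=weights)[0]

def make_agent():
    # sender urns (state -> message counts), receiver urns (message -> action counts)
    return {
        "S": {"t1": {"m1": 1, "m2": 1}, "t2": {"m1": 1, "m2": 1}},
        "R": {"m1": {"a1": 1, "a2": 1}, "m2": {"a1": 1, "a2": 1}},
    }

def pure(agent, threshold=0.95):
    """Treat an agent as having 'learned a pure strategy' once one ball type
    dominates every one of its urns."""
    urns = list(agent["S"].values()) + list(agent["R"].values())
    return all(max(u.values()) / sum(u.values()) >= threshold for u in urns)

def simulate(n_agents=10, seed=3, max_rounds=50_000):
    rng = random.Random(seed)
    agents = [make_agent() for _ in range(n_agents)]
    for round_no in range(max_rounds):
        sender, receiver = rng.sample(agents, 2)   # random partners (total network)
        t = rng.choice(["t1", "t2"])               # equiprobable states
        m = draw(sender["S"][t], rng)
        a = draw(receiver["R"][m], rng)
        if a == "a" + t[1]:                        # success: reinforce both urns
            sender["S"][t][m] += 1
            receiver["R"][m][a] += 1
        if round_no % 100 == 0 and all(pure(ag) for ag in agents):
            return round_no
    return max_rounds

rounds_needed = simulate()
```

For the implicature games, only the initial urn contents, the state priors, and the success/utility computation (including κ) would change; the round structure stays the same.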


I-IMPLICATURE

”What is expressed simply is stereotypically exemplified”

(2) "Billy drank a glass of milk."
    LI: A glass of any kind of milk.   tc, tg
    PI: A glass of cow's milk.         tc

Strategy for LI: tc → {mcm, mm}, tg → {mm, mgm}; the receiver construes mcm as ac, mgm as ag, and the unmarked mm at chance.

Strategy for PI: tc → mm → ac and tg → mgm → ag; the unmarked mm is construed as the stereotypical state tc.


MODELLING I-IMPLICATURE

Parameter settings:

- T = {tc, tg}
- M = {mm, mcm, mgm}
- A = {ac, ag}
- Pr(tc) = .8 > Pr(tg) = .2
- κ(mm) = 2, κ(mcm) = κ(mgm) = 1
- Initial LI strategy (urn setting): mm is sent with probability p in either state and construed at chance

          mcm     mm      mgm              ac    ag
  ftc   100−n      n        0    fmcm     100     0
  ftg       0      n    100−n    fmm       50    50
                                 fmgm       0   100

  for n = ⌊100 × p⌋


SIMULATION & RESULTS

- 200 RL agents play the I-Implicature game repeatedly on a total network with random partners
- all agents start with the initial urn setting that represents LI
- the simulation ends when all agents have learned a pure strategy

Results: [plot: percentage of agents ending up with the LI strategy vs. the PI strategy, as a function of the initial probability p ∈ [.3, .6]]


M-IMPLICATURE

”What’s said in an abnormal way isn’t normal.”

(3) "Billy caused the sheriff to die."
    LI: Billy killed the sheriff in any way.          tp, tr
    PI: Billy killed the sheriff in an abnormal way.  tr

Strategy for LI: both messages are used in both states and construed at chance.

Strategy for PI: tp → mk → ap and tr → mctd → ar; the unmarked mk conveys the normal state, the marked mctd the abnormal one.


MODELLING THE M-IMPLICATURE

Parameter settings:

- T = {tp, tr}
- M = {mk, mctd}
- A = {ap, ar}
- κ(mk) = 2, κ(mctd) = 1
- Pr(tp) > Pr(tr)
- Initial LI strategy (urn setting): completely uninformative

        mk   mctd             ap    ar
  ftp   50     50    fmk      50    50
  ftr   50     50    fmctd    50    50


SIMULATION & RESULTS

- 200 RL agents play the M-Implicature game repeatedly on a total network with random partners
- all agents start with the initial urn setting that represents LI
- the simulation ends when all agents have learned a pure strategy

Results: [plot: percentage of agents ending up with the LI strategy vs. the PI strategy, as a function of the prior Pr(tp) ∈ [.51, .57]]


SUMMARY

- the difference between
  - a static SG (agents play pure strategies)
  - a dynamic SG (agents play behavioural strategies)
- update dynamics for dynamic signaling games
  - belief learning + best response
  - reinforcement learning
- signaling games for neo-Gricean implicature types