SIGNALING GAME · BEHAVIOURAL STRATEGIES & UPDATE DYNAMICS · MODELING PRAGMATIC PHENOMENA

Models of Language Evolution
Session 09: Evolution of Pragmatic Strategies

Roland Mühlenbernd
University of Tübingen
SIGNALING GAME: DEFINITION

A signaling game is a tuple SG = ⟨{S,R}, T, Pr, M, A, U_S, U_R⟩ with
- {S,R}: set of players
- T: set of states
- Pr ∈ ∆(T): prior beliefs
- M: set of messages
- A: set of receiver actions
- U_{S,R}: utility function T × M × A → ℝ
SIGNALING GAME: EXAMPLE

A standard Lewis game is defined as:
- Set of players {S,R}
- Set of states T = {t1, t2}
- Equiprobable prior beliefs: Pr(t1) = Pr(t2) = .5
- Set of messages M = {m1, m2} (no costs)
- Set of actions A = {a1, a2}
- Utility function U_{S,R}(ti, mj, ak) = 1 if i = k, 0 otherwise

Payoff matrix (sender payoff, receiver payoff):

        a1     a2
t1     1,1    0,0
t2     0,0    1,1
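As a quick sketch, the game can be encoded directly; all names below (STATES, utility, ...) are illustrative choices, not notation from the slides:

```python
# Minimal encoding of the standard Lewis game defined above.
STATES = ["t1", "t2"]
MESSAGES = ["m1", "m2"]          # messages are costless
ACTIONS = ["a1", "a2"]
PRIOR = {"t1": 0.5, "t2": 0.5}   # equiprobable prior beliefs

def utility(t, m, a):
    """U(t_i, m_j, a_k) = 1 if i = k, else 0; the message itself is payoff-irrelevant."""
    return 1 if STATES.index(t) == ACTIONS.index(a) else 0
```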
STRATEGIES

The players' "actions" can be represented as pure strategies. For the Lewis game there are 4 strategies for each player:

Sender:
- S1: t1 → m1, t2 → m2
- S2: t1 → m2, t2 → m1
- S3: t1 → m1, t2 → m1
- S4: t1 → m2, t2 → m2

Receiver:
- R1: m1 → a1, m2 → a2
- R2: m1 → a2, m2 → a1
- R3: m1 → a1, m2 → a1
- R4: m1 → a2, m2 → a2

S1/S2 and R1/R2 are the separating strategies; S3/S4 and R3/R4 pool on a single message or action.
EXPECTED UTILITIES

The expected utility for a combination of strategies is given as:

EU(Si, Rj) = ∑_{t∈T} Pr(t) × U(t, Si(t), Rj(Si(t)))    (1)

        R1    R2    R3    R4
S1       1     0    .5    .5
S2       0     1    .5    .5
S3      .5    .5    .5    .5
S4      .5    .5    .5    .5

Table: Expected utilities for all strategy combinations of the Lewis game
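Equation (1) and the table can be checked mechanically; a minimal Python sketch (the strategy encodings follow the strategies slide, helper names are illustrative):

```python
# Reproduce the expected-utility table for the Lewis game.
PRIOR = {"t1": 0.5, "t2": 0.5}

def utility(t, m, a):
    """Success iff state index equals action index; the message is payoff-irrelevant."""
    return 1 if t[1] == a[1] else 0

SENDER = {
    "S1": {"t1": "m1", "t2": "m2"},
    "S2": {"t1": "m2", "t2": "m1"},
    "S3": {"t1": "m1", "t2": "m1"},   # pooling on m1
    "S4": {"t1": "m2", "t2": "m2"},   # pooling on m2
}
RECEIVER = {
    "R1": {"m1": "a1", "m2": "a2"},
    "R2": {"m1": "a2", "m2": "a1"},
    "R3": {"m1": "a1", "m2": "a1"},   # pooling on a1
    "R4": {"m1": "a2", "m2": "a2"},   # pooling on a2
}

def eu(s, r):
    """EU(S_i, R_j) = sum over t of Pr(t) * U(t, S_i(t), R_j(S_i(t)))."""
    return sum(PRIOR[t] * utility(t, s[t], r[s[t]]) for t in PRIOR)
```

Pairing S1 with R1 (or S2 with R2) yields expected utility 1: a signaling system.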
SIGNALING GAMES AS STATIC GAMES

- static games: agents choose simultaneously
- SG as static game: agents choose strategies
- a strategy represents a "contingency plan": what would an agent do in each state
SIGNALING GAMES AS STATIC GAMES

Extensions:
1. multi-agent system
   - graph G = ⟨N, V⟩ as interaction structure
   - nodes represent agents, edges represent connections for interaction
   - different structures: grid, small world, ...
2. update rules
   - imitate the majority
   - imitate the best
   - conditional imitation
   - best response
     - against a mixed strategy over all neighbours
     - against a mixed strategy over preplayed rounds (fictitious play)
SIGNALING GAMES AS DYNAMIC GAMES

- dynamic games: agents choose in sequence
- SG as dynamic game: agents play behavioural strategies
- a behavioural strategy represents a behaviour: what would an agent do in a given situation
BEHAVIOURAL STRATEGIES

Behavioural strategies are functions that map choice points to probability distributions over the actions available at that choice point.

- behavioural sender strategy: σ ∈ S = (∆(M))^T
- behavioural receiver strategy: ρ ∈ R = (∆(A))^M

Example:

σ: t1 ↦ [m1 ↦ .9, m2 ↦ .1], t2 ↦ [m1 ↦ .5, m2 ↦ .5]
ρ: m1 ↦ [a1 ↦ .33, a2 ↦ .67], m2 ↦ [a1 ↦ 1, a2 ↦ 0]
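The example σ and ρ can be written as nested dictionaries; the representation below is an assumption for illustration, not notation from the slides:

```python
# The example behavioural strategies, one distribution per choice point.
sigma = {"t1": {"m1": 0.9, "m2": 0.1},
         "t2": {"m1": 0.5, "m2": 0.5}}
rho   = {"m1": {"a1": 0.33, "a2": 0.67},
         "m2": {"a1": 1.0,  "a2": 0.0}}

def is_behavioural(strategy, tol=1e-9):
    """Check that every choice point carries a probability distribution."""
    return all(abs(sum(dist.values()) - 1.0) < tol and
               all(p >= 0 for p in dist.values())
               for dist in strategy.values())
```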
SIGNALING GAMES AS DYNAMIC GAMES

Extensions:
1. multi-agent system
   - graph G = ⟨N, V⟩ as interaction structure
   - nodes represent agents, edges represent connections for interaction
   - different structures: grid, small world, ...
2. update rules
   - imitate the majority
   - imitate the best
   - conditional imitation
   - best response?
     - against a mixed strategy over all neighbours
     - against a mixed strategy over preplayed rounds (fictitious play)
BEHAVIOURAL STRATEGIES

- Behavioural strategies represent probabilistic behaviour.
  Example: σ(m1|t2) = .5, i.e. in state t2 the sender sends message m1 with a probability of .5
- Behavioural strategies represent beliefs.
  Example: ρ(a1|m1) = .33, i.e. the sender believes that the receiver construes message m1 as a1 with a probability of .33

σ: t1 ↦ [m1 ↦ .9, m2 ↦ .1], t2 ↦ [m1 ↦ .5, m2 ↦ .5]
ρ: m1 ↦ [a1 ↦ .33, a2 ↦ .67], m2 ↦ [a1 ↦ 1, a2 ↦ 0]
EXPECTED UTILITY & BEST RESPONSE

- While for a static SG the expected utility EU(Si, Rj) is defined for a strategy pair ⟨Si, Rj⟩, for a dynamic SG it is defined per choice point:

EU_S(m|t, ρ) = ∑_{a∈A} ρ(a|m) × U(t, m, a)    (2)

EU_R(a|m, σ) = ∑_{t∈T} σ(m|t) × U(t, m, a)    (3)

- The behavioural strategies σ and ρ represent beliefs about the interlocutor
- Just as for static games, to play best response means to maximise expected utility
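Equations (2) and (3) and the induced best response can be sketched as follows; the Lewis-game utility is assumed, and all function names are illustrative:

```python
def U(t, m, a):
    """Lewis-game utility: success iff state index equals action index."""
    return 1 if t[1] == a[1] else 0

def eu_sender(m, t, rho):
    """EU_S(m | t, rho) = sum over a of rho(a|m) * U(t, m, a)."""
    return sum(p * U(t, m, a) for a, p in rho[m].items())

def eu_receiver(a, m, sigma):
    """EU_R(a | m, sigma) = sum over t of sigma(m|t) * U(t, m, a)."""
    return sum(dist[m] * U(t, m, a) for t, dist in sigma.items())

def best_response_sender(t, rho):
    """Best response: a message maximising EU_S in state t."""
    return max(rho, key=lambda m: eu_sender(m, t, rho))
```

With the example ρ from the previous slide, ρ(a1|m1) = .33 makes m1 a poor choice in t1, so the best response there is m2 (which is believed to be construed as a1 for sure, yielding 0; m2's construal a1 gives utility... the numbers below check this concretely).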
BELIEF LEARNING

- Behavioural strategies represent beliefs about the interlocutor
- The belief is the result of observation
- Example: observation counts and the beliefs (relative frequencies) they induce:

Sender's observations SO and induced belief ρ:

SO     a1    a2
m1      8     2
m2      7    13

ρ: m1 ↦ [a1 ↦ .8, a2 ↦ .2], m2 ↦ [a1 ↦ .35, a2 ↦ .65]

Receiver's observations RO and induced belief σ:

RO     t1    t2
m1      6     0
m2      4     4

σ: t1 ↦ [m1 ↦ .6, m2 ↦ .4], t2 ↦ [m1 ↦ 0, m2 ↦ 1]
BELIEF LEARNING & BEST RESPONSE

- After a played game both interlocutors can observe the resulting game path and update their beliefs accordingly
- Example, given the observation counts and induced beliefs from the previous slide:
  - the sender is faced with state t1
  - EU(m1|t1, ρ) = .8, EU(m2|t1, ρ) = .35
  - m1 maximises EU, thus sending m1 is the best response
  - the receiver has to construe message m1:
  - EU(a1|m1, σ) = .6, EU(a2|m1, σ) = 0
  - a1 maximises EU, thus playing a1 is the best response
  - both players observe the resulting game path ⟨t1, m1, a1⟩ and update observation counts and beliefs accordingly (+1 marks the update):

SO     a1    a2
m1    8+1     2
m2      7    13

RO     t1    t2
m1    6+1     0
m2      4     4
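The belief-learning cycle above (counts → beliefs → best response → updated counts) could look like this; a sketch with illustrative names:

```python
def beliefs(counts):
    """Row-wise relative frequencies, e.g. {m1: {a1: 8, a2: 2}} -> {m1: {a1: .8, a2: .2}}."""
    return {k: {x: n / sum(row.values()) for x, n in row.items()}
            for k, row in counts.items()}

# Sender's observation counts SO from the example.
sender_obs = {"m1": {"a1": 8, "a2": 2}, "m2": {"a1": 7, "a2": 13}}
rho = beliefs(sender_obs)            # rho(a1|m1) = .8, rho(a1|m2) = .35

U = lambda t, m, a: 1 if t[1] == a[1] else 0   # Lewis-game utility

def br_sender(t, rho):
    """Message maximising expected utility under belief rho."""
    return max(rho, key=lambda m: sum(p * U(t, m, a) for a, p in rho[m].items()))

m = br_sender("t1", rho)             # m1 maximises EU in state t1
sender_obs[m]["a1"] += 1             # observe game path <t1, m1, a1>: count it
```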
EXAMPLE: RESULT IN A SW NETWORK

Figure: Resulting structure after 30 simulation steps of 100 BL agents playing the Lewis game on a small-world (SW) network. The colours blue and green represent the two signaling systems as target strategies.
REINFORCEMENT LEARNING

For a given signaling game SG:
- the sender has an urn f_t for each t ∈ T, filled with balls of type m ∈ M
- the receiver has an urn f_m for each m ∈ M, filled with balls of type a ∈ A

Example:

       m1    m2
ft1     8     2
ft2     7    13

       a1    a2
fm1     2     3
fm2     7     1
REINFORCEMENT LEARNING

- If the sender is faced with state t, she draws a ball from urn f_t and sends the message corresponding to the ball type m.
- If the receiver receives message m, he draws a ball from urn f_m and plays the action corresponding to the ball type a.
- Behavioural strategies represent probabilistic behaviour
- After a played round, successful communication is reinforced

The urns from the previous slide induce the behavioural strategies:

σ: t1 ↦ [m1 ↦ .8, m2 ↦ .2], t2 ↦ [m1 ↦ .35, m2 ↦ .65]
ρ: m1 ↦ [a1 ↦ .4, a2 ↦ .6], m2 ↦ [a1 ↦ .875, a2 ↦ .125]
REINFORCEMENT LEARNING

Example: given the urn settings from the previous slide, two plays and the resulting updates (+1 marks the reinforced ball type):

       m1     m2
ft1     8    2+1
ft2     7     13

       a1     a2
fm1     2      3
fm2   7+1      1

Play 1:
- the sender is faced with t1
- she draws ball type m2 from urn ft1
- the receiver has to construe m2
- he draws ball type a1 from urn fm2
- communication per ⟨t1, m2, a1⟩ is successful: reinforcement

Play 2:
- the sender is faced with t2
- she draws ball type m1 from urn ft2
- the receiver has to construe m1
- he draws ball type a1 from urn fm1
- communication per ⟨t2, m1, a1⟩ isn't successful: no reinforcement
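One round of the urn dynamics can be sketched as follows: balls are drawn with probability proportional to their counts, and success adds one ball of the drawn type to each urn involved (the +1 amount is an assumption; names are illustrative):

```python
import random

# Example urn contents from the slides.
sender_urns   = {"t1": {"m1": 8, "m2": 2}, "t2": {"m1": 7, "m2": 13}}
receiver_urns = {"m1": {"a1": 2, "a2": 3}, "m2": {"a1": 7, "a2": 1}}

def draw(urn):
    """Draw a ball type with probability proportional to its count."""
    types, counts = zip(*urn.items())
    return random.choices(types, weights=counts)[0]

def play_round(t):
    m = draw(sender_urns[t])          # sender draws from urn f_t
    a = draw(receiver_urns[m])        # receiver draws from urn f_m
    if t[1] == a[1]:                  # Lewis-game success: i = k
        sender_urns[t][m] += 1        # reinforce the drawn ball types
        receiver_urns[m][a] += 1
    return t, m, a
```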
REINFORCEMENT LEARNING

Possible extensions:
- negative reinforcement: decrease the number of appropriate balls if communication is not successful
- lateral inhibition: for successful communication, not only increase the number of appropriate balls but also decrease the number of all other balls in the same urn
- limited memory: consider only the last n observations
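The lateral-inhibition extension could be sketched like this, assuming a +1/-1 update with a floor at zero (the exact amounts and the floor are assumptions):

```python
def reinforce_with_inhibition(urn, drawn, amount=1):
    """Increase the drawn ball type; decrease all other types in the urn (never below 0)."""
    for ball in urn:
        if ball == drawn:
            urn[ball] += amount
        else:
            urn[ball] = max(0, urn[ball] - amount)

urn = {"m1": 8, "m2": 2}
reinforce_with_inhibition(urn, "m1")   # m1 gains a ball, m2 loses one
```

Lateral inhibition drives urns toward pure strategies faster, since competing ball types are actively suppressed rather than merely outgrown.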
EXAMPLE: RESULT IN A SW NETWORK

Figure: Resulting structure after 300 simulation steps of 100 RL agents playing the Lewis game (with lateral inhibition) on a small-world (SW) network. The colours blue and green represent the two signaling systems as target strategies.
BELIEF LEARNING VS. REINFORCEMENT LEARNING

            behavioural   rational   learning speed
BL + BR          √            √           fast
RL               √            -           slow
NEO-GRICEAN PRAGMATICS

- A conversational implicature is a pragmatic phenomenon where an utterance's intended meaning differs from its literal meaning.
- Interlocutors can resolve the difference between the intended pragmatic interpretation (PI) and the literal interpretation (LI) via cooperation principles.

Levinson (2000) subdivides generalised conversational implicatures (GCIs) into:
- Q-Implicature
- I-Implicature
- M-Implicature
Q-IMPLICATURE

(1) "Some boys came to the party."
LI: Some, maybe all boys came. ∃ = ∃¬∀ ∨ ∀
PI: Some but not all boys came. ∃¬∀

[Diagram, strategy for LI: mall → a∀ and msbna → a∃¬∀; msome is literally true in both states and construed either way]
[Diagram, strategy for PI: t∀ → mall → a∀; t∃¬∀ → msome → a∃¬∀]
MODELLING Q-IMPLICATURE

Parameter settings:
- T = {t∀, t∃¬∀}
- M = {mall, msome, msbna}
- A = {a∀, a∃¬∀}
- Pr(t∀) = Pr(t∃¬∀) = .5
- κ(msbna) = 1, κ(mall) = κ(msome) > 1
- initial LI strategy: each state is sent either of its literally true messages with probability .5; msome is construed as a∀ or a∃¬∀ with probability .5 each

Initial urn setting:

         mall   msome   msbna
ft∀        50      50       0
ft∃¬∀       0      50      50

          a∀    a∃¬∀
fmall     100       0
fmsome     50      50
fmsbna      0     100
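The initial LI urn setting can be written down as data; the ASCII names below (t_all, m_sbna, ...) are an illustrative transliteration of the subscripted symbols:

```python
# Initial LI urn setting for the Q-Implicature game: 100 balls per urn,
# split evenly over the literally compatible options.
sender_urns = {
    "t_all":  {"m_all": 50, "m_some": 50, "m_sbna": 0},
    "t_sbna": {"m_all": 0,  "m_some": 50, "m_sbna": 50},
}
receiver_urns = {
    "m_all":  {"a_all": 100, "a_sbna": 0},    # m_all is unambiguous
    "m_some": {"a_all": 50,  "a_sbna": 50},   # m_some is construed either way
    "m_sbna": {"a_all": 0,   "a_sbna": 100},  # m_sbna is unambiguous
}
```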
SIMULATION & RESULTS

- 200 RL agents play the Q-Implicature game repeatedly on a total (fully connected) network with random partners
- all agents start with the initial urn setting that represents LI
- the simulation ends when all agents have learned a pure strategy

Results:

[Figure: percentage of agents that end up with the LI target strategy vs. the PI target strategy, plotted against κ(msome) = κ(mall) ∈ {1, 2, 3, 4, 5}]
I-IMPLICATURE

"What is expressed simply is stereotypically exemplified."

(2) "Billy drank a glass of milk."
LI: A glass of any kind of milk. tc, tg
PI: A glass of cow's milk. tc

[Diagram, strategy for LI: mcm → ac and mgm → ag; the simple message mm is literally true in both states and construed either way]
[Diagram, strategy for PI: tc → mm → ac; tg → mgm → ag]
MODELLING I-IMPLICATURE

Parameter settings:
- T = {tc, tg}
- M = {mm, mcm, mgm}
- A = {ac, ag}
- Pr(tc) = .8 > Pr(tg) = .2
- κ(mm) = 2, κ(mcm) = κ(mgm) = 1
- initial LI strategy: each state is sent the simple message mm with probability p and its specific message with probability 1 − p; mm is construed as ac or ag with probability .5 each

Initial urn setting, for n = ⌊100 × p⌋:

         mcm     mm    mgm
ftc    100−n      n      0
ftg        0      n  100−n

        ac     ag
fmcm   100      0
fmm     50     50
fmgm     0    100
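The p-dependent sender urns can be generated directly from the formula n = ⌊100 × p⌋; a sketch with illustrative names:

```python
import math

def initial_sender_urns(p):
    """Initial LI sender urns for the I-Implicature game, with n = floor(100 * p)."""
    n = math.floor(100 * p)
    return {
        "t_c": {"m_cm": 100 - n, "m_m": n, "m_gm": 0},
        "t_g": {"m_cm": 0, "m_m": n, "m_gm": 100 - n},
    }
```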
SIMULATION & RESULTS

- 200 RL agents play the I-Implicature game repeatedly on a total network with random partners
- all agents start with the initial urn setting that represents LI
- the simulation ends when all agents have learned a pure strategy

Results:

[Figure: percentage of agents that end up with the LI target strategy vs. the PI target strategy, plotted against p ∈ {.3, .35, .4, .45, .5, .55, .6}]
M-IMPLICATURE

"What's said in an abnormal way isn't normal."

(3) "Billy caused the sheriff to die."
LI: Billy killed the sheriff in any way. tp, tr
PI: Billy killed the sheriff in an abnormal way. tr

[Diagram, strategy for LI: both messages mk and mctd are compatible with both states and construed either way]
[Diagram, strategy for PI: tp → mk → ap; tr → mctd → ar]
MODELLING THE M-IMPLICATURE

Parameter settings:
- T = {tp, tr}
- M = {mk, mctd}
- A = {ap, ar}
- κ(mk) = 2, κ(mctd) = 1
- Pr(tp) > Pr(tr)
- initial LI strategy: fully mixed; every urn is split evenly

Initial urn setting:

        mk   mctd
ftp     50     50
ftr     50     50

        ap     ar
fmk     50     50
fmctd   50     50
SIMULATION & RESULTS

- 200 RL agents play the M-Implicature game repeatedly on a total network with random partners
- all agents start with the initial urn setting that represents LI
- the simulation ends when all agents have learned a pure strategy

Results:

[Figure: percentage of agents that end up with the LI target strategy vs. the PI target strategy, plotted against Pr(tp) ∈ {.51, .52, .53, .54, .55, .56, .57}]
SUMMARY

- the difference between
  - a static SG (agents play pure strategies)
  - a dynamic SG (agents play behavioural strategies)
- update dynamics for dynamic signaling games
  - belief learning + best response
  - reinforcement learning
- signaling games for neo-Gricean implicature types