
Language production: computational models

PSYCH 140 / LINGUIST 145 / LINGUIST 245A

Today

1. Prominent theoretical alternative to Availability-based production: Uniform Information Density

2. Looking back: probability in different areas of language processing

Choices in production

Information selection vs. form selection:

• Information selection (“compress”): starting from the full thought/message/idea/intention the speaker wants to convey, select specific aspects of the full thought to encode into a sentence.

• Form selection (“distribute”): decide how to distribute the information across a sentence, within the constraints of syntax. This is the focus of most production research.

Choices at many levels of production

• Utterance level: Move the triangle to the left. vs. Select the triangle. Move it to the left.

• Phrasal level: She gave {him the key / the key to him}

• Word level: She already ate (dinner). She stabbed him (with a knife). I read a book (that) she wrote.

• Morphological level: I’ve/have gone there.

• Phonological level: t/d-deletion (tha[t] cat), metathesis (ask/aks)

• Phonetic level: speech rate, clarity of articulation

Many factors affect these choices. Let’s investigate.

Availability-based production

Not all production is priming! Many factors affect the choice of syntactic structure. A prominent perspective:

The Principle of Immediate Mention: “Production proceeds more efficiently if syntactic structures are used that permit quickly selected lemmas to be mentioned as soon as possible.” (p. 299)

Ferreira & Dell 2000

…because it keeps working memory from getting cluttered with pieces of information speakers are waiting to produce

What determines speed of lemma selection? Ease of retrieval (accessibility)

• imageability

• concreteness

• frequency

• predictability

• prior mention

• animacy

An alternative/addition to availability-based production: Uniform Information Density

Communicating through a noisy channel

[Diagram: transmitter → noisy channel → receiver]

Assuming language is an instance of communication through a noisy channel: information density is optimized near the channel capacity, where speakers maximize the rate of information transmission while minimizing the danger of a mistransmitted message.

Shannon 1949, Levy & Jaeger 2007

Communicating through a noisy channel

Speakers should provide more redundancy in linguistic signal when message is less inferable.

Uniform Information Density (UID): Within the bounds defined by grammar, speakers prefer utterances that distribute information uniformly across the signal. Where speakers have a choice between several variants to encode their message, they prefer the variant that results in more uniform information density.

Levy & Jaeger 2007; Jaeger 2010

…because it allows for efficient communication, minimizing effort and error on the listener’s side
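Stated as a (simplified) computation, the UID preference can be sketched by comparing how evenly per-word surprisal is spread across competing variants of the same message. This is only an illustrative sketch: the word probabilities are invented, and the variance-based score is an assumption for exposition, not the formalization used in the cited papers.

```python
import math
import statistics

def surprisal(p):
    """Shannon information of an event with probability p, in bits."""
    return -math.log2(p)

# Hypothetical per-word probabilities (as if from some language model) for
# two ways of encoding the same message; the values are made up.
variant_a = [0.5, 0.5, 0.5, 0.5]   # information spread evenly
variant_b = [0.9, 0.9, 0.9, 0.02]  # a big information spike at the end

def uid_score(word_probs):
    """Lower variance of per-word surprisal = more uniform information density."""
    infos = [surprisal(p) for p in word_probs]
    return statistics.pvariance(infos)

# UID predicts speakers prefer the variant with the lower score.
print(uid_score(variant_a))                         # 0.0 — perfectly uniform
print(uid_score(variant_b) > uid_score(variant_a))  # True
```

Under this toy scoring, variant_a (one bit per word throughout) is preferred over variant_b, which crams most of its information into the final word.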

(“Information density” = information per unit time.)

Efficient morpho-syntactic production (Frank & Jaeger 2008)

Pres. Clinton did{n’t / not} have…

Information content of NOT: more surprising = less predictable.

Contracted form (n’t) → less signal; full form (not) → more signal.

[Plot: likelihood of the full vs. contracted form against how surprising NOT is in context]

Estimating the information carried by a contractible element

Clinton did NOT have…   (w₋₂ w₋₁ w)

I(NOT | context)
= −log p(NOT | context)   ← definition of Shannon information
≈ −log p(NOT | “Clinton did”)
= −log[p(“not” | “Clinton did”) + p(“n’t” | “Clinton did”)]

A trigram model was used to estimate the probability. This quantity is the same as surprisal.

Frank & Jaeger 2008
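The estimation above can be sketched in a few lines: approximate the context by the two preceding words (as a trigram model does) and sum the probabilities of both surface forms of NOT. The conditional probabilities below are invented for illustration, not Frank & Jaeger's corpus estimates.

```python
import math

# Toy conditional probabilities from a hypothetical trigram model,
# p(w | w_-2 w_-1); the numbers are invented.
p_trigram = {
    ("Clinton", "did"): {"not": 0.05, "n't": 0.15, "have": 0.40, "say": 0.40},
}

def info_NOT(w2, w1):
    """Surprisal (in bits) of the abstract element NOT = {not, n't},
    approximating the full context by the two preceding words."""
    dist = p_trigram[(w2, w1)]
    p_not = dist.get("not", 0.0) + dist.get("n't", 0.0)
    return -math.log2(p_not)

print(info_NOT("Clinton", "did"))  # -log2(0.05 + 0.15) ≈ 2.32 bits
```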

Surprisal

Certain events (P = 1): 0 information (unsurprising)
Impossible events (P = 0): infinite information (highly surprising)
Equi-probable events (P = 0.5): 1 bit

[Plot: surprisal, −log₂ P, against probability P from 0 to 1]

MacKay, David J. C., Information Theory, Inference, and Learning Algorithms
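These three anchor points of the surprisal curve can be checked directly:

```python
import math

def surprisal(p):
    """Shannon surprisal in bits: -log2 of the event's probability."""
    return -math.log2(p)

print(surprisal(1.0))   # 0.0 — certain events carry no information
print(surprisal(0.5))   # 1.0 — one bit for equi-probable events
print(surprisal(0.25))  # 2.0 — halving the probability adds one bit
# surprisal(0.0) would be infinite: impossible events are maximally surprising
```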

Efficient morpho-syntactic production (Frank & Jaeger 2008)

Pres. Clinton did{n’t / not} have…

The longer (full) form is more likely to be used the more surprising the contractible element is.

Replicated for {WAS, WERE, AM, ARE, IS, WILL} and {HAD, HAS, HAVE}

Back to complement clauses (Jaeger 2010)

My boss confirmed ∅ we were absolutely crazy   (complement clause with null onset)

My boss confirmed that we were absolutely crazy   (complement clause with complementizer)

Can we predict complementizer omission?

POLL

I(CC) = −log[p(∅ CC | context) + p(“that” CC | context)]

Results: The complementizer is more likely to be used the more surprising the complement clause is.
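The I(CC) computation is parallel to the NOT case: sum the probabilities of both realizations of the complement clause, then take the negative log. The verbs and probabilities below are invented for illustration; only the relative values matter.

```python
import math

# Invented probabilities of what follows two matrix verbs: a complement
# clause with null onset, one introduced by "that", or something else.
p_cc = {
    "confirmed": {"null_CC": 0.30, "that_CC": 0.30, "other": 0.40},
    "mumbled":   {"null_CC": 0.01, "that_CC": 0.04, "other": 0.95},
}

def info_CC(verb):
    """Surprisal (bits) of a complement clause given the matrix verb,
    summing over the null-onset and 'that' variants."""
    d = p_cc[verb]
    return -math.log2(d["null_CC"] + d["that_CC"])

# Jaeger's result: the higher I(CC), the more likely speakers are to
# produce the explicit complementizer "that".
print(info_CC("confirmed"))  # ≈ 0.74 bits — CC expected, "that" omittable
print(info_CC("mumbled"))    # ≈ 4.32 bits — CC surprising, "that" favored
```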

Back to Availability-based production (Bock & Warren 1985)

BREAKOUT SESSION

What would UID predict for the choice between the prepositional and dative object structure?

Looking back: probability in different areas of language processing

BREAKOUT SESSION

Think back to previous classes (e.g., on language acquisition, speech perception, word recognition, sentence processing):

1. Come up with (at least) 2 examples where probability played an important role, either in a particular experimental finding or in a theory.

2. Based on the examples you came up with and today’s content: are there generalizations you can draw about the role of probability in language processing?

Brains as prediction machines engaged in error minimization

Clark 2013

[Diagram, built up across slides: top-down information contains a prediction; bottom-up information contains a linguistic unit (a phoneme, e.g. /b/; a word, e.g. “task”; a structure, e.g. DO). The prediction guides integration of the incoming linguistic unit and predicts the next one; a mismatch yields an ERROR SIGNAL, which updates the prediction. Goal: minimize the error signal.]

Utility of error minimization

• faster, less resource-intensive processing (lower surprisal)

• more accurate processing

Summary

• Languages provide flexibility that allows speakers to avoid suspending speech.

• Speakers take advantage of this flexibility by ordering and timing linguistic material in ways that allow for efficient information transfer.

• Probabilistic/statistical information fundamentally guides language acquisition, comprehension, and production.

Next time

Pragmatics: perspective-taking in language comprehension