Prediction in language comprehension: theory and case studies
Roger Levy, UC San Diego
Saarland University, 13 May 2013
Talk outline
• Ambiguity resolution & prediction in sentence processing
• Probabilistic grammars and surprisal as a theory unifying the two
• Application to garden-pathing
• Application to predictability-based facilitation (without ambiguity)
• Rethinking syntactic complexity with surprisal: discontinuous constituency
• Surprisal as a quantitative theory of processing difficulty
• Cloze probability and the relationship between linguistic experience and prediction
Theories of sentence comprehension
• Desiderata for a satisfactory theory of sentence comprehension:
  • Robustness to arbitrary input
  • Accurate disambiguation
  • Inference on the basis of incomplete input (incrementality)
  • Processing difficulty is differential and localized: not all sentences are equally easy to understand, nor are all parts of a given sentence equally easy to understand
• Today I will focus on the relationship of the last of these desiderata to the rest
Ambiguity and syntactic complexity
• In sentence processing research, differential difficulty is often attributed to two major sources:
• Ambiguity resolution: a comprehender makes the wrong bet about a local ambiguity and pays for it later
  Mary punished the children of the musician who... were
• Syntactic complexity: some part of an utterance is difficult in the absence of major sources of ambiguity
  This is the malt that the rat that the cat that the dog worried killed ate.
Incrementality and Rationality
• Online sentence comprehension is hard
• But lots of information sources can be usefully brought to bear to help with the task
• Therefore, it would be rational for people to use all the information available, whenever possible
• This is what incrementality is
• We have lots of evidence that people do this often
  “Put the apple on the towel in the box.” (Tanenhaus et al., 1995, Science)
Anatomy of ye olde garden path sentence
• Classic example of incrementality in comprehension
  The horse raced past the barn fell.
• Two incremental analyses of the ambiguous prefix:
  • “Main Verb”: [S [NP The horse] [VP raced past the barn]]
  • “Reduced Relative”: [S [NP The horse (that was) raced past the barn] [VP fell]]
  (The evidence examined by the lawyer was unreliable.)
• People fail to understand it most of the time
• People are likely to misunderstand it—e.g., “What’s a barn fell?”
  • The horse that raced past the barn fell
  • The horse raced past the barn and fell
• Enter probabilistic grammars from computational linguistics...
a man arrived yesterday
8Friday, May 17, 13
a man arrived yesterday0.3 S → S CC S 0.15 VP → VBD ADVP0.7 S → NP VP 0.4 ADVP → RB0.35 NP → DT NN ...
8Friday, May 17, 13
a man arrived yesterday0.3 S → S CC S 0.15 VP → VBD ADVP0.7 S → NP VP 0.4 ADVP → RB0.35 NP → DT NN ...
8Friday, May 17, 13
a man arrived yesterday0.3 S → S CC S 0.15 VP → VBD ADVP0.7 S → NP VP 0.4 ADVP → RB0.35 NP → DT NN ...
8Friday, May 17, 13
a man arrived yesterday0.3 S → S CC S 0.15 VP → VBD ADVP0.7 S → NP VP 0.4 ADVP → RB0.35 NP → DT NN ...
0.7
0.150.35
0.40.3 0.03 0.02
0.07
Total probability: 0.7*0.35*0.15*0.3*0.03*0.02*0.4*0.07= 1.85×10-7
8Friday, May 17, 13
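As a minimal sketch (mine, not part of the deck), the derivation probability can be computed directly; which lexical rule carries which of the probabilities 0.3, 0.03, 0.02, 0.07 is my reading of the tree annotations above.

```python
from functools import reduce

# Rule probabilities used in the single derivation of "a man arrived yesterday".
# Phrasal-rule probabilities come from the grammar fragment above; the mapping
# of 0.3, 0.03, 0.02, 0.07 onto the lexical rules is inferred from the tree.
rules = {
    "S -> NP VP": 0.7,
    "NP -> DT NN": 0.35,
    "VP -> VBD ADVP": 0.15,
    "ADVP -> RB": 0.4,
    "DT -> a": 0.3,
    "NN -> man": 0.03,
    "VBD -> arrived": 0.02,
    "RB -> yesterday": 0.07,
}

# A PCFG scores a derivation as the product of its rule probabilities.
p_tree = reduce(lambda a, b: a * b, rules.values(), 1.0)
print(f"P(tree) = {p_tree:.3g}")  # 1.85e-07, as on the slide
```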
Probabilistic theories of ambiguity resolution
• Jurafsky (1996) introduced probabilistic grammars from computational linguistics into psycholinguistics
• For the horse raced past the barn, assume 2 incremental parses: [main-verb and reduced-relative]
• Jurafsky (1996) estimated the probability ratio of these parses as 82:1
• He proposed that the disfavored reduced-relative analysis “falls off the beam”, producing the garden path when fell arrives
Quantifying probabilistic online processing difficulty
• Let a word’s difficulty be its surprisal given its context:
  difficulty(wi) ∝ −log P(wi | w1…wi−1, CONTEXT)
• Captures the expectation intuition: the more we expect an event, the easier it is to process
• Brains are prediction engines!
  my brother came inside to… chat? wash? get warm?
  the children went outside to… play
• Predictable words are read faster (Ehrlich & Rayner, 1981) and have distinctive EEG responses (Kutas & Hillyard, 1980)
• Combine with probabilistic grammars to give grammatical expectations
(Hale, 2001, NAACL; Levy, 2008, Cognition)
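A minimal sketch of the definition (mine, not the deck’s): surprisal in bits from a conditional probability. The probabilities below are invented for illustration.

```python
import math

def surprisal(p: float) -> float:
    """Surprisal in bits: -log2 P(word | context)."""
    return -math.log2(p)

# Illustrative (invented) conditional probabilities for
# "the children went outside to ...":
for word, p in [("play", 0.9), ("eat", 0.05), ("chat", 0.005)]:
    print(f"{word:>5}: P = {p:<6} surprisal = {surprisal(p):5.2f} bits")
```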
The surprisal graph
[Graph: surprisal (−log P, y-axis from 0 to 4) as a function of probability (x-axis from 0 to 1): a decreasing, convex curve, steepest as P → 0]
Garden-pathing and surprisal
• Here’s another type of local syntactic ambiguity:
  When the dog scratched the vet and his new assistant removed the muzzle.
  (difficulty at “removed”: 68 ms/char)
• Compare with:
  When the dog scratched, the vet and his new assistant removed the muzzle.
  When the dog scratched its owner the vet and his new assistant removed the muzzle.
  (easier at “removed”: 50 ms/char)
(Frazier & Rayner, 1982)
A small PCFG for this sentence type (analysis in Levy, 2011)

  S → SBAR S             0.3     NP → NP Conj NP   0.2     VP → V NP       0.5
  S → NP VP              0.7     Det → the         0.8     VP → V          0.5
  SBAR → COMPL S         0.3     Det → its         0.1     V → scratched   0.25
  SBAR → COMPL S COMMA   0.7     Det → his         0.1     V → removed     0.25
  COMPL → When           1       N → dog           0.2     V → arrived     0.5
  NP → Det N             0.6     N → vet           0.2     Conj → and      1
  NP → Det Adj N         0.2     N → assistant     0.2     Adj → new       1
                                 N → muzzle        0.2     COMMA → ,       1
                                 N → owner         0.2
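The toy grammar is small enough to run. Here is a sketch (my construction, not part of the deck) that builds it with NLTK and scores the full garden-path sentence with a Viterbi parser; note this scores complete parses, not the incremental prefix probabilities used in the surprisal computations below.

```python
import nltk  # assumes nltk is installed

grammar = nltk.PCFG.fromstring("""
    S -> SBAR S [0.3] | NP VP [0.7]
    SBAR -> COMPL S [0.3] | COMPL S COMMA [0.7]
    COMPL -> 'When' [1.0]
    NP -> Det N [0.6] | Det Adj N [0.2] | NP Conj NP [0.2]
    VP -> V NP [0.5] | V [0.5]
    Det -> 'the' [0.8] | 'its' [0.1] | 'his' [0.1]
    N -> 'dog' [0.2] | 'vet' [0.2] | 'assistant' [0.2] | 'muzzle' [0.2] | 'owner' [0.2]
    V -> 'scratched' [0.25] | 'removed' [0.25] | 'arrived' [0.5]
    Conj -> 'and' [1.0]
    Adj -> 'new' [1.0]
    COMMA -> ',' [1.0]
""")

parser = nltk.ViterbiParser(grammar)
tokens = "When the dog scratched the vet and his new assistant removed the muzzle".split()
for tree in parser.parse(tokens):
    print(f"P(tree) = {tree.prob():.3g}")
    tree.pretty_print()
```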
Two incremental trees
• “Garden-path” analysis: [tree in which the vet and his new assistant is the object of scratched inside the When-clause; the main-clause subject NP and VP are still to come]
  P(T | w1...10) = 0.826
• Ultimately-correct analysis: [tree in which the When-clause ends at scratched (VP → V) and the vet and his new assistant is the main-clause subject; the main verb (removed?) is expected next]
  P(T | w1...10) = 0.174
• Disambiguating word probability marginalizes over incremental trees:
  P(removed | w1...10) = Σ_T P(removed | T) P(T | w1...10)
                       = 0.826 × 0 + 0.174 × 0.25
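The marginalization spelled out in code (a sketch; the surprisal value is my computation from the slide’s rounded posteriors, while the deck’s summary table on the next slide lists 4.2 bits, so exact values evidently depend on details of Levy (2011)’s computation):

```python
import math

# Posterior over the two incremental trees after word 10, from the slide:
p_trees = {"garden-path": 0.826, "correct": 0.174}

# P(removed | tree): impossible under the garden-path tree (the next word
# must start the main-clause NP); under the correct tree, V -> removed
# has probability 0.25.
p_removed_given_tree = {"garden-path": 0.0, "correct": 0.25}

p_removed = sum(p_trees[t] * p_removed_given_tree[t] for t in p_trees)
print(f"P(removed | w1..10) = {p_removed:.4f}")         # 0.0435
print(f"surprisal = {-math.log2(p_removed):.2f} bits")  # ~4.5
```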
Preceding context can disambiguate
• “its owner” takes up the object slot of scratched
  [tree: in When the dog scratched its owner, the object position is filled, so the vet and his new assistant can only be the main-clause subject]

  Condition     Surprisal at resolution
  NP absent     4.2
  NP present    2
Sensitivity to verb argument structure
• A superficially similar example:
  When the dog arrived the vet and his new assistant removed the muzzle.
  (easier at the disambiguating verb “removed”, but harder at the ambiguity onset “the vet”!)
  (c.f. When the dog scratched the vet and his new assistant removed the muzzle.)
(Staub, 2007)
Modeling argument-structure sensitivity
• The “context-free” assumption doesn’t preclude relaxing probabilistic locality: split the V and VP rules by transitivity
  (Johnson, 1999; Klein & Manning, 2003)

  Replaced:                      By:
  VP → V NP         0.5     ⇒    VP → Vtrans NP        0.45
  VP → V            0.5          VP → Vtrans           0.05
  V → scratched     0.25         VP → Vintrans         0.45
  V → removed       0.25         VP → Vintrans NP      0.05
  V → arrived       0.5          Vtrans → scratched    0.5
                                 Vtrans → removed      0.5
                                 Vintrans → arrived    1
Result

When the dog arrived the vet and his new assistant removed the muzzle.
When the dog scratched the vet and his new assistant removed the muzzle.
(ambiguity onset: “the vet”; ambiguity resolution: “removed”)

Transitivity-distinguishing PCFG surprisals:
  Condition                  Ambiguity onset   Resolution
  Intransitive (arrived)     2.11              3.20
  Transitive (scratched)     0.44              8.04
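A back-of-the-envelope check (mine) of why the split grammar produces this asymmetry: conditioned on the verb’s class, how strongly is a following NP expected? This ignores everything else in the prefix, so it reproduces only the qualitative pattern in the table, not the exact surprisals.

```python
import math

# P(an NP follows the verb | verb class), from the split VP rules above:
p_np_follows = {
    "intransitive (arrived)": 0.05 / (0.45 + 0.05),  # VP -> Vintrans NP vs. Vintrans
    "transitive (scratched)": 0.45 / (0.45 + 0.05),  # VP -> Vtrans NP vs. Vtrans
}
for cond, p in p_np_follows.items():
    print(f"{cond}: P(NP) = {p:.2f}, surprisal = {-math.log2(p):.2f} bits")
# intransitive: P = 0.10 -> 3.32 bits; transitive: P = 0.90 -> 0.15 bits
```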
Move to broad coverage
• Instead of the pedagogical grammar, a “broad-coverage” grammar from the parsed Brown corpus (11,984 rules)
• Relative-frequency estimation of rule probabilities (“vanilla” PCFG)
[Plot: Transitive RT − Intransitive RT (ms, −60 to 60) and Transitive surprisal − Intransitive surprisal (bits, −1 to 1) across the region “the vet and his new assistant removed the muzzle”; first-pass reading time and surprisal differences pattern together]
Surprisal and syntactic expectations without ambiguity
• Let’s consider the variation in pre-verbal dependency structure found in German:

  Die Einsicht, dass der     Freund dem     Kunden das     Auto aus Plastik verkaufte, erheiterte die Anderen.
  the insight   that the.NOM friend the.DAT client the.ACC car  of  plastic sold       amused     the others
  ‘The insight that the friend sold the client the plastic car amused the others.’

(Konieczny & Döring, 2003)
What happens in German final-verb processing?

  ...daß der Freund DEM Kunden das Auto verkaufte
  ...that the friend the.DAT client the car sold
  ‘...that the friend sold the client a car...’

  ...daß der Freund DES Kunden das Auto verkaufte
  ...that the friend the.GEN client the car sold
  ‘...that the friend of the client sold a car...’

What does reducing the number of dependencies (changing dem → des) do to processing at the final verb? Make it easier because the dependency structure is simpler? No: it makes it harder!

(Konieczny & Döring, 2003)
[Incremental parse diagrams for the two conditions: after daß (SBAR → COMP …) and der Freund (NPnom), the parser expects some subset of {NPnom, NPacc, NPdat, PP, ADVP, Verb} next. DEM Kunden attaches as an NPdat argument of the upcoming VP, whereas DES Kunden attaches as an NPgen modifier inside the subject NP; das Auto then attaches as NPacc, and verkaufte as the final V. In the dative condition more of the verb’s dependents are already in place, sharpening the expectation for the clause-final verb.]
Model results

  Condition                Reading time (ms)   P(wi): word probability   Locality-based prediction
  dem Kunden (dative)      555                 8.38×10⁻⁸                 slower
  des Kunden (genitive)    793                 6.35×10⁻⁸                 faster

• ~30% greater expectation for the verb in the dative condition
• Once again, locality-based predictions have the wrong monotonicity
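Converting the table’s word probabilities to surprisals (my arithmetic from the slide’s numbers):

```python
import math

p_final_verb = {
    "dem Kunden (dative)":   8.38e-8,
    "des Kunden (genitive)": 6.35e-8,
}
for cond, p in p_final_verb.items():
    print(f"{cond}: surprisal = {-math.log2(p):.2f} bits")
# dative ~23.5 bits < genitive ~23.9 bits: the verb is *more* expected in
# the dative condition (8.38/6.35 ~ 1.32, i.e. ~30% greater probability),
# matching the faster reading times there.
```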
Case study: discontinuous dependencies
• Most word-word dependencies in most sentences of most languages are projective
• Formally, a set of word-word dependencies is projective iff they do not cross
• However, sometimes dependencies are non-projective, or discontinuous; that is, they cross
(Levy, Fedorenko, Breen, & Gibson, 2012)
Rethinking locality: RC extraposition
• Equipped with surprisal, let’s consider the case of discontinuous dependencies
• Example: Levy et al. (2012) found consistent difficulty effects induced by RC extraposition
  [in-situ relative clause: easy; extraposed relative clause: hard]
• Is this evidence for a special type of locality: a phrasal adjacency constraint (or a constraint against crossing dependencies)?
(Levy, Fedorenko, Breen, & Gibson, 2012)
Probability & extraposition
• But RC extraposition is relatively rare in English
  In situ: PVP(RC | NP) = 0.06    Extraposed: PVP(RC | NP, PP) = 0.003
  (estimated from the parsed Brown corpus)
• Alternative hypothesis: processing extraposed RCs is hard because they’re unexpected
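On the expectation-based account, the rarity itself is the difficulty; in surprisal terms (my arithmetic from the corpus estimates above):

```python
import math

p_rc = {"in situ": 0.06, "extraposed": 0.003}
for cond, p in p_rc.items():
    print(f"{cond}: surprisal = {-math.log2(p):.2f} bits")
# in situ ~4.1 bits vs. extraposed ~8.4 bits: roughly 4 extra bits for an
# extraposed RC, before any locality constraint is invoked.
```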
Testing the role of expectations
• If extraposed RCs are hard because they’re unexpected…
• …then making them more expected should make them easier
• Work by Wasow, Jaeger, and colleagues (Wasow et al., 2005; Levy & Jaeger, 2007) has found that premodifier type can affect expectation for (in-situ) RCs:
  a barber…          low RC expectation
  the barber…        higher RC expectation
  the only barber…   very high RC expectation
• If premodifier-induced expectations are carried over past the continuous NP domain, we may be able to manipulate extraposed RC expectations the same way*
  [example stimuli: RC less expected vs. RC more expected]
Experimental design
• We crossed RC expectation (low/high) with RC extraposition (extraposed/unextraposed)
• Example sentence: The chairman consulted…
• Our prediction is an interactive effect: high RC expectation (“only those”) will facilitate RC reading, but only in the extraposed condition
• We tested this in a self-paced reading study
(Levy, Fedorenko, Breen, & Gibson, 2012)
Experimental results
• We see the interaction! (interaction p’s ≤ 0.025)
• When an RC is less expected, the extraposed variant (the RC reaching back to “executives”) is harder: a penalty
• When more expected, it’s not: no penalty
• Alternatively, we can think of expectation as facilitating processing for the extraposed variant: facilitation
Experiment: Discussion
• Increasing the expectation for an RC facilitates the processing of extraposed RCs
• True even though the extraposed RC is outside of the continuous-constituent NP domain
• The first real evidence that syntactic prediction extends beyond the domain of continuous constituents
• Why are (some kinds of) discontinuous constituents hard?
  • One possibility: locality & phrasal adjacency constraints
  • New possibility: driven by probabilistic expectations
Surprisal vs. predictability in general
• But is there evidence for surprisal as the specific function relating probability to processing difficulty?
(Smith & Levy, in press)
Proposed probability-time relationships
• Linear?
  • Assumed by major models of eye movement control in reading (Reichle, Pollatsek, Fisher, & Rayner, 1998; Engbert, Nuthmann, Richter, & Kliegl, 2005)
  • Predicted by simple “guessing” theories: for the children went outside to…, guess …play in advance; the guess is right (90% of the time) or wrong (10% of the time), so expected processing time is linear in the word’s probability (see the sketch below)
(Smith & Levy, in press)
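A minimal sketch of the guessing account (all numbers invented): if the word was correctly pre-guessed with probability p, expected reading time is a mixture of a fast path and a slow path, hence linear in p.

```python
T_GUESSED, T_FULL = 100.0, 300.0  # ms; hypothetical path costs

def expected_rt(p: float) -> float:
    # E[RT] = p * fast-path cost + (1 - p) * slow-path cost: linear in p
    return p * T_GUESSED + (1 - p) * T_FULL

for p in (0.9, 0.5, 0.1, 0.01):
    print(f"P = {p:<4}: E[RT] = {expected_rt(p):.0f} ms")
```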
Proposed probability-time relationships
• Logarithmic?
• Theory 1: Optimal perceptual discrimination (“what is this word?”; Stone, 1960; Laming, 1968; Norris, 2006)
  [Graph: posterior probability of the correct word (0 to 1) against # time steps elapsed (0 to 1200); curves for word surprisals of 1, 3, 5, and 7 bits cross a fixed decision threshold at times roughly proportional to surprisal. A sketch of the idea follows.]
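A sketch (mine) of why optimal discrimination predicts a logarithmic relationship: if evidence about the word’s identity accrues at a roughly constant rate in bits, the time to reach a fixed decision threshold grows linearly in the word’s surprisal, i.e., logarithmically in its probability. The rate is invented.

```python
import math

RATE = 0.01  # bits of evidence per time step (hypothetical)

def steps_to_threshold(p: float) -> float:
    # Initial uncertainty is -log2 p bits; recognition once it is resolved.
    return -math.log2(p) / RATE

for bits in (1, 3, 5, 7):  # the surprisal levels plotted on the slide
    print(f"{bits} bits -> {steps_to_threshold(2.0 ** -bits):.0f} time steps")
```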
• Theory 2: Highly incremental processing (Smith & Levy, in press)
Other proposed probability-time rel’nships
[Graphs of further candidate curve shapes, including the reciprocal (1/P) and super-logarithmic relationships discussed below]
Estimating probability/time curve shape
• As a proxy for “processing difficulty,” reading time in two different methods: self-paced reading (5K words) & eye-tracking (50K words)
• Challenge: we need big data to estimate curve shape, but probability is correlated with confounding variables
• GAM regression: the total contribution of word (trigram) probability to RT is near-linear in log probability over 6 orders of magnitude!
  [Plots: total amount of slowdown (ms, 0 to 80) against P(word | context) from 10⁻⁶ to 1 on a log scale, for reading times in self-paced reading and gaze durations in eye-tracking; both curves are approximately straight lines]
(Smith & Levy, in press)
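A sketch of the curve-shape analysis (simulated data, not the Smith & Levy corpora; assumes the pygam package): regress RT on a spline of log-probability and inspect the fitted partial effect, which should come out linear if surprisal is the right link.

```python
import numpy as np
from pygam import LinearGAM, s  # assumes pygam is installed

rng = np.random.default_rng(0)
n = 5000
log10_p = rng.uniform(-6, 0, n)                  # log10 P(word | context)
surprisal_bits = -log10_p * np.log2(10)
rt = 320 + 10 * surprisal_bits + rng.normal(0, 40, n)  # log-linear ground truth

# A spline term lets the data choose the curve shape, as in a GAM regression.
gam = LinearGAM(s(0)).fit(log10_p.reshape(-1, 1), rt)
XX = gam.generate_X_grid(term=0)
effect = gam.partial_dependence(term=0, X=XX)
print("fitted effect, sampled along the grid:", np.round(effect[::20], 1))
```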
Implications for different theories
• Not good for guessing theories
• Not good for the reciprocal theory
• Not good for the super-logarithmic theory of UID
  • But UID could still be rescued by an “optimal alignment with the speaker” view
• Good for theories based on:
  • Optimal perceptual discrimination
  • Highly incremental processing
Implications for predictability norming
Implications for higher-level processing
• I discussed this test of surprisal in terms of lexical predictability
• But let’s revisit syntactic expectations
• We argued that extraposed who was difficult because of syntactic expectations
• Lexically, who is “unpredictable” in both cases, but extraposed who is many bits more surprising
Final case study: Cloze and linguistic experience
• The Cloze task: participants complete a context such as
  the children went outside to…
  with responses like play, eat, play, …; a word’s Cloze probability is its relative frequency among the completions
(Smith & Levy, 2011)
Cloze and linguistic experience
• To understand the relationship among these, we want to compare “ground truth” corpus probabilities to Cloze continuations:
  • Google Web n-grams
  • Google Books n-grams
Example contexts & method
• Collect lots of completions to different contexts:
  In the winter and ______
  It was no great ______
  He played a key ______
  The time needed to ______
• Fit a multivariate model to predict the completions
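Estimating Cloze probabilities from the collected completions is just relative-frequency counting; a sketch with invented responses:

```python
from collections import Counter

# Invented completions for one context, e.g. "He played a key ______":
completions = ["role", "role", "part", "role", "role", "game", "part"]

counts = Counter(completions)
n = len(completions)
for word, c in counts.most_common():
    print(f"{word:>5}: cloze P = {c}/{n} = {c / n:.2f}")
```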
Results
[Figures comparing Cloze continuation probabilities with corpus probabilities]
What predicts reading times?
• We ran a self-paced reading study with the same materials
• Cloze probabilities significantly predicted target word RTs
• Corpus probabilities did not
General summary
• Probabilistic grammars and surprisal theory unify ambiguity resolution and prediction
• Prediction takes into account rich syntactic & semantic contexts
• Striking quantitative support for surprisal as the right index of incremental processing difficulty
• Surprisal unifies grammatical expectations and lexical predictability
• The relationship between measured estimates of linguistic experience and human prediction is non-trivial
Acknowledgments
• Collaborators: Nathaniel Smith, Evelina Fedorenko, Mara Breen, Ted Gibson
• Funding: National Science Foundation, National Institutes of Health (NICHD), Alfred P. Sloan Foundation
• UCSD Computational Psycholinguistics Lab
Thank you!
http://idiom.ucsd.edu/~rlevy
http://grammar.ucsd.edu/cpl